Adjusts observed OD flows so that their margins match given benchmark totals using iterative proportional fitting (IPF), also known as raking.

Usage

adjust_raking_ratio(
  mpd_od_df,
  origin_targets = NULL,
  destination_targets = NULL,
  benchmark_od_df = NULL,
  flow_col_bench = "flow",
  group_cols = NULL,
  max_iter = 200,
  tol = 1e-06,
  clip_min = 0,
  clip_max = Inf,
  keep_cols = character()
)

Arguments

mpd_od_df

Data frame with at least: origin, destination, flow; an mpd_source column is carried through if present. May include stratification variables named in group_cols.

origin_targets

Optional data frame of origin margin targets with: origin, target, and, if using group_cols, also those columns. Targets are interpreted within each (group_cols) subset.

destination_targets

Optional data frame of destination margin targets with: destination, target, and, if using group_cols, also those columns.

benchmark_od_df

Optional benchmark OD data frame with columns: origin, destination, and a benchmark flow column (see flow_col_bench). If supplied while origin_targets and/or destination_targets is NULL, the missing margins are derived from this benchmark by totalling its flows by origin and/or destination, within each group_cols combination when stratified.

flow_col_bench

Name of benchmark flow column in benchmark_od_df. Default "flow".

group_cols

Optional character vector of stratification variables (e.g. c("age_group","sex")). If provided, these must exist in mpd_od_df and the corresponding target tables; raking is performed independently within each group combination.

max_iter

Maximum IPF iterations. Default 200.

tol

Convergence tolerance on relative margin differences. Default 1e-6.

clip_min

Lower bound used to clamp resulting cell weights. Default 0.

clip_max

Upper bound used to clamp resulting cell weights. Default Inf.

keep_cols

Optional character vector of extra columns from mpd_od_df to keep in the output.
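
As an illustration of how margins can be derived from a benchmark OD table (the benchmark_od_df argument above), here is a minimal base R sketch. The data values are invented for the example, and the code is an assumption about the general idea, not the function's internal implementation.

```r
# Hypothetical benchmark OD table; column names follow the argument description.
benchmark_od_df <- data.frame(
  origin      = c("A", "A", "B", "B"),
  destination = c("X", "Y", "X", "Y"),
  flow        = c(30, 10, 20, 40)
)

# Origin margins: total benchmark outflow per origin.
origin_targets <- aggregate(flow ~ origin, data = benchmark_od_df, FUN = sum)
names(origin_targets)[names(origin_targets) == "flow"] <- "target"

# Destination margins: total benchmark inflow per destination.
destination_targets <- aggregate(flow ~ destination, data = benchmark_od_df, FUN = sum)
names(destination_targets)[names(destination_targets) == "flow"] <- "target"
```

For stratified margins, the grouping columns would be added to the formula, for example flow ~ origin + age_group.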

Value

A tibble with:

  • origin, destination, plus mpd_source and the group_cols columns when present

  • flow: original observed flow

  • flow_adj: raked flow

  • weight_ipf: multiplicative weight = flow_adj / flow

The output also includes attributes:

  • "ipf_converged": logical.

  • "ipf_iterations": iterations used.

Details

Notation used throughout:

  • \(F_{ij}^{mpd}\): observed MPD flow from origin \(i\) to destination \(j\)

  • \(F_{ij}^{adj}\): adjusted flow

  • \(T_i^{(O)}\): target total outflow for origin \(i\)

  • \(T_j^{(D)}\): target total inflow for destination \(j\)

  • \(w_{ij}\): multiplicative IPF weight

This is a generic implementation that covers:

  1. Location-only case (most users): raking on origin and/or destination totals derived from benchmark flows or population.

  2. Stratified case (age, sex, etc.): raking within each combination of group_cols, using group-specific origin and destination margins.
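
To make the stratified case concrete, the sketch below rakes to origin margins only, within each age group, using base R. With a single margin no iteration is needed: each flow is scaled by target / observed total for its (group, origin) cell. The data and the age_group column are hypothetical, and this is an illustration of the idea, not the function's own code.

```r
# Hypothetical observed flows, stratified by age_group.
mpd_od_df <- data.frame(
  origin      = c("A", "A", "B", "A", "A", "B"),
  destination = c("X", "Y", "X", "X", "Y", "X"),
  age_group   = c("young", "young", "young", "old", "old", "old"),
  flow        = c(10, 30, 20, 5, 15, 10)
)

# Hypothetical group-specific origin targets.
origin_targets <- data.frame(
  origin    = c("A", "B", "A", "B"),
  age_group = c("young", "young", "old", "old"),
  target    = c(80, 30, 10, 20)
)

# Attach the matching target to every observed flow.
out <- merge(mpd_od_df, origin_targets, by = c("age_group", "origin"))

# Observed outflow per (age_group, origin), repeated on each row.
out$obs_total <- ave(out$flow, out$age_group, out$origin, FUN = sum)

# One-sided raking: a single multiplicative weight per (age_group, origin).
out$weight_ipf <- out$target / out$obs_total
out$flow_adj   <- out$flow * out$weight_ipf
```

When both origin and destination margins are supplied, the same grouping logic applies, but the scaling has to be iterated within each group.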

The method operates on aggregated flows (no microdata) and is deliberately transparent:

$$F_{ij}^{adj} = F_{ij}^{mpd} \times w_{ij}$$

where the cell weights \(w_{ij}\) are determined so that:

  • \(\sum_j F_{ij}^{adj} = T_i^{(O)}\) for all origins with supplied origin targets, and/or

  • \(\sum_i F_{ij}^{adj} = T_j^{(D)}\) for all destinations with supplied destination targets,

with \(T_i^{(O)}\) and \(T_j^{(D)}\) provided by the user, typically from census or high-quality benchmark flows or populations.
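
The two constraints above can be met by alternately rescaling rows and columns. The following base R sketch works on a small origin-by-destination matrix; ipf_sketch and its return fields are hypothetical names chosen for this example, and the code is a minimal illustration of textbook IPF, not the package's implementation.

```r
# Minimal bi-proportional IPF on a matrix (rows = origins, columns = destinations).
ipf_sketch <- function(flow, row_targets, col_targets,
                       max_iter = 200, tol = 1e-6) {
  adj <- flow
  converged <- FALSE
  iterations <- 0
  for (iter in seq_len(max_iter)) {
    iterations <- iter
    # Row step: scale each origin so its outflow matches the origin target.
    rs <- rowSums(adj)
    adj <- sweep(adj, 1, ifelse(rs > 0, row_targets / rs, 1), "*")
    # Column step: scale each destination so its inflow matches the target.
    cs <- colSums(adj)
    adj <- sweep(adj, 2, ifelse(cs > 0, col_targets / cs, 1), "*")
    # Convergence: largest relative difference between margins and targets.
    row_err <- max(abs(rowSums(adj) - row_targets) / pmax(row_targets, 1e-12))
    col_err <- max(abs(colSums(adj) - col_targets) / pmax(col_targets, 1e-12))
    if (max(row_err, col_err) < tol) {
      converged <- TRUE
      break
    }
  }
  # weight is NaN wherever the observed flow is zero (0 / 0).
  list(flow_adj = adj, weight = adj / flow,
       converged = converged, iterations = iterations)
}

# Invented example: observed flows and targets with the same grand total (100).
flow <- matrix(c(10, 20,
                 30, 40), nrow = 2, byrow = TRUE)
res <- ipf_sketch(flow, row_targets = c(40, 60), col_targets = c(50, 50))
```

Here res$flow_adj plays the role of \(F_{ij}^{adj}\) and res$weight the role of \(w_{ij}\).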

  • If only origin_targets is supplied, raking enforces origin margins.

  • If only destination_targets is supplied, raking enforces destination margins.

  • If both are supplied (or derived from benchmark_od_df), standard bi-proportional IPF is performed.

  • Cells with zero observed flow stay at zero, because the adjustment is multiplicative. If the benchmark margins imply mass in such structurally zero cells, the margins cannot be matched exactly. This is by design; inspect the adjusted margins against the targets when it occurs.
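
The zero-cell limitation can be seen in a small base R sketch. The numbers are invented and the loop is a bare-bones version of textbook IPF, not the function's own code: with only diagonal cells observed, the row and column targets cannot both be satisfied.

```r
# Observed flows with two structural zeros (off-diagonal cells).
flow <- matrix(c(10,  0,
                  0, 10), nrow = 2, byrow = TRUE)
row_targets <- c(40, 60)  # origin targets
col_targets <- c(50, 50)  # destination targets

adj <- flow
for (iter in 1:200) {
  # Row step, then column step, as in standard bi-proportional fitting.
  adj <- sweep(adj, 1, row_targets / rowSums(adj), "*")
  adj <- sweep(adj, 2, col_targets / colSums(adj), "*")
}

# Zero cells are still zero; the last step fixed the column margins,
# so the row margins are left off target.
zero_cells_kept <- adj[1, 2] == 0 && adj[2, 1] == 0
row_gap <- rowSums(adj) - row_targets
```

In this example the procedure oscillates between the two sets of margins instead of converging, which is the kind of situation the "ipf_converged" attribute is meant to flag.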