Raking ratio / IPF adjustment of OD flows — adjust_raking

Adjusts observed OD flows so that their margins match given benchmark totals using iterative proportional fitting (IPF), also known as raking.

Usage

adjust_raking_ratio(
  mpd_od_df,
  origin_targets = NULL,
  destination_targets = NULL,
  benchmark_od_df = NULL,
  flow_col_bench = "flow",
  group_cols = NULL,
  max_iter = 200,
  tol = 1e-06,
  clip_min = 0,
  clip_max = Inf,
  keep_cols = character()
)

Arguments

mpd_od_df: Data frame with at least the columns origin, destination and flow. An mpd_source column is carried through if present. May also include the stratification variables named in group_cols.
origin_targets: Optional data frame of origin margin targets with: origin, target, and, if using group_cols, also those columns. Targets are interpreted within each (group_cols) subset.
destination_targets: Optional data frame of destination margin targets with: destination, target, and, if using group_cols, also those columns.
benchmark_od_df: Optional benchmark OD data frame with columns: origin, destination and a benchmark flow column (named by flow_col_bench). If supplied while origin_targets and/or destination_targets are NULL, the missing margins are derived from this benchmark by summing its flows by origin and by destination (within group_cols, if given).
flow_col_bench: Name of benchmark flow column in benchmark_od_df. Default "flow".
group_cols: Optional character vector of stratification variables (e.g. c("age_group","sex")). If provided, these must exist in mpd_od_df and the corresponding target tables; raking is performed independently within each group combination.
max_iter: Maximum IPF iterations. Default 200.
tol: Convergence tolerance on relative margin differences. Default 1e-6.
clip_min: Lower bound used to clamp resulting cell weights. Default 0.
clip_max: Upper bound used to clamp resulting cell weights. Default Inf.
keep_cols: Optional character vector of extra columns from mpd_od_df to keep in the output.
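
To make the benchmark-derived margins concrete, the following base R sketch shows one way origin and destination targets could be computed from a benchmark table. The data values are invented for illustration, and the function's internal derivation is not shown in this page, so treat this as an assumption about the general idea rather than the actual code.

```r
# Hypothetical benchmark OD table (illustrative values only)
benchmark_od_df <- data.frame(
  origin      = c("A", "A", "B", "B"),
  destination = c("X", "Y", "X", "Y"),
  flow        = c(120, 80, 60, 140)
)

# Origin margins: total benchmark outflow per origin
origin_targets <- aggregate(flow ~ origin, data = benchmark_od_df, FUN = sum)
names(origin_targets)[names(origin_targets) == "flow"] <- "target"

# Destination margins: total benchmark inflow per destination
destination_targets <- aggregate(flow ~ destination, data = benchmark_od_df, FUN = sum)
names(destination_targets)[names(destination_targets) == "flow"] <- "target"

print(origin_targets)
print(destination_targets)
```

For a stratified benchmark, the grouping variables would be added to the right-hand side of each formula (for example flow ~ origin + sex).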

Value

A tibble with:

origin, destination, (mpd_source), (group_cols)
flow: original observed flow
flow_adj: raked flow
weight_ipf: multiplicative weight = flow_adj / flow

The output also includes attributes:

"ipf_converged": logical.
"ipf_iterations": iterations used.

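A call and a check of the convergence attributes might look like the sketch below. The input values are invented, and the call is guarded with exists() so the snippet runs even when the package that provides adjust_raking_ratio is not attached; only the arguments, columns and attributes documented on this page are used.

```r
# Hypothetical observed flows and margin targets (illustrative values only)
mpd_od_df <- data.frame(
  origin      = c("A", "A", "B", "B"),
  destination = c("X", "Y", "X", "Y"),
  flow        = c(10, 20, 30, 40)
)
origin_targets      <- data.frame(origin = c("A", "B"), target = c(60, 140))
destination_targets <- data.frame(destination = c("X", "Y"), target = c(90, 110))

# Guarded: only runs when adjust_raking_ratio is available in the session
if (exists("adjust_raking_ratio")) {
  res <- adjust_raking_ratio(
    mpd_od_df,
    origin_targets      = origin_targets,
    destination_targets = destination_targets
  )
  # Inspect convergence before using flow_adj
  print(attr(res, "ipf_converged"))
  print(attr(res, "ipf_iterations"))
  print(res[, c("origin", "destination", "flow", "flow_adj", "weight_ipf")])
}
```

When both margins are supplied, the two sets of targets should sum to the same grand total; otherwise both margins cannot be satisfied at once.
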
Details

Notation used throughout:

$F_{ij}^{mpd}$: observed MPD flow from origin $i$ to destination $j$
$F_{ij}^{adj}$: adjusted flow
$T_i^{(O)}$: target total outflow for origin $i$
$T_j^{(D)}$: target total inflow for destination $j$
$w_{ij}$: multiplicative IPF weight

This is a generic implementation that covers:

Location-only case (most users): raking on origin and/or destination totals derived from benchmark flows or population.
Stratified case (age, sex, etc.): raking within each combination of group_cols, using group-specific origin and destination margins.
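
The stratified case can be illustrated with a base R sketch that rakes on origin totals within each level of a hypothetical sex column. The data, the intermediate object x and the use of merge() and ave() are assumptions made for illustration, not the package's internals.

```r
# Hypothetical stratified flows (illustrative values only)
mpd_od_df <- data.frame(
  origin      = c("A", "A", "B", "A", "A", "B"),
  destination = c("X", "Y", "X", "X", "Y", "X"),
  sex         = c("f", "f", "f", "m", "m", "m"),
  flow        = c(10, 30, 20, 5, 15, 40)
)
origin_targets <- data.frame(
  origin = c("A", "B", "A", "B"),
  sex    = c("f", "f", "m", "m"),
  target = c(80, 30, 10, 60)
)

# Attach the group-specific origin target to every OD cell
x <- merge(mpd_od_df, origin_targets, by = c("origin", "sex"))

# Observed outflow per (origin, sex), aligned with the rows of x
row_total <- ave(x$flow, x$origin, x$sex, FUN = sum)

# With origin targets only, a single scaling step per group is enough
x$weight_ipf <- x$target / row_total
x$flow_adj   <- x$flow * x$weight_ipf

print(x)
```

Because each (origin, sex) combination is scaled against its own target, the groups never influence each other, which is what raking independently within each group combination means here.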

The method operates on aggregated flows (no microdata) and is deliberately transparent: each adjusted flow is the observed flow multiplied by a single cell weight,

$$F_{ij}^{adj} = F_{ij}^{mpd} \times w_{ij}$$

where the cell weights $w_{ij}$ are determined so that:

$\sum_j F_{ij}^{adj} = T_i^{(O)}$ for all origins with supplied origin targets, and/or
$\sum_i F_{ij}^{adj} = T_j^{(D)}$ for all destinations with supplied destination targets,

with $T_i^{(O)}$ and $T_j^{(D)}$ provided by the user, typically taken from a census or from high-quality benchmark flows or population counts.
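
The two margin constraints above can be sketched as a textbook bi-proportional IPF loop on a matrix of flows. This is a minimal illustration under stated assumptions: the function name ipf_sketch, the matrix layout (rows as origins, columns as destinations) and the exact convergence check are hypothetical and may differ from what adjust_raking_ratio does internally.

```r
# Minimal bi-proportional IPF on a flow matrix (illustrative sketch only)
ipf_sketch <- function(F_mpd, T_o, T_d, max_iter = 200, tol = 1e-6) {
  F_adj <- F_mpd
  converged <- FALSE
  for (iter in seq_len(max_iter)) {
    # Row step: scale each origin so its outflow matches T_o
    rs <- rowSums(F_adj)
    F_adj <- F_adj * ifelse(rs > 0, T_o / rs, 0)
    # Column step: scale each destination so its inflow matches T_d
    cs <- colSums(F_adj)
    F_adj <- sweep(F_adj, 2, ifelse(cs > 0, T_d / cs, 0), `*`)
    # Relative difference on origin margins after the column step
    rel_diff <- max(abs(rowSums(F_adj) - T_o) / pmax(T_o, 1e-12))
    if (rel_diff < tol) {
      converged <- TRUE
      break
    }
  }
  # weight is NaN for cells whose observed flow is zero
  list(flow_adj = F_adj, weight = F_adj / F_mpd,
       converged = converged, iterations = iter)
}

# Hypothetical observed flows and targets (both target sets sum to 200)
F_mpd <- matrix(c(10, 20,
                  30, 40), nrow = 2, byrow = TRUE,
                dimnames = list(origin = c("A", "B"), destination = c("X", "Y")))
T_o <- c(A = 60, B = 140)
T_d <- c(X = 90, Y = 110)

fit <- ipf_sketch(F_mpd, T_o, T_d)
print(fit$converged)
print(rowSums(fit$flow_adj))
print(colSums(fit$flow_adj))
```

Each pass only multiplies existing cells, so the adjusted flows keep the interaction pattern of the observed matrix while taking on the benchmark margins.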

If only origin_targets is supplied, raking enforces origin margins.
If only destination_targets is supplied, raking enforces destination margins.
If both are supplied (or derived from benchmark_od_df), standard bi-proportional IPF is performed.
Cells with zero observed flow stay at zero, because IPF only rescales existing flows and cannot create new ones. If the benchmark margins imply mass in such structurally zero cells, the margins cannot be matched exactly. This is by design; inspect the affected origins and destinations when it happens.
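
The effect of a structurally zero cell can be seen in a small base R sketch. The flows and targets below are invented: origin A has observed flow only to X, yet its target outflow (50) exceeds the target inflow of X (30), so no rescaling of the observed cells can satisfy both margins.

```r
# Hypothetical flows with a structural zero in cell (A, Y)
F_adj <- matrix(c(10,  0,
                  10, 10), nrow = 2, byrow = TRUE,
                dimnames = list(origin = c("A", "B"), destination = c("X", "Y")))
T_o <- c(A = 50, B = 50)   # target outflows
T_d <- c(X = 30, Y = 70)   # target inflows

# Plain alternating row / column scaling
for (iter in 1:200) {
  F_adj <- F_adj * (T_o / rowSums(F_adj))               # match origin totals
  F_adj <- sweep(F_adj, 2, T_d / colSums(F_adj), `*`)   # match destination totals
}

print(F_adj)            # cell (A, Y) is still exactly zero
print(rowSums(F_adj))   # origin A stays short of its target of 50
print(colSums(F_adj))   # destination totals match, as the last step enforced them
```

Comparing the achieved margins with the targets after raking, as in the last two lines, is a simple way to spot which origins or destinations are affected.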