Adjusts observed OD flows so that their margins match given benchmark totals using iterative proportional fitting (IPF), also known as raking.
Usage
adjust_raking_ratio(
mpd_od_df,
origin_targets = NULL,
destination_targets = NULL,
benchmark_od_df = NULL,
flow_col_bench = "flow",
group_cols = NULL,
max_iter = 200,
tol = 1e-06,
clip_min = 0,
clip_max = Inf,
keep_cols = character()
)
Arguments
- mpd_od_df
Data frame with at least the columns origin, destination, and flow. An mpd_source column is carried through if present. May include stratification variables named in group_cols.
- origin_targets
Optional data frame of origin margin targets with columns origin, target, and, if using group_cols, those columns as well. Targets are interpreted within each group_cols subset.
- destination_targets
Optional data frame of destination margin targets with columns destination, target, and, if using group_cols, those columns as well.
- benchmark_od_df
Optional benchmark OD data frame with columns origin, destination, and a benchmark flow column (see flow_col_bench). If supplied and origin_targets and/or destination_targets are NULL, margins are derived from this benchmark (by origin / destination, optionally stratified).
- flow_col_bench
Name of the benchmark flow column in benchmark_od_df. Default "flow".
- group_cols
Optional character vector of stratification variables (e.g. c("age_group", "sex")). If provided, these must exist in mpd_od_df and the corresponding target tables; raking is performed independently within each group combination.
- max_iter
Maximum IPF iterations. Default 200.
- tol
Convergence tolerance on relative margin differences. Default 1e-6.
- clip_min
Lower bound used to clamp resulting cell weights. Default 0.
- clip_max
Upper bound used to clamp resulting cell weights. Default Inf.
- keep_cols
Optional character vector of extra columns from mpd_od_df to keep in the output.
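As a rough illustration of the margins that a benchmark OD table implies when origin_targets and destination_targets are left NULL, the base R sketch below sums a benchmark flow column by origin and by destination. The data values are invented for the example, and the function's internal derivation may differ in detail.

```r
# Hypothetical benchmark OD table; column names follow the argument descriptions.
benchmark_od_df <- data.frame(
  origin      = c("A", "A", "B", "B"),
  destination = c("A", "B", "A", "B"),
  flow        = c(10, 30, 20, 40)
)

# Origin margins: total benchmark outflow per origin.
origin_targets <- aggregate(flow ~ origin, data = benchmark_od_df, FUN = sum)
names(origin_targets)[names(origin_targets) == "flow"] <- "target"

# Destination margins: total benchmark inflow per destination.
destination_targets <- aggregate(flow ~ destination, data = benchmark_od_df, FUN = sum)
names(destination_targets)[names(destination_targets) == "flow"] <- "target"
```

The resulting tables have the origin/target and destination/target layout described for origin_targets and destination_targets. For a stratified benchmark, the grouping variables would be added to the right-hand side of each formula.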
Value
A tibble with:
origin, destination, (mpd_source), (group_cols)
flow: original observed flow
flow_adj: raked flow
weight_ipf: multiplicative weight = flow_adj / flow
The output also includes attributes:
"ipf_converged": logical.
"ipf_iterations": iterations used.
Details
Notation used throughout:
\(F_{ij}^{mpd}\): observed MPD flow from origin \(i\) to destination \(j\)
\(F_{ij}^{adj}\): adjusted flow
\(T_i^{(O)}\): target total outflow for origin \(i\)
\(T_j^{(D)}\): target total inflow for destination \(j\)
\(w_{ij}\): multiplicative IPF weight
This is a generic implementation that covers:
Location-only case (most users): raking on origin and/or destination totals derived from benchmark flows or population.
Stratified case (age, sex, etc.): raking within each combination of group_cols, using group-specific origin and destination margins.
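The stratified case can be pictured with a small base R sketch that rakes origin margins separately within each group. Everything here is illustrative: the data, the single grouping column sex, and the helper column obs_total are assumptions, not part of the package. With only one margin, a single proportional scaling already matches the targets, so no iteration is needed.

```r
# Hypothetical stratified flows and group-specific origin targets.
mpd <- data.frame(
  sex         = c("f", "f", "m", "m"),
  origin      = c("A", "A", "A", "A"),
  destination = c("A", "B", "A", "B"),
  flow        = c(2, 6, 1, 3)
)
targets <- data.frame(
  sex    = c("f", "m"),
  origin = c("A", "A"),
  target = c(16, 12)
)

# Observed origin totals within each (sex, origin) combination.
obs <- aggregate(flow ~ sex + origin, data = mpd, FUN = sum)
names(obs)[names(obs) == "flow"] <- "obs_total"

# Attach observed totals and targets to every cell, then scale.
out <- merge(merge(mpd, obs, by = c("sex", "origin")),
             targets, by = c("sex", "origin"))
out$weight_ipf <- out$target / out$obs_total
out$flow_adj   <- out$flow * out$weight_ipf
```

Each group receives its own weight (2 for "f" and 3 for "m" in this toy data), so the adjusted flows sum to the group-specific target without any information crossing between groups.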
The method operates on aggregated flows (no microdata) and is deliberately transparent:
$$F_{ij}^{adj} = F_{ij}^{mpd} \times w_{ij}$$
where the cell weights \(w_{ij}\) are determined so that:
\(\sum_j F_{ij}^{adj} = T_i^{(O)}\) for all origins with supplied origin targets, and/or
\(\sum_i F_{ij}^{adj} = T_j^{(D)}\) for all destinations with supplied destination targets,
with \(T_i^{(O)}\) and \(T_j^{(D)}\) provided by the user, typically from census or high-quality benchmark flows or populations.
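A minimal sketch of how such weights can be found when both margins are supplied is given below, assuming flows arranged as a matrix with origins in rows and destinations in columns. The helper ipf_sketch is hypothetical and is not the package's implementation; its stopping rule (largest relative gap on the origin margins) is one plausible reading of tol, and it assumes strictly positive targets.

```r
# Hypothetical helper: bi-proportional IPF on a flow matrix.
# Assumes strictly positive targets whose totals agree.
ipf_sketch <- function(flow, row_target, col_target,
                       max_iter = 200, tol = 1e-6) {
  adj <- flow
  converged <- FALSE
  iterations <- 0
  for (it in seq_len(max_iter)) {
    iterations <- it
    # Row step: scale each origin so its outflow matches the origin target.
    rs <- rowSums(adj)
    adj <- adj * ifelse(rs > 0, row_target / rs, 1)
    # Column step: scale each destination so its inflow matches the target.
    cs <- colSums(adj)
    adj <- sweep(adj, 2, ifelse(cs > 0, col_target / cs, 1), `*`)
    # Largest relative gap left on the origin margins after the column step.
    rel_gap <- max(abs(rowSums(adj) - row_target) / row_target)
    if (rel_gap < tol) {
      converged <- TRUE
      break
    }
  }
  list(flow_adj = adj, converged = converged, iterations = iterations)
}

# Made-up observed flows and targets (both sets of targets sum to 100).
flow <- matrix(c(10, 30,
                 20, 40), nrow = 2, byrow = TRUE,
               dimnames = list(origin = c("A", "B"), destination = c("A", "B")))
res <- ipf_sketch(flow, row_target = c(50, 50), col_target = c(40, 60))

# Cell weights w_ij, so that flow_adj = flow * weight.
weight <- res$flow_adj / flow
```

Because every cell of the starting matrix is positive and the two sets of targets share the same total, the alternating row and column scalings settle on a matrix that satisfies both margins.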
If only origin_targets is supplied, raking enforces origin margins.
If only destination_targets is supplied, raking enforces destination margins.
If both are supplied (or derived from benchmark_od_df), standard bi-proportional IPF is performed.
Cells with zero initial flow remain zero: this implementation cannot create flow in them. If the benchmark margins imply mass in structurally zero cells, the margins will not be matched exactly. This is by design, and such cases should be inspected.
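The effect of structural zeros can be seen in a toy example, sketched here in base R with invented numbers. The observed matrix has flow only on the diagonal, while the targets would require some off-diagonal flow, so the alternating scalings keep undoing each other and the origin margins are never matched. The loop is an illustrative stand-in, not the package's code.

```r
# Observed flows with two structural zeros (no A->B or B->A flow).
flow <- matrix(c(5, 0,
                 0, 5), nrow = 2, byrow = TRUE)
row_target <- c(10, 10)  # origin targets
col_target <- c(5, 15)   # destination targets (same grand total of 20)

adj <- flow
converged <- FALSE
for (it in seq_len(200)) {
  # Row step, then column step; multiplying a zero cell keeps it at zero.
  adj <- adj * (row_target / rowSums(adj))
  adj <- sweep(adj, 2, col_target / colSums(adj), `*`)
  if (max(abs(rowSums(adj) - row_target) / row_target) < 1e-6) {
    converged <- TRUE
    break
  }
}

# Remaining gap on the origin margins after the final column step.
row_gap <- rowSums(adj) - row_target
```

After the loop, converged is still FALSE and row_gap is far from zero, which is the kind of outcome the "ipf_converged" attribute is meant to flag for inspection.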