Validate origin-conditioned destination-share distributions — validate_flow

Compares OD-flow distributions after normalizing each origin's destination flows into shares. This is a distributional validation diagnostic for flow allocation: it checks whether each origin allocates its flows across destinations in a similar way to a reference OD table, rather than checking individual OD-cell magnitudes.

By default, the function compares adjusted OD flows with benchmark OD flows and reports KL(benchmark || adjusted) plus the Jensen-Shannon divergence (JSD). The comparisons argument can also request raw MPD versus benchmark and raw MPD versus adjusted flows. For each comparison ID of the form "x_vs_y", the first series is the x/baseline distribution, the second series is the y/reference distribution, and the directional KL is KL(Y || X). Lower values indicate closer destination-allocation fidelity. These metrics assess spatial allocation, not total flow scale or individual OD-pair residual size.
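
The share normalization and the two divergences described above can be illustrated with a minimal base-R sketch. It is not the function's actual implementation: the toy data, the helper names (to_shares, kl_div, js_div), the natural-log base, and the exact placement of the epsilon smoothing are all assumptions made for illustration.

```r
# Hypothetical toy data: two origins, three destinations each.
od <- data.frame(
  origin      = c("A", "A", "A", "B", "B", "B"),
  destination = c("X", "Y", "Z", "X", "Y", "Z"),
  flow_adj    = c(50, 30, 20, 10, 10, 80),  # adjusted flows (first series, x)
  flow_bench  = c(40, 40, 20, 10, 10, 80)   # benchmark flows (second series, y)
)
epsilon <- 1e-8

# Add the smoothing constant, then normalize flows into destination
# shares within each origin (helper name is hypothetical).
to_shares <- function(flow, origin, eps) {
  smoothed <- flow + eps
  smoothed / ave(smoothed, origin, FUN = sum)
}
od$share_adj   <- to_shares(od$flow_adj, od$origin, epsilon)
od$share_bench <- to_shares(od$flow_bench, od$origin, epsilon)

# KL(p || q) using the natural logarithm (the log base is an assumption).
kl_div <- function(p, q) sum(p * log(p / q))

# Jensen-Shannon divergence: average KL of each distribution to their midpoint.
js_div <- function(p, q) {
  m <- (p + q) / 2
  0.5 * kl_div(p, m) + 0.5 * kl_div(q, m)
}

# One row per origin; directional KL is KL(Y || X) = KL(benchmark || adjusted).
origin_metrics <- do.call(rbind, lapply(split(od, od$origin), function(g) {
  data.frame(
    origin = g$origin[1],
    kl     = kl_div(g$share_bench, g$share_adj),
    jsd    = js_div(g$share_bench, g$share_adj)
  )
}))
print(origin_metrics)
```

In this toy table, origin B allocates its flows identically in both series, so both divergences are zero for it, while origin A shows a small positive divergence. Scaling every flow of an origin by a constant leaves its shares, and therefore both metrics, essentially unchanged, which is why the diagnostic says nothing about total flow scale.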

Usage

validate_flow_distribution(
  adj_df,
  benchmark_od_df,
  flow_col_adj = "flow_adj",
  flow_col_mpd = "flow",
  flow_col_bench = "flow",
  epsilon = 1e-08,
  method_name = NA_character_,
  weight_by = c("none", "benchmark_origin_total"),
  comparisons = c("adjusted_vs_benchmark"),
  return_origin_level = TRUE,
  return_od_level = FALSE
)

Arguments

adj_df: Data frame with at least origin, destination, and an adjusted flow column (default "flow_adj").
benchmark_od_df: Data frame with at least origin, destination, and a benchmark flow column (default "flow").
flow_col_adj: Name of adjusted flow column in adj_df. Default "flow_adj".
flow_col_mpd: Name of raw MPD flow column in adj_df. Default "flow". Required only when comparisons includes a raw MPD comparison ("raw_vs_benchmark" or "raw_vs_adjusted").
flow_col_bench: Name of benchmark flow column in benchmark_od_df. Default "flow".
epsilon: Small positive smoothing constant added to benchmark and adjusted flows before shares are computed. Default 1e-8.
method_name: Optional label for the adjustment method. Stored in the summary and origin-level outputs.
weight_by: Weighting rule for weighted summary means. Currently "none" or "benchmark_origin_total".
comparisons: Distribution comparisons to compute. The default "adjusted_vs_benchmark" preserves the original behavior. Use "all" to compute "adjusted_vs_benchmark", "raw_vs_benchmark", and "raw_vs_adjusted".
return_origin_level: Logical, return one row per origin in the output. Default TRUE.
return_od_level: Logical, return OD-level share and contribution rows. Default FALSE.
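
To illustrate what the two weight_by options might mean for the summary means, the sketch below contrasts an unweighted mean with a mean weighted by each origin's benchmark total. The data frame, its column names, and the use of weighted.mean() are assumptions for illustration, not the function's documented internals.

```r
# Hypothetical origin-level results: one KL value and one benchmark
# origin total per origin (column names are illustrative).
origin_level <- data.frame(
  origin          = c("A", "B", "C"),
  kl              = c(0.10, 0.02, 0.40),
  benchmark_total = c(1000, 5000, 50)
)

# weight_by = "none": every origin counts equally.
mean_kl <- mean(origin_level$kl)

# weight_by = "benchmark_origin_total": origins with larger benchmark
# outflow contribute more to the summary.
weighted_mean_kl <- weighted.mean(origin_level$kl,
                                  w = origin_level$benchmark_total)

print(c(unweighted = mean_kl, weighted = weighted_mean_kl))
```

Here the small origin C has the poorest fit, so it pulls the unweighted mean up far more than the weighted one. Comparing the two means can hint at whether poor allocation fidelity is concentrated in low-volume origins.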

Value

A list with:

summary: one row per requested comparison, with the origin count and the mean, median, and (optionally) benchmark-total-weighted mean of KL and JSD,
origin_level: returned when return_origin_level = TRUE; an origin-level tibble with KL, JSD, destination count, origin totals for the raw, adjusted, benchmark, reference, and comparison series, and zero-total flags,
od_level: returned when return_od_level = TRUE; OD-level share and contribution rows.
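
A call requesting all three comparisons and all three list elements might look like the sketch below. The toy inputs and the method label "toy_method" are hypothetical; only the argument names and defaults come from the usage above. The call is wrapped in an exists() guard so the snippet still runs where the package providing validate_flow_distribution() is not attached.

```r
# Hypothetical toy inputs; column names follow the documented defaults.
adj_df <- data.frame(
  origin      = c("A", "A", "B", "B"),
  destination = c("X", "Y", "X", "Y"),
  flow        = c(120, 80, 30, 70),   # raw MPD flows (flow_col_mpd)
  flow_adj    = c(110, 90, 35, 65)    # adjusted flows (flow_col_adj)
)
benchmark_od_df <- data.frame(
  origin      = c("A", "A", "B", "B"),
  destination = c("X", "Y", "X", "Y"),
  flow        = c(100, 100, 40, 60)   # benchmark flows (flow_col_bench)
)

# Guard: only call the function if it is available in this session.
if (exists("validate_flow_distribution", mode = "function")) {
  res <- validate_flow_distribution(
    adj_df,
    benchmark_od_df,
    method_name     = "toy_method",              # hypothetical label
    weight_by       = "benchmark_origin_total",
    comparisons     = "all",
    return_od_level = TRUE
  )
  print(res$summary)             # one row per requested comparison
  print(head(res$origin_level))  # per-origin KL and JSD
  print(head(res$od_level))      # OD-level shares and contributions
}
```

Because flow_col_mpd and flow_col_bench both default to "flow" but refer to different data frames, the raw MPD and benchmark columns can share a name without conflict, as they do here.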