Aim
This vignette shows how origin-destination (OD) flows can be prepared for visual inspection. The aim is not to replace the formal validation diagnostics in debiasR, but to give a quick way to see which origin-destination corridors carry the largest flows and where those corridors are located.
The examples use benchmark Census travel-to-work flows between Local Authority Districts (LADs) from debiasRdata, together with LAD centroid coordinates. The same structure can be used for observed mobile-phone-derived flows or adjusted flows returned by debiasR.
Getting ready
The README shows how to install debiasR and the companion debiasRdata package from GitHub. This vignette also uses dplyr for data preparation and the experimental flow-map branch of mapgl for interactive visualisation.
If that mapgl branch is not installed, the preparation code below still shows the data structure the map needs. The installation note at the end of this vignette explains how to get it.
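Because the flow-map layer lives on an experimental branch, having mapgl installed does not guarantee that add_flowmap() is present. The sketch below is one way to test for it before running the map chunk. The helper name has_function() is hypothetical and not part of debiasR or mapgl; it uses only base R.

```r
# Hypothetical helper: TRUE only when `pkg` is installed AND its namespace
# contains an object called `fn`.
has_function <- function(pkg, fn) {
  requireNamespace(pkg, quietly = TRUE) &&
    exists(fn, envir = asNamespace(pkg), inherits = FALSE)
}

# Check for the experimental flow-map layer before trying to draw the map.
flowmap_available <- has_function("mapgl", "add_flowmap")
if (!flowmap_available) {
  message("mapgl::add_flowmap() not found; skipping the interactive map.")
}
```

The resulting flag could then be used to guard the map code, for example by wrapping it in if (flowmap_available) { ... }.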
We start with the LAD benchmark OD-flow table and the LAD centroid table. The OD-flow table stores one row per origin-destination pair. The centroid table stores one coordinate pair per LAD.
benchmark_flows <- debiasRdata::census_lad_OD_travel2work
centroids <- debiasRdata::lad_centroids
Identify the largest flows
Flow maps can become difficult to read if every OD pair is plotted at once. A practical first step is to inspect the largest flows and then pass a bounded set of between-area flows to the interactive map.
largest_flows <- benchmark_flows |>
  dplyr::filter(origin != destination) |>
  dplyr::slice_max(order_by = flow, n = 40, with_ties = FALSE)
largest_flows |>
  dplyr::slice_head(n = 10)
origin destination flow
1 E08000025 E08000029 23181
2 E08000028 E08000025 22798
3 E06000025 E06000023 22358
4 E06000011 E06000010 21836
5 E06000010 E06000011 18706
6 E06000023 E06000025 18510
7 E08000032 E08000035 17226
8 E08000018 E08000019 16970
9 E08000029 E08000025 16859
10 E08000014 E08000012 16145
The table above shows the ten largest between-area flows. Within-area flows, where the origin and destination are the same LAD, are excluded because they do not describe a corridor between two areas.
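A ranked table does not show how much of the total the top corridors account for. The sketch below computes the cumulative share of between-area flow carried by the top k pairs. It uses a made-up toy table with the same origin, destination and flow columns, and base R only; the values are illustrative and not taken from debiasRdata.

```r
# Toy OD table with the same columns as the benchmark table (values are made up).
toy_flows <- data.frame(
  origin      = c("A", "A", "B", "B", "C", "C", "A"),
  destination = c("B", "C", "A", "C", "A", "B", "A"),
  flow        = c(500, 100, 300, 50, 40, 10, 900),
  stringsAsFactors = FALSE
)

# Keep between-area flows only and sort from largest to smallest.
between <- toy_flows[toy_flows$origin != toy_flows$destination, ]
between <- between[order(between$flow, decreasing = TRUE), ]

# Cumulative share of total between-area flow carried by the top rows.
between$cum_share <- cumsum(between$flow) / sum(between$flow)

# Share carried by the two largest corridors: (500 + 300) / 1000 = 0.8 here.
top_k <- 2
share_top_k <- between$cum_share[top_k]
```

Applied to the real table, the same calculation would indicate whether plotting only the largest flows still covers most of the between-area total.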
Prepare map inputs
mapgl::add_flowmap() expects two plain data frames:
- locations, with one row per area and columns identifying the area, longitude and latitude.
- flows, with one row per OD pair and columns identifying the origin, destination and flow count.
We create those inputs from the LAD centroid and Census OD-flow tables.
locations <- data.frame(
  id = centroids$area,
  name = centroids$name,
  lon = centroids$longitude,
  lat = centroids$latitude,
  stringsAsFactors = FALSE
)
flows <- benchmark_flows |>
  dplyr::filter(origin != destination, flow > 0) |>
  dplyr::slice_max(order_by = flow, n = 10000, with_ties = FALSE) |>
  dplyr::left_join(
    centroids |>
      dplyr::select(origin = area, origin_name = name),
    by = "origin"
  ) |>
  dplyr::left_join(
    centroids |>
      dplyr::select(destination = area, dest_name = name),
    by = "destination"
  ) |>
  dplyr::transmute(
    origin = origin,
    dest = destination,
    origin_name = origin_name,
    dest_name = dest_name,
    count = flow
  )
head(locations)
id name lon lat
1 E06000001 Hartlepool -1.27018 54.67614
2 E06000002 Middlesbrough -1.21099 54.54467
3 E06000003 Redcar and Cleveland -1.00608 54.56752
4 E06000004 Stockton-on-Tees -1.30664 54.55691
5 E06000005 Darlington -1.56835 54.53534
6 E06000006 Halton -2.68853 53.33424
head(flows)
     origin      dest                 origin_name                   dest_name count
1 E08000025 E08000029                  Birmingham                    Solihull 23181
2 E08000028 E08000025                    Sandwell                  Birmingham 22798
3 E06000025 E06000023       South Gloucestershire            Bristol, City of 22358
4 E06000011 E06000010    East Riding of Yorkshire Kingston upon Hull, City of 21836
5 E06000010 E06000011 Kingston upon Hull, City of    East Riding of Yorkshire 18706
6 E06000023 E06000025            Bristol, City of       South Gloucestershire 18510
The locations data frame links LAD codes to coordinates. The flows data frame stores the OD pairs to display. Here we retain only the 10,000 largest between-area Census flows so that the browser widget stays responsive.
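Since the two data frames are linked only through area codes, it can help to confirm that every origin and destination code in flows has a matching id in locations before drawing the map. The sketch below shows one such check on made-up toy data in base R. The toy object names are illustrative, and how add_flowmap() itself handles unmatched codes is not assumed here.

```r
# Toy inputs mirroring the column layout of `locations` and `flows`
# (codes and coordinates are made up).
toy_locations <- data.frame(
  id   = c("A", "B", "C"),
  name = c("Area A", "Area B", "Area C"),
  lon  = c(-1.5, -2.0, -0.1),
  lat  = c(53.8, 53.5, 51.5),
  stringsAsFactors = FALSE
)
toy_od <- data.frame(
  origin = c("A", "B", "D"),
  dest   = c("B", "C", "A"),
  count  = c(10, 5, 3),
  stringsAsFactors = FALSE
)

# Codes that appear in the flows but have no coordinates.
codes_in_flows <- unique(c(toy_od$origin, toy_od$dest))
missing_codes <- setdiff(codes_in_flows, toy_locations$id)

# Keep only flows whose origin AND destination can be placed on the map.
has_coords <- toy_od$origin %in% toy_locations$id &
  toy_od$dest %in% toy_locations$id
mappable_od <- toy_od[has_coords, ]
```

In this toy example the code "D" has no centroid, so the flow from D to A is dropped and two mappable flows remain.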
Interactive flow map
The map below uses mapgl::add_flowmap() to display LAD travel-to-work flows. The flow layer is interactive: you can zoom, pan and inspect flow corridors in the rendered HTML page.
mapgl::maplibre(
  style = mapgl::carto_style("dark-matter"),
  center = c(-2.0, 54.0),
  zoom = 5
) |>
  mapgl::add_flowmap(
    id = "census-lad-flows",
    locations = locations,
    flows = flows,
    flow_color_scheme = "Teal",
    flow_dark_mode = TRUE,
    flow_lines_rendering_mode = "curved",
    flow_line_thickness_scale = 1.1,
    flow_clustering_enabled = TRUE,
    flow_max_top_flows_display_num = 10000,
    tooltip = list(
      location = "{name}",
      flow = "{origin.name} -> {dest.name}<br>{count}"
    )
  )
The map highlights major benchmark travel-to-work corridors. This kind of visualisation is useful for checking whether a small number of corridors dominate the OD system, identifying flows that may deserve closer inspection, and comparing raw, adjusted or benchmark flow patterns.
Visualise other flow columns
The same workflow can be used for observed mobile-phone-derived flows or adjusted flows. The only requirement is that the table has an origin column, a destination column and a numeric flow column. For example, after using an adjustment method from debiasR, replace benchmark_flows with the adjusted table and select flow_adj as the plotted value.
adjusted_flows <- debiasR::adjust_inverse_penetration(
  mpd_od_df = debiasRdata::lad_OD_travel2work,
  coverage_df = debiasRdata::coverage_lad,
  weight_by = "origin"
)
largest_adjusted_flows <- adjusted_flows |>
  dplyr::filter(origin != destination) |>
  dplyr::slice_max(order_by = flow_adj, n = 40, with_ties = FALSE)
In a validation workflow, visualisation is most useful when it is paired with formal metrics. A map can show where large flows are located, but it does not quantify whether adjusted flows are closer to a benchmark. For that, use the validation methods in the validation vignette.
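Because the preparation steps are the same whichever flow column is plotted, they can be wrapped in a small function that takes the column name as an argument. The sketch below is a minimal base R version under that assumption. prepare_flow_input() is a hypothetical helper, not a function from debiasR or mapgl, and the toy table is made up.

```r
# Hypothetical helper: turn any OD table into the origin/dest/count layout
# used for the `flows` input above, keeping the n largest between-area flows.
prepare_flow_input <- function(od, value_col, n = 10000) {
  stopifnot(
    all(c("origin", "destination", value_col) %in% names(od)),
    is.numeric(od[[value_col]])
  )
  value <- od[[value_col]]
  # Drop within-area, missing and non-positive flows.
  keep <- od$origin != od$destination & !is.na(value) & value > 0
  od <- od[keep, , drop = FALSE]
  # Sort from largest to smallest and keep the first n rows.
  od <- od[order(od[[value_col]], decreasing = TRUE), , drop = FALSE]
  od <- utils::head(od, n)
  data.frame(
    origin = od$origin,
    dest   = od$destination,
    count  = od[[value_col]],
    stringsAsFactors = FALSE
  )
}

# Toy adjusted table (made-up values) with a flow_adj column.
toy_adjusted <- data.frame(
  origin      = c("A", "A", "B", "C"),
  destination = c("B", "A", "A", "B"),
  flow_adj    = c(120.5, 900, 80, 0),
  stringsAsFactors = FALSE
)
toy_input <- prepare_flow_input(toy_adjusted, "flow_adj", n = 2)
```

With the real data, the same call pattern would apply to adjusted_flows with value_col = "flow_adj" or to benchmark_flows with value_col = "flow".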
Installation note
The flow-map functionality used above is currently available from an experimental mapgl branch. You can install it with:
# install.packages("pak")
pak::pak("e-kotov/mapgl@flowmap")
Other packages can also be used for flow visualisation. flowmapper is a possible route for static flow maps, while flowmapblue provides interactive flow maps but is less current than the emerging mapgl route.