Mobile-phone-derived location records can provide timely evidence on where people are located and how they move between places. However, these records do not represent all people and places equally. Some groups, areas, devices, operators and time periods may be more visible in the data than others because people differ in phone ownership and use, operators differ in market share and network coverage, and some observation windows capture some movements better than others.
The challenge
A population count based on active devices does not always represent the true population count. Similarly, origin-destination (OD) flows derived from mobile phone data do not necessarily represent the true number of people moving from an origin location to a destination location. Mobile phone records may underrepresent populations and flows in some places and overrepresent them in others.
A common way to assess whether mobile phone data offer a good representation of the underlying population is to check whether observed device counts are approximately proportional to benchmark population counts across areas. This assessment can be useful as a quick “sanity check”, but it is not enough to establish whether the data are representative. A dataset can scale well with the benchmark population overall while still being systematically too high in some places and too low in others. Looking at residuals, the differences between the observed device counts and the counts expected from the overall relationship, helps reveal these local departures from the overall pattern. If the residuals are related to local population characteristics, then the errors are not random. Instead, they suggest that some types of places or populations are more visible in the data than others.
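The difference between overall scaling and local residuals can be illustrated with a minimal sketch. It uses plain Python and made-up numbers, and it does not use or represent debiasR functions. The six areas, the `share_65_plus` characteristic and the ratio-based scaling factor are all assumptions chosen for illustration; the toy device counts were deliberately constructed so that coverage falls as the share of older residents rises.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical toy data for six areas (not real observations).
population = [1000, 2000, 3000, 4000, 5000, 6000]      # benchmark counts
devices = [550, 900, 1575, 1700, 2700, 2640]           # observed active devices
share_65_plus = [0.10, 0.30, 0.15, 0.35, 0.12, 0.32]   # assumed local characteristic

# Step 1: the quick "sanity check". Do device counts scale with population?
r_scale = pearson(population, devices)

# Step 2: one overall scaling factor (devices per person across all areas).
k = sum(devices) / sum(population)
expected = [k * p for p in population]

# Step 3: relative residuals. Positive means the area is overrepresented.
rel_resid = [(d - e) / e for d, e in zip(devices, expected)]

# Step 4: are the residuals related to the local characteristic?
r_resid = pearson(share_65_plus, rel_resid)

print(f"correlation(devices, population) = {r_scale:.2f}")
print(f"correlation(residual, share 65+) = {r_resid:.2f}")
for area, resid in enumerate(rel_resid, start=1):
    label = "over" if resid > 0 else "under"
    print(f"area {area}: {resid:+.1%} ({label}represented)")
```

In this constructed example the device counts track population closely overall, yet the residuals line up almost perfectly with the share of older residents. That is the pattern the proportionality check alone would miss: the errors are structured rather than random.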
Consequences
If these biases are ignored, downstream estimates can mislead decisions:
- displacement analysis can miss groups with low digital coverage
- transport models can overstate movement from overrepresented areas
- estimates of exposure to hazards, disease outbreaks or service disruptions can reflect uneven data coverage rather than true population exposure
- data from one mobile operator, app or platform can be mistaken for evidence about the whole population
- policy comparisons between places can reflect differences in data coverage rather than real differences in movement
The problem, therefore, is not that mobile-phone-derived data are wrong, but that the biases in their measurement process must be acknowledged, assessed and adjusted where possible.
What debiasR adds
debiasR supports a practical workflow:
- measure coverage and representativeness bias in population counts and flows
- model how bias varies across places
- adjust OD flows
- validate adjusted flows against benchmark OD data
Bias assessment and adjustment do not remove the need for caution in drawing conclusions from the data, but they make the assumptions explicit enough to test and discuss, so that results can be reported with greater confidence.