pseudohouseholds: Generating Representative Points on Roads in Regions for Travel Analyses

The algorithm for creating pseudohouseholds

Let’s walk through the algorithm by giving a demonstration using test data included with the package. We’ll consider a single dissemination block (DB) in Ottawa, Ontario, with unique identifier (DBUID) “35061699003”. DBs are the smallest region at which Statistics Canada makes population data available, so they’re a good starting point for a fine-grained analysis.

Here we load our packages, isolate our DB, and print it to the console:

library(sf)
#> Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 8.2.1; sf_use_s2() is TRUE
library(ggplot2)
library(ggspatial)
library(pseudohouseholds)

db <- dplyr::filter(pseudohouseholds::ottawa_db_shp, DBUID == "35061699003")

print(db)
#> Simple feature collection with 1 feature and 2 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 369747.8 ymin: 5027016 xmax: 370185.7 ymax: 5027330
#> Projected CRS: NAD83 / MTM zone 9
#> # A tibble: 1 × 3
#>   DBUID       dbpop2021                                                 geometry
#> * <chr>           <dbl>                                       <MULTIPOLYGON [m]>
#> 1 35061699003       125 (((370113.7 5027251, 370075.7 5027241, 370026.3 5027236…

Note that its coordinate reference system (CRS) is projected. Specifically, it’s using NAD83/MTM zone 9 (32189) which gives units in meters. It’s important to use a projected CRS with this package.

Let’s also plot it to get an idea of its scale:

This DB is a few hundred meters across, borders Bank St. on its west, and extends to the east into an irregularly shaped residential area. From the printout above, we see that it had a population of 125 in 2021.

If the region is unpopulated, optionally return a default point.

By default, if a region is unpopulated it receives one single point with population 0 of phh_type type 1. This point will be the centroid if it is inside the region, otherwise it will be an approximation of the region’s “pole of inaccessibility.” Details are provided below. ### Find road segments intersecting or near the region

First, we identify road segments that either intersect with the region or come within a specified buffer. The default buffer is 5 meters, which we adopt here, although larger or smaller buffers may be appropriate in different settings.

Plotting our DB and the road segments that intersect it:

Create initial points by sampling along the road segments

Next we sample points along these road segments. The sampling frequency is given by the parameter phh_density, which is passed along directly to the function sf::st_line_sample() and is measured in units of “PHHs per unit.” The default value is 0.005 which, in meters, translates to one PHH every 200 meters. Note that this is for our initial sampling, and that the final PHHs may be closer or farther than this.

Optionally, users can set a desired minimum number of regional PHHs using the parameter min_phhs_per_region, which can be helpful with smaller regions (e.g. small city blocks).

Create candidate PHHs beside the road network

We want our PHHs to be inside our DB, so our next step is to perturb these points by the distance delta_distance to try to create candidates within the region. To do so, we:

Find the DB’s centroid;
Create a “pull” set of candidates that are moved towards the centroid by delta_distance; and,
Create a “push” set of candidates that are moved away from the centroid by delta_distance.

As can be seen, most of the blue “pull” candidates are within the DB and most of the “push” candidates are outside; however, because of the DB’s odd geometry, one “pull” candidate is outside, and one “push” is inside. Creating two sets of candidates increases the algorithm’s processing overhead, but it helps to ensure that our PHHs are distributed the way we want.

Keep only candidates within the region

Next, we keep only those candidate PHHs that are inside the region using a spatial filter.

(If necessary) If no valid points are returned, sample radially around our on-street points

The push/pull algorithm may not return valid PHHs in some cases, for example if the region is concave and has only small intersections with any road. In this case we use a backup algorithm that samples 16 candidate points distributed radially with radius delta_distance around our on-street point and selects the first candidate point within the region.

(Optionally) Ensure our PHHs have a minimum population

If we are assigning populations to our PHHs, we next check to ensure each PHH will have at least some minimum population set by the parameter min_phh_pop. This is helpful for “thinning out” PHHs in large sparsely-populated rural areas, where the default line sampling density would lead to a large number of PHHs with populations less than 1.

This can be disabled by setting min_phh_pop to 0.

(Optionally) Remove candidates that are too close to others

Optionally, we can remove any PHHs that are within min_phh_distance units of another. This is helpful to reduce artificial over-crowding caused by specific road features: for example, if many short road segments converge, each could receive a candidate PHH from sf::st_line_sample().

Distribute regional population among PHHs

We now have a set of points that are near a road, within the region, and not too close together. The final step is to assign each PHH an equal share of the region’s population.

pseudohouseholds: Generating Representative Points on Roads in Regions for Travel Analyses

Summary

What are pseudohouseholds?

The pseudohouseholds package

The algorithm for creating pseudohouseholds

If the region is unpopulated, optionally return a default point.

Create initial points by sampling along the road segments

Create candidate PHHs beside the road network

Keep only candidates within the region

(If necessary) If no valid points are returned, sample radially around our on-street points

(Optionally) Ensure our PHHs have a minimum population

(Optionally) Remove candidates that are too close to others

Distribute regional population among PHHs

Considerations

The final number of PHHs may be more or less than you expect.

Any input roads may receive PHHs, so ensure only habitable roads are used as inputs.

PHHs do not correspond directly to dwellings, and should only be used as a proxy to “smooth out” populations for analysis.

Details

The variable `phh_type`

The default point

pseudohouseholds: Generating Representative Points on Roads in Regions for Travel Analyses

Summary

What are pseudohouseholds?

The pseudohouseholds package

The algorithm for creating pseudohouseholds

If the region is unpopulated, optionally return a default point.

Create initial points by sampling along the road segments

Create candidate PHHs beside the road network

Keep only candidates within the region

(If necessary) If no valid points are returned, sample radially around our on-street points

(Optionally) Ensure our PHHs have a minimum population

(Optionally) Remove candidates that are too close to others

Distribute regional population among PHHs

Considerations

The final number of PHHs may be more or less than you expect.

Any input roads may receive PHHs, so ensure only habitable roads are used as inputs.

PHHs do not correspond directly to dwellings, and should only be used as a proxy to “smooth out” populations for analysis.

Details

The variable phh_type

The default point

The variable `phh_type`