Using OSM Building Data

In areas where good OpenStreetMap building surveys are available, this data makes a good proxy for a population density when no census is available. The osmdata package lets a user query the OSM database for particular features. This vignette illustrates how to get OSM building data and use it for a sampling design.

Input Polygon

First we need to define a polygon for our study region. Here we construct it from a WKT string, but typically this might be read in from a data source such as a geopackage or a shapefile, or even drawn on a map using R’s interactove locator function.

library(sidclosepairs)
library(sf)
#> Linking to GEOS 3.12.1, GDAL 3.8.4, PROJ 9.4.0; sf_use_s2() is TRUE
study_area = "POLYGON (
 (-2.794796 54.02322, -2.801250 54.02686, -2.802311 54.02867, -2.802351 54.02891, -2.802334 54.03084,
  -2.802218 54.03190, -2.801649 54.03199, -2.801142 54.03202, -2.784621 54.03237, -2.780087 54.03196,
  -2.778327 54.03087, -2.777984 54.03048, -2.775405 54.02391, -2.775982 54.02327, -2.794796 54.02322))"

## remove newline markers
study_area = gsub("\n","", study_area)

# convert WKT to spatial polygon data frame
study_area = st_as_sf(data.frame(wkt=study_area), wkt="wkt", crs=4326)

Get the Buildings

Now with our study area defined, we construct a query using osmdata package functions and get back the points in the bounding box of our study area. Taking the centroid of the buildings makes sure we reduce polygons down to points. Finally we take only the points that are within the polygon itself, not just the bounding box. We’ll wrap these calls into a function for re-use and call it to get buildings in our study area.

library(osmdata)
#> Data (c) OpenStreetMap contributors, ODbL 1.0. https://www.openstreetmap.org/copyright
## or use another overpass server
set_overpass_url("https://overpass-api.de/api/interpreter")
get_buildings <- function(area){
    osmd <- opq(bbox = st_bbox(area))
    osmd <- add_osm_feature(osmd, key = "building")
    osmd <- osmdata_sf(osmd)
    centroids_in_area <- st_intersection(st_centroid(osmd$osm_polygons), area)
    return(centroids_in_area)
}
buildings = get_buildings(study_area)
#> Warning: st_centroid assumes attributes are constant over geometries
#> Warning: attribute variables are assumed to be spatially constant throughout
#> all geometries

If you want to do this for several study areas you should consider wrapping the osmdata calls into a single function for convenience.

Inhibitory Close Pair Sampling

Now we’ll sample fifty points in total, with ten sets of close pairs, keeping our inhibitory points 100 metres apart, and our close pairs in a 20 metre radius.

First we have to convert our data to a local Cartesian coordinate system, so we use EPSG code 27700, for the Ordnance Survey GB system which is appropriate for this region in Lancaster.

crs = 27700
study_gb = st_transform(study_area, crs)
buildings = st_transform(buildings, crs)

Now we can sample our 50 points with 10 close pairs using the “census” proposal method for the inhibitory points, and a 20 metre radius for the close pairs.

pts_buildings = icpSample(50, 10, 100, 20, censusxy=buildings, poly=study_gb, proposal="census")
pts_buildings
#> Simple feature collection with 50 features and 0 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: 347574.9 ymin: 458892.7 xmax: 349078.5 ymax: 459832.3
#> Projected CRS: OSGB36 / British National Grid
#> First 10 features:
#>                     geometry
#> 1  POINT (348653.1 458927.4)
#> 2    POINT (348129 459428.4)
#> 3  POINT (348082.6 459700.9)
#> 4  POINT (348169.2 459752.9)
#> 5  POINT (347897.2 459020.8)
#> 6  POINT (348374.2 459217.5)
#> 7  POINT (348655.3 459616.5)
#> 8  POINT (348542.3 459182.1)
#> 9  POINT (347819.7 459264.6)
#> 10 POINT (347574.9 459634.9)

Plotting

To show these points we can first plot the building locations in grey, then overlay the first 40 points in red to show the inhibitory points, and then add the close pairs in blue.

plot(study_gb)
plot(st_geometry(buildings), add=TRUE, col="grey", cex=.5)
plot(st_geometry(pts_buildings[1:40,]), add=TRUE, col="red")
plot(st_geometry(pts_buildings[41:50,]), add=TRUE, col="blue")

Note that the blue close pair points are not guaranteed to be on a building, but some other method will need to be used to select the second building for a survey related to that point. Often the nearest building to the point will be sampled, or the nearest in the direction indicated by the two points.