The sidclosepairs
package is designed to produce
sampling designs for surveys. The emphasis is on getting a set of
samples that are spatially spread across the survey area - a spatially
inhibitory design - with a user-defined number of close pair samples to
look at small-scale structure in the data.
The package supports four different methods for generating sampling
designs. Each work on an initial sequential rejection sampling process
where n
points are generated from a spatial distribution,
and each point is accepted if it is greater than the given threshold
distance from the previous accepted points.
The methods for the initial sample points are:
n
points uniformly split across N
polygons, maintaining the inhibitory distance across all polygons, so
that a point in polygon 1 will be greater than the distance parameter
from points in all polygons.When the initial inhibitory sample points have been generated a random subset are selected and each gets a close pair point, chose uniformly randomly within a given radius. Note that the close pair points have no bearing on the non-uniformity of the initial sample point selection, and are not subject themselves to the inhibitory process. This means that it is possible for two points in the output to be closer than the inhibitory distance if close-pair points are generated from their parent points in some circumstances.
This package uses sf
class data for points and polygons,
and terra
rasters for gridded values. The return value from
the main function is also an sf
class data frame of points.
The code returns points in the same coordinate system as the input
polygon.
Distances are computed using the coordinate system of the data, so unprojected (lat-long) coordinates will typically use degrees as a distance measure, even though this does not produce true circles (except on the equator). The code will warn you about this. Since sampling methodology is rarely implemented on a global scale, you should probably transform lat-long coordinates to a local cartesian coordinate system (such as the appropriate UTM zone) before use.
First we’ll create a polygon to represent the study area. This would normally come from a spatial data source such as a GeoPackage or Shapefile, or be created by drawing a simple boundary on screen.
library(sf)
study_area = st_sfc(st_polygon(list(
cbind(c(-2.79479557075698, -2.80125014547061, -2.80231050277473,
-2.80235124747829, -2.80233386828029, -2.80221829171468, -2.80164935839459,
-2.80114175901799, -2.78462082038709, -2.78008674567188, -2.77832720481475,
-2.7779840993325, -2.77540503667928, -2.77598222899305, -2.79479557075698),
c(54.023222117637, 54.0268647946785, 54.0286678871387, 54.0289066469941,
54.0308376726309, 54.0318977236945, 54.0319945415705, 54.0320201272935,
54.032365192172, 54.0319603024302, 54.0308731143842, 54.0304837495073,
54.0239123648346, 54.0232677630067, 54.023222117637)))),
crs=4326
)
plot(study_area, axes=TRUE, las=2)
This is in the common WGS84 lat-long coordinate system used as
default in most global mapping and GPS applications, so we convert it to
a local coordinate system for further work. In this case we use the
EPSG:27700
system as this is the standard reference for
data in Great Britain.
The simplest sampling scheme is where the initial inhibitory points
are sampled uniformly from the polygon and rejected if too close to
previously sampled points. To do this we call icpSample
with the default proposal
argument (ie leave it out). In
this example we want to generate 50 (n
) survey points in
total, with 10 (k
) close pairs. The initial 40 points must
be at least 100 metres from each other (delta
), and the
close pairs are somewhere within a 20 metre radius of their inhibitory
point (zeta
).
From this plot you should be able to pick out the ten close pairs.
This method generates the initial inhibitory process points from a non-uniform density defined by a raster grid over the polygon. This grid must have the same coordinate system as the polygon.
The close pairs are generated in a uniform disc about each inhibitory point, with no consideration of the density grid. However they will be generated inside the polygon by rejection sampling.
library(terra)
#> terra 1.7.83
box = st_bbox(study_xy)
density = rast(matrix(1:(100*100),100,100), extent=box[c(1,3,2,4)],crs="epsg:27700")
density = mask(density, vect(study_xy))
density
#> class : SpatRaster
#> dimensions : 100, 100, 1 (nrow, ncol, nlyr)
#> resolution : 17.59, 10.18429 (x, y)
#> extent : 347539.3, 349298.3, 458864.2, 459882.6 (xmin, xmax, ymin, ymax)
#> coord. ref. : OSGB36 / British National Grid (EPSG:27700)
#> source(s) : memory
#> name : lyr.1
#> min value : 4
#> max value : 9997
For this method you first need a set of points that would typically be a complete census of locations in the area under investigation. The process then draws a point from this set sequentially, rejecting the point if it is nearer than the threshold to any previously selected points. Then the close pairs are added, independent of the census points.
For an example, we’ll generate a cluster of a number of points inside the bounding box of the study area polygon, and then clip it to the polygon so we only get points inside our study area.
box = st_bbox(study_xy)
ncensus = 800
xy = data.frame(rnorm(ncensus, (box["xmax"]+box["xmin"])/2, (box["xmax"]-box["xmin"])/6),
rnorm(ncensus, (box["ymax"]+box["ymin"])/2, (box["ymax"]-box["ymin"])/6))
pxy = st_as_sf(xy, coords=1:2, crs=27700)
pxy = st_intersection(pxy, study_xy)
plot(study_xy)
plot(pxy, add=TRUE)
To do inhibitory sampling of 50 points with 20 close pairs we specify
the censusxy
parameter and choose the “census” proposal
method.
pts_census = icpSample(50, 20, 100, 10, censusxy=pxy, poly=study_xy, proposal="census")
plot(study_xy)
plot(pts_census, add=TRUE)
Note that the close pair points are not generated with regards to the census locations, but are uniformly sampled from a disc centred on the selected inhibitory points. Again, rejection sampling is used to generate a point guaranteed to be inside the polygon.
Also notice that even though the census locations are highly non-uniform, the output sampling points look evenly spread because of the inhibitory nature of the initial point selection.
To illustrate this process, first we create a spatial data frame with three fairly simple polygons. Normally these would be read from a spatial data source but for simplicity here they will be created from the WKT text format with a bit of processing.
wkt = "MULTIPOLYGON (
((348868.0 459145.5, 349149.6 459250.2, 349203.7 458842.1, 348918.5 458936.0, 348868.0 459145.5)),
((348286.7 459290.0, 348405.8 459318.8, 348586.3 459300.8, 348795.8 459210.5, 348723.5 458824.1,
348290.3 458853.0, 348286.7 459290.0)),
((347712.6 459477.8, 347792.0 459730.6, 348044.8 459932.8, 348351.7 459893.1, 348261.4 459730.6,
348243.3 459571.6, 348283.1 459376.6, 347712.6 459477.8))
)"
wkt = gsub("\n","",wkt) # remove line breaks
three = st_cast(st_as_sf(data.frame(geometry=wkt),wkt="geometry", crs=27700),"POLYGON")
Now we want to sample 30 total points, with 6 sets of close pairs. This means 24 initial inhibitory points, to be split across the 3 polygons, and then the 6 close pair points are randomly assigned.
If the requested number of inhibitory points does not divide by the number of polygons, then a warning is shown alerting the user to the uneven split of points.
You can specify zero close pairs if you only want to get a set of
inhibitory points out. You can then test that the minimum point distance
is over the threshold specified by delta
.
If you have a set of points and want to add a number of close pairs,
then use the add_close_pairs
function. This returns the
input points with a number (k
) of additional close pair
points randomly sampled in a disc of radius zeta
within the
given polygon. In this example three new points are added to the
inhibitory points created in the previous section. The close pair points
will always be the last ones in the returned data, so it is possible to
subset those out for example to colour them differently.
Sequential sampling of inhibitory points can result in very long, or even infinite, times to complete. Some requested sampling schemes may even be impossible, for example requesting 100 points in a 100m square that are 30m apart - its just impossible to squeeze 100 points in with that minimum separation. The code does not check for possibility or have any timeout for long-running possible simulations.
If you want to run this code in an automated manner and make sure it
completes, then you might want to wrap the call in
R.utils::withTimout
. Otherwise you need to be there to hit
Control-C or ESC or click the button that interrupts R once you think it
might have got stuck.
As mentioned previously, the code works using the coordinate system of the polygon given. Hence all distances and thresholds must be given in the units of the coordinate system. For unprojected lat-long coordinates, this means decimal degrees. A warning is given if this is used, since it results in distortion of ground distances due to the ellipsoid of the earth.
The study_area
object created earlier, and here is an
attempt to create points from that polygon object. Note how small the
distance values are, since they now have to be in decimal degrees, and
not metres.
pts_ll = icpSample(n=10, k=1, delta=0.001, zeta=0.00025, poly=study_area)
#> Warning in icpSample(n = 10, k = 1, delta = 0.001, zeta = 0.00025, poly =
#> study_area): Polygon is in lat-long coordinates so distances must be in
#> degrees. Consider transforming to a local projected coordinate system
pts_ll
#> Simple feature collection with 10 features and 0 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: -2.796445 ymin: 54.02484 xmax: -2.778025 ymax: 54.03109
#> Geodetic CRS: WGS 84
#> geometry
#> 1 POINT (-2.778025 54.02838)
#> 2 POINT (-2.794239 54.02745)
#> 3 POINT (-2.796445 54.03018)
#> 4 POINT (-2.794823 54.0289)
#> 5 POINT (-2.791562 54.03109)
#> 6 POINT (-2.784327 54.02904)
#> 7 POINT (-2.796178 54.02566)
#> 8 POINT (-2.790323 54.02484)
#> 9 POINT (-2.790983 54.02992)
#> 10 POINT (-2.784498 54.0292)
Most of this code was originally written by Melodie Sammarro as part of her PhD supervised by Chris Jewell and Barry Rowlingson. The code was then cleaned up and made more consistent by Barry Rowlingson.