notes

> library(stipple)
  Linking to GEOS 3.12.1, GDAL 3.8.4, PROJ 9.4.0; sf_use_s2() is TRUE

Model Fitting

Stipple functions are of the form:

fit = stipple(formula, data, space, time, ...)

The formula specifies the start (and optionally end) times of infections, the spatial location label, the linear and offset terms, and the distance-decay function.

The data needs a column for location, a column for infection start times (and, where cases recover, a column for end times), plus any covariates used in the linear and offset terms.

The space needs a column that uniquely identifies each location and matches the spatial location named in the formula (and present in the data). This will usually be a name or other unique identifier.

The time specifies the time points.

The ... args are passed through.

The space parameter

For a discrete set of spatial units, use an sf spatial object with point or polygon geometry.

> space1
  Simple feature collection with 10 features and 1 field
  Geometry type: POINT
  Dimension:     XY
  Bounding box:  xmin: 0.1520295 ymin: 0.02018126 xmax: 0.7944928 ymax: 0.9819677
  CRS:           NA
     Place                     geometry
  1      A  POINT (0.6514732 0.6705838)
  2      B  POINT (0.2056386 0.2096754)
  3      C  POINT (0.7944928 0.7286262)
  4      D    POINT (0.7793199 0.24737)
  5      E  POINT (0.7888169 0.5720457)
  6      F  POINT (0.1520295 0.9819677)
  7      G  POINT (0.4108782 0.4970021)
  8      H  POINT (0.7085115 0.9710731)
  9      I  POINT (0.1716835 0.5318939)
  10     J POINT (0.5241297 0.02018126)
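
As a rough sketch of how an object like space1 could be put together, the following builds a ten-point sf object from a plain data frame using sf::st_as_sf. The coordinates, the seed, and the name space_demo are made up for illustration and are not the values shown above.

```r
library(sf)

# Hypothetical coordinates; any unique labels and XY pairs would do.
set.seed(42)
pts <- data.frame(
  Place = LETTERS[1:10],
  x = runif(10),
  y = runif(10)
)

# Convert to an sf object with POINT geometry; no CRS is set, as in space1.
space_demo <- st_as_sf(pts, coords = c("x", "y"))

# The identifier column must be unique across spatial units.
stopifnot(anyDuplicated(space_demo$Place) == 0)
print(space_demo)
```

The uniqueness check mirrors the requirement that the space object has one row per location label.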

Spatial units may have extra covariate information in columns:

> space2
  Simple feature collection with 10 features and 2 fields
  Geometry type: POINT
  Dimension:     XY
  Bounding box:  xmin: 0.1520295 ymin: 0.02018126 xmax: 0.7944928 ymax: 0.9819677
  CRS:           NA
     Place   Pop                     geometry
  1      A 12787  POINT (0.6514732 0.6705838)
  2      B 17027  POINT (0.2056386 0.2096754)
  3      C 19843  POINT (0.7944928 0.7286262)
  4      D 17933    POINT (0.7793199 0.24737)
  5      E 14740  POINT (0.7888169 0.5720457)
  6      F 11542  POINT (0.1520295 0.9819677)
  7      G 11508  POINT (0.4108782 0.4970021)
  8      H 19850  POINT (0.7085115 0.9710731)
  9      I 15594  POINT (0.1716835 0.5318939)
  10     J 13964 POINT (0.5241297 0.02018126)

Data for stipple models

At a minimum, the data needs an infection start time and a location identifier that matches the identifier column in the space data.

> data1
    T0 Place
  1 16     C
  2 23     D
  3 43     B
  4 63     A
  5 67     E
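
Before fitting, it can help to confirm that every location label in the data also appears in the space object. The sketch below uses base R only; the character vector places is a stand-in for the Place column of the space object, and whether stipple performs a similar check itself is not assumed here.

```r
# Stand-in for the Place column of the space object (assumption).
places <- LETTERS[1:10]

# Minimal case data: start time and location label.
data1 <- data.frame(
  T0 = c(16, 23, 43, 63, 67),
  Place = c("C", "D", "B", "A", "E")
)

# Labels present in the data but missing from the space object.
unmatched <- setdiff(data1$Place, places)
if (length(unmatched) > 0) {
  stop("Unknown locations in data: ", paste(unmatched, collapse = ", "))
}
```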

If cases do not recover, so that each case stays infectious for the whole of the period, then no infection end time is needed. This is equivalent to specifying an end time of infinity.

> data2
    T0  T1 Place
  1 16 Inf     C
  2 23 Inf     D
  3 43 Inf     B
  4 63 Inf     A
  5 67 Inf     E
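
To make the equivalence concrete, a data frame like data2 can be derived from data1 by adding an end-time column filled with Inf. This is a base R sketch of the data layout only; it makes no claim about how stipple treats the two forms internally.

```r
data1 <- data.frame(
  T0 = c(16, 23, 43, 63, 67),
  Place = c("C", "D", "B", "A", "E")
)

# Add an explicit end time of infinity for every case.
data2 <- data.frame(T0 = data1$T0, T1 = Inf, Place = data1$Place)
print(data2)
```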

If the infections do result in recoveries then a second column specifying recovery times should be present.

> data3
    T0 T1 Place
  1 16 27     C
  2 23 28     D
  3 43 47     B
  4 63 71     A
  5 67 73     E

Multiple infections in the same spatial unit should appear as separate rows in the data, one row per infection.

> data4
    T0 T1 Place
  1 16 27     C
  2 23 28     D
  3 43 47     D
  4 63 71     A
  5 67 73     E
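
A quick way to see which spatial units have more than one infection is to tabulate the location column. The sketch below rebuilds data4 in base R and lists the repeated units; the variable names are illustrative.

```r
data4 <- data.frame(
  T0 = c(16, 23, 43, 63, 67),
  T1 = c(27, 28, 47, 71, 73),
  Place = c("C", "D", "D", "A", "E")
)

# Count infections per spatial unit and pick out units seen more than once.
infections_per_place <- table(data4$Place)
repeated <- names(infections_per_place)[infections_per_place > 1]
print(repeated)
```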

Multiple independent experiments or observation sets should have extra columns to indicate the grouping.

> data5
     T0 Place Grp
  1  16     D   1
  2  41     A   1
  3  41     C   1
  4  82     E   1
  5  99     B   1
  6  50     B   2
  7  63     D   2
  8  65     E   2
  9  78     A   2
  10 94     C   2
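
The grouping column can be inspected with base R's split, which separates the rows into one data frame per replicate. This only illustrates the data layout; how stipple handles groups during fitting is not shown here.

```r
data5 <- data.frame(
  T0 = c(16, 41, 41, 82, 99, 50, 63, 65, 78, 94),
  Place = c("D", "A", "C", "E", "B", "B", "D", "E", "A", "C"),
  Grp = rep(1:2, each = 5)
)

# One data frame per independent experiment.
by_group <- split(data5, data5$Grp)
rows_per_group <- vapply(by_group, nrow, integer(1))
print(rows_per_group)
```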

Infection cases may have explanatory variables as extra columns in the data.

> data6
     T0 Place Grp Age
  1  16     D   1  34
  2  41     A   1  51
  3  41     C   1  33
  4  82     E   1  31
  5  99     B   1  32
  6  50     B   2  28
  7  63     D   2  31
  8  65     E   2  44
  9  78     A   2  48
  10 94     C   2  24

The most general form of the data looks like this:

> data7
     T0  T1 Place Age Grp
  1  16  26     D  34   1
  2  41  50     A  51   1
  3  41  47     C  33   1
  4  82  85     E  31   1
  5  99 105     B  32   1
  6  50  55     B  28   2
  7  63  71     D  31   2
  8  65  68     E  44   2
  9  78  87     A  48   2
  10 94 103     C  24   2

Formula

Simplest

This says that cases occur at time T0 in location Place and, once infected, remain infected. The right-hand side contains only a constant term, so the model has no covariates:

T0@Place ~ 1

Most Complex

This says that the cases are infected at time T0, recover at time T1, and are in location Place. We have one case-based covariate, Age, and one place-based covariate, Pop. The data consists of multiple replications defined by Grp:

(T0 - T1)@Place + Age|Grp ~ Pop
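
Because the formula is ordinary R syntax, base R can show how the pieces nest. The sketch below walks the parse tree using standard operator precedence (`@` binds tightest, then `+`, then `|`, then `~`). It is not stipple's own parser, and the variable names are illustrative.

```r
f <- (T0 - T1)@Place + Age | Grp ~ Pop

lhs <- f[[2]]                # everything left of ~
rhs <- f[[3]]                # Pop

cases_part <- lhs[[2]]       # (T0 - T1)@Place + Age
group_part <- lhs[[3]]       # Grp
at_call    <- cases_part[[2]]  # (T0 - T1)@Place

# Print each component as text.
cat("times:   ", deparse(at_call[[2]]), "\n")
cat("location:", deparse(at_call[[3]]), "\n")
cat("case cov:", deparse(cases_part[[3]]), "\n")
cat("group:   ", deparse(group_part), "\n")
cat("rhs:     ", deparse(rhs), "\n")
```

Since `~` does not evaluate its arguments, none of T0, T1, Place, Age, Grp, or Pop needs to exist for this to run.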

More complex formulae can be constructed by adding further terms to either of the covariates or grouping parts:

(T0 - T1)@Place + Age + Salary | Grp + Type ~ Pop + Area + Gov

This specifies a model dependent on the age and salary of the case, and on the population, area, and government in the location. Replications are defined by unique combinations of Grp and Type.
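
To see what "unique combinations of Grp and Type" means for a given data set, base R's interaction can label each row with its combination. The values below are invented for illustration, and the Replicate column is a hypothetical name, not something stipple requires.

```r
# Invented grouping values for illustration.
d <- data.frame(
  Grp = c(1, 1, 1, 2, 2, 2),
  Type = c("a", "b", "a", "a", "a", "b")
)

# One factor level per observed (Grp, Type) combination.
d$Replicate <- interaction(d$Grp, d$Type, drop = TRUE)
n_replicates <- nlevels(d$Replicate)
print(table(d$Replicate))
```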

Formula testing

> FList = list(
+     T0@Place ~ 1,
+     T0@Place ~ Age,
+     T0@Place + Age ~ Pop,
+     T0@Place|Exp ~ 1,
+     T0@Place |Exp ~  Pop,
+     (T0-T1)@Place ~ 1,
+     (T0-T1)@Place + Age ~ 1,
+     (T0-T1)@Place + Age ~ Pop,
+     (T0-T1)@Place|Exp ~ 1,
+     (T0-T1)@Place + Age|Exp ~ Pop
+     )
> parsed = lapply(FList, function(f){
+     stipple:::parse_stipple_formula(f)
+ })
> do.call(rbind, parsed)
        f                                 st         tstart tend      location
   [1,] T0@Place ~ 1                      expression ?      numeric,0 ?       
   [2,] T0@Place ~ Age                    expression ?      numeric,0 ?       
   [3,] T0@Place + Age ~ Pop              expression ?      numeric,0 ?       
   [4,] T0@Place | Exp ~ 1                expression ?      numeric,0 ?       
   [5,] T0@Place | Exp ~ Pop              expression ?      numeric,0 ?       
   [6,] (T0 - T1)@Place ~ 1               expression ?      ?         ?       
   [7,] (T0 - T1)@Place + Age ~ 1         expression ?      ?         ?       
   [8,] (T0 - T1)@Place + Age ~ Pop       expression ?      ?         ?       
   [9,] (T0 - T1)@Place | Exp ~ 1         expression ?      ?         ?       
  [10,] (T0 - T1)@Place + Age | Exp ~ Pop expression ?      ?         ?       
        case_group case_covars place_group place_covars
   [1,] integer,0  ~1          integer,0   ~1          
   [2,] integer,0  ~1          integer,0   ~Age        
   [3,] integer,0  ~Age        integer,0   ~Pop        
   [4,] ?          ~1          integer,0   ~1          
   [5,] ?          ~1          integer,0   ~Pop        
   [6,] integer,0  ~1          integer,0   ~1          
   [7,] integer,0  ~Age        integer,0   ~1          
   [8,] integer,0  ~Age        integer,0   ~Pop        
   [9,] ?          ~1          integer,0   ~1          
  [10,] ?          ~Age        integer,0   ~Pop