Worked Example

Outline

This example uses some synthetic data to illustrate the various formula and modelling options in the package.

library(stipple)
library(sf)

Cases

We have a number of cases of infection with the county code in the FIPS column, the age of the infected animal in AGE, the lifecycle stage in STAGE, and the start and end of infectious period in two further columns. These are in a data frame.

cases
#>     FIPS Age Stage      Start        End
#> 1  37039 5.0   Egg 2010-04-04 2010-04-23
#> 2  37163 0.0   Egg 2010-04-10 2010-04-21
#> 3  37019 3.5 Adult 2010-04-20 2010-05-05
#> 4  37005 1.0   Egg 2010-04-25 2010-05-03
#> 5  37069 1.0 Adult 2010-04-27 2010-05-06
#> 6  37087 3.0 Larva 2010-05-01 2010-05-19
#> 7  37143 0.0 Larva 2010-05-07 2010-05-15
#> 8  37089 3.0 Larva 2010-05-28 2010-06-15
#> 9  37147 4.0   Egg 2010-05-29 2010-06-04
#> 10 37007 3.0 Adult 2010-06-01 2010-06-16

Locations

We also have the geographic data as the county map for the state with some animal population estimate (COUNT) and a habitat suitability factor (HAB). Each feature has its name and FIPS code.

nc
#> Simple feature collection with 100 features and 4 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> Geodetic CRS:  NAD27
#> First 10 features:
#>           NAME  FIPS    HAB COUNT                           geom
#> 1         Ashe 37009   Good    89 MULTIPOLYGON (((-81.47276 3...
#> 2    Alleghany 37005   Good    50 MULTIPOLYGON (((-81.23989 3...
#> 3        Surry 37171 Medium    89 MULTIPOLYGON (((-80.45634 3...
#> 4    Currituck 37053   Poor    89 MULTIPOLYGON (((-76.00897 3...
#> 5  Northampton 37131   Poor   137 MULTIPOLYGON (((-77.21767 3...
#> 6     Hertford 37091   Poor    78 MULTIPOLYGON (((-76.74506 3...
#> 7       Camden 37029   Poor   127 MULTIPOLYGON (((-76.00897 3...
#> 8        Gates 37073   Poor   119 MULTIPOLYGON (((-76.56251 3...
#> 9       Warren 37185   Poor   175 MULTIPOLYGON (((-78.30876 3...
#> 10      Stokes 37169 Medium   242 MULTIPOLYGON (((-80.02567 3...
plot(nc[,'COUNT'])

plot(nc[,'HAB'])

Model Formula

The fullest model with this data would be (Start - End)@FIPS + Age + Stage ~ COUNT + HAB, specifying an infectious period between Start and End, and testing for influence of the Age and Stage values from each case and the POP1 and HAB variables for each location.

We can use build_data to see the results of this formula on our data:

build_data((Start-End)@FIPS + Age + Stage ~ COUNT + HAB, cases, nc)
#>         Start        End  FIPS StageLarva StageAdult Age COUNT HABMedium
#> 1  2010-04-04 2010-04-23 37039          0          0 5.0    67         0
#> 2  2010-04-10 2010-04-21 37163          0          0 0.0    91         1
#> 3  2010-04-20 2010-05-05 37019          0          1 3.5    50         1
#> 4  2010-04-25 2010-05-03 37005          0          0 1.0    50         0
#> 5  2010-04-27 2010-05-06 37069          0          1 1.0    67         1
#> 6  2010-05-01 2010-05-19 37087          1          0 3.0    81         0
#> 7  2010-05-07 2010-05-15 37143          1          0 0.0   138         0
#> 8  2010-05-28 2010-06-15 37089          1          0 3.0    71         1
#> 9  2010-05-29 2010-06-04 37147          0          0 4.0   175         0
#> 10 2010-06-01 2010-06-16 37007          0          1 3.0    69         1
#>    HABPoor
#> 1        0
#> 2        0
#> 3        0
#> 4        0
#> 5        0
#> 6        0
#> 7        1
#> 8        0
#> 9        1
#> 10       0

This data frame is compiled from the case and place data frames, and notice that any categorical variables are expanded into binary values, with the first level of the factor dropped in the usual way.

The simplest model would be to just model the location and time, with no covariates, creating a simple test of no-spatial correlation in incidents. The relevant built data for this would be:

build_data((Start - End)@FIPS ~ 1, cases, nc)
#>         Start        End  FIPS
#> 1  2010-04-04 2010-04-23 37039
#> 2  2010-04-10 2010-04-21 37163
#> 3  2010-04-20 2010-05-05 37019
#> 4  2010-04-25 2010-05-03 37005
#> 5  2010-04-27 2010-05-06 37069
#> 6  2010-05-01 2010-05-19 37087
#> 7  2010-05-07 2010-05-15 37143
#> 8  2010-05-28 2010-06-15 37089
#> 9  2010-05-29 2010-06-04 37147
#> 10 2010-06-01 2010-06-16 37007

If our infectious cases never recover, then we can model with just the start times:

build_data(Start@FIPS ~ 1, cases, nc)
#>         Start  FIPS
#> 1  2010-04-04 37039
#> 2  2010-04-10 37163
#> 3  2010-04-20 37019
#> 4  2010-04-25 37005
#> 5  2010-04-27 37069
#> 6  2010-05-01 37087
#> 7  2010-05-07 37143
#> 8  2010-05-28 37089
#> 9  2010-05-29 37147
#> 10 2010-06-01 37007

If infectious cases end ten days after the start, this can be used as an end-time expression.

build_data((Start-(Start+10))@FIPS ~ 1, cases, nc)
#>         Start        End  FIPS
#> 1  2010-04-04 2010-04-14 37039
#> 2  2010-04-10 2010-04-20 37163
#> 3  2010-04-20 2010-04-30 37019
#> 4  2010-04-25 2010-05-05 37005
#> 5  2010-04-27 2010-05-07 37069
#> 6  2010-05-01 2010-05-11 37087
#> 7  2010-05-07 2010-05-17 37143
#> 8  2010-05-28 2010-06-07 37089
#> 9  2010-05-29 2010-06-08 37147
#> 10 2010-06-01 2010-06-11 37007

Grouping Formula

Suppose that a further survey happens the next year, and a new set of infectious events is collected. These events are assumed independent of the first set and so we want to group by the survey year and maximise the likelihood over both sets for the same parameters.

The case data looks like this:

cases2
#>     FIPS Age Stage      Start        End Survey
#> 1  37039 5.0   Egg 2010-04-04 2010-04-23   2010
#> 2  37163 0.0   Egg 2010-04-10 2010-04-21   2010
#> 3  37019 3.5 Adult 2010-04-20 2010-05-05   2010
#> 4  37005 1.0   Egg 2010-04-25 2010-05-03   2010
#> 5  37069 1.0 Adult 2010-04-27 2010-05-06   2010
#> 6  37087 3.0 Larva 2010-05-01 2010-05-19   2010
#> 7  37143 0.0 Larva 2010-05-07 2010-05-15   2010
#> 8  37089 3.0 Larva 2010-05-28 2010-06-15   2010
#> 9  37147 4.0   Egg 2010-05-29 2010-06-04   2010
#> 10 37007 3.0 Adult 2010-06-01 2010-06-16   2010
#> 11 37107 4.0 Larva 2011-03-05 2011-03-16   2011
#> 12 37039 1.0   Egg 2011-03-07 2011-03-19   2011
#> 13 37193 2.0 Adult 2011-03-19 2011-03-31   2011
#> 14 37167 3.0 Larva 2011-03-23 2011-04-09   2011
#> 15 37133 3.5 Larva 2011-03-27 2011-04-11   2011
#> 16 37131 2.5   Egg 2011-03-29 2011-04-11   2011
#> 17 37007 3.5 Larva 2011-04-15 2011-05-02   2011
#> 18 37081 3.5   Egg 2011-05-10 2011-05-21   2011
#> 19 37135 1.5   Egg 2011-05-20 2011-06-03   2011
#> 20 37151 4.5 Larva 2011-05-26 2011-06-10   2011

To group by survey, give the grouping expression on the left-hand side of the formula. For simplicity here is a simple formula with a grouping expression. The build_data function will split the data into groups to produce a list of data frames:

build_data( Start@FIPS | Survey ~ 1, cases2, nc)
#> $`2010`
#>         Start  FIPS
#> 1  2010-04-04 37039
#> 2  2010-04-10 37163
#> 3  2010-04-20 37019
#> 4  2010-04-25 37005
#> 5  2010-04-27 37069
#> 6  2010-05-01 37087
#> 7  2010-05-07 37143
#> 8  2010-05-28 37089
#> 9  2010-05-29 37147
#> 10 2010-06-01 37007
#> 
#> $`2011`
#>         Start  FIPS
#> 11 2011-03-05 37107
#> 12 2011-03-07 37039
#> 13 2011-03-19 37193
#> 14 2011-03-23 37167
#> 15 2011-03-27 37133
#> 16 2011-03-29 37131
#> 17 2011-04-15 37007
#> 18 2011-05-10 37081
#> 19 2011-05-20 37135
#> 20 2011-05-26 37151

Model Fitting and Results

Not yet implemented

To fit the model by maximum likelihood, call stipple with the formula, both data objects, and some other arguments (to be determined):

m = stipple( (Start-End)@FIPS + Age + Stage ~ COUNT + HAB, cases, nc, ...)

The summary output of this model fit would then look something like this:

summary(m)

Call:
    stipple(formula = (Start-End)@FIPS ~ Age + Stage + COUNT + HAB, 
            cases = cases,
            places = nc,
            ...)
Fit:
    status: ok

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
Age           100.65     168.49   0.597  0.59236   
StageLarva    521.74     628.28   0.830  0.46721   
StageAdult    352.53     954.13   0.369  0.73631   
COUNT           6.54      21.43   0.305  0.78011   
HABMedium     382.08     869.41   0.439  0.69006   
HABPoor       256.06    2147.85   0.119  0.91264
d(Theta)        6.12       1.72   3.558  0.03787 **
d(Phi)        832.16     129.63   6.420  0.00766 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Null deviance: 3417767  on 11  degrees of freedom
Residual deviance: 1837430  on 5  degrees of freedom
AIC: 165.59

This shows that none of the covariates are statistically significant but the two parameters of the distance model, theta and phi, are.