This example uses some synthetic data to illustrate the various formula and modelling options in the package.
We have a number of cases of infection with the county code in the
FIPS
column, the age of the infected animal in
AGE
, the lifecycle stage in STAGE
, and the
start and end of infectious period in two further columns. These are in
a data frame.
cases
#> FIPS Age Stage Start End
#> 1 37039 5.0 Egg 2010-04-04 2010-04-23
#> 2 37163 0.0 Egg 2010-04-10 2010-04-21
#> 3 37019 3.5 Adult 2010-04-20 2010-05-05
#> 4 37005 1.0 Egg 2010-04-25 2010-05-03
#> 5 37069 1.0 Adult 2010-04-27 2010-05-06
#> 6 37087 3.0 Larva 2010-05-01 2010-05-19
#> 7 37143 0.0 Larva 2010-05-07 2010-05-15
#> 8 37089 3.0 Larva 2010-05-28 2010-06-15
#> 9 37147 4.0 Egg 2010-05-29 2010-06-04
#> 10 37007 3.0 Adult 2010-06-01 2010-06-16
We also have the geographic data as the county map for the state with
some animal population estimate (COUNT
) and a habitat
suitability factor (HAB
). Each feature has its name and
FIPS code.
nc
#> Simple feature collection with 100 features and 4 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> Geodetic CRS: NAD27
#> First 10 features:
#> NAME FIPS HAB COUNT geom
#> 1 Ashe 37009 Good 89 MULTIPOLYGON (((-81.47276 3...
#> 2 Alleghany 37005 Good 50 MULTIPOLYGON (((-81.23989 3...
#> 3 Surry 37171 Medium 89 MULTIPOLYGON (((-80.45634 3...
#> 4 Currituck 37053 Poor 89 MULTIPOLYGON (((-76.00897 3...
#> 5 Northampton 37131 Poor 137 MULTIPOLYGON (((-77.21767 3...
#> 6 Hertford 37091 Poor 78 MULTIPOLYGON (((-76.74506 3...
#> 7 Camden 37029 Poor 127 MULTIPOLYGON (((-76.00897 3...
#> 8 Gates 37073 Poor 119 MULTIPOLYGON (((-76.56251 3...
#> 9 Warren 37185 Poor 175 MULTIPOLYGON (((-78.30876 3...
#> 10 Stokes 37169 Medium 242 MULTIPOLYGON (((-80.02567 3...
plot(nc[,'COUNT'])
The fullest model with this data would be
(Start - End)@FIPS + Age + Stage ~ COUNT + HAB
, specifying
an infectious period between Start
and End
,
and testing for influence of the Age
and Stage
values from each case and the POP1
and HAB
variables for each location.
We can use build_data
to see the results of this formula
on our data:
build_data((Start-End)@FIPS + Age + Stage ~ COUNT + HAB, cases, nc)
#> Start End FIPS StageLarva StageAdult Age COUNT HABMedium
#> 1 2010-04-04 2010-04-23 37039 0 0 5.0 67 0
#> 2 2010-04-10 2010-04-21 37163 0 0 0.0 91 1
#> 3 2010-04-20 2010-05-05 37019 0 1 3.5 50 1
#> 4 2010-04-25 2010-05-03 37005 0 0 1.0 50 0
#> 5 2010-04-27 2010-05-06 37069 0 1 1.0 67 1
#> 6 2010-05-01 2010-05-19 37087 1 0 3.0 81 0
#> 7 2010-05-07 2010-05-15 37143 1 0 0.0 138 0
#> 8 2010-05-28 2010-06-15 37089 1 0 3.0 71 1
#> 9 2010-05-29 2010-06-04 37147 0 0 4.0 175 0
#> 10 2010-06-01 2010-06-16 37007 0 1 3.0 69 1
#> HABPoor
#> 1 0
#> 2 0
#> 3 0
#> 4 0
#> 5 0
#> 6 0
#> 7 1
#> 8 0
#> 9 1
#> 10 0
This data frame is compiled from the case and place data frames, and notice that any categorical variables are expanded into binary values, with the first level of the factor dropped in the usual way.
The simplest model would be to just model the location and time, with no covariates, creating a simple test of no-spatial correlation in incidents. The relevant built data for this would be:
build_data((Start - End)@FIPS ~ 1, cases, nc)
#> Start End FIPS
#> 1 2010-04-04 2010-04-23 37039
#> 2 2010-04-10 2010-04-21 37163
#> 3 2010-04-20 2010-05-05 37019
#> 4 2010-04-25 2010-05-03 37005
#> 5 2010-04-27 2010-05-06 37069
#> 6 2010-05-01 2010-05-19 37087
#> 7 2010-05-07 2010-05-15 37143
#> 8 2010-05-28 2010-06-15 37089
#> 9 2010-05-29 2010-06-04 37147
#> 10 2010-06-01 2010-06-16 37007
If our infectious cases never recover, then we can model with just the start times:
build_data(Start@FIPS ~ 1, cases, nc)
#> Start FIPS
#> 1 2010-04-04 37039
#> 2 2010-04-10 37163
#> 3 2010-04-20 37019
#> 4 2010-04-25 37005
#> 5 2010-04-27 37069
#> 6 2010-05-01 37087
#> 7 2010-05-07 37143
#> 8 2010-05-28 37089
#> 9 2010-05-29 37147
#> 10 2010-06-01 37007
If infectious cases end ten days after the start, this can be used as an end-time expression.
build_data((Start-(Start+10))@FIPS ~ 1, cases, nc)
#> Start End FIPS
#> 1 2010-04-04 2010-04-14 37039
#> 2 2010-04-10 2010-04-20 37163
#> 3 2010-04-20 2010-04-30 37019
#> 4 2010-04-25 2010-05-05 37005
#> 5 2010-04-27 2010-05-07 37069
#> 6 2010-05-01 2010-05-11 37087
#> 7 2010-05-07 2010-05-17 37143
#> 8 2010-05-28 2010-06-07 37089
#> 9 2010-05-29 2010-06-08 37147
#> 10 2010-06-01 2010-06-11 37007
Suppose that a further survey happens the next year, and a new set of infectious events is collected. These events are assumed independent of the first set and so we want to group by the survey year and maximise the likelihood over both sets for the same parameters.
The case data looks like this:
cases2
#> FIPS Age Stage Start End Survey
#> 1 37039 5.0 Egg 2010-04-04 2010-04-23 2010
#> 2 37163 0.0 Egg 2010-04-10 2010-04-21 2010
#> 3 37019 3.5 Adult 2010-04-20 2010-05-05 2010
#> 4 37005 1.0 Egg 2010-04-25 2010-05-03 2010
#> 5 37069 1.0 Adult 2010-04-27 2010-05-06 2010
#> 6 37087 3.0 Larva 2010-05-01 2010-05-19 2010
#> 7 37143 0.0 Larva 2010-05-07 2010-05-15 2010
#> 8 37089 3.0 Larva 2010-05-28 2010-06-15 2010
#> 9 37147 4.0 Egg 2010-05-29 2010-06-04 2010
#> 10 37007 3.0 Adult 2010-06-01 2010-06-16 2010
#> 11 37107 4.0 Larva 2011-03-05 2011-03-16 2011
#> 12 37039 1.0 Egg 2011-03-07 2011-03-19 2011
#> 13 37193 2.0 Adult 2011-03-19 2011-03-31 2011
#> 14 37167 3.0 Larva 2011-03-23 2011-04-09 2011
#> 15 37133 3.5 Larva 2011-03-27 2011-04-11 2011
#> 16 37131 2.5 Egg 2011-03-29 2011-04-11 2011
#> 17 37007 3.5 Larva 2011-04-15 2011-05-02 2011
#> 18 37081 3.5 Egg 2011-05-10 2011-05-21 2011
#> 19 37135 1.5 Egg 2011-05-20 2011-06-03 2011
#> 20 37151 4.5 Larva 2011-05-26 2011-06-10 2011
To group by survey, give the grouping expression on the left-hand
side of the formula. For simplicity here is a simple formula with a
grouping expression. The build_data
function will split the
data into groups to produce a list of data frames:
build_data( Start@FIPS | Survey ~ 1, cases2, nc)
#> $`2010`
#> Start FIPS
#> 1 2010-04-04 37039
#> 2 2010-04-10 37163
#> 3 2010-04-20 37019
#> 4 2010-04-25 37005
#> 5 2010-04-27 37069
#> 6 2010-05-01 37087
#> 7 2010-05-07 37143
#> 8 2010-05-28 37089
#> 9 2010-05-29 37147
#> 10 2010-06-01 37007
#>
#> $`2011`
#> Start FIPS
#> 11 2011-03-05 37107
#> 12 2011-03-07 37039
#> 13 2011-03-19 37193
#> 14 2011-03-23 37167
#> 15 2011-03-27 37133
#> 16 2011-03-29 37131
#> 17 2011-04-15 37007
#> 18 2011-05-10 37081
#> 19 2011-05-20 37135
#> 20 2011-05-26 37151
Not yet implemented
To fit the model by maximum likelihood, call stipple
with the formula, both data objects, and some other arguments (to be
determined):
The summary output of this model fit would then look something like this:
summary(m)
Call:
stipple(formula = (Start-End)@FIPS ~ Age + Stage + COUNT + HAB,
cases = cases,
places = nc,
...)
Fit:
status: ok
Coefficients:
Estimate Std. Error t value Pr(>|t|)
Age 100.65 168.49 0.597 0.59236
StageLarva 521.74 628.28 0.830 0.46721
StageAdult 352.53 954.13 0.369 0.73631
COUNT 6.54 21.43 0.305 0.78011
HABMedium 382.08 869.41 0.439 0.69006
HABPoor 256.06 2147.85 0.119 0.91264
d(Theta) 6.12 1.72 3.558 0.03787 **
d(Phi) 832.16 129.63 6.420 0.00766 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Null deviance: 3417767 on 11 degrees of freedom
Residual deviance: 1837430 on 5 degrees of freedom
AIC: 165.59
This shows that none of the covariates are statistically significant but the two parameters of the distance model, theta and phi, are.