--- title: "Worked Example" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Worked Example} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} #library(ragg) #knitr::opts_chunk$set(dev = "png", dev.args = list(type = "cairo-png")) #knitr::opts_chunk$set(dev = "ragg_png") #knitr::opts_chunk$set(fig.ext="png") knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(stipple) d = stipple:::demo_data() cases = d$events cases2 = d$events2 nc = d$places ``` ## Outline This example uses some synthetic data to illustrate the various formula and modelling options in the package. ```{r setup} library(stipple) library(sf) ``` ## Cases We have a number of cases of infection with the county code in the `FIPS` column, the age of the infected animal in `AGE`, the lifecycle stage in `STAGE`, and the start and end of infectious period in two further columns. These are in a data frame. ```{r thecases} cases ``` ## Locations We also have the geographic data as the county map for the state with some animal population estimate (`COUNT`) and a habitat suitability factor (`HAB`). Each feature has its name and FIPS code. ```{r theplaces} nc plot(nc[,'COUNT']) plot(nc[,'HAB']) ``` ## Model Formula The fullest model with this data would be `(Start - End)@FIPS + Age + Stage ~ COUNT + HAB`, specifying an infectious period between `Start` and `End`, and testing for influence of the `Age` and `Stage` values from each case and the `POP1` and `HAB` variables for each location. We can use `build_data` to see the results of this formula on our data: ```{r buildit} build_data((Start-End)@FIPS + Age + Stage ~ COUNT + HAB, cases, nc) ``` This data frame is compiled from the case and place data frames, and notice that any categorical variables are expanded into binary values, with the first level of the factor dropped in the usual way. The simplest model would be to just model the location and time, with no covariates, creating a simple test of no-spatial correlation in incidents. The relevant built data for this would be: ```{r buildsimple} build_data((Start - End)@FIPS ~ 1, cases, nc) ``` If our infectious cases never recover, then we can model with just the start times: ```{r buildstart} build_data(Start@FIPS ~ 1, cases, nc) ``` If infectious cases end ten days after the start, this can be used as an end-time expression. ```{r buildend} build_data((Start-(Start+10))@FIPS ~ 1, cases, nc) ``` ## Grouping Formula Suppose that a further survey happens the next year, and a new set of infectious events is collected. These events are assumed independent of the first set and so we want to group by the survey year and maximise the likelihood over both sets for the same parameters. The case data looks like this: ```{r casedata2} cases2 ``` To group by survey, give the grouping expression on the left-hand side of the formula. For simplicity here is a simple formula with a grouping expression. The `build_data` function will split the data into groups to produce a list of data frames: ```{r group1} build_data( Start@FIPS | Survey ~ 1, cases2, nc) ``` ## Model Fitting and Results **Not yet implemented** To fit the model by maximum likelihood, call `stipple` with the formula, both data objects, and some other arguments (to be determined): ```{r examplefit, eval=FALSE} m = stipple( (Start-End)@FIPS + Age + Stage ~ COUNT + HAB, cases, nc, ...) ``` The summary output of this model fit would then look something like this: ```{r fakey, eval=FALSE} summary(m) Call: stipple(formula = (Start-End)@FIPS ~ Age + Stage + COUNT + HAB, cases = cases, places = nc, ...) Fit: status: ok Coefficients: Estimate Std. Error t value Pr(>|t|) Age 100.65 168.49 0.597 0.59236 StageLarva 521.74 628.28 0.830 0.46721 StageAdult 352.53 954.13 0.369 0.73631 COUNT 6.54 21.43 0.305 0.78011 HABMedium 382.08 869.41 0.439 0.69006 HABPoor 256.06 2147.85 0.119 0.91264 d(Theta) 6.12 1.72 3.558 0.03787 ** d(Phi) 832.16 129.63 6.420 0.00766 ** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Null deviance: 3417767 on 11 degrees of freedom Residual deviance: 1837430 on 5 degrees of freedom AIC: 165.59 ``` This shows that none of the covariates are statistically significant but the two parameters of the distance model, theta and phi, are.