---
title: "Worked Example"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Worked Example}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}


---

```{r, include = FALSE}
#library(ragg)
#knitr::opts_chunk$set(dev = "png", dev.args = list(type = "cairo-png"))
#knitr::opts_chunk$set(dev = "ragg_png")
#knitr::opts_chunk$set(fig.ext="png")
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
  )
library(stipple)
d = stipple:::demo_data()
cases = d$events
cases2 = d$events2
nc = d$places

```

## Outline

This example uses some synthetic data to illustrate the various formula and modelling options
in the package.

```{r setup}
library(stipple)
library(sf)
```

## Cases

We have a number of cases of infection with the county code in the
`FIPS` column, the age of the infected animal in `AGE`, the lifecycle stage in `STAGE`, and the start and end of
infectious period in two further columns. These are in a data frame.

```{r thecases}
cases
```

## Locations

We also have the geographic data as the county map for the state with some animal 
population estimate (`COUNT`) and a habitat suitability factor (`HAB`). Each feature has its name and
FIPS code.

```{r theplaces}
nc
plot(nc[,'COUNT'])
plot(nc[,'HAB'])
```


## Model Formula

The fullest model with this data would be `(Start - End)@FIPS + Age + Stage ~ COUNT + HAB`, specifying an
infectious period between `Start` and `End`, and testing for influence of the `Age` and `Stage` values from 
each case and the `POP1` and `HAB` variables for each location.

We can use `build_data` to see the results of this formula on our data:

```{r buildit}
build_data((Start-End)@FIPS + Age + Stage ~ COUNT + HAB, cases, nc)
```

This data frame is compiled from the case and place data frames, and notice that any
categorical variables are expanded into binary values, with the first level of the factor
dropped in the usual way.

The simplest model would be to just model the location and time, with no covariates, creating
a simple test of no-spatial correlation in incidents. The relevant built data for this would be:

```{r buildsimple}
build_data((Start - End)@FIPS ~ 1, cases, nc)
```

If our infectious cases never recover, then we can model with just the start times:

```{r buildstart}
build_data(Start@FIPS ~ 1, cases, nc)
```

 
If infectious cases end ten days after the start, this can be used as an end-time expression.

```{r buildend}
build_data((Start-(Start+10))@FIPS ~ 1, cases, nc)
```

## Grouping Formula

Suppose that a further survey happens the next year, and a new set of infectious events is
collected. These events are assumed independent of the first set and so we want to group
by the survey year and maximise the likelihood over both sets for the same parameters.

The case  data looks like this:

```{r casedata2}
cases2
```

To group by survey, give the grouping expression on the left-hand side of the formula. For
simplicity here is a simple formula with a grouping expression. The 
`build_data` function will split the data into groups to produce a list of data frames:

```{r group1}
build_data( Start@FIPS | Survey ~ 1, cases2, nc)
```


## Model Fitting and Results

**Not yet implemented**

To fit the model by maximum likelihood, call `stipple` with the formula, both
data objects, and some other arguments (to be determined):

```{r examplefit, eval=FALSE}
m = stipple( (Start-End)@FIPS + Age + Stage ~ COUNT + HAB, cases, nc, ...)
```

The summary output of this model fit would then look something like this:

```{r fakey, eval=FALSE}
summary(m)

Call:
    stipple(formula = (Start-End)@FIPS ~ Age + Stage + COUNT + HAB, 
            cases = cases,
            places = nc,
            ...)
Fit:
    status: ok

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
Age           100.65     168.49   0.597  0.59236   
StageLarva    521.74     628.28   0.830  0.46721   
StageAdult    352.53     954.13   0.369  0.73631   
COUNT           6.54      21.43   0.305  0.78011   
HABMedium     382.08     869.41   0.439  0.69006   
HABPoor       256.06    2147.85   0.119  0.91264
d(Theta)        6.12       1.72   3.558  0.03787 **
d(Phi)        832.16     129.63   6.420  0.00766 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Null deviance: 3417767  on 11  degrees of freedom
Residual deviance: 1837430  on 5  degrees of freedom
AIC: 165.59
```

This shows that none of the covariates are statistically significant but
the two parameters of the distance model, theta and phi, are.