---
title: "Stipple Formula Specification"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Stipple Formula Specification}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  prompt = TRUE,
  comment = " "
)
```


```{r setup, include=FALSE}
library(stipple)
d = stipple:::make_test_data()

```

## Event Time and Place

The left-hand side of the formula must specify both a time and a place. Write the time expression
and the place expression separated by an `@` sign:

```{r eval=FALSE, prompt=FALSE}
T0@Place ~ 1
```

With this form, each event is assumed to start at time `T0` and to remain active indefinitely.
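Because the specification is an ordinary R formula, you can look at how R itself parses it. The sketch below uses only base R to pull the time and place expressions out of the left-hand side; it does not call stipple, and stipple's own parsing code may work differently.

```r
# Build (but do not evaluate) the event specification.
f <- T0@Place ~ 1
lhs <- f[[2]]                # left-hand side: T0@Place

as.character(lhs[[1]])       # top-level operator: "@"
deparse(lhs[[2]])            # time expression:   "T0"
deparse(lhs[[3]])            # place expression:  "Place"
```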

If events have start and finish times stored in data frame
columns, put both in parentheses, separated by a minus sign
(`-`). Read this as the event being active from the
first time until the second:


```{r eval=FALSE, prompt=FALSE}
(T0 - T1)@Place ~ 1
```
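The same base-R inspection (again, not stipple's own code) shows the parentheses keeping the start and finish columns together as a single time expression, split by `-`:

```r
f <- (T0 - T1)@Place ~ 1

time_part <- f[[2]][[2]]     # (T0 - T1): a call to `(`
interval  <- time_part[[2]]  # T0 - T1:   a call to `-`

as.character(time_part[[1]]) # "("
deparse(interval[[2]])       # start column:  "T0"
deparse(interval[[3]])       # finish column: "T1"
```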

If either of these terms is itself an expression, put it in parentheses as well. For example, suppose you
have the start times of events that all have a fixed duration of 10 units. You could add a new column
to your data frame containing the start time plus 10, or you can express the end time directly in the formula:

```{r eval=FALSE, prompt=FALSE}
(T0 - (T0+10))@Place ~ 1
```
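To see that the two approaches describe the same end times, here is a small base-R check. The `cases` data frame is made up for illustration, and evaluating the formula term against the case columns with `eval()` is an assumption about how the expression is interpreted, not a description of stipple's internals.

```r
# Hypothetical case data with start times only.
cases <- data.frame(T0 = c(1, 5, 12))

# Option 1: add an explicit end-time column.
cases$T1 <- cases$T0 + 10

# Option 2: write the end time as an expression inside the formula.
f <- (T0 - (T0 + 10))@Place ~ 1
end_expr <- f[[2]][[2]][[2]][[3]]   # the end-time term: (T0 + 10)

# Evaluating that term against the case columns gives the same end times.
eval(end_expr, cases)                        # c(11, 15, 22)
identical(eval(end_expr, cases), cases$T1)   # TRUE
```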

Make sure you include the outer parentheses; without them the formula is not valid:

```{r eval=FALSE, prompt=FALSE}
T0 - (T0+10)@Place ~ 1
```
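Both lines are syntactically valid R, which is why this mistake is easy to make. Comparing the parse trees with base R (nothing stipple-specific) shows the problem: `@` binds more tightly than `-`, so without the outer parentheses the place is attached to the end time alone.

```r
good <- (T0 - (T0 + 10))@Place ~ 1
bad  <- T0 - (T0 + 10)@Place ~ 1

# With the outer parentheses, `@` is the top-level operator on the left.
as.character(good[[2]][[1]])        # "@"

# Without them, the left-hand side is T0 minus ((T0 + 10)@Place):
as.character(bad[[2]][[1]])         # "-"
as.character(bad[[2]][[3]][[1]])    # "@", applied to (T0 + 10) only
```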


## Case Variables and Place Variables

Everything on the left of the `~` refers to the *case* data, and everything on the right of the
`~` refers to the *place* data. 

So if you have a variable that relates to the case event and wish to include it as a covariate,
put it on the left. If you have variables that relate to the place where the event occurred and want to
have that as a covariate, put it on the right.

For example, this formula will add a term for the age column from the case data to the model:

```{r eval=FALSE, prompt=FALSE}
T0@Place + Age ~ 1
```

This formula also adds a term for the soil type (`Soil`) at each place:

```{r eval=FALSE, prompt=FALSE}
T0@Place + Age ~ Soil
```
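A quick base-R check of which names end up on each side of `~`. This only splits the formula; how stipple then looks the names up in the case and place data is not shown here.

```r
f <- T0@Place + Age ~ Soil
lhs <- f[[2]]       # case side:  T0@Place + Age
rhs <- f[[3]]       # place side: Soil

lhs[[2]]            # the event specification, T0@Place
deparse(lhs[[3]])   # case covariate:   "Age"
all.vars(rhs)       # place covariates: "Soil"
```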


## Infectivity and Susceptibility Variables

By default, every explanatory variable contributes to both the susceptibility and the infectivity of a case.
To restrict a term to one of these, wrap it in `inf()` (infectivity only) or `sus()` (susceptibility only).
For example, this formula uses age only to model the infectivity of a case, and soil type only to model
the susceptibility of a case in an area:

```{r eval=FALSE, prompt=FALSE}
T0@Place + inf(Age) ~ sus(Soil)
```
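The rule is simple enough to express in a few lines of base R. The helpers `split_plus()` and `term_role()` below are hypothetical illustrations written for this vignette, not functions exported by stipple; they only encode the rule stated above, where unwrapped terms feed both parts of the model.

```r
# Hypothetical helper: flatten a chain of `+` into a list of terms.
split_plus <- function(e) {
  if (is.call(e) && identical(e[[1]], as.name("+")) && length(e) == 3) {
    c(split_plus(e[[2]]), split_plus(e[[3]]))
  } else {
    list(e)
  }
}

# Hypothetical helper: which part of the model does a term feed?
term_role <- function(term) {
  if (is.call(term) && identical(term[[1]], as.name("inf"))) return("infectivity")
  if (is.call(term) && identical(term[[1]], as.name("sus"))) return("susceptibility")
  "both"
}

f <- T0@Place + inf(Age) ~ sus(Soil)
lhs_terms <- split_plus(f[[2]])[-1]     # drop the T0@Place event specification

sapply(lhs_terms, term_role)            # "infectivity"
sapply(split_plus(f[[3]]), term_role)   # "susceptibility"
term_role(quote(Age))                   # unwrapped: "both"
```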

## Grouping

In some settings the data come from several independent "experiments", and the likelihood
for the whole data set is simply the product of the individual likelihoods. To support this,
the left-hand side of the formula may include a grouping term, after a `|`, that indicates which
case events are grouped with each other.
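Written out, with $G$ groups and $L_g(\theta)$ the likelihood of group $g$ under shared parameters $\theta$ (notation introduced here for illustration), the product form means each group adds its own term to the log-likelihood:

$$
L(\theta) = \prod_{g=1}^{G} L_g(\theta),
\qquad
\log L(\theta) = \sum_{g=1}^{G} \log L_g(\theta).
$$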

For example, if events have been recorded in annual surveys that should all be modelled
together, you could write:

```{r eval=FALSE, prompt=FALSE}
T0@Place|Survey ~ 1
```

where `Survey` is a column in the case event data.

Groupings can be sums of more than one column, in which case the groups are formed from unique
combinations of the column values. For example, if each combination of survey year and season is an
appropriate group, then this may be the correct formula:

```{r eval=FALSE, prompt=FALSE}
T0@Place|Survey+Season ~ 1
```
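Base R's `interaction()` with `drop = TRUE` forms groups in the way described above, one per observed combination. Using it here is an illustration only, under the assumption that stipple groups equivalently; the `cases` data frame is made up.

```r
# Hypothetical case data from two annual surveys.
cases <- data.frame(
  Survey = c(2019, 2019, 2019, 2020),
  Season = c("wet", "wet", "dry", "wet")
)

# One group per observed Survey/Season combination; the unobserved
# 2020/dry combination is dropped.
grp <- interaction(cases$Survey, cases$Season, drop = TRUE)
nlevels(grp)   # 3
table(grp)
```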

Note that if you want to see if there is a season *effect*, then `Season` should appear on the 
left-hand side as an explanatory variable:

```{r eval=FALSE, prompt=FALSE}
T0@Place + Season | Survey ~ 1
```
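The position of `Season` relative to `|` is what decides its role. In R, `|` has lower precedence than `+`, so in both formulas `|` is the top-level operator on the left: terms before it describe the events, terms after it form the groups. This base-R sketch compares the two parse trees.

```r
grouped <- T0@Place | Survey + Season ~ 1   # Season is part of the grouping
effect  <- T0@Place + Season | Survey ~ 1   # Season is a covariate

as.character(grouped[[2]][[1]])   # "|"
as.character(effect[[2]][[1]])    # "|"

all.vars(grouped[[2]][[3]])       # grouping columns: "Survey" "Season"
all.vars(effect[[2]][[3]])        # grouping columns: "Survey"
deparse(effect[[2]][[2]][[3]])    # case covariate:   "Season"
```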


Groupings can also be specified on the *right*-hand side of the `~`, and these again relate
to the *place* data. This divides the data into separate geographic subsets
which contribute independently to the likelihood. This may be appropriate if the data consist
of spatially disparate (and hence independent) data sets but you want to pool the data to get
one estimate of the model parameters.