---
title: "Stipple Formula Specification"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Stipple Formula Specification}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  prompt = TRUE,
  comment = " "
)
```

```{r setup, include=FALSE}
library(stipple)
d = stipple:::make_test_data()
```

## Event Time and Place

The left-hand side of the formula must specify a time and a place. This is done by separating the time expression from the place expression with an `@` sign.

```{r eval=FALSE, prompt=FALSE}
T0@Place ~ 1
```

Written this way, the events are assumed to start at `T0` and continue forever. If events have start and finish times stored in data frame columns, put the two columns in parentheses and separate them with a minus sign (`-`). Think of this as showing the event as being active from the first time to the second time:

```{r eval=FALSE, prompt=FALSE}
(T0 - T1)@Place ~ 1
```

If either of these terms is itself an expression, put it in parentheses. For example, suppose you have the start times of events that all have a fixed duration of 10 units. You could add a new column to your data frame (the start time column plus 10), or you can express it in the formula:

```{r eval=FALSE, prompt=FALSE}
(T0 - (T0+10))@Place ~ 1
```

Make sure you include the outer parentheses. This is not valid:

```{r eval=FALSE, prompt=FALSE}
T0 - (T0+10)@Place ~ 1
```

## Case Variables and Place Variables

Everything on the left of the `~` refers to the *case* data, and everything on the right of the `~` refers to the *place* data. So if you have a variable that relates to the case event and wish to include it as a covariate, put it on the left. If you have a variable that relates to the place where the event occurred and want to include it as a covariate, put it on the right.
For example, this formula will add a term for the age column from the case data to the model:

```{r eval=FALSE, prompt=FALSE}
T0@Place + Age ~ 1
```

This formula will add an additional term for the soil type at the locations:

```{r eval=FALSE, prompt=FALSE}
T0@Place + Age ~ Soil
```

## Infectivity and Susceptibility Variables

By default, every explanatory variable contributes to both the susceptibility and the infectivity of a case. To change this, wrap the term in `inf()` or `sus()`. For example, this will only use age to model the infectivity of a case, and only use the soil type to model the susceptibility of a case in an area:

```{r eval=FALSE, prompt=FALSE}
T0@Place + inf(Age) ~ sus(Soil)
```

## Grouping

In some settings there is data from several independent "experiments", and the likelihood for the whole data set is the simple product of the individual likelihoods. To facilitate this, the formula may include a grouping term on the left-hand side to show which case events are grouped with each other. For example, if annual surveys have been made of some events, and these need to be modelled together, you could do:

```{r eval=FALSE, prompt=FALSE}
T0@Place|Survey ~ 1
```

where `Survey` is a column in the case event data.

Groupings can be sums of more than one column, in which case the groups are formed from unique combinations of the column values. For example, if survey years and seasons are an appropriate grouping, then this may be the correct formula:

```{r eval=FALSE, prompt=FALSE}
T0@Place|Survey+Season ~ 1
```

Note that if you want to see if there is a season *effect*, then `Season` should appear on the left-hand side as an explanatory variable:

```{r eval=FALSE, prompt=FALSE}
T0@Place + Season | Survey ~ 1
```

Groupings can also be specified on the *right* side of the formula `~`, and these again relate to the *place* data. This then divides the data into separate geographic subsets which contribute independently to the likelihood.
This may be appropriate if the data consists of spatially disparate (and hence independent) data sets but you want to pool the data to get one estimate of the model parameters.
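
The parenthesisation and grouping rules in this vignette appear consistent with how R itself parses these operators. The sketch below uses base R only and does not call stipple. It inspects unevaluated expressions to show which operator ends up at the top of each one, then forms groups from unique combinations of column values. The `cases` data frame is made up for illustration, and using `interaction()` is an assumption about how such groups could be formed, not a description of stipple's internals.

```r
# Base R only: look at how the parser groups `@`, `-`, `|` and `+`.

# `@` binds more tightly than `-`, so without outer parentheses the
# top-level call is the subtraction rather than the time@place split.
bad  <- quote(T0 - (T0 + 10)@Place)
good <- quote((T0 - (T0 + 10))@Place)
bad_top  <- as.character(bad[[1]])   # "-"
good_top <- as.character(good[[1]])  # "@"

# `|` binds more loosely than `+`, so everything after `|` is grouping.
f   <- T0@Place | Survey + Season ~ 1
lhs <- f[[2]]                        # left-hand side of the formula
lhs_top    <- as.character(lhs[[1]]) # "|"
group_vars <- all.vars(lhs[[3]])     # "Survey" "Season"

# Hypothetical case data, used only to illustrate the grouping.
cases <- data.frame(
  Survey = c(2019, 2019, 2020, 2020, 2020),
  Season = c("wet", "dry", "wet", "wet", "dry")
)

# One group per unique combination of the grouping columns.
groups   <- interaction(cases[group_vars], drop = TRUE)
n_groups <- nlevels(groups)          # 4
```

Because each group contributes independently, the log-likelihood for the pooled data would be the sum of the per-group log-likelihoods.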