Reading WW Data

Attach Package

If the package is not installed, or you are developing it, load it with load_all() from the devtools package. Otherwise, attach it with library():

library(openww)

Read ODS File

The open WW data is distributed in ODS (OpenDocument Spreadsheet) format, which can be opened in Excel, LibreOffice, and other spreadsheet programs. Functions in this package read sheets from the file and reformat them.

A sample spreadsheet is included in the package.

ods_file = system.file("extdata","Final_EMHP_Wastewater_Data_January_2022.ods",package="openww")
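
One thing to watch for with system.file(): it returns an empty string, not an error, when the file or package cannot be found. The sketch below shows a guard for that case. The helper name require_file is hypothetical and not part of openww, and the lookup uses the base package only so the example runs without openww installed.

```r
# system.file() returns "" when nothing matches, rather than raising an error.
missing_path <- system.file("extdata", "no_such_file.ods", package = "base")
nzchar(missing_path)

# Hypothetical helper (not part of openww): fail early on an empty or bad path.
require_file <- function(path) {
  if (!nzchar(path) || !file.exists(path)) {
    stop("File not found. Is the package installed?")
  }
  path
}

# A path that exists is passed straight through.
ok_path <- require_file(tempdir())
```

Wrapping the lookup this way, for example require_file(ods_file), gives a clear message up front instead of a less obvious failure later in the reading functions.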

Daily Data

The daily data can be read with read_daily_ww_ods():

ww_daily = read_daily_ww_ods(ods_file)
head(ww_daily)
#>             Site_code       date  conc
#> 2  UKENAN_AW_TP000004 2021-06-02  4340
#> 4  UKENAN_AW_TP000004 2021-06-04    NA
#> 6  UKENAN_AW_TP000004 2021-06-06    NA
#> 7  UKENAN_AW_TP000004 2021-06-07    NA
#> 9  UKENAN_AW_TP000004 2021-06-09    NA
#> 11 UKENAN_AW_TP000004 2021-06-11 45426
summary(ww_daily)
#>   Site_code              date                 conc        
#>  Length:29232       Min.   :2021-06-01   Min.   :    162  
#>  Class :character   1st Qu.:2021-07-25   1st Qu.:   3248  
#>  Mode  :character   Median :2021-09-19   Median :   9554  
#>                     Mean   :2021-09-17   Mean   :  26661  
#>                     3rd Qu.:2021-11-08   3rd Qu.:  26826  
#>                     Max.   :2022-01-10   Max.   :1879113  
#>                                          NA's   :5053

The spreadsheet is a full table with sites in rows and dates in columns. A blank cell means no observation was made. A cell containing the text "tLOD" means an observation was taken but fell below the threshold level of detection. In the converted data read here, cells with no observation are dropped, and below-threshold measurements are recorded as NA.
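
The conversion described above can be sketched in base R. This is a minimal illustration on a made-up two-site table, not the code used by read_daily_ww_ods(). The site codes, values, and the intermediate column name raw are assumptions for the example.

```r
# Hypothetical wide table: one row per site, one column per date.
# "" marks a day with no sample; "tLOD" marks a below-threshold result.
wide <- data.frame(
  Site_code = c("SITE_A", "SITE_B"),
  `2021-06-01` = c("4340", ""),
  `2021-06-02` = c("tLOD", "1205"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

date_cols <- setdiff(names(wide), "Site_code")

# Stack the date columns into long form: one row per site/date pair.
long <- data.frame(
  Site_code = rep(wide$Site_code, times = length(date_cols)),
  date = as.Date(rep(date_cols, each = nrow(wide))),
  raw = unlist(wide[date_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Drop cells where no observation was made.
long <- long[long$raw != "", ]

# Below-threshold readings become NA; all other cells become numeric.
long$conc <- ifelse(long$raw == "tLOD", NA_real_,
                    suppressWarnings(as.numeric(long$raw)))
long$raw <- NULL
long <- long[order(long$Site_code, long$date), ]
```

Keeping below-threshold readings as NA rows, instead of dropping them, preserves the fact that a sample was taken on that date.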

Weekly Data

The weekly data can be read with read_weekly_ww_ods():

ww_weekly = read_weekly_ww_ods(ods_file)
head(ww_weekly)
#>            Site_code       date  conc
#> 1 UKENAN_AW_TP000004 2021-06-01  1205
#> 2 UKENAN_AW_TP000004 2021-06-08 16369
#> 3 UKENAN_AW_TP000004 2021-06-15  1315
#> 4 UKENAN_AW_TP000004 2021-06-22   442
#> 5 UKENAN_AW_TP000004 2021-06-29  7420
#> 6 UKENAN_AW_TP000004 2021-07-06   220
summary(ww_weekly)
#>   Site_code              date                 conc       
#>  Length:8218        Min.   :2021-06-01   Min.   :   160  
#>  Class :character   1st Qu.:2021-07-20   1st Qu.:  3734  
#>  Mode  :character   Median :2021-09-14   Median : 10536  
#>                     Mean   :2021-09-15   Mean   : 22596  
#>                     3rd Qu.:2021-11-09   3rd Qu.: 26125  
#>                     Max.   :2022-01-04   Max.   :596142

Spatial Data

Reading

The package files include a GeoPackage of spatial data. One layer in this file holds the point locations of the treatment works.

library(sf)
#> Linking to GEOS 3.12.1, GDAL 3.8.4, PROJ 9.4.0; sf_use_s2() is TRUE
sites_gpkg = system.file("extdata","sites.gpkg",package="openww")
stw = st_read(sites_gpkg, "sites", quiet=TRUE)

head(stw)
#> Simple feature collection with 6 features and 6 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -0.4339846 ymin: 52.13165 xmax: 1.603073 ymax: 53.69675
#> Geodetic CRS:  WGS 84
#>               code                   name              Region_name
#> 1 ukenanawtp000004             ANWICK STW            East Midlands
#> 2 ukenanawtp000012 BARTON-UPON-HUMBER STW Yorkshire and The Humber
#> 3 ukenanawtp000015            BECCLES STW          East of England
#> 4 ukenanawtp000016            BEDFORD STW          East of England
#> 5 ukenanawtp000023             BOSTON STW            East Midlands
#> 6 ukenanawtp000026             BOURNE STW            East Midlands
#>            Site_code          Site_name Population                        geom
#> 1 UKENAN_AW_TP000004             Anwick       5866  POINT (-0.3387851 53.0357)
#> 2 UKENAN_AW_TP000012 Barton-upon-Humber      11518 POINT (-0.4339846 53.69675)
#> 3 UKENAN_AW_TP000015            Beccles      10882   POINT (1.603073 52.45682)
#> 4 UKENAN_AW_TP000016            Bedford     151259  POINT (-0.416479 52.13165)
#> 5 UKENAN_AW_TP000023             Boston      38150 POINT (0.01826465 52.94919)
#> 6 UKENAN_AW_TP000026             Bourne      21752 POINT (-0.3575288 52.76608)
summary(stw)
#>      code               name           Region_name         Site_code        
#>  Length:274         Length:274         Length:274         Length:274        
#>  Class :character   Class :character   Class :character   Class :character  
#>  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
#>                                                                             
#>                                                                             
#>                                                                             
#>   Site_name           Population                 geom    
#>  Length:274         Min.   :   2204   POINT        :274  
#>  Class :character   1st Qu.:  27412   epsg:4326    :  0  
#>  Mode  :character   Median :  59651   +proj=long...:  0  
#>                     Mean   : 134611                      
#>                     3rd Qu.: 122028                      
#>                     Max.   :3031194

Merging and Plotting

## take a subset of one day
day_7_4 = ww_daily[ww_daily$date=="2021-07-04",]

## merge with spatial by common column "Site_code":
day_7_4 = st_as_sf(merge(day_7_4, stw))

## transform concentration to log
day_7_4$log_conc = log(day_7_4$conc)

## plot
plot(day_7_4[,"log_conc"], pch=19, cex=0.5)
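
With its default arguments, merge() joins on the columns the two tables share and keeps only rows that match in both, so any observation whose Site_code is missing from the spatial layer would be dropped without warning. The sketch below shows one way to check for this, using small made-up data frames in place of ww_daily and stw. All site codes and values are hypothetical.

```r
# Hypothetical stand-ins for the daily observations and the site table.
obs <- data.frame(
  Site_code = c("SITE_A", "SITE_B", "SITE_C"),
  conc = c(4340, NA, 1205)
)
sites <- data.frame(
  Site_code = c("SITE_A", "SITE_B", "SITE_D"),
  Population = c(5866, 11518, 10882)
)

# Default merge: joins on the shared column and keeps matching rows only.
inner <- merge(obs, sites)

# all.x = TRUE keeps every observation, filling unmatched site fields with NA.
left <- merge(obs, sites, all.x = TRUE)

# Site codes that appear in the observations but not in the site table.
unmatched <- setdiff(obs$Site_code, sites$Site_code)
```

If unmatched is empty, the default merge loses nothing. Otherwise the listed codes are worth checking before mapping.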

Maps with more geographic context can be made with the tmap package.