---
title: "Data for CoDatMo"
author: "Breck Baldwin"
date: "3/25/2021"
output:
  html_document:
    includes:
       in_header: _html/ga.html
---
The global chunk options for this R Markdown document are:
```{r setup, include=TRUE}
knitr::opts_chunk$set(
	echo = TRUE,
	message = FALSE,
	warning = FALSE,
	results = "hide",
	error = FALSE,
	comment = ''
)
```

# Welcome to the CoDatMo data repository

This document lives in the repository at [https://github.com/codatmo/Data/](https://github.com/codatmo/Data) as `index.Rmd` and is rendered as HTML at [https://github.com/codatmo/Data/index.html](https://github.com/codatmo/Data/index.html).

Additional data and descriptions are in [https://github.com/codatmo/Data/blob/main/README.md](https://github.com/codatmo/Data/blob/main/README.md), which also links to this file.

This repository is part of the CoDatMo (Co)vid (Dat)a (Mo)deling site (https://codatmo.github.io/), which is intended to replicate and make available important COVID models written in Bayesian modeling languages like Stan or PyMC3.

# Contents

This page explains the data sets in the repo and shows how to load them into R data frames.

## UK

```{r}
library(tidyr)

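# Paths are relative to the directory containing this file,
# so set the working directory there before running.

deaths <- read.csv("NHS_Regions_Datasets/nhs_regions_deaths.csv",
                   header=FALSE)
# Transpose so that each series becomes a column
deaths_t <- t(deaths)
region_names <- deaths_t[1,2:8]
# Drop the two leading non-data rows
deaths.df <- data.frame(deaths_t[c(-1,-2),])
colnames(deaths.df) <- c('week', region_names)
# Reshape to one row per week and area, then convert the text values
deaths_long.df <- gather(deaths.df,'area','deaths',2:8)
deaths_long.df$death_count <- as.numeric(deaths_long.df$deaths)
deaths_long.df$date <- as.Date(deaths_long.df$week)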

```

Deaths are collected weekly.
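
The transpose-and-reshape steps above can be tried on a small made-up table without the CSV file. This is only a sketch: the `toy_*` names, the two regions, and the assumed layout (a label column followed by a source column) are illustrative and are not taken from the real data.

```{r}
library(tidyr)

# Made-up stand-in for a CSV read with header = FALSE: each row is one series,
# column 1 holds the row label and column 2 a source note (assumed layout).
toy_wide <- data.frame(
  V1 = c("week", "Region A", "Region B"),
  V2 = c("source", "made up", "made up"),
  V3 = c("2020-03-06", "1", "4"),
  V4 = c("2020-03-13", "2", "5"),
  stringsAsFactors = FALSE
)

# t() returns a character matrix with one column per series
toy_t <- t(toy_wide)
toy_regions <- unname(toy_t[1, 2:3])
# Drop the label and source rows, keeping only the data rows
toy.df <- data.frame(toy_t[c(-1, -2), ], stringsAsFactors = FALSE)
colnames(toy.df) <- c("week", toy_regions)

# Reshape to one row per (week, area) pair, then convert the text columns
toy_long.df <- gather(toy.df, "area", "deaths", 2:3)
toy_long.df$death_count <- as.numeric(toy_long.df$deaths)
toy_long.df$date <- as.Date(toy_long.df$week)
```

Because `t()` turns the whole table into text, the explicit `as.numeric()` and `as.Date()` conversions at the end are what make the long table usable for plotting.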

```{r}
library(ggplot2)

p1 <- ggplot(deaths_long.df) +
  aes(x=date, y=death_count, group=area, color=area) +
  geom_line()

p1

```


```{r}
library(tidyr)
calls_111 <- read.csv("NHS_Regions_Datasets/nhs_regions_111_calls.csv",
                      header=FALSE, row.names=NULL)
calls_111_t <- t(calls_111)
calls_111.df <- data.frame(calls_111_t)
colnames(calls_111.df) <- c('date',calls_111_t[1,2:8])
calls_111.df <- calls_111.df[c(-1,-2),] #get rid of names and source
calls_111_long.df <- gather(calls_111.df, 'loc', 'call_count', 2:8)
calls_111_long.df$calls <- as.numeric(calls_111_long.df$call_count)

head(calls_111_long.df)


```
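
The tidyr documentation marks `gather()` as superseded by `pivot_longer()`. As a possible alternative, and not the approach this repository uses, the same wide-to-long reshape could be sketched as follows. The `toy_calls*` names and values are made up for illustration.

```{r}
library(tidyr)

# Made-up wide table: one date column plus one column per location
toy_calls.df <- data.frame(
  date = c("2020-03-06", "2020-03-13"),
  `Region A` = c("10", "20"),
  `Region B` = c("30", "40"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Every column except date becomes a (loc, call_count) pair
toy_calls_long.df <- pivot_longer(
  toy_calls.df,
  cols = -date,
  names_to = "loc",
  values_to = "call_count"
)
toy_calls_long.df$calls <- as.numeric(toy_calls_long.df$call_count)
```

Note that `pivot_longer()` returns a tibble and orders rows by date first rather than by location, so the row order differs from the `gather()` result.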

```{r}
library(ggplot2)
library(gridExtra)

p2 <- ggplot(calls_111_long.df) +
  aes(x=date, y=calls, group=loc, color=loc) +
  geom_line()

p2

```

## Serialize data
```{r}
saveRDS(calls_111.df, "NHS_Regions_Datasets/nhs_regions_111_calls.Rds")
saveRDS(deaths.df, "NHS_Regions_Datasets/nhs_regions_deaths.Rds")

```

The wide data frames are serialized for use by other projects. Known uses:

* [https://codatmo.github.io/Liverpool/model_reproduction_checklist.html](https://codatmo.github.io/Liverpool/model_reproduction_checklist.html)
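
A consuming project can load a serialized table back with `readRDS()`. The sketch below uses a made-up data frame and a temporary file so that it runs on its own; the `toy_*` names are illustrative, and a real consumer would pass the path of one of the `.Rds` files written above instead.

```{r}
# Made-up data frame standing in for one of the serialized wide tables
toy_saved.df <- data.frame(
  week = c("2020-03-06", "2020-03-13"),
  `Region A` = c("1", "2"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Round trip through a temporary file
toy_path <- tempfile(fileext = ".Rds")
saveRDS(toy_saved.df, toy_path)
toy_restored.df <- readRDS(toy_path)

# Column names survive the round trip, but the values are still text,
# so convert them before doing arithmetic or plotting
toy_restored.df$`Region A` <- as.numeric(toy_restored.df$`Region A`)
```

The files saved above hold the wide tables (`calls_111.df` and `deaths.df`), whose columns have not been converted to numbers or dates, so a consumer would likely need the same conversion step.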