---
title: "Example for Traitstrap"
author: "Aud H. Halbritter"
date: "4/6/2021"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)

library("traitstrap")
library("tidyverse")
```

```{r loadd, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```


## Organize your data

To use traitstrap you need two datasets:

* one dataset with the abundance, biomass or size of each species in your community. This is a measure of dominance that is used to weight the traits.
* one dataset with the traits for each species (or for as many species and individuals as you have data for) in your community.

The datasets need to be organized in a **tidy** and **long** format. Let's look at how these two datasets should be structured.

This is the **community data**. It has a column called **Taxon** with all the species, a column **Cover** with the cover of each species per plot, and some columns describing the **hierarchy** (e.g. site and plot).
Note that cover can be replaced by any other measure of dominance.
Also, the hierarchy can have more levels, e.g. block, gradient, region etc.
Finally, the taxon can also include a hierarchy, e.g. taxon, genus, family.

```{r comm-data, echo=FALSE, eval=TRUE}
community
```

The **trait data** is organized in a similar way and needs to have the columns: **Taxon**, **Trait**, **Value**, and the **hierarchy**.
The Taxon and hierarchy columns need to correspond to those in the community data.
The Trait column contains the different traits and the Value column contains the trait values.

```{r trait-data, echo=FALSE, eval=TRUE}
trait
```
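As a rough illustration of the layout described above, the two tables could be built by hand as follows. This is only a sketch: the object names `community_toy` and `trait_toy`, the taxa, the trait names and all values are made up and are not part of traitstrap or of the data printed above.

```r
# Toy community data: one row per taxon per plot (all values are invented).
community_toy <- data.frame(
  Site   = c("A", "A", "A", "B"),
  PlotID = c("A1", "A1", "A2", "B1"),
  Taxon  = c("Carex bigelowii", "Salix herbacea", "Carex bigelowii", "Salix herbacea"),
  Cover  = c(30, 10, 25, 40)
)

# Toy trait data: one row per measurement, with all traits stacked in long format.
trait_toy <- data.frame(
  Site   = c("A", "A", "A", "B"),
  PlotID = c("A1", "A1", "A2", "B1"),
  Taxon  = c("Carex bigelowii", "Carex bigelowii", "Carex bigelowii", "Salix herbacea"),
  Trait  = c("Leaf_area_cm2", "Height_cm", "Leaf_area_cm2", "Leaf_area_cm2"),
  Value  = c(2.1, 12.5, 1.8, 0.9)
)

# The taxon and hierarchy columns carry the same names in both tables.
shared_cols <- intersect(names(community_toy), names(trait_toy))
shared_cols
```

Checking the shared column names like this is a quick way to confirm that the taxon and hierarchy columns line up before imputing.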


## Trait imputation

The **trait_impute** function uses a hierarchical sampling design, which allows it to account for incomplete trait collections, traits from different spatial or temporal levels (e.g. local traits vs. databases) and/or experimental designs.

The first two mandatory arguments of the function are the two datasets:
**comm** and **traits**.

Next you need to define four columns in your datasets:

* **abundance_col** is the column in your community data that holds the measure of dominance of your species. This can be abundance, cover, biomass, size, etc.
* **taxon_col** is the column in your community and trait data that defines the species names.
* **trait_col** is the column in your trait data that defines the traits.
* **value_col** is the column in your trait data that defines the trait values.

All other arguments of trait_impute are optional.

With **scale_hierarchy** you can define the levels at which the traits have been collected and their order, starting with the highest level (e.g. global database, region, site, block, plot).
In the example below we have **Site** and **PlotID**.

The trait_impute function chooses a trait value from the lowest level if one is available, e.g. species X from plot A. If no trait is available at that level, it moves up the hierarchy and chooses a trait for species X from plot B at the same site.
If no trait is available for species X at the same site, it chooses a trait value from another site.

The argument **min_n_in_sample** lets users define the minimum number of samples required at each level for the trait imputation.
If the minimum number is not reached, trait values from the next level up are also used. This avoids sampling the same individual several times, which would result in unrealistic variances.
The default value is 5.
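The fallback logic described above can be sketched in base R. This is a simplified illustration under stated assumptions, not traitstrap's implementation: the helper `pick_trait_values()`, the toy data and the three fixed levels (plot, site, all data) are hypothetical.

```r
# Simplified illustration of hierarchical fallback; not traitstrap's own code.
pick_trait_values <- function(traits, taxon, site, plot, min_n = 5) {
  same_taxon <- traits[traits$Taxon == taxon, ]
  # Candidate pools, from the lowest level (plot) to the highest (all data).
  pools <- list(
    PlotID = same_taxon$Value[same_taxon$Site == site & same_taxon$PlotID == plot],
    Site   = same_taxon$Value[same_taxon$Site == site],
    Global = same_taxon$Value
  )
  # Use the first pool that holds at least min_n values.
  for (level in names(pools)) {
    if (length(pools[[level]]) >= min_n) {
      return(list(level = level, values = pools[[level]]))
    }
  }
  # No pool reaches min_n: fall back to everything available for the taxon.
  list(level = "Global", values = pools$Global)
}

# Invented measurements of one trait for one taxon.
toy_traits <- data.frame(
  Site   = c("A", "A", "A", "A", "B", "B"),
  PlotID = c("A1", "A1", "A2", "A2", "B1", "B1"),
  Taxon  = "Carex bigelowii",
  Value  = c(2.1, 2.4, 1.8, 2.0, 3.1, 2.9)
)

res <- pick_trait_values(toy_traits, "Carex bigelowii",
                         site = "A", plot = "A1", min_n = 3)
res$level
```

Here plot A1 holds only two values, so with `min_n = 3` the sketch moves up to the site level and pools the four values from site A.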

In the **other_col** argument you can define columns in the trait dataset that are not needed for the trait imputation but that you want to keep.


```{r trait-impute, echo=TRUE, eval=FALSE}

trait_imputation <- trait_impute(
    comm = community,
    traits = trait,

    abundance_col = "Cover",
    taxon_col = "Taxon",
    trait_col = "Trait",
    value_col = "Value",

    scale_hierarchy = c("Site", "PlotID"),
    min_n_in_sample = 3
  )

trait_imputation
```


There are two more options for defining a hierarchy.
With **taxon_col** you can define a hierarchy for the taxonomy.
If traits for a specific species are not available, traits from the same genus will be imputed.

The argument **treatment_col** allows you to incorporate an experimental design where traits are preferably imputed from the same experimental treatment (e.g. control vs. treatment). The level at which this applies can be defined using the **treatment_level** argument (e.g. site).

Including all arguments would look like this:

```{r trait-impute2, echo=TRUE, eval=FALSE}

trait_imputation2 <- trait_impute(
    comm = community,
    traits = trait,

    abundance_col = "Cover",
    taxon_col = c("Taxon", "Genus"),

    trait_col = "Trait",
    value_col = "Value",

    scale_hierarchy = c("Site", "PlotID"),
    min_n_in_sample = 3,

    treatment_col = "Treatment",
    treatment_level = "Site"
  )
```


## Non-parametric bootstrapping

The output of the trait imputation can be passed to the **trait_np_bootstrap** function to perform **non-parametric bootstrapping**.
You also have to define **nrep**, the number of trait distributions that are generated.
After generating the distributions, the function calculates the first four statistical moments of each: **mean**, **variance**, **skewness** and **kurtosis**.


```{r non-parap-boot, echo=TRUE, eval=FALSE}

np_bootstrapped_moments <- trait_np_bootstrap(
  trait_imputation,
  nrep = 200
  )

np_bootstrapped_moments
```
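To make the idea concrete, the following base R sketch resamples trait values with probabilities proportional to cover and computes the four moments for each replicate. It is an illustration only, not what trait_np_bootstrap does internally: the helper `moments()`, the weighting scheme (a taxon's cover split evenly across its values) and all data are assumptions.

```r
set.seed(1)

# Invented imputed trait values for one plot and the cover of each taxon.
taxon  <- c("sp1", "sp1", "sp1", "sp2", "sp2", "sp3")
values <- c(2.1, 2.4, 1.8, 0.9, 1.1, 3.0)
cover  <- c(sp1 = 30, sp2 = 10, sp3 = 5)

# Weight of each value: its taxon's cover divided by that taxon's number of values.
weights <- unname(cover[taxon]) / ave(values, taxon, FUN = length)

# Mean, variance, skewness and kurtosis of one resampled distribution.
moments <- function(x) {
  m  <- mean(x)
  s2 <- mean((x - m)^2)  # population variance, used to standardise
  c(mean     = m,
    variance = var(x),
    skewness = mean((x - m)^3) / s2^1.5,
    kurtosis = mean((x - m)^4) / s2^2)
}

nrep        <- 200  # number of bootstrapped distributions
sample_size <- 100  # values drawn per distribution

# One row of moments per replicate.
boot <- t(replicate(
  nrep,
  moments(sample(values, sample_size, replace = TRUE, prob = weights))
))
dim(boot)
```

Each row of `boot` holds the moments of one resampled distribution, so the spread across rows reflects the uncertainty in each moment.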


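Finally, with **trait_summarise_boot_moments** the moments can be summarized and confidence intervals calculated.
With **sd_mult** you can define which confidence interval is calculated. A multiplier of 1.96 corresponds to an approximate 95% interval when the bootstrapped moments are close to normally distributed.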

```{r summarize, echo=TRUE, eval=FALSE}

sum_boot_moment <- trait_summarise_boot_moments(
  np_bootstrapped_moments,
  sd_mult = 1.96
  )

sum_boot_moment
```
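The role of a standard-deviation multiplier can be sketched as follows. The sketch assumes that the interval is the mean of the bootstrapped values plus or minus `sd_mult` times their standard deviation. This is an assumption based on the argument name, so check the package documentation for the exact definition. The bootstrapped means are simulated.

```r
set.seed(42)

# Simulated bootstrapped means for one plot and one trait.
boot_means <- rnorm(200, mean = 2.0, sd = 0.1)

sd_mult <- 1.96  # roughly a 95% interval if the replicates are near-normal

# Assumed interval: mean of the replicates +/- sd_mult * their standard deviation.
summary_mean <- c(
  mean    = mean(boot_means),
  ci_low  = mean(boot_means) - sd_mult * sd(boot_means),
  ci_high = mean(boot_means) + sd_mult * sd(boot_means)
)
summary_mean
```

A smaller multiplier such as 1 would give a narrower interval of one standard deviation on either side of the mean.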


## Parametric bootstrapping

Traitstrap also features two functions to allow for **parametric bootstrapping**:

The **trait_fit_distributions** function fits parametric distributions for each species-by-trait combination at the finest scale of the user-supplied hierarchy.
This function takes as input:

1) an object of class imputed traits (as produced by the function trait_impute), and
2) the type of distribution to be fitted.

Either a single distribution type can be used for all traits, or traits can be assigned specific distribution types by supplying the function with a named list of traits.
Currently supported distribution types are normal, log-normal, and beta.
The function returns a dataframe containing the fitted distribution parameters.

```{r fit-dist, echo=TRUE, eval=FALSE}

fitted_distributions <- trait_fit_distributions(
  imputed_traits = trait_imputation,
  distribution_type = "lognormal"
  )

fitted_distributions
```
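For intuition, fitting a log-normal distribution to a single species-by-trait combination can be sketched in base R. This shows the closed-form maximum-likelihood estimates and is not necessarily the fitting routine traitstrap uses. The measurements are simulated.

```r
set.seed(7)

# Simulated leaf-area measurements for one taxon in one plot.
leaf_area <- rlnorm(50, meanlog = 0.5, sdlog = 0.3)

# For a log-normal, the maximum-likelihood estimates are the mean and the
# population standard deviation of the log-transformed values.
log_x   <- log(leaf_area)
meanlog <- mean(log_x)
sdlog   <- sqrt(mean((log_x - meanlog)^2))

c(meanlog = meanlog, sdlog = sdlog)
```

The two estimated parameters are all that is needed to draw new values for this taxon with `rlnorm()`.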


The **trait_parametric_bootstrap** function is a parametric analog of the trait_np_bootstrap function.
It takes the fitted trait distributions produced by **trait_fit_distributions** and randomly samples from the fitted distributions in proportion to species abundances.
As with trait_np_bootstrap, the number of samples per replicated draw is specified with the parameter **sample_size**, and the number of replicated draws with the parameter **nrep**.
The output of trait_parametric_bootstrap can be summarized using **trait_summarise_boot_moments** (see above).

```{r para-boot, echo=TRUE, eval=FALSE}

p_bootstrapped_moments <- trait_parametric_bootstrap(
    fitted_distributions = fitted_distributions,
    nrep = 200
    )

p_bootstrapped_moments
```
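Sampling from fitted distributions in proportion to abundance can be sketched like this. It is a minimal base R illustration, not the package's implementation: the table `fitted`, its parameters and the cover values are invented, and only the mean is computed for each replicate.

```r
set.seed(3)

# Invented fitted log-normal parameters and cover for three taxa in one plot.
fitted <- data.frame(
  Taxon   = c("sp1", "sp2", "sp3"),
  Cover   = c(30, 10, 5),
  meanlog = c(0.7, 0.0, 1.1),
  sdlog   = c(0.2, 0.3, 0.1)
)

nrep        <- 200  # number of replicated draws
sample_size <- 100  # samples per replicated draw

boot_means <- replicate(nrep, {
  # Pick a taxon for each sample with probability proportional to its cover ...
  idx <- sample.int(nrow(fitted), sample_size, replace = TRUE, prob = fitted$Cover)
  # ... then draw one trait value from that taxon's fitted distribution.
  draws <- rlnorm(sample_size, meanlog = fitted$meanlog[idx], sdlog = fitted$sdlog[idx])
  mean(draws)
})

length(boot_means)
```

Because taxa are picked by cover before a value is drawn, dominant taxa contribute most of the values in each replicate.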


## Check your data

Traitstrap has a couple of functions to check your data.

The **coverage_plot** function shows the trait coverage of the community for each level.
In short, it summarizes from which level the traits are imputed and how much of the community is covered.
This can be an important plot to show, because the coverage should be around X% (ref to the paper!).

```{r coverage-plot, echo=TRUE, eval=FALSE}

autoplot(trait_imputation)

```
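The quantity behind such a plot can be sketched by hand as the share of each plot's total cover that belongs to taxa with trait data. This is an illustration with invented data, and it ignores the breakdown by hierarchy level that the plot shows.

```r
# Invented community data and the set of taxa that have trait measurements.
community_toy <- data.frame(
  PlotID = c("A1", "A1", "A1", "B1", "B1"),
  Taxon  = c("sp1", "sp2", "sp3", "sp1", "sp4"),
  Cover  = c(30, 10, 5, 20, 20)
)
taxa_with_traits <- c("sp1", "sp2")

# Share of each plot's total cover that belongs to taxa with trait data.
has_trait <- community_toy$Taxon %in% taxa_with_traits
covered   <- tapply(community_toy$Cover * has_trait, community_toy$PlotID, sum)
total     <- tapply(community_toy$Cover, community_toy$PlotID, sum)
coverage  <- covered / total
round(coverage, 2)
```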


It is also important to know for which taxa traits are missing.
Traitstrap has a function, **trait_missing**, which gives you this overview.


```{r missing-traits, echo=TRUE, eval=FALSE}

trait_missing(trait_impute = trait_imputation,
              comm = community)

```
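A comparable overview can be approximated in base R by listing the community taxa that never appear in the trait data, together with their highest cover. This is only a sketch with invented data, and it does not reproduce the output format of trait_missing.

```r
# Invented community data and the taxa present in the trait data.
community_toy <- data.frame(
  PlotID = c("A1", "A1", "A1", "B1", "B1"),
  Taxon  = c("sp1", "sp2", "sp3", "sp3", "sp4"),
  Cover  = c(30, 10, 5, 15, 20)
)
trait_taxa <- c("sp1", "sp2", "sp5")

# Community rows whose taxon has no trait measurements at all.
missing_rows <- community_toy[!community_toy$Taxon %in% trait_taxa, ]

# Highest cover reached by each missing taxon, to judge how much it matters.
missing_summary <- aggregate(Cover ~ Taxon, data = missing_rows, FUN = max)
missing_summary
```

Missing taxa with high cover are the ones most worth measuring, since they carry the most weight in the community.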