<!DOCTYPE html>
<html lang="" xml:lang="">
<head>
<meta charset="utf-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<title> 5 | Qiime2 workflow | Osburn Lab Protocols</title>
<meta name="description" content="This is a collection of protocols for the Osburn Lab." />
<meta name="generator" content="bookdown 0.19 and GitBook 2.6.7" />
<meta property="og:title" content=" 5 | Qiime2 workflow | Osburn Lab Protocols" />
<meta property="og:type" content="book" />
<meta property="og:description" content="This is a collection of protocols for the Osburn Lab." />
<meta name="github-repo" content="rstudio/osburnlab/protocols" />
<meta name="twitter:card" content="summary" />
<meta name="twitter:title" content=" 5 | Qiime2 workflow | Osburn Lab Protocols" />
<meta name="twitter:description" content="This is a collection of protocols for the Osburn Lab." />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="apple-mobile-web-app-capable" content="yes" />
<meta name="apple-mobile-web-app-status-bar-style" content="black" />
<link rel="prev" href="create-a-protocol.html"/>
<link rel="next" href="quest-tutorial.html"/>
<script src="libs/jquery-2.2.3/jquery.min.js"></script>
<link href="libs/gitbook-2.6.7/css/style.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-table.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-bookdown.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-highlight.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-search.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-fontsettings.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-clipboard.css" rel="stylesheet" />
<style type="text/css">
a.sourceLine { display: inline-block; line-height: 1.25; }
a.sourceLine { pointer-events: none; color: inherit; text-decoration: inherit; }
a.sourceLine:empty { height: 1.2em; }
.sourceCode { overflow: visible; }
code.sourceCode { white-space: pre; position: relative; }
pre.sourceCode { margin: 0; }
@media screen {
div.sourceCode { overflow: auto; }
}
@media print {
code.sourceCode { white-space: pre-wrap; }
a.sourceLine { text-indent: -1em; padding-left: 1em; }
}
pre.numberSource a.sourceLine
{ position: relative; left: -4em; }
pre.numberSource a.sourceLine::before
{ content: attr(title);
position: relative; left: -1em; text-align: right; vertical-align: baseline;
border: none; pointer-events: all; display: inline-block;
-webkit-touch-callout: none; -webkit-user-select: none;
-khtml-user-select: none; -moz-user-select: none;
-ms-user-select: none; user-select: none;
padding: 0 4px; width: 4em;
}
pre.numberSource { margin-left: 3em; padding-left: 4px; }
div.sourceCode
{ color: #cccccc; background-color: #303030; }
@media screen {
a.sourceLine::before { text-decoration: underline; }
}
code span.al { color: #ffcfaf; } /* Alert */
code span.an { color: #7f9f7f; font-weight: bold; } /* Annotation */
code span.at { } /* Attribute */
code span.bn { color: #dca3a3; } /* BaseN */
code span.bu { } /* BuiltIn */
code span.cf { color: #f0dfaf; } /* ControlFlow */
code span.ch { color: #dca3a3; } /* Char */
code span.cn { color: #dca3a3; font-weight: bold; } /* Constant */
code span.co { color: #7f9f7f; } /* Comment */
code span.cv { color: #7f9f7f; font-weight: bold; } /* CommentVar */
code span.do { color: #7f9f7f; } /* Documentation */
code span.dt { color: #dfdfbf; } /* DataType */
code span.dv { color: #dcdccc; } /* DecVal */
code span.er { color: #c3bf9f; } /* Error */
code span.ex { } /* Extension */
code span.fl { color: #c0bed1; } /* Float */
code span.fu { color: #efef8f; } /* Function */
code span.im { } /* Import */
code span.in { color: #7f9f7f; font-weight: bold; } /* Information */
code span.kw { color: #f0dfaf; } /* Keyword */
code span.op { color: #f0efd0; } /* Operator */
code span.ot { color: #efef8f; } /* Other */
code span.pp { color: #ffcfaf; font-weight: bold; } /* Preprocessor */
code span.sc { color: #dca3a3; } /* SpecialChar */
code span.ss { color: #cc9393; } /* SpecialString */
code span.st { color: #cc9393; } /* String */
code span.va { } /* Variable */
code span.vs { color: #cc9393; } /* VerbatimString */
code span.wa { color: #7f9f7f; font-weight: bold; } /* Warning */
</style>
<link rel="stylesheet" href="style.css" type="text/css" />
<link rel="stylesheet" href="font-awesome.min.css" type="text/css" />
</head>
<body>
<div class="book without-animation with-summary font-size-2 font-family-1" data-basepath=".">
<div class="book-summary">
<nav role="navigation">
<ul class="summary">
<li>
<a href="./">
<img class="logo" src="images/LabLogo_White-01.png" height="50">
</a>
</li>
<li class="divider"></li>
<li class="chapter" data-level="1" data-path="index.html"><a href="index.html"><i class="fa fa-check"></i><b>1</b> | About</a></li>
<li class="chapter" data-level="2" data-path="data-access.html"><a href="data-access.html"><i class="fa fa-check"></i><b>2</b> | Data Access + Storage</a><ul>
<li class="chapter" data-level="2.1" data-path="data-access.html"><a href="data-access.html#macos-users"><i class="fa fa-check"></i><b>2.1</b> MacOS Users</a></li>
<li class="chapter" data-level="2.2" data-path="data-access.html"><a href="data-access.html#windows-users"><i class="fa fa-check"></i><b>2.2</b> Windows Users</a></li>
</ul></li>
<li class="chapter" data-level="3" data-path="version-control.html"><a href="version-control.html"><i class="fa fa-check"></i><b>3</b> | Version Control</a></li>
<li class="chapter" data-level="4" data-path="create-a-protocol.html"><a href="create-a-protocol.html"><i class="fa fa-check"></i><b>4</b> | Create a Protocol</a></li>
<li class="chapter" data-level="5" data-path="qiime2-workflow.html"><a href="qiime2-workflow.html"><i class="fa fa-check"></i><b>5</b> | Qiime2 workflow</a><ul>
<li class="chapter" data-level="5.1" data-path="qiime2-workflow.html"><a href="qiime2-workflow.html#import-data"><i class="fa fa-check"></i><b>5.1</b> Import data</a></li>
<li class="chapter" data-level="5.2" data-path="qiime2-workflow.html"><a href="qiime2-workflow.html#demultiplexing"><i class="fa fa-check"></i><b>5.2</b> Demultiplexing</a></li>
<li class="chapter" data-level="5.3" data-path="qiime2-workflow.html"><a href="qiime2-workflow.html#denoising-and-asv-generation"><i class="fa fa-check"></i><b>5.3</b> Denoising and ASV generation</a></li>
<li class="chapter" data-level="5.4" data-path="qiime2-workflow.html"><a href="qiime2-workflow.html#taxonomy"><i class="fa fa-check"></i><b>5.4</b> Taxonomy</a></li>
<li class="chapter" data-level="5.5" data-path="qiime2-workflow.html"><a href="qiime2-workflow.html#taxa-barplots-and-diversity-analyses-in-qiime2"><i class="fa fa-check"></i><b>5.5</b> Taxa barplots and diversity analyses in Qiime2</a></li>
</ul></li>
<li class="chapter" data-level="6" data-path="quest-tutorial.html"><a href="quest-tutorial.html"><i class="fa fa-check"></i><b>6</b> | Quest tutorial</a><ul>
<li class="chapter" data-level="6.1" data-path="quest-tutorial.html"><a href="quest-tutorial.html#getting-acquainted-with-quest"><i class="fa fa-check"></i><b>6.1</b> Getting acquainted with Quest</a></li>
<li class="chapter" data-level="6.2" data-path="quest-tutorial.html"><a href="quest-tutorial.html#using-qiime2-on-quest"><i class="fa fa-check"></i><b>6.2</b> Using Qiime2 on Quest</a></li>
<li class="chapter" data-level="6.3" data-path="quest-tutorial.html"><a href="quest-tutorial.html#best-practices-in-a-shared-computing-environment"><i class="fa fa-check"></i><b>6.3</b> Best practices in a shared computing environment</a></li>
<li class="chapter" data-level="6.4" data-path="quest-tutorial.html"><a href="quest-tutorial.html#interactive-jobs-on-quest"><i class="fa fa-check"></i><b>6.4</b> Interactive jobs on Quest</a></li>
<li class="chapter" data-level="6.5" data-path="quest-tutorial.html"><a href="quest-tutorial.html#batch-jobs-on-quest"><i class="fa fa-check"></i><b>6.5</b> Batch jobs on Quest</a></li>
<li class="chapter" data-level="6.6" data-path="quest-tutorial.html"><a href="quest-tutorial.html#a-note-on-partitions"><i class="fa fa-check"></i><b>6.6</b> A note on partitions</a></li>
<li class="chapter" data-level="6.7" data-path="quest-tutorial.html"><a href="quest-tutorial.html#more-information"><i class="fa fa-check"></i><b>6.7</b> More information</a></li>
</ul></li>
<li class="chapter" data-level="7" data-path="DAPI.html"><a href="DAPI.html"><i class="fa fa-check"></i><b>7</b> | DAPI + Cell Counting</a><ul>
<li class="chapter" data-level="7.1" data-path="DAPI.html"><a href="DAPI.html#sample-prep"><i class="fa fa-check"></i><b>7.1</b> Sample prep</a></li>
<li class="chapter" data-level="7.2" data-path="DAPI.html"><a href="DAPI.html#operating-the-microscope"><i class="fa fa-check"></i><b>7.2</b> Operating the Microscope</a></li>
<li class="chapter" data-level="7.3" data-path="DAPI.html"><a href="DAPI.html#cell-counting"><i class="fa fa-check"></i><b>7.3</b> Cell Counting</a></li>
<li class="chapter" data-level="7.4" data-path="DAPI.html"><a href="DAPI.html#troubleshooting"><i class="fa fa-check"></i><b>7.4</b> Troubleshooting</a></li>
</ul></li>
<li class="chapter" data-level="8" data-path="submit-sequence-data.html"><a href="submit-sequence-data.html"><i class="fa fa-check"></i><b>8</b> | Submit Sequence Data</a><ul>
<li class="chapter" data-level="8.1" data-path="submit-sequence-data.html"><a href="submit-sequence-data.html#new-submission"><i class="fa fa-check"></i><b>8.1</b> New Submission</a></li>
<li class="chapter" data-level="8.2" data-path="submit-sequence-data.html"><a href="submit-sequence-data.html#subission-type"><i class="fa fa-check"></i><b>8.2</b> 1 Submission Type</a></li>
<li class="chapter" data-level="8.3" data-path="submit-sequence-data.html"><a href="submit-sequence-data.html#submitter"><i class="fa fa-check"></i><b>8.3</b> 2 Submitter</a></li>
<li class="chapter" data-level="8.4" data-path="submit-sequence-data.html"><a href="submit-sequence-data.html#sequencing-technology"><i class="fa fa-check"></i><b>8.4</b> 3 Sequencing Technology</a></li>
<li class="chapter" data-level="8.5" data-path="submit-sequence-data.html"><a href="submit-sequence-data.html#sequences"><i class="fa fa-check"></i><b>8.5</b> 4 Sequences</a></li>
<li class="chapter" data-level="8.6" data-path="submit-sequence-data.html"><a href="submit-sequence-data.html#sequence-processing"><i class="fa fa-check"></i><b>8.6</b> 5 Sequence Processing</a></li>
<li class="chapter" data-level="8.7" data-path="submit-sequence-data.html"><a href="submit-sequence-data.html#source-info"><i class="fa fa-check"></i><b>8.7</b> 6 Source Info</a></li>
<li class="chapter" data-level="8.8" data-path="submit-sequence-data.html"><a href="submit-sequence-data.html#bioproject-info"><i class="fa fa-check"></i><b>8.8</b> 7 BioProject Info</a></li>
<li class="chapter" data-level="8.9" data-path="submit-sequence-data.html"><a href="submit-sequence-data.html#biosample-type"><i class="fa fa-check"></i><b>8.9</b> 8 BioSample Type</a></li>
<li class="chapter" data-level="8.10" data-path="submit-sequence-data.html"><a href="submit-sequence-data.html#biosample-attributes"><i class="fa fa-check"></i><b>8.10</b> 9 BioSample Attributes</a></li>
<li class="chapter" data-level="8.11" data-path="submit-sequence-data.html"><a href="submit-sequence-data.html#references"><i class="fa fa-check"></i><b>8.11</b> 10 References</a></li>
<li class="chapter" data-level="8.12" data-path="submit-sequence-data.html"><a href="submit-sequence-data.html#review-submit"><i class="fa fa-check"></i><b>8.12</b> 11 Review &amp; Submit</a></li>
<li class="chapter" data-level="8.13" data-path="submit-sequence-data.html"><a href="submit-sequence-data.html#correcting-submission"><i class="fa fa-check"></i><b>8.13</b> Correcting Submission</a></li>
</ul></li>
<li class="chapter" data-level="9" data-path="metabolic.html"><a href="metabolic.html"><i class="fa fa-check"></i><b>9</b> | Functional Gene Annotation with METABOLIC</a></li>
<li class="divider"></li>
<li class="social">
<a target="blank" href="https://github.com/osburnlab" class="icon fa-github"></a>
<a target="blank" href="https://twitter.com/osburnlab" class="icon fa-twitter"></a>
</li>
</ul>
</nav>
</div>
<div class="book-body">
<div class="body-inner">
<div class="book-header" role="navigation">
<h1>
<i class="fa fa-circle-o-notch fa-spin"></i><a href="./">Osburn Lab Protocols</a>
</h1>
</div>
<div class="page-wrapper" tabindex="-1" role="main">
<div class="page-inner">
<section class="normal" id="section-">
<div id="qiime2-workflow" class="section level1">
<h1><span class="header-section-number"> 5</span> | Qiime2 workflow</h1>
<p><font size="1"><strong>Created by:</strong> Matt Selensky on 2019-11-19 <br />
<strong>Last updated:</strong> 2019-11-30 </font></p>
<h2>
Workflow for 16S amplicon sequence analysis in Qiime2
</h2>
<div id="import-data" class="section level2">
<h2><span class="header-section-number">5.1</span> Import data</h2>
<p>This protocol is designed for processing 16S rRNA gene amplicon data generated with the 515F/806R primers. Qiime2 requires us to convert the raw data the sequencing center sends us into Qiime-zipped artifacts, which carry a <code>.qza</code> extension. We must first import our data (paired-end .fastq files) into this format:</p>
<div class="sourceCode" id="cb10"><pre class="sourceCode bash"><code class="sourceCode bash"><a class="sourceLine" id="cb10-1" title="1"><span class="ex">qiime</span> tools import \</a>
<a class="sourceLine" id="cb10-2" title="2"> --type EMPPairedEndSequences \</a>
<a class="sourceLine" id="cb10-3" title="3"> --input-path emp-paired-end-sequences \</a>
<a class="sourceLine" id="cb10-4" title="4"> --output-path emp-paired-end-sequences.qza</a></code></pre></div>
<p>The above command requires three gzipped .fastq files in the folder <code>emp-paired-end-sequences</code>. One file is for the forward reads, one is for the reverse reads, and another is for the barcodes. They <em>must</em> be named <code>forward.fastq.gz</code>, <code>reverse.fastq.gz</code>, and <code>barcodes.fastq.gz</code>, respectively.</p>
<p><code>qiime tools import</code> will yield a single output, <code>emp-paired-end-sequences.qza</code>, that will contain all of the barcoded reads from every single sample submitted to the sequencing center.</p>
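<p>The naming requirement above is easy to get wrong, so it can help to check the input folder before importing. The following is a minimal sketch rather than part of Qiime2: the function name <code>check_emp_inputs</code> and the temporary demo folder are hypothetical, and the check only looks at file names, not at file contents.</p>

```shell
# Hypothetical helper (not part of Qiime2): report whether a folder holds
# the three files that the import step expects for EMP paired-end data.
check_emp_inputs() {
  dir="$1"
  missing=0
  for name in forward.fastq.gz reverse.fastq.gz barcodes.fastq.gz; do
    if [ ! -f "$dir/$name" ]; then
      echo "missing: $dir/$name"
      missing=$((missing + 1))
    fi
  done
  if [ "$missing" -eq 0 ]; then
    echo "ok: all three files found in $dir"
  fi
  # Exit status is the number of missing files (0 means ready to import).
  return "$missing"
}

# Demonstration on a throwaway folder with empty placeholder files;
# the barcodes file is left out on purpose.
demo_dir="$(mktemp -d)"
touch "$demo_dir/forward.fastq.gz" "$demo_dir/reverse.fastq.gz"
check_emp_inputs "$demo_dir" || echo "fix the names above before importing"
```

<p>In practice you would call <code>check_emp_inputs emp-paired-end-sequences</code> and only run the import once it reports that all three files are present.</p>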
</div>
<div id="demultiplexing" class="section level2">
<h2><span class="header-section-number">5.2</span> Demultiplexing</h2>
<p>At the sequencing center, DNA sequences were given a barcode specific to each sample so we can track which sample our reads in the <code>emp-paired-end-sequences.qza</code> file originated from. We do this by <em>demultiplexing</em> our sequences.</p>
<p>In Qiime2, we need to create a metadata file that contains the barcodes used for each sample. Check out this <a href="https://docs.google.com/spreadsheets/d/1y3yM50tW_23H7fXeou9XwyM92VNd8dCtgk8ndHOMSMs/edit#gid=0">example</a> from the Qiime2 documentation of how this metadata file should be formatted. The sequencing center should send a mapping file from which you can obtain the barcodes for each sample. Be sure to save the metadata file as a .tsv. Specify the barcodes and other associated metadata only for the samples you are interested in analyzing. In our lab, sequencing data is often sent back as a mix of samples from different projects. Including only your samples in the metadata file will subset the large <code>.qza</code> file and significantly cut down on computation time.</p>
<p>Because we are demultiplexing EMP paired-end sequences, we should use the <code>demux emp-paired</code> command. The metadata column that contains the barcode for each sample must be specified using the argument <code>--m-barcodes-column</code>:</p>
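<p>The metadata requirements described above can be checked before demultiplexing. This is a sketch under stated assumptions, not a replacement for Qiime2's own validation: the helper name <code>check_barcode_column</code>, the sample IDs, and the barcodes in the demo file are all made up for illustration, and the check assumes barcodes contain only the letters A, C, G, and T.</p>

```shell
# Hypothetical helper: check that a tab-separated metadata file has the
# named barcode column and that every sample row holds an A/C/G/T barcode.
check_barcode_column() {
  awk -F '\t' -v col="$2" '
    NR == 1 {
      # Locate the barcode column in the header row.
      for (i = NF; i >= 1; i--) if ($i == col) idx = i
      if (!idx) { print "column not found: " col; bad = 1; exit }
      next
    }
    NF == 0 { next }                  # skip blank lines
    /^#/ { next }                     # skip comment rows such as #q2:types
    $idx !~ /^[ACGT]+$/ { print "bad barcode for sample " $1; bad = 1 }
    END { if (!bad) print "ok"; exit bad + 0 }
  ' "$1"
}

# Demonstration with an illustrative metadata file (made-up samples).
meta="$(mktemp)"
printf 'sample-id\tbarcode-sequence\tsite\n' > "$meta"
printf '#q2:types\tcategorical\tcategorical\n' >> "$meta"
printf 'cave-1\tAGCCTTCGTCGC\tcave\n' >> "$meta"
printf 'soil-1\tTCCATACCGGAA\tsoil\n' >> "$meta"
check_barcode_column "$meta" barcode-sequence
```

<p>The second argument should match whatever column name you later pass to the demultiplexing command.</p>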
<div class="sourceCode" id="cb11"><pre class="sourceCode bash"><code class="sourceCode bash"><a class="sourceLine" id="cb11-1" title="1"><span class="ex">qiime</span> demux emp-paired \</a>
<a class="sourceLine" id="cb11-2" title="2"> --m-barcodes-file sample-metadata.tsv \</a>
<a class="sourceLine" id="cb11-3" title="3"> --m-barcodes-column barcode-sequence \</a>
<a class="sourceLine" id="cb11-4" title="4"> --i-seqs emp-paired-end-sequences.qza \</a>
<a class="sourceLine" id="cb11-5" title="5"> --o-per-sample-sequences demux.qza </a></code></pre></div>
<p>Note: if the barcodes listed in your metadata file are the reverse complement of the barcodes in the sequencing reads, you must pass the argument <code>--p-rev-comp-mapping-barcodes</code> to your <code>demux</code> command to account for this.</p>
<p>You can inspect the demultiplexed reads on the <a href="https://view.qiime2.org/">Qiime2 viewer</a> by producing a Qiime-zipped visualization file, <code>.qzv</code>, from your now-demultiplexed <code>.qza</code> output:</p>
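<p>To see what a reverse-complemented barcode looks like, a barcode can be converted by hand. The sketch below uses only <code>awk</code> and <code>tr</code>; the function name <code>revcomp</code> and the example barcode are illustrative and not taken from any real run.</p>

```shell
# Hypothetical helper: print the reverse complement of a DNA sequence.
revcomp() {
  printf '%s\n' "$1" |
    awk '{ out = ""; for (i = length($0); i >= 1; i--) out = out substr($0, i, 1); print out }' |
    tr 'ACGTacgt' 'TGCAtgca'
}

# Illustrative barcode; compare the result with the barcodes in your reads.
revcomp AGCCTTCGTCGC
```

<p>If the converted barcodes, rather than the ones in your metadata file, are what appear in <code>barcodes.fastq.gz</code>, that is a sign the flag above is needed.</p>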
<div class="sourceCode" id="cb12"><pre class="sourceCode bash"><code class="sourceCode bash"><a class="sourceLine" id="cb12-1" title="1"><span class="ex">qiime</span> demux summarize \</a>
<a class="sourceLine" id="cb12-2" title="2"> --i-data demux.qza \</a>
<a class="sourceLine" id="cb12-3" title="3"> --o-visualization demux.qzv</a></code></pre></div>
<p>From the interactive quality plot in <code>demux.qzv</code>, we can see the distribution of quality scores for each sequenced base. If analyzing paired-end data, you will see two plots: one for the forward read, and one for the reverse read. We use this visualization to decide how to trim and truncate the ends of the reads in the next denoising step, which uses <code>dada2 denoise-paired</code>.</p>
</div>
<div id="denoising-and-asv-generation" class="section level2">
<h2><span class="header-section-number">5.3</span> Denoising and ASV generation</h2>
<p>We will use the <a href="https://www.ncbi.nlm.nih.gov/pubmed/27214047">DADA2</a> algorithm to denoise our data and generate amplicon sequence variants (ASVs). DADA2 is a robust way to filter out noisy sequences, correct errors in marginal sequences, remove chimeras, remove singletons, join denoised paired-end reads, <em>and</em> dereplicate sequences. Previously, each of these functions would require separate commands, but DADA2 does it all in one. As a result, this is a particularly computationally intensive process. One should consider running <code>dada2</code> on a computer that can handle it (perhaps by accessing Northwestern’s high-performance computing cluster, <a href="https://www.it.northwestern.edu/research/user-services/quest/">Quest</a>).</p>
<div class="sourceCode" id="cb13"><pre class="sourceCode bash"><code class="sourceCode bash"><a class="sourceLine" id="cb13-1" title="1"><span class="ex">qiime</span> dada2 denoise-paired \</a>
<a class="sourceLine" id="cb13-2" title="2"> --i-demultiplexed-seqs demux.qza \</a>
<a class="sourceLine" id="cb13-3" title="3"> --p-trim-left-f 13 \</a>
<a class="sourceLine" id="cb13-4" title="4"> --p-trim-left-r 13 \</a>
<a class="sourceLine" id="cb13-5" title="5"> --p-trunc-len-f 150 \</a>
<a class="sourceLine" id="cb13-6" title="6"> --p-trunc-len-r 150 \</a>
<a class="sourceLine" id="cb13-7" title="7"> --o-table table.qza \</a>
<a class="sourceLine" id="cb13-8" title="8"> --o-representative-sequences rep-seqs.qza \</a>
<a class="sourceLine" id="cb13-9" title="9"> --o-denoising-stats denoising-stats.qza</a></code></pre></div>
<p>By inspecting the interactive <code>demux.qzv</code> file produced in the previous step on the <a href="https://view.qiime2.org/">Qiime2 viewer</a>, we observe that sequence quality scores are lower than average until base #14 in both the forward and reverse reads. We will want to trim these low-quality bases from our data. Use the <code>--p-trim-left-f</code> and <code>--p-trim-left-r</code> arguments to specify the number of nucleotides that should be trimmed from the left end of the forward and reverse reads, respectively, which we define as <code>13</code> here. Similarly, the <code>--p-trunc-len-f</code> and <code>--p-trunc-len-r</code> arguments are used to cut off the right ends of, or <em>truncate</em>, our reads at a given position. Each of our paired-end reads is 150 nucleotides long. We define the truncation length for the forward and reverse reads as <code>150</code> because we do not observe a drop-off in quality scores on their right ends in our <code>demux.qzv</code> file.</p>
<p>The output <code>rep-seqs.qza</code> contains a list of the ASVs found across all samples, and will be used in the next step of our processing workflow: assigning taxonomy.</p>
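<p>When choosing these values, it is worth confirming that the forward and reverse reads will still overlap enough to be joined. The arithmetic below is a rough sketch. The trim and truncation values come from the command above, but the amplicon length of 253 nucleotides and the minimum overlap of 12 nucleotides are assumptions made for illustration, so substitute the values that apply to your primers and your DADA2 version.</p>

```shell
# Values from the denoising command above.
trim_left_f=13
trim_left_r=13
trunc_len_f=150
trunc_len_r=150
# ASSUMPTION: approximate amplicon length for illustration; check your own data.
amplicon_len=253
# ASSUMPTION: minimum overlap needed to join reads; check your DADA2 version.
min_overlap=12

# Each filtered read keeps (truncation length - left trim) bases.
kept_f=$((trunc_len_f - trim_left_f))
kept_r=$((trunc_len_r - trim_left_r))
# Left trimming removes the outer ends of the amplicon, so only the
# truncation lengths determine how far the two reads overlap in the middle.
overlap=$((trunc_len_f + trunc_len_r - amplicon_len))
# The joined sequence loses the trimmed bases from both outer ends.
joined_len=$((amplicon_len - trim_left_f - trim_left_r))

echo "bases kept per read: $kept_f forward, $kept_r reverse"
echo "expected overlap: $overlap"
echo "expected joined length: $joined_len"
if [ "$overlap" -ge "$min_overlap" ]; then
  echo "reads should overlap enough to be joined"
else
  echo "overlap too short: truncate less aggressively"
fi
```

<p>Under these assumptions each read keeps 137 bases, the reads overlap by 47 bases, and the joined sequences are about 227 bases long. Lowering either truncation length shrinks the overlap by the same amount.</p>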
</div>
<div id="taxonomy" class="section level2">
<h2><span class="header-section-number">5.4</span> Taxonomy</h2>
<p>At this point, our ASVs lack any meaningful identification: we don’t know whether ASV ‘A’ comes from the bacterium <em>E. coli</em> or the archaeon <em>S. solfataricus</em>! We determine who is present in our samples by assigning taxonomic IDs to each “query” sequence (from <code>rep-seqs.qza</code>). We do this by comparing query sequences to a database of known reference sequences (<a href="https://www.arb-silva.de/documentation/silva-taxonomy/">Silva</a> is an excellent choice for our purposes). A major advantage of using Qiime2 is that it contains the <code>classify-sklearn</code> algorithm, which uses machine learning via Naive Bayes to classify sequences. As is the case with other machine learning applications, the classifier must be <em>trained</em>. Classifier training is required for every new reference database/amplicon pair, and would otherwise be the most resource-intensive step in our workflow by far. Luckily for us, Silva is routinely used to classify sequences coming from 16S rRNA gene amplification using the 515F/806R primers, and the Qiime2 developers provide a free, <a href="https://docs.qiime2.org/2019.10/data-resources/">pre-trained Silva classifier</a> in their documentation for just that! Download this classifier - you will need it! At the time this was written, I used <code>silva-132-99-515-806-nb-classifier-2018.qza</code>, the latest version of the pre-trained Silva classifier.</p>
<p>Even without the extra training step, it is highly recommended to run <code>classify-sklearn</code> on a high-performance computing cluster, as it is very memory intensive and slow (budget several hours or even a day for this to complete!). Please refer to our Quest tutorial to get started on how to submit jobs on Northwestern’s cluster.</p>
<div class="sourceCode" id="cb14"><pre class="sourceCode bash"><code class="sourceCode bash"><a class="sourceLine" id="cb14-1" title="1"><span class="ex">qiime</span> feature-classifier classify-sklearn \</a>
<a class="sourceLine" id="cb14-2" title="2"> --i-classifier classifier.qza \</a>
<a class="sourceLine" id="cb14-3" title="3"> --i-reads rep-seqs.qza \</a>
<a class="sourceLine" id="cb14-4" title="4"> --o-classification taxonomy.qza</a></code></pre></div>
<p>If you so choose, you can visualize the resultant taxonomy file on the <a href="https://view.qiime2.org/">Qiime 2 viewer</a> to verify that classification was successful:</p>
<div class="sourceCode" id="cb15"><pre class="sourceCode bash"><code class="sourceCode bash"><a class="sourceLine" id="cb15-1" title="1"><span class="ex">qiime</span> metadata tabulate \</a>
<a class="sourceLine" id="cb15-2" title="2"> --m-input-file taxonomy.qza \</a>
<a class="sourceLine" id="cb15-3" title="3"> --o-visualization taxonomy.qzv</a></code></pre></div>
</div>
<div id="taxa-barplots-and-diversity-analyses-in-qiime2" class="section level2">
<h2><span class="header-section-number">5.5</span> Taxa barplots and diversity analyses in Qiime2</h2>
<p>You can quickly visualize the community composition of your samples via the <code>taxa barplot</code> command. This requires the feature table produced by DADA2 (<code>table.qza</code>) and <code>taxonomy.qza</code> from the previous step as inputs. In the <a href="https://view.qiime2.org/">Qiime 2 viewer</a>, you can export the data that feeds the taxa barplot as .csv files specific to each level of taxonomic classification. Use those files in R to produce publication-quality figures.</p>
<div class="sourceCode" id="cb16"><pre class="sourceCode bash"><code class="sourceCode bash"><a class="sourceLine" id="cb16-1" title="1"><span class="ex">qiime</span> taxa barplot \</a>
<a class="sourceLine" id="cb16-2" title="2"> --i-table table.qza \</a>
<a class="sourceLine" id="cb16-3" title="3"> --i-taxonomy taxonomy.qza \</a>
<a class="sourceLine" id="cb16-4" title="4"> --m-metadata-file metadata.tsv \</a>
<a class="sourceLine" id="cb16-5" title="5"> --o-visualization taxa-bar-plots.qzv</a></code></pre></div>
<p>In fact, we don’t really want to use Qiime2 for making any sort of figure for presentations or publications, but it does have a few handy tools to quickly visualize your data to inform which types of figures you want to make. Let’s start with the built-in diversity analyses offered by Qiime2.</p>
<p>Many diversity analyses compute diversity by incorporating phylogeny. That means we have to generate a phylogenetic tree of how our sequences are related to each other! Both rooted and unrooted trees are outputs of the <code>align-to-tree-mafft-fasttree</code> command. UniFrac and Faith’s Phylogenetic Diversity require the use of a rooted tree.</p>
<div class="sourceCode" id="cb17"><pre class="sourceCode bash"><code class="sourceCode bash"><a class="sourceLine" id="cb17-1" title="1"><span class="ex">qiime</span> phylogeny align-to-tree-mafft-fasttree \</a>
<a class="sourceLine" id="cb17-2" title="2"> --i-sequences rep-seqs.qza \</a>
<a class="sourceLine" id="cb17-3" title="3"> --o-alignment aligned-rep-seqs.qza \</a>
<a class="sourceLine" id="cb17-4" title="4"> --o-masked-alignment masked-aligned-rep-seqs.qza \</a>
<a class="sourceLine" id="cb17-5" title="5"> --o-tree unrooted-tree.qza \</a>
<a class="sourceLine" id="cb17-6" title="6"> --o-rooted-tree rooted-tree.qza</a></code></pre></div>
<p>Additionally, the analyses we are about to perform will be subsampling our data to estimate diversity. This <em>rarefaction</em> is done so we can compare diversity across samples of different sizes, thereby minimizing bias. We need to know the sequencing depth we should take so we don’t miss out on too many rare sequences (by choosing too low of a depth) or too many samples themselves (by choosing too high of a depth). By making an alpha rarefaction plot that we can visualize on the Qiime2 viewer, we can make an informed decision:</p>
<div class="sourceCode" id="cb18"><pre class="sourceCode bash"><code class="sourceCode bash"><a class="sourceLine" id="cb18-1" title="1"><span class="ex">qiime</span> diversity alpha-rarefaction \</a>
<a class="sourceLine" id="cb18-2" title="2"> --i-table table.qza \</a>
<a class="sourceLine" id="cb18-3" title="3"> --i-phylogeny rooted-tree.qza \</a>
<a class="sourceLine" id="cb18-4" title="4"> --p-max-depth 20000 \</a>
<a class="sourceLine" id="cb18-5" title="5"> --m-metadata-file metadata.tsv \</a>
<a class="sourceLine" id="cb18-6" title="6"> --o-visualization alpha-rarefaction.qzv</a></code></pre></div>
<p>From this visualization, you should choose a sequencing depth at which the number of observed ASVs (labeled as observed OTUs in the plot) levels off for most samples, without excluding too many samples. In our example, we will choose a depth of 6500.</p>
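<p>The trade-off behind this choice can be made concrete by counting how many samples would survive at a few candidate depths. This is a minimal sketch that assumes you have a tab-separated file of sample IDs and read counts. The helper name <code>samples_kept</code> and the five read counts are invented for illustration and do not come from a real run.</p>

```shell
# Hypothetical helper: given tab-separated sample/read-count lines,
# report how many samples have at least the requested number of reads.
samples_kept() {
  awk -F '\t' -v depth="$2" '
    $2 + 0 >= depth { kept++ }
    END { print (kept + 0) " of " NR " samples kept at depth " depth }
  ' "$1"
}

# Demonstration with made-up per-sample read counts.
counts="$(mktemp)"
printf 'cave-1\t21000\n' > "$counts"
printf 'cave-2\t9800\n' >> "$counts"
printf 'soil-1\t6600\n' >> "$counts"
printf 'soil-2\t6400\n' >> "$counts"
printf 'blank-1\t1200\n' >> "$counts"

# Compare a few candidate sampling depths.
for depth in 1000 6500 20000; do
  samples_kept "$counts" "$depth"
done
```

<p>Samples with fewer reads than the chosen depth are dropped from the core diversity analyses, so a depth that keeps most samples while sitting on the flat part of the rarefaction curves is usually a reasonable compromise.</p>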
<p>After determining the degree of rarefaction, we can compute core diversity metrics in Qiime2:</p>
<div class="sourceCode" id="cb19"><pre class="sourceCode bash"><code class="sourceCode bash"><a class="sourceLine" id="cb19-1" title="1"><span class="ex">qiime</span> diversity core-metrics-phylogenetic \</a>
<a class="sourceLine" id="cb19-2" title="2"> --i-phylogeny rooted-tree.qza \</a>
<a class="sourceLine" id="cb19-3" title="3"> --i-table table.qza \</a>
<a class="sourceLine" id="cb19-4" title="4"> --p-sampling-depth 6500 \</a>
<a class="sourceLine" id="cb19-5" title="5"> --m-metadata-file metadata.tsv \</a>
<a class="sourceLine" id="cb19-6" title="6"> --output-dir core-metrics-results</a></code></pre></div>
<p>The <code>core-metrics-results</code> folder will contain both alpha and beta diversity metrics. For each metric, you can test whether diversity differs significantly between groups of samples using <code>diversity alpha-group-significance</code> or <code>diversity beta-group-significance</code>:</p>
<div class="sourceCode" id="cb20"><pre class="sourceCode bash"><code class="sourceCode bash"><a class="sourceLine" id="cb20-1" title="1"><span class="ex">qiime</span> diversity alpha-group-significance \</a>
<a class="sourceLine" id="cb20-2" title="2"> --i-alpha-diversity core-metrics-results/alpha-div-metric-of-interest.qza \</a>
<a class="sourceLine" id="cb20-3" title="3"> --m-metadata-file metadata.tsv \</a>
<a class="sourceLine" id="cb20-4" title="4"> --o-visualization core-metrics-results/metric-group-significance.qzv</a></code></pre></div>
<div class="sourceCode" id="cb21"><pre class="sourceCode bash"><code class="sourceCode bash"><a class="sourceLine" id="cb21-1" title="1"><span class="ex">qiime</span> diversity beta-group-significance \</a>
<a class="sourceLine" id="cb21-2" title="2"> --i-distance-matrix core-metrics-results/unweighted_unifrac_distance_matrix.qza \</a>
<a class="sourceLine" id="cb21-3" title="3"> --m-metadata-file metadata.tsv \</a>
<a class="sourceLine" id="cb21-4" title="4"> --m-metadata-column comparisonofinterest \</a>
<a class="sourceLine" id="cb21-5" title="5"> --o-visualization core-metrics-results/unweighted-unifrac-comparisonofinterest-significance.qzv</a></code></pre></div>
<p>Beta diversity visualizations can be viewed via Qiime2 View’s Emperor, which offers an interactive three-dimensional platform to explore relationships in your data.</p>
</div>
</div>
</section>
</div>
</div>
</div>
<a href="create-a-protocol.html" class="navigation navigation-prev " aria-label="Previous page"><i class="fa fa-angle-left"></i></a>
<a href="quest-tutorial.html" class="navigation navigation-next " aria-label="Next page"><i class="fa fa-angle-right"></i></a>
</div>
</div>
<script src="libs/gitbook-2.6.7/js/app.min.js"></script>
<script src="libs/gitbook-2.6.7/js/lunr.js"></script>
<script src="libs/gitbook-2.6.7/js/clipboard.min.js"></script>
<script src="libs/gitbook-2.6.7/js/plugin-search.js"></script>
<script src="libs/gitbook-2.6.7/js/plugin-sharing.js"></script>
<script src="libs/gitbook-2.6.7/js/plugin-fontsettings.js"></script>
<script src="libs/gitbook-2.6.7/js/plugin-bookdown.js"></script>
<script src="libs/gitbook-2.6.7/js/jquery.highlight.js"></script>
<script src="libs/gitbook-2.6.7/js/plugin-clipboard.js"></script>
<script>
gitbook.require(["gitbook"], function(gitbook) {
gitbook.start({
"sharing": null,
"fontsettings": {
"theme": "white",
"family": "sans",
"size": 2
},
"edit": {
"link": "https://github.com/OsburnLab/Protocols/edit/master/04-qiime2.Rmd",
"text": "Edit"
},
"history": {
"link": null,
"text": null
},
"view": {
"link": null,
"text": null
},
"download": ["bookdown-demo.pdf", "bookdown-demo.epub"],
"toc": {
"collapse": "subsection"
}
});
});
</script>
</body>
</html>