Optimization and performance testing of a sequence processing pipeline applied to detection of nonindigenous species

Genetic taxonomic assignment can be more sensitive than morphological taxonomic assignment, particularly for small, cryptic or rare species. Sequence processing is essential to taxonomic assignment, but can also produce errors because optimal parameters are not known a priori. Here, we explored how...

Bibliographic Details
Main authors: Scott, Ryan; Zhan, Aibin; Brown, Emily A.; Chain, Frédéric J. J.; Cristescu, Melania E.; Gras, Robin; MacIsaac, Hugh J.
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5999198/
https://www.ncbi.nlm.nih.gov/pubmed/29928298
http://dx.doi.org/10.1111/eva.12604
author Scott, Ryan
Zhan, Aibin
Brown, Emily A.
Chain, Frédéric J. J.
Cristescu, Melania E.
Gras, Robin
MacIsaac, Hugh J.
collection PubMed
description Genetic taxonomic assignment can be more sensitive than morphological taxonomic assignment, particularly for small, cryptic or rare species. Sequence processing is essential to taxonomic assignment, but can also produce errors because optimal parameters are not known a priori. Here, we explored how sequence processing parameters influence taxonomic assignment of 18S sequences from bulk zooplankton samples produced by 454 pyrosequencing. We optimized a sequence processing pipeline for two common research goals, estimation of species richness and early detection of aquatic invasive species (AIS), and then tested most optimal models’ performances through simulations. We tested 1,050 parameter sets on 18S sequences from 20 AIS to determine optimal parameters for each research goal. We tested optimized pipelines’ performances (detectability and sensitivity) by computationally inoculating sequences of 20 AIS into ten bulk zooplankton samples from ports across Canada. We found that optimal parameter selection generally depends on the research goal. However, regardless of research goal, we found that metazoan 18S sequences produced by 454 pyrosequencing should be trimmed to 375–400 bp and sequence quality filtering should be relaxed (1.5 ≤ maximum expected error ≤ 3.0, Phred score = 10). Clustering and denoising were only viable for estimating species richness, because these processing steps made some species undetectable at low sequence abundances which would not be useful for early detection of AIS. With parameter sets optimized for early detection of AIS, 90% of AIS were detected with fewer than 11 target sequences, regardless of whether clustering or denoising was used. Despite developments in next‐generation sequencing, sequence processing remains an important issue owing to difficulties in balancing false‐positive and false‐negative errors in metabarcoding data.
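The trimming and quality-filtering recommendation in the description above (trim to 375–400 bp, maximum expected error between 1.5 and 3.0) can be illustrated with a minimal Python sketch. It relies on the standard definition of expected errors as the sum of per-base error probabilities, p = 10^(-Q/10). The function names, the default values, and the choice to discard reads shorter than the trim length are assumptions made for illustration and are not taken from the authors' pipeline; the Phred score = 10 setting mentioned in the description is not modelled here.

```python
def expected_errors(phred_scores):
    """Return the expected number of errors in a read.

    Each Phred score Q corresponds to an error probability
    p = 10 ** (-Q / 10); the expected error count is the sum of p.
    """
    return sum(10 ** (-q / 10) for q in phred_scores)


def trim_and_filter(seq, phred_scores, trim_len=400, max_ee=3.0):
    """Truncate a read to trim_len bases and apply an expected-error filter.

    Returns the trimmed (sequence, scores) pair, or None if the read
    is rejected. trim_len and max_ee are hypothetical defaults chosen
    from within the ranges quoted in the description.
    """
    if len(seq) != len(phred_scores):
        raise ValueError("sequence and quality lengths differ")
    # Assumption: reads shorter than the trim length are discarded.
    if len(seq) < trim_len:
        return None
    seq = seq[:trim_len]
    phred_scores = phred_scores[:trim_len]
    # Reject reads whose expected error count exceeds the threshold.
    if expected_errors(phred_scores) > max_ee:
        return None
    return seq, phred_scores


if __name__ == "__main__":
    read = "ACGT" * 110                     # 440 bp toy read
    good = trim_and_filter(read, [30] * len(read))
    bad = trim_and_filter(read, [20] * len(read))
    print(good is not None, bad is None)
```

In this toy example a 400 bp read with uniform Q30 has about 0.4 expected errors and passes, whereas uniform Q20 gives about 4.0 expected errors and is rejected under a threshold of 3.0.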
format Online
Article
Text
id pubmed-5999198
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-5999198 2018-06-20 Optimization and performance testing of a sequence processing pipeline applied to detection of nonindigenous species Scott, Ryan Zhan, Aibin Brown, Emily A. Chain, Frédéric J. J. Cristescu, Melania E. Gras, Robin MacIsaac, Hugh J. Evol Appl Original Articles Genetic taxonomic assignment can be more sensitive than morphological taxonomic assignment, particularly for small, cryptic or rare species. Sequence processing is essential to taxonomic assignment, but can also produce errors because optimal parameters are not known a priori. Here, we explored how sequence processing parameters influence taxonomic assignment of 18S sequences from bulk zooplankton samples produced by 454 pyrosequencing. We optimized a sequence processing pipeline for two common research goals, estimation of species richness and early detection of aquatic invasive species (AIS), and then tested most optimal models’ performances through simulations. We tested 1,050 parameter sets on 18S sequences from 20 AIS to determine optimal parameters for each research goal. We tested optimized pipelines’ performances (detectability and sensitivity) by computationally inoculating sequences of 20 AIS into ten bulk zooplankton samples from ports across Canada. We found that optimal parameter selection generally depends on the research goal. However, regardless of research goal, we found that metazoan 18S sequences produced by 454 pyrosequencing should be trimmed to 375–400 bp and sequence quality filtering should be relaxed (1.5 ≤ maximum expected error ≤ 3.0, Phred score = 10). Clustering and denoising were only viable for estimating species richness, because these processing steps made some species undetectable at low sequence abundances which would not be useful for early detection of AIS. With parameter sets optimized for early detection of AIS, 90% of AIS were detected with fewer than 11 target sequences, regardless of whether clustering or denoising was used. Despite developments in next‐generation sequencing, sequence processing remains an important issue owing to difficulties in balancing false‐positive and false‐negative errors in metabarcoding data. John Wiley and Sons Inc. 2018-02-20 /pmc/articles/PMC5999198/ /pubmed/29928298 http://dx.doi.org/10.1111/eva.12604 Text en © 2018 The Authors. Evolutionary Applications published by John Wiley & Sons Ltd. This is an open access article under the terms of the http://creativecommons.org/licenses/by/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
title Optimization and performance testing of a sequence processing pipeline applied to detection of nonindigenous species
topic Original Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5999198/
https://www.ncbi.nlm.nih.gov/pubmed/29928298
http://dx.doi.org/10.1111/eva.12604