
Toward accurate molecular identification of species in complex environmental samples: testing the performance of sequence filtering and clustering methods

Metabarcoding has the potential to become a rapid, sensitive, and effective approach for identifying species in complex environmental samples. Accurate molecular identification of species depends on the ability to generate operational taxonomic units (OTUs) that correspond to biological species. Due...


Bibliographic Details
Main authors: Flynn, Jullien M, Brown, Emily A, Chain, Frédéric J J, MacIsaac, Hugh J, Cristescu, Melania E
Format: Online Article Text
Language: English
Published: BlackWell Publishing Ltd 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4461425/
https://www.ncbi.nlm.nih.gov/pubmed/26078860
http://dx.doi.org/10.1002/ece3.1497
author Flynn, Jullien M
Brown, Emily A
Chain, Frédéric J J
MacIsaac, Hugh J
Cristescu, Melania E
collection PubMed
description Metabarcoding has the potential to become a rapid, sensitive, and effective approach for identifying species in complex environmental samples. Accurate molecular identification of species depends on the ability to generate operational taxonomic units (OTUs) that correspond to biological species. Due to the sometimes enormous estimates of biodiversity using this method, there is a great need to test the efficacy of data analysis methods used to derive OTUs. Here, we evaluate the performance of various methods for clustering length variable 18S amplicons from complex samples into OTUs using a mock community and a natural community of zooplankton species. We compare analytic procedures consisting of a combination of (1) stringent and relaxed data filtering, (2) singleton sequences included and removed, (3) three commonly used clustering algorithms (mothur, UCLUST, and UPARSE), and (4) three methods of treating alignment gaps when calculating sequence divergence. Depending on the combination of methods used, the number of OTUs varied by nearly two orders of magnitude for the mock community (60–5068 OTUs) and three orders of magnitude for the natural community (22–22191 OTUs). The use of relaxed filtering and the inclusion of singletons greatly inflated OTU numbers without increasing the ability to recover species. Our results also suggest that the method used to treat gaps when calculating sequence divergence can have a great impact on the number of OTUs. Our findings are particularly relevant to studies that cover taxonomically diverse species and employ markers such as rRNA genes in which length variation is extensive.
format Online
Article
Text
id pubmed-4461425
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BlackWell Publishing Ltd
record_format MEDLINE/PubMed
spelling pubmed-4461425 2015-06-15 Toward accurate molecular identification of species in complex environmental samples: testing the performance of sequence filtering and clustering methods Flynn, Jullien M Brown, Emily A Chain, Frédéric J J MacIsaac, Hugh J Cristescu, Melania E Ecol Evol Original Research BlackWell Publishing Ltd 2015-06 2015-05-13 /pmc/articles/PMC4461425/ /pubmed/26078860 http://dx.doi.org/10.1002/ece3.1497 Text en © 2015 The Authors. Ecology and Evolution published by John Wiley & Sons Ltd. http://creativecommons.org/licenses/by/4.0/ This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
title Toward accurate molecular identification of species in complex environmental samples: testing the performance of sequence filtering and clustering methods
topic Original Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4461425/
https://www.ncbi.nlm.nih.gov/pubmed/26078860
http://dx.doi.org/10.1002/ece3.1497
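
The abstract reports that the way alignment gaps are treated when calculating sequence divergence can strongly affect the number of OTUs. The following Python sketch illustrates why this matters for length-variable markers. It is not the authors' pipeline: the record does not name the three gap treatments they tested, so the three conventions below (ignore gapped columns, count every gapped column, count each contiguous gap run once), the function name `pairwise_distance`, and the mode names are all assumptions chosen for illustration.

```python
def pairwise_distance(a: str, b: str, gap_mode: str = "each") -> float:
    """Uncorrected distance between two aligned sequences of equal length.

    gap_mode values are hypothetical labels, not taken from the paper:
      "ignore": columns where exactly one sequence has a gap are skipped
      "each":   every such column counts as one compared site and one difference
      "run":    each contiguous run of such columns counts as one site and one difference
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if gap_mode not in {"ignore", "each", "run"}:
        raise ValueError(f"unknown gap_mode: {gap_mode!r}")

    diffs = 0        # number of differences counted
    compared = 0     # number of sites counted in the denominator
    gap_side = None  # which sequence ("a" or "b") owns the current gap run

    for x, y in zip(a, b):
        x_gap, y_gap = x == "-", y == "-"
        if x_gap and y_gap:
            continue  # a gap shared by both sequences carries no information
        if x_gap or y_gap:
            side = "a" if x_gap else "b"
            if gap_mode == "each":
                diffs += 1
                compared += 1
            elif gap_mode == "run" and side != gap_side:
                # Only the first column of a run is counted.
                diffs += 1
                compared += 1
            gap_side = side
            continue
        # Ordinary column: both sequences have a base here.
        gap_side = None
        compared += 1
        if x != y:
            diffs += 1

    if compared == 0:
        raise ValueError("no comparable columns")
    return diffs / compared


# Toy aligned pair: one 4-column indel plus one substitution.
seq_a = "ACGT----ACGT"
seq_b = "ACGTTTTTACGA"

for mode in ("ignore", "each", "run"):
    print(mode, round(pairwise_distance(seq_a, seq_b, mode), 3))
```

For this toy pair the three conventions give 1/8, 5/12, and 2/9. The same two reads could therefore fall on either side of a fixed divergence threshold depending only on the gap convention, which is one way a single indel can split or merge OTUs.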