Multivariate Cutoff Level Analysis (MultiCoLA) of large community data sets

High-throughput sequencing techniques are becoming attractive to molecular biologists and ecologists as they provide a time- and cost-effective way to explore diversity patterns in environmental samples at an unprecedented resolution. An issue common to many studies is the definition of what fractio...

Bibliographic Details
Main Authors: Gobet, Angélique; Quince, Christopher; Ramette, Alban
Format: Text
Language: English
Published: Oxford University Press 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2926624/
https://www.ncbi.nlm.nih.gov/pubmed/20547594
http://dx.doi.org/10.1093/nar/gkq545
author Gobet, Angélique
Quince, Christopher
Ramette, Alban
collection PubMed
description High-throughput sequencing techniques are becoming attractive to molecular biologists and ecologists as they provide a time- and cost-effective way to explore diversity patterns in environmental samples at an unprecedented resolution. An issue common to many studies is the definition of what fractions of a data set should be considered as rare or dominant. Yet this question has neither been satisfactorily addressed, nor has the impact of such definition on data set structure and interpretation been fully evaluated. Here we propose a strategy, MultiCoLA (Multivariate Cutoff Level Analysis), to systematically assess the impact of various abundance or rarity cutoff levels on the resulting data set structure and on the consistency of the further ecological interpretation. We applied MultiCoLA to a 454 massively parallel tag sequencing data set of V6 ribosomal sequences from marine microbes in temperate coastal sands. Consistent ecological patterns were maintained after removing up to 35–40% rare sequences and similar patterns of beta diversity were observed after denoising the data set by using a preclustering algorithm of 454 flowgrams. This example validates the importance of exploring the impact of the definition of rarity in large community data sets. Future applications can be foreseen for data sets from different types of habitats, e.g. other marine environments, soil and human microbiota.
format Text
id pubmed-2926624
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-2926624 2010-08-30 Multivariate Cutoff Level Analysis (MultiCoLA) of large community data sets Gobet, Angélique; Quince, Christopher; Ramette, Alban Nucleic Acids Res Methods Online Oxford University Press 2010-08 2010-06-14 /pmc/articles/PMC2926624/ /pubmed/20547594 http://dx.doi.org/10.1093/nar/gkq545 Text en © The Author(s) 2010. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Multivariate Cutoff Level Analysis (MultiCoLA) of large community data sets
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2926624/
https://www.ncbi.nlm.nih.gov/pubmed/20547594
http://dx.doi.org/10.1093/nar/gkq545