Massively parallel tag sequencing reveals the complexity of anaerobic marine protistan communities

BACKGROUND: Recent advances in sequencing strategies make possible unprecedented depth and scale of sampling for molecular detection of microbial diversity. Two major paradigm-shifting discoveries include the detection of bacterial diversity that is one to two orders of magnitude greater than previo...

Bibliographic Details
Main authors: Stoeck, Thorsten; Behnke, Anke; Christen, Richard; Amaral-Zettler, Linda; Rodriguez-Mora, Maria J; Chistoserdov, Andrei; Orsi, William; Edgcomb, Virginia P
Format: Text
Language: English
Published: BioMed Central 2009
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2777867/
https://www.ncbi.nlm.nih.gov/pubmed/19886985
http://dx.doi.org/10.1186/1741-7007-7-72
author Stoeck, Thorsten
Behnke, Anke
Christen, Richard
Amaral-Zettler, Linda
Rodriguez-Mora, Maria J
Chistoserdov, Andrei
Orsi, William
Edgcomb, Virginia P
collection PubMed
description BACKGROUND: Recent advances in sequencing strategies make possible unprecedented depth and scale of sampling for molecular detection of microbial diversity. Two major paradigm-shifting discoveries include the detection of bacterial diversity that is one to two orders of magnitude greater than previous estimates, and the discovery of an exciting 'rare biosphere' of molecular signatures ('species') of poorly understood ecological significance. We applied a high-throughput parallel tag sequencing (454 sequencing) protocol adopted for eukaryotes to investigate protistan community complexity in two contrasting anoxic marine ecosystems (Framvaren Fjord, Norway; Cariaco deep-sea basin, Venezuela). Both sampling sites have previously been scrutinized for protistan diversity by traditional clone library construction and Sanger sequencing. By comparing these clone library data with 454 amplicon library data, we assess the efficiency of high-throughput tag sequencing strategies. We here present a novel, highly conservative bioinformatic analysis pipeline for the processing of large tag sequence data sets. RESULTS: The analyses of ca. 250,000 sequence reads revealed that the number of detected Operational Taxonomic Units (OTUs) far exceeded previous richness estimates from the same sites based on clone libraries and Sanger sequencing. More than 90% of this diversity was represented by OTUs with less than 10 sequence tags. We detected a substantial number of taxonomic groups like Apusozoa, Chrysomerophytes, Centroheliozoa, Eustigmatophytes, hyphochytriomycetes, Ichthyosporea, Oikomonads, Phaeothamniophytes, and rhodophytes which remained undetected by previous clone library-based diversity surveys of the sampling sites. 
The most important innovations in our newly developed bioinformatics pipeline employ (i) BLASTN with query parameters adjusted for highly variable domains and a complete database of public ribosomal RNA (rRNA) gene sequences for taxonomic assignments of tags; (ii) a clustering of tags at k differences (Levenshtein distance) with a newly developed algorithm enabling very fast OTU clustering for large tag sequence data sets; and (iii) a novel parsing procedure to combine the data from individual analyses. CONCLUSION: Our data highlight the magnitude of the under-sampled 'protistan gap' in the eukaryotic tree of life. This study illustrates that our current understanding of the ecological complexity of protist communities, and of the global species richness and genome diversity of protists, is severely limited. Even though 454 pyrosequencing is not a panacea, it allows for more comprehensive insights into the diversity of protistan communities, and combined with appropriate statistical tools, enables improved ecological interpretations of the data and projections of global diversity.
format Text
id pubmed-2777867
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2777867 2009-11-17 BMC Biol Research Article BioMed Central 2009-11-03 /pmc/articles/PMC2777867/ /pubmed/19886985 http://dx.doi.org/10.1186/1741-7007-7-72 Text en Copyright © 2009 Stoeck et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Massively parallel tag sequencing reveals the complexity of anaerobic marine protistan communities
topic Research Article