Divide and Conquer: Enriching Environmental Sequencing Data

BACKGROUND: In environmental sequencing projects, a mix of DNA from a whole microbial community is fragmented and sequenced, with one of the possible goals being to reconstruct partial or complete genomes of members of the community. In communities with high diversity of species, a significant propo...

Bibliographic Details
Main Authors: Bergeron, Anne, Belcaid, Mahdi, Steward, Grieg F., Poisson, Guylaine
Format: Text
Language: English
Published: Public Library of Science 2007
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1952108/
https://www.ncbi.nlm.nih.gov/pubmed/17786202
http://dx.doi.org/10.1371/journal.pone.0000830
author Bergeron, Anne
Belcaid, Mahdi
Steward, Grieg F.
Poisson, Guylaine
author_sort Bergeron, Anne
collection PubMed
description BACKGROUND: In environmental sequencing projects, a mix of DNA from a whole microbial community is fragmented and sequenced, with one of the possible goals being to reconstruct partial or complete genomes of members of the community. In communities with high diversity of species, a significant proportion of the sequences do not overlap any other fragment in the sample. This problem will arise not only in situations with a relatively even distribution of many species, but also when the community in a particular environment is routinely dominated by the same few species. In the former case, no genomes may be assembled at all, while in the latter case a few dominant species in an environment will always be sequenced at high coverage to the detriment of coverage of the greater number of sparse species. METHODS AND RESULTS: Here we show that, with the same global sequencing effort, separating the species into two or more sub-communities prior to sequencing can yield a much higher proportion of sequences that can be assembled. We first use the Lander-Waterman model to show that, if the expected percentage of singleton sequences is higher than 25%, then, under the uniform distribution hypothesis, splitting the community is always a wise choice. We then construct simulated microbial communities to show that the results hold for highly non-uniform distributions. We also show that, for the distributions considered in the experiments, it is possible to estimate quite accurately the relative diversity of the two sub-communities. CONCLUSION: Given the fact that several methods exist to split microbial communities based on physical properties such as size, density, surface biochemistry, or optical properties, we strongly suggest that groups involved in environmental sequencing, and expecting high diversity, consider splitting their communities in order to maximize the information content of their sequencing effort.
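The 25% threshold in the description can be illustrated with a small calculation. The sketch below is not taken from the article. It assumes the simplified Lander-Waterman result that a read overlaps no other read with probability e^(-2c) at coverage c (minimum overlap length ignored), a uniform community, and a split into two sub-communities that each receive half of the reads. The function names and the parameter p (the share of total genome length in the first sub-community) are hypothetical.

```python
import math


def singleton_fraction(coverage: float) -> float:
    """Expected fraction of singleton reads at the given coverage.

    Assumption: simplified Lander-Waterman model, e^(-2c), with the
    minimum detectable overlap treated as negligible.
    """
    return math.exp(-2.0 * coverage)


def split_singleton_fraction(coverage: float, p: float) -> float:
    """Expected singleton fraction after splitting into two sub-communities.

    Assumptions (hypothetical setup, not confirmed by the record):
    - sub-community A holds a share p of the total genome length,
      sub-community B holds the remaining 1 - p;
    - each sub-community receives half of the total reads.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    coverage_a = coverage / (2.0 * p)          # half the reads on a share p
    coverage_b = coverage / (2.0 * (1.0 - p))  # half the reads on a share 1 - p
    return 0.5 * (singleton_fraction(coverage_a) + singleton_fraction(coverage_b))


if __name__ == "__main__":
    for c in (0.5, math.log(2.0), 1.5):
        unsplit = singleton_fraction(c)
        worst_split = max(split_singleton_fraction(c, k / 100.0) for k in range(1, 100))
        print(f"coverage={c:.3f} unsplit={unsplit:.4f} worst split={worst_split:.4f}")
```

Under these assumptions the limiting case p close to 0 gives a split singleton fraction of about e^(-c)/2, which stays below the unsplit value e^(-2c) exactly when c is at most ln 2, that is, when the unsplit singleton fraction is at least 25%. At higher coverage a very uneven split can increase the singleton fraction, which is consistent with the threshold quoted in the abstract.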
format Text
id pubmed-1952108
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-1952108 2007-09-05 Divide and Conquer: Enriching Environmental Sequencing Data Bergeron, Anne Belcaid, Mahdi Steward, Grieg F. Poisson, Guylaine PLoS One Research Article Public Library of Science 2007-09-05 /pmc/articles/PMC1952108/ /pubmed/17786202 http://dx.doi.org/10.1371/journal.pone.0000830 Text en Bergeron et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Divide and Conquer: Enriching Environmental Sequencing Data
title_sort divide and conquer: enriching environmental sequencing data
topic Research Article