
Occupancy Modeling, Maximum Contig Size Probabilities and Designing Metagenomics Experiments



Bibliographic Details
Main author: Stanhope, Stephen A.
Format: Text
Language: English
Published: Public Library of Science 2010
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2912229/
https://www.ncbi.nlm.nih.gov/pubmed/20686599
http://dx.doi.org/10.1371/journal.pone.0011652
author Stanhope, Stephen A.
collection PubMed
description Mathematical aspects of coverage and gaps in genome assembly have received substantial attention by bioinformaticians. Typical problems under consideration suppose that reads can be experimentally obtained from a single genome and that the number of reads will be set to cover a large percentage of that genome at a desired depth. In metagenomics experiments genomes from multiple species are simultaneously analyzed and obtaining large numbers of reads per genome is unlikely. We propose the probability of obtaining at least one contig of a desired minimum size from each novel genome in the pool without restriction based on depth of coverage as a metric for metagenomic experimental design. We derive an approximation to the distribution of maximum contig size for single genome assemblies using relatively few reads. This approximation is verified in simulation studies and applied to a number of different metagenomic experimental design problems, ranging in difficulty from detecting a single novel genome in a pool of known species to detecting each of a random number of novel genomes collectively sized and with abundances corresponding to given distributions in a single pool.
format Text
id pubmed-2912229
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-2912229; 2010-08-03; Stanhope, Stephen A.; PLoS One; Research Article; Public Library of Science; 2010-07-29; Text; en; Stephen A. Stanhope. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Occupancy Modeling, Maximum Contig Size Probabilities and Designing Metagenomics Experiments
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2912229/
https://www.ncbi.nlm.nih.gov/pubmed/20686599
http://dx.doi.org/10.1371/journal.pone.0011652