
Microbial comparative pan-genomics using binomial mixture models

BACKGROUND: The size of the core- and pan-genome of bacterial species is a topic of increasing interest due to the growing number of sequenced prokaryote genomes, many from the same species. Attempts to estimate these quantities have been made, using regression methods or mixture models. We extend t...


Bibliographic Details
Main Authors: Snipen, Lars, Almøy, Trygve, Ussery, David W
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2907702/
https://www.ncbi.nlm.nih.gov/pubmed/19691844
http://dx.doi.org/10.1186/1471-2164-10-385
author Snipen, Lars
Almøy, Trygve
Ussery, David W
collection PubMed
description BACKGROUND: The size of the core- and pan-genome of bacterial species is a topic of increasing interest due to the growing number of sequenced prokaryote genomes, many from the same species. Attempts to estimate these quantities have been made, using regression methods or mixture models. We extend the latter approach by using statistical ideas developed for capture-recapture problems in ecology and epidemiology. RESULTS: We estimate core- and pan-genome sizes for 16 different bacterial species. The results reveal a complex dependency structure for most species, manifested as heterogeneous detection probabilities. Estimated pan-genome sizes range from small (around 2600 gene families) in Buchnera aphidicola to large (around 43000 gene families) in Escherichia coli. Results for Escherichia coli show that as more data become available, a larger diversity is estimated, indicating an extensive pool of rarely occurring genes in the population. CONCLUSION: Analyzing pan-genomics data with binomial mixture models is a way to handle dependencies between genomes, which we find are always present. A bottleneck in the estimation procedure is the annotation of rarely occurring genes.
format Text
id pubmed-2907702
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2907702 2010-07-22 Microbial comparative pan-genomics using binomial mixture models Snipen, Lars Almøy, Trygve Ussery, David W BMC Genomics Methodology Article BACKGROUND: The size of the core- and pan-genome of bacterial species is a topic of increasing interest due to the growing number of sequenced prokaryote genomes, many from the same species. Attempts to estimate these quantities have been made, using regression methods or mixture models. We extend the latter approach by using statistical ideas developed for capture-recapture problems in ecology and epidemiology. RESULTS: We estimate core- and pan-genome sizes for 16 different bacterial species. The results reveal a complex dependency structure for most species, manifested as heterogeneous detection probabilities. Estimated pan-genome sizes range from small (around 2600 gene families) in Buchnera aphidicola to large (around 43000 gene families) in Escherichia coli. Results for Escherichia coli show that as more data become available, a larger diversity is estimated, indicating an extensive pool of rarely occurring genes in the population. CONCLUSION: Analyzing pan-genomics data with binomial mixture models is a way to handle dependencies between genomes, which we find are always present. A bottleneck in the estimation procedure is the annotation of rarely occurring genes. BioMed Central 2009-08-19 /pmc/articles/PMC2907702/ /pubmed/19691844 http://dx.doi.org/10.1186/1471-2164-10-385 Text en Copyright ©2009 Snipen et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Microbial comparative pan-genomics using binomial mixture models
topic Methodology Article