Genome sequence-based species delimitation with confidence intervals and improved distance functions
BACKGROUND: For the last 25 years species delimitation in prokaryotes (Archaea and Bacteria) was to a large extent based on DNA-DNA hybridization (DDH), a tedious lab procedure designed in the early 1970s that served its purpose astonishingly well in the absence of deciphered genome sequences. With...
Main authors: | Meier-Kolthoff, Jan P; Auch, Alexander F; Klenk, Hans-Peter; Göker, Markus |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2013 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3665452/ https://www.ncbi.nlm.nih.gov/pubmed/23432962 http://dx.doi.org/10.1186/1471-2105-14-60 |
_version_ | 1782271251756089344 |
---|---|
author | Meier-Kolthoff, Jan P; Auch, Alexander F; Klenk, Hans-Peter; Göker, Markus |
author_sort | Meier-Kolthoff, Jan P |
collection | PubMed |
description | BACKGROUND: For the last 25 years species delimitation in prokaryotes (Archaea and Bacteria) was to a large extent based on DNA-DNA hybridization (DDH), a tedious lab procedure designed in the early 1970s that served its purpose astonishingly well in the absence of deciphered genome sequences. With the rapid progress in genome sequencing, the time has come to directly use the now available and easy-to-generate genome sequences for delimitation of species. GBDP (Genome Blast Distance Phylogeny) infers genome-to-genome distances between pairs of entirely or partially sequenced genomes, a digital, highly reliable estimator for the relatedness of genomes. Its application as an in-silico replacement for DDH was recently introduced. The main challenge in the implementation of such an application is to produce digital DDH values that mimic the wet-lab DDH values as closely as possible to ensure consistency in the prokaryotic species concept. RESULTS: Correlation and regression analyses were used to determine the best-performing methods and the most influential parameters. GBDP was further enriched with a set of new features such as confidence intervals for intergenomic distances obtained via resampling or via the statistical models for DDH prediction and an additional family of distance functions. As in previous analyses, GBDP obtained the highest agreement with wet-lab DDH among all tested methods, but improved models led to a further increase in the accuracy of DDH prediction. Confidence intervals yielded stable results when inferred from the statistical models, whereas those obtained via resampling showed marked differences between the underlying distance functions. CONCLUSIONS: Despite the high accuracy of GBDP-based DDH prediction, inferences from limited empirical data are always associated with a certain degree of uncertainty. It is thus crucial to enrich in-silico DDH replacements with confidence-interval estimation, enabling the user to statistically evaluate the outcomes. Such methodological advancements, easily accessible through the web service at http://ggdc.dsmz.de, are crucial steps towards a consistent and truly genome sequence-based classification of microorganisms. |
format | Online Article Text |
id | pubmed-3665452 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3665452 2013-06-05 Genome sequence-based species delimitation with confidence intervals and improved distance functions Meier-Kolthoff, Jan P; Auch, Alexander F; Klenk, Hans-Peter; Göker, Markus BMC Bioinformatics Methodology Article BioMed Central 2013-02-21 /pmc/articles/PMC3665452/ /pubmed/23432962 http://dx.doi.org/10.1186/1471-2105-14-60 Text en Copyright © 2013 Meier-Kolthoff et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Genome sequence-based species delimitation with confidence intervals and improved distance functions |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3665452/ https://www.ncbi.nlm.nih.gov/pubmed/23432962 http://dx.doi.org/10.1186/1471-2105-14-60 |
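
The description above mentions confidence intervals for intergenomic distances obtained via resampling. The following is a minimal sketch of that general idea only, not the GBDP implementation: it assumes a hypothetical list of local alignments given as (identities, length) pairs, a simple identity-based distance chosen for illustration, and a percentile bootstrap over those alignments. The function names, the input data, and the choice of distance formula are all assumptions made for this example.

```python
import random


def identity_distance(hsps):
    """Illustrative distance: 1 - (identical positions / aligned length).

    `hsps` is a hypothetical list of (identities, length) pairs; this is an
    assumed stand-in, not one of the GBDP distance functions.
    """
    total_identities = sum(identities for identities, _ in hsps)
    total_length = sum(length for _, length in hsps)
    return 1.0 - total_identities / total_length


def bootstrap_ci(hsps, n_replicates=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the distance above."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    replicates = []
    for _ in range(n_replicates):
        # Resample the alignments with replacement, same sample size.
        sample = [rng.choice(hsps) for _ in hsps]
        replicates.append(identity_distance(sample))
    replicates.sort()
    lower_index = int((alpha / 2) * n_replicates)
    upper_index = min(n_replicates - 1, int((1 - alpha / 2) * n_replicates))
    return replicates[lower_index], replicates[upper_index]


# Made-up alignment data, for illustration only.
hsps = [(950, 1000), (480, 520), (1900, 2000), (700, 800), (290, 300)]
point = identity_distance(hsps)
low, high = bootstrap_ci(hsps)
print(f"distance = {point:.4f}, 95% interval = [{low:.4f}, {high:.4f}]")
```

With so few alignments the interval is wide. That matches the record's conclusion that inferences from limited data carry uncertainty which should be reported alongside the point estimate.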