mapDamage2.0: fast approximate Bayesian estimates of ancient DNA damage parameters
Motivation: Ancient DNA (aDNA) molecules in fossilized bones and teeth, coprolites, sediments, mummified specimens and museum collections represent fantastic sources of information for evolutionary biologists, revealing the agents of past epidemics and the dynamics of past populations. However, the...
Main authors: | Jónsson, Hákon; Ginolhac, Aurélien; Schubert, Mikkel; Johnson, Philip L. F.; Orlando, Ludovic |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2013 |
Subjects: | Applications Notes |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3694634/ https://www.ncbi.nlm.nih.gov/pubmed/23613487 http://dx.doi.org/10.1093/bioinformatics/btt193 |
_version_ | 1782274877809491968 |
---|---|
author | Jónsson, Hákon; Ginolhac, Aurélien; Schubert, Mikkel; Johnson, Philip L. F.; Orlando, Ludovic |
author_sort | Jónsson, Hákon |
collection | PubMed |
description | Motivation: Ancient DNA (aDNA) molecules in fossilized bones and teeth, coprolites, sediments, mummified specimens and museum collections represent fantastic sources of information for evolutionary biologists, revealing the agents of past epidemics and the dynamics of past populations. However, the analysis of aDNA generally faces two major issues. Firstly, sequences consist of a mixture of endogenous and various exogenous backgrounds, mostly microbial. Secondly, high nucleotide misincorporation rates can be observed as a result of severe post-mortem DNA damage. Such misincorporation patterns are instrumental to authenticate ancient sequences versus modern contaminants. We recently developed the user-friendly mapDamage package that identifies such patterns from next-generation sequencing (NGS) sequence datasets. The absence of formal statistical modeling of the DNA damage process, however, precluded rigorous quantitative comparisons across samples. Results: Here, we describe mapDamage 2.0 that extends the original features of mapDamage by incorporating a statistical model of DNA damage. Assuming that damage events depend only on sequencing position and post-mortem deamination, our Bayesian statistical framework provides estimates of four key features of aDNA molecules: the average length of overhangs (λ), nick frequency (ν) and cytosine deamination rates in both double-stranded regions (δ_d) and overhangs (δ_s). Our model enables rescaling base quality scores according to their probability of being damaged. mapDamage 2.0 handles NGS datasets with ease and is compatible with a wide range of DNA library protocols. Availability: mapDamage 2.0 is available at ginolhac.github.io/mapDamage/ as a Python package and documentation is maintained at the Centre for GeoGenetics Web site (geogenetics.ku.dk/publications/mapdamage2.0/). Contact: jonsson.hakon@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-3694634 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-3694634 2013-06-27 Bioinformatics Applications Notes Oxford University Press 2013-07-01 2013-04-23 /pmc/articles/PMC3694634/ /pubmed/23613487 http://dx.doi.org/10.1093/bioinformatics/btt193 Text en © The Author 2013. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | mapDamage2.0: fast approximate Bayesian estimates of ancient DNA damage parameters |
topic | Applications Notes |