
mapDamage2.0: fast approximate Bayesian estimates of ancient DNA damage parameters

Motivation: Ancient DNA (aDNA) molecules in fossilized bones and teeth, coprolites, sediments, mummified specimens and museum collections represent fantastic sources of information for evolutionary biologists, revealing the agents of past epidemics and the dynamics of past populations. However, the...


Bibliographic Details
Main authors: Jónsson, Hákon, Ginolhac, Aurélien, Schubert, Mikkel, Johnson, Philip L. F., Orlando, Ludovic
Format: Online Article Text
Language: English
Published: Oxford University Press 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3694634/
https://www.ncbi.nlm.nih.gov/pubmed/23613487
http://dx.doi.org/10.1093/bioinformatics/btt193
author Jónsson, Hákon
Ginolhac, Aurélien
Schubert, Mikkel
Johnson, Philip L. F.
Orlando, Ludovic
collection PubMed
description Motivation: Ancient DNA (aDNA) molecules in fossilized bones and teeth, coprolites, sediments, mummified specimens and museum collections represent fantastic sources of information for evolutionary biologists, revealing the agents of past epidemics and the dynamics of past populations. However, the analysis of aDNA generally faces two major issues. Firstly, sequences consist of a mixture of endogenous and various exogenous backgrounds, mostly microbial. Secondly, high nucleotide misincorporation rates can be observed as a result of severe post-mortem DNA damage. Such misincorporation patterns are instrumental to authenticate ancient sequences versus modern contaminants. We recently developed the user-friendly mapDamage package that identifies such patterns from next-generation sequencing (NGS) sequence datasets. The absence of formal statistical modeling of the DNA damage process, however, precluded rigorous quantitative comparisons across samples. Results: Here, we describe mapDamage 2.0 that extends the original features of mapDamage by incorporating a statistical model of DNA damage. Assuming that damage events depend only on sequencing position and post-mortem deamination, our Bayesian statistical framework provides estimates of four key features of aDNA molecules: the average length of overhangs (λ), nick frequency (ν) and cytosine deamination rates in both double-stranded regions (δ_d) and overhangs (δ_s). Our model enables rescaling base quality scores according to their probability of being damaged. mapDamage 2.0 handles NGS datasets with ease and is compatible with a wide range of DNA library protocols. Availability: mapDamage 2.0 is available at ginolhac.github.io/mapDamage/ as a Python package and documentation is maintained at the Centre for GeoGenetics Web site (geogenetics.ku.dk/publications/mapdamage2.0/).
Contact: jonsson.hakon@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online.
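Illustrative note: the abstract names the model parameters but does not give the equations. The Python sketch below is only a simplified illustration of how position-dependent deamination and quality rescaling might fit together. It assumes geometrically distributed overhang lengths, ignores the nick frequency (ν), and uses hypothetical function names; it is not mapDamage's actual implementation or API.

```python
import math


def overhang_probability(pos, lam):
    """Probability that the base at 1-based distance `pos` from a fragment
    end lies in a single-stranded overhang.

    ASSUMPTION: overhang lengths are geometric with parameter `lam`;
    the abstract does not state the distribution.
    """
    return (1.0 - lam) ** pos


def expected_c_to_t(pos, lam, delta_d, delta_s):
    """Expected cytosine deamination rate at `pos`, mixing the overhang
    rate (delta_s) and the double-stranded rate (delta_d).

    ASSUMPTION: nicks (nu) are ignored in this simplified sketch.
    """
    p_ss = overhang_probability(pos, lam)
    return p_ss * delta_s + (1.0 - p_ss) * delta_d


def rescale_phred(qual, p_damage):
    """Lower a Phred quality score given a damage probability.

    ASSUMPTION: sequencing error and damage are treated as independent;
    this is a hypothetical combination rule, not mapDamage's formula.
    """
    p_err = 10 ** (-qual / 10)
    p_total = 1.0 - (1.0 - p_err) * (1.0 - p_damage)
    return -10 * math.log10(p_total)


if __name__ == "__main__":
    # Hypothetical parameter values chosen only for illustration.
    for pos in (1, 2, 5, 20):
        rate = expected_c_to_t(pos, lam=0.3, delta_d=0.01, delta_s=0.5)
        print(pos, round(rate, 4), round(rescale_phred(30, rate), 2))
```

Under these assumptions the expected misincorporation rate is highest at the fragment end and decays toward the double-stranded rate further in, so bases near the ends receive the largest quality penalty.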
format Online
Article
Text
id pubmed-3694634
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3694634 2013-06-27 mapDamage2.0: fast approximate Bayesian estimates of ancient DNA damage parameters Jónsson, Hákon Ginolhac, Aurélien Schubert, Mikkel Johnson, Philip L. F. Orlando, Ludovic Bioinformatics Applications Notes Oxford University Press 2013-07-01 2013-04-23 /pmc/articles/PMC3694634/ /pubmed/23613487 http://dx.doi.org/10.1093/bioinformatics/btt193 Text en © The Author 2013. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title mapDamage2.0: fast approximate Bayesian estimates of ancient DNA damage parameters
topic Applications Notes
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3694634/
https://www.ncbi.nlm.nih.gov/pubmed/23613487
http://dx.doi.org/10.1093/bioinformatics/btt193