
A method for positive forensic identification of samples from extremely low-coverage sequence data

BACKGROUND: Determining whether two DNA samples originate from the same individual is difficult when the amount of retrievable DNA is limited. This is often the case for ancient, historic, and forensic samples. The most widely used approaches rely on amplification of a defined panel of multi-allelic...


Bibliographic Details
Main Authors: Vohr, Samuel H., Buen Abad Najar, Carlos Fernando, Shapiro, Beth, Green, Richard E.
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4672566/
https://www.ncbi.nlm.nih.gov/pubmed/26643904
http://dx.doi.org/10.1186/s12864-015-2241-6
author Vohr, Samuel H.
Buen Abad Najar, Carlos Fernando
Shapiro, Beth
Green, Richard E.
collection PubMed
description BACKGROUND: Determining whether two DNA samples originate from the same individual is difficult when the amount of retrievable DNA is limited. This is often the case for ancient, historic, and forensic samples. The most widely used approaches rely on amplification of a defined panel of multi-allelic markers and comparison to similar data from other samples. When the amount of retrievable DNA is low, these approaches fail. RESULTS: We describe a new method for assessing whether shotgun DNA sequence data from two samples are consistent with originating from the same or different individuals. Our approach makes use of the large catalogs of single nucleotide polymorphism (SNP) markers to maximize the chances of observing potentially discriminating alleles. We further reduce the amount of data required by taking advantage of patterns of linkage disequilibrium modeled by a reference panel of haplotypes to indirectly compare observations at pairs of linked SNPs. Using both coalescent simulations and real sequencing data from modern and ancient sources, we show that this approach is robust with respect to the reference panel and has power to detect positive identity from DNA libraries with less than 1 % random and non-overlapping genome coverage in each sample. CONCLUSION: We present a powerful new approach that can determine whether DNA from two samples originated from the same individual even when only minute quantities of DNA are recoverable from each. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-015-2241-6) contains supplementary material, which is available to authorized users.
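The abstract's idea of indirectly comparing observations at pairs of linked SNPs can be illustrated with a minimal sketch. This is not the authors' published model; it is a simplified stand-in under stated assumptions: each sample contributes a single read at a different SNP, the reference panel is a plain list of 0/1 haplotypes, the two chromosomes of one individual are independent draws from the panel, and add-one pseudocounts smooth the joint allele counts. The names `pair_log_lr` and `total_log_lr` are hypothetical.

```python
import math

def pair_log_lr(panel, i, j, a, b):
    """Log likelihood ratio (same vs. different individual) for one SNP pair.

    panel: list of reference haplotypes, each a sequence of 0/1 alleles
           (hypothetical input format).
    i, j:  indices of two linked SNPs. Sample 1 shows allele a at SNP i,
           sample 2 shows allele b at SNP j (one read each, assumed error-free).
    """
    n = len(panel)
    c_a = sum(1 for h in panel if h[i] == a)
    c_b = sum(1 for h in panel if h[j] == b)
    c_ab = sum(1 for h in panel if h[i] == a and h[j] == b)
    # Add-one pseudocount on each of the four joint cells, so every
    # probability is nonzero and marginals stay consistent with the joint.
    p_a = (c_a + 2) / (n + 4)
    p_b = (c_b + 2) / (n + 4)
    p_ab = (c_ab + 1) / (n + 4)
    # Different individuals: the two reads come from unrelated haplotypes.
    p_diff = p_a * p_b
    # Same individual: with probability 1/2 both reads sample the same
    # chromosome (joint haplotype frequency applies), otherwise they sample
    # the individual's two chromosomes, treated here as independent.
    p_same = 0.5 * p_ab + 0.5 * p_a * p_b
    return math.log(p_same / p_diff)

def total_log_lr(panel, obs1, obs2, pairs):
    """Sum pair scores; obs1/obs2 map SNP index -> observed allele.

    Treats SNP pairs as independent, which is an assumption of this sketch.
    """
    return sum(pair_log_lr(panel, i, j, obs1[i], obs2[j]) for i, j in pairs)
```

Under these assumptions, a positive total favors the same individual and a negative total favors different individuals; pairs of SNPs that are uncorrelated in the panel contribute nothing, which is why linkage disequilibrium is what carries the signal.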
format Online
Article
Text
id pubmed-4672566
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4672566 2015-12-09 A method for positive forensic identification of samples from extremely low-coverage sequence data Vohr, Samuel H. Buen Abad Najar, Carlos Fernando Shapiro, Beth Green, Richard E. BMC Genomics Methodology Article
BioMed Central 2015-12-07 /pmc/articles/PMC4672566/ /pubmed/26643904 http://dx.doi.org/10.1186/s12864-015-2241-6 Text en © Vohr et al. 2015 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A method for positive forensic identification of samples from extremely low-coverage sequence data
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4672566/
https://www.ncbi.nlm.nih.gov/pubmed/26643904
http://dx.doi.org/10.1186/s12864-015-2241-6