SMaSH: Sample matching using SNPs in humans
BACKGROUND: Inadvertent sample swaps are a real threat to data quality in any medium to large scale omics studies. While matches between samples from the same individual can in principle be identified from a few well characterized single nucleotide polymorphisms (SNPs), omics data types often only p...
Main authors: | Westphal, Maximillian; Frankhouser, David; Sonzone, Carmine; Shields, Peter G.; Yan, Pearlly; Bundschuh, Ralf |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6936078/ https://www.ncbi.nlm.nih.gov/pubmed/31888490 http://dx.doi.org/10.1186/s12864-019-6332-7 |
_version_ | 1783483679192907776 |
---|---|
author | Westphal, Maximillian; Frankhouser, David; Sonzone, Carmine; Shields, Peter G.; Yan, Pearlly; Bundschuh, Ralf |
author_facet | Westphal, Maximillian; Frankhouser, David; Sonzone, Carmine; Shields, Peter G.; Yan, Pearlly; Bundschuh, Ralf |
author_sort | Westphal, Maximillian |
collection | PubMed |
description | BACKGROUND: Inadvertent sample swaps are a real threat to data quality in any medium- to large-scale omics study. While matches between samples from the same individual can in principle be identified from a few well-characterized single nucleotide polymorphisms (SNPs), omics data types often provide only low to moderate coverage, thus requiring integration of evidence from a large number of SNPs to determine whether two samples derive from the same individual. METHODS: We select about six thousand SNPs in the human genome and develop a Bayesian framework that robustly identifies sample matches between next-generation sequencing data sets. RESULTS: We validate our approach on a variety of data sets. Most importantly, we show that our approach can establish identity between different omics data types such as Exome, RNA-Seq, and MethylCap-Seq. We demonstrate how identity detection degrades with sample quality and read coverage, but show that twenty million reads of a fairly low-quality RNA-Seq sample are still sufficient for reliable sample identification. CONCLUSION: Our tool, SMaSH, identifies sample mismatches in next-generation sequencing data sets across different sequencing modalities and in low-quality sequencing data. |
format | Online Article Text |
id | pubmed-6936078 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6936078 2019-12-31 SMaSH: Sample matching using SNPs in humans Westphal, Maximillian; Frankhouser, David; Sonzone, Carmine; Shields, Peter G.; Yan, Pearlly; Bundschuh, Ralf BMC Genomics Research BACKGROUND: Inadvertent sample swaps are a real threat to data quality in any medium- to large-scale omics study. While matches between samples from the same individual can in principle be identified from a few well-characterized single nucleotide polymorphisms (SNPs), omics data types often provide only low to moderate coverage, thus requiring integration of evidence from a large number of SNPs to determine whether two samples derive from the same individual. METHODS: We select about six thousand SNPs in the human genome and develop a Bayesian framework that robustly identifies sample matches between next-generation sequencing data sets. RESULTS: We validate our approach on a variety of data sets. Most importantly, we show that our approach can establish identity between different omics data types such as Exome, RNA-Seq, and MethylCap-Seq. We demonstrate how identity detection degrades with sample quality and read coverage, but show that twenty million reads of a fairly low-quality RNA-Seq sample are still sufficient for reliable sample identification. CONCLUSION: Our tool, SMaSH, identifies sample mismatches in next-generation sequencing data sets across different sequencing modalities and in low-quality sequencing data.
BioMed Central 2019-12-30 /pmc/articles/PMC6936078/ /pubmed/31888490 http://dx.doi.org/10.1186/s12864-019-6332-7 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Westphal, Maximillian Frankhouser, David Sonzone, Carmine Shields, Peter G. Yan, Pearlly Bundschuh, Ralf SMaSH: Sample matching using SNPs in humans |
title | SMaSH: Sample matching using SNPs in humans |
title_full | SMaSH: Sample matching using SNPs in humans |
title_fullStr | SMaSH: Sample matching using SNPs in humans |
title_full_unstemmed | SMaSH: Sample matching using SNPs in humans |
title_short | SMaSH: Sample matching using SNPs in humans |
title_sort | smash: sample matching using snps in humans |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6936078/ https://www.ncbi.nlm.nih.gov/pubmed/31888490 http://dx.doi.org/10.1186/s12864-019-6332-7 |
work_keys_str_mv | AT westphalmaximillian smashsamplematchingusingsnpsinhumans AT frankhouserdavid smashsamplematchingusingsnpsinhumans AT sonzonecarmine smashsamplematchingusingsnpsinhumans AT shieldspeterg smashsamplematchingusingsnpsinhumans AT yanpearlly smashsamplematchingusingsnpsinhumans AT bundschuhralf smashsamplematchingusingsnpsinhumans |
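
The abstract describes integrating evidence from many low-coverage SNPs in a Bayesian framework to decide whether two samples come from the same individual. The sketch below illustrates that general idea only; it is not the SMaSH implementation, and the record does not give the model's details. Everything in it is an assumption made for illustration: the function names, the three-genotype model with Hardy-Weinberg priors, the fixed per-read error rate, and the input layout of (alternate allele frequency, read counts in sample A, read counts in sample B).

```python
import math

def genotype_priors(alt_freq):
    """Hardy-Weinberg priors for 0, 1, or 2 copies of the alternate allele (assumed model)."""
    p = alt_freq
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]

def read_likelihoods(ref_count, alt_count, error_rate=0.01):
    """Likelihood of the observed read counts under each genotype.

    The binomial coefficient is omitted because it cancels in the ratio below.
    """
    alt_probs = [error_rate, 0.5, 1 - error_rate]
    return [(q ** alt_count) * ((1 - q) ** ref_count) for q in alt_probs]

def log_likelihood_ratio(snps, error_rate=0.01):
    """Sum over SNPs of log P(data | same individual) - log P(data | unrelated).

    `snps` is an iterable of (alt_freq, (ref_a, alt_a), (ref_b, alt_b)).
    """
    total = 0.0
    for alt_freq, (ref_a, alt_a), (ref_b, alt_b) in snps:
        priors = genotype_priors(alt_freq)
        lik_a = read_likelihoods(ref_a, alt_a, error_rate)
        lik_b = read_likelihoods(ref_b, alt_b, error_rate)
        # Same individual: both samples share one underlying genotype.
        same = sum(pr * la * lb for pr, la, lb in zip(priors, lik_a, lik_b))
        # Unrelated individuals: genotypes are drawn independently.
        marg_a = sum(pr * la for pr, la in zip(priors, lik_a))
        marg_b = sum(pr * lb for pr, lb in zip(priors, lik_b))
        total += math.log(same) - math.log(marg_a * marg_b)
    return total

def posterior_same(llr, prior_same=0.5):
    """Posterior probability of a match, using a numerically stable logistic function."""
    log_odds = llr + math.log(prior_same / (1 - prior_same))
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)
```

A SNP with no reads in either sample contributes zero to the sum, so sparsely covered positions simply carry no evidence, and the total grows in magnitude as more SNPs are covered in both samples.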