
A cross-sample statistical model for SNP detection in short-read sequencing data

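Highly multiplex DNA sequencers have greatly expanded our ability to survey human genomes for previously unknown single nucleotide polymorphisms (SNPs). However, sequencing and mapping errors, though rare, contribute substantially to the number of false discoveries in current SNP callers. We demonstrate that we can significantly reduce the number of false positive SNP calls by pooling information across samples. Although many studies prepare and sequence multiple samples with the same protocol, most existing SNP callers ignore cross-sample information. In contrast, we propose an empirical Bayes method that uses cross-sample information to learn the error properties of the data. This error information lets us call SNPs with a lower false discovery rate than existing methods.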

Bibliographic Details
Main authors: Muralidharan, Omkar; Natsoulis, Georges; Bell, John; Newburger, Daniel; Xu, Hua; Kela, Itai; Ji, Hanlee; Zhang, Nancy
Format: Online Article Text
Language: English
Published: Oxford University Press 2012
Subjects: Methods Online
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3245949/
https://www.ncbi.nlm.nih.gov/pubmed/22064853
http://dx.doi.org/10.1093/nar/gkr851
author Muralidharan, Omkar
Natsoulis, Georges
Bell, John
Newburger, Daniel
Xu, Hua
Kela, Itai
Ji, Hanlee
Zhang, Nancy
collection PubMed
description Highly multiplex DNA sequencers have greatly expanded our ability to survey human genomes for previously unknown single nucleotide polymorphisms (SNPs). However, sequencing and mapping errors, though rare, contribute substantially to the number of false discoveries in current SNP callers. We demonstrate that we can significantly reduce the number of false positive SNP calls by pooling information across samples. Although many studies prepare and sequence multiple samples with the same protocol, most existing SNP callers ignore cross-sample information. In contrast, we propose an empirical Bayes method that uses cross-sample information to learn the error properties of the data. This error information lets us call SNPs with a lower false discovery rate than existing methods.
format Online
Article
Text
id pubmed-3245949
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3245949 2012-01-03
Nucleic Acids Res
Methods Online
Oxford University Press 2012-01 2011-11-07
/pmc/articles/PMC3245949/
/pubmed/22064853
http://dx.doi.org/10.1093/nar/gkr851
Text en
© The Author(s) 2011. Published by Oxford University Press.
http://creativecommons.org/licenses/by-nc/3.0
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A cross-sample statistical model for SNP detection in short-read sequencing data
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3245949/
https://www.ncbi.nlm.nih.gov/pubmed/22064853
http://dx.doi.org/10.1093/nar/gkr851