SeqEM: an adaptive genotype-calling approach for next-generation sequencing studies

Motivation: Next-generation sequencing presents several statistical challenges, with one of the most fundamental being determining an individual's genotype from multiple aligned short read sequences at a position. Some simple approaches for genotype calling apply fixed filters, such as calling...

Bibliographic Details
Main authors: Martin, E. R., Kinnamon, D. D., Schmidt, M. A., Powell, E. H., Zuchner, S., Morris, R. W.
Format: Text
Language: English
Published: Oxford University Press 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2971572/
https://www.ncbi.nlm.nih.gov/pubmed/20861027
http://dx.doi.org/10.1093/bioinformatics/btq526
author Martin, E. R.
Kinnamon, D. D.
Schmidt, M. A.
Powell, E. H.
Zuchner, S.
Morris, R. W.
collection PubMed
description Motivation: Next-generation sequencing presents several statistical challenges, with one of the most fundamental being determining an individual's genotype from multiple aligned short read sequences at a position. Some simple approaches for genotype calling apply fixed filters, such as calling a heterozygote if more than a specified percentage of the reads have variant nucleotide calls. Other genotype-calling methods, such as MAQ and SOAPsnp, are implementations of Bayes classifiers in that they classify genotypes using posterior genotype probabilities. Results: Here, we propose a novel genotype-calling algorithm that, in contrast to the other methods, estimates parameters underlying the posterior probabilities in an adaptive way rather than arbitrarily specifying them a priori. The algorithm, which we call SeqEM, applies the well-known Expectation-Maximization algorithm to an appropriate likelihood for a sample of unrelated individuals with next-generation sequence data, leveraging information from the sample to estimate genotype probabilities and the nucleotide-read error rate. We demonstrate using analytic calculations and simulations that SeqEM results in genotype-call error rates as small as or smaller than filtering approaches and MAQ. We also apply SeqEM to exome sequence data in eight related individuals and compare the results to genotypes from an Illumina SNP array, showing that SeqEM behaves well in real data that deviates from idealized assumptions. Conclusion: SeqEM offers an improved, robust and flexible genotype-calling approach that can be widely applied in the next-generation sequencing studies. Availability and implementation: Software for SeqEM is freely available from our website: www.hihg.org under Software Download. Contact: emartin1@med.miami.edu Supplementary information: Supplementary data are available at Bioinformatics online.
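The adaptive idea in the abstract (estimating genotype probabilities and the nucleotide-read error rate from the sample with Expectation-Maximization, then calling genotypes from posterior probabilities) can be illustrated with a minimal Python sketch. This is not the SeqEM software. The abstract does not spell out the likelihood, so the sketch assumes a simple binomial read model: each individual has `n` reads at a site, `x` of which show the variant allele, and a read is miscalled with probability `alpha`. The function name `em_genotype_calls`, the starting values, and the clamp on `alpha` are all hypothetical choices made for illustration.

```python
import math


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def em_genotype_calls(variant_reads, total_reads, alpha=0.01,
                      freqs=(0.90, 0.08, 0.02), max_iter=500, tol=1e-10):
    """EM for one site (illustrative sketch, assumed binomial read model).

    Genotypes are indexed 0 = hom. reference, 1 = het, 2 = hom. variant.
    Assumed P(variant read | genotype) is alpha, 0.5, 1 - alpha.
    Returns (calls, genotype_freqs, alpha).
    """
    freqs = list(freqs)
    data = list(zip(variant_reads, total_reads))
    posts = []
    for _ in range(max_iter):
        # E-step: posterior genotype probabilities for each individual.
        posts = []
        for x, n in data:
            lik = (binom_pmf(x, n, alpha),
                   binom_pmf(x, n, 0.5),
                   binom_pmf(x, n, 1 - alpha))
            joint = [f * l for f, l in zip(freqs, lik)]
            total = sum(joint)
            posts.append([j / total for j in joint])

        # M-step: genotype frequencies are the mean posteriors.
        new_freqs = [sum(p[g] for p in posts) / len(posts) for g in range(3)]
        # M-step: error rate is the expected fraction of miscalled reads
        # among reads from (posterior-weighted) homozygotes.
        num = sum(p[0] * x + p[2] * (n - x) for p, (x, n) in zip(posts, data))
        den = sum((p[0] + p[2]) * n for p, (x, n) in zip(posts, data))
        new_alpha = num / den if den > 0 else alpha
        # Guard (an assumption of this sketch): keep alpha away from 0 and 0.5.
        new_alpha = min(max(new_alpha, 1e-6), 0.49)

        change = max(abs(new_alpha - alpha),
                     max(abs(a - b) for a, b in zip(new_freqs, freqs)))
        freqs, alpha = new_freqs, new_alpha
        if change < tol:
            break

    # Call the genotype with the highest posterior probability.
    calls = [max(range(3), key=lambda g: p[g]) for p in posts]
    return calls, freqs, alpha


if __name__ == "__main__":
    x = [0, 0, 1, 0, 10, 9, 20, 0, 11, 0]   # made-up variant read counts
    n = [20] * 10                           # made-up read depths
    calls, freqs, alpha = em_genotype_calls(x, n)
    print(calls, [round(f, 3) for f in freqs], round(alpha, 4))
```

In contrast to a fixed filter, nothing here hard-codes a variant-read percentage threshold: the boundary between genotype calls shifts with the estimated genotype frequencies and error rate. The read counts in the usage example are invented for illustration and are not data from the paper.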
format Text
id pubmed-2971572
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-2971572 2010-11-04 SeqEM: an adaptive genotype-calling approach for next-generation sequencing studies Martin, E. R. Kinnamon, D. D. Schmidt, M. A. Powell, E. H. Zuchner, S. Morris, R. W. Bioinformatics Original Papers Oxford University Press 2010-11-15 2010-09-21 /pmc/articles/PMC2971572/ /pubmed/20861027 http://dx.doi.org/10.1093/bioinformatics/btq526 Text en © The Author(s) 2010. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title SeqEM: an adaptive genotype-calling approach for next-generation sequencing studies
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2971572/
https://www.ncbi.nlm.nih.gov/pubmed/20861027
http://dx.doi.org/10.1093/bioinformatics/btq526