
vi-HMM: a novel HMM-based method for sequence variant identification in short-read data

BACKGROUND: Accurate and reliable identification of sequence variants, including single nucleotide polymorphisms (SNPs) and insertion-deletion polymorphisms (INDELs), plays a fundamental role in next-generation sequencing (NGS) applications. Existing methods for calling these variants often make sim...


Bibliographic Details
Main Authors: Tang, Man; Hasan, Mohammad Shabbir; Zhu, Hongxiao; Zhang, Liqing; Wu, Xiaowei
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6387560/
https://www.ncbi.nlm.nih.gov/pubmed/30795817
http://dx.doi.org/10.1186/s40246-019-0194-6
author Tang, Man
Hasan, Mohammad Shabbir
Zhu, Hongxiao
Zhang, Liqing
Wu, Xiaowei
author_sort Tang, Man
collection PubMed
description BACKGROUND: Accurate and reliable identification of sequence variants, including single nucleotide polymorphisms (SNPs) and insertion-deletion polymorphisms (INDELs), plays a fundamental role in next-generation sequencing (NGS) applications. Existing methods for calling these variants often make simplified assumptions of positional independence and fail to leverage the dependence between genotypes at nearby loci that is caused by linkage disequilibrium (LD). RESULTS AND CONCLUSION: We propose vi-HMM, a hidden Markov model (HMM)-based method for calling SNPs and INDELs in mapped short-read data. This method allows transitions between hidden states (defined as “SNP,” “Ins,” “Del,” and “Match”) of adjacent genomic bases and determines an optimal hidden state path by using the Viterbi algorithm. The inferred hidden state path provides a direct solution to the identification of SNPs and INDELs. Simulation studies show that, under various sequencing depths, vi-HMM outperforms commonly used variant calling methods in terms of sensitivity and F(1) score. When applied to the real data, vi-HMM demonstrates higher accuracy in calling SNPs and INDELs. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s40246-019-0194-6) contains supplementary material, which is available to authorized users.
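The decoding step named in the description (a Viterbi search over the hidden states "Match," "SNP," "Ins," and "Del" at adjacent positions) can be illustrated with a minimal sketch. Everything numeric below is a hypothetical toy value chosen for illustration, and the four observation symbols ("agree", "mismatch", "extra", "gap") are an assumed simplification of per-position read evidence; none of it is taken from the vi-HMM paper, whose actual transition and emission models are not given in this record.

```python
import math

# The four hidden states named in the abstract.
STATES = ["Match", "SNP", "Ins", "Del"]

# HYPOTHETICAL toy parameters (not from the paper). Each row sums to 1.
START = {"Match": 0.97, "SNP": 0.01, "Ins": 0.01, "Del": 0.01}
TRANS = {
    "Match": {"Match": 0.97, "SNP": 0.01, "Ins": 0.01, "Del": 0.01},
    "SNP":   {"Match": 0.94, "SNP": 0.04, "Ins": 0.01, "Del": 0.01},
    "Ins":   {"Match": 0.60, "SNP": 0.01, "Ins": 0.38, "Del": 0.01},
    "Del":   {"Match": 0.60, "SNP": 0.01, "Ins": 0.01, "Del": 0.38},
}
# ASSUMED observation alphabet: a one-symbol summary of the reads at a position.
EMIT = {
    "Match": {"agree": 0.997, "mismatch": 0.001, "extra": 0.001, "gap": 0.001},
    "SNP":   {"agree": 0.05, "mismatch": 0.93, "extra": 0.01, "gap": 0.01},
    "Ins":   {"agree": 0.05, "mismatch": 0.02, "extra": 0.92, "gap": 0.01},
    "Del":   {"agree": 0.05, "mismatch": 0.02, "extra": 0.01, "gap": 0.92},
}


def viterbi(observations):
    """Return the most probable hidden-state path for a list of observations."""
    if not observations:
        return []
    # Log-probabilities avoid numerical underflow on long sequences.
    scores = [{s: math.log(START[s]) + math.log(EMIT[s][observations[0]])
               for s in STATES}]
    backpointers = []
    for obs in observations[1:]:
        prev = scores[-1]
        current, pointer = {}, {}
        for s in STATES:
            # Best predecessor state for arriving in state s at this position.
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            current[s] = (prev[best] + math.log(TRANS[best][s])
                          + math.log(EMIT[s][obs]))
            pointer[s] = best
        scores.append(current)
        backpointers.append(pointer)
    # Trace back from the best final state to recover the full path.
    path = [max(STATES, key=lambda s: scores[-1][s])]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    evidence = ["agree", "agree", "mismatch", "agree", "gap", "gap", "agree"]
    print(viterbi(evidence))
```

With these toy parameters, the example evidence decodes to Match, Match, SNP, Match, Del, Del, Match: the two adjacent "gap" positions are called as a single deletion run because the Del-to-Del transition is far more probable than leaving and re-entering the Del state, which is the kind of dependence between neighbouring positions that the description contrasts with positional independence.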
format Online
Article
Text
id pubmed-6387560
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-63875602019-03-04 vi-HMM: a novel HMM-based method for sequence variant identification in short-read data Tang, Man Hasan, Mohammad Shabbir Zhu, Hongxiao Zhang, Liqing Wu, Xiaowei Hum Genomics Primary Research BACKGROUND: Accurate and reliable identification of sequence variants, including single nucleotide polymorphisms (SNPs) and insertion-deletion polymorphisms (INDELs), plays a fundamental role in next-generation sequencing (NGS) applications. Existing methods for calling these variants often make simplified assumptions of positional independence and fail to leverage the dependence between genotypes at nearby loci that is caused by linkage disequilibrium (LD). RESULTS AND CONCLUSION: We propose vi-HMM, a hidden Markov model (HMM)-based method for calling SNPs and INDELs in mapped short-read data. This method allows transitions between hidden states (defined as “SNP,” “Ins,” “Del,” and “Match”) of adjacent genomic bases and determines an optimal hidden state path by using the Viterbi algorithm. The inferred hidden state path provides a direct solution to the identification of SNPs and INDELs. Simulation studies show that, under various sequencing depths, vi-HMM outperforms commonly used variant calling methods in terms of sensitivity and F(1) score. When applied to the real data, vi-HMM demonstrates higher accuracy in calling SNPs and INDELs. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s40246-019-0194-6) contains supplementary material, which is available to authorized users. 
BioMed Central 2019-02-13 /pmc/articles/PMC6387560/ /pubmed/30795817 http://dx.doi.org/10.1186/s40246-019-0194-6 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title vi-HMM: a novel HMM-based method for sequence variant identification in short-read data
topic Primary Research