
PSE-HMM: genome-wide CNV detection from NGS data using an HMM with Position-Specific Emission probabilities

BACKGROUND: Copy Number Variation (CNV) is thought to be a major source of large structural variation in the human genome. In recent years, many studies have applied Next Generation Sequencing (NGS) data to CNV detection. However, there is still a need for more accurate computational too...


Bibliographic Details
Main Authors: Malekpour, Seyed Amir; Pezeshk, Hamid; Sadeghi, Mehdi
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5445519/
https://www.ncbi.nlm.nih.gov/pubmed/27809781
http://dx.doi.org/10.1186/s12859-016-1296-y
author Malekpour, Seyed Amir
Pezeshk, Hamid
Sadeghi, Mehdi
collection PubMed
description BACKGROUND: Copy Number Variation (CNV) is thought to be a major source of large structural variation in the human genome. In recent years, many studies have applied Next Generation Sequencing (NGS) data to CNV detection. However, there is still a need for more accurate computational tools. RESULTS: In this study, mate pair NGS data are used for CNV detection in a Hidden Markov Model (HMM). The proposed HMM has position-specific emission probabilities, modeled as a Gaussian mixture distribution. Each component of the Gaussian mixture captures a different type of aberration observed in the mate pairs after they are mapped to the reference genome. These aberrations may include an increase or decrease in the insertion size, or a change in the direction of the mapped mate pairs. This HMM with Position-Specific Emission probabilities (PSE-HMM) is used for the genome-wide detection of deletions and tandem duplications. The performance of PSE-HMM is evaluated on a simulated dataset and on real data from a Yoruban HapMap individual, NA18507. CONCLUSIONS: PSE-HMM is effective in taking observation dependencies into account and achieves high accuracy in detecting genome-wide CNVs. MATLAB programs are available at http://bs.ipm.ir/softwares/PSE-HMM/. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1296-y) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5445519
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5445519 2017-05-30 PSE-HMM: genome-wide CNV detection from NGS data using an HMM with Position-Specific Emission probabilities Malekpour, Seyed Amir; Pezeshk, Hamid; Sadeghi, Mehdi BMC Bioinformatics Methodology Article BioMed Central 2016-11-03 /pmc/articles/PMC5445519/ /pubmed/27809781 http://dx.doi.org/10.1186/s12859-016-1296-y Text en © The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title PSE-HMM: genome-wide CNV detection from NGS data using an HMM with Position-Specific Emission probabilities
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5445519/
https://www.ncbi.nlm.nih.gov/pubmed/27809781
http://dx.doi.org/10.1186/s12859-016-1296-y