Computational identification of deleterious synonymous variants in human genomes using a feature-based approach

Bibliographic Details
Main Authors: Shi, Fang; Yao, Yao; Bin, Yannan; Zheng, Chun-Hou; Xia, Junfeng
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6357349/
https://www.ncbi.nlm.nih.gov/pubmed/30704475
http://dx.doi.org/10.1186/s12920-018-0455-6
author Shi, Fang
Yao, Yao
Bin, Yannan
Zheng, Chun-Hou
Xia, Junfeng
collection PubMed
description BACKGROUND: Although synonymous single nucleotide variants (sSNVs) do not alter the protein sequences, they have been shown to play an important role in human disease. Distinguishing pathogenic sSNVs from neutral ones is challenging because pathogenic sSNVs tend to have low prevalence. Although many methods have been developed for predicting the functional impact of single nucleotide variants, only a few have been specifically designed for identifying pathogenic sSNVs. RESULTS: In this work, we describe a computational model, IDSV (Identification of Deleterious Synonymous Variants), which uses random forest (RF) to detect deleterious sSNVs in human genomes. We systematically investigate a total of 74 multifaceted features across seven categories: splicing, conservation, codon usage, sequence, pre-mRNA folding energy, translation efficiency, and function regions annotation features. Then, to remove redundant and irrelevant features and improve the prediction performance, feature selection is employed using the sequential backward selection method. Based on the optimized 10 features, an RF classifier is developed to identify deleterious sSNVs. The results on benchmark datasets show that IDSV outperforms other state-of-the-art methods in identifying sSNVs that are pathogenic. CONCLUSIONS: We have developed an efficient feature-based prediction approach (IDSV) for deleterious sSNVs by using a wide variety of features. Among all the features, a compact and useful feature subset that has an important implication for identifying deleterious sSNVs is identified. Our results indicate that besides splicing and conservation features, a new translation efficiency feature is also an informative feature for identifying deleterious sSNVs. While the function regions annotation and sequence features are weakly informative, they may have the ability to discriminate deleterious sSNVs from benign ones when combined with other features. The data and source code are available at http://bioinfo.ahu.edu.cn:8080/IDSV. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12920-018-0455-6) contains supplementary material, which is available to authorized users.
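The abstract names sequential backward selection as the feature-selection step. A minimal sketch of that greedy procedure follows; it is not the IDSV source code. The function name, the feature names, and the toy scoring function are all hypothetical. In a setting like the one described, the score would presumably be a cross-validated performance measure of a random forest trained on the candidate subset; here a simple stand-in (a sum of made-up weights) keeps the example self-contained.

```python
from typing import Callable, List, Sequence


def sequential_backward_selection(
    features: Sequence[str],
    score: Callable[[List[str]], float],
    n_keep: int,
) -> List[str]:
    """Greedy backward selection (hypothetical helper, not from IDSV).

    Start from the full feature set and repeatedly drop the single
    feature whose removal leaves the highest-scoring subset, until
    only n_keep features remain.
    """
    selected = list(features)
    if not 1 <= n_keep <= len(selected):
        raise ValueError("n_keep must be between 1 and len(features)")

    while len(selected) > n_keep:
        best_subset: List[str] = []
        best_score = float("-inf")
        for dropped in selected:
            # Candidate subset with one feature removed.
            candidate = [f for f in selected if f != dropped]
            candidate_score = score(candidate)
            if candidate_score > best_score:
                best_subset, best_score = candidate, candidate_score
        selected = best_subset
    return selected


# Stand-in scorer with made-up weights (an assumption for illustration
# only); a real scorer would train and evaluate a classifier instead.
TOY_WEIGHTS = {"feat_a": 0.9, "feat_b": 0.8, "feat_c": 0.1, "feat_d": 0.3}


def toy_score(subset: List[str]) -> float:
    return sum(TOY_WEIGHTS[f] for f in subset)


kept = sequential_backward_selection(list(TOY_WEIGHTS), toy_score, n_keep=2)
print(kept)
```

Passing the scorer as a callable keeps the selection loop independent of the classifier, so the same loop could wrap any model-evaluation routine.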
format Online
Article
Text
id pubmed-6357349
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6357349 2019-02-07 Computational identification of deleterious synonymous variants in human genomes using a feature-based approach Shi, Fang; Yao, Yao; Bin, Yannan; Zheng, Chun-Hou; Xia, Junfeng BMC Med Genomics Research BioMed Central 2019-01-31 /pmc/articles/PMC6357349/ /pubmed/30704475 http://dx.doi.org/10.1186/s12920-018-0455-6 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Computational identification of deleterious synonymous variants in human genomes using a feature-based approach
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6357349/
https://www.ncbi.nlm.nih.gov/pubmed/30704475
http://dx.doi.org/10.1186/s12920-018-0455-6