Computational identification of deleterious synonymous variants in human genomes using a feature-based approach
BACKGROUND: Although synonymous single nucleotide variants (sSNVs) do not alter the protein sequences, they have been shown to play an important role in human disease. Distinguishing pathogenic sSNVs from neutral ones is challenging because pathogenic sSNVs tend to have low prevalence. Although many...
Main Authors: | Shi, Fang; Yao, Yao; Bin, Yannan; Zheng, Chun-Hou; Xia, Junfeng |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | Research |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6357349/ https://www.ncbi.nlm.nih.gov/pubmed/30704475 http://dx.doi.org/10.1186/s12920-018-0455-6 |
_version_ | 1783391765772894208 |
---|---|
author | Shi, Fang Yao, Yao Bin, Yannan Zheng, Chun-Hou Xia, Junfeng |
author_sort | Shi, Fang |
collection | PubMed |
description | BACKGROUND: Although synonymous single nucleotide variants (sSNVs) do not alter the protein sequences, they have been shown to play an important role in human disease. Distinguishing pathogenic sSNVs from neutral ones is challenging because pathogenic sSNVs tend to have low prevalence. Although many methods have been developed for predicting the functional impact of single nucleotide variants, only a few have been specifically designed for identifying pathogenic sSNVs. RESULTS: In this work, we describe a computational model, IDSV (Identification of Deleterious Synonymous Variants), which uses random forest (RF) to detect deleterious sSNVs in human genomes. We systematically investigate a total of 74 multifaceted features across seven categories: splicing, conservation, codon usage, sequence, pre-mRNA folding energy, translation efficiency, and function regions annotation features. Then, to remove redundant and irrelevant features and improve the prediction performance, feature selection is employed using the sequential backward selection method. Based on the optimized 10 features, an RF classifier is developed to identify deleterious sSNVs. The results on benchmark datasets show that IDSV outperforms other state-of-the-art methods in identifying sSNVs that are pathogenic. CONCLUSIONS: We have developed an efficient feature-based prediction approach (IDSV) for deleterious sSNVs by using a wide variety of features. Among all the features, a compact and useful feature subset that has an important implication for identifying deleterious sSNVs is identified. Our results indicate that besides splicing and conservation features, a new translation efficiency feature is also an informative feature for identifying deleterious sSNVs. While the function regions annotation and sequence features are weakly informative, they may have the ability to discriminate deleterious sSNVs from benign ones when combined with other features. The data and source code are available on the website http://bioinfo.ahu.edu.cn:8080/IDSV. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12920-018-0455-6) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6357349 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6357349 2019-02-07 Computational identification of deleterious synonymous variants in human genomes using a feature-based approach Shi, Fang Yao, Yao Bin, Yannan Zheng, Chun-Hou Xia, Junfeng BMC Med Genomics Research BioMed Central 2019-01-31 /pmc/articles/PMC6357349/ /pubmed/30704475 http://dx.doi.org/10.1186/s12920-018-0455-6 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Computational identification of deleterious synonymous variants in human genomes using a feature-based approach |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6357349/ https://www.ncbi.nlm.nih.gov/pubmed/30704475 http://dx.doi.org/10.1186/s12920-018-0455-6 |
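
The description in this record says that 74 candidate features were reduced to 10 by sequential backward selection before a random forest classifier was trained. As a rough illustration of that selection step only, the sketch below implements greedy backward elimination in plain Python. It is not the authors' code: the function names, the feature names, the toy data, and the nearest-centroid scorer are all assumptions made for illustration. In a setting like IDSV, the scorer would presumably be the cross-validated performance of a random forest instead.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def sequential_backward_selection(
    features: Sequence[str],
    score: Callable[[List[str]], float],
    n_keep: int,
) -> List[str]:
    """Greedy backward elimination: start from all features and, at each
    step, drop the single feature whose removal leaves the best score."""
    if not 1 <= n_keep <= len(features):
        raise ValueError("n_keep must be between 1 and the number of features")
    selected = list(features)
    while len(selected) > n_keep:
        # One candidate subset per feature that could be dropped next.
        candidates = [[f for f in selected if f != dropped] for dropped in selected]
        # Ties are resolved in favour of the first candidate (max is stable).
        selected = max(candidates, key=score)
    return selected


def centroid_accuracy(rows: List[Row], labels: List[int], subset: List[str]) -> float:
    """Toy stand-in scorer (assumption): training accuracy of a
    nearest-centroid classifier restricted to the given feature subset."""
    classes = sorted(set(labels))
    centroids = {}
    for cls in classes:
        members = [r for r, y in zip(rows, labels) if y == cls]
        centroids[cls] = [sum(r[f] for r in members) / len(members) for f in subset]

    def predict(row: Row) -> int:
        return min(
            classes,
            key=lambda cls: sum((row[f] - c) ** 2 for f, c in zip(subset, centroids[cls])),
        )

    hits = sum(predict(r) == y for r, y in zip(rows, labels))
    return hits / len(rows)


# Hypothetical toy data: feature names and values are invented for illustration.
FEATURES = ["splicing", "conservation", "translation_efficiency", "noise"]
ROWS = [
    {"splicing": 0.9, "conservation": 0.8, "translation_efficiency": 0.7, "noise": 0.1},
    {"splicing": 0.8, "conservation": 0.9, "translation_efficiency": 0.6, "noise": 0.9},
    {"splicing": 0.7, "conservation": 0.7, "translation_efficiency": 0.8, "noise": 0.5},
    {"splicing": 0.2, "conservation": 0.1, "translation_efficiency": 0.3, "noise": 0.2},
    {"splicing": 0.1, "conservation": 0.3, "translation_efficiency": 0.2, "noise": 0.8},
    {"splicing": 0.3, "conservation": 0.2, "translation_efficiency": 0.1, "noise": 0.4},
]
LABELS = [1, 1, 1, 0, 0, 0]  # 1 = deleterious, 0 = benign (toy labels)

kept = sequential_backward_selection(
    FEATURES, lambda subset: centroid_accuracy(ROWS, LABELS, subset), n_keep=2
)
print(kept)
```

Passing the scorer in as a callable keeps the selection loop independent of the classifier, so the same loop could wrap any model and evaluation protocol. Each round evaluates one candidate subset per remaining feature, so going from 74 features to 10 would take on the order of a few thousand scorer calls.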