EFIN: predicting the functional impact of nonsynonymous single nucleotide polymorphisms in human genome

BACKGROUND: Predicting the functional impact of amino acid substitutions (AAS) caused by nonsynonymous single nucleotide polymorphisms (nsSNPs) is becoming increasingly important as more and more novel variants are being discovered. Bioinformatics analysis is essential to predict potentially causal...

Bibliographic Details
Main Authors: Zeng, Shuai; Yang, Jing; Chung, Brian Hon-Yin; Lau, Yu Lung; Yang, Wanling
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects: Methodology Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4061446/
https://www.ncbi.nlm.nih.gov/pubmed/24916671
http://dx.doi.org/10.1186/1471-2164-15-455
author Zeng, Shuai
Yang, Jing
Chung, Brian Hon-Yin
Lau, Yu Lung
Yang, Wanling
collection PubMed
description BACKGROUND: Predicting the functional impact of amino acid substitutions (AAS) caused by nonsynonymous single nucleotide polymorphisms (nsSNPs) is becoming increasingly important as more and more novel variants are discovered. Bioinformatics analysis is essential for identifying AAS that potentially cause or contribute to human diseases and merit further analysis, because each genome contains thousands of rare or private AAS, only a very small number of which are related to an underlying disease. Existing algorithms in this field still have a high false prediction rate, and new development is needed to take full advantage of the vast amount of genomic data. RESULTS: Here we report a novel algorithm that features two innovative changes: (1) making better use of sequence conservation information by grouping the homologous protein sequences into six blocks according to their evolutionary distance to human and evaluating sequence conservation in each block independently, and (2) including as many such homologous sequences as possible in the analysis. Random forests are used to evaluate sequence conservation in each block and to predict the potential impact of an AAS on protein function. Testing of this algorithm on a comprehensive dataset showed a significant improvement in prediction accuracy over currently widely used programs. The algorithm and a web-based application tool implementing it, EFIN (Evaluation of Functional Impact of Nonsynonymous SNPs), were made freely available to the public (http://paed.hku.hk/efin/). CONCLUSIONS: Grouping homologous sequences into different blocks according to the evolutionary distance of the species to human and evaluating sequence conservation in each group independently significantly improved prediction accuracy. This approach may help us better understand the roles of genetic variants in human disease and health.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-455) contains supplementary material, which is available to authorized users.
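The block-wise conservation idea in the description can be illustrated with a minimal Python sketch. It is not the EFIN implementation. The distance cut-offs in BLOCK_BOUNDS, the function names, and the entropy-based conservation score are all assumptions made for illustration; the record states only that six blocks are used and that each is evaluated independently. The random forest step is left out; the per-block scores produced here are the kind of features such a classifier might take as input.

```python
from bisect import bisect_right
from collections import Counter
from math import log2

# Hypothetical distance cut-offs that split homologs into six blocks.
# The record does not give the real boundaries used by EFIN.
BLOCK_BOUNDS = [0.1, 0.2, 0.4, 0.6, 0.8]


def block_index(distance, bounds=BLOCK_BOUNDS):
    """Return the 0-based block for a homolog at this distance to human."""
    return bisect_right(bounds, distance)


def column_conservation(residues):
    """Score one alignment column from its residues.

    Returns 1.0 when all residues are identical and approaches 0.0 as the
    20 amino acids become equally frequent. Returns None for an empty block.
    Assumes residues are drawn from the 20 standard amino acids (no gaps).
    """
    if not residues:
        return None
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((n / total) * log2(n / total) for n in counts.values())
    return 1.0 - entropy / log2(20)


def per_block_features(homologs, bounds=BLOCK_BOUNDS):
    """Compute one conservation score per block for a single position.

    homologs: iterable of (distance_to_human, residue_at_position) pairs.
    """
    blocks = [[] for _ in range(len(bounds) + 1)]
    for distance, residue in homologs:
        blocks[block_index(distance, bounds)].append(residue)
    # Each block is scored on its own, independently of the others.
    return [column_conservation(block) for block in blocks]


# Made-up example data: (distance to human, residue at the variant position).
example = [(0.05, "A"), (0.05, "A"), (0.15, "A"), (0.15, "G"), (0.9, "L")]
features = per_block_features(example)
```

Scoring each block separately keeps a position that is conserved only among close relatives distinguishable from one conserved across distant species, which a single pooled score would blur.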
format Online
Article
Text
id pubmed-4061446
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4061446 2014-06-19
BMC Genomics
Methodology Article
BioMed Central 2014-06-10
/pmc/articles/PMC4061446/ /pubmed/24916671 http://dx.doi.org/10.1186/1471-2164-15-455
Text en
© Zeng et al.; licensee BioMed Central Ltd. 2014. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Methodology Article