PaPI: pseudo amino acid composition to score human protein-coding variants
BACKGROUND: High throughput sequencing technologies are able to identify the whole genomic variation of an individual. Gene-targeted and whole-exome experiments are mainly focused on coding sequence variants related to a single or multiple nucleotides. The analysis of the biological significance of...
Main Authors: | Limongelli, Ivan; Marini, Simone; Bellazzi, Riccardo |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | Methodology Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4411653/ https://www.ncbi.nlm.nih.gov/pubmed/25928477 http://dx.doi.org/10.1186/s12859-015-0554-8 |
_version_ | 1782368512557187072 |
---|---|
author | Limongelli, Ivan Marini, Simone Bellazzi, Riccardo |
author_facet | Limongelli, Ivan Marini, Simone Bellazzi, Riccardo |
author_sort | Limongelli, Ivan |
collection | PubMed |
description | BACKGROUND: High-throughput sequencing technologies are able to identify the whole genomic variation of an individual. Gene-targeted and whole-exome experiments are mainly focused on coding sequence variants involving single or multiple nucleotides. The analysis of the biological significance of this multitude of genomic variants is challenging and computationally demanding. RESULTS: We present PaPI, a new machine-learning approach to classify and score human coding variants by estimating their probability of damaging protein function. The novelty of this approach consists in using pseudo amino acid composition, through which wild-type and mutated protein sequences are represented in a discrete model. A machine learning classifier has been trained on a set of known deleterious and benign coding variants with the aim of scoring unobserved variants by taking into account hidden sequence patterns in the human genome potentially leading to diseases. We show how the combination of amphiphilic pseudo amino acid composition, evolutionary conservation and homologous-protein-based methods outperforms several prediction algorithms and can also score complex variants such as deletions, insertions and indels. CONCLUSIONS: This paper describes a machine-learning approach to predict the deleteriousness of human coding variants. A freely available web application (http://papi.unipv.it) has been developed with the presented method, able to score up to thousands of variants in a single run. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0554-8) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4411653 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4411653 2015-04-29 PaPI: pseudo amino acid composition to score human protein-coding variants Limongelli, Ivan Marini, Simone Bellazzi, Riccardo BMC Bioinformatics Methodology Article BioMed Central 2015-04-19 /pmc/articles/PMC4411653/ /pubmed/25928477 http://dx.doi.org/10.1186/s12859-015-0554-8 Text en © Limongelli et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Article Limongelli, Ivan Marini, Simone Bellazzi, Riccardo PaPI: pseudo amino acid composition to score human protein-coding variants |
title | PaPI: pseudo amino acid composition to score human protein-coding variants |
title_full | PaPI: pseudo amino acid composition to score human protein-coding variants |
title_fullStr | PaPI: pseudo amino acid composition to score human protein-coding variants |
title_full_unstemmed | PaPI: pseudo amino acid composition to score human protein-coding variants |
title_short | PaPI: pseudo amino acid composition to score human protein-coding variants |
title_sort | papi: pseudo amino acid composition to score human protein-coding variants |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4411653/ https://www.ncbi.nlm.nih.gov/pubmed/25928477 http://dx.doi.org/10.1186/s12859-015-0554-8 |
work_keys_str_mv | AT limongelliivan papipseudoaminoacidcompositiontoscorehumanproteincodingvariants AT marinisimone papipseudoaminoacidcompositiontoscorehumanproteincodingvariants AT bellazziriccardo papipseudoaminoacidcompositiontoscorehumanproteincodingvariants |
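
Illustrative note on the description field: the abstract states that wild-type and mutated protein sequences are represented in a discrete model through pseudo amino acid composition. The sketch below shows one way such a fixed-length representation could be computed and compared between two sequences. It is not PaPI's implementation. Amphiphilic pseudo amino acid composition as commonly defined uses both a hydrophobicity and a hydrophilicity scale; this simplified version uses a single property, the Kyte-Doolittle hydropathy index, as a swapped-in stand-in. The function names and the parameters `lam` and `w` are hypothetical choices for illustration.

```python
from math import sqrt

# Kyte-Doolittle hydropathy index. ASSUMPTION: used here as a single
# stand-in property; the record does not say which scales PaPI uses.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = sorted(KD)


def standardize(scale):
    """Rescale a property to zero mean and unit variance over the 20 residues."""
    values = list(scale.values())
    mean = sum(values) / len(values)
    sd = sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return {aa: (v - mean) / sd for aa, v in scale.items()}


H = standardize(KD)


def pse_aac(seq, lam=3, w=0.05):
    """Return a (20 + lam)-dimensional pseudo amino acid composition vector.

    The first 20 components are normalized residue frequencies; the last
    `lam` are sequence-order correlation factors, i.e. the mean product of
    the standardized property for residues k positions apart.
    `lam` and `w` are hypothetical defaults, kept small so the
    normalizing denominator stays positive.
    """
    n = len(seq)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    freqs = [seq.count(aa) / n for aa in AMINO_ACIDS]
    taus = [
        sum(H[seq[i]] * H[seq[i + k]] for i in range(n - k)) / (n - k)
        for k in range(1, lam + 1)
    ]
    denom = sum(freqs) + w * sum(taus)
    return [f / denom for f in freqs] + [w * t / denom for t in taus]


def apply_substitution(seq, pos, alt):
    """Replace the residue at 1-based position `pos` with `alt`."""
    if not 1 <= pos <= len(seq):
        raise ValueError("position out of range")
    return seq[: pos - 1] + alt + seq[pos:]


def feature_delta(wild, mutated, lam=3, w=0.05):
    """Component-wise difference between mutated and wild-type vectors."""
    wild_vec = pse_aac(wild, lam, w)
    mut_vec = pse_aac(mutated, lam, w)
    return [m - x for x, m in zip(wild_vec, mut_vec)]
```

Because the vector length depends only on `lam` and not on sequence length, a deletion or insertion yields a vector of the same size as the wild-type one. That may be part of why a composition-based representation can handle indels as well as substitutions. For example, `feature_delta("MKTAYIAKQRQISF", "MKTAYIQISF")` compares a toy peptide with a copy lacking four residues; both sequences are made up for illustration. A classifier would then consume such vectors alongside the conservation and homology features mentioned in the abstract.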