
PaPI: pseudo amino acid composition to score human protein-coding variants

Bibliographic Details
Main authors: Limongelli, Ivan; Marini, Simone; Bellazzi, Riccardo
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4411653/
https://www.ncbi.nlm.nih.gov/pubmed/25928477
http://dx.doi.org/10.1186/s12859-015-0554-8
author Limongelli, Ivan
Marini, Simone
Bellazzi, Riccardo
collection PubMed
description BACKGROUND: High-throughput sequencing technologies can identify the whole genomic variation of an individual. Gene-targeted and whole-exome experiments focus mainly on coding sequence variants involving single or multiple nucleotides. Analyzing the biological significance of this multitude of genomic variants is challenging and computationally demanding. RESULTS: We present PaPI, a new machine-learning approach to classify and score human coding variants by estimating the probability that they damage the function of the encoded protein. The novelty of this approach lies in using pseudo amino acid composition, through which wild-type and mutated protein sequences are represented in a discrete model. A machine-learning classifier was trained on a set of known deleterious and benign coding variants in order to score unobserved variants, taking into account hidden sequence patterns in the human genome that may lead to disease. We show that the combination of amphiphilic pseudo amino acid composition, evolutionary conservation and homologous-protein-based methods outperforms several prediction algorithms and is also able to score complex variants such as deletions, insertions and indels. CONCLUSIONS: This paper describes a machine-learning approach to predict the deleteriousness of human coding variants. A freely available web application (http://papi.unipv.it) implementing the method has been developed; it can score up to thousands of variants in a single run. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0554-8) contains supplementary material, which is available to authorized users.
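The description says PaPI represents wild-type and mutated protein sequences with amphiphilic pseudo amino acid composition (PseAAC). The sketch below illustrates one way such a representation can be computed, following Chou's amphiphilic PseAAC formulation as generally described: 20 amino acid frequencies plus 2λ sequence-order correlation factors built from standardized hydrophobicity and hydrophilicity, all normalized with a weight w. It is not PaPI's implementation. The property scales are loud placeholders (not real data), λ = 3 and w = 0.05 are arbitrary choices, and the helper names (`amphiphilic_pseaac`, `apply_variant`, `variant_features`) and the wild-type/mutant concatenation are hypothetical.

```python
"""Hypothetical sketch: amphiphilic pseudo amino acid composition (PseAAC)
for a wild-type / mutant protein pair. Not PaPI's implementation."""
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# PLACEHOLDER property scales (not real hydrophobicity/hydrophilicity data):
# substitute published per-residue tables before any real use.
HYDROPHOBICITY = {a: float(i) for i, a in enumerate(AMINO_ACIDS)}
HYDROPHILICITY = {a: float((7 * i) % 20) for i, a in enumerate(AMINO_ACIDS)}


def standardize(scale):
    """Z-score a per-residue property scale over the 20 standard amino acids."""
    values = [scale[a] for a in AMINO_ACIDS]
    mu, sd = mean(values), pstdev(values)
    return {a: (scale[a] - mu) / sd for a in AMINO_ACIDS}


def amphiphilic_pseaac(seq, hydrophobicity, hydrophilicity, lam=3, w=0.05):
    """Return a (20 + 2*lam)-dimensional feature vector for `seq`.

    Entries 0-19 come from amino acid frequencies; the remaining 2*lam are
    sequence-order correlation factors for lags j = 1..lam, alternating
    hydrophobic and hydrophilic. All entries are scaled to sum to 1.
    """
    seq = seq.upper()
    if any(a not in AMINO_ACIDS for a in seq):
        raise ValueError("non-standard residue in sequence")
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")
    h1, h2 = standardize(hydrophobicity), standardize(hydrophilicity)
    freqs = [seq.count(a) / len(seq) for a in AMINO_ACIDS]
    taus = []
    for j in range(1, lam + 1):
        pairs = list(zip(seq, seq[j:]))  # residue pairs j positions apart
        taus.append(sum(h1[a] * h1[b] for a, b in pairs) / len(pairs))
        taus.append(sum(h2[a] * h2[b] for a, b in pairs) / len(pairs))
    denom = sum(freqs) + w * sum(taus)
    if denom <= 0:
        raise ValueError("non-positive normalizer; try a smaller weight w")
    return [f / denom for f in freqs] + [w * t / denom for t in taus]


def apply_variant(wild, pos, ref, alt):
    """Apply a protein-level change at 1-based `pos`.

    Covers substitutions, deletions (alt == ""), insertions (ref == "",
    inserted before `pos`) and indels. `ref` must match the wild type.
    """
    i = pos - 1
    if pos < 1 or wild[i:i + len(ref)] != ref:
        raise ValueError("ref does not match the wild-type sequence")
    return wild[:i] + alt + wild[i + len(ref):]


def variant_features(wild, pos, ref, alt, lam=3, w=0.05):
    """Concatenate wild-type and mutant PseAAC vectors (an assumed encoding)."""
    mutant = apply_variant(wild, pos, ref, alt)
    return (amphiphilic_pseaac(wild, HYDROPHOBICITY, HYDROPHILICITY, lam, w)
            + amphiphilic_pseaac(mutant, HYDROPHOBICITY, HYDROPHILICITY, lam, w))


if __name__ == "__main__":
    wild = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary example sequence
    missense = variant_features(wild, 5, "Y", "C")  # Y5C substitution
    deletion = variant_features(wild, 4, "AY", "")  # in-frame deletion of A4-Y5
    print(len(missense), len(deletion))  # 52 52
```

Because both the frequencies and the correlation factors are length-normalized, a deletion or insertion still yields a vector of fixed length 20 + 2λ. That is one plausible reason a composition-based encoding can accommodate the insertions, deletions and indels mentioned in the description.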
format Online
Article
Text
id pubmed-4411653
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4411653 2015-04-29 PaPI: pseudo amino acid composition to score human protein-coding variants Limongelli, Ivan Marini, Simone Bellazzi, Riccardo BMC Bioinformatics Methodology Article
BioMed Central 2015-04-19 /pmc/articles/PMC4411653/ /pubmed/25928477 http://dx.doi.org/10.1186/s12859-015-0554-8 Text en © Limongelli et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title PaPI: pseudo amino acid composition to score human protein-coding variants
topic Methodology Article