Prediction of high anti-angiogenic activity peptides in silico using a generalized linear model and feature selection

Bibliographic Details
Main Authors: Blanco, Jose Liñares; Porto-Pazos, Ana B.; Pazos, Alejandro; Fernandez-Lozano, Carlos
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6200741/
https://www.ncbi.nlm.nih.gov/pubmed/30356060
http://dx.doi.org/10.1038/s41598-018-33911-z
author Blanco, Jose Liñares
Porto-Pazos, Ana B.
Pazos, Alejandro
Fernandez-Lozano, Carlos
collection PubMed
description Screening and in silico modeling are critical for reducing experimental costs. They also speed up research notably and strengthen the theoretical framework, allowing researchers to quantify numerically the importance of a particular subset of information. For example, in fields such as cancer and other highly prevalent diseases, having a reliable prediction method is crucial. The objective of this paper is to classify peptide sequences according to their anti-angiogenic activity in order to understand the underlying principles via machine learning. First, the peptide sequences were converted into three types of numerical molecular descriptors based on the amino acid composition. We performed different experiments with the descriptors and merged them to obtain baseline results for the performance of the models, particularly for each molecular descriptor subset. A feature selection process was applied to reduce the dimensionality of the problem and remove noisy features, which are common in biological problems. After a robust machine learning experimental design under equal conditions (nested resampling, cross-validation, hyperparameter tuning and different runs), a generalized linear model fitted via coordinate descent (glmnet) outperformed the best previously published anti-angiogenic model with statistical significance, achieving a mean AUC greater than 0.96 and an accuracy of 0.86 with 200 molecular descriptors mixed from the three groups. A final analysis of the top-40 discriminative anti-angiogenic activity peptides is presented, along with a discussion of the feature selection process and the individual importance of each molecular descriptor. According to our findings, anti-angiogenic activity peptides are strongly associated with amino acid sequences SP, LSL, PF, DIT, PC, GH, RQ, QD, TC, SC, AS, CLD, ST, MF, GRE, IQ, CQ and HG.
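The description says that peptide sequences were converted into three types of composition-based descriptors but does not name them. The sketch below is one plausible reading, not the authors' code: it assumes the three groups are amino acid, dipeptide and tripeptide composition, which would be consistent with the two- and three-residue motifs listed above. The function names and the example peptide `SPLSLPF` are hypothetical.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kmer_composition(sequence, k):
    """Relative frequency of every possible k-mer over the 20 residues.

    Assumption: descriptors are plain sliding-window frequencies; the
    original study may have computed them differently.
    """
    sequence = sequence.upper()
    windows = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    counts = Counter(windows)
    total = len(windows)
    features = {}
    for letters in product(AMINO_ACIDS, repeat=k):
        kmer = "".join(letters)
        # Sequences shorter than k have no windows, so every frequency is 0.
        features[kmer] = counts[kmer] / total if total else 0.0
    return features


def describe(sequence):
    """Merge the k = 1, 2, 3 compositions into one feature dictionary."""
    features = {}
    for k in (1, 2, 3):
        features.update(kmer_composition(sequence, k))
    return features


# Hypothetical peptide, used only to exercise the functions.
example = describe("SPLSLPF")
print(len(example))    # number of descriptors before feature selection
print(example["SP"])   # dipeptide frequency
print(example["LSL"])  # tripeptide frequency
```

Under this assumption the merged vector has 20 + 400 + 8000 entries, most of them zero for a short peptide, which illustrates why a feature selection step down to a few hundred descriptors is useful before fitting a regularized linear model.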
format Online
Article
Text
id pubmed-6200741
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-6200741 2018-10-25 Prediction of high anti-angiogenic activity peptides in silico using a generalized linear model and feature selection Blanco, Jose Liñares; Porto-Pazos, Ana B.; Pazos, Alejandro; Fernandez-Lozano, Carlos Sci Rep Article Nature Publishing Group UK 2018-10-24 /pmc/articles/PMC6200741/ /pubmed/30356060 http://dx.doi.org/10.1038/s41598-018-33911-z Text en © The Author(s) 2018 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Prediction of high anti-angiogenic activity peptides in silico using a generalized linear model and feature selection
topic Article