Prediction of high anti-angiogenic activity peptides in silico using a generalized linear model and feature selection
Screening and in silico modeling are critical activities for the reduction of experimental costs. They also speed up research notably and strengthen the theoretical framework, thus allowing researchers to numerically quantify the importance of a particular subset of information. For example, in fiel...
Main authors: | Blanco, Jose Liñares; Porto-Pazos, Ana B.; Pazos, Alejandro; Fernandez-Lozano, Carlos |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6200741/ https://www.ncbi.nlm.nih.gov/pubmed/30356060 http://dx.doi.org/10.1038/s41598-018-33911-z |
_version_ | 1783365381621022720 |
---|---|
author | Blanco, Jose Liñares Porto-Pazos, Ana B. Pazos, Alejandro Fernandez-Lozano, Carlos |
author_facet | Blanco, Jose Liñares Porto-Pazos, Ana B. Pazos, Alejandro Fernandez-Lozano, Carlos |
author_sort | Blanco, Jose Liñares |
collection | PubMed |
description | Screening and in silico modeling are critical activities for the reduction of experimental costs. They also speed up research notably and strengthen the theoretical framework, thus allowing researchers to numerically quantify the importance of a particular subset of information. For example, in fields such as cancer and other highly prevalent diseases, having a reliable prediction method is crucial. The objective of this paper is to classify peptide sequences according to their anti-angiogenic activity to understand the underlying principles via machine learning. First, the peptide sequences were converted into three types of numerical molecular descriptors based on the amino acid composition. We performed different experiments with the descriptors and merged them to obtain baseline results for the performance of the models, particularly of each molecular descriptor subset. A feature selection process was applied to reduce the dimensionality of the problem and remove noisy features – which are highly present in biological problems. After a robust machine learning experimental design under equal conditions (nested resampling, cross-validation, hyperparameter tuning and different runs), we statistically and significantly outperformed the best previously published anti-angiogenic model with a generalized linear model via coordinate descent (glmnet), achieving a mean AUC value greater than 0.96 and with an accuracy of 0.86 with 200 molecular descriptors, mixed from the three groups. A final analysis with the top-40 discriminative anti-angiogenic activity peptides is presented along with a discussion of the feature selection process and the individual importance of each molecular descriptor. According to our findings, anti-angiogenic activity peptides are strongly associated with amino acid sequences SP, LSL, PF, DIT, PC, GH, RQ, QD, TC, SC, AS, CLD, ST, MF, GRE, IQ, CQ and HG. |
format | Online Article Text |
id | pubmed-6200741 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-62007412018-10-25 Prediction of high anti-angiogenic activity peptides in silico using a generalized linear model and feature selection Blanco, Jose Liñares Porto-Pazos, Ana B. Pazos, Alejandro Fernandez-Lozano, Carlos Sci Rep Article Screening and in silico modeling are critical activities for the reduction of experimental costs. They also speed up research notably and strengthen the theoretical framework, thus allowing researchers to numerically quantify the importance of a particular subset of information. For example, in fields such as cancer and other highly prevalent diseases, having a reliable prediction method is crucial. The objective of this paper is to classify peptide sequences according to their anti-angiogenic activity to understand the underlying principles via machine learning. First, the peptide sequences were converted into three types of numerical molecular descriptors based on the amino acid composition. We performed different experiments with the descriptors and merged them to obtain baseline results for the performance of the models, particularly of each molecular descriptor subset. A feature selection process was applied to reduce the dimensionality of the problem and remove noisy features – which are highly present in biological problems. After a robust machine learning experimental design under equal conditions (nested resampling, cross-validation, hyperparameter tuning and different runs), we statistically and significantly outperformed the best previously published anti-angiogenic model with a generalized linear model via coordinate descent (glmnet), achieving a mean AUC value greater than 0.96 and with an accuracy of 0.86 with 200 molecular descriptors, mixed from the three groups. 
A final analysis with the top-40 discriminative anti-angiogenic activity peptides is presented along with a discussion of the feature selection process and the individual importance of each molecular descriptor. According to our findings, anti-angiogenic activity peptides are strongly associated with amino acid sequences SP, LSL, PF, DIT, PC, GH, RQ, QD, TC, SC, AS, CLD, ST, MF, GRE, IQ, CQ and HG. Nature Publishing Group UK 2018-10-24 /pmc/articles/PMC6200741/ /pubmed/30356060 http://dx.doi.org/10.1038/s41598-018-33911-z Text en © The Author(s) 2018 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Blanco, Jose Liñares Porto-Pazos, Ana B. Pazos, Alejandro Fernandez-Lozano, Carlos Prediction of high anti-angiogenic activity peptides in silico using a generalized linear model and feature selection |
title | Prediction of high anti-angiogenic activity peptides in silico using a generalized linear model and feature selection |
title_full | Prediction of high anti-angiogenic activity peptides in silico using a generalized linear model and feature selection |
title_fullStr | Prediction of high anti-angiogenic activity peptides in silico using a generalized linear model and feature selection |
title_full_unstemmed | Prediction of high anti-angiogenic activity peptides in silico using a generalized linear model and feature selection |
title_short | Prediction of high anti-angiogenic activity peptides in silico using a generalized linear model and feature selection |
title_sort | prediction of high anti-angiogenic activity peptides in silico using a generalized linear model and feature selection |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6200741/ https://www.ncbi.nlm.nih.gov/pubmed/30356060 http://dx.doi.org/10.1038/s41598-018-33911-z |
work_keys_str_mv | AT blancojoselinares predictionofhighantiangiogenicactivitypeptidesinsilicousingageneralizedlinearmodelandfeatureselection AT portopazosanab predictionofhighantiangiogenicactivitypeptidesinsilicousingageneralizedlinearmodelandfeatureselection AT pazosalejandro predictionofhighantiangiogenicactivitypeptidesinsilicousingageneralizedlinearmodelandfeatureselection AT fernandezlozanocarlos predictionofhighantiangiogenicactivitypeptidesinsilicousingageneralizedlinearmodelandfeatureselection |
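
The description above says that peptide sequences were converted into three types of numerical descriptors based on amino acid composition, and the motifs it lists are two or three residues long. The record does not name the three descriptor types, so the following is only a minimal sketch under the assumption that they resemble k-mer frequencies for k = 1, 2 and 3. The function names and the example peptide are hypothetical and are not taken from the article.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_composition(sequence, k):
    """Relative frequency of every possible k-mer over the standard alphabet.

    Windows containing non-standard letters still count toward the total
    number of windows, but they get no feature of their own.
    """
    sequence = sequence.upper()
    total = len(sequence) - k + 1  # number of overlapping windows
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    if total <= 0:
        # Sequence shorter than k: every frequency is zero.
        return {kmer: 0.0 for kmer in kmers}
    counts = Counter(sequence[i:i + k] for i in range(total))
    return {kmer: counts[kmer] / total for kmer in kmers}


def describe(sequence):
    """Merge the k = 1, 2, 3 compositions into one feature dictionary.

    Keys of different lengths cannot collide, so a plain update is safe.
    """
    features = {}
    for k in (1, 2, 3):
        features.update(kmer_composition(sequence, k))
    return features


# Hypothetical peptide built from motifs named in the description.
features = describe("SPLSLPF")
print(len(features), features["SP"], features["LSL"])
```

A feature table built this way has 20 + 400 + 8000 = 8420 columns per peptide, which illustrates why the description mentions a feature selection step before fitting a regularized model such as glmnet.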