
Machine learning methods can replace 3D profile method in classification of amyloidogenic hexapeptides

BACKGROUND: Amyloids are proteins capable of forming fibrils. Many of them underlie serious diseases, such as Alzheimer's disease. The number of amyloid-associated diseases is constantly increasing. Recent studies indicate that amyloidogenic properties can be associated with short segments of amino acids,...


Bibliographic Details
Main Authors: Stanislawski, Jerzy; Kotulska, Malgorzata; Unold, Olgierd
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3566972/
https://www.ncbi.nlm.nih.gov/pubmed/23327628
http://dx.doi.org/10.1186/1471-2105-14-21
author Stanislawski, Jerzy
Kotulska, Malgorzata
Unold, Olgierd
collection PubMed
description BACKGROUND: Amyloids are proteins capable of forming fibrils. Many of them underlie serious diseases, such as Alzheimer's disease. The number of amyloid-associated diseases is constantly increasing. Recent studies indicate that amyloidogenic properties can be associated with short segments of amino acids, which transform the structure when exposed. A few hundred such peptides have been found experimentally. Experimental testing of all possible amino acid combinations is currently not feasible. Instead, they can be predicted by computational methods. 3D profile is a physicochemical-based method that has generated the largest dataset, ZipperDB. However, it is computationally very demanding. Here, we show that dataset generation can be accelerated. Two methods to increase the classification efficiency of amyloidogenic candidates are presented and tested: simplified 3D profile generation and machine learning methods.
RESULTS: We generated a new dataset of hexapeptides using a more economical 3D profile algorithm, which showed very good classification overlap with ZipperDB (93.5%). The new part of our dataset contains 1779 segments, with 204 classified as amyloidogenic. The dataset of 6-residue sequences with their binary classification, based on the energy of the segment, was used to train machine learning methods. A separate set of sequences from ZipperDB was used as a test set. The most effective methods were Alternating Decision Tree and Multilayer Perceptron. Both methods obtained an area under the ROC curve of 0.96, accuracy of 91%, true positive rate of ca. 78%, and true negative rate of 95%. A few other machine learning methods also achieved good performance. The computational time was reduced from 18-20 CPU-hours (full 3D profile) to 0.5 CPU-hours (simplified 3D profile) to seconds (machine learning).
CONCLUSIONS: We showed that the simplified profile generation method does not introduce an error with regard to the original method, while increasing the computational efficiency. Our new dataset proved representative enough to use simple statistical methods for testing amyloidogenicity based only on six-letter sequences. Statistical machine learning methods such as Alternating Decision Tree and Multilayer Perceptron can replace the energy-based classifier, with the advantage of greatly reduced computational time and a simpler analysis. Additionally, a decision tree provides a set of very easily interpretable rules.
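Illustrative note: the abstract reports accuracy, true positive rate, and true negative rate for classifiers trained on six-letter sequences. The sketch below shows, under stated assumptions, how such a sequence might be turned into a numeric vector and how the three rates follow from predicted and true labels. The one-hot encoding, the function names `one_hot` and `binary_rates`, the example sequence, and the toy labels are all hypothetical and are not taken from the article; the record does not say how the authors encoded their inputs.

```python
# Minimal sketch, not the authors' pipeline. Encoding and labels are assumptions.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def one_hot(hexapeptide):
    """Encode a 6-residue sequence as a 6 x 20 = 120-element 0/1 vector."""
    if len(hexapeptide) != 6:
        raise ValueError("expected exactly six residues")
    vector = []
    for residue in hexapeptide.upper():
        row = [0] * len(AMINO_ACIDS)
        # str.index raises ValueError for letters outside the alphabet
        row[AMINO_ACIDS.index(residue)] = 1
        vector.extend(row)
    return vector


def binary_rates(y_true, y_pred):
    """Return accuracy, true positive rate, and true negative rate.

    Labels are 1 (amyloidogenic) or 0 (non-amyloidogenic).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    positives = tp + fn
    negatives = tn + fp
    return {
        "accuracy": (tp + tn) / len(pairs),
        "tpr": tp / positives if positives else float("nan"),
        "tnr": tn / negatives if negatives else float("nan"),
    }


if __name__ == "__main__":
    # Number of distinct hexapeptides over 20 residues: 20**6
    print(len(AMINO_ACIDS) ** 6)      # 64000000
    print(len(one_hot("VQIVYK")))     # 120
    # Toy labels, made up purely to exercise the function
    print(binary_rates([1, 1, 0, 0, 0], [1, 0, 0, 0, 1]))
```

The count of 64 million possible hexapeptides gives a sense of why exhaustive experimental testing is described as infeasible and why per-sequence prediction time matters.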
format Online
Article
Text
id pubmed-3566972
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3566972 2013-02-11 Machine learning methods can replace 3D profile method in classification of amyloidogenic hexapeptides Stanislawski, Jerzy Kotulska, Malgorzata Unold, Olgierd BMC Bioinformatics Research Article BioMed Central 2013-01-17 /pmc/articles/PMC3566972/ /pubmed/23327628 http://dx.doi.org/10.1186/1471-2105-14-21 Text en Copyright ©2013 Stanislawski et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Machine learning methods can replace 3D profile method in classification of amyloidogenic hexapeptides
topic Research Article