Using simple artificial intelligence methods for predicting amyloidogenesis in antibodies

BACKGROUND: All polypeptide backbones have the potential to form amyloid fibrils, which are associated with a number of degenerative disorders. However, the likelihood that amyloidosis would actually occur under physiological conditions depends largely on the amino acid composition of a protein. We...

Bibliographic Details
Main Authors: David, Maria Pamela C, Concepcion, Gisela P, Padlan, Eduardo A
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3098112/
https://www.ncbi.nlm.nih.gov/pubmed/20144194
http://dx.doi.org/10.1186/1471-2105-11-79
author David, Maria Pamela C
Concepcion, Gisela P
Padlan, Eduardo A
collection PubMed
description BACKGROUND: All polypeptide backbones have the potential to form amyloid fibrils, which are associated with a number of degenerative disorders. However, the likelihood that amyloidosis will actually occur under physiological conditions depends largely on the amino acid composition of a protein. We explore using a naive Bayesian classifier and a weighted decision tree for predicting the amyloidogenicity of immunoglobulin sequences. RESULTS: The average accuracy based on leave-one-out (LOO) cross validation of a Bayesian classifier generated from 143 amyloidogenic sequences is 60.84%. This is consistent with the average accuracy of 61.15% for a holdout test set composed of 103 amyloidogenic (AM) and 28 non-amyloidogenic sequences. The LOO cross validation accuracy increases to 81.08% when the training set is augmented by the holdout test set. In comparison, the average classification accuracy for the holdout test set obtained using a decision tree is 78.64%. Non-amyloidogenic sequences are predicted with average LOO cross validation accuracies between 74.05% and 77.24% using the Bayesian classifier, depending on the training set size. The accuracy for the holdout test set is 89%. For the decision tree, the non-amyloidogenic prediction accuracy is 75.00%. CONCLUSIONS: This exploratory study indicates that both classification methods may be promising in providing straightforward predictions of the amyloidogenicity of a sequence. Nevertheless, the number of available sequences that satisfy the premises of this study is limited, and is consequently smaller than the ideal training set size. Increasing the size of the training set clearly increases the accuracy, and expanding the training set to include not only more derivatives, but also more alignments, would make the method more sound. The accuracy of the classifiers may also improve when additional factors, such as structural and physico-chemical data, are considered. The development of this type of classifier has significant applications in evaluating engineered antibodies, and may be adapted for evaluating engineered proteins in general.
format Text
id pubmed-3098112
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3098112 2011-05-20 BMC Bioinformatics Research Article BioMed Central 2010-02-08 /pmc/articles/PMC3098112/ /pubmed/20144194 http://dx.doi.org/10.1186/1471-2105-11-79 Text en Copyright ©2010 David et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Using simple artificial intelligence methods for predicting amyloidogenesis in antibodies
topic Research Article
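
The abstract reports leave-one-out (LOO) cross validation accuracies for a naive Bayesian classifier. The following is a minimal sketch of how such an accuracy can be computed, not the authors' implementation. It assumes a multinomial naive Bayes model over raw amino acid composition with add-one smoothing. The toy sequences, the "NAM" label for non-amyloidogenic sequences, and the function names `train`, `predict`, and `loo_accuracy` are all hypothetical and exist only for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def train(samples):
    """Fit a multinomial naive Bayes model on (sequence, label) pairs."""
    label_counts = Counter(label for _, label in samples)
    residue_counts = {label: Counter() for label in label_counts}
    for seq, label in samples:
        residue_counts[label].update(ch for ch in seq if ch in AMINO_ACIDS)
    model = {}
    for label, n in label_counts.items():
        total = sum(residue_counts[label].values())
        # Add-one (Laplace) smoothing keeps unseen residues at nonzero probability.
        log_lik = {
            aa: math.log((residue_counts[label][aa] + 1) / (total + len(AMINO_ACIDS)))
            for aa in AMINO_ACIDS
        }
        model[label] = (math.log(n / len(samples)), log_lik)
    return model


def predict(model, seq):
    """Return the label with the highest log posterior score for seq."""
    def score(label):
        log_prior, log_lik = model[label]
        return log_prior + sum(log_lik[ch] for ch in seq if ch in AMINO_ACIDS)
    return max(sorted(model), key=score)


def loo_accuracy(samples):
    """Hold out each sample once, train on the rest, and average the hits."""
    correct = 0
    for i, (seq, label) in enumerate(samples):
        model = train(samples[:i] + samples[i + 1:])
        correct += predict(model, seq) == label
    return correct / len(samples)


# Hypothetical toy data, NOT sequences from the study.
TOY = [
    ("VVVIVIVV", "AM"), ("IVVVIIVV", "AM"), ("VIVIVVII", "AM"),
    ("DEKDEKDE", "NAM"), ("KDEEKDKD", "NAM"), ("EEDKKDED", "NAM"),
]
accuracy = loo_accuracy(TOY)
print(f"LOO accuracy: {accuracy:.2%}")
```

Because each held-out sequence is scored by a model that never saw it, the LOO figure estimates performance on unseen data while still using nearly the whole set for training, which matters when few sequences are available.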