Non-H3 CDR template selection in antibody modeling through machine learning
Main authors: | Long, Xiyao; Jeliazkov, Jeliazko R.; Gray, Jeffrey J. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc., 2019 |
Subjects: | Bioinformatics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6330961/ https://www.ncbi.nlm.nih.gov/pubmed/30648015 http://dx.doi.org/10.7717/peerj.6179 |
Field | Value |
---|---|
author | Long, Xiyao; Jeliazkov, Jeliazko R.; Gray, Jeffrey J. |
collection | PubMed |
description | Antibodies are proteins generated by the adaptive immune system to recognize and counteract a plethora of pathogens through specific binding. This adaptive binding is mediated by structural diversity in the six complementarity-determining region (CDR) loops (H1, H2, H3, L1, L2 and L3), which also makes accurate structural modeling of CDRs challenging. Both homology and de novo modeling approaches have been used; to date, the former has achieved greater accuracy for the non-H3 loops. Homology modeling of non-H3 CDRs is more accurate because non-H3 CDR loops of the same length and type can be grouped into a few structural clusters. Most antibody-modeling suites use homology modeling for the non-H3 CDRs, differing only in the alignment algorithm and in whether and how they use structural clusters. RosettaAntibody and SAbPred do not explicitly assign query CDR sequences to clusters, whereas two other approaches, PIGS and Kotai Antibody Builder, use sequence-based rules to assign CDR sequences to clusters. Although manually curated sequence rules can identify better structural templates, their curation requires extensive literature search and human effort, so they lag behind the deposition of new antibody structures and are infrequently updated. In this study, we propose a machine learning approach (Gradient Boosting Machine [GBM]) to learn the structural clusters of non-H3 CDRs from sequence alone. Compared with manual sequence-rule curation, the GBM method simplifies feature selection and can easily integrate new data. We compare the classification results of the GBM method with those of RosettaAntibody in a 3-repeat 10-fold cross-validation (CV) scheme on the cluster-annotated antibody database PyIgClassify and observe an improvement in the classification accuracy for these loops from 84.5% ± 0.24% to 88.16% ± 0.056%. The GBM models reduce specific cluster-membership misclassification errors when the clusters involved have relatively abundant data. Based on the factors identified, we suggest methods that can enrich structural classes with sparse data to further improve prediction accuracy in future studies. |
format | Online Article Text |
id | pubmed-6330961 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
journal | PeerJ |
publication_date | 2019-01-11 |
rights | ©2019 Long et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited. |
title | Non-H3 CDR template selection in antibody modeling through machine learning |
topic | Bioinformatics |
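The evaluation protocol named in the abstract (3-repeat 10-fold cross-validation, with accuracy reported as mean ± standard deviation) can be sketched in standard-library Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the per-position one-hot encoding, the pluggable classifier interface, and reporting the sample standard deviation of per-repeat accuracy are all assumptions, because the record does not specify them. The study used a Gradient Boosting Machine; here a hypothetical majority-class baseline (`majority_baseline`) stands in so the example runs without third-party packages, and the toy sequences and cluster labels are invented.

```python
import random
import statistics
from collections import Counter
from typing import Callable, List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# A classifier maps (train_X, train_y, test_X) -> predicted cluster labels.
Classifier = Callable[[Sequence[List[int]], Sequence[str], Sequence[List[int]]], List[str]]


def one_hot(seq: str) -> List[int]:
    """Assumed encoding: 20 indicator bits per residue position.
    Residues outside the 20 standard amino acids become all-zero blocks."""
    vec: List[int] = []
    for residue in seq:
        vec.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vec


def k_fold_indices(n: int, k: int, rng: random.Random) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them round-robin into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def majority_baseline(train_X, train_y, test_X) -> List[str]:
    """Hypothetical stand-in for the GBM: predict the most common training cluster."""
    label = Counter(train_y).most_common(1)[0][0]
    return [label] * len(test_X)


def repeated_cv_accuracy(X, y, classify: Classifier, k: int = 10,
                         repeats: int = 3, seed: int = 0) -> Tuple[float, float]:
    """Run `repeats` independent k-fold splits; each repeat's accuracy (%) is
    pooled over its k test folds. Returns (mean, sample std) across repeats."""
    rng = random.Random(seed)
    per_repeat: List[float] = []
    for _ in range(repeats):
        correct = 0
        for test_idx in k_fold_indices(len(X), k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(X)) if i not in held_out]
            preds = classify([X[i] for i in train_idx],
                             [y[i] for i in train_idx],
                             [X[i] for i in test_idx])
            correct += sum(p == y[i] for p, i in zip(preds, test_idx))
        per_repeat.append(100.0 * correct / len(X))
    return statistics.mean(per_repeat), statistics.stdev(per_repeat)


# Toy data (invented for illustration): 15 sequences in cluster "A", 5 in "B".
seqs = ["ACDEFGHIKLM"] * 15 + ["ACDEFGHIKLW"] * 5
labels = ["A"] * 15 + ["B"] * 5
X = [one_hot(s) for s in seqs]
mean_acc, std_acc = repeated_cv_accuracy(X, labels, majority_baseline, k=10, repeats=3)
print(f"{mean_acc:.2f}% +/- {std_acc:.2f}%")  # prints "75.00% +/- 0.00%"
```

Because the toy set is 75% cluster "A" and every 18-item training split keeps "A" as the majority, the baseline scores 75% in every repeat. To approximate the GBM setup, one could wrap scikit-learn's `GradientBoostingClassifier` in a function with the same `(train_X, train_y, test_X)` signature; stratified folds (for example scikit-learn's `RepeatedStratifiedKFold`) would be a sensible refinement when some clusters are rare.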