Non-H3 CDR template selection in antibody modeling through machine learning

Antibodies are proteins generated by the adaptive immune system to recognize and counteract a plethora of pathogens through specific binding. This adaptive binding is mediated by structural diversity in the six complementarity-determining region (CDR) loops (H1, H2, H3, L1, L2 and L3), which also make...

Bibliographic Details
Main Authors:	Long, Xiyao; Jeliazkov, Jeliazko R.; Gray, Jeffrey J.
Format:	Online Article Text
Language:	English
Published:	PeerJ Inc. 2019
Subjects:	Bioinformatics
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6330961/ https://www.ncbi.nlm.nih.gov/pubmed/30648015 http://dx.doi.org/10.7717/peerj.6179

author	Long, Xiyao; Jeliazkov, Jeliazko R.; Gray, Jeffrey J.
collection	PubMed
description	Antibodies are proteins generated by the adaptive immune system to recognize and counteract a plethora of pathogens through specific binding. This adaptive binding is mediated by structural diversity in the six complementarity-determining region (CDR) loops (H1, H2, H3, L1, L2 and L3), which also makes accurate structural modeling of CDRs challenging. Both homology and de novo modeling approaches have been used; to date, the former has achieved greater accuracy for the non-H3 loops. Homology modeling of non-H3 CDRs is more accurate because non-H3 CDR loops of the same length and type can be grouped into a few structural clusters. Most antibody-modeling suites use homology modeling for the non-H3 CDRs, differing only in the alignment algorithm and in whether and how they use structural clusters. While RosettaAntibody and SAbPred do not explicitly assign query CDR sequences to clusters, two other approaches, PIGS and Kotai Antibody Builder, use sequence-based rules to assign CDR sequences to clusters. Manually curated sequence rules can identify better structural templates, but because their curation requires extensive literature search and human effort, they lag behind the deposition of new antibody structures and are infrequently updated. In this study, we propose a machine learning approach (Gradient Boosting Machine [GBM]) to learn the structural clusters of non-H3 CDRs from sequence alone. Compared to manual sequence rule curation, the GBM method simplifies feature selection and can easily integrate new data. We compare the classification results of the GBM method to those of RosettaAntibody in a 3-repeat 10-fold cross-validation (CV) scheme on the cluster-annotated antibody database PyIgClassify, and we observe an improvement in the classification accuracy of the concerned loops from 84.5% ± 0.24% to 88.16% ± 0.056%. The GBM models reduce the errors in specific cluster membership misclassifications when the involved clusters have relatively abundant data. Based on the factors identified, we suggest methods that can enrich structural classes with sparse data to further improve prediction accuracy in future studies.
format	Online Article Text
id	pubmed-6330961
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	PeerJ Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-6330961 2019-01-15 PeerJ Bioinformatics PeerJ Inc. 2019-01-11 /pmc/articles/PMC6330961/ /pubmed/30648015 http://dx.doi.org/10.7717/peerj.6179 Text en ©2019 Long et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title	Non-H3 CDR template selection in antibody modeling through machine learning
topic	Bioinformatics
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6330961/ https://www.ncbi.nlm.nih.gov/pubmed/30648015 http://dx.doi.org/10.7717/peerj.6179
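
The evaluation protocol named in the abstract (3-repeat, 10-fold cross-validation with accuracy reported as mean ± spread) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the sequences and cluster labels are synthetic, every function name is hypothetical, the spread is assumed to be the standard deviation across repeats, and a simple nearest-template-by-sequence-identity classifier stands in for both the GBM and RosettaAntibody's template selection.

```python
import random
import statistics
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def make_toy_data(n_per_cluster=30, seed=1):
    """Synthetic CDR-like sequences: two made-up motifs, one random substitution each."""
    rng = random.Random(seed)
    motifs = {"toy-cluster-1": "GFTFSSYA", "toy-cluster-2": "GYSITSDY"}  # hypothetical
    seqs, labels = [], []
    for label, motif in motifs.items():
        for _ in range(n_per_cluster):
            residues = list(motif)
            residues[rng.randrange(len(residues))] = rng.choice(AMINO_ACIDS)
            seqs.append("".join(residues))
            labels.append(label)
    return seqs, labels


def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def nearest_template_cluster(query, templates):
    """Stand-in classifier: return the cluster of the most similar training sequence."""
    _, best_label = max(templates, key=lambda t: identity(query, t[0]))
    return best_label


def stratified_folds(labels, k, rng):
    """Split sample indices into k folds, spreading each cluster evenly across folds."""
    by_label = defaultdict(list)
    for i, label in enumerate(labels):
        by_label[label].append(i)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        for j, i in enumerate(indices):
            folds[(offset + j) % k].append(i)
        offset += len(indices)
    return folds


def repeated_cv_accuracy(seqs, labels, k=10, repeats=3, seed=0):
    """Mean and standard deviation of accuracy over `repeats` reshuffled k-fold CV runs."""
    rng = random.Random(seed)
    per_repeat = []
    for _ in range(repeats):
        correct = 0
        for test_idx in stratified_folds(labels, k, rng):
            held_out = set(test_idx)
            train = [(seqs[i], labels[i]) for i in range(len(seqs)) if i not in held_out]
            correct += sum(
                nearest_template_cluster(seqs[i], train) == labels[i] for i in test_idx
            )
        per_repeat.append(correct / len(seqs))
    return statistics.mean(per_repeat), statistics.stdev(per_repeat)


if __name__ == "__main__":
    toy_seqs, toy_labels = make_toy_data()
    mean_acc, std_acc = repeated_cv_accuracy(toy_seqs, toy_labels)
    print(f"accuracy: {mean_acc:.3f} ± {std_acc:.3f}")
```

To evaluate a gradient boosting model under the same protocol, the stand-in classifier could be replaced by any model trained on encoded sequences (for example, one-hot vectors per residue position); that substitution is not shown here.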
