Predicting influenza antigenicity from Hemagglutintin sequence data based on a joint random forest method
Timely identification of emerging antigenic variants is critical to influenza vaccine design. The accuracy of a sequence-based antigenic prediction method relies on the choice of amino acid substitution matrices. In this study, we first compared a comprehensive set of 95 substitution matrices reflecting v...
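Main Authors: | Yao, Yuhua; Li, Xianhong; Liao, Bo; Huang, Li; He, Pingan; Wang, Fayou; Yang, Jiasheng; Sun, Hailiang; Zhao, Yulong; Yang, Jialiang |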
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK 2017 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5431489/ https://www.ncbi.nlm.nih.gov/pubmed/28484283 http://dx.doi.org/10.1038/s41598-017-01699-z |
_version_ | 1783236437175435264 |
---|---|
author | Yao, Yuhua Li, Xianhong Liao, Bo Huang, Li He, Pingan Wang, Fayou Yang, Jiasheng Sun, Hailiang Zhao, Yulong Yang, Jialiang |
author_facet | Yao, Yuhua Li, Xianhong Liao, Bo Huang, Li He, Pingan Wang, Fayou Yang, Jiasheng Sun, Hailiang Zhao, Yulong Yang, Jialiang |
author_sort | Yao, Yuhua |
collection | PubMed |
description | Timely identification of emerging antigenic variants is critical to influenza vaccine design. The accuracy of a sequence-based antigenic prediction method relies on the choice of amino acid substitution matrices. In this study, we first compared a comprehensive set of 95 substitution matrices reflecting various amino acid properties in predicting the antigenicity of influenza viruses by a random forest model. We then proposed a novel algorithm called joint random forest regression (JRFR) to jointly consider top substitution matrices. We applied JRFR to human H3N2 seasonal influenza data from 1968 to 2003. A 10-fold cross-validation shows that JRFR outperforms other popular methods in predicting antigenic variants. In addition, our results suggest that structural features are most relevant to influenza antigenicity. By restricting the analysis to data involving two adjacent antigenic clusters, we inferred a few key amino acid mutations driving the 11 historical antigenic drift events, pointing to experimentally validated mutations. Finally, we constructed an antigenic cartography of all H3N2 viruses with hemagglutinin (the glycoprotein on the surface of the influenza virus responsible for its binding to host cells) sequences available from the NCBI flu database, and showed an overall correspondence and local inconsistency between genetic and antigenic evolution of H3N2 influenza viruses. |
format | Online Article Text |
id | pubmed-5431489 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-54314892017-05-16 Predicting influenza antigenicity from Hemagglutintin sequence data based on a joint random forest method Yao, Yuhua Li, Xianhong Liao, Bo Huang, Li He, Pingan Wang, Fayou Yang, Jiasheng Sun, Hailiang Zhao, Yulong Yang, Jialiang Sci Rep Article Timely identification of emerging antigenic variants is critical to influenza vaccine design. The accuracy of a sequence-based antigenic prediction method relies on the choice of amino acids substitution matrices. In this study, we first compared a comprehensive 95 substitution matrices reflecting various amino acids properties in predicting the antigenicity of influenza viruses by a random forest model. We then proposed a novel algorithm called joint random forest regression (JRFR) to jointly consider top substitution matrices. We applied JRFR to human H3N2 seasonal influenza data from 1968 to 2003. A 10-fold cross-validation shows that JRFR outperforms other popular methods in predicting antigenic variants. In addition, our results suggest that structure features are most relevant to influenza antigenicity. By restricting the analysis to data involving two adjacent antigenic clusters, we inferred a few key amino acids mutation driving the 11 historical antigenic drift events, pointing to experimentally validated mutations. Finally, we constructed an antigenic cartography of all H3N2 viruses with hemagglutinin (the glycoprotein on the surface of the influenza virus responsible for its binding to host cells) sequence available from NCBI flu database, and showed an overall correspondence and local inconsistency between genetic and antigenic evolution of H3N2 influenza viruses. 
Nature Publishing Group UK 2017-05-08 /pmc/articles/PMC5431489/ /pubmed/28484283 http://dx.doi.org/10.1038/s41598-017-01699-z Text en © The Author(s) 2017 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Yao, Yuhua Li, Xianhong Liao, Bo Huang, Li He, Pingan Wang, Fayou Yang, Jiasheng Sun, Hailiang Zhao, Yulong Yang, Jialiang Predicting influenza antigenicity from Hemagglutintin sequence data based on a joint random forest method |
title | Predicting influenza antigenicity from Hemagglutintin sequence data based on a joint random forest method |
title_full | Predicting influenza antigenicity from Hemagglutintin sequence data based on a joint random forest method |
title_fullStr | Predicting influenza antigenicity from Hemagglutintin sequence data based on a joint random forest method |
title_full_unstemmed | Predicting influenza antigenicity from Hemagglutintin sequence data based on a joint random forest method |
title_short | Predicting influenza antigenicity from Hemagglutintin sequence data based on a joint random forest method |
title_sort | predicting influenza antigenicity from hemagglutintin sequence data based on a joint random forest method |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5431489/ https://www.ncbi.nlm.nih.gov/pubmed/28484283 http://dx.doi.org/10.1038/s41598-017-01699-z |