Predicting influenza antigenicity from Hemagglutintin sequence data based on a joint random forest method

Timely identification of emerging antigenic variants is critical to influenza vaccine design. The accuracy of a sequence-based antigenic prediction method relies on the choice of amino acid substitution matrices. In this study, we first compared a comprehensive set of 95 substitution matrices reflecting v...

Bibliographic Details
Main Authors: Yao, Yuhua, Li, Xianhong, Liao, Bo, Huang, Li, He, Pingan, Wang, Fayou, Yang, Jiasheng, Sun, Hailiang, Zhao, Yulong, Yang, Jialiang
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5431489/
https://www.ncbi.nlm.nih.gov/pubmed/28484283
http://dx.doi.org/10.1038/s41598-017-01699-z
author Yao, Yuhua
Li, Xianhong
Liao, Bo
Huang, Li
He, Pingan
Wang, Fayou
Yang, Jiasheng
Sun, Hailiang
Zhao, Yulong
Yang, Jialiang
author_sort Yao, Yuhua
collection PubMed
description Timely identification of emerging antigenic variants is critical to influenza vaccine design. The accuracy of a sequence-based antigenic prediction method relies on the choice of amino acid substitution matrices. In this study, we first compared a comprehensive set of 95 substitution matrices reflecting various amino acid properties in predicting the antigenicity of influenza viruses using a random forest model. We then proposed a novel algorithm called joint random forest regression (JRFR) to jointly consider the top substitution matrices. We applied JRFR to human H3N2 seasonal influenza data from 1968 to 2003. A 10-fold cross-validation shows that JRFR outperforms other popular methods in predicting antigenic variants. In addition, our results suggest that structural features are most relevant to influenza antigenicity. By restricting the analysis to data involving two adjacent antigenic clusters, we inferred a few key amino acid mutations driving the 11 historical antigenic drift events, pointing to experimentally validated mutations. Finally, we constructed an antigenic cartography of all H3N2 viruses with hemagglutinin (the glycoprotein on the surface of the influenza virus responsible for its binding to host cells) sequences available from the NCBI flu database, and showed an overall correspondence and local inconsistency between genetic and antigenic evolution of H3N2 influenza viruses.
format Online
Article
Text
id pubmed-5431489
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-5431489 2017-05-16 Predicting influenza antigenicity from Hemagglutintin sequence data based on a joint random forest method Yao, Yuhua Li, Xianhong Liao, Bo Huang, Li He, Pingan Wang, Fayou Yang, Jiasheng Sun, Hailiang Zhao, Yulong Yang, Jialiang Sci Rep Article
Nature Publishing Group UK 2017-05-08 /pmc/articles/PMC5431489/ /pubmed/28484283 http://dx.doi.org/10.1038/s41598-017-01699-z Text en © The Author(s) 2017 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Predicting influenza antigenicity from Hemagglutintin sequence data based on a joint random forest method
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5431489/
https://www.ncbi.nlm.nih.gov/pubmed/28484283
http://dx.doi.org/10.1038/s41598-017-01699-z