
Machine Learning Methods for Predicting Human-Adaptive Influenza A Viruses Based on Viral Nucleotide Compositions

Bibliographic Details
Main authors: Li, Jing; Zhang, Sen; Li, Bo; Hu, Yi; Kang, Xiao-Ping; Wu, Xiao-Yan; Huang, Meng-Ting; Li, Yu-Chang; Zhao, Zhong-Peng; Qin, Cheng-Feng; Jiang, Tao
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7086167/
https://www.ncbi.nlm.nih.gov/pubmed/31750915
http://dx.doi.org/10.1093/molbev/msz276
collection PubMed
description Each influenza pandemic was caused at least partly by avian- and/or swine-origin influenza A viruses (IAVs). Both the timing of the next pandemic and the IAVs likely to be involved in it are currently unpredictable. We aimed to build machine learning (ML) models that predict human-adaptive IAVs from their nucleotide composition. A total of 217,549 IAV full-length coding sequences of the PB2 (polymerase basic protein-2), PB1, PA (polymerase acidic protein), HA (hemagglutinin), NP (nucleoprotein), and NA (neuraminidase) segments were decomposed into their codon position-based mononucleotide (12 nts) and dinucleotide (48 dnts) compositions. A total of 68,742 human and 68,739 avian sequences were resampled at a 1:1 ratio to characterize the human adaptation-associated (d)nts using principal component analysis (PCA) and other ML models. The human adaptation of IAV sequences was then predicted from the characterized (d)nts. For the six segments, 9, 12, 11, 13, 10, and 9 human-adaptive (d)nts were optimized, respectively. PCA and hierarchical clustering analysis showed that the optimized (d)nts linearly separated the human-adaptive and avian-adaptive sets. Confusion matrices and the area under the receiver operating characteristic curve indicated that the ML models predicted the human adaptation of IAVs with high performance. Our model also performed well in predicting the human adaptation of swine/avian IAVs before and after the 2009 H1N1 pandemic. In conclusion, we identified the human adaptation-associated genomic composition of IAV segments. ML models trained on large IAV genomic data sets to predict human adaptation can facilitate the identification of key viral factors that affect virus transmission and pathogenicity. Most importantly, this approach allows the prediction of pandemic influenza.
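The feature decomposition and 1:1 resampling described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code. The abstract states only that each coding sequence yields 12 codon position-based mononucleotides and 48 dinucleotides; the assumption here is that the 48 dnts are the 16 dinucleotides in three frames (codon positions 1-2, 2-3, and 3 of one codon paired with 1 of the next), each normalized to a frequency. The function names `composition_features` and `balance_one_to_one` are hypothetical. Note that the abstract's counts (68,742 vs 68,739) are not exactly equal, so the authors' actual resampling scheme likely differs from the simple downsampling shown.

```python
from itertools import product
import random

NTS = "ACGT"
# Assumed dinucleotide frames as (first codon position, second codon position);
# (3, 1) pairs the 3rd base of one codon with the 1st base of the next codon.
DNT_FRAMES = ((1, 2), (2, 3), (3, 1))


def composition_features(cds: str) -> dict[str, float]:
    """Return 12 codon-position mononucleotide and 48 dinucleotide frequencies.

    Hypothetical helper: keys look like "A1" (A at codon position 1) and
    "AT12" (dinucleotide AT spanning codon positions 1 and 2).
    """
    seq = cds.upper()
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    n_codons = len(seq) // 3
    feats: dict[str, float] = {}

    # 4 nucleotides x 3 codon positions = 12 features.
    # Denominator is the codon count, so ambiguous bases (e.g. N) lower the sums.
    for pos in (1, 2, 3):
        bases = seq[pos - 1::3]
        for nt in NTS:
            feats[f"{nt}{pos}"] = bases.count(nt) / n_codons

    # 16 dinucleotides x 3 frames = 48 features.
    for p1, p2 in DNT_FRAMES:
        # Index of the first base of each pair in this frame; the 3-1 frame
        # has one pair fewer because the last codon has no successor.
        starts = range(p1 - 1, len(seq) - 1, 3)
        pairs = [seq[i:i + 2] for i in starts]
        for a, b in product(NTS, repeat=2):
            dnt = a + b
            feats[f"{dnt}{p1}{p2}"] = pairs.count(dnt) / len(pairs) if pairs else 0.0
    return feats


def balance_one_to_one(human: list, avian: list, seed: int = 0) -> tuple[list, list]:
    """Downsample the larger class at random so both classes have equal size."""
    rng = random.Random(seed)
    n = min(len(human), len(avian))
    return rng.sample(human, n), rng.sample(avian, n)


if __name__ == "__main__":
    f = composition_features("ATGGCC")
    print(len(f), f["A1"], f["AT12"], f["GG31"])
```

Each sequence thus becomes a 60-dimensional vector that could be passed to PCA, hierarchical clustering, or a classifier; the abstract does not specify the per-segment feature subsets or model settings beyond naming those methods.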
id pubmed-7086167
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Mol Biol Evol
issue_date 2020-04
epub_date 2019-11-21
rights © The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Methods