Predicting the host of influenza viruses based on the word vector

Newly emerging influenza viruses continue to threaten public health. A rapid determination of the host range of newly discovered influenza viruses would assist in early assessment of their risk. Here, we attempted to predict the host of influenza viruses using the Support Vector Machine (SVM) classi...

Bibliographic Details
Main Authors: Xu, Beibei, Tan, Zhiying, Li, Kenli, Jiang, Taijiao, Peng, Yousong
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2017
Subjects: Bioinformatics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5518728/
https://www.ncbi.nlm.nih.gov/pubmed/28729956
http://dx.doi.org/10.7717/peerj.3579
author Xu, Beibei
Tan, Zhiying
Li, Kenli
Jiang, Taijiao
Peng, Yousong
collection PubMed
description Newly emerging influenza viruses continue to threaten public health. A rapid determination of the host range of newly discovered influenza viruses would assist in early assessment of their risk. Here, we attempted to predict the host of influenza viruses using a Support Vector Machine (SVM) classifier based on the word vector, a new representation and feature extraction method for biological sequences. The results show that the length of the word within the word vector, the sequence type (DNA or protein) and the species from which the sequences were derived for generating the word vector all influence the performance of models in predicting the host of influenza viruses. In nearly all cases, the models built on the surface proteins hemagglutinin (HA) and neuraminidase (NA) (or their genes) produced better results than those built on internal influenza proteins (or their genes). The best performance was achieved when the model was built on the HA gene based on word vectors (words three letters long) generated from DNA sequences of the influenza virus. This model achieved accuracies of 99.7% for avian, 96.9% for human and 90.6% for swine influenza viruses. Compared with sequence homology best-hit searches using the Basic Local Alignment Search Tool (BLAST), however, the word vector-based models still need further improvement in predicting the host of influenza A viruses.
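The feature extraction step described above can be illustrated with a minimal sketch. It assumes an overlapping sliding window for cutting a DNA sequence into three-letter words and a simple average of the word vectors as the per-sequence feature; the record states neither choice, so both are assumptions. The function names and the toy two-dimensional embeddings are hypothetical and invented for illustration. In the study, the word vectors would be learned from influenza sequences and the resulting features passed to an SVM classifier.

```python
def split_into_words(sequence, k=3, step=1):
    """Cut a sequence into words of length k (overlapping when step < k)."""
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, step)]


def average_vector(words, embeddings):
    """Average the vectors of the known words; unknown words are skipped."""
    known = [embeddings[w] for w in words if w in embeddings]
    if not known:
        return []
    dim = len(known[0])
    return [sum(v[d] for v in known) / len(known) for d in range(dim)]


# Toy embeddings, invented for illustration only (not learned from data).
toy_embeddings = {
    "ATG": [1.0, 0.0],
    "TGA": [0.0, 1.0],
    "GAA": [1.0, 1.0],
}

words = split_into_words("atgaa")          # three overlapping 3-letter words
features = average_vector(words, toy_embeddings)
print(words, features)
```

Setting `step=3` would give non-overlapping words instead; which variant matches the published method is not stated in this record.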
format Online
Article
Text
id pubmed-5518728
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-5518728 2017-07-20 Predicting the host of influenza viruses based on the word vector Xu, Beibei; Tan, Zhiying; Li, Kenli; Jiang, Taijiao; Peng, Yousong PeerJ Bioinformatics PeerJ Inc. 2017-07-18 /pmc/articles/PMC5518728/ /pubmed/28729956 http://dx.doi.org/10.7717/peerj.3579 Text en ©2017 Xu et al.
http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title Predicting the host of influenza viruses based on the word vector
topic Bioinformatics