
Comparative studies of alignment, alignment-free and SVM based approaches for predicting the hosts of viruses based on viral sequences

Predicting the hosts of newly discovered viruses is important for pandemic surveillance of infectious diseases. We investigated the use of alignment-based and alignment-free methods and support vector machine using mononucleotide frequency and dinucleotide bias to predict the hosts of viruses, and a...


Bibliographic Details
Main authors: Li, Han; Sun, Fengzhu
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2018
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6030160/
https://www.ncbi.nlm.nih.gov/pubmed/29968780
http://dx.doi.org/10.1038/s41598-018-28308-x
author Li, Han
Sun, Fengzhu
collection PubMed
description Predicting the hosts of newly discovered viruses is important for pandemic surveillance of infectious diseases. We investigated the use of alignment-based and alignment-free methods and support vector machine using mononucleotide frequency and dinucleotide bias to predict the hosts of viruses, and applied these approaches to three datasets: rabies virus, coronavirus, and influenza A virus. For coronavirus, we used the spike gene sequences, while for rabies and influenza A viruses, we used the more conserved nucleoprotein gene sequences. We compared the three methods under different scenarios and showed that their performances are highly correlated with the variability of sequences and sample size. For conserved genes like the nucleoprotein gene, longer k-mers than mono- and dinucleotides are needed to better distinguish the sequences. We also showed that both alignment-based and alignment-free methods can accurately predict the hosts of viruses. When alignment is difficult to achieve or highly time-consuming, alignment-free methods can be a promising substitute to predict the hosts of new viruses.
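The two sequence features named in the description can be illustrated with a minimal sketch. It assumes the common odds-ratio definition of dinucleotide bias (observed dinucleotide frequency divided by the product of the two mononucleotide frequencies); the record itself does not state the formula, and the function names below are hypothetical, not taken from the article.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def mononucleotide_frequency(seq):
    """Return the relative frequency of A, C, G and T in seq (other symbols ignored)."""
    seq = seq.upper()
    counts = Counter(b for b in seq if b in BASES)
    total = sum(counts.values())
    return {b: (counts[b] / total if total else 0.0) for b in BASES}


def dinucleotide_bias(seq):
    """Return f_xy / (f_x * f_y) for all 16 dinucleotides.

    Assumption: this odds-ratio form is one common definition of
    dinucleotide bias; the catalog record does not specify the formula.
    """
    seq = seq.upper()
    mono = mononucleotide_frequency(seq)
    # Overlapping pairs, keeping only those made of unambiguous bases.
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    pairs = [p for p in pairs if p[0] in BASES and p[1] in BASES]
    counts = Counter(pairs)
    total = len(pairs)
    bias = {}
    for x, y in product(BASES, repeat=2):
        expected = mono[x] * mono[y]
        observed = counts[x + y] / total if total else 0.0
        bias[x + y] = observed / expected if expected > 0 else 0.0
    return bias


def feature_vector(seq):
    """Concatenate 4 mononucleotide frequencies and 16 dinucleotide biases."""
    mono = mononucleotide_frequency(seq)
    bias = dinucleotide_bias(seq)
    return [mono[b] for b in BASES] + [bias[x + y] for x, y in product(BASES, repeat=2)]
```

A 20-dimensional vector of this kind could then be passed to any standard SVM implementation, with the host label as the target; the kernel and parameters would be a separate choice that the record does not describe.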
format Online
Article
Text
id pubmed-6030160
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-6030160
2018-07-11
Comparative studies of alignment, alignment-free and SVM based approaches for predicting the hosts of viruses based on viral sequences
Li, Han
Sun, Fengzhu
Sci Rep
Article
Nature Publishing Group UK
2018-07-03
/pmc/articles/PMC6030160/
/pubmed/29968780
http://dx.doi.org/10.1038/s41598-018-28308-x
Text
en
© The Author(s) 2018
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Comparative studies of alignment, alignment-free and SVM based approaches for predicting the hosts of viruses based on viral sequences
topic Article