Comparative studies of alignment, alignment-free and SVM based approaches for predicting the hosts of viruses based on viral sequences
Predicting the hosts of newly discovered viruses is important for pandemic surveillance of infectious diseases. We investigated the use of alignment-based and alignment-free methods and support vector machine using mononucleotide frequency and dinucleotide bias to predict the hosts of viruses, and a...
Main authors: | Li, Han; Sun, Fengzhu |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6030160/ https://www.ncbi.nlm.nih.gov/pubmed/29968780 http://dx.doi.org/10.1038/s41598-018-28308-x |
_version_ | 1783337089222311936 |
---|---|
author | Li, Han Sun, Fengzhu |
author_facet | Li, Han Sun, Fengzhu |
author_sort | Li, Han |
collection | PubMed |
description | Predicting the hosts of newly discovered viruses is important for pandemic surveillance of infectious diseases. We investigated the use of alignment-based and alignment-free methods and support vector machine using mononucleotide frequency and dinucleotide bias to predict the hosts of viruses, and applied these approaches to three datasets: rabies virus, coronavirus, and influenza A virus. For coronavirus, we used the spike gene sequences, while for rabies and influenza A viruses, we used the more conserved nucleoprotein gene sequences. We compared the three methods under different scenarios and showed that their performances are highly correlated with the variability of sequences and sample size. For conserved genes like the nucleoprotein gene, longer k-mers than mono- and dinucleotides are needed to better distinguish the sequences. We also showed that both alignment-based and alignment-free methods can accurately predict the hosts of viruses. When alignment is difficult to achieve or highly time-consuming, alignment-free methods can be a promising substitute to predict the hosts of new viruses. |
format | Online Article Text |
id | pubmed-6030160 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-60301602018-07-11 Comparative studies of alignment, alignment-free and SVM based approaches for predicting the hosts of viruses based on viral sequences Li, Han Sun, Fengzhu Sci Rep Article Predicting the hosts of newly discovered viruses is important for pandemic surveillance of infectious diseases. We investigated the use of alignment-based and alignment-free methods and support vector machine using mononucleotide frequency and dinucleotide bias to predict the hosts of viruses, and applied these approaches to three datasets: rabies virus, coronavirus, and influenza A virus. For coronavirus, we used the spike gene sequences, while for rabies and influenza A viruses, we used the more conserved nucleoprotein gene sequences. We compared the three methods under different scenarios and showed that their performances are highly correlated with the variability of sequences and sample size. For conserved genes like the nucleoprotein gene, longer k-mers than mono- and dinucleotides are needed to better distinguish the sequences. We also showed that both alignment-based and alignment-free methods can accurately predict the hosts of viruses. When alignment is difficult to achieve or highly time-consuming, alignment-free methods can be a promising substitute to predict the hosts of new viruses. Nature Publishing Group UK 2018-07-03 /pmc/articles/PMC6030160/ /pubmed/29968780 http://dx.doi.org/10.1038/s41598-018-28308-x Text en © The Author(s) 2018 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. 
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Li, Han Sun, Fengzhu Comparative studies of alignment, alignment-free and SVM based approaches for predicting the hosts of viruses based on viral sequences |
title | Comparative studies of alignment, alignment-free and SVM based approaches for predicting the hosts of viruses based on viral sequences |
title_full | Comparative studies of alignment, alignment-free and SVM based approaches for predicting the hosts of viruses based on viral sequences |
title_fullStr | Comparative studies of alignment, alignment-free and SVM based approaches for predicting the hosts of viruses based on viral sequences |
title_full_unstemmed | Comparative studies of alignment, alignment-free and SVM based approaches for predicting the hosts of viruses based on viral sequences |
title_short | Comparative studies of alignment, alignment-free and SVM based approaches for predicting the hosts of viruses based on viral sequences |
title_sort | comparative studies of alignment, alignment-free and svm based approaches for predicting the hosts of viruses based on viral sequences |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6030160/ https://www.ncbi.nlm.nih.gov/pubmed/29968780 http://dx.doi.org/10.1038/s41598-018-28308-x |
work_keys_str_mv | AT lihan comparativestudiesofalignmentalignmentfreeandsvmbasedapproachesforpredictingthehostsofvirusesbasedonviralsequences AT sunfengzhu comparativestudiesofalignmentalignmentfreeandsvmbasedapproachesforpredictingthehostsofvirusesbasedonviralsequences |
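
The description above names mononucleotide frequency and dinucleotide bias as the features given to the support vector machine, and notes that longer k-mers are needed for conserved genes. This record does not give the formulas the authors used, so the following is only a minimal sketch under a stated assumption: dinucleotide bias is taken here as the common odds ratio, the observed frequency of a dinucleotide divided by the product of its two mononucleotide frequencies. The function names `kmer_frequencies` and `dinucleotide_bias` are hypothetical and do not come from the article.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_frequencies(seq, k):
    """Return the relative frequency of every k-mer over A/C/G/T.

    Windows containing any other symbol (e.g. N or gaps) are skipped.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if all(base in ALPHABET for base in kmer):
            counts[kmer] += 1
    total = sum(counts.values())
    all_kmers = ("".join(p) for p in product(ALPHABET, repeat=k))
    return {km: (counts[km] / total if total else 0.0) for km in all_kmers}


def dinucleotide_bias(seq):
    """Odds-ratio bias f(xy) / (f(x) * f(y)) for all 16 dinucleotides.

    Assumption: this odds-ratio form is a common definition; the
    article's exact formula is not stated in this record.
    """
    mono = kmer_frequencies(seq, 1)
    di = kmer_frequencies(seq, 2)
    bias = {}
    for xy, observed in di.items():
        expected = mono[xy[0]] * mono[xy[1]]
        bias[xy] = observed / expected if expected > 0 else 0.0
    return bias


if __name__ == "__main__":
    example = "ACGTACGT"  # toy sequence, not real viral data
    mono = kmer_frequencies(example, 1)
    bias = dinucleotide_bias(example)
    # One possible 20-dimensional feature vector: 4 frequencies + 16 biases.
    features = [mono[b] for b in ALPHABET] + [bias[d] for d in sorted(bias)]
    print(len(features))
```

Calling `kmer_frequencies(seq, k)` with a larger `k` gives the longer k-mer profiles that the description says are needed for conserved genes such as the nucleoprotein gene. The size of the profile grows as 4^k.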