Machine learning methods accurately predict host specificity of coronaviruses based on spike sequences alone

Coronaviruses infect many animals, including humans, due to interspecies transmission. Three of the known human coronaviruses: MERS, SARS-CoV-1, and SARS-CoV-2, the pathogen for the COVID-19 pandemic, cause severe disease. Improved methods to predict host specificity of coronaviruses will be valuabl...

Bibliographic Details
Main authors: Kuzmin, Kiril; Adeniyi, Ayotomiwa Ezekiel; DaSouza, Arthur Kevin; Lim, Deuk; Nguyen, Huyen; Molina, Nuria Ramirez; Xiong, Lanqiao; Weber, Irene T.; Harrison, Robert W.
Format: Online Article Text
Language: English
Published: Elsevier Inc. 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7500881/
https://www.ncbi.nlm.nih.gov/pubmed/32981683
http://dx.doi.org/10.1016/j.bbrc.2020.09.010
_version_ 1783583941289050112
author Kuzmin, Kiril
Adeniyi, Ayotomiwa Ezekiel
DaSouza, Arthur Kevin
Lim, Deuk
Nguyen, Huyen
Molina, Nuria Ramirez
Xiong, Lanqiao
Weber, Irene T.
Harrison, Robert W.
author_facet Kuzmin, Kiril
Adeniyi, Ayotomiwa Ezekiel
DaSouza, Arthur Kevin
Lim, Deuk
Nguyen, Huyen
Molina, Nuria Ramirez
Xiong, Lanqiao
Weber, Irene T.
Harrison, Robert W.
author_sort Kuzmin, Kiril
collection PubMed
description Coronaviruses infect many animals, including humans, due to interspecies transmission. Three of the known human coronaviruses: MERS, SARS-CoV-1, and SARS-CoV-2, the pathogen for the COVID-19 pandemic, cause severe disease. Improved methods to predict host specificity of coronaviruses will be valuable for identifying and controlling future outbreaks. The coronavirus S protein plays a key role in host specificity by attaching the virus to receptors on the cell membrane. We analyzed 1238 spike sequences for their host specificity. Spike sequences readily segregate in t-SNE embeddings into clusters of similar hosts and/or virus species. Machine learning with SVM, Logistic Regression, Decision Tree, and Random Forest gave high average accuracies, F1 scores, sensitivities and specificities of 0.95–0.99. Importantly, sites identified by Decision Tree correspond to protein regions with known biological importance. These results demonstrate that spike sequences alone can be used to predict host specificity.
format Online
Article
Text
id pubmed-7500881
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Elsevier Inc.
record_format MEDLINE/PubMed
spelling pubmed-7500881 2020-09-21 Machine learning methods accurately predict host specificity of coronaviruses based on spike sequences alone Kuzmin, Kiril Adeniyi, Ayotomiwa Ezekiel DaSouza, Arthur Kevin Lim, Deuk Nguyen, Huyen Molina, Nuria Ramirez Xiong, Lanqiao Weber, Irene T. Harrison, Robert W. Biochem Biophys Res Commun Article Coronaviruses infect many animals, including humans, due to interspecies transmission. Three of the known human coronaviruses: MERS, SARS-CoV-1, and SARS-CoV-2, the pathogen for the COVID-19 pandemic, cause severe disease. Improved methods to predict host specificity of coronaviruses will be valuable for identifying and controlling future outbreaks. The coronavirus S protein plays a key role in host specificity by attaching the virus to receptors on the cell membrane. We analyzed 1238 spike sequences for their host specificity. Spike sequences readily segregate in t-SNE embeddings into clusters of similar hosts and/or virus species. Machine learning with SVM, Logistic Regression, Decision Tree, and Random Forest gave high average accuracies, F1 scores, sensitivities and specificities of 0.95–0.99. Importantly, sites identified by Decision Tree correspond to protein regions with known biological importance. These results demonstrate that spike sequences alone can be used to predict host specificity. Elsevier Inc. 2020-12-10 2020-09-18 /pmc/articles/PMC7500881/ /pubmed/32981683 http://dx.doi.org/10.1016/j.bbrc.2020.09.010 Text en © 2020 Elsevier Inc. All rights reserved. Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website.
Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active.
spellingShingle Article
Kuzmin, Kiril
Adeniyi, Ayotomiwa Ezekiel
DaSouza, Arthur Kevin
Lim, Deuk
Nguyen, Huyen
Molina, Nuria Ramirez
Xiong, Lanqiao
Weber, Irene T.
Harrison, Robert W.
Machine learning methods accurately predict host specificity of coronaviruses based on spike sequences alone
title Machine learning methods accurately predict host specificity of coronaviruses based on spike sequences alone
title_full Machine learning methods accurately predict host specificity of coronaviruses based on spike sequences alone
title_fullStr Machine learning methods accurately predict host specificity of coronaviruses based on spike sequences alone
title_full_unstemmed Machine learning methods accurately predict host specificity of coronaviruses based on spike sequences alone
title_short Machine learning methods accurately predict host specificity of coronaviruses based on spike sequences alone
title_sort machine learning methods accurately predict host specificity of coronaviruses based on spike sequences alone
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7500881/
https://www.ncbi.nlm.nih.gov/pubmed/32981683
http://dx.doi.org/10.1016/j.bbrc.2020.09.010