Machine learning methods accurately predict host specificity of coronaviruses based on spike sequences alone
Coronaviruses infect many animals, including humans, due to interspecies transmission. Three of the known human coronaviruses: MERS, SARS-CoV-1, and SARS-CoV-2, the pathogen for the COVID-19 pandemic, cause severe disease. Improved methods to predict host specificity of coronaviruses will be valuabl...
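Main authors: | Kuzmin, Kiril; Adeniyi, Ayotomiwa Ezekiel; DaSouza, Arthur Kevin; Lim, Deuk; Nguyen, Huyen; Molina, Nuria Ramirez; Xiong, Lanqiao; Weber, Irene T.; Harrison, Robert W. |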
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier Inc., 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7500881/ https://www.ncbi.nlm.nih.gov/pubmed/32981683 http://dx.doi.org/10.1016/j.bbrc.2020.09.010 |
_version_ | 1783583941289050112 |
---|---|
author | Kuzmin, Kiril; Adeniyi, Ayotomiwa Ezekiel; DaSouza, Arthur Kevin; Lim, Deuk; Nguyen, Huyen; Molina, Nuria Ramirez; Xiong, Lanqiao; Weber, Irene T.; Harrison, Robert W. |
author_sort | Kuzmin, Kiril |
collection | PubMed |
description | Coronaviruses infect many animals, including humans, due to interspecies transmission. Three of the known human coronaviruses (MERS, SARS-CoV-1, and SARS-CoV-2, the pathogen for the COVID-19 pandemic) cause severe disease. Improved methods to predict host specificity of coronaviruses will be valuable for identifying and controlling future outbreaks. The coronavirus S protein plays a key role in host specificity by attaching the virus to receptors on the cell membrane. We analyzed 1238 spike sequences for their host specificity. Spike sequences readily segregate in t-SNE embeddings into clusters of similar hosts and/or virus species. Machine learning with SVM, Logistic Regression, Decision Tree, and Random Forest gave high average accuracies, F1 scores, sensitivities and specificities of 0.95–0.99. Importantly, sites identified by Decision Tree correspond to protein regions with known biological importance. These results demonstrate that spike sequences alone can be used to predict host specificity. |
format | Online Article Text |
id | pubmed-7500881 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Elsevier Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-7500881 2020-09-21 Biochem Biophys Res Commun Article Elsevier Inc. 2020-12-10 2020-09-18 /pmc/articles/PMC7500881/ /pubmed/32981683 http://dx.doi.org/10.1016/j.bbrc.2020.09.010 Text en © 2020 Elsevier Inc. All rights reserved. Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre, including this research content, immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database, with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active. |
title | Machine learning methods accurately predict host specificity of coronaviruses based on spike sequences alone |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7500881/ https://www.ncbi.nlm.nih.gov/pubmed/32981683 http://dx.doi.org/10.1016/j.bbrc.2020.09.010 |
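
The description reports accuracy, F1 score, sensitivity, and specificity for the host classifiers. As a rough illustration of how those four metrics relate, the sketch below computes them for a single host class from true and predicted labels. The function name `binary_metrics` and the toy host labels are hypothetical and are not taken from the article; the record does not state how the authors computed or averaged their scores.

```python
def binary_metrics(y_true, y_pred, positive):
    """Compute one-vs-rest metrics for the class `positive`.

    Hypothetical helper for illustration only; not the authors' code.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1  # correctly predicted as the positive host
        elif truth == positive:
            fn += 1  # positive host missed
        elif pred == positive:
            fp += 1  # other host wrongly called positive
        else:
            tn += 1  # other host correctly rejected

    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # also called recall
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "f1": f1,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


# Toy labels, invented for this example (not data from the study).
true_hosts = ["human", "human", "bat", "bat", "human", "bat"]
pred_hosts = ["human", "bat", "bat", "bat", "human", "human"]
metrics = binary_metrics(true_hosts, pred_hosts, positive="human")
print(metrics)
```

For a multi-host problem, one common choice is to compute these per host class and average them, but that averaging scheme is an assumption here rather than something the record confirms.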