
Machine learning techniques for sequence-based prediction of viral–host interactions between SARS-CoV-2 and human proteins

BACKGROUND: COVID-19 (Coronavirus Disease-19), a disease caused by the SARS-CoV-2 virus, was declared a pandemic by the World Health Organization on March 11, 2020. More than 15 million people worldwide have been affected by COVID-19, with more than 0.6 million deaths. Protein–pro...


Bibliographic Details
Main Authors: Dey, Lopamudra, Chakraborty, Sanjay, Mukhopadhyay, Anirban
Format: Online Article Text
Language: English
Published: Chang Gung University 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7470713/
https://www.ncbi.nlm.nih.gov/pubmed/33036956
http://dx.doi.org/10.1016/j.bj.2020.08.003
author Dey, Lopamudra
Chakraborty, Sanjay
Mukhopadhyay, Anirban
collection PubMed
description BACKGROUND: COVID-19 (Coronavirus Disease-19), a disease caused by the SARS-CoV-2 virus, was declared a pandemic by the World Health Organization on March 11, 2020. More than 15 million people worldwide have been affected by COVID-19, with more than 0.6 million deaths. Protein–protein interactions (PPIs) play a key role in the cellular process of SARS-CoV-2 infection in the human body. A recent study reported some SARS-CoV-2 proteins that interact with several human proteins, but many potential interactions remain to be identified. METHOD: In this article, various machine learning models are built to predict the PPIs between the virus and human proteins, which are further validated using biological experiments. The classification models are based on different sequence-based features of human proteins, such as amino acid composition, pseudo amino acid composition, and conjoint triad. RESULT: We built an ensemble voting classifier using SVM (radial), SVM (polynomial), and Random Forest that gives higher accuracy, precision, specificity, recall, and F1 score than all other models used in the work. The proposed ensemble model predicted a total of 1326 potential human target proteins of SARS-CoV-2, which were validated using gene ontology and KEGG pathway enrichment analysis. Several repurposable drugs targeting the predicted interactions are also reported. CONCLUSION: This study may encourage the identification of potential targets for more effective anti-COVID drug discovery.
format Online
Article
Text
id pubmed-7470713
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Chang Gung University
record_format MEDLINE/PubMed
spelling pubmed-7470713 2020-09-04 Biomed J Original Article Chang Gung University 2020-10 2020-09-03 /pmc/articles/PMC7470713/ /pubmed/33036956 http://dx.doi.org/10.1016/j.bj.2020.08.003 Text en © 2020 Chang Gung University. Publishing services by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title Machine learning techniques for sequence-based prediction of viral–host interactions between SARS-CoV-2 and human proteins
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7470713/
https://www.ncbi.nlm.nih.gov/pubmed/33036956
http://dx.doi.org/10.1016/j.bj.2020.08.003
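
The abstract names amino acid composition and conjoint triad as sequence-based features, and a voting ensemble as the final classifier. The following is a minimal sketch of those ideas, not the authors' implementation. Assumptions not stated in this record: the seven residue classes are the grouping commonly used for the conjoint triad descriptor, features are normalized as simple fractions, residues outside the 20 standard amino acids are skipped, and the vote is a hard (majority) vote. All function names are hypothetical.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumption: the seven residue classes commonly used for the conjoint
# triad descriptor (grouped by dipole and side-chain volume).
CT_CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
RESIDUE_TO_CLASS = {
    aa: str(i) for i, group in enumerate(CT_CLASSES, start=1) for aa in group
}
# All 7 * 7 * 7 = 343 possible class triads, in a fixed order.
TRIADS = ["".join(t) for t in product("1234567", repeat=3)]


def amino_acid_composition(seq):
    """Return 20 fractions, one per standard residue, in AMINO_ACIDS order."""
    residues = [aa for aa in seq.upper() if aa in RESIDUE_TO_CLASS]
    counts = Counter(residues)
    n = len(residues) or 1  # avoid division by zero on empty input
    return [counts[aa] / n for aa in AMINO_ACIDS]


def conjoint_triad(seq):
    """Return 343 fractions, one per triad of residue classes."""
    # Non-standard residues are dropped before triads are formed
    # (an assumption of this sketch).
    classes = [RESIDUE_TO_CLASS[aa] for aa in seq.upper() if aa in RESIDUE_TO_CLASS]
    triads = ["".join(classes[i:i + 3]) for i in range(len(classes) - 2)]
    counts = Counter(triads)
    n = len(triads) or 1
    return [counts[t] / n for t in TRIADS]


def majority_vote(predictions):
    """Hard vote: return the label predicted by most of the base models."""
    return Counter(predictions).most_common(1)[0][0]


if __name__ == "__main__":
    toy_sequence = "MKTAYIAKQR"  # hypothetical sequence, not from the study
    features = amino_acid_composition(toy_sequence) + conjoint_triad(toy_sequence)
    print(len(features))  # 20 + 343 features per protein
    # Hypothetical labels from three base models (1 = interacts, 0 = does not)
    print(majority_vote([1, 0, 1]))
```

In a full pipeline, the concatenated feature vector would be passed to each base classifier (the abstract lists two SVM kernels and a Random Forest), and their per-protein predictions would be combined by the vote.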