Prediction of coronavirus 3C-like protease cleavage sites using machine-learning algorithms
The coronavirus 3C-like (3CL) protease, a cysteine protease, plays an important role in viral infection and immune escape. However, there is still a lack of effective tools for determining the cleavage sites of the 3CL protease. This study systematically investigated the diversity of the cleavage si...
Main authors: | Chen, Huiting; Zhu, Zhaozhong; Qiu, Ye; Ge, Xingyi; Zheng, Heping; Peng, Yousong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Wuhan Institute of Virology, Chinese Academy of Sciences, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9060714/ https://www.ncbi.nlm.nih.gov/pubmed/35513273 http://dx.doi.org/10.1016/j.virs.2022.04.006 |
_version_ | 1784698564137451520 |
---|---|
author | Chen, Huiting; Zhu, Zhaozhong; Qiu, Ye; Ge, Xingyi; Zheng, Heping; Peng, Yousong |
author_sort | Chen, Huiting |
collection | PubMed |
description | The coronavirus 3C-like (3CL) protease, a cysteine protease, plays an important role in viral infection and immune escape. However, there is still a lack of effective tools for determining the cleavage sites of the 3CL protease. This study systematically investigated the diversity of the cleavage sites of the coronavirus 3CL protease on the viral polyprotein and found that the cleavage motifs were highly conserved among viruses in the genera Alphacoronavirus, Betacoronavirus and Gammacoronavirus. Strong residue preferences were observed at the positions neighboring the cleavage sites. A random forest (RF) model was built to predict the cleavage sites of the coronavirus 3CL protease, with the residues in cleavage motifs represented by amino acid indexes; the model achieved an AUC of 0.96 in cross-validation. The RF model was further tested on an independent test dataset composed of cleavage sites on 99 proteins from multiple coronavirus hosts, where it achieved an AUC of 0.95 and correctly predicted 80% of the cleavage sites. The RF model then predicted that 1,352 human proteins are cleaved by the 3CL protease. These proteins were enriched in several GO terms related to the cytoskeleton, such as microtubule, actin and tubulin. Finally, a web server named 3CLP was built to predict the cleavage sites of the coronavirus 3CL protease based on the RF model. Overall, the study provides an effective tool for identifying cleavage sites of the 3CL protease and offers insights into the molecular mechanism underlying the pathogenicity of coronaviruses. |
format | Online Article Text |
id | pubmed-9060714 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Wuhan Institute of Virology, Chinese Academy of Sciences |
record_format | MEDLINE/PubMed |
spelling | pubmed-9060714 2022-05-03 Prediction of coronavirus 3C-like protease cleavage sites using machine-learning algorithms Chen, Huiting; Zhu, Zhaozhong; Qiu, Ye; Ge, Xingyi; Zheng, Heping; Peng, Yousong Virol Sin Research Article Wuhan Institute of Virology, Chinese Academy of Sciences 2022-05-02 /pmc/articles/PMC9060714/ /pubmed/35513273 http://dx.doi.org/10.1016/j.virs.2022.04.006 Text en © 2022 The Authors https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
title | Prediction of coronavirus 3C-like protease cleavage sites using machine-learning algorithms |
topic | Research Article |
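
The description above mentions representing residues in cleavage motifs by amino acid indexes and evaluating the model by AUC. The following is a minimal sketch of those two ideas only, not the authors' pipeline. Several things here are assumptions not stated in this record: the use of the Kyte-Doolittle hydropathy scale as the index, the window of four residues on each side of the scissile bond, the restriction of candidates to glutamine at P1, and all function names. The random forest itself is omitted, and the example sequence is a toy string.

```python
# Kyte-Doolittle hydropathy values, used here as one example of an
# amino acid index. Choosing this index is an assumption for illustration.
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def candidate_windows(sequence, left=4, right=4):
    """Yield (position, window) for each glutamine that has full flanks.

    position is the 0-based index of the Q residue (assumed P1).
    The window holds `left` residues ending at Q, followed by the
    `right` residues after it. Window size is a hypothetical choice.
    """
    for i, residue in enumerate(sequence):
        if residue != "Q":
            continue
        start = i - left + 1
        end = i + 1 + right
        if start < 0 or end > len(sequence):
            continue  # skip candidates too close to either terminus
        yield i, sequence[start:end]


def encode(window, index=KD_HYDROPATHY):
    """Map each residue of a window to its index value (feature vector)."""
    return [index[aa] for aa in window]


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the
    positive example scores higher; ties count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    toy_sequence = "AAVLQSGFAA"  # toy input, not taken from the study
    for position, window in candidate_windows(toy_sequence):
        print(position, window, encode(window))
    print(auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.5]))
```

In a fuller version, the vectors returned by `encode` would be the inputs to a classifier, and `auc` would be applied to that classifier's held-out scores.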