Prediction of coronavirus 3C-like protease cleavage sites using machine-learning algorithms

The coronavirus 3C-like (3CL) protease, a cysteine protease, plays an important role in viral infection and immune escape. However, there is still a lack of effective tools for determining the cleavage sites of the 3CL protease. This study systematically investigated the diversity of the cleavage si...

Bibliographic Details
Main Authors: Chen, Huiting, Zhu, Zhaozhong, Qiu, Ye, Ge, Xingyi, Zheng, Heping, Peng, Yousong
Format: Online Article Text
Language: English
Published: Wuhan Institute of Virology, Chinese Academy of Sciences 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9060714/
https://www.ncbi.nlm.nih.gov/pubmed/35513273
http://dx.doi.org/10.1016/j.virs.2022.04.006
_version_ 1784698564137451520
author Chen, Huiting
Zhu, Zhaozhong
Qiu, Ye
Ge, Xingyi
Zheng, Heping
Peng, Yousong
author_facet Chen, Huiting
Zhu, Zhaozhong
Qiu, Ye
Ge, Xingyi
Zheng, Heping
Peng, Yousong
author_sort Chen, Huiting
collection PubMed
description The coronavirus 3C-like (3CL) protease, a cysteine protease, plays an important role in viral infection and immune escape. However, there is still a lack of effective tools for determining the cleavage sites of the 3CL protease. This study systematically investigated the diversity of the cleavage sites of the coronavirus 3CL protease on the viral polyprotein, and found that the cleavage motifs were highly conserved among viruses in the genera Alphacoronavirus, Betacoronavirus and Gammacoronavirus. Strong residue preferences were observed at the positions neighboring the cleavage sites. A random forest (RF) model was built to predict the cleavage sites of the coronavirus 3CL protease based on the representation of residues in cleavage motifs by amino acid indexes, and the model achieved an AUC of 0.96 in cross-validations. The RF model was further tested on an independent test dataset, which was composed of cleavage sites on 99 proteins from multiple coronavirus hosts. It achieved an AUC of 0.95 and correctly predicted 80% of the cleavage sites. The RF model then predicted that 1,352 human proteins are cleaved by the 3CL protease. These proteins were enriched in several GO terms related to the cytoskeleton, such as the microtubule, actin and tubulin. Finally, a webserver named 3CLP was built to predict the cleavage sites of the coronavirus 3CL protease based on the RF model. Overall, the study provides an effective tool for identifying cleavage sites of the 3CL protease and provides insights into the molecular mechanism underlying the pathogenicity of coronaviruses.
format Online
Article
Text
id pubmed-9060714
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Wuhan Institute of Virology, Chinese Academy of Sciences
record_format MEDLINE/PubMed
spelling pubmed-9060714 2022-05-03 Prediction of coronavirus 3C-like protease cleavage sites using machine-learning algorithms Chen, Huiting Zhu, Zhaozhong Qiu, Ye Ge, Xingyi Zheng, Heping Peng, Yousong Virol Sin Research Article The coronavirus 3C-like (3CL) protease, a cysteine protease, plays an important role in viral infection and immune escape. However, there is still a lack of effective tools for determining the cleavage sites of the 3CL protease. This study systematically investigated the diversity of the cleavage sites of the coronavirus 3CL protease on the viral polyprotein, and found that the cleavage motifs were highly conserved among viruses in the genera Alphacoronavirus, Betacoronavirus and Gammacoronavirus. Strong residue preferences were observed at the positions neighboring the cleavage sites. A random forest (RF) model was built to predict the cleavage sites of the coronavirus 3CL protease based on the representation of residues in cleavage motifs by amino acid indexes, and the model achieved an AUC of 0.96 in cross-validations. The RF model was further tested on an independent test dataset, which was composed of cleavage sites on 99 proteins from multiple coronavirus hosts. It achieved an AUC of 0.95 and correctly predicted 80% of the cleavage sites. The RF model then predicted that 1,352 human proteins are cleaved by the 3CL protease. These proteins were enriched in several GO terms related to the cytoskeleton, such as the microtubule, actin and tubulin. Finally, a webserver named 3CLP was built to predict the cleavage sites of the coronavirus 3CL protease based on the RF model. Overall, the study provides an effective tool for identifying cleavage sites of the 3CL protease and provides insights into the molecular mechanism underlying the pathogenicity of coronaviruses.
Wuhan Institute of Virology, Chinese Academy of Sciences 2022-05-02 /pmc/articles/PMC9060714/ /pubmed/35513273 http://dx.doi.org/10.1016/j.virs.2022.04.006 Text en © 2022 The Authors https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Research Article
Chen, Huiting
Zhu, Zhaozhong
Qiu, Ye
Ge, Xingyi
Zheng, Heping
Peng, Yousong
Prediction of coronavirus 3C-like protease cleavage sites using machine-learning algorithms
title Prediction of coronavirus 3C-like protease cleavage sites using machine-learning algorithms
title_full Prediction of coronavirus 3C-like protease cleavage sites using machine-learning algorithms
title_fullStr Prediction of coronavirus 3C-like protease cleavage sites using machine-learning algorithms
title_full_unstemmed Prediction of coronavirus 3C-like protease cleavage sites using machine-learning algorithms
title_short Prediction of coronavirus 3C-like protease cleavage sites using machine-learning algorithms
title_sort prediction of coronavirus 3c-like protease cleavage sites using machine-learning algorithms
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9060714/
https://www.ncbi.nlm.nih.gov/pubmed/35513273
http://dx.doi.org/10.1016/j.virs.2022.04.006
work_keys_str_mv AT chenhuiting predictionofcoronavirus3clikeproteasecleavagesitesusingmachinelearningalgorithms
AT zhuzhaozhong predictionofcoronavirus3clikeproteasecleavagesitesusingmachinelearningalgorithms
AT qiuye predictionofcoronavirus3clikeproteasecleavagesitesusingmachinelearningalgorithms
AT gexingyi predictionofcoronavirus3clikeproteasecleavagesitesusingmachinelearningalgorithms
AT zhengheping predictionofcoronavirus3clikeproteasecleavagesitesusingmachinelearningalgorithms
AT pengyousong predictionofcoronavirus3clikeproteasecleavagesitesusingmachinelearningalgorithms