A computational method for predicting nucleocapsid protein in retroviruses
Nucleocapsid protein (NC) in the group-specific antigen (gag) of retroviruses is essential to the interactions of most retroviral gag proteins with RNAs. A computational method for predicting NCs would benefit subsequent structural analysis and functional studies. However, no computational method to pr...
Main authors: | Guo, Manyun; Ma, Yucheng; Liu, Wanyuan; Yuan, Zuyi |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8752852/ https://www.ncbi.nlm.nih.gov/pubmed/35017554 http://dx.doi.org/10.1038/s41598-021-03182-2 |
_version_ | 1784631963467907072 |
---|---|
author | Guo, Manyun Ma, Yucheng Liu, Wanyuan Yuan, Zuyi |
author_facet | Guo, Manyun Ma, Yucheng Liu, Wanyuan Yuan, Zuyi |
author_sort | Guo, Manyun |
collection | PubMed |
description | Nucleocapsid protein (NC) in the group-specific antigen (gag) of retroviruses is essential to the interactions of most retroviral gag proteins with RNAs. A computational method for predicting NCs would benefit subsequent structural analysis and functional studies. However, no computational method for predicting the exact locations of NCs in retroviruses has yet been proposed, and the wide variation in NC length adds to the difficulty. This paper proposes a computational method for identifying NCs in retroviruses. All available retrovirus sequences with NC annotations were collected from NCBI. Models based on random forest (RF) and weighted support vector machine (WSVM) were built to predict the initiation and termination sites of NCs. Factor analysis scales of generalized amino acid information, together with a position weight matrix, were used to generate the feature space. Homology-based gene prediction methods were also compared and integrated to improve predictive performance. The predicted candidate initiation and termination sites were then combined and screened according to their intervals, decision values and alignment scores. All available gag sequences without NC annotations were scanned with the model to detect putative NCs. Under fivefold cross-validation, the geometric means of sensitivity and specificity for predicting initiation and termination sites were 0.9900 and 0.9548, respectively. The model combining WSVM, RF and simple alignment predicted 90.91% of the collected retrovirus sequences with NC annotations entirely correctly. The composite model performs better than the individual ones. The model detected 235 putative NCs in unannotated gags. The prediction method performs well on NC recognition and could be extended to other gene prediction problems, especially those whose training samples vary widely in length. |
format | Online Article Text |
id | pubmed-8752852 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-8752852 2022-01-13 A computational method for predicting nucleocapsid protein in retroviruses Guo, Manyun Ma, Yucheng Liu, Wanyuan Yuan, Zuyi Sci Rep Article Nature Publishing Group UK 2022-01-11 /pmc/articles/PMC8752852/ /pubmed/35017554 http://dx.doi.org/10.1038/s41598-021-03182-2 Text en © The Author(s) 2022. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Guo, Manyun Ma, Yucheng Liu, Wanyuan Yuan, Zuyi A computational method for predicting nucleocapsid protein in retroviruses |
title | A computational method for predicting nucleocapsid protein in retroviruses |
title_full | A computational method for predicting nucleocapsid protein in retroviruses |
title_fullStr | A computational method for predicting nucleocapsid protein in retroviruses |
title_full_unstemmed | A computational method for predicting nucleocapsid protein in retroviruses |
title_short | A computational method for predicting nucleocapsid protein in retroviruses |
title_sort | computational method for predicting nucleocapsid protein in retroviruses |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8752852/ https://www.ncbi.nlm.nih.gov/pubmed/35017554 http://dx.doi.org/10.1038/s41598-021-03182-2 |
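
The abstract in this record names two ingredients that are easy to illustrate: the geometric mean of sensitivity and specificity used as the evaluation metric, and a position weight matrix (PWM) used as a feature source. The sketch below is not the authors' implementation. The function names, the uniform background, the pseudocount and the toy sequence windows are all assumptions chosen for illustration; the record does not say how the PWM was built.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity and specificity from confusion counts."""
    sensitivity = tp / (tp + fn)  # fraction of true sites recovered
    specificity = tn / (tn + fp)  # fraction of non-sites rejected
    return math.sqrt(sensitivity * specificity)


def build_pwm(windows, pseudocount=1.0):
    """Log-odds PWM from equal-length windows (assumed uniform background)."""
    background = 1.0 / len(AMINO_ACIDS)
    total = len(windows) + pseudocount * len(AMINO_ACIDS)
    pwm = []
    for pos in range(len(windows[0])):
        counts = Counter(w[pos] for w in windows)  # missing residues count as 0
        pwm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pwm


def score_window(pwm, window):
    """Sum of per-position log-odds; higher means closer to the training windows."""
    return sum(column[aa] for column, aa in zip(pwm, window))


# Hypothetical windows around a cleavage site (toy data, not from the paper).
training = ["MQRG", "MQKG", "MQRG", "IQRG"]
pwm = build_pwm(training)
print(score_window(pwm, "MQRG") > score_window(pwm, "AWCY"))
print(round(g_mean(tp=90, fn=10, tn=80, fp=20), 4))
```

In a pipeline like the one the abstract describes, a score of this kind would be one feature among several passed to the classifiers. Windows containing non-standard residue codes (for example `X`) would need handling before scoring, since this sketch only covers the 20 standard amino acids.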