Computational analysis and prediction of PE_PGRS proteins using machine learning

Approximately 10% of the Mycobacterium tuberculosis genome comprises two families of genes that remain poorly characterised due to their high GC content and highly repetitive nature. The largest sub-group, the proline-glutamic acid polymorphic guanine-cytosine-rich sequence (PE_PGRS) family, is thought to be involv...

Bibliographic Details
Main Authors: Li, Fuyi, Guo, Xudong, Xiang, Dongxu, Pitt, Miranda E., Bainomugisa, Arnold, Coin, Lachlan J.M.
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology 2022
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8804200/
https://www.ncbi.nlm.nih.gov/pubmed/35140886
http://dx.doi.org/10.1016/j.csbj.2022.01.019
author Li, Fuyi
Guo, Xudong
Xiang, Dongxu
Pitt, Miranda E.
Bainomugisa, Arnold
Coin, Lachlan J.M.
collection PubMed
description Approximately 10% of the Mycobacterium tuberculosis genome comprises two families of genes that remain poorly characterised due to their high GC content and highly repetitive nature. The largest sub-group, the proline-glutamic acid polymorphic guanine-cytosine-rich sequence (PE_PGRS) family, is thought to be involved in host response and disease pathogenicity. Due to their high genetic variability and the complexity of their analysis, these genes are typically disregarded in genomic studies. Currently, few online resources and homology-based computational tools can identify and analyse PE_PGRS proteins; those that exist are computationally intensive and time-consuming, and they lack sensitivity. Computational methods that can rapidly and accurately identify PE_PGRS proteins are therefore valuable for facilitating the functional elucidation of the PE_PGRS family. In this study, we developed the first machine learning-based bioinformatics approach, termed PEPPER, to allow users to identify PE_PGRS proteins rapidly and accurately. PEPPER was built upon a comprehensive evaluation of 13 popular machine learning algorithms with various sequence and physicochemical features. Empirical studies demonstrated that PEPPER achieved significantly better performance than the alignment-based approaches BLASTP and PHMMER in both prediction accuracy and speed. PEPPER is anticipated to facilitate community-wide efforts to conduct high-throughput identification and analysis of PE_PGRS proteins.
format Online
Article
Text
id pubmed-8804200
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Research Network of Computational and Structural Biotechnology
record_format MEDLINE/PubMed
spelling pubmed-8804200 2022-02-08 Comput Struct Biotechnol J Research Article Research Network of Computational and Structural Biotechnology 2022-01-22 /pmc/articles/PMC8804200/ /pubmed/35140886 http://dx.doi.org/10.1016/j.csbj.2022.01.019 Text en © 2022 The Authors. Published by Elsevier B.V. on behalf of Research Network of Computational and Structural Biotechnology. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title Computational analysis and prediction of PE_PGRS proteins using machine learning
topic Research Article
work_keys_str_mv AT lifuyi computationalanalysisandpredictionofpepgrsproteinsusingmachinelearning
AT guoxudong computationalanalysisandpredictionofpepgrsproteinsusingmachinelearning
AT xiangdongxu computationalanalysisandpredictionofpepgrsproteinsusingmachinelearning
AT pittmirandae computationalanalysisandpredictionofpepgrsproteinsusingmachinelearning
AT bainomugisaarnold computationalanalysisandpredictionofpepgrsproteinsusingmachinelearning
AT coinlachlanjm computationalanalysisandpredictionofpepgrsproteinsusingmachinelearning
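
The description above states that PEPPER was built by evaluating machine learning algorithms on sequence and physicochemical features, but this record gives no implementation details. The following Python sketch is only an illustration of that general idea and is not PEPPER's method: it assumes amino acid composition as one simple sequence-derived feature and uses a nearest-centroid rule as a stand-in classifier. All function names, the class name, the labels, and the toy sequences are hypothetical and made up for this example.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return the fraction of each standard residue in `sequence` (20 values)."""
    counts = Counter(res for res in sequence.upper() if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestCentroid:
    """Hypothetical stand-in classifier: assign the label of the closest class mean."""

    def fit(self, sequences, labels):
        grouped = {}
        for seq, label in zip(sequences, labels):
            grouped.setdefault(label, []).append(amino_acid_composition(seq))
        self.centroids_ = {label: centroid(vecs) for label, vecs in grouped.items()}
        return self

    def predict(self, sequence):
        features = amino_acid_composition(sequence)
        return min(
            self.centroids_,
            key=lambda label: squared_distance(features, self.centroids_[label]),
        )


# Made-up toy sequences for illustration only; these are NOT real proteins.
train_sequences = [
    "GGAGGAGGNGGAGGSGGAGG",
    "GGNGGAGGAGGTGGAGGNGG",
    "MKLVEDRTSPLIKQWFHYCE",
    "MSTEELLKKRVDIPQWYHFC",
]
train_labels = ["repeat_rich", "repeat_rich", "other", "other"]

model = NearestCentroid().fit(train_sequences, train_labels)
```

Calling `model.predict(...)` on a new sequence returns whichever of the two hypothetical labels has the closer composition centroid. A real study would use curated training data, richer features, and a properly validated comparison of algorithms.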