Computational analysis and prediction of PE_PGRS proteins using machine learning


Bibliographic Details
Main Authors:	Li, Fuyi; Guo, Xudong; Xiang, Dongxu; Pitt, Miranda E.; Bainomugisa, Arnold; Coin, Lachlan J.M.
Format:	Online Article Text
Language:	English
Published:	Research Network of Computational and Structural Biotechnology 2022
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8804200/ https://www.ncbi.nlm.nih.gov/pubmed/35140886 http://dx.doi.org/10.1016/j.csbj.2022.01.019

collection	PubMed
description	Approximately 10% of the Mycobacterium tuberculosis genome comprises two families of genes that remain poorly characterised due to their high GC content and highly repetitive nature. The largest sub-group, the proline-glutamic acid polymorphic guanine-cytosine-rich sequence (PE_PGRS) family, is thought to be involved in host response and disease pathogenicity. Due to their high genetic variability and complexity of analysis, these genes are typically disregarded in genomic studies. Few online resources and homology-based computational tools can currently identify and analyse PE_PGRS proteins, and those that exist are computationally intensive and time-consuming, and lack sensitivity. Therefore, computational methods that can rapidly and accurately identify PE_PGRS proteins are valuable for facilitating the functional elucidation of this family. In this study, we developed the first machine learning-based bioinformatics approach, termed PEPPER, to allow users to identify PE_PGRS proteins rapidly and accurately. PEPPER was built upon a comprehensive evaluation of 13 popular machine learning algorithms with various sequence and physicochemical features. Empirical studies demonstrated that PEPPER achieved significantly better performance than the alignment-based approaches BLASTP and PHMMER in both prediction accuracy and speed. PEPPER is anticipated to facilitate community-wide efforts to conduct high-throughput identification and analysis of PE_PGRS proteins.
format	Online Article Text
id	pubmed-8804200
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Research Network of Computational and Structural Biotechnology
record_format	MEDLINE/PubMed
spelling	pubmed-8804200 2022-02-08 Comput Struct Biotechnol J 2022-01-22 /pmc/articles/PMC8804200/ /pubmed/35140886 Text en © 2022 The Authors. Published by Elsevier B.V. on behalf of Research Network of Computational and Structural Biotechnology. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
