
ProFeatX: A parallelized protein feature extraction suite for machine learning

Machine learning algorithms have been successfully applied in proteomics, genomics and transcriptomics, and have helped the biological community to answer complex questions. However, most machine learning methods require lots of data, with every data point having the same vector size. The biological...


Bibliographic Details
Main authors:	Guevara-Barrientos, David, Kaundal, Rakesh
Format:	Online Article Text
Language:	English
Published:	Research Network of Computational and Structural Biotechnology 2022
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9842958/ https://www.ncbi.nlm.nih.gov/pubmed/36698978 http://dx.doi.org/10.1016/j.csbj.2022.12.044

_version_	1784870270629052416
author	Guevara-Barrientos, David Kaundal, Rakesh
author_facet	Guevara-Barrientos, David Kaundal, Rakesh
author_sort	Guevara-Barrientos, David
collection	PubMed
description	Machine learning algorithms have been successfully applied in proteomics, genomics and transcriptomics, and have helped the biological community to answer complex questions. However, most machine learning methods require lots of data, with every data point having the same vector size. Biological sequence data, such as proteins, are amino acid sequences of variable length, which makes it essential to extract a definite number of features from all the proteins for them to be used as input into machine learning models. There are numerous methods to achieve this, but only a few tools let researchers encode their proteins using multiple schemes without having to use different programs or, in many cases, code these algorithms themselves, or even come up with new algorithms. In this work, we created ProFeatX, a tool that contains 50 encodings to extract protein features in an efficient and fast way, supporting desktop as well as high-performance computing environments. It can also encode concatenated features for protein-protein interactions. The tool has an easy-to-use web interface, allowing non-experts to use feature extraction techniques, as well as a stand-alone version for advanced users. ProFeatX is implemented in C++ and available on GitHub at https://github.com/usubioinfo/profeatx. The web server is available at http://bioinfo.usu.edu/profeatx/.
format	Online Article Text
id	pubmed-9842958
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Research Network of Computational and Structural Biotechnology
record_format	MEDLINE/PubMed
spelling	pubmed-9842958 2023-01-24 ProFeatX: A parallelized protein feature extraction suite for machine learning Guevara-Barrientos, David Kaundal, Rakesh Comput Struct Biotechnol J Research Article Machine learning algorithms have been successfully applied in proteomics, genomics and transcriptomics, and have helped the biological community to answer complex questions. However, most machine learning methods require lots of data, with every data point having the same vector size. Biological sequence data, such as proteins, are amino acid sequences of variable length, which makes it essential to extract a definite number of features from all the proteins for them to be used as input into machine learning models. There are numerous methods to achieve this, but only a few tools let researchers encode their proteins using multiple schemes without having to use different programs or, in many cases, code these algorithms themselves, or even come up with new algorithms. In this work, we created ProFeatX, a tool that contains 50 encodings to extract protein features in an efficient and fast way, supporting desktop as well as high-performance computing environments. It can also encode concatenated features for protein-protein interactions. The tool has an easy-to-use web interface, allowing non-experts to use feature extraction techniques, as well as a stand-alone version for advanced users. ProFeatX is implemented in C++ and available on GitHub at https://github.com/usubioinfo/profeatx. The web server is available at http://bioinfo.usu.edu/profeatx/. Research Network of Computational and Structural Biotechnology 2022-12-29 /pmc/articles/PMC9842958/ /pubmed/36698978 http://dx.doi.org/10.1016/j.csbj.2022.12.044 Text en © 2023 The Authors. Published by Elsevier B.V. on behalf of Research Network of Computational and Structural Biotechnology.
https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
spellingShingle	Research Article Guevara-Barrientos, David Kaundal, Rakesh ProFeatX: A parallelized protein feature extraction suite for machine learning
title	ProFeatX: A parallelized protein feature extraction suite for machine learning
title_full	ProFeatX: A parallelized protein feature extraction suite for machine learning
title_fullStr	ProFeatX: A parallelized protein feature extraction suite for machine learning
title_full_unstemmed	ProFeatX: A parallelized protein feature extraction suite for machine learning
title_short	ProFeatX: A parallelized protein feature extraction suite for machine learning
title_sort	profeatx: a parallelized protein feature extraction suite for machine learning
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9842958/ https://www.ncbi.nlm.nih.gov/pubmed/36698978 http://dx.doi.org/10.1016/j.csbj.2022.12.044
work_keys_str_mv	AT guevarabarrientosdavid profeatxaparallelizedproteinfeatureextractionsuiteformachinelearning AT kaundalrakesh profeatxaparallelizedproteinfeatureextractionsuiteformachinelearning
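
The description above notes that proteins vary in length while machine learning models expect every data point to have the same vector size, and that per-protein features can be concatenated for protein-protein interactions. The following is a minimal sketch of that idea only. It uses amino acid composition, a common encoding chosen here for illustration, and is written in Python rather than C++. The function names `amino_acid_composition` and `encode_pair` are hypothetical and are not taken from the ProFeatX source or its command-line interface.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so every vector has the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> list[float]:
    """Return the fraction of each standard amino acid in `sequence` (20 values).

    Hypothetical helper for illustration; not part of ProFeatX.
    """
    counts = Counter(sequence.upper())
    # Only standard residues count toward the total; others (e.g. X) are ignored.
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def encode_pair(seq_a: str, seq_b: str) -> list[float]:
    """Concatenate two per-protein vectors into one 40-value vector for a pair."""
    return amino_acid_composition(seq_a) + amino_acid_composition(seq_b)
```

Under these assumptions, a 3-residue sequence and a 300-residue sequence both map to 20 values, and any pair of sequences maps to 40 values, so the results can be stacked into a fixed-width feature matrix.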


Similar Items