Comparative Analysis on Alignment-Based and Pretrained Feature Representations for the Identification of DNA-Binding Proteins

The interaction between DNA and proteins is vital for the development of living organisms. Numerous previous studies on the in silico identification of DNA-binding proteins (DBPs) usually rely on features extracted from the alignment-based (pseudo) position-specific scoring matrix (PSSM), leading to limited...

Bibliographic Details
Main Authors: Chen, Die; Zhang, Hua; Chen, Zeqi; Xie, Bo; Wang, Ye
Format: Online Article Text
Language: English
Published: Hindawi 2022
Subjects: Research Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9256349/
https://www.ncbi.nlm.nih.gov/pubmed/35799660
http://dx.doi.org/10.1155/2022/5847242
author Chen, Die
Zhang, Hua
Chen, Zeqi
Xie, Bo
Wang, Ye
collection PubMed
description The interaction between DNA and proteins is vital for the development of living organisms. Numerous previous studies on the in silico identification of DNA-binding proteins (DBPs) usually rely on features extracted from the alignment-based (pseudo) position-specific scoring matrix (PSSM), whose time-consuming generation limits their application. Few researchers have paid attention to applying pretrained evolutionary-scale language models to the identification of DBPs. To this end, we present a comparative study of alignment-based PSSM and pretrained evolutionary scale modeling (ESM) representations for DBP classification. The comparison is conducted by extracting information from the PSSM and ESM representations using four unified averaging operations and by applying various feature selection (FS) methods. Experimental results demonstrate that the pretrained ESM representation outperforms the PSSM-derived features in a fair comparison. The pretrained feature representation deserves wide application in in silico DBP identification as well as in other function annotation problems. Finally, it is also confirmed that an ensemble scheme that aggregates the various trained FS models can significantly improve the classification performance for DBPs.
format Online
Article
Text
id pubmed-9256349
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-9256349 2022-07-06 Comput Math Methods Med Research Article Hindawi 2022-06-28 /pmc/articles/PMC9256349/ /pubmed/35799660 http://dx.doi.org/10.1155/2022/5847242 Text en Copyright © 2022 Die Chen et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Comparative Analysis on Alignment-Based and Pretrained Feature Representations for the Identification of DNA-Binding Proteins
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9256349/
https://www.ncbi.nlm.nih.gov/pubmed/35799660
http://dx.doi.org/10.1155/2022/5847242
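
The description in this record mentions two steps that a small example may make more concrete: reducing a per-residue representation (PSSM or ESM) to a fixed-length feature vector by averaging, and aggregating several trained models into an ensemble. The record does not say what the four averaging operations are or how the ensemble combines its members, so the sketch below is only an illustration under stated assumptions: it uses a plain column-wise mean as one possible averaging operation and a simple majority vote as one possible aggregation rule. The function names `mean_pool` and `majority_vote` are hypothetical and do not come from the article.

```python
from collections import Counter
from typing import List, Sequence


def mean_pool(matrix: Sequence[Sequence[float]]) -> List[float]:
    """Average an L x D per-residue matrix over its L rows.

    Assumption: a plain column-wise mean stands in for the (unspecified)
    averaging operations named in the abstract. The result has length D
    regardless of the protein length L.
    """
    if not matrix:
        raise ValueError("matrix must contain at least one row")
    width = len(matrix[0])
    if any(len(row) != width for row in matrix):
        raise ValueError("all rows must have the same number of columns")
    length = len(matrix)
    return [sum(row[j] for row in matrix) / length for j in range(width)]


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-model label lists (one list per model) sample by sample.

    Assumption: majority voting stands in for the (unspecified) ensemble
    scheme. With an even number of models a tie is possible; using an odd
    number of models avoids that case for binary labels.
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("every model must predict the same number of samples")
    combined = []
    for i in range(n_samples):
        votes = Counter(model[i] for model in predictions)
        combined.append(votes.most_common(1)[0][0])
    return combined


if __name__ == "__main__":
    # Toy 2-residue, 2-column matrix (made-up numbers, not real PSSM/ESM data).
    toy_matrix = [[1.0, 2.0], [3.0, 4.0]]
    print(mean_pool(toy_matrix))

    # Toy labels from three hypothetical models (1 = DBP, 0 = non-DBP).
    toy_predictions = [[1, 0, 0], [1, 1, 0], [0, 1, 0]]
    print(majority_vote(toy_predictions))
```

Averaging over the residue axis is what lets proteins of different lengths share one feature dimension, which is a precondition for feeding either representation to the same classifier and feature selection methods.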