Comparative Analysis on Alignment-Based and Pretrained Feature Representations for the Identification of DNA-Binding Proteins
The interaction between DNA and protein is vital for the development of a living body. Numerous previous studies on in silico identification of DNA-binding proteins (DBPs) usually include features extracted from the alignment-based (pseudo) position-specific scoring matrix (PSSM), leading to limited...
Main authors: | Chen, Die; Zhang, Hua; Chen, Zeqi; Xie, Bo; Wang, Ye |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Hindawi 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9256349/ https://www.ncbi.nlm.nih.gov/pubmed/35799660 http://dx.doi.org/10.1155/2022/5847242 |
_version_ | 1784741091533127680 |
---|---|
author | Chen, Die Zhang, Hua Chen, Zeqi Xie, Bo Wang, Ye |
author_facet | Chen, Die Zhang, Hua Chen, Zeqi Xie, Bo Wang, Ye |
author_sort | Chen, Die |
collection | PubMed |
description | The interaction between DNA and protein is vital for the development of a living body. Numerous previous studies on in silico identification of DNA-binding proteins (DBPs) usually include features extracted from the alignment-based (pseudo) position-specific scoring matrix (PSSM), which limits their application because the PSSM is time-consuming to generate. Few researchers have paid attention to applying pretrained language models at evolutionary scale to the identification of DBPs. To this end, we present a comparison study of alignment-based PSSM and pretrained evolutionary scale modeling (ESM) representations for DBP classification. The comparison is conducted by extracting information from the PSSM and ESM representations using four unified averaging operations and by applying various feature selection (FS) methods. Experimental results demonstrate that the pretrained ESM representation outperforms the PSSM-derived features under a fair comparison. The pretrained feature representation deserves wide application in in silico DBP identification as well as in other function annotation problems. Finally, an ensemble scheme that aggregates various trained FS models is also confirmed to significantly improve the classification performance for DBPs. |
format | Online Article Text |
id | pubmed-9256349 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Hindawi |
record_format | MEDLINE/PubMed |
spelling | pubmed-9256349 2022-07-06 Comparative Analysis on Alignment-Based and Pretrained Feature Representations for the Identification of DNA-Binding Proteins Chen, Die Zhang, Hua Chen, Zeqi Xie, Bo Wang, Ye Comput Math Methods Med Research Article The interaction between DNA and protein is vital for the development of a living body. Numerous previous studies on in silico identification of DNA-binding proteins (DBPs) usually include features extracted from the alignment-based (pseudo) position-specific scoring matrix (PSSM), which limits their application because the PSSM is time-consuming to generate. Few researchers have paid attention to applying pretrained language models at evolutionary scale to the identification of DBPs. To this end, we present a comparison study of alignment-based PSSM and pretrained evolutionary scale modeling (ESM) representations for DBP classification. The comparison is conducted by extracting information from the PSSM and ESM representations using four unified averaging operations and by applying various feature selection (FS) methods. Experimental results demonstrate that the pretrained ESM representation outperforms the PSSM-derived features under a fair comparison. The pretrained feature representation deserves wide application in in silico DBP identification as well as in other function annotation problems. Finally, an ensemble scheme that aggregates various trained FS models is also confirmed to significantly improve the classification performance for DBPs. Hindawi 2022-06-28 /pmc/articles/PMC9256349/ /pubmed/35799660 http://dx.doi.org/10.1155/2022/5847242 Text en Copyright © 2022 Die Chen et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Chen, Die Zhang, Hua Chen, Zeqi Xie, Bo Wang, Ye Comparative Analysis on Alignment-Based and Pretrained Feature Representations for the Identification of DNA-Binding Proteins |
title | Comparative Analysis on Alignment-Based and Pretrained Feature Representations for the Identification of DNA-Binding Proteins |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9256349/ https://www.ncbi.nlm.nih.gov/pubmed/35799660 http://dx.doi.org/10.1155/2022/5847242 |