
CRFalign: A Sequence-Structure Alignment of Proteins Based on a Combination of HMM-HMM Comparison and Conditional Random Fields


Bibliographic Details
Main authors: Lee, Sung Jong, Joo, Keehyoung, Sim, Sangjin, Lee, Juyong, Lee, In-Ho, Lee, Jooyoung
Format: Online Article Text
Language: English
Published: MDPI 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9231382/
https://www.ncbi.nlm.nih.gov/pubmed/35744836
http://dx.doi.org/10.3390/molecules27123711
collection PubMed
description Sequence–structure alignment of protein sequences is an important task in template-based modeling of protein 3D structures. Building a reliable sequence–structure alignment is challenging, especially for remote homologue target proteins. We built a sequence–structure alignment method called CRFalign, which improves upon a base alignment model built on HMM-HMM comparison by employing pairwise conditional random fields in combination with nonlinear scoring functions of structural and sequence features. The nonlinear scoring part is implemented by a set of gradient boosted regression trees. In addition to sequence profile features, various position-dependent structural features are employed, including secondary structures and solvent accessibilities. Training is performed on reference alignments at the superfamily level or in the twilight zone, chosen from the SABmark benchmark set. We found that the CRFalign method produces a relative improvement in average alignment accuracy on validation sets of the SABmark benchmark. We also tested CRFalign on 51 sequence–structure pairs involving 15 FM (free-modeling) target domains of CASP14, where CRFalign leads to an improvement in average modeling accuracy on these hard targets (TM-CRFalign [Formula: see text]) compared with HHalign (TM-HHalign [Formula: see text]) and MRFalign (TM-MRFalign [Formula: see text]). CRFalign was incorporated into our template search framework, CRFpred, and was tested on a random set of 300 target proteins consisting of Easy, Medium and Hard sets, where it showed reasonable template search performance.
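The abstract describes producing an alignment from a pairwise model whose position-specific scores come from a learned nonlinear function. As a hedged illustration only, and not CRFalign's actual code, the sketch below shows Viterbi (maximum-score) decoding for a three-state pairwise alignment model: match, query residue opposite a gap, and template residue opposite a gap. The callable `match_score` is a hypothetical stand-in for the gradient boosted regression tree scorer over profile and structural features, and the affine penalties `gap_open` and `gap_extend` are assumed constants; the published model may score gaps differently.

```python
"""Hypothetical sketch: Viterbi decoding of a three-state pairwise alignment.

Assumptions (not taken from CRFalign): `match_score(i, j)` stands in for a learned
nonlinear scorer, and gaps use fixed affine penalties `gap_open` / `gap_extend`.
"""
from typing import Callable, List, Sequence, Tuple

NEG_INF = float("-inf")
MATCH, GAP_IN_TEMPLATE, GAP_IN_QUERY = 0, 1, 2  # alignment states


def _argmax(values: Sequence[float]) -> Tuple[float, int]:
    """Return (maximum value, index of the first maximum)."""
    k = max(range(len(values)), key=lambda t: values[t])
    return values[k], k


def viterbi_align(
    n: int,
    m: int,
    match_score: Callable[[int, int], float],
    gap_open: float = -3.0,
    gap_extend: float = -0.5,
) -> Tuple[float, List[Tuple[int, int]]]:
    """Best global alignment score of query[:n] vs template[:m] and its aligned (i, j) pairs."""
    # score[s][i][j]: best score aligning query[:i] with template[:j], ending in state s
    score = [[[NEG_INF] * (m + 1) for _ in range(n + 1)] for _ in range(3)]
    back = [[[MATCH] * (m + 1) for _ in range(n + 1)] for _ in range(3)]
    score[MATCH][0][0] = 0.0
    for i in range(1, n + 1):  # leading query residues opposite gaps
        score[GAP_IN_TEMPLATE][i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):  # leading template residues opposite gaps
        score[GAP_IN_QUERY][0][j] = gap_open + (j - 1) * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Match: extend the best state at (i-1, j-1) with the position-specific score
            best, back[MATCH][i][j] = _argmax([score[s][i - 1][j - 1] for s in range(3)])
            score[MATCH][i][j] = best + match_score(i - 1, j - 1)
            # Query residue i-1 opposite a gap: open from other states, extend from itself
            best, back[GAP_IN_TEMPLATE][i][j] = _argmax([
                score[MATCH][i - 1][j] + gap_open,
                score[GAP_IN_TEMPLATE][i - 1][j] + gap_extend,
                score[GAP_IN_QUERY][i - 1][j] + gap_open,
            ])
            score[GAP_IN_TEMPLATE][i][j] = best
            # Template residue j-1 opposite a gap
            best, back[GAP_IN_QUERY][i][j] = _argmax([
                score[MATCH][i][j - 1] + gap_open,
                score[GAP_IN_TEMPLATE][i][j - 1] + gap_open,
                score[GAP_IN_QUERY][i][j - 1] + gap_extend,
            ])
            score[GAP_IN_QUERY][i][j] = best

    total, state = _argmax([score[s][n][m] for s in range(3)])
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:  # trace back; remaining border cells are pure gaps
        prev = back[state][i][j]
        if state == MATCH:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif state == GAP_IN_TEMPLATE:
            i -= 1
        else:
            j -= 1
        state = prev
    pairs.reverse()
    return total, pairs


if __name__ == "__main__":
    # Toy usage: identity scoring in place of a learned feature-based scorer
    query, template = "HEAGAWGHEE", "PAWHEAE"
    toy = lambda i, j: 2.0 if query[i] == template[j] else -1.0
    print(viterbi_align(len(query), len(template), toy))
```

The decoding step is independent of how `match_score` is produced, which is why a learned scorer can be swapped in without changing the dynamic program; training a CRF would additionally require forward-backward marginals, which this sketch omits.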
id pubmed-9231382
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
MDPI 2022-06-09. © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).