CroMaSt: a workflow for assessing protein domain classification by cross-mapping of structural instances between domain databases and structural alignment

Bibliographic Details
Main authors: Dhondge, Hrishikesh, Chauvot de Beauchêne, Isaure, Devignes, Marie-Dominique
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10329740/
https://www.ncbi.nlm.nih.gov/pubmed/37431435
http://dx.doi.org/10.1093/bioadv/vbad081
author Dhondge, Hrishikesh
Chauvot de Beauchêne, Isaure
Devignes, Marie-Dominique
collection PubMed
description MOTIVATION: Protein domains can be viewed as building blocks, essential for understanding structure–function relationships in proteins. However, each domain database classifies protein domains using its own methodology. Thus, in many cases, domain models and boundaries differ from one domain database to another, raising the question of domain definition and enumeration of true domain instances. RESULTS: We propose an automated iterative workflow to assess protein domain classification by cross-mapping domain structural instances between domain databases and by evaluating structural alignments. CroMaSt (for Cross-Mapper of domain Structural instances) classifies all experimental structural instances of a given domain type into four different categories (‘Core’, ‘True’, ‘Domain-like’ and ‘Failed’). CroMaSt is developed in Common Workflow Language and takes advantage of two well-known domain databases with wide coverage: Pfam and CATH. It uses the Kpax structural alignment tool with expert-adjusted parameters. CroMaSt was tested with the RNA Recognition Motif domain type and identified 962 ‘True’ and 541 ‘Domain-like’ structural instances for this domain type. This method solves a crucial issue in domain-centric research and can generate essential information that could be used for synthetic biology and machine-learning approaches to protein domain engineering. AVAILABILITY AND IMPLEMENTATION: The workflow and the Results archive for the CroMaSt runs presented in this article are available from WorkflowHub (doi: 10.48546/workflowhub.workflow.390.2). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online.
format Online
Article
Text
id pubmed-10329740
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10329740 2023-07-10 Bioinform Adv Original Paper
Oxford University Press 2023-06-27 /pmc/articles/PMC10329740/ /pubmed/37431435 http://dx.doi.org/10.1093/bioadv/vbad081 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title CroMaSt: a workflow for assessing protein domain classification by cross-mapping of structural instances between domain databases and structural alignment
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10329740/
https://www.ncbi.nlm.nih.gov/pubmed/37431435
http://dx.doi.org/10.1093/bioadv/vbad081