FAS: assessing the similarity between proteins using multi-layered feature architectures

MOTIVATION: Protein sequence comparison is a fundamental element in the bioinformatics toolkit. When sequences are annotated with features such as functional domains, transmembrane domains, low complexity regions or secondary structure elements, the resulting feature architectures allow better infor...

Bibliographic Details
Main Authors: Dosch, Julian; Bergmann, Holger; Tran, Vinh; Ebersberger, Ingo
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10185405/
https://www.ncbi.nlm.nih.gov/pubmed/37084276
http://dx.doi.org/10.1093/bioinformatics/btad226
author Dosch, Julian
Bergmann, Holger
Tran, Vinh
Ebersberger, Ingo
collection PubMed
description MOTIVATION: Protein sequence comparison is a fundamental element in the bioinformatics toolkit. When sequences are annotated with features such as functional domains, transmembrane domains, low complexity regions or secondary structure elements, the resulting feature architectures allow better informed comparisons. However, many existing schemes for scoring architecture similarities cannot cope with features arising from multiple annotation sources. Those that do fall short in the resolution of overlapping and redundant feature annotations. RESULTS: Here, we introduce FAS, a scoring method that integrates features from multiple annotation sources in a directed acyclic architecture graph. Redundancies are resolved as part of the architecture comparison by finding the paths through the graphs that maximize the pair-wise architecture similarity. In a large-scale evaluation on more than 10 000 human-yeast ortholog pairs, architecture similarities assessed with FAS are consistently more plausible than those obtained using e-values to resolve overlaps or leaving overlaps unresolved. Three case studies demonstrate the utility of FAS on architecture comparison tasks: benchmarking of orthology assignment software, identification of functionally diverged orthologs, and diagnosing protein architecture changes stemming from faulty gene predictions. With the help of FAS, feature architecture comparisons can now be routinely integrated into these and many other applications. AVAILABILITY AND IMPLEMENTATION: FAS is available as a Python package: https://pypi.org/project/greedyFAS/.
format Online
Article
Text
id pubmed-10185405
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10185405 2023-05-16 Bioinformatics Original Paper Oxford University Press 2023-04-21 /pmc/articles/PMC10185405/ /pubmed/37084276 http://dx.doi.org/10.1093/bioinformatics/btad226 Text en © The Author(s) 2023. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title FAS: assessing the similarity between proteins using multi-layered feature architectures
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10185405/
https://www.ncbi.nlm.nih.gov/pubmed/37084276
http://dx.doi.org/10.1093/bioinformatics/btad226