HFSP: high speed homology-driven function annotation of proteins

MOTIVATION: The rapid drop in sequencing costs has produced many more (predicted) protein sequences than can feasibly be functionally annotated with wet-lab experiments. Thus, many computational methods have been developed for this purpose. Most of these methods employ homology-based inference, appr...

Bibliographic Details
Main authors: Mahlich, Yannick, Steinegger, Martin, Rost, Burkhard, Bromberg, Yana
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6022561/
https://www.ncbi.nlm.nih.gov/pubmed/29950013
http://dx.doi.org/10.1093/bioinformatics/bty262
_version_ 1783335704315559936
author Mahlich, Yannick
Steinegger, Martin
Rost, Burkhard
Bromberg, Yana
collection PubMed
description MOTIVATION: The rapid drop in sequencing costs has produced many more (predicted) protein sequences than can feasibly be functionally annotated with wet-lab experiments. Thus, many computational methods have been developed for this purpose. Most of these methods employ homology-based inference, approximated via sequence alignments, to transfer functional annotations between proteins. The increase in the number of available sequences, however, has drastically increased the search space, thus significantly slowing down alignment methods. RESULTS: Here we describe homology-derived functional similarity of proteins (HFSP), a novel computational method that uses results of a high-speed alignment algorithm, MMseqs2, to infer functional similarity of proteins on the basis of their alignment length and sequence identity. We show that our method is accurate (85% precision) and fast (more than 40-fold speed increase over state-of-the-art). HFSP can help correct at least a 16% error in legacy curations, even for a resource of as high quality as Swiss-Prot. These findings suggest HFSP as an ideal resource for large-scale functional annotation efforts. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-6022561
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6022561 2018-07-10 HFSP: high speed homology-driven function annotation of proteins Mahlich, Yannick Steinegger, Martin Rost, Burkhard Bromberg, Yana Bioinformatics Ismb 2018–Intelligent Systems for Molecular Biology Proceedings Oxford University Press 2018-07-01 2018-06-27 /pmc/articles/PMC6022561/ /pubmed/29950013 http://dx.doi.org/10.1093/bioinformatics/bty262 Text en © The Author(s) 2018. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title HFSP: high speed homology-driven function annotation of proteins
topic Ismb 2018–Intelligent Systems for Molecular Biology Proceedings