Combining heterogeneous data sources for accurate functional annotation of proteins

Bibliographic Details
Main Authors: Sokolov, Artem, Funk, Christopher, Graim, Kiley, Verspoor, Karin, Ben-Hur, Asa
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects: Proceedings
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3584846/
https://www.ncbi.nlm.nih.gov/pubmed/23514123
http://dx.doi.org/10.1186/1471-2105-14-S3-S10
collection PubMed
description Combining heterogeneous sources of data is essential for accurate prediction of protein function. The task is complicated by the fact that while sequence-based features can be readily compared across species, most other data are species-specific. In this paper, we present a multi-view extension to GOstruct, a structured-output framework for function annotation of proteins. The extended framework can learn from disparate data sources, with each data source provided to the framework in the form of a kernel. Our empirical results demonstrate that the multi-view framework is able to utilize all available information, yielding better performance than sequence-based models trained across species and models trained from collections of data within a given species. This version of GOstruct participated in the recent Critical Assessment of Functional Annotations (CAFA) challenge; since then we have significantly improved the natural language processing component of the method, which now provides performance that is on par with that provided by sequence information. The GOstruct framework is available for download at http://strut.sourceforge.net.
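The description states that each data source enters the extended framework in the form of a kernel. As a rough illustration of what handing several kernels to a single kernel method can look like, the sketch below cosine-normalizes per-source kernel matrices computed over the same proteins and takes a non-negative weighted sum. This is an assumption: it is the plain kernel-summation baseline, not the multi-view formulation used by GOstruct, and the function names, weights and toy matrices are hypothetical.

```python
import math
from typing import List, Optional, Sequence

Matrix = List[List[float]]


def normalize_kernel(K: Sequence[Sequence[float]]) -> Matrix:
    """Cosine-normalize a kernel: K'[i][j] = K[i][j] / sqrt(K[i][i] * K[j][j]).

    Every protein then has self-similarity 1, which puts kernels built
    from different data sources on a comparable scale.
    """
    n = len(K)
    diag = [K[i][i] for i in range(n)]
    if any(d <= 0 for d in diag):
        raise ValueError("kernel diagonal must be strictly positive")
    return [
        [K[i][j] / math.sqrt(diag[i] * diag[j]) for j in range(n)]
        for i in range(n)
    ]


def combine_kernels(
    kernels: Sequence[Sequence[Sequence[float]]],
    weights: Optional[Sequence[float]] = None,
) -> Matrix:
    """Hypothetical helper: weighted sum of normalized per-source kernels.

    All kernels must be computed over the same proteins in the same order.
    A non-negative weighted sum of positive semidefinite kernels is itself
    positive semidefinite, so the result is still a valid kernel.
    (Plain kernel summation, not GOstruct's multi-view method.)
    """
    if not kernels:
        raise ValueError("need at least one kernel")
    if weights is None:
        weights = [1.0 / len(kernels)] * len(kernels)  # equal weights by default
    if len(weights) != len(kernels):
        raise ValueError("need one weight per kernel")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    normalized = [normalize_kernel(K) for K in kernels]
    n = len(normalized[0])
    if any(len(K) != n for K in normalized):
        raise ValueError("all kernels must cover the same proteins")
    return [
        [sum(w * K[i][j] for w, K in zip(weights, normalized)) for j in range(n)]
        for i in range(n)
    ]


# Toy kernels over two hypothetical proteins: a sequence-similarity kernel
# and a kernel from a hypothetical species-specific source.
K_seq = [[4.0, 2.0], [2.0, 1.0]]
K_ppi = [[1.0, 0.0], [0.0, 1.0]]
print(combine_kernels([K_seq, K_ppi]))  # prints [[1.0, 0.5], [0.5, 1.0]]
```

Summation is used here only because a non-negative combination of valid kernels is again a valid kernel. It requires every source to cover the same proteins, which is exactly what species-specific data often cannot do; a multi-view method can instead keep the views separate.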
id pubmed-3584846
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-3584846 (2013-03-11): BMC Bioinformatics, Proceedings; BioMed Central, published 2013-02-28. Copyright ©2013 Sokolov et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Proceedings