Combining learning and constraints for genome-wide protein annotation


Bibliographic Details
Main authors:	Teso, Stefano, Masera, Luca, Diligenti, Michelangelo, Passerini, Andrea
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2019
Subjects:	Software
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6580517/ https://www.ncbi.nlm.nih.gov/pubmed/31208327 http://dx.doi.org/10.1186/s12859-019-2875-5

collection	PubMed
description	BACKGROUND: The advent of high-throughput experimental techniques paved the way to genome-wide computational analysis and predictive annotation studies. When considering the joint annotation of a large set of related entities, like all proteins of a certain genome, many candidate annotations could be inconsistent, or very unlikely, given the existing knowledge. A sound predictive framework capable of accounting for this type of constraint in making predictions could substantially contribute to the quality of machine-generated annotations at a genomic scale. RESULTS: We present Ocelot, a predictive pipeline which simultaneously addresses functional and interaction annotation of all proteins of a given genome. The system combines sequence-based predictors for functional and protein-protein interaction (PPI) prediction with a consistency layer enforcing (soft) constraints as fuzzy logic rules. The enforced rules represent the available prior knowledge about the classification task, including taxonomic constraints over each GO hierarchy (e.g. a protein labeled with a GO term should also be labeled with all ancestor terms) as well as rules combining interaction and function prediction. An extensive experimental evaluation on the Yeast genome shows that the integration of prior knowledge via rules substantially improves the quality of the predictions. The system largely outperforms GoFDR, the only high-ranking system at the last CAFA challenge with a readily available implementation, when GoFDR is given access to intra-genome information only (like Ocelot), and has comparable or better results (depending on the hierarchy and performance measure) when GoFDR is allowed to use information from other genomes. Our system also compares favorably to recent methods based on deep learning. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2875-5) contains supplementary material, which is available to authorized users.
id	pubmed-6580517
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
source	BMC Bioinformatics (Software article). BioMed Central, published 2019-06-17 (record pubmed-6580517, dated 2019-06-24). DOI: 10.1186/s12859-019-2875-5
rights	© The Author(s) 2019. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

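The description above states that Ocelot's consistency layer enforces soft constraints written as fuzzy logic rules, such as "a protein labeled with a GO term should also be labeled with its ancestor terms" and rules linking interaction and function. The following is only a minimal sketch of that idea under assumptions not stated in this record: it uses Łukasiewicz fuzzy semantics, a squared violation penalty, and plain gradient descent, and every name in it (`luk_and`, `luk_implies`, `refine`, the terms `GO:A`/`GO:B`, and the scores) is hypothetical. It is not Ocelot's implementation.

```python
# Minimal sketch (assumptions, not Ocelot's code): Lukasiewicz fuzzy logic,
# a hypothetical two-term GO hierarchy, and made-up predictor scores.

def luk_and(a: float, b: float) -> float:
    """Lukasiewicz t-norm: truth degree of (a AND b)."""
    return max(0.0, a + b - 1.0)


def luk_implies(a: float, b: float) -> float:
    """Lukasiewicz implication: truth degree of (a -> b)."""
    return min(1.0, 1.0 - a + b)


def hierarchy_penalty(scores: dict[str, float], parent: dict[str, str]) -> float:
    """Total violation of the rules 'child term -> parent term'."""
    return sum(1.0 - luk_implies(scores[c], scores[p]) for c, p in parent.items())


def interaction_penalty(ppi: float, f_p: float, f_q: float) -> float:
    """Violation of 'interacts(p, q) AND has_term(p) -> has_term(q)'."""
    return 1.0 - luk_implies(luk_and(ppi, f_p), f_q)


def refine(scores: dict[str, float], parent: dict[str, str],
           lam: float = 2.0, lr: float = 0.05, steps: int = 500) -> dict[str, float]:
    """Soft consistency layer: stay close to the predictor's scores while
    paying lam * (violation ** 2) for each hierarchy rule."""
    y = dict(scores)
    for _ in range(steps):
        # Gradient of sum_t (y_t - scores_t)^2 (fidelity to the predictor).
        grad = {t: 2.0 * (y[t] - scores[t]) for t in y}
        for c, p in parent.items():
            v = max(0.0, y[c] - y[p])  # equals 1 - luk_implies(y[c], y[p])
            grad[c] += 2.0 * lam * v
            grad[p] -= 2.0 * lam * v
        # Gradient step, clipped so scores remain valid truth degrees.
        y = {t: min(1.0, max(0.0, y[t] - lr * grad[t])) for t in y}
    return y


# Hypothetical example: GO:B is a child of GO:A, but the predictor is
# more confident about the child than about its parent.
parent = {"GO:B": "GO:A"}
raw = {"GO:A": 0.3, "GO:B": 0.8}
fixed = refine(raw, parent)
print(hierarchy_penalty(raw, parent), hierarchy_penalty(fixed, parent))
```

In this toy case the refinement lowers `GO:B` to about 0.6 and raises `GO:A` to about 0.5, shrinking the rule violation from 0.5 to about 0.1. Because the constraint is soft, a residual violation remains; a larger `lam` pushes it closer to zero. Rules written only between a term and its direct parent extend to all ancestors by chaining along the hierarchy.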