NoGOA: predicting noisy GO annotations using evidences and sparse representation

Bibliographic Details
Main authors: Yu, Guoxian; Lu, Chang; Wang, Jun
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5521088/
https://www.ncbi.nlm.nih.gov/pubmed/28732468
http://dx.doi.org/10.1186/s12859-017-1764-z
author Yu, Guoxian
Lu, Chang
Wang, Jun
collection PubMed
description BACKGROUND: Gene Ontology (GO) is a community effort to represent the functional features of gene products. GO annotations (GOA) provide functional associations between GO terms and gene products. Because of limited resources, only a small portion of annotations are manually checked by curators; the rest are electronically inferred. Although quality control techniques have been applied to ensure the quality of annotations, the community consistently reports that a considerable number of noisy (or incorrect) annotations remain. Given the wide application of annotations, identifying noisy annotations is an important but seldom studied open problem. RESULTS: We introduce a novel approach called NoGOA to predict noisy annotations. First, NoGOA applies sparse representation to the gene-term association matrix to reduce the impact of noisy annotations, and uses the sparse representation coefficients to measure the semantic similarity between genes. Second, it makes a preliminary prediction of the noisy annotations of a gene based on aggregated votes from that gene's semantic neighbors. Next, NoGOA estimates the ratio of noisy annotations for each evidence code based on direct annotations in GOA files archived in different periods, then weights entries of the association matrix by the estimated ratios and propagates the weights to ancestors of direct annotations using the GO hierarchy. Finally, it integrates the evidence-weighted association matrix and the aggregated votes to predict noisy annotations. Experiments on archived GOA files of six model species (H. sapiens, A. thaliana, S. cerevisiae, G. gallus, B. taurus and M. musculus) demonstrate that NoGOA achieves significantly better results than other related methods, and that removing noisy annotations improves the performance of gene function prediction. CONCLUSIONS: The comparative study justifies the effectiveness of integrating evidence codes with sparse representation for predicting noisy GO annotations. Code and datasets are available at http://mlda.swu.edu.cn/codes.php?name=NoGOA. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1764-z) contains supplementary material, which is available to authorized users.
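
The pipeline outlined in the description (evidence weighting, propagation to ancestors, neighbor votes, integration) can be illustrated with a small toy sketch. This is not the authors' implementation: the GO fragment, the evidence weights, the max-based propagation rule, and the blending parameter `alpha` are all assumptions made for illustration, and plain cosine similarity stands in for the sparse representation coefficients that NoGOA actually uses.

```python
from math import sqrt

# Hypothetical toy GO fragment: term -> set of parent terms.
PARENTS = {
    "t_leaf": {"t_mid"},
    "t_mid": {"t_root"},
    "t_other": {"t_root"},
    "t_root": set(),
}

# Assumed reliability per evidence code (NoGOA estimates these from
# archived GOA files; the numbers here are made up).
EVIDENCE_WEIGHT = {"IDA": 0.9, "IEA": 0.4}


def ancestors(term, parents=PARENTS):
    """Return all ancestors of `term` in the hierarchy, excluding itself."""
    seen = set()
    stack = list(parents[term])
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents[t])
    return seen


def weighted_row(direct, parents=PARENTS, weights=EVIDENCE_WEIGHT):
    """Weight direct annotations by evidence code and propagate upward.

    `direct` maps term -> evidence code. Each ancestor keeps the largest
    weight among its annotated descendants (an assumed propagation rule).
    """
    row = {}
    for term, code in direct.items():
        w = weights[code]
        for t in {term} | ancestors(term, parents):
            row[t] = max(row.get(t, 0.0), w)
    return row


def cosine(a, b):
    """Cosine similarity of two sparse rows (stand-in for sparse coding)."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def votes(gene, rows):
    """Similarity-weighted share of other genes that also carry each term."""
    sims = {g: cosine(rows[gene], rows[g]) for g in rows if g != gene}
    total = sum(sims.values())
    out = {}
    for t in rows[gene]:
        support = sum(s for g, s in sims.items() if t in rows[g])
        out[t] = support / total if total else 0.0
    return out


def noisy_scores(gene, rows, alpha=0.5):
    """Blend evidence weight and neighbor votes; a low score suggests noise."""
    v = votes(gene, rows)
    return {t: alpha * rows[gene][t] + (1 - alpha) * v[t] for t in rows[gene]}


# Toy annotations: gene -> {direct term: evidence code}.
ROWS = {
    "g1": weighted_row({"t_leaf": "IDA"}),
    "g2": weighted_row({"t_leaf": "IDA"}),
    "g3": weighted_row({"t_mid": "IDA", "t_other": "IEA"}),
}
scores = noisy_scores("g3", ROWS)
print(min(scores, key=scores.get))  # lowest-scoring annotation of g3
```

In this toy data the electronically inferred annotation `t_other` of `g3` has both a low evidence weight and no support from similar genes, so it receives the lowest score and would be the first candidate flagged as noisy.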
format Online
Article
Text
id pubmed-5521088
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5521088 2017-07-21 NoGOA: predicting noisy GO annotations using evidences and sparse representation Yu, Guoxian Lu, Chang Wang, Jun BMC Bioinformatics Methodology Article BioMed Central 2017-07-21 /pmc/articles/PMC5521088/ /pubmed/28732468 http://dx.doi.org/10.1186/s12859-017-1764-z Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title NoGOA: predicting noisy GO annotations using evidences and sparse representation
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5521088/
https://www.ncbi.nlm.nih.gov/pubmed/28732468
http://dx.doi.org/10.1186/s12859-017-1764-z