Estimating the annotation error rate of curated GO database sequence annotations

Bibliographic details
Main authors:	Jones, Craig E, Brown, Alfred L, Baumann, Ute
Format:	Text
Language:	English
Published:	BioMed Central 2007
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1892569/ https://www.ncbi.nlm.nih.gov/pubmed/17519041 http://dx.doi.org/10.1186/1471-2105-8-170

Abstract

BACKGROUND: Annotations that describe the function of sequences are enormously important to researchers during laboratory investigations and when making computational inferences. However, there has been little investigation into the data quality of sequence function annotations. Here we have developed a new method of estimating the error rate of curated sequence annotations, and applied this to the Gene Ontology (GO) sequence database (GOSeqLite). This method involved artificially adding errors to sequence annotations at known rates, and used regression to model the impact on the precision of annotations based on BLAST matched sequences.

RESULTS: We estimated the error rate of curated GO sequence annotations in the GOSeqLite database (March 2006) at between 28% and 30%. Annotations made without use of sequence similarity based methods (non-ISS) had an estimated error rate of between 13% and 18%. Annotations made with the use of sequence similarity methodology (ISS) had an estimated error rate of 49%.

CONCLUSION: While the overall error rate is reasonably low, it would be prudent to treat all ISS annotations with caution. Electronic annotators that use ISS annotations as the basis of predictions are likely to have higher false prediction rates, and for this reason designers of these systems should consider avoiding ISS annotations where possible. Electronic annotators that use ISS annotations to make predictions should be viewed sceptically. We recommend that curators thoroughly review ISS annotations before accepting them as valid. Overall, users of curated sequence annotations from the GO database should feel assured that they are using a comparatively high quality source of information.
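
The abstract describes the estimation idea only in outline (inject errors at known rates, measure BLAST-based precision, fit a regression), so the following is a toy simulation of that general idea under illustrative assumptions, not the authors' procedure. Pre-paired "query/hit" sequences stand in for BLAST matches; both members of every pair are assumed to truly share one GO term; a wrong label is assumed to be a different term chosen uniformly at random; and chance matches are ignored. Under those assumptions, precision after corrupting query labels at rate r is roughly (1 - e0)^2 * (1 - r), so the regression intercept estimates (1 - e0)^2. Because a smaller share of truly matching pairs would lower the intercept in exactly the same way, the sketch simply assumes that share is 100%. All function names are hypothetical.

```python
import math
import random


def corrupt(term, rate, n_terms, rng):
    """With probability `rate`, swap `term` for a different, uniformly chosen term."""
    if rng.random() < rate:
        wrong = rng.randrange(n_terms - 1)
        return wrong if wrong < term else wrong + 1
    return term


def make_pairs(n_pairs, n_terms, base_error, rng):
    """Toy data set of (query label, hit label) pairs.

    Assumption: both sequences in a pair truly share one term, and each
    curated label is independently wrong with probability `base_error`.
    """
    pairs = []
    for _ in range(n_pairs):
        true_term = rng.randrange(n_terms)
        pairs.append((corrupt(true_term, base_error, n_terms, rng),
                      corrupt(true_term, base_error, n_terms, rng)))
    return pairs


def precision_at(pairs, rate, n_terms, rng):
    """Share of pairs whose labels still agree after corrupting the query
    labels at a known, artificial error rate."""
    agree = sum(corrupt(q, rate, n_terms, rng) == h for q, h in pairs)
    return agree / len(pairs)


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def estimate_base_error(pairs, n_terms, rates, rng):
    """Regress precision on the injected rate and read e0 off the intercept.

    Toy model: precision(r) ~= (1 - e0)**2 * (1 - r), so the intercept
    estimates (1 - e0)**2 and the slope should be close to -intercept.
    """
    precisions = [precision_at(pairs, r, n_terms, rng) for r in rates]
    intercept, slope = fit_line(rates, precisions)
    intercept = min(max(intercept, 0.0), 1.0)  # keep sqrt well defined
    return 1.0 - math.sqrt(intercept), intercept, slope


rng = random.Random(0)
pairs = make_pairs(20_000, 5_000, base_error=0.30, rng=rng)
est, intercept, slope = estimate_base_error(
    pairs, 5_000, [0.0, 0.1, 0.2, 0.3, 0.4, 0.5], rng)
print(f"estimated base error rate: {est:.3f} (slope {slope:.3f})")
```

With these settings the estimate should land close to the simulated base error rate of 0.30, with the slope near the negative of the intercept. On real data, a slope far from -intercept would suggest that the toy model's assumptions do not hold.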

Published in BMC Bioinformatics (BioMed Central), 2007-05-22.

Copyright © 2007 Jones et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
