Applying negative rule mining to improve genome annotation

BACKGROUND: Unsupervised annotation of proteins by software pipelines suffers from very high error rates. Spurious functional assignments are usually caused by unwarranted homology-based transfer of information from existing database entries to the new target sequences. We have previously demonstrat...

Bibliographic Details
Main Authors:	Artamonova, Irena I; Frishman, Goar; Frishman, Dmitrij
Format:	Text
Language:	English
Published:	BioMed Central 2007
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1940032/ https://www.ncbi.nlm.nih.gov/pubmed/17659089 http://dx.doi.org/10.1186/1471-2105-8-261

author	Artamonova, Irena I; Frishman, Goar; Frishman, Dmitrij
author_facet	Artamonova, Irena I; Frishman, Goar; Frishman, Dmitrij
author_sort	Artamonova, Irena I
collection	PubMed
description	BACKGROUND: Unsupervised annotation of proteins by software pipelines suffers from very high error rates. Spurious functional assignments are usually caused by unwarranted homology-based transfer of information from existing database entries to the new target sequences. We have previously demonstrated that data mining in large sequence annotation databanks can help identify annotation items that are strongly associated with each other, and that exceptions from strong positive association rules often point to potential annotation errors. Here we investigate the applicability of negative association rule mining to revealing erroneously assigned annotation items. RESULTS: Almost all exceptions from strong negative association rules are connected to at least one wrong attribute in the feature combination making up the rule. The fraction of annotation features flagged by this approach as suspicious is strongly enriched in errors and constitutes about 0.6% of the whole body of the similarity-transferred annotation in the PEDANT genome database. Positive rule mining does not identify two thirds of these errors. The approach based on exceptions from negative rules is much more specific than positive rule mining, but its coverage is significantly lower. CONCLUSION: Mining of both negative and positive association rules is a potent tool for finding significant trends in protein annotation and flagging doubtful features for further inspection.
format	Text
id	pubmed-1940032
institution	National Center for Biotechnology Information
language	English
publishDate	2007
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-19400322007-08-07 Applying negative rule mining to improve genome annotation Artamonova, Irena I Frishman, Goar Frishman, Dmitrij BMC Bioinformatics Methodology Article BACKGROUND: Unsupervised annotation of proteins by software pipelines suffers from very high error rates. Spurious functional assignments are usually caused by unwarranted homology-based transfer of information from existing database entries to the new target sequences. We have previously demonstrated that data mining in large sequence annotation databanks can help identify annotation items that are strongly associated with each other, and that exceptions from strong positive association rules often point to potential annotation errors. Here we investigate the applicability of negative association rule mining to revealing erroneously assigned annotation items. RESULTS: Almost all exceptions from strong negative association rules are connected to at least one wrong attribute in the feature combination making up the rule. The fraction of annotation features flagged by this approach as suspicious is strongly enriched in errors and constitutes about 0.6% of the whole body of the similarity-transferred annotation in the PEDANT genome database. Positive rule mining does not identify two thirds of these errors. The approach based on exceptions from negative rules is much more specific than positive rule mining, but its coverage is significantly lower. CONCLUSION: Mining of both negative and positive association rules is a potent tool for finding significant trends in protein annotation and flagging doubtful features for further inspection. BioMed Central 2007-07-21 /pmc/articles/PMC1940032/ /pubmed/17659089 http://dx.doi.org/10.1186/1471-2105-8-261 Text en Copyright © 2007 Artamonova et al; licensee BioMed Central Ltd. 
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Methodology Article Artamonova, Irena I Frishman, Goar Frishman, Dmitrij Applying negative rule mining to improve genome annotation
title	Applying negative rule mining to improve genome annotation
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1940032/ https://www.ncbi.nlm.nih.gov/pubmed/17659089 http://dx.doi.org/10.1186/1471-2105-8-261
