Automatic detection of false annotations via binary property clustering

BACKGROUND: Computational protein annotation methods occasionally introduce errors. False-positive (FP) errors are annotations that are mistakenly associated with a protein. Such false annotations introduce errors that may spread into databases through similarity with other proteins. Generally, meth...

Bibliographic Details
Main authors: Kaplan, Noam; Linial, Michal
Format: Text
Language: English
Published: BioMed Central 2005
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC555558/
https://www.ncbi.nlm.nih.gov/pubmed/15755318
http://dx.doi.org/10.1186/1471-2105-6-46
author Kaplan, Noam
Linial, Michal
collection PubMed
description BACKGROUND: Computational protein annotation methods occasionally introduce errors. False-positive (FP) errors are annotations that are mistakenly associated with a protein. Such false annotations introduce errors that may spread into databases through similarity with other proteins. Generally, methods used to minimize the chance of FPs result in decreased sensitivity or low throughput. We present a novel protein-clustering method that enables automatic separation of FPs from true hits. The method quantifies the biological similarity between pairs of proteins by examining each protein's annotations, and then clusters sets of proteins that received similar annotations into biological groups. RESULTS: Using a test set of all PROSITE signatures that are marked as FPs, we show that the method successfully separates FPs in 69% of the 327 test cases supplied by PROSITE. Furthermore, we constructed an extensive random FP simulation test and show a high degree of success in detecting FPs, indicating that the method is not specifically tuned for PROSITE and performs well on larger scales. We also suggest some means of predicting the cases in which this approach would be successful. CONCLUSION: Automatic detection of FPs may greatly facilitate the manual validation process and increase annotation sensitivity. With the increasing number of automatic annotations, the tendency of biological properties to cluster, once a biological similarity measure is introduced, may become exceedingly helpful in the development of such automatic methods.
format Text
id pubmed-555558
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-555558 2005-03-25 Automatic detection of false annotations via binary property clustering Kaplan, Noam Linial, Michal BMC Bioinformatics Methodology Article BioMed Central 2005-03-08 /pmc/articles/PMC555558/ /pubmed/15755318 http://dx.doi.org/10.1186/1471-2105-6-46 Text en Copyright © 2005 Kaplan and Linial; licensee BioMed Central Ltd.
title Automatic detection of false annotations via binary property clustering
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC555558/
https://www.ncbi.nlm.nih.gov/pubmed/15755318
http://dx.doi.org/10.1186/1471-2105-6-46
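
The abstract describes the method only in outline: score the similarity of protein pairs from their annotations, then cluster the proteins that matched a signature so that false positives separate from true hits. The following is a minimal sketch of that general idea, not the authors' implementation. The Jaccard similarity, the average-linkage merging, the 0.3 threshold, the rule that the largest cluster holds the true hits, and all protein names and annotation terms are assumptions made for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Similarity of two annotation sets: shared terms / all distinct terms."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(annotations, threshold):
    """Average-linkage agglomerative clustering over annotation sets.

    Merging stops once no two clusters have a mean pairwise
    similarity of at least `threshold` (an assumed stopping rule).
    """
    clusters = [{name} for name in annotations]

    def linkage(c1, c2):
        sims = [jaccard(annotations[p], annotations[q]) for p in c1 for q in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        # Find the most similar pair of clusters (i < j always holds).
        (i, j), best = max(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


def flag_candidates(annotations, threshold=0.3):
    """Return proteins outside the largest cluster as candidate FPs.

    Treating the largest cluster as the true hits is an assumption.
    """
    clusters = cluster(annotations, threshold)
    largest = max(clusters, key=len)
    return sorted(set(annotations) - largest)


# Hypothetical proteins that all matched one hypothetical signature.
hits = {
    "kinase_a": frozenset({"kinase", "ATP-binding", "transferase"}),
    "kinase_b": frozenset({"kinase", "ATP-binding", "phosphorylation"}),
    "kinase_c": frozenset({"kinase", "transferase", "phosphorylation"}),
    "odd_hit": frozenset({"ribosomal protein", "RNA-binding"}),
}

print(flag_candidates(hits))  # prints ['odd_hit']
```

In this toy input the three kinase-like entries share half of their terms pairwise and merge into one cluster, while the fourth entry shares none and is left on its own, so it is the only one flagged for manual review.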