Automated methods of predicting the function of biological sequences using GO and BLAST

BACKGROUND: With the exponential increase in genomic sequence data there is a need to develop automated approaches to deducing the biological functions of novel sequences with high accuracy. Our aim is to demonstrate how accuracy benchmarking can be used in a decision-making process evaluating compe...

Bibliographic Details
Main authors:	Jones, Craig E; Baumann, Ute; Brown, Alfred L
Format:	Text
Language:	English
Published:	BioMed Central 2005
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1298289/ https://www.ncbi.nlm.nih.gov/pubmed/16288652 http://dx.doi.org/10.1186/1471-2105-6-272

author	Jones, Craig E; Baumann, Ute; Brown, Alfred L
collection	PubMed
description	BACKGROUND: With the exponential increase in genomic sequence data there is a need to develop automated approaches to deducing the biological functions of novel sequences with high accuracy. Our aim is to demonstrate how accuracy benchmarking can be used in a decision-making process evaluating competing designs of biological function predictors. We utilise the Gene Ontology (GO), a directed acyclic graph of functional terms, to annotate sequences with functional information describing their biological context. Initially we examine the effect on accuracy scores of increasing the allowed distance between predicted terms and a test set of curator-assigned terms. Next we evaluate several annotator methods using accuracy benchmarking. Given an unannotated sequence, we use the Basic Local Alignment Search Tool (BLAST) to find similar sequences that have already been assigned GO terms by curators. A number of methods were developed that utilise terms associated with the best five matching sequences. These methods were compared against a benchmark method of simply using terms associated with the best BLAST-matched sequence (best BLAST approach). RESULTS: The precision and recall of estimates increase rapidly as the distance permitted between a predicted term and a correct term assignment increases. Accuracy benchmarking allows a comparison of annotation methods. A covering graph approach performs poorly, except where the term assignment rate is high. A term distance concordance approach has a similar accuracy to the best BLAST approach, demonstrating lower precision but higher recall. However, a discriminant function method has higher precision and recall than the best BLAST approach and the other methods shown here. CONCLUSION: Allowing term predictions to be counted correct if closely related to a correct term decreases the reliability of the accuracy score. As such we recommend using accuracy measures that require exact matching of predicted terms with curator-assigned terms. Furthermore, we conclude that competing designs of BLAST-based GO term annotators can be effectively compared using an accuracy benchmarking approach. The most accurate annotation method was developed using data mining techniques. As such we recommend that designers of term annotators utilise accuracy benchmarking and data mining to ensure newly developed annotators are of high quality.
format	Text
id	pubmed-1298289
institution	National Center for Biotechnology Information
language	English
publishDate	2005
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-1298289 2005-12-02 Automated methods of predicting the function of biological sequences using GO and BLAST Jones, Craig E; Baumann, Ute; Brown, Alfred L BMC Bioinformatics Methodology Article BioMed Central 2005-11-15 /pmc/articles/PMC1298289/ /pubmed/16288652 http://dx.doi.org/10.1186/1471-2105-6-272 Text en Copyright © 2005 Jones et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Automated methods of predicting the function of biological sequences using GO and BLAST
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1298289/ https://www.ncbi.nlm.nih.gov/pubmed/16288652 http://dx.doi.org/10.1186/1471-2105-6-272
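
The abstract in this record contrasts exact-match accuracy scoring with scoring that tolerates some distance between a predicted term and a curator-assigned term. The following is a minimal sketch of that idea, not the authors' implementation. The toy ontology, the function names, the use of undirected shortest-path length as the "distance", and the precise precision and recall definitions are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical toy ontology (child -> parents). These are NOT real GO identifiers.
TOY_PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "dna_binding": ["binding"],
    "rna_binding": ["binding"],
    "kinase": ["catalysis"],
}


def undirected_neighbours(parents):
    """Turn a child -> parents mapping into an undirected adjacency map."""
    adj = {term: set() for term in parents}
    for child, term_parents in parents.items():
        for parent in term_parents:
            adj[child].add(parent)
            adj[parent].add(child)
    return adj


def distance(adj, start, goal):
    """Shortest path length between two terms (assumed distance measure)."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in adj[node]:
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None  # the two terms are not connected


def precision_recall(predicted, curated, adj, max_dist=0):
    """Precision and recall where a term counts as matched if some term in the
    other set lies within max_dist edges. max_dist=0 means exact matching."""
    def near(term, others):
        for other in others:
            d = distance(adj, term, other)
            if d is not None and d <= max_dist:
                return True
        return False

    matched_predictions = sum(near(p, curated) for p in predicted)
    matched_curated = sum(near(c, predicted) for c in curated)
    precision = matched_predictions / len(predicted) if predicted else 0.0
    recall = matched_curated / len(curated) if curated else 0.0
    return precision, recall


adj = undirected_neighbours(TOY_PARENTS)
predicted = {"rna_binding", "kinase"}
curated = {"dna_binding", "kinase"}
exact = precision_recall(predicted, curated, adj, max_dist=0)
relaxed = precision_recall(predicted, curated, adj, max_dist=2)
```

In this toy case the scores rise from 0.5 to 1.0 once sibling terms two edges apart are accepted, even though the prediction itself has not changed. This is consistent with the abstract's recommendation to require exact matching when benchmarking annotators.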
