
Div-BLAST: Diversification of Sequence Search Results

Sequence similarity tools, such as BLAST, seek sequences most similar to a query from a database of sequences. They return results significantly similar to the query sequence and that are typically highly similar to each other. Most sequence analysis tasks in bioinformatics require an exploratory ap...


Bibliographic Details
Main Authors: Eser, Elif; Can, Tolga; Ferhatosmanoğlu, Hakan
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4274030/
https://www.ncbi.nlm.nih.gov/pubmed/25531115
http://dx.doi.org/10.1371/journal.pone.0115445
author Eser, Elif
Can, Tolga
Ferhatosmanoğlu, Hakan
collection PubMed
description Sequence similarity tools, such as BLAST, seek sequences most similar to a query from a database of sequences. They return results that are significantly similar to the query sequence and typically highly similar to each other. Most sequence analysis tasks in bioinformatics require an exploratory approach, where the initial results guide the user to new searches. However, diversity has not yet been considered an integral component of sequence search tools for this discipline. Some redundancy can be avoided by introducing non-redundancy during database construction, but it is not feasible to dynamically set a level of non-redundancy tailored to a query sequence. We introduce the problem of diverse search and browsing in sequence databases that produce non-redundant results optimized for any given query. We define diversity measures for sequences and propose methods to obtain diverse results extracted from current sequence similarity search tools. We also propose a new measure to evaluate the diversity of a set of sequences that is returned as a result of a sequence similarity query. We evaluate the effectiveness of the proposed methods in post-processing BLAST and PSI-BLAST results. We also assess the functional diversity of the returned results based on available Gene Ontology annotations. Additionally, we include a comparison with a current redundancy elimination tool, CD-HIT. Our experiments show that the proposed methods are able to achieve more diverse yet significant result sets compared to static non-redundancy approaches. In both sequence-based and functional diversity evaluation, the proposed diversification methods significantly outperform original BLAST results and other baselines. A web-based tool implementing the proposed methods, Div-BLAST, can be accessed at cedar.cs.bilkent.edu.tr/Div-BLAST.
format Online
Article
Text
id pubmed-4274030
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4274030 2014-12-31 Div-BLAST: Diversification of Sequence Search Results. Eser, Elif; Can, Tolga; Ferhatosmanoğlu, Hakan. PLoS One. Research Article. Public Library of Science 2014-12-22 /pmc/articles/PMC4274030/ /pubmed/25531115 http://dx.doi.org/10.1371/journal.pone.0115445 Text en © 2014 Eser et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Div-BLAST: Diversification of Sequence Search Results
topic Research Article