
Estimating overannotation across prokaryotic genomes using BLAST+, UBLAST, LAST and BLAT

BACKGROUND: As the number of genomes in public databases increases, it becomes more important to be able to quickly choose the best annotated genomes for further analyses in comparative genomics and evolution. A proxy for annotation quality is the estimation of overannotation by comparing annotated c...


Bibliographic Details
Main authors: Moreno-Hagelsieb, Gabriel; Hudy-Yuffa, Brigitte
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4180129/
https://www.ncbi.nlm.nih.gov/pubmed/25228073
http://dx.doi.org/10.1186/1756-0500-7-651
author Moreno-Hagelsieb, Gabriel
Hudy-Yuffa, Brigitte
collection PubMed
description BACKGROUND: As the number of genomes in public databases increases, it becomes more important to be able to quickly choose the best annotated genomes for further analyses in comparative genomics and evolution. A proxy for annotation quality is the estimation of overannotation by comparing annotated coding genes against the SwissProt database. NCBI’s BLAST (BLAST+) is the usual software of choice for comparing these sequences. Newer programs that run in a fraction of the time taken by BLAST+ might miss matches that BLAST+ would find. However, their results might still be useful for calculating overannotation. We thus decided to compare the overannotation estimates obtained with three such programs, UBLAST, LAST and the Blast-Like Alignment Tool (BLAT), and to test non-redundant versions of the SwissProt database to reduce the number of comparisons necessary. FINDINGS: We found that all three programs, UBLAST, LAST and BLAT, tend to produce overannotation estimates similar to those obtained with BLAST+. As would be expected, results varied the most from those obtained with BLAST+ in genomes with fewer proteins matching sequences in the SwissProt database. UBLAST was the fastest-running program, and showed the smallest variation from the results obtained using BLAST+. Reduced SwissProt databases did not seem to affect the results much, but the reduction in time was modest compared to that obtained from UBLAST, LAST, or BLAT. CONCLUSIONS: Although faster programs miss sequence matches otherwise found by NCBI’s BLAST, the overannotation estimates are very similar, and thus these programs can be used with confidence for this task. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1756-0500-7-651) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4180129
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4180129 2014-10-01 Estimating overannotation across prokaryotic genomes using BLAST+, UBLAST, LAST and BLAT Moreno-Hagelsieb, Gabriel Hudy-Yuffa, Brigitte BMC Res Notes Short Report
BioMed Central 2014-09-16 /pmc/articles/PMC4180129/ /pubmed/25228073 http://dx.doi.org/10.1186/1756-0500-7-651 Text en © Moreno-Hagelsieb and Hudy-Yuffa; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Estimating overannotation across prokaryotic genomes using BLAST+, UBLAST, LAST and BLAT
topic Short Report