Estimating overannotation across prokaryotic genomes using BLAST+, UBLAST, LAST and BLAT
BACKGROUND: As the number of genomes in public databases increases, it becomes more important to be able to quickly choose the best annotated genomes for further analyses in comparative genomics and evolution. A proxy to annotation quality is the estimation of overannotation by comparing annotated c...
Main Authors: | Moreno-Hagelsieb, Gabriel; Hudy-Yuffa, Brigitte |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2014 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4180129/ https://www.ncbi.nlm.nih.gov/pubmed/25228073 http://dx.doi.org/10.1186/1756-0500-7-651 |
_version_ | 1782337178093748224 |
---|---|
author | Moreno-Hagelsieb, Gabriel; Hudy-Yuffa, Brigitte |
author_facet | Moreno-Hagelsieb, Gabriel; Hudy-Yuffa, Brigitte |
author_sort | Moreno-Hagelsieb, Gabriel |
collection | PubMed |
description | BACKGROUND: As the number of genomes in public databases increases, it becomes more important to be able to quickly choose the best annotated genomes for further analyses in comparative genomics and evolution. A proxy for annotation quality is the estimation of overannotation by comparing annotated coding genes against the SwissProt database. NCBI’s BLAST (BLAST+) is the common software of choice to compare these sequences. Newer programs that run in a fraction of the time taken by BLAST+ might miss matches that BLAST+ would find. However, the results might still be useful to calculate overannotation. We thus decided to compare the overannotation estimates yielded by three such programs, UBLAST, LAST and the Blast-Like Alignment Tool (BLAT), and to test non-redundant versions of the SwissProt database to reduce the number of comparisons necessary. FINDINGS: We found that all three programs, UBLAST, LAST and BLAT, tend to produce overannotation estimates similar to those obtained with BLAST+. As would be expected, results varied the most from those obtained with BLAST+ in genomes with fewer proteins matching sequences in the SwissProt database. UBLAST was the fastest running algorithm, and showed the smallest variation from the results obtained using BLAST+. Reduced SwissProt databases did not seem to affect the results much, but the reduction in time was modest compared to that obtained from UBLAST, LAST, or BLAT. CONCLUSIONS: Although faster programs miss sequence matches otherwise found by NCBI’s BLAST, the overannotation estimates are very similar and thus these programs can be used with confidence for this task. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1756-0500-7-651) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4180129 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4180129 2014-10-01 Estimating overannotation across prokaryotic genomes using BLAST+, UBLAST, LAST and BLAT Moreno-Hagelsieb, Gabriel; Hudy-Yuffa, Brigitte BMC Res Notes Short Report BioMed Central 2014-09-16 /pmc/articles/PMC4180129/ /pubmed/25228073 http://dx.doi.org/10.1186/1756-0500-7-651 Text en © Moreno-Hagelsieb and Hudy-Yuffa; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Short Report Moreno-Hagelsieb, Gabriel Hudy-Yuffa, Brigitte Estimating overannotation across prokaryotic genomes using BLAST+, UBLAST, LAST and BLAT |
title | Estimating overannotation across prokaryotic genomes using BLAST+, UBLAST, LAST and BLAT |
title_sort | estimating overannotation across prokaryotic genomes using blast+, ublast, last and blat |
topic | Short Report |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4180129/ https://www.ncbi.nlm.nih.gov/pubmed/25228073 http://dx.doi.org/10.1186/1756-0500-7-651 |
work_keys_str_mv | AT morenohagelsiebgabriel estimatingoverannotationacrossprokaryoticgenomesusingblastublastlastandblat AT hudyyuffabrigitte estimatingoverannotationacrossprokaryoticgenomesusingblastublastlastandblat |