DFAST and DAGA: web-based integrated genome annotation tools and resources

Quality assurance and correct taxonomic affiliation of data submitted to public sequence databases have been an everlasting problem. The DDBJ Fast Annotation and Submission Tool (DFAST) is a newly developed genome annotation pipeline with quality and taxonomy assessment tools. To enable annotation o...

Bibliographic Details
Main authors: TANIZAWA, Yasuhiro, FUJISAWA, Takatomo, KAMINUMA, Eli, NAKAMURA, Yasukazu, ARITA, Masanori
Format: Online Article Text
Language: English
Published: BMFH Press 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5107635/
https://www.ncbi.nlm.nih.gov/pubmed/27867804
http://dx.doi.org/10.12938/bmfh.16-003
_version_ 1782467219712638976
author TANIZAWA, Yasuhiro
FUJISAWA, Takatomo
KAMINUMA, Eli
NAKAMURA, Yasukazu
ARITA, Masanori
author_facet TANIZAWA, Yasuhiro
FUJISAWA, Takatomo
KAMINUMA, Eli
NAKAMURA, Yasukazu
ARITA, Masanori
author_sort TANIZAWA, Yasuhiro
collection PubMed
description Quality assurance and correct taxonomic affiliation of data submitted to public sequence databases have been an everlasting problem. The DDBJ Fast Annotation and Submission Tool (DFAST) is a newly developed genome annotation pipeline with quality and taxonomy assessment tools. To enable annotation of ready-to-submit quality, we also constructed curated reference protein databases tailored for lactic acid bacteria. DFAST was developed so that all the procedures required for DDBJ submission could be done seamlessly online. The online workspace would be especially useful for users not familiar with bioinformatics skills. In addition, we have developed a genome repository, DFAST Archive of Genome Annotation (DAGA), which currently includes 1,421 genomes covering 179 species and 18 subspecies of two genera, Lactobacillus and Pediococcus, obtained from both DDBJ/ENA/GenBank and Sequence Read Archive (SRA). All the genomes deposited in DAGA were annotated consistently and assessed using DFAST. To assess the taxonomic position based on genomic sequence information, we used the average nucleotide identity (ANI), which showed high discriminative power to determine whether two given genomes belong to the same species. We corrected mislabeled or misidentified genomes in the public database and deposited the curated information in DAGA. The repository will improve the accessibility and reusability of genome resources for lactic acid bacteria. By exploiting the data deposited in DAGA, we found intraspecific subgroups in Lactobacillus gasseri and Lactobacillus jensenii, whose variation between subgroups is larger than the well-accepted ANI threshold of 95% to differentiate species. DFAST and DAGA are freely accessible at https://dfast.nig.ac.jp.
format Online
Article
Text
id pubmed-5107635
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BMFH Press
record_format MEDLINE/PubMed
spelling pubmed-51076352016-11-18 DFAST and DAGA: web-based integrated genome annotation tools and resources TANIZAWA, Yasuhiro FUJISAWA, Takatomo KAMINUMA, Eli NAKAMURA, Yasukazu ARITA, Masanori Biosci Microbiota Food Health Full Paper Quality assurance and correct taxonomic affiliation of data submitted to public sequence databases have been an everlasting problem. The DDBJ Fast Annotation and Submission Tool (DFAST) is a newly developed genome annotation pipeline with quality and taxonomy assessment tools. To enable annotation of ready-to-submit quality, we also constructed curated reference protein databases tailored for lactic acid bacteria. DFAST was developed so that all the procedures required for DDBJ submission could be done seamlessly online. The online workspace would be especially useful for users not familiar with bioinformatics skills. In addition, we have developed a genome repository, DFAST Archive of Genome Annotation (DAGA), which currently includes 1,421 genomes covering 179 species and 18 subspecies of two genera, Lactobacillus and Pediococcus, obtained from both DDBJ/ENA/GenBank and Sequence Read Archive (SRA). All the genomes deposited in DAGA were annotated consistently and assessed using DFAST. To assess the taxonomic position based on genomic sequence information, we used the average nucleotide identity (ANI), which showed high discriminative power to determine whether two given genomes belong to the same species. We corrected mislabeled or misidentified genomes in the public database and deposited the curated information in DAGA. The repository will improve the accessibility and reusability of genome resources for lactic acid bacteria. By exploiting the data deposited in DAGA, we found intraspecific subgroups in Lactobacillus gasseri and Lactobacillus jensenii, whose variation between subgroups is larger than the well-accepted ANI threshold of 95% to differentiate species. DFAST and DAGA are freely accessible at https://dfast.nig.ac.jp. 
BMFH Press 2016-07-14 2016 /pmc/articles/PMC5107635/ /pubmed/27867804 http://dx.doi.org/10.12938/bmfh.16-003 Text en BMFH Press http://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial No Derivatives (by-nc-nd) License.
spellingShingle Full Paper
TANIZAWA, Yasuhiro
FUJISAWA, Takatomo
KAMINUMA, Eli
NAKAMURA, Yasukazu
ARITA, Masanori
DFAST and DAGA: web-based integrated genome annotation tools and resources
title DFAST and DAGA: web-based integrated genome annotation tools and resources
title_full DFAST and DAGA: web-based integrated genome annotation tools and resources
title_fullStr DFAST and DAGA: web-based integrated genome annotation tools and resources
title_full_unstemmed DFAST and DAGA: web-based integrated genome annotation tools and resources
title_short DFAST and DAGA: web-based integrated genome annotation tools and resources
title_sort dfast and daga: web-based integrated genome annotation tools and resources
topic Full Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5107635/
https://www.ncbi.nlm.nih.gov/pubmed/27867804
http://dx.doi.org/10.12938/bmfh.16-003
work_keys_str_mv AT tanizawayasuhiro dfastanddagawebbasedintegratedgenomeannotationtoolsandresources
AT fujisawatakatomo dfastanddagawebbasedintegratedgenomeannotationtoolsandresources
AT kaminumaeli dfastanddagawebbasedintegratedgenomeannotationtoolsandresources
AT nakamurayasukazu dfastanddagawebbasedintegratedgenomeannotationtoolsandresources
AT aritamasanori dfastanddagawebbasedintegratedgenomeannotationtoolsandresources