DAIRYdb: a manually curated reference database for improved taxonomy annotation of 16S rRNA gene sequences from dairy products

BACKGROUND: Read assignment to taxonomic units is a key step in microbiome analysis pipelines. To date, accurate taxonomy annotation of 16S reads, particularly at species rank, is still challenging due to the short size of read sequences and differently curated classification databases. The close p...

Bibliographic Details
Main authors: Meola, Marco; Rifa, Etienne; Shani, Noam; Delbès, Céline; Berthoud, Hélène; Chassard, Christophe
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6615214/
https://www.ncbi.nlm.nih.gov/pubmed/31286860
http://dx.doi.org/10.1186/s12864-019-5914-8
_version_ 1783433323304976384
author Meola, Marco
Rifa, Etienne
Shani, Noam
Delbès, Céline
Berthoud, Hélène
Chassard, Christophe
collection PubMed
description BACKGROUND: Read assignment to taxonomic units is a key step in microbiome analysis pipelines. To date, accurate taxonomy annotation of 16S reads, particularly at species rank, is still challenging due to the short size of read sequences and differently curated classification databases. The close phylogenetic relationship between species encountered in dairy products, however, makes it crucial to annotate species accurately to achieve sufficient phylogenetic resolution for further downstream ecological studies or for food diagnostics. Curated databases dedicated to the environment of interest are expected to improve the accuracy and resolution of taxonomy annotation. RESULTS: We provide a manually curated database composed of 10,290 full-length 16S rRNA gene sequences from prokaryotes tailored for dairy product analysis (https://github.com/marcomeola/DAIRYdb). The performance of the DAIRYdb was compared with the universal databases Silva, LTP, RDP and Greengenes. The DAIRYdb significantly outperformed all other databases independently of the classification algorithm by enabling more accurate taxonomy annotation down to the species rank. The DAIRYdb accurately annotates over 90% of the sequences of either single or paired hypervariable regions automatically. The manually curated DAIRYdb strongly improves taxonomic annotation accuracy for microbiome studies in dairy environments. The DAIRYdb is a practical solution that enables automation of this key step, thus facilitating the routine application of NGS microbiome analyses for microbial ecology studies and diagnostics in dairy products. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12864-019-5914-8) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6615214
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6615214 2019-07-18 BMC Genomics Research Article BioMed Central 2019-07-08 /pmc/articles/PMC6615214/ /pubmed/31286860 http://dx.doi.org/10.1186/s12864-019-5914-8 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title DAIRYdb: a manually curated reference database for improved taxonomy annotation of 16S rRNA gene sequences from dairy products
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6615214/
https://www.ncbi.nlm.nih.gov/pubmed/31286860
http://dx.doi.org/10.1186/s12864-019-5914-8