GLOSSary: the GLobal Ocean 16S subunit web accessible resource


Bibliographic Details
Main authors: Tangherlini, M., Miralto, M., Colantuono, C., Sangiovanni, M., Dell’ Anno, A., Corinaldesi, C., Danovaro, R., Chiusano, M. L.
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6266928/
https://www.ncbi.nlm.nih.gov/pubmed/30497362
http://dx.doi.org/10.1186/s12859-018-2423-8
author Tangherlini, M.
Miralto, M.
Colantuono, C.
Sangiovanni, M.
Dell’ Anno, A.
Corinaldesi, C.
Danovaro, R.
Chiusano, M. L.
collection PubMed
description BACKGROUND: Environmental metagenomics is a challenging approach that is spreading rapidly in the scientific community as a way to investigate the taxonomic diversity and possible functions of biological components. The massive amount of sequence data produced, often accompanied by rich environmental metadata, requires suitable computational tools to fully explore the embedded information. Bioinformatics plays a key role in providing methodologies to manage, process and mine molecular data integrated with environmental metagenomics collections. One relevant example is the Tara Oceans project.
RESULTS: We considered the Tara 16S miTAGs released by the consortium, which are raw sequences from a shotgun metagenomics approach that show similarity to 16S rRNA genes. We generated assembled 16S rDNA sequences, which were classified according to their length, the possible presence of chimeric reads, and their putative taxonomic affiliation. The dataset was included in GLOSSary (the GLobal Ocean 16S Subunit web accessible resource), a bioinformatics platform for organizing environmental metagenomics data. The aims of this work were: i) to present alternative computational approaches to manage challenging metagenomics data; ii) to set up user-friendly web-based platforms that allow the integration of environmental metagenomics sequences and their associated metadata; iii) to implement an appropriate bioinformatics platform supporting the analysis of 16S rDNA sequences by exploiting reference datasets such as the SILVA database. We organized the data in a next-generation NoSQL “schema-less” database, which allows flexible organization of large amounts of data and supports native geospatial queries. A web interface was developed to permit interactive exploration and visual geographical localization of the data from our processing pipeline, either raw miTAG reads or 16S contigs. Information on unassembled sequences is also available. The taxonomic affiliations of contigs and miTAGs, and the spatial distribution of the sampling sites and their associated sequence libraries, as contained in the Tara metadata, can be explored through a query interface that allows both textual and visual investigations. In addition, all the sequence data were made available to a dedicated BLAST-based web application alongside the SILVA collection.
CONCLUSIONS: GLOSSary provides an expandable bioinformatics environment able to support the scientific community in current and forthcoming environmental metagenomics analyses.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2423-8) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6266928
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6266928 2018-12-05 GLOSSary: the GLobal Ocean 16S subunit web accessible resource BMC Bioinformatics Software BioMed Central 2018-11-30 /pmc/articles/PMC6266928/ /pubmed/30497362 http://dx.doi.org/10.1186/s12859-018-2423-8 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title GLOSSary: the GLobal Ocean 16S subunit web accessible resource
topic Software