GLOSSary: the GLobal Ocean 16S subunit web accessible resource
BACKGROUND: Environmental metagenomics is a challenging approach that is exponentially spreading in the scientific community to investigate taxonomic diversity and possible functions of the biological components. The massive amount of sequence data produced, often endowed with rich environmental met...
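Main Authors: | Tangherlini, M., Miralto, M., Colantuono, C., Sangiovanni, M., Dell’Anno, A., Corinaldesi, C., Danovaro, R., Chiusano, M. L. |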
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2018 |
Subjects: | Software |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6266928/ https://www.ncbi.nlm.nih.gov/pubmed/30497362 http://dx.doi.org/10.1186/s12859-018-2423-8 |
_version_ | 1783375949039927296 |
---|---|
author | Tangherlini, M. Miralto, M. Colantuono, C. Sangiovanni, M. Dell’Anno, A. Corinaldesi, C. Danovaro, R. Chiusano, M. L. |
author_facet | Tangherlini, M. Miralto, M. Colantuono, C. Sangiovanni, M. Dell’Anno, A. Corinaldesi, C. Danovaro, R. Chiusano, M. L. |
author_sort | Tangherlini, M. |
collection | PubMed |
description | BACKGROUND: Environmental metagenomics is a challenging approach that is exponentially spreading in the scientific community to investigate taxonomic diversity and possible functions of the biological components. The massive amount of sequence data produced, often endowed with rich environmental metadata, needs suitable computational tools to fully explore the embedded information. Bioinformatics plays a key role in providing methodologies to manage, process and mine molecular data, integrated with environmental metagenomics collections. One such relevant example is represented by the Tara Ocean Project. RESULTS: We considered the Tara 16S miTAGs released by the consortium, representing raw sequences from a shotgun metagenomics approach with similarities to 16S rRNA genes. We generated assembled 16S rDNA sequences, which were classified according to their lengths, the possible presence of chimeric reads, the putative taxonomic affiliation. The dataset was included in GLOSSary (the GLobal Ocean 16S Subunit web accessible resource), a bioinformatics platform to organize environmental metagenomics data. The aims of this work were: i) to present alternative computational approaches to manage challenging metagenomics data; ii) to set up user friendly web-based platforms to allow the integration of environmental metagenomics sequences and of the associated metadata; iii) to implement an appropriate bioinformatics platform supporting the analysis of 16S rDNA sequences exploiting reference datasets, such as the SILVA database. We organized the data in a next-generation NoSQL “schema-less” database, allowing flexible organization of large amounts of data and supporting native geospatial queries. A web interface was developed to permit an interactive exploration and a visual geographical localization of the data, either raw miTAG reads or 16S contigs, from our processing pipeline. Information on unassembled sequences is also available. 
The taxonomic affiliations of contigs and miTAGs, and the spatial distribution of the sampling sites and their associated sequence libraries, as they are contained in the Tara metadata, can be explored by a query interface, which allows both textual and visual investigations. In addition, all the sequence data were made available for a dedicated BLAST-based web application alongside the SILVA collection. CONCLUSIONS: GLOSSary provides an expandable bioinformatics environment, able to support the scientific community in current and forthcoming environmental metagenomics analyses. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2423-8) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6266928 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-62669282018-12-05 GLOSSary: the GLobal Ocean 16S subunit web accessible resource Tangherlini, M. Miralto, M. Colantuono, C. Sangiovanni, M. Dell’Anno, A. Corinaldesi, C. Danovaro, R. Chiusano, M. L. BMC Bioinformatics Software BioMed Central 2018-11-30 /pmc/articles/PMC6266928/ /pubmed/30497362 http://dx.doi.org/10.1186/s12859-018-2423-8 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Software Tangherlini, M. Miralto, M. Colantuono, C. Sangiovanni, M. Dell’Anno, A. Corinaldesi, C. Danovaro, R. Chiusano, M. L. GLOSSary: the GLobal Ocean 16S subunit web accessible resource |
title | GLOSSary: the GLobal Ocean 16S subunit web accessible resource |
title_full | GLOSSary: the GLobal Ocean 16S subunit web accessible resource |
title_fullStr | GLOSSary: the GLobal Ocean 16S subunit web accessible resource |
title_full_unstemmed | GLOSSary: the GLobal Ocean 16S subunit web accessible resource |
title_short | GLOSSary: the GLobal Ocean 16S subunit web accessible resource |
title_sort | glossary: the global ocean 16s subunit web accessible resource |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6266928/ https://www.ncbi.nlm.nih.gov/pubmed/30497362 http://dx.doi.org/10.1186/s12859-018-2423-8 |
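
The abstract mentions a schema-less NoSQL store with native geospatial queries over sampling sites. The record does not name the database engine or its query syntax, so the following is only a minimal Python sketch of the idea, using plain dictionaries as stand-ins for schema-less documents and a haversine distance as a stand-in for a native proximity query. The station names, coordinates, field names, and the `stations_near` helper are all hypothetical and are not taken from GLOSSary or the Tara metadata.

```python
import math

# Hypothetical schema-less documents: each record may carry different fields,
# as in a NoSQL store. Values are invented for illustration only.
SAMPLES = [
    {"station": "S1", "lat": 40.8, "lon": 14.2, "depth_m": 5},
    {"station": "S2", "lat": 43.6, "lon": 13.5, "taxa": ["Proteobacteria"]},
    {"station": "S3", "lat": -33.9, "lon": 18.4},
]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def stations_near(samples, lat, lon, max_km):
    """Return the documents within max_km of (lat, lon), nearest first."""
    scored = [(haversine_km(lat, lon, s["lat"], s["lon"]), s) for s in samples]
    scored.sort(key=lambda pair: pair[0])  # sort on distance only, never on the dicts
    return [doc for dist, doc in scored if dist <= max_km]


if __name__ == "__main__":
    for doc in stations_near(SAMPLES, 41.0, 14.0, 500):
        print(doc["station"], doc)
```

A database with native geospatial support would typically answer the same question through a spatial index rather than a linear scan; the scan here only illustrates the shape of the query and of the returned documents.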