
NG-Tax 2.0: A Semantic Framework for High-Throughput Amplicon Analysis

NG-Tax 2.0 is a semantic framework for FAIR high-throughput analysis and classification of marker gene amplicon sequences including bacterial and archaeal 16S ribosomal RNA (rRNA), eukaryotic 18S rRNA and ribosomal intergenic transcribed spacer sequences. It can directly use single or merged reads,...


Bibliographic Details
Main Authors: Poncheewin, Wasin; Hermes, Gerben D. A.; van Dam, Jesse C. J.; Koehorst, Jasper J.; Smidt, Hauke; Schaap, Peter J.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6989550/
https://www.ncbi.nlm.nih.gov/pubmed/32117417
http://dx.doi.org/10.3389/fgene.2019.01366
_version_ 1783492424848375808
author Poncheewin, Wasin
Hermes, Gerben D. A.
van Dam, Jesse C. J.
Koehorst, Jasper J.
Smidt, Hauke
Schaap, Peter J.
author_facet Poncheewin, Wasin
Hermes, Gerben D. A.
van Dam, Jesse C. J.
Koehorst, Jasper J.
Smidt, Hauke
Schaap, Peter J.
author_sort Poncheewin, Wasin
collection PubMed
description NG-Tax 2.0 is a semantic framework for FAIR high-throughput analysis and classification of marker gene amplicon sequences including bacterial and archaeal 16S ribosomal RNA (rRNA), eukaryotic 18S rRNA and ribosomal intergenic transcribed spacer sequences. It can directly use single or merged reads, paired-end reads and unmerged paired-end reads from long range fragments as input to generate de novo amplicon sequence variants (ASVs). Using the RDF data model, ASVs can be automatically stored in a graph database as objects that link ASV sequences with the full data-wise and element-wise provenance, thereby achieving the level of interoperability required to utilize such data to its full potential. The graph database can be directly queried, allowing for comparative analyses across thousands of samples, and is connected with an interactive Rshiny toolbox for analysis and visualization of (meta)data. Additionally, NG-Tax 2.0 exports an extended BIOM 1.0 (JSON) file as a starting point for further analyses by other means. The extended BIOM file adds new attribute types that record the command arguments used, the sequences of the ASVs formed and classification confidence scores, while remaining backwards compatible. The performance of NG-Tax 2.0 was compared with DADA2, using the plugin in the QIIME 2 analysis pipeline. Fourteen 16S rRNA gene amplicon mock community samples were obtained from the literature and evaluated. Precision of NG-Tax 2.0 was significantly higher, with an average of 0.95 vs 0.58 for QIIME2-DADA2, while recall was comparable, with averages of 0.85 and 0.77, respectively. NG-Tax 2.0 is written in Java. The code, the ontology, a Galaxy platform implementation, the analysis toolbox, tutorials and example SPARQL queries are freely available at http://wurssb.gitlab.io/ngtax under the MIT License.
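The precision and recall figures quoted in the description can be illustrated with a small worked example. The sketch below assumes the usual set-based definitions (precision = TP / (TP + FP), recall = TP / (TP + FN)) applied to the taxa expected in a mock community versus the taxa a pipeline reports. The record does not state how the authors scored their mock samples, so the function name `precision_recall`, the genus lists and the resulting numbers are hypothetical and purely illustrative.

```python
def precision_recall(expected, observed):
    """Set-based precision and recall for one mock community sample.

    expected: taxa known to be in the mock community (assumed ground truth)
    observed: taxa reported by the pipeline under evaluation
    """
    expected, observed = set(expected), set(observed)
    true_pos = len(expected & observed)    # reported and really present
    false_pos = len(observed - expected)   # reported but not present
    false_neg = len(expected - observed)   # present but not reported
    precision = true_pos / (true_pos + false_pos) if observed else 0.0
    recall = true_pos / (true_pos + false_neg) if expected else 0.0
    return precision, recall


# Hypothetical mock community and pipeline output (not data from the article).
mock = {"Bacteroides", "Escherichia", "Lactobacillus", "Clostridium"}
detected = {"Bacteroides", "Escherichia", "Lactobacillus",
            "Pseudomonas", "Staphylococcus"}

precision, recall = precision_recall(mock, detected)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case three of the five reported genera are correct and one expected genus is missed, so a pipeline that reports fewer spurious taxa raises precision without necessarily changing recall, which is the pattern the description reports for NG-Tax 2.0 relative to QIIME2-DADA2.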
format Online
Article
Text
id pubmed-6989550
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-6989550 2020-02-28 NG-Tax 2.0: A Semantic Framework for High-Throughput Amplicon Analysis Poncheewin, Wasin Hermes, Gerben D. A. van Dam, Jesse C. J. Koehorst, Jasper J. Smidt, Hauke Schaap, Peter J. Front Genet Genetics Frontiers Media S.A. 2020-01-23 /pmc/articles/PMC6989550/ /pubmed/32117417 http://dx.doi.org/10.3389/fgene.2019.01366 Text en Copyright © 2020 Poncheewin, Hermes, van Dam, Koehorst, Smidt and Schaap http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Genetics
Poncheewin, Wasin
Hermes, Gerben D. A.
van Dam, Jesse C. J.
Koehorst, Jasper J.
Smidt, Hauke
Schaap, Peter J.
NG-Tax 2.0: A Semantic Framework for High-Throughput Amplicon Analysis
title NG-Tax 2.0: A Semantic Framework for High-Throughput Amplicon Analysis
title_full NG-Tax 2.0: A Semantic Framework for High-Throughput Amplicon Analysis
title_fullStr NG-Tax 2.0: A Semantic Framework for High-Throughput Amplicon Analysis
title_full_unstemmed NG-Tax 2.0: A Semantic Framework for High-Throughput Amplicon Analysis
title_short NG-Tax 2.0: A Semantic Framework for High-Throughput Amplicon Analysis
title_sort ng-tax 2.0: a semantic framework for high-throughput amplicon analysis
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6989550/
https://www.ncbi.nlm.nih.gov/pubmed/32117417
http://dx.doi.org/10.3389/fgene.2019.01366
work_keys_str_mv AT poncheewinwasin ngtax20asemanticframeworkforhighthroughputampliconanalysis
AT hermesgerbenda ngtax20asemanticframeworkforhighthroughputampliconanalysis
AT vandamjessecj ngtax20asemanticframeworkforhighthroughputampliconanalysis
AT koehorstjasperj ngtax20asemanticframeworkforhighthroughputampliconanalysis
AT smidthauke ngtax20asemanticframeworkforhighthroughputampliconanalysis
AT schaappeterj ngtax20asemanticframeworkforhighthroughputampliconanalysis