
A new pipeline for structural characterization and classification of RNA-Seq microbiome data

BACKGROUND: High-throughput sequencing enables the analysis of the composition of numerous biological systems, such as microbial communities. The identification of dependencies within these systems requires the analysis and assimilation of the underlying interaction patterns between all the variable...


Bibliographic Details
Main Authors:	Racedo, Sebastian; Portnoy, Ivan; Vélez, Jorge I.; San-Juan-Vergara, Homero; Sanjuan, Marco; Zurek, Eduardo
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2021
Subjects:	Methodology
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8268467/ https://www.ncbi.nlm.nih.gov/pubmed/34243809 http://dx.doi.org/10.1186/s13040-021-00266-7

author	Racedo, Sebastian; Portnoy, Ivan; Vélez, Jorge I.; San-Juan-Vergara, Homero; Sanjuan, Marco; Zurek, Eduardo
author_sort	Racedo, Sebastian
collection	PubMed
description	BACKGROUND: High-throughput sequencing enables the analysis of the composition of numerous biological systems, such as microbial communities. The identification of dependencies within these systems requires the analysis and assimilation of the underlying interaction patterns between all the variables that make up that system. However, this task poses a challenge when considering the compositional nature of the data coming from DNA-sequencing experiments because traditional interaction metrics (e.g., correlation) produce unreliable results when analyzing relative fractions instead of absolute abundances. The compositionality-associated challenges extend to the classification task, as it usually involves the characterization of the interactions between the principal descriptive variables of the datasets. The classification of new samples/patients into binary categories corresponding to dissimilar biological settings or phenotypes (e.g., control and cases) could help researchers in the development of treatments/drugs. RESULTS: Here, we develop and exemplify a new approach, applicable to compositional data, for the classification of new samples into two groups with different biological settings. We propose a new metric to characterize and quantify the overall correlation structure deviation between these groups and a technique for dimensionality reduction to facilitate graphical representation. We conduct simulation experiments with synthetic data to assess the proposed method’s classification accuracy. Moreover, we illustrate the performance of the proposed approach using Operational Taxonomic Unit (OTU) count tables obtained through 16S rRNA gene sequencing data from two microbiota experiments. We also compare our method’s performance with that of two state-of-the-art methods. CONCLUSIONS: Simulation experiments show that our method achieves a classification accuracy equal to or greater than 98% when using synthetic data. Finally, our method outperforms the other classification methods with real datasets from gene sequencing experiments.
format	Online Article Text
id	pubmed-8268467
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-8268467 2021-07-09 BioData Min Methodology BioMed Central 2021-07-09 /pmc/articles/PMC8268467/ /pubmed/34243809 http://dx.doi.org/10.1186/s13040-021-00266-7 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	A new pipeline for structural characterization and classification of RNA-Seq microbiome data
topic	Methodology
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8268467/ https://www.ncbi.nlm.nih.gov/pubmed/34243809 http://dx.doi.org/10.1186/s13040-021-00266-7
