Information Scale Correction for Varying Length Amplicons Improves Eukaryotic Microbiome Data Integration

The integration and reanalysis of big data provide valuable insights into microbiome studies. However, the significant difference in information scale between amplicon data poses a key challenge in data analysis. Therefore, reducing batch effects is crucial to enhance data integration for large-scal...

Bibliographic Details
Main Authors: Zhou, Tong; Zhao, Feng; Xu, Kuidong
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10146031/
https://www.ncbi.nlm.nih.gov/pubmed/37110372
http://dx.doi.org/10.3390/microorganisms11040949
author Zhou, Tong
Zhao, Feng
Xu, Kuidong
collection PubMed
description The integration and reanalysis of big data provide valuable insights into microbiome studies. However, the significant difference in information scale between amplicon data poses a key challenge in data analysis. Therefore, reducing batch effects is crucial to enhance data integration for large-scale molecular ecology data. To achieve this, the information scale correction (ISC) step, which involves cutting amplicons of different lengths down to the same sub-region, is essential. In this study, we used the hidden Markov model (HMM) method to extract 11 different 18S rRNA gene V4 region amplicon datasets with 578 samples in total. The length of the amplicons ranged from 344 bp to 720 bp, depending on the primer position. By comparing the information scale correction of amplicons with varying lengths, we explored the extent to which the comparability between samples decreases with increasing amplicon length. Our method was shown to be more sensitive than V-Xtractor, the most popular tool for performing ISC. We found that near-scale amplicons exhibited no significant change after ISC, while larger-scale amplicons exhibited significant changes. After the ISC treatment, the similarity among the datasets improved, especially for long amplicons. Therefore, we recommend adding ISC processing when integrating big data, which is crucial for unlocking the full potential of microbial community studies and advancing our knowledge of microbial ecology.
format Online
Article
Text
id pubmed-10146031
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10146031 2023-04-29 Information Scale Correction for Varying Length Amplicons Improves Eukaryotic Microbiome Data Integration Zhou, Tong Zhao, Feng Xu, Kuidong Microorganisms Article MDPI 2023-04-06 /pmc/articles/PMC10146031/ /pubmed/37110372 http://dx.doi.org/10.3390/microorganisms11040949 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Information Scale Correction for Varying Length Amplicons Improves Eukaryotic Microbiome Data Integration
topic Article