Information Scale Correction for Varying Length Amplicons Improves Eukaryotic Microbiome Data Integration
The integration and reanalysis of big data provide valuable insights into microbiome studies. However, the significant difference in information scale between amplicon data poses a key challenge in data analysis. Therefore, reducing batch effects is crucial to enhance data integration for large-scal...
Main authors: | Zhou, Tong; Zhao, Feng; Xu, Kuidong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10146031/ https://www.ncbi.nlm.nih.gov/pubmed/37110372 http://dx.doi.org/10.3390/microorganisms11040949 |
_version_ | 1785034480879140864 |
---|---|
author | Zhou, Tong; Zhao, Feng; Xu, Kuidong |
author_facet | Zhou, Tong; Zhao, Feng; Xu, Kuidong |
author_sort | Zhou, Tong |
collection | PubMed |
description | The integration and reanalysis of big data provide valuable insights into microbiome studies. However, the significant difference in information scale between amplicon datasets poses a key challenge in data analysis. Reducing batch effects is therefore crucial for integrating large-scale molecular ecology data. To achieve this, an information scale correction (ISC) step, which cuts amplicons of different lengths down to the same sub-region, is essential. In this study, we used a hidden Markov model (HMM) method to extract 11 different 18S rRNA gene V4 region amplicon datasets with 578 samples in total. The length of the amplicons ranged from 344 bp to 720 bp, depending on the primer position. By comparing the information scale correction of amplicons with varying lengths, we explored the extent to which the comparability between samples decreases with increasing amplicon length. Our method was shown to be more sensitive than V-Xtractor, the most popular tool for performing ISC. We found that near-scale amplicons exhibited no significant change after ISC, while larger-scale amplicons exhibited significant changes. After the ISC treatment, the similarity among the datasets improved, especially for long amplicons. Therefore, we recommend adding ISC processing when integrating big data, which is crucial for unlocking the full potential of microbial community studies and advancing our knowledge of microbial ecology. |
format | Online Article Text |
id | pubmed-10146031 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-10146031 2023-04-29 Information Scale Correction for Varying Length Amplicons Improves Eukaryotic Microbiome Data Integration Zhou, Tong Zhao, Feng Xu, Kuidong Microorganisms Article MDPI 2023-04-06 /pmc/articles/PMC10146031/ /pubmed/37110372 http://dx.doi.org/10.3390/microorganisms11040949 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Zhou, Tong Zhao, Feng Xu, Kuidong Information Scale Correction for Varying Length Amplicons Improves Eukaryotic Microbiome Data Integration |
title | Information Scale Correction for Varying Length Amplicons Improves Eukaryotic Microbiome Data Integration |
title_full | Information Scale Correction for Varying Length Amplicons Improves Eukaryotic Microbiome Data Integration |
title_fullStr | Information Scale Correction for Varying Length Amplicons Improves Eukaryotic Microbiome Data Integration |
title_full_unstemmed | Information Scale Correction for Varying Length Amplicons Improves Eukaryotic Microbiome Data Integration |
title_short | Information Scale Correction for Varying Length Amplicons Improves Eukaryotic Microbiome Data Integration |
title_sort | information scale correction for varying length amplicons improves eukaryotic microbiome data integration |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10146031/ https://www.ncbi.nlm.nih.gov/pubmed/37110372 http://dx.doi.org/10.3390/microorganisms11040949 |
work_keys_str_mv | AT zhoutong informationscalecorrectionforvaryinglengthampliconsimproveseukaryoticmicrobiomedataintegration AT zhaofeng informationscalecorrectionforvaryinglengthampliconsimproveseukaryoticmicrobiomedataintegration AT xukuidong informationscalecorrectionforvaryinglengthampliconsimproveseukaryoticmicrobiomedataintegration |
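
The ISC step described in the abstract, cutting amplicons of different lengths down to the same sub-region, can be illustrated with a minimal sketch. This is not the authors' method: the study locates the region with a hidden Markov model, whereas the sketch below substitutes exact matching of two flanking anchor motifs to keep the example short. The function name, the anchor motifs, and the toy sequences are all hypothetical.

```python
def trim_to_shared_region(seq, left_anchor, right_anchor):
    """Return the sub-region of seq spanning both anchors (anchors included).

    Simplified stand-in for HMM-based region detection: anchors are found
    by exact string matching. Returns None if either anchor is missing.
    """
    start = seq.find(left_anchor)
    if start == -1:
        return None
    # Look for the right anchor only downstream of the left anchor.
    end = seq.find(right_anchor, start + len(left_anchor))
    if end == -1:
        return None
    return seq[start:end + len(right_anchor)]


# Hypothetical toy amplicons: the long read covers the same region as the
# short read plus extra flanking bases from a different primer position.
amplicons = {
    "short_read": "GGTTACCAAGGTTTACGTACGTCCGGAA",
    "long_read": "TTTTTTGGTTACCAAGGTTTACGTACGTCCGGAACCCCCCAAAA",
    "off_target": "ACGTACGTACGT",
}

# Hypothetical anchor motifs flanking the shared sub-region.
LEFT, RIGHT = "CCAAGG", "CGTCCG"

trimmed = {
    name: trim_to_shared_region(seq, LEFT, RIGHT)
    for name, seq in amplicons.items()
}

for name, region in trimmed.items():
    print(name, region)
```

After trimming, the short and long reads reduce to the same sub-region and can be compared on an equal information scale, while a read lacking the anchors is flagged with `None` rather than silently kept. Real data would need tolerance for mismatches, which is one reason a profile-based approach such as an HMM is preferred over exact matching.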