Comprehensive evaluation of methods for differential expression analysis of metatranscriptomics data
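Main authors: | Cho, Hunyong; Qu, Yixiang; Liu, Chuwen; Tang, Boyang; Lyu, Ruiqi; Lin, Bridget M; Roach, Jeffrey; Azcarate-Peril, M Andrea; Aguiar Ribeiro, Apoena; Love, Michael I; Divaris, Kimon; Wu, Di |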
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10516371/ https://www.ncbi.nlm.nih.gov/pubmed/37738402 http://dx.doi.org/10.1093/bib/bbad279 |
author | Cho, Hunyong; Qu, Yixiang; Liu, Chuwen; Tang, Boyang; Lyu, Ruiqi; Lin, Bridget M; Roach, Jeffrey; Azcarate-Peril, M Andrea; Aguiar Ribeiro, Apoena; Love, Michael I; Divaris, Kimon; Wu, Di |
collection | PubMed |
description | Understanding the function of the human microbiome is important, but the development of statistical methods specifically for microbial gene expression (i.e. metatranscriptomics) is in its infancy. Many currently employed differential expression analysis methods were designed for other data types and have not been evaluated in metatranscriptomics settings. To address this gap, we undertook a comprehensive evaluation and benchmarking of 10 differential analysis methods for metatranscriptomics data. We used a combination of real and simulated data to evaluate the performance (i.e. type I error, false discovery rate and sensitivity) of the following methods: log-normal (LN), logistic-beta (LB), MAST, DESeq2, metagenomeSeq, ANCOM-BC, LEfSe, ALDEx2, Kruskal–Wallis and two-part Kruskal–Wallis. The simulation was informed by supragingival biofilm microbiome data from 300 preschool-age children enrolled in a study of childhood dental disease (early childhood caries, ECC), whereas validations were sought in two additional datasets from the ECC study and an inflammatory bowel disease study. The LB test showed the highest sensitivity in both small and large samples while reasonably controlling type I error. In contrast, MAST was hampered by inflated type I error. Applying the LN and LB tests to the ECC study data, we found that genes C8PHV7 and C8PEV7, harbored by the lactate-producing Campylobacter gracilis, had the strongest association with childhood dental disease. This comprehensive model evaluation offers practical guidance for selecting appropriate methods for rigorous analyses of differential expression in metatranscriptomics. Selecting an optimal method increases the possibility of detecting true signals while minimizing the chance of claiming false ones. |
id | pubmed-10516371 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
record | pubmed-10516371, 2023-09-23 |
journal | Brief Bioinform |
published online | 2023-08-09 |
rights | © The Author(s) 2023. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
topic | Problem Solving Protocol |