
Comprehensive evaluation of methods for differential expression analysis of metatranscriptomics data

Understanding the function of the human microbiome is important, but the development of statistical methods specifically for microbial gene expression (i.e. metatranscriptomics) is in its infancy. Many currently employed differential expression analysis methods have been designed for different data types and have not been evaluated in metatranscriptomics settings.

Bibliographic Details
Main authors: Cho, Hunyong; Qu, Yixiang; Liu, Chuwen; Tang, Boyang; Lyu, Ruiqi; Lin, Bridget M; Roach, Jeffrey; Azcarate-Peril, M Andrea; Aguiar Ribeiro, Apoena; Love, Michael I; Divaris, Kimon; Wu, Di
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects: Problem Solving Protocol
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10516371/
https://www.ncbi.nlm.nih.gov/pubmed/37738402
http://dx.doi.org/10.1093/bib/bbad279
author Cho, Hunyong
Qu, Yixiang
Liu, Chuwen
Tang, Boyang
Lyu, Ruiqi
Lin, Bridget M
Roach, Jeffrey
Azcarate-Peril, M Andrea
Aguiar Ribeiro, Apoena
Love, Michael I
Divaris, Kimon
Wu, Di
collection PubMed
description Understanding the function of the human microbiome is important, but the development of statistical methods specifically for microbial gene expression (i.e. metatranscriptomics) is in its infancy. Many currently employed differential expression analysis methods have been designed for different data types and have not been evaluated in metatranscriptomics settings. To address this gap, we undertook a comprehensive evaluation and benchmarking of 10 differential analysis methods for metatranscriptomics data. We used a combination of real and simulated data to evaluate the performance (i.e. type I error, false discovery rate and sensitivity) of the following methods: log-normal (LN), logistic-beta (LB), MAST, DESeq2, metagenomeSeq, ANCOM-BC, LEfSe, ALDEx2, Kruskal–Wallis and two-part Kruskal–Wallis. The simulation was informed by supragingival biofilm microbiome data from 300 preschool-age children enrolled in a study of childhood dental disease (early childhood caries, ECC), whereas validations were sought in two additional datasets from the ECC study and an inflammatory bowel disease study. The LB test showed the highest sensitivity in both small and large samples and reasonably controlled type I error. In contrast, MAST was hampered by inflated type I error. Upon application of the LN and LB tests in the ECC study, we found that genes C8PHV7 and C8PEV7, harbored by the lactate-producing Campylobacter gracilis, had the strongest association with childhood dental disease. This comprehensive model evaluation offers practical guidance for selection of appropriate methods for rigorous analyses of differential expression in metatranscriptomics. Selection of an optimal method increases the possibility of detecting true signals while minimizing the chance of claiming false ones.
format Online
Article
Text
id pubmed-10516371
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10516371 2023-09-23
Brief Bioinform
Problem Solving Protocol
Oxford University Press 2023-08-09
/pmc/articles/PMC10516371/
/pubmed/37738402
http://dx.doi.org/10.1093/bib/bbad279
Text en
© The Author(s) 2023. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Comprehensive evaluation of methods for differential expression analysis of metatranscriptomics data
topic Problem Solving Protocol
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10516371/
https://www.ncbi.nlm.nih.gov/pubmed/37738402
http://dx.doi.org/10.1093/bib/bbad279