Beware to ignore the rare: how imputing zero-values can improve the quality of 16S rRNA gene studies results

BACKGROUND: 16S rRNA-gene sequencing is a valuable approach to characterize the taxonomic content of the whole bacterial population inhabiting a metabolic and spatial niche, providing an important opportunity to study bacteria and their role in many health and environmental mechanisms. The analysis...

Bibliographic Details
Main authors: Baruzzo, Giacomo; Patuzzi, Ilaria; Di Camillo, Barbara
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8822630/
https://www.ncbi.nlm.nih.gov/pubmed/35130833
http://dx.doi.org/10.1186/s12859-022-04587-0
_version_ 1784646634331701248
author Baruzzo, Giacomo
Patuzzi, Ilaria
Di Camillo, Barbara
author_facet Baruzzo, Giacomo
Patuzzi, Ilaria
Di Camillo, Barbara
author_sort Baruzzo, Giacomo
collection PubMed
description BACKGROUND: 16S rRNA-gene sequencing is a valuable approach to characterize the taxonomic content of the whole bacterial population inhabiting a metabolic and spatial niche, providing an important opportunity to study bacteria and their role in many health and environmental mechanisms. The analysis of data produced by amplicon sequencing, however, raises very specific methodological issues that need to be properly addressed to obtain reliable biological conclusions. Among these, 16S count data tend to be very sparse, with many null values reflecting species that are present but went unobserved due to multiplexing constraints. However, current data workflows do not include a step in which the information about unobserved species is recovered. RESULTS: In this work, we evaluate for the first time the effects of introducing a new preprocessing step, zero-imputation, into the 16S data workflow to recover this lost information. Due to the lack of published zero-imputation methods specifically designed for 16S count data, we considered a set of zero-imputation strategies available for other frameworks and benchmarked them using in silico 16S count data reflecting different experimental designs. Additionally, we assessed the effect of combining zero-imputation and normalization, i.e. the only preprocessing step in the current 16S workflow. Overall, we benchmarked 35 16S preprocessing pipelines, assessing their ability to handle data sparsity, identify species presence/absence, recover sample proportional abundance distributions, and improve typical downstream analyses such as computation of alpha and beta diversity indices and differential abundance analysis. CONCLUSIONS: The results clearly show that 16S data analysis greatly benefits from a properly performed zero-imputation step, although the choice of the right zero-imputation method plays a pivotal role. In addition, we identify a set of best-performing pipelines that could be a valuable guide for data analysts. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-04587-0.
format Online
Article
Text
id pubmed-8822630
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8822630 2022-02-08 Beware to ignore the rare: how imputing zero-values can improve the quality of 16S rRNA gene studies results Baruzzo, Giacomo Patuzzi, Ilaria Di Camillo, Barbara BMC Bioinformatics Research BACKGROUND: 16S rRNA-gene sequencing is a valuable approach to characterize the taxonomic content of the whole bacterial population inhabiting a metabolic and spatial niche, providing an important opportunity to study bacteria and their role in many health and environmental mechanisms. The analysis of data produced by amplicon sequencing, however, raises very specific methodological issues that need to be properly addressed to obtain reliable biological conclusions. Among these, 16S count data tend to be very sparse, with many null values reflecting species that are present but went unobserved due to multiplexing constraints. However, current data workflows do not include a step in which the information about unobserved species is recovered. RESULTS: In this work, we evaluate for the first time the effects of introducing a new preprocessing step, zero-imputation, into the 16S data workflow to recover this lost information. Due to the lack of published zero-imputation methods specifically designed for 16S count data, we considered a set of zero-imputation strategies available for other frameworks and benchmarked them using in silico 16S count data reflecting different experimental designs. Additionally, we assessed the effect of combining zero-imputation and normalization, i.e. the only preprocessing step in the current 16S workflow. Overall, we benchmarked 35 16S preprocessing pipelines, assessing their ability to handle data sparsity, identify species presence/absence, recover sample proportional abundance distributions, and improve typical downstream analyses such as computation of alpha and beta diversity indices and differential abundance analysis. CONCLUSIONS: The results clearly show that 16S data analysis greatly benefits from a properly performed zero-imputation step, although the choice of the right zero-imputation method plays a pivotal role. In addition, we identify a set of best-performing pipelines that could be a valuable guide for data analysts. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-04587-0. BioMed Central 2022-02-07 /pmc/articles/PMC8822630/ /pubmed/35130833 http://dx.doi.org/10.1186/s12859-022-04587-0 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research
Baruzzo, Giacomo
Patuzzi, Ilaria
Di Camillo, Barbara
Beware to ignore the rare: how imputing zero-values can improve the quality of 16S rRNA gene studies results
title Beware to ignore the rare: how imputing zero-values can improve the quality of 16S rRNA gene studies results
title_full Beware to ignore the rare: how imputing zero-values can improve the quality of 16S rRNA gene studies results
title_fullStr Beware to ignore the rare: how imputing zero-values can improve the quality of 16S rRNA gene studies results
title_full_unstemmed Beware to ignore the rare: how imputing zero-values can improve the quality of 16S rRNA gene studies results
title_short Beware to ignore the rare: how imputing zero-values can improve the quality of 16S rRNA gene studies results
title_sort beware to ignore the rare: how imputing zero-values can improve the quality of 16s rrna gene studies results
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8822630/
https://www.ncbi.nlm.nih.gov/pubmed/35130833
http://dx.doi.org/10.1186/s12859-022-04587-0
work_keys_str_mv AT baruzzogiacomo bewaretoignoretherarehowimputingzerovaluescanimprovethequalityof16srrnagenestudiesresults
AT patuzziilaria bewaretoignoretherarehowimputingzerovaluescanimprovethequalityof16srrnagenestudiesresults
AT dicamillobarbara bewaretoignoretherarehowimputingzerovaluescanimprovethequalityof16srrnagenestudiesresults