The Sum of Two Halves May Be Different from the Whole—Effects of Splitting Sequencing Samples Across Lanes
The advances in high-throughput sequencing (HTS) have enabled the characterisation of biological processes at an unprecedented level of detail; most hypotheses in molecular biology rely on analyses of HTS data. However, achieving increased robustness and reproducibility of results remains a main cha...
Main authors: | Williams, Eleanor C.; Chazarra-Gil, Ruben; Shahsavari, Arash; Mohorianu, Irina |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9777937/ https://www.ncbi.nlm.nih.gov/pubmed/36553532 http://dx.doi.org/10.3390/genes13122265 |
_version_ | 1784856230925172736 |
---|---|
author | Williams, Eleanor C.; Chazarra-Gil, Ruben; Shahsavari, Arash; Mohorianu, Irina |
author_facet | Williams, Eleanor C.; Chazarra-Gil, Ruben; Shahsavari, Arash; Mohorianu, Irina |
author_sort | Williams, Eleanor C. |
collection | PubMed |
description | The advances in high-throughput sequencing (HTS) have enabled the characterisation of biological processes at an unprecedented level of detail; most hypotheses in molecular biology rely on analyses of HTS data. However, achieving increased robustness and reproducibility of results remains a main challenge. Although variability in results may be introduced at various stages, e.g., alignment, summarisation or detection of differential expression, one source of variability was systematically omitted: the sequencing design, which propagates through analyses and may introduce an additional layer of technical variation. We illustrate qualitative and quantitative differences arising from splitting samples across lanes on bulk and single-cell sequencing. For bulk mRNAseq data, we focus on differential expression and enrichment analyses; for bulk ChIPseq data, we investigate the effect on peak calling and the peaks’ properties. At the single-cell level, we concentrate on identifying cell subpopulations. We rely on markers used for assigning cell identities; both smartSeq and 10× data are presented. The observed reduction in the number of unique sequenced fragments limits the level of detail on which the different prediction approaches depend. Furthermore, the sequencing stochasticity adds in a weighting bias corroborated with variable sequencing depths and (yet unexplained) sequencing bias. Subsequently, we observe an overall reduction in sequencing complexity and a distortion in the biological signal across technologies, experimental contexts, organisms and tissues. |
format | Online Article Text |
id | pubmed-9777937 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-97779372022-12-23 The Sum of Two Halves May Be Different from the Whole—Effects of Splitting Sequencing Samples Across Lanes Williams, Eleanor C. Chazarra-Gil, Ruben Shahsavari, Arash Mohorianu, Irina Genes (Basel) Article The advances in high-throughput sequencing (HTS) have enabled the characterisation of biological processes at an unprecedented level of detail; most hypotheses in molecular biology rely on analyses of HTS data. However, achieving increased robustness and reproducibility of results remains a main challenge. Although variability in results may be introduced at various stages, e.g., alignment, summarisation or detection of differential expression, one source of variability was systematically omitted: the sequencing design, which propagates through analyses and may introduce an additional layer of technical variation. We illustrate qualitative and quantitative differences arising from splitting samples across lanes on bulk and single-cell sequencing. For bulk mRNAseq data, we focus on differential expression and enrichment analyses; for bulk ChIPseq data, we investigate the effect on peak calling and the peaks’ properties. At the single-cell level, we concentrate on identifying cell subpopulations. We rely on markers used for assigning cell identities; both smartSeq and 10× data are presented. The observed reduction in the number of unique sequenced fragments limits the level of detail on which the different prediction approaches depend. Furthermore, the sequencing stochasticity adds in a weighting bias corroborated with variable sequencing depths and (yet unexplained) sequencing bias. Subsequently, we observe an overall reduction in sequencing complexity and a distortion in the biological signal across technologies, experimental contexts, organisms and tissues. MDPI 2022-12-01 /pmc/articles/PMC9777937/ /pubmed/36553532 http://dx.doi.org/10.3390/genes13122265 Text en © 2022 by the authors. 
https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
title | The Sum of Two Halves May Be Different from the Whole—Effects of Splitting Sequencing Samples Across Lanes |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9777937/ https://www.ncbi.nlm.nih.gov/pubmed/36553532 http://dx.doi.org/10.3390/genes13122265 |