The Sum of Two Halves May Be Different from the Whole—Effects of Splitting Sequencing Samples Across Lanes

Bibliographic Details
Main authors: Williams, Eleanor C., Chazarra-Gil, Ruben, Shahsavari, Arash, Mohorianu, Irina
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9777937/
https://www.ncbi.nlm.nih.gov/pubmed/36553532
http://dx.doi.org/10.3390/genes13122265
author Williams, Eleanor C.
Chazarra-Gil, Ruben
Shahsavari, Arash
Mohorianu, Irina
collection PubMed
description The advances in high-throughput sequencing (HTS) have enabled the characterisation of biological processes at an unprecedented level of detail; most hypotheses in molecular biology rely on analyses of HTS data. However, achieving increased robustness and reproducibility of results remains a main challenge. Although variability in results may be introduced at various stages, e.g., alignment, summarisation or detection of differential expression, one source of variability was systematically omitted: the sequencing design, which propagates through analyses and may introduce an additional layer of technical variation. We illustrate qualitative and quantitative differences arising from splitting samples across lanes on bulk and single-cell sequencing. For bulk mRNAseq data, we focus on differential expression and enrichment analyses; for bulk ChIPseq data, we investigate the effect on peak calling and the peaks’ properties. At the single-cell level, we concentrate on identifying cell subpopulations. We rely on markers used for assigning cell identities; both smartSeq and 10× data are presented. The observed reduction in the number of unique sequenced fragments limits the level of detail on which the different prediction approaches depend. Furthermore, the sequencing stochasticity adds in a weighting bias corroborated with variable sequencing depths and (yet unexplained) sequencing bias. Subsequently, we observe an overall reduction in sequencing complexity and a distortion in the biological signal across technologies, experimental contexts, organisms and tissues.
format Online
Article
Text
id pubmed-9777937
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9777937 2022-12-23 The Sum of Two Halves May Be Different from the Whole—Effects of Splitting Sequencing Samples Across Lanes Williams, Eleanor C. Chazarra-Gil, Ruben Shahsavari, Arash Mohorianu, Irina Genes (Basel) Article MDPI 2022-12-01 /pmc/articles/PMC9777937/ /pubmed/36553532 http://dx.doi.org/10.3390/genes13122265 Text en © 2022 by the authors.
https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title The Sum of Two Halves May Be Different from the Whole—Effects of Splitting Sequencing Samples Across Lanes
topic Article