Discordant Genome Assemblies Drastically Alter the Interpretation of Single-Cell RNA Sequencing Data Which Can Be Mitigated by a Novel Integration Method

Advances in sequencing and assembly technology have led to the creation of genome assemblies for a wide variety of non-model organisms. The rapid production and proliferation of updated, novel assembly versions can create vexing problems for researchers when multiple-genome assembly versions are ava...

Bibliographic Details
Main Authors: Potts, Helen G., Lemieux, Madeleine E., Rice, Edward S., Warren, Wesley, Choudhury, Robin P., Mommersteeg, Mathilda T. M.
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8870202/
https://www.ncbi.nlm.nih.gov/pubmed/35203259
http://dx.doi.org/10.3390/cells11040608
_version_ 1784656681981968384
author Potts, Helen G.
Lemieux, Madeleine E.
Rice, Edward S.
Warren, Wesley
Choudhury, Robin P.
Mommersteeg, Mathilda T. M.
author_facet Potts, Helen G.
Lemieux, Madeleine E.
Rice, Edward S.
Warren, Wesley
Choudhury, Robin P.
Mommersteeg, Mathilda T. M.
author_sort Potts, Helen G.
collection PubMed
description Advances in sequencing and assembly technology have led to the creation of genome assemblies for a wide variety of non-model organisms. The rapid production and proliferation of updated, novel assembly versions can create vexing problems for researchers when multiple-genome assembly versions are available at once, requiring researchers to work with more than one reference genome. Multiple-genome assemblies are especially problematic for researchers studying the genetic makeup of individual cells, as single-cell RNA sequencing (scRNAseq) requires sequenced reads to be mapped and aligned to a single reference genome. Using the Astyanax mexicanus, this study highlights how the interpretation of a single-cell dataset from the same sample changes when aligned to its two different available genome assemblies. We found that the number of cells and expressed genes detected were drastically different when aligning to the different assemblies. When the genome assemblies were used in isolation with their respective annotations, cell-type identification was confounded, as some classic cell-type markers were assembly-specific, whilst other genes showed differential patterns of expression between the two assemblies. To overcome the problems posed by multiple-genome assemblies, we propose that researchers align to each available assembly and then integrate the resultant datasets to produce a final dataset in which all genome alignments can be used simultaneously. We found that this approach increased the accuracy of cell-type identification and maximised the amount of data that could be extracted from our single-cell sample by capturing all possible cells and transcripts. As scRNAseq becomes more widely available, it is imperative that the single-cell community is aware of how genome assembly alignment can alter single-cell data and their interpretation, especially when reviewing studies on non-model organisms.
format Online
Article
Text
id pubmed-8870202
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8870202 2022-02-25 Discordant Genome Assemblies Drastically Alter the Interpretation of Single-Cell RNA Sequencing Data Which Can Be Mitigated by a Novel Integration Method Potts, Helen G. Lemieux, Madeleine E. Rice, Edward S. Warren, Wesley Choudhury, Robin P. Mommersteeg, Mathilda T. M. Cells Article Advances in sequencing and assembly technology have led to the creation of genome assemblies for a wide variety of non-model organisms. The rapid production and proliferation of updated, novel assembly versions can create vexing problems for researchers when multiple-genome assembly versions are available at once, requiring researchers to work with more than one reference genome. Multiple-genome assemblies are especially problematic for researchers studying the genetic makeup of individual cells, as single-cell RNA sequencing (scRNAseq) requires sequenced reads to be mapped and aligned to a single reference genome. Using the Astyanax mexicanus, this study highlights how the interpretation of a single-cell dataset from the same sample changes when aligned to its two different available genome assemblies. We found that the number of cells and expressed genes detected were drastically different when aligning to the different assemblies. When the genome assemblies were used in isolation with their respective annotations, cell-type identification was confounded, as some classic cell-type markers were assembly-specific, whilst other genes showed differential patterns of expression between the two assemblies. To overcome the problems posed by multiple-genome assemblies, we propose that researchers align to each available assembly and then integrate the resultant datasets to produce a final dataset in which all genome alignments can be used simultaneously.
We found that this approach increased the accuracy of cell-type identification and maximised the amount of data that could be extracted from our single-cell sample by capturing all possible cells and transcripts. As scRNAseq becomes more widely available, it is imperative that the single-cell community is aware of how genome assembly alignment can alter single-cell data and their interpretation, especially when reviewing studies on non-model organisms. MDPI 2022-02-10 /pmc/articles/PMC8870202/ /pubmed/35203259 http://dx.doi.org/10.3390/cells11040608 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Potts, Helen G.
Lemieux, Madeleine E.
Rice, Edward S.
Warren, Wesley
Choudhury, Robin P.
Mommersteeg, Mathilda T. M.
Discordant Genome Assemblies Drastically Alter the Interpretation of Single-Cell RNA Sequencing Data Which Can Be Mitigated by a Novel Integration Method
title Discordant Genome Assemblies Drastically Alter the Interpretation of Single-Cell RNA Sequencing Data Which Can Be Mitigated by a Novel Integration Method
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8870202/
https://www.ncbi.nlm.nih.gov/pubmed/35203259
http://dx.doi.org/10.3390/cells11040608