Cargando…

Expression-based species deconvolution and realignment removes misalignment error in multispecies single-cell data

BACKGROUND: Although single-cell RNA sequencing of xenograft samples has been widely used, no comprehensive bioinformatics pipeline is available for human and mouse mixed single-cell analyses. Considering the numerous homologous genes across the human and mouse genomes, misalignment errors should be...

Descripción completa

Detalles Bibliográficos
Autores principales: Choi, Jaeyong, Lee, Woochan, Yoon, Jung-Ki, Choi, Sun Mi, Lee, Chang-Hoon, Moon, Hyeong-Gon, Cho, Sukki, Chung, Jin-Haeng, Yang, Han-Kwang, Kim, Jong-Il
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2022
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9063264/
https://www.ncbi.nlm.nih.gov/pubmed/35501695
http://dx.doi.org/10.1186/s12859-022-04676-0
Descripción
Sumario:BACKGROUND: Although single-cell RNA sequencing of xenograft samples has been widely used, no comprehensive bioinformatics pipeline is available for human and mouse mixed single-cell analyses. Considering the numerous homologous genes across the human and mouse genomes, misalignment errors should be evaluated, and a new algorithm is required. We assessed the extents and effects of misalignment errors and exonic multi-mapping events when using human and mouse combined reference data and developed a new bioinformatics pipeline with expression-based species deconvolution to minimize errors. We also evaluated false-positive signals presumed to originate from ambient RNA of the other species and address the importance to computationally remove them. RESULT: Error when using combined reference account for an average of 0.78% of total reads, but such reads were concentrated to few genes that were greatly affected. Human and mouse mixed single-cell data, analyzed using our pipeline, clustered well with unmixed data and showed higher k-nearest-neighbor batch effect test and Local Inverse Simpson’s Index scores than those derived from Cell Ranger (10 × Genomics). We also applied our pipeline to multispecies multisample single-cell library containing breast cancer xenograft tissue and successfully identified all samples using genomic array and expression. Moreover, diverse cell types in the tumor microenvironment were well captured. CONCLUSION: We present our bioinformatics pipeline for mixed human and mouse single-cell data, which can also be applied to pooled libraries to obtain cost-effective single-cell data. We also address misalignment, multi-mapping error, and ambient RNA as a major consideration points when analyzing multispecies single-cell data. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-04676-0.