
POIBM: batch correction of heterogeneous RNA-seq datasets through latent sample matching


Bibliographic Details
Main authors: Holmström, Susanna; Hautaniemi, Sampsa; Häkkinen, Antti
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9048693/
https://www.ncbi.nlm.nih.gov/pubmed/35199138
http://dx.doi.org/10.1093/bioinformatics/btac124
author Holmström, Susanna
Hautaniemi, Sampsa
Häkkinen, Antti
collection PubMed
description MOTIVATION: RNA sequencing and other high-throughput technologies are essential in understanding complex diseases, such as cancers, but are susceptible to technical factors manifesting as patterns in the measurements. These batch patterns hinder the discovery of biologically relevant patterns. Unbiased batch effect correction in heterogeneous populations currently requires special experimental designs or phenotypic labels, which are not readily available for patient samples in existing datasets. RESULTS: We present POIBM, an RNA-seq batch correction method, which learns virtual reference samples directly from the data. We use a breast cancer cell line dataset to show that POIBM exceeds or matches the performance of previous methods, while being blind to the phenotypes. Further, we analyze The Cancer Genome Atlas RNA-seq data to show that batch effects plague many cancer types; POIBM effectively discovers the true replicates in stomach adenocarcinoma; and integrating the corrected data in endometrial carcinoma improves cancer subtyping. AVAILABILITY AND IMPLEMENTATION: https://bitbucket.org/anthakki/poibm/ (archived at https://doi.org/10.5281/zenodo.6122436). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-9048693
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9048693 2022-04-29 POIBM: batch correction of heterogeneous RNA-seq datasets through latent sample matching Holmström, Susanna Hautaniemi, Sampsa Häkkinen, Antti Bioinformatics Original Papers Oxford University Press 2022-02-23 /pmc/articles/PMC9048693/ /pubmed/35199138 http://dx.doi.org/10.1093/bioinformatics/btac124 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title POIBM: batch correction of heterogeneous RNA-seq datasets through latent sample matching
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9048693/
https://www.ncbi.nlm.nih.gov/pubmed/35199138
http://dx.doi.org/10.1093/bioinformatics/btac124