
Transfer learning compensates limited data, batch effects and technological heterogeneity in single-cell sequencing

Tremendous advances in next-generation sequencing technology have enabled the accumulation of large amounts of omics data in various research areas over the past decade. However, study limitations due to small sample sizes, especially in rare disease clinical research, technological heterogeneity an...


Bibliographic Details
Main authors: Park, Youngjun, Hauschild, Anne-Christin, Heider, Dominik
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8598306/
https://www.ncbi.nlm.nih.gov/pubmed/34805988
http://dx.doi.org/10.1093/nargab/lqab104
author Park, Youngjun
Hauschild, Anne-Christin
Heider, Dominik
collection PubMed
description Tremendous advances in next-generation sequencing technology have enabled the accumulation of large amounts of omics data in various research areas over the past decade. However, study limitations due to small sample sizes, especially in rare disease clinical research, technological heterogeneity and batch effects limit the applicability of traditional statistics and machine learning analysis. Here, we present a meta-transfer learning approach to transfer knowledge from big data and reduce the search space in data with small sample sizes. Few-shot learning algorithms integrate meta-learning to overcome data scarcity and data heterogeneity by transferring molecular pattern recognition models from datasets of unrelated domains. We explore few-shot learning models with large scale public dataset, TCGA (The Cancer Genome Atlas) and GTEx dataset, and demonstrate their potential as pre-training dataset in other molecular pattern recognition tasks. Our results show that meta-transfer learning is very effective for datasets with a limited sample size. Furthermore, we show that our approach can transfer knowledge across technological heterogeneity, for example, from bulk cell to single-cell data. Our approach can overcome study size constraints, batch effects and technical limitations in analyzing single-cell data by leveraging existing bulk-cell sequencing data.
format Online
Article
Text
id pubmed-8598306
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8598306 2021-11-18 NAR Genom Bioinform Standard Article Oxford University Press 2021-11-12 /pmc/articles/PMC8598306/ /pubmed/34805988 http://dx.doi.org/10.1093/nargab/lqab104 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Transfer learning compensates limited data, batch effects and technological heterogeneity in single-cell sequencing
topic Standard Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8598306/
https://www.ncbi.nlm.nih.gov/pubmed/34805988
http://dx.doi.org/10.1093/nargab/lqab104