Transfer learning compensates limited data, batch effects and technological heterogeneity in single-cell sequencing
Main authors: | Park, Youngjun; Hauschild, Anne-Christin; Heider, Dominik |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8598306/ https://www.ncbi.nlm.nih.gov/pubmed/34805988 http://dx.doi.org/10.1093/nargab/lqab104 |
_version_ | 1784600794599784448 |
---|---|
author | Park, Youngjun; Hauschild, Anne-Christin; Heider, Dominik |
collection | PubMed |
description | Tremendous advances in next-generation sequencing technology have enabled the accumulation of large amounts of omics data in various research areas over the past decade. However, small sample sizes (especially in rare disease clinical research), technological heterogeneity and batch effects limit the applicability of traditional statistical and machine learning analyses. Here, we present a meta-transfer learning approach to transfer knowledge from big data and reduce the search space in data with small sample sizes. Few-shot learning algorithms integrate meta-learning to overcome data scarcity and data heterogeneity by transferring molecular pattern recognition models from datasets of unrelated domains. We explore few-shot learning models with the large-scale public datasets TCGA (The Cancer Genome Atlas) and GTEx, and demonstrate their potential as pre-training datasets for other molecular pattern recognition tasks. Our results show that meta-transfer learning is very effective for datasets with a limited sample size. Furthermore, we show that our approach can transfer knowledge across technological heterogeneity, for example, from bulk-cell to single-cell data. Our approach can overcome study size constraints, batch effects and technical limitations in analyzing single-cell data by leveraging existing bulk-cell sequencing data. |
format | Online Article Text |
id | pubmed-8598306 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-8598306 2021-11-18. NAR Genom Bioinform, Standard Article. Oxford University Press, 2021-11-12. © The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | Transfer learning compensates limited data, batch effects and technological heterogeneity in single-cell sequencing |
topic | Standard Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8598306/ https://www.ncbi.nlm.nih.gov/pubmed/34805988 http://dx.doi.org/10.1093/nargab/lqab104 |