Adjustments to the reference dataset design improve cell type label transfer
Main authors: | Mölbert, Carla; Haghverdi, Laleh |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A., 2023 |
Subjects: | Bioinformatics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10114588/ https://www.ncbi.nlm.nih.gov/pubmed/37091908 http://dx.doi.org/10.3389/fbinf.2023.1150099 |
_version_ | 1785028043709874176 |
---|---|
author | Mölbert, Carla; Haghverdi, Laleh |
collection | PubMed |
description | The transfer of cell type labels from pre-annotated (reference) to newly collected data is an important task in single-cell data analysis. As the number of publicly available annotated datasets that can be used as references, as well as the number of computational methods for cell type label transfer, is constantly growing, rationales are needed to understand and decide which reference design and which method to use for a particular query dataset. Using detailed data visualisations and interpretable statistical assessments, we benchmark a set of popular cell type annotation methods, test their performance on different cell types and study the effects of the design of reference data (e.g., cell sampling criteria, inclusion of multiple datasets in one reference, gene set selection) on the reliability of predictions. Our results highlight the need for further improvements in label transfer methods, as well as preparation of high-quality pre-annotated reference data of adequate sampling from all cell types of interest, for more reliable annotation of new datasets. |
format | Online Article Text |
id | pubmed-10114588 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-10114588 2023-04-20 Front Bioinform Bioinformatics Frontiers Media S.A. 2023-04-05 /pmc/articles/PMC10114588/ /pubmed/37091908 http://dx.doi.org/10.3389/fbinf.2023.1150099 Text en Copyright © 2023 Mölbert and Haghverdi. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
topic | Bioinformatics |