Adjustments to the reference dataset design improve cell type label transfer

Bibliographic Details
Main authors: Mölbert, Carla; Haghverdi, Laleh
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects: Bioinformatics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10114588/
https://www.ncbi.nlm.nih.gov/pubmed/37091908
http://dx.doi.org/10.3389/fbinf.2023.1150099
author Mölbert, Carla
Haghverdi, Laleh
collection PubMed
description The transfer of cell type labels from pre-annotated (reference) data to newly collected data is an important task in single-cell data analysis. As both the number of publicly available annotated datasets that can be used as references and the number of computational methods for cell type label transfer are constantly growing, rationales are needed for understanding and deciding which reference design and which method to use for a particular query dataset. Using detailed data visualisations and interpretable statistical assessments, we benchmark a set of popular cell type annotation methods, test their performance on different cell types, and study the effects of the design of reference data (e.g., cell sampling criteria, inclusion of multiple datasets in one reference, gene set selection) on the reliability of predictions. Our results highlight the need for further improvements in label transfer methods, as well as for the preparation of high-quality pre-annotated reference data with adequate sampling from all cell types of interest, for more reliable annotation of new datasets.
format Online
Article
Text
id pubmed-10114588
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-10114588 2023-04-20 Adjustments to the reference dataset design improve cell type label transfer Mölbert, Carla Haghverdi, Laleh Front Bioinform Bioinformatics The transfer of cell type labels from pre-annotated (reference) data to newly collected data is an important task in single-cell data analysis. As both the number of publicly available annotated datasets that can be used as references and the number of computational methods for cell type label transfer are constantly growing, rationales are needed for understanding and deciding which reference design and which method to use for a particular query dataset. Using detailed data visualisations and interpretable statistical assessments, we benchmark a set of popular cell type annotation methods, test their performance on different cell types, and study the effects of the design of reference data (e.g., cell sampling criteria, inclusion of multiple datasets in one reference, gene set selection) on the reliability of predictions. Our results highlight the need for further improvements in label transfer methods, as well as for the preparation of high-quality pre-annotated reference data with adequate sampling from all cell types of interest, for more reliable annotation of new datasets. Frontiers Media S.A. 2023-04-05 /pmc/articles/PMC10114588/ /pubmed/37091908 http://dx.doi.org/10.3389/fbinf.2023.1150099 Text en Copyright © 2023 Mölbert and Haghverdi. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Adjustments to the reference dataset design improve cell type label transfer
topic Bioinformatics