Cell type matching across species using protein embeddings and transfer learning

MOTIVATION: Knowing the relation between cell types is crucial for translating experimental results from mice to humans. Establishing cell type matches, however, is hindered by the biological differences between the species. A substantial amount of evolutionary information between genes that could b...

Bibliographic Details
Main Authors: Biharie, Kirti; Michielsen, Lieke; Reinders, Marcel J T; Mahfouz, Ahmed
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects: Regulatory and Functional Genomics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10311290/
https://www.ncbi.nlm.nih.gov/pubmed/37387141
http://dx.doi.org/10.1093/bioinformatics/btad248
author Biharie, Kirti
Michielsen, Lieke
Reinders, Marcel J T
Mahfouz, Ahmed
author_sort Biharie, Kirti
collection PubMed
description MOTIVATION: Knowing the relation between cell types is crucial for translating experimental results from mice to humans. Establishing cell type matches, however, is hindered by the biological differences between the species. A substantial amount of evolutionary information between genes that could be used to align the species is discarded by most of the current methods since they only use one-to-one orthologous genes. Some methods try to retain the information by explicitly including the relation between genes, however, not without caveats. RESULTS: In this work, we present a model to transfer and align cell types in cross-species analysis (TACTiCS). First, TACTiCS uses a natural language processing model to match genes using their protein sequences. Next, TACTiCS employs a neural network to classify cell types within a species. Afterward, TACTiCS uses transfer learning to propagate cell type labels between species. We applied TACTiCS on scRNA-seq data of the primary motor cortex of human, mouse, and marmoset. Our model can accurately match and align cell types on these datasets. Moreover, our model outperforms Seurat and the state-of-the-art method SAMap. Finally, we show that our gene matching method results in better cell type matches than BLAST in our model. AVAILABILITY AND IMPLEMENTATION: The implementation is available on GitHub (https://github.com/kbiharie/TACTiCS). The preprocessed datasets and trained models can be downloaded from Zenodo (https://doi.org/10.5281/zenodo.7582460).
format Online
Article
Text
id pubmed-10311290
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10311290 2023-07-01 Bioinformatics Regulatory and Functional Genomics Oxford University Press 2023-06-30 /pmc/articles/PMC10311290/ /pubmed/37387141 http://dx.doi.org/10.1093/bioinformatics/btad248 Text en © The Author(s) 2023. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Cell type matching across species using protein embeddings and transfer learning
topic Regulatory and Functional Genomics