devCellPy is a machine learning-enabled pipeline for automated annotation of complex multilayered single-cell transcriptomic data

Bibliographic details
Main authors: Galdos, Francisco X., Xu, Sidra, Goodyer, William R., Duan, Lauren, Huang, Yuhsin V., Lee, Soah, Zhu, Han, Lee, Carissa, Wei, Nicholas, Lee, Daniel, Wu, Sean M.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9452519/
https://www.ncbi.nlm.nih.gov/pubmed/36071107
http://dx.doi.org/10.1038/s41467-022-33045-x
author Galdos, Francisco X.
Xu, Sidra
Goodyer, William R.
Duan, Lauren
Huang, Yuhsin V.
Lee, Soah
Zhu, Han
Lee, Carissa
Wei, Nicholas
Lee, Daniel
Wu, Sean M.
collection PubMed
description A major informatic challenge in single-cell RNA-sequencing analysis is the precise annotation of datasets where cells exhibit complex multilayered identities or transitory states. Here, we present devCellPy, a highly accurate and precise machine learning-enabled tool that enables automated prediction of cell types across complex annotation hierarchies. To demonstrate the power of devCellPy, we construct a murine cardiac developmental atlas from published datasets encompassing 104,199 cells from E6.5-E16.5 and train devCellPy to generate a cardiac prediction algorithm. Using this algorithm, we observe a high prediction accuracy (>90%) across multiple layers of annotation and across de novo murine developmental data. Furthermore, we conduct a cross-species prediction of cardiomyocyte subtypes from in vitro-derived human induced pluripotent stem cells and unexpectedly uncover a predominance of left ventricular (LV) identity that we confirmed by an LV-specific TBX5 lineage tracing system. Together, our results show devCellPy to be a useful tool for automated cell prediction across complex cellular hierarchies, species, and experimental systems.
format Online
Article
Text
id pubmed-9452519
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-9452519 2022-09-09 Nat Commun Article
Nature Publishing Group UK 2022-09-07 /pmc/articles/PMC9452519/ /pubmed/36071107 http://dx.doi.org/10.1038/s41467-022-33045-x Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/
Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title devCellPy is a machine learning-enabled pipeline for automated annotation of complex multilayered single-cell transcriptomic data
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9452519/
https://www.ncbi.nlm.nih.gov/pubmed/36071107
http://dx.doi.org/10.1038/s41467-022-33045-x