
A machine learning approach for the identification of key markers involved in brain development from single-cell transcriptomic data

BACKGROUND: The ability to sequence the transcriptomes of single cells using single-cell RNA-seq technologies presents a shift in the scientific paradigm in which scientists are now able to concurrently investigate the complex biology of a heterogeneous population of cells, one at a time....


Bibliographic Details
Main authors: Hu, Yongli, Hase, Takeshi, Li, Hui Peng, Prabhakar, Shyam, Kitano, Hiroaki, Ng, See Kiong, Ghosh, Samik, Wee, Lawrence Jin Kiat
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5260093/
https://www.ncbi.nlm.nih.gov/pubmed/28155657
http://dx.doi.org/10.1186/s12864-016-3317-7
_version_ 1782499341596884992
author Hu, Yongli
Hase, Takeshi
Li, Hui Peng
Prabhakar, Shyam
Kitano, Hiroaki
Ng, See Kiong
Ghosh, Samik
Wee, Lawrence Jin Kiat
author_facet Hu, Yongli
Hase, Takeshi
Li, Hui Peng
Prabhakar, Shyam
Kitano, Hiroaki
Ng, See Kiong
Ghosh, Samik
Wee, Lawrence Jin Kiat
author_sort Hu, Yongli
collection PubMed
description BACKGROUND: The ability to sequence the transcriptomes of single cells using single-cell RNA-seq technologies presents a shift in the scientific paradigm in which scientists are now able to concurrently investigate the complex biology of a heterogeneous population of cells, one at a time. However, to date, there has not been a suitable computational methodology for the analysis of such an intricate deluge of data, in particular techniques that aid the identification of the unique transcriptomic profile differences between the different cellular subtypes. In this paper, we describe a novel methodology for the analysis of single-cell RNA-seq data, obtained from neocortical cells and neural progenitor cells, using machine learning algorithms (Support Vector Machine (SVM) and Random Forest (RF)). RESULTS: Thirty-eight key transcripts were identified, using the SVM-based recursive feature elimination (SVM-RFE) method of feature selection, to best differentiate developing neocortical cells from neural progenitor cells in the SVM and RF classifiers built. Also, these genes possessed a higher discriminative power (enhanced prediction accuracy) as compared with commonly used statistical techniques or geneset-based approaches. Further downstream network reconstruction analysis was carried out to unravel hidden general regulatory networks, where novel interactions could be further validated in wet-lab experimentation and be useful candidates to be targeted for the treatment of neuronal developmental diseases. CONCLUSION: The novel approach reported here is able to identify transcripts, with reported neuronal involvement, which optimally differentiate neocortical cells and neural progenitor cells. It is believed to be extensible and applicable to other single-cell RNA-seq expression profiles, such as those from the study of cancer progression and treatment within a highly heterogeneous tumour.
format Online
Article
Text
id pubmed-5260093
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5260093 2017-01-26 A machine learning approach for the identification of key markers involved in brain development from single-cell transcriptomic data Hu, Yongli Hase, Takeshi Li, Hui Peng Prabhakar, Shyam Kitano, Hiroaki Ng, See Kiong Ghosh, Samik Wee, Lawrence Jin Kiat BMC Genomics Research BACKGROUND: The ability to sequence the transcriptomes of single cells using single-cell RNA-seq technologies presents a shift in the scientific paradigm in which scientists are now able to concurrently investigate the complex biology of a heterogeneous population of cells, one at a time. However, to date, there has not been a suitable computational methodology for the analysis of such an intricate deluge of data, in particular techniques that aid the identification of the unique transcriptomic profile differences between the different cellular subtypes. In this paper, we describe a novel methodology for the analysis of single-cell RNA-seq data, obtained from neocortical cells and neural progenitor cells, using machine learning algorithms (Support Vector Machine (SVM) and Random Forest (RF)). RESULTS: Thirty-eight key transcripts were identified, using the SVM-based recursive feature elimination (SVM-RFE) method of feature selection, to best differentiate developing neocortical cells from neural progenitor cells in the SVM and RF classifiers built. Also, these genes possessed a higher discriminative power (enhanced prediction accuracy) as compared with commonly used statistical techniques or geneset-based approaches. Further downstream network reconstruction analysis was carried out to unravel hidden general regulatory networks, where novel interactions could be further validated in wet-lab experimentation and be useful candidates to be targeted for the treatment of neuronal developmental diseases.
CONCLUSION: The novel approach reported here is able to identify transcripts, with reported neuronal involvement, which optimally differentiate neocortical cells and neural progenitor cells. It is believed to be extensible and applicable to other single-cell RNA-seq expression profiles, such as those from the study of cancer progression and treatment within a highly heterogeneous tumour. BioMed Central 2016-12-22 /pmc/articles/PMC5260093/ /pubmed/28155657 http://dx.doi.org/10.1186/s12864-016-3317-7 Text en © The Author(s). 2016 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Hu, Yongli
Hase, Takeshi
Li, Hui Peng
Prabhakar, Shyam
Kitano, Hiroaki
Ng, See Kiong
Ghosh, Samik
Wee, Lawrence Jin Kiat
A machine learning approach for the identification of key markers involved in brain development from single-cell transcriptomic data
title A machine learning approach for the identification of key markers involved in brain development from single-cell transcriptomic data
title_full A machine learning approach for the identification of key markers involved in brain development from single-cell transcriptomic data
title_fullStr A machine learning approach for the identification of key markers involved in brain development from single-cell transcriptomic data
title_full_unstemmed A machine learning approach for the identification of key markers involved in brain development from single-cell transcriptomic data
title_short A machine learning approach for the identification of key markers involved in brain development from single-cell transcriptomic data
title_sort machine learning approach for the identification of key markers involved in brain development from single-cell transcriptomic data
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5260093/
https://www.ncbi.nlm.nih.gov/pubmed/28155657
http://dx.doi.org/10.1186/s12864-016-3317-7