Machine Learning of Single-Cell Transcriptome Highly Identifies mRNA Signature by Comparing F-Score Selection with DGE Analysis

Bibliographic Details
Main authors: Liang, Pengfei; Yang, Wuritu; Chen, Xing; Long, Chunshen; Zheng, Lei; Li, Hanshuang; Zuo, Yongchun
Format: Online Article Text
Language: English
Published: American Society of Gene & Cell Therapy 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7066034/
https://www.ncbi.nlm.nih.gov/pubmed/32169803
http://dx.doi.org/10.1016/j.omtn.2020.02.004
author Liang, Pengfei
Yang, Wuritu
Chen, Xing
Long, Chunshen
Zheng, Lei
Li, Hanshuang
Zuo, Yongchun
author_sort Liang, Pengfei
collection PubMed
description Human preimplantation development is a complex process involving dramatic changes in transcriptional architecture. To better understand its spatiotemporal progression, it is essential to identify key genes. Although single-cell RNA sequencing (RNA-seq) techniques can provide detailed clustering signatures, identifying decisive factors remains difficult, and the experiments are costly and time-consuming. Computational methods for identifying effective developmental signature genes are therefore highly desirable. In this study, we developed a predictor called EmPredictor to identify the developmental stages of human preimplantation embryogenesis. First, we compared F-score feature selection with differential gene expression (DGE) analysis to find stage-specific signatures. Then, by training a support vector machine (SVM), we evaluated four types of signature subsets. A feature subset of 1,881 genes selected by the F-score algorithm gave the best predictive performance, reaching the highest accuracy of 93.3% in cross-validation. Functional enrichment analysis further showed that the gene set selected by feature selection was associated with more development-related pathways and cell fate determination biomarkers. This suggests that the F-score algorithm should be preferred for detecting key genes in multi-period data on mammalian early development.
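The F-score ranking step mentioned in the description can be illustrated with a minimal sketch. This is not the authors' EmPredictor code. It assumes one common multi-class form of the F-score (scatter of the stage means around the overall mean, divided by the summed within-stage sample variances), and the gene names, stage labels, expression values, and the helpers f_score and rank_genes are all hypothetical. In a full pipeline, the top-ranked genes would then be passed to a classifier such as an SVM and assessed by cross-validation.

```python
from statistics import mean


def f_score(values, labels):
    """F-score of one gene under an assumed multi-class definition:
    between-stage scatter of the stage means divided by the sum of the
    within-stage sample variances."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)

    overall = mean(values)
    between = sum((mean(g) - overall) ** 2 for g in groups.values())
    # Sample variance per stage; stages with a single cell contribute nothing.
    within = sum(
        sum((v - mean(g)) ** 2 for v in g) / (len(g) - 1)
        for g in groups.values()
        if len(g) > 1
    )
    if within == 0:
        # No within-stage spread: perfectly separating or uninformative gene.
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_genes(expression, labels, top_k):
    """Return the top_k gene names ordered by decreasing F-score."""
    ranked = sorted(
        expression,
        key=lambda gene: f_score(expression[gene], labels),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical toy data: six cells, two per developmental stage.
labels = ["stage_1", "stage_1", "stage_2", "stage_2", "stage_3", "stage_3"]
expression = {
    "GENE_A": [1.0, 1.2, 5.0, 5.2, 9.0, 9.2],  # clearly stage-specific
    "GENE_B": [3.0, 7.0, 4.0, 6.0, 5.0, 5.0],  # noisy, same mean in every stage
    "GENE_C": [2.0, 2.0, 2.0, 2.0, 2.0, 2.0],  # constant across all cells
}

top_genes = rank_genes(expression, labels, top_k=1)
print(top_genes)
```

A gene whose expression differs between stages but is stable within each stage receives a high score, which is why GENE_A ranks above the other two toy genes here.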
format Online
Article
Text
id pubmed-7066034
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher American Society of Gene & Cell Therapy
record_format MEDLINE/PubMed
spelling pubmed-7066034 2020-03-16 Mol Ther Nucleic Acids Article
American Society of Gene & Cell Therapy 2020-02-13 /pmc/articles/PMC7066034/ /pubmed/32169803 http://dx.doi.org/10.1016/j.omtn.2020.02.004 Text en © 2020 The Author(s). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title Machine Learning of Single-Cell Transcriptome Highly Identifies mRNA Signature by Comparing F-Score Selection with DGE Analysis
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7066034/
https://www.ncbi.nlm.nih.gov/pubmed/32169803
http://dx.doi.org/10.1016/j.omtn.2020.02.004