
Evaluation of classification in single cell atac-seq data with machine learning methods

BACKGROUND: Technological advances in single-cell Assay for Transposase-Accessible Chromatin using sequencing (scATAC-seq) have made it possible to profile thousands of single cells relatively easily and economically, and are rapidly advancing the understanding of the cellular composition of complex or...


Bibliographic Details
Main Authors:	Guo, Hongzhe, Yang, Zhongbo, Jiang, Tao, Liu, Shiqi, Wang, Yadong, Cui, Zhe
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2022
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9494763/ https://www.ncbi.nlm.nih.gov/pubmed/36131234 http://dx.doi.org/10.1186/s12859-022-04774-z

_version_	1784793864272347136
author	Guo, Hongzhe Yang, Zhongbo Jiang, Tao Liu, Shiqi Wang, Yadong Cui, Zhe
author_sort	Guo, Hongzhe
collection	PubMed
description	BACKGROUND: Technological advances in single-cell Assay for Transposase-Accessible Chromatin using sequencing (scATAC-seq) have made it possible to profile thousands of single cells relatively easily and economically, and are rapidly advancing the understanding of the cellular composition of complex organisms and tissues. Because the data structure and features of scATAC-seq are similar to those of scRNA-seq, it is natural to identify and classify cell types in scATAC-seq with traditional supervised machine learning methods, which have proved reliable on scRNA-seq datasets. RESULTS: In this study, we evaluated the classification performance of 6 well-known machine learning methods on scATAC-seq. A total of 4 public scATAC-seq datasets that vary in tissue, size and technology were used to evaluate the methods. We assessed the methods using a 5-fold cross-validation experiment, called the intra-dataset experiment, based on recall, precision and the percentage of correctly predicted cells. The results show that these methods performed well on some specific cell types in specific scATAC-seq datasets, while their overall performance is not as good as in scRNA-seq analysis. In addition, we evaluated the classification performance of these methods by training and predicting on different datasets generated from the same sample, called inter-dataset experiments, which may help assess the performance of these methods in more realistic scenarios. CONCLUSIONS: In both the intra-dataset and inter-dataset experiments, SVM and NMC overall outperformed the other methods across all 4 datasets. Thus, we recommend that researchers use SVM and NMC as the underlying classifier when developing an automatic cell-type classification method for scATAC-seq.
format	Online Article Text
id	pubmed-9494763
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-9494763 2022-09-23 Evaluation of classification in single cell atac-seq data with machine learning methods Guo, Hongzhe Yang, Zhongbo Jiang, Tao Liu, Shiqi Wang, Yadong Cui, Zhe BMC Bioinformatics Research BioMed Central 2022-09-21 /pmc/articles/PMC9494763/ /pubmed/36131234 http://dx.doi.org/10.1186/s12859-022-04774-z Text en © The Author(s) 2022. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	Evaluation of classification in single cell atac-seq data with machine learning methods
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9494763/ https://www.ncbi.nlm.nih.gov/pubmed/36131234 http://dx.doi.org/10.1186/s12859-022-04774-z
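
The record's description names a 5-fold cross-validation ("intra-dataset") protocol scored by precision, recall and the percentage of correctly predicted cells, but gives no implementation details. The following is a minimal sketch of that kind of evaluation, not the authors' pipeline. It assumes that NMC stands for a nearest mean (nearest centroid) classifier, and it runs on synthetic binary peak-by-cell data; every function name, the toy generator and its parameters are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def nearest_mean_fit(X, y):
    """Compute one centroid (feature-wise mean) per class label."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def nearest_mean_predict(centroids, row):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return min(centroids, key=lambda c: sq_dist(centroids[c], row))


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, k=5, seed=0):
    """Predict every cell exactly once, using a model trained on the other folds."""
    predictions = [None] * len(X)
    for held_out in k_fold_indices(len(X), k, seed):
        test = set(held_out)
        train = [i for i in range(len(X)) if i not in test]
        centroids = nearest_mean_fit([X[i] for i in train], [y[i] for i in train])
        for i in held_out:
            predictions[i] = nearest_mean_predict(centroids, X[i])
    return predictions


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall)} for each true label."""
    scores = {}
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = (precision, recall)
    return scores


def make_toy_cells(n_per_type=50, n_peaks=40, seed=1):
    """Synthetic binary accessibility vectors; NOT real scATAC-seq data."""
    rng = random.Random(seed)
    X, y = [], []
    for label, open_first_half in (("type_A", True), ("type_B", False)):
        for _ in range(n_per_type):
            row = []
            for j in range(n_peaks):
                in_open_half = (j < n_peaks // 2) == open_first_half
                p_open = 0.6 if in_open_half else 0.1
                row.append(1 if rng.random() < p_open else 0)
            X.append(row)
            y.append(label)
    return X, y


X, y = make_toy_cells()
y_pred = cross_validate(X, y, k=5)
accuracy = sum(t == p for t, p in zip(y, y_pred)) / len(y)
scores = per_class_scores(y, y_pred)
print("fraction correctly predicted:", accuracy)
for label, (precision, recall) in scores.items():
    print(label, "precision:", precision, "recall:", recall)
```

An inter-dataset variant in the same spirit would fit the centroids on one dataset and predict on another from the same sample, instead of splitting a single dataset into folds.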
