Characterizing Human Cell Types and Tissue Origin Using the Benford Law

Bibliographic Details
Main authors: Morag, Sne; Salmon-Divon, Mali
Format: Online, Article, Text
Language: English
Published: MDPI 2019
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6770594/
https://www.ncbi.nlm.nih.gov/pubmed/31470662
http://dx.doi.org/10.3390/cells8091004
collection PubMed
description Processing massive transcriptomic datasets in a meaningful manner requires novel, possibly interdisciplinary, approaches. One principle that can address this challenge is the Benford law (BL), which posits that the occurrence probability of a leading digit in a large numerical dataset decreases as its value increases. Here, we analyzed large single-cell and bulk RNA-seq datasets to test whether cell types and tissue origins can be differentiated based on the adherence of specific genes to the BL. Then, we used the Benford adherence scores of these genes as inputs to machine-learning algorithms and tested their separation accuracy. We found that genes selected based on their first-digit distributions can distinguish between cell types and tissue origins. Moreover, despite the simplicity of this novel feature-selection method, its separation accuracy is higher than that of the mean-expression level approach and is similar to that of the differential expression approach. Thus, the BL can be used to obtain biological insights from massive amounts of numerical genomics data—a capability that could be utilized in various biomedical applications, e.g., to resolve samples of unknown primary origin, identify possible sample contaminations, and provide insights into the molecular basis of cancer subtypes.
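Under the Benford law described above, leading digit d occurs with probability P(d) = log10(1 + 1/d), so 1 leads about 30.1% of values and 9 only about 4.6%. The sketch below is a minimal Python illustration under stated assumptions, not the authors' pipeline. It scores how closely one gene's positive expression values follow that distribution using the mean absolute deviation (MAD), one common Benford conformity measure; the record does not say which adherence score the authors used. It then ranks genes by how much that score differs between two groups of cells. The function names `benford_mad` and `select_genes`, the dict-of-lists input layout, and the choice to skip zero values (which have no leading digit) are all hypothetical.

```python
import math
from collections import Counter

# Expected Benford probability for each leading digit d in 1..9:
# P(d) = log10(1 + 1/d); the nine values sum to 1.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x):
    """Return the first significant digit of a positive number."""
    if not x > 0:
        raise ValueError("leading digit is defined here only for positive values")
    # Scientific notation always starts with the first significant digit,
    # e.g. 0.0031 -> '3.10000000000000e-03'.
    return int(format(x, ".14e")[0])


def first_digit_distribution(values):
    """Observed proportion of each leading digit 1..9, skipping non-positive values."""
    digits = [leading_digit(v) for v in values if v > 0]
    if not digits:
        raise ValueError("no positive values to analyze")
    counts = Counter(digits)
    n = len(digits)
    return {d: counts.get(d, 0) / n for d in range(1, 10)}


def benford_mad(values):
    """Mean absolute deviation from the Benford distribution (lower = closer adherence).

    Hypothetical adherence score; the original study may use a different one.
    """
    observed = first_digit_distribution(values)
    return sum(abs(observed[d] - BENFORD[d]) for d in range(1, 10)) / 9


def select_genes(expr_a, expr_b, k):
    """Hypothetical feature selection: rank genes by the gap in Benford adherence.

    expr_a, expr_b: dict mapping gene name -> list of expression values for the
    cells of group A and group B. Returns the k genes whose MAD differs most
    between the two groups.
    """
    gaps = {}
    for gene in expr_a.keys() & expr_b.keys():
        try:
            gaps[gene] = abs(benford_mad(expr_a[gene]) - benford_mad(expr_b[gene]))
        except ValueError:
            continue  # gene has no positive values in one of the groups
    return sorted(gaps, key=gaps.get, reverse=True)[:k]


if __name__ == "__main__":
    # Toy data: a geometric sequence follows the Benford law closely,
    # while a constant value does not.
    geometric = [2 ** n for n in range(1, 501)]
    constant = [5] * 100
    print(benford_mad(geometric), benford_mad(constant))
    print(select_genes({"g1": geometric, "g2": constant},
                       {"g1": constant, "g2": constant}, 1))
```

In this sketch, the per-group MAD values of the selected genes could then serve as input features for a classifier, in the spirit of the machine-learning step the abstract mentions; the classifier, the number of genes kept, and whether to score raw counts or normalized values are left open here.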
id pubmed-6770594
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-6770594 2019-10-30 Characterizing Human Cell Types and Tissue Origin Using the Benford Law Morag, Sne Salmon-Divon, Mali Cells Article MDPI 2019-08-29 /pmc/articles/PMC6770594/ /pubmed/31470662 http://dx.doi.org/10.3390/cells8091004 Text en © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
topic Article