A scaling-free minimum enclosing ball method to detect differentially expressed genes for RNA-seq data

BACKGROUND: Identifying differentially expressed genes between the same or different species is an urgent demand for biological and medical research. For RNA-seq data, systematic technical effects and different sequencing depths are usually encountered when conducting experiments. Normalization is r...

Bibliographic Details
Main authors: Zhou, Yan, Yang, Bin, Wang, Junhui, Zhu, Jiadi, Tian, Guoliang
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8234728/
https://www.ncbi.nlm.nih.gov/pubmed/34174824
http://dx.doi.org/10.1186/s12864-021-07790-0
author Zhou, Yan
Yang, Bin
Wang, Junhui
Zhu, Jiadi
Tian, Guoliang
collection PubMed
description BACKGROUND: Identifying differentially expressed genes between the same or different species is an urgent demand for biological and medical research. For RNA-seq data, systematic technical effects and different sequencing depths are usually encountered when conducting experiments. Normalization is regarded as an essential step in the discovery of biologically important changes in expression. The present methods usually involve normalization of the data with a scaling factor, followed by detection of significant genes. However, more than one scaling factor may exist because of the complexity of real data. Consequently, methods that normalize data by a single scaling factor may deliver suboptimal performance or may not even work. The development of modern machine learning techniques has provided a new perspective regarding discrimination between differentially expressed (DE) and non-DE genes. However, in reality, the non-DE genes comprise only a small set and may contain housekeeping genes (in the same species) or conserved orthologous genes (in different species). Therefore, the process of detecting DE genes can be formulated as a one-class classification problem, where only non-DE genes are observed, while DE genes are completely absent from the training data. RESULTS: In this study, we transform the problem into an outlier detection problem by treating DE genes as outliers, and we propose a scaling-free minimum enclosing ball (SFMEB) method to construct the smallest possible ball that contains the known non-DE genes in a feature space. The genes outside the minimum enclosing ball can then be naturally considered to be DE genes. Compared with the existing methods, the proposed SFMEB method does not require data normalization, which is particularly attractive when the RNA-seq data include more than one scaling factor. Furthermore, the SFMEB method could be easily extended to different species without normalization.
CONCLUSIONS: Simulation studies demonstrate that the SFMEB method works well in a wide range of settings, especially when the data are heterogeneous or biological replicates. Analysis of the real data also supports the conclusion that the SFMEB method outperforms other existing competitors. The R package of the proposed method is available at https://bioconductor.org/packages/MEB. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at (10.1186/s12864-021-07790-0).
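As a rough illustration of the geometric idea in the RESULTS paragraph above, the following Python sketch fits an approximate enclosing ball around known non-DE points and flags anything that falls outside it. It is not the authors' SFMEB implementation and not the API of the MEB package: it works directly in the input space with no kernel feature map, it uses the Badoiu-Clarkson iteration as a stand-in solver, and the function names, the two-coordinate gene representation, and the toy values are all hypothetical.

```python
import math


def enclosing_ball(points, iterations=1000):
    """Approximate the minimum enclosing ball of `points`.

    Uses the Badoiu-Clarkson update: repeatedly move the center a
    shrinking step toward the currently farthest point. This is an
    assumption made for illustration, not the solver used by SFMEB.
    """
    center = list(points[0])
    for i in range(1, iterations + 1):
        # Farthest training point from the current center.
        farthest = max(points, key=lambda p: math.dist(p, center))
        step = 1.0 / (i + 1)
        center = [c + step * (f - c) for c, f in zip(center, farthest)]
    # Radius chosen so that every training point lies inside the ball.
    radius = max(math.dist(p, center) for p in points)
    return center, radius


def flag_outside(center, radius, candidates):
    """Return True for each candidate lying strictly outside the ball."""
    return [math.dist(p, center) > radius for p in candidates]


# Hypothetical toy data: each known non-DE gene is a 2-D point
# (for example, one expression summary per condition).
non_de = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
center, radius = enclosing_ball(non_de)

# Candidate genes: the first sits near the center, the second far away.
candidates = [(1.0, 1.0), (5.0, 5.0)]
flags = flag_outside(center, radius, candidates)
print(center, radius, flags)
```

In this toy setting a gene is called DE only when it lies outside the ball fitted to the known non-DE genes, which mirrors the one-class formulation described in the abstract; the real method operates in a feature space and should be taken from the published package.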
format Online
Article
Text
id pubmed-8234728
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8234728 2021-06-28 A scaling-free minimum enclosing ball method to detect differentially expressed genes for RNA-seq data Zhou, Yan Yang, Bin Wang, Junhui Zhu, Jiadi Tian, Guoliang BMC Genomics Methodology Article BioMed Central 2021-06-26 /pmc/articles/PMC8234728/ /pubmed/34174824 http://dx.doi.org/10.1186/s12864-021-07790-0 Text en © The Author(s) 2021
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title A scaling-free minimum enclosing ball method to detect differentially expressed genes for RNA-seq data
topic Methodology Article