MSPJ: Discovering potential biomarkers in small gene expression datasets via ensemble learning
In transcriptomics, differentially expressed genes (DEGs) provide fine-grained phenotypic resolution for comparisons between groups and insights into molecular mechanisms underlying the pathogenesis of complex diseases or phenotypes. The robust detection of DEGs from large datasets is well-establish...
Main authors: | Yin, HuaChun; Tao, JingXin; Peng, Yuyang; Xiong, Ying; Li, Bo; Li, Song; Yang, Hui |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Research Network of Computational and Structural Biotechnology, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9304602/ https://www.ncbi.nlm.nih.gov/pubmed/35891786 http://dx.doi.org/10.1016/j.csbj.2022.07.022 |
_version_ | 1784752124819668992 |
---|---|
author | Yin, HuaChun Tao, JingXin Peng, Yuyang Xiong, Ying Li, Bo Li, Song Yang, Hui |
collection | PubMed |
description | In transcriptomics, differentially expressed genes (DEGs) provide fine-grained phenotypic resolution for comparisons between groups and insights into the molecular mechanisms underlying the pathogenesis of complex diseases or phenotypes. The robust detection of DEGs from large datasets is well established. However, owing to various limitations (e.g., the low availability of samples for some diseases or limited research funding), experiments frequently use small sample sizes. Methods that screen reliable and stable features are therefore urgently needed for analyses with limited sample sizes. In this study, MSPJ, a new machine learning approach for identifying DEGs, was proposed to mitigate the reduced power and improve the stability of DEG identification in small gene expression datasets. This ensemble learning-based method combines three algorithms: an improved multiple random sampling with meta-analysis, SVM-RFE (support vector machine recursive feature elimination), and a permutation test. MSPJ was compared with ten classical methods on 94 simulated datasets and in large-scale benchmarking on 165 real datasets. The results showed that, among these methods, MSPJ had the best performance in most small gene expression datasets, especially those with sample sizes below 30. In summary, the MSPJ method enables effective feature selection for robust DEG identification in small transcriptome datasets and is expected to expand research on the molecular mechanisms underlying complex diseases or phenotypes. |
format | Online Article Text |
id | pubmed-9304602 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Research Network of Computational and Structural Biotechnology |
record_format | MEDLINE/PubMed |
spelling | pubmed-9304602 2022-07-25 MSPJ: Discovering potential biomarkers in small gene expression datasets via ensemble learning Yin, HuaChun Tao, JingXin Peng, Yuyang Xiong, Ying Li, Bo Li, Song Yang, Hui Comput Struct Biotechnol J Research Article Research Network of Computational and Structural Biotechnology 2022-07-14 /pmc/articles/PMC9304602/ /pubmed/35891786 http://dx.doi.org/10.1016/j.csbj.2022.07.022 Text en © 2022 Published by Elsevier B.V. on behalf of Research Network of Computational and Structural Biotechnology. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
title | MSPJ: Discovering potential biomarkers in small gene expression datasets via ensemble learning |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9304602/ https://www.ncbi.nlm.nih.gov/pubmed/35891786 http://dx.doi.org/10.1016/j.csbj.2022.07.022 |
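The description names a permutation test as one of the three components of MSPJ. As a rough illustration of that general idea only, the sketch below computes a two-sided permutation p-value for the difference in mean expression of a single gene between two small groups. It is not the authors' implementation: the function name `permutation_p_value`, its parameters, and the expression values are all hypothetical, and the record does not say which test statistic MSPJ uses.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Hypothetical helper for illustration; not part of MSPJ.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        # Shuffle the group labels by shuffling the pooled values.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Made-up expression values for one gene, five samples per group.
control = [5.1, 4.9, 5.3, 5.0, 5.2]
disease = [6.8, 7.1, 6.9, 7.3, 7.0]
p_shifted = permutation_p_value(control, disease)

# Made-up values for a gene with no obvious group difference.
p_null = permutation_p_value([5.1, 4.9, 5.3, 5.0, 5.2],
                             [5.0, 5.2, 4.8, 5.3, 5.1])
```

With only five samples per group there are few distinct label assignments, so the smallest attainable p-value is bounded from below. That bound is one reason small datasets have reduced power, which is the problem the description says MSPJ aims to mitigate.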