Comprehensive Analysis of MILE Gene Expression Data Set Advances Discovery of Leukaemia Type and Subtype Biomarkers
Large collections of data in cancer studies, such as those on leukaemia, call for tailored analysis algorithms to ensure maximal information extraction. In this work, a custom-fit pipeline is demonstrated for a thorough investigation of the voluminous MILE gene expression data set. Th...
Main authors: | Labaj, Wojciech; Papiez, Anna; Polanski, Andrzej; Polanska, Joanna |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer Berlin Heidelberg, 2017 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5366179/ https://www.ncbi.nlm.nih.gov/pubmed/28303531 http://dx.doi.org/10.1007/s12539-017-0216-9 |
author | Labaj, Wojciech; Papiez, Anna; Polanski, Andrzej; Polanska, Joanna |
collection | PubMed |
description | Large collections of data in cancer studies, such as those on leukaemia, call for tailored analysis algorithms to ensure maximal information extraction. In this work, a custom-fit pipeline is demonstrated for a thorough investigation of the voluminous MILE gene expression data set. Three analyses are performed, each aimed at a deeper understanding of the processes underlying leukaemia types and subtypes. First, the main disease groups are tested for differential expression against the healthy control, as in a standard case-control study. This confirms established knowledge of the molecular mechanisms, both quantitatively and through literature references. Second, pairwise comparison testing is performed to contrast the main leukaemia types with one another. Here, the Dice coefficient similarity measure is used to reveal the general relations between them. Moreover, lists of candidate biomarkers for the main leukaemia groups are proposed. Finally, building on the success of this approach, the third analysis provides insight into all of the studied subtypes and yields four leukaemia subtype biomarkers. In addition, the class-enhanced DEG (differentially expressed gene) signature obtained with the novel pipeline leads to significantly better classification power of multi-class classifiers. The developed methodology, consisting of batch effect adjustment and adaptive noise and feature filtering coupled with adequate statistical testing and biomarker definition, proves to be an effective approach to knowledge discovery in high-throughput molecular biology experiments. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s12539-017-0216-9) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5366179 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Springer Berlin Heidelberg |
record_format | MEDLINE/PubMed |
spelling | pubmed-5366179; 2017-04-10; Interdiscip Sci; Original Research Article; Springer Berlin Heidelberg, 2017-03-16; /pmc/articles/PMC5366179/; /pubmed/28303531; http://dx.doi.org/10.1007/s12539-017-0216-9; Text; en; © The Author(s) 2017. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. |
title | Comprehensive Analysis of MILE Gene Expression Data Set Advances Discovery of Leukaemia Type and Subtype Biomarkers |
topic | Original Research Article |
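The abstract names the Dice coefficient as the similarity measure used to relate the main leukaemia types. Below is a minimal Python sketch of that measure, assuming (this record does not say so) that it is applied to sets of differentially expressed genes (DEGs) obtained for each leukaemia group. The function names `dice_coefficient` and `pairwise_dice`, the empty-set convention, and the gene sets are hypothetical placeholders, not the authors' code or data.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def dice_coefficient(a: Set[str], b: Set[str]) -> float:
    """Dice similarity of two sets: 2|A ∩ B| / (|A| + |B|).

    Returns 1.0 when both sets are empty (a convention assumed for this sketch).
    """
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def pairwise_dice(groups: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """Dice coefficient for every unordered pair of named gene sets."""
    return {
        (x, y): dice_coefficient(groups[x], groups[y])
        for x, y in combinations(sorted(groups), 2)
    }


# Hypothetical placeholder DEG sets; these are NOT results from the MILE study.
example = {
    "group_1": {"G1", "G2", "G3", "G4"},
    "group_2": {"G3", "G4", "G5"},
    "group_3": {"G6"},
}

if __name__ == "__main__":
    # Print the similarity of each pair of placeholder groups.
    for pair, score in pairwise_dice(example).items():
        print(pair, round(score, 3))
```

With these placeholders, group_1 and group_2 share two genes, giving 2×2/(4+3) = 4/7 ≈ 0.571, while group_3 overlaps with neither. How the study defines its DEG lists (test, thresholds, multiple-testing correction) is not stated in this record.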