
Comprehensive Analysis of MILE Gene Expression Data Set Advances Discovery of Leukaemia Type and Subtype Biomarkers

Large collections of data in studies on cancer such as leukaemia provoke the necessity of applying tailored analysis algorithms to ensure supreme information extraction. In this work, a custom-fit pipeline is demonstrated for thorough investigation of the voluminous MILE gene expression data set. Th...


Bibliographic Details
Main authors: Labaj, Wojciech; Papiez, Anna; Polanski, Andrzej; Polanska, Joanna
Format: Online Article Text
Language: English
Published: Springer Berlin Heidelberg 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5366179/
https://www.ncbi.nlm.nih.gov/pubmed/28303531
http://dx.doi.org/10.1007/s12539-017-0216-9
author Labaj, Wojciech
Papiez, Anna
Polanski, Andrzej
Polanska, Joanna
author_sort Labaj, Wojciech
collection PubMed
description Large collections of data in studies on cancer such as leukaemia provoke the necessity of applying tailored analysis algorithms to ensure supreme information extraction. In this work, a custom-fit pipeline is demonstrated for thorough investigation of the voluminous MILE gene expression data set. Three analyses are accomplished, each for gaining a deeper understanding of the processes underlying leukaemia types and subtypes. First, the main disease groups are tested for differential expression against the healthy control as in a standard case-control study. Here, the basic knowledge on molecular mechanisms is confirmed quantitatively and by literature references. Second, pairwise comparison testing is performed for juxtaposing the main leukaemia types among each other. In this case by means of the Dice coefficient similarity measure the general relations are pointed out. Moreover, lists of candidate main leukaemia group biomarkers are proposed. Finally, with this approach being successful, the third analysis provides insight into all of the studied subtypes, followed by the emergence of four leukaemia subtype biomarkers. In addition, the class enhanced DEG signature obtained on the basis of novel pipeline processing leads to significantly better classification power of multi-class data classifiers. The developed methodology consisting of batch effect adjustment, adaptive noise and feature filtration coupled with adequate statistical testing and biomarker definition proves to be an effective approach towards knowledge discovery in high-throughput molecular biology experiments. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s12539-017-0216-9) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5366179
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Springer Berlin Heidelberg
record_format MEDLINE/PubMed
spelling pubmed-5366179 2017-04-10 Interdiscip Sci Original Research Article
Springer Berlin Heidelberg 2017-03-16 2017 /pmc/articles/PMC5366179/ /pubmed/28303531 http://dx.doi.org/10.1007/s12539-017-0216-9 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
title Comprehensive Analysis of MILE Gene Expression Data Set Advances Discovery of Leukaemia Type and Subtype Biomarkers
topic Original Research Article