Multiclass classification for skin cancer profiling based on the integration of heterogeneous gene expression series

Bibliographic Details
Main authors: Gálvez, Juan Manuel; Castillo, Daniel; Herrera, Luis Javier; San Román, Belén; Valenzuela, Olga; Ortuño, Francisco Manuel; Rojas, Ignacio
Format: Online Article, Text
Language: English
Published: Public Library of Science, 2018
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5947894/
https://www.ncbi.nlm.nih.gov/pubmed/29750795
http://dx.doi.org/10.1371/journal.pone.0196836
Description: Most research studies that apply microarray technology to characterize the pathological states of a disease may fail to reach statistically significant results. This is largely due to the small number of samples analysed and to the limited number of states or pathologies usually addressed. Moreover, the influence of potential deviations on gene expression quantification is usually disregarded. Despite continuous changes in the omic sciences, reflected for instance in the emergence of new Next-Generation Sequencing technologies, the vast number of gene expression microarray datasets already available should be properly exploited. Therefore, this work proposes a novel methodological approach that integrates several heterogeneous skin cancer series and then designs a multiclass classifier. The approach aims to provide clinicians with an intelligent diagnosis support tool based on a robust set of selected biomarkers that simultaneously distinguishes among different cancer-related skin states. To achieve this, microarray datasets from two manufacturers, Affymetrix and Illumina, were combined across platforms. This integration is expected to strengthen the statistical robustness of the study and to support the discovery of highly reliable skin cancer biomarkers. Specifically, the designed pipeline identified a small subset of 17 differentially expressed genes (DEGs) used to distinguish among the 7 skin states involved. These genes were obtained after assessing a number of potential batch effects on the gene expression data. Their biological role was then examined in the literature to understand their relationship to skin cancer.
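
The record does not say how the Affymetrix and Illumina series were normalized or how batch effects were handled. As a rough illustration only, the Python sketch below merges series by keeping the genes common to every platform and z-scoring each gene within each series, a simple per-batch location/scale adjustment. This is an assumption for illustration, not the authors' pipeline and not a dedicated batch-correction method such as ComBat; the function name `integrate_series`, the dict-of-lists input layout and the gene names are hypothetical.

```python
from statistics import mean, pstdev


# Hypothetical helper for illustration; not the method used in the paper.
def integrate_series(series):
    """Merge expression series from different platforms.

    series: list of dicts, one per series, mapping gene symbol -> list of
    expression values (one value per sample, same sample order for all genes).
    Returns (common_genes, merged), where merged maps each common gene to the
    concatenation of its per-series z-scores.
    """
    # Keep only genes measured in every series, e.g. on both Affymetrix and Illumina arrays.
    common = sorted(set.intersection(*(set(s) for s in series)))
    merged = {gene: [] for gene in common}
    for s in series:
        for gene in common:
            values = s[gene]
            mu = mean(values)
            sd = pstdev(values) or 1.0  # avoid dividing by zero for constant genes
            # Centre and scale within the series to remove its overall location/scale shift.
            merged[gene].extend((v - mu) / sd for v in values)
    return common, merged


# Toy usage with made-up values: the second series is on a different intensity scale.
affy = {"GENE_A": [8.1, 8.3, 9.0], "GENE_B": [5.0, 5.2, 4.9], "ONLY_AFFY": [1.0, 2.0, 3.0]}
illu = {"GENE_A": [120.0, 130.0, 160.0], "GENE_B": [40.0, 42.0, 39.0]}
genes, merged = integrate_series([affy, illu])
print(genes)  # ['GENE_A', 'GENE_B']
```

Per-series centring assumes that the skin states are similarly represented in every series; if one series contains mostly tumour samples, this adjustment also removes genuine biological differences along with the platform shift.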
Finally, to assess their potential effectiveness in cancer diagnosis, a cross-validated Support Vector Machine (SVM) classification including feature ranking was performed. The overall accuracy in recognizing the 7 cancer-related skin states exceeded 92%. The proposed integration scheme is expected to allow co-integration with other state-of-the-art technologies such as RNA-seq.
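
The record states only that an SVM classifier with feature ranking was evaluated by cross-validation; it does not give the ranking criterion, kernel or number of folds. The Python sketch below shows one way such an evaluation can be structured under assumptions not taken from the source: genes are ranked by a one-way ANOVA F-statistic, a linear one-vs-rest SVM is trained with the Pegasos sub-gradient method, and plain 5-fold cross-validation is used. All function and parameter names are hypothetical. Ranking is redone inside every training fold so that held-out samples never influence gene selection.

```python
import random

# Hypothetical helpers for illustration; the paper's ranking criterion, kernel
# and fold count are not stated in this record.


def f_scores(X, y):
    """One-way ANOVA F-statistic of each feature (column of X) across the classes in y."""
    classes = sorted(set(y))
    n, k = len(X), len(classes)
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        grand = sum(col) / n
        between = within = 0.0
        for c in classes:
            vals = [col[i] for i in range(n) if y[i] == c]
            m = sum(vals) / len(vals)
            between += len(vals) * (m - grand) ** 2
            within += sum((v - m) ** 2 for v in vals)
        # Small constant keeps the ratio finite for features with zero within-class variance.
        scores.append((between / (k - 1)) / (within / (n - k) + 1e-12))
    return scores


def train_ovr_svm(X, y, lam=0.1, epochs=30, seed=0):
    """One-vs-rest linear SVMs trained with Pegasos sub-gradient steps (hinge loss + L2)."""
    rng = random.Random(seed)
    Xb = [row + [1.0] for row in X]  # constant feature plays the role of a bias term
    models = {}
    for c in sorted(set(y)):
        w = [0.0] * len(Xb[0])
        order = list(range(len(Xb)))
        t = 0
        for _ in range(epochs):
            rng.shuffle(order)
            for i in order:
                t += 1
                eta = 1.0 / (lam * t)
                target = 1.0 if y[i] == c else -1.0
                margin = target * sum(wj * xj for wj, xj in zip(w, Xb[i]))
                w = [(1.0 - eta * lam) * wj for wj in w]  # L2 shrinkage step
                if margin < 1.0:  # hinge loss is active: move towards the sample
                    w = [wj + eta * target * xj for wj, xj in zip(w, Xb[i])]
        models[c] = w
    return models


def predict(models, row):
    """Return the class whose one-vs-rest score is highest."""
    xb = row + [1.0]
    return max(models, key=lambda c: sum(wj * xj for wj, xj in zip(models[c], xb)))


def cross_validate(X, y, n_genes=17, folds=5, seed=0):
    """Accuracy of F-ranking + linear SVM under plain (non-stratified) k-fold CV."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for f in range(folds):
        test = idx[f::folds]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
        # Rank genes on the training fold only, so held-out samples never guide selection.
        scores = f_scores(Xtr, ytr)
        top = sorted(range(len(scores)), key=lambda j: -scores[j])[:n_genes]
        models = train_ovr_svm([[r[j] for j in top] for r in Xtr], ytr)
        for i in test:
            correct += predict(models, [X[i][j] for j in top]) == y[i]
    return correct / len(X)
```

Here `X` is a list of samples, each a list of expression values over the same genes, and `y` holds the skin-state label of each sample; `cross_validate(X, y)` returns the fraction of samples predicted correctly across all folds. The folds are not stratified, so with small or unbalanced classes a stratified split would be the safer choice.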
Source: PLoS One (Research Article), Public Library of Science, 2018-05-11.
© 2018 Gálvez et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.