
Integration of multiple types of genetic markers for neuroblastoma may contribute to improved prediction of the overall survival


Bibliographic Details
Main authors: Polewko-Klim, Aneta, Lesiński, Wojciech, Mnich, Krzysztof, Piliszek, Radosław, Rudnicki, Witold R.
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6148774/
https://www.ncbi.nlm.nih.gov/pubmed/30236139
http://dx.doi.org/10.1186/s13062-018-0222-9
author Polewko-Klim, Aneta
Lesiński, Wojciech
Mnich, Krzysztof
Piliszek, Radosław
Rudnicki, Witold R.
collection PubMed
description BACKGROUND: Modern experimental techniques deliver data sets containing profiles of tens of thousands of potential molecular and genetic markers that can be used to improve medical diagnostics. Previous studies performed with three different experimental methods on the same set of neuroblastoma patients create an opportunity to examine whether augmenting gene expression profiles with information on copy number variation can lead to improved predictions of patient survival. We propose a methodology based on a comprehensive cross-validation protocol that includes feature selection within the cross-validation loop and classification using machine learning. We also test the dependence of the results on the feature selection process using four different feature selection methods. RESULTS: The models utilising features selected based on information entropy are slightly, but significantly, better than those using features obtained with the t-test. Synergy between data on genetic variation and gene expression is possible, but not confirmed. A slight, but statistically significant, increase in the predictive power of machine learning models was observed for models built on combined data sets. It was found both with the out-of-bag estimate and in cross-validation performed on a single set of variables. However, the improvement was smaller and non-significant when models were built within the full cross-validation procedure that included feature selection within the cross-validation loop. Good correlation between the performance of the models in the internal and external cross-validation was observed, confirming the robustness of the proposed protocol and results.
CONCLUSIONS: We have developed a protocol for building predictive machine learning models. The protocol can provide robust estimates of model performance on unseen data. It is particularly well suited to small data sets. We have applied this protocol to develop prognostic models for neuroblastoma, using data on copy number variation and gene expression. We have shown that combining these two sources of information may increase the quality of the models. Nevertheless, the increase is small, and larger samples are required to reduce noise and bias arising from overfitting. REVIEWERS: This article was reviewed by Lan Hu, Tim Beissbarth and Dimitar Vassilev.
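The abstract contrasts cross-validation on a single, pre-selected set of variables with a full procedure that repeats feature selection inside every fold. The sketch below illustrates that difference only; it is not the authors' implementation. It assumes a Welch t-score filter and a nearest-centroid classifier as simple stand-ins, and all function names (`t_score`, `select_features`, `centroid_predict`, `cv_accuracy`, `make_noise_data`) are hypothetical.

```python
import random
import statistics


def t_score(a, b):
    """Absolute Welch t-statistic between two samples of one feature."""
    diff = abs(statistics.mean(a) - statistics.mean(b))
    denom = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    if denom == 0:
        # Both samples are constant: perfect separation or no signal at all.
        return float("inf") if diff > 0 else 0.0
    return diff / denom


def select_features(X, y, k):
    """Rank columns by t-score using only the rows given; return the top-k indices."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append((t_score(a, b), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def centroid_predict(X_train, y_train, X_test, cols):
    """Nearest-centroid classifier (assumed stand-in) on the selected columns."""
    centroids = {}
    for label in (0, 1):
        rows = [r for r, l in zip(X_train, y_train) if l == label]
        centroids[label] = [statistics.mean(r[j] for r in rows) for j in cols]
    preds = []
    for r in X_test:
        dist = {l: sum((r[j] - c) ** 2 for j, c in zip(cols, cen))
                for l, cen in centroids.items()}
        preds.append(min(dist, key=dist.get))
    return preds


def cv_accuracy(X, y, k_features, n_folds, select_inside, seed=0):
    """k-fold accuracy with feature selection inside each fold or once on all data."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    # Selecting on all rows lets test-fold labels influence the chosen features.
    global_cols = None if select_inside else select_features(X, y, k_features)
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        cols = select_features(X_tr, y_tr, k_features) if select_inside else global_cols
        preds = centroid_predict(X_tr, y_tr, [X[i] for i in fold], cols)
        correct += sum(p == y[i] for p, i in zip(preds, fold))
    return correct / len(X)


def make_noise_data(n_samples, n_features, seed=0):
    """Pure-noise features with balanced labels: no real signal to find."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    return X, y


if __name__ == "__main__":
    X, y = make_noise_data(40, 200, seed=1)
    print("selection on all data:", cv_accuracy(X, y, 10, 5, select_inside=False))
    print("selection inside folds:", cv_accuracy(X, y, 10, 5, select_inside=True))
```

On data with no real signal, selecting features once on all samples tends to give an optimistic accuracy, while selecting inside each fold tends to stay near chance. This is consistent with the abstract's observation that the apparent improvement shrank under the full cross-validation procedure.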
format Online
Article
Text
id pubmed-6148774
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6148774 2018-09-24 Biol Direct Research BioMed Central 2018-09-20 /pmc/articles/PMC6148774/ /pubmed/30236139 http://dx.doi.org/10.1186/s13062-018-0222-9 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Integration of multiple types of genetic markers for neuroblastoma may contribute to improved prediction of the overall survival
topic Research