Integration of multiple types of genetic markers for neuroblastoma may contribute to improved prediction of the overall survival
BACKGROUND: Modern experimental techniques deliver data sets containing profiles of tens of thousands of potential molecular and genetic markers that can be used to improve medical diagnostics. Previous studies performed with three different experimental methods for the same set of neuroblastoma pat...
Main Authors: | Polewko-Klim, Aneta, Lesiński, Wojciech, Mnich, Krzysztof, Piliszek, Radosław, Rudnicki, Witold R. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2018 |
Subjects: | Research |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6148774/ https://www.ncbi.nlm.nih.gov/pubmed/30236139 http://dx.doi.org/10.1186/s13062-018-0222-9 |
_version_ | 1783356775612809216 |
---|---|
author | Polewko-Klim, Aneta Lesiński, Wojciech Mnich, Krzysztof Piliszek, Radosław Rudnicki, Witold R. |
author_facet | Polewko-Klim, Aneta Lesiński, Wojciech Mnich, Krzysztof Piliszek, Radosław Rudnicki, Witold R. |
author_sort | Polewko-Klim, Aneta |
collection | PubMed |
description | BACKGROUND: Modern experimental techniques deliver data sets containing profiles of tens of thousands of potential molecular and genetic markers that can be used to improve medical diagnostics. Previous studies performed with three different experimental methods for the same set of neuroblastoma patients create an opportunity to examine whether augmenting gene expression profiles with information on copy number variation can lead to improved predictions of patient survival. We propose a methodology based on a comprehensive cross-validation protocol that includes feature selection within the cross-validation loop and classification using machine learning. We also test the dependence of the results on the feature selection process using four different feature selection methods. RESULTS: The models utilising features selected based on information entropy are slightly, but significantly, better than those using features obtained with the t-test. Synergy between data on genetic variation and gene expression is possible, but not confirmed. A slight, but statistically significant, increase in the predictive power of machine learning models was observed for models built on combined data sets. It was found both with the out-of-bag estimate and in cross-validation performed on a single set of variables. However, the improvement was smaller and non-significant when models were built within the full cross-validation procedure that included feature selection within the cross-validation loop. A good correlation between the performance of the models in the internal and external cross-validation was observed, confirming the robustness of the proposed protocol and results. CONCLUSIONS: We have developed a protocol for building predictive machine learning models. The protocol can provide robust estimates of the model performance on unseen data. It is particularly well-suited for small data sets. We have applied this protocol to develop prognostic models for neuroblastoma, using data on copy number variation and gene expression. We have shown that combining these two sources of information may increase the quality of the models. Nevertheless, the increase is small, and larger samples are required to reduce noise and bias arising from overfitting. REVIEWERS: This article was reviewed by Lan Hu, Tim Beissbarth and Dimitar Vassilev. |
format | Online Article Text |
id | pubmed-6148774 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6148774 2018-09-24 Integration of multiple types of genetic markers for neuroblastoma may contribute to improved prediction of the overall survival Polewko-Klim, Aneta Lesiński, Wojciech Mnich, Krzysztof Piliszek, Radosław Rudnicki, Witold R. Biol Direct Research BACKGROUND: Modern experimental techniques deliver data sets containing profiles of tens of thousands of potential molecular and genetic markers that can be used to improve medical diagnostics. Previous studies performed with three different experimental methods for the same set of neuroblastoma patients create an opportunity to examine whether augmenting gene expression profiles with information on copy number variation can lead to improved predictions of patient survival. We propose a methodology based on a comprehensive cross-validation protocol that includes feature selection within the cross-validation loop and classification using machine learning. We also test the dependence of the results on the feature selection process using four different feature selection methods. RESULTS: The models utilising features selected based on information entropy are slightly, but significantly, better than those using features obtained with the t-test. Synergy between data on genetic variation and gene expression is possible, but not confirmed. A slight, but statistically significant, increase in the predictive power of machine learning models was observed for models built on combined data sets. It was found both with the out-of-bag estimate and in cross-validation performed on a single set of variables. However, the improvement was smaller and non-significant when models were built within the full cross-validation procedure that included feature selection within the cross-validation loop. A good correlation between the performance of the models in the internal and external cross-validation was observed, confirming the robustness of the proposed protocol and results. CONCLUSIONS: We have developed a protocol for building predictive machine learning models. The protocol can provide robust estimates of the model performance on unseen data. It is particularly well-suited for small data sets. We have applied this protocol to develop prognostic models for neuroblastoma, using data on copy number variation and gene expression. We have shown that combining these two sources of information may increase the quality of the models. Nevertheless, the increase is small, and larger samples are required to reduce noise and bias arising from overfitting. REVIEWERS: This article was reviewed by Lan Hu, Tim Beissbarth and Dimitar Vassilev. BioMed Central 2018-09-20 /pmc/articles/PMC6148774/ /pubmed/30236139 http://dx.doi.org/10.1186/s13062-018-0222-9 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Polewko-Klim, Aneta Lesiński, Wojciech Mnich, Krzysztof Piliszek, Radosław Rudnicki, Witold R. Integration of multiple types of genetic markers for neuroblastoma may contribute to improved prediction of the overall survival |
title | Integration of multiple types of genetic markers for neuroblastoma may contribute to improved prediction of the overall survival |
title_full | Integration of multiple types of genetic markers for neuroblastoma may contribute to improved prediction of the overall survival |
title_fullStr | Integration of multiple types of genetic markers for neuroblastoma may contribute to improved prediction of the overall survival |
title_full_unstemmed | Integration of multiple types of genetic markers for neuroblastoma may contribute to improved prediction of the overall survival |
title_short | Integration of multiple types of genetic markers for neuroblastoma may contribute to improved prediction of the overall survival |
title_sort | integration of multiple types of genetic markers for neuroblastoma may contribute to improved prediction of the overall survival |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6148774/ https://www.ncbi.nlm.nih.gov/pubmed/30236139 http://dx.doi.org/10.1186/s13062-018-0222-9 |
work_keys_str_mv | AT polewkoklimaneta integrationofmultipletypesofgeneticmarkersforneuroblastomamaycontributetoimprovedpredictionoftheoverallsurvival AT lesinskiwojciech integrationofmultipletypesofgeneticmarkersforneuroblastomamaycontributetoimprovedpredictionoftheoverallsurvival AT mnichkrzysztof integrationofmultipletypesofgeneticmarkersforneuroblastomamaycontributetoimprovedpredictionoftheoverallsurvival AT piliszekradosław integrationofmultipletypesofgeneticmarkersforneuroblastomamaycontributetoimprovedpredictionoftheoverallsurvival AT rudnickiwitoldr integrationofmultipletypesofgeneticmarkersforneuroblastomamaycontributetoimprovedpredictionoftheoverallsurvival |
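
The abstract's central methodological point is that feature selection is repeated inside every cross-validation fold, so the held-out samples never influence which markers are chosen. The sketch below illustrates that idea only and is not the authors' implementation. The toy data, the t-like score, the nearest-centroid classifier and every function name (`make_toy_data`, `t_score`, `select_features`, `cross_validate`) are assumptions made for illustration.

```python
import random
from statistics import mean, pvariance


def make_toy_data(n_samples=40, n_features=20, seed=1):
    """Hypothetical synthetic data: only feature 0 carries class signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # balanced binary outcome
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        row[0] += 3.0 * label
        X.append(row)
        y.append(label)
    return X, y


def t_score(a, b):
    """Absolute t-like statistic for one feature (population variances)."""
    denom = (pvariance(a) / len(a) + pvariance(b) / len(b)) ** 0.5
    return abs(mean(a) - mean(b)) / denom if denom > 0 else 0.0


def select_features(X, y, k):
    """Rank features using only the rows passed in; return top-k indices."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append((t_score(a, b), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def centroid(rows, features):
    """Mean value of each selected feature over the given rows."""
    return [mean(row[j] for row in rows) for j in features]


def predict(row, features, c0, c1):
    """Nearest-centroid rule (a simple stand-in classifier, an assumption)."""
    d0 = sum((row[j] - c) ** 2 for j, c in zip(features, c0))
    d1 = sum((row[j] - c) ** 2 for j, c in zip(features, c1))
    return 0 if d0 <= d1 else 1


def cross_validate(X, y, n_folds=5, k=3, seed=0):
    """k-fold CV accuracy with feature selection redone inside each fold."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        # Selection sees training rows only; test rows stay unseen.
        features = select_features(X_train, y_train, k)
        c0 = centroid([r for r, l in zip(X_train, y_train) if l == 0], features)
        c1 = centroid([r for r, l in zip(X_train, y_train) if l == 1], features)
        correct += sum(predict(X[i], features, c0, c1) == y[i] for i in test_idx)
    return correct / len(X)


if __name__ == "__main__":
    X, y = make_toy_data()
    print("cross-validated accuracy:", cross_validate(X, y))
```

Selecting features once on the full data set and cross-validating only the classifier would let the held-out samples influence the feature ranking. That corresponds to the "single set of variables" setting, for which the abstract reports a larger apparent improvement than under the full procedure.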