
Re-evaluation of publicly available gene-expression databases using machine-learning yields a maximum prognostic power in breast cancer

Gene expression signatures are patterns of gene activity that are used to classify different types of cancer, determine prognosis, and guide treatment decisions. Advances in high-throughput technology and machine learning have improved the prediction of a patient’s prognosis for differ...


Bibliographic Details
Main Authors: Tschodu, Dimitrij, Lippoldt, Jürgen, Gottheil, Pablo, Wegscheider, Anne-Sophie, Käs, Josef A., Niendorf, Axel
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10556090/
https://www.ncbi.nlm.nih.gov/pubmed/37798300
http://dx.doi.org/10.1038/s41598-023-41090-9
author Tschodu, Dimitrij
Lippoldt, Jürgen
Gottheil, Pablo
Wegscheider, Anne-Sophie
Käs, Josef A.
Niendorf, Axel
collection PubMed
description Gene expression signatures are patterns of gene activity that are used to classify different types of cancer, determine prognosis, and guide treatment decisions. Advances in high-throughput technology and machine learning have improved the prediction of a patient’s prognosis for different cancer phenotypes. However, computational methods for analyzing signatures have not been used to evaluate their prognostic power. Contention remains over the utility of gene expression signatures for prognosis. The prevalent approaches to constructing an improved signature include random signatures, expert knowledge, and machine learning. We unify these approaches to evaluate their prognostic power. Re-evaluation of publicly available gene-expression data from 8 databases with 9 machine-learning models revealed previously unreported results. Gene-expression signatures are confirmed to be useful in predicting a patient’s prognosis. Convergent evidence from approximately 10,000 signatures implicates a maximum prognostic power. By calculating the concordance index, which measures how well patients with different prognoses can be discriminated, we show that a signature can correctly discriminate patients’ prognoses no more than 80% of the time. Additionally, we show that more than 50% of the potentially available information is still missing at this value. We surmise that an accurate prognosis must incorporate molecular, clinical, histological, and other complementary factors.
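The concordance index named in the description can be illustrated with a minimal sketch of Harrell's C for right-censored survival data. The function name, the toy data, and the simplified handling of tied survival times are assumptions made for illustration; this record does not describe the article's actual pipeline, and this is not presented as the authors' implementation.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Fraction of comparable patient pairs whose predicted risks are ordered correctly.

    times       -- observed follow-up time per patient
    events      -- 1 if the event was observed, 0 if the patient was censored
    risk_scores -- model output; a higher score should mean a worse prognosis
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Simplification (assumption): skip pairs with tied times, and pairs
        # where the earlier patient was censored, since their order is unknown.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # earlier event, higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: five patients, two of them censored.
toy_times = [5, 8, 12, 20, 25]
toy_events = [1, 1, 0, 1, 0]
toy_risks = [0.9, 0.4, 0.7, 0.3, 0.1]
toy_c = concordance_index(toy_times, toy_events, toy_risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the ceiling reported in the description corresponds to a concordance index of about 0.8.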
format Online
Article
Text
id pubmed-10556090
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-10556090 2023-10-07 Re-evaluation of publicly available gene-expression databases using machine-learning yields a maximum prognostic power in breast cancer. Tschodu, Dimitrij; Lippoldt, Jürgen; Gottheil, Pablo; Wegscheider, Anne-Sophie; Käs, Josef A.; Niendorf, Axel. Sci Rep. Article.
Nature Publishing Group UK 2023-10-05 /pmc/articles/PMC10556090/ /pubmed/37798300 http://dx.doi.org/10.1038/s41598-023-41090-9 Text en © The Author(s) 2023. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
title Re-evaluation of publicly available gene-expression databases using machine-learning yields a maximum prognostic power in breast cancer
topic Article