
A pathway-based data integration framework for prediction of disease progression

Motivation: Within medical research there is an increasing trend toward deriving multiple types of data from the same individual. The most effective prognostic prediction methods should use all available data, as this maximizes the amount of information used. In this article, we consider a variety o...


Bibliographic Details
Main Authors: Seoane, José A., Day, Ian N. M., Gaunt, Tom R., Campbell, Colin
Format: Online Article Text
Language: English
Published: Oxford University Press 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3957070/
https://www.ncbi.nlm.nih.gov/pubmed/24162466
http://dx.doi.org/10.1093/bioinformatics/btt610
_version_ 1782307762207719424
author Seoane, José A.
Day, Ian N. M.
Gaunt, Tom R.
Campbell, Colin
author_facet Seoane, José A.
Day, Ian N. M.
Gaunt, Tom R.
Campbell, Colin
author_sort Seoane, José A.
collection PubMed
description Motivation: Within medical research there is an increasing trend toward deriving multiple types of data from the same individual. The most effective prognostic prediction methods should use all available data, as this maximizes the amount of information used. In this article, we consider a variety of learning strategies to boost prediction performance based on the use of all available data.
Implementation: We consider data integration via the use of multiple kernel learning supervised learning methods. We propose a scheme in which feature selection by statistical score is performed separately per data type and by pathway membership. We further consider the introduction of a confidence measure for the class assignment, both to remove some ambiguously labeled datapoints from the training data and to implement a cautious classifier that only makes predictions when the associated confidence is high.
Results: We use the METABRIC dataset for breast cancer, with prediction of survival at 2000 days from diagnosis. Predictive accuracy is improved by using kernels that exclusively use those genes, as features, which are known members of particular pathways. We show that yet further improvements can be made by using a range of additional kernels based on clinical covariates such as Estrogen Receptor (ER) status. Using this range of measures to improve prediction performance, we show that the test accuracy on new instances is nearly 80%, though predictions are only made on 69.2% of the patient cohort.
Availability: https://github.com/jseoane/FSMKL
Contact: J.Seoane@bristol.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
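The two central ideas in the abstract, kernels restricted to the genes of one pathway and a cautious classifier that abstains when confidence is low, can be illustrated with a toy sketch. This is not the authors' FSMKL code. Everything here is an assumption made for illustration: the pathway names, the gene indices, the fixed uniform kernel weights (a multiple kernel learning method would learn them), the kernel nearest-centroid rule used in place of a kernel SVM, and the use of the absolute decision score as the confidence measure.

```python
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def pathway_kernel(x: Vector, z: Vector, genes: List[int]) -> float:
    """Linear kernel computed only on the gene indices of one pathway."""
    return sum(x[g] * z[g] for g in genes)


def combined_kernel(x: Vector, z: Vector,
                    pathways: Dict[str, List[int]],
                    weights: Dict[str, float]) -> float:
    """Weighted sum of per-pathway kernels.

    The weights are fixed by hand here; in multiple kernel learning
    they would be learned from the training data.
    """
    return sum(weights[name] * pathway_kernel(x, z, genes)
               for name, genes in pathways.items())


class CautiousKernelCentroid:
    """Kernel nearest-centroid classifier that abstains on low scores.

    A simple stand-in for a kernel SVM, chosen to keep the sketch short.
    """

    def __init__(self, pathways, weights, threshold: float):
        self.pathways = pathways
        self.weights = weights
        self.threshold = threshold  # hypothetical confidence cut-off

    def _k(self, x: Vector, z: Vector) -> float:
        return combined_kernel(x, z, self.pathways, self.weights)

    def _mean_k(self, x: Vector, group: List[Vector]) -> float:
        return sum(self._k(x, a) for a in group) / len(group)

    def fit(self, X: List[Vector], y: List[int]) -> "CautiousKernelCentroid":
        self.pos = [x for x, label in zip(X, y) if label == 1]
        self.neg = [x for x, label in zip(X, y) if label == -1]
        # Squared norms of the two class means in feature space.
        pp = sum(self._mean_k(a, self.pos) for a in self.pos) / len(self.pos)
        nn = sum(self._mean_k(b, self.neg) for b in self.neg) / len(self.neg)
        self.bias = (nn - pp) / 2.0
        return self

    def score(self, x: Vector) -> float:
        """Signed score; positive means closer to the positive class mean."""
        return self._mean_k(x, self.pos) - self._mean_k(x, self.neg) + self.bias

    def predict(self, x: Vector) -> Optional[int]:
        """Return +1 or -1, or None when the score is below the threshold."""
        s = self.score(x)
        if abs(s) < self.threshold:
            return None  # abstain: confidence too low
        return 1 if s > 0 else -1


# Toy data: four genes, two hypothetical pathways of two genes each.
pathways = {"pathway_a": [0, 1], "pathway_b": [2, 3]}
weights = {"pathway_a": 0.5, "pathway_b": 0.5}
X_train = [[2, 2, 0, 0], [3, 1, 0, 1], [0, 0, 2, 2], [0, 1, 3, 1]]
y_train = [1, 1, -1, -1]

model = CautiousKernelCentroid(pathways, weights, threshold=1.0).fit(X_train, y_train)
confident_pos = model.predict([3, 2, 0, 0])   # 1
confident_neg = model.predict([0, 0, 3, 2])   # -1
ambiguous = model.predict([1, 1, 1, 1])       # None (abstains)
```

Raising the threshold trades coverage for accuracy on the samples that do receive a prediction, which is the trade-off the abstract reports: predictions are made on only part of the cohort.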
format Online
Article
Text
id pubmed-3957070
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3957070 2014-03-19 A pathway-based data integration framework for prediction of disease progression Seoane, José A. Day, Ian N. M. Gaunt, Tom R. Campbell, Colin Bioinformatics Original Papers
Oxford University Press 2014-03-15 2013-10-24 /pmc/articles/PMC3957070/ /pubmed/24162466 http://dx.doi.org/10.1093/bioinformatics/btt610 Text en © The Author 2013. Published by Oxford University Press. http://creativecommons.org/licenses/by/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Original Papers
Seoane, José A.
Day, Ian N. M.
Gaunt, Tom R.
Campbell, Colin
A pathway-based data integration framework for prediction of disease progression
title A pathway-based data integration framework for prediction of disease progression
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3957070/
https://www.ncbi.nlm.nih.gov/pubmed/24162466
http://dx.doi.org/10.1093/bioinformatics/btt610