Transformational machine learning: Learning how to learn from many related scientific problems


Bibliographic Details
Main Authors: Olier, Ivan, Orhobor, Oghenejokpeme I., Dash, Tirtharaj, Davis, Andy M., Soldatova, Larisa N., Vanschoren, Joaquin, King, Ross D.
Format: Online Article Text
Language: English
Published: National Academy of Sciences 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8670494/
https://www.ncbi.nlm.nih.gov/pubmed/34845013
http://dx.doi.org/10.1073/pnas.2108013118
_version_ 1784614987269931008
author Olier, Ivan
Orhobor, Oghenejokpeme I.
Dash, Tirtharaj
Davis, Andy M.
Soldatova, Larisa N.
Vanschoren, Joaquin
King, Ross D.
author_facet Olier, Ivan
Orhobor, Oghenejokpeme I.
Dash, Tirtharaj
Davis, Andy M.
Soldatova, Larisa N.
Vanschoren, Joaquin
King, Ross D.
author_sort Olier, Ivan
collection PubMed
description Almost all machine learning (ML) is based on representing examples using intrinsic features. When there are multiple related ML problems (tasks), it is possible to transform these features into extrinsic features by first training ML models on other tasks and letting them each make predictions for each example of the new task, yielding a novel representation. We call this transformational ML (TML). TML is very closely related to, and synergistic with, transfer learning, multitask learning, and stacking. TML is applicable to improving any nonlinear ML method. We tested TML using the most important classes of nonlinear ML: random forests, gradient boosting machines, support vector machines, k-nearest neighbors, and neural networks. To ensure the generality and robustness of the evaluation, we utilized thousands of ML problems from three scientific domains: drug design, predicting gene expression, and ML algorithm selection. We found that TML significantly improved the predictive performance of all the ML methods in all the domains (4 to 50% average improvements) and that TML features generally outperformed intrinsic features. Use of TML also enhances scientific understanding through explainable ML. In drug design, we found that TML provided insight into drug target specificity, the relationships between drugs, and the relationships between target proteins. TML leads to an ecosystem-based approach to ML, where new tasks, examples, predictions, and so on synergistically interact to improve performance. To contribute to this ecosystem, all our data, code, and our ∼50,000 ML models have been fully annotated with metadata, linked, and openly published using Findability, Accessibility, Interoperability, and Reusability principles (∼100 Gbytes).
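The description above outlines the core TML procedure: models trained on related tasks each make a prediction for every example of a new task, and the vector of those predictions becomes the example's extrinsic (TML) representation. A minimal sketch of that idea follows, using synthetic data and least-squares linear models as stand-ins for the nonlinear learners evaluated in the paper; all data and names here are illustrative assumptions, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_related_tasks = 10, 5

# Step 1: train one model per related task on that task's own
# (intrinsic) features. Least squares stands in for the paper's
# nonlinear learners (random forests, SVMs, neural networks, etc.).
task_weights = []
for _ in range(n_related_tasks):
    X = rng.normal(size=(100, n_features))
    y = X @ rng.normal(size=n_features) + 0.1 * rng.normal(size=100)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    task_weights.append(w)

# Step 2: for a new task, re-represent each example by the predictions
# of the related-task models -- these are its extrinsic (TML) features,
# on which a final learner would then be trained.
X_new = rng.normal(size=(60, n_features))
X_tml = np.column_stack([X_new @ w for w in task_weights])
print(X_tml.shape)  # (60, 5): one extrinsic feature per related-task model
```

The transformed matrix has one column per related task, so the dimensionality of the new representation grows with the number of tasks in the ecosystem rather than with the number of intrinsic features.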
format Online
Article
Text
id pubmed-8670494
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher National Academy of Sciences
record_format MEDLINE/PubMed
spelling pubmed-8670494 2021-12-28 Transformational machine learning: Learning how to learn from many related scientific problems Olier, Ivan Orhobor, Oghenejokpeme I. Dash, Tirtharaj Davis, Andy M. Soldatova, Larisa N. Vanschoren, Joaquin King, Ross D. Proc Natl Acad Sci U S A Physical Sciences National Academy of Sciences 2021-11-29 2021-12-07 /pmc/articles/PMC8670494/ /pubmed/34845013 http://dx.doi.org/10.1073/pnas.2108013118 Text en Copyright © 2021 the Author(s). Published by PNAS. This open access article is distributed under the Creative Commons Attribution License 4.0 (CC BY) (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Physical Sciences
Olier, Ivan
Orhobor, Oghenejokpeme I.
Dash, Tirtharaj
Davis, Andy M.
Soldatova, Larisa N.
Vanschoren, Joaquin
King, Ross D.
Transformational machine learning: Learning how to learn from many related scientific problems
title Transformational machine learning: Learning how to learn from many related scientific problems
title_full Transformational machine learning: Learning how to learn from many related scientific problems
title_fullStr Transformational machine learning: Learning how to learn from many related scientific problems
title_full_unstemmed Transformational machine learning: Learning how to learn from many related scientific problems
title_short Transformational machine learning: Learning how to learn from many related scientific problems
title_sort transformational machine learning: learning how to learn from many related scientific problems
topic Physical Sciences
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8670494/
https://www.ncbi.nlm.nih.gov/pubmed/34845013
http://dx.doi.org/10.1073/pnas.2108013118
work_keys_str_mv AT olierivan transformationalmachinelearninglearninghowtolearnfrommanyrelatedscientificproblems
AT orhoboroghenejokpemei transformationalmachinelearninglearninghowtolearnfrommanyrelatedscientificproblems
AT dashtirtharaj transformationalmachinelearninglearninghowtolearnfrommanyrelatedscientificproblems
AT davisandym transformationalmachinelearninglearninghowtolearnfrommanyrelatedscientificproblems
AT soldatovalarisan transformationalmachinelearninglearninghowtolearnfrommanyrelatedscientificproblems
AT vanschorenjoaquin transformationalmachinelearninglearninghowtolearnfrommanyrelatedscientificproblems
AT kingrossd transformationalmachinelearninglearninghowtolearnfrommanyrelatedscientificproblems