Big data need big theory too

The current interest in big data, machine learning and data analytics has generated the widespread impression that such methods are capable of solving most problems without the need for conventional scientific methods of inquiry. Interest in these methods is intensifying, accelerated by the ease wit...

Bibliographic Details
Main authors: Coveney, Peter V., Dougherty, Edward R., Highfield, Roger R.
Format: Online Article Text
Language: English
Published: The Royal Society 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5052735/
https://www.ncbi.nlm.nih.gov/pubmed/27698035
http://dx.doi.org/10.1098/rsta.2016.0153
_version_ 1782458283461705728
author Coveney, Peter V.
Dougherty, Edward R.
Highfield, Roger R.
author_facet Coveney, Peter V.
Dougherty, Edward R.
Highfield, Roger R.
author_sort Coveney, Peter V.
collection PubMed
description The current interest in big data, machine learning and data analytics has generated the widespread impression that such methods are capable of solving most problems without the need for conventional scientific methods of inquiry. Interest in these methods is intensifying, accelerated by the ease with which digitized data can be acquired in virtually all fields of endeavour, from science, healthcare and cybersecurity to economics, social sciences and the humanities. In multiscale modelling, machine learning appears to provide a shortcut to reveal correlations of arbitrary complexity between processes at the atomic, molecular, meso- and macroscales. Here, we point out the weaknesses of pure big data approaches with particular focus on biology and medicine, which fail to provide conceptual accounts for the processes to which they are applied. No matter their ‘depth’ and the sophistication of data-driven methods, such as artificial neural nets, in the end they merely fit curves to existing data. Not only do these methods invariably require far larger quantities of data than anticipated by big data aficionados in order to produce statistically reliable results, but they can also fail in circumstances beyond the range of the data used to train them because they are not designed to model the structural characteristics of the underlying system. We argue that it is vital to use theory as a guide to experimental design for maximal efficiency of data collection and to produce reliable predictive models and conceptual knowledge. Rather than continuing to fund, pursue and promote ‘blind’ big data projects with massive budgets, we call for more funding to be allocated to the elucidation of the multiscale and stochastic processes controlling the behaviour of complex systems, including those of life, medicine and healthcare. This article is part of the themed issue ‘Multiscale modelling at the physics–chemistry–biology interface’.
format Online
Article
Text
id pubmed-5052735
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher The Royal Society
record_format MEDLINE/PubMed
spelling pubmed-50527352016-11-13 Big data need big theory too Coveney, Peter V. Dougherty, Edward R. Highfield, Roger R. Philos Trans A Math Phys Eng Sci Articles The current interest in big data, machine learning and data analytics has generated the widespread impression that such methods are capable of solving most problems without the need for conventional scientific methods of inquiry. Interest in these methods is intensifying, accelerated by the ease with which digitized data can be acquired in virtually all fields of endeavour, from science, healthcare and cybersecurity to economics, social sciences and the humanities. In multiscale modelling, machine learning appears to provide a shortcut to reveal correlations of arbitrary complexity between processes at the atomic, molecular, meso- and macroscales. Here, we point out the weaknesses of pure big data approaches with particular focus on biology and medicine, which fail to provide conceptual accounts for the processes to which they are applied. No matter their ‘depth’ and the sophistication of data-driven methods, such as artificial neural nets, in the end they merely fit curves to existing data. Not only do these methods invariably require far larger quantities of data than anticipated by big data aficionados in order to produce statistically reliable results, but they can also fail in circumstances beyond the range of the data used to train them because they are not designed to model the structural characteristics of the underlying system. We argue that it is vital to use theory as a guide to experimental design for maximal efficiency of data collection and to produce reliable predictive models and conceptual knowledge. 
Rather than continuing to fund, pursue and promote ‘blind’ big data projects with massive budgets, we call for more funding to be allocated to the elucidation of the multiscale and stochastic processes controlling the behaviour of complex systems, including those of life, medicine and healthcare. This article is part of the themed issue ‘Multiscale modelling at the physics–chemistry–biology interface’. The Royal Society 2016-11-13 /pmc/articles/PMC5052735/ /pubmed/27698035 http://dx.doi.org/10.1098/rsta.2016.0153 Text en © 2015 The Authors. http://creativecommons.org/licenses/by/4.0/ Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.
spellingShingle Articles
Coveney, Peter V.
Dougherty, Edward R.
Highfield, Roger R.
Big data need big theory too
title Big data need big theory too
title_full Big data need big theory too
title_fullStr Big data need big theory too
title_full_unstemmed Big data need big theory too
title_short Big data need big theory too
title_sort big data need big theory too
topic Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5052735/
https://www.ncbi.nlm.nih.gov/pubmed/27698035
http://dx.doi.org/10.1098/rsta.2016.0153
work_keys_str_mv AT coveneypeterv bigdataneedbigtheorytoo
AT doughertyedwardr bigdataneedbigtheorytoo
AT highfieldrogerr bigdataneedbigtheorytoo