
Upscaling Statistical Patterns from Reduced Storage in Social and Life Science Big Datasets

Recent technological and computational advances have enabled the collection of data at an unprecedented rate. On the one hand, the large amount of data suddenly available has opened up new opportunities for new data-driven research but, on the other hand, it has brought into light new obstacles and...

Bibliographic Details
Main authors: Garlaschi, Stefano, Fochesato, Anna, Tovo, Anna
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7597173/
https://www.ncbi.nlm.nih.gov/pubmed/33286853
http://dx.doi.org/10.3390/e22101084
_version_ 1783602282069229568
author Garlaschi, Stefano
Fochesato, Anna
Tovo, Anna
collection PubMed
description Recent technological and computational advances have enabled the collection of data at an unprecedented rate. On the one hand, the large amount of data suddenly available has opened up new opportunities for new data-driven research but, on the other hand, it has brought into light new obstacles and challenges related to storage and analysis limits. Here, we strengthen an upscaling approach borrowed from theoretical ecology that allows us to infer with small errors relevant patterns of a dataset in its entirety, although only a limited fraction of it has been analysed. In particular we show that, after reducing the input amount of information on the system under study, by applying our framework it is still possible to recover two statistical patterns of interest of the entire dataset. Tested against big ecological, human activity and genomics data, our framework was successful in the reconstruction of global statistics related to both the number of types and their abundances while starting from limited presence/absence information on small random samples of the datasets. These results pave the way for future applications of our procedure in different life science contexts, from social activities to natural ecosystems.
format Online
Article
Text
id pubmed-7597173
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7597173 2020-11-09 Upscaling Statistical Patterns from Reduced Storage in Social and Life Science Big Datasets Garlaschi, Stefano Fochesato, Anna Tovo, Anna Entropy (Basel) Article MDPI 2020-09-26 /pmc/articles/PMC7597173/ /pubmed/33286853 http://dx.doi.org/10.3390/e22101084 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Upscaling Statistical Patterns from Reduced Storage in Social and Life Science Big Datasets
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7597173/
https://www.ncbi.nlm.nih.gov/pubmed/33286853
http://dx.doi.org/10.3390/e22101084