Upscaling Statistical Patterns from Reduced Storage in Social and Life Science Big Datasets
Recent technological and computational advances have enabled the collection of data at an unprecedented rate. On the one hand, the large amount of data suddenly available has opened up new opportunities for new data-driven research but, on the other hand, it has brought into light new obstacles and...
Main authors: | Garlaschi, Stefano; Fochesato, Anna; Tovo, Anna |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7597173/ https://www.ncbi.nlm.nih.gov/pubmed/33286853 http://dx.doi.org/10.3390/e22101084 |
_version_ | 1783602282069229568 |
---|---|
author | Garlaschi, Stefano Fochesato, Anna Tovo, Anna |
author_facet | Garlaschi, Stefano Fochesato, Anna Tovo, Anna |
author_sort | Garlaschi, Stefano |
collection | PubMed |
description | Recent technological and computational advances have enabled the collection of data at an unprecedented rate. On the one hand, the large amount of data suddenly available has opened up new opportunities for new data-driven research but, on the other hand, it has brought into light new obstacles and challenges related to storage and analysis limits. Here, we strengthen an upscaling approach borrowed from theoretical ecology that allows us to infer with small errors relevant patterns of a dataset in its entirety, although only a limited fraction of it has been analysed. In particular we show that, after reducing the input amount of information on the system under study, by applying our framework it is still possible to recover two statistical patterns of interest of the entire dataset. Tested against big ecological, human activity and genomics data, our framework was successful in the reconstruction of global statistics related to both the number of types and their abundances while starting from limited presence/absence information on small random samples of the datasets. These results pave the way for future applications of our procedure in different life science contexts, from social activities to natural ecosystems. |
format | Online Article Text |
id | pubmed-7597173 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-75971732020-11-09 Upscaling Statistical Patterns from Reduced Storage in Social and Life Science Big Datasets Garlaschi, Stefano Fochesato, Anna Tovo, Anna Entropy (Basel) Article Recent technological and computational advances have enabled the collection of data at an unprecedented rate. On the one hand, the large amount of data suddenly available has opened up new opportunities for new data-driven research but, on the other hand, it has brought into light new obstacles and challenges related to storage and analysis limits. Here, we strengthen an upscaling approach borrowed from theoretical ecology that allows us to infer with small errors relevant patterns of a dataset in its entirety, although only a limited fraction of it has been analysed. In particular we show that, after reducing the input amount of information on the system under study, by applying our framework it is still possible to recover two statistical patterns of interest of the entire dataset. Tested against big ecological, human activity and genomics data, our framework was successful in the reconstruction of global statistics related to both the number of types and their abundances while starting from limited presence/absence information on small random samples of the datasets. These results pave the way for future applications of our procedure in different life science contexts, from social activities to natural ecosystems. MDPI 2020-09-26 /pmc/articles/PMC7597173/ /pubmed/33286853 http://dx.doi.org/10.3390/e22101084 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Garlaschi, Stefano Fochesato, Anna Tovo, Anna Upscaling Statistical Patterns from Reduced Storage in Social and Life Science Big Datasets |
title | Upscaling Statistical Patterns from Reduced Storage in Social and Life Science Big Datasets |
title_full | Upscaling Statistical Patterns from Reduced Storage in Social and Life Science Big Datasets |
title_fullStr | Upscaling Statistical Patterns from Reduced Storage in Social and Life Science Big Datasets |
title_full_unstemmed | Upscaling Statistical Patterns from Reduced Storage in Social and Life Science Big Datasets |
title_short | Upscaling Statistical Patterns from Reduced Storage in Social and Life Science Big Datasets |
title_sort | upscaling statistical patterns from reduced storage in social and life science big datasets |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7597173/ https://www.ncbi.nlm.nih.gov/pubmed/33286853 http://dx.doi.org/10.3390/e22101084 |