
Storing, combining and analysing turkey experimental data in the Big Data era

With the increasing availability of large amounts of data in the livestock domain, we face the challenge to store, combine and analyse these data efficiently. With this study, we explored the use of a data lake for storing and analysing data to improve scalability and interoperability. Data originat...

Bibliographic Details
Main Authors: Schokker, D., Athanasiadis, I. N., Visser, B., Veerkamp, R. F., Kamphuis, C.
Format: Online Article Text
Language: English
Published: Cambridge University Press 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7538337/
https://www.ncbi.nlm.nih.gov/pubmed/32624081
http://dx.doi.org/10.1017/S175173112000155X
_version_ 1783590848452100096
author Schokker, D.
Athanasiadis, I. N.
Visser, B.
Veerkamp, R. F.
Kamphuis, C.
collection PubMed
description With the increasing availability of large amounts of data in the livestock domain, we face the challenge to store, combine and analyse these data efficiently. With this study, we explored the use of a data lake for storing and analysing data to improve scalability and interoperability. Data originated from a 2-day animal experiment in which the gait score of approximately 200 turkeys was determined through visual inspection by an expert. Additionally, inertial measurement units (IMUs), a 3D-video camera and a force plate (FP) were installed to explore the effectiveness of these sensors in automating the visual gait scoring. We deployed a data lake using the IMU and FP data of a single day of that animal experiment. This encompasses data from 84 turkeys, which we preprocessed by performing an ‘extract, transform and load’ (ETL-) procedure. To test scalability of the ETL-procedure, we simulated increasing volumes of the available data from this animal experiment and computed the ‘wall time’ (elapsed real time) for converting FP data into comma-separated files and storing these files. With a simulated data set of 30 000 turkeys, the wall time was reduced from 1 h to less than 15 min when 12 cores were used instead of 1 core. This demonstrated the ETL-procedure to be scalable. Subsequently, a machine learning (ML) pipeline was developed to test the potential of a data lake to automatically distinguish between two classes, that is, very bad gait scores versus other scores. In conclusion, we have set up a dedicated customized data lake, loaded data and developed a prediction model via the creation of an ML pipeline. A data lake appears to be a useful tool to face the challenge of storing, combining and analysing increasing volumes of data of varying nature in an effective manner.
format Online
Article
Text
id pubmed-7538337
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Cambridge University Press
record_format MEDLINE/PubMed
spelling pubmed-7538337 2020-10-15 Storing, combining and analysing turkey experimental data in the Big Data era Schokker, D. Athanasiadis, I. N. Visser, B. Veerkamp, R. F. Kamphuis, C. Animal Research Article Cambridge University Press 2020-11 2020-06-22 /pmc/articles/PMC7538337/ /pubmed/32624081 http://dx.doi.org/10.1017/S175173112000155X Text en © The Animal Consortium 2020 http://creativecommons.org/licenses/by/4.0/ This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Storing, combining and analysing turkey experimental data in the Big Data era
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7538337/
https://www.ncbi.nlm.nih.gov/pubmed/32624081
http://dx.doi.org/10.1017/S175173112000155X