
The role of individual variability on the predictive performance of machine learning applied to large bio-logging datasets

Bibliographic details
Main authors: Chimienti, Marianna, Kato, Akiko, Hicks, Olivia, Angelier, Frédéric, Beaulieu, Michaël, Ouled-Cheikh, Jazel, Marciau, Coline, Raclot, Thierry, Tucker, Meagan, Wisniewska, Danuta Maria, Chiaradia, André, Ropert-Coudert, Yan
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9672113/
https://www.ncbi.nlm.nih.gov/pubmed/36396680
http://dx.doi.org/10.1038/s41598-022-22258-1
collection PubMed
description Animal-borne tagging (bio-logging) generates large and complex datasets. In particular, accelerometer tags, which provide information on the behaviour and energy expenditure of wild animals, produce high-resolution, multi-dimensional data that can be challenging to analyse. We tested the performance of commonly used artificial intelligence tools on datasets of increasing volume and dimensionality. When bio-logging data are collected across several sampling seasons, the resulting datasets are inherently characterized by inter-individual variability, and this information should be considered when predicting behaviour. We integrated unsupervised and supervised machine learning approaches to predict behaviours in two penguin species: the behaviours classified by the unsupervised Expectation Maximisation approach were used to train the supervised Random Forest approach. We assessed the agreement between the two approaches, the performance of Random Forest on unknown data, and the implications for the calculation of energy expenditure. Accounting for behavioural variability resulted in high agreement (> 80%) in behavioural classifications and minimal differences in energy expenditure estimates. However, some outliers with < 70% agreement highlighted how behaviours with similar signals are confused. We advise the broad bio-logging community to be cautious when upscaling predictions from these large datasets, as this might lead to less accurate estimates of behaviour and energy expenditure.
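The description outlines an evaluation pipeline in which behaviours labelled by Expectation Maximisation serve as training labels for Random Forest, and the two sets of classifications are then compared. The Python sketch below illustrates two pieces of such an evaluation under stated assumptions: agreement is taken to be simple per-sample percentage agreement (overall and per behaviour), and "unknown data" is approximated by holding out all samples from one individual at a time. Neither the metric nor the splitting scheme is confirmed by this record; the function names and behaviour labels are hypothetical, and the classifiers themselves are not shown.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterator, List, Sequence, Tuple


def percent_agreement(reference: Sequence[str], predicted: Sequence[str]) -> float:
    """Share (in %) of samples to which both classifiers assign the same behaviour.

    Assumption: agreement = simple per-sample percentage agreement.
    """
    if len(reference) != len(predicted) or not reference:
        raise ValueError("label sequences must be non-empty and of equal length")
    matches = sum(r == p for r, p in zip(reference, predicted))
    return 100.0 * matches / len(reference)


def per_behaviour_agreement(reference: Sequence[str], predicted: Sequence[str]) -> Dict[str, float]:
    """Agreement computed separately for each reference behaviour.

    Low values flag behaviours that the second classifier confuses with others.
    """
    totals: Counter = Counter(reference)
    hits: Counter = Counter(r for r, p in zip(reference, predicted) if r == p)
    return {behaviour: 100.0 * hits[behaviour] / n for behaviour, n in totals.items()}


def leave_one_individual_out(
    individual_ids: Sequence[Hashable],
) -> Iterator[Tuple[Hashable, List[int], List[int]]]:
    """Yield (held-out individual, train indices, test indices).

    All samples from one animal are kept out of training, so the test score
    reflects performance on an individual the model has never seen.
    """
    groups: Dict[Hashable, List[int]] = defaultdict(list)
    for i, ind in enumerate(individual_ids):
        groups[ind].append(i)
    for held_out, test_idx in groups.items():
        train_idx = [i for ind, idx in groups.items() if ind != held_out for i in idx]
        yield held_out, sorted(train_idx), test_idx


if __name__ == "__main__":
    # Hypothetical example: EM-derived labels (reference) vs Random Forest predictions.
    em_labels = ["dive", "dive", "swim", "swim", "rest", "rest"]
    rf_labels = ["dive", "swim", "swim", "swim", "rest", "rest"]
    individuals = ["A", "A", "B", "B", "C", "C"]
    # 5 of the 6 labels match
    print(f"overall agreement: {percent_agreement(em_labels, rf_labels):.1f}%")
    print(per_behaviour_agreement(em_labels, rf_labels))
    for held_out, train_idx, test_idx in leave_one_individual_out(individuals):
        print(held_out, train_idx, test_idx)
```

Per-behaviour agreement is included because the abstract notes that behaviours with similar signals are confused; a low per-behaviour value is one simple way such confusion would surface even when overall agreement stays above 80%.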
id pubmed-9672113
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
source Sci Rep (Scientific Reports), Nature Publishing Group UK, 2022-11-17
rights © The Author(s) 2022. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.