The role of individual variability on the predictive performance of machine learning applied to large bio-logging datasets
Format: | Online Article Text |
---|---|
Language: | English |
Published: | Nature Publishing Group UK, 2022 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9672113/ https://www.ncbi.nlm.nih.gov/pubmed/36396680 http://dx.doi.org/10.1038/s41598-022-22258-1 |
author | Chimienti, Marianna; Kato, Akiko; Hicks, Olivia; Angelier, Frédéric; Beaulieu, Michaël; Ouled-Cheikh, Jazel; Marciau, Coline; Raclot, Thierry; Tucker, Meagan; Wisniewska, Danuta Maria; Chiaradia, André; Ropert-Coudert, Yan |
collection | PubMed |
description | Animal-borne tagging (bio-logging) generates large and complex datasets. In particular, accelerometer tags, which provide information on the behaviour and energy expenditure of wild animals, produce high-resolution, multi-dimensional data that can be challenging to analyse. We tested the performance of commonly used artificial intelligence tools on datasets of increasing volume and dimensionality. Because bio-logging data are collected across several sampling seasons, the datasets are inherently characterized by inter-individual variability, and this information should be considered when predicting behaviour. We integrated unsupervised and supervised machine learning approaches to predict behaviours in two penguin species: the behaviours classified by the unsupervised Expectation Maximisation approach were used to train the supervised Random Forest approach. We assessed the agreement between the two approaches, the performance of Random Forest on unknown data, and the implications for calculating energy expenditure. Considering behavioural variability resulted in high agreement (> 80%) in behavioural classifications and minimal differences in energy expenditure estimates. However, some outliers with < 70% agreement highlighted how behaviours with similar signals are confused. We advise the broader bio-logging community working with these large datasets to be cautious when upscaling predictions, as this might lead to less accurate estimates of behaviour and energy expenditure. |
id | pubmed-9672113 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | Sci Rep |
published | Nature Publishing Group UK, 2022-11-17 |
rights | © The Author(s) 2022. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. |
topic | Article |
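The abstract describes a two-stage pipeline: behaviours are first classified by unsupervised Expectation Maximisation, those labels train a Random Forest, and the two label sets are then compared for agreement, with some individuals falling below 70%. A minimal, standard-library-only Python sketch of two pieces of that idea follows, under stated assumptions: EM for a one-dimensional Gaussian mixture over a single hypothetical acceleration feature, and per-individual percentage agreement between two label sets with a flag for individuals below a threshold. The function names (`em_gmm_1d`, `agreement_by_individual`, `flag_low_agreement`), the single-feature input, the quantile initialisation and the 70% default are illustrative assumptions, not the authors' code. The Random Forest step is omitted; a library implementation would normally be used for it.

```python
"""Hypothetical sketch (not the authors' code): 1-D Gaussian-mixture EM on an
acceleration feature, plus per-individual agreement between two label sets."""
import math
from collections import defaultdict


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(xs, k=2, n_iter=100):
    """Fit a k-component 1-D Gaussian mixture by Expectation Maximisation.

    Returns (weights, means, variances, labels); each label is the index of
    the component with the highest responsibility for that point.
    """
    n = len(xs)
    ordered = sorted(xs)
    # Deterministic start (an assumption): means at evenly spaced quantiles.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    mu = sum(xs) / n
    variances = [max(sum((x - mu) ** 2 for x in xs) / n, 1e-6)] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability that each component generated each point.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # guard against all densities underflowing
            resp.append([pj / total for pj in p])
        # M-step: re-estimate mixing weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, 1e-6)  # floor keeps densities finite
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, means, variances, labels


def agreement_by_individual(ids, labels_a, labels_b):
    """Percentage of epochs on which two label sets agree, per individual."""
    hits = defaultdict(int)
    counts = defaultdict(int)
    for ind, a, b in zip(ids, labels_a, labels_b):
        counts[ind] += 1
        hits[ind] += int(a == b)
    return {ind: 100.0 * hits[ind] / counts[ind] for ind in counts}


def flag_low_agreement(agreement, threshold=70.0):
    """Individuals whose agreement (in %) falls below the threshold."""
    return sorted(ind for ind, pct in agreement.items() if pct < threshold)
```

The agreement comparison assumes both label sets use the same encoding, as they would if a classifier were trained directly on the EM labels. For example, EM labels could be compared with a classifier's predictions on held-out individuals; any individual returned by `flag_low_agreement` would merit a closer look at which behaviours with similar signals are being confused.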