
On the impact of Citizen Science-derived data quality on deep learning based classification in marine images

The evaluation of large amounts of digital image data is of growing importance for biology, including for the exploration and monitoring of marine habitats. However, only a tiny percentage of the image data collected is evaluated by marine biologists who manually interpret and annotate the image con...


Bibliographic Details
Main authors: Langenkämper, Daniel, Simon-Lledó, Erik, Hosking, Brett, Jones, Daniel O. B., Nattkemper, Tim W.
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6561570/
https://www.ncbi.nlm.nih.gov/pubmed/31188894
http://dx.doi.org/10.1371/journal.pone.0218086
_version_ 1783426153032187904
author Langenkämper, Daniel
Simon-Lledó, Erik
Hosking, Brett
Jones, Daniel O. B.
Nattkemper, Tim W.
author_facet Langenkämper, Daniel
Simon-Lledó, Erik
Hosking, Brett
Jones, Daniel O. B.
Nattkemper, Tim W.
author_sort Langenkämper, Daniel
collection PubMed
description The evaluation of large amounts of digital image data is of growing importance for biology, including for the exploration and monitoring of marine habitats. However, only a tiny percentage of the image data collected is evaluated by marine biologists who manually interpret and annotate the image contents, which can be slow and laborious. In order to overcome the bottleneck in image annotation, two strategies are increasingly proposed: “citizen science” and “machine learning”. In this study, we investigated how the combination of citizen science, to detect objects, and machine learning, to classify megafauna, could be used to automate annotation of underwater images. For this purpose, multiple large data sets of citizen science annotations with different degrees of common errors and inaccuracies observed in citizen science data were simulated by modifying “gold standard” annotations done by an experienced marine biologist. The parameters of the simulation were determined on the basis of two citizen science experiments. It allowed us to analyze the relationship between the outcome of a citizen science study and the quality of the classifications of a deep learning megafauna classifier. The results show great potential for combining citizen science with machine learning, provided that the participants are informed precisely about the annotation protocol. Inaccuracies in the position of the annotation had the most substantial influence on the classification accuracy, whereas the size of the marking and false positive detections had a smaller influence.
format Online
Article
Text
id pubmed-6561570
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6561570 2019-06-20 On the impact of Citizen Science-derived data quality on deep learning based classification in marine images Langenkämper, Daniel Simon-Lledó, Erik Hosking, Brett Jones, Daniel O. B. Nattkemper, Tim W. PLoS One Research Article
Public Library of Science 2019-06-12 /pmc/articles/PMC6561570/ /pubmed/31188894 http://dx.doi.org/10.1371/journal.pone.0218086 Text en © 2019 Langenkämper et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Langenkämper, Daniel
Simon-Lledó, Erik
Hosking, Brett
Jones, Daniel O. B.
Nattkemper, Tim W.
On the impact of Citizen Science-derived data quality on deep learning based classification in marine images
title On the impact of Citizen Science-derived data quality on deep learning based classification in marine images
title_full On the impact of Citizen Science-derived data quality on deep learning based classification in marine images
title_fullStr On the impact of Citizen Science-derived data quality on deep learning based classification in marine images
title_full_unstemmed On the impact of Citizen Science-derived data quality on deep learning based classification in marine images
title_short On the impact of Citizen Science-derived data quality on deep learning based classification in marine images
title_sort on the impact of citizen science-derived data quality on deep learning based classification in marine images
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6561570/
https://www.ncbi.nlm.nih.gov/pubmed/31188894
http://dx.doi.org/10.1371/journal.pone.0218086
work_keys_str_mv AT langenkamperdaniel ontheimpactofcitizensciencederiveddataqualityondeeplearningbasedclassificationinmarineimages
AT simonlledoerik ontheimpactofcitizensciencederiveddataqualityondeeplearningbasedclassificationinmarineimages
AT hoskingbrett ontheimpactofcitizensciencederiveddataqualityondeeplearningbasedclassificationinmarineimages
AT jonesdanielob ontheimpactofcitizensciencederiveddataqualityondeeplearningbasedclassificationinmarineimages
AT nattkempertimw ontheimpactofcitizensciencederiveddataqualityondeeplearningbasedclassificationinmarineimages
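
Illustrative note on the description field: the abstract says that citizen science annotations were simulated by modifying "gold standard" annotations with three kinds of error (position inaccuracy, marking size, and false positive detections). The record gives no details of the authors' simulation, so the following is only a minimal sketch of how such a perturbation could look. The `Annotation` type, the function name, the Gaussian noise model, and all parameter names and values are assumptions made for illustration; they are not taken from the paper.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A circular marking on an image (hypothetical structure)."""
    x: float
    y: float
    radius: float
    label: str


def simulate_citizen_annotations(gold, width, height,
                                 pos_sigma=0.0, size_sigma=0.0,
                                 fp_rate=0.0, seed=None):
    """Return a noisy copy of `gold` (assumed noise model, not the paper's).

    pos_sigma  -- std. dev. of Gaussian position jitter, in pixels
    size_sigma -- std. dev. of the relative change in marking radius
    fp_rate    -- false positives added per gold annotation
    """
    rng = random.Random(seed)
    noisy = []
    for a in gold:
        # Jitter the centre and clamp it to the image bounds.
        x = min(max(a.x + rng.gauss(0.0, pos_sigma), 0.0), width)
        y = min(max(a.y + rng.gauss(0.0, pos_sigma), 0.0), height)
        # Scale the radius by a random factor, keeping it at least 1 pixel.
        radius = max(a.radius * (1.0 + rng.gauss(0.0, size_sigma)), 1.0)
        noisy.append(Annotation(x, y, radius, a.label))

    # Add false positives at uniformly random positions, using the mean
    # gold radius as their size and a placeholder "background" label.
    mean_radius = sum(a.radius for a in gold) / len(gold) if gold else 0.0
    for _ in range(round(fp_rate * len(gold))):
        noisy.append(Annotation(rng.uniform(0.0, width),
                                rng.uniform(0.0, height),
                                mean_radius, "background"))
    return noisy


if __name__ == "__main__":
    gold = [Annotation(100.0, 120.0, 20.0, "sponge"),
            Annotation(300.0, 80.0, 15.0, "coral")]
    for a in simulate_citizen_annotations(gold, 640, 480, pos_sigma=10.0,
                                          size_sigma=0.2, fp_rate=0.5, seed=1):
        print(a)
```

Varying one parameter at a time while holding the others at zero would give data sets that differ in a single error type, which is the kind of comparison the abstract describes when it reports that position inaccuracy had the largest influence on classification accuracy.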