Optimizations for the EcoPod field identification tool

BACKGROUND: We sketch our species identification tool for palm-sized computers, which helps knowledgeable observers with census activities. An algorithm turns an identification matrix into a minimal-length series of questions that guide the operator towards identification. Historic observation data fr...

Bibliographic Details
Main authors: Manoharan, Aswath; Stamberger, Jeannie; Yu, YuanYuan; Paepcke, Andreas
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2322985/
https://www.ncbi.nlm.nih.gov/pubmed/18366649
http://dx.doi.org/10.1186/1471-2105-9-150
_version_ 1782152608289390592
author Manoharan, Aswath
Stamberger, Jeannie
Yu, YuanYuan
Paepcke, Andreas
collection PubMed
description BACKGROUND: We sketch our species identification tool for palm-sized computers, which helps knowledgeable observers with census activities. An algorithm turns an identification matrix into a minimal-length series of questions that guide the operator towards identification. Historic observation data from the census geographic area helps minimize question volume. We explore how much historic data is required to boost performance, and whether the use of history negatively impacts identification of rare species. We also explore how characteristics of the matrix interact with the algorithm, and how best to predict the probability of observing a previously unseen species.
RESULTS: Point counts of birds taken at Stanford University's Jasper Ridge Biological Preserve between 2000 and 2005 were used to examine the algorithm. A computer identified species by correctly answering, and counting, the algorithm's questions. We also explored how the character density of the key matrix and the theoretical minimum number of questions for each bird in the matrix influenced the algorithm. Our investigation of the required probability smoothing determined whether Laplace smoothing of observation probabilities was sufficient, or whether the more complex Good-Turing technique was required.
CONCLUSION: Historic data improved identification speed, but the improvement was limited to the top 25% most frequently observed birds. For rare birds, the history-based algorithms did not impose a noticeable penalty in the number of questions required for identification. For our dataset, neither the age of the historic data nor the number of observation years affected the algorithm. Density of characters for different taxa in the identification matrix did not affect the algorithms. Intrinsic differences in identifying different birds did affect the algorithm, but they affected the baseline method, which does not use historic data, to exactly the same degree. We found that Laplace smoothing performed better for rare species than Simple Good-Turing, and that, contrary to expectation, the technique did not then adversely affect identification performance for frequently observed birds.
format Text
id pubmed-2322985
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2322985 2008-04-18 Optimizations for the EcoPod field identification tool Manoharan, Aswath Stamberger, Jeannie Yu, YuanYuan Paepcke, Andreas BMC Bioinformatics Research Article BioMed Central 2008-03-17 /pmc/articles/PMC2322985/ /pubmed/18366649 http://dx.doi.org/10.1186/1471-2105-9-150 Text en Copyright © 2008 Manoharan et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Optimizations for the EcoPod field identification tool
topic Research Article
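
The abstract mentions Laplace smoothing of observation probabilities and a question series derived from an identification matrix, but this record does not give the algorithm itself. The following is only a minimal sketch under stated assumptions: the function names, the species and character names, the counts, and the "split the probability mass evenly" heuristic are all hypothetical illustrations and are not taken from the EcoPod paper.

```python
from collections import Counter


def laplace_probabilities(counts, species, alpha=1.0):
    """Add-alpha (Laplace) smoothing: every species in the matrix gets a
    nonzero probability, even if it never appears in the historic counts."""
    total = sum(counts.get(s, 0) for s in species) + alpha * len(species)
    return {s: (counts.get(s, 0) + alpha) / total for s in species}


def best_question(matrix, candidates, probs):
    """Hypothetical greedy heuristic (an assumption, not the paper's method):
    pick the character whose yes/no answer splits the remaining probability
    mass most evenly."""
    mass = sum(probs[s] for s in candidates)

    def imbalance(character):
        yes_mass = sum(probs[s] for s in candidates if matrix[s][character])
        return abs(yes_mass - mass / 2)

    characters = list(next(iter(matrix.values())).keys())
    return min(characters, key=imbalance)


# Invented illustrative data: "Rare Warbler" has no historic observations.
history = Counter({"Acorn Woodpecker": 6, "Dark-eyed Junco": 3})
matrix = {
    "Acorn Woodpecker": {"red_crown": True, "ground_forager": False},
    "Dark-eyed Junco": {"red_crown": False, "ground_forager": True},
    "Rare Warbler": {"red_crown": False, "ground_forager": False},
}
species = list(matrix)

probs = laplace_probabilities(history, species)
first_question = best_question(matrix, species, probs)
```

With these invented counts, the unseen species still receives a small nonzero probability, so a history-weighted question order cannot rule it out entirely. That is the property the abstract's comparison of smoothing techniques is concerned with.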