On-the-fly learning for visual search of large-scale image and video datasets

The objective of this work is to visually search large-scale video datasets for semantic entities specified by a text query. The paradigm we explore is constructing visual models for such semantic entities on-the-fly, i.e. at run time, by using an image search engine to source visual training data f...

Bibliographic Details
Main Authors:	Chatfield, Ken, Arandjelović, Relja, Parkhi, Omkar, Zisserman, Andrew
Format:	Online Article Text
Language:	English
Published:	Springer London 2015
Subjects:	Regular Paper
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4498639/ https://www.ncbi.nlm.nih.gov/pubmed/26191469 http://dx.doi.org/10.1007/s13735-015-0077-0

_version_	1782380651539857408
author	Chatfield, Ken Arandjelović, Relja Parkhi, Omkar Zisserman, Andrew
author_facet	Chatfield, Ken Arandjelović, Relja Parkhi, Omkar Zisserman, Andrew
author_sort	Chatfield, Ken
collection	PubMed
description	The objective of this work is to visually search large-scale video datasets for semantic entities specified by a text query. The paradigm we explore is constructing visual models for such semantic entities on-the-fly, i.e. at run time, by using an image search engine to source visual training data for the text query. The approach combines fast and accurate learning and retrieval, and enables videos to be returned within seconds of specifying a query. We describe three classes of queries, each with its associated visual search method: object instances (using a bag of visual words approach for matching); object categories (using a discriminative classifier for ranking key frames); and faces (using a discriminative classifier for ranking face tracks). We discuss the features suitable for each class of query, for example Fisher vectors or features derived from convolutional neural networks (CNNs), and how these choices impact on the trade-off between three important performance measures for a real-time system of this kind, namely: (1) accuracy, (2) memory footprint, and (3) speed. We also discuss and compare a number of important implementation issues, such as how to remove ‘outliers’ in the downloaded images efficiently, and how to best obtain a single descriptor for a face track. We also sketch the architecture of the real-time on-the-fly system. Quantitative results are given on a number of large-scale image and video benchmarks (e.g. TRECVID INS, MIRFLICKR-1M), and we further demonstrate the performance and real-world applicability of our methods over a dataset sourced from 10,000 h of unedited footage from BBC News, comprising 5M+ key frames.
format	Online Article Text
id	pubmed-4498639
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	Springer London
record_format	MEDLINE/PubMed
spelling	pubmed-4498639 2015-07-15 On-the-fly learning for visual search of large-scale image and video datasets Chatfield, Ken Arandjelović, Relja Parkhi, Omkar Zisserman, Andrew Int J Multimed Inf Retr Regular Paper The objective of this work is to visually search large-scale video datasets for semantic entities specified by a text query. The paradigm we explore is constructing visual models for such semantic entities on-the-fly, i.e. at run time, by using an image search engine to source visual training data for the text query. The approach combines fast and accurate learning and retrieval, and enables videos to be returned within seconds of specifying a query. We describe three classes of queries, each with its associated visual search method: object instances (using a bag of visual words approach for matching); object categories (using a discriminative classifier for ranking key frames); and faces (using a discriminative classifier for ranking face tracks). We discuss the features suitable for each class of query, for example Fisher vectors or features derived from convolutional neural networks (CNNs), and how these choices impact on the trade-off between three important performance measures for a real-time system of this kind, namely: (1) accuracy, (2) memory footprint, and (3) speed. We also discuss and compare a number of important implementation issues, such as how to remove ‘outliers’ in the downloaded images efficiently, and how to best obtain a single descriptor for a face track. We also sketch the architecture of the real-time on-the-fly system. Quantitative results are given on a number of large-scale image and video benchmarks (e.g. TRECVID INS, MIRFLICKR-1M), and we further demonstrate the performance and real-world applicability of our methods over a dataset sourced from 10,000 h of unedited footage from BBC News, comprising 5M+ key frames.
Springer London 2015-03-22 2015 /pmc/articles/PMC4498639/ /pubmed/26191469 http://dx.doi.org/10.1007/s13735-015-0077-0 Text en © The Author(s) 2015 https://creativecommons.org/licenses/by/4.0/ Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.
spellingShingle	Regular Paper Chatfield, Ken Arandjelović, Relja Parkhi, Omkar Zisserman, Andrew On-the-fly learning for visual search of large-scale image and video datasets
title	On-the-fly learning for visual search of large-scale image and video datasets
title_full	On-the-fly learning for visual search of large-scale image and video datasets
title_fullStr	On-the-fly learning for visual search of large-scale image and video datasets
title_full_unstemmed	On-the-fly learning for visual search of large-scale image and video datasets
title_short	On-the-fly learning for visual search of large-scale image and video datasets
title_sort	on-the-fly learning for visual search of large-scale image and video datasets
topic	Regular Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4498639/ https://www.ncbi.nlm.nih.gov/pubmed/26191469 http://dx.doi.org/10.1007/s13735-015-0077-0
work_keys_str_mv	AT chatfieldken ontheflylearningforvisualsearchoflargescaleimageandvideodatasets AT arandjelovicrelja ontheflylearningforvisualsearchoflargescaleimageandvideodatasets AT parkhiomkar ontheflylearningforvisualsearchoflargescaleimageandvideodatasets AT zissermanandrew ontheflylearningforvisualsearchoflargescaleimageandvideodatasets
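
The description field in this record outlines the category-search paradigm: training data for a text query is sourced at run time, and a discriminative classifier then ranks key frames. A minimal Python sketch of that idea follows, under clearly stated assumptions. The feature vectors are synthetic stand-ins for real image descriptors, the fixed pool of negative examples is an assumption not stated in this record, and all function names are hypothetical. The hinge-loss trainer is a plain subgradient-descent illustration, not the authors' implementation.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit Euclidean length."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)


def train_linear_classifier(pos, neg, reg=1e-2, lr=0.1, epochs=200):
    """Fit a linear hinge-loss classifier with full-batch subgradient descent."""
    x = np.vstack([pos, neg])
    y = np.concatenate([np.ones(len(pos)), -np.ones(len(neg))])
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(epochs):
        # Only samples that violate the margin contribute to the subgradient.
        active = y * (x @ w + b) < 1.0
        grad_w = reg * w - (y[active][:, None] * x[active]).sum(axis=0) / len(y)
        grad_b = -y[active].sum() / len(y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def rank_key_frames(w, b, frame_features, top_k=10):
    """Return indices and scores of the top_k highest-scoring key frames."""
    scores = frame_features @ w + b
    order = np.argsort(-scores)[:top_k]
    return order, scores[order]


def demo():
    """Synthetic end-to-end run; no real images or search engine involved."""
    rng = np.random.default_rng(0)
    dim = 64
    concept = np.zeros(dim)
    concept[0] = 3.0  # hypothetical direction representing the queried entity

    def sample_concept(n):
        return l2_normalize(concept + 0.1 * rng.standard_normal((n, dim)))

    def sample_background(n):
        return l2_normalize(rng.standard_normal((n, dim)))

    # Assumption: positives stand in for images downloaded for the text query,
    # negatives for a fixed, precomputed pool of unrelated images.
    positives = sample_concept(20)
    negatives = sample_background(50)

    # Dataset to search: 90 distractor frames followed by 10 relevant frames.
    frames = np.vstack([sample_background(90), sample_concept(10)])

    w, b = train_linear_classifier(positives, negatives)
    return rank_key_frames(w, b, frames, top_k=10)
```

Calling `demo()` returns the indices of the ten highest-scoring synthetic key frames together with their scores. In a real system the dataset features would be computed offline, so that only the classifier training and the scoring pass happen at query time.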

Similar Items