Love Hard.: [PaperReading] Efficient visual search of videos cast as text retrieval

Title: Efficient visual search of videos cast as text retrieval
Author: J. Sivic and A. Zisserman
Venue: IEEE TPAMI, 2009

Goal: given a query object, find its occurrences in a pre-processed video database, in a way similar to text retrieval.

It adopts the text-retrieval framework: TF-IDF weighting and removal of stop words.

a frame vs. a document
visual word vs. word

the offline part:

For each keyframe, detect affine covariant regions (Shape Adapted and Maximally Stable) and represent each region by a 128-dimensional SIFT descriptor.
Quantize the descriptors into visual words by K-means (SA: k=6000; MS: k=10000).
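To make the quantization step concrete, here is a minimal sketch of assigning descriptors to visual words once the K-means centroids are known. The function names and the toy 2-D descriptors are my own assumptions for illustration (the real descriptors are 128-dimensional SIFT vectors), and plain Euclidean distance is a simplification I picked, not necessarily the distance used in the paper.

```python
def nearest_word(descriptor, centroids):
    """Return the index of the centroid closest to `descriptor` (squared L2)."""
    best_idx, best_dist = -1, float("inf")
    for idx, centroid in enumerate(centroids):
        dist = sum((a - b) ** 2 for a, b in zip(descriptor, centroid))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def quantize(descriptors, centroids):
    """Map every descriptor of a keyframe to its visual word id."""
    return [nearest_word(d, centroids) for d in descriptors]


# Toy example: two centroids stand in for the learned vocabulary (assumption).
toy_centroids = [[0.0, 0.0], [1.0, 1.0]]
toy_descriptors = [[0.1, 0.0], [0.9, 1.1], [0.2, 0.1]]
toy_words = quantize(toy_descriptors, toy_centroids)
```

After this step a keyframe is just a bag of visual word ids, which is what makes the text-retrieval machinery applicable.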

Now each frame is represented by a tf-idf vector over visual words. (Words that occur frequently across documents are thrown out as stop words.)
Use an inverted file to index the frames for fast retrieval.
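The indexing step can be sketched roughly as follows, using the standard tf-idf weight (word count in the frame divided by frame length, times log of number of frames over number of frames containing the word). The name `build_index`, the `stop_fraction` parameter and its value, and the toy frames are assumptions of mine; the notes above only say that frequent words are stopped, not what fraction.

```python
import math
from collections import Counter, defaultdict


def build_index(frames, stop_fraction=0.05):
    """frames: dict frame_id -> list of visual word ids.

    Returns (inverted, idf, stop), where inverted maps
    word -> {frame_id: tf-idf weight}.
    """
    n_frames = len(frames)
    total = Counter()  # how often each word occurs overall
    df = Counter()     # in how many frames each word occurs
    for words in frames.values():
        total.update(words)
        df.update(set(words))

    # Stop list: the most frequent words (fraction is an illustrative guess).
    n_stop = int(len(total) * stop_fraction)
    stop = {w for w, _ in total.most_common(n_stop)}

    idf = {w: math.log(n_frames / df[w]) for w in df if w not in stop}

    inverted = defaultdict(dict)
    for fid, words in frames.items():
        kept = [w for w in words if w not in stop]
        for w, count in Counter(kept).items():
            inverted[w][fid] = (count / len(kept)) * idf[w]
    return inverted, idf, stop


# Toy database: word 9 appears everywhere, so it ends up on the stop list.
toy_frames = {"f1": [1, 1, 2, 9], "f2": [2, 3, 9], "f3": [3, 9, 9]}
inverted, idf, stop = build_index(toy_frames, stop_fraction=0.25)
```

The point of the inverted file is that a query only has to touch the posting lists of the words it contains, not every frame in the database.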

the online part:

Determine the visual words (vw) within the query region.
Use the vw frequencies to first retrieve the top-N keyframes.
Then re-rank them by considering the spatial consistency of the region of interest.
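The two online stages above could look something like the sketch below. It is a simplified illustration under my own assumptions, not the paper's exact procedure: the first stage scores frames with an un-normalized dot product over the inverted file, and the spatial check gives a match one vote for every other match that is among its k nearest neighbours in both the query region and the frame. All function names, the value of `k`, and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def rank_frames(query_vec, inverted, top_n=10):
    """query_vec: word -> weight; inverted: word -> {frame_id: weight}."""
    scores = defaultdict(float)
    for word, q_weight in query_vec.items():
        for fid, f_weight in inverted.get(word, {}).items():
            scores[fid] += q_weight * f_weight
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


def k_nearest(points, i, k):
    """Indices of the k points closest to points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return set(others[:k])


def spatial_votes(matches, k=2):
    """matches: list of ((xq, yq), (xf, yf)) matched region centres."""
    q_pts = [m[0] for m in matches]
    f_pts = [m[1] for m in matches]
    return sum(
        len(k_nearest(q_pts, i, k) & k_nearest(f_pts, i, k))
        for i in range(len(matches))
    )


def rerank(shortlist, matches_by_frame, k=2):
    """Re-order the shortlisted frames by their spatial-consistency votes."""
    return sorted(
        shortlist,
        key=lambda fid: spatial_votes(matches_by_frame.get(fid, []), k),
        reverse=True,
    )


# Toy data: frame "y" keeps the query layout, frame "x" scrambles it.
q = [(0, 0), (1, 0), (10, 0), (11, 0)]
good = list(zip(q, [(5, 5), (6, 5), (15, 5), (16, 5)]))
bad = list(zip(q, [(0, 0), (10, 0), (1, 0), (11, 0)]))
order = rerank(["x", "y"], {"x": bad, "y": good}, k=1)
```

The cheap frequency-based ranking keeps the shortlist small, so the more expensive neighbour check only runs on the top-N frames.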

live demo

The concept is simple and easy to implement. (A good paper should be like this.)
But the hard part is the data pre-processing... which is not the interesting part.
I love the re-ranking mechanism with spatial consideration, although it takes time.
This work can also be applied to many other applications if we replace the feature used for each keyframe
(e.g., face features).
