[PaperReading] Efficient visual search of videos cast as text retrieval

2010-03-03 ·

Title: Efficient visual search of videos cast as text retrieval
Authors: J. Sivic and A. Zisserman
Published in: IEEE TPAMI, 2009

Goal: given a query object, find its occurrences in a pre-processed video database using an approach similar to text retrieval.

It adopts the text retrieval framework, using TF-IDF weighting and stop-word removal, with the following analogy:
  • a frame        vs.    a document
  • visual word  vs.    word

    The offline part:
    1. For each keyframe, detect affine covariant regions (Shape Adapted and Maximally Stable) and represent each region by a 128-dimensional SIFT descriptor.
    2. Quantize the descriptors into visual words with K-means. (SA: k=6000; MS: k=10000)
    3. Represent each frame by the tf-idf vector of its visual words. (Visual words that occur frequently across frames are discarded as stop words.)
    4. Index the frames with an inverted file for fast retrieval.
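
To make the offline indexing step concrete, here is a minimal sketch, assuming each keyframe has already been reduced to a list of visual word ids. The function name `build_index`, the `stop_fraction` parameter, and its default value are my own assumptions for illustration, not taken from the paper. The weighting is the standard tf-idf, (word count in frame / words in frame) * log(number of frames / frames containing the word).

```python
import math
from collections import Counter, defaultdict


def build_index(frames, stop_fraction=0.05):
    """Build tf-idf vectors and an inverted file.

    frames: dict mapping frame id -> list of visual word ids.
    stop_fraction: hypothetical fraction of the most frequent
        visual words to discard as stop words.
    """
    n_frames = len(frames)

    # Document frequency: in how many frames does each word occur?
    df = Counter()
    for words in frames.values():
        df.update(set(words))

    # Stop list: the most frequent visual words across frames.
    n_stop = int(len(df) * stop_fraction)
    stop_words = {w for w, _ in df.most_common(n_stop)}

    idf = {w: math.log(n_frames / c)
           for w, c in df.items() if w not in stop_words}

    vectors = {}
    inverted = defaultdict(set)  # visual word -> frames containing it
    for fid, words in frames.items():
        kept = [w for w in words if w in idf]
        counts = Counter(kept)
        total = len(kept)
        vec = {w: (c / total) * idf[w] for w, c in counts.items()}

        # L2-normalise so that a dot product is a cosine similarity.
        norm = math.sqrt(sum(v * v for v in vec.values()))
        if norm > 0:
            vec = {w: v / norm for w, v in vec.items()}

        vectors[fid] = vec
        for w in vec:
            inverted[w].add(fid)
    return vectors, inverted, idf
```

Storing sparse dictionaries instead of dense vectors keeps the sketch short. A real vocabulary of thousands of visual words would likely use a sparse matrix instead.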

    The online part:
    1. Determine the visual words within the query region.
    2. Use the visual word frequencies to retrieve the top-N keyframes.
    3. Re-rank them by the spatial consistency of the region of interest.
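
The online steps above can be sketched roughly as follows. This is a simplified illustration under my own assumptions, not the authors' implementation. `rank_frames` scores only the frames that the inverted file returns for the query words. `spatial_votes` gives a matched region one vote for every other match that is among its k nearest neighbours in both the query and the frame. Here the neighbours are searched among the matched points only, which is a simplification, and all function and parameter names are hypothetical.

```python
import math


def rank_frames(query_vec, vectors, inverted, top_n=10):
    """Return the top_n (frame id, score) pairs by dot product.

    query_vec and vectors[fid] are sparse dicts: visual word -> weight.
    inverted maps a visual word to the set of frames containing it.
    """
    # Only frames sharing at least one visual word are candidates.
    candidates = set()
    for w in query_vec:
        candidates |= inverted.get(w, set())

    scores = {}
    for fid in candidates:
        vec = vectors[fid]
        scores[fid] = sum(wt * vec.get(w, 0.0)
                          for w, wt in query_vec.items())
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n]


def _nearest(points, i, k):
    """Indices of the k points closest to points[i], excluding i."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return set(others[:k])


def spatial_votes(query_pts, frame_pts, k=3):
    """Total spatial-consistency votes for one candidate frame.

    query_pts[i] and frame_pts[i] are the (x, y) positions of the
    i-th matched region pair in the query and in the frame.
    """
    total = 0
    for i in range(len(query_pts)):
        # A neighbouring match supports match i only if it is close
        # to it in both images.
        support = _nearest(query_pts, i, k) & _nearest(frame_pts, i, k)
        total += len(support)
    return total
```

The top-N frames from `rank_frames` would then be re-sorted by their `spatial_votes` score, so a frame whose matches keep the same layout as the query moves up the list.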



    live demo

    The concept is simple and easy to implement, as a good paper should be.
    The hard part is the data pre-processing, which is not the interesting part.
    I love the re-ranking mechanism with spatial consistency, although it takes extra time.
    This work could also be applied to many other applications by replacing the per-keyframe features
    (e.g., face features).


    [ About ]

    Welcome :P
    I am Saphina Cheng (anon),
    a master's student in the MiRA (Multimedia indexing, Retrieval, and Analysis) group of the Communication & Multimedia Laboratory at National Taiwan University.

    This blog is about the papers I read.

    Any opinions are appreciated.
