[PaperReading] On Spectral Clustering: Analysis and an algorithm

2010-04-08


Title: On Spectral Clustering: Analysis and an algorithm
Authors: Andrew Y. Ng, Michael I. Jordan, Yair Weiss
Year: NIPS 2001

Input: 1) a set of points S = {s1, ..., sn} in l dimensions that we want to cluster
2) k (the number of clusters to partition them into)
Output: k clusters



[ Spectral Algorithm ]
any algorithm that uses an SVD / eigen-decomposition!!

[ Spectral Clustering ]
Algorithms that cluster points using eigenvectors of matrices derived from the data.
ex: normalized cut, this paper, ...etc
They try to obtain a data representation in a low-dimensional space that can be easily clustered.

[ Outline ]
This paper first provides an algorithm, then analyzes it using tools from matrix perturbation theory, giving conditions under which the algorithm will do well.
=> use a spectral embedding, then k-means
=> use k eigenvectors simultaneously (rather than applying 2-way cuts recursively) to find k clusters


[ Algorithm ]
1) build the A and D matrices, then use them to build L = D^(-1/2) A D^(-1/2)
-------------------------------------------------------
 A: affinity matrix (similarity between points s_i and s_j)
(Gaussian similarity: A_ij = exp( -||s_i - s_j||^2 / (2*sigma^2) ))
p.s. A_ii = 0
p.s. sigma: a scaling parameter to be chosen

-------------------------------------------------------

D: diagonal matrix with D_ii = the sum of A's i-th row
         => point i's total similarity value
-------------------------------------------------------
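Step 1 can be sketched in a few lines of numpy (a hedged sketch; the names `gaussian_affinity` and `degree_matrix` are my own, not from the paper):

```python
import numpy as np

def gaussian_affinity(S, sigma):
    """A_ij = exp(-||s_i - s_j||^2 / (2 sigma^2)), with A_ii = 0 as in the paper."""
    # squared pairwise distances via broadcasting
    sq = ((S[:, None, :] - S[None, :, :]) ** 2).sum(-1)
    A = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)  # paper sets the diagonal to zero
    return A

def degree_matrix(A):
    """Diagonal D with D_ii = sum of A's i-th row (point i's total similarity)."""
    return np.diag(A.sum(axis=1))

# tiny example: two nearby points and one far-away point
S = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
A = gaussian_affinity(S, sigma=1.0)
D = degree_matrix(A)
```

Nearby points get affinity close to 1, far-away pairs close to 0, which is what makes A block-like when the clusters are well separated.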
   
2) solve for the k largest eigenvectors of L, stack them as columns to build X, then normalize each row of X to unit length to get Y

3) treat each row of Y as a data point (n in total), run k-means to cluster them into k clusters; each original point s_i is assigned to whatever cluster row i of Y falls in.

=> It sounds like: each row of Y is a representation of an original data point in the reduced dimension (from l down to k), so each row stands for one original point.

Then, in the reduced-dimension space, run the clustering method.
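Putting the three steps together, a minimal numpy sketch (the function name `spectral_cluster` and the farthest-first k-means initialization are my own choices; the paper does not fix the k-means details):

```python
import numpy as np

def spectral_cluster(S, k, sigma, iters=100):
    """Sketch of the Ng-Jordan-Weiss algorithm, numpy only."""
    # Step 1: Gaussian affinity A (A_ii = 0), then L = D^{-1/2} A D^{-1/2}
    sq = ((S[:, None, :] - S[None, :, :]) ** 2).sum(-1)
    A = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # Step 2: k largest eigenvectors as columns of X; row-normalize to get Y
    _, V = np.linalg.eigh(L)          # eigenvalues come in ascending order
    X = V[:, -k:]
    Y = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Step 3: k-means on the rows of Y; point s_i gets the label of row i.
    # Farthest-first initialization (my own choice, not from the paper).
    C = [Y[0]]
    for _ in range(k - 1):
        dists = np.min([((Y - c) ** 2).sum(1) for c in C], axis=0)
        C.append(Y[np.argmax(dists)])
    C = np.array(C)
    for _ in range(iters):            # plain Lloyd iterations
        labels = ((Y[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(labels == j):
                C[j] = Y[labels == j].mean(axis=0)
    return labels

# demo: two well-separated Gaussian blobs
rng = np.random.default_rng(0)
S = np.vstack([rng.normal(0.0, 0.1, (20, 2)),
               rng.normal(5.0, 0.1, (20, 2))])
labels = spectral_cluster(S, k=2, sigma=1.0)
```

On well-separated data like this, each blob ends up entirely in one cluster.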

=> Q: why not run clustering in the original space?
      A: maybe in the original space the data cannot be clustered well, or is hard to cluster,
like the case below:
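For instance (my own toy construction, not the paper's figure): two concentric rings are hard for k-means in the original 2-D space, but after steps 1-2 the rows of Y collapse to (nearly) one point per ring:

```python
import numpy as np

# Two concentric rings (60 points each): not separable by k-means in 2-D.
t = np.linspace(0.0, 2.0 * np.pi, 60, endpoint=False)
inner = np.c_[np.cos(t), np.sin(t)]           # ring of radius 1
outer = 4.0 * np.c_[np.cos(t), np.sin(t)]     # ring of radius 4
S = np.vstack([inner, outer])

# Steps 1-2 of the algorithm, with sigma = 0.5 (my choice of scale)
sigma = 0.5
sq = ((S[:, None, :] - S[None, :, :]) ** 2).sum(-1)
A = np.exp(-sq / (2.0 * sigma ** 2))
np.fill_diagonal(A, 0.0)
d = A.sum(axis=1)
L = A / np.sqrt(np.outer(d, d))               # D^{-1/2} A D^{-1/2}

_, V = np.linalg.eigh(L)
X = V[:, -2:]                                  # 2 largest eigenvectors
Y = X / np.linalg.norm(X, axis=1, keepdims=True)

# Rows of Y are nearly identical within a ring and nearly orthogonal
# across rings, so clustering in the embedded space becomes easy.
within = Y[:60] @ Y[0]     # dot products inside the inner ring
across = Y[60:] @ Y[0]     # dot products across the two rings
```

Because the cross-ring affinities are essentially zero at this sigma, the affinity graph is nearly two disconnected components, which is exactly the "ideal case" the paper's perturbation analysis starts from.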



[ Parameters that need to be set ]
1. choice of k
2. choice of the scaling parameter sigma
3. clustering method (k-means, ...etc)

[ Note ]
1. easy to implement in MATLAB in a few lines, and the results are good
2. claim: the results are good even for non-convex regions or clusters that are not cleanly separated
3. p.s. convex set: a set is convex if, for every pair of points within the set, every point on the straight line segment that joins them is also within the set.
The shape below is not a convex set!
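The definition can be checked directly in code; a toy example with a ring-shaped (annulus) set, which is my own illustration:

```python
def in_annulus(x, y, r_in=1.0, r_out=2.0):
    """Membership test for a ring-shaped (annulus) set, which is NOT convex."""
    r2 = x * x + y * y
    return r_in ** 2 <= r2 <= r_out ** 2

# Two points inside the set whose connecting segment leaves the set:
p, q = (1.5, 0.0), (-1.5, 0.0)
mid = ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)   # (0.0, 0.0), in the hole
print(in_annulus(*p), in_annulus(*q), in_annulus(*mid))  # True True False
```

Both endpoints are in the set but their midpoint is not, so the set fails the definition of convexity.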
 

4. By now I am slowly coming to understand what spectral clustering is.
The algorithms are all similar; proving why they work is the hard part.


[ About ]

Welcome :P
I am Saphina Cheng (anon),
a master student of MiRA (Multimedia indexing, Retrieval, and Analysis) group of the Communication & Multimedia Laboratory at National Taiwan University

This blog is about the papers I read.

Any opinion is appreciated.
