Title: On Spectral Clustering: Analysis and an algorithm
Authors: Andrew Y. Ng, Michael I. Jordan, Yair Weiss
Year: NIPS 2001
Input: 1) a set of points S = {s1, ..., sn} in R^l that we want to cluster
2) k: the number of clusters to partition into
Output: k clusters
[ Spectral Algorithm ]
any algorithm that uses an SVD / eigendecomposition (reference)
[ Spectral Clustering ]
Algorithms that cluster points using eigenvectors of matrices derived from the data, e.g. Normalized Cut, this paper, etc.
They try to obtain a data representation in a low-dimensional space that can be easily clustered.
[ Outline ]
This paper first provides an algorithm, then analyzes it using tools from matrix perturbation theory, giving conditions under which the algorithm can be expected to do well.
=> use a spectral embedding, then K-means
=> use the k eigenvectors simultaneously to find k clusters (rather than applying a 2-way cut recursively)
[ Algorithm ]
1) build the affinity matrix A and degree matrix D, then use them to build L = D^(-1/2) A D^(-1/2)
-------------------------------------------------------
A: affinity matrix ( similarity between points i and j )
(Gaussian similarity) Aij = exp( -||si - sj||^2 / (2*sigma^2) )
p.s. Aii = 0
p.s. sigma: scaling factor to choose
-------------------------------------------------------
D: diagonal matrix with (i,i) entry = summation of A's ith row
=> point i's total similarity value
-------------------------------------------------------
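Step 1 can be sketched in a few lines of NumPy (a hypothetical helper of my own, not the paper's code; `sigma` is the scaling factor mentioned above):

```python
import numpy as np

def affinity_and_laplacian(S, sigma=1.0):
    """Step 1 of the algorithm: Gaussian affinity A (with A_ii = 0),
    degree matrix D, and L = D^{-1/2} A D^{-1/2}."""
    sq_dists = ((S[:, None, :] - S[None, :, :]) ** 2).sum(-1)
    A = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)               # the paper sets A_ii = 0
    d = A.sum(axis=1)                      # D_ii = point i's total similarity
    D = np.diag(d)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = D_inv_sqrt @ A @ D_inv_sqrt
    return A, D, L
```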
2) find the k largest eigenvectors of L, stack them as the columns of a matrix X, then normalize each row of X to unit length to get Y
3) treat each row of Y as a data point in R^k (total: n rows), run K-means to cluster them into k clusters; then assign the original point si to cluster j exactly when row i of Y was assigned to cluster j
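The embedding part of steps 2-3 can be sketched as follows (a hypothetical NumPy helper, not the paper's code); K-means is then run on the rows of Y:

```python
import numpy as np

def spectral_embed(L, k):
    """Step 2: take the k largest eigenvectors of L as the columns of X,
    then renormalize each row of X to unit length to get Y.
    Each row of Y represents one original data point."""
    eigvals, eigvecs = np.linalg.eigh(L)   # eigh returns ascending eigenvalues
    X = eigvecs[:, -k:]                    # the k largest eigenvectors
    Y = X / np.linalg.norm(X, axis=1, keepdims=True)
    return Y
```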
=> Intuition: each row of Y is a representation of one original data point in a reduced dimension (from l down to k), so each row stands for an original point.
Then the clustering method is run in the reduced-dimension space.
=> Q: why not run the clustering in the original space?
A: in the original space the points may not cluster well or may be hard to cluster, e.g. non-convex clusters such as two concentric rings.
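To see why the embedding helps, here is a small end-to-end sketch on two concentric rings, a non-convex case that K-means on the raw coordinates cannot separate; the function name, sigma = 0.5, and the point counts are my own arbitrary choices:

```python
import numpy as np

def njw_cluster(S, k, sigma):
    """Full pipeline on raw points S: Gaussian affinity -> L -> top-k
    eigenvectors -> row-normalize -> minimal k-means on the rows of Y."""
    sq = ((S[:, None] - S[None]) ** 2).sum(-1)
    A = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    d = A.sum(axis=1)
    L = A / np.sqrt(np.outer(d, d))        # same as D^{-1/2} A D^{-1/2}
    _, vecs = np.linalg.eigh(L)
    X = vecs[:, -k:]
    Y = X / np.linalg.norm(X, axis=1, keepdims=True)
    # minimal Lloyd's k-means with farthest-point init (any clusterer works)
    centers = [Y[0]]
    for _ in range(k - 1):
        dist = np.min([((Y - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(50):
        labels = ((Y[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        centers = np.array([Y[labels == j].mean(0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels

# two concentric rings: radius 1 (20 points) and radius 5 (40 points)
t1 = np.linspace(0, 2 * np.pi, 20, endpoint=False)
t2 = np.linspace(0, 2 * np.pi, 40, endpoint=False)
S = np.vstack([np.c_[np.cos(t1), np.sin(t1)],
               5.0 * np.c_[np.cos(t2), np.sin(t2)]])
labels = njw_cluster(S, k=2, sigma=0.5)
```

With this sigma the two rings have essentially zero cross-affinity, so the rows of Y collapse to two (nearly) orthogonal directions and K-means separates the rings cleanly, even though the clusters are not convex in the original space.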
[ Parameters that need to be set ]
1. choice of k
2. choice of the scaling factor sigma
3. clustering method (K-means, etc.)
[ Note ]
1. easy to implement in MATLAB in a few lines, and the results are good
2. claim: the results are good even when clusters form non-convex regions or are not cleanly separated
3. p.s. convex set: an object is convex if for every pair of points within the object, every point on the straight line segment that joins them is also within the object
(a ring, for example, is not a convex set!)
4. Only now am I gradually coming to understand what spectral clustering is.
The algorithms are all similar; the hard part is proving when and why they work.





