[PaperReading] On Spectral Clustering: Analysis and an algorithm

2010-04-08


Title: On Spectral Clustering: Analysis and an algorithm
Authors: Andrew Y. Ng, Michael I. Jordan, Yair Weiss
Year: NIPS 2001

Input: 1) a set of points S = {s1, ..., sn} in l dimensions that we want to cluster
2) k (the number of clusters to partition them into)
Output: k clusters



[ Spectral Algorithm ]
any algorithm that uses an SVD / eigen-decomposition!!

[ Spectral Clustering ]
Algorithms that cluster points using eigenvectors of matrices derived from the data.
ex: normalized cut, this paper, ...etc
They try to obtain a data representation in a low-dimensional space that can be easily clustered.

[ Outline ]
This paper first provides an algorithm, then analyzes it using tools from matrix perturbation theory, giving conditions under which the algorithm will do well.
=> use a spectral embedding, then k-means
=> use k eigenvectors simultaneously (rather than applying 2-way cuts recursively) to find k clusters


[ Algorithm ]
1) build the A and D matrices, then use them to build L = D^(-1/2) A D^(-1/2)
-------------------------------------------------------
 A: affinity matrix (similarity between points s_i and s_j)
(Gaussian similarity: A_ij = exp( -||s_i - s_j||^2 / (2*sigma^2) ))
p.s. A_ii = 0
p.s. sigma: a scaling parameter to be chosen

-------------------------------------------------------

D: diagonal matrix with D_ii = the sum of A's i-th row
         => point i's total similarity value
-------------------------------------------------------
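Step 1 can be sketched in a few lines of numpy (a hedged sketch; the names `gaussian_affinity` and `degree_matrix` are my own, not from the paper):

```python
import numpy as np

def gaussian_affinity(S, sigma):
    """A_ij = exp(-||s_i - s_j||^2 / (2 sigma^2)), with A_ii = 0 as in the paper."""
    # squared pairwise distances via broadcasting
    sq = ((S[:, None, :] - S[None, :, :]) ** 2).sum(-1)
    A = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)  # paper sets the diagonal to zero
    return A

def degree_matrix(A):
    """Diagonal D with D_ii = sum of A's i-th row (point i's total similarity)."""
    return np.diag(A.sum(axis=1))

# tiny example: two nearby points and one far-away point
S = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
A = gaussian_affinity(S, sigma=1.0)
D = degree_matrix(A)
```

Nearby points get affinity close to 1, far-away pairs close to 0, which is what makes A block-like when the clusters are well separated.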
   
2) solve for the k largest eigenvectors of L, stack them as columns to build X, then normalize each row of X to unit length to get Y

3) treat each row of Y as a data point (n in total), run k-means to cluster them into k clusters; each original point s_i is assigned to whatever cluster row i of Y falls in.

=> It sounds like: each row of Y is a representation of an original data point in the reduced dimension (from l down to k), so each row stands for one original point.

Then, in the reduced-dimension space, run the clustering method.
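Putting the three steps together, a minimal numpy sketch (the function name `spectral_cluster` and the farthest-first k-means initialization are my own choices; the paper does not fix the k-means details):

```python
import numpy as np

def spectral_cluster(S, k, sigma, iters=100):
    """Sketch of the Ng-Jordan-Weiss algorithm, numpy only."""
    # Step 1: Gaussian affinity A (A_ii = 0), then L = D^{-1/2} A D^{-1/2}
    sq = ((S[:, None, :] - S[None, :, :]) ** 2).sum(-1)
    A = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # Step 2: k largest eigenvectors as columns of X; row-normalize to get Y
    _, V = np.linalg.eigh(L)          # eigenvalues come in ascending order
    X = V[:, -k:]
    Y = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Step 3: k-means on the rows of Y; point s_i gets the label of row i.
    # Farthest-first initialization (my own choice, not from the paper).
    C = [Y[0]]
    for _ in range(k - 1):
        dists = np.min([((Y - c) ** 2).sum(1) for c in C], axis=0)
        C.append(Y[np.argmax(dists)])
    C = np.array(C)
    for _ in range(iters):            # plain Lloyd iterations
        labels = ((Y[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(labels == j):
                C[j] = Y[labels == j].mean(axis=0)
    return labels

# demo: two well-separated Gaussian blobs
rng = np.random.default_rng(0)
S = np.vstack([rng.normal(0.0, 0.1, (20, 2)),
               rng.normal(5.0, 0.1, (20, 2))])
labels = spectral_cluster(S, k=2, sigma=1.0)
```

On well-separated data like this, each blob ends up entirely in one cluster.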

=> Q: why not run clustering in the original space?
      A: maybe in the original space the data cannot be clustered well, or is hard to cluster,
like the case below:
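For instance (my own toy construction, not the paper's figure): two concentric rings are hard for k-means in the original 2-D space, but after steps 1-2 the rows of Y collapse to (nearly) one point per ring:

```python
import numpy as np

# Two concentric rings (60 points each): not separable by k-means in 2-D.
t = np.linspace(0.0, 2.0 * np.pi, 60, endpoint=False)
inner = np.c_[np.cos(t), np.sin(t)]           # ring of radius 1
outer = 4.0 * np.c_[np.cos(t), np.sin(t)]     # ring of radius 4
S = np.vstack([inner, outer])

# Steps 1-2 of the algorithm, with sigma = 0.5 (my choice of scale)
sigma = 0.5
sq = ((S[:, None, :] - S[None, :, :]) ** 2).sum(-1)
A = np.exp(-sq / (2.0 * sigma ** 2))
np.fill_diagonal(A, 0.0)
d = A.sum(axis=1)
L = A / np.sqrt(np.outer(d, d))               # D^{-1/2} A D^{-1/2}

_, V = np.linalg.eigh(L)
X = V[:, -2:]                                  # 2 largest eigenvectors
Y = X / np.linalg.norm(X, axis=1, keepdims=True)

# Rows of Y are nearly identical within a ring and nearly orthogonal
# across rings, so clustering in the embedded space becomes easy.
within = Y[:60] @ Y[0]     # dot products inside the inner ring
across = Y[60:] @ Y[0]     # dot products across the two rings
```

Because the cross-ring affinities are essentially zero at this sigma, the affinity graph is nearly two disconnected components, which is exactly the "ideal case" the paper's perturbation analysis starts from.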



[ Parameters that need to be set ]
1. choice of k
2. choice of the scaling parameter sigma
3. clustering method (k-means, ...etc)

[ Note ]
1. easy to implement in MATLAB in a few lines, and the results are good
2. claim: the results are good even for non-convex regions or clusters that are not cleanly separated
3. p.s. convex set: a set is convex if, for every pair of points within the set, every point on the straight line segment that joins them is also within the set.
The shape below is not a convex set!
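The definition can be checked directly in code; a toy example with a ring-shaped (annulus) set, which is my own illustration:

```python
def in_annulus(x, y, r_in=1.0, r_out=2.0):
    """Membership test for a ring-shaped (annulus) set, which is NOT convex."""
    r2 = x * x + y * y
    return r_in ** 2 <= r2 <= r_out ** 2

# Two points inside the set whose connecting segment leaves the set:
p, q = (1.5, 0.0), (-1.5, 0.0)
mid = ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)   # (0.0, 0.0), in the hole
print(in_annulus(*p), in_annulus(*q), in_annulus(*mid))  # True True False
```

Both endpoints are in the set but their midpoint is not, so the set fails the definition of convexity.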
 

4. By now I am slowly coming to understand what spectral clustering is.
The algorithms are all similar; proving why they work is the hard part.


[ About ]

Welcome :P
I am Saphina Cheng (anon),
a master student of MiRA (Multimedia indexing, Retrieval, and Analysis) group of the Communication & Multimedia Laboratory at National Taiwan University

This blog is about the papers I read.

Any opinion is appreciated.
