K-Medoids clustering method

A type of partitioning algorithm used to cluster data. We construct a partitioned of a database D of n objects into a set of k clusters.
In the K-mediods clustering method we find representative objects (called medoids) in clusters. To achieve this goal, only the definition of distance from any two objects is needed.


Partitioning Around Medoids (PAM) [1]

Originally devised by Kaufman and Rousseeuw in 1987 and built using S+, PAM uses real objects to represent clusters:

  • Select k representative objects arbitrarily
  • For each pair of non-selected object h and selected object i, calculate the total swapping cost TCih.
    • If TCih < 0, i is replaced by h.
  • Then assign each non-selected object to the most similar representative object
  • Repeat steps 2-3 until there is no change.

CLARA [2]

Draws multiple samples of the data set, applies PAM on each sample and gives the best clustering as the output.
It can deal with larger data sets than PAM, but a good clustering based on samples will not necessarily represent a good clustering of the whole data set if the sample is biased.

The efficiency of CLARA depends on the sample size;
O(kS2 + k(n - k))


CLARANS: Randomized sampling [3]

A Clustering Algorithm based on Randomized Search (CLARANS) draws a sample of neighbours dynamically. The clustering process can be represented as searching a graph, where every node is a potential solution; that is, set of k medoids.
If the local optimum is found, CLARANS starts with a new randomly selected node in search for a new local optimum.

More efficient and scalable than both PAM and CLARA. Focusing techniques and spatial access structures may further its performance.


Focusing and spatial data structure [4]

Ester et al. 1995


Bibliography
1. Kaufman and Rousseeuw, 1987
2. Kaufman and Rousseeuw, 1990
3. Ng & Han, 1994
4. Ester (et al.), 1995
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License