K-Medoids clustering method

A type of partitioning algorithm used to cluster data. We construct a partitioned of a database D of n objects into a set of k clusters.
In the K-mediods clustering method we find representative objects (called medoids) in clusters. To achieve this goal, only the definition of distance from any two objects is needed.

Partitioning Around Medoids (PAM) [1]

Originally devised by Kaufman and Rousseeuw in 1987 and built using S+, PAM uses real objects to represent clusters:

  • Select k representative objects arbitrarily
  • For each pair of non-selected object h and selected object i, calculate the total swapping cost TCih.
    • If TCih < 0, i is replaced by h.
  • Then assign each non-selected object to the most similar representative object
  • Repeat steps 2-3 until there is no change.


Draws multiple samples of the data set, applies PAM on each sample and gives the best clustering as the output.
It can deal with larger data sets than PAM, but a good clustering based on samples will not necessarily represent a good clustering of the whole data set if the sample is biased.

The efficiency of CLARA depends on the sample size;
O(kS2 + k(n - k))

CLARANS: Randomized sampling [3]

A Clustering Algorithm based on Randomized Search (CLARANS) draws a sample of neighbours dynamically. The clustering process can be represented as searching a graph, where every node is a potential solution; that is, set of k medoids.
If the local optimum is found, CLARANS starts with a new randomly selected node in search for a new local optimum.

More efficient and scalable than both PAM and CLARA. Focusing techniques and spatial access structures may further its performance.

Focusing and spatial data structure [4]

Ester et al. 1995

1. Kaufman and Rousseeuw, 1987
2. Kaufman and Rousseeuw, 1990
3. Ng & Han, 1994
4. Ester (et al.), 1995
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License