A type of partitioning algorithm used to cluster data. We construct a partitioned of a database D of n objects into a set of k clusters.
In the K-mediods clustering method we find representative objects (called medoids) in clusters. To achieve this goal, only the definition of distance from any two objects is needed.
Partitioning Around Medoids (PAM) [1]
Originally devised by Kaufman and Rousseeuw in 1987 and built using S+, PAM uses real objects to represent clusters:
- Select k representative objects arbitrarily
- For each pair of non-selected object h and selected object i, calculate the total swapping cost TCih.
- If TCih < 0, i is replaced by h.
- Then assign each non-selected object to the most similar representative object
- Repeat steps 2-3 until there is no change.
CLARA [2]
Draws multiple samples of the data set, applies PAM on each sample and gives the best clustering as the output.
It can deal with larger data sets than PAM, but a good clustering based on samples will not necessarily represent a good clustering of the whole data set if the sample is biased.
The efficiency of CLARA depends on the sample size;
O(kS2 + k(n - k))
CLARANS: Randomized sampling [3]
A Clustering Algorithm based on Randomized Search (CLARANS) draws a sample of neighbours dynamically. The clustering process can be represented as searching a graph, where every node is a potential solution; that is, set of k medoids.
If the local optimum is found, CLARANS starts with a new randomly selected node in search for a new local optimum.
More efficient and scalable than both PAM and CLARA. Focusing techniques and spatial access structures may further its performance.
Focusing and spatial data structure [4]
Ester et al. 1995