Integration of hierarchical clustering with distance-based methods

## BIRCH [1]

BIRCH stands for Balanced Iterative Reducing and Clustering using Hierarchies. It incrementally constructs a Clustering Feature (CF) tree, a hierarchical data structure for multiphase clustering.

• Phase 1: scan the dataset to build an initial in-memory CF-tree (a multi-level compression of the data that tries to preserve its inherent clustering structure)
• Phase 2: use an arbitrary clustering algorithm to cluster the leaf nodes of the CF-tree

BIRCH scales linearly: it finds a good clustering with a single scan and improves the quality with a few additional scans. Its drawbacks are that it handles only numeric data and is sensitive to the order of the data records.
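The single-scan behaviour and the order sensitivity can be illustrated with a heavily simplified sketch. It keeps a flat list of summaries instead of a CF-tree, and it absorbs a point when the distance to the nearest centroid is within a threshold. Both choices are assumptions made for brevity, not the published algorithm, and the names `summarize`, `centroids` and `threshold` are hypothetical.

```python
from math import dist
from typing import List, Sequence, Tuple


def summarize(points: Sequence[Sequence[float]], threshold: float) -> List[list]:
    """One pass over the data: absorb each point into the nearest summary
    whose centroid lies within `threshold`, otherwise open a new summary.

    Each summary is [n, ls]: a point count and a per-dimension linear sum.
    Simplification: flat list, no tree, no node splitting or rebuilding.
    """
    summaries: List[list] = []
    for p in points:
        best, best_d = None, threshold
        for s in summaries:
            center = [v / s[0] for v in s[1]]
            d = dist(p, center)
            if d <= best_d:
                best, best_d = s, d
        if best is None:
            summaries.append([1, [float(v) for v in p]])
        else:
            best[0] += 1
            best[1] = [a + b for a, b in zip(best[1], p)]
    return summaries


def centroids(summaries: List[list]) -> List[Tuple[float, ...]]:
    """Centroid of each summary; these would be the input to phase 2."""
    return [tuple(v / n for v in ls) for n, ls in summaries]


data = [(0.0,), (1.0,), (2.0,)]
forward = centroids(summarize(data, threshold=1.2))
backward = centroids(summarize(list(reversed(data)), threshold=1.2))
print(forward)   # summaries built from the original record order
print(backward)  # summaries built from the reversed record order
```

The same three records produce different summaries depending on the order in which they arrive, which is the order sensitivity noted above. The resulting centroids could then be handed to any clustering algorithm as phase 2.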

### Clustering Feature Vector

A clustering feature is represented by:
CF = (N, LS, SS)

N: Number of data points

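LS: $\sum_{i=1}^{N} x_i$, the linear sum of the N data points

SS: $\sum_{i=1}^{N} x_i^2$, the square sum of the N data points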
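A small sketch may help show why this triple is useful: two clustering features can be merged by component-wise addition, and the centroid and radius of a cluster can be derived from the triple alone, without revisiting the points. The class name `CF` and its methods are hypothetical, and SS is assumed here to be a scalar (the sum of squared norms).

```python
from dataclasses import dataclass
from math import sqrt
from typing import Sequence, Tuple


@dataclass(frozen=True)
class CF:
    """Clustering feature (N, LS, SS) of a set of d-dimensional points."""
    n: int                  # N: number of data points
    ls: Tuple[float, ...]   # LS: per-dimension linear sum
    ss: float               # SS: sum of squared norms (scalar, by assumption)

    @staticmethod
    def of_point(x: Sequence[float]) -> "CF":
        return CF(1, tuple(float(v) for v in x), sum(float(v) ** 2 for v in x))

    def merge(self, other: "CF") -> "CF":
        # Additivity: the CF of two disjoint sets is the component-wise sum.
        return CF(self.n + other.n,
                  tuple(a + b for a, b in zip(self.ls, other.ls)),
                  self.ss + other.ss)

    def centroid(self) -> Tuple[float, ...]:
        return tuple(v / self.n for v in self.ls)

    def radius(self) -> float:
        # Root-mean-square distance to the centroid: sqrt(SS/N - ||LS/N||^2).
        c2 = sum(v * v for v in self.centroid())
        return sqrt(max(self.ss / self.n - c2, 0.0))


cf = CF.of_point((1, 2)).merge(CF.of_point((3, 4))).merge(CF.of_point((5, 6)))
print(cf)             # N = 3, LS = (9.0, 12.0), SS = 91.0
print(cf.centroid())  # (3.0, 4.0)
```

For the points (1, 2), (3, 4) and (5, 6), the triple is N = 3, LS = (9, 12), SS = 91.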

## CURE

CURE selects a fixed number of well-scattered points from each cluster as its representatives and then shrinks them towards the center of the cluster by a specified fraction.
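The selection and shrinking step can be sketched as follows. The scattered points are picked with a farthest-point heuristic, and the names `scattered_points`, `shrink`, `c` and `alpha` are hypothetical. This is a minimal illustration of one cluster's representatives, not a full CURE implementation.

```python
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def centroid(points: Sequence[Point]) -> Point:
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def scattered_points(points: Sequence[Point], c: int) -> List[Point]:
    """Pick up to c well-scattered points: start with the point farthest
    from the centroid, then repeatedly add the point whose minimum distance
    to the points chosen so far is largest."""
    center = centroid(points)
    chosen = [max(points, key=lambda p: dist(p, center))]
    while len(chosen) < min(c, len(points)):
        chosen.append(max(points, key=lambda p: min(dist(p, q) for q in chosen)))
    return chosen


def shrink(reps: Sequence[Point], center: Point, alpha: float) -> List[Point]:
    """Move each representative towards the center by the fraction alpha
    (alpha = 0 leaves it in place, alpha = 1 collapses it onto the center)."""
    return [tuple(r + alpha * (m - r) for r, m in zip(rep, center)) for rep in reps]


cluster = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0), (2.0, 2.0)]
reps = scattered_points(cluster, c=4)
print(shrink(reps, centroid(cluster), alpha=0.5))
```

With `alpha = 0.5`, the four corner points of the square are each moved halfway towards the center (2, 2).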

## CHAMELEON

Hierarchical clustering using dynamic modeling.

Bibliography
1. Zhang, Ramakrishnan, Livny (SIGMOD'96)