Integration of hierarchical clustering with distance-based methods


BIRCH: Balanced Iterative Reducing and Clustering using Hierarchies. It incrementally constructs a Clustering Feature (CF) tree, a hierarchical data structure for multiphase clustering.

  • Phase 1: scan the dataset to build an initial in-memory CF-tree (a multi-level compression of the data that tries to preserve its inherent clustering structure)
  • Phase 2: use an arbitrary clustering algorithm to cluster the leaf nodes of the CF-tree
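The two phases above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the published algorithm: a flat list of summaries stands in for the height-balanced CF-tree (no branching factor, no node splits), each summary is a triple (N, LS, SS) holding the point count, the linear sum and the sum of squared norms, and the "arbitrary" second-phase algorithm is assumed to be a plain centroid-based agglomerative merge. The names `phase1`, `phase2` and `threshold` are hypothetical.

```python
import math

def centroid(entry):
    n, ls, _ = entry
    return [x / n for x in ls]

def radius(entry):
    # Root-mean-square distance of the summarized points from their centroid.
    n, _, ss = entry
    c2 = sum(c * c for c in centroid(entry))
    return math.sqrt(max(ss / n - c2, 0.0))

def merge(a, b):
    # Summaries are additive, so merging never needs the raw points.
    return (a[0] + b[0], [x + y for x, y in zip(a[1], b[1])], a[2] + b[2])

def phase1(points, threshold):
    """Single scan: absorb each point into the nearest summary if it stays tight."""
    entries = []
    for p in points:
        single = (1, list(p), sum(x * x for x in p))
        if entries:
            i = min(range(len(entries)),
                    key=lambda k: math.dist(centroid(entries[k]), p))
            candidate = merge(entries[i], single)
            if radius(candidate) <= threshold:
                entries[i] = candidate
                continue
        entries.append(single)  # too far from every summary: start a new one
    return entries

def phase2(entries, k):
    """Cluster the summaries: repeatedly merge the two closest centroids."""
    clusters = list(entries)
    while len(clusters) > k:
        pairs = [(a, b) for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        i, j = min(pairs, key=lambda ab: math.dist(centroid(clusters[ab[0]]),
                                                   centroid(clusters[ab[1]])))
        merged = merge(clusters[i], clusters[j])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters
```

Calling `phase1(points, threshold)` touches every point exactly once, and `phase2` then works only on the much smaller list of summaries, which is where the scalability comes from. Because a point is absorbed by whichever summary exists when it arrives, feeding the same points in a different order can yield different summaries.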

BIRCH scales linearly with the number of data points: it finds a good clustering with a single scan and improves the quality with a few additional scans. Its weaknesses are that it handles only numeric data and is sensitive to the order of the data records.

Clustering Feature Vector

A clustering feature is represented by:
CF = (N, LS, SS)

N: number of data points

LS: $\sum_{i=1}^{N} X_i$ (linear sum of the N data points)

SS: $\sum_{i=1}^{N} X_i^2$ (square sum of the N data points)
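As a small illustration of the triple (N, LS, SS), the sketch below computes a clustering feature from raw points and shows that two features can be merged by component-wise addition. It assumes SS is stored as a single scalar (the sum of squared norms); some presentations keep it per dimension instead. The class and method names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

@dataclass
class CF:
    n: int           # N: number of data points
    ls: List[float]  # LS: per-dimension linear sum of the points
    ss: float        # SS: sum of squared norms (scalar form, an assumption)

    @classmethod
    def from_points(cls, points: Sequence[Sequence[float]]) -> "CF":
        dim = len(points[0])
        ls = [sum(p[d] for p in points) for d in range(dim)]
        ss = sum(x * x for p in points for x in p)
        return cls(len(points), ls, ss)

    def merge(self, other: "CF") -> "CF":
        # Additivity: the CF of a union is the sum of the CFs.
        return CF(self.n + other.n,
                  [a + b for a, b in zip(self.ls, other.ls)],
                  self.ss + other.ss)

    def centroid(self) -> List[float]:
        return [x / self.n for x in self.ls]

    def radius(self) -> float:
        # sqrt(SS/N - ||LS/N||^2): RMS distance of the points from the centroid.
        c2 = sum(c * c for c in self.centroid())
        return math.sqrt(max(self.ss / self.n - c2, 0.0))
```

For the five points (3,4), (2,6), (4,5), (4,7), (3,8) this gives N = 5, LS = (16, 30) and SS = 244. The centroid and radius are derived from the triple alone, so the raw points never need to be kept in memory.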


CURE (Clustering Using REpresentatives) selects well-scattered points from the cluster and then shrinks them towards the center of the cluster by a specified fraction.
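The select-and-shrink step described above can be sketched as follows. The sketch assumes a farthest-point heuristic for picking the scattered points and uses the cluster mean as its center; the function name `representatives` and the parameters `c` (number of points to pick) and `alpha` (shrink fraction) are hypothetical.

```python
import math

def representatives(cluster, c, alpha):
    """Pick up to c well-scattered points and shrink them toward the mean."""
    center = [sum(col) / len(cluster) for col in zip(*cluster)]

    # Start with the point farthest from the center.
    chosen = [max(cluster, key=lambda p: math.dist(p, center))]
    while len(chosen) < min(c, len(cluster)):
        # Next pick: the point whose nearest already-chosen point is farthest.
        nxt = max(cluster, key=lambda p: min(math.dist(p, q) for q in chosen))
        chosen.append(nxt)

    # Move each chosen point a fraction alpha of the way toward the center.
    return [tuple(x + alpha * (m - x) for x, m in zip(p, center))
            for p in chosen]
```

With `alpha = 0` the chosen points stay where they are, and with `alpha = 1` they all collapse onto the center. Intermediate values keep a rough outline of the cluster shape while damping the influence of outliers.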


CHAMELEON: hierarchical clustering using dynamic modeling.

1. Zhang, Ramakrishnan, Livny (SIGMOD'96)