Cafe Cerebral - Cluster Analysis

Cluster analysis is an exploratory data analysis tool for solving classification problems. Its object is to sort cases (people, things, events, etc.) into groups, or clusters, so that the degree of association is strong between members of the same cluster and weak between members of different clusters. A cluster is a group of relatively homogeneous cases or observations. Each cluster therefore describes, in terms of the data collected, the class to which its members belong, and this description can be generalized from the particular members to the general class or type.

Cluster analysis can be applied to data that exhibits "natural" groupings. It sorts through the raw data and groups the cases into clusters.
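To make "strong association within clusters, weak association between clusters" concrete, the minimal Python sketch below treats each case as a point in numeric space and uses Euclidean distance as an inverse measure of association. Both choices are illustrative assumptions rather than something the text prescribes, and the function names are hypothetical. For a given assignment of cluster labels, it compares the average distance between members of the same cluster with the average distance between members of different clusters.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Straight-line distance between two points given as tuples of numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def within_between(points, labels):
    """Return (mean within-cluster distance, mean between-cluster distance).

    Assumption: a small distance means strong association. A good clustering
    has a small first value and a large second value.
    """
    within, between = [], []
    for i, j in combinations(range(len(points)), 2):
        d = euclidean(points[i], points[j])
        # Pairs with the same label are within-cluster, all others between-cluster.
        (within if labels[i] == labels[j] else between).append(d)
    if not within or not between:
        raise ValueError("need two or more clusters and a cluster with 2+ members")
    return sum(within) / len(within), sum(between) / len(between)
```

On well-separated data such as `[(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]`, the labels `[0, 0, 0, 1, 1, 1]` give a much smaller within-cluster average than between-cluster average, while an arbitrary labeling such as `[0, 1, 0, 1, 0, 1]` does not.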

The diagram below illustrates the results of a survey that studied drinkers' perceptions of spirits (alcohol). Each point represents the results from one respondent. The research indicates there are four clusters in this market.

In marketing, cluster analysis is used for:

  • Segmenting the market and determining target markets
  • Product positioning and new product development
  • Selecting test markets

The basic clustering methods are of two types:

  • Hierarchical clustering or Linkage methods
  • Non-hierarchical clustering or Nodal methods
  • Hierarchical clustering
    In hierarchical clustering, the objects are organized into a hierarchical (tree-like) structure as part of the procedure.
    • Divisive clustering
      Divisive clustering starts by treating all objects as part of a single large cluster, then divides that cluster into smaller and smaller clusters.
    • Agglomerative clustering
      Agglomerative clustering starts by treating each object as a separate cluster, then merges them into bigger and bigger clusters.
      • Centroid methods
        In centroid methods, the distance between two clusters is the distance between their centroids (a centroid is the mean value of all the objects in the cluster); at each step, the two clusters whose centroids are closest are merged.
      • Variance methods
        In variance methods, clusters are generated that minimize the within-cluster variance.
        • Ward’s Procedure
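          Ward’s procedure is a variance method that, at each step, merges the two clusters whose combination produces the smallest increase in the total within-cluster sum of squared Euclidean distances from each object to its cluster mean.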
      • Linkage methods
        In linkage methods, clusters are merged based on the distances between the objects they contain.
        • Single Linkage method
          In the single linkage method, the distance between two clusters is the minimum distance between an object in one cluster and an object in the other (also called the nearest neighbor rule).
        • Complete Linkage method
          In the complete linkage method, the distance between two clusters is the maximum distance between an object in one cluster and an object in the other (also called the furthest neighbor rule).
        • Average Linkage method
          In the average linkage method, the distance between two clusters is the average distance over all pairs of objects in which one member of the pair comes from each cluster.
  • Non-hierarchical clustering (also called k-means clustering)
    These methods first determine a cluster center, then group all objects that are within a certain distance of that center.
    • Sequential Threshold method
      This method first determines a cluster center, then groups all objects that are within a predetermined threshold from the center, creating one cluster at a time.
    • Parallel Threshold method
      In this method, several cluster centers are determined simultaneously, then each object within a predetermined threshold of a center is grouped with the nearest center.
    • Optimizing Partitioning method
      In this method, a non-hierarchical procedure is run first, then objects are reassigned to clusters so as to optimize an overall criterion, such as the average within-cluster distance for a given number of clusters.
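The agglomerative procedure and the merge rules described above (single, complete, and average linkage, the centroid method, and Ward’s procedure) can be illustrated with a minimal, unoptimized Python sketch. It assumes each object is a tuple of numbers, uses Euclidean distance, and stops when a chosen number of clusters `k` remains; the function names are hypothetical. In practice the full merge history is usually inspected (for example as a dendrogram) rather than fixing `k` in advance.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two points given as tuples of numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(cluster):
    """Mean of each coordinate over the points in a cluster."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def cluster_distance(a, b, method):
    """Distance between clusters a and b (lists of points) under a merge rule."""
    if method in ("single", "complete", "average"):
        pair_dists = [euclidean(p, q) for p in a for q in b]
        if method == "single":    # nearest neighbor rule
            return min(pair_dists)
        if method == "complete":  # furthest neighbor rule
            return max(pair_dists)
        return sum(pair_dists) / len(pair_dists)  # average over cross-cluster pairs
    if method == "centroid":      # distance between the cluster means
        return euclidean(centroid(a), centroid(b))
    if method == "ward":          # increase in within-cluster sum of squares if merged
        na, nb = len(a), len(b)
        return na * nb / (na + nb) * euclidean(centroid(a), centroid(b)) ** 2
    raise ValueError(f"unknown method: {method}")


def agglomerative(points, k, method="single"):
    """Start with every point in its own cluster, then repeatedly merge the
    closest pair of clusters until only k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j], method)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]
    return clusters
```

For example, with `points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]`, `agglomerative(points, 2, "complete")` returns two clusters, one holding the three points near `(1, 1)` and the other the three points near `(8, 8)`. The Ward option uses the standard merge cost `na * nb / (na + nb)` times the squared distance between the two cluster means, which equals the increase in total within-cluster sum of squares.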
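The two threshold methods can be sketched in Python as follows. The text does not say how cluster centers are chosen, so this sketch assumes that the sequential method takes the first not-yet-assigned object as the next center, and that the parallel method receives its centers from the analyst. Distances are Euclidean, objects beyond the threshold of every center in the parallel method are left unassigned (`None`), and the function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two points given as tuples of numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sequential_threshold(points, threshold):
    """Build one cluster at a time: take the first unassigned point as the
    center (an assumption), group every unassigned point within `threshold`
    of it, and repeat on whatever is left."""
    unassigned = list(points)
    clusters = []
    while unassigned:
        center = unassigned[0]  # always joins its own cluster (distance 0)
        members = [p for p in unassigned if euclidean(p, center) <= threshold]
        clusters.append(members)
        unassigned = [p for p in unassigned if euclidean(p, center) > threshold]
    return clusters


def parallel_threshold(points, centers, threshold):
    """Use several centers at once: each point joins its nearest center if that
    center is within `threshold`; otherwise its label is None."""
    labels = []
    for p in points:
        dists = [euclidean(p, c) for c in centers]
        nearest = min(range(len(centers)), key=dists.__getitem__)
        labels.append(nearest if dists[nearest] <= threshold else None)
    return labels
```

With a threshold of `2.0`, `sequential_threshold` splits `[(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]` into the three points near `(1, 1)` and the three near `(8, 8)`. Adding an outlier such as `(5, 5)` and calling `parallel_threshold` with centers `[(1, 1), (8, 8)]` leaves the outlier unassigned, because it is more than `2.0` from both centers.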
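The optimizing partitioning idea (start from an initial non-hierarchical assignment, then reassign objects to improve an overall criterion) can be sketched with the k-means reassignment rule: move every object to the cluster whose mean is nearest, recompute the means, and repeat until nothing moves. The text does not fix the criterion, so this sketch substitutes the total within-cluster sum of squared Euclidean distances to the cluster means, which this reassignment rule never increases. It assumes every object starts with a label (for example from a threshold procedure with unassigned objects handled beforehand); the function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroids_for(points, labels):
    """Mean point of each cluster label that currently has members."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return {lab: tuple(sum(c) / len(g) for c in zip(*g)) for lab, g in groups.items()}


def within_ss(points, labels):
    """Overall criterion (an assumption): total squared distance of each point
    to the mean of its cluster."""
    cents = centroids_for(points, labels)
    return sum(sq_dist(p, cents[lab]) for p, lab in zip(points, labels))


def optimize_partition(points, labels, max_iter=100):
    """Reassign each point to the nearest cluster mean, recompute the means,
    and stop when no point changes cluster (or after max_iter rounds)."""
    labels = list(labels)
    for _ in range(max_iter):
        cents = centroids_for(points, labels)
        new_labels = [min(cents, key=lambda lab: sq_dist(p, cents[lab])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
    return labels
```

Starting from the poor assignment `[0, 0, 1, 1, 1, 1]` for `[(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]`, one reassignment moves `(2, 1)` into the first cluster, giving `[0, 0, 0, 1, 1, 1]` and a lower within-cluster sum of squares.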
Contact Mu Sigma
info@mu-sigma.com
© 2005 - 2009 Mu Sigma. All rights reserved