silhouette {cluster}  R Documentation 
Compute silhouette information according to a given clustering in k clusters.
silhouette(x, ...)

## Default S3 method:
silhouette(x, dist, dmatrix, ...)

## S3 method for class 'partition':
silhouette(x, ...)

sortSilhouette(object, ...)

## S3 method for class 'silhouette':
summary(object, FUN = mean, ...)

## S3 method for class 'silhouette':
plot(x, nmax.lab = 40, max.strlen = 5,
     main = NULL, sub = NULL,
     xlab = expression("Silhouette width "* s[i]),
     col = "gray", do.col.sort = length(col) > 1, border = 0,
     cex.names = par("cex.axis"), do.n.k = TRUE, do.clus.stat = TRUE, ...)
x 
an object of appropriate class; for the default
method an integer vector with k different integer cluster
codes or a list with such an x$clustering
component. Note that silhouette statistics are only defined if
2 <= k <= n-1, where n is the number of observations. 
dist 
a dissimilarity object inheriting from class
dist or coercible to one. If not specified,
dmatrix must be. 
dmatrix 
a symmetric dissimilarity matrix (n * n),
specified instead of dist, which can be more efficient. 
object 
an object of class silhouette. 
... 
further arguments passed to and from methods. 
FUN 
function used to summarize silhouette widths. 
nmax.lab 
integer indicating the number of labels which is considered too large for single-name labeling of the silhouette plot. 
max.strlen 
positive integer giving the length to which strings are truncated in silhouette plot labeling. 
main, sub, xlab 
arguments to title; these have a
sensible non-NULL default here. 
col, border, cex.names 
arguments passed to
barplot(); note that the default used to be col
= heat.colors(n), border = par("fg") instead. col can also be a color vector of length k for
clusterwise coloring, see also do.col.sort.

do.col.sort 
logical indicating if the colors col should
be sorted "along" the silhouette; this is useful for casewise or
clusterwise coloring. 
do.n.k 
logical indicating if n and k "title text" should be written. 
do.clus.stat 
logical indicating if cluster sizes and averages should be written to the right of the silhouettes. 
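The dist and dmatrix arguments of the default method are alternatives. The following is a minimal sketch, assuming the cluster package is installed and using its ruspini data with a pam() clustering purely for illustration; the object names cl, d, s1 and s2 are made up here.

```r
library(cluster)
data(ruspini)

cl <- pam(ruspini, 4)$clustering   # integer cluster codes, one per observation
d  <- dist(ruspini)                # a "dist" object (Euclidean by default)

s1 <- silhouette(cl, dist = d)                 # dissimilarities given as dist
s2 <- silhouette(cl, dmatrix = as.matrix(d))   # same, as a symmetric n x n matrix

# Both calls are expected to give the same silhouette widths
all.equal(unname(s1[, "sil_width"]), unname(s2[, "sil_width"]))
```

Passing dmatrix is mainly of interest when the full matrix already exists, so that no conversion is needed.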
For each observation i, the silhouette width s(i) is
defined as follows:
Put a(i) = average dissimilarity between i and all other points of the
cluster to which i belongs. For all other clusters C, put
d(i,C) = average dissimilarity of i to all observations of C. The
smallest of these d(i,C) is b(i) := min_C d(i,C),
and can be seen as the dissimilarity between i and its "neighbor"
cluster, i.e., the nearest one to which it does not belong.
Finally,
s(i) := ( b(i) - a(i) ) / max( a(i), b(i) ).
Observations with a large s(i) (close to 1) are very well clustered; a small s(i) (around 0) means that the observation lies between two clusters; and observations with a negative s(i) are probably placed in the wrong cluster.
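The definition above can be followed step by step in base R. This is only an illustrative sketch, not the package's implementation: manual_sil is a hypothetical helper name, and giving s(i) = 0 to an observation in a singleton cluster is an assumption that the text above does not state.

```r
# Hypothetical helper: silhouette widths computed directly from the definition.
# d  : dissimilarities (a "dist" object or a symmetric n x n matrix)
# cl : integer vector of cluster codes, one per observation
manual_sil <- function(d, cl) {
  d   <- as.matrix(d)
  n   <- nrow(d)
  ids <- sort(unique(cl))
  s   <- numeric(n)
  for (i in seq_len(n)) {
    own <- setdiff(which(cl == cl[i]), i)   # other members of i's own cluster
    if (length(own) == 0L) {                # singleton cluster: assumed s(i) = 0
      s[i] <- 0
      next
    }
    a <- mean(d[i, own])                    # a(i): average within-cluster dissimilarity
    others <- ids[ids != cl[i]]
    # b(i): smallest average dissimilarity d(i, C) to any other cluster C
    b <- min(vapply(others, function(C) mean(d[i, cl == C]), numeric(1)))
    s[i] <- (b - a) / max(a, b)
  }
  s
}

# Tiny one-dimensional example: two tight, well separated groups
x  <- c(0, 1, 10, 11)
cl <- c(1L, 1L, 2L, 2L)
s  <- manual_sil(dist(x), cl)
print(round(s, 3))
```

For the first point, a(1) = 1 and b(1) = (10 + 11) / 2 = 10.5, so s(1) = 9.5 / 10.5, a value close to 1 as expected for a well clustered observation.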
silhouette() returns an object, sil, of class silhouette which is an [n x 3] matrix with attributes. For
each observation i, sil[i,] contains the cluster to which i
belongs as well as the neighbor cluster of i (the cluster, not
containing i, for which the average dissimilarity between its
observations and i is minimal), and the silhouette width s(i) of
the observation. The colnames correspondingly are
c("cluster", "neighbor", "sil_width").
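Because the result is a matrix, its columns can be used with ordinary matrix indexing. The sketch below assumes the cluster package and its ruspini data; the default method is used so that row i corresponds to observation i, and the names cl and sil are illustrative.

```r
library(cluster)
data(ruspini)

cl  <- pam(ruspini, 4)$clustering
sil <- silhouette(cl, dist(ruspini))   # default method: rows in data order

colnames(sil)                          # "cluster" "neighbor" "sil_width"
head(sil[, "sil_width"])               # s(i) for the first few observations

# Observations that look misplaced (negative width), with their neighbor cluster
bad <- which(sil[, "sil_width"] < 0)
sil[bad, c("cluster", "neighbor"), drop = FALSE]

# Clusterwise mean silhouette width
tapply(sil[, "sil_width"], sil[, "cluster"], mean)
```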
summary(sil) returns an object of class summary.silhouette, a list with components
si.summary 
numerical summary of the individual
silhouette widths s(i). 
clus.avg.widths 
numeric (rank 1) array of clusterwise
means of silhouette widths where mean = FUN is used. 
avg.width 
the total mean FUN(s) where s are the
individual silhouette widths. 
clus.sizes 
table of the k cluster sizes. 
call 
if available, the call creating sil. 
Ordered 
logical identical to attr(sil, "Ordered"), see
below. 
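The components listed above can be read off the summary like any list element. A brief sketch, assuming the cluster package and its ruspini data; sil and ssi are illustrative names.

```r
library(cluster)
data(ruspini)

sil <- silhouette(pam(ruspini, 4))
ssi <- summary(sil)                  # FUN = mean by default

ssi$avg.width                        # overall mean silhouette width
ssi$clus.avg.widths                  # one mean per cluster
ssi$clus.sizes                       # table of cluster sizes

# A different summary function, here the median
summary(sil, FUN = median)$avg.width
```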
sortSilhouette(sil) orders the rows of sil as in the
silhouette plot, by cluster (increasingly) and decreasing silhouette
width s(i).
attr(sil, "Ordered") is a logical indicating if sil is
ordered as by sortSilhouette(). In that case,
rownames(sil) will contain case labels or numbers, and
attr(sil, "iOrd") the ordering index vector.
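The ordering attributes can be inspected directly. This sketch assumes the cluster package and its ruspini data, and starts from the default method so that the input is not yet in plot order; sil and ssil are illustrative names.

```r
library(cluster)
data(ruspini)

cl   <- pam(ruspini, 4)$clustering
sil  <- silhouette(cl, dist(ruspini))   # default method, rows in data order
ssil <- sortSilhouette(sil)

attr(ssil, "Ordered")                   # TRUE after sorting
head(attr(ssil, "iOrd"))                # index vector mapping back to the input rows
head(rownames(ssil))                    # case labels or numbers
head(ssil[, c("cluster", "sil_width")]) # cluster increasing, width decreasing within
```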
While silhouette() is intrinsic to the partition clusterings, and hence has a (trivial) method
for these, it is straightforward to get silhouettes from hierarchical
clusterings from silhouette.default() with cutree() and distance as input.
Rousseeuw, P.J. (1987) Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math., 20, 53–65.
Chapter 2 of Kaufman, L. and Rousseeuw, P.J. (1990), see
the references in plot.agnes.
partition.object, plot.partition.
data(ruspini)
pr4 <- pam(ruspini, 4)
str(si <- silhouette(pr4))
(ssi <- summary(si))
plot(si) # silhouette plot

si2 <- silhouette(pr4$clustering, dist(ruspini, "canberra"))
summary(si2) # has small values: "canberra"'s fault
plot(si2, nmax = 80, cex.names = 0.6)

par(mfrow = c(3,2), oma = c(0,0, 3, 0))
for(k in 2:6)
   plot(silhouette(pam(ruspini, k=k)), main = paste("k = ",k), do.n.k=FALSE)
mtext("PAM(Ruspini) as in Kaufman & Rousseeuw, p.101",
      outer = TRUE, font = par("font.main"), cex = par("cex.main"))

## Silhouette for a hierarchical clustering:
ar <- agnes(ruspini)
si3 <- silhouette(cutree(ar, k = 5), # k = 4 gave the same as pam() above
                  daisy(ruspini))
plot(si3, nmax = 80, cex.names = 0.5)
## 2 groups: Agnes() wasn't too good:
si4 <- silhouette(cutree(ar, k = 2), daisy(ruspini))
plot(si4, nmax = 80, cex.names = 0.5)