Multivariate classification & analysis
of North America (CSNA)
Metasite with links to classification meetings, journals,
discussion groups, commercial and on-line software.
PAM, CLARA, FANNY, AGNES, DIANA & MONA
Collection of multivariate clustering techniques implemented in
the core R package. DAISY computes
dissimilarities between objects with different types of
variables. Partitioning Around Medoids (PAM) partitions the
dataset using the k-medoid method, which is robust against outliers.
Clustering Large Applications (CLARA) partitions large data sets.
Fuzzy Analysis (FANNY) gives a fuzzy partitioning. Agglomerative
clustering (AGNES) and divisive clustering (DIANA) give hierarchical
structures. Monothetic Analysis (MONA) uses binary variables.
This site gives stand-alone Fortran implementations. From the
book Finding Groups in Data: An
Introduction to Cluster Analysis by L. Kaufman and P. J.
Normal mixture models
Several codes are available that classify and characterize multivariate
datasets as mixtures of Gaussian populations via likelihood methods,
often using the EM Algorithm and Bayesian principles. Snob uses the
minimum message length method of machine learning.
G. McLachlan of University of Queensland
C. Fraley and A. Raftery of University of Washington
AutoClass C by P. Cheeseman of NASA's Ames Research Center
D. Dowe of Monash University
FastEM by the Auton Lab (CMU) and the PiCA Collaboration
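To make the EM approach behind these mixture codes concrete, here is a minimal sketch for the simplest case: a two-component univariate Gaussian mixture. It is not taken from any of the packages listed above. The function names `normal_pdf` and `em_two_gaussians`, the min/max initialization, and the variance floor are assumptions made for illustration; the listed codes handle multivariate data and model selection, which this sketch does not.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(data, n_iter=200):
    """Fit a two-component 1-D Gaussian mixture by EM (illustrative sketch).

    Returns (weights, means, variances). Starting the means at the data
    minimum and maximum is an arbitrary illustrative choice.
    """
    n = len(data)
    grand_mean = sum(data) / n
    grand_var = sum((x - grand_mean) ** 2 for x in data) / n
    weights = [0.5, 0.5]
    means = [min(data), max(data)]
    variances = [grand_var, grand_var]
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to each component.
        resp = []
        for x in data:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: weighted maximum-likelihood updates of the parameters.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, 1e-6)  # floor guards against a collapsed component
    return weights, means, variances


data = [0.0, 0.2, -0.1, 0.1, -0.2, 10.0, 10.1, 9.9, 10.2, 9.8]
weights, means, variances = em_two_gaussians(data)
```

With two well-separated groups like the toy data here, the fitted means should settle near the two group averages. A production code would also monitor the log-likelihood for convergence and try several starting points.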
Machine learning algorithms for data mining including
multivariate classifiers, decision trees, neural nets, GUIs, resampling
and more. In Java
Library in C++ (MLC++)
Data mining and multivariate classification package including
data manipulation, variety of categorizers (on attributes, thresholds,
nearest neighbor, perceptron, decision tree), induction algorithms,
and visualization tools of data and trees. From Silicon Graphics
GRB Tool Shed
Interactive environment for the analysis of astronomical
gamma-ray bursts from NASA's BATSE experiment. Emphasizes
multivariate classification including supervised decision trees,
K* nearest neighbor, Naive Bayes, normal mixtures using the EM
Algorithm, k-means, COBWEB, backpropagation neural networks, and
Kohonen networks. Based on the Weka machine learning
package. By Jon Hakkila (College of Charleston) and colleagues.
Algorithms for the common high breakdown estimation criteria, and
to find the minimum volume ellipsoid in multivariate datasets. By D.
Hawkins, University of Minnesota, and distributed by Statlib.
classifier 1 (OC1)
Partitioning of multivariate datasets using oblique and
axis-parallel hyperplanes. Written in C by S. Salzberg of Johns Hopkins
for clustering and multivariate analysis
Metasite with descriptions of on-line programs and
packages. From Fionn Murtagh (Univ. London)
Clustering algorithm based on dynamic altering of hierarchies.
Fast Algorithm for
Tree-structured classification similar to CART.
Library of several dozen subroutines from NIST for multivariate
clustering algorithms from the 1975 monograph by J. A. Hartigan.
Six programs computing dissimilarities, partitioning using
medoids, k-medoid clustering, fuzzy clustering, agglomerative and
divisive hierarchical clustering, clustering of binary data.
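As an illustration of partitioning around medoids, the sketch below runs a greedy swap search on a precomputed dissimilarity matrix. It is a simplified stand-in, not the Fortran code of these programs: the names `total_cost` and `k_medoids` are hypothetical, and starting from the first k objects is an assumption that replaces the more careful initial selection a full implementation would use.

```python
def total_cost(dist, medoids):
    """Sum over all objects of the dissimilarity to the nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def k_medoids(dist, k):
    """Greedy swap search for k medoids (illustrative sketch).

    dist is a square matrix of pairwise dissimilarities. Returns the medoid
    indices, the medoid assigned to each object, and the total cost.
    """
    n = len(dist)
    medoids = list(range(k))  # naive start: the first k objects (assumption)
    best = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for candidate in range(n):
                if candidate in medoids:
                    continue
                # Try replacing one medoid by a non-medoid; keep the swap if it helps.
                trial = medoids[:pos] + [candidate] + medoids[pos + 1:]
                cost = total_cost(dist, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    labels = [min(medoids, key=lambda m: dist[i][m]) for i in range(n)]
    return medoids, labels, best


values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
dist = [[abs(a - b) for b in values] for a in values]
medoids, labels, cost = k_medoids(dist, 2)
```

Because the cluster centers are actual objects and the cost uses dissimilarities rather than squared distances, a single extreme value pulls the solution less than it would pull a mean.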
Average-linkage hierarchical clustering.
Algorithm for agglomerative clustering using various criteria
(Ward's minimum variance, single linkage, average linkage, complete
linkage, McQuitty's method, median method, centroid method).
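The role of the linkage criterion can be seen in a naive agglomerative sketch. The code below is not the listed algorithm; it is a small, slow illustration with a hypothetical function name `agglomerate`. It covers only the single, complete, and average linkage rules. Ward's, McQuitty's, median, and centroid criteria are omitted.

```python
def agglomerate(points, n_clusters, linkage="average"):
    """Merge the closest pair of clusters until n_clusters remain (sketch)."""

    def dist(a, b):
        # Euclidean distance between two points.
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    def cluster_distance(c1, c2):
        pairwise = [dist(points[i], points[j]) for i in c1 for j in c2]
        if linkage == "single":
            return min(pairwise)      # nearest pair of members
        if linkage == "complete":
            return max(pairwise)      # farthest pair of members
        if linkage == "average":
            return sum(pairwise) / len(pairwise)
        raise ValueError("unknown linkage: " + linkage)

    clusters = [[i] for i in range(len(points))]  # start with singletons
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = cluster_distance(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
groups = agglomerate(points, 3, linkage="average")
```

Recording each merge and its distance, rather than stopping at a fixed number of clusters, would give the full hierarchy that a dendrogram displays.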
Algorithm for single-linkage and minimum intra-cluster variance
clustering. Applied Statistics algorithm #58.
k-means clustering minimizing intra-cluster variance.
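For orientation, a minimal k-means sketch in the usual alternating (Lloyd-style) form follows. It is not the listed code: the name `k_means` is hypothetical, and seeding the centers with the first k points is an assumption made to keep the example deterministic. Each pass assigns points to the nearest center and then moves each center to the mean of its members, which never increases the intra-cluster variance.

```python
def k_means(points, k, n_iter=100):
    """Alternate assignment and center updates until labels stop changing (sketch)."""

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centers = [list(p) for p in points[:k]]  # naive seeding (assumption)
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest center for every point.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each center becomes the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (9.0, 8.0), (2.0, 1.0), (8.0, 9.0)]
centers, labels = k_means(points, 2)
```

The result depends on the starting centers and may be only a local optimum, so practical codes typically restart from several initial configurations.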
Package in Pascal developed for ecological spatio-temporal
multivariate datasets based on monograph by L. & P. Legendre
(1983). Functionalities include autocorrelation using correlograms
(Moran's I and Geary's c indices), hierarchical agglomerative
clustering, k-means clustering, chronological clustering for
multivariate time series, analysis of variance, geometrical connectors,
(nearest neighbor, Gabriel's connection, Delaunay triangulation),
Mantel's two-sample statistic, multidimensional scaling by principal
coordinates analysis, univariate periodogram. [This package
should not be confused with the enormous R statistical package modeled
Large multivariate analysis and graphical display package
designed for ecologists and geographers. Includes principal components
analysis with instrumental variables, correspondence analysis,
coinertia analysis, contingency tables, discriminant analysis, fuzzy
correspondence analysis, Rao's diversity coefficient, Moran's I and
Geary's c randomization tests for spatial autocorrelation, Wartenberg's
correlation analysis, partial triadic analysis of k-tables. From
the bioinformatic group at Universite de Lyon for Macintosh
and Windows 95 platforms.
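As a reminder of what the Moran's I index measures, the sketch below computes it from a vector of values and a spatial weight matrix, using the standard definition I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, where W is the sum of all weights. The function name `morans_i` and the toy neighbor weights are assumptions for illustration. The randomization test offered by the package is not reproduced.

```python
def morans_i(values, weights):
    """Moran's I spatial autocorrelation index (illustrative sketch).

    weights[i][j] is the spatial weight between sites i and j.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / total_weight) * cross / sum(d * d for d in dev)


# Four sites along a line; adjacent sites get weight 1 (hypothetical layout).
values = [1.0, 2.0, 3.0, 4.0]
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
index = morans_i(values, weights)
```

Positive values indicate that neighboring sites tend to have similar values, as in this smoothly increasing example.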
Minimum Covariance Determinant (MCD)
This is a highly robust estimator of multivariate location and
scatter based on the subset of points whose covariance matrix has the
lowest determinant. Efficient method for large datasets. By
P. Rousseeuw and K. Van Driessen of University of Antwerp.
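The definition of the estimator can be illustrated by exhaustive search on a tiny bivariate dataset. The sketch below is emphatically not the efficient algorithm distributed here. It enumerates every subset of size h, which is feasible only for very small samples. The function name `mcd_bruteforce` and the toy data are assumptions.

```python
from itertools import combinations


def mcd_bruteforce(points, h):
    """Exhaustive minimum covariance determinant for 2-D data (sketch).

    Returns (location, covariance, determinant) of the h-point subset whose
    sample covariance matrix has the smallest determinant.
    """
    best = None
    for subset in combinations(points, h):
        mx = sum(p[0] for p in subset) / h
        my = sum(p[1] for p in subset) / h
        sxx = sum((p[0] - mx) ** 2 for p in subset) / (h - 1)
        syy = sum((p[1] - my) ** 2 for p in subset) / (h - 1)
        sxy = sum((p[0] - mx) * (p[1] - my) for p in subset) / (h - 1)
        det = sxx * syy - sxy ** 2
        if best is None or det < best[2]:
            best = ((mx, my), ((sxx, sxy), (sxy, syy)), det)
    return best


# Five clustered points plus one gross outlier (toy data).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5), (20.0, 20.0)]
location, covariance, det = mcd_bruteforce(points, 5)
```

Here the selected subset leaves out the outlier, so the location estimate stays at the center of the main cloud, whereas the ordinary mean would be dragged toward (20, 20).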
Volume Ellipsoid (MINVOL)
Computes highly robust location and scatter matrix. By P.
Rousseeuw of University of Antwerp.
data analysis software
Collection of subroutines for principal components analysis,
partitioning, hierarchical clustering, discriminant analyses (linear,
multiple, k-nearest neighbors), correspondence analysis,
multidimensional scaling, Sammon mapping, Kohonen self-organizing
feature map. From Fionn Murtagh (Univ. London).
Self-contained data management and analysis system well-adapted
to very large multivariate datasets. Includes fast searches
and data mining, ANOVA, linear modeling, clustering, life table
analysis. For Windows.
Interactive Projection Pursuit, providing 1- and 2-dimensional
projections of multivariate data for interactive discovery of
structure. The user chooses and graphically investigates interesting
projections. From Case Western Reserve University. C and Fortran
algorithms installed as a library for S-Plus.
Two-dimensional exploratory projection pursuit.
skewness and kurtosis
Distribution function of the squared multiple correlation
dependency analysis for multivariate data
linear regression by least median of squares.
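The least median of squares criterion replaces the sum of squared residuals with their median, so that up to roughly half the points can be outliers without ruining the fit. The sketch below is an approximation under stated assumptions, not the listed code: it only considers lines through pairs of data points and keeps the one with the smallest median squared residual. The name `lms_line` is hypothetical.

```python
from itertools import combinations
from statistics import median


def lms_line(xs, ys):
    """Approximate least-median-of-squares line via point pairs (sketch).

    Returns (slope, intercept, median squared residual) of the best
    candidate line among those passing through two data points.
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # vertical candidate line, skip
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        crit = median((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
        if best is None or crit < best[2]:
            best = (slope, intercept, crit)
    return best


# Points on y = 2x + 1 with two gross outliers at the end (toy data).
xs = [float(x) for x in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
ys[8] = 100.0
ys[9] = -50.0
slope, intercept, crit = lms_line(xs, ys)
```

An ordinary least-squares fit to the same data would be pulled strongly by the two outliers, while the median criterion ignores them as long as most points agree on a line.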
Robust estimator of multivariate location and dispersion.
Hypothesis testing for means and spreads for multivariate