Penn State University, Eberly College of Science, Center for Astrostatistics


The R package

R, closely related to the commercial package S-Plus, is the largest and most comprehensive freely available, open-source statistical computing environment.  It provides a coherent, flexible programming environment for data analysis, applied mathematics, statistical analysis, and graphics.  Unlike some menu-driven statistical packages, the user interacts with R through a C-like command language, with graphics appearing in pop-up windows.  The core R package is enhanced by several hundred user-supplied add-on packages in the Comprehensive R Archive Network (CRAN) and the Omegahat Project for Statistical Computing.  Binary executables and source code for Linux, Windows and MacOS can be downloaded for immediate use.  R has extensive documentation.  Here we list some of its capabilities that may be of interest to the physical scientist.
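As a rough illustration of the command language described above, the following minimal sketch shows vector arithmetic and a user-defined function with C-like braces. The data values and the function name rms are hypothetical, chosen only for this example.

```r
# Vector arithmetic: operations apply element-wise, with no explicit loop
x <- c(1.5, 2.0, 3.5, 4.0)   # hypothetical measurements
y <- 2 * x + 1

# A user-defined function (hypothetical name) with C-like braces and control flow
rms <- function(v) {
  if (length(v) == 0) {
    return(NA_real_)         # no data: return a missing value
  }
  sqrt(mean(v^2))            # root-mean-square of the vector
}

x_rms <- rms(x)
print(y)
print(x_rms)

# In an interactive session, plot(x, y) would open a graphics window.
```

Typed at the R prompt, each line is evaluated immediately, which is the main practical difference from a menu-driven package.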

The base R package includes:

    • arithmetic (scalar/vector/array)
    • bootstrap resampling and confidence intervals (basic, ABC, percentile, studentized, tilted, jackknife)
    • correlation coefficients (Pearson, Kendall, Spearman)
    • distributions (Gaussian, Poisson, and many other statistical distributions and special functions, including random deviates)
    • empirical distribution tests (Anderson-Darling, Cramer-von Mises, Kolmogorov-Smirnov) and quantiles
    • exploratory data analysis
    • generalized linear & generalized additive modelling
    • graphics, publication-quality (scatter, dendrograms, lattice, etc)
    • integration and interpolation
    • linear algebra and equation solutions (extensive methods)
    • linear mixed-effects modelling
    • linear modelling (including nonlinear functions), resistant regression, robust M-estimators
    • linear & quadratic programming (simplex, penalized constraints)
    • local and ridge regression (loess, variograms)
    • maximum likelihood estimation (AIC, BIC)
    • multivariate analysis (tabulations, ANOVA, discriminant, factor, principal components, Mahalanobis distances, MANOVA)
    • multivariate cluster analyses (agglomerative and divisive clustering, dissimilarity matrix, fuzzy, k-nearest neighbor, k-means & k-medoid partitioning, monothetic, recursive partitioning, regression trees, self-organizing maps)
    • neural networks (censored, least-squares, entropy, log-linear, maximum likelihood, perceptron)
    • nonlinear least-squares regression
    • smoothing (cross-validation, histograms, kernel, local regression, variogram)
    • sorting
    • spatial analysis & point processes (correlogram, kriging, Moran's I, Geary's C, pattern analysis, polynomial surface, simulation, variogram)
    • splines (B-spline, periodic, polynomial)
    • statistical tests, parametric & nonparametric (Ansari, Bartlett, binomial, Box, F, Fisher, Fligner, Friedman, Mantel-Haenszel, Mauchly, McNemar, Mood, proportions, Shapiro, t, Wilcoxon, signed rank)
    • survival analysis for censored data (Cox regression, Kaplan-Meier & Fleming-Harrington survival curves, life table, linear regression, ridge regression, tobit modelling, Weibull & other survival curve fitting, k-sample tests)
    • time series analysis (ARMA, acf, Box-Jenkins, FFT, Kalman filter, lags, mixed-effects, prediction, smoothing, spectral analysis) 
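
A few of the capabilities listed above can be sketched with functions from the stats package that is distributed and loaded with R. The simulated data, the seed, and the variable names below are assumptions made for this illustration only.

```r
set.seed(42)                               # make the simulated data reproducible
n <- 200
x <- rnorm(n)                              # Gaussian random deviates
y <- 1.0 + 2.0 * x + rnorm(n, sd = 0.5)    # hypothetical linear relation plus noise

# Correlation coefficients (Pearson, Kendall, Spearman)
r_pearson  <- cor(x, y, method = "pearson")
r_kendall  <- cor(x, y, method = "kendall")
r_spearman <- cor(x, y, method = "spearman")

# Empirical distribution test: Kolmogorov-Smirnov against a standard normal
ks <- ks.test(x, "pnorm")

# Sample quantiles
q <- quantile(x, probs = c(0.25, 0.5, 0.75))

# Linear modelling: least-squares fit of y on x
fit <- lm(y ~ x)

print(c(pearson = r_pearson, kendall = r_kendall, spearman = r_spearman))
print(ks$p.value)
print(q)
print(coef(fit))
```

The fitted coefficients should lie close to the intercept and slope used to simulate the data; summary(fit) gives standard errors and further diagnostics.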

CRAN add-on packages treat:

   (see Chapter 5 for brief individual descriptions)

    • adaptive quadrature
    • ARIMA modeling
    • Bayesian computation (empirical Bayes, MCMC calculations & diagnostics, survival regression, logit/probit, networks)
    • Boolean hypotheses
    • boosting
    • bootstrap modelling
    • classification and regression trees
    • convex clustering & convex hulls
    • conditional inference
    • combinatorics
    • elliptical confidence regions
    • energy statistical tests
    • extreme value distribution
    • fixed point clusters
    • genetic algorithms
    • geostatistical modelling
    • GUIs (Rcmdr, SciViews)
    • heteroscedastic t-regression
    • hidden Markov models
    • hierarchical partitioning & clustering
    • independent component analysis
    • interpolation
    • irregular time series
    • kernel smoothing
    • kernel-based machine learning
    • k-nearest neighbor tree classifier
    • Kolmogorov-Zurbenko adaptive filtering
    • least-angle and lasso regression
    • linear programming (simplex)
    • likelihood ratios
    • local regression density estimators
    • logistic regression
    • map projections
    • Matlab emulator
    • matrices, sparse matrices, tensor decomposition
    • Markov chain Monte Carlo
    • mixture models
    • mixture discriminant analysis
    • model-based clustering
    • nonlinear least squares
    • Markov multistate models
    • mixture models & regression
    • multidimensional analysis
    • multimodality test
    • multivariate time series
    • multivariate Shapiro-Wilk test
    • multivariate outlier detection
    • multivariate normal partitioning
    • multivariate normals with missing data
    • neural networks
    • non-linear time series analysis
    • nonparametric multiple comparisons
    • omnibus tests for normality
    • orientation data, outlier detection
    • parallel coordinates plots
    • partial least squares
    • periodic autoregression analysis
    • Poisson-Gamma additive models
    • polychoric and polyserial correlations
    • principal component regression
    • principal curve fits
    • projection pursuit
    • proportional hazards modelling
    • quantile regression
    • quasi-variances
    • random fields
    • random forest classification
    • ridge regression
    • robust regression
    • Sampford sampling
    • segmented regression break points
    • self-organizing maps
    • shape analysis
    • space-time ecological data analysis
    • spatial analysis and kriging
    • spline fits & regressions (MARS, BRUTO)
    • structural regression with splines
    • tessellations & Delaunay triangulation
    • three-dimensional visualization
    • two-stage least squares regression
    • unit root tests
    • variogram diagnostics
    • wavelet toolbox & denoising
    • weighted likelihood robust inference

CRAN includes codes and datasets associated with textbooks on:

    • Bayesian statistics
    • bootstrapping
    • circular statistics
    • contingency tables
    • data analysis
    • engineering statistics
    • econometrics
    • kernel smoothing
    • generalized additive models
    • image analysis
    • linear regression
    • relative distribution methods
    • smoothing
    • survival analysis (censored data)
    • time-frequency analysis

Through base R, CRAN and the Omegahat Project, R interfaces to the following languages, formats and protocols:

    • Languages: BUGS, C, Fortran, Java, Python, Perl, XLisp
    • Headers: XML
    • I/O file structures: ASCII, binary, bitmapped images, ftp, gzip, MIM, Oracle, SAS, S-Plus, SPSS, Systat, Stata, URL, .wav
    • Web formats: cgi, HTML, Netscape, SOAP
    • Statistics packages: GRASS, Matlab (emulator), XGobi
    • Spreadsheets: Excel, Gnumeric
    • Graphics: Grace, Gtk, OpenGL, Tcl/Tk
    • Databases: MySQL, SQL, SQLite
    • Science/math libraries: GSL, lsoda, LAPACK
    • Parallel processing: PVM
    • Text processors: LaTeX
    • Network connections: sockets, DCOM, CORBA