Penn State University, Eberly College of Science, Center for Astrostatistics

 
CASt online resources

Selected articles in statistics for astronomy & physics

Two types of articles are presented here: broad reviews or discussions of statistical issues relevant to the observational physical sciences; and examples of recent statistical advances to illustrate the state of the art.  Papers from both the mathematical statistics and physical science literature are included.  While not required, a strong preference is given to articles whose full text is available on-line without constraint.  Articles are selected by the Center for Astrostatistics Board and Associates; please forward additional selections to Eric Feigelson (edf@astro.psu.edu).
 
Comparison of Bayesian and frequentist approaches
Bayesians, frequentists, and scientists
    by Bradley Efron. A brief, readable and fascinating view of 21st century applications of statistics to modern science involving a combination of frequentist and Bayesian approaches.  Topics include bootstrap, empirical Bayes and false discovery rate.  2005 ASA Presidential Address.  B. Efron (2005), JASA 100, 1
Bayesians, frequentists, and physicists
    A similar article oriented towards particle physics.  Topics include Feldman-Cousins bounds, model selection, James-Stein estimation, empirical Bayes. Appears in PhyStat 2003.
Bayesian reasoning vs. conventional statistics in high energy physics
    by G. D'Agostini. Another valuable discussion comparing frequentist and Bayesian methods in the physical sciences.  Talk at MaxEnt98 conference.

Confidence intervals and limits with small signals
A unified approach to the classical statistical analysis of small signals
    by Gary J. Feldman & Robert D. Cousins.  Seminal study in particle physics for construction of confidence intervals and limits when the signal is weak or absent.  A bibliography of related papers appears here.   Phys. Rev. D, 57, 3873-3889 (1998).
Lectures on statistics and numerical methods in HEP
    by Frank Porter & Roger Barlow.  Lectures on high energy physics given to the SLAC Users Organization in 2000.
A Fully Bayesian Computation of Upper Limits for Poisson Processes
    by Luc Demortier. Detailed treatment (2004). 

Image processing
Bayesian restoration of digital images employing Markov chain Monte Carlo
    by K. P. N. Murthy.  Invited review. (2005).
Morphological classification of galaxies by shapelet decomposition in the Sloan Digital Sky Survey. II. Multiwavelength classification
    by B. C. Kelly and T. A. McKay.  Principal components analysis of shapelet coefficients (after inclination correction) and a normal mixture model lead to a classification of galaxy morphologies.  Astron. J. 129, 1287-1310 (2005).
Bootstrap resampling as a tool for radio-interferometric imaging fidelity assessment
    by Athol Kemball and Adam Martinsek.  Model-based and subsample bootstrap methods are examined to test fidelity of features in radio interferometric imaging.  Astron. J. 129, 1760 (2005).
Multiscale likelihood analysis and complexity penalized estimation
    by Eric D. Kolaczyk and Robert D. Nowak.  A mathematical framework is presented for the application of multiscale models (e.g. wavelet decomposition) to count (Poisson) and categorical (binomial) as well as Gaussian data.  Here the data-based likelihood is subject to a multiscale factorization; in the Poisson case, it involves a recursive partitioning.  Application to photon-counting images is envisioned.  Annals of Statistics 32, 500-527 (2004)

Massive data sets
Class discovery in galaxy classification
    by David Bazell and David J. Miller.  Application of neural network mixture models to the star/galaxy classification problem.  Astrophys. J. 618, 723-32 (2005).
Statistical challenges with massive data sets in particle physics
    by Bruce Knuteson & Paul Padley.  Review of particle physics problems for statisticians.  Journal of Computational & Graphical Statistics (2003).

Bayesian methodology
Significance in gamma-ray astronomy - the Li & Ma problem in Bayesian statistics
    by S. Gillessen and H. L. Harney.  The significance of gamma-ray source existence in a Poisson data set with high background is examined in a Bayesian context.  Astron & Astrophys 430, 355-62 (2005).
Formal rules for selecting prior distributions: A review and annotated bibliography
    by Robert E. Kass and Larry Wasserman.  Discussion of non-informative (non-subjective) priors used in Bayesian inference with emphasis on Jeffreys's rules.  J Am Stat Assn 91, 1343 (1996).
Reviews and research on Bayesian inference in astrophysics
    by Thomas Loredo.  Several substantial lectures and research articles from 1989 to 2003 on the principles and prospects for Bayesian methods in astronomy.  Applications include Gaussian & Poisson problems (e.g. the neutrinos from SN 1987A), gamma ray bursts, periodograms for time series, spatial analysis of cosmic microwave background radiation, adaptive experimental design, and computational techniques.

Poisson processes

Equivalence theory for density estimation, Poisson processes and Gaussian white noise with drift
    by Lawrence D. Brown et al.  This very mathematical paper gives an example of a current theoretical study of Poisson processes which are often seen in astronomical and physics observations.  Annals of Statistics, 32, 2074-97 (2004).

Multivariate analysis
Analysis of Variance -- Why it is more important than ever
    by Andrew Gelman.  Discussion of ANOVA, a classical multivariate technique involving the structuring of regression coefficients into batches to improve prediction, in terms of exploratory data analysis, linear modeling, and hierarchical Bayesian regression.  Annals of Statistics 33, 1 (2005)
Least angle regression
    by Bradley Efron et al.  This paper discusses various computationally efficient methods for model selection in least-squares multiple regression; e.g. predicting redshift from a large database of properties of extragalactic objects.  Least angle regression is compared to the Lasso, boosting, and traditional stepwise regression techniques.  Annals of Statistics 32, 407-51 (2004) with commentaries.

Nonparametric statistics
Spectral classification technique for X-ray sources: Quartile analysis
    by Jaesub Hong et al.  Use of the median and ratio of quartiles to characterize the spectra of faint CCD X-ray sources.  Astrophys. J. 614, 508-17 (2004).
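The idea of summarizing a faint spectrum by its median and quartile ratio can be sketched in a few lines. The block below is a minimal illustration, not the authors' code: the function name `quantile_diagnostics`, the band limits `e_lo` and `e_up`, and the choice to normalize each quantile to the band are assumptions of this sketch.

```python
import numpy as np

def quantile_diagnostics(energies, e_lo, e_up):
    """Return (normalized median, Q25/Q75) for photon energies in [e_lo, e_up].

    Each quantile is rescaled to the band, Qx = (Ex - e_lo) / (e_up - e_lo),
    so that both diagnostics are dimensionless. This normalization is an
    assumption of the sketch.
    """
    e = np.asarray(energies, dtype=float)
    e = e[(e >= e_lo) & (e <= e_up)]          # keep photons inside the band
    if e.size == 0:
        raise ValueError("no photons inside the requested band")
    q25, q50, q75 = (np.percentile(e, [25, 50, 75]) - e_lo) / (e_up - e_lo)
    if q75 == 0.0:
        raise ValueError("75% quantile sits at the lower band edge")
    return q50, q25 / q75

# Hypothetical example: photon energies (keV) spread evenly over 0.3-8.0 keV.
photons = np.linspace(0.3, 8.0, 1001)
median, ratio = quantile_diagnostics(photons, 0.3, 8.0)
```

Because only order statistics of the photon energies are used, the two numbers remain defined for sources with very few counts, where fixed-band hardness ratios can become undefined.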

Time series analysis
Time series analysis in astronomy: Limits and potentialities
Wavelet-based estimation with multiple sampling rates
    by Peter Hall and Spiridon Penev.  This paper is an example of recent studies of the statistical properties of wavelets.  When a nonstationary signal in noise is sampled at discrete times, information may be lost when signal strength and structure increase.  Here an algorithm for adaptive switching between sampling rates based on high-frequency wavelet terms is presented. Annals of Statistics, 32, 1933-56 (2004)
Multiscale likelihood analysis and complexity penalized estimation
    by Eric D. Kolaczyk and Robert D. Nowak.  A mathematical framework is presented for the application of multiscale models (e.g. wavelet decomposition) to count (Poisson) and categorical (binomial) as well as Gaussian data.  Here the data-based likelihood is subject to a multiscale factorization; in the Poisson case, it involves a recursive partitioning.  Application to photon-counting images is envisioned.  Annals of Statistics 32, 500-527 (2004)
A selective overview of nonparametric methods in financial econometrics
    by Jianqing Fan.  This review gives insight into recent progress in modeling correlated but stochastic time series such as seen in stock prices, gamma-ray bursts, accretion binaries, or BL Lac objects. The procedures model nonstationary autoregressive processes with heteroscedasticity (i.e. where the nature of the variations changes with time).  See his recent monograph Nonlinear Time Series: Nonparametric and Parametric Methods (2003).

Model selection & goodness-of-fit
A tutorial introduction to the minimum description length principle
    by Peter Grunwald.  This is a recent method of inference addressing the model selection problem that balances goodness-of-fit with model complexity, and thus avoids overfitting with too many parameters.  The mathematics is based on Kolmogorov Complexity, information theory and data compression, and the result is related to penalized likelihood criteria (AIC, BIC, RIC).  See also the MDL research Web site.
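As a rough illustration of how penalized likelihood criteria trade goodness-of-fit against complexity, the sketch below scores polynomial fits with AIC and BIC. It does not implement MDL itself. The helper `aic_bic_polynomial`, the Gaussian-error assumption, and the simulated data are all hypothetical choices made for this example.

```python
import numpy as np

def aic_bic_polynomial(x, y, degree):
    """Return (AIC, BIC) for a least-squares polynomial fit.

    Assumes independent Gaussian errors with unknown variance, so the
    maximized log-likelihood is -n/2 * (ln(2*pi*sigma2_hat) + 1).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    coeffs = np.polyfit(x, y, degree)
    rss = float(np.sum((y - np.polyval(coeffs, x)) ** 2))
    sigma2_hat = rss / n                      # maximum-likelihood variance
    log_like = -0.5 * n * (np.log(2.0 * np.pi * sigma2_hat) + 1.0)
    k = degree + 2                            # polynomial coefficients + variance
    aic = 2.0 * k - 2.0 * log_like
    bic = k * np.log(n) - 2.0 * log_like
    return aic, bic

# Simulated data from a quadratic with Gaussian noise (illustrative only).
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 200)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.1, size=x.size)

scores = {d: aic_bic_polynomial(x, y, d) for d in range(7)}
best_by_bic = min(scores, key=lambda d: scores[d][1])
```

Both criteria subtract twice the maximized log-likelihood and add a penalty that grows with the number of parameters; BIC's penalty also grows with the sample size, so it tends to prefer simpler models than AIC on large data sets.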

Spatial point processes
Estimating the J function without edge correction
    by Adrian Baddeley et al.  The J function is a combination of the empty space (~ the astronomers' void probability) function and nearest-neighbour distance distribution (~ 2-point correlation) function in a spatial point process (e.g. distribution of galaxies in space).  This study proposes a Monte Carlo test of the importance of weighting due to edge effects (~ survey boundaries).  (1997)
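The J function is conventionally defined as J(r) = (1 - G(r)) / (1 - F(r)), where F is the empty space function and G is the nearest-neighbour distance distribution. The sketch below computes raw empirical estimates with no edge correction. The function name `j_function_uncorrected` and the use of a caller-supplied set of test locations for F are assumptions of this example, and it is not a reproduction of the paper's Monte Carlo test.

```python
import numpy as np

def j_function_uncorrected(points, test_points, r):
    """Empirical F, G and J = (1 - G) / (1 - F) without edge correction.

    points      : (n, d) array of observed locations (n >= 2)
    test_points : (m, d) array of reference locations used to estimate F
    r           : 1-D array of distances at which to evaluate the functions
    """
    pts = np.asarray(points, dtype=float)
    ref = np.asarray(test_points, dtype=float)
    r = np.asarray(r, dtype=float)

    # Nearest-neighbour distance from each data point to another data point.
    d_pp = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(d_pp, np.inf)            # exclude self-distances
    nn_dist = d_pp.min(axis=1)

    # Empty-space distance from each test location to the nearest data point.
    d_rp = np.linalg.norm(ref[:, None, :] - pts[None, :, :], axis=-1)
    es_dist = d_rp.min(axis=1)

    G = (nn_dist[:, None] <= r[None, :]).mean(axis=0)
    F = (es_dist[:, None] <= r[None, :]).mean(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        J = np.where(F < 1.0, (1.0 - G) / (1.0 - F), np.nan)
    return F, G, J

# Illustrative use on uniformly scattered points in the unit square.
rng = np.random.default_rng(1)
data = rng.uniform(size=(200, 2))
grid = rng.uniform(size=(500, 2))
F, G, J = j_function_uncorrected(data, grid, np.linspace(0.0, 0.05, 11))
```

For a homogeneous Poisson process J(r) = 1 at all r; values below 1 suggest clustering and values above 1 suggest regularity. The brute-force distance matrices are only practical for modest sample sizes.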

Multivariate clustering
A robust method for cluster analysis
    by Maria T. Gallegos and Gunter Ritter.  A treatment of multivariate clustering when outliers are present.  A subset of the observations is partitioned into clusters using a maximum-likelihood estimator so that the pooled sum of squares and products matrix has minimum determinant.  Annals of Statistics 33, 347-380 (2005)
Model-based clustering, discriminant analysis, and density estimation
    by Chris Fraley and Adrian E. Raftery.  A review of recent methods for discrimination of groups in multivariate datasets, mixture and classification models.  J. Amer. Statist. Assoc. 97, 611-631 (2002)

False Discovery Rate method
Multiple Comparison Procedures
    Hochberg, Y. and Tamhane, A. (Wiley, 1987)
Controlling the False Discovery rate in astrophysical data analysis
    Miller, C. J. et al (PiCA collaboration)  Astron. J.  122, 3492-3505 (2002).
A stochastic process approach to False Discovery Rates
    Genovese C., Wasserman L. Annals of Statistics 32, 1035-1061 (2004).
Estimating the proportion of false null hypotheses among a large number of independently tested hypotheses
    Meinshausen, N and Rice, J. (2005)
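For readers new to the topic, the most widely used FDR method is the step-up procedure of Benjamini and Hochberg (1995). The sketch below is a minimal version that assumes independent tests; the function name `benjamini_hochberg` is hypothetical, and the entries above may use refinements of this basic procedure.

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean mask of hypotheses rejected at FDR level alpha.

    Step-up rule: sort the p-values, find the largest rank k with
    p_(k) <= alpha * k / n, and reject the k smallest p-values.
    """
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)                     # indices of p-values, ascending
    thresholds = alpha * np.arange(1, n + 1) / n
    below = p[order] <= thresholds
    reject = np.zeros(n, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])      # largest rank meeting the bound
        reject[order[: k + 1]] = True
    return reject

# Illustrative p-values: the two smallest fail their own thresholds but are
# still rejected because the third-ranked value passes (step-up behaviour).
mask = benjamini_hochberg([0.02, 0.03, 0.035, 0.9], alpha=0.05)
```

Unlike a Bonferroni correction, which compares every p-value to alpha / n, the threshold here scales with rank, which is what gives the method more power when many tests are carried out at once.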