
Statistical Methodology for the National Virtual Observatory
The NAS Taylor/McKee Decadal Report on astronomy for 2000-2010 recommends
as a top priority the formation of a National Virtual Observatory (NVO)
to link archival datasets and catalogues from many existing astronomical
surveys. The effective use of such integrated massive datasets involves
more than just access and extraction of information: scientific understanding
requires sophisticated statistical modeling of the selected data. This
effort falls under the rubric of statistical inference and includes the
fields of multivariate analysis, nonparametrics, Bayesian analysis, spatial
point processes, density estimation and data mining. Large-scale multiwavelength
astronomical surveys present a variety of challenging new statistical and
algorithmic problems that require methodological advances. The principal
investigator and his colleagues address some of the critically important
statistical challenges raised by the NVO. Specific approaches include:
low-storage percentile estimation for large datasets, multi-resolution
k-dimensional (k-d) trees for clustering and outlier detection, and multi-dimensional
goodness-of-fit tests for comparison of multivariate astronomical datasets
with astrophysical models and simulations. Such an endeavor requires close
collaboration among statisticians, astronomers, and NVO specialists based
at different institutions. Developing a statistical toolkit within the
NVO software environment implementing both new and existing methods is
one of the central goals of this project.
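
As an illustration of what low-storage percentile estimation can look like, the following is a minimal Python sketch based on reservoir sampling. It is not the project's own method, which this abstract does not specify. The function names `reservoir_sample` and `estimate_percentile`, the reservoir size `k`, and the nearest-rank rule are all assumptions chosen for illustration.

```python
import random


def reservoir_sample(stream, k, seed=0):
    """Keep a uniform random sample of at most k items from a stream.

    Uses the classic reservoir-sampling scheme (Algorithm R), so memory
    use is O(k) regardless of how long the stream is.
    """
    rng = random.Random(seed)
    sample = []
    for i, x in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(x)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = x
    return sample


def estimate_percentile(stream, q, k=1000, seed=0):
    """Estimate the q-th percentile (0 <= q <= 100) from a size-k reservoir.

    Hypothetical helper: applies a nearest-rank rule to the sorted sample.
    The result is exact when the stream has at most k items and
    approximate otherwise.
    """
    sample = sorted(reservoir_sample(stream, k, seed))
    if not sample:
        raise ValueError("empty stream")
    idx = min(len(sample) - 1, int(q / 100 * len(sample)))
    return sample[idx]
```

A call such as `estimate_percentile(magnitudes, 50)` would give an approximate median of a long catalogue column (here the hypothetical iterable `magnitudes`) in a single pass. Accuracy improves as `k` grows, at the cost of more memory.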
As the data volume and complexity of astronomical findings have increased
enormously in recent decades, a paradigm shift is underway in the very nature
of observational astronomy. While in the past a single astronomer might
observe a handful of objects, today data mining of large digital sky archives
obtained at all wavelengths of light is becoming a major mode of study.
The astronomical community thus faces a key task: to enable efficient and
objective scientific exploitation of enormous multifaceted datasets. In
recognition of this need, the National Virtual Observatory (NVO) initiative
has recently emerged to federate numerous large digital sky archives and
develop tools to explore and understand these vast volumes of data. The
investigation here aims at developing statistical and computational methods
to achieve these goals. The cross-disciplinary team of astronomers and
statisticians brings advances in these fields into the toolbox of observational
astronomy. The project seeks not only to formulate effective techniques
to address NVO problems, but also to code these methods into statistical toolkits
within NVO software environments for the entire astronomical community.
The collaboration includes two institutions skilled in astrostatistics
(Penn State and Carnegie Mellon) and an institution at the center of the
NVO effort (California Institute of Technology). The participation by graduate
students and postdocs gives them a rare opportunity to develop skills needed
for cross-disciplinary work.
Last updated: 5 May 2004