Penn State University, Eberly College of Science, Center for Astrostatistics


 

Summer School in Statistics for Astronomers & Physicists
June 5-17, 2005

 

1. DISCRIMINANT ANALYSIS using MINITAB

0. Download the data file halodiscrim2.xls
1. Start Minitab:
  Click Start
      -> All Programs
      -> Spreadsheets & Statistics
      -> Minitab 14
      -> Minitab
  Minitab is now open.
 
2. To read the Excel data
  Go to File -> Open Worksheet
  Alert!!!
       1. Change "Files of type" to .xls at the bottom
       2. Click Options at the bottom and select "None" for variable names
  Now we can open the data set in Minitab
 
3. To use command lines (instead of the pull-down menus)
  Go to Editor -> Enable Commands

Let us carry out the analysis (linear and quadratic discriminant analysis) of the data set halodiscrim2.xls, described in the Lecture Notes on Discriminant Analysis, with MINITAB and interpret the results. The data are in the Excel file downloaded in step 0. The first column contains the supervised (known) classification of the cases as NonHyades (1) and Hyades (2). The other columns contain the data on RA, DE, PMRA, and PMDE. The MINITAB commands are given below, where p1 and p2 stand for the prior probabilities of the two groups:

 

MTB> discriminant c1 c2-c5;
SUBC> priors p1 p2;
SUBC> xval.

 

You will notice from the cross-validation results that the linear discriminant function does not separate the two groups well enough.
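To see roughly what the xval subcommand reports, here is a minimal Python sketch of linear discriminant analysis with leave-one-out cross-validation. It is an illustration under stated assumptions, not MINITAB's implementation: the two-variable data are synthetic (not halodiscrim2.xls), the priors are taken as equal, and the function names (pooled_cov, lda_predict, loo_error) are hypothetical.

```python
import math
import random

def mean_vector(rows):
    """Mean of a list of 2-D points."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def pooled_cov(groups, means):
    """Pooled within-group 2x2 covariance matrix (shared by all groups in LDA)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    dof = 0
    for rows, m in zip(groups, means):
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for a in range(2):
                for b in range(2):
                    s[a][b] += d[a] * d[b]
        dof += len(rows) - 1
    return [[s[a][b] / dof for b in range(2)] for a in range(2)]

def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def lda_predict(x, groups, priors):
    """Return the index of the group with the largest linear discriminant score."""
    means = [mean_vector(g) for g in groups]
    s_inv = inverse_2x2(pooled_cov(groups, means))
    scores = []
    for m, prior in zip(means, priors):
        w = [s_inv[0][0] * m[0] + s_inv[0][1] * m[1],
             s_inv[1][0] * m[0] + s_inv[1][1] * m[1]]
        score = (x[0] * w[0] + x[1] * w[1]
                 - 0.5 * (m[0] * w[0] + m[1] * w[1]) + math.log(prior))
        scores.append(score)
    return scores.index(max(scores))

def loo_error(groups, priors):
    """Leave each case out in turn, refit on the rest, and classify the held-out case."""
    wrong = 0
    total = 0
    for k, rows in enumerate(groups):
        for i, x in enumerate(rows):
            training = [list(g) for g in groups]
            del training[k][i]
            if lda_predict(x, training, priors) != k:
                wrong += 1
            total += 1
    return wrong / total

# Synthetic, well-separated groups (an assumption made for illustration only).
random.seed(0)
groups = [
    [(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)) for _ in range(30)],
    [(random.gauss(4.0, 1.0), random.gauss(4.0, 1.0)) for _ in range(30)],
]
priors = [0.5, 0.5]
error_rate = loo_error(groups, priors)
print("leave-one-out error rate:", error_rate)
```

A high leave-one-out error rate on real data is the kind of evidence that suggests the linear rule is inadequate.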

You can perform quadratic discriminant analysis by adding the subcommand quadratic to the other subcommands:

MTB> discriminant c1 c2-c5;
SUBC> priors p1 p2;
SUBC> quadratic;
SUBC> xval.
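For reference, the difference between the two analyses can be written out in standard textbook form. This is a sketch; MINITAB's internal parametrization may differ. The symbols are assumptions introduced here: x is the vector of the four measurements, \bar{x}_k and S_k are the sample mean and covariance matrix of group k, S is the pooled covariance matrix, and p_k is the prior probability of group k (the values supplied to priors).

```latex
% Linear rule: one pooled covariance matrix S for both groups
d_k^{L}(x) = x^{\top} S^{-1} \bar{x}_k
             - \tfrac{1}{2}\, \bar{x}_k^{\top} S^{-1} \bar{x}_k + \ln p_k

% Quadratic rule: a separate covariance matrix S_k for each group
d_k^{Q}(x) = -\tfrac{1}{2} \ln \lvert S_k \rvert
             - \tfrac{1}{2}\, (x - \bar{x}_k)^{\top} S_k^{-1} (x - \bar{x}_k) + \ln p_k
```

In both cases a case x is assigned to the group k with the largest score. The quadratic rule reduces to the linear one when S_1 = S_2 = S, because the terms that do not depend on k cancel.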


2. EM ALGORITHM FOR MIXTURE ESTIMATION WITH EMMIX

For this exercise, let us use the unsupervised version of the same data set, which is given in the data file halomixture.txt. Download EMMIX, run it, and answer its prompts as follows:

program to be used:                           2
number of entities:                           149
number of variables:                          4
number of variables to be used:               4
number of components:                         2
covariance matrices:                          1 (equal)
automatic methods used for initial groupings, with 3 random start(s)
random seeds:                                 (give 3 random 4-digit numbers)
percent of the data used for initial start:   20 (say)

 

The output will consist of estimates of the mixture parameters (the mixing proportions, the component means, and the covariance matrix).
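The EM iteration behind this kind of fit can be sketched in a few lines of Python. This is a simplified illustration, not EMMIX itself: it handles one variable instead of four, uses synthetic data (not halomixture.txt), and starts from a single deterministic guess rather than several random starts. The function name em_two_component and all starting values are assumptions. The common variance mirrors the "equal" covariance option chosen above.

```python
import math
import random

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_component(data, mu, var, pi1, n_iter=200):
    """EM for a two-component 1-D normal mixture with a common variance."""
    mu1, mu2 = mu
    n = len(data)
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to component 1.
        tau = []
        for x in data:
            a = pi1 * normal_pdf(x, mu1, var)
            b = (1 - pi1) * normal_pdf(x, mu2, var)
            tau.append(a / (a + b))
        # M-step: update the mixing proportion, the means, and the common variance.
        n1 = sum(tau)
        n2 = n - n1
        pi1 = n1 / n
        mu1 = sum(t * x for t, x in zip(tau, data)) / n1
        mu2 = sum((1 - t) * x for t, x in zip(tau, data)) / n2
        var = sum(t * (x - mu1) ** 2 + (1 - t) * (x - mu2) ** 2
                  for t, x in zip(tau, data)) / n
    return pi1, (mu1, mu2), var

# Synthetic data: 60 points near 0 and 40 points near 5 (illustration only).
random.seed(1)
data = ([random.gauss(0.0, 1.0) for _ in range(60)]
        + [random.gauss(5.0, 1.0) for _ in range(40)])

# Crude starting values: extreme observations as means, overall variance.
grand_mean = sum(data) / len(data)
start_var = sum((x - grand_mean) ** 2 for x in data) / len(data)
pi1, (mu1, mu2), var = em_two_component(data, (min(data), max(data)), start_var, 0.5)
print("mixing proportion:", pi1)
print("means:", mu1, mu2)
print("common variance:", var)
```

Because EM can converge to different local maxima depending on where it starts, running it from several starting points and keeping the fit with the highest likelihood is a common safeguard, which is presumably why EMMIX asks for the number of random starts.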

