VOStat's missing data policy
Statistical data sets are often riddled with missing values,
which can cause standard statistical methods in R to break down
in unexpected ways. To avoid this, VOStat filters out every row
of the data set that has at least one missing value. This
filtering is built into the uploading process, and VOStat
reports the indices of the deleted rows as part of the data
summary.
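The row filter described above can be sketched in a few lines of Python. This is only an illustration of the policy, not VOStat's actual code: the function name `drop_incomplete_rows`, the use of `None` to mark a missing value, and the 1-based row indices are all assumptions made for the example.

```python
def drop_incomplete_rows(rows):
    """Split rows into complete ones and the indices of deleted ones.

    A value of None stands for a missing entry (an assumption of this
    sketch). Indices are 1-based, also an assumption.
    """
    kept, deleted = [], []
    for index, row in enumerate(rows, start=1):
        if any(value is None for value in row):
            deleted.append(index)  # row has at least one missing value
        else:
            kept.append(row)
    return kept, deleted


# Hypothetical data set: rows 2 and 4 contain missing values.
data = [
    [1.2, 3.4, 5.6],
    [2.1, None, 0.7],
    [0.5, 1.1, 2.2],
    [None, None, 4.0],
]

kept, deleted = drop_incomplete_rows(data)
print(kept)     # [[1.2, 3.4, 5.6], [0.5, 1.1, 2.2]]
print(deleted)  # [2, 4]
```

Returning the deleted indices alongside the surviving rows mirrors the data summary mentioned above, so the user can see exactly which rows were dropped.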
Let us look at an example. Suppose that your data set is as
follows, where the red cells have missing values, and the green cells
have actual values.
[Table: example data set whose columns are grouped into "Important columns" and "Unimportant columns"; the colored cells are not reproduced here.]
Here VOStat will filter out every row, because each row in the
data set has at least one missing value, and will then produce
an error message complaining that the data set has size 0. This
is hardly desirable, since the missing values occur only in the
unimportant columns. VOStat has no way of knowing which columns
are important and which are not, so a good strategy is to remove
the unimportant columns (using some other software) before
uploading the data into VOStat.
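As a rough illustration of why removing the unimportant columns first helps, the sketch below counts the complete rows before and after trimming. The data, the column positions, and the helper names `select_columns` and `count_complete_rows` are hypothetical, and `None` again stands for a missing value.

```python
def select_columns(rows, keep):
    """Keep only the column positions listed in `keep` (0-based)."""
    return [[row[i] for i in keep] for row in rows]


def count_complete_rows(rows):
    """Count the rows that contain no missing (None) values."""
    return sum(all(value is not None for value in row) for row in rows)


# Hypothetical data: columns 0-1 are important, columns 2-3 are not.
# Every row has a missing value, but only in the unimportant columns.
data = [
    [1.0, 2.0, None, 7.0],
    [3.0, 4.0, 8.0, None],
    [5.0, 6.0, None, None],
]

print(count_complete_rows(data))     # 0: row filtering would leave nothing

trimmed = select_columns(data, keep=[0, 1])
print(count_complete_rows(trimmed))  # 3: every row survives
```

The trimming step has to happen before upload, since only the user knows which column positions belong in `keep`.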