cv.glm {boot}  R Documentation 
This function calculates the estimated K-fold cross-validation prediction error for generalized linear models.
cv.glm(data, glmfit, cost, K)
data 
A matrix or data frame containing the data. The rows should be cases and the columns correspond to variables, one of which is the response. 
glmfit 
An object of class "glm" containing the results of a generalized linear
model fitted to data .

cost 
A function of two vector arguments specifying the cost function for the
cross-validation. The first argument to cost should correspond to the
observed responses and the second argument should correspond to the predicted
or fitted responses from the generalized linear model. cost must return a
non-negative scalar value. The default is the average squared error function.

K 
The number of groups into which the data should be split to estimate the
cross-validation prediction error. The value of K must be such that all
groups are of approximately equal size. If the supplied value of K does
not satisfy this criterion then it will be set to the closest integer which
does, and a warning is generated specifying the value of K used. The default
is to set K equal to the number of observations in data, which gives the
usual leave-one-out cross-validation.
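
As an illustration of the cost argument described above, the following sketch defines three candidate cost functions in base R. The function names and the toy vectors are hypothetical and not part of boot. Each function takes the observed responses first and the predictions second, and returns a non-negative scalar.

```r
# Average squared error, the default described above.
squared_error <- function(y, yhat) mean((y - yhat)^2)

# Mean absolute error, an alternative for continuous responses.
abs_error <- function(y, yhat) mean(abs(y - yhat))

# Misclassification rate for a binary response when the predictions
# are probabilities, using a 0.5 threshold.
misclass <- function(y, phat) mean(abs(y - phat) > 0.5)

# Toy data (made up for illustration only).
y    <- c(0, 1, 1, 0)
phat <- c(0.2, 0.7, 0.4, 0.1)

squared_error(y, phat)
abs_error(y, phat)
misclass(y, phat)
```

Any of these could be passed as the cost argument, provided the scale of the predictions matches what the function expects.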

The data is divided randomly into K groups. For each group the generalized
linear model is fit to data omitting that group, then the function cost
is applied to the observed responses in the group that was omitted from the fit
and the predictions made by the fitted model for those observations.

When K is the number of observations, leave-one-out cross-validation is used
and all the possible splits of the data are used. When K is less than
the number of observations, the K splits to be used are found by randomly
partitioning the data into K groups of approximately equal size. In this
latter case a certain amount of bias is introduced. This can be reduced by
using a simple adjustment (see equation 6.48 in Davison and Hinkley, 1997).
The second value returned in delta is the estimate adjusted by this method.
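
The procedure described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the source of cv.glm: the helper name kfold_cv_sketch, its arguments, and the use of the built-in mtcars data are all hypothetical. The adjusted value is computed in the spirit of the bias adjustment mentioned above: it starts from the apparent error of the full fit and subtracts the weighted error of each fold's model over all observations.

```r
# Hypothetical helper (not part of boot): K-fold cross-validation for a glm.
kfold_cv_sketch <- function(data, formula, family = gaussian(), K = 5,
                            cost = function(y, yhat) mean((y - yhat)^2)) {
  n <- nrow(data)
  # Randomly partition the rows into K groups of approximately equal size.
  fold <- sample(rep(seq_len(K), length.out = n))

  full_fit <- glm(formula, family = family, data = data)
  y <- full_fit$y                      # observed responses

  raw <- 0
  adj <- cost(y, fitted(full_fit))     # apparent error of the full fit

  for (k in seq_len(K)) {
    held_out <- fold == k
    p_k <- sum(held_out) / n           # proportion of data in this group
    # Refit the model with group k omitted.
    fit_k <- glm(formula, family = family,
                 data = data[!held_out, , drop = FALSE])
    pred_all <- predict(fit_k, newdata = data, type = "response")
    # Cost on the omitted group contributes to the raw estimate.
    raw <- raw + p_k * cost(y[held_out], pred_all[held_out])
    # Cost of this fold's model over all observations feeds the adjustment.
    adj <- adj - p_k * cost(y, pred_all)
  }
  c(raw = raw, adjusted = raw + adj)
}

set.seed(1)
delta <- kfold_cv_sketch(mtcars, mpg ~ wt + hp, K = 5)
delta
```

The sketch assumes the data contain no missing values, so that the stored response vector lines up with the rows of data.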
The returned value is a list with the following components.
call 
The original call to cv.glm .

K 
The value of K used for the K-fold cross-validation.

delta 
A vector of length two. The first component is the raw cross-validation estimate of prediction error. The second component is the adjusted cross-validation estimate. The adjustment is designed to compensate for the bias introduced by not using leave-one-out cross-validation. 
seed 
The value of .Random.seed when cv.glm was called.

The value of .Random.seed is updated.
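
Because the partition is random and the generator state advances, repeated calls with K less than the number of observations will generally use different groups. The sketch below uses a hypothetical fold-assignment line in base R rather than cv.glm itself, and shows that calling set.seed() beforehand makes such a partition reproducible.

```r
n <- 20   # number of observations (made up for illustration)
K <- 4    # number of groups

set.seed(123)
folds_a <- sample(rep(seq_len(K), length.out = n))

set.seed(123)
folds_b <- sample(rep(seq_len(K), length.out = n))

# Drawn without resetting the seed, so the generator state has advanced.
folds_c <- sample(rep(seq_len(K), length.out = n))

identical(folds_a, folds_b)  # TRUE: same seed, same partition
table(folds_a)               # group sizes
```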
Breiman, L., Friedman, J.H., Olshen, R.A. and Stone, C.J. (1984) Classification and Regression Trees. Wadsworth.
Burman, P. (1989) A comparative study of ordinary cross-validation, v-fold cross-validation and repeated learning-testing methods. Biometrika, 76, 503–514.
Davison, A.C. and Hinkley, D.V. (1997) Bootstrap Methods and Their Application. Cambridge University Press.
Efron, B. (1986) How biased is the apparent error rate of a prediction rule? Journal of the American Statistical Association, 81, 461–470.
Stone, M. (1974) Cross-validation choice and assessment of statistical predictions (with Discussion). Journal of the Royal Statistical Society, B, 36, 111–147.
# leave-one-out and 6-fold cross-validation prediction error for
# the mammals data set.
library(boot)  # provides cv.glm, glm.diag and the nodal data
data(mammals, package="MASS")
mammals.glm <- glm(log(brain) ~ log(body), data = mammals)
cv.err <- cv.glm(mammals, mammals.glm)
cv.err.6 <- cv.glm(mammals, mammals.glm, K = 6)

# As this is a linear model we could calculate the leave-one-out
# cross-validation estimate without any extra model-fitting.
muhat <- mammals.glm$fitted
mammals.diag <- glm.diag(mammals.glm)
cv.err <- mean((mammals.glm$y - muhat)^2 / (1 - mammals.diag$h)^2)

# leave-one-out and 11-fold cross-validation prediction error for
# the nodal data set.  Since the response is a binary variable an
# appropriate cost function is
cost <- function(r, pi = 0) mean(abs(r - pi) > 0.5)

nodal.glm <- glm(r ~ stage + xray + acid, binomial, data = nodal)
cv.err <- cv.glm(nodal, nodal.glm, cost, K = nrow(nodal))$delta
cv.11.err <- cv.glm(nodal, nodal.glm, cost, K = 11)$delta
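
The shortcut used in the mammals example, where the leave-one-out estimate of a linear model is obtained from the residuals and leverages without refitting, can be checked against a brute-force loop. The sketch below is an illustration in base R only. It assumes the built-in mtcars data and uses hatvalues() from stats in place of glm.diag.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Shortcut: scale each residual by 1 - leverage, then average the squares.
h <- hatvalues(fit)
shortcut <- mean((residuals(fit) / (1 - h))^2)

# Brute force: refit the model n times, leaving out one row each time.
n <- nrow(mtcars)
loo_err <- numeric(n)
for (i in seq_len(n)) {
  fit_i <- lm(mpg ~ wt + hp, data = mtcars[-i, ])
  loo_err[i] <- mtcars$mpg[i] -
    predict(fit_i, newdata = mtcars[i, , drop = FALSE])
}
brute_force <- mean(loo_err^2)

all.equal(shortcut, brute_force)
```

The identity holds for least-squares fits with squared-error cost. For other families or cost functions the refitting loop is needed.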