Compute CV statistics from a matrix of predictions.

computeError(
  predmat,
  y,
  lambda,
  foldid,
  type.measure,
  family,
  weights = rep(1, dim(predmat)[1]),
  grouped = TRUE
)

Arguments

predmat

Array of predictions. If `y` is univariate, this has dimensions `c(nobs, nlambda)`. If `y` is multivariate with `nc` levels/columns (e.g. for `family = "multinomial"` or `family = "mgaussian"`), this has dimensions `c(nobs, nc, nlambda)`. Note that the predictions should be on the same scale as `y` (unlike in the glmnet package, where they are on the scale of the linear predictor).

y

Response variable. Either a vector or a matrix, depending on the type of model.

lambda

Lambda values associated with the predictions in `predmat`.

foldid

Vector of values identifying which fold each observation is in.

type.measure

Loss function to use for cross-validation. See `availableTypeMeasures()` for possible values for `type.measure`. Note that the package does not check if the user-specified measure is appropriate for the family.

family

Model family; used to determine the correct loss function.

weights

Observation weights.

grouped

This is an experimental argument, with default `TRUE`, and can be ignored by most users. For all models except `family = "cox"`, `grouped = TRUE` computes `nfolds` separate statistics and then uses their mean and estimated standard error to describe the CV curve. If `FALSE`, an error matrix is built up at the observation level from the predictions from the `nfolds` fits and then summarized (this does not apply to `type.measure = "auc"`). For the "cox" family, `grouped = TRUE` obtains the CV partial likelihood for the Kth fold by subtraction: by subtracting the log partial likelihood evaluated on the full dataset from that evaluated on the (K-1)/K dataset. This makes more efficient use of risk sets. With `grouped = FALSE`, the log partial likelihood is computed only on the Kth fold.
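
The difference between the two `grouped` settings for non-Cox families can be illustrated with a small base-R sketch. This is not the package's implementation: the toy data, the squared-error loss, and the standard-error formula (weighted variance of the statistics divided by their count minus one) are assumptions chosen to mirror the description above.

```r
# Hypothetical toy data: 6 observations, 3 folds, 2 lambda values.
y <- c(1, 2, 3, 4, 5, 6)
foldid <- c(1, 1, 2, 2, 3, 3)
weights <- rep(1, 6)
predmat <- cbind(y + 1, y + c(0, 2, 0, 2, 0, 4))  # nobs x nlambda

# Observation-level squared errors, one column per lambda.
errmat <- (predmat - y)^2

# grouped = TRUE (sketch): one weighted-mean statistic per fold and lambda,
# then summarize the nfolds statistics.
folds <- sort(unique(foldid))
fold_err <- t(sapply(folds, function(k) {
  in_fold <- foldid == k
  apply(errmat[in_fold, , drop = FALSE], 2, weighted.mean, w = weights[in_fold])
}))
fold_wt <- sapply(folds, function(k) sum(weights[foldid == k]))

cvm_grouped <- apply(fold_err, 2, weighted.mean, w = fold_wt)
cvsd_grouped <- sqrt(
  apply(sweep(fold_err, 2, cvm_grouped)^2, 2, weighted.mean, w = fold_wt) /
    (length(folds) - 1)
)

# grouped = FALSE (sketch): summarize the observation-level errors directly.
cvm_obs <- apply(errmat, 2, weighted.mean, w = weights)
cvsd_obs <- sqrt(
  apply(sweep(errmat, 2, cvm_obs)^2, 2, weighted.mean, w = weights) /
    (length(y) - 1)
)
```

With equal weights and equal fold sizes, as here, the two means coincide; only the standard-error estimates differ, because one is based on `nfolds` statistics and the other on `nobs` errors.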

Value

An object of class "cvobj".

lambda

The values of lambda used in the fits.

cvm

The mean cross-validated error: a vector of length `length(lambda)`.

cvsd

Estimate of standard error of `cvm`.

cvup

Upper curve = `cvm + cvsd`.

cvlo

Lower curve = `cvm - cvsd`.

lambda.min

Value of `lambda` that gives minimum `cvm`.

lambda.1se

Largest value of `lambda` such that the error is within 1 standard error of the minimum.

index

A one-column matrix with the indices of `lambda.min` and `lambda.1se` in the sequence of coefficients, fits etc.

name

A text string indicating the loss function used (for plotting purposes).
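
To make the relationship between these components concrete, the following base-R sketch derives `cvup`, `cvlo`, `lambda.min`, `lambda.1se` and `index` from hypothetical `cvm` and `cvsd` vectors. The numbers and the row/column labels of `index` are illustrative assumptions, and tie handling is simplified; the function's own selection logic may differ in detail.

```r
# Hypothetical CV results for a decreasing lambda sequence.
lambda <- c(1.0, 0.5, 0.25, 0.125)
cvm    <- c(3.0, 2.2, 2.0, 2.1)
cvsd   <- c(0.3, 0.3, 0.25, 0.3)

# Upper and lower curves.
cvup <- cvm + cvsd
cvlo <- cvm - cvsd

# lambda.min: the lambda with the smallest mean CV error.
idx_min <- which.min(cvm)
lambda_min <- lambda[idx_min]

# lambda.1se: the largest lambda whose error is within one standard
# error of the minimum, i.e. at most cvm + cvsd at lambda.min.
threshold <- cvup[idx_min]
lambda_1se <- max(lambda[cvm <= threshold])
idx_1se <- match(lambda_1se, lambda)

# One-column matrix of positions in the lambda sequence
# (labels are illustrative).
index <- matrix(c(idx_min, idx_1se), ncol = 1,
                dimnames = list(c("min", "1se"), "Lambda"))
```

Here `lambda_min` is 0.25 (position 3) and `lambda_1se` is 0.5 (position 2), since 0.5 is the largest lambda whose mean error does not exceed 2.25.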

Details

Note that for the setting where `family = "cox"` and `type.measure = "deviance"` and `grouped = TRUE`, `predmat` needs to have a `cvraw` attribute as computed by `buildPredMat()`. This is because the usual matrix of pre-validated fits does not contain all the information needed to compute the model deviance for this setting.

Examples

set.seed(1)
x <- matrix(rnorm(500), nrow = 50)
y <- rnorm(50)
cv_fit <- kfoldcv(x, y, train_fun = glmnet::glmnet,
                  predict_fun = predict, keep = TRUE)
mae_err <- computeError(cv_fit$fit.preval, y, cv_fit$lambda, cv_fit$foldid,
                        type.measure = "mae", family = "gaussian")