Permutation Feature Importance for Classification with Parallel Factor Analysis

Calculates permutation feature importance for the results in a 'wrapcpfa' object generated by function cpfa. Conditional permutation feature importance is available as an alternative.

Usage

pficpfa(object, nshuffles = 10, type = c("marginal", "conditional"), 
        conditional.model = c("ridge", "rf"), ridge.lambda = 1e-4, 
        ntree = 500, nodesize = 5, safealign = FALSE, 
        safealign.stat = c("min", "mean", "median"), safealign.threshold = 0.9, 
        parallel = FALSE, cl = NULL)

Arguments

object: An object of class 'wrapcpfa' from function cpfa.
nshuffles: Single positive integer specifying the number of times each feature (or its conditional residuals) is permuted per replication. Permutation feature importance is averaged over these shuffles. Defaults to 10.
type: Character specifying the type of permutation feature importance (PFI) to calculate. If type = "marginal", calculates regular PFI by shuffling each feature's observations. If type = "conditional", first fits a conditioning model that predicts a given feature from all other features, then randomly shuffles that model's residuals and reconstructs the feature as the sum of its predicted values and the shuffled residuals. PFI is then calculated using this reconstructed feature, which makes the calculation conditional on the other features. Defaults to "marginal".
conditional.model: Character indicating the conditioning model to use when type = "conditional". For conditional.model = "ridge", uses ridge regression as the conditioning model. For conditional.model = "rf", uses random forest as the conditioning model. Defaults to "ridge".
ridge.lambda: Single, numeric real number greater than zero indicating the ridge regression penalty when both type = "conditional" and conditional.model = "ridge". Defaults to ridge.lambda = 1e-4.
ntree: Single, numeric integer greater than zero indicating the random forest number of trees parameter when both type = "conditional" and conditional.model = "rf". Defaults to ntree = 500.
nodesize: Single, numeric integer greater than zero indicating the random forest node size parameter when both type = "conditional" and conditional.model = "rf". Defaults to nodesize = 5.
safealign: Logical indicating whether to remove replications for component models from object based on the values of tccb (see help file for function cpfa for more information). Defaults to FALSE.
safealign.stat: Character indicating the statistic to use to remove replications for component models from object based on the values of tccb. For example, when safealign.stat = "min", for each component model, the minimum value of tccb is identified for each replication. Defaults to "min".
safealign.threshold: Single, positive real number between 0 and 1, exclusive. Indicates threshold used to remove replications for component models from object based on the values of tccb. For example, if safealign.stat = "median", for each component model and for safealign.threshold = 0.9, any replications with a median tccb value below 0.9 are removed from feature importance calculations. Defaults to 0.9.
parallel: Logical indicating if parallel computing should be implemented. Defaults to FALSE, which implements sequential computing.
cl: Cluster for parallel computing, which is used when parallel = TRUE. Note that if parallel = TRUE and cl = NULL, then the cluster is defined as makeCluster(max(1L, detectCores() - 1L)).
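
As a rough sketch of how a cluster might be supplied through argument cl, the lines below build one with the same expression documented as the default. The object name output is a placeholder for a 'wrapcpfa' object, and the sketch assumes the caller is responsible for stopping a cluster it created; the help text does not say whether pficpfa stops a user-supplied cluster.

```r
library(parallel)

# Mirror the documented default: all but one core, and never fewer than one.
# Note: detectCores() can return NA on some platforms, so check ncores if unsure.
ncores <- max(1L, detectCores() - 1L)
cl <- makeCluster(ncores)

# 'output' is assumed to be a 'wrapcpfa' object from cpfa with align = TRUE
# pfistats <- pficpfa(output, parallel = TRUE, cl = cl)

# Release the worker processes once the calculation has finished
# (assumption: the caller cleans up a cluster that the caller created)
stopCluster(cl)
```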

Details

Function pficpfa measures each feature's contribution to classification performance via permutation feature importance (PFI). The function requires a 'wrapcpfa' object from function cpfa that was fit with argument align = TRUE. For each replication and component model, the features are the columns of the predicted weight matrix (components first, then features from argument z in cpfa). Each feature is permuted nshuffles times; the importance is the resulting change in each performance measure, averaged over permutations.

When type = "marginal", the feature is permuted directly, corresponding to the feature importance of Breiman (2001). When type = "conditional", a conditioning model (ridge regression or random forest) predicts the feature from the other features. The model's residuals are then permuted and added back to the fitted values, which preserves the feature's dependence on the other features (for more information, see Huang, 2025, and O'Gorman, 2005).

When safealign = TRUE, replications are screened per component model using the Tucker congruence coefficients in object$tccb: replications whose safealign.stat value does not exceed safealign.threshold are excluded. When parallel = TRUE, parallel computing is used; otherwise, sequential computing is used.
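
The difference between marginal and conditional permutation can be illustrated with a small base R sketch. This is not the internal code of pficpfa. The toy data, the logistic regression classifier, the accuracy measure, the helper names ridge_fitted and pfi, the sign convention (baseline performance minus permuted performance), and the use of in-sample predictions are all assumptions made for illustration.

```r
set.seed(1)

# Toy data (hypothetical): two correlated features and a binary class label
n  <- 200
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)
X  <- cbind(x1 = x1, x2 = x2)
y  <- as.integer(x1 + rnorm(n, sd = 0.5) > 0)

# Stand-in classifier and performance measure (in-sample accuracy)
fit <- glm(y ~ x1 + x2, family = binomial, data = data.frame(X, y = y))
accuracy <- function(Xnew) {
  p <- predict(fit, newdata = data.frame(Xnew), type = "response")
  mean(as.integer(p > 0.5) == y)
}

# Closed-form ridge fit of feature j on the remaining features
ridge_fitted <- function(X, j, lambda = 1e-4) {
  Z <- cbind(1, X[, -j, drop = FALSE])
  P <- diag(ncol(Z))
  P[1, 1] <- 0                                     # leave the intercept unpenalized
  beta <- solve(crossprod(Z) + lambda * P, crossprod(Z, X[, j]))
  drop(Z %*% beta)
}

# PFI for feature j: baseline accuracy minus accuracy after permuting,
# averaged over nshuffles permutations (sign convention is an assumption)
pfi <- function(X, j, type = c("marginal", "conditional"), nshuffles = 10) {
  type <- match.arg(type)
  base <- accuracy(X)
  drops <- replicate(nshuffles, {
    Xp <- X
    if (type == "marginal") {
      Xp[, j] <- sample(X[, j])                    # shuffle the feature itself
    } else {
      fitted <- ridge_fitted(X, j)
      Xp[, j] <- fitted + sample(X[, j] - fitted)  # shuffle only the residuals
    }
    base - accuracy(Xp)
  })
  mean(drops)
}

pfi_marginal    <- sapply(1:2, function(j) pfi(X, j, "marginal"))
pfi_conditional <- sapply(1:2, function(j) pfi(X, j, "conditional"))
```

Because the conditional version shuffles only the part of a feature that the other features cannot predict, the permuted data stay closer to the joint distribution of the original features than they do under marginal shuffling.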

Value

Returns a data frame containing the following nine variables:

nfac: A numeric indicating the number of components in the model.
feature: An integer indexing a given feature.
method: A character indicating the classification method.
metric: A character indicating the classification performance measure used for permutation feature importance (see help file for function cpm for more information).
mean: A numeric value that is the mean permutation feature importance among all valid replications.
median: A numeric value that is the median permutation feature importance among all valid replications.
sd: A numeric value that is the standard deviation among all permutation feature importance values for valid replications.
n: An integer indicating the number of replications available for calculating feature importance after removing replications via safealign.
nvalid: An integer indicating the actual number of replications used for the calculation after removing any replications resulting in NaN.
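
One possible way to work with the returned data frame is sketched below. Because running cpfa is slow, the sketch uses a mock data frame with the nine documented columns. Every value in it is made up, the metric label "acc" is a hypothetical placeholder, and ranking by decreasing mean assumes that larger values indicate more important features.

```r
# Mock stand-in for the data frame returned by pficpfa; all values are made up
pfistats <- data.frame(
  nfac    = 3,
  feature = rep(1:3, times = 2),
  method  = rep(c("PLR", "SVM"), each = 3),
  metric  = "acc",                       # hypothetical metric label
  mean    = c(0.12, 0.03, 0.08, 0.10, 0.01, 0.09),
  median  = c(0.11, 0.02, 0.08, 0.10, 0.01, 0.08),
  sd      = c(0.02, 0.01, 0.03, 0.02, 0.01, 0.02),
  n       = 5L,
  nvalid  = 5L
)

# Keep one method and metric, then rank features by mean importance
plr    <- subset(pfistats, method == "PLR" & metric == "acc")
ranked <- plr[order(plr$mean, decreasing = TRUE), ]
ranked$feature

# Flag rows where NaN results reduced the number of usable replications
incomplete <- subset(pfistats, nvalid < n)
```

Comparing n with nvalid in this way shows whether any summaries rest on fewer replications than safealign left available.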

Author

Matthew A. Asisgress <mattgress@protonmail.ch>

References

Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.

Huang, P. (2025). Residual permutation tests for feature importance in machine learning. British Journal of Mathematical and Statistical Psychology.

O'Gorman, T. (2005). The performance of randomization tests that use permutations of independent variables. Communications in Statistics - Simulation and Computation, 34(4), 895-908.

Examples

########## Parafac2 example with 4-way array and multiclass response ##########
if (FALSE) { # \dontrun{
# set seed
set.seed(5)

# define list of arguments specifying distributions for A and G weights
techlist <- list(distA = list(dname = "poisson", 
                              lambda = 3),                 # for A weights
                 distG = list(dname = "gamma", shape = 2, 
                              scale = 4))                  # for G weights

# define target correlation matrix for columns of D mode weights matrix
cormat <- matrix(c(1, .6, .6, .6, 1, .6, .6, .6, 1), nrow = 3, ncol = 3)

# simulate a four-way ragged array connected to a response
data <- simcpfa(arraydim = c(10, 11, 12, 100), model = "parafac2", nfac = 3, 
                nclass = 3, nreps = 1e2, onreps = 10, corresp = rep(.6, 3), 
                meanpred = rep(2, 3), modes = 4, corrpred = cormat,
                technical = techlist, smethod = "eigende")

# initialize
alpha <- seq(0, 1, length.out = 20)
gamma <- c(0, 1)
cost <- c(0.1, 5)
ntree <- c(200, 300)
nodesize <- c(1, 2)
size <- c(1, 2)
decay <- c(0, 1)
rda.alpha <- seq(0.1, 0.9, length.out = 2)
delta <- c(0.1, 2)
eta <- c(0.3, 0.7)
max.depth <- c(1, 2)
subsample <- c(0.75)
nrounds <- c(100)
method <- c("PLR", "SVM", "RF", "NN", "RDA", "GBM")
family <- "multinomial"
parameters <- list(alpha = alpha, gamma = gamma, cost = cost, ntree = ntree,
                   nodesize = nodesize, size = size, decay = decay, 
                   rda.alpha = rda.alpha, delta = delta, eta = eta,
                   max.depth = max.depth, subsample = subsample,
                   nrounds = nrounds)
model <- "parafac2"
nfolds <- 10
nstart <- 10

# constrain first mode weights to be orthogonal, fourth mode to be nonnegative
const <- c("orthog", "uncons", "uncons", "nonneg")

# fit Parafac2 model and use fourth mode weights to tune classification
# methods, to predict class labels, and to return classification 
# performance measures pooled across multiple train-test splits
output <- cpfa(x = data$X, y = data$y, model = model, nfac = 3, 
               nrep = 5, ratio = 0.9, nfolds = nfolds, method = method, 
               family = family, parameters = parameters, align = TRUE,
               type.out = "descriptives", seeds = NULL, plot.out = TRUE, 
               parallel = FALSE, const = const, nstart = nstart)

# calculate permutation feature importance for output
pfistats <- pficpfa(output, nshuffles = 5, type = "marginal", safealign = FALSE)
} # }