Fast Algorithm for Variable Selection

A Fast Algorithm for Variable Selection in the Normal Linear Model

J.A. Stark and W.J. Fitzgerald

Technical Report CUED/F-INFENG/TR 288 (1997), Department of Engineering, University of Cambridge

A standard problem in statistical modelling is that of variable selection in the linear model. If models can be composed of any selection from a set of candidate variables, then evaluation of all possible models is infeasible. A typical alternative approach, stochastic stepwise searching, is to evaluate a series of trial models. Small changes are made at each step, and these changes are made randomly in accordance with the characteristics of the resulting models. The Markov chain Monte Carlo treatment of the variable selection for certain hierarchical formulations takes this form.

An algorithm is presented in this paper for evaluating the effect of removing a candidate variable from the trial model or adding one in. The calculations are performed in order-p operations for each candidate variable excluded from the model, where $p$ is the size of the trial model. This is a considerable improvement over standard update methods, which require order-p-squared operations.

The derivations are presented in a step-by-step fashion. However, the algorithm was originally devised through the framework of the matrix gyration operator. This approach involves indentifying common subexpressions, and then considering the store-and-update options for them. The technique is a useful methodology for designing numerical algorithms such as the linear variable selection problem.