Statistical Model Selection Techniques for Data Analysis

J. Alex Stark

PhD Thesis, Department of Engineering, University of Cambridge, October 1995.

This thesis is concerned with the selection of models for data and signals. Reliable and practical algorithms are developed for searching for the optimal model from a set of candidates, and their performance is examined using numerical experiments. Two different approaches are taken to exploring the model space: direct random sampling of trial models, and stochastic stepwise searching.
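
As an illustration only, direct random sampling of trial models can be sketched as follows for a linear regression problem. The abstract does not state the criterion or the sampling scheme, so the BIC-style score, the fixed subset size, and the names `bic_score` and `random_model_search` are all assumptions made for this sketch rather than the thesis's algorithm.

```python
import numpy as np

def bic_score(X, y, subset):
    """BIC-style score of a least-squares fit on the chosen columns (lower is better).
    The choice of criterion is an assumption; the abstract does not name one."""
    n = len(y)
    A = X[:, subset]
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ coef) ** 2))
    return n * np.log(rss / n) + len(subset) * np.log(n)

def random_model_search(X, y, size, n_trials, rng):
    """Draw random column subsets of a fixed size and keep the best-scoring one."""
    best_subset, best_score = None, np.inf
    for _ in range(n_trials):
        subset = sorted(rng.choice(X.shape[1], size=size, replace=False).tolist())
        score = bic_score(X, y, subset)
        if score < best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score

# Synthetic test problem: 8 candidate variables, of which columns 1 and 5 matter.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 8))
y = 2.0 * X[:, 1] - 3.0 * X[:, 5] + 0.1 * rng.standard_normal(200)
best_subset, best_score = random_model_search(X, y, size=2, n_trials=300, rng=rng)
```

With only 28 possible two-variable models, random sampling covers the space quickly; the number of subsets grows combinatorially with the number of candidate variables, which is where a stepwise search becomes attractive.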

The starting point for model selection is the formulation of statistical criteria for comparing candidate models, and hypothesis tests on the parameters are shown to be an interesting means of deriving these. Consideration is given to a class of models that can be easily analysed, can be applied to different modelling problems, and yet can represent a variety of nonlinearities.

Matrix methods provide the means of evaluating trial models efficiently. An operator termed the `gyration of a matrix' is investigated, and a method for stepwise model analysis is developed using this. Matrix factorisations can be used to simultaneously evaluate more than one randomly-selected trial model.
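
The gyration operator is not defined in this abstract. As a stand-in, the sketch below uses the classical sweep operator from regression computing, which may or may not coincide with it, to show how a single matrix operation can add one variable to a model and expose the updated fit. The function name `sweep` and the test data are assumptions made for illustration.

```python
import numpy as np

def sweep(A, k):
    """Sweep the symmetric matrix A on pivot k and return a new matrix.
    Classical sweep operator, used here as a stand-in; it is not claimed
    to be the thesis's gyration operator."""
    A = np.array(A, dtype=float)
    d = A[k, k]
    row = A[k, :].copy()
    col = A[:, k].copy()
    B = A - np.outer(col, row) / d   # update entries off the pivot row and column
    B[k, :] = row / d                # scaled pivot row
    B[:, k] = col / d                # scaled pivot column
    B[k, k] = -1.0 / d               # pivot element
    return B

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))
y = X[:, 0] - 2.0 * X[:, 2] + 0.05 * rng.standard_normal(50)

# Augmented cross-product matrix [[X'X, X'y], [y'X, y'y]].
Z = np.column_stack([X, y])
A = Z.T @ Z

# Sweeping on pivots 0 and 2 fits the model containing those two columns:
# the last column holds the coefficients and the corner holds the residual sum of squares.
S = sweep(sweep(A, 0), 2)
coef_swept = S[[0, 2], -1]
rss_swept = S[-1, -1]
```

Each sweep costs one rank-one update of the matrix, so a stepwise search can move between neighbouring models without refitting from scratch.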

The selection of models according to a criterion can be viewed as a task in optimisation, in which the criterion is the cost function, or as a task in probabilistic inference. The combination of these perspectives leads to insights into the treatment of the search task.

The analysis of trial models is an important part of the selection process. An effective method is to sample models that are larger than the one finally selected (`oversized trial model sampling'), to evaluate the quality of each term individually, and to consider the trial models ranked in accordance with their performance.
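
One possible reading of this procedure is sketched below for linear regression. The per-term quality measure (an absolute t-statistic), the ranking by residual sum of squares, the pruning threshold, and the names `fit_terms` and `oversized_sampling` are assumptions made for this sketch; the abstract does not specify them.

```python
import numpy as np

def fit_terms(X, y, subset):
    """Least-squares fit on the chosen columns.
    Returns the residual sum of squares and an absolute t-statistic per term."""
    A = X[:, subset]
    n, k = A.shape
    G_inv = np.linalg.inv(A.T @ A)
    coef = G_inv @ (A.T @ y)
    rss = float(np.sum((y - A @ coef) ** 2))
    sigma2 = rss / (n - k)
    t = np.abs(coef) / np.sqrt(sigma2 * np.diag(G_inv))
    return rss, t

def oversized_sampling(X, y, trial_size, n_trials, t_threshold, rng):
    """Sample trial models larger than the expected final model, rank them by
    residual sum of squares, and keep the strong terms of the best trial."""
    trials = []
    for _ in range(n_trials):
        subset = np.sort(rng.choice(X.shape[1], size=trial_size, replace=False))
        rss, t = fit_terms(X, y, subset)
        trials.append((rss, subset, t))
    trials.sort(key=lambda item: item[0])   # best-performing trial first
    _, subset, t = trials[0]
    return [int(j) for j, tj in zip(subset, t) if tj > t_threshold]

# Synthetic test problem: 10 candidate variables, of which columns 2 and 7 matter.
rng = np.random.default_rng(2)
X = rng.standard_normal((200, 10))
y = 2.0 * X[:, 2] - 1.5 * X[:, 7] + 0.1 * rng.standard_normal(200)
selected = oversized_sampling(X, y, trial_size=4, n_trials=200, t_threshold=4.0, rng=rng)
```

Sampling four-term trials for a two-term target makes it far more likely that a single trial contains every relevant variable; the per-term statistics then separate the useful terms from the passengers.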

Numerical experiments with a test modelling problem demonstrate that the two algorithms are effective in searching for the best model. The direct method performs well when there are relatively few candidate variables. The stochastic stepwise search is very efficient on a wide range of problems, and may prove to be a powerful tool in practical applications.