Modelling phylogenetic trees of proteins

J.A. Stark, K. Tatsuoka and F. Seillier-Moiseiwitsch.

1999 Spring Research Conference on Statistics in Industry and Technology, June 1999.

Reconstructing phylogenies (evolutionary histories) of proteins from a set of observed sequences is an important statistical modelling task in molecular evolution. Both maximum likelihood and Bayesian methods have been used for phylogenetic inference, but the underlying evolutionary models rely on significant simplifying assumptions. In particular, mutations at different sites along a sequence are assumed to occur independently. We propose a model that allows for correlations between mutations along protein sequences, and we describe a Markov chain Monte Carlo (MCMC) sampler for analysing these models. We consider the specific case of HIV protease.
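To make the independence assumption and the role of MCMC concrete, here is a minimal sketch of the *baseline* that the abstract argues against, not the authors' correlated model. It assumes a Poisson (Jukes-Cantor-style) substitution model over the 20 amino acids, a single evolutionary distance `t` between two aligned sequences, and a hypothetical exponential prior on `t`. Because sites are treated as independent, the log-likelihood is simply a sum of per-site terms. A random-walk Metropolis sampler then draws from the posterior of `t`. All function names, the prior rate, the step size, and the toy sequences are illustrative assumptions.

```python
import math
import random

N_STATES = 20  # size of the amino-acid alphabet


def transition_prob(x, y, t):
    """Poisson (Jukes-Cantor-style) amino-acid model: probability that
    residue x is observed as y after t expected substitutions per site."""
    decay = math.exp(-N_STATES / (N_STATES - 1) * t)
    if x == y:
        return 1.0 / N_STATES + (N_STATES - 1) / N_STATES * decay
    return 1.0 / N_STATES - 1.0 / N_STATES * decay


def log_likelihood(seq_a, seq_b, t):
    """Independent-sites log-likelihood: one additive term per site,
    using the uniform stationary frequency 1/20 for the ancestral residue."""
    if t <= 0:
        return float("-inf")
    total = 0.0
    for a, b in zip(seq_a, seq_b):
        p = transition_prob(a, b, t) / N_STATES
        if p <= 0.0:  # guard against underflow for vanishingly small t
            return float("-inf")
        total += math.log(p)
    return total


def log_posterior(seq_a, seq_b, t, prior_rate=10.0):
    """Hypothetical exponential prior on t (mean 1/prior_rate) plus likelihood."""
    if t <= 0:
        return float("-inf")
    return -prior_rate * t + log_likelihood(seq_a, seq_b, t)


def metropolis_distance(seq_a, seq_b, n_iter=5000, step=0.05, seed=0):
    """Random-walk Metropolis sampler for the distance t."""
    rng = random.Random(seed)
    t = 0.1
    lp = log_posterior(seq_a, seq_b, t)
    samples = []
    for _ in range(n_iter):
        proposal = t + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        lp_prop = log_posterior(seq_a, seq_b, proposal)
        # Accept with probability min(1, posterior ratio); proposals with
        # t <= 0 have zero posterior density and are always rejected.
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            t, lp = proposal, lp_prop
        samples.append(t)
    return samples


if __name__ == "__main__":
    a = "ACDEFGHIKL"  # hypothetical toy sequences, not real protease data
    b = "ACDEFGHIKM"
    draws = metropolis_distance(a, b)
    kept = draws[1000:]  # discard burn-in
    print("posterior mean of t:", sum(kept) / len(kept))
```

The per-site sum in `log_likelihood` is exactly what breaks down once mutations at neighbouring sites are allowed to be correlated: the likelihood no longer factorises, which is one reason a sampler such as MCMC becomes attractive for the richer models.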