By far the two most widely used models for prepayments are the Proportional Hazard (PH) model and the Logistic Regression (LR) model.
Proportional Hazard Model
The Proportional Hazard model, introduced by Cox in 1972, originates from the analysis of survival data (e.g., survival times of cancer patients under various types of treatment). The Cox PH model is among the best-known statistical models, and the 1972 paper is one of the most cited scientific papers. In the Proportional Hazard model, the quantity of interest is the (random) time to an event, usually death. The distribution of this event time is characterized by the so-called hazard rate, which is, loosely speaking, the probability of the event occurring in the next small time interval, given that it has not occurred before. The hazard rate is modelled as the product of two terms: the so-called baseline hazard, which describes how the “typical” hazard rate of an average patient develops over time, and a multiplier, which captures the influence of patient-specific variables such as gender, age, biometric characteristics and, most importantly, the type of treatment.
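In symbols, the product structure described above is usually written as follows. The notation is ours rather than the text's: $T$ is the event time, $h_0(t)$ the baseline hazard, $x$ the vector of individual-specific variables and $\beta$ the coefficient vector. The exponential form of the multiplier is the standard Cox specification.

```latex
h(t) = \lim_{\Delta \to 0} \frac{P(t \le T < t + \Delta \mid T \ge t)}{\Delta},
\qquad
h(t \mid x) = h_0(t)\,\exp(\beta^\top x),
\qquad
\frac{h(t \mid x_1)}{h(t \mid x_2)} = \exp\bigl(\beta^\top (x_1 - x_2)\bigr)
```

The last ratio does not depend on $t$: the hazards of any two individuals stay proportional over time, which is what gives the model its name.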
In prepayment modelling, the event is the prepayment of the mortgage, and we model the hazard of prepayment occurring in the next month, given that the mortgage has not been prepaid yet. The baseline hazard rate is then the “typical” prepayment profile, which usually depends on the age of the mortgage and exhibits the so-called S-shape, indicating that newly originated mortgages and those close to expiration have lower prepayment rates than those in between. The multiplier of the hazard rate contains mortgage-specific factors, which we will outline below.
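The baseline-times-multiplier structure can be sketched in a few lines of Python. This is a minimal illustration, not a calibrated model: the shape and level of `baseline_hazard`, the covariate name `rate_incentive` and its coefficient are all hypothetical, and the monthly hazard is treated as a conditional monthly prepayment probability.

```python
import math


def baseline_hazard(age_months, term_months=360):
    """Hypothetical baseline monthly hazard: low for new loans, rising to a
    plateau, then falling again close to maturity."""
    ramp_up = 1.0 / (1.0 + math.exp(-(age_months - 24) / 6.0))
    ramp_down = 1.0 / (1.0 + math.exp((age_months - (term_months - 24)) / 6.0))
    return 0.01 * ramp_up * ramp_down


def prepayment_hazard(age_months, covariates, coefficients, term_months=360):
    """Baseline hazard times the mortgage-specific multiplier exp(beta . x)."""
    linear_score = sum(coefficients[name] * value
                       for name, value in covariates.items())
    return baseline_hazard(age_months, term_months) * math.exp(linear_score)


def survival_probability(months, covariates, coefficients, term_months=360):
    """Probability that the mortgage is still not prepaid after `months` months,
    treating each monthly hazard as a conditional prepayment probability."""
    probability = 1.0
    for age in range(1, months + 1):
        probability *= 1.0 - prepayment_hazard(age, covariates, coefficients,
                                               term_months)
    return probability


# Hypothetical example: one covariate with an assumed coefficient of 0.8.
coefficients = {"rate_incentive": 0.8}
with_incentive = prepayment_hazard(60, {"rate_incentive": 1.0}, coefficients)
without_incentive = prepayment_hazard(60, {"rate_incentive": 0.0}, coefficients)
multiplier = with_incentive / without_incentive  # equals exp(0.8) at every age
```

Because the multiplier does not depend on the age of the mortgage, the ratio of the two hazards is the same in every month; only the baseline carries the age profile.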
The attractions of the PH model are numerous. First, its estimation procedures are well developed, even for cases in which the data are incomplete (for example, censored observations). Second, it is highly interpretable: we can pinpoint and quantify, in a very intuitive way, the influence of various factors on the likelihood of prepayment. Finally, the model is very flexible in terms of the inclusion of time-varying factors and parameters, common and specific factors, and so on.
Despite all these attractions, the PH model is not as widely used by banks for modelling prepayments as it could be. The reasons are its relative complexity and the unfamiliarity of finance professionals with a model that comes from the medical sciences. This is, in our view, a pity: in our experience with prepayment modelling, it produces superior results to other models (e.g., logistic regression).
Logistic Regression Model
Logistic regression (or its probit variant) is another widely used model for prepayments. The response (or dependent variable) in such a regression model is a binary variable indicating the occurrence of a certain event (prepayment in this case). The regressors (or independent variables) are the mortgage-specific as well as overall economic factors, as in the PH model. It can be shown that in some cases (e.g., low hazard rates), the logistic regression model is close to the PH model.
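The closeness of the two models at low hazard rates can be checked numerically. The sketch below rests on two assumptions of ours: the PH hazard is constant within the month, so the monthly prepayment probability is 1 - exp(-h), and the logistic model is matched to it by setting its odds equal to the same value h, which gives h / (1 + h). The function names are hypothetical.

```python
import math


def ph_monthly_probability(hazard):
    """Monthly event probability under a hazard held constant over the month."""
    return -math.expm1(-hazard)  # 1 - exp(-hazard), computed accurately


def logit_monthly_probability(hazard):
    """Monthly event probability of a logistic model whose odds equal `hazard`."""
    return hazard / (1.0 + hazard)


def relative_gap(hazard):
    """Relative difference between the two probabilities."""
    ph = ph_monthly_probability(hazard)
    return abs(ph - logit_monthly_probability(hazard)) / ph


# The gap shrinks as the hazard gets smaller.
gaps = {hazard: relative_gap(hazard) for hazard in (1.0, 0.1, 0.01)}
```

Both expressions expand to h minus a term of order h squared, so for small h they are nearly identical; at larger hazards the two models diverge noticeably.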
After fitting the LR model to the historical data, we can plug the characteristics of a new mortgage into the model, together with the economic factors, and the model returns the probability that the mortgage will be prepaid early.
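The scoring step can be sketched as follows. The coefficient values and the feature names (`loan_age_years`, `rate_incentive_pct`) are hypothetical placeholders standing in for a fitted model; they are not estimates from any data set.

```python
import math

# Hypothetical fitted coefficients, for illustration only.
FITTED_COEFFICIENTS = {
    "intercept": -5.0,
    "loan_age_years": 0.10,
    "rate_incentive_pct": 0.90,
}


def prepayment_probability(features, coefficients=FITTED_COEFFICIENTS):
    """Logistic prepayment probability for one mortgage.

    `features` maps feature names to values; every name must have a
    matching entry in `coefficients`.
    """
    score = coefficients["intercept"] + sum(
        coefficients[name] * value for name, value in features.items()
    )
    return 1.0 / (1.0 + math.exp(-score))


# Score a new mortgage under two economic scenarios (assumed inputs).
new_mortgage = {"loan_age_years": 3.0, "rate_incentive_pct": 1.0}
probability = prepayment_probability(new_mortgage)
probability_no_incentive = prepayment_probability(
    {"loan_age_years": 3.0, "rate_incentive_pct": 0.0}
)
```

With a positive coefficient on the incentive variable, the predicted probability rises as refinancing becomes more attractive, which is the kind of sensitivity one would want to inspect after fitting.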
Logistic regression is already widely used in banks (e.g., in default modelling and credit scoring), hence it is the more popular model among practitioners for prepayments. However, it often lacks the flexibility and interpretability of the PH model and, in our experience with prepayment modelling, can produce inferior results compared to the PH model.