We evaluated the accuracy and robustness of the forecast of life expectancy at age 65 in Western Europe for six different options for the jump-off rates. We observed that the options for the jump-off rates clearly influence the accuracy and robustness of the mortality forecast, albeit in different ways. For most countries, the most accurate forecast resulted from taking the last observed values as jump-off rates, which relates to the relatively high estimation error of the model in recent years. The most robust forecast was obtained by using an average of the most recent observed years as jump-off rates. The more years that are averaged, the better the robustness, but accuracy decreases with more years averaged. The best choice for the jump-off rates, thus, seems to depend on whether you are interested mainly in accuracy or robustness, on the country-specific past mortality trends, or the model fit.
The influence of the choice of the jump-off rates on the accuracy and robustness of the forecast can be substantial. Figure 2 in Appendix 1 gives an example for the Netherlands of a forecast with the model values as jump-off rates, a forecast with the last observed values as jump-off rates, and a forecast with an average of five observed years as jump-off rates, with different fitting periods. The forecasts with the model values as jump-off rates are not accurate, i.e. there are large gaps between the observed values and the forecasts in the first year. The forecasts with the model values are also not robust: the successive forecasts, using different fitting periods, show large differences (both increases and decreases) in the forecasted e65 for a particular year. For the forecasts with the last observed values as jump-off rates, the accuracy is improved; in our analysis, this option was the most accurate of the six options for the jump-off rates. However, these successive forecasts also show large differences in the forecasted e65 for a particular year. Lastly, the successive forecasts with an average of five observed years as jump-off rates increase slowly with each new year of data added to the fitting period. This option for the jump-off rates was the most robust for the Netherlands.
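To make the two criteria concrete, the following minimal sketch shows one way they could be computed. The function names, the use of the mean absolute error for accuracy, and the use of the standard deviation of the changes between successive forecasts for robustness are assumptions made for illustration and may differ from the exact measures used in the analysis. The e65 values are invented.

```python
from statistics import mean, pstdev


def accuracy(forecast, observed):
    """Mean absolute error between forecasted and observed e65.

    Smaller values mean a more accurate forecast (assumed measure).
    """
    return mean(abs(f - o) for f, o in zip(forecast, observed))


def robustness(successive_forecasts):
    """Standard deviation of the changes between successive forecasts
    of e65 for the same target year (assumed measure).

    Each element comes from a forecast whose fitting period is one year
    longer than the previous one. Smaller values mean a more robust forecast.
    """
    if len(successive_forecasts) < 2:
        raise ValueError("need at least two successive forecasts")
    changes = [b - a for a, b in zip(successive_forecasts, successive_forecasts[1:])]
    return pstdev(changes)


# Hypothetical forecasts of e65 for one target year, made in four successive years.
erratic = [20.0, 20.4, 20.1, 20.6]   # jumps up and down between forecasts
steady = [20.0, 20.1, 20.2, 20.3]    # increases slowly with each new data year

print(robustness(erratic) > robustness(steady))
```

Under these assumed definitions, a series of forecasts that rises by a constant amount each year counts as fully robust, which matches the slowly increasing pattern described for the five-year average.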
Evaluation of analysis
We assessed the effect of the choice of the jump-off rates by means of two important evaluation criteria for a mortality forecasting method: robustness and accuracy (Dowd et al. 2010a, 2010b; Cairns et al. 2011). A third evaluation criterion for evaluating a mortality forecast is plausibility (Cairns et al. 2011): is the outcome of the forecast reasonable given what we know? This is rather a subjective issue for which there are no objective measures, and for that reason, we did not include it in the analysis. Nonetheless, plausibility is important to consider when performing a mortality forecast. A plausible future age pattern is an important issue related to the plausibility of the results. Different characteristics of the jump-off rates, such as a rough age pattern of the last observed values, have an effect on the plausibility of the future age pattern of mortality. To limit the effect of the choice of the jump-off rates on the plausibility of the future age pattern, smoothing the observed mortality rates by age is recommended.
We performed the different mortality forecasts using the Lee-Carter method, which is frequently used for mortality forecasting in practice (Stoeldraijer et al. 2013), as a benchmark method (Booth and Tickle 2008), and as the basis for more recent mortality forecasting models (Booth and Tickle 2008; Lee and Carter 1992). The Lee-Carter method, however, is known to be biased and tends to underpredict future mortality (Bell 1997; Lee 2000; Lee and Miller 2001; Booth et al. 2002; Girosi and King 2007; Liu and Yu 2011), as we have also seen in Table 2, where the mean error in the last ten years of the fitting period was negative for most countries. Therefore, differences between the last observed values and the model values tend to be relatively large. For this reason, we performed a sensitivity analysis using two additional models: (i) a Lee-Carter model using three principal components (Appendix 2), because earlier research indicates that it is unnecessary to adjust the jump-off rates when several principal components are used (Hyndman et al. 2013); and (ii) the Cairns-Blake-Dowd model (Cairns et al. 2006; Appendix 3), which is a different type of stochastic model from the Lee-Carter model and is widely used in actuarial science. The results show smaller differences in outcomes than those we observed earlier with the Lee-Carter model, but, especially for accuracy, the importance of the jump-off rates remains. This highlights the importance of the model for the best choice of the jump-off rates.
We showed the results of our analysis for men and women combined. However, similar results are observed for men and women separately (see Tables 8 and 9 in Appendix 4). For men (with the exception of Finland) and for women separately, an average of multiple years as jump-off rates was again preferred for the most robust forecast. For the most accurate forecast, there was somewhat more variation in the results for men and women separately than for men and women combined. For men in France and Spain, the forecast is most accurate when using model values as jump-off rates, although accuracy is only slightly higher compared to the last observed values. For women, the most accurate forecast in the fifth year of the forecasting period is obtained by using the last observed values. For women, the accuracy of the forecast in the first year of the forecasting period mostly shows small differences between the choices for the jump-off rates, with the most accurate option being model values (France, Sweden), last observed values (Belgium, the UK), or an average (Spain, Norway, Finland, the Netherlands).
We deliberately computed the accuracy and the robustness measures directly for life expectancy at age 65, because of the use of this indicator in the pension reforms. For different contexts, e.g. life insurance and pension valuation, an evaluation of other outcomes (e.g. death rates or probabilities) would be relevant and could lead to different outcomes. That is, for different age groups the model fit, and subsequently, the choice of the jump-off rates might be different. Booth et al. (2006) compared both errors in life expectancy and log death rates when analysing the accuracy for different choices of the jump-off rate. They concluded that the accuracy in log death rates does not necessarily translate into accuracy in life expectancy. Analysis based on forecasted log death rates might therefore lead to different conclusions, but in general, last observed values as jump-off rates would give the most accurate forecast (Booth et al. 2006). The above indicates that the context of the forecast determines the outcome measure used in the analysis of the jump-off rates and, hence, the final choice for the best jump-off rates.
Generalizability of our outcomes
We evaluated the results based on the life expectancy at age 65 in relation to pension reforms. Results based on the life expectancy at birth (e0) are very similar to the results based on e65 (Appendix 5). The differences between the six options for the jump-off rates for both accuracy and robustness are slightly larger for e0 than for e65. This means that our conclusions can be generalised to life expectancy at other ages.
We focused our analysis on Western Europe, because of the prevalence of the pension reforms. Our findings can be generalised to countries which have seen similar trends in the past. For example, the results for the Netherlands are expected to be close to the results for Denmark, since both experienced a stagnation of the increase in life expectancy at approximately the same time (Janssen et al. 2004). Similarly, our results for the remaining Western European countries can be generalised to other countries exhibiting fairly regular increases in life expectancy, like Japan since 1970 (Leon 2011). Generalising our results to Eastern Europe, however, will be more daunting because these countries experienced very different past mortality trends due to the health crisis from 1975 onwards (McKee and Shkolnikov 2001; Vallin and Meslé 2004; Leon 2011). The Lee-Carter method is most likely not suited to account for these specific past mortality trends (Bohk and Rau 2015). Before evaluating different choices for the jump-off rates in the context of Eastern Europe, first, the forecasting method needs to be improved.
Recommendations
Following our findings, we recommend that the goal of the forecast, and the related emphasis on accuracy, robustness, or both, guide the choice of the jump-off rates.
If the goal of the mortality forecast is focused on accuracy, it is relevant to examine the error of the model estimates over the period to which the model is applied, given its importance in explaining our results for accuracy. When the errors are small, we recommend the model values as the most suitable jump-off rates for an accurate forecast. When the model errors are large and the model underestimates in the most recent period, we recommend the last observed values as the most suitable jump-off rates. When the errors are large and the model overestimates in the most recent period, we recommend using the model values as jump-off rates, following our results for men and women separately.
If the goal of the mortality forecast is focused on robustness, we recommend using an average of multiple years as jump-off rates, as it was the most suitable for a robust forecast for all countries in our analysis. There was little difference in the outcomes between a 2-year average and a 5-year average; thus, the number of years used in the averaging is less important. Robustness becomes more important in situations where the forecast is made regularly, for instance when the future retirement age based on the forecasted life expectancy needs to be determined every year.
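As an illustration of the averaging option, the sketch below derives jump-off rates from the most recent observed years. The function name, the data layout (one row of age-specific death rates per calendar year, oldest first), and the use of a simple arithmetic mean are assumptions for this example. The rates are invented.

```python
def jump_off_rates(observed, n_years=1):
    """Average the age-specific death rates of the last `n_years` observed years.

    observed: list of rows ordered from oldest to newest year; each row holds
              the death rates by age for that year (assumed layout).
    n_years:  1 reproduces the last observed values; larger values give an
              average over several recent years.
    """
    if not 1 <= n_years <= len(observed):
        raise ValueError("n_years must be between 1 and the number of observed years")
    recent = observed[-n_years:]
    n_ages = len(recent[0])
    # Arithmetic mean per age across the selected years (an assumption).
    return [sum(row[age] for row in recent) / n_years for age in range(n_ages)]


# Hypothetical death rates for two ages over three years.
rates = [
    [0.010, 0.020],
    [0.012, 0.022],
    [0.014, 0.018],
]

last_observed = jump_off_rates(rates, n_years=1)
two_year_average = jump_off_rates(rates, n_years=2)
```

Averaging dampens year-to-year fluctuations in the observed rates, which is why it favours robustness, while moving the starting point away from the most recent observation, which is why it can cost accuracy.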
Because the goal of the forecast is often focused on both accuracy and robustness, the optimal choice for the jump-off rates must give the most accurate as well as the most robust forecast. For each country in our analysis, there was no option for the jump-off rates that guaranteed accuracy and robustness at the same time. Thus, there always has to be a trade-off between accuracy and robustness. Therefore, we recommend looking into developing a choice for the jump-off rates that is both accurate and robust. Our four recommendations for determining the best choice for the jump-off rates that gives both an accurate and a robust forecast are as follows: (1) Because the accuracy of the forecast decreases distinctly with the averaging of more observed years as jump-off rates, whereas the robustness of the forecast stays approximately the same, it is preferable to use an average of as few years as possible to improve the accuracy while keeping the forecast robust. (2) Using the observed values instead of the model values when the model fits the data well does not improve accuracy and deteriorates the robustness. Thus, when the model fits well, it is best to use the model values as jump-off rates and not the observed values, as is often done by force of habit. (3) The further ahead, the less accurate the forecast gets. This means that the relative price you pay for more robustness is lower for a forecast further in the future. If the forecast further in the future is of more importance than the short-term forecast, greater value should be attached to the robustness of the forecast, and thus the best option for the most robust forecast can be selected. (4) In line with the previous recommendations, to best unite the results for robustness and accuracy, we would recommend interpolation (see Appendix 6 for an example).
Robustness is more important for the long-term forecast (for instance, from 5 years in the future) as a result of the increasing uncertainty with duration. For the first few years, accuracy would be more relevant because data for these years will be available quickly. Our recommendation would be to start with a forecast using a jump-off rate that is the most accurate in the first year. Subsequently, make a forecast that is most robust in, say, the fifth year of the forecast period. Between the two forecasts, each year, more weight should be given to the most robust forecast, i.e. we recommend interpolating from the most accurate forecast to the most robust forecast. By interpolating between the two forecasts, both accuracy in the first year of the forecast and robustness of the forecast 5 years ahead are obtained.
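The interpolation described above could be sketched as follows. The function name, the linear weighting scheme, and the default 5-year horizon are assumptions for illustration; the example in Appendix 6 may use a different scheme. The e65 values are invented.

```python
def interpolate_forecasts(accurate, robust, horizon=5):
    """Blend the most accurate and the most robust forecast of e65.

    accurate, robust: forecasted e65 per forecast year (year 1 first).
    horizon: forecast year from which the robust forecast is used in full.

    The weight on the robust forecast grows linearly from 0 in year 1
    to 1 in year `horizon` and stays at 1 afterwards (assumed scheme).
    """
    if horizon < 2:
        raise ValueError("horizon must be at least 2")
    blended = []
    for index, (a, r) in enumerate(zip(accurate, robust)):
        year = index + 1
        weight = min((year - 1) / (horizon - 1), 1.0)
        blended.append((1 - weight) * a + weight * r)
    return blended


# Hypothetical forecasts for six forecast years.
most_accurate = [18.0, 18.0, 18.0, 18.0, 18.0, 18.0]
most_robust = [19.0, 19.0, 19.0, 19.0, 19.0, 19.0]

combined = interpolate_forecasts(most_accurate, most_robust)
```

With these inputs, the combined forecast equals the most accurate forecast in the first year and the most robust forecast from the fifth year onwards, moving between the two in equal steps.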
An additional issue to consider when matching the forecast to recent data is the quality of that data. Preliminary data might underestimate or overestimate the life expectancy. Using jump-off rates based on such data might harm the accuracy of the forecast (relative to the final data). It might also turn out to be disadvantageous for the robustness when the preliminary data are replaced by final data. The use of preliminary data is therefore not recommended when matching the forecast to recent data.