On the evolution of the gender gap in life expectancy at normal retirement age for OECD countries

Population aging is evolving at different rates across countries and over time, and it represents a long-term challenge for both the sustainability of pension schemes and for the realization of public intergenerational transfers. In this context, this work focuses on gender differences in survival at older ages. Specifically, we implement a comparative analysis of OECD countries to assess the adequacy of the corresponding gender-specific normal retirement age when faced with growing life expectancy. The analysis hinges on several graphical representations and is motivated by recent findings on Italian longevity to determine optimal retirement age shifts necessary to match growing life expectancy at older ages while accounting for model risk for mortality projections. Our analysis determines, at the country level, the extent to which adjustments to the normal retirement age are advisable for the sustainability of the intergenerational paradigm for pensions. The study considers males and females separately because most of these countries are characterized by aging societies where men and women differ in crucial characteristics, including life expectancy. It is therefore important that policymakers have information on the future evolution of the longevity gender gap so they will be able to apply policies that preserve the principles of equality and solidarity and reduce the pension gender gap. We find groups of countries where the gender gap in life expectancy follows the same dynamics.

this issue by ensuring that retirement age is linked to life expectancy, making accurate projections of life expectancy vital for achieving sustainability and solvency in pension systems (Bravo et al., 2021). Across the world, the trend toward reforming pension systems has proceeded at different rates because the way countries deal with the aging problem is influenced by their individual histories and preferences (Betti et al., 2015).
Population aging affects public finances and labor markets, as well as individuals' behavior and corporate strategies (Visco, 2006). Recent mortality trends have been volatile, and as a result, future longevity has proved increasingly difficult to predict. Given the projected demographic trends, the sustainability of social security systems is an issue that is frequently cited and explored in the most recent literature, as pension providers face the risk that retirees might on average live longer than expected. The risk that mortality will deviate from its mean has two underlying components: random variation risk and trend risk. Random variation risk, also known as individual mortality/longevity risk, is the risk that individual mortality rates differ from what is expected; in other words, some people will die before their life expectancy and some will die after. Trend risk, also called aggregate mortality/longevity risk (as defined by the International Actuarial Association), is the risk that unexpected changes in lifestyle behavior or medical advances will significantly improve longevity. A state pension system mitigates random variation risk by pooling a large number of different individuals and relying on the law of large numbers to reduce its variability. Trend risk, by contrast, is a relevant risk for a state pension system because it cannot be diversified away by pooling. Consequently, it poses more challenges than individual longevity risk because it cannot be shared among members of the same cohort. Incorrect forecasts of mortality trends and life expectancy imply that pension benefits have to be paid for more years than expected, which severely impacts public finances if longevity risk is not properly assessed from both financial and demographic perspectives.
Hence, the adoption of a stochastic mortality model to assess the impact of longevity risk on pension expenditure has the advantage of generating predictions for the evolution of population over time in each cohort. This is the requirement needed to assess the impact of sequential social security reforms that, by typically not being retroactive, have a heterogeneous impact on different cohorts of the population (Bisetti & Favero, 2014).
More generally, if on one hand the sustainability of retirement policies must be pursued, the dynamic of the gender gap in longevity has to be properly acknowledged and controlled to maintain the efficacy of the intergenerational solidarity paradigm for pensions.
Gender inequality in the retirement period is a fact. It is essentially due to the longevity gap and the wage gap. On average, women live longer than men, but during their working life, they earn less and therefore receive smaller retirement incomes than men. This disparity is the result of multiple factors related to education, family, childcare, society, and so on. Many European countries have tried to reduce these inequalities by adopting policies based on the principles of solidarity and redistribution. The importance of these principles was recently confirmed by the European Parliament resolution of January 30, 2020. However, the different treatment of men and women has been increasingly questioned by policymakers. In particular, Directive 2006/54/EC (following Directive 96/97/EC) prohibits any gender discrimination in the field of pension regulation, reducing the possibility of resorting to redistributive policies. As a result of this Directive, most EU countries have introduced a single retirement age for men and women, while other countries continue to compensate women for the time spent on childcare by applying derogations in accordance with Article 141 (4) of that Directive. However, it is important to highlight that failing to consider the issue of gender in an aging society where men and women differ in crucial characteristics like life expectancy and earning opportunities necessarily leads to reduced overall welfare (Barigozzi et al., 2021).
Thus, the design of dynamic retirement schemes, anchored to life expectancy at retirement age, should properly account for the gender gap in mortality trends and its evolution over time.
Given these premises, the present study will run a cross-country comparative analysis of the adequacy of the so-called Normal Retirement Age (NRA) for male and female populations issued by the OECD (2019), considering different gender-specific dynamics of mortality trends. "Normal" retirement is defined as receiving a full pension without penalties. In some schemes, a pension can be claimed earlier, from the "early" retirement age onwards, implying benefit penalties that adjust for the longer retirement period. The 2018 average NRA across OECD countries indicates theoretical ages applied to a person entering the labor force at age 22 and working without interruption, and it is used as a benchmark in this context. We highlight that, in our analysis, we take a differentiated NRA for men and women when indicated by the OECD report; however, we consider a single NRA if the OECD report provides the same NRA for men and women (as is the case with many countries). This is not to be confused with the mortality projections at the NRA, which are instead differentiated by gender (according to the evolution of gender- and country-specific longevity trends). We consider mortality projections at the NRA for both males and females because we believe that deepening the study of the longevity gap can support policymakers in implementing policies to reduce the pension gender gap.
As a matter of fact, the stronger link between earnings and pensions introduced in many pension reforms has intensified the already weak pension position of women (Queisser et al., 2007).
A hybrid demographic and actuarial rationale is at the heart of some recent findings on the Italian longevity experience: some authors (Coppola et al., 2019, 2020) have proposed an indexation mechanism for pensions, which would involve determining the gender-specific prospective adjustment in retirement age needed to match residual life span and assessing the sustainability of public pension expenditure with respect to mortality forecasts. The proposed approach is robust with respect to model selection since it resorts to a model-assembling technique for projections that makes it possible to control for model risk; indeed, the analysis of mortality rates over time and cohorts is usually performed with extrapolative methods based on popular stochastic mortality models, such as the Lee-Carter (LC) model, the Renshaw-Haberman extension to deal with the cohort effect, and further generalizations (Villegas et al., 2018).
In more detail, we test the extent to which an indexation mechanism for pensions would reduce the gender-specific impact of longevity on the sustainability of NRA, providing, for each country, a measure of the forward shift that should be applied to match the expected life span. Results are meant to provide guidelines to National Security Systems to better calibrate pension policies. For this reason, it is important to take into account the differences in mortality between men and women, since using more information rather than less helps contain longevity risk. The visual analysis proposes an innovative use of control charts suitable for identifying even minor structural changes in temporal data.
The statistical and actuarial framework is briefly laid out in "The methodological framework" section, whereas "Exploratory data analysis of the gender gap" section develops the core of the paper with a cross-country comparative analysis of selected OECD members to assess the adequacy of the NRA given the country- and gender-specific life expectancies. "Data framework" section introduces the data framework, and "Monitoring gender differences in life expectancy at NRA via control charts" section then discusses an exploratory statistical analysis of trends in the country-specific gender gap in life expectancy at NRA and the results of the application of the indexation mechanism. Specifically, we propose the use of exponentially weighted moving average (EWMA) control charts (Montgomery, 2005) to identify the extent to which the gender gap in survival at older ages varies over time. As a by-product, this approach makes it possible to identify groups of countries that share the same dynamics regarding the gender gap. A concluding section ends the discussion.

The methodological framework
In recent years, many countries have introduced pension system reforms to improve their financial sustainability. Many of these are based on the adoption of an automatic link between changes in life expectancy and pensions (Bravo et al., 2021; OECD, 2017).
In this section, we resort to the mechanism proposed by Coppola et al. (2019), who recommended the indexation of retirement age to life expectancy at a given reference age as obtained by projecting mortality tables for different cohorts.
The proposed cross-country comparative analysis relies on the NRA defined as the age at which an individual could retire in 2018 without any reduction to his/her pension, having had a full working career from age 22 (OECD, 2019). Even if this indicator is sometimes quite different from individuals' effective retirement age, the broad heterogeneity of pension schemes characterizing the chosen OECD countries makes it necessary to consider a universal and comparable measure.
In the following, we assume that each individual receives a constant monthly payment as long as he/she survives, starting at the NRA. Let $e^{(M)}_{NRA,C}$ be the life expectancy at NRA for a given cohort C, according to the mortality model M.
If we consider the mortality trend and the subsequent longevity risk, we can realistically expect that life expectancy for younger generations will increase with respect to that of a benchmark cohort $C^*$. As suggested by Coppola et al. (2019), we consider a stochastic mortality model M, fit it to data, and then use it for mortality projections for a given cohort C. We then determine the forward shift $\mathrm{lag}^{(M)}_C$ applicable to the NRA for cohort C to match the residual expected life span for cohort $C^*$, according to the formula:

$$e^{(M)}_{NRA + \mathrm{lag}^{(M)}_C,\,C} = e^{(M)}_{NRA,\,C^*}. \qquad (1)$$

By applying that shift to the NRA for cohort C, the pension provider will pay benefits for a number of years not greater than the one for which it will have to pay them to an individual belonging to the reference cohort $C^*$. A similar approach can be found in Denmark's pension reform, where the statutory retirement age will gradually increase, targeting the age at which the remaining life expectancy is 14.5 years (target retirement age). The target retirement age implies that the remaining life expectancy after retirement should be constant at 14.5 years (Alvarez et al., 2021). For the sake of illustration, we assume $C^* = 1960$ and we consider that the lag is expressed in monthly fractions of a year.
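To make the indexation rule in (1) concrete, the following minimal Python sketch searches, on a monthly grid, for the smallest forward shift at which the residual life expectancy of a younger cohort no longer exceeds that of the benchmark cohort at NRA. Everything in it is an illustrative assumption rather than the procedure used in the paper: the Gompertz death probabilities, their parameters, the linear interpolation between integer ages, and the function names `remaining_life_expectancy`, `forward_shift`, and `gompertz_qx` are all hypothetical.

```python
import numpy as np

def remaining_life_expectancy(qx):
    """Approximate complete life expectancy at each tabulated age.

    qx holds one-year death probabilities by age; remaining life beyond the
    last tabulated age is ignored (assumption of this sketch).
    """
    px = 1.0 - np.asarray(qx, dtype=float)
    curtate = np.zeros(len(px) + 1)
    # Backward recursion for the curtate expectancy: e_x = p_x * (1 + e_{x+1})
    for i in range(len(px) - 1, -1, -1):
        curtate[i] = px[i] * (1.0 + curtate[i + 1])
    return curtate[:-1] + 0.5  # half-year correction

def forward_shift(ages, qx_cohort, nra, target_le, max_years=15):
    """Smallest lag (in years, monthly steps) with e_{NRA+lag} <= target_le."""
    e = remaining_life_expectancy(qx_cohort)
    for months in range(12 * max_years + 1):
        # Linear interpolation between integer ages (assumption)
        if np.interp(nra + months / 12.0, ages, e) <= target_le:
            return months / 12.0
    return None  # no admissible shift within max_years

def gompertz_qx(ages, a, b=0.1):
    """Hypothetical Gompertz death probabilities, used only as toy data."""
    return 1.0 - np.exp(-a * np.exp(b * ages))

ages = np.arange(55, 111)
qx_benchmark = gompertz_qx(ages, a=3.0e-5)  # stands in for cohort C*
qx_younger = gompertz_qx(ages, a=2.4e-5)    # lower mortality, younger cohort C
nra = 65

target = float(np.interp(nra, ages, remaining_life_expectancy(qx_benchmark)))
lag = forward_shift(ages, qx_younger, nra, target)
lag_benchmark = forward_shift(ages, qx_benchmark, nra, target)
print(f"benchmark e_NRA = {target:.2f} years, lag for younger cohort = {lag:.2f} years")
```

In practice the toy death probabilities would be replaced by projected cohort life tables, and the search returns zero for the benchmark cohort itself by construction.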
On this basis, the paper presents a comparative analysis across selected OECD countries of the gender differences in life expectancy at NRA and in the lags to be applied to the NRA. The study has an exploratory nature because it is meant to provide guidelines on policy adjustments, and it will be structured in the following phases:
1. projection of mortality tables by using the Generalized Age-Period-Cohort (GAPC) mortality models shown in Table 1, in order to consider the longevity risk and calculate relative life expectancies;
2. introduction of an averaged model obtained with an assembling technique that allows us to mitigate the model risk arising when we choose one of the aforementioned models for prediction;
3. calculation of the shifts to be applied to the NRA for the male and female populations of each OECD country according to the indexing mechanism and the averaged mortality model;
4. comparative analysis of the gender gap both in life expectancy and in shifts required for NRA.

Mortality projections: GAPC mortality models
In actuarial calculations, age-specific measures of mortality are usually needed, and in a dynamic context, mortality is assumed to be a function of both the age x and the calendar year t (the so-called age-period approach). In our study, we refer to GAPC models to project mortality and we use the GAPC forecasts to obtain a projected life table for each specific cohort (calendar year). GAPC models are a class of parametric models that link the force of mortality ($\mu_{x,t}$) to a linear or bilinear predictor structure ($\eta_{x,t}$) consisting of a series of factors. The reader is referred to Pitacco et al. (2009) for further details on mortality forecasting and to Villegas et al. (2018) for a concise yet comprehensive introduction to model implementation in the free R statistical environment. The GAPC class includes most of the stochastic mortality models discussed in the demographic and actuarial literature, namely the original LC model (Lee & Carter, 1992) and its Poisson version proposed by Brouhns et al. (2002), the extensions of the LC model proposed in Renshaw and Haberman (2006), the age-period-cohort (APC) model (Currie, 2006), the original Cairns-Blake-Dowd (CBD) model (Cairns et al., 2006), the extended CBD (M7) model of Cairns et al. (2009), and the model of Plat (2009). In Table 1, we give a summary definition of the GAPC models we use, pointing out the corresponding systematic component and bibliographic references.
All of these models are derived from the LC model, which is a milestone in the literature related to mortality projections. The LC model has been widely extended to improve model performance and forecasting power. Brouhns et al. (2002) employed this model assuming a Poisson distribution of the number of deaths and using the log link function with respect to the force of mortality $\mu_{x,t}$. Some other proposals have been introduced in the literature to include components for capturing the cohort effect ($\gamma_{t-x}$), for instance in Renshaw and Haberman (2006), and a quadratic age effect $(x - \bar{x})^2 - \hat{\sigma}_x^2$, as in the M7 model, to obtain the predictor. In the latter, $\bar{x}$ is the average age in the data, and $\hat{\sigma}_x^2$ is the average value of $(x - \bar{x})^2$. In 2006, Currie introduced the APC model, a substructure of the Renshaw-Haberman (RH) model. Cairns et al. (2006), using the two-factor CBD model, propose a predictor structure with two age-period terms $k_t^{(i)}$ ($i = 1, 2$), with age-modulating terms $\beta_x^{(1)} = 1$ and $\beta_x^{(2)} = x - \bar{x}$, no age function $\alpha_x$ and no cohort effect. Plat (2009) combines the CBD model with some features of the LC model to obtain a model that is suitable for considering full age ranges and captures the cohort effect.
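Since Table 1 is not reproduced in this text, it may help to recall how the predictors of the models named above are commonly written in the GAPC literature (for instance, in the formulation of Villegas et al., 2018). The display below is a reminder under that assumption; the exact parametrization and constraints used in Table 1 may differ.

```latex
\begin{align*}
\text{GAPC (general):}\quad & \eta_{x,t} = \alpha_x + \sum_{i=1}^{N} \beta_x^{(i)} k_t^{(i)} + \beta_x^{(0)} \gamma_{t-x} \\
\text{LC:}\quad   & \eta_{x,t} = \alpha_x + \beta_x^{(1)} k_t^{(1)} \\
\text{RH:}\quad   & \eta_{x,t} = \alpha_x + \beta_x^{(1)} k_t^{(1)} + \beta_x^{(0)} \gamma_{t-x} \\
\text{APC:}\quad  & \eta_{x,t} = \alpha_x + k_t^{(1)} + \gamma_{t-x} \\
\text{CBD:}\quad  & \eta_{x,t} = k_t^{(1)} + (x - \bar{x})\, k_t^{(2)} \\
\text{M7:}\quad   & \eta_{x,t} = k_t^{(1)} + (x - \bar{x})\, k_t^{(2)}
                    + \left( (x - \bar{x})^2 - \hat{\sigma}_x^2 \right) k_t^{(3)} + \gamma_{t-x} \\
\text{Plat:}\quad & \eta_{x,t} = \alpha_x + k_t^{(1)} + (\bar{x} - x)\, k_t^{(2)}
                    + (\bar{x} - x)^{+} k_t^{(3)} + \gamma_{t-x}
\end{align*}
```

Each model is a special case of the first line, obtained by fixing the number of age-period terms, the age-modulating functions, and the presence or absence of the static age term and the cohort term.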
In order to model and project mortality rates, we follow the steps summarized below:
1. Fitting the parameter estimates of GAPC stochastic mortality models by maximizing the model log-likelihood;
2. Inspecting the residuals of the fitted model to analyze mortality models' goodness of fit;
3. Modeling the period indexes $k_t^{(i)}$ and the cohort index $\gamma_{t-x}$ by using time-series techniques;
4. Using the forecasted (simulated) values of the predictors to obtain forecasted (simulated) age-specific central mortality rates or age-specific one-year death probabilities.
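The steps above can be illustrated, for the simplest member of the family, with a short Python sketch. It deliberately departs from the paper in two ways, both assumptions of this example: the LC parameters are estimated with the classical singular value decomposition of Lee and Carter (1992) rather than by maximizing a Poisson log-likelihood, and the data are synthetic. The period index is then projected as a random walk with drift (central forecast only). The names `fit_lee_carter` and `forecast_rw_drift` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
ages = np.arange(55, 90)       # ages 55-89
years = np.arange(1960, 2018)  # fitting window 1960-2017

# Synthetic log central death rates: log m = a_x + b_x * k_t + noise (toy data)
true_a = -9.5 + 0.09 * ages
true_b = np.full(ages.size, 1.0 / ages.size)
true_k = -0.7 * (years - years.mean())
log_m = (true_a[:, None] + np.outer(true_b, true_k)
         + rng.normal(0.0, 0.02, (ages.size, years.size)))

def fit_lee_carter(log_m):
    """SVD fit of log m_{x,t} = a_x + b_x k_t with sum(b) = 1 and sum(k) = 0."""
    a = log_m.mean(axis=1)                  # static age profile
    centered = log_m - a[:, None]
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    b, k = u[:, 0], s[0] * vt[0]            # leading rank-one component
    scale = b.sum()                         # identifiability: sum(b) = 1
    return a, b / scale, k * scale

def forecast_rw_drift(k, horizon):
    """Central forecast of the period index as a random walk with drift."""
    drift = (k[-1] - k[0]) / (len(k) - 1)
    return k[-1] + drift * np.arange(1, horizon + 1)

a_hat, b_hat, k_hat = fit_lee_carter(log_m)
residuals = log_m - (a_hat[:, None] + np.outer(b_hat, k_hat))  # step 2: inspect fit
k_future = forecast_rw_drift(k_hat, horizon=30)                # step 3
m_projected = np.exp(a_hat[:, None] + np.outer(b_hat, k_future))  # step 4
print(m_projected.shape, float(np.abs(residuals).max()))
```

Cohort-based models would additionally require a time-series model for the cohort index, which this sketch omits.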

Model averaging
Whatever the criterion we use to select a model to carry out the projections of mortality rates, model risk arises. As Shang and Haberman (2018) observed, "Model averaging combines forecasts obtained from a range of models, and it often produces more accurate forecasts than a forecast from a single model". To mitigate this risk, we refer to a model-assembling technique tested by Buckland et al. (1997) and Benchimol et al. (2016). The assembling technique is based on the use of a weighted average of the forecasts obtained under competing models (see Coppola et al., 2019), with weights that account for the model's goodness of fit. Referring to the Akaike Information Criterion (AIC), which is widely used for model selection, we calculate the weights for the N projection models using the following formula:

$$w_i = \frac{\exp(-\Delta_i/2)}{\sum_{j=1}^{N} \exp(-\Delta_j/2)}, \qquad \Delta_i = AIC_i - AIC_{best},$$

where $AIC_{best}$ indicates the lowest AIC value among those of the N competing models. Thus, the procedure assigns to projections from the ith model an importance that increases as the goodness of fit increases.
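As a small illustration of the AIC-based weighting described above, the following Python sketch computes Akaike weights in the sense of Buckland et al. (1997) and uses them to average competing forecasts. The AIC values and life expectancies are invented for the example, and the helper names `akaike_weights` and `averaged_forecast` are hypothetical.

```python
import numpy as np

def akaike_weights(aic):
    """Weights proportional to exp(-(AIC_i - AIC_best) / 2), normalized to sum to 1."""
    aic = np.asarray(aic, dtype=float)
    delta = aic - aic.min()        # zero for the best-fitting model
    raw = np.exp(-0.5 * delta)
    return raw / raw.sum()

def averaged_forecast(forecasts, weights):
    """Weighted average over the first axis (one row per competing model)."""
    return np.tensordot(weights, np.asarray(forecasts, dtype=float), axes=1)

# Hypothetical AIC values and projected life expectancies at NRA for three models
model_names = ["LC", "APC", "CBD"]
aic_values = [1012.4, 1010.0, 1025.3]
le_forecasts = [18.0, 19.0, 17.0]

weights = akaike_weights(aic_values)
le_averaged = averaged_forecast(le_forecasts, weights)
for name, w in zip(model_names, weights):
    print(f"{name}: weight = {w:.4f}")
print(f"model-averaged life expectancy = {le_averaged:.2f}")
```

Because the weights decay exponentially in the AIC difference, a model whose AIC is more than about ten points above the best one contributes almost nothing to the average.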
For the purposes of our analysis, we have fitted the selected models on mortality data from 1960 to 2017, which were then projected 30 years onwards. This method has been performed for each country and separately for male and female populations.
Finally, we want to point out that:
- Regarding the fitting time window, we have not restricted it to the period following the cardiovascular revolution. Since the early 2000s, there has been a marked slowdown in progress relating to cardiovascular mortality; consequently, focusing only on the period following the cardiovascular revolution may provide distorted results.
- Regarding the validation of the predictive capacity of the model, it could be carried out by splitting the dataset into two subsets: the training set and the test set. We consider this approach unsuitable for our study, as it would result in a test set strongly influenced by the effects of the economic crisis. In the long run, there is an established positive association between economic growth and life expectancy. In the short term, the situation is different; many authors have studied whether mortality rates are pro-cyclical (for example, Cervini-Plá & Vall-Castelló, 2021). Our approach favors long-term trends in mortality, given that actuarial calculations concerning pensions are based on survival probabilities extending over a long-term horizon.

Lag calculation
For each of the selected countries, we consider the gender-specific NRA given by the OECD in 2019 (see Table 2 in "Data framework" section). We assume $C^* = 1960$ as the benchmark and we calculate the life expectancy at NRA according to the projection derived from the model-assembling (MA) technique, $e^{(MA)}_{NRA,C^*}$. For each cohort C, we then determine the lag $\mathrm{lag}^{(MA)}_C$ according to (1). This indicator provides the gender-specific adjustment that can be applied to NRA to match $e^{(MA)}_{NRA,C^*}$, thus measuring the extent to which the sustainability of NRA is undermined by the aging of the target population.

Comparative analysis of gender gap
Finally, we carry out a comparative analysis across selected countries with the aim of deriving similarities and differences in the dynamics of longevity risk associated with pension sustainability. To simplify the notation, let $L_M$ and $L_F$ denote the lags for the male and female NRAs for a given cohort, as detailed in (1), "Lag calculation" section. Then, we consider their difference $g = L_M - L_F$ as a straightforward measure of the gender gap in sustainability for the NRA. In particular, g = 0 denotes that, with respect to gender-specific NRA, similar adjustments should be made for men and women; g > 0, instead, indicates that NRA is less sustainable for men than it is for women. In this case, stronger adjustments are required for men with respect to the indexation of retirement age needed for women, as a consequence of a higher velocity of growth in life expectancy for men than for women. Conversely, countries for which g < 0 are characterized by a higher velocity in longevity risk for women than for men with respect to NRA. Thus, higher and positive values of this index indicate that longevity risk in NRA programs is more underestimated for men than it is for women. Conversely, low and negative values indicate that the NRA for women exposes National Security Systems to unsustainability to a greater extent than the NRA for men.
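The reading of the sign of g can be summarized in a few lines of Python. This is only a sketch: the one-month tolerance used to treat g as approximately zero is an assumption introduced here for illustration (the text does not state a threshold), and the lag values and function names are hypothetical.

```python
def gender_gap(lag_men, lag_women):
    """g = L_M - L_F, with both lags expressed in years."""
    return lag_men - lag_women

def classify_gap(g, tol=1.0 / 12.0):
    """Label the gap; tol (one month) is an illustrative assumption."""
    if abs(g) <= tol:
        return "g ~ 0: similar adjustments for men and women"
    if g > 0:
        return "g > 0: stronger adjustment needed for men"
    return "g < 0: stronger adjustment needed for women"

# Hypothetical lags (years) for three fictitious countries
examples = {"A": (2.50, 1.75), "B": (2.00, 2.05), "C": (1.25, 2.00)}
labels = {name: classify_gap(gender_gap(lm, lf)) for name, (lm, lf) in examples.items()}
for name, label in labels.items():
    print(name, label)
```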

Exploratory data analysis of the gender gap
The present section is devoted to a two-step exploratory data analysis of the gender gap in survival at older ages. After introducing the data framework in "Data framework" section and providing the necessary information about the fitting procedures, "Monitoring gender differences in life expectancy at NRA via control charts" section will investigate the stability over time of the gender gap in life expectancy at NRA via a specific type of control chart, whereas "Exploratory data analysis for the indexation of NRA" section will focus on the gender gap in the gender-specific adjustment to be applied to NRA to match the benchmark $e^{(MA)}_{NRA,C^*}$ via visualization tools.

Data framework
As mentioned above, for the purpose of this analysis, we consider country- and gender-specific NRAs that were reported by the OECD (2019). For the chosen countries, Table 2 reports the relevant indicators of the longevity experience and gender-specific NRA (rounded to the closest unit for convenience). The total fertility rate in 2018 is also reported to provide a unifying view of the demographic scenario for each country (OECD, 2022). In Table 2, bold type highlights the countries that are deemed to suffer from longevity risk the most: in Europe, the number of people exposed to longevity risk (people over the age of 65) is particularly high for Italy, Portugal, Finland, and France; overall, except for Luxembourg and Norway, more than 30% of the population is over 65 years of age.
Mortality data for the selected countries were downloaded from the Human Mortality Database (2021) for men and women separately. In particular, to fit stochastic mortality models within the GAPC family, we have considered the period from 1960 to 2017 and ages from 55 to 89 as suggested by Villegas et al. (2018), since the CBD and the M7 models have been designed to fit higher ages specifically. Indeed, in the actuarial field, it is of greater interest to estimate future mortality rates for the individuals who belong to the oldest segment of the population. At the same time, including extreme outliers could lead to biased and equally implausible predictions: therefore, only those between the ages of 55 and 89 will be examined below.
For the sake of completeness, Table 3 reports the best- and worst-fitting models within the GAPC family for the considered mortality data, for each country, and separately for male and female populations. It turns out that different cohort effects are present for the considered countries and, in addition, gender differences emerge for several countries in the selection of the best-fitting model. This circumstance supports the idea that a general framework to monitor life expectancy at NRA and the gender gap in mortality (and subsequently, in retirement adjustments) should be as flexible as possible. The proposal of resorting to an assembling technique for model projections for each country and for each gender allows for the comparability of results (see "Model averaging" section).

Monitoring gender differences in life expectancy at NRA via control charts
After averaging model projections, we consider life expectancy at the gender-specific NRA for the male (M) and female (F) populations separately, denoted $e^{(MA)}_{NRA,C_M}$ and $e^{(MA)}_{NRA,C_F}$ for a given cohort $C_i$, $i = M, F$. Then, we propose to use control charts (Montgomery, 2005) to test the gender-specific adequacy of NRA to mitigate longevity risk. These are graphical methods for statistical process control, typically used to monitor the stability of production and conformity to given quality standards. Specifically, the innovative character of the analysis pursued in this contribution hinges on Exponentially Weighted Moving Average (EWMA) charts, which are also robust to non-normality of the continuous outcome under examination. The EWMA control charts for the gender gap in life expectancy (LE) at NRA are suitable to identify even small variations with respect to a target value by considering the entire history of the process while weighting past observations according to their distance from the present. Thus, they can be useful to identify, in a sensitive way, change points in the dynamics of gender differences in LE with respect to a benchmark level. Accordingly, after computing the LE at NRA for each cohort and for both male and female populations, we propose to build EWMA charts for the relative difference in the gender gap in life expectancy. If $\Delta_C = e^{(MA)}_{NRA,C_M} - e^{(MA)}_{NRA,C_F}$ denotes the gender gap in LE at gender-specific NRA, the relative difference, defined by the indicator

$$\delta_C = \frac{\Delta_C - \Delta_{C-1}}{\Delta_{C-1}},$$

is computed from one cohort C to the previous one (C-1) in order to understand the dynamics of the gender gap in LE at NRA. For the EWMA charts with limits at 2σ, we have set the parameter λ = 0.5 (which tunes the weight to be assigned to current and past observations), and the target value (that is, the center line) to zero to monitor the yearly variation of the gender gap in LE from stability over time. Results are useful to identify when structural shifts in the gender gap in LE dynamics occur for each country.
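For readers less familiar with EWMA charts, the following Python sketch implements the textbook construction (as in Montgomery, 2005) with λ = 0.5, target zero, and limits at 2σ. The input series is invented, and estimating σ from an initial stretch assumed to be stable is a choice made for this example only; the text does not specify how σ was estimated. The function name `ewma_chart` is hypothetical.

```python
import numpy as np

def ewma_chart(x, lam=0.5, target=0.0, n_sigma=2.0, sigma=None):
    """Return the EWMA statistic, its time-varying limits, and out-of-limit flags."""
    x = np.asarray(x, dtype=float)
    if sigma is None:
        sigma = x.std(ddof=1)  # fallback: sample standard deviation of the series
    z = np.empty_like(x)
    prev = target              # the chart starts at the target value
    for i, xi in enumerate(x):
        prev = lam * xi + (1.0 - lam) * prev
        z[i] = prev
    t = np.arange(1, len(x) + 1)
    # Exact (time-varying) standard deviation of the EWMA statistic
    half_width = n_sigma * sigma * np.sqrt(lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t)))
    lower, upper = target - half_width, target + half_width
    return z, lower, upper, (z > upper) | (z < lower)

# Invented relative differences in the gender gap: stable, then a sustained shift
delta = np.array([0.01, -0.01, 0.005, -0.005, 0.0, 0.01, -0.01, 0.0,
                  0.08, 0.09, 0.10, 0.09])
sigma_stable = delta[:8].std(ddof=1)  # assumption: first 8 cohorts are in control

z, lower, upper, flagged = ewma_chart(delta, lam=0.5, target=0.0,
                                      n_sigma=2.0, sigma=sigma_stable)
first_signal = int(np.argmax(flagged)) if flagged.any() else None
print("first out-of-limit position:", first_signal)
```

With λ = 0.5 the statistic gives equal weight to the current observation and to the accumulated history, so a sustained shift is flagged quickly while isolated fluctuations of similar size to past noise are not.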
The most relevant instances are reported in Fig. 1. Then, from a graphical inspection of these EWMA charts, it is possible to group countries according to the same gender gap dynamics in LE (see Fig. 5 in the Appendix):
- Overall stability of the gender gap in LE: USA, UK, Luxembourg, New Zealand, Ireland, Poland, Sweden, Switzerland;

Exploratory data analysis for the indexation of NRA
With reference to "Comparative analysis of gender gap" section, Fig. 2 displays the gender gap $g = L_M - L_F$ in the forward shift to be applied to NRA under the proposed indexation mechanism for all selected countries. It is worth noting that three stable groups can be identified:
- g > 0: USA, UK, New Zealand, Belgium, Luxembourg, Netherlands, Italy, Portugal, Sweden, Switzerland, Austria, France, and the Czech Republic;
- g ≈ 0: Denmark, Finland, Iceland, and Estonia;
- g < 0: Hungary, Lithuania, Latvia, Japan, and Poland;
with small changes within groups. Interestingly, Spain features g < 0 for cohort 1980 (even if for a small value) and then switches to the group for which g ≈ 0, indicating that the gap in longevity is narrowing (specifically, longevity growth for women is decelerating).
A more detailed graphical analysis can be performed by inspecting a scatter-plot of $L_M$ and $L_F$ for each country and a selected cohort. Figure 3 displays this representation for cohorts 1980 and 1985 to allow a temporal comparison among countries. Most of the countries lie beneath the bisector line, indicating that adjustment for longevity to NRA is needed to a greater extent for men than for women. In particular, we can identify two main groups of countries. European continental countries (Italy, Spain, France, Switzerland) are always close together and farthest apart from other countries, and New Zealand performs more similarly to this group than the UK. Similarly, Austria and Luxembourg move steadily over time, seemingly maintaining the same relative distance. The USA behaves like an isolated point for all cohorts. Norway, Sweden, Portugal, and the Netherlands move together over time, and Iceland joins this group in 1985. Latvia and Lithuania are consistently set apart from the rest of the countries, whereas Poland, Hungary, and Japan move close to the bisector across time. The Appendix supplements these comments with plots analogous to Figs. 1, 2, and 3, for cohorts 1965, 1970, and 1975.
At a glance, it is possible to observe the country-specific velocity of the loss in adequacy of the gender-specific NRA over time with a time-series plot of the gender gap for the lag in NRA for selected cohorts. Figure 4 shows that, except for a few countries, the overall gender gap is broadening over time (in absolute values) in a unique country-specific direction, indicating that longevity differences are evolving quickly and suitable gender-specific adjustments should be made to gender-specific NRA to support the sustainability of National Security Systems. It is confirmed that the only countries for which gender-specific adjustments to NRA are less urgent are Denmark and Spain, for which the gender gap is stable at around 0 over different cohorts. Although no urgent gender-specific adjustment to NRA is necessary for Lithuania and Latvia for older cohorts, such an adjustment will become important to ensure the sustainability of NRA for younger cohorts. Specifically, NRA should be increased to a greater extent for women than for men.

Conclusions
The paper is framed within a research field tailored to studying indexing mechanisms for retirement age that are suitable to cope with longevity risk and to deal with the gender gap in LE and its evolution over time and cohorts.
The paper has developed a cross-country exploratory analysis of the adequacy of the NRA issued by the OECD for selected countries to face the dynamics of longevity risk,