In this section, we describe the process undertaken to build a near-real-time system for monitoring and tracking excess mortality in South Africa in the context of the COVID-19 pandemic (Bradshaw et al., 2021). Some 20 years ago, in an effort to measure the impact of the HIV/AIDS epidemic at a time when there was large-scale government denial and a hiatus in the production of vital statistics on the cause of registered deaths by the national statistical agency, the South African Medical Research Council (SAMRC) and the University of Cape Town (UCT) set up a process of Rapid Mortality Surveillance (RMS) (Dorrington et al., 2001). Essentially, the process made use of data on deaths recorded on the National Population Register (NPR), which is maintained by the Department of Home Affairs (DHA).
These data are subject to two forms of under-reporting. The first is non-registration on the population register (because the deceased did not have a South African birth certificate or identity document). The second is non-registration of the death, a common challenge in developing countries. The annual numbers of these deaths by sex and age were adjusted to account for deaths not captured on the population register, and then adjusted for under-registration, to provide national estimates of the numbers of deaths (Dorrington et al., 2020). Since then, the system has been maintained by the SAMRC as a means of tracking various indicators of mortality more timeously than the release of vital statistics allows. Up to 2020, although monthly updates of the registrations of deaths were received from the DHA, the analysis and reporting were confined to annual data for the country as a whole, and sub-national analysis was not attempted.
The spread of SARS-CoV-2 to countries beyond China in February 2020, together with reports of rapidly escalating mortality in Italy and elsewhere, prompted the SAMRC and UCT team responsible for the RMS to adapt it so that it could monitor and track the effects of the epidemic on mortality in South Africa. Weekly updates of changes to the NPR, including dates of birth and death, sex, the DHA office where the death was registered (Footnote 2), and whether the death was due to natural or unnatural causes, were requested. However, for these data to be useful, the following limitations and deficiencies needed to be addressed:
- Apart from a level of general under-registration of deaths (disproportionately so among young children, although these were less likely to be affected by the virus), these data are missing deaths of those not on the NPR, i.e. those without identity numbers (IDs): mainly deaths that occur before the birth was registered, and deaths of non-South African citizens. The Department of Home Affairs does not include such deaths in the adjustments to the NPR, but where a death notification form has been completed, the form is forwarded to Statistics South Africa for the production of the official vital statistics, which are, as mentioned above, subject to reporting delays of several years.
- In addition to the general under-registration of deaths, our investigations showed that, even in the absence of closure of offices (due to public holidays, or as a result of contamination during the epidemic), about 20% of natural and 50% of unnatural deaths still remained to be processed for the most recent week being reported on (Footnote 3). Thus, the number of deaths reported for the most recent week needed to be adjusted for these ‘incurred but not yet processed’ deaths.
- Since deaths can be registered at any DHA office in the country, the location of the office, even if one assumes that the death was registered at the most convenient office, is not necessarily an indication of the place where the death occurred. For this reason, sub-national measurement was confined to provinces, and below that to the eight metropolitan districts.
- Registration of births dropped, owing first to a complete cessation of birth registration during the early, severe lockdown, possibly followed by the impact of the closure of DHA facilities in maternity hospitals, and presumably some reluctance by parents to register births during the early stages of the epidemic. The effect of this on the registration of deaths under age 1 was so significant that monitoring of the impact of the epidemic had to be confined to those aged 1 and older. However, it can be assumed that the number of COVID-19 and COVID-19-related deaths under age 1 is small relative to the number among adults.
The last two issues described above represent structural constraints in the data, which cannot be easily remedied. The manner of correcting the NPR data for the first two issues is described below.
Correction in respect of sub-national under-registration of deaths
Although in practice the computations were complicated by limitations on the data available at the time of very strict lockdown, in essence the approach made use of the fact that for the past 10 years we have been using data on deaths by age and sex from the national population register (NPR) to estimate the true number of deaths for years more recent than allowed by the release of cause-of-death (VR) data by Statistics South Africa (Stats SA). This was achieved by comparing the NPR data to the VR data for the same year over time to estimate the proportions (by age and sex) of notified deaths that are not included on the NPR. This in turn facilitated the estimation of the expected numbers of VR deaths for more recent years, before the VR data are released.
Completeness of the VR data for adults was estimated by applying Death Distribution Methods (DDMs) (Dorrington, 2013a, 2013b) to past VR data and censuses/surveys from the 1980s to 2011. For infants and children, it was estimated by comparing infant and childhood VR data with the true numbers of deaths implied by estimates of infant and childhood mortality rates from census and survey data. These estimates suggested that completeness of registration, particularly of adult deaths, was reasonably high and followed a logistic curve as it approached 100%. From these estimates, we have extrapolated trends to provide estimates beyond the data. Infant deaths reported by the official CRVS system are currently estimated to be about 75% complete, deaths at ages 1–4 about 60% complete, and adult deaths 90–95% complete in recent years (i.e. post-2013). The NPR deaths are 75%, 60% and 100% of the official CRVS infant, 1–4, and adult deaths, respectively. These estimates can then be applied to the numbers of VR deaths estimated from the numbers of deaths on the NPR to provide estimates of the true numbers of deaths by age and sex for the country as a whole, which were the starting point for deriving estimates of the numbers of deaths by age and sex for sub-national populations.
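As a worked illustration of this two-stage correction, the sketch below scales NPR deaths up to expected VR deaths and then to estimated true deaths, using the rounded national proportions quoted above. The function name, the adult completeness value of 0.925 (an assumed midpoint of the quoted 90–95% range), and the input count are hypothetical; the actual calculation was carried out by age and sex and is not reproduced here.

```python
# Rounded national proportions quoted in the text (recent years).
# NPR deaths as a share of VR (official CRVS) deaths, by broad age group.
NPR_SHARE_OF_VR = {"infant": 0.75, "1-4": 0.60, "adult": 1.00}
# Completeness of VR deaths relative to the true number of deaths.
# The adult value is an ASSUMED midpoint of the quoted 90-95% range.
VR_COMPLETENESS = {"infant": 0.75, "1-4": 0.60, "adult": 0.925}


def estimate_true_deaths(npr_deaths, age_group):
    """Scale NPR deaths to expected VR deaths, then to estimated true deaths."""
    expected_vr = npr_deaths / NPR_SHARE_OF_VR[age_group]
    return expected_vr / VR_COMPLETENESS[age_group]


# Hypothetical input: 900 infant deaths recorded on the NPR.
example = estimate_true_deaths(900, "infant")
```

Under these rounded factors, 900 infant deaths on the NPR would imply about 1,200 expected VR deaths and about 1,600 true deaths.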
The true numbers of deaths by age and sex for the provinces and metropolitan districts were derived by assuming that:
1. The VR unnatural deaths (whether in urban or rural areas) are completely reported.
2. The VR deaths (both natural and unnatural) in the eight metropolitan districts (metros) are completely reported.
3. The VR deaths (both natural and unnatural) in the non-metro district councils (Footnote 4) in which 70% or more of the population live in enumeration areas classified as “urban” in the 2011 census are completely reported.
4. The correction factors for natural deaths for non-metro district councils that are urban are the same for all provinces and equal to the weighted average of the corrections for incompleteness of natural deaths of the eight metros. The correction factors for natural deaths for the non-metro councils that are not urban are the same for all provinces.
The detailed steps for calculating the completeness for the metropolitan districts, urban and non-urban non-metropolitan areas, and hence for the provinces, given these assumptions, are set out in the Appendix: Table 2, together with the resulting estimates of provincial completeness in total and for ages 15+.
The correction factors were then applied to the NPR deaths by metro (including non-metro urban and rural), sex, age, and cause (i.e. natural and unnatural) to provide estimates by province and the eight metropolitan districts that are consistent with the national estimate of the true numbers of deaths by age and sex.
Correction in respect of late registration of deaths
The RMS had been set up initially to monitor mortality annually, with monthly data updates, so small delays in registration were not material. With the move to a weekly system, allowance for delayed reporting became a more important adjustment.
Comparison of the numbers of deaths reported most recently with the numbers of deaths ultimately reported for that week (i.e. including delayed processing) showed that, provided there were no interruptions to processing over the seven days prior to the provision of the data, about 80% of the natural deaths ultimately registered on the NPR were captured by end of business on the Monday following that epi-week.
This percentage has been monitored so that further adjustments can be made for processing weeks interrupted by public holidays and/or office closures due to COVID-19 infection at the office. Based on this information, the adjustment for late processing is increased depending on which day(s) of the week’s processing were missed (Footnote 5).
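The basic adjustment for deaths ‘incurred but not yet processed’ can be sketched as follows. The processed shares are the approximate figures quoted in the text for an uninterrupted week (about 80% of natural and 50% of unnatural deaths); the function name and the `extra_factor` argument, which stands in for the additional uplift applied when processing days were missed, are hypothetical, as the day-specific factors are not given here.

```python
# Approximate share of a week's deaths already processed by the Monday
# cut-off, as quoted in the text for weeks with no interruption.
PROCESSED_SHARE = {"natural": 0.80, "unnatural": 0.50}


def adjust_latest_week(reported, cause, extra_factor=1.0):
    """Gross up the most recent week's reported deaths for late processing.

    extra_factor (>= 1) is a HYPOTHETICAL additional uplift for weeks in
    which public holidays or office closures interrupted processing.
    """
    if extra_factor < 1.0:
        raise ValueError("extra_factor must be at least 1")
    return reported / PROCESSED_SHARE[cause] * extra_factor
```

With these shares, 8,000 natural deaths reported for the latest week would be grossed up to about 10,000.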
Model to predict weekly numbers of deaths
The weekly deaths from all causes and from unnatural causes are shown in Fig. 2A and B, respectively. There are strong seasonal variations in the numbers of deaths from all causes, with an increase in the winter months as well as upticks at the beginning of the year. The trend in deaths from unnatural causes has distinct upticks coinciding with month-ends. Additionally, the first week of the year tends to be high.
In developing predicted values for 2021, it was considered important to use more historical data to obtain a more robust trend for the prediction, and thus data for the 6 years prior to 2020 (i.e. 2014 to 2019) (Footnote 6) were used to establish the baseline of predicted deaths for 2020 and 2021. Given the distortion in the number of deaths in 2020 arising from COVID-19, data from 2020 could not be used to establish the predicted series of weekly deaths.
Poisson regression and negative binomial regression are common statistical models used for the analysis of count data. Over-dispersion of the data necessitates fitting a negative binomial model to the death data after adjustment for incompleteness.
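One simple way to see why a Poisson model may be inadequate is to compare the variance of the counts with their mean, since a Poisson distribution implies that the two are equal. The sketch below computes this ratio for a made-up series of weekly counts. It is a crude diagnostic that ignores covariates, and is not necessarily the check used in this analysis.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Return sample variance divided by mean; about 1 under a Poisson model."""
    return variance(counts) / mean(counts)


# HYPOTHETICAL weekly death counts, for illustration only.
weekly_counts = [8200, 8450, 9100, 9900, 10400, 9700, 8800, 8300]
ratio = dispersion_ratio(weekly_counts)
overdispersed = ratio > 1  # a ratio far above 1 points to over-dispersion
```

In practice, over-dispersion would be assessed conditional on the covariates in the fitted model rather than on the raw series.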
Following further exploratory analysis of the data, it was decided to fit separate models to the unnatural deaths, allowing these deaths to follow a weekly pattern different from that of the natural deaths. In the case of natural deaths, we fitted separate models for the Western Cape and KwaZulu-Natal and a third model for the remaining provinces. The main reason for separating the Western Cape is that it appears to have a slightly different seasonal trend, while KwaZulu-Natal needed to be modelled separately because it appears to have had a more rapid decline in adult mortality in recent years than the other provinces (presumably due to treatment of high numbers of people infected with HIV over these years). This separation by province was not done for unnatural deaths, as the data were more limited and there was no evidence of a need for separate models, as there was for natural deaths.
Models were fitted using Stata and allowed for an interaction between age group and sex, and independent effects of year (as a linear term), province, and epi-week (the latter two as categorical variables). Estimates of population size were included in the model as an “offset” term, permitting the modelling of mortality rates directly.
Thus, in effect, the following regression model was fitted to the log of the rate, calculated as the number of deaths divided by exposure time measured in person-weeks:
$$\ln\left(\frac{d_{ij}}{PW_{ij}}\right) = \ln\left(d_{ij}\right) - \ln\left(PW_{ij}\right) = \beta_{0} + \beta_{1} X_{i} + \beta_{2} X_{1j} + \beta_{3} X_{2j} + \dots + \beta_{n+1} X_{nj},$$
where $d_{ij}$ is the count of deaths and $PW_{ij}$ is the exposure (measured in person-weeks) for a particular age group $i$ and combination of covariates $j$.
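Rearranging the model gives the expected count as exposure multiplied by the exponential of the linear predictor. The sketch below evaluates that expression for a given set of coefficients. The function name, argument layout, and 0/1 coding of categorical covariates are assumptions made for illustration, and the numbers in the usage note are not fitted values.

```python
import math


def predicted_deaths(person_weeks, intercept, coefs, covariates):
    """Expected deaths = exposure * exp(linear predictor).

    coefs and covariates are equal-length sequences; categorical covariates
    are ASSUMED to be coded as 0/1 indicator variables.
    """
    if len(coefs) != len(covariates):
        raise ValueError("coefs and covariates must have the same length")
    linear_predictor = intercept + sum(b * x for b, x in zip(coefs, covariates))
    return person_weeks * math.exp(linear_predictor)
```

For instance, with an exposure of 1,000 person-weeks, a baseline rate of 0.01 per person-week (intercept `math.log(0.01)`) and a single indicator whose coefficient is `math.log(2)`, the expected count is about 20 deaths.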
The statistical model produces estimates of the coefficients (the betas in the formulation above). Since calendar year was included in the covariates as a linear effect, we derived extrapolated fitted values for each epi-week of 2020 and 2021, by age, sex, and province (Footnote 7). To derive prediction intervals for the forecasted weekly deaths, we followed the approach recommended by WHO (Vital Strategies & World Health Organisation, 2020), estimating the standard deviation based on the observed values from the previous years. Since there is considerable variability across the weeks, we further adapted the recommended approach and created a uniform prediction interval for each week of the year by taking the median, over epi-weeks 1–52, of the standard deviations of the 6 observed values (one from each year, 2014–2019). Data for week 53 were excluded, as 5 of the 6 years did not have a 53rd week.
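One reading of this adaptation is sketched below: compute the standard deviation of each epi-week's values across the baseline years, take the median of those weekly standard deviations, and apply the same width around every predicted value. The function names, the use of raw counts rather than residuals, and the multiplier `z = 1.96` (for an approximate 95% interval) are assumptions and are not taken from the WHO guidance.

```python
from statistics import median, stdev


def uniform_sd(deaths_by_week):
    """Median across epi-weeks of the SD of each week's yearly values.

    deaths_by_week maps epi-week -> list of counts, one per baseline year.
    """
    return median(stdev(values) for values in deaths_by_week.values())


def prediction_interval(predicted, sd, z=1.96):
    """Symmetric interval around a predicted count.

    z = 1.96 is an ASSUMED multiplier for an approximate 95% interval.
    """
    return predicted - z * sd, predicted + z * sd
```

Using the median rather than week-specific standard deviations prevents a few unusually variable weeks from producing erratic interval widths across the year.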
The national predictions and those for the provinces in 2020 using this approach were reasonably consistent with 2020 deaths in periods not severely affected by either epidemic or lockdown conditions, and the approach provides a consistent series into the epi-weeks of 2021.