
Journal of Population Sciences

  • Original Article
  • Open access

Evaluation of alternative methods for forecasting the Aboriginal and Torres Strait Islander population of Australia

Abstract

Assessing future demand for a wide range of services requires good quality population forecasts. Unfortunately, many past forecasts of the Aboriginal and Torres Strait Islander (Indigenous) population of Australia have proved highly inaccurate. This is due to poor data quality, missing data, and demographers’ incomplete understanding of Indigenous population change. In addition, because Indigenous population estimates are published only every 5 years and long after the reference date, forecasts are often used as preliminary population estimates. These form the denominators of various metrics used to monitor programmes aimed at improving health and social outcomes. The aim of this paper is to present an evaluation of alternative forecasting models and forecasts of the Indigenous population of Australia’s States and Territories by age and sex. Four models, differing substantially in complexity, were evaluated: (1) the simple Hamilton–Perry method, (2) the synthetic migration cohort-component model, (3) a uniregional cohort-component model with net migration, and (4) a bi-regional cohort-component model. The population forecasting methods were evaluated against several criteria, including forecast accuracy over the 2016–21 period, input data requirements, conceptual adequacy, output detail, time required to prepare, ability to create scenarios and select alternative assumptions, and ease of implementation. The Hamilton–Perry and synthetic migration cohort-component models provided greater forecast accuracy and scored well against the evaluation criteria. In the challenging data environment for modelling Indigenous populations, simpler forecasting methods offer several practical advantages and are likely to produce more accurate forecasts than more data-intensive models.

Introduction

The First Nations population of Australia consists of Aboriginal peoples, a diverse population comprising over 250 cultural and language groups, and the people from the Torres Strait Islands located off the north-east coast of Australia. Population estimates and projections of the Aboriginal and Torres Strait Islander population have a troubled and shameful history, with many data sources excluding or miscounting Aboriginal and/or Torres Strait Islander people (Griffiths et al., 2019). Starting with the colonial concept of Terra Nullius (‘land belonging to no-one’), in which Aboriginal people were not recognised as people in their own country, there is a long history of not recognising First Nations people as citizens of Australia. It was only from the 1971 Census of Population and Housing onwards that Aboriginal and Torres Strait Islander peoples were fully included in the population count. Since then, there has been considerable debate about the demographic past and future of the Aboriginal and Torres Strait Islander population (Biddle, 2013; Gray, 1985, 1997, 1998; Gray & Gaminiratne, 1993; Hunter & Carmody, 2015; Raymer et al., 2018; Smith, 1980; Taylor, 1997, 2011; Taylor et al., 2021; Wilson, 2009, 2016a).

Forecasts of the Aboriginal and Torres Strait Islander population are prepared regularly, often by government agencies, but they do not possess a good track record. Unfortunately, they often prove to be highly inaccurate just a few years into the future (Wilson & Taylor, 2016). The sources of inaccuracy include data quality issues in population estimates, in part due to a substantial undercount of the Aboriginal and Torres Strait Islander population in the census, as well as coverage and quality limitations in measuring births, deaths, and migration. Changes in the way people report being of Aboriginal and Torres Strait Islander origin between censuses are poorly measured and not fully understood (Griffiths et al., 2019; Shalley et al., 2023; Williamson et al., 2021). These influences can result in sub-optimal projection modelling decisions, variable quality input data, and imperfect projection assumptions, resulting in population forecasts which often differ considerably from population estimates based on the next census published a few years later.

Population forecasts of the Aboriginal and Torres Strait Islander (hereafter, respectfully referred to as Indigenous) population have many uses. They provide advance notice to governments and Indigenous community organisations of the likely future size and age structure of the population to aid planning for future service and infrastructure needs. Population numbers are also used as denominators for a range of social, economic, health, and demographic indicators. For example, they are used to calculate age-specific death rates—the input data for life expectancy at birth calculations. The Australian Government has set a target of closing the gap between Indigenous and non-Indigenous life expectancy by 2031 (Australian Government, 2022). Forecasts can provide some guidance on whether targets are on track to be met or whether additional investment and resources are required.

Population forecasts are also frequently used as interim population estimates given that Indigenous Estimated Resident Populations (ERPs) in Australia are only available every 5 years following a census and are published 1–2 years after the reference date (e.g. ABS, 2022a). For example, ABS projections are used as denominators in the calculation of Indigenous incarceration rates for recent years in the Productivity Commission’s online Closing the Gap Information Repository (Productivity Commission, 2023). Of course, if projections used as population estimates are not accurate, they are liable to imply rates and trends which are not reliable and could result in poor policy and planning decisions being made.

There is a clear need to improve the accuracy of Indigenous population forecasts, but currently there is almost nothing in the literature which provides guidance on the best approaches for Indigenous population projections. To help inform decisions about appropriate projection model options, we evaluated four modelling approaches, including the simplest possible cohort model and a bi-regional cohort-component model incorporating all demographic flows in and out of the Indigenous population. The two other models evaluated lie between these two in terms of complexity and consist of a newly introduced cohort-component model employing some synthetic demographic data (Wilson, 2022) and the cohort-component model commonly used for official Indigenous population projections in Australia (e.g. ABS, 2019; Khalidi, 2013). The assessment considered conceptual adequacy, input data requirements, output detail, time required to prepare, ability to create scenarios and select alternative assumptions, ease of implementation, and short-term forecast accuracy. We restricted the accuracy assessment to 2016-based forecasts for 2021 only because of the limited time series of reliable data required for the bi-regional model, and because most attention and use are made of short-term projections. As we discovered, a 5-year forecast horizon proved sufficient to clearly distinguish the forecasting performance of the models under evaluation.

The four projection models are described in the “Projection models” section. Input data, projection assumptions, and evaluation metrics are summarised in the “Data and methods” section. The results of the short-term forecast accuracy assessment are presented in the “Results” section, while the “Discussion” section considers the findings and offers recommendations for forecasting practitioners. Some concluding remarks are made in the “Conclusions” section.

Projection models

The population forecasts evaluated in this study were produced by the following projection models:

  1. a bi-regional cohort-component model,
  2. a uniregional cohort-component model with net migration,
  3. the latest version of the simple Hamilton–Perry cohort model, and
  4. an adaptation of the synthetic migration cohort-component model.

All models were used to produce population projections for Australia’s 8 States and Territories (6 States and 2 Territories) from 2016 to 2021. The models are described in turn below, with key features noted in Table 1. Some of the reported features are unavoidably subjective, based on the authors’ familiarity with the models.

Table 1 Key features of the projection models evaluated in this study

Bi-regional cohort-component model

The bi-regional cohort-component model incorporates all demographic outflows from, and inflows to, the Indigenous population. This model was created in 2019 to produce 2016-based projections of Australia’s population by Indigenous status (i.e. for the two population groups Indigenous and non-Indigenous), and the 15 Greater Capital City Statistical Areas of Australia (Taylor et al., 2021). For this study, we used the model to prepare projections for State and Territory populations only. The model uses bi-regional flows, rather than a full origin–destination matrix, for internal migration due to the small numbers in the full matrix. This means out-migration from each region occurs to a broad ‘rest of the country’ region, while in-migration similarly is handled as movement from the rest of the country to the specified region. Projected internal migration flows are constrained to be consistent with a separate net internal migration total assumption to prevent net migration becoming implausible in the long-run (Dion, 2017; Wilson & Bell, 2004). The model requires a substantial amount of input data. Not all the inputs are available, so a considerable amount of indirect estimation—and therefore researcher’s time—is required to create all necessary data inputs. Some of the demographic flows are small and subject to substantial random variation, necessitating smoothing and strengthening of rate age profiles.

In this model, the population accounting equation for all cohorts except newly born babies is

$$\begin{aligned} P_{s,a + 5}^{G,i} \left( {t + 5} \right) & = P_{s,a}^{G,i} \left( t \right) - D_{s,a \to a + 5}^{G,i} - E_{s,a \to a + 5}^{G,i} - {\text{OC}}_{s,a \to a + 5}^{G,i} - {\text{OM}}_{s,a \to a + 5}^{G,i} \\ & \quad + {\text{IM}}_{s,a \to a + 5}^{G,i} + {\text{IC}}_{s,a \to a + 5}^{G,i} + I_{s,a \to a + 5}^{G,i} , \\ \end{aligned}$$
(1)

where \(P\) is population; \(G\) is population group (Indigenous or non-Indigenous); \(D\) is deaths; \(E\) is emigration; \(\text{OC}\) is outward change from the population (due to changes in the reporting of Indigenous origin between censuses); \(\text{OM}\) is internal out-migration; \(\text{IM}\) is internal in-migration; \(\text{IC}\) is inward change to the population (due to changes in the reporting of Indigenous origin between censuses); \(I\) is immigration; \(t\) is point in time; \(t+5\) is 5 years after \(t\); \(i\) is State/Territory; \(a\) is age group; \(s\) is sex; and \(a\to a+5\) is the period-cohort aged \(a\) at time \(t\) and aged \(a+5\) at time \(t+5\).

The label \(t,t+5\), denoting the projection interval between times \(t\) and \(t+5\), is omitted from all demographic component variables in equations to reduce clutter. The component flows in Eq. 1, with the exception of immigration, are projected by multiplying rates by populations-at-risk. For example, deaths are projected as

$$D_{s,a \to a + 5}^{G,i} = d_{a \to a + 5}^{G,i} \;\frac{5}{2}\left( {P_{s,a}^{G,i} \left( t \right) + P_{s,a + 5}^{G,i} \left( {t + 5} \right)} \right),$$
(2)

where \(d\) is the death rate.

An iterative calculation scheme is used with the end-of-interval population updated in successive iterations until no further change occurs (see Footnote 1).
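As a rough illustration of such an iterative scheme, the sketch below updates the end-of-interval population of a single cohort until successive values agree. It is a simplified assumption-laden example rather than the authors' program: only deaths follow Eq. 2, while `net_other_flows` is a hypothetical stand-in for all the other components of Eq. 1 and is held fixed (in the full model, flows such as out-migration would also be recalculated from rates in each iteration). The function name, tolerance, and numbers are illustrative.

```python
def project_cohort_iteratively(pop_start, death_rate, net_other_flows=0.0,
                               tol=1e-9, max_iter=100):
    """Fixed-point iteration for one cohort's end-of-interval population.

    Deaths follow Eq. 2: rate x (5/2) x (start + end population).
    net_other_flows is a hypothetical fixed net of the remaining
    components of Eq. 1 (inflows minus outflows other than deaths).
    """
    pop_end = pop_start  # initial guess: no change over the interval
    for _ in range(max_iter):
        deaths = death_rate * 2.5 * (pop_start + pop_end)
        new_pop_end = pop_start - deaths + net_other_flows
        if abs(new_pop_end - pop_end) < tol:
            return new_pop_end, deaths
        pop_end = new_pop_end
    raise RuntimeError("iteration did not converge")


# Made-up numbers: 1,000 people, annual death rate 0.002, net inflow of 50.
pop_end, deaths = project_cohort_iteratively(1000.0, 0.002, 50.0)
print(round(pop_end, 2), round(deaths, 2))
```

With small rates the update converges in a handful of iterations, because each pass changes the end-of-interval population by only a fraction (here 5/2 times the death rate) of the previous change.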

Births by Indigenous status of women are projected by multiplying age-specific fertility rates by populations-at-risk:

$$B_a^{G,i} \left( {t,t + 5} \right) = {\text{ASFR}}_{f,a}^{G,i} \;\frac{5}{2}\left( {P_{f,a}^{G,i} \left( t \right) + P_{f,a}^{G,i} \left( {t + 5} \right)} \right),$$
(3)

where \(B\) is births; \(\text{ASFR}\) is the age-specific fertility rate; and \(f\) is female.

Births to Indigenous and non-Indigenous women are then summed over age of mother, and an additional calculation is made to acknowledge that the reported Indigenous status of babies and their mothers may differ. Babies are then projected to the end of the projection interval using the equivalent of Eq. 1 but with the start-of-interval population replaced by births.

This type of model is conceptually strong: it includes inflows and outflows of all components of change affecting the Indigenous population, including interactions with the non-Indigenous population through changes in the reporting of Indigenous origin between censuses, and in fertility. Its major weaknesses include its complexity, the considerable amounts of data, data estimation, and data smoothing required, and the amount of time needed to prepare a set of projections.

Uniregional cohort-component model

The uniregional cohort-component model accounts only for births, deaths, and net migration and produces projections of the Indigenous population only. Projections of the non-Indigenous population are not created. This is the type of model which was used by the ABS to produce its 2016-based Indigenous population projections (ABS, 2019) and by Khalidi (2013) to prepare Indigenous projections for regions of NSW. Neither set of projections considered changes in the reporting of Indigenous origin over time, and we do the same here to obtain similar projections.

Projections for all cohorts except newly born babies are prepared by taking the start-of-interval population and subtracting deaths (calculated by multiplying the death rate by the population-at-risk) and then adding net migration:

$$P_{s,a + 5}^{A,i} \left( {t + 5} \right) = P_{s,a}^{A,i} \left( t \right) - d_{s,a \to a + 5}^{A,i} \;\frac{5}{2}\left( {P_{s,a}^{A,i} \left( t \right) + P_{s,a + 5}^{A,i} \left( {t + 5} \right)} \right) + N_{s,a \to a + 5}^{A,i} ,$$
(4)

where \(A\) is the Indigenous population; \(d\) is the death rate; and \(N\) is the net migration number.

Re-arranging Eq. 4 to remove the end-of-interval population from the right-hand side gives

$$P_{s,a + 5}^{A,i} \left( {t + 5} \right) = \frac{{\left( {1 - \frac{5}{2}d_{s,a \to a + 5}^{A,i} } \right)}}{{\left( {1 + \;\frac{5}{2}d_{s,a \to a + 5}^{A,i} } \right)}}P_{s,a}^{A,i} \left( t \right) + \frac{{N_{s,a \to a + 5}^{A,i} }}{{\left( {1 + \frac{5}{2}d_{s,a \to a + 5}^{A,i} } \right)}}.$$
(5)
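For readers who wish to check the re-arrangement, the intermediate step between Eqs. 4 and 5 can be written out as follows. No new symbols are introduced; the terms in the end-of-interval population are simply collected on the left-hand side:

$$\begin{aligned} P_{s,a + 5}^{A,i} \left( {t + 5} \right) + \frac{5}{2}d_{s,a \to a + 5}^{A,i} P_{s,a + 5}^{A,i} \left( {t + 5} \right) & = P_{s,a}^{A,i} \left( t \right) - \frac{5}{2}d_{s,a \to a + 5}^{A,i} P_{s,a}^{A,i} \left( t \right) + N_{s,a \to a + 5}^{A,i} \\ \left( {1 + \frac{5}{2}d_{s,a \to a + 5}^{A,i} } \right)P_{s,a + 5}^{A,i} \left( {t + 5} \right) & = \left( {1 - \frac{5}{2}d_{s,a \to a + 5}^{A,i} } \right)P_{s,a}^{A,i} \left( t \right) + N_{s,a \to a + 5}^{A,i} . \\ \end{aligned}$$

Dividing both sides by \(1 + \frac{5}{2}d_{s,a \to a + 5}^{A,i}\) gives Eq. 5.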

In this model, all babies born to at least one Indigenous parent are assumed to be Indigenous themselves. Babies born to Indigenous mothers are projected by multiplying age-specific fertility rates by female Indigenous populations-at-risk:

$$B_{f,a}^{A,i} = {\text{ASFR}}_a^{A,i} \;\frac{5}{2}\left( {P_{f,a}^{A,i} \left( t \right) + P_{f,a}^{A,i} \left( {t + 5} \right)} \right),$$
(6)

where \(\text{ASFR}\) is the age-specific fertility rate and \(f\) is female.

Babies born to non-Indigenous women with Indigenous male partners are projected by multiplying age-specific paternity rates by male Indigenous populations-at-risk:

$$B_{m,a}^{A,i} = {\text{ASPR}}_a^{A,i} \;\frac{5}{2}\left( {P_{m,a}^{A,i} \left( t \right) + P_{m,a}^{A,i} \left( {t + 5} \right)} \right),$$
(7)

where \(\text{ASPR}\) is the age-specific paternity rate and \(m\) is male.

Projected babies are summed over age group of mother and father and then divided into males and females assuming a sex ratio at birth of 106 males per 100 females. Then the newly born cohort is projected to the end of the projection interval at time \(t+5\) by accounting for deaths and net migration:

$$P_{s,0 - 4}^{A,i} \left( {t + 5} \right) = B_{s,a}^{A,i} - d_{s,{\text{birth}} \to 0 - 4}^{A,i}\; \frac{5}{2}P_{s,0 - 4}^{A,i} \left( {t + 5} \right) + N_{s,{\text{birth}} \to 0 - 4}^{A,i} ,$$
(8)

which re-arranges to

$$P_{s,0 - 4}^{A,i} \left( {t + 5} \right) = \frac{{B_{s,a}^{A,i} }}{{\left( {1 + \frac{5}{2}d_{s,{\text{birth}} \to 0 - 4}^{A,i} } \right)}} + \frac{{N_{s,{\text{birth}} \to 0 - 4}^{A,i} }}{{\left( {1 + \frac{5}{2}d_{s,{\text{birth}} \to 0 - 4}^{A,i} } \right)}},$$
(9)

where \(\text{birth}\to 0-4\) is the newly born cohort which becomes the population aged 0–4 at time \(t+5\).
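The birth calculations of Eqs. 6 to 9 can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the dictionary layout by sex, and all numbers are assumptions made for the example, and the 106:100 sex ratio at birth is taken from the text above.

```python
def births_from_rates(rates, pop_start, pop_end):
    """Eqs. 6 and 7: rate x (5/2) x (start + end population), by age group."""
    return [r * 2.5 * (p0 + p1) for r, p0, p1 in zip(rates, pop_start, pop_end)]


def project_newborns(births_to_mothers, births_to_fathers,
                     death_rate, net_migration, males_per_100_females=106.0):
    """Sum births over parental age, split by sex, then apply Eq. 9."""
    total_births = sum(births_to_mothers) + sum(births_to_fathers)
    male_share = males_per_100_females / (males_per_100_females + 100.0)
    births_by_sex = {"male": total_births * male_share,
                     "female": total_births * (1.0 - male_share)}
    projected = {}
    for sex, births in births_by_sex.items():
        denominator = 1.0 + 2.5 * death_rate[sex]
        # Eq. 9: births and net migration share the same denominator.
        projected[sex] = (births + net_migration[sex]) / denominator
    return projected


# Made-up inputs for two maternal age groups and one paternal age group.
mothers = births_from_rates([0.10, 0.05], [100.0, 200.0], [100.0, 200.0])
fathers = [3.0]
aged_0_4 = project_newborns(mothers, fathers,
                            death_rate={"male": 0.004, "female": 0.004},
                            net_migration={"male": 0.0, "female": 0.0})
print(aged_0_4)
```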

The strengths of this simpler type of cohort-component model include its relatively low data requirements and ease of calculation. However, it does not consider changes in the reporting of Indigenous origin, which have been substantial over the last few intercensal periods. The exclusion of the non-Indigenous population, which is an important population-at-risk where there are changes of reported Indigenous origin and for modelling births (Wilson, 2016a), is a shortcoming. In addition, the use of net migration numbers, rather than inward and outward migration based on rates, risks projecting ‘negative populations’ if net migration is highly negative and the origin population small.

Hamilton–Perry cohort model

The Hamilton–Perry model is a pared-down, data-light, version of the standard cohort-component model (Hamilton & Perry, 1962). Instead of projecting populations via births, deaths and migration, it uses simpler cohort change measures. It was used to prepare projections of the Indigenous population only. The version of the model we implemented makes use of Cohort Change Ratios (CCRs) and Cohort Change Differences (CCDs). This version was found to give slightly more accurate forecasts than the standard Hamilton–Perry model which uses only CCRs (Wilson & Grossman, 2021). A CCR is the ratio of a cohort’s population at a specific time to its size 5 years earlier; a CCD is the cohort’s population at a specific time minus its population 5 years earlier. The base period CCR and CCD measures are typically estimated from populations at the jump-off date and 5 years earlier. If cohort change is negative over the base period then CCRs are used:

$$P_{s,a + 5}^{A,i} \left( {t + 5} \right) = P_{s,a}^{A,i} \left( t \right)\;\;\;{\text{CCR}}_{s,a \to a + 5}^{A,i} ,$$
(10)

while if it is positive then CCDs are used:

$$P_{s,a + 5}^{A,i} \left( {t + 5} \right) = P_{s,a}^{A,i} \left( t \right) + {\text{CCD}}_{s,a \to a + 5}^{A,i} .$$
(11)

This arrangement ensures that growing cohorts are not projected to grow exponentially.

Instead of projecting births, the end-of-interval population aged 0–4 is calculated using the Child/Woman Ratio (CWR), defined as the ratio of the number of 0–4-year-olds to the number of females aged 15–49. Thus,

$$P_{0 - 4}^{A,i} \left( {t + 5} \right) = {\text{CWR}}^{A,i} \left( {t + 5} \right) P_{f,15 - 49}^{A,i} \left( {t + 5} \right).$$
(12)

The projected number of 0–4-year-olds is then divided into males and females using recent sex ratios for this age group.
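The logic of Eqs. 10 to 12 is compact enough to sketch directly. The example below is a minimal illustration under stated assumptions, not the authors' implementation: populations are plain lists ordered by 5-year age group for one sex, the open-ended oldest age group is ignored for brevity, and the function names and numbers are hypothetical.

```python
def hamilton_perry_step(pop_prev, pop_jump):
    """Project cohorts one 5-year step using CCRs or CCDs (Eqs. 10 and 11).

    pop_prev: populations by age group 5 years before the jump-off date.
    pop_jump: populations by age group at the jump-off date.
    Returns projected populations for the second age group upwards;
    the youngest age group is handled separately via the CWR.
    """
    projected = []
    for a in range(len(pop_jump) - 1):
        cohort_before = pop_prev[a]      # cohort size at the start of the base period
        cohort_after = pop_jump[a + 1]   # same cohort, 5 years older
        if cohort_after < cohort_before:
            # Shrinking cohort: apply the Cohort Change Ratio (Eq. 10).
            projected.append(pop_jump[a] * cohort_after / cohort_before)
        else:
            # Growing cohort: apply the Cohort Change Difference (Eq. 11).
            projected.append(pop_jump[a] + (cohort_after - cohort_before))
    return projected


def project_children(cwr, females_15_49_end, male_share):
    """Eq. 12, followed by a split into males and females."""
    total = cwr * females_15_49_end
    return {"male": total * male_share, "female": total * (1.0 - male_share)}


# Made-up numbers for three age groups.
print(hamilton_perry_step([100.0, 90.0, 80.0], [110.0, 95.0, 99.0]))
print(project_children(0.4, 1000.0, 0.51))
```

In the first call the youngest cohort shrank from 100 to 95 over the base period, so a ratio is applied, while the next cohort grew from 90 to 99, so the difference of 9 is added instead.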

The strengths of the Hamilton–Perry model include its simplicity, ease of calculation, and very low input data requirements. It is well suited to situations where population estimates are of higher quality than the components of change (births, deaths, migration, and changes in the reporting of Indigenous origin between censuses), which is the case for Indigenous demographic data in Australia. However, it does not produce projected demographic components of change, and their exclusion from the model means that assumptions about fertility, mortality, migration, and changes in the reporting of Indigenous origin between censuses cannot be made (at least not directly).

Adapted synthetic migration cohort-component model

The synthetic migration cohort-component model was designed to enable small area population projections to be created with the advantages of a directional migration model (projecting inward and outward migration flows) but in the absence of any actual migration (or fertility or mortality) data being available (Wilson, 2022). Migration is handled in a simplified bi-regional arrangement. Only two migration flows are modelled for each area: outward migration flows from each area to everywhere else (the rest of the country plus the rest of the world) and inward migration flows in the opposite direction. It projects outward migration by multiplying outward migration rates by the population-at-risk. Inward migration is projected directly as flows to avoid the complexity of having to model the rest of the world as the origin population.

Fertility, mortality, and migration input data for the projections are estimated over a 5-year base period which ends at the jump-off year. However, only population estimates by sex and 5-year age group at the start and end of the base period are required. Inward and outward migration combined with changes in the reporting of Indigenous origin are created by an estimation procedure within the model. The key data inputs are as follows: (1) cohort net migration, calculated as remaining cohort population change over a 5-year base period once mortality has been taken into account, and (2) a model migration rate age schedule. Preliminary inward and outward flows by age and sex are prepared by multiplying the model migration rates by the base period population-at-risk, scaling the flows to plausible migration flow totals, and then adjusting the inward and outward flows by age and sex to be consistent with cohort net migration. This provides a set of synthetic inward and outward migration flows for the base period which replicate the base period net migration age–sex pattern. In the projections, outward migration is applied as rates, while inward migration is used directly as flows. Wilson (2022) provides more detail of the data preparation steps undertaken by the projection program.

In the adapted form for projecting populations by Indigenous status, each subnational population consists of Indigenous and non-Indigenous populations by State and Territory (so for 8 States/Territories there are 16 populations in total). The migration flows of the original synthetic migration model become migration plus changes in the reporting of Indigenous origin between censuses combined. The projection equation for all cohorts except newly born babies is

$$P_{s,a + 5}^{G,i} \left( {t + 5} \right) = P_{s,a}^{G,i} \left( t \right) - D_{s,a \to a + 5}^{G,i} - {\text{OWM}}_{s,a \to a + 5}^{G,i} + {\text{IWM}}_{s,a \to a + 5}^{G,i} ,$$
(13)

where \(\text{OWM}\) is the outward movement from the population due to migration and changes in the reporting of Indigenous origin between censuses; \(\text{IWM}\) is the inward movement to the population due to migration and changes in the reporting of Indigenous origin between censuses.

Deaths and outward movement are projected by multiplying rates by the area’s population-at-risk, while inward movement is input directly as flows.
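A single-cohort version of Eq. 13 might be sketched as follows. The form of the population-at-risk is an assumption here: the text does not spell it out for this model, so the sketch borrows the \(\frac{5}{2}\left(P(t) + P(t+5)\right)\) form of Eq. 2, which allows the end-of-interval population to be solved for directly. The function name and numbers are hypothetical, and the constraining steps applied in the actual model are omitted.

```python
def project_cohort_synthetic(pop_start, death_rate, outward_rate, inward_flow):
    """One cohort, one 5-year step of Eq. 13.

    Deaths and outward movement are rates applied to an assumed
    population-at-risk of (5/2) x (start + end population);
    inward movement is supplied directly as a flow.
    """
    combined = 2.5 * (death_rate + outward_rate)
    # Solve the accounting equation for the end-of-interval population.
    pop_end = (pop_start * (1.0 - combined) + inward_flow) / (1.0 + combined)
    at_risk = 2.5 * (pop_start + pop_end)
    return {"pop_end": pop_end,
            "deaths": death_rate * at_risk,
            "outward": outward_rate * at_risk,
            "inward": inward_flow}


# Made-up inputs: 500 people, death rate 0.002, outward rate 0.03, inflow of 80.
result = project_cohort_synthetic(500.0, 0.002, 0.03, 80.0)
print(result)
```

Applying outward movement as a rate means the outflow shrinks as the population shrinks, which is why this arrangement avoids the ‘negative populations’ that a fixed net migration number can produce.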

Births are projected using age-specific fertility ratios (not rates) applied to female populations-at-risk. They are labelled ratios because they are estimated as births by Indigenous status divided by the female age-specific populations of the same Indigenous status. Some babies are born to women with a different Indigenous status. The ratios represent a simplified practical way of modelling fertility, even though there is some numerator–denominator inconsistency. Babies are then projected to the end of the projection interval using Eq. 13 in which the start-of-interval population is replaced by births.

Projections for the non-Indigenous population are calculated in an identical way.

The projections are then subject to two sets of constraints:

  1. projections of total population for each area, typically created by an extrapolative model;
  2. national cohort-component projections of population, deaths, and net migration by age and sex.

The first set of constraining projections is included because constraining to independent total population numbers has been shown to improve age–sex population forecast accuracy (e.g. Baker et al., 2020; Tayman et al., 2021; Wilson, 2016b). The second set of constraints ensures consistency with a national projection, which is often regarded as a desirable feature (see Footnote 2). Importantly, it also ensures that area-specific migration flows are adjusted to match national net migration by age and sex. In projection models which use a bi-regional approximation, there would otherwise be an inconsistency between inward and outward migration summed over all areas and national net international migration.

The strengths of this model include its ability to calculate cohort-component projections based on very little data, project migration as flows (rather than net migration), output projections of both the Indigenous and non-Indigenous populations, and produce projections consistent with a national (or regional) projection. It also allows assumptions about fertility, mortality, and total populations to be set. However, assumptions about migration and changes in the reporting of Indigenous origin between censuses can only be formulated indirectly through the total population constraints. It also relies on population estimates for the jump-off year and 5 years earlier being accurate, because age profiles for migration and changes in the reporting of Indigenous origin between censuses are shaped to match overall cohort change over the base period.

Data and methods

Input data and projection assumptions

All sets of population forecasts were launched from Estimated Resident Populations (ERPs) for 30th June 2016 (ABS, 2018) and had a forecast horizon of 5 years out to 2021. ERPs by Indigenous status are based on census counts of the population but adjusted for undercount, which is substantial for the Indigenous population at 17% (ABS, 2022b). Census counts are derived from the question ‘Is the person of Aboriginal or Torres Strait Islander origin?’ with possible tick box responses of ‘No’, ‘Yes, Aboriginal’, and ‘Yes, Torres Strait Islander’. For all projection models, our general approach to assumption setting was to maintain recent trends. Wherever possible, we used the same input data and assumptions across models.

Bi-regional model

Projected Total Fertility Rates (TFRs) and age-specific fertility rates (ASFRs) by Indigenous status of mother were assumed to remain unchanged from the 2011–16 base period. ASFRs were calculated using customised births data purchased from the ABS. Proportions of babies by Indigenous status cross-classified by mother’s Indigenous status were estimated from a customised table of children aged 0–4 and mothers by Indigenous status in households from the 2016 Census.

National mortality was assumed to continue long-run improvements, with life expectancy at birth in 2016–21 assumed to reach 85.3 years for females and 81.4 years for males. The national projections were produced by an extrapolative model of mortality (Ediev, 2008) with a 2016 jump-off year. For each population by Indigenous status and State/Territory, life expectancy was set as national life expectancy plus the difference measured in the latest set of life tables. These differences were calculated from ABS 2015–17 life tables (ABS, 2018) which, for the Indigenous population, incorporate adjustments for the undercounting of Indigenous deaths. Indigenous life expectancy at birth for 2015–17 was estimated by the ABS to be 75.6 years for females and 71.6 years for males, about 9 years below the equivalent national figures for the entire Australian population. Age-specific death rates were calculated from a mortality surface of past and projected life table nLx values by selecting the set of nLx values matching each life expectancy assumption (Wilson, 2018). Separate Indigenous and non-Indigenous mortality surfaces were prepared.

Forecast interstate in- and out-migration rates were based on census migration age profiles scaled up to be consistent with internal migration estimates. These migration estimates were based on ABS migration estimates which had been adjusted to make them consistent with 2011–16 intercensal population change. In the projection model, forecast age-specific in- and out-migration is constrained to be consistent with specified total net internal migration assumptions. Net internal migration was assumed to remain unchanged from the 2011–16 period. For overseas migration, immigration and emigration age profiles were based on census immigration and ABS overseas migration data and adjusted to be consistent with 2011–16 intercensal population change. In the projection model, the initial immigration and emigration forecasts are proportionally adjusted to a separate net overseas migration assumption. For the Indigenous population, immigration, emigration, and therefore net overseas migration were all assumed to be zero. Net overseas migration for the non-Indigenous population was assumed to remain unchanged from the 2011–16 period at 213,000 per year.

Changes in the reporting of Indigenous origin between 2011 and 2016 were based on data from the Australian Census Longitudinal Database (ABS, 2019). This is a 5% sample of census records linked probabilistically. The raw data were adjusted to be consistent with 2011–16 intercensal population change, with age-specific identification change rates heavily smoothed due to small numbers. In the forecasts, we assumed the adjusted identification change of the 2011–16 period would continue into the future.

Uniregional cohort-component model

The forecast age-specific fertility rates for Indigenous women from the bi-regional model were used in the uniregional cohort-component model. For age-specific paternity rates, we used the rates from the ABS Indigenous population projections (ABS, 2019). Age-specific death rates corresponding to the life expectancy at birth assumptions used in the bi-regional model were used. Net interstate migration for the Indigenous population for 2011–16 from the 2016 Census was assumed to remain constant into the future. Strictly, this is an incorrect use of transition migration data in a movement-accounts projection model, but it closely follows the approach of the ABS in their Indigenous projections. Overseas migration was set to zero because it is very close to zero for the Indigenous population.

Hamilton–Perry cohort model

The Hamilton–Perry model requires few forecast assumptions. We calculated Cohort Change Ratios and Cohort Change Differences for 2006–11 and 2011–16 and used values averaged over the two periods in the projections. The Child/Woman Ratio for the jump-off year was assumed to remain constant.

Adapted synthetic migration cohort-component model

Total Fertility Ratios for the synthetic migration model were estimated from the jump-off population age structure using the xTFR measure of Hauer and Schmertmann (2020). Life expectancy at birth assumptions from the bi-regional model were used. Assumptions for internal migration and changes in the reporting of Indigenous origin are automated within the model. The separate total population forecasts required by the model were created by linear extrapolation over a 10-year base period and then scaled to match the total forecast population from a separate national cohort-component forecast. This national forecast was obtained from the bi-regional model.
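The two steps used to create the total population constraints, linear extrapolation followed by scaling to a national total, can be illustrated as below. This is a sketch under assumptions: the text does not say how the linear trend was fitted, so the example simply uses the change between the start and end of the 10-year base period, and the function names, area labels, and figures are all hypothetical.

```python
def extrapolate_linear(pop_base_start, pop_jump_off, base_years=10, horizon_years=5):
    """Continue the average annual change of the base period (assumed method)."""
    annual_change = (pop_jump_off - pop_base_start) / base_years
    return pop_jump_off + annual_change * horizon_years


def scale_to_national(area_forecasts, national_total):
    """Proportionally adjust area totals so they sum to the national forecast."""
    factor = national_total / sum(area_forecasts.values())
    return {area: value * factor for area, value in area_forecasts.items()}


# Made-up totals for two hypothetical areas.
preliminary = {"Area A": extrapolate_linear(100.0, 120.0),
               "Area B": extrapolate_linear(80.0, 75.0)}
constrained = scale_to_national(preliminary, national_total=210.0)
print(preliminary)
print(constrained)
```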

Accuracy assessment measures

To measure the accuracy of our test population ‘forecasts’ for 2021, we calculated Percentage Error (PE), defined as

$${\text{PE}} = \frac{{F - {\text{ERP}}}}{{{\text{ERP}}}} 100\% ,$$

where \(F\) is the forecast and \(\text{ERP}\) is the actual Estimated Resident Population. These populations comprise the final 2021 ERPs published by the ABS (2023) in ‘Estimates of Aboriginal and Torres Strait Islander Australians 2021’. We also make use of Absolute Percentage Error (APE), the unsigned value of PE, and Mean Absolute Percentage Error (MAPE).
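These error measures are straightforward to compute. The sketch below assumes that MAPE is the simple unweighted mean of APEs across jurisdictions; the function names and the numbers are illustrative and are not results from this study.

```python
def percentage_error(forecast, erp):
    """PE: signed error as a percentage of the actual ERP."""
    return (forecast - erp) / erp * 100.0


def absolute_percentage_error(forecast, erp):
    """APE: the unsigned value of PE."""
    return abs(percentage_error(forecast, erp))


def mean_absolute_percentage_error(forecasts, erps):
    """MAPE: unweighted mean of APEs over a set of populations."""
    apes = [absolute_percentage_error(f, e) for f, e in zip(forecasts, erps)]
    return sum(apes) / len(apes)


# Made-up example: one under-forecast and one over-forecast.
print(percentage_error(95.0, 100.0))
print(mean_absolute_percentage_error([95.0, 210.0], [100.0, 200.0]))
```

A negative PE indicates an under-forecast and a positive PE an over-forecast; taking absolute values before averaging prevents the two from cancelling out.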

To summarise error across age- and sex-specific populations, we used the Age Structure Error (ASE) (Wilson, 2022). This is calculated by summing absolute errors by 5-year age group and sex and then dividing by the total ERP. The numerator of this metric consists of the area between the forecast population pyramid and the actual population pyramid. It can be expressed as

$${\text{ASE}} = \frac{{\sum_s \sum_a \left| {F_{s,a} - {\text{ERP}}_{s,a} } \right|}}{{{\text{ERP}}}} \times 100\% .$$

This measure can reveal errors in forecasts of the population by age and sex even when the total population forecast error is small.
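A small sketch makes the point concrete. The populations below are hypothetical: the forecast total matches the ERP total exactly, so the Percentage Error of the total is zero, yet the Age Structure Error is clearly positive.

```python
def age_structure_error(forecast, erp):
    """Sum of absolute errors by (sex, age group) as a percentage of total ERP."""
    total_abs_error = sum(abs(forecast[key] - erp[key]) for key in erp)
    return total_abs_error / sum(erp.values()) * 100.0


# Hypothetical populations keyed by (sex, age group).
forecast = {("F", "0-4"): 110.0, ("F", "5-9"): 90.0,
            ("M", "0-4"): 105.0, ("M", "5-9"): 95.0}
erp = {("F", "0-4"): 100.0, ("F", "5-9"): 100.0,
       ("M", "0-4"): 100.0, ("M", "5-9"): 100.0}

total_pe = (sum(forecast.values()) - sum(erp.values())) / sum(erp.values()) * 100.0
ase = age_structure_error(forecast, erp)
```

Here the over-forecasts at ages 0–4 offset the under-forecasts at ages 5–9 in the total, but not in the ASE.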

Results

Total populations

Absolute Percentage Errors in forecasting 2021 State and Territory total Indigenous populations by the four models are illustrated in Fig. 1. Errors of the four sets of forecasts aggregated to a national scale are shown at the bottom of the graph. Even after a short forecast horizon of just 5 years, differences in accuracy were striking. For most jurisdictions, the simple cohort-component model with net migration (and no consideration of change in the reporting of Indigenous origin between censuses) performed the worst. Only for the Northern Territory did it provide a good quality forecast. Disappointingly, the bi-regional model performed the second worst for most jurisdictions, while the synthetic migration model and Hamilton–Perry model generally achieved noticeably lower errors. If errors under 5% are considered acceptable quality to users (Wilson & Shalley, 2019), then out of the 8 States and Territories, the simple cohort-component model produced an acceptable forecast for just 1 jurisdiction, while the bi-regional model did so for only 2 jurisdictions. In contrast, both the Hamilton–Perry and synthetic migration models achieved this for 7 out of 8 jurisdictions. Mean Absolute Percentage Errors across States/Territories were 10.5% for the uniregional cohort-component model, 7.2% for the bi-regional model, 4.7% for the synthetic migration model, and 3.2% for the Hamilton–Perry model. Although Fig. 1 shows Absolute Percentage Errors, nearly all Percentage Errors from all four models were negative, signalling under-forecasts of the Indigenous population. The one exception was the Northern Territory, whose population was slightly over-forecast by all models.

Fig. 1 Errors of 2016-based forecasts of State/Territory total Indigenous populations in 2021 (Source: calculated using authors’ forecasts and ABS ERPs)

Populations by age and sex

Errors in forecasting population age–sex structure are summarised by the Age Structure Error, shown in Fig. 2. Again, the simple cohort-component model produced the largest errors, with the one exception of the Northern Territory, and the bi-regional model gave the second largest errors for most jurisdictions. The synthetic migration and Hamilton–Perry models mostly performed better by a clear margin. The mean Age Structure Errors across States/Territories for the four models were as follows: 11.1% for the uniregional cohort-component model, 8.0% for the bi-regional model, 6.0% for the synthetic migration model, and 5.3% for the Hamilton–Perry model.

Fig. 2 Errors of 2016-based forecasts of State/Territory Indigenous populations by age and sex in 2021 (Source: calculated using authors’ forecasts and ABS ERPs)

The error patterns averaged for each age group over States and Territories are shown in Fig. 3. The uniregional and bi-regional cohort-component models produced forecasts with relatively high errors in most age groups, while the Hamilton–Perry model performed best overall, followed closely by the synthetic migration model. The mean of APEs across jurisdictions and age groups was 12.8% for the uniregional cohort-component model, 9.6% for the bi-regional model, 7.3% for the synthetic migration model, and 6.9% for the Hamilton–Perry model. Errors were generally higher at the oldest ages, with the signed Mean Percentage Errors at these ages generally being negative, indicating under-forecasts. However, previous work on estimating the Northern Territory Indigenous population back to 1966 by backcasting from the latest ERP (Wilson et al., 2019) revealed some overestimation of ERPs in earlier years at high ages—a problem common with population estimates based on census counts (Thatcher et al., 2002). If overestimation remains a problem in 2021, then the errors shown here may therefore overstate forecasting inaccuracy at high ages.

Fig. 3 Mean Absolute Percentage Errors of 2016-based forecasts of State/Territory Indigenous populations by age group in 2021 (Source: calculated using authors’ forecasts and ABS ERPs)

Applying national constraints

We also ran a set of projections with national constraints to determine if this made much difference to the performance of the models. Projections of national Indigenous population were created using the uniregional, bi-regional, and Hamilton–Perry models. An alternative set of projections was not prepared using the synthetic migration model since it automatically produces projections constrained to the national Australian population projection (i.e. the Indigenous and non-Indigenous populations combined). Constrained total Indigenous projections for 2021 resulted in mean APEs of 10.4% for the uniregional model (compared to 10.5% for the unconstrained projections), 6.4% for the bi-regional model (7.2%), and 3.2% for the Hamilton–Perry model (unchanged). In terms of projecting population age–sex structure, the constrained projections produced mean Age Structure Errors of 11.0% from the uniregional model (compared to 11.1% for the unconstrained projections), 7.3% for the bi-regional model (8.0%), and 5.3% for the Hamilton–Perry model (unchanged). Over longer projection horizons, the differences in errors between constrained and unconstrained projections would probably be greater.

Combining elements of the two best models

After analysing the performance of the four models, we decided to experiment with a fifth forecast drawing on the strengths of the Hamilton–Perry and synthetic migration models. Several studies have demonstrated the effectiveness of combining and averaging the results from two or more projection models (e.g. Goodwin, 2009; Grossman et al., 2022; Rayer & Smith, 2010; Wilson, 2017). For this fifth forecast we used the synthetic migration model with the forecasts created by the Hamilton–Perry model as total Indigenous population constraints.

Forecast errors of total Indigenous populations were, by construction, the same as those of the Hamilton–Perry model. Figure 4 summarises forecast errors by age and sex in terms of the Age Structure Error. Overall, the graph appears to show little difference from the synthetic migration and Hamilton–Perry models. However, calculation of the ASEs for forecasts from the combined approach averaged over States and Territories revealed the combined approach to be marginally more accurate than the Hamilton–Perry model. In addition, when examining APEs averaged across age groups and jurisdictions (as in Fig. 3), the MAPE for the combined approach was 6.6% (compared to 6.9% for the Hamilton–Perry model and 7.3% for the synthetic migration model). Although encouraging, the results indicate only a minor reduction in error, and this is from an analysis involving a small number of observations over a short forecast horizon. We can only state that the combined approach proved, overall, marginally more accurate than the Hamilton–Perry forecasts in this particular case.

Fig. 4 Errors of 2016-based forecasts of State/Territory Indigenous populations by age and sex in 2021 using the combined approach (Source: calculated using authors’ forecasts and ABS ERPs)

Discussion

Can we explain the pattern of forecast errors?

For the simple cohort-component model with net migration, there is a modest positive relationship between APE and the net gain to the Indigenous population through changes to reported identity between the 2011 and 2016 censuses as measured by the ACLD. There was very little change in the reporting of Indigenous origin between censuses recorded for the Northern Territory, while considerable amounts of reporting change occurred in New South Wales, Victoria, and the Australian Capital Territory. In other words, the inclusion of changes in the reporting of Indigenous origin between censuses in this model would have given much more accurate population forecasts. The fact there was so little recorded change in the Northern Territory is why the forecast for that jurisdiction from the simple cohort-component model was relatively accurate.

For the other three models (which directly or indirectly include changes in the reporting of Indigenous origin between censuses), there is no relationship between error and identification change in the ACLD. Nor is there any relationship between error and population size, a pattern sometimes found in population forecast error studies (though with just 8 populations, patterns are difficult to discern). It is likely that actual demographic trends of fertility, mortality, migration, and reported changes in Indigenous origin diverged from the projection assumptions to a greater or lesser extent across States and Territories. Almost certainly the variable quality and coverage of demographic data by Indigenous status contributed to these errors by creating uncertainty about recent trends, which in turn impacted the projection assumptions. Unfortunately, forecast error assessments based on comparisons between projected and recorded births and deaths are likely to be inconclusive because of these data quality limitations.

However, comparison of the projected number of Indigenous 0–4-year-olds and the number of 0–4-year-olds in the 2021 ERP can give an approximate indication of the accuracy of births projections. Table 2 presents the errors for this age group. The bi-regional model and simple cohort-component model tended to produce the largest errors of the 0–4-year-old population, while the Hamilton–Perry and synthetic migration models generally proved more accurate. As a proportion of the error in forecasting age-specific populations, measured by the absolute error summed across age groups, the error for 0–4-year-olds ranged from 7.6% with the synthetic migration model to 14.6% with the simple cohort-component model. So, births (or strictly, 0–4-year-old population forecasts) were responsible for a relatively small proportion of overall error.
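The share of error attributable to one age group can be sketched as below. The age groups and population figures are hypothetical and serve only to show the calculation, not to reproduce the paper's results.

```python
def share_of_absolute_error(forecast, erp, age_group):
    """Absolute error of one age group as a percentage of the absolute
    error summed across all age groups."""
    abs_errors = {age: abs(forecast[age] - erp[age]) for age in erp}
    return abs_errors[age_group] / sum(abs_errors.values()) * 100.0


# Hypothetical forecast and ERP by age group.
forecast = {"0-4": 970.0, "5-9": 940.0, "10-14": 880.0, "15+": 5850.0}
erp = {"0-4": 1000.0, "5-9": 1000.0, "10-14": 950.0, "15+": 6000.0}

share_0_4 = share_of_absolute_error(forecast, erp, "0-4")
```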

Table 2 Absolute percentage errors of projections of Indigenous 0–4-year-olds in 2021

Did incorrect migration assumptions contribute much to population forecast error? International migration of the Indigenous population is small (though not negligible), but interstate migration is not. Table 3 shows net interstate migration of the Indigenous population for the intercensal periods 2011–16 and 2016–21 from the 2016 and 2021 censuses. There is a substantial undercount of the Indigenous population in the census, estimated at about 17%, so the numbers here are likely underestimates of population redistribution. Our population forecasts were based on the 2011–16 migration data either directly as a migration assumption (uniregional and bi-regional cohort-component models) or indirectly (Hamilton–Perry and synthetic migration models).

Table 3 Census net interstate migration of the Indigenous population.

The largest differences between the two periods occurred for New South Wales and Queensland. For New South Wales, net interstate migration in the forecasts was not negative enough, but total Indigenous population forecasts for this state were too low, with errors ranging from −8008 to −51,147 depending on the model. If net interstate migration had been forecast as the recorded value for 2016–21, its population forecast errors would have been greater. For Queensland, forecast net interstate migration was too low by about 2500. Queensland’s total Indigenous population forecast had errors between −8402 and −27,271, indicating an under-forecast. So, if migration had been forecast as the recorded net gain of 4010, then the population forecast errors would have been reduced a little. However, overall, incorrect interstate migration assumptions (incorporated directly or indirectly) contributed relatively little to overall population forecast error.
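The Queensland argument can be checked with first-order arithmetic. The sketch below assumes the approximate 2500 migration shortfall simply adds to the forecast population. It ignores any subsequent births and deaths among the additional migrants, so the adjusted figures are rough indications rather than results from the paper.

```python
# Figures quoted in the text for Queensland.
migration_shortfall = 2500          # approximate under-forecast of net migration
error_range = {"smallest": -8402, "largest": -27271}

# First-order adjustment (an assumption): add the shortfall to each error.
adjusted_range = {
    label: error + migration_shortfall for label, error in error_range.items()
}

# Fraction of each under-forecast accounted for by the migration shortfall.
shortfall_share = {
    label: migration_shortfall / abs(error) for label, error in error_range.items()
}
```

Under this simplification the errors remain negative for every model, in line with the conclusion that migration assumptions explain only part of the under-forecast.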

The remaining error is therefore due to deaths and change in the reporting of Indigenous origin between censuses, and also any errors in the ERPs. We cannot measure errors in deaths with much confidence due to data limitations, but mortality tends to be more stable than migration and fertility trends at least in the short run, and our mortality forecasts (for the three models which incorporate them) were derived from ABS Indigenous life tables which include scaled-up death counts to adjust for the under-recording of Indigenous deaths. So, mortality forecast errors probably contributed a relatively small amount to overall population forecast error. A larger contribution is probably due to errors in forecasting the changing way in which people identify in the census over time.

Limitations of the study

Our study contains a number of limitations. The limited extent of available data restricted us to a small sample of geographical areas and one short forecast horizon. If it had been possible, the evaluation would have considered many more geographical areas and forecast horizons. Data quality also limited the precise forecast evaluation to population stocks due to the variable quality of data on demographic components of change. The consideration of error among the components of change was unavoidably incomplete and approximate. We also placed the greatest emphasis on forecast accuracy. Other aspects of projection models are undoubtedly important (Table 1), but the high errors experienced in forecasting the Indigenous population in the past mean that improved accuracy must be a priority.

It is also important to note that most assessments of forecasting models are imperfect comparisons of models. It is often the case—as it is here—that the assessment actually consists of an evaluation of models together with their input data and assumptions, and the judgements made in preparing these. It is difficult, if not impossible, for the input data and assumptions to be perfectly consistent between models due to the differing nature of the models. We made our input data as consistent as possible within the constraints of the different modelling approaches, but some inconsistencies undoubtedly remained. For example, it is difficult to achieve perfect consistency between a Total Fertility Rate in the simple cohort-component model, a Total Fertility Ratio in the synthetic migration model, and a Child/Woman Ratio in the Hamilton–Perry model. Furthermore, some of the forecasts were constrained to independent population forecasts while others were not.

Conclusions

This study has evaluated several projection models for Indigenous population projections in Australia. The main finding is that data quality issues mean that it is difficult to produce good quality population forecasts with commonly used models, and that simpler models provide a better practical solution in the current Indigenous data environment. Omitting changes to reported Indigenous origin over time in the modelling generally leads to poor quality forecasts. We also found that combining elements of the adapted synthetic migration and Hamilton–Perry models yielded encouraging results.

Given the findings of this study, what modelling approach would we recommend for preparing forecasts of the Indigenous population? The answer depends on the uses to which the models and forecasts will be put. The bi-regional model is a conceptually sophisticated model which is most useful for creating scenarios based on alternative futures for fertility, mortality, migration, and changes in the way people report Indigenous origin between censuses, and for decomposing the demographic drivers of population change. But if the aim is to produce population forecasts only, it is not the best option given the current data quality and coverage limitations. The cohort-component model with net migration is not especially useful for producing population forecasts due to the omission of changes in reported origin. Nor is it particularly useful for creating alternative scenarios or decomposing population forecasts, for the same reason.

For the preparation of forecasts, the Hamilton–Perry model is a good choice, allowing good quality forecasts to be prepared easily, quickly, and with little data. But it is a less useful model if there is the need to formulate scenarios or constrain to independent forecasts; doing so is possible with this model, but not easily. The synthetic migration model also produces relatively accurate forecasts, though they proved marginally less accurate than those of the Hamilton–Perry model in our limited evaluation. However, this model prepares both Indigenous and non-Indigenous forecasts, includes constraining to population totals, and ensures consistency with national population forecasts by age and sex. Assumptions can be made about fertility, mortality, and migration combined with changes in reported Indigenous origin between censuses.

For creating Indigenous population forecasts, a good choice would be to draw strength from both the Hamilton–Perry and synthetic migration models, as we demonstrated in our combined approach. The Hamilton–Perry model would be used to create Indigenous population totals which would be input as constraints in the synthetic migration model in place of populations generated by linear extrapolation. Linear extrapolations of population totals provide good quality constraints for short-term forecasts of population by age and sex. But over the longer term, overall population growth will be affected by the changing age–sex structure of a population, with growth likely to slow somewhat as a population undergoes ageing. The Hamilton–Perry model provides a simple means of generating medium- and long-term population total constraints.

The findings of this paper have obvious implications for policy and planning for future health and social care provision for Indigenous peoples. The variations in calculating population forecasts using different methods are potentially significant and need to be understood by relevant stakeholders including members of the community when interpreting findings. For Indigenous people, accuracy of data is influenced by the mistrust of the census process due to use of such data for discriminatory policies in the past. The complexities of governments asking many diverse language and cultural groups to identify as being of Indigenous origin only compound the choice of many not to participate in the census. Accurate data are important to inform policy programmes, and the voices of Indigenous peoples need to be included in ensuring that meaningful data are collected and reported. An improved process here adds to the legitimacy of population forecasts.

While our study represents an initial investigation into projection model characteristics and performance, further work is required to obtain a more comprehensive picture. The research needs to be extended spatially and temporally to examine sub-state Indigenous forecasts, and, in due course, forecasts over longer time horizons. In addition, other models and variations or combinations of models could be included in future evaluations. In the meantime, we hope the results of this study prove useful to researchers and practitioners.

Availability of data and materials

Data and projection models are available from the corresponding author.

Notes

  1. Convergence was deemed to have occurred when every age-sex population by Indigenous status and State/Territory was less than 0.01 different from the previous iteration. In this case it took 31 iterations to achieve.

  2. For populations with considerable geographical variations in population growth rates, as is the case for the Indigenous population, this may not be the ideal approach from a modelling perspective. However, from a practical perspective, it is often the case that statistical agencies create national projections and constrain all subnational projections to that national projection.

References

Acknowledgements

Advice from Dr Kim Johnstone on the uses of population projections in government is gratefully acknowledged.

Funding

This research was supported by the Australian Research Council Centre of Excellence in Population Ageing Research (project number CE1101029).

Author information

Authors and Affiliations

Authors

Contributions

TW designed the study and undertook the analysis. All authors contributed to the drafting and re-drafting of the paper.

Corresponding author

Correspondence to Tom Wilson.

Ethics declarations

Ethics approval and consent to participate

This project was approved by the Office of Research Ethics and Integrity at the University of Melbourne (reference 2023-25631-42542-4).

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Wilson, T., Temple, J., Burchill, L. et al. Evaluation of alternative methods for forecasting the Aboriginal and Torres Strait Islander population of Australia. Genus 80, 16 (2024). https://doi.org/10.1186/s41118-024-00223-2

  • DOI: https://doi.org/10.1186/s41118-024-00223-2

Keywords