Rates, intrinsic linkages, and multistate population dynamics
- Robert Schoen^{1}
https://doi.org/10.1186/s41118-017-0023-5
© The Author(s) 2017
Received: 7 February 2017
Accepted: 7 July 2017
Published: 30 November 2017
Abstract
Demographic analyses of multistate populations are commonplace, as are situations where population stocks are known but population flows are not. Still, demographic models for multistate populations with changing rates remain at an early stage of development, limiting dynamic analyses and analytical projections. Here, a new approach, the Intrinsic Linkage-Rate Ratio (IL-RR) model, is presented and explored. The key IL parameter, w, is a simple weight for projecting populations. Using the ultimate state composition implied by the prevailing rates, the IL-RR model provides new relationships that connect multistate populations over time and allow analytical population projections. Parameter w reflects population metabolism and scales the level of the transfer rates. Compositional change is driven by the sequence of implicit stable population compositions. The IL-RR approach also provides a new method for estimating transfer rates within an interval from population numbers at the beginning and end of the interval. The new relationships developed advance the ability of demographers to model multistate populations with changing rates and to relate population stocks and flows.
Background
Multistate phenomena are common in demography, as individuals move between different marital statuses, regions of a country, labor force statuses, and so on. The modern demographic analysis of fixed rate multistate models dates from Rogers (1975) and was further developed in Land and Rogers (1982) and Schoen (1988, Chaps. 4 and 5). Still, the ability of demographers to model the dynamics of multistate populations with time varying rates remains quite limited.
In the long term, the behavioral rates of interstate transfer determine the state composition of a population. In the short term, the composition at the end of a time interval is determined by the rates during that interval and the initial population composition. Analytical models that describe the evolution of a multistate population subject to time varying rates are discussed in Schoen (2006, Chap. 8), but none have been particularly useful. Schoen (2014), building on the birth-death model in Schoen (2013), presented a multistate projection approach based on the idea of “intrinsic linkages,” i.e., linear relationships between the dominant right eigenvectors of projection matrices and the sequence of population compositions. Despite the promise of that approach, there were difficulties in specifying the full projection matrices, the properties of the linkage parameter were not fully developed, and the eigenvectors were not related to the projection elements.
Here, we more fully articulate an improved Intrinsic Linkage (IL) approach, connecting the IL parameter to demographic measures and relating the dominant right eigenvector to functions of the transfer rates. Emphasis is on the matrix of rates rather than on the matrix of transition probabilities, as the rates are independent of each other and directly reflect behavior, i.e., a movement from one state to another. The present analysis yields new relationships between population stocks and flows, equations that can analytically project multistate populations, and flexible procedures for determining the array of transfer rates. In addition, with the beginning and ending populations known, the new Intrinsic Linkage-Rate Ratio (IL-RR) approach can be used to estimate an interval’s rates of interstate transfer.
The paper begins by setting out the mathematical structure of IL models and how model values can be determined. IL-RR models with two, three, and more than three states are then explored, and IL-RR models are examined from a graph theory perspective. The section “Finding an appropriate value for parameter w” provides ways to find an appropriate value for IL parameter w, and “Two hypothetical model calculations” shows two hypothetical model calculations. The IL approach is then used as a rate estimation method. After the “Summary and conclusion,” Appendix 1 gives a brief discussion of the eigenstructure decomposition of a matrix, and Additional file 1 presents illustrative calculations of IL-RR models.
The mathematical structure of multistate IL models
This section presents the IL equation and its parameter, w. The properties of parameter w are highlighted, and the structure of the multistate models considered here is set forth.
The IL assumption
The IL assumption is that the composition at the end of each interval is a weighted average of the initial composition and the stable composition implied by the interval’s rates:

\[ f_{jt} = w\,f_{j,t-1} + (1 - w)\,\tau_{jt} \tag{1} \]

where f _{jt} is the fraction of the total population that is in state j at time t (Σ^{N} _{j = 1} f _{jt} = 1); N is the number of living states; w is the IL parameter (0 ≤ w < 1); and τ _{jt} is the fraction of the stable population, implied by the rates prevailing over the t − 1 to t interval, that is in state j (Σ^{N} _{j = 1} τ _{jt} = 1). Equation (1) modifies IL Eq. (5) of Schoen (2014), in that Eq. (1) uses population fractions rather than size relative to the first state in order to avoid complicating scale adjustments.
Equation (1) defines IL parameter w in terms of the weighted average of initial and stable populations that determines the end of interval population composition. A larger value of w gives more weight to the initial composition, while a smaller value of w gives more weight to the ultimate stable fraction. Since the larger the value of w the less the change in state composition, w reflects the metabolism of the model (Ryder 1975), that is, the level of turnover or movement between states that drives the pace of convergence to stability.
The IL-RR approach assumes that, in any given model, the value of w is constant over both state and time. In every time interval of an N-state model, IL Eq. (1) provides (N − 1) constraints on the end of interval populations or on the rates prevailing during the interval. Those IL constraints are far looser than the stable requirement of constant rates and allow the state composition to change as the transfer rates vary over time. From the fundamental principle, in any multistate population, every state fraction moves in the direction specified by the IL assumption. The IL constraints arise because, left unconstrained, states typically do not all move at the uniform pace that Eq. (1) requires. The IL assumption is eminently plausible, limits population change in a simple and reasonable fashion, and provides new relationships in the context of a dynamic multistate model.
Knowledge of w, initial values, and the time trajectory of the τ _{j} thus allows the composition of the model to be analytically projected as far into the future as the τ _{jt} sequence extends.
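To make the multi-interval projection concrete, the recursion can be unrolled by repeated substitution. The derivation below is a sketch that reads Eq. (1) as the weighted average f_{jt} = w f_{j,t−1} + (1 − w) τ_{jt} described above; it is offered as an illustration and may differ in form from the paper’s own multi-interval equations.

```latex
\begin{aligned}
f_{jt} &= w\,f_{j,t-1} + (1-w)\,\tau_{jt} \\
       &= w\bigl[\,w\,f_{j,t-2} + (1-w)\,\tau_{j,t-1}\bigr] + (1-w)\,\tau_{jt} \\
       &\;\;\vdots \\
       &= w^{t} f_{j0} + (1-w)\sum_{k=1}^{t} w^{\,t-k}\,\tau_{jk}.
\end{aligned}
```

The weights sum to one, since w^{t} + (1 − w) Σ_{k=1}^{t} w^{t−k} = w^{t} + (1 − w^{t}) = 1, so the projected fractions remain a proper composition. The influence of the initial composition decays as w^{t}, and the most recent stable compositions carry the most weight.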
IL relationships
IL parameter w, as defined by Eq. (1), has significant connections to transfer rates and probabilities. Two relationships, to be demonstrated in the following sections, are of particular significance. First, w is equal to a real subordinate eigenvalue (root) of the transition probability matrix. That finding is consistent with viewing w as an indicator of the speed of convergence to stability (Schoen 2006, Chap. 2). Second, in IL projections, the effect of w on the rates is separable from the effects of the τ’s and initial values, and yields an overall factor of (1 − w)/(1 + w) that applies to all rates. A model with a different value of w would have all of its rates proportionally adjusted. In short, parameter w sets the level of turnover and the speed of convergence by raising or lowering all of the rates of interstate transfer. Those consequences of the definition of w have not been noted before and are not obvious from Eq. (1). They demonstrate how w impacts demographic behavior and show its relationship to intrinsic demographic measures. In the following sections, those relationships are established algebraically for N = 2 and numerically for models with three or more states.
Specifying the multistate models being considered
Here, we examine multistate models with no growth or attrition to focus attention on the rates of interstate transfer in an analytically tractable context. In practice, that restriction only eliminates fertility or mortality that differs across states. In the absence of state differentials, changes in size can readily be incorporated as overall scaling factors. Age is not explicitly recognized; cohort analyses, where time reflects age, can readily be done.
To fix the state space of the models, we assume that the N-state models have all N states present at stability. Models that have states with exits but no entrants have states that are not present at stability, and hence are excluded. That restriction can frequently be relaxed, however, as τ _{j} can often equal zero in Eq. (1) without compromising the analysis.
Where the sums over j range over all possible destination states, and m _{ij} is the occurrence/exposure rate of movement from state i to state j. With a time index added, m _{ijt} represents that rate over the t − 1 to t time interval. Matrix M _{N} can have up to N (N − 1) distinct interstate transfer rates. A transfer rate is zero whenever there is no direct movement from state i to state j. Let R denote the number of nonzero transfer rates. Here, we only consider models where R ≥ N. If R < (N − 1), the N states are not all connected. If R = (N − 1), not all states are present at stability, and the beginning and ending populations determine the rates. That case, which has an important application to parity status models, is discussed in depth in Schoen (2016). Accordingly, here, we consider models where N ≤ R ≤ N (N − 1).
The number of moves (or transfers) from state j to state i during the interval is L _{jt} m _{jit}. Since the f _{j} sum to one, there are (N − 1) independent flow equations that constrain model values.
Now consider Π _{N}, the transition probability or projection matrix associated with M _{N}. The ijth element of Π _{ N }, π_{ji}, i ≠ j, is the probability that a person in state j at the beginning of an interval is in state i at the end of the interval, with π _{jj} the probability that a person in state j at the beginning of the interval is also in state j at the end. In contrast to the rate matrix elements, the value of π _{ji} is zero only when there is no route, direct or indirect, from state j to state i.
Where I is the N × N identity matrix (cf. Schoen 1988, Chap. 4).
Determining the IL model values
In a multistate model with N states and R nonzero rates, we seek the (N − 1) end of interval populations and the R transfer rates. Assume that we know, or can determine, IL parameter w, and have (N − 1) IL equations in the form of Eq. (1) and (N − 1) flow equations in the form of Eq. (5). To fully determine the model, we need (R − N + 1) additional constraints. The innovation here is expressing the dominant right eigenvector elements of M _{ N } in terms of the transfer rates, which provides a further (N − 1) constraints. That is readily done, since eigenvector elements are functions of the transfer rates.
In general, the transfer rates are not known when the IL-RR approach is used. If all of the transfer rates are known, the projection matrices can be found from Eq. (7), and the projections carried out directly. In that case, Intrinsic Linkages are generally not present, as the implicit parameter w values are likely not constant over all states and across a sequence of arbitrary rate matrices. The IL-RR approach, by imposing a constant w, allows projections to be made without knowledge of the transfer rates. The relationships between the τ’s and the rates then contribute to the solutions for the rates.
To specify the complete multistate model, two cases must be considered.
The two-step solution when R ≥ 2(N − 1)
Every projection needs to be guided by some information or assumptions. Here, assume that all of the τ _{j} values are known. As Eq. (1) indicates, the IL approach stresses the centrality of the dominant eigenvector implied by the rates prevailing in every interval of a population trajectory. Knowing the sequence of τ _{jt} values means knowing the sequence of implied stable population compositions. Such knowledge represents ongoing population dynamics in terms of clearly interpretable quantities.
In the first step of the two-step solution, population composition is projected into the future using IL Eqs. (1–3). With w and the f _{j0} and τ _{jt} values known, the entire trajectory of future f _{jt} values can be found immediately. The solutions are unique and demographically valid (i.e., real and non-negative). The ability to do such analytical projections is a major advantage of the IL approach and can be extremely useful in applied work.
In the second step, the R nonzero transfer rates are found. Given the (N − 1) flow equations and the (N − 1) equations that relate the τ _{j} values to the rates, additional (R − 2[N − 1]) constraints are needed to complete the solution. When R = 2(N − 1), the two-step solution fully determines the model rates, for any number of states, with no need for additional constraints.
If R > 2(N − 1), the additional (R − 2[N − 1]) constraints can come from any source, including rates, cross-product ratios, or rate products. A simple, flexible approach is to use known (or assumed) rate ratios. In a model with all rates present, think of the 2(N − 1) rates that can be determined by the IL-RR method as being on the super- and sub-diagonals of the rate matrix, i.e., the diagonals immediately above and below the main diagonal. In each column, the ratio of (i) a rate above (or below) the super- (or sub-) diagonal rate, to (ii) that super- (or sub-) diagonal rate, can yield a further constraint. Applying that procedure to all rates above the super-diagonal or below the sub-diagonal can supply the needed constraints. In models with some zero rates, fewer constraints are needed. If a super- or sub-diagonal rate is zero, the nearest nonzero rate above (or below) the diagonal can be used. In general, the algebra underlying the rate equations is straightforward, but there need not be a demographically valid solution, and multiple solutions are possible.
The one-step solution when R < 2(N–1) or τ _{j} values are not known
We seek (N–1) end of interval population fractions and R rates, for a total of (R + N–1) unknown values. We have (N–1) IL equations and (N–1) flow equations, thus (R–N + 1) < (N–1) further constraints are needed. It follows that all of the (N–1) τ _{j} (or u_{j}) values cannot be specified a priori; at least one is constrained by the flow and IL equations. Lacking all of the τ _{j} values, IL Eq. (1) cannot be implemented for all states, and the first step above cannot be carried out.
The one-step solution simultaneously solves for the rates and the end of interval population composition. Knowledge of parameter w, initial population values, and (R–N + 1) < (N–1) additional constraints suffice to do so. The necessary constraints can come from any source, including τ _{j} values and rate ratios. The one-step approach proceeds interval by interval, and thus is less attractive than the two-step approach. A numerical solution can always be found, but multiple solutions are possible, and no demographically valid solution may exist.
Stability in the special case of constant dominant right eigenvectors
Consider the case where a two-step solution is possible, and the τ _{j} values are constant over time. Then, the rate matrix and the population projection matrix are constant over time. The projection describes the trajectory to the stable population composition specified by the τ _{j} values. Even though the initial population composition does not influence the ultimate stable composition, it does influence the rates of transfer. For N > 2, parameter w and the τ _{j} do not specify a unique rate matrix. Different initial population compositions lead to different rate matrices, albeit matrices with the same dominant right eigenvector.
The special case of w = 0
Equation (1) indicates that when w = 0, the end of interval population has the same composition as the stable population implied by the rates prevailing over the interval. That suggests that there are rates which, over the course of a single interval, can transform an arbitrary initial population into the stable composition implied by those rates. The existence of such a “dynamically stable” multistate population has not, to my knowledge, previously been noted in the demographic literature. It merits a brief discussion.
It is not difficult to show that demographically valid, dynamically stable, multistate populations actually exist. Consider a two-state model with interval length 5 where the rates alternate over time. Let the rates during odd-numbered intervals be m _{12} (odd) = .24 and m _{21} (odd) = .16, and during even-numbered intervals be m _{12} (even) = .16 and m _{21} (even) = .24. With τ = [m _{21}, m _{12}]′/(m _{12} + m _{21}), the τ vectors for odd and even periods are then [0.4, 0.6]′ and [0.6, 0.4]′ respectively, where the prime (′) indicates a transpose. At every time point, the population composition is that of the stable population implied by the prevailing rates, and that composition alternates as the rates alternate.
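The alternating-rate example can be checked numerically. The sketch below assumes a linear (person-years) flow equation, f′ − f = h·M·(f + f′)/2, which is an assumption here rather than something stated in the surrounding text; `two_state_step` and `stable_fraction_state1` are hypothetical helper names. The stable fraction in state 1 is computed as m _{21}/(m _{12} + m _{21}), the first element of the τ vector given in the two-state section.

```python
def two_state_step(f1, m12, m21, h):
    """Project the fraction in state 1 across one interval of length h.

    Assumes the linear flow equation f' - f = h * M * (f + f') / 2,
    solved here for the end-of-interval fraction in state 1.
    """
    a = h * m12 / 2.0  # half-interval exposure to leaving state 1
    b = h * m21 / 2.0  # half-interval exposure to leaving state 2
    return (f1 * (1.0 - a - b) + 2.0 * b) / (1.0 + a + b)


def stable_fraction_state1(m12, m21):
    """First element of tau: m21 / (m12 + m21)."""
    return m21 / (m12 + m21)


H = 5.0              # interval length from the example
ODD = (0.24, 0.16)   # (m12, m21) in odd-numbered intervals
EVEN = (0.16, 0.24)  # (m12, m21) in even-numbered intervals

f = 0.9  # arbitrary initial fraction in state 1
trajectory = []
for t in range(1, 7):
    m12, m21 = ODD if t % 2 == 1 else EVEN
    f = two_state_step(f, m12, m21, H)
    trajectory.append((t, round(f, 6), round(stable_fraction_state1(m12, m21), 6)))

for row in trajectory:
    print(row)  # (interval, projected fraction in state 1, implied stable fraction)
```

Under these assumptions the coefficient on the initial fraction, 1 − a − b, vanishes, so the end-of-interval composition equals the implied stable composition whatever the starting value, which is the w = 0 behavior described above.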
The assumption of w = 0 is quite strong and would need to be justified in any analysis. A demographically valid solution may not exist, the projection matrix is singular, and the transfer rates can be quite high. Nonetheless, unlike birth-death populations, multistate populations can become dynamically stable. That somewhat paradoxical finding shows the flexibility of dynamic multistate models and the complex connections that exist between fixed rate and changing rate models.
The IL solution in the two-state model
A diagram of two states joined by a double-headed line describes the model, where the line indicates a connection between the states and the two arrowheads indicate that there is movement in both directions. Denoting the states as 1 and 2, the model has two rates, m _{12} and m _{21}. There is a two-step solution with R = 2(N − 1). Knowing w, the initial composition, and one τ (or u) eigenvector value yields one IL equation, one flow equation, and one eigenvector equation, which are sufficient to determine the end of interval composition and the two rates.
The dominant right (column) eigenvector of M _{2} can be written u = [1, m _{12}/m _{21}]′, with the first element of u scaled to one. The subordinate eigenvector of M _{2} is simply [1, –1]′. The values in the related τ vector are given by [m _{21}/(m _{12} + m _{21}), m _{12}/(m _{12} + m _{21})]′.
Equation (14) indicates that an increase in either rate decreases w. The smaller the rates, the closer w is to one, the lower the metabolism, and the slower the convergence to stability.
Now consider Π _{ 2 }, the transition probability or projection matrix implied by rate matrix M _{ 2 }. The eigenvectors of the two matrices are the same, but their eigenvalues (roots) differ. The dominant root of M _{ 2 }, r_{1}, is zero, while subordinate root r_{2} = –(m _{12} + m _{21}). The dominant (λ_{1}) and subordinate (λ_{2}) roots of Π _{ 2 } are 1 and w. Equality between w and λ_{2} is consistent with the known relationship between (λ_{2}/λ_{1})^{2} and the speed of convergence (cf. Schoen 2006, Chap. 2). While parameter w is defined in quite different terms, Eqs. (1) and (14) show that, at every time point, it is inherently a part of the basic structure of the N = 2 model’s projection matrix.
As the N = 2 model is relatively simple, every two-state model satisfies the IL relationship in Eq. (1). Given any m _{12}, m _{21}, the projection matrix they imply, and an initial population, Eq. (1) is satisfied when w = λ_{2}. For w to remain constant over time, (m _{12} + m _{21}) must be a constant. However, when N ≥ 3, a given set of rates will generally not yield a projection matrix consistent with Eq. (1), as states need not move toward their stable proportions in a uniform manner.
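The two-state relationships can be illustrated with a short numerical sketch. It assumes the linear flow equation f′ − f = h·M·(f + f′)/2, under which w = (1 − s)/(1 + s) with s = h(m _{12} + m _{21})/2; that closed form is an assumption of this sketch and may differ in presentation from Eq. (14). All function names are hypothetical. The inputs (w = 0.7, 90% of the stable population in state 1, interval length 5) are chosen to mirror the urbanization example later in the paper; the interval length is inferred rather than quoted.

```python
def two_state_w(m12, m21, h=1.0):
    """Subordinate root of the two-state projection matrix (linear flow assumption)."""
    s = h * (m12 + m21) / 2.0
    return (1.0 - s) / (1.0 + s)


def two_state_rates(w, tau1, h=1.0):
    """Recover (m12, m21) from w and the stable fraction in state 1."""
    total = (2.0 / h) * (1.0 - w) / (1.0 + w)  # m12 + m21, scaled by (1 - w)/(1 + w)
    return total * (1.0 - tau1), total * tau1


def il_step(f1, w, tau1):
    """Eq. (1) for state 1: weighted average of initial and stable fractions."""
    return w * f1 + (1.0 - w) * tau1


def flow_step(f1, m12, m21, h=1.0):
    """End-of-interval fraction in state 1 from the linear flow equation."""
    a, b = h * m12 / 2.0, h * m21 / 2.0
    return (f1 * (1.0 - a - b) + 2.0 * b) / (1.0 + a + b)


m12, m21 = two_state_rates(w=0.7, tau1=0.9, h=5.0)
print(round(m12, 4), round(m21, 4), round(m12 + m21, 4))  # 0.0071 0.0635 0.0706

# Eq. (1) and the flow equation agree for any initial composition.
for f_start in (0.1, 0.5, 0.95):
    print(f_start, round(il_step(f_start, 0.7, 0.9), 6), round(flow_step(f_start, m12, m21, 5.0), 6))
```

The sketch makes the role of the factor (1 − w)/(1 + w) visible: it fixes the sum of the two rates, while the τ values split that sum between them.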
IL-RR in three-state models
When N = 3, five different models arise. Let us consider each in turn.
The N = 3, R = 3 (ring) model
With one known z value, each interval involves a system of five equations with five unknowns, i.e., the other z value, two end of interval populations, and two rates. Algebraically, the solution is a complicated quadratic. Numerically, there are two possible solutions for the rates, one associated with λ_{2}, and one associated with λ_{3}, the smaller subordinate root of the projection matrix. There may be one, two, or no demographically valid solutions. Additional file 1 provides an annotated Maple program that calculates an R = 3 Ring model.
The N = 3, R = 4 (path) model
In “path” models, states can directly connect only to an immediately preceding and an immediately following state. Consider the path model with rates m _{12}, m _{21}, m _{23}, and m _{32}, shown in Fig. 1b. States 1 and 2 and states 2 and 3 directly communicate, but there is no direct connection between states 1 and 3. Here, (R–N + 1) = 2, and the two eigenvector elements suffice to identify the model. The dominant right eigenvector of M _{ 3 } is [1, (m _{12t}/m _{21t}), (m _{12t}/m _{21t}) (m _{23t}/m _{32t})]′, suggesting the rate ratios z _{1t} = m _{12t}/m _{21t} and z _{2t} = m _{23t}/m _{32t}. The IL and flow equations follow as before.
with m _{12t} and m _{32t} following immediately from the rate ratios. Note that the factor (1 − w)/(1 + w) appears in Eq. (17). Numerically, w is a subordinate root of the projection matrix associated with path rate matrix M _{3}. The solution is unique, though not necessarily demographically valid. Additional file 1 provides an annotated Maple program that calculates a three-state path model.
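As an illustration of the two-step logic in the path model, the sketch below first applies Eq. (1) and then solves the two independent flow equations for the rates. It is not a transcription of Eqs. (16) and (17): it assumes person-years are approximated linearly as h(f _{0} + f _{1})/2, and the inputs (w = 0.8, z _{1} = 0.2, z _{2} = 0.70, interval length 5, initial fractions .85, .10, .05) are assumptions chosen because they are consistent with the first interval of the three-state projection tabulated later. Function names are hypothetical.

```python
def il_step(f0, w, z1, z2):
    """Step 1: Eq. (1), with tau proportional to [1, z1, z1*z2]."""
    u = (1.0, z1, z1 * z2)
    tau = [x / sum(u) for x in u]
    return [w * a + (1.0 - w) * t for a, t in zip(f0, tau)]


def path_rates(f0, f1, z1, z2, h):
    """Step 2: solve the flow equations of a three-state path model.

    f0, f1 : fractions in states 1, 2, 3 at the start and end of the interval
    z1, z2 : rate ratios m12/m21 and m23/m32
    h      : interval length
    Person-years are taken as h * (f0 + f1) / 2 (a linear assumption).
    """
    L1, L2, L3 = (h * (a + b) / 2.0 for a, b in zip(f0, f1))
    # State 1: f1[0] - f0[0] = -m12*L1 + m21*L2, with m12 = z1*m21
    m21 = (f1[0] - f0[0]) / (L2 - z1 * L1)
    # State 3: f1[2] - f0[2] = m23*L2 - m32*L3, with m23 = z2*m32
    m32 = (f1[2] - f0[2]) / (z2 * L2 - L3)
    return {"m12": z1 * m21, "m21": m21, "m23": z2 * m32, "m32": m32}


f0 = (0.85, 0.10, 0.05)
f1 = il_step(f0, w=0.8, z1=0.2, z2=0.70)
rates = path_rates(f0, f1, z1=0.2, z2=0.70, h=5.0)
print([round(x, 5) for x in f1])
print({name: round(value, 5) for name, value in rates.items()})
```

Substituting Eq. (1) into the first flow equation gives m _{21} = (2/h)·[(1 − w)/(1 + w)]·(τ _{1} − f _{1,0})/(f _{2,0} − z _{1} f _{1,0}) under the same linear assumption, which shows where the factor (1 − w)/(1 + w) comes from.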
The N = 3, R = 4 (triangular) model
In the triangular model, the rates are m _{12}, m _{23}, m _{31}, and m _{32}, as shown in Fig. 1c. Here, the “path” of the previous model is replaced by a “cycle”, in that a person can start in state 1, move to state 2 and then state 3, and return to state 1 without ever being in another state twice. The dominant right eigenvector is [1, m _{12} (m _{31} + m _{32})/(m _{31} m _{23}), m _{12}/m _{31}]′. Constituent rate ratios are z _{1} = m _{12}/m _{31} and z _{2} = (m _{31} + m _{32})/m _{23}.
With the trajectory of z _{jt} values known, the two-step approach can be applied as described above. The solutions are unique, and w is a subordinate root of the projection matrix.
The N = 3, R = 5 model
A multistate model with three states and five rates is depicted in Fig. 1d, with rates m _{12}, m _{13}, m _{21}, m _{23}, and m _{31}. With N _{z} = (R − N + 1) = 3, the two eigenvector elements are not enough to determine the rates. A further constraint, such as a rate ratio, is needed. Now assume known rate ratios z _{1} = (m _{21} + m _{23})/m _{12}, z _{2} = m _{23}/m _{31}, and z _{3} = m _{13}/m _{31}. The dominant right eigenvector of the rate matrix is then [z _{1}, 1, z _{2} + z _{1} z _{3}]′. The two-step approach is applicable, the solutions are unique, and w is a subordinate root of Π _{3}.
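The eigenvector expressions quoted for the triangular and five-rate models can be verified numerically: a dominant right eigenvector u of the rate matrix has root zero, so M·u should vanish. The sketch below builds M with column i holding the exits from state i, which is an assumed layout consistent with the roots reported for the two-state model, and uses arbitrary hypothetical rate values.

```python
def rate_matrix(n, rates):
    """Build the rate matrix from {(i, j): m_ij}, with 1-based state labels.

    Entry [j][i] receives m_ij (flow into j from i); each diagonal entry is
    minus the total exit rate from that state.
    """
    M = [[0.0] * n for _ in range(n)]
    for (i, j), m in rates.items():
        M[j - 1][i - 1] += m
        M[i - 1][i - 1] -= m
    return M


def matvec(M, u):
    """Matrix-vector product for plain nested lists."""
    return [sum(row[k] * u[k] for k in range(len(u))) for row in M]


# Triangular model: rates m12, m23, m31, m32 (hypothetical values).
tri = {(1, 2): 0.10, (2, 3): 0.07, (3, 1): 0.04, (3, 2): 0.05}
u_tri = [
    1.0,
    tri[1, 2] * (tri[3, 1] + tri[3, 2]) / (tri[3, 1] * tri[2, 3]),
    tri[1, 2] / tri[3, 1],
]

# Five-rate model: rates m12, m13, m21, m23, m31 (hypothetical values).
five = {(1, 2): 0.10, (1, 3): 0.03, (2, 1): 0.06, (2, 3): 0.07, (3, 1): 0.04}
z1 = (five[2, 1] + five[2, 3]) / five[1, 2]
z2 = five[2, 3] / five[3, 1]
z3 = five[1, 3] / five[3, 1]
u_five = [z1, 1.0, z2 + z1 * z3]

res_tri = matvec(rate_matrix(3, tri), u_tri)
res_five = matvec(rate_matrix(3, five), u_five)
print(res_tri)   # each entry should be zero up to rounding error
print(res_five)  # each entry should be zero up to rounding error
```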
The N = 3, R = 6 (full) model
The dominant right eigenvector is then [1, u_{2}, u_{3}]′. Note that dominant right eigenvector elements of the other three-state models can be found from Eq. (18), with zero values entered for transfers that are not allowed.
Here, and whenever R ≥ 2(N − 1), a two-step solution can be found from the (N − 1) known τ_{j} values and an additional (R − 2(N − 1)) constraints obtained from rate ratios or other sources. Algebraically, the solutions of the full model are complicated. Numerically, they yield two solutions, neither of which may be demographically valid. Parameter w is once again a subordinate root of the projection matrix, and its value proportionally adjusts all rates.
IL-RR in models with more than three states
With four or more states, the number of distinct multistate models increases greatly, as does the range of interstate movements and the complexity of the elements of the dominant right eigenvector. The Appendix presents an algebraic method for finding dominant right eigenvector elements and gives the eigenvector elements of the full four-state model. The solution approaches described above remain applicable. If parameter w, initial population values, and (R – N + 1) additional constraints are known, numerical solutions for the ending populations and transfer rates can always be found. If R ≥ 2(N − 1) and the τ_{jt} are known, a two-step solution with an immediate population projection is possible.
IL-RR models from a graph theory perspective
Graph theory affords some insight into when unique, two-step projections can be made. There are close parallels between graphs and multistate models, though they have not been well explored from a demographic perspective. We have already used the graph theory concepts of path and cycle. Another useful concept is that of “tree”, a connected graph with no cycles (Chartrand and Zhang 2012; Rebane and Pearl 1987). In demographic terms, a tree describes a strictly hierarchical multistate model. There is one tree in two- and three-state models, the path model. In four-state models there are two trees, the path and “Y” forms, the latter where diagrams of the connections between states resemble the letter Y. When N = 5, there are three trees, the path, the Y, and the “star”, the latter having four states that connect directly to the fifth state. The number of tree forms increases with N, as six-, seven-, and eight-state models have 6, 11, and 23 distinct forms, respectively (Chartrand and Zhang 2012; Rebane and Pearl 1987). Each tree has (N − 1) links. Now assume that each link represents two rates, one in each direction. Every such modified tree can be interpreted as an acyclic multistate model with 2(N − 1) rates. All such tree-form models are fully soluble by the two-step IL-RR approach when the τ _{j} values are known.
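The tree-form condition can be checked mechanically from the list of nonzero transfers. The sketch below is illustrative: `classify_model` is a hypothetical helper, and its connectivity test treats links as undirected, which is weaker than requiring that every state be present at stability.

```python
def classify_model(n, transfers):
    """Summarise a model with states 1..n and nonzero transfers given as (i, j) pairs."""
    links = {frozenset(pair) for pair in transfers}  # undirected links between states
    neighbours = {s: set() for s in range(1, n + 1)}
    for i, j in transfers:
        neighbours[i].add(j)
        neighbours[j].add(i)

    # Depth-first search from state 1 over the undirected link graph.
    seen, stack = {1}, [1]
    while stack:
        for nxt in neighbours[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)

    connected = len(seen) == n
    r = len(transfers)
    return {
        "R": r,
        "connected": connected,
        # A tree has N - 1 links; tree-form models carry a rate in each direction.
        "tree_form": connected and len(links) == n - 1 and r == 2 * (n - 1),
        # Counting condition only; known tau values are still required.
        "R >= 2(N-1)": r >= 2 * (n - 1),
    }


path3 = [(1, 2), (2, 1), (2, 3), (3, 2)]
triangular3 = [(1, 2), (2, 3), (3, 1), (3, 2)]
star5 = [(s, 5) for s in range(1, 5)] + [(5, s) for s in range(1, 5)]

for name, model, n in (("path3", path3, 3), ("triangular3", triangular3, 3), ("star5", star5, 5)):
    print(name, classify_model(n, model))
```

The three-state path and five-state star are classified as tree-form, while the triangular model meets the counting condition R ≥ 2(N − 1) but contains a cycle.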
IL-RR in N-state tree-form (acyclic) models
We examine patterns in three kinds of tree-form models: the path, the Y, and the star.
The N-state path model
N-state Y and star models
A seven-state Y model is shown in Fig. 2b, and a nine-state star model in Fig. 2c. To write the dominant right eigenvector of those models, let us make state 1 the reference state and scale it to 1. Because we have tree-form models, there is only one route between the reference state and every other state. As was the case in the N-state path model above, the jth eigenvector element is the product of the rate ratios that trace out the route from state 1 to state j. For example, in the Y model, the fourth element of the eigenvector is the product {(m _{12}/m _{21})(m _{23}/m _{32})(m _{34}/m _{43})}. An analogous string of rate ratios provides every eigenvector element. As in the generalized path model, with (N–1) known τ _{j} values, the two-step approach provides a unique solution, and w equals a subordinate root of the projection matrix.
In short, in generalized acyclic (or tree-form) models, a single route links every pair of states. Given w, initial population composition, and (N − 1) τ _{j} values, the two-step approach provides the complete N-state model.
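The route-product rule for tree-form eigenvectors can be sketched in a few lines. The example uses a hypothetical five-state Y model (stem 1–2–3 with arms to states 4 and 5) and made-up rates; it is not the seven-state model of Fig. 2b. The check that M·u vanishes assumes the rate matrix holds the exits from state i in column i.

```python
from collections import deque


def tree_eigenvector(n, rates, root=1):
    """Dominant right eigenvector of a tree-form model, scaled so u[root] = 1.

    rates maps (i, j) -> m_ij with 1-based states; every link must carry a
    rate in both directions. Each element is the product of the rate ratios
    m_ij / m_ji along the unique route from the reference state.
    """
    u = {root: 1.0}
    queue = deque([root])
    while queue:
        i = queue.popleft()
        for (a, b), m_ab in rates.items():
            if a == i and b not in u:
                u[b] = u[i] * m_ab / rates[(b, a)]
                queue.append(b)
    return [u[s] for s in range(1, n + 1)]


def residual(n, rates, u):
    """Compute M @ u, where column i of M holds the exits from state i."""
    out = [0.0] * n
    for (i, j), m in rates.items():
        out[j - 1] += m * u[i - 1]  # inflow to j from i
        out[i - 1] -= m * u[i - 1]  # outflow from i
    return out


# Hypothetical five-state Y model: links 1-2, 2-3, 3-4, 3-5.
y_rates = {
    (1, 2): 0.10, (2, 1): 0.05,
    (2, 3): 0.08, (3, 2): 0.04,
    (3, 4): 0.06, (4, 3): 0.12,
    (3, 5): 0.03, (5, 3): 0.06,
}
u = tree_eigenvector(5, y_rates)
tau = [x / sum(u) for x in u]
print([round(x, 6) for x in u])
print([round(x, 6) for x in tau])
print(residual(5, y_rates, u))  # each entry should be zero up to rounding error
```

Here the fourth element equals (m _{12}/m _{21})(m _{23}/m _{32})(m _{34}/m _{43}), matching the pattern described for the Y model.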
IL-RR in two N-state cyclic models with R = 2(N − 1)
The two-step approach can also lead to a complete solution in some N-state cyclic models when there are R = 2(N − 1) nonzero rates. In this section, two such models are presented.
N-state “add-on” models
With (N − 1) known τ_{j} values, the complete model follows via the two-step approach.
N-state “sawtooth” models
A “sawtooth” model with seven states is depicted in Fig. 3b. As in the add-on model, the first three states have the form of a triangular model. Then any number of pairs of states can be added, with the odd-numbered states along the bottom row constituting a path model, and the even numbered states (the “teeth”) receiving increments from the previous odd-numbered state and sending decrements to the next odd-numbered state.
The dominant right eigenvector is then [1, S_{1}, F_{1}, F_{1}S_{2}, F_{1}F_{2}, F_{1}F_{2}S_{3}, F_{1}F_{2}F_{3}, …]′, and the complete multistate model follows from the two-step approach.
Finding an appropriate value for parameter w
Because of parameter w’s pivotal role in IL models, it is important to use an appropriate value. As indicated above, w reflects the metabolism of the population, and the factor (1 − w)/(1 + w) scales all rates. If comparable rates are known or can be estimated for a similar population, an appropriate value of w is the largest subordinate root of the projection matrix associated with that rate matrix.
“Pattern” or “design” matrices can help illuminate the relationship between w and a set of transfer rates. For simplicity, and because the level of the rates is a key factor, let us assume that all of the transfer rates are equal to m. We can then construct a “pattern” rate matrix in the form of Eq. (4), reflecting the state space and possible flows in the multistate model of interest. An approximation for w is the λ_{2} of the associated projection matrix.
Table 1 Values of parameter w in selected multistate models with all transfer rates equal to m and interval lengths equal to 1
Model form | Largest subordinate root of rate matrix (r_{2}) | Largest subordinate root of projection matrix (λ_{2}) | w when m = .05 | w when m = .10 | w when m = .20 |
---|---|---|---|---|---|
N = 2 path | –2m | (1 – m)/(1 + m) | .905 | .818 | .667 |
N = 3 path | –m | (2 – m)/(2 + m) | .951 | .905 | .818 |
N = 3 triangular, 5-rates | –2m | (1 – m)/(1 + m) | .905 | .818 | .667 |
N = 3 full | –3m | (2 – 3m)/(2 + 3m) | .860 | .739 | .538 |
N = 4 path | \( -m\left(2-\sqrt{2}\right) \) | \( \frac{2+2\sqrt{2}\,m-m^{2}}{2+4m+m^{2}} \) | .971 | .943 | .889 |
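The pattern-matrix values can be reproduced with a short sketch. It assumes that a rate-matrix root r maps to the projection-matrix root (2 + hr)/(2 − hr), the form implied by a linear flow assumption and consistent with the algebraic entries in the table; that mapping, the matrix layout (column i holds the exits from state i), and all names are assumptions of this illustration. The “triangular, 5-rates” row is omitted because its set of rates is not spelled out in the table.

```python
from math import sqrt


def pattern_rate_matrix(n, links, m):
    """Rate matrix with every listed transfer (i -> j, 1-based) set to m."""
    M = [[0.0] * n for _ in range(n)]
    for i, j in links:
        M[j - 1][i - 1] += m  # flow into j from i
        M[i - 1][i - 1] -= m  # exit from i
    return M


def det(A):
    """Determinant by cofactor expansion (adequate for small matrices)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** c * A[0][c] * det([row[:c] + row[c + 1:] for row in A[1:]])
               for c in range(len(A)))


def is_root(M, r, tol=1e-12):
    """True if r is an eigenvalue of M, i.e. det(M - r*I) is numerically zero."""
    n = len(M)
    shifted = [[M[i][j] - (r if i == j else 0.0) for j in range(n)] for i in range(n)]
    return abs(det(shifted)) < tol


def w_from_root(r, h=1.0):
    """Map a rate-matrix root to a projection-matrix root (assumed linear form)."""
    return (2.0 + h * r) / (2.0 - h * r)


def both_ways(pairs):
    """Expand undirected links into transfers in both directions."""
    return list(pairs) + [(j, i) for i, j in pairs]


MODELS = {
    "N = 2 path": (2, both_ways([(1, 2)]), lambda m: -2 * m),
    "N = 3 path": (3, both_ways([(1, 2), (2, 3)]), lambda m: -m),
    "N = 3 full": (3, both_ways([(1, 2), (2, 3), (1, 3)]), lambda m: -3 * m),
    "N = 4 path": (4, both_ways([(1, 2), (2, 3), (3, 4)]), lambda m: -m * (2 - sqrt(2))),
}

table = {}
for name, (n, links, r2) in MODELS.items():
    row = []
    for m in (0.05, 0.10, 0.20):
        # Confirm the tabulated r2 really is a root of the pattern rate matrix.
        assert is_root(pattern_rate_matrix(n, links, m), r2(m))
        row.append(round(w_from_root(r2(m)), 3))
    table[name] = row
    print(name, row)
```

The determinant check confirms only that each tabulated r_{2} is a root of the pattern matrix; it does not by itself establish that it is the largest subordinate root.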
Two hypothetical model calculations
To show how the IL-RR approach is useful in modeling multistate population dynamics, we consider two applications: a population projection over five time intervals, and a model of a rural-to-urban transition.
A hypothetical population projection in a three-state path model
Table 2 A 5-interval projection in a hypothetical three-state (path) membership model with states unaffiliated (U), joined (J), and active (A). Columns U, J, and A give the fraction in each state (f); rates of transfer are from Eqs. (20)–(21)
Time | U | J | A | z_{2t} | m _{UJ} | m _{JU} | m _{JA} | m _{AJ} |
---|---|---|---|---|---|---|---|---|
A. Interval by interval calculations (Eq. (1)) | | | | | | | | |
0 | .85 | .10 | .05 | – | – | – | – | – |
1 | .82925 | .10985 | .06090 | .70 | .01317 | .06586 | .08474 | .12106 |
2 | .81243 | .11769 | .06988 | .71 | .01335 | .06674 | .08291 | .11678 |
3 | .79876 | .12391 | .07733 | .72 | .01357 | .06785 | .08026 | .11147 |
4 | .78759 | .12885 | .08356 | .73 | .01384 | .06921 | .07698 | .10545 |
5 | .77844 | .13275 | .08881 | .74 | .01419 | .07093 | .07318 | .09890 |
B. Stable population values (τ_{j5}) implied by time 5 rates | | | | | | | | |
5 | .74184 | .14837 | .10979 | | | | | |
C. Time 5 values calculated directly from time 0 populations and z values (Eq. (2)) | | | | | | | | |
5 | .77844 | .13275 | .08881 | | | | | |
Panel A of Table 2 shows an interval by interval population projection. The fraction of the population in state A increases steadily over time. However, transfer rate m _{JAt} decreases during the projection, though not as rapidly as m _{AJt} decreases. Rates m _{UJt} and m _{JUt} both increase while maintaining a constant ratio (i.e., z _{1}), and the fraction in state J rises steadily. At time 5, a comparison with panel B shows that the model population is still some distance from the stable population implied by the prevailing rates. If the time 5 rates remain constant, the fractions in J and A will continue to rise and the fraction in U will continue to drop. Here, w = λ_{3}.
Panel C of Table 2 shows population values for time 5 calculated directly from Eq. (2), without an interval by interval projection. The results confirm that Eq. (2) yields the same time 5 population composition as the five single-interval projections shown in panel A. The ability to project over multiple intervals is a major strength of the IL-RR approach. If information is available over age instead of time, Eqs. (1) and (2) can be used to immediately trace out the age trajectory of a real or synthetic cohort.
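The equivalence of the interval-by-interval and direct calculations can be illustrated as follows. The sketch assumes w = 0.8 and z _{1} = 0.2, values that are not stated in the surrounding text but are consistent with the tabulated fractions and rates, and it takes τ proportional to [1, z _{1}, z _{1} z _{2t}], the path-model eigenvector. The direct formula is the unrolled form of Eq. (1) and is assumed, not confirmed, to correspond to Eq. (2). Function names are hypothetical.

```python
def tau_path(z1, z2):
    """Stable composition of a three-state path model from its rate ratios."""
    u = (1.0, z1, z1 * z2)
    total = sum(u)
    return [x / total for x in u]


def project_stepwise(f0, w, taus):
    """Apply Eq. (1) one interval at a time; returns the whole trajectory."""
    f = list(f0)
    path = [f]
    for tau in taus:
        f = [w * a + (1.0 - w) * t for a, t in zip(f, tau)]
        path.append(f)
    return path


def project_direct(f0, w, taus):
    """Jump from time 0 to time T using the unrolled form of Eq. (1)."""
    T = len(taus)
    return [
        w ** T * f0[j]
        + (1.0 - w) * sum(w ** (T - k) * taus[k - 1][j] for k in range(1, T + 1))
        for j in range(len(f0))
    ]


W, Z1 = 0.8, 0.2                      # inferred from the table, not quoted
Z2 = [0.70, 0.71, 0.72, 0.73, 0.74]   # z_2t for intervals 1..5
F0 = [0.85, 0.10, 0.05]               # fractions in U, J, A at time 0

taus = [tau_path(Z1, z2) for z2 in Z2]
stepwise = project_stepwise(F0, W, taus)
direct = project_direct(F0, W, taus)

for t, f in enumerate(stepwise):
    print(t, [round(x, 5) for x in f])
print("direct", [round(x, 5) for x in direct])
print("tau_5 ", [round(x, 5) for x in taus[-1]])
```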
A two-state model of urbanization
The time 0 stable rates are m _{RU} = .0071 and m _{UR} = .0635. Rates at time 16 and after follow from z _{16} = .9/.1 = 9, with ultimate stable rates m _{RU} = .0635 and m _{UR} = .0071.
In a two-state IL model, a constant w implies that (m _{RU} + m _{UR}) is fixed (cf. “The IL solution in the two-state model” section); hence, a change in one rate must be offset by an equal and opposite change in the other. Given the nature of the τ vector, a linear increase in m _{RU}, counterbalanced by a linear decrease in m _{UR}, produces a linear increase in the stable proportion urban and a corresponding decrease in the stable proportion rural. Between times 0 and 16, we therefore let the m _{RU} and m _{UR} schedules change linearly, though in opposite directions. The rate ratios follow from the rates, whose sum remains constant at 0.0706.
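The linearity claim can be made explicit. In a two-state model the stable composition balances the flows between the states, so the stable proportion urban is the rural-to-urban rate divided by the sum of the two rates. The symbol τ_{U,t} is notation introduced here for illustration. With the sum held at 0.0706:

```latex
\tau_{U,t} = \frac{m_{RU,t}}{m_{RU,t} + m_{UR,t}} = \frac{m_{RU,t}}{0.0706},
\qquad
\tau_{U,0} = \frac{0.0071}{0.0706} \approx 0.10, \quad
\tau_{U,8} = \frac{0.0353}{0.0706} = 0.50, \quad
\tau_{U,16} = \frac{0.0635}{0.0706} \approx 0.90 .
```

Because the denominator is constant, τ_{U,t} is a linear function of m _{RU}; the time 8 value uses the midpoint rate (0.0071 + 0.0635)/2 = 0.0353.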
At each time point, the model proportions urban and rural can be found from Eqs. (1–3). During the transition, the model proportions urban lag behind the implied stable proportions. At time 8, halfway through the behavioral shift, the implied stable proportion urban is 50%, while the model is 39% urban. After 80 years, when the ultimate rates are in place, the model population is 78% urban as opposed to the stable figure of 90%. With w = 0.7, the model population is close to stability after 125 years, with an urban proportion of 89.5%. A larger w would lead to a longer stabilization time, though the overall dynamics would be similar.
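The lag between the model and stable proportions can be traced with a few lines of code. The sketch below assumes the weighted-average IL form f_t = w f_{t–1} + (1 – w) τ_t, a population that starts at its stable composition, and 5-year intervals (so that time 16 corresponds to 80 years). The stable proportion urban is taken to rise linearly from 0.10 to 0.90. The function names are illustrative.

```python
W = 0.7                   # IL weight given in the text
START, END = 0.10, 0.90   # stable proportion urban before and after the shift
SHIFT_STEPS = 16          # rates change linearly between times 0 and 16
YEARS_PER_STEP = 5        # assumption: 80 years = 16 intervals


def stable_urban(t):
    """Implied stable proportion urban at time t (linear ramp, then flat)."""
    frac = min(t, SHIFT_STEPS) / SHIFT_STEPS
    return START + (END - START) * frac


def project_urban(steps, w=W):
    """Model proportion urban under f_t = w * f_{t-1} + (1 - w) * tau_t."""
    f = [stable_urban(0)]     # assumption: the population starts out stable
    for t in range(1, steps + 1):
        f.append(w * f[-1] + (1 - w) * stable_urban(t))
    return f


urban = project_urban(25)
for t in (8, 16, 25):
    print(t * YEARS_PER_STEP, "years:",
          round(stable_urban(t), 3), "stable vs", round(urban[t], 3), "model")
```

Calling `project_urban` with a different `w` shows how the weight changes the speed of convergence while leaving the shape of the trajectory similar.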
IL as an estimation method
To this point, the focus has been on how the Intrinsic Linkage-Rate Ratio approach can be used to find multistate population trajectories and their underlying behavioral rates. Now, we apply the same methodology to estimate rates of transfer when the populations at the beginning and end of an interval are known. The IL-RR approach adds to existing estimation procedures, such as IPF (iterative proportional fitting) (Bishop et al. 1975; Willekens 1982) and QERT (quadratic estimation of rates of transfer) (Schoen 2015).
The IL estimation procedure
The advantage of IL-RR is its ability to provide not (N–1) but 2(N–1) population-based constraints on the transfer rates. Hence, when parameter w and the populations at the beginning and end of an interval are known, 2(N–1) transfer rates can be found. Additional rates can be estimated if more information is available in any form, including rate ratios, rate products, cross-product ratios, or observed rates.
The estimation procedure is straightforward. The (N–1) IL equations, (N–1) eigenvector element equations, and (N–1) flow equations, along with any needed supplementary equations, are simultaneously solved for the (N–1) τ_{j} values and the transfer rates. Multiple solutions typically arise, and a demographically valid solution is not assured. If the number of rates to be found is less than 2(N–1), IL equations can be dropped or combined. The manner in which the IL equations are consolidated does not affect the rate estimations, as long as w is a subordinate root of the implied projection matrix. When R < 2(N–1), results may be quite sensitive to input values. If no demographically valid solution is found, a different value for parameter w should be considered.
where D _{12} = (n/2) (1 + w) [f _{2t} (1 – f _{3,t–1}) – f _{2,t–1} (1 – f _{3t})]. In the estimations, the effects of parameter w are no longer separable from compositional effects.
Evaluating the IL-RR rate estimates
To evaluate the IL-RR estimates relative to other methods, let us begin with the N = 2 case. In the two-state projection matrix, subordinate root λ_{2} is a constant and equals parameter w. The subordinate root of rate matrix M _{2}, i.e., r _{2}, equals –(m _{12} + m _{21}). It follows from Eq. (14) that r _{2}, and hence the sum of the rates, is fixed. In contrast to that constant sum constraint in IL-RR, the QERT estimating approach fixes the product of m _{12} and m _{21}.
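The constant sum constraint can be illustrated numerically. The sketch below assumes a linear (trapezoidal) flow relationship, (I – (n/2)M) x_t = (I + (n/2)M) x_{t–1}, to turn a two-state rate matrix into a projection matrix. That form is not shown in this section and is used here only as a working assumption. With 5-year intervals, also an assumption, it pairs w = 0.7 with a rate sum of 0.0706, the values used in the urbanization example. The function names are illustrative.

```python
def projection_matrix(m12, m21, n):
    """Two-state projection matrix under a linear (trapezoidal) flow assumption.

    Solves (I - (n/2) M) x_t = (I + (n/2) M) x_{t-1}, where
    M = [[-m12, m21], [m12, -m21]] is the rate matrix.
    """
    h = n / 2.0
    # A = I - h*M and B = I + h*M, written out element by element
    a11, a12, a21, a22 = 1 + h * m12, -h * m21, -h * m12, 1 + h * m21
    b11, b12, b21, b22 = 1 - h * m12, h * m21, h * m12, 1 - h * m21
    det = a11 * a22 - a12 * a21
    # Inverse of the 2x2 matrix A
    i11, i12, i21, i22 = a22 / det, -a12 / det, -a21 / det, a11 / det
    return [[i11 * b11 + i12 * b21, i11 * b12 + i12 * b22],
            [i21 * b11 + i22 * b21, i21 * b12 + i22 * b22]]


def subordinate_root(m12, m21, n):
    """Columns of the matrix sum to 1, so its roots are 1 and trace - 1."""
    p = projection_matrix(m12, m21, n)
    return p[0][0] + p[1][1] - 1.0


# Three rate pairs with the same sum (0.0706) but very different ratios
for m12, m21 in [(0.0071, 0.0635), (0.0353, 0.0353), (0.0635, 0.0071)]:
    print(m12, m21, round(subordinate_root(m12, m21, 5), 4))
```

Under this assumption the subordinate root works out to (1 – (n/2)s)/(1 + (n/2)s), where s = m _{12} + m _{21}, so fixing w fixes the sum of the rates but leaves their ratio free.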
In a rough nonlinear way, r _{2} and r _{3} are again constrained by the sum of the transfer rates.
In two-state models, the IL-RR sum constraint does not appear to perform quite as well as the product constraint in QERT or the constant cross-product ratio relationship in IPF. In a comparison paralleling Table 1 of Schoen (2015), the QERT and IPF approaches both had percent errors in estimating m _{21} of 0.5, –5.5, and –10.2. The corresponding IL-RR estimates had percent errors of –4.3, –7.8, and –10.7.
Objective comparisons are more difficult to make when there are more than two states. For example, in an N-state model with 2(N − 1) rates, IL-RR requires only one parameter, w, while QERT requires the values of (N–1) products and IPF needs values for (N − 1) cross-product ratios. When good estimates of the requisite parameters are available, QERT or IPF may have an advantage, but IL-RR and its single parameter may be useful in situations where R = 2(N – 1) or when information is more limited.
An illustrative IL-RR rate estimation: voting behavior
There are a number of areas of demographic interest where multistate population data are available from censuses or surveys, but where rates of interstate transfer are hard to obtain. Here, we look at one such area, voting behavior, which has long been of interest to social scientists. For some time, the U.S. Census Bureau, through the November Current Population Survey, has been asking adult citizens whether they have voted. Using those data, Land et al. (1986) constructed and analyzed two-state (voting/nonvoting) life tables for the U.S. Presidential elections of 1972, 1976, and 1980.
Here, we consider elections for U.S. President and members of Congress. There is a consistent pattern of higher voter turnout in Presidential election years (2004, 2008, etc.) than in years when there are only Congressional elections (2002, 2006, etc.). Assume that those who voted in a given Congressional election year also voted in the previous Presidential election. Then, data on voting in the immediately past Congressional election can be used to classify persons by three voting statuses: voted in the last (Congressional) election (L), did not vote in the last election but voted within the last 4 years (P), and has not voted in the last 4 years (N).
The four transfer rates are m _{LP}, m _{PL}, m _{PN}, and m _{NL}. With the states ordered (L, P, N), the dominant right eigenvector of the rate matrix, which gives the implied stable composition, is proportional to [(m _{PL} + m _{PN})/m _{LP}, 1, m _{PN}/m _{NL}]′. The two-step solution for the rates from the two IL, two τ_{j}, and two flow equations is straightforward. Numerically, the solution is unique, but not necessarily valid demographically.
Estimated rates of transfer between voting statuses in the USA, 2002–06 and 2006–10, from a three-state (triangular) model
Proportion in state | |||||
---|---|---|---|---|---|
Year | Voted in the last Congressional Election (L) | Did not vote in last election; voted within the last 4 years (P) | Has not voted in the last 4 years (N) | ||
2002 | .395 | .147 | .458 | ||
2006 | .404 | .197 | .399 | ||
2010 | .410 | .206 | .384 | ||
Transfer rates | m _{LP} | m _{PL} | m _{PN} | m _{NL} | |
2002–06 | .0598 | .0494 | .0169 | .0412 | |
2006–10 | .0493 | .0472 | .0412 | .0308 |
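The two-step solution for the voting model can be sketched in code. The example assumes the weighted-average IL form f_t = w f_{t–1} + (1 – w) τ_t, flow equations based on linear person-years, (n/2)(f_{t–1} + f_t), and the stable ratios N/P = m _{PN}/m _{NL} and L/P = (m _{PL} + m _{PN})/m _{LP}. The values w = 0.8 and n = 4 years are not stated in this section; they are inferred because they reproduce the tabulated rates. The function name `estimate_rates` is illustrative.

```python
W = 0.8        # assumption: inferred from Table 3, not stated in this section
N_YEARS = 4    # assumption: interval length between the survey years


def estimate_rates(f_start, f_end, w=W, n=N_YEARS):
    """Return (m_LP, m_PL, m_PN, m_NL) from proportions in (L, P, N)."""
    # Step 1: stable composition implied by the IL relation
    tau_l, tau_p, tau_n = [(e - w * s) / (1 - w) for s, e in zip(f_start, f_end)]
    a = tau_n / tau_p          # equals m_PN / m_NL
    b = tau_l / tau_p          # equals (m_PL + m_PN) / m_LP

    # Person-years lived in each state, assuming linear change in the interval
    py_l, py_p, py_n = [(n / 2) * (s + e) for s, e in zip(f_start, f_end)]
    d_l = f_end[0] - f_start[0]
    d_n = f_end[2] - f_start[2]

    # Step 2: flow equation for N, with m_PN = a * m_NL substituted
    m_nl = d_n / (a * py_p - py_n)
    m_pn = a * m_nl
    # Flow equation for L, with m_PL = b * m_LP - m_PN substituted
    m_lp = (d_l + m_pn * py_p - m_nl * py_n) / (b * py_p - py_l)
    m_pl = b * m_lp - m_pn
    return m_lp, m_pl, m_pn, m_nl


f2002 = (0.395, 0.147, 0.458)
f2006 = (0.404, 0.197, 0.399)
f2010 = (0.410, 0.206, 0.384)

print("2002-06:", [round(m, 4) for m in estimate_rates(f2002, f2006)])
print("2006-10:", [round(m, 4) for m in estimate_rates(f2006, f2010)])
```

Because the stable ratios and flow equations are linear in the rates once the τ values are known, the system has a single solution. Nothing in the algebra forces that solution to be positive, which is why demographic validity has to be checked separately.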
Summary and conclusions
Intrinsic Linkage-Rate Ratio models afford a new approach to multistate modeling, facilitating the projection of multistate populations with changing rates and the estimation of interstate transfer rates from adjacent population data. The IL-RR approach exploits the connections between model population composition, transfer rates, and the composition of the stable population implied by those rates. By definition, IL parameter w weights the compositions of the beginning of interval and ultimate stable populations to yield the end of interval population composition. Analysis shows that w is also a subordinate root of the associated population projection matrix.
In analyzing multistate models with N states, there are (N − 1) IL equations like Eq. (1) and (N − 1) flow equations like Eq. (5) that describe the movements between states. With R nonzero transfer rates, if parameter w and the initial population composition are known, (R − N + 1) additional constraints must be available to determine the end of interval composition and the R rates.
The (N − 1) independent elements of the dominant right eigenvector of the rate matrix, or the τ_{j} values, are functions of the rates. If R ≥ 2(N − 1) and the eigenvector elements are expressed in terms of rate functions, a “two-step” solution is possible. In step one, the population can immediately be projected as far into the future as the τ_{j} values extend because, under intrinsic linkage, the τ_{jt} values shape future population composition. In step two, the transfer rates are found. The IL-RR approach can determine 2(N − 1) transfer rates; if R = 2(N − 1), solutions can be found for models with any number of states. If R > 2(N − 1), the additional rates can be found using information from any available source. Simple ratios of rates can provide the needed constraints.
The effects of parameter w are separable from those of the τ_{j} values. Parameter w determines the level of all transfer rates, and hence reflects the population’s metabolism or speed of convergence to stability. When w = 0, the model immediately stabilizes, leading to a previously unrecognized phenomenon: a “dynamically stable” population (i.e., a changing rate population whose composition at every time point reflects the stable composition implied by the prevailing rates).
The IL-RR approach provides a new method for rate estimation when initial and end of interval populations are known. In N-state models, intrinsic linkage provides (N − 1) new constraints on the transfer rates. Thus, the IL and flow equations allow 2(N − 1) transfer rates to be determined from a single parameter, w. The rate estimates are sensitive to the choice of w, but a reasonable parameter value can usually be found from the nature of the model and the likely level of the rates.
The IL-RR approach opens new lines of research on multistate models with changing rates. Further analyses of IL-RR model relationships are in order, including the development of additional ties to graph theory. Future work can explicitly incorporate mortality and fertility and recognize multiple ages as well as states. An extension allowing parameter w to vary over time/age may enhance cohort analyses.
The present results show that intrinsic linkages embody important demographic relationships, connecting population composition, transfer rates, and the stable populations implied by those rates. In multistate contexts, they provide useful new tools for analytical projections and rate estimation.
Declarations
Acknowledgements
Assistance from Vladimir Canudas-Romo, Carl Boe, and Adam Lenart is gratefully acknowledged.
Ethics approval and consent to participate
Not applicable.
Competing interests
The author declares that he has no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
- Bishop, Y. M., Fienberg, S. E., & Holland, P. W. (1975). Discrete multivariate analysis: Theory and Practice. Cambridge: MIT Press.
- Caswell, H. (2001). Matrix population models (2d ed.). Sunderland: Sinauer.
- U.S. Census Bureau, 2002-2010. Current Population Reports, November, Washington DC.
- Chartrand, G., & Zhang, P. (2012). A first course in graph theory. Mineola: Dover.
- Franklin, J. N. (1968). Matrix theory. Englewood Cliffs: Prentice-Hall.
- Gantmacher, F. R. (1959). Matrix theory. New York: Chelsea Publishing Company.
- Land, K. C., & Rogers, A. (Eds.). (1982). Multidimensional mathematical demography. New York: Academic.
- Land, K. C., Hough, G. C., & McMillen, M. M. (1986). Voting status life tables for the United States, 1968-1980. Demography, 23, 381–402.
- Rebane, G., & Pearl, J. (1987). The recovery of causal poly-trees from statistical data. Seattle: Proceedings of the Third Annual Conference on Uncertainty in Artificial Intelligence.
- Rogers, A. (1975). Introduction to multiregional mathematical demography. New York: Wiley.
- Ryder, N. B. (1975). Notes on stationary populations. Population Index, 41, 3–28.
- Schoen, R. (1988). Modeling multigroup populations. New York: Plenum.
- Schoen, R. (2006). Dynamic population models. Dordrecht: Springer.
- Schoen, R. (2013). A dynamic birth-death model via Intrinsic Linkage. Demographic Research, 28, 995–1020.
- Schoen, R. (2014). Intrinsic linkages in dynamic multistate populations. Genus, 70, 57–73.
- Schoen, R. (2015). Multistate transfer rate estimation from adjacent populations. Population Research and Policy Review, 35(2016), 217–240.
- Schoen, R. (2016). Hierarchical multistate models from population data: an application to parity statuses. PeerJ, 4, e2535.
- Schoen, R., & Kim, Y. J. (1991). Movement toward stability as a fundamental principle of population dynamics. Demography, 28, 455–466.
- Willekens, F. J. (1982). Multistate population analysis with incomplete data. In K. C. Land & A. Rogers (Eds.), Multidimensional mathematical demography (pp. 43–111). New York: Academic.