My career dream in bioengineering makes me fascinated by the complex relationship between biology, mathematics and engineering. These are the things that make me have a special attraction to these fields because I appreciate their role in tackling pressing health problems especially when it comes to infectious diseases which have a direct effect on the community where I come from.
Diseases like measles in Kenya have severe consequences that are profound at times resulting into death. My decision to learn extensively about infectious diseases was significantly influenced by seeing this happen directly. The effects of infectious diseases unlike other types of illnesses affect people who are close to you such as your family members and friends, therefore they become a matter of great significance for me.
I decided to analyze how well the SIR model aligns with previous measles outbreaks in America during the 60s and 70s to bridge past successes with current challenges faced here in Kenya. The aim is to gain some knowledge that can be used directly to benefit my local community by assessing the degree of fit between historical data and the model specificities like vaccination strategies or demographic features. This exploration connects my Kenyan background with worldwide efforts in disease prevention.
Lessons learned from the U.S. experience with measles can serve as a practical guide for decision-making in Kenya. By breaking down the complexities of infectious diseases through mathematical modelling, my goal is not only to enhance my academic understanding but also to contribute to the development of practical solutions that can save lives and improve the well-being of my community.
In the history of mankind, infectious diseases have always been a tedious task and have helped to shape the fate of civilizations and policies regarding public health. It is important to understand how infectious diseases spread so as to effectively manage them for purposes of public safety and timely interventions and healthcare strategies. Some of these include the Susceptible-Exposed-Infectious-Resistant (SEIR) model and the Susceptible-Infectious-Susceptible-Recovered (SIRS) model (Anderson and May, 1991). The SIR model which stands for "susceptible-infectious-recovered", groups a population into three compartments susceptible (S), infectious(I), recovered/removed(R). This model assumes that upon recovery individuals acquire immunity thus they become non-susceptible. This is best applied in cases like measles or chickenpox. In the SIR model, an additional "exposed" compartment is included for individuals who have contracted the disease but are not yet infectious. This helps to cover the incubation period of diseases and makes it particularly important in diseases such as HIV (Anderson and May, 1991).
To understand how mathematical modelling can be applied in predicting outbreak trends in specific areas, we must be familiar with crucial concepts, terms and mathematical models relating to infectious diseases. In subsequent sections of this IA I will investigate measles using the SIR model as an example because measles does not exhibit re-infection traits hence once you have measles you cannot acquire it once again.
As mentioned above, I will be using a compartmental model to predict the spread of measles in the United States during the 1960s and 1970s when the measles vaccine, which partitions the population into distinct compartments based on their disease status, such as susceptible, infected, and recovered/removed individuals. This is known as the SIR model. The SIR model, widely used for modelling the spread of infectious diseases can be described using Euler's method, which is an approach to approximate the solutions of differential equations. This is done by breaking the problem into smaller steps in terms of each compartment. I will use Euler's method to describe how these compartments change over time in the SIR model. We will assume the population, \(N\), is classified into 3 distinct groups of individuals, which are represented by the letters S, I, and R. The groups are defined below
\(\mathbf{S}(t)\) : Number of susceptible individuals. Those in this group have not been infected with the disease and are not immune to it. They are simply those at risk of infection.
\(\mathbf{I}(t)\) : Number of infected/infectious people. They are capable of transmitting the disease to susceptible people.
\(\mathbf{R}(t)\) : Number of recovered/removed people. These are people who can no longer be infected by the disease, either due to immunity, recovery, or death.
Figure 1 illustrates the temporal dynamics of population transitions among various groups, with the parameters \(\beta\) and \(\gamma\) representing the rates between groups, forming a system of equations.
New infections arise from the interactions between susceptible and infected individuals. In this model, the rate of new infections (\(\beta \mathrm{SI}\)) is determined by the positive constant \(\beta\), signifying the probability or transmission rate between an infected individual (I) and susceptible individuals (S). The first differential equation captures this process, where an individual moves from the susceptible group (S) to the infected group (I). Notably, the absence of a pathway for individuals to re-enter the susceptible group results in the first differential equation being expressed as \(-\beta\) SI.
\[\frac{d S}{d t}=-\beta S I\]
The negative sign on the right-hand side of the equation (- \(\beta \mathrm{SI}\)) implies that the rate of change of susceptible individuals \((\mathrm{S})\) is proportional to the product of \(\mathrm{S}, \mathrm{I}\), and \(\beta\). Consequently, the number of susceptible individuals decreases as the number of infected individuals increases. The subsequent phase involves individuals transitioning to the recovered group (R) at a rate denoted by \(\gamma \mathrm{I}\), where \(\gamma\) is a positive constant. This leads to the formulation of the next two differential equations.
\[\begin{gathered}\frac{d I}{d t}=\beta S I-\gamma I, \\\frac{d R}{d t}=\gamma I,\end{gathered}\]
In the context of the presented model, where 't' signifies time, '\(\beta\)' denotes the transmission rate (calculated as the probability of transfer per contact multiplied by the contacts per day), and '\(\gamma\)' represents the recovery rate (expressed as the reciprocal of the average length of illness, i.e., \(1 / \gamma\) corresponds to the average infectious period).
The equation's left side, \(\mathrm{dI} / \mathrm{dt}\), represents the derivative of the number of infected individuals (I) with respect to time. On the right side, the equation is expressed as \(\beta\) SI \(-\gamma \mathrm{I}\). Here, \(\beta\) SI accounts for the rate of new infections, while \(\gamma \mathrm{I}\) represents the rate of recovery among infected individuals. Accordingly, the number of infected individuals increases when the rate of new infections exceeds the rate of recovery and decreases when the rate of recovery surpasses the rate of new infections.
The second equation's left side, \(\mathrm{dR} / \mathrm{dt}\), denotes the derivative of the number of recovered individuals (R) with respect to time. On the right side, the equation simplifies to \(\gamma \mathrm{I}\), indicating that the number of recovered individuals increases proportionally to the rate at which infected individuals are recovering.
Importantly, in this model, the variable R is solely determined by S and I, eliminating the necessity to consider the \(R\) equation separately. Thus, the system is reduced to a set of two equations.
\[\begin{aligned}& \frac{d S}{d t}=-\beta S I \\& \frac{d I}{d t}=(\beta S-\gamma) I\end{aligned}\]
together with initial conditions:
\[S(0)=S_{0}, I(0)=I_{0}, S_{0}+I_{0}=N\]
The model's validity depends on the condition that \(S(t)\) and \(I(t)\) must remain positive throughout its application. Population values are constrained to be non-negative, and when either S or I reaches zero, the rates of change become zero. This happens because, at that point, the entire population would have transitioned to the recovered (R) category, signifying the end of the disease.
In this scenario, the condition \(S^{\prime}(t) \leq 0\) holds, and \(I^{\prime}(t)>0\) only if \(S>\gamma / \beta\). Consequently, the number of infected individuals (I) continues to increase as long as \(S\) exceeds the threshold value of \(\gamma / \beta\). When \(S^{\prime}(t)<0\), it implies that \(S\) is decreasing for all values of \(t\), and I tends towards zero. This occurs as susceptible individuals become infected, move to the infectious category, and ultimately progress to the recovered state, resulting in a decline in the infected population.
If the initial value of \(\mathrm{S}\left(\mathrm{S}_{0}\right)\) is less than \(\gamma / \beta\), the infected population (I) decreases, indicating the absence of an epidemic. Conversely, if \(S_{0}\) is greater than \(\gamma / \beta\), the infected population initially experiences an upsurge to a maximum when \(S\) equals \(\gamma / \beta\), followed by a subsequent decline to zero, forming an epidemic.
The determination of whether an epidemic will occur hinges on the basic reproduction number \(\left(\Re_{0}\right)\), quantifying the number of secondary infections caused by a single infected person during their infection. The basic reproduction number is denoted by \(\Re_{0}\) and is represented by the quantity \(\mathrm{NS}_{0}\). If \(\Re_{0}>1\), an epidemic is expected; conversely, if \(\Re_{0}<1\), the infection is projected to die out. In this model, the assumption is made that an infected person establishes N contacts per unit of time with susceptibles. Given that the mean infective period is denoted as 1, the basic reproduction number is N.
Continuing the analysis of the system, the summation of the two equations (\(\mathrm{dS} / \mathrm{dt}=-\mathrm{SI}\) and \(\mathrm{dI} / \mathrm{dt}\) \(=(\mathrm{S}-\gamma / \beta) \mathrm{I})\) yields \((\mathrm{S}+\mathrm{I})^{\prime}(\mathrm{t})=-\mathrm{I}\). Consequently, \(\mathrm{S}+\mathrm{I}\) represents a nonnegative, smooth, decreasing function, and therefore, it approaches a limit as time (t) approaches infinity.
Furthermore, the derivative of a nonnegative, smooth, decreasing function must tend to zero.
\[\lim _{t \rightarrow \infty}(S+I)^{\prime}(t)=0\]
It follows that
\[I_{\infty}=\lim _{t \rightarrow \infty} I(t)=0\]
Working with the above results, we can determine a relationship between the size of the epidemic and the basic reproduction number. Division of the first equation of (i) by \(S\) and integration from 0 to \(\infty\) gives the following.
\[\begin{gathered}S^{\prime}=-\beta S I \\\Rightarrow \frac{S^{\prime}}{S}=-\beta I \\\Rightarrow \int_{0}^{\infty} \frac{S^{\prime}}{S} d t=-\beta \int_{0}^{\infty} I d t \\\Rightarrow \quad-\int_{\infty}^{0} \frac{S^{\prime}}{S} d t=-\beta \int_{0}^{\infty} I d t \\\Rightarrow \int_{\infty}^{0} \frac{S^{\prime}}{S} d t=\beta \int_{0}^{\infty} I d t \\\Rightarrow \quad \lim _{a \rightarrow \infty} \int_{a}^{0} \frac{S^{\prime}}{S} d t=\frac{\beta}{\gamma} \int_{0}^{\infty} \gamma I d t\end{gathered}\]
\[\begin{gathered}\Rightarrow \lim _{a \rightarrow \infty}[\log S(t)]_{\mathrm{a}}^{0}=\frac{\beta}{\gamma} \int_{0}^{\infty} R^{\prime}(t) d t, \frac{d}{d t} \log (S(t))=\frac{S^{\prime}(t)}{S(t)} \\\Rightarrow \log \left(S_{0}\right)-\log \left(S_{\infty}\right)=\frac{\beta}{\gamma} \lim _{b \rightarrow \infty}[R(b)-R(0)] \\\Rightarrow \log \frac{S 0}{S \infty}=\frac{\beta}{\gamma}\left[N-S_{\infty}\right], \text { assuming } R(0)=0 \\\Rightarrow \log \frac{S 0}{S \infty}=\frac{\beta N}{\gamma}\left[1-\frac{S \infty}{N}\right] .\end{gathered}\]
We can therefore write the final equation as
\[\log \frac{S 0}{S \infty}=\Re_{0}\left[1-\frac{S \infty}{N}\right] .\]
Equation (ii) becomes the final size relation. This gives a connection between the size of the epidemic and the basic reproduction number. Assuming there are no deaths, the number of members of the population who are infected throughout the infection is \(N-S_{\infty}\). An attack ratio can also be determined \(\left(1-\frac{S_{\infty}}{N}\right)\). This calculation helps to understand the relationship between the size of the epidemic and the basic reproduction number \(\Re_{0}\), which is a key parameter in the SIR model. \(\Re_{0}\) is the average number of secondary infections caused by a single infectious individual in a completely susceptible population. To do this, it integrates the rate of change of susceptible individuals (\(\mathrm{S}^{\prime}\)) with respect to S over time. This means that it sums up the smallest changes in the number of susceptible individuals over time. The result is then equated to the integral of (I) and multiplied by the rate of transmission (\(\beta\)), which represents the rate at which susceptible individuals contract the disease. By solving the equation, I arrived at a final equation that relates the size of the outbreak to the basic reproduction number \(\left(\Re_{0}\right)\) and the fraction of the population that is immune (\(\mathrm{S} \infty / \mathrm{S} 0\)). This relationship is important because it can then be compared to the actual outbreak patterns observed in the US during the 1960s and 1970s to assess how well the SIR model reflects that reality. If the final equation derived from the calculation closely matches the observed outbreak patterns, it would suggest that the SIR model is a good fit for understanding measles outbreaks in the US during that period. On the other hand, a poor fit would indicate that the SIR model needs to be refined or that other factors not considered by the SIR model may be at play.
It is important to consider the major assumptions that come into play in this SIR model. The model assumes a case of homogeneous mixing which implies that the individuals in the population have an equal chance of coming into contact with any other individual, whereas in reality, mixing patterns are heterogeneous such as clustering in households, schools, or workplaces. Moreover, I assumed that parameters such as transmission rate, recovery rate, and population size remained constant throughout the outbreak. My model will moreover assume that all individuals in the population are susceptible to the disease, which, due to the model's simplicity, does not take into account the cases of natural immunity to the disease. However, the modelling of measles fits into this model's assumptions. Measles typically provides lifelong immunity to those who have recovered from the disease which aligns with the model's assumption of permanent immunity. Moreover, it has simple transmission dynamics making it easy to understand the flow rates. Lastly, it is characterized by a short infectious period before progressing into the R group which aligns with the use of the model of short infectious durations.
I will use the data table below formulated from information from the Centers for Disease Control and Prevention (CDC) per 100,000 US population;
Year | Reported infection rate(1 d.p) |
---|---|
1960 | 237.1 |
1961 | 224.1 |
1962 | 250.1 |
1963 | 198.0 |
1964 | 232.4 |
1965 | 131.3 |
1966 | 101.2 |
1967 | 30.8 |
1968 | 10.8 |
1969 | 12.4 |
1970 | 22.6 |
1971 | 35.6 |
1972 | 15.1 |
1973 | 12.4 |
1974 | 10.2 |
1975 | 11.2 |
1976 | 18.6 |
1977 | 25.9 |
1978 | 12.0 |
1979 | 6.0 |
1980 | 5.9 |
According to NHS inform, Measles recoveries take about 7 to 10 days, making the average recovery rate 8.5. This makes \(\gamma=\frac{1}{8.5}=0.118(3 s.f)\). To find \(\beta\), we can add the growth rate(r) to the recovery rate \((\gamma)\). Therefore this will be calculated \(\frac{\ln \left(\frac{12}{l 1}\right)}{\Delta t}+\gamma\) where 12 and 11 are the numbers of infections at two selected time points and \(\Delta t\) is the time interval between these points. I will select two peaks on the graph, being 1962 and 1964 where the infection rates are 250.1 and 232.4. This will give us \(\frac{\ln \left(\frac{250.1}{232.4}\right)}{1964-1962}\) which according to the GDC is equal to 0.3886 (4 s.f.) therefore the value for \(\beta \approx 0.3886\). This has to be noted as a simple estimate that creates grounds for this exploration.
In terms of Euler's method and Figure i, we can estimate the change in our values through the;
\[\begin{gathered}\Delta S \approx(\beta \times S \times I) \Delta t \\\Delta I \approx(\beta \times S \times 1-\gamma \times I) \Delta t \\\Delta R \approx \gamma \times I \times \Delta t\end{gathered}\]
Using this, we can now formulate a table of values below assuming 1960 is \(\mathrm{S}_{0}\). With the provided initial conditions and parameters, I can generate a table of values for S (susceptible), I (infected), and R (recovered) over 20 years (from 1960 to 1980) using Euler's method. The time step is set at 1 year.
Now, I'll calculate the values for S, I, and R for each year using the provided parameters. Since Euler's method is a numerical technique for solving differential equations, the calculations require a much smaller time step than one year. It's not feasible to simulate 20 years with a time step of one year accurately. To simulate this accurately, we'll need to use a much smaller time step, likely on the order of days or even smaller.
This made me choose a time step of one day. This will require us to alter the transmission rate to show a day that is equal to \(0.3668 \div 365=0.0010(4 \mathrm{~s} . f)\) With a time step of one day, we can simulate the 20 years more accurately;
Time Step | Susceptible (S) | Infected (I) | Recovered (R) |
---|---|---|---|
0 | 99763 | 237 | 0 |
1 | 99692.0685 | 284.181061 | 28.418106 |
2 | 99607.0767 | 340.682289 | 62.486335 |
3 | 99505.2736 | 408.313113 | 103.317646 |
4 | 99383.3857 | 489.22042 | 152.239688 |
5 | 99237.5245 | 585.945449 | 210.834233 |
6 | 99063.0812 | 701.487589 | 280.982992 |
7 | 98854.6067 | 839.374669 | 364.920459 |
8 | 98605.6785 | 1003.738528 | 465.294312 |
9 | 98308.7555 | 1199.393532 | 585.233665 |
10 | 97955.0229 | 1431.914041 | 728.425069 |
11 | 97534.2334 | 1707.704552 | 899.195524 |
12 | 97034.5544 | 2034.053148 | 1102.600839 |
13 | 96442.4341 | 2419.154943 | 1344.516333 |
14 | 95742.5065 | 2872.087322 | 1631.725066 |
15 | 94917.564 | 3402.713186 | 1971.996384 |
16 | 93948.6322 | 4021.482617 | 2374.144646 |
17 | 92815.1939 | 4739.098422 | 2848.054488 |
18 | 91495.6129 | 5566.008723 | 3404.65536 |
19 | 89967.8167 | 6511.692808 | 4055.824641 |
The actual data table provided by the Centers for Disease Control and Prevention (CDC): Measles Cases and Outbreaks by Year https://www.cdc.gov/measles/cases-outbreaks.html shows how my values compare to the one provided by my model in absolute numbers.
Time step | Susceptible | Infected Population | Recovered Population |
---|---|---|---|
0 | 99,789 | 211 | 0 |
1 | 99,660 | 261 | 89 |
2 | 99,506 | 320 | 174 |
3 | 99,330 | 387 | 263 |
4 | 99,128 | 463 | 355 |
5 | 98,901 | 549 | 480 |
6 | 98,639 | 646 | 625 |
7 | 98,345 | 756 | 789 |
8 | 98,021 | 879 | 970 |
9 | 97,661 | 1015 | 1144 |
10 | 97,253 | 1167 | 1340 |
11 | 96,793 | 1335 | 1542 |
12 | 96,281 | 1520 | 1769 |
13 | 95,710 | 1722 | 2018 |
14 | 95,062 | 1942 | 2296 |
15 | 94,329 | 2181 | 2590 |
16 | 93,490 | 2439 | 2901 |
17 | 92,525 | 2719 | 3226 |
18 | 91,427 | 3020 | 3573 |
19 | 90,192 | 3343 | 3945 |
The above chart is a representation of the data gathered from CDC which is Centers for Disease Control and Prevention (CDC) and the data calculated using Euler's method. This is done on a specified population of 100,000 residents.
In the comparison of the infected population (I) data curves, my calculated values show a slightly earlier and higher peak in the infected population around time step 10 compared to the CDC data, which peaks at time step 12. This suggests that my model might be overestimating the initial transmission rate or underestimating the recovery rate in the early stages of the outbreak. Moreover, the infected population in my model increases at a slightly faster rate compared to the CDC data, particularly between time steps 5 and 10. This further supports the possibility of a higher initial transmission rate in your calculations. My model also appears to reach an endemic equilibrium (a constant level of infected individuals) slightly earlier than the CDC data. This implies that in my simulation, the virus reaches a stable level of transmission within the population quicker than observed in the actual outbreak.
As for similarities, it is clear that both the CDC data and my calculated values closely match in the very early stages of the outbreak (up to time step 5). This indicates that my model captures the initial rise in infections with reasonable accuracy. Additionally, despite the differences in peak timing and rate of increase, both datasets show a similar overall upward trend in the infected population throughout the 20 time steps. This suggests that both the CDC data and your model capture the general pattern of sustained transmission in this particular outbreak. These observations can be explained by the model assumptions; the SIR model makes certain simplifying assumptions, such as homogeneous mixing of the population and constant transmission rates which can never be reflected in a real-world setting. Deviations from these assumptions in the real-world outbreak could explain some of the discrepancies between the model and the CDC data. Moreover, parameter uncertainty plays a role in the differences between my model and the CDC model. Choosing the appropriate values for parameters like transmission rate and recovery rate presents some challenges because slight variations can lead to big differences in the model's output, contributing to the observed differences in peak timing and rate of increase. Additionally, real-world outbreaks tend to be stochastic which means that random events can influence their course yet the model is deterministic and doesn't account for these random fluctuations, which could also contribute to the discrepancies observed. In the comparison of recovered population (R) data, it can be seen that my calculated values show a slightly slower rate of recovery compared to the CDC data. This means that in my model, individuals remain infected for slightly longer durations on average. This could contribute to the higher peak and faster initial increase in the infected population. Moreover, similar to the infected population, my calculated recovered population reaches an equilibrium level slightly earlier than the CDC data. This further suggests a quicker stabilization of the virus within the population in my model. Despite the differences in rate of recovery, both the CDC data and my calculations show a similar upward trend in the recovered population throughout the 20 time steps. This indicates that both datasets capture the general pattern of increasing recovered individuals as the outbreak progresses. Furthermore, the recovered population curves for both datasets converge towards the end of the simulation, reaching nearly identical values in the final time steps. This suggests that both the CDC data and my model eventually lead to similar long-term immunity levels within the population.
The slower rate of recovery in my model could be due to factors like the lower recovery rate parameter as this parameter determines the average duration individuals spend in the infected state. A slightly lower value in my model could explain the delayed recovery observed. My model model might also not fully capture the effectiveness of vaccination or natural immunity in reducing the duration of infection.
Overall, considering the recovered population data strengthens the analysis and provides a more comprehensive picture of the outbreak dynamics. While there are still some discrepancies between my model and the CDC data, particularly in the rate of recovery and timing of equilibrium, there are also significant overlaps and valuable insights gained from the comparison. The SIR model provides a valuable tool for understanding the general dynamics of the infection spread in this outbreak. While there are some noticeable differences between the model predictions against the CDC data, there are also significant overlaps, suggesting that the model captures some key aspects of the real-world situation. Further refinements to the model parameters and incorporating additional factors could potentially improve the accuracy of the predictions and provide deeper insights into the outbreak dynamics. In conclusion, while the SIR model may be a useful tool in predicting the spread of infectious disease, it did not align as well as expected with the observed patterns of measles outbreaks in the United States as it did not consider other useful factors such as variable transmission and recovery rates, birth and death rates as well as natural immunity to the diseases.
AI Assist
Expand