Mathematics AI SL's Sample Internal Assessment

To what extent does the workforce size of U.S. Atomic Power Plants influence cancer rates among employees and retirees, categorized by age groups (30-45, 45-60, 60-75 years)?

7/7
Candidate Name: N/A
Candidate Number: N/A
Session: N/A
Word count: 1,946

Rationale

As an inquirer and a creative thinker, I have always aspired to contribute to society with the skills and knowledge I acquire. I believe that real-life experience is what genuinely motivates an investigation. I recently became aware of how harmful cancer can be when one of my neighbours was diagnosed with it. Because he works at a nuclear power plant, his doctors suspected that radiation leakage was one of the causes. This statement raised several questions in my mind. Does working in a nuclear power station cause cancer? Does the age of nuclear power plant employees increase their chance of developing cancer? To answer these questions, I read several research journals on cancer and medical science, which helped me understand the different causative agents of cancer.

 

Having understood several causes of cancer, I set out to explore the probability of developing cancer in relation to one significant parameter of a nuclear power plant: the number of working employees. To establish a correlation between the number of cancer cases in a nuclear power station and its total number of employees, I also researched different correlation coefficients that could justify the result. In the process, I learnt to use Pearson’s Correlation Coefficient, which is closely related to the regression correlation coefficient that I studied in the IB curriculum.

 

After this research, I arrived at the research question of this exploration, which aims to find whether a person working in a nuclear power plant with a larger number of employees has a greater chance of developing cancer than a person working in a plant with fewer employees.

Aim

The prime objective of this exploration is to derive a relationship between the chance of a worker at an Atomic Power Station being diagnosed with cancer and the total number of professionals working in that power station.

Research question

To what extent is there a correlation, for three age groups (Gr 1: 30 to 45 years, Gr 2: 45 to 60 years, and Gr 3: 60 to 75 years), between the number of workers diagnosed with cancer during their period of service or after retirement at different Atomic Power Plants in the United States of America and the total number of workers at each plant?

Background information

Atomic power plant

An atomic power plant uses nuclear fission to generate energy. Fission takes place in nuclear reactors, where the heat released is used to generate electricity. During the process, several types of radiation are emitted, such as α-rays, β-rays and γ-rays. Of these, γ-rays are the most penetrating and therefore the hardest to shield against. Although many precautions are taken in atomic power plants to prevent radiation leakage, cases of leakage have been observed, and these affect human life and the environment.

Regression correlation coefficient

The regression correlation coefficient indicates the strength of a correlation between a dependent variable and its corresponding independent variable. Its value lies between 0 and 1, where 1 denotes the maximum strength of correlation and 0 denotes no correlation. The regression correlation coefficient for a linear trend is given by:

 

\(r^2=\bigg[\frac{n\big(\sum xy\big)-(\sum x)(\sum y)}{\sqrt{\big[n\sum x^2-\big(\sum x\big)^2\big]\big[n\sum y^2-\big(\sum y\big)^2\big]}}\bigg]^2\)

 

x = independent variable

 

y = dependent variable

 

\(r^2\) = regression correlation coefficient

 

n = number of observations
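The formula above can be sketched in a few lines of Python. This is a minimal illustration rather than the method used to produce the results in this exploration; the function name `r_squared` and the toy data are hypothetical and are not taken from the collected data.

```python
from math import sqrt


def r_squared(x, y):
    """Regression correlation coefficient r^2 from the raw-sum formula."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return (numerator / denominator) ** 2


# Hypothetical toy data, used only to show the call.
toy_x = [1, 2, 3, 4, 5]
toy_y = [2, 4, 5, 4, 5]
print(round(r_squared(toy_x, toy_y), 4))  # 0.6
```

A perfectly linear data set such as x = 1, 2, 3 and y = 2, 4, 6 gives a value of 1, the maximum strength described above.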

Pearson’s correlation coefficient

Pearson’s correlation coefficient indicates both the strength and the direction of a correlation between a dependent variable and its corresponding independent variable. Its value lies between -1 and 1, where ±1 denotes the maximum strength of correlation and 0 denotes no correlation. A positive value of Pearson’s Coefficient signifies that the relationship is increasing in nature, and a negative value indicates that the relationship is decreasing in nature. Pearson’s correlation coefficient for a linear trend is given by:

 

\(R=\frac{\sum(x-\bar x)(y-\bar y)}{\sqrt{\sum(x-\bar x)^2\times\sum(y-\bar y)^2}}\)

 

x = independent variable

 

y = dependent variable

 

R = Pearson's correlation coefficient

 

\(\bar x\) = mean value of all observations of the independent variable

 

\(\bar y\) = mean value of all observations of the dependent variable
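The two coefficients are closely linked. As a short worked step, using only \(\bar x=\frac{\sum x}{n}\) and \(\bar y=\frac{\sum y}{n}\), the deviation sums in Pearson's formula can be rewritten in terms of the raw sums used in the regression formula:

\(\sum(x-\bar x)(y-\bar y)=\sum xy-\frac{(\sum x)(\sum y)}{n}\)

\(\sum(x-\bar x)^2=\sum x^2-\frac{(\sum x)^2}{n}\)

\(\sum(y-\bar y)^2=\sum y^2-\frac{(\sum y)^2}{n}\)

Substituting these into the formula for R and multiplying the numerator and the denominator by n gives

\(R=\frac{n\sum xy-(\sum x)(\sum y)}{\sqrt{\big[n\sum x^2-(\sum x)^2\big]\big[n\sum y^2-(\sum y)^2\big]}}\)

so, for a linear trend, \(R^2\) should equal the regression correlation coefficient \(r^2\).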

Exploration methodology

In this exploration, ten central atomic power stations in the United States of America were chosen. For each station, the total number of employees who are currently working or have worked there was collected for the three age groups mentioned in the research question, together with the number of those workers diagnosed with cancer during their tenure of service or after retirement (listed as "Infected" in the data tables). To check the consistency of the collected data, the percentage of infected employees was calculated for each power station. The number of infected employees in each age group was then plotted against the total number of employees of the corresponding power station. To verify the correlation, the regression correlation coefficient and Pearson's correlation coefficient were calculated, and the correlation was evaluated using a T-Test.

Hypothesis

Null hypothesis

It is assumed that there is no correlation between the number of employees diagnosed with cancer during their period of service or after retirement at different Nuclear Power Plants in the United States of America and the total number of employees working at each plant.

Alternate hypothesis

It is assumed that there is a correlation between the number of employees diagnosed with cancer during their period of service or after retirement at different Nuclear Power Plants in the United States of America and the total number of employees working at each plant.

Data collection

Case 1 for group 1 (30 years to 45 years)

Data table -

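| Name | Total | Infected | Percentage (%) |
| --- | --- | --- | --- |
| Rochester City Project | 328 | 33 | 10.06 |
| Chicago City Project | 348 | 36 | 10.34 |
| San Diego City Project | 386 | 42 | 10.88 |
| Newark City Project | 452 | 72 | 15.93 |
| Texas City Project | 458 | 53 | 11.57 |
| Dayton City Project | 673 | 88 | 13.08 |
| Virginia City Project | 724 | 102 | 14.09 |
| Utah City Project | 977 | 177 | 18.12 |
| Boston City Project | 1563 | 301 | 19.26 |
| Austin City Project | 3874 | 878 | 22.66 |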

Figure 1 - Table On Total No. Of Employees Vs. No. Of Employees Infected (Gr1: 30 – 45 Years)

Sample Calculation:

 

Percentage of infected workers in the Rochester City Project

 

\(= \frac{33}{328} \times 100 = 10.06\%\)
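The same sample calculation can be repeated for every power station in Group 1. The snippet below is a minimal sketch under the assumption that each percentage is the infected count divided by the total, times 100, rounded to two decimal places; the names `group1` and `infected_percentage` are hypothetical.

```python
# Group 1 (30 to 45 years): (power station, total employees, infected employees)
group1 = [
    ("Rochester City Project", 328, 33),
    ("Chicago City Project", 348, 36),
    ("San Diego City Project", 386, 42),
    ("Newark City Project", 452, 72),
    ("Texas City Project", 458, 53),
    ("Dayton City Project", 673, 88),
    ("Virginia City Project", 724, 102),
    ("Utah City Project", 977, 177),
    ("Boston City Project", 1563, 301),
    ("Austin City Project", 3874, 878),
]


def infected_percentage(infected, total):
    """Percentage of infected employees, rounded to two decimal places."""
    return round(100 * infected / total, 2)


for name, total, infected in group1:
    print(f"{name}: {infected_percentage(infected, total):.2f}%")
```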

 

Graphical Analysis:

Figure 2 - No Of Worker Infected Versus Total No. Of Employees (GR1: 30 - 45 Years)

Analysis of graph 1

The above graph shows the relationship between the number of employees aged between 30 and 45 who were diagnosed with cancer during their tenure of service at different Nuclear Power Plants in the USA and the total number of employees at those plants. The total number of employees working in each power plant, being the independent variable of the exploration, is plotted along the X-Axis. The number of those employees diagnosed with cancer, being the dependent variable of the investigation, is plotted along the Y-Axis. As the total number of employees increases from 328 to 3874, the number of individuals diagnosed with cancer increases from 33 to 878. Hence, an increasing linear trend has been obtained in the graph: the more workers a power plant has, the more of its employees are diagnosed with cancer. The equation of the trend line obtained in the graph is shown below:

 

y = 0.2386x - 54.366

 

Here, x represents the total number of employees working in a power plant, and y represents the number of those employees diagnosed with cancer.
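A trend line of this kind can be estimated by ordinary least squares. The sketch below assumes that the graphing tool fitted a least-squares line, which is not stated in the exploration; the function name `fit_line` is hypothetical, and the data are the Group 1 totals and infected counts.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for the line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Group 1 data: total employees (x) and infected employees (y)
total_employees = [328, 348, 386, 452, 458, 673, 724, 977, 1563, 3874]
infected_employees = [33, 36, 42, 72, 53, 88, 102, 177, 301, 878]

slope, intercept = fit_line(total_employees, infected_employees)
print(f"y = {slope:.4f}x + ({intercept:.3f})")
```

The slope obtained this way should land close to the 0.2386 of the trend line; the intercept may differ slightly from the graph's value depending on the tool and the rounding used.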

 

Despite the very high regression coefficient of 0.99, the data set itself calls the reliability of the correlation into question, because there is a large gap in the total number of employees working in the nuclear power plant (independent variable) between roughly 1600 and 3800. As no values of the dependent variable are available for that range of the independent variable, the correlation cannot be considered fully reliable.

 

Calculation of Regression Coefficient -

In the processed data table, the total number of employees working in a nuclear power plant is denoted by x, the number of employees diagnosed with cancer is denoted by y, and Σ denotes the summation.

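| x | y | \(x^2\) | \(y^2\) | xy |
| --- | --- | --- | --- | --- |
| 328 | 33 | 107584 | 1089 | 10824 |
| 348 | 36 | 121104 | 1296 | 12528 |
| 386 | 42 | 148996 | 1764 | 16212 |
| 452 | 72 | 204304 | 5184 | 32544 |
| 458 | 53 | 209764 | 2809 | 24274 |
| 673 | 88 | 452929 | 7744 | 59224 |
| 724 | 102 | 524176 | 10404 | 73848 |
| 977 | 177 | 954529 | 31329 | 172929 |
| 1563 | 301 | 2442969 | 90601 | 470463 |
| 3874 | 878 | 15007876 | 770884 | 3401372 |
| Σx = 9783 | Σy = 1782 | \(Σx^2\) = 20174231 | \(Σy^2\) = 923104 | Σxy = 4274218 |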

Figure 3 - Table On Processed Data For Calculation Of R2 For Group 1

Calculation:

 

\(r^2=\bigg[\frac{n(Σxy)-(Σx)(Σy)}{\sqrt{[nΣx^2-(Σx)^2][nΣy^2-(Σy)^2]}}\bigg]^2\)

 

\(=>r^2=\bigg[\frac{10(4274218)-(9783)(1782)}{\sqrt{[10×20174231-(9783)^2][10×923104-(1782)^2]}}\bigg]^2\)

 

\(=>r^2=(0.9987)^2=0.9975\)
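The intermediate arithmetic, written out from the sums in Figure 3, may help in checking the substitution:

\(n(Σxy)-(Σx)(Σy)=42742180-17433306=25308874\)

\(nΣx^2-(Σx)^2=201742310-95707089=106035221\)

\(nΣy^2-(Σy)^2=9231040-3175524=6055516\)

\(r^2=\bigg[\frac{25308874}{\sqrt{106035221×6055516}}\bigg]^2\)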

 

Calculation of Pearson’s Correlation Coefficient -

In the processed data table, the total number of employees working in a nuclear power plant is denoted by x, the number of employees diagnosed with cancer is denoted by y, \(\bar x\) denotes the average number of workers in a nuclear power plant, \(\bar y\) denotes the average number of workers diagnosed with cancer, and Σ denotes the summation.

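| x | y | \(x-\bar x\) | \(y-\bar y\) | \((x-\bar x)(y-\bar y)\) | \((x-\bar x)^2\) | \((y-\bar y)^2\) |
| --- | --- | --- | --- | --- | --- | --- |
| 328 | 33 | -650.30 | -145.20 | 94423.56 | 422890.09 | 21083.04 |
| 348 | 36 | -630.30 | -142.20 | 89628.66 | 397278.09 | 20220.84 |
| 386 | 42 | -592.30 | -136.20 | 80671.26 | 350819.29 | 18550.44 |
| 452 | 72 | -526.30 | -106.20 | 55893.06 | 276991.69 | 11278.44 |
| 458 | 53 | -520.30 | -125.20 | 65141.56 | 270712.09 | 15675.04 |
| 673 | 88 | -305.30 | -90.20 | 27538.06 | 93208.09 | 8136.04 |
| 724 | 102 | -254.30 | -76.20 | 19377.66 | 64668.49 | 5806.44 |
| 977 | 177 | -1.30 | -1.20 | 1.56 | 1.69 | 1.44 |
| 1563 | 301 | 584.70 | 122.80 | 71801.16 | 341874.09 | 15079.84 |
| 3874 | 878 | 2895.70 | 699.80 | 2026410.86 | 8385078.49 | 489720.04 |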

Figure 4 - Table On Processed Data Table For Calculation Of Pearson’s Correlation Coefficient For Group 1

Calculation -

 

\(\bar x=\frac{Σx}{N}=\frac{9783}{10}=978.3\)

 

\(\bar y=\frac{Σy}{N}=\frac{1782}{10}=178.2\)

 

\(Σ(x-\bar x)(y-\bar y)=2530887.40\)

 

\(Σ(x-\bar x)^2=10603522.10\)

 

\(Σ(y-\bar y)^2=605551.60\)

 

\(R=\frac{Σ(x-\bar x)(y-\bar y)}{\sqrt{Σ(x-\bar x)^2×Σ(y-\bar y)^2}}\)

 

\(R=\frac{2530887.40}{\sqrt{10603522.10×605551.60}}=0.998\)
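The deviation sums and the resulting coefficient can be cross-checked with a short script. This is a minimal sketch with hypothetical names (`pearson_r`, `plant_totals`, `plant_infected`) that follows the deviation form of Pearson's formula using the Group 1 data.

```python
from math import sqrt

# Group 1 data: total employees (x) and infected employees (y)
plant_totals = [328, 348, 386, 452, 458, 673, 724, 977, 1563, 3874]
plant_infected = [33, 36, 42, 72, 53, 88, 102, 177, 301, 878]


def pearson_r(x, y):
    """Return Pearson's R together with the three deviation sums."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    s_xy = sum(p * q for p, q in zip(dx, dy))  # sum of (x - mean_x)(y - mean_y)
    s_xx = sum(p * p for p in dx)              # sum of (x - mean_x)^2
    s_yy = sum(q * q for q in dy)              # sum of (y - mean_y)^2
    return s_xy / sqrt(s_xx * s_yy), s_xy, s_xx, s_yy


R, s_xy, s_xx, s_yy = pearson_r(plant_totals, plant_infected)
print(f"sum of products:       {s_xy:.2f}")
print(f"sum of x deviations^2: {s_xx:.2f}")
print(f"sum of y deviations^2: {s_yy:.2f}")
print(f"R = {R:.3f}")
```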

 

Evaluation by T – Test -

In the calculation shown below, the total number of employees working in a nuclear power plant is denoted by x, and the number of employees diagnosed with cancer is denoted by y. \(\bar x\) denotes the average number of workers in a nuclear power plant, \(\bar y\) denotes the average number of workers diagnosed with cancer, \(n_x\) represents the number of observations of the total number of employees (independent variable), \(n_y\) represents the number of observations of cancer-diagnosed employees (dependent variable), and \(S^2\) is an estimator of the pooled variance, which is defined as follows:

 

\(S^2=\frac{Σ(x-\bar x)^2+Σ(x-\bar y)^2}{n_x+n_y-2}\)

 

The mathematical formulation of T – Value is also shown below:

 

\(T\ value=\frac{|\bar x-\bar y|}{\sqrt{\frac{S^2}{n_x}+\frac{S^2}{n_y}}}\)

 

For calculation of T – Value required for this test, Table 1 has been followed:

 

\(\bar x=\frac{9783}{10}=978.3\)

 

\(\bar y=\frac{1782}{10}=178.2\)

 

\(S^2=\frac{Σ(x-\bar x)^2+Σ(x-\bar y)^2}{n_x+n_y-2}\)

 

\(=\frac{(328-978.3)^2+...+(3874-978.3)^2+(328-178.2)^2+...+(3874-178.2)^2}{10+10-2}\)

 

= 1533813.57

 

\(T\ value=\frac{|978.3-178.2|}{\sqrt{\frac{1533813.57}{10}+\frac{1533813.57}{10}}}=\frac{800.1}{553.86}=1.44\)
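The arithmetic of this T-value can be retraced with a short script. The sketch below first follows the calculation exactly as it is written above, where both sums run over the x values. The second variant, which uses the deviations of y about \(\bar y\), is the conventional pooled-variance form; it is an addition for comparison only and is not the calculation used above. All variable names are hypothetical.

```python
from math import sqrt

# Group 1 data: total employees (x) and infected employees (y)
x = [328, 348, 386, 452, 458, 673, 724, 977, 1563, 3874]
y = [33, 36, 42, 72, 53, 88, 102, 177, 301, 878]

n_x, n_y = len(x), len(y)
mean_x = sum(x) / n_x
mean_y = sum(y) / n_y
ss_x = sum((a - mean_x) ** 2 for a in x)  # sum of (x - mean_x)^2


def t_value(s2):
    """T-value for a given variance estimate s2."""
    return abs(mean_x - mean_y) / sqrt(s2 / n_x + s2 / n_y)


# As written in the worked calculation: the second sum also runs over x,
# taken about mean_y.
s2_as_written = (ss_x + sum((a - mean_y) ** 2 for a in x)) / (n_x + n_y - 2)
t_as_written = t_value(s2_as_written)

# Assumption (not in the original calculation): conventional pooled variance,
# with the second sum taken over y about mean_y.
s2_pooled = (ss_x + sum((b - mean_y) ** 2 for b in y)) / (n_x + n_y - 2)
t_pooled = t_value(s2_pooled)

print(f"as written:   S^2 = {s2_as_written:.2f}, T = {t_as_written:.2f}")
print(f"conventional: S^2 = {s2_pooled:.2f}, T = {t_pooled:.2f}")
```

The first line of output should reproduce the values of \(S^2\) and the T-value quoted in the calculation; the second line shows how the result would shift under the conventional definition.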

 

Comparing the T-Value with the critical values in the T-Table, it can be stated that the Alternate Hypothesis is true.

Case 2: for group 2 (45 years to 60 years)

Data Table:

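| Name | Total | Infected | Percentage (%) |
| --- | --- | --- | --- |
| Rochester City Project | 333 | 36 | 10.81 |
| Chicago City Project | 344 | 38 | 11.05 |
| San Diego City Project | 378 | 57 | 15.08 |
| Newark City Project | 462 | 99 | 21.43 |
| Texas City Project | 486 | 102 | 20.99 |
| Dayton City Project | 620 | 114 | 18.39 |
| Virginia City Project | 797 | 144 | 18.07 |
| Utah City Project | 971 | 160 | 16.48 |
| Boston City Project | 1497 | 297 | 19.84 |
| Austin City Project | 3388 | 790 | 23.32 |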

Figure 5 - Table On Total No. Of Employees Vs. No. Of Employees Infected (Gr2: 45 – 60 Years)

Sample Calculation:

 

Refer to the Sample Calculation shown for Table No. 1.

 

Graphical Analysis: