Mathematics AI SL
Sample Internal Assessment

Table of content
Rationale
Aim
Research question
Background information
Hypothesis
Data collection
Conclusion
Reflection
Bibliography

To what extent does the workforce size of U.S. Atomic Power Plants influence cancer rates among employees and retirees, categorized by age groups (30-45, 45-60, 60-75 years)?

Candidate Name: N/A
Candidate Number: N/A
Session: N/A
Personal Code: N/A
Word count: 1,946

Rationale

As an inquirer and a creative thinker, I have always aspired to contribute to society with the skills and knowledge I acquire. I believe that real-life experience is what genuinely motivates an inquiry. I recently became aware of cancer, one of the most harmful diseases, when one of my neighbours was diagnosed with it. As he worked at a nuclear power plant, his doctors suggested that radiation leakage was one possible reason behind the cancer. The doctors' statement raised several questions in my mind. Does working in a nuclear power station cause cancer? Does the age of nuclear power plant employees increase their chance of getting cancer? To answer these questions, I did some research. I read several research journals on cancer and medical science, which helped me understand the different causative agents of cancer.

 

Having understood several causes of cancer, I have tried to explore the probability of getting infected by cancer based on one of the most significant parameters of a nuclear power plant, i.e., the number of working employees. To derive a correlation between the chance of getting infected in a nuclear power station and the total number of employees, I have also researched different correlation coefficients to justify the derived correlation. In the process, I have learnt to use Pearson's Correlation Coefficient, which extends the regression correlation coefficient that I studied in the IB curriculum.

 

After all of this research, I arrived at the research question of this exploration, which asks whether the chance of getting infected by cancer is higher for a person working in a nuclear power plant with a larger number of employees than for a person working in a plant with fewer employees.

Aim

The prime objective of this exploration is to derive a relationship between the chance of a worker at an Atomic Power Station getting infected by cancer and the total number of professionals working in that power station.

Research question

To what extent is there a correlation for three different age groups of individuals (Gr 1: 30 years to 45 years, Gr 2: 45 years to 60 years, and Gr 3: 60 years to 75 years) between the number of workers getting infected by Cancer during the period of their service as well as after retirement from job in different Atomic Power Plants in the United States of America and the total number of workers working in the Atomic Power Plant?

Background information

Atomic power plant

An atomic power plant uses the process of nuclear fission to generate energy. Fission takes place in nuclear reactors, where the heat generated is used to produce electricity. During the process, several types of radiation, such as α-rays, β-rays, γ-rays and many more, are emitted. Amongst the mentioned rays, the most harmful radiation is the γ-ray. Although many precautions are taken in atomic power plants to prevent leakage of radiation, cases of radiation leakage are observed, which invariably affect human life and the environment.

Regression correlation coefficient

The regression correlation coefficient provides information about the strength of any obtained correlation between a dependent variable and its corresponding independent variable. The magnitude of the coefficient lies between 0 and 1. Here, the maximum strength of the correlation is denoted by 1, whereas minimum strength or no correlation is represented by 0. The mathematical formulation of the regression correlation coefficient for a linear trend is shown below:

 

\(r^2=\bigg[\frac{n\big(\sum xy\big)-(\sum x)(\sum y)}{\sqrt{[n\sum x^2-\big(\sum x\big)^2][n\sum y^2-\big(\sum y\big)^2]}}\bigg]^2\)

 

x = independent variable

y = dependent variable

\(r^2\) = regression correlation coefficient

n = number of observations
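
The formula above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `r_squared` and the toy data are assumptions made for the example and are not part of the exploration's data set.

```python
from math import sqrt

def r_squared(xs, ys):
    """Squared correlation coefficient computed from the raw-sum formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return (numerator / denominator) ** 2

# Hypothetical toy data (not from this exploration): y = 2x + 1 exactly.
toy_x = [1, 2, 3, 4]
toy_y = [3, 5, 7, 9]
perfect = r_squared(toy_x, toy_y)
print(perfect)
```

Because the toy points lie exactly on a straight line, the coefficient takes its maximum value of 1.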

Pearson’s correlation coefficient

Pearson’s correlation coefficient provides information about the strength and the nature of any obtained correlation between a dependent variable and its corresponding independent variable. The value of the coefficient lies between -1 and 1. Here, the maximum strength of the correlation is denoted by the value of ±1, whereas minimum strength or no correlation is represented by 0. A positive value of Pearson’s coefficient signifies that the relationship is increasing in nature, and a negative value indicates that the relationship is decreasing in nature. The mathematical formulation of Pearson’s correlation coefficient for a linear trend is shown below:

 

\(R=\frac{\sum(x-\bar x)(y-\bar y)}{\sqrt{\sum(x-\bar x)^2\times\sum(y-\bar y)^2}}\)

 

x = independent variable

 

y = dependent variable

 

R = Pearson's correlation coefficient

 

\(\bar x\) = mean value of all observations of the independent variable

 

\(\bar y\) = mean value of all observations of the dependent variable
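
A minimal sketch of the deviation form of Pearson's coefficient is given here to show how the sign reflects the direction of the relationship. The function name `pearson_r` and the toy data are assumptions introduced for illustration only.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's R computed from deviations about the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical toy data (not from this exploration): y falls as x rises.
decreasing = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
increasing = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
print(decreasing, increasing)
```

A perfectly decreasing line gives R = -1 and a perfectly increasing line gives R = +1, which is the extra information Pearson's coefficient carries compared with the squared coefficient.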

Exploration methodology

In this exploration, ten central atomic power stations in the United States of America are chosen. The total number of employees who currently work or have worked in each organisation has been collected for three different age groups, as mentioned in the research question. The total number of workers infected by cancer during their tenure of service or after retirement has been collected for each age group and each atomic power station. To verify the consistency of the collected data, the percentage of infected employees has been calculated for each nuclear power station. Finally, the number of infected employees of each age group in each power station has been plotted against the total number of employees who work or have worked in the corresponding power station. To verify the correlation, the regression correlation coefficient and Pearson's correlation coefficient have been calculated, and the correlation is evaluated using a T-Test.

Hypothesis

Null hypothesis

It is assumed that no correlation is obtained between the number of employees getting infected by Cancer during the period of their service as well as after retirement from the job in different Nuclear Power Plants in the United States of America and the total number of employees working in the Nuclear Power Plant.

Alternate hypothesis

It is assumed that a correlation is obtained between the number of employees getting infected by Cancer during the period of their service as well as after retirement from the job in different Nuclear Power Plants in the United States of America and the total number of employees working in the Nuclear Power Plant.

Data collection

Case 1: for group 1 (30 years to 45 years)

Data table:

| Name | Total | Infected | Percentage (%) |
| --- | --- | --- | --- |
| Rochester City Project | 328 | 33 | 10.06 |
| Chicago City Project | 348 | 36 | 10.34 |
| San Diego City Project | 386 | 42 | 10.88 |
| Newark City Project | 452 | 72 | 15.93 |
| Texas City Project | 458 | 53 | 11.57 |
| Dayton City Project | 673 | 88 | 13.08 |
| Virginia City Project | 724 | 102 | 14.09 |
| Utah City Project | 977 | 177 | 18.12 |
| Boston City Project | 1563 | 301 | 19.26 |
| Austin City Project | 3874 | 878 | 22.66 |
Figure 1 - Table On Total No. Of Employees Vs. No. Of Employees Infected (Gr1: 30 – 45 Years)

Sample Calculation:

 

Percentage of Infected Workers in Rochester City Project = \(\frac{33}{328}\times100\) = 10.06%
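
The same calculation can be repeated for every plant in Group 1. The sketch below assumes the totals and infected counts from the Group 1 data table; the variable names are chosen for illustration.

```python
# (total employees, infected employees) for Group 1, taken from the data table.
group1 = {
    "Rochester City Project": (328, 33),
    "Chicago City Project": (348, 36),
    "San Diego City Project": (386, 42),
    "Newark City Project": (452, 72),
    "Texas City Project": (458, 53),
    "Dayton City Project": (673, 88),
    "Virginia City Project": (724, 102),
    "Utah City Project": (977, 177),
    "Boston City Project": (1563, 301),
    "Austin City Project": (3874, 878),
}

# Percentage infected = infected / total * 100, rounded to two decimals.
percentages = {
    name: round(100 * infected / total, 2)
    for name, (total, infected) in group1.items()
}

for name, pct in percentages.items():
    print(f"{name}: {pct}%")
```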

 

Graphical Analysis:

Figure 2 - No Of Worker Infected Versus Total No. Of Employees (GR1: 30 - 45 Years)

Analysis of graph 1

The above graph represents the relationship between the total number of employees and the number of employees aged between 30 and 45 who are infected by cancer during their tenure of service at different Nuclear Power Plants in the USA. The total number of employees working in the various power plants, being the independent variable of the exploration, is plotted along the X-Axis. The number of cancer-infected employees out of the total working employees, being the dependent variable of the investigation, is plotted along the Y-Axis. As the total number of employees working in a power plant increases from 328 to 3874, the number of individuals infected by cancer increases from 33 to 878. Hence, an increasing linear trend has been obtained in the graph, i.e., with an increase in the number of workers in each power plant, the number of employees getting infected by cancer increases. The equation of the trend obtained in the graph is shown below:

 

y = 0.2386x - 54.366

 

Here, x represents the total number of employees working in different power plants, and y represents cancer infected employees out of the entire working employees.
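
A trend line of this kind is commonly obtained by least-squares regression. The sketch below assumes that method (the source does not state how the graph's trendline was produced) and applies it to the Group 1 data, so the fitted values may differ slightly from those read off the graph.

```python
# Group 1 data: total employees (x) and infected employees (y).
total = [328, 348, 386, 452, 458, 673, 724, 977, 1563, 3874]
infected = [33, 36, 42, 72, 53, 88, 102, 177, 301, 878]

n = len(total)
mean_x = sum(total) / n
mean_y = sum(infected) / n

# Least-squares slope and intercept for y = slope * x + intercept.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(total, infected))
sxx = sum((x - mean_x) ** 2 for x in total)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(f"y = {slope:.4f}x + ({intercept:.3f})")
```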

 

Despite having a very high value of the regression coefficient of 0.99, the data set itself questions the correlation's reliability because there is a vast gap in the total number of employees working in the nuclear power plant (independent variable) between 1600 and 3800. As the dependent variable's values for the corresponding range of independent variable are not available, the correlation cannot be said to be reliable.
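
One way to probe the concern about the gap is to recompute the correlation with the largest plant left out. This is a hedged sensitivity sketch, not a step the exploration itself performs; the helper `pearson_r` is an illustrative name.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's R from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Group 1 data, ordered by plant size; the last entry is the largest plant.
total = [328, 348, 386, 452, 458, 673, 724, 977, 1563, 3874]
infected = [33, 36, 42, 72, 53, 88, 102, 177, 301, 878]

r_all = pearson_r(total, infected)
r_without_largest = pearson_r(total[:-1], infected[:-1])

print(f"R with all ten plants:       {r_all:.4f}")
print(f"R without the largest plant: {r_without_largest:.4f}")
```

Comparing the two values shows how much of the reported correlation depends on the single observation beyond the gap.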

 

Calculation of Regression Coefficient:

In the processed data table, the total number of employees working in the nuclear power plant is denoted by x, the number of employees infected by cancer is denoted by y, and Σ denotes summation.

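| x | y | \(x^2\) | \(y^2\) | xy |
| --- | --- | --- | --- | --- |
| 328 | 33 | 107584 | 1089 | 10824 |
| 348 | 36 | 121104 | 1296 | 12528 |
| 386 | 42 | 148996 | 1764 | 16212 |
| 452 | 72 | 204304 | 5184 | 32544 |
| 458 | 53 | 209764 | 2809 | 24274 |
| 673 | 88 | 452929 | 7744 | 59224 |
| 724 | 102 | 524176 | 10404 | 73848 |
| 977 | 177 | 954529 | 31329 | 172929 |
| 1563 | 301 | 2442969 | 90601 | 470463 |
| 3874 | 878 | 15007876 | 770884 | 3401372 |
| Σx = 9783 | Σy = 1782 | Σ\(x^2\) = 20174231 | Σ\(y^2\) = 923104 | Σxy = 4274218 |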

Figure 3 - Table On Processed Data For Calculation Of R2 For Group 1

Calculation:

 

\(r^2=\bigg[\frac{n(Σxy)-(Σx)(Σy)}{\sqrt{[nΣx^2-(Σx)^2][nΣy^2-(Σy)^2]}}\bigg]^2\)

\(=>r^2=\bigg[\frac{10(4274218)-(9783)(1782)}{\sqrt{[10×20174231-(9783)^2][10×923104-(1782)^2]}}\bigg]^2\)

\(=>r^2=(0.9988)^2=0.9976\)
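
The substitution can be unpacked term by term. The following worked check uses only the column sums from the processed data table, with the final values rounded to four decimal places:

\(n(Σxy)-(Σx)(Σy)=42742180-17433306=25308874\)

\(nΣx^2-(Σx)^2=201742310-95707089=106035221\)

\(nΣy^2-(Σy)^2=9231040-3175524=6055516\)

\(r=\frac{25308874}{\sqrt{106035221×6055516}}\approx0.9988,\quad r^2\approx0.9976\)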

 

Calculation of Pearson’s Correlation Coefficient:

In the processed data table, the total number of employees working in the nuclear power plant is denoted by x, the number of employees infected by cancer is denoted by y, \(\bar x\) denotes the average number of workers in a nuclear power plant, \(\bar y\) denotes the average number of workers infected by cancer, and Σ denotes summation.

| x | y | \(x-\bar x\) | \(y-\bar y\) | \((x-\bar x)(y-\bar y)\) | \((x-\bar x)^2\) | \((y-\bar y)^2\) |
| --- | --- | --- | --- | --- | --- | --- |
| 328 | 33 | -650.30 | -145.20 | 94423.56 | 422890.09 | 21083.04 |
| 348 | 36 | -630.30 | -142.20 | 89628.66 | 397278.09 | 20220.84 |
| 386 | 42 | -592.30 | -136.20 | 80671.26 | 350819.29 | 18550.44 |
| 452 | 72 | -526.30 | -106.20 | 55893.06 | 276991.69 | 11278.44 |
| 458 | 53 | -520.30 | -125.20 | 65141.56 | 270712.09 | 15675.04 |
| 673 | 88 | -305.30 | -90.20 | 27538.06 | 93208.09 | 8136.04 |
| 724 | 102 | -254.30 | -76.20 | 19377.66 | 64668.49 | 5806.44 |
| 977 | 177 | -1.30 | -1.20 | 1.56 | 1.69 | 1.44 |
| 1563 | 301 | 584.70 | 122.80 | 71801.16 | 341874.09 | 15079.84 |
| 3874 | 878 | 2895.70 | 699.80 | 2026410.86 | 8385078.49 | 489720.04 |
Figure 4 - Table On Processed Data Table For Calculation Of Pearson’s Correlation Coefficient For Group 1

Calculation:

 

\(\bar x=\frac{Σx}{N}=\frac{9783}{10}=978.3\)

 

\(\bar y=\frac{Σy}{N}=\frac{1782}{10}=178.2\)

 

\(Σ(x-\bar x)(y-\bar y)=2530887.40\)

 

\(Σ(x-\bar x)^2=10603522.10\)

 

\(Σ(y-\bar y)^2=605551.60\)

 

\(R=\frac{Σ(x-\bar x)(y-\bar y)}{\sqrt{Σ(x-\bar x)^2×Σ(y-\bar y)^2}}\)

 

\(R=\frac{2530887.40}{\sqrt{10603522.10×605551.60}}=0.998\)
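
The raw-sum formula used for the regression coefficient and the deviation formula used for Pearson's coefficient describe the same quantity, so the square of R should match \(r^2\). The sketch below checks this numerically on the Group 1 data; the variable names are illustrative assumptions.

```python
from math import sqrt

total = [328, 348, 386, 452, 458, 673, 724, 977, 1563, 3874]
infected = [33, 36, 42, 72, 53, 88, 102, 177, 301, 878]
n = len(total)

# Raw-sum form (regression correlation coefficient, squared).
sum_x, sum_y = sum(total), sum(infected)
sum_xy = sum(x * y for x, y in zip(total, infected))
sum_x2 = sum(x * x for x in total)
sum_y2 = sum(y * y for y in infected)
r2_from_sums = (
    (n * sum_xy - sum_x * sum_y)
    / sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
) ** 2

# Deviation form (Pearson's R).
mean_x, mean_y = sum_x / n, sum_y / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(total, infected))
sxx = sum((x - mean_x) ** 2 for x in total)
syy = sum((y - mean_y) ** 2 for y in infected)
pearson = sxy / sqrt(sxx * syy)

print(f"r^2 from sums: {r2_from_sums:.4f}")
print(f"R squared:     {pearson ** 2:.4f}")
```

Since the two forms agree, the practical benefit of Pearson's R over \(r^2\) is that it keeps the sign of the relationship.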

 

Evaluation by T – Test:

In the calculation shown below, the total number of employees working in the nuclear power plant is denoted by x, and the number of employees infected by cancer is denoted by y; \(\bar x\) denotes the average number of workers in a nuclear power plant, \(\bar y\) denotes the average number of workers infected by cancer, \(n_x\) represents the number of observations of the total number of working employees (independent variable), \(n_y\) represents the number of observations of cancer-infected employees (dependent variable), and \(S^2\) is an estimator of the pooled variance, defined as follows:

\(S^2=\frac{Σ(x-\bar x)^2+Σ(x-\bar y)^2}{n_x+n_y-2}\)

 

The mathematical formulation of T – Value is also shown below:

 

\(T\ value=\frac{|\bar x-\bar y|}{\sqrt{\frac{S^2}{n_x}+\frac{S^2}{n_y}}}\)

 

For calculation of T – Value required for this test, Table 1 has been followed:

 

\(\bar x=\frac{9783}{10}=978.3\)

 

\(\bar y=\frac{1782}{10}=178.2\)

 

\(S^2=\frac{Σ(x-\bar x)^2+Σ(x-\bar y)^2}{n_x+n_y-2}\)

\(=\frac{(328-978.3)^2+...+(3874-978.3)^2+(328-178.2)^2+...+(3874-178.2)^2}{10+10-2}\)

= 1533813.57

 

\(T\ value=\frac{|978.3-178.2|}{\sqrt{\frac{1533813.57}{10}+\frac{1533813.57}{10}}}=\frac{800.1}{553.86}=1.44\)

 

Comparing the T – Value with respect to the values in T – Table, it can be stated that the Alternate Hypothesis is true.
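
The T value above can be reproduced with a short script that follows the formulas exactly as they are written in this section. The second part of the sketch is an addition, not the exploration's method: it shows the test statistic that is more commonly used for the significance of a correlation coefficient, \(t=\frac{r\sqrt{n-2}}{\sqrt{1-r^2}}\), with the helper name `correlation_t` chosen for illustration.

```python
from math import sqrt

total = [328, 348, 386, 452, 458, 673, 724, 977, 1563, 3874]
infected = [33, 36, 42, 72, 53, 88, 102, 177, 301, 878]
n_x, n_y = len(total), len(infected)
mean_x = sum(total) / n_x
mean_y = sum(infected) / n_y

# Pooled term as written in the text: both sums run over the x values.
s_squared = (
    sum((x - mean_x) ** 2 for x in total)
    + sum((x - mean_y) ** 2 for x in total)
) / (n_x + n_y - 2)
t_value = abs(mean_x - mean_y) / sqrt(s_squared / n_x + s_squared / n_y)
print(f"S^2 = {s_squared:.2f}, T value = {t_value:.2f}")

# Alternative (assumption, not the author's method): t statistic for a
# correlation coefficient r based on n paired observations.
def correlation_t(r, n):
    return r * sqrt(n - 2) / sqrt(1 - r * r)

t_corr = correlation_t(0.998, n_x)  # R = 0.998 as reported for Group 1
print(f"t for the correlation = {t_corr:.1f}")
```

For the alternative statistic, the value would be compared with a t-table entry for n - 2 = 8 degrees of freedom, which is about 2.306 at the two-tailed 5% level.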

Case 2: for group 2 (45 years to 60 years)

Data Table:

| Name | Total | Infected | Percentage (%) |
| --- | --- | --- | --- |
| Rochester City Project | 333 | 36 | 10.81 |
| Chicago City Project | 344 | 38 | 11.05 |
| San Diego City Project | 378 | 57 | 15.08 |
| Newark City Project | 462 | 99 | 21.43 |
| Texas City Project | 486 | 102 | 20.99 |
| Dayton City Project | 620 | 114 | 18.39 |
| Virginia City Project | 797 | 144 | 18.07 |
| Utah City Project | 971 | 160 | 16.48 |
| Boston City Project | 1497 | 297 | 19.84 |
| Austin City Project | 3388 | 790 | 23.32 |
Figure 5 - Table On Total No. Of Employees Vs. No. Of Employees Infected (Gr2: 45 – 60 Years)

Sample Calculation:

 

Refer to the Sample Calculation shown for Table No. 1.

 

Graphical Analysis:

Figure 6 - Total No. Of Employees Vs. No. Of Employees Infected (Gr2: 45 – 60 Years)

Analysis of Graph 2:

The above graph represents the relationship between the total number of employees and the number of employees aged between 45 and 60 who are infected by cancer during their tenure of service at different Nuclear Power Plants in the USA. The total number of employees working in the various power plants, being the independent variable of the exploration, is plotted along the X-Axis. The number of cancer-infected employees out of the total working employees, being the dependent variable of the investigation, is plotted along the Y-Axis. As the total number of employees working in a power plant increases from 333 to 3388, the number of individuals infected by cancer increases from 36 to 790. Hence, an increasing linear trend has been obtained in the graph, i.e., with an increase in the number of workers in each power plant, the number of employees getting infected by cancer increases. The equation of the trend obtained in the graph is shown below:

y = 0.2401x - 38.697

Here, x represents the total number of employees working in different power plants, and y represents cancer infected employees out of the entire working employees.

 

Despite having a very high value of the regression coefficient of 0.99, the data set itself questions the correlation's reliability because there is a vast gap in the total number of employees working in the nuclear power plant (independent variable) between 1500 and 3400. As the dependent variable's values for the corresponding range of independent variable are not available, the correlation cannot be said to be reliable.

 

Calculation of Regression Coefficient:

In the processed data table, the total number of employees working in the nuclear power plant is denoted by x, the number of employees infected by cancer is denoted by y, and Σ denotes summation.

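| x | y | \(x^2\) | \(y^2\) | xy |
| --- | --- | --- | --- | --- |
| 333 | 36 | 110889 | 1296 | 11988 |
| 344 | 38 | 118336 | 1444 | 13072 |
| 378 | 57 | 142884 | 3249 | 21546 |
| 462 | 99 | 213444 | 9801 | 45738 |
| 486 | 102 | 236196 | 10404 | 49572 |
| 620 | 114 | 384400 | 12996 | 70680 |
| 797 | 144 | 635209 | 20736 | 114768 |
| 971 | 160 | 942841 | 25600 | 155360 |
| 1497 | 297 | 2241009 | 88209 | 444609 |
| 3388 | 790 | 11478544 | 624100 | 2676520 |
| Σx = 9276 | Σy = 1837 | Σ\(x^2\) = 16503752 | Σ\(y^2\) = 797835 | Σxy = 3603853 |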

Figure 7 - Table On Processed Data For Calculation Of r2 For Group 2

Calculation:

 

\(r^2\) = 0.9929

 

For calculation, refer to the calculation of regression coefficient as shown in Case 1.

 

Calculation of Pearson’s Correlation Coefficient:

In the processed data table, the total number of employees working in the nuclear power plant is denoted by x, the number of employees infected by cancer is denoted by y, \(\bar x\) denotes the average number of workers in a nuclear power plant, \(\bar y\) denotes the average number of workers infected by cancer, and Σ denotes summation.

| x | y | \(x-\bar x\) | \(y-\bar y\) | \((x-\bar x)(y-\bar y)\) | \((x-\bar x)^2\) | \((y-\bar y)^2\) |
| --- | --- | --- | --- | --- | --- | --- |
| 333 | 36 | -594.6 | -147.7 | 87822.42 | 353549.16 | 21815.29 |
| 344 | 38 | -583.6 | -145.7 | 85030.52 | 340588.96 | 21228.49 |
| 378 | 57 | -549.6 | -126.7 | 69634.32 | 302060.16 | 16052.89 |
| 462 | 99 | -465.6 | -84.7 | 39436.32 | 216783.36 | 7174.09 |
| 486 | 102 | -441.6 | -81.7 | 36078.72 | 195010.56 | 6674.89 |
| 620 | 114 | -307.6 | -69.7 | 21439.72 | 94617.76 | 4858.09 |
| 797 | 144 | -130.6 | -39.7 | 5184.82 | 17056.36 | 1576.09 |
| 971 | 160 | 43.4 | -23.7 | -1028.58 | 1883.56 | 561.69 |
| 1497 | 297 | 569.4 | 113.3 | 64513.02 | 324216.36 | 12836.89 |
| 3388 | 790 | 2460.4 | 606.3 | 1491740.52 | 6053568.16 | 367599.69 |
Figure 8 - Table On Processed Data Table For Calculation Of Pearson’s Correlation Coefficient For Group 2

Calculation:

 

R = 0.996

 

For calculation, refer to the calculation of Pearson’s coefficient shown for Case 1.

 

Evaluation by T – Test:

In the calculation shown below, the total number of employees working in the nuclear power plant is denoted by x, and the number of employees infected by cancer is denoted by y; \(\bar x\) denotes the average number of workers in a nuclear power plant, \(\bar y\) denotes the average number of workers infected by cancer, \(n_x\) represents the number of observations of the total number of working employees (independent variable), \(n_y\) represents the number of observations of cancer-infected employees (dependent variable), and \(S^2\) is an estimator of the pooled variance, defined as follows:

\(S^2=\frac{Σ(x-\bar x)^2+Σ(x-\bar y)^2}{n_x+n_y-2}\)

 

The mathematical formulation of T – Value is also shown below:

 

\(T\ value=\frac{|\bar x-\bar y|}{\sqrt{\frac{S^2}{n_x}+\frac{S^2}{n_y}}}\)

 

For calculation of T – Value required for this test, Table 4 has been followed:

 

T - value = 1.45

 

For calculation, refer to the calculation of T – value as shown in Case 1.

 

Comparing the T – Value with respect to the values in T – Table, it can be stated that the Alternate Hypothesis is true.

Case 3: for group 3 (60 years to 75 years)

Data Table:

| Name | Total | Infected | Percentage (%) |
| --- | --- | --- | --- |
| Rochester City Project | 290 | 45 | 15.52 |
| Chicago City Project | 299 | 53 | 17.73 |
| San Diego City Project | 302 | 66 | 21.85 |
| Newark City Project | 402 | 135 | 33.58 |
| Texas City Project | 435 | 137 | 31.49 |
| Dayton City Project | 544 | 106 | 19.49 |
| Virginia City Project | 643 | 188 | 29.24 |
| Utah City Project | 878 | 191 | 21.75 |
| Boston City Project | 1271 | 399 | 31.39 |
| Austin City Project | 2893 | 983 | 33.98 |
Figure 9 - Table On Total No. Of Employees Vs. No. Of Employees Infected (Gr3: 60 – 75 Years)

Sample Calculation:

 

Refer to the Sample Calculation shown for Table No. 1.

 

Graphical Analysis:

Figure 10 - Total No. Of Employees Vs. No. Of Employees Infected (Gr3: 60 – 75 Years)

Analysis of Graph 3:

The above graph represents the relationship between the total number of employees and the number of employees aged between 60 and 75 who are infected by cancer during their tenure of service at different Nuclear Power Plants in the USA. The total number of employees working in the various power plants, being the independent variable of the exploration, is plotted along the X-Axis. The number of cancer-infected employees out of the total working employees, being the dependent variable of the investigation, is plotted along the Y-Axis. As the total number of employees working in a power plant increases from 290 to 2893, the number of individuals infected by cancer increases from 45 to 983. Hence, an increasing linear trend has been obtained in the graph, i.e., with an increase in the number of workers in each power plant, the number of employees getting infected by cancer increases. The equation of the trend obtained in the graph is shown below:

 

y = 0.352x - 50.422

 

Here, x represents the total number of employees working in different power plants, and y represents cancer infected employees out of the entire working employees.

 

Despite having a very high value of the regression coefficient of 0.98, the data set itself questions the correlation's reliability. There is a vast gap in the total number of employees working in the nuclear power plant (independent variable) between 1400 and 2700. As the dependent variable's values for the corresponding range of the independent variable are not available, the correlation cannot be said to be reliable.

 

Calculation of Regression Coefficient:

In the processed data table, the total number of employees working in the nuclear power plant is denoted by x, the number of employees infected by cancer is denoted by y, and Σ denotes summation.

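| x | y | \(x^2\) | \(y^2\) | xy |
| --- | --- | --- | --- | --- |
| 290 | 45 | 84100 | 2025 | 13050 |
| 299 | 53 | 89401 | 2809 | 15847 |
| 302 | 66 | 91204 | 4356 | 19932 |
| 402 | 135 | 161604 | 18225 | 54270 |
| 435 | 137 | 189225 | 18769 | 59595 |
| 544 | 106 | 295936 | 11236 | 57664 |
| 643 | 188 | 413449 | 35344 | 120884 |
| 878 | 191 | 770884 | 36481 | 167698 |
| 1271 | 399 | 1615441 | 159201 | 507129 |
| 2893 | 983 | 8369449 | 966289 | 2843819 |
| Σx = 7957 | Σy = 2303 | Σ\(x^2\) = 12080693 | Σ\(y^2\) = 1254735 | Σxy = 3859888 |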

Figure 11 - Table On Processed Data For Calculation Of r2 For Group 3

Calculation:

 

\(r^2\) = 0.987

 

For calculation, refer to the calculation of regression coefficient as shown in Case 1.

 

Calculation of Pearson’s Correlation Coefficient:

In the processed data table, the total number of employees working in the nuclear power plant is denoted by x, the number of employees infected by cancer is denoted by y, \(\bar x\) denotes the average number of workers in a nuclear power plant, \(\bar y\) denotes the average number of workers infected by cancer, and Σ denotes summation.

| x | y | \(x-\bar x\) | \(y-\bar y\) | \((x-\bar x)(y-\bar y)\) | \((x-\bar x)^2\) | \((y-\bar y)^2\) |
| --- | --- | --- | --- | --- | --- | --- |
| 290 | 45 | -505.70 | -185.30 | 93706.21 | 255732.49 | 34336.09 |
| 299 | 53 | -496.70 | -177.30 | 88064.91 | 246710.89 | 31435.29 |
| 302 | 66 | -493.70 | -164.30 | 81114.91 | 243739.69 | 26994.49 |
| 402 | 135 | -393.70 | -95.30 | 37519.61 | 154999.69 | 9082.09 |
| 435 | 137 | -360.70 | -93.30 | 33653.31 | 130104.49 | 8704.89 |
| 544 | 106 | -251.70 | -124.30 | 31286.31 | 63352.89 | 15450.49 |
| 643 | 188 | -152.70 | -42.30 | 6459.21 | 23317.29 | 1789.29 |
| 878 | 191 | 82.30 | -39.30 | -3234.39 | 6773.29 | 1544.49 |
| 1271 | 399 | 475.30 | 168.70 | 80183.11 | 225910.09 | 28459.69 |
| 2893 | 983 | 2097.30 | 752.70 | 1578637.71 | 4398667.29 | 566557.29 |
Figure 12 - Table On Processed Data Table For Calculation Of Pearson’s Correlation Coefficient For Group 3

Calculation:

 

R = 0.993

 

For calculation, refer to the calculation of Pearson’s coefficient as shown in Case 1.

 

Evaluation by T – Test:

In the calculation shown below, the total number of employees working in the nuclear power plant is denoted by x, and the number of employees infected by cancer is denoted by y; \(\bar x\) denotes the average number of workers in a nuclear power plant, \(\bar y\) denotes the average number of workers infected by cancer, \(n_x\) represents the number of observations of the total number of working employees (independent variable), \(n_y\) represents the number of observations of cancer-infected employees (dependent variable), and \(S^2\) is an estimator of the pooled variance, defined as follows:

\(S^2=\frac{Σ(x-\bar x)^2+Σ(x-\bar y)^2}{n_x+n_y-2}\)

The mathematical formulation of T – Value is also shown below:

\(T\ value=\frac{|\bar x-\bar y|}{\sqrt{\frac{S^2}{n_x}+\frac{S^2}{n_y}}}\)

 

For calculation of T Value required for this test, Table 4 has been followed:

 

T - value = 1.43

 

For calculation, refer to the calculation of T – value as shown in Case 1.

 

Comparing the T – Value with respect to the values in T – Table, it can be stated that the Alternate Hypothesis is true.

Conclusion

To what extent is there a correlation for three different age groups of individuals (Gr 1: 30 years to 45 years, Gr 2: 45 years to 60 years, and Gr 3: 60 years to 75 years) between the number of employees getting infected by Cancer during the period of their service as well as after retirement from job in different Nuclear Power Plants in the United States of America and the total number of employees working in the Nuclear Power Plant?

 

A linear and increasing trend has been obtained between the number of employees getting infected by Cancer during the period of their service as well as after retirement from their job in different Nuclear Power Plants in the United States of America and the total number of employees working in the Nuclear Power Plant for all three age groups.

 

  • For Group 1, as the total number of employees working in a power plant increases from 328 to 3874, the number of individuals infected by cancer increases from 33 to 878.
  • The equation of trend for Group 1 is expressed as: y = 0.2386x - 54.366, where x represents the total number of employees working in different power plants, and y represents cancer infected employees out of the total working employees.
  • As the regression coefficient (≈ 0.99) and Pearson’s correlation coefficient (≈ 0.99) for Group 1 are both very close to 1, the correlation can be stated to be existent and valid.
  • Alternate Hypothesis has been established for Group 1 using T – Test.
  • For Group 2, as the total number of employees working in a power plant increases from 333 to 3388, the number of individuals infected by cancer increases from 36 to 790.
  • The equation of trend for Group 2 is expressed as: y = 0.2401x - 38.697, where x represents the total number of employees working in different power plants, and y represents cancer infected employees out of the total working employees.
  • As the regression coefficient (≈ 0.99) and Pearson’s correlation coefficient (≈ 0.99) for Group 2 are both very close to 1, the correlation can be stated to be existent and valid.
  • Alternate Hypothesis has been established for Group 2 using T – Test.
  • For Group 3, as the total number of employees working in a power plant increases from 290 to 2893, the number of individuals infected by cancer increases from 45 to 983.
  • The equation of trend for Group 3 is expressed as: y = 0.352x - 50.422, where x represents the total number of employees working in different power plants, and y represents cancer infected employees out of the total working employees.
  • As the regression coefficient (≈ 0.98) and Pearson’s correlation coefficient (≈ 0.99) for Group 3 are both very close to 1, the correlation can be stated to be existent and valid.
  • Alternate Hypothesis has been established for Group 3 using T – Test.
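
The per-group results summarised above can be recomputed side by side from the three data tables. The sketch below assumes a least-squares fit and Pearson's formula; the names `groups` and `fit_and_correlate` are illustrative, and the fitted coefficients may differ in the last digits from the trendlines read off the graphs.

```python
from math import sqrt

# (total employees, infected employees) per plant, from the three data tables.
groups = {
    "Group 1 (30-45)": ([328, 348, 386, 452, 458, 673, 724, 977, 1563, 3874],
                        [33, 36, 42, 72, 53, 88, 102, 177, 301, 878]),
    "Group 2 (45-60)": ([333, 344, 378, 462, 486, 620, 797, 971, 1497, 3388],
                        [36, 38, 57, 99, 102, 114, 144, 160, 297, 790]),
    "Group 3 (60-75)": ([290, 299, 302, 402, 435, 544, 643, 878, 1271, 2893],
                        [45, 53, 66, 135, 137, 106, 188, 191, 399, 983]),
}

def fit_and_correlate(xs, ys):
    """Return (slope, intercept, Pearson R) for paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy / sqrt(sxx * syy)

results = {name: fit_and_correlate(xs, ys) for name, (xs, ys) in groups.items()}
for name, (slope, intercept, r) in results.items():
    print(f"{name}: y = {slope:.4f}x + ({intercept:.3f}), R = {r:.3f}, R^2 = {r * r:.3f}")
```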

Reflection

Strength

  • The use of two different correlation coefficients in the mathematical exploration has supported the validity of the correlation. Moreover, Pearson’s coefficient has enabled the investigation to conclude mathematically the nature of the correlation (increasing or decreasing).
  • Age groups have been formed with an equal interval of 15 years. This has been useful in maintaining regularity throughout the exploration.
  • Calculating the percentage of infected individuals has enabled the exploration to verify the data's reliability. In this exploration, as the number of employees working in different power plants could vary significantly based on the size of the unit, a standard deviation would not indicate the reliability of the data for each age group.
  • Apart from the graphical derivation, the T-Test has mathematically confirmed the correlation, which strengthens the correlation and hence the exploration.

Weakness

  • The data on the total number of employees and cancer-infected employees were gathered from different sources such as news articles, newspaper surveys, official websites of various nuclear power plants, and many more. Although the data were collected from authentic sources, their reliability cannot be determined.

Future scope

  • Cancer is one of the very few diseases which cannot be claimed to be cured completely. As a result, mathematics can be used to determine the relationship between the chance of getting infected by cancer and the presence of different causative agents. Hence, the same methodology as followed in this exploration could be repeated to explore the effect of other causative agents of cancer. Thus, another research question could be framed as follows: “To what extent is there a correlation for three different age groups of individuals (Gr 1: 30 years to 45 years, Gr 2: 45 years to 60 years, and Gr 3: 60 years to 75 years) between the number of traffic police employees getting infected by Cancer during the period of their service as well as after retirement from their job in cities in India and the carbon dioxide index of the atmosphere in the respective cities?”

Bibliography