   Mathematics AI SL
Sample Internal Assessment

Table of contents
Rationale
Aim
Research question
Introduction
Hypothesis
Data collection
Calculation of correlation coefficient
Evaluation of hypothesis
Conclusion
Reflection
Bibliography

# To what extent is there a correlation between the total number of employees working in nuclear power plants and the number of employees getting infected by cancer for three age groups?
Candidate Name: N/A
Candidate Number: N/A
Session: N/A
Personal Code: N/A
Word count: 0

## Rationale

"Although September 11 was horrible, it didn't threaten the survival of the human race, like nuclear weapons do." – Stephen Hawking. Despite the heinous acts of terrorism that happen across the globe, the destructive power of the nuclear bomb has left a scar and a fear in every individual in this world.

Since childhood, I have been hearing about the devastation that happened in Hiroshima and Nagasaki in 1945, marking the end of World War II. The destructive capacity of nuclear energy has been clear to me since my early school days.

My picture of nuclear power began to change during secondary education, when I studied nuclear energy as a non-conventional source of energy. Despite initial doubts, which arose from the childhood stories of catastrophe, I learned that nuclear energy is a face of change in providing energy to mankind. The amount of energy that a nuclear reaction can produce is unparalleled by any other source of energy.

With passing days, the course curriculum became more intense and I started learning the concepts of nuclear energy in depth. About two years ago, in Physics, I studied the nuclear fission reaction, which generates nuclear energy. Over time, I felt a deep inclination towards the subject. Because of the highly constructive facility that it provides to mankind, I thought of pursuing higher studies in nuclear energy and working in a nuclear power station.

However, in Biology, I have recently studied the disease named cancer. Some of the facts have shaken my dream of pursuing a job in a nuclear power plant. In the curriculum, I studied that γ-rays can cause cancer. The subtle fear that I had developed regarding the devastating effects of nuclear energy returned, because γ-rays are emitted in the nuclear fission reaction, the reaction by which nuclear energy is generated.

To remove the fear and to concentrate on my career, I started doing some research. I read a few journals on the side effects of nuclear energy. There were several instances of an increased chance of being affected by cancer if an individual is exposed to harmful γ-rays. However, I also came across many articles discussing the preventive measures taken in every nuclear power plant to protect employees from radiation. To be more confident, I read many news reports and articles, from which I learned that employees working in nuclear power plants are often affected by cancer. However, I could not find any information on the chances of being affected by cancer for a nuclear plant employee.

To find the answer, I am working on this mathematical exploration so that I can derive some relation for the chances of being affected by cancer if I pursue my dream job.

## Aim

The main motive of this investigation is to explore the correlation between the number of employees working in a nuclear power plant and the number of employees getting affected by cancer.

## Research question

What is the relationship between the number of employees working in a Nuclear Power Station and the number of employees getting infected by cancer during the working period or after retirement for three different age groups – Gr 1: 50 years to 60 years, Gr 2: 60 years to 70 years and Gr 3: 70 years to 80 years?

## Introduction

### What is cancer

Cancer is a disease characterized by uncontrolled cell division. Repeated division of cells often causes the formation of a tumor, cyst, fibroid, etc. Tumors are categorized into two types, benign and malignant; malignant tumors are considered to be cancerous. Cells of a malignant tumor can spread throughout the body through the bloodstream and initiate the formation of a tumor in another part of the body. This puts pressure on the vital organs where the tumor has originated, which can lead to organ failure. A tumor also constricts blood vessels in its vicinity, resulting in increased heart rate and blood pressure and eventually increasing the chances of stroke or heart failure.

### What causes cancer

There are several causative agents which trigger cells to divide in an uncontrolled manner. In the context of this mathematical exploration, radiation is the relevant cause. Radiations like gamma rays, X-rays, etc. are considered to be among the most prominent causative agents of cancer. These radiations have sufficient energy to ionize atoms and cause mutations in human DNA. Once such a mutation occurs in a cell, the cell may begin to divide continuously without following the cell cycle, which leads to the formation of a malignant or cancerous tumor.

From several news reports and scientific research, it is now clear that the emission of ozone-depleting gases has thinned the ozone layer, which allows more of the harmful ultraviolet rays to pass through the Earth’s atmosphere. As a result, cases of skin cancer have increased across the world. This signifies the effect of radiation in causing cancer.

### Nuclear power plant

A nuclear power station, or nuclear power plant, is a power plant which generates energy by nuclear fission reaction. The fission reaction is performed in a nuclear reactor, in which the heat generated by the reaction is used to convert water into steam. The steam thus generated is used to run a turbine which generates electricity.

The nuclear fission reaction is accompanied by the emission of radiations such as α-rays, β-rays and γ-rays. Of these, the γ-ray is considered to be the most harmful radiation. The nuclear reactor is constructed in such a way that the leakage of radiation is intended to be null. In addition, a number of preventive measures regarding protective clothing, medical check-ups, etc. are taken for employees working in nuclear power plants. Despite such preventive measures, instances have been noted where radiation has leaked and caused severe illness not only to the employees but also to the individuals living in the areas near the power plant. This is because γ-rays can pass through even inches of a metal sheet such as lead.

### Regression correlation coefficient

The regression correlation coefficient is a tool to measure the strength of the correlation between the independent variable and the dependent variable. The sums $\sum x$, $\sum y$, $\sum xy$, $\sum x^2$ and $\sum y^2$ are used to find the value of $r$ as stated by the formula below:

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

In the above-mentioned formula, $x$ is the value of the independent variable of each observation, $y$ is the value of the dependent variable of each observation, $xy$ is the product of the independent and the dependent variable of each observation, $n$ is the number of observations and $\sum$ denotes the sum over all the observations of the mentioned variable.

By squaring the value of $r$, the value of the regression coefficient ($r^2$) is obtained. The value of $r^2$ lies between 0 and 1, where 1 signifies maximum correlation whereas 0 signifies null correlation.

### Pearson’s correlation coefficient

Pearson’s correlation coefficient is a tool to measure both the strength and the nature of the correlation between the independent variable and the dependent variable. The deviations $x - \bar{x}$ and $y - \bar{y}$ are used to find the value of $r$ as stated by the formula below:

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$$

In the above-mentioned formula, $x$ is the value of the independent variable of each observation, $y$ is the value of the dependent variable of each observation, $\bar{x}$ is the arithmetic mean of all the observations of the independent variable, $\bar{y}$ is the arithmetic mean of all the observations of the dependent variable and $\sum$ denotes the sum over all the observations of the mentioned variable.

The value of $r$ lies between -1 and 1. A positive value of Pearson’s correlation coefficient implies a direct relationship between the independent and the dependent variable, whereas a negative value implies an inverse relationship between them. If the value of the correlation coefficient is close to 1 or -1, the correlation is strong. On the other hand, if the value of the correlation coefficient is close to 0, there is little or no linear correlation.
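The sign and size of Pearson’s correlation coefficient can be illustrated with a minimal sketch. The function name `pearson_r` and the two small data sets are hypothetical and are not part of the collected data; they only show a perfectly direct and a perfectly inverse relationship.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the two means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical toy data, not taken from the investigation.
r_direct = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])   # y rises with x
r_inverse = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])  # y falls as x rises
print(round(r_direct, 3), round(r_inverse, 3))
```

A value near 1 corresponds to the direct relationship and a value near -1 to the inverse relationship described above.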

### Chi squared test

The Chi squared test is a kind of analysis which tests for the existence of any correlation between an independent variable and a dependent variable. The Chi squared value of the given set of data is calculated first. Then, based on the type of data (for example, paired data or independent data) and the degrees of freedom, the Chi squared value is checked against the Chi squared table, which indicates whether a correlation exists.

The formula of the Chi squared value is given below:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Here, $O$ is the observed value, $E$ is the expected value and $\sum$ denotes the sum over all the observations.

| df | 0.995 | 0.99 | 0.975 | 0.95 | 0.90 | 0.10 | 0.05 | 0.025 | 0.01 | 0.005 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | --- | --- | 0.001 | 0.004 | 0.016 | 2.706 | 3.841 | 5.024 | 6.635 | 7.879 |
| 2 | 0.010 | 0.020 | 0.051 | 0.103 | 0.211 | 4.605 | 5.991 | 7.378 | 9.210 | 10.597 |
| 3 | 0.072 | 0.115 | 0.216 | 0.352 | 0.584 | 6.251 | 7.815 | 9.348 | 11.345 | 12.838 |
| 4 | 0.207 | 0.297 | 0.484 | 0.711 | 1.064 | 7.779 | 9.488 | 11.143 | 13.277 | 14.860 |
| 5 | 0.412 | 0.554 | 0.831 | 1.145 | 1.610 | 9.236 | 11.070 | 12.833 | 15.086 | 16.750 |
| 6 | 0.676 | 0.872 | 1.237 | 1.635 | 2.204 | 10.645 | 12.592 | 14.449 | 16.812 | 18.548 |
| 7 | 0.989 | 1.239 | 1.690 | 2.167 | 2.833 | 12.017 | 14.067 | 16.013 | 18.475 | 20.278 |
| 8 | 1.344 | 1.646 | 2.180 | 2.733 | 3.490 | 13.362 | 15.507 | 17.535 | 20.090 | 21.955 |
| 9 | 1.735 | 2.088 | 2.700 | 3.325 | 4.168 | 14.684 | 16.919 | 19.023 | 21.666 | 23.589 |
| 10 | 2.156 | 2.558 | 3.247 | 3.940 | 4.865 | 15.987 | 18.307 | 20.483 | 23.209 | 25.188 |
| 11 | 2.603 | 3.053 | 3.816 | 4.575 | 5.578 | 17.275 | 19.675 | 21.920 | 24.725 | 26.757 |
| 12 | 3.074 | 3.571 | 4.404 | 5.226 | 6.304 | 18.549 | 21.026 | 23.337 | 26.217 | 28.300 |
| 13 | 3.565 | 4.107 | 5.009 | 5.892 | 7.042 | 19.812 | 22.362 | 24.736 | 27.688 | 29.819 |
| 14 | 4.075 | 4.660 | 5.629 | 6.571 | 7.790 | 21.064 | 23.685 | 26.119 | 29.141 | 31.319 |
| 15 | 4.601 | 5.229 | 6.262 | 7.261 | 8.547 | 22.307 | 24.996 | 27.488 | 30.578 | 32.801 |
| 16 | 5.142 | 5.812 | 6.908 | 7.962 | 9.312 | 23.542 | 26.296 | 28.845 | 32.000 | 34.267 |
| 17 | 5.697 | 6.408 | 7.564 | 8.672 | 10.085 | 24.769 | 27.587 | 30.191 | 33.409 | 35.718 |
| 18 | 6.265 | 7.015 | 8.231 | 9.390 | 10.865 | 25.989 | 28.869 | 31.526 | 34.805 | 37.156 |
| 19 | 6.844 | 7.633 | 8.907 | 10.117 | 11.651 | 27.204 | 30.144 | 32.852 | 36.191 | 38.582 |
| 20 | 7.434 | 8.260 | 9.591 | 10.851 | 12.443 | 28.412 | 31.410 | 34.170 | 37.566 | 39.997 |
| 21 | 8.034 | 8.897 | 10.283 | 11.591 | 13.240 | 29.615 | 32.671 | 35.479 | 38.932 | 41.401 |
| 22 | 8.643 | 9.542 | 10.982 | 12.338 | 14.041 | 30.813 | 33.924 | 36.781 | 40.289 | 42.796 |
| 23 | 9.260 | 10.196 | 11.689 | 13.091 | 14.848 | 32.007 | 35.172 | 38.076 | 41.638 | 44.181 |
| 24 | 9.886 | 10.856 | 12.401 | 13.848 | 15.659 | 33.196 | 36.415 | 39.364 | 42.980 | 45.559 |
| 25 | 10.520 | 11.524 | 13.120 | 14.611 | 16.473 | 34.382 | 37.652 | 40.646 | 44.314 | 46.928 |
| 26 | 11.160 | 12.198 | 13.844 | 15.379 | 17.292 | 35.563 | 38.885 | 41.923 | 45.642 | 48.290 |
| 27 | 11.808 | 12.879 | 14.573 | 16.151 | 18.114 | 36.741 | 40.113 | 43.195 | 46.963 | 49.645 |
| 28 | 12.461 | 13.565 | 15.308 | 16.928 | 18.939 | 37.916 | 41.337 | 44.461 | 48.278 | 50.993 |
| 29 | 13.121 | 14.256 | 16.047 | 17.708 | 19.768 | 39.087 | 42.557 | 45.722 | 49.588 | 52.336 |
| 30 | 13.787 | 14.953 | 16.791 | 18.493 | 20.599 | 40.256 | 43.773 | 46.979 | 50.892 | 53.672 |
| 40 | 20.707 | 22.164 | 24.433 | 26.509 | 29.051 | 51.805 | 55.758 | 59.342 | 63.691 | 66.766 |
| 50 | 27.991 | 29.707 | 32.357 | 34.764 | 37.689 | 63.167 | 67.505 | 71.420 | 76.154 | 79.490 |
| 60 | 35.534 | 37.485 | 40.482 | 43.188 | 46.459 | 74.397 | 79.082 | 83.298 | 88.379 | 91.952 |
| 70 | 43.275 | 45.442 | 48.758 | 51.739 | 55.329 | 85.527 | 90.531 | 95.023 | 100.425 | 104.215 |
| 80 | 51.172 | 53.540 | 57.153 | 60.391 | 64.278 | 96.578 | 101.879 | 106.629 | 112.329 | 116.321 |
| 90 | 59.196 | 61.754 | 65.647 | 69.126 | 73.291 | 107.565 | 113.145 | 118.136 | 124.116 | 128.299 |
| 100 | 67.328 | 70.065 | 74.222 | 77.929 | 82.358 | 118.498 | 124.342 | 129.561 | 135.807 | 140.169 |

Figure 1 - Chi Squared Table
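The Chi squared formula and the table lookup can be sketched as follows. This is only an illustration under stated assumptions: the helper names `chi_squared` and `exceeds_critical` are hypothetical, the three observed and expected pairs are the first three rows of the Chi squared calculation table of this investigation (so the result is a partial sum, not the full statistic), and the statistic of 9.0 passed to the lookup is an invented value.

```python
def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over paired observed and expected values."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# First three observed/expected pairs from the calculation table of this
# investigation; the sum below is therefore only a partial contribution.
observed = [10.33, 11.08, 15.92]
expected = [9.52391937, 11.3543268, 16.4517538]
partial = chi_squared(observed, expected)
print(round(partial, 4))

# A few critical values copied from the 0.05 column of the Chi squared table.
CRITICAL_AT_0_05 = {1: 3.841, 2: 5.991, 3: 7.815, 18: 28.869}


def exceeds_critical(statistic, df):
    """True when the statistic is larger than the tabulated critical value."""
    return statistic > CRITICAL_AT_0_05[df]


# Hypothetical statistic of 9.0 with 3 degrees of freedom.
print(exceeds_critical(9.0, 3))
```

The null hypothesis is rejected at the chosen significance level only when the calculated value exceeds the tabulated critical value for the relevant degrees of freedom.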

## Hypothesis

### Null hypothesis

It is assumed that there does not exist any correlation between the number of employees working in a Nuclear Power Station and the number of employees getting infected by cancer during the working period or after retirement for three different age groups – Gr 1: 50 years to 60 years, Gr 2: 60 years to 70 years and Gr 3: 70 years to 80 years.

### Alternate hypothesis

It is assumed that there is a correlation between the number of employees working in a Nuclear Power Station and the number of employees getting infected by cancer during the working period or after retirement for three different age groups – Gr 1: 50 years to 60 years, Gr 2: 60 years to 70 years and Gr 3: 70 years to 80 years.

## Data collection

### Source of data

A data sheet has been prepared based on several news articles, reports and surveys of different nuclear power plants across the globe. It has been possible to record the number of employees who were infected by cancer during their tenure of service because of the health insurance policy that the company offers to all its employees. Similarly, the health status of retired employees has been obtained from the health benefit that the company offers even after retirement.

### Justification on categorization of age groups

The employees working in nuclear power plants have been categorized into three groups to illustrate the correlation in a proper and intensive way. It has been studied that immunity against cancer is stronger at a young age than in the elderly. However, there are a lot of exceptions; in elderly people, mutations can be triggered by much less exposure to radiation than in others. On the other hand, it has been observed that when an individual is exposed to cancer-causing radiation at a young age, the cancer may appear only at a much later period of life. Thus, the age groups are made considering the strength of immunity in an individual.

### Raw Data Table

| Name | Total | Infected |
|---|---|---|
| Byron Nuclear Power Station | 329 | 34 |
| Peach Bottom Atomic Power Station | 347 | 37 |
| Oconee Nuclear Station | 387 | 47 |
| Braidwood Generating Station | 451 | 71 |
| South Texas Project Electric Generating Station | 459 | 52 |
| Susquehanna Nuclear Power Plant | 674 | 89 |
| McGuire Nuclear Power Plant | 725 | 103 |
| Browns Ferry Nuclear Plant | 978 | 178 |
| Palo Verde Generation Station | 1564 | 302 |
| Vogtle Nuclear Power Station | 3875 | 879 |

Figure 2 - Table On Total No. of Employees vs. No. of Employees Infected (Gr1: 50 – 60 Years)

| Name | Total | Infected |
|---|---|---|
| Byron Nuclear Power Station | 334 | 37 |
| Peach Bottom Atomic Power Station | 345 | 38 |
| Oconee Nuclear Station | 379 | 58 |
| Braidwood Generating Station | 463 | 98 |
| South Texas Project Electric Generating Station | 487 | 103 |
| Susquehanna Nuclear Power Plant | 621 | 115 |
| McGuire Nuclear Power Plant | 798 | 145 |
| Browns Ferry Nuclear Plant | 970 | 161 |
| Palo Verde Generation Station | 1498 | 298 |
| Vogtle Nuclear Power Station | 3389 | 789 |

Figure 3 - Table On Total No. of Employees vs. No. of Employees Infected (Gr2: 60 – 70 Years)

| Name | Total | Infected |
|---|---|---|
| Byron Nuclear Power Station | 289 | 46 |
| Peach Bottom Atomic Power Station | 297 | 52 |
| Oconee Nuclear Station | 303 | 67 |
| Braidwood Generating Station | 401 | 132 |
| South Texas Project Electric Generating Station | 432 | 136 |
| Susquehanna Nuclear Power Plant | 543 | 105 |
| McGuire Nuclear Power Plant | 641 | 187 |
| Browns Ferry Nuclear Plant | 879 | 190 |
| Palo Verde Generation Station | 1273 | 398 |
| Vogtle Nuclear Power Station | 2894 | 982 |

Figure 4 - Table On Total No. of Employees vs. No. of Employees Infected (Gr3: 70 – 80 Years)

### Processed data table

| Total No. of Employees | Infected Employees | Percentage |
|---|---|---|
| 329 | 34 | 10.33 |
| 347 | 37 | 10.66 |
| 387 | 47 | 12.14 |
| 451 | 71 | 15.74 |
| 459 | 52 | 11.32 |
| 674 | 89 | 13.20 |
| 725 | 103 | 14.20 |
| 978 | 178 | 18.20 |
| 1564 | 302 | 19.30 |
| 3875 | 879 | 22.68 |

Figure 5 - Table On Processed Data Table For Gr. 1

| Total No. of Employees | Infected Employees | Percentage |
|---|---|---|
| 334 | 37 | 11.08 |
| 345 | 38 | 11.01 |
| 379 | 58 | 15.30 |
| 463 | 98 | 21.17 |
| 487 | 103 | 21.15 |
| 621 | 115 | 18.52 |
| 798 | 145 | 18.17 |
| 970 | 161 | 16.60 |
| 1498 | 298 | 19.89 |
| 3389 | 789 | 23.28 |

Figure 6 - Table On Processed Data Table For Gr. 2

| Total No. of Employees | Infected Employees | Percentage |
|---|---|---|
| 289 | 46 | 15.92 |
| 297 | 52 | 17.51 |
| 303 | 67 | 22.11 |
| 401 | 132 | 32.92 |
| 432 | 136 | 31.48 |
| 543 | 105 | 19.34 |
| 641 | 187 | 29.17 |
| 879 | 190 | 21.62 |
| 1273 | 398 | 31.26 |
| 2894 | 982 | 33.93 |

Figure 7 - Table On Processed Data Table For Gr. 3

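Sample Calculation

Percentage of Infected Employees = $\dfrac{\text{Number of infected employees}}{\text{Total number of employees}} \times 100$

For Byron Nuclear Power Station in Gr. 1: $\dfrac{34}{329} \times 100 = 10.33$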
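The percentage column can be reproduced with a short sketch. The function name `infected_percentage` is hypothetical, and only three rows of the Gr. 1 raw data are used here as an illustration.

```python
def infected_percentage(total, infected):
    """Share of infected employees as a percentage of the total."""
    return infected / total * 100


# Three rows from the Gr. 1 raw data table: (name, total, infected).
rows = [
    ("Byron Nuclear Power Station", 329, 34),
    ("Peach Bottom Atomic Power Station", 347, 37),
    ("Vogtle Nuclear Power Station", 3875, 879),
]

for name, total, infected in rows:
    print(f"{name}: {infected_percentage(total, infected):.2f}%")
```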

### Analysis of processed data table

In Figure 5 to Figure 7, the percentage of employees who were infected by cancer out of the total number of employees has been found. As the interval in the total number of employees (independent variable) is not regular, the mean value and standard deviation will not serve any purpose in analyzing the data. Rather, the number of employees infected by cancer depends upon the total number of employees working in that particular power plant. Thus, the percentage has been calculated.

In Figure 5, it has been observed that the percentage of employees infected by cancer ranges between 10% and 23%. Moreover, the number of infected employees increases with the total number of employees working in a power plant. Similarly, in Figure 6, the percentage of infected employees ranges between 11% and 23%, with about 11% in the power plants with the fewest employees and about 23% in the power plant with the most employees. In Figure 7, as the age group is between 70 years and 80 years, it can be assumed that the total number of employees who worked for the power plants may have decreased due to the death rate at that age. Thus, the total number of employees currently alive is less than that of the other groups. On the other hand, the percentage of infected employees is also higher than in the other groups, ranging between 16% and 34%, with about 16% in the power plant with the fewest employees and about 34% in the power plant with the most employees.

## Graphical analysis

Figure 8 - Total No. of Employees vs. No. of Employees Infected (Gr1: 50 – 60 Years)

Figure 9 - Total No. of Employees vs. No. of Employees Infected (Gr2: 60 – 70 Years)

Figure 10 - Total No. of Employees vs. No. of Employees Infected (Gr3: 70 – 80 Years)

### Choice of Axes

The X – Axis of the graph denotes the total number of employees working or worked in nuclear power plants (independent variable).

The Y – Axis of the graph denotes number of employees who are currently infected by cancer (dependent variable).

### Trendline for linear correlation

In all the graphs from no. 1 to no. 3 (Figure 8 to Figure 10), a linear trendline has been obtained using the data that has been collected from the official websites of the nuclear power plants, newspapers, journals, articles, etc.

In graph 1, the equation of trendline is:

y = 0.2386x - 54.366

In graph 2, the equation of trendline is:

y = 0.2401x - 38.697

In graph 3, the equation of trendline is:

y = 0.352x - 50.422

From the graphs, it can be stated that there exists a positive (increasing) correlation between the number of employees getting infected by cancer and the total number of employees who are working or have worked in the nuclear power plants. However, a few outliers have been noticed in the graphs as well.
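The three trendline equations can be cross-checked with a small sketch. It assumes that the trendlines were fitted by ordinary least squares (the usual method behind a spreadsheet linear trendline); the function name `linear_fit` is hypothetical, and the data are the raw data tables of this investigation.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar


# (total employees, infected employees) per plant, from the raw data tables.
groups = {
    "Gr1": ([329, 347, 387, 451, 459, 674, 725, 978, 1564, 3875],
            [34, 37, 47, 71, 52, 89, 103, 178, 302, 879]),
    "Gr2": ([334, 345, 379, 463, 487, 621, 798, 970, 1498, 3389],
            [37, 38, 58, 98, 103, 115, 145, 161, 298, 789]),
    "Gr3": ([289, 297, 303, 401, 432, 543, 641, 879, 1273, 2894],
            [46, 52, 67, 132, 136, 105, 187, 190, 398, 982]),
}

fits = {name: linear_fit(xs, ys) for name, (xs, ys) in groups.items()}
for name, (slope, intercept) in fits.items():
    print(f"{name}: y = {slope:.4f}x {intercept:+.3f}")
```

Under that assumption, the fitted slopes and intercepts agree with the three trendline equations stated above to the quoted number of decimal places.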

### Outliers

There are a few outliers where the total number of employees is in the range of 500 to 750. Because the number of outliers is very small, the value of the regression coefficient is 0.99. Such a high value (close to one) of the regression coefficient indicates a strong linear correlation between the dependent and the independent variable.

### Intercept for linear correlation

From the equation of the trendline of graph 1, the Y – intercept of the trendline has been calculated:

The value of y for x = 0 will be:

$$y = 0.2386(0) - 54.366 = -54.366$$

From the equation of the trendline of graph 2, the Y – intercept of the trendline has been calculated:

The value of y for x = 0 will be:

$$y = 0.2401(0) - 38.697 = -38.697$$

From the equation of the trendline of graph 3, the Y – intercept of the trendline has been calculated:

The value of y for x = 0 will be:

$$y = 0.352(0) - 50.422 = -50.422$$

The value of the Y – intercept is -54.366, -38.697, and -50.422 for graph 1, graph 2 and graph 3 respectively. A negative intercept suggests that if the total number of employees is zero, then the number of infected individuals should be negative. Literally, this is not possible; the mathematical significance of the statement is that if the total number of employees is considered to be null, there will not be any infected patient either. This supports the idea that the cases of cancer arise through work in the nuclear power plant.

From the equation of the trendline of graph 1, the X – intercept of the trendline has been calculated:

The value of x for y = 0 will be:

$$0 = 0.2386x - 54.366 \implies x = \frac{54.366}{0.2386} \approx 228$$

From the equation of the trendline of graph 2, the X – intercept of the trendline has been calculated:

The value of x for y = 0 will be:

$$0 = 0.2401x - 38.697 \implies x = \frac{38.697}{0.2401} \approx 161$$

From the equation of the trendline of graph 3, the X – intercept of the trendline has been calculated:

The value of x for y = 0 will be:

$$0 = 0.352x - 50.422 \implies x = \frac{50.422}{0.352} \approx 143$$

The value of the X – intercept is approximately 228, 161, and 143 for graph 1, graph 2 and graph 3 respectively. The mathematical significance of the statement is that if the total number of employees is 228 for the age group 50 to 60 years, 161 for the age group 60 to 70 years and 143 for the age group 70 to 80 years, the trendline predicts that there will not be any infected patient.
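The X – intercepts follow directly from the trendline equations by setting y = 0. A minimal sketch, in which the helper name `x_intercept` is hypothetical and the slopes and intercepts are the ones quoted for the three trendlines:

```python
def x_intercept(slope, intercept):
    """Value of x at which the line y = slope * x + intercept crosses y = 0."""
    return -intercept / slope


# (slope, intercept) of the three trendlines quoted in this section.
trendlines = {
    "Gr1": (0.2386, -54.366),
    "Gr2": (0.2401, -38.697),
    "Gr3": (0.352, -50.422),
}

x_intercepts = {name: x_intercept(m, c) for name, (m, c) in trendlines.items()}
for name, value in x_intercepts.items():
    print(f"{name}: x-intercept = {value:.2f}")
```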

## Calculation of correlation coefficient

### Calculation of regression correlation coefficient

Processed Data for calculation of $R^2$:

There are five headers in the processed data tables, expressed as $x$, $y$, $x^2$, $y^2$ and $xy$. The total number of employees is represented by $x$ and the number of employees infected by cancer is represented by $y$. The remaining headers have their usual meaning. The calculation of the $R^2$ correlation coefficient is shown to explore the efficiency and stability of the trendline and the correlation.

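| $x$ | $y$ | $x^2$ | $y^2$ | $xy$ |
|---|---|---|---|---|
| 329 | 34 | 108241 | 1156 | 11186 |
| 347 | 37 | 120409 | 1369 | 12839 |
| 387 | 47 | 149769 | 2209 | 18189 |
| 451 | 71 | 203401 | 5041 | 32021 |
| 459 | 52 | 210681 | 2704 | 23868 |
| 674 | 89 | 454276 | 7921 | 59986 |
| 725 | 103 | 525625 | 10609 | 74675 |
| 978 | 178 | 956484 | 31684 | 174084 |
| 1564 | 302 | 2446096 | 91204 | 472328 |
| 3875 | 879 | 15015625 | 772641 | 3406125 |

Figure 11 - Table On Processed Data For Calculation Of $R^2$ For Group 1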
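
| $x$ | $y$ | $x^2$ | $y^2$ | $xy$ |
|---|---|---|---|---|
| 334 | 37 | 111556 | 1369 | 12358 |
| 345 | 38 | 119025 | 1444 | 13110 |
| 379 | 58 | 143641 | 3364 | 21982 |
| 463 | 98 | 214369 | 9604 | 45374 |
| 487 | 103 | 237169 | 10609 | 50161 |
| 621 | 115 | 385641 | 13225 | 71415 |
| 798 | 145 | 636804 | 21025 | 115710 |
| 970 | 161 | 940900 | 25921 | 156170 |
| 1498 | 298 | 2244004 | 88804 | 446404 |
| 3389 | 789 | 11485321 | 622521 | 2673921 |

Figure 12 - Table On Processed Data For Calculation Of $R^2$ For Group 2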
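
| $x$ | $y$ | $x^2$ | $y^2$ | $xy$ |
|---|---|---|---|---|
| 289 | 46 | 83521 | 2116 | 13294 |
| 297 | 52 | 88209 | 2704 | 15444 |
| 303 | 67 | 91809 | 4489 | 20301 |
| 401 | 132 | 160801 | 17424 | 52932 |
| 432 | 136 | 186624 | 18496 | 58752 |
| 543 | 105 | 294849 | 11025 | 57015 |
| 641 | 187 | 410881 | 34969 | 119867 |
| 879 | 190 | 772641 | 36100 | 167010 |
| 1273 | 398 | 1620529 | 158404 | 506654 |
| 2894 | 982 | 8375236 | 964324 | 2841908 |

Figure 13 - Table On Processed Data For Calculation Of $R^2$ For Group 3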

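The formula of the regression coefficient as mentioned in the Introduction has been used to find the correlation coefficient. Here, $x$ is the value of the independent variable of each observation, $y$ is the value of the dependent variable of each observation, $xy$ is the product of the independent and the dependent variable of each observation, $n$ is the number of observations and $\sum$ denotes the sum over all the observations of the mentioned variable.

Calculation for Group 1:

With $n = 10$, $\sum x = 9789$, $\sum y = 1792$, $\sum xy = 4285301$, $\sum x^2 = 20190607$ and $\sum y^2 = 926538$:

$$r = \frac{10(4285301) - (9789)(1792)}{\sqrt{\left[10(20190607) - 9789^2\right]\left[10(926538) - 1792^2\right]}} = \frac{25311122}{\sqrt{106081549 \times 6054116}} \approx 0.9988, \qquad r^2 \approx 0.9975$$

Calculation for Group 2:

With $n = 10$, $\sum x = 9284$, $\sum y = 1842$, $\sum xy = 3606605$, $\sum x^2 = 16518430$ and $\sum y^2 = 797886$:

$$r = \frac{10(3606605) - (9284)(1842)}{\sqrt{\left[10(16518430) - 9284^2\right]\left[10(797886) - 1842^2\right]}} = \frac{18964922}{\sqrt{78991644 \times 4585896}} \approx 0.9964, \qquad r^2 \approx 0.9929$$

Calculation for Group 3:

With $n = 10$, $\sum x = 7952$, $\sum y = 2295$, $\sum xy = 3853177$, $\sum x^2 = 12085100$ and $\sum y^2 = 1250051$:

$$r = \frac{10(3853177) - (7952)(2295)}{\sqrt{\left[10(12085100) - 7952^2\right]\left[10(1250051) - 2295^2\right]}} = \frac{20281930}{\sqrt{57616696 \times 7233485}} \approx 0.9935, \qquad r^2 \approx 0.987$$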

### Analysis

The value of the regression coefficient is 0.9975, 0.9929, and 0.987 for group 1, group 2 and group 3 respectively. Such a high value (close to one) of the regression coefficient indicates a strong linear correlation between the dependent and the independent variable.
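The three values can be checked numerically with a short sketch that applies the sum-based formula for r and then squares it. The function name `r_squared` is hypothetical; the data are the x and y columns of the processed data tables.

```python
def r_squared(xs, ys):
    """Square of r, using the sum-based formula for the correlation coefficient."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    # Product of the two bracketed terms under the square root, i.e. the
    # denominator of r already squared.
    denominator_sq = (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    return numerator ** 2 / denominator_sq


# x = total employees, y = infected employees, from the processed data tables.
groups = {
    "Gr1": ([329, 347, 387, 451, 459, 674, 725, 978, 1564, 3875],
            [34, 37, 47, 71, 52, 89, 103, 178, 302, 879]),
    "Gr2": ([334, 345, 379, 463, 487, 621, 798, 970, 1498, 3389],
            [37, 38, 58, 98, 103, 115, 145, 161, 298, 789]),
    "Gr3": ([289, 297, 303, 401, 432, 543, 641, 879, 1273, 2894],
            [46, 52, 67, 132, 136, 105, 187, 190, 398, 982]),
}

r2 = {name: r_squared(xs, ys) for name, (xs, ys) in groups.items()}
for name, value in r2.items():
    print(f"{name}: r^2 = {value:.4f}")
```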

### Calculation of pearson’s correlation coefficient

Processed Data Table for calculation of Pearson’s Correlation:

There are seven headers in the processed data table for the calculation of Pearson’s correlation coefficient, expressed as $x$, $y$, $x - \bar{x}$, $y - \bar{y}$, $(x - \bar{x})(y - \bar{y})$, $(x - \bar{x})^2$ and $(y - \bar{y})^2$. The total number of employees is represented by $x$ and the number of employees infected by cancer is represented by $y$; $\bar{x}$ is the arithmetic mean of all the observations of the total number of employees and $\bar{y}$ is the arithmetic mean of all the observations of the number of employees infected by cancer. The remaining headers have their usual meaning. The calculation of Pearson’s correlation coefficient is shown to explore the efficiency and stability of the trendline and the correlation.

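| $x$ | $y$ | $x - \bar{x}$ | $y - \bar{y}$ | $(x - \bar{x})(y - \bar{y})$ | $(x - \bar{x})^2$ | $(y - \bar{y})^2$ |
|---|---|---|---|---|---|---|
| 329 | 34 | -649.9 | -145.2 | 94365.48 | 422370.01 | 21083.04 |
| 347 | 37 | -631.9 | -142.2 | 89856.18 | 399297.61 | 20220.84 |
| 387 | 47 | -591.9 | -132.2 | 78249.18 | 350345.61 | 17476.84 |
| 451 | 71 | -527.9 | -108.2 | 57118.78 | 278678.41 | 11707.24 |
| 459 | 52 | -519.9 | -127.2 | 66131.28 | 270296.01 | 16179.84 |
| 674 | 89 | -304.9 | -90.2 | 27501.98 | 92964.01 | 8136.04 |
| 725 | 103 | -253.9 | -76.2 | 19347.18 | 64465.21 | 5806.44 |
| 978 | 178 | -0.9 | -1.2 | 1.08 | 0.81 | 1.44 |
| 1564 | 302 | 585.1 | 122.8 | 71850.28 | 342342.01 | 15079.84 |
| 3875 | 879 | 2896.1 | 699.8 | 2026690.78 | 8387395.21 | 489720.04 |

Figure 14 - Table On Processed Data Table For Calculation Of Pearson’s Correlation Coefficient For Group 1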
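
| $x$ | $y$ | $x - \bar{x}$ | $y - \bar{y}$ | $(x - \bar{x})(y - \bar{y})$ | $(x - \bar{x})^2$ | $(y - \bar{y})^2$ |
|---|---|---|---|---|---|---|
| 334 | 37 | -594.4 | -147.2 | 87495.68 | 353311.36 | 21667.84 |
| 345 | 38 | -583.4 | -146.2 | 85293.08 | 340355.56 | 21374.44 |
| 379 | 58 | -549.4 | -126.2 | 69334.28 | 301840.36 | 15926.44 |
| 463 | 98 | -465.4 | -86.2 | 40117.48 | 216597.16 | 7430.44 |
| 487 | 103 | -441.4 | -81.2 | 35841.68 | 194833.96 | 6593.44 |
| 621 | 115 | -307.4 | -69.2 | 21272.08 | 94494.76 | 4788.64 |
| 798 | 145 | -130.4 | -39.2 | 5111.68 | 17004.16 | 1536.64 |
| 970 | 161 | 41.6 | -23.2 | -965.12 | 1730.56 | 538.24 |
| 1498 | 298 | 569.6 | 113.8 | 64820.48 | 324444.16 | 12950.44 |
| 3389 | 789 | 2460.6 | 604.8 | 1488170.88 | 6054552.36 | 365783.04 |

Figure 15 - Table On Processed Data Table For Calculation Of Pearson’s Correlation Coefficient For Group 2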
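
| $x$ | $y$ | $x - \bar{x}$ | $y - \bar{y}$ | $(x - \bar{x})(y - \bar{y})$ | $(x - \bar{x})^2$ | $(y - \bar{y})^2$ |
|---|---|---|---|---|---|---|
| 289 | 46 | -506.2 | -183.5 | 92887.7 | 256238.44 | 33672.25 |
| 297 | 52 | -498.2 | -177.5 | 88430.5 | 248203.24 | 31506.25 |
| 303 | 67 | -492.2 | -162.5 | 79982.5 | 242260.84 | 26406.25 |
| 401 | 132 | -394.2 | -97.5 | 38434.5 | 155393.64 | 9506.25 |
| 432 | 136 | -363.2 | -93.5 | 33959.2 | 131914.24 | 8742.25 |
| 543 | 105 | -252.2 | -124.5 | 31398.9 | 63604.84 | 15500.25 |
| 641 | 187 | -154.2 | -42.5 | 6553.5 | 23777.64 | 1806.25 |
| 879 | 190 | 83.8 | -39.5 | -3310.1 | 7022.44 | 1560.25 |
| 1273 | 398 | 477.8 | 168.5 | 80509.3 | 228292.84 | 28392.25 |
| 2894 | 982 | 2098.8 | 752.5 | 1579347 | 4404961.44 | 566256.25 |

Figure 16 - Table On Processed Data Table For Calculation Of Pearson’s Correlation Coefficient For Group 3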

The formula of Pearson’s correlation coefficient as mentioned in the Introduction has been used to find the correlation coefficient. Here, $x$ is the value of the independent variable of each observation, $y$ is the value of the dependent variable of each observation, $\bar{x}$ is the arithmetic mean of all the observations of the independent variable, $\bar{y}$ is the arithmetic mean of all the observations of the dependent variable and $\sum$ denotes the sum over all the observations of the mentioned variable.

Calculation for Group 1 (Figure 14):

Let the Pearson’s Correlation Coefficient be $r$.

$$r = \frac{2531112.20}{\sqrt{10608154.90 \times 605411.60}} \approx 0.9988$$

Calculation for Group 2 (Figure 15):

Let the Pearson’s Correlation Coefficient be $r$.

$$r = \frac{1896492.20}{\sqrt{7899164.40 \times 458589.60}} \approx 0.9964$$

Calculation for Group 3 (Figure 16):

Let the Pearson’s Correlation Coefficient be $r$.

$$r = \frac{2028193.00}{\sqrt{5761669.60 \times 723348.50}} \approx 0.9935$$

### Analysis

The values of Pearson’s correlation coefficient for the three groups are 0.998, 0.996 and 0.993 respectively. As each value is positive, it can be stated that the correlation is increasing in nature, i.e., with an increase in the total number of employees who are working or have worked in the nuclear power plant, the number of cancer infected patients also increases. Moreover, each value of Pearson’s correlation coefficient is very close to one, which signifies that the correlation is very strong.

## Evaluation of hypothesis

The hypothesis has been evaluated with the help of the Chi squared test in this section of this mathematical exploration. The Chi squared test indicates whether the null hypothesis is rejected in favour of the alternate hypothesis.

### Processed data table

Figure 17 - Table On Observed Data For Evaluation Of $\chi^2$ Test

Figure 18 - Table On Expected Data For Evaluation Of $\chi^2$ Test

### Calculation of $\chi^2$

| Observed Value (O) | Expected Value (E) | $O - E$ | $(O - E)^2$ | $\frac{(O - E)^2}{E}$ |
|---|---|---|---|---|
| 10.33 | 9.52391937 | 0.80608063 | 0.64976598 | 0.06822464 |
| 11.08 | 11.3543268 | -0.2743268 | 0.07525519 | 0.00662789 |
| 15.92 | 16.4517538 | -0.5317538 | 0.2827621 | 0.01718735 |
| 10.66 | 9.99590573 | 0.66409427 | 0.4410212 | 0.04412018 |
| 11.01 | 11.9170245 | -0.9070245 | 0.82269344 | 0.06903514 |
| 17.51 | 17.2670698 | 0.2429302 | 0.05901508 | 0.00341778 |
| 12.14 | 12.6415806 | -0.5015806 | 0.2515831 | 0.01990124 |
| 15.3 | 15.0711732 | 0.2288268 | 0.0523617 | 0.0034743 |
| 22.11 | 21.8372462 | 0.2727538 | 0.07439464 | 0.00340678 |
| 15.74 | 17.8155717 | -2.0755717 | 4.30799788 | 0.24181081 |
| 21.17 | 21.2395565 | -0.0695565 | 0.00483811 | 0.00022779 |
| 32.92 | 30.7748719 | 2.1451281 | 4.60157457 | 0.14952376 |
| 11.32 | 16.3154204 | -4.9954204 | 24.954225 | 1.5294871 |
| 21.15 | 19.4510903 | 1.6989097 | 2.88629417 | 0.14838727 |
| 31.48 | 28.1834893 | 3.2965107 | 10.8669828 | 0.38557975 |
| 13.2 | 13.0268235 | 0.1731765 | 0.0299901 | 0.00230218 |
| 18.52 | 15.5304561 | 2.9895439 | 8.93737273 | 0.57547394 |
| 19.34 | 22.5027203 | -3.1627203 | 10.0027997 | 0.44451513 |
| 14.2 | 15.7005625 | -1.5005625 | 2.25168782 | 0.14341447 |
| 18.17 | 18.7180625 | -0.5480625 | 0.3003725 | 0.0160472 |
| 29.17 | 27.121375 | 2.048625 | 4.19686439 | 0.15474379 |
| 18.2 | 14.3943084 | 3.8056916 | 14.4832886 | 1.00618162 |
| 16.6 | 17.1607586 | -0.5607586 | 0.31445021 | 0.01832379 |
| 21.62 | 24.864933 | -3.244933 | 10.5295902 | 0.42347149 |
| 19.3 | 17.9737509 | 1.3262491 | 1.75893668 | 0.09786141 |
| 19.89 | 21.4281362 | -1.5381362 | 2.36586297 | 0.11040918 |
| 31.26 | 31.0481129 | 0.2118871 | 0.04489614 | 0.00144602 |
| 22.68 | 20.3821569 | 2.2978431 | 5.28008291 | 0.25905418 |
| 23.28 | 24.2994152 | -1.0194152 | 1.03920735 | 0.04276676 |
| 33.93 | 35.2084278 | -1.2784278 | 1.63437764 | 0.04642007 |

Figure 19 - Table On Calculation Of $\chi^2$
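The expected values in the table above can be reproduced with a short sketch. It assumes that each expected value was obtained by the usual rule for a Chi squared test of independence, (row total × column total) / grand total, applied to the percentages of the three age groups for each power plant; this assumption is not stated in the text, but it reproduces the tabulated expected values. The variable names are hypothetical.

```python
# Observed percentages per power plant (rows) for Gr 1, Gr 2 and Gr 3 (columns),
# taken from the processed data tables.
observed = [
    [10.33, 11.08, 15.92],
    [10.66, 11.01, 17.51],
    [12.14, 15.30, 22.11],
    [15.74, 21.17, 32.92],
    [11.32, 21.15, 31.48],
    [13.20, 18.52, 19.34],
    [14.20, 18.17, 29.17],
    [18.20, 16.60, 21.62],
    [19.30, 19.89, 31.26],
    [22.68, 23.28, 33.93],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Assumed rule: expected value = row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

for row in expected:
    print([round(value, 4) for value in row])
```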

### Evaluation

Examining the value of $\chi^2$ with respect to the degrees of freedom using the Chi squared table shown in the Introduction (Figure 1), it is concluded that the Null Hypothesis is rejected and the Alternate Hypothesis is accepted.

## Conclusion

What is the relationship between the number of employees working in a Nuclear Power Station and the number of employees getting infected by cancer during the working period or after retirement for three different age groups – Gr 1: 50 years to 60 years, Gr 2: 60 years to 70 years and Gr 3: 70 years to 80 years?

The relationship between the number of employees who are working or have worked in a Nuclear Power Plant and the number of those employees who are or were infected by cancer is direct, i.e., with an increase in the total number of employees, the number of employees infected by cancer also increases.

• The equation of trendline for Group 1, i.e., the age group of 50 to 60 years, is: y = 0.2386x - 54.366.
• The equation of trendline for Group 2, i.e., the age group of 60 to 70 years, is: y = 0.2401x - 38.697.
• The equation of trendline for Group 3, i.e., the age group of 70 to 80 years, is: y = 0.352x - 50.422.
• The value of regression coefficient for Group 1 is 0.997 which satisfies the existence of the increasing correlation between the independent and the dependent variable.
• The value of regression coefficient for Group 2 is 0.992 which satisfies the existence of the increasing correlation between the independent and the dependent variable.
• The value of regression coefficient for Group 3 is 0.987 which satisfies the existence of the increasing correlation between the independent and the dependent variable.
• The value of Pearson’s Correlation Coefficient for Group 1 is 0.998. Positive value of correlation coefficient signifies that the correlation is increasing (direct relation) in nature. Secondly, such a high value (close to 1) of coefficient satisfies the existence of the correlation.
• The value of Pearson’s Correlation Coefficient for Group 2 is 0.996. Positive value of correlation coefficient signifies that the correlation is increasing (direct relation) in nature. Secondly, such a high value (close to 1) of coefficient satisfies the existence of the correlation.
• The value of Pearson’s Correlation Coefficient for Group 3 is 0.993. Positive value of correlation coefficient signifies that the correlation is increasing (direct relation) in nature. Secondly, such a high value (close to 1) of coefficient satisfies the existence of the correlation.
• The percentage of employees getting infected by cancer is the lowest, or very close to the lowest, at Byron Nuclear Power Station in all the three groups, with values ranging between 10% and 16%.
• The maximum percentage of employees getting infected by cancer in all the three groups is in Vogtle Nuclear Power Station, with values ranging between 22% and 34%.
• The percentage of infected individuals is minimum in first age group (50 years to 60 years). This is because of the strength of immunity each employee possesses. Another reason might be advancement in radiation prevention techniques which protects employees of this generation with more efficiency than that of the others.
• The percentage of infected individuals is maximum in third age group (70 years to 80 years). This is because of the weakened immunity of each retired employee. Another reason might be the number of employees who worked in power plants alive during the survey of data collection. Due to a smaller number of retired employees, the percentage has increased.
• It is concluded that if the total number of employees in age group 50 to 60 years is 228, then there will be no case of cancer.
• Similarly, if the total number of employees in age group 60 to 70 years is 161, then there will be no case of cancer.
• Similarly, if the total number of employees in age group 70 to 80 years is 143, then there will be no case of cancer.
• The $\chi^2$ test evaluates the hypothesis and concludes that the alternate hypothesis is true.

## Reflection

In this investigation, several processes and mathematical tools have been used to find the correlation along with its strength. The choice of nuclear power plants is one of the most important strengths of this investigation. It has provided a data sheet with accurate observations of employee count. Internationally acclaimed newspapers have also contributed to this. The use of two different correlation coefficients, the regression coefficient and Pearson’s correlation coefficient, has provided the strength and nature of the correlation. Furthermore, the calculation of the percentage of employees infected with cancer has enabled the investigation to analyse the variation of cancer infected employees (dependent variable) in the observed data sheet. Lastly, the use of the $\chi^2$ test has provided the conclusion regarding the correlation.

However, a few weaknesses have been observed during this mathematical investigation. The immunity of the human body is very uncertain and cannot be generalised. Moreover, cancer is a disease on which research is still going on, and there are a lot of gaps or queries, such as the causes of cancer, which govern the rate of spreading of cancer. As there are a lot of variables affecting the dependent variable apart from the total employee count, the correlation study cannot be carried out efficiently. In order to employ an efficient correlative analysis on the research question, all of these parameters must be controlled or made constant.