Mathematics AI SL's Sample Internal Assessment

To what extent is there a correlation between the total number of employees working in a nuclear power plant and the number of employees getting infected by cancer, for three age groups?

6/7
10 mins read
Candidate Name: N/A
Candidate Number: N/A
Session: N/A
Word count: 1,967

Rationale

"Although September 11 was horrible, it didn't threaten the survival of the human race, like nuclear weapons do." – Stephen Hawking. Although acts of terrorism occur across the globe, it is the destructive power of the nuclear bomb that has left a lasting scar and fear in people around the world.

 

Since childhood, I have heard about the devastation of Hiroshima and Nagasaki in 1945, which marked the end of World War II. The destructive capacity of nuclear energy has been clear to me since my early school days.

 

During secondary education, my picture of nuclear power began to change when I studied nuclear energy as a non-conventional source of energy. Despite the initial doubts that arose from the catastrophic stories of my childhood, I learned that nuclear energy is seen as a major driver of change in supplying energy to mankind. The amount of energy that can be produced by a nuclear reaction is unparalleled by any other source of energy.

 

As time passed, the curriculum became more demanding and I began to learn the concepts of nuclear energy in depth. About two years ago, in Physics, I studied the nuclear fission reaction that generates nuclear energy. Gradually, I felt a deep inclination towards the subject. Because of the benefits it offers to mankind, I thought of pursuing higher studies in nuclear energy and working in a nuclear power station.

 

However, in Biology I recently studied the disease called cancer, and some of the facts shook my dream of working in a nuclear power plant. I learned that γ-rays can cause cancer. The fear I had held about the devastating effects of nuclear energy returned, because γ-rays are emitted in nuclear fission, the reaction by which nuclear energy is generated.

 

To overcome this fear and concentrate on my career, I did some research. I read a few journals on the side effects of nuclear energy. There were several reports of an increased chance of cancer when an individual is exposed to harmful γ-rays. However, I also came across many articles discussing the preventive measures taken in every nuclear power plant to protect employees from radiation. To be more confident, I read further news reports and articles, from which I learned that employees working in nuclear power plants do develop cancer. However, I could not find any information on how likely a nuclear plant employee is to be affected by cancer.

 

To find the answer, I am carrying out this mathematical exploration so that I can estimate the chances of being affected by cancer if I pursue my dream job.

Aim

The main motive of this investigation is to explore the correlation between the number of employees working in a nuclear power plant and the number of employees getting affected by cancer.

Research question

What is the relationship between the number of employees working in a Nuclear Power Station and the number of employees getting infected by cancer during the working period or after retirement for three different age groups – Gr 1: 50 years to 60 years, Gr 2: 60 years to 70 years and Gr 3: 70 years to 80 years?

Introduction

What is cancer

Cancer is a disease characterized by uncontrolled cell division. The repeated division of cells often leads to the formation of a tumor, cyst, fibroid, etc. Tumors are categorized into two types, benign and malignant; malignant tumors are considered cancerous. Cells of a malignant tumor can spread through the bloodstream and initiate the formation of tumors in other parts of the body. The growing tumor puts pressure on the vital organ in which it originated, which can lead to organ failure. A tumor can also constrict nearby blood vessels, resulting in increased heart rate and blood pressure and eventually increasing the chances of stroke or heart failure.

What causes cancer

There are several causative agents that trigger cells to divide in an uncontrolled manner. In the context of this mathematical exploration, radiation is the relevant cause. Radiations such as gamma rays and X-rays are well-established causative agents of cancer. These radiations carry enough energy to ionize atoms and cause mutations in human DNA. Once such a mutation occurs in a cell, the cell may begin to divide continuously without following the normal cell cycle, which leads to the formation of a malignant (cancerous) tumor.

 

Ultraviolet radiation gives another example. News reports and scientific research indicate that depletion of the ozone layer has allowed more harmful ultraviolet rays to pass through the Earth’s atmosphere, and this has been linked to an increase in cases of skin cancer. This illustrates the role of radiation in causing cancer.

Nuclear power plant

A nuclear power station, or nuclear power plant, is a power plant that generates energy by nuclear fission. The fission reaction is carried out in a nuclear reactor, and the heat it generates is used to convert water into steam. The steam is then used to run a turbine, which generates electricity.

 

The nuclear fission reaction is accompanied by the emission of radiation such as α-rays, β-rays and γ-rays. Of these, the γ-ray is the most penetrating and is considered the most harmful. The nuclear reactor is constructed so that leakage of radiation is kept as close to zero as possible. In addition, a number of preventive measures, such as protective clothing and regular medical check-ups, are taken for employees working in nuclear power plants. Despite such measures, instances of radiation leaks have been recorded that caused severe illness not only to employees but also to people living near the power plant. This is because γ-rays can pass through even inches of metal such as lead.

Regression correlation coefficient

The regression correlation coefficient is a tool to measure the strength of the correlation between the independent variable and the dependent variable. The set of values (x1, y1), (x2, y2), ..., (xn, yn) is used to find the value of r with the formula below:

 

\(r=\frac{n(\Sigma xy)-(\Sigma x)(\Sigma y)}{\sqrt{[n\Sigma x^2-(\Sigma x)^2][n\Sigma y^2-(\Sigma y)^2]}}\)

 

In the above-mentioned formula, x is the value of the independent variable of each observation, y is the value of the dependent variable of each observation, xy is the product of the independent and the dependent variable of each observation, n is the number of observations and \(\Sigma\) denotes the sum over all the observations of the mentioned variable.

 

By squaring the value of r, the value of the regression coefficient (\(r^2\)) is obtained. The value of \(r^2\) lies between 0 and 1, where 1 signifies a perfect linear correlation and 0 signifies no linear correlation.

Pearson’s correlation coefficient

Pearson’s correlation coefficient is a tool to measure both the strength and the nature (direction) of the correlation between the independent variable and the dependent variable. The set of values (x1, y1), (x2, y2), ..., (xn, yn) is used to find the value of \(\mathfrak{R}\) with the formula below:

 

\(\mathfrak{R}=\frac{\Sigma(x-\bar x)(y-\bar y)}{\sqrt{\Sigma (x-\bar x)^2 \times \Sigma (y-\bar y)^2}}\)

 

In the above-mentioned formula, x is the value of the independent variable of each observation, y is the value of the dependent variable of each observation, \(\bar x\) is the arithmetic mean of all the observations of the independent variable, \(\bar y\) is the arithmetic mean of all the observations of the dependent variable and \(\Sigma\) denotes the sum over all the observations of the mentioned variable.

 

The value of \(\mathfrak{R}\) lies between -1 and 1. A positive value of Pearson’s correlation coefficient implies a direct relationship between the independent and the dependent variable, whereas a negative value implies an inverse relationship. If the value of the correlation coefficient is close to 1 or -1, the correlation is strong. On the other hand, if the value is close to 0, there is little or no linear correlation.
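The raw-sum formula for r and the mean-deviation formula for \(\mathfrak{R}\) are two algebraic forms of the same quantity, so they should agree on any data set. The following is a minimal Python sketch of that check; the data values and the function names `r_from_sums` and `r_from_deviations` are illustrative assumptions, not data or notation from this exploration.

```python
import math

def r_from_sums(xs, ys):
    """Correlation coefficient from the raw-sum formula."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

def r_from_deviations(xs, ys):
    """Correlation coefficient from the mean-deviation formula."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den

# Hypothetical illustrative data (not taken from the tables in this document)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

print(round(r_from_sums(xs, ys), 4), round(r_from_deviations(xs, ys), 4))
# prints: 0.7746 0.7746
```

Because the two forms agree, the sign of the result gives the direction of the relationship and its square gives the value described above as the regression coefficient.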

Chi squared test

The Chi squared test is a kind of analysis that tests whether an association exists between two variables. The Chi squared value of the given set of data is calculated first. Then, based on the degrees of freedom of the data, the calculated value is compared with the critical value in the Chi squared table, which indicates whether an association exists.

 

The formula of Chi squared value is given below

 

\(\chi^2 \text{ value} = \sum \frac{(O_i - E_i)^2}{E_i} \)

 

Here, \(O_i\) is the observed value, \(E_i\) is the expected value and \(\sum\) denotes the sum over all the observations.

 

Now, the Chi squared value is checked in Chi squared table which predicts the existence of any correlation. The Chi squared table is shown below

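| df | 0.995 | 0.99 | 0.975 | 0.95 | 0.90 | 0.10 | 0.05 | 0.025 | 0.01 | 0.005 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | --- | --- | 0.001 | 0.004 | 0.016 | 2.706 | 3.841 | 5.024 | 6.635 | 7.879 |
| 2 | 0.010 | 0.020 | 0.051 | 0.103 | 0.211 | 4.605 | 5.991 | 7.378 | 9.210 | 10.597 |
| 3 | 0.072 | 0.115 | 0.216 | 0.352 | 0.584 | 6.251 | 7.815 | 9.348 | 11.345 | 12.838 |
| 4 | 0.207 | 0.297 | 0.484 | 0.711 | 1.064 | 7.779 | 9.488 | 11.143 | 13.277 | 14.860 |
| 5 | 0.412 | 0.554 | 0.831 | 1.145 | 1.610 | 9.236 | 11.070 | 12.833 | 15.086 | 16.750 |
| 6 | 0.676 | 0.872 | 1.237 | 1.635 | 2.204 | 10.645 | 12.592 | 14.449 | 16.812 | 18.548 |
| 7 | 0.989 | 1.239 | 1.690 | 2.167 | 2.833 | 12.017 | 14.067 | 16.013 | 18.475 | 20.278 |
| 8 | 1.344 | 1.646 | 2.180 | 2.733 | 3.490 | 13.362 | 15.507 | 17.535 | 20.090 | 21.955 |
| 9 | 1.735 | 2.088 | 2.700 | 3.325 | 4.168 | 14.684 | 16.919 | 19.023 | 21.666 | 23.589 |
| 10 | 2.156 | 2.558 | 3.247 | 3.940 | 4.865 | 15.987 | 18.307 | 20.483 | 23.209 | 25.188 |
| 11 | 2.603 | 3.053 | 3.816 | 4.575 | 5.578 | 17.275 | 19.675 | 21.920 | 24.725 | 26.757 |
| 12 | 3.074 | 3.571 | 4.404 | 5.226 | 6.304 | 18.549 | 21.026 | 23.337 | 26.217 | 28.300 |
| 13 | 3.565 | 4.107 | 5.009 | 5.892 | 7.042 | 19.812 | 22.362 | 24.736 | 27.688 | 29.819 |
| 14 | 4.075 | 4.660 | 5.629 | 6.571 | 7.790 | 21.064 | 23.685 | 26.119 | 29.141 | 31.319 |
| 15 | 4.601 | 5.229 | 6.262 | 7.261 | 8.547 | 22.307 | 24.996 | 27.488 | 30.578 | 32.801 |
| 16 | 5.142 | 5.812 | 6.908 | 7.962 | 9.312 | 23.542 | 26.296 | 28.845 | 32.000 | 34.267 |
| 17 | 5.697 | 6.408 | 7.564 | 8.672 | 10.085 | 24.769 | 27.587 | 30.191 | 33.409 | 35.718 |
| 18 | 6.265 | 7.015 | 8.231 | 9.390 | 10.865 | 25.989 | 28.869 | 31.526 | 34.805 | 37.156 |
| 19 | 6.844 | 7.633 | 8.907 | 10.117 | 11.651 | 27.204 | 30.144 | 32.852 | 36.191 | 38.582 |
| 20 | 7.434 | 8.260 | 9.591 | 10.851 | 12.443 | 28.412 | 31.410 | 34.170 | 37.566 | 39.997 |
| 21 | 8.034 | 8.897 | 10.283 | 11.591 | 13.240 | 29.615 | 32.671 | 35.479 | 38.932 | 41.401 |
| 22 | 8.643 | 9.542 | 10.982 | 12.338 | 14.041 | 30.813 | 33.924 | 36.781 | 40.289 | 42.796 |
| 23 | 9.260 | 10.196 | 11.689 | 13.091 | 14.848 | 32.007 | 35.172 | 38.076 | 41.638 | 44.181 |
| 24 | 9.886 | 10.856 | 12.401 | 13.848 | 15.659 | 33.196 | 36.415 | 39.364 | 42.980 | 45.559 |
| 25 | 10.520 | 11.524 | 13.120 | 14.611 | 16.473 | 34.382 | 37.652 | 40.646 | 44.314 | 46.928 |
| 26 | 11.160 | 12.198 | 13.844 | 15.379 | 17.292 | 35.563 | 38.885 | 41.923 | 45.642 | 48.290 |
| 27 | 11.808 | 12.879 | 14.573 | 16.151 | 18.114 | 36.741 | 40.113 | 43.195 | 46.963 | 49.645 |
| 28 | 12.461 | 13.565 | 15.308 | 16.928 | 18.939 | 37.916 | 41.337 | 44.461 | 48.278 | 50.993 |
| 29 | 13.121 | 14.256 | 16.047 | 17.708 | 19.768 | 39.087 | 42.557 | 45.722 | 49.588 | 52.336 |
| 30 | 13.787 | 14.953 | 16.791 | 18.493 | 20.599 | 40.256 | 43.773 | 46.979 | 50.892 | 53.672 |
| 40 | 20.707 | 22.164 | 24.433 | 26.509 | 29.051 | 51.805 | 55.758 | 59.342 | 63.691 | 66.766 |
| 50 | 27.991 | 29.707 | 32.357 | 34.764 | 37.689 | 63.167 | 67.505 | 71.420 | 76.154 | 79.490 |
| 60 | 35.534 | 37.485 | 40.482 | 43.188 | 46.459 | 74.397 | 79.082 | 83.298 | 88.379 | 91.952 |
| 70 | 43.275 | 45.442 | 48.758 | 51.739 | 55.329 | 85.527 | 90.531 | 95.023 | 100.425 | 104.215 |
| 80 | 51.172 | 53.540 | 57.153 | 60.391 | 64.278 | 96.578 | 101.879 | 106.629 | 112.329 | 116.321 |
| 90 | 59.196 | 61.754 | 65.647 | 69.126 | 73.291 | 107.565 | 113.145 | 118.136 | 124.116 | 128.299 |
| 100 | 67.328 | 70.065 | 74.222 | 77.929 | 82.358 | 118.498 | 124.342 | 129.561 | 135.807 | 140.169 |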

Figure 1 - Chi Squared Table

Hypothesis

Null hypothesis

It is assumed that there is no correlation between the number of employees working in a Nuclear Power Station and the number of employees getting infected by cancer during the working period or after retirement for three different age groups – Gr 1: 50 years to 60 years, Gr 2: 60 years to 70 years and Gr 3: 70 years to 80 years.

Alternate hypothesis

It is assumed that there is a correlation between the number of employees working in a Nuclear Power Station and the number of employees getting infected by cancer during the working period or after retirement for three different age groups – Gr 1: 50 years to 60 years, Gr 2: 60 years to 70 years and Gr 3: 70 years to 80 years.

Data collection

Source of data

A data sheet has been prepared from several news articles, reports and surveys on different nuclear power plants across the globe. It has been possible to record the number of employees who were infected by cancer during their tenure of service because of the health insurance policy that each company offers to all its employees. Similarly, the health status of retired employees has been obtained from the health benefits that the company continues to offer after retirement.

Justification on categorization of age groups

The employees working in nuclear power plants have been categorized into three age groups to illustrate the correlation more thoroughly. It has been reported that immunity against cancer is generally stronger at a young age than in old age. However, there are many exceptions; in some older people, cancer-causing mutations are triggered after much less exposure to radiation than in others. On the other hand, there are cases where an individual was exposed to cancer-causing radiation at a young age but the cancer was detected much later in life. Thus, the age groups were chosen with the strength of an individual's immunity in mind.

Raw Data Table

| Name | Total | Infected |
|---|---|---|
| Byron Nuclear Power Station | 329 | 34 |
| Peach Bottom Atomic Power Station | 347 | 37 |
| Oconee Nuclear Station | 387 | 47 |
| Braidwood Generating Station | 451 | 71 |
| South Texas Project Electric Generating Station | 459 | 52 |
| Susquehanna Nuclear Power Plant | 674 | 89 |
| Mcguire Nuclear Power Plant | 725 | 103 |
| Browns Ferry Nuclear Plant | 978 | 178 |
| Palo Verde Generation Station | 1564 | 302 |
| Vogtle Nuclear Power Station | 3875 | 879 |

Figure 2 - Table On Total No. of Employees vs. No. of Employees Infected (Gr1: 50 – 60 Years)

| Name | Total | Infected |
|---|---|---|
| Byron Nuclear Power Station | 334 | 37 |
| Peach Bottom Atomic Power Station | 345 | 38 |
| Oconee Nuclear Station | 379 | 58 |
| Braidwood Generating Station | 463 | 98 |
| South Texas Project Electric Generating Station | 487 | 103 |
| Susquehanna Nuclear Power Plant | 621 | 115 |
| Mcguire Nuclear Power Plant | 798 | 145 |
| Browns Ferry Nuclear Plant | 970 | 161 |
| Palo Verde Generation Station | 1498 | 298 |
| Vogtle Nuclear Power Station | 3389 | 789 |

Figure 3 - Table On Total No. of Employees vs. No. of Employees Infected (Gr2: 60 – 70 years):

| Name | Total | Infected |
|---|---|---|
| Byron Nuclear Power Station | 289 | 46 |
| Peach Bottom Atomic Power Station | 297 | 52 |
| Oconee Nuclear Station | 303 | 67 |
| Braidwood Generating Station | 401 | 132 |
| South Texas Project Electric Generating Station | 432 | 136 |
| Susquehanna Nuclear Power Plant | 543 | 105 |
| Mcguire Nuclear Power Plant | 641 | 187 |
| Browns Ferry Nuclear Plant | 879 | 190 |
| Palo Verde Generation Station | 1273 | 398 |
| Vogtle Nuclear Power Station | 2894 | 982 |

Figure 4 - Table On Total No. of Employees vs. No. of Employees Infected (Gr3: 70 – 80 Years):

Processed data table

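| Total No. of Employees | Infected Employees | Percentage (%) |
|---|---|---|
| 329 | 34 | 10.33 |
| 347 | 37 | 10.66 |
| 387 | 47 | 12.14 |
| 451 | 71 | 15.74 |
| 459 | 52 | 11.32 |
| 674 | 89 | 13.20 |
| 725 | 103 | 14.20 |
| 978 | 178 | 18.20 |
| 1564 | 302 | 19.30 |
| 3875 | 879 | 22.68 |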

Figure 5 - Table On Processed Data Table For Gr. 1

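| Total No. of Employees | Infected Employees | Percentage (%) |
|---|---|---|
| 334 | 37 | 11.08 |
| 345 | 38 | 11.01 |
| 379 | 58 | 15.30 |
| 463 | 98 | 21.17 |
| 487 | 103 | 21.15 |
| 621 | 115 | 18.52 |
| 798 | 145 | 18.17 |
| 970 | 161 | 16.60 |
| 1498 | 298 | 19.89 |
| 3389 | 789 | 23.28 |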

Figure 6 - Table On Processed Data Table For Gr. 2

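| Total No. of Employees | Infected Employees | Percentage (%) |
|---|---|---|
| 289 | 46 | 15.92 |
| 297 | 52 | 17.51 |
| 303 | 67 | 22.11 |
| 401 | 132 | 32.92 |
| 432 | 136 | 31.48 |
| 543 | 105 | 19.34 |
| 641 | 187 | 29.17 |
| 879 | 190 | 21.62 |
| 1273 | 398 | 31.26 |
| 2894 | 982 | 33.93 |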

Figure 7 - Table On Processed Data Table For Gr. 3

Sample Calculation

 

Percentage of Infected Employees \(= \frac{34}{329} \times 100 = 10.33\%\)
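The same calculation can be repeated for every plant in a group. The following is a minimal Python sketch for Group 1, using the totals and infected counts listed in the raw data table; the function name `percentage` is an illustrative choice. Rounding to two decimals is assumed here, so a few entries in the processed data tables may differ in the last digit.

```python
# Group 1 (50 to 60 years): (total employees, infected employees) per plant,
# copied from the raw data table for this group
group1 = [(329, 34), (347, 37), (387, 47), (451, 71), (459, 52),
          (674, 89), (725, 103), (978, 178), (1564, 302), (3875, 879)]

def percentage(total, infected):
    """Infected employees as a percentage of the total, rounded to 2 decimals."""
    return round(100 * infected / total, 2)

percentages = [percentage(total, infected) for total, infected in group1]

print(percentages[0])   # first plant: 10.33
print(percentages[-1])  # last plant: 22.68
```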

Analysis of processed data table

In Figure 5 to Figure 7, the percentage of employees infected by cancer out of the total number of employees has been found. As the total number of employees (independent variable) does not increase in regular intervals, the mean and standard deviation would not serve any purpose in analyzing the data. Rather, the number of employees infected by cancer depends on the total number of employees working in that particular power plant. Thus, the percentage has been calculated.

 

In Figure 5, the percentage of employees infected by cancer ranges between 10% and 23%, and the number of infected employees increases with the total number of employees working in a power plant. Similarly, in Figure 6, the percentage of infected employees ranges between 11% and 23%, with about 11% in the power plants with the fewest employees and 23% in the power plant with the most employees. In Figure 7, as the age group is between 70 years and 80 years, it can be assumed that the total number of employees who worked for the power plants is lower because of deaths at this age. Thus, the total number of employees currently alive is less than that of the other groups. On the other hand, the percentage of infected employees is higher than in the other groups, ranging between 16% and 34%, with 16% in the power plant with the fewest employees and 34% in the power plant with the most employees.

Graphical analysis

Figure 8 - Total No. of Employees vs. No. of Employees Infected (Gr1: 50 – 60 Years)

Figure 9 - Total No. of Employees vs. No. of Employees Infected (Gr2: 60 – 70 Years)

Figure 10 - Total No. of Employees vs. No. of Employees Infected (Gr3: 70 – 80 Years)

Choice of Axes

The X – Axis of the graph denotes the total number of employees working or worked in nuclear power plants (independent variable).

 

The Y – Axis of the graph denotes number of employees who are currently infected by cancer (dependent variable).

Trendline for linear correlation

In Figures 8 to 10, a linear trendline has been obtained using the data that has been collected from the official websites of the nuclear power plants, newspapers, journals, articles etc.

 

In Figure 8, the equation of trendline is

 

y = 0.2386x - 54.366

 

In Figure 9, the equation of trendline is

 

y = 0.2401x - 38.697

 

In Figure 10, the equation of trendline is

 

y = 0.352x - 50.422

 

From the graphs, it can be stated that there is a positive correlation between the number of employees infected by cancer and the total number of employees who currently work, or previously worked, in the nuclear power plants. However, a few outliers have been noticed in the graphs as well.
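The trendline equations are consistent with an ordinary least-squares fit. The following is a minimal Python sketch that recomputes the Group 1 trendline from the raw data; it assumes the graphing tool used a standard least-squares line, and the function name `linear_fit` is an illustrative choice.

```python
def linear_fit(xs, ys):
    """Least-squares slope and intercept of the line y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Group 1 (50 to 60 years): total employees (x) and infected employees (y)
x1 = [329, 347, 387, 451, 459, 674, 725, 978, 1564, 3875]
y1 = [34, 37, 47, 71, 52, 89, 103, 178, 302, 879]

slope, intercept = linear_fit(x1, y1)
print(round(slope, 4), round(intercept, 3))  # 0.2386 -54.366
```

The slope can be read as roughly 24 additional infected employees for every 100 additional employees in this age group.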

Outliers

There are a few outliers when the total number of employees is in the range of 500 to 750. Because there are very few outliers, the value of the regression coefficient is about 0.99. Such a high value (close to one) supports the existence of a linear correlation between the dependent and the independent variable.

Intercept for linear correlation

From the equation of the trendline of figure 8, the Y – intercept of the trendline has been calculated

 

y = 0.2386x - 54.366

 

The value of y for x = 0 will be

 

y = 0.2386 × 0 - 54.366

 

=> y = -54.366

 

From the equation of the trendline of figure 9, the Y – intercept of the trendline has been calculated

 

y = 0.2401x - 38.697

 

The value of y for x = 0 will be

 

y = 0.2401 × 0 - 38.697

 

=> y = -38.697

 

From the equation of the trendline of figure 10, the Y – intercept of the trendline has been calculated

 

y = 0.352x - 50.422

 

The value of y for x = 0 will be

 

y = 0.352 × 0 - 50.422

 

=>  y = -50.422

 

The value of the Y – intercept is -54.366, -38.697 and -50.422 for Figure 8, Figure 9 and Figure 10 respectively. Taken literally, a negative intercept suggests that if the total number of employees is zero, the number of infected individuals would be negative, which is not possible. The negative values arise because the trendline is extended below the range of the collected data (the smallest plant has about 290 employees). The meaningful reading is that if the total number of employees is zero, there cannot be any infected employee either.

 

From the equation of the trendline of figure 8, the X – intercept of the trendline has been calculated

 

y = 0.2386x - 54.366

 

The value of x for y = 0 will be

 

0 = 0.2386x - 54.366

 

\(=> x = \frac{54.366}{0.2386}\)

 

=> x = 227.85

 

From the equation of the trendline of figure 9, the X – intercept of the trendline has been calculated

 

y = 0.2401x - 38.697

 

The value of x for y = 0 will be

 

0 = 0.2401x - 38.697

 

\(=> x = \frac{38.697}{0.2401}\)

 

=> x = 161.17

 

From the equation of the trendline of figure 10, the X – intercept of the trendline has been calculated

 

y = 0.352x - 50.422

 

The value of x for y = 0 will be

 

0 = 0.352x - 50.422

 

\(=> x = \frac{50.422}{0.352}\)

 

=> x = 143.24

 

The value of the X – intercept is approximately 228, 161 and 143 for Figure 8, Figure 9 and Figure 10 respectively. Mathematically, this means the trendline predicts no infected employees when the total number of employees is 228 for the age group 50 to 60 years, 161 for the age group 60 to 70 years and 143 for the age group 70 to 80 years. These values lie below the smallest plant size in the data, so they are extrapolations.
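The three X – intercepts follow directly from setting y = 0 in each trendline, which gives x = -c/m for a line y = mx + c. A minimal Python sketch of this calculation (the dictionary and function names are illustrative):

```python
# Trendline coefficients (slope m, intercept c) for y = m * x + c,
# as stated for Figure 8, Figure 9 and Figure 10
trendlines = {
    "Group 1": (0.2386, -54.366),
    "Group 2": (0.2401, -38.697),
    "Group 3": (0.352, -50.422),
}

def x_intercept(slope, intercept):
    """Value of x at which the line y = slope * x + intercept crosses y = 0."""
    return -intercept / slope

x_intercepts = {name: x_intercept(m, c) for name, (m, c) in trendlines.items()}

for name, x0 in x_intercepts.items():
    print(f"{name}: x-intercept = {x0:.2f}")
```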

Calculation of correlation coefficient

Calculation of regression correlation coefficient

Processed Data for calculation of R2

 

There are five headers in the processed data tables: x, y, \(x^2\), \(y^2\) and xy. The total number of employees is represented by x and the number of employees infected by cancer is represented by y. The remaining headers have their usual meaning. The calculation of the \(R^2\) correlation coefficient is shown to explore the reliability of the trendline and the correlation.

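| x | y | \(x^2\) | \(y^2\) | xy |
|---|---|---|---|---|
| 329 | 34 | 108241 | 1156 | 11186 |
| 347 | 37 | 120409 | 1369 | 12839 |
| 387 | 47 | 149769 | 2209 | 18189 |
| 451 | 71 | 203401 | 5041 | 32021 |
| 459 | 52 | 210681 | 2704 | 23868 |
| 674 | 89 | 454276 | 7921 | 59986 |
| 725 | 103 | 525625 | 10609 | 74675 |
| 978 | 178 | 956484 | 31684 | 174084 |
| 1564 | 302 | 2446096 | 91204 | 472328 |
| 3875 | 879 | 15015625 | 772641 | 3406125 |

∑x = 9789, ∑y = 1792, ∑\(x^2\) = 20190607, ∑\(y^2\) = 926538, ∑xy = 4285301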

Figure 11 - Table On Processed Data For Calculation Of R2 For Group 1

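| x | y | \(x^2\) | \(y^2\) | xy |
|---|---|---|---|---|
| 334 | 37 | 111556 | 1369 | 12358 |
| 345 | 38 | 119025 | 1444 | 13110 |
| 379 | 58 | 143641 | 3364 | 21982 |
| 463 | 98 | 214369 | 9604 | 45374 |
| 487 | 103 | 237169 | 10609 | 50161 |
| 621 | 115 | 385641 | 13225 | 71415 |
| 798 | 145 | 636804 | 21025 | 115710 |
| 970 | 161 | 940900 | 25921 | 156170 |
| 1498 | 298 | 2244004 | 88804 | 446404 |
| 3389 | 789 | 11485321 | 622521 | 2673921 |

∑x = 9284, ∑y = 1842, ∑\(x^2\) = 16518430, ∑\(y^2\) = 797886, ∑xy = 3606605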

Figure 12 - Table On Processed Data For Calculation Of R2 For Group 2

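| x | y | \(x^2\) | \(y^2\) | xy |
|---|---|---|---|---|
| 289 | 46 | 83521 | 2116 | 13294 |
| 297 | 52 | 88209 | 2704 | 15444 |
| 303 | 67 | 91809 | 4489 | 20301 |
| 401 | 132 | 160801 | 17424 | 52932 |
| 432 | 136 | 186624 | 18496 | 58752 |
| 543 | 105 | 294849 | 11025 | 57015 |
| 641 | 187 | 410881 | 34969 | 119867 |
| 879 | 190 | 772641 | 36100 | 167010 |
| 1273 | 398 | 1620529 | 158404 | 506654 |
| 2894 | 982 | 8375236 | 964324 | 2841908 |

∑x = 7952, ∑y = 2295, ∑\(x^2\) = 12085100, ∑\(y^2\) = 1250051, ∑xy = 3853177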

Figure 13 - Table On Processed Data For Calculation Of R2 For Group 3

The formula of the regression coefficient as mentioned in the background information has been used to find the correlation coefficient. Here, x is the value of the independent variable of each observation, y is the value of the dependent variable of each observation, xy is the product of the independent and the dependent variable of each observation, n is the number of observations and \(\sum\) denotes the sum over all the observations of the mentioned variable.

 

Calculation for Group 1

 

\(r = \frac{n(\sum xy)-(\sum x)(\sum y)}{\sqrt{[n\sum x^2-(\sum x)^2][n\sum y^2-(\sum y)^2]}}\)

 

\(=>r=\frac{10(4285301)-(9789)(1792)}{\sqrt{[10 × 20190607-(9789)^2][10 × 926538 - (1792)^2]}}\)

 

=> r = 0.9987

 

=> r2 = 0.9975

 

Calculation for Group 2

 

\(r = \frac{n(\sum xy)-(\sum x)(\sum y)}{\sqrt{[n\sum x^2-(\sum x)^2][n\sum y^2-(\sum y)^2]}}\)

 

\(=>r = \frac{10(3606605)-(9284)(1842)}{\sqrt{[10 × 16518430-(9284)^2][10 × 797886-(1842)^2]}}\)

 

=> r = 0.9964

 

=> r2 = 0.9929

 

Calculation for Group 3

 

\(r = \frac{n(\sum xy)-(\sum x)(\sum y)}{\sqrt{[n\sum x^2-(\sum x)^2][n\sum y^2-(\sum y)^2]}}\)

 

\(=>r= \frac{10(3853177)-(7952)(2295)}{\sqrt{[10 × 12085100 - (7952)^2][10 × 1250051 - (2295)^2]}}\)

 

=> r = 0.993

 

=> r2 = 0.987
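The three substitutions above can be cross-checked in a few lines. The following is a minimal Python sketch that applies the raw-sum formula to the column totals listed under Figures 11 to 13; the dictionary keys and the function name `r_from_totals` are illustrative choices.

```python
import math

# Column totals listed under Figures 11, 12 and 13 (n = 10 plants per group)
totals = {
    "Group 1": {"n": 10, "sx": 9789, "sy": 1792, "sxx": 20190607, "syy": 926538, "sxy": 4285301},
    "Group 2": {"n": 10, "sx": 9284, "sy": 1842, "sxx": 16518430, "syy": 797886, "sxy": 3606605},
    "Group 3": {"n": 10, "sx": 7952, "sy": 2295, "sxx": 12085100, "syy": 1250051, "sxy": 3853177},
}

def r_from_totals(n, sx, sy, sxx, syy, sxy):
    """Correlation coefficient r computed from the five column totals."""
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return numerator / denominator

results = {name: r_from_totals(**t) for name, t in totals.items()}

for name, r in results.items():
    print(f"{name}: r = {r:.4f}, r^2 = {r * r:.4f}")
```

The printed values agree with the results above up to rounding in the last digit.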

Analysis

The value of the regression coefficient is 0.9975, 0.9929 and 0.987 for group 1, group 2 and group 3 respectively. Such a high value (close to one) supports the existence of a strong linear correlation between the dependent and the independent variable.

Calculation of pearson’s correlation coefficient

Processed Data Table for calculation of Pearson’s Correlation:

 

There are seven headers in the processed data table for the calculation of Pearson’s correlation coefficient: x, y, \(x-\bar x\), \(y- \bar y\), \((x-\bar x)(y-\bar y)\), \((x-\bar x)^2\) and \((y- \bar y)^2\). The total number of employees is represented by x and the number of employees infected by cancer is represented by y; \(\bar x\) is the arithmetic mean of all the observations of the total number of employees and \(\bar y\) is the arithmetic mean of all the observations of the number of employees infected by cancer. The remaining headers have their usual meaning. The calculation of Pearson’s correlation coefficient is shown to explore the reliability of the trendline and the correlation.

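| x | y | \(x-\bar x\) | \(y-\bar y\) | \((x-\bar x)(y-\bar y)\) | \((x-\bar x)^2\) | \((y-\bar y)^2\) |
|---|---|---|---|---|---|---|
| 329 | 34 | -649.9 | -145.2 | 94365.48 | 422370.01 | 21083.04 |
| 347 | 37 | -631.9 | -142.2 | 89856.18 | 399297.61 | 20220.84 |
| 387 | 47 | -591.9 | -132.2 | 78249.18 | 350345.61 | 17476.84 |
| 451 | 71 | -527.9 | -108.2 | 57118.78 | 278678.41 | 11707.24 |
| 459 | 52 | -519.9 | -127.2 | 66131.28 | 270296.01 | 16179.84 |
| 674 | 89 | -304.9 | -90.2 | 27501.98 | 92964.01 | 8136.04 |
| 725 | 103 | -253.9 | -76.2 | 19347.18 | 64465.21 | 5806.44 |
| 978 | 178 | -0.9 | -1.2 | 1.08 | 0.81 | 1.44 |
| 1564 | 302 | 585.1 | 122.8 | 71850.28 | 342342.01 | 15079.84 |
| 3875 | 879 | 2896.1 | 699.8 | 2026690.78 | 8387395.21 | 489720.04 |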

Figure 14 - Table On Processed Data Table For Calculation Of Pearson’s Correlation Coefficient For Group 1

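| x | y | \(x-\bar x\) | \(y-\bar y\) | \((x-\bar x)(y-\bar y)\) | \((x-\bar x)^2\) | \((y-\bar y)^2\) |
|---|---|---|---|---|---|---|
| 334 | 37 | -594.4 | -147.2 | 87495.68 | 353311.36 | 21667.84 |
| 345 | 38 | -583.4 | -146.2 | 85293.08 | 340355.56 | 21374.44 |
| 379 | 58 | -549.4 | -126.2 | 69334.28 | 301840.36 | 15926.44 |
| 463 | 98 | -465.4 | -86.2 | 40117.48 | 216597.16 | 7430.44 |
| 487 | 103 | -441.4 | -81.2 | 35841.68 | 194833.96 | 6593.44 |
| 621 | 115 | -307.4 | -69.2 | 21272.08 | 94494.76 | 4788.64 |
| 798 | 145 | -130.4 | -39.2 | 5111.68 | 17004.16 | 1536.64 |
| 970 | 161 | 41.6 | -23.2 | -965.12 | 1730.56 | 538.24 |
| 1498 | 298 | 569.6 | 113.8 | 64820.48 | 324444.16 | 12950.44 |
| 3389 | 789 | 2460.6 | 604.8 | 1488170.88 | 6054552.36 | 365783.04 |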

Figure 15 - Table On Processed Data Table For Calculation Of Pearson’s Correlation Coefficient For Group 2

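| x | y | \(x-\bar x\) | \(y-\bar y\) | \((x-\bar x)(y-\bar y)\) | \((x-\bar x)^2\) | \((y-\bar y)^2\) |
|---|---|---|---|---|---|---|
| 289 | 46 | -506.2 | -183.5 | 92887.7 | 256238.44 | 33672.25 |
| 297 | 52 | -498.2 | -177.5 | 88430.5 | 248203.24 | 31506.25 |
| 303 | 67 | -492.2 | -162.5 | 79982.5 | 242260.84 | 26406.25 |
| 401 | 132 | -394.2 | -97.5 | 38434.5 | 155393.64 | 9506.25 |
| 432 | 136 | -363.2 | -93.5 | 33959.2 | 131914.24 | 8742.25 |
| 543 | 105 | -252.2 | -124.5 | 31398.9 | 63604.84 | 15500.25 |
| 641 | 187 | -154.2 | -42.5 | 6553.5 | 23777.64 | 1806.25 |
| 879 | 190 | 83.8 | -39.5 | -3310.1 | 7022.44 | 1560.25 |
| 1273 | 398 | 477.8 | 168.5 | 80509.3 | 228292.84 | 28392.25 |
| 2894 | 982 | 2098.8 | 752.5 | 1579347 | 4404961.44 | 566256.25 |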

Figure 16 - Table On Processed Data Table For Calculation Of Pearson’s Correlation Coefficient For Group 3

The formula of Pearson’s correlation coefficient as mentioned in the background information has been used to find the correlation coefficient. Here, x is the value of the independent variable of each observation, y is the value of the dependent variable of each observation, \(\bar x\) is the arithmetic mean of all the observations of the independent variable, \(\bar y\) is the arithmetic mean of all the observations of the dependent variable and \(\sum\) denotes the sum over all the observations of the mentioned variable.

 

Calculation for Group 1 (Figure 14)

 

\(\bar x = \frac{\sum x}{N}=\frac{9789}{10} = 978.9\)

 

\(\bar y = \frac{\sum y}{N}=\frac{1792}{10}=179.2\)

 

\( \displaystyle \sum (x - \bar{x})(y - \bar{y}) = 2531112.2 \)

 

\(\displaystyle \sum (x-\bar x)^2=10608154.9\)

 

\(\displaystyle\sum(y-\bar y)^2=605411.6\)

 

Let, the Pearson’s Correlation Coefficient be \(\mathfrak{R}\).

 

\(\mathfrak{R}=\frac{\sum{({x}-\bar{{x}})({y}-\bar{{y}})}}{\sqrt{\sum{({x}-\bar{{x}})}^{2}\times\sum{({y}-\bar{{y}})}^{2}}}\)

 

\(\mathfrak{R}=\frac{2531112.2}{\sqrt{10608154.9\times605411.6}}=0.998\)

 

Calculation for Group 2 (Figure 15)

 

\(\bar x=\frac{\sum x}{N}= \frac{9284}{10}=928.4\)

 

\(\bar y = \frac{\sum y}{N}=\frac{1842}{10}=184.2\)

 

\(\displaystyle\sum (x-\bar x)(y-\bar y)=1896492.2\)

 

\(\displaystyle \sum (x- \bar x)^2=7899164.4\)

 

\(\displaystyle \sum (y-\bar y)^2=458589.6\)

 

Let, the Pearson’s Correlation Coefficient be \(\mathfrak{R}\).

 

\(\mathfrak{R}=\frac{\sum{({x}-\bar{{x}})({y}-\bar{{y}})}}{\sqrt{\sum{({x}-\bar{{x}})}^{2}\times\sum{({y}-\bar{{y}})}^{2}}}\)

 

\(\mathfrak{R}=\frac{1896492.2}{\sqrt{7899164.4\times458589.6}}=0.996\)

 

Calculation for Group 3 (Figure 16)

 

\(\bar x=\frac{\sum x}{N}=\frac{7952}{10}=795.2\)

 

\(\bar y = \frac{\sum y}{N} = \frac{2295}{10}=229.5\)

 

\(\displaystyle \sum (x-\bar x)(y -\bar y)=2028193\)

 

\(\displaystyle \sum(x-\bar x)^2 = 5761669.6\)

 

\(\displaystyle \sum (y-\bar y)^2= 723348.5\)

 

Let, the Pearson’s Correlation Coefficient be \(\mathfrak{R}\).

 

\(\mathfrak{R}=\frac{\sum{({x}-\bar{{x}})({y}-\bar{{y}})}}{\sqrt{\sum{({x}-\bar{{x}})}^{2}\times\sum{({y}-\bar{{y}})}^{2}}}\)

 

\(\mathfrak{R}=\frac{2028193}{\sqrt{5761669.6\times723348.5}}=0.993\)

Analysis

The values of Pearson’s correlation coefficient for the three groups are 0.998, 0.996 and 0.993 respectively. As each value is positive, it can be stated that the correlation is increasing in nature, i.e., with an increase in the total number of employees who work or worked in the nuclear power plant, the number of employees infected by cancer also increases. Moreover, each value is very close to one, which signifies that the correlation is very strong.

Evaluation of hypothesis

The hypothesis has been evaluated with the help of the Chi squared (\(\chi^2\)) test in this section of this mathematical exploration. The test indicates whether the null hypothesis can be rejected in favour of the alternate hypothesis.

Processed data table

Figure 17 - Table On Observed Data For Evaluation Of \(\chi^2\) Test

Figure 18 - Table On Expected Data For Evaluation Of \(\chi^2\) Test

Calculation of \(\chi^2\)

| Observed Value (O) | Expected Value (E) | O - E | \((O-E)^2\) | \(\frac{(O-E)^2}{E}\) |
|---|---|---|---|---|
| 10.33 | 9.52391937 | 0.80608063 | 0.64976598 | 0.06822464 |
| 11.08 | 11.3543268 | -0.2743268 | 0.07525519 | 0.00662789 |
| 15.92 | 16.4517538 | -0.5317538 | 0.2827621 | 0.01718735 |
| 10.66 | 9.99590573 | 0.66409427 | 0.4410212 | 0.04412018 |
| 11.01 | 11.9170245 | -0.9070245 | 0.82269344 | 0.06903514 |
| 17.51 | 17.2670698 | 0.2429302 | 0.05901508 | 0.00341778 |
| 12.14 | 12.6415806 | -0.5015806 | 0.2515831 | 0.01990124 |
| 15.3 | 15.0711732 | 0.2288268 | 0.0523617 | 0.0034743 |
| 22.11 | 21.8372462 | 0.2727538 | 0.07439464 | 0.00340678 |
| 15.74 | 17.8155717 | -2.0755717 | 4.30799788 | 0.24181081 |
| 21.17 | 21.2395565 | -0.0695565 | 0.00483811 | 0.00022779 |
| 32.92 | 30.7748719 | 2.1451281 | 4.60157457 | 0.14952376 |
| 11.32 | 16.3154204 | -4.9954204 | 24.954225 | 1.5294871 |
| 21.15 | 19.4510903 | 1.6989097 | 2.88629417 | 0.14838727 |
| 31.48 | 28.1834893 | 3.2965107 | 10.8669828 | 0.38557975 |
| 13.2 | 13.0268235 | 0.1731765 | 0.0299901 | 0.00230218 |
| 18.52 | 15.5304561 | 2.9895439 | 8.93737273 | 0.57547394 |
| 19.34 | 22.5027203 | -3.1627203 | 10.0027997 | 0.44451513 |
| 14.2 | 15.7005625 | -1.5005625 | 2.25168782 | 0.14341447 |
| 18.17 | 18.7180625 | -0.5480625 | 0.3003725 | 0.0160472 |
| 29.17 | 27.121375 | 2.048625 | 4.19686439 | 0.15474379 |
| 18.2 | 14.3943084 | 3.8056916 | 14.4832886 | 1.00618162 |
| 16.6 | 17.1607586 | -0.5607586 | 0.31445021 | 0.01832379 |
| 21.62 | 24.864933 | -3.244933 | 10.5295902 | 0.42347149 |
| 19.3 | 17.9737509 | 1.3262491 | 1.75893668 | 0.09786141 |
| 19.89 | 21.4281362 | -1.5381362 | 2.36586297 | 0.11040918 |
| 31.26 | 31.0481129 | 0.2118871 | 0.04489614 | 0.00144602 |
| 22.68 | 20.3821569 | 2.2978431 | 5.28008291 | 0.25905418 |
| 23.28 | 24.2994152 | -1.0194152 | 1.03920735 | 0.04276676 |
| 33.93 | 35.2084278 | -1.2784278 | 1.63437764 | 0.04642007 |

Figure 19 - Table On Calculation Of χ²

\(\displaystyle \chi^2 = \sum \frac{(O-E)^2}{E} = 0.068 + 0.007 + \dots + 0.046 = 6.032843\)
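The summation can be reproduced with a short script. This is only a sketch: the observed and expected values are copied from the calculation table above, while the variable names (`cells`, `chi_squared`) are chosen here for illustration and are not part of the original exploration.

```python
# (observed, expected) pairs copied from the chi-squared calculation table.
cells = [
    (10.33, 9.52391937), (11.08, 11.3543268), (15.92, 16.4517538),
    (10.66, 9.99590573), (11.01, 11.9170245), (17.51, 17.2670698),
    (12.14, 12.6415806), (15.3, 15.0711732), (22.11, 21.8372462),
    (15.74, 17.8155717), (21.17, 21.2395565), (32.92, 30.7748719),
    (11.32, 16.3154204), (21.15, 19.4510903), (31.48, 28.1834893),
    (13.2, 13.0268235), (18.52, 15.5304561), (19.34, 22.5027203),
    (14.2, 15.7005625), (18.17, 18.7180625), (29.17, 27.121375),
    (18.2, 14.3943084), (16.6, 17.1607586), (21.62, 24.864933),
    (19.3, 17.9737509), (19.89, 21.4281362), (31.26, 31.0481129),
    (22.68, 20.3821569), (23.28, 24.2994152), (33.93, 35.2084278),
]

# Chi-squared statistic: sum of (O - E)^2 / E over every cell.
chi_squared = sum((o - e) ** 2 / e for o, e in cells)

print(f"Number of cells: {len(cells)}")
print(f"Chi-squared statistic: {chi_squared:.6f}")
```

Each term of the sum corresponds to one row of the table, so the printed total can be checked against the value reported above.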

Calculation of degree of freedom

Degree of Freedom = (Columns - 1) × (Rows - 1) = (3 - 1) × (10 - 1) = 2 × 9 = 18
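The same count can be written as a small helper. This is a minimal sketch, assuming the data are laid out as 10 rows (one per power station) and 3 columns (one per age group). The function name `degrees_of_freedom` is hypothetical.

```python
def degrees_of_freedom(rows, columns):
    """Degrees of freedom for a table with the given number of rows and columns."""
    return (rows - 1) * (columns - 1)

# Assumed layout: 10 power stations (rows) and 3 age groups (columns).
dof = degrees_of_freedom(rows=10, columns=3)
print(dof)
```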

Evaluation

Examining the value of χ² with respect to the degrees of freedom using the table shown in the Background Information section, it is concluded that the null hypothesis is rejected and the alternate hypothesis is accepted.

Conclusion

What is the relationship between the number of employees working in a Nuclear Power Station and the number of employees getting infected by cancer during the working period or after retirement for three different age groups – Gr 1: 50 years to 60 years, Gr 2: 60 years to 70 years and Gr 3: 70 years to 80 years?

 

The relationship between the number of employees who work or have worked in a nuclear power plant and the number of those employees who are or were infected by cancer is direct, i.e., with an increase in the total number of employees, the number of employees infected by cancer also increases.

  • The equation of trendline for Group 1, i.e., the age group of 50 to 60 years, is: y = 0.2386x - 54.366.
  • The equation of trendline for Group 2, i.e., the age group of 60 to 70 years, is: y = 0.2401x - 38.697.
  • The equation of trendline for Group 3, i.e., the age group of 70 to 80 years, is: y = 0.352x - 50.422.
  • The value of the regression coefficient for Group 1 is 0.997, which supports the existence of an increasing correlation between the independent and the dependent variable.
  • The value of the regression coefficient for Group 2 is 0.992, which supports the existence of an increasing correlation between the independent and the dependent variable.
  • The value of the regression coefficient for Group 3 is 0.987, which supports the existence of an increasing correlation between the independent and the dependent variable.
  • The value of Pearson’s correlation coefficient for Group 1 is 0.998. The positive value signifies that the correlation is increasing (a direct relation) in nature, and such a high value (close to 1) supports the existence of a strong correlation.
  • The value of Pearson’s correlation coefficient for Group 2 is 0.996. The positive value signifies that the correlation is increasing (a direct relation) in nature, and such a high value (close to 1) supports the existence of a strong correlation.
  • The value of Pearson’s correlation coefficient for Group 3 is 0.993. The positive value signifies that the correlation is increasing (a direct relation) in nature, and such a high value (close to 1) supports the existence of a strong correlation.
  • The minimum percentage of employees infected by cancer in all three groups is at Byron Nuclear Power Station, with values ranging between 10% and 16%.
  • The maximum percentage of employees infected by cancer in all three groups is at Vogtle Nuclear Power Station, with values ranging between 22% and 34%.
  • The percentage of infected individuals is lowest in the first age group (50 years to 60 years). This may be because of the stronger immunity of these employees. Another reason might be advancements in radiation protection techniques, which protect employees of this generation more efficiently than those of earlier generations.
  • The percentage of infected individuals is highest in the third age group (70 years to 80 years). This may be because of the weakened immunity of retired employees. Another reason might be the number of former power plant employees who were still alive when the survey data were collected: as the number of retired employees is smaller, the percentage is higher.
  • According to the trendline, if the total number of employees in the age group of 50 to 60 years is 227, then there will be no case of cancer.
  • Similarly, if the total number of employees in the age group of 60 to 70 years is 161, then there will be no case of cancer.
  • Similarly, if the total number of employees in the age group of 70 to 80 years is 143, then there will be no case of cancer.
  • The χ² test evaluates the hypothesis and concludes that the alternate hypothesis is true.
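The employee counts of 227, 161 and 143 quoted in the list above appear to be the x-intercepts of the three trendlines, i.e., the values of x at which y = 0. The sketch below checks this using the slopes and intercepts stated in the conclusion. It assumes the quoted counts were obtained by rounding the intercepts down to whole employees, and the names `trendlines`, `x_intercept` and `zero_case_counts` are illustrative only.

```python
import math

# (slope, intercept) of each trendline y = slope * x + intercept,
# as reported in the conclusion.
trendlines = {
    "Group 1 (50 to 60 years)": (0.2386, -54.366),
    "Group 2 (60 to 70 years)": (0.2401, -38.697),
    "Group 3 (70 to 80 years)": (0.352, -50.422),
}

def x_intercept(slope, intercept):
    """Return the x value at which slope * x + intercept equals zero."""
    return -intercept / slope

zero_case_counts = {}
for group, (slope, intercept) in trendlines.items():
    x0 = x_intercept(slope, intercept)
    # Assumption: the report rounds down to a whole number of employees.
    zero_case_counts[group] = math.floor(x0)
    print(f"{group}: y = 0 at x = {x0:.2f}, rounded down to {math.floor(x0)}")
```

Below these employee counts the trendlines would predict a negative number of cases, so the linear model is only meaningful for larger values of x.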

Reflection

In this investigation, several processes and mathematical tools have been used to find the correlation along with its strength. The choice of nuclear power plants is one of the most important strengths of this investigation, as it has provided a data sheet with accurate observations of employee count. Internationally acclaimed newspapers have also contributed to this. The use of two different coefficients, the regression coefficient and Pearson’s correlation coefficient, has provided the strength and nature of the correlation. Furthermore, the calculation of the percentage of employees infected with cancer has enabled the investigation to analyse the variation of cancer-infected employees (dependent variable) in the observed data sheet. Lastly, the use of the χ² test has provided the conclusion regarding the correlation.

 

However, a few weaknesses have been observed during this mathematical investigation. The immunity of the human body is very uncertain and cannot be generalised. Moreover, cancer is a disease on which research is still ongoing, and there are many gaps or open questions, such as the causes of cancer, which govern the rate at which cancer spreads. As many variables apart from the total employee count affect the dependent variable, the correlation study cannot be carried out efficiently. To perform an efficient correlation analysis on the research question, all of these parameters must be controlled or held constant.

Bibliography

  • What Is Cancer? - National Cancer Institute. 17 Sept. 2007, https://www.cancer.gov/about-cancer/understanding/what-is-cancer.
  • Risk Factors: Radiation - National Cancer Institute. 29 Apr. 2015, https://www.cancer.gov/about-cancer/causes-prevention/risk/radiation.
  • ‘UV Radiation’. The Skin Cancer Foundation, https://www.skincancer.org/risk-factors/uv-radiation/. Accessed 22 Nov. 2020.
  • Nuclear Power Plants - U.S. Energy Information Administration (EIA). https://www.eia.gov/energyexplained/nuclear/nuclear-power-plants.php. Accessed 22 Nov. 2020.
  • ‘Electromagnetic Radiation - Gamma Rays’. Encyclopedia Britannica, https://www.britannica.com/science/electromagnetic-radiation. Accessed 22 Nov. 2020.
  • Correlation. http://www.stat.yale.edu/Courses/1997-98/101/correl.htm. Accessed 22 Nov. 2020.
  • Data Analysis - Pearson’s Correlation Coefficient. http://learntech.uwe.ac.uk/da/default.aspx?pageid=1442. Accessed 22 Nov. 2020.
  • Chi Square Statistics. https://math.hws.edu/javamath/ryan/ChiSquare.html. Accessed 23 Nov. 2020.
  • Table: Chi-Square Probabilities. https://people.richland.edu/james/lecture/m170/tbl-chi.html. Accessed 23 Nov. 2020.
  • ‘Nuclear Workers May Face Higher Cancer Risk’. WebMD, https://www.webmd.com/cancer/news/20050628/nuclear-workers-may-face-higher-cancer-risk. Accessed 22 Nov. 2020.
  • Parthasarathy, K. S. ‘Is Working in a Nuclear Power Plant Risky?’ The Hindu, 1 Jan. 2014. www.thehindu.com, https://www.thehindu.com/sci-tech/science/is-working-in-a-nuclear-power-plant-risky/article5526497.ece.
  • Accidents at Nuclear Power Plants and Cancer Risk - National Cancer Institute. 19 Apr. 2011, https://www.cancer.gov/about-cancer/causes-prevention/risk/radiation/nuclear-accidents-fact-sheet.
  • Exelon. https://www.exeloncorp.com:443/locations/power-plants/byron-generating-station. Accessed 25 Nov. 2020.
  • Peach Bottom Atomic Power Station Receives Approval to Operate an Additional 20 Years | Transmission Intelligence Service. https://www.transmissionhub.com/articles/2020/03/peach-bottom-atomic-power-station-receives-approval-to-operate-an-additional-20-years.html. Accessed 25 Nov. 2020.
  • NRC: Oconee Nuclear Station, Unit 1. https://www.nrc.gov/info-finder/reactors/oco1.html. Accessed 25 Nov. 2020.
  • ‘Braidwood Generating Station | Braceville, Ill.’ Nuclear Powers IL, https://www.nuclearpowersillinois.com/braidwood_generating_station. Accessed 25 Nov. 2020.
  • NRC: South Texas Project, Unit 1. https://www.nrc.gov/info-finder/reactors/stp1.html. Accessed 25 Nov. 2020.
  • NRC: Susquehanna Steam Electric Station, Unit 1. https://www.nrc.gov/info-finder/reactors/susq1.html. Accessed 25 Nov. 2020.
  • Energy, Duke. ‘McGuire Nuclear Station Focuses on Operational Excellence and Community Outreach’. Duke Energy | Nuclear Information Center, https://nuclear.duke-energy.com/2013/06/25/mcguire-nuclear-station-focuses-on-operational-excellence-and-community-outreach. Accessed 25 Nov. 2020.
  • ‘Browns Ferry Nuclear Plant’. TVA.Com, https://www.tva.com/energy/our-power-system/nuclear/browns-ferry-nuclear-plant. Accessed 25 Nov. 2020.
  • ‘Aps – Arizona Public Service Electric’. Aps, https://www.aps.com/en/About/Our-Company/Clean-Energy/Nuclear-generation. Accessed 25 Nov. 2020.
  • ‘Vogtle 3 and 4’. Georgia Power, http://www.georgiapower.com/company/plant-vogtle.html. Accessed 25 Nov. 2020.