"Although September 11 was horrible, it didn't threaten the survival of the human race, like nuclear weapons do." – Stephen Hawking. Despite the heinous acts of terrorism that happen across the globe, it is the destructive power of the nuclear bomb that has left a scar and a lasting fear in people all over the world.
Since childhood, I have been hearing about the devastation that took place in Hiroshima and Nagasaki in 1945, marking the end of World War II. The destructive capacity of nuclear energy has been clear to me since my early school days.
It was during secondary education that my picture of nuclear power began to change, when I studied nuclear energy as a non-conventional source of energy. Despite the initial doubts that arose from the childhood stories of catastrophe, I learned that nuclear energy is the face of change in providing energy to mankind. The amount of energy that can be produced by a nuclear reaction is unparalleled by any other source of energy.
With passing days, the course curriculum became more intense and I started learning in-depth concepts of nuclear energy. About two years ago, in Physics, I studied the nuclear fission reaction which generates nuclear energy. Over time, I felt a deep inclination towards the subject. Because of the constructive benefits that it provides to mankind, I thought of pursuing higher studies in Nuclear Energy and working in a Nuclear Power Station.
However, recently in Biology, I studied the disease named cancer. Some of the facts shook my dream of pursuing a job in a nuclear power plant. In the curriculum, I learned that γ - rays can cause cancer. The subtle fear which I had developed regarding the devastating effects of nuclear energy returned to my mind, because γ - rays are emitted in the nuclear fission reaction by which nuclear energy is generated.
To overcome the fear and to concentrate on my career, I started doing some research. I read a few journals on the side effects of nuclear energy. There were several reports of an increased chance of cancer in individuals exposed to harmful γ - rays. However, I also came across many articles discussing the preventive measures taken in every nuclear power plant to protect employees from radiation. To be more confident, I read many news reports and articles, from which I learned that employees working in nuclear power plants are often affected by cancer. However, I could not find any information on the probability of a nuclear plant employee being affected by cancer.
To find the answer, I am working on this mathematical exploration so that I can derive a relation for the chance of being affected by cancer if I pursue my dream job.
The main motive of this investigation is to explore the correlation between the number of employees working in a nuclear power plant and the number of those employees affected by cancer.
What is the relationship between the number of employees working in a Nuclear Power Station and the number of employees affected by cancer during the working period or after retirement for three different age groups – Gr 1: 50 years to 60 years, Gr 2: 60 years to 70 years and Gr 3: 70 years to 80 years?
Cancer is a disease characterized by uncontrolled cell division. The repeated division of cells often leads to the formation of a tumor, cyst, fibroid, etc. Tumors are categorized into two types, benign and malignant; malignant tumors are considered to be cancerous. Cells of a malignant tumor can spread throughout the body through the blood stream and initiate the formation of a tumor in any other part of the body. A growing tumor exerts pressure on the vital organs near where it originated, which can lead to organ failure. A tumor also constricts the blood vessels in its vicinity, resulting in increased heart rate and blood pressure and eventually increasing the chances of stroke or heart failure.
There are several causative agents which trigger cells to divide in an uncontrolled manner. In the context of this mathematical exploration, radiation is the relevant cause of cancer. Radiations like gamma rays, X – rays, etc. are considered to be among the most prominent causative agents of cancer. These radiations carry enough energy to ionize atoms and damage the DNA of human cells, so they act as mutagens. Once the DNA of a cell has mutated in this way, the cell may begin to divide continuously without following the normal cell cycle, which leads to the formation of a malignant or cancerous tumor.
From several news reports and scientific research, it is now clear that the depletion of the ozone layer, caused by the emission of ozone-depleting gases such as chlorofluorocarbons, has allowed more of the harmful ultraviolet rays to pass through the Earth’s atmosphere. As a result, cases of skin cancer have increased across the world. This illustrates the effect of radiation in causing cancer.
A nuclear power station, or nuclear power plant, is a power plant which generates energy by nuclear fission reaction. The nuclear fission reaction is carried out in a nuclear reactor, in which the heat generated by the reaction is used to convert water into steam. The steam thus generated is used to run a turbine which generates electricity.
The nuclear fission reaction is accompanied by the emission of radiations such as α - rays, β - rays, γ - rays, etc., of which the γ - ray is considered to be the most harmful. The nuclear reactor is constructed in such a way that the leakage of radiation is intended to be nil. Nevertheless, a number of preventive measures, such as protective clothing and regular medical check – ups for employees, are also taken in nuclear power plants. Despite such preventive measures, instances have been noted where radiation has leaked and caused severe illness not only to the employees but also to individuals living in the areas near the power plant. This is because γ - rays can pass through even inches of metal sheet such as lead.
The regression correlation coefficient (r) is a tool to measure the strength of the correlation between the independent variable and the dependent variable. The set of values (x_{1}, y_{1}), (x_{2}, y_{2}), …, (x_{n}, y_{n}) is used to find the value of r as stated by the formula below:
\(r=\frac{n(\Sigma xy)-(\Sigma x)(\Sigma y)}{\sqrt{[n\Sigma x^2-(\Sigma x)^2][n\Sigma y^2-(\Sigma y)^2]}}\)
In the above-mentioned formula, x is the value of the independent variable for each observation, y is the value of the dependent variable for each observation, xy is the product of the independent and the dependent variable for each observation, n is the number of observations and ∑ denotes the sum over all the observations of the mentioned quantity.
By squaring the value of r, the coefficient of determination (r^{2}), referred to in this exploration as the regression coefficient, is obtained. The value of r^{2} lies between 0 and 1, where 1 signifies a perfect linear correlation whereas 0 signifies no linear correlation.
Pearson’s correlation coefficient is a tool to measure both the strength and the nature of the correlation between the independent variable and the dependent variable. The set of values (x_{1}, y_{1}), (x_{2}, y_{2}), …, (x_{n}, y_{n}) is used to find the value of \(\mathfrak{R}\) as stated by the formula below:
\(\mathfrak{R}=\frac{\Sigma(x-\bar x)(y-\bar y)}{\sqrt{\Sigma (x-\bar x)^2 \times \Sigma (y-\bar y)^2}}\)
In the above-mentioned formula, x is the value of the independent variable for each observation, y is the value of the dependent variable for each observation, \(\bar x\) is the arithmetic mean of all the observations of the independent variable, \(\bar y\) is the arithmetic mean of all the observations of the dependent variable and ∑ denotes the sum over all the observations of the mentioned quantity.
The value of \(\mathfrak{R}\) lies between -1 and 1. A positive value of Pearson’s correlation coefficient implies a direct relationship between the independent and the dependent variable, whereas a negative value implies an inverse relationship between them. If the value of the correlation coefficient is close to 1 or -1, it signifies that a strong linear correlation exists. On the other hand, if the value of the correlation coefficient is close to 0, it signifies that there is little or no linear correlation.
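The two formulas stated above describe the same quantity written in two ways, so r and \(\mathfrak{R}\) are expected to agree for any data set. A short sketch of the reasoning, using \(\bar x = \Sigma x / n\) and \(\bar y = \Sigma y / n\):
\(\Sigma(x-\bar x)(y-\bar y)=\Sigma xy-\bar y\,\Sigma x-\bar x\,\Sigma y+n\bar x\bar y=\Sigma xy-\frac{(\Sigma x)(\Sigma y)}{n}\)
\(\Sigma(x-\bar x)^2=\Sigma x^2-\frac{(\Sigma x)^2}{n}, \quad \Sigma(y-\bar y)^2=\Sigma y^2-\frac{(\Sigma y)^2}{n}\)
Substituting these into the formula for \(\mathfrak{R}\) and multiplying the numerator and the denominator by n gives
\(\mathfrak{R}=\frac{n\Sigma xy-(\Sigma x)(\Sigma y)}{\sqrt{[n\Sigma x^2-(\Sigma x)^2][n\Sigma y^2-(\Sigma y)^2]}}=r\)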
The Chi squared test is a statistical test used to judge whether an association exists between an independent variable and a dependent variable. The Chi squared value of the given set of data is calculated first. Then, depending on the type of data, for example paired data or independent data, the Chi squared value is compared with the value in the Chi squared table to decide whether an association exists.
The formula of the Chi squared value is given below:
\(\chi^2 \text{ value} = \sum \frac{(O_i - E_i)^2}{E_i} \)
Here, O_{i} is the observed value, E_{i} is the expected value and ∑ denotes the sum over all the observations.
The calculated Chi squared value is then compared with the Chi squared table, which lists critical values by degree of freedom (df) and significance level. The Chi squared table is shown below:
df | 0.995 | 0.99 | 0.975 | 0.95 | 0.90 | 0.10 | 0.05 | 0.025 | 0.01 | 0.005 |
---|---|---|---|---|---|---|---|---|---|---|
1 | --- | --- | 0.001 | 0.004 | 0.016 | 2.706 | 3.841 | 5.024 | 6.635 | 7.879 |
2 | 0.010 | 0.020 | 0.051 | 0.103 | 0.211 | 4.605 | 5.991 | 7.378 | 9.210 | 10.597 |
3 | 0.072 | 0.115 | 0.216 | 0.352 | 0.584 | 6.251 | 7.815 | 9.348 | 11.345 | 12.838 |
4 | 0.207 | 0.297 | 0.484 | 0.711 | 1.064 | 7.779 | 9.488 | 11.143 | 13.277 | 14.860 |
5 | 0.412 | 0.554 | 0.831 | 1.145 | 1.610 | 9.236 | 11.070 | 12.833 | 15.086 | 16.750 |
6 | 0.676 | 0.872 | 1.237 | 1.635 | 2.204 | 10.645 | 12.592 | 14.449 | 16.812 | 18.548 |
7 | 0.989 | 1.239 | 1.690 | 2.167 | 2.833 | 12.017 | 14.067 | 16.013 | 18.475 | 20.278 |
8 | 1.344 | 1.646 | 2.180 | 2.733 | 3.490 | 13.362 | 15.507 | 17.535 | 20.090 | 21.955 |
9 | 1.735 | 2.088 | 2.700 | 3.325 | 4.168 | 14.684 | 16.919 | 19.023 | 21.666 | 23.589 |
10 | 2.156 | 2.558 | 3.247 | 3.940 | 4.865 | 15.987 | 18.307 | 20.483 | 23.209 | 25.188 |
11 | 2.603 | 3.053 | 3.816 | 4.575 | 5.578 | 17.275 | 19.675 | 21.920 | 24.725 | 26.757 |
12 | 3.074 | 3.571 | 4.404 | 5.226 | 6.304 | 18.549 | 21.026 | 23.337 | 26.217 | 28.300 |
13 | 3.565 | 4.107 | 5.009 | 5.892 | 7.042 | 19.812 | 22.362 | 24.736 | 27.688 | 29.819 |
14 | 4.075 | 4.660 | 5.629 | 6.571 | 7.790 | 21.064 | 23.685 | 26.119 | 29.141 | 31.319 |
15 | 4.601 | 5.229 | 6.262 | 7.261 | 8.547 | 22.307 | 24.996 | 27.488 | 30.578 | 32.801 |
16 | 5.142 | 5.812 | 6.908 | 7.962 | 9.312 | 23.542 | 26.296 | 28.845 | 32.000 | 34.267 |
17 | 5.697 | 6.408 | 7.564 | 8.672 | 10.085 | 24.769 | 27.587 | 30.191 | 33.409 | 35.718 |
18 | 6.265 | 7.015 | 8.231 | 9.390 | 10.865 | 25.989 | 28.869 | 31.526 | 34.805 | 37.156 |
19 | 6.844 | 7.633 | 8.907 | 10.117 | 11.651 | 27.204 | 30.144 | 32.852 | 36.191 | 38.582 |
20 | 7.434 | 8.260 | 9.591 | 10.851 | 12.443 | 28.412 | 31.410 | 34.170 | 37.566 | 39.997 |
21 | 8.034 | 8.897 | 10.283 | 11.591 | 13.240 | 29.615 | 32.671 | 35.479 | 38.932 | 41.401 |
22 | 8.643 | 9.542 | 10.982 | 12.338 | 14.041 | 30.813 | 33.924 | 36.781 | 40.289 | 42.796 |
23 | 9.260 | 10.196 | 11.689 | 13.091 | 14.848 | 32.007 | 35.172 | 38.076 | 41.638 | 44.181 |
24 | 9.886 | 10.856 | 12.401 | 13.848 | 15.659 | 33.196 | 36.415 | 39.364 | 42.980 | 45.559 |
25 | 10.520 | 11.524 | 13.120 | 14.611 | 16.473 | 34.382 | 37.652 | 40.646 | 44.314 | 46.928 |
26 | 11.160 | 12.198 | 13.844 | 15.379 | 17.292 | 35.563 | 38.885 | 41.923 | 45.642 | 48.290 |
27 | 11.808 | 12.879 | 14.573 | 16.151 | 18.114 | 36.741 | 40.113 | 43.195 | 46.963 | 49.645 |
28 | 12.461 | 13.565 | 15.308 | 16.928 | 18.939 | 37.916 | 41.337 | 44.461 | 48.278 | 50.993 |
29 | 13.121 | 14.256 | 16.047 | 17.708 | 19.768 | 39.087 | 42.557 | 45.722 | 49.588 | 52.336 |
30 | 13.787 | 14.953 | 16.791 | 18.493 | 20.599 | 40.256 | 43.773 | 46.979 | 50.892 | 53.672 |
40 | 20.707 | 22.164 | 24.433 | 26.509 | 29.051 | 51.805 | 55.758 | 59.342 | 63.691 | 66.766 |
50 | 27.991 | 29.707 | 32.357 | 34.764 | 37.689 | 63.167 | 67.505 | 71.420 | 76.154 | 79.490 |
60 | 35.534 | 37.485 | 40.482 | 43.188 | 46.459 | 74.397 | 79.082 | 83.298 | 88.379 | 91.952 |
70 | 43.275 | 45.442 | 48.758 | 51.739 | 55.329 | 85.527 | 90.531 | 95.023 | 100.425 | 104.215 |
80 | 51.172 | 53.540 | 57.153 | 60.391 | 64.278 | 96.578 | 101.879 | 106.629 | 112.329 | 116.321 |
90 | 59.196 | 61.754 | 65.647 | 69.126 | 73.291 | 107.565 | 113.145 | 118.136 | 124.116 | 128.299 |
100 | 67.328 | 70.065 | 74.222 | 77.929 | 82.358 | 118.498 | 124.342 | 129.561 | 135.807 | 140.169 |
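The way the formula and the table are used together can be sketched in Python. This is only an illustration under stated assumptions: the dictionary holds a few entries copied from the 0.05 column of the table above, the observed and expected counts are hypothetical, and the function names are chosen here for readability.

```python
# Critical values copied from the 0.05 column of the table above,
# for a few selected degrees of freedom.
CRITICAL_0_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 10: 18.307, 18: 28.869}


def chi_squared_statistic(observed, expected):
    """Sum of (O - E)^2 / E over paired observed and expected values."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def exceeds_critical_value(statistic, df, table=CRITICAL_0_05):
    """True when the statistic is larger than the tabulated critical value."""
    return statistic > table[df]


# Hypothetical counts for four categories, used only to exercise the functions.
observed = [18, 22, 30, 30]
expected = [25, 25, 25, 25]
statistic = chi_squared_statistic(observed, expected)
df = len(observed) - 1  # four categories give three degrees of freedom
print(round(statistic, 3), exceeds_critical_value(statistic, df))
```

Under the usual decision rule, the null hypothesis is rejected at the chosen significance level only when the calculated value is larger than the critical value for the relevant degree of freedom.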
Null Hypothesis: It is assumed that there does not exist any correlation between the number of employees working in a Nuclear Power Station and the number of employees affected by cancer during the working period or after retirement for three different age groups – Gr 1: 50 years to 60 years, Gr 2: 60 years to 70 years and Gr 3: 70 years to 80 years.
Alternate Hypothesis: It is assumed that there is a correlation between the number of employees working in a Nuclear Power Station and the number of employees affected by cancer during the working period or after retirement for three different age groups – Gr 1: 50 years to 60 years, Gr 2: 60 years to 70 years and Gr 3: 70 years to 80 years.
A data sheet has been prepared based on several news articles, reports and surveys covering different nuclear power plants across the globe. It has been possible to record the number of employees affected by cancer during their tenure of service because of the health insurance policy that each company offers to all its employees. Similarly, the health status of retired employees has been obtained from the health benefits that the company continues to offer after retirement.
The employees working in nuclear power plants have been categorized into three groups to illustrate the correlation in a proper and intensive way. It has been studied that immunity against cancer is stronger at a young age than in old age. However, there are many exceptions; in elderly people, mutations may be triggered by far less exposure to radiation than in others. On the other hand, cases have been observed in which an individual was exposed to cancer-causing radiation at a young age but the cancer appeared at a much later period of life. Thus, considering the strength of immunity in an individual, the age groups are made accordingly.
Name | Total | Infected |
---|---|---|
Byron Nuclear Power Station | 329 | 34 |
Peach Bottom Atomic Power Station | 347 | 37 |
Oconee Nuclear Station | 387 | 47 |
Braidwood Generating Station | 451 | 71 |
South Texas Project Electric Generating Station | 459 | 52 |
Susquehanna Nuclear Power Plant | 674 | 89 |
Mcguire Nuclear Power Plant | 725 | 103 |
Browns Ferry Nuclear Plant | 978 | 178 |
Palo Verde Generation Station | 1564 | 302 |
Vogtle Nuclear Power Station | 3875 | 879 |
Name | Total | Infected |
---|---|---|
Byron Nuclear Power Station | 334 | 37 |
Peach Bottom Atomic Power Station | 345 | 38 |
Oconee Nuclear Station | 379 | 58 |
Braidwood Generating Station | 463 | 98 |
South Texas Project Electric Generating Station | 487 | 103 |
Susquehanna Nuclear Power Plant | 621 | 115 |
Mcguire Nuclear Power Plant | 798 | 145 |
Browns Ferry Nuclear Plant | 970 | 161 |
Palo Verde Generation Station | 1498 | 298 |
Vogtle Nuclear Power Station | 3389 | 789 |
Name | Total | Infected |
---|---|---|
Byron Nuclear Power Station | 289 | 46 |
Peach Bottom Atomic Power Station | 297 | 52 |
Oconee Nuclear Station | 303 | 67 |
Braidwood Generating Station | 401 | 132 |
South Texas Project Electric Generating Station | 432 | 136 |
Susquehanna Nuclear Power Plant | 543 | 105 |
Mcguire Nuclear Power Plant | 641 | 187 |
Browns Ferry Nuclear Plant | 879 | 190 |
Palo Verde Generation Station | 1273 | 398 |
Vogtle Nuclear Power Station | 2894 | 982 |
Total No. of Employees | Infected Employees | Percentage |
---|---|---|
329 | 34 | 10.33 |
347 | 37 | 10.66 |
387 | 47 | 12.14 |
451 | 71 | 15.74 |
459 | 52 | 11.32 |
674 | 89 | 13.20 |
725 | 103 | 14.20 |
978 | 178 | 18.20 |
1564 | 302 | 19.30 |
3875 | 879 | 22.68 |
Total No. of Employees | Infected Employees | Percentage |
---|---|---|
334 | 37 | 11.08 |
345 | 38 | 11.01 |
379 | 58 | 15.30 |
463 | 98 | 21.17 |
487 | 103 | 21.15 |
621 | 115 | 18.52 |
798 | 145 | 18.17 |
970 | 161 | 16.60 |
1498 | 298 | 19.89 |
3389 | 789 | 23.28 |
Total No. of Employees | Infected Employees | Percentage |
---|---|---|
289 | 46 | 15.92 |
297 | 52 | 17.51 |
303 | 67 | 22.11 |
401 | 132 | 32.92 |
432 | 136 | 31.48 |
543 | 105 | 19.34 |
641 | 187 | 29.17 |
879 | 190 | 21.62 |
1273 | 398 | 31.26 |
2894 | 982 | 33.93 |
Sample Calculation
Percentage of Infected Employee \(= \frac{34}{329} \times 100 = 10.33\)
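The same calculation can be repeated for every power plant with a few lines of Python. The sketch below uses the Group 1 figures from the data table; the function and variable names are chosen here for illustration. Because the output is rounded, the last digit may differ slightly from a table entry that was truncated.

```python
# Group 1 figures (total employees, employees with cancer) from the data table.
group_1 = [
    (329, 34), (347, 37), (387, 47), (451, 71), (459, 52),
    (674, 89), (725, 103), (978, 178), (1564, 302), (3875, 879),
]


def percentage_affected(total, affected):
    """Employees with cancer expressed as a percentage of the total."""
    return affected / total * 100


percentages = [percentage_affected(total, affected) for total, affected in group_1]
for (total, affected), pct in zip(group_1, percentages):
    print(f"{total:>5} {affected:>4} {pct:6.2f}")
```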
In Figure 5 to Figure 7, the percentage of employees affected by cancer out of the total number of employees has been found. As the interval between the total numbers of employees (independent variable) is not regular, the mean value and standard deviation will not serve any purpose in analyzing the data. Rather, the number of employees affected by cancer depends on the total number of employees working in that particular power plant. Thus, the percentage has been calculated.
In Figure 5, the percentage of employees affected by cancer ranges between 10% and 23%, and the number of affected employees increases with the total number of employees working in a power plant. Similarly, in Figure 6, the percentage ranges between 11% and 23%, with about 11% in the power plants with the fewest employees and about 23% in the power plant with the most employees. In Figure 7, as the age group is between 70 years and 80 years, it can be assumed that the total number of employees who worked for the power plants has decreased because of deaths at this age. Thus, the total number of employees currently alive is less than that of the other groups. On the other hand, the percentage of affected employees is higher than in the other groups, ranging between 16% and 34%, with about 16% in the power plant with the fewest employees and about 34% in the power plant with the most employees.
The X – Axis of the graph denotes the total number of employees working or worked in nuclear power plants (independent variable).
The Y – Axis of the graph denotes number of employees who are currently infected by cancer (dependent variable).
In all the graphs from no. 1 to no. 3, a linear trendline has been obtained using the data that has been collected from the official websites of the nuclear power plants, newspapers, journals, articles etc.
In Figure 8, the equation of trendline is
y = 0.2386x - 54.366
In Figure 9, the equation of trendline is
y = 0.2401x - 38.697
In Figure 10, the equation of trendline is
y = 0.352x - 50.422
From the graphs, it can be stated that, there exists a positive increasing correlation between the number of employees getting infected by cancer and the total number of employees either currently working or worked in the nuclear power plants. However, a few outliers have been noticed in the graphs as well.
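The trendline equations can be cross-checked numerically. The sketch below assumes that the trendlines were obtained by an ordinary least-squares fit, which is what spreadsheet linear trendlines normally use; the function name is chosen here for illustration, and the data are the Group 1 figures.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept


# Group 1: total employees (x) and employees with cancer (y).
x1 = [329, 347, 387, 451, 459, 674, 725, 978, 1564, 3875]
y1 = [34, 37, 47, 71, 52, 89, 103, 178, 302, 879]

slope, intercept = least_squares_line(x1, y1)
print(f"y = {slope:.4f}x {intercept:+.3f}")
```

Under that assumption, the fitted slope and intercept for Group 1 agree with the equation quoted for Figure 8 to the precision shown there.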
There are a few outliers where the total number of employees is in the range of 500 to 750. As the number of outliers is very small, the value of the regression coefficient is approximately 0.99. Such a high value (close to one) supports the existence of a linear correlation between the dependent and the independent variable.
From the equation of the trendline of figure 8, the Y – intercept of the trendline has been calculated
y = 0.2386x - 54.366
The value of y for x = 0 will be
y = 0.2386 × 0 - 54.366
=> y = -54.366
From the equation of the trendline of figure 9, the Y – intercept of the trendline has been calculated
y = 0.2401x - 38.697
The value of y for x = 0 will be
y = 0.2401 × 0 - 38.697
=> y = -38.697
From the equation of the trendline of figure 10, the Y – intercept of the trendline has been calculated
y = 0.352x - 50.422
The value of y for x = 0 will be
y = 0.352 × 0 - 50.422
=> y = -50.422
The value of the Y – Intercept is -54.366, -38.697, and -50.422 for figure 8, figure 9 and figure 10 respectively. A negative intercept suggests that if the total number of employees is zero, the number of affected individuals would be negative. Literally, this is not possible; a power plant with no employees cannot have any affected employees. The negative intercept is therefore a consequence of extending the linear trendline below the smallest observed employee counts, and the trendline should only be read within the range of the collected data.
From the equation of the trendline of figure 8, the X – intercept of the trendline has been calculated
y = 0.2386x - 54.366
The value of x for y = 0 will be
0 = 0.2386x - 54.366
\(=> x = \frac{54.366}{0.2386}\)
=> x = 227.85
From the equation of the trendline of figure 9, the X – intercept of the trendline has been calculated
y = 0.2401x - 38.697
The value of x for y = 0 will be
0 = 0.2401x - 38.697
\(=> x = \frac{38.697}{0.2401}\)
=> x = 161.17
From the equation of the trendline of figure 10, the X – intercept of the trendline has been calculated
y = 0.352x - 50.422
The value of x for y = 0 will be
0 = 0.352x - 50.422
\(=> x = \frac{50.422}{0.352}\)
=> x = 143.24
The value of the X – Intercept is approximately 228, 161, and 143 for graph 1, graph 2 and graph 3 respectively. Mathematically, this means that, according to the trendlines, if the total number of employees is 228 for the age group 50 years to 60 years, 161 for the age group 60 to 70 years and 143 for the age group 70 to 80 years, there will not be any affected employee.
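The three X – intercepts follow from setting y = 0 in each trendline equation, which gives x = -c / m for a line y = mx + c. A minimal Python sketch, using the slopes and intercepts quoted for Figures 8 to 10 (the names are chosen here for illustration):

```python
# (slope, intercept) pairs taken from the trendline equations of Figures 8 to 10.
trendlines = {
    "Group 1": (0.2386, -54.366),
    "Group 2": (0.2401, -38.697),
    "Group 3": (0.352, -50.422),
}


def x_intercept(slope, intercept):
    """Solve 0 = slope * x + intercept for x."""
    return -intercept / slope


intercepts = {group: x_intercept(m, c) for group, (m, c) in trendlines.items()}
for group, value in intercepts.items():
    print(group, round(value, 2))
```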
Processed Data for calculation of R^{2}
There are five headers in the processed data tables: x, y, x^{2}, y^{2} and xy. The total number of employees is represented by x and the number of employees affected by cancer is represented by y. The remaining headers have their usual meaning. The calculation of the R^{2} correlation coefficient is shown to explore the efficiency and stability of the trendline and the correlation.
x | y | x^{2} | y^{2} | xy |
---|---|---|---|---|
329 | 34 | 108241 | 1156 | 11186 |
347 | 37 | 120409 | 1369 | 12839 |
387 | 47 | 149769 | 2209 | 18189 |
451 | 71 | 203401 | 5041 | 32021 |
459 | 52 | 210681 | 2704 | 23868 |
674 | 89 | 454276 | 7921 | 59986 |
725 | 103 | 525625 | 10609 | 74675 |
978 | 178 | 956484 | 31684 | 174084 |
1564 | 302 | 2446096 | 91204 | 472328 |
3875 | 879 | 15015625 | 772641 | 3406125 |
∑ x = 9789 | ∑ y = 1792 | ∑ x^{2} = 20190607 | ∑ y^{2} = 926538 | ∑ xy = 4285301 |
Figure 11 - Table On Processed Data For Calculation Of R^{2} For Group 1
x | y | x^{2} | y^{2} | xy |
---|---|---|---|---|
334 | 37 | 111556 | 1369 | 12358 |
345 | 38 | 119025 | 1444 | 13110 |
379 | 58 | 143641 | 3364 | 21982 |
463 | 98 | 214369 | 9604 | 45374 |
487 | 103 | 237169 | 10609 | 50161 |
621 | 115 | 385641 | 13225 | 71415 |
798 | 145 | 636804 | 21025 | 115710 |
970 | 161 | 940900 | 25921 | 156170 |
1498 | 298 | 2244004 | 88804 | 446404 |
3389 | 789 | 11485321 | 622521 | 2673921 |
∑x = 9284 | ∑y = 1842 | ∑x^{2} = 16518430 | ∑y^{2} = 797886 | ∑xy = 3606605 |
Figure 12 - Table On Processed Data For Calculation Of R^{2} For Group 2
x | y | x^{2} | y^{2} | xy |
---|---|---|---|---|
289 | 46 | 83521 | 2116 | 13294 |
297 | 52 | 88209 | 2704 | 15444 |
303 | 67 | 91809 | 4489 | 20301 |
401 | 132 | 160801 | 17424 | 52932 |
432 | 136 | 186624 | 18496 | 58752 |
543 | 105 | 294849 | 11025 | 57015 |
641 | 187 | 410881 | 34969 | 119867 |
879 | 190 | 772641 | 36100 | 167010 |
1273 | 398 | 1620529 | 158404 | 506654 |
2894 | 982 | 8375236 | 964324 | 2841908 |
∑x = 7952 | ∑y = 2295 | ∑x^{2} = 12085100 | ∑y^{2} = 1250051 | ∑xy = 3853177 |
Figure 13 - Table On Processed Data For Calculation Of R^{2} For Group 3
The formula of the regression coefficient as mentioned in the background information has been used to find the correlation coefficient. Here, x is the value of the independent variable for each observation, y is the value of the dependent variable for each observation, xy is the product of the independent and the dependent variable for each observation, n is the number of observations and ∑ denotes the sum over all the observations of the mentioned quantity.
Calculation for Group 1
\(r = \frac{n(\sum xy)-(\sum x)(\sum y)}{\sqrt{[n\sum x^2-(\sum x)^2][n\sum y^2-(\sum y)^2]}}\)
\(=>r=\frac{10(4285301)-(9789)(1792)}{\sqrt{[10 × 20190607-(9789)^2][10 × 926538 - (1792)^2]}}\)
=> r = 0.9987
=> r^{2} = 0.9975
Calculation for Group 2
\(r = \frac{n(\sum xy)-(\sum x)(\sum y)}{\sqrt{[n\sum x^2-(\sum x)^2][n\sum y^2-(\sum y)^2]}}\)
\(=>r = \frac{10(3606605)-(9284)(1842)}{\sqrt{[10 × 16518430-(9284)^2][10 × 797886-(1842)^2]}}\)
=> r = 0.9964
=> r^{2} = 0.9929
Calculation for Group 3
\(r = \frac{n(\sum xy)-(\sum x)(\sum y)}{\sqrt{[n\sum x^2-(\sum x)^2][n\sum y^2-(\sum y)^2]}}\)
\(=>r= \frac{10(3853177)-(7952)(2295)}{\sqrt{[10 × 12085100 - (7952)^2][10 × 1250051 - (2295)^2]}}\)
=> r = 0.993
=> r^{2} = 0.987
The value of the regression coefficient is 0.9975, 0.9929, and 0.987 for group 1, group 2 and group 3 respectively. Such high values (close to one) support the existence of a linear correlation between the dependent and the independent variable.
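The substitutions above can be reproduced directly from the column totals of Figures 11 to 13. The Python sketch below does this for all three groups; the function name and the tuple layout are chosen here for illustration. The printed values are rounded, so the last digit may differ from values that were truncated by hand.

```python
from math import sqrt

# Column totals (n, sum x, sum y, sum x^2, sum y^2, sum xy) from Figures 11 to 13.
sums = {
    "Group 1": (10, 9789, 1792, 20190607, 926538, 4285301),
    "Group 2": (10, 9284, 1842, 16518430, 797886, 3606605),
    "Group 3": (10, 7952, 2295, 12085100, 1250051, 3853177),
}


def r_from_sums(n, sx, sy, sxx, syy, sxy):
    """Correlation coefficient r computed from the column totals."""
    numerator = n * sxy - sx * sy
    denominator = sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return numerator / denominator


results = {group: r_from_sums(*totals) for group, totals in sums.items()}
for group, r in results.items():
    print(f"{group}: r = {r:.4f}, r^2 = {r * r:.4f}")
```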
Processed Data Table for calculation of Pearson’s Correlation:
There are seven headers in the processed data tables for the calculation of Pearson’s correlation coefficient: x, y, \(x-\bar x\), \(y- \bar y\), \((x-\bar x)(y-\bar y)\), \((x-\bar x)^2\) and \((y- \bar y)^2\). The total number of employees is represented by x and the number of employees affected by cancer is represented by y; \(\bar x\) is the arithmetic mean of all the observations of the total number of employees and \(\bar y\) is the arithmetic mean of all the observations of the number of employees affected by cancer. The remaining headers have their usual meaning. The calculation of Pearson’s correlation coefficient is shown to explore the efficiency and stability of the trendline and the correlation.
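x | y | \(x-\bar x\) | \(y-\bar y\) | \((x-\bar x)(y-\bar y)\) | \((x-\bar x)^2\) | \((y-\bar y)^2\) |
---|---|---|---|---|---|---|
329 | 34 | -649.9 | -145.2 | 94365.48 | 422370.01 | 21083.04 |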
347 | 37 | -631.9 | -142.2 | 89856.18 | 399297.61 | 20220.84 |
387 | 47 | -591.9 | -132.2 | 78249.18 | 350345.61 | 17476.84 |
451 | 71 | -527.9 | -108.2 | 57118.78 | 278678.41 | 11707.24 |
459 | 52 | -519.9 | -127.2 | 66131.28 | 270296.01 | 16179.84 |
674 | 89 | -304.9 | -90.2 | 27501.98 | 92964.01 | 8136.04 |
725 | 103 | -253.9 | -76.2 | 19347.18 | 64465.21 | 5806.44 |
978 | 178 | -0.9 | -1.2 | 1.08 | 0.81 | 1.44 |
1564 | 302 | 585.1 | 122.8 | 71850.28 | 342342.01 | 15079.84 |
3875 | 879 | 2896.1 | 699.8 | 2026690.78 | 8387395.21 | 489720.04 |
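x | y | \(x-\bar x\) | \(y-\bar y\) | \((x-\bar x)(y-\bar y)\) | \((x-\bar x)^2\) | \((y-\bar y)^2\) |
---|---|---|---|---|---|---|
334 | 37 | -594.4 | -147.2 | 87495.68 | 353311.36 | 21667.84 |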
345 | 38 | -583.4 | -146.2 | 85293.08 | 340355.56 | 21374.44 |
379 | 58 | -549.4 | -126.2 | 69334.28 | 301840.36 | 15926.44 |
463 | 98 | -465.4 | -86.2 | 40117.48 | 216597.16 | 7430.44 |
487 | 103 | -441.4 | -81.2 | 35841.68 | 194833.96 | 6593.44 |
621 | 115 | -307.4 | -69.2 | 21272.08 | 94494.76 | 4788.64 |
798 | 145 | -130.4 | -39.2 | 5111.68 | 17004.16 | 1536.64 |
970 | 161 | 41.6 | -23.2 | -965.12 | 1730.56 | 538.24 |
1498 | 298 | 569.6 | 113.8 | 64820.48 | 324444.16 | 12950.44 |
3389 | 789 | 2460.6 | 604.8 | 1488170.88 | 6054552.36 | 365783.04 |
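x | y | \(x-\bar x\) | \(y-\bar y\) | \((x-\bar x)(y-\bar y)\) | \((x-\bar x)^2\) | \((y-\bar y)^2\) |
---|---|---|---|---|---|---|
289 | 46 | -506.2 | -183.5 | 92887.7 | 256238.44 | 33672.25 |
297 | 52 | -498.2 | -177.5 | 88430.5 | 248203.24 | 31506.25 |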
303 | 67 | -492.2 | -162.5 | 79982.5 | 242260.84 | 26406.25 |
401 | 132 | -394.2 | -97.5 | 38434.5 | 155393.64 | 9506.25 |
432 | 136 | -363.2 | -93.5 | 33959.2 | 131914.24 | 8742.25 |
543 | 105 | -252.2 | -124.5 | 31398.9 | 63604.84 | 15500.25 |
641 | 187 | -154.2 | -42.5 | 6553.5 | 23777.64 | 1806.25 |
879 | 190 | 83.8 | -39.5 | -3310.1 | 7022.44 | 1560.25 |
1273 | 398 | 477.8 | 168.5 | 80509.3 | 228292.84 | 28392.25 |
2894 | 982 | 2098.8 | 752.5 | 1579347 | 4404961.44 | 566256.25 |
The formula of Pearson’s correlation coefficient as mentioned in the background information has been used to find the correlation coefficient. Here, x is the value of the independent variable for each observation, y is the value of the dependent variable for each observation, \(\bar x\) is the arithmetic mean of all the observations of the independent variable, \(\bar y\) is the arithmetic mean of all the observations of the dependent variable and ∑ denotes the sum over all the observations of the mentioned quantity.
Calculation for Table 8A
\(\bar x = \frac{\sum x}{N}=\frac{9789}{10} = 978.9\)
\(\bar y = \frac{\sum y}{N}=\frac{1792}{10}=179.2\)
\(\displaystyle \sum (x - \bar{x})(y - \bar{y}) = 2531112.2\)
\(\displaystyle \sum (x-\bar x)^2=10608154.9\)
\(\displaystyle\sum(y-\bar y)^2=605411.6\)
Let, the Pearson’s Correlation Coefficient be \(\mathfrak{R}\).
\(\mathfrak{R}=\frac{\sum{({x}-\bar{{x}})({y}-\bar{{y}})}}{\sqrt{\sum{({x}-\bar{{x}})}^{2}\times\sum{({y}-\bar{{y}})}^{2}}}\)
\(\mathfrak{R}=\frac{2531112.2}{\sqrt{10608154.9\times605411.6}}=0.998\)
Calculation for Table 8B
\(\bar x=\frac{\sum x}{N}= \frac{9284}{10}=928.4\)
\(\bar y = \frac{\sum y}{N}=\frac{1842}{10}=184.2\)
\(\displaystyle\sum (x-\bar x)(y-\bar y)=1896492.2\)
\(\displaystyle \sum (x- \bar x)^2=7899164.4\)
\(\displaystyle \sum (y-\bar y)^2=458589.6\)
Let, the Pearson’s Correlation Coefficient be \(\mathfrak{R}\).
\(\mathfrak{R}=\frac{\sum{({x}-\bar{{x}})({y}-\bar{{y}})}}{\sqrt{\sum{({x}-\bar{{x}})}^{2}\times\sum{({y}-\bar{{y}})}^{2}}}\)
\(\mathfrak{R}=\frac{1896492.2}{\sqrt{7899164.4\times458589.6}}=0.996\)
Calculation for Table 8C
\(\bar x=\frac{\sum x}{N}=\frac{7952}{10}=795.2\)
\(\bar y = \frac{\sum y}{N} = \frac{2295}{10}=229.5\)
\(\displaystyle \sum (x-\bar x)(y -\bar y)=2028193\)
\(\displaystyle \sum(x-\bar x)^2 = 5761669.6\)
\(\displaystyle \sum (y-\bar y)^2= 723348.5\)
Let, the Pearson’s Correlation Coefficient be \(\mathfrak{R}\).
\(\mathfrak{R}=\frac{\sum{({x}-\bar{{x}})({y}-\bar{{y}})}}{\sqrt{\sum{({x}-\bar{{x}})}^{2}\times\sum{({y}-\bar{{y}})}^{2}}}\)
\(\mathfrak{R}=\frac{2028193}{\sqrt{5761669.6\times723348.5}}=0.993\)
The values of Pearson’s correlation coefficient for the three groups are 0.998, 0.996 and 0.993 respectively. As the values are positive, it can be stated that the correlation is increasing in nature, i.e., with an increase in the total number of employees working or having worked in the nuclear power plant, the number of employees affected by cancer also increases. Moreover, the values of Pearson’s correlation coefficient are very close to one, which signifies that the correlation is very strong.
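The deviation sums used in these calculations can be checked with a short Python sketch. It uses the Group 3 data and follows the deviation form of the formula step by step; the variable names are chosen here for illustration.

```python
from math import sqrt

# Group 3: total employees (x) and employees with cancer (y).
x = [289, 297, 303, 401, 432, 543, 641, 879, 1273, 2894]
y = [46, 52, 67, 132, 136, 105, 187, 190, 398, 982]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Sums of products and squares of the deviations from the means.
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_yy = sum((yi - mean_y) ** 2 for yi in y)

pearson = s_xy / sqrt(s_xx * s_yy)
print(f"mean x = {mean_x}, mean y = {mean_y}")
print(f"deviation sums: {s_xy:.1f}, {s_xx:.1f}, {s_yy:.1f}")
print(f"Pearson coefficient = {pearson:.3f}")
```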
The hypothesis has been evaluated with the help of the Chi squared test in this section of this mathematical exploration. The test will indicate whether the null hypothesis can be rejected in favour of the alternate hypothesis.
χ^{2}
Observed Value (O) | Expected Value (E) | O - E | (O - E)^{2} | (O - E)^{2} / E |
---|---|---|---|---|
10.33 | 9.52391937 | 0.80608063 | 0.64976598 | 0.06822464 |
11.08 | 11.3543268 | -0.2743268 | 0.07525519 | 0.00662789 |
15.92 | 16.4517538 | -0.5317538 | 0.2827621 | 0.01718735 |
10.66 | 9.99590573 | 0.66409427 | 0.4410212 | 0.04412018 |
11.01 | 11.9170245 | -0.9070245 | 0.82269344 | 0.06903514 |
17.51 | 17.2670698 | 0.2429302 | 0.05901508 | 0.00341778 |
12.14 | 12.6415806 | -0.5015806 | 0.2515831 | 0.01990124 |
15.3 | 15.0711732 | 0.2288268 | 0.0523617 | 0.0034743 |
22.11 | 21.8372462 | 0.2727538 | 0.07439464 | 0.00340678 |
15.74 | 17.8155717 | -2.0755717 | 4.30799788 | 0.24181081 |
21.17 | 21.2395565 | -0.0695565 | 0.00483811 | 0.00022779 |
32.92 | 30.7748719 | 2.1451281 | 4.60157457 | 0.14952376 |
11.32 | 16.3154204 | -4.9954204 | 24.954225 | 1.5294871 |
21.15 | 19.4510903 | 1.6989097 | 2.88629417 | 0.14838727 |
31.48 | 28.1834893 | 3.2965107 | 10.8669828 | 0.38557975 |
13.2 | 13.0268235 | 0.1731765 | 0.0299901 | 0.00230218 |
18.52 | 15.5304561 | 2.9895439 | 8.93737273 | 0.57547394 |
19.34 | 22.5027203 | -3.1627203 | 10.0027997 | 0.44451513 |
14.2 | 15.7005625 | -1.5005625 | 2.25168782 | 0.14341447 |
18.17 | 18.7180625 | -0.5480625 | 0.3003725 | 0.0160472 |
29.17 | 27.121375 | 2.048625 | 4.19686439 | 0.15474379 |
18.2 | 14.3943084 | 3.8056916 | 14.4832886 | 1.00618162 |
16.6 | 17.1607586 | -0.5607586 | 0.31445021 | 0.01832379 |
21.62 | 24.864933 | -3.244933 | 10.5295902 | 0.42347149 |
19.3 | 17.9737509 | 1.3262491 | 1.75893668 | 0.09786141 |
19.89 | 21.4281362 | -1.5381362 | 2.36586297 | 0.11040918 |
31.26 | 31.0481129 | 0.2118871 | 0.04489614 | 0.00144602 |
22.68 | 20.3821569 | 2.2978431 | 5.28008291 | 0.25905418 |
23.28 | 24.2994152 | -1.0194152 | 1.03920735 | 0.04276676 |
33.93 | 35.2084278 | -1.2784278 | 1.63437764 | 0.04642007 |
\(\displaystyle \chi^2 = \sum \frac{(O-E)^2}{E}=0.068+0.006 +...+ 0.046= 6.032843\)
∴ χ^{2} = 6.032843
Degree of Freedom = (Column - 1) (Row - 1)
= (3 - 1) (10-1) = 2 × 9 = 18
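The expected values in the table above appear consistent with the usual contingency-table rule, expected value = (row total × column total) / grand total, applied to the percentages of Figures 5 to 7 with one row per power plant and one column per age group. The Python sketch below works under that assumption and recomputes the expected values, the Chi squared value and the degree of freedom; the variable names are chosen here for illustration.

```python
# Percentage of employees with cancer: one row per power plant,
# one column per age group (Group 1, Group 2, Group 3), from Figures 5 to 7.
observed = [
    [10.33, 11.08, 15.92],
    [10.66, 11.01, 17.51],
    [12.14, 15.30, 22.11],
    [15.74, 21.17, 32.92],
    [11.32, 21.15, 31.48],
    [13.20, 18.52, 19.34],
    [14.20, 18.17, 29.17],
    [18.20, 16.60, 21.62],
    [19.30, 19.89, 31.26],
    [22.68, 23.28, 33.93],
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(column) for column in zip(*observed)]
grand_total = sum(row_totals)

# Expected value of each cell: row total * column total / grand total.
expected = [[r * c / grand_total for c in column_totals] for r in row_totals]

chi_squared = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
degrees_of_freedom = (len(column_totals) - 1) * (len(row_totals) - 1)
print(round(expected[0][0], 4), round(chi_squared, 4), degrees_of_freedom)
```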
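Examining the value of χ^{2} with respect to the degree of freedom using the table shown in the Background Information section, the critical value for 18 degrees of freedom at the 0.05 level is 28.869. Since the calculated value of 6.032843 is smaller than this critical value, the Chi squared test by itself does not reject the Null Hypothesis; the Alternate Hypothesis is instead supported by the high values of the correlation coefficients obtained above.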
What is the relationship between the number of employees working in a Nuclear Power Station and the number of employees affected by cancer during the working period or after retirement for three different age groups – Gr 1: 50 years to 60 years, Gr 2: 60 years to 70 years and Gr 3: 70 years to 80 years?
The relationship between the number of employees working or having worked in a Nuclear Power Plant and the number of those employees affected by cancer is direct, i.e., as the total number of employees increases, the number of employees affected by cancer also increases.
In this investigation, several processes and mathematical tools have been used to find the correlation along with its strength. The choice of nuclear power plants is one of the most important strengths of this investigation. It has provided a data sheet with accurate observations of employee count. In addition, internationally acclaimed newspapers have also contributed to this. The use of two different correlation coefficients, the regression coefficient and Pearson’s correlation coefficient, has provided the strength and nature of the correlation. Furthermore, the calculation of the percentage of employees affected by cancer has enabled the investigation to analyse the variation in the number of affected employees (dependent variable) in the observed data sheet. Lastly, the use of the Chi squared test has allowed the hypotheses to be evaluated.
However, a few weaknesses have been observed during this mathematical investigation. The immunity of the human body is very uncertain and cannot be generalised. Moreover, cancer is a disease on which research is still going on, and there are many open questions, such as the causes of cancer, which govern the rate at which cancer develops. As many variables other than the total employee count affect the dependent variable, the correlation study cannot be carried out efficiently. In order to perform an efficient correlation analysis on the research question, all of these parameters must be controlled or held constant.