Mathematics AI SL
Sample Internal Assessment
6/7
10 mins Read
1,967 Words
English
Free

To what extent is there a correlation between the total number of employees working in a nuclear power plant and the number of employees affected by cancer, for three age groups?

Rationale

"Although September 11 was horrible, it didn't threaten the survival of the human race, like nuclear weapons do." – Stephen Hawking. Despite the heinous acts of terrorism that happen across the globe, the destructive power of the nuclear bomb has left a scar and a fear in every individual in this world.

 

Since childhood, I have heard about the devastation in Hiroshima and Nagasaki in 1945, which marked the end of World War II. The destructive capacity of nuclear energy has been clear to me since my early school days.

 

It was during secondary education that my picture of nuclear power began to change, when I studied nuclear energy as a non-conventional source of energy. Despite the initial doubts that arose from the childhood stories of catastrophe, I learned that nuclear energy is changing the way energy is provided to mankind. The amount of energy that can be produced by a nuclear reaction is unparalleled by any other source of energy.

 

As time passed, the curriculum became more intense and I started learning the concepts of nuclear energy in depth. About two years ago, I studied in Physics the nuclear fission reaction which generates nuclear energy. Over time, I felt a deep inclination towards the subject. Because of the highly constructive service it provides to mankind, I thought of pursuing higher studies in nuclear energy and working in a nuclear power station.

 

However, in Biology I have recently studied the disease called cancer. Some of the facts have shaken my dream of pursuing a job in a nuclear power plant. In the curriculum, I learned that γ - rays cause cancer. The subtle fear I had developed about the devastating effects of nuclear energy returned, because γ - rays are emitted in the nuclear fission reaction by which nuclear energy is generated.

 

To remove the fear and concentrate on my career, I started doing some research. I read a few journals on the side effects of nuclear energy. There were several reports of an increased chance of developing cancer if an individual is exposed to harmful γ - rays. However, I also came across many articles discussing the preventive measures taken in every nuclear power plant to protect employees from radiation. To be more confident, I read many news articles, from which I learned that employees working in nuclear power plants are often affected by cancer. However, I could not find any information on the chances of a nuclear plant employee being affected by cancer.

 

To find the answer, I am working on this mathematical exploration so that I can derive some relation for the chances of being affected by cancer if I pursue my dream job.

Aim

The main motive of this investigation is to explore the correlation between the number of employees working in a nuclear power plant and the number of employees getting affected by cancer.

Research question

What is the relationship between the number of employees working in a Nuclear Power Station and the number of employees affected by cancer during the working period or after retirement for three different age groups – Gr 1: 50 years to 60 years, Gr 2: 60 years to 70 years and Gr 3: 70 years to 80 years?

  • Introduction

    What is cancer

    Cancer is a disease characterized by uncontrolled cell division. The repeated division of cells often causes the formation of a tumor, cyst, fibroid etc. Tumors are categorized into two types – benign and malignant; malignant tumors are considered to be cancerous. Cells of a malignant tumor can spread throughout the body through the blood stream and initiate the formation of a tumor in any other part of the body. This puts pressure on the vital organs where the tumor has formed, which can lead to organ failure. A tumor also constricts the blood vessels in its vicinity, resulting in increased heart rate and blood pressure and eventually increasing the chances of stroke or heart failure.

    What causes cancer

    There are several causative agents which trigger cells to divide in an uncontrolled manner. In the context of this mathematical exploration, radiation is the cause of interest. Radiations like gamma rays, X – rays, etc. are considered to be among the most prominent causative agents of cancer. These radiations carry enough energy to ionize molecules and cause mutations in human DNA. Once a cell is mutated in this way, it can begin to divide continuously without regulating the cell cycle, which leads to the formation of a malignant or cancerous tumor.

     

    Several news reports and scientific studies state that depletion of the ozone layer, driven by emissions of ozone-depleting gases, has allowed more harmful ultraviolet rays to pass through the Earth’s atmosphere. As a result, cases of skin cancer have increased across the world. This illustrates the effect of radiation in causing cancer.

  • Nuclear power plant

    A nuclear power station, or nuclear power plant, is a power plant which generates energy by a nuclear fission reaction. The fission reaction is performed in a nuclear reactor, in which the heat generated by the reaction is used to convert water into steam. The steam thus generated is used to run a turbine which generates electricity.

     

    The nuclear fission reaction is accompanied by the emission of radiations such as α - rays, β - rays, γ - rays etc., of which the γ - ray is considered to be the most harmful. The nuclear reactor is constructed in such a way that leakage of radiation should be nil. In addition, a number of preventive measures, such as protective clothing and regular medical check – ups, are taken for employees working in nuclear power plants. Despite such preventive measures, instances have been noted where radiation has leaked and caused severe illness not only to the employees but also to individuals living near the power plant. This is because γ - rays can pass through even inches of a metal sheet such as lead.

    Regression correlation coefficient

    The regression correlation coefficient is a tool to measure the strength of the correlation between the independent variable and the dependent variable. The set of values \((x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\) is used to find the value of r as stated by the formula below

     

    \(r=\frac{n(\Sigma xy)-(\Sigma x)(\Sigma y)}{\sqrt{[n\Sigma x^2-(\Sigma x)^2][n\Sigma y^2-(\Sigma y)^2]}}\)

     

    In the above-mentioned formula, x is the value of the independent variable of each observation, y is the value of the dependent variable of each observation, xy is the product of the independent and the dependent variable of each observation, n is the number of observations and \(\Sigma\) denotes the sum over all the observations of the mentioned variable.

     

    By squaring the value of r, the value of the regression coefficient \(r^2\) is obtained. The value of \(r^2\) lies between 0 and 1, where 1 signifies maximum correlation and 0 signifies no correlation.

    Pearson’s correlation coefficient

    Pearson’s correlation coefficient is a tool to measure both the strength and the nature of the correlation between the independent variable and the dependent variable. The set of values \((x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\) is used to find the value of R as stated by the formula below:

     

    \(R=\frac{\Sigma(x-\bar x)(y-\bar y)}{\sqrt{\Sigma (x-\bar x)^2 \, \Sigma (y-\bar y)^2}}\)

     

    In the above-mentioned formula, x is the value of the independent variable of each observation, y is the value of the dependent variable of each observation, \(\bar x\) is the arithmetic mean of all the observations of the independent variable, \(\bar y\) is the arithmetic mean of all the observations of the dependent variable and \(\Sigma\) denotes the sum over all the observations of the mentioned variable.

     

    The value of R lies between -1 and 1. A positive value of Pearson’s correlation coefficient implies a direct relationship between the independent and the dependent variable, whereas a negative value implies an inverse relationship between them. If the value of the correlation coefficient is close to 1 or -1, it signifies that a strong correlation exists. On the other hand, if the value is close to 0, it signifies little or no linear correlation.
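As a hedged illustration of the deviation formula above, the following Python sketch computes R for the Group 1 (50 to 60 years) values listed in the raw data table of this exploration. The function name `pearson_r` and the variable names are assumptions made for this example, not part of the original working. The deviation form is algebraically equivalent to the sum-based formula for r given earlier, so both give the same number for the same data.

```python
from math import sqrt

# Group 1 (50-60 years) values copied from the raw data table of this exploration.
employees = [329, 347, 387, 451, 459, 674, 725, 978, 1564, 3875]
infected = [34, 37, 47, 71, 52, 89, 103, 178, 302, 879]


def pearson_r(xs, ys):
    """Pearson's coefficient computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]  # x - x_bar for every observation
    dy = [y - mean_y for y in ys]  # y - y_bar for every observation
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator


r_group1 = pearson_r(employees, infected)
print(f"R for Group 1 = {r_group1:.4f}")
```

A value close to 1 would indicate a strong direct relationship, in line with the interpretation described above.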

  • Chi squared test

    The Chi squared test is a kind of analysis which tests for the existence of a relationship between an independent variable and a dependent variable. The Chi squared value of the given set of data is calculated first. Then, based on the type of data (for example, paired data or independent data), the Chi squared value is compared with the Chi squared table, which indicates whether a relationship exists.

     

    The formula of Chi squared value is given below

     

    \(\chi^2 \text{ value} = \sum \frac{(O_i - E_i)^2}{E_i} \)

     

    Here, \(O_i\) is the observed value, \(E_i\) is the expected value and \(\sum\) denotes the sum over all the observations.

     

    Now, the Chi squared value is checked in Chi squared table which predicts the existence of any correlation. The Chi squared table is shown below

    | df | 0.995 | 0.99 | 0.975 | 0.95 | 0.90 | 0.10 | 0.05 | 0.025 | 0.01 | 0.005 |
    |---|---|---|---|---|---|---|---|---|---|---|
    | 1 | --- | --- | 0.001 | 0.004 | 0.016 | 2.706 | 3.841 | 5.024 | 6.635 | 7.879 |
    | 2 | 0.010 | 0.020 | 0.051 | 0.103 | 0.211 | 4.605 | 5.991 | 7.378 | 9.210 | 10.597 |
    | 3 | 0.072 | 0.115 | 0.216 | 0.352 | 0.584 | 6.251 | 7.815 | 9.348 | 11.345 | 12.838 |
    | 4 | 0.207 | 0.297 | 0.484 | 0.711 | 1.064 | 7.779 | 9.488 | 11.143 | 13.277 | 14.860 |
    | 5 | 0.412 | 0.554 | 0.831 | 1.145 | 1.610 | 9.236 | 11.070 | 12.833 | 15.086 | 16.750 |
    | 6 | 0.676 | 0.872 | 1.237 | 1.635 | 2.204 | 10.645 | 12.592 | 14.449 | 16.812 | 18.548 |
    | 7 | 0.989 | 1.239 | 1.690 | 2.167 | 2.833 | 12.017 | 14.067 | 16.013 | 18.475 | 20.278 |
    | 8 | 1.344 | 1.646 | 2.180 | 2.733 | 3.490 | 13.362 | 15.507 | 17.535 | 20.090 | 21.955 |
    | 9 | 1.735 | 2.088 | 2.700 | 3.325 | 4.168 | 14.684 | 16.919 | 19.023 | 21.666 | 23.589 |
    | 10 | 2.156 | 2.558 | 3.247 | 3.940 | 4.865 | 15.987 | 18.307 | 20.483 | 23.209 | 25.188 |
    | 11 | 2.603 | 3.053 | 3.816 | 4.575 | 5.578 | 17.275 | 19.675 | 21.920 | 24.725 | 26.757 |
    | 12 | 3.074 | 3.571 | 4.404 | 5.226 | 6.304 | 18.549 | 21.026 | 23.337 | 26.217 | 28.300 |
    | 13 | 3.565 | 4.107 | 5.009 | 5.892 | 7.042 | 19.812 | 22.362 | 24.736 | 27.688 | 29.819 |
    | 14 | 4.075 | 4.660 | 5.629 | 6.571 | 7.790 | 21.064 | 23.685 | 26.119 | 29.141 | 31.319 |
    | 15 | 4.601 | 5.229 | 6.262 | 7.261 | 8.547 | 22.307 | 24.996 | 27.488 | 30.578 | 32.801 |
    | 16 | 5.142 | 5.812 | 6.908 | 7.962 | 9.312 | 23.542 | 26.296 | 28.845 | 32.000 | 34.267 |
    | 17 | 5.697 | 6.408 | 7.564 | 8.672 | 10.085 | 24.769 | 27.587 | 30.191 | 33.409 | 35.718 |
    | 18 | 6.265 | 7.015 | 8.231 | 9.390 | 10.865 | 25.989 | 28.869 | 31.526 | 34.805 | 37.156 |
    | 19 | 6.844 | 7.633 | 8.907 | 10.117 | 11.651 | 27.204 | 30.144 | 32.852 | 36.191 | 38.582 |
    | 20 | 7.434 | 8.260 | 9.591 | 10.851 | 12.443 | 28.412 | 31.410 | 34.170 | 37.566 | 39.997 |
    | 21 | 8.034 | 8.897 | 10.283 | 11.591 | 13.240 | 29.615 | 32.671 | 35.479 | 38.932 | 41.401 |
    | 22 | 8.643 | 9.542 | 10.982 | 12.338 | 14.041 | 30.813 | 33.924 | 36.781 | 40.289 | 42.796 |
    | 23 | 9.260 | 10.196 | 11.689 | 13.091 | 14.848 | 32.007 | 35.172 | 38.076 | 41.638 | 44.181 |
    | 24 | 9.886 | 10.856 | 12.401 | 13.848 | 15.659 | 33.196 | 36.415 | 39.364 | 42.980 | 45.559 |
    | 25 | 10.520 | 11.524 | 13.120 | 14.611 | 16.473 | 34.382 | 37.652 | 40.646 | 44.314 | 46.928 |
    | 26 | 11.160 | 12.198 | 13.844 | 15.379 | 17.292 | 35.563 | 38.885 | 41.923 | 45.642 | 48.290 |
    | 27 | 11.808 | 12.879 | 14.573 | 16.151 | 18.114 | 36.741 | 40.113 | 43.195 | 46.963 | 49.645 |
    | 28 | 12.461 | 13.565 | 15.308 | 16.928 | 18.939 | 37.916 | 41.337 | 44.461 | 48.278 | 50.993 |
    | 29 | 13.121 | 14.256 | 16.047 | 17.708 | 19.768 | 39.087 | 42.557 | 45.722 | 49.588 | 52.336 |
    | 30 | 13.787 | 14.953 | 16.791 | 18.493 | 20.599 | 40.256 | 43.773 | 46.979 | 50.892 | 53.672 |
    | 40 | 20.707 | 22.164 | 24.433 | 26.509 | 29.051 | 51.805 | 55.758 | 59.342 | 63.691 | 66.766 |
    | 50 | 27.991 | 29.707 | 32.357 | 34.764 | 37.689 | 63.167 | 67.505 | 71.420 | 76.154 | 79.490 |
    | 60 | 35.534 | 37.485 | 40.482 | 43.188 | 46.459 | 74.397 | 79.082 | 83.298 | 88.379 | 91.952 |
    | 70 | 43.275 | 45.442 | 48.758 | 51.739 | 55.329 | 85.527 | 90.531 | 95.023 | 100.425 | 104.215 |
    | 80 | 51.172 | 53.540 | 57.153 | 60.391 | 64.278 | 96.578 | 101.879 | 106.629 | 112.329 | 116.321 |
    | 90 | 59.196 | 61.754 | 65.647 | 69.126 | 73.291 | 107.565 | 113.145 | 118.136 | 124.116 | 128.299 |
    | 100 | 67.328 | 70.065 | 74.222 | 77.929 | 82.358 | 118.498 | 124.342 | 129.561 | 135.807 | 140.169 |

    Figure 1 - Chi Squared Table
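To make the use of the formula and the table concrete, here is a minimal Python sketch of a Chi squared test of independence between age group and cancer status. It is an assumption about how the test could be set up, not necessarily the test carried out in this exploration: the group totals (9789, 9284, 7952 employees and 1792, 1842, 2295 infected) are the column sums reported with the processed data of this exploration, the expected counts are derived from the row and column totals, and the names `chi_squared`, `expected_counts` and `CRITICAL_DF2_P05` are hypothetical.

```python
# Critical value read from the Chi squared table: df = 2, probability 0.05.
CRITICAL_DF2_P05 = 5.991


def chi_squared(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def expected_counts(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Totals per age group, summed over the ten plants in the data tables.
totals = [9789, 9284, 7952]
infected = [1792, 1842, 2295]

# One row per age group: [infected, not infected].
table = [[i, t - i] for t, i in zip(totals, infected)]
expected = expected_counts(table)

flat_observed = [value for row in table for value in row]
flat_expected = [value for row in expected for value in row]

statistic = chi_squared(flat_observed, flat_expected)
df = (len(table) - 1) * (len(table[0]) - 1)  # (rows - 1) * (columns - 1)

print(f"chi-squared = {statistic:.2f}, df = {df}")
print("exceeds the 0.05 critical value:", statistic > CRITICAL_DF2_P05)
```

If the statistic exceeds the critical value for the relevant degrees of freedom, the null hypothesis of independence would be rejected at that significance level.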

    Hypothesis

  • Null hypothesis

    It is assumed that there is no correlation between the number of employees working in a Nuclear Power Station and the number of employees affected by cancer during the working period or after retirement for three different age groups – Gr 1: 50 years to 60 years, Gr 2: 60 years to 70 years and Gr 3: 70 years to 80 years.

    Alternate hypothesis

    It is assumed that there is a correlation between the number of employees working in a Nuclear Power Station and the number of employees affected by cancer during the working period or after retirement for three different age groups – Gr 1: 50 years to 60 years, Gr 2: 60 years to 70 years and Gr 3: 70 years to 80 years.

    Data collection

  • Source of data

    A data sheet has been prepared based on several news articles, reports and surveys of different nuclear power plants across the globe. It has been possible to record the number of employees who developed cancer during their tenure of service because of the health insurance policy that the company offers to all its employees. Similarly, the health status of retired employees has been obtained from the health benefit that the company offers even after retirement.

    Justification on categorization of age groups

    The employees working in nuclear power plants have been categorized into three groups to illustrate the correlation properly and in depth. It has been reported that resistance to cancer is stronger at a young age than in old age. However, there are many exceptions: in older people, cancer-causing mutations can be triggered by much less exposure to radiation than in others. On the other hand, it has been observed that an individual exposed to cancer-causing radiation at a young age may show the cancer only much later in life. Thus, considering the strength of immunity in an individual, the age groups have been defined accordingly.

    Raw Data Table

    | Name | Total | Infected |
    |---|---|---|
    | Byron Nuclear Power Station | 329 | 34 |
    | Peach Bottom Atomic Power Station | 347 | 37 |
    | Oconee Nuclear Station | 387 | 47 |
    | Braidwood Generating Station | 451 | 71 |
    | South Texas Project Electric Generating Station | 459 | 52 |
    | Susquehanna Nuclear Power Plant | 674 | 89 |
    | McGuire Nuclear Power Plant | 725 | 103 |
    | Browns Ferry Nuclear Plant | 978 | 178 |
    | Palo Verde Generation Station | 1564 | 302 |
    | Vogtle Nuclear Power Station | 3875 | 879 |

    Figure 2 - Table On Total No. of Employees vs. No. of Employees Infected (Gr1: 50 – 60 Years)

    | Name | Total | Infected |
    |---|---|---|
    | Byron Nuclear Power Station | 334 | 37 |
    | Peach Bottom Atomic Power Station | 345 | 38 |
    | Oconee Nuclear Station | 379 | 58 |
    | Braidwood Generating Station | 463 | 98 |
    | South Texas Project Electric Generating Station | 487 | 103 |
    | Susquehanna Nuclear Power Plant | 621 | 115 |
    | McGuire Nuclear Power Plant | 798 | 145 |
    | Browns Ferry Nuclear Plant | 970 | 161 |
    | Palo Verde Generation Station | 1498 | 298 |
    | Vogtle Nuclear Power Station | 3389 | 789 |

    Figure 3 - Table On Total No. of Employees vs. No. of Employees Infected (Gr2: 60 – 70 Years)

    | Name | Total | Infected |
    |---|---|---|
    | Byron Nuclear Power Station | 289 | 46 |
    | Peach Bottom Atomic Power Station | 297 | 52 |
    | Oconee Nuclear Station | 303 | 67 |
    | Braidwood Generating Station | 401 | 132 |
    | South Texas Project Electric Generating Station | 432 | 136 |
    | Susquehanna Nuclear Power Plant | 543 | 105 |
    | McGuire Nuclear Power Plant | 641 | 187 |
    | Browns Ferry Nuclear Plant | 879 | 190 |
    | Palo Verde Generation Station | 1273 | 398 |
    | Vogtle Nuclear Power Station | 2894 | 982 |

    Figure 4 - Table On Total No. of Employees vs. No. of Employees Infected (Gr3: 70 – 80 Years)

    Processed data table

    | Total No. of Employees | Infected Employees | Percentage (%) |
    |---|---|---|
    | 329 | 34 | 10.33 |
    | 347 | 37 | 10.66 |
    | 387 | 47 | 12.14 |
    | 451 | 71 | 15.74 |
    | 459 | 52 | 11.32 |
    | 674 | 89 | 13.20 |
    | 725 | 103 | 14.20 |
    | 978 | 178 | 18.20 |
    | 1564 | 302 | 19.30 |
    | 3875 | 879 | 22.68 |

    Figure 5 - Table On Processed Data Table For Gr. 1

    | Total No. of Employees | Infected Employees | Percentage (%) |
    |---|---|---|
    | 334 | 37 | 11.08 |
    | 345 | 38 | 11.01 |
    | 379 | 58 | 15.30 |
    | 463 | 98 | 21.17 |
    | 487 | 103 | 21.15 |
    | 621 | 115 | 18.52 |
    | 798 | 145 | 18.17 |
    | 970 | 161 | 16.60 |
    | 1498 | 298 | 19.89 |
    | 3389 | 789 | 23.28 |

    Figure 6 - Table On Processed Data Table For Gr. 2

    | Total No. of Employees | Infected Employees | Percentage (%) |
    |---|---|---|
    | 289 | 46 | 15.92 |
    | 297 | 52 | 17.51 |
    | 303 | 67 | 22.11 |
    | 401 | 132 | 32.92 |
    | 432 | 136 | 31.48 |
    | 543 | 105 | 19.34 |
    | 641 | 187 | 29.17 |
    | 879 | 190 | 21.62 |
    | 1273 | 398 | 31.26 |
    | 2894 | 982 | 33.93 |

    Figure 7 - Table On Processed Data Table For Gr. 3

    Sample Calculation

     

    Percentage of Infected Employees \(= \frac{34}{329} \times 100 = 10.33\)
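The sample calculation can be repeated for every plant with a short Python sketch. The lists below are the Group 1 values from the raw data table; the variable names are assumptions made for this example. The sketch rounds to two decimal places, so a few results may differ in the last digit from the processed data table, which appears to truncate some values.

```python
# Group 1 (50-60 years): total employees and infected employees per plant.
employees = [329, 347, 387, 451, 459, 674, 725, 978, 1564, 3875]
infected = [34, 37, 47, 71, 52, 89, 103, 178, 302, 879]

# Percentage of infected employees = infected / total * 100, rounded to 2 d.p.
percentages = [round(100 * y / x, 2) for x, y in zip(employees, infected)]

for total, cases, pct in zip(employees, infected, percentages):
    print(f"{total:>5} employees, {cases:>4} infected -> {pct:.2f} %")
```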

  • Analysis of processed data table

    In Figure 5 to Figure 7, the percentage of employees affected by cancer out of the total number of employees has been found. As the interval in the total number of employees (independent variable) is not regular, the mean and standard deviation would not serve any purpose in analyzing the data. Instead, because the number of employees affected by cancer depends on the total number of employees working in that particular power plant, the percentage has been calculated.

     

    In Figure 5, the percentage of employees affected by cancer ranges between 10% and 23%, and the number of infected employees increases with the total number of employees working in a power plant. Similarly, in Figure 6, the percentage ranges between 11% and 23%, with about 11% in the power plants with the fewest employees and about 23% in the plant with the most employees. In Figure 7, as the age group is between 70 years and 80 years, it can be assumed that the total number of employees who worked for the power plants has decreased because of mortality at that age. Thus, the total number of employees currently alive is less than that of the other groups. On the other hand, the percentage of infected employees is higher than in the other groups, ranging between 16% and 34%, with about 16% in the power plant with the fewest employees and about 34% in the plant with the most employees.

    Graphical analysis

    Figure 8 - Total No. of Employees vs. No. of Employees Infected (Gr1: 50 – 60 Years)
    Figure 9 - Total No. of Employees vs. No. of Employees Infected (Gr2: 60 – 70 Years)
    Figure 10 - Total No. of Employees vs. No. of Employees Infected (Gr3: 70 – 80 Years)

    Choice of Axes

    The X – Axis of the graph denotes the total number of employees working or worked in nuclear power plants (independent variable).

     

    The Y – Axis of the graph denotes number of employees who are currently infected by cancer (dependent variable).

    Trendline for linear correlation

    In Figures 8 to 10, a linear trendline has been obtained using the data collected from the official websites of the nuclear power plants, newspapers, journals, articles etc.

     

    In Figure 8, the equation of trendline is

     

    y = 0.2386x - 54.366

     

    In Figure 9, the equation of trendline is

     

    y = 0.2401x - 38.697

     

    In Figure 10, the equation of trendline is

     

    y = 0.352x - 50.422

     

    From the graphs, it can be stated that there is a positive correlation between the number of employees affected by cancer and the total number of employees who work or have worked in the nuclear power plants. However, a few outliers have been noticed in the graphs as well.
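The trendline equations can be checked with a short Python sketch. It assumes the trendlines were produced by an ordinary least-squares fit (the usual method behind a spreadsheet linear trendline), which the exploration does not state explicitly; the function name `fit_line` and the dictionary `groups` are hypothetical. The data are the three raw data tables.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept


# (total employees, infected employees) for each age group.
groups = {
    "Gr1 (50-60 years)": (
        [329, 347, 387, 451, 459, 674, 725, 978, 1564, 3875],
        [34, 37, 47, 71, 52, 89, 103, 178, 302, 879],
    ),
    "Gr2 (60-70 years)": (
        [334, 345, 379, 463, 487, 621, 798, 970, 1498, 3389],
        [37, 38, 58, 98, 103, 115, 145, 161, 298, 789],
    ),
    "Gr3 (70-80 years)": (
        [289, 297, 303, 401, 432, 543, 641, 879, 1273, 2894],
        [46, 52, 67, 132, 136, 105, 187, 190, 398, 982],
    ),
}

fits = {name: fit_line(xs, ys) for name, (xs, ys) in groups.items()}

for name, (slope, intercept) in fits.items():
    print(f"{name}: y = {slope:.4f}x {intercept:+.3f}")
```

Under that assumption, the fitted slopes and intercepts should agree with the three trendline equations quoted above to the precision shown.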

  • Outliers

    There are a few outliers where the total number of employees is in the range of 500 to 750. Because there are very few outliers, the value of the regression coefficient is about 0.99. Such a high value (close to one) supports the existence of a linear correlation between the dependent and the independent variable.

    Intercept for linear correlation

    From the equation of the trendline of figure 8, the Y – intercept of the trendline has been calculated

     

    y = 0.2386x - 54.366

     

    The value of y for x = 0 will be

     

    y = 0.2386 × 0 - 54.366

     

    => y = -54.366

     

    From the equation of the trendline of figure 9, the Y – intercept of the trendline has been calculated

     

    y = 0.2401x - 38.697

     

    The value of y for x = 0 will be

     

    y = 0.2401 × 0 - 38.697

     

    => y = -38.697

     

    From the equation of the trendline of figure 10, the Y – intercept of the trendline has been calculated

     

    y = 0.352x - 50.422

     

    The value of y for x = 0 will be

     

    y = 0.352 × 0 - 50.422

     

    =>  y = -50.422

     

    The value of the Y – intercept is -54.366, -38.697 and -50.422 for Figure 8, Figure 9 and Figure 10 respectively. A negative intercept suggests that if the total number of employees is zero, the number of infected individuals would be negative. Taken literally, this is not possible; the practical reading is that if the total number of employees is zero, there will not be any infected patient either, which is consistent with cancer cases arising through work in nuclear power plants.

     

    From the equation of the trendline of figure 8, the X – intercept of the trendline has been calculated

     

    y = 0.2386x - 54.366

     

    The value of x for y = 0 will be

     

    0 = 0.2386x - 54.366

     

    \(=> x = \frac{54.366}{0.2386}\)

     

    => x = 227.85

     

    From the equation of the trendline of figure 9, the X – intercept of the trendline has been calculated

     

    y = 0.2401x - 38.697

     

    The value of x for y = 0 will be

     

    0 = 0.2401x - 38.697

     

    \(=> x = \frac{38.697}{0.2401}\)

     

    => x = 161.17

     

    From the equation of the trendline of figure 10, the X – intercept of the trendline has been calculated

     

    y = 0.352x - 50.422

     

    The value of x for y = 0 will be

     

    0 = 0.352x - 50.422

     

    \(=> x = \frac{50.422}{0.352}\)

     

    => x = 143.24

     

    The value of the X – intercept is approximately 228, 161 and 143 for Figure 8, Figure 9 and Figure 10 respectively. Mathematically, this means that the trendline predicts no infected patient when the total number of employees is 228 for the age group 50 to 60 years, 161 for the age group 60 to 70 years and 143 for the age group 70 to 80 years.
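The intercept arithmetic can be sketched in a few lines of Python. For a line y = mx + c, the Y – intercept is c and the X – intercept is -c/m. The coefficients are the trendline equations quoted in this exploration; the dictionary `trendlines` and the function `x_intercept` are names assumed for this example.

```python
# (slope, intercept) of each trendline as quoted in the exploration.
trendlines = {
    "Figure 8": (0.2386, -54.366),
    "Figure 9": (0.2401, -38.697),
    "Figure 10": (0.352, -50.422),
}


def x_intercept(slope, intercept):
    """Value of x at which slope * x + intercept equals zero."""
    return -intercept / slope


for figure, (slope, intercept) in trendlines.items():
    print(f"{figure}: Y-intercept = {intercept}, "
          f"X-intercept = {x_intercept(slope, intercept):.2f}")
```

These values round to the 228, 161 and 143 stated above.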

    Calculation of correlation coefficient

  • Calculation of regression correlation coefficient

    Processed Data for calculation of \(r^2\)

     

    There are five headers in the processed data tables: \(x\), \(y\), \(x^2\), \(y^2\) and \(xy\). The total number of employees is represented by x and the number of employees affected by cancer is represented by y. The remaining headers have their usual meaning. The calculation of the \(r^2\) correlation coefficient is shown to explore the efficiency and stability of the trendline and the correlation.

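    | \(x\) | \(y\) | \(x^2\) | \(y^2\) | \(xy\) |
    |---|---|---|---|---|
    | 329 | 34 | 108241 | 1156 | 11186 |
    | 347 | 37 | 120409 | 1369 | 12839 |
    | 387 | 47 | 149769 | 2209 | 18189 |
    | 451 | 71 | 203401 | 5041 | 32021 |
    | 459 | 52 | 210681 | 2704 | 23868 |
    | 674 | 89 | 454276 | 7921 | 59986 |
    | 725 | 103 | 525625 | 10609 | 74675 |
    | 978 | 178 | 956484 | 31684 | 174084 |
    | 1564 | 302 | 2446096 | 91204 | 472328 |
    | 3875 | 879 | 15015625 | 772641 | 3406125 |

    \(\sum x = 9789\), \(\sum y = 1792\), \(\sum x^2 = 20190607\), \(\sum y^2 = 926538\), \(\sum xy = 4285301\)

    Figure 11 - Table On Processed Data For Calculation Of \(r^2\) For Group 1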

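    | \(x\) | \(y\) | \(x^2\) | \(y^2\) | \(xy\) |
    |---|---|---|---|---|
    | 334 | 37 | 111556 | 1369 | 12358 |
    | 345 | 38 | 119025 | 1444 | 13110 |
    | 379 | 58 | 143641 | 3364 | 21982 |
    | 463 | 98 | 214369 | 9604 | 45374 |
    | 487 | 103 | 237169 | 10609 | 50161 |
    | 621 | 115 | 385641 | 13225 | 71415 |
    | 798 | 145 | 636804 | 21025 | 115710 |
    | 970 | 161 | 940900 | 25921 | 156170 |
    | 1498 | 298 | 2244004 | 88804 | 446404 |
    | 3389 | 789 | 11485321 | 622521 | 2673921 |

    \(\sum x = 9284\), \(\sum y = 1842\), \(\sum x^2 = 16518430\), \(\sum y^2 = 797886\), \(\sum xy = 3606605\)

    Figure 12 - Table On Processed Data For Calculation Of \(r^2\) For Group 2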

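    | \(x\) | \(y\) | \(x^2\) | \(y^2\) | \(xy\) |
    |---|---|---|---|---|
    | 289 | 46 | 83521 | 2116 | 13294 |
    | 297 | 52 | 88209 | 2704 | 15444 |
    | 303 | 67 | 91809 | 4489 | 20301 |
    | 401 | 132 | 160801 | 17424 | 52932 |
    | 432 | 136 | 186624 | 18496 | 58752 |
    | 543 | 105 | 294849 | 11025 | 57015 |
    | 641 | 187 | 410881 | 34969 | 119867 |
    | 879 | 190 | 772641 | 36100 | 167010 |
    | 1273 | 398 | 1620529 | 158404 | 506654 |
    | 2894 | 982 | 8375236 | 964324 | 2841908 |

    \(\sum x = 7952\), \(\sum y = 2295\), \(\sum x^2 = 12085100\), \(\sum y^2 = 1250051\), \(\sum xy = 3853177\)

    Figure 13 - Table On Processed Data For Calculation Of \(r^2\) For Group 3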

    The formula of the regression coefficient as mentioned in the background information has been used to find the correlation coefficient. Here, x is the value of the independent variable of each observation, y is the value of the dependent variable of each observation, xy is the product of the independent and the dependent variable of each observation, n is the number of observations and \(\sum\) denotes the sum over all the observations of the mentioned variable.

     

    Calculation for Group 1

     

    \(r = \frac{n(\sum xy)-(\sum x)(\sum y)}{\sqrt{[n\sum x^2-(\sum x)^2][n\sum y^2-(\sum y)^2]}}\)

     

    \(=>r=\frac{10(4285301)-(9789)(1792)}{\sqrt{[10 × 20190607-(9789)^2][10 × 926538 - (1792)^2]}}\)

     

    => r = 0.9987

     

    => \(r^2\) = 0.9975

     

    Calculation for Group 2

     

    \(r = \frac{n(\sum xy)-(\sum x)(\sum y)}{\sqrt{[n\sum x^2-(\sum x)^2][n\sum y^2-(\sum y)^2]}}\)

     

    \(=>r = \frac{10(3606605)-(9284)(1842)}{\sqrt{[10 × 16518430-(9284)^2][10 × 797886-(1842)^2]}}\)

     

    => r = 0.9964

     

    => \(r^2\) = 0.9929

     

    Calculation for Group 3

     

    \(r = \frac{n(\sum xy)-(\sum x)(\sum y)}{\sqrt{[n\sum x^2-(\sum x)^2][n\sum y^2-(\sum y)^2]}}\)

     

    \(=>r= \frac{10(3853177)-(7952)(2295)}{\sqrt{[10 × 12085100 - (7952)^2][10 × 1250051 - (2295)^2]}}\)

     

    => r = 0.993

     

    => \(r^2\) = 0.987
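The three substitutions above can be reproduced with a minimal Python sketch that plugs the tabulated sums into the same formula. The sums come from Figures 11 to 13 with n = 10 plants per group; the function name `r_from_sums` and the dictionary `sums` are assumptions made for this example.

```python
from math import sqrt


def r_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Correlation coefficient r from the sum-based formula."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# (sum x, sum y, sum x^2, sum y^2, sum xy) as tabulated for each group.
sums = {
    "Group 1": (9789, 1792, 20190607, 926538, 4285301),
    "Group 2": (9284, 1842, 16518430, 797886, 3606605),
    "Group 3": (7952, 2295, 12085100, 1250051, 3853177),
}

results = {name: r_from_sums(10, *values) for name, values in sums.items()}

for name, r in results.items():
    print(f"{name}: r = {r:.4f}, r^2 = {r * r:.4f}")
```

The printed values should agree with the worked results to about three decimal places; small differences in the last digit can arise from rounding versus truncation.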

    Analysis

    The value of the regression coefficient is 0.9975, 0.9929 and 0.987 for group 1, group 2 and group 3 respectively. Such a high value (close to one) supports the existence of a linear correlation between the dependent and the independent variable.

  • Calculation of Pearson’s correlation coefficient

    Processed Data Table for calculation of Pearson’s Correlation:

     

    There are seven headers in the processed data table for the calculation of Pearson’s correlation coefficient: \(x\), \(y\), \(x-\bar x\), \(y- \bar y\), \((x-\bar x)(y-\bar y)\), \((x-\bar x)^2\) and \((y- \bar y)^2\). The total number of employees is represented by x and the number of employees affected by cancer is represented by y; \(\bar x\) is the arithmetic mean of all the observations of the total number of employees and \(\bar y\) is the arithmetic mean of all the observations of the number of employees affected by cancer. The remaining headers have their usual meaning. The calculation of Pearson’s correlation coefficient is shown to explore the efficiency and stability of the trendline and the correlation.

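    | x | y | \(x-\bar x\) | \(y-\bar y\) | \((x-\bar x)(y-\bar y)\) | \((x-\bar x)^2\) | \((y-\bar y)^2\) |
    |---|---|---|---|---|---|---|
    | 329 | 34 | -649.9 | -145.2 | 94365.48 | 422370.01 | 21083.04 |
    | 347 | 37 | -631.9 | -142.2 | 89856.18 | 399297.61 | 20220.84 |
    | 387 | 47 | -591.9 | -132.2 | 78249.18 | 350345.61 | 17476.84 |
    | 451 | 71 | -527.9 | -108.2 | 57118.78 | 278678.41 | 11707.24 |
    | 459 | 52 | -519.9 | -127.2 | 66131.28 | 270296.01 | 16179.84 |
    | 674 | 89 | -304.9 | -90.2 | 27501.98 | 92964.01 | 8136.04 |
    | 725 | 103 | -253.9 | -76.2 | 19347.18 | 64465.21 | 5806.44 |
    | 978 | 178 | -0.9 | -1.2 | 1.08 | 0.81 | 1.44 |
    | 1564 | 302 | 585.1 | 122.8 | 71850.28 | 342342.01 | 15079.84 |
    | 3875 | 879 | 2896.1 | 699.8 | 2026690.78 | 8387395.21 | 489720.04 |

    Figure 14 - Table On Processed Data Table For Calculation Of Pearson’s Correlation Coefficient For Group 1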
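As a rough illustration of how each column of the Group 1 table is obtained, the sketch below rebuilds the deviation columns from the x and y values listed in the first two columns. The variable names are chosen here for illustration only.

```python
# Raw observations for Group 1, taken from the first two columns of the table
# (x: total number of employees, y: employees infected by cancer).
x = [329, 347, 387, 451, 459, 674, 725, 978, 1564, 3875]
y = [34, 37, 47, 71, 52, 89, 103, 178, 302, 879]

x_bar = sum(x) / len(x)  # arithmetic mean of x
y_bar = sum(y) / len(y)  # arithmetic mean of y

# One tuple per observation, in the same column order as the table:
# x, y, x - x_bar, y - y_bar, product of deviations, squared deviations.
rows = []
for xi, yi in zip(x, y):
    dx = xi - x_bar
    dy = yi - y_bar
    rows.append((xi, yi, dx, dy, dx * dy, dx ** 2, dy ** 2))

for row in rows:
    print("  ".join(f"{value:.2f}" for value in row))

# Column totals that feed into Pearson's formula.
sum_dxdy = sum(row[4] for row in rows)
sum_dx2 = sum(row[5] for row in rows)
sum_dy2 = sum(row[6] for row in rows)
print(round(sum_dxdy, 1), round(sum_dx2, 1), round(sum_dy2, 1))
```

The same loop can be applied to the Group 2 and Group 3 observations by replacing the two input lists.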
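    | x | y | \(x-\bar x\) | \(y-\bar y\) | \((x-\bar x)(y-\bar y)\) | \((x-\bar x)^2\) | \((y-\bar y)^2\) |
    |---|---|---|---|---|---|---|
    | 334 | 37 | -594.4 | -147.2 | 87495.68 | 353311.36 | 21667.84 |
    | 345 | 38 | -583.4 | -146.2 | 85293.08 | 340355.56 | 21374.44 |
    | 379 | 58 | -549.4 | -126.2 | 69334.28 | 301840.36 | 15926.44 |
    | 463 | 98 | -465.4 | -86.2 | 40117.48 | 216597.16 | 7430.44 |
    | 487 | 103 | -441.4 | -81.2 | 35841.68 | 194833.96 | 6593.44 |
    | 621 | 115 | -307.4 | -69.2 | 21272.08 | 94494.76 | 4788.64 |
    | 798 | 145 | -130.4 | -39.2 | 5111.68 | 17004.16 | 1536.64 |
    | 970 | 161 | 41.6 | -23.2 | -965.12 | 1730.56 | 538.24 |
    | 1498 | 298 | 569.6 | 113.8 | 64820.48 | 324444.16 | 12950.44 |
    | 3389 | 789 | 2460.6 | 604.8 | 1488170.88 | 6054552.36 | 365783.04 |

    Figure 15 - Table On Processed Data Table For Calculation Of Pearson’s Correlation Coefficient For Group 2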
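    | x | y | \(x-\bar x\) | \(y-\bar y\) | \((x-\bar x)(y-\bar y)\) | \((x-\bar x)^2\) | \((y-\bar y)^2\) |
    |---|---|---|---|---|---|---|
    | 289 | 46 | -506.2 | -183.5 | 92887.7 | 256238.44 | 33672.25 |
    | 297 | 52 | -498.2 | -177.5 | 88430.5 | 248203.24 | 31506.25 |
    | 303 | 67 | -492.2 | -162.5 | 79982.5 | 242260.84 | 26406.25 |
    | 401 | 132 | -394.2 | -97.5 | 38434.5 | 155393.64 | 9506.25 |
    | 432 | 136 | -363.2 | -93.5 | 33959.2 | 131914.24 | 8742.25 |
    | 543 | 105 | -252.2 | -124.5 | 31398.9 | 63604.84 | 15500.25 |
    | 641 | 187 | -154.2 | -42.5 | 6553.5 | 23777.64 | 1806.25 |
    | 879 | 190 | 83.8 | -39.5 | -3310.1 | 7022.44 | 1560.25 |
    | 1273 | 398 | 477.8 | 168.5 | 80509.3 | 228292.84 | 28392.25 |
    | 2894 | 982 | 2098.8 | 752.5 | 1579347 | 4404961.44 | 566256.25 |

    Figure 16 - Table On Processed Data Table For Calculation Of Pearson’s Correlation Coefficient For Group 3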

    The formula of Pearson’s correlation coefficient as mentioned in the background information has been used to find the correlation coefficient. Here, x is the value of the independent variable for each observation, y is the value of the dependent variable for each observation, \(\bar x\) is the arithmetic mean of all the observations of the independent variable, \(\bar y\) is the arithmetic mean of all the observations of the dependent variable and \(\sum\) denotes the sum over all the observations of the stated quantity.

     

    Calculation for Table 8A

     

    \(\bar x = \frac{\sum x}{N}=\frac{9789}{10} = 978.9\)

     

    \(\bar y = \frac{\sum y}{N}=\frac{1792}{10}=179.2\)

     

    \(\displaystyle \sum (x - \bar{x})(y - \bar{y}) = 2531112.2\)

     

    \(\displaystyle \sum (x-\bar x)^2=10608154.9\)

     

    \(\displaystyle\sum(y-\bar y)^2=605411.6\)

     

    Let Pearson’s correlation coefficient be \(\mathfrak{R}\).

     

    \(\mathfrak{R}=\frac{\sum{({x}-\bar{{x}})({y}-\bar{{y}})}}{\sqrt{\sum{({x}-\bar{{x}})}^{2}\times\sum{({y}-\bar{{y}})}^{2}}}\)

     

    \(\mathfrak{R}=\frac{2531112.2}{\sqrt{10608154.9\times605411.6}}=0.998\)

     

    Calculation for Table 8B

     

    \(\bar x=\frac{\sum x}{N}= \frac{9284}{10}=928.4\)

     

    \(\bar y = \frac{\sum y}{N}=\frac{1842}{10}=184.2\)

     

    \(\displaystyle\sum (x-\bar x)(y-\bar y)=1896492.2\)

     

    \(\displaystyle \sum (x- \bar x)^2=7899164.4\)

     

    \(\displaystyle \sum (y-\bar y)^2=458589.6\)

     

    Let Pearson’s correlation coefficient be \(\mathfrak{R}\).

     

    \(\mathfrak{R}=\frac{\sum{({x}-\bar{{x}})({y}-\bar{{y}})}}{\sqrt{\sum{({x}-\bar{{x}})}^{2}\times\sum{({y}-\bar{{y}})}^{2}}}\)

     

    \(\mathfrak{R}=\frac{1896492.2}{\sqrt{7899164.4\times458589.6}}=0.996\)

     

    Calculation for Table 8C

     

    \(\bar x=\frac{\sum x}{N}=\frac{7952}{10}=795.2\)

     

    \(\bar y = \frac{\sum y}{N} = \frac{2295}{10}=229.5\)

     

    \(\displaystyle \sum (x-\bar x)(y -\bar y)=2028193\)

     

    \(\displaystyle \sum(x-\bar x)^2 = 5761669.6\)

     

    \(\displaystyle \sum (y-\bar y)^2= 723348.5\)

     

    Let Pearson’s correlation coefficient be \(\mathfrak{R}\).

     

    \(\mathfrak{R}=\frac{\sum{({x}-\bar{{x}})({y}-\bar{{y}})}}{\sqrt{\sum{({x}-\bar{{x}})}^{2}\times\sum{({y}-\bar{{y}})}^{2}}}\)

     

    \(\mathfrak{R}=\frac{2028193}{\sqrt{5761669.6\times723348.5}}=0.993\)
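The three final substitutions can be reproduced with a few lines of code. This is only a sketch, assuming the deviation sums used in the last step of each calculation above; the helper name `pearson_from_deviation_sums` is hypothetical.

```python
from math import sqrt


def pearson_from_deviation_sums(s_xy, s_xx, s_yy):
    """Pearson's coefficient from the three deviation sums (hypothetical helper)."""
    return s_xy / sqrt(s_xx * s_yy)


# (sum of products, sum of squared x deviations, sum of squared y deviations),
# as substituted in the final step of each calculation.
deviation_sums = {
    "Group 1": (2531112.2, 10608154.9, 605411.6),
    "Group 2": (1896492.2, 7899164.4, 458589.6),
    "Group 3": (2028193.0, 5761669.6, 723348.5),
}

coefficients = {
    group: pearson_from_deviation_sums(*sums)
    for group, sums in deviation_sums.items()
}

for group, value in coefficients.items():
    print(group, round(value, 4))
```

To four decimal places the Group 1 value comes out near 0.9988, which the calculation above reports as 0.998; the Group 2 and Group 3 values should match 0.996 and 0.993 after rounding.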

    Analysis

    The values of Pearson’s correlation coefficient for the three groups are 0.998, 0.996 and 0.993 respectively. As the values are positive, the correlation is increasing in nature, i.e., as the total number of employees working or having worked in the nuclear power plant increases, the number of employees infected by cancer also increases. Moreover, each value is very close to one, which signifies that the correlation is very strong.


  • Evaluation of hypothesis

    The hypothesis has been evaluated with the help of the \(\chi^2\) test in this section of this mathematical exploration. The test decides whether the null hypothesis is retained or is rejected in favour of the alternate hypothesis.

    Processed data table

    Figure 17 - Table On Observed Data For Evaluation Of \(\chi^2\) Test

    Figure 18 - Table On Expected Data For Evaluation Of \(\chi^2\) Test

    Calculation of \(\chi^2\)

    | Observed Value (O) | Expected Value (E) | \(O-E\) | \((O-E)^2\) | \(\frac{(O-E)^2}{E}\) |
    |---|---|---|---|---|
    | 10.33 | 9.52391937 | 0.80608063 | 0.64976598 | 0.06822464 |
    | 11.08 | 11.3543268 | -0.2743268 | 0.07525519 | 0.00662789 |
    | 15.92 | 16.4517538 | -0.5317538 | 0.2827621 | 0.01718735 |
    | 10.66 | 9.99590573 | 0.66409427 | 0.4410212 | 0.04412018 |
    | 11.01 | 11.9170245 | -0.9070245 | 0.82269344 | 0.06903514 |
    | 17.51 | 17.2670698 | 0.2429302 | 0.05901508 | 0.00341778 |
    | 12.14 | 12.6415806 | -0.5015806 | 0.2515831 | 0.01990124 |
    | 15.3 | 15.0711732 | 0.2288268 | 0.0523617 | 0.0034743 |
    | 22.11 | 21.8372462 | 0.2727538 | 0.07439464 | 0.00340678 |
    | 15.74 | 17.8155717 | -2.0755717 | 4.30799788 | 0.24181081 |
    | 21.17 | 21.2395565 | -0.0695565 | 0.00483811 | 0.00022779 |
    | 32.92 | 30.7748719 | 2.1451281 | 4.60157457 | 0.14952376 |
    | 11.32 | 16.3154204 | -4.9954204 | 24.954225 | 1.5294871 |
    | 21.15 | 19.4510903 | 1.6989097 | 2.88629417 | 0.14838727 |
    | 31.48 | 28.1834893 | 3.2965107 | 10.8669828 | 0.38557975 |
    | 13.2 | 13.0268235 | 0.1731765 | 0.0299901 | 0.00230218 |
    | 18.52 | 15.5304561 | 2.9895439 | 8.93737273 | 0.57547394 |
    | 19.34 | 22.5027203 | -3.1627203 | 10.0027997 | 0.44451513 |
    | 14.2 | 15.7005625 | -1.5005625 | 2.25168782 | 0.14341447 |
    | 18.17 | 18.7180625 | -0.5480625 | 0.3003725 | 0.0160472 |
    | 29.17 | 27.121375 | 2.048625 | 4.19686439 | 0.15474379 |
    | 18.2 | 14.3943084 | 3.8056916 | 14.4832886 | 1.00618162 |
    | 16.6 | 17.1607586 | -0.5607586 | 0.31445021 | 0.01832379 |
    | 21.62 | 24.864933 | -3.244933 | 10.5295902 | 0.42347149 |
    | 19.3 | 17.9737509 | 1.3262491 | 1.75893668 | 0.09786141 |
    | 19.89 | 21.4281362 | -1.5381362 | 2.36586297 | 0.11040918 |
    | 31.26 | 31.0481129 | 0.2118871 | 0.04489614 | 0.00144602 |
    | 22.68 | 20.3821569 | 2.2978431 | 5.28008291 | 0.25905418 |
    | 23.28 | 24.2994152 | -1.0194152 | 1.03920735 | 0.04276676 |
    | 33.93 | 35.2084278 | -1.2784278 | 1.63437764 | 0.04642007 |

    Figure 19 - Table On Calculation Of \(\chi^2\)

    \(\displaystyle \sum \frac{(O-E)^2}{E}=0.068+0.007 +\dots+ 0.046= 6.032843\)

     

    \(\chi^2 = 6.032843\)

    Calculation of degree of freedom

    Degree of Freedom = (Column - 1) (Row - 1)

     

    = (3 - 1) (10-1) = 2 × 9 = 18
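The statistic and the degree of freedom can be recomputed from the observed values alone. The sketch below makes two assumptions that are not stated explicitly in this section: the observed values are arranged as 10 rows (power stations) by 3 columns (age groups), in the order in which they appear in the calculation table, and each expected value is the usual (row total × column total) / grand total. The first expected value obtained this way, 9.5239, agrees with the table.

```python
# Observed percentages: one row per power station, one column per age group,
# read off the "Observed Value (O)" column in the order listed (assumed layout).
observed = [
    [10.33, 11.08, 15.92],
    [10.66, 11.01, 17.51],
    [12.14, 15.30, 22.11],
    [15.74, 21.17, 32.92],
    [11.32, 21.15, 31.48],
    [13.20, 18.52, 19.34],
    [14.20, 18.17, 29.17],
    [18.20, 16.60, 21.62],
    [19.30, 19.89, 31.26],
    [22.68, 23.28, 33.93],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Accumulate (O - E)^2 / E over every cell of the table.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total  # expected value
        chi_square += (o - e) ** 2 / e

# Degree of freedom = (columns - 1) * (rows - 1)
degrees_of_freedom = (len(observed[0]) - 1) * (len(observed) - 1)

print(round(chi_square, 4), degrees_of_freedom)
```

Under these assumptions the statistic should agree with the value 6.032843 and the degree of freedom with 18, as calculated above.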


  • Evaluation

    Examining the value of \(\chi^2\) (6.032843) with respect to the degree of freedom (18) using the table shown in the Background Information section, it is concluded that the Null Hypothesis is rejected and the Alternate Hypothesis is accepted.

    Conclusion

    What is the relationship between the number of employees working in a Nuclear Power Station and the number of employees getting infected by cancer during the working period or after retirement for three different age groups – Gr 1: 50 years to 60 years, Gr 2: 60 years to 70 years and Gr 3: 70 years to 80 years?

     

    The relationship between the number of employees working or having worked in a Nuclear Power Plant and the number of those employees who are getting or got infected by cancer is direct, i.e., as the total number of employees increases, the number of employees infected by cancer also increases.

    • The equation of trendline for Group 1, i.e., the age group of 50 to 60 years, is: y = 0.2386x - 54.366.
    • The equation of trendline for Group 2, i.e., the age group of 60 to 70 years, is: y = 0.2401x - 38.697.
    • The equation of trendline for Group 3, i.e., the age group of 70 to 80 years, is: y = 0.352x - 50.422.
    • The value of regression coefficient for Group 1 is 0.997, which confirms the existence of the increasing correlation between the independent and the dependent variable.
    • The value of regression coefficient for Group 2 is 0.992, which confirms the existence of the increasing correlation between the independent and the dependent variable.
    • The value of regression coefficient for Group 3 is 0.987, which confirms the existence of the increasing correlation between the independent and the dependent variable.
    • The value of Pearson’s Correlation Coefficient for Group 1 is 0.998. The positive value signifies that the correlation is increasing (direct relation) in nature, and such a high value (close to 1) confirms the existence of the correlation.
    • The value of Pearson’s Correlation Coefficient for Group 2 is 0.996. The positive value signifies that the correlation is increasing (direct relation) in nature, and such a high value (close to 1) confirms the existence of the correlation.
    • The value of Pearson’s Correlation Coefficient for Group 3 is 0.993. The positive value signifies that the correlation is increasing (direct relation) in nature, and such a high value (close to 1) confirms the existence of the correlation.
    • The minimum percentage of employees getting infected by cancer in all the three groups is in Byron Nuclear Power Station, with values ranging between 10% and 16%.
    • The maximum percentage of employees getting infected by cancer in all the three groups is in Vogtle Nuclear Power Station, with values ranging between 22% and 34%.
    • The percentage of infected individuals is minimum in the first age group (50 years to 60 years). This may be because of the stronger immunity of the employees in this group. Another reason might be the advancement in radiation protection techniques, which protect employees of this generation more effectively than those of the others.
    • The percentage of infected individuals is maximum in the third age group (70 years to 80 years). This may be because of the weakened immunity of retired employees. Another reason might be the number of former employees who were still alive when the data was collected: as fewer retired employees remain, the percentage has increased.
    • Setting y = 0 in the trendline for Group 1 gives x ≈ 227, i.e., the trendline predicts no case of cancer if the total number of employees in the age group 50 to 60 years is 227.
    • Similarly, the trendline for Group 2 predicts no case of cancer if the total number of employees in the age group 60 to 70 years is 161.
    • Similarly, the trendline for Group 3 predicts no case of cancer if the total number of employees in the age group 70 to 80 years is 143.
    • The \(\chi^2\) test evaluates the hypothesis and concludes that the alternate hypothesis is true.
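The employee counts at which no case of cancer is predicted follow from setting y = 0 in each trendline, which gives x = -c / m for a line y = mx + c. A minimal sketch, assuming the three trendline equations listed above (the function name `zero_case_employee_count` is introduced here for illustration only):

```python
# Slope m and intercept c of each trendline y = m*x + c, as listed above.
trendlines = {
    "Group 1 (50 to 60 years)": (0.2386, -54.366),
    "Group 2 (60 to 70 years)": (0.2401, -38.697),
    "Group 3 (70 to 80 years)": (0.352, -50.422),
}


def zero_case_employee_count(slope, intercept):
    """x-intercept of the trendline: the x at which the predicted y is zero."""
    return -intercept / slope


zero_points = {
    group: zero_case_employee_count(m, c)
    for group, (m, c) in trendlines.items()
}

for group, x_zero in zero_points.items():
    # The conclusion quotes the whole-number part of each intercept.
    print(group, round(x_zero, 2), int(x_zero))
```

The whole-number parts of the three intercepts should correspond to the counts 227, 161 and 143 quoted in the conclusion.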

    Reflection

    In this investigation, several processes and mathematical tools have been used to find the correlation along with its strength. The choice of nuclear power plants is one of the most important strengths of this investigation, as it has provided a data sheet with accurate observations of employee count. In addition, internationally acclaimed newspapers have also contributed to the data. The use of two different coefficients, the regression coefficient and Pearson’s correlation coefficient, has provided the strength and nature of the correlation. Furthermore, the calculation of the percentage of employees infected with cancer has enabled the investigation to analyse the variation of cancer-infected employees (dependent variable) in the observed data sheet. Lastly, the use of the \(\chi^2\) test has provided the conclusion regarding the correlation.

     

    However, a few weaknesses have been observed during this mathematical investigation. The immunity of the human body is very uncertain and cannot be generalised. Moreover, cancer is a disease on which research is still going on, and there are many gaps and open questions, such as the causes of cancer, which govern the rate at which cancer spreads. As many variables other than the total employee count affect the dependent variable, the correlation study cannot be carried out efficiently. In order to perform an efficient correlative analysis on the research question, all of these parameters must be controlled or held constant.


  • Bibliography

    • What Is Cancer? - National Cancer Institute. 17 Sept. 2007, https://www.cancer.gov/about-cancer/understanding/what-is-cancer.
    • Risk Factors: Radiation - National Cancer Institute. 29 Apr. 2015, https://www.cancer.gov/about-cancer/causes-prevention/risk/radiation.
    • ‘UV Radiation’. The Skin Cancer Foundation, https://www.skincancer.org/risk-factors/uv-radiation/. Accessed 22 Nov. 2020.
    • Nuclear Power Plants - U.S. Energy Information Administration (EIA). https://www.eia.gov/energyexplained/nuclear/nuclear-power-plants.php. Accessed 22 Nov. 2020.
    • ‘Electromagnetic Radiation - Gamma Rays’. Encyclopedia Britannica, https://www.britannica.com/science/electromagnetic-radiation. Accessed 22 Nov. 2020.
    • Correlation. http://www.stat.yale.edu/Courses/1997-98/101/correl.htm. Accessed 22 Nov. 2020.
    • Data Analysis - Pearson’s Correlation Coefficient. http://learntech.uwe.ac.uk/da/default.aspx?pageid=1442. Accessed 22 Nov. 2020.
    • Chi Square Statistics. https://math.hws.edu/javamath/ryan/ChiSquare.html. Accessed 23 Nov. 2020.
    • Table: Chi-Square Probabilities. https://people.richland.edu/james/lecture/m170/tbl-chi.html. Accessed 23 Nov. 2020.
    • ‘Nuclear Workers May Face Higher Cancer Risk’. WebMD, https://www.webmd.com/cancer/news/20050628/nuclear-workers-may-face-higher-cancer-risk. Accessed 22 Nov. 2020.
    • Parthasarathy, K. S. ‘Is Working in a Nuclear Power Plant Risky?’ The Hindu, 1 Jan. 2014. www.thehindu.com, https://www.thehindu.com/sci-tech/science/is-working-in-a-nuclear-power-plant-risky/article5526497.ece.
    • Accidents at Nuclear Power Plants and Cancer Risk - National Cancer Institute. 19 Apr. 2011, https://www.cancer.gov/about-cancer/causes-prevention/risk/radiation/nuclear-accidents-fact-sheet.
    • Exelon. https://www.exeloncorp.com:443/locations/power-plants/byron-generating-station. Accessed 25 Nov. 2020.
    • Peach Bottom Atomic Power Station Receives Approval to Operate an Additional 20 Years | Transmission Intelligence Service. https://www.transmissionhub.com/articles/2020/03/peach-bottom-atomic-power-station-receives-approval-to-operate-an-additional-20-years.html. Accessed 25 Nov. 2020.
    • NRC: Oconee Nuclear Station, Unit 1. https://www.nrc.gov/info-finder/reactors/oco1.html. Accessed 25 Nov. 2020.
    • ‘Braidwood Generating Station | Braceville, Ill.’ Nuclear Powers IL, https://www.nuclearpowersillinois.com/braidwood_generating_station. Accessed 25 Nov. 2020.
    • NRC: South Texas Project, Unit 1. https://www.nrc.gov/info-finder/reactors/stp1.html. Accessed 25 Nov. 2020.
    • NRC: Susquehanna Steam Electric Station, Unit 1. https://www.nrc.gov/info-finder/reactors/susq1.html. Accessed 25 Nov. 2020.
    • Energy, Duke. ‘McGuire Nuclear Station Focuses on Operational Excellence and Community Outreach’. Duke Energy | Nuclear Information Center, https://nuclear.duke-energy.com/2013/06/25/mcguire-nuclear-station-focuses-on-operational-excellence-and-community-outreach. Accessed 25 Nov. 2020.
    • ‘Browns Ferry Nuclear Plant’. TVA.Com, https://www.tva.com/energy/our-power-system/nuclear/browns-ferry-nuclear-plant. Accessed 25 Nov. 2020.
    • ‘Aps – Arizona Public Service Electric’. Aps, https://www.aps.com/en/About/Our-Company/Clean-Energy/Nuclear-generation. Accessed 25 Nov. 2020.
    • ‘Vogtle 3 and 4’. Georgia Power, http://www.georgiapower.com/company/plant-vogtle.html. Accessed 25 Nov. 2020.