Mathematics AI SL
Sample Internal Assessment

Table of content
Rationale
Aim
Research question
Background information
Hypothesis
Data collection
Graphical analysis
Calculation of correlation coefficient for linear trendline
Evaluation of hypothesis
Conclusion
Reflection
Bibliography

To what extent is there a correlation between the strike rate of a batsman and the height of the batsman?

Candidate Name: N/A
Candidate Number: N/A
Session: N/A
Personal Code: N/A
Word count: 1,906


Rationale

I have loved sports since I was very young. When I was a child, my dad took me to the park every day to play cricket, and I think that is how my bond with sports grew stronger. Although I have grown up, I still take an active part in sports, and my studies have never been an excuse to skip a game. I abide by the saying "All work and no play makes Jack a dull boy".

I do not play only for recreation; I follow sports religiously. I am in my school's cricket team, I take coaching classes and I practise even after school. I love playing cricket, and I love being a batsman and the captain of a team.

Recently, it was announced that players would be selected for the interschool cricket tournament. I was very excited and wanted to seize the opportunity. While searching the internet for tips, I read various resources and learned many interesting facts, but a statement that height is a factor in selection caught my eye.

As the captain of my school team, I was partly responsible for selecting the players. I looked everywhere for confirmation but could not find a satisfactory answer. I could not decide on the players, because I felt that their heights should not overshadow their performances. Their hard work was at stake, as well as the name of the school.

Weighed down by these worries, I decided to research the question myself, and this IA is the result. In this IA, I try to find out whether the height of a batsman determines his strike rate. I also try to find out how far the height of a batsman acts as a deciding factor in the result of a cricket match. This research will help me make up my mind when selecting players for the competition.

Aim

The main aim of this IA is to study whether or not there is a correlation between the strike rate of batsmen and their height in the game of cricket. Furthermore, this IA gives a brief account of the advantage or disadvantage that a batsman has, purely because of his height, in scoring runs at a faster rate, i.e., in his strike rate. This exploration could help team management and selection committees when signing contracts with players.

Research question

What is the relationship between the strike rate of a batsman and the height of the batsman?

Background information

What is strike rate

Strike rate is one of the most important parameters for measuring the performance of a batsman in the game of cricket. It expresses how many runs the batsman has scored relative to the number of balls he has faced. The formula for calculating the strike rate is shown below:

 

\(Strike\ Rate=\frac{Runs\ Scored}{Number\ of\ balls\ played}\times100\)
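The formula above can be sketched as a small helper function. The function name `strike_rate` and the example innings (45 runs off 30 balls) are illustrative assumptions and are not taken from the data set.

```python
def strike_rate(runs_scored: float, balls_faced: int) -> float:
    """Return the strike rate: runs scored per 100 balls faced."""
    if not balls_faced > 0:
        raise ValueError("balls_faced must be a positive number")
    return runs_scored / balls_faced * 100


# Hypothetical innings: 45 runs off 30 balls.
print(strike_rate(45, 30))  # 150.0
```

A strike rate above 100 therefore means the batsman scores more than one run per ball on average.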

Physical benefits in athletics – height

Height can be an advantage for players in several games. For example, in games such as football and basketball, taller players often have a better chance of performing well than comparatively shorter players.

In the game of cricket, taller batsmen could have a better chance of playing short balls, which would allow them to score runs even off difficult deliveries.

Regression correlation coefficient

The regression correlation coefficient is a tool for measuring the strength of the correlation between the independent variable and the dependent variable. The set of values \((x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\) is used to find the value of r with the formula below:

\(r=\frac{n(\sum xy)-(\sum x)(\sum y)}{\sqrt{[n\sum x^2-(\sum x)^2][n\sum y^2-(\sum y)^2]}}\)

In this formula, x is the value of the independent variable for each observation, y is the value of the dependent variable for each observation, xy is the product of the independent and the dependent variable for each observation, n is the number of observations and \(\sum\) denotes the sum over all observations of the quantity that follows it.

Squaring the value of r gives the regression coefficient \(r^2\). The value of \(r^2\) lies between 0 and 1, where 1 signifies the strongest possible correlation and 0 signifies no correlation.

Pearson’s correlation coefficient

Pearson’s correlation coefficient is a tool for measuring both the strength and the nature of the correlation between the independent variable and the dependent variable. The set of values \((x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\) is used to find the value of \(\mathfrak{R}\) with the formula below:

\(\mathfrak{R}=\frac{\sum (x-\bar x)(y-\bar y)}{\sqrt{\sum(x-\bar x)^2\times\sum (y-\bar y)^2}}\)

In this formula, x is the value of the independent variable for each observation, y is the value of the dependent variable for each observation, \(\bar x\) is the arithmetic mean of all the observations of the independent variable, \(\bar y\) is the arithmetic mean of all the observations of the dependent variable and \(\sum\) denotes the sum over all observations of the quantity that follows it. The value of \(\mathfrak{R}\) lies between -1 and 1. A positive value of Pearson’s correlation coefficient implies a direct relationship between the independent and the dependent variable, whereas a negative value implies an inverse relationship between them. If the value of the correlation coefficient is close to 1 or -1, the correlation is strong. On the other hand, if the value of the correlation coefficient is close to 0, there is little or no linear correlation.
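A minimal sketch of the formula above in Python is given here. The function name `pearson_r` and the two small sample lists are assumptions made purely for illustration; they are not part of the collected data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient from deviations about the means."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# Made-up values: strike rate rises, then falls, in step with height.
heights = [160, 170, 180, 190]
print(pearson_r(heights, [90, 95, 100, 105]))  # perfectly direct relationship
print(pearson_r(heights, [105, 100, 95, 90]))  # perfectly inverse relationship
```

The first call returns a value of 1 and the second a value of -1, the two extremes of the range described above.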

T – test

The T-test is a kind of analysis that is used here to assess whether a correlation exists between an independent variable and a dependent variable. First, the T value of the given data set is calculated. Then, depending on the type of data (for example, paired data or independent data), the T value is looked up in the T table, which indicates whether a correlation exists. The formula for the T value is given below:

\(T\ value=\frac{|\bar x-\bar y|}{\sqrt{\frac{v_x^2}{n_x}+\frac{v_y^2}{n_y}}}\)

Here, \(\bar x\) is the arithmetic mean of all the observations of the independent variable, \(\bar y\) is the arithmetic mean of all the observations of the dependent variable, \(v_x^2\) is the variance of the independent variable, \(v_y^2\) is the variance of the dependent variable, \(n_x\) is the number of observations of the independent variable and \(n_y\) is the number of observations of the dependent variable.

The T value is then checked against the T table, which indicates whether a correlation exists. The T table is shown below:

Figure 1 - Table On T – table

Hypothesis

Null hypothesis

It is assumed that there is no correlation between the strike rate of a batsman and the height of the batsman.

Alternate hypothesis

It is assumed that there is a correlation between the strike rate of a batsman and the height of the batsman.

Data collection

Source of data

The strike rates of different batsmen, together with their heights, have been collected from a very recently organised cricket tournament, the Indian Premier League 2020. The Indian Premier League, abbreviated as IPL T20, is a domestic cricket tournament organised by the BCCI (Board of Control for Cricket in India). Eight teams, each representing a particular city or state in India, compete in a tournament lasting two to three months, in which players from across the globe are signed on contract and assigned to a team. As each match is a twenty-over match, the tournament is often referred to as a T20 series.

Justification on selecting the source as IPL T20

IPL T20 has been selected as the source of data for several reasons. Firstly, although the IPL is a domestic tournament organised by the BCCI, it brings together players from across the globe. This allows the data set to contain more general observations rather than observations specific to a single country. Secondly, IPL T20 is one of the most recently organised tournaments, so the data set reflects the current style of playing cricket. Thirdly, the IPL is a twenty-over game. A twenty-over game demands that runs are scored off fewer balls. As a result, the strike rates of batsmen in this tournament will be higher than in other formats. Higher observed values make it easier to detect a correlation than smaller observed values.

Raw data table

| Sl. No | Batsman | Height (cm) | Strike rate |
|---|---|---|---|
| 1 | Shakib al Hassan | 155 | 82.05 |
| 2 | Mushfiqur Rahim | 160 | 92.67 |
| 3 | Rashid Khan | 168 | 100.96 |
| 4 | Kusal Perera | 168 | 110.97 |
| 5 | Rishabh Pant | 170 | 89.23 |
| 6 | David Warner | 170 | 89.36 |
| 7 | JP Duminy | 170 | 97.22 |
| 8 | Rohit Sharma | 170 | 98.33 |
| 9 | Kane Williamson | 173 | 99.8 |
| 10 | Nicholas Pooran | 173 | 100.27 |
| 11 | Mosaddek Hossain | 174 | 106.36 |
| 12 | MS Dhoni | 175 | 87.78 |
| 13 | Mohammed Hafeez | 175 | 88.77 |
| 14 | Virat Kohli | 175 | 94.04 |
| 15 | Liton Das | 175 | 110.17 |
| 16 | Eoin Morgan | 175 | 111.07 |
| 17 | Aaron Finch | 176 | 102.21 |
| 18 | Usman Khawaja | 177 | 88.26 |
| 19 | Jonny Bairstow | 178 | 92.84 |
| 20 | Colin Munro | 178 | 97.65 |
| 21 | Shimron Hetmyer | 178 | 101.58 |
| 22 | Mohammad Saifuddin | 179 | 120.83 |
| 23 | Najibullah Zadran | 180 | 88.8 |
| 24 | Mahmudullah | 180 | 89.75 |
| 25 | Haris Sohail | 180 | 94.28 |
| 26 | Shikhar Dhawan | 180 | 103.3 |
| 27 | Jos Buttler | 180 | 122.83 |
| 28 | Avishka Fernando | 181 | 105.72 |
| 29 | Jason Roy | 182 | 115.36 |
| 30 | Glen Maxwell | 182 | 150 |
| 31 | Alex Carey | 182 | 104.45 |
| 32 | Joe Root | 183 | 89.53 |
| 33 | Hazratullah Zazai | 183 | 94.11 |
| 34 | Colin de Grandhomme | 183 | 100.52 |
| 35 | Soumya Sarkar | 183 | 101.21 |
| 36 | Hardik Pandya | 183 | 112.43 |
| 37 | Chris Woakes | 185 | 89.93 |
| 38 | Ben Stokes | 185 | 93.18 |
| 39 | Thisara Perera | 185 | 95.31 |
| 40 | Wahab Riaz | 185 | 127.53 |
| 41 | Imad Wasim | 187 | 118.24 |
| 42 | Chris Gayle | 188 | 88.32 |
| 43 | Rassie van der Dussen | 188 | 90.37 |
| 44 | Martin Guptill | 188 | 143.13 |
| 45 | David Miller | 191 | 117.94 |
| 46 | Nathan Coulter-Nile | 191 | 136.11 |
| 47 | Carlos Brathwaite | 193 | 106.2 |
| 48 | Chris Morris | 196 | 121.31 |
| 49 | Mitchell Stark | 197 | 89.47 |
| 50 | Jason Holder | 201 | 108.97 |

Figure 2 - Table On Strike Rate Of 50 Batsman Along With Their Height (In Cm)

Processed data table

Figure 3 - Table On Processed Data Table For Strike Rate Of 50 Batsman Along With Their Height (In Cm)

Sample calculation

Mean = \(\frac{y_1+y_2+...+y_n}{n}\)

Arithmetic Mean = \(\frac{82.05+92.67+100.96+...+89.47+108.97}{50}\) = 103.2144

Standard Deviation = \(\sqrt{\frac{(\bar y-y_1)^2+(\bar y-y_2)^2+...+(\bar y-y_n)^2}{n-1}}\)

Standard Deviation = \(\sqrt{\frac{(103.2144-82.05)^2+(103.2144-92.67)^2+...+(103.2144-108.97)^2}{49}}\) = 14.967

Processed data table analysis

The mean strike rate of all the batsmen is 103.2144, and the standard deviation is 14.967. This relatively high standard deviation means that the strike rates are spread over a wide range around the mean. As a result, it can be said that the strike rate varies greatly from one player to another.
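The two summary statistics can be checked with Python's standard `statistics` module. This is a minimal sketch: the list below is copied from the strike rate column of the raw data table, and `stdev` is the sample standard deviation (divisor n - 1), which is an assumption about the method used; it is the version that agrees with the reported value of about 14.97.

```python
from statistics import mean, stdev

# Strike rates of the 50 batsmen, in the order of the raw data table.
strike_rates = [
    82.05, 92.67, 100.96, 110.97, 89.23, 89.36, 97.22, 98.33, 99.8, 100.27,
    106.36, 87.78, 88.77, 94.04, 110.17, 111.07, 102.21, 88.26, 92.84, 97.65,
    101.58, 120.83, 88.8, 89.75, 94.28, 103.3, 122.83, 105.72, 115.36, 150,
    104.45, 89.53, 94.11, 100.52, 101.21, 112.43, 89.93, 93.18, 95.31, 127.53,
    118.24, 88.32, 90.37, 143.13, 117.94, 136.11, 106.2, 121.31, 89.47, 108.97,
]

avg = mean(strike_rates)
sd = stdev(strike_rates)  # sample standard deviation, divisor n - 1

print(f"mean = {avg:.4f}, standard deviation = {sd:.3f}")
```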

Graphical analysis

Linear correlation

Figure 4 - Linear Correlation Between Strike Rate And Height Of Batsman

Polynomial correlation

Figure 5 - Polynomial Correlation Between Strike Rate And Height Of Batsman

Choice of axes

The X-axis of the graph denotes the height of the batsman, measured in centimetres (independent variable).

 

The Y – Axis of the graph denotes the strike rate of the batsman (dependent variable).

Trendline for linear correlation

In this graph, a linear trendline has been fitted to the data collected from the players' most recent performances in IPL 2020. The equation of the trendline is shown below:

y = 0.5913x - 3.1403

From the graph, it can be stated that there is a positive correlation between the strike rate and the height of the batsmen: the strike rate tends to increase with height. However, many outliers are visible in the graph.
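The coefficients of the linear trendline can be reproduced with the usual least-squares formulas. The sketch below assumes that the trendline is an ordinary least-squares fit and uses the column totals reported in Figure 6 (n = 50, Σx = 8994, Σy = 5160.72, Σx² = 1621646, Σxy = 930560.2); the helper `predict` is a hypothetical name introduced only for illustration.

```python
# Column totals as reported in the processed data table (Figure 6).
n = 50
sum_x = 8994          # sum of heights (cm)
sum_y = 5160.72       # sum of strike rates
sum_x2 = 1621646      # sum of squared heights
sum_xy = 930560.2     # sum of height * strike rate

# Ordinary least-squares slope and intercept.
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
intercept = sum_y / n - slope * (sum_x / n)


def predict(height_cm: float) -> float:
    """Strike rate given by the fitted line for a height in cm."""
    return slope * height_cm + intercept


print(f"y = {slope:.4f}x + ({intercept:.4f})")
print(f"trendline value at 180 cm: {predict(180):.2f}")
```

Under these assumptions the slope and intercept agree with the trendline equation quoted above to about three decimal places.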

Trendline for polynomial correlation

In this graph, a polynomial trendline has been fitted to the data collected from the players' most recent performances in IPL 2020. The equation of the trendline is shown below:

\(y = -0.0089x^2 + 3.7722x - 287.37\)

From the graph, it can be stated that the strike rate increases with the height of the batsmen over the observed range. However, the slope of the curve decreases as height increases, which implies that beyond a certain height the strike rate will start to decrease.

Outliers

There are many outliers in the height range of 170 cm to 190 cm. This may be because several other parameters give a batsman a partial advantage or disadvantage in cricket. For example, if a bowler is at the top of his form and dismisses a batsman through bowling skill, this significantly affects the correlation study. Other factors responsible for the high number of outliers include the current form of the batsman, the pitch conditions and the weather conditions. All of these factors directly affect the performance of a batsman, which in turn affects the correlation study. Owing to the high number of outliers, the value of the regression coefficient is 0.12. Such a small value (close to zero) indicates that any linear correlation between the dependent and the independent variable is very weak.
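One simple way to make the notion of an outlier concrete is to flag strike rates that lie far from the mean. The sketch below is not the method used in this investigation: the cut-off of two sample standard deviations (`Z_THRESHOLD`) is an assumed rule of thumb, and the list is copied from the strike rate column of the raw data table.

```python
from statistics import mean, stdev

# Strike rates of the 50 batsmen, in the order of the raw data table.
strike_rates = [
    82.05, 92.67, 100.96, 110.97, 89.23, 89.36, 97.22, 98.33, 99.8, 100.27,
    106.36, 87.78, 88.77, 94.04, 110.17, 111.07, 102.21, 88.26, 92.84, 97.65,
    101.58, 120.83, 88.8, 89.75, 94.28, 103.3, 122.83, 105.72, 115.36, 150,
    104.45, 89.53, 94.11, 100.52, 101.21, 112.43, 89.93, 93.18, 95.31, 127.53,
    118.24, 88.32, 90.37, 143.13, 117.94, 136.11, 106.2, 121.31, 89.47, 108.97,
]

avg = mean(strike_rates)
sd = stdev(strike_rates)

# Assumed rule of thumb: more than 2 standard deviations from the mean.
Z_THRESHOLD = 2.0
flagged = [y for y in strike_rates if abs(y - avg) / sd > Z_THRESHOLD]

print(flagged)  # [150, 143.13, 136.11]
```

With this cut-off only unusually high strike rates are flagged; no batsman lies more than two standard deviations below the mean.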

Intercept for linear correlation

The Y-intercept of the graph can be studied to comment on the existence of the linear correlation. The Y-intercept is calculated from the equation of the trendline:

y = 0.5913x − 3.1403

The value of y for x = 0 will be:

y = 0.5913 × 0 − 3.1403

=> y = −3.1403

The value of the Y-intercept is −3.1403. A negative intercept is absurd in this context, because it means that for a height of zero centimetres the strike rate would be −3.1403. From the formula for strike rate given in the Background information section, the strike rate cannot be negative. This supports the view that the relationship between the strike rate and the height of a batsman should not be linear.

Calculation of maxima - minima for polynomial correlation

From the equation of the polynomial trendline, the height at which the strike rate reaches its maximum can be found.

\(y = -0.0089x^2 + 3.7722x - 287.37\)

Differentiating both sides with respect to x, we get,

\(\frac{dy}{dx}=-\frac{d(0.0089x^2)}{dx}+\frac{d(3.7722x)}{dx}-\frac{d(287.37)}{dx}\)

\(=>\frac{dy}{dx} = -0.0178x + 3.7722 - 0\)

\(=>\frac{dy}{dx} = -0.0178x + 3.7722\)

Differentiating both sides with respect to x again, we get,

\(\frac{d^2y}{dx^2}=-\frac{d(0.0178x)}{dx}+\frac{d(3.7722)}{dx}\)

\(=>\frac{d^2y}{dx^2} = -0.0178 + 0\)

\(=>\frac{d^2y}{dx^2} = -0.0178\)

As the value of \(\frac{d^2y}{dx^2}\) is negative, the stationary point of the curve is a maximum. It is found by setting \(\frac{dy}{dx} = 0\):

\(=> -0.0178x + 3.7722 = 0\)

\(=> -0.0178x = -3.7722\)

\(=> x=\frac{-3.7722}{-0.0178}\)

\(=> x = 211.92\)

Thus, the maximum of the polynomial trendline occurs at x = 211.92 cm. According to the polynomial correlation, a batsman with a height of 211.92 cm would have the maximum strike rate.
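The turning point can be checked numerically. The sketch below takes the three rounded coefficients of the polynomial trendline as given and uses the vertex formula x = -b / (2a), which is equivalent to setting the first derivative to zero; the names `a`, `b`, `c` and `trend` are introduced only for illustration.

```python
# Coefficients of the polynomial trendline y = a*x^2 + b*x + c (as rounded in the text).
a, b, c = -0.0089, 3.7722, -287.37


def trend(x: float) -> float:
    """Strike rate given by the polynomial trendline for a height x in cm."""
    return a * x ** 2 + b * x + c


# dy/dx = 2*a*x + b = 0  =>  x = -b / (2*a); a is negative, so this is a maximum.
x_max = -b / (2 * a)
y_max = trend(x_max)

print(round(x_max, 2))  # 211.92
print(f"trendline value at the maximum: {y_max:.2f}")
```

Because the coefficients are rounded and the tallest batsman in the data set is 201 cm, the turning point lies outside the observed range and should be read as an extrapolation of the fitted curve.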

Calculation of correlation coefficient for linear trendline

Calculation of regression correlation coefficient

The processed data table has five headers: x, y, \(x^2\), \(y^2\) and xy. The height of the batsman is represented by x and the strike rate of the batsman is represented by y. The remaining headers have their usual meaning. The calculation of the \(R^2\) correlation coefficient is shown in order to explore the efficiency and stability of the trendline and the correlation.

| x | y | \(x^2\) | \(y^2\) | xy |
|---|---|---|---|---|
| 155 | 82.05 | 24025 | 6732.2025 | 12717.75 |
| 160 | 92.67 | 25600 | 8587.7289 | 14827.2 |
| 168 | 100.96 | 28224 | 10192.9216 | 16961.28 |
| 168 | 110.97 | 28224 | 12314.3409 | 18642.96 |
| 170 | 89.23 | 28900 | 7961.9929 | 15169.1 |
| 170 | 89.36 | 28900 | 7985.2096 | 15191.2 |
| 170 | 97.22 | 28900 | 9451.7284 | 16527.4 |
| 170 | 98.33 | 28900 | 9668.7889 | 16716.1 |
| 173 | 99.8 | 29929 | 9960.04 | 17265.4 |
| 173 | 100.27 | 29929 | 10054.0729 | 17346.71 |
| 174 | 106.36 | 30276 | 11312.4496 | 18506.64 |
| 175 | 87.78 | 30625 | 7705.3284 | 15361.5 |
| 175 | 88.77 | 30625 | 7880.1129 | 15534.75 |
| 175 | 94.04 | 30625 | 8843.5216 | 16457 |
| 175 | 110.17 | 30625 | 12137.4289 | 19279.75 |
| 175 | 111.07 | 30625 | 12336.5449 | 19437.25 |
| 176 | 102.21 | 30976 | 10446.8841 | 17988.96 |
| 177 | 88.26 | 31329 | 7789.8276 | 15622.02 |
| 178 | 92.84 | 31684 | 8619.2656 | 16525.52 |
| 178 | 97.65 | 31684 | 9535.5225 | 17381.7 |
| 178 | 101.58 | 31684 | 10318.4964 | 18081.24 |
| 179 | 120.83 | 32041 | 14599.8889 | 21628.57 |
| 180 | 88.8 | 32400 | 7885.44 | 15984 |
| 180 | 89.75 | 32400 | 8055.0625 | 16155 |
| 180 | 94.28 | 32400 | 8888.7184 | 16970.4 |
| 180 | 103.3 | 32400 | 10670.89 | 18594 |
| 180 | 122.83 | 32400 | 15087.2089 | 22109.4 |
| 181 | 105.72 | 32761 | 11176.7184 | 19135.32 |
| 182 | 115.36 | 33124 | 13307.9296 | 20995.52 |
| 182 | 150 | 33124 | 22500 | 27300 |
| 182 | 104.45 | 33124 | 10909.8025 | 19009.9 |
| 183 | 89.53 | 33489 | 8015.6209 | 16383.99 |
| 183 | 94.11 | 33489 | 8856.6921 | 17222.13 |
| 183 | 100.52 | 33489 | 10104.2704 | 18395.16 |
| 183 | 101.21 | 33489 | 10243.4641 | 18521.43 |
| 183 | 112.43 | 33489 | 12640.5049 | 20574.69 |
| 185 | 89.93 | 34225 | 8087.4049 | 16637.05 |
| 185 | 93.18 | 34225 | 8682.5124 | 17238.3 |
| 185 | 95.31 | 34225 | 9083.9961 | 17632.35 |
| 185 | 127.53 | 34225 | 16263.9009 | 23593.05 |
| 187 | 118.24 | 34969 | 13980.6976 | 22110.88 |
| 188 | 88.32 | 35344 | 7800.4224 | 16604.16 |
| 188 | 90.37 | 35344 | 8166.7369 | 16989.56 |
| 188 | 143.13 | 35344 | 20486.1969 | 26908.44 |
| 191 | 117.94 | 36481 | 13909.8436 | 22526.54 |
| 191 | 136.11 | 36481 | 18525.9321 | 25997.01 |
| 193 | 106.2 | 37249 | 11278.44 | 20496.6 |
| 196 | 121.31 | 38416 | 14716.1161 | 23776.76 |
| 197 | 89.47 | 38809 | 8004.8809 | 17625.59 |
| 201 | 108.97 | 40401 | 11874.4609 | 21902.97 |
| \(\sum x\) = 8994 | \(\sum y\) = 5160.72 | \(\sum x^2\) = 1621646 | \(\sum y^2\) = 543638.162 | \(\sum xy\) = 930560.2 |

Figure 6 - Table On Processed Data For Calculation Of R2

The formula for the regression coefficient given in the background information has been used to find the correlation coefficient. Here, x is the value of the independent variable for each observation, y is the value of the dependent variable for each observation, xy is the product of the independent and the dependent variable for each observation, n is the number of observations and \(\sum\) denotes the sum over all observations of the quantity that follows it.

Calculation:

\(r =\frac{n(\sum xy)-(\sum x)(\sum y)}{\sqrt{[n\sum x^2-(\sum x)^2][n\sum y^2-(\sum y)^2]}}\)

\(=>r =\frac{50(930560.2) - (8994)(5160.72)}{\sqrt{[50 \times 1621646 - (8994)^2][50 \times 543638.162 - (5160.72)^2]}}\)

=> r = 0.348

=> \(r^2\) = 0.1212
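The substitution above can be verified with a few lines of Python. This sketch simply plugs the five column totals from Figure 6 into the formula; the variable names are introduced here for illustration.

```python
from math import sqrt

# Column totals from the processed data table (Figure 6).
n = 50
sum_x, sum_y = 8994, 5160.72
sum_x2, sum_y2 = 1621646, 543638.162
sum_xy = 930560.2

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))

r = numerator / denominator
r_squared = r ** 2

print(f"r = {r:.3f}, r^2 = {r_squared:.4f}")
```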

Analysis

The value of the regression coefficient is 0.12. Such a small value (close to zero) indicates that any linear correlation between the dependent and the independent variable is very weak.

Calculation of Pearson’s correlation coefficient

The processed data table for the calculation of Pearson’s correlation coefficient has seven headers: x, y, \(x-\bar x\), \(y-\bar y\), \((x-\bar x)(y-\bar y)\), \((x-\bar x)^2\) and \((y-\bar y)^2\). The height of the batsman is represented by x and the strike rate of the batsman is represented by y; \(\bar x\) is the arithmetic mean of all the observations of the height of the batsmen and \(\bar y\) is the arithmetic mean of all the observations of the strike rate of the batsmen. The remaining headers have their usual meaning. The calculation of Pearson’s correlation coefficient is shown in order to explore the efficiency and stability of the trendline and the correlation.

| x | y | \(x-\bar x\) | \(y-\bar y\) | \((x-\bar x)(y-\bar y)\) | \((x-\bar x)^2\) | \((y-\bar y)^2\) |
|---|---|---|---|---|---|---|
| 155 | 82.05 | -24.88 | -21.1644 | 526.570272 | 619.0144 | 447.931827 |
| 160 | 92.67 | -19.88 | -10.5444 | 209.622672 | 395.2144 | 111.184371 |
| 168 | 100.96 | -11.88 | -2.2544 | 26.782272 | 141.1344 | 5.08231936 |
| 168 | 110.97 | -11.88 | 7.7556 | -92.136528 | 141.1344 | 60.1493314 |
| 170 | 89.23 | -9.88 | -13.9844 | 138.165872 | 97.6144 | 195.563443 |
| 170 | 89.36 | -9.88 | -13.8544 | 136.881472 | 97.6144 | 191.944399 |
| 170 | 97.22 | -9.88 | -5.9944 | 59.224672 | 97.6144 | 35.9328314 |
| 170 | 98.33 | -9.88 | -4.8844 | 48.257872 | 97.6144 | 23.8573634 |
| 173 | 99.8 | -6.88 | -3.4144 | 23.491072 | 47.3344 | 11.6581274 |
| 173 | 100.27 | -6.88 | -2.9444 | 20.257472 | 47.3344 | 8.66949136 |
| 174 | 106.36 | -5.88 | 3.1456 | -18.496128 | 34.5744 | 9.89479936 |
| 175 | 87.78 | -4.88 | -15.4344 | 75.319872 | 23.8144 | 238.220703 |
| 175 | 88.77 | -4.88 | -14.4444 | 70.488672 | 23.8144 | 208.640691 |
| 175 | 94.04 | -4.88 | -9.1744 | 44.771072 | 23.8144 | 84.1696154 |
| 175 | 110.17 | -4.88 | 6.9556 | -33.943328 | 23.8144 | 48.3803714 |
| 175 | 111.07 | -4.88 | 7.8556 | -38.335328 | 23.8144 | 61.7104514 |
| 176 | 102.21 | -3.88 | -1.0044 | 3.897072 | 15.0544 | 1.00881936 |
| 177 | 88.26 | -2.88 | -14.9544 | 43.068672 | 8.2944 | 223.634079 |
| 178 | 92.84 | -1.88 | -10.3744 | 19.503872 | 3.5344 | 107.628175 |
| 178 | 97.65 | -1.88 | -5.5644 | 10.461072 | 3.5344 | 30.9625474 |
| 178 | 101.58 | -1.88 | -1.6344 | 3.072672 | 3.5344 | 2.67126336 |
| 179 | 120.83 | -0.88 | 17.6156 | -15.501728 | 0.7744 | 310.309363 |
| 180 | 88.8 | 0.12 | -14.4144 | -1.729728 | 0.0144 | 207.774927 |
| 180 | 89.75 | 0.12 | -13.4644 | -1.615728 | 0.0144 | 181.290067 |
| 180 | 94.28 | 0.12 | -8.9344 | -1.072128 | 0.0144 | 79.8235034 |
| 180 | 103.3 | 0.12 | 0.0856 | 0.010272 | 0.0144 | 0.00732736 |
| 180 | 122.83 | 0.12 | 19.6156 | 2.353872 | 0.0144 | 384.771763 |
| 181 | 105.72 | 1.12 | 2.5056 | 2.806272 | 1.2544 | 6.27803136 |
| 182 | 115.36 | 2.12 | 12.1456 | 25.748672 | 4.4944 | 147.515599 |
| 182 | 150 | 2.12 | 46.7856 | 99.185472 | 4.4944 | 2188.89237 |
| 182 | 104.45 | 2.12 | 1.2356 | 2.619472 | 4.4944 | 1.52670736 |
| 183 | 89.53 | 3.12 | -13.6844 | -42.695328 | 9.7344 | 187.262803 |
| 183 | 94.11 | 3.12 | -9.1044 | -28.405728 | 9.7344 | 82.8900994 |
| 183 | 100.52 | 3.12 | -2.6944 | -8.406528 | 9.7344 | 7.25979136 |
| 183 | 101.21 | 3.12 | -2.0044 | -6.253728 | 9.7344 | 4.01761936 |
| 183 | 112.43 | 3.12 | 9.2156 | 28.752672 | 9.7344 | 84.9272834 |
| 185 | 89.93 | 5.12 | -13.2844 | -68.016128 | 26.2144 | 176.475283 |
| 185 | 93.18 | 5.12 | -10.0344 | -51.376128 | 26.2144 | 100.689183 |
| 185 | 95.31 | 5.12 | -7.9044 | -40.470528 | 26.2144 | 62.4795394 |
| 185 | 127.53 | 5.12 | 24.3156 | 124.495872 | 26.2144 | 591.248403 |
| 187 | 118.24 | 7.12 | 15.0256 | 106.982272 | 50.6944 | 225.768655 |
| 188 | 88.32 | 8.12 | -14.8944 | -120.94253 | 65.9344 | 221.843151 |
| 188 | 90.37 | 8.12 | -12.8444 | -104.29653 | 65.9344 | 164.978611 |
| 188 | 143.13 | 8.12 | 39.9156 | 324.114672 | 65.9344 | 1593.25512 |
| 191 | 117.94 | 11.12 | 14.7256 | 163.748672 | 123.6544 | 216.843295 |
| 191 | 136.11 | 11.12 | 32.8956 | 365.799072 | 123.6544 | 1082.1205 |
| 193 | 106.2 | 13.12 | 2.9856 | 39.171072 | 172.1344 | 8.91380736 |
| 196 | 121.31 | 16.12 | 18.0956 | 291.701072 | 259.8544 | 327.450739 |
| 197 | 89.47 | 17.12 | -13.7444 | -235.30413 | 293.0944 | 188.908531 |
| 201 | 108.97 | 21.12 | 5.7556 | 121.558272 | 446.0544 | 33.1269314 |

Figure 7 - Table On Processed Data Table For Calculation Of Pearson’s Correlation Coefficient In Graph 1

The formula for Pearson’s correlation coefficient given in the background information has been used to find the correlation coefficient. Here, x is the value of the independent variable for each observation, y is the value of the dependent variable for each observation, \(\bar x\) is the arithmetic mean of all the observations of the independent variable, \(\bar y\) is the arithmetic mean of all the observations of the dependent variable and \(\sum\) denotes the sum over all observations of the quantity that follows it.

 

Calculation:

 

\(\bar x=\frac{∑x}{N}=\frac{8994}{50}\) = 179.88

 

\(\bar y=\frac{∑y}{N}=\frac{5160.72}{50}\) = 103.2144

 

\(∑(x-\bar x)(y-\bar y)= 2249.8864\)

 

\(∑(x-\bar x)^2= 3805.28\)

 

\(∑(y-\bar y)^2= 10977.544\)

 

Let, the Pearson’s Correlation Coefficient be \(\mathfrak{R}\).

 

\(\mathfrak{R}=\frac{∑(x-\bar x)(y-\bar y)}{\sqrt{∑(x-\bar x)^2\times∑(y-\bar y)^2}}\)

 

\(\mathfrak{R}=\frac{2249.8864}{\sqrt{3805.28 × 10977.544}}\) = 0.3481

 

\(\mathfrak{R}=0.348\)
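The value obtained here equals the value of r found earlier from the raw sums, and the short worked check below suggests why: each bracket of the raw-sum formula is n times the corresponding sum of deviations, so the factors of n cancel. All numbers are the totals already reported in this section and in Figure 6; only the arrangement is new.

```latex
\[
\begin{aligned}
n\sum xy-\sum x\sum y &= 50(930560.2)-(8994)(5160.72) = 112494.32 = 50\times 2249.8864 \\
n\sum x^2-\Bigl(\sum x\Bigr)^2 &= 50(1621646)-(8994)^2 = 190264 = 50\times 3805.28 \\
n\sum y^2-\Bigl(\sum y\Bigr)^2 &= 50(543638.162)-(5160.72)^2 \approx 548877.18 \approx 50\times 10977.544 \\
r &= \frac{50\times 2249.8864}{\sqrt{(50\times 3805.28)(50\times 10977.544)}}
   = \frac{2249.8864}{\sqrt{3805.28\times 10977.544}} = \mathfrak{R} \approx 0.348
\end{aligned}
\]
```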

Analysis

The value of Pearson’s correlation coefficient is 0.348. As it is a positive value, the correlation is increasing in nature, i.e., as the height of the batsman increases, the strike rate also increases. This might be because taller batsmen have an advantage in playing short-pitched balls, which allows them to score runs off a wider range of deliveries. However, the value of Pearson’s correlation coefficient is much closer to 0 than to 1, which signifies that the correlation is weak.

Evaluation of hypothesis

Processed data table

There are two headers of the processed data table expressed as x, and y. The height of the batsman is represented by x and the strike rate of batsman is represented by y.

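| x | y |
|---|---|
| 155 | 82.05 |
| 160 | 92.67 |
| 168 | 100.96 |
| 168 | 110.97 |
| 170 | 89.23 |
| 170 | 89.36 |
| 170 | 97.22 |
| 170 | 98.33 |
| 173 | 99.8 |
| 173 | 100.27 |
| 174 | 106.36 |
| 175 | 87.78 |
| 175 | 88.77 |
| 175 | 94.04 |
| 175 | 110.17 |
| 175 | 111.07 |
| 176 | 102.21 |
| 177 | 88.26 |
| 178 | 92.84 |
| 178 | 97.65 |
| 178 | 101.58 |
| 179 | 120.83 |
| 180 | 88.8 |
| 180 | 89.75 |
| 180 | 94.28 |
| 180 | 103.3 |
| 180 | 122.83 |
| 181 | 105.72 |
| 182 | 115.36 |
| 182 | 150 |
| 182 | 104.45 |
| 183 | 89.53 |
| 183 | 94.11 |
| 183 | 100.52 |
| 183 | 101.21 |
| 183 | 112.43 |
| 185 | 89.93 |
| 185 | 93.18 |
| 185 | 95.31 |
| 185 | 127.53 |
| 187 | 118.24 |
| 188 | 88.32 |
| 188 | 90.37 |
| 188 | 143.13 |
| 191 | 117.94 |
| 191 | 136.11 |
| 193 | 106.2 |
| 196 | 121.31 |
| 197 | 89.47 |
| 201 | 108.97 |

Figure 8 - Table On Processed Data For Calculation Of T Value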

The formula of the T – value is shown below:

 

T value = \(\frac{|\bar x-\bar y|}{\sqrt{\frac{v_x^2}{n_x}+\frac{v_y^2}{n_y}}}\)

 

Here, \(\bar x\) is the arithmetic mean of all the observations of the height of the batsmen, \(\bar y\) is the arithmetic mean of all the observations of the strike rate of the batsmen, \(v_x^2\) is the variance of the height of the batsmen, \(v_y^2\) is the variance of the strike rate of the batsmen, \(n_x\) is the number of observations of height and \(n_y\) is the number of observations of strike rate.

Calculation of t – value

\(\bar x=\frac{x_1+x_2+...+x_n}{n_x}\)

\(=>\bar x=\frac{155 + 160 + \cdots + 197 + 201}{50}\) = 179.88

\(\bar y=\frac{y_1+y_2+...+y_n}{n_y}\)

\(=>\bar y=\frac{82.05 + 92.67 + \cdots + 89.47 + 108.97}{50}\) = 103.2144

\(v_x^2=\frac{(\bar x-x_1)^2+(\bar x-x_2)^2+...+(\bar x-x_n)^2}{n_x-1}\)

\(=>v_x^2=\frac{(179.88 - 155)^2 + (179.88 - 160)^2+ \cdots + (179.88 - 201)^2}{49}\) = 77.65877

\(v_y^2=\frac{(\bar y-y_1)^2+(\bar y-y_2)^2+...+(\bar y-y_n)^2}{n_y-1}\)

\(=>v_y^2=\frac{(103.2144 - 82.05)^2 + (103.2144 - 92.67)^2 + \cdots + (103.2144 - 108.97)^2}{49}\) = 224.03151

 

Therefore, the T – value can be computed as:

 

\(T\ value =\frac{|179.88 − 103.2144|}{\sqrt{\frac{77.65877}{50}+\frac{224.03151}{50}}}\)

 

\(=\frac{76.6656}{\sqrt{1.5531754+4.4806302}}\)

 

\(=\frac{76.6656}{\sqrt{6.0338056}}\)

 

\(=\frac{76.6656}{2.45638}\)

 

= 31.210798

Calculation of degree of freedom

Degrees of Freedom = \(n_x + n_y - 2\) = 50 + 50 - 2 = 98
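The arithmetic of the T value and the degrees of freedom can be reproduced as follows. This is a sketch that follows the formula exactly as stated in this section, a two-sample form built from the two means and the two variances, using the values computed above; the variable names are chosen here for illustration.

```python
from math import sqrt

# Values computed in this section.
mean_x, mean_y = 179.88, 103.2144      # mean height (cm), mean strike rate
var_x, var_y = 77.65877, 224.03151     # variances of height and strike rate
n_x = n_y = 50                         # number of observations of each variable

# T value as defined in the text: |difference of means| / combined standard error.
t_value = abs(mean_x - mean_y) / sqrt(var_x / n_x + var_y / n_y)
degrees_of_freedom = n_x + n_y - 2

print(f"T value = {t_value:.4f}, degrees of freedom = {degrees_of_freedom}")
```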

Result of t – test

The computed T value is compared with the table of T values shown in the Background information section. On the basis of that table, it is concluded that the null hypothesis is retained and the alternate hypothesis is rejected. Thus, it can be stated that there is no correlation between the height of a batsman and his strike rate.

Conclusion

There is no profound correlation between the height of a batsman (measured in cm) and his strike rate in the game of cricket.

  • The average strike rate of all the batsmen studied in this correlative analysis is 103.2144.
  • The standard deviation of the strike rates is 14.967. Such a high standard deviation suggests that the strike rate varies greatly from one batsman to another.
  • A linear trendline was fitted between the height of a batsman and his strike rate. The equation of the trendline was y = 0.5913x - 3.1403. However, because the correlation was very weak, as shown by the value of the regression correlation coefficient (0.1212), the correlation was rejected.
  • A polynomial trendline was fitted between the height of a batsman and his strike rate. The equation of the trendline was \(y = -0.0089x^2 + 3.7722x - 287.37\). However, again because the correlation was very weak, as shown by the value of the regression correlation coefficient (0.1266), the correlation was rejected.
  • The Y-intercept of the linear trendline was negative, which is absurd, as a strike rate cannot be negative. This also supports the claim that there is no correlation between the height and the strike rate of a batsman.
  • The maximum of the polynomial trendline was found at a height of 211.92 cm. Thus, a batsman with a height of 211.92 cm would have the maximum strike rate according to the polynomial trendline.
  • The result of the T-test also supports the claim that the null hypothesis holds for the correlative study performed above between the height and the strike rate of a batsman.

Reflection

In this investigation, several processes and mathematical tools have been used to find the correlation and its strength. The choice of tournament is one of the most important strengths of this investigation. It has provided a data sheet with accurate observations of strike rate and height based on the current form of cricket. The use of two different correlation coefficients, the regression correlation coefficient and Pearson’s correlation coefficient, has shown both the strength and the nature of the correlation. Furthermore, the values of the mean and the standard deviation have made it possible to analyse the variation of the strike rate (dependent variable) in the observed data sheet. Lastly, the use of the T-test has provided the conclusion regarding the correlation.

However, a few weaknesses were observed during this mathematical investigation. As cricket is a game of uncertainty, many parameters govern the strike rate of a batsman. A few such parameters are the pitch quality, the weather and the bowler. Different batsmen also have different techniques, which is another parameter that governs the strike rate. As many variables other than height affect the dependent variable (strike rate), the correlation study cannot be carried out efficiently. For an efficient correlative analysis of the research question, all of these parameters would have to be controlled or held constant.

Bibliography