I have loved sports since I was very young. When I was a kid, my dad took me to the park every day to play cricket, and I think that is how my bond with sports grew strong. Though I have grown up, I still take an active part in sports, and my studies have never been an excuse to skip games. "All work and no play makes Jack a dull boy": I abide by that statement.
I do not play only for recreation; I follow sports religiously. I am on my school's cricket team, I take coaching classes, and I practise even after school. I love playing cricket, and I love being a batsman and the captain of a team.
Recently, it was announced that players would be selected for the interschool cricket tournament. I was very excited and wished to grab the opportunity. While surfing the net for tips, I read various resources and learned many interesting facts, but a statement about height being a factor in selection caught my eye.
As the captain of my school team, I was partly responsible for selecting players. I looked for confirmation everywhere but could not get a satisfactory answer. I could not decide on the players, as I thought their heights should not overshadow their performances. Both their hard work and the name of the school were at stake.
Weighed down with worries, I decided to research the question myself, and this IA is the result. In this IA, I have tried to find out whether the height of a batsman determines his strike rate. I will also try to find how far the height of a batsman acts as a deciding factor in the result of a cricket match. This research will help me make up my mind when selecting players for the competition.
The main aim of this IA is to study whether or not there exists a correlation between the strike rate of batsmen and their height in the game of cricket. Furthermore, this IA will give brief information about the advantage or disadvantage a batsman has by default, due to his height, in scoring runs at a faster rate, i.e., his strike rate. This exploration may help team management and selection committees when signing contracts with players.
What is the relationship between the strike rate of a batsman and his height?
Strike rate is one of the most important parameters for measuring the performance of a batsman in the game of cricket. It expresses how many runs the batsman has scored relative to the number of balls he has faced. The formula for strike rate is shown below:
\(Strike\ Rate=\frac{Runs\ Scored}{Number\ of\ balls\ played}\times100\)
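The formula above can be sketched as a small helper. The function name `strike_rate` and the example innings (45 runs off 30 balls) are illustrative assumptions, not values from the data set.

```python
def strike_rate(runs_scored: int, balls_faced: int) -> float:
    """Return the strike rate: runs scored per 100 balls faced."""
    if balls_faced <= 0:
        # The formula is undefined when no balls have been faced.
        raise ValueError("balls_faced must be positive")
    return runs_scored / balls_faced * 100


# Hypothetical innings: 45 runs scored off 30 balls.
example = strike_rate(45, 30)
print(example)
```

A strike rate above 100 therefore means the batsman scores, on average, more than one run per ball.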
The height of a player can be an advantage in several games. For example, in games like football and basketball, taller players often stand a better chance of performing well than comparatively shorter players.
In the game of cricket, taller batsmen could have a better chance while playing short balls, which would allow them to score runs even off difficult deliveries.
The regression correlation coefficient is a tool to measure the strength of the correlation between the independent variable and the dependent variable. The set of values (x_{1},y_{1}), (x_{2},y_{2}), …, (x_{n},y_{n}) is used to find the value of r as stated by the formula below:
\(r=\frac{n(\sum xy)-(\sum x)(\sum y)}{\sqrt{[n\sum x^2-(\sum x)^2][n\sum y^2-(\sum y)^2]}}\)
In the above-mentioned formula, x is the value of the independent variable for each observation, y is the value of the dependent variable for each observation, xy is the product of the independent and the dependent variable for each observation, n is the number of observations and ∑ denotes the sum over all the observations of the mentioned variable.
By squaring the value of r, the value of the regression coefficient (r^{2}), also called the coefficient of determination, is obtained. The value of r^{2} lies between 0 and 1, where 1 signifies perfect linear correlation and 0 signifies no linear correlation.
Pearson’s correlation coefficient is a tool to measure both the strength and the nature of the correlation between the independent variable and the dependent variable. The set of values (x_{1},y_{1}), (x_{2},y_{2}), …, (x_{n},y_{n}) is used to find the value of \(\mathfrak{R}\) as stated by the formula below:
\(\mathfrak{R}=\frac{\sum (x-\bar x)(y-\bar y)}{\sqrt{\sum(x-\bar x)^2\times\sum (y-\bar y)^2}}\)
In the above-mentioned formula, x is the value of the independent variable for each observation, y is the value of the dependent variable for each observation, \(\bar x\) is the arithmetic mean of all the observations of the independent variable, \(\bar y\) is the arithmetic mean of all the observations of the dependent variable and ∑ denotes the sum over all the observations of the mentioned variable. The value of \(\mathfrak{R}\) lies between -1 and 1. A positive value of Pearson’s correlation coefficient implies a direct relationship between the independent and the dependent variable, whereas a negative value implies an inverse relationship between them. If the value of the correlation coefficient is close to 1 or -1, the correlation is strong. On the other hand, if the value is close to 0, the linear correlation is weak or absent.
The T – test is a kind of analysis used in this investigation to judge whether any correlation exists between an independent variable and a dependent variable. The T – value of the given set of data is calculated first. Then, based on the type of data, for example, paired data or independent data, the T – value is checked against the T – table, which indicates whether the correlation exists. The formula for the T – value is given below:
\(T\ value=\frac{|\bar x-\bar y|}{\sqrt{\frac{v_x^2}{n_x}+\frac{v_y^2}{n_y}}}\)
Here, \(\bar x\) is the arithmetic mean of all the observations of the independent variable, \(\bar y\) is the arithmetic mean of all the observations of the dependent variable, \(v_x^2\) is the variance of the independent variable, \(v_y^2\) is the variance of the dependent variable, n_{x} is the number of observations of the independent variable, and n_{y} is the number of observations of the dependent variable.
Now, the T – value is checked in T – table which predicts the existence of any correlation. The T – table is shown below:
It is assumed that there does not exist any correlation between strike rate of batsman and the height of the batsman.
It is assumed that there is a correlation between the strike rate of batsman and the height of the batsman.
The strike rates of different batsmen, together with their heights, have been collected from the recently organised cricket tournament, Indian Premier League 2020. The Indian Premier League, abbreviated as IPL T20, is a domestic cricket tournament organized by the BCCI (Board of Control for Cricket in India). Eight teams, each representing a particular city or state in India, compete in a tournament lasting two to three months, with players from across the globe contracted and assigned to each team. As each match is twenty overs a side, the format is often abbreviated as T20.
IPL T20 has been selected for the collection of data for several reasons. Firstly, although the IPL is a domestic tournament organized by the BCCI, it brings together players from across the globe. This allows the data set to have more generalized observations rather than ones specific to a single country. Secondly, IPL T20 is one of the most recently organized tournaments, so the data set reflects the current style of playing cricket. Thirdly, the IPL is a twenty-over game, which requires scoring runs in as few balls as possible. As a result, the strike rates of batsmen in this tournament tend to be higher than in longer formats. Higher observed values make it easier to find the correlation than smaller observed values.
Sl. No | Batsmen | Height(cm) | Strike rate |
---|---|---|---|
1 | Shakib Al Hasan | 155 | 82.05 |
2 | Mushfiqur Rahim | 160 | 92.67 |
3 | Rashid Khan | 168 | 100.96 |
4 | Kusal Perera | 168 | 110.97 |
5 | Rishabh Pant | 170 | 89.23 |
6 | David Warner | 170 | 89.36 |
7 | JP Duminy | 170 | 97.22 |
8 | Rohit Sharma | 170 | 98.33 |
9 | Kane Williamson | 173 | 99.8 |
10 | Nicholas Pooran | 173 | 100.27 |
11 | Mosaddek Hossain | 174 | 106.36 |
12 | MS Dhoni | 175 | 87.78 |
13 | Mohammed Hafeez | 175 | 88.77 |
14 | Virat Kohli | 175 | 94.04 |
15 | Liton Das | 175 | 110.17 |
16 | Eoin Morgan | 175 | 111.07 |
17 | Aaron Finch | 176 | 102.21 |
18 | Usman Khawaja | 177 | 88.26 |
19 | Jonny Bairstow | 178 | 92.84 |
20 | Colin Munro | 178 | 97.65 |
21 | Shimron Hetmyer | 178 | 101.58 |
22 | Mohammad Saifuddin | 179 | 120.83 |
23 | Najibullah Zadran | 180 | 88.8 |
24 | Mahmudullah | 180 | 89.75 |
25 | Haris Sohail | 180 | 94.28 |
26 | Shikhar Dhawan | 180 | 103.3 |
27 | Jos Buttler | 180 | 122.83 |
28 | Avishka Fernando | 181 | 105.72 |
29 | Jason Roy | 182 | 115.36 |
30 | Glenn Maxwell | 182 | 150 |
31 | Alex Carey | 182 | 104.45 |
32 | Joe Root | 183 | 89.53 |
33 | Hazratullah Zazai | 183 | 94.11 |
34 | Colin de Grandhomme | 183 | 100.52 |
35 | Soumya Sarkar | 183 | 101.21 |
36 | Hardik Pandya | 183 | 112.43 |
37 | Chris Woakes | 185 | 89.93 |
38 | Ben Stokes | 185 | 93.18 |
39 | Thisara Perera | 185 | 95.31 |
40 | Wahab Riaz | 185 | 127.53 |
41 | Imad Wasim | 187 | 118.24 |
42 | Chris Gayle | 188 | 88.32 |
43 | Rassie van der Dussen | 188 | 90.37 |
44 | Martin Guptill | 188 | 143.13 |
45 | David Miller | 191 | 117.94 |
46 | Nathan Coulter-Nile | 191 | 136.11 |
47 | Carlos Brathwaite | 193 | 106.2 |
48 | Chris Morris | 196 | 121.31 |
49 | Mitchell Starc | 197 | 89.47 |
50 | Jason Holder | 201 | 108.97 |
\(\text{Mean }= \frac{y_1+y_2+...+y_n}{n}\)
\(\text{Arithmetic Mean }= \frac{82.05+92.67+100.96+...+89.47+108.97}{50} = 103.2144\)
\(\text{Standard Deviation }= \sqrt{\frac{(\bar y-y_1)^2+(\bar y-y_2)^2+...+(\bar y-y_n)^2}{n-1}}\)
\(\text{Standard Deviation }= \sqrt{\frac{(103.2144-82.05)^2+(103.2144-92.67)^2+...+(103.2144-108.97)^2}{49}} = 14.967\)
The mean strike rate of all the batsmen is 103.2144, and the standard deviation is 14.967. This fairly large standard deviation means that the strike rates are spread widely about the mean, so the strike rate varies considerably from one player to another.
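The mean and standard deviation can be checked with Python's standard `statistics` module. This is a minimal sketch using the strike rates from the raw data table; the variable names are illustrative. The reported value of 14.967 corresponds to the sample standard deviation (divisor n − 1); the population form (divisor n) is shown for comparison and is slightly smaller.

```python
import statistics

# Strike rates of the 50 batsmen, in the order of the raw data table.
strike_rates = [
    82.05, 92.67, 100.96, 110.97, 89.23, 89.36, 97.22, 98.33, 99.8, 100.27,
    106.36, 87.78, 88.77, 94.04, 110.17, 111.07, 102.21, 88.26, 92.84, 97.65,
    101.58, 120.83, 88.8, 89.75, 94.28, 103.3, 122.83, 105.72, 115.36, 150.0,
    104.45, 89.53, 94.11, 100.52, 101.21, 112.43, 89.93, 93.18, 95.31, 127.53,
    118.24, 88.32, 90.37, 143.13, 117.94, 136.11, 106.2, 121.31, 89.47, 108.97,
]

mean_sr = statistics.mean(strike_rates)
sample_sd = statistics.stdev(strike_rates)       # divides by n - 1
population_sd = statistics.pstdev(strike_rates)  # divides by n

print(f"mean = {mean_sr:.4f}")
print(f"sample SD = {sample_sd:.3f}, population SD = {population_sd:.3f}")
```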
The X – Axis of the graph denotes the height of the batsman measured in centimetres (independent variable).
The Y – Axis of the graph denotes the strike rate of the batsman (dependent variable).
In this graph, a linear trendline has been obtained using the data that has been collected based on the most recent performance of the players in IPL 2020. The equation of the trendline is shown below:
y = 0.5913x - 3.1403
From the graph, it can be stated that there exists a positive correlation between the strike rate and the height of each batsman. However, a lot of outliers are seen in the graph.
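The coefficients of the linear trendline can be reproduced with the standard least-squares formulas. The sketch below assumes the column totals reported in the processed data table for R^{2} (n = 50, ∑x = 8994, ∑y = 5160.72, ∑x^{2} = 1621646, ∑xy = 930560.2); the variable names are illustrative.

```python
# Column totals taken from the processed data table.
n = 50
sum_x, sum_y = 8994, 5160.72
sum_x2, sum_xy = 1621646, 930560.2

# Least-squares slope and intercept of the line y = slope * x + intercept.
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
intercept = (sum_y - slope * sum_x) / n

print(f"y = {slope:.4f}x + ({intercept:.4f})")
```

Under these totals the result agrees with the trendline equation quoted above to four decimal places.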
In this graph, a polynomial trendline has been obtained using the data that has been collected based on the most recent performance of the players in IPL 2020. The equation of the trendline is shown below:
y = -0.0089x^{2} + 3.7722x - 287.37
From the graph, it can be stated that there exists a positive correlation between the strike rate and the height of each batsman. However, the slope of the curve is decreasing, which implies that with a sufficiently large further increase in height, the strike rate will start to decrease.
There are a lot of outliers in the range of 170 cm to 190 cm height. This may be because several other parameters partially help or hinder a batsman in cricket. For example, if a bowler is at the top of his form and dismisses a batsman through sheer bowling skill, this significantly affects the correlation study. Other factors responsible for such a high number of outliers include the current form of the batsman, pitch conditions, weather conditions, etc. All of these factors directly affect the performance of a batsman, which in turn affects the correlation study. Due to the high number of outliers, the value of the regression coefficient is 0.12. Such a small value (close to zero) indicates that any linear correlation between the dependent and the independent variable is weak.
The Y – intercept of the graph can be studied to comment on the existence of the linear correlation. From the equation of the trendline, the Y – intercept of the trendline has been calculated:
y = 0.5913x − 3.1403
The value of y for x = 0 will be:
y = 0.5913 × 0 − 3.1403
=> y = −3.1403
The value of the Y – intercept is -3.1403. A negative intercept is not meaningful in this correlation, because it corresponds to a strike rate of -3.1403 at a height of zero centimetres. From the formula of strike rate mentioned in the Background Information Section, the value of strike rate cannot be negative. This suggests that the correlation between strike rate and height of a batsman may not be strictly linear.
From the equation of the polynomial trendline, the maximum of the strike rate can be found.
y = −0.0089x^{2} + 3.7722x − 287.37
Differentiating both sides with respect to x, we get,
\(\frac{dy}{dx}=-\frac{d(0.0089x^2)}{dx}+\frac{d(3.7722x)}{dx}-\frac{d(287.37)}{dx}\)
\(=>\frac{dy}{dx} = −0.0178x + 3.7722 − 0\)
\(=>\frac{dy}{dx} = −0.0178x + 3.7722\)
Further, differentiating both sides with respect to x, we get,
\(\frac{d^2y}{dx^2}=-\frac{d(0.0178x)}{dx}+\frac{d(3.7722)}{dx}\)
\(\frac{d^2y}{dx^2} = − 0.0178 + 0\)
\(\frac{d^2y}{dx^2} = − 0.0178\)
As the value of \(\frac{d^2y}{dx^2}\) is negative, the stationary point of the curve is a maximum. Its position is found by setting the first derivative equal to zero:
\(\frac{dy}{dx} = 0\)
=> −0.0178x + 3.7722 = 0
=> −0.0178x = −3.7722
\(=> x=\frac{-3.7722}{-0.0178}\)
=> x = 211.92
Thus, the maximum of the polynomial trendline occurs at x = 211.92 cm. In other words, a batsman with a height of 211.92 cm would have the maximum strike rate as per the polynomial correlation.
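The position of the maximum can be cross-checked with the vertex formula x = −b / (2a) for a quadratic y = ax^{2} + bx + c. This is a minimal sketch using the trendline coefficients quoted above; the names `a`, `b`, `c` and `trend` are illustrative. Note that the resulting height lies outside the observed range of 155 cm to 201 cm, so the peak is an extrapolation of the fitted curve.

```python
# Coefficients of the polynomial trendline y = a*x^2 + b*x + c.
a, b, c = -0.0089, 3.7722, -287.37


def trend(x: float) -> float:
    """Strike rate predicted by the polynomial trendline at height x (cm)."""
    return a * x * x + b * x + c


# dy/dx = 2*a*x + b = 0 gives the stationary point; a < 0 makes it a maximum.
x_peak = -b / (2 * a)
y_peak = trend(x_peak)

print(f"peak at x = {x_peak:.2f} cm, predicted strike rate = {y_peak:.2f}")
```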
There are five headers in the processed data table, expressed as x, y, x^{2}, y^{2}, xy. The height of the batsman is represented by x and the strike rate of the batsman is represented by y. The remaining headers have their usual meaning. The calculation of the R^{2} correlation coefficient is shown to explore the efficiency and stability of the trendline and the correlation.
x | y | x^{2} | y^{2} | xy |
---|---|---|---|---|
155 | 82.05 | 24025 | 6732.2025 | 12717.75 |
160 | 92.67 | 25600 | 8587.7289 | 14827.2 |
168 | 100.96 | 28224 | 10192.9216 | 16961.28 |
168 | 110.97 | 28224 | 12314.3409 | 18642.96 |
170 | 89.23 | 28900 | 7961.9929 | 15169.1 |
170 | 89.36 | 28900 | 7985.2096 | 15191.2 |
170 | 97.22 | 28900 | 9451.7284 | 16527.4 |
170 | 98.33 | 28900 | 9668.7889 | 16716.1 |
173 | 99.8 | 29929 | 9960.04 | 17265.4 |
173 | 100.27 | 29929 | 10054.0729 | 17346.71 |
174 | 106.36 | 30276 | 11312.4496 | 18506.64 |
175 | 87.78 | 30625 | 7705.3284 | 15361.5 |
175 | 88.77 | 30625 | 7880.1129 | 15534.75 |
175 | 94.04 | 30625 | 8843.5216 | 16457 |
175 | 110.17 | 30625 | 12137.4289 | 19279.75 |
175 | 111.07 | 30625 | 12336.5449 | 19437.25 |
176 | 102.21 | 30976 | 10446.8841 | 17988.96 |
177 | 88.26 | 31329 | 7789.8276 | 15622.02 |
178 | 92.84 | 31684 | 8619.2656 | 16525.52 |
178 | 97.65 | 31684 | 9535.5225 | 17381.7 |
178 | 101.58 | 31684 | 10318.4964 | 18081.24 |
179 | 120.83 | 32041 | 14599.8889 | 21628.57 |
180 | 88.8 | 32400 | 7885.44 | 15984 |
180 | 89.75 | 32400 | 8055.0625 | 16155 |
180 | 94.28 | 32400 | 8888.7184 | 16970.4 |
180 | 103.3 | 32400 | 10670.89 | 18594 |
180 | 122.83 | 32400 | 15087.2089 | 22109.4 |
181 | 105.72 | 32761 | 11176.7184 | 19135.32 |
182 | 115.36 | 33124 | 13307.9296 | 20995.52 |
182 | 150 | 33124 | 22500 | 27300 |
182 | 104.45 | 33124 | 10909.8025 | 19009.9 |
183 | 89.53 | 33489 | 8015.6209 | 16383.99 |
183 | 94.11 | 33489 | 8856.6921 | 17222.13 |
183 | 100.52 | 33489 | 10104.2704 | 18395.16 |
183 | 101.21 | 33489 | 10243.4641 | 18521.43 |
183 | 112.43 | 33489 | 12640.5049 | 20574.69 |
185 | 89.93 | 34225 | 8087.4049 | 16637.05 |
185 | 93.18 | 34225 | 8682.5124 | 17238.3 |
185 | 95.31 | 34225 | 9083.9961 | 17632.35 |
185 | 127.53 | 34225 | 16263.9009 | 23593.05 |
187 | 118.24 | 34969 | 13980.6976 | 22110.88 |
188 | 88.32 | 35344 | 7800.4224 | 16604.16 |
188 | 90.37 | 35344 | 8166.7369 | 16989.56 |
188 | 143.13 | 35344 | 20486.1969 | 26908.44 |
191 | 117.94 | 36481 | 13909.8436 | 22526.54 |
191 | 136.11 | 36481 | 18525.9321 | 25997.01 |
193 | 106.2 | 37249 | 11278.44 | 20496.6 |
196 | 121.31 | 38416 | 14716.1161 | 23776.76 |
197 | 89.47 | 38809 | 8004.8809 | 17625.59 |
201 | 108.97 | 40401 | 11874.4609 | 21902.97 |
∑ x = 8994 | ∑ y = 5160.72 | ∑ x^{2} = 1621646 | ∑ y^{2} = 543638.162 | ∑ xy = 930560.2 |
Figure 6 - Table On Processed Data For Calculation Of R^{2}
The formula of the regression coefficient mentioned in the background information has been used to find the correlation coefficient. Here, x is the value of the independent variable for each observation, y is the value of the dependent variable for each observation, xy is the product of the independent and the dependent variable for each observation, n is the number of observations and ∑ denotes the sum over all the observations of the mentioned variable.
Calculation -
\(r =\frac{n(\sum xy)-(\sum x)(\sum y)}{\sqrt{[n\sum x^2-(\sum x)^2][n\sum y^2-(\sum y)^2]}}\)
\(=>r =\frac{50(930560.2) − (8994)(5160.72)}{\sqrt{[50 × 1621646 − (8994)^2][50 × 543638.162 − (5160.72)^2]}}\)
=> r = 0.348
=> r^{2} = 0.1212
The value of the regression coefficient is 0.12. Such a small value (close to zero) indicates that any linear correlation between the dependent and the independent variable is weak.
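The arithmetic of this calculation can be checked with a short script. It is a sketch that plugs the column totals from the processed data table into the formula for r; the variable names are illustrative.

```python
import math

# Column totals from the processed data table (n = 50 batsmen).
n = 50
sum_x, sum_y = 8994, 5160.72
sum_x2, sum_y2, sum_xy = 1621646, 543638.162, 930560.2

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

r = numerator / denominator
r_squared = r ** 2

print(f"r = {r:.3f}, r^2 = {r_squared:.4f}")
```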
There are seven headers in the processed data table for the calculation of Pearson’s correlation coefficient, expressed as x, y, \(x - \bar x\), \(y - \bar y\), \((x - \bar x)(y - \bar y)\), \((x - \bar x)^2\) and \((y - \bar y)^2\). The height of the batsman is represented by x and the strike rate of the batsman is represented by y, \(\bar x\) is the arithmetic mean of all the observations of the height of the batsmen, and \(\bar y\) is the arithmetic mean of all the observations of the strike rate of the batsmen. The remaining headers have their usual meaning. The calculation of Pearson’s correlation coefficient is shown to explore the efficiency and stability of the trendline and the correlation.
x | y | \(x-\bar x\) | \(y-\bar y\) | \((x-\bar x)(y-\bar y)\) | \((x-\bar x)^2\) | \((y-\bar y)^2\) |
---|---|---|---|---|---|---|
155 | 82.05 | -24.88 | -21.1644 | 526.570272 | 619.0144 | 447.931827 |
160 | 92.67 | -19.88 | -10.5444 | 209.622672 | 395.2144 | 111.184371 |
168 | 100.96 | -11.88 | -2.2544 | 26.782272 | 141.1344 | 5.08231936 |
168 | 110.97 | -11.88 | 7.7556 | -92.136528 | 141.1344 | 60.1493314 |
170 | 89.23 | -9.88 | -13.9844 | 138.165872 | 97.6144 | 195.563443 |
170 | 89.36 | -9.88 | -13.8544 | 136.881472 | 97.6144 | 191.944399 |
170 | 97.22 | -9.88 | -5.9944 | 59.224672 | 97.6144 | 35.9328314 |
170 | 98.33 | -9.88 | -4.8844 | 48.257872 | 97.6144 | 23.8573634 |
173 | 99.8 | -6.88 | -3.4144 | 23.491072 | 47.3344 | 11.6581274 |
173 | 100.27 | -6.88 | -2.9444 | 20.257472 | 47.3344 | 8.66949136 |
174 | 106.36 | -5.88 | 3.1456 | -18.496128 | 34.5744 | 9.89479936 |
175 | 87.78 | -4.88 | -15.4344 | 75.319872 | 23.8144 | 238.220703 |
175 | 88.77 | -4.88 | -14.4444 | 70.488672 | 23.8144 | 208.640691 |
175 | 94.04 | -4.88 | -9.1744 | 44.771072 | 23.8144 | 84.1696154 |
175 | 110.17 | -4.88 | 6.9556 | -33.943328 | 23.8144 | 48.3803714 |
175 | 111.07 | -4.88 | 7.8556 | -38.335328 | 23.8144 | 61.7104514 |
176 | 102.21 | -3.88 | -1.0044 | 3.897072 | 15.0544 | 1.00881936 |
177 | 88.26 | -2.88 | -14.9544 | 43.068672 | 8.2944 | 223.634079 |
178 | 92.84 | -1.88 | -10.3744 | 19.503872 | 3.5344 | 107.628175 |
178 | 97.65 | -1.88 | -5.5644 | 10.461072 | 3.5344 | 30.9625474 |
178 | 101.58 | -1.88 | -1.6344 | 3.072672 | 3.5344 | 2.67126336 |
179 | 120.83 | -0.88 | 17.6156 | -15.501728 | 0.7744 | 310.309363 |
180 | 88.8 | 0.12 | -14.4144 | -1.729728 | 0.0144 | 207.774927 |
180 | 89.75 | 0.12 | -13.4644 | -1.615728 | 0.0144 | 181.290067 |
180 | 94.28 | 0.12 | -8.9344 | -1.072128 | 0.0144 | 79.8235034 |
180 | 103.3 | 0.12 | 0.0856 | 0.010272 | 0.0144 | 0.00732736 |
180 | 122.83 | 0.12 | 19.6156 | 2.353872 | 0.0144 | 384.771763 |
181 | 105.72 | 1.12 | 2.5056 | 2.806272 | 1.2544 | 6.27803136 |
182 | 115.36 | 2.12 | 12.1456 | 25.748672 | 4.4944 | 147.515599 |
182 | 150 | 2.12 | 46.7856 | 99.185472 | 4.4944 | 2188.89237 |
182 | 104.45 | 2.12 | 1.2356 | 2.619472 | 4.4944 | 1.52670736 |
183 | 89.53 | 3.12 | -13.6844 | -42.695328 | 9.7344 | 187.262803 |
183 | 94.11 | 3.12 | -9.1044 | -28.405728 | 9.7344 | 82.8900994 |
183 | 100.52 | 3.12 | -2.6944 | -8.406528 | 9.7344 | 7.25979136 |
183 | 101.21 | 3.12 | -2.0044 | -6.253728 | 9.7344 | 4.01761936 |
183 | 112.43 | 3.12 | 9.2156 | 28.752672 | 9.7344 | 84.9272834 |
185 | 89.93 | 5.12 | -13.2844 | -68.016128 | 26.2144 | 176.475283 |
185 | 93.18 | 5.12 | -10.0344 | -51.376128 | 26.2144 | 100.689183 |
185 | 95.31 | 5.12 | -7.9044 | -40.470528 | 26.2144 | 62.4795394 |
185 | 127.53 | 5.12 | 24.3156 | 124.495872 | 26.2144 | 591.248403 |
187 | 118.24 | 7.12 | 15.0256 | 106.982272 | 50.6944 | 225.768655 |
188 | 88.32 | 8.12 | -14.8944 | -120.94253 | 65.9344 | 221.843151 |
188 | 90.37 | 8.12 | -12.8444 | -104.29653 | 65.9344 | 164.978611 |
188 | 143.13 | 8.12 | 39.9156 | 324.114672 | 65.9344 | 1593.25512 |
191 | 117.94 | 11.12 | 14.7256 | 163.748672 | 123.6544 | 216.843295 |
191 | 136.11 | 11.12 | 32.8956 | 365.799072 | 123.6544 | 1082.1205 |
193 | 106.2 | 13.12 | 2.9856 | 39.171072 | 172.1344 | 8.91380736 |
196 | 121.31 | 16.12 | 18.0956 | 291.701072 | 259.8544 | 327.450739 |
197 | 89.47 | 17.12 | -13.7444 | -235.30413 | 293.0944 | 188.908531 |
201 | 108.97 | 21.12 | 5.7556 | 121.558272 | 446.0544 | 33.1269314 |
The formula of Pearson’s correlation coefficient mentioned in the background information has been used to find the correlation coefficient. Here, x is the value of the independent variable for each observation, y is the value of the dependent variable for each observation, \(\bar x\) is the arithmetic mean of all the observations of the independent variable, \(\bar y\) is the arithmetic mean of all the observations of the dependent variable and ∑ denotes the sum over all the observations of the mentioned variable.
Calculation -
\(\bar x=\frac{∑x}{N}=\frac{8994}{50} = 179.88\)
\(\bar y=\frac{∑y}{N}=\frac{5160.72}{50} = 103.2144\)
\(∑(x-\bar x)(y-\bar y)= 2249.8864\)
\(∑(x-\bar x)^2= 3805.28\)
\(∑(y-\bar y)^2= 10977.544\)
Let, the Pearson’s Correlation Coefficient be \(\mathfrak{R}\).
\(\mathfrak{R}=\frac{∑(x-\bar x)(y-\bar y)}{\sqrt{∑(x-\bar x)^2\times∑(y-\bar y)^2}}\)
\(\mathfrak{R}=\frac{2249.8864}{\sqrt{3805.28 × 10977.544}} = 0.3481\)
\(\mathfrak{R}=0.348\)
The value of Pearson’s correlation coefficient is 0.348. As it is a positive value, the correlation is increasing in nature, i.e., with an increase in the height of a batsman, the strike rate also tends to increase. This might be because taller batsmen have an advantage in playing short-pitched balls, which allows them to score runs from a wider range of deliveries. However, the value of Pearson’s correlation coefficient is much closer to zero than to 1, which signifies that the correlation is weak.
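The deviation sums and the resulting coefficient can be recomputed directly from the raw observations. The sketch below follows the deviation form of Pearson's formula; the lists repeat the heights and strike rates from the data table, and the variable names (`s_xy`, `s_xx`, `s_yy`) are illustrative.

```python
import math

# Heights (cm) and strike rates of the 50 batsmen, in table order.
heights = [
    155, 160, 168, 168, 170, 170, 170, 170, 173, 173,
    174, 175, 175, 175, 175, 175, 176, 177, 178, 178,
    178, 179, 180, 180, 180, 180, 180, 181, 182, 182,
    182, 183, 183, 183, 183, 183, 185, 185, 185, 185,
    187, 188, 188, 188, 191, 191, 193, 196, 197, 201,
]
strike_rates = [
    82.05, 92.67, 100.96, 110.97, 89.23, 89.36, 97.22, 98.33, 99.8, 100.27,
    106.36, 87.78, 88.77, 94.04, 110.17, 111.07, 102.21, 88.26, 92.84, 97.65,
    101.58, 120.83, 88.8, 89.75, 94.28, 103.3, 122.83, 105.72, 115.36, 150.0,
    104.45, 89.53, 94.11, 100.52, 101.21, 112.43, 89.93, 93.18, 95.31, 127.53,
    118.24, 88.32, 90.37, 143.13, 117.94, 136.11, 106.2, 121.31, 89.47, 108.97,
]

n = len(heights)
mean_x = sum(heights) / n
mean_y = sum(strike_rates) / n

# Sums of products and squares of the deviations from the means.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(heights, strike_rates))
s_xx = sum((x - mean_x) ** 2 for x in heights)
s_yy = sum((y - mean_y) ** 2 for y in strike_rates)

pearson_r = s_xy / math.sqrt(s_xx * s_yy)

print(f"S_xy = {s_xy:.4f}, S_xx = {s_xx:.2f}, S_yy = {s_yy:.3f}")
print(f"Pearson r = {pearson_r:.3f}")
```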
There are two headers of the processed data table expressed as x, and y. The height of the batsman is represented by x and the strike rate of batsman is represented by y.
x | y |
---|---|
155 | 82.05 |
160 | 92.67 |
168 | 100.96 |
168 | 110.97 |
170 | 89.23 |
170 | 89.36 |
170 | 97.22 |
170 | 98.33 |
173 | 99.8 |
173 | 100.27 |
174 | 106.36 |
175 | 87.78 |
175 | 88.77 |
175 | 94.04 |
175 | 110.17 |
175 | 111.07 |
176 | 102.21 |
177 | 88.26 |
178 | 92.84 |
178 | 97.65 |
178 | 101.58 |
179 | 120.83 |
180 | 88.8 |
180 | 89.75 |
180 | 94.28 |
180 | 103.3 |
180 | 122.83 |
181 | 105.72 |
182 | 115.36 |
182 | 150 |
182 | 104.45 |
183 | 89.53 |
183 | 94.11 |
183 | 100.52 |
183 | 101.21 |
183 | 112.43 |
185 | 89.93 |
185 | 93.18 |
185 | 95.31 |
185 | 127.53 |
187 | 118.24 |
188 | 88.32 |
188 | 90.37 |
188 | 143.13 |
191 | 117.94 |
191 | 136.11 |
193 | 106.2 |
196 | 121.31 |
197 | 89.47 |
201 | 108.97 |
Figure 8 - Table On Processed Data for calculation of T – value
The formula of the T – value is shown below:
\(T \,value = \frac{|\bar x-\bar y|}{\sqrt{\frac{v_x^2}{n_x}+\frac{v_y^2}{n_y}}}\)
Here, \(\bar x\) is the arithmetic mean of all the observations of the height of the batsmen, \(\bar y\) is the arithmetic mean of all the observations of the strike rate of the batsmen, \(v_x^2\) is the variance of the height of the batsmen, \(v_y^2\) is the variance of the strike rate of the batsmen, n_{x} is the number of observations of the height of the batsmen, and n_{y} is the number of observations of the strike rate of the batsmen.
\(\bar x=\frac{x_1+x_2+...+x_n}{n_x}\)
\(=>\bar x=\frac{155 + 160 + ⋯ + 197 + 201}{50} = 179.88\)
\(\bar y=\frac{y_1+y_2+...+y_n}{n_y}\)
\(=>\bar y=\frac{82.05 + 92.67 + ⋯ + 89.47 + 108.97}{50} = 103.2144\)
\(v_x^2=\frac{(\bar x-x_1)^2+(\bar x-x_2)^2+...+(\bar x-x_n)^2}{n_x-1}\)
\(=>v_x^2=\frac{(179.88 - 155)^2 + (179.88 - 160)^2+ \cdots + (179.88 - 201)^2}{49} = 77.65877\)
\(v_y^2=\frac{(\bar y-y_1)^2+(\bar y-y_2)^2+...+(\bar y-y_n)^2}{n_y-1}\)
\(=>v_y^2=\frac{(103.2144 - 82.05)^2 + (103.2144 - 92.67)^2 + \cdots + (103.2144 - 108.97)^2}{49} = 224.03151\)
Therefore, the T – value can be computed as -
\(T\ value =\frac{|179.88 − 103.2144|}{\sqrt{\frac{77.65877}{50}+\frac{224.03151}{50}}}\)
\(=\frac{76.6656}{\sqrt{1.5531754+4.4806302}}\)
\(=\frac{76.6656}{\sqrt{6.0338056}}\)
\(=\frac{76.6656}{2.45638}\)
= 31.210798
Degrees of Freedom = n_{x} + n_{y} - 2 = 50 + 50 - 2 = 98
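The arithmetic of the T – value can be verified with the short sketch below. It assumes the means and variances quoted in the calculation above and applies the T – value formula exactly as stated; the variable names are illustrative.

```python
import math

# Summary values quoted in the calculation above.
mean_x, mean_y = 179.88, 103.2144   # mean height (cm), mean strike rate
var_x, var_y = 77.65877, 224.03151  # variances of height and strike rate
n_x = n_y = 50                      # number of observations of each variable

# Denominator of the T - value formula.
standard_error = math.sqrt(var_x / n_x + var_y / n_y)

t_value = abs(mean_x - mean_y) / standard_error
degrees_of_freedom = n_x + n_y - 2

print(f"T value = {t_value:.4f}, degrees of freedom = {degrees_of_freedom}")
```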
The value of T – Test can be found from the table of values of T as mentioned in Background Information Section. From that table, it can be concluded that the Null Hypothesis is accepted and the alternate hypothesis has been rejected. Thus, it can be stated that there is no correlation between the height of batsman and the strike rate of the batsman.
There is no profound correlation between the height of a batsman (measured in cm) and his strike rate in the game of cricket.
In this investigation, several processes and mathematical tools have been used to find the correlation along with its strength. The choice of tournament is one of the most important strengths of this investigation. It has provided a data sheet with accurate observations of strike rate and height based on the current form of cricket. The use of two different correlation coefficients, the regression coefficient and Pearson’s correlation coefficient, has provided the strength and nature of the correlation. Furthermore, the values of the mean and standard deviation have enabled the investigation to analyse the variation of strike rate (dependent variable) in the observed data sheet. Lastly, the use of the T – test has provided the conclusion regarding the correlation.
However, a few weaknesses have been observed during this mathematical investigation. As cricket is a game of uncertainty, there are a lot of parameters which govern the strike rate of a batsman, such as pitch quality, weather and the bowler faced. Different batsmen also have different techniques, which is another parameter governing the strike rate. As many variables apart from height affect the dependent variable (strike rate), the correlation study cannot be carried out efficiently. For an efficient correlation analysis of the research question, all of these parameters would have to be controlled or held constant.