Mathematics AI SL's Sample Internal Assessment

To what extent is there a correlation between the strike rate of a batsman and the height of the batsman?

6/7
Candidate Name: N/A
Candidate Number: N/A
Session: N/A
Word count: 1,906

Rationale

I have loved sports from a very young age. When I was a kid, my dad took me to the park every day to play cricket, and I think this is how my bond with sports grew stronger. Although I have grown up, I still take an active part in sports, and my studies have never been an excuse to skip games. "All work and no play makes Jack a dull boy" is a statement I abide by.

 

I do not play only for recreation; I follow sports religiously. I am on my school's cricket team, I take coaching classes and I practise even after school. I love playing cricket, and I love being a batsman and the captain of a team.

 

Recently, it was announced that players would be selected for the interschool cricket tournament. I was very excited and wished to grab the opportunity. While surfing the net for tips, I read various resources and learned many interesting facts, but a statement about height being a factor in selection caught my eye.

 

As the captain of my school team, I was partly responsible for selecting players. I looked for confirmation everywhere but could not find a satisfactory answer. I could not decide on the players because I felt that their heights should not overshadow their performances. Both their hard work and the name of the school were at stake.

 

Weighed down by these worries, I decided to research the question myself, and this IA is the result. In this IA, I have tried to find out whether the height of a batsman determines his strike rate. I will also try to find how far the height of a batsman acts as a deciding factor in the result of a cricket match. This research will help me make up my mind when selecting players for the competition.

Aim

The main aim of this IA is to study whether or not there exists a correlation between the strike rate of batsmen and their height in the game of cricket. Furthermore, this IA will give a brief account of the advantage or disadvantage that a batsman has by default, due to his height, in scoring runs at a faster rate, i.e., in his strike rate. This exploration could help team management and selection committees when signing contracts with players.

Research question

What is the relationship between the strike rate of a batsman and the height of the batsman?

Background information

What is strike rate

Strike rate¹ is one of the most important parameters for measuring the performance of a batsman in the game of cricket. It expresses how many runs the batsman has scored relative to the number of balls he has played. The formula for calculating the strike rate is shown below:

 

\(Strike\ Rate=\frac{Runs\ Scored}{Number\ of\ balls\ played}\times100\)
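The formula above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `strike_rate` and the example innings (45 runs off 30 balls) are assumptions introduced here and are not part of the collected data.

```python
def strike_rate(runs_scored: int, balls_played: int) -> float:
    """Return the batting strike rate: runs scored per 100 balls played."""
    if balls_played <= 0:
        # The formula is undefined when no ball has been played.
        raise ValueError("balls_played must be positive")
    return runs_scored / balls_played * 100


# Hypothetical innings: 45 runs off 30 balls.
print(strike_rate(45, 30))  # 150.0
```

A strike rate above 100 therefore means that the batsman scores more than one run per ball on average.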

Physical benefits in athletics – height

Height can be an advantage for players in several games. For example, in games like football and basketball, taller players often have a better chance of performing well than comparatively shorter players.

 

In the game of cricket, taller batsmen could have a better chance when playing short balls, which would allow them to score runs off difficult deliveries as well.

Regression correlation coefficient

The regression correlation coefficient is a tool to measure the strength of the correlation between the independent variable and the dependent variable. The set of values \((x_1,y_1), (x_2,y_2), \dots, (x_n,y_n)\) is used to find the value of r as stated by the formula below:

 

\(r=\frac{n(\sum xy)-(\sum x)(\sum y)}{\sqrt{[n\sum x^2-(\sum x)^2][n\sum y^2-(\sum y)^2]}}\)

 

In the above-mentioned formula, x is the value of the independent variable for each observation, y is the value of the dependent variable for each observation, xy is the product of the independent and the dependent variable for each observation, n is the number of observations and \(\sum\) denotes the sum over all the observations of the mentioned variable.

 

By squaring the value of r, the value of the regression coefficient \(r^2\) is obtained. The value of \(r^2\) lies between 0 and 1, where 1 signifies maximum correlation and 0 signifies no correlation.

Pearson’s correlation coefficient

Pearson’s correlation coefficient is a tool to measure both the strength and the nature of the correlation between the independent variable and the dependent variable. The set of values \((x_1,y_1), (x_2,y_2), \dots, (x_n,y_n)\) is used to find the value of \(\mathfrak{R}\) as stated by the formula below:

 

 \(\mathfrak{R}=\frac{\sum (x-\bar x)(y-\bar y)}{\sqrt{\sum(x-\bar x)^2\times\sum (y-\bar y)^2}}\)

 

In the above-mentioned formula, x is the value of the independent variable for each observation, y is the value of the dependent variable for each observation, \(\bar x\) is the arithmetic mean of all the observations of the independent variable, \(\bar y\) is the arithmetic mean of all the observations of the dependent variable and \(\sum\) denotes the sum over all the observations of the mentioned variable. The value of \(\mathfrak{R}\) lies between -1 and 1. A positive value of Pearson’s correlation coefficient implies a direct relationship between the independent and the dependent variable, whereas a negative value implies an inverse relationship. If the value of the correlation coefficient is close to 1 or -1, the correlation is strong. On the other hand, if the value is close to 0, there is little or no linear correlation.
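As a minimal sketch, the two formulas given in this section (the sum form used for r and the deviation form used for \(\mathfrak{R}\)) can be written as Python functions. The function names and the four-point data set below are hypothetical and serve only as an illustration. The two forms are algebraically equivalent, so both functions should return the same value for the same data.

```python
from math import sqrt


def pearson_r_deviation(xs, ys):
    """Correlation coefficient from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


def pearson_r_sums(xs, ys):
    """Correlation coefficient from the raw sums of x, y, x^2, y^2 and xy."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical heights (cm) and strike rates, for illustration only.
heights = [160, 170, 180, 190]
rates = [90.0, 100.0, 95.0, 110.0]
print(pearson_r_deviation(heights, rates))
print(pearson_r_sums(heights, rates))
```

Both functions assume that each variable takes at least two distinct values; otherwise the denominator is zero.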

T – test

The T – test is a kind of analysis that is used to test for the existence of a correlation between an independent variable and a dependent variable. First, the T – value of the given data set is calculated. Then, depending on the type of data (for example, paired data or independent data), the T – value is looked up in the T – table, which indicates whether a correlation exists. The formula for the T – value is given below:

 

\(T\ value=\frac{|\bar x-\bar y|}{\sqrt{\frac{v_x^2}{n_x}+\frac{v_y^2}{n_y}}}\)

 

Here, \(\bar x\) is the arithmetic mean of all the observations of the independent variable, \(\bar y\) is the arithmetic mean of all the observations of the dependent variable, \(v_x^2\) is the variance of the independent variable, \(v_y^2\) is the variance of the dependent variable, \(n_x\) is the number of observations of the independent variable, and \(n_y\) is the number of observations of the dependent variable.

 

The T – value is then looked up in the T – table, which indicates whether a correlation exists. The T – table is shown below:

Figure 1 - Table On T – table

Hypothesis

Null hypothesis

It is assumed that there is no correlation between the strike rate of a batsman and the height of the batsman.

Alternate hypothesis

It is assumed that there is a correlation between the strike rate of a batsman and the height of the batsman.

Data collection

Source of data

The strike rates of different batsmen, together with their heights, have been collected from a very recently organised cricket tournament, the Indian Premier League 2020. The Indian Premier League, abbreviated as IPL T20, is a domestic cricket tournament organised by the BCCI (Board of Control for Cricket in India). Eight teams, each representing a particular city or state in India, compete in a tournament lasting two to three months, in which players from across the globe are contracted and assigned to the teams. As each match is twenty overs a side, the tournament is often abbreviated as the T20 series.

Justification on selecting the source as IPL T20

IPL T20 has been selected for the collection of data for several reasons. Firstly, although the IPL is a domestic tournament organised by the BCCI, it brings together players from across the globe. This allows the data set to contain more generalised observations rather than observations specific to a single country. Secondly, IPL T20 is one of the most recently organised tournaments. This keeps the data set up to date with the current style of playing cricket. Thirdly, the IPL is a twenty over game, which requires runs to be scored off a small number of balls. As a result, the strike rates of batsmen in this tournament will be higher than in other formats. Higher observed values make it easier to detect a correlation than smaller observed values do.

Raw data table

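| Sl. No | Batsmen | Height (cm) | Strike rate |
| --- | --- | --- | --- |
| 1 | Shakib al Hassan | 155 | 82.05 |
| 2 | Mushfiqur Rahim | 160 | 92.67 |
| 3 | Rashid Khan | 168 | 100.96 |
| 4 | Kusal Perera | 168 | 110.97 |
| 5 | Rishabh Pant | 170 | 89.23 |
| 6 | David Warner | 170 | 89.36 |
| 7 | JP Duminy | 170 | 97.22 |
| 8 | Rohit Sharma | 170 | 98.33 |
| 9 | Kane Williamson | 173 | 99.8 |
| 10 | Nicholas Pooran | 173 | 100.27 |
| 11 | Mosaddek Hossain | 174 | 106.36 |
| 12 | MS Dhoni | 175 | 87.78 |
| 13 | Mohammed Hafeez | 175 | 88.77 |
| 14 | Virat Kohli | 175 | 94.04 |
| 15 | Liton Das | 175 | 110.17 |
| 16 | Eoin Morgan | 175 | 111.07 |
| 17 | Aaron Finch | 176 | 102.21 |
| 18 | Usman Khawaja | 177 | 88.26 |
| 19 | Jonny Bairstow | 178 | 92.84 |
| 20 | Colin Munro | 178 | 97.65 |
| 21 | Shimron Hetmyer | 178 | 101.58 |
| 22 | Mohammad Saifuddin | 179 | 120.83 |
| 23 | Najibullah Zadran | 180 | 88.8 |
| 24 | Mahmudullah | 180 | 89.75 |
| 25 | Haris Sohail | 180 | 94.28 |
| 26 | Shikhar Dhawan | 180 | 103.3 |
| 27 | Jos Buttler | 180 | 122.83 |
| 28 | Avishka Fernando | 181 | 105.72 |
| 29 | Jason Roy | 182 | 115.36 |
| 30 | Glen Maxwell | 182 | 150 |
| 31 | Alex Carey | 182 | 104.45 |
| 32 | Joe Root | 183 | 89.53 |
| 33 | Hazratullah Zazai | 183 | 94.11 |
| 34 | Colin de Grandhomme | 183 | 100.52 |
| 35 | Soumya Sarkar | 183 | 101.21 |
| 36 | Hardik Pandya | 183 | 112.43 |
| 37 | Chris Woakes | 185 | 89.93 |
| 38 | Ben Stokes | 185 | 93.18 |
| 39 | Thisara Perera | 185 | 95.31 |
| 40 | Wahab Riaz | 185 | 127.53 |
| 41 | Imad Wasim | 187 | 118.24 |
| 42 | Chris Gayle | 188 | 88.32 |
| 43 | Rassie van der Dussen | 188 | 90.37 |
| 44 | Martin Guptill | 188 | 143.13 |
| 45 | David Miller | 191 | 117.94 |
| 46 | Nathan Coulter-Nile | 191 | 136.11 |
| 47 | Carlos Brathwaite | 193 | 106.2 |
| 48 | Chris Morris | 196 | 121.31 |
| 49 | Mitchell Stark | 197 | 89.47 |
| 50 | Jason Holder | 201 | 108.97 |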

Figure 2 - Table On Strike Rate Of 50 Batsman Along With Their Height (In Cm)

Processed data table

Figure 3 - Table On Processed Data Table For Strike Rate Of 50 Batsman Along With Their Height (In Cm)

Sample calculation

\(\text{Mean} = \frac{y_1+y_2+...+y_n}{n}\)

\(\text{Arithmetic Mean} = \frac{82.05+92.67+100.96+...+89.47+108.97}{50} = 103.2144\)

\(\text{Standard Deviation} = \sqrt{\frac{(\bar y-y_1)^2+(\bar y-y_2)^2+...+(\bar y-y_n)^2}{n-1}}\)

\(\text{Standard Deviation} = \sqrt{\frac{(103.2144-82.05)^2+(103.2144-92.67)^2+...+(103.2144-108.97)^2}{49}} = 14.967\)

Processed data table analysis

The mean strike rate of all the batsmen is 103.2144 and the standard deviation is 14.967. A standard deviation of this size, roughly 14.5% of the mean, shows that the strike rates are spread widely around the mean. It can therefore be said that the strike rate varies greatly from one player to another.
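The mean and standard deviation quoted above can be checked with Python's standard `statistics` module. The sketch below copies the 50 strike rates from the raw data table; the variable names are assumptions made for this illustration. The reported value of 14.967 corresponds to the sample standard deviation (division by n − 1), so `statistics.stdev` is used, and the population version (`statistics.pstdev`, division by n) is printed alongside it for comparison.

```python
import statistics

# Strike rates of the 50 batsmen, in the order of the raw data table.
strike_rates = [
    82.05, 92.67, 100.96, 110.97, 89.23, 89.36, 97.22, 98.33, 99.8, 100.27,
    106.36, 87.78, 88.77, 94.04, 110.17, 111.07, 102.21, 88.26, 92.84, 97.65,
    101.58, 120.83, 88.8, 89.75, 94.28, 103.3, 122.83, 105.72, 115.36, 150,
    104.45, 89.53, 94.11, 100.52, 101.21, 112.43, 89.93, 93.18, 95.31, 127.53,
    118.24, 88.32, 90.37, 143.13, 117.94, 136.11, 106.2, 121.31, 89.47, 108.97,
]

mean_sr = statistics.mean(strike_rates)
sample_sd = statistics.stdev(strike_rates)       # divides by n - 1
population_sd = statistics.pstdev(strike_rates)  # divides by n

print(f"mean = {mean_sr:.4f}")
print(f"sample standard deviation = {sample_sd:.3f}")
print(f"population standard deviation = {population_sd:.3f}")
```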

Graphical analysis

Linear correlation

Figure 4 - Linear Correlation Between Strike Rate And Height Of Batsman

Polynomial correlation

Figure 5 - Polynomial Correlation Between Strike Rate And Height Of Batsman

Choice of axes

The X – Axis of the graph denotes the height of the batsman measured in centimetres (independent variable).

 

The Y – Axis of the graph denotes the strike rate of the batsman (dependent variable).

Trendline for linear correlation

In this graph, a linear trendline has been obtained using the data that has been collected based on the most recent performance of the players in IPL 2020. The equation of the trendline is shown below:

 

y = 0.5913x - 3.1403

 

From the graph, it can be stated that there is a positive correlation between the strike rate and the height of the batsmen. However, a lot of outliers are seen in the graph.
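The trendline coefficients can be reproduced with the ordinary least-squares formulas, using the column sums that appear in the processed data table for the calculation of R2 (Figure 6). This is a sketch under that assumption; the variable names are illustrative, and the graphing tool that produced the trendline may round slightly differently.

```python
# Column sums taken from the processed data table (Figure 6).
n = 50
sum_x = 8994         # sum of heights (cm)
sum_y = 5160.72      # sum of strike rates
sum_x2 = 1621646     # sum of squared heights
sum_xy = 930560.2    # sum of height * strike rate

# Ordinary least-squares slope and intercept of y = slope * x + intercept.
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
intercept = (sum_y - slope * sum_x) / n

print(f"y = {slope:.4f}x + ({intercept:.4f})")
```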

Trendline for polynomial correlation

In this graph, a polynomial trendline has been obtained using the data that has been collected based on the most recent performance of the players in IPL 2020. The equation of the trendline is shown below:

 

\(y = -0.0089x^2 + 3.7722x - 287.37\)

From the graph, it can be stated that there is a positive correlation between the strike rate and the height of the batsmen. However, the slope of the curve is decreasing, which implies that beyond a certain height the strike rate will start to decrease.

Outliers

There are a lot of outliers in the range of 170 cm to 190 cm height. This may be because of several other parameters which give a partial advantage or disadvantage to the batsman. For example, if a bowler is at the top of his form and a batsman is dismissed through the skill of that bowler, it significantly affects the correlation study. Other factors are also responsible for such a high number of outliers, for example the current form of the batsman, the pitch condition and the weather conditions. All of these factors directly affect the performance of a batsman, which in turn affects the correlation study. Due to the high number of outliers, the value of the regression coefficient is 0.12. Such a small value (close to zero) indicates that there is hardly any linear correlation between the dependent and the independent variable.

Intercept for linear correlation

The Y – intercept of the graph can be studied to comment on the existence of the linear correlation. From the equation of the trendline, the Y – intercept of the trendline has been calculated:

 

y = 0.5913x − 3.1403

 

The value of y for x = 0 will be:

 

y = 0.5913 × 0 − 3.1403

 

=> y = −3.1403

 

The value of the Y – intercept is -3.1403. A negative intercept is absurd in this context, because it means that for a height of zero centimetres the strike rate would be -3.1403, whereas the formula given in the Background Information section shows that a strike rate cannot be negative. This supports the view that the relationship between the strike rate and the height of a batsman should not be linear.

Calculation of maxima - minima for polynomial correlation

From the equation of polynomial correlation, the value of maxima of the strike rate can be measured.

 

\(y = -0.0089x^2 + 3.7722x - 287.37\)

 

Differentiating both sides with respect to x, we get,

 

\(\frac{dy}{dx}=-\frac{d(0.0089x^2)}{dx}+\frac{d(3.7722x)}{dx}-\frac{d(287.37)}{dx}\)

 

\(=>\frac{dy}{dx} = −0.0178x + 3.7722 − 0\)

 

\(=>\frac{dy}{dx} = −0.0178x + 3.7722\)

 

Further, differentiating both sides with respect to x, we get,

 

\(\frac{d^2y}{dx^2}=-\frac{d(0.0178x)}{dx}+\frac{d(3.7722)}{dx}\)

 

\(\frac{d^2y}{dx^2} = − 0.0178 + 0\)

 

\(\frac{d^2y}{dx^2} = − 0.0178\)

 

As the value of \(\frac{d^2y}{dx^2}\) is negative, the stationary point of the curve is a maximum. It is found by setting the first derivative to zero:

\(\frac{dy}{dx} = 0\)

=> −0.0178x + 3.7722 = 0

=> −0.0178x = −3.7722

\(=> x=\frac{-3.7722}{-0.0178}\)

=> x = 211.92

Thus, the polynomial trendline reaches its maximum at x = 211.92 cm. In other words, a batsman with a height of 211.92 cm will have the maximum strike rate as per the polynomial correlation.
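The position of the maximum can be cross-checked numerically. The sketch below uses the vertex formula x = −b / (2a) for a quadratic y = ax² + bx + c, which gives the same point as setting the first derivative to zero; the names `a`, `b`, `c` and `trend` are assumptions introduced for this illustration. Note that the resulting height lies beyond the tallest batsman in the data set (201 cm), so the maximum is an extrapolation of the trendline.

```python
# Coefficients of the polynomial trendline y = a*x^2 + b*x + c.
a, b, c = -0.0089, 3.7722, -287.37


def trend(x: float) -> float:
    """Strike rate predicted by the polynomial trendline for height x (cm)."""
    return a * x ** 2 + b * x + c


# For a < 0 the parabola opens downwards, so the vertex is a maximum.
x_vertex = -b / (2 * a)
y_vertex = trend(x_vertex)

print(f"height at maximum = {x_vertex:.2f} cm")
print(f"predicted strike rate at maximum = {y_vertex:.2f}")
```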

Calculation of correlation coefficient for linear trendline

Calculation of regression correlation coefficient

The processed data table has five headers: \(x\), \(y\), \(x^2\), \(y^2\) and \(xy\). The height of the batsman is represented by x and the strike rate of the batsman by y. The remaining headers have their usual meaning. The calculation of the \(r^2\) correlation coefficient is shown to explore the efficiency and stability of the trendline and the correlation.

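| \(x\) | \(y\) | \(x^2\) | \(y^2\) | \(xy\) |
| --- | --- | --- | --- | --- |
| 155 | 82.05 | 24025 | 6732.2025 | 12717.75 |
| 160 | 92.67 | 25600 | 8587.7289 | 14827.2 |
| 168 | 100.96 | 28224 | 10192.9216 | 16961.28 |
| 168 | 110.97 | 28224 | 12314.3409 | 18642.96 |
| 170 | 89.23 | 28900 | 7961.9929 | 15169.1 |
| 170 | 89.36 | 28900 | 7985.2096 | 15191.2 |
| 170 | 97.22 | 28900 | 9451.7284 | 16527.4 |
| 170 | 98.33 | 28900 | 9668.7889 | 16716.1 |
| 173 | 99.8 | 29929 | 9960.04 | 17265.4 |
| 173 | 100.27 | 29929 | 10054.0729 | 17346.71 |
| 174 | 106.36 | 30276 | 11312.4496 | 18506.64 |
| 175 | 87.78 | 30625 | 7705.3284 | 15361.5 |
| 175 | 88.77 | 30625 | 7880.1129 | 15534.75 |
| 175 | 94.04 | 30625 | 8843.5216 | 16457 |
| 175 | 110.17 | 30625 | 12137.4289 | 19279.75 |
| 175 | 111.07 | 30625 | 12336.5449 | 19437.25 |
| 176 | 102.21 | 30976 | 10446.8841 | 17988.96 |
| 177 | 88.26 | 31329 | 7789.8276 | 15622.02 |
| 178 | 92.84 | 31684 | 8619.2656 | 16525.52 |
| 178 | 97.65 | 31684 | 9535.5225 | 17381.7 |
| 178 | 101.58 | 31684 | 10318.4964 | 18081.24 |
| 179 | 120.83 | 32041 | 14599.8889 | 21628.57 |
| 180 | 88.8 | 32400 | 7885.44 | 15984 |
| 180 | 89.75 | 32400 | 8055.0625 | 16155 |
| 180 | 94.28 | 32400 | 8888.7184 | 16970.4 |
| 180 | 103.3 | 32400 | 10670.89 | 18594 |
| 180 | 122.83 | 32400 | 15087.2089 | 22109.4 |
| 181 | 105.72 | 32761 | 11176.7184 | 19135.32 |
| 182 | 115.36 | 33124 | 13307.9296 | 20995.52 |
| 182 | 150 | 33124 | 22500 | 27300 |
| 182 | 104.45 | 33124 | 10909.8025 | 19009.9 |
| 183 | 89.53 | 33489 | 8015.6209 | 16383.99 |
| 183 | 94.11 | 33489 | 8856.6921 | 17222.13 |
| 183 | 100.52 | 33489 | 10104.2704 | 18395.16 |
| 183 | 101.21 | 33489 | 10243.4641 | 18521.43 |
| 183 | 112.43 | 33489 | 12640.5049 | 20574.69 |
| 185 | 89.93 | 34225 | 8087.4049 | 16637.05 |
| 185 | 93.18 | 34225 | 8682.5124 | 17238.3 |
| 185 | 95.31 | 34225 | 9083.9961 | 17632.35 |
| 185 | 127.53 | 34225 | 16263.9009 | 23593.05 |
| 187 | 118.24 | 34969 | 13980.6976 | 22110.88 |
| 188 | 88.32 | 35344 | 7800.4224 | 16604.16 |
| 188 | 90.37 | 35344 | 8166.7369 | 16989.56 |
| 188 | 143.13 | 35344 | 20486.1969 | 26908.44 |
| 191 | 117.94 | 36481 | 13909.8436 | 22526.54 |
| 191 | 136.11 | 36481 | 18525.9321 | 25997.01 |
| 193 | 106.2 | 37249 | 11278.44 | 20496.6 |
| 196 | 121.31 | 38416 | 14716.1161 | 23776.76 |
| 197 | 89.47 | 38809 | 8004.8809 | 17625.59 |
| 201 | 108.97 | 40401 | 11874.4609 | 21902.97 |
| \(\sum x = 8994\) | \(\sum y = 5160.72\) | \(\sum x^2 = 1621646\) | \(\sum y^2 = 543638.162\) | \(\sum xy = 930560.2\) |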

Figure 6 - Table On Processed Data For Calculation Of R2

The formula for the regression coefficient given in the background information has been used to find the correlation coefficient. Here, x is the value of the independent variable for each observation, y is the value of the dependent variable for each observation, xy is the product of the independent and the dependent variable for each observation, n is the number of observations and \(\sum\) denotes the sum over all the observations of the mentioned variable.

 

Calculation -

 

\(r =\frac{n(\sum xy)-(\sum x)(\sum y)}{\sqrt{[n\sum x^2-(\sum x)^2][n\sum y^2-(\sum y)^2]}}\)

 

\(=>r =\frac{50(930560.2) − (8994)(5160.72)}{\sqrt{[50 × 1621646 − (8994)^2][50 × 543638.162 − (5160.72)^2]}}\)

 

=> r = 0.348

 

=> \(r^2\) = 0.1212
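The arithmetic above can be verified with a short script that substitutes the column sums from Figure 6 into the same formula. The variable names are assumptions made for this sketch.

```python
from math import sqrt

# Column sums from the processed data table (Figure 6).
n = 50
sum_x = 8994
sum_y = 5160.72
sum_x2 = 1621646
sum_y2 = 543638.162
sum_xy = 930560.2

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))

r = numerator / denominator
r_squared = r ** 2

print(f"r = {r:.3f}")
print(f"r^2 = {r_squared:.4f}")
```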

Analysis

The value of the regression coefficient is 0.12. Such a small value (close to zero) indicates that there is hardly any linear correlation between the dependent and the independent variable.

Calculation of Pearson’s correlation coefficient

The processed data table for the calculation of Pearson’s correlation coefficient has seven headers: \(x\), \(y\), \(x-\bar x\), \(y-\bar y\), \((x-\bar x)(y-\bar y)\), \((x-\bar x)^2\) and \((y-\bar y)^2\). The height of the batsman is represented by x and the strike rate of the batsman by y. \(\bar x\) is the arithmetic mean of all the observations of the height of the batsmen and \(\bar y\) is the arithmetic mean of all the observations of the strike rate of the batsmen. The remaining headers have their usual meaning. The calculation of Pearson’s correlation coefficient is shown to explore the efficiency and stability of the trendline and the correlation.

| \(x\) | \(y\) | \(x-\bar x\) | \(y-\bar y\) | \((x-\bar x)(y-\bar y)\) | \((x-\bar x)^2\) | \((y-\bar y)^2\) |
| --- | --- | --- | --- | --- | --- | --- |
| 155 | 82.05 | -24.88 | -21.1644 | 526.570272 | 619.0144 | 447.931827 |
| 160 | 92.67 | -19.88 | -10.5444 | 209.622672 | 395.2144 | 111.184371 |
| 168 | 100.96 | -11.88 | -2.2544 | 26.782272 | 141.1344 | 5.08231936 |
| 168 | 110.97 | -11.88 | 7.7556 | -92.136528 | 141.1344 | 60.1493314 |
| 170 | 89.23 | -9.88 | -13.9844 | 138.165872 | 97.6144 | 195.563443 |
| 170 | 89.36 | -9.88 | -13.8544 | 136.881472 | 97.6144 | 191.944399 |
| 170 | 97.22 | -9.88 | -5.9944 | 59.224672 | 97.6144 | 35.9328314 |
| 170 | 98.33 | -9.88 | -4.8844 | 48.257872 | 97.6144 | 23.8573634 |
| 173 | 99.8 | -6.88 | -3.4144 | 23.491072 | 47.3344 | 11.6581274 |
| 173 | 100.27 | -6.88 | -2.9444 | 20.257472 | 47.3344 | 8.66949136 |
| 174 | 106.36 | -5.88 | 3.1456 | -18.496128 | 34.5744 | 9.89479936 |
| 175 | 87.78 | -4.88 | -15.4344 | 75.319872 | 23.8144 | 238.220703 |
| 175 | 88.77 | -4.88 | -14.4444 | 70.488672 | 23.8144 | 208.640691 |
| 175 | 94.04 | -4.88 | -9.1744 | 44.771072 | 23.8144 | 84.1696154 |
| 175 | 110.17 | -4.88 | 6.9556 | -33.943328 | 23.8144 | 48.3803714 |
| 175 | 111.07 | -4.88 | 7.8556 | -38.335328 | 23.8144 | 61.7104514 |
| 176 | 102.21 | -3.88 | -1.0044 | 3.897072 | 15.0544 | 1.00881936 |
| 177 | 88.26 | -2.88 | -14.9544 | 43.068672 | 8.2944 | 223.634079 |
| 178 | 92.84 | -1.88 | -10.3744 | 19.503872 | 3.5344 | 107.628175 |
| 178 | 97.65 | -1.88 | -5.5644 | 10.461072 | 3.5344 | 30.9625474 |
| 178 | 101.58 | -1.88 | -1.6344 | 3.072672 | 3.5344 | 2.67126336 |
| 179 | 120.83 | -0.88 | 17.6156 | -15.501728 | 0.7744 | 310.309363 |
| 180 | 88.8 | 0.12 | -14.4144 | -1.729728 | 0.0144 | 207.774927 |
| 180 | 89.75 | 0.12 | -13.4644 | -1.615728 | 0.0144 | 181.290067 |
| 180 | 94.28 | 0.12 | -8.9344 | -1.072128 | 0.0144 | 79.8235034 |
| 180 | 103.3 | 0.12 | 0.0856 | 0.010272 | 0.0144 | 0.00732736 |
| 180 | 122.83 | 0.12 | 19.6156 | 2.353872 | 0.0144 | 384.771763 |
| 181 | 105.72 | 1.12 | 2.5056 | 2.806272 | 1.2544 | 6.27803136 |
| 182 | 115.36 | 2.12 | 12.1456 | 25.748672 | 4.4944 | 147.515599 |
| 182 | 150 | 2.12 | 46.7856 | 99.185472 | 4.4944 | 2188.89237 |
| 182 | 104.45 | 2.12 | 1.2356 | 2.619472 | 4.4944 | 1.52670736 |
| 183 | 89.53 | 3.12 | -13.6844 | -42.695328 | 9.7344 | 187.262803 |
| 183 | 94.11 | 3.12 | -9.1044 | -28.405728 | 9.7344 | 82.8900994 |
| 183 | 100.52 | 3.12 | -2.6944 | -8.406528 | 9.7344 | 7.25979136 |
| 183 | 101.21 | 3.12 | -2.0044 | -6.253728 | 9.7344 | 4.01761936 |
| 183 | 112.43 | 3.12 | 9.2156 | 28.752672 | 9.7344 | 84.9272834 |
| 185 | 89.93 | 5.12 | -13.2844 | -68.016128 | 26.2144 | 176.475283 |
| 185 | 93.18 | 5.12 | -10.0344 | -51.376128 | 26.2144 | 100.689183 |
| 185 | 95.31 | 5.12 | -7.9044 | -40.470528 | 26.2144 | 62.4795394 |
| 185 | 127.53 | 5.12 | 24.3156 | 124.495872 | 26.2144 | 591.248403 |
| 187 | 118.24 | 7.12 | 15.0256 | 106.982272 | 50.6944 | 225.768655 |
| 188 | 88.32 | 8.12 | -14.8944 | -120.94253 | 65.9344 | 221.843151 |
| 188 | 90.37 | 8.12 | -12.8444 | -104.29653 | 65.9344 | 164.978611 |
| 188 | 143.13 | 8.12 | 39.9156 | 324.114672 | 65.9344 | 1593.25512 |
| 191 | 117.94 | 11.12 | 14.7256 | 163.748672 | 123.6544 | 216.843295 |
| 191 | 136.11 | 11.12 | 32.8956 | 365.799072 | 123.6544 | 1082.1205 |
| 193 | 106.2 | 13.12 | 2.9856 | 39.171072 | 172.1344 | 8.91380736 |
| 196 | 121.31 | 16.12 | 18.0956 | 291.701072 | 259.8544 | 327.450739 |
| 197 | 89.47 | 17.12 | -13.7444 | -235.30413 | 293.0944 | 188.908531 |
| 201 | 108.97 | 21.12 | 5.7556 | 121.558272 | 446.0544 | 33.1269314 |

Figure 7 - Table On Processed Data Table For Calculation Of Pearson’s Correlation Coefficient In Graph 1

The formula for Pearson’s correlation coefficient given in the background information has been used to find the correlation coefficient. Here, x is the value of the independent variable for each observation, y is the value of the dependent variable for each observation, \(\bar x\) is the arithmetic mean of all the observations of the independent variable, \(\bar y\) is the arithmetic mean of all the observations of the dependent variable and \(\sum\) denotes the sum over all the observations of the mentioned variable.

 

Calculation -

 

\(\bar x=\frac{∑x}{N}=\frac{8994}{50} = 179.88\)

 

\(\bar y=\frac{∑y}{N}=\frac{5160.72}{50} = 103.2144\)

 

\(∑(x-\bar x)(y-\bar y)= 2249.8864\)

 

\(∑(x-\bar x)^2= 3805.28\)

 

\(∑(y-\bar y)^2= 10977.544\)

 

Let, the Pearson’s Correlation Coefficient be \(\mathfrak{R}\).

 

\(\mathfrak{R}=\frac{∑(x-\bar x)(y-\bar y)}{\sqrt{∑(x-\bar x)^2\times∑(y-\bar y)^2}}\)

 

\(\mathfrak{R}=\frac{2249.8864}{\sqrt{3805.28 × 10977.544}} = 0.3481\)

 

\(\mathfrak{R}=0.348\)

Analysis

The value of Pearson’s correlation coefficient is 0.348. As it is a positive value, it can be stated that the correlation is increasing in nature, i.e., with an increase in the height of a batsman, the strike rate also increases. This might be because taller batsmen have an advantage in playing short-pitched balls, which allows them to score runs off a wider range of deliveries. However, the value of Pearson’s correlation coefficient is much closer to zero than to 1, which signifies that the correlation is weak.

Evaluation of hypothesis

Processed data table

There are two headers of the processed data table expressed as x, and y. The height of the batsman is represented by x and the strike rate of batsman is represented by y.

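| \(x\) | \(y\) |
| --- | --- |
| 155 | 82.05 |
| 160 | 92.67 |
| 168 | 100.96 |
| 168 | 110.97 |
| 170 | 89.23 |
| 170 | 89.36 |
| 170 | 97.22 |
| 170 | 98.33 |
| 173 | 99.8 |
| 173 | 100.27 |
| 174 | 106.36 |
| 175 | 87.78 |
| 175 | 88.77 |
| 175 | 94.04 |
| 175 | 110.17 |
| 175 | 111.07 |
| 176 | 102.21 |
| 177 | 88.26 |
| 178 | 92.84 |
| 178 | 97.65 |
| 178 | 101.58 |
| 179 | 120.83 |
| 180 | 88.8 |
| 180 | 89.75 |
| 180 | 94.28 |
| 180 | 103.3 |
| 180 | 122.83 |
| 181 | 105.72 |
| 182 | 115.36 |
| 182 | 150 |
| 182 | 104.45 |
| 183 | 89.53 |
| 183 | 94.11 |
| 183 | 100.52 |
| 183 | 101.21 |
| 183 | 112.43 |
| 185 | 89.93 |
| 185 | 93.18 |
| 185 | 95.31 |
| 185 | 127.53 |
| 187 | 118.24 |
| 188 | 88.32 |
| 188 | 90.37 |
| 188 | 143.13 |
| 191 | 117.94 |
| 191 | 136.11 |
| 193 | 106.2 |
| 196 | 121.31 |
| 197 | 89.47 |
| 201 | 108.97 |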

Figure 8 - Table On Processed Data For Calculation Of The T – Value

The formula of the T – value is shown below:

 

\(T \,value = \frac{|\bar x-\bar y|}{\sqrt{\frac{v_x^2}{n_x}+\frac{v_y^2}{n_y}}}\)

 

Here, \(\bar x\) is the arithmetic mean of all the observations of the height of the batsmen, \(\bar y\) is the arithmetic mean of all the observations of the strike rate of the batsmen, \(v_x^2\) is the variance of the height of the batsmen, \(v_y^2\) is the variance of the strike rate of the batsmen, \(n_x\) is the number of observations of the height of the batsmen, and \(n_y\) is the number of observations of the strike rate of the batsmen.

Calculation of t – value

\(\bar x=\frac{x_1+x_2+...+x_n}{n_x}\)

\(=>\bar x=\frac{155 + 160 + ⋯ + 197 + 201}{50} = 179.88\)

\(\bar y=\frac{y_1+y_2+...+y_n}{n_y}\)

\(=>\bar y=\frac{82.05 + 92.67 + ⋯ + 89.47 + 108.97}{50} = 103.2144\)

\(v_x^2=\frac{(\bar x-x_1)^2+(\bar x-x_2)^2+...+(\bar x-x_n)^2}{n_x-1}\)

\(=>v_x^2=\frac{(179.88 − 155)^2 + (179.88 − 160)^2+ ⋯ + (179.88 − 201)^2}{49} = 77.65877\)

\(v_y^2=\frac{(\bar y-y_1)^2+(\bar y-y_2)^2+...+(\bar y-y_n)^2}{n_y-1}\)

\(=>v_y^2=\frac{(103.2144 − 82.05)^2 + (103.2144 − 92.67)^2 + ⋯ + (103.2144 − 108.97)^2}{49} = 224.03151\)

 

Therefore, the T – value can be computed as -

 

\(T\ value =\frac{|179.88 − 103.2144|}{\sqrt{\frac{77.65877}{50}+\frac{224.03151}{50}}}\)

 

\(=\frac{76.6656}{\sqrt{1.5531754+4.4806302}}\)

 

\(=\frac{76.6656}{\sqrt{6.0338056}}\)

 

\(=\frac{76.6656}{2.45638}\)

 

= 31.210798

Calculation of degree of freedom

Degree of Freedom = \(n_x + n_y - 2\) = 50 + 50 - 2 = 98
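The T – value and the degree of freedom computed above can be reproduced from the summary statistics alone. The sketch below simply evaluates the stated formula, which has the form of a two-sample t statistic for a difference of means; the variable names are assumptions made for this illustration.

```python
from math import sqrt

# Summary statistics taken from the calculation above.
mean_x, mean_y = 179.88, 103.2144   # mean height (cm), mean strike rate
var_x, var_y = 77.65877, 224.03151  # variances of height and strike rate
n_x = n_y = 50                      # number of observations of each variable

standard_error = sqrt(var_x / n_x + var_y / n_y)
t_value = abs(mean_x - mean_y) / standard_error
degrees_of_freedom = n_x + n_y - 2

print(f"T value = {t_value:.4f}")
print(f"degrees of freedom = {degrees_of_freedom}")
```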

Result of t – test

The T – value can be compared with the table of values of T mentioned in the Background Information section. From that table, it is concluded that the null hypothesis is accepted and the alternate hypothesis is rejected. Thus, it can be stated that there is no correlation between the height of a batsman and the strike rate of the batsman.

Conclusion

There is no profound correlation between the height of a batsman (measured in cm) and his strike rate in the game of cricket.

  • The average strike rate of all the batsmen studied in this correlative analysis is 103.2144.
  • The standard deviation of the strike rates of the batsmen is 14.967. Such a high value of standard deviation suggests that the strike rate varies greatly from one batsman to another.
  • A linear correlation trendline was observed between the height of a batsman and his strike rate. The equation of the trendline was: y = 0.5913x - 3.1403. However, due to the very weak correlation given by the value of the regression correlation coefficient (0.1212), the correlation was rejected.
  • A polynomial correlation trendline was observed between the height of a batsman and his strike rate. The equation of the trendline was: \(y = -0.0089x^2 + 3.7722x - 287.37\). However, again due to the very weak correlation given by the value of the regression correlation coefficient (0.1266), the correlation was rejected.
  • The value of the Y – intercept of the linear correlation trendline was negative, which is absurd, as a strike rate cannot be negative. This also supports the claim that there exists no correlation between the height and the strike rate of the batsman.
  • The polynomial correlation trendline reaches its maximum at a height of 211.92 cm. Thus, a batsman with a height of 211.92 cm will have the maximum strike rate as given by the polynomial trendline.
  • The value of the T – test also supports the claim that the null hypothesis is true for the above-performed correlative study between the height and the strike rate of batsmen.

Reflection

In this investigation, several processes and mathematical tools have been used to find the correlation along with its strength. The choice of tournament is one of the most important strengths of this investigation. It has provided a data sheet with accurate observations of strike rate and height based on the current form of cricket. The use of two different correlation coefficients, the regression and Pearson’s correlation coefficients, has provided the strength and the nature of the correlation. Furthermore, the values of the mean and the standard deviation have enabled the investigation to analyse the variation of the strike rate (dependent variable) in the observed data sheet. Lastly, the use of the T – test has provided the conclusion regarding the correlation.

 

However, a few weaknesses were observed during this mathematical investigation. As cricket is a game of uncertainty, there are a lot of parameters which govern the strike rate of a batsman. A few such parameters are the pitch quality, the weather, the bowler, etc. Different batsmen also have different cricketing techniques, which is another parameter that governs the strike rate. As there are a lot of variables apart from height affecting the dependent variable (strike rate), the correlation study cannot be carried out efficiently. In order to perform an efficient correlative analysis on the research question, all of these parameters must be controlled or held constant.

Bibliography

  • ‘Batting Strike Rate (SR) Calculator (Cricket)’. Captain Calculator, https://captaincalculator.com/sports/cricket/batting-strike-rate-calculator/. Accessed 23 Nov. 2020.
  • ‘The Advantages of Short Soccer Players’. Sports Rec, https://www.sportsrec.com/1006527-advantages-short-soccer-players.html. Accessed 23 Nov. 2020.
  • Correlation. http://www.stat.yale.edu/Courses/1997-98/101/correl.htm. Accessed 22 Nov. 2020.
  • Data Analysis Pearson's Correlation Coefficient. http://learntech.uwe.ac.uk/da/default.aspx?pageid=1442. Accessed 22 Nov. 2020.
  • ‘T Test (Student's T-Test): Definition and Examples’. Statistics How To, https://www.statisticshowto.com/probability-and-statistics/t-test/. Accessed 23 Nov. 2020.
  • https://www.sjsu.edu/faculty/gerstman/StatPrimer/t-table.pdf
  • IPLT20.Com - Indian Premier League Official Website. https://www.iplt20.com/. Accessed 23 Nov. 2020.
  • ‘Board of Control for Cricket in India’. The Board of Control for Cricket in India, http://www.bcci.tv/. Accessed 23 Nov. 2020.