Mathematics AI SL's Sample Internal Assessment

Investigating the relationship between a team's spending during the transfer window and their performance in two consecutive English Premier League (EPL) seasons

5/7
10 mins read
Candidate Name: N/A
Candidate Number: N/A
Session: N/A
Word count: 1,910

Introduction

As a football player and a devoted supporter of the English Premier League club Arsenal, I have developed an interest in the relationship between the amount of money a team spends on buying players during the summer (June-August) transfer window and its position in the table at the end of the season. This investigation aims to determine whether a team's input (money spent) in the transfer window is a determining factor in the club's overall performance in the league. Exploring this topic will allow me to examine the link between expenditure and standings in the English Premier League (EPL); the approach could then be extended to other football leagues and used as a predictor of performance.

 

I expect a weak negative relationship between the variables (amount of money spent and final league position) because I have noticed that bigger clubs, which realistically spend more, usually finish higher up the table than smaller clubs, which understandably spend less. I expect the relationship to be negative because moving up the table is represented by a decreasing position value, with 1st being the best position and 20th the worst. I mention "bigger clubs", but what makes a club "big" in the EPL? A club is considered a "big club" based on several factors:

  1. Historical success of the club in various competitions such as the EPL, UEFA Champions League, FA Cup, etc.
  2. The club's fan base
  3. Financial resources of the club

Although these factors are commonly used to determine a "big" club, they can be subjective and vary over time (Das, 2023). Hence, a "small" club can be considered one that ranks low on the factors stated above.

 

However, to simplify the process and test my hypothesis, I am going to use the famous "Big Six" of the EPL (Arsenal, Liverpool, Manchester City, Manchester United, Chelsea, and Tottenham) as representatives of the term "big clubs", and compare their expenditure and standings with those of the rest of the clubs in the EPL (Kelly, 2021).

Data collection

To conduct my investigation, I used a reliable website, transfermarkt.com, which contains up-to-date documentation of transfer information for various clubs and leagues worldwide. I will look at two Premier League seasons, 20/21 and 21/22, conducting a separate investigation for each and comparing the findings so that the conclusion does not rest on a single season.

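| Position | Club | Amount spent | Club value | Major trophies won |
|---|---|---|---|---|
| 1 | Man City | €167.40m | €1.06bn | 22 |
| 2 | Man Utd | €62.50m | €758.10m | 42 |
| 3 | Liverpool | €79.70m | €1.03bn | 43 |
| 4 | Chelsea | €247.20m | €880.10m | 25 |
| 5 | Leicester | €59.40m | €420.30m | 5 |
| 6 | West Ham | €27.70m | €295.15m | 4 |
| 7 | Tottenham | €110.50m | €721.55m | 17 |
| 8 | Arsenal | €84.00m | €599.35m | 31 |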

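| Position | Club | Amount spent | Club value | Major trophies won |
|---|---|---|---|---|
| 1 | Man City | €117.50m | €1.04bn | 23 |
| 2 | Liverpool | €40.00m | €879.50m | 45 |
| 3 | Chelsea | €118.00m | €881.50m | 25 |
| 4 | Tottenham | €66.90m | €697.00m | 17 |
| 5 | Arsenal | €165.60m | €548.50m | 31 |
| 6 | Man Utd | €142.00m | €937.25m | 42 |
| 7 | West Ham | €74.50m | €354.75m | 4 |
| 8 | Leicester | €67.60m | €550.10m | 5 |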

Mathematical Approach

Linear Correlation and Pearson's Product-Moment Correlation Coefficient (PPMCC)

Linear correlation is a measure of the relationship between two variables, which can be represented in a scatter plot. In my investigation, the two variables are the amount of money spent by a club and the position of the team at the end of the season. When the data are plotted on a scatter diagram, the proximity of the points to the line of best fit indicates the strength of the linear relationship. The strength of the relationship is quantified using a coefficient r, the Pearson Product-Moment Correlation Coefficient (PPMCC).
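
For reference, one standard way of writing the coefficient for n paired observations is sketched below. The symbols \(x_i\) (position), \(y_i\) (amount spent), \(\bar{x}\), \(\bar{y}\) and n are notation introduced for this illustration and do not appear elsewhere in the investigation.

\[r=\frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sum_{i=1}^{n}(y_i-\bar{y})^2}}\]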

 

The numerical value, r, always lies between -1 and 1, where:

  • -1 represents a perfect negative correlation
  • 0 represents no correlation
  • 1 represents a perfect positive correlation

The strength of the correlation depends on how close r is to either -1 or 1. When:

  • r is between 0 and 0.25 it is a very weak correlation
  • r is between 0.25 and 0.5 it is a weak correlation
  • r is between 0.5 and 0.75 it is a moderate correlation
  • r is between 0.75 and 1 it is a strong correlation

The above also applies when the value is on the negative scale.
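
The bands above can be written as a small lookup. This is a minimal Python sketch: the function name `correlation_strength` is hypothetical, and assigning a value that falls exactly on a boundary (for example 0.5) to the stronger band is an assumption, since the list does not say which band such a value belongs to.

```python
def correlation_strength(r):
    """Label a PPMCC value using the four bands listed above; the sign is ignored."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)
    # Assumption: a value exactly on a boundary goes to the stronger band.
    if size < 0.25:
        return "very weak"
    if size < 0.5:
        return "weak"
    if size < 0.75:
        return "moderate"
    return "strong"


print(correlation_strength(-0.577))  # moderate
```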

Figure 3. Scatter diagram showing the relationship between money spent by a club and the club's position at the end of the 20/21 English Premier League season

Where \(r=-0.577\) (See Appendix C)

 

Because the r value lies between -0.5 and -0.75, the result for the 20/21 season suggests a moderate negative correlation between the amount spent and the position of the team in the EPL. The negative gradient of the trendline in the scatter diagram above shows the same negative relationship. The moderate PPMCC value suggests that clubs that spent less tended to finish lower down the table (representing poorer performance). However, to check whether this trend recurs across seasons, I conducted the same test on the 21/22 season to compare the coefficients.
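
The 20/21 coefficient can be checked by hand from the Appendix A figures. The following is a minimal Python sketch, not the calculator procedure used in the investigation: `pearson_r` is a hypothetical helper, and the spending list is the "Amount spent" column of Appendix A in millions of euros, ordered by final position.

```python
from math import sqrt

# Amount spent (millions of euros), ordered by final position 1st to 20th,
# copied from Appendix A (20/21 season).
spent_2020_21 = [167.40, 62.50, 79.70, 247.20, 59.40, 27.70, 110.50, 84.00,
                 127.80, 74.37, 85.50, 39.00, 84.59, 18.90, 37.30, 21.90,
                 1.10, 37.25, 40.45, 62.70]
positions = list(range(1, 21))


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


r_2020_21 = pearson_r(positions, spent_2020_21)
print(round(r_2020_21, 4))
```

The printed value should agree closely with the coefficient reported for this season.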

Figure 4. Scatter diagram showing the relationship between money spent by a club and the club's position at the end of the 21/22 English Premier League season

Where \(r=-0.547\) (See Appendix D)

 

The r value again lies between -0.5 and -0.75, suggesting a moderate negative correlation between the variables. Although it is slightly weaker than the previous season's coefficient, both values suggest that lower spending in the summer transfer window is associated with poorer performance, represented by a lower standing in the EPL table.

 

The results for the 20/21 and 21/22 EPL seasons support my hypothesis of a negative relationship: the trend is a recurring one and suggests some dependence of a team's standing upon the money spent on transfers. However, I need to test this dependence to confirm the observation.

Chi-Square Test for Independence (χ²)

The χ² test for independence is used to determine whether two sets of data are independent of each other or not. Having observed some dependence between the variables of my investigation, I chose the χ² test for independence as a suitable method to verify this observation and to examine the relationship between the variables further.
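
For reference, the test statistic and degrees of freedom are conventionally written as below. The symbols O (observed count) and E (expected count) are notation introduced here for illustration, and the sum runs over every cell of the contingency table.

\[\chi^2=\sum\frac{(O-E)^2}{E},\qquad df=(\text{rows}-1)(\text{columns}-1)\]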

 

To conduct this test I categorized the EPL clubs in terms of performance and spending, where I broke down performance into either good, moderate, or poor, and broke down spending into high, moderate, and low.

 

Below are the conditions a team had to meet to fall into each group.

 

For performance:

  • Good performance- top 6 clubs
  • Moderate performance- clubs between 7th and 14th position
  • Poor performance- bottom 6 clubs

For spending:

  • High spending- any team spending over €100 million
  • Moderate spending- any team spending between €50 and €100 million
  • Low spending- any team spending below €50 million
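
The grouping rules can be sketched in Python to rebuild the 20/21 contingency table from the Appendix A spending figures. Some assumptions are made here: the helper names are hypothetical, the moderate performance band is taken as 7th to 14th so that the three groups hold 6, 8 and 6 of the 20 clubs, and a club spending exactly €50 million or €100 million would be counted as moderate (no club in the data sits on either boundary).

```python
# Amount spent (millions of euros), ordered by final position 1st to 20th,
# copied from Appendix A (20/21 season).
spent_2020_21 = [167.40, 62.50, 79.70, 247.20, 59.40, 27.70, 110.50, 84.00,
                 127.80, 74.37, 85.50, 39.00, 84.59, 18.90, 37.30, 21.90,
                 1.10, 37.25, 40.45, 62.70]


def performance_band(position):
    # Assumed bands: top 6, 7th to 14th, bottom 6 (15th to 20th).
    if position <= 6:
        return "Good"
    if position <= 14:
        return "Moderate"
    return "Poor"


def spending_band(amount_m):
    # Assumption: exactly 50 or 100 counts as moderate spending.
    if amount_m > 100:
        return "High"
    if amount_m >= 50:
        return "Moderate"
    return "Low"


def contingency_table(spent_by_position):
    """Rows: Good, Moderate, Poor performance. Columns: High, Moderate, Low spending."""
    rows = ["Good", "Moderate", "Poor"]
    cols = ["High", "Moderate", "Low"]
    counts = {row: {col: 0 for col in cols} for row in rows}
    for position, amount in enumerate(spent_by_position, start=1):
        counts[performance_band(position)][spending_band(amount)] += 1
    return [[counts[row][col] for col in cols] for row in rows]


observed_2020_21 = contingency_table(spent_2020_21)
print(observed_2020_21)  # [[2, 3, 1], [2, 4, 2], [0, 1, 5]]
```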

I used a 1% (0.01) level of significance to minimize the chance of incorrectly rejecting the null hypothesis. With 4 degrees of freedom, the critical value is 13.277, which was obtained from a Chi-Square distribution table by Turney (2022).
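
The table value can be cross-checked numerically. The sketch below is an illustration rather than the method used in the investigation: it assumes 4 degrees of freedom (a 3 by 3 table gives (3-1)(3-1)=4), uses the closed-form upper-tail probability that holds for that case only, and finds the critical value by bisection. Both function names are hypothetical.

```python
from math import exp


def chi2_upper_tail_df4(x):
    """P(X > x) for a chi-square variable with 4 degrees of freedom (closed form)."""
    return exp(-x / 2) * (1 + x / 2)


def critical_value_df4(alpha, low=0.0, high=100.0, tolerance=1e-9):
    # The upper-tail probability falls as x grows, so bisection narrows in on
    # the x whose tail area equals alpha.
    while high - low > tolerance:
        mid = (low + high) / 2
        if chi2_upper_tail_df4(mid) > alpha:
            low = mid
        else:
            high = mid
    return (low + high) / 2


print(round(critical_value_df4(0.01), 3))  # 13.277
```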

 

The null hypothesis and alternative hypothesis:
\(H_0\): The performance of a team is independent of its spending
\(H_1\): The performance of a team is not independent of its spending

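| | High Spending | Moderate Spending | Low Spending | Total |
|---|---|---|---|---|
| Good Performance | 2 | 3 | 1 | 6 |
| Moderate Performance | 2 | 4 | 2 | 8 |
| Poor Performance | 0 | 1 | 5 | 6 |
| Total | 4 | 8 | 8 | 20 |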

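I calculated the expected value of each cell using the formula: \(E=\frac{\text{row total}\times\text{column total}}{\text{grand total}}\). For example, for good performance and high spending, \(E=\frac{6\times 4}{20}=1.2\).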
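
In a test of independence, the usual expected count for a cell is its row total multiplied by its column total, divided by the grand total. A minimal Python sketch of that calculation for the 20/21 table follows; `expected_counts` is a hypothetical helper name, and the observed counts are copied from the table above.

```python
observed_2020_21 = [
    [2, 3, 1],  # Good performance: high, moderate, low spending
    [2, 4, 2],  # Moderate performance
    [0, 1, 5],  # Poor performance
]


def expected_counts(observed):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in column_totals] for r in row_totals]


for row in expected_counts(observed_2020_21):
    print([round(value, 2) for value in row])
```

Every expected count in this table comes out below 5, the minimum commonly quoted for the test, so the result is best read with some caution.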

 

Below are the results of the χ² test that I got using my GDC:

 

\[\begin{aligned}&\chi^2=7.083\\&p=0.132\\&df=4\end{aligned}\]
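
These calculator results can be reproduced from the observed counts. The following Python sketch is an independent check under stated assumptions, not the GDC routine itself: the function names are hypothetical, and the p-value formula is the closed form that is valid only for 4 degrees of freedom.

```python
from math import exp

# Observed counts for 20/21: rows are good, moderate, poor performance;
# columns are high, moderate, low spending.
observed_2020_21 = [[2, 3, 1], [2, 4, 2], [0, 1, 5]]


def chi_square_independence(observed):
    """Return the chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * column_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


def p_value_df4(statistic):
    # Closed-form upper-tail probability, valid only for 4 degrees of freedom.
    return exp(-statistic / 2) * (1 + statistic / 2)


statistic, df = chi_square_independence(observed_2020_21)
print(round(statistic, 3), df, round(p_value_df4(statistic), 3))  # 7.083 4 0.132
```

Applying the same function to the 21/22 counts `[[4, 1, 1], [0, 5, 3], [0, 3, 3]]` gives a statistic of about 11.96 and a p-value of about 0.018.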

 

Following the result, I did not reject \(H_0\) because \(\chi^2=7.083<13.277\) (the critical value). This result suggests no significant association between the variables in the 20/21 EPL season.

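| | High Spending | Moderate Spending | Low Spending | Total |
|---|---|---|---|---|
| Good Performance | 4 | 1 | 1 | 6 |
| Moderate Performance | 0 | 5 | 3 | 8 |
| Poor Performance | 0 | 3 | 3 | 6 |
| Total | 4 | 9 | 7 | 20 |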

The results of the χ² test conducted for the 21/22 season were:

 

\[\begin{aligned}&\chi^2=11.96\\&p=0.018\\&df=4\end{aligned}\]

 

Following the results above, I did not reject \(H_0\) because \(\chi^2=11.96<13.277\) (the critical value). This suggests that there is no significant relationship between the two variables in the 21/22 season.

 

The results of the tests from both EPL seasons suggest that, at the 1% significance level, there is no significant dependence between the amount of money spent during a transfer window and a team's overall performance. These findings contradict the observations I had made about the presence of some dependence between the variables.

Evaluation

The statistical tests used to investigate the relationship between the variables were a strength of my investigation. The PPMCC is well suited to assessing linear relationships and is commonly used, making it versatile for exploring relationships between two variables. The Chi-Square test for independence played a vital role in testing the relationship suggested by the PPMCC and provided results that are relatively easy to understand.

 

On the other hand, my investigation was limited because I only took into account two English Premier League seasons. The representativeness of the findings is therefore limited, and a larger sample, such as 5 seasons, would need to be tested. However, this was impractical due to the vast amount of data collection and testing that it would require.

 

The expenditure data collected on the clubs may be inaccurate because clubs might not disclose the actual amount of money spent, which raises concerns about the accuracy of my results. Therefore, I cannot generalize my findings to other seasons of the EPL or other leagues.

Conclusion

Overall, it can be concluded that my initial hypothesis, whereby I expected a weak negative relationship, has been invalidated by my investigation. The first test, the PPMCC, showed the presence of a relationship between the variables; however, the second test, the Chi-Square test for independence, contradicted the assumed relationship and suggested that there is no substantial evidence of an association between a club's spending and its placement in the English Premier League table. Furthermore, it suggests that the amount of money spent in the summer transfer window, which is the longest window, cannot be used as a predictor of a team's performance at the end of the season. This investigation matters to football fanatics such as me because it shows that the money spent by a team does not guarantee its performance, which helps reduce assumptions and false predictions. The findings suggest that the common link drawn between a club's spending and its performance is a stereotype and not the reality of the sport. At the end of this investigation, my interest in the relationship between expenditure and performance has been satisfied, and I am now able to keep my expectations for my team (Arsenal) realistic by looking at other predictive factors such as squad depth and strength rather than making assumptions based on transfer spending.

References

Das, P. (2023). What makes an English Premier League (EPL) club "big"? Quora. https://www.quora.com/What-makes-an-English-Premier-League-EPL-club-big

 

Kelly, R. (2021, April 21). Who are the Premier League 'big six'? Top English clubs & nickname explained. Goal.com. https://www.goal.com/en-ke/news/who-are-premier-league-big-six-top-english-clubs-nickname-explained/130iokmi8t8dt1k3kudou73s1k

 

Neuenhaus, M. (n.d.). Football (Running Total of Trophies). KryssTal. http://www.krysstal.com/trophies.html

 

Premier League - Transfers 20/21. (n.d.). Transfermarkt. https://www.transfermarkt.com/premier-league/transfers/wettbewerb/GB1/plus/?saison_id=2020&s_w=s&leihe=1&intern=0&intern=1

 

Premier League - Transfers 21/22. (n.d.). Transfermarkt. https://www.transfermarkt.com/premier-league/transfers/wettbewerb/GB1/plus/?saison_id=2021&s_w=s&leihe=1&intern=0&intern=1

 

Turney, S. (2022, May 31). Chi-Square (χ²) Table | Examples & Downloadable Table. Scribbr. https://www.scribbr.com/statistics/chi-square-distribution-table/

Appendices

Appendix A: Table displaying the raw data collected on the teams in the 20/21 season

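| Position | Club | Amount spent | Club value | Major trophies won |
|---|---|---|---|---|
| 1 | Man City | €167.40m | €1.06bn | 22 |
| 2 | Man Utd | €62.50m | €758.10m | 42 |
| 3 | Liverpool | €79.70m | €1.03bn | 43 |
| 4 | Chelsea | €247.20m | €880.10m | 25 |
| 5 | Leicester | €59.40m | €420.30m | 5 |
| 6 | West Ham | €27.70m | €295.15m | 4 |
| 7 | Tottenham | €110.50m | €721.55m | 17 |
| 8 | Arsenal | €84.00m | €599.35m | 31 |
| 9 | Leeds | €127.80m | €128.05m | 7 |
| 10 | Everton | €74.37m | €411.05m | 15 |
| 11 | Aston Villa | €85.50m | €232.70m | 20 |
| 12 | Newcastle | €39.00m | €228.45m | 11 |
| 13 | Wolves | €84.59m | €318.80m | 9 |
| 14 | Crystal Palace | €18.90m | €188.50m | 0 |
| 15 | Southampton | €37.30m | €211.20m | 1 |
| 16 | Brighton | €21.90m | €209.05m | 0 |
| 17 | Burnley | €1.10m | €154.78m | 1 |
| 18 | Fulham | €37.25m | €144.25m | 0 |
| 19 | West Brom | €40.45m | €69.00m | 7 |
| 20 | Sheff Utd | €62.70m | €137.95m | 5 |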

Appendix B: Table displaying the raw data collected on the teams in the 21/22 season

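| Position | Club | Amount spent | Club value | Major trophies won |
|---|---|---|---|---|
| 1 | Man City | €117.50m | €1.04bn | 23 |
| 2 | Liverpool | €40.00m | €879.50m | 45 |
| 3 | Chelsea | €118.00m | €881.50m | 25 |
| 4 | Tottenham | €66.90m | €697.00m | 17 |
| 5 | Arsenal | €165.60m | €548.50m | 31 |
| 6 | Man Utd | €142.00m | €937.25m | 42 |
| 7 | West Ham | €74.50m | €354.75m | 4 |
| 8 | Leicester | €67.60m | €550.10m | 5 |
| 9 | Brighton | €57.00m | €248.10m | 0 |
| 10 | Wolves | €32.30m | €391.30m | 9 |
| 11 | Newcastle | €29.40m | €242.90m | 11 |
| 12 | Crystal Palace | €73.44m | €239.45m | 0 |
| 13 | Brentford | €38.20m | €167.85m | 0 |
| 14 | Aston Villa | €99.80m | €406.80m | 20 |
| 15 | Southampton | €63.40m | €241.30m | 1 |
| 16 | Everton | €2.00m | €461.75m | 15 |
| 17 | Leeds | €61.05m | €250.80m | 7 |
| 18 | Burnley | €31.90m | €145.30m | 1 |
| 19 | Watford | €18.80m | €133.80m | 0 |
| 20 | Norwich | €63.55m | €189.55m | 2 |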

Appendix C: PPMCC test for 20/21 EPL season

Additional data visualization

The value of R is -0.5775.

 

This is a moderate negative correlation, which means there is a tendency for high X variable scores to go with low Y variable scores (and vice versa).
