Is It Possible To Beat The Bookie


Engineering Degree Project
Is it possible to beat the bookie - Machine learning and football!
Author: Hampus Bergqvist
Supervisor: Jonas Nordqvist
Semester: VT/HT 2020
Subject: Computer Science

Abstract
Forecasting is one of the toughest tasks in the world, but that doesn't stop us from trying. In this project, machine learning is used to predict the three-way result of football fixtures, and the predictions are applied to football betting. The sports betting market is one of the biggest in the world right now, and the sports betting companies are making a lot of money. So the question arises: is it possible to beat them? The answer is yes: with machine learning and historical data, it is possible to beat them.

Keywords: random forest, neural network, football, logistic regression, machine learning, betting.

Contents
1 Introduction
  1.1 Background
  1.2 Related work
  1.3 Problem formulation
  1.4 Motivation
  1.5 Objectives
  1.6 Scope/Limitation
  1.7 Target group
  1.8 Outline
2 Method
  2.1 Approach
    2.1.1 Strategies
  2.3 Data set and data processing
    2.3.1 League name
    2.3.2 Home team formation/Away Team formation
    2.3.3 Home defensive players/Home offensive players/Home goalkeeper/Away defensive players/Away offensive players/Away goalkeeper
    2.3.4 Highest average of home players/Lowest average of home players/Lowest average of home players/Lowest average of home players
    2.3.5 Home Team/Away Team
    2.3.6 Home win, draw, loss/Away win, draw, loss
  2.4 Ethical Considerations
3 Theory
  3.1 Machine learning
  3.2 Logistic Regression
  3.3 Random Forest
  3.4 Neural Network
  3.5 Hyperparameters
4 Implementation
  4.1 Scripts
    4.2.1 whoscored/WSScraper.py
    4.2.2 whoscored/main.py
    4.2.3 whoscored/player strat/PlayerScraper.py
    4.2.4 oddsportal/main.py
    4.2.5 get data players.py
    4.2.6 ml/
5 Results
  5.1 Bookies
  5.2 Logistic Regression
  5.3 Random Forest
  5.4 Neural Network
6 Analysis
7 Discussion
8 Conclusion
  8.1 Future work
References
A Appendix 1
  A.1 fixtures database
  A.2 player fixtures database
  A.3 fixture formations
  A.4 Hyperparameter selection for Logistic Regression
  A.5 Hyperparameter selection for Random Forest
  A.6 Hyperparameter selection for Neural Network
    A6.1 Optimizer
    A6.2 Learning rate
    A6.3 Neurons
    A6.4 Hidden layers
    A6.5 Decay

1 Introduction
It may be impossible to predict the future. But with the help of machine learning, we can estimate how likely something is to happen. In the world of football betting, the odds set by the bookies¹ are just that: an estimate of how likely something is to happen.

This project will test the ability of different machine learning models to predict football fixtures². The probabilities produced by the models will be converted into odds and compared with the bookies' odds. If the odds given by a model are lower than the odds given by the bookies, the model considers that outcome more likely than the bookies do.

1.1 Background
The gambling market is one of the biggest markets in the world right now. In Sweden, the gambling companies spent 738.5 million dollars on commercials in 2018, which is roughly 73.85 dollars per person [1]. The biggest gambling company operating in Sweden made a profit of 670 million dollars that year [2]. Approximately two-thirds of the Swedish population gambled with money in 2018, and about half of those stated that they gambled at least once every month [3]. A lot of people gamble, and gambling companies earn a lot of money.

But why do people gamble? According to a study published in the International Journal of Mental Health and Addiction, which interviewed 131 people from different backgrounds in New Zealand [4], there were five major reasons:
- Economic reasons, such as winning experiences.
- Personal reasons, such as mood.
- Recruitment, such as how gambling is normalised.
- Environmental reasons, such as the availability.
- Social reasons, such as social participation.
Notice that none of the above reasons were "saving" or "investing".

There are a lot of different ways to gamble. One popular form is trying to predict the 3-way result of a football fixture, which means that either the home team wins, the match is drawn, or the away team wins. A gambler then makes a decision based on the given odds and different parameters about the fixture. Since it boils down to a decision problem, we can use machine learning to predict the fixtures. So the question arises: is it possible, with the help of machine learning, to beat the sports betting companies and earn money?

¹ A bookie or bookmaker is someone who sets odds.
² A football match between two teams.
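The comparison outlined in the introduction, where a model's probabilities are converted into odds and checked against the bookies' odds, can be sketched as follows. This is only a minimal illustration and not the project's actual code: the function names `model_odds` and `find_value_bets`, as well as all probabilities and odds in the example, are assumptions made for the sketch.

```python
def model_odds(probabilities):
    """Convert outcome probabilities into decimal odds (odds = 1 / probability)."""
    return {outcome: 1.0 / p for outcome, p in probabilities.items()}


def find_value_bets(probabilities, bookie_odds):
    """Return the outcomes where the model's odds are lower than the bookie's odds.

    Lower model odds mean the model considers the outcome more likely
    than the bookie's odds suggest.
    """
    fair = model_odds(probabilities)
    return [outcome for outcome in fair if fair[outcome] < bookie_odds[outcome]]


# Hypothetical model output and hypothetical bookie odds for one fixture.
probabilities = {"home": 0.55, "draw": 0.25, "away": 0.20}
bookie_odds = {"home": 1.90, "draw": 3.60, "away": 4.50}

value_bets = find_value_bets(probabilities, bookie_odds)
print(value_bets)
```

In this made-up fixture, only the home win is flagged, because the model's odds for it (about 1.82) are lower than the bookie's 1.90.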

1.2 Related work
Several articles try to predict the outcome of football fixtures with the help of data.

Sumpter has written about a strategy in his book "Soccermatics: Mathematical Adventures in the Beautiful Game". In short, he used a statistical model to show that it pays off to back the favourites in the Premier League, and that there is an even stronger bias against betting on a draw between two evenly matched teams, especially between the big six³. He was able to make a 25% return over half a season [5].

Stübinger and Knoll tested the ability of different machine learning models to predict "easy wins" in the top five European football leagues by training the models to predict the goal difference between the home team and the away team. A bet was placed when the model predicted that one of the teams would win by more than 1 goal. A Random Forest achieved 75.62% accuracy on the 3-way result for the fixtures where it predicted a win by more than 1 goal [6].

Tax and Joustra trained different machine learning models, such as decision trees, neural networks, and naive Bayes, to predict fixtures from the Dutch Eredivisie. One of the models had a higher accuracy than the bookies on unseen data [7].

1.3 Problem formulation
The odds set by the gambling companies can be seen as probabilities. To convert a probability into odds, you divide 1 by the probability. For example, if the 3-way odds for a fixture between Team A and Team B were 2.00 for Team A to win, 5.00 for a draw and 3.33 for Team B to win, the probabilities would be 50% for Team A, 20% for a draw and 30% for Team B. In a perfect world, this is how odds are generated. But gambling companies do not give out these true odds. Instead, to make money, they decrease the odds, so that they become 1.90 for Team A, 4.90 for a draw, and 3.23 for Team B.

So how well do the odds created by different machine learning models stand up against the odds given by the bookies?
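The arithmetic in the example above can be sketched in Python. The odds are the ones used in the text; the function names and the term "margin" for the excess of the implied probabilities over 100% (often called the overround) are assumptions introduced for this illustration.

```python
def odds_to_probabilities(odds):
    """Convert decimal odds into implied probabilities (probability = 1 / odds)."""
    return [1.0 / o for o in odds]


def margin(odds):
    """Return how much the implied probabilities exceed 1.0 in total."""
    return sum(odds_to_probabilities(odds)) - 1.0


# Odds from the example: Team A win, draw, Team B win.
true_odds = [2.00, 5.00, 3.33]
offered_odds = [1.90, 4.90, 3.23]

true_probs = odds_to_probabilities(true_odds)
rounded_true_probs = [round(p, 2) for p in true_probs]
true_margin = margin(true_odds)
offered_margin = margin(offered_odds)

print(rounded_true_probs)         # probabilities behind the true odds
print(round(true_margin, 3))      # close to zero for the true odds
print(round(offered_margin, 3))   # positive for the decreased odds
```

With the true odds, the implied probabilities sum to roughly 100%. With the decreased odds, they sum to roughly 104%, and that extra 4% is the share the bookie keeps on average.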
This report will focus on finding models that can predict the 3-way result of football fixtures and on finding strategies that can earn money.

The website oddsportal.com, which lists odds from different bookies, compares 37 different bookies. These bookies compete with each other for customers. To gain customers, a bookie cannot offer odds that are too low, so the bookies compete to offer odds that are as high as possible. Maybe the bookies sometimes offer odds that are too high? For example, when a big team with a large fanbase, like Arsenal, plays against a smaller team with a smaller fanbase, like Fulham, are the odds based only on historical data? Or do they also depend on how much money is expected to be placed on one of the teams, in order to attract customers? If so, I believe the machine learning models would find odds that are too generous.

³ Arsenal, Manchester United, Manchester City, Chelsea, Tottenham and Liverpool.

1.4 Motivation
Creating machine learning models to predict football fixtures and finding betting strategies can be useful for more than just trying to earn money. The approach taken in this report can also help coaches and teams perform better. For example, when Team A plays against Team B, what happens if we feed our model with features from another player or a different starting formation: does the probability of winning increase? What do the players need to do to increase the probability of winning? Is it passing accuracy, shot accuracy, more shots, etc.?

1.5 Objectives
O1 Scrape data from whoscored.com
O2 Scrape data from oddsportal.com
O3 Data preprocessing
O4 Train and evaluate the models' ability to predict 3-way results
O5 Create and evaluate strategies
O6 Examine the results

The first objective (O1) is to scrape whoscored.com for data on different fixtures, both team performance data and individual player data. The dataset will be complete once the odds from the different bookies have been scraped from oddsportal.com (O2). O3 will focus on preparing and creating different features before O4 can begin training the different machine learning models. When O4 is done and the result is considered satisfactory, O5 will begin trying to find strategies that can yield a positive return.

Finally, O6 will look more closely at the odds created by the models and the odds given by the bookies: where they differ, and why that might be.

1.6 Scope/Limitation
Football is one of the biggest sports in the world. According to FIFA (Fédération Internationale de Football Association), 3.572 billion people watched one game or more of the 2018 World Cup [8]. In 2007, FIFA had 265 million registered football players [9].

Because of this scale, collecting and processing data for all of football would require more time and computing power than is available within the timeframe of this project. Instead, the leagues and seasons used in this project are shown in Table 1.1.

Country   League           Seasons
England   Premier League   2009 - 2020
England   Championship     2013 - 2020
Italy     Serie A          2009 - 2020
France    Ligue 1          2009 - 2020
Germany   Bundesliga       2009 - 2020
Spain     La Liga          2009 - 2020
Table 1.1 - Leagues and Seasons

The sports betting companies also provide a large number of different odds. For example, in 2014, a Swedish teacher placed a bet that the Uruguayan striker Luis Suarez would bite someone during the World Cup. When Uruguay faced Italy, Luis Suarez bit the Italian defender Giorgio Chiellini, and the Swedish teacher won 1400 dollars [10].

This report will not cover every possible type of odds that the bookies offer. Instead, only the odds on the 3-way result will be used, which is either home win, draw, or away win.

1.7 Target group
The target group of this report is primarily people with an understanding of machine learning and an interest in football.

1.8 Outline
Chapter 2 describes the method that has been used to solve this problem and explains the strategies. Chapter 3 presents the theory behind the project, with explanations of the different machine learning models used. Chapter 4 presents the different scripts used with the method described in Chapter 2. Chapter 5 presents the results from the method, and Chapter 6 analyses these results. Last, Chapter 7 discusses the results and gives some of my own thoughts about this report.

2 Method
The machine learning models experimented with were logistic regression, random forest, and neural network. These are the three models that I feel most comfortable with, and in previous projects I have always got the best results with them. To find the best hyperparameters for each model, and to find strategies that outperform the bookies, controlled experiments were conducted on a validation dataset.

2.1 Approach
Before the different machine learning models were trained, data was extracted from the websites whoscored.com and oddsportal.com. The website whoscored.com contains data about the different fixtures, and the website oddsportal.com contains the corresponding closing odds for the fixtures, that is, the odds offered just before the game starts. Because the odds are set that late, we need data on the line-ups as well; otherwise, the odds from the bookies will not match the probabilities from the models.

The data received
