NewWorldTech | NWT



We believe Brazil will win this year’s world cup…

Using publicly available data based on world FIFA rankings over past 30+ years and data on all international football competitions, we did some data analysis and performed feature engineering to create and select the features essential to building the model with the use of the Random Forrest classifier algorithm which was done in python with the use of machine learning libraries Pandas, Numpy, Keras and Sckit-learn as part of the process. We then looked to take this historical data and narrow it down the teams that qualified for this world cup and find the probabilities of each game based on the probability of one team winning the game or whether a match would end in a draw across all groups in the initial stages of the competition, to the round of 16, quarter finals, semi-finals and the final.

In several methods used to predict the outcomes of international football matches, the number of goals scored by each team has historically served as a baseline indicator for evaluating a team’s performance and projecting future results. However, there is a significant element of randomness in the number of goals scored during a game, which frequently results in significant discrepancies between a team’s performance and the number of goals scored or conceded.

When forecasting the outcome of a match between two teams, a person will often consider a variety of factors, such as the teams’ most recent results, whether the game will be played at home or away, recent player moves, recent coach and staffing changes, etc. It is difficult for someone to accurately anticipate the outcome of a game because their choice will be influenced by factors including their preference for teams, how they feel about specific players on a team, and other factors. Some studies even suggest that the team’s clothing may influence decisions.

Match outcome prediction is one of the crucial applications in football match prediction that demands high predictive accuracy, in this case when building our model using the random forest classifier, we displayed an accuracy of 96% when validated across training data, however 76% on testing data, although not too low of an accuracy score between the two, the model potentially could be overfitting. Could this lead to skewed results? The results of the matches are typically predicted using mathematical and statistical models, which are frequently evaluated by a subject-matter expert. The uniqueness of match-related characteristics to diverse sports makes it difficult to directly compare the findings from other studies in this application. Play by play data, which includes specifics on each shot or pass made throughout a game, has lately been acquired for multiple games in different competitions when it comes to creating machine learning predictions on scores.

Data science has advanced in the football industry as a result of the collection and aggregation of this data. A football fan must have wanted to know the result of a match between two teams before the game at least once in their lives! A sizable number of factors, come into play when performing feature engineering. Given the collected datasets we used in our model these features include the past performance of the teams, match outcomes, and player statistics, which are gathered to help assess the odds of winning, drawing or losing in the upcoming matches. Machine learning algorithms are essentially designed to classify items, search for patterns, predict outcomes, and draw conclusions. It is possible to use one method at a time or combine numerous algorithms to get the best level of accuracy when working with complex and more unpredictable data.

Following the outcomes derived from our model from the initial group stages we simulated how the rest of the tournament would look like down to the final and created a graphic to go with it. Even though there are a number of other models and methods for forecasting and predicting football score data, the ideal model would be one that can handle complicated datasets, especially in this modern era of football which proves to be stochastic such that a larger range of elements are expected to affect the outcome of a game. As the world cup continues to come along, the team at NWT are excited to see how our model compares against the real-life outcomes of the games yet to be played.

If you would like to find out more about how NWT could help your organisation to utilise and exploit your data please reach out today by clicking here