The Deep Soccer Blog

Data Science and Football

Deep Soccer, but how on earth does this guy make predictions?

In machine learning, there are countless approaches to reaching a goal, and football is no exception: the methodology can vary greatly. In the early versions of Deep Soccer, predictions were based on a generic scoring system that rated different aspects of each team (offensive, defensive, physical, strategic, etc.). However, I always felt that this model’s predictions were too linear. Besides, what if a team was on an excellent or a very poor run of form? It was time to explore a different hypothesis.

From that point, I weighed numerous options. In the end, I decided that the last five matches played by each team were a sufficient sample of their current form: enough to gauge a team’s overall potential and, to some extent, its current state. With that settled, one important question remained: which algorithm should I use? Or would it be better to train a custom neural network?
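As a rough illustration of the “last five matches” idea, here is a minimal Python sketch that summarises a team’s recent form. The feature set (average points, goals scored and goals conceded) and the names `Match`, `form_features` and `fixture_row` are hypothetical assumptions for illustration only; the post does not say which statistics Deep Soccer actually extracts from those five matches. Points use the usual 3/1/0 league scoring.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """One played match (hypothetical minimal record)."""
    home: str
    away: str
    home_goals: int
    away_goals: int


def form_features(history, team, n=5):
    """Average points, goals scored and goals conceded over `team`'s last n matches.

    `history` must be chronological and contain only matches played before
    the fixture being predicted; otherwise the features leak its result.
    """
    recent = deque(maxlen=n)  # automatically drops the oldest entry beyond n
    for m in history:
        if m.home == team:
            recent.append((m.home_goals, m.away_goals))
        elif m.away == team:
            recent.append((m.away_goals, m.home_goals))
    k = len(recent)
    if k == 0:
        return {"points": 0.0, "scored": 0.0, "conceded": 0.0, "played": 0}
    # Standard league scoring: 3 for a win, 1 for a draw, 0 for a loss.
    points = sum(3 if s > c else 1 if s == c else 0 for s, c in recent)
    return {
        "points": points / k,
        "scored": sum(s for s, _ in recent) / k,
        "conceded": sum(c for _, c in recent) / k,
        "played": k,
    }


def fixture_row(history, home, away, n=5):
    """Feature vector for one fixture: home team's form followed by away team's."""
    h = form_features(history, home, n)
    a = form_features(history, away, n)
    return [h["points"], h["scored"], h["conceded"],
            a["points"], a["scored"], a["conceded"]]


history = [
    Match("A", "B", 2, 0),
    Match("C", "A", 1, 1),
    Match("A", "D", 0, 3),
    Match("E", "A", 0, 2),
    Match("A", "F", 1, 0),
    Match("G", "A", 2, 2),
]
print(form_features(history, "A"))
# {'points': 1.6, 'scored': 1.2, 'conceded': 1.2, 'played': 5}
```

Team A has played six matches here, but the fixed-length `deque` keeps only the five most recent, so the opening 2-0 win no longer counts. The same sliding window applies to every team, which keeps each fixture’s feature row the same length regardless of how many matches are stored.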

In a project like this, technical feasibility is as critical as the accuracy of the results. What would be the point of training a series of neural networks or decision-tree ensembles that weigh gigabytes? How could I host such a behemoth in the cloud while keeping costs reasonable for a project of this kind? Would maintaining something like that even be viable?

So the challenge wasn’t just maximizing accuracy but also adapting to the project’s technical and economic constraints. In that regard, LightGBM met many of the essential requirements: it is based on gradient-boosted decision trees, it supports multiple outputs (natively for multiclass classification, or for several targets via a wrapper such as scikit-learn’s MultiOutputRegressor, which trains one model per target), and it is versatile and lightweight, which makes it easier to host in the cloud. Its accuracy was also more than acceptable compared with other models that demand significantly more cloud resources. That settled it.

From there, the goal was clear: to collect as many samples (matches) as possible so that predictions would draw on a wide range of experiences. Today, Deep Soccer bases its predictions on a database of 60,000 matches, which keeps growing to cover the broadest possible spectrum of matches and scenarios. The database is expected to reach 100,000 matches by 2025.

Beyond that, to paraphrase Vujadin Boskov: “Football is football.”

Best regards to all,


Miguel
