As the world looks forward to the unique spectacle of a northern hemisphere winter football World Cup, Jason Tee examines the science practitioners apply to keep the world’s best footballers on the field.
Qatar’s hosting of the 2022 football World Cup is atypical because the tournament usually takes place in June and July, in the gap between the end of one northern hemisphere league season and the start of the next, giving players a short rest before returning to their clubs. However, the June-July window was unfeasible in the extreme heat of the Qatari summer. Instead, the competition will take place from 20 November to 18 December, in the (relative) cool of the Qatari winter. The major European leagues will break for the duration of the World Cup, and players will return to action for their clubs immediately after Christmas.
This arrangement presents a unique challenge for footballers and their clubs worldwide, with many voicing concern over players’ capacity to maintain performance and remain injury-free in the face of such a demanding schedule. These concerns are not unfounded. In 2020, a group of Israeli researchers published research quantifying the estimated financial cost of injuries to footballers in the English Premier League(1). They estimated that clubs lose, on average, one league position for every 271 days lost to injury. The financial cost to clubs, in terms of reduced prize money, TV viewership, and access to more lucrative competitions, is estimated at £45 million. No wonder teams are nervous about their players returning from the World Cup tired and injured.
Sports science research has conclusively demonstrated that teams that keep their players injury-free perform better over the long term(2). In an effort to mitigate the negative effects of injuries, many professional sports organizations now employ sports and data scientists to monitor their athletes and guide injury prevention. For example, in March 2021, Nature.com published an article describing how sports scientists working in professional football used artificial intelligence (AI) and advanced algorithms to “predict” sports injuries. This is an appealing concept! In addition, a modern understanding of training science suggests that injuries occur when players exceed a particular training load and become fatigued. Surely then, if we could measure all the loads that players are exposed to, we could “predict” injury risk and manage players away from it. Again, it is an appealing concept, but this assumption has several problems.
The relationship between training load, fatigue, and injury is not linear. There are many variables to consider, and most of them affect one another. For example, suppose a team’s sports scientist has determined that when a player exceeds 25,000m of running in a week, they are at increased risk of injury. This seems like a reasonable assertion, but what if a player is sleeping poorly or is dehydrated? Does that number go up or down? What if they stretch regularly, use ice baths, and follow a nutritionist’s directions? Could we then squeeze more out of that player? What if they are older or younger? Or have more or less playing experience? Or it’s week one of the season, or the week leading up to a big final? The interrelationships between these factors create a complex system(3). A model with any hope of predicting sports injuries would need to account for most of the interrelationships in that system. Interestingly, this is one of the major appeals of AI applications – they can account for a vast array of different variables. But AI applications depend entirely on the data entered into them, which presents some challenges.
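To make the interaction problem concrete, here is a minimal sketch of a toy logistic risk model. Every coefficient and number is invented purely for illustration, not drawn from any study; the point is only to show how the risk attached to the same 25,000m week shifts once sleep is allowed to interact with running distance.

```python
import math

# A toy (entirely hypothetical) logistic risk model illustrating why a single
# distance threshold breaks down once variables interact. Coefficients are
# invented for illustration, not derived from any study.
def injury_risk(weekly_distance_m: float, sleep_hours: float, age_years: float) -> float:
    """Return a probability of injury between 0 and 1."""
    distance_km = weekly_distance_m / 1000.0
    # Main effects: more distance and higher age raise risk; more sleep lowers it.
    z = -6.0 + 0.15 * distance_km + 0.08 * (age_years - 25) - 0.4 * (sleep_hours - 8)
    # Interaction: poor sleep amplifies the effect of running distance.
    z += 0.03 * distance_km * max(0.0, 7.0 - sleep_hours)
    return 1.0 / (1.0 + math.exp(-z))

# The same 25,000m week carries very different risk depending on sleep:
print(f"{injury_risk(25_000, 8.5, 25):.2f}")  # well-rested 25-year-old (~0.08)
print(f"{injury_risk(25_000, 5.5, 25):.2f}")  # sleep-deprived 25-year-old (~0.47)
```

Under these made-up numbers, the “safe” weekly distance is not a fixed value at all – it depends on everything else going on with the player.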
Developing algorithms that might predict injury is a long and complex process. Norwegian sports scientist Roald Bahr described this process in his 2016 publication, “Why screening tests to predict injury do not work—and probably never will…: a critical review”(4).
1. Identify risk factors in a prospective cohort study

The first step in predicting injury is identifying potential risk factors in a prospective cohort study. This begins with identifying a group of athletes and then measuring their exposure to training over time. For a professional football club hoping to reduce injuries in its team, this would likely involve several measures. Clubs would likely conduct preseason screening, recording individual characteristics such as age, body mass, injury history, and muscle imbalances. In addition, teams would record daily training outputs such as total distance, high-speed distance, and the number of accelerations and decelerations performed. Teams may also record individual daily measures such as sleep quality and duration, mood state, or hydration status (see figure 1).
In parallel with collecting all this data on training and training response, teams would also accurately record the number of injuries that players sustain. Then, at the end of a specified period (usually a season), data scientists at the club would search for associations between risk factors (or combinations of risk factors) and injury.
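As a rough sketch of what this association search might look like in practice, the snippet below fits a logistic regression to a hypothetical player-week dataset. The file name, column names, and feature list are all assumptions for illustration; a real club would draw these from its GPS, wellness, and medical record systems.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical season-long dataset: one row per player-week.
df = pd.read_csv("player_weeks.csv")  # hypothetical file

features = [
    "age", "body_mass_kg", "previous_injuries",        # preseason screening
    "total_distance_m", "high_speed_distance_m",       # GPS training outputs
    "accelerations", "decelerations",
    "sleep_hours", "sleep_quality", "hydration_score"  # daily wellness measures
]
X = df[features]
y = df["injured_next_week"]  # 1 if an injury was recorded, else 0

# Search for associations between candidate risk factors and injury.
model = LogisticRegression(max_iter=1000).fit(X, y)
for name, coef in zip(features, model.coef_[0]):
    print(f"{name:>22}: {coef:+.3f}")
```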
At this point, data scientists might get excited about their findings and declare, “We have found that when players exceed 5,273m of high-speed running in a week, they are at an increased risk of injury”. Unfortunately for the excitable data scientist, it is too early to make this kind of statement with confidence, because the injury prediction model still needs to be tested in other cohorts (see figure 2).
Dangers of discretization
Discretization is the process of transforming data from continuous variables into discrete categories. Scientists do this by creating a set of “bins” that describe the measured data. Discretization creates cut-off points that we use to categorize our data so that it can be easily interpreted and inserted into models. In data science, these categorizations are done by comparing different cut-off points and determining which one maximizes the number of true positive and true negative predictions.
As an example, we might be interested in the relationship between people’s standing height and their likelihood of playing in the NBA. We would collect all the heights of the people in our cohort and then determine a cut-off value that best predicts who will be drafted into the NBA. We might find that being above 6ft. 10in tall is a good predictor for selection.
The problem with discretization is that most continuous variables are normally distributed (as represented by the bell curves above). This means that while, on average, NBA players are taller than the normal population, some players within the NBA fall within the regular population curve. Similarly, some very tall people in the world never play in the NBA. Using a single discrete cut-off value always causes this problem and never results in a perfect prediction.
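The NBA height example can be simulated in a few lines. The sketch below uses synthetic, normally distributed heights (all numbers invented) and picks the cut-off that maximizes Youden’s J, one common way of balancing true positives against true negatives; even the winning cut-off misclassifies the overlapping tails of the two curves.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic heights (cm): the general population vs. a hypothetical NBA group.
# Both are normal distributions, so the two curves overlap.
population = rng.normal(loc=178, scale=8, size=10_000)
nba = rng.normal(loc=200, scale=8, size=500)

heights = np.concatenate([population, nba])
labels = np.concatenate([np.zeros(len(population)), np.ones(len(nba))])

# Try every candidate cut-off and keep the one that maximizes Youden's J
# (sensitivity + specificity - 1), i.e. the best balance of true positive
# and true negative predictions.
best_j, best_cut = -1.0, None
for cut in np.arange(170, 215, 0.5):
    sensitivity = np.mean(heights[labels == 1] >= cut)
    specificity = np.mean(heights[labels == 0] < cut)
    j = sensitivity + specificity - 1
    if j > best_j:
        best_j, best_cut = j, cut

print(f"Best cut-off: {best_cut:.1f} cm (Youden's J = {best_j:.2f})")
# Even the 'best' cut-off misclassifies short NBA players and tall non-players.
```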
2. Validate the injury prediction model in a ‘different’ cohort
Once a sport or data scientist has developed a model that they believe can be used to predict injury in a population, the next step in the validation process is to test the model in a new group. For injury prediction models to be useful, they should work in all settings, not just the one in which they were developed. This validation in different cohorts has proven to be difficult for most injury prediction researchers.
For example, in 2013, Australian researchers investigated the association between eccentric hamstring strength and hamstring injury in 186 elite Aussie Rules footballers(7). They determined that a cut-off point of 256N of eccentric hamstring strength was the best predictor of injury risk, with athletes falling below this cut-off at increased risk of injury. The same research group repeated the experiment with a different player group two years later and could not predict injury with any consistency(8). Very few research groups have reached the point of testing their injury prediction models in new cohorts. Prediction models perform better during development than during implementation, so practitioners should maintain a healthy skepticism toward new prediction models(9).
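A minimal sketch of this validation step, assuming two hypothetical CSV files (a development cohort and a later, unseen cohort) with illustrative column names: the model is fitted once on the original group, then its discrimination is checked in the new group, where performance typically drops.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Hypothetical files: the cohort the model was developed on, and a new
# cohort from a different squad/season used purely for validation.
dev = pd.read_csv("cohort_2013.csv")
new = pd.read_csv("cohort_2015.csv")

features = ["eccentric_hamstring_strength_n", "age", "previous_hamstring_injury"]
model = LogisticRegression(max_iter=1000).fit(dev[features], dev["injured"])

# Apparent performance (on the same data the model was fit to) is optimistic;
# the honest test is discrimination in the unseen cohort.
auc_dev = roc_auc_score(dev["injured"], model.predict_proba(dev[features])[:, 1])
auc_new = roc_auc_score(new["injured"], model.predict_proba(new[features])[:, 1])
print(f"AUC, development cohort: {auc_dev:.2f}")
print(f"AUC, validation cohort:  {auc_new:.2f}")  # typically lower
```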
3. Randomized controlled trial to demonstrate an effect
The final step in developing and validating an injury prediction model is demonstrating that practitioners can use the model’s output to reduce injuries. Achieving this requires a study with two groups – an experimental group that receives a treatment or intervention based on the model’s predictions and a control group that receives no intervention or a placebo. Using the Aussie Rules example above, such research would have to demonstrate that a strength training intervention that increases players’ eccentric hamstring strength reduces injuries compared with a group of players who did no such intervention. Randomized controlled trials are notoriously difficult to conduct, and no research group has yet developed its injury prediction model to this point.
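To show what the end-point analysis of such a trial might look like, here is a sketch comparing injury rates between two randomized groups using Fisher’s exact test. All counts are invented for illustration.

```python
from scipy.stats import fisher_exact

# Hypothetical trial results: players randomized to an eccentric hamstring
# strengthening programme vs. usual training. Counts are invented.
injured_intervention, n_intervention = 9, 120
injured_control, n_control = 21, 118

table = [
    [injured_intervention, n_intervention - injured_intervention],
    [injured_control, n_control - injured_control],
]
odds_ratio, p_value = fisher_exact(table)

risk_int = injured_intervention / n_intervention
risk_ctl = injured_control / n_control
print(f"Injury risk, intervention: {risk_int:.1%}")
print(f"Injury risk, control:      {risk_ctl:.1%}")
print(f"Risk ratio: {risk_int / risk_ctl:.2f}, Fisher exact p = {p_value:.3f}")
```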
Despite the promise and excitement around using machine learning to predict injury in sports, we are still in the very early days of application. Recently, experts conducted a systematic review to gather and evaluate all the published research on sports injury prediction(10). After an extensive evaluation, they concluded that “there are no existing musculoskeletal injury prediction models that could be confidently recommended for use in sports medicine practice”(10). While the idea of injury prediction in sports remains appealing, significant growth and development are still required before this becomes an accurate and reliable science.
Ethics of data collection for injury prediction
Modern athletes are among the most closely monitored individuals in history. Their sporting performances are subject to constant public scrutiny, and in many cases, their private lives are too.
Many sports scientists would argue that the more data they collect on athletes, the better their models predict injury. As a result, sporting organizations’ demands for athlete data are rapidly increasing, but where will it stop? In some organizations, athletes would be expected to:
- Wear GPS and HR monitors during training and matches
- Report loads lifted in the gym
- Report perception of effort after each session
- Submit to daily weigh-ins
- Report sleep quality and duration
- Report mood state
- Report the stage of their menstrual cycle
- Measure their hydration status daily
- Complete daily fatigue monitoring protocols
The collection and reporting of this data is invasive. Modern data protection laws require that personal data be collected only for good reason. If the reason for subjecting athletes to such scrutiny is to fuel injury prediction models that are not yet fit for purpose, it may be time to rethink this approach.