Hacking Tryouts – Experimenting with advanced statistics

Tryout camps are a time of tumultuous emotion: upset parents, scorned players, stressed-out coaches, and political agendas. When I took part in minor hockey and junior tryout camps, I was mostly blind to the calamity around me. When I began attending tryout camps as a strength and conditioning/skills coach, I gained a whole new perspective. I saw things partly as the anxious parent who wants his kid (in my case, the athletes I train) to do well, and partly as the coach trying to sort out who deserved a spot and who didn’t. The journey was further animated by the wide range of perspectives among the parents.

Most parents of players who were cut thought that their kid had not been given a fair shake. Sometimes I agreed, and other times I disagreed. Since parents were so biased when watching their own kids play, I wondered exactly how much my own bias was distorting my perception of players’ performance. The next logical question was: how much does the coach’s bias distort his views? To answer this, I wanted some sort of objective data that could track a player’s performance in the tryout camp and possibly predict their future performance on the team. So that’s what I did…

Moneyball: The birth of advanced stats

Most articles on this blog examine how a bias distorts or slows a hockey player’s improvement. This article examines how bias may lead a coach to choose players based on criteria that are not actually proven to lead to winning outcomes.

Moneyball illustrates how coaches, scouts, and general managers are affected by cultural and historical bias in choosing baseball players. The main point from the movie was that the decision makers were selecting players based on attributes like size, physical appearance, and technique. After careful examination, it was found that having players with these attributes would not contribute to a winning baseball team. Instead, certain statistical indicators turned out to be what actually (and mathematically) led to the entire team’s success.

Plus/Minus to Corsi – The evolution of hockey’s advanced stats

For a while, the only way to measure a player’s success was by point totals, and the way to measure a team’s success was by wins. If a player wasn’t supposed to be a scorer, their success might be measured by plus/minus: how many goals were scored, for or against, while they were on the ice.

These are rough measurements because goals only happen a few times per game. And if goals and wins are the only event markers we can measure success by, then we have very little opportunity to provide feedback to players beyond what we see subjectively.

Corsi was invented to give observers more data with which to predict a team’s success. Corsi counts the shot attempts directed at the opponent’s net and at your own net while a player is on the ice. These shot attempts are meant to serve as a proxy for how much puck possession a team has. The assumption is that a team with more puck possession over the course of a game will control, and thus be more likely to win, the game.
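To make the counting concrete, here is a minimal sketch of how a per-player Corsi differential could be tallied from a list of shot attempts. The `ShotAttempt` record, the player labels, and the sample data are all made up for illustration; they are not from any real tracking system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShotAttempt:
    # True if the attempt was taken by the team we are tracking,
    # False if it was directed at that team's own net.
    by_own_team: bool
    # Hypothetical labels for the tracked team's players on the ice.
    players_on_ice: frozenset


def corsi(attempts, player):
    """Return (attempts_for, attempts_against, differential) for one player."""
    on_ice = [a for a in attempts if player in a.players_on_ice]
    attempts_for = sum(1 for a in on_ice if a.by_own_team)
    attempts_against = len(on_ice) - attempts_for
    return attempts_for, attempts_against, attempts_for - attempts_against


# Made-up sample: four shot attempts in one game.
attempts = [
    ShotAttempt(True, frozenset({"A", "B"})),
    ShotAttempt(True, frozenset({"A"})),
    ShotAttempt(False, frozenset({"A", "B"})),
    ShotAttempt(False, frozenset({"B"})),
]

print(corsi(attempts, "A"))  # (2, 1, 1)
print(corsi(attempts, "B"))  # (1, 2, -1)
```

Player A was on the ice for two attempts for and one against, so A ends up at +1, while B ends up at -1.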

Corsi’s magic lies in the fact that there are more events to record per game. By recording the many shot attempts, an observer has more data to base conclusions on. If we only record goals, we have far less data, and conclusions drawn from less data are more likely to rest on an illusory correlation. An illusory correlation is when two events seem to be linked in the data that happened to be collected…but there is no real correlation!

Corsi’s Predictive Ability

Last season (2013-2014), statisticians predicted that the Toronto Maple Leafs would not make the playoffs. The reason was that the team had a poor shot differential: they were consistently outshot by opponents. However, they surprised everyone by starting the season out strong.

The Maple Leafs maintained their negative shot differential through the season and, as predicted, dropped out of a playoff spot. A similar pattern showed up with the Colorado Avalanche last season too, with statisticians predicting that they would not make it very far in the playoffs.

Corsi as Evaluative Metric

So we might tentatively conclude that a better shot differential for a team might lead to more wins. We need more data (and I need to do more research) to strengthen this conclusion. In the meantime, if more shots for tends to lead to more wins, then we would want players on our team who have better Corsi differentials. A better Corsi differential means that, when a player is on the ice, his team tends to direct more shots at the opposing net than it has directed at its own.

The new wave of GMs may start looking at players’ Corsi differentials to select their teams. This might provide a GM with a mathematical model by which to build a team, but does it provide the GM with any insight into the players? Is it predictive of prospects being able to move up in the ranks? Corsi is supposed to indicate whether a player factors into his team getting possession and therefore shots. But does Corsi tell you anything about how a player goes about getting his or her own possessions? Not really. Corsi demonstrates another outcome, just like goals, assists, plus/minus, and wins.

Darryl Belfry and Evaluative Metrics

I am a huge proponent of Darryl Belfry. Ever since his video blog was shown to me, I have been completely taken with his approach to hockey development.

He breaks a game down logically into its events and looks at how a player deals with the events that determine a hockey player’s success. While I have little idea of his complete evaluative model, I wanted to try my hand at a Darryl Belfry inspired evaluative model.

Jason’s Darryl Belfry Inspired Evaluative Metrics Model 1.0 (JDBIEM1.0)

Going back to the tryouts, I had a great opportunity to create and test my own model.

It operated on a couple of assumptions and one constraint. My assumptions were made through observation, past experience, and consultation with other coaches.

Assumption 1: Players who have more puck possession are better than others. There are ways of obtaining possession which demonstrate more “hockey sense.”

Assumption 2: Players who make contact with other players more often are better at defence. Players who can time and anticipate the movement of an opposing player have more defensive “hockey sense.”

Constraint: I didn’t have lots of time or access to video replay.

Here is what I did:

All my data tracking took place over the course of 1 day at an above-minor-level tryout. There were 6 games during that day.
I had an observer track all the possessions that players on a specific team obtained over the course of the game. I had them tally those possessions by categorizing the way the player obtained possession of the puck. The three possible ways of obtaining the puck were receiving a pass, creating a turnover, or chasing down a loose puck.
I concurrently had an observer track all the hits made by players on a specific team. I had them tally those hits by categorizing what happened when the player made the hit: the player hit an opponent who didn’t have the puck, the player hit an opponent who had the puck, or the player hit an opponent and created a turnover.
Players were awarded points based on the following scheme: 3 points for receiving a pass or making a hit that created a turnover, 2 points for creating a turnover or hitting a player who had the puck, and 1 point for chasing down a loose puck or hitting a player who didn’t have the puck.
I then compared the rankings provided by my measures to the coach’s rankings.
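The point scheme above can be sketched in a few lines of code. The event names, player labels, and tallies below are made up for illustration (they are not my actual tryout data); only the point values come from the scheme described above.

```python
# Point values from the scheme above; the event names are hypothetical labels.
POINTS = {
    "received_pass": 3,
    "hit_turnover": 3,       # hit that created a turnover
    "created_turnover": 2,
    "hit_with_puck": 2,      # hit on an opponent who had the puck
    "loose_puck": 1,
    "hit_without_puck": 1,   # hit on an opponent who didn't have the puck
}


def score(tally):
    """Total points for one player's tally of {event: count}."""
    return sum(POINTS[event] * count for event, count in tally.items())


def rank_players(tallies):
    """Return player names ordered from highest to lowest score."""
    scores = {player: score(tally) for player, tally in tallies.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Made-up tallies for three players.
tallies = {
    "Player 1": {"received_pass": 4, "loose_puck": 2, "hit_with_puck": 1},
    "Player 2": {"created_turnover": 3, "hit_turnover": 1},
    "Player 3": {"loose_puck": 5, "hit_without_puck": 2},
}

for player in rank_players(tallies):
    print(player, score(tallies[player]))
```

With these made-up numbers, Player 1 scores 16, Player 2 scores 9, and Player 3 scores 7, so that is the order the stats would rank them in.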

Results

Interestingly, there were not many discrepancies between the rankings from the stats that I gathered and the coach’s rankings. The correlation between the rankings provided by my stats and the coach’s rankings was 0.44 for forwards and 0.50 for defensemen. A correlation of 0.00 would mean that my stats did not predict the coach’s ranking at all, and 1.00 would mean that my stats perfectly predicted the coach’s ranking.
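For anyone who wants to try this comparison themselves, here is a minimal sketch of one common way to correlate two rankings: Spearman’s rank correlation. This is an assumption for illustration, not necessarily the exact calculation behind the numbers above, and the rankings in the example are made up. The shortcut formula used here only applies when there are no tied ranks.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman rank correlation for two rankings with no tied ranks."""
    if len(ranks_a) != len(ranks_b):
        raise ValueError("rankings must be the same length")
    n = len(ranks_a)
    if n < 2:
        raise ValueError("need at least two players to correlate")
    # Sum of squared differences between each player's two ranks.
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Made-up rankings for five players, listed in the same player order.
stat_ranks = [1, 2, 3, 4, 5]
coach_ranks = [2, 1, 3, 5, 4]

print(round(spearman_rho(stat_ranks, coach_ranks), 2))  # 0.8
```

Note that this kind of correlation can also go negative: -1.00 would mean the stats ranked the players in exactly the reverse order of the coach.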

I learned…

I can honestly say I was surprised that the stats and coach’s rating matched up so well. I expected there to be bigger discrepancies, because the coach informed me that his evaluations were based on more factors than just hockey sense.

Had I been the coach, I might have used the data to take a closer look at players whose advanced stat ranking and subjective ranking were highly discrepant.
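As a rough sketch of what that closer look could start from, the snippet below flags players whose two rankings differ by at least a chosen gap. The function name, the `min_gap` threshold, and the rankings are all hypothetical; the right threshold would depend on how many players are in camp.

```python
def flag_discrepancies(stat_rank, coach_rank, min_gap=2):
    """Return (player, stat_rank, coach_rank) for players whose ranks differ by min_gap or more."""
    flagged = []
    for player, s_rank in stat_rank.items():
        c_rank = coach_rank[player]
        if abs(s_rank - c_rank) >= min_gap:
            flagged.append((player, s_rank, c_rank))
    return flagged


# Made-up rankings for five players.
stat_rank = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}
coach_rank = {"A": 2, "B": 1, "C": 5, "D": 4, "E": 3}

for player, s_rank, c_rank in flag_discrepancies(stat_rank, coach_rank):
    print(f"{player}: stats rank {s_rank}, coach rank {c_rank}")
```

Here players C and E get flagged, because the stats and the coach disagree on them by two spots each.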

As a skills coach, or as a coach looking to improve the players involved in this experiment, I might use the data to show the players where they can improve. I could look at the data, determine where a player is lacking (perhaps in creating turnovers), and then look directly at that area of the player’s game to determine how he could do a better job of creating turnovers and winning battles. Or in the case of loose pucks, if a player is below average in getting his possessions this way, I could determine whether the reason he is losing puck races is physical (speed, quickness) or psychological (scared, unmotivated). No matter which area is weak, the data allows me as a coach to put my microscope right over that area of the player’s game and help him there.

Furthermore, as an evaluator, if I suspect that a player may be too slow to play at the next level, I might be able to use the data to inform my speculation. If I am looking at the ways that he gets possession, and he mostly gets possession by chasing loose pucks, then I would probably not choose that player. But if he gets lots of possessions, and most are through receiving passes and winning battles, then I would select him and wouldn’t mind giving up the possessions he isn’t getting from winning races while I wait for his speed to catch up.
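One simple way to look at how a player gets his possessions is to turn his tally into percentages. This is only a sketch: the category names and the counts are made up for illustration.

```python
def possession_profile(tally):
    """Convert {possession_type: count} into {possession_type: share of total}."""
    total = sum(tally.values())
    if total == 0:
        # A player with no recorded possessions gets a share of 0 everywhere.
        return {kind: 0.0 for kind in tally}
    return {kind: count / total for kind, count in tally.items()}


# Made-up tally for one player over a day of games.
player_tally = {"received_pass": 6, "created_turnover": 3, "loose_puck": 1}
profile = possession_profile(player_tally)

for kind, share in profile.items():
    print(f"{kind}: {share:.0%}")
```

This made-up player gets 60% of his possessions from received passes, 30% from turnovers, and only 10% from loose pucks, which is the kind of profile I described being willing to select even if his speed is a question mark.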

Back to Corsi…if I were using Corsi in this experiment, I would be able to tell you which players have the best shot differential, but I wouldn’t be able to explain how they got it. I wouldn’t be able to pinpoint where a player needs improvement in his game, nor would I be able to make very good predictions about how well he might do at the next level. If a player has a good Corsi differential because he plays with 4 other great players, that isn’t shown in the data. But using the metric from this case study, we could sort out whether a player’s success was due to talented linemates or to hockey sense.

What’s next?

The next step is for me to take my stats and research methods class! Once that is done, I’ll be able to do a better job of designing a more comprehensive model.

Currently, the role of advanced stats is to inform a coach. Decisions, at this point, cannot be made based purely on these sorts of statistics. They have not been tested enough, we don’t have enough data, and we aren’t sure yet if the indicators are valid. (At least, I’m not sure of their validity.) Right now, the most useful thing advanced stats tell me is whether the coach’s perception seems to be accurate…or whether there’s a discrepancy between what the coach sees and what the stats say. A discrepancy serves as a red flag that the coach should look more closely at their own evaluation of the player.

Published by

Jason at Train 2.0