We like to incorporate advanced analytics in our hockey coverage here.
I'll use this space to dive into some of the basic advanced stats in detail, and then when we use those statistics in our coverage, we can link to this file so you can refer to it if needed.
If you have any further questions, feel free to leave them in the comments.
TERMS
CORSI
Corsi statistics involve all shot attempts: shots on goal, blocked shots, and shots that hit the post or miss the net.
The number can be used as an individual statistic, looking at how many shot attempts a player takes themselves in a game. Corsi can also be used as an on-ice statistic, in which case we count the number of shot attempts for and against that occur when a specific player is on the ice. When we're using Corsi this way, it's usually only for five-on-five play.
When we count the shot attempts for and against that occur while a specific player is on the ice, we can present the result as a percentage, called Corsi For percentage (abbreviated CF%). If a player was on the ice for just as many shot attempts for as against, his Corsi For percentage would be 50 percent. Anything above 50 means the team takes more shot attempts than it allows while that player is on the ice, and anything below 50 means the team allows more shot attempts than it takes.
For example, if a player's team attempted 12 shots and allowed eight attempts while he was on the ice, his Corsi For percentage would be 60 percent, because his team was responsible for 60 percent (12 of 20) of the total shot attempts taken while he was on the ice.
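If you'd like to run the numbers yourself, here is a minimal Python sketch of that arithmetic. The function name is my own, chosen just for illustration.

```python
def corsi_for_pct(attempts_for: int, attempts_against: int) -> float:
    """Share of on-ice shot attempts taken by the player's team, as a percentage."""
    total = attempts_for + attempts_against
    if total == 0:
        raise ValueError("no shot attempts recorded while the player was on the ice")
    return 100.0 * attempts_for / total


# The example above: 12 attempts for, 8 against.
print(corsi_for_pct(12, 8))  # 60.0
```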
The stat is sometimes used as a proxy for possession, since actual possession numbers currently do not exist, and if a team is able to attempt a lot of shots, then they had to have the puck a lot to do so. It's not a perfect measure for possession, though.
The stat as we know it was developed by Tim Barnes, a former Oilers blogger who has also gone by the name Vic Ferrari online.
As Barnes explained to Bob McKenzie in 2014, he heard then-Sabres general manager Darcy Regier on the radio talking about shot attempt differential, then went on to develop the statistic.
“I was going to call (the new metric) the ‘Regier’ number,” Barnes told McKenzie. “But it didn’t sound good; it didn’t seem right. Then I was going to call it the ‘Ruff’ number (after then Sabres’ coach Lindy Ruff), but that obviously sounded bad. So I went to the Buffalo Sabre website and looked at a picture of a guy on their website, and (then goaltending coach) Jim Corsi kind of fit the bill. So I called it a ‘Corsi number’ and then I pretended it was (Corsi) I heard on the radio talking about it – that’s what I told people. That’s basically (how Corsi got named).”
Unbeknownst to Barnes, Jim Corsi actually was tracking shot attempts on his own at the time, but was doing so as a way to measure a goaltender's workload, since a goaltender still reacts to a shot attempt even if it gets blocked or just misses the net. Barnes didn't know this until he was told by McKenzie.
“Oh, I had no idea of that,” Barnes said. “I just liked his moustache.”
Barnes was hired by the Capitals as an analyst in 2014 and has been the director of analytics since 2017.
FENWICK
Fenwick is similar to Corsi, but removes blocked shots from the equation. You'll usually see it referred to as "unblocked shot attempts," and like Corsi, it can be presented as a percentage (Fenwick For).
Fenwick was developed by Flames blogger Matt Fenwick, who argued that unblocked shot attempts are a better predictor of success over time, because shots that end up blocked are usually poor scoring chances to begin with and are taken from less dangerous areas of the ice.
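To see how the two counts differ, here is a small Python sketch. The outcome labels and the sample shot attempts are made up for illustration and don't come from any real game.

```python
# Hypothetical shot attempts recorded while one player was on the ice.
attempts_for = ["on_goal", "blocked", "missed", "on_goal", "blocked"]
attempts_against = ["on_goal", "missed", "blocked"]


def corsi(attempts):
    """Corsi counts every shot attempt."""
    return len(attempts)


def fenwick(attempts):
    """Fenwick counts only the unblocked shot attempts."""
    return sum(1 for outcome in attempts if outcome != "blocked")


def pct(for_count, against_count):
    """Turn for/against counts into a 'For' percentage."""
    total = for_count + against_count
    if total == 0:
        raise ValueError("no shot attempts to compare")
    return 100.0 * for_count / total


cf_pct = pct(corsi(attempts_for), corsi(attempts_against))      # 5 of 8 attempts
ff_pct = pct(fenwick(attempts_for), fenwick(attempts_against))  # 3 of 5 unblocked attempts
print(cf_pct, ff_pct)  # 62.5 60.0
```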
DANGEROUSNESS
You'll often hear a shot described by terms like "low-danger" or "high-danger."
The website WAR on Ice first laid out clear-cut danger areas of the ice surface based on the probability of a goal being scored from each area.
This chart, which uses the Kings' 2013-14 data as an example, lays out the probability of a goal being scored from each area. The light green zone is a high-danger area, the pink zone is a medium-danger area, and everything else in yellow is a low-danger area.
Current "danger" labels add more context.
Each shot attempt is given a numerical value. Shots from the low-danger areas are given a 1, shots from the medium-danger areas are given a 2, and shots from the high-danger areas are given a 3.
If a shot attempt is off of a rebound (within three seconds of the initial shot) or a rush attempt (any attempt within four seconds of any event in the shooting team’s neutral or offensive zones), a value of 1 is added to the above value.
If the attempt is blocked, a value of 1 is subtracted from the above value. If the result is a value of 3 or higher, it is considered high-danger. If the result is a value of 2, it is medium-danger. If the result is a value of 1 or less, it is low-danger.
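The scoring rules above can be sketched in a few lines of Python. The function names and zone labels are my own, and I'm assuming only one point is added even if an attempt is both a rebound and a rush attempt.

```python
# Base value for each danger area, as described above.
ZONE_VALUE = {"low": 1, "medium": 2, "high": 3}


def danger_value(zone, rebound=False, rush=False, blocked=False):
    """Numerical danger value for a single shot attempt."""
    value = ZONE_VALUE[zone]
    if rebound or rush:
        value += 1  # assumption: a single bonus point, even if both apply
    if blocked:
        value -= 1
    return value


def danger_label(value):
    """Translate the numerical value into a danger label."""
    if value >= 3:
        return "high-danger"
    if value == 2:
        return "medium-danger"
    return "low-danger"


# An unblocked rebound from a low-danger area: 1 + 1 = 2
print(danger_label(danger_value("low", rebound=True)))    # medium-danger
# A blocked attempt from a high-danger area: 3 - 1 = 2
print(danger_label(danger_value("high", blocked=True)))   # medium-danger
# An unblocked rush attempt from a medium-danger area: 2 + 1 = 3
print(danger_label(danger_value("medium", rush=True)))    # high-danger
```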
____________________
HOW DO WE USE THOSE STATS?
There are several concepts in advanced analytics that combine the above terms and aim to provide more useful data.
SCORING CHANCES
Not every shot attempt or shot on goal is a scoring chance. The scoring chance statistic follows the formula developed by WAR on Ice after studying the probabilities of scoring a goal off of different types of shots from different areas of the ice. Using the above "dangerousness" values, if a shot attempt is given a value of 2 or higher, it is considered a scoring chance.
That means a shot attempt must meet one of the following criteria to count as a legitimate scoring chance:
- In the low danger zone, unblocked rebounds and rush shots only.
- In the medium danger zone, all unblocked shots.
- In the high danger zone, all shot attempts (since blocked shots taken here may be more representative of “wide-open nets,” though we don’t know this for sure).
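Here is a rough Python sketch of the "2 or higher" rule, checked against the three criteria above. The function names are my own, and I'm assuming a single point is added for a rebound or rush attempt even when both apply.

```python
ZONE_VALUE = {"low": 1, "medium": 2, "high": 3}


def danger_value(zone, rebound=False, rush=False, blocked=False):
    """Zone value, plus 1 for a rebound or rush attempt, minus 1 if blocked."""
    value = ZONE_VALUE[zone]
    if rebound or rush:
        value += 1  # assumption: one bonus point at most
    if blocked:
        value -= 1
    return value


def is_scoring_chance(zone, rebound=False, rush=False, blocked=False):
    """A shot attempt with a value of 2 or higher is a scoring chance."""
    return danger_value(zone, rebound, rush, blocked) >= 2


print(is_scoring_chance("low"))                   # False: plain low-danger shot
print(is_scoring_chance("low", rebound=True))     # True: unblocked rebound
print(is_scoring_chance("medium"))                # True: unblocked shot
print(is_scoring_chance("medium", blocked=True))  # False: blocked shot
print(is_scoring_chance("high", blocked=True))    # True: any attempt counts
```

One wrinkle: applied literally, the formula also gives a blocked rebound or rush attempt from the medium danger zone a value of 2, a case the list above doesn't spell out.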
EXPECTED GOALS
Expected goals (xG) aims to assign a value to a shot attempt based on several factors.
Factors taken into consideration include the type of shot (slap shot, wrist shot, etc.), whether the shot came off a rebound or a rush, the location of the shooter (both distance and angle to the net), the talent of the shooter, the state of the game (tie game, trailing, etc.), and more. The value of each factor is determined by how likely it is to lead to a goal, based on data from past events.
There isn't one single definition for expected goals, because different analysts have created their own models for the statistic that use different values for the above factors.
For an in-depth look at how models are developed and how values are decided, you can look at Evolving Wild's methodology.
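To make the general idea concrete, here is a toy Python sketch. It is not any published model: the weights below are invented purely for illustration, only three factors are used, and the logistic curve is just one simple way to turn factors into a probability. Real models are fit to past shot data and can be built very differently.

```python
import math

# Hypothetical weights, invented for illustration only.
# They do not come from any real expected goals model.
INTERCEPT = -1.0
WEIGHT_DISTANCE_FT = -0.05
WEIGHT_ANGLE_DEG = -0.02
WEIGHT_REBOUND = 0.8


def toy_xg(distance_ft, angle_deg, rebound=False):
    """Toy goal probability for one shot attempt (between 0 and 1)."""
    score = (
        INTERCEPT
        + WEIGHT_DISTANCE_FT * distance_ft
        + WEIGHT_ANGLE_DEG * angle_deg
        + (WEIGHT_REBOUND if rebound else 0.0)
    )
    return 1.0 / (1.0 + math.exp(-score))


# Made-up shot attempts: (distance in feet, angle in degrees, rebound?)
shots = [(10, 5, True), (45, 30, False), (25, 15, False)]

# Expected goals for a player or team is the sum of the per-shot values.
total_xg = sum(toy_xg(d, a, r) for d, a, r in shots)
print(round(total_xg, 2))
```

Even in this toy version, a close-range rebound is worth more than a long shot from a sharp angle, which is the behavior any expected goals model is trying to capture.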
____________________
ADDING MORE CONTEXT
There are a few ways we can take the basic and advanced concepts and add more context to them.
RELATIVE STATS
Sometimes you may be looking at a team that overall had a strong game, and all of the players have good advanced analytics. Or, you could be looking at a team's numbers for the entire season. If a team as a whole is struggling, the entire roster's advanced analytics over the course of the season may be poor.
To add context, we have relative stats. We can use relative stats for Corsi, Fenwick, shots, goals, and more.
Relative stats look at the team's on-ice performance when a player is on the ice compared to the team's on-ice performance when he is not on the ice, and are presented in the form of a percentage. Positive percentages mean the team had better results when that player was on the ice, and negative percentages mean the team had worse results when he was on the ice.
In the Penguins' March 10 game against the Devils, most players had a Corsi For percentage above 50 percent, since the team as a whole was dominant. In this case, looking at a player's individual Corsi For percentage won't tell us much about his own performance. We can see, though, that Evgeni Malkin had a relative Corsi For (CF% Rel) of 30.69 percent. When Malkin was on the ice, the Penguins' share of shot attempts was 30.69 percentage points higher than it was when he was on the bench.
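Here is a minimal Python sketch of a relative stat, assuming CF% Rel is calculated as the team's CF% with the player on the ice minus its CF% with him off the ice. The shot attempt counts are made up and are not from the Penguins-Devils game.

```python
def corsi_for_pct(attempts_for, attempts_against):
    """Share of shot attempts taken by the team, as a percentage."""
    total = attempts_for + attempts_against
    if total == 0:
        raise ValueError("no shot attempts recorded")
    return 100.0 * attempts_for / total


def cf_pct_rel(on_for, on_against, off_for, off_against):
    """On-ice CF% minus off-ice CF%, in percentage points."""
    return corsi_for_pct(on_for, on_against) - corsi_for_pct(off_for, off_against)


# Hypothetical game: 30 for and 10 against with the player on the ice (75 percent),
# 20 for and 20 against with him on the bench (50 percent).
print(cf_pct_rel(30, 10, 20, 20))  # 25.0
```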
WITH OR WITHOUT YOU
No, not the U2 song.
"With or without you" stats are used to see how a line or pairing or any other combination of players performs together compared to how they perform apart. You can see their application in a few stories I've done.
In this story from February 2019, I looked at players' Corsi For percentages with and without Jack Johnson on the ice, and saw that in nearly every case that season, a player's CF% dropped when Johnson was on the ice during five-on-five.
Another application of the stat looked at the results of different top-line combinations used by the Penguins in the 2018-19 season. We were able to isolate the results from Sidney Crosby and Jake Guentzel's time with four different wingers and see how the different combinations performed as units. The results showed that the top line's best goal differential came when either Bryan Rust or Dominik Simon was on the top line.
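For the curious, a "with or without you" split can be sketched in Python like this. The event format, the function names, and the two placeholder players are all hypothetical. Each event records whether the shot attempt was for the team and which of its skaters were on the ice.

```python
def cf_pct(events):
    """CF% over a list of (is_for, players_on_ice) shot attempt events."""
    if not events:
        return None  # no shot attempts in this split
    attempts_for = sum(1 for is_for, _ in events if is_for)
    return 100.0 * attempts_for / len(events)


def with_or_without(events, player, teammate):
    """The player's on-ice CF% with and without a given teammate."""
    with_teammate = [
        e for e in events if player in e[1] and teammate in e[1]
    ]
    without_teammate = [
        e for e in events if player in e[1] and teammate not in e[1]
    ]
    return cf_pct(with_teammate), cf_pct(without_teammate)


# Made-up five-on-five shot attempts for two placeholder players.
events = [
    (True, {"Player A", "Player B"}),
    (False, {"Player A", "Player B"}),
    (False, {"Player A", "Player B"}),
    (False, {"Player A", "Player B"}),
    (True, {"Player A"}),
    (True, {"Player A"}),
    (True, {"Player A"}),
    (False, {"Player A"}),
    (True, {"Player B"}),  # Player A is off the ice, so this event is ignored
]

print(with_or_without(events, "Player A", "Player B"))  # (25.0, 75.0)
```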
RATES
Presenting a statistic in the form of a rate rather than as a straight count is a good way of controlling for ice time, or number of games played.
Rather than see how many times a player has done something, we can see how often they do it. These statistics are often presented as "per 60," which tells us how often a player did something per 60 minutes of ice time.
It's a similar concept to "per game," but it controls for ice time rather than games played.
For example, the Penguins' goal-scoring leader this season is Rust, with 27 goals in 55 games. Jake Guentzel, who has been injured since December, currently sits at 20 in 39 games. Obviously, they've played in a different number of games, but they also have different average time on ice per game, so comparing their goals per game isn't ideal. When we look at their goals per 60, we can see that both players have scored at identical rates, averaging 1.49 goals for every 60 minutes of ice time.
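The "per 60" calculation itself is simple. Here is a minimal Python sketch with made-up totals (these are not Rust's or Guentzel's actual ice time numbers):

```python
def per_60(count, toi_minutes):
    """Rate of an event per 60 minutes of ice time."""
    if toi_minutes <= 0:
        raise ValueError("time on ice must be positive")
    return 60.0 * count / toi_minutes


# Two hypothetical players with different workloads but the same scoring rate.
print(per_60(10, 400))  # 1.5
print(per_60(5, 200))   # 1.5
```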