I was talking to a lady on Twitter recently about the concept of ‘data science’ and what it means. She was frustrated that the term is so vague. “How do you *science* data anyway?” she said. Her comments got me thinking about data science in the world of fancy stats. Nowadays we almost take for granted the availability of possession-based metrics like Corsi, Fenwick, zone deployment, etc. What people may not think about is that those advanced stats are available because coders (data scientists) figured out how to take the NHL’s official play-by-play data and parse out who was on the ice for each shot attempt, which players were deployed for face-offs, and so on.
We have access to those stats each night because the process is automated: we got the computers to scrape the data and perform the calculations that give us the stats we are all familiar with. Working through the data by hand each night would be practically impossible; there just aren’t enough hours. So our understanding of hockey is very much shaped by the processes we are able to automate. This is one of the many reasons I am so excited about RFID tracking, or the SportVU (a.k.a. “missile defense”) cameras that have been used in the NBA. The frustrating thing is that many arenas are shared by NBA and NHL teams, so the opportunity is there for us hockey fans, but we are not able to get that awesome data.
People who have been reading me for a while might know that every so often I like to dust off stats that cannot be automated and thus do not get looked at very much, but that I still think provide useful information. Back in the day, researchers had to spend a long time collecting and cleaning the data they wanted before writing it up. Now, we just go click, click, click, and we can see a player’s scoring chance proportions for the last eight years.
Anyway, this is all just a long introduction to my weekend project, where I looked at an older stat called the Disciplined Aggression Proxy (DAP). It was created by one of the godfathers of the fancy stats movement, Iain Fyffe, and it compares the physical aspects of the game (hits and takeaways) to the number of minor penalties a player takes. In other words, which players are being aggressive but disciplined: getting hits and takeaways and separating players from pucks without getting called for slashing, interference, etc.?
Another #fancystats pioneer, Neil Greenberg, wrote about DAP a couple of years ago. It’s actually a rather simple formula: DAP = (Hits + Takeaways) / Minor penalties. I like stats like this because they are easy for people to wrap their heads around; it’s basically just a fraction. The more hits and takeaways a player has, the higher his DAP will be, and the more minors he takes, the lower it goes. So, I like to bring back these stats once in a while to show that there is a lot of information we could be gaining, but since it has to be gathered by hand, it’s not in the ‘mainstream’ of advanced stats, if you will.
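For anyone who would rather script the formula than type it into a spreadsheet, here is a minimal Python sketch. The function name `dap` is just an illustrative choice, and raising an error when a player has no minors is an assumption about how to handle that case, not part of the original definition. The sample numbers are Brandon Tanev's 16-17 line from the top-ten table in this post.

```python
def dap(hits, takeaways, minors):
    """Disciplined Aggression Proxy: (hits + takeaways) / minor penalties."""
    if minors <= 0:
        # Assumption: treat a player with no minors as having no DAP at all.
        raise ValueError("DAP is undefined for a player with no minor penalties")
    return (hits + takeaways) / minors


# Brandon Tanev, 16-17: 73 hits, 15 takeaways, 2 minors
print(dap(73, 15, 2))  # 44.0
```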
Just to give you an idea of how I went about putting together my dataset, I downloaded all NHL players’ individual data from corsica.hockey for the 15-16 and 16-17 seasons (this was before Saturday’s games), but that site doesn’t split out minor penalties from major penalties. So, I had to search around for that data (shout out to @stateofstats, who is definitely worth a follow)…and wouldn’t you know it, NHL.com actually had it. Who would have guessed!? So, I scraped the data from that site (NHL.com doesn’t have a download feature, so I copied and pasted from 18 pages of stat tables into Excel) before merging the data sets. Oh, and I had to do some data cleaning, too. For example, corsica tends to shorten players’ names in its database, but NHL.com does not, so we have ‘Alex Ovechkin’ vs ‘Alexander Ovechkin’, and my VLOOKUP function did not work until I reconciled the different names. Finally, I ran the simple =(hits+takeaways)/minors function to get the DAP. Anyway, this is not to get pats on the back for the work I did, but to pull back the curtain and show how inefficient it can be to pull data that ends up going into a very simple equation.
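The name reconciliation step could also be scripted. The sketch below is not the Excel and VLOOKUP workflow described above; it is one possible way to do the same join in plain Python. The alias table, the function names, and the sample rows are all hypothetical, and the numbers in the sample rows are placeholders rather than real stats.

```python
# Hypothetical alias table: short name in one source -> full name in the other.
NAME_ALIASES = {"Alex Ovechkin": "Alexander Ovechkin"}


def canonical(name):
    """Collapse stray whitespace and map known short names to full names."""
    cleaned = " ".join(name.split())
    return NAME_ALIASES.get(cleaned, cleaned)


def merge_by_name(skater_rows, minors_by_name):
    """Attach minor-penalty counts to skater rows; collect names that fail to match."""
    minors_lookup = {canonical(n): m for n, m in minors_by_name.items()}
    merged, unmatched = [], []
    for row in skater_rows:
        key = canonical(row["name"])
        if key in minors_lookup:
            merged.append({**row, "name": key, "minors": minors_lookup[key]})
        else:
            unmatched.append(row["name"])
    return merged, unmatched


# Placeholder rows for illustration only; the numbers are not real stats.
skater_rows = [
    {"name": "Alex Ovechkin", "hits": 10, "takeaways": 4},
    {"name": "Example  Skater", "hits": 5, "takeaways": 2},
    {"name": "Unlisted Player", "hits": 1, "takeaways": 1},
]
minors_by_name = {"Alexander Ovechkin": 2, "Example Skater": 1}

merged, unmatched = merge_by_name(skater_rows, minors_by_name)
print(merged)
print("Needs manual review:", unmatched)
```

Returning the unmatched names separately is the useful part: those are the rows where a lookup would silently fail, so they are the ones to fix by hand or add to the alias table.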
So, I decided to break the rules and start by showing the 16-17 season only, which I freely admit is too small a sample to rely on heavily, but it’s a good starting point. Here are your top ten skaters for DAP this year:
|Name||Hits||Takeaways||Minors||DAP|
|Nic Dowd (LAK)||63||3||1||66.00|
|Scott Wilson (PIT)||53||5||1||58.00|
|Micheal Haley (SJS)||52||6||1||58.00|
|Matt Read (PHI)||44||10||1||54.00|
|Ryan Hartman (CHI)||41||7||1||48.00|
|Elias Lindholm (CAR)||29||17||1||46.00|
|Bryan Rust (PIT)||40||5||1||45.00|
|Brandon Tanev (WPG)||73||15||2||44.00|
|Pierre-Edouard Bellemare (PHI)||34||9||1||43.00|
|Aleksander Barkov (FLA)||20||21||1||41.00|
Obviously, at this point in the season, this list is influenced heavily by penalties: nine of the ten players have taken exactly one minor. If Nic Dowd takes another minor, his DAP is cut in half, to 33.00, which is still impressive but would not be in the top ten. Also, as we’ve seen from Greenberg’s and others’ work, these numbers are quite inflated: a DAP of around 20-25 over the course of a full season is considered quite good.
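To see how fragile these early-season numbers are, here is a small Python sketch that recomputes DAP as if each player took one more minor. The helper name `dap_with_extra_minors` is hypothetical; the three rows are copied from the top-ten table above.

```python
def dap_with_extra_minors(hits, takeaways, minors, extra=1):
    """DAP recomputed as if the player had taken `extra` additional minors."""
    return (hits + takeaways) / (minors + extra)


# (name, hits, takeaways, minors) copied from the 16-17 top-ten table
rows = [
    ("Nic Dowd", 63, 3, 1),
    ("Brandon Tanev", 73, 15, 2),
    ("Aleksander Barkov", 20, 21, 1),
]

for name, hits, takeaways, minors in rows:
    now = (hits + takeaways) / minors
    after = dap_with_extra_minors(hits, takeaways, minors)
    print(f"{name}: {now:.2f} -> {after:.2f}")
# Nic Dowd: 66.00 -> 33.00
# Brandon Tanev: 44.00 -> 29.33
# Aleksander Barkov: 41.00 -> 20.50
```

A player sitting on a single minor loses half his DAP with the next call, while Tanev, with two, loses a third.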
Also, an obvious problem is that there are 100 players who have not taken a penalty this year. The formula does not work for them: no matter how many hits and takeaways they have, we’re dividing by zero, so we don’t get a DAP at all. I thought it would be informative to show some of those players, because they should get credit for being aggressive and disciplined as well:
|Name||Hits||Takeaways||Minors||H + T|
|Micheal Ferland (CGY)||62||15||0||77|
|Kevin Klein (NYR)||48||9||0||57|
|Colton Sissons (NSH)||38||5||0||43|
|Joseph Cramarossa (ANA)||38||5||0||40|
|Adam Cracknell (DAL)||33||3||0||36|
|John Carlson (WSH)||24||11||0||35|
|Anton Stralman (TBL)||26||7||0||33|
|Frans Nielsen (DET)||24||9||0||33|
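One way to handle the divide-by-zero problem in code is to keep the penalty-free players out of the DAP ranking and sort them by hits plus takeaways instead, which is what the table above does. This is a sketch of that idea, not an official extension of the stat; `safe_dap` is a hypothetical helper, and the rows are copied from the two tables in this post.

```python
def safe_dap(hits, takeaways, minors):
    """Return DAP, or None when the player has no minors (division by zero)."""
    if minors == 0:
        return None
    return (hits + takeaways) / minors


# (name, hits, takeaways, minors) copied from the tables in this post
players = [
    ("Micheal Ferland", 62, 15, 0),
    ("Kevin Klein", 48, 9, 0),
    ("Nic Dowd", 63, 3, 1),
    ("Brandon Tanev", 73, 15, 2),
]

# Players with at least one minor get a DAP and are ranked by it.
rated = sorted(
    ((name, safe_dap(h, t, m)) for name, h, t, m in players if m > 0),
    key=lambda pair: pair[1],
    reverse=True,
)
# Players with no minors are ranked by raw hits + takeaways instead.
unrated = sorted(
    ((name, h + t) for name, h, t, m in players if m == 0),
    key=lambda pair: pair[1],
    reverse=True,
)

print(rated)    # [('Nic Dowd', 66.0), ('Brandon Tanev', 44.0)]
print(unrated)  # [('Micheal Ferland', 77), ('Kevin Klein', 57)]
```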
I think that while there are obvious flaws in this stat, we should give credit where credit is due: Nic Dowd is playing well as a grinder on the Kings’ fourth line. He’s throwing hits without taking penalties (and oh, by the way, he’s chipped in 2G and 9A for 11 pts), so he deserves kudos for that. He’s also a St. Cloud State Husky, so you gotta love that. Shout out as well to Brandon Tanev for accumulating 73 hits and 15 takeaways while only receiving two minor penalties. But it’s obvious that we need to look at a larger sample size before we can really draw conclusions about who the best disciplined, aggressive players are.
Since I haven’t mentioned any Minnesota players, here are the top five Wild skaters who are doing well in this metric this year.
Defense is always harder to measure than offense, and while this stat isn’t a comprehensive defensive metric, it still provides an interesting glimpse into the contributions of certain players who don’t always get the spotlight. I have data going back to the start of 2015-16 that I will post on Wednesday, which will allow us to draw more conclusions. In the meantime, I’d love to hear your thoughts about the overall concept of the ‘data science’ side of fancy stats, as well as the DAP stat shown here. Leave a comment on this post or hit me up on Twitter @BobaFenwick. Thanks for reading!