Wednesday, December 14, 2011

Predictive Quality of Early Season Results: The Holland Date

[Code snippet] Yes, calling ToArray() in a for loop is inefficient; thankfully, the calculations happen faster than I can perform a GET on the page.
About a month ago I was struck by the idea that the Oilers were probably not going to be a top-5 team in the NHL this season. The real question I had was how well early results predict a club's eventual finish. I had heard a quote from an NHL GM, Ken Holland, in an Eric Duhatschek story. His thought mirrored the answer I had teased out of the 'common sense' department of my brain: teams generally show their mettle fairly early in a season, certainly early enough that we can have a rough idea of the top and bottom of the league. I just wanted to see when that date was, and how predictive it ended up being. Here is the top part of that story:
Detroit Red Wings general manager Ken Holland has a theory about the NHL playoff race that makes a lot of sense. He believes that after Thanksgiving weekend, NHL teams move in a pack.
If they were good early, chances are they'll be good the rest of the way, or at least good enough to make the playoffs.
He also believes that if a team struggles in the early going, it'll probably struggle the rest of the way to make the playoffs.
Once in a great while, a team rallies all the way back to make the playoffs. The San Jose Sharks did it the season they acquired Joe Thornton on Nov. 30, when they were mired in 10th place.
Mostly, though, teams that get out of the gate well are usually the ones left standing as May turns into June and the Stanley Cup playoffs get down to the final four. - Eric Duhatschek
Basically, I wondered: if I stepped through the standings data day by day, at which point would the point pace/win percentage become valuable for early prediction? Thankfully my day job is programming the infernal machine, and as you can see from the code snippet above, I wrote a quickie web spider/data processor to scour standings data from and this gave me a nice set of data points for every day within a given season.

Charts and methodology after the hop.

The Methodology

The basic idea is that I took every team's point pace for a given day (the points it would finish with over an 82-game season at its current points-per-game rate) and checked how many points, plus or minus, it was away from its final point pace. I then averaged this point difference across all teams for that day. The idea, of course, is that the point pace will match the eventual point pace more closely as the season goes on, eventually reaching zero difference.
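The calculation above can be sketched in a few lines of Python. This is a minimal sketch, not the spider's actual code: it assumes an 82-game season, that each day's standings are available as a dict mapping team to (points, games played), and that "plus or minus" means the absolute gap. The names `point_pace`, `daily_gap`, `gap_by_day` and the data layout are hypothetical. Since the charts also mention a standard deviation, the sketch returns the spread of the gaps across teams alongside the average; that is an assumed reading of what the charts plot.

```python
from statistics import mean, pstdev

SEASON_GAMES = 82  # assumption: a standard 82-game NHL regular season


def point_pace(points, games_played, season_games=SEASON_GAMES):
    """Projected season points if a team keeps its current points-per-game rate."""
    if games_played == 0:
        raise ValueError("point pace is undefined before a team has played")
    return points / games_played * season_games


def daily_gap(snapshot, final_points):
    """Average and population std dev of |pace - final| across all teams for one day.

    snapshot: hypothetical dict of team -> (points, games_played) on that day
    final_points: hypothetical dict of team -> final regular-season points
    (after 82 games, the final point pace is simply the final point total)
    """
    gaps = [abs(point_pace(pts, gp) - final_points[team])
            for team, (pts, gp) in snapshot.items()]
    return mean(gaps), pstdev(gaps)


def gap_by_day(daily_snapshots, final_points):
    """Map each day to (average gap, std dev), skipping days when some team has not played."""
    result = {}
    for day, snapshot in sorted(daily_snapshots.items()):
        if any(gp == 0 for _, gp in snapshot.values()):
            continue  # pace is undefined for a team with zero games played
        result[day] = daily_gap(snapshot, final_points)
    return result
```

With two toy teams that finish on 100 and 80 points, a 2-0 start for one and a 0-1-1 start for the other project to 164 and 41 points, so the average gap is 51.5; twenty games in, with paces of 82 and 102.5, it has shrunk to 20.25. Feeding one `gap_by_day` result per season into a chart gives the curves described below.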

The Charts

The X axis is the date within the season, and the Y axis is the average number of points by which teams in the league are off their final-season point pace.

Here is 2005:

As you can see, the first month is not a great indicator of final results. Our first low-water mark occurs about one month in, and the average as a whole (not just the standard deviation) settles down right around the Holland-predicted date. Here is 2006-2008:

More of the same for the next three years. The first week of December is usually where the predictive quality gives the best bang for the buck: the average gap is down to roughly 8-10% of the final point rates, and advancing another 2 months only improves that by an additional 2-3%. Finally, the two previous seasons:
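The "best bang for the buck" reasoning can be made mechanical. Below is a hedged Python sketch under two assumptions that are not stated in the post: the 8-10% figure is read as the average point gap expressed as a percentage of the league's average final point total, and the "bang for the buck" date is taken to be the earliest day after which waiting about two more months buys fewer than a few more percentage points. The function names and the `lookahead_days`/`min_gain` thresholds are hypothetical choices for illustration.

```python
from datetime import date, timedelta


def percent_gap(mean_gap, final_points):
    """Average point gap as a percentage of the league's average final point total."""
    avg_final = sum(final_points.values()) / len(final_points)
    return 100.0 * mean_gap / avg_final


def diminishing_returns_date(pct_by_day, lookahead_days=60, min_gain=3.0):
    """Earliest day after which waiting `lookahead_days` longer improves the
    percentage gap by less than `min_gain` points; None if no such day is found.

    pct_by_day: hypothetical dict of date -> percentage gap on that date
    """
    days = sorted(pct_by_day)
    for i, day in enumerate(days):
        target = day + timedelta(days=lookahead_days)
        # first recorded day at least `lookahead_days` after this one
        later = [d for d in days[i:] if d >= target]
        if not later:
            return None  # ran out of season to compare against
        gain = pct_by_day[day] - pct_by_day[later[0]]
        if gain < min_gain:
            return day
    return None


# Toy curve (made-up numbers, not the post's data)
curve = {
    date(2010, 10, 15): 30.0, date(2010, 11, 1): 20.0,
    date(2010, 11, 15): 14.0, date(2010, 12, 1): 9.0,
    date(2011, 1, 1): 7.5, date(2011, 2, 1): 6.5,
    date(2011, 3, 1): 5.0, date(2011, 4, 1): 0.0,
}
print(diminishing_returns_date(curve))  # 2010-12-01
```

On this made-up curve, December 1 is the first day where waiting until February only gains 2.5 percentage points. Raising `min_gain` or shortening `lookahead_days` moves the chosen date, so the thresholds are a judgment call rather than a fixed rule.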

More of the same here, except the teams are really tightly clustered now -- the standard deviation closely matches the league average, and once again that December 1-7 date is where the reverse hockey stick becomes a shaft instead of a blade (just what you wanted, a terrible global warming reference).


If the last two seasons are any indication, Ken Holland is a very smart man. We can see that during the first week of December, we have a very good idea (+/- 8 points) of where the majority of the teams will finish. If you are an Oilers fan, this is not the best news. If this logic holds up, we will need to hope we are one of those teams with a better-than-average second half of the season.

Ladies and Gentlemen, the Holland Date: December 1.

P.S. If anyone wants the raw data, or has some different ideas on how to use standing data, feel free to ask/suggest it below.

