How excited are we all for the season to start this week?!?! Wisconsin vs Notre Dame kicks us off on Friday, and over the coming months, we will be treated to both the men and women playing mostly conference-only schedules! Let's dive into how we at HockeyU will treat the upcoming season!
I planned to make my first improvements to the model, namely implementing a Monte Carlo simulation to more accurately sim the results of individual games, as well as the outcomes of things like standings, conference tourneys, and the NCAA tournament. Unfortunately, due to a combination of the pandemic and schoolwork piling up on me, I'll have to postpone these improvements until next season. We will continue to use Oliver in its current form for this season, with data from last season now included! With little to no non-conference play, we might see some difficulty in accurately reflecting team strength, since strength of schedule will be thrown off by the scheduling. However, the underlying metrics for each team should still paint a picture for us. I'm curious to compare! I'm also very hopeful that I can apply Oliver to the women's game! Hopefully CHN fully updates the stats pages for this season, which will allow me to include them fully.
Charts and Other Content
I will continue to update the charts as the season progresses! I plan to add a couple new charts, mostly related to special teams play, and those should roll out soon. Win Shares charts will also be updated every weekend, just like the scatter plots. I also plan to try and write more articles and blog posts to complement the graphics and charts I post on Twitter. Hopefully, this will engage you all and drive more eyeballs to the site! I'm currently also working on getting Win Shares data for past seasons up, as well as Scatter/Win Shares charts up for USports! After all, the maple leaf is in the logo :) As always, if you have any comments, suggestions, or feedback, please don't hesitate to talk to me through email or Twitter (both can be found on the About page).
Welcome back! Today, we are going to do a fun exercise inspired by Anne Tokarski's Ice Garden article where she selected all-conference teams for a hypothetical women's hockey tournament (link to the article). I'm going to do something similar here using my Win Shares metric as well as some other twists to make things interesting. First, I want to explain my Win Shares Roster card you may have seen me post on Twitter a little bit ago with my 19/20 Men's and Women's All Star Teams:
These cards were inspired by the NHL WAR roster generator cards created by @JFreshHockey on Twitter. I filled out the All-Star rosters by taking the top 12 forwards, top 6 defensemen/women, and the top 2 goalies. You might ask, why do the Win Shares numbers for some of the players on these cards not match the data I have published on the site? Well, these cards are designed to evaluate the strength of these players as a team, so ice time plays a role in the calculations. Players on the lower lines will see their Win Shares deflated, as they get less ice time in which to contribute. The cards are therefore designed to predict the strength of the team if they actually all took the ice and played together, rather than to highlight individual strengths.
For this blog post, we are going to do the same thing, but divide it up even further: we'll pick all-conference teams and see which conferences would field the top lineups. To make this more interesting, I set the rule that each conference team must have at least one player from every member school. As you'll see, this bumps out some players who otherwise truly deserve to be in the all-conference lineup. In light of this, I decided to at least show the first player to miss out on making the roster in each position group. If we played hypothetical tournaments with these teams, those players would be reserves in case of player injuries or dropouts. With that out of the way, let's get to the teams! We'll start with the women and then move on to the men.
Women's Hockey East
Women's DI Top Units
The rosters chosen reflect the overall conference strength, I believe, with the WCHA, ECAC, and Hockey East leading the way and the CHA and NEWHA bringing up the rear. The WCHA lineup is, on paper, the best team, led by the offensive firepower of Wisconsin, Minnesota, and Ohio State. Their closest rival, the ECAC team, is a model of balance, with top-tier players in every position group led by players from Clarkson, Cornell, and Princeton. Hockey East's strength comes from the back, leading the way at the defense and goalie positions. Northeastern dominates the lineup, with key additions coming from teams like BU, Providence, Maine, UConn, and BC. The CHA comes in 4th, with Mercyhurst names dotted all across the lines. Syracuse, Robert Morris, and Penn State players round out the team. Finally, NEWHA's team is the worst on paper, but there are some fantastic players leading the way for them to make noise, with Sacred Heart, LIU, and Franklin Pierce players controlling the lion's share of the spots.
Men's Atlantic Hockey
Men's Big Ten
Men's Hockey East
Men's DI Top Units
The men's rosters are more evenly balanced, with Hockey East making up Tier 1 by itself, the WCHA and NCHC making up Tier 2, and the Big Ten, ECAC, and Atlantic making up Tier 3. Hockey East has by far the best lineup on paper, anchored by an elite forward group and goalie tandem. BC players make up the majority of the roster, but elite players come from other schools like Providence, UMass, Maine, and BU. The WCHA and NCHC have nearly identical strength rosters here, with above-average forward groups, elite defensive groups, and below-average goaltending tandems. Looking at the players selected, it comes down mostly to North Dakota vs Minnesota State players, as expected, with the better defensemen going to the NCHC and the better goaltending duo going to the WCHA. The Big Ten leads the next tier on the back of its very strong goaltending tandem. No one school makes up most of the team, with Penn State and Ohio State leading the way in player count and Michigan and Michigan State following close behind. Following the Big Ten is the ECAC, with average forward and goaltending groups but the worst defensive group of any conference. Surprisingly, the top team, Cornell, only has 2 players, with Quinnipiac, Harvard, and Clarkson each contributing more. Finally, as expected, the Atlantic Hockey team comes in with the weakest roster, but not by as much as most would expect. Sacred Heart and AIC make up most of the picks for a lineup that's actually middle-tier for defensemen and goalies, but its forward group ranks last, which is what puts the team last on paper.
Hope you enjoyed this exercise! Feel free to share these if you'd like and I'd love to hear your feedback!
Hello again! Today, I'm going to introduce a new metric for player analysis in college hockey, called Win Shares. My Win Shares metric is based on Hockey Reference's Point Shares, created by Justin Kubatko. I took what Justin did and tweaked it for college hockey, given the extra information we have available to us. I'll get into a basic description of Win Shares, as well as share what tweaks I made to it. For the sake of efficiency, I won't write out the whole formula here. If you'd like to see the full formula for yourself, it's available on Hockey Reference (www.hockey-reference.com/about/point_shares.html). Also, a quick note: the following tweaks were done for NCAA players. As little data as exists for the NCAA, even less is out there for USports hockey. The USports Win Shares metric mostly sticks to the original formula without my tweaks.
A basic explanation of Win Shares (or Point Shares) is that it attempts to quantify an individual player's contribution to their team in three facets of the game: offensive, defensive, and goalie. Skaters will have both an offensive and a defensive win shares number, while goalies will only have the goalie win shares number attributed to them. It tries to boil down a player's different stat lines into one single metric for the purpose of comparison against other players in the same league (in this case, other college hockey players across the country). It isn't perfect by any means (I'll get into this later), and we have better metrics for evaluating performance at the NHL level. However, at the college hockey level, where data is limited compared to pro leagues, these numbers can shed some light as long as you're careful about jumping to conclusions.
The first thing I changed was tuning the formula to team wins rather than team standings points. I did this since college hockey doesn't have a standardized point system at the national level, so the formula is now based on the number of wins a team has, with a tie counting as half a win, just as the NCAA counts it.
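The tie-as-half-a-win convention described above can be sketched in a couple of lines. This is only an illustration: the function names are made up for this example, and the rest of the Point Shares formula is not reproduced here.

```python
def win_equivalents(wins: int, ties: int) -> float:
    """Team wins with each tie counted as half a win."""
    return wins + 0.5 * ties


def win_pct(wins: int, losses: int, ties: int) -> float:
    """Winning percentage under the same tie-as-half-a-win convention."""
    games = wins + losses + ties
    if games == 0:
        raise ValueError("team has not played any games")
    return win_equivalents(wins, ties) / games


if __name__ == "__main__":
    # Hypothetical 20-10-4 record, not a real team.
    print(win_equivalents(20, 4))
    print(round(win_pct(20, 10, 4), 3))
```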
The next thing I changed was the "Goals Created" stat used in the Offensive Point Shares metric. Hockey Reference says that an assist is worth half a goal for the sake of simplicity. I decided that I wanted to calculate the actual value of an assist and use that as my coefficient. Drawing on the incredible work done by Shawn Ferris, Mike Murphy, and Dom Luszczyszyn, I decided to go with the same approach as their "Game Score" calculation. I totaled the number of goals and assists scored in the relevant season by every team, then divided total goals by total assists to get the assist coefficient. I did this for every season that I scraped, as I wanted to capture how much an assist was worth from season to season. For 2019/20, I calculated an assist coefficient of 0.579 for the men, and 0.586 for the women. This means that an assist was worth a little less than 60% of a goal. In addition to calculating new assist coefficients, I also added Corsi For to the formula, since I wanted to capture the offensive output of players who were driving play but not necessarily gathering points. In the 2019/20 season, the Corsi coefficient for both men and women was 0.051, meaning one shot attempt was worth roughly 5% of a goal.
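Here is a rough sketch of the assist coefficient calculation. Since assists outnumber goals over a season, a coefficient below 1 like the 0.579 figure has to be total goals over total assists, so that is what the sketch computes. The team totals are made up, and the `goal_equivalents` weighted sum is only my assumption about how the coefficients might be combined; it is not the full Goals Created formula.

```python
def assist_coefficient(team_totals):
    """League-wide goals divided by league-wide assists.

    team_totals: iterable of (goals, assists) pairs, one per team.
    """
    team_totals = list(team_totals)
    total_goals = sum(goals for goals, _ in team_totals)
    total_assists = sum(assists for _, assists in team_totals)
    if total_assists == 0:
        raise ValueError("no assists recorded")
    return total_goals / total_assists


def goal_equivalents(goals, assists, corsi_for, assist_coef, corsi_coef):
    """Illustrative weighted sum only (an assumption, not the published formula)."""
    return goals + assist_coef * assists + corsi_coef * corsi_for


if __name__ == "__main__":
    # Two hypothetical teams: (goals, assists).
    coef = assist_coefficient([(100, 170), (120, 210)])
    print(round(coef, 3))
    # A hypothetical skater, using the 2019/20 men's coefficients from the post.
    print(round(goal_equivalents(10, 20, 100, 0.579, 0.051), 2))
```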
The last thing I changed was the proportion of team marginal goals against assigned to skaters, used to calculate Defensive Point Shares. In the original formula, it compared team shots on goal against to league shots on goal against. I changed it to team corsi against compared to league corsi against. I felt that using Corsi instead of shots on goal more accurately reflected a team's defensive capabilities. We were able to do this since we have corsi against data on the league and team level, just not at the individual level.
After all these adjustments, we are able to calculate our various Win Shares components. I put these into my player cards, which show a player's season in context, with both Win Shares numbers and rates for certain stats. All of these show a percentile as well, to compare the player to other college hockey players. The stars in the cards are assigned based on the percentiles, with each star adding another 20%. If a player has 5 gold stars in a certain category, that means he/she placed in the 96th percentile or higher in that stat. This was done to highlight the truly elite players in these categories. OWS stands for "Offensive Win Shares", DWS stands for "Defensive Win Shares", and TWS stands for "Total Win Shares", calculated by adding a skater's OWS and DWS. An example of a card is shown below:
I also want to address some of the cons of Win Shares. It doesn't take into account nearly enough to make it a stat similar to something like EvolvingHockey's WAR at the NHL level. The defensive side uses plus/minus to adjust individual players, and we all know how bad a stat plus/minus is compared to something even as simple as a player's CF%. Unfortunately, that's the absolute best we can do right now, since the only "advanced data" we get at the NCAA level is team CF and CA, and individual CF. There's no individual TOI or CA data publicly available, so we have to make do with the more traditional stats. Therefore, while this is the best we can do to boil down a college player's contribution into one number, we must be cautious when using that one number. Understanding it in context helps when making judgments on players. For example, goalie win shares tend to be inflated relative to their teammates'. Another thing to look out for is good players having low defensive win share numbers due to their team giving up more goals than usual. Win Shares can be a very helpful tool when evaluating players, but it's honestly just a start to truly measuring a player's ability.
As a thought exercise, let's use Win Shares to determine our picks for the Hobey Baker and Patty Kazmaier awards. First, for the Patty Kaz, we completely agree with the actual pick. Élizabeth Giguère was far and away the best women's DI player the whole season, and our data agrees, attributing 4.875 win shares to her! For the men, going strictly by Win Shares, we would give the Hobey to Jeremy Swayman, Maine's stud junior goaltender. He racked up an astonishing 5.322 win shares this season. However, we do know that goalie win shares tend to be a bit inflated, and how valuable goaltenders are at different levels of the game is a matter of opinion. Therefore, if you didn't want to pick a goalie, we would go with Bowling Green's Alec Rauhauser. He finished 2019/20 with 3.184 win shares, the best mark at the DI level among forwards and defensemen.
Thanks for reading! I hope you like the new logos for the site! I decided to put to use what limited artistic ability I had to make the site look a little nicer. I hope to get the actual WS data up on the site ASAP, once I figure out the best way to go about it. I also hope to implement the code to calculate Win Shares for USports soon! Stay tuned!
Thanks for waiting! Here you will find an explanation of all the content on the pages of this website! I'll outline what I'm doing now as well as what I plan to do in the future.
Pre-Game Prediction Model 1.0: Oliver
Oliver was built on data from games from 2014/15 to 2018/19. As each season concludes, I will add the data to our training set to improve Oliver for the next season. The data is scraped and gathered from College Hockey News (www.collegehockeynews.com). I owe them a huge amount of thanks, as the data they put on their website would not be available to the public without them purchasing it from the NCAA. Their articles are top-notch and the coverage they provide for the sport is unparalleled. I also based Oliver on the amazing model created by Peter Tanner at www.moneypuck.com. Please give his website a look for NHL content.
Oliver is designed to predict the winning percentage of teams based on several factors. The first factor is a team's season Close Corsi For %. This stat, boiled down, measures how much a team out-shoots its opponents over the course of a season when the game is either tied or within one goal. This was the most predictive shot volume metric to use in Oliver, more so than overall Corsi For %.
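To make the Close Corsi For % idea concrete, here is a minimal sketch using the usual CF / (CF + CA) definition. It assumes you already have each game's shot attempts for and against while the score was close; the function names and the game data are hypothetical.

```python
def corsi_for_pct(corsi_for: int, corsi_against: int) -> float:
    """Share of all shot attempts taken by the team, on a 0-100 scale."""
    total = corsi_for + corsi_against
    if total == 0:
        raise ValueError("no shot attempts recorded")
    return 100.0 * corsi_for / total


def season_close_cf_pct(games) -> float:
    """Season Close CF% from per-game (close_cf, close_ca) pairs.

    Attempts are summed across games before dividing, so games with
    more time spent in close score states carry more weight.
    """
    games = list(games)
    total_cf = sum(cf for cf, _ in games)
    total_ca = sum(ca for _, ca in games)
    return corsi_for_pct(total_cf, total_ca)


if __name__ == "__main__":
    # Three hypothetical games: (close-state attempts for, against).
    print(season_close_cf_pct([(20, 10), (10, 20), (30, 10)]))
```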
The second factor I use is my calculated adjRating for each team. At its core, this is a goal efficiency metric that I created to replace Expected Goals. Since there really is no standardized shot tracking in the NCAA, besides quantity, there's no way to create an expected goals model, as many have done in the NHL. What I wanted to do with my goal efficiency metric was capture a component of the usefulness of expected goals, which is how good a team is at scoring based on its shot volume. To do this, I borrowed from Ken Pomeroy, who does college basketball stats at www.kenpom.com. I calculate a raw offensive rating (OR) and defensive rating (DR) for each team for each game. This is calculated as Goals For or Against / Corsi For or Against * 100. This is then averaged throughout the season to get a team's season offensive and defensive rating. Finally, I used ridge regression to adjust the ratings to account for strength of schedule to get what you see on my page. To interpret this, a team's adjusted OR is how many goals we would expect this team to score for every 100 shot attempts. A team's adjusted DR is how many goals we would expect this team to give up for every 100 shot attempts against. The final adjRating is done by subtracting a team's adjDR from its adjOR.
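The raw (unadjusted) part of this calculation can be sketched as follows. The dictionary keys and game numbers are made up for illustration, and the ridge regression strength-of-schedule adjustment is deliberately left out, so the last value returned is a raw net rating rather than the adjRating shown on the site.

```python
from statistics import mean


def raw_rating(goals: int, shot_attempts: int) -> float:
    """Goals per 100 shot attempts for a single game."""
    if shot_attempts == 0:
        raise ValueError("no shot attempts in this game")
    return 100.0 * goals / shot_attempts


def season_raw_ratings(games):
    """Average the per-game ratings over a season.

    games: list of dicts with keys 'gf', 'ga', 'cf', 'ca'
    (goals for/against and Corsi for/against); the key names are
    hypothetical. Returns (raw OR, raw DR, raw OR minus raw DR).
    """
    offense = mean(raw_rating(g["gf"], g["cf"]) for g in games)
    defense = mean(raw_rating(g["ga"], g["ca"]) for g in games)
    return offense, defense, offense - defense


if __name__ == "__main__":
    sample = [
        {"gf": 3, "ga": 2, "cf": 60, "ca": 50},
        {"gf": 2, "ga": 1, "cf": 40, "ca": 50},
    ]
    print(season_raw_ratings(sample))
```

Averaging per-game ratings, as the post describes, weights every game equally regardless of shot volume; pooling goals and attempts over the season would be a different (also reasonable) choice.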
The final factors I use in Oliver are simple: PDO, team shooting percentage, and team save percentage. While these metrics are poor for properly evaluating the strength of a team, as they are primarily luck-driven, they are very predictive when trying to calculate win probabilities, lending credence to the old adage that you have to be lucky to win.
With all these factors, I ran a multivariate regression with a team's win percentage as the target variable to calculate my coefficients. The specific coefficients will be published here over the summer as I complete the rewrite of my code. I then use the coefficients with the current season's factors for each team to calculate a predicted winning percentage for each team. This "Expected Winning Percentage" has been calculated to be more predictive of future wins than a team's actual current winning percentage. I rank teams by their "Expected Winning Percentage", which can also be used to see which teams are over-performing or under-performing their actual underlying play.
Due to my novice coding abilities, I have had to manually track the results of my model, which I started doing in the middle of December. As of when I write this, the model has gone 261-128 (67%) with 62 ties. I calculate the win probabilities as the percent chance of a team to win, so I assume a tie can't occur (which it can). To calculate the probability of a tie, I would have to rework the whole model, especially since a tie requires a 65-minute game rather than the normal 60. Below is a look at the expected win percentage of each team vs. its actual win percentage as of 2/25/20:
On my website you can find my predictions for the next day's worth of games. I use each team's expected winning percentage to calculate the probability of each team winning the game. The graphic is fairly simple to decipher: teams with the higher probability are highlighted in green, and teams with the lower probability are highlighted in red. I want to store my data in a database so that probabilities can be updated and shown in real time, and so that you can look at more games than just the next day's, like what is found at MoneyPuck. Unfortunately, I am nowhere near competent enough at coding to build something like that, so it will have to be a future project for me.
These charts were made in Tableau from the data I scraped from CHN. Because no TOI data is available at the team level, all team charts use data from the whole game and all game states. All player charts, on the other hand, are made using only data from even strength play, as this gives a better indication of a player's overall ability. I am always looking for ideas for more charts, so feel free to suggest them! Also, if you have any questions about interpreting charts, please send me those inquiries.
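For reference, PDO is just shooting percentage plus save percentage. Here is a small sketch on a 0-100 scale per component (so an average team sits around 100); the function names and the shot totals are made up for illustration.

```python
def shooting_pct(goals_for: int, shots_for: int) -> float:
    """Percentage of the team's shots on goal that went in."""
    return 100.0 * goals_for / shots_for


def save_pct(goals_against: int, shots_against: int) -> float:
    """Percentage of opponent shots on goal that were stopped."""
    return 100.0 * (shots_against - goals_against) / shots_against


def pdo(goals_for, shots_for, goals_against, shots_against) -> float:
    """Shooting percentage plus save percentage."""
    return shooting_pct(goals_for, shots_for) + save_pct(goals_against, shots_against)


if __name__ == "__main__":
    # Hypothetical team: 10 goals on 100 shots, 8 goals allowed on 100 shots.
    print(pdo(10, 100, 8, 100))
```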
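The post doesn't spell out how two expected winning percentages get turned into a single-game probability. One common way to do it is Bill James's log5 formula, sketched below. To be clear, log5 is my stand-in here, not necessarily what Oliver does, and like the model it ignores ties.

```python
def log5(win_pct_a: float, win_pct_b: float) -> float:
    """Probability that team A beats team B under the log5 formula.

    Both inputs are winning percentages strictly between 0 and 1.
    """
    for pct in (win_pct_a, win_pct_b):
        if not 0.0 < pct < 1.0:
            raise ValueError("winning percentages must be between 0 and 1")
    a_wins = win_pct_a * (1.0 - win_pct_b)
    b_wins = win_pct_b * (1.0 - win_pct_a)
    return a_wins / (a_wins + b_wins)


if __name__ == "__main__":
    # Hypothetical matchup: expected winning percentages of .650 and .450.
    p_a = log5(0.65, 0.45)
    print(round(p_a, 3), round(1.0 - p_a, 3))
```

A nice property of log5 is that a team facing a .500 opponent gets exactly its own winning percentage back, and the two teams' probabilities always add up to 1.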
Future Work and Ideas
The first thing I have planned is to rewrite my code base this offseason so I can update the website more efficiently. I kind of cobbled together the current version, so there are many things I can improve to cut down the time spent on the front end. Something that I also want to do is add in women's rankings. As you can see, my model is currently only used for the men, which shows on my rankings and predictions pages. The only place the women's teams and players show up is in the charts. When I set out to do this, I wanted to provide equal coverage to both the men and the women, something traditional media outlets do not. Unfortunately, with the whole transition of NEWHA from DII to DI, certain teams don't show up on CHN that would be important for scraping and calculating the full rankings. When I reached out about this, I was told that while the conference was DI, they still thought that the 4 missing teams were DII, which led them to exclude them from certain stats pages. They reached out to the NCAA for clarification, and hopefully we can get this fixed soon. As soon as they add those teams, I can implement my full model for the women's game, which will be fantastic!
I also want to improve my model in the future by adding a home ice advantage component. I first want to measure the effect of home ice and see whether it's important enough to add to my model. I also want to adjust my process to weigh recent results more. We know that a hockey season is long and teams change the way they play over the course of the season. I want to better capture that, as I think it will be more predictive. Finally, I want to get better at coding so I can switch to storing my data in a database rather than in a bunch of CSV files lying around. This will help me look at past results for the model, as well as run queries to see fun stuff, such as the biggest upsets of the season or the most one-sided games.
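As a taste of what the database idea could look like, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, column names, and sample games are all hypothetical; "biggest upset" is defined here as the decided game whose winner had the lowest pre-game win probability.

```python
import sqlite3


def build_games_db(rows):
    """Create an in-memory database holding one row per predicted game."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE games (
            game_date     TEXT,
            home          TEXT,
            away          TEXT,
            home_win_prob REAL,    -- model's pre-game probability for the home team
            home_goals    INTEGER,
            away_goals    INTEGER
        )
        """
    )
    conn.executemany("INSERT INTO games VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def biggest_upsets(conn, limit=5):
    """Decided games ordered by the winner's pre-game probability, lowest first."""
    query = """
        SELECT game_date, home, away,
               CASE WHEN home_goals > away_goals
                    THEN home_win_prob
                    ELSE 1.0 - home_win_prob
               END AS winner_prob
        FROM games
        WHERE home_goals != away_goals   -- ties are skipped
        ORDER BY winner_prob ASC
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


if __name__ == "__main__":
    sample_rows = [
        ("2020-01-10", "Team A", "Team B", 0.80, 1, 3),
        ("2020-01-11", "Team C", "Team D", 0.55, 4, 2),
        ("2020-01-12", "Team E", "Team F", 0.30, 2, 2),
    ]
    db = build_games_db(sample_rows)
    for row in biggest_upsets(db):
        print(row)
```

The same table would support the "most one-sided games" query by ordering on the absolute goal difference instead.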
Thanks for reading! Again, if you have any questions or suggestions, feel free to go to the Contact page to get in touch! Also, take a look at my #CBJHAC slides below for an abridged version of this post!
I want to welcome everybody to this site where I will share all my work into the unexplored world of college hockey analytics! This post will explain a bit about me and how this site will work.
I am a junior currently studying Economics and Statistics at The Ohio State University in Columbus, OH (Go Bucks!). I want to work in the world of data analytics (hopefully, in sports) and have undertaken this project due to my huge interest in college hockey as well as in the world of hockey analytics. I also work as the video production intern for the Columbus Blue Jackets! Unfortunately, not much has been done for college hockey specifically by people much smarter than me, so I decided to do this myself to push the envelope on what's out there publicly, as well as to increase my own knowledge and skills! For anybody out there wanting to get involved in sports analytics but lacking the knowledge and skills to start, as I did, I recommend throwing yourself into a project that interests you. Nothing is a better teacher than practice in this field. I am also available to talk with anybody looking for help or guidance (take a look at the contact page)! Keep in mind, I am also a beginner in this process, continually learning as I go, so if I can't help you, I will point you in the direction of somebody who can!
This website will be home to both my own personal college hockey rankings as well as the advanced charts that I make. My rankings are completely objective, and my process for creating this will be detailed in one of my blog posts (hopefully to be presented and posted at the CBJ Hockey Analytics Conference on February 8, 2020). I currently am only ranking men's teams, but I am in the process of creating women's rankings as well. I use these rankings to compute my predictions. Using data from College Hockey News, I also make charts in the style of Sean Tierney (his site chartinghockey.ca is a fantastic resource) for both men's and women's D1 teams. Finally, I will try and post to the blog page regularly with analysis and thoughts about the current news and happenings of college hockey!
Thanks for visiting and I hope you enjoy!