Hello again! Today, I'm going to introduce a new metric for player analysis in college hockey, called Win Shares. My Win Shares metric is based on Hockey Reference's Point Shares, created by Justin Kubatko. I took Justin's approach and tweaked it for college hockey, given the extra information we have available to us. I'll give a basic description of Win Shares and share the tweaks I made to it. For the sake of efficiency, I won't write out the whole formula here. If you'd like to see the full formula for yourself, it's available on Hockey Reference (www.hockey-reference.com/about/point_shares.html). Also, a quick note: the following tweaks were done for NCAA players. As little data as exists for the NCAA, even less is out there for USports hockey. The USports Win Shares metric mostly sticks to the original formula without my tweaks.
A basic explanation of Win Shares (or Point Shares) is that it attempts to quantify an individual player's contribution to their team in three facets of the game: offense, defense, and goaltending. Skaters will have both an offensive and a defensive win shares number, while goalies will only have a goalie win shares number. It tries to boil down a player's different stat lines into one single metric for the purpose of comparison against other players in the same league (in this case, other college hockey players across the country). It isn't perfect by any means (I'll get into this later), and we have better metrics for evaluating performance at the NHL level. However, at the college hockey level, where data is limited compared to pro leagues, these numbers can offer some insight as long as you're careful about jumping to conclusions.
The first thing I changed was tuning the formula to team wins rather than team standings points. College hockey doesn't have a standardized point system at the national level, so I based the formula on the number of wins a team has, with a tie counting as half a win, just as it does for the NCAA.
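As a small illustration of the wins-based adjustment, a tie can be folded into a team's win total as half a win. The function name and the sample record here are made up for the example; they are not taken from my actual code.

```python
def win_equivalents(wins: int, ties: int) -> float:
    """Team win total with each tie counted as half a win."""
    return wins + 0.5 * ties


# Hypothetical 20-10-4 (W-L-T) record: 20 wins plus 4 ties worth half a win each.
print(win_equivalents(20, 4))  # 22.0
```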
The next thing I changed was the "Goals Created" stat used in the Offensive Point Shares metric. Hockey Reference says that an assist is worth half a goal for the sake of simplicity. I decided that I wanted to calculate the actual value of an assist, and use that as my coefficient. Drawing on the incredible work done by Shawn Ferris, Mike Murphy, and Dom Luszczyszyn, I decided to go with the same approach as their "Game Score" calculation. I totaled the number of goals and assists scored in the relevant season by every team. Then, I divided total goals by total assists to get the assist coefficient. I did this for every season that I scraped, as I wanted to capture how much an assist was worth from season to season. For 2019/20, I calculated an assist coefficient of 0.579 for the men, and 0.586 for the women. This meant that an assist was worth a little less than 60% of a goal. In addition to calculating new assist coefficients, I also added Corsi For to the formula. I wanted to also capture the offensive output of players who were driving play, but not necessarily gathering points. In the 2019/20 season, the Corsi coefficient for both men and women was 0.051, meaning one shot attempt was worth roughly 5% of a goal.
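Here is a rough sketch of how coefficients like these can be computed and applied. A season has more assists than goals, so a coefficient near 0.58 corresponds to league goals divided by league assists. Treating the Corsi coefficient the same way (goals per shot attempt) and combining the three terms as a simple weighted sum are assumptions made for illustration, as are the function names and the league totals, which were picked only so the results land near the 2019/20 men's values.

```python
def per_event_coefficient(total_goals: int, total_events: int) -> float:
    """Average number of goals per event (assist or shot attempt) in a season."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return total_goals / total_events


def weighted_offense(goals: int, assists: int, corsi_for: int,
                     assist_coef: float, corsi_coef: float) -> float:
    """Goal-equivalent offensive output for one player (assumed weighted sum)."""
    return goals + assist_coef * assists + corsi_coef * corsi_for


# Made-up league totals, not real NCAA data.
assist_coef = per_event_coefficient(total_goals=1000, total_events=1727)
corsi_coef = per_event_coefficient(total_goals=1000, total_events=19608)
print(round(assist_coef, 3), round(corsi_coef, 3))

# Hypothetical skater: 15 goals, 20 assists, 150 individual shot attempts.
print(weighted_offense(15, 20, 150, assist_coef, corsi_coef))
```

Recomputing the coefficients every season, as described above, lets the value of an assist or a shot attempt drift with the scoring environment instead of being fixed at one half.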
The last thing I changed was the proportion of team marginal goals against assigned to skaters, used to calculate Defensive Point Shares. The original formula compares team shots on goal against to league shots on goal against. I changed it to compare team Corsi against to league Corsi against. I felt that using Corsi instead of shots on goal more accurately reflected a team's defensive capabilities. We are able to do this since we have Corsi against data at the league and team level, just not at the individual level.
After all these adjustments, we are able to calculate our various Win Shares components. I put these into my player cards, which show a player's season in context, with both Win Shares numbers and rates for certain stats. All of these show a percentile as well, to compare the player to other college hockey players. The stars in the cards are assigned based on the percentiles, with each star adding another 20%. If a player has 5 gold stars in a certain category, that means he/she placed in the 96th percentile or higher in that stat. This was done to highlight the truly elite players in these categories. OWS stands for "Offensive Win Shares", DWS stands for "Defensive Win Shares", and TWS stands for "Total Win Shares", calculated by adding a skater's OWS and DWS. An example of a card is shown below:
I also want to address some of the cons of Win Shares. It doesn't take nearly enough into account to be comparable to something like EvolvingHockey's WAR at the NHL level. The defensive side uses plus/minus to adjust individual players. We all know how bad a stat plus/minus is compared to something even as simple as a player's CF%. Unfortunately, that's the absolute best we can get right now, since the only "advanced data" we get at the NCAA level is team CF and CA, and individual CF. There's no individual TOI or CA data publicly available, so we have to make do with the more traditional stats. Therefore, while this is the best we can do to boil down a college player's contribution into one number, we must be cautious when using that one number. Understanding it in context helps when making judgments on players. For example, goalie win shares tend to be inflated relative to those of their teammates. Another thing to look out for is good players having low defensive win share numbers due to their team giving up more goals than usual. Win Shares can be a very helpful tool when evaluating, but it's honestly just a start to truly measuring a player's ability.
As a thought exercise, let's use Win Shares to determine our picks for the Hobey Baker as well as the Patty Kazmaier awards. First, for the Patty Kaz, we completely agree with the actual pick. Élizabeth Giguère was far and away the best women's DI player the whole season, and our data agrees, attributing 4.875 win shares to her! For the men, going strictly by Win Shares, we would give the Hobey to Jeremy Swayman, Maine's stud junior goaltender. He racked up an astonishing 5.322 win shares this season. However, we do know that goalie win shares tend to be a bit inflated, and how valuable goaltenders are at different levels of the game is a matter of opinion. Therefore, if you didn't want to pick a goalie, we would go with Arizona State's Johnny Walker. He finished 2019/20 with 3.369 win shares, the best at the DI level among forwards and defensemen.
Thanks for reading! I hope you like the new logos for the site! I decided to put to use what limited artistic ability I had to make the site look a little nicer. I hope to get the actual WS data up on the site ASAP, once I figure out the best way to go about it. I also hope to implement the code to calculate Win Shares for USports soon! Stay tuned!
Thanks for waiting! Here you will find an explanation of all the content on the pages of this website! I'll outline what I'm doing now as well as what I plan to do in the future.
Pre-Game Prediction Model 1.0: Oliver
Oliver was built on data from games from 2014/15 to 2018/19. As each season concludes, I will add the data to our training set to improve Oliver for the next season. The data is scraped and gathered from College Hockey News (www.collegehockeynews.com). I owe them a huge amount of thanks, as the data they put on their website would not be available to the public without them purchasing it from the NCAA. Their articles are top-notch and the coverage they provide for the sport is unparalleled. I also based Oliver on the amazing model created by Peter Tanner at www.moneypuck.com. Please give his website a look for NHL content.
Oliver is designed to predict the winning percentage of teams based on several factors. The first factor is a team's season Close Corsi For %. This stat, boiled down, is how much a team out-shoots its opponents over the course of a season when the game is either tied or within one goal. This was the most predictive shot volume metric to use in Oliver, more so than overall Corsi For %.
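For readers new to the stat, Corsi For % is simply a team's share of all shot attempts. The sketch below assumes the attempt counts have already been filtered to close game states (tied or within one goal); the function name and the sample totals are hypothetical.

```python
def corsi_for_pct(corsi_for: int, corsi_against: int) -> float:
    """Share of all shot attempts taken by the team, as a percentage."""
    total = corsi_for + corsi_against
    if total == 0:
        raise ValueError("no shot attempts recorded")
    return 100.0 * corsi_for / total


# Hypothetical season totals in close game states.
print(corsi_for_pct(1100, 900))  # 55.0
```

A value above 50 means the team generated more shot attempts than it allowed while the game was close.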
The second factor I use is my calculated adjRating for each team. At its core, this is a goal efficiency metric that I created to stand in for Expected Goals. Since there is no standardized shot tracking in the NCAA beyond shot quantity, there's no way to create an expected goals model, as many have done for the NHL. What I wanted to do with my goal efficiency metric was capture part of what makes expected goals useful, which is how good a team is at scoring relative to its shot volume. To do this, I borrowed from Ken Pomeroy, who does college basketball stats at www.kenpom.com. I calculate a raw offensive rating (OR) and defensive rating (DR) for each team for each game. This is calculated as Goals For or Against / Corsi For or Against * 100. This is then averaged throughout the season to get a team's season offensive and defensive rating. Finally, I used ridge regression to adjust the ratings for strength of schedule to get what you see on my page. To interpret this, a team's adjusted OR is how many goals we would expect this team to score for every 100 shot attempts. A team's adjusted DR is how many goals we would expect this team to give up for every 100 shot attempts against. The final adjRating is calculated by subtracting a team's adjDR from its adjOR.
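The raw (unadjusted) part of this calculation can be sketched in a few lines. The game data is made up, and the sketch deliberately stops before the ridge regression step, so the final number is a raw rating difference rather than the adjRating shown on the site.

```python
from statistics import mean


def raw_rating(goals: int, shot_attempts: int) -> float:
    """Goals per 100 shot attempts in a single game."""
    if shot_attempts <= 0:
        raise ValueError("shot_attempts must be positive")
    return goals / shot_attempts * 100


# Hypothetical games: (goals_for, corsi_for, goals_against, corsi_against)
games = [(3, 60, 2, 50), (1, 50, 4, 80), (4, 80, 2, 40)]

# Per-game ratings averaged over the season, as described above.
season_or = mean(raw_rating(gf, cf) for gf, cf, _, _ in games)
season_dr = mean(raw_rating(ga, ca) for _, _, ga, ca in games)

# Raw analogue of adjRating: offensive rating minus defensive rating.
raw_net_rating = season_or - season_dr
print(season_or, season_dr, raw_net_rating)
```

Note that averaging per-game ratings weights every game equally, which is not the same as dividing season goals by season shot attempts.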
The final factors I use in Oliver are simple: PDO, team shooting percentage, and team save percentage. While these metrics are poor for properly evaluating the strength of a team, as they are primarily luck-driven, they are very predictive when calculating win probabilities, lending credence to the old adage that you have to be lucky to win.
With all these factors, I ran a multivariate regression with a team's win percentage as the target variable to calculate my coefficients. The specific coefficients will be published here over the summer as I complete the rewrite of my code. I then apply the coefficients to the current season's factors for each team to calculate a predicted winning percentage for each team. This "Expected Winning Percentage" has proven more predictive of future wins than a team's actual current winning percentage. I rank teams by their "Expected Winning Percentage", which can also be used to see which teams are over-performing or under-performing their actual underlying play.
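To make the regression step concrete, here is a minimal ordinary least squares sketch in plain Python. It is not Oliver's code: it uses only two of the factors (Close CF% as a fraction and adjRating), the training rows are synthetic, and the "true" coefficients are invented so the fit can be checked against them.

```python
def fit_ols(rows, targets):
    """Least squares fit via the normal equations; returns [intercept, b1, b2, ...]."""
    X = [[1.0, *row] for row in rows]  # prepend a column of ones for the intercept
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    aug = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; cannot solve")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def predict(coefs, row):
    """Expected winning percentage for one team's factor values."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


# Invented coefficients and synthetic team-seasons: (close_cf_fraction, adj_rating).
true_coefs = [0.10, 0.80, 0.02]
rows = [(0.50, 0.0), (0.55, 1.0), (0.45, -1.0), (0.52, 2.0), (0.48, -0.5)]
targets = [predict(true_coefs, row) for row in rows]

coefs = fit_ols(rows, targets)
print([round(c, 4) for c in coefs])
print(round(predict(coefs, (0.53, 0.5)), 3))
```

One thing to watch if PDO goes in alongside shooting and save percentage: PDO is the sum of those two, so the three columns are perfectly collinear and a plain least squares solve like this one would refuse to fit them together.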
Due to my novice coding abilities, I have had to manually track the results of my model, which I started doing in the middle of December. As of this writing, the model has gone 261-128 (67%) with 62 ties. I calculate win probabilities as the percent chance of a team winning, so I assume a tie can't occur (which it can). To calculate the probability of a tie, I would have to rework the whole model, especially since a tie requires a 65-minute game rather than the normal 60. Below is a look at the expected win percentage of each team vs. its actual win percentage as of 2/25/20:
On my website you can find my predictions for the next day's worth of games. I use each team's expected winning percentage to calculate the probability of each team winning the game. The graphic is fairly simple to decipher: teams with the higher probability are highlighted in green, and teams with the lower probability are highlighted in red. I want to store my data in a database, so that probabilities can be updated and shown in real time, and so that you can look at more games than just the next day's, as you can at MoneyPuck. Unfortunately, I am nowhere near competent enough at coding to build something like that, so that will have to be a future project for me.
These charts were made in Tableau from the data I scraped from CHN. Since there is no TOI data available at the team level, all team charts use data from the whole game and all game states. All player charts, on the other hand, are made using only data from even strength play, as this gives a better indication of a player's overall ability. I am always looking for ideas for more charts, so feel free to suggest them! Also, if you have any questions about interpreting the charts, please send them my way.
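A quick check of the arithmetic behind that record: the 67% figure counts only games that ended with a winner. The second number, which treats every tie as a miss, is added here purely for context and is not a figure I track.

```python
wins, losses, ties = 261, 128, 62

decided = wins + losses                  # games that produced a winner
accuracy = wins / decided                # ties excluded, as in the 67% figure
accuracy_ties_as_misses = wins / (decided + ties)

print(f"{accuracy:.1%} of decided games")
print(f"{accuracy_ties_as_misses:.1%} if every tie counts as a miss")
```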
Future Work and Ideas
The first thing I have planned is to rewrite my code base this offseason so I can update the website more efficiently. I kind of cobbled together the current version, so there are many things I can improve to cut down the time spent on the front end. Something that I also want to do is add in women's rankings. As you can see, my model is currently only used for the men, which shows on my rankings and predictions pages. The only place the women's teams and players show up is in the charts. When I set out to do this, I wanted to provide equal coverage to both the men and the women, something traditional media outlets do not. Unfortunately, with the whole transition of NEWHA from DII to DI, certain teams that would be important for scraping and calculating the full rankings don't show up on CHN. When I reached out about this, I was told that while the conference was DI, they still thought that the 4 missing teams were DII, which led them to exclude those teams from certain stats pages. They reached out to the NCAA for clarification, and hopefully we can get this fixed soon. As soon as they add those teams, I can implement my full model for the women's game, which will be fantastic!
I also want to improve my model in the future by adding a home ice advantage component. I first want to measure the effect of home ice and see whether it's important enough to add to my model. I also want to adjust my process to weight recent results more heavily. We know that a hockey season is long and teams change the way they play over the course of the season. I want to better capture that, as I think it will be more predictive. Finally, I want to get better at coding so I can store my data in a database rather than in a bunch of CSV files lying around. This will help me look at past results for the model, as well as run queries to see fun stuff, such as the biggest upsets of the season or the most one-sided games.
Thanks for reading! Again, if you have any questions or suggestions, feel free to go to the Contact page to get in touch! Also, take a look at my #CBJHAC slides below for an abridged version of this post!
I want to welcome everybody to this site where I will share all my work into the unexplored world of college hockey analytics! This post will explain a bit about me and how this site will work.
I am a junior currently studying Economics and Statistics at The Ohio State University in Columbus, OH (Go Bucks!). I want to work in the world of data analytics (hopefully, in sports) and have undertaken this project due to my huge interest in college hockey as well as in the world of hockey analytics. I also work as the video production intern for the Columbus Blue Jackets! Unfortunately, not much has been done for college hockey specifically by people much smarter than me, so I decided to do this myself to push the envelope on what's publicly out there, as well as to increase my own knowledge and skills! For anybody out there who wants to get involved in sports analytics but lacks the knowledge and skills to start, as I did, I recommend throwing yourself into a project that interests you. Nothing is a better teacher than practice in this field. I am also available to talk with anybody looking for help or guidance (take a look at the contact page)! Keep in mind, I am also a beginner in this process, continually learning as I go, so if I can't help you, I will point you in the direction of somebody who can!
This website will be home to both my own personal college hockey rankings and the advanced charts that I make. My rankings are completely objective, and my process for creating them will be detailed in one of my blog posts (hopefully to be presented and posted at the CBJ Hockey Analytics Conference on February 8, 2020). I am currently only ranking men's teams, but I am in the process of creating women's rankings as well. I use these rankings to compute my predictions. Using data from College Hockey News, I also make charts in the style of Sean Tierney (his site chartinghockey.ca is a fantastic resource) for both men's and women's D1 teams. Finally, I will try to post to the blog page regularly with analysis and thoughts about the current news and happenings of college hockey!
Thanks for visiting and I hope you enjoy!