# Sports Journalism


Since this is a sports-adjacent blog, I thought I would attach a few pieces of sports journalism that I think about often. In particular, I am interested in journalism rooted in the human experience – pieces that explore sport as a facet of the cultural landscape, how people relate to sport, and sport as a form of self-expression.

• Coming of age with a surfboard in Hawaii: finding passion and navigating the politics of the waters.

• The story of Ronnie O’Sullivan, perhaps the most talented snooker player ever, and his battle with the demons of childhood, a solitary sport, and the gifts he was given.

• Federer as a Religious Experience – DFW on watching Roger Federer play.

• The Silent Season of a Hero – a Joe DiMaggio profile.

• Katie Ledecky is the Present and Future of Swimming

• The Meaning of Serena Williams

• The Art of Pitching – contains my favourite poem about sports.

• Eliud Kipchoge is the Greatest Marathon Runner, Ever

• Defying Space and Time

# Podcast Episodes I like


A reference for myself, and anyone interested, of podcasts that have stuck with me over the years. Not so much a list of all the podcasts I like, but a catalogue of specific episodes I find myself thinking about often. I’ll try to periodically update it. If there is an episode of something that really stands out to you, please let me know so I can give it a listen.

This American Life

Cars

Reruns

Chips in my brain

The Problem We All Live With

Dr. Gilmer and Mr. Hyde

Three Miles

Americans in Paris

Five Women

Translation

Oliver Sacks: A Journey from Where to Where

On the Edge

Mau Mau

La Mancha Screwjob

Black Box

99 Percent Invisible

The Sunshine Hotel

Three Records from Sundown

Numbers Stations

The Stethoscope

H-Day

Punk Style

Greetings from Coney Island

Sesquipedalian

Photochemical

Heavyweight

Sven

Rachael

Soraya

Marchel

Mystery Show

Source Code

Kotter

Ear Hustle

Cellies

Looking Out

Invisibilia

The New Norm

How to become Batman

Emotions

Code Switch

A Year of Love and Struggle in a New High School

Detroit 1967: There is still debate over what to call it

Revisionist History

A polite word for liar

Narco Tours

Imaginary Worlds

Do you speak Conlang

Politics of Thrones

The Real Twin Peaks

Surprisingly Awesome

Pigeons

Free Throws

Planet Money

How Fake Money Saved Brazil

Why Coke Cost a Nickel for 70 years

The Experiment Experiment

The Allusionist

Evolution of Accents

Jennicam

The Man in the FBI Hat

The French Connection

On the Inside

The Heart

Talks

Darryl Leroux – “Now I am Métis: How White People Become Indigenous”

# (Cold) WAR Hypothesis Testing (Part 1 of ???)


“Day 102. The WAR wars rage on. Tentative fire fills the night sky, peace talks stalled as trenches dug deeper, charts pored over ever more furiously. Morale at an all-time low.”

• Dramatic, fake report from scout (Me) about a war that is more an unpleasant conversation than a battle.

For those of you not following along, hockey Twitter has lately been mired in controversy over the value of WAR, or Wins Above Replacement, metrics. The counterargument to such methods, as best I understand it, is as follows:

Wins Above Replacement metrics use overly technical, black-box regression models to produce a single-number metric. This obscures the process; it is unlikely to capture everything in a sport as fluid and poorly documented as hockey; it is hard to critique because it is so complex and arcane (which does not seem to have stopped anybody…); the outputs are sometimes weird; and error bounds are not displayed.

I don’t have the time or the brainpower to address everything, but others have done a good job (see the Brian MacDonald interview, for example). Suffice it to say, I agree partially with the critiques listed above, but I am much more interested in finding better ways to model hockey data than in discrediting an entire class of methods. To me, being an applied statistician is all about considering assumptions and limitations and working with institutional knowledge and data to make better models. Most of the process is about contextualizing what you are looking at and knowing how to use it. My point here is that the people making these models are probably more obsessed with limitations and context than their detractors, but it’s hard to express that clearly when the value of what you do is being attacked and straw-manned.

What I am trying to do with this piece is use the attention and discussion around WAR to make better models, or at least to give people tools to use them in new or better ways. Today that means showing how we can construct a test statistic to test whether one player has a significantly better model output than another using so-called black-box regression techniques. I’m not suggesting that every WAR list should come with some huge matrix comparing every permutation of hypotheses, or that this is necessary for WAR to be useful (it’s not), just addressing the claim that it can’t be done. It is not some fundamental problem with regression modeling; in fact, in my business of causal inference, it’s one of the reasons we turn to regression models. (Aside: this is much easier with regression than with a Corsi or counting-stat analysis. If someone has figured out valid error estimates for WOWY or something like that, I would love to see how the hell that works. My guess is that anti-WAR folk aren’t super into block bootstrapping methods.)

Consider a relatively simple WAR model (compared to, say, Manny Elk’s or Evolving Wild’s, anyway).

Value will be the sum of contributions to shot creation, shot suppression, goal creation, and goal suppression. Everything will be at 5-on-5. Notice I do not have any penalty-differential component, nothing to do with expected goals, special teams, etc. I will use OLS regression to estimate the impact for the four components, in two (sorta) different regressions. The data comes from shift data that I transformed from play-by-play data I downloaded at corsica.hockey. It is a small subset of data from the 2007-2008 season (a random sample equivalent to about 10 games). Many of these choices would not be great if I were trying to produce a serious model for real use, but they help me keep things concise, clear, and easy to code. What I describe can be extended to more complicated components and methods.

I have two linear models, one for goals and the other for shots. Notice that the regressor matrix X is the same in both; in this case it includes dummy variables for whether each player is on offense or on defense. Theoretically, X could include different covariates, and ideally it would also include state variables like score effects. The outcomes are recorded per shift, where a shift is defined as a stretch during which the same set of players is on the ice, beginning and ending with a personnel change or a faceoff.
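
To make that concrete, here is the setup just described in my own sketchy notation, with $s$ indexing shifts:

$Shots_s = X_s\beta^{shots} + \varepsilon_s, \qquad Goals_s = X_s\beta^{goals} + \eta_s$

Each column of $X$ is a dummy indicating that a particular player was on the ice on offense or on defense during shift $s$, so each player gets one offensive and one defensive coefficient per model.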

Now imagine we want to test the hypothesis that Jarome Iginla and Evgeni Malkin are equally valuable according to my (incomplete) WAR stat. How would we do that?

Let WAR for player $j$ be defined as the sum of his four estimated components:

$WAR_j = \hat{SC}_j + \hat{SS}_j + \hat{GC}_j + \hat{GS}_j$

What we want to test, then, is the null that $WAR_{JI} = WAR_{EM}$ against the alternative that the two are not equal.

Let’s start simpler: imagine we just want to compare the shot creation (SC) abilities of the two. The nice thing about this hypothesis is that the parameters we are interested in all live in the same regression model.

What we want is a test statistic that looks like this:

$t = \frac{\hat{SC}_{JI} - \hat{SC}_{EM}}{se\left(\hat{SC}_{JI} - \hat{SC}_{EM}\right)}$

We need the standard error of the difference in parameters to test this hypothesis. This is slightly different from testing whether a single parameter is different from zero, because the regression summary doesn’t automatically give us this information. Here are two ways you can get this test statistic.

First, let us just use the definition of the standard error as the square root of the estimated variance,

$se = \sqrt{\widehat{Var}}$,

where the residual variance estimate $\hat{\sigma}^2 = RSS/df$ uses the model’s degrees of freedom $df = n - p$, with $n$ the number of data points and $p$ the number of estimated parameters. By definition,

$Var(\hat{SC}_{Iginla}-\hat{SC}_{Malkin}) = Var(\hat{SC}_{Iginla}) + Var(\hat{SC}_{Malkin}) - 2\,Cov(\hat{SC}_{Iginla},\hat{SC}_{Malkin})$

We can extract all of this information directly from the covariance matrix that comes with the regression. As long as the regression method and package you use output a valid covariance matrix, you can always do this. Pretty much every type of regression provides one, and although you must of course be careful about assumptions, for many methods covariance estimators that require weaker assumptions exist in mature form.

```r
Xmat <- as.matrix(readRDS("Xmat.Rda"))
model1 <- lm(Fen ~ Xmat)

betas <- model1$coefficients
Malk <- betas[grepl("MALKIN", names(betas))]
Iggy <- betas[grepl("IGINLA", names(betas))]
search <- paste(c("MALKIN", "IGINLA"), collapse = "|")
sig <- summary(model1)$sigma

covmat <- summary(model1)$cov.unscaled
cov <- covmat[grepl(search, rownames(covmat)), grepl(search, colnames(covmat))]
```

Here we can see the estimates and the relevant portion of the covariance matrix. Now let’s make our test statistic for just the offense.

```r
se <- sig * sqrt(cov[1, 1] + cov[2, 2] - 2 * cov[1, 2])
t_stat <- (Iggy[1] - Malk[1]) / se
t_stat

## XmatJAROME.IGINLA
##          1.539289
```

Here the estimated t statistic is 1.54. At the typical 5% level, this isn’t significant for a two-tailed test. In general, though, we might be less interested in a formal hypothesis test than in getting some probability that one player is better than another, since we aren’t rolling out a drug or something where being very conservative is important; we just want to make the best decision possible regarding the relative values of players. In this case, 1.54 is around the 93rd percentile, so it seems fairly likely that Jarome Iginla had a larger effect on shot creation than Malkin in 2007-2008. (Again, not necessarily in real life, because this is a small subset of data with only a subset of the real coefficients.) You should also be able to see that we could extend this pretty easily to combine the shot creation and shot suppression estimates, because the covariance matrix covers both parameters of interest for each player.

There is another way to get this same test stat. Notice that we can always write $\beta_1 = \beta_2 + \alpha$, i.e. $\alpha = \beta_1 - \beta_2$. Consider an arbitrary regression of $Y$ on $X_1$ and $X_2$; substituting gives $Y = \alpha X_1 + \beta_2(X_1 + X_2) + \dots$. What this says is that if we simply add the covariates $X_1$ and $X_2$ together, make that sum a new variable, and drop $X_2$ as a regressor, then testing whether the parameter associated with $X_1$ is different from zero is the same as testing whether the difference between the two original coefficients is different from zero. The added bonus is that we do not have to mess around with the covariance matrix of the model; the desired hypothesis test will be right there in our summary output!

```r
newvar <- Xmat[, grepl("MALKIN", colnames(Xmat))] + Xmat[, grepl("IGINLA", colnames(Xmat))]
colnames(newvar) <- c("Iggy_plus_Malk", "Iggy_plus_Malk_Def")
Xmat2 <- cbind(Xmat[, -which(grepl("MALKIN", colnames(Xmat)))], newvar)
model2 <- lm(Fen ~ Xmat2)
betamat <- summary(model2)$coefficients

test <- betamat[grepl("IGINLA", rownames(betamat)), ]
test
```


Here we can see the estimated difference, and we get the same t-stat, 1.54, for the difference in shot creation. We also see a significant difference in shot suppression, represented by the second row. A negative value in this model means better defense (the player being on the ice is associated with fewer shots against).

Great, so we see that it is possible to do hypothesis testing between coefficients. If we want to test differences in shot creation + shot suppression, we can extend either of these two techniques, because the model estimates the covariance between all the regressors in it; a sketch of the general recipe is below.
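
For instance, here is how method one extends to an arbitrary linear combination $c'\beta$ of coefficients, using the objects defined above. The contrast below, which adds each player’s offense and defense dummies, is purely illustrative; check `names(betas)` for the actual ordering, and think about signs, since negative defensive coefficients are good.

```r
# The standard error of c'beta is sigma * sqrt(c' V c),
# where V is the unscaled covariance matrix of the regression.
V <- summary(model1)$cov.unscaled
contr <- setNames(rep(0, length(betas)), names(betas))
contr[grepl("IGINLA", names(contr))] <-  1   # Iginla's offense and defense dummies
contr[grepl("MALKIN", names(contr))] <- -1   # minus Malkin's
est <- sum(contr * betas)
se  <- sig * sqrt(as.numeric(t(contr) %*% V %*% contr))
est / se  # t statistic for H0: c'beta = 0
```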

If you haven’t seen this stuff before and are following along, you might be scratching your head about what to do if we want to test a difference between sums of components from different models. If we use two different models, a covariance matrix between coefficients from different models won’t magically appear, will it?

Yes, that is true, but we can still do it. It becomes trickier, of course; what we have to do is think very carefully about the errors in the different models. I am going to cut this off here because it is already too long, but I will try to explain how we do that soon. Spoiler: we might even gain efficiency in the process.

Data: Corsica

Please go check out, or even better, support Corsica and/or Evolving Wild; links to their Patreons are below. Both of these websites are amazing resources for people trying to do hockey analytics, and the creators are doing really cutting-edge work to make WAR models better all the time.

Corsica Patreon

Evolving Wild Patreon

And while I am plugging cool analytics projects, please check out these Patreons as well!

Hockey Viz

The Ice Garden Podcast

# Zenon Ranking Plots


This is a quick post visualizing my new Zenon Rankings for faceoffs. For an introduction to my Zenon Rankings and Elo-type models see this post.

The plot below shows 8 players throughout the 2015-16 season. As mentioned in the introductory post, my model assumes that players start with an initial ranking of 1500 and a deviation of 50. As players play more and more games, their deviation parameters drop, because we deem their rankings more and more reliable. Large deviations also mean that we adjust rankings more aggressively in response to results.

Winning faceoffs, particularly against tough opponents, leads to rating increases, and losing lots of faceoffs, especially to weak opponents, causes decreases. In my other posts I focused on displaying the final Zenon Rankings, i.e. the ranking for each player after he took his last faceoff in the time frame of the model. This captures only some of the information an Elo-type model can give us. A well-specified Elo-type model can detect improvements and declines during the season, and sometimes blending these rankings over a time period can be more informative about actual ability than the final rankings alone.

Interactive Version

The bolded line represents the weighted average Zenon Ranking of 1507. Players that ended the season above the line are considered above-average drawmen by the model and are expected to win more than 50% of their draws against neutral competition. The opposite is true of those below the line. The numbers on the x-axis represent days during the 2015-16 season; there were 226 days in the 2015-16 season, including the post-season, on which games were played. As you can see by looking at the Nuge or McDavid, who played on teams that missed the playoffs, the post-season began at approximately day 180.

All of the players in the graph start at the same point, but they quickly diverge based on their initial results, due to the deviance parameter. Although these initial divergences are drastic, they are not condemning, in the sense that it is very possible to recover from a low ranking early in the season; just look at the green line depicting Tyler Seguin for an example. On average, however, players that diverge downwards quickly do not end up above average by the end of the season. This is part of the reason the Glicko model outperformed a standard Elo model in this application: Elo models do not have deviance parameters, so all players are adjusted equally regardless of the number of faceoffs they have taken.

Looking at the purple line representing Evgeni Malkin, we see that when he was hurt, over two stretches, his ranking stays constant: when a player isn’t playing, the model doesn’t update his ranking or deviation parameter. Second, we can see that he seemed to improve somewhat during the playoffs, as represented by the uptick at the end of the season. It is too early to tell exactly how sensitive this model is to real improvements, but assuming the model is a decent measure of actual ability (it does have some predictive power, as outlined in the introduction post), we can say that Malkin was a better drawman in the playoffs than in the period just before he was injured. Whether this is because he was taking draws more seriously or playing injury-free is another matter.

The red line represents Leon Draisaitl’s ranking after each day of the 2015-16 season, including playoffs. The dark grey error bars are a 95% confidence region based on Draisaitl’s deviance parameter and ranking. Here we can see that Draisaitl is approximately league average at taking faceoffs.

Here is a visualization of how the deviation parameter works. At the beginning, we assume that Leon Draisaitl is highly likely to have a true value between 1400 and 1600. As he actually takes faceoffs, his rating adjusts and his deviation shrinks. The error bars around the red line represent 95% confidence regions based on the deviance parameter. The dotted line on the graph roughly marks the beginning of the playoffs. Since the Oilers didn’t make the playoffs, Draisaitl took no more faceoffs, and accordingly neither his ranking nor his deviation changed. But even before the dotted line, we can see that his deviance had become less volatile. One feature Mark Glickman originally recommended for the Glicko model is some sort of lower bound or slowing mechanism on the deviance parameter, so that the model doesn’t reach a state of stasis. We can see the consequences of this specification visually in the above graph.

All of the 2015-16 Zenon Rankings are here

For the complete rankings for 2008-2016 check here

And for a refresher on the Zenon Model check here.

If there are any specific players or teams that you would like me to take a look into, just let me know and I’ll get back to you. I’m always looking for feedback on what users find useful.

# Zenon Rankings 2015-16


These are the Zenon Faceoff Rankings for the complete 2015-16 season. For an explanation of what Zenon Rankings are see here.

For the complete rankings for 2008-2016 see here.

For some Zenon Ranking plots check here.

# Zenon Rankings 2008-2016


This is simply the full list of players who took more than 50 faceoffs from the 2008-09 season through 2015-16, sorted by the lower bound. Let me know if you see something interesting.

An explanation of the Zenon Rankings for faceoffs can be found here.

# Introducing an Elo-type model for NHL Faceoffs (Zenon Rankings)

Elo Ratings

Elo ratings were created by the Hungarian-born American physicist Arpad Elo in order to compare the strengths of chess players. The idea, in its simplest form, is that chess players should get large rating boosts if they beat a really tough opponent they were expected to lose to, and should lose rating points if they lose to a relatively weak opponent.

If two roughly equally matched opponents play each other, the adjustment should be small, because the initial ratings suggested that the probability of a win for either player was roughly 50%. Given those prior ratings, neither result should surprise our model, and we should not update the new ratings very much, since the model more or less expected the result that occurred.

Elo models take many different forms and can even be extended to include margin of victory. To use the last NHL season as an example, let’s say our model in the middle of the 2015-16 season had the Washington Capitals as the strongest team in the league and the Edmonton Oilers near the bottom (STRICTLY HYPOTHETICAL). We expect Washington to win, so we should only increase their rating if they outperform our model’s expectations. Let’s say the model predicts that Washington should win 4-2. If they win exactly 4-2, we shouldn’t change any ratings; Washington and Edmonton performed exactly like the teams we thought they were.

You can see, then, that in this system it is possible for a win to result in a ratings decrease and for a loss to increase a team’s ranking. If Washington beats Edmonton 3-2, Edmonton outperformed their expectation, and we should increase their rating to reflect the fact that they played like a slightly better team than we previously thought.

For faceoffs there is no margin of victory. The way we currently record the stat is simply win or loss, no shades of grey, so we will be sticking to a simple Elo-type model without margins. That being said, the key idea in the model is still all about under- or over-performing relative to our prior expectations and updating our rankings accordingly.

Parameters

Saying that this is an Elo-type model still doesn’t mean very much, because there are many variations, and a few different parameter choices can make two Elo models very different.

One of the most important choices for any Elo-type ranking is the k-factor. Basically, the k-factor tells your model how big a single rating adjustment can be. To return to the Oilers and Capitals example, a really large k-factor would mean that if the Oilers beat the higher-rated Capitals, the Oilers should gain a bunch of points and maybe even place ahead of the Capitals in our post-game rankings. In hockey, of course, we know that single games aren’t very meaningful, and that on any given night a relatively bad team can beat the best team in the league without the better team’s status being revoked. Individual hockey games are highly luck-driven, and even the most hard-line, old-school fans will at a minimum agree that it makes sense to play more than one game in a playoff series to determine the better team. In a game with very little luck, we would want a high k-value, so the model can adjust to real improvements or setbacks appropriately and quickly. In a game with lots of luck, we want a smaller k-value, so the model doesn’t over-react to noisy, luck-driven results in the short term.
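
To make the k-factor concrete, here is the standard Elo update rule in a few lines of R. This is a sketch using the usual chess-style constants of 400 and base 10; they are illustrative, not the exact constants of my faceoff model.

```r
# Update ratings after a game: the winner gains k times the "surprise"
# (1 minus their pre-game win probability); the loser drops by the same.
elo_update <- function(r_winner, r_loser, k) {
  expected <- 1 / (1 + 10 ^ ((r_loser - r_winner) / 400))  # P(winner wins)
  surprise <- 1 - expected
  c(winner = r_winner + k * surprise,
    loser  = r_loser  - k * surprise)
}

elo_update(1500, 1600, k = 32)  # a big upset moves ratings a lot...
elo_update(1500, 1600, k = 1)   # ...but barely at all with a low k-factor
```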

My original Elo uses a k-factor of about 1[1], which is pretty low for these kinds of models. For reference, FiveThirtyEight, which is famous for adapting Elo models to rank players and teams in a variety of sports, uses 4 as its k-factor for baseball[2], 20 for the NFL[3], and 20 for the NBA[4]. Chess ratings tend to use somewhere in the range of 20-32, and often have separate k-factors depending on pre-game rating (i.e. highly rated players have lower k-factors, so that their ratings are less prone to abrupt changes).

It makes some sense that faceoffs would have such a low k-factor: nearly every centre has a win percentage in the 45-55% range, and even defensemen, who are unlikely to do any real training, win roughly 35% of the draws they take. Individual faceoffs are highly luck-driven, and any given result should not be weighted too heavily. Beating Patrice Bergeron in a one-off does not make you the king of faceoffs, and similarly my expectations of a centreman’s ability should not drop much in the face of a single loss, even to a no-name defenseman.

Another choice to be made is the initial ranking of a player that your model has never seen before. I chose 1500, largely in accordance with convention, but determined the starting deviation of 50 using the validation dataset.

I also ended up not using Elo itself, but rather Glicko, a model inspired by the original Elo model. It is more or less the same, with an additional parameter representing deviation, or how much we should trust the rating it produces. Players who take many faceoffs will have low deviations, and players who take only the occasional faceoff will have high deviations. The idea is that players with high deviations should be adjusted somewhat faster than those with low deviations, because we trust their rankings less. As players play more games, the deviation falls, because we become more and more confident that the ranking is an accurate reflection of the player’s true ability.

The deviations also translate nicely into confidence regions (well, central posterior intervals, if you care about that kind of thing), and in the validation stage I found Glicko to be slightly better at making predictions on unseen data. This was largely because poor faceoffmen, who tend not to take many draws, were dragged far below 1500 quickly, without introducing extra volatility into the rankings of established centers. In a true Elo model, the only way to make the model adjust more quickly is to increase the k-factor, but the unintended consequence can be that the final rankings are too heavily influenced by the last few contests and are not a reflection of the sample as a whole. In the case of faceoffs, the dual ability to change the rankings of new centermen quickly without weighing the recent past too heavily turns out to be a predictive advantage. If you want more math, check out the original paper here.
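
To make this concrete, here is a minimal sketch, not my production code, of fitting a Glicko model to faceoff results with the PlayerRatings package. The data frame and its column names are made up for illustration; the real input is one row per faceoff.

```r
library(PlayerRatings)

# Toy input in the (time, player1, player2, result) layout glicko()
# expects. result = 1 means the first player won the draw.
faceoffs <- data.frame(
  week    = c(1, 1, 2, 2),
  center1 = c("PATRICE.BERGERON", "JONATHAN.TOEWS", "PATRICE.BERGERON", "MANNY.MALHOTRA"),
  center2 = c("MANNY.MALHOTRA", "PAUL.GAUSTAD", "JONATHAN.TOEWS", "PAUL.GAUSTAD"),
  result  = c(1, 0, 1, 1),
  stringsAsFactors = FALSE
)

# init = c(1500, 50): unseen players start at a 1500 ranking with a
# deviation of 50, as described above. history = TRUE keeps the
# week-by-week trajectories used in the ranking plots.
ratings <- glicko(faceoffs, init = c(1500, 50), history = TRUE)
ratings
```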

In general, for a new application, the only way to choose a k-factor is to test it on unseen data. I used the data from four seasons, 2008-09 through 2011-12, as the validation data. The validation process I used is described in italics below.

*I used 5-fold validation: I shuffled the data, split it randomly into 5 groups, trained on four of the groups, and tested on the remaining one, repeating so that each of the five folds was left out once. The test metric was an averaged capped binomial deviance (with RMSE for further validation), to see how well the different parameters performed on the unseen data. My model isn’t amazing at making predictions; in fact, it is only a bit more than 1.2% better than guessing 50/50 for every single faceoff. If I have time I will try to repeat the results from the Schuckers, Pasquali, and Curro paper and test my model against their logistic regression, but their results show that simple winning percentage is already a decent enough metric and that substantial gains are unlikely.*
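
In code, the loop looked roughly like the sketch below. This is a rough reconstruction, not my exact tuning script: `faceoffs` is the hypothetical data frame from above, and capping predictions at 1%/99% is one common convention for a capped binomial deviance.

```r
set.seed(1)
fold <- sample(rep(1:5, length.out = nrow(faceoffs)))   # random 5-fold split

dev_by_fold <- sapply(1:5, function(k) {
  train <- faceoffs[fold != k, ]
  test  <- faceoffs[fold == k, ]
  fit   <- glicko(train, init = c(1500, 50))             # candidate parameters
  p     <- predict(fit, test[, 1:3], trat = c(1500, 50)) # P(first player wins)
  p     <- pmin(pmax(p, 0.01), 0.99)                     # cap extreme predictions
  -mean(test$result * log(p) + (1 - test$result) * log(1 - p))
})
mean(dev_by_fold)  # compare this across parameter choices
```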

My model also made a few small adjustments for strength state (PP or SH), home-ice advantage, and whether or not the opponent was a player we wouldn’t normally expect to take faceoffs[5].

Running the model on all of the data I have from the 2008-09 season all the way through 2015-16, here are the top 30 drawmen with a minimum of 50 draws taken[6]:

| # | Player | Zenon Ranking | Lower Bound | Upper Bound | Faceoff% | Deviation | Faceoffs |
|---|---|---|---|---|---|---|---|
| 1 | JEREMY.COLLITON | 1576 | 1526 | 1626 | 62.99% | 25.0 | 154 |
| 2 | RADEK.BONK | 1575 | 1550 | 1600 | 59.92% | 12.7 | 751 |
| 3 | MICHAEL.ZIGOMANIS | 1575 | 1536 | 1614 | 61.51% | 19.6 | 291 |
| 4 | MANNY.MALHOTRA | 1574 | 1561 | 1586 | 59.84% | 6.1 | 6593 |
| 5 | ZENON.KONOPKA | 1572 | 1557 | 1587 | 59.81% | 7.6 | 2884 |
| 6 | ROD.BRIND’AMOUR | 1570 | 1555 | 1585 | 59.39% | 7.4 | 2556 |
| 7 | SCOTT.NICHOL | 1566 | 1551 | 1582 | 58.51% | 7.8 | 2608 |
| 8 | TRENT.WHITFIELD | 1564 | 1522 | 1607 | 60.09% | 21.1 | 233 |
| 9 | DAVID.STECKEL | 1564 | 1551 | 1578 | 58.86% | 6.8 | 4565 |
| 10 | PATRICE.BERGERON | 1561 | 1551 | 1570 | 58.43% | 4.8 | 13857 |
| 11 | BOBBY.HOLIK | 1560 | 1531 | 1590 | 58.54% | 14.9 | 521 |
| 12 | KRIS.DRAPER | 1560 | 1541 | 1579 | 57.54% | 9.4 | 1618 |
| 13 | JONATHAN.TOEWS | 1559 | 1549 | 1569 | 57.29% | 4.8 | 13881 |
| 14 | CHRIS.GRATTON | 1559 | 1516 | 1601 | 58.52% | 21.3 | 229 |
| 15 | VLADIMIR.SOBOTKA | 1556 | 1542 | 1570 | 56.95% | 7.0 | 3036 |
| 16 | JERRED.SMITHSON | 1555 | 1541 | 1569 | 55.90% | 7.0 | 3798 |
| 17 | JAMIE.MCGINN | 1555 | 1488 | 1622 | 62.50% | 33.6 | 64 |
| 18 | ADAM.HALL | 1555 | 1539 | 1571 | 56.51% | 7.9 | 2458 |
| 19 | ANDY.MCDONALD | 1555 | 1536 | 1574 | 56.25% | 9.5 | 1591 |
| 20 | PAUL.GAUSTAD | 1555 | 1543 | 1566 | 57.05% | 5.8 | 8377 |
| 21 | BOYD.GORDON | 1555 | 1543 | 1566 | 57.39% | 5.7 | 7768 |
| 22 | JAMAL.MAYERS | 1553 | 1534 | 1572 | 55.96% | 9.5 | 1569 |
| 23 | MIKE.SILLINGER | 1551 | 1497 | 1604 | 58.20% | 26.8 | 122 |
| 24 | JIM.SLATER | 1550 | 1536 | 1564 | 56.58% | 7.0 | 3694 |
| 25 | BRAD.MILLS | 1550 | 1496 | 1604 | 60.33% | 27.1 | 121 |
| 26 | RICH.PEVERLEY | 1548 | 1535 | 1561 | 56.03% | 6.4 | 4755 |
| 27 | RYAN.KESLER | 1548 | 1538 | 1557 | 55.49% | 4.9 | 11968 |
| 28 | ANTOINE.VERMETTE | 1546 | 1536 | 1556 | 56.02% | 5.1 | 11548 |
| 29 | JOE.THORNTON | 1546 | 1534 | 1557 | 55.74% | 5.8 | 9561 |
| 30 | WARREN.PETERS | 1543 | 1517 | 1569 | 55.49% | 12.9 | 719 |

How to Interpret these Results

A ranking of roughly 1500 is approximately league average, although because Glicko is not exactly zero-sum like Elo (the deviation parameter changes the way rankings are updated slightly), the weighted average is closer to 1505. These rankings can be translated back into win probabilities: a 20-point gap between players is worth approximately 3% (though in actuality the probability calculation also depends slightly on the deviation parameters). To take a concrete example, this model predicts that Manny Malhotra (ranking: 1574) would beat Jonathan Toews (ranking: 1559) about 52.15% of the time. Eventually I will include an adjusted faceoff percentage based on the rankings, which would be like an expected percentage against a league-average player, but a few details still need to be worked out[7].
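
For the curious, the prediction formula from Glickman’s paper (assuming, as I do here, that both players’ deviations enter through $g$) is

$P(i \text{ beats } j) = \frac{1}{1 + 10^{-g\left(\sqrt{RD_i^2 + RD_j^2}\right)(r_i - r_j)/400}}, \qquad g(RD) = \frac{1}{\sqrt{1 + 3q^2RD^2/\pi^2}}, \qquad q = \frac{\ln 10}{400}$

Plugging in Malhotra (1574, deviation 6.1) and Toews (1559, deviation 4.8) gives $g \approx 1$ and a win probability of about 52.2%, which is where the number above comes from.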

Notice also that I have included the lower and upper bounds, using the final deviation parameter for each player. In future rankings I think I will list players by lower bound, to more explicitly account for the fact that players who take a bunch of draws are more likely to have reliable rankings. For example, we can see that Jeremy Colliton has the highest ranking, but his lower bound is much lower than those of the next 5 or so guys, because he took relatively few faceoffs.

The top 30 is a bit of a mixed bag: some current all-stars like Toews and Bergeron, solid two-way Olympian Rod “the Bod” Brind’Amour (that’s a real nickname, btw), some no-namers like Jeremy Colliton and Brad Mills, as well as former Oiler fourth-liners Boyd Gordon and Jerred Smithson.

The guy that immediately popped out at me as interesting was Zenon Konopka. He most recently played in Poland, after bouncing around with 6 NHL teams from 2003-2014 and finishing the 2013-14 campaign in Buffalo with 23 games played and a single assist to his name. In the NHL the guy largely took on an enforcer’s role, racking up 1082 PIMs in just 346 games. But in the AHL he showed that he could score some goals, putting up an impressive 11 goals and 29 points in 19 playoff games for the Portland Pirates in 2005-06. And this wasn’t a one-off: he had several AHL seasons near a point a game, and he finished his junior career in the OHL with an 86-point season. Not bad for a guy averaging roughly 3 PIMs a game in the NHL and a goal every 30 or so.

He mentions his faceoff prowess, and using faceoffs to find an edge, 3 or 4 times in the bio on his personal website, and he seems genuinely committed to the craft. Off the ice, Zenon is the owner of several businesses, including a wine products company, and he is a dedicated pet owner to his beloved rabbit, Hoppy, whom he has taken from team to team since 2006. So in honour of Mr. Konopka, the NHL enforcer with the silky AHL mitts, and his best bud Hoppy, I’d like to call these Zenon Rankings!

Link to the full rankings list for 2008-16 here

Link to the full 2015-16 rankings here

Link to the Zenon Plots article, about plots like the one featured below, here

Credits, Acknowledgments

All the data for this project came from the play-by-play data on corsica.hockey. It’s a great site that you should check out if you haven’t already, and an excellent data source, especially for those like me who are not pros at scraping their own data. If you really like it, consider contributing to the Corsica Patreon to keep it rolling. The hockey analytics community relies heavily on websites like Corsica, and we can’t afford to lose another good one.

I used the PlayerRatings R package to implement the models. I also owe a bunch to Michael Schuckers, Tom Pasquali, and Jim Curro for their extensive work on faceoffs, and to the Glicko creator, Mark Glickman.

Footnotes and additional model details (Nerds Only)

# Kobe Bryant Buzzer Shots

Over the past 10 days or so, I participated in a data competition on Kaggle. The challenge was to predict how likely Kobe Bryant was to make each of 5,000 shots, given a training set of a little over 25,000. While I was messing around with selecting features for my models, I became sidetracked with understanding how the last few seconds of a basketball quarter differ from the rest of the game and what that means for a shooter like Kobe. I thought I’d share a few visualizations from this process.

To start, the last seconds of a close basketball game can be breathtaking, the stuff of legends. Even casual fans like me know about Jordan’s last shot, where he broke Bryon Russell’s ankles to clinch the 1998 title.

And, at least anecdotally speaking, the last seconds of an NBA game seem to operate differently. The flow of the game changes, as trailing teams look for quick possessions to chip away at leads and leading teams look to manage the clock. Occasionally the final seconds produce odds-defying shots, launched from the back court under heavy pressure, that find the net to the cheers or gasps of the crowd. And with all this myth-making and hype around the final seconds comes the idea that some players perform better under this pressure than others: some players are considered clutch winners, like Jordan, while others are considered “absent from the big moments”.

So this is a post visualizing some of the ways the final seconds are different for Kobe Bryant. I am limited by my understanding of the NBA, my knowledge of advanced basketball stats, and the Kobe-specific data set the Kaggle competition happened to provide, so in no way will this offer hard answers to finer questions about professional basketball. What I hope to do, at least, is document the ways Kobe changes his behaviour as the clock counts down, and see if there is any unexplained greatness or weakness in his game.

This next graph I made is what motivated me to look into this more closely.

Shots are binned in groups of 5 seconds. The larger the bubble, the more shots were taken in the bin. The more opaque the colour the farther away those shots were taken on average. Red represents the last 5 seconds of the quarter.

What we can see is that shots in the last 5 seconds are huge outliers. Kobe Bryant made 44.6% of the shots in our sample, but in the last 5 seconds of a quarter this drops to just 25.4%. Additionally, as we can see from the large bubble, Kobe is much more likely to throw up a shot in the last 5 seconds than in any other 5-second interval.

Shot Selection

To establish a bit of context, the first thing I looked at was how different shot types vary in accuracy with distance. In the data set there are six basic types of shots: bank shot, dunk, hook shot, jump shot, layup, and tip shot.

Interactive Version of Graph 2

A few things to note from the graph. Different shots have different probabilities of going in: dunks go in more than 90% of the time, while hook shots are pretty hit or miss. Next, the only type of shot with an obvious trend in distance is the jump shot, which makes sense; jump shots should get progressively more difficult as you get farther from the basket. That is probably true for layups and dunks as well, but past a certain distance we don’t see any dunk or layup attempts, for obvious reasons. So as distance changes, so does shot selection. Outside of the paint, Kobe’s only reliable option is the jump shot, and as distance gets larger, Kobe becomes a less and less reliable marksman. At around 45 feet out, the Black Mamba’s stats could easily be confused for my high school stats: zeros all around.

What I want to understand is how much of Kobe’s performance in the last 5 seconds can be attributed to the simple fact that he is likely to be shooting from farther away, and whether unaccounted-for factors, like psychological pressure or increased defensive coverage, are the culprits behind Kobe’s lackluster final-moment performance.

One way to get a sense of this is to ignore shot selection for the moment and look at just Kobe’s jump shots, to see if there is an unexplained drop in performance, comparing apples to apples so to speak. If the only driving factor behind the lower shooting percentage were a change in shot selection (i.e. Kobe hits his shots just like normal, except he is forced to take lower-percentage shots like jump shots instead of dunks), we should still see Kobe hit jumpers at his normal rate. If not, there may be other factors at play.

Interactive Version of Graph 3

So what we see is that, even comparing jump shots to jump shots, Kobe is much less likely to score in the last five seconds: 21.2%, versus his normal clip of 39% during the rest of the game. Again, he also takes many more shots in the last 5 seconds than in a typical 5-second interval, as represented by the larger bubble.

A typical Kobe shot is taken in the 12-15 ft. range, with 13.44 ft. being the average and 15 ft. his median shot in the sample. In the last 5 seconds, these shots are taken from over 20 ft. on average. A typical jumper for Kobe is in the 17-19 ft. range, with an average of 17.37 ft. and a median of 18 ft. In the final 5 seconds, Kobe’s average jump shot comes from over 24 ft.

Let’s visualize the difference.

Interactive Version of Graph 4

Only jump shots are included in the graph.

Here we divide all of the jump shots into categories of roughly how far away they were taken (less than 8 ft., 8-16 ft., 16-24 ft., 24+ ft., and back court), keeping the x-axis the same and colouring the last 5 seconds in red again.

Comparing the shots in each range to each other directly, the difference between the last 5 seconds and other intervals is not so large. Take the 8-16 ft. range in the above graph: when we compare how Kobe did on 8-16 ft. shots in the last 5 seconds to 8-16 ft. shots in the rest of the game, the last 5 seconds is no longer a huge outlier. It is definitely near the lower end of the expected shooting-percentage range, but comparing the red dot in this panel to graph 3 above, it looks more like a typical 5-second interval. The same holds for shots taken from less than 8 ft. (panel 1 in graph 4), shots from 16-24 ft. (panel 3 in graph 4), and, to a lesser extent, 24+ ft. (panel 4 in graph 4).

Another thing we notice in graph 4 is that many more shots were taken from 24+ ft. or the back court than in the rest of the game, as seen by how much larger the red dots are in panels 4 and 5 than the black dots in the same panels. So the story that is starting to emerge is not so much that Kobe is somehow 20% worse at shooting during the last 5 seconds; from graph 4 he seems to shoot a bit below average, but his shots come from worse distances than we typically observe during the rest of the game.

Diving deeper, I found that even within these distance ranges (each represented by a panel in graph 4), shots in the last 5 seconds were likely to be farther away. For example, the average 24+ ft. shot (panel 4, graph 4) is taken from pretty close to 24 ft., but in the last 5 seconds these are taken from around 27 ft.

Even though we haven’t used any fancy regression or machine learning techniques to this point, we can reasonably believe that distance is one of the main reasons that shots in the last 5 seconds are much less likely to go in. When we compare similar shot selections at similar distances, shots in the last 5 seconds look much more like those in typical 5-second bins in our dataset.

Let’s visualize all of the jump shots he has taken to get a better sense of how they differ in the last 5 seconds from the rest of gameplay.

Interactive Version of Graph 5

Visualization of all jump shots taken. Last 5 seconds on the right, all other jumpers on the left. Red means the basket was scored, black represents a miss.

What we see in graph 5 is that the vast majority of shots are taken from inside the 3-point arc, and those outside the line are almost all within a couple of feet of it. On the right side of the graph, the shots are much more erratic; many occur outside the 3-point line, and on average they are far from it. The story we see here is a Kobe desperately fighting the clock, throwing up whatever shots he can to hopefully score another bucket and push his team over the edge.

I used a simple logistic regression to test some of the ideas we developed with the visualizations. Once I control for distance (modeled as a 3rd-degree polynomial), area of the court (right side, left side, back court, etc.), and shot selection, there is only a roughly 1.4% shooting-percentage gap left to account for in the last 5 seconds of a quarter. So Kobe doesn’t shoot 20% worse in the last 5 seconds of a quarter; he mostly just shoots from crappier positions on the floor. According to this logistic model, he is also significantly less likely to take dunks and layups, as we suggested earlier, and much more likely to take jump shots from beyond the 3-point arc.
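
For what it’s worth, the model was along the lines of the sketch below. The data frame and column names are hypothetical stand-ins for the Kaggle fields, not my exact code.

```r
# Logistic regression for P(shot made), controlling for distance,
# court area, and shot type, with an indicator for the last 5 seconds.
model <- glm(
  shot_made ~ poly(distance, 3)  # distance as a 3rd-degree polynomial
            + court_area          # right side, left side, back court, etc.
            + shot_type           # jump shot, layup, dunk, ...
            + last5,              # TRUE if taken in the quarter's last 5 seconds
  family = binomial,
  data   = shots
)
summary(model)$coefficients["last5TRUE", ]  # the leftover last-5-seconds gap
```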

And this is without factoring in any defensive pressure that may be put on Kobe at the end of a quarter, information that is not included in my dataset. What this suggests, to me at least, is that it is unlikely that the pressure of performing in the big moment is driving Kobe’s performance at the end of quarters. However, this still doesn’t tell us what Kobe’s value is in pressure moments, because his true value would be above replacement, or above expectation. Should Kobe be trusted with the last shot? Does he do enough to get into good scoring position with the game on the line?

I have no idea…

The data set I have, and any models I developed for this project, are not sophisticated enough to tell us how Kobe compares to giving the final shot to someone else, or whether we should have expected Kobe to get himself into better scoring position as the clock ticked down the final seconds, only that, from where he was shooting, he shot 1.4% worse than he did in the rest of the game.

# Shooting % Part ii

In part 1 we quickly introduced shooting percentage and briefly examined the roles of luck and skill in this number. In the process we touched on regression to the mean and ‘mathematical luck’, and some of the factors that lead to it, like statistical noise and variables we do not control for, such as changing linemates or opponents.

So what is the role of skill and luck, and can we reduce the ‘luck’ that we see without compromising our data experiment?

It is important to recognize that reducing statistical noise and reducing the other factors that look like it are not the same thing. To reduce noise, or more specifically the variance inherent to shooting percentage, we need to increase the number of ‘events’, or shots. If we could somehow keep everything in hockey exactly the same except that everyone took two or three times as many shots, on average each player’s shooting percentage would be closer to his true talent level. With this in mind, in small samples it is sometimes useful to think of a player’s talent level and his actual output as different concepts.

Shooting Percentage and Talent

Let’s imagine two twins, not named Daniel and Henrik, who have the exact same skill and talent set. Let’s say their true talent level is 20%, meaning each should score on 20% of the shots he takes. If, in practice, we made them take 5 shots each from the top of the circle, would they both score exactly 1 of 5 shots for 20%? Of course not. In a 5-shot sample there is likely to be lots of variance; one twin might easily score on 3 of his shots for 60%, the other on none.

Like we said earlier, we can reduce this statistical noise, or variance, by making them take many shots. Let’s say several hundred. If each twin took 500 shots from the top of the circle, we still might not expect them to score an identical number of goals, but their percentages should be much closer. It would be highly unlikely for a 20%-talent shooter to score on 60% of 500 shots for 300 goals, just as it is unlikely that the other misses all 500 of his. The idea here is that as we increase the number of shots, this kind of variance should decrease, and in a closed environment, like the practice rink from the top of the circle, we should see someone’s shooting percentage reflect his true talent level more and more closely.
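
A quick simulation makes the point (a toy sketch of the twins, not real data):

```r
# Two shooters with identical 20% true talent, observed at 5 and 500 shots.
set.seed(42)
talent <- 0.20
for (n in c(5, 500)) {
  twin1 <- rbinom(1, n, talent) / n  # twin 1's observed shooting %
  twin2 <- rbinom(1, n, talent) / n  # twin 2's observed shooting %
  cat(sprintf("n = %3d shots: twin 1 = %4.1f%%, twin 2 = %4.1f%%\n",
              n, 100 * twin1, 100 * twin2))
}
# At n = 5 the two can easily differ by 40 percentage points;
# at n = 500 both land near the 20% talent level.
```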

If this were the end of the story, however, then simply restricting our sample to players who took a lot of shots should leave little luck or regression to the mean. But that is not exactly what the data tells us.