3-Point Consistency in the NBA

Data manipulation and visualizations exploring 3-point consistency in the 2023-24 NBA Season.

Aug 09, 2024

Introduction

When it comes to shooting specialists in today’s NBA, there are plenty. It seems every young 3-point specialist is an instant lottery pick, and every other lottery pick is “a 3-point shot away from being an all-star”. The Warriors pioneered this behind-the-arc barrage, and this year’s Celtics showcased another great example of spacing and shooting.

When analyzing the best shooters, overall 3-point percentage is pretty hard to argue with. How many shots did you take, and how many did you make? Over the course of the season, or even many seasons, this percentage can reveal a lot about a player. In general, it’s a pretty good representation of their ability too! But I want to focus on one less often discussed aspect of 3-point specialists: catching fire and getting cold. 3-point slumps are no rarity, and even the best shooters have cold spells (for example, Duncan Robinson). Similarly, there are also times when it feels like a player just can’t miss.

3-point volatility was an interesting idea brought up to me in a recent conversation: I know this guy can shoot, but how consistent is he? Is he going to be lights-out one night and then chucking bricks the next? Coaches and teams want consistency: someone who won’t disappear in the middle of a playoff push (or, even worse, in the playoffs themselves). In this analysis, I’ll explore week-by-week 3-point consistency in the 2023-24 NBA Season, and discuss how teams could use this to their benefit. I’ve also included an interactive table and charts that I hope will let you do some exploring of your own if you’re interested!

Data: Reasoning and Preparation

When considering volatility, it was quickly apparent that a game-by-game basis was too small a sample. Players just don’t shoot enough in a single game to get an accurate representation of volatility in so narrow a window. Weekly data, on the other hand, is a small enough timeframe to capture hot and cold streaks, but large enough to justify using a percentage. For this data, I include players who took at least 100 3-point shots in the 2023-24 regular season, and only include weeks where they took at least five 3-pointers. This gave me a sample of more than 250 players, which was plenty big for this use.

To prepare the data for this analysis, I had three main steps. First, I used NBA Stats’ API to access the regular season data using Python. Next, I cleaned the data in R, and finally I created charts using Datawrapper. If you’re not interested in the data analysis side of things, feel free to skip this section! If you want to know some more details, read on.

My hope for the data was simple: aggregate box scores into weekly totals, and then create distributions for each player. I found a Kaggle dataset that had 99% of what I was looking for, but unfortunately it didn’t actually include the game date, just the game ID. Luckily though, the creator of the data had also posted their Python code on Kaggle, and it was fairly simple to modify that code in a script of my own. The only change I made was to add the game date into the box score statistics.

I then had a dataset of each player’s stat line from every game of the season. Next, I created a “week” variable (counting from the first date of the season) and collapsed the data to get aggregated weekly shooting splits. From there I pivoted the table wide so each observation was a unique player, and the data included their 3-point data from each week of the season. This final data frame allowed me to calculate each player’s mean and standard deviation of those weekly shooting splits. I also include the season-long 3-point stats for reference, as there is some slight variation between the average of the weekly splits and the overall average. If any of this is unclear, leave a comment and I’d be happy to explain!
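For anyone who wants to try the weekly aggregation themselves, here is a minimal sketch in Python using only the standard library. The cleaning for this post was done in R, so this is an illustration of the idea rather than the actual code. The sample rows, the names `games`, `weekly_splits`, and `summary`, and the October 24 season start date are assumptions made for the example, and the 100-attempt season filter is left out for brevity.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, stdev

# Hypothetical per-game rows: (player, game_date, threes_made, threes_attempted).
games = [
    ("Player A", date(2023, 10, 24), 3, 8),
    ("Player A", date(2023, 10, 27), 1, 5),
    ("Player A", date(2023, 11, 1), 4, 9),
    ("Player A", date(2023, 11, 8), 2, 7),
    ("Player B", date(2023, 10, 25), 0, 2),
    ("Player B", date(2023, 10, 28), 1, 2),
    ("Player B", date(2023, 11, 2), 2, 6),
]

SEASON_START = date(2023, 10, 24)  # assumed first date of the season
MIN_WEEKLY_ATTEMPTS = 5            # only keep weeks with at least five 3PA


def weekly_splits(rows):
    """Return {player: {week_index: weekly 3P%}} for weeks with enough attempts."""
    totals = defaultdict(lambda: [0, 0])  # (player, week) -> [made, attempted]
    for player, game_date, made, attempted in rows:
        week = (game_date - SEASON_START).days // 7
        totals[(player, week)][0] += made
        totals[(player, week)][1] += attempted

    splits = defaultdict(dict)
    for (player, week), (made, attempted) in totals.items():
        if attempted >= MIN_WEEKLY_ATTEMPTS:
            splits[player][week] = made / attempted
    return dict(splits)


splits = weekly_splits(games)

# Mean and standard deviation of each player's weekly percentages.
summary = {}
for player, weeks in splits.items():
    pcts = list(weeks.values())
    if len(pcts) >= 2:  # a sample standard deviation needs at least two weeks
        summary[player] = (mean(pcts), stdev(pcts))
```

Note that the weekly percentage is computed from summed makes and attempts, not by averaging per-game percentages, which is why the average of the weekly splits can differ slightly from the season-long number.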

The following table may look a little off on mobile, but it should still have the same info:

When investigating the above table, it quickly becomes apparent that the best shooters are also very consistent. Some of this may come from a large sample size (I’ll get into that in the shortcomings section) but overall I’d say that consistency is worth valuing. There are of course consistently bad 3-point shooters too, and the following graph explores this relationship:

Regions of the above graph are shaded at the median, with more consistent players (lower SD) in yellow/green and better shooters in green/blue. You can of course explore this graph on your own (hover over or tap a dot to see an individual player) and search the above table for specific numbers.

Steph Curry, Michael Porter Jr., Grayson Allen, and CJ McCollum are among the most consistent, high-quality shooters in the league. Porter Jr. especially stands out, as he is sometimes considered inconsistent but this data may argue otherwise. Simone Fontecchio and Desmond Bane also stand out as lesser-known but ultra-dependable shooters. Generally speaking, players in the green-shaded region are solid, consistent 3-point shooters.

The top right, on the other hand, consists of good, yet inconsistent, 3-point shooters. A lot of these players don’t take threes as often, and aren’t quite known as specialists behind the arc. I’d be hesitant to sign any of these players as a 3-point specialist (save Luke Kennard and a few others) but if they brought other skills to the table, inconsistency wouldn’t be a deal-breaker.

The top left (unshaded) region is where you start to get worried. These are players who are both inconsistent and low-quality shooters behind the arc. Josh Hart, Christian Wood, and more are all great players in their own right, but improving their 3-point consistency could add value to their game. Russell Westbrook is another interesting one here, and I’d like to see previous seasons’ data: was he more consistent in the past?

The bottom left is made up of low-quality shooters behind the arc, but at least you know what to expect. Ausar Thompson is a terribly poor 3-point shooter, but at least it’s consistent? I’d say representative players of this group include Marcus Smart, Jaren Jackson Jr., and Kyle Kuzma.
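The four regions described above come from splitting players at the median of each axis. A minimal Python sketch of that labeling is below. The numbers are made up for illustration (they are not real player data), and the names `players`, `region`, and the label strings are assumptions of the example.

```python
from statistics import median

# Hypothetical (mean weekly 3P%, SD of weekly 3P%) pairs; not real player data.
players = {
    "Shooter A": (0.41, 0.08),
    "Shooter B": (0.40, 0.15),
    "Shooter C": (0.31, 0.16),
    "Shooter D": (0.30, 0.09),
}

# The shaded regions in the chart are cut at the median of each axis.
mean_cut = median(m for m, _ in players.values())
sd_cut = median(s for _, s in players.values())


def region(mean_pct, sd):
    """Label a player by which side of each median they fall on."""
    quality = "good" if mean_pct >= mean_cut else "poor"
    steadiness = "consistent" if sd <= sd_cut else "inconsistent"
    return f"{quality}, {steadiness}"


regions = {name: region(m, s) for name, (m, s) in players.items()}
```

Because the cuts are medians of the sample, the labels are relative to the other players in the data, not to any absolute standard of good shooting.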

How could this be used?

When it comes to practical applications, there are two primary uses. The first is identifying undervalued consistent shooters (an ultra-consistent 36% 3-point shooter can add a lot more value than you’d expect). The second would be for a team to look internally, identify current shortcomings, and address them.

My guess is that most of the inconsistent high-volume guys suffer from poor shot selection more than anything else, and being able to track that would be really useful. Identifying areas for improvement within the current roster is an often-overlooked strategy. Player development is key!

Shortcomings of the metric:

As with any analysis, there is clear room for improvement. The first and most important note is that there is no formal hypothesis testing being done. Obviously I could, but I’d prefer to use this as a starting point for discussion instead of trying to make a bold claim.

The other obvious issue with this study is sample size. Good shooters will take more threes and there’s something to be said for that. For players who don’t shoot as much though, sample size can be a legit issue. Here’s a graph of the same volatility metric on the Y-axis, but this time with 3-point volume on the X-axis:

As you can see, standard deviation depends on volume, and that clearly makes sense. If you’re only taking 5-6 threes per week, there’s a lot more room for weekly variation compared to someone who takes upwards of 5-6 in a night. It’s a clear shortcoming but I’d argue the analysis still passes the eye test.
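To put rough numbers on the volume effect, here is a small Python sketch. It assumes every shot is an independent make-or-miss with the same probability (a simplification that real shooting won't satisfy exactly), in which case the standard deviation of a weekly percentage is sqrt(p(1-p)/n). The function name `weekly_pct_sd` and the 36% example are illustrative choices.

```python
from math import sqrt


def weekly_pct_sd(p, attempts):
    """SD of a weekly 3P% if each of `attempts` shots goes in independently with probability p."""
    return sqrt(p * (1 - p) / attempts)


# Same 36% shooter, two different weekly volumes.
low_volume = weekly_pct_sd(0.36, 6)    # about 0.196
high_volume = weekly_pct_sd(0.36, 24)  # about 0.098
```

Under this assumption, quadrupling the weekly attempts halves the expected week-to-week spread, even though the shooter's underlying ability hasn't changed at all.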

Another way to look at this would be to classify players by fitting a trendline and taking the residual (projected vs. actual Week-SD). You could then use that residual to classify players into three groups and compare those groups. That might also reveal new insights, and it is one potential way to control for volume.
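Here is one way the residual idea could be sketched in Python. This is not something done in this analysis: the data points are made up, the names are hypothetical, and fitting the trendline against 1/sqrt(attempts) (the shape pure binomial noise would suggest) is an assumption. A straight line on raw volume would be another reasonable choice.

```python
from math import sqrt

# Hypothetical (weekly 3PA, SD of weekly 3P%) pairs; not real player data.
players = {
    "P1": (6, 0.21), "P2": (8, 0.15), "P3": (12, 0.16),
    "P4": (16, 0.11), "P5": (24, 0.11), "P6": (30, 0.08),
}

# Ordinary least squares of SD on 1/sqrt(attempts).
xs = [1 / sqrt(n) for n, _ in players.values()]
ys = [sd for _, sd in players.values()]
x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
slope = (
    sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    / sum((x - x_bar) ** 2 for x in xs)
)
intercept = y_bar - slope * x_bar

# Residual = actual SD minus the SD the trendline projects for that volume.
residuals = {
    name: sd - (intercept + slope / sqrt(n)) for name, (n, sd) in players.items()
}

# Split into three equal-sized groups by residual (lowest = steadier than expected).
ranked = sorted(residuals, key=residuals.get)
third = len(ranked) // 3
groups = {
    "steadier than expected": ranked[:third],
    "about as expected": ranked[third:len(ranked) - third],
    "streakier than expected": ranked[len(ranked) - third:],
}
```

A negative residual means a player's weekly percentages vary less than the trendline projects for their volume, which is closer to what "consistent" is meant to capture than the raw standard deviation.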

Conclusions

If there’s one takeaway from this, it’s that consistency should be further investigated. Over the course of multiple years, teams want to depend on their best players and know they can trust them to not disappear in an important series. Obviously, consistency between the regular season and playoffs is a whole different analysis, but this write-up serves as a good starting point. If you have any advice for improvement, as always, please leave a comment! I benefit from new perspectives and advice. If there’s anything else you’d be interested in seeing, let me know too.

Casey

Dec 3

I don't mean to be a jerk, but I don't think you discovered anything here except the Central Limit Theorem. of course sample standard deviation depends on sample size! sometimes it's instructive to make up some data. (that's the great thing about data science. statisticians are supposed to know this stuff cold, but we can always code up a quick computer simulation.)

imagine two players, one shoots 30 3's a week, one shoots 15 a week, both shoot 40%. simulate shooting %age for 24 weeks.

import numpy as np

# each shot: 1 = make (p = .4), 0 = miss (p = .6); weekly %age = makes / attempts
bleph_curry = [np.random.choice(2, 30, p=[.6, .4]).sum() / 30 for _ in range(24)]

klay_chompson = [np.random.choice(2, 15, p=[.6, .4]).sum() / 15 for _ in range(24)]

## can you guess what this will be? it's not a trick question!

np.std(bleph_curry) / np.std(klay_chompson)

shooting percentage will also affect standard deviation - the closer it is to 50%, the higher the variance, because that's how the binomial distribution works (variance of makes is n*p*(1-p), so variance of the percentage is p*(1-p)/n). so bad shooters are going to look a bit more consistent than good shooters.

finally, even if you had the same sample size for each player, you'd want to show confidence intervals for the SD. based on a quick look at the data, the 95% CIs for the SD of Klay Thompson [.06, .11] and Luka Doncic [.077, .14] overlap, so the difference between them doesn't look statistically significant.


1 reply by Vaughn Hajra
