Discussion about this post

I don't mean to be a jerk, but I don't think you discovered anything here except the Central Limit Theorem. of course the standard deviation of a weekly shooting percentage depends on how many shots go into each week's number! sometimes it's instructive to make up some data. (that's the great thing about data science. statisticians are supposed to know this stuff cold, but we can always code up a quick computer simulation.)

imagine two players: one shoots 30 3s a week, one shoots 15 a week, and both shoot 40%. simulate each one's shooting percentage for 24 weeks.

import numpy as np

# each attempt is 0 (miss, 60%) or 1 (make, 40%); one shooting percentage per week

bleph_curry = [sum(np.random.choice(2, 30, p=[.6, .4])) / 30 for _ in range(24)]

klay_chompson = [sum(np.random.choice(2, 15, p=[.6, .4])) / 15 for _ in range(24)]

## can you guess what this will be? it's not a trick question!

np.std(bleph_curry) / np.std(klay_chompson)
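
with only 24 weeks the ratio bounces around a lot from run to run. here's a rough sketch of the same experiment with many more simulated weeks, using only the standard library so it runs anywhere. the helper `weekly_pcts`, the seed, and the 20,000-week count are my own choices for illustration. the point is just that the ratio should settle near sqrt(15/30), about 0.71, since the SD of a proportion scales like 1/sqrt(n).

```python
import math
import random
import statistics

def weekly_pcts(attempts, p, weeks, rng):
    """Simulated weekly shooting percentages for a shooter with true make rate p."""
    return [
        sum(rng.random() < p for _ in range(attempts)) / attempts
        for _ in range(weeks)
    ]

rng = random.Random(0)  # arbitrary fixed seed so the run is reproducible
weeks = 20000           # assumption: far more than 24, so the ratio stabilizes

high_volume = weekly_pcts(30, 0.4, weeks, rng)  # 30 attempts a week
low_volume = weekly_pcts(15, 0.4, weeks, rng)   # 15 attempts a week

ratio = statistics.pstdev(high_volume) / statistics.pstdev(low_volume)
expected = math.sqrt(15 / 30)  # theoretical ratio of the two SDs

print(round(ratio, 3), round(expected, 3))
```

rerun it with `weeks = 24` a few times and you'll see how noisy a single season's worth of weeks is.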

shooting percentage will also affect standard deviation: the closer it is to 50%, the higher the variance, because that's how the binomial distribution works (the variance of the number of makes is n*p*(1-p), so the variance of the shooting percentage is p*(1-p)/n). so at the same shot volume, bad shooters are going to look a bit more consistent than good shooters.
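
to put numbers on that, here's a small sketch that evaluates sqrt(p*(1-p)/n), the theoretical SD of a weekly percentage, for a few make rates at 30 attempts a week. the function name `pct_sd` and the particular rates are just mine for illustration.

```python
import math

def pct_sd(p, attempts):
    """Theoretical SD of a shooting percentage over `attempts` independent shots."""
    return math.sqrt(p * (1 - p) / attempts)

for p in (0.30, 0.35, 0.40, 0.50):
    print(p, round(pct_sd(p, 30), 4))
# 0.3 0.0837
# 0.35 0.0871
# 0.4 0.0894
# 0.5 0.0913
```

so the effect is real but small next to the volume effect: going from 30% to 40% moves the SD by well under a percentage point, while halving the attempts multiplies it by about 1.41.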

finally, even if you had the same sample size for each player, you'd want to show confidence intervals for the SD. based on a quick look at the data, there is not a statistically significant difference between the SDs of Klay Thompson and Luka Doncic: their 95% CIs, [.06, .11] and [.077, .14], overlap.
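
one way to get an interval like that is a percentile bootstrap. this is only a sketch of the idea, not necessarily how the numbers above were computed. the data here is simulated (24 made-up weeks at 30 attempts and 40%), not real player data, and `bootstrap_sd_ci` is a hypothetical helper.

```python
import random
import statistics

def bootstrap_sd_ci(values, n_boot=5000, level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the sample SD of `values`."""
    rng = random.Random(seed)
    # resample the weeks with replacement and recompute the SD each time
    sds = sorted(
        statistics.stdev(rng.choices(values, k=len(values)))
        for _ in range(n_boot)
    )
    lo = sds[int((1 - level) / 2 * n_boot)]
    hi = sds[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi

# simulated stand-in for one player's 24 weekly shooting percentages
rng = random.Random(1)
weekly = [sum(rng.random() < 0.4 for _ in range(30)) / 30 for _ in range(24)]

lo, hi = bootstrap_sd_ci(weekly)
print(round(statistics.stdev(weekly), 3), (round(lo, 3), round(hi, 3)))
```

with only 24 weeks the interval comes out wide, which is why two players' SDs can look different on a chart and still not be distinguishable.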
