Introduction
Player progression is a big part of my model for the upcoming NBA season. How do NBA players develop over time? What causes players to develop? If you can answer these questions, you can start answering team-level questions too. Although I won’t discuss team-level statistics in this write-up, I thought this intermediate step made for some interesting analysis in itself.
Things to keep in mind
To rank players, I use Value Over Replacement Player (VORP). As with any advanced stat, it isn’t perfect: namely, it over-rewards players with a large offensive role and under-rewards off-ball defense. If there’s a player you think is criminally underrated, it might be worth checking where VORP rated them last year.
The model also takes a relatively cautious approach. It’s not going to forecast huge leaps and bounds in improvement, and I’m okay with that. This approach may miss a couple of breakout stars or big disappointments, but overall leads to a more accurate prediction system.
Other notes about the model
VORP is minutes-adjusted. If someone has missed extended time in the past two years, my model will generally capture and forecast that accordingly. This can sometimes produce an unreasonable estimate of future production. For example, Ja Morant’s 25-game suspension is currently treated as if he were injured for 25 games. This is worth keeping in mind in the analysis below.
Because rookies have no NBA experience, forecasting first-year contributions is a different project. Although I am currently working on a draft model, for now, I’m leaving out rookie forecasts. Young stars (Wemby and Chet for example) are also typically underrated in the current model. I have some ideas to improve this which I’ll also discuss.
Top Ten Compared to Vegas Odds
Comparing my model’s top ten to Vegas’ preseason MVP odds, some interesting ideas arise.
The two lists agree on seven of the top ten players, and I’m pleasantly surprised that we also agree on the top three candidates: Jokic, Doncic, and SGA. I think these three (plus a healthy Embiid) are the clear pre-season favorites for MVP, and it’s nice to see some consistency here.
Notably, my model discounts Embiid’s chances of being a top MVP candidate compared to Vegas, primarily because of his health. Embiid wasn’t even eligible for MVP last season, playing only 39 games. My model does see him returning closer to his normal self but stops short of assuming he’ll be back to where he was two years ago.
My model may also be overly cautious with its most improved player, Ja Morant. As previously discussed, Ja served a 25-game suspension last year, which was quickly followed by season-ending shoulder surgery. As a result, Morant appeared in only nine games. My model does have him as the largest gainer, but not by enough to reach all-star status. After an early preseason injury scare, health may remain a question mark. Although I could manually adjust the data to exclude Ja’s suspension, I’ve decided against making one-off changes at this stage. This keeps the model consistent, but for future analyses, such manual adjustments could improve accuracy in cases like his.
Anthony Edwards and Victor Wembanyama both highlight a potential drawback of my model: its relative inability to capture fast-rising stars. My model is likely regressing them toward the mean, closer to how a “typical” 20- or 23-year-old plays in the NBA. I currently don’t include draft position as a predictor in the model, and it’s something I’d like to add in a future version to better capture rising stars.
Trending Up
Most of the forecasted improvements belong to players who missed extended time last year and hope to return to form. Darius Garland stands out as a good example. Between December and February, Garland reportedly lost 12 pounds due to his fractured jaw. These things take time to recover from but likely won’t have a big impact on his future health. Other players such as Embiid, Morant, Miles Bridges, and Scoot Henderson also look to make a splash after extended time out.
There’s also a group of young players my model shows improving this year. Keyonte George is a great example as he looks to continue growing as a young guard on the Jazz. Walker Kessler is expected to keep developing into one of the league’s premier young bigs. Gradey Dick and Bilal Coulibaly also fall into this group of young players primed to improve.
Trending Down
Kevin Durant, Mike Conley, and Chet Holmgren are the three players expected to take the biggest step back this year. Durant and Conley both played fewer than 40 games two years ago before having very healthy seasons last year. The model expects them to land somewhere in between those two values, while also accounting for their age. Both players are now 36 years old. Although they played 70+ games last year, it’ll be interesting to see if they can keep that up.
Holmgren, on the other hand, played all 82 games last year after a right-foot Lisfranc injury wiped out what was originally slated to be his rookie season. I think the projected decline may be a bit of an overcorrection. Holmgren’s case is unique because of the extremes we’ve seen in his first two years: zero games in his rookie year, followed by a full 82-game season. This variability, combined with the model’s struggle to fully account for rising stars, may lead to an especially poor prediction for him. He remains a bit of a “unicorn” in this respect too, and future updates to the model may improve predictions for exceptional cases like his.
Improvements and Next Steps
There is always room for improvement in analytics, and this model is no exception. As previously mentioned, there are two big improvements I’d like to make: forecasting rookie contributions and better capturing rising stars. The cautious approach my model takes does a good job of being correct most of the time, but young stars seem to be the odd men out. I think adding controls for draft position (maybe only up to a certain age or experience level) would do the trick.
I also don’t currently account for new teammates or a new team. When players have a change of scenery or a new role, they may play better or worse, and my model doesn’t currently capture this. Finding a good way to account for these changes would be great. Coaching changes would be worth investigating too.
As far as next steps go, I hope to do the following. First, I want to project each player’s minutes played and build team rotations. Second, I’d like to create power rankings and win projections for each team. I hope to get these articles out before the season officially starts. I’ll continue to improve the model throughout the season, but it’ll be fun to officially share my pre-season predictions!
The next section is a little more in-depth, but feel free to read on if you’re interested in how I created the model discussed in this write-up. Otherwise, thanks for reading this far, and I’d appreciate any suggestions for improvement!
Methodology
This section exists for those who are interested in analytics and want to understand how the model is built.
I originally fit two models for this scenario, to be used for two potentially different reasons. The first, a multiple linear regression, is useful when considering the marginal effects of predictors. The second, a generalized additive model (GAM), is useful because it allows for smoothing terms. Both have their pros and cons, but they are fit on the same variables and produce extremely similar results.
The model has an adjusted R-squared of 0.648 and an RMSE of 0.704. Predictors include last year’s VORP, a trend term capturing the change from the season before that, games played, position controls, age controls, and an interaction term with an indicator variable for whether a player posted a VORP over 3.
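To make that specification a little more concrete, here is a minimal sketch of how the linear-regression version could be fit with statsmodels. The column names (vorp_next, vorp_last, vorp_trend, games_played, position, age_bucket) are hypothetical placeholders for whatever the underlying dataset actually uses, not my exact code.

```python
import pandas as pd
import statsmodels.formula.api as smf

def fit_vorp_ols(df: pd.DataFrame):
    """Fit next-season VORP on last season's production plus basic controls."""
    df = df.copy()
    # Indicator for an elite prior season (VORP > 3), interacted with last year's VORP
    df["high_vorp"] = (df["vorp_last"] > 3).astype(int)

    formula = (
        "vorp_next ~ vorp_last + vorp_trend + games_played"
        " + C(position) + C(age_bucket)"
        " + high_vorp:vorp_last"
    )
    return smf.ols(formula, data=df).fit()

# fitted = fit_vorp_ols(training_df)
# print(fitted.rsquared_adj)  # the fit reported above is roughly 0.648
```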
The following graph shows a generalized additive model trained on 2013-2023 regular season data and compares predicted vs. actual VORP for last season:
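For those curious about the mechanics, a rough sketch of that kind of train-on-past, test-on-last-season comparison is below. The column names, season labels, and pygam term structure are assumptions on my part rather than the exact pipeline.

```python
import matplotlib.pyplot as plt
from pygam import LinearGAM, s, f

def plot_pred_vs_actual(df):
    """Train a GAM on the 2013-2023 seasons, predict the latest season, and plot."""
    feats = ["vorp_last", "vorp_trend", "games_played", "age", "position_code"]
    train = df[df["season"] <= 2023]
    test = df[df["season"] == 2024]

    # Smooth terms for the continuous predictors, a factor term for position
    gam = LinearGAM(s(0) + s(1) + s(2) + s(3) + f(4))
    gam.fit(train[feats].to_numpy(), train["vorp_next"].to_numpy())

    preds = gam.predict(test[feats].to_numpy())
    plt.scatter(test["vorp_next"], preds, alpha=0.5)
    plt.plot([-2, 8], [-2, 8], linestyle="--", color="gray")  # y = x reference line
    plt.xlabel("Actual VORP")
    plt.ylabel("Predicted VORP")
    plt.show()
```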
I also normalize the results to match the distribution of what has happened in the past. Other studies have done something similar, and I’m comfortable with this step.
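As an illustration of what that normalization step could look like (a simple stand-in, not necessarily the exact procedure I use), one approach is to quantile-map the raw predictions onto the empirical distribution of historical VORP:

```python
import numpy as np

def match_historical_distribution(predictions, historical_vorp):
    """Map raw predictions onto the empirical quantiles of past VORP values."""
    predictions = np.asarray(predictions, dtype=float)
    ranks = predictions.argsort().argsort()         # rank of each prediction, 0..n-1
    quantiles = (ranks + 0.5) / len(predictions)    # convert ranks to quantiles in (0, 1)
    return np.quantile(historical_vorp, quantiles)  # pull matching values from history

# normalized = match_historical_distribution(raw_preds, past_vorp_values)
```

This keeps the ordering of the raw forecasts intact while forcing the spread of predicted values to look like past seasons.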
It was relatively difficult to find publicly available player forecasts to compare against for quality. This GitHub repository did something similar to my approach, using improvement in win shares as the response. My model offers major improvements over the linked approach when fit on win shares, with an RMSE of 1.2 compared to their 2.86. These improvements don’t come from the predictor variables themselves, but rather from the timeframe (they used data going back to the 80s), the response variable (they predict improvement instead of overall value), and the normalization (they normalize using improvement rather than overall value).