CASSIE: Context-Adjusted Simulation and Statistical Inference Engine

One of the most important aspects of player valuation is projecting the distribution of a player’s future performance relative to some baseline. This has been a particular focus of my projection and simulation project, CASSIE (Context-Adjusted Simulation and Statistical Inference Engine). The core of a typical projection system is to take an average of historical data, “regress” it in some way, and adjust for age. I believe CASSIE improves upon the public projection systems in two main ways. First, its use of Bayesian inference allows for much more flexibility than simply regressing toward the league average. Bayesian inference allows prior information to be incorporated, and these priors can come from subjective expertise or from types of data other than the statistic being predicted. Prior information such as height, Statcast data, or even scouting reports can provide better baselines to combine with historical data. Second, CASSIE explicitly accounts for the context in which the historical data were produced. If two players put up the same historical stats, but one did so in a more difficult environment, CASSIE would project more from that player going forward. Public systems may account for context in the form of park adjustments, but factors like platoon advantage and quality of opponent are often omitted. Quality of opponent is particularly important when talent varies starkly across a league, for instance in a minor league featuring a mixture of “org guys” and future MLB stars. Through its Bayesian updating, CASSIE can thus make the best use of historical data, whether play-by-play or season-long, and combine it systematically with prior assessments to form a distribution of future performance.
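
To make the two ideas above concrete, here is a minimal sketch of a Bayesian update that uses both an informative prior and an opponent-difficulty adjustment. It is not CASSIE's actual model, which is not described here. It assumes a single talent parameter on the logit scale, a normal prior on that talent, and a hypothetical "difficulty" offset subtracted from talent for each group of plate appearances. The function names, the grid approximation, and all numbers are illustrative assumptions.

```python
import math


def sigmoid(x):
    """Map a logit-scale value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Map a probability to the logit scale."""
    return math.log(p / (1.0 - p))


def posterior_mean_talent(samples, prior_mean, prior_sd, grid_size=801):
    """Grid-approximate posterior mean of talent (logit scale).

    samples: list of (difficulty, successes, trials) tuples, where
        difficulty is a hypothetical logit-scale measure of how tough
        the opposition was for that group of trials.
    prior_mean, prior_sd: normal prior on talent; this could encode a
        scouting report or other non-statistical information.
    """
    lo = prior_mean - 6.0 * prior_sd
    step = 12.0 * prior_sd / (grid_size - 1)
    thetas = [lo + i * step for i in range(grid_size)]

    log_posts = []
    for theta in thetas:
        # Log prior (normal, up to an additive constant).
        lp = -0.5 * ((theta - prior_mean) / prior_sd) ** 2
        # Binomial log likelihood; success chance falls as difficulty rises.
        for difficulty, successes, trials in samples:
            p = sigmoid(theta - difficulty)
            lp += successes * math.log(p) + (trials - successes) * math.log(1.0 - p)
        log_posts.append(lp)

    # Normalize in a numerically stable way, then average over the grid.
    peak = max(log_posts)
    weights = [math.exp(lp - peak) for lp in log_posts]
    total = sum(weights)
    return sum(t * w for t, w in zip(thetas, weights)) / total


# Same raw line (96 successes in 300 trials) under different assumptions.
league_prior = logit(0.30)
easy = posterior_mean_talent([(0.0, 96, 300)], league_prior, 0.2)
hard = posterior_mean_talent([(0.3, 96, 300)], league_prior, 0.2)
scouted = posterior_mean_talent([(0.0, 96, 300)], logit(0.34), 0.2)

print("neutral opponents:", round(sigmoid(easy), 3))
print("tougher opponents:", round(sigmoid(hard), 3))
print("stronger prior   :", round(sigmoid(scouted), 3))
```

In this sketch, the player who produced the same line against tougher opposition ends up with a higher estimated talent, and the player with a more favorable prior is pulled toward that prior rather than toward the league average. A fuller model would return the whole posterior rather than just its mean.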

Once a player’s projected distribution of performance is established, some economics must be applied to convert it into a dollar value. Public methods typically assume a linear dollar value per win above replacement level. However, because of how talent is distributed and because of roster constraints, many have theorized that a nonlinear value would be more appropriate. Furthermore, making the playoffs brings a discontinuous jump in revenue compared to missing them, so many theorize a “win curve” under which teams expecting to be on the cusp of the playoffs value wins more highly than teams further away. If valuations do involve nonlinearities, then a point estimate is no longer enough, because the value of the average outcome generally differs from the average value across possible outcomes. Understanding the full distribution of possible levels of production becomes more important, and simulations from a tool like CASSIE become correspondingly more valuable. Finally, time is an important component as well. Just as there is said to be a time value of money, there could be a time value of production, under which wins in the near term are worth more than wins in later seasons (although this may or may not be offset by expectations that the cost per win will inflate in the future).
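
The valuation ideas above can be sketched as a small Monte Carlo exercise. This is an illustration under stated assumptions, not a calibrated model: the win curve (linear revenue per win plus a smooth playoff bump), its parameters, the normal distribution of a player's wins above replacement, and the discount rate are all hypothetical.

```python
import math
import random


def win_curve_value(team_wins, per_win=1.0, playoff_bonus=30.0,
                    cutoff=88.0, spread=3.0):
    """Hypothetical team revenue ($M): linear in wins plus a playoff bump.

    The logistic term stands in for the probability of making the
    playoffs; every parameter value here is an illustrative assumption.
    """
    playoff_prob = 1.0 / (1.0 + math.exp(-(team_wins - cutoff) / spread))
    return per_win * team_wins + playoff_bonus * playoff_prob


def simulated_player_value(baseline_wins, war_mean, war_sd,
                           n_sims=20000, seed=0):
    """Average marginal revenue over simulated player outcomes."""
    rng = random.Random(seed)
    without_player = win_curve_value(baseline_wins)
    total = 0.0
    for _ in range(n_sims):
        war = rng.gauss(war_mean, war_sd)  # one simulated season
        total += win_curve_value(baseline_wins + war) - without_player
    return total / n_sims


def present_value(yearly_values, discount_rate):
    """Discount a stream of yearly values; year 0 is undiscounted."""
    return sum(v / (1.0 + discount_rate) ** t
               for t, v in enumerate(yearly_values))


# The same player (3 wins on average, give or take 2) on two teams.
bubble = simulated_player_value(86.0, 3.0, 2.0)
rebuilding = simulated_player_value(70.0, 3.0, 2.0)
print("bubble team value    :", round(bubble, 2))
print("rebuilding team value:", round(rebuilding, 2))

# Time value of production: the same yearly value is worth less later.
print("3-year present value :", round(present_value([bubble] * 3, 0.08), 2))
```

Under these assumptions the team near the playoff cutoff values the same player far more than the team well below it, which a single linear dollars-per-win figure cannot express. The discounting step treats cost-per-win inflation as zero; a positive inflation assumption would offset some or all of the discount.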