User blog comment:Sirebel/Mathematician Required - Apply Within/@comment-27170954-20170425003411/@comment-27170954-20170425130304

Thought about this some more. You don't need the expected increments. I was thinking about estimating the fractional part when I wrote that, but it is easier to estimate the whole PR contribution from an upgrade.

Question: do you think the PR contribution from an upgrade category is the same for every level in that category? That is, is the increase in PR from Engine 1 the same as from Engine 2 and from Engine 5, or are they different? I looked at several cars (perhaps 20), and I think it is possible that they are the same. Within a category, the displayed increments never differ by more than 0.1. In your example, every engine upgrade adds 0.2 or 0.3, so they could all be the same underlying value (likely somewhere between 0.2 and 0.3), with the displayed PR rounding it up or down.
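To illustrate the rounding point: if the displayed PR is rounded to one decimal place, a single constant increment per level can show up as a mix of 0.2 and 0.3. The sketch below assumes half-up rounding to 0.1, a stock PR of 5.0 and an engine increment of 0.23 per level; all three numbers are hypothetical, not values read from the game.

```python
# Sketch: a constant true increment, displayed after rounding to 0.1.
# Assumptions (hypothetical): half-up rounding, stock PR 5.0, +0.23 per engine level.
# Values are kept in integer hundredths to avoid floating-point rounding surprises.

def displayed_tenths(true_hundredths: int) -> int:
    """Round a PR given in hundredths to tenths, half-up."""
    return (true_hundredths + 5) // 10

def displayed_increments(base_hundredths: int, step_hundredths: int, levels: int) -> list[int]:
    """Displayed PR increase (in tenths) for each successive level."""
    shown = [displayed_tenths(base_hundredths + k * step_hundredths)
             for k in range(levels + 1)]
    return [b - a for a, b in zip(shown, shown[1:])]

incs = displayed_increments(base_hundredths=500, step_hundredths=23, levels=5)
print([d / 10 for d in incs])  # [0.2, 0.3, 0.2, 0.2, 0.3]
```

Every level adds exactly 0.23 here, yet the displayed increments alternate between 0.2 and 0.3, so a spread of 0.1 within a category is what a constant increment would produce.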

The form of the regression equation, and the number of samples required, depend on the answer to this question. If you assume every level in a category adds the same amount, there are 7 independent variables (one per category), and the regression equation is:

PRug = PRbase + PRen*Lvlen + PRdr*Lvldr + PRbo*Lvlbo + PRsu*Lvlsu + PRex*Lvlex + PRbr*Lvlbr + PRtw*Lvltw

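PRug is the resulting PR after the upgrades are applied, where ug is the upgrade index (e.g., 4233341, one level digit per category).

Lvlxx is the upgrade level in category xx (0, 1, 2, …), PRxx is the PR added by each level in that category, and PRbase is the PR with no upgrades (all Lvlxx = 0).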

You’d collect PRug (the dependent variable) against the independent variables (Lvlxx), and the regression will estimate PRbase and all the PRxx values.
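As a rough sketch of that fit, here is ordinary least squares in plain Python, solved through the normal equations with an intercept column for PRbase. The "true" coefficients and the 35 upgrade indices are invented for illustration, and PR is left unrounded so the fit recovers the coefficients exactly; in practice a statistics package would do the same job.

```python
# Sketch: estimate PRbase and the per-level PRxx values by ordinary least squares.
# Hypothetical data: true_base and true_step are invented, and PR is not rounded.
import random

CATS = ["en", "dr", "bo", "su", "ex", "br", "tw"]

def solve(a, b):
    """Solve a square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system: some effects cannot be separated")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_ols(rows, y):
    """Least squares via (X^T X) beta = X^T y, with a leading intercept column."""
    X = [[1.0] + [float(v) for v in row] for row in rows]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

def levels(index: str) -> list[int]:
    """Split an upgrade index such as '4233341' into one level per category."""
    return [int(ch) for ch in index]

# Hypothetical true model, used only to generate example data.
true_base = 5.0
true_step = [0.25, 0.12, 0.08, 0.15, 0.05, 0.10, 0.20]
rng = random.Random(1)
indices = ["".join(str(rng.randint(0, 4)) for _ in CATS) for _ in range(35)]
pr = [true_base + sum(s * l for s, l in zip(true_step, levels(ix))) for ix in indices]

beta = fit_ols([levels(ix) for ix in indices], pr)
print("PRbase ~", round(beta[0], 3))
for name, b in zip(CATS, beta[1:]):
    print(f"PR{name} ~ {b:.3f}")
```

With real PR values, which are rounded to 0.1, the estimates will not come out exact; the rounding acts as noise, and more samples shrink its effect.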

The minimum size of the dataset depends on the number of independent variables. A theoretical minimum is 2 per variable; 5 per variable is a more practical minimum, and 10 per variable is more robust (conservative). Larger samples give better estimates of the actual values. There are 7 variables here, so 14, 35, or 70 samples, or even more. If you fully upgrade the car 3 times, taking a different (non-duplicating) path each time, you'd have over 70 samples. Doing it 4 times would get you about 100, allowing for some unintentional duplicates. (If you do this, record the PR and the upgrade index at each step, e.g., 7.1 and 5443343.)
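The arithmetic above can be checked with a small simulation. This sketch assumes hypothetical level caps of 5, 4, 4, 3, 3, 4, 3 (26 upgrade steps in total); substitute the real caps for your car. It upgrades from stock to fully upgraded in a random order and counts how many distinct upgrade states a few such paths produce.

```python
# Sketch: sample-size rules of thumb, and distinct states from random full-upgrade paths.
# Assumption (hypothetical): per-category level caps of 5,4,4,3,3,4,3.
import random

MAX_LEVELS = [5, 4, 4, 3, 3, 4, 3]

def rule_of_thumb(n_vars: int) -> dict:
    """2, 5 and 10 samples per independent variable."""
    return {"theoretical": 2 * n_vars, "practical": 5 * n_vars, "robust": 10 * n_vars}

def random_full_path(rng: random.Random) -> list:
    """Upgrade indices visited going from stock to fully upgraded in a random order."""
    steps = [cat for cat, cap in enumerate(MAX_LEVELS) for _ in range(cap)]
    rng.shuffle(steps)
    state = [0] * len(MAX_LEVELS)
    visited = ["".join(map(str, state))]
    for cat in steps:
        state[cat] += 1
        visited.append("".join(map(str, state)))
    return visited

def distinct_states(n_paths: int, seed: int = 0) -> int:
    """Number of different upgrade indices seen across n_paths random paths."""
    rng = random.Random(seed)
    seen = set()
    for _ in range(n_paths):
        seen.update(random_full_path(rng))
    return len(seen)

print(rule_of_thumb(7))  # {'theoretical': 14, 'practical': 35, 'robust': 70}
for k in (3, 4):
    print(k, "paths ->", distinct_states(k), "distinct states")
```

Under these assumed caps each path visits 27 states (stock plus 26 upgrades), and every path shares the stock and fully upgraded states, so k paths give at most 2 + 25k distinct samples (77 for three paths, 102 for four). Random paths will usually overlap somewhere in the middle as well, which is where the unintentional duplicates come from.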

With smaller samples, you must be careful with the structure of the data collection, so that the effect of each category can be separated from the others. For the 14-sample set, you'd want each category represented twice; for the 35-sample set, ideally 5 times. This becomes less critical for the 70-sample set, but each category still needs to be adequately represented.
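One way to see why structure matters: if two categories always change together in the data, the regression cannot tell their contributions apart, and the design matrix (an intercept column plus the 7 level columns) loses rank. The two 14-row datasets below are hypothetical; the first raises each category alone to level 1 and to level 2, the second is the same except engine and exhaust always move together.

```python
# Sketch: a rank check on the design matrix for two hypothetical 14-sample datasets.
N_CATS = 7

def rank(rows, tol=1e-9):
    """Matrix rank by Gauss-Jordan elimination."""
    m = [list(map(float, r)) for r in rows]
    r = 0
    for c in range(len(m[0])):
        if r == len(m):
            break
        piv = max(range(r, len(m)), key=lambda i: abs(m[i][c]))
        if abs(m[piv][c]) < tol:
            continue  # no pivot in this column
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def with_intercept(rows):
    """Prepend the constant column that PRbase multiplies."""
    return [[1] + list(r) for r in rows]

# Each category raised alone to level 1 and to level 2: 2 rows per category.
balanced = []
for cat in range(N_CATS):
    for lvl in (1, 2):
        row = [0] * N_CATS
        row[cat] = lvl
        balanced.append(row)

# Same size, but engine (column 0) and exhaust (column 4) always move together.
confounded = [r[:4] + [r[0]] + r[5:] for r in balanced]

print(len(balanced), rank(with_intercept(balanced)))      # 14 8
print(len(confounded), rank(with_intercept(confounded)))  # 14 7
```

A full rank of 8 means PRbase and all seven PRxx values can be estimated; a rank of 7 means one combination (here engine versus exhaust) is not identifiable, no matter how the fit is computed.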

If the PR additions within a category are not constant, the regression becomes much more complex: for this example upgrade tree there would be 26 variables (one per upgrade level), with a correspondingly larger dataset. But it can be done.
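For the non-constant case, each individual level becomes its own 0/1 variable ("has this level been bought?"), and these columns replace the 7 Lvlxx columns in the same least-squares fit. This sketch assumes hypothetical level caps of 5, 4, 4, 3, 3, 4, 3, which give 26 columns; use the real caps for your car.

```python
# Sketch: one 0/1 variable per individual upgrade level.
# Assumption (hypothetical): per-category level caps of 5,4,4,3,3,4,3.
MAX_LEVELS = [5, 4, 4, 3, 3, 4, 3]
CATS = ["en", "dr", "bo", "su", "ex", "br", "tw"]

def column_names() -> list[str]:
    """Names such as en1..en5, dr1..dr4, and so on."""
    return [f"{cat}{k}" for cat, cap in zip(CATS, MAX_LEVELS) for k in range(1, cap + 1)]

def per_level_features(index: str) -> list[int]:
    """Turn an upgrade index like '4233341' into one 0/1 column per level."""
    levels = [int(ch) for ch in index]
    if len(levels) != len(MAX_LEVELS):
        raise ValueError("expected one digit per category")
    feats = []
    for lvl, cap in zip(levels, MAX_LEVELS):
        if lvl > cap:
            raise ValueError(f"level {lvl} exceeds cap {cap}")
        # Levels 1..lvl have been bought; the higher ones have not.
        feats.extend(1 if k <= lvl else 0 for k in range(1, cap + 1))
    return feats

names = column_names()
x = per_level_features("4233341")
print(len(names), sum(x))  # 26 20
```

The coefficient on each column is then the PR added by that specific level. By the same rule of thumb, 5 samples per variable would mean 130 samples instead of 35.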