I finally got it to outperform the simple 20,000-game method, but I had to make the resolution very low (I think I used 0.2 Elo) and I had to use very high alpha and beta values - something close to 0.50! So those terms are not making sense to me, but the simulation is returning a more confident evaluation with fewer games.

Michel wrote:
BTW It occurred to me that with your simulation program you could actually verify if the results of wald are correct. I have never confirmed them by simulation, only by some obvious sanity checking (like verifying that certain probabilities sum to 1). The mathematics for deriving the formulas is a bit complicated, so one has to be on the lookout for mistakes.

When I test Komodo versions head to head at one particular level on one particular machine, I get about 51% draws. Should I set the draw ratio to 0.51?
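To make the verification idea concrete, here is a minimal sketch of one way to check a sequential test by simulation: draw game results from a simple win/draw/loss model built from an Elo difference and a fixed draw ratio, feed them to a plain Wald SPRT, and see whether the observed error rate comes out near alpha. The logistic-Elo-plus-fixed-draw-ratio model, the hypotheses (0 vs. 5 Elo), alpha = beta = 0.05, and the trial count are all illustrative assumptions - this is not a reconstruction of the wald program or of the simulation program discussed above.

```python
import math
import random

def wdl_probs(elo_diff, draw_ratio):
    """Win/draw/loss probabilities for the candidate, assuming a logistic
    Elo curve and a fixed draw ratio split evenly around the expected
    score.  This is only a toy model for small Elo differences."""
    expected = 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))
    return (expected - draw_ratio / 2.0, draw_ratio,
            1.0 - expected - draw_ratio / 2.0)

def sprt_run(true_elo, elo0, elo1, alpha, beta, draw_ratio, max_games=200000):
    """Simulate games until the Wald SPRT accepts H1 (returns True),
    accepts H0 (returns False), or gives up at max_games (returns None)."""
    lower = math.log(beta / (1.0 - alpha))   # accept H0 at or below this
    upper = math.log((1.0 - beta) / alpha)   # accept H1 at or above this
    p_true = wdl_probs(true_elo, draw_ratio)
    p0 = wdl_probs(elo0, draw_ratio)
    p1 = wdl_probs(elo1, draw_ratio)
    llr = 0.0
    for _ in range(max_games):
        r = random.random()
        outcome = 0 if r < p_true[0] else (1 if r < p_true[0] + p_true[1] else 2)
        llr += math.log(p1[outcome] / p0[outcome])
        if llr >= upper:
            return True
        if llr <= lower:
            return False
    return None

# Empirical check: when the true strength equals H0 (0 Elo), the fraction
# of runs that wrongly accept H1 should come out close to alpha.
trials = 200
false_positives = sum(sprt_run(0.0, 0.0, 5.0, 0.05, 0.05, 0.51) is True
                      for _ in range(trials))
print(false_positives / trials)   # expect roughly 0.05
```

Running the same check with the true strength set to elo1 instead should give a miss rate near beta.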
It also seems that if I am willing to raise the 20,000 games to 30,000 or 40,000, I can get way up there - I can get 93% with 32,000 games.
If I want to almost guarantee no false positives, I'm sure this becomes much easier. It's obvious that you can get arbitrarily close to 100% by throwing out any version you are not sure of - but that of course is also a tradeoff: how many good versions are you willing to throw away because the result is not clear?
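Along the same lines, the fixed-games approach and the threshold trade-off can be estimated by straight Monte Carlo. This is only a sketch under the same toy model as above; the hypothetical +2 Elo improvement, the 50% score threshold, and the trial count are assumptions for illustration.

```python
import random

def wdl_probs(elo_diff, draw_ratio):
    """Same toy model as the sketch above: logistic Elo curve with a
    fixed draw ratio split evenly around the expected score."""
    expected = 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))
    return (expected - draw_ratio / 2.0, draw_ratio,
            1.0 - expected - draw_ratio / 2.0)

def match_score(n_games, elo_diff, draw_ratio):
    """Total score (win = 1, draw = 0.5) of the candidate over n_games."""
    p_win, p_draw, p_loss = wdl_probs(elo_diff, draw_ratio)
    results = random.choices([1.0, 0.5, 0.0],
                             weights=[p_win, p_draw, p_loss], k=n_games)
    return sum(results)

def acceptance_rate(n_games, elo_diff, draw_ratio,
                    threshold=0.5, trials=200):
    """Fraction of simulated matches in which the candidate's score rate
    exceeds the threshold.  With elo_diff > 0 this estimates how often a
    genuinely better version is kept; with elo_diff = 0 it estimates the
    false-positive rate.  Raising the threshold cuts false positives at
    the cost of discarding more genuinely better versions."""
    passed = sum(match_score(n_games, elo_diff, draw_ratio) / n_games > threshold
                 for _ in range(trials))
    return passed / trials

# Illustrative numbers only: a hypothetical +2 Elo improvement, 51% draws.
print(acceptance_rate(32000, 2.0, 0.51))                    # better version kept
print(acceptance_rate(32000, 0.0, 0.51))                    # false positives
print(acceptance_rate(32000, 2.0, 0.51, threshold=0.502))   # stricter threshold
```

With these particular assumptions the first number comes out in the low 90s percent, which is at least in the same ballpark as the 93% figure above, though that is a property of the assumed +2 Elo and draw ratio rather than of the actual test setup.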
Don