emadsen wrote: ↑Wed Jun 02, 2021 1:31 am
My engine had PVS and aspiration windows. I removed aspiration windows (except for MultiPV > 1 and limit strength mode) and the engine plays slightly stronger. In other words, I found in my engine PVS is good enough on its own.
If I understand your post correctly, you've added aspiration windows but don't yet have PVS. I'm guessing you'll gain strength when you add PVS (so aspiration windows + PVS). However, I wonder at that point, if you disable aspiration windows (leaving only PVS), does your engine play stronger than when both are enabled?
Hi Eric,
I've started testing features separately. Killers and PVS are clear winners, testing with CuteChess's SPRT function.
Condition:
sprt elo0=1 elo1=10 alpha=0.05 beta=0.05
H1: Engine A is at least 1 elo stronger than Engine B.
H0: Engine A is NOT more than 10 elo stronger than Engine B.
The SPRT-test terminates as soon as it confirms either H1 or H0.
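As a side note, the lbound/ubound values that CuteChess prints in the results look like Wald's standard SPRT stopping bounds, computed from alpha and beta alone. A minimal sketch, assuming that is indeed how they are derived (the function name `sprt_bounds` is just something I made up for illustration):

```python
import math

def sprt_bounds(alpha: float, beta: float) -> tuple[float, float]:
    """Wald's stopping bounds for the log-likelihood ratio (LLR)."""
    lower = math.log(beta / (1.0 - alpha))  # LLR at or below this: accept H0
    upper = math.log((1.0 - beta) / alpha)  # LLR at or above this: accept H1
    return lower, upper

lower, upper = sprt_bounds(alpha=0.05, beta=0.05)
print(f"lbound {lower:.2f}, ubound {upper:.2f}")  # lbound -2.94, ubound 2.94
```

With alpha = beta = 0.05, this gives the same -2.94 / 2.94 that shows up in the SPRT line of the match output.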
I think it's confusing that Elo0 belongs to H1, and Elo1 belongs to H0. At least, this is how I interpret the CuteChess description:
Use a Sequential Probability Ratio Test as a termination
criterion for the match. This option should only be used
in matches between two players to test if engine A is
stronger than engine B. Hypothesis H1 is that A is
stronger than B by at least ELO0 ELO points, and H0
(the null hypothesis) is that A is not stronger than B
by at least ELO1 ELO points. The maximum probabilities
for type I and type II errors outside the interval
[ELO0, ELO1] are ALPHA and BETA. The match is stopped if
either H0 or H1 is accepted or if the maximum number of
games set by '-rounds' and/or '-games' is reached.
Result:
Testing PVS in 2.2.100:
Score of Rustic Alpha 2.2.100 vs Rustic Alpha 2.1.100: 304 - 205 - 140 [0.576] 649
... Rustic Alpha 2.2.100 playing White: 175 - 87 - 63 [0.635] 325
... Rustic Alpha 2.2.100 playing Black: 129 - 118 - 77 [0.517] 324
... White vs Black: 293 - 216 - 140 [0.559] 649
Elo difference: 53.4 +/- 23.9, LOS: 100.0 %, DrawRatio: 21.6 %
SPRT: llr 2.95 (100.3%), lbound -2.94, ubound 2.94 - H1 was accepted
Testing killers in 2.1.100:
Score of Rustic Alpha 2.1.100 vs Rustic Alpha 2: 291 - 203 - 145 [0.569] 639
... Rustic Alpha 2.1.100 playing White: 162 - 88 - 70 [0.616] 320
... Rustic Alpha 2.1.100 playing Black: 129 - 115 - 75 [0.522] 319
... White vs Black: 277 - 217 - 145 [0.547] 639
Elo difference: 48.2 +/- 23.9, LOS: 100.0 %, DrawRatio: 22.7 %
SPRT: llr 2.95 (100.3%), lbound -2.94, ubound 2.94 - H1 was accepted
So, roughly 50 Elo per feature in self-play.
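The Elo differences reported above can be reproduced from the win/loss/draw counts with the usual logistic formula. This is a small sketch, assuming CuteChess uses that standard formula (the helper name `elo_from_wld` is hypothetical); it does not reproduce the +/- error margin:

```python
import math

def elo_from_wld(wins: int, losses: int, draws: int) -> float:
    """Elo difference implied by a match score, using the logistic model."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games  # fraction of points scored
    return -400.0 * math.log10(1.0 / score - 1.0)

pvs_elo = elo_from_wld(304, 205, 140)      # PVS test, 649 games
killers_elo = elo_from_wld(291, 203, 145)  # killers test, 639 games
print(round(pvs_elo, 1))      # 53.4
print(round(killers_elo, 1))  # 48.2
```

Both values match the "Elo difference" lines in the two results.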
(The version with PVS obviously also had killers. The version with killers was tested against the current master version, Alpha 2.)
The test with aspiration windows (on top of killers and PVS, 50cp window, reset to INFINITY if it fails) is now running. It's at 51% +/- 0.2% against the version with killers+PVS, so it's unlikely this is going to make a huge difference. (Edit: while I was typing this post, the version with AW dropped to 49.8%. We're 1200 games into the test. So, I feel that AW is not going to make any clear difference, at least not with the 0.50 window and a simple evaluation.)
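For anyone not familiar with the scheme, here is the idea in outline. This is only a sketch of "50cp window, reset to INFINITY if it fails", not Rustic's actual code: `aspiration_search`, `make_toy_search`, and the INFINITY value are made up for illustration, and the toy search simply stands in for a real fail-hard alpha-beta.

```python
INFINITY = 30_000  # hypothetical "infinite" score, in centipawns
WINDOW = 50        # aspiration half-width in centipawns (0.50 pawn)

def aspiration_search(search, depth, prev_score):
    """Search a narrow window around the previous iteration's score.
    On a fail-low or fail-high, re-search once with the full window."""
    alpha = prev_score - WINDOW
    beta = prev_score + WINDOW
    score = search(depth, alpha, beta)
    if score <= alpha or score >= beta:
        # The real score is outside the window: reset to the full window.
        score = search(depth, -INFINITY, INFINITY)
    return score

def make_toy_search(true_score, calls):
    """Stand-in for alpha-beta: clamps a known score to the window and
    records every (alpha, beta) pair it was called with."""
    def search(depth, alpha, beta):
        calls.append((alpha, beta))
        return max(alpha, min(beta, true_score))
    return search

inside_calls, outside_calls = [], []
inside = aspiration_search(make_toy_search(20, inside_calls), 5, prev_score=0)
outside = aspiration_search(make_toy_search(120, outside_calls), 5, prev_score=0)
print(inside, len(inside_calls))    # 20 1
print(outside, len(outside_calls))  # 120 2
```

When the score stays inside the window, one narrow search is enough; when it jumps by more than 50cp, the position is searched twice. That trade-off is why the gain depends so much on the window size and on how stable the evaluation is between iterations.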
I'll test aspiration windows with a smaller margin (0.25 or 0.33 instead of 0.50), but if that doesn't bring a clear strength increase, I'm sidelining this feature to test in a later version of Rustic. After this test, I'll re-implement history, and test that as well.
At some point I'll have to decide whether I'm going to postpone Alpha 3's release further, or just release it with killers+PVS and add aspiration windows later (and possibly postpone history until I implement LMR).