Progress on Rustic

mvanthoor · Post by **mvanthoor** » Thu Jun 03, 2021 1:00 am

Ras wrote: ↑Thu Jun 03, 2021 12:30 am
mvanthoor wrote: ↑Thu Jun 03, 2021 12:00 am Thanks for confirming. I've been unsure about VICE's implementation of history:
Yeah, I wouldn't do it that way, and with the killers, I don't just shift the current quiet killer move into the killer slots. I do it only if the current killer move is not already the one in the first killer slot, or else I'd end up with two identical killers, which defeats the point of having two slots.

Same here. It improved the Elo rating by about 15 points (+30 -> +45). At some point I realized you're bound to end up with the same killers, because if a move causes a beta cutoff, then it's a killer... if that killer then causes the beta cutoff, it becomes a killer again, and it's possibly a duplicate.
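
For readers following along, the duplicate check described above could look roughly like this. It is only a sketch under my own assumptions: `Move`, `Killers`, `MAX_PLY` and the convention that `Move(0)` means "no move" are hypothetical names for illustration, not code taken from Rustic or from Ras's engine.

```rust
/// Hypothetical move type; a real engine would pack from/to/piece into the integer.
/// `Move(0)` is assumed to mean "no move".
#[derive(Clone, Copy, PartialEq, Eq, Debug, Default)]
pub struct Move(pub u32);

/// Assumed maximum search depth in plies.
pub const MAX_PLY: usize = 128;

/// Two killer slots per ply.
pub struct Killers {
    slots: [[Move; 2]; MAX_PLY],
}

impl Killers {
    pub fn new() -> Self {
        Self { slots: [[Move::default(); 2]; MAX_PLY] }
    }

    /// Store a quiet move that caused a beta cutoff at `ply`.
    /// The move is shifted in only if it is not already in the first slot,
    /// so the two slots never end up holding the same move.
    pub fn store(&mut self, ply: usize, mv: Move) {
        let slot = &mut self.slots[ply];
        if slot[0] != mv {
            slot[1] = slot[0];
            slot[0] = mv;
        }
    }

    /// Return both killers for `ply`, most recent first.
    pub fn get(&self, ply: usize) -> [Move; 2] {
        self.slots[ply]
    }
}
```

Storing the same move twice in a row leaves the second slot untouched, which is the whole point of the check.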

Which doesn't make too much sense because say moving a white knight to e6 and having an outpost there is a totally different thing from Black moving his knight there where it is probably in the way. Or moving a white rook to the 7th rank and controlling it is doing much more than Black putting a rook on the 7th rank.

Yes. That's why I decided to add the "side" to the indexing. I do not have 12 piece types, I only have 6. And I prefer a "side" index over two separate arrays, because then I don't have to distinguish between arrays when putting a piece in it.
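
A minimal sketch of what a side-indexed history table could look like, assuming 2 sides, 6 piece types and 64 target squares. The names (`History`, `bump`, `score`) and the depth-squared bonus are my assumptions for illustration; the thread doesn't say which increment either engine uses.

```rust
pub const SIDES: usize = 2;
pub const PIECES: usize = 6;
pub const SQUARES: usize = 64;

/// History counters indexed by [side][piece][to-square].
/// One array serves both colours; the side index keeps them apart.
pub struct History {
    table: [[[u32; SQUARES]; PIECES]; SIDES],
}

impl History {
    pub fn new() -> Self {
        Self { table: [[[0; SQUARES]; PIECES]; SIDES] }
    }

    /// Reward a quiet move that caused a cutoff.
    /// The depth * depth bonus is an assumption, not something from the thread.
    pub fn bump(&mut self, side: usize, piece: usize, to: usize, depth: u32) {
        self.table[side][piece][to] += depth * depth;
    }

    /// Ordering score for a quiet move of `piece` by `side` to square `to`.
    pub fn score(&self, side: usize, piece: usize, to: usize) -> u32 {
        self.table[side][piece][to]
    }
}
```

With this layout, a white knight landing on e6 and a black knight landing on e6 update different counters, so the outpost example above no longer pollutes the other side's move ordering.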

I guess my history implementation is one of the few that have both static aging, dividing everything by 8 when search starts, and dynamic aging, halving everything when the 0-2047 range would overflow during search. This happens rarely enough so that it doesn't cost time.
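
To make the two kinds of aging concrete, here is a rough sketch. The divide-by-8 at search start, the halving on overflow and the 0-2047 range come from the post above; everything else (`AgedHistory`, the `[side][piece][to]` layout, the method names, the final clamp) is assumed and is not Ras's actual code.

```rust
/// Upper bound of the history range mentioned in the post (0-2047).
pub const HISTORY_MAX: u16 = 2047;

/// History table kept across searches; the indexing is an assumed layout.
pub struct AgedHistory {
    table: [[[u16; 64]; 6]; 2],
}

impl AgedHistory {
    pub fn new() -> Self {
        Self { table: [[[0; 64]; 6]; 2] }
    }

    /// Static aging: called once when a new search starts.
    pub fn age_for_new_search(&mut self) {
        for v in self.table.iter_mut().flatten().flatten() {
            *v /= 8;
        }
    }

    /// Dynamic aging: halve every counter.
    fn halve_all(&mut self) {
        for v in self.table.iter_mut().flatten().flatten() {
            *v /= 2;
        }
    }

    /// Add `bonus` to one counter, halving the whole table first
    /// if the addition would leave the 0-2047 range.
    pub fn bump(&mut self, side: usize, piece: usize, to: usize, bonus: u16) {
        let current = u32::from(self.table[side][piece][to]);
        if current + u32::from(bonus) > u32::from(HISTORY_MAX) {
            self.halve_all();
        }
        let entry = &mut self.table[side][piece][to];
        // The clamp only matters for very large bonuses (an extra safety assumption).
        *entry = entry.saturating_add(bonus).min(HISTORY_MAX);
    }

    pub fn score(&self, side: usize, piece: usize, to: usize) -> u16 {
        self.table[side][piece][to]
    }
}
```

The full-table halving touches only 768 counters, so even if it fired often it would be cheap; per the post it fires rarely.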

So you keep the history for the entire game? Have you ever tried what happens if you keep it for only a single search?

I also rank the move from the null search of one level up because that's a threat move, and if the opponent's move from one level up hasn't addressed the threat, that can cut off here.

Yes, but I don't have a null move routine yet.

Ras · Post by **Ras** » Thu Jun 03, 2021 1:23 pm

mvanthoor wrote: ↑Thu Jun 03, 2021 1:00 am So you keep the history for the entire game?

Yes.

Have you ever tried what happens if you keep it for only a single search?

I gave it a run of 26k games at 10s/game, and resetting history upon a new search lost 6 Elo (+/- 4 Elo) in self-play. I'd expect a loss of 4 Elo against other engines.

mvanthoor · Post by **mvanthoor** » Thu Jun 03, 2021 2:12 pm

Ras wrote: ↑Thu Jun 03, 2021 1:23 pm I gave it a run of 26k games at 10s/game, and resetting history upon a new search lost 6 Elo (+/- 4 Elo) in self-play. I'd expect a loss of 4 Elo against other engines.

Thanks. In that case, I'm not even going to _try_ and keep the history for the entire game. I'll gladly eat the 4 Elo loss, as I don't have to change anything in the structure of my engine. The search is now completely self-contained, except for the TT, which is shared. It is in the Engine object, shared through an atomic reference and a mutex. As long as this is not a bottleneck (which will probably only happen if TCEC ever gets to running this thing on a 16 or 32+ core system), I'll leave it like this.
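
For anyone unfamiliar with the pattern, "an atomic reference and a mutex" most likely maps to `Arc<Mutex<...>>` from the standard library. The sketch below is a simplified illustration under that assumption; `TtEntry`, `TT` and `search_with_shared_tt` are made-up names and this is not Rustic's actual code.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Simplified TT entry (hypothetical layout).
#[derive(Clone, Copy, Default)]
pub struct TtEntry {
    pub key: u64,
    pub depth: u8,
    pub score: i16,
}

pub struct TT {
    entries: Vec<TtEntry>,
}

impl TT {
    /// `size` must be greater than zero.
    pub fn new(size: usize) -> Self {
        Self { entries: vec![TtEntry::default(); size] }
    }

    fn index(&self, key: u64) -> usize {
        (key % self.entries.len() as u64) as usize
    }

    /// Always-replace store, to keep the sketch short.
    pub fn store(&mut self, entry: TtEntry) {
        let i = self.index(entry.key);
        self.entries[i] = entry;
    }

    /// Note: an empty slot has key 0, so key 0 is not handled specially here.
    pub fn probe(&self, key: u64) -> Option<TtEntry> {
        let e = self.entries[self.index(key)];
        if e.key == key { Some(e) } else { None }
    }
}

/// Spawn `workers` threads that all write into one mutex-protected table.
pub fn search_with_shared_tt(workers: usize) -> Arc<Mutex<TT>> {
    let tt = Arc::new(Mutex::new(TT::new(1024)));
    let handles: Vec<_> = (0..workers)
        .map(|id| {
            let tt = Arc::clone(&tt);
            thread::spawn(move || {
                // Stand-in for a real search: each worker stores one entry.
                let key = 0x1000 + id as u64;
                tt.lock().unwrap().store(TtEntry { key, depth: 1, score: id as i16 });
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    tt
}
```

Every probe and store takes the same lock, which is why this only scales as long as few threads hit the table at the same time.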

Ras · Post by **Ras** » Thu Jun 03, 2021 2:47 pm

mvanthoor wrote: ↑Thu Jun 03, 2021 2:12 pm Thanks. In that case, I'm not even going to _try_ and keep the history for the entire game.

Even more so because it would be difficult when going multi-threaded anyway because the history should stay per thread, and there's no guarantee that in the next search, a specific thread gets a similar part of the tree as before.

It is in the Engine object, shared through an atomic reference and a mutex.

That's not the best idea for multi-threaded search because it is a bottleneck. That's how the threads communicate, after all. There are forum threads here on how to do a lockless TT, and that would solve the problem. If you benchmark the initial position with a single worker, what's the NPS impact with and without mutex? Because that's the minimum loss scenario with only the overhead, not yet with the contention.
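
One commonly described way to get a lockless TT is the XOR trick: store `key ^ data` next to `data`, and treat an entry as valid only if the two words still XOR back to the probing key. The sketch below shows that idea with `AtomicU64`; it is my own illustration with assumed names (`Slot`, `LocklessTT`) and a 64-bit packed `data` word, not the specific scheme from any particular forum thread.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// One table slot: the key is stored XORed with the data word.
pub struct Slot {
    key_xor_data: AtomicU64,
    data: AtomicU64,
}

pub struct LocklessTT {
    slots: Vec<Slot>,
}

impl LocklessTT {
    /// `size` must be greater than zero.
    pub fn new(size: usize) -> Self {
        let slots = (0..size)
            .map(|_| Slot { key_xor_data: AtomicU64::new(0), data: AtomicU64::new(0) })
            .collect();
        Self { slots }
    }

    fn slot(&self, key: u64) -> &Slot {
        &self.slots[(key % self.slots.len() as u64) as usize]
    }

    /// Store without locking. Two threads may interleave their two writes,
    /// which leaves a slot whose words do not match each other.
    pub fn store(&self, key: u64, data: u64) {
        let s = self.slot(key);
        s.key_xor_data.store(key ^ data, Ordering::Relaxed);
        s.data.store(data, Ordering::Relaxed);
    }

    /// A torn or foreign entry fails the XOR check and is reported as a miss.
    /// Caveat of this sketch: an empty slot matches key 0.
    pub fn probe(&self, key: u64) -> Option<u64> {
        let s = self.slot(key);
        let data = s.data.load(Ordering::Relaxed);
        let check = s.key_xor_data.load(Ordering::Relaxed);
        if check ^ data == key { Some(data) } else { None }
    }
}
```

Because `store` and `probe` take `&self`, the table can be shared between threads through a plain `Arc<LocklessTT>` with no mutex at all.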

Ras · Post by **Ras** » Thu Jun 03, 2021 9:24 pm

mvanthoor wrote: ↑Thu Jun 03, 2021 2:12 pm a mutex. As long as this is not a bottleneck

Btw., you could also divide the hash table into N domains, each with a mutex of its own, because e.g. an access in the lower half doesn't really collide with an access in the upper half. Depending on the hash table size, there could be a mutex for e.g. every 4MB of the hash table. Then you would still have the mutex overhead, but the contention should be less.
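
The N-domain idea could be sketched like this: the table is split into contiguous chunks and each chunk gets its own `Mutex`. All names (`StripedTT`, `StripedEntry`, `domain_of`) and the entry layout are assumptions for illustration; sizing the domains by entry count rather than by megabytes is just to keep the example short.

```rust
use std::sync::Mutex;

/// Simplified entry (hypothetical layout).
#[derive(Clone, Copy, Default)]
pub struct StripedEntry {
    key: u64,
    data: u64,
}

/// Hash table split into domains, each protected by its own mutex.
pub struct StripedTT {
    domains: Vec<Mutex<Vec<StripedEntry>>>,
    per_domain: usize,
}

impl StripedTT {
    /// Both arguments must be greater than zero.
    pub fn new(domains: usize, per_domain: usize) -> Self {
        let domains = (0..domains)
            .map(|_| Mutex::new(vec![StripedEntry::default(); per_domain]))
            .collect();
        Self { domains, per_domain }
    }

    /// Map a key to (domain, index inside that domain).
    fn locate(&self, key: u64) -> (usize, usize) {
        let total = self.domains.len() * self.per_domain;
        let index = (key % total as u64) as usize;
        (index / self.per_domain, index % self.per_domain)
    }

    /// Which domain (and therefore which mutex) a key falls into.
    pub fn domain_of(&self, key: u64) -> usize {
        self.locate(key).0
    }

    /// Only the mutex of the affected domain is taken.
    pub fn store(&self, key: u64, data: u64) {
        let (d, i) = self.locate(key);
        self.domains[d].lock().unwrap()[i] = StripedEntry { key, data };
    }

    pub fn probe(&self, key: u64) -> Option<u64> {
        let (d, i) = self.locate(key);
        let e = self.domains[d].lock().unwrap()[i];
        if e.key == key { Some(e.data) } else { None }
    }
}
```

Two threads only block each other when their keys land in the same domain, so contention should drop roughly with the number of domains, while the per-access locking overhead stays.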

mvanthoor · Post by **mvanthoor** » Sat Jun 05, 2021 3:20 pm

I've been retesting the development versions of Rustic in a 10s+0.1s gauntlet.

At the moment, the results are:
2.1.100 (Alpha 2 + Killers): +27 Elo
2.2.100 (2.1.100 + PVS): +24 Elo
2.3.100 (2.2.100 + Aspiration Window): running...

I don't know if I'm going to re-implement history before the tuned and tapered evaluation. I feel PSTs alone don't provide enough good information. I also wonder if PVS and Aspiration Window (which depend on the evaluation score) would have gotten bigger rating increases if my evaluation already had been better.

If I add tapering and tuning on top of 2.3.100, and it's indeed true that both PVS and AW perform better with a better evaluation, then the Elo boost will not come from tapering and tuning alone.

I still need to run these versions through the 60s+0.6s gauntlet.

I also wonder how it's possible that some engines, sometimes 50 or even 100 points ahead of Rustic Alpha 2 in the CCRL 2m+1s list, are completely destroyed in the fast 10s+0.1s time controls against Rustic; so much so that I can't even use them for this gauntlet. Also, some engines still seem to be very sensitive about speed / depth increases of their opponents (older Zahak versions, Clueless 1.4), while others aren't (MinimalChess).

I assume that engines with a stronger evaluation function are less sensitive to speed / depth increases of an opponent that doesn't otherwise change.

To be sure, I'm going to test this backwards, after the tapering and tuning, to see how much Rustic _loses_ when I remove features. If it loses more when I disable PVS than it gained, then the better evaluation definitely boosted the PVS results.

mvanthoor · Post by **mvanthoor** » Sun Jun 06, 2021 11:07 pm

emadsen wrote: ↑Wed Jun 02, 2021 1:31 am My engine had PVS and aspiration windows. I removed aspiration windows (except for MultiPV > 1 and limit strength mode) and the engine plays slightly stronger. In other words, I found in my engine PVS is good enough on its own.

If I understand your post correctly, you've added aspiration windows but don't yet have PVS. I'm guessing you'll gain strength when you add PVS (so aspiration windows + PVS). However, I wonder at that point, if you disable aspiration windows (leaving only PVS), does your engine play stronger than when both are enabled?

Hi Eric,

I've started testing features separately. Killers and PVS are clear winners, testing with CuteChess's SPRT function.

Condition:

sprt elo0=1 elo1=10 alpha=0.05 beta=0.05

H1: Engine A is at least 1 elo stronger than Engine B.
H0: Engine A is NOT more than 10 elo stronger than Engine B.

The SPRT-test terminates as soon as it confirms either H1 or H0.

I think it's confusing that Elo0 belongs to H1, and Elo1 belongs to H0. At least, this is how I interpret the CuteChess description:

Use a Sequential Probability Ratio Test as a termination
criterion for the match. This option should only be used
in matches between two players to test if engine A is
stronger than engine B. Hypothesis H1 is that A is
stronger than B by at least ELO0 ELO points, and H0
(the null hypothesis) is that A is not stronger than B
by at least ELO1 ELO points. The maximum probabilities
for type I and type II errors outside the interval
[ELO0, ELO1] are ALPHA and BETA. The match is stopped if
either H0 or H1 is accepted or if the maximum number of
games set by '-rounds' and/or '-games' is reached.
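
As a side note on where the lbound/ubound figures in the SPRT output come from: with Wald's sequential test, the stopping thresholds for the log-likelihood ratio depend only on alpha and beta. A small sketch (the function name `sprt_bounds` is mine, not CuteChess's):

```rust
/// Wald's SPRT stopping thresholds for the log-likelihood ratio (LLR).
/// The test stops when the LLR drops below `lower` or rises above `upper`.
pub fn sprt_bounds(alpha: f64, beta: f64) -> (f64, f64) {
    let lower = (beta / (1.0 - alpha)).ln();
    let upper = ((1.0 - beta) / alpha).ln();
    (lower, upper)
}
```

For alpha=0.05 and beta=0.05 this gives roughly -2.944 and +2.944, which matches the -2.94 / 2.94 that CuteChess prints.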

Result:

Testing PVS in 2.2.100

Score of Rustic Alpha 2.2.100 vs Rustic Alpha 2.1.100: 304 - 205 - 140 [0.576] 649
... Rustic Alpha 2.2.100 playing White: 175 - 87 - 63 [0.635] 325
... Rustic Alpha 2.2.100 playing Black: 129 - 118 - 77 [0.517] 324
... White vs Black: 293 - 216 - 140 [0.559] 649
Elo difference: 53.4 +/- 23.9, LOS: 100.0 %, DrawRatio: 21.6 %
SPRT: llr 2.95 (100.3%), lbound -2.94, ubound 2.94 - H1 was accepted

Testing killers in 2.1.100:

Score of Rustic Alpha 2.1.100 vs Rustic Alpha 2: 291 - 203 - 145 [0.569] 639
... Rustic Alpha 2.1.100 playing White: 162 - 88 - 70 [0.616] 320
... Rustic Alpha 2.1.100 playing Black: 129 - 115 - 75 [0.522] 319
... White vs Black: 277 - 217 - 145 [0.547] 639
Elo difference: 48.2 +/- 23.9, LOS: 100.0 %, DrawRatio: 22.7 %
SPRT: llr 2.95 (100.3%), lbound -2.94, ubound 2.94 - H1 was accepted

So, roughly 50 Elo per feature in self-play.
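
The Elo figures above follow from the match score via the usual logistic formula. A minimal sketch for checking them by hand (the function name `elo_difference` is hypothetical; the error margins and LOS that CuteChess prints are not reproduced here):

```rust
/// Elo difference implied by a win/loss/draw record, using the logistic model.
/// Draws count as half a point. Undefined for a 0% or 100% score.
pub fn elo_difference(wins: u32, losses: u32, draws: u32) -> f64 {
    let games = f64::from(wins + losses + draws);
    let score = (f64::from(wins) + 0.5 * f64::from(draws)) / games;
    -400.0 * (1.0 / score - 1.0).log10()
}
```

Calling `elo_difference(304, 205, 140)` gives about 53.4 for the PVS test, and `elo_difference(291, 203, 145)` gives about 48.2 for the killers test.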

(The version with PVS obviously also had killers. The version with killers tested against the current master version, Alpha 2.)

The test with Aspiration Window (on top of killers and PVS, 50cp window, reset to INFINITY if it fails) is now running. It's at 51% +/- 0.2% against the version with killers+PVS, so it's unlikely this is going to make a huge difference. (Edit: while I was typing this post, the version with AW dropped to 49.8%. We're 1200 games into the test. So, I feel as if AW is not going to make any clear difference, at least not with the 0.5 window, and a simple evaluation.)

I'll test aspiration windows with a smaller margin (0.25 or 0.33 instead of 0.50), but if that doesn't bring a clear strength increase, I'm side-lining this feature to test in a later version of Rustic. After this test, I'll re-implement history, and test that as well.

Somewhere I'll have to decide if I'm going to postpone Alpha 3's release further, or just release it with killers+pvs, and use aspiration windows later (and possibly postpone history until I implement LMR).

Ras · Post by **Ras** » Sun Jun 06, 2021 11:25 pm

mvanthoor wrote: ↑Sun Jun 06, 2021 11:07 pm Aspiration Window (on top of killers and pvs, 50cp window, reset to INFINITY if it fails

I have a similar implementation with +/- 50 cp, but I use the AW only at main iteration depth >=4. If it fails, I only open the half that it failed towards, i.e. for a fail-low, I open alpha to -INF, and for a fail-high, I open beta to +INF.
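
A rough sketch of that scheme, assuming the root search is passed in as a closure taking `(depth, alpha, beta)`. The +/- 50 cp window, the depth >= 4 condition and the one-sided re-opening come from the description above; the constant names, the value of `INF` and the function signature are assumptions for illustration.

```rust
/// Assumed "infinite" score; real engines pick their own value.
pub const INF: i32 = 30_000;
/// Half-width of the aspiration window in centipawns.
pub const WINDOW: i32 = 50;
/// Aspiration windows are used only from this iteration depth on.
pub const MIN_AW_DEPTH: u32 = 4;

/// Run one iteration of iterative deepening with an aspiration window.
/// `search(depth, alpha, beta)` stands in for the engine's root search.
pub fn aspiration_search<F>(depth: u32, prev_score: i32, mut search: F) -> i32
where
    F: FnMut(u32, i32, i32) -> i32,
{
    let (mut alpha, mut beta) = if depth >= MIN_AW_DEPTH {
        ((prev_score - WINDOW).max(-INF), (prev_score + WINDOW).min(INF))
    } else {
        (-INF, INF)
    };

    loop {
        let score = search(depth, alpha, beta);
        if score <= alpha && alpha > -INF {
            // Fail-low: open only the lower half and search again.
            alpha = -INF;
        } else if score >= beta && beta < INF {
            // Fail-high: open only the upper half and search again.
            beta = INF;
        } else {
            return score;
        }
    }
}
```

Each side can be opened at most once, so the loop does at most two re-searches per iteration.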

mvanthoor · Post by **mvanthoor** » Sun Jun 06, 2021 11:33 pm

Ras wrote: ↑Sun Jun 06, 2021 11:25 pm
mvanthoor wrote: ↑Sun Jun 06, 2021 11:07 pm Aspiration Window (on top of killers and pvs, 50cp window, reset to INFINITY if it fails
I have a similar implementation with +/- 50 cp, but I use the AW only at main iteration depth >=4. If it fails, I only open the half that it failed towards, i.e. for a fail-low, I open alpha to -INF, and for a fail-high, I open beta to +INF.

That is another possibility to look into. There are many variables to tinker with.

I canceled the test myself. At 1700 games, the result was exactly even: the same number of wins and losses, with the rest drawn. Clearly, at least with a simple evaluation, just using AWs with 50cp and widening to INF/-INF is not going to make a difference. I refuse to believe that after 1700 games, the 50/50 result could change much.

I keep thinking that testing features that rely on the evaluation (AW, PVS) would become easier if the evaluation is more detailed. That would probably also help in sorting history moves.

Ras · Post by **Ras** » Sun Jun 06, 2021 11:41 pm

mvanthoor wrote: ↑Sun Jun 06, 2021 11:33 pm I keep thinking that testing features that rely on the evaluation (AW, PVS) would become easier if the evaluation is more detailed. That would probably also help in sorting history moves.

Yes, in particular with a more advanced pawn structure evaluation which you then can interleave with the rook positioning. That's more in the realm of what to do when there isn't much going on. It won't even hurt NPS much if you derive both pawn eval and the associated preferred rook files from the pawn hash table.
