- Making the engine play arbitrarily weaker without limiting the search depth, with the goal of making it a more interesting sparring partner for human opponents of diverse strength.
- Making the engine less deterministic during self-play so that a wider variety of positions is encountered. This could be helpful when using self-play to generate data for the tuner.
Code: Select all
//Scoring Root Moves with a random bonus: https://www.chessprogramming.org/Ronald_de_Man
int bonus = _rng.Next(_rootRandomness); //random number in [0, _rootRandomness), i.e. the upper bound is exclusive
int score = bonus - EvaluateTT(1, depth - 1, bonus - MAX_BETA, bonus - alpha, moveGen);
In the second implementation the bonus for each root move is taken from a table that is filled once, so it stays the same across all iterations of the search:
Code: Select all
//Scoring Root Moves with a random bonus: https://www.chessprogramming.org/Ronald_de_Man
int bonus = RootBonus[playState.PlayedMoves - 1];
int score = bonus - EvaluateTT(1, depth - 1, bonus - MAX_BETA, bonus - alpha, moveGen);
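In both variants the bonus is folded into the search window instead of being added after the search returns. A minimal sketch of why the child's window is shifted by the bonus, written in Java rather than the C# above; `childSearch` is a hypothetical stand-in for the recursive call that simply returns exact scores:

```java
public class BonusWindow {
    // Stand-in for the recursive call (EvaluateTT above): returns an exact score
    // for a child position regardless of the window. A real fail-hard search is
    // only exact inside its window, which is exactly why the window must be
    // shifted together with the score.
    static int childSearch(int node, int alpha, int beta) {
        return (node * 37) % 100 - 50; // arbitrary deterministic toy eval
    }

    // Ordinary negamax root score: negate the child's score.
    static int plainScore(int node, int alpha, int beta) {
        return -childSearch(node, -beta, -alpha);
    }

    // Root score with a bonus folded in: shifting the child's window by the
    // bonus keeps "bonus - childScore" consistent with the original
    // (alpha, beta) window at the root.
    static int biasedScore(int node, int alpha, int beta, int bonus) {
        return bonus - childSearch(node, bonus - beta, bonus - alpha);
    }

    public static void main(String[] args) {
        int alpha = -30, beta = 40;
        for (int node = 0; node < 5; node++) {
            int plain = plainScore(node, alpha, beta);
            int biased = biasedScore(node, alpha, beta, 17);
            System.out.println(node + ": plain=" + plain + " biased=" + biased);
        }
    }
}
```

Since the score returned to the root is `bonus - childScore`, passing `bonus - MAX_BETA` and `bonus - alpha` as the child's bounds is what keeps the shifted result inside the root's window.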
But surprisingly (to me), when you pit both implementations against each other with the same range of randomness, the first version plays much stronger! For example, with the random bonus set to [0..50] centipawns the first implementation played ~50 Elo stronger. I would have intuitively expected the opposite and at first assumed a bug or an error in my testing, but I found nothing.
How can changing the bonus of the moves each search iteration be beneficial to the quality of the final search result? Any ideas?
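To make the comparison concrete, the two schemes differ only in when the bonus is sampled. A Java sketch (class and method names are illustrative, not from the engine): the first scheme re-rolls a bonus every time a root move is scored, the second draws one bonus per root move up front and reuses it at every depth:

```java
public class BonusSchemes {
    // Scheme of the first snippet: a fresh bonus each time a root move is scored,
    // so the same move can receive a different bonus at every iteration.
    static int freshBonus(java.util.Random rng, int range) {
        return rng.nextInt(range); // in [0, range)
    }

    // Scheme of the second snippet: bonuses drawn once, then reused unchanged.
    static int[] fixedBonuses(java.util.Random rng, int range, int moves) {
        int[] bonus = new int[moves];
        for (int i = 0; i < moves; i++) bonus[i] = rng.nextInt(range);
        return bonus;
    }

    public static void main(String[] args) {
        final int RANGE = 50; // bonus range as in the post
        java.util.Random rng = new java.util.Random(123);
        int[] fixed = fixedBonuses(rng, RANGE, 3);
        for (int depth = 1; depth <= 3; depth++)
            for (int move = 0; move < 3; move++)
                System.out.printf("depth=%d move=%d fresh=%d fixed=%d%n",
                        depth, move, freshBonus(rng, RANGE), fixed[move]);
    }
}
```

With the first scheme a root move's effective score jitters between iterations, so the root move ordering can change from one iteration to the next; with the second scheme the ordering perturbation is frozen for the whole search.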