Standard candles

Daniel Shawul · Post by **Daniel Shawul** » Fri Aug 30, 2013 1:31 am

The 'standard candle' is already there, namely 200 ELO points correspond to 75% winning ratio. That doesn't say anything about which two players you are comparing so I don't see why the focus now is on one of the players. A beats B by 75% is same as C beats D by 75%. Even if we forgive that, peaking a random move selector is probably the worst choice because you don't have many similar strength opponents to gauge its ELO with. Even the standard ELO definition above is not that important as evidenced by bayeselo that uses different scales based on the number of draws (because mostly we are interested in relative performance of current players). To compare two different pool of players from different era, one of the new players have to play games with one of the players from the old group (preferable the reference player). For a reference computer player, all hardware and software specifications should be the same, so saying just a 'random player' does not cut it. I would rather pick a computer player of say roughly current 1500 ELO club player, and then specify the exact hardware to be run on at any time even in the future. The choice of a 'random player' as a reference could only add inconveniences when trying to gauge its strength IMHO.

michiguel · Post by **michiguel** » Fri Aug 30, 2013 1:39 am

Don wrote:
sje wrote:A detailed approach:

1.Take a source snapshot of a strong, open source program like Stockfish. Call the program made form this snapshot "Candle-0".

2. Modify the program source in only one way: after a search completes. generate a random number on the unit (0..1) interval. If this random number is less than a given percentage (zero percent for Candle-0), then this triggers the generation of a second random number. This second number is an integer used to select a move from all the available ply zero moves.

3. For Candle-0, the first random number never has an effect, so the second number is never generated.

4. Successively make Candle-1, Candle-2 up to Candle-100. Each program performs the same as its predecessor expect for a one point percentage addition to its random trigger.

5. Run all the programs against each other in a huge round robin tournament with game/hour sudden death controls for about a year.

6. A the end of all this, you'll have a very good idea as to the elo of a random program.
This would produce a lot of versions of different strengths, but it's too bizarre and unbalanced. Why not just use a node level? Something like:

candle(n) = 100 * 2^n nodes

candle(0) = 100 nodes
candle(1) = 200 nodes
candle(2) = 400 nodes

etc ...

Having a program play like a super Grandmaster but blundering horribly a percentage of the time is really ugly.

There is a more natural way to do your idea though. Program runs in multipv mode of INF - in other words RANK all the moves from best to worst.

You then choose a random number between 0 and the number of legal moves. That is the weakest play and is completely random. To get to the next strongest candle you generate a second random number between zero and the FIRST random number. You now have a curve where the best move is favored a little more than the second best and so on. The worst move is still possible but less likely than any other move. You can continue this process with additional random numbers.

Yes, something like this is a good idea. A similar one could be to get a random number to pick among the top 50% of the moves, in another engine from the top 30% etc. etc. The moves should be sorted with something like fixed depth=6 etc. and the engines could be more than one, of different levels. All of them will converge to the same value when the moves are totally random.

This will generate enough "weak engines" to accurately calculate the elo of a random mover, and assign a fixed value to it (zero?).

This could be a reference. Of course, at different time controls, the elo of all "normal" engines will shift, because the random mover will always be zero (won't change with time). But may be that is the first step to actually have an absolute elo scale.

I think there are some interesting considerations here to link these type of absolute scale with entropic properties of move selection.

Miguel

After 20 random numbers you will almost always pick the top Stockfish move but with some very slight chance of picking the 2nd best. The odds of picking the 3rd or 4th best are exceedingly remote and it's possible to still pick the worst move if you are willing to wait a few trillion years. But with more random numbers you will asymptotically approach the full strength of Stockfish. And I think the quality (even if low) will seem a lot more consistent from move to move.

Of course you will take a hit for multipv mode of perhaps 100 ELO - I don't know for sure.

mcostalba · Post by **mcostalba** » Fri Aug 30, 2013 6:44 am

mcostalba wrote: So here is my suggestion.

take and engine and modify so that:

1) Generates all the moves and for each one calls SEE

2) The result of the "search" of each move is simply the eval of the position after the move + SEE score

3) The eval of the position is simply given by the psq score.

4) The engine chooses as best move the move with highest eval. If two or more moves have the same eval the one that starts from the lowest square is done (this is just a convention to make the moves reproducible also on different generators)

I have to add another condition:

5) Games with this engine are adjudicated as wins soon as the engine returns a score value >= a rook or as lost for score <= rook. This is because this kind of engine very difficult will be able to mate (of course in case of a pure random engine it will never mate).

To make the engine more competitive I also do one correction and a different way to generate the moves.

1bis) The engine will do a perft 2 of a position and evaluate psq + SEE of each leaf node.

hgm · Post by **hgm** » Fri Aug 30, 2013 9:30 am

mcostalba wrote:take and engine and modify so that:

1) Generates all the moves and for each one calls SEE

2) The result of the "search" of each move is simply the eval of the position after the move + SEE score

3) The eval of the position is simply given by the psq score.

4) The engine chooses as best move the move with highest eval. If two or more moves have the same eval the one that starts from the lowest square is done (this is just a convention to make the moves reproducible also on different generators)

This is basically what N.E.G. is: a version of Joker modified this way. The only difference is that it also gets a small bonus for moves that deliver check, and that it also takes into account the SEE of from-square. (Joker's static move ordering already did that.)

hgm · Post by **hgm** » Fri Aug 30, 2013 9:47 am

Don wrote:This would produce a lot of versions of different strengths, but it's too bizarre and unbalanced. Why not just use a node level? Something like:
....

I still have plans to generate a series of extremely weak engines through the 'Beal effect': using purely random evaluation and a normal alpha-beta search, and then vary the search depth. At 1 ply this should be a purely random mover. But at larger depths it should try to optimize its mobility.

My first attempt for this failed, however, because the search would find checkmates, and score them independently of the evaluation. And of course that really helps; if you can see mate-in-6, but otherwise move randomly, you have a really huge advantage to a pure random mover.

I am still looking for good ways to solve this. One idea I had (but did not implement yet) is to treat checkmates differently. Recognize stalemate in search, (and score it as 0, like repetitions), but otherwise search to King capture, and even beyond. The idea was to allow a side that no longer has a King to play only null moves. But the opponent can still play on until the nominal depth, and will do a (random) evaluation there. This allows him to pick the best node his one-sided perft tree can reach from the King-capture position, and this would presumably be better for a deep perft tree (behind an early King capture) than for a shallow one (where the capture occurs close to the horizon). This would generate 'mate scores' in a way more in line with the eval scores, and also subject the finding of mates to the fuzzy, random process of the Beal effect.

This should build a series of progressively stronger standard engines Ram(n), for search depth n=1, 2, ..., where Ram(1) would be a purely random mover. There still is an unspecified design parameter, though: the distribution from which the random evals are drawn.

Michel · Post by **Michel** » Fri Aug 30, 2013 10:50 am

Be ELO independent from PC hardware and from TC (that's why a unmodified normal engine that thinks more when has more time or when runs on a faster hardware is not suitable).

What do you mean independence of TC? For example an engine that always searches a fixed number of nodes/moves regardless of TC?

That would mean the elo of other engines measured against it would depend strongly on TC, something you don't want I presume...

More suitable is to fix the NPS and compute the number of nodes accordingly.

Uri Blass · Post by **Uri Blass** » Fri Aug 30, 2013 11:12 am

hgm wrote:
Don wrote:This would produce a lot of versions of different strengths, but it's too bizarre and unbalanced. Why not just use a node level? Something like:
....
I still have plans to generate a series of extremely weak engines through the 'Beal effect': using purely random evaluation and a normal alpha-beta search, and then vary the search depth. At 1 ply this should be a purely random mover. But at larger depths it should try to optimize its mobility.

My first attempt for this failed, however, because the search would find checkmates, and score them independently of the evaluation. And of course that really helps; if you can see mate-in-6, but otherwise move randomly, you have a really huge advantage to a pure random mover.

I am still looking for good ways to solve this. One idea I had (but did not implement yet) is to treat checkmates differently. Recognize stalemate in search, (and score it as 0, like repetitions), but otherwise search to King capture, and even beyond. The idea was to allow a side that no longer has a King to play only null moves. But the opponent can still play on until the nominal depth, and will do a (random) evaluation there. This allows him to pick the best node his one-sided perft tree can reach from the King-capture position, and this would presumably be better for a deep perft tree (behind an early King capture) than for a shallow one (where the capture occurs close to the horizon). This would generate 'mate scores' in a way more in line with the eval scores, and also subject the finding of mates to the fuzzy, random process of the Beal effect.

This should build a series of progressively stronger standard engines Ram(n), for search depth n=1, 2, ..., where Ram(1) would be a purely random mover. There still is an unspecified design parameter, though: the distribution from which the random evals are drawn.

I think that finding mates is part of the Beal effect because checkmate limit the mobility of the opponent to 0 and the beal effect is about mobility.

You can test also only the mate effect by having a program that play random move except the case that it finds mate.

simply search every move to fixed depth and later if the best move does not force mate based on your search play a random moves from the moves that you did not find mate against yourself.

My guess is that most of the rating difference that the beal effect generates is not because of finding mates.

I guess that you can be 400 elo stronger than the random mover because of finding mates but more than 1000 elo stronger than the random mover because of the full beal effect that includes also finding mates.

sje · Post by **sje** » Fri Aug 30, 2013 1:09 pm

A Tale of Two Players

Player Zero will play a checkmating move if one is available; otherwise, it will play a randomly selected move.

Player One will play a randomly selected move.

What happens in a one million game match between these players? With alternating colors and from the viewpoint of Player Zero, the win/lose/draw/total counts are:

Code: Select all

&#91;532,416/54,688/412,896/1,000,000&#93;

hgm · Post by **hgm** » Fri Aug 30, 2013 1:18 pm

win / draw / loss / total?

This assumes either player will immediately claim a draw when they can do so? Or do they never claim?

sje · Post by **sje** » Fri Aug 30, 2013 1:48 pm

The format is WLDT (win/lose/draw/total).

A draw is claimed when it occurs, but this only happens as the result of a randomly selected move.

Standard candles

Re: Standard candles

Re: Candlemaking

Re: Candlemaking

Re: Candlemaking

Re: Candlemaking

Re: Candlemaking

Re: Candlemaking

A Tale of Two Players

Re: A Tale of Two Players

Re: A Tale of Two Players