Standard candles

sje · Post by **sje** » Sun Sep 01, 2013 2:38 pm

hgm wrote: If you allow the amount of look-ahead needed to select moves that checkmate, it would be consistent to explicitly avoid moves that stalemate.

Symbolic does check/checkmate marking in part to produce correct SAN for the moves, so the code is already there. As you know, SAN does not allow for draw indicators, although such was proposed long ago (using the "=" final suffix character).

sje · Post by **sje** » Sun Sep 01, 2013 2:44 pm

Don wrote:You can also build a simple rule interpreter - in a similar style to a paper program (a program that a human can execute unambiguously.) For example if you are in check, try captures of the checking piece first, etc ....

I think that is getting too far off track. I'd first like to see someone convincingly determine the elo of a random mover.

Don · Post by **Don** » Sun Sep 01, 2013 2:44 pm

sje wrote:
hgm wrote: If you allow the amount of look-ahead needed to select moves that checkmate, it would be consistent to explicitly avoid moves that stalemate.
Symbolic does check/checkmate marking in part to produce correct SAN for the moves, so the code is already there. As you know, SAN does not allow for draw indicators, although such was proposed long ago (using the "=" final suffix character).

Is avoiding stalemate going to improve the play? It will only help when the player who does this is superior to the opponent he is playing against.

We get back into that hairy subject of contempt and how it should be set and what a program is allowed to know about its opponent.

Don · Post by **Don** » Sun Sep 01, 2013 3:20 pm

sje wrote:
Don wrote:You can also build a simple rule interpreter - in a similar style to a paper program (a program that a human can execute unambiguously.) For example if you are in check, try captures of the checking piece first, etc ....
I think that is getting too far off track. I'd first like to see someone convincingly determine the elo of a random mover.

There have been players on the USCF rating list that have negative ELO. Years ago I proposed fixing random play at zero ELO. I think with rules such as always checkmate if one is available you are already beyond the level of players who just learned the rules and have no chess sense yet.

But I don't know that. I would like to get a sense too of the ELO of a random player. It would be relative to what though? Fide ratings? USCF ratings? BCF ratings?

One thing that could be done is to FIX the rating of uniformly random play to ZERO and call it the absolute standard - then you can interpret other rating systems in terms of that - somewhat like we do when we convert between Imperial and Metric or Fahrenheit, Celsius, Kelvin, etc ...

In fact I have advocated standardizing the rating system in some way for YEARS. There have been projects in the past to extrapolate the ratings of players like Morphy and Capablanca in their primes by "bridging" them to modern players, in other words forming a chain into modern times and going backwards. But it's a shame that we forever have to talk about inflation, deflation and argue over how various systems compare to each other without any real standard - and surely there is rating drift over the decades.

If you fix random play to zero (for instance) you then need to relate it to real ratings - obviously you want to know how this compares to FIDE ratings. You cannot do this in any practical way without incorporating search and real chess programs. You cannot play a random player against Komodo and expect to get an accurate rating. At some point you have to build a reference player and let him compete. To get people willing to play this reference player he needs to be weaker than the top chess programs and you probably won't get top players to play it even then unless you pay them - but it's probably best to shoot for a level where players are plentiful but still in the FIDE range.

Komodo when set to 100 nodes per second is probably still between 1500-1700 FIDE ELO and maybe even stronger. It will do 6 and 7 ply searches in the middlegame at tournament time controls. Super Constellation achieved a USCF rating of 2018 and generally did 4 or 5 ply searches. It was probably more like 1600 or 1700 FIDE perhaps. But we could at least try to determine if we are within 2 or 3 hundred ELO by setting random play to zero.

It's going to be tricky because ELO is not 100% transitive. How you test can affect the ELO differences and WHO you test against has an impact. So if you are going to create many players and "bridge" them with many steps between the weakest and the strongest then you should also formalize the testing conditions and methodology used to do this.
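
The bridging idea above can be sketched as a ladder of measured Elo gaps, with random play pinned at zero and every other rating obtained by summing the steps. This is only an illustration: the player names and the gap values are made up, and the function `anchored_ratings` is hypothetical. It also assumes the gaps add up along the chain, which is exactly what imperfect transitivity puts in doubt.

```python
from itertools import accumulate

def anchored_ratings(players, step_gaps, anchor=0.0):
    """Rate a ladder of players relative to the weakest one.

    players   -- names ordered from weakest to strongest
    step_gaps -- step_gaps[i] is the measured Elo gap between
                 players[i + 1] and players[i]
    anchor    -- rating assigned to players[0] (zero for random play)
    """
    if len(step_gaps) != len(players) - 1:
        raise ValueError("need exactly one gap per adjacent pair")
    # Running sum of the gaps, starting from the anchor value.
    ratings = accumulate(step_gaps, initial=anchor)
    return dict(zip(players, ratings))

# Hypothetical ladder and made-up gaps, for illustration only.
ladder = ["random", "mate_grabber", "one_ply", "reference"]
gaps = [400.0, 900.0, 500.0]
for name, rating in anchored_ratings(ladder, gaps).items():
    print(f"{name:>12}: {rating:7.1f}")
```

Every step carries its own measurement error, so the uncertainty of the top rating grows with the number of steps, which is one more reason to formalize the testing conditions for each pairing.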

I can tell you this however, you are NEVER going to get very far just trying simplistic rules without any real lookahead - for example a 1 ply modern search is going to crush beyond reason any combination of trivial rules you come up with such as "always play checkmate" and such. This may be fun but you are going to have to do a lot more if you intend to assign a rating to a random player.

So what do you propose for your "bridge" players or advanced "candles" that will actually get you up to some real level of play?

Don · Post by **Don** » Sun Sep 01, 2013 3:38 pm

sje wrote: ... I'd first like to see someone convincingly determine the elo of a random mover.

We could at least start by comparing it to a real program doing a 1 ply search. My guess is that it will win or draw only 1 game out of hundreds or even thousands.

With mate-grabber and some other enhancements we might get it close enough to be able to establish a relative rating - it probably needs to be within a few hundred ELO in order to get a reasonably accurate comparison.
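
A rough way to see why the two players need to be within a few hundred Elo: the standard logistic Elo formula turns a match score into a rating difference, and the error of that estimate blows up as the score approaches 0 or 1. The sketch below uses that formula together with a simple binomial approximation for the error (draws ignored, which is an assumption); the function names are illustrative only.

```python
import math

def expected_score(diff):
    """Expected score of a player rated `diff` Elo above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def elo_diff(score):
    """Elo difference implied by a score fraction strictly between 0 and 1."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))

def elo_std_error(score, games):
    """Approximate standard error of elo_diff(score) after `games` games.

    Assumption: each game is treated as a win/loss trial (draws ignored),
    and the error is propagated through the logistic formula.
    """
    return 400.0 / (math.log(10.0) * math.sqrt(games * score * (1.0 - score)))

# Compare an even match with a match scoring one point per thousand games.
for score in (0.5, 0.25, 0.001):
    print(f"score {score:5.3f}: diff {elo_diff(score):8.1f} Elo, "
          f"std error over 1000 games {elo_std_error(score, 1000):6.1f}")
```

A score of one point per thousand games corresponds to a gap of about 1200 Elo, and at that level a single extra win or draw shifts the estimate by a large amount.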

Another complexity is that programs are never given absolute ratings - you cannot talk about how strong a 7 ply search is or a fixed node 10,000 search is except when compared directly to something else specified in absolute terms. MOST rating lists compare programs at various TIME CONTROLS and so when Stockfish is tested at 40/4 and then 40/40 it is playing MUCH stronger at 40/40 but it's being compared to programs also playing much stronger. Even though most humans don't realize it they also play hundreds of ELO stronger when playing tournament time controls compared to Blitz - and in fact they are more scalable than computers.

So how can you even say that a random mover plays at such and such an ELO? What is that in comparison to? How would you propose to assign an ELO value for random play?

Don · Post by **Don** » Sun Sep 01, 2013 4:14 pm

Don wrote: So how can you even say that a random mover plays at such and such an ELO? What is that in comparison to? How would you propose to assign an ELO value for random play?

The answer is that you cannot. Buy a new and faster computer and the same old programs are suddenly 100 ELO stronger. Does that mean the random-mover is suddenly 100 ELO weaker? A random mover is 500 ELO stronger in a human blitz tournament than it would be in a standard time control tournament.

But it's still useful to benchmark such a program and it's valid as long as you define precisely what you are measuring. My solution for these fixed entity programs is to not bother - just define it to be zero ELO for the sake of fixed algorithm programs and go from there.

hgm · Post by **hgm** » Sun Sep 01, 2013 4:50 pm

Don wrote:Is avoiding stalemate going to improve the play?

It is almost always the weak side that is stalemated. It is really difficult to stalemate someone that has a lot of material without having even more material yourself. (The mentioned KPK is the exception.) I am pretty sure that more than 99% of all stalemates in the capturer-vs-random would be won by the capturer if he had avoided stalemate. They are probably mostly positions where you are 3 or 4 Queens ahead.

Uri Blass · Post by **Uri Blass** » Sun Sep 01, 2013 5:06 pm

sje wrote: Four more matches between MateGrabber and Random:
W/L/D/T: [848,150/2,513/149,337/1,000,000]
W/L/D/T: [847,410/2,614/149,976/1,000,000]
W/L/D/T: [848,227/2,598/149,175/1,000,000]
W/L/D/T: [847,365/2,620/150,015/1,000,000]
It's interesting that Random can win any games at all.
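
For reference, the four quoted results can be turned into an Elo gap with the standard logistic formula, counting a draw as half a point. This sketch assumes the W/L/D figures are given from MateGrabber's side; the function name `match_elo` is illustrative.

```python
import math

# (wins, losses, draws) per match, copied from the quoted results.
MATCHES = [
    (848150, 2513, 149337),
    (847410, 2614, 149976),
    (848227, 2598, 149175),
    (847365, 2620, 150015),
]

def match_elo(wins, losses, draws):
    """Elo gap implied by a match result, scoring a draw as half a point."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    return 400.0 * math.log10(score / (1.0 - score))

for wins, losses, draws in MATCHES:
    print(f"{wins}/{losses}/{draws}: {match_elo(wins, losses, draws):.1f} Elo")
```

All four runs come out a little over 400 Elo in favour of the first-named player, and they agree closely with each other, as expected with a million games per run.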

The random player can even beat me if it is lucky, even though the probability is practically zero (less than 1/10^20).

MateGrabber has no chance because it is going to make a stupid capture very soon, and I believe that in order to beat me you need to play a move that is not a capture when you can capture.

I am not surprised that the random player can win games.

1.f3 e5 2.g4 Qh4 mate has probability of more than 1/1,000,000
and
1.f3 e6 2.g4 Qh4 or 1.g4 e6 2.f3 Qh4 or 1.g4 e6 2.f4 Qh4 or
1.g4 e5 2.f4 Qh4 are also possible with probability bigger than 1/1,000,000.
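
The estimate for 1.f3 e5 2.g4 Qh4 mate can be checked with a few lines of arithmetic, under the assumption that both sides pick uniformly among their legal moves. The legal-move counts below were counted by hand for this one line and are worth confirming with a move generator.

```python
from fractions import Fraction

# Number of legal moves available before each move of
# 1.f3 e5 2.g4 Qh4#  (hand-counted, an assumption to verify):
#   White's 1st: 20, Black's 1st: 20, White's 2nd: 19, Black's 2nd: 30
BRANCHING = [20, 20, 19, 30]

def line_probability(branching):
    """Probability that uniformly random movers play one specific line."""
    probability = Fraction(1)
    for legal_moves in branching:
        probability *= Fraction(1, legal_moves)
    return probability

p = line_probability(BRANCHING)
print(f"P(line) = {p}")
print("more than one in a million:", p > Fraction(1, 1_000_000))
```

With these counts the single line is already several times more likely than one in a million, before adding the other move orders.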

Don · Post by **Don** » Sun Sep 01, 2013 5:09 pm

Uri Blass wrote:
sje wrote: Four more matches between MateGrabber and Random:
W/L/D/T: [848,150/2,513/149,337/1,000,000]
W/L/D/T: [847,410/2,614/149,976/1,000,000]
W/L/D/T: [848,227/2,598/149,175/1,000,000]
W/L/D/T: [847,365/2,620/150,015/1,000,000]
It's interesting that Random can win any games at all.
The random player can even beat me if it is lucky, even though the probability is practically zero (less than 1/10^20).

MateGrabber has no chance because it is going to make a stupid capture very soon, and I believe that in order to beat me you need to play a move that is not a capture when you can capture.

I am not surprised that the random player can win games.

1.f3 e5 2.g4 Qh4 mate has probability of more than 1/1,000,000
and
1.f3 e6 2.g4 Qh4 or 1.g4 e6 2.f3 Qh4 or 1.g4 e6 2.f4 Qh4 or
1.g4 e5 2.f4 Qh4 are also possible with probability bigger than 1/1,000,000.

A random mover is capable of playing a perfect game of chess, even if the odds of that happening are infinitesimal. A program such as Houdini, Komodo or Stockfish may perhaps be LESS likely to play perfect chess since their moves are more controlled. For example if the best move goes against the heuristics of a strong program it will NEVER play it (unless of course you allow it enough time.)

Don · Post by **Don** » Sun Sep 01, 2013 5:13 pm

hgm wrote:
Don wrote:Is avoiding stalemate going to improve the play?
It is almost always the weak side that is stalemated. It is really difficult to stalemate someone that has a lot of material without having even more material yourself. (The mentioned KPK is the exception.) I am pretty sure that more than 99% of all stalemates in the capturer-vs-random would be won by the capturer if he had avoided stalemate. They are probably mostly positions where you are 3 or 4 Queens ahead.

That sounds right to me. So avoiding stalemate probably would be a good strategy to improve your results.
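
A minimal sketch of the selection rule being discussed: play a mating move if one exists, otherwise pick at random among the moves that do not stalemate the opponent. This is not the code of Symbolic or of any program in this thread. The two predicates `gives_mate` and `gives_stalemate` are hypothetical callbacks that a real move generator would have to supply.

```python
import random

def pick_move(legal_moves, gives_mate, gives_stalemate, rng=None):
    """Choose a move: mate first, then anything that avoids stalemate.

    legal_moves     -- non-empty sequence of moves in any representation
    gives_mate      -- hypothetical predicate: does this move checkmate?
    gives_stalemate -- hypothetical predicate: does this move stalemate?
    rng             -- optional random.Random instance for reproducibility
    """
    if not legal_moves:
        raise ValueError("no legal moves to choose from")
    rng = rng or random.Random()

    # Rule 1: always deliver checkmate when it is available.
    mates = [m for m in legal_moves if gives_mate(m)]
    if mates:
        return rng.choice(mates)

    # Rule 2: skip stalemating moves unless every move stalemates.
    safe = [m for m in legal_moves if not gives_stalemate(m)]
    return rng.choice(safe or list(legal_moves))

# Toy usage with string moves and hard-coded predicates.
moves = ["Qg7", "Qf7", "Kb6"]
print(pick_move(moves, lambda m: m == "Qg7", lambda m: m == "Qf7"))
```

The stalemate test costs the same one-move look-ahead as the mate test, which is the consistency argument made earlier in the thread.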
