Ways to avoid "Draw Death" in Computer Chess

Uri Blass · Post by **Uri Blass** » Sat Jul 29, 2017 10:51 am

Laskos wrote:
Guenther wrote:
Laskos wrote:
Uri Blass wrote:I can add that I played my opponent and played many moves that allow my opponent to blunder even if they are not best objectively and of course did not resign when I saw mate in the next move but I believe even with normal game when I use longer time control and never resign a player with fide rating 2000 will usually win considering the fact that he will play clearly better than what I played and probability of 1 to 10 to random move that in big percentage of the cases may be a blunder is too much.
I think these games with humans who trick engines and know what strategy to adopt are pretty irrelevant for establishing CCRL rating. Recently Komodo has beaten Fritz 11 (2850 or so CCRL) in Knight-odds match, but lost 0-3 against a human FM of about 2100 FIDE ELO who knew how to play these odds matches.
I think you have not played the random% player versions vs. at least one other program, but only against other versions of random%?
Actually I believe with normal players Uri does not mean Humans only, but simply _other_ players with an established (CCRL) rating.
I am sure there will be different results then.

Guenther

Edit:
I replied w/o reading the last posts in this thread it seems, now I see Uri confirmed my belief.
Yes, but I now see Brutus_RND on CCRL 40/4 with 200 rating (Bayeselo, a bit deflated rating), and as I used Ordo for rating, it is compatible with my -100 to -200 rating of Random Player. I don't think of these differences as important for the bulk of the discussion, whether Uri or some weak engines beat Random 10% or not.

The CCRL 40/4 rating is based on buggy engines that sometimes draw against the random engine because of bugs.

The random mover lost 20-0 against Ram2, which has a CCRL rating of 517, whereas it lost only 10.5-9.5 against LaMoSca 0.10, with 19 draws.

I believe that if you use only normal engines you get a clearly lower rating for the random engine. (Normal engines can be any normal program at a small number of nodes, where I define a normal program as one that knows the rules and usually will not make a move that draws the game immediately by the fifty-move rule or by repetition in a better position.)

Guenther · Post by **Guenther** » Sat Jul 29, 2017 12:26 pm

Uri Blass wrote:
Laskos wrote:
Guenther wrote:
Laskos wrote:
Uri Blass wrote:I can add that I played my opponent and played many moves that allow my opponent to blunder even if they are not best objectively and of course did not resign when I saw mate in the next move but I believe even with normal game when I use longer time control and never resign a player with fide rating 2000 will usually win considering the fact that he will play clearly better than what I played and probability of 1 to 10 to random move that in big percentage of the cases may be a blunder is too much.
I think these games with humans who trick engines and know what strategy to adopt are pretty irrelevant for establishing CCRL rating. Recently Komodo has beaten Fritz 11 (2850 or so CCRL) in Knight-odds match, but lost 0-3 against a human FM of about 2100 FIDE ELO who knew how to play these odds matches.
I think you have not played the random% player versions vs. at least one other program, but only against other versions of random%?
Actually I believe with normal players Uri does not mean Humans only, but simply _other_ players with an established (CCRL) rating.
I am sure there will be different results then.

Guenther

Edit:
I replied w/o reading the last posts in this thread it seems, now I see Uri confirmed my belief.
Yes, but I now see Brutus_RND on CCRL 40/4 with 200 rating (Bayeselo, a bit deflated rating), and as I used Ordo for rating, it is compatible with my -100 to -200 rating of Random Player. I don't think of these differences as important for the bulk of the discussion, whether Uri or some weak engines beat Random 10% or not.
The CCRL 40/4 rating is based on buggy engines that sometimes draw against the random engine because of bugs.

The random mover lost 20-0 against Ram2, which has a CCRL rating of 517, whereas it lost only 10.5-9.5 against LaMoSca 0.10, with 19 draws.

I believe that if you use only normal engines you get a clearly lower rating for the random engine. (Normal engines can be any normal program at a small number of nodes, where I define a normal program as one that knows the rules and usually will not make a move that draws the game immediately by the fifty-move rule or by repetition in a better position.)

From a test running since yesterday. NEG 1.2 is a slightly improved version of NEG 0.3d which should avoid more stalemates.
Obviously it still stalemates from time to time. I will add Ram and RuyRandom and, together with Daniel's games, run an Ordo calculation.

Code:

RWBC CAPPUCCINO 2017
                              1            2            3            4            Tot.
1   N.E.G. 1.2                **           96.5 - 3.5   97.0 - 3.0   99.5 - 0.5   293.0/300
2   AndRand_08902-64          3.5 - 96.5   **           50.0 - 50.0  89.0 - 11.0  142.5/300
3   Brutus_RND_01             3.0 - 97.0   50.0 - 50.0  **           87.0 - 13.0  140.0/300
4   Andworst -0.3n (UCI2WB)   0.5 - 99.5   11.0 - 89.0  13.0 - 87.0  **            24.5/300

Brutus Rnd has earned most of its rating in CCRL because LaMosca, Ace and POS allowed quite a lot of draws due to bugs; otherwise it would already be around -300 on the CCRL scale for 40/4.
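
For readers who want to translate the match scores in the crosstable into rating gaps, here is a minimal sketch. It assumes the plain logistic Elo model (the one Laskos refers to earlier in the thread) and treats each pairing as 100 games, which is what the 300-game totals imply. The function name `elo_diff_from_score` is made up for this illustration and is not part of Ordo or Bayeselo, which fit all results jointly and will give somewhat different numbers.

```python
import math


def elo_diff_from_score(points: float, games: float) -> float:
    """Elo difference implied by a score fraction under the logistic model.

    Assumption: expected score = 1 / (1 + 10 ** (-diff / 400)).
    """
    p = points / games
    if not 0.0 < p < 1.0:
        # A 100% or 0% score has no finite Elo difference in this model.
        raise ValueError("score fraction must be strictly between 0 and 1")
    return 400.0 * math.log10(p / (1.0 - p))


# Pairings taken from the crosstable above, 100 games each.
pairings = [
    ("N.E.G. 1.2 vs Brutus_RND_01", 97.0),
    ("Brutus_RND_01 vs Andworst", 87.0),
    ("AndRand_08902-64 vs Brutus_RND_01", 50.0),
]

for name, points in pairings:
    print(f"{name}: {elo_diff_from_score(points, 100):+.0f} Elo")
```

Scores close to 100% or 0% make this estimate very sensitive to a single extra draw, which is one reason the draws caused by buggy opponents matter so much at the bottom of the list.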

BTW I think it is nearly impossible to get a reliable rating for the % random versions, or at least you'll need many more games than for normal rating calculations.
How much a random move loses during a game is completely unpredictable if it only happens %-wise.
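
To give a feeling for that spread, here is a rough sketch. It assumes that a "Random X%" player replaces each of its moves by a random one independently with probability X, and it uses a game length of 60 moves picked purely for illustration. Neither assumption comes from the test setup in this thread, and the helper name `random_move_stats` is hypothetical.

```python
import math


def random_move_stats(p_random: float, moves: int):
    """Mean, standard deviation and zero-probability of the random-move count.

    Assumption: every move is independently random with probability p_random,
    so the count per game follows a binomial distribution.
    """
    mean = moves * p_random
    std = math.sqrt(moves * p_random * (1.0 - p_random))
    p_none = (1.0 - p_random) ** moves  # chance of a game with no random move at all
    return mean, std, p_none


for pct in (10, 20, 50):
    mean, std, p_none = random_move_stats(pct / 100.0, 60)
    print(f"Random {pct}%: {mean:.1f} +/- {std:.1f} random moves per game, "
          f"P(no random move) = {p_none:.4f}")
```

This only counts how many random moves occur. How much each one costs depends on the position, so the real game-to-game variation is larger than the binomial spread shown here.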

Adam Hair · Post by **Adam Hair** » Sat Jul 29, 2017 1:04 pm

Here is one of Brutus RND's draws:

[pgn]
[Event "CCRL 40/4"]
[Site "CCRL"]
[Date "2012.07.08"]
[Round "173.6.487"]
[White "Brutus RND"]
[Black "Iota 1.0 32-bit"]
[Result "1/2-1/2"]
[ECO "A09"]
[Opening "Reti accepted"]
[PlyCount "102"]
[WhiteElo "208"]
[BlackElo "964"]

1. Nf3 d5 2. c4 dxc4 3. g3 e6 4. Nc3 Bd7 5. Nb1 Bc6 6. Bg2 b5 7. Rf1 Bc5 8. d4
cxd3 9. Qd2 dxe2 10. Qd6 exf1=Q+ 11. Bxf1 Bxd6 12. Nfd2 f5 13. Bh3 Qg5 14. g4
fxg4 15. Kd1 gxh3 16. f4 Bxf4 17. b3 Qe5 18. Ba3 Qxa1 19. b4 Bxh2 20. Ke1 Bg3+
21. Kd1 h2 22. Bc1 h1=Q+ 23. Nf1 Qxf1+ 24. Kd2 Qxb1 25. Kc3 Be5+ 26. Kd2 Qxa2+
27. Bb2 Qh3 28. Ke2 Qxb2+ 29. Ke1 Nd7 30. Kd1 Bd5 31. Ke1 g5 32. Kd1 h5 33. Ke1
Ngf6 34. Kd1 Qxb4 35. Ke2 Ne4 36. Kd1 a5 37. Kc2 Qf3 38. Kc1 Qc4+ 39. Kb1 Qa4
40. Kc1 c5 41. Kb1 g4 42. Kc1 g3 43. Kb1 g2 44. Kc1 Bd4 45. Kb1 Ne5 46. Kc1 Ke7
47. Kb1 Qg3 48. Kc1 Qh3 49. Kb1 Qg3 50. Kc1 Qh3 51. Kb1 Qg3 1/2-1/2[/pgn]

IIRC, the rest are similar. The other engines have material superiority but are unable to mate Brutus RND.

hgm · Post by **hgm** » Sat Jul 29, 2017 1:37 pm

N.E.G. 1.2 uses a very simplistic kludge to avoid stalemates: there is a penalty on capturing the opponent's last minor, when it already has a winning advantage. This enormously reduces the likelihood of stalemating the opponent. But occasionally the minor gets pinned, and then it can run into a stalemate.

N.E.G.'s success in converting large material advantages is caused by its preference for moves that deliver (safe) check with another piece than it last moved.

BTW, N.E.G. is not a 'normal engine'; it has no alpha-beta search. It just counts how many times each square is attacked by each side, and what the lowest attacker is, and uses that to decide if it is safe to capture there or remain on that square.
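
As an illustration of what such an attack-count test can look like, here is a toy sketch. It is not N.E.G.'s code: the `SquareInfo` record, the piece values and the decision rule in `is_safe` are all assumptions made up for this example, and the attack counts are supplied by hand rather than computed from a board.

```python
from dataclasses import dataclass


@dataclass
class SquareInfo:
    own_attackers: int          # own pieces covering the square, not counting the mover
    enemy_attackers: int        # enemy pieces attacking the square
    lowest_enemy_attacker: int  # value of the cheapest enemy attacker, 0 if none


# Conventional centipawn values, used here only for the comparison below.
PAWN, KNIGHT, BISHOP, ROOK, QUEEN = 100, 300, 300, 500, 900


def is_safe(piece_value: int, info: SquareInfo) -> bool:
    """Hypothetical rule: may a piece of this value stand on the square?"""
    if info.enemy_attackers == 0:
        return True   # nobody can capture it
    if info.lowest_enemy_attacker < piece_value:
        return False  # a cheaper enemy piece could take it
    # Otherwise require at least as many defenders as attackers.
    return info.own_attackers >= info.enemy_attackers


# A queen on a square attacked by a pawn is unsafe, however well defended.
print(is_safe(QUEEN, SquareInfo(own_attackers=2, enemy_attackers=1, lowest_enemy_attacker=PAWN)))
# A pawn defended once and attacked once by a knight is considered safe.
print(is_safe(PAWN, SquareInfo(own_attackers=1, enemy_attackers=1, lowest_enemy_attacker=KNIGHT)))
```

A rule of this kind needs no search tree at all, which fits the description of an engine without alpha-beta, but it will misjudge pins, discovered attacks and longer exchange sequences.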

The problem with the low end of the rating list is that the Elo model completely fails for buggy engines. Losing through illegal moves or crashes can happen irrespective of rating difference, and even when you weed out such games, draws because of failing repetition or stalemate detection, or insufficient appreciation of checkmate can happen against arbitrarily weak opponents.

It would be better to characterize engines by two numbers: a playing strength, and a 'failure-to-convert probability'. Games such as the Brutus RND game shown by Adam above should not count as a draw, but as a 'failure to convert'. As far as Brutus RND's playing strength is concerned, it should not count as a draw, as that would grossly overrate Brutus RND's performance in this game. So for the rating the game should be ignored (or counted as a win for Iota), and it should increase Iota's failure-to-convert score.
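
A minimal sketch of that two-number bookkeeping could look as follows. It assumes each game already carries a flag saying whether the engine had a decisive advantage; how to set that flag (material count, an adjudicating engine, a tablebase) is left open, and the names `Game`, `was_winning` and `summarize` are hypothetical. Of the two options mentioned, this sketch counts a failed conversion as a win for rating purposes rather than ignoring the game.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Game:
    result: float      # 1.0 win, 0.5 draw, 0.0 loss, from the engine's point of view
    was_winning: bool  # hypothetical flag: the engine had a decisive advantage


def summarize(games: List[Game]) -> Tuple[float, float]:
    """Return (adjusted score fraction, failure-to-convert probability)."""
    winning = [g for g in games if g.was_winning]
    failures = [g for g in winning if g.result == 0.5]
    # For playing strength, a drawn game from a won position counts as a win.
    adjusted = sum(1.0 if (g.was_winning and g.result == 0.5) else g.result
                   for g in games)
    failure_rate = len(failures) / len(winning) if winning else 0.0
    return adjusted / len(games), failure_rate


games = [
    Game(1.0, True),   # converted win
    Game(0.5, True),   # failure to convert, like the Iota game above
    Game(0.5, False),  # ordinary draw
    Game(0.0, False),  # loss
]
score, failure_rate = summarize(games)
print(f"adjusted score {score:.3f}, failure-to-convert {failure_rate:.2f}")
```

The adjusted score can then be fed into any ordinary rating calculation, while the failure rate is reported next to it as a separate property of the engine.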

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Sat Jul 29, 2017 2:42 pm

Uri Blass wrote:
Laskos wrote:
Uri Blass wrote:
I think that you overestimate the random player.
I think that a player with rating 1900 is closer to perfect player relative to random player.

I do not know how you get 0 elo for random player and it seems to me high.
Maybe it is because some weak engines allow stalemates but
I believe that if you take non buggy engines that do not allow stalemates and play them at fixed depths then you will get more than 3600 elo difference between depth 1 and depth 20 when depth 1 is clearly more than 400 elo better than the random player and I believe more than 800 elo better than the random player.
No, I tested pretty thoroughly the random player to be at about -100 to -200 CCRL 40/40 ELO points according to Logistic (which is pretty firmly established for engine-engine matches on large ELO span). Look at this thread:
http://talkchess.com/forum/viewtopic.ph ... =0&t=62510
There I have a table:
Code:
   # PLAYER         : RATING    POINTS  PLAYED    (%)
   1 Random 0%      : 2697.0     935.0    1000   93.5%
   2 Random 10%     : 2229.8    1033.0    2000   51.6%
   3 Random 20%     : 1632.3     970.0    2000   48.5%
   4 Random 30%     : 1156.3     582.0    2000   29.1%
   5 Random 40%     : 1142.2    1217.0    2000   60.9%
   6 Random 50%     :  961.6    1148.0    2000   57.4%
   7 Random 60%     :  604.0     820.5    2000   41.0%
   8 Random 70%     :  450.9    1097.5    2000   54.9%
   9 Random 80%     :  204.7     872.0    2000   43.6%
  10 Random 90%     :   76.6    1115.0    2000   55.8%
  11 Random 100%    : -155.6     210.0    1000   21.0%
given in CCRL 40/40 ELO points. So, in my reply to Sven, I took -100 to -200 for random player, 1700-1800 for strong amateur and Zurichess_00, and 3800-3900 for non-losing from standard opening position player. These are all supported by empirical data.
It may be interesting to test not only against random players but against normal engines or humans.

I cannot believe that random 20% can achieve fide rating of 1600 against humans.

It seems to me an engine that I guess that I can easily win against it
at blitz(5 minutes per game) and when I am clearly better than fide rating 1600 I believe my level at blitz is lower than 1600 fide rating(at tournament time control)

jumping in here to comment without having read the whole thread, anyway:

- a random mover is a relatively strong engine, it will make the strongest move about once every 30 moves, and often make the 2nd-best, 3rd-best or 10th-best move

- the real deal will be a worst-mover, that would pick only the very worst moves

how much stronger would a random-mover be than the worst-mover?

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Sat Jul 29, 2017 2:47 pm

Laskos wrote:
Uri Blass wrote:I can add that I played my opponent and played many moves that allow my opponent to blunder even if they are not best objectively and of course did not resign when I saw mate in the next move but I believe even with normal game when I use longer time control and never resign a player with fide rating 2000 will usually win considering the fact that he will play clearly better than what I played and probability of 1 to 10 to random move that in big percentage of the cases may be a blunder is too much.
I think these games with humans who trick engines and know what strategy to adopt are pretty irrelevant for establishing CCRL rating. Recently Komodo has beaten Fritz 11 (2850 or so CCRL) in Knight-odds match, but lost 0-3 against a human FM of about 2100 FIDE ELO who knew how to play these odds matches.

Larry has been using much faster TC in this match, and, most importantly, allocated many more cores to Komodo.

why allocate more cores, when you are testing engines?

so, I guess, Larry's results are off by at least 300 elo or so.

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Sat Jul 29, 2017 2:52 pm

Adam Hair wrote:Here is one of Brutus RND's draws:

[pgn]
[Event "CCRL 40/4"]
[Site "CCRL"]
[Date "2012.07.08"]
[Round "173.6.487"]
[White "Brutus RND"]
[Black "Iota 1.0 32-bit"]
[Result "1/2-1/2"]
[ECO "A09"]
[Opening "Reti accepted"]
[PlyCount "102"]
[WhiteElo "208"]
[BlackElo "964"]

1. Nf3 d5 2. c4 dxc4 3. g3 e6 4. Nc3 Bd7 5. Nb1 Bc6 6. Bg2 b5 7. Rf1 Bc5 8. d4
cxd3 9. Qd2 dxe2 10. Qd6 exf1=Q+ 11. Bxf1 Bxd6 12. Nfd2 f5 13. Bh3 Qg5 14. g4
fxg4 15. Kd1 gxh3 16. f4 Bxf4 17. b3 Qe5 18. Ba3 Qxa1 19. b4 Bxh2 20. Ke1 Bg3+
21. Kd1 h2 22. Bc1 h1=Q+ 23. Nf1 Qxf1+ 24. Kd2 Qxb1 25. Kc3 Be5+ 26. Kd2 Qxa2+
27. Bb2 Qh3 28. Ke2 Qxb2+ 29. Ke1 Nd7 30. Kd1 Bd5 31. Ke1 g5 32. Kd1 h5 33. Ke1
Ngf6 34. Kd1 Qxb4 35. Ke2 Ne4 36. Kd1 a5 37. Kc2 Qf3 38. Kc1 Qc4+ 39. Kb1 Qa4
40. Kc1 c5 41. Kb1 g4 42. Kc1 g3 43. Kb1 g2 44. Kc1 Bd4 45. Kb1 Ne5 46. Kc1 Ke7
47. Kb1 Qg3 48. Kc1 Qh3 49. Kb1 Qg3 50. Kc1 Qh3 51. Kb1 Qg3 1/2-1/2[/pgn]

IIRC, the rest are similar. The other engines have material superiority but are unable to mate Brutus RND.

they are making too many good moves for random-movers.

I especially like 5.Nb1

I don't know why I have the strange impression everyone has gone mad on this thread...

Guenther · Post by **Guenther** » Sat Jul 29, 2017 2:56 pm

Lyudmil Tsvetkov wrote:
Uri Blass wrote:
Laskos wrote:
Uri Blass wrote:
I think that you overestimate the random player.
I think that a player with rating 1900 is closer to perfect player relative to random player.

I do not know how you get 0 elo for random player and it seems to me high.
Maybe it is because some weak engines allow stalemates but
I believe that if you take non buggy engines that do not allow stalemates and play them at fixed depths then you will get more than 3600 elo difference between depth 1 and depth 20 when depth 1 is clearly more than 400 elo better than the random player and I believe more than 800 elo better than the random player.
No, I tested pretty thoroughly the random player to be at about -100 to -200 CCRL 40/40 ELO points according to Logistic (which is pretty firmly established for engine-engine matches on large ELO span). Look at this thread:
http://talkchess.com/forum/viewtopic.ph ... =0&t=62510
There I have a table:
Code:
   # PLAYER         : RATING    POINTS  PLAYED    (%)
   1 Random 0%      : 2697.0     935.0    1000   93.5%
   2 Random 10%     : 2229.8    1033.0    2000   51.6%
   3 Random 20%     : 1632.3     970.0    2000   48.5%
   4 Random 30%     : 1156.3     582.0    2000   29.1%
   5 Random 40%     : 1142.2    1217.0    2000   60.9%
   6 Random 50%     :  961.6    1148.0    2000   57.4%
   7 Random 60%     :  604.0     820.5    2000   41.0%
   8 Random 70%     :  450.9    1097.5    2000   54.9%
   9 Random 80%     :  204.7     872.0    2000   43.6%
  10 Random 90%     :   76.6    1115.0    2000   55.8%
  11 Random 100%    : -155.6     210.0    1000   21.0%
given in CCRL 40/40 ELO points. So, in my reply to Sven, I took -100 to -200 for random player, 1700-1800 for strong amateur and Zurichess_00, and 3800-3900 for non-losing from standard opening position player. These are all supported by empirical data.
It may be interesting to test not only against random players but against normal engines or humans.

I cannot believe that random 20% can achieve fide rating of 1600 against humans.

It seems to me an engine that I guess that I can easily win against it
at blitz(5 minutes per game) and when I am clearly better than fide rating 1600 I believe my level at blitz is lower than 1600 fide rating(at tournament time control)
jumping in here to comment without having read the whole thread, anyway:

- a random mover is a relatively strong engine, it will make the strongest move about once every 30 moves, and often make the 2nd-best, 3rd-best or 10th-best move

No, it is very weak, because even a random best move doesn't help if you have already blundered a dozen times before...

To answer your last question: the random movers in my test scored around 11-13% vs. Andworst.
In CCRL the RM has a rating of around 200 on the CCRL scale, but only because 3-5 of its opponents were buggy programs which stalemated too often or did not detect threefold repetition;
otherwise it would be around -200/-300.

Of course all of this was already answered and posted in this thread if you had read it...

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Sat Jul 29, 2017 2:57 pm

cdani wrote:
Guenther wrote: IIRC that was only your first attempt, which was not truly random?
That should be the one:
http://www.andscacs.com/randscacs089025.zip
Ah! I didn't remember it. Well, the former was random, but not at depth 1 if I remember well. The new one you point also worked at depth 1. And also was stronger.

how much stronger is a random-mover than a worst-mover?

how much stronger would SF be than some 2000 elo engine, and this engine in turn than the random-mover?

maybe in this case we will need some 10 000 elo scale.

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Sat Jul 29, 2017 3:01 pm

Evert wrote:What's to avoid? It's pretty clear that chess (FIDE rules) is a draw from the initial position with optimal play from both sides.
Unless you rig the opening by having one side play vastly suboptimal, a draw is the expected outcome. If you do unbalance the opening like that, you're not measuring a result, just confirming your input.

Or am I missing something?

1.c4 wins.
