Uri Blass wrote: I can add that when I played my opponent, I played many moves that invite a blunder even though they are not objectively best, and of course I did not resign when I saw mate in the next move. But I believe that even in a normal game, at a longer time control and without ever resigning, a player with a FIDE rating of 2000 will usually win: he will play clearly better than I did, and a 1-in-10 probability of a random move, which in a large percentage of cases will be a blunder, is too much.
I think these games with humans who trick engines and know what strategy to adopt are pretty irrelevant for establishing a CCRL rating. Recently Komodo beat Fritz 11 (2850 or so CCRL) in a Knight-odds match, but lost 0-3 against a human FM of about 2100 FIDE ELO who knew how to play these odds matches.
I think you have not played the random% player versions vs. at least one other program, but only against other versions of random%?
Actually I believe with normal players Uri does not mean Humans only, but simply _other_ players with an established (CCRL) rating.
I am sure there will be different results then.
Guenther
Edit:
I replied w/o reading the last posts in this thread it seems, now I see Uri confirmed my belief.
Yes, but I now see Brutus_RND on CCRL 40/4 with 200 rating (Bayeselo, a bit deflated rating), and as I used Ordo for rating, it is compatible with my -100 to -200 rating of Random Player. I don't think of these differences as important for the bulk of the discussion, whether Uri or some weak engines beat Random 10% or not.
CCRL 40/4 is based on buggy engines that sometimes draw against the random engine because of bugs.
The random mover lost 20-0 against Ram2, which has a CCRL rating of 517, whereas it lost only 10.5-9.5 against LaMoSca 0.10, with 19 draws.
I believe that if you use only normal engines you get a clearly lower rating for the random engine. (A normal engine can be any normal program at a small number of nodes, where I define a normal program as one that knows the rules and usually will not make a move that draws the game immediately by the fifty-move rule or by repetition in a better position.)
From a test running since yesterday. NEG 1.2 is a slightly improved version over NEG 0.3d which should avoid more stalemates.
Obviously it still stalemates from time to time, I will add Ram and RuyRandom and together with the games of Daniel get an ordo calculation.
Brutus Rnd has earned most of its rating in CCRL because LaMosca, Ace and POS allowed quite a lot of draws due to bugs otherwise it would be already around -300 in the CCRL scale for 40/4.
BTW I think it is nearly impossible to get a reliable rating for the % random versions, or at least you'll need many more games than for normal rating calculations.
It is completely unpredictable how much a random move loses during a game, if it only happens %-wise.
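For readers unfamiliar with the "Random N%" players discussed here, the idea can be pictured with a minimal sketch. It assumes a plain list of legal moves and a hypothetical `engine_move` argument (the move the underlying engine would have chosen); these names and this interface are illustrative and do not come from any engine in this thread.

```python
import random

def choose_move(legal_moves, engine_move, random_fraction, rng=random):
    """Pick a move for a hypothetical 'Random N%' player.

    With probability `random_fraction` a uniformly random legal move is
    returned; otherwise the engine's own choice is returned.
    """
    if not legal_moves:
        raise ValueError("no legal moves available")
    if rng.random() < random_fraction:
        # The random pick may by chance coincide with the engine's move.
        return rng.choice(legal_moves)
    return engine_move
```

Because the random moves are scattered unpredictably through the game, two games with the same percentage can differ enormously in how much the random moves cost, which is why many games are needed for a stable rating.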
N.E.G. 1.2 uses a very simplistic kludge to avoid stalemates: there is a penalty on capturing the opponent's last minor, when it already has a winning advantage. This enormously reduces the likelihood of stalemating the opponent. But occasionally the minor gets pinned, and then it can run into a stalemate.
N.E.G.'s success in converting large material advantages is caused by its preference for moves that deliver (safe) check with a piece other than the one it last moved.
BTW, N.E.G. is not a 'normal engine'; it has no alpha-beta search. It just counts how many times each square is attacked by each side, and what the lowest attacker is, and uses that to decide if it is safe to capture there or remain on that square.
The problem with the low end of the rating list is that the Elo model completely fails for buggy engines. Losing through illegal moves or crashes can happen irrespective of rating difference, and even when you weed out such games, draws because of failing repetition or stalemate detection, or insufficient appreciation of checkmate, can happen against arbitrarily weak opponents.
It would be better to characterize engines by two numbers: a playing strength, and a 'failure-to-convert probability'. Games such as the Brutus RND game shown by Adam above should not count as a draw, but as a 'failure to convert'. As far as Brutus RND's playing strength is concerned, it should not count as a draw, as that would grossly overrate Brutus RND's performance in this game. So for the rating the game should be ignored (or counted as a win for Iota), and it should increase Iota's failure-to-convert score.
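A rough sketch of how such two-number bookkeeping might look, under stated assumptions: each game record carries a hypothetical `had_won_position` flag (how a "won position" is detected is left open here), and the "ignore the game for rating" variant is used rather than "count it as a win". The `Game` class and `summarize` function are illustrative names only.

```python
from dataclasses import dataclass

@dataclass
class Game:
    result: float           # 1.0 win, 0.5 draw, 0.0 loss (engine's point of view)
    had_won_position: bool  # hypothetical flag: engine reached a clearly won position

def is_failure_to_convert(game):
    """A game counts as a failure to convert if a won position was not won."""
    return game.had_won_position and game.result < 1.0

def summarize(games):
    """Return (score fraction over counted games, failure-to-convert probability)."""
    failures = sum(1 for g in games if is_failure_to_convert(g))
    won_positions = sum(1 for g in games if g.had_won_position)
    # Failure-to-convert games are ignored for the strength estimate.
    counted = [g for g in games if not is_failure_to_convert(g)]
    score = sum(g.result for g in counted) / len(counted) if counted else None
    ftc = failures / won_positions if won_positions else None
    return score, ftc
```

The score fraction would then feed the usual rating calculation, while the failure-to-convert probability is reported separately.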
Uri Blass wrote:
I think that you overestimate the random player.
I think that a player with a rating of 1900 is closer to the perfect player than to the random player.
I do not know how you get 0 elo for the random player, and it seems high to me.
Maybe it is because some weak engines allow stalemates, but I believe that if you take non-buggy engines that do not allow stalemates and play them at fixed depths, then you will get more than 3600 elo difference between depth 1 and depth 20, where depth 1 is clearly more than 400 elo better than the random player, and I believe more than 800 elo better.
No, I tested pretty thoroughly the random player to be at about -100 to -200 CCRL 40/40 ELO points according to Logistic (which is pretty firmly established for engine-engine matches on large ELO span). Look at this thread: http://talkchess.com/forum/viewtopic.ph ... =0&t=62510
There I have a table:
 # PLAYER       :  RATING  POINTS  PLAYED    (%)
 1 Random 0%    :  2697.0   935.0    1000  93.5%
 2 Random 10%   :  2229.8  1033.0    2000  51.6%
 3 Random 20%   :  1632.3   970.0    2000  48.5%
 4 Random 30%   :  1156.3   582.0    2000  29.1%
 5 Random 40%   :  1142.2  1217.0    2000  60.9%
 6 Random 50%   :   961.6  1148.0    2000  57.4%
 7 Random 60%   :   604.0   820.5    2000  41.0%
 8 Random 70%   :   450.9  1097.5    2000  54.9%
 9 Random 80%   :   204.7   872.0    2000  43.6%
10 Random 90%   :    76.6  1115.0    2000  55.8%
11 Random 100%  :  -155.6   210.0    1000  21.0%
given in CCRL 40/40 ELO points. So, in my reply to Sven, I took -100 to -200 for random player, 1700-1800 for strong amateur and Zurichess_00, and 3800-3900 for non-losing from standard opening position player. These are all supported by empirical data.
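For reference, the logistic relation between rating difference and expected score that this argument leans on can be written as a short sketch. This is the standard textbook logistic Elo formula with a 400-point scale, not the internal fitting procedure of Ordo or Bayeselo; the function names are illustrative.

```python
import math

def expected_score(rating_diff):
    """Expected score of the stronger side for a given Elo difference (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))

def rating_diff_from_score(score):
    """Inverse of expected_score: Elo difference implied by a score fraction in (0, 1)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)
```

Under this formula a 400-point gap corresponds to an expected score of about 0.909, and the implied difference grows without bound as the score approaches 100%, which is one reason ratings at the extremes need a chain of intermediate opponents.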
It may be interesting to test not only against random players but against normal engines or humans.
I cannot believe that random 20% can achieve a FIDE rating of 1600 against humans.
It seems to me an engine that I could easily beat at blitz (5 minutes per game); and while I am clearly better than FIDE 1600, I believe my level at blitz is lower than a 1600 FIDE rating at tournament time control.
jumping in here to comment without having read the whole thread, anyway:
- a random mover is a relatively strong engine: it will make the strongest move about once every 30 moves, and often make the 2nd-best, 3rd-best or 10th-best move
- the real deal would be a worst-mover, which would pick only the very worst moves
how much stronger would a random mover be than the worst-mover?
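The "once every 30 moves" figure can be turned into a small back-of-the-envelope calculation. The sketch below assumes a uniformly random choice among about 30 legal moves with a single best move in each position; the 40-move game length is an arbitrary illustrative value, and the function name is hypothetical.

```python
def best_move_stats(legal_moves_per_position=30, game_length=40):
    """Rough odds for a uniformly random mover, assuming one best move per position."""
    p_best = 1.0 / legal_moves_per_position         # chance of the best move on any one turn
    expected_best = game_length * p_best            # expected best moves over the whole game
    p_all_best = p_best ** game_length              # chance of playing the best move every turn
    p_never_best = (1.0 - p_best) ** game_length    # chance of never finding the best move
    return p_best, expected_best, p_all_best, p_never_best
```

Under these assumptions the random mover finds the best move only a little more than once per game on average, so the occasional best move is swamped by the many other choices it makes.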
Uri Blass wrote: I can add that when I played my opponent, I played many moves that invite a blunder even though they are not objectively best, and of course I did not resign when I saw mate in the next move. But I believe that even in a normal game, at a longer time control and without ever resigning, a player with a FIDE rating of 2000 will usually win: he will play clearly better than I did, and a 1-in-10 probability of a random move, which in a large percentage of cases will be a blunder, is too much.
I think these games with humans who trick engines and know what strategy to adopt are pretty irrelevant for establishing a CCRL rating. Recently Komodo beat Fritz 11 (2850 or so CCRL) in a Knight-odds match, but lost 0-3 against a human FM of about 2100 FIDE ELO who knew how to play these odds matches.
Larry has been using much faster TC in this match, and, most importantly, allocated many more cores to Komodo.
why allocate more cores, when you are testing engines?
so, I guess, Larry's results are off by at least 300 elo or so.
jumping in here to comment without having read the whole thread, anyway:
- a random mover is a relatively strong engine: it will make the strongest move about once every 30 moves, and often make the 2nd-best, 3rd-best or 10th-best move
No, it is very weak, because even a random best move doesn't help if you have already blundered a dozen times before...
The random movers in my test scored around 11-13% vs. Andworst, to answer your last question.
In CCRL the RM has a rating of around 200 on the CCRL scale, but only because 3-5 of its opponents were buggy programs that stalemated too often or did not know threefold repetition; otherwise it would be around -200/-300.
Of course all of this was already answered and posted in this thread if you had read it...
Ah! I didn't remember it. Well, the former was random, but not at depth 1 if I remember well. The new one you point also worked at depth 1. And also was stronger.
how much stronger is a random mover than a worst-mover?
how much stronger would SF be than some 2000 elo engine, and this engine in turn than the random mover?
maybe in this case we will need some 10 000 elo scale.
Evert wrote:What's to avoid? It's pretty clear that chess (FIDE rules) is a draw from the initial position with optimal play from both sides.
Unless you rig the opening by having one side play vastly suboptimally, a draw is the expected outcome. If you do unbalance the opening like that, you're not measuring a result, just confirming your input.