Kirill Kryukov wrote:
Uri Blass wrote:
Kirill Kryukov wrote:
Logically it is to be expected, but still it's nice to see some data.
This draw percentage table is constructed for my selection of well-tested free single-CPU engines tested under CCRL 40/4 conditions.
You can scroll around or use the zoom function if your browser has one. Although there is large variation from one engine pair to another, overall it is clear that stronger engines make many more draws.
This is one of the reasons why I am becoming more interested in weaker engines recently - the games are more interesting to watch.
I see that most discussions are about stronger engines. Do you think the quality of games played by stronger engines makes up for the number of draws they make?
Best, Kirill
I think that some weak engines also make more draws when they play against engines of similar strength (mainly weak engines that are unable to detect repetition, so they get a better position only to allow the opponent to draw by repetition).
I think that it is better simply to drop engines that do this from the rating list.
I believe that they are responsible for the fact that the ratings of very weak engines are too high. If an engine plays like 2300 in the middlegame but allows draws by repetition because it has no hash tables, then it may perform almost like 2300 against 2500 players, while the same engine may concede many draws against a 1900 engine and perform like 2100 against it.
In practice the engine may score 25% against the 2500 engine and 75% against the 1900 engine.
If we fix the rating of the 2500 engine at 2500, then the 1900 engine may get a rating near 2100 based on this information alone.
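The arithmetic behind these numbers can be checked with a small sketch. It assumes the standard logistic Elo formula, which may differ in detail from what a rating list actually uses; the function name `performance_rating` and the variable names are made up for illustration.

```python
import math

def performance_rating(opponent_rating, score):
    """Rating at which the logistic Elo formula predicts this score
    against this opponent. Requires 0 < score < 1."""
    return opponent_rating + 400 * math.log10(score / (1 - score))

# The buggy engine as seen from each side (percentages taken from the post).
perf_vs_2500 = performance_rating(2500, 0.25)   # about 2309
perf_vs_1900 = performance_rating(1900, 0.75)   # about 2091

# Anchor the strong engine at 2500, so the buggy engine sits near 2309.
# The weak engine scores 25% against it, which places the weak engine
# about 191 points lower, far above its real 1900.
implied_weak_rating = performance_rating(perf_vs_2500, 0.25)  # about 2118

print(round(perf_vs_2500), round(perf_vs_1900), round(implied_weak_rating))
```

With these inputs the weak engine comes out roughly 200 points too high, which matches the "near 2100" estimate.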
I think that we need a rule that only stable engines that do not often show stupid bugs are allowed into rating lists.
Uri
I think no engine should be punished for having a drawish style. As long as an engine is stable it should be allowed to make any legal moves it likes to make. Whether it prefers draws or not, its rating can still be determined (assuming it has no learning).
BTW, please note that my own testing strategy is to test each engine with both stronger and weaker opponents. (At least 16 closest stronger opponents and 16 closest weaker opponents).
The main problem is that engines with serious bugs distort the rating list.
As an extreme case: if an engine draws all its games against opponents rated 2000-2400 by forcing repetition in better positions, then the ratings of the 2000 players go up and the ratings of the 2400 players go down.
This does not happen with humans, and no human will often force repetition in superior positions the way counter0.1 does (note that counter0.1 is not tested by CCRL, but I am not sure that there are no engines with the same problem).
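A toy calculation shows the size of this effect. It is only a sketch under stated assumptions: the logistic Elo formula, 100 games per pairing, exact expected scores instead of real game results, and a naive fixed-point fit that stands in for whatever rating program is really used. The engine names and the function `fit_ratings` are hypothetical.

```python
def expected(ra, rb):
    """Expected score of a player rated ra against one rated rb."""
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))

def fit_ratings(results, iterations=5000, start=2200.0):
    """Fit ratings to rows of (a, b, games, score_of_a).

    Each pass moves every rating by (actual points - expected points).
    The average rating stays at `start`. A step size of 1 is enough for
    these small inputs; this is not a general-purpose solver.
    """
    names = sorted({n for a, b, _, _ in results for n in (a, b)})
    rating = {n: start for n in names}
    for _ in range(iterations):
        surplus = {n: 0.0 for n in names}
        for a, b, games, score in results:
            diff = games * (score - expected(rating[a], rating[b]))
            surplus[a] += diff
            surplus[b] -= diff
        for n in names:
            rating[n] += surplus[n]
    return rating

# A 2000 engine and a 2400 engine score exactly as Elo predicts.
true_score = expected(2000, 2400)
clean = fit_ratings([("weak", "strong", 100, true_score)])

# Add a hypothetical engine that draws every game against both.
buggy = fit_ratings([
    ("weak", "strong", 100, true_score),
    ("weak", "drawer", 100, 0.5),
    ("strong", "drawer", 100, 0.5),
])

gap_clean = clean["strong"] - clean["weak"]
gap_buggy = buggy["strong"] - buggy["weak"]
print(round(gap_clean), round(gap_buggy))
```

In this toy setup the fitted gap between the two engines shrinks well below the true 400 points once the always-drawing engine is added: the weak engine's rating rises and the strong engine's falls.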
I think that we need engines with no serious bugs to have a reliable rating list for the weak engines, and a good idea may be to use Rybka or other strong programs at fixed depth.
The only problem is how to play games between an engine at fixed depth and engines at a specific time control (it may be interesting to have CCRL ratings for Rybka at different depths, and such games take very little time at small depths).
I believe that Rybka2.3.2a at depth 11 may get a blitz CCRL rating above 2700 while using less than 4 minutes per 40 moves in most cases.
Games at smaller depths can be played even faster and can quickly give reliable ratings for those depths, which may in turn give more reliable ratings for the weak engines.
If the problem of playing fixed depth against a specific time control is solved, then I suggest including
Rybka2.3.2a depth 11
Rybka2.3.2a depth 10
Rybka2.3.2a depth 9
Rybka2.3.2a depth 8
and other strong engines (not Rybka) at fixed depth, in both CCRL 40/4 and CCRL 40/40.
It may also give us better data about the rating difference between 40/4 and 40/40 (of course I expect the same entry at fixed depth to have a lower rating at 40/40, but the question is how much lower).
Uri