Elo differences on 40/40 vs 40/4 CCRL lists? Human play?

JayRod · Post by **JayRod** » Wed Sep 03, 2014 10:31 am

Compare this list, which is 40 moves in 40 minutes (40/40):
http://computerchess.org.uk/ccrl/4040/

to this one, which is 40 moves in 4 minutes (40/4):
http://www.computerchess.org.uk/ccrl/404/

Why don't the engines at the longer time control have a much higher Elo than the same engines at the shorter time control, especially the top five in each list? I understand that Elo is relative to the pool you are in, but you would think that with more time the engines could seek the "truth" of a position better and avoid losing games, so the Elos should be compressed a bit more and converge on some number. That is not the case, and strangely you also see some lower-rated engines perform weaker than their peers at longer time controls and vice versa. See for example the Hermann and Joker programs, which flip-flop in Elo between the two lists.

I speculate it is because certain engines are written to optimize their blitz performance, while others simply use the same algorithm regardless of time control.

Put another way: if there were no optimization for blitz, every engine should have the same Elo in both lists, regardless of time control (time would be irrelevant).

Finally, though this is a topic for another thread: if anybody knows of a list that correlates human Elo with machine Elo (so you can see what Stockfish 5's rating of 3369 equates to in human play), please post it here. I suspect there is no such list because the sample size is too small; only a handful of grandmasters have played these engines.

JayRod

Uri Blass · Post by **Uri Blass** » Wed Sep 03, 2014 10:58 am

JayRod wrote:Compare this list, which is 40 moves in 40 minutes (40/40):
http://computerchess.org.uk/ccrl/4040/

to this one, which is 40 moves in 4 minutes (40/4):
http://www.computerchess.org.uk/ccrl/404/

Why do the engines in the longer time control not have a much higher Elo than the same engine in the shorter time control? Especially the top five in each list. I understand that Elo is relative to the pool you are in, but you would think that with more time the engines could seek the "truth" of a position better and avoid losing games, hence there should be compression of the Elos a bit more so you should have Elos converge about some number. That is not the case, and strangely you also see some lower rated engines actually perform weaker than their peers at longer time controls and vice versa. See for example the Hermann program and the Joker program in both lists, which flip-flop in Elo.

I speculate it is because certain engines are written to optimize their blitz performance, while others simply use the same algorithm regardless of time control.

Put another way: if there was no optimization for blitz, then every engine should have the same Elo in the two lists, regardless of time control (time would be irrelevant).

Finally, though this is a topic for another thread, if anybody knows of a list that correlates human Elo play and machine Elo play (so you can see what 3369 that Stockfish 5 scores equates to human play) please post here. I think there's no such list since the sample size is too small, only a handful of grandmasters have played these engines.

JayRod

I do not see why all engines should have the same Elo in the two lists if there is no optimization for blitz.

Some things help more at blitz than at long time control (for example, making the program faster).
Other things help relatively more at long time control (for example, better move ordering, which in practice can make the program 10% faster at blitz but 20% faster at long time control).

I do not know how you can develop without optimizing for blitz,
unless you do not test your program at all, because you cannot get enough non-blitz games for a significant result.

The Stockfish team accepts only changes that are productive at 1 minute per game, so in practice they optimize their program to be stronger at bullet, but it helps at all time controls.

I also do not see the flip-flop between Joker and Hermann:

long time control
Hermann 2.8 64-bit 2516 +21 −21
198‑200 Joker 1.1.14 2294 +22 −22

blitz
Hermann 2.8 64-bit 2530 +11 −11 47.1%
195 Joker 1.1.14 2306 +14 −14 46.6%
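
To make the comparison explicit, here is a minimal Python sketch that uses only the four ratings and error bars quoted above. Combining the two error bars in quadrature is an assumption made for illustration (it treats the two ratings as independent, which rating-list error bars are not exactly).

```python
import math

# (rating, error bar) copied from the two lists quoted above
long_tc = {"Hermann": (2516, 21), "Joker": (2294, 22)}
blitz = {"Hermann": (2530, 11), "Joker": (2306, 14)}

def gap(ratings):
    """Return (Hermann minus Joker, combined error bar) for one list."""
    (r_h, e_h), (r_j, e_j) = ratings["Hermann"], ratings["Joker"]
    # Assumption: the two error bars are independent, so add them in quadrature.
    return r_h - r_j, math.hypot(e_h, e_j)

long_gap, long_err = gap(long_tc)
blitz_gap, blitz_err = gap(blitz)
print(f"40/40: {long_gap} +/- {long_err:.0f}")   # 222 +/- 30
print(f"40/4:  {blitz_gap} +/- {blitz_err:.0f}")  # 224 +/- 18
```

The gap between the two programs is 222 points in one list and 224 in the other, so their order and distance are essentially unchanged.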

Edit: I can add that compression of the Elo does seem to exist (Stockfish and the top programs have higher ratings at blitz). The reason there is not more compression is that weaker programs usually gain less from extra time, meaning that the weaker program needs a bigger speed advantage to score 50% at longer time control.

Usually part of the strength of the strong programs is simply that they gain more from time.

I remember that when I tested Movei against weaker programs, I found that Movei could lose to them with a 5:1 time disadvantage at fast time control but could win with the same time disadvantage at slower time control.

When I tested against significantly stronger programs I found the opposite: Movei could win with a 10:1 time advantage at fast time control but lost at slower time control with the same time advantage.

JayRod · Post by **JayRod** » Wed Sep 03, 2014 1:27 pm

OK thanks Uri Blass, I was mistaken about the flip-flop, and indeed it seems that the strong programs are optimized for bullet, which as you say mostly carries over into improvements at longer time.

JayRod

JayRod · Post by **JayRod** » Wed Sep 03, 2014 4:30 pm

JayRod wrote:OK thanks Uri Blass, I was mistaken about the flip-flop, and indeed it seems that the strong programs are optimized for bullet, which as you say mostly carries over into improvements at longer time.

JayRod

Actually I found a program that flip-flops: "Mustang 4.97" has an Elo of 2025 at 40/40 but a lower Elo, 1940, at 40/4. It might be due to the smaller number of games played at the shorter time control: 484 games, versus 2045 at the longer time control.
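
As a rough check on how much the number of games matters, here is a minimal Python sketch of the sampling margin on a rating. It rests on assumptions that are not from the lists themselves: a score near 50%, games counted as win/loss only (draws would shrink the margin), and the standard logistic Elo curve. The function name `elo_margin` is made up for this example.

```python
import math

def elo_margin(games, score=0.5, z=1.96):
    """Approximate 95% margin (in Elo) on a rating estimated from `games` games."""
    # Standard error of the mean score, treating every game as a win or a loss.
    se_score = math.sqrt(score * (1.0 - score) / games)
    # Slope of the logistic Elo curve, d(Elo)/d(score), at this score.
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    return z * se_score * slope

for n in (484, 2045):
    print(n, round(elo_margin(n)))
```

Under these assumptions the margin is roughly 31 Elo at 484 games and 15 Elo at 2045 games. The sketch ignores that the two lists use different opponent pools, so it only bounds the pure sampling noise.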

cdani · Post by **cdani** » Wed Sep 03, 2014 11:55 pm

Hi!
With Andscacs there is a huge difference in Elo depending on the time control.

Example

50+0.04

 # Name                   Elo    +    - Games Score Oppo. Draws
 1 Gull 1.2 x64           193   29   28   522   84%   -82   18%
 2 Gaviota v1.0           127   27   26   524   77%   -82   18%
 3 cheng4 0.36c            47   24   24   521   68%   -82   24%
 4 Critter 0.52b 64-bit    39   24   24   524   67%   -82   22%
 5 Nirvanachess 1.6        24   24   24   522   65%   -82   24%
 6 Naraku 1.4               5   25   24   524   62%   -82   18%
 7 Spike 1.2 Turin        -67   23   23   523   52%   -82   27%
 8 Andscacs 0.64147       -82    9    9  4708   38%     9   22%
 9 Atlas  3.60  x64       -82   23   23   524   50%   -82   28%
10 Philou 3.7.1 64 bits  -203   24   24   524   33%   -82   22%

300+2

 # Name                   Elo    +    - Games Score Oppo. Draws
 1 Gull 1.2 x64           160   51   47   135   78%   -41   27%
 2 Gaviota v1.0           132   49   46   136   75%   -41   27%
 3 Nirvanachess 1.6        73   47   46   132   67%   -41   29%
 4 cheng4 0.36c            41   47   46   130   62%   -41   25%
 5 Naraku 1.4              13   46   46   140   57%   -41   19%
 6 Critter 0.52b 64-bit   -12   46   45   140   54%   -41   21%
 7 Andscacs 0.64147       -41   16   16  1227   44%     3   27%
 8 Spike 1.2 Turin        -48   43   43   136   49%   -41   36%
 9 Atlas  3.60  x64      -108   44   44   138   40%   -41   33%
10 Philou 3.7.1 64 bits  -210   46   49   140   27%   -41   23%

This same version, at 10 seconds, is at -100 or less against the same opponents.

This is because I found some improvements that did not work at 50+0.04 but worked well at 300+2.

Most people don’t try at longer time controls when the change is bad at short time controls. I do mostly the same, unless there is something that attracts my attention.
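
For readers who want to translate the score percentages in the two tables into a rating gap, here is a minimal Python sketch using the standard logistic Elo formula. The helper name `elo_from_score` is made up for this example, and the figures in the tables come from the rating tool's own model, so small differences from this formula are expected.

```python
import math

def elo_from_score(score):
    """Elo difference implied by a fractional score under the logistic Elo model."""
    return -400.0 * math.log10(1.0 / score - 1.0)

# Andscacs' overall score in each table above
scores = {"50+0.04": 0.38, "300+2": 0.44}
for tc, s in scores.items():
    print(tc, round(elo_from_score(s)))
```

This gives about -85 at 50+0.04 and about -42 at 300+2 relative to the average opponent, which is in line with the gain of roughly 40 Elo between the two tables.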

Ferdy · Post by **Ferdy** » Thu Sep 04, 2014 7:04 am

JayRod wrote: [...]

Finally, though this is a topic for another thread, if anybody knows of a list that correlates human Elo play and machine Elo play (so you can see what 3369 that Stockfish 5 scores equates to human play) please post here. I think there's no such list since the sample size is too small, only a handful of grandmasters have played these engines.
JayRod

There is a very interesting thread regarding engine strength and human elo.

http://talkchess.com/forum/viewtopic.ph ... mputer+elo

hgm · Post by **hgm** » Thu Sep 04, 2014 10:58 am

It is not just a matter of optimization. Some programs gain more Elo per search-time doubling than others, because they have a lower effective branching ratio (e.g. better move ordering or smarter pruning). So the same time doubling gives them more effective ply look-ahead. A program with a clever search thus should profit more from long TC than programs with a dumb search. Flip-flops are not more common because programs with dumb search also tend to have lower Elo overall. So if you look at programs that are close to each other in the list, at (say) pretty low Elo, they are likely to have similarly dumb searches, and thus also profit about equally from slower TC. Only if two programs are about equal, but one derives its strength from a combination of dumb search and elaborate eval while the other has smart search but simplistic eval, could you expect different behavior.
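
The relation between branching ratio and depth can be sketched in a few lines of Python. This assumes the usual simplification that search time grows as the effective branching ratio raised to the depth; the two ratios used (2 and 4) are hypothetical values chosen for illustration, not measurements of any program.

```python
import math

def extra_plies(time_factor, ebf):
    """Extra search depth gained from `time_factor` times more time.

    Assumes time ~ ebf ** depth, so depth gain = log(time_factor) / log(ebf).
    """
    return math.log(time_factor) / math.log(ebf)

# Hypothetical effective branching ratios: a "smart" and a "dumb" search.
for ebf in (2.0, 4.0):
    per_doubling = extra_plies(2, ebf)
    per_tenfold = extra_plies(10, ebf)  # e.g. going from 40/4 to 40/40
    print(f"EBF {ebf}: {per_doubling:.2f} plies per doubling, {per_tenfold:.2f} per 10x")
```

With these made-up numbers the search with ratio 2 gains about 3.3 plies from ten times more time, while the one with ratio 4 gains about 1.7.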

Uri Blass · Post by **Uri Blass** » Thu Sep 04, 2014 12:45 pm

hgm wrote:It is not just a matter of optimization. Some programs gain more Elo per search-time doubling than others, because they have a lower effective branching ratio (e.g. better move ordering or smarter pruning). So the same time doubling gives them more effective ply look-ahead. A program with a clever search thus should profit more from long TC than programs with a dumb search. Flip-flops are not more common because programs with dumb search also tend to have lower Elo overall. So if you look at programs that are close to each other in the list, at (say) pretty low Elo, they are likely to have similarly dumb searches, and thus also profit about equally from slower TC. Only if two programs are about equal, but one derives its strength from a combination of dumb search and elaborate eval while the other has smart search but simplistic eval, could you expect different behavior.

I believe that being relatively better at long time control can also result from the evaluation and not only from the search, so smart search with simplistic evaluation is not always better at long time control than dumb search with elaborate evaluation.

I remember reading that one program (I think some old Glaurung, but I am not sure) did relatively worse at long time control than programs of similar strength, and the problem disappeared when the author added king safety to the evaluation (if I remember correctly it was not a pawn shield but simply a bonus for attacking squares near the opponent's king).
