Examples for engines that are relatively better at LTC

Uri Blass · Post by **Uri Blass** » Wed Jan 08, 2025 6:26 am

From the CCRL 40/15 rating list
https://computerchess.org.uk/ccrl/4040/ ... t_all.html

Movei 00.8.438 (10 10 10) 2661 +9 −9 49.4% +0.1 36.5% 3082
Colossus 2008b 2638 +11 −11 49.2% +3.5 38.6% 2188

2661-9 > 2638+11, so the conclusion is that Movei is stronger at 40/15.

From the CCRL 2+1 blitz rating list
https://computerchess.org.uk/ccrl/404/r ... t_all.html

Colossus 2008b 2632 +27 −27 50.2% −2.9 27.6% 428
Movei 00.8.438 (10 10 10) 2567 +30 −30 44.8% +41.4 26.3% 335

2632-27 > 2567+30, so the conclusion is that Colossus 2008b is stronger than Movei at 2+1.

I wonder how many pairs you can find and if somebody can find all the pairs.
Unfortunately, CCRL does not even test the same engines at blitz and at long time control. At the top I can see 8-CPU entries leading the blitz list while 4-CPU entries lead the long time control list, so finding the common engines manually is not an easy task.

Can somebody write software that lists only the engines that appear in both the CCRL blitz list and the CCRL long time control list, and then finds all pairs of engines where A is stronger than B at long time control and weaker than B at blitz?
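
A minimal sketch of such a tool, under stated assumptions: the two lists are already available as dictionaries (downloading and parsing the CCRL pages is not shown), the names `ltc`, `blitz`, `clearly_stronger`, and `reversed_pairs` are hypothetical, and "stronger" uses the same comparison as above, where the lower bound of one engine must exceed the upper bound of the other. The sample numbers are the rows quoted in this thread, which come from different posts and may not be a single snapshot of the lists.

```python
from itertools import permutations

# name -> (rating, plus_margin, minus_margin); values copied from rows quoted in this thread
ltc = {
    "Movei 00.8.438 (10 10 10)": (2661, 9, 9),
    "Colossus 2008b": (2638, 11, 11),
    "Colossus 2021b 64-bit": (2759, 16, 16),
    "Bright 0.5c": (2786, 10, 10),
    "Coiled 1.1 (no NNUE) 64-bit": (2796, 14, 14),
}
blitz = {
    "Movei 00.8.438 (10 10 10)": (2567, 30, 30),
    "Colossus 2008b": (2632, 27, 27),
    "Colossus 2021b 64-bit": (2801, 14, 14),
    "Bright 0.5c": (2749, 17, 17),
    "Coiled 1.1 (no NNUE) 64-bit": (2727, 14, 14),
}


def clearly_stronger(a, b):
    """True if a's rating minus its minus-margin exceeds b's rating plus its plus-margin."""
    rating_a, _, minus_a = a
    rating_b, plus_b, _ = b
    return rating_a - minus_a > rating_b + plus_b


def reversed_pairs(ltc, blitz):
    """Return (A, B) pairs where A is clearly stronger at LTC and B is clearly stronger at blitz."""
    # Only engines present in both lists can be compared.
    common = sorted(ltc.keys() & blitz.keys())
    return [
        (a, b)
        for a, b in permutations(common, 2)
        if clearly_stronger(ltc[a], ltc[b]) and clearly_stronger(blitz[b], blitz[a])
    ]


for a, b in reversed_pairs(ltc, blitz):
    print(f"{a} > {b} at LTC, but {b} > {a} at blitz")
```

Matching engines by exact name is the weak point of this sketch: the two lists may label the same engine differently (CPU count, 64-bit suffix), so a real tool would probably need a name-normalisation step before taking the intersection.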

I wonder how many pairs you can find.

Note that Movei (my own engine) was not designed to do better at long time control when I stopped its development many years ago, and I was surprised by the results.
In the past I tested Movei against significantly stronger engines and found that it needs a bigger time handicap to get 50% at long time control. I am sure that a weak engine that is based on a strong engine but runs 100 times slower is going to do relatively better at long time control, but weak engines usually do not work that way.

Graham Banks · Post by **Graham Banks** » Thu Jan 09, 2025 11:04 am

Colossus 2024a 64-bit has many more games running, so we'll see if that has any effect. It may not, but we'll find out soon enough.

The 40/15 and Blitz ratings are calculated separately, not from the same database.

Uri Blass · Post by **Uri Blass** » Thu Jan 09, 2025 11:55 am

Graham Banks wrote: ↑Thu Jan 09, 2025 11:04 am Colossus 2024a 64-bit has many more games running, so we'll see if that has any effect. It may not, but we'll find out soon enough.

The 40/15 and Blitz ratings are calculated separately, not from the same database.

My earlier comparison used an old version of Colossus.
Note that I did not compare ratings across different lists, only the order of engines within each list. There are not many cases where A is stronger than B at blitz with a high level of confidence while the opposite holds at long time control with a high level of confidence. (A high level of confidence means that a-b > c+d, where a is the higher rating, b is the lower rating, and c and d are the possible errors given in the list.)
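
The confidence test can be written out explicitly. The symbols are illustrative notation, not taken from CCRL: $R_A$ and $R_B$ are the list ratings of the higher-rated and lower-rated engine, $e_A^{-}$ is the minus margin printed next to A, and $e_B^{+}$ is the plus margin printed next to B.

```latex
A \text{ is stronger than } B \text{ with high confidence}
\iff R_A - e_A^{-} > R_B + e_B^{+}
\iff R_A - R_B > e_A^{-} + e_B^{+}
```

For the Movei and Colossus 2008b long time control rows quoted earlier in the thread, this gives 2661 - 9 = 2652 > 2649 = 2638 + 11, so the test passes by a margin of 3 points.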

For Colossus, the new versions are of course better than the old one, but it is not clear that there is an improvement from 2021b to 2024a.
In blitz, 2021b seems stronger:
Colossus 2021b 64-bit 2801 +14 −14 48.7% +9.6 30.0% 1609
Colossus 2024a 64-bit 2767 +19 −19 54.7% −39.5 20.0% 893

2801-14=2787>2767+19=2786

At long time control there are not enough games yet, and we even have 2771-30 < 2759, so it is not clear whether 2024a is stronger.

Colossus 2024a 64-bit 2771 +30 −30 45.4% +37.5 28.7% 296
Colossus 2021b 64-bit 2759 +16 −16 49.3% +6.6 33.7% 1012

Uri Blass · Post by **Uri Blass** » Thu Jan 09, 2025 12:08 pm

Colossus 2021b is clearly stronger than Bright 0.5c or Bright 0.4a at blitz:

Colossus 2021b 64-bit 2801 +14 −14 48.7% +9.6 30.0% 1609
Bright 0.5c 2749 +17 −17 45.6% +34.8 26.1% 1158
Bright 0.4a 2742 +17 −17 49.5% +3.6 26.3% 1145

Colossus 2021b 64-bit>=2801-14=2787
Bright 0.5c<=2749+17=2766

If I look at long time control then it seems to be the opposite
Bright 0.5c 2786 +10 −10 46.7% +22.8 40.9% 2225
Bright 0.4a 2783 +8 −8 46.6% +25.3 38.4% 3668
Colossus 2021b 64-bit 2759 +16 −16 49.3% +6.6 33.7% 1012

Bright 0.5c>=2786-10=2776
Colossus 2021b 64-bit<=2759+16=2775

Uri Blass · Post by **Uri Blass** » Thu Jan 09, 2025 4:10 pm

Another example that is maybe better

blitz
Colossus 2021b 64-bit 2801 +14 −14 48.7% +9.6 30.0% 1609
Coiled 1.2 (no NNUE) 64-bit 2728 +15 −15 50.8% −7.1 23.1% 1464
Coiled 1.1 (no NNUE) 64-bit 2727 +14 −14 53.6% −26.9 26.7% 1690

Colossus 2021b 64-bit>=2787>2741>=Coiled 1.1 (no NNUE) 64-bit

long time control
Coiled 1.1 (no NNUE) 64-bit 2796 +14 −14 49.8% +1.9 33.7% 1303
Coiled 1.2 (no NNUE) 64-bit 2783 +14 −14 48.3% +15.1 32.0% 1321
Colossus 2021b 64-bit 2759 +16 −16 49.3% +6.6 33.7% 1012

Coiled 1.1 (no NNUE) 64-bit>=2782>2775>=Colossus 2021b 64-bit

Modern Times · Post by **Modern Times** » Fri Jan 10, 2025 7:58 am

I'd suggest you run some tightly controlled tests to verify what you are seeing. I'd be wary of drawing conclusions from those lists directly - as Graham says, they are different rating pools, and I'd add that there is also a lot of noise in the lists, probably with different books used, etc.
