Engine ELO estimation

Necromancer · Post by **Necromancer** » Wed Apr 02, 2025 10:22 am

After some years, I'm back to improving my humble engine, Tunguska. I'm implementing NNUE, with interesting results, but now I'm really confused about something.

It is listed on CCRL 40/15 (https://computerchess.org.uk/ccrl/4040/index.html) at 2417 ±15 Elo. When running the STS tool (https://www.talkchess.com/forum/viewtopic.php?t=56653) with the default movetime of 500 ms, I got 2435, very close to the listed CCRL 40/15 value.

Now, when running the tool with movetime = 5 s (5000 ms), I got an estimated Elo of about 2958:
STS_Rating_v3 -f "STS1-STS15_LAN.EPD" -e "D:\cpp_projs\Tunguska1_1\bin\x64\Release\Tunguska1_1_old.exe" --proto uci --movetime 5000

STS ID    STS1  STS2  STS3  STS4  STS5  STS6  STS7  STS8  STS9 STS10 STS11 STS12 STS13 STS14 STS15   ALL
NumPos     100   100   100   100   100   100   100   100   100   100   100   100   100   100   100  1500
BestCnt     64    55    70    62    61    56    57    45    52    79    60    59    71    68    40   899
Score      719   671   793   733   722   765   679   606   634   859   736   682   781   783   616 10779
Score(%)  71.9  67.1  79.3  73.3  72.2  76.5  67.9  60.6  63.4  85.9  73.6  68.2  78.1  78.3  61.6  71.9

STS rating: 2956

The calculation is:
Elo = 44.523 * mean(Score %) - 242.85
Elo = 44.523 * 71.9 - 242.85
Elo ≈ 2958
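To check the arithmetic, here is a small sketch that recomputes the rating from the per-theme scores using the linear formula quoted above. It assumes each theme's maximum is 1000 points (100 positions at up to 10 points each), which is inferred from 719 → 71.9% in the table. The unrounded mean is 71.86%, which gives about 2956.6; using the rounded 71.9% gives about 2958.4. So the tool's 2956 and the hand-calculated 2958 differ only by rounding (how exactly the tool rounds is an assumption).

```python
# Recompute the STS rating from the per-theme scores in the table, using the
# formula quoted in the post:  Elo = 44.523 * mean(Score %) - 242.85
# Assumption: max 10 points per position, i.e. 1000 points per 100-position theme.

SLOPE = 44.523
INTERCEPT = -242.85
MAX_PER_THEME = 1000

scores = [719, 671, 793, 733, 722, 765, 679, 606, 634, 859, 736, 682, 781, 783, 616]


def sts_elo(theme_scores, max_per_theme=MAX_PER_THEME):
    """Return (mean score percentage, estimated Elo) for a list of theme scores."""
    pct = [100.0 * s / max_per_theme for s in theme_scores]
    mean_pct = sum(pct) / len(pct)
    return mean_pct, SLOPE * mean_pct + INTERCEPT


mean_pct, elo = sts_elo(scores)
print(f"unrounded: {mean_pct:.2f}% -> Elo ~ {elo:.1f}")
# Plugging in the rounded 71.9% reproduces the hand calculation instead.
print(f"rounded:   71.90% -> Elo ~ {SLOPE * 71.9 + INTERCEPT:.1f}")
```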

How can I explain this? The tool appears to be very reliable, so is my time management code really that bad? It is a very naive time management: just take the clock time and divide it by movesToGo. The engine respects it when using Arena, so 15 min * 60 s / 40 moves gives about 22.5 s per move. The code is here if someone wants to check it out: https://github.com/fernandotenorio/Tunguska.
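For reference, a minimal sketch of the "clock time / movesToGo" allocation described above, with two guards that matter at bullet speeds. This is not Tunguska's code: the function name `allocate_ms`, the fallback of 30 moves when no movestogo is sent, and the 50 ms overhead margin are all hypothetical choices. The parameter names mirror the UCI `go` fields (wtime/btime, winc/binc, movestogo, movetime).

```python
from typing import Optional

DEFAULT_MOVES_TO_GO = 30  # hypothetical fallback when the GUI sends no movestogo
MOVE_OVERHEAD_MS = 50     # hypothetical safety margin for GUI/OS latency


def allocate_ms(time_left_ms: int,
                inc_ms: int = 0,
                moves_to_go: Optional[int] = None,
                movetime_ms: Optional[int] = None) -> int:
    """Return how many milliseconds to spend on the current move."""
    if movetime_ms is not None:
        # "go movetime N": the budget is fixed by the GUI; keep a small margin.
        return max(1, movetime_ms - MOVE_OVERHEAD_MS)
    mtg = moves_to_go if moves_to_go else DEFAULT_MOVES_TO_GO
    budget = time_left_ms // mtg + inc_ms
    # Never plan to use more than the clock actually holds (minus the margin),
    # and never return zero or a negative budget.
    return max(1, min(budget, time_left_ms - MOVE_OVERHEAD_MS))
```

With 15 minutes and 40 moves to go, `allocate_ms(900_000, moves_to_go=40)` returns 22500 ms, matching the ~22.5 s per move above; at 200 ms per position the margin is what keeps the engine from overstepping.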

Note: I wrote this 10 years ago, when I was not a very experienced programmer, let alone a C++ one.
Thanks!

smatovic · Post by **smatovic** » Wed Apr 02, 2025 2:04 pm

The CCRL (Blitz?) ratings had an adjustment, and it seems the STS Elo tool is now ~100 Elo points off. Furthermore, the formula is based on 0.2 seconds per position, which can cause trouble with engines without Bullet-proof time management:

https://www.talkchess.com/forum/viewtop ... 25#p628025

CCRL Blitz
https://computerchess.org.uk/ccrl/404/

CCRL Blitz before adjustment
https://computerchess.org.uk/ccrl/402.archive/

Disclaimer: people in the Discord channels will probably tell you to use proper SPRT testing; IMHO, STS can be of some use too.

https://www.chessprogramming.org/Sequen ... Ratio_Test
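As an illustration of what SPRT does, here is a simplified sketch: a trinomial (win/draw/loss) test using a normal approximation of the game score under the logistic Elo model. This is an assumption-laden toy, not what fishtest or OpenBench compute (they use more refined models, e.g. pentanomial statistics over game pairs). The bounds `elo0=0`, `elo1=10` and `alpha=beta=0.05` are hypothetical defaults.

```python
import math


def elo_to_score(elo: float) -> float:
    """Expected score for an Elo difference under the logistic model."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def sprt_llr(wins: int, draws: int, losses: int, elo0: float, elo1: float) -> float:
    """Approximate log-likelihood ratio of H1 (Elo = elo1) vs H0 (Elo = elo0)."""
    n = wins + draws + losses
    if n == 0:
        return 0.0
    mean = (wins + 0.5 * draws) / n
    var = (wins * (1.0 - mean) ** 2 + draws * (0.5 - mean) ** 2
           + losses * (0.0 - mean) ** 2) / n
    if var == 0.0:
        return 0.0  # degenerate sample (e.g. all wins): no variance estimate yet
    s0, s1 = elo_to_score(elo0), elo_to_score(elo1)
    # The sample mean is ~N(s, var/n); log N(mean; s1) - log N(mean; s0):
    return n * (s1 - s0) * (2.0 * mean - s0 - s1) / (2.0 * var)


def sprt_decision(wins, draws, losses, elo0=0.0, elo1=10.0, alpha=0.05, beta=0.05):
    """Return ("accept H1" | "accept H0" | "continue", llr)."""
    llr = sprt_llr(wins, draws, losses, elo0, elo1)
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    if llr >= upper:
        return "accept H1", llr
    if llr <= lower:
        return "accept H0", llr
    return "continue", llr
```

In use, you would play the new build against the old one and call `sprt_decision` after each batch of games, stopping as soon as it no longer returns "continue".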

--
Srdja

Necromancer · Post by **Necromancer** » Wed Apr 02, 2025 10:56 pm

Thanks for the SPRT links, I'll check this.
Can you elaborate a bit on what you think is happening? I don't follow what you wrote.

Thanks!

smatovic · Post by **smatovic** » Thu Apr 03, 2025 8:27 am

1. Make sure your time management can handle Bullet with 200 ms per move.
2. Use STS tool with 200 ms per position.
3. In your case the STS tool is +72 Elo off: Tunguska 1.1's old rating was 2445 and its new rating is 2373, because the CCRL Blitz rating formula was changed.
4. Use SPRT for proper testing during engine development, and the STS tool only to get a rough rating estimate (for regression tests there are updated STS versions: https://github.com/fsmosca/STS-Rating).

--
Srdja

Necromancer · Post by **Necromancer** » Thu Apr 03, 2025 5:24 pm

I see the issue with fast time controls, but I still don't understand why it is rated around 2400 on the CCRL 40/15 page when the STS test says it is stronger at longer time controls.

smatovic · Post by **smatovic** » Thu Apr 03, 2025 6:15 pm

Because the STS tool's formula (with the LAN3 set) is based on 0.2 s per position and on CCRL Blitz ratings?

https://www.talkchess.com/forum/viewtop ... 25#p628025

--
Srdja

smatovic · Post by **smatovic** » Thu Apr 03, 2025 8:18 pm

Hmm, after pondering a bit about the numbers you posted and the formula, I cannot explain the Elo jump from 0.5 s to 5 s. I don't know what the subtrahend in the formula represents. You might have to increase the subtrahend as the time control increases, but I don't know by how much.

--
Srdja

Necromancer · Post by **Necromancer** » Sat Apr 05, 2025 2:27 am

Yeah, it's very weird. IMO movetime is a strong predictor that is not included in the linear model, so movetimes other than the default are out of scope for this model.

It would be better to fit a multiple linear regression with each STS theme as a regressor, and either include movetime as a regressor too or (better) fit separate linear models for different movetimes, for example 500, 1000 and 5000 ms.
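A sketch of the "separate model per movetime" idea, in plain Python. The sample format (movetime in ms, the 15 per-theme Score(%) values, and a reference rating such as an engine's CCRL rating) is an assumption; this thread contains no such calibration data, so it would have to be collected by running STS on many rated engines. The function names `fit_ols`, `fit_per_movetime` and `predict` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def fit_ols(rows: Sequence[Sequence[float]], ys: Sequence[float]) -> List[float]:
    """Least-squares fit of y ~ b0 + b1*x1 + ... + bk*xk.

    Solves the normal equations (X^T X) b = X^T y by Gauss-Jordan elimination
    with partial pivoting. Returns [b0, b1, ..., bk].
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]  # leading 1 = intercept
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, ys))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("singular system: need more (or more varied) samples")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


Sample = Tuple[int, Sequence[float], float]  # (movetime_ms, theme_pcts, reference_elo)


def fit_per_movetime(samples: Iterable[Sample]) -> Dict[int, List[float]]:
    """Fit one independent theme-level model for each movetime."""
    groups: Dict[int, Tuple[list, list]] = defaultdict(lambda: ([], []))
    for movetime_ms, theme_pcts, elo in samples:
        groups[movetime_ms][0].append(theme_pcts)
        groups[movetime_ms][1].append(elo)
    return {mt: fit_ols(rows, ys) for mt, (rows, ys) in groups.items()}


def predict(coefs: Sequence[float], theme_pcts: Sequence[float]) -> float:
    """Apply a fitted model to one engine's per-theme percentages."""
    return coefs[0] + sum(c * t for c, t in zip(coefs[1:], theme_pcts))
```

With 15 themes plus an intercept, each movetime needs comfortably more than 16 reference engines, otherwise the fit is singular or badly overfitted. A smaller variant regresses only on the overall Score(%) per movetime, which is the shape of the existing single-movetime formula.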

I've double checked my UCI code, movetime is working ok.
