After some years I'm back to improving my humble engine, Tunguska. I'm implementing NNUE, with interesting results, but now I'm really confused about something.
Here, it is listed with ELO 2417 +-15 https://computerchess.org.uk/ccrl/4040/index.html. When running the STS tool https://www.talkchess.com/forum/viewtopic.php?t=56653 with the default movetime of 500ms, I got 2435, very close to the CCRL 40/15 listed value of 2417 +-15.
Now, when running the tool with movetime = 5 secs, I got an estimated ELO of 2958:
STS_Rating_v3 -f "STS1-STS15_LAN.EPD" -e "D:\cpp_projs\Tunguska1_1\bin\x64\Release\Tunguska1_1_old.exe" --proto uci --movetime 5000
STS ID    STS1  STS2  STS3  STS4  STS5  STS6  STS7  STS8  STS9 STS10 STS11 STS12 STS13 STS14 STS15   ALL
NumPos     100   100   100   100   100   100   100   100   100   100   100   100   100   100   100  1500
BestCnt     64    55    70    62    61    56    57    45    52    79    60    59    71    68    40   899
Score      719   671   793   733   722   765   679   606   634   859   736   682   781   783   616 10779
Score(%)  71.9  67.1  79.3  73.3  72.2  76.5  67.9  60.6  63.4  85.9  73.6  68.2  78.1  78.3  61.6  71.9
STS rating: 2956
The calculation is
ELO = 44.523 * mean(Score %) - 242.85
ELO = 44.523 * 71.9 - 242.85
ELO ~ 2958
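The arithmetic above can be checked with a few lines of Python. This is only a sketch of the linear formula quoted in this post; the function name `sts_elo` and the 10-points-per-position maximum (inferred from the table, where 719 points over 100 positions is shown as 71.9%) are assumptions made here. It suggests that the small gap between the tool's 2956 and the hand-computed 2958 would be consistent with the tool using the unrounded mean score, although that is a guess about the tool's internals.

```python
# Linear STS -> Elo formula quoted above:
#   ELO = 44.523 * mean(Score %) - 242.85
SLOPE = 44.523
INTERCEPT = -242.85

# Assumption inferred from the table: each position is worth at most 10 points.
MAX_POINTS_PER_POSITION = 10


def sts_elo(mean_score_pct: float) -> float:
    """Estimated Elo for a mean STS score given in percent (0..100)."""
    return SLOPE * mean_score_pct + INTERCEPT


total_score = 10779    # "Score" row, ALL column
num_positions = 1500   # "NumPos" row, ALL column

# Mean score in percent, without rounding to one decimal first.
unrounded_pct = 100.0 * total_score / (num_positions * MAX_POINTS_PER_POSITION)

elo_from_rounded = sts_elo(71.9)             # percentage as printed in the table
elo_from_unrounded = sts_elo(unrounded_pct)  # percentage before rounding

print(f"rounded 71.9%   -> {elo_from_rounded:.1f}")
print(f"unrounded {unrounded_pct:.2f}% -> {elo_from_unrounded:.1f}")
```

Either way the estimate stays within a couple of points of 2957, so the rounding does not explain the jump from the 500 ms result.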
How can I explain this? The tool appears to be very reliable, so maybe my time management code is just that bad? It is very naive: it takes the clock time and divides it by movesToGo. The engine respects it when using Arena, so 15 * 60 / 40 gives ~22 secs per move. The code is here if someone wants to check it out: https://github.com/fernandotenorio/Tunguska.
Note: this was done 10 years ago, I was not a very experienced programmer, let alone a C++ one.
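The naive scheme described above can be sketched as follows. This is an illustration in Python, not the actual Tunguska code (which is C++). Both function names and the `overhead_ms` safety margin are hypothetical; the margin is included only to show one common way of protecting very short fixed move times.

```python
def naive_move_time_ms(clock_ms: int, moves_to_go: int) -> float:
    """Split the clock evenly over the moves left until the next time control."""
    if moves_to_go <= 0:
        raise ValueError("moves_to_go must be positive")
    return clock_ms / moves_to_go


def movetime_budget_ms(movetime_ms: int, overhead_ms: int = 0) -> int:
    """Search budget for a fixed time per move.

    overhead_ms is a hypothetical margin for GUI and I/O latency; the
    budget never drops below 1 ms so the search can return some move.
    """
    return max(1, movetime_ms - overhead_ms)


# 40 moves in 15 minutes, as in the Arena example above.
per_move = naive_move_time_ms(15 * 60 * 1000, 40)
print(per_move)  # 22500.0 ms, i.e. about 22 seconds per move
```

With a fixed move time the first function is not involved at all, so a large rating difference between 500 ms and 5000 ms per position would not by itself point at this division.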
Thanks!
Engine ELO estimation
Moderator: Ras
-
- Posts: 37
- Joined: Wed Nov 23, 2016 1:30 am
- Location: Brazil
Engine ELO estimation
The truth comes from inside.
https://github.com/fernandotenorio/Tunguska
-
- Posts: 3169
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Re: Engine ELO estimation
The CCRL (Blitz?) ratings had an adjustment, and it seems the STS Elo tool is now ~100 Elo points off. Further, the formula is based on 0.2 seconds per position, which can cause trouble for engines without bulletproof time management:
https://www.talkchess.com/forum/viewtop ... 25#p628025
CCRL Blitz
https://computerchess.org.uk/ccrl/404/
CCRL Blitz before adjustment
https://computerchess.org.uk/ccrl/402.archive/
Disclaimer: people in the Discord channels will probably tell you to use proper SPRT testing, but IMHO STS can be of some use too.
https://www.chessprogramming.org/Sequen ... Ratio_Test
--
Srdja
-
- Posts: 37
- Joined: Wed Nov 23, 2016 1:30 am
- Location: Brazil
Re: Engine ELO estimation
Thanks for the SPRT links, I'll check this.
Can you elaborate a bit on what you think is happening? I don't follow what you wrote.
Thanks!
The truth comes from inside.
https://github.com/fernandotenorio/Tunguska
-
- Posts: 3169
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Re: Engine ELO estimation
1. Make sure your time management can handle Bullet with 200 ms per move.
2. Use STS tool with 200 ms per position.
3. In your case the STS tool is +72 Elo off: the old rating of Tunguska 1.1 was 2445 and the new rating is 2373, because the CCRL Blitz rating formula was changed.
4. Use SPRT for proper testing during engine development, and the STS tool only to get a rough rating estimate (for regression tests there are updated STS versions: https://github.com/fsmosca/STS-Rating).
--
Srdja
-
- Posts: 37
- Joined: Wed Nov 23, 2016 1:30 am
- Location: Brazil
Re: Engine ELO estimation
I see the issue with fast time controls, but I still do not understand why it is rated around 2400 on the CCRL 40/15 page when the STS test says it is stronger at longer time controls.
The truth comes from inside.
https://github.com/fernandotenorio/Tunguska
-
- Posts: 3169
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Re: Engine ELO estimation
Because the formula of the STS tool (with the LAN3 set) is based on 0.2 s per position and on CCRL Blitz ratings?
https://www.talkchess.com/forum/viewtop ... 25#p628025
--
Srdja
-
- Posts: 3169
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Re: Engine ELO estimation
Hmm, pondering a bit about the numbers you posted and the formula, I cannot explain the Elo jump from 0.5s to 5s. Idk what the subtrahend in the formula means. You might have to increase the subtrahend as the time per position increases, but Idk how.
--
Srdja
-
- Posts: 37
- Joined: Wed Nov 23, 2016 1:30 am
- Location: Brazil
Re: Engine ELO estimation
Yeah, it's very weird. IMO movetime is a potent predictor that is not included in the linear model, so movetimes different from the default one are out of scope for this model.
It would be better to do a multiple linear regression with each STS theme as a regressor and either movetime as an additional regressor or (better) separate linear models for different movetimes, for example 500, 1000 and 5000.
I've double checked my UCI code, movetime is working ok.
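The "separate linear model per movetime" idea could be sketched roughly like this. It is a minimal illustration, not part of the STS tool: the function names and the `(movetime_ms, mean_score_pct, known_elo)` sample layout are assumptions made here, and it uses the overall mean score as the only regressor instead of one regressor per STS theme. No calibration data is included; the samples would have to come from engines with known ratings run at each movetime.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_per_movetime(samples):
    """Fit one line per movetime.

    samples: iterable of (movetime_ms, mean_score_pct, known_elo) tuples
    (hypothetical layout). Returns {movetime_ms: (slope, intercept)}.
    """
    groups = defaultdict(list)
    for movetime_ms, score_pct, elo in samples:
        groups[movetime_ms].append((score_pct, elo))
    models = {}
    for movetime_ms, points in groups.items():
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        models[movetime_ms] = fit_line(xs, ys)
    return models


def predict_elo(models, movetime_ms, score_pct):
    """Estimate Elo using the model fitted for exactly this movetime."""
    slope, intercept = models[movetime_ms]
    return slope * score_pct + intercept
```

A score obtained at 5000 ms would then only be looked up in the model fitted for 5000 ms, which is the point of keeping the models separate.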
The truth comes from inside.
https://github.com/fernandotenorio/Tunguska