A test idea without Elo, I think I start middle of Jan.26!

Discussion of computer chess matches and engine tournaments.

Moderator: Ras

Frank Quisinsky
Posts: 7189
Joined: Wed Nov 18, 2009 7:16 pm
Location: Gutweiler, Germany
Full name: Frank Quisinsky

Re: A test idea without Elo, I think I start middle of Jan.26!

Post by Frank Quisinsky »

Hi there,

exactly ... this solved the problem of the day.

A third test is necessary ... after Fritz 16.
Dog 4.10.2 can play the third test.

I will add the new rank system later today.

:-)

Best
Frank

Re: A test idea without Elo, I think I start middle of Jan.26!

Post by Frank Quisinsky »

Standard Engines:

01. Uralochka 3.42a JA             3550
02. Revenge 4.0                    3500
03. CSTal 2.0                      3475
04. Velvet 8.1.1                   3475
05. Igel 3.6.0 JA                  3475
06. SlowChess Blitz 2.9            3400
07. Texel 1.12                     3400
08. Stockfish 200731 HCE           3375
09. Wasp 7.00                      3350
10. Patricia 5.0 JA                3350
11. Monty 251209 MCTS dev          3300
12. Leorik 3.1.3 JA                3275
13. Tcheran 9.0                    3275
14. Nemorino 6.11 JA               3250
15. Booot 6.50 HCE                 3200
16. Xiphos 0.6.1 HCE JA            3175
17. Laser 1.7 HCE JA               3175
18. Senpai 3.0.1 HCE               3125
19. DanaSah 9.1 JA                 3075
20. Fizbo 2.0 JA                   3075
21. Petrel 3.1 JA                  3050
22. Vajolet2 2.8.0 HCE             2975
23. Critter 1.6a HCE               2975
24. Deep iCE 4.0.853 HCE           2950
25. Hakkapelitta TCEC v2 HCE       2875
26. Spark 1.0 HCE                  2775
                                  -----
                                  83875 : 26 = 3225,96 Elo
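
The average of the group can be re-checked with a few lines of Python. This is only a small sketch: the ratings are copied from the list above, and the helper name `german` is made up for the example (it only swaps the decimal point for a comma). The mean comes out at roughly 3225,96.

```python
# Ratings copied from the "Standard Engines" list, in the same order.
RATINGS = [
    3550, 3500, 3475, 3475, 3475, 3400, 3400, 3375, 3350, 3350,
    3300, 3275, 3275, 3250, 3200, 3175, 3175, 3125, 3075, 3075,
    3050, 2975, 2975, 2950, 2875, 2775,
]

total = sum(RATINGS)
average = total / len(RATINGS)


def german(value, digits=2):
    """Format a number with a decimal comma (hypothetical helper)."""
    return f"{value:.{digits}f}".replace(".", ",")


print(f"{total} : {len(RATINGS)} = {german(average)} Elo")
```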

Rank = 20, 3-4-3 system

1.300,0 (100,00%) - 1.144,0 ( 88,00%) points = 01. ***** General Field Marshal
1.143,5 ( 87,97%) - 1.105,0 ( 85,00%) points = 02. ****  General
1.104,5 ( 84,97%) - 1.066,0 ( 82,00%) points = 03. ***   Lieutenant General
1.065,5 ( 81,97%) - 1.027,0 ( 79,00%) points = 04. **    Major General
1.026,5 ( 78,97%) -   988,0 ( 76,00%) points = 05. *     Brigadier General
--
  987,5 ( 75,97%) -   936,0 ( 72,00%) points = 06. Colonel
  935,5 ( 71,97%) -   884,0 ( 68,00%) points = 07. Lieutenant Colonel
  883,5 ( 67,97%) -   832,0 ( 64,00%) points = 08. Major
  831,5 ( 63,97%) -   780,0 ( 60,00%) points = 09. Captain
  779,5 ( 59,97%) -   728,0 ( 56,00%) points = 10. First Lieutenant
  727,5 ( 55,97%) -   676,0 ( 52,00%) points = 11. Second Lieutenant
--
  675,5 ( 51,97%) -   637,0 ( 49,00%) points = 12. Sergeant Major
  636,5 ( 48,97%) -   598,0 ( 46,00%) points = 13. Master Sergeant
  597,5 ( 45,97%) -   559,0 ( 43,00%) points = 14. Sergeant First Class
  558,5 ( 42,97%) -   520,0 ( 40,00%) points = 15. Staff Sergeant
  519,5 ( 39,97%) -   481,0 ( 37,00%) points = 16. Sergeant
  480,5 ( 36,97%) -   442,0 ( 34,00%) points = 17. Corporal
  441,5 ( 33,97%) -   403,0 ( 31,00%) points = 18. Specialist
  402,5 ( 30,97%) -   364,0 ( 28,00%) points = 19. Private First Class
  363,5 ( 27,97%) -     0,0 (  0,00%) points = 20. Private
Ready ...
New are "Staff Sergeant" and "Sergeant First Class".
Problem solved!

Well, they're not new.
They've always been around, but they're new to my rank system.

For 19. Private First Class ... Fritz 16 (Rybka) should score 28,00% - 30,97%.
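
The rank table can be read as a simple lookup: find the highest rank whose lower bound the score reaches. A minimal sketch, assuming 1.300 games per test-run and the lower bounds from the table above. The names `RANKS` and `rank_for` are made up for the example and are not part of any existing tool.

```python
GAMES_PER_RUN = 1300

# (lower bound in points, rank name), highest rank first, taken from the table.
RANKS = [
    (1144.0, "General Field Marshal"),
    (1105.0, "General"),
    (1066.0, "Lieutenant General"),
    (1027.0, "Major General"),
    (988.0, "Brigadier General"),
    (936.0, "Colonel"),
    (884.0, "Lieutenant Colonel"),
    (832.0, "Major"),
    (780.0, "Captain"),
    (728.0, "First Lieutenant"),
    (676.0, "Second Lieutenant"),
    (637.0, "Sergeant Major"),
    (598.0, "Master Sergeant"),
    (559.0, "Sergeant First Class"),
    (520.0, "Staff Sergeant"),
    (481.0, "Sergeant"),
    (442.0, "Corporal"),
    (403.0, "Specialist"),
    (364.0, "Private First Class"),
    (0.0, "Private"),
]


def rank_for(points):
    """Return (rank number, rank name) for a score out of 1.300 games."""
    if not 0 <= points <= GAMES_PER_RUN:
        raise ValueError(f"points must be between 0 and {GAMES_PER_RUN}")
    for number, (lower_bound, name) in enumerate(RANKS, start=1):
        if points >= lower_bound:
            return number, name
    raise AssertionError("unreachable: the last lower bound is 0")


print(rank_for(745.5))  # (10, 'First Lieutenant')
```

Because scores only move in half-point steps, checking the lower bound alone is enough; the upper bounds in the table are always the next lower bound minus 0,5.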

Live mode:
shredder *.sto file (tournament configuration, game plan)
https://www.amateurschach.de/fling/etoc-g_test-02.sto

shredder *.html file (current results)
https://www.amateurschach.de/fling/etoc-g_test-02.html

shredder *.pgn file (the games)
https://www.amateurschach.de/fling/etoc-g_test-02.pgn

Updates every 2 minutes via the FTP software Fling Plus 5.04.

Best
Frank

Re: A test idea without Elo, I think I start middle of Jan.26!

Post by Frank Quisinsky »

Hi there,

Now, what should I display later on the detail site?
The group of "Standard Engines" contains many "king attackers": engines that can win many games very quickly.

The following situation:
Engine A plays 1.300 games vs. the 26 Standard Engines, result = 745,5 points, rank: First Lieutenant
Engine B plays 1.300 games vs. the 26 Standard Engines, result = 745,5 points, rank: First Lieutenant

This can easily happen if I test around 250 or more engines next year.

I am interested in displaying only 4 stats on a detail site.

- draw rate
- move average (average number of moves per game)
- short won games below 50 moves
- short lost games below 50 moves
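
These four stats can be computed from a plain list of games. The following is only a sketch under assumptions: the `Game` record and the function `detail_stats` are invented for the example, the move count is taken as full moves, and the four sample games are made-up numbers, not real results.

```python
from dataclasses import dataclass

SHORT_GAME_LIMIT = 50  # "below 50 moves"


@dataclass
class Game:
    result: str  # "win", "draw" or "loss" from the tested engine's view
    moves: int   # number of full moves played (assumption)


def detail_stats(games):
    """Return the four detail stats for one test-run."""
    if not games:
        raise ValueError("need at least one game")
    n = len(games)
    draws = sum(g.result == "draw" for g in games)
    short_wins = sum(
        g.result == "win" and g.moves < SHORT_GAME_LIMIT for g in games
    )
    short_losses = sum(
        g.result == "loss" and g.moves < SHORT_GAME_LIMIT for g in games
    )
    return {
        "draw_rate": 100.0 * draws / n,               # in percent
        "move_average": sum(g.moves for g in games) / n,
        "short_wins": short_wins,
        "short_losses": short_losses,
    }


# Made-up sample data, only to show the call.
games = [Game("win", 38), Game("draw", 92), Game("loss", 45), Game("draw", 120)]
stats = detail_stats(games)
print(stats)
```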

After all these years of computer chess ...
Nothing is more boring than Elo!
... for engines as strong as those available today!

Engines that give me a higher draw rate = several possible reasons!
Engines that give me a higher move average fight more in endgames, or fight for the 0,5-1,0% chance of scoring half a point more, waiting for mistakes by the opponent.
Engines that give me a higher number of won games are very aggressive in the middlegame.
Engines that give me a higher number of fast losses are usually stronger in the endgame (in most cases).
Engines that give me a higher number of both won and lost games play more or less a bit speculatively.

Draw rate = difficult to evaluate!
Very fast draws are always a topic, but not for my idea.
With a lower move average more games can be played, and fewer boring endgames will be the result.
On the other hand, a lower move average because an engine agrees to a draw too fast isn't very nice.

Back to Engine A and Engine B, both with 745,5 points.

Engine A: 12 won games below 50 moves, 4 lost games below 50 moves = 12-4 = +8
Engine B: 3 won games below 50 moves, 1 lost game below 50 moves = 3-1 = +2

A battle can be won through surprise attacks.
This is all very difficult, because the endgame artist plays nice chess too.

I have no solution for a second system to evaluate all this; each game phase is interesting.

What I can do is to display:
1. Engine A = 745,5 P., First Lieutenant, 52,5% draws, 84,4% m-avg, 12-w / 04-l = +008
1. Engine B = 745,5 P., First Lieutenant, 56,8% draws, 89,4% m-avg, 03-w / 01-l = +002

Such information should be enough.
As little information as possible on an overview page with over 250 engines.
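
Such an overview line is easy to build from the numbers. A minimal sketch, assuming the layout of the two example lines above. The function names `de` and `overview_line` are made up for the example, and the "m-avg" value is passed through exactly as given.

```python
def de(value):
    """Format a number with one decimal place and a decimal comma."""
    return f"{value:.1f}".replace(".", ",")


def overview_line(name, points, rank, draw_pct, move_avg, short_wins, short_losses):
    """Build one line for the overview page (layout is an assumption)."""
    balance = short_wins - short_losses
    return (
        f"{name} = {de(points)} P., {rank}, "
        f"{de(draw_pct)}% draws, {de(move_avg)}% m-avg, "
        f"{short_wins:02d}-w / {short_losses:02d}-l = {balance:+04d}"
    )


print(overview_line("Engine A", 745.5, "First Lieutenant", 52.5, 84.4, 12, 4))
print(overview_line("Engine B", 745.5, "First Lieutenant", 56.8, 89.4, 3, 1))
```

With equal points and equal rank, the short-game balance at the end of the line (+008 vs. +002) is the only part that separates Engine A from Engine B.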

I will think over the next few days about how to display all that.
Furthermore, for each test: URL to the test-run result, URL to the PGN database, URL to the tournament configuration.

Best
Frank

Re: A test idea without Elo, I think I start middle of Jan.26!

Post by Frank Quisinsky »

Again, what I saw from Laser is just great. Pure coincidence that I have it in the group of "Standard Engines". Of course, I know the program from all my tourneys; the compiles from Jim give Laser more power.

Monty games = Fun, fun, fun ... what a fine development.

Still no idea how strong Monty is, but clearly much stronger than I had assumed. We'll find out exactly how strong later, once we've done a lot of test runs.

So far, I am very happy with the group of 26 Standard engines.
This group of 26 engines can test all 250 TOP-Engines very well with the rank system.
No changes are necessary here. It must be hard for every engine I test to play vs. the 26-Standard engines.

:-)

Also the work of the Russian programmer, the engine Petrel ...
Very aggressive in the early middlegame. I often think the program will win the game after 10 moves.

Re: A test idea without Elo, I think I start middle of Jan.26!

Post by Frank Quisinsky »

Hi there,

Now the third and last test before I start my new test system on January 15th, 2026.

I am using Dog 4.10.2 vs. the 26 Standard-Engines.
The group of Standard-Engines is now complete.
I changed from Senpai 3.0.1 HCE to Senpai 3.0 HCE JA (10% faster, BMI2 compile on the Intel system).

Furthermore, the first version of the ETOC-G site is ready. I made it very easy.
To be honest, I'm not really interested in complex *.html pages. Everything has to be very simple.
My engine overview will contain only a selection, with my own results.

What I changed is the time-control?!
The time-control will be 5 minutes/game + 2 seconds/move - BLITZ -

Note:
This Dog 4.10.2 test-run will also be played on the Intel i9-10900K with 10 cores.
Later I will use the AMD Ryzen™ 9 5950X with 16 cores.

ETOC-G
https://www.amateurschach.de/main/_etoc-g.htm

Dog 4.10.2 vs. 26 Standard-Engines:
After all I have seen ... Dog 4.10.2 should be stronger than Fritz 16 (Rybka).

perhaps a Corporal ...
perhaps a Corporal with a lot of medals ... who knows?

Live mode:
shredder *.sto file (tournament configuration, game plan)
https://www.amateurschach.de/fling/etoc-g_test-03.sto

shredder *.html file (current results)
https://www.amateurschach.de/fling/etoc-g_test-03.html

shredder *.pgn file (the games)
https://www.amateurschach.de/fling/etoc-g_test-03.pgn

Updates every 2 minutes via the FTP software Fling Plus 5.04.

Best
Frank

Re: A test idea without Elo, I think I start middle of Jan.26!

Post by Frank Quisinsky »

Ah shit ... the medals!
This was a top-secret idea for the tournament start.
You haven't read it, for the moment you have to forget it, and we stay best friends.

Sergeant 	519,5 ( 39,97%) -   481,0 ( 37,00%)
Corporal 	480,5 ( 36,97%) -   442,0 ( 34,00%)
Or a Sergeant ... who knows?
I believe the final result will be between 34,00 and 40,00% ... wow!

Re: A test idea without Elo, I think I start middle of Jan.26!

Post by Frank Quisinsky »

I had forgotten ...

The group of "Standard-Engines" never changes while ETOC-G is running. I am really interested in testing many engines, again and again, vs. the same group of "Standard-Engines". So I gave the 26 versions of 26 different engines the name "Standard-Engines".

Example:
Petrel 4.0 is out.
Petrel 3.1 JA is in the group of "Standard-Engines".
For sure, what I can do is to start a test-run ...
Petrel 4.0 vs. the 26 Standard-Engines.
In this case I have a pairing ... Petrel 4.0 JA - Petrel 3.1 JA.
But that's not a problem, it can be interesting too.

Another example:
Uralochka 3.42a is in the group of "Standard-Engines".
No newer version will be out.
That's a problem with my system?!
It makes no sense to produce a pairing Uralochka 3.42a vs. Uralochka 3.42a.

Solution:
After each test-run I have 50 new Uralochka 3.42a games and can display them in an extra table on the detail site.
This can be seen on the new ETOC-G site. In other words, after 200 test-runs, 10.000 Uralochka 3.42a games (200x50) will be the result.
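
The arithmetic behind the 50 games and the 10.000 games, as a tiny sketch. It assumes 1.300 games per test-run spread evenly over the 26 Standard-Engines; the names are made up for the example.

```python
GAMES_PER_RUN = 1300
STANDARD_ENGINES = 26

# Every tested engine plays the same number of games vs. each Standard-Engine.
games_per_opponent = GAMES_PER_RUN // STANDARD_ENGINES


def collected_games(test_runs):
    """Games one Standard-Engine has collected after a number of test-runs."""
    return test_runs * games_per_opponent


print(games_per_opponent)    # 50
print(collected_games(200))  # 10000
```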

All the Standard-Engines can play without any problems, time losses, or other things.
That's very important, otherwise I have a lot to do during test runs.

Another problem is the time-control?!

I am more of a fan of longer time-controls.
But I will use only one system per test, and there should be a little bit of animation for others (all of this is possible with one system only).
One test-run should not run longer than 1 day. Furthermore, energy is expensive in Germany, and I need my other systems for my private (not publicly available) book development.
For ETOC-G I am using the FEOBOS opening book I developed with Klaus Wlotzka (more than 16 million formulas in an Excel spreadsheet, you know), available on the download page of my older website: Frank's Chess Page.

Re: A test idea without Elo, I think I start middle of Jan.26!

Post by Frank Quisinsky »

  597,5 ( 45,97%) -   559,0 ( 43,00%) points = 14. Sergeant First Class
  558,5 ( 42,97%) -   520,0 ( 40,00%) points = 15. Staff Sergeant
Oops ...
Dog 4.10.2 is a Staff Sergeant or a Sergeant First Class ...
I completely underestimated that.

This could be something.
Boy, oh boy, oh boy ...