I'm using Franz Huber's excellent emulators of old dedicated chess computers (https://fhub.jimdofree.com/). This allows me to automate the processing of test sets via Arena.
I'm initially interested in how these old machines compare in terms of their endgame ability. Here's what I've tried:
- I made a test set of 380 endgame positions, aiming for a broad range of difficulty levels and endgame categories
- I verified the test solutions with Stockfish, using 6-man endgame tablebases (EGTBs), etc.
- I ran the automated tests and generated a PGN file, recording a "win" when the machine solved the position and a "loss" otherwise, e.g.
Code: Select all
[White "Mephisto TM London 68030"]
[Black "endgame - 3 hxg5+"]
[Result "1-0"]
1. *
[White "Mephisto TM London 68030"]
[Black "endgame - 4 e5"]
[Result "0-1"]
1. *
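Generating those result-only game stubs is straightforward to script. Here is a minimal sketch, assuming the test runner produces a list of (machine, position label, solved) tuples; the function and variable names are mine, not from Arena or any other tool:

```python
# Sketch: emit result-only PGN records, one stub "game" per test position.
# A solve counts as a win for the machine (White), a fail as a loss.

def write_results(results, path):
    """results: iterable of (machine, position_label, solved) tuples."""
    with open(path, "w") as f:
        for machine, position, solved in results:
            f.write(f'[White "{machine}"]\n')
            f.write(f'[Black "{position}"]\n')
            f.write(f'[Result "{"1-0" if solved else "0-1"}"]\n')
            f.write("\n1. *\n\n")

# Example mirroring the snippet above:
results = [
    ("Mephisto TM London 68030", "endgame - 3 hxg5+", True),
    ("Mephisto TM London 68030", "endgame - 4 e5", False),
]
write_results(results, "endgame_results.pgn")
```

The resulting file feeds straight into the rating tools, since they only look at the tag pairs.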
- I then use PGN Stat to run BayesElo/EloStat/Ordo and generate a rating list. I don't think this suits Ordo well because of the grouping?!
So I'm hoping each position also gets a "rating" based on how often different computers solve it. I think this is how chess.com does its puzzle ratings?!
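That per-position rating can also be approximated directly, without BayesElo: given each machine's overall rating and which machines solved a position, fit a single Elo-style difficulty so the expected solve count matches the observed one. The function name and the bisection fit below are my own sketch, not anything BayesElo exposes:

```python
def position_rating(machine_ratings, solved):
    """Fit one Elo-style difficulty r for a position so that the expected
    number of solves, sum over machines m of 1 / (1 + 10**((r - m) / 400)),
    matches the observed solve count."""
    target = sum(solved)
    lo, hi = 0.0, 4000.0
    for _ in range(100):              # bisection: expected solves fall as r rises
        mid = (lo + hi) / 2.0
        expected = sum(1.0 / (1.0 + 10 ** ((mid - m) / 400.0))
                       for m in machine_ratings)
        if expected > target:
            lo = mid                  # position looks too easy -> raise its rating
        else:
            hi = mid
    return (lo + hi) / 2.0

# e.g. ten machines all rated 2000, eight of which solve the position:
# the fitted difficulty lands near 1759 (the Elo expectancy of 80%).
r = position_rating([2000] * 10, [True] * 8 + [False] * 2)
```

This is essentially a one-parameter Rasch/Elo fit, which is the same family of model puzzle-rating systems are usually built on.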
A snippet from bayeselo output:
Code: Select all
Rank Name                       Elo    +    -  games  score  oppo.  draws
  64 endgame - 343 Kc5         2208  237  180     15    80%   1935     0%
  65 endgame - 188 Bh7         2208  237  180     15    80%   1935     0%
  66 Mephisto TM London 68030  2161   49   47    380    81%   1842     0%
  67 endgame - 31 d6           2153  213  178     15    73%   1935     0%
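Because machines and positions end up interleaved in one list, splitting them back apart is the first post-processing step. A small sketch, assuming position names always start with "endgame - " as in the snippet (the helper name is mine):

```python
# Sketch: split a bayeselo rating list into position and machine ratings.

def split_ratings(lines):
    positions, machines = {}, {}
    for line in lines:
        parts = line.split()
        if not parts or not parts[0].isdigit():
            continue                  # skip the header line
        # counting from the right: elo, +, -, games, score, oppo., draws
        name = " ".join(parts[1:-7])
        elo = int(parts[-7])
        (positions if name.startswith("endgame - ") else machines)[name] = elo
    return positions, machines

sample = [
    "Rank Name Elo + - games score oppo. draws",
    "64 endgame - 343 Kc5 2208 237 180 15 80% 1935 0%",
    "66 Mephisto TM London 68030 2161 49 47 380 81% 1842 0%",
]
positions, machines = split_ratings(sample)
```

With the two dictionaries separated, the machine list and the position-difficulty list can each be sorted and analysed on their own.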
(Is it possible to put an image here based on a screenshot of Excel?)
Code: Select all
Computer                        Selective  Endgame  Delta  Score  Ave Time  Level
Mephisto London 68030                2298     2164   -134    307       3.2  NORML 8 = 3 min/move
Saitek RISC 2500                     2232     2134    -98    299       2.8  180s/move
Saitek Sparc                         2208     2092   -116    287       3.7  e7 = 3 min/move
Novag Star Diamond                   2173     2075    -98    282       2.7  b8 = 3 min/move
Mephisto Portorose 68020             2135     2025   -110    267       2.0  BLITZ 9 = 60 min/game
Fidelity Designer Mach IV 2325       2075     2048    -27    274       2.5  a7 = 40 moves in 2 hrs (3 min/move)
Novag Diablo                         2002     1960    -42    246       3.4  d8 = 3 min/move
Mephisto Amsterdam                   1946     1883    -63    220       3.4  6 = 40 moves in 2 hrs
CXG Sphinx Galaxy                    1866     1817    -49    196       3.1  a8 = 3 min/move
Conchess Plymate Victoria            1865     1858     -7    211       3.7  8 = 40 moves in 2 hrs (3 min/move)
Fidelity Par Excellence              1829     1863     34    213       5.9  11 = 40 moves in 2 hrs
Saitek Turbostar 432                 1760     1763      3    177       2.4  a6 = 3 min/move
Novag Super Constellation            1728     1810     82    194       2.6  7 = 1-10 min/move (40 moves in 2 hrs)
Novag VIP                            1631     1724     93    163       3.0  FT 8 = 3 min/move
ARB Sargon                           1320     1847    527    207       1.9  5 = 3 min/move
Totals                              29068    29063
One thing that looks suspicious to me is that all of the negative deltas are at the top of the table?! I was expecting some computers to perform better or worse in the endgame (relative to their overall rating) regardless of where they sit in the table. There is also a chance that the emulated version I chose isn't the same version that was rated in the Selective Search list; ARB Sargon looks suspicious in that respect.
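If the pattern were just regression toward the mean (a noisy 380-position sample pulls every estimate toward the pool average, so top-rated machines get negative deltas and bottom-rated ones positive), then overall rating and delta should be strongly negatively correlated. A quick check using the table's numbers:

```python
import math

# Selective Search ratings and endgame deltas, copied from the table above.
selective = [2298, 2232, 2208, 2173, 2135, 2075, 2002, 1946, 1866, 1865,
             1829, 1760, 1728, 1631, 1320]
delta     = [-134,  -98, -116,  -98, -110,  -27,  -42,  -63,  -49,   -7,
               34,    3,   82,   93,  527]

n = len(selective)
mx, my = sum(selective) / n, sum(delta) / n
cov = sum((x - mx) * (y - my) for x, y in zip(selective, delta))
r = cov / math.sqrt(sum((x - mx) ** 2 for x in selective)
                    * sum((y - my) ** 2 for y in delta))
print(f"correlation(selective, delta) = {r:.2f}")  # strongly negative
```

A correlation near -1 would be consistent with the deltas being mostly a statistical artifact of the ranking rather than a real endgame-skill difference, though the ARB Sargon outlier (possibly a version mismatch) inflates it.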
Any thoughts on this experiment are appreciated.