Added MadChess in TOPSIS ranking.
UCI_Elo
-
- Posts: 4840
- Joined: Sun Aug 10, 2008 3:15 pm
- Location: Philippines
Re: UCI_Elo
I ran Stockfish 10 on the same 5000 test positions to see how it compares with the UCI_Elo 1500 engines.
The result is a bit surprising: Stockfish 10 has the highest Match count of all the engines tested. This suggests that humans rated 1500 are capable of finding good moves, although some of these may be forced moves such as captures and check evasions. If so, a UCI_Elo 1500 engine could be adjusted to play stronger moves. There is, however, one clear difference in Stockfish 10's performance: its low LACD of 47 cp, meaning that when its move is weaker than the human move, it gives the opponent only a small advantage. So, to better approximate a 1500 player, a UCI_Elo 1500 engine should have a higher LACD. Notable engines with a high LACD are CT800, Hiarcs, Amyan and MadChess.
This gave me an idea for choosing the TOPSIS weights: adjust the criteria weights so that Stockfish 10 ranks poorly.
Here is a sample weighting, weight=[0.5, 0.05, 0.05, 0.1, 0.3], for the criteria [Match, High, Low, HACD, LACD].
With these weights Stockfish 10 ranks 14th of 15.
UCI_Elo 1500 engine test results on FIDE Elo 1500
Test positions are taken from players with FIDE Elo 1450 to 1550
Engine                                Total Match  High   Low  HACD  LACD
Deuterium v2019.2.37.59 UCI_Elo 1500   5000  1891  1360  1749   426   284
Ufim v8.02 UCI_Elo 1500                5000  2164  1360  1476   447   423
CT800 V1.34 UCI_Elo 1500               5000  1627  1164  2209   329   914
Arasan 21.3 UCI_Elo 1500               5000  1634  1178  2188   491   440
DanaSah 7.9 Human UCI_Elo 1500         5000  1710  1195  2095   434   324
Stockfish 2019.07.14 UCI_Elo 1500      5000  1304  1231  2465   443   390
Cheng 4.39 UCI_Elo 1500                5000  2141  1527  1332   427   144
Discocheck 5.2 UCI_Elo 1500            5000  1947  1308  1745   380   445
Houdini 3 UCI_Elo 1500                 5000  1704  1269  2027   427   174
Rhetoric 1.4.3 UCI_Elo 1500            5000  1875  1330  1795   360   448
Hiarcs 14 UCI_Elo 1500                 5000  1798  1112  2090   375   685
Cheese 2.1 UCI_Elo 1500                5000  2138  1532  1330   421   165
Amyan 1.72 UCI_Elo 1500                5000  1803  1186  2011   460   551
MadChess 2.2 UCI_Elo 1500              5000  1705  1195  2100   438   526
Stockfish 10                           5000  2237  2333   430   355    47
::Legend::
Total: Number of test positions from human games.
Match: Number of positions where the engine move and the human move are the same.
High : Number of positions where the engine move is stronger than the human move.
Low  : Number of positions where the engine move is weaker than the human move.
HACD : High Average Centipawn Difference, the average centipawn difference between the engine move and the human move
       in positions where the engine move is stronger, according to Stockfish 2019.04.16.
LACD : Low Average Centipawn Difference, the average centipawn difference between the engine move and the human move
       in positions where the engine move is weaker, according to Stockfish 2019.04.16.
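A minimal sketch of how the legend's counts could be computed, assuming each test position has already been scored by the judge engine (Stockfish 2019.04.16 in the legend), so that the human move, the engine move and the judge's centipawn score after each are known. `PositionResult` and `summarize` are hypothetical names, and counting a different move with exactly the same score in neither High nor Low is an assumption; the post does not say how such ties are handled.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PositionResult:
    """One test position scored by the judge engine (hypothetical record type)."""
    human_move: str
    engine_move: str
    human_score_cp: int   # judge score after the human move, side-to-move view
    engine_score_cp: int  # judge score after the engine move, side-to-move view


def summarize(results):
    """Compute Total, Match, High, Low, HACD and LACD as described in the legend."""
    match = sum(1 for r in results if r.engine_move == r.human_move)
    # Score differences only for positions where the two moves differ.
    diffs = [r.engine_score_cp - r.human_score_cp
             for r in results if r.engine_move != r.human_move]
    high = [d for d in diffs if d > 0]   # engine move stronger than human move
    low = [-d for d in diffs if d < 0]   # engine move weaker, stored as a positive loss
    # Assumption: a different move with an identical score lands in neither list.
    return {
        "Total": len(results),
        "Match": match,
        "High": len(high),
        "Low": len(low),
        "HACD": round(mean(high)) if high else 0,
        "LACD": round(mean(low)) if low else 0,
    }


if __name__ == "__main__":
    sample = [
        PositionResult("e4", "e4", 30, 30),
        PositionResult("d4", "Nf3", 20, 60),
        PositionResult("Qh5", "Nc3", -10, -110),
        PositionResult("a3", "h3", 0, -300),
    ]
    print(summarize(sample))
```

With the sample data this reports Match 1, High 1, Low 2, HACD 40 and LACD 200, which shows why a low LACD (like Stockfish 10's 47 cp) means the engine's weaker moves cost very little.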
TOPSIS (mnorm=vector, wnorm=sum) - Solution:
ALT./CRIT.                             Match   High    Low   HACD   LACD   Rank
Direction                              (max)  (min)  (max)  (min)  (max)
Weight                                   0.5   0.05   0.05    0.1    0.3
-------------------------------------------------------------------------------
Deuterium v2019.2.37.59 UCI_Elo 1500    1891   1360   1749    426    284     11
Ufim v8.02 UCI_Elo 1500                 2164   1360   1476    447    423      5
CT800 V1.34 UCI_Elo 1500                1627   1164   2209    329    914      1
Arasan 21.3 UCI_Elo 1500                1634   1178   2188    491    440      8
DanaSah 7.9 Human UCI_Elo 1500          1710   1195   2095    434    324     10
Stockfish 2019.07.14 UCI_Elo 1500       1304   1231   2465    443    390      9
Cheng 4.39 UCI_Elo 1500                 2141   1527   1332    427    144     13
Discocheck 5.2 UCI_Elo 1500             1947   1308   1745    380    445      6
Houdini 3 UCI_Elo 1500                  1704   1269   2027    427    174     15
Rhetoric 1.4.3 UCI_Elo 1500             1875   1330   1795    360    448      7
Hiarcs 14 UCI_Elo 1500                  1798   1112   2090    375    685      2
Cheese 2.1 UCI_Elo 1500                 2138   1532   1330    421    165     12
Amyan 1.72 UCI_Elo 1500                 1803   1186   2011    460    551      3
MadChess 2.2 UCI_Elo 1500               1705   1195   2100    438    526      4
Stockfish 10                            2237   2333    430    355     47     14
-
- Posts: 4840
- Joined: Sun Aug 10, 2008 3:15 pm
- Location: Philippines
Re: UCI_Elo
I added a new criterion that compares the effect of the human move (Elo 1500) with that of the engine move (UCI_Elo 1500). It is called HEMSE.
HEMSE: Human and Engine MSE (Mean Square Error), i.e. Sum((HumanScore - EngineScore)^2)/total; smaller is better. If the score of the human move is close to the score of the engine move, the human and the engine are similar. When both play the same move the squared error is zero, because the scores are identical. So even if the human move differs from the engine move in a given position, if their scores are close according to the judge, Stockfish dev 2019.04.16, there is a high probability that their strengths are close.
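The HEMSE formula can be written as a small function. This is a sketch that assumes the judge's centipawn scores for the human move and the engine move are already available as two equal-length lists, one entry per test position; `hemse` is a hypothetical name, not part of any tool used in the post.

```python
def hemse(human_scores_cp, engine_scores_cp):
    """Human and Engine MSE: Sum((HumanScore - EngineScore)^2) / total."""
    if len(human_scores_cp) != len(engine_scores_cp):
        raise ValueError("score lists must have the same length")
    if not human_scores_cp:
        raise ValueError("need at least one position")
    # Same move (same score) contributes 0; a 10 cp gap contributes 100.
    squared = ((h - e) ** 2 for h, e in zip(human_scores_cp, engine_scores_cp))
    return sum(squared) / len(human_scores_cp)


if __name__ == "__main__":
    print(hemse([0, 10, -20], [0, 20, -20]))
```

Because the differences are squared, a single large gap dominates: one position with a 1000 cp difference adds 1,000,000/total to the result, as much as 10,000 positions with a 10 cp gap each.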
I also added Stockfish 10 (a CCRL 3000+ engine) and Arminius (a CCRL 2600+ engine) to the test, to observe their HEMSE and to help choose appropriate TOPSIS weights. These two engines should rank lowest once the weights are applied in TOPSIS.
I ran TOPSIS with the weights [0.1, 0.4, 0.5] for the criteria [Match, LACD, HEMSE] respectively. With these weights Stockfish 10 and Arminius rank lowest (14th and 13th of 14).
The top 3 engines closest to the human player with Elo 1500 are Rhetoric, Discocheck and Ufim.
UCI_Elo 1500 engine test results on FIDE Elo 1500
Test positions are taken from players with FIDE Elo 1450 to 1550
Engine                                Total Match  High   Low  HACD  LACD     HEMSE
Deuterium v2019.2.37.61 UCI_Elo 1500   5000  1886  1370  1744   424   282   4208112
Ufim v8.02 UCI_Elo 1500                5000  2168  1363  1469   422   361   4737056
Arasan 21.3 UCI_Elo 1500               5000  1641  1167  2192   495   452   6708957
CT800 V1.34 UCI_Elo 1500               5000  1642  1149  2209   333   807  10656012
DanaSah 7.9 Human UCI_Elo 1500         5000  1679  1202  2119   461   396   6041935
Stockfish 2019.07.14 UCI_Elo 1500      5000  1335  1245  2420   421   428   6462853
Cheng 4.39 UCI_Elo 1500                5000  2068  1466  1466   474   300   5016436
Discocheck 5.2 UCI_Elo 1500            5000  1948  1308  1744   381   443   5505496
Amyan 1.72 UCI_Elo 1500                5000  1803  1218  1979   419   561   7648687
MadChess 2.2 UCI_Elo 1500              5000  1693  1240  2067   424   552   7680236
Cheese 2.1 UCI_Elo 1500                5000  2123  1486  1391   404   210   3546737
Rhetoric 1.4.3 UCI_Elo 1500            5000  1853  1349  1798   357   461   5719087
Stockfish 10                           5000  2244  2340   416   355    46   3238666
Arminius 2017-01-01                    5000  2216  1827   957   420    68   3236474
::Legend::
Total: Number of test positions from human games.
Match: Number of positions where the engine move and the human move are the same.
High : Number of positions where the engine move is stronger than the human move.
Low  : Number of positions where the engine move is weaker than the human move.
HACD : High Average Centipawn Difference, the average difference between the engine move score
       and the human move score in positions where the engine move is stronger,
       according to Stockfish dev 2019.04.16.
LACD : Low Average Centipawn Difference, the average difference between the engine move score
       and the human move score in positions where the engine move is weaker.
HEMSE: Human and Engine MSE (Mean Square Error), Sum((HumanScore - EngineScore)^2)/total; smaller is better.
TOPSIS (mnorm=vector, wnorm=sum) - Solution:
ALT./CRIT.                             Match   LACD        HEMSE   Rank
Direction                              (max)  (max)        (min)
Weight                                   0.1    0.4          0.5
-----------------------------------------------------------------------
Deuterium v2019.2.37.61 UCI_Elo 1500    1886    282  4.20811e+06      7
Ufim v8.02 UCI_Elo 1500                 2168    361  4.73706e+06      3
Arasan 21.3 UCI_Elo 1500                1641    452  6.70896e+06      8
CT800 V1.34 UCI_Elo 1500                1642    807   1.0656e+07      6
DanaSah 7.9 Human UCI_Elo 1500          1679    396  6.04194e+06      9
Stockfish 2019.07.14 UCI_Elo 1500       1335    428  6.46285e+06     10
Cheng 4.39 UCI_Elo 1500                 2068    300  5.01644e+06     12
Discocheck 5.2 UCI_Elo 1500             1948    443   5.5055e+06      2
Amyan 1.72 UCI_Elo 1500                 1803    561  7.64869e+06      4
MadChess 2.2 UCI_Elo 1500               1693    552  7.68024e+06      5
Cheese 2.1 UCI_Elo 1500                 2123    210  3.54674e+06     11
Rhetoric 1.4.3 UCI_Elo 1500             1853    461  5.71909e+06      1
Stockfish 10                            2244     46  3.23867e+06     14
Arminius 2017-01-01                     2216     68  3.23647e+06     13
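The TOPSIS step can be sketched in Python. This assumes the standard TOPSIS variant that the `mnorm=vector, wnorm=sum` label suggests: each criterion column is divided by its Euclidean norm, the weights are divided by their sum, Match and LACD are benefit criteria (max) and HEMSE is a cost criterion (min), and engines are ranked by relative closeness to the ideal solution. The post does not name the tool that produced the table, so `topsis` and `rank` are illustrative, not its actual code.

```python
import math

# (engine, Match, LACD, HEMSE) copied from the TOPSIS table above.
ROWS = [
    ("Deuterium v2019.2.37.61 UCI_Elo 1500", 1886, 282, 4208112),
    ("Ufim v8.02 UCI_Elo 1500", 2168, 361, 4737056),
    ("Arasan 21.3 UCI_Elo 1500", 1641, 452, 6708957),
    ("CT800 V1.34 UCI_Elo 1500", 1642, 807, 10656012),
    ("DanaSah 7.9 Human UCI_Elo 1500", 1679, 396, 6041935),
    ("Stockfish 2019.07.14 UCI_Elo 1500", 1335, 428, 6462853),
    ("Cheng 4.39 UCI_Elo 1500", 2068, 300, 5016436),
    ("Discocheck 5.2 UCI_Elo 1500", 1948, 443, 5505496),
    ("Amyan 1.72 UCI_Elo 1500", 1803, 561, 7648687),
    ("MadChess 2.2 UCI_Elo 1500", 1693, 552, 7680236),
    ("Cheese 2.1 UCI_Elo 1500", 2123, 210, 3546737),
    ("Rhetoric 1.4.3 UCI_Elo 1500", 1853, 461, 5719087),
    ("Stockfish 10", 2244, 46, 3238666),
    ("Arminius 2017-01-01", 2216, 68, 3236474),
]


def topsis(matrix, weights, benefit):
    """Return relative closeness (0..1, higher is better) for each row."""
    n = len(weights)
    total_w = sum(weights)
    w = [x / total_w for x in weights]  # wnorm=sum
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n)]  # mnorm=vector
    v = [[w[j] * row[j] / norms[j] for j in range(n)] for row in matrix]
    cols = list(zip(*v))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in v:
        d_ideal = math.dist(row, ideal)  # distance to the best possible engine
        d_anti = math.dist(row, anti)    # distance to the worst possible engine
        total = d_ideal + d_anti
        scores.append(d_anti / total if total else 0.0)
    return scores


def rank(scores):
    """Rank 1 = highest closeness score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks


if __name__ == "__main__":
    matrix = [row[1:] for row in ROWS]
    scores = topsis(matrix, weights=[0.1, 0.4, 0.5], benefit=[True, True, False])
    for (name, *_), s, r in zip(ROWS, scores, rank(scores)):
        print(f"{name:<37} {s:.4f} {r:>3}")
```

A hand check with these formulas also puts Stockfish 10 and Arminius in the last two places, consistent with the table, which suggests the tool uses this standard variant.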
-
- Posts: 4840
- Joined: Sun Aug 10, 2008 3:15 pm
- Location: Philippines
Re: UCI_Elo
I extracted some information on blunder size and the opponent's material from that test set.
In the first plot (top left), humans most often blunder by 50 to 150 cp, or around one pawn, from playable positions [+/-50 cp]. Such blunders occur most often while the opponent's material is still high. In general, larger cp losses are observed as one's own position gets worse while the opponent still has a lot of material. This information could be used by UCI_Elo 1500 engines to decide when to blunder and by how much.
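One way to turn this observation into a move-selection rule for a UCI_Elo 1500 engine is sketched below. Most of it is an illustrative assumption rather than something measured in the post: the linear blunder probability (`base`, `peak`), the 3900 cp full-material value (8×100 + 2×300 + 2×300 + 2×500 + 900), and using the best candidate's score as the position evaluation. Only the 50 to 150 cp loss window and the +/-50 cp "playable" band come from the plot description.

```python
import random
from dataclasses import dataclass


@dataclass
class Candidate:
    move: str
    score_cp: int  # engine score after this move, from the side to move's view


def blunder_probability(opp_material_cp, full_material_cp=3900, base=0.02, peak=0.15):
    """Hypothetical: blunder chance grows linearly with the opponent's material."""
    frac = max(0.0, min(1.0, opp_material_cp / full_material_cp))
    return base + (peak - base) * frac


def pick_move(candidates, opp_material_cp, rng, playable_cp=50, min_loss=50, max_loss=150):
    """Return the best candidate, or occasionally one losing min_loss..max_loss cp."""
    best = max(candidates, key=lambda c: c.score_cp)
    if abs(best.score_cp) > playable_cp:
        return best  # only blunder from roughly balanced (playable) positions
    if rng.random() >= blunder_probability(opp_material_cp):
        return best
    window = [c for c in candidates
              if min_loss <= best.score_cp - c.score_cp <= max_loss]
    return rng.choice(window) if window else best


if __name__ == "__main__":
    moves = [Candidate("e4", 20), Candidate("Nf3", 10),
             Candidate("g4", -80), Candidate("Qh5", -300)]
    print(pick_move(moves, opp_material_cp=3900, rng=random.Random(1)).move)
```

Passing the random generator in keeps the rule testable and lets an engine seed it per game.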
-
- Posts: 4368
- Joined: Fri Mar 10, 2006 5:23 am
- Location: http://www.arasanchess.org
Re: UCI_Elo
I have not tuned UCI_Elo for Arasan except very approximately. There are some parameters used for strength reduction and I am sure they could be set better than they are. Still, it appears an Elo setting of 1500 is not too far off the mark.
--Jon
-
- Posts: 1056
- Joined: Fri Mar 10, 2006 6:07 am
- Location: Basque Country (Spain)
Re: UCI_Elo
Hiarcs is not in the list, although it was the engine that seemed closest to Kai's formula. Was that an oversight?
Ferdy wrote: ↑Fri Jul 26, 2019 1:02 am
Add new criteria to compare the effect of human move with Elo 1500 and engine move with uci elo 1500.
-
- Posts: 1056
- Joined: Fri Mar 10, 2006 6:07 am
- Location: Basque Country (Spain)
Re: UCI_Elo
I see a problem with Kai's formula for low CCRL ratings (below 800 CCRL, or 1400 FIDE).
I have been able to establish a relationship between FIDE Elo, USCF Elo and Active Elo (the Elo of dedicated chess machines in
https://www.schach-computer.info/wiki/i ... -Elo-Liste ), using formulas from other experts; the conversion table and formulas are given further down.
If we take an engine that plays random moves, rated approximately 250 CCRL, the formula gives more than 1000 FIDE, more than 750 USCF and more than 700 Active, which are the minimums of each list. From the games I have observed, a player rated 1000 FIDE plays much better than an engine making random moves: players at 1000 know the value of the pieces and in most moves do not lose material. Also, an engine like Ram at 500 CCRL will clearly not beat a 1200 FIDE player, and even Alouette at 700 CCRL does not play like a 1300 FIDE player.
For these cases I think the following is a closer approximation:
Elo FIDE = (0.75 x Elo CCRL) + 700 or
Elo FIDE = (0.8 x Elo CCRL) + 560 or even
Elo FIDE = (0.85 x Elo CCRL) + 420
I have applied the three formulas progressively so that, for example, Ram gets a FIDE Elo closer to 850 instead of 1200, a random mover gets 650 instead of 1000, and Alouette plays like an 1100 player instead of 1300.
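The three proposed lines can be compared with the 0.7 x CCRL + 840 conversion listed further down in this post (apparently the Kai formula under discussion) by evaluating them at the ratings mentioned: 250 for a random mover, 500 for Ram and 700 for Alouette. This is only a sketch; the post does not say exactly how the formulas are "applied progressively", so it simply prints all four estimates. A quick check shows that all four lines meet at CCRL 2800, where each gives FIDE 2800, so the proposals only lower the estimates below that point.

```python
def linear_fide_estimate(ccrl, slope, intercept):
    """Linear CCRL -> FIDE estimate: FIDE = slope * CCRL + intercept."""
    return slope * ccrl + intercept


# (slope, intercept): the existing 0.7/840 line, then the three proposed ones.
FORMULAS = [(0.70, 840), (0.75, 700), (0.80, 560), (0.85, 420)]
EXAMPLES = [("random mover", 250), ("Ram", 500), ("Alouette", 700)]

if __name__ == "__main__":
    for name, ccrl in EXAMPLES:
        estimates = [round(linear_fide_estimate(ccrl, s, b)) for s, b in FORMULAS]
        print(f"{name:<13} CCRL {ccrl:>4} -> FIDE {estimates}")
```

For Ram this gives 1190, 1075, 960 and 845, so the steepest line is the one that matches the "about 850" estimate.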
ELO CCRL  ELO FIDE  ELO USCF  ELO ACTIVE   NODES/S     RANDF   RANDF/2
    2600      2660      2733        2665    838861
    2500      2590      2662        2594    419430      0.00      0.00
    2400      2520      2590        2522    209715     12.50      6.25
    2300      2450      2519        2451    104858     25.00     12.50
    2200      2380      2448        2380     52429     37.50     18.75
    2100      2310      2376        2308     26214     50.00     25.00
    2000      2240      2305        2237     15729     75.00     37.50
    1900      2170      2233        2165     13107    100.00     50.00
    1800      2100      2162        2094      9175    125.00     62.50
    1700      2030      2091        2023      5898    150.00     75.00
    1600      1960      2019        1951      3604    183.33     91.67
    1500      1890      1948        1880      2130    216.67    108.33
    1400      1820      1857        1789      1229    250.00    125.00
    1300      1750      1763        1695       696    300.00    150.00
    1200      1680      1670        1602       389    350.00    175.00
    1100      1610      1577        1509       215    400.00    200.00
    1000      1540      1483        1415       200    450.00    225.00
     900      1470      1390        1322       200    650.00    325.00
     800      1400      1297        1229       200    850.00    425.00
     700      1330      1203        1135       100   1050.00    525.00
     600      1260      1110        1042       100   1250.00    625.00
     500      1190      1017         949       100   1450.00    725.00
     400      1120       923         855       100   1850.00    925.00
     300      1050       830         762       100   2250.00   1125.00
     200       980       737         669       100   2650.00   1325.00
     100       910       643         575       100   3050.00   1525.00
       0       840       550         482       100
ELO USCF <--> ELO ACTIVE
Elo USCF = Elo Info-Active + 68
ELO FIDE <--> ELO USCF
Elo USCF = 20 + (1.02×Elo FIDE) if Elo FIDE > 1886
Elo USCF = -570 + (Elo FIDE/0.75) if Elo FIDE ≤ 1886
Elo FIDE = (Elo USCF-20) / 1.02 if Elo USCF > 1945
Elo FIDE = 0.75 * (570 + Elo USCF) if Elo USCF ≤ 1945
ELO FIDE <--> ELO CCRL
Elo FIDE = (0.7 x Elo CCRL) + 840
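The conversion formulas above can be checked against the table by chaining them: CCRL to FIDE with the 0.7/840 line, FIDE to USCF with the piecewise rule, and USCF to Active by subtracting 68. This is a sketch with hypothetical function names; rounding only at the end reproduces the spot-checked rows (for example 1500 CCRL gives 1890 FIDE, 1948 USCF and 1880 Active). The NODES/S and RANDF columns are not derived from these formulas, so they are left out.

```python
def fide_from_ccrl(ccrl):
    return 0.7 * ccrl + 840


def uscf_from_fide(fide):
    # Piecewise rule from the post, split at FIDE 1886.
    return 20 + 1.02 * fide if fide > 1886 else -570 + fide / 0.75


def fide_from_uscf(uscf):
    # Inverse of uscf_from_fide, split at USCF 1945.
    return (uscf - 20) / 1.02 if uscf > 1945 else 0.75 * (570 + uscf)


def active_from_uscf(uscf):
    # "Elo USCF = Elo Info-Active + 68"
    return uscf - 68


def table_row(ccrl):
    """Return (CCRL, FIDE, USCF, Active), rounded to whole Elo points."""
    fide = fide_from_ccrl(ccrl)
    uscf = uscf_from_fide(fide)
    return ccrl, round(fide), round(uscf), round(active_from_uscf(uscf))


if __name__ == "__main__":
    for ccrl in range(2600, -1, -100):
        print(*table_row(ccrl))
```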
-
- Posts: 558
- Joined: Sat Mar 25, 2006 8:27 pm
Re: UCI_Elo
A little bit of a tangent: if you get an engine to frequently match the moves of a 1500 player, is there any reason to believe the engine would also be 1500? If you match 95% of the moves, what is in the other 5%? Probably all the dumb blunders. So I would think you would end up with something that picks 1500-level moves but never blunders, and it would actually be maybe 200 Elo stronger in practice. Or am I missing something?