CCRL, FIDE and Ratings

MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

CCRL, FIDE and Ratings

Post by MikeB »

For the last few months, I have been focused on the UCI_LimitStrength feature, trying to make it more meaningful to the user. Unfortunately, it took me a while to realize that CCRL ratings, as good as they are at the higher end, were very frustrating to work with at the lower levels, as they are totally disconnected from FIDE and other human play rating systems (note, ICCF is not human play ;>) ). A typical example is ShallowBlue, rated 1712 on the CCRL 40/4 list. That's fine for CCRL, but in human play it is much stronger than 1712; in reality, it's not that far from FIDE 2000. So after enhancing UCI_LimitStrength and syncing the ratings to CCRL, I was very disappointed to find that I, an experienced club player who is no master and probably plays OTB at the 1600 to 1700 level, had to set my engine near the low end of the scale to have a shot at winning. On the one hand, it doesn't really matter, but there are other considerations as well that were making human play less desirable.

My mission was to try to fix that. Keep in mind that at the near-2900 Elo levels, CCRL is actually very close to the human rating systems. I did some research, most of which I found here (thank you very much), and came up with an algorithm (directly based on some of Kai's comments) to address the issues I have mentioned. Below are the results of that effort.

Note that Honey now has two rating systems to select from: the initial one, developed in sync with CCRL (more or less), and a second one, which I labeled FIDE, that is more in alignment with human play rating systems. To activate FIDE ratings, simply check the UCI option box labeled "FIDE_Ratings". With the box unchecked, the engine plays more in alignment with CCRL ratings when UCI_LimitStrength is used; with the box checked, the ratings you select will be more in alignment with FIDE and similar human play rating systems.

Where you see "FIDE" below, the "FIDE_Ratings" box was checked; where you see "CCRL", the "FIDE_Ratings" box was unchecked. This table was anchored to ShallowBlue at CCRL 1712. The very lowest ratings are now much lower than indicated, which I'm fine with, since they are designed for "newbies" anyway and everyone should be able to find a level at which they can compete. An average club player can select FIDE 1700 and have a good game; under the CCRL rating system they would be forced to select a CCRL rating of 1250 or so to have a decent game. Around 2700 the ratings merge, and above that the FIDE ratings are slightly stronger than CCRL.

All results below are single core, at 30 seconds + 0.3 second increment, with 64M hash.

Rank Name                Rating   Δ     +    -     #     Σ    Σ%     W    L    D   W%    =%   OppR 
---------------------------------------------------------------------------------------------------------
   1 Honey-FIDE-2900      2901   0.0   21   21   700  431.5  61.6  268  105  327  38.3  46.7  2836 
   2 Honey-CCRL-2900      2838  62.7   25   25   500  193.0  38.6   70  184  246  14.0  49.2  2901 
   3 Honey-FIDE-2800      2828   9.9   15   15  2200 1626.5  73.9 1324  271  605  60.2  27.5  2549 
   4 Honey-CCRL-2800      2781  47.4   15   15  1500  894.5  59.6  611  322  567  40.7  37.8  2718 
   5 Honey-FIDE-2700      2716  65.3   19   19  1000  515.5  51.5  336  305  359  33.6  35.9  2704 
   6 Honey-CCRL-2700      2711   4.8   19   19  1000  518.0  51.8  336  300  364  33.6  36.4  2698 
   7 Honey-CCRL-2600      2615  95.8   14   14  2000  964.5  48.2  655  726  619  32.8  30.9  2626 
   8 Honey-FIDE-2600      2579  36.1   15   15  2000  957.5  47.9  683  768  549  34.1  27.4  2592 
   9 Honey-CCRL-2500      2535  43.8   19   19  1000  482.0  48.2  311  347  342  31.1  34.2  2545 
  10 Honey-FIDE-2500      2485  50.7   19   19  1000  535.5  53.5  387  316  297  38.7  29.7  2458 
  11 Honey-CCRL-2400      2476   9.0   12   12  2500 1407.0  56.3 1067  753  680  42.7  27.2  2433 
  12 Honey-CCRL-2300      2392  83.5   19   19  1000  518.0  51.8  382  346  272  38.2  27.2  2379 
  13 Honey-FIDE-2400      2337  54.8   14   14  2500 1169.0  46.8  889 1051  560  35.6  22.4  2353 
  14 Honey-CCRL-2200      2283  54.1   20   20  1000  279.5  28.0  152  593  255  15.2  25.5  2434 
  15 Honey-FIDE-2300      2173 110.4   20   20  1000  467.0  46.7  361  427  212  36.1  21.2  2194 
  16 Honey-CCRL-2000      2081  92.0   19   19  1500 1152.0  76.8 1050  246  204  70.0  13.6  1862 
  17 Honey-FIDE-2200      2052  29.2   14   14  2500 1301.0  52.0 1135 1033  332  45.4  13.3  2032 
  18 Honey-CCRL-1900      1990  61.6   20   20  1000  534.5  53.4  443  374  183  44.3  18.3  1963 
  20 Honey-FIDE-2100      1917  62.8   21   21  1000  521.0  52.1  446  404  150  44.6  15.0  1901 
  21 Honey-CCRL-1800      1845  72.4   13   13  3000 1671.0  55.7 1479 1137  384  49.3  12.8  1799 
  22 Honey-FIDE-2000      1750  94.9   14   14  3000 1162.5  38.8 1016 1691  293  33.9   9.8  1846 
  23 Honey-CCRL-1712      1734  16.0   14   14  2500 1447.0  57.9 1323  929  248  52.9   9.9  1643 
  24 Shallow Blue 2.0.0   1712  22.3   20   20  1000  470.0  47.0  419  479  102  41.9  10.2  1734 
  25 Honey-CCRL-1700      1706   5.9   15   15  2000  881.5  44.1  760  997  243  38.0  12.2  1750 
  26 Honey-FIDE-1900      1583 122.6   21   21  1000  469.5  46.9  416  477  107  41.6  10.7  1607 
  27 Honey-CCRL-1600      1577   5.9   17   17  2000 1016.5  50.8  931  898  171  46.5   8.6  1536 
  28 Honey-CCRL-1500      1516  61.7   23   23  1500 1144.0  76.3 1111  323   66  74.1   4.4  1208 
  29 Honey-FIDE-1800      1463  52.2   14   14  3000 1891.5  63.0 1776  993  231  59.2   7.7  1345 
  30 Honey-FIDE-1712      1241 222.0   20   20  1500  407.0  27.1  355 1041  104  23.7   6.9  1472 
  31 Honey-FIDE-1700      1218  23.2   16   16  2000  788.0  39.4  708 1132  160  35.4   8.0  1307 
  32 Honey-CCRL-1400      1077 141.5   20   20  2000  821.0  41.0  793 1151   56  39.6   2.8  1203 
  33 Honey-FIDE-1600      1059  18.2   17   17  2000  881.5  44.1  827 1064  109  41.4   5.5  1123 
  34 Honey-FIDE-1500       968  90.3   20   20  1500  542.5  36.2  513  928   59  34.2   3.9  1139 
  35 Honey-CCRL-1300       910  58.0   22   22  1000  461.0  46.1  441  519   40  44.1   4.0   943 
  36 Honey-FIDE-1400       842  68.8   15   15  2000  863.5  43.2  834 1107   59  41.7   3.0   899 
  37 Honey-FIDE-1300       814  27.9   21   21  1000  518.0  51.8  506  470   24  50.6   2.4   799 
  38 Honey-CCRL-1200       809   4.4   15   15  2500 1301.0  52.0 1271 1169   60  50.8   2.4   793 
  39 Honey-FIDE-1200       757  51.9   14   14  2500 1281.0  51.2 1257 1195   48  50.3   1.9   743 
  40 Honey-FIDE-1100       728  29.4   16   16  2000 1318.0  65.9 1309  673   18  65.5   0.9   591 
  41 Honey-CCRL-1100       661  66.5   22   22  1000  475.0  47.5  466  516   18  46.6   1.8   683 
  42 Honey-CCRL-1000       557 104.4   16   16  2000  714.0  35.7  700 1272   28  35.0   1.4   681 
  43 Honey-FIDE-1000       524  33.1   16   16  2000  612.0  30.6  607 1383   10  30.4   0.5   692 
---------------------------------------------------------------------------------------------------------
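As an aid to reading the table above, here is a minimal sketch of the standard logistic Elo relation between a rating difference and an expected score. It is a general formula, not something taken from Honey or from the rating tool that produced the table, which may handle draws and opponent averaging differently, so the Σ% column will not match it exactly. The function names are hypothetical.

```python
import math

def expected_score(rating: float, opp_rating: float) -> float:
    """Expected score (0..1) under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((opp_rating - rating) / 400.0))

def elo_diff_from_score(score: float) -> float:
    """Inverse relation: Elo difference implied by a score strictly between 0 and 1."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)

# Shallow Blue 2.0.0 (1712) against an average opponent rating of 1734,
# both values taken from the table: the model predicts a bit under 50%.
print(round(expected_score(1712, 1734), 3))
```

The table reports 47.0% for that row, so the simple model is in the right neighbourhood there.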
I use PolyGlot, so all of this can be set up in the ini file for each engine, like this:

[PolyGlot]
EngineDir = .
EngineCommand = ./Honey_Elo
EngineName = Honey-FIDE-2800
Resign = true
ResignScore = 1000

[Engine]
Score_Output = Centipawn
SyzygyPath = /Volumes/OWC960/syzygy
SyzygyProbeLimit = 6
UCI_LimitStrength = true
UCI_Elo = 2800
Adaptive_Play = false
Variety = false
FIDE_Ratings = true
Threads = 1
Fast_Play = false
Hash = 16
Ponder = false
OwnBook = false
MultiPV = 1
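Since one ini file is needed per engine level, generating them with a small script can save some typing. The following is only a sketch: the option names and values are copied from the ini above, while the helper names, the output file naming, and the reduced set of options are assumptions made for illustration.

```python
from pathlib import Path

# Reduced template; option names are taken from the ini shown above.
INI_TEMPLATE = """[PolyGlot]
EngineDir = .
EngineCommand = ./Honey_Elo
EngineName = Honey-{system}-{elo}
Resign = true
ResignScore = 1000

[Engine]
UCI_LimitStrength = true
UCI_Elo = {elo}
FIDE_Ratings = {fide}
Threads = 1
Hash = 16
Ponder = false
OwnBook = false
MultiPV = 1
"""

def make_ini(system: str, elo: int) -> str:
    """Return ini text for one level; system is 'FIDE' or 'CCRL'."""
    if system not in ("FIDE", "CCRL"):
        raise ValueError("system must be 'FIDE' or 'CCRL'")
    fide = "true" if system == "FIDE" else "false"
    return INI_TEMPLATE.format(system=system, elo=elo, fide=fide)

def write_inis(levels, directory="."):
    """Write one ini per (system, elo) pair; the file naming is hypothetical."""
    for system, elo in levels:
        path = Path(directory) / f"Honey-{system}-{elo}.ini"
        path.write_text(make_ini(system, elo))

if __name__ == "__main__":
    write_inis([("FIDE", 2800), ("CCRL", 2800)])
```

Any option left out of the template (SyzygyPath, Adaptive_Play and so on) would have to be added back for a real setup.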
With respect to adaptive play, it is now modified so it can be activated without UCI_LimitStrength.

So Honey, going full strength with adaptive play on, can play at Crafty 25.3 level if you're having a good day...

Rank Name              Rating   Δ     +    -     #     Σ    Σ%     W    L    D   W%    =%   OppR 
---------------------------------------------------------------------------------------------------------
   1 Crafty-25.3        1059   0.0   59   59   100   51.5  51.5   40   37   23  40.0  23.0  1049 
   2 Honey-FIDE-3500a   1049   9.1   59   59   100   48.5  48.5   37   40   23  37.0  23.0  1059 
---------------------------------------------------------------------------------------------------------
or as low as CCRL 2300, if you're having a bad day. Unfortunately, this 600/700 Elo point spread did not show up at the lower levels.

Rank Name              Rating   Δ     +    -     #     Σ    Σ%     W    L    D   W%    =%   OppR 
---------------------------------------------------------------------------------------------------------
   1 Honey-FIDE-3500a   1057   0.0   31   31   300  153.0  51.0   98   92  110  32.7  36.7  1051 
   2 Honey-CCRL-2300    1051   6.4   31   31   300  147.0  49.0   92   98  110  30.7  36.7  1057 
---------------------------------------------------------------------------------------------------------
...will be released soon...
BrendanJNorman
Posts: 2526
Joined: Mon Feb 08, 2016 12:43 am
Full name: Brendan J Norman

Re: CCRL, FIDE and Ratings

Post by BrendanJNorman »

MikeB wrote: Thu Oct 03, 2019 5:31 am For the last few months, I have been focused on the UCI_LimitStrength feature, trying to make it more meaningful to the user. Unfortunately, it took me a while to realize that CCRL ratings, as good as they are at the higher end, were very frustrating to work with at the lower levels, as they are totally disconnected from FIDE and other human play rating systems (note, ICCF is not human play ;>) ). A typical example is ShallowBlue, rated 1712 on the CCRL 40/4 list. That's fine for CCRL, but in human play it is much stronger than 1712; in reality, it's not that far from FIDE 2000. So after enhancing UCI_LimitStrength and syncing the ratings to CCRL, I was very disappointed to find that I, an experienced club player who is no master and probably plays OTB at the 1600 to 1700 level, had to set my engine near the low end of the scale to have a shot at winning. On the one hand, it doesn't really matter, but there are other considerations as well that were making human play less desirable.
I've known of this issue for a long time, during my time messing around with Rodent and other engines, trying to weaken them whilst retaining style.

The problem is that, as much as it hurts to admit, we humans are absolutely feeble compared to even weak engines.

And I'm saying this as a strongish player.

We simply make FAR too many tactical mistakes and even weak engines make basically zero.

You have to push Nodes per second basically into the floor to produce tactical mistakes, but sometimes this somehow affects style as well.

It's a tough thing to balance out.

Simulating human play is the most interesting thing for me in computer chess - Elo chasing is only interesting for so long...

Will be following your work, Mike, thanks. :)
MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

Re: CCRL, FIDE and Ratings

Post by MikeB »

For the weaker levels, I had to introduce some randomization. Stockfish is so strong, even with very few nodes (I use a nodes per second approach), that 100 nodes per second is probably 1700 (FIDE, not CCRL) or so at game 3/2. And as Stockfish keeps getting stronger, this floor keeps rising. I would say the floor rises faster (more Elo) than the top end. I don't know what the ratio is, but a 1 or 2 Elo gaining patch at the top end could be 3 or 4 Elo, or probably more, at the floor (the weakest Stockfish will play at 50 or 60 nps). That phenomenon may be partly responsible for the "lower end" of CCRL being so strong when compared to FIDE ratings (engine ratings at the lower levels are more compressed than human ratings, i.e., they cover a wider range of true strength than the ratings show). There isn't any flaw in the CCRL methodology; it's simply "it is what it is". Anyway, I'm pretty excited about this next release, as I think folks will enjoy it.
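To make the two ideas in this post concrete, here is a rough sketch of a nodes per second throttle combined with randomized move choice. It is not Honey's code: the post only says that a nodes per second approach and "some randomization" are used, so the function names, the uniform noise model, and the centipawn units are all assumptions.

```python
import random

def node_budget(nps: int, think_seconds: float) -> int:
    """Nodes the search may visit for one move at a throttled speed (at least 1)."""
    return max(1, int(nps * think_seconds))

def pick_move(scored_moves, noise_cp: float, rng=random):
    """Pick the move whose score is best after adding uniform noise.

    scored_moves: list of (move, score_in_centipawns) pairs.
    noise_cp:     half-width of the noise; 0 means always play the top move.
    """
    def noisy(item):
        _move, score = item
        return score + rng.uniform(-noise_cp, noise_cp)
    return max(scored_moves, key=noisy)[0]

# At 100 nps and 2 seconds of thinking time the search gets 200 nodes.
print(node_budget(100, 2.0))
# With no noise the highest-scored move is always chosen.
print(pick_move([("e4", 30), ("a3", -50)], noise_cp=0))
```

Raising noise_cp makes weaker moves win the comparison more often, which is one simple way to produce mistakes without lowering the node count any further.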
dannyb
Posts: 78
Joined: Mon Jul 09, 2018 6:08 pm
Full name: Daniel Bennett

Re: CCRL, FIDE and Ratings

Post by dannyb »

MikeB wrote: Sat Oct 05, 2019 8:13 am For the weaker levels, I had to introduce some randomization.
...
Do you also disable the quiescence search? Would it be useful for the lower levels to make them tactically weaker?
Uri Blass
Posts: 10300
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: CCRL, FIDE and Ratings

Post by Uri Blass »

BrendanJNorman wrote: Sat Oct 05, 2019 3:31 am
We simply make FAR too many tactical mistakes and even weak engines make basically zero.

Weak engines make a lot of tactical mistakes.

For example, the random mover is a very weak engine that makes tactical mistakes.
There are other weak engines that make tactical mistakes.

Ram is stronger than the random mover but still makes a lot of tactical mistakes.

I proved that I can beat it easily without a queen and 2 rooks.


[pgn][Event "Computer chess game"]
[Site "URIBLASS-THINK"]
[Date "2019.10.05"]
[Round "?"]
[White "UriBlass"]
[Black "Ram"]
[Result "1-0"]
[BlackElo "400"]
[ECO "A05"]
[Opening "Reti Opening"]
[Time "13:07:48"]
[Variation "1...Nf6"]
[WhiteElo "2400"]
[TimeControl "120+12"]
[Termination "normal"]
[PlyCount "117"]
[WhiteType "human"]
[BlackType "program"]

1. Nf3 Nf6 2. Ng1 Ng4 3. Nf3 Ne3 4. Ng1 Nxd1 5. Nf3 Ne3 6. Ng1 Nf5 7. Nf3
Ng3 8. Ng1 Nxh1 9. Nf3 Ng3 10. Ng1 Ne4 11. Nf3 Nc5 12. Ng1 Nb3 13. Nf3 Nxa1
14. Ng1 Nb3 15. Nf3 Nc5 16. Ng1 Ne4 17. Nf3 Nf6 18. Ng1 Ng8 19. e4 e6
{(e7e6) -3.37/7 13} 20. Nc3 Qg5 {(d8g5) -0.48/7 33} 21. Nf3 Bc5 {(f8c5)
+59.47/6 5} 22. Nxg5 Bxf2+ {(c5f2) +37.15/6 7} 23. Kxf2 Nf6 {(g8f6)
-26.63/7 18} 24. h3 Nc6 {(b8c6) +35.88/6 2} 25. a3 Ne5 {(c6e5) +41.71/6 7}
26. d4 h6 {(h7h6) +40.86/6 6} 27. dxe5 hxg5 {(h6g5) -31.85/7 10} 28. exf6
gxf6 {(g7f6) -39.78/7 17} 29. b3 g4 {(g5g4) +30.04/6 2} 30. hxg4 Rh1
{(h8h1) -29.64/7 12} 31. Nb5 Ke7 {(e8e7) -36.55/7 6} 32. Nxc7 b6 {(b7b6)
-38.09/7 10} 33. Nxa8 Bb7 {(c8b7) -32.66/7 7} 34. Nc7 Bxe4 {(b7e4) -35.33/7
16} 35. c4 a5 {(a7a5) -28.18/7 12} 36. Nb5 Kf8 {(e7f8) -29.11/7 9} 37. Nc3
Bg6 {(e4g6) -32.46/7 14} 38. Bb2 Ke8 {(f8e8) -27.72/7 16} 39. Nb5 Kf8
{(e8f8) -28.21/7 3} 40. Bxf6 Kg8 {(f8g8) -31.74/7 7} 41. Nd4 b5 {(b6b5)
-31.09/7 9} 42. cxb5 Be4 {(g6e4) -35.48/7 7} 43. b6 e5 {(e6e5) -33.55/7 9}
44. Bxe5 Bd5 {(e4d5) -35.01/7 14} 45. b4 Be4 {(d5e4) -35.57/7 18} 46. bxa5
Kh7 {(g8h7) -37.42/7 10} 47. a6 d6 {(d7d6) -37.15/7 9} 48. Bxd6 Kg6 {(h7g6)
-39.59/7 11} 49. b7 Bb1 {(e4b1) -48.34/7 12} 50. b8=Q Rxf1+ {(h1f1) -2.24/8
5} 51. Kxf1 Bd3+ {(b1d3) -M460/9 5} 52. Kf2 Kh7 {(g6h7) -13.11/8 10} 53.
Nf5 Bxf5 {(d3f5) -13.24/8 3} 54. gxf5 Kh6 {(h7h6) -M2/12 4} 55. Qg8 Kh5
{(h6h5) -M1/18 0} 56. g4+ Kh4 {(h5h4) -M1/18 0} 57. Bg3+ Kh3 {(h4h3) -M1/18
0} 58. Kf3 f6 {(f7f6) -M0/18 0} 59. Qh8# 1-0
[/pgn]
With only a knight and a bishop, I lost.

[pgn][Event "Computer chess game"]
[Site "URIBLASS-THINK"]
[Date "2019.10.05"]
[Round "?"]
[White "UriBlass"]
[Black "Ram"]
[Result "0-1"]
[BlackElo "400"]
[ECO "A05"]
[Opening "Reti Opening"]
[Time "13:20:39"]
[Variation "1...Nf6"]
[WhiteElo "2400"]
[TimeControl "120+12"]
[Termination "normal"]
[PlyCount "104"]
[WhiteType "human"]
[BlackType "program"]

1. Nf3 Nf6 2. Ng1 Ng4 3. Nf3 Ne3 4. Ng1 Nxd1 5. Nf3 Ne3 6. Ng1 Nf5 7. Nf3
Ng3 8. Ng1 Nxh1 9. Nf3 Ng3 10. Ng1 Ne4 11. Nf3 Nc5 12. Ng1 Nb3 13. Nf3 Nxa1
14. Ng1 Nb3 15. Nf3 Nxc1 16. Ng1 Nb3 17. Nf3 Nc5 18. Ng1 Na4 19. Nf3 Nc3
20. Ng1 Nxb1 21. Nf3 Nc3 22. Ng1 Nd5 23. Nf3 Nf6 24. Ng1 Ng8 25. e4 e5
{(e7e5) +1.51/7 10} 26. Nf3 Qf6 {(d8f6) +59.73/6 7} 27. Bc4 Qxf3 {(f6f3)
+62.08/6 4} 28. gxf3 d5 {(d7d5) -6.96/7 14} 29. exd5 Bd6 {(f8d6) -7.12/7
10} 30. a3 Kd8 {(e8d8) -1.72/7 12} 31. b4 Nf6 {(g8f6) +0.13/7 17} 32. Bb3
Bh3 {(c8h3) +2.56/7 19} 33. c4 e4 {(e5e4) +3.55/7 18} 34. c5 Re8 {(h8e8)
+5.41/7 18} 35. cxd6 exf3+ {(e4f3) +0.33/7 7} 36. Kd1 cxd6 {(c7d6) +3.25/7
5} 37. Bc4 Bf5 {(h3f5) +6.29/7 13} 38. Kc1 Re1+ {(e8e1) +9.21/7 12} 39. Kb2
Rb1+ {(e1b1) +7.77/7 18} 40. Kc3 Rc1+ {(b1c1) +8.08/7 16} 41. Kd4 a5
{(a7a5) +7.10/7 24} 42. b5 Ng4 {(f6g4) +8.15/7 16} 43. a4 b6 {(b7b6)
+9.53/7 8} 44. Bb3 Nxh2 {(g4h2) +9.60/7 7} 45. Ke3 g5 {(g7g5) +10.92/7 3}
46. d3 Nd7 {(b8d7) +M311/8 2} 47. Kd2 Rac8 {(a8c8) +M472/9 13} 48. Bc4
R1xc4 {(c1c4) +M2/15 2} 49. dxc4 Rxc4 {(c8c4) +M2/18 0} 50. Ke3 Nc5 {(d7c5)
+M1/18 0} 51. Kd2 Nb3+ {(c5b3) +M1/18 0} 52. Ke3 Rc3# {(c4c3) +M0/18 0} 0-1
[/pgn]
MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

Re: CCRL, FIDE and Ratings

Post by MikeB »

I would expect that from you, you are a GM. You will probably like my new adaptive play: as it pulls ahead, it will throw you a bone now and then so you can stay even.
BrendanJNorman
Posts: 2526
Joined: Mon Feb 08, 2016 12:43 am
Full name: Brendan J Norman

Re: CCRL, FIDE and Ratings

Post by BrendanJNorman »

Uri Blass wrote: Sat Oct 05, 2019 12:30 pm
Weak engines make a lot of tactical mistakes.

For example, the random mover is a very weak engine that makes tactical mistakes.
There are other weak engines that make tactical mistakes.
A couple things.

1. I refer to engines which at least have a decent playing style. I would instantly disqualify a "random mover" (or such bean-counter types) from this group. So, for example, Schola 1.1 is about 2000 CCRL, has been praised for its style, and for such a 'weak' engine, it is still probably 2600 real world Elo, tactically.

This is just the reality with chess engines, especially at fast time controls. People used to use GNUchess to beat GMs on ICC back in the day... And that's why I'm so interested in creative ways of dumbing them down (what I'd call smartening them up haha!)

2. Weak engines get outsearched, they don't necessarily simulate human-like tactical blunders.

I know that in some way, humans are basically being "outsearched" when they blunder as well, but there are also psychological, emotional etc factors that contribute to human blunders. How interesting would it be to write an algorithm that simulates (and randomizes or something) the moods, positional biases, 'move before you think' bad habits and so on.

THAT'D be real AI to me...it'd be something you can plug into a DGT board and play REAL blitz games as if you're on liChess.
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: CCRL, FIDE and Ratings

Post by Ovyron »

BrendanJNorman wrote: Sun Oct 06, 2019 3:39 am I know that in some way, humans are basically being "outsearched" when they blunder as well, but there are also psychological, emotional etc factors that contribute to human blunders. How interesting would it be to write an algorithm that simulates (and randomizes or something) the moods, positional biases, 'move before you think' bad habits and so on.

THAT'D be real AI to me...it'd be something you can plug into a DGT board and play REAL blitz games as if you're on liChess.
The technology already exists for that. You could train a Neural Network like Leela, but instead of rewarding it for winning games, you'd reward it for how closely it matches the moves of some human. After enough iterations, it'd be indistinguishable from the human, so it would be able to play realistic human blunders (though, as I have said in other threads, you can save iterations by having an adversarial network try to tell the difference between the human's games and those of the NN, and reward the NN for fooling the AN. If Leela becomes very good at playing chess, then this NN would become very good at fooling the AN, and the AN would already be better at distinguishing the NN moves from the human moves than any human, so it should pass the Turing test.)
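One common way to "reward it for how closely it matches the moves of some human" is plain supervised imitation: treat the human's move as the target class and minimise the cross-entropy of the network's move distribution. The sketch below shows only that loss for a single position, in pure Python; it is not Leela's training code, and the function names and the flat list of move logits are assumptions for illustration.

```python
import math

def softmax(logits):
    """Turn raw move scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def imitation_loss(logits, human_move_index):
    """Cross-entropy of the policy against the move the human actually played."""
    probs = softmax(logits)
    return -math.log(probs[human_move_index])

# Four legal moves, the human played move 2 (a blunder, perhaps).
undecided = imitation_loss([0.0, 0.0, 0.0, 0.0], 2)  # policy has no preference
agrees = imitation_loss([0.0, 0.0, 3.0, 0.0], 2)     # policy favours the human's move
print(undecided > agrees)
```

The loss drops as the network puts more probability on whatever the human played, good move or blunder, which is what separates this objective from training for wins.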

At least, this worked for Super Mario Kart (where the AI trying to imitate the human outperformed the NN trying to win races, with a fraction of the effort. Even when the human wasn't an expert in the game.)

My suggestion would be starting with Capablanca, he was the closest player to moves from Rybka, so he'd probably be the easiest one to emulate like this (Paul Morphy would be at the other end. It has been surprisingly easy to make engines play like Tal by just tweaking the settings, but nothing plays like Morphy that I can see. Is it Morphy's Law?)
Your beliefs create your reality, so be careful what you wish for.
carldaman
Posts: 2283
Joined: Sat Jun 02, 2012 2:13 am

Re: CCRL, FIDE and Ratings

Post by carldaman »

Dietrich Kappe may have attempted something like that with his BadGyal and EvilGyal nets, but they eventually became too strong for most people.

https://github.com/dkappe/leela-chess-w ... d-Networks
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: CCRL, FIDE and Ratings

Post by Laskos »

Ovyron wrote: Sun Oct 06, 2019 9:17 am
At least, this worked for Super Mario Kart (where the AI trying to imitate the human outperformed the NN trying to win races, with a fraction of the effort. Even when the human wasn't an expert in the game.)
The goals are different, but is the first one SL and the second one solely a zero-approach RL, or is the latter SL + RL?

My suggestion would be starting with Capablanca, he was the closest player to moves from Rybka, so he'd probably be the easiest one to emulate like this (Paul Morphy would be at the other end. It has been surprisingly easy to make engines play like Tal by just tweaking the settings, but nothing plays like Morphy that I can see. Is it Morphy's Law?)