We are reaching the Max Odds that a top engine can give the top 3 GM's

Uri Blass
Posts: 11175
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: We are reaching the Max Odds that a top engine can give the top 3 GM's

Post by Uri Blass »

lkaufman wrote: Wed May 12, 2021 8:08 pm
Uri Blass wrote: Wed May 12, 2021 7:50 pm Benjamin still has not scored 100% against Dragon, and the question is what minimal level can score 100% when playing against top programs.

Rybka 2.3.2a is a candidate. In my test it won against the latest Stockfish, which for some reason allowed piece trades (maybe because it calculated that the other options were worse, so I do not know whether it was a mistake).

This time I gave Stockfish 7 cores and contempt 100 against 1 core for Rybka.

[pgn][Event "?"]
[Site "?"]
[Date "2021.05.12"]
[Round "?"]
[White "stockfish_21051119_x64_avx2"]
[Black "Rybkav2.3.2a.mp.x64"]
[Result "0-1"]
[FEN "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKB1R w KQkq - 0 1"]
[GameDuration "00:40:10"]
[GameEndTime "2021-05-12T20:34:12.887 Jerusalem Daylight Time"]
[GameStartTime "2021-05-12T19:54:02.631 Jerusalem Daylight Time"]
[PlyCount "136"]
[SetUp "1"]
[TimeControl "40/780"]

1. d4 {-4.95/35 46s} Nf6 {+1.82/17 29s} 2. g3 {-4.92/30 9.6s} d6 {+1.82/16 19s}
3. Bg2 {-4.62/29 8.0s} g6 {+1.85/17 18s} 4. e4 {-4.49/30 28s} Bg7 {+1.85/16 26s}
5. O-O {-4.81/33 9.6s} O-O {+1.91/16 27s} 6. e5 {-4.87/34 48s}
dxe5 {+2.31/18 20s} 7. dxe5 {-5.00/31 12s} Ng4 {+2.36/18 7.1s}
8. f4 {-4.87/29 7.2s} Nc6 {+2.13/17 28s} 9. Na3 {-4.94/35 78s}
Be6 {+2.42/16 6.7s} 10. h3 {-5.09/32 29s} Nh6 {+2.42/15 2.8s}
11. c3 {-5.17/34 40s} Qxd1 {+2.45/15 8.1s} 12. Rxd1 {-5.05/29 4.7s}
a6 {+2.54/15 3.2s} 13. Bd2 {-4.65/33 36s} Rfd8 {+2.62/15 10s}
14. Be3 {-4.87/31 6.8s} Rxd1+ {+2.75/15 16s} 15. Rxd1 {-4.71/32 6.3s}
f6 {+2.89/16 4.3s} 16. exf6 {-4.94/34 11s} exf6 {+2.96/17 12s}
17. b3 {-4.94/36 48s} Nf5 {+3.15/17 3.3s} 18. Bf2 {-4.69/32 7.3s}
Rd8 {+3.17/16 4.1s} 19. Rxd8+ {-4.87/32 9.2s} Nxd8 {+2.91/15 0s}
20. Kh2 {-5.09/36 67s} Nd6 {+3.46/18 7.5s} 21. c4 {-5.04/30 10s}
Bf5 {+3.58/17 20s} 22. c5 {-5.17/30 9.7s} Ne4 {+3.70/18 3.0s}
23. g4 {-4.67/31 7.0s} Nxf2 {+4.65/17 2.9s} 24. gxf5 {-4.86/33 11s}
gxf5 {+4.92/18 16s} 25. Nc2 {-5.43/38 55s} Ne4 {+5.01/17 12s}
26. Nd4 {-5.66/38 42s} Bh6 {+4.48/15 0s} 27. Bf1 {-5.61/39 32s}
Bxf4+ {+5.36/15 11s} 28. Kg2 {-5.94/30 6.9s} Nxc5 {+5.46/16 17s}
29. b4 {-5.94/36 28s} Nce6 {+5.69/15 26s} 30. Bc4 {-6.12/30 3.2s}
Kf7 {+4.62/13 0s} 31. Nxf5 {-6.03/30 3.1s} Kg6 {+5.90/15 13s}
32. Bd3 {-6.29/32 22s} Bd2 {+6.03/16 22s} 33. Bb1 {-6.40/30 11s}
Bxb4 {+6.45/13 17s} 34. Ne3+ {-6.47/32 12s} Kg7 {+6.63/15 44s}
35. Nf5+ {-6.55/31 1.5s} Kf7 {+6.63/14 26s} 36. Kf3 {-6.53/30 8.2s}
Nc6 {+6.68/13 20s} 37. h4 {-6.39/27 3.7s} Ne5+ {+6.85/14 25s}
38. Ke3 {-6.40/23 0.58s} Bc5+ {+6.99/16 35s} 39. Kd2 {-6.77/28 1.9s}
b5 {+7.01/16 28s} 40. a4 {-6.54/19 0.39s} bxa4 {+7.70/13 20s}
41. h5 {-10.34/38 78s} a3 {+7.94/15 16s} 42. Ba2 {-10.48/39 12s}
Nf3+ {+8.28/13 4.9s} 43. Kc3 {-12.23/40 115s} Nd4 {+8.82/17 37s}
44. Nh6+ {-13.40/40 127s} Kg7 {+8.62/13 3.8s} 45. Ng4 {-13.54/29 7.7s}
Nf4 {+8.82/17 41s} 46. h6+ {-12.24/33 28s} Kg6 {+9.11/17 26s}
47. Kc4 {-13.37/32 21s} Ba7 {+9.08/15 13s} 48. Kc3 {-12.69/36 31s}
Nf3 {+9.99/14 9.8s} 49. Kb4 {-21.85/31 87s} f5 {+11.08/16 9.7s}
50. Nh2 {-24.32/30 67s} Nxh2 {+11.65/14 7.9s} 51. Bb3 {-29.63/28 52s}
Nh5 {+12.53/14 14s} 52. Kxa3 {-73.20/35 22s} f4 {+12.50/13 7.6s}
53. Kb4 {-113.70/38 13s} f3 {+13.12/14 14s} 54. Bc4 {-M32/69 8.4s}
f2 {+12.93/11 3.9s} 55. Be2 {-M30/59 2.7s} c5+ {+16.50/14 25s}
56. Kc3 {-M26/64 2.8s} f1=Q {+19.80/14 10s} 57. Bd3+ {-M26/58 3.0s}
Qxd3+ {+14.82/12 41s} 58. Kxd3 {-M24/59 2.9s} Nf4+ {+22.94/14 28s}
59. Kd2 {-M22/58 3.4s} Nf1+ {+M35/13 6.4s} 60. Kc2 {-M18/65 7.0s}
c4 {+M25/12 1.1s} 61. Kb1 {-M16/67 2.2s} Bd4 {+M19/7 0.028s}
62. Kc2 {-M14/82 3.2s} Ne3+ {+M17/4 0.004s} 63. Kc1 {-M12/132 3.1s}
c3 {+M11/3 0s} 64. Kb1 {-M10/212 0.50s} c2+ {+M9/3 0.001s}
65. Kc1 {-M8/245 0.049s} Ne2+ {+M7/3 0.001s} 66. Kd2 {-M6/245 0.008s}
c1=Q+ {+M5/3 0s} 67. Kxe2 {-M4/245 0.006s} Qf1+ {+M3/3 0.001s}
68. Kd2 {-M2/245 0.003s} Qd1# {+M1/3 0.001s, Black mates} 0-1

[/pgn]
Asking for 100% wins is a bit unreasonable in my opinion. Even Dragon vs Dragon or Stockfish vs Stockfish might draw one game in ten thousand or so at knight odds due to failure to recognize obscure fortress draws for example, and who has computer time to devote to play 10,000 Rapid games like this? For me the interesting question is the point at which the weaker engine crosses 50% outright wins, meaning it would win a match at knight odds with Armageddon scoring of draws counting as wins for the odds-giver. Benjamin is already above that point in Rapid games with Dragon 2 on one thread, it's about even that way if Dragon uses four threads.
I doubt that top programs would draw even one game in ten thousand at knight odds.
Failing to recognize a fortress is not enough; the engine must also fail to find something better. If the engine evaluates the fortress as +3 and another line as more than +3, it is not going to fall into the fortress. In practice, though, you do not need to play 10,000 rapid games; even a score like 100-0 may be interesting.
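As a rough illustration of how much a 100-0 score can show: if the odds-receiver wins all n games, the exact one-sided 95% upper bound on the per-game chance of anything other than a win is 1 - 0.05^(1/n), which is close to the familiar "rule of three" value 3/n. The sketch below is in Python and assumes the games are independent with a fixed outcome probability, which is an idealization; the function name is made up for this example.

```python
def upper_bound_non_win(n_games: int, confidence: float = 0.95) -> float:
    """Exact one-sided upper bound on p(non-win) after n_games straight wins.

    Solves (1 - p) ** n_games = 1 - confidence for p.
    Assumes independent games with a fixed per-game probability.
    """
    if n_games <= 0:
        raise ValueError("n_games must be positive")
    return 1.0 - (1.0 - confidence) ** (1.0 / n_games)


# Compare the exact bound with the rule-of-three approximation 3/n.
for n in (100, 1000, 10000):
    print(n, round(upper_bound_non_win(n), 5), "rule of three:", 3 / n)
```

Under these assumptions a 100-0 result only shows that the non-win rate is probably below about 3%. Telling a 1-in-10,000 draw rate apart from zero would still take tens of thousands of games.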
lkaufman
Posts: 6297
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: We are reaching the Max Odds that a top engine can give the top 3 GM's

Post by lkaufman »

Uri Blass wrote: Thu May 13, 2021 7:53 am
lkaufman wrote: Wed May 12, 2021 8:08 pm
Uri Blass wrote: Wed May 12, 2021 7:50 pm Benjamin still has not scored 100% against Dragon, and the question is what minimal level can score 100% when playing against top programs.

Rybka 2.3.2a is a candidate. In my test it won against the latest Stockfish, which for some reason allowed piece trades (maybe because it calculated that the other options were worse, so I do not know whether it was a mistake).

This time I gave Stockfish 7 cores and contempt 100 against 1 core for Rybka.

Asking for 100% wins is a bit unreasonable in my opinion. Even Dragon vs Dragon or Stockfish vs Stockfish might draw one game in ten thousand or so at knight odds due to failure to recognize obscure fortress draws for example, and who has computer time to devote to play 10,000 Rapid games like this? For me the interesting question is the point at which the weaker engine crosses 50% outright wins, meaning it would win a match at knight odds with Armageddon scoring of draws counting as wins for the odds-giver. Benjamin is already above that point in Rapid games with Dragon 2 on one thread, it's about even that way if Dragon uses four threads.
I doubt that top programs would draw even one game in ten thousand at knight odds.
Failing to recognize a fortress is not enough; the engine must also fail to find something better. If the engine evaluates the fortress as +3 and another line as more than +3, it is not going to fall into the fortress. In practice, though, you do not need to play 10,000 rapid games; even a score like 100-0 may be interesting.
Even to win 100 to 0 you would need a very strong engine, maybe within 300 elo or so of the odds-giver. A lot depends on whether the weaker engine knows to avoid closing the position too much, when even with an extra knight it will be very difficult to win.
Komodo rules!
chrisw
Posts: 4835
Joined: Tue Apr 03, 2012 4:28 pm
Location: Midi-Pyrénées
Full name: Christopher Whittington

Re: We are reaching the Max Odds that a top engine can give the top 3 GM's

Post by chrisw »

lkaufman wrote: Thu May 13, 2021 2:49 pm
Uri Blass wrote: Thu May 13, 2021 7:53 am
lkaufman wrote: Wed May 12, 2021 8:08 pm
Uri Blass wrote: Wed May 12, 2021 7:50 pm Benjamin still has not scored 100% against Dragon, and the question is what minimal level can score 100% when playing against top programs.

Rybka 2.3.2a is a candidate. In my test it won against the latest Stockfish, which for some reason allowed piece trades (maybe because it calculated that the other options were worse, so I do not know whether it was a mistake).

This time I gave Stockfish 7 cores and contempt 100 against 1 core for Rybka.

Asking for 100% wins is a bit unreasonable in my opinion. Even Dragon vs Dragon or Stockfish vs Stockfish might draw one game in ten thousand or so at knight odds due to failure to recognize obscure fortress draws for example, and who has computer time to devote to play 10,000 Rapid games like this? For me the interesting question is the point at which the weaker engine crosses 50% outright wins, meaning it would win a match at knight odds with Armageddon scoring of draws counting as wins for the odds-giver. Benjamin is already above that point in Rapid games with Dragon 2 on one thread, it's about even that way if Dragon uses four threads.
I doubt that top programs would draw even one game in ten thousand at knight odds.
Failing to recognize a fortress is not enough; the engine must also fail to find something better. If the engine evaluates the fortress as +3 and another line as more than +3, it is not going to fall into the fortress. In practice, though, you do not need to play 10,000 rapid games; even a score like 100-0 may be interesting.
Even to win 100 to 0 you would need a very strong engine, maybe within 300 elo or so of the odds-giver. A lot depends on whether the weaker engine knows to avoid closing the position too much, when even with an extra knight it will be very difficult to win.
Probably you already thought of the question in the reverse direction: what odds could an (imaginary) very strong engine never be able to give to a top player? An upper bound on Elo then becomes Elo(strong player) + Elo(value of odds material).

Just fooling around on my iPhone on the train, it’s pretty much a struggle for me to give knight odds to Stockfish. I tried queen odds the other day and SF doesn’t have much chance, if at all - somehow I doubt even a 32man egtb chess monster could give queen odds. Maybe rook odds, although that’s doubtful.
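The upper-bound idea can be written out as a one-line calculation. This is only a sketch in Python: the function name is invented here, and the numbers in the example call are placeholders, not measurements from this thread.

```python
def elo_upper_bound(top_player_elo: float, odds_value_elo: float) -> float:
    """Upper bound on any engine's Elo, given odds it could never give.

    If no engine can ever give a certain handicap to a player, then no
    engine can be rated above that player's Elo plus the Elo value of
    the handicap.
    """
    return top_player_elo + odds_value_elo


# Placeholder inputs for illustration only (both values are assumptions).
print(elo_upper_bound(top_player_elo=2800, odds_value_elo=1000))
```

The hard part is the second argument: the Elo value of a given handicap has to be measured, and it probably depends on the level of the odds-receiver.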
lkaufman
Posts: 6297
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: We are reaching the Max Odds that a top engine can give the top 3 GM's

Post by lkaufman »

chrisw wrote: Thu May 13, 2021 6:12 pm
Probably you already thought of the question in the reverse direction: what odds could an (imaginary) very strong engine never be able to give to a top player? An upper bound on Elo then becomes Elo(strong player) + Elo(value of odds material).

Just fooling around on my iPhone on the train, it’s pretty much a struggle for me to give knight odds to Stockfish. I tried queen odds the other day and SF doesn’t have much chance, if at all - somehow I doubt even a 32man egtb chess monster could give queen odds. Maybe rook odds, although that’s doubtful.
I think you meant to say that it's a struggle for you to beat Stockfish when it gives you knight odds, but that you can win easily if it gives you queen odds. I'm guessing from this that you are a reasonably strong player, perhaps somewhere in the general ballpark of 2000 Elo or so? It's not easy to beat a top engine with knight odds unless you are a pretty good player, although of course it depends greatly on the time limit, the hardware, and the specific engine. I think that recent Lc0 and Dragon are both much better than Stockfish at giving piece odds to humans. As for the upper limit of handicap against the best human (at say 15' + 10" rapid), I used to believe that knight odds would be forever impossible. I no longer believe that, although it could still well be true. It's pretty hard to imagine any future engine that could avoid defeat at rook odds vs. the top human, but it's certainly possible. Beyond that, say two minor pieces or queen for knight odds, I would say it's impossible; it's just too easy to win. A 32-man EGTB is far from the best program for this purpose; you need one that estimates the probability of the human opponent making serious mistakes and plays accordingly. MCTS engines make some effort to do this.
Thanks again for your knight odds opening book, which I'm using (trimmed) for these tests. They may be quite useful for improving play giving odds to humans, as at least the program "Benjamin" seems to play the piece-up side very well, much closer to how a top human might play it than most other engines. I'm really having a tough time getting Dragon to be able to win a knight odds match from Benjamin even with shorter time limits and four threads for Dragon. Perhaps there are other programs that are also good at this, but except for NNs I don't know any. Benjamin seems to be willing to give back a pawn at times for piece activity, which is probably a good strategy.
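To make the "estimate the probability of mistakes and play accordingly" idea concrete, here is a toy sketch in Python. It is not how Dragon, Lc0, or any MCTS engine actually works; the class, the field names, and all the numbers are hypothetical. Each candidate move gets an expected score weighted by the estimated chance that this particular opponent goes wrong.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    move: str
    score_if_best_reply: float  # expected score (0..1) if the opponent replies correctly
    score_if_mistake: float     # expected score (0..1) if the opponent errs
    p_mistake: float            # estimated chance that this opponent errs


def practical_value(c: Candidate) -> float:
    """Expected score against a fallible opponent."""
    return c.p_mistake * c.score_if_mistake + (1.0 - c.p_mistake) * c.score_if_best_reply


def pick_move(candidates: list[Candidate]) -> Candidate:
    """Choose the move with the highest practical value."""
    return max(candidates, key=practical_value)


# Made-up numbers: the "trappy" move is worse against perfect play
# but sets problems the opponent may fail to solve.
candidates = [
    Candidate("solid", score_if_best_reply=0.10, score_if_mistake=0.15, p_mistake=0.50),
    Candidate("trappy", score_if_best_reply=0.05, score_if_mistake=0.60, p_mistake=0.30),
]
print(pick_move(candidates).move)
```

A plain minimax search would look only at `score_if_best_reply` and prefer the solid move. The weighting by `p_mistake` is what lets the odds-giver gamble on lines the opponent is likely to misplay.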
Komodo rules!
towforce
Posts: 12921
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK
Full name: Graham Laight

Re: We are reaching the Max Odds that a top engine can give the top 3 GM's

Post by towforce »

lkaufman wrote: Thu May 13, 2021 7:33 pm I think you meant to say that it's a struggle for you to beat Stockfish when it gives you knight odds...

Here a hand-written evaluation function (EF) could be better than one based on machine learning. When the situation is:

1. significantly losing

2. opponent is human

You then want your EF to value complexity over normal positional factors: the more the human brain gets overwhelmed, the less likely it is to correctly account for all the information in the position.
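A minimal sketch of such an evaluation wrapper, in Python. Everything here is hypothetical: the function name, the threshold, the weight, and above all the `complexity` input, which stands for whatever measure of "how hard is this position for a human" one manages to define.

```python
def practical_eval(
    static_eval_cp: int,
    complexity: float,
    opponent_is_human: bool,
    losing_threshold_cp: int = -200,   # assumed cutoff for "significantly losing"
    complexity_weight_cp: int = 30,    # assumed bonus per unit of complexity
) -> float:
    """Return an evaluation in centipawns from the engine's point of view.

    When the engine is clearly losing against a human, add a bonus for
    complex positions; otherwise return the normal evaluation unchanged.
    """
    if opponent_is_human and static_eval_cp <= losing_threshold_cp:
        return static_eval_cp + complexity_weight_cp * complexity
    return static_eval_cp


# A knight down (-300) but in a messy position vs. the same score in a simple one.
print(practical_eval(-300, complexity=5.0, opponent_is_human=True))
print(practical_eval(-300, complexity=0.5, opponent_is_human=True))
```

With both conditions from the list met, the messy position scores higher than the simple one even though the normal evaluation is identical, so the engine would steer towards it.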
Human chess is partly about tactics and strategy, but mostly about memory
lkaufman
Posts: 6297
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: We are reaching the Max Odds that a top engine can give the top 3 GM's

Post by lkaufman »

towforce wrote: Thu May 13, 2021 10:24 pm
lkaufman wrote: Thu May 13, 2021 7:33 pm I think you meant to say that it's a struggle for you to beat Stockfish when it gives you knight odds...

Here a hand-written evaluation function (EF) could be better than one based on machine learning. When the situation is:

1. significantly losing

2. opponent is human

You then want your EF to value complexity over normal positional factors: the more the human brain gets overwhelmed, the less likely it is to correctly account for all the information in the position.
True, but if the NNs are tuned by low-depth games, they already simulate top level human play to the extent that the low depth matches what humans can do in serious games. Because of this, I'm inclined to think that modifying the search is more promising for this purpose than replacing an eval using nets tuned by fairly short searches.
Komodo rules!
towforce
Posts: 12921
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK
Full name: Graham Laight

Re: We are reaching the Max Odds that a top engine can give the top 3 GM's

Post by towforce »

lkaufman wrote: Fri May 14, 2021 12:05 am
towforce wrote: Thu May 13, 2021 10:24 pm
lkaufman wrote: Thu May 13, 2021 7:33 pm I think you meant to say that it's a struggle for you to beat Stockfish when it gives you knight odds...

Here a hand-written evaluation function (EF) could be better than one based on machine learning. When the situation is:

1. significantly losing

2. opponent is human

You then want your EF to value complexity over normal positional factors: the more the human brain gets overwhelmed, the less likely it is to correctly account for all the information in the position.
True, but if the NNs are tuned by low-depth games, they already simulate top level human play to the extent that the low depth matches what humans can do in serious games. Because of this, I'm inclined to think that modifying the search is more promising for this purpose than replacing an eval using nets tuned by fairly short searches.

I think you might be overestimating NN learning: you know how, on the way to becoming a GM, you had many tremendously big thinking exercises about various positions? The NNs never did that. You know how the NNs trained on billions of positions? You never did that - you have only seen a tiny fraction of the positions the NN has seen.

You know how, when presented with a chess position, you have a good idea about how you should be thinking about it? The NN doesn't have that.

Remember how, in the 1970s, David Levy wrote about how to play chess computers, and one of his recommendations was to play for the endgame, where computers were weak? Lc0 is still said to be weak in the endgame.

NN-based chess software still relies heavily on a big search, whereas the human ability to build a game tree is almost non-existent.

In other threads, I have postulated that chess NNs are learning a large amount of shallow knowledge (a lot of simple patterns) but not much deep knowledge (complicated patterns), and this explains some otherwise mysterious weaknesses in their evaluations without the benefit of a big search.

I have now come to think that this is likely to be true, and for me it resolves an even bigger mystery: why is it so difficult for ML systems to drive cars well on public roads when many humans can do it so easily? That question has been nagging me for years! This insight, if it is correct, would provide a simple and satisfying answer: you actually need complex patterns to drive a car well, and humans have picked these up during their lives prior to driving, whereas ML systems are only picking up the surface (shallow/simple) patterns.

Back to topic: the situation is that the system is a knight down, and the opponent is human. Now you need an evaluation function or an NN that's been trained to love complexity in positions over "most winning", because that's where you'll get a chance to bring down a human in this situation. For me, that's likely to come from the evaluation: you could argue that the search could prune branches where the position is too simple - but that would be a kind of evaluation in itself.
Human chess is partly about tactics and strategy, but mostly about memory
Chessqueen
Posts: 5685
Joined: Wed Sep 05, 2018 2:16 am
Location: Moving
Full name: Jorge Picado

Re: We are reaching the Max Odds that a top engine can give the top 3 GM's

Post by Chessqueen »

lkaufman wrote: Sun May 09, 2021 8:29 pm
Chessqueen wrote: Sun May 09, 2021 12:50 am Very soon either Stockfish or Komodo will be able to give knight odds at a TC of 2'+1" to the top 3 GMs, but we are really witnessing the maximum odds that any engine will be able to give to the top 3 GMs, since none of us will ever see an engine giving rook odds at a TC of 2'+1".
The real challenge here is for an engine to give knight odds in Rapid (15' + 10") to GMs, at least with Armageddon rule that White (the odds-giver) wins draws. I think this is already possible with low-rated GMs (low 2400s). So the goal can be to gradually increase the rating of the GM to whom an engine can win say a five game Armageddon knight odds match. People expect blunders in blitz, but 15' + 10" is now the most popular time control for Elite events, especially online, so this is supposed to be a serious time limit where strong players should not make too many simple blunders. Whether an engine will ever be able to beat the world's best Rapid player this way is unknown, we are far from that point now.
Deep Blue, with its capability of evaluating 200 million positions per second, was the first computer to beat a reigning World Chess Champion in a match (1997).
So we now have commercial computers that are at least 24 times faster than Deep Blue, and it is still hard for the latest engine to give knight odds at TC 15'+10" to the top three players.
Ryzen 5000 CPU Bench 4.801.341.606 128 cpu's x32 threads Cluster System 4096threads Stockfish pop vondele
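The "24 times" figure appears to come from dividing the benchmark number in the signature line of this post by Deep Blue's speed. A quick check in Python, assuming that "4.801.341.606" is a nodes-per-second count written with dots as thousands separators (that reading is an assumption):

```python
deep_blue_nps = 200_000_000    # positions per second, as stated in the post
cluster_nps = 4_801_341_606    # assumed reading of "Bench 4.801.341.606"

ratio = cluster_nps / deep_blue_nps
print(f"{ratio:.1f}x Deep Blue's raw speed")
```

If that reading is right, the number describes a large cluster rather than a single commercial computer, and node counts from different programs are not directly comparable, so the ratio is only a rough guide.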
lkaufman
Posts: 6297
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: We are reaching the Max Odds that a top engine can give the top 3 GM's

Post by lkaufman »

Chessqueen wrote: Fri May 14, 2021 2:35 am
lkaufman wrote: Sun May 09, 2021 8:29 pm
Chessqueen wrote: Sun May 09, 2021 12:50 am Very soon either Stockfish or Komodo will be able to give knight odds at a TC of 2'+1" to the top 3 GMs, but we are really witnessing the maximum odds that any engine will be able to give to the top 3 GMs, since none of us will ever see an engine giving rook odds at a TC of 2'+1".
The real challenge here is for an engine to give knight odds in Rapid (15' + 10") to GMs, at least with Armageddon rule that White (the odds-giver) wins draws. I think this is already possible with low-rated GMs (low 2400s). So the goal can be to gradually increase the rating of the GM to whom an engine can win say a five game Armageddon knight odds match. People expect blunders in blitz, but 15' + 10" is now the most popular time control for Elite events, especially online, so this is supposed to be a serious time limit where strong players should not make too many simple blunders. Whether an engine will ever be able to beat the world's best Rapid player this way is unknown, we are far from that point now.
Deep Blue, with its capability of evaluating 200 million positions per second, was the first computer to beat a reigning World Chess Champion in a match (1997).
So we now have commercial computers that are at least 24 times faster than Deep Blue, and it is still hard for the latest engine to give knight odds at TC 15'+10" to the top three players.
Ryzen 5000 CPU Bench 4.801.341.606 128 cpu's x32 threads Cluster System 4096threads Stockfish pop vondele
Knight odds is roughly a thousand Elo handicap when the odds-receiver is around GM level, as best I can measure it. I believe that today's top three engines, running on a system like that (with good GPUs for engines that need them) could beat Deep Blue in a standard chess match giving something like 1000 to 1 time odds (like 8 hours to half a minute for the game) or even more, but this is still not a thousand Elo with equal time. I think that Komodo Dragon or recent Lc0 could indeed give Deep Blue knight odds at some reasonably quick time control (maybe Rapid), but they are far from being able to give Kasparov knight odds in Rapid. My tests between Dragon and "Benjamin" have convinced me that more speed or cores isn't the solution; I get similar results in these knight odds matches whether Dragon uses 1, 4, or 10 threads. I think that's because even on one thread, Dragon searches about ten plies deeper than Benjamin, and looking deeper just makes Dragon fear lines that Benjamin would never even find. The key is to find some ways, whether by search or eval, to anticipate what a weaker opponent might overlook.
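For scale, the two numbers in this post can be worked through in a few lines of Python. The Elo expected-score formula used is the standard logistic one; treating a handicap as a plain rating difference is a simplification, and the variable names are just for this example.

```python
import math


def expected_score(elo_diff: float) -> float:
    """Standard Elo expected score for a player rated elo_diff above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


# A 1000 Elo deficit: what the odds-giver starts with at knight odds, per the estimate above.
print(f"expected score at -1000 Elo: {expected_score(-1000):.4f}")

# The time-odds example: 8 hours against half a minute.
long_side_s = 8 * 3600
short_side_s = 30
time_ratio = long_side_s / short_side_s
doublings = math.log2(time_ratio)
print(f"time ratio {time_ratio:.0f}:1, about {doublings:.1f} doublings of thinking time")
```

A 1000-point deficit corresponds to an expected score of about a third of a percent, and 8 hours against 30 seconds is a ratio of 960 to 1, or just under ten doublings of thinking time.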
Komodo rules!
chrisw
Posts: 4835
Joined: Tue Apr 03, 2012 4:28 pm
Location: Midi-Pyrénées
Full name: Christopher Whittington

Re: We are reaching the Max Odds that a top engine can give the top 3 GM's

Post by chrisw »

lkaufman wrote: Fri May 14, 2021 5:50 am
Chessqueen wrote: Fri May 14, 2021 2:35 am
lkaufman wrote: Sun May 09, 2021 8:29 pm
Chessqueen wrote: Sun May 09, 2021 12:50 am Very soon either Stockfish or Komodo will be able to give knight odds at a TC of 2'+1" to the top 3 GMs, but we are really witnessing the maximum odds that any engine will be able to give to the top 3 GMs, since none of us will ever see an engine giving rook odds at a TC of 2'+1".
The real challenge here is for an engine to give knight odds in Rapid (15' + 10") to GMs, at least with Armageddon rule that White (the odds-giver) wins draws. I think this is already possible with low-rated GMs (low 2400s). So the goal can be to gradually increase the rating of the GM to whom an engine can win say a five game Armageddon knight odds match. People expect blunders in blitz, but 15' + 10" is now the most popular time control for Elite events, especially online, so this is supposed to be a serious time limit where strong players should not make too many simple blunders. Whether an engine will ever be able to beat the world's best Rapid player this way is unknown, we are far from that point now.
Deep Blue, with its capability of evaluating 200 million positions per second, was the first computer to beat a reigning World Chess Champion in a match (1997).
So we now have commercial computers that are at least 24 times faster than Deep Blue, and it is still hard for the latest engine to give knight odds at TC 15'+10" to the top three players.
Ryzen 5000 CPU Bench 4.801.341.606 128 cpu's x32 threads Cluster System 4096threads Stockfish pop vondele
Knight odds is roughly a thousand Elo handicap when the odds-receiver is around GM level, as best I can measure it. I believe that today's top three engines, running on a system like that (with good GPUs for engines that need them) could beat Deep Blue in a standard chess match giving something like 1000 to 1 time odds (like 8 hours to half a minute for the game) or even more, but this is still not a thousand Elo with equal time. I think that Komodo Dragon or recent Lc0 could indeed give Deep Blue knight odds at some reasonably quick time control (maybe Rapid), but they are far from being able to give Kasparov knight odds in Rapid. My tests between Dragon and "Benjamin" have convinced me that more speed or cores isn't the solution; I get similar results in these knight odds matches whether Dragon uses 1, 4, or 10 threads. I think that's because even on one thread, Dragon searches about ten plies deeper than Benjamin, and looking deeper just makes Dragon fear lines that Benjamin would never even find. The key is to find some ways, whether by search or eval, to anticipate what a weaker opponent might overlook.
Took up the challenge: here’s a knight odds iPhone quickie over breakfast coffee. Boring but relatively solid play (from me), although my intended knight manoeuvre to c4 ended with SF pressure and some difficulties with the bishop. Trading down solved the problem.
No real attempt at time control; SF was moving within a few seconds, me likewise.

[pgn] [Event "?"]
[Site "?"]
[Date "2021.05.14"]
[Round "?"]
[White "Stockfish"]
[Black "Me"]
[Result "0-1"]
[FEN "rnbqkbnr/pppppppp/8/8/3P4/8/PPP1PPPP/R1BQKBNR b KQkq - 0 1"]

1... Nf6 2. Nf3 g6 3. Bg5 Bg7 4. e3 d6 5. Bd3 O-O 6. c3 c5 7. b4 cxb4 8. cxb4
a6 9. O-O Nc6 10. a3 b5 11. Rc1 Bb7 12. e4 e5 13. d5 Nd4 14. Nxd4 exd4 15. f3
Qd7 16. Rf2 Rfc8 17. Ra2 Rxc1 18. Qxc1 Rc8 19. Qf1 Ne8 20. Ra1 Nc7 21. g4 Na8
22. a4 bxa4 23. b5 axb5 24. Bxb5 Qc7 25. Rxa4 Nb6 26. Ra7 Nd7 27. Bxd7 Qxd7 28.
Qb1 Rc7 29. Bd2 Qc8 30. h3 h6 31. Qd3 Qb8 32. Ra2 g5 33. Kf2 Bc8 34. Be1 Bd7
35. Qd1 Ra7 36. Rxa7 Qxa7 37. Kg2 Bb5 38. Qd2 Qa4 39. Qf2 Qc4 40. Ba5 Be5 41.
Bd2 Qc2 42. Bb4 Qxf2+ 43. Kxf2 Bf4 44. Ke1 Kg7 45. Ba3 Kf6 46. Bb4 Ke5 47. Kf2
Be3+ 48. Kg3 d3 49. Bc3+ Bd4 50. Bd2 Bb2 51. Kf2 Kd4 52. Be3+ Kc3 53. f4 gxf4
54. Bxf4 d2 55. Bxd2+ Kxd2 56. h4 f6 57. Kg3 Ke3 58. g5 fxg5 59. hxg5 hxg5 60.
Kg4 Bf6 61. e5 dxe5 62. Kf5 g4 63. Kxf6 g3 64. d6 g2 65. d7 Bxd7 66. Ke7 Bh3
67. Kd6 g1=Q 68. Kd5 Qa1 69. Kc5 Qd4+ 70. Kc6 Bg2+ 71. Kc7 Qd5 72. Kb6 Qc6+ 73.
Ka5 Kd4 74. Kb4 Qc5+ 75. Kb3 Qc3+ 76. Ka2 Bd5+ 77. Kb1 Qd2 78. Ka1 Qa2#
0-1
[/pgn]