Stockfish Handicap Matches

lkaufman · Post by **lkaufman** » Tue Jun 23, 2020 8:53 pm

These results are really quite incredible. If I interpret everything properly, they are saying that at the 40/10 level Stockfish 11 broke even with Arasan 22, which is rated 3141 on the CCRL 40/15 list on one thread, while giving knight odds with these randomized openings? Even at longer time controls, Arasan's winning margin is not that impressive. I'm sure that Arasan and all of the other engines playing with the extra piece will score higher when you use the new dataset based on CCRL openings rather than random ones, but even so it is really ridiculous to think that an engine rated nearly 300 elo above Carlsen (and which would probably be rated higher in actual FIDE rapid) can only break even with an extra piece. I've commented on this phenomenon before, but this is a much more extreme example of it than I've ever seen. Am I missing something here?

chrisw · Post by **chrisw** » Tue Jun 23, 2020 9:24 pm

Rebel wrote: ↑Tue Jun 23, 2020 3:20 pm Stockfish 11 gauntlet with Chris's old knight-odds epd.

First match, tc=40/10

   # ENGINE          : RATING    POINTS  PLAYED    (%)
   1 Komodo_14       : 3477.9      94.5     100   94.5%
   2 Houdini_6.03    : 3283.6      85.0     100   85.0%
   3 Laser_1.7       : 3251.3      82.5     100   82.5%
   4 rofChade_2.3    : 3228.0      80.5     100   80.5%
   5 Arasan_22       : 2979.6      50.0     100   50.0%
   6 Stockfish_11    : 2979.6     107.5     500   21.5%

Second match, tc=40/20

   # ENGINE          : RATING    POINTS  PLAYED    (%)
   1 Komodo_14       : 3515.8      96.5     100   96.5%
   2 Houdini_6.03    : 3340.0      91.0     100   91.0%
   3 rofChade_2.3    : 3252.7      86.0     100   86.0%
   4 Laser_1.7       : 3194.4      81.5     100   81.5%
   5 Arasan_22       : 2962.6      54.0     100   54.0%
   6 Stockfish_11    : 2934.5      91.0     500   18.2%

Third match, tc=40/40, included oldies.

   # ENGINE          : RATING    POINTS  PLAYED    (%)
   1 Komodo_14       : 3743.1      97.5     100   97.5%
   2 Houdini_6.03    : 3568.3      93.5     100   93.5%
   3 rofChade_2.3    : 3398.3      84.5     100   84.5%
   4 Laser_1.7       : 3293.6      75.0     100   75.0%
   5 Arasan_22       : 3146.9      56.5     100   56.5%
   6 Stockfish_11    : 3101.1     334.0     800   41.8%
   7 ProDeo          : 2899.1      24.0     100   24.0%
   8 Benjamin        : 2874.1      21.5     100   21.5%
   9 Fruit_2.1       : 2775.6      13.5     100   13.5%

Fourth match, tc=40/80

   # ENGINE          : RATING    POINTS  PLAYED    (%)
   1 Komodo_14     > : 3655.4     100.0     100  100.0%
   2 Houdini_6.03    : 3282.1      96.0     100   96.0%
   3 rofChade_2.3    : 3066.2      87.5     100   87.5%
   4 Laser_1.7       : 2996.9      82.5     100   82.5%
   5 Arasan_22       : 2814.7      62.5     100   62.5%
   6 Stockfish_11    : 2725.1     303.5     800   37.9%
   7 ProDeo          : 2564.0      28.5     100   28.5%
   8 Benjamin        : 2523.1      24.0     100   24.0%
   9 Fruit_2.1       : 2428.0      15.5     100   15.5%

I will run Chris's new knight-odds set when it's ready.

Do you have any pgns of games where Stockfish managed to turn a game into a win against any of the stronger opposition?

chrisw · Post by **chrisw** » Tue Jun 23, 2020 11:18 pm

Large knight-odds epd uploaded: 3800 epds, sorted by SF11 evaluation at 250ms.

The file is marked -5000.epds (not to be confused with the smaller -500.epds).

https://github.com/ChrisWhittington/Chess-EPDs

first few hundred below ...

rnbqkb1r/pppp1ppp/5n2/4p3/3PP3/3B4/PPP2PPP/R1BQK1NR w KQkq - 0 4; v=-179
rnbqkb1r/pppp1ppp/5n2/4p3/3PP3/3B4/PPP2PPP/RNBQK2R w KQkq - 0 4; v=-212
r1bqkbnr/pp1ppppp/n7/2p5/4P3/2P2N2/PP1P1PPP/R1BQKB1R w KQkq - 2 4; v=-261
rnbqk1nr/ppp2ppp/4p3/3pP3/1b1P4/8/PPP2PPP/R1BQKBNR w KQkq - 1 4; v=-276
r1bqkb1r/pppppp1p/n4np1/8/3PP3/5N2/PPP2PPP/R1BQKB1R w KQkq - 1 4; v=-281
rnbqk2r/pppp1ppp/5n2/2b1p3/2B1P3/5N2/PPPP1PPP/R1BQK2R w KQkq - 4 4; v=-287
r1bqk1nr/pppp1ppp/2n5/1B2p3/1b2P3/5N2/PPPP1PPP/R1BQK2R w KQkq - 4 4; v=-288
rnbqkb1r/ppppp2p/5ppn/8/3PP3/2N5/PPP2PPP/R1BQKB1R w KQkq - 0 4; v=-295
r1bqkb1r/pppppppp/n7/3nP3/8/5N2/PPPP1PPP/R1BQKB1R w KQkq - 1 4; v=-299
r2qkbnr/pppnpppp/8/3p1b2/2PP4/1Q6/PP2PPPP/RNB1KB1R w KQkq - 3 4; v=-300
rnbqkbnr/p2ppp1p/6p1/1ppP4/2P5/8/PP2PPPP/RNBQKB1R w KQkq - 0 4; v=-301
rnbqkb1r/pppppppp/8/2n1P3/8/3P4/PPP2PPP/R1BQKBNR w KQkq - 1 4; v=-303
rn1qkbnr/1bpppppp/p7/1p6/P2PP3/8/1PP2PPP/RNBQKB1R w KQkq - 1 4; v=-304
rnbqkbnr/p2ppp1p/2p3p1/1p6/8/5NP1/PPPPPPBP/R1BQK2R w KQkq - 0 4; v=-305
rnb1kbnr/pp1pppp1/2p4p/q7/2PPP3/8/PP3PPP/RNBQKB1R w KQkq - 1 4; v=-307
rnbqkbnr/p2ppp1p/6p1/1ppP4/2P5/8/PP2PPPP/R1BQKBNR w KQkq - 0 4; v=-308
rnbqkb1r/pppppppp/8/2n1P3/8/3P4/PPP2PPP/RNBQKB1R w KQkq - 1 4; v=-310
rn1qkbnr/ppp2ppp/3p4/4p3/3PP1b1/5N2/PPP2PPP/R1BQKB1R w KQkq - 0 4; v=-312
rnbqk2r/ppppppbp/5np1/8/3PP3/5N2/PPP2PPP/R1BQKB1R w KQkq - 3 4; v=-313
rn1qkb1r/pbpppppp/5n2/1p6/3PP3/3B4/PPP2PPP/RNBQK2R w KQkq - 3 4; v=-314
rnbqk1nr/pp1pppbp/2p3p1/8/2PPP3/8/PP3PPP/RNBQKB1R w KQkq - 0 4; v=-314
rnbqkbnr/p2ppp1p/1p4p1/2p5/4P3/2P2N2/PP1P1PPP/R1BQKB1R w KQkq - 0 4; v=-318
rnbqkbnr/pp1p1ppp/8/2pPp3/2P5/8/PP2PPPP/RNBQKB1R w KQkq - 0 4; v=-319
r1bqkbnr/pppp1p1p/2n3p1/4p3/2PP4/2N5/PP2PPPP/R1BQKB1R w KQkq - 1 4; v=-320
rnbqkbnr/pp1pp1p1/7p/2p2p2/3P3B/8/PPP1PPPP/RN1QKB1R w KQkq - 0 4; v=-320
r1bqkbnr/ppp2ppp/2n1p3/3p4/3PP3/5N2/PPP2PPP/R1BQKB1R w KQkq - 0 4; v=-323
r1bqkbnr/ppp2ppp/2n1p3/3pP3/3P4/8/PPP2PPP/R1BQKBNR w KQkq - 0 4; v=-323
rnbqkb1r/p2ppppp/5n2/1ppP4/2P5/8/PP2PPPP/RNBQKB1R w KQkq - 0 4; v=-324
r1bqkb1r/pppppppp/2n5/4P3/6n1/5N2/PPPP1PPP/R1BQKB1R w KQkq - 1 4; v=-324
rnbqkb1r/p1pp1ppp/1p2pn2/8/2P1P3/2N5/PP1P1PPP/R1BQKB1R w KQkq - 0 4; v=-324
r1bqkbnr/pp1npppp/2pp4/8/2PPP3/8/PP3PPP/RNBQKB1R w KQkq - 1 4; v=-324
rnbqkb1r/pppppp1p/6p1/3nP3/3P4/8/PPP2PPP/RNBQKB1R w KQkq - 1 4; v=-325
rnbqkbnr/pp2ppp1/2p4p/3p4/3PP3/3B4/PPP2PPP/R1BQK1NR w KQkq - 0 4; v=-325
rnbqk2r/ppppppbp/5np1/8/2PP4/2N5/PP2PPPP/R1BQKB1R w KQkq - 2 4; v=-325
rnbqkbnr/ppp3pp/3p4/4pp2/3PP3/5N2/PPP2PPP/R1BQKB1R w KQkq - 0 4; v=-325
rnbqk1nr/pp1pppbp/2p3p1/8/3PP3/5N2/PPP2PPP/R1BQKB1R w KQkq - 2 4; v=-326
rnbqkb1r/pp1ppppp/2p5/3nP3/3P4/8/PPP2PPP/RNBQKB1R w KQkq - 0 4; v=-327
rnb1kbnr/ppq1pppp/2pp4/8/3PP3/2N5/PPP2PPP/R1BQKB1R w KQkq - 1 4; v=-327
rnbqkb1r/1ppp1ppp/p3pn2/8/2PPP3/8/PP3PPP/RNBQKB1R w KQkq - 1 4; v=-328
rnbqkb1r/pppp1p1p/4pnp1/8/2PP4/2N5/PP2PPPP/R1BQKB1R w KQkq - 0 4; v=-328
rnbqkb1r/pp1ppp1p/2p2np1/8/2PP4/2N5/PP2PPPP/R1BQKB1R w KQkq - 0 4; v=-328
rnbqkb1r/pp1ppp1p/5np1/2p5/3PP3/5N2/PPP2PPP/R1BQKB1R w KQkq - 1 4; v=-329
rnbqkb1r/ppp2ppp/3ppn2/8/3PP3/5N2/PPP2PPP/R1BQKB1R w KQkq - 1 4; v=-329
rnbqkbnr/ppp1pp2/3p2pp/8/3PP2P/8/PPP2PP1/R1BQKBNR w KQkq - 0 4; v=-330
rnbqkb1r/pppppp1p/6p1/3nP3/3P4/8/PPP2PPP/R1BQKBNR w KQkq - 1 4; v=-330
r1bqk1nr/ppppbppp/2n1p3/8/4P3/5N2/PPPPQPPP/R1B1KB1R w KQkq - 4 4; v=-330
rnbqkb1r/pp1ppppp/2p5/3nP3/3P4/8/PPP2PPP/RNBQKB1R w KQkq - 1 4; v=-331
rnb1kbnr/ppq1pppp/3p4/2p5/3PP3/2P5/PP3PPP/RNBQKB1R w KQkq - 1 4; v=-331
rnbqk1nr/p1ppppbp/1p4p1/8/3PP3/5N2/PPP2PPP/R1BQKB1R w KQkq - 0 4; v=-332
rnbqk1nr/p1pp1ppp/1p2p3/8/1b1PP3/2N5/PPP2PPP/R1BQKB1R w KQkq - 2 4; v=-332
r1bqkbnr/pp1ppp1p/n1p3p1/8/3PP3/5N2/PPP2PPP/R1BQKB1R w KQkq - 0 4; v=-333
rnbqkbnr/pppp3p/5pp1/4p3/2PP4/6P1/PP2PP1P/RNBQKB1R w KQkq - 0 4; v=-333
rnbqkbnr/pp2pp1p/2p3p1/3p4/3PP3/7P/PPP2PP1/R1BQKBNR w KQkq - 0 4; v=-333
rnbqkbnr/pp3ppp/2pp4/4p3/2P1P3/2N5/PP1P1PPP/R1BQKB1R w KQkq - 0 4; v=-334
rnbqkb1r/pp2pppp/2pp1n2/8/3PP3/5P2/PPP3PP/RNBQKB1R w KQkq - 0 4; v=-334
rnbqk1nr/pp1pppbp/2p3p1/8/2P1P3/2N5/PP1P1PPP/R1BQKB1R w KQkq - 2 4; v=-334
rnbqkbnr/p2ppp1p/1p4p1/2p5/3PP3/2P5/PP3PPP/RNBQKB1R w KQkq - 0 4; v=-334
rnbqkb1r/ppp1ppp1/3p1n1p/8/2PP4/2N5/PP2PPPP/R1BQKB1R w KQkq - 0 4; v=-335
r1bqkbnr/pp1p1ppp/2n5/2p1p3/4P3/2P2N2/PP1P1PPP/R1BQKB1R w KQkq - 2 4; v=-335
rnbqkb1r/pp1p1ppp/2p1pn2/8/2PPP3/8/PP3PPP/RNBQKB1R w KQkq - 1 4; v=-335
rnbqkbnr/p2ppp1p/1p4p1/2p5/3PP2P/8/PPP2PP1/R1BQKBNR w KQkq - 0 4; v=-335
rn1qkbnr/1bpppppp/p7/1p6/P2PP3/8/1PP2PPP/R1BQKBNR w KQkq - 1 4; v=-335
r1bqkbnr/ppp2ppp/2n1p3/3pP3/3P4/8/PPP2PPP/RNBQKB1R w KQkq - 0 4; v=-335
rnbqkbnr/pp1p2pp/4p3/2p2p2/3P4/6P1/PPP1PPBP/RNBQK2R w KQkq - 0 4; v=-336
rnbqkb1r/ppp1pp1p/3p1np1/8/2P1P3/2N5/PP1P1PPP/R1BQKB1R w KQkq - 0 4; v=-336
rnb1kbnr/pp1pp1pp/1qp5/5p2/3P1B2/5N2/PPP1PPPP/R2QKB1R w KQkq - 2 4; v=-336
rnbqk1nr/ppp2ppp/8/2bpp3/2B1P3/5N2/PPPP1PPP/R1BQK2R w KQkq - 0 4; v=-336
r1bqkb1r/ppp1pppp/2np1n2/8/2PP4/2N5/PP2PPPP/R1BQKB1R w KQkq - 2 4; v=-337
rnb1kbnr/ppqppp1p/6p1/2p5/3PP3/2P5/PP3PPP/R1BQKBNR w KQkq - 1 4; v=-337
rnbqk1nr/1pppppbp/p5p1/8/2PP4/2N5/PP2PPPP/R1BQKB1R w KQkq - 0 4; v=-337
rnbqkb1r/1p1ppppp/p4n2/2pP4/2P5/8/PP2PPPP/RNBQKB1R w KQkq - 0 4; v=-337
rnbqkb1r/ppp2ppp/3ppn2/8/2P1P3/2N5/PP1P1PPP/R1BQKB1R w KQkq - 0 4; v=-338
rnbqkb1r/pp2pppp/2pp1n2/8/2P1P3/2N5/PP1P1PPP/R1BQKB1R w KQkq - 2 4; v=-338
rnbqkb1r/ppppp2p/5np1/5p2/3P3P/6P1/PPP1PP2/R1BQKBNR w KQkq - 1 4; v=-338
rnb1kbnr/pp1ppp1p/6p1/q1p5/4P3/2P2N2/PP1P1PPP/R1BQKB1R w KQkq - 1 4; v=-338
r1bqkbnr/pppp1pp1/2n4p/4p3/2B1P3/5N2/PPPP1PPP/R1BQK2R w KQkq - 0 4; v=-338
rnbqk1nr/p1pp1ppp/1p2p3/8/1b1PP3/5N2/PPP2PPP/R1BQKB1R w KQkq - 1 4; v=-338
r1bqk1nr/pppp1ppp/2nb4/1B2p3/4P3/5N2/PPPP1PPP/R1BQK2R w KQkq - 4 4; v=-339
rnb1kbnr/pp1pp1pp/1qp5/5pB1/3P4/4P3/PPP2PPP/RN1QKB1R w KQkq - 1 4; v=-339
rnbqkb1r/pp2pppp/2pp1n2/8/2PP4/2N5/PP2PPPP/R1BQKB1R w KQkq - 0 4; v=-339
rn1qkb1r/pbpppppp/1p3n2/8/3PP3/3B4/PPP2PPP/RNBQK2R w KQkq - 3 4; v=-339
rnbqkb1r/pp1p1ppp/4pn2/2pP4/2P5/8/PP2PPPP/RNBQKB1R w KQkq - 0 4; v=-339
rnbqk1nr/pp1pppbp/2p3p1/8/3PP3/5N2/PPP2PPP/R1BQKB1R w KQkq - 0 4; v=-340
rnbqkbnr/ppp2p1p/4p1p1/3p4/3PP3/8/PPPN1PPP/R1BQKB1R w KQkq - 0 4; v=-342
r1bqkb1r/ppp1pppp/2n2n2/3p4/2PP4/4P3/PP3PPP/RNBQKB1R w KQkq - 1 4; v=-342
rnbqkbnr/pp1p1ppp/8/2pPp3/2P5/8/PP2PPPP/R1BQKBNR w KQkq - 0 4; v=-342
rnbqk1nr/pppp1pbp/4p1p1/8/3PP3/2N5/PPP2PPP/R1BQKB1R w KQkq - 2 4; v=-342
rnbqkbnr/pp3ppp/2pp4/4p3/2PPP3/8/PP3PPP/RNBQKB1R w KQkq - 0 4; v=-342
rnbqkb1r/ppp1pp1p/3p1np1/8/2BPP3/8/PPP2PPP/R1BQK1NR w KQkq - 2 4; v=-342
rnbqkbnr/p3pppp/1pp5/3pP3/3P4/8/PPP2PPP/R1BQKBNR w KQkq - 0 4; v=-342
rnbqk1nr/pp1pppbp/2p3p1/8/2PP4/2N5/PP2PPPP/R1BQKB1R w KQkq - 0 4; v=-343
rnbqkb1r/p1pp1ppp/1p2pn2/8/2PP4/2N5/PP2PPPP/R1BQKB1R w KQkq - 0 4; v=-343
rnbqkbnr/p2ppp1p/1p4p1/2p5/3PP2P/8/PPP2PP1/RNBQKB1R w KQkq - 0 4; v=-343
rnbqkbnr/pp3ppp/2ppp3/8/3PP3/5N2/PPP2PPP/R1BQKB1R w KQkq - 0 4; v=-343
r1bqkb1r/pppp1ppp/2n1pn2/8/3PP3/5N2/PPP2PPP/R1BQKB1R w KQkq - 2 4; v=-344
rnbqkb1r/1ppp1ppp/p3pn2/8/2PPP3/8/PP3PPP/R1BQKBNR w KQkq - 1 4; v=-344
rnbqkb1r/pp2pppp/2pp1n2/8/3PP3/2N5/PPP2PPP/R1BQKB1R w KQkq - 2 4; v=-344
rnbqkbnr/pp2pp1p/3p2p1/2p5/4P3/2P2N2/PP1P1PPP/R1BQKB1R w KQkq - 0 4; v=-344
r1bqkb1r/ppp1pppp/2n2n2/3p4/2PP4/2N5/PP2PPPP/R1BQKB1R w KQkq - 0 4; v=-344
rnbqkb1r/pp1ppppp/5n2/2p5/2PP1B2/8/PP2PPPP/RN1QKB1R w KQkq - 0 4; v=-344
rn1qkb1r/pbpppppp/5n2/1p4B1/3P4/5N2/PPP1PPPP/R2QKB1R w KQkq - 2 4; v=-344
rnbqkb1r/pppp1p1p/4pnp1/8/2P1P3/2N5/PP1P1PPP/R1BQKB1R w KQkq - 0 4; v=-344
rnbqk2r/ppppbppp/4pn2/8/2PP4/2N5/PP2PPPP/R1BQKB1R w KQkq - 2 4; v=-345
rn1qkb1r/ppp1pppp/5n2/3p1b2/2PP4/1Q6/PP2PPPP/RNB1KB1R w KQkq - 3 4; v=-345
rn1qkbnr/pbpp1ppp/1p2p3/8/2PP4/2N5/PP2PPPP/R1BQKB1R w KQkq - 2 4; v=-345
rnbqkb1r/pppp1ppp/4p3/3nP3/8/5N2/PPPP1PPP/R1BQKB1R w KQkq - 1 4; v=-345
r1bqkb1r/pppppp1p/2n2np1/8/2PP4/5P2/PP2P1PP/RNBQKB1R w KQkq - 1 4; v=-345
rnbqkb1r/pp1ppppp/2p5/3nP3/3P4/8/PPP2PPP/R1BQKBNR w KQkq - 1 4; v=-345
rnb1kbnr/ppq1pppp/2pp4/8/2PPP3/8/PP3PPP/RNBQKB1R w KQkq - 1 4; v=-345
rnbqkb1r/ppppppp1/7p/4P3/4n3/5N2/PPPP1PPP/R1BQKB1R w KQkq - 1 4; v=-346
r1bqk1nr/pppp1ppp/2n5/2b1p3/2B1P3/5N2/PPPP1PPP/R1BQK2R w KQkq - 4 4; v=-346
rnbqkbnr/ppp1pp1p/6p1/3p4/3PPP2/8/PPP3PP/R1BQKBNR w KQkq - 0 4; v=-346
rnbqkb1r/ppppp2p/5np1/5p2/3P1B2/4P3/PPP2PPP/RN1QKB1R w KQkq - 0 4; v=-346
rnbqkb1r/p2ppppp/2p2n2/1p4B1/3P4/4P3/PPP2PPP/RN1QKB1R w KQkq - 0 4; v=-346
rnbqkb1r/ppp1pp1p/3p1np1/8/2PP4/2N5/PP2PPPP/R1BQKB1R w KQkq - 1 4; v=-346
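Each line in the listing above is a full FEN followed by a `v=` field holding the Stockfish evaluation. A minimal Python sketch for reading such lines and trimming the extremes from each end of the sorted list (an idea raised later in the thread) might look like the following. The function names `parse_epd_line` and `cull_extremes` are hypothetical, and the `FEN; v=N` layout is assumed only from the lines shown here, not from any documented file format.

```python
from typing import List, Tuple

Position = Tuple[str, int]  # (FEN string, evaluation taken from the "v=" field)


def parse_epd_line(line: str) -> Position:
    """Split a line of the form 'FEN; v=-179' into (fen, value).

    The layout is an assumption based on the lines posted in the thread.
    """
    fen, _, tail = line.partition(";")
    key, _, value = tail.strip().partition("=")
    if key != "v":
        raise ValueError(f"expected a 'v=' field, got: {tail!r}")
    return fen.strip(), int(value)


def cull_extremes(positions: List[Position], n: int) -> List[Position]:
    """Drop the n least negative and n most negative evaluations."""
    ordered = sorted(positions, key=lambda item: item[1], reverse=True)
    if 2 * n >= len(ordered):
        return []
    return ordered[n:len(ordered) - n]


# The first five lines of the posted listing, used as sample input.
sample_lines = [
    "rnbqkb1r/pppp1ppp/5n2/4p3/3PP3/3B4/PPP2PPP/R1BQK1NR w KQkq - 0 4; v=-179",
    "rnbqkb1r/pppp1ppp/5n2/4p3/3PP3/3B4/PPP2PPP/RNBQK2R w KQkq - 0 4; v=-212",
    "r1bqkbnr/pp1ppppp/n7/2p5/4P3/2P2N2/PP1P1PPP/R1BQKB1R w KQkq - 2 4; v=-261",
    "rnbqk1nr/ppp2ppp/4p3/3pP3/1b1P4/8/PPP2PPP/R1BQKBNR w KQkq - 1 4; v=-276",
    "r1bqkb1r/pppppp1p/n4np1/8/3PP3/5N2/PPP2PPP/R1BQKB1R w KQkq - 1 4; v=-281",
]

positions = [parse_epd_line(line) for line in sample_lines]
trimmed = cull_extremes(positions, 1)
print([value for _, value in trimmed])
```

With one position culled from each end of the five sample lines, the three middle evaluations remain.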

Rebel · Post by **Rebel** » Tue Jun 23, 2020 11:27 pm

chrisw wrote: ↑Tue Jun 23, 2020 9:24 pm Do you have any pgns of games where Stockfish managed to turn a game into a win against any of the stronger opposition?

All games http://rebel13.nl/k0.7z

Looks like a job for MRI, I have no time myself.

Rebel · Post by **Rebel** » Tue Jun 23, 2020 11:31 pm

lkaufman wrote: ↑Tue Jun 23, 2020 8:53 pm These results are really quite incredible. If I interpret everything properly, they are saying that at the 40/10 level Stockfish 11 broke even with Arasan 22, which is rated 3141 on the CCRL 40/15 list on one thread, while giving knight odds with these randomized openings? Even at longer time controls, Arasan's winning margin is not that impressive. I'm sure that Arasan and all of the other engines playing with the extra piece will score higher when you use the new dataset based on CCRL openings rather than random ones, but even so it is really ridiculous to think that an engine rated nearly 300 elo above Carlsen (and which would probably be rated higher in actual FIDE rapid) can only break even with an extra piece. I've commented on this phenomenon before, but this is a much more extreme example of it than I've ever seen. Am I missing something here?

Do the new knight-odds set with engines in the elo range of Arasan?

lkaufman · Post by **lkaufman** » Tue Jun 23, 2020 11:36 pm

Yes, with some bias towards including engines rated somewhat below arasan rather than above, since I expect the new set to be on average more favorable for the weaker engine than the old set. I'm also running some related experiments which I'll report on later.

Rebel · Post by **Rebel** » Wed Jun 24, 2020 12:17 am

Quicky with the new knight-odds epd. tc=40/40

   # ENGINE          : RATING    POINTS  PLAYED    (%)
   1 Arasan_22       : 3368.9      86.5     100   86.5%
   2 Fruit_2.1       : 3187.7     139.0     200   69.5%
   3 Stockfish_11    : 3043.4      74.5     300   24.8%

lkaufman · Post by **lkaufman** » Wed Jun 24, 2020 3:04 am

Wow, a day and night difference from the old knight-odds set. With the old set Arasan and Fruit scored 39% (averaged); here they averaged 78%, an improvement of 39 percentage points, roughly 273 elo (using 7 elo per percent)!! It seems that the new (500, actually 230 position) set really is something like a pawn harder for White than the old set, if a pawn is worth something like 273 elo, which is at least in the right ballpark.

I looked over many of the positions in the new set, and except for a few near either extreme most are pretty reasonable for knight odds. A few of the outliers had moves played that didn't seem plausible for a grandmaster to have played in normal chess (perhaps the dataset included some amateur openings), and a few others were reasonable for normal chess but weren't sensible with the specific knight missing. But there were just a few such positions, roughly balanced between White and Black errors, so the full set is a pretty good simulation of knight odds play, though culling ten or so from each end of the list would make it a bit better. So it seems we need somewhat weaker engines than Fruit 2.1 for knight odds at this time limit; perhaps it would be close to balanced with Fruit 2.1 at 40/10.
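The arithmetic in this post can be reproduced with a short Python sketch. It takes the 39% old-set average as stated in the post, recomputes the new-set average from the 86.5% and 69.5% scores in the table, and applies the 7 elo per percent rule of thumb. For comparison it also evaluates the standard logistic Elo formula, which is an addition here and not something the post uses; the function names are hypothetical.

```python
import math

ELO_PER_PERCENT = 7.0  # rule of thumb quoted in the post


def linear_elo_gain(old_pct: float, new_pct: float) -> float:
    """Elo gain using a flat 'elo per percentage point' conversion."""
    return (new_pct - old_pct) * ELO_PER_PERCENT


def logistic_elo(score_pct: float) -> float:
    """Elo difference implied by a score under the standard logistic model."""
    p = score_pct / 100.0
    if not 0.0 < p < 1.0:
        raise ValueError("score must be strictly between 0 and 100 percent")
    return -400.0 * math.log10(1.0 / p - 1.0)


new_avg = (86.5 + 69.5) / 2  # Arasan 22 and Fruit 2.1 on the new set
old_avg = 39.0               # old-set average as stated in the post

linear_gain = linear_elo_gain(old_avg, new_avg)
logistic_gain = logistic_elo(new_avg) - logistic_elo(old_avg)

print(f"new-set average: {new_avg:.1f}%")
print(f"linear estimate: {linear_gain:.0f} elo")
print(f"logistic estimate: {logistic_gain:.0f} elo")
```

The linear rule gives the 273 elo quoted above. The logistic estimate comes out somewhat higher, since a flat per-percent conversion understates differences as scores move away from 50%, but both land in the same ballpark.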

Chessqueen · Post by **Chessqueen** » Wed Jun 24, 2020 5:19 am

It could be that certain engines do not know how to trade when they are a knight up. Try Spike or even Thinker versus Stockfish: they are much weaker, but they know how to take advantage of being a piece up and will keep trading pieces, making it harder for the opponent.

lkaufman · Post by **lkaufman** » Wed Jun 24, 2020 6:16 am

Yes, this relates to the following test I ran. Three engines were involved: Komodo 14 (default) giving knight odds (using a small book I had; I may rerun later with the new book by ChrisW), Arasan 14 64-bit, and Stockfish 11 at Skill Level 10. All engines ran on four threads for greater variety, with a time limit of 3' + 2". I chose Stockfish Skill Level 10 because it was evenly matched with Arasan 14 in normal chess, with Contempt set to zero (the direct match came out 136 to 135, plus one elo for Arasan, so we can call them dead equal). This is surprising enough, because Arasan 14 is only about 100 elo below Fruit 2.1 and on four threads at this standard blitz time limit would probably be a good match for Carlsen or Nakamura. SF Skill Level 10 is presumably intended for average amateurs, being in the middle of the Skill range, so this is pretty funny, but that's not the point of my test.

When I ran Komodo 14 giving knight odds to Arasan 14, Komodo 14 won by 146 elo. But when Komodo 14 tried to give knight odds to Stockfish Skill Level 10 (with Contempt set to the minimum of -100 to encourage trading), Stockfish won by 40 elo in 300 games. So Stockfish Skill Level 10 did 186 elo better than Arasan 14 at knight odds, despite being just even with it in normal chess, and despite being intended to be an "amateur" level.

Part of this is due to Contempt, part to just better understanding of how to evaluate chess positions, and part to the method used to weaken the Skill levels, which causes many small errors (up to a pawn) but no huge blunders. So Stockfish Skill Levels appear to be better simulations of human grandmasters (at least for handicap play) than normal engines of the proper rating, though it's not yet clear if they fully simulate the proper level. I had previously tried Komodo skill levels for this same purpose, but they don't serve as well because the method they use to weaken, a partly crippled search, will sometimes cause large enough mistakes to lose even with an extra piece. More tests are needed.
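The numbers in this post can be checked with a small Python sketch. It assumes the standard logistic Elo model to turn the 136 to 135 match score into a rating difference, and then adds the two knight-odds margins (Komodo +146 against Arasan 14, Stockfish Skill Level 10 +40 against Komodo) to get the swing. The function name `elo_from_points` is hypothetical.

```python
import math


def elo_from_points(points_a: float, points_b: float) -> float:
    """Elo edge of side A over side B from match points, logistic model.

    With p = a / (a + b), the usual 400 * log10(p / (1 - p))
    simplifies to 400 * log10(a / b).
    """
    if points_a <= 0 or points_b <= 0:
        raise ValueError("both sides need a positive score")
    return 400.0 * math.log10(points_a / points_b)


# Normal chess: Arasan 14 vs Stockfish 11 Skill Level 10, 136 to 135.
normal_edge = elo_from_points(136, 135)

# Knight odds, margins as reported in the post.
komodo_over_arasan = 146   # Komodo 14 giving a knight still won by 146 elo
stockfish_over_komodo = 40  # Skill Level 10 receiving a knight won by 40 elo

# How much better Skill Level 10 did than Arasan 14 against the same handicap.
swing = komodo_over_arasan + stockfish_over_komodo

print(f"normal chess edge: {normal_edge:.2f} elo")
print(f"knight-odds swing: {swing} elo")
```

The match edge comes out at a little over one elo, which matches the "plus one elo" in the post, and the two margins sum to the 186 elo swing.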
