Stockfish Handicap Matches
-
- Posts: 5960
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
Re: Stockfish Handicap Matches
These results are really quite incredible. If I interpret everything properly, they are saying that at 40/10 Stockfish 11 broke even with Arasan 22 (rated 3141 on the CCRL 40/15 list on one thread) while giving knight odds with these randomized openings? Even at longer time controls, Arasan's winning margin is not that impressive. I'm sure that Arasan and all of the other engines playing with the extra piece will score higher when you use the new dataset based on CCRL openings rather than random ones, but even so it is really ridiculous to think that an engine rated nearly 300 elo above Carlsen (and probably even higher in actual FIDE rapid) can only break even when given an extra piece. I've commented on this phenomenon before, but this is a far more extreme example of it than I've ever seen. Am I missing something here?
Komodo rules!
-
- Posts: 4315
- Joined: Tue Apr 03, 2012 4:28 pm
Re: Stockfish Handicap Matches
Rebel wrote: ↑Tue Jun 23, 2020 3:20 pm
Stockfish 11 gauntlet with Chris's old knight-odds epd.
First match, tc=40/10
Code: Select all
# ENGINE : RATING POINTS PLAYED (%)
1 Komodo_14 : 3477.9 94.5 100 94.5%
2 Houdini_6.03 : 3283.6 85.0 100 85.0%
3 Laser_1.7 : 3251.3 82.5 100 82.5%
4 rofChade_2.3 : 3228.0 80.5 100 80.5%
5 Arasan_22 : 2979.6 50.0 100 50.0%
6 Stockfish_11 : 2979.6 107.5 500 21.5%
Second match, tc=40/20
Code: Select all
# ENGINE : RATING POINTS PLAYED (%)
1 Komodo_14 : 3515.8 96.5 100 96.5%
2 Houdini_6.03 : 3340.0 91.0 100 91.0%
3 rofChade_2.3 : 3252.7 86.0 100 86.0%
4 Laser_1.7 : 3194.4 81.5 100 81.5%
5 Arasan_22 : 2962.6 54.0 100 54.0%
6 Stockfish_11 : 2934.5 91.0 500 18.2%
Third match, tc=40/40, included oldies.
Code: Select all
# ENGINE : RATING POINTS PLAYED (%)
1 Komodo_14 : 3743.1 97.5 100 97.5%
2 Houdini_6.03 : 3568.3 93.5 100 93.5%
3 rofChade_2.3 : 3398.3 84.5 100 84.5%
4 Laser_1.7 : 3293.6 75.0 100 75.0%
5 Arasan_22 : 3146.9 56.5 100 56.5%
6 Stockfish_11 : 3101.1 334.0 800 41.8%
7 ProDeo : 2899.1 24.0 100 24.0%
8 Benjamin : 2874.1 21.5 100 21.5%
9 Fruit_2.1 : 2775.6 13.5 100 13.5%
Fourth match, tc=40/80
Code: Select all
# ENGINE : RATING POINTS PLAYED (%)
1 Komodo_14 > : 3655.4 100.0 100 100.0%
2 Houdini_6.03 : 3282.1 96.0 100 96.0%
3 rofChade_2.3 : 3066.2 87.5 100 87.5%
4 Laser_1.7 : 2996.9 82.5 100 82.5%
5 Arasan_22 : 2814.7 62.5 100 62.5%
6 Stockfish_11 : 2725.1 303.5 800 37.9%
7 ProDeo : 2564.0 28.5 100 28.5%
8 Benjamin : 2523.1 24.0 100 24.0%
9 Fruit_2.1 : 2428.0 15.5 100 15.5%
I will run Chris's new knight-odds set when it's ready.
Do you have any pgns of games where Stockfish managed to turn a game into a win against any of the stronger opposition?
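In these tables the (%) column is simply POINTS divided by PLAYED. Under the standard logistic Elo model, a score fraction p corresponds to a rating difference of 400·log10(p / (1 − p)). The Python sketch below applies that formula to a few rows of the tc=40/10 table; the row values are copied from the table, while the helper names are illustrative assumptions. The results land close to, but not exactly on, the RATING gaps printed in the table, presumably because the rating tool (the layout looks like Ordo output) fits all games jointly.

```python
import math

def score_fraction(points: float, played: int) -> float:
    """Fraction of the available points scored (the '(%)' column divided by 100)."""
    return points / played

def elo_diff_from_score(p: float) -> float:
    """Rating difference implied by score fraction p under the logistic Elo model."""
    if not 0.0 < p < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(p / (1.0 - p))

# Rows from the tc=40/10 table: (engine, points, played); every game was against Stockfish 11.
rows = [("Komodo_14", 94.5, 100), ("Houdini_6.03", 85.0, 100), ("Arasan_22", 50.0, 100)]

for name, points, played in rows:
    p = score_fraction(points, played)
    print(f"{name:14s} {p:6.1%}  {elo_diff_from_score(p):+7.1f} Elo vs Stockfish_11")
```

Komodo's 94.5% maps to roughly +494 Elo here, against about +498 in the table.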
-
Re: Stockfish Handicap Matches
Large knight-odds epd uploaded: 3800 positions, sorted by SF11 evaluation at 250ms.
The file is marked -5000.epds (not to be confused with the smaller -500.epds).
https://github.com/ChrisWhittington/Chess-EPDs
First few hundred below ...
Code: Select all
rnbqkb1r/pppp1ppp/5n2/4p3/3PP3/3B4/PPP2PPP/R1BQK1NR w KQkq - 0 4; v=-179
rnbqkb1r/pppp1ppp/5n2/4p3/3PP3/3B4/PPP2PPP/RNBQK2R w KQkq - 0 4; v=-212
r1bqkbnr/pp1ppppp/n7/2p5/4P3/2P2N2/PP1P1PPP/R1BQKB1R w KQkq - 2 4; v=-261
rnbqk1nr/ppp2ppp/4p3/3pP3/1b1P4/8/PPP2PPP/R1BQKBNR w KQkq - 1 4; v=-276
r1bqkb1r/pppppp1p/n4np1/8/3PP3/5N2/PPP2PPP/R1BQKB1R w KQkq - 1 4; v=-281
rnbqk2r/pppp1ppp/5n2/2b1p3/2B1P3/5N2/PPPP1PPP/R1BQK2R w KQkq - 4 4; v=-287
r1bqk1nr/pppp1ppp/2n5/1B2p3/1b2P3/5N2/PPPP1PPP/R1BQK2R w KQkq - 4 4; v=-288
rnbqkb1r/ppppp2p/5ppn/8/3PP3/2N5/PPP2PPP/R1BQKB1R w KQkq - 0 4; v=-295
r1bqkb1r/pppppppp/n7/3nP3/8/5N2/PPPP1PPP/R1BQKB1R w KQkq - 1 4; v=-299
r2qkbnr/pppnpppp/8/3p1b2/2PP4/1Q6/PP2PPPP/RNB1KB1R w KQkq - 3 4; v=-300
rnbqkbnr/p2ppp1p/6p1/1ppP4/2P5/8/PP2PPPP/RNBQKB1R w KQkq - 0 4; v=-301
rnbqkb1r/pppppppp/8/2n1P3/8/3P4/PPP2PPP/R1BQKBNR w KQkq - 1 4; v=-303
rn1qkbnr/1bpppppp/p7/1p6/P2PP3/8/1PP2PPP/RNBQKB1R w KQkq - 1 4; v=-304
rnbqkbnr/p2ppp1p/2p3p1/1p6/8/5NP1/PPPPPPBP/R1BQK2R w KQkq - 0 4; v=-305
rnb1kbnr/pp1pppp1/2p4p/q7/2PPP3/8/PP3PPP/RNBQKB1R w KQkq - 1 4; v=-307
rnbqkbnr/p2ppp1p/6p1/1ppP4/2P5/8/PP2PPPP/R1BQKBNR w KQkq - 0 4; v=-308
rnbqkb1r/pppppppp/8/2n1P3/8/3P4/PPP2PPP/RNBQKB1R w KQkq - 1 4; v=-310
rn1qkbnr/ppp2ppp/3p4/4p3/3PP1b1/5N2/PPP2PPP/R1BQKB1R w KQkq - 0 4; v=-312
rnbqk2r/ppppppbp/5np1/8/3PP3/5N2/PPP2PPP/R1BQKB1R w KQkq - 3 4; v=-313
rn1qkb1r/pbpppppp/5n2/1p6/3PP3/3B4/PPP2PPP/RNBQK2R w KQkq - 3 4; v=-314
rnbqk1nr/pp1pppbp/2p3p1/8/2PPP3/8/PP3PPP/RNBQKB1R w KQkq - 0 4; v=-314
rnbqkbnr/p2ppp1p/1p4p1/2p5/4P3/2P2N2/PP1P1PPP/R1BQKB1R w KQkq - 0 4; v=-318
rnbqkbnr/pp1p1ppp/8/2pPp3/2P5/8/PP2PPPP/RNBQKB1R w KQkq - 0 4; v=-319
r1bqkbnr/pppp1p1p/2n3p1/4p3/2PP4/2N5/PP2PPPP/R1BQKB1R w KQkq - 1 4; v=-320
rnbqkbnr/pp1pp1p1/7p/2p2p2/3P3B/8/PPP1PPPP/RN1QKB1R w KQkq - 0 4; v=-320
r1bqkbnr/ppp2ppp/2n1p3/3p4/3PP3/5N2/PPP2PPP/R1BQKB1R w KQkq - 0 4; v=-323
r1bqkbnr/ppp2ppp/2n1p3/3pP3/3P4/8/PPP2PPP/R1BQKBNR w KQkq - 0 4; v=-323
rnbqkb1r/p2ppppp/5n2/1ppP4/2P5/8/PP2PPPP/RNBQKB1R w KQkq - 0 4; v=-324
r1bqkb1r/pppppppp/2n5/4P3/6n1/5N2/PPPP1PPP/R1BQKB1R w KQkq - 1 4; v=-324
rnbqkb1r/p1pp1ppp/1p2pn2/8/2P1P3/2N5/PP1P1PPP/R1BQKB1R w KQkq - 0 4; v=-324
r1bqkbnr/pp1npppp/2pp4/8/2PPP3/8/PP3PPP/RNBQKB1R w KQkq - 1 4; v=-324
rnbqkb1r/pppppp1p/6p1/3nP3/3P4/8/PPP2PPP/RNBQKB1R w KQkq - 1 4; v=-325
rnbqkbnr/pp2ppp1/2p4p/3p4/3PP3/3B4/PPP2PPP/R1BQK1NR w KQkq - 0 4; v=-325
rnbqk2r/ppppppbp/5np1/8/2PP4/2N5/PP2PPPP/R1BQKB1R w KQkq - 2 4; v=-325
rnbqkbnr/ppp3pp/3p4/4pp2/3PP3/5N2/PPP2PPP/R1BQKB1R w KQkq - 0 4; v=-325
rnbqk1nr/pp1pppbp/2p3p1/8/3PP3/5N2/PPP2PPP/R1BQKB1R w KQkq - 2 4; v=-326
rnbqkb1r/pp1ppppp/2p5/3nP3/3P4/8/PPP2PPP/RNBQKB1R w KQkq - 0 4; v=-327
rnb1kbnr/ppq1pppp/2pp4/8/3PP3/2N5/PPP2PPP/R1BQKB1R w KQkq - 1 4; v=-327
rnbqkb1r/1ppp1ppp/p3pn2/8/2PPP3/8/PP3PPP/RNBQKB1R w KQkq - 1 4; v=-328
rnbqkb1r/pppp1p1p/4pnp1/8/2PP4/2N5/PP2PPPP/R1BQKB1R w KQkq - 0 4; v=-328
rnbqkb1r/pp1ppp1p/2p2np1/8/2PP4/2N5/PP2PPPP/R1BQKB1R w KQkq - 0 4; v=-328
rnbqkb1r/pp1ppp1p/5np1/2p5/3PP3/5N2/PPP2PPP/R1BQKB1R w KQkq - 1 4; v=-329
rnbqkb1r/ppp2ppp/3ppn2/8/3PP3/5N2/PPP2PPP/R1BQKB1R w KQkq - 1 4; v=-329
rnbqkbnr/ppp1pp2/3p2pp/8/3PP2P/8/PPP2PP1/R1BQKBNR w KQkq - 0 4; v=-330
rnbqkb1r/pppppp1p/6p1/3nP3/3P4/8/PPP2PPP/R1BQKBNR w KQkq - 1 4; v=-330
r1bqk1nr/ppppbppp/2n1p3/8/4P3/5N2/PPPPQPPP/R1B1KB1R w KQkq - 4 4; v=-330
rnbqkb1r/pp1ppppp/2p5/3nP3/3P4/8/PPP2PPP/RNBQKB1R w KQkq - 1 4; v=-331
rnb1kbnr/ppq1pppp/3p4/2p5/3PP3/2P5/PP3PPP/RNBQKB1R w KQkq - 1 4; v=-331
rnbqk1nr/p1ppppbp/1p4p1/8/3PP3/5N2/PPP2PPP/R1BQKB1R w KQkq - 0 4; v=-332
rnbqk1nr/p1pp1ppp/1p2p3/8/1b1PP3/2N5/PPP2PPP/R1BQKB1R w KQkq - 2 4; v=-332
r1bqkbnr/pp1ppp1p/n1p3p1/8/3PP3/5N2/PPP2PPP/R1BQKB1R w KQkq - 0 4; v=-333
rnbqkbnr/pppp3p/5pp1/4p3/2PP4/6P1/PP2PP1P/RNBQKB1R w KQkq - 0 4; v=-333
rnbqkbnr/pp2pp1p/2p3p1/3p4/3PP3/7P/PPP2PP1/R1BQKBNR w KQkq - 0 4; v=-333
rnbqkbnr/pp3ppp/2pp4/4p3/2P1P3/2N5/PP1P1PPP/R1BQKB1R w KQkq - 0 4; v=-334
rnbqkb1r/pp2pppp/2pp1n2/8/3PP3/5P2/PPP3PP/RNBQKB1R w KQkq - 0 4; v=-334
rnbqk1nr/pp1pppbp/2p3p1/8/2P1P3/2N5/PP1P1PPP/R1BQKB1R w KQkq - 2 4; v=-334
rnbqkbnr/p2ppp1p/1p4p1/2p5/3PP3/2P5/PP3PPP/RNBQKB1R w KQkq - 0 4; v=-334
rnbqkb1r/ppp1ppp1/3p1n1p/8/2PP4/2N5/PP2PPPP/R1BQKB1R w KQkq - 0 4; v=-335
r1bqkbnr/pp1p1ppp/2n5/2p1p3/4P3/2P2N2/PP1P1PPP/R1BQKB1R w KQkq - 2 4; v=-335
rnbqkb1r/pp1p1ppp/2p1pn2/8/2PPP3/8/PP3PPP/RNBQKB1R w KQkq - 1 4; v=-335
rnbqkbnr/p2ppp1p/1p4p1/2p5/3PP2P/8/PPP2PP1/R1BQKBNR w KQkq - 0 4; v=-335
rn1qkbnr/1bpppppp/p7/1p6/P2PP3/8/1PP2PPP/R1BQKBNR w KQkq - 1 4; v=-335
r1bqkbnr/ppp2ppp/2n1p3/3pP3/3P4/8/PPP2PPP/RNBQKB1R w KQkq - 0 4; v=-335
rnbqkbnr/pp1p2pp/4p3/2p2p2/3P4/6P1/PPP1PPBP/RNBQK2R w KQkq - 0 4; v=-336
rnbqkb1r/ppp1pp1p/3p1np1/8/2P1P3/2N5/PP1P1PPP/R1BQKB1R w KQkq - 0 4; v=-336
rnb1kbnr/pp1pp1pp/1qp5/5p2/3P1B2/5N2/PPP1PPPP/R2QKB1R w KQkq - 2 4; v=-336
rnbqk1nr/ppp2ppp/8/2bpp3/2B1P3/5N2/PPPP1PPP/R1BQK2R w KQkq - 0 4; v=-336
r1bqkb1r/ppp1pppp/2np1n2/8/2PP4/2N5/PP2PPPP/R1BQKB1R w KQkq - 2 4; v=-337
rnb1kbnr/ppqppp1p/6p1/2p5/3PP3/2P5/PP3PPP/R1BQKBNR w KQkq - 1 4; v=-337
rnbqk1nr/1pppppbp/p5p1/8/2PP4/2N5/PP2PPPP/R1BQKB1R w KQkq - 0 4; v=-337
rnbqkb1r/1p1ppppp/p4n2/2pP4/2P5/8/PP2PPPP/RNBQKB1R w KQkq - 0 4; v=-337
rnbqkb1r/ppp2ppp/3ppn2/8/2P1P3/2N5/PP1P1PPP/R1BQKB1R w KQkq - 0 4; v=-338
rnbqkb1r/pp2pppp/2pp1n2/8/2P1P3/2N5/PP1P1PPP/R1BQKB1R w KQkq - 2 4; v=-338
rnbqkb1r/ppppp2p/5np1/5p2/3P3P/6P1/PPP1PP2/R1BQKBNR w KQkq - 1 4; v=-338
rnb1kbnr/pp1ppp1p/6p1/q1p5/4P3/2P2N2/PP1P1PPP/R1BQKB1R w KQkq - 1 4; v=-338
r1bqkbnr/pppp1pp1/2n4p/4p3/2B1P3/5N2/PPPP1PPP/R1BQK2R w KQkq - 0 4; v=-338
rnbqk1nr/p1pp1ppp/1p2p3/8/1b1PP3/5N2/PPP2PPP/R1BQKB1R w KQkq - 1 4; v=-338
r1bqk1nr/pppp1ppp/2nb4/1B2p3/4P3/5N2/PPPP1PPP/R1BQK2R w KQkq - 4 4; v=-339
rnb1kbnr/pp1pp1pp/1qp5/5pB1/3P4/4P3/PPP2PPP/RN1QKB1R w KQkq - 1 4; v=-339
rnbqkb1r/pp2pppp/2pp1n2/8/2PP4/2N5/PP2PPPP/R1BQKB1R w KQkq - 0 4; v=-339
rn1qkb1r/pbpppppp/1p3n2/8/3PP3/3B4/PPP2PPP/RNBQK2R w KQkq - 3 4; v=-339
rnbqkb1r/pp1p1ppp/4pn2/2pP4/2P5/8/PP2PPPP/RNBQKB1R w KQkq - 0 4; v=-339
rnbqk1nr/pp1pppbp/2p3p1/8/3PP3/5N2/PPP2PPP/R1BQKB1R w KQkq - 0 4; v=-340
rnbqkbnr/ppp2p1p/4p1p1/3p4/3PP3/8/PPPN1PPP/R1BQKB1R w KQkq - 0 4; v=-342
r1bqkb1r/ppp1pppp/2n2n2/3p4/2PP4/4P3/PP3PPP/RNBQKB1R w KQkq - 1 4; v=-342
rnbqkbnr/pp1p1ppp/8/2pPp3/2P5/8/PP2PPPP/R1BQKBNR w KQkq - 0 4; v=-342
rnbqk1nr/pppp1pbp/4p1p1/8/3PP3/2N5/PPP2PPP/R1BQKB1R w KQkq - 2 4; v=-342
rnbqkbnr/pp3ppp/2pp4/4p3/2PPP3/8/PP3PPP/RNBQKB1R w KQkq - 0 4; v=-342
rnbqkb1r/ppp1pp1p/3p1np1/8/2BPP3/8/PPP2PPP/R1BQK1NR w KQkq - 2 4; v=-342
rnbqkbnr/p3pppp/1pp5/3pP3/3P4/8/PPP2PPP/R1BQKBNR w KQkq - 0 4; v=-342
rnbqk1nr/pp1pppbp/2p3p1/8/2PP4/2N5/PP2PPPP/R1BQKB1R w KQkq - 0 4; v=-343
rnbqkb1r/p1pp1ppp/1p2pn2/8/2PP4/2N5/PP2PPPP/R1BQKB1R w KQkq - 0 4; v=-343
rnbqkbnr/p2ppp1p/1p4p1/2p5/3PP2P/8/PPP2PP1/RNBQKB1R w KQkq - 0 4; v=-343
rnbqkbnr/pp3ppp/2ppp3/8/3PP3/5N2/PPP2PPP/R1BQKB1R w KQkq - 0 4; v=-343
r1bqkb1r/pppp1ppp/2n1pn2/8/3PP3/5N2/PPP2PPP/R1BQKB1R w KQkq - 2 4; v=-344
rnbqkb1r/1ppp1ppp/p3pn2/8/2PPP3/8/PP3PPP/R1BQKBNR w KQkq - 1 4; v=-344
rnbqkb1r/pp2pppp/2pp1n2/8/3PP3/2N5/PPP2PPP/R1BQKB1R w KQkq - 2 4; v=-344
rnbqkbnr/pp2pp1p/3p2p1/2p5/4P3/2P2N2/PP1P1PPP/R1BQKB1R w KQkq - 0 4; v=-344
r1bqkb1r/ppp1pppp/2n2n2/3p4/2PP4/2N5/PP2PPPP/R1BQKB1R w KQkq - 0 4; v=-344
rnbqkb1r/pp1ppppp/5n2/2p5/2PP1B2/8/PP2PPPP/RN1QKB1R w KQkq - 0 4; v=-344
rn1qkb1r/pbpppppp/5n2/1p4B1/3P4/5N2/PPP1PPPP/R2QKB1R w KQkq - 2 4; v=-344
rnbqkb1r/pppp1p1p/4pnp1/8/2P1P3/2N5/PP1P1PPP/R1BQKB1R w KQkq - 0 4; v=-344
rnbqk2r/ppppbppp/4pn2/8/2PP4/2N5/PP2PPPP/R1BQKB1R w KQkq - 2 4; v=-345
rn1qkb1r/ppp1pppp/5n2/3p1b2/2PP4/1Q6/PP2PPPP/RNB1KB1R w KQkq - 3 4; v=-345
rn1qkbnr/pbpp1ppp/1p2p3/8/2PP4/2N5/PP2PPPP/R1BQKB1R w KQkq - 2 4; v=-345
rnbqkb1r/pppp1ppp/4p3/3nP3/8/5N2/PPPP1PPP/R1BQKB1R w KQkq - 1 4; v=-345
r1bqkb1r/pppppp1p/2n2np1/8/2PP4/5P2/PP2P1PP/RNBQKB1R w KQkq - 1 4; v=-345
rnbqkb1r/pp1ppppp/2p5/3nP3/3P4/8/PPP2PPP/R1BQKBNR w KQkq - 1 4; v=-345
rnb1kbnr/ppq1pppp/2pp4/8/2PPP3/8/PP3PPP/RNBQKB1R w KQkq - 1 4; v=-345
rnbqkb1r/ppppppp1/7p/4P3/4n3/5N2/PPPP1PPP/R1BQKB1R w KQkq - 1 4; v=-346
r1bqk1nr/pppp1ppp/2n5/2b1p3/2B1P3/5N2/PPPP1PPP/R1BQK2R w KQkq - 4 4; v=-346
rnbqkbnr/ppp1pp1p/6p1/3p4/3PPP2/8/PPP3PP/R1BQKBNR w KQkq - 0 4; v=-346
rnbqkb1r/ppppp2p/5np1/5p2/3P1B2/4P3/PPP2PPP/RN1QKB1R w KQkq - 0 4; v=-346
rnbqkb1r/p2ppppp/2p2n2/1p4B1/3P4/4P3/PPP2PPP/RN1QKB1R w KQkq - 0 4; v=-346
rnbqkb1r/ppp1pp1p/3p1np1/8/2PP4/2N5/PP2PPPP/R1BQKB1R w KQkq - 1 4; v=-346
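Each line above is a six-field FEN followed by a `v=` annotation holding the SF11 score, apparently in centipawns from White's point of view (White, to move, is the side missing a knight, hence the negative values). The Python sketch below assumes exactly that `FEN; v=<int>` layout. It parses the lines, counts each side's knights to confirm the odds, sorts by score and drops the k most extreme positions from each end, the kind of culling suggested later in the thread. The function names are illustrative and not part of the linked repository.

```python
from typing import List, Tuple

def parse_line(line: str) -> Tuple[str, int]:
    """Split 'FEN; v=-179' into (fen, score_cp)."""
    fen, _, comment = line.partition(";")
    key, _, value = comment.strip().partition("=")
    if key != "v":
        raise ValueError(f"unexpected annotation: {comment!r}")
    return fen.strip(), int(value)

def knight_counts(fen: str) -> Tuple[int, int]:
    """Count (white, black) knights in the piece-placement field of a FEN."""
    placement = fen.split()[0]
    return placement.count("N"), placement.count("n")

def cull_extremes(lines: List[str], k: int) -> List[Tuple[str, int]]:
    """Sort positions by score and drop the k lowest and the k highest."""
    positions = sorted((parse_line(l) for l in lines if l.strip()), key=lambda fv: fv[1])
    if 2 * k >= len(positions):
        return []
    return positions[k:len(positions) - k]

sample = [
    "rnbqkb1r/pppp1ppp/5n2/4p3/3PP3/3B4/PPP2PPP/R1BQK1NR w KQkq - 0 4; v=-179",
    "rnbqkb1r/pppp1ppp/5n2/4p3/3PP3/3B4/PPP2PPP/RNBQK2R w KQkq - 0 4; v=-212",
    "r1bqkbnr/pp1ppppp/n7/2p5/4P3/2P2N2/PP1P1PPP/R1BQKB1R w KQkq - 2 4; v=-261",
]
for fen, v in cull_extremes(sample, k=1):
    print(v, knight_counts(fen))
```

With the full 3800-line file, a call such as `cull_extremes(lines, k=10)` would trim ten positions from each end.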
-
- Posts: 6991
- Joined: Thu Aug 18, 2011 12:04 pm
Re: Stockfish Handicap Matches
90% of coding is debugging, the other 10% is writing bugs.
-
Re: Stockfish Handicap Matches
lkaufman wrote: ↑Tue Jun 23, 2020 8:53 pm
These results are really quite incredible. ... Am I missing something here?
Do the new knight-odds set with engines in the elo range of Arasan?
-
Re: Stockfish Handicap Matches
Rebel wrote: ↑Tue Jun 23, 2020 11:31 pm
Do the new knight-odds set with engines in the elo range of Arasan?
Yes, with some bias towards including engines rated somewhat below Arasan rather than above, since I expect the new set to be, on average, more favorable for the weaker engine than the old set. I'm also running some related experiments, which I'll report on later.
-
Re: Stockfish Handicap Matches
lkaufman wrote: ↑Tue Jun 23, 2020 11:36 pm
Yes, with some bias towards including engines rated somewhat below Arasan rather than above, since I expect the new set to be, on average, more favorable for the weaker engine than the old set. I'm also running some related experiments, which I'll report on later.
Quicky with the new knight-odds epd. tc=40/40
Code: Select all
# ENGINE : RATING POINTS PLAYED (%)
1 Arasan_22 : 3368.9 86.5 100 86.5%
2 Fruit_2.1 : 3187.7 139.0 200 69.5%
3 Stockfish_11 : 3043.4 74.5 300 24.8%
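Since this was a quick run, the samples are small (Arasan's 86.5% comes from 100 games). A rough way to gauge how far such a score could move is a normal-approximation interval. The Python sketch below uses p(1 − p)/n as the variance of the mean score, which is an upper bound because draws can only reduce the variance of a 0/½/1 result, and then maps the interval ends through the logistic Elo formula. It is a back-of-the-envelope check under those assumptions, not how any particular rating tool computes its error bars.

```python
import math

def elo_from_score(p: float) -> float:
    """Logistic Elo difference implied by score fraction p (0 < p < 1)."""
    return 400.0 * math.log10(p / (1.0 - p))

def elo_interval(p: float, n_games: int, z: float = 1.96):
    """Approximate 95% interval (low, high) in Elo for score p over n_games.

    Uses sqrt(p * (1 - p) / n) as the standard error of the mean score; with
    draws the true per-game variance is smaller, so the interval is conservative.
    """
    se = math.sqrt(p * (1.0 - p) / n_games)
    lo = max(p - z * se, 1e-6)
    hi = min(p + z * se, 1.0 - 1e-6)
    return elo_from_score(lo), elo_from_score(hi)

# Arasan_22 in this quick run: 86.5 points from 100 games.
low, high = elo_interval(0.865, 100)
print(f"point {elo_from_score(0.865):.0f} Elo, roughly {low:.0f} to {high:.0f} Elo")
```

The interval spans a couple of hundred Elo, which is why a single 100-game match can only separate large effects.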
-
Re: Stockfish Handicap Matches
Rebel wrote: ↑Wed Jun 24, 2020 12:17 am
Quicky with the new knight-odds epd. tc=40/40
Code: Select all
# ENGINE : RATING POINTS PLAYED (%)
1 Arasan_22 : 3368.9 86.5 100 86.5%
2 Fruit_2.1 : 3187.7 139.0 200 69.5%
3 Stockfish_11 : 3043.4 74.5 300 24.8%
Wow, a day and night difference from the old knight-odds set. With the old set Arasan and Fruit averaged 39%; here they averaged 78%, an improvement of 39 percentage points, or roughly 273 elo (using 7 elo per percent)!! It seems that the new (500, actually 230-position) set really is something like a pawn harder for White than the old set, if a pawn is worth something like 273 elo, which is at least in the right ballpark. I looked over many of the positions in the new set, and except for a few near either extreme, most are pretty reasonable for knight odds. A few of the outliers contained moves that didn't seem plausible for a grandmaster to play in normal chess (perhaps the dataset included some amateur openings), and a few others were reasonable for normal chess but made no sense with that particular knight missing. But there were only a few such positions, roughly balanced between White and Black errors, so the full set is a pretty good simulation of knight-odds play, though culling ten or so from each end of the list would make it a bit better. So it seems we need somewhat weaker engines than Fruit 2.1 for knight odds at this time limit; perhaps it would be close to balanced with Fruit 2.1 at 40/10.
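The "7 elo per percent" rule of thumb is the slope of the logistic Elo curve near an even score: the derivative of 400·log10(p / (1 − p)) is 400 / (ln 10 · p(1 − p)), which at p = 0.5 comes to about 6.95 Elo per percentage point. Away from 50% the curve gets steeper, so the linear rule understates large swings. This Python sketch compares both for the 39% to 78% jump; it only illustrates the arithmetic and treats the two averaged scores as if they were single match results.

```python
import math

def elo_from_score(p: float) -> float:
    """Logistic Elo difference implied by score fraction p (0 < p < 1)."""
    return 400.0 * math.log10(p / (1.0 - p))

def elo_per_percent(p: float) -> float:
    """Slope of the logistic curve at p, in Elo per percentage point."""
    return 400.0 / (math.log(10) * p * (1.0 - p)) / 100.0

def linear_gain(old_pct: float, new_pct: float, rate: float = 7.0) -> float:
    """Rule-of-thumb gain: percentage points times a fixed Elo-per-percent rate."""
    return (new_pct - old_pct) * rate

old, new = 0.39, 0.78  # averaged Arasan/Fruit scores, old set vs new set
print(f"slope at 50%: {elo_per_percent(0.5):.2f} Elo per percent")
print(f"linear rule : {linear_gain(39, 78):.0f} Elo")
print(f"logistic    : {elo_from_score(new) - elo_from_score(old):.0f} Elo")
```

The logistic version gives close to 300 Elo rather than 273, which still supports the "about a pawn" reading.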
-
- Posts: 5582
- Joined: Wed Sep 05, 2018 2:16 am
- Location: Moving
- Full name: Jorge Picado
Re: Stockfish Handicap Matches
lkaufman wrote: ↑Wed Jun 24, 2020 3:04 am
Wow, a day and night difference from the old knight-odds set. ... So it seems we need somewhat weaker engines than Fruit 2.1 for knight odds at this time limit; perhaps it would be close to balanced with Fruit 2.1 at 40/10.
It could be that certain engines do not know how to trade when they are a knight up. Try Spike or even Thinker against Stockfish: they are much weaker, but they know how to take advantage of being a piece up. They keep trading pieces when they are a piece up, making it harder for the opponent.
Do NOT worry and be happy, we all live a short life
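The idea behind the point about trading is the familiar advice to exchange pieces when ahead in material, since the extra piece matters relatively more as the board empties. Neither Spike's nor Thinker's code is discussed in this thread, so the Python below is only a hypothetical illustration of such an evaluation term; the piece values (300/300/500/900), the 0.25 weight and the function name are all assumptions.

```python
# Hypothetical "trade down when ahead" evaluation term (illustrative values only).
KNIGHT, BISHOP, ROOK, QUEEN = 300, 300, 500, 900
START_NON_PAWN = 2 * (2 * KNIGHT + 2 * BISHOP + 2 * ROOK + QUEEN)  # both sides, pawns excluded

def trade_bonus(white_np: int, black_np: int, weight: float = 0.25) -> float:
    """Centipawn bonus from White's point of view (negative favours Black).

    white_np / black_np: non-pawn material of each side in centipawns.
    The side that is ahead gains more the more material has been traded off.
    """
    advantage = white_np - black_np
    traded = 1.0 - (white_np + black_np) / START_NON_PAWN  # 0.0 at the start, 1.0 with bare kings
    return weight * advantage * traded

# White gives knight odds, so it starts 300 cp short of non-pawn material.
full = 2 * KNIGHT + 2 * BISHOP + 2 * ROOK + QUEEN
print(trade_bonus(full - KNIGHT, full))                  # little traded yet
print(trade_bonus(full - KNIGHT - QUEEN, full - QUEEN))  # after the queens come off
```

Because the bonus for the side with the extra piece grows as material comes off, a search using it would tend to steer toward exchanges.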
-
Re: Stockfish Handicap Matches
Chessqueen wrote: ↑Wed Jun 24, 2020 5:19 am
It could be that certain engines do not know how to trade when they are a knight up. Try Spike or even Thinker against Stockfish: they are much weaker, but they know how to take advantage of being a piece up. They keep trading pieces when they are a piece up, making it harder for the opponent.
Yes, this relates to the following test I ran. Three engines were involved: Komodo 14 (default settings) giving knight odds (from a small book I had; I may rerun later with the new book by ChrisW), Arasan 14 (64-bit), and Stockfish 11 at Skill Level 10. All engines ran on four threads for greater variety, at a time limit of 3' + 2". I chose Stockfish Skill Level 10 because it was evenly matched with Arasan 14 in normal chess (with Contempt set to zero): the direct match came out 136 to 135, plus one elo for Arasan, so we can call them dead equal. This is surprising enough, because Arasan 14 is only about 100 elo below Fruit 2.1 and, on four threads at this standard blitz time limit, would probably be a good match for Carlsen or Nakamura. SF Skill Level 10 is presumably intended for average amateurs, being in the middle of the Skill range, so this is pretty funny, but that's not the point of my test. When I ran Komodo 14 giving knight odds to Arasan 14, Komodo 14 won by 146 elo. But when Komodo 14 tried to give knight odds to Stockfish Skill Level 10 (with Contempt set to its minimum of -100 to encourage trading), Stockfish won by 40 elo in 300 games. So Stockfish Skill Level 10 did 186 elo better than Arasan 14 at knight odds, despite being dead even with it in normal chess and despite being intended as an "amateur" level. Part of this is due to Contempt, part to simply better understanding of how to evaluate chess positions, and part to the method used to weaken the Skill levels, which causes many small errors (up to a pawn) but no huge blunders. So Stockfish Skill Levels appear to be better simulations of human grandmasters (at least for handicap play) than normal engines of the proper rating, though it's not yet clear whether they fully simulate the proper level. I had previously tried Komodo skill levels for this same purpose, but they don't serve as well, because the method they use to weaken play (a partly crippled search) will sometimes cause mistakes large enough to lose even with an extra piece. More tests are needed.
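The key property attributed to the Skill Levels here is a weakening that produces many small errors of up to about a pawn but no large blunders. A minimal Python sketch of that idea, assuming the engine has already scored its candidate moves, is to choose randomly only among moves within a fixed loss of the best one. This is a hypothetical illustration: Stockfish's actual Skill Level code works differently in detail, and `pick_weakened_move` and the 100 cp cap are made up for the example.

```python
import random
from typing import List, Optional, Tuple

def pick_weakened_move(scored_moves: List[Tuple[str, int]],
                       max_loss_cp: int = 100,
                       rng: Optional[random.Random] = None) -> str:
    """Choose a move at random among those scoring within max_loss_cp of the best.

    scored_moves: (move, score in centipawns) pairs from the engine's own search.
    By the engine's own evaluation, no choice loses more than max_loss_cp, so the
    weakened player makes small errors but never a large blunder.
    """
    rng = rng or random.Random()
    best = max(score for _, score in scored_moves)
    candidates = [move for move, score in scored_moves if best - score <= max_loss_cp]
    return rng.choice(candidates)

moves = [("e4", 30), ("d4", 25), ("Nf3", -20), ("g4", -150)]
rng = random.Random(1)
picks = {pick_weakened_move(moves, rng=rng) for _ in range(200)}
print(sorted(picks))  # g4 (180 cp worse than e4) is never among the picks
```

A weakening that instead cripples the search can misjudge a move by several pawns, which matches the explanation of why such skill levels sometimes lose even with an extra piece.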