Interesting. I'll try using varied.bin to run some further tests, just in case it is the opening book.
lithander wrote: Mon Jul 18, 2022 11:29 pm
I use the varied.bin that came with SCID and you can find it here: https://sourceforge.net/p/scid/code/ci/ ... ree/books/
I don't know much about opening books, but I was wondering if your book was maybe too small or otherwise not good at representing what the engines would face under tournament/testing conditions. In the extreme case no book is used at all, and two equal-strength engines basically replay the same match over and over. If one engine happens to win this game from the starting position, then it would win all the time and it would look like this engine is much stronger. So I was just thinking maybe it's something to do with the openings...
I just ran a match at the same time controls you used (5s + 500ms increment, Hash set to 50MB) on an i7-9700K with 7 games in parallel and got this result, which again seems more in line with expectations than your gauntlet results.
Score of Leorik-2.2 vs blunder-8.0.0: 571 - 347 - 502 [0.579] 1420
... Leorik-2.2 playing White: 313 - 153 - 244 [0.613] 710
... Leorik-2.2 playing Black: 258 - 194 - 258 [0.545] 710
... White vs Black: 507 - 411 - 502 [0.534] 1420
Elo difference: 55.3 +/- 14.6, LOS: 100.0 %, DrawRatio: 35.4 %
Still not sure why my results are so skewed compared to yours, but I'll keep investigating, since it seems wrong to just chalk this up to fast time controls. I find it hard to believe that fast time controls alone can explain why two engines that seem to be 30 Elo apart for you are more than 100 Elo apart for me when all of our other conditions are the same. Especially since the gauntlet came out as expected: although Blunder 8.4.5 and 8.0.0 got crushed by most of the gauntlet, 8.4.5 gained 35 Elo, which is about what I'd expect given that 8.4.5 is ~50 Elo stronger in self-play.
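For anyone who wants to sanity-check numbers like the ones in the quoted match result, here is a rough Python sketch of how an Elo difference and its error margin can be derived from a win/loss/draw count. It uses the standard logistic Elo formula and a normal approximation for the 95% interval; I'm assuming that is close to what the match runner does internally, and the function names (`elo_from_score`, `match_summary`) are made up for this example.

```python
import math

# Two-sided 95% quantile of the standard normal distribution.
Z_95 = 1.959964


def elo_from_score(score):
    """Convert an expected score in (0, 1) to an Elo difference (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def match_summary(wins, losses, draws):
    """Summarise a match from the first engine's point of view.

    Assumption: the error margin is half the width of the Elo interval obtained
    by shifting the mean score by +/- 1.96 standard errors.
    """
    n = wins + losses + draws
    w, l, d = wins / n, losses / n, draws / n
    mu = w + 0.5 * d  # mean score per game
    # Per-game variance of the score around the mean (win=1, draw=0.5, loss=0).
    var = w * (1.0 - mu) ** 2 + l * (0.0 - mu) ** 2 + d * (0.5 - mu) ** 2
    se = math.sqrt(var / n)  # standard error of the mean score
    elo = elo_from_score(mu)
    margin = (elo_from_score(mu + Z_95 * se) - elo_from_score(mu - Z_95 * se)) / 2.0
    return {"score": mu, "elo": elo, "margin": margin, "draw_ratio": d}


# Win/loss/draw counts taken from the match result quoted above.
summary = match_summary(571, 347, 502)
print("score {score:.3f}, Elo {elo:+.1f} +/- {margin:.1f}, draws {draw_ratio:.1%}".format(**summary))
```

With the counts from the quoted match this should land close to the reported score, Elo difference, and margin. The margin only covers sampling noise; it says nothing about systematic effects such as a narrow opening book.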