I'm trying to test a change to my engine, so I'm running a head-to-head match with cutechess-cli (just the new version playing the old/current version). The test completed (SPRT stopped at about 800 games) and reported "Elo difference: 41.7 +/- 21.9" (for what it's worth, I ran another test of 10,000 games and got a reported difference of around 35 Elo).
But when I take that PGN file and run it through Ordo, it reports only a 5 Elo difference, and the number of games it says were played is way off (2300 instead of 800). I'm not sure why that is. I've looked at the PGN file and it contains 803 completed games, plus 2 incomplete games.
If it helps, my cutechess command:
cutechess-cli -tournament gauntlet -concurrency 3 -pgnout result.pgn -engine conf=khepricandidate tc=0/10+0.1 -engine conf=khepribase tc=0/10+0.1 -draw movenumber=40 movecount=5 score=8 -resign movecount=5 score=1000 -each proto=uci -openings file="openings-6ply-1000.pgn" order=random policy=round -repeat -rounds 2000 -games 2 -sprt elo0=0 elo1=10 alpha=0.05 beta=0.05
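To double-check the game count independently of both tools, something like the following rough sketch could tally the results straight from the PGN headers and turn the score into an Elo difference with the usual logistic formula. It assumes every game starts with an `[Event ...]` tag and that the player names in the `White`/`Black` tags are known. The function names are made up for illustration, and this is not how cutechess-cli or Ordo compute their numbers internally.

```python
import math
import re

# Matches a PGN tag pair line such as: [White "khepricandidate"]
TAG = re.compile(r'^\[(\w+)\s+"([^"]*)"\]\s*$')


def read_headers(pgn_text):
    """Return one dict of tag pairs per game (assumes each game starts with an Event tag)."""
    games, current = [], None
    for line in pgn_text.splitlines():
        match = TAG.match(line.strip())
        if not match:
            continue  # movetext or blank line
        key, value = match.groups()
        if key == "Event":
            current = {}
            games.append(current)
        if current is not None:
            current[key] = value
    return games


def summarize(games, player):
    """Return (wins, losses, draws, unfinished) from the point of view of `player`."""
    wins = losses = draws = unfinished = 0
    for game in games:
        if player not in (game.get("White"), game.get("Black")):
            continue
        result = game.get("Result", "*")
        if result == "1/2-1/2":
            draws += 1
        elif result in ("1-0", "0-1"):
            winner = game.get("White") if result == "1-0" else game.get("Black")
            if winner == player:
                wins += 1
            else:
                losses += 1
        else:
            unfinished += 1  # "*" or anything unexpected
    return wins, losses, draws, unfinished


def elo_from_score(score):
    """Logistic Elo difference for a score fraction strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def report(path, player):
    """Print the tally for `player` from the PGN file at `path`."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        games = read_headers(handle.read())
    wins, losses, draws, unfinished = summarize(games, player)
    finished = wins + losses + draws
    print(f"games in file: {len(games)}, finished: {finished}, unfinished: {unfinished}")
    print(f"+{wins} -{losses} ={draws}")
    if 0 < wins + 0.5 * draws < finished:
        print(f"Elo difference: {elo_from_score((wins + 0.5 * draws) / finished):.1f}")
```

Calling `report("result.pgn", "khepricandidate")` would print the tally, assuming the name in the PGN tags matches the engine's `conf` name. If this count agrees with cutechess-cli's figure, the discrepancy is more likely in how the file is being read by the rating tool than in the file itself.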