I have 4 cores available on my machine, so one of them is dedicated to the lichess bot and the rest to cutechess (running at maximum priority, so the engines won't be disrupted by system scheduler hiccups, which is very important at such fast time controls, where every millisecond matters for overall Elo stability). "tc=inf/2+0.1" may sound pretty fast, but engines in the 2800-2900 Elo range are speedy enough to reach a reasonable depth in ~100 milliseconds, and none of them loses on time. The "resign" adjudication has an aggressive threshold to save time on already decided games ("twosided" is a very important flag: it prevents one of the engines from "cheating" and declaring a win just by having a much higher evaluation than the opponent while the position is still not that conclusive).

nice -n -20 ./cutechess-cli \
-concurrency 3 \
-tournament gauntlet -rounds 5000 -games 2 -repeat -ratinginterval 1 -recover \
-engine cmd="./engines/inanis" name="Inanis DEV" proto=uci option."Crash Files"=true \
-engine cmd="./engines/2800-2900/asymptote" proto=uci \
-engine cmd="./engines/2800-2900/gnucheese-1.00-64" proto=uci \
-engine cmd="./engines/2800-2900/daydreamer" name="daydreamer" proto=uci \
-engine cmd="./engines/2800-2900/Weiawaga" proto=uci \
-engine cmd="./engines/2800-2900/zurichess-luzern-linux-amd64" proto=uci \
-engine cmd="./engines/2800-2900/MinkoChess_1.3_x64" proto=uci \
-engine cmd="./engines/2800-2900/inanis_1_1_1" proto=uci \
-engine cmd="./engines/2800-2900/inanis_1_2_0" proto=uci \
-each tc=inf/2+0.1 timemargin=100 book=/home/ubuntu/books/Perfect2019.bin bookdepth=8 option.Hash=32 \
-resign movecount=3 score=300 twosided=true \
-draw movenumber=50 movecount=5 score=50 \
-maxmoves 100 \
-tb /home/ubuntu/syzygy/
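
To make the "twosided" idea more concrete, here is a minimal Python sketch of a two-sided resign rule using the same numbers as the command above (3 consecutive moves, 300 centipawns). It is a simplified model for illustration only, not cutechess-cli's actual implementation: the function name `resign_winner` and the assumption that each engine reports its score from its own point of view are mine.

```python
from typing import Optional, Sequence


def _last_n_all(scores: Sequence[int], n: int, predicate) -> bool:
    """True if there are at least n scores and the last n all satisfy predicate."""
    if len(scores) < n:
        return False
    return all(predicate(s) for s in scores[-n:])


def resign_winner(
    white_scores: Sequence[int],
    black_scores: Sequence[int],
    movecount: int = 3,
    score: int = 300,
    twosided: bool = True,
) -> Optional[str]:
    """Return "white" or "black" if the game can be adjudicated, else None.

    Scores are in centipawns, each from the reporting engine's own
    perspective (an assumption of this sketch).
    """
    sides = (
        ("white", "black", white_scores, black_scores),
        ("black", "white", black_scores, white_scores),
    )
    for _loser, winner, loser_scores, winner_scores in sides:
        # The losing engine must have seen itself clearly behind for a while.
        loser_agrees = _last_n_all(loser_scores, movecount, lambda s: s <= -score)
        # With twosided, the winning engine must confirm the advantage too.
        winner_agrees = _last_n_all(winner_scores, movecount, lambda s: s >= score)
        if loser_agrees and (winner_agrees or not twosided):
            return winner
    return None
```

In this model, a game where black reports -310, -380, -400 but white only sees +50, +60, +40 is not adjudicated with `twosided=True`, because a single engine's evaluation is not enough to end the game; both engines have to agree that the position is decided.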
I usually play 20,000 games, which takes around 24 hours to complete, and the result is accurate to within ±4 Elo. That margin is still large when the changes are subtle, so sometimes I have to rely on intuition to decide whether something was worth it. I never use SPRT, since I generally don't trust self-testing: engines are different, and we can't just assume that some change (especially in the evaluation area) will work against other opponents.
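
As a rough sanity check on the error margin for 20,000 games, the sketch below estimates a 95% confidence interval for an Elo difference from win/draw/loss counts, using the standard logistic Elo formula and a normal approximation of the mean score. The function names and the example win/draw/loss split (30% draws, equal wins and losses) are my assumptions for illustration, not actual results from these tournaments.

```python
import math


def elo_from_score(s: float) -> float:
    """Convert an expected score in (0, 1) to an Elo difference."""
    return -400.0 * math.log10(1.0 / s - 1.0)


def elo_with_margin(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, margin), where margin is half the width of the
    confidence interval (z = 1.96 corresponds to roughly 95%).

    Assumes the score stays strictly between 0 and 1 after adding the margin.
    """
    n = wins + draws + losses
    s = (wins + 0.5 * draws) / n  # mean score per game
    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    var = (wins * (1.0 - s) ** 2 + draws * (0.5 - s) ** 2 + losses * s ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    low = elo_from_score(s - z * se)
    high = elo_from_score(s + z * se)
    return elo_from_score(s), (high - low) / 2.0


# Hypothetical split of 20,000 games: 7,000 wins, 6,000 draws, 7,000 losses.
elo, margin = elo_with_margin(7000, 6000, 7000)
print(f"Elo difference: {elo:+.1f}, margin: +/-{margin:.1f}")
```

Under this assumed split the margin comes out at about ±4 Elo, which agrees with the figure above. A higher draw rate would shrink it slightly, and a lower one would widen it.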