Here, Zahak 0.2.1 and 0.3.0 underperform against Rustic at 10s+0.1s: Rustic Alpha 2 is clearly stronger than Zahak 0.2.1, while according to CCRL the Elo difference should be only +10 in Zahak's favor. It doesn't matter too much for the gauntlet though, because they don't lose on time or by disconnects, so they can still be used for testing. The only thing is that they won't survive long in the fast gauntlet.
Strangely enough, Loki 1.0.2 and 1.2.0 have exactly the same problem.
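To put that +10 in perspective, here is a minimal sketch of the standard logistic Elo formula, which turns a rating difference into an expected score. The function names are my own and not taken from any rating tool.

```python
import math

def expected_score(elo_diff):
    """Expected score (0..1) for the side that is elo_diff points stronger."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def elo_diff_from_score(score):
    """Inverse of expected_score; score must be strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)

if __name__ == "__main__":
    # A +10 Elo edge is only slightly better than an even match.
    print(f"{expected_score(10):.3f}")
```

So an engine that is 10 Elo stronger should score only a little over 51%. If it ends up clearly below 50% instead, that is an underperformance at this time control.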
I started a gauntlet with 10 engines for Alpha 1 to play against. Then I removed the weakest 4, replaced them with 4 engines that are clearly stronger than Alpha 1, and tested Alpha 1.5. Then I replaced 2 engines and tested Alpha 2. Now I've replaced 2 engines again for Alpha 2.1.100, and again for Alpha 2.2.100. I switch back to a slower time control with fewer games (1200 in total).
You can see the pattern. The overlapping engines between the Rustic versions are responsible for the rating of the new Rustic version. The new engines get their rating according to the performance against the new Rustic version. That way, the gauntlet keeps creeping up in strength while Rustic's strength increases.
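The rotation step could be sketched roughly like this. It is only an illustration of the idea: `rotate_pool`, the engine names, and the ratings are all made-up placeholders, not my actual pool or tooling.

```python
def rotate_pool(pool, newcomers, n_drop):
    """Drop the n_drop lowest-rated engines from pool and add the newcomers.

    pool      -- dict of engine name -> current gauntlet rating (hypothetical)
    newcomers -- list of new, not yet rated engine names
    Returns (overlap, next_pool). The overlap engines carry their ratings
    into the next run and anchor the rating of the new Rustic version;
    the newcomers get rated by their results against that version.
    """
    ranked = sorted(pool, key=pool.get, reverse=True)
    overlap = ranked[:len(ranked) - n_drop]
    return overlap, overlap + list(newcomers)

if __name__ == "__main__":
    # Placeholder names and ratings, purely for illustration.
    pool = {"engine_a": 1650, "engine_b": 1700, "engine_c": 1780, "engine_d": 1820}
    overlap, next_pool = rotate_pool(pool, ["engine_e", "engine_f"], n_drop=2)
    print(overlap)
    print(next_pool)
```

The pool size stays the same as long as the number of newcomers equals `n_drop`.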
If an engine underperforms, but otherwise plays normally (no crashes, no forfeits, etc.), it'll just be replaced earlier. If a new engine does REALLY badly in a fast time gauntlet, I'll just remove its games from the database (that first run will be 500 games), replace it with a different engine, do a 1-on-1, and then add those games to the database.
I'm keeping two gauntlets this way: a 10s+0.1s, and a 60s+0.6s, so I can use both to determine the approximate gain in strength per version. To get a (very) rough idea, I run a 1-on-1 between the latest and the previous version for 2000 games, but I intend to replace that with an SPRT run.
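As a rough illustration of why a 2000-game 1-on-1 only gives a very rough idea, here is a minimal sketch that turns a win/draw/loss count into an Elo estimate with an approximate 95% margin. It uses a plain normal approximation on the match score, which is an assumption on my part and not what the rating tools do internally. The function names and the example numbers are made up.

```python
import math

def elo_from_score(score):
    """Logistic Elo difference for a score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def match_elo(wins, draws, losses, z=1.96):
    """Return (elo, low, high) for a match, using a normal approximation."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    margin = z * math.sqrt(var / n)
    # Clamp so the logarithm stays defined for very lopsided matches.
    low = max(score - margin, 1e-6)
    high = min(score + margin, 1.0 - 1e-6)
    return elo_from_score(score), elo_from_score(low), elo_from_score(high)

if __name__ == "__main__":
    # Hypothetical 2000-game result, not an actual Rustic match.
    elo, low, high = match_elo(wins=700, draws=700, losses=600)
    print(f"{elo:+.1f} Elo (approx. 95% interval {low:+.1f} to {high:+.1f})")
```

Even at 2000 games the interval stays wide compared to the small gains between versions, which is the main reason to move to SPRT.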