Carbec wrote: ↑Thu Feb 03, 2022 3:09 pm
I did 2 matches and get rather different results...
 
Two ways:
1. Between two engines, play at least 1000 games to make the result meaningful. (I have noticed that 1000 games gives a very good indication; playing more games makes the indication more precise, but doesn't change it much anymore.) Then you will know your relative rating against this engine.
2. Play a gauntlet with your version X against 5 or 6 engines (with 1000 games per match, so 5000-6000 games in total). Note down the result X obtains, such as +35. This means that your engine version X scored +35 Elo against the average of the field. Then play against the exact same engines under the same conditions, but use your engine version X+1. If you now score +100, you have improved by +65 Elo. This is the best method to estimate the Elo range in which your engine would fall if it were tested by a rating list such as CCRL.
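To make the arithmetic behind both methods concrete, here is a minimal Python sketch. It assumes the usual logistic Elo model and a simple normal approximation for the error margin; the function names are my own, and rating tools used by lists such as CCRL may calculate things differently.

```python
import math

def elo_from_score(score):
    """Convert a score fraction (0 < score < 1) into an Elo difference (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def match_elo(wins, draws, losses, z=1.96):
    """Return (elo, low, high) for a match result; z=1.96 gives roughly a 95% interval."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    margin = z * math.sqrt(var / n)
    # Clamp so that lopsided results do not hit log10(0) or a division by zero.
    eps = 1e-6
    clamp = lambda s: min(max(s, eps), 1.0 - eps)
    return (elo_from_score(clamp(score)),
            elo_from_score(clamp(score - margin)),
            elo_from_score(clamp(score + margin)))

# Method 1: one match between two engines (made-up result, for illustration only).
elo, low, high = match_elo(wins=400, draws=300, losses=300)
print(f"{elo:+.1f} Elo (interval {low:+.1f} .. {high:+.1f})")

# Method 2: the same gauntlet run with version X and then with version X+1.
gain = 100 - 35  # +100 for X+1 minus +35 for X, as in the example above
print(f"Improvement: {gain:+d} Elo")
```

Running `match_elo` with the same win/draw/loss ratio but 100 games instead of 1000 shows how much wider the interval gets, which is why a short match can give rather different results from one run to the next.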
Personally I always do 1. first, and pick a target engine that is around the rating I expect my new version to be. If the other engine turns out to be too strong, I pick a weaker one, and the other way around. Once I have found an engine against which the new version scores roughly 50%, I run a gauntlet with 5-6 engines around the rating of that engine. The rating span is about 50 Elo above and below it, so I would expect my engine to end up somewhere in the middle of the gauntlet.
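The selection step can be sketched in a few lines of Python. The engine names and ratings below are placeholders, not real list data; in practice you would fill in ratings from a list such as CCRL, and the function name is my own.

```python
def pick_gauntlet(candidates, target_rating, span=50, size=6):
    """Pick up to `size` engines rated within `span` Elo of `target_rating`.

    `candidates` maps engine name -> rating. The closest engines are preferred,
    and the result is returned sorted from weakest to strongest.
    """
    in_range = [(name, rating) for name, rating in candidates.items()
                if abs(rating - target_rating) <= span]
    in_range.sort(key=lambda item: abs(item[1] - target_rating))
    return sorted(in_range[:size], key=lambda item: item[1])

# Hypothetical field: the engine that scored roughly 50% is assumed to be rated 1850.
field = {"EngineA": 1800, "EngineB": 1840, "EngineC": 1900, "EngineD": 1700}
print(pick_gauntlet(field, target_rating=1850))
```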
A third way, once you have made some progress: SPRT testing. Test your new version X+1 against the old version X and see if it comes out better in CuteChess. Search Rustic's site (listed in my sig); I have a page online on how to do this.
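For a feel of what an SPRT does with the running score, here is a minimal Python sketch. It uses a normal approximation of the log-likelihood ratio and is not necessarily the exact calculation CuteChess performs; the parameter names and default values (`elo0`, `elo1`, `alpha`, `beta`) are assumptions chosen for illustration.

```python
import math

def expected_score(elo):
    """Expected score for a given Elo advantage (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))

def sprt(wins, draws, losses, elo0=0.0, elo1=5.0, alpha=0.05, beta=0.05):
    """Return (llr, verdict) for H0: gain <= elo0 versus H1: gain >= elo1."""
    n = wins + draws + losses
    if n == 0:
        return 0.0, "continue"
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * mean ** 2) / n
    if var == 0.0:
        return 0.0, "continue"  # all games had the same result; no estimate yet
    s0, s1 = expected_score(elo0), expected_score(elo1)
    # Normal approximation of the log-likelihood ratio between H1 and H0.
    llr = n * (s1 - s0) * (2.0 * mean - s0 - s1) / (2.0 * var)
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    if llr >= upper:
        return llr, "accept H1"  # the new version looks stronger
    if llr <= lower:
        return llr, "accept H0"  # no improvement of the size you asked for
    return llr, "continue"

# Check after each batch of games and stop as soon as a bound is crossed.
print(sprt(wins=120, draws=200, losses=100))
```

The point of the method is that the test stops by itself: a clearly better or clearly worse version is decided after few games, while a close call keeps running until the evidence is strong enough.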