Ras wrote: ↑Sun Jun 06, 2021 6:41 pm
mvanthoor wrote: ↑Sun Jun 06, 2021 6:26 pm
50K games would run for days and days on end...
I need about a day for that.
What monstrous quadcore is that? I only have an i7-6700K. After the current tests, which often take more than one night (10 hours or so), I think extending the tests to 20K or 50K games would take at least 2 days.
I do that on my desktop with a CPU cooler that is clearly oversized in relation to the CPU's power consumption, so even sustained full load stays inaudible, currently at 43°C CPU temperature.
Same here. I always use oversized hardware for what I want to do, so it stays cool and thus quiet.
Then you should see that at least in self-play. The problem with other engines is that if your engine has N serious weaknesses and you fix one of them, you still won't see much of a difference, because it will instead lose due to one of the other N-1 weaknesses. While self-play inflates the gains, it does allow you to test individual features.
I noticed. Some engines don't really mind if my engine becomes faster and gains even 2 ply on them; they just keep scoring the same. Other engines are _very_ sensitive to a speed gain in their opponent, though.
I think that engines which get their playing strength mainly from search depth are more sensitive to an opponent that is "catching up" in depth, whereas engines that get most of their strength from a good evaluation function are less sensitive to their opponents becoming faster. (At least, within reason.)
Now I'll probably change my testing protocol again:
1. Create a new feature.
2. Test it in self-play against the dev version, using SPRT (sketched below).
3. When the feature succeeds, merge it into dev.
4. Go back to step 1 for the next feature.
5. When all features are done, test dev against master.
6. Test dev against a small selection of engines.
Step 6 should get me at least in the ballpark with regard to the rating the engine is going to get in a CCRL test.
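For reference, here is a minimal sketch of the SPRT stopping rule behind step 2, using the common normal (GSPRT) approximation of the log-likelihood ratio over win/draw/loss counts. The test parameters (elo0 = 0, elo1 = 5, alpha = beta = 0.05) are just example values, and the code is illustrative rather than the exact implementation any particular testing tool uses:

```rust
// Sketch of an SPRT stopping rule for self-play testing.
// Hypotheses: H0 "the feature gains elo0" vs. H1 "the feature gains elo1".

/// Convert an Elo difference into an expected score (0..1).
fn elo_to_score(elo: f64) -> f64 {
    1.0 / (1.0 + 10f64.powf(-elo / 400.0))
}

/// Log-likelihood ratio of H1 against H0, using the common
/// normal approximation over the observed win/draw/loss counts.
fn llr(wins: u32, draws: u32, losses: u32, elo0: f64, elo1: f64) -> f64 {
    let n = (wins + draws + losses) as f64;
    if n == 0.0 {
        return 0.0;
    }
    let w = wins as f64;
    let d = draws as f64;
    // Per-game score: win = 1, draw = 0.5, loss = 0.
    let mean = (w + 0.5 * d) / n;
    let mean_sq = (w + 0.25 * d) / n;
    let var = (mean_sq - mean * mean).max(1e-9);

    let s0 = elo_to_score(elo0);
    let s1 = elo_to_score(elo1);
    n * (s1 - s0) * (2.0 * mean - s0 - s1) / (2.0 * var)
}

#[derive(Debug, PartialEq)]
enum SprtResult {
    AcceptH1, // the feature is a gain: merge it into dev
    AcceptH0, // no measurable gain: discard or rework the feature
    Continue, // not enough games yet: keep playing
}

/// Check the running result against the SPRT stopping bounds.
fn sprt(wins: u32, draws: u32, losses: u32) -> SprtResult {
    // Example parameters: H0 = +0 Elo, H1 = +5 Elo, alpha = beta = 0.05.
    let (elo0, elo1, alpha, beta): (f64, f64, f64, f64) = (0.0, 5.0, 0.05, 0.05);
    let lower = (beta / (1.0 - alpha)).ln();
    let upper = ((1.0 - beta) / alpha).ln();

    let llr = llr(wins, draws, losses, elo0, elo1);
    if llr >= upper {
        SprtResult::AcceptH1
    } else if llr <= lower {
        SprtResult::AcceptH0
    } else {
        SprtResult::Continue
    }
}

fn main() {
    // Example: check the running result after 3,000 self-play games.
    println!("{:?}", sprt(900, 1400, 700));
}
```

The idea is that the test doesn't run for a fixed number of games: you keep playing and re-checking the LLR, and stop as soon as it crosses either bound. Clear gains (or clear failures) resolve after relatively few games, while only borderline features need the really long runs.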