I find it really hard to test my engine's strength and to tell whether changes are improving it or not. The main issue is that it takes a lot of time to play hundreds of games at longer time controls (3+2 or something like that). I assume strength varies with time control: some engines are better at 1+0 and some are better at longer controls like 10+10.
Also, is playing games to a fixed depth a good way of testing just the evaluation function? I assume some cutoffs will make the engine weaker if I play to a fixed depth, even with the same evaluation. For example, I played a tournament with my engine vs TSCP 1.81 at depth 5 per move and won about 90% of the games with just piece values + PST evaluation.
Are there any standard time controls or levels that are efficient for testing? At a search rate of around 25-30k nodes/s it is a slow process.
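
To make the "how many games" problem concrete, here is a minimal sketch of how a match result could be turned into an Elo difference with a rough 95% error bar. It assumes the standard logistic Elo model and a normal approximation for the score; the function names `elo_from_score` and `match_elo` are hypothetical and not from any particular tool.

```python
import math

def elo_from_score(score):
    """Elo difference implied by a score fraction, using the logistic model."""
    # Clamp to avoid division by zero or log of zero at 0% or 100%.
    score = min(max(score, 1e-6), 1.0 - 1e-6)
    return -400.0 * math.log10(1.0 / score - 1.0)

def match_elo(wins, draws, losses, z=1.96):
    """Return (elo, elo_low, elo_high) for a match result.

    z=1.96 gives an approximate 95% interval (normal approximation).
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result, where a game scores 1, 0.5 or 0.
    variance = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / n
    margin = z * math.sqrt(variance / n)
    return (
        elo_from_score(score),
        elo_from_score(score - margin),
        elo_from_score(score + margin),
    )

if __name__ == "__main__":
    # Hypothetical result, not real data: 100 games, +40 =30 -30.
    elo, low, high = match_elo(40, 30, 30)
    print(f"Elo diff: {elo:+.1f} (95% interval {low:+.1f} to {high:+.1f})")
```

Under this model a 90% score corresponds to roughly +380 Elo, while a small improvement of a few Elo needs a very large number of games before the interval stops straddling zero, which is why slow games make testing so painful.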