Richard Allbert wrote: Hi,
I'm currently testing new versions of Lime - having squeezed a whole 50 points out of it (could even be less), but I seem to have something wrong that I can't put my finger on.
I've been adjusting the move ordering scores using test suites to try to get some more strength, and on one position, IQ4 no. 96, Lime_v64 finds the winning line from depth 8.
Lime_v65 (with different move ordering) doesn't find the winning move until depth 10.
Other than the ordering scores, the search is performed exactly the same.
Does this indicate a bug? What should I look for?
How do you (engine programmers 99% stronger than me) tune your move ordering?
Thanks
Richard
I gather statistics on the move number where a beta cutoff occurs.
That way, after a search I can see the average number of moves tried before the cutoff move is found.
So the closer to 1.00 that average is, the better.
This gives a rough indication of how good the move ordering is. It worked for me at the early stages of development, but as the move ordering improves, it becomes more and more difficult to see an improvement in this number.
Also, the number just gives an indication. The only way to verify whether the changes are good or bad is to run a lot of test games.
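In case it's useful, here is roughly what that bookkeeping can look like. This is just a minimal sketch in C; the names (stats_cutoffs, record_cutoff, and so on) are made up for illustration, not taken from Pupsi or any other engine.

    #include <stdio.h>

    static unsigned long stats_cutoffs  = 0; /* beta cutoffs seen so far */
    static unsigned long stats_move_sum = 0; /* sum of the 1-based move
                                                numbers where they occurred */

    /* Call from the search whenever a move fails high; move_number is
       the 1-based position of that move in the ordered move list. */
    void record_cutoff(int move_number)
    {
        stats_cutoffs++;
        stats_move_sum += (unsigned long)move_number;
    }

    /* After the search, print the average move number of the cutoff move.
       A perfect ordering fails high on move 1, so closer to 1.00 is better. */
    void print_ordering_stats(void)
    {
        if (stats_cutoffs > 0)
            printf("average cutoff move: %.2f over %lu cutoffs\n",
                   (double)stats_move_sum / (double)stats_cutoffs,
                   stats_cutoffs);
    }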
And now slightly off topic ...
I have found it to be very important to run test matches against a diverse set of opponents.
Warning! Rambling based on pure speculation!
The reason for this is that computer chess is a "game of bottlenecks".
When two engines play, the game tends to gravitate towards positions where they disagree on the evaluation.
They typically end up in situations where each of them thinks they are slightly better.
In these situations, the engine that is "right" about the evaluation tends to do better.
A small example to illustrate: my engine had poor passed pawn and king safety evaluation. Against an engine with better passed pawn evaluation it would therefore lose, because the opponent would, more often than not, create a passed pawn, thereby gaining an advantage and typically winning the game.
Therefore my "bottleneck" against this engine was my poor passed pawn evaluation.
In essence, poor passed pawn evaluation was preventing me from improving my performance against this particular engine.
Now, there are typically many, many bottlenecks hindering improvement.
Lack of search depth caused by poor move ordering could be a bottleneck!
If I make a change to my evaluation function improving the passed pawn evaluation, then how much improvement I gain against an opponent depends on whether or not passed pawns were a bottleneck against that particular opponent!
If they were, the games now tend to steer towards another "disagreement", and if that is another bottleneck for me, then apparently no strength is gained!
There might actually even be a slight decrease in performance against this engine, because the added evaluation makes the engine run slightly slower.
When I added better pawn evaluation to Pupsi2, the test results ranged from a huge gain against some opponents all the way to slightly worse performance against others. The net total was a huge plus, but if I had tested against only a few unluckily selected opponents, I might have thrown the changes away, thinking they were bad!
There it is! My "Simplified theory of bottlenecks in computer chess"
Summa summarum:
Test against a relatively large number of opponents.
I test against 10 opponents selected from the rankings in the "UCI Engines League" - more specifically, the 10 engines just above Pupsi in the rankings. I then run 80 Noomen games against each at 1+1.
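As a side note on reading such results: the standard logistic formula converts a score fraction into an Elo estimate, which makes it easy to compare the pooled result against the per-opponent swings. A tiny C sketch; the score in main() is invented purely for illustration.

    #include <math.h>
    #include <stdio.h>

    /* Standard logistic Elo estimate from a score fraction 0 < s < 1. */
    double elo_from_score(double s)
    {
        return -400.0 * log10(1.0 / s - 1.0);
    }

    int main(void)
    {
        /* Invented example: 435.5 points out of the 800 games above. */
        double s = 435.5 / 800.0;
        printf("score %.3f -> %+.0f Elo\n", s, elo_from_score(s));
        return 0;
    }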
The end
Kind regards,
Jesper