For the last few weeks I have devoted myself strictly to optimizing my evaluation function using the Texel method. At first I was almost ready to chuck the idea: the first set of weights I got from minimizing the error not only gave no improvement, it lost Elo!!! I did eventually find a bug in my code that caused this result, but it still left me with a sense of doubt about whether this was a worthwhile endeavor.

So, I decided to conduct an experiment to prove to myself that the method works. Taking 500K positions from games played by computers with an Elo of 2800 or greater, I started up my iterative process that minimizes the squared error, to see whether it affects the Elo or not. This process is described on the Chess Programming wiki under the Texel tuning method. The only design change I made is to go through the weights in random order on each iteration to try to spread out the changes.

While iteratively searching for better and better solutions, I saved intermediate sets of weights along the way. Initially I saved weights after 5, 10, 15, 20, 25, 30, 50, 75, 100 and 150 iterations, and my routine converged after 157 iterations. I then set up all those weights in a round-robin tournament along with the iteration 0 progenitor (paragon) to see how they relate to one another in terms of Elo. I was trying to show myself that as the error was minimized, at least up to the point of overfitting, the Elo would also go up. Here are the results of that first experiment:
Rank  Name     Elo  +/-  Games  Score  Draw%
   1  pass75     9   11   1100  51.4%  72.2%
   2  pass158    4   10   1100  50.6%  75.5%
   3  pass100    3   10   1100  50.5%  75.6%
   4  pass50     3   10   1100  50.4%  75.5%
   5  pass15     3   10   1100  50.4%  75.5%
   6  pass30     2   10   1100  50.3%  76.5%
   7  pass5     -1   10   1100  49.9%  74.9%
   8  pass40    -2   10   1100  49.8%  77.7%
   9  pass10    -3   10   1100  49.5%  76.3%
  10  pass25    -5   10   1100  49.3%  76.6%
  11  paragon   -6   11   1100  49.2%  73.6%
  12  pass20    -8   10   1100  48.8%  75.8%
These results weren't quite what I was hoping for, but they did show that, in a very general sense, the Elo goes up with more passes (i.e. less error). The weights in the upper half of the table add up to 428 passes in aggregate, while those in the lower half add up to only 100, with paragon counting as pass 0. As you can see, though, no ordering pops out that would allow someone to say a 40-pass solution is better than a 15-pass solution. Still, it would appear I could pick up 15 Elo over paragon by choosing pass 75.

I repeated this experiment with a different data set. This time I sampled 2 million records from human grandmaster games (i.e. > 2500 Elo) and preserved the weights after 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140 and 150 passes, and my routine converged on the 157th pass. I then set up another round-robin tournament, and here are the results:
Rank  Name     Elo  +/-  Games  Wins  Losses  Draws  Points  Score  Draw%
   1  pass110   12    9   1024   116      82    826   529.0  51.7%  80.7%
   2  pass50     7   10   1024   117      97    810   522.0  51.0%  79.1%
   3  pass40     6   10   1024   112      94    818   521.0  50.9%  79.9%
   4  pass150    5   10   1024   121     105    798   520.0  50.8%  77.9%
   5  pass60     5    9   1024   105      89    830   520.0  50.8%  81.1%
   6  pass157    5   10   1024   116     101    807   519.5  50.7%  78.8%
   7  pass140    5   10   1024   123     108    793   519.5  50.7%  77.4%
   8  pass70     3    9   1024   104      94    826   517.0  50.5%  80.7%
   9  pass100    1   10   1024   109     105    810   514.0  50.2%  79.1%
  10  pass20     0   10   1024   110     109    805   512.5  50.0%  78.6%
  11  pass130   -0   10   1024   103     104    817   511.5  50.0%  79.8%
  12  pass90    -5   10   1024   106     120    798   505.0  49.3%  77.9%
  13  pass80    -5   10   1024    97     113    814   504.0  49.2%  79.5%
  14  pass30    -6   10   1024   103     121    800   503.0  49.1%  78.1%
  15  pass120   -7    9   1024    90     110    824   502.0  49.0%  80.5%
  16  pass10   -13   11   1024   111     148    765   493.5  48.2%  74.7%
  17  paragon  -15   10   1024   101     144    779   490.5  47.9%  76.1%
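
As a side note on how to read the Elo and +/- columns: they can be reproduced, at least approximately, from the win/loss/draw counts. The sketch below assumes the usual logistic Elo formula and a 95% normal-approximation interval on the score. Whether the tournament manager used here computes it exactly this way is an assumption, and the function names are made up for illustration.

```python
import math


def elo_from_score(score):
    """Elo difference implied by a score fraction strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins, losses, draws, z=1.96):
    """Return (elo, margin) for one player's results.

    The margin is half the width of the Elo interval obtained by moving the
    score up and down by z standard errors (z=1.96 is roughly 95%).
    """
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result around the mean score.
    variance = (wins * (1.0 - score) ** 2
                + losses * (0.0 - score) ** 2
                + draws * (0.5 - score) ** 2) / games
    half_width = z * math.sqrt(variance / games)
    low = elo_from_score(score - half_width)
    high = elo_from_score(score + half_width)
    return elo_from_score(score), (high - low) / 2.0


# Counts taken from the pass110 row of the table above.
elo, margin = elo_with_margin(wins=116, losses=82, draws=826)
print(f"pass110: {elo:+.1f} +/- {margin:.1f}")
```

For the pass110 row this comes out at roughly +12 with a margin of about 9, in line with the table. With margins of around 10 Elo on every entry, most neighbouring rows overlap heavily, which is worth keeping in mind when reading the ranking.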
The results of this second experiment are very similar to the first. So, while the Texel method does work generally, you may need to test all of the solutions it creates to determine which are better. Of course, my testing was limited to 100 games per round in the first experiment and 64 games per round in the second, so it's very possible that the results would change significantly with more games per round. Still, I doubt they would ever line up in order such that higher passes (i.e. less error) were always better than lower passes. In the meantime, it looks like I may be able to pick up 27 Elo!
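
For readers who want to try a similar experiment, here is a minimal sketch of the kind of tuning loop described at the top of this post: a local search that nudges each weight by +1 or -1, keeps the change if the mean squared error drops, visits the weights in random order on every pass, and saves snapshots at chosen pass numbers. It is not the actual code behind the results above. The `evaluate` callback, the `(features, result)` position format, the scaling constant `k` and all function names are assumptions for illustration. A real tuner would normally score each position with the engine's quiescence search and fit `k` first.

```python
import random


def sigmoid(score_cp, k=1.0):
    """Map a centipawn score to an expected game result in [0, 1]."""
    return 1.0 / (1.0 + 10.0 ** (-k * score_cp / 400.0))


def mean_squared_error(weights, positions, evaluate, k=1.0):
    """Average squared gap between the game result and the predicted result."""
    total = 0.0
    for features, result in positions:
        total += (result - sigmoid(evaluate(weights, features), k)) ** 2
    return total / len(positions)


def tune(weights, positions, evaluate, snapshot_passes=(), max_passes=1000, seed=0):
    """Local search over integer weights; returns (weights, error, snapshots)."""
    rng = random.Random(seed)
    weights = list(weights)
    best = mean_squared_error(weights, positions, evaluate)
    snapshots = {0: list(weights)}  # pass 0 is the untuned starting point
    for pass_no in range(1, max_passes + 1):
        improved = False
        order = list(range(len(weights)))
        rng.shuffle(order)  # visit the weights in a fresh random order each pass
        for i in order:
            for step in (1, -1):
                weights[i] += step
                err = mean_squared_error(weights, positions, evaluate)
                if err < best:
                    best = err
                    improved = True
                    break  # keep this change and move on to the next weight
                weights[i] -= step  # no gain, undo the change
        if pass_no in snapshot_passes or not improved:
            snapshots[pass_no] = list(weights)
        if not improved:
            break  # converged: a full pass changed nothing
    return weights, best, snapshots


if __name__ == "__main__":
    # Toy data: each position is (feature vector, game result for White).
    toy = [((1, 0), 1.0), ((-1, 0), 0.0), ((0, 1), 0.5)]
    dot = lambda w, f: sum(wi * fi for wi, fi in zip(w, f))
    tuned, err, snaps = tune([0, 0], toy, dot, snapshot_passes=(5, 10), max_passes=20)
    print(tuned, err, sorted(snaps))
```

Each snapshot can then be loaded into the engine as a separate tournament entry, which is how one would compare, say, the pass 10 weights against the pass 0 starting point.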