Hmmm...

mvk wrote:

    mvk wrote:
    I'm currently in the midst of quantifying some of the book improvement results. One coarse measurement I did last month was to pit my engine with different versions of the book (from 4 different years) against each other, plus one version without a book. No opponent learning was active in these test matches, just a static book. 'b<year>' means that a book from that year was used. Results are:

    Code: Select all
    Rank Name                Elo    +    -  games  score  oppo.  draws
       1 rookie3.9x1-b2015     0   12   12   2000    54%    -22    48%
       2 rookie3.9x1-b2014   -10   10   10   3000    55%    -41    47%
       3 rookie3.9x1-b2013   -34    9    9   4000    57%    -79    44%
       4 rookie3.9x1-b2012   -88   10   10   3000    50%    -88    43%
       5 rookie3.9x1-bnone  -219   13   13   2000    26%    -61    36%

    This encourages me to believe that there is some kind of progression in the books, but also that the effect starts to taper off.

Returning to the above results once more: I have recently completed a higher resolution repeat of the above experiment. For that I picked one book version from every month of the past 4 years (namely, the last version of each month). I then pitted them against each other with cutechess-cli, always using the same version of my program but letting it play with a different book version. The maximum allowable path error for move selection was 0.1 pawn. Each match had 1,000 games at a 90s+0.5s time control, for a total of 159,000 games. There are too many entrants for a full round robin, so each program/book combo 'N' played against versions N-1, N-2, N-4, N-8, N-16, etc. (and, consequently, also against N+1, N+2, N+4, etc.).  Unlike the earlier test, I didn't enter a no-book variant.
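mvk doesn't show how that schedule was built; the following is a minimal sketch of one way to generate such a power-of-two pairing scheme. The 48-version count and the names are assumptions for illustration, and the exact match count depends on which versions and offsets were actually used.

Code: Select all
# Sketch (not mvk's actual scripts): power-of-two pairing schedule.
# Assumption: one book version per month over 4 years (48 versions).

def pairing_schedule(n_versions):
    """Pair each version n with the older versions n-1, n-2, n-4, ..."""
    pairs = []
    for n in range(n_versions):
        d = 1
        while n - d >= 0:
            pairs.append((n, n - d))
            d *= 2
    # Each version n also appears as the older side of later versions'
    # pairs, which is what gives it the N+1, N+2, N+4, ... opponents.
    return pairs

if __name__ == "__main__":
    for new, old in pairing_schedule(48)[:6]:
        print(f"book_{new:02d} vs book_{old:02d}  (1,000-game match)")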
The result is depicted in the plot below. The ratings and margins were calculated by bayeselo with default settings.
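For reference, bayeselo is driven by typed commands; a small driver like the sketch below reproduces a default-settings run. The PGN filename is made up, and the command sequence (readpgn/elo/mm/exactdist/ratings) is the commonly used one, not necessarily mvk's exact invocation.

Code: Select all
import subprocess

# Hedged sketch: feed bayeselo its usual command sequence via stdin.
# "matches.pgn" is a hypothetical filename; "x" backs out of a menu level.
commands = "\n".join([
    "readpgn matches.pgn",  # load the game results
    "elo",                  # enter the rating calculation menu
    "mm",                   # minorization-maximization fit
    "exactdist",            # exact error-margin distribution
    "ratings",              # print the rating table
    "x", "x",               # leave the elo menu, then quit
]) + "\n"

out = subprocess.run(["bayeselo"], input=commands,
                     capture_output=True, text=True)
print(out.stdout)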
Both the books' elo increase over the years and the recent taper-off effect are confirmed by the second experiment. I tried to find a correlation between the delta-elo and the expansion methods I used in each book version (e.g. learning, dropout expansion, importing playchess games, etc.), but unfortunately it is not so clear what helps most. Maybe more on that later.
In the meantime I have streamlined my original set of scripts and I'm deepening the dropout searches of the repertoire. There is now a live visualisation of this update process running here, where I sometimes go watch when I'm bored. The yellow arrows represent moves that stay in the book (with no indication of whether they are good or bad, btw) and the red arrow is the current dropout move for that position (that is, the best move leaving book).
Finally, I did a test to see whether there really is an impact from allowing a certain path error, and the corresponding score-wise position degradation, in exchange for opening line variability. For this I played the same program with the same (most recent) book version against itself while applying different maximum allowable path errors in book move selection: 0.10, 0.08, 0.06, 0.04 and 0.02 pawn. On the servers I normally use 0.1 pawn, which is very wide; during tournaments I normally use a much smaller value. There was a claim earlier that doing this weakens the program in theory and in practice. The latter can be tested, which I did here; see the results below. My conclusion is that, within a reasonable number of games, there is no observable correlation between performance and allowable path error in this range. (Note that a 0.1 pawn disadvantage would normally correspond to about a 10 elo loss.)
Code: Select all
Rank Name                Elo    +    -  games  score  oppo.  draws
   1 rookie3.9x1-m040      1    3    3  34000    50%      0    32%
   2 rookie3.9x1-m020      0    4    4  24000    50%      0    34%
   3 rookie3.9x1-m100      0    4    4  24000    50%     -1    33%
   4 rookie3.9x1-m080     -1    3    3  34000    50%      0    32%
   5 rookie3.9x1-m060     -1    3    3  44000    50%      0    32%
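mvk's selection code isn't shown, but based on his description (a uniform random choice among moves whose accumulated path error stays under the cap, see the next post), a sketch could look like this. The book data layout and all names are assumptions:

Code: Select all
import random

# Sketch (not mvk's actual code): pick uniformly among book moves whose
# cumulative path error stays within the budget. Assumes the book gives
# (move, eval_in_pawns) pairs for the current position.

def pick_book_move(book_moves, error_so_far, max_path_error=0.10):
    best = max(score for _, score in book_moves)
    budget = max_path_error - error_so_far
    candidates = [(mv, best - score) for mv, score in book_moves
                  if best - score <= budget]
    if not candidates:
        return None, error_so_far        # no move fits: leave book
    mv, err = random.choice(candidates)
    return mv, error_so_far + err        # accumulate along the game path

# Example: with 0.02p already spent, g2g4 (0.65p worse than best) never
# qualifies, while d2d4 (0.04p worse) still does.
mv, acc = pick_book_move([("e2e4", 0.25), ("d2d4", 0.21), ("g2g4", -0.40)],
                         error_so_far=0.02)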
I did some testing, and in the opening the standard deviation of the evaluation seems to be about 10 to 20 centipawns over a 5-ply interval, from 15 ply to 45 ply, depending on the program; i.e., the SD between the evaluation at 15 ply and at 20 ply (or at 40 and 45 ply) using the same program. So a path error of 10 centipawns or less seems too low. I would run tests, doubling the path error each time, until I got a detectable drop in ELO (i.e., greater than the error bands). Once you're sure you have a small ELO drop, do several tests between the last two bounds to find the highest path error with no detectable ELO drop. The point is to maximize the number of playable lines. This makes the program less predictable and does a better job of exploring move alternatives. It won't explore them optimally, but it will be better than what you are doing now.
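A sketch of that procedure, assuming a hypothetical measure_elo_drop() that runs a self-play match at a given path error and returns the measured ELO drop together with its error band:

Code: Select all
# Sketch of the tuning loop described above. measure_elo_drop() is a
# placeholder: in practice it would run a large self-play match at the
# given path error and return (elo_drop, error_band) from e.g. bayeselo.

def find_max_path_error(measure_elo_drop, start=0.10, bisect_steps=4):
    lo, hi = 0.0, start
    while True:                       # phase 1: double until detectable
        drop, band = measure_elo_drop(hi)
        if drop > band:               # drop exceeds the error band
            break
        lo, hi = hi, hi * 2.0
    for _ in range(bisect_steps):     # phase 2: bisect the last two bounds
        mid = (lo + hi) / 2.0
        drop, band = measure_elo_drop(mid)
        if drop > band:
            hi = mid
        else:
            lo = mid
    return lo   # widest path error with no detectable ELO drop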
If I recall correctly, you stated that you use random selection from the available moves that meet the path error requirement. I don't think random move selection is optimal; I would try some other method of move selection that attempts to maximize game scores.
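Neither post specifies such a method; one hypothetical option would be to keep the path error filter but weight the surviving moves by their past game scores rather than picking uniformly, e.g.:

Code: Select all
import math
import random

# Hypothetical alternative to uniform random book-move selection:
# weight the qualifying moves by their observed game scores so that
# better-scoring lines are played more often while worse ones still
# get explored. Data shapes and the temperature value are assumptions.

def weighted_book_choice(candidates, temperature=0.05):
    """candidates: list of (move, avg_score) with avg_score in [0, 1]."""
    weights = [math.exp(score / temperature) for _, score in candidates]
    moves = [move for move, _ in candidates]
    return random.choices(moves, weights=weights, k=1)[0]

# Example: the 55%-scoring move is chosen about e^1 = 2.7x as often
# as the 50%-scoring one at this temperature.
print(weighted_book_choice([("e2e4", 0.55), ("d2d4", 0.50)]))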
I liked the link you added. It says ~38 Mnps. What kind of hardware is that running on? What's the difference between the red and yellow arrows?
Regards,
Zen