Eval Dilemma

Discussion of chess software programming and technical issues.

Moderator: Ras

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Eval Dilemma

Post by bob »

michiguel wrote:
hgm wrote:Indeed. Even under-promotion is much more common.
I disagree. I think KNBK is more common than underpromotion. I have seen it quite a few times.

Miguel
PS: I have seen underpromotions, but most of them were not needed, i.e., promoting to a Q was the best move but the engine decided to promote to an R.
My only evidence is anecdotal. I screen every game Crafty plays except when it wins. I once broke the KBNK pc/square tables so that it could not mate with that material at all. It played like that for a year on ICC, and for other users, without anyone mentioning that it was drawing a won ending. We had a discussion on r.g.c.c about this mate, and I mentioned that Crafty could handle it trivially with a very shallow search. Someone tested it and said "no it can't." When I looked, the pc/sq tables were backward, so it drove the king to the wrong corner, where it had to accept a draw by repetition or by the 50-move rule since it didn't know that corner was wrong. So we played a year without ever seeing this bug, either live on ICC, in our testing, or from anyone who was using Crafty in their own tournaments...
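A backward table like the one described above is the kind of bug a tiny self-check can catch. The following is a hedged Python sketch, not Crafty's code: it builds KBNK tables from the lone king's distance to the mating corners and verifies that each table's best squares are the corners of its bishop's color. The 0..63 square numbering (a1 = 0, h8 = 63), the 10-points-per-step weight, and all function names are assumptions for illustration.

```python
# Hypothetical KBNK table self-check; squares are 0..63 with a1 = 0, h1 = 7, a8 = 56, h8 = 63.

def is_dark(sq: int) -> bool:
    # a1 is a dark square, so a square is dark when file + rank is even.
    return (sq % 8 + sq // 8) % 2 == 0

def corner_distance_table(corners):
    # Bonus (indexed by the lone king's square) grows as that king gets closer,
    # in king moves, to the nearest of the given (file, rank) corners.
    table = []
    for sq in range(64):
        f, r = sq % 8, sq // 8
        d = min(max(abs(f - cf), abs(r - cr)) for cf, cr in corners)
        table.append(10 * (7 - d))
    return table

DARK_TABLE = corner_distance_table([(0, 0), (7, 7)])   # a1, h8: dark-bishop mating corners
LIGHT_TABLE = corner_distance_table([(7, 0), (0, 7)])  # h1, a8: light-bishop mating corners

def check_table(table, bishop_is_dark: bool) -> bool:
    """True if the table's best squares are corners of the bishop's color only."""
    best = max(table)
    best_squares = {sq for sq in range(64) if table[sq] == best}
    corners = {0, 7, 56, 63}
    return best_squares <= corners and all(is_dark(sq) == bishop_is_dark for sq in best_squares)
```

Pairing a table with the wrong bishop, which is what a backward table amounts to, makes `check_table(DARK_TABLE, bishop_is_dark=False)` return False, so a check like this could run once at startup.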
MattieShoes
Posts: 718
Joined: Fri Mar 20, 2009 8:59 pm

Re: Eval Dilemma

Post by MattieShoes »

If you're playing a strong engine or human, I'd imagine the odds of it happening approach zero. When you're playing a human who's 500 points below you... it's probably still extremely rare, but it happened at least once in the first 200 games my engine played: KNBP vs KR, the human traded R for P, the engine pushed the king to the wrong corner, and it scored a draw. It'd still be close to 0 Elo gain. I was just thinking out loud that even though the strength difference is negligible, it could be tested with a few KNB vs K positions and a few positions that lead to KNB vs K.
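As a sketch of the test setup suggested above, the Python below generates random KBN vs K starting positions as FEN strings. The square numbering (a1 = 0), the function names, and the choice of "White has the pieces, White to move" are assumptions; the legality checks cover only what this material needs (kings not adjacent, Black not in check). Positions that *lead to* KNB vs K would still have to be picked by hand.

```python
import random

def chebyshev(a: int, b: int) -> int:
    # King-move distance between two squares (a1 = 0 ... h8 = 63).
    return max(abs(a % 8 - b % 8), abs(a // 8 - b // 8))

def knight_attacks(n: int, target: int) -> bool:
    return {abs(n % 8 - target % 8), abs(n // 8 - target // 8)} == {1, 2}

def bishop_attacks(b: int, target: int, blockers) -> bool:
    df, dr = target % 8 - b % 8, target // 8 - b // 8
    if df == 0 or abs(df) != abs(dr):
        return False  # not on a shared diagonal
    sf, sr = (1 if df > 0 else -1), (1 if dr > 0 else -1)
    f, r = b % 8 + sf, b // 8 + sr
    while (f, r) != (target % 8, target // 8):
        if r * 8 + f in blockers:
            return False  # a piece sits between bishop and target
        f, r = f + sf, r + sr
    return True

def to_fen(pieces) -> str:
    # pieces: square -> FEN letter; White to move, no castling or en passant.
    rows = []
    for rank in range(7, -1, -1):
        row, empty = "", 0
        for file in range(8):
            p = pieces.get(rank * 8 + file)
            if p is None:
                empty += 1
            else:
                row += (str(empty) if empty else "") + p
                empty = 0
        rows.append(row + (str(empty) if empty else ""))
    return "/".join(rows) + " w - - 0 1"

def random_kbnk(rng: random.Random, bishop_dark=None) -> str:
    while True:
        wk, wb, wn, bk = rng.sample(range(64), 4)
        if chebyshev(wk, bk) <= 1:
            continue  # kings may not touch
        if bishop_dark is not None and ((wb % 8 + wb // 8) % 2 == 0) != bishop_dark:
            continue  # caller asked for a specific bishop color
        if knight_attacks(wn, bk) or bishop_attacks(wb, bk, {wk, wn}):
            continue  # the side not to move must not be in check
        return to_fen({wk: "K", wb: "B", wn: "N", bk: "k"})
```

For example, `[random_kbnk(random.Random(s), bishop_dark=True) for s in range(20)]` gives a reproducible set of dark-bishop positions to play out against a fixed depth or time limit.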

Edit: and with a search deep enough to make sure the B and N are not going to be eaten by the enemy K, you can assign a very high score, more than +6 or +7. That may help the accuracy of non-KBN/K endgame play, since the engine should assume an opponent will avoid trading down into a lost endgame... I'll have to think about how to handle situations where the K can eat the knight but only a couple of ply later...
bob

Re: Eval Dilemma

Post by bob »

MattieShoes wrote:... I'll have to think about how to handle situations where the K can eat the knight but only a couple ply later...
That last part is trivial. It happens automatically. Let the search handle the material loss, and let the evaluation recognize good vs bad positions. All you need is a simple piece/square table for each bishop color, using the one that matches your bishop...
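A minimal sketch of that division of labor, assuming a 0..63 board with a1 = 0: the evaluation only scores where the lone king stands relative to the correct corners, and leaves hanging pieces to the search. The 700-centipawn base (MattieShoes' "+7"), the weights, and all names are hypothetical placeholders, not Crafty's values. It also uses the fact that mirroring files (`sq ^ 7`) maps the light-bishop corners h1/a8 onto a1/h8, so one table can serve both bishop colors.

```python
# Hypothetical KBN vs K evaluation sketch; squares 0..63 with a1 = 0, h8 = 63.
KBNK_BASE = 700  # placeholder "won ending" score in centipawns

def is_dark(sq: int) -> bool:
    # a1 is dark, so a square is dark when file + rank is even.
    return (sq % 8 + sq // 8) % 2 == 0

def chebyshev(a: int, b: int) -> int:
    # King-move distance between two squares.
    return max(abs(a % 8 - b % 8), abs(a // 8 - b // 8))

# Bonus for the lone king's square: highest on a1/h8, the mating corners for a dark bishop.
DARK_CORNER_TABLE = [10 * (7 - min(chebyshev(sq, 0), chebyshev(sq, 63))) for sq in range(64)]

def eval_kbnk(strong_king: int, bishop: int, weak_king: int) -> int:
    """Score for the side with B+N, assuming the material is exactly KBN vs K.

    Whether a piece hangs is not checked here; the search is expected to see the capture.
    """
    # Light bishop: flip files so h1/a8 land on the table's a1/h8 entries.
    idx = weak_king if is_dark(bishop) else weak_king ^ 7
    corner_bonus = DARK_CORNER_TABLE[idx]
    # Also reward bringing the strong king close to the lone king.
    proximity = 10 * (7 - chebyshev(strong_king, weak_king))
    return KBNK_BASE + corner_bonus + proximity
```

With the attacking king on e3 and a dark bishop on c1, a lone king on a1 scores 800 while one on h1 scores 740; swap in a light bishop on d1 and the preference flips. Using the wrong table is exactly the "wrong corner" failure described earlier in the thread.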
Edsel Apostol
Posts: 803
Joined: Mon Jul 17, 2006 5:53 am
Full name: Edsel Apostol

Re: Eval Dilemma -- some quick data

Post by Edsel Apostol »

bob wrote:Below, I am pasting the output from a script that I run while a test is in progress. The initial data has 3 sets of 32,000 games against Glaurung 1 & 2, Fruit 2 and Toga 2. The three versions of Crafty (Crafty-23.0-fast1, fast2 and fast3) are the same program, same everything, just run 3 times in a row. Pretty consistent.

I then sample this data every 30 seconds while playing a new match with Crafty-23.1R05. Watch how the rating moves around as it settles into the right area (more about what this version is at the bottom).


Rank Name               Elo    +    - games score oppo. draws
   1 Glaurung 2.2       2666    4    4 23346   58%  2607   22% 
   2 Toga2              2657    4    4 23346   57%  2607   23% 
   3 Crafty-23.0-fast3  2608    4    4 31128   52%  2595   22% 
   4 Crafty-23.0-fast2  2607    4    4 31128   52%  2595   22% 
   5 Crafty-23.0-fast1  2606    4    4 31128   51%  2595   22% 
   6 Fruit 2.1          2560    5    4 23346   44%  2607   24% 
   7 Glaurung 1.1 SMP   2496    4    4 23346   35%  2607   20% 
-----------------------  currently using 83 nodes.
Rank Name               Elo    +    - games score oppo. draws
   1 Glaurung 2.2       2664    5    5 23350   58%  2606   22% 
   2 Toga2              2656    5    5 23349   57%  2606   23% 
   3 Crafty-23.1R05     2609  116  116    17   50%  2571   41% 
   4 Crafty-23.0-fast3  2606    4    4 31128   52%  2594   22% 
   5 Crafty-23.0-fast2  2605    4    5 31128   52%  2594   22% 
   6 Crafty-23.0-fast1  2605    4    5 31128   51%  2594   22% 
   7 Fruit 2.1          2559    4    5 23348   44%  2606   24% 
   8 Glaurung 1.1 SMP   2495    5    5 23354   35%  2606   20% 
-----------------------  currently using 84 nodes.
Rank Name               Elo    +    - games score oppo. draws
   1 Glaurung 2.2       2660    5    4 23420   58%  2602   22% 
   2 Toga2              2652    4    4 23415   57%  2602   23% 
   3 Crafty-23.1R05     2638   31   31   293   56%  2587   20% 
   4 Crafty-23.0-fast3  2602    4    5 31128   52%  2589   22% 
   5 Crafty-23.0-fast2  2601    4    4 31128   52%  2589   22% 
   6 Crafty-23.0-fast1  2601    4    4 31128   51%  2589   22% 
   7 Fruit 2.1          2555    5    5 23415   44%  2602   24% 
   8 Glaurung 1.1 SMP   2491    5    4 23427   35%  2602   20% 
-----------------------  currently using 84 nodes.
Rank Name               Elo    +    - games score oppo. draws
   1 Glaurung 2.2       2662    4    4 23498   58%  2604   22% 
   2 Toga2              2653    4    5 23496   57%  2604   23% 
   3 Crafty-23.1R05     2624   21   21   626   54%  2588   22% 
   4 Crafty-23.0-fast3  2604    4    4 31128   52%  2591   22% 
   5 Crafty-23.0-fast2  2603    5    4 31128   52%  2591   22% 
   6 Crafty-23.0-fast1  2603    5    5 31128   51%  2591   22% 
   7 Fruit 2.1          2557    4    4 23497   44%  2604   24% 
   8 Glaurung 1.1 SMP   2493    4    4 23519   35%  2604   20% 
-----------------------  currently using 86 nodes.
Rank Name               Elo    +    - games score oppo. draws
   1 Glaurung 2.2       2664    5    5 23564   58%  2605   22% 
   2 Toga2              2655    5    5 23564   57%  2605   23% 
   3 Crafty-23.1R05     2611   18   18   894   53%  2590   20% 
   4 Crafty-23.0-fast3  2606    4    4 31128   52%  2593   22% 
   5 Crafty-23.0-fast2  2605    4    4 31128   52%  2593   22% 
   6 Crafty-23.0-fast1  2605    4    4 31128   51%  2593   22% 
   7 Fruit 2.1          2559    5    5 23558   44%  2605   24% 
   8 Glaurung 1.1 SMP   2495    5    5 23592   35%  2605   20% 
-----------------------  currently using 87 nodes.
Rank Name               Elo    +    - games score oppo. draws
   1 Glaurung 2.2       2665    4    4 23630   58%  2606   22% 
   2 Toga2              2656    5    5 23632   57%  2606   23% 
   3 Crafty-23.1R05     2607   16   16  1166   52%  2591   21% 
   4 Crafty-23.0-fast3  2607    4    4 31128   52%  2594   22% 
   5 Crafty-23.0-fast2  2606    5    5 31128   52%  2594   22% 
   6 Crafty-23.0-fast1  2605    4    5 31128   51%  2594   22% 
   7 Fruit 2.1          2559    4    4 23624   44%  2606   24% 
   8 Glaurung 1.1 SMP   2495    5    5 23664   35%  2606   20% 
-----------------------  currently using 88 nodes.
Rank Name               Elo    +    - games score oppo. draws
   1 Glaurung 2.2       2665    4    4 23690   58%  2606   22% 
   2 Toga2              2656    5    5 23696   57%  2606   23% 
   3 Crafty-23.1R05     2608   14   14  1424   52%  2591   20% 
   4 Crafty-23.0-fast3  2607    4    4 31128   52%  2594   22% 
   5 Crafty-23.0-fast2  2605    5    5 31128   52%  2594   22% 
   6 Crafty-23.0-fast1  2605    4    5 31128   51%  2594   22% 
   7 Fruit 2.1          2559    4    4 23683   44%  2606   24% 
   8 Glaurung 1.1 SMP   2495    5    5 23739   35%  2606   20% 
-----------------------  currently using 88 nodes.
Rank Name               Elo    +    - games score oppo. draws
   1 Glaurung 2.2       2666    4    4 23760   58%  2607   22% 
   2 Toga2              2657    4    4 23762   57%  2607   23% 
   3 Crafty-23.0-fast3  2607    4    4 31128   52%  2595   22% 
   4 Crafty-23.0-fast2  2606    4    4 31128   52%  2595   22% 
   5 Crafty-23.0-fast1  2606    4    4 31128   51%  2595   22% 
   6 Crafty-23.1R05     2601   13   13  1698   51%  2592   21% 
   7 Fruit 2.1          2560    4    4 23747   44%  2607   24% 
   8 Glaurung 1.1 SMP   2496    4    4 23813   35%  2607   20% 
-----------------------  currently using 89 nodes.
Rank Name               Elo    +    - games score oppo. draws
   1 Glaurung 2.2       2666    4    4 23827   58%  2607   22% 
   2 Toga2              2657    4    4 23831   57%  2607   23% 
   3 Crafty-23.0-fast3  2608    4    4 31128   52%  2595   22% 
   4 Crafty-23.0-fast2  2606    4    4 31128   52%  2595   22% 
   5 Crafty-23.0-fast1  2606    4    4 31128   51%  2595   22% 
   6 Crafty-23.1R05     2601   12   12  1975   51%  2592   21% 
   7 Fruit 2.1          2560    4    4 23812   44%  2607   24% 
   8 Glaurung 1.1 SMP   2496    4    4 23889   35%  2607   20% 
-----------------------  currently using 90 nodes.
Rank Name               Elo    +    - games score oppo. draws
   1 Glaurung 2.2       2665    4    4 23888   58%  2606   22% 
   2 Toga2              2657    4    4 23897   57%  2606   23% 
   3 Crafty-23.0-fast3  2607    4    4 31128   52%  2594   22% 
   4 Crafty-23.0-fast2  2606    4    4 31128   52%  2594   22% 
   5 Crafty-23.0-fast1  2606    5    4 31128   51%  2594   22% 
   6 Crafty-23.1R05     2603   12   11  2246   52%  2591   21% 
   7 Fruit 2.1          2560    4    4 23879   44%  2606   24% 
   8 Glaurung 1.1 SMP   2495    5    5 23966   35%  2606   20% 
-----------------------  currently using 83 nodes.
Rank Name               Elo    +    - games score oppo. draws
   1 Glaurung 2.2       2666    4    4 23952   58%  2607   22% 
   2 Toga2              2657    4    4 23961   57%  2607   23% 
   3 Crafty-23.0-fast3  2607    4    4 31128   52%  2595   22% 
   4 Crafty-23.0-fast2  2606    4    4 31128   52%  2595   22% 
   5 Crafty-23.0-fast1  2606    4    4 31128   51%  2595   22% 
   6 Crafty-23.1R05     2602   11   11  2489   51%  2592   21% 
   7 Fruit 2.1          2560    5    5 23941   44%  2607   24% 
   8 Glaurung 1.1 SMP   2495    5    5 24019   35%  2607   20% 
-----------------------  currently using 81 nodes.
Rank Name               Elo    +    - games score oppo. draws
   1 Glaurung 2.2       2665    4    4 24021   58%  2606   22% 
   2 Toga2              2657    4    4 24018   57%  2606   23% 
   3 Crafty-23.0-fast3  2607    4    4 31128   52%  2594   22% 
   4 Crafty-23.0-fast2  2606    4    4 31128   52%  2594   22% 
   5 Crafty-23.0-fast1  2606    5    4 31128   51%  2594   22% 
   6 Crafty-23.1R05     2603   11   11  2716   51%  2593   21% 
   7 Fruit 2.1          2561    5    5 24010   44%  2606   24% 
   8 Glaurung 1.1 SMP   2495    5    5 24051   35%  2606   20% 
-----------------------  currently using 81 nodes.
Rank Name               Elo    +    - games score oppo. draws
   1 Glaurung 2.2       2665    4    4 24076   58%  2606   22% 
   2 Toga2              2656    4    4 24072   57%  2606   23% 
   3 Crafty-23.0-fast3  2607    4    4 31128   52%  2594   22% 
   4 Crafty-23.0-fast2  2606    5    4 31128   52%  2594   22% 
   5 Crafty-23.0-fast1  2606    5    5 31128   51%  2594   22% 
   6 Crafty-23.1R05     2605   10   10  2924   51%  2594   21% 
   7 Fruit 2.1          2560    4    4 24072   44%  2606   24% 
   8 Glaurung 1.1 SMP   2495    5    5 24088   35%  2606   20% 
-----------------------  currently using 94 nodes.
Rank Name               Elo    +    - games score oppo. draws
   1 Glaurung 2.2       2665    4    4 24135   58%  2606   22% 
   2 Toga2              2656    4    4 24131   57%  2606   23% 
   3 Crafty-23.0-fast3  2607    4    4 31128   52%  2594   22% 
   4 Crafty-23.1R05     2606   10   10  3172   52%  2593   21% 
   5 Crafty-23.0-fast2  2606    5    5 31128   52%  2594   22% 
   6 Crafty-23.0-fast1  2605    5    5 31128   51%  2594   22% 
   7 Fruit 2.1          2560    4    4 24136   44%  2606   24% 
   8 Glaurung 1.1 SMP   2495    5    5 24154   35%  2606   20% 
-----------------------  currently using 94 nodes.
Rank Name               Elo    +    - games score oppo. draws
   1 Glaurung 2.2       2665    4    4 24203   58%  2606   22% 
   2 Toga2              2657    4    4 24200   57%  2606   23% 
   3 Crafty-23.0-fast3  2607    4    4 31128   52%  2594   22% 
   4 Crafty-23.0-fast2  2606    5    4 31128   52%  2594   22% 
   5 Crafty-23.0-fast1  2606    5    5 31128   51%  2594   22% 
   6 Crafty-23.1R05     2605    9   10  3439   51%  2594   21% 
   7 Fruit 2.1          2560    4    4 24196   44%  2606   24% 
   8 Glaurung 1.1 SMP   2495    5    5 24224   35%  2606   20% 
-----------------------  currently using 94 nodes.
Rank Name               Elo    +    - games score oppo. draws
   1 Glaurung 2.2       2665    4    4 24275   58%  2606   22% 
   2 Toga2              2657    4    4 24266   57%  2606   23% 
   3 Crafty-23.0-fast3  2607    4    4 31128   52%  2594   22% 
   4 Crafty-23.0-fast2  2606    5    4 31128   52%  2594   22% 
   5 Crafty-23.0-fast1  2606    5    5 31128   51%  2594   22% 
   6 Crafty-23.1R05     2605    9    9  3730   51%  2593   22% 
   7 Fruit 2.1          2560    4    4 24272   44%  2606   24% 
   8 Glaurung 1.1 SMP   2495    5    5 24301   35%  2606   20% 
-----------------------  currently using 94 nodes.
Rank Name               Elo    +    - games score oppo. draws
   1 Glaurung 2.2       2665    4    4 24339   58%  2606   22% 
   2 Toga2              2656    4    4 24336   57%  2606   23% 
   3 Crafty-23.1R05     2607    9    9  4010   52%  2593   22% 
   4 Crafty-23.0-fast3  2607    4    4 31128   52%  2594   22% 
   5 Crafty-23.0-fast2  2606    5    5 31128   52%  2594   22% 
   6 Crafty-23.0-fast1  2605    4    5 31128   51%  2594   22% 
   7 Fruit 2.1          2560    4    4 24341   44%  2606   24% 
   8 Glaurung 1.1 SMP   2495    5    5 24378   35%  2606   20% 
-----------------------  currently using 92 nodes.
Rank Name               Elo    +    - games score oppo. draws
   1 Glaurung 2.2       2665    4    4 24411   58%  2606   22% 
   2 Toga2              2657    4    4 24397   57%  2606   23% 
   3 Crafty-23.0-fast3  2607    4    4 31128   52%  2594   22% 
   4 Crafty-23.0-fast2  2606    5    4 31128   52%  2594   22% 
   5 Crafty-23.0-fast1  2606    5    5 31128   51%  2594   22% 
   6 Crafty-23.1R05     2605    8    8  4279   52%  2593   22% 
   7 Fruit 2.1          2560    4    4 24409   44%  2606   24% 
   8 Glaurung 1.1 SMP   2495    5    5 24446   35%  2606   20% 
-----------------------  currently using 94 nodes.
Rank Name               Elo    +    - games score oppo. draws
   1 Glaurung 2.2       2665    4    4 24482   58%  2606   22% 
   2 Toga2              2657    4    4 24467   57%  2606   23% 
   3 Crafty-23.0-fast3  2607    4    4 31128   52%  2594   22% 
   4 Crafty-23.1R05     2606    8    8  4562   52%  2593   22% 
   5 Crafty-23.0-fast2  2606    5    4 31128   52%  2594   22% 
   6 Crafty-23.0-fast1  2606    5    5 31128   51%  2594   22% 
   7 Fruit 2.1          2560    4    4 24475   44%  2606   24% 
   8 Glaurung 1.1 SMP   2495    5    5 24522   35%  2606   20% 
-----------------------  currently using 94 nodes.
Rank Name               Elo    +    - games score oppo. draws
   1 Glaurung 2.2       2665    4    4 24549   58%  2606   22% 
   2 Toga2              2656    4    4 24535   57%  2606   23% 
   3 Crafty-23.1R05     2607    8    8  4838   52%  2593   22% 
   4 Crafty-23.0-fast3  2607    4    4 31128   52%  2594   22% 
   5 Crafty-23.0-fast2  2606    5    5 31128   52%  2594   22% 
   6 Crafty-23.0-fast1  2605    5    5 31128   51%  2594   22% 
   7 Fruit 2.1          2560    4    4 24539   44%  2606   24% 
   8 Glaurung 1.1 SMP   2495    5    5 24599   35%  2606   20% 
After 17 games, R05 is at 2609. After 293 games it is up to 2638. If you stopped the test after 300 games, you would be highly tempted to say R05 is significantly better. After 626 games it is down a bit to 2624, but still 20 Elo better than the original. And of course, 626 games is quite a bit of computation. By the time we have done 1698 games, it looks a little _worse_ than the original 3 runs.

And in case you are interested, it ended up at 2606. R05 is functionally identical to 23.0: it is the version with the LMR offset search window, but this first run uses an offset of zero (0) to verify that it produces the same result as the original 23.0 version.

This is a "quick test" version to see how things look. The entire 32K-game match normally takes about an hour or a little less. However, I am only using about 3/4 of the cluster, as another user is running on 32 nodes or so. If you look at the error bars, everything stays within 1 SD, and as the error bar narrows, the score gets closer to "the truth"...
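To illustrate why the early numbers swing, here is a small Python simulation. It does not model Crafty; it models an imaginary engine whose true expected score never changes. The 52% score and 22% draw rate are placeholders loosely chosen to resemble the tables above, the checkpoints are the game counts quoted in the text, and the conversion is the plain logistic Elo formula rather than BayesElo's model, so its numbers will not match the tables.

```python
import math
import random

def score_to_elo(score: float) -> float:
    """Logistic Elo difference implied by an average score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def simulate(true_score=0.52, draw_rate=0.22,
             checkpoints=(17, 293, 626, 1698, 4838), seed=None):
    """Play random games with fixed odds; return {games: Elo estimate} at each checkpoint."""
    rng = random.Random(seed)
    win = true_score - draw_rate / 2  # chosen so that win + draw/2 == true_score
    total, estimates = 0.0, {}
    for n in range(1, max(checkpoints) + 1):
        u = rng.random()
        total += 1.0 if u < win else (0.5 if u < win + draw_rate else 0.0)
        if n in checkpoints:
            s = min(max(total / n, 1e-6), 1 - 1e-6)  # keep the log finite at 0% or 100%
            estimates[n] = score_to_elo(s)
    return estimates
```

Calling `simulate(seed=s)` for a handful of seeds shows estimates after a few hundred games scattered by tens of Elo around the fixed true value, even though nothing about the "engine" changed.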
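The post does not spell out how the LMR offset window works, so the following is only one hypothetical reading, sketched in Python to show why an offset of zero must reproduce the original: the reduced-depth search of a late move uses a null window lowered by `offset`, so a move scoring within `offset` of alpha still "fails high" and is re-searched at full depth. All names here are invented for illustration.

```python
def reduced_search_window(alpha: int, offset: int):
    """Null window for the reduced search of a late move (hypothetical variant).

    Standard LMR uses (alpha, alpha + 1); this variant shifts it down by `offset`.
    """
    lo = alpha - offset
    return lo, lo + 1

def needs_full_depth_research(reduced_score: int, alpha: int, offset: int) -> bool:
    """Re-search at full depth when the reduced search fails high on the shifted window."""
    lo, _ = reduced_search_window(alpha, offset)
    return reduced_score > lo
```

Under this reading, `reduced_search_window(alpha, 0)` is exactly the standard `(alpha, alpha + 1)`, which is why a zero-offset run is a sanity check that should match 23.0.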
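The narrowing of the error bars follows the usual 1/sqrt(N) rule: four times as many games roughly halve the interval. The Python sketch below uses the normal approximation to the mean score and the slope of the logistic Elo curve; the 95% level (z = 1.96) and the default 22% draw rate are assumptions, and BayesElo computes its "+" and "-" columns differently, so the figures will not match the tables exactly.

```python
import math

def elo_error_bar(games: int, score: float = 0.5, draw_rate: float = 0.22, z: float = 1.96) -> float:
    """Approximate +/- Elo half-width of a confidence interval after `games` games."""
    win = score - draw_rate / 2                         # P(win) consistent with the mean score
    variance = win + 0.25 * draw_rate - score ** 2      # per-game E[x^2] - E[x]^2
    se_score = math.sqrt(variance / games)              # standard error of the mean score
    elo_per_score = 400.0 / (math.log(10) * score * (1.0 - score))  # slope of the Elo curve
    return z * se_score * elo_per_score
```

Draws reduce the per-game variance, so at equal game counts a drawish pairing gives a slightly tighter interval than a decisive one.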
If only I had this kind of setup, I think I could make my engine 100 Elo stronger in a week or two. :)

You should try new ideas to make Crafty much stronger, as I think tuning the current features would gain at most 20 Elo.

I look forward to your results with History Pruning if you ever decide to try it.
bob

Re: Eval Dilemma -- some quick data

Post by bob »

Edsel Apostol wrote:
If only I had this kind of setup, I think I could make my engine 100 Elo stronger in a week or two. :)

You should try new ideas to make Crafty much stronger, as I think tuning the current features would gain at most 20 Elo.

I look forward to your results with History Pruning if you ever decide to try it.
It's on my list. But I'll remind you that this "tuning" has already produced more than 100 Elo of improvement without adding new features. It is a classic mistake to think that the only way to improve an engine is to keep adding stuff, whether or not the old stuff is tuned and working correctly... In the past month I have tried at least a hundred different ways of doing LMR, some suggested by others, some I came up with myself. So far nothing good has been found, but that doesn't mean nothing will be found. There's not much point in adding new things until all the new ideas for old things have been exhausted.