another interesting cluster test result

bob · Post by **bob** » Tue Oct 13, 2009 7:32 pm

A while back, there was a discussion about where to skip doing null-moves. For several years I had been using an exclusion: if alpha == beta - 1, I do a null-move; otherwise I do not. Several people had reported that adding this gave them a boost in Elo. While testing other things, I decided to test that as well. It turns out that with or without that exclusion, the Elo in cluster testing is _identical_. That is not surprising, since almost all nodes are searched with alpha == beta - 1 anyway, so trying null-moves at the remaining nodes adds almost no overhead. A few extra nodes here, a quick refutation there, and it all seems to wash out. I will post the results once all of the tests have been repeated.
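A minimal sketch of the gating rule being discussed, assuming a conventional negamax search in which a node is a null-window (non-PV) node exactly when `alpha == beta - 1`. The function name and the `null_window_only` flag are hypothetical, not Crafty's actual code; the depth and in-check conditions are the other restrictions mentioned in the follow-up post, and any further ("etc.") restrictions are omitted.

```python
def null_move_allowed(alpha: int, beta: int, depth: int, in_check: bool,
                      null_window_only: bool) -> bool:
    """Hypothetical gate for trying a null move at the current node.

    null_window_only=True models the long-standing restriction (null move only
    when alpha == beta - 1); False models trying it at every node, PV or not.
    """
    if depth <= 1:          # no null move at depth 1 (or below)
        return False
    if in_check:            # passing while in check is not a legal option
        return False
    if null_window_only and alpha != beta - 1:
        return False        # open window, i.e. a PV node: skipped under the restriction
    return True


# Null-window node: allowed under either rule.
print(null_move_allowed(10, 11, depth=6, in_check=False, null_window_only=True))   # True
# PV node (open window): only the unrestricted rule tries a null move.
print(null_move_allowed(10, 50, depth=6, in_check=False, null_window_only=True))   # False
print(null_move_allowed(10, 50, depth=6, in_check=False, null_window_only=False))  # True
```

Because the restricted and unrestricted versions only disagree at open-window nodes, the difference between them is confined to the PV part of the tree.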

bob · Post by **bob** » Wed Oct 14, 2009 6:28 pm

More data. The above tests have completed, and here are the results. Crafty-23.1R07 is the best 23.1 so far, with the standard "only try null-move if alpha == beta-1" restriction. Crafty-23.1R09 has that restriction removed, so a null-move is tried everywhere in the tree, at every last node, PV or not. (The other restrictions still apply, of course: no null-move at depth=1, none when in check, etc.)

```
Rank Name                  Elo    +    - games score oppo. draws
   4 Crafty-23.1R09-2     2614    5    4 40000   51%  2605   23%
   5 Crafty-23.1R09-6     2613    4    4 40000   51%  2605   22%
   6 Crafty-23.1R07-4     2612    4    4 40000   51%  2605   22%
   7 Crafty-23.1R07-5     2612    4    4 40000   51%  2605   23%
   8 Crafty-23.1R07-6     2611    5    4 40000   51%  2605   22%
   9 Crafty-23.1R09-1     2611    4    5 40000   51%  2605   23%
  10 Crafty-23.1R07-1     2611    4    5 40000   51%  2605   23%
  11 Crafty-23.1R09-5     2611    4    5 40000   51%  2605   23%
  12 Crafty-23.1R09-4     2610    4    4 40000   51%  2605   23%
  13 Crafty-23.1R09-3     2610    3    4 40000   51%  2605   22%
  14 Crafty-23.1R07-3     2610    4    4 40000   51%  2605   23%
  15 Crafty-23.1R07-2     2610    4    4 40000   51%  2605   22%
  16 Crafty-23.1-4        2594    4    5 40000   48%  2605   22%
  17 Crafty-23.1-1        2593    4    4 40000   48%  2605   22%
  18 Crafty-23.1-2        2593    4    4 40000   48%  2605   22%
  19 Crafty-23.1-3        2592    4    4 40000   48%  2605   23%
  20 Crafty-23.0-4        2567    4    3 40000   45%  2605   21%
  21 Crafty-23.0-2        2567    4    4 40000   45%  2605   21%
  22 Crafty-23.0-1        2566    4    5 40000   45%  2605   21%
  23 Crafty-23.0-3        2565    4    4 40000   45%  2605   20%
```

Nothing significant going on there: R07 and R09 seem to be equally strong. I included 23.0, which is the released version, and 23.1, which was our previous "best" before we added some optimizations and other changes over the past couple of weeks to produce 23.1R07. The R07 and R09 runs also all lie within the usual error margins of each other. So for this discussion, the answer is "makes no difference".
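Averaging the repeated runs makes the same point with a little arithmetic. This is only a rough sketch: the Elo values are copied from the table, the +/-4 per-run margin is the typical value it shows, and combining margins this way assumes independent, roughly normal run-to-run noise, which only approximates what the rating tool itself computes.

```python
import math
from statistics import mean

# Elo of each repeated 40,000-game run, copied from the table (suffixes -1 .. -6).
r07 = [2611, 2610, 2610, 2612, 2612, 2611]  # Crafty-23.1R07-1 .. -6
r09 = [2611, 2614, 2610, 2610, 2611, 2613]  # Crafty-23.1R09-1 .. -6
PER_RUN_MARGIN = 4.0  # typical +/- Elo shown for a single run

diff = mean(r09) - mean(r07)

# Rough approximation: treat runs as independent, so the margin of a 6-run mean
# shrinks by sqrt(6), and the margin of a difference of two such means grows by sqrt(2).
margin_of_mean = PER_RUN_MARGIN / math.sqrt(len(r07))
margin_of_diff = math.sqrt(2) * margin_of_mean

print(f"R09 - R07 = {diff:+.1f} Elo (rough margin +/-{margin_of_diff:.1f})")
# R09 - R07 = +0.5 Elo (rough margin +/-2.3)
```

A half-Elo difference against a margin of a couple of Elo is exactly the "makes no difference" verdict.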

jwes · Post by **jwes** » Wed Oct 14, 2009 8:42 pm

So doing null-move at PV nodes saves an insignificant number of nodes while very rarely causing a bad move from a null-move false positive, i.e. in the last round of an important tournament.

bob · Post by **bob** » Wed Oct 14, 2009 9:42 pm

jwes wrote:So doing null-move at PV nodes saves an insignificant number of nodes while very rarely causing a bad move from a null-move false positive, i.e. in the last round of an important tournament.

Hard to say. It might be saving nodes at a critical position in that last round, letting you go deeper and find the correct move.

There are hardly _any_ PV nodes in a normal search anyway.
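To see why PV nodes are so rare, here is a toy principal variation search (PVS) over a synthetic tree with random leaf scores, counting how many nodes get a null window (`beta - alpha == 1`) versus a wider PV window. It is a sketch under stated assumptions: a PVS-style search with scout re-searches, a uniform tree with no move ordering, hash table, or pruning beyond alpha-beta, and arbitrary `BRANCH` and `DEPTH` values. It says nothing about Crafty's actual node counts.

```python
import random

BRANCH = 6        # children per node in the toy tree (arbitrary choice)
DEPTH = 6         # search depth in plies (arbitrary choice)
INF = 10**6

counts = {"null_window": 0, "wide_window": 0}


def leaf_score(node_id: int) -> int:
    """Deterministic pseudo-random leaf value, so repeated runs give the same tree."""
    return random.Random(node_id).randint(-1000, 1000)


def pvs(node_id: int, depth: int, alpha: int, beta: int) -> int:
    """Negamax principal variation search over a uniform synthetic tree."""
    counts["null_window" if beta - alpha == 1 else "wide_window"] += 1
    if depth == 0:
        return leaf_score(node_id)
    for i in range(BRANCH):
        child = node_id * BRANCH + i + 1  # unique id for every node in the tree
        if i == 0:
            # First move at a node: searched with the node's own window.
            score = -pvs(child, depth - 1, -beta, -alpha)
        else:
            # Later moves: cheap scout search with a null window first.
            score = -pvs(child, depth - 1, -alpha - 1, -alpha)
            if alpha < score < beta:
                # Scout failed high inside an open window: re-search it as a PV move.
                score = -pvs(child, depth - 1, -beta, -alpha)
        if score > alpha:
            alpha = score
        if alpha >= beta:
            break  # beta cutoff
    return alpha


root_score = pvs(0, DEPTH, -INF, INF)
total = counts["null_window"] + counts["wide_window"]
print(f"root score {root_score}")
print(f"null-window nodes: {counts['null_window']} of {total}")
print(f"wide-window (PV) nodes: {counts['wide_window']}")
```

Only the first move at a PV node, plus any move that fails high on its scout search, gets a wide window; everything below a scout search stays null-window, which is why the wide-window counter stays small.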

Note that the cluster testing does not lie or give false impressions. It just gives the truth, the whole truth, and nothing but the truth. Whether our intuition agrees or not doesn't matter. In this case, null everywhere might be very slightly better or worse. I'm not interested in running enough games to measure this since it looks like a +/- 1 Elo deal that takes a huge number of games to resolve.
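Why a +/- 1 Elo question is so expensive: error margins shrink roughly with the square root of the number of games. A back-of-the-envelope sketch, taking the roughly +/-4 Elo margin the table shows for one 40,000-game run as the starting point; the 1/sqrt(games) scaling is a standard approximation, not something measured in this thread, and `games_needed` is a hypothetical helper.

```python
import math


def games_needed(games: int, margin: float, target_margin: float) -> int:
    """Games needed to shrink an Elo error margin from `margin` to `target_margin`,
    assuming the margin scales like 1/sqrt(games)."""
    return math.ceil(games * (margin / target_margin) ** 2)


print(games_needed(40_000, 4.0, 1.0))  # 640000
```

Pinning a single version down to about +/-1 Elo would take on the order of 640,000 games, sixteen full runs, and reliably separating two versions that differ by 1 Elo would take more still.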

mjlef · Post by **mjlef** » Thu Oct 15, 2009 12:57 am

What do the dashes and numbers in these names mean (like Crafty-23.1R09-2 and Crafty-23.1R09-6)? I cannot tell which is which. I would think in a head to head test you would just have two opponents with one change.

bob · Post by **bob** » Thu Oct 15, 2009 1:44 am

mjlef wrote:
What do the dashes and numbers in these names mean (like Crafty-23.1R09-2 and Crafty-23.1R09-6)? I cannot tell which is which. I would think in a head to head test you would just have two opponents with one change.

The -1 or -2 suffix is the run number. If you look at 23.0, I ran the 40K-game test four times, hence the -1, -2, -3 and -4 suffixes. If I run it just once, it is always possible to get one of those 2-sigma events, or even a 3-sigma event. If I run it more than once, a fluke like that is much less likely to mislead me.

In this case, however, I was asked to just "light the cluster off", as our IT guys were fooling around with airflow and A/C levels; when I run this, the air around the cluster hits around 135 degrees F.

The closer the -1, -2, -3, etc. results are to each other, the better. Most are very consistent; occasionally one will be a bit outside the norm.
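A rough sketch of why repeating a run helps, assuming run-to-run noise is approximately normal and the runs are independent (both simplifications): a single run lands more than 2 sigma from its true value about 4.5% of the time, but two runs both landing more than 2 sigma off in the same direction is far rarer. `beyond_sigma` is a hypothetical helper built on the standard complementary error function.

```python
import math


def beyond_sigma(z: float) -> float:
    """Probability that a normal result lands more than z sigma from its mean (either side)."""
    return math.erfc(z / math.sqrt(2))


p2 = beyond_sigma(2.0)  # about 0.0455
p3 = beyond_sigma(3.0)  # about 0.0027

# Chance that two independent runs are both more than 2 sigma off in the same direction.
both_high = (p2 / 2) ** 2
both_same_side = 2 * both_high  # both high or both low

print(f"{p2:.4f} {p3:.4f} {both_same_side:.6f}")
```

A lone outlier run is therefore expected now and then, while a repeated run that agrees with it is strong evidence the shift is real.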

Dirt · Post by **Dirt** » Thu Oct 15, 2009 5:11 am

mjlef wrote:I would think in a head to head test you would just have two opponents with one change.

You may or may not be aware that the versions of Crafty being tested are not playing each other. The individual results of the opponent engines are not shown.

bob · Post by **bob** » Thu Oct 15, 2009 5:30 am

Dirt wrote:
mjlef wrote:I would think in a head to head test you would just have two opponents with one change.
You may or may not be aware that the versions of Crafty being tested are not playing each other. The individual results of the opponent engines are not shown.

I had thought this was obvious from previous posts. Each version of Crafty plays against 5 opponents that remain constant, using 4,000 starting positions that remain constant for each 40K game run.
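For readers wondering how 5 opponents and 4,000 positions add up to 40,000 games: the numbers work out if each position is played once with each color against each opponent. That color detail is an assumption, not something stated in the post, and the opponent labels and `schedule` helper below are hypothetical placeholders.

```python
from itertools import product

NUM_OPPONENTS = 5
NUM_POSITIONS = 4_000
COLORS = ("white", "black")  # assumption: every position is played once with each color


def schedule(version: str):
    """Yield one (version, opponent, position, color) tuple per game in a run (hypothetical layout)."""
    for opp, pos, color in product(range(1, NUM_OPPONENTS + 1), range(NUM_POSITIONS), COLORS):
        yield version, f"opponent-{opp}", pos, color


games = list(schedule("Crafty-23.1R07"))
print(len(games))  # 40000
```

Keeping opponents and positions fixed across versions means the only thing that changes between runs of different versions is the version itself.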

Gian-Carlo Pascutto · Post by **Gian-Carlo Pascutto** » Thu Oct 15, 2009 10:42 am

jwes wrote:So doing null-move at PV nodes saves an insignificant number of nodes while very rarely causing a bad move from a null-move false positive, i.e. in the last round of an important tournament.

This reasoning is completely ignorant of the statistics Bob already presented.

Bob's results show that even if this change randomly threw away one game in five, it would have to produce a strength increase in the other four games large enough to completely offset and overcome that loss.
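Put as arithmetic, taking the measured score as roughly 51% (the figure in the table) and letting the hypothetical one-in-five thrown games score zero, the remaining games would have to score:

$$
0.51 = \tfrac{1}{5}\cdot 0 + \tfrac{4}{5}\, s_{\text{rest}}
\quad\Longrightarrow\quad
s_{\text{rest}} = \frac{0.51}{0.8} \approx 0.64
$$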

What's really happening is probably not quite so extreme, but the reasoning you give is flawed nevertheless: the results already show that the program is no weaker with this change.

mjlef · Post by **mjlef** » Thu Oct 15, 2009 3:59 pm

Who are the opponents? What are the time controls? Perhaps a link to the testing criteria/setup would be useful, especially if it does not change.