another interesting cluster test result
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
another interesting cluster test result
A while back, there was a discussion about where to skip doing null-moves. For several years I have been using a restriction where, if alpha == beta - 1, I do a null-move; otherwise I do not. Several people had reported that adding this gave them a boost in Elo. While testing other things, I decided to test that as well. Turns out that with or without that restriction, the Elo in cluster testing is _identical_. Not surprising, since almost all nodes are searched with alpha == beta - 1 anyway, so the overhead this adds is almost nil. A few extra nodes here, a quick refutation there, it all seems to wash out. I will post the results once all of the tests have been repeated.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: another interesting cluster test result
More data. The above tests have completed. Here are the results. Crafty-23.1R07 is the best 23.1 so far, with the standard "only try null-move if alpha == beta-1" restriction. Crafty-23.1R09 has that restriction removed so that a null is tried everywhere in the tree, every last node, PV or not. (Still the same other restrictions of course: none at depth=1, nor if in check, etc.)
Rank Name Elo + - games score oppo. draws
4 Crafty-23.1R09-2 2614 5 4 40000 51% 2605 23%
5 Crafty-23.1R09-6 2613 4 4 40000 51% 2605 22%
6 Crafty-23.1R07-4 2612 4 4 40000 51% 2605 22%
7 Crafty-23.1R07-5 2612 4 4 40000 51% 2605 23%
8 Crafty-23.1R07-6 2611 5 4 40000 51% 2605 22%
9 Crafty-23.1R09-1 2611 4 5 40000 51% 2605 23%
10 Crafty-23.1R07-1 2611 4 5 40000 51% 2605 23%
11 Crafty-23.1R09-5 2611 4 5 40000 51% 2605 23%
12 Crafty-23.1R09-4 2610 4 4 40000 51% 2605 23%
13 Crafty-23.1R09-3 2610 3 4 40000 51% 2605 22%
14 Crafty-23.1R07-3 2610 4 4 40000 51% 2605 23%
15 Crafty-23.1R07-2 2610 4 4 40000 51% 2605 22%
16 Crafty-23.1-4 2594 4 5 40000 48% 2605 22%
17 Crafty-23.1-1 2593 4 4 40000 48% 2605 22%
18 Crafty-23.1-2 2593 4 4 40000 48% 2605 22%
19 Crafty-23.1-3 2592 4 4 40000 48% 2605 23%
20 Crafty-23.0-4 2567 4 3 40000 45% 2605 21%
21 Crafty-23.0-2 2567 4 4 40000 45% 2605 21%
22 Crafty-23.0-1 2566 4 5 40000 45% 2605 21%
23 Crafty-23.0-3 2565 4 4 40000 45% 2605 20%
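A minimal sketch of the two null-move gates being compared, for readers who want them side by side. The function name, its parameters, and the `pv_restriction` flag are hypothetical and are not taken from Crafty's source (which is written in C). Only the conditions themselves come from the posts above: no null-move at depth 1, none when in check, and the optional alpha == beta - 1 test. The "etc." restrictions are not modeled.

```python
def null_move_allowed(alpha, beta, depth, in_check, pv_restriction=True):
    """Return True if a null-move search may be tried at this node.

    Hypothetical helper for illustration only; not Crafty's actual code.
    """
    # Restrictions that both tested versions keep.
    if depth <= 1 or in_check:
        return False
    # R07-style rule: only try a null-move in a zero-width window,
    # i.e. at nodes searched with alpha == beta - 1.
    if pv_restriction and alpha != beta - 1:
        return False
    # R09-style rule (pv_restriction=False): open-window nodes pass too.
    return True
```

Calling it with `pv_restriction=False` corresponds to the R09 behaviour described above; the default corresponds to R07. The two only differ at nodes searched with an open window, which is why the node counts barely change.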
-
- Posts: 778
- Joined: Sat Jul 01, 2006 7:11 am
Re: another interesting cluster test result
So doing null-move at PV nodes saves an insignificant number of nodes while very rarely causing a bad move from a null-move false positive, i.e. in the last round of an important tournament.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: another interesting cluster test result
Hard to say. It might be saving nodes at a critical position in that last round that lets you go deeper and find the correct move.
There are hardly _any_ PV nodes in a normal search anyway.
Note that the cluster testing does not lie or give false impressions. It just gives the truth, the whole truth, and nothing but the truth. Whether our intuition agrees or not doesn't matter. In this case, null everywhere might be very slightly better or worse. I'm not interested in running enough games to measure this since it looks like a +/- 1 Elo deal that takes a huge number of games to resolve.
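A rough back-of-the-envelope sketch of why a 1 Elo difference needs a huge number of games. Everything here is an assumption for illustration: the logistic Elo model, independent games, a normal approximation, a 50% score, and a 22% draw rate (roughly what the table shows). It is not how the ratings in the table were computed.

```python
import math

def elo_per_score_unit(score=0.5):
    """Slope of the logistic Elo curve: Elo points per unit of score."""
    return 400.0 / (math.log(10) * score * (1.0 - score))

def per_game_std(score=0.5, draw_rate=0.22):
    """Standard deviation of a single game result (1, 0.5 or 0)."""
    return math.sqrt(score * (1.0 - score) - draw_rate / 4.0)

def games_to_resolve(elo_diff, sigmas=2.0, score=0.5, draw_rate=0.22):
    """Games per version needed so that elo_diff equals `sigmas` standard
    errors of the difference between two independently measured ratings."""
    sd_elo = per_game_std(score, draw_rate) * elo_per_score_unit(score)
    # The variance of a difference of two independent ratings is doubled.
    return math.ceil(2.0 * (sigmas * sd_elo / elo_diff) ** 2)

games_needed = games_to_resolve(1.0)
```

Under these assumptions `games_needed` comes out at roughly 750,000 games per version, compared with 40,000 games per run here.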
-
- Posts: 1494
- Joined: Thu Mar 30, 2006 2:08 pm
Re: another interesting cluster test result
bob wrote: Nothing significant going on there. They seem to be equally strong. I included 23.0 which is the released version, and 23.1 was our previous "best" before we added some optimizations and other changes over the past couple of weeks to produce 23.1R07. They also all lie within the usual error margins. So for this discussion, the answer is "makes no difference".
What do the dash and numbers in this stuff mean (like "Crafty-23.1R09-2" and "Crafty-23.1R09-6")? I cannot tell which is which. I would think in a head-to-head test you would just have two opponents with one change.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: another interesting cluster test result
The -1 or -2 suffix is the run number. If you look at 23.0, I ran the 40K game test four times, hence the -1, -2, -3 and -4 suffixes. If I run it just one time, it is always possible to get one of those 2-sigma events, or a 3-sigma event. If I run it more than once, this becomes less likely.
In this case, however, I was asked to just "light the cluster off" as our IT guys were fooling around with airflow and A/C levels since when I run this, the air around the cluster hits around 135 degrees F.
The closer the -1 -2 -3 -etc results are, the better. Most are very consistent. Occasionally one will be a bit outside the norm.
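One way to read the repeated runs is to group them by version and look at the mean and the spread. The sketch below does that with the Elo values copied from the results table earlier in the thread. The helper name `summarize` and the choice of max minus min as the spread are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean

# (version, run number, Elo) copied from the results table in this thread.
RUNS = [
    ("Crafty-23.1R09", 2, 2614), ("Crafty-23.1R09", 6, 2613),
    ("Crafty-23.1R07", 4, 2612), ("Crafty-23.1R07", 5, 2612),
    ("Crafty-23.1R07", 6, 2611), ("Crafty-23.1R09", 1, 2611),
    ("Crafty-23.1R07", 1, 2611), ("Crafty-23.1R09", 5, 2611),
    ("Crafty-23.1R09", 4, 2610), ("Crafty-23.1R09", 3, 2610),
    ("Crafty-23.1R07", 3, 2610), ("Crafty-23.1R07", 2, 2610),
    ("Crafty-23.1", 4, 2594), ("Crafty-23.1", 1, 2593),
    ("Crafty-23.1", 2, 2593), ("Crafty-23.1", 3, 2592),
    ("Crafty-23.0", 4, 2567), ("Crafty-23.0", 2, 2567),
    ("Crafty-23.0", 1, 2566), ("Crafty-23.0", 3, 2565),
]

def summarize(runs):
    """Group runs by version; return {version: (mean Elo, max - min)}."""
    by_version = defaultdict(list)
    for version, _run, elo in runs:
        by_version[version].append(elo)
    return {v: (mean(e), max(e) - min(e)) for v, e in by_version.items()}

SUMMARY = summarize(RUNS)
```

The R07 and R09 means differ by half an Elo point, while the individual runs within each version spread over 2 to 4 points, so the run-to-run noise is larger than the difference between the two versions.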
-
- Posts: 2851
- Joined: Wed Mar 08, 2006 10:01 pm
- Location: Irvine, CA, USA
Re: another interesting cluster test result
mjlef wrote: I would think in a head to head test you would just have two opponents with one change.
You may or may not be aware that the versions of Crafty being tested are not playing each other. The individual results of the opponent engines are not shown.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: another interesting cluster test result
I had thought this was obvious from previous posts. Each version of Crafty plays against 5 opponents that remain constant, using 4,000 starting positions that remain constant for each 40K game run.
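The arithmetic behind the 40K figure can be sketched as below. The opponent and position counts come from the post. The factor of 2 is an assumption: it supposes each starting position is played once with each color, which this thread does not state but which would account for 5 x 4,000 giving 40,000 rather than 20,000 games.

```python
OPPONENTS = 5            # fixed opponent engines (from the post)
START_POSITIONS = 4000   # fixed starting positions (from the post)
COLORS_PER_POSITION = 2  # ASSUMPTION: each position is played as White and as Black

def games_per_run(opponents=OPPONENTS,
                  positions=START_POSITIONS,
                  colors=COLORS_PER_POSITION):
    """Total games in one test run under the stated assumptions."""
    return opponents * positions * colors
```

With these values `games_per_run()` gives 40,000, matching the games column in the table.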
-
- Posts: 1260
- Joined: Sat Dec 13, 2008 7:00 pm
Re: another interesting cluster test result
jwes wrote: So doing null-move at PV nodes saves an insignificant number of nodes while very rarely causing a bad move from a null-move false positive, i.e. in the last round of an important tournament.
This reasoning is completely ignorant of the statistics Bob already presented.
The results from Bob show that even if this change were to randomly throw one game in 5, it would bring a strength increase in the other 4 games large enough to completely offset that loss.
What's really happening is probably not quite so extreme, but the reasoning you give is flawed nevertheless: the results already show that the program is no weaker with this change.
-
- Posts: 1494
- Joined: Thu Mar 30, 2006 2:08 pm
Re: another interesting cluster test result
Who are the opponents? What are the time controls? Perhaps a link to the testing criteria/setup would be useful, especially if it does not change.