recent cluster testing

bob · Post by **bob** » Sat Oct 31, 2009 4:09 pm

Someone asked recently how we were doing with Crafty development and testing, and when a new release would come out. As for the second question, we have decided to release the source right after the next ACCA event. As for testing, I have been getting occasional requests from our cluster guys to crank it up so they can measure A/C load and airflow as they try to get the heat away from the cluster. When I crank up all nodes, the air right behind the exhaust fans (the entire back of the cluster) sticks at around 135 degrees F. Hot, to say the least. They are trying to make sure that cold air is going in, at least.

Here are 4 test runs each of 23.0 and 23.1. All use the same positions, same everything; very consistent results at 40,000 games per run:

Name                  Elo    +    - games score oppo. draws
Crafty-23.1-4        2645    4    3 40000   56%  2597   23%
Crafty-23.1-2        2644    4    4 40000   56%  2597   23%
Crafty-23.1-1        2644    4    4 40000   56%  2597   23%
Crafty-23.1-3        2643    4    4 40000   56%  2597   23%
Crafty-23.0-1        2560    4    4 40000   45%  2597   21%
Crafty-23.0-2        2559    4    4 40000   45%  2597   21%
Crafty-23.0-4        2559    4    4 40000   45%  2597   21%
Crafty-23.0-3        2558    4    4 40000   45%  2597   21%

We have tested these versions at several time controls, with and without increments. All of the above are at the same time control: just two versions, each run 4 times to keep the cluster loaded for a whole day. The rating difference above is consistent regardless of the time control used. This will be released in just over 2 weeks...
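As a rough cross-check of the table, here is a minimal Python sketch (not bayeselo's method) that converts each version's score against opponents averaging 2597 into a performance rating with the plain logistic Elo formula, and estimates a 95% margin from the score, draw rate and game count using a normal approximation. The function names are hypothetical. Because the scores are rounded to whole percent and bayeselo fits its own draw model with a prior, this will not reproduce 2644/2559 exactly: it gives roughly 2639 and 2562 (a gap of about 77 rather than 85) and a margin of about ±3 Elo for 40,000 games, in line with the four runs of each version agreeing within 2 points.

```python
import math


def elo_from_score(score: float, opponent_avg: float) -> float:
    """Performance rating under the logistic Elo model.

    Expected score E = 1 / (1 + 10**(-(R - R_opp) / 400)), solved for R:
    R = R_opp + 400 * log10(s / (1 - s)).
    """
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return opponent_avg + 400.0 * math.log10(score / (1.0 - score))


def elo_margin(score: float, draw_rate: float, games: int, z: float = 1.96) -> float:
    """Approximate +/- Elo margin (normal approximation, z=1.96 for ~95%).

    Each game scores 1 (win), 0.5 (draw) or 0 (loss); draws shrink the variance.
    """
    win_rate = score - draw_rate / 2.0
    # Var(X) = E[X^2] - E[X]^2, where E[X^2] = 1*win_rate + 0.25*draw_rate
    variance = win_rate + 0.25 * draw_rate - score ** 2
    score_se = math.sqrt(variance / games)
    # Slope of the Elo curve at this score: dR/ds = 400 / (ln 10 * s * (1 - s))
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    return z * slope * score_se


# Figures read off the table: score, average opponent, draw rate, games per run
r_231 = elo_from_score(0.56, 2597)
r_230 = elo_from_score(0.45, 2597)
print(f"23.1 ~ {r_231:.0f}, 23.0 ~ {r_230:.0f}, gap ~ {r_231 - r_230:.0f}")
print(f"23.1 margin ~ +/-{elo_margin(0.56, 0.23, 40_000):.1f} Elo")
print(f"23.0 margin ~ +/-{elo_margin(0.45, 0.21, 40_000):.1f} Elo")
```

The margin shrinks with the square root of the game count, which is why runs this large agree so closely.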

Gerd Isenberg · Post by **Gerd Isenberg** » Sat Oct 31, 2009 4:30 pm

How much energy (e.g. watt-seconds or joules) does one game take on average on your cluster? Who pays for all that?

bob · Post by **bob** » Sat Oct 31, 2009 4:44 pm

Gerd Isenberg wrote:How much energy (e.g. watt-seconds or joules) does one game take on average on your cluster? Who pays for all that?

I am actually not sure. This cluster has 128 nodes, each node a pair of 3.2 GHz 64-bit Intel Xeons (this was the early 64-bit Intel processor). It certainly is not cheap, because you get to pay for it twice: once to dissipate the heat at the CPU, and again to pump the heat outside via the A/C system. The university pays to operate the cluster(s) that we have. I wouldn't be surprised if each node dissipates about 400 watts or so, counting two internal disk drives, dual CPUs, 8 GB of RAM, etc. And I'd assume we need about another 500 watts per node to pump the heat back out of the room, so roughly 1 kW per node, or maybe 128 kW total (128 kWh per hour of operation). Just a SWAG of course; it could be lower or higher.
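To put the per-game question in numbers, here is a back-of-the-envelope Python sketch built on the SWAG figures above. Its inputs are assumptions, not measurements: 400 W per node plus about 500 W per node for cooling, and a hypothetical throughput of 40,000 games per hour with the whole cluster busy (the rate mentioned later in the thread for 10s+0.1s games). Under those assumptions 128 nodes draw 115.2 kW (which the estimate above rounds up to about 128 kW), and each game costs about 10.4 kJ, or 2.88 Wh. The function names are hypothetical.

```python
def cluster_power_watts(nodes: int, node_watts: float, cooling_watts_per_node: float) -> float:
    """Total draw: compute power plus the A/C power needed to remove that heat."""
    return nodes * (node_watts + cooling_watts_per_node)


def energy_per_game_joules(total_watts: float, games_per_hour: float) -> float:
    """Energy per game if the whole cluster is busy for an hour (1 W for 1 s = 1 J)."""
    return total_watts * 3600.0 / games_per_hour


# Hypothetical inputs: the SWAG figures above plus an assumed 40,000 games/hour
watts = cluster_power_watts(nodes=128, node_watts=400, cooling_watts_per_node=500)
joules = energy_per_game_joules(watts, games_per_hour=40_000)
print(f"cluster: {watts / 1000:.1f} kW")
print(f"per game: {joules:.0f} J = {joules / 3600:.2f} Wh")
```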

Vinvin · Post by **Vinvin** » Sat Oct 31, 2009 6:26 pm

Crafty-23.1 2644
Crafty-23.0 2559

85 points higher!
Is the improvement that big against other opponents?

Uri Blass · Post by **Uri Blass** » Sat Oct 31, 2009 7:02 pm

Vinvin wrote:
Crafty-23.1 2644
Crafty-23.0 2559
85 points higher!
Is the improvement that big against other opponents?

As far as I know, the improvement is not based on Crafty-vs-Crafty games, so the answer is probably yes.

Uri Blass · Post by **Uri Blass** » Sat Oct 31, 2009 7:05 pm

bob wrote:
[...] We have tested these versions at several time controls, with and without increments. All of the above are at the same time control: just two versions, each run 4 times to keep the cluster loaded for a whole day. The rating difference above is consistent regardless of the time control used. This will be released in just over 2 weeks...

In order to claim that the improvement is consistent, you need to test it at different time controls.
What different time controls did you use?

bob · Post by **bob** » Sat Oct 31, 2009 9:25 pm

Vinvin wrote:
Crafty-23.1 2644
Crafty-23.0 2559
85 points higher!
Is the improvement that big against other opponents?

I see no reason to believe otherwise. I have already had one regular opponent on ICC accuse me of running Rybka most of the time.

Vincent also asked me what I had done.

I have played a couple of test matches adding in 3-4 other opponents; it didn't change the "difference" between 23.0 and 23.1 a bit...

bob · Post by **bob** » Sat Oct 31, 2009 9:26 pm

Uri Blass wrote:
Vinvin wrote:
Crafty-23.1 2644
Crafty-23.0 2559
85 points higher!
Is the improvement that big against other opponents?
As far as I know, the improvement is not based on Crafty-vs-Crafty games, so the answer is probably yes.

I hadn't assumed he thought this was 23.0 vs 23.1. This is 23.0 against a group of opponents, using a set of 4,000 starting positions, then 23.1 against the same set of opponents. I never play Crafty vs Crafty, although I might do it just for fun. I'll try to run that later today and report the results. It will actually be interesting to see what happens.
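If the 85-point gauntlet gap carried over unchanged to a direct 23.1 vs 23.0 match (not guaranteed, since self-play results and results against a pool of other opponents can differ), the logistic Elo model gives a baseline for that Crafty-vs-Crafty run. This Python sketch assumes the standard 400-point logistic scale and a hypothetical `expected_score` helper; for an 85-point gap it predicts about 62% for 23.1.

```python
def expected_score(elo_diff: float) -> float:
    """Expected score for the side rated elo_diff points higher (logistic Elo model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


# 2644 - 2559 = 85 Elo from the gauntlet table (assumed to carry over to self-play)
gap = 2644 - 2559
print(f"expected score for 23.1: {expected_score(gap):.1%}")
```

A self-play result well away from that figure would suggest the gauntlet and head-to-head measurements disagree, which is exactly what such a test would reveal.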

bob · Post by **bob** » Sat Oct 31, 2009 9:28 pm

Uri Blass wrote:
bob wrote:
[...] We have tested these versions at several time controls, with and without increments. [...] The rating difference above is consistent regardless of the time control used. [...]
In order to claim that the improvement is consistent, you need to test it at different time controls.
What different time controls did you use?

I tried 1+1, 5+5 (very long test), 1+0 and 5+0. I tried both with and without increments because of some time allocation changes, and I wanted to make sure they didn't work with increments and fail without.

Crafty does worse against the opponents at some time controls and better at others. But the important point is that the spread between 23.0 and 23.1 is almost perfectly constant no matter what the time control.

I forgot to mention that I originally used 10s+0.1s, because a set of 40,000 games finishes in an hour or so at that time control. I do most of my testing at that time control, and only occasionally do a "sanity check" with other time controls. I tried several this time because a couple of the changes were to time allocation.

Rémi Coulom · Post by **Rémi Coulom** » Sat Oct 31, 2009 9:51 pm

Oh. I'll make bayeselo display hundredths of Elo points in the next version.

Rémi
