Thinker 5.4 2CPU <-> Thinker 5.4 4CPU

Discussion of computer chess matches and engine tournaments.

rainhaus
Posts: 143
Joined: Sun Feb 01, 2009 6:26 pm

Re: Thinker 5.4 2CPU <-> Thinker 5.4 4CPU

Post by rainhaus » Wed Feb 04, 2009 8:41 pm

>The "threads" option in Thinker is the number of search threads. However, Thinker has a master thread for monitoring the search. This means that when you specify 4 threads, there are actually 5 active threads. If your machine only has 4 CPUs, this would be really bad.
I suggest that you try "threads=3" if your machine only has 4 CPUs.
Also, the parallel search code is currently being overhauled by Kerwin. We should see better scaling in the next release.
Cheers<


Hi Lance,
what you say is not what I see on my screen. Watching the CPU meter in Vista's Sidebar, or the Task Manager, you can see how many processors are working. In my current tournament all 7 engines are configured with two threads, and that is exactly what the display shows: 50% of the CPU, no more and no less. With an additional master thread it would have to show 75%, wouldn't it?
The tournament runs under Arena 2.0; five programs run as UCI engines and your Thinkers as native WinBoard engines. The command-line parameters are: threads=2 hashsize=9 (256 MB)
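
The 50% versus 75% arithmetic can be sketched as follows. This is only an illustration: it assumes a 4-core machine (which is what "2 threads = 50%" suggests) and uses a hypothetical helper, expected_cpu_percent, that is not part of Thinker or Arena. It also shows one possible explanation for the reading, namely that a monitoring thread which mostly sleeps would add almost nothing to the meter. Whether Thinker's master thread behaves like that is an assumption here, not something stated in this thread.

```python
def expected_cpu_percent(thread_loads, cores):
    """Estimate the total CPU meter reading in percent.

    thread_loads: fraction of time each thread is actually running
                  (1.0 = always busy, 0.0 = always sleeping).
    cores:        number of CPU cores (assumed to be 4 below).
    """
    busy_cores = min(sum(thread_loads), cores)  # cannot exceed the machine
    return 100.0 * busy_cores / cores


# Two fully busy search threads on an assumed 4-core machine.
two_search_threads = expected_cpu_percent([1.0, 1.0], cores=4)

# Two search threads plus a master thread that is also fully busy.
with_busy_master = expected_cpu_percent([1.0, 1.0, 1.0], cores=4)

# Two search threads plus a master thread that only sleeps and waits.
with_idle_master = expected_cpu_percent([1.0, 1.0, 0.0], cores=4)

print(two_search_threads, with_busy_master, with_idle_master)
```

Under these assumptions a reading of exactly 50% is compatible with a third thread existing, as long as that thread spends nearly all of its time waiting.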

It is nice when the programmers themselves show up here sometimes.

Regards, Rainer

rainhaus
Posts: 143
Joined: Sun Feb 01, 2009 6:26 pm

Re: Thinker 5.4 2CPU <-> Thinker 5.4 4CPU

Post by rainhaus » Wed Feb 04, 2009 9:56 pm

Norm Pollock wrote:I'm not a believer in intra-family matches (matches between different versions of the same engine). These matches tend to have an increased number of draws because the same basic engine does not "see" a weak position of its family member as often as another engine would. The intra-family matches can be boring and not as meaningful as inter-family matches.


Hi Norm,
from a statistical point of view you are basically right. Testing identical engines, or several versions of one engine, implies a restricted variation of the scores. In statistics this is well known as "restriction of range" or "restriction of variance". You must consider that a restriction of this kind also applies to the many clones and derivatives of open-source engines like Fruit, Glaurung or Rybka (I don't know exactly whether the Beta 1.0 is open source or not). An increased number of draws might be the consequence, but it does not necessarily follow. The correlation effects between identical or very similarly constructed programs also depend on which variables and parameters are changed and compared: 2CPU/4CPU, pondering on/off, hash, tablebases on/off, etc.
You can check it yourself. Let an engine play against itself, change only one parameter, and evaluate the matches statistically! After that you can inform the community with a very detailed scientific report :)
Nevertheless, methodological reflections sometimes seem a little too theoretical. In the seven-engine tournament I am running at the moment, Lance Perkins' Inert Thinker 5.4a is playing as well as his Passive Thinker 5.4a. 285 games have been played, 13 games per engine pair. Up to now there are only 2 draws between the two Thinkers, and Inert beats Passive as he pleases! And what shall I say, "Inert" is indeed the leader and "Passive" is in last place in this tournament. What on earth possessed this excellent programmer and UCI abstainer to father such different babies, who simply ignore this equalising restriction of range ;) But of course, we have to wait for the final standings in this tournament.
Besides these practical experiences, there are statistical formulas to estimate and correct for the restricted variance in various ways.
Cheers, Rainer

Werner
Posts: 2610
Joined: Wed Mar 08, 2006 9:09 pm

Re: Thinker 5.4 2CPU <-> Thinker 5.4 4CPU

Post by Werner » Thu Feb 05, 2009 7:29 am

Rainer Marian wrote:Hi Werner
thus they are, the testers :) For the sake of selectivity and for their lists they play 1000 games and more, with each engine of course, but to interpret a difference of only 5%, suddenly 50 games are supposed to be enough. Non, non, Messieurs, a sample of 50 games per match is too small to establish a significant difference between scores of 51% and 56%. There are statistical formulas to calculate the necessary sample size, please. You need about 400 games between Fruit and Thinker 2CPU and also 400 games between Fruit and Thinker 4CPU to test at the 95% level of significance! Based on these two matches you can conclude nothing about which engine scores better or worse.
Surely, an experienced engine tester sometimes feels instinctively that something might be a bit fishy. In that case it is better to search for bugs or incorrect usage, or to ask the programmer. This may be more useful than generating a huge cemetery of games.
Regards, Rainer
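
The kind of sample-size formula referred to above can be sketched like this. It is a minimal normal-approximation calculation, not the exact method used by any rating list. The function names, the draw-rate parameter and the choice of a two-sided test are assumptions made for illustration. The required number of games depends strongly on those assumptions (draw rate, one-sided or two-sided test, desired power), so the result should be read as an order of magnitude only.

```python
import math
from statistics import NormalDist


def per_game_variance(score, draw_rate):
    """Variance of one game result (1, 0.5 or 0) for a given expected
    score and draw rate. Draws reduce the variance."""
    win_rate = score - draw_rate / 2
    return win_rate * 1.0 + draw_rate * 0.25 - score ** 2


def games_needed(score_a, score_b, draw_rate=0.0, confidence=0.95,
                 two_sided=True):
    """Games per match so that the observed difference between two
    independent match scores reaches the given significance level
    (normal approximation, no power requirement)."""
    alpha = 1.0 - confidence
    tail = alpha / 2 if two_sided else alpha
    z = NormalDist().inv_cdf(1.0 - tail)
    variance = (per_game_variance(score_a, draw_rate)
                + per_game_variance(score_b, draw_rate))
    diff = score_a - score_b
    return math.ceil(z * z * variance / diff ** 2)


# 56% versus 51% against the same opponent, without and with draws.
print(games_needed(0.56, 0.51))                  # no draws assumed
print(games_needed(0.56, 0.51, draw_rate=0.3))   # 30% draws assumed
```

Whatever assumptions are chosen, the result is in the hundreds of games per match, far above 50.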
Hi Rainer,
please have a look at our list - this is only an example :)
35 Thinker 5.4Ai x64 2CPU 2960 +24 -24 455 games
40 Thinker 5.4Ai x64 4CPU 2954 +26 -26 350 games
Werner
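
A rough way to read the two list entries above is to compare the rating difference with the combined error margin. The sketch below assumes that the +/- values are symmetric error margins at the same confidence level and that the two ratings are independent; neither assumption is confirmed in this thread, and the helper name is made up for illustration.

```python
import math


def difference_with_margin(rating_a, margin_a, rating_b, margin_b):
    """Return the rating difference and the combined error margin,
    assuming independent ratings with symmetric margins."""
    diff = rating_a - rating_b
    combined_margin = math.hypot(margin_a, margin_b)  # sqrt(a^2 + b^2)
    return diff, combined_margin


# Values taken from the list entries quoted above.
diff, margin = difference_with_margin(2960, 24, 2954, 26)
distinguishable = abs(diff) > margin

print(diff, round(margin, 1), distinguishable)
```

Under these assumptions the 6-point gap between the 2CPU and 4CPU entries is well inside the combined margin, so the list alone does not separate the two configurations.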

rainhaus
Posts: 143
Joined: Sun Feb 01, 2009 6:26 pm

Re: Thinker 5.4 2CPU <-> Thinker 5.4 4CPU

Post by rainhaus » Fri Feb 06, 2009 12:02 am

Werner wrote: Hi Rainer,
please have a look at our list - this is only an example :)
35 Thinker 5.4Ai x64 2CPU 2960 +24 -24 455 games
40 Thinker 5.4Ai x64 4CPU 2954 +26 -26 350 games
Hi Werner,
of course I know the current ratings; CGT is my daily reading at breakfast! This morning I said to my wife: "Look at these testing people, they play thousands of games, but they do not even know which engine plays better against Fruit 2.3.5.m: Thinker 5.4Ai with 2 CPUs or Thinker 5.4Ai with 4 CPUs. And furthermore, they call a sample an example, and only the whole population is a sensation here." And she said: "Yes, I understand this very well, I'm absolutely not interested either." O.K., I'll try to talk about sample and population somewhere down the road.

Nevertheless, your "problems to see an advantage of the 4 CPUs" seem to have been a significant result for a few weeks now! There must be something wrong, indeed.

whether sample or example
see you later in this temple
Cheers, Rainer
