Impressive Preliminary Results of Rybka 3 by Larry Kaufman!

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

User avatar
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: Impressive Preliminary Results of Rybka 3 by Larry Kaufm

Post by Zach Wegner »

I don't think the difference is so clear. I've used both, but right now I'm using processes. I think threads are easier really, at least a little bit. It's sort of a pain to pass around a pointer to a BOARD structure, but that's the only real problem. It's easier with threads to share information, and in a chess program you need to do that a lot. Overall, IMO it's better to have memory shared by default that you have to explicitly segregate for different processors with a pointer, than it is to have to explicitly share everything you need with custom code.

And the performance difference is very minimal, with copy-on-write.
User avatar
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: Impressive Preliminary Results of Rybka 3 by Larry Kaufm

Post by Zach Wegner »

Exactly. Take a program like Rybka, and put it on say 16 processors. It won't get much speed gain. Put it on 512, and it will probably be weaker than the 8 processor version. Take Zappa on the other hand, which is generally known for the best scaling, and it will shine on 512 (it already has). That's what scaling really means, and this measure will become more and more meaningful as our chips get bigger and bigger, and massively parallel systems become more and more common.

You simply need iterative search to get any real scaling on large numbers of processors. When Rybka 3 is released, it will be easy enough for the asm gurus to tell if it's using iterative search.
Nimzovik
Posts: 1831
Joined: Sat Jan 06, 2007 11:08 pm

Re: Impressive Preliminary Results of Rybka 3 by Larry Kaufm

Post by Nimzovik »

Dr.Wael Deeb wrote:
tiger wrote:
Dr.Wael Deeb wrote:
Nimzovik wrote:Heavens! I as a chess player love to analyse openings.....who doesn't? Now I have an even better engine looming on the horizon for this purpose. Vas -- like the terminator--JUST KEEPS COMING! My point is this however.....We as chess players are being given what we SAY we want---a strong engine-------------THEN we WHINE about it? Why?? Because it beats the snot out of the competition? Sheeeesh. Something wrong here... I applaud the new paradigm shift! This is capitalism and ingenuity at work! I too was tired of being milked like a cash flow cow and paying the "annual fee" for engines that improved perhaps 10 rating points a year.....were not we all? I chant "All hail Vas! All hail VAS". I still like to play Hiarcs tho. Maybe one day we will even get a program that can beat Father Pablo's stonewalling 100% of the time :!: :wink: :lol: That is the paradigm shift I am truly waiting for!
You mean Pablo's monkey trick :!:

Pablo cannot beat Chess Tiger's antihuman setting, so what you are waiting for has already been done years ago.


// Christophe
I am aware of that,but a lot of people don't even have the clue 8-)
Yes......... but the point made in an earlier post is: perhaps you can tweak your program to defeat a particular person (in this case Pablo), but it has not been proved that you can perform said tweak (anti-human play/mode) and still beat grandmasters! Indeed, I dare say there are other anti-computer players with more effective anti-computer play. Pablo is perhaps NOT the pinnacle of this style. The point must apparently be RE-STATED once again: the programs have difficulty with closed positions! Can they combine an assessment of the benefit of closing or opening a position accurately? IMHO they cannot, at this juncture, assess these dynamics accurately and maintain their strength. In short -- no program plays like a Nimzovich!
User avatar
M ANSARI
Posts: 3707
Joined: Thu Mar 16, 2006 7:10 pm

Re: Impressive Preliminary Results of Rybka 3 by Larry Kaufm

Post by M ANSARI »

Dann Corbit wrote:
M ANSARI wrote:Vas has been working on MP scaling for quite a bit more than 6 weeks ... 60 weeks seems more accurate. I don't think that he has switched to threads from processes ... but if he says he has cracked MP scaling then I would tend to believe him. I could never figure out why Vas used processes instead of threads, but he must have a reason. Anyway in a few days we will find out.
It is much easier to use processes instead of threads.
The processes do not have to worry about any sharing of objects except for those things deliberately placed in shared memory.

The only downside is that a process is a much heavier burden on the OS than a thread.

I guess that you won't see more than 5% speedup from changing from processes to threads, but you will see some benefit.
That is interesting and makes a lot of sense. I noticed 2 things when comparing Zappa to Rybka ... Zappa heats up the CPUs a little more ... and Zappa also gains disproportionately when the hardware gets ramped up. I reached the conclusion that at 8 cores and 5.2 GHz ... Zappa would probably play at an equal level to Rybka 2.3.2a on the same hardware. That is after testing both engines in matches (100 games at a time at 5 2). This would all seem due to Zappa's amazing scaling ... although I was not able to test at 5.2 GHz ... I did test at 4.4 and 4.6 and 4.8 ... and things move in a very linear fashion. Rybka still edges Zappa at 4.8 GHz ... but it is damn close ... and the only parameter changing is the hardware.

The 5% you mention ... if true ... is very substantial ... and will be more so as hardware improves. 5% of 10 is not that much ... but 5% of 1,000,000 is quite a bit. I think Vas knows all this and has probably put an emphasis on MP scaling to try to rival Zappa or improve on it. If Rybka 3 manages a 100 ELO point increase in strength without changing from processes to threads, that would mean that Rybka 4 will have an easy road to additional strength.
Dann Corbit
Posts: 12542
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Impressive Preliminary Results of Rybka 3 by Larry Kaufm

Post by Dann Corbit »

Zach Wegner wrote:I don't think the difference is so clear. I've used both, but right now I'm using processes. I think threads are easier really, at least a little bit. It's sort of a pain to pass around a pointer to a BOARD structure, but that's the only real problem. It's easier with threads to share information, and in a chess program you need to do that a lot. Overall, IMO it's better to have memory shared by default that you have to explicitly segregate for different processors with a pointer, than it is to have to explicitly share everything you need with custom code.

And the performance difference is very minimal, with copy-on-write.
The reason that processes are easier shows up in the debugging stage.

It is very simple to know that your process only touches shared memory objects in common with the other processes. I can have thousands of global variables and they will never pose a problem because only one process can possibly touch them. (Not a good design obviously, but the point holds for less extreme cases).

With threads, it is easy to accidentally overwrite a global variable in more than one thread.
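A minimal pthreads sketch of that hazard (hypothetical names): two threads bump one global counter, and only the mutex makes the result deterministic; remove the lock/unlock pair and increments are silently lost to the race.

```c
#include <pthread.h>

/* One global written by multiple threads: exactly the accident Dann
 * describes. The mutex serializes the read-modify-write. */
static long counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);   /* remove this pair to see the race */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

long run_two_workers(void)
{
    pthread_t a, b;
    counter = 0;
    pthread_create(&a, NULL, worker, NULL);
    pthread_create(&b, NULL, worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return counter;                  /* 200000 only with the mutex */
}
```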
Dann Corbit
Posts: 12542
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Impressive Preliminary Results of Rybka 3 by Larry Kaufm

Post by Dann Corbit »

Zach Wegner wrote:Exactly. Take a program like Rybka, and put it on say 16 processors. It won't get much speed gain. Put it on 512, and it will probably be weaker than the 8 processor version. Take Zappa on the other hand, which is generally known for the best scaling, and it will shine on 512 (it already has). That's what scaling really means, and this measure will become more and more meaningful as our chips get bigger and bigger, and massively parallel systems become more and more common.

You simply need iterative search to get any real scaling on large numbers of processors. When Rybka 3 is released, it will be easy enough for the asm gurus to tell if it's using iterative search.
I guess that the experimental Intel compiler uses software transactions. The author of the paper that I quoted works at Intel.
http://softwarecommunity.intel.com/arti ... g/1460.htm

On the other hand, currently, most computers have 8 or fewer CPUs. The jillion CPU models are for military or government use for the most part.

Of course, that looks like it is going to change, but I guess we have at least 5 years before the average desktop has more than 16 CPUs on it.
Tord Romstad
Posts: 1808
Joined: Wed Mar 08, 2006 9:19 pm
Location: Oslo, Norway

Re: Impressive Preliminary Results of Rybka 3 by Larry Kaufm

Post by Tord Romstad »

Dann Corbit wrote:It is much easier to use processes instead of threads.
I think this depends on your coding style, and on your prior knowledge and experience. To me, threads are far easier than processes, and ease of implementation was my only reason for choosing to use threads. I was under the (probably mistaken) impression that processes were slightly superior with respect to performance, but they were simply too tricky to get right for me.
The processes do not have to worry about any sharing of objects except for those things deliberately placed in shared memory.
That's never been a problem to me. On the contrary, this is one of the things which make processes difficult to me. Variables which look like global variables, but are actually local to one process, confuse me, and cause bugs everywhere. When I declare a variable as global, I want it to be global, and want all threads to notice when some thread has written to it. When I want something local to each thread, I find it much cleaner and clearer to use an array indexed by thread ID.

Tord
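Tord's convention can be sketched like this (all names made up for illustration): state that is logically per-thread lives in a global array indexed by thread ID, so it is still a real global every thread can see, while genuinely shared flags remain plain globals.

```c
/* Sketch of the array-indexed-by-thread-ID pattern. */
#define MAX_THREADS 16

typedef struct { long nodes; int ply; } SearchState;

static SearchState search_state[MAX_THREADS]; /* one slot per thread */
volatile int stop_search;                     /* genuinely shared flag */

void count_node(int thread_id) { search_state[thread_id].nodes++; }
long nodes_of(int thread_id)   { return search_state[thread_id].nodes; }
```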
Dann Corbit
Posts: 12542
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Impressive Preliminary Results of Rybka 3 by Larry Kaufm

Post by Dann Corbit »

Tord Romstad wrote:
Dann Corbit wrote:It is much easier to use processes instead of threads.
I think this depends on your coding style, and on your prior knowledge and experience. To me, threads are far easier than processes, and ease of implementation was my only reason for choosing to use threads. I was under the (probably mistaken) impression that processes were slightly superior with respect to performance, but they were simply too tricky to get right for me.
The processes do not have to worry about any sharing of objects except for those things deliberately placed in shared memory.
That's never been a problem to me. On the contrary, this is one of the things which make processes difficult to me. Variables which look like global variables, but are actually local to one process, confuse me, and cause bugs everywhere. When I declare a variable as global, I want it to be global, and want all threads to notice when some thread has written to it. When I want something local to each thread, I find it much cleaner and clearer to use an array indexed by thread ID.

Tord
I think for writing a new program from scratch, threads would be easier if you had SMP in mind when you started. But if you are porting an old program designed for single threads with tons of global cruft that gets written to (read-only globals are fine of course) then I think processes will be much easier.

I wrote this fake fork() for Windows and I know of at least two chess programs that used it to port their Unix programs to Windows and use SMP at the same time:

Code: Select all

#include <windows.h>
#include <process.h>
#include <stdio.h>

extern int real_argc;
extern char **real_argv;

int             fork(void)
{
    char            szPath[FILENAME_MAX];
    char            szCommandLine[1024];
    int             nLoop;
    PROCESS_INFORMATION pi;
    STARTUPINFO     startup_info;
    startup_info.cb = sizeof(STARTUPINFO);
    startup_info.lpReserved = NULL;
    startup_info.lpDesktop = NULL;
    startup_info.lpTitle = NULL;
    startup_info.dwX = 0;
    startup_info.dwY = 0;
    startup_info.dwXSize = 0;
    startup_info.dwYSize = 0;
    startup_info.cbReserved2 = 0;
    startup_info.lpReserved2 = NULL;
    startup_info.dwFlags = STARTF_USESHOWWINDOW;
    startup_info.wShowWindow = SW_HIDE;

    GetModuleFileName(NULL, szPath, sizeof(szPath) - 1);
    /* Copy existing command line parameters */
    strcpy(szCommandLine, szPath);
    strcat(szCommandLine, " ");
    for (nLoop = 1; nLoop < real_argc; nLoop++) {
        strcat(szCommandLine, real_argv[nLoop]);
        strcat(szCommandLine, " ");
    }
    if (0 == (CreateProcess(NULL,
                            szCommandLine,
                            NULL,
                            NULL,
                            TRUE,
                            0,
                            NULL,
                            NULL,
                            &startup_info,
                            &pi)))
        return -1;

    return pi.dwProcessId;
}
User avatar
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: Impressive Preliminary Results of Rybka 3 by Larry Kaufm

Post by Zach Wegner »

I agree, Tord. I started off with processes because of the performance benefit: you don't have to carry around a BOARD * everywhere you go, which reduces register pressure and requires slightly less calculation for reading parts of the board. I'm not sure how much this last part makes a difference, but theoretically for a static global you know the address of each element at compile time, whereas with a pointer you have to add the offset for each element to the pointer you get passed.

But I have had a much easier time dealing with threaded programs. I'm considering converting ZCT to threads, but I think other things would be more productive.
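The two access styles Zach contrasts, in a minimal sketch (BOARD and the field names are illustrative, not from any particular engine): with one static board per process the address of each field is a compile/link-time constant, while the threaded style computes base-plus-offset from a pointer passed down the call chain.

```c
/* Illustrative names only; not taken from any real engine. */
typedef struct { int squares[64]; int side_to_move; } BOARD;

static BOARD global_board;                 /* process-per-CPU style */

int piece_on_global(int sq)
{
    return global_board.squares[sq];       /* absolute address known early */
}

int piece_on(const BOARD *board, int sq)   /* thread style: BOARD * passed */
{
    return board->squares[sq];             /* base register + field offset */
}
```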
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Impressive Preliminary Results of Rybka 3 by Larry Kaufm

Post by bob »

Milton wrote:By lkaufman Date 2008-07-08 09:51 Since yesterday I've been testing a version of Rybka that is very close to Rybka 3, with the improved scaling and all my latest eval terms added. I'm running it against 2.3.2a mp. It appears that on a direct match basis, we will reach the goal of a 100 Elo gain, at least on quads. As of now, after 900 games total, the lead is 110 Elo (105 Elo on quads, 120 on my octal). This is with both programs using the same short generic book, each taking White once in every opening. To achieve this result Rybka 3 has to win about 4 games for each win by 2.3.2a on the quads and about 5 for 1 on the octal, due to draws. How this will translate to gains on the rating lists remains to be seen.
Personally I think this is a _terrible_ way of estimating Elo gain. I quit doing this years ago because it horribly inflates the Elo, for a simple reason...

When you add some new piece of knowledge that might be helpful here and there, and that is the _only_ difference between the two engines, then any rating change is a direct result of that change plus the normal randomness that games between equal opponents produces. Since the two programs are identical except for the new piece of knowledge, the one with the new piece will occasionally use it to win a game.

But in real games between _different_ opponents, that new piece of knowledge might produce absolutely no improvement at all, or one so small that it takes thousands of games to measure. Once you think about it for a few minutes, you see why this is pretty meaningless. The fact that it produces _any_ improvement is certainly significant, but the fact that it produces a 100 Elo improvement is worthless...

I could probably find some test results to show this: at times we add an old version of Crafty to our gauntlet for testing, and new changes tend to exaggerate that score compared to the scores against the other programs in the mix.