Impressive Preliminary Results of Rybka 3 by Larry Kaufman!
Moderator: Ras
-
Nimzovik
- Posts: 1831
- Joined: Sat Jan 06, 2007 11:08 pm
Re: Impressive Preliminary Results of Rybka 3 by Larry Kaufm
Yes.............. This has been stated repeatedly -- that programs cannot understand closed positions, something that Father Pablo has exploited ad infinitum. So programs rely on a hack to avoid or deal with closed positions, generally speaking. (Although I found the W-chess engine interesting -- early versions, as in Power Chess -- as it handled closed positions generally better than a lot of programs.) This is a fatal flaw, obviously. However, I am curious... Mind you, I am a complete idiot in terms of programming... would not more chess "knowledge" about closed positions in a specific engine help here, and then comparing its eval of the position with a "standard" program (like Shredder's triple-brain concept)? I realize implementation would be gargantuan. However, kudos and money and fame and perhaps adoration by females are at stake here.....
Ok so I know I am wrong but please briefly point out where. Thanks. 
-
Zach Wegner
- Posts: 1922
- Joined: Thu Mar 09, 2006 12:51 am
- Location: Earth
Re: Impressive Preliminary Results of Rybka 3 by Larry Kaufm
Yes, I agree that the difference should be small on 16 or fewer processors. I'm actually surprised that the difference is that low for 16 processors. But there is still a difference, and I'm definitely going to be skeptical of anyone trying to claim that they have the _best_ speedup when using recursive search.

bob wrote: I am not convinced that iterative search is required for good speedups. The DTS approach is certainly going to be _more_ efficient, but the margin is not huge between the two based on my dissertation results.
In the last test I did on this, Crafty got 3.3 / 4.0 on the same test set that CB got 3.7 on. While that is certainly significant, it is not overwhelming. Also at 16, I think CB got around 11.7 or so, while Crafty was around 10.8, although without the DTS paper in front of me, I might be a little off on the 11.7 number. And the 16 core numbers need to be backed up with more runs to be as reliable as the 8-core numbers I have provided in the past.
I have some 32 and 64 node numbers but not enough runs to want to publish any numbers and start yet another pointless discussion. But at least thru 16 processors, current Crafty does well. I have (and can easily run more) tons of 8-core results sitting around. Quite a few are on my ftp box, although those are AMD quad dual-core boxes as opposed to dual-quads based on Intel which seems to perform better with crafty.
I already have approaches ready and in place when I get access to a 32-64 core system for any significant amount of time. The issues are certainly non-trivial, but are clearly solvable given enough test time.
Here's hoping that you get some time on a 32/64 way SMP system. I'd love to see that. Alternatively you could clusterize Crafty. You can guess what I would do if I had one of those monsters sitting around...
Yes, the iterative search does waste a register. I can't measure a speed difference with it, as my search has been iterative since day one and all of the supporting routines are based on the iterative stack structure (hashing, move ordering, etc.). A long time ago, 4 years or so, I converted my old program from a global board to a pointer, and there was very little difference IIRC. I started my new program with processes just to get rid of the pointer, but I find that any speed difference is very small and the ease of using threads is worth it.

bob wrote: Remember that (a) not everyone is doing an iterative-type search; most are using recursion. (b) Your iterative data also requires an extra register reference for the subscript. Bottom line is there is very little difference between the two approaches. When I added the pointer to Crafty I found no appreciable slow-down at all. In Cray Blitz, which used iterative search, I used both threads and processes depending on the year in question, and didn't notice any speed difference at all. And, in fact, that should be the desired result.
The only issue with using processes rather than threads today is that egtb.cpp is written for threading, so that it shares the LRU buffers (the egtb cache). With processes, each process will end up with its own egtb cache, which is not nearly as effective as a shared cache with threads...
I do have the second issue too, as ZCT uses Scorpio EGBBs. But now it uses threads on Windows (using the thread local stuff), so I imagine most users won't complain.
-
Jeroen
- Posts: 501
- Joined: Wed Mar 08, 2006 9:49 pm
Re: Impressive Preliminary Results of Rybka 3 by Larry Kaufm
So you might as well consider a Van Gogh or Monet simply being 'paint brushed on a piece of paper'.
-
Jeroen
- Posts: 501
- Joined: Wed Mar 08, 2006 9:49 pm
Re: Impressive Preliminary Results of Rybka 3 by Larry Kaufm
If you read the Rybka forum carefully, you can see that the 100 Elo gain claim is also supported by results against other engines. Surprisingly, the Elo gain from games vs. other engines is even better than Larry's claim.
-
gerold
- Posts: 10121
- Joined: Thu Mar 09, 2006 12:57 am
- Location: van buren,missouri
Re: Impressive Preliminary Results of Rybka 3 by Larry Kaufm
That does not surprise me.

Jeroen wrote: If you read the Rybka forum carefully, you can see that the 100 Elo gain claim is also supported by results against other engines. Surprisingly, the Elo gain from games vs. other engines is even better than Larry's claim.
-
Uri Blass
- Posts: 11207
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: Impressive Preliminary Results of Rybka 3 by Larry Kaufm
Your assumption ("it horribly inflates the Elo") seems not to be correct.

bob wrote: Personally I think this is a _terrible_ way of estimating Elo gain. I quit doing this years ago because it horribly inflates the Elo for a simple reason...

Milton wrote: By lkaufman, Date 2008-07-08 09:51: Since yesterday I've been testing a version of Rybka that is very close to Rybka 3, with the improved scaling and all my latest eval terms added. I'm running it against 2.3.2a mp. It appears that on a direct match basis, we will reach the goal of a 100 Elo gain, at least on quads. As of now, after 900 games total, the lead is 110 Elo (105 Elo on quads, 120 on my octal). This is with both programs using the same short generic book, each taking White once in every opening. To achieve this result Rybka 3 has to win about 4 games for each win by 2.3.2a on the quads and about 5 for 1 on the octal, due to draws. How this will translate to gains on the rating lists remains to be seen.
When you add some new piece of knowledge that might be helpful here and there, and that is the _only_ difference between the two engines, then any rating change is a direct result of that change plus the normal randomness that games between equal opponents produces. Since the two programs are identical except for the new piece of knowledge, the one with the new piece will occasionally use it to win a game.
But in real games between _different_ opponents, that new piece of knowledge might produce absolutely no improvement at all, or one so small that it takes thousands of games to measure. Once you think about it for a few minutes, you see why this is pretty meaningless. The fact that it produces _any_ improvement is certainly significant, but the fact that it produces a 100 Elo improvement is worthless...
I could probably find some test results to show this as at times, we add an old version of Crafty to our gauntlet for testing, and new changes tend to exaggerate that score compared to the scores against other programs in the mix.
Larry explained that the new knowledge also made Rybka slower, so it was outsearched by the older Rybka.
He claims that this is why the improvement was smaller in Rybka-vs-Rybka games (relative to Rybka against other opponents).
Tests against other opponents in the Rybka forum suggest a slightly bigger improvement relative to Rybka-vs-Rybka games.
Uri
-
bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Impressive Preliminary Results of Rybka 3 by Larry Kaufm
I would consider it "a painting". Perhaps done better than most others. But no different in the "nuts and bolts": canvas, oils, brushes, etc...

Jeroen wrote: So you might as well consider a Van Gogh or Monet simply being 'paint brushed on a piece of paper'.
-
bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Impressive Preliminary Results of Rybka 3 by Larry Kaufm
I have _always_ been skeptical of such claims as well. With one caveat: I do not _know_ that a well-done recursive search won't scale as well as, or better than, the iterative DTS I did in CB. I've never claimed that CB's search was the ultimate and better than anything else around. I've never claimed anything similar for Crafty's parallel search either. I've just posted the results, the raw data, and let it go at that. "Some" won't do that, of course, which is OK. And sooner or later, it will be simple enough to measure scaling for any program that is made public.

Zach Wegner wrote: Yes, I agree that the difference should be small on 16 or fewer processors. I'm actually surprised that the difference is that low for 16 processors. But there is still a difference, and I'm definitely going to be skeptical of anyone trying to claim that they have the _best_ speedup when using recursive search.

bob wrote: I am not convinced that iterative search is required for good speedups. The DTS approach is certainly going to be _more_ efficient, but the margin is not huge between the two based on my dissertation results.
In the last test I did on this, Crafty got 3.3 / 4.0 on the same test set that CB got 3.7 on. While that is certainly significant, it is not overwhelming. Also at 16, I think CB got around 11.7 or so, while Crafty was around 10.8, although without the DTS paper in front of me, I might be a little off on the 11.7 number. And the 16 core numbers need to be backed up with more runs to be as reliable as the 8-core numbers I have provided in the past.
I have some 32 and 64 node numbers but not enough runs to want to publish any numbers and start yet another pointless discussion. But at least thru 16 processors, current Crafty does well. I have (and can easily run more) tons of 8-core results sitting around. Quite a few are on my ftp box, although those are AMD quad dual-core boxes as opposed to dual-quads based on Intel which seems to perform better with crafty.
I already have approaches ready and in place when I get access to a 32-64 core system for any significant amount of time. The issues are certainly non-trivial, but are clearly solvable given enough test time.
I'm also skeptical of "fixing the scaling problem in 6 weeks." Unless the previous search was so poorly written that the speedup was essentially nil. This is a time-consuming issue, and it is about much more than just searching in parallel. There are significant architectural hurdles that have to be overcome to get decent numbers.
It is an interesting thought. The 70-node, 8-core cluster is an interesting animal. But the message passing really poses a hurdle that looks difficult to deal with, as opposed to shared memory, which is far simpler and more flexible (if still somewhat problematic on NUMA architectures).
Here's hoping that you get some time on a 32/64 way SMP system. I'd love to see that. Alternatively you could clusterize Crafty. You can guess what I would do if I had one of those monsters sitting around...
Yes, the iterative search does waste a register. I can't measure a speed difference with it, as my search has been iterative since day one and all of the supporting routines are based on the iterative stack structure (hashing, move ordering, etc.). A long time ago, 4 years or so, I converted my old program from a global board to a pointer, and there was very little difference IIRC. I started my new program with processes just to get rid of the pointer, but I find that any speed difference is very small and the ease of using threads is worth it.

bob wrote: Remember that (a) not everyone is doing an iterative-type search; most are using recursion. (b) Your iterative data also requires an extra register reference for the subscript. Bottom line is there is very little difference between the two approaches. When I added the pointer to Crafty I found no appreciable slow-down at all. In Cray Blitz, which used iterative search, I used both threads and processes depending on the year in question, and didn't notice any speed difference at all. And, in fact, that should be the desired result.
The only issue with using processes rather than threads today is that egtb.cpp is written for threading, so that it shares the LRU buffers (the egtb cache). With processes, each process will end up with its own egtb cache, which is not nearly as effective as a shared cache with threads...
I do have the second issue too, as ZCT uses Scorpio EGBBs. But now it uses threads on Windows (using the thread local stuff), so I imagine most users won't complain.
-
bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Impressive Preliminary Results of Rybka 3 by Larry Kaufm
You are absolutely correct about "more knowledge helping". But there is a serious issue to deal with: more knowledge means less tactical accuracy, because knowledge and speed are inversely related. Even worse is the way this knowledge has to be expressed exactly. A human can be told to avoid blocked positions and that is enough. But for a computer, describing "blocked" becomes something that is not so easy to do, because it has to be explicit, where a human can generalize. All of this turns into lots of code, and lots of code is slow... So we have resorted to simple hacks in the past. For example, the "trojan horse" code I used for several years to stop that particular strategy. Eventually others adopted similar ideas to avoid getting drubbed by players like "mercilous" on ICC. But it has some very bad properties. I once saw Crafty refuse to re-capture after the opponent first captured a piece on g5, because that would open the h-file just as in the trojan attack. So it just played on, a piece down, where the recapture was perfectly safe. The hack works, but there is a significant cost in terms of general playing skill, which is why I only turn it on in certain cases and against humans but not computers.

Nimzovik wrote: Yes.............. This has been stated repeatedly -- that programs cannot understand closed positions, something that Father Pablo has exploited ad infinitum. So programs rely on a hack to avoid or deal with closed positions, generally speaking. (Although I found the W-chess engine interesting -- early versions, as in Power Chess -- as it handled closed positions generally better than a lot of programs.) This is a fatal flaw, obviously. However, I am curious... Mind you, I am a complete idiot in terms of programming... would not more chess "knowledge" about closed positions in a specific engine help here, and then comparing its eval of the position with a "standard" program (like Shredder's triple-brain concept)? I realize implementation would be gargantuan. However, kudos and money and fame and perhaps adoration by females are at stake here..... Ok so I know I am wrong but please briefly point out where. Thanks.
-
bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Impressive Preliminary Results of Rybka 3 by Larry Kaufm
Wouldn't argue with that at all. Just that drawing conclusions from N vs N+1 can be _very_ misleading, for the reasons given.

Jeroen wrote: If you read the Rybka forum carefully, you can see that the 100 Elo gain claim is also supported by results against other engines. Surprisingly, the Elo gain from games vs. other engines is even better than Larry's claim.