Impressive Preliminary Results of Rybka 3 by Larry Kaufman!
Moderator: Ras
-
Nimzovik
- Posts: 1831
- Joined: Sat Jan 06, 2007 11:08 pm
Re: Impressive Preliminary Results of Rybka 3 by Larry Kaufm
Yes.............. This has been stated repeatedly -- that programs cannot understand closed positions, something that Father Pablo has exploited ad infinitum. So programs rely on a hack to avoid or deal with closed positions, generally speaking. (Although I found the W-chess engine interesting -- early versions, as in Power Chess -- as it handled closed positions generally better than a lot of programs.) This is a fatal flaw, obviously. However, I am curious... Mind you, I am a complete idiot in terms of programming... would not more chess "knowledge" about closed positions in a specific engine help here, and then comparing its eval of the position with a "standard" program (like Shredder's triple-brain concept)? I realize implementation would be gargantuan. However, kudos and money and fame and perhaps adoration by females are at stake here.....
Ok so I know I am wrong but please briefly point out where. Thanks. 
-
Zach Wegner
- Posts: 1922
- Joined: Thu Mar 09, 2006 12:51 am
- Location: Earth
Re: Impressive Preliminary Results of Rybka 3 by Larry Kaufm
Yes, I agree that the difference should be small on 16 or fewer processors. I'm actually surprised that the difference is that low for 16 processors. But there is still a difference, and I'm definitely going to be skeptical of anyone trying to claim that they have the _best_ speedup when using recursive search.

bob wrote: I am not convinced that iterative search is required for good speedups. The DTS approach is certainly going to be _more_ efficient, but the margin is not huge between the two based on my dissertation results.
In the last test I did on this, Crafty got 3.3 / 4.0 on the same test set that CB got 3.7 on. While that is certainly significant, it is not overwhelming. Also at 16, I think CB got around 11.7 or so, while Crafty was around 10.8, although without the DTS paper in front of me, I might be a little off on the 11.7 number. And the 16 core numbers need to be backed up with more runs to be as reliable as the 8-core numbers I have provided in the past.
I have some 32 and 64 node numbers but not enough runs to want to publish any numbers and start yet another pointless discussion. But at least thru 16 processors, current Crafty does well. I have (and can easily run more) tons of 8-core results sitting around. Quite a few are on my ftp box, although those are AMD quad dual-core boxes as opposed to dual-quads based on Intel which seems to perform better with crafty.
I already have approaches ready and in place when I get access to a 32-64 core system for any significant amount of time. The issues are certainly non-trivial, but are clearly solvable given enough test time.
Here's hoping that you get some time on a 32/64 way SMP system. I'd love to see that. Alternatively you could clusterize Crafty. You can guess what I would do if I had one of those monsters sitting around...
Yes, the iterative search does waste a register. I can't measure a speed difference with it, as my search has been iterative since day one and all of the supporting routines are based on the iterative stack structure (hashing, move ordering, etc.). A long time ago, 4 years or so, I converted my old program from a global board to a pointer, and there was very little difference IIRC. I started my new program with processes just to get rid of the pointer, but I find that any speed difference is very small and the ease of using threads is worth it.

bob wrote: Remember that (a) not everyone is doing an iterative-type search; most are using recursion. (b) Your iterative data also requires an extra register reference for the subscript. Bottom line is there is very little difference between the two approaches. When I added the pointer to Crafty I found no appreciable slow-down at all. In Cray Blitz, which used iterative search, I used both threads and processes depending on the year in question, and didn't notice any speed difference at all. And, in fact, that should be the desired result.
The only issue with using processes rather than threads today is that egtb.cpp is written for threading, so that it shares the LRU buffers (the egtb cache). With processes, each process will end up with its own egtb cache, which is not nearly as effective as a shared cache with threads...
I do have the second issue too, as ZCT uses Scorpio EGBBs. But now it uses threads on Windows (using the thread local stuff), so I imagine most users won't complain.
-
Jeroen
- Posts: 501
- Joined: Wed Mar 08, 2006 9:49 pm
Re: Impressive Preliminary Results of Rybka 3 by Larry Kaufm
So you might as well consider a Van Gogh or Monet simply being 'paint brushed on a piece of paper'.
-
Jeroen
- Posts: 501
- Joined: Wed Mar 08, 2006 9:49 pm
Re: Impressive Preliminary Results of Rybka 3 by Larry Kaufm
If you read the Rybka forum carefully, you can see that the 100 Elo gain claim is also supported by results against other engines. Surprisingly, the Elo gain from games vs. other engines is even better than Larry's claim.
-
gerold
- Posts: 10121
- Joined: Thu Mar 09, 2006 12:57 am
- Location: van buren,missouri
Re: Impressive Preliminary Results of Rybka 3 by Larry Kaufm
That does not surprise me.

Jeroen wrote: If you read the Rybka forum carefully, you can see that the 100 Elo gain claim is also supported by results against other engines. Surprisingly, the Elo gain from games vs. other engines is even better than Larry's claim.
-
Uri Blass
- Posts: 11207
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: Impressive Preliminary Results of Rybka 3 by Larry Kaufm
Your assumption ("it horribly inflates the Elo") seems not to be correct.

bob wrote: Personally I think this is a _terrible_ way of estimating Elo gain. I quit doing this years ago because it horribly inflates the Elo for a simple reason...

Milton wrote: By lkaufman, Date 2008-07-08 09:51: Since yesterday I've been testing a version of Rybka that is very close to Rybka 3, with the improved scaling and all my latest eval terms added. I'm running it against 2.3.2a mp. It appears that on a direct match basis, we will reach the goal of a 100 Elo gain, at least on quads. As of now, after 900 games total, the lead is 110 Elo (105 Elo on quads, 120 on my octal). This is with both programs using the same short generic book, each taking White once in every opening. To achieve this result Rybka 3 has to win about 4 games for each win by 2.3.2a on the quads and about 5 for 1 on the octal, due to draws. How this will translate to gains on the rating lists remains to be seen.
When you add some new piece of knowledge that might be helpful here and there, and that is the _only_ difference between the two engines, then any rating change is a direct result of that change plus the normal randomness that games between equal opponents produces. Since the two programs are identical except for the new piece of knowledge, the one with the new piece will occasionally use it to win a game.
But in real games between _different_ opponents, that new piece of knowledge might produce absolutely no improvement at all, or one so small that it takes thousands of games to measure. Once you think about it for a few minutes, you see why this is pretty meaningless. The fact that it produces _any_ improvement is certainly significant, but the fact that it produces a 100 Elo improvement is worthless...
I could probably find some test results to show this as at times, we add an old version of Crafty to our gauntlet for testing, and new changes tend to exaggerate that score compared to the scores against other programs in the mix.
Larry explained that the new knowledge also made Rybka slower, so it was outsearched by the older Rybka.
He claims that this is why the improvement was smaller in Rybka-vs-Rybka games (relative to Rybka against other opponents).
Tests against other opponents in the Rybka forum suggest a slightly bigger improvement relative to Rybka-vs-Rybka games.
Uri
-
bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Impressive Preliminary Results of Rybka 3 by Larry Kaufm
I would consider it "a painting". Perhaps done better than most others. But no different in the "nuts and bolts": canvas, oils, brushes, etc...

Jeroen wrote: So you might as well consider a Van Gogh or Monet simply being 'paint brushed on a piece of paper'.
-
bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Impressive Preliminary Results of Rybka 3 by Larry Kaufm
I have _always_ been skeptical of such claims as well. With one caveat: I do not _know_ that a well-done recursive search won't scale as well as, or better than, the iterative DTS I did in CB. I've never claimed that CB's search was the ultimate and better than anything else around. I've never claimed anything similar for Crafty's parallel search either. I've just posted the results, the raw data, and let it go at that. "Some" won't do that, of course, which is OK. And sooner or later, it will be simple enough to measure scaling for any program that is made public.

Zach Wegner wrote: Yes, I agree that the difference should be small on 16 or fewer processors. I'm actually surprised that the difference is that low for 16 processors. But there is still a difference, and I'm definitely going to be skeptical of anyone trying to claim that they have the _best_ speedup when using recursive search.

bob wrote: I am not convinced that iterative search is required for good speedups. The DTS approach is certainly going to be _more_ efficient, but the margin is not huge between the two based on my dissertation results.
In the last test I did on this, Crafty got 3.3 / 4.0 on the same test set that CB got 3.7 on. While that is certainly significant, it is not overwhelming. Also at 16, I think CB got around 11.7 or so, while Crafty was around 10.8, although without the DTS paper in front of me, I might be a little off on the 11.7 number. And the 16 core numbers need to be backed up with more runs to be as reliable as the 8-core numbers I have provided in the past.
I have some 32 and 64 node numbers but not enough runs to want to publish any numbers and start yet another pointless discussion. But at least thru 16 processors, current Crafty does well. I have (and can easily run more) tons of 8-core results sitting around. Quite a few are on my ftp box, although those are AMD quad dual-core boxes as opposed to dual-quads based on Intel which seems to perform better with crafty.
I already have approaches ready and in place when I get access to a 32-64 core system for any significant amount of time. The issues are certainly non-trivial, but are clearly solvable given enough test time.
I'm also skeptical of "fixing the scaling problem in 6 weeks." Unless the previous search was so poorly written that the speedup was essentially nil. This is a time-consuming issue, and it is about much more than just searching in parallel. There are significant architectural hurdles that have to be overcome to get decent numbers.
It is an interesting thought. The 70-node, 8-core cluster is an interesting animal. But the message passing really poses a hurdle that looks difficult to deal with, as opposed to shared memory, which is far simpler and more flexible (if still somewhat problematic on NUMA architectures).
Here's hoping that you get some time on a 32/64 way SMP system. I'd love to see that. Alternatively you could clusterize Crafty. You can guess what I would do if I had one of those monsters sitting around...
Yes, the iterative search does waste a register. I can't measure a speed difference with it, as my search has been iterative since day one and all of the supporting routines are based on the iterative stack structure (hashing, move ordering, etc.). A long time ago, 4 years or so, I converted my old program from a global board to a pointer, and there was very little difference IIRC. I started my new program with processes just to get rid of the pointer, but I find that any speed difference is very small and the ease of using threads is worth it.

bob wrote: Remember that (a) not everyone is doing an iterative-type search; most are using recursion. (b) Your iterative data also requires an extra register reference for the subscript. Bottom line is there is very little difference between the two approaches. When I added the pointer to Crafty I found no appreciable slow-down at all. In Cray Blitz, which used iterative search, I used both threads and processes depending on the year in question, and didn't notice any speed difference at all. And, in fact, that should be the desired result.
The only issue with using processes rather than threads today is that egtb.cpp is written for threading, so that it shares the LRU buffers (the egtb cache). With processes, each process will end up with its own egtb cache, which is not nearly as effective as a shared cache with threads...
I do have the second issue too, as ZCT uses Scorpio EGBBs. But now it uses threads on Windows (using the thread local stuff), so I imagine most users won't complain.
-
bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Impressive Preliminary Results of Rybka 3 by Larry Kaufm
You are absolutely correct about "more knowledge helping". But there is a serious issue to deal with: more knowledge means less tactical accuracy, because knowledge and speed are inversely related. Even worse is the way this knowledge has to be expressed exactly. A human can be told to avoid blocked positions and that is enough. But for a computer, describing "blocked" becomes something that is not so easy to do, because it has to be explicit, where a human can generalize. All of this turns into lots of code, and lots of code is slow... So we have resorted to simple hacks in the past. For example, the "trojan horse" code I used for several years to stop that particular strategy. Eventually others adopted similar ideas to avoid getting drubbed by players like "mercilous" on ICC. But it has some very bad properties. I once saw Crafty refuse to re-capture after the opponent first captured a piece on g5, because that would open the h-file just as in the trojan attack. So it just played on, a piece down, where the recapture was perfectly safe. The hack works, but there is a significant cost in terms of general playing skill, which is why I only turn it on in certain cases and against humans but not computers.

Nimzovik wrote: Yes.............. This has been stated repeatedly -- that programs cannot understand closed positions, something that Father Pablo has exploited ad infinitum. So programs rely on a hack to avoid or deal with closed positions, generally speaking. (Although I found the W-chess engine interesting -- early versions, as in Power Chess -- as it handled closed positions generally better than a lot of programs.) This is a fatal flaw, obviously. However, I am curious... Mind you, I am a complete idiot in terms of programming... would not more chess "knowledge" about closed positions in a specific engine help here, and then comparing its eval of the position with a "standard" program (like Shredder's triple-brain concept)? I realize implementation would be gargantuan. However, kudos and money and fame and perhaps adoration by females are at stake here..... Ok so I know I am wrong but please briefly point out where. Thanks.
-
bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Impressive Preliminary Results of Rybka 3 by Larry Kaufm
Wouldn't argue with that at all. Just that drawing conclusions from N vs N+1 can be _very_ misleading, for the reasons given.

Jeroen wrote: If you read the Rybka forum carefully, you can see that the 100 Elo gain claim is also supported by results against other engines. Surprisingly, the Elo gain from games vs. other engines is even better than Larry's claim.