What makes Rybka so strong?

Don · Post by **Don** » Sun Nov 01, 2009 2:03 pm

I've not seen anyone claim over +100 for LMR, assuming they use null-move R=3 everywhere. I only tested one non-crafty program and got numbers pretty close. I believe that when I turned it off on Fruit 2.x, it dropped closer to 50. But Fruit has a worse version of LMR than what I am using so that might make a difference in this measurement as well.

I have not run this test in a long time. I use adaptive null move pruning so I don't use R=3 everywhere. R=3 used to be the best value, but that has changed.

I recently performed a massive overhaul of my engine to get the level approximately where stockfish 1.4 is and it's probably not 100 ELO any longer because I am more aggressive about other kinds of pruning. ( http://greencheeks.homelinux.org:8015/~ ... islap.html )

I think the 100+ ELO in the old program is explained by the fact that I am pretty aggressive about LMR. I reduce up to 3 in some situations and from looking at the source code of other programs I see that 1 is much more common. I reduce later moves more that early moves and there are additional conditions for this.

frankp · Post by **frankp** » Sun Nov 01, 2009 2:16 pm

Don wrote:
I recently performed a massive overhaul of my engine to get the level approximately where stockfish 1.4 is and it's probably not 100 ELO any longer because I am more aggressive about other kinds of pruning. ( http://greencheeks.homelinux.org:8015/~ ... islap.html )
Geez... I am impressed. I cannot even get my new program to the level of my old one :-(

Gian-Carlo Pascutto · Post by **Gian-Carlo Pascutto** » Sun Nov 01, 2009 2:38 pm

mcostalba wrote:
Hart wrote:This makes sense and it was the first thing Anthony Cozzie said of Rybka 1.0 when the Strelka sources were released. He stated that Rybka was a search-heavy engine with very clear optimizations for speed and efficiency and contained no new tricks or novel ideas.
Yes, aim to efficency and speed tricks are very visible everywhere.

A few that stood out to me:

1) Separate search functions for, well, everything

I blame Fruit for the basic idea, but Rybka takes it way further.
2) Calculating both opening and endgame scores simultaneously by packing them both into an integer, thus saving ADD's.
3) Calculating move ordering scores at the same time as generating the moves.

Don · Post by **Don** » Sun Nov 01, 2009 3:10 pm

Gian-Carlo Pascutto wrote:
mcostalba wrote:
Hart wrote:This makes sense and it was the first thing Anthony Cozzie said of Rybka 1.0 when the Strelka sources were released. He stated that Rybka was a search-heavy engine with very clear optimizations for speed and efficiency and contained no new tricks or novel ideas.
Yes, aim to efficency and speed tricks are very visible everywhere.
A few that stood out to me:

1) Separate search functions for, well, everything I blame Fruit for the basic idea, but Rybka takes it way further.
2) Calculating both opening and endgame scores simultaneously by packing them both into an integer, thus saving ADD's.
3) Calculating move ordering scores at the same time as generating the moves.

Separate search functions for so many things cannot be a speed trick worth more than half a percent I would think. I have always had 3 cases in my programs since the days of rexchess which was an assembly program, now I have 4 cases: root, quies, PV, nonPV and I think it has some advantages in reasoning and thinking about the program, but it has the big disadvantage that certain kinds of changes may have to be maintained in several different places. However this does not seem to be happening as much as I thought it would.

bob · Post by **bob** » Sun Nov 01, 2009 6:21 pm

smcracraft wrote:
bob wrote:
smcracraft wrote:A very good question.

The "cone of silence" is deafening.
Whose "job" is it to analyze a mangled source program to see what it is doing, and then comment it so anybody can discover this without having to do any work of their own???
It's all of our job.

To make what is called human progress...

One person or group discovers a "secret" that enlarges their
personal largesse, security, wealth, happiness, etc.

In my personal opinion, it is our obligation and due diligence
as a species, for future generations, to uncover and expose
secretive/furtive progress and build upon the shoulders of others,
whether they like it or not.

--Stuart

I completely disagree. I'm not interested in reverse-engineering a program, unless I have no ideas on how to improve my own. So far, we have not had a lack of new ideas. We have more of a lack of enough time. Which makes reverse-engineering even less likely.

bob · Post by **bob** » Sun Nov 01, 2009 6:26 pm

Don wrote:
I've not seen anyone claim over +100 for LMR, assuming they use null-move R=3 everywhere. I only tested one non-crafty program and got numbers pretty close. I believe that when I turned it off on Fruit 2.x, it dropped closer to 50. But Fruit has a worse version of LMR than what I am using so that might make a difference in this measurement as well.
I have not run this test in a long time. I use adaptive null move pruning so I don't use R=3 everywhere. R=3 used to be the best value, but that has changed.

I recently performed a massive overhaul of my engine to get the level approximately where stockfish 1.4 is and it's probably not 100 ELO any longer because I am more aggressive about other kinds of pruning. ( http://greencheeks.homelinux.org:8015/~ ... islap.html )

I think the 100+ ELO in the old program is explained by the fact that I am pretty aggressive about LMR. I reduce up to 3 in some situations and from looking at the source code of other programs I see that 1 is much more common. I reduce later moves more that early moves and there are additional conditions for this.

I tried variable reductions early this year. But never found anything that worked. I even tried to qualitatively categorize moves as "good, fair, bad, and awful" where good moves were extended or left alone. Fair moves were reduced by 1, bad by 2 and awful by 3. I did not use history counters to determine the class, as I am convinced history counters are worthless at today's speeds and depths. But none was better. A couple of good (I thought) were break-even at best, most were slightly worse.

I used to use adaptive null-move until I added checks in the q-search. qchecks by themselves did not help, and actually cost a few Elo, but by going to R=3 everywhere, suddenly it was a gain of 4-5 Elo overall, due to the interaction and less danger of reducing near the tips (which could miss serious threats if led by a check).

mcostalba · Post by **mcostalba** » Sun Nov 01, 2009 6:28 pm

Gian-Carlo Pascutto wrote:
A few that stood out to me:

1) Separate search functions for, well, everything I blame Fruit for the basic idea, but Rybka takes it way further.
2) Calculating both opening and endgame scores simultaneously by packing them both into an integer, thus saving ADD's.
3) Calculating move ordering scores at the same time as generating the moves.

I answer for each point:

1) The only further "separation" that there is more then in SF is white/black splitting. I have tested not the search, but the do_move(), to split in do_move_white() and do_move_black(), with almost no difference.

Well in SF we use templates to do this so that there is no source code redundancy. Sometime it helped, as example in evaluation, to split in white and black, so for instance we have:

Code: Select all

  evaluate_king<WHITE>&#40;pos, ei&#41;;
  evaluate_king<BLACK>&#40;pos, ei&#41;;

Instead of a color loop, this helps becasue many indirect memory access are resolved at compile time and become direct ones. But in do_move() and in search() it doesn't help at least for SF.

Actually templetized version of do_move() is still under testing, but even if it helps we are below 1%.

2) This is what I call an ugly trick because is not platform independent (32/64 bit). And also if you save an add you also earn a left shift

The only advantage is to pack always in a uint32_t instead of a 2 unint32_t for 32 bit CPU or 2 uint64_t for 64 bit machine, but usually avoiding the natural 'integer' of the CPU is never a suggested technique and gains are all to be demonstrated.

3) This is horribly ugly becasue compeltely breaks the nice separation between move generation, scoring and sorting. And BTW I am not sure you save anything becasue you need an extra access during capture's move generation to a big table of precalculated MVV/LVA scores and also an extra access to look up what kind of piece you have in destination square. So the only save you have is the piece lookup of the origin square that during move generation is already known (of course) while if you score later you have to do it, but the origin square piece lookup is very fast because board[] is only 64 squares and is always in L1 for the whole scoring. So I think it is ugly and with very little gain (if any).

bob · Post by **bob** » Sun Nov 01, 2009 6:29 pm

frankp wrote:
Don wrote:
I recently performed a massive overhaul of my engine to get the level approximately where stockfish 1.4 is and it's probably not 100 ELO any longer because I am more aggressive about other kinds of pruning. ( http://greencheeks.homelinux.org:8015/~ ... islap.html )
Geez... I am impressed. I cannot even get my new program to the level of my old one
One has to be very careful when making comparisons based on 300-400 games. I will try to post some samples later today if I have time. It is quite easy to be _way_ off after 200-300 games, compared to where things finish after 40,000 games. The error bar is not that wide "just because". It really does mean "this is only accurate as a guess that is somewhere inside the window, not right at the center of the window"

Gian-Carlo Pascutto · Post by **Gian-Carlo Pascutto** » Sun Nov 01, 2009 8:09 pm

mcostalba wrote: 1) The only further "separation" that there is more then in SF is white/black splitting. I have tested not the search, but the do_move(), to split in do_move_white() and do_move_black(), with almost no difference.

Do you also have separate search functions for when being in check, for being close to the leaves but not in qsearch, and for doing a "move excluded" search?

Last time I looked, you did not.

2) This is what I call an ugly trick because is not platform independent (32/64 bit). And also if you save an add you also earn a left shift

The scores probably fit in 32 bits (16/16) which is portable.

You save much more than a single add at the cost of the shift, don't forget you save also all the lookups of the target addresses and the actual load+store on the value you are adding to.

3) This is horribly ugly becasue compeltely breaks the nice separation between move generation, scoring and sorting. And BTW I am not sure you save anything becasue you need an extra access during capture's move generation to a big table of precalculated MVV/LVA scores and also an extra access to look up what kind of piece you have in destination square.

You need those anyway if you score later, so doing it immediately saves instructions.

So the only save you have is the piece lookup of the origin square that during move generation is already known (of course) while if you score later you have to do it, but the origin square piece lookup is very fast because board[] is only 64 squares and is always in L1 for the whole scoring. So I think it is ugly and with very little gain (if any).

You also save the entire loop which has to go through the move data.

I'm not arguing about the layering violations - I also wouldn't write my engine like Rybka/Ippolit. But there's no doubt all those tricks combined are what make it faster than most programs.

mcostalba · Post by **mcostalba** » Sun Nov 01, 2009 8:43 pm

Gian-Carlo Pascutto wrote:
mcostalba wrote: 1) The only further "separation" that there is more then in SF is white/black splitting. I have tested not the search, but the do_move(), to split in do_move_white() and do_move_black(), with almost no difference.
Do you also have separate search functions for when being in check, for being close to the leaves but not in qsearch, and for doing a "move excluded" search?

Last time I looked, you did not.

No I don't. But also don't have all that different pruning techniques as "stand pat" in main search at low depth or the different pruning parameters and kind of generated moves according to distance to beta.

In case of SF the functions would be 95% the same so I guess benefit would not be that much.

Gian-Carlo Pascutto wrote:
2) This is what I call an ugly trick because is not platform independent (32/64 bit). And also if you save an add you also earn a left shift
The scores probably fit in 32 bits (16/16) which is portable.

You save much more than a single add at the cost of the shift, don't forget you save also all the lookups of the target addresses and the actual load+store on the value you are adding to.

I don't follow you. What I _could_ try is to define score as a 64 bit union of two uint32_t and access the high and/or low word, I would not need shifts and I would guess result is nicer to see, but I don't understand where I save the load+store on the value you are adding to because I have in any case to lookup in the midgame and endgame bonuses tables.

Instead a possible optimization could be to multiplex the midgame and endgame bonuses on the same table instead of 2 so that after the first access you have the next already in L1. Or even better you can access as a 64 bit quantity and get the two values in one go.

Gian-Carlo Pascutto wrote:
3) This is horribly ugly becasue compeltely breaks the nice separation between move generation, scoring and sorting. And BTW I am not sure you save anything becasue you need an extra access during capture's move generation to a big table of precalculated MVV/LVA scores and also an extra access to look up what kind of piece you have in destination square.
You need those anyway if you score later, so doing it immediately saves instructions.

So the only save you have is the piece lookup of the origin square that during move generation is already known (of course) while if you score later you have to do it, but the origin square piece lookup is very fast because board[] is only 64 squares and is always in L1 for the whole scoring. So I think it is ugly and with very little gain (if any).
You also save the entire loop which has to go through the move data.

Looking up big tables and / or more tables during move generation I am not sure is so cache friendly so also redo the loop but accessing two disjointing data sets in the two cases it seems to me more cache friendly. Anyhow this are just void argumentations without testing. What I can say is that I don't think this particular trick gains a lot...but of course when we talk about optimizations any argument is only 5% of the job, the 95% is testing on real numbers.

What makes Rybka so strong?

Re: What makes Rybka so strong?

Re: What makes Rybka so strong?

Re: What makes Rybka so strong?

Re: What makes Rybka so strong?

Re: What makes Rybka so strong?

Re: What makes Rybka so strong?

Re: What makes Rybka so strong?

Re: What makes Rybka so strong?

Re: What makes Rybka so strong?

Re: What makes Rybka so strong?