LMR

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: LMR

Post by diep »

bob wrote: All I can say is "where is the +40 Elo coming from?" It is not imaginary, it is based on tens of thousands of test games against multiple opponents.
Bob, you raise a valid point. However, I'm lurking around on ICC daily again, and 99% of all games you get are against 3200+ rated Rybkas, with Crafty being rated 2700 or so.

So basically you lose about every game. If you then build in a feature that has a BEST case besides a worst case, of course you no longer feel the worst case, because things can't go much worse than they are already going for us there.

We gotta improve some stuff Bob!

What I'm seeing in test games against 2000-Elo-rated engines is that LMR shows its worst case, not its best case. I understand your big desire to sometimes beat Rybka 3.0, but let's first beat whoever is below us, I'd say. I'd hate to see that posting from Uri saying: "my never-parallelized program beat Diep".

Vincent
Stan Arts

Re: LMR

Post by Stan Arts »

Hi Vincent
diep wrote: Most test games of today are simply not giving a program more system time than we gave software back in 1999. That's reality.

I also find the testing time controls have a big influence.
Take for example null move pruning: that isn't a huge winner until >8 ply, or even 10 ply and above. (Fresh in memory because I recently pitted my program against a version with a completely fresh search part of completely flat alpha-beta search, and was amazed to find that in bullet and blitz (8 ply or less in the middlegame) it can keep up.. also because without selectivity and "smart" tricks its NPS is about 2x higher. At longer TC however the normal one is completely dominant and runs circles around the idiot one. Testing whether null move would be an improvement would probably come out as 0 Elo at bullet to several hundred at long time controls.)
diep wrote: LMR is a method that basically uses simple moves to search your mainline deeper. My argument is that when simple trivial moves can flip your score in evaluation, then your engine has a bug in its evaluation.
I agree LMR as commonly described just searches a PV and not much else, which feels wrong and often is. On the other hand, you can now handpick moves in an almost Shannon type B way without the full risk (only reducing instead of pruning). This is exciting and probably means there's some "holy grail" way of picking moves that fills up the weaknesses and could lead to something playing strongly. I guess this is why so many are trying. (As some sort of easy selectivity, following some famous super selective examples.)
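
Roughly what I mean, as a minimal sketch (generic LMR in a negamax move loop; Position, Move, search, make_move, unmake_move and move_is_tactical are placeholder names, not my real code):

/* Reduce late quiet moves by one ply; if the reduced search still beats
   alpha, verify it by re-searching at full depth. */
int search_moves(Position *pos, Move *moves, int num_moves,
                 int depth, int alpha, int beta, int in_check)
{
    for (int i = 0; i < num_moves; i++) {
        int reduction = 0;

        /* never reduce the first few moves, tactical moves, or while in check */
        if (i >= 3 && depth >= 3 && !in_check && !move_is_tactical(&moves[i]))
            reduction = 1;

        make_move(pos, &moves[i]);
        int score = -search(pos, depth - 1 - reduction, -beta, -alpha);
        if (reduction && score > alpha)                      /* looked too good: */
            score = -search(pos, depth - 1, -beta, -alpha);  /* re-search at full depth */
        unmake_move(pos, &moves[i]);

        if (score > alpha) {
            alpha = score;
            if (alpha >= beta)
                break;                                       /* beta cutoff */
        }
    }
    return alpha;
}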

In any case my engine is starving for depth, which is the main reason why it's so weak, so I need to find something.
diep wrote: Way more interesting than LMR, I'd say, is forward pruning the last few plies. I'm gonna revive soon an experiment where I do search replacement. So in the last few plies, when for example the score is far above beta, I replace Diep's slow search by some very fast tactical searcher that just has a look whether there is some sort of threat preventing us from giving a cutoff here statically.
Using such a fast tactical searcher to find tactical threats sounds like a promising idea for Diep, especially as it can be run in parallel? Hope you can get something out of it for Spain!

Does Diep pick up a lot of tactics through evaluation?
So far I hardly do that; all my tries quickly ran into unexpected exceptions that were impossible to code, so each time I threw it out. (And then do not touch my engine for months.)

What I'm using is a tactical ply between the full-width search and qsearch. Which is also a form of forward pruning. (A matter of semantics.) Here I try to detect some threats statically, and if those check out OK I stand pat and handpick a bunch of moves. (For now just tactical moves: captures, checks, pawn pushes to the 7th and promotions, etc.)
To me that is also attractive and safe (safe because you can only win or lose a ply to the solution), and because it's very local you can be very specific.
I'm planning to put some more effort into this, improving the threat detection and the choice of moves to pick, so it becomes more of a useful ply. Then if I do manage to improve that, maybe do it for both sides. (2 ply)
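
As a rough sketch, that tactical ply could look something like this (placeholder names and an invented threat test, not my actual code):

/* A selective "tactical ply" between the full-width search and qsearch:
   if no static threat is detected, stand pat and try only handpicked
   tactical moves; otherwise fall back to a normal one-ply search. */
int tactical_ply(Position *pos, int alpha, int beta)
{
    if (static_threats(pos))                 /* threat found: don't trust stand pat */
        return search(pos, 1, alpha, beta);

    int stand_pat = eval(pos);
    if (stand_pat >= beta)
        return stand_pat;
    if (stand_pat > alpha)
        alpha = stand_pat;

    Move moves[MAX_MOVES];                   /* captures, checks, pushes to the 7th, promotions */
    int n = generate_tactical_moves(pos, moves);
    for (int i = 0; i < n; i++) {
        make_move(pos, &moves[i]);
        int score = -qsearch(pos, -beta, -alpha);
        unmake_move(pos, &moves[i]);
        if (score > alpha) {
            alpha = score;
            if (alpha >= beta)
                break;
        }
    }
    return alpha;
}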

Another thing I do on the layer before that selective ply: if eval is high above beta, and again there are no static threats, then just return beta, because if I have the right to move along with being high above beta, the position is likely above beta. It fails in plenty of cases, but it speeds up the search a little (usually only 10% or less, more in critical lines) and again the risk is only 1 ply. Probably doesn't work but I have it on anyway.
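
In code it is just a couple of lines, something like this sketch (MARGIN, eval and static_threats are placeholders):

/* One layer above the selective ply: if we are well above beta with the
   move and no static threats are detected, trust the eval and cut off.
   Fail-hard, so the risk is limited to this single ply. */
if (!in_check && eval(pos) >= beta + MARGIN && !static_threats(pos))
    return beta;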

Anyway, what's sad is that versions of my program with vastly different searches perform mostly the same. :x Bah.

Stan
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: LMR

Post by diep »

Stan Arts wrote: Hi Vincent
diep wrote: Most test games of today are simply not giving a program more system time than we gave software back in 1999. That's reality.

I also find the testing time controls have a big influence.
Take for example null move pruning: that isn't a huge winner until >8 ply, or even 10 ply and above. (Fresh in memory because I recently pitted my program against a version with a completely fresh search part of completely flat alpha-beta search, and was amazed to find that in bullet and blitz (8 ply or less in the middlegame) it can keep up.. also because without selectivity and "smart" tricks its NPS is about 2x higher. At longer TC however the normal one is completely dominant and runs circles around the idiot one. Testing whether null move would be an improvement would probably come out as 0 Elo at bullet to several hundred at long time controls.)
Well, it is important not to follow the Crafty model with null move. In Diep I'm using R=3. A big improvement with R=3 is that in the last 4 plies you do:

score = -qsearch(-beta, -alfa, side);
if (score >= beta)
    return score; /* in case of fail-soft, or return beta in case of fail-hard */
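
In a plain negamax framework that roughly amounts to the following sketch (placeholder names, not Diep's actual code): with R=3, a null move tried in the last 4 plies searches to depth <= 0, so it drops straight into qsearch.

#define R 3

int search(Position *pos, int depth, int alpha, int beta)
{
    if (depth <= 0)
        return qsearch(pos, alpha, beta);

    if (null_move_allowed(pos)) {            /* e.g. not in check, enough material left */
        int score;
        make_null_move(pos);                 /* pass the turn */
        if (depth - 1 - R <= 0)
            score = -qsearch(pos, -beta, -beta + 1);
        else
            score = -search(pos, depth - 1 - R, -beta, -beta + 1);
        unmake_null_move(pos);
        if (score >= beta)
            return score;                    /* fail-soft; return beta for fail-hard */
    }

    /* ... normal move loop ... */
    return alpha;
}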
Stan Arts wrote:
diep wrote: LMR is a method that basically uses simple moves to search your mainline deeper. My argument is that when simple trivial moves can flip your score in evaluation, then your engine has a bug in its evaluation.
I agree LMR as commonly described just searches a PV and not much else, which feels wrong and often is.
Well, there are two sides to this coin.

If you've got a tiny yet really well-tuned eval and you are a really good low-level programmer, as with the Cozzie-type engines, then obviously it is a genius approach. But be sure you get at least 4 million NPS on a single core, like Rybka does, or don't do it.

When just doing mainline checking, you obviously are designing an engine which will not soon find spectacular moves. It plays a kind of 'work chess'. Yet that's very effective. Don't underestimate how important it is to play bug-free chess.

Yet are you going to beat others in this manner?

I doubt it.
Stan Arts wrote: On the other hand, you can now handpick moves in an almost Shannon type B way without the full risk (only reducing instead of pruning). This is exciting and probably means there's some "holy grail" way of picking moves that fills up the weaknesses and could lead to something playing strongly. I guess this is why so many are trying.
Without anyone trying, it's impossible to find something new that works great, of course. However, from the end of '98 until the end of '99 I not only parallelized Diep, I also did a 'hand-picked LMR' type of search.

Using chess knowledge I decided which moves to try and which moves not to try. You'd call those reductions now, though note that today's reductions are something else than this.

A move in Diep was either 1 or 2 units. Depending upon the implementation there is a huge difference for the hash table, yet algorithmically it has the same problems and advantages as LMR, of course.

Very soon I figured out it was crucial to look at just 2 to 3 moves at most, and not more, and reduce the rest, including tactics.

If we ignore tactics for now, as there are much easier ways to get tactically stronger than LMR, let's look at the other moves selected.

A major difference with LMR is that the odds of picking the best moves are a lot lower, and that the moves you pick with LMR are different moves each time.

The move selection I used in '99 was completely alpha-beta dependent.

Observations I made at the time:
a) the branching factor hardly improved at search depth >= 14
b) in tactical positions it won very few plies of search depth, whereas there was a very big risk
c) it had huge problems failing high

Observations with LMR are:
A) the branching factor does improve
B) it wins more plies
C) it has even more problems failing high, especially when you run in parallel

If you want to explain statistically why the b.f. hardly improved in 1999 while it does with LMR: the randomness of LMR cooperates very badly with the hash table. The randomness of the moves you search somehow means that you just prune away more.

This is easy to prove (mathematically).

As the odds are so small that the best move gets selected (other than when it is a tactical move), and the number of reductions is so big, it is of course statistically logical that in the subtree behind us, in most nodes where we reduce, we do not have the best move in the non-reduced selection.

It follows from that that we are maximizing our odds for a cutoff in a manner where what gets stored in the hash table is based upon random reductions.

So basically we search deeper BECAUSE we also reduce the best move in 80% of the positions.

That's why LMR gives extra search depth, and why the much better setup I had back in 1999 had big problems improving the branching factor.

Of course C is still the same.
Stan Arts wrote: (As some sort of easy selectivity, following some famous super selective examples.)

In any case my engine is starving for depth, which is the main reason why it's so weak, so I need to find something.
A lot of that might be caused by the evaluation.
Stan Arts wrote:
diep wrote: Way more interesting than LMR, I'd say, is forward pruning the last few plies. I'm gonna revive soon an experiment where I do search replacement. So in the last few plies, when for example the score is far above beta, I replace Diep's slow search by some very fast tactical searcher that just has a look whether there is some sort of threat preventing us from giving a cutoff here statically.
Using such a fast tactical searcher to find tactical threats sounds like a promising idea for Diep, especially as it can be run in parallel? Hope you can get something out of it for Spain!
Thanks!

The bad side of promising ideas is that they eat a lot of time, and that happy testing and tiny bugfixes in eval bring more Elo than putting real effort into something. After it works brilliantly for Diep, others who didn't do any experiments in that area can follow blindfolded, without losing those years of research. End of 2004, start of 2005, I put a lot of effort into all this.

So I can now skip the phase of writing the code for a fast searcher. That framework is ready!
Stan Arts wrote: Does Diep pick up a lot of tactics through evaluation?
So far I hardly do that; all my tries quickly ran into unexpected exceptions that were impossible to code, so each time I threw it out. (And then do not touch my engine for months.)
Yes, you nail a very important thing here. Diep really picks up a lot of tactics in eval. King safety especially is a big issue. When you're just starting an engine, I'd argue one of the most important things to put time into is king safety. If I did a lazy eval with Diep, Diep would get 2x more NPS.

Yet despite searching deeper then and being 2x faster, it is tactically a LOT weaker. King safety in Diep has no real 'maximum' penalty; it can easily go up to 40 pawns or so. So the last few moves to mate a king in the center, Diep doesn't need to play them at all: its eval already picks it up hands down.
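
Purely as an illustration of the shape of such an uncapped term (the weights and scaling here are invented for the sketch; they are not Diep's values):

/* King safety with no maximum: the penalty grows faster than linearly with
   the attacking pressure, so a real mating attack dominates everything else
   in the eval.  attackers_on_king_zone() and attack_weight() are placeholders. */
int king_safety_penalty(const Position *pos, int side)
{
    int pressure = 0;
    int n = attackers_on_king_zone(pos, side);     /* enemy pieces hitting the king zone */
    for (int i = 0; i < n; i++)
        pressure += attack_weight(pos, side, i);   /* say Q=4, R=3, B/N=2, P=1 */

    return pressure * pressure * 4;                /* centipawns; no clamp anywhere */
}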

This is always one of the biggest obstacles to getting some forward-pruning trick in the last few plies to work.
Stan Arts wrote: What I'm using is a tactical ply between the full-width search and qsearch. Which is also a form of forward pruning. (A matter of semantics.) Here I try to detect some threats statically, and if those check out OK I stand pat and handpick a bunch of moves. (For now just tactical moves: captures, checks, pawn pushes to the 7th and promotions, etc.)
In theory Diep has 32 plies of selective search between the main search and qsearch. In practice I limited it some years ago to 7 plies. In those days I really extended a lot there. At the start of 2006, when I finally had a 4-processor box at home (a dual Opteron dual core, in fact), I saw how all this hurt the search a lot. I made qsearch cheaper (kicked out all kinds of moves I had tried there that do not capture, give check, or evade a check) and limited the selective search more. Nowadays Diep hardly does that anymore. It happens very little, yet I wouldn't be amazed if what I'm doing there still hurts the search.
Stan Arts wrote: To me that is also attractive and safe (safe because you can only win or lose a ply to the solution), and because it's very local you can be very specific.
I'm planning to put some more effort into this, improving the threat detection and the choice of moves to pick, so it becomes more of a useful ply. Then if I do manage to improve that, maybe do it for both sides. (2 ply)
Well, you must preserve your branching factor. That extra ply does great, of course, in bullet and on tactical test sets at 3 seconds a move, yet if it makes your b.f. worse it hurts, of course.
Stan Arts wrote: Another thing I do on the layer before that selective ply: if eval is high above beta, and again there are no static threats, then just return beta, because if I have the right to move along with being high above beta, the position is likely above beta. It fails in plenty of cases, but it speeds up the search a little (usually only 10% or less, more in critical lines) and again the risk is only 1 ply. Probably doesn't work but I have it on anyway.
I used to extend extensively here, into the selective search in Diep, until say the start of 2006. In fact around 2000-2002 I limited it a little compared to what I did before. It works brilliantly, tactically.

Rybka might be doing something like that. That's what happens if you tune for blitz. I'm not a big believer in it. In the end I chose for Diep to search a ply deeper instead of extending when a queen is attacked or multiple pieces are attacked. It's very risky to extend there.

It is not similar to forward pruning in the way I plan to forward prune. With forward pruning you also take a look at positional moves. With such extensions you're only busy at the tactical level. Your engine is tactically 500 Elo stronger or so at bullet, but you lose plies of search depth.

Now I do realize that most engines that forward prune the last plies basically search only tactics there and also prune away about every positional move.

Yet forward pruning the last few plies is quite different from making specific tactical extensions.

With that forward pruning you can see the tactics AND see some positional stuff, and you don't have the null-move problem.

The difference between extending when 2 pieces are attacked and forward pruning on positional grounds (chess knowledge, in short) is the NPS you do it at. If you get 3-4 million NPS, you just do not have the system time for a lot of conditions to decide what to prune and what not to prune.

That's a severe slowdown of an engine, both NPS-wise and search-depth-wise.

When extending based upon threats, a big problem is null move. Something like 70% of all null moves at depthleft <= 4 give a cutoff.

Yet the only correct manner there is to not have the null move call qsearch, but your selective search first. If your null move were not capable of also detecting all these threats, you would have a big problem!
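
A sketch of that ordering (selective_search here stands for whatever threat-aware search the engine has; these are placeholder names, not Diep's code):

/* When the engine depends on threat detection in its selective search,
   a low-depth null move should drop into that selective search rather
   than into a bare qsearch, otherwise most of the ~70% of low-depth
   null-move cutoffs are taken without the threats ever being seen. */
make_null_move(pos);
if (depth - 1 - R <= 0)
    score = -selective_search(pos, -beta, -beta + 1);   /* threat-aware */
else
    score = -search(pos, depth - 1 - R, -beta, -beta + 1);
unmake_null_move(pos);
if (score >= beta)
    return score;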

For the same reason, with Diep's big eval it is difficult to replace the search. Its eval picks up so much that a fast searcher doesn't.
Stan Arts wrote: Anyway, what's sad is that versions of my program with vastly different searches perform mostly the same. :x Bah.

Stan
Search is mighty interesting to research, yet what brings the real hard Elo nowadays is a bug-free evaluation, of course.

Diep is one of the few programs that doesn't get really deep when I'm not turning on crap stuff like LMR.

Vincent
Uri Blass
Posts: 10282
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: LMR

Post by Uri Blass »

diep wrote:
bob wrote: All I can say is "where is the +40 Elo coming from?" It is not imaginary, it is based on tens of thousands of test games against multiple opponents.
<snipped>
Bob, you raise a valid point. However, I'm lurking around on ICC daily again, and 99% of all games you get are against 3200+ rated Rybkas, with Crafty being rated 2700 or so.
Vincent
This is clearly irrelevant, because Bob's +40 Elo is based on testing against Fruit, Toga and Glaurung, and not on ICC.

Uri
Uri Blass
Posts: 10282
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: LMR

Post by Uri Blass »

Stan Arts wrote:
<snipped>
I also find the testing time controls have a big influence.
Take for example null move pruning: that isn't a huge winner until >8 ply, or even 10 ply and above. (Fresh in memory because I recently pitted my program against a version with a completely fresh search part of completely flat alpha-beta search, and was amazed to find that in bullet and blitz (8 ply or less in the middlegame) it can keep up..

Stan

Not based on my experience.

I remember that when I implemented null move in Movei (I think in 2002 or 2003; I do not remember the exact date), it helped in blitz too, and Movei was a relatively weak program at that time, so it did not search very deep.

I also know that null move pruning is significantly productive for weak programs like micro-Max.

I think that it is the opposite, and null move helps more at blitz relative to long time control, because in the extreme case of a very long time control not using null move pruning solves chess, while using null move pruning causes errors. (Of course we do not get that extreme case, so null move is still productive at all time controls that we can practically use, but the benefit is smaller at longer time controls.)

Maybe at some extreme blitz time control null move is not productive, but I never got to that extreme a blitz time control in games, and I believe that H.G. Muller (the programmer of micro-Max) also never got to a time control at which micro-Max without null move pruning is stronger than micro-Max with null move pruning. I remember reading that he found null move pruning helped micro-Max by some hundreds of Elo, while null move pruning helped Crafty by only 40 or 80 Elo (40 Elo if late move reductions are used and 80 Elo if late move reductions are not used).

It seems that basically null move pruning is less productive for programs that are good in their other parts, and null move pruning can help micro-Max more than Crafty for the simple reason that micro-Max is relatively weak because of its small size. (Not the fault of the programmer: based on my memory he could not use killers for move ordering because of the small size, since his target was to write a chess program whose source is less than 2 kilobytes, and micro-Max basically uses IID for move ordering; that is only one example of micro-Max's weaknesses.)

Uri
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: LMR

Post by bob »

diep wrote:
bob wrote: All I can say is "where is the +40 Elo coming from?" It is not imaginary, it is based on tens of thousands of test games against multiple opponents.
Bob, you raise a valid point. However, I'm lurking around on ICC daily again, and 99% of all games you get are against 3200+ rated Rybkas, with Crafty being rated 2700 or so.

So basically you lose about every game. If you then build in a feature that has a BEST case besides a worst case, of course you no longer feel the worst case, because things can't go much worse than they are already going for us there.

We gotta improve some stuff Bob!

What I'm seeing in test games against 2000-Elo-rated engines is that LMR shows its worst case, not its best case. I understand your big desire to sometimes beat Rybka 3.0, but let's first beat whoever is below us, I'd say. I'd hate to see that posting from Uri saying: "my never-parallelized program beat Diep".

Vincent
I don't pay any attention to ICC games. I see cooked books, really fast hardware, most are running Rybka, etc. My testing is done in a controlled environment on equal hardware, equal time controls, selected starting positions, no pondering, etc... I trust these results far more than anything from ICC where you can't even tell what hardware your opponent is using.

As far as Rybka goes, I am not even looking at that at the moment. I don't run it on my cluster and it isn't a part of any testing I do.
Last edited by bob on Wed Mar 11, 2009 5:19 pm, edited 1 time in total.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: LMR

Post by bob »

Stan Arts wrote: Hi Vincent
diep wrote: Most test games of today are simply not giving a program more system time than we gave software back in 1999. That's reality.

I also find the testing time controls have a big influence.
Take for example null move pruning: that isn't a huge winner until >8 ply, or even 10 ply and above. (Fresh in memory because I recently pitted my program against a version with a completely fresh search part of completely flat alpha-beta search, and was amazed to find that in bullet and blitz (8 ply or less in the middlegame) it can keep up.. also because without selectivity and "smart" tricks its NPS is about 2x higher. At longer TC however the normal one is completely dominant and runs circles around the idiot one. Testing whether null move would be an improvement would probably come out as 0 Elo at bullet to several hundred at long time controls.)
diep wrote: LMR is a method that basically uses simple moves to search your mainline deeper. My argument is that when simple trivial moves can flip your score in evaluation, then your engine has a bug in its evaluation.
I agree LMR as commonly described just searches a PV and not much else, which feels wrong and often is. On the other hand, you can now handpick moves in an almost Shannon type B way without the full risk (only reducing instead of pruning). This is exciting and probably means there's some "holy grail" way of picking moves that fills up the weaknesses and could lead to something playing strongly. I guess this is why so many are trying. (As some sort of easy selectivity, following some famous super selective examples.)

In any case my engine is starving for depth, which is the main reason why it's so weak, so I need to find something.
diep wrote: Way more interesting than LMR, I'd say, is forward pruning the last few plies. I'm gonna revive soon an experiment where I do search replacement. So in the last few plies, when for example the score is far above beta, I replace Diep's slow search by some very fast tactical searcher that just has a look whether there is some sort of threat preventing us from giving a cutoff here statically.
Using such a fast tactical searcher to find tactical threats sounds like a promising idea for Diep, especially as it can be run in parallel? Hope you can get something out of it for Spain!

Does Diep pick up a lot of tactics through evaluation?
So far I hardly do that; all my tries quickly ran into unexpected exceptions that were impossible to code, so each time I threw it out. (And then do not touch my engine for months.)

What I'm using is a tactical ply between the full-width search and qsearch. Which is also a form of forward pruning. (A matter of semantics.) Here I try to detect some threats statically, and if those check out OK I stand pat and handpick a bunch of moves. (For now just tactical moves: captures, checks, pawn pushes to the 7th and promotions, etc.)
To me that is also attractive and safe (safe because you can only win or lose a ply to the solution), and because it's very local you can be very specific.
I'm planning to put some more effort into this, improving the threat detection and the choice of moves to pick, so it becomes more of a useful ply. Then if I do manage to improve that, maybe do it for both sides. (2 ply)

Another thing I do on the layer before that selective ply: if eval is high above beta, and again there are no static threats, then just return beta, because if I have the right to move along with being high above beta, the position is likely above beta. It fails in plenty of cases, but it speeds up the search a little (usually only 10% or less, more in critical lines) and again the risk is only 1 ply. Probably doesn't work but I have it on anyway.

Anyway, what's sad is that versions of my program with vastly different searches perform mostly the same. :x Bah.

Stan
I'm not seeing this. I have tested null on/off at 10sec+0.1sec games, at 1min+1sec games, all the way up to 60mins+60secs and I did not find any significant difference, other than the fact that as depth increases, the rating difference between all programs seems to shrink due to more draws and fewer simple tactical blunders.

If I play Crafty vs the opponents/positions I use, then as I go from the fastest to the slowest time control, the ratings get somewhat closer (not hugely, but you can measure the effect). Null move (for me, anyway) follows that same slope, except that with null move it is always better than without. I have certainly seen (and discussed here) some cases where something was better/worse at fast time controls than at deep ones. But IMHO, null move should favor fast games rather than slow games, simply because it gives more depth, and the faster the game, the more that extra depth helps...
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: LMR

Post by diep »

bob wrote:
diep wrote:
bob wrote: All I can say is "where is the +40 Elo coming from?" It is not imaginary, it is based on tens of thousands of test games against multiple opponents.
Bob, you raise a valid point. However, I'm lurking around on ICC daily again, and 99% of all games you get are against 3200+ rated Rybkas, with Crafty being rated 2700 or so.

So basically you lose about every game. If you then build in a feature that has a BEST case besides a worst case, of course you no longer feel the worst case, because things can't go much worse than they are already going for us there.

We gotta improve some stuff Bob!

What I'm seeing in test games against 2000-Elo-rated engines is that LMR shows its worst case, not its best case. I understand your big desire to sometimes beat Rybka 3.0, but let's first beat whoever is below us, I'd say. I'd hate to see that posting from Uri saying: "my never-parallelized program beat Diep".

Vincent
I don't pay any attention to ICC games. I see cooked books, really fast hardware, most are running Rybka, etc. My testing is done in a controlled environment on equal hardware, equal time controls, selected starting positions, no pondering, etc... I trust these results far more than anything from ICC where you can't even tell what hardware your opponent is using.

As far as Rybka goes, I am not even looking at that at the moment. I don't run it on my cluster and it isn't a part of any testing I do.
Can you email me those games played in the controlled environment?

Thanks,
Vincent
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: LMR

Post by bob »

diep wrote:
bob wrote:
diep wrote:
bob wrote: All I can say is "where is the +40 Elo coming from?" It is not imaginary, it is based on tens of thousands of test games against multiple opponents.
Bob, you raise a valid point. However, I'm lurking around on ICC daily again, and 99% of all games you get are against 3200+ rated Rybkas, with Crafty being rated 2700 or so.

So basically you lose about every game. If you then build in a feature that has a BEST case besides a worst case, of course you no longer feel the worst case, because things can't go much worse than they are already going for us there.

We gotta improve some stuff Bob!

What I'm seeing in test games against 2000-Elo-rated engines is that LMR shows its worst case, not its best case. I understand your big desire to sometimes beat Rybka 3.0, but let's first beat whoever is below us, I'd say. I'd hate to see that posting from Uri saying: "my never-parallelized program beat Diep".

Vincent
I don't pay any attention to ICC games. I see cooked books, really fast hardware, most are running Rybka, etc. My testing is done in a controlled environment on equal hardware, equal time controls, selected starting positions, no pondering, etc... I trust these results far more than anything from ICC where you can't even tell what hardware your opponent is using.

As far as Rybka goes, I am not even looking at that at the moment. I don't run it on my cluster and it isn't a part of any testing I do.
Can you email me those games played in the controlled environment?

Thanks,
Vincent
How many millions of games do you want and what can they be used for???
Aleks Peshkov
Posts: 892
Joined: Sun Nov 19, 2006 9:16 pm
Location: Russia

Re: LMR

Post by Aleks Peshkov »

I do not believe that Diep's eval is badly tuned.

But I think that a complex evaluation hurts the branching factor and worsens the gains of any forward-pruning tricks, because of the larger random factor from positional moves of minor relevance, which leads to search instabilities.

I suspect that Rybka 3 has a worse branching factor than Rybka 2 because of a more complex evaluation, and not because Vas greatly redesigned his search since Rybka 1 beta times. :)