When does a cut-node become an all-node?

Discussion of chess software programming and technical issues.


lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: When does a cut-node become an all-node?

Post by lkaufman »

diep wrote:
lkaufman wrote: Please clarify -- are you simply talking about increasing the null move reduction (like from 3 plies to 4, or from 4 to 5) when the score is more than a pawn plus? If this is what you refer to, its effect is pretty minor, and everyone is using this now.
I had edited my post. Yes, among other things I do refer to this.

It's yet another few plies of search that are dependent upon a simple static evaluation, and it avoids the backtracking that would correct the evaluation.

It is wrong to call this minor: when you're getting outsearched you're dead meat, and you're even more dead if an evaluation that is already wrong in this position adds another ply of total search reduction by increasing the reduction factor.

Null move is so powerful especially because it also reduces positional lines; it's not just a tactical form of search.

What Rybka is doing there is in between taking no risk and the full-blown risk that SF takes, where the reduction factor is in a lookup table.

The advantage for SF is that it always profits from a bigger search depth because of it, whereas Rybka has a real worst case there.

In Diep, for now, I'm not willing to increase the R of null move any time soon, and I don't want to reduce more than 1 ply with reductions either.

Note that we can mathematically prove that the optimal reduction is 1 ply.
Vincent
That's a very interesting claim; can you provide more details? I don't know any top program nowadays that limits LMR to a single ply. I am quite sure we would drop a ton of Elo points if we did so.
Regarding null move, my point is that there is little to gain by (let's say) reducing the null move by 5 plies instead of 4 when a pawn up. The time saved is just a few percent. Nevertheless, it appears to be worth it (since all top programs do it), which means that the risk is pretty low in practice.
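For readers following along, here is a minimal C++ sketch of the technique under discussion: raising the null-move reduction by an extra ply when the static evaluation is already well above beta. The helper names, the base R of 3 and the pawn margin are illustrative assumptions, not any particular engine's actual code.

```cpp
// Hedged sketch (not any engine's actual code). Assumed helpers and constants:
struct Position;                                         // engine's position type (assumed)
int  evaluate(Position&);                                // static evaluation (assumed)
int  search(Position&, int depth, int alpha, int beta);  // main search (assumed)
void make_null_move(Position&);                          // (assumed)
void undo_null_move(Position&);                          // (assumed)
bool in_check(const Position&);                          // (assumed)
const int PAWN_VALUE = 100;                              // one pawn, in centipawns (assumed)

// Null-move pruning with an eval-dependent extra reduction, as discussed above.
int try_null_move(Position& pos, int depth, int beta)
{
    int staticEval = evaluate(pos);
    if (depth < 2 || in_check(pos) || staticEval < beta)
        return -1000000;                       // signal "no cutoff" to the caller

    int R = 3;                                 // base null-move reduction (assumed)
    if (staticEval - beta > PAWN_VALUE)        // already a full pawn above beta:
        R += 1;                                //   reduce one ply more

    make_null_move(pos);
    // Note the extra -1: the null move itself consumes a ply (D - R - 1).
    int score = -search(pos, depth - R - 1, -beta, -beta + 1);
    undo_null_move(pos);

    return score;                              // caller prunes this node if score >= beta
}
```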
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: When does a cut-node become an all-node?

Post by diep »

lkaufman wrote:
diep wrote:
lkaufman wrote: Please clarify -- are you simply talking about increasing the null move reduction (like from 3 plies to 4, or from 4 to 5) when the score is more than a pawn plus? If this is what you refer to, its effect is pretty minor, and everyone is using this now.
I had edited my post. Yes, among other things I do refer to this.

It's yet another few plies of search that are dependent upon a simple static evaluation, and it avoids the backtracking that would correct the evaluation.

It is wrong to call this minor: when you're getting outsearched you're dead meat, and you're even more dead if an evaluation that is already wrong in this position adds another ply of total search reduction by increasing the reduction factor.

Null move is so powerful especially because it also reduces positional lines; it's not just a tactical form of search.

What Rybka is doing there is in between taking no risk and the full-blown risk that SF takes, where the reduction factor is in a lookup table.

The advantage for SF is that it always profits from a bigger search depth because of it, whereas Rybka has a real worst case there.

In Diep, for now, I'm not willing to increase the R of null move any time soon, and I don't want to reduce more than 1 ply with reductions either.

Note that we can mathematically prove that the optimal reduction is 1 ply.
Vincent
That's a very interesting claim; can you provide more details? I don't know any top program nowadays that limits LMR to a single ply. I am quite sure we would drop a ton of Elo points if we did so.
Regarding null move, my point is that there is little to gain by (let's say) reducing the null move by 5 plies instead of 4 when a pawn up. The time saved is just a few percent. Nevertheless, it appears to be worth it (since all top programs do it), which means that the risk is pretty low in practice.
Saying something 'drops a ton of Elo' while at the same time downplaying its importance is kind of weird.

You asked for the difference. I gave it to you.

The direct static-eval dependency is the null-move R; SF, by contrast, isn't making it eval-dependent.

So Rybka has a bigger worst case than SF, based upon assuming that in any given position its eval is within 1 pawn of the absolute truth.

Now you basically say: "this is no big deal". Yet it's the objective difference.

And IMHO it's a difference that makes it vulnerable.

Should you change it? No, of course not - I like to beat you.

Reducing 1 ply is the mathematical optimum; basically a reduction of 1 ply is still what gets done in the end by most of today's engines, be it in a riskier way. Also, the fact that it's the mathematical optimum between risk and reduction of the search tree doesn't mean that taking more risk isn't gonna work for you :)

Most of the ultra-selective stuff being used right now can be explained as working via the lemma I posted.

Also, all this selective searching can of course only work because of my lemma from the 90s, which was disputed at the time: that there is a tactical barrier above which only searching deeper positionally is interesting.

That's exactly what just about every program today is doing. The focus is no longer on picking up tactics as quickly as possible.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: When does a cut-node become an all-node?

Post by lkaufman »

diep wrote:
lkaufman wrote: That's a very interesting claim; can you provide more details? I don't know any top program nowadays that limits LMR to a single ply. I am quite sure we would drop a ton of Elo points if we did so.
Regarding null move, my point is that there is little to gain by (let's say) reducing the null move by 5 plies instead of 4 when a pawn up. The time saved is just a few percent. Nevertheless, it appears to be worth it (since all top programs do it), which means that the risk is pretty low in practice.
Saying something 'drops a ton of Elo' while at the same time downplaying its importance is kind of weird.

You asked for the difference. I gave it to you.



The Elo reference was to limiting LMR. The downplaying of importance was about the extra null-move ply when ahead by a pawn. Two different topics.



diep wrote: The direct static-eval dependency is the null-move R; SF, by contrast, isn't making it eval-dependent.



Stockfish also does an extra ply of null move reduction when ahead by a pawn. The only difference among the top programs is that the Ippo programs add 1.5 ply instead of just one ply.
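As a side note on how a figure like 1.5 plies can be represented at all: engines with fractional depth simply count depth in sub-ply units. A hedged C++ sketch, with the unit size and names purely illustrative:

```cpp
// Fractional ply depths make a "1.5 ply" figure representable at all.
// Depth is counted here in half-ply units (ONE_PLY == 2 is an assumption).
constexpr int ONE_PLY = 2;

int extra_null_reduction(int staticEval, int beta, int pawnValue)
{
    if (staticEval - beta <= pawnValue)
        return 0;
    // "+1 ply" scheme: return 1 * ONE_PLY;
    // "+1.5 ply" scheme, as attributed above to the Ippo family:
    return 3;                // 3 half-ply units = 1.5 plies
}
```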



diep wrote: So Rybka has a bigger worst case than SF, based upon assuming that in any given position its eval is within 1 pawn of the absolute truth.

Now you basically say: "this is no big deal". Yet it's the objective difference.

And IMHO it's a difference that makes it vulnerable.


As I say, there is no difference here. Maybe you looked at an old SF version?


diep wrote: Should you change it? No, of course not - I like to beat you.

Reducing 1 ply is the mathematical optimum; basically a reduction of 1 ply is still what gets done in the end by most of today's engines, be it in a riskier way. Also, the fact that it's the mathematical optimum between risk and reduction of the search tree doesn't mean that taking more risk isn't gonna work for you :)


I don't understand how there can be a mathematical optimum reduction; it should depend on how good your move ordering is. If it's good enough, you could reduce all moves after the first by 100 plies!



diep wrote: Most of the ultra-selective stuff being used right now can be explained as working via the lemma I posted.

Also, all this selective searching can of course only work because of my lemma from the 90s, which was disputed at the time: that there is a tactical barrier above which only searching deeper positionally is interesting.

That's exactly what just about every program today is doing. The focus is no longer on picking up tactics as quickly as possible.


Here I agree with you 100%. We don't even pay attention to improving tactics as such in Komodo; we just try to maximize the Elo in actual play. Probably this is 90% or more achieved by improved positional play.

Larry
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: When does a cut-node become an all-node?

Post by diep »

[snip]
lkaufman wrote:
diep wrote: Reducing 1 ply is the mathematical optimum; basically a reduction of 1 ply is still what gets done in the end by most of today's engines, be it in a riskier way. Also, the fact that it's the mathematical optimum between risk and reduction of the search tree doesn't mean that taking more risk isn't gonna work for you :)
I don't understand how there can be a mathematical optimum reduction; it should depend on how good your move ordering is. If it's good enough, you could reduce all moves after the first by 100 plies!
Well, now you're writing the way we did computer chess in the 90s, namely using the extreme assumptions. We have known for some 13 years that the fliprate (heh, another computer-chess statistic I introduced back then) of Crafty was around 5%, while for Diep it was far lower, between 1% and 2% at the time. Probably that's different nowadays (especially Crafty will have improved, and with it most of the modern beancounters), but not *that much* better than it used to be.

So we can safely assume that reducing things by 100 plies is too much of a risk. The mathematical optimum between risk and profit is then 1 ply. In the way I wrote it down, that simply means it breaks even a lot quicker.

If you fiddle a bit with the probabilities you'll also end up there.

Reducing 2 plies is simply very, very risky compared to reducing 1 ply. Which doesn't mean that you shouldn't try it if you already get 20+ plies anyway :)

A pragmatic approach there isn't gonna get punished...
lkaufman wrote:
diep wrote: Most of the ultra-selective stuff being used right now can be explained as working via the lemma I posted.

Also, all this selective searching can of course only work because of my lemma from the 90s, which was disputed at the time: that there is a tactical barrier above which only searching deeper positionally is interesting.

That's exactly what just about every program today is doing. The focus is no longer on picking up tactics as quickly as possible.
Here I agree with you 100%. We don't even pay attention to improving tactics as such in Komodo; we just try to maximize the Elo in actual play. Probably this is 90% or more achieved by improved positional play.

Larry
In the 90s, and even at the start of the 21st century, the majority of posters in this forum as well as in RGCC (the computer-chess experts) were asking for my head for predicting this :)
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: When does a cut-node become an all-node?

Post by lkaufman »

diep wrote:[snip]
lkaufman wrote:
diep wrote: Reducing 1 ply is the mathematical optimum; basically a reduction of 1 ply is still what gets done in the end by most of today's engines, be it in a riskier way. Also, the fact that it's the mathematical optimum between risk and reduction of the search tree doesn't mean that taking more risk isn't gonna work for you :)
I don't understand how there can be a mathematical optimum reduction; it should depend on how good your move ordering is. If it's good enough, you could reduce all moves after the first by 100 plies!
Well, now you're writing the way we did computer chess in the 90s, namely using the extreme assumptions. We have known for some 13 years that the fliprate (heh, another computer-chess statistic I introduced back then) of Crafty was around 5%, while for Diep it was far lower, between 1% and 2% at the time. Probably that's different nowadays (especially Crafty will have improved, and with it most of the modern beancounters), but not *that much* better than it used to be.

So we can safely assume that reducing things by 100 plies is too much of a risk. The mathematical optimum between risk and profit is then 1 ply. In the way I wrote it down, that simply means it breaks even a lot quicker.

If you fiddle a bit with the probabilities you'll also end up there.

Reducing 2 plies is simply very, very risky compared to reducing 1 ply. Which doesn't mean that you shouldn't try it if you already get 20+ plies anyway :)

A pragmatic approach there isn't gonna get punished...

Three questions:
1. Could you please define "fliprate"?
2. What assumptions are made in your proof that the ideal LMR reduction is just 1 ply?
3. What is your opinion of the idea of increasing the null move reduction with increased depth? Stockfish pushes this to the extreme, adding another ply of reduction for each 4 plies of search. I imagine you will think it is a bad idea, but I don't like to assume such things.
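To make question 3 concrete, a depth-dependent reduction of the kind described (one extra ply of reduction per 4 plies of remaining depth) might look like the hedged C++ sketch below; the base value and the exact divisor are assumptions for illustration, not a quote of Stockfish's source.

```cpp
// Depth-dependent null-move reduction in the spirit described in question 3:
// one extra ply of reduction for every 4 plies of remaining depth.
// The base value of 3 and the integer division are assumptions.
int null_move_R(int depth)
{
    return 3 + depth / 4;    // e.g. depth 8 -> R = 5, depth 16 -> R = 7, depth 24 -> R = 9
}
```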

Larry
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: When does a cut-node become an all-node?

Post by diep »

lkaufman wrote:
diep wrote:[snip]
lkaufman wrote:
diep wrote: Reducing 1 ply is the mathematical optimum; basically a reduction of 1 ply is still what gets done in the end by most of today's engines, be it in a riskier way. Also, the fact that it's the mathematical optimum between risk and reduction of the search tree doesn't mean that taking more risk isn't gonna work for you :)
I don't understand how there can be a mathematical optimum reduction; it should depend on how good your move ordering is. If it's good enough, you could reduce all moves after the first by 100 plies!
Well, now you're writing the way we did computer chess in the 90s, namely using the extreme assumptions. We have known for some 13 years that the fliprate (heh, another computer-chess statistic I introduced back then) of Crafty was around 5%, while for Diep it was far lower, between 1% and 2% at the time. Probably that's different nowadays (especially Crafty will have improved, and with it most of the modern beancounters), but not *that much* better than it used to be.

So we can safely assume that reducing things by 100 plies is too much of a risk. The mathematical optimum between risk and profit is then 1 ply. In the way I wrote it down, that simply means it breaks even a lot quicker.

If you fiddle a bit with the probabilities you'll also end up there.

Reducing 2 plies is simply very, very risky compared to reducing 1 ply. Which doesn't mean that you shouldn't try it if you already get 20+ plies anyway :)

A pragmatic approach there isn't gonna get punished...

Three questions:
1. Could you please define "fliprate"?
2. What assumptions are made in your proof that the ideal LMR reduction is just 1 ply?
3. What is your opinion of the idea of increasing the null move reduction with increased depth? Stockfish pushes this to the extreme, adding another ply of reduction for each 4 plies of search. I imagine you will think it is a bad idea, but I don't like to assume such things.

Larry
1. The odds that a node previously stored in the hashtable as <= alpha flips to above beta. Probably you want to measure the flips to > alpha as well, but that's relatively little, obviously :) (See the sketch below.)

2. It's not a proof of anything of course, mathematically speaking. It's a simple calculation, yet quite lengthy. Realize that fractional ply depths were also very popular back then, so when calculating the optimum we can use broken depths as well. That makes it a linear programming problem, which as we know is solvable even without complex numbers.

The answer to this problem is not 42 in this case but 1, assuming the normal reduction for the depth-limited search is also 1. So that makes the total reduction 2 in case of a reduction and 1 in case of not reducing, or re-searching. Note that back in 1999 I didn't re-search in case of a fail high.
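To make statistic 1 concrete, here is a hedged C++ sketch of one way such a fliprate could be measured at the hash table; the entry layout, the names and the exact probe point are assumptions for illustration, not Diep's code.

```cpp
#include <cstdint>

// Hedged sketch of measuring a fliprate: the fraction of re-searched nodes whose
// hash entry said "score <= alpha" (an upper bound) but which now score >= beta.
enum Bound : std::uint8_t { UPPER_BOUND, LOWER_BOUND, EXACT_BOUND };
struct TTEntry { std::uint64_t key; int score; Bound bound; };

static std::uint64_t flipCount = 0, probeCount = 0;

// Call this when a node that had a usable hash entry finishes its new search.
void record_flip(const TTEntry& old, int newScore, int beta)
{
    ++probeCount;
    if (old.bound == UPPER_BOUND && newScore >= beta)
        ++flipCount;                     // flipped from <= alpha to >= beta
}

// A low fliprate is what makes trusting reduced-depth results cheap;
// a high one is what makes deep reductions risky.
double fliprate()
{
    return probeCount ? double(flipCount) / double(probeCount) : 0.0;
}
```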

Vincent
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: When does a cut-node become an all-node?

Post by diep »

lkaufman wrote:
diep wrote:[snip]
lkaufman wrote:
diep wrote: Reducing 1 ply is the mathematical optimum; basically a reduction of 1 ply is still what gets done in the end by most of today's engines, be it in a riskier way. Also, the fact that it's the mathematical optimum between risk and reduction of the search tree doesn't mean that taking more risk isn't gonna work for you :)
I don't understand how there can be a mathematical optimum reduction; it should depend on how good your move ordering is. If it's good enough, you could reduce all moves after the first by 100 plies!
Well, now you're writing the way we did computer chess in the 90s, namely using the extreme assumptions. We have known for some 13 years that the fliprate (heh, another computer-chess statistic I introduced back then) of Crafty was around 5%, while for Diep it was far lower, between 1% and 2% at the time. Probably that's different nowadays (especially Crafty will have improved, and with it most of the modern beancounters), but not *that much* better than it used to be.

So we can safely assume that reducing things by 100 plies is too much of a risk. The mathematical optimum between risk and profit is then 1 ply. In the way I wrote it down, that simply means it breaks even a lot quicker.

If you fiddle a bit with the probabilities you'll also end up there.

Reducing 2 plies is simply very, very risky compared to reducing 1 ply. Which doesn't mean that you shouldn't try it if you already get 20+ plies anyway :)

A pragmatic approach there isn't gonna get punished...

Three questions:
1. Could you please define "fliprate"?
2. What assumptions are made in your proof that the ideal LMR reduction is just 1 ply?
3. What is your opinion of the idea of increasing the null move reduction with increased depth? Stockfish pushes this to the extreme, adding another ply of reduction for each 4 plies of search. I imagine you will think it is a bad idea, but I don't like to assume such things.

Larry
Your question 3 would of course deserve its own subthread.

I look at it maybe from a different viewpoint than you do.
My viewpoint is: "what are we actually doing with the null move?"

In principle we're making a move. So we are replacing a real move with one of the worst possible moves.

In fact we replace our normal search with a refutation search carried out by the opponent, and we do that RECURSIVELY. So we just need to look at THIS position, not at the total search length. The question is: how MUCH can we reduce THIS move, assuming it's the worst move in history?

So calculating a mathematical optimum there is very complicated, as we make some heavy assumptions and in fact give the opponent the move.

It's not possible in the same 'simple manner' as with reductions. This is a totally different type of thing.

Yet in the end we simply replace a move with a search for the opponent.

If we're at search depth D, then the null move gets searched at D - R - 1.
Don't forget that '-1' as well.

In principle, however, we continue our normal search, just at a reduced depth. We do this basically to prove how strong our position 'at least' is.

So the first observation of course is that our R is going to be dependent upon how much we reduce with our normal reductions.

If you take more risk there, then it makes sense to do that in the null move as well.

Not that they are at all similar, but let's put it this way: if we have some sort of crappy way of hashing, say we hash positions to 32 bits, it doesn't make sense to use ECC memory.

Now assume a reduction of 1 ply for the reductions.
We see there that a reduced move gets searched at most at

D - red - 1 = D - 1 - 1 = D - 2

With R = 2 for the null move we get to:

D - R - 1 = D - 3

As for a given depth we can risk reducing by 100% more than that,
which means that we can double our reduction effort.

So that means D - 4 is OK, which is R = 3 for the null move.

Of course this is for the main search; as for what happens in the last few plies in chess engines, no one knows what is best to do there. That also changes every few years. In the 80s they razored. In the early 90s Ed Schroder had optimized it even further and just forward-pruned even bigger search depths.

Then in the mid 90s, with a tad more computing power, it was only null move and no razoring at all, as null move then picked up more tactics as well, and to quote Frans Morsch: "We don't have the system time to do any sophisticated pruning".

So I basically skip the last few plies in the above guessing.

Now if you are doing reductions of 2 plies,
that means effectively D - 2 - 1 = D - 3.

So I would then use R = 5 everywhere for the null move,
as R + 1 == 6, which is double the 3 of D - 3.

That would mean that you search 1 ply at:

1 = D - R - 1 => D = 2 + R = 2 + 5 = 7 plies

So I'd then design some sort of super-razoring for the last so many plies, say the last 4 plies, and at ply depths 5..6 I'd do some fast tactics-only null move, maybe just a sophisticated qsearch, to pick up some tactics.

From my viewpoint, doing a null move with R = 2, 3 or 4 doesn't make much sense if you are gonna reduce by a ply or 2 anyway (on top of the normal 1-ply reduction), as that means that just playing 2 reduced moves already reduces more than a null move, making the null move pretty much worthless.
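Reading the above as a formula (my interpretation of the rule of thumb, not Diep's actual code), the null-move R follows from the normal reduction like this:

```cpp
// A reduced move is searched at depth D - red - 1, and the null move may use
// up to twice that reduction effort, i.e. R + 1 == 2 * (red + 1).
int null_R_from_reduction(int red)
{
    return 2 * (red + 1) - 1;   // red = 1 -> R = 3, red = 2 -> R = 5, matching the cases above
}
```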

Of course it's unclear what to do in the last few plies.

Maybe it works for you.

Does this answer your question?

What I intend to do with Diep, for now, is not such a thin search. I intend to use R = 3 everywhere with the null move and to reduce by 1 ply. Making an efficient parallel search on a cluster is yet another challenge compared to a shared-memory box; if I lose too much to the GCC compiler, to the MPI overhead and to clumsy parallel overhead, then a 2-socket Sandy Bridge is already gonna outsearch my 64 power-efficient oldie L5420 Xeon cores.

Then just improve the evaluation. My blindfolded guess was that if I can get 23 plies or so and improve the evaluation big time, that's enough to win.

Vincent
Last edited by diep on Wed Mar 28, 2012 11:02 pm, edited 3 times in total.
rbarreira
Posts: 900
Joined: Tue Apr 27, 2010 3:48 pm

Re: When does a cut-node become an all-node?

Post by rbarreira »

diep wrote: Also, all this selective searching can of course only work because of my lemma from the 90s, which was disputed at the time: that there is a tactical barrier above which only searching deeper positionally is interesting.
What is this tactical barrier? Are you saying that beyond a certain search depth, the search will stop picking up "tactics"? So you're saying that the positions deep in the search tree don't have deep tactics waiting to be found any more?

That sounds strange and IMHO unlikely...
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: When does a cut-node become an all-node?

Post by lkaufman »

diep wrote: 2. It's not a proof of anything of course, mathematically speaking. It's a simple calculation, yet quite lengthy. Realize that fractional ply depths were also very popular back then, so when calculating the optimum we can use broken depths as well. That makes it a linear programming problem, which as we know is solvable even without complex numbers.

The answer to this problem is not 42 in this case but 1, assuming the normal reduction for the depth-limited search is also 1. So that makes the total reduction 2 in case of a reduction and 1 in case of not reducing, or re-searching. Note that back in 1999 I didn't re-search in case of a fail high.

Vincent
Not re-searching after a fail high is a huge difference. I wouldn't want to reduce more than one ply either in that case.
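For context, the common formulation nowadays verifies a reduced move that fails high by re-searching it at full depth. A hedged C++ sketch in a PVS framework; all helpers and thresholds here are assumptions, not any particular engine's code.

```cpp
// Hedged LMR sketch with a verification re-search on fail high.
struct Position;
struct Move;
int  search(Position&, int depth, int alpha, int beta);   // main search (assumed)
void make_move(Position&, const Move&);                    // (assumed)
void undo_move(Position&, const Move&);                    // (assumed)
bool is_quiet(const Position&, const Move&);               // (assumed)

int search_late_move(Position& pos, const Move& m, int moveNumber,
                     int depth, int alpha, int beta)
{
    int red = (moveNumber >= 4 && depth >= 3 && is_quiet(pos, m)) ? 1 : 0;  // 1-ply reduction

    make_move(pos, m);
    int score = -search(pos, depth - 1 - red, -alpha - 1, -alpha);   // reduced, null window

    if (red && score > alpha)                                        // reduced search failed high:
        score = -search(pos, depth - 1, -alpha - 1, -alpha);         //   verify at full depth

    if (score > alpha && score < beta)                               // PVS: full-window re-search
        score = -search(pos, depth - 1, -beta, -alpha);

    undo_move(pos, m);
    return score;
}
```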
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: When does a cut-node become an all-node?

Post by diep »

rbarreira wrote:
diep wrote: Also, all this selective searching can of course only work because of my lemma from the 90s, which was disputed at the time: that there is a tactical barrier above which only searching deeper positionally is interesting.
What is this tactical barrier? Are you saying that beyond a certain search depth, the search will stop picking up "tactics"? So you're saying that the positions deep in the search tree don't have deep tactics waiting to be found any more?

That sounds strange and IMHO unlikely...
The religion in the 90s, though not shared by all chess programmers (a good example being Don Dailey), was that every additional ply kept scaling linearly in tactical strength.

So a program searching 6 plies would always beat a 5-ply program, just like a 17-ply program would beat a 16-ply program by the exact same percentage.

Realize how difficult it was to test back then, so all sorts of lemmas that look silly nowadays were used by even some strong top programmers.

They simply said the following: "each year we won X Elo points". Every ply would be 70 Elo points, for example.

Now in 1997 we got like 10 plies or so. Some a bit more, some a bit less.

So today's 30+ ply search depths would then be 20 * 70 = 1400 Elo above that.

Software improvements not counted!!!!
Just search-depth-based Elo!!!!

It was the time of the superbeancounters.

Basically there were only 2 voices against the above linear scaling: that was Don, and that was me.

Known experts on the other side: that's a list so big I could keep typing. The most well known, as he wrote an article on it, was Ernst A. Heinz.

With something as dubious as the number of fail highs that Crafty would get at a bigger ply depth, he hoped to prove this linear scaling. Entire names got invented.

Don did do some tests to show that linear scaling didn't exist.

I designed the tactical barrier for that.

I never gave a hard formulation of what it was, but I intended several things with it.

a) Basically the observation that in grandmaster chess most tactics are 12+ plies deep, and with players not giving away material, games then get decided mostly not by tactics but by chess knowledge, positional patterns and strategy.

In short, the notion that above a certain search depth, improving the evaluation function is more important than getting yet another ply.

The above explanation is what I posted most back then.

I think it was Bob who led the pack attacking that. He attacked it by saying that under no circumstance was a better evaluation function worth more than, or even equal to, 2 plies of additional search depth.

Online back then, games of engines getting 6 plies versus opponents getting 8 were not a contest. 8 plies always won.

That would supposedly scale linearly to bigger search depths as well. I denied that.

Nowadays no one is busy with that, as 30-ply engines play 17-ply engines and the 17-ply one sometimes wins (not often though), which under that Elo scaling would of course be impossible, as 30 plies would be 13 * 70 = 910 Elo points stronger. That would mean you have a hard 0% chance to ever win a game; you just can't test enough games to get that lucky once, as that would need to happen long after the Sun has become a supernova.

The next interpretation deduced from that tactical barrier is of course the notion that once your program picks up a lot of tactics, it becomes more important to search positional moves than ONLY tactical moves; again, I don't mean to say you should ditch all tactics, I just say they're less relevant.

Please note that edited 'ONLY', as the notion of the 80s and 90s was to do something to ONLY see tactics and ONLY pick up tactics. In my interpretation, tactics is one positional aspect of the game, an important one, but not the only aspect, and when we search that deep, it is not more important than STRATEGIC considerations. In the end the majority of tactics, just like other positional plans (such as trying to move a knight from f3 to d5 via a manoeuvre), are short-term plans. Strategy is a LONG-term plan.

I hope I've written that down in an understandable manner; the idea and notion of it is of course totally derived from my own chess playing.

I also use this to explain why futility doesn't really work for Diep. Let's start by saying that futility needs a pretty narrow window: you can't do futility at 50 pawns, that's not so useful. If we do it at, say, a 1-pawn margin, then with Diep's huge evaluation it's nearly impossible to predict which moves suddenly need to be evaluated lazily and which do not.

Futility then effectively has the effect that in quite a few positions it gets rid of the best positional move(s) while not losing the tactical moves.

So on test sets it scores a lot better, yet in game play it loses more games as it plays a lot weaker.

Now most beancounters here have a really simple evaluation; the few exceptions, such as passers, are easy to include in the lazy consideration. That means the odds that you suddenly miss good positional moves are very small, which explains why futility works for them.
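For reference, the kind of frontier futility pruning being discussed looks roughly like this hedged C++ sketch; the margin, the helpers and the constant are assumptions, and whether it helps clearly depends on the evaluation, as argued above.

```cpp
// Hedged sketch of frontier-node futility pruning with a one-pawn margin.
struct Position;
struct Move;
int  evaluate(const Position&);                      // static evaluation (assumed)
bool in_check(const Position&);                      // (assumed)
bool is_capture(const Position&, const Move&);       // (assumed)
bool gives_check(const Position&, const Move&);      // (assumed)
const int PAWN_VALUE = 100;                          // centipawns (assumed)

bool futile(const Position& pos, const Move& m, int depth, int alpha)
{
    if (depth != 1 || in_check(pos) || is_capture(pos, m) || gives_check(pos, m))
        return false;                                // only quiet moves at frontier nodes

    // If even a full pawn of positional gain cannot reach alpha, skip the move.
    // With a large, slow evaluation this margin is hard to bound reliably,
    // which is exactly the objection raised above for Diep.
    return evaluate(pos) + PAWN_VALUE <= alpha;
}
```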

I argue that for such programs futility would also work without being combined with other forms of selectivity, such as bigger R's for the null move and/or more selectivity.

This is to refute the criticism from Tord that one needs to go through a few hills and valleys in order to get it to work.
Last edited by diep on Wed Mar 28, 2012 11:33 pm, edited 1 time in total.