Comparative nodes per second

lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: Comparative nodes per second

Post by lucasart »

Uri Blass wrote:Here is another example of the huge positional scores of chess programs (probably including IvanHoe, which I did not install on my computer, and at least including robbolito0.09w32 and houdini, which give more than +4 for white at depth 1, when the program cannot see the promotion by search because it is hidden behind a lot of checks)

Toga1.2.1a gives 1.17 for black at depth 1, but Toga is an inferior old program,
and I believe that all the programs at the playing strength of Rybka3 or higher give a big advantage for white when they search to depth 1, in spite of black's nominal 6-pawn advantage.

[D]7k/PP5p/2PP3p/7p/7p/7p/6r1/K7 b - - 0 1
???
it's black's turn to play in your fen, and it's very obvious that black has an unstoppable passed pawn on h3. the eval should already return a big score for black without even a search!
Uri Blass
Posts: 10297
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Comparative nodes per second

Post by Uri Blass »

lucasart wrote:
it's black's turn to play in your fen, and it's very obvious that black has an unstoppable passed pawn on h3. the eval should already return a big score for black without even a search!
White has more advanced pawns, on the 7th rank, so black's pawn on its 6th rank, even with the move, cannot prevent white from promoting with check.
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: Comparative nodes per second

Post by lucasart »

Uri Blass wrote:
White has more advanced pawns in the 7th so black pawn in the 6th with turn to move does not help to prevent promotion with check.
What was happening is that my eval detects the unstoppable passed pawn on h3, but doesn't understand that black's rook will never be able to stop white's pawns, even w/o any king interaction involved. It made me think that I could code a rule where connected passed pawns can't be stopped by a rook if they're advanced enough and the king can't stop them.
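
A rule like that could be sketched as follows. This is only an illustration in Python, not code from any engine mentioned in this thread. The square encoding (file 0-7, rank 1-8 counted from the pawn owner's back rank), the function name `connected_passers_beat_rook` and the default 6th-rank threshold are all assumptions; the threshold follows the common rule of thumb that two connected passed pawns on the 6th rank usually beat a lone rook. The caller is assumed to have checked already that the pawns are passed and that the defending king is too far away to help.

```python
from itertools import combinations


def connected_passers_beat_rook(pawns, min_rank=6):
    """Rule-of-thumb test for "a rook alone cannot stop these pawns".

    pawns: list of (file, rank) tuples for one side's passed pawns, with
    file in 0..7 and rank in 1..8 counted from that side's own back rank
    (a hypothetical encoding chosen for this sketch).
    Returns True if some pair of pawns sits on adjacent files, within one
    rank of each other, with both pawns on min_rank or beyond.
    """
    for (f1, r1), (f2, r2) in combinations(pawns, 2):
        adjacent_files = abs(f1 - f2) == 1
        close_ranks = abs(r1 - r2) <= 1
        advanced = min(r1, r2) >= min_rank
        if adjacent_files and close_ranks and advanced:
            return True
    return False


# White pawns from the FEN in this thread: a7, b7, c6, d6.
white_pawns = [(0, 7), (1, 7), (2, 6), (3, 6)]
print(connected_passers_beat_rook(white_pawns))
```

A static rule like this would still have to be weighed against the opponent's own threats (here the black h-pawns and the checks), which is exactly what the search normally resolves.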
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Comparative nodes per second

Post by Don »

You guys need to learn how to nest quotes :-)
bob wrote:
lkaufman wrote:
bob wrote:Have you actually tested "don't try null move if behind in material"? I would expect that to be better than what you are doing now.
Did you read my testing post? I actually tested "if eval < beta don't do a null move". I also tried a "fast eval". And a material only. Made absolutely no difference. I'll dig up the results and post when a server at the office gets rebooted so I can get in...
I didn't realize you had tested material only. That should avoid eval calls and so the benefit should be free. Do you use precise material values for this test or just 1-3-3-5-9? In this case the crude numbers should be best, because if you use precise values the side slightly behind in material (say bishop for knight for example) may actually be ahead in true eval. Using the crude numbers means that in general you would do null move unless a side is down a full pawn, which usually means his true eval is negative. If you didn't try this, I recommend doing so. I can hardly see how this could fail to help at least slightly.
Not quite sure what you mean by "precise values." But what I used was simply the material score that is incrementally maintained, where things like pawn=100, knight=325, rook=500 and queen=1050, our cluster-optimized material values. Nothing else is added. For the "fast eval test" I added in PST values and a few other things, but did not do the major piece loops where most of the time is spent in eval...
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Comparative nodes per second

Post by Don »

lucasart wrote:
it's black's turn to play in your fen, and it's very obvious that black has an unstoppable passed pawn on h3. the eval should already return a big score for black without even a search!
These positions only illustrate what is already well known, that humans don't evaluate positions the same way that computers do but instead reason about the position. I would remind you that in this case it usually involves not just evaluation but look-ahead - you are looking ahead here to know that there will be a checkmate in 2 moves. You didn't process this in your brain statically.

Computers use a version of the same process and it's called null move search. We will never (in my opinion) have computers match the human brain and it may not even be desirable to try - computers are much different and trying to force them to do things the same way is like trying to force a human to run on all fours because it works for cheetahs.

In your example of course we know that a computer's static evaluation function is not very good at sorting out what is most relevant. That can be improved incrementally with a more sophisticated evaluation function for specific cases but a general solution is probably going to make programs a lot weaker - simply because it will slow the program down enormously and only be useful once in a while. Even the position you show is not a problem for any chess program doing 1 or 2 more ply - although it's easily understood that you can compose arbitrarily more and more difficult cases.
Uri Blass
Posts: 10297
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Comparative nodes per second

Post by Uri Blass »

Don wrote:
In your example of course we know that a computer's static evaluation function is not very good at sorting out what is most relevant. That can be improved incrementally with a more sophisticated evaluation function for specific cases but a general solution is probably going to make programs a lot weaker - simply because it will slow the program down enormously and only be useful once in a while. Even the position you show is not a problem for any chess program doing 1 or 2 more ply - although it's easily understood that you can compose arbitrarily more and more difficult cases.
I think the evaluation function of top programs is very good in this position: even after searching only to depth 1, when they cannot see white's promotion (because black has many checks), they evaluate it as an advantage for white.
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Comparative nodes per second

Post by Don »

Uri Blass wrote:
I think the evaluation function of top programs is very good in this position: even after searching only to depth 1, when they cannot see white's promotion (because black has many checks), they evaluate it as an advantage for white.
Exactly. But of course this was a poorly designed position if its purpose was to illustrate the concept of being able to reason about a position and focus on only the most relevant features. There are positions that require 40 or more moves to mate which humans can solve by reasoning about the position - one of those would be a good example of what computers DO NOT do well. The most practical examples are covered by good chess programs: a position like lone knight vs king is a draw. It's a rule we all know, but it can be reasoned about: the knight can never mate the king, and the lone king can never mate the other king. So it's a good example, even though it happens to be so common that we can make a rule that is easy to apply in a chess program.

A really practical example is something that ALL programs handle very poorly, and it's not a rare thing: a completely locked pawn structure, i.e. super closed positions. You could have an extra queen or two and it's still a draw if you cannot get to the other side of the board. Even that could be detected and scored correctly in a program by reasoning about whether one of the opponent's pawns could ever be attacked to open up the position, but what happens if there is one last hope but it's all easily covered? One open file, but all intrusion points of the superior side are covered in all cases? It's not easy to detect that case, and so being up a knight might be a meaningless advantage if the knight can get in but be taken, and other intrusions are still just as impossible.

So closed positions in general are still a serious problem for chess programs, and knowing who has the advantage and what must be done to make progress is a hard problem. A few more ply will help only a little; we need better positional heuristics.
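
The "lone knight vs king" rule mentioned above is the kind of knowledge that is cheap to hard-code. A minimal Python sketch, assuming a hypothetical encoding where each side's non-king pieces are given as a string of piece letters; the lone bishop is added here as an assumption of the sketch, since it cannot mate a bare king either:

```python
def is_trivial_draw(white_pieces, black_pieces):
    """Detect bare king vs king plus at most one minor piece.

    Each argument lists one side's pieces WITHOUT the king, as a string of
    letters such as "" (bare king), "N" or "B" (hypothetical encoding).
    Only the case where one side has a bare king is handled; anything else
    is left to the normal evaluation and search.
    """
    lone_minor_or_nothing = {"", "N", "B"}
    w = white_pieces.upper()
    b = black_pieces.upper()
    return (w == "" and b in lone_minor_or_nothing) or (
        b == "" and w in lone_minor_or_nothing
    )


print(is_trivial_draw("N", ""))   # lone knight vs king
print(is_trivial_draw("NP", ""))  # knight and pawn vs king
```

The locked-pawn-structure case has no comparably simple test, which is the point being made above.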
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Comparative nodes per second

Post by bob »

lkaufman wrote:
bob wrote:
lkaufman wrote:
bob wrote: If I recall correctly, just checking the material score actually hurt the elo by something like -20. The lazy eval test hurt by something like -7, and the full eval was a pure wash, no change...

I did leave my other constraints in place of course...
Of course using just material score or even lazy score will hurt if you don't use a margin, because there is a high probability that the static score is above beta if the lazy score is just slightly below beta. Or do you mean that you used lazy eval with the lazy margin subtracted from beta for the comparison? That would be the right way to do it. I'm pretty sure that the reason full eval didn't work for you is just the overhead of the eval, which doesn't apply to us as we need the score anyway for other uses. If you got a zero result despite the cost, then obviously the idea would help if there were no cost.
Yes to the margin idea. But these were pretty fast games. I tried 4 different margins and reported the results for the best one.
Just out of curiosity, what were the best margins for the two cases (lazy and material)?

Since you found that the cost of doing the eval exactly balanced out the benefit, this suggests that you should try using this restriction only when depth is not too low, perhaps above your 4 plies of futility. The cost becomes insignificant if you are far enough from the leaves, but I don't think the same holds for the benefit.
I can give you the margins, but it will be misleading. In Crafty, we have three pieces of the eval.

1. material + a few things, followed by a potential lazy exit using a fairly wide margin.

2. material + a few things + pawn/passed pawn evaluation + things like trapped piece and such. Followed by a second potential lazy exit with a narrower margin.

3. full eval which includes everything above + the individual piece scoring terms and king safety...

The margin for my "lazy eval" would certainly be different (and misleading, hence my hesitance to post it only to have someone try it and fail) because of what my "lazy" eval includes, vs what someone else might include...

I will try the depth limit to see if it changes anything. Just finished the 3 tests at longer games (60s + 1s) and there was no change. Standard 22.5R22 is still the best, R24 was 1 elo worse (but within +/-4, so I consider them equal), and the material-only version was -15 worse.
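
For readers who have not seen a staged evaluation with lazy exits, the three-piece structure described above can be sketched roughly like this in Python. It is not Crafty's code: the function name `staged_eval`, the stage callables and both margin values are placeholders invented for the sketch (the real margins are deliberately not given in this post).

```python
def staged_eval(alpha, beta, stage1, stage2, stage3,
                wide_margin=300, narrow_margin=150):
    """Evaluate in three stages, leaving early when the score is far
    outside the (alpha, beta) window.

    stage1, stage2, stage3 are zero-argument callables returning the score
    contribution of each stage in centipawns. The margin defaults are
    hypothetical values chosen only for this example.
    """
    # Stage 1: material plus a few cheap terms, then a wide lazy exit.
    score = stage1()
    if score - wide_margin >= beta or score + wide_margin <= alpha:
        return score
    # Stage 2: pawn structure, passed pawns, trapped pieces and the like,
    # then a second lazy exit with a narrower margin.
    score += stage2()
    if score - narrow_margin >= beta or score + narrow_margin <= alpha:
        return score
    # Stage 3: the expensive piece loops and king safety.
    return score + stage3()


calls = []


def make_stage(name, value):
    """Build a stage that records when it is called."""
    def stage():
        calls.append(name)
        return value
    return stage


result = staged_eval(-50, 50,
                     make_stage("material", 900),
                     make_stage("pawns", 20),
                     make_stage("pieces", 35))
print(result, calls)
```

In the example the first stage is already far above beta, so the later stages are never called. This also shows why a margin tuned for one program's "lazy" stage would not transfer to a program whose stages contain different terms.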
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Comparative nodes per second

Post by bob »

lkaufman wrote:
I was mainly wondering whether you used identical values for bishop and knight or not. But this won't matter if you used a decent-sized margin. By the way, do you treat the bishop pair as part of the material value?

No. Bishop pair is recognized in the bishop evaluation which is after the stage-2 lazy exit test. Normal material values do have B=N=325 however...
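
The material-only version of the "no null move if the score is below beta" test discussed in this exchange might look roughly like the Python sketch below. The piece values are the ones quoted in the thread (pawn=100, B=N=325, rook=500, queen=1050); the piece-string encoding, the function names and the `margin` parameter are assumptions of the sketch, and no margin value from the actual tests is implied.

```python
# Material values as quoted in the thread, in centipawns.
MATERIAL = {"P": 100, "N": 325, "B": 325, "R": 500, "Q": 1050}


def material_score(own_pieces, opp_pieces):
    """Material balance from the side to move's point of view.

    Pieces are given as strings of letters without kings, e.g. "QRRBNPPP"
    (a hypothetical encoding; a real engine would maintain this score
    incrementally).
    """
    own = sum(MATERIAL[p] for p in own_pieces.upper())
    opp = sum(MATERIAL[p] for p in opp_pieces.upper())
    return own - opp


def null_move_allowed(own_pieces, opp_pieces, beta, margin=0):
    """Skip the null move when the material score is below beta - margin.

    Without a margin, a material score slightly below beta vetoes the null
    move even though the full evaluation may well be above beta.
    """
    return material_score(own_pieces, opp_pieces) >= beta - margin


# Equal material, beta slightly above zero.
print(null_move_allowed("QRR", "QRR", beta=50))
print(null_move_allowed("QRR", "QRR", beta=50, margin=100))
```

With B=N=325 a bishop-for-knight trade does not change the score, which is the effect the question about "identical values" was getting at.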
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Comparative nodes per second

Post by bob »

Don wrote:You guys need to learn how to nest quotes :-)

I despise this system of quoting. :)