Comparative nodes per second

Jose Carlos · Post by **Jose Carlos** » Wed Apr 18, 2012 8:16 am

Don wrote:
lucasart wrote:
Uri Blass wrote: Here is another example of the huge positional scores of chess programs (probably including IvanHoe, which I have not installed on my computer, and at least including robbolito0.09w32 and Houdini, which give more than +4 for white at depth 1, when the program cannot see the promotion by search because it is hidden behind a lot of checks).

Toga1.2.1a gives 1.17 for black at depth 1, but Toga is an inferior, older program,
and I believe that all programs at the playing strength of Rybka3 or higher give white a big advantage when they search to depth 1, in spite of black's nominal 6-pawn advantage.

[D]7k/PP5p/2PP3p/7p/7p/7p/6r1/K7 b - - 0 1
???
It's black's turn to play in your FEN, and it's very obvious that black has an unstoppable passed pawn on h3. The eval should already return a big score for black without even a search!
These positions only illustrate what is already well known, that humans don't evaluate positions the same way that computers do but instead reason about the position. I would remind you that in this case it usually involves not just evaluation but look-ahead - you are looking ahead here to know that there will be a checkmate in 2 moves. You didn't process this in your brain statically.

Computers use a version of the same process and it's called null move search. We will never (in my opinion) have computers match the human brain and it may not even be desirable to try - computers are much different and trying to force them to do things the same way is like trying to force a human to run on all fours because it works for cheetahs.

In your example of course we know that a computer's static evaluation function is not very good at sorting out what is most relevant. That can be improved incrementally with a more sophisticated evaluation function for specific cases, but a general solution is probably going to make programs a lot weaker - simply because it will slow the program down enormously and only be useful once in a while. Even the position you show is not a problem for any chess program doing 1 or 2 more ply - although it's easily understood that you can compose arbitrarily more difficult cases.

Hi. I'm experimenting with the concept of "search instability". I don't try to evaluate positions like the above one correctly, but I try to identify positions where the static eval is not reliable. I use it to shape the tree, cutting off soon on "boring" positions and extending the most complicated ones.

I don't have enough computer time to test the idea and report results, but at least it doesn't look bad at first sight.

bob · Post by **bob** » Thu Apr 19, 2012 11:53 pm

José Carlos wrote:
Hi. I'm experimenting with the concept of "search instability". I don't try to evaluate positions like the above one correctly, but I try to identify positions where the static eval is not reliable. I use it to shape the tree, cutting off soon on "boring" positions and extending the most complicated ones.

I don't have enough computer time to test the idea and report results, but at least it doesn't look bad at first sight.

Here's an idea that has been mentioned, but never really researched. "panic time."

Hsu mentioned that DB did not just look at the score for the current iteration, but also looked at the shape of the tree. I think his idea was based on the fact that most trees get searched in a sort of predictable way, but every now and then, an iteration takes longer than expected. He used something from that sort of behavior to extend the time, even BEFORE the search failed low. I've never played around with it, but it is an interesting idea. Often, once you have failed low, it is too late. If you begin to see something going wrong, such as (maybe) the typical fail-high-on-first-move percentage dropping, that might give you an indication that something is going on here and needs more investigation, before you commit to a path that might turn out bad...
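A minimal sketch of the "early warning" part of this idea, assuming the search already counts fail-highs per iteration. It is not a reconstruction of what Deep Blue did; the `IterationStats` record, the `drop_threshold` and the `extension` factor are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class IterationStats:
    fail_highs: int             # nodes that failed high in this iteration
    first_move_fail_highs: int  # of those, how many failed high on the first move


def first_move_ratio(stats: IterationStats) -> float:
    """Fraction of fail-highs that happened on the first move searched."""
    if stats.fail_highs == 0:
        return 1.0  # nothing to measure, treat as perfectly ordered
    return stats.first_move_fail_highs / stats.fail_highs


def time_multiplier(previous: IterationStats,
                    current: IterationStats,
                    drop_threshold: float = 0.05,
                    extension: float = 1.5) -> float:
    """Return a factor for the time budget of the current move.

    If move ordering quality dropped noticeably between two iterations,
    ask for more time before the search has actually failed low.
    Both tuning values are assumptions, not measured numbers.
    """
    drop = first_move_ratio(previous) - first_move_ratio(current)
    return extension if drop > drop_threshold else 1.0
```

The same hook could watch other "tree shape" signals, such as an iteration taking much longer than the previous branching factor would predict.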

Don · Post by **Don** » Fri Apr 20, 2012 12:22 am

José Carlos wrote:
Hi. I'm experimenting with the concept of "search instability". I don't try to evaluate positions like the above one correctly, but I try to identify positions where the static eval is not reliable. I use it to shape the tree, cutting off soon on "boring" positions and extending the most complicated ones.

You can think of the evaluation score as a win probability score which can be converted via a logistic formula. I have not yet tried to make this idea really work yet, but you can build uncertainty into your evaluation function by "pushing" the score towards zero. For example multiplying the score by a fraction such as 0.80 will bring it closer to zero. So the way you might attempt to apply this is that if you know you have some feature that is volatile, in other words it could easily go your way or against you, then there is great uncertainty in the outcome. Here is an example: Suppose you are a knight up, but BOTH sides have huge (and equal) king safety troubles. A knight up implies that you have nearly a certain win, but I don't think that is the case in the scenario I am describing.
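As a rough sketch of the two steps described above (logistic conversion, then shrinking the score towards zero), assuming scores are in pawns. The `scale` constant and the value of 3.0 for a knight are illustrative assumptions; only the 0.80 factor comes from the post.

```python
import math


def win_probability(score_pawns: float, scale: float = 1.0) -> float:
    """Logistic map from a score in pawns to an expected result in (0, 1).

    `scale` is a hypothetical tuning constant, not a value from the thread.
    """
    return 1.0 / (1.0 + math.exp(-score_pawns / scale))


def shrink_for_uncertainty(score_pawns: float, factor: float = 0.80) -> float:
    """Pull a score towards zero; a factor of 1.0 leaves it unchanged."""
    return score_pawns * factor


if __name__ == "__main__":
    knight_up = 3.0  # assumed value of a knight, in pawns
    volatile = shrink_for_uncertainty(knight_up)
    print(f"plain:  {win_probability(knight_up):.3f}")
    print(f"shrunk: {win_probability(volatile):.3f}")
```

Shrinking keeps the sign of the score, so the side that is ahead stays ahead; only the implied confidence in the result goes down.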

That explanation is simplistic, but the concept is that chess programs deal with uncertainty primarily, we just don't think of it that way. Also, I'm not sure that an evaluation score of 1.5 pawns based on a deep search is the same as an evaluation score of 1.5 based on a leaf node at the end of a quies search. I'm not sure there is anything you can do about that but it seems important to me. Would you rather have a position that a strong program has searched to 30 ply and says is winning by 3 pawns or one that a quies search from the same program says is winning by 3 pawns?

A related issue is making your program fight for a win or draw in a lost position. Let's say you are down 1 pawn. I think that your opponent's king safety is more important than it normally would be. If you are down 1 pawn you are probably going to lose, assuming the evaluation is more or less accurate; you have serious trouble. I would think that your opponent's king safety should be given a different weight than your own so that the program is more likely to fight for the win. Conversely, if the program seems to be winning materially, you might want to be a little extra paranoid about your king.

I think these things tend to be linear in most programs, but probably they should not be. In fact this may partially solve the problem where programs put too much weight on things that are not very relevant - such as being up a couple of pawns when the pawns are so horrible that in reality you may not even have an advantage. So the concept could possibly be that certain evaluation features take on more weight (positive or negative) depending on things that are often not directly correlated with those features.
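One way to picture the asymmetric king-safety idea is a pair of weights that depend on the material balance. This is only a sketch under assumed numbers: the 0.25 bonus, the clamp to one pawn, and the "danger" inputs are hypothetical and untested.

```python
from typing import Tuple


def king_safety_weights(material_balance_pawns: float) -> Tuple[float, float]:
    """Return (own_king_weight, opponent_king_weight).

    material_balance_pawns > 0 means the side to evaluate is ahead.
    The 0.25 bonus and the clamp to one pawn are illustrative assumptions.
    """
    t = max(-1.0, min(1.0, material_balance_pawns))
    own = 1.0 + 0.25 * max(t, 0.0)    # ahead: be more paranoid about our king
    opp = 1.0 + 0.25 * max(-t, 0.0)   # behind: value attacking chances more
    return own, opp


def king_safety_term(own_king_danger: float,
                     opp_king_danger: float,
                     material_balance_pawns: float) -> float:
    """King-safety contribution to the eval, from our point of view."""
    own_w, opp_w = king_safety_weights(material_balance_pawns)
    return opp_w * opp_king_danger - own_w * own_king_danger
```

With equal danger on both sides, the term is zero at material equality, positive when a pawn down, and negative when a pawn up, which is the non-linear behaviour the post asks for.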

I don't have enough computer time to test the idea and report results, but at least it doesn't look bad at first sight.

Jose Carlos · Post by **Jose Carlos** » Fri Apr 20, 2012 12:49 am

Don wrote:
You can think of the evaluation score as a win probability score which can be converted via a logistic formula. I have not yet tried to make this idea really work yet, but you can build uncertainty into your evaluation function by "pushing" the score towards zero. For example multiplying the score by a fraction such as 0.80 will bring it closer to zero. So the way you might attempt to apply this is that if you know you have some feature that is volatile, in other words it could easily go your way or against you, then there is great uncertainty in the outcome. Here is an example: Suppose you are a knight up, but BOTH sides have huge (and equal) king safety troubles. A knight up implies that you have nearly a certain win, but I don't think that is the case in the scenario I am describing.

That explanation is simplistic, but the concept is that chess programs deal with uncertainty primarily, we just don't think of it that way. Also, I'm not sure that an evaluation score of 1.5 pawns based on a deep search is the same as an evaluation score of 1.5 based on a leaf node at the end of a quies search. I'm not sure there is anything you can do about that but it seems important to me. Would you rather have a position that a strong program has searched to 30 ply and says is winning by 3 pawns or one that a quies search from the same program says is winning by 3 pawns?

A related issue is making your program fight for a win or draw in a lost position. Let's say you are down 1 pawn. I think that your opponent's king safety is more important than it normally would be. If you are down 1 pawn you are probably going to lose, assuming the evaluation is more or less accurate; you have serious trouble. I would think that your opponent's king safety should be given a different weight than your own so that the program is more likely to fight for the win. Conversely, if the program seems to be winning materially, you might want to be a little extra paranoid about your king.

I think these things tend to be linear in most programs, but probably they should not be. In fact this may partially solve the problem where programs put too much weight on things that are not very relevant - such as being up a couple of pawns when the pawns are so horrible that in reality you may not even have an advantage. So the concept could possibly be that certain evaluation features take on more weight (positive or negative) depending on things that are often not directly correlated with those features.

I don't have enough computer time to test the idea and report results, but at least it doesn't look bad at first sight.

Of course, you use the correct word (uncertainty), which I was missing in English. I'm trying to apply uncertainty somewhat differently from what you describe (which is, indeed, very interesting and worth a thought).

I have 3 levels of uncertainty: low, medium and high. My eval assesses the uncertainty level of a position by considering several factors. For example, if many pieces are attacked at the same time, the uncertainty is high because it will take several plies to discover the underlying tactics. King safety, pawn races in endings, attacked pieces... are some of the elements I examine to determine the uncertainty of the position.

When, during search, I find a position with low uncertainty, I reduce wildly or even prune. If, OTOH, the uncertainty is high, I try to make sure I search deep enough to resolve the tactics.

That makes for a very unbalanced tree, which is hard to understand. As I said, I don't have the resources to verify whether the approach works, but at least it's fun to play with.
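A toy sketch of the three-level scheme, to make the tree shaping concrete. The signals, thresholds and depth adjustments below are invented for illustration and are not the values used in the engine described above.

```python
from enum import Enum


class Uncertainty(Enum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2


def classify(attacked_pieces: int, king_exposed: bool, pawn_race: bool) -> Uncertainty:
    """Combine a few static signals into an uncertainty level.

    The thresholds (2 and 4 attacked pieces) are hypothetical.
    """
    signals = int(attacked_pieces >= 2) + int(king_exposed) + int(pawn_race)
    if signals >= 2 or attacked_pieces >= 4:
        return Uncertainty.HIGH
    if signals == 1:
        return Uncertainty.MEDIUM
    return Uncertainty.LOW


def adjusted_depth(depth: int, level: Uncertainty) -> int:
    """Reduce quiet positions, extend volatile ones (assumed step sizes)."""
    if level is Uncertainty.LOW:
        return max(0, depth - 2)  # a result of 0 would drop into quiescence
    if level is Uncertainty.HIGH:
        return depth + 1
    return depth
```

In a real search the extension would need a cap per line, otherwise a long sequence of volatile positions could keep extending itself.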

bob · Post by **bob** » Fri Apr 20, 2012 1:21 am

Don wrote:
You can think of the evaluation score as a win probability score which can be converted via a logistic formula. I have not yet tried to make this idea really work yet, but you can build uncertainty into your evaluation function by "pushing" the score towards zero. For example multiplying the score by a fraction such as 0.80 will bring it closer to zero. So the way you might attempt to apply this is that if you know you have some feature that is volatile, in other words it could easily go your way or against you, then there is great uncertainty in the outcome. Here is an example: Suppose you are a knight up, but BOTH sides have huge (and equal) king safety troubles. A knight up implies that you have nearly a certain win, but I don't think that is the case in the scenario I am describing.

That explanation is simplistic, but the concept is that chess programs deal with uncertainty primarily, we just don't think of it that way. Also, I'm not sure that an evaluation score of 1.5 pawns based on a deep search is the same as an evaluation score of 1.5 based on a leaf node at the end of a quies search. I'm not sure there is anything you can do about that but it seems important to me. Would you rather have a position that a strong program has searched to 30 ply and says is winning by 3 pawns or one that a quies search from the same program says is winning by 3 pawns?

A related issue is making your program fight for a win or draw in a lost position. Let's say you are down 1 pawn. I think that your opponent's king safety is more important than it normally would be. If you are down 1 pawn you are probably going to lose, assuming the evaluation is more or less accurate; you have serious trouble. I would think that your opponent's king safety should be given a different weight than your own so that the program is more likely to fight for the win. Conversely, if the program seems to be winning materially, you might want to be a little extra paranoid about your king.

I think these things tend to be linear in most programs, but probably they should not be. In fact this may partially solve the problem where programs put too much weight on things that are not very relevant - such as being up a couple of pawns when the pawns are so horrible that in reality you may not even have an advantage. So the concept could possibly be that certain evaluation features take on more weight (positive or negative) depending on things that are often not directly correlated with those features.

I don't have enough computer time to test the idea and report results, but at least it doesn't look bad at first sight.

Someone has previously suggested, although I have not given it much thought, that I could produce a pretty good eval -> winning percentage formula on my cluster stuff. Would be interesting, particularly when looking at the variance to see if one eval is very accurate, while another is much more jittery...

Something to do one day. I probably should fix my referee program so that it will insert the evals into the PGN during a cluster run, but that then requires that I modify every program I test with to have them send me the eval, and a few other issues to boot. But it would be pretty interesting and would be amusing to produce scores purely between -1.0 and +1.0.

Then Richardson would have something else to complain about, he already hates my white POV output.

mcostalba · Post by **mcostalba** » Fri Apr 20, 2012 7:18 am

bob wrote: Someone has previously suggested, although I have not given it much thought, that I could produce a pretty good eval -> winning percentage formula on my cluster stuff.

This eval <-> winning percentage stuff is really misleading. I don't know where it came from, but a lot of people (normally chess players more than programmers) blindly believe in this fetish. An eval score has no meaning taken as an absolute single number. It only makes sense relative to another eval score from another position. So it is just one part of a pair used to compare positions, and what counts for the engine search is the pair.

Don, instead, said an interesting thing: the same eval is not the same if it is returned at the leaves or high in the tree.

Michel · Post by **Michel** » Fri Apr 20, 2012 8:10 am

mcostalba wrote: This eval <-> winning percentage stuff is really misleading. I don't know where it came from, but a lot of people (normally chess players more than programmers) blindly believe in this fetish.

Well it is not a function, but there is a correlation. Nothing fetish about that.

Don · Post by **Don** » Fri Apr 20, 2012 12:57 pm

mcostalba wrote:
bob wrote: Someone has previously suggested, although I have not given it much thought, that I could produce a pretty good eval -> winning percentage formula on my cluster stuff.
This eval <-> winning percentage stuff is really misleading. I don't know where it came from, but a lot of people (normally chess players more than programmers) blindly believe in this fetish. An eval score has no meaning taken as an absolute single number. It only makes sense relative to another eval score from another position. So it is just one part of a pair used to compare positions, and what counts for the engine search is the pair.

Converting to a win probability is an ill-defined concept because win probability is all about your opponent and how he plays and how you play etc... In fact the same position where one side has an advantage has different win probabilities for a pair of strong players vs a pair of weak players in many cases, because the advantage may exist only if you have the skill to exploit it.

But it's no worse a concept than eval score and it does attempt to attach a reasonable meaning to it; after all, everyone knows that the bigger your advantage, the better winning chances you have, all other things being equal.

I think you could even improve your chess program by considering this a win probability instead of a score - or at least thinking about it that way. You could take thousands or even millions of positions and index them by type (perhaps by material signature) and track the actual win percentage and you may be able to prove that your program has problems when the actual win percentage does not match the evaluation function converted to win percentage. For a silly example, imagine that your program has the sign wrong for the bishop pair bonus, a serious mis-evaluation issue. If you were to plot on a graph all the positions where one side had a bishop pair advantage, you might find that the win percentage predicted is much lower than the win percentage actually achieved, but only in these positions! That would be a clear sign there was an evaluation problem I would think.

I think it might require a lot of games in order to deal with a lot of noise that would surely come from positional features that did not appear that often but it might be a way to tune a chess program evaluation function.
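A small sketch of the bookkeeping this would need, assuming each sample is a tuple of (signature, score in pawns, game result) with results of 0, 0.5 or 1. The logistic `scale` and the signature labels are hypothetical; the point is only the per-group comparison of predicted and achieved results.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def predicted_win(score_pawns: float, scale: float = 1.0) -> float:
    """Logistic conversion of a score to an expected result (assumed scale)."""
    return 1.0 / (1.0 + math.exp(-score_pawns / scale))


def calibration_by_signature(
    samples: Iterable[Tuple[str, float, float]]
) -> Dict[str, Tuple[float, float]]:
    """Group samples by signature and return (mean predicted, mean actual)."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for signature, score, result in samples:
        bucket = totals[signature]
        bucket[0] += predicted_win(score)
        bucket[1] += result
        bucket[2] += 1
    return {sig: (p / n, a / n) for sig, (p, a, n) in totals.items()}


if __name__ == "__main__":
    games = [
        ("bishop_pair", 0.0, 1.0),
        ("bishop_pair", 0.0, 1.0),
        ("other", 0.0, 0.5),
    ]
    for sig, (pred, actual) in calibration_by_signature(games).items():
        print(sig, round(pred, 3), round(actual, 3))
```

A group whose actual result is far from its predicted result, while other groups match, would be the kind of signal described above; with few samples per group the difference is mostly noise.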

Don, instead, said an interesting thing: the same eval is not the same if it is returned at the leaves or high in the tree.

But as I said, I don't know what to do about that! If you modify the score to account for that, you are in a feedback loop: you would change the root score. If a system could be worked out, I believe there would be many opportunities based on things going on in the tree.

michiguel · Post by **michiguel** » Fri Apr 20, 2012 1:35 pm

Michel wrote:
This eval <-> winning percentage stuff is really misleading. I don't know where it came from, but a lot of people (normally chess players more than programmers) blindly believe in this fetish.
Well it is not a function, but there is a correlation. Nothing fetish about that.

Correct.

Moreover, alpha-beta implies the search for a higher probability to win and/or closeness to it, given by a certain score that represents it. The "eval" score is an approximation to it, by definition. What is important is the variation of the probabilities (delta-P), which is related to delta-Ev. This is like energy: the absolute value is irrelevant, and what matters is the difference between two conditions or states.

The availability, or lack thereof, of a good function that relates them is a different story, but we know that the higher the one, the higher the other.

Miguel

Don · Post by **Don** » Fri Apr 20, 2012 2:31 pm

mcostalba wrote:
bob wrote: Someone has previously suggested, although I have not given it much thought, that I could produce a pretty good eval -> winning percentage formula on my cluster stuff.
This eval <-> winning percentage stuff is really misleading. I don't know where it came from, but a lot of people (normally chess players more than programmers) blindly believe in this fetish. An eval score has no meaning taken as an absolute single number. It only makes sense relative to another eval score from another position.

P.S. It's my opinion that the biggest problem in computer chess is the fact that even though programs are good at comparing similar positions, they are much less capable of comparing positions that are significantly different and even worse at comparing positions that are unbalanced. This is another way of saying that they do not have very good evaluation functions. So really a single number should be all that is required because an evaluation function should be transitive. In chess programs they are not, simply because chess programs are broken with respect to evaluation.

With a properly transitive evaluation function you should be able to compare any 2 positions, even if they are totally different, and determine which one is "best." But in practice that doesn't work very well. If you have a choice of two ways to proceed that lead to 2 completely different kinds of positions your chess program probably doesn't have a clue which is better - unless one is significantly better than the other.

Part of the reason I like the logistic function is that it imposes a definite meaning on a score, or at least it attempts to. What does a pawn up really mean? You are never really a pawn up; just giving up a pawn usually gives you some advantage, even if it's not enough. If you watch 2 master players and one is a pawn up or down after just a few moves, the one who is down probably has (at least some) compensation. Thus you sometimes hear the phrase, "a pawn down but with compensation" or "equal chances." But as chess programs have improved over the years you will notice that they are not quite as materialistic as they used to be. Komodo and most other good programs, when given a gambit position to analyse, will return a score fairly close to zero. So at least we are starting to think more in terms of positional chess and not head count.

Very often you will find that 2 chess programs have different "scales" when it comes to scoring positions; one tends to be more aggressive about scoring than the other. Where Komodo thinks 25 centipawns, Stockfish thinks 40 or 50 centipawns, for example. We could standardize the meaning of 1 centipawn for each program by applying a simple calibration function - a few hundred test games could easily do this.
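One possible calibration, sketched under the assumption that each engine's centipawn scores map to results through a logistic curve with a single per-engine scale. The grid of candidate scales and the reference scale of 200 are arbitrary illustrative choices, and a plain squared-error fit is used here only because it is short.

```python
import math
from typing import Iterable, Sequence, Tuple


def expected_result(score_cp: float, scale_cp: float) -> float:
    """Logistic expected result for a centipawn score and a given scale."""
    return 1.0 / (1.0 + math.exp(-score_cp / scale_cp))


def fit_scale(samples: Sequence[Tuple[float, float]],
              candidates: Iterable[int] = range(50, 1001, 10)) -> int:
    """Pick the scale whose predictions best match (score_cp, result) pairs.

    Results are 0, 0.5 or 1 (or averages of those). Grid search keeps the
    sketch dependency-free; the candidate range is an assumption.
    """
    def loss(scale: int) -> float:
        return sum((expected_result(s, scale) - r) ** 2 for s, r in samples)
    return min(candidates, key=loss)


def normalize(score_cp: float, engine_scale: float,
              reference_scale: float = 200.0) -> float:
    """Rescale one engine's score so that all engines share a reference scale."""
    return score_cp * reference_scale / engine_scale
```

After fitting one scale per engine, `normalize` puts their scores on a common footing, so that equal numbers are meant to imply roughly equal expected results.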

What I suggest, however, is not just something superficial; it should really be the way we think about an evaluation score. For example, if your program says you are up 100 centipawns, does that mean the same thing in the ending as in the opening? If not, you have a source of error and your program will be lousy at comparing these two positions, thinking they are equally good when they are not.

So it is just one part of a pair used to compare positions, and what counts for the engine search is the pair.

Don, instead, said an interesting thing: the same eval is not the same if it is returned at the leaves or high in the tree.
