lech wrote: This text is an attempt to show why machines (computers) are not able to solve many positions.
Machines can do it all. Only flawed software can interfere with their good work.
Diagram 1:
[d]3N2r1/2K1p3/4Pk2/8/Bp5b/8/2P5/8 b - - 0 1
It is a simple endgame position with 10 pieces.
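For anyone who wants to check the diagram by machine, here is a minimal sketch that reads the piece placement field of the FEN above, counts the pieces, and reports which square color each bishop stands on. The function names are my own for illustration and do not come from any engine or library.

```python
FEN = "3N2r1/2K1p3/4Pk2/8/Bp5b/8/2P5/8 b - - 0 1"

def parse_placement(fen):
    """Return {square: piece_letter} from the first (placement) field of a FEN."""
    board = {}
    rows = fen.split()[0].split("/")
    for row_index, row in enumerate(rows):
        rank = 8 - row_index          # FEN lists rank 8 first
        file_index = 0
        for ch in row:
            if ch.isdigit():
                file_index += int(ch)  # a digit means that many empty squares
            else:
                board["abcdefgh"[file_index] + str(rank)] = ch
                file_index += 1
    return board

def square_is_light(square):
    """a1 is a dark square, so file index + rank is odd on dark squares."""
    return ("abcdefgh".index(square[0]) + int(square[1])) % 2 == 0

board = parse_placement(FEN)
print("pieces on board:", len(board))
for square, piece in sorted(board.items()):
    if piece in "Bb":
        color = "light" if square_is_light(square) else "dark"
        print(piece, "on", square, "stands on a", color, "square")
```

Uppercase letters are White pieces and lowercase are Black, as usual in FEN.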
Welcome to the wonderful world of selective search! The holy grail of computer chess is knowing which moves to prune and which moves not to prune, so if you can figure that out, please let us know.
Your position illustrates that computers still play far from perfectly. Every once in a while someone asks how close computers are to playing perfect chess, and the answer is "not very."
But don't forget to complete the statement...
"not very, but much closer to it than the best humans are."
I have estimated computers to be approximately 1000 Elo away from perfect play, but that is a wild guess based on nothing substantial. I think there are probably ways to get rough estimates by doing rating studies with computers and trying to fit a curve. It would be hard to prove anything though. I'm doing one right now for a presentation I am going to give at MIT - where each player searches twice as many nodes as the one below it, and I have 14 players (so far), starting with a player that does 512 nodes with Komodo. This gets up to almost human speed chess levels. I stagger the players so that I'm not playing horrible mismatches - so nobody plays more than 3 levels down or up. Each player plays about 4000 games the way I have it set up. I will post a graph later for everyone to see. I hope to extend this to at least 24.
Player 9 does 2^9 nodes, Player 22 does 2^22 nodes and so on.
I would like to see a similar study done with Crafty - with your resources you could get some data pretty quickly. I'm running this on a slow 2-core laptop, but I will transfer the study to my 6-core machine after the current section (19-22) completes in a few days. I'm using fixed nodes, and I stop searching exactly at the point when the specified node count is reached.
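The staggered setup described above can be sketched roughly as follows. This is only an illustration of the idea (levels 9 through 22, each player searching 2^level nodes, nobody paired more than 3 levels apart); the function name and the `max_gap` parameter are hypothetical, and it says nothing about how the actual tester schedules games.

```python
from itertools import combinations

def build_schedule(levels, max_gap=3):
    """Return (node budgets per level, allowed pairings).

    A pairing (a, b) is allowed only when the two levels are at most
    max_gap doublings apart, which avoids hopeless mismatches.
    """
    ordered = sorted(levels)
    nodes = {level: 2 ** level for level in ordered}
    pairings = [(a, b) for a, b in combinations(ordered, 2) if b - a <= max_gap]
    return nodes, pairings

nodes, pairings = build_schedule(range(9, 23))  # players 9..22, 14 in total
print("players:", len(nodes))
print("weakest searches", nodes[9], "nodes, strongest", nodes[22])
print("pairings played:", len(pairings), "instead of", 14 * 13 // 2, "in a full round robin")
```

With a gap limit of 3, each interior player meets six opponents (three above, three below), so the number of pairings grows linearly with the number of levels rather than quadratically.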
I guess the reason is that engines are not primarily built to solve positions but to be strong in games.
So what is this position? Black is ahead in material, and the winning path leads through sacrificing some material and entering an endgame with opposite-colored bishops (which is drawish even if one side is a pawn, maybe two, ahead).
So it is usually sound for an engine to avoid this path and look at other alternatives first.
Thomas...
"Middlegames where each side has an opposite colored Bishop are more likely won; endgames with opposite colored Bishops are difficult to win, even if you are ahead a pawn - or even possibly two."
Don,
I am very interested in the fixed node testing you are doing. I am doing time odds testing right now to see the gain in Elo when thinking time is doubled. I am approximating this by doubling the time controls (not all engines obey the time-per-move command). Here are my results so far:
Base Time Control = 6 seconds + 100 ms
(#) represents the multiple of the time control used. Thus, Crafty_23.4(8) stands for Crafty 23.4 playing with a time control of 48 seconds + 800ms.
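As a small sketch of the labeling convention above, the snippet below splits a label such as Crafty_23.4(8) into engine name and multiple, then scales the base time control of 6 seconds + 100 ms. The helper names are made up for illustration and are not part of any testing tool.

```python
import re

BASE_SECONDS = 6.0       # base time from the post
BASE_INCREMENT_MS = 100  # base increment from the post

def parse_label(label):
    """Split 'Name(multiple)' into (name, multiple)."""
    match = re.fullmatch(r"(.+)\((\d+)\)", label)
    if match is None:
        raise ValueError("label does not look like Name(multiple): " + label)
    return match.group(1), int(match.group(2))

def time_control(label):
    """Return (seconds, increment in ms) for a labeled player."""
    _, multiple = parse_label(label)
    return BASE_SECONDS * multiple, BASE_INCREMENT_MS * multiple

name, multiple = parse_label("Crafty_23.4(8)")
seconds, increment_ms = time_control("Crafty_23.4(8)")
print(name, "x", multiple, "->", seconds, "seconds +", increment_ms, "ms")
```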
One problem is that it's difficult to get large samples. I want at least a few thousand games at each level so that I'm not off by more than a few Elo.
Another problem is that with a massive round robin the majority of games are just wasted CPU resources. It's not sensible to play Komodo at 512 nodes versus Komodo at 1 million nodes, as the score is likely to be something like 99.99% for Komodo at 1 million. A possible way to deal with that is with a series of Swiss systems, but even that will spend resources on serious mismatches. Swiss with accelerated pairings might be better.
The way I'm handling it is to just not match up players more than 3 doublings apart.
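To put a rough number on why lopsided pairings tell you so little, here is a minimal sketch using the standard logistic Elo expectation formula. The figure of 100 Elo per doubling is purely an assumption chosen for illustration, not a result from this study, and the real gain per doubling may differ and need not be constant.

```python
def expected_score(elo_diff):
    """Expected score of the higher-rated side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

# Hypothetical value for illustration only, not a measured number.
ASSUMED_ELO_PER_DOUBLING = 100

def expected_score_for_gap(doublings, elo_per_doubling=ASSUMED_ELO_PER_DOUBLING):
    """Expected score when one player searches 2**doublings times more nodes."""
    return expected_score(doublings * elo_per_doubling)

# 512 nodes versus roughly 1 million nodes is 11 doublings (2^9 vs 2^20).
for gap in (1, 3, 11):
    print(gap, "doublings apart ->", round(expected_score_for_gap(gap), 4))
```

The closer the expected score gets to 1, the less a single game result can move the rating estimate, which is the argument for capping the gap at a few doublings.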
I used this 3 or 4 years ago to measure the scalability of Komodo by comparing it to Glaurung and found problems. Komodo looked quite strong at hyper fast levels in comparison to Glaurung (at the time), but when I plotted the rating curves of Glaurung and Komodo using time and Elo as my two axes, I saw Glaurung was improving with time much faster than Komodo. I was getting rather discouraged until I discovered that the difference was king safety - which I had not yet implemented in Komodo. It was a big surprise to me, but king safety immediately changed the shape of the curve in a pretty dramatic way. One of the big secrets of computer chess is that it's all about the evaluation function, not the search. However, the search still has to be top notch - you cannot cover everything with evaluation.
The Deep Blue team discovered this too, many years ago. They initially chose to emphasize speed at all costs, figuring that extra depth would solve anything. That is true in a very general sense, but they underestimated the power of evaluation. I think Hsu said something to the effect that search multiplies the power of the evaluation function, so a small improvement in evaluation is multiplied with depth.
IMO this position is far from being a trivial win:
- Opposite-colored bishop endings have a strong drawing tendency
- The passed pawn is still on its starting square and, even worse, is blocked by its own king
It's true that the position of the defending king is ugly, but can you be 100% sure just by looking at the position (not calculating deeply) that this is enough to guarantee a win?
I've seen a lot of games played by 2000-2200 Elo players where the stronger side has a clear lead in material (like diag 1) and then goes for a simplifying sacrifice which leads to an optically good-looking but in fact drawn endgame.
I have the same thoughts on this as you do. I tried this on Komodo and after 5 minutes it cannot solve the position. I turned off null move pruning to see if that made any difference and let it run to the same depth - still no solution. I doubt null move pruning has much impact, because passed pawn scoring is very aggressive and a passed pawn push would be seen as almost like a tactical threat.
The problem is pretty difficult - at least for computers - and the bishops-of-opposite-color evaluation probably does not help the program solve this. You have to search really deeply to see the point of the early sacrifice, even with no pruning at all. I don't think most really good programs are pruning many of the passed pawn pushes, and those pushes are probably all near the front of the move list, so they won't be getting reduced much either.
I'm going to add this position to my personal favorites. As a problem for a test suite this should start AFTER the bishop check and Kd7 move because Bg3+ is a natural move, but Rxd8 is a sacrifice. Komodo plays Bg3+ right away, but cannot follow it up with the sac.