Re: Opposite Color Bishop Endgames

Posted: Sat May 11, 2019 3:02 am
by Stephen Ham
That is why I posted as I did.

In your link, I recall Majid Ansari's post of over five years ago when he wrote:

I actually watched SF win a very nice opposite bishop endgame against Houdini. The struggle in the endgame was to avoid a drawn opposite bishop endgame and I thought SF handled it extremely well. Houdini seems to also realize that opposite bishop endgames are drawn but it is not as accurate as SF and is way too optimistic about drawing chances. SF seems much more adept at this and with its extremely deep endgame search ability it is just too strong for Houdini in such positions.

So, that's why I assumed that modern engines now have proper coding for opposite colored bishop endgames. Nonetheless, I understand that such coding is very difficult. For example, although such endgames are highly drawish, opposite colored bishop middlegames tend to be dynamic. So, when does one tell the engine to go from one extreme to the other?
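One possible answer to the "when to switch" question is not to switch at all, but to damp the evaluation gradually as the other pieces come off. A minimal sketch, assuming a hypothetical `scaled_eval` helper and made-up damping constants (32/64 for a pure opposite-bishop ending, rising by 16/64 per extra non-pawn piece); it is not taken from any particular engine:

```python
def scaled_eval(raw_eval_cp, opposite_bishops, other_pieces):
    """Damp a centipawn score toward 0 in opposite-colored-bishop positions.

    raw_eval_cp      -- evaluation in centipawns before scaling
    opposite_bishops -- True if each side has one bishop, on different colors
    other_pieces     -- number of non-pawn, non-king pieces besides the two bishops

    All constants below are illustrative assumptions, not tuned values.
    """
    if not opposite_bishops:
        return raw_eval_cp
    # Pure bishop ending: halve the score. Each extra piece restores
    # a quarter of the scale, so middlegames are left untouched.
    scale = min(64, 32 + 16 * other_pieces)
    return raw_eval_cp * scale / 64


if __name__ == "__main__":
    print(scaled_eval(-230, True, 0))   # pure opposite-bishop ending
    print(scaled_eval(-230, True, 2))   # two more pieces still on the board
```

With a scheme like this there is no single point where the engine flips from "dynamic" to "drawish"; the damping simply fades in as material is exchanged.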

Today, with 6-man endgame TBs, perhaps programmers consider this a minimal issue. Alas, I only have a 5-man TB. But, I will be upgrading my hardware this summer and so will purchase a 6-man TB stick.

All the best,
Steve

Re: Opposite Color Bishop Endgames

Posted: Sat May 11, 2019 8:26 am
by hgm
abulmo2 wrote: Sat May 11, 2019 12:31 am
I have some trouble understanding you: to be good at chess IS to have a high Elo.
Not at all. Having a high Elo can be achieved by only doing good moves in a sub-set of all chess positions, namely the positions that can be reached by the moves you select. You can arbitrarily suck in positions that you will never reach because you avoid them. And I would not call an entity that heavily sucks in a large sub-set of all positions "good at chess". I would say it just knows a trick to win.

LC0 is a good example: it sucks in tactical positions, as is easily demonstrated from its performance in tactical test suites. But in games it just avoids tactically complex positions, so it still has a pretty high Elo for a poor Chess program.

It all boils down to the difference between average and worst case. Elo is just an average, and an average that can be skewed at that, as the player has a large say in the set of positions it is measured on, or their weighting. Reliability of a product is measured from its worst-case behavior, though; average is not very important. If there was a car that used on average somewhat less fuel than its competitors, but would not run at all when it is raining, virtually no one would even consider buying it.
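A minimal sketch of the average-versus-worst-case distinction. The two engines, the position categories, and every number here are made up purely for illustration; `weights` stands for how often each engine steers its games into each kind of position:

```python
def weighted_average(scores, weights):
    """Average score over position types, weighted by how often they occur."""
    total = sum(weights.values())
    return sum(scores[kind] * weights[kind] for kind in scores) / total


def worst_case(scores):
    """Score in the position type the engine handles worst."""
    return min(scores.values())


# Hypothetical engine A: strong in quiet positions, weak in tactical ones,
# but it steers 90% of its games into quiet positions.
engine_a = {"quiet": 0.9, "tactical": 0.3}
weights_a = {"quiet": 9, "tactical": 1}

# Hypothetical engine B: equally decent everywhere, no steering.
engine_b = {"quiet": 0.7, "tactical": 0.7}
weights_b = {"quiet": 5, "tactical": 5}

if __name__ == "__main__":
    for name, scores, weights in (("A", engine_a, weights_a),
                                  ("B", engine_b, weights_b)):
        print(name, weighted_average(scores, weights), worst_case(scores))
```

With these invented numbers engine A comes out ahead on the weighted average while engine B comes out ahead on the worst case, which is the whole disagreement in a nutshell.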

Also see Dann's response.

Re: Opposite Color Bishop Endgames

Posted: Sat May 11, 2019 12:12 pm
by abulmo2
Stephen Ham wrote: Fri May 10, 2019 6:48 am
The following 8-piece endgame just arose in a Raubfisch X40a3-Cfish match:
[d]4b3/8/6k1/6p1/p6p/K3B1P1/8/8 w - - 0 1

Virtually all humans immediately see that this is a draw.
A simple variation of the above position:
[d]4b3/8/2p3k1/2K3p1/7p/4B1P1/8/8 w - - 0 1
Here it is a Black win. I just moved a pawn (and the opposing king, but that matters less). The opposite-colored bishops are still there, but the position is no longer a dead draw. I conclude that in your position above the problem lies more in the pawn structure (edge pawns are difficult to promote) than in the opposite-colored bishops. The problem, when testing opposite-colored-bishop code, is that the test will probably encounter more positions like mine than like yours.
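Detecting the opposite-colored-bishop condition itself is the easy part, which is also why it says so little about the result. A minimal sketch, assuming hypothetical helpers `bishop_square_colors` and `has_opposite_bishops` that read only the piece-placement field of a FEN string:

```python
def bishop_square_colors(fen):
    """Return (white, black): lists of 'light'/'dark' for each side's bishops."""
    placement = fen.split()[0]             # first FEN field: piece placement
    white, black = [], []
    for rank_index, row in enumerate(placement.split("/")):
        rank = 8 - rank_index              # FEN lists rank 8 first
        file = 0                           # 0 = a-file
        for ch in row:
            if ch.isdigit():
                file += int(ch)            # run of empty squares
                continue
            if ch in "Bb":
                # a1 (file 0, rank 1) is dark, so an odd sum means dark
                color = "dark" if (file + rank) % 2 == 1 else "light"
                (white if ch == "B" else black).append(color)
            file += 1
    return white, black


def has_opposite_bishops(fen):
    """True if each side has exactly one bishop and they stand on different colors."""
    white, black = bishop_square_colors(fen)
    return len(white) == 1 and len(black) == 1 and white[0] != black[0]


if __name__ == "__main__":
    drawn = "4b3/8/6k1/6p1/p6p/K3B1P1/8/8 w - - 0 1"
    won = "4b3/8/2p3k1/2K3p1/7p/4B1P1/8/8 w - - 0 1"
    print(has_opposite_bishops(drawn), has_opposite_bishops(won))
```

Both positions from this post satisfy the test, although one is a draw and the other a win, so the flag alone cannot tell them apart.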

Re: Opposite Color Bishop Endgames

Posted: Sat May 11, 2019 12:27 pm
by hgm
abulmo2 wrote: Sat May 11, 2019 12:12 pm
The problem, when testing opposite-colored-bishop code, is that the test will probably encounter more positions like mine than like yours.
Sure. But that is no reason not to distinguish those.

And indeed 'unlike Bishops' plays no role in the given position, as the drawing plan sacs a Bishop.

Re: Opposite Color Bishop Endgames

Posted: Sat May 11, 2019 1:34 pm
by chrisw
hgm wrote: Sat May 11, 2019 8:26 am
abulmo2 wrote: Sat May 11, 2019 12:31 am
I have some trouble understanding you: to be good at chess IS to have a high Elo.
Not at all. Having a high Elo can be achieved by only doing good moves in a sub-set of all chess positions, namely the positions that can be reached by the moves you select. You can arbitrarily suck in positions that you will never reach because you avoid them. And I would not call an entity that heavily sucks in a large sub-set of all positions "good at chess". I would say it just knows a trick to win.
Living to adult age and reproducing can only be achieved by eating non-poisonous, nutritious things from the subset of all chewable bitesized things, namely the chewable things you select.
You can arbitrarily die from poisoning from poisonous things you never eat because you avoid them. And I would not call an entity that arbitrarily dies in a world with a large subset of poisonous chewable things "good at surviving and reproducing". I would say it just knows a trick to avoid poison.

LC0 is a good example: it sucks in tactical positions, as is easily demonstrated from its performance in tactical test suites. But in games it just avoids tactically complex positions, so it still has a pretty high Elo for a poor Chess program.
Person A is a dismal failure at IQ tests, but very successful in life.
Person B is brilliant at IQ tests, but unable to tie his shoelaces.

Normally here, we would criticise the IQ test for actually being not very good at measuring "General Intelligence", but I can see you regard "the test" as the real measure and "real life performance" as a test that failed to measure up to the IQ test.

It all boils down to the difference between average and worst case. Elo is just an average, and an average that can be skewed at that, as the player has a large say in the set of positions it is measured on, or their weighting.
In the case of zero-sum games, ELO is a life-performance measurement. All players start from exactly the same position. If there is a general mismatch between performance on "test-suites" and "life-performance" then it is the "test-suites" that are inadequate. If we were to introduce tigers into a flock of sheep, it is no use claiming after all the sheep are dead that tigers fail the quality of wool test hitherto deemed as the ultimate measure of quality in the real world.

Reliability of a product is measured from its worst-case behavior, though; average is not very important. If there was a car that used on average somewhat less fuel than its competitors, but would not run at all when it is raining, virtually no one would even consider buying it.

Also see Dann's response.

Re: Opposite Color Bishop Endgames

Posted: Sat May 11, 2019 2:52 pm
by Ferdy
Stephen Ham wrote: Fri May 10, 2019 6:48 am Hello All,

The following 8-piece endgame just arose in a Raubfisch X40a3-Cfish match:
[d]4b3/8/6k1/6p1/p6p/K3B1P1/8/8 w - - 0 1

Virtually all humans immediately see that this is a draw. White will exchange pawns and then sac his Bishop for Black's dark-squared pawn, leading to a King versus King, Bishop, and pawn endgame draw. However, at this position, Raubfisch scored it -1.79 while Cfish scored it -2.30! I'm playing with 5-man Nalimov TBs.

Since opposite color bishop endgames with more than 6-7 pieces occur frequently, shouldn't programmers code their engines to know when such positions are draws? Then, the materially weaker side could learn to draw games they'd otherwise lose, while materially stronger sides could score wins by avoiding these draws.

Both of the above engines are Stockfish derivatives, suggesting that the strongest A-B engine also lacks this coding. So, are programmers counting on users to have 6-man TBs for engine probing?

Many moves were played since the above position. At move 130, after endless shuffling of bishops, Raubfisch scored it as a small white disadvantage while Cfish still scored it -2.30 until going over 40 plies, when the evaluation dropped to 0.00. They then drew a couple moves later.

All the best,
Steve
I think most programmers have implemented the famous KvKBP (rook pawn, wrong bishop color) drawn ending.
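A minimal sketch of what such a rule might look like. The function names are hypothetical, and the king test is a deliberately crude sufficient condition (defending king already next to or on the promotion corner); it ignores side to move and any other pieces or pawns:

```python
def square_color(file, rank):
    """file: 0..7 for a..h, rank: 1..8. The a1 square is dark."""
    return "dark" if (file + rank) % 2 == 1 else "light"


def is_wrong_rook_pawn(pawn_file, pawn_is_white, bishop_color):
    """True if the pawn is a rook pawn whose promotion square
    cannot be controlled by a bishop of the given square color."""
    if pawn_file not in (0, 7):            # only a- and h-pawns qualify
        return False
    promo_rank = 8 if pawn_is_white else 1
    return square_color(pawn_file, promo_rank) != bishop_color


def defender_holds_corner(king_file, king_rank, pawn_file, pawn_is_white):
    """Crude check: the defending king is within one king move of the corner."""
    promo_rank = 8 if pawn_is_white else 1
    distance = max(abs(king_file - pawn_file), abs(king_rank - promo_rank))
    return distance <= 1


if __name__ == "__main__":
    # Black a-pawn with a light-squared bishop, as in the posted position
    # once the kingside pawns are gone; a1 is a dark square.
    print(is_wrong_rook_pawn(0, False, "light"))
    # White king on b2 (file 1, rank 2) guarding the a1 corner.
    print(defender_holds_corner(1, 2, 0, False))
```

When both tests hold, an evaluation could return a draw score directly instead of counting the extra bishop and pawn.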

With the availability of EGT most programmers concentrate more on other areas to improve their engine.

From your posted position on my computer Stockfish 10 (no egt) evaluates the position at around -1.26 for white. I believe this is a good score already.

Stephen Ham wrote: Fri May 10, 2019 6:48 am Virtually all humans immediately see that this is a draw.
I doubt that.


BTW, do Raubfisch and Cfish use Nalimov EGT? Can you post the whole game in PGN format?

Re: Opposite Color Bishop Endgames

Posted: Sat May 11, 2019 3:42 pm
by syzygy
hgm wrote: Sat May 11, 2019 8:26 am Not at all. Having a high Elo can be achieved by only doing good moves in a sub-set of all chess positions, namely the positions that can be reached by the moves you select. You can arbitrarily suck in positions that you will never reach because you avoid them. And I would not call an entity that heavily sucks in a large sub-set of all positions "good at chess". I would say it just knows a trick to win.
Yet that is exactly how human GMs approach the game of chess.

Re: Opposite Color Bishop Endgames

Posted: Sun May 12, 2019 9:42 am
by hgm
chrisw wrote: Sat May 11, 2019 1:34 pm
Living to adult age and reproducing can only be achieved by eating non-poisonous, nutritious things from the subset of all chewable bitesized things, namely the chewable things you select.
You can arbitrarily die from poisoning from poisonous things you never eat because you avoid them. And I would not call an entity that arbitrarily dies in a world with a large subset of poisonous chewable things "good at surviving and reproducing". I would say it just knows a trick to avoid poison.
It wouldn't automatically qualify as a "culinary expert", that is for sure.
Person A is a dismal failure at IQ tests, but very successful in life.
Person B is brilliant at IQ tests, but unable to tie his shoelaces.

Normally here, we would criticise the IQ test for actually being not very good at measuring "General Intelligence", but I can see you regard "the test" as the real measure and "real life performance" as a test that failed to measure up to the IQ test.
You got it completely the wrong way around. It is playing games that exposes you to such a small fraction of all positions in the "real world" of Chess that it is similar to an IQ test (of the special kind, where the subject under test can pick the questions he likes most). And it is seeing the draw in the position of the OP that is as trivial as tying shoe laces. You are the one trying to elevate the IQ test to the real measure.
In the case of zero-sum games, ELO is a life-performance measurement. All players start from exactly the same position. If there is a general mismatch between performance on "test-suites" and "life-performance" then it is the "test-suites" that are inadequate. If we were to introduce tigers into a flock of sheep, it is no use claiming after all the sheep are dead that tigers fail the quality of wool test hitherto deemed as the ultimate measure of quality in the real world.
'Life-performance' in an IQ test...

Re: Opposite Color Bishop Endgames

Posted: Sun May 12, 2019 11:36 am
by chrisw
hgm wrote: Sun May 12, 2019 9:42 am
chrisw wrote: Sat May 11, 2019 1:34 pm
Living to adult age and reproducing can only be achieved by eating non-poisonous, nutritious things from the subset of all chewable bitesized things, namely the chewable things you select.
You can arbitrarily die from poisoning from poisonous things you never eat because you avoid them. And I would not call an entity that arbitrarily dies in a world with a large subset of poisonous chewable things "good at surviving and reproducing". I would say it just knows a trick to avoid poison.
It wouldn't automatically qualify as a "culinary expert", that is for sure.
It would however be a necessary condition to qualify each and every ancestor going far back into time as a successful survivor and reproducer of one H.G.Muller.
Person A is a dismal failure at IQ tests, but very successful in life.
Person B is brilliant at IQ tests, but unable to tie his shoelaces.

Normally here, we would criticise the IQ test for actually being not very good at measuring "General Intelligence", but I can see you regard "the test" as the real measure and "real life performance" as a test that failed to measure up to the IQ test.
You got it completely the wrong way around.
haha! that’s your problem, as already shown.
It is playing games that exposes you to such a small fraction of all positions in the "real world" of Chess
Your “world of chess” is a wholly arbitrary and imaginary construct which means anything you want it to, whether you add “real” to the front of it or not. Chess is a board game, played according to rules. If there is any “world” to it, it comes from the groups of agents playing the game.
The positions that arise are a small subset. Currently the only machine entity that can claim any sort of coherent strategy for the entire set of chess positions is an AB or minimax material counter. All positional heuristics are added in on an assumed but unstated programming rule that the position being considered is one of the normal subset, and as heuristics are refined they more and more demand that the position be of the normal subset. Ask yourself what would be the use, other than as random noise and a waste of computing time, of the bishop pair heuristic for one of your all-world-of-chess positions where one side had four bishops? SF itself, the strongest AB program, is developed entirely on the basis of what works statistically, i.e. what works with your “small fraction” of chess positions met, in the proportion in which they are met.

that it is similar to an IQ test (of the special kind, where the subject under test can pick the questions he likes most).
No. Now you really are in full on upside down mode.
Your chess IQ test is one in which the test setter picks the questions he likes most. The dominant paradigm in computer chess was, until disproven by DeepMind researchers, that “chess is tactics”. I seem to recollect being the only person, in a minority of one, years and years ago, arguing that this was nonsense, and that one day a program with positional knowledge would come along and blow away this material/tactical paradigm.
What you are doing, from the old way of thinking, surprise, surprise, is elevating the “chess is tactics” paradigm with tactical-suite IQ testing, as if these types of positions were more representative of “chess” than the ability to play positionally.
You select the test based on the old paradigm, the test fails to perform against the new paradigm, so you blame the new paradigm. Very typical of the old academic invested in an old field coming up against change. Kuhn, The Structure of Scientific Revolutions.

And it is seeing the draw in the position of the OP that is as trivial as tying shoe laces.
Expressing it in heuristics and logic is very easy, but you should know by now that actually programming even simple special-case heuristics is not easy at all. There are almost invariably special cases of the special cases, only discovered later by stupidly losing.
You are the one trying to elevate the IQ test to the real measure.
Silly comment when I am clearly arguing the direct opposite.
In the case of zero-sum games, ELO is a life-performance measurement. All players start from exactly the same position. If there is a general mismatch between performance on "test-suites" and "life-performance" then it is the "test-suites" that are inadequate. If we were to introduce tigers into a flock of sheep, it is no use claiming after all the sheep are dead that tigers fail the quality of wool test hitherto deemed as the ultimate measure of quality in the real world.
'Life-performance' in an IQ test...
That is often said, but it is not true at all. You may as well claim your journey from Amsterdam to Berlin is the car speedometer.
IQ tests are predictive of life performance, supposedly; that’s the purpose of such tests and why they are composed. But an IQ test is not life performance, and life performance is not an IQ test, any more than anything is the device that measures it.

Re: Opposite Color Bishop Endgames

Posted: Sun May 12, 2019 11:42 am
by hgm
Well, it isn't rocket science. It is the difference between measuring an average on a small, unrepresentative and manipulable sub-set of problems and measuring the worst-case behavior on the total of all problems.

If you want to advertise that you cannot understand this, you are welcome. I won't waste any more time on it.

BTW, it seems you read 'in' as 'is'...