A balanced approach to imbalances


Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: A balanced approach to imbalances

Post by Lyudmil Tsvetkov »

enhorning wrote:While I find the material imbalances interesting, I think this discussion has too many emotionally based arguments, and not enough games.

So, as the discussion started with 7 Knights versus 3 Queens, I ran QueeNy against a gauntlet of Komodo 6, Critter 1.6a, Stockfish 4, Houdini 1.5a and Gull 2.3, at a time-control of 1 minute + 1 second increment, starting from this position:
[d]nn1nknn1/2nppp1n/8/8/8/8/3PPP2/2QQKQ2 w - - 0 1
(The most pawns one can have and still have a Fide legal position. Two knights have been moved forward a single step from the home line to avoid having material en prise in the initial position.)

As white, the side with the 3 queens, QueeNy got 5 draws.

As black, the side with the 7 knights, QueeNy got 4 wins and 1 loss.

I would note that QueeNy has multiple handicaps. The other engines are running with my default settings of 4 threads and 256 MB Hash. QueeNy only has 1 thread, and whatever its default hash is - considerably smaller, looking at the memory footprint.

Games can be downloaded from: https://www.dropbox.com/s/pm5fgtniucixwt5/3Q-7N.pgn

I'll leave drawing conclusions up to others - I am not a chess theoretician - I just like playing oddball variants and watching them get played, and this extreme imbalance of 7N vs 3Q was intriguing me.
Hi Ola Mikael.
Many thanks for the games! I really appreciate that. Someone that really helped.

The only serious drawback with this test is that, as said, the white king's shelter is in the center and it only has pawns, while black has 7 knights to shelter the king apart from the queen - and that is already a very serious advantage for black. The second serious black advantage is that with just 3 pawns in the center, the queens lose much of their strength, because they are quick moving pieces and need play/pawns on both sides.

I would be very grateful, if you could repeat exactly the same test, but starting from the position below:

[d]1nnnknn1/pppppppp/8/8/8/8/PPPPPPPP/2QQK3 w - - 0 1

We already agreed this position is the most natural one can get, does not favour any side in particular, and represents a medium ground of Q vs 3Ns and 3Qs vs 7Ns. This test would already really be quite significant.

Would you please be so kind as to rerun the test with the above position and report the results?

Many thanks in advance.

Best, Lyudmil
enhorning
Posts: 342
Joined: Wed Jan 05, 2011 10:05 pm

Re: A balanced approach to imbalances

Post by enhorning »

Lyudmil Tsvetkov wrote:I would be very grateful, if you could repeat exactly the same test, but starting from the position below:

[d]1nnnknn1/pppppppp/8/8/8/8/PPPPPPPP/2QQK3 w - - 0 1
That's not a Fide legal position, which means several of the programs crash or produce an illegal move. I don't have the patience to trial and error my way through my collection to find programs which can handle it.

I'd be happy to test 7N - 3Q or 5N - 2Q (with or without pawns) from some Fide legal position.
enhorning
Posts: 342
Joined: Wed Jan 05, 2011 10:05 pm

Re: A balanced approach to imbalances

Post by enhorning »

Starting with 5 knights against 2 queens, and 5 pawns each:
[d]1nnnknn1/2ppppp1/8/8/8/8/2PPPPP1/3QKQ2 w - - 0 1
(and increasing time to 2+2)
gave these results

As white (Qs), QueeNy drew 2 and lost 3.
As black (Ns), QueeNy won 3 and lost 2, one due to illegal move.

This means the Knights side scored a total of 7/10 (quite close to the 6.5/10 they scored in the 3Q-7N gauntlet).
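
The two totals can be re-derived from the per-colour results reported in this thread. A minimal sketch in Python; the helper name `knights_score` is made up for illustration, and each tuple is (wins, draws, losses) from QueeNy's point of view:

```python
def knights_score(as_queens, as_knights):
    """Points collected by the knights' side over one gauntlet.

    Both arguments are (wins, draws, losses) seen from QueeNy's side.
    """
    q_wins, q_draws, q_losses = as_queens
    n_wins, n_draws, n_losses = as_knights
    # When QueeNy holds the queens, its losses are points for the knights.
    # When QueeNy holds the knights, its wins are points for the knights.
    return q_losses + 0.5 * q_draws + n_wins + 0.5 * n_draws


# 3Q-7N gauntlet: 5 draws with the queens; 4 wins, 1 loss with the knights.
three_q_seven_n = knights_score(as_queens=(0, 5, 0), as_knights=(4, 0, 1))
# 2Q-5N gauntlet: 2 draws, 3 losses with the queens; 3 wins, 2 losses with the knights.
two_q_five_n = knights_score(as_queens=(0, 2, 3), as_knights=(3, 0, 2))

print(three_q_seven_n, two_q_five_n)  # 6.5 7.0
```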

Games are available at: https://www.dropbox.com/s/svaebxro69mh8eo/2Q-5N.pgn
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: A balanced approach to imbalances

Post by Lyudmil Tsvetkov »

enhorning wrote:
Lyudmil Tsvetkov wrote:I would be very grateful, if you could repeat exactly the same test, but starting from the position below:

[d]1nnnknn1/pppppppp/8/8/8/8/PPPPPPPP/2QQK3 w - - 0 1
That's not a Fide legal position, which means several of the programs crash or produce an illegal move. I don't have the patience to trial and error my way through my collection to find programs which can handle it.

I'd be happy to test 7N - 3Q or 5N - 2Q (with or without pawns) from some Fide legal position.
Hi Ola,
What do you mean Fide legal position?
The purpose would be to test in a relevant environment. The second position you started with, with only central pawns and again a bad white king shelter, favours the knights' side in almost exactly the same way as the first position.

I am sure neither of the programs you used in the first test is going to crash, as they are all very stable ones. I would be happy if you could run the gauntlet; if any engine crashes, which I do not believe will happen, that will not be such a tragedy, as we will still have the remaining results. Again, testing with 5 central pawns simply makes no sense; it is a big disadvantage for the queens.

Thanks again for your efforts; I very much hope you can rerun with the position I suggested (1+1 games would be just perfect). In any case, I am sure Stockfish and Critter will not crash, Komodo probably too, Queeny for sure, and you have 2 other engines that are very probable to perform also.
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: A balanced approach to imbalances

Post by Lyudmil Tsvetkov »

But also we know for sure that DiscoCheck does not crash, as well as Fruit 2.1. Toga would also be fine.
enhorning
Posts: 342
Joined: Wed Jan 05, 2011 10:05 pm

Re: A balanced approach to imbalances

Post by enhorning »

Lyudmil Tsvetkov wrote: What do you mean Fide legal position?
A position that would be legal in a game of Fide chess. I.e. not more extra pieces than pawns removed.
I am sure neither of the programs you used in the first test is going to crash, as they are all very stable ones. [...] In any case, I am sure Stockfish and Critter will not crash, Komodo probably too, Queeny for sure, and you have 2 other engines that are very probable to perform also.
Critter crashed / produced illegal moves, Houdini crashed / produced illegal moves, as both black and white. They're forced to play a position that's outside their parameters, that is impossible in a game of normal chess. Given your misplaced confidence in Critter, I am not going to try the other engines you suggested; it's almost as if you don't realize that with your position we have moved outside the bounds of Fide chess and are now playing a variant.
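
The "not more extra pieces than pawns removed" rule can be checked mechanically from the board field of a FEN string. The sketch below is only a necessary count check under that rule (it ignores bishop square colours, pawn structure, checks and so on), and the function name `fide_plausible` is made up for illustration:

```python
from collections import Counter

# Pieces each side starts a normal game with (king and pawns handled separately).
START_COUNTS = {"q": 1, "r": 2, "b": 2, "n": 2}


def fide_plausible(fen):
    """True if the piece counts could come from a normal game with promotions."""
    board = fen.split()[0]
    pieces = Counter(ch for ch in board if ch.isalpha())
    for to_side in (str.upper, str.lower):  # white uses upper case, black lower
        if pieces[to_side("k")] != 1:
            return False
        pawns = pieces[to_side("p")]
        if pawns > 8:
            return False
        # Every piece beyond the starting set must be a promoted pawn.
        extra = sum(max(0, pieces[to_side(p)] - n) for p, n in START_COUNTS.items())
        if extra > 8 - pawns:
            return False
    return True


print(fide_plausible("nn1nknn1/2nppp1n/8/8/8/8/3PPP2/2QQKQ2 w - - 0 1"))        # True
print(fide_plausible("1nnnknn1/pppppppp/8/8/8/8/PPPPPPPP/2QQK3 w - - 0 1"))     # False
print(fide_plausible("1nnnknn1/2ppppp1/8/8/8/8/2PPPPP1/3QKQ2 w - - 0 1"))       # True
```

With 8 pawns still on the board for each side, the second position leaves no pawn that could have promoted into the extra knights or the second queen, which is why it fails the check.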
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: A balanced approach to imbalances

Post by Lyudmil Tsvetkov »

enhorning wrote:
Lyudmil Tsvetkov wrote: What do you mean Fide legal position?
A position that would be legal in a game of Fide chess. I.e. not more extra pieces than pawns removed.
I am sure neither of the programs you used in the first test is going to crash, as they are all very stable ones. [...] In any case, I am sure Stockfish and Critter will not crash, Komodo probably too, Queeny for sure, and you have 2 other engines that are very probable to perform also.
Critter crashed / produced illegal moves, Houdini crashed / produced illegal moves, as both black and white. They're forced to play a position that's outside their parameters, that is impossible in a game of normal chess. Given your misplaced confidence in Critter, I am not going to try the other engines you suggested; it's almost as if you don't realize that with your position we have moved outside the bounds of Fide chess and are now playing a variant.
And it is a pity, because all we now know is that Queeny possibly plays the Qs-Ns imbalance better than even the top engines, but we absolutely do not know which side the imbalance itself favours. You need neutral ground to check that, and the best possible neutral ground is the position I suggested.

So you actually succeeded with 4 of the engines in your gauntlet, and you only needed another 2 to proceed. I suggested DiscoCheck and Fruit, as Lucas and Harm posted games where they play non-Fide chess, but you could also have run the gauntlet with only 3 opponents. In any case, the results would be much more representative, as you would be testing a true hypothesis; as it stands, the hypothesis we wanted to test has simply not been tested.

Do you not realize that 7 knights, even in a Fide-legal position, is a variant? Have you ever heard of a real game (even a single one) in which one side had 7 knights? We wanted to test a hypothesis, but it seems that neither you nor Harm is willing to do so. Probably because you either already have the results for the position with equal conditions and know it strongly favours the queens' side (as it really does, although the engines might not play it perfectly), or because..., well, I simply cannot think of another reason.
lucasart
Posts: 3242
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: A balanced approach to imbalances

Post by lucasart »

Lyudmil Tsvetkov wrote:But also we know for sure that DiscoCheck does not crash, as well as Fruit 2.1. Toga would also be fine.
Actually there are some _arbitrary_ limitations in DiscoCheck:

* no more than 15 pieces of the same type: for each (color,piece), so no more than 15 black rooks, white bishops, etc. Many programs have a limit of 3 here, which is too restrictive IMO (although the limit of 3 is rarely exceeded in real games, it may be at nodes of the search tree).

* no more than 128 legal moves per position: that applies to all positions, not just the ones played, but the ones at every node of the search tree. I can increase that limit easily, but it makes DiscoCheck slower, due to increased cache pressure, at least on my machine (perhaps there's no impact with bigger cache memory).

* no more than 512 moves played in a game, and that includes the search tree. So if 500 moves are played and the search reaches a depth >= 12, DiscoCheck should crash; or maybe it won't and will silently break, which is even worse. The reason is that I don't want to waste time on memory reallocation at run time, so the size has to be hardcoded, and it cannot be too big, to ensure cache efficiency (speed).

So there are limits, and all these limits have a good reason to exist. There is no limit, however, to the stupidity of some testers who will hand-craft positions with several kings or pawns on the first rank, just to be able to parade around and say that DiscoCheck is crap and buggy, but I've tried to find the right compromise here.
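
To make the third limit concrete: with a hard-coded stack of 512 entries, the slots are 0..511, so 500 moves played plus a search depth of 12 is the first combination that runs off the end. The sketch below is an illustration in Python and not DiscoCheck's actual code; the names `MAX_GAME_PLY`, `stack_overflows` and `FixedMoveList` are assumptions. It also shows the alternative to "silently break": a fixed-capacity list that fails loudly.

```python
MAX_GAME_PLY = 512  # hard-coded stack size quoted above (assumed name)


def stack_overflows(moves_played, search_depth, capacity=MAX_GAME_PLY):
    """True if the deepest search node would need a slot past the end of the stack."""
    # Valid slots are 0 .. capacity - 1.
    return moves_played + search_depth >= capacity


class FixedMoveList:
    """Fixed-capacity move list that raises instead of overwriting memory."""

    def __init__(self, capacity=128):
        self.capacity = capacity
        self.moves = []

    def push(self, move):
        if len(self.moves) >= self.capacity:
            raise OverflowError(f"more than {self.capacity} moves in one position")
        self.moves.append(move)


print(stack_overflows(500, 11))  # False
print(stack_overflows(500, 12))  # True
```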
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: A balanced approach to imbalances

Post by Lyudmil Tsvetkov »

lucasart wrote:
Lyudmil Tsvetkov wrote:But also we know for sure that DiscoCheck does not crash, as well as Fruit 2.1. Toga would also be fine.
Actually there are some _arbitrary_ limitations in DiscoCheck:

* no more than 15 pieces of the same type: for each (color,piece), so no more than 15 black rooks, white bishops, etc. Many programs have a limit of 3 here, which is too restrictive IMO (although the limit of 3 is rarely exceeded in real games, it may be at nodes of the search tree).

* no more than 128 legal moves per position: that applies to all positions, not just the ones played, but the ones at every node of the search tree. I can increase that limit easily, but it makes DiscoCheck slower, due to increased cache pressure, at least on my machine (perhaps there's no impact with bigger cache memory).

* no more than 512 moves played in a game, and that includes the search tree. So if 500 moves are played and the search reaches a depth >= 12, DiscoCheck should crash; or maybe it won't and will silently break, which is even worse. The reason is that I don't want to waste time on memory reallocation at run time, so the size has to be hardcoded, and it cannot be too big, to ensure cache efficiency (speed).

So there are limits, and all these limits have a good reason to exist. There is no limit, however, to the stupidity of some testers who will hand-craft positions with several kings or pawns on the first rank, just to be able to parade around and say that DiscoCheck is crap and buggy, but I've tried to find the right compromise here.
Very nice implementation; at least you are able to play the 3Qs vs 7Ns game you posted. Btw., in which interface did you play it? I am just now trying to load my 2Qs vs 5Ns position to check it myself against some engine. Chesspad will load it, but I cannot load engines in it, while the GUI I use for loading engines, Fritz, stubbornly refuses to load the position.
Uri Blass
Posts: 11098
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: A balanced approach to imbalances

Post by Uri Blass »

lucasart wrote:
Lyudmil Tsvetkov wrote:But also we know for sure that DiscoCheck does not crash, as well as Fruit 2.1. Toga would also be fine.
Actually there are some _arbitrary_ limitations in DiscoCheck:

* no more than 15 pieces of the same type: for each (color,piece), so no more than 15 black rooks, white bishops, etc. Many programs have a limit of 3 here, which is too restrictive IMO (although the limit of 3 is rarely exceeded in real games, it may be at nodes of the search tree).

* no more than 128 legal moves per position: that applies to all positions, not just the ones played, but the ones at every node of the search tree. I can increase that limit easily, but it makes DiscoCheck slower, due to increased cache pressure, at least on my machine (perhaps there's no impact with bigger cache memory).

* no more than 512 moves played in a game, and that includes the search tree. So if 500 moves are played and the search reaches a depth >= 12, DiscoCheck should crash; or maybe it won't and will silently break, which is even worse. The reason is that I don't want to waste time on memory reallocation at run time, so the size has to be hardcoded, and it cannot be too big, to ensure cache efficiency (speed).

So there are limits, and all these limits have a good reason to exist. There is no limit, however, to the stupidity of some testers who will hand-craft positions with several kings or pawns on the first rank, just to be able to parade around and say that DiscoCheck is crap and buggy, but I've tried to find the right compromise here.
The first limit makes sense.
You could even safely use a limit of 10, so I do not understand why you use a limit of 15, but the other limits do not make sense.

1) There are clearly legal positions with more than 128 legal moves, and I see no reason to assume that you will never meet them in the search in one of DiscoCheck's games, even if it is only in 1 out of 1,000,000 games.

256 is clearly safe, but not 128.

2) Games can have more than 512 moves, and a programmer may tell his program to play slowly in drawn positions to cause the opponent to crash.

For example, suppose that the following position occurs in a comp-comp game at move 150 (which is clearly possible).

[d]7k/p7/B7/8/8/7P/7P/6K1 w - - 0 1


White may take 50 moves to play h4, 100 moves for h5, 150 moves for h6, 200 moves for h7, 250 moves for h3 with the second pawn, 300 moves for h4 with the second pawn, 350 moves for h5 with the second pawn and 400 moves for h6 with the second pawn.

DiscoCheck may crash and lose the game instead of drawing it, and this can happen even under TCEC conditions, where there is tablebase adjudication for positions with 5 pieces, because the position that I gave has 6 pieces on the board.
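
The arithmetic behind this example, as a small Python sketch. The function name `stalled_game_length` is made up for illustration, and it assumes, as in the text above, that the stalling side waits the full 50 moves before each pawn push:

```python
def stalled_game_length(start_move, pawn_pushes, moves_per_push=50):
    """Move number reached if every available pawn push is delayed as long as possible."""
    return start_move + pawn_pushes * moves_per_push


# First h-pawn: h4, h5, h6, h7 (4 pushes); second h-pawn: h3, h4, h5, h6 (4 pushes).
pushes = 4 + 4
length = stalled_game_length(start_move=150, pawn_pushes=pushes)

print(length, length > 512)  # 550 True
```

So a game that reaches this position at move 150 can be stretched to move 550, past the 512 limit, before the search depth is even added.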

Note that programs without knowledge of the blind bishop will probably not wait 50 moves before pushing pawns. Programs that know about the blind bishop, and also about the bugs of other chess programs, may decide to wait 50 moves because of that knowledge. I will not be surprised if some programmer implements this in order to win drawn positions against other programs by making them crash.