Piece handicap elo diff, an idea for Kai Laskos for testing

Isaac
Posts: 265
Joined: Sat Feb 22, 2014 8:37 pm

Post by Isaac »

Hello people,
I have an idea for testing, especially for Kai Laskos or anyone who likes to run chess computer experiments.
Pick 2 popular engines like Komodo and Stockfish. Pick a time control and use 1 thread for simplicity.
For each engine, determine, by playing a few hundred games, how many Elo each piece and each pawn is worth. Then compare the differences between the 2 engines.

I'm almost sure that removing the left bishop weakens the engine in a different way than removing the right bishop. For both black and white pieces.
Remove each pawn (1 by 1) too.

That's a lot of testing, but I guess it could give us insights. Removing the queen could even tell us whether Komodo or SF plays better against itself without the Queen, i.e. which engine is "stronger" at using its other pieces.
MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

Re: Piece handicap elo diff, an idea for Kai Laskos for test

Post by MikeB »

Interesting question. My only recommendation is to pick one engine, maybe K or SF, and run it against a gauntlet of engines.
hgm
Posts: 28514
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Piece handicap elo diff, an idea for Kai Laskos for test

Post by hgm »

I have been doing these kinds of experiments since 2008, but usually by playing an engine against itself (Fairy-Max or Joker80). If you have multiple engines of identical strength, that would of course be preferable.

Note that deleting a Pawn ('classical Pawn odds') already gives about a 70-30 advantage. With piece odds, however, the score rises to far above 90% and can no longer be described well by an Elo model. (You will just be measuring the probability that the advantaged side commits a gross blunder that gets it checkmated early, which hardly depends on which piece the opponent is missing.)

So you have to keep the material imbalance within the range where the score is still sensitive to the advantage, say 30%-70%. So rather than deleting, say, a Knight, I delete N+P vs R, or 2N vs R. From these differences you can reconstruct the Elo values of P, N and R.

To get game diversity I usually start from many different (Chess960-like) positions that are all symmetric except for the deleted material. It is also better not to rely on a single material imbalance, but to test pieces against a variety of opposing material (e.g. Q vs RR, BNN or BBN).
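The reconstruction step described above might look roughly like the following sketch. Only the 70% Pawn-odds score comes from the post; the other two scores are made-up placeholders, the function names are hypothetical, and the code assumes that the Elo contributions of the pieces simply add up, which is an approximation at best.

```python
import math


def score_to_elo(score: float) -> float:
    """Elo difference implied by an expected score under the logistic Elo model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


def piece_values(pawn_odds: float, np_vs_r: float, nn_vs_r: float) -> dict:
    """Solve P = d1, N + P - R = d2, 2N - R = d3 for the Elo values P, N, R.

    pawn_odds: score of the side that is a Pawn up.
    np_vs_r:   score of the side holding N+P against the side holding the extra R.
    nn_vs_r:   score of the side holding 2N against the side holding the extra R.
    Assumption (not from the post): Elo values of pieces are additive.
    """
    d1 = score_to_elo(pawn_odds)
    d2 = score_to_elo(np_vs_r)
    d3 = score_to_elo(nn_vs_r)
    pawn = d1
    knight = d3 - d2 + d1      # subtract the second equation from the third
    rook = 2.0 * knight - d3   # rearrange the third equation
    return {"P": pawn, "N": knight, "R": rook}


if __name__ == "__main__":
    # 0.70 is the Pawn-odds score quoted above; 0.36 and 0.68 are placeholders.
    for piece, elo in piece_values(0.70, 0.36, 0.68).items():
        print(f"{piece}: {elo:+.0f} Elo")
```

With more imbalances than unknowns (Q vs RR, BNN, BBN and so on) the same idea turns into a least-squares fit rather than an exact solve.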
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Piece handicap elo diff, an idea for Kai Laskos for test

Post by Laskos »

It's not that easy. A Bishop handicap is worth a different number of Elo points at different time controls or strength levels. Say, a Bishop handicap at 1800 Elo brings you down to 1200, a 600-point handicap. But a Bishop handicap at 3400 Elo may bring you down to 2000, a 1400-point handicap. To illustrate, here are the results of the c1 and f1 Bishop handicaps at depth 4 and depth 7 for Komodo 10:

c1:

depth=4
+1057  =756 -8187  14.3%

depth=7
 +561  =535 -8904   8.2%

f1:

depth=4
 +801  =769 -8430  11.8%

depth=7
 +439  =535 -9026   7.0%

One thing is pretty clear: the Bishop on f1 seems more valuable than the one on c1, and the Elo value of the handicap increases substantially from depth=4 to depth=7. Comparing this to what Stockfish does is a bit meaningless until we equalize the strength, because these things are heavily strength dependent (or time control dependent). Also, self-play inflates the Elo difference compared to playing against different engines.

These issues have been discussed with regard to the handicap matches organized by Larry Kaufman, and were shown in numerous threads and posts here, mostly in the tournament section.
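For anyone who wants to turn the win/draw/loss counts above into Elo differences, here is a minimal sketch using the standard logistic Elo formula. The counts are copied from the tables in this post and are taken to be from the handicapped side's point of view; the function names are just for illustration. As noted earlier in the thread, scores this far from 50% are not described well by the Elo model, so the numbers should be read as rough indications only.

```python
import math


def score(wins: int, draws: int, losses: int) -> float:
    """Fraction of points scored, counting a draw as half a point."""
    return (wins + 0.5 * draws) / (wins + draws + losses)


def score_to_elo(s: float) -> float:
    """Elo difference implied by score s under the logistic Elo model."""
    return 400.0 * math.log10(s / (1.0 - s))


# (bishop removed, search depth) -> (wins, draws, losses) for Komodo 10
RESULTS = {
    ("c1", 4): (1057, 756, 8187),
    ("c1", 7): (561, 535, 8904),
    ("f1", 4): (801, 769, 8430),
    ("f1", 7): (439, 535, 9026),
}

if __name__ == "__main__":
    for (bishop, depth), wdl in RESULTS.items():
        s = score(*wdl)
        print(f"{bishop} depth={depth}: score {100 * s:.2f}%  Elo {score_to_elo(s):+.0f}")
```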
BeyondCritics
Posts: 416
Joined: Sat May 05, 2012 2:48 pm
Full name: Oliver Roese

Re: Piece handicap elo diff, an idea for Kai Laskos for test

Post by BeyondCritics »

Are your results available? They would be of interest to programmers, chess players, problemists, etc.
Uri Blass
Posts: 11222
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Piece handicap elo diff, an idea for Kai Laskos for test

Post by Uri Blass »

I see one problem with all these games.

Programs do not consider the opponent and do not choose moves based on an opponent model.
It may be interesting to have a programmer write an "anti-Komodo depth 4" engine, for example, to find out how much it is possible to score against Komodo at depth 4.

The idea is, for example, that White plays without the d1 queen, but in its search assumes that Black will play the move Komodo suggests at depth 4, and prunes all other options for Black.
This type of selective search is anti-Komodo depth 4, and it may be interesting to see how big a handicap it can give Komodo at depth 4 and still win, or at least draw.

I also wonder whether this type of selective search can do better against humans at a big handicap (relative to normal Komodo).

It may be better to stop this type of selective search once the program is already winning against a human, in order not to take risks, but I do not think that this is very important.
Last edited by Uri Blass on Tue Jun 28, 2016 5:19 pm, edited 1 time in total.
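The opponent-model idea in the post above can be illustrated on a toy game instead of chess. The sketch below is only an assumption-laden illustration: the game is a subtraction game (take 1 to 3 stones, whoever takes the last stone wins), and `opponent_model` is a hypothetical stand-in for "the move Komodo plays at depth 4". At the opponent's nodes only the predicted reply is searched and everything else is pruned, so the search can find wins from positions that are lost against best play, which is the toy analogue of giving a handicap.

```python
from functools import lru_cache

MAX_TAKE = 3  # a move removes 1..3 stones; whoever takes the last stone wins


def opponent_model(stones: int) -> int:
    """Hypothetical predictable opponent: it always takes a single stone."""
    return 1


@lru_cache(maxsize=None)
def wins_vs_perfect(stones: int) -> bool:
    """Ordinary search: can the side to move force a win against best play?"""
    return any(
        not wins_vs_perfect(stones - take)
        for take in range(1, min(MAX_TAKE, stones) + 1)
    )


@lru_cache(maxsize=None)
def wins_vs_model(stones: int) -> bool:
    """Opponent-model search: only the predicted reply is searched for the opponent."""
    for take in range(1, min(MAX_TAKE, stones) + 1):
        left = stones - take
        if left == 0:
            return True  # we took the last stone
        reply = min(opponent_model(left), left)
        if left - reply == 0:
            continue  # the predicted reply takes the last stone, so this line loses
        if wins_vs_model(left - reply):
            return True
    return False


if __name__ == "__main__":
    for n in (4, 8, 12):
        print(n, "vs best play:", wins_vs_perfect(n), "| vs model:", wins_vs_model(n))
```

The obvious risk is the one hinted at in the post: if the real opponent deviates from the model, the pruned lines were never examined, so the result is only as good as the prediction.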
lkaufman
Posts: 6299
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: Piece handicap elo diff, an idea for Kai Laskos for test

Post by lkaufman »

The difference between the two bishops' results is much larger than can be accounted for by any terms in Komodo's eval. Rybka gave a bonus for the King's bishop, but it was just one centipawn or so, and was later taken out, I think. It would be interesting if you could repeat your test in (fast) timed games. If the results are too close to 100%, you could remove the b8 knight in both cases to get fairly even results, although this might introduce a small bias for one bishop or the other. If a significant superiority of the king's bishop still shows through in all tests, we would need to address this, although past tests showed that a king's bishop bonus doesn't test well with any meaningful value.
Komodo rules!
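To set up the start positions suggested above (White missing the c1 or f1 bishop, Black missing the b8 knight), the FEN strings can be built by hand or with a small helper such as the sketch below. The helper name is hypothetical and it only edits the piece-placement field, so it is suitable for minor pieces and pawns; removing a rook or king would also require adjusting the castling field, which this sketch does not do.

```python
START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"


def remove_pieces(fen: str, squares: list[str]) -> str:
    """Return a copy of `fen` with the given squares (e.g. 'c1') emptied."""
    placement, *rest = fen.split(" ")

    # Expand each rank into 8 characters, using '.' for an empty square.
    rows = []
    for rank in placement.split("/"):
        row = []
        for ch in rank:
            row.extend("." * int(ch) if ch.isdigit() else ch)
        rows.append(row)

    for sq in squares:
        file_idx = ord(sq[0]) - ord("a")
        rank_idx = 8 - int(sq[1])  # FEN lists rank 8 first
        rows[rank_idx][file_idx] = "."

    # Re-compress runs of empty squares into digits.
    ranks = []
    for row in rows:
        out, empty = "", 0
        for ch in row:
            if ch == ".":
                empty += 1
            else:
                if empty:
                    out += str(empty)
                    empty = 0
                out += ch
        if empty:
            out += str(empty)
        ranks.append(out)

    return " ".join(["/".join(ranks), *rest])


if __name__ == "__main__":
    print(remove_pieces(START_FEN, ["c1", "b8"]))
    print(remove_pieces(START_FEN, ["f1", "b8"]))
```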
lkaufman
Posts: 6299
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: Piece handicap elo diff, an idea for Kai Laskos for test

Post by lkaufman »

How did you achieve variety at fixed depth? My guess is you used two threads, since a single thread run at fixed depth should give you identical games and hence all draws or all wins. Or did you do something else?
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Piece handicap elo diff, an idea for Kai Laskos for test

Post by Laskos »

lkaufman wrote:
How did you achieve variety at fixed depth? My guess is you used two threads, since a single thread run at fixed depth should give you identical games and hence all draws or all wins. Or did you do something else?
Earlier these days I tested both Stockfish and Komodo at a time control of 2s+0.02s. For opening variety, I took the handicap position as the start and played 4 plies with a random mover (several thousand very fast games), building a handicap opening book (PGN) for each handicap.

Here are the results at 2+0.02:

Stockfish dev.

tc=2+0.02
Bishop c1
 +33  =36 -931   5.1%
Bishop f1
 +17  =40 -943   3.7%

Komodo 10

tc=2+0.02
Bishop c1
 +47  =27 -926   6.0%
Bishop f1
 +29  =37 -934   4.7%

Again, for some reason the f1 Bishop seems more valuable.
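One way to check whether the c1/f1 gap in these 1000-game matches is larger than the statistical noise is to compare the two scores in units of their combined standard error. The sketch below uses the counts from the table above and assumes independent games and a normal approximation, neither of which is guaranteed here; the function names are illustrative.

```python
import math


def score_and_se(wins: int, draws: int, losses: int) -> tuple[float, float]:
    """Mean score per game and its standard error for a W/D/L record."""
    n = wins + draws + losses
    s = (wins + 0.5 * draws) / n
    # Per-game variance of a result drawn from {1, 0.5, 0}.
    var = (wins * (1.0 - s) ** 2 + draws * (0.5 - s) ** 2 + losses * s ** 2) / n
    return s, math.sqrt(var / n)


def z_difference(a: tuple[int, int, int], b: tuple[int, int, int]) -> float:
    """Difference of two scores divided by the standard error of that difference."""
    sa, ea = score_and_se(*a)
    sb, eb = score_and_se(*b)
    return (sa - sb) / math.sqrt(ea ** 2 + eb ** 2)


# (wins, draws, losses) at tc=2+0.02, copied from the table above
STOCKFISH = {"c1": (33, 36, 931), "f1": (17, 40, 943)}
KOMODO = {"c1": (47, 27, 926), "f1": (29, 37, 934)}

if __name__ == "__main__":
    for name, res in (("Stockfish dev.", STOCKFISH), ("Komodo 10", KOMODO)):
        print(f"{name}: z(c1 - f1) = {z_difference(res['c1'], res['f1']):.2f}")
```

A z value around 2 or more in a single match would be a fairly strong hint; smaller values are suggestive at most on their own, though the same sign showing up across engines and depths adds weight.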
lkaufman
Posts: 6299
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: Piece handicap elo diff, an idea for Kai Laskos for test

Post by lkaufman »

Thanks. It is a known chess principle that in general the king's bishop is more valuable than the queen's bishop; I read an article on exactly that topic by a GM not long ago. But the magnitude of the difference in your tests is much more than what I observed in testing changes to Rybka and to Komodo. But it gives me motivation to work more on the issue.