Piece handicap elo diff, an idea for Kai Laskos for testing

Post by **Isaac** » Sat Jun 25, 2016 4:33 am

Hello people,
I have an idea for testing, especially for Kai Laskos or anyone who likes to run computer chess experiments.
Pick 2 popular engines like Komodo and Stockfish. Pick a time control and use 1 thread for simplicity.
For each engine, determine, by playing a few hundred games, how many Elo each piece and each pawn is worth. Then compare the results between the 2 engines.

I'm almost sure that removing the left bishop weakens the engine differently from removing the right bishop, for both the black and the white pieces.
Remove each pawn (one by one) too.

That's a lot of testing, but I guess it could give us some insights. Removing the queen could even tell us whether Komodo or SF plays better against itself without the Queen, i.e. which engine is "stronger" at using its other pieces.

Post by **MikeB** » Sat Jun 25, 2016 4:49 am

Interesting question. My only recommendation is to pick one engine, maybe K or SF, and run it against a gauntlet of engines.

Post by **hgm** » Sat Jun 25, 2016 9:34 am

I have been doing these kinds of experiments since 2008, but usually by playing an engine against itself (Fairy-Max or Joker80). If you have multiple engines of identical strength, using those would of course be preferable.

Note that deleting a Pawn ('classical Pawn odds') already gives about a 70-30 advantage. With piece odds, however, the score rises to far above 90%, and can no longer be described well by an Elo model. (You will just be measuring the probability that the advantaged side commits a gross blunder that gets it checkmated early, which hardly depends on which piece the opponent is missing.)
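To put numbers on the 70-30 figure, here is a minimal sketch assuming the usual logistic Elo model, in which an expected score s corresponds to a rating difference of 400·log10(s/(1-s)). Besides converting 70% into Elo, it shows how many Elo one percentage point of score is worth at different score levels. That is a purely statistical reason why lopsided results say little; hgm's blunder argument above is a separate point. The function names are illustrative only.

```python
import math

def elo_from_score(s: float) -> float:
    """Rating difference implied by an expected score s (logistic Elo model)."""
    if not 0.0 < s < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(s / (1.0 - s))

def elo_per_percent(s: float) -> float:
    """Slope dE/ds of the curve above, scaled to one percentage point of score."""
    return 400.0 / (math.log(10.0) * s * (1.0 - s)) / 100.0

print(round(elo_from_score(0.70)))  # Pawn odds at about 70-30 -> 147
for s in (0.50, 0.70, 0.90, 0.95):
    print(f"score {s:.0%}: {elo_per_percent(s):.1f} Elo per percentage point")
```

Near 50% a percentage point is worth about 7 Elo, near 95% about 37, so ordinary sampling noise in a lopsided match turns into a large Elo uncertainty.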

So you have to keep the material imbalance within the range where the score is still sensitive to the advantage, say 30%-70%. So rather than deleting, say, a Knight, I delete N+P on one side vs R on the other, or 2N vs R. From these differences you can reconstruct the Elo values of P, N and R.
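One way to do that reconstruction, sketched under an assumption that is mine rather than hgm's stated method: the Elo advantage of an imbalance is the sum of per-piece values of the surplus material. Each match then gives one linear equation, and several matches can be combined by least squares. N+P vs R and 2N vs R alone leave the three values underdetermined, so the example adds a Pawn-odds match as a third equation. The input numbers are invented for illustration, not measured.

```python
from typing import List, Sequence

PIECES = ("P", "N", "R")  # column order of every imbalance vector

def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("these imbalances do not pin down every piece value")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def piece_values(imbalances: Sequence[Sequence[int]], elo: Sequence[float]) -> List[float]:
    """Least-squares piece values via the normal equations, assuming the Elo
    advantage of an imbalance is the sum of the values of the surplus material."""
    k = len(PIECES)
    ata = [[float(sum(row[i] * row[j] for row in imbalances)) for j in range(k)]
           for i in range(k)]
    atb = [sum(row[i] * e for row, e in zip(imbalances, elo)) for i in range(k)]
    return solve(ata, atb)

# Made-up Elo figures, chosen only so the algebra can be checked by hand.
imbalances = [(1, 0, 0),    # Pawn odds, seen from the side with the extra Pawn
              (-1, -1, 1),  # seen from the side whose N+P were deleted
              (0, 2, -1)]   # seen from the side whose Rook was deleted
print(piece_values(imbalances, [90.0, 80.0, 140.0]))  # about [90, 310, 480]
```

More matches (e.g. Q vs RR) simply add rows; with more rows than pieces the normal equations average out the noise.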

To get game diversity I usually start from many different (Chess960-like) positions that are all symmetric except for the deleted material. It is also better not to rely on a single material imbalance, but to test pieces against a variety of opposing material (e.g. Q vs RR, BNN or BBN).
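A sketch of generating such start positions, assuming standard Chess960 back-rank rules (bishops on opposite colours, king between the rooks) with the same arrangement mirrored for Black, and material then deleted from White only. Dropping castling rights is a simplification of mine, and the helper names are hypothetical. Imbalances such as N+P vs R would also need deletions on Black's side, which follow the same pattern.

```python
import random
from typing import List, Sequence

def chess960_back_rank(rng: random.Random) -> List[str]:
    """Random Chess960 back rank (index 0 = a-file)."""
    rank = [""] * 8
    rank[rng.choice([0, 2, 4, 6])] = "B"  # a1, c1, e1, g1 are dark squares
    rank[rng.choice([1, 3, 5, 7])] = "B"  # b1, d1, f1, h1 are light squares
    empty = [i for i in range(8) if not rank[i]]
    for piece in ("Q", "N", "N"):
        square = rng.choice(empty)
        rank[square] = piece
        empty.remove(square)
    # The three squares left over get R, K, R in file order: king between rooks.
    for square, piece in zip(sorted(empty), ("R", "K", "R")):
        rank[square] = piece
    return rank

def fen_rank(squares: Sequence[str]) -> str:
    """Encode one rank for FEN, collapsing runs of empty squares into digits."""
    out, gap = "", 0
    for piece in squares:
        if piece:
            out += (str(gap) if gap else "") + piece
            gap = 0
        else:
            gap += 1
    return out + (str(gap) if gap else "")

def handicap_fen(rank: Sequence[str], pieces: Sequence[int] = (),
                 pawns: Sequence[int] = ()) -> str:
    """Symmetric start position from `rank` with White's back-rank pieces on the
    files in `pieces` and White's pawns on the files in `pawns` removed
    (0 = a-file). Castling rights are dropped ('-') to keep the FEN simple."""
    back = list(rank)
    for f in pieces:
        if back[f] == "K":
            raise ValueError("cannot delete the king")
        back[f] = ""
    pawn_row = ["P"] * 8
    for f in pawns:
        pawn_row[f] = ""
    black = "".join(p.lower() for p in rank)
    return f"{black}/pppppppp/8/8/8/8/{fen_rank(pawn_row)}/{fen_rank(back)} w - - 0 1"

rng = random.Random(2016)
start = chess960_back_rank(rng)
# N+P odds on this start: drop the first White Knight and the f-pawn (arbitrary choice).
print(handicap_fen(start, pieces=[start.index("N")], pawns=[5]))
```

Drawing many starts with different seeds gives the kind of variety described above, while each individual position stays symmetric apart from the deleted material.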

Post by **Laskos** » Sat Jun 25, 2016 12:33 pm

It's not that easy. A Bishop handicap is worth a different number of Elo at different time controls or strengths. Say, a Bishop handicap at 1800 Elo brings the player down to 1200, a 600-point handicap. But a Bishop handicap at 3400 Elo may bring you down to 2000, a 1400-point handicap. To illustrate, here are the c1 and f1 Bishop handicaps at depth 4 and depth 7 for Komodo 10:

c1:

depth=4
+1057  =756 -8187  14.3%

depth=7
+561 =535 -8904   8.2%

f1:

depth=4
 +801  =769 -8430  11.8%

depth=7
+439 =535 -9026   7.0%

One thing is pretty clear: the Bishop on f1 seems more valuable than the one on c1, and the Elo value of the handicap increases substantially from depth=4 to depth=7. Comparing with what Stockfish does is somewhat meaningless until we equalize the strength, because these things depend heavily on strength (or time control). Also, self-play increases the Elo difference compared to playing different engines.
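Converting the W/D/L counts above into Elo makes the depth effect explicit. A minimal sketch, assuming the counts are from the handicapped side's point of view (which matches the quoted percentages) and using the logistic Elo model; given the earlier caveat that scores this lopsided are beyond where an Elo model is reliable, treat the output as indicative only.

```python
import math

# (missing Bishop, depth): (wins, draws, losses) for the side missing the Bishop
RESULTS = {
    ("c1", 4): (1057, 756, 8187),
    ("c1", 7): (561, 535, 8904),
    ("f1", 4): (801, 769, 8430),
    ("f1", 7): (439, 535, 9026),
}

def score(w: int, d: int, l: int) -> float:
    """Fraction of points scored: a win counts 1, a draw 1/2."""
    return (w + 0.5 * d) / (w + d + l)

def elo(s: float) -> float:
    """Logistic Elo difference for expected score s (negative = weaker side)."""
    return 400.0 * math.log10(s / (1.0 - s))

for (square, depth), wdl in RESULTS.items():
    s = score(*wdl)
    print(f"Bishop {square}, depth {depth}: score {100 * s:.2f}%, {elo(s):+.0f} Elo")
```

By this arithmetic the handicap grows from about 310 to about 418 Elo for c1 and from about 349 to about 448 Elo for f1, roughly 100 Elo in both cases.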

These issues have been discussed in connection with the handicap matches organized by Larry Kaufman, in numerous threads and posts here, mostly in the tournament section.

Post by **BeyondCritics** » Tue Jun 28, 2016 3:47 pm

Are your results available? They would be of interest to programmers, chess players, problemists, etc.

Post by **Uri Blass** » Tue Jun 28, 2016 5:14 pm

I see one problem with all these games.

Programs do not consider the opponent and do not choose their moves based on an opponent model.
It might be interesting to have a programmer write, for example, an anti-Komodo-depth-4 program, to find out how much can be achieved against Komodo at depth 4.

The idea is, for example, that White plays without the d1 queen, but its search assumes that Black has to play the move Komodo suggests at depth 4, and prunes all other options for Black.
This type of selective search is anti-Komodo-depth-4, and it would be interesting to see how big a handicap it can give Komodo at depth 4 and still win, or at least draw.
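A minimal sketch of that selective search on a hypothetical toy game tree rather than real chess: at our nodes every move is searched, while at the opponent's nodes only the move predicted by the opponent model (in the example above, Komodo's depth-4 choice) is kept. `moves`, `play`, `evaluate` and `predict` are placeholder callbacks I made up, not any engine's API.

```python
from typing import Callable, Dict, List, Optional

def search(state, depth: int, our_turn: bool,
           moves: Callable, play: Callable, evaluate: Callable,
           predict: Optional[Callable] = None) -> float:
    """Minimax scored from our side. If `predict` is given, every opponent node
    keeps only the move the opponent model predicts and prunes the rest."""
    legal = moves(state)
    if depth == 0 or not legal:
        return evaluate(state)

    def value(move) -> float:
        return search(play(state, move), depth - 1, not our_turn,
                      moves, play, evaluate, predict)

    if our_turn:
        return max(value(m) for m in legal)
    if predict is not None:
        return value(predict(state))       # opponent node: trust the model
    return min(value(m) for m in legal)    # ordinary minimax, for comparison

# Hypothetical toy tree: leaf values are from our point of view.
TREE: Dict[str, List[str]] = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
LEAF = {"a1": 5, "a2": -10, "b1": 0, "b2": 1}
MODEL = {"a": "a1", "b": "b1"}  # a weak opponent model that misses the refutation a2

args = (lambda s: TREE.get(s, []), lambda s, m: m, lambda s: LEAF.get(s, 0))
print(search("root", 2, True, *args))                     # plain minimax avoids "a"
print(search("root", 2, True, *args, predict=MODEL.get))  # model search goes for "a"
```

The toy tree shows both the gain and the risk: plain minimax settles for 0, the model-based search aims for 5, but if the opponent does find a2 the result is -10.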

I also wonder whether this type of selective search can do better against humans with a big handicap (relative to normal Komodo).

It may be better to stop this type of selective search once the program is already winning against humans, in order not to take risks, but I do not think that is very important.

Post by **lkaufman** » Tue Jun 28, 2016 5:18 pm

The difference between the two bishops' results is much larger than can be accounted for by any terms in Komodo's eval. Rybka gave a bonus for the King's bishop, but it was just one centipawn or so, and it was later taken out, I think. It would be interesting if you could repeat your test in (fast) timed games. If the results are too close to 100%, you could remove the b8 knight in both cases to make for fairly even results, although this might introduce a small bias for one bishop or the other. If a significant superiority of the king's bishop still shows through in all tests, we would need to address this, although past tests showed that a king's bishop bonus doesn't test well with any meaningful value.

Post by **lkaufman** » Tue Jun 28, 2016 6:15 pm

How did you achieve variety at fixed depth? My guess is you used two threads, since a single thread run at fixed depth should give you identical games and hence all draws or all wins. Or did you do something else?

Post by **Laskos** » Tue Jun 28, 2016 7:52 pm

In the past few days I tested both Stockfish and Komodo at a time control of 2s+0.02s. For opening variety, I took the handicap position as the start and played 4 plies with the random mover (several thousand very fast games), building a handicap opening book (PGN) for each handicap.
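The book-building step can be sketched generically, assuming uniformly random moves and de-duplication by move sequence; that is my simplification, and Laskos used a random-mover engine whose details aren't given. `legal_moves` and `play` are hypothetical callbacks: with the python-chess package one could pass `lambda b: list(b.legal_moves)` and a `play` that copies the board and calls `push`, though that wiring is also an assumption. Keying on move sequences keeps transpositions as separate lines; keying on the resulting position would merge them.

```python
import random
from typing import Callable, List, Tuple

def random_opening_book(start, plies: int, games: int,
                        legal_moves: Callable, play: Callable,
                        rng: random.Random) -> List[Tuple]:
    """Play `plies` uniformly random moves from `start`, `games` times, and keep
    each distinct move sequence once (in the order first seen)."""
    seen = set()
    book: List[Tuple] = []
    for _ in range(games):
        state, line = start, []
        for _ in range(plies):
            legal = legal_moves(state)
            if not legal:              # game ended before the book depth
                break
            move = rng.choice(legal)
            line.append(move)
            state = play(state, move)
        key = tuple(line)
        if key not in seen:
            seen.add(key)
            book.append(key)
    return book

# Toy stand-in for a chess position: two legal moves everywhere, state = moves so far.
toy = random_opening_book((), plies=4, games=1000,
                          legal_moves=lambda s: ["x", "y"],
                          play=lambda s, m: s + (m,),
                          rng=random.Random(0))
print(len(toy), "distinct 4-ply lines")
```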

Here are the results at 2+0.02:

Stockfish dev.

tc=2+0.02
Bishop c1
 +33  =36 -931   5.1%
Bishop f1
 +17  =40 -943   3.7%

Komodo 10

tc=2+0.02
Bishop c1
 +47  =27 -926   6.0%
Bishop f1
 +29  =37 -934   4.7%

Again, for some reason the f1 Bishop seems more valuable.
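To check how far the c1/f1 gap stands out from noise, here is a sketch of a standard two-sample z-test on the W/D/L counts. The method is my choice, not something used in the thread, and it treats games as independent. By this arithmetic the 2+0.02 gaps come out at roughly 1.7 (Stockfish) and 1.4 (Komodo) standard errors, while the 10,000-game Komodo depth-4 runs separate by about 5.7.

```python
import math
from typing import Tuple

WDL = Tuple[int, int, int]

def score_and_se(w: int, d: int, l: int) -> Tuple[float, float]:
    """Mean score per game and its standard error, treating each game as an
    independent draw of x = 1, 1/2 or 0."""
    n = w + d + l
    mean = (w + 0.5 * d) / n
    mean_sq = (w + 0.25 * d) / n  # E[x^2]
    return mean, math.sqrt((mean_sq - mean * mean) / n)

def z_difference(a: WDL, b: WDL) -> float:
    """How many standard errors separate two independent match scores."""
    (sa, ea), (sb, eb) = score_and_se(*a), score_and_se(*b)
    return (sa - sb) / math.sqrt(ea * ea + eb * eb)

print(round(z_difference((33, 36, 931), (17, 40, 943)), 2))         # Stockfish, 2+0.02
print(round(z_difference((47, 27, 926), (29, 37, 934)), 2))         # Komodo 10, 2+0.02
print(round(z_difference((1057, 756, 8187), (801, 769, 8430)), 2))  # Komodo 10, depth 4
```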

Post by **lkaufman** » Tue Jun 28, 2016 9:07 pm

Thanks. It is a known chess principle that in general the king's bishop is more valuable than the queen's bishop; I read an article on exactly that topic by a GM not long ago. But the magnitude of the difference in your tests is much larger than what I observed when testing changes to Rybka and Komodo. Still, it gives me motivation to work more on the issue.
