Piece handicap elo diff, an idea for Kai Laskos for testing
Moderator: Ras
-
Isaac
- Posts: 265
- Joined: Sat Feb 22, 2014 8:37 pm
Hello people,
I have an idea for testing, especially for Kai Laskos or others who like to perform chess computer experiments.
Pick 2 popular engines like Komodo and Stockfish. Pick a time control and pick 1 thread for simplicity.
For each engine, determine, by playing a few hundred games, how many Elo each piece and each pawn is worth. Check and compare the differences between the 2 engines.
I'm almost sure that removing the left bishop weakens the engine in a different way than removing the right bishop. For both black and white pieces.
Remove each pawn (1 by 1) too.
That's a lot of testing, but I guess it could give us insights. Removing the queen could even tell us whether Komodo or SF plays better against itself without the queen, i.e. which engine is "stronger" at using its other pieces.
-
MikeB
- Posts: 4889
- Joined: Thu Mar 09, 2006 6:34 am
- Location: Pen Argyl, Pennsylvania
Re: Piece handicap elo diff, an idea for Kai Laskos for test
Interesting question. My only recommendation is to pick one engine, maybe K or SF, and run it against a gauntlet of engines.
-
hgm
- Posts: 28514
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Piece handicap elo diff, an idea for Kai Laskos for test
I have been doing these kinds of experiments since 2008, but usually by playing an engine against itself (Fairy-Max or Joker80). If you have multiple engines of identical strength, that would of course be preferable.
Note that deleting a Pawn ('classical Pawn odds') already gives about a 70-30 advantage. With piece odds, however, the score rises to far above 90% and can no longer be described well by an Elo model. (You will just be measuring the probability that the advantaged side commits a gross blunder that gets it checkmated early, which hardly depends on which piece the opponent is missing.)
So you have to keep the material imbalance within the range where the score is still sensitive to the advantage, say 30%-70%. So rather than deleting, say, a Knight, I delete N+P vs R or 2N vs R. From these differences you can reconstruct the Elo value of P, N and R.
To get game diversity I usually start from many different (Chess960-like) positions that are all symmetric except for the deleted material. It is also better not to rely on a single material imbalance, but to test pieces against a variety of opposing material (e.g. Q vs RR, BNN or BBN).
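The point about keeping scores within roughly 30%-70% can be illustrated with the usual logistic Elo conversion, D = -400 * log10(1/s - 1). The following is only a minimal sketch: the function names are made up for this example and do not come from any testing tool mentioned in the thread.

```python
import math

def elo_from_score(score):
    """Elo difference implied by an expected score under the logistic model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_per_point(score):
    """Elo change caused by a shift of one percentage point in the score."""
    # Derivative of the logistic conversion: 400 / (ln(10) * s * (1 - s)).
    return 400.0 / (math.log(10) * score * (1.0 - score)) / 100.0

for s in (0.50, 0.70, 0.90, 0.95, 0.99):
    print(f"score {s:.0%}: {elo_from_score(s):+7.1f} Elo, "
          f"{elo_per_point(s):6.1f} Elo per percentage point")
```

Near 50% one percentage point of score is worth about 7 Elo, while near 99% it is worth well over 100, so a small measurement error in the score turns into a huge error in the Elo estimate.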
-
Laskos
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Piece handicap elo diff, an idea for Kai Laskos for test
Isaac wrote:
Hello people,
I have an idea for testing, especially for Kai Laskos or others who like to perform chess computer experiments.
Pick 2 popular engines like Komodo and Stockfish. Pick a time control and pick 1 thread for simplicity.
For each engine, determine, by playing a few hundred games, how many Elo each piece and each pawn is worth. Check and compare the differences between the 2 engines.
I'm almost sure that removing the left bishop weakens the engine in a different way than removing the right bishop. For both black and white pieces.
Remove each pawn (1 by 1) too.
That's a lot of testing, but I guess it could give us insights. Removing the queen could even tell us whether Komodo or SF plays better against itself without the queen, i.e. which engine is "stronger" at using its other pieces.

It's not that easy. The Bishop handicap means a different Elo at different time controls or strengths. Say, a Bishop handicap at 1800 Elo brings the Elo down to 1200, a 600-point handicap. But a Bishop handicap at 3400 Elo may bring you down to 2000 Elo, a 1400-point handicap. To illustrate, here is the Bishop handicap on c1 and f1 at depth 4 and 7 for Komodo 10:
c1:
Code:
depth=4
+1057 =756 -8187 14.3%
depth=7
+561 =535 -8904 8.2%
f1:
Code:
depth=4
+801 =769 -8430 11.8%
depth=7
+439 =535 -9026 7.0%
One thing is pretty clear: the Bishop on f1 seems more valuable than the one on c1, and the Elo value of the handicap increases substantially from depth=4 to depth=7. Comparing to what Stockfish does is a bit meaningless until we equalize the strength, because these things are heavily strength dependent (or time control dependent). Also, self-play increases the Elo difference compared to playing different engines.
These issues have been discussed with regard to the handicap matches organized by Larry Kaufman, and were shown in numerous threads and posts here, mostly in the tournament section.
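For readers who want to turn the win/draw/loss lines above into Elo figures, here is a minimal sketch. It assumes the results are given from the handicapped side's point of view (which the low percentages suggest), that games are statistically independent, and that the logistic Elo model applies. At scores this far from 50% the Elo numbers are only rough. The function names are made up for this example.

```python
import math

def wdl_stats(wins, draws, losses):
    """Return (score, standard_error), treating the games as independent."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of a result worth 1, 0.5 or 0 points.
    variance = (wins + 0.25 * draws) / n - score ** 2
    return score, math.sqrt(variance / n)

def elo(score):
    """Logistic Elo difference implied by a score between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)

# Komodo 10 results quoted in the post, keyed by (removed bishop, depth).
results = {
    ("c1", 4): (1057, 756, 8187),
    ("c1", 7): (561, 535, 8904),
    ("f1", 4): (801, 769, 8430),
    ("f1", 7): (439, 535, 9026),
}

for (bishop, depth), wdl in results.items():
    s, se = wdl_stats(*wdl)
    low, high = elo(s - 1.96 * se), elo(s + 1.96 * se)
    print(f"{bishop} depth={depth}: score {s:.4f}, "
          f"Elo {elo(s):+.0f} (95% interval {low:+.0f} to {high:+.0f})")
```

Under these assumptions the c1 handicap comes out at roughly -310 Elo at depth 4 and roughly -418 Elo at depth 7, which matches the observation that the handicap costs more Elo as the engine gets stronger.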
-
BeyondCritics
- Posts: 416
- Joined: Sat May 05, 2012 2:48 pm
- Full name: Oliver Roese
Re: Piece handicap elo diff, an idea for Kai Laskos for test
Are your results available? This would be of interest to programmers, chess players, problemists, etc.
-
Uri Blass
- Posts: 11222
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: Piece handicap elo diff, an idea for Kai Laskos for test
Laskos wrote:
It's not that easy. The Bishop handicap means a different Elo at different time controls or strengths. Say, a Bishop handicap at 1800 Elo brings the Elo down to 1200, a 600-point handicap. But a Bishop handicap at 3400 Elo may bring you down to 2000 Elo, a 1400-point handicap. To illustrate, here is the Bishop handicap on c1 and f1 at depth 4 and 7 for Komodo 10:
c1:
Code:
depth=4
+1057 =756 -8187 14.3%
depth=7
+561 =535 -8904 8.2%
f1:
Code:
depth=4
+801 =769 -8430 11.8%
depth=7
+439 =535 -9026 7.0%
One thing is pretty clear: the Bishop on f1 seems more valuable than the one on c1, and the Elo value of the handicap increases substantially from depth=4 to depth=7. Comparing to what Stockfish does is a bit meaningless until we equalize the strength, because these things are heavily strength dependent (or time control dependent). Also, self-play increases the Elo difference compared to playing different engines.
These issues have been discussed with regard to the handicap matches organized by Larry Kaufman, and were shown in numerous threads and posts here, mostly in the tournament section.

I see one problem with all these games.
Programs do not consider the opponent and do not choose a move based on an opponent model.
It may be interesting to have a programmer write an anti-"Komodo depth 4" engine, for example, to find out how much it is possible to score against Komodo at depth 4.
The idea is, for example, that White plays without the queen on d1 but assumes in its search that Black has to play the move that Komodo suggests at depth 4, and prunes all other options for Black.
This type of selective search is anti-Komodo-depth-4, and it may be interesting to see how large a handicap it can give to Komodo at depth 4 and still win or at least draw.
I also wonder if this type of selective search can do better against humans with a big handicap (relative to normal Komodo).
It may be better to stop this type of selective search once the program is already winning against a human, in order not to take risks, but I do not think that it is very important.
Last edited by Uri Blass on Tue Jun 28, 2016 5:19 pm, edited 1 time in total.
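The selective search described above can be sketched on a toy game instead of chess. Everything here is an assumption made for illustration: the game is a simple take-away game (take 1 to 3 stones, whoever takes the last stone wins), and `modeled_opponent` is a stand-in for "Komodo at depth 4", namely a fixed, predictable policy. The only point is the structure: at the opponent's nodes the search follows the single move the model predicts and prunes everything else.

```python
from functools import lru_cache

MAX_TAKE = 3  # toy game: take 1..3 stones, whoever takes the last stone wins

def moves(stones):
    """Legal moves for the side to move."""
    return range(1, min(MAX_TAKE, stones) + 1)

def modeled_opponent(stones):
    """Hypothetical predictable opponent: always takes as many stones as allowed."""
    return min(MAX_TAKE, stones)

@lru_cache(maxsize=None)
def minimax(stones):
    """+1 if the side to move wins against best play, otherwise -1."""
    if stones == 0:
        return -1  # the previous player took the last stone
    return max(-minimax(stones - m) for m in moves(stones))

@lru_cache(maxsize=None)
def anti_model(stones):
    """Value for us (to move) if the opponent always follows modeled_opponent()."""
    if stones == 0:
        return -1  # the opponent took the last stone
    best = -1
    for m in moves(stones):
        rest = stones - m
        if rest == 0:
            return 1  # we took the last stone
        reply = modeled_opponent(rest)  # all other opponent moves are pruned
        best = max(best, anti_model(rest - reply))
    return best

for start in (4, 8, 12):
    print(f"{start} stones: perfect play {minimax(start):+d}, "
          f"against the model {anti_model(start):+d}")
```

With 8 stones the side to move is lost against perfect play, yet wins against the modeled opponent, which is the kind of gain the "anti" search is after. The risk is the same as in the chess proposal: if the real opponent deviates from the model, the pruned lines were never examined.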
-
lkaufman
- Posts: 6299
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
- Full name: Larry Kaufman
Re: Piece handicap elo diff, an idea for Kai Laskos for test
Laskos wrote:
It's not that easy. The Bishop handicap means a different Elo at different time controls or strengths. Say, a Bishop handicap at 1800 Elo brings the Elo down to 1200, a 600-point handicap. But a Bishop handicap at 3400 Elo may bring you down to 2000 Elo, a 1400-point handicap. To illustrate, here is the Bishop handicap on c1 and f1 at depth 4 and 7 for Komodo 10:
c1:
Code:
depth=4
+1057 =756 -8187 14.3%
depth=7
+561 =535 -8904 8.2%
f1:
Code:
depth=4
+801 =769 -8430 11.8%
depth=7
+439 =535 -9026 7.0%
One thing is pretty clear: the Bishop on f1 seems more valuable than the one on c1, and the Elo value of the handicap increases substantially from depth=4 to depth=7. Comparing to what Stockfish does is a bit meaningless until we equalize the strength, because these things are heavily strength dependent (or time control dependent). Also, self-play increases the Elo difference compared to playing different engines.
These issues have been discussed with regard to the handicap matches organized by Larry Kaufman, and were shown in numerous threads and posts here, mostly in the tournament section.

The difference between the two bishops' results is much larger than can be accounted for by any terms in Komodo's eval. Rybka gave a bonus for the King's bishop, but it was just one centipawn or so, and was later taken out, I think. It would be interesting if you could repeat your test in (fast) timed games. If the results are too close to 100% you could remove the b8 knight in both cases to make for fairly even results, although this might introduce a small bias for one bishop or the other. If a significant superiority of the king's bishop still shows through in all tests we would need to address this, although past tests showed that a king's bishop bonus doesn't test well with any meaningful value.
Komodo rules!
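Start positions for the handicaps discussed above (missing c1 or f1 bishop, optionally with the b8 knight removed as compensation) can be produced by editing the standard starting FEN. This is a minimal sketch using only string handling. The helper names are made up for this example, and it deliberately leaves castling rights untouched, so it is only safe for pieces like bishops and knights.

```python
START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"

def expand_rank(rank):
    """Turn a FEN rank such as 'RN1QKBNR' into a list of 8 squares ('.' = empty)."""
    squares = []
    for ch in rank:
        if ch.isdigit():
            squares.extend("." * int(ch))
        else:
            squares.append(ch)
    return squares

def compress_rank(squares):
    """Inverse of expand_rank: collapse runs of empty squares into digits."""
    out, empty = "", 0
    for sq in squares:
        if sq == ".":
            empty += 1
        else:
            if empty:
                out += str(empty)
                empty = 0
            out += sq
    return out + (str(empty) if empty else "")

def remove_piece(fen, square):
    """Return a FEN with the piece on `square` (e.g. 'c1') removed."""
    placement, *rest = fen.split(" ")
    ranks = placement.split("/")          # rank 8 comes first, rank 1 last
    file_idx = ord(square[0]) - ord("a")
    rank_idx = 8 - int(square[1])
    squares = expand_rank(ranks[rank_idx])
    if squares[file_idx] == ".":
        raise ValueError(f"no piece on {square}")
    squares[file_idx] = "."
    ranks[rank_idx] = compress_rank(squares)
    return " ".join(["/".join(ranks), *rest])

for bishop in ("c1", "f1"):
    handicap = remove_piece(START_FEN, bishop)
    balanced = remove_piece(handicap, "b8")   # bishop odds offset by the b8 knight
    print(bishop, handicap)
    print(bishop, "minus b8 knight:", balanced)
```

The resulting FEN strings can be used as start positions in whatever match runner is used for the test.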
-
lkaufman
- Posts: 6299
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
- Full name: Larry Kaufman
Re: Piece handicap elo diff, an idea for Kai Laskos for test
Laskos wrote:
It's not that easy. The Bishop handicap means a different Elo at different time controls or strengths. Say, a Bishop handicap at 1800 Elo brings the Elo down to 1200, a 600-point handicap. But a Bishop handicap at 3400 Elo may bring you down to 2000 Elo, a 1400-point handicap. To illustrate, here is the Bishop handicap on c1 and f1 at depth 4 and 7 for Komodo 10:
c1:
Code:
depth=4
+1057 =756 -8187 14.3%
depth=7
+561 =535 -8904 8.2%
f1:
Code:
depth=4
+801 =769 -8430 11.8%
depth=7
+439 =535 -9026 7.0%

How did you achieve variety at fixed depth? My guess is you used two threads, since a single thread run at fixed depth should give you identical games and hence all draws or all wins. Or did you do something else?
Komodo rules!
-
Laskos
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Piece handicap elo diff, an idea for Kai Laskos for test
lkaufman wrote:
How did you achieve variety at fixed depth? My guess is you used two threads, since a single thread run at fixed depth should give you identical games and hence all draws or all wins. Or did you do something else?

I tested earlier these days at a time control of 2s+0.02s with both Stockfish and Komodo. For variety of openings, I took the handicap opening position as the start and played 4 plies with a random mover (several thousand very fast games), building a handicap opening book (PGN) for each handicap.
Here are the results at 2+0.02:
Code:
Stockfish dev.
tc=2+0.02
Bishop c1
+33 =36 -931 5.1%
Bishop f1
+17 =40 -943 3.7%
Komodo 10
tc=2+0.02
Bishop c1
+47 =27 -926 6.0%
Bishop f1
+29 =37 -934 4.7%
Again, for some reason the f Bishop seems more valuable.
-
lkaufman
- Posts: 6299
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
- Full name: Larry Kaufman
Re: Piece handicap elo diff, an idea for Kai Laskos for test
Laskos wrote:
I tested earlier these days at a time control of 2s+0.02s with both Stockfish and Komodo. For variety of openings, I took the handicap opening position as the start and played 4 plies with a random mover (several thousand very fast games), building a handicap opening book (PGN) for each handicap.
Here are the results at 2+0.02:
Code:
Stockfish dev.
tc=2+0.02
Bishop c1
+33 =36 -931 5.1%
Bishop f1
+17 =40 -943 3.7%
Komodo 10
tc=2+0.02
Bishop c1
+47 =27 -926 6.0%
Bishop f1
+29 =37 -934 4.7%
Again, for some reason the f Bishop seems more valuable.

Thanks. It is a known chess principle that in general the king's bishop is more valuable than the queen's bishop; I read an article on exactly that topic by a GM not long ago. But the magnitude of the difference in your tests is much larger than what I observed in testing changes to Rybka and to Komodo. Still, it gives me motivation to work more on the issue.
Komodo rules!
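Whether the c1/f1 gap in the posted results is larger than noise can be checked with a simple two-sample comparison of the scores. This is a rough sketch only: it assumes all games are independent (games sharing an opening book are not strictly so), and the function names are made up for this example. The numbers are the ones posted in this thread.

```python
import math

def score_and_se(wins, draws, losses):
    """Score fraction and its standard error, treating games as independent."""
    n = wins + draws + losses
    s = (wins + 0.5 * draws) / n
    variance = (wins + 0.25 * draws) / n - s * s
    return s, math.sqrt(variance / n)

def z_difference(a, b):
    """Score difference between two matches in units of its standard error."""
    sa, se_a = score_and_se(*a)
    sb, se_b = score_and_se(*b)
    return (sa - sb) / math.hypot(se_a, se_b)

# (c1 removed, f1 removed) as +wins =draws -losses from the posts above.
matches = {
    "Komodo 10, depth=4": ((1057, 756, 8187), (801, 769, 8430)),
    "Komodo 10, depth=7": ((561, 535, 8904), (439, 535, 9026)),
    "Stockfish dev., 2+0.02": ((33, 36, 931), (17, 40, 943)),
    "Komodo 10, 2+0.02": ((47, 27, 926), (29, 37, 934)),
}

for name, (c1, f1) in matches.items():
    print(f"{name}: z = {z_difference(c1, f1):.2f}")
```

Under these assumptions the 10000-game fixed-depth matches show a gap of several standard errors, while each 1000-game timed match on its own stays below two standard errors, so the timed results point in the same direction but are individually much weaker evidence.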