Some handicap results and conclusions.


Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Some handicap results and conclusions.

Post by Lyudmil Tsvetkov »

lkaufman wrote:
Laskos wrote:
Lyudmil Tsvetkov wrote:
example position, arising in the Scandinavian after e4 d5 ed5 Nf6 c4 c6 dc6 Nc6:

[d]r1bqkb1r/pp2pppp/2n2n2/8/2P5/8/PP1P1PPP/RNBQKBNR w KQkq - 0 5

SF thinks black is better here, I guess Komodo would assess it similarly, when actually the advantage is with white.
Good catch! Komodo too evaluates that as very close to 0.00, sometimes even as a small Black advantage. I think most strong humans would prefer White.
I don't think this is a "good catch". Strong human players avoid this variation as White like the plague. Only four games in my strong humans book, 50% score, and a losing score for White in six games in the Hiarcs powerbook (mostly engine games). Grandmasters would say Black has full compensation here. Whether White or Black is "better" is very hard to say, but the score should surely be quite close to zero, since even if White is for choice he has nothing like his normal first move advantage.
after d3 e5 Be2:

[d]r1bqkb1r/pp3ppp/2n2n2/4p3/2P5/3P4/PP2BPPP/RNBQK1NR b KQkq - 0 2

white has small, but clear advantage

the game is still a draw here, but I don't know why you would consider a -20cps score, instead of +20cps, a good evaluation.

that is a full 40cps difference, almost half a pawn.

of course, c4 by white was weak, white has much better than that, but the point was to show some real-life examples, and this position was suitable.

you would be surprised to know that in each and every game where one side has a pawn or 2 more, with other material being equal, SF and Komodo would almost always display wrong evals.

for a human, that is evident.
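
For anyone who wants to check which side is the pawn up in a diagram without setting up a board, here is a minimal sketch that counts material straight from the FEN. The function name `material_balance` is just an illustration, not something from any engine; it only looks at the piece-placement field and ignores everything positional.

```python
from collections import Counter

def material_balance(fen: str) -> dict:
    """Return White-minus-Black piece counts per piece type (kings ignored)."""
    placement = fen.split()[0]  # first FEN field: piece placement
    counts = Counter(ch for ch in placement if ch.isalpha())
    return {p: counts[p.upper()] - counts[p.lower()] for p in "pnbrq"}

# Position after d3 e5 Be2 from the diagram above.
fen = "r1bqkb1r/pp3ppp/2n2n2/4p3/2P5/3P4/PP2BPPP/RNBQK1NR b KQkq - 0 2"
balance = material_balance(fen)
print(balance)  # → {'p': 1, 'n': 0, 'b': 0, 'r': 0, 'q': 0}
```

So White is exactly one pawn up with all other material equal, which is the kind of position being discussed.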

I know very well you are proud of the way you managed to make Komodo play less materialistically, that is what actually made Komodo strong and distinguishes it from other engines, but for such positions engine evals are clearly off.
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Some handicap results and conclusions.

Post by Lyudmil Tsvetkov »

lkaufman wrote: But I'd bet the ranch (if I had one) on Komodo vs Carlsen at the next lower handicap, c7.
I am willing to bet your ranch too, that Carlsen will win. :)
Uri Blass
Posts: 10267
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Some handicap results and conclusions.

Post by Uri Blass »

lkaufman wrote:
Uri Blass wrote:
lkaufman wrote:I ran some handicap self-play matches using the latest Komodo version on single thread, zero Contempt for both sides, ten minutes plus 3 seconds, roughly like bullet chess using 24 cores at once. Roughly 400 games per handicap. Here are the results:

f2 handicap: White lost by 390 elo
f7 handicap: White won by 668 elo
f7 handicap with e4 played and WTM: White won by 931 elo.
c2 plus f2 handicap: White lost by 831 elo.
Exchange handicap (a1 for b8, insert ..Rb8): White lost by 619 elo.

These values look very reasonable to me, both relative to each other and in terms of how Komodo actually did in matches with human GMs.

Next I ran each handicap position for 2 1/2 minutes on six threads of fast laptop using latest Komodo in infinite mode (no Contempt). Here are the evals I got, in the above order: -1.00, +1.45, +2.30, -1.77, -1.36. Multiplying these by 440 each for scaling gives -440, +638, +1012, -779, -598. Reasonably close agreement, surely within sample error, but with some tendency for material to count for more in the results than in the evals.
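
The scaling arithmetic in this paragraph can be reproduced with a few lines. This is only a sketch of the comparison using the numbers quoted in the post; the variable names are made up for illustration, and 440 elo per pawn is the factor chosen in the post, not a general constant.

```python
SCALE = 440  # elo per pawn of eval, the scaling factor used in the post

handicaps = ["f2", "f7", "f7 + e4 played, WTM", "c2 + f2", "exchange"]
match_elo = [-390, 668, 931, -831, -619]    # self-play results, White's view
evals = [-1.00, 1.45, 2.30, -1.77, -1.36]   # Komodo evals in pawns

predicted = [round(e * SCALE) for e in evals]
gaps = [m - p for m, p in zip(match_elo, predicted)]

for name, m, p, g in zip(handicaps, match_elo, predicted, gaps):
    print(f"{name:22s} match {m:+5d}  eval*440 {p:+5d}  gap {g:+4d}")
```

The largest gap is about 80 elo (the f7 plus e4 case), which is the "reasonably close agreement" being claimed.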

Knight odds (b1) gave an eval of -3.32, which when multiplied by the above 440 scaling factor gives -1461. Since Komodo has actually performed somewhere in the lower 1900s FIDE at knight odds over the past two years, this implies an average rating of Komodo of about 3400 during that period, reasonable enough.
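
The rating inference here is a one-line sum. In the sketch below, 1940 is a hypothetical stand-in for "somewhere in the lower 1900s"; the post does not give an exact figure.

```python
SCALE = 440                  # elo per pawn of eval, as in the post
knight_odds_eval = -3.32     # Komodo eval of the b1 knight-odds position

handicap_elo = round(abs(knight_odds_eval) * SCALE)  # cost of the handicap
performance_at_odds = 1940   # ASSUMPTION: placeholder for "lower 1900s FIDE"

implied_rating = performance_at_odds + handicap_elo
print(handicap_elo, implied_rating)  # → 1461 3401
```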

Based on the above plus some other tests I ran, I think I can now propose a series of handicaps of fairly increasing magnitude that seem ideal to me. I start with these premises:
1. We want the game to be as close as possible to normal chess, so never remove more than two chessmen.
2. The handicaps should be basically material, with neither side having a major positional advantage. Our experience has shown that humans are much better at exploiting material handicaps, and also that dynamic handicaps (involving king exposure) depend too much on preparation.
3. Pawn handicaps should neither expose the king nor aid development more than minimally. This means only "b" and "c" pawns.
4. In the opening position, a knight is worth four pawns. This seems to be quite an accurate and reliable rule.
5. Given a choice, the "c" pawn makes the most suitable pawn handicap.

This gives us an eight step handicap system, with each handicap corresponding to roughly a class (200 elo) difference when Komodo is the handicap giver. One class is the c2 pawn; two classes is the c7 pawn, three classes is the b2 + c2 pawns, four classes is the b7 + c7 pawns, five classes is the b1 knight for the c7 pawn, six classes is the b8 knight for the c2 pawn, seven classes is the b1 knight, and eight classes is the b8 knight. These handicaps roughly agree with the results of the 20 normal chess handicap matches held to date, given the huge error margins involved.

If we count Komodo as 3400, this would mean that the top five or so players after Carlsen would get b2 + c2 for a tossup match (at our standard 45 min plus 15 sec level), a 2600 FIDE GM would get b7 + c7, a 2400 FIDE IM would get N for P, a 2200 FIDE FM would get N and White for P, and a 2000 FIDE player would get knight odds. Does this scale sound about fair?
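
The eight-step ladder can be written down as a lookup table. This is a rough sketch under the post's own assumptions (Komodo at 3400, 200 elo per class); the function name `suggest_handicap` and the behaviour outside the one-to-eight range are my own choices for illustration.

```python
# Handicap given by Komodo for each class (200 elo) of rating difference.
LADDER = {
    1: "c2 pawn",
    2: "c7 pawn",
    3: "b2 + c2 pawns",
    4: "b7 + c7 pawns",
    5: "b1 knight for the c7 pawn",
    6: "b8 knight for the c2 pawn",
    7: "b1 knight",
    8: "b8 knight",
}

def suggest_handicap(human_rating, engine_rating=3400, class_size=200):
    """Pick the ladder step closest to the rating gap (capped at step 8)."""
    classes = round((engine_rating - human_rating) / class_size)
    if classes < 1:
        return "no handicap"
    return LADDER[min(classes, 8)]

for rating in (2800, 2600, 2400, 2200, 2000):
    print(rating, "->", suggest_handicap(rating))
```

With the default 3400 this gives b2 + c2 for 2800, b7 + c7 for 2600, knight for pawn at 2400, knight and White for pawn at 2200, and plain knight odds at 2000, matching the scale proposed above.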
I would like to see computers play without the b1 knight in a regular FIDE-rated tournament (or even an unrated one, at a 90+30 time control, against players who have a FIDE rating).

I think that only in this way, and not by a match, can we get a better estimate of the real strength of Komodo without a knight against humans.

Suppose that some country decides to allow Komodo to play without a knight in a tournament for national rating (where some of the players have a FIDE rating below 1800 and some have a FIDE rating above 2200).

Can we expect in this case to see Komodo playing in this tournament?

The idea is that the rules should not allow a human to refuse to play against Komodo, and humans will go for prizes knowing in advance that they may have to play Komodo.
I would certainly cooperate if such an event were organized, but we wouldn't be willing to sponsor it other than awarding Komodo copies to winners or something like that. One practical problem is that you would either need many computers or one like my 24 core with only one core per game. So it wouldn't even answer the question of what Komodo could do at full strength.
I see no reason that you would need many computers unless you play many different Komodo games at the same time.

I simply thought of a tournament where Komodo (without a knight) is one of the players, so only one human plays against it in each round.

I think that it is possible to use 24 cores for every game if you have a 24-core computer to use; the problem, of course, is to find sponsors.
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Some handicap results and conclusions.

Post by Lyudmil Tsvetkov »

[d]r1bqkb1r/pp1p1ppp/2n2n2/4p3/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1

another one

SF says initially 20-30cps black edge.

actually, white has a substantial advantage and might even be winning.

I don't know what Komodo says on this, but white is much better.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Some handicap results and conclusions.

Post by lkaufman »

Lyudmil Tsvetkov wrote:[d]r1bqkb1r/pp1p1ppp/2n2n2/4p3/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1

another one

SF says initially 20-30cps black edge.

actually, white has a substantial advantage and might even be winning.

I don't know what Komodo says on this, but white is much better.
What is your basis for saying that White is much better here and also in the Scandinavian gambit position? To me they are both quite equal, I would have trouble choosing which side I wanted to play. Just saying you think so might count for something if you were Carlsen, but as it is you need some evidence. I could try MC playouts but you would say they are not valid due to fixed depth or to weakness of Fritz 15.
Komodo rules!
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Some handicap results and conclusions.

Post by Lyudmil Tsvetkov »

lkaufman wrote:
Lyudmil Tsvetkov wrote:[d]r1bqkb1r/pp1p1ppp/2n2n2/4p3/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1

another one

SF says initially 20-30cps black edge.

actually, white has a substantial advantage and might even be winning.

I don't know what Komodo says on this, but white is much better.
What is your basis for saying that White is much better here and also in the Scandinavian gambit position? To me they are both quite equal, I would have trouble choosing which side I wanted to play. Just saying you think so might count for something if you were Carlsen, but as it is you need some evidence. I could try MC playouts but you would say they are not valid due to fixed depth or to weakness of Fritz 15.
it would be useful if you ran some tests with a normal TC on this position.

in case the result comes out about equal, or black has the advantage, I will have to admit I have been terribly wrong.

if, however, white wins by some margin, you will have to acknowledge my chess understanding is very good, even though I am not Carlsen.

and a direct hint to you: Komodo, as well as other top chess engines, is missing a very important eval term, which is of primary significance here.

please look more carefully at the compact pawn structures section in my book, as this is the missing eval link, and not so much tuning the values of mobility, psqt and material.

on the diagram above, white has an extremely compact pawn structure, with all pawns next to each other, even though unadvanced. black has almost no such compact pawns, apart from a single g7.

that certainly turns the game around.
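
To make the "compact pawns" idea concrete, here is one possible reading of it as code. This is a guess and NOT the definition from the book: the sketch assumes a pawn counts as compact when friendly pawns stand on both adjacent files on the same rank. That assumption happens to single out g7 for Black in the diagram, as described above, but the real term may well be defined differently.

```python
def compact_pawns(fen: str, white: bool) -> list:
    """Pawns with friendly pawns on BOTH adjacent files of the same rank.

    ASSUMPTION: this 'both neighbours' rule is a hypothetical reading of
    'compact pawn', chosen only for illustration.
    """
    target = "P" if white else "p"
    result = []
    for row_index, row in enumerate(fen.split()[0].split("/")):
        rank = 8 - row_index          # FEN lists rank 8 first
        files, file_index = [], 0
        for ch in row:
            if ch.isdigit():
                file_index += int(ch)  # run of empty squares
            else:
                if ch == target:
                    files.append(file_index)
                file_index += 1
        occupied = set(files)
        for f in files:
            if f - 1 in occupied and f + 1 in occupied:
                result.append("abcdefgh"[f] + str(rank))
    return result

fen = "r1bqkb1r/pp1p1ppp/2n2n2/4p3/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
white_compact = compact_pawns(fen, white=True)
black_compact = compact_pawns(fen, white=False)
print(white_compact)  # → ['b2', 'c2', 'd2', 'e2', 'f2', 'g2']
print(black_compact)  # → ['g7']
```

Under this reading the edge pawns never count, since they have only one neighbouring file.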
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Some handicap results and conclusions.

Post by lkaufman »

Uri Blass wrote:
lkaufman wrote:
Uri Blass wrote:
lkaufman wrote:I ran some handicap self-play matches using the latest Komodo version on single thread, zero Contempt for both sides, ten minutes plus 3 seconds, roughly like bullet chess using 24 cores at once. Roughly 400 games per handicap. Here are the results:

f2 handicap: White lost by 390 elo
f7 handicap: White won by 668 elo
f7 handicap with e4 played and WTM: White won by 931 elo.
c2 plus f2 handicap: White lost by 831 elo.
Exchange handicap (a1 for b8, insert ..Rb8): White lost by 619 elo.

These values look very reasonable to me, both relative to each other and in terms of how Komodo actually did in matches with human GMs.

Next I ran each handicap position for 2 1/2 minutes on six threads of fast laptop using latest Komodo in infinite mode (no Contempt). Here are the evals I got, in the above order: -1.00, +1.45, +2.30, -1.77, -1.36. Multiplying these by 440 each for scaling gives -440, +638, +1012, -779, -598. Reasonably close agreement, surely within sample error, but with some tendency for material to count for more in the results than in the evals.

Knight odds (b1) gave an eval of -3.32, which when multiplied by the above 440 scaling factor gives -1461. Since Komodo has actually performed somewhere in the lower 1900s FIDE at knight odds over the past two years, this implies an average rating of Komodo of about 3400 during that period, reasonable enough.

Based on the above plus some other tests I ran, I think I can now propose a series of handicaps of fairly increasing magnitude that seem ideal to me. I start with these premises:
1. We want the game to be as close as possible to normal chess, so never remove more than two chessmen.
2. The handicaps should be basically material, with neither side having a major positional advantage. Our experience has shown that humans are much better at exploiting material handicaps, and also that dynamic handicaps (involving king exposure) depend too much on preparation.
3. Pawn handicaps should neither expose the king nor aide development more than minimally. This means only "b" and "c" pawns.
4. In the opening position, a knight is worth four pawns. This seems to be quite an accurate and reliable rule.
5. Given a choice, the "c" pawn makes the most suitable pawn handicap.

This gives us an eight step handicap system, with each handicap corresponding to roughly a class (200 elo) difference when Komodo is the handicap giver. One class is the c2 pawn; two classes is the c7 pawn, three classes is the b2 + c2 pawns, four classes is the b7 + c7 pawns, five classes is the b1 knight for the c7 pawn, six classes is the b8 knight for the c2 pawn, seven classes is the b1 knight, and eight classes is the b8 knight. These handicaps roughly agree with the results of the 20 normal chess handicap matches held to date, given the huge error margins involved.

If we count Komodo as 3400, this would mean that the top five or so players after Carlsen would get b2 + c2 for a tossup match (at our standard 45 min plus 15 sec level), a 2600 FIDE GM would get b7 + c7, a 2400 FIDE IM would get N for P, a 2200 FIDE FM would get N and White for P, and a 2000 FIDE player would get knight odds. Does this scale sound about fair?
I would like to see computers in regular tournament for fide rating without knight b1(or even not for fide rating at time control 90+30 against players who have a fide rating).

I think that only by this way and not by match we can get a better estimate of the real strength of komodo without a knight against humans.

Suppose that some country decide to allow komodo to play without knight in tournament for national rating(when part of the players have fide rating below 1800 and part of the players have fide rating above 2200).

Can we expect in this case to see komodo playing in this tournament?

The idea is that the rules should not allow human to refuse to play against komodo and humans will go for prizes when they know in advance that maybe they will have to play komodo.
I would certainly cooperate if such an event were organized, but we wouldn't be willing to sponsor it other than awarding Komodo copies to winners or something like that. One practical problem is that you would either need many computers or one like my 24 core with only one core per game. So it wouldn't even answer the question of what Komodo could do at full strength.
I see no reason that you need many computers unless you play many different games of komodo at the same time.

I thought simply about a tournament when komodo(without a knight) is one of the players so only one human play against it in every round.

I think that it is possible to use 24 cores for every game if you have 24 core computer to use when the problem is of course to find sponsors.
OK, I was thinking of the Scheveningen-style tournament where "team Komodo" plays "team world". Your way would just require an organizer to announce that one player would have to play an unrated game (but for prize money) vs Komodo each round. Maybe it could be one or two players each round, depending on whether there are an odd or even number of players that round. Then it could be advertised as a way to avoid byes while getting players a shot at some extra money. We could supply connection to two good computers and Komodo copies or small money prizes for winners. All it would take would be a willing organizer and a willing operator.
Komodo rules!
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Some handicap results and conclusions.

Post by Laskos »

lkaufman wrote:
Lyudmil Tsvetkov wrote:[d]r1bqkb1r/pp1p1ppp/2n2n2/4p3/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1

another one

SF says initially 20-30cps black edge.

actually, white has a substantial advantage and might even be winning.

I don't know what Komodo says on this, but white is much better.
What is your basis for saying that White is much better here and also in the Scandinavian gambit position? To me they are both quite equal, I would have trouble choosing which side I wanted to play. Just saying you think so might count for something if you were Carlsen, but as it is you need some evidence. I could try MC playouts but you would say they are not valid due to fixed depth or to weakness of Fritz 15.
I did play Komodo self-play games at 10''+0.1'' on the Scandinavian, building a book and all that. The performance is 50.6% for White: very balanced, a below-average opening for White. So you are right that the score should be around 0.00, as it is in both SF and Komodo.
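
For reference, a score percentage converts to an elo difference with the usual logistic formula. A minimal sketch (the function name is only for illustration):

```python
import math

def elo_from_score(score: float) -> float:
    """Elo difference implied by an expected score strictly between 0 and 1."""
    return -400 * math.log10(1 / score - 1)

white_score = 0.506  # White's performance in the Scandinavian test
elo = elo_from_score(white_score)
print(round(elo))  # → 4
```

So 50.6% is worth only about 4 elo for White.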
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Some handicap results and conclusions.

Post by Lyudmil Tsvetkov »

Laskos wrote:
lkaufman wrote:
Lyudmil Tsvetkov wrote:[d]r1bqkb1r/pp1p1ppp/2n2n2/4p3/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1

another one

SF says initially 20-30cps black edge.

actually, white has a substantial advantage and might even be winning.

I don't know what Komodo says on this, but white is much better.
What is your basis for saying that White is much better here and also in the Scandinavian gambit position? To me they are both quite equal, I would have trouble choosing which side I wanted to play. Just saying you think so might count for something if you were Carlsen, but as it is you need some evidence. I could try MC playouts but you would say they are not valid due to fixed depth or to weakness of Fritz 15.
I did play Komodo self-play games at 10''+0.1'' on the Scandinavian, building a book and all that. The performance is 50.6% for White: very balanced, a below-average opening for White. So you are right that the score should be around 0.00, as it is in both SF and Komodo.
when did I claim the Scandinavian position is won for white?

I said white has a small, but clear advantage, which was confirmed by your results, btw.

white is still winning, isn't it?

so, if Komodo and SF assess it as slightly advantageous for black, while white wins in actual play, I am very much more correct, of course.

what I claimed is that in the position featured above, the non-Scandinavian position, white has a significant advantage, and that will also translate into the score, if you are so kind as to run some tests with the second position I posted.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Some handicap results and conclusions.

Post by lkaufman »

Lyudmil Tsvetkov wrote:
Laskos wrote:
lkaufman wrote:
Lyudmil Tsvetkov wrote:[d]r1bqkb1r/pp1p1ppp/2n2n2/4p3/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1

another one

SF says initially 20-30cps black edge.

actually, white has a substantial advantage and might even be winning.

I don't know what Komodo says on this, but white is much better.
What is your basis for saying that White is much better here and also in the Scandinavian gambit position? To me they are both quite equal, I would have trouble choosing which side I wanted to play. Just saying you think so might count for something if you were Carlsen, but as it is you need some evidence. I could try MC playouts but you would say they are not valid due to fixed depth or to weakness of Fritz 15.
I did play Komodo self-play games at 10''+0.1'' on the Scandinavian, building a book and all that. The performance is 50.6% for White: very balanced, a below-average opening for White. So you are right that the score should be around 0.00, as it is in both SF and Komodo.
when did I claim the Scandinavian position is won for white?

I said white has a small, but clear advantage, which was confirmed by your results, btw.

white is still winning, isn't it?

so, if Komodo and SF assess it as slightly advantageous for black, while white wins in actual play, I am very much more correct, of course.

what I claimed is that in the position featured above, the non-Scandinavian position, white has a significant advantage, and that will also translate into the score, if you are so kind as to run some tests with the second position I posted.
I can't run an arbitrary position in our tester without involvement by Mark, and he is on vacation, but I did run it overnight on the Fritz 15 MC tester at 16 ply, which is more or less bullet chess (Fritz 16 ply is stronger than Komodo or Stockfish 16 ply, though much slower). This is your own position, not the Scandinavian one. After 742 games White scored 48.7% for minus 9 elo; Komodo rated it at zero after half a minute or so. I also ran it at ten ply to see if the increased depth at 16 ply helped White, but I got exactly 50-50 at ten ply. So I see no evidence that White is better. As for Kai's result, a +4 elo result for the Scandinavian gambit is so tiny as to be meaningless and does not suggest that the eval should be more than perhaps a couple of centipawns above zero even if it were confirmed with a zillion games. Maybe Stockfish does overweight mobility a bit in your examples if it reports negative scores, but Komodo seems to be right on the money here.
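
To put a rough error bar on the 742-game result, here is a sketch using a normal approximation. The draw rate is not given, so it uses score*(1-score) as the per-game variance, which is an upper bound (draws only shrink it); the interval is therefore on the wide side. Function names are hypothetical.

```python
import math

def elo_from_score(score: float) -> float:
    """Elo difference implied by an expected score strictly between 0 and 1."""
    return -400 * math.log10(1 / score - 1)

def elo_interval(score: float, games: int, z: float = 1.96):
    """Conservative ~95% elo interval; ASSUMES worst-case (no-draw) variance."""
    se = math.sqrt(score * (1 - score) / games)
    return elo_from_score(score - z * se), elo_from_score(score + z * se)

point = elo_from_score(0.487)
low, high = elo_interval(0.487, 742)
print(round(point), round(low), round(high))
```

The point estimate is about minus 9 elo and the interval runs from roughly minus 34 to plus 16, so zero sits comfortably inside it.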
Komodo rules!