Some handicap results and conclusions.


lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Some handicap results and conclusions.

Post by lkaufman »

I ran some handicap self-play matches using the latest Komodo version on a single thread, with zero Contempt for both sides, at ten minutes plus 3 seconds (roughly like bullet chess), using 24 cores at once. Roughly 400 games per handicap. Here are the results:

f2 handicap: White lost by 390 Elo
f7 handicap: White won by 668 Elo
f7 handicap with e4 played and White to move: White won by 931 Elo
c2 plus f2 handicap: White lost by 831 Elo
Exchange handicap (a1 for b8, insert ...Rb8): White lost by 619 Elo
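The results above are reported as Elo differences rather than scores. A minimal sketch of the usual conversion, assuming the standard logistic Elo model (the post does not say which formula the match software used), translates each result into White's expected score:

```python
import math


def score_from_elo(diff: float) -> float:
    """Expected score for a side that is `diff` Elo stronger (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def elo_from_score(score: float) -> float:
    """Inverse of score_from_elo: Elo difference implied by a score in (0, 1)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


# Results from the post, as Elo from White's point of view.
RESULTS = {
    "f2": -390,
    "f7": 668,
    "f7, e4 played, WTM": 931,
    "c2 + f2": -831,
    "Exchange": -619,
}

for name, diff in RESULTS.items():
    print(f"{name:20s} {diff:+5d} Elo -> White scores {score_from_elo(diff):6.1%}")
```

Near the extremes a handful of games moves the Elo figure a lot: at -831 Elo, White's expected score is under 1%.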

These values look very reasonable to me, both relative to each other and in terms of how Komodo actually did in matches with human GMs.

Next I ran each handicap position for 2 1/2 minutes on six threads of a fast laptop, using the latest Komodo in infinite mode (no Contempt). Here are the evals I got, in the above order: -1.00, +1.45, +2.30, -1.77, -1.36. Multiplying each by 440 as a scaling factor gives -440, +638, +1012, -779, -598. That is reasonably close agreement, surely within sampling error, but with some tendency for material to count for more in the match results than in the evals.
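The comparison above can be reproduced from the posted numbers. This sketch only redoes the post's arithmetic; the least-squares scale at the end is an extra check of my own (a fit through the origin), not something the post computed:

```python
# Numbers copied from the post, in the same order:
# f2, f7, f7 + e4 (WTM), c2 + f2, Exchange.
MATCH_ELO = [-390, 668, 931, -831, -619]   # self-play results, White's view
EVALS = [-1.00, 1.45, 2.30, -1.77, -1.36]  # engine evals in pawns
SCALE = 440                                # Elo per pawn, as chosen in the post

scaled = [round(e * SCALE) for e in EVALS]
for elo, ev, s in zip(MATCH_ELO, EVALS, scaled):
    print(f"eval {ev:+.2f} x {SCALE} = {s:+5d}   match {elo:+5d}   gap {elo - s:+4d}")

# Extra check (not from the post): the single scale k minimising
# sum((elo - k * eval) ** 2), i.e. a least-squares fit through the origin.
k = sum(e * m for e, m in zip(EVALS, MATCH_ELO)) / sum(e * e for e in EVALS)
print(f"least-squares Elo per pawn: {k:.0f}")
```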

Knight odds (b1) gave an eval of -3.32, which, multiplied by the same 440 scaling factor, gives -1461. Since Komodo has actually performed somewhere in the lower 1900s FIDE at knight odds over the past two years, this implies that Komodo's average rating during that period was about 3400, which is reasonable enough.
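The knight-odds step is the same conversion run in reverse: scale the eval to Elo, then add it to the performance Komodo achieved while giving the handicap. "Lower 1900s" is not a single number, so the performances below are hypothetical sample points, not data from the post:

```python
SCALE = 440                 # Elo per pawn, as in the post
KNIGHT_ODDS_EVAL = -3.32    # Komodo's eval without its b1 knight (pawns)

handicap_elo = round(-KNIGHT_ODDS_EVAL * SCALE)   # Elo Komodo gives away

# Hypothetical performance ratings standing in for "the lower 1900s FIDE".
for performance in (1900, 1930, 1960):
    implied = performance + handicap_elo
    print(f"performs {performance} at knight odds -> implied Komodo rating {implied}")
```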

Based on the above, plus some other tests I ran, I think I can now propose a series of handicaps of fairly evenly increasing magnitude that seem ideal to me. I start with these premises:
1. We want the game to be as close as possible to normal chess, so never remove more than two chessmen.
2. The handicaps should be basically material, with neither side having a major positional advantage. Our experience has shown that humans are much better at exploiting material handicaps, and also that dynamic handicaps (involving king exposure) depend too much on preparation.
3. Pawn handicaps should neither expose the king nor aid development more than minimally. This means only "b" and "c" pawns.
4. In the opening position, a knight is worth four pawns. This seems to be quite an accurate and reliable rule.
5. Given a choice, the "c" pawn makes the most suitable pawn handicap.

This gives us an eight-step handicap system, with each step corresponding to roughly one class (200 Elo) of difference when Komodo is the handicap giver. One class is the c2 pawn, two classes the c7 pawn, three classes the b2 + c2 pawns, four classes the b7 + c7 pawns, five classes the b1 knight for the c7 pawn, six classes the b8 knight for the c2 pawn, seven classes the b1 knight, and eight classes the b8 knight. These handicaps roughly agree with the results of the 20 normal-chess handicap matches held to date, given the huge error margins involved.

If we count Komodo as 3400, this would mean that the top five or so players after Carlsen would get b2 + c2 for a toss-up match (at our standard 45 min plus 15 sec level), a 2600 FIDE GM would get b7 + c7, a 2400 FIDE IM would get N for P, a 2200 FIDE FM would get N and White for P, and a 2000 FIDE player would get knight odds. Does this scale sound about fair?
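The ladder and the rating examples fit a simple rule: a player rated R gets the step nearest to (3400 - R) / 200 classes. A sketch of that lookup, where 3400 is the post's working assumption, 2800 stands in for "the top five or so after Carlsen", and rounding to the nearest class for ratings between steps is my own choice:

```python
import math

# The eight steps from the post, in increasing size; step k is about k * 200 Elo.
LADDER = [
    "c2 pawn",
    "c7 pawn",
    "b2 + c2 pawns",
    "b7 + c7 pawns",
    "b1 knight for c7 pawn",
    "b8 knight for c2 pawn",
    "b1 knight (knight odds)",
    "b8 knight",
]
CLASS_ELO = 200
KOMODO_ELO = 3400  # working assumption from the post


def handicap_for(rating: int) -> str:
    """Ladder step for a toss-up match against a human of the given rating."""
    k = math.floor((KOMODO_ELO - rating) / CLASS_ELO + 0.5)  # nearest class
    if k < 1:
        return "no handicap on this ladder"
    if k > len(LADDER):
        return "beyond the ladder (more than the b8 knight)"
    return LADDER[k - 1]


for rating in (2800, 2600, 2400, 2200, 2000):
    print(rating, "->", handicap_for(rating))
```

Those five ratings reproduce the pairings listed in the post.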
Komodo rules!
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Some handicap results and conclusions.

Post by Lyudmil Tsvetkov »

lkaufman wrote:I ran some handicap self-play matches using the latest Komodo version on single thread, zero Contempt for both sides, ten minutes plus 3 seconds, roughly like bullet chess using 24 cores at once. Roughly 400 games per handicap. Here are the results:

[...]

This gives us an eight step handicap system, with each handicap corresponding to roughly a class (200 elo) difference when Komodo is the handicap giver. One class is the c2 pawn; two classes is the c7 pawn, three classes is the b2 + c2 pawns, four classes is the b7 + c7 pawns, five classes is the b1 knight for the c7 pawn, six classes is the b8 knight for the c2 pawn, seven classes is the b1 knight, and eight classes is the b8 knight. These handicaps roughly agree with the results of the 20 normal chess handicap matches held to date, given the huge error margins involved.

If we count Komodo as 3400, this would mean that the top five or so players after Carlsen would get b2 + c2 for a tossup match (at our standard 45 min plus 15 sec level), a 2600 FIDE GM would get b7 + c7, a 2400 FIDE IM would get N for P, a 2200 FIDE FM would get N and White for P, and a 2000 FIDE player would get knight odds. Does this scale sound about fair?
Great maths here.

Why would N for P vs. N account for only a 200 Elo difference in strength?

I guess N for P is at least twice as difficult to win, especially when a human is involved, so maybe more like a 400 Elo span between the two.

I bet Carlsen would win at the above odds.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Some handicap results and conclusions.

Post by Laskos »

lkaufman wrote:I ran some handicap self-play matches using the latest Komodo version on single thread, zero Contempt for both sides, ten minutes plus 3 seconds, roughly like bullet chess using 24 cores at once. Roughly 400 games per handicap. Here are the results:

[...]

If we count Komodo as 3400, this would mean that the top five or so players after Carlsen would get b2 + c2 for a tossup match (at our standard 45 min plus 15 sec level), a 2600 FIDE GM would get b7 + c7, a 2400 FIDE IM would get N for P, a 2200 FIDE FM would get N and White for P, and a 2000 FIDE player would get knight odds. Does this scale sound about fair?
Nice!

Several remarks:
  • The Elo differences depend (in scaling) on time control. Generally humans improve more with longer TC. And a game in 45 min might be a bit different from a game in 120 min, but the latter is not very practical to watch.

  • 3400 FIDE in your regular conditions seems a bit high; I found maybe 3300 FIDE.

  • Like you, I also found that Komodo's eval slightly undervalues material advantage compared to the outcome in games. I don't know if this can be used as a guide to modify the evaluation.

  • 400 games are rather few for handicaps of 800+ Elo points; the errors are large, and the result can hinge on a handful of draws and losses.

  • You play into the strong points of humans, giving them almost purely material advantages. Against engines, humans like material. On the other hand, positional advantages such as 5-6 movers or a "Crazy King" in the middle of the board are very hard to convert even for strong humans, and are spectacular (if a bit clownish). But these depend heavily on preparation.
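The point about 400 games can be quantified. A rough sketch, assuming independent games and the logistic Elo model, propagates the standard error of the match score into Elo with the delta method; the draw rate is a free parameter because the post gives no draw counts:

```python
import math


def score_from_elo(diff: float) -> float:
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def elo_standard_error(p: float, n_games: int, draw_rate: float = 0.0) -> float:
    """Approximate 1-sigma error (Elo) of a match result with score p.

    Assumes independent games scored 1 / 0.5 / 0. Per-game variance is
    p * (1 - p) - draw_rate / 4, so draws only shrink the error and the
    no-draw default is the worst case. Delta method:
    dElo/dp = 400 / (ln 10 * p * (1 - p)).
    """
    win = p - draw_rate / 2.0
    loss = 1.0 - win - draw_rate
    if not 0.0 < p < 1.0 or win < 0.0 or loss < 0.0:
        raise ValueError("inconsistent score / draw rate")
    var = p * (1.0 - p) - draw_rate / 4.0
    se_p = math.sqrt(var / n_games)
    return se_p * 400.0 / (math.log(10.0) * p * (1.0 - p))


for diff in (390, 619, 668, 831, 931):
    p = score_from_elo(diff)          # score of the stronger side
    se = elo_standard_error(p, 400)   # roughly 400 games per handicap
    print(f"{diff:4d} Elo: about +/- {1.96 * se:.0f} Elo (95%, no draws)")
```

The error bar grows quickly as the score approaches 100%, which is why the 800-900 Elo results are the least reliable of the set.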
Uri Blass
Posts: 10279
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Some handicap results and conclusions.

Post by Uri Blass »

lkaufman wrote:I ran some handicap self-play matches using the latest Komodo version on single thread, zero Contempt for both sides, ten minutes plus 3 seconds, roughly like bullet chess using 24 cores at once. Roughly 400 games per handicap. Here are the results:

[...]

If we count Komodo as 3400, this would mean that the top five or so players after Carlsen would get b2 + c2 for a tossup match (at our standard 45 min plus 15 sec level), a 2600 FIDE GM would get b7 + c7, a 2400 FIDE IM would get N for P, a 2200 FIDE FM would get N and White for P, and a 2000 FIDE player would get knight odds. Does this scale sound about fair?
I would like to see computers play without the b1 knight in regular FIDE-rated tournaments (or, if not FIDE-rated, at least at a 90+30 time control against players who have a FIDE rating).

I think that only this way, and not through matches, can we get a better estimate of Komodo's real strength against humans at knight odds.

Suppose some country decided to allow Komodo to play without a knight in tournaments for national rating (where some of the players have FIDE ratings below 1800 and some above 2200).

Could we expect, in that case, to see Komodo playing in such a tournament?

The idea is that the rules should not allow a human to refuse to play Komodo, and humans would still go for prizes knowing in advance that they might have to play Komodo.
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Some handicap results and conclusions.

Post by Lyudmil Tsvetkov »

Laskos wrote:
[...]

    Like you, I also found that Komodo's eval undervalues a bit material advantage compared to outcome in games. I don't know if this can be used as a guide to modify evaluation.

    400 games are a bit few for 800+ ELO points handicap, the errors are large. The result can depend on several draws and losses.

    You play into the strong points of humans, giving them almost purely material advantages. Humans against engines like material. On the other hand, 5-6 movers or "Crazy King" in the middle of the board positional advantages are very hard to convert for even strong humans, and are spectacular (if a bit clownish). But these are dependant heavily on preparation.
Komodo, just like SF, greatly overvalues mobility, when 1 or 2 pawns are actually a much weightier factor, but I am certain they will not be able to reproduce this in testing, as those terms are part of a larger pool.

In any case, both SF and Komodo would assess some 90% of highly relevant opening positions in which one side is a pawn or two up but badly developed/less mobile as advantageous for the side with better development/mobility, when the side with the extra material might even be winning.

Example position, arising in the Scandinavian after 1.e4 d5 2.exd5 Nf6 3.c4 c6 4.dxc6 Nxc6:

[d]r1bqkb1r/pp2pppp/2n2n2/8/2P5/8/PP1P1PPP/RNBQKBNR w KQkq - 0 5

SF thinks Black is better here, and I guess Komodo would assess it similarly, when actually the advantage is with White.
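What can be checked mechanically is what the position itself contains. A small sketch, using the conventional 1/3/3/5/9 piece values and counting knights and bishops off their back rank as a crude, hypothetical proxy for development; it does not reproduce how SF or Komodo score mobility:

```python
# Conventional piece values in pawns (an illustrative choice, not an engine's).
VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}

FEN = "r1bqkb1r/pp2pppp/2n2n2/8/2P5/8/PP1P1PPP/RNBQKBNR w KQkq - 0 5"


def material_balance(fen: str) -> int:
    """White material minus Black material, in pawns, from the FEN board field."""
    total = 0
    for ch in fen.split()[0]:
        value = VALUES.get(ch.lower())  # digits and '/' are not pieces
        if value is not None:
            total += value if ch.isupper() else -value
    return total


def developed_minors(fen: str) -> tuple:
    """(White, Black) knights and bishops no longer on their own back rank."""
    ranks = fen.split()[0].split("/")  # ranks[0] is rank 8, ranks[7] is rank 1
    white = black = 0
    for index, rank in enumerate(ranks):
        for ch in rank:
            if ch in "NB" and index != 7:
                white += 1
            elif ch in "nb" and index != 0:
                black += 1
    return white, black


print("material (White - Black):", material_balance(FEN))
print("minor pieces developed (White, Black):", developed_minors(FEN))
```

This matches the move order: White is a pawn up, while Black has both knights out and White has no minor piece developed.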
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Some handicap results and conclusions.

Post by Laskos »

Lyudmil Tsvetkov wrote:
example position, arising in the Scandinavian after e4 d5 ed5 Nf6 c4 c6 dc6 Nc6:

[d]r1bqkb1r/pp2pppp/2n2n2/8/2P5/8/PP1P1PPP/RNBQKBNR w KQkq - 0 5

SF thinks black is better here, I guess Komodo would assess it similarly, when actually the advantage is with white.
Good catch! Komodo too evaluates that as very close to 0.00, sometimes even as a small Black advantage. I think most strong humans would prefer White.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Some handicap results and conclusions.

Post by lkaufman »

Lyudmil Tsvetkov wrote:
lkaufman wrote:I ran some handicap self-play matches using the latest Komodo version on single thread, zero Contempt for both sides, ten minutes plus 3 seconds, roughly like bullet chess using 24 cores at once. Roughly 400 games per handicap. Here are the results:

[...]

This gives us an eight step handicap system, with each handicap corresponding to roughly a class (200 elo) difference when Komodo is the handicap giver. One class is the c2 pawn; two classes is the c7 pawn, three classes is the b2 + c2 pawns, four classes is the b7 + c7 pawns, five classes is the b1 knight for the c7 pawn, six classes is the b8 knight for the c2 pawn, seven classes is the b1 knight, and eight classes is the b8 knight. These handicaps roughly agree with the results of the 20 normal chess handicap matches held to date, given the huge error margins involved.

If we count Komodo as 3400, this would mean that the top five or so players after Carlsen would get b2 + c2 for a tossup match (at our standard 45 min plus 15 sec level), a 2600 FIDE GM would get b7 + c7, a 2400 FIDE IM would get N for P, a 2200 FIDE FM would get N and White for P, and a 2000 FIDE player would get knight odds. Does this scale sound about fair?
great maths here.

why would N for P vs N account for only 200 elo difference in strength?

I guess N for P is at least 2 times more difficult to win, especially when a human is involved, so maybe more like 400 elo span between both.

I bet Carlsen will win with above odds.
The table is based on pawn = 400, knight = 4 pawns = 1600, one tempo = half a pawn = 200, and first move = half a tempo = 100. So knight for pawn is 400 away from knight, but switching colors narrows the gap to 200. My real opinion is that a tempo is worth about 0.4 pawn in the opening, so 160, and the first move is therefore 80 instead of 100, which would make the step from N and White for Pawn to N equal to 240. But I wanted to keep it simple, something anyone can remember and reproduce easily.
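The table can be written out directly. The sketch reads the notation as I understand it: a handicap on White's side (c2, b2, b1) means Komodo plays White and keeps the first move, while one on Black's side (c7, b7, b8) means Komodo plays Black and the human gets the first move. Function and key names are illustrative:

```python
PAWN = 400
KNIGHT = 4 * PAWN  # 1600


def ladder(first_move: int) -> dict:
    """Elo value of each handicap, Komodo giving, using the table from the post.

    Komodo with White keeps the first-move bonus (subtract it);
    Komodo with Black hands it to the human (add it).
    """
    return {
        "c2 pawn": PAWN - first_move,
        "c7 pawn": PAWN + first_move,
        "b2 + c2 pawns": 2 * PAWN - first_move,
        "b7 + c7 pawns": 2 * PAWN + first_move,
        "b1 knight for c7 pawn": KNIGHT - PAWN - first_move,
        "b8 knight for c2 pawn": KNIGHT - PAWN + first_move,
        "b1 knight": KNIGHT - first_move,
        "b8 knight": KNIGHT + first_move,
    }


def steps(values: dict) -> list:
    """Differences between consecutive handicaps, in ladder order."""
    v = list(values.values())
    return [b - a for a, b in zip(v, v[1:])]


simple = ladder(first_move=100)  # tempo = half a pawn = 200, first move = 100
print(steps(simple))             # [200, 200, 200, 200, 200, 200, 200]

refined = ladder(first_move=80)  # tempo = 0.4 pawn = 160, first move = 80
print(refined["b1 knight"] - refined["b8 knight for c2 pawn"])  # 240
```

With the refined tempo value the steps alternate between 160 and 240, which is the unevenness the simplified table smooths over.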
As for Carlsen vs. Komodo at b2 + c2, we only tried this exact handicap once, beating GM Mikhalevski, but he is about 300 elo below Carlsen. Maybe as Kai says Komodo (and Stockfish) are not really quite 3400 in human Elo, so perhaps even if my handicap values are exactly right, Carlsen would indeed be favored at that handicap. But I'd bet the ranch (if I had one) on Komodo vs Carlsen at the next lower handicap, c7.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Some handicap results and conclusions.

Post by lkaufman »

Laskos wrote:
[...]

Several remarks:
  • The ELO differences depend (in scaling) on time control. Generally humans improve more with longer TC. And a game in 45 min might be a bit different from a game in 120 min, but the latter is not very practical to watch.

    3400 FIDE in your regular conditions seems a bit high, I found maybe 3300 FIDE.

    Like you, I also found that Komodo's eval undervalues a bit material advantage compared to outcome in games. I don't know if this can be used as a guide to modify evaluation.

    400 games are a bit few for 800+ ELO points handicap, the errors are large. The result can depend on several draws and losses.

    You play into the strong points of humans, giving them almost purely material advantages. Humans against engines like material. On the other hand, 5-6 movers or "Crazy King" in the middle of the board positional advantages are very hard to convert for even strong humans, and are spectacular (if a bit clownish). But these are dependant heavily on preparation.
I am thinking that for future matches we could switch to triple time odds plus material, especially if I get even better hardware soon, as is likely. Perhaps the human would get 90' + 30" (the FIDE minimum standard time control for titled players) and Komodo 30' + 10". Games would be 33% longer than at the current time control, a bit long but not unreasonable. The time odds would approximate what would happen with equal time on a good but affordable home computer. Of course the handicaps would have to be smaller for a given Elo opponent than at the 45' level: Komodo would play about the same, but the human would play much better. This way, no more excuses for humans.
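Both figures in this proposal are easy to check. Assuming each side may use its whole base time plus the increment on every move (an upper bound on game length, and my simplification), the proposed control is exactly 4/3 of the current one whatever the game length, because the combined base time (120 vs 90 min) and the combined increment (40 vs 30 s) both grow by 4/3:

```python
from fractions import Fraction


def max_game_minutes(tc_a, tc_b, moves):
    """Upper bound on game length: both sides spend base + moves * increment.

    Each time control is (base_minutes, increment_seconds).
    """
    total = Fraction(0)
    for base_min, inc_sec in (tc_a, tc_b):
        total += base_min + Fraction(moves * inc_sec, 60)
    return total


CURRENT = ((45, 15), (45, 15))   # both sides 45' + 15"
PROPOSED = ((90, 30), (30, 10))  # human 90' + 30", Komodo 30' + 10"

human, komodo = PROPOSED
print("time odds:", Fraction(human[0], komodo[0]), Fraction(human[1], komodo[1]))
for moves in (40, 60, 80):
    ratio = max_game_minutes(*PROPOSED, moves) / max_game_minutes(*CURRENT, moves)
    print(moves, "moves:", ratio)
```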
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Some handicap results and conclusions.

Post by lkaufman »

Uri Blass wrote:
lkaufman wrote:I ran some handicap self-play matches using the latest Komodo version on single thread, zero Contempt for both sides, ten minutes plus 3 seconds, roughly like bullet chess using 24 cores at once. Roughly 400 games per handicap. Here are the results:

f2 handicap: White lost by 390 elo
f7 handicap: White won by 668 elo
f7 handicap with e4 played and WTM: White won by 931 elo.
c2 plus f2 handicap: White lost by 831 elo.
Exchange handicap (a1 for b8, insert ..Rb8): White lost by 619 elo.

These values look very reasonable to me, both relative to each other and in terms of how Komodo actually did in matches with human GMs.

Next I ran each handicap position for 2 1/2 minutes on six threads of fast laptop using latest Komodo in infinite mode (no Contempt). Here are the evals I got, in the above order: -1.00, +1.45, +2.30, -1.77, -1.36. Multiplying these by 440 each for scaling gives -440, +638, +1012, -779, -598. Reasonably close agreement, surely within sample error, but with some tendency for material to count for more in the results than in the evals.

Knight odds (b1) gave an eval of -3.32, which when multiplied by the above 440 scaling factor gives -1461. Since Komodo has actually performed somewhere in the lower 1900s FIDE at knight odds over the past two years, this implies an average rating of Komodo of about 3400 during that period, reasonable enough.
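The rating estimate adds the handicap's Elo value to Komodo's performance at knight odds. The post only says "lower 1900s", so the sketch below tries a few assumed values. `implied_engine_rating` is a hypothetical helper.

```python
SCALE = 440               # Elo per pawn, as in the post
KNIGHT_ODDS_EVAL = -3.32  # Komodo's eval of the knight-odds (b1) start position


def implied_engine_rating(odds_performance, eval_pawns=KNIGHT_ODDS_EVAL, scale=SCALE):
    """Engine rating implied by its performance at a handicap worth |eval| * scale Elo."""
    handicap_elo = round(abs(eval_pawns) * scale)  # 1461 for knight odds
    return odds_performance + handicap_elo


# "Lower 1900s": the exact figure is not given, so try a few assumed values.
for perf in (1900, 1930, 1960):
    print(perf, implied_engine_rating(perf))
```

Any performance in that range puts the implied rating within a few dozen Elo of 3400.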

Based on the above plus some other tests I ran, I think I can now propose a series of handicaps of fairly increasing magnitude that seem ideal to me. I start with these premises:
1. We want the game to be as close as possible to normal chess, so never remove more than two chessmen.
2. The handicaps should be basically material, with neither side having a major positional advantage. Our experience has shown that humans are much better at exploiting material handicaps, and also that dynamic handicaps (involving king exposure) depend too much on preparation.
3. Pawn handicaps should neither expose the king nor aid development more than minimally. This means only the "b" and "c" pawns.
4. In the opening position, a knight is worth four pawns. This seems to be quite an accurate and reliable rule.
5. Given a choice, the "c" pawn makes the most suitable pawn handicap.

This gives us an eight-step handicap system, with each handicap corresponding to roughly a class (200 elo) difference when Komodo is the handicap giver. One class is the c2 pawn; two classes is the c7 pawn; three classes is the b2 + c2 pawns; four classes is the b7 + c7 pawns; five classes is the b1 knight for the c7 pawn; six classes is the b8 knight for the c2 pawn; seven classes is the b1 knight; and eight classes is the b8 knight. These handicaps roughly agree with the results of the 20 normal chess handicap matches held to date, given the huge error margins involved.

If we count Komodo as 3400, this would mean that the top five or so players after Carlsen would get b2 + c2 for a tossup match (at our standard 45 min plus 15 sec level), a 2600 FIDE GM would get b7 + c7, a 2400 FIDE IM would get N for P, a 2200 FIDE FM would get N and White for P, and a 2000 FIDE player would get knight odds. Does this scale sound about fair?
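The ladder and the rating examples fit together as a lookup: take the rating gap, divide by one class (200 Elo), and pick that step. Below is a minimal sketch under that assumption. The material column uses premise 4 (a knight is worth four pawns, so knight for pawn nets three), and the side with White follows from which wing's piece is removed. `LADDER` and `suggested_handicap` are hypothetical names.

```python
CLASS_ELO = 200  # one ladder step is roughly one class when Komodo gives the odds

# (handicap, net pawns Komodo gives with knight = 4 pawns, who has White)
LADDER = [
    ("c2 pawn", 1, "Komodo"),
    ("c7 pawn", 1, "human"),
    ("b2 + c2 pawns", 2, "Komodo"),
    ("b7 + c7 pawns", 2, "human"),
    ("b1 knight for c7 pawn", 3, "Komodo"),
    ("b8 knight for c2 pawn", 3, "human"),
    ("b1 knight", 4, "Komodo"),
    ("b8 knight", 4, "human"),
]


def suggested_handicap(opponent_rating, engine_rating=3400):
    """Pick the ladder step whose size (steps * CLASS_ELO) best matches the gap."""
    steps = round((engine_rating - opponent_rating) / CLASS_ELO)
    if steps <= 0:
        return "no handicap"
    if steps > len(LADDER):
        return "more than the ladder covers"
    name, _, white = LADDER[steps - 1]
    return f"{name} ({white} has White)"


for fide in (2800, 2600, 2400, 2200, 2000):
    print(fide, suggested_handicap(fide))
```

With Komodo at 3400 this reproduces the pairings in the post: b2 + c2 at 2800, b7 + c7 at 2600, N for P at 2400, N and White for P at 2200, and knight odds at 2000. Python's `round()` sends exact half-steps to the even step, so ratings midway between two steps are a judgment call.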
I would like to see computers playing without the b1 knight in regular tournaments for FIDE rating (or, if not for FIDE rating, at a time control of 90+30 against players who have a FIDE rating).

I think that only this way, and not through matches, can we get a better estimate of Komodo's real strength against humans at knight odds.

Suppose some country decides to allow Komodo to play without a knight in a tournament for national rating (where some of the players have FIDE ratings below 1800 and some have FIDE ratings above 2200).

Could we expect to see Komodo playing in such a tournament?

The idea is that the rules should not allow humans to refuse to play Komodo, and humans will still enter for the prizes knowing in advance that they may have to play Komodo.
I would certainly cooperate if such an event were organized, but we wouldn't be willing to sponsor it other than awarding Komodo copies to winners or something like that. One practical problem is that you would need either many computers or one like my 24-core machine with only one core per game. So it wouldn't even answer the question of what Komodo could do at full strength.
lkaufman

Re: Some handicap results and conclusions.

Post by lkaufman »

Laskos wrote:
Lyudmil Tsvetkov wrote:
Example position, arising in the Scandinavian after 1.e4 d5 2.exd5 Nf6 3.c4 c6 4.dxc6 Nxc6:

[d]r1bqkb1r/pp2pppp/2n2n2/8/2P5/8/PP1P1PPP/RNBQKBNR w KQkq - 0 5

SF thinks Black is better here, and I guess Komodo would assess it similarly, when actually the advantage is with White.
Good catch! Komodo too evaluates that as very close to 0.00, sometimes even as a small Black advantage. I think most strong humans would prefer White.
I don't think this is a "good catch". Strong human players avoid this variation as White like the plague. Only four games in my strong humans book, 50% score, and a losing score for White in six games in the Hiarcs powerbook (mostly engine games). Grandmasters would say Black has full compensation here. Whether White or Black is "better" is very hard to say, but the score should surely be quite close to zero, since even if White is for choice he has nothing like his normal first move advantage.
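The disagreement is over whether Black's compensation is worth White's extra pawn, so a near-zero eval means the engine rates the compensation as roughly a full pawn. A quick material count from the FEN confirms the pawn. This is a minimal sketch; the piece values are conventional textbook values chosen for illustration, not Komodo's, and `material`/`pawns` are hypothetical helpers.

```python
# Conventional piece values in pawns; an illustrative assumption, not Komodo's values.
VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}


def material(fen):
    """(White, Black) material in pawn units from the piece-placement field of a FEN."""
    placement = fen.split()[0]
    white = black = 0
    for ch in placement:
        value = VALUES.get(ch.lower())
        if value is None:  # digits (empty squares) and '/' (rank separators)
            continue
        if ch.isupper():
            white += value
        else:
            black += value
    return white, black


def pawns(fen):
    """(White, Black) pawn counts."""
    placement = fen.split()[0]
    return placement.count("P"), placement.count("p")


fen = "r1bqkb1r/pp2pppp/2n2n2/8/2P5/8/PP1P1PPP/RNBQKBNR w KQkq - 0 5"
print(material(fen), pawns(fen))  # (38, 37) (7, 6): White is a pawn up
```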