Evaluating the handicap

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Evaluating the handicap

Post by Laskos »

Using the starting positions from several Komodo handicap matches, I tried to estimate the Elo value of each handicap at the time control typical for these matches, 45'+30'', on strong hardware. To do this, I ran Fritz 15 Monte Carlo games from each handicap position (at least 1000 games per data point), with Fritz at depth=13. I then rescaled the results to the time control used in the handicap matches, took the Komodo 10.1 eval after 1 minute of analysis on four cores for each position, and plotted the following:

Image

I fitted a quadratic function, as a linear one doesn't fit very well. It is interesting to see that the dots don't scatter much: the Komodo 10.1 eval seems to track the Elo value of the handicap quite precisely. Anyone who wants to know the value of the handicap for a given position can look at the Komodo eval and translate it to an Elo advantage for the given time control and hardware.
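
As a rough illustration of the fitting step described above, here is a minimal sketch that compares a linear and a quadratic least-squares fit and then uses the quadratic to translate an eval into an Elo figure. The data points are hypothetical placeholders generated from an arbitrary quadratic, not the values from the plot, and the helper names `fit_poly` and `eval_to_elo` are made up for this example.

```python
import numpy as np

# Hypothetical placeholder data, NOT the values from the plot:
# evals in pawns, Elo values generated from an arbitrary quadratic.
evals = np.array([0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0])
elo = 40.0 * evals**2 + 150.0 * evals + 20.0


def fit_poly(x, y, degree):
    """Least-squares polynomial fit; returns (coefficients, RMS residual)."""
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    rms = float(np.sqrt(np.mean(residuals**2)))
    return coeffs, rms


# Fit both models to the same points so their residuals can be compared.
lin_coeffs, lin_rms = fit_poly(evals, elo, 1)
quad_coeffs, quad_rms = fit_poly(evals, elo, 2)


def eval_to_elo(engine_eval, coeffs=quad_coeffs):
    """Translate an engine eval into an Elo value using the fitted curve."""
    return float(np.polyval(coeffs, engine_eval))


print("linear RMS residual:   ", lin_rms)
print("quadratic RMS residual:", quad_rms)
print("Elo at eval 2.0:       ", eval_to_elo(2.0))
```

With real data the quadratic would not fit exactly as it does here; the RMS residual of each fit is what shows whether the extra term is worth it. The resulting curve only applies to the time control and hardware the data were scaled to.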
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Evaluating the handicap

Post by Laskos »

I tried the same with the latest Stockfish dev; the dots seem to scatter more. In both plots there is some noise from the eval varying with analysis time and from the limited accuracy of the Monte Carlo results.

Image
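
To give a feel for the Monte Carlo accuracy mentioned above, here is a minimal sketch of the standard logistic conversion from a match score to an Elo difference, with an approximate standard error obtained by the delta method. It is an assumption that this conversion applies here: the thread does not say how Fritz's Monte Carlo results were converted or rescaled, and the function names and game counts below are hypothetical.

```python
import math


def score_to_elo(score):
    """Elo difference implied by an expected score under the logistic model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_error(wins, draws, losses):
    """Return (Elo estimate, approximate standard error) from game counts."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result, where a game scores 1, 0.5 or 0.
    variance = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / n
    se_score = math.sqrt(variance / n)
    # Delta method: d(Elo)/d(score) = 400 / (ln(10) * score * (1 - score)).
    slope = 400.0 / (math.log(10.0) * score * (1.0 - score))
    return score_to_elo(score), slope * se_score


# Hypothetical counts: same win/draw/loss proportions, 1000 vs 4000 games.
elo_1k, err_1k = elo_with_error(600, 200, 200)
elo_4k, err_4k = elo_with_error(2400, 800, 800)
print(f"1000 games: {elo_1k:.1f} +/- {err_1k:.1f} Elo")
print(f"4000 games: {elo_4k:.1f} +/- {err_4k:.1f} Elo")
```

At a fixed score the error shrinks with the square root of the number of games, so quadrupling the games halves it. Note that the slope term grows as the score approaches 0 or 1, so very lopsided handicaps are harder to pin down in Elo terms for the same number of games.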
lkaufman
Posts: 6299
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: Evaluating the handicap

Post by lkaufman »

Yes, I also noticed the strong correlation between Komodo evals of these handicaps and the results of MC playouts. The rating values look quite realistic to me. It would be nice to add a couple more handicaps we have used quite a bit, namely Exchange plus first move, and "pawn and two", which we define as no f7 pawn, white "e" pawn on e4, WTM. We clearly need more handicaps between roughly 700 Elo and 1500 Elo. We have tried f7 + g7, pawn and 3 moves (no f7, e4 and d4, WTM), and N for P (we tried c7, d7, e7 and f7). Exchange plus a2 was also planned.
Komodo rules!
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Evaluating the handicap

Post by Laskos »

I am testing some of those now and will have results tomorrow.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Evaluating the handicap

Post by Laskos »

Done, and everything seems to fit nicely with the Komodo eval. There is only one outlier (marked in red): the f7 pawn and 2 moves (d4, e4).

Image

I am now checking the outlier a bit more.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Evaluating the handicap

Post by Laskos »

I checked the outlier with more games (3000+). It has moved a bit closer to the fit (red dot for f7 + 2 moves), but it still looks like a bit of an outlier. Komodo seems to over-evaluate this handicap slightly.

Image
lkaufman
Posts: 6299
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: Evaluating the handicap

Post by lkaufman »

The fit is really remarkable (except for the one outlier). I think the problem with f7 + two extra moves (traditionally called "pawn and three moves", counting getting White as one) is that fully exploiting it requires very precise play, beyond the level of the MC playouts at 13 ply. I think this means it is not as suitable a handicap as the others; results depend too much on how much the opponent prepares. When the handicap is just material, with or without the White pieces, the difference between optimum opening play and "good" opening play is not so large.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Evaluating the handicap

Post by Laskos »

The same fit with Stockfish dev is less than satisfactory:

Image

It seems SF can be tuned a bit for material imbalance.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Evaluating the handicap

Post by Laskos »

I put the evaluations of 15 handicaps by both Komodo and Stockfish on the same plot:

Image