Evaluating the handicap

Laskos · Post by **Laskos** » Fri Sep 30, 2016 10:31 am

With starting positions used in several Komodo handicap matches, I tried to estimate the Elo value of each handicap at the time control typical for these matches, 45'+30'', on strong hardware. To do this, I ran the handicap positions through Fritz 15 Monte Carlo games (at least 1000 games per datapoint), with Fritz at depth=13. After rescaling the results to the time control used in the handicap matches, I took the Komodo 10.1 eval of each position after 1 minute of analysis on four cores and plotted the following:

I fitted a quadratic function, as a linear one doesn't fit very well. Interestingly, the dots don't scatter much: the Komodo 10.1 eval seems to follow the Elo value of the handicap quite precisely. If one wants to know the value of the handicap for a given position, one can safely look at the Komodo eval and translate it into an Elo advantage for the given time control and hardware.
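
A quadratic fit of handicap Elo against eval can be done with ordinary least squares. Below is a minimal, dependency-free sketch (the helper names are hypothetical; `numpy.polyfit(x, y, 2)` would do the same job). The post does not say whether the fit was forced through the origin, so this version assumes a free intercept. The RMS residual is one simple way to put a number on "how much the dots scatter", for example when comparing Komodo and Stockfish evals on the same positions.

```python
import math
from typing import Sequence, Tuple

Coef = Tuple[float, float, float]

def fit_quadratic(xs: Sequence[float], ys: Sequence[float]) -> Coef:
    """Least-squares fit y ~ a + b*x + c*x^2 via the 3x3 normal equations."""
    # Power sums S_k = sum(x^k), k = 0..4, and T_k = sum(x^k * y), k = 0..2.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum((x ** k) * y for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented normal-equation matrix [A | t] with A[i][j] = S_{i+j}.
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for i in reversed(range(3)):
        rest = sum(m[i][j] * coef[j] for j in range(i + 1, 3))
        coef[i] = (m[i][3] - rest) / m[i][i]
    return coef[0], coef[1], coef[2]

def eval_to_elo(ev: float, coef: Coef) -> float:
    """Translate an engine eval (in pawns) into an Elo advantage via the fit."""
    a, b, c = coef
    return a + b * ev + c * ev * ev

def rms_residual(xs: Sequence[float], ys: Sequence[float], coef: Coef) -> float:
    """Root-mean-square distance of the datapoints from the fitted curve."""
    sq = [(y - eval_to_elo(x, coef)) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(sq) / len(sq))
```

In use, `xs` would hold the evals of the handicap positions and `ys` the Elo values from the playouts; once fitted, `eval_to_elo` gives the translation the post describes, valid only for the time control and hardware the data came from.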

Laskos · Post by **Laskos** » Fri Sep 30, 2016 10:35 am

I tried the same with the latest Stockfish dev; the dots seem to scatter more. In both plots there is some noise, due both to the eval varying with analysis time and to the limited accuracy of the Monte Carlo results.

lkaufman · Post by **lkaufman** » Fri Sep 30, 2016 4:47 pm

Yes, I also noticed the strong correlation between Komodo evals of these handicaps and the results of MC playouts. The rating values look quite realistic to me. It would be nice to add a couple more handicaps we have used quite a bit, namely Exchange plus first move, and "pawn and two", which we define as no f7 pawn, White "e" pawn on e4, WTM (White to move). We clearly need more handicaps between roughly 700 and 1500 Elo. We have tried f7 + g7, pawn and 3 moves (no f7, e4 and d4, WTM), and N for P (we tried c7, d7, e7 and f7). Exchange plus a2 was also planned.
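
Handicap positions like these are usually handed to engines and GUIs as FEN strings. The sketch below builds them from the standard start position by removing pieces and relocating others; `handicap_fen` and its parameters are hypothetical helpers, and leaving castling rights at `KQkq`, en passant at `-` and the move counters at `0 1` are assumptions, not something stated in the thread.

```python
from typing import List, Sequence, Tuple

START_PLACEMENT = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"

def expand(placement: str) -> List[List[str]]:
    """FEN piece placement -> 8x8 grid; row 0 is rank 8, column 0 is file a."""
    board = []
    for rank in placement.split("/"):
        row: List[str] = []
        for ch in rank:
            row.extend(["."] * int(ch) if ch.isdigit() else [ch])
        board.append(row)
    return board

def square_index(sq: str) -> Tuple[int, int]:
    """Algebraic square like 'e4' -> (row, column) in the grid above."""
    return 8 - int(sq[1]), ord(sq[0]) - ord("a")

def compress(board: List[List[str]]) -> str:
    """8x8 grid -> FEN piece placement, run-length encoding empty squares."""
    ranks = []
    for row in board:
        out, empty = "", 0
        for ch in row:
            if ch == ".":
                empty += 1
                continue
            if empty:
                out, empty = out + str(empty), 0
            out += ch
        ranks.append(out + (str(empty) if empty else ""))
    return "/".join(ranks)

def handicap_fen(remove: Sequence[str] = (),
                 moves: Sequence[Tuple[str, str]] = (),
                 side_to_move: str = "w",
                 castling: str = "KQkq") -> str:
    """Start position minus `remove`, with pieces relocated per `moves`."""
    board = expand(START_PLACEMENT)
    for sq in remove:
        r, f = square_index(sq)
        board[r][f] = "."
    for src, dst in moves:
        r1, f1 = square_index(src)
        r2, f2 = square_index(dst)
        board[r2][f2], board[r1][f1] = board[r1][f1], "."
    return f"{compress(board)} {side_to_move} {castling} - 0 1"
```

For "pawn and two" as defined above, `handicap_fen(remove=["f7"], moves=[("e2", "e4")])` yields `rnbqkbnr/ppppp1pp/8/8/4P3/8/PPPP1PPP/RNBQKBNR w KQkq - 0 1`; adding `("d2", "d4")` to `moves` gives the "pawn and 3 moves" position.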

Laskos · Post by **Laskos** » Fri Sep 30, 2016 10:04 pm

I am now testing some of those and will have the results tomorrow.

Laskos · Post by **Laskos** » Sat Oct 01, 2016 12:23 pm

Done. Everything seems to fit nicely with the Komodo eval, with only one outlier (marked in red in the legend): the f7 pawn and 2 moves (d4, e4).

I am now checking the outlier more closely.

Laskos · Post by **Laskos** » Sat Oct 01, 2016 4:07 pm

I checked the outlier with more games (3000+). Its value has shifted somewhat (red dot for f7 + 2 moves), but it still looks like a bit of an outlier: Komodo seems to overevaluate this handicap slightly.
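
The thread does not say exactly how the playout results were converted to Elo. A common choice, assumed here, is the logistic Elo model, in which a score s corresponds to an Elo difference of -400·log10(1/s - 1). The sketch below (hypothetical function names, normal-approximation error bars) shows why moving from about 1000 to 3000+ games per datapoint tightens a suspicious point: the standard error of the score shrinks as 1/sqrt(n). It does not model the rescaling to the match time control.

```python
import math
from typing import Tuple

def score_to_elo(score: float) -> float:
    """Elo difference implied by an expected score under the logistic model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_with_error(wins: int, draws: int, losses: int,
                   z: float = 1.96) -> Tuple[float, float, float]:
    """Return (elo, low, high) for a W/D/L tally from one side's viewpoint.

    The interval uses the per-game score variance and a normal approximation,
    then maps the score bounds through the logistic curve.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Each game scores 1, 0.5 or 0; this is the sample variance of that score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * score ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    eps = 1e-9  # keep the bounds inside (0, 1) so the log stays finite
    low = score_to_elo(max(score - z * se, eps))
    high = score_to_elo(min(score + z * se, 1.0 - eps))
    return score_to_elo(score), low, high
```

For a made-up tally of 600 wins, 200 draws and 200 losses the central estimate is about +147 Elo; tripling every count keeps that estimate but narrows the score interval by a factor of sqrt(3).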

lkaufman · Post by **lkaufman** » Sat Oct 01, 2016 4:50 pm

The fit is really remarkable (except for the one outlier). I think the problem with f7 + two extra moves (traditionally called "pawn and three moves", counting getting White as one) is that fully exploiting it requires very precise play, beyond the level of the MC playouts at 13 ply. I think this means it is not as suitable a handicap as the others; results depend too much on how well the opponent prepares. When the handicap is just material, with or without the White pieces, the difference between optimal opening play and "good" opening play is not so large.

Laskos · Post by **Laskos** » Mon Oct 03, 2016 5:37 am

The same fit with Stockfish dev is less than satisfactory:

It seems SF could be tuned a bit for material imbalances.

Laskos · Post by **Laskos** » Mon Oct 03, 2016 5:54 am

I put the evaluations of 15 handicaps by both Komodo and Stockfish on the same plot:
