To Larry Kaufman

lkaufman · Post by **lkaufman** » Fri Feb 05, 2016 5:41 pm

cdani wrote:
lkaufman wrote: Can you make a general statement as to whether longer time controls favored higher king safety and passed scores or lower ones? The one example you give suggests that longer tc favors higher king safety weights.
For king safety it seems clearer that LTC favors higher king safety. For passed pawns I found some contradictory results, but maybe that is because they were badly tuned before.

We also found that king safety values should be higher for ltc, but the effect was just a few elo, nothing like 50.

Laskos · Post by **Laskos** » Fri Feb 05, 2016 6:10 pm

Jesse Gersenson wrote:
Laskos wrote:
Code: Select all
 (Avg game length = 1.422 sec)
Settings = Gauntlet/32MB/1ms per move/M 900cp for 5 moves, D 150 moves/EPD:2moves_v1.epd(32000)
Time = 2509 sec elapsed, 0 sec remaining
 1.  Komodo 9.3 KS=30         	5122.5/10000	4315-4070-1615  	(L: m=4045 t=0 i=0 a=25)	(D: r=709 i=237 f=523 s=8 a=138)	(tpm=10.9 d=6.03 nps=1746716)
Where are you getting 3ms? It says "Avg game length = 1.422 sec". Are the games averaging 237 moves per game? Assuming 120 ply per game, it'd be 11.85ms per move.

I am not familiar with engine testing, but it seems that an engine using 10x its allotted time makes for an invalid test.

Is there a guideline for the minimum time per move, on one core, for testing?
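The arithmetic in the question above can be checked with a short sketch. The 120-ply game length is the assumption stated in the post, and reading "3ms" as 3 ms per ply is an interpretation, not something the output confirms; the helper names are made up for illustration.

```python
# Figure quoted from the LittleBlitzer output above.
AVG_GAME_SEC = 1.422
# Assumption taken from the post: a typical game lasts about 120 ply.
ASSUMED_PLIES = 120


def ms_per_ply(avg_game_sec, plies):
    """Average time spent per ply, in milliseconds."""
    return avg_game_sec * 1000.0 / plies


def implied_plies(avg_game_sec, ms_each):
    """Number of plies a game would need if each ply took ms_each milliseconds."""
    return avg_game_sec * 1000.0 / ms_each


print(f"{ms_per_ply(AVG_GAME_SEC, ASSUMED_PLIES):.2f} ms per ply")
print(f"{implied_plies(AVG_GAME_SEC, 3.0) / 2:.0f} full moves at 3 ms per ply")
```

Under those assumptions the sketch reproduces the 11.85 ms and 237-move figures from the post.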

Some dexterity is required in testing here, with issues such as overhead, so I will just show that the tests were perfectly fine. In Cutechess I tested at fixed depth, for roughly the same time consumed per game as in the fixed-time matches you don't like. I often prefer fixed time to fixed depth for several reasons, such as being closer to how engines are usually tested and producing more randomized games.

For ultra-ultra-fast games -- fixed depth=5, with a result almost identical to that at 1 ms per move in LittleBlitzer:

Code: Select all

Score of Komodo 9.3 KS=30 vs Komodo 9.3 KS=50: 3896 - 3647 - 2457  [0.512] 10000
ELO difference: 9
Finished match

For ultra-fast games -- fixed depth=10, with a result almost identical to that at 30 ms per move in LittleBlitzer:

Code: Select all

Score of Komodo 9.3 KS=30 vs Komodo 9.3 KS=50: 1367 - 1527 - 2106  [0.484] 5000
ELO difference: -11
Finished match

All this in less than an hour of testing, in order to show the scaling of King Safety in Komodo.

Also, to show that KS=50 is close to optimal at 30 ms per move or depth=10, after I compared it to KS=30, I compared KS=50 to KS=70:
depth=10

Code: Select all

Score of Komodo 9.3 KS=50 vs Komodo 9.3 KS=70: 311 - 241 - 448  [0.535] 1000
ELO difference: 24
Finished match

All the results are highly statistically significant.
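For readers who want to check the reported numbers, here is a minimal sketch that recomputes the Elo difference and a z-score from the win-loss-draw counts in the three matches above. It assumes the standard logistic Elo formula and a normal approximation with the per-game variance taken from the observed trinomial outcome; the function names are hypothetical, and this is not necessarily how Cutechess or LittleBlitzer compute their output.

```python
import math


def score_fraction(wins, losses, draws):
    """Fraction of points scored by the first engine."""
    games = wins + losses + draws
    return (wins + 0.5 * draws) / games


def elo_from_score(score):
    """Logistic Elo difference implied by a score fraction strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def z_score(wins, losses, draws):
    """Distance of the score from 50%, in standard errors (normal approximation)."""
    games = wins + losses + draws
    s = score_fraction(wins, losses, draws)
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    variance = (wins * (1.0 - s) ** 2
                + losses * s ** 2
                + draws * (0.5 - s) ** 2) / games
    return (s - 0.5) / math.sqrt(variance / games)


# Win, loss, draw counts copied from the match results above.
matches = {
    "KS=30 vs KS=50, depth=5": (3896, 3647, 2457),
    "KS=30 vs KS=50, depth=10": (1367, 1527, 2106),
    "KS=50 vs KS=70, depth=10": (311, 241, 448),
}

for name, (w, l, d) in matches.items():
    s = score_fraction(w, l, d)
    print(f"{name}: score={s:.3f} elo={elo_from_score(s):+.1f} z={z_score(w, l, d):+.2f}")
```

The Elo values round to the 9, -11 and 24 shown in the match output. A two-sided test at the usual 5% level corresponds to |z| above about 1.96, so the z column gives a rough check on the significance claim.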

beram · Post by **beram** » Fri Feb 05, 2016 7:58 pm

lkaufman wrote:We would very much like to improve Komodo's blitz/bullet chess, since that's the one area where Stockfish seems to have a small edge. But we don't yet know why Stockfish is stronger at bullet level play, so it's hard to fix this except by generally improving Komodo. We can say it's because our better eval is a bit slower, but we're not that much slower than Stockfish, so there is something else going on that we would love to identify and fix. One clue: we have never been able to make "probcut" work for us, although it seems to work fine in stockfish. No idea why this is so.

Dear Larry,
The CEGT 40/20 or CCRL 40/40 list is not a blitz chess list.
53.6% and 54% for SF7 vs K9.3 on 4 cores on these lists is a small margin at LTC chess.

http://www.computerchess.org.uk/ccrl/40 ... 4-bit_4CPU
http://www.husvankempen.de/nunn/40_40%2 ... ons/2.html

cdani · Post by **cdani** » Fri Feb 05, 2016 8:23 pm

lkaufman wrote: We also found that king safety values should be higher for ltc, but the effect was just a few elo, nothing like 50.

I was referring to the added effect of tuning all the parameters for a given time control, i.e., testing until one finds the optimal values for that time control. A lot of work.

50 was an arbitrary high value that I think one can achieve with such a testing method, probably even more.

But as most people do not tune for a given tc, the result is that some parameters are better for longer or maybe shorter tc, and the engine ends up at some midpoint, most likely wherever its author's testing method tends to take it.

JJJ · Post by **JJJ** » Fri Feb 05, 2016 8:45 pm

beram wrote:
Dear Larry,
The CEGT 40/20 or CCRL 40/40 list is not a blitz chess list.
53.6% and 54% for SF7 vs K9.3 on 4 cores on these lists is a small margin at LTC chess.

http://www.computerchess.org.uk/ccrl/40 ... 4-bit_4CPU
http://www.husvankempen.de/nunn/40_40%2 ... ons/2.html

You're always worried about Komodo being weaker than Stockfish. Of course Komodo 9.3 or 9.2 is weaker than Stockfish 7 under these conditions. There are many reasons, the most important being that Stockfish 7 is more recent.

Just wait, as usual, for the next Komodo to be better than Stockfish 7 on these rating lists; you know it's going to happen soon.

And in the meantime, Komodo is still the best under TCEC conditions for sure.

lkaufman · Post by **lkaufman** » Sat Feb 06, 2016 12:24 am

beram wrote:
Dear Larry,
The CEGT 40/20 or CCRL 40/40 list is not a blitz chess list.
53.6% and 54% for SF7 vs K9.3 on 4 cores on these lists is a small margin at LTC chess.

http://www.computerchess.org.uk/ccrl/40 ... 4-bit_4CPU
http://www.husvankempen.de/nunn/40_40%2 ... ons/2.html

The overall ratings for top SF and top Komodo on these two lists are essentially tied. The overall ratings are far more important statistically than the individual match result. For whatever reason, Komodo always seems to do better in the rating lists relative to SF than in direct matches. Perhaps it's just some stylistic thing, or perhaps it's related to contempt, or both. Anyway it's up to each person to decide whether ratings against a variety of opponents or the results of direct matches are more important.

beram · Post by **beram** » Sat Feb 06, 2016 9:10 am

lkaufman wrote:
The overall ratings for top SF and top Komodo on these two lists are essentially tied. The overall ratings are far more important statistically than the individual match result. For whatever reason, Komodo always seems to do better in the rating lists relative to SF than in direct matches. Perhaps it's just some stylistic thing, or perhaps it's related to contempt, or both. Anyway it's up to each person to decide whether ratings against a variety of opponents or the results of direct matches are more important.

The overall ratings are indeed essentially tied. But the individual match results of Stockfish are better on all these lists!
So saying that Stockfish only has a small edge in blitz and bullet is denying the truth or twisting it.

lkaufman · Post by **lkaufman** » Sat Feb 06, 2016 8:11 pm

beram wrote:
The overall ratings are indeed essentially tied. But the individual match results of Stockfish are better on all these lists!
So saying that Stockfish only has a small edge in blitz and bullet is denying the truth or twisting it.

In blitz, looking at both 4 cpu and 1 cpu results for both CEGT and CCRL at 40/4', Stockfish has roughly a fifteen point lead. I would call this a small edge. Going by direct results the gap is indeed larger. Some of this may go away if it is tested without contempt, since the contempt setting is optimized for a range of opponents well below Komodo level, not for a roughly equal opponent. In any case the fifteen elo gap in blitz becomes zero at the intermediate tc, showing the trend that Komodo gains with more time.

Leto · Post by **Leto** » Sat Feb 06, 2016 8:54 pm

lkaufman wrote:
In blitz, looking at both 4 cpu and 1 cpu results for both CEGT and CCRL at 40/4', Stockfish has roughly a fifteen point lead. I would call this a small edge. Going by direct results the gap is indeed larger. Some of this may go away if it is tested without contempt, since the contempt setting is optimized for a range of opponents well below Komodo level, not for a roughly equal opponent. In any case the fifteen elo gap in blitz becomes zero at the intermediate tc, showing the trend that Komodo gains with more time.

Agreed. At CEGT 40/20 if Komodo 9.2 or Komodo 9.3 were tested with 0 contempt I think it would win by a small margin against Stockfish 7.

In the blitz CEGT testing you can see that Komodo 9.2 0 contempt 12CPU scores 4% higher against Stockfish 7 12CPU than the version with the default contempt of 15 does. At the moment the 0 contempt version is rated 13 elo higher than the default version. It seems to me that contempt is not, at least at the moment, doing Komodo any favors on the rating lists.

beram · Post by **beram** » Sat Feb 06, 2016 9:10 pm

lkaufman wrote:
In blitz, looking at both 4 cpu and 1 cpu results for both CEGT and CCRL at 40/4', Stockfish has roughly a fifteen point lead. I would call this a small edge. Going by direct results the gap is indeed larger. Some of this may go away if it is tested without contempt, since the contempt setting is optimized for a range of opponents well below Komodo level, not for a roughly equal opponent. In any case the fifteen elo gap in blitz becomes zero at the intermediate tc, showing the trend that Komodo gains with more time.

Stockfish rules over Komodo in all the individual match results at CEGT and CCRL, from blitz to 40/20 or 40/40.
With or without contempt, Komodo comes second.
That is just a fact; don't twist it.
