To Larry Kaufman

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

lkaufman
Posts: 6257
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: To Larry Kaufman

Post by lkaufman »

cdani wrote:
lkaufman wrote: Can you make a general statement as to whether longer time controls favored higher king safety and passed scores or lower ones? The one example you give suggests that longer tc favors higher king safety weights.
For king safety it seems clearer that ltc favors higher king safety. For passed pawns I found some contradictory results, but maybe because they were badly tuned before.
We also found that king safety values should be higher for ltc, but the effect was just a few elo, nothing like 50.
Komodo rules!
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: To Larry Kaufman

Post by Laskos »

Jesse Gersenson wrote:
Laskos wrote:

Code: Select all

 (Avg game length = 1.422 sec)
Settings = Gauntlet/32MB/1ms per move/M 900cp for 5 moves, D 150 moves/EPD:2moves_v1.epd(32000)
Time = 2509 sec elapsed, 0 sec remaining
 1.  Komodo 9.3 KS=30         	5122.5/10000	4315-4070-1615  	(L: m=4045 t=0 i=0 a=25)	(D: r=709 i=237 f=523 s=8 a=138)	(tpm=10.9 d=6.03 nps=1746716)
Where are you getting 3ms? It says "Avg game length = 1.422 sec". Are the games averaging 237 moves per game? Assuming 120 ply per game it'd be 11.85ms per move.

I am not familiar with engine testing, but it seems that an engine using 10x its allotted time makes for an invalid test.

Is there a guideline for the minimum time per move, on one core, for testing?
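The arithmetic in the question can be checked with a minimal sketch. It uses only the average game length from the log and the 120-ply game length assumed in the question; the function names are hypothetical, and the result is an average that includes any per-move overhead of the testing tool.

```python
def ms_per_ply(avg_game_seconds: float, plies: int) -> float:
    """Average wall-clock time per ply, in milliseconds."""
    return avg_game_seconds * 1000.0 / plies


def plies_implied(avg_game_seconds: float, ms_each: float) -> float:
    """Number of plies a game would need at a fixed time per ply."""
    return avg_game_seconds * 1000.0 / ms_each


AVG_GAME_SECONDS = 1.422  # "Avg game length" from the LittleBlitzer log

# Assumption from the question: 120 plies per game -> about 11.85 ms each.
print(ms_per_ply(AVG_GAME_SECONDS, 120))

# At 3 ms per ply the same game length implies about 474 plies (237 moves).
print(plies_implied(AVG_GAME_SECONDS, 3.0) / 2)
```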
Some dexterity is required in testing here, with issues such as overhead, so I will just show that the tests were perfectly fine. In Cutechess I tested at fixed depth, with roughly the same time consumed per game as in the fixed-time matches you don't like. I often prefer fixed time to fixed depth for several reasons, such as being closer to how engines are usually tested and producing more randomized games.

For ultra-ultra-fast games -- fixed depth=5, with an almost identical result to the one at 1 ms per move in LittleBlitzer:

Code: Select all

Score of Komodo 9.3 KS=30 vs Komodo 9.3 KS=50: 3896 - 3647 - 2457  [0.512] 10000
ELO difference: 9
Finished match
For ultra-fast games -- fixed depth=10, with an almost identical result to the one at 30 ms per move in LittleBlitzer:

Code: Select all

Score of Komodo 9.3 KS=30 vs Komodo 9.3 KS=50: 1367 - 1527 - 2106  [0.484] 5000
ELO difference: -11
Finished match
All this in less than an hour of testing, in order to show the scaling of King Safety in Komodo.

Also, to show that KS=50 is close to optimal at 30 ms per move or depth=10, after I compared it to KS=30, I compared KS=50 to KS=70:
depth=10

Code: Select all

Score of Komodo 9.3 KS=50 vs Komodo 9.3 KS=70: 311 - 241 - 448  [0.535] 1000
ELO difference: 24
Finished match
All the results are highly statistically significant.
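For anyone who wants to check the figures, here is a minimal sketch of how an Elo difference and a rough significance measure follow from a win-loss-draw record. It assumes the usual logistic Elo formula and a normal approximation with independent games; the function name `elo_and_z` is hypothetical and this is not necessarily how Cutechess computes its output.

```python
import math


def elo_and_z(wins: int, losses: int, draws: int):
    """Return (score, elo_difference, z) for a W-L-D record.

    z is the distance of the score from 50%, measured in standard errors,
    under a normal approximation with independent games.
    """
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    elo = 400.0 * math.log10(score / (1.0 - score))
    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    variance = (wins + 0.25 * draws) / n - score ** 2
    z = (score - 0.5) / math.sqrt(variance / n)
    return score, elo, z


matches = {
    "KS=30 vs KS=50, depth=5": (3896, 3647, 2457),
    "KS=30 vs KS=50, depth=10": (1367, 1527, 2106),
    "KS=50 vs KS=70, depth=10": (311, 241, 448),
}
for name, (w, l, d) in matches.items():
    score, elo, z = elo_and_z(w, l, d)
    print(f"{name}: score={score:.3f} elo={elo:+.1f} z={z:+.2f}")
```

Under these assumptions each of the three results sits close to three standard errors away from an even score.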
beram
Posts: 1187
Joined: Wed Jan 06, 2010 3:11 pm

Re: To Larry Kaufman

Post by beram »

lkaufman wrote: We would very much like to improve Komodo's blitz/bullet chess, since that's the one area where Stockfish seems to have a small edge. But we don't yet know why Stockfish is stronger at bullet level play, so it's hard to fix this except by generally improving Komodo. We can say it's because our better eval is a bit slower, but we're not that much slower than Stockfish, so there is something else going on that we would love to identify and fix. One clue: we have never been able to make "probcut" work for us, although it seems to work fine in Stockfish. No idea why this is so.
Dear Larry,
The CEGT 40/20 and CCRL 40/40 lists are not blitz chess lists.
53.6% and 54% for SF7 vs K9.3 on 4 cores on these lists is a small margin at LTC chess.

http://www.computerchess.org.uk/ccrl/40 ... 4-bit_4CPU
http://www.husvankempen.de/nunn/40_40%2 ... ons/2.html
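To put the two percentages on the same scale as the rating lists, a match score can be converted to an Elo difference. This is a minimal sketch assuming the standard logistic Elo model; the helper name `score_to_elo` is hypothetical, and the conversion ignores the error margins of the individual matches.

```python
import math


def score_to_elo(score: float) -> float:
    """Elo difference implied by a match score strictly between 0 and 1."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


for pct in (53.6, 54.0):
    print(f"{pct}% -> {score_to_elo(pct / 100.0):.1f} Elo")
```

Under this model 53.6% corresponds to about 25 Elo and 54% to about 28 Elo in the head-to-head match.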
cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Re: To Larry Kaufman

Post by cdani »

lkaufman wrote: We also found that king safety values should be higher for ltc, but the effect was just a few elo, nothing like 50.
I was referring to the added effect of tuning all the parameters for a given time control, i.e., testing until one finds the optimal values for that time control. A lot of work :-) 50 was a random high value that I think one can achieve with such a testing method, probably even more.

But as most people do not tune for a given tc, the result is that some parameters are better for longer or maybe shorter tc, and the engine ends up at some midpoint, most likely wherever its author's testing method tends to take it.
JJJ
Posts: 1346
Joined: Sat Apr 19, 2014 1:47 pm

Re: To Larry Kaufman

Post by JJJ »

beram wrote:
Dear Larry,
The CEGT 40/20 and CCRL 40/40 lists are not blitz chess lists.
53.6% and 54% for SF7 vs K9.3 on 4 cores on these lists is a small margin at LTC chess.

http://www.computerchess.org.uk/ccrl/40 ... 4-bit_4CPU
http://www.husvankempen.de/nunn/40_40%2 ... ons/2.html
You're always worried about Komodo being weaker than Stockfish. Of course Komodo 9.3 or 9.2 is weaker than Stockfish 7 under these conditions. There are many reasons; the most important is that Stockfish 7 is more recent.

Just wait, as usual, for the next Komodo to be better than Stockfish 7 on these rating lists; you know it's going to happen soon.

And in the meantime, Komodo is still the best under TCEC conditions for sure.
lkaufman
Posts: 6257
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: To Larry Kaufman

Post by lkaufman »

beram wrote:
Dear Larry,
The CEGT 40/20 and CCRL 40/40 lists are not blitz chess lists.
53.6% and 54% for SF7 vs K9.3 on 4 cores on these lists is a small margin at LTC chess.

http://www.computerchess.org.uk/ccrl/40 ... 4-bit_4CPU
http://www.husvankempen.de/nunn/40_40%2 ... ons/2.html
The overall ratings for top SF and top Komodo on these two lists are essentially tied. The overall ratings are far more important statistically than the individual match result. For whatever reason, Komodo always seems to do better in the rating lists relative to SF than in direct matches. Perhaps it's just some stylistic thing, or perhaps it's related to contempt, or both. Anyway it's up to each person to decide whether ratings against a variety of opponents or the results of direct matches are more important.
Komodo rules!
beram
Posts: 1187
Joined: Wed Jan 06, 2010 3:11 pm

Re: To Larry Kaufman

Post by beram »

lkaufman wrote:
The overall ratings for top SF and top Komodo on these two lists are essentially tied. The overall ratings are far more important statistically than the individual match result. For whatever reason, Komodo always seems to do better in the rating lists relative to SF than in direct matches. Perhaps it's just some stylistic thing, or perhaps it's related to contempt, or both. Anyway it's up to each person to decide whether ratings against a variety of opponents or the results of direct matches are more important.
The overall ratings are indeed essentially tied. But the individual match results of Stockfish are better on all these lists!
So saying that Stockfish only has a small edge in blitz and bullet is denying the truth or twisting it.
lkaufman
Posts: 6257
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: To Larry Kaufman

Post by lkaufman »

beram wrote:
The overall ratings are indeed essentially tied. But the individual match results of Stockfish are better on all these lists!
So saying that Stockfish only has a small edge in blitz and bullet is denying the truth or twisting it.
In blitz, looking at both 4 cpu and 1 cpu results for both CEGT and CCRL at 40/4', Stockfish has roughly a fifteen point lead. I would call this a small edge. Going by direct results the gap is indeed larger. Some of this may go away if it is tested without contempt, since the contempt setting is optimized for a range of opponents well below Komodo level, not for a roughly equal opponent. In any case the fifteen elo gap in blitz becomes zero at the intermediate tc, showing the trend that Komodo gains with more time.
Komodo rules!
Leto
Posts: 2071
Joined: Thu May 04, 2006 3:40 am
Location: Dune

Re: To Larry Kaufman

Post by Leto »

lkaufman wrote:
In blitz, looking at both 4 cpu and 1 cpu results for both CEGT and CCRL at 40/4', Stockfish has roughly a fifteen point lead. I would call this a small edge. Going by direct results the gap is indeed larger. Some of this may go away if it is tested without contempt, since the contempt setting is optimized for a range of opponents well below Komodo level, not for a roughly equal opponent. In any case the fifteen elo gap in blitz becomes zero at the intermediate tc, showing the trend that Komodo gains with more time.
Agreed. At CEGT 40/20 if Komodo 9.2 or Komodo 9.3 were tested with 0 contempt I think it would win by a small margin against Stockfish 7.

In the blitz CEGT testing you can see that Komodo 9.2 with 0 contempt on 12CPU scores 4% higher against Stockfish 7 12CPU than it does with the default contempt of 15. At the moment the 0 contempt version is 13 elo higher than the default version. It would seem to me that contempt is not, at least at the moment, doing Komodo any favors on the rating lists.
beram
Posts: 1187
Joined: Wed Jan 06, 2010 3:11 pm

Re: To Larry Kaufman

Post by beram »

lkaufman wrote:
In blitz, looking at both 4 cpu and 1 cpu results for both CEGT and CCRL at 40/4', Stockfish has roughly a fifteen point lead. I would call this a small edge. Going by direct results the gap is indeed larger. Some of this may go away if it is tested without contempt, since the contempt setting is optimized for a range of opponents well below Komodo level, not for a roughly equal opponent. In any case the fifteen elo gap in blitz becomes zero at the intermediate tc, showing the trend that Komodo gains with more time.
Stockfish rules over Komodo in all the individual match results at CEGT and CCRL, from blitz to 40/20 or 40/40.
With or without contempt, Komodo comes second.
That is just a fact; don't twist it.