SF against Houdini 3 contempt 0

Discussion of computer chess matches and engine tournaments.

Moderator: Ras

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: SF against Houdini 3 contempt 0

Post by Laskos »

lkaufman wrote:
Laskos wrote:
lkaufman wrote:
Laskos wrote:
lkaufman wrote:
Laskos wrote: Yes, the diminishing returns can be quantified. From empirical data, I constructed the scaling functions of Komodo and Houdini in terms of the number of doublings; 20 doublings means 2^20 ~ 1 million nodes per move. The fitted functions (Elo gain per doubling) are 35,000/(number of doublings)^2 for Houdini and 52,000/(number of doublings)^2.1 for Komodo. The limiting value for an infinite number of nodes is the same: assuming blitz is 2,800 Elo, the limit is 4,550 Elo points in both cases.
While I don't disagree with anything you wrote, I would like to know what empirical data you based this on.

Larry
This summer I played self-matches of Houdini and Komodo from 512k vs. 256k nodes per move up to 16384k vs. 8192k nodes per move, then fitted the gain as a function of the number of doublings (similar to http://talkchess.com/forum/viewtopic.ph ... 3&start=12). The fit I presented here is a simplified one and the limiting value is just an estimate; I had better but more complicated fits. The extrapolation to 30-60 doublings is speculative, just to show that Uri's reversal of scaling probably occurs only at some very large number of nodes per move, around 10^15 (50 doublings).
Thanks. What would you estimate is the cumulative value of all five doublings from 4'+1" to (about) 2 hours + 30"? This would be the approximate amount to add to Komodo's blitz list ratings to estimate standard tournament level ratings relative to Houdini.
About 25 Elo points, and about 30 Elo points if blitz is on one core and the long games are on 4 cores.
Thanks. How did you factor in the "compression effect"? In other words, if, let's say, Houdini is 50 Elo above Komodo on some blitz list, this might drop to only 25 at 40/2 hours even if there were no scaling difference, due to more draws and other technical reasons. So would you first divide by 2 (or some other ratio) and then add the 25, or was this already factored into the 25 Elo estimate?
Here I assumed that at the starting point (4'+1'' in this case) the ratings are equal, and that the difference at longer TC comes from scaling alone. The initial rating difference can be handled the same way as the gain from doubling, if we assume a rating difference compresses the same way as the gain from doubling (or, equivalently, that a better measure of a rating difference is the node ratio needed to make the engines equal). So if one engine (Komodo) is weaker to start with at blitz, the 25 Elo points become something larger (add the rating compression from the plot). In our case, if Komodo is weaker by 30 points at 4'+1'', add another 10 points or so, for a total of 35 points at long 2 hours + 30'' games.
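
A minimal Python sketch of the simplified fit, assuming the fitted functions give the Elo gained by the n-th doubling (n = 20 meaning 2^20 nodes per move). The function names, the numerical cutoff, and the choice of doubling 22 as the 4'+1'' blitz starting point are illustrative assumptions, not part of the original fit.

```python
# Simplified per-doubling scaling fits from the thread.
# ASSUMPTIONS: function names, BLITZ_DOUBLING and the cutoff are hypothetical.

def houdini_gain(n: int) -> float:
    """Elo gained by the n-th doubling for Houdini: 35,000 / n^2."""
    return 35_000 / n**2


def komodo_gain(n: int) -> float:
    """Elo gained by the n-th doubling for Komodo: 52,000 / n^2.1."""
    return 52_000 / n**2.1


def gain_over(gain, first: int, count: int) -> float:
    """Total Elo gained over `count` consecutive doublings starting at `first`."""
    return sum(gain(n) for n in range(first, first + count))


def gain_to_limit(gain, first: int, cutoff: int = 1_000_000) -> float:
    """Approximate total Elo still to gain from doubling `first` onwards.

    Both series converge, so truncating at `cutoff` doublings drops only a
    tail well under 1 Elo for these fits.
    """
    return sum(gain(n) for n in range(first, cutoff))


BLITZ_DOUBLING = 22  # assumed doubling index of 4'+1'' blitz (not from the thread)
BLITZ_ELO = 2800     # blitz rating assumed in the thread

if __name__ == "__main__":
    for name, g in (("Houdini", houdini_gain), ("Komodo", komodo_gain)):
        limit = BLITZ_ELO + gain_to_limit(g, BLITZ_DOUBLING)
        print(f"{name}: limiting rating ~ {limit:.0f}")
    # Komodo's extra gain over five doublings (4'+1'' -> about 2 h + 30'').
    extra = (gain_over(komodo_gain, BLITZ_DOUBLING, 5)
             - gain_over(houdini_gain, BLITZ_DOUBLING, 5))
    print(f"Komodo minus Houdini over five doublings: {extra:.1f} Elo")
```

With these assumptions the two series sum to within about 1% of each other (the "equal limit" claim), and Komodo's extra gain over doublings 22-26 comes out at roughly 25 Elo. The absolute limit is sensitive to where blitz sits on the doubling scale, which the simplified fit does not pin down.
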
lkaufman
Posts: 6284
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: SF against Houdini 3 contempt 0

Post by lkaufman »

Laskos wrote:
Here I assumed that at the starting point (4'+1'' in this case) the ratings are equal, and the difference to longer TC is made by the scaling. The initial rating difference could be factored the same as the gain from doubling, if we assume a rating difference compresses the same as the gain from doubling (or that a better measure of rating difference is the relative number of nodes to make them equal). So, if one (Komodo) is weaker to start with at blitz, 25 ELO points will become something larger (add the rating compression from the plot). In our case, if Komodo is weaker by 30 points at 4'+1'', add another 10 points or so, for a total of 35 points to long 2 hours + 30'' games.
That sounds reasonable. I suppose it is roughly proportional, in the sense that if the blitz gap is, for example, 60 instead of 30 Elo, the expected gain would be 25 + 20 = 45 in round numbers. I think this works out to saying that we need to be 37.5 Elo below Houdini at blitz to be equal at 2 hours plus 30". This sounds about right to me.
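
The proportional rule can be written down directly: if a blitz gap of 30 Elo adds about 10 and a gap of 60 adds about 20, the extra is about a third of the gap, and the break-even gap solves gap = 25 + gap/3, i.e. gap = 25 / (2/3) = 37.5 Elo. A small sketch; the function and parameter names are hypothetical.

```python
# Hypothetical helpers for the "25 Elo + one third of the blitz gap" rule of thumb.

def long_tc_gain(blitz_gap: float, base_gain: float = 25.0, share: float = 1 / 3) -> float:
    """Komodo's expected Elo gain on Houdini from 4'+1'' to 2 h + 30''.

    base_gain: gain when the engines start equal at blitz (about 25 Elo).
    share: fraction of the blitz gap recovered at long TC (30 -> ~10, 60 -> ~20).
    """
    return base_gain + share * blitz_gap


def break_even_gap(base_gain: float = 25.0, share: float = 1 / 3) -> float:
    """Blitz deficit exactly cancelled at long TC: solve gap = base_gain + share * gap."""
    return base_gain / (1 - share)


if __name__ == "__main__":
    print(long_tc_gain(30), long_tc_gain(60), break_even_gap())
```
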
beram
Posts: 1187
Joined: Wed Jan 06, 2010 3:11 pm

Re: SF against Houdini 3 contempt 0

Post by beram »

lkaufman wrote:
That sounds reasonable. I suppose it is roughly proportional, in the sense that if the blitz gap is for example 60 instead of 30 elo, the expected gain would be 25 + 20 = 45, in round numbers. I think this works out to saying that we need to be 37.5 elo below Houdini at blitz to be equal at 2 hours plus 30". This sounds about right to me.
And Larry, what if there isn't such a big gap between bullet and rapid?
For instance, how do you fit these match results into your theories?


Houdini 3 vs Komodo 6 match results:

LS list (bullet ~170 s/game)              54.3%   1000 games
CCRL 4/40 4cpu (blitz TC game)            54%       50 games
CCRL 40/40 4cpu (long TC game)            54%      149 games
CEGT 40/120 1cpu (long TC game)           49.5%     50 games
CEGT 5m3s (PB=ON, blitz game)             55%       50 games
CEGT 40/20 4cpu (long TC game)            51.5%    100 games

AH LTC (90m/30s, fast i7 3930)            47.3%    300 games
http://www.talkchess.com/forum/viewtopic.php?t=49997

A. Chilantiev (10m 5s, i7 3960)           51.3%   1000 games
http://www.talkchess.com/forum/viewtopic.php?t=49987

R. Banks 4cpu (5m3s, AMD X6 3.6 GHz)      54.7%    500 games
http://rybkaforum.net/cgi-bin/rybkaforum/topic_show.pl?pid=489540
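
To compare these percentages on the Elo scale, a score p converts to an Elo difference of 400 * log10(p / (1 - p)). The sketch below also gives a rough 95% margin that treats every game as decisive (ignoring draws), so it overstates the real error; the helper names are hypothetical.

```python
import math


def elo_from_score(score: float) -> float:
    """Elo difference implied by a match score in (0, 1), logistic Elo model."""
    return 400 * math.log10(score / (1 - score))


def rough_margin(score: float, games: int, z: float = 1.96) -> float:
    """Rough 95% Elo half-width, treating every game as a win or a loss.

    Draws reduce the real variance, so this is an upper bound on the error.
    """
    se = math.sqrt(score * (1 - score) / games)
    return (elo_from_score(score + z * se) - elo_from_score(score - z * se)) / 2


# (label, score as listed above, games)
RESULTS = [
    ("LS list bullet", 0.543, 1000),
    ("CCRL 4/40 4cpu blitz", 0.54, 50),
    ("CCRL 40/40 4cpu long", 0.54, 149),
    ("CEGT 40/120 1cpu long", 0.495, 50),
    ("CEGT 5m3s PB=ON blitz", 0.55, 50),
    ("CEGT 40/20 4cpu long", 0.515, 100),
    ("AH LTC 90m/30s", 0.473, 300),
    ("A. Chilantiev 10m 5s", 0.513, 1000),
    ("R. Banks 4cpu 5m3s", 0.547, 500),
]

if __name__ == "__main__":
    for label, p, n in RESULTS:
        print(f"{label:24s} {elo_from_score(p):+6.1f} Elo  (+/- {rough_margin(p, n):.0f} or less)")
```

For example, a 50-game result carries a margin on the order of +/-100 Elo even in this pessimistic form, while 1000 games bring it down to roughly +/-22, so the 50-game entries say little on their own.
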
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: SF against Houdini 3 contempt 0

Post by Laskos »

lkaufman wrote:
That sounds reasonable. I suppose it is roughly proportional, in the sense that if the blitz gap is for example 60 instead of 30 elo, the expected gain would be 25 + 20 = 45, in round numbers. I think this works out to saying that we need to be 37.5 elo below Houdini at blitz to be equal at 2 hours plus 30". This sounds about right to me.
Yes, it is proportional, and yes, 37.5 Elo is roughly how much weaker Komodo can be at 4'+1'' and still be equal at 2 hours + 30''.
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: SF against Houdini 3 contempt 0

Post by Milos »

Adam Hair wrote: 87% confidence not enough for you? How about a LOS of 94%?
Despite your testing hobby, it seems you really don't have any idea what variance means.
Here is just an example to illustrate. I'm currently replaying H3 contempt 0 vs. SF021113 at 2'+1'' on different hardware, but of pretty similar strength to Kai's. I'm running 5 round-robin tournaments simultaneously on 5 cores (of a 6-core machine), all from the same 500 starting positions. In total there will be 5k games.
The current results are:
Table 1 - H3:SF - 66+/39-/128= +40.4Elo LOS 99.6%
Table 2 - H3:SF - 62+/43-/127= +28.5Elo LOS 96.8%
Table 3 - H3:SF - 64+/42-/121= +33.8Elo LOS 98.4%
Table 4 - H3:SF - 69+/44-/110= +39.1Elo LOS 99.1%
Table 5 - H3:SF - 41+/62-/120= -32.8Elo LOS 1.9% !!!!!

So on 4 "tables" H3 is leading by 35 Elo on average, and on one it is losing by 33 Elo. And all that after more than 220 games per "table". In the first 4 "tables" LOS is 97-99%; in the last it is <2%. How about that?
The openings are the same, the GUI resets after every 20 games, everything is completely random, and still you observe the difference. That is called variability. After you start running tens of thousands of games instead of hundreds, you might get what I'm talking about.

And overall it is H3:SF - 302+/230-/606= +22Elo LOS 99.9%
So, the total opposite of what Kai got. I will post the final result with the PGN and all the match conditions once all 5000 games have been played.
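
The Elo and LOS figures in these tables can be reproduced from the win/loss/draw counts with the usual logistic Elo formula and the normal-approximation LOS, which uses only decisive games. A sketch (the helper name is hypothetical) that also shows that the per-table draw counts sum to 606, the total that yields the stated +22 Elo:

```python
import math


def elo_and_los(wins: int, losses: int, draws: int) -> tuple[float, float]:
    """Return (Elo difference, likelihood of superiority) for a W/L/D result."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    elo = 400 * math.log10(score / (1 - score))
    # LOS via the normal approximation; draws do not enter this formula.
    los = 0.5 * (1 + math.erf((wins - losses) / math.sqrt(2 * (wins + losses))))
    return elo, los


# Per-table H3:SF results (wins, losses, draws) from the post above.
TABLES = [(66, 39, 128), (62, 43, 127), (64, 42, 121), (69, 44, 110), (41, 62, 120)]

if __name__ == "__main__":
    for i, (w, l, d) in enumerate(TABLES, start=1):
        elo, los = elo_and_los(w, l, d)
        print(f"Table {i}: {elo:+.1f} Elo, LOS {100 * los:.1f}%")
    total = tuple(map(sum, zip(*TABLES)))  # column sums: wins, losses, draws
    elo, los = elo_and_los(*total)
    print(f"Total {total}: {elo:+.1f} Elo, LOS {100 * los:.1f}%")
```
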
Adam Hair wrote: The TCEC results do not contradict what has been reported at shorter time controls, such as this one.
Lol, are you even aware of what you are talking about? TCEC is at least 15x longer TC, and there it was the Houdini dev version, while at the link you've posted it is H3 contempt 1. This is not even apples and oranges; it is just ludicrous to even consider comparing them.
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: SF against Houdini 3 contempt 0

Post by Adam Hair »

Milos, if you were to remove the contempt that oozes out of most posts that you make, much more productive communication could take place.

I typically do test at high volume, so I am aware of variability in engine testing and its causes. In your example, the main difference in the results at this point is the openings used so far in each "table". It will be far more interesting to see the variance between the tables at the end of your test.