Long TC matches with Houdini 3 Beta

lkaufman · Post by **lkaufman** » Thu Sep 27, 2012 1:32 am

Vinvin wrote:
Houdini wrote: The second match just finished.

Final result of the Houdini 3 - Stockfish 2.3.1 match: +44 -16 =60
74-46 (+82 Elo ± 42 Elo).

Final result of the Houdini 3 - Houdini 2.0c match: +48 -15 =57
76.5-43.5 (+94 Elo ± 42 Elo).
I combined both matches and get a performance of approximately +66 above Houdini 2. That's a very acceptable gain for a new version. Congratulations again!

It's better to disregard the Houdini-Houdini match, as self-play can seriously inflate rating gains, though this is not always true. In the case of Stockfish 2.3, a direct hyperspeed match I ran against 2.2.2 showed enormous gains, much of which went away against other opponents, and the rest seems to have disappeared at longer time controls. How do you calculate a rating gain in the absence of long time control ratings for these versions? Are you perhaps using the 40/40 CCRL ratings as the closest available? That would put the rating gain over the best Houdini version on that list (1.5) at 24 Elo, based only on the SF match and assuming 2.3.1 is no better than 2.2.2, as the IPON result suggests. But there are too many ifs here and too few games.
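For reference, the Elo figures in these match reports follow from the match score. Below is a minimal sketch, assuming the plain logistic Elo model; the posts do not say which tool produced the figures, and a tool that models draws differently can land a few Elo away. The function name `elo_from_score` is hypothetical.

```python
import math


def elo_from_score(wins: int, losses: int, draws: int) -> float:
    """Elo difference implied by a match result (plain logistic Elo model)."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games  # fraction of the points scored
    if not 0.0 < score < 1.0:
        raise ValueError("Elo difference is unbounded for a 0% or 100% score")
    # Invert the expected-score curve: score = 1 / (1 + 10 ** (-elo / 400)).
    return -400.0 * math.log10(1.0 / score - 1.0)


print(f"vs Stockfish 2.3.1: {elo_from_score(44, 16, 60):+.1f}")
print(f"vs Houdini 2.0c:    {elo_from_score(48, 15, 57):+.1f}")
```

For the Stockfish match (74/120) this gives about +82.6, in line with the posted +82. For the Houdini 2.0c match (76.5/120) the plain formula gives about +98 rather than the posted +94, which suggests the original figures came from a somewhat different rating model.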

Vinvin · Post by **Vinvin** » Thu Sep 27, 2012 1:52 am

lkaufman wrote:
It's better to disregard the Houdini-Houdini match, as self-play can seriously inflate rating gains, though this is not always true. In the case of Stockfish 2.3, a direct hyperspeed match I ran against 2.2.2 showed enormous gains, much of which went away against other opponents, and the rest seems to have disappeared at longer time controls. How do you calculate a rating gain in the absence of long time control ratings for these versions? Are you perhaps using the 40/40 CCRL ratings as the closest available? That would put the rating gain over the best Houdini version on that list (1.5) at 24 Elo, based only on the SF match and assuming 2.3.1 is no better than 2.2.2, as the IPON result suggests. But there are too many ifs here and too few games.

I took the unified list (around 1 min/move, AMD 2 GHz): http://home.scarlet.be/vincentlejeune/r ... 120910.txt
And I take SF 2.3.1 = SF 2.2.2 = 3178.75; Houdini 2.0 64-bit = 3219.66
H3b = (3219.66+94+3178.75+82)/2 = 3287 (+67 over H2)
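The calculation above averages two performance estimates, each being the opponent's list rating plus the Elo difference from the match. A minimal sketch of that arithmetic follows; the helper name `combined_performance` is hypothetical, and the plain mean is only reasonable because both matches have the same length (120 games).

```python
def combined_performance(matches):
    """Average per-match performance estimates.

    Each match is (opponent_rating, elo_difference). A plain mean is only
    appropriate when every match has the same number of games.
    """
    estimates = [opponent + diff for opponent, diff in matches]
    return sum(estimates) / len(estimates)


H2 = 3219.66  # Houdini 2.0 64-bit on the unified list
SF = 3178.75  # Stockfish 2.3.1, assumed equal to Stockfish 2.2.2

h3 = combined_performance([(H2, 94), (SF, 82)])
print(f"H3 beta ~ {h3:.0f}, gain over H2 ~ {h3 - H2:+.1f}")
```

This reproduces 3287.2, roughly +67.5 over Houdini 2.0. Passing only the Stockfish pair gives 3260.75, the +41 figure discussed in the next reply.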

lkaufman · Post by **lkaufman** » Thu Sep 27, 2012 2:00 am

Vinvin wrote:
I took the unified list (around 1 min/move, AMD 2 GHz): http://home.scarlet.be/vincentlejeune/r ... 120910.txt
And I take SF 2.3.1 = SF 2.2.2 = 3178.75; Houdini 2.0 64-bit = 3219.66
H3b = (3219.66+94+3178.75+82)/2 = 3287 (+67 over H2)

So disregarding the self-play it would be +41 over H2 based on that list, which would be a very respectable gain for a year if it turns out to be so.

Laskos · Post by **Laskos** » Thu Sep 27, 2012 2:52 am

lkaufman wrote:
So disregarding the self-play it would be +41 over H2 based on that list, which would be a very respectable gain for a year if it turns out to be so.

The error margins are still high, and I wouldn't disregard the H-H match. After 360 games against Houdini, Stockfish and Komodo, the error margins will be about 25 points, and we will have a better picture.

Kai
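The ±42 and "some 25 points" figures follow from the usual rule that error margins shrink with the square root of the number of games. A minimal sketch, assuming a normal approximation on the mean score converted to Elo via the slope of the logistic curve (delta method); the function names are hypothetical and the tool behind the posted ±42 may compute it differently.

```python
import math


def elo_margin_95(wins: int, losses: int, draws: int) -> float:
    """Approximate 95% error margin (in Elo) of a match result."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0 points).
    var = (wins * (1.0 - score) ** 2
           + losses * score ** 2
           + draws * (0.5 - score) ** 2) / n
    se_score = math.sqrt(var / n)  # standard error of the mean score
    # d(Elo)/d(score) for Elo = -400 * log10(1/score - 1).
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    return 1.96 * se_score * slope


def scale_margin(margin: float, games_now: int, games_later: int) -> float:
    """Margins shrink roughly with the square root of the number of games."""
    return margin * math.sqrt(games_now / games_later)


print(f"{elo_margin_95(44, 16, 60):.1f}")
print(f"{scale_margin(42, 120, 360):.1f}")
```

For the Stockfish match this gives about ±44, close to the posted ±42, and scaling ±42 from 120 to 360 games gives about ±24. Both assume independent games and a similar draw rate across the three matches.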

Steve Maughan · Post by **Steve Maughan** » Thu Sep 27, 2012 5:01 am

Hi Robert,

Some of the search depths seem quite shallow. Are you running any CPU-intensive applications?

Steve

Uri Blass · Post by **Uri Blass** » Thu Sep 27, 2012 7:43 am

Maybe Houdini 3 reaches a smaller depth than Houdini 1.5 in some positions, thanks to an improvement that does less pruning.

The depths for the opponents are shown too, and the interesting question is whether the opponents' depths are also shallow.

M ANSARI · Post by **M ANSARI** » Thu Sep 27, 2012 8:12 am

I have been away for a while; did Komodo 5 get MP working?

Houdini · Post by **Houdini** » Thu Sep 27, 2012 11:13 am

Steve Maughan wrote: Hi Robert,

Some of the search depths seem quite shallow. Are you running any CPU-intensive applications?

Steve

Your "humble i7 laptop" is a lot more powerful than the individual cores of my AMD 6274 server! The only thing running on the server is the match, 31 simultaneous games. There's no other software installed on this server.

In the position at move 16 that you show, there could have been some fail-highs/fail-lows (apparently the engine hesitates between c4 and a4). It's hard to tell; I don't have any more information than the PGN. It's interesting to note the difference in score between Houdini 3 and Houdini 1.5: +0.76 vs +0.28.

Robert

Houdini · Post by **Houdini** » Thu Sep 27, 2012 11:42 am

To give you an idea of the server environment used in this test, here are some screenshots taken during the match with Houdini 2.0c.

1) The Task Manager with the CPU load of the 64 cores - only one core per module is used.

2) The Task Manager with 31 Houdini 3 and 31 Houdini 2.0c processes running.

3) The cutechess-cli window with the standings after 22 finished games (+12 -1 =9)

lucasart · Post by **lucasart** » Thu Sep 27, 2012 11:54 am

Houdini wrote: Detailed match conditions are:
- 120 games against Houdini 2.0c, Stockfish 2.3.1, Komodo 5
- 90+30 TC, single-core, 512 MB hash, 60 positions from Noomen 2006/2008 test suites played from both sides.
- No tablebases; resign after 3 successive scores larger than 5.00; draw after move 120 if both evaluations are below 0.10.
- Hardware: AMD Opteron 6274 Server (SSE4) producing approx. 1 MN/sec per core with Houdini.
- The Houdini 3 version that is played can be considered a Houdini 3 "Beta": close to, but not exactly the final release.
- The error margin on the individual 120-game match results will be about ± 45 Elo.

Hello Robert,

What about threads? If I understand correctly, each AMD Opteron has one match running (pondering should be off, I hope). The engines select the number of threads (which should be the number of CPUs) and use all the CPU power alternately, right? Then you have 32 AMD Opterons running the 32 concurrent matches?
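The resign and draw rules in the quoted match conditions can be sketched as a small adjudication check. This is only one reading of the posted rules: cutechess-cli has its own built-in adjudication options, which this sketch does not reproduce. The function `adjudicate`, its input format (each engine's scores from its own point of view, most recent last), and the interpretations of "successive" and "after move 120" are all assumptions.

```python
from typing import List, Optional

# Thresholds taken from the posted match conditions (scores in pawns).
RESIGN_SCORE = 5.00
RESIGN_COUNT = 3
DRAW_MOVE = 120
DRAW_SCORE = 0.10


def adjudicate(move_number: int,
               white_evals: List[float],
               black_evals: List[float]) -> Optional[str]:
    """Return '1-0', '0-1', '1/2-1/2', or None if play should continue.

    Each list holds one engine's reported scores from its own point of
    view, most recent last (hypothetical input format).
    """
    # Resign rule (assumed reading): an engine whose last RESIGN_COUNT
    # scores are all worse than -RESIGN_SCORE loses the game.
    for evals, result in ((white_evals, "0-1"), (black_evals, "1-0")):
        recent = evals[-RESIGN_COUNT:]
        if len(recent) == RESIGN_COUNT and all(s < -RESIGN_SCORE for s in recent):
            return result
    # Draw rule (assumed reading): from move 121 on, both engines' latest
    # scores are within DRAW_SCORE of zero.
    if (move_number > DRAW_MOVE and white_evals and black_evals
            and abs(white_evals[-1]) < DRAW_SCORE
            and abs(black_evals[-1]) < DRAW_SCORE):
        return "1/2-1/2"
    return None


print(adjudicate(45, [-4.8, -5.3, -5.9, -7.1], [4.9, 5.2, 6.0, 7.0]))
print(adjudicate(121, [0.04], [-0.06]))
```

Checking resignation before the draw rule means a lopsided position past move 120 is still scored as a win rather than a draw.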
