Long TC matches with Houdini 3 Beta

lkaufman
Posts: 5966
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Long TC matches with Houdini 3 Beta

Post by lkaufman »

Vinvin wrote:
Houdini wrote:The second match just finished.

Final result of the Houdini 3 - Stockfish 2.3.1 match: +44 -16 =60
74-46 (+82 Elo ± 42 Elo).
Download Games

Final result of the Houdini 3 - Houdini 2.0c match: +48 -15 =57
76.5-43.5 (+94 Elo ± 42 Elo).
Download Games
I mixed both matches and I get approximately a perf +66 above Houdini 2. That's very acceptable for a new version. Congratulations again !
It's better to disregard the Houdini-Houdini match, as self-play can seriously inflate rating gains, though not always. In the case of Stockfish 2.3, a direct hyperspeed match I ran against 2.2.2 showed enormous gains, much of which went away against other opponents, and the rest seems to have disappeared at longer time controls. How do you calculate a rating gain in the absence of long-time-control ratings for these versions? Are you perhaps using the 40/40 CCRL ratings as the closest available? That would put the rating gain over the best Houdini version on that list (1.5) at 24 Elo based on the SF match alone, if we assume 2.3.1 is no better than 2.2.2, as the IPON result suggests. But there are too many ifs here and too few games.
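For readers who want to check the quoted match figures, here is a minimal sketch of one common way to turn a win/loss/draw record into an Elo difference with a rough 95% interval. It assumes the standard logistic Elo model and a normal approximation for the score; the function names are illustrative, and the tool actually used for the quoted numbers is not stated in the thread. Under these assumptions the Stockfish match (+44 -16 =60) comes out near +83, close to the quoted +82, while the Houdini 2.0c match (+48 -15 =57) comes out near +98 rather than the quoted +94, so the quoted figures may come from a different method.

```python
import math


def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction in (0, 1), logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def match_elo(wins: int, losses: int, draws: int, z: float = 1.96):
    """Return (elo, low, high) for a match result.

    The interval uses a normal approximation on the score fraction
    (z = 1.96 for roughly 95%), then maps both ends through the Elo formula.
    """
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    variance = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / games
    std_err = math.sqrt(variance / games)
    low = elo_from_score(score - z * std_err)
    high = elo_from_score(score + z * std_err)
    return elo_from_score(score), low, high


if __name__ == "__main__":
    for name, record in {"vs Stockfish 2.3.1": (44, 16, 60),
                         "vs Houdini 2.0c": (48, 15, 57)}.items():
        elo, low, high = match_elo(*record)
        print(f"{name}: {elo:+.1f} Elo (95% interval {low:+.1f} to {high:+.1f})")
```

The interval is asymmetric in Elo terms because the Elo formula is nonlinear in the score; a single "± 42" figure is a symmetric summary of roughly the same spread.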
Vinvin
Posts: 5228
Joined: Thu Mar 09, 2006 9:40 am
Full name: Vincent Lejeune

Re: Long TC matches with Houdini 3 Beta

Post by Vinvin »

I took the unified list (around 1 min/move on a 2 GHz AMD): http://home.scarlet.be/vincentlejeune/r ... 120910.txt
I count SF 2.3.1 = SF 2.2.2 = 3178.75 and Houdini 2.0 64-bit = 3219.66.
H3b = (3219.66 + 94 + 3178.75 + 82) / 2 = 3287 (+67 over H2)
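The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the averaging described in the post: the ratings and Elo margins are the ones quoted there, and the function name is made up for the example. It also shows what happens when only the Stockfish match is used.

```python
def performance_rating(matches):
    """Average of (opponent rating + Elo margin) over a list of matches."""
    return sum(rating + margin for rating, margin in matches) / len(matches)


# Ratings taken from the unified list as quoted in the post.
SF_RATING = 3178.75   # Stockfish 2.3.1, assumed equal to 2.2.2
H2_RATING = 3219.66   # Houdini 2.0 64-bit

both_matches = performance_rating([(H2_RATING, 94), (SF_RATING, 82)])
stockfish_only = performance_rating([(SF_RATING, 82)])

if __name__ == "__main__":
    print(f"Both matches:   {both_matches:.2f} ({both_matches - H2_RATING:+.2f} vs H2)")
    print(f"Stockfish only: {stockfish_only:.2f} ({stockfish_only - H2_RATING:+.2f} vs H2)")
```

Averaging the two matches equally is reasonable here because both were 120 games long; with unequal game counts a weighted average would be the more natural choice.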
lkaufman
Posts: 5966
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Long TC matches with Houdini 3 Beta

Post by lkaufman »

So, disregarding the self-play, it would be +41 over H2 based on that list, which would be a very respectable gain for a year if it turns out to be so.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Long TC matches with Houdini 3 Beta

Post by Laskos »

The error margins are still high, and I wouldn't disregard the H-H match. After all 360 games against Houdini 2.0c, Stockfish and Komodo, the error margins will be some 25 points, and we will have a better picture.

Kai
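As a rough check on the "some 25 points" figure, here is a small sketch that assumes the error margin shrinks in proportion to one over the square root of the number of games, with the draw rate and score staying similar. The function name is illustrative, and the 42 and 45 Elo starting margins are the per-match figures quoted in the thread.

```python
import math


def scaled_margin(margin: float, games: int, new_games: int) -> float:
    """Scale an error margin from `games` to `new_games`, assuming it
    is proportional to 1 / sqrt(number of games)."""
    return margin * math.sqrt(games / new_games)


if __name__ == "__main__":
    for margin in (42.0, 45.0):
        pooled = scaled_margin(margin, 120, 360)
        print(f"±{margin:.0f} Elo over 120 games -> about ±{pooled:.1f} Elo over 360 games")
```

Tripling the game count divides the margin by about 1.73, which puts it in the mid-20s for either starting value.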
Steve Maughan
Posts: 1222
Joined: Wed Mar 08, 2006 8:28 pm
Location: Florida, USA

Re: Long TC matches with Houdini 3 Beta

Post by Steve Maughan »

Hi Robert,

Some of the search depths seem quite shallow. Are you running any CPU-intensive applications?

Image

Steve
Uri Blass
Posts: 10409
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Long TC matches with Houdini 3 Beta

Post by Uri Blass »

Maybe Houdini 3 reaches a smaller depth than Houdini 1.5 in some positions because part of its improvement comes from pruning less.

Depths are also shown for the opponents, and the interesting question is whether the opponents' depths are shallow too.
M ANSARI
Posts: 3707
Joined: Thu Mar 16, 2006 7:10 pm

Re: Long TC matches with Houdini 3 Beta

Post by M ANSARI »

I have been away for a while. Did Komodo 5 get MP going?
Houdini
Posts: 1471
Joined: Tue Mar 16, 2010 12:00 am

Re: Long TC matches with Houdini 3 Beta

Post by Houdini »

Steve Maughan wrote:Hi Robert,

Some of the search depths seem quite shallow. Are you running any CPU-intensive applications?

Steve
Your "humble i7 laptop" is a lot more powerful than the individual cores of my AMD 6274 server! The only thing running on the server is the match, 31 simultaneous games. There's no other software installed on this server.

In the position you show at move 16, there could have been some fail highs or fail lows (apparently the engine hesitates between c4 and a4). It's hard to tell, as I don't have any more information than the PGN. It's interesting to note the difference in score between Houdini 3 and Houdini 1.5: +0.76 vs +0.28.

Robert
Houdini
Posts: 1471
Joined: Tue Mar 16, 2010 12:00 am

Re: Long TC matches with Houdini 3 Beta

Post by Houdini »

To give you an idea of the server environment used in this test, here are some screenshots taken during the match with Houdini 2.0c.

1) The Task Manager with the CPU load of the 64 cores - only one core per module is used.

Image

2) The Task Manager with 31 Houdini 3 and 31 Houdini 2.0c processes running.

Image

3) The cutechess-cli window with the standings after 22 finished games (+12 -1 =9).

Image
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: Long TC matches with Houdini 3 Beta

Post by lucasart »

Houdini wrote: Detailed match conditions are:
- 120 games against Houdini 2.0c, Stockfish 2.3.1, Komodo 5
- 90+30 TC, single-core, 512 MB hash, 60 positions from Noomen 2006/2008 test suites played from both sides.
- No table bases, resign after 3 successive scores larger than 5.00, draw after move 120 if both evaluations below 0.10.
- Hardware: AMD Opteron 6274 Server (SSE4) producing approx. 1 MN/sec per core with Houdini.
- The Houdini 3 version that is played can be considered a Houdini 3 "Beta": close to, but not exactly the final release.
- The error margin on the individual 120-game match results will be about ± 45 Elo.
Hello Robert,

What about threads? If I understand correctly, each AMD Opteron has one match running (pondering should be off, I hope). The engines select the number of threads (which should be the number of CPUs) and use all the CPU power alternately, right? Then you have 32 AMD Opterons running the 32 concurrent matches?
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.