Houdini wrote: The second match just finished.
Final result of the Houdini 3 - Stockfish 2.3.1 match: +44 -16 =60
74-46 (+82 Elo ± 42 Elo).
Final result of the Houdini 3 - Houdini 2.0c match: +48 -15 =57
76.5-43.5 (+94 Elo ± 42 Elo).
Vinvin wrote: I mixed both matches and I get approximately a perf of +66 above Houdini 2. That's very acceptable for a new version. Congratulations again!
It's better to disregard the Houdini-Houdini match, as self-play can seriously inflate rating gains, though this is not always true. In the case of Stockfish 2.3, a direct hyperspeed match I ran against 2.2.2 showed enormous gains, much of which went away against other opponents, and the rest seems to have disappeared at longer time controls. How do you calculate a rating gain in the absence of long time control ratings for these versions? Are you perhaps using 40/40 CCRL ratings, as the closest available? That would put the rating gain over the best Houdini version on that list (1.5) at 24 Elo based only on the SF match, if we assume 2.3.1 is no better than 2.2.2, as the IPON result suggests. But there are too many ifs here and too few games.
Long TC matches with Houdini 3 Beta
-
- Posts: 5966
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
Re: Long TC matches with Houdini 3 Beta
-
- Posts: 5228
- Joined: Thu Mar 09, 2006 9:40 am
- Full name: Vincent Lejeune
Re: Long TC matches with Houdini 3 Beta
I took the unified list (around 1 min/move, AMD 2 GHz): http://home.scarlet.be/vincentlejeune/r ... 120910.txt
I count SF 2.3.1 = SF 2.2.2 = 3178.75 and Houdini 2.0 64-bit = 3219.66, so:
H3b = (3219.66+94+3178.75+82)/2 = 3287 (+67 over H2)
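The arithmetic above can be checked with a short sketch. It assumes, as the post does, that Stockfish 2.3.1 is rated the same as 2.2.2 on the unified list and that each match result can simply be added to the opponent's list rating. The variable names are only illustrative.

```python
# Ratings taken from the unified list quoted in the post.
SF_231 = 3178.75   # assumption from the post: Stockfish 2.3.1 == Stockfish 2.2.2
H2 = 3219.66       # Houdini 2.0 64-bit

# Houdini 3 Beta match results in Elo, as posted in the thread.
GAIN_VS_SF = 82
GAIN_VS_H2 = 94

# Performance against each opponent: opponent rating plus match result.
perf_vs_sf = SF_231 + GAIN_VS_SF
perf_vs_h2 = H2 + GAIN_VS_H2

# Plain average of the two performances (both matches have 120 games).
h3_beta = (perf_vs_sf + perf_vs_h2) / 2

print(round(h3_beta))             # estimated rating of Houdini 3 Beta
print(round(h3_beta - H2, 1))     # gain over Houdini 2 using both matches
print(round(perf_vs_sf - H2, 1))  # gain over Houdini 2 using the Stockfish match only
```

Using both matches gives about +67 over Houdini 2, while the Stockfish match on its own gives about +41, so the result depends heavily on whether the self-play match is counted.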
-
- Posts: 5966
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
Re: Long TC matches with Houdini 3 Beta
So disregarding the self-play it would be +41 over H2 based on that list, which would be a very respectable gain for a year if it turns out to be so.
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Long TC matches with Houdini 3 Beta
The error margins are still high, and I wouldn't disregard the H-H match. After 360 games against H,S,K the error margins will be some 25 points, and we will have a better picture.
Kai
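The 25-point figure can be sanity-checked with a rough sketch. It assumes a normal approximation for the match score, the usual logistic Elo formula, a 95% interval, and that the win/draw/loss proportions of the Stockfish match (+44 -16 =60) carry over unchanged to 360 games. The function names are just for illustration, and the testers may have used a different method.

```python
import math

def elo_from_score(p):
    """Logistic Elo difference for an expected score p (0 < p < 1)."""
    return -400.0 * math.log10(1.0 / p - 1.0)

def elo_margin(wins, losses, draws, z=1.96):
    """Approximate half-width of the Elo confidence interval for a match."""
    n = wins + losses + draws
    p = (wins + 0.5 * draws) / n
    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    var = (wins * (1.0 - p) ** 2 + draws * (0.5 - p) ** 2 + losses * p ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    low = elo_from_score(p - z * se)
    high = elo_from_score(p + z * se)
    return (high - low) / 2.0

# 120 games: the posted Houdini 3 - Stockfish 2.3.1 result.
margin_120 = elo_margin(44, 16, 60)

# 360 games: hypothetical, same proportions scaled up by three.
margin_360 = elo_margin(3 * 44, 3 * 16, 3 * 60)

print(round(margin_120), round(margin_360))  # about 44 and about 25
```

The margin shrinks roughly with the square root of the number of games, so tripling the games divides it by about 1.7.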
-
- Posts: 1222
- Joined: Wed Mar 08, 2006 8:28 pm
- Location: Florida, USA
Re: Long TC matches with Houdini 3 Beta
Hi Robert,
Some of the search depths seem quite shallow. Are you running any CPU-intensive applications?
Steve
-
- Posts: 10409
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: Long TC matches with Houdini 3 Beta
Maybe Houdini 3 reaches a smaller depth than Houdini 1.5 in some positions because one of its improvements is that it prunes less.
Depths are also given for the opponents, and the interesting question is whether the opponents' depths are shallow as well.
-
- Posts: 3707
- Joined: Thu Mar 16, 2006 7:10 pm
Re: Long TC matches with Houdini 3 Beta
I have been out for a while, did Komodo 5 get MP going?
-
- Posts: 1471
- Joined: Tue Mar 16, 2010 12:00 am
Re: Long TC matches with Houdini 3 Beta
Steve Maughan wrote: Hi Robert,
Some of the search depths seem quite shallow. Are you running any CPU-intensive applications?
Steve
Your "humble i7 laptop" is a lot more powerful than the individual cores of my AMD 6274 server! The only thing running on the server is the match, 31 simultaneous games. There's no other software installed on this server.
In the position at move 16 you show there could have been some fail highs/fail lows (apparently the engine hesitates between c4 and a4). Hard to tell, I don't have any more information than the PGN. It's interesting to note the difference in score between Houdini 3 and Houdini 1.5, +0.76 vs +0.28.
Robert
-
- Posts: 1471
- Joined: Tue Mar 16, 2010 12:00 am
Re: Long TC matches with Houdini 3 Beta
To give you an idea of the server environment used in this test, here are some screenshots taken during the match with Houdini 2.0c.
1) The Task Manager with the CPU load of the 64 cores - only one core per module is used.
2) The Task Manager with 31 Houdini 3 and 31 Houdini 2.0c processes running.
3) The cutechess-cli window with the standings after 22 finished games (+12 -1 =9)
-
- Posts: 3232
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: Long TC matches with Houdini 3 Beta
Houdini wrote: Detailed match conditions are:
- 120 games against Houdini 2.0c, Stockfish 2.3.1, Komodo 5
- 90+30 TC, single-core, 512 MB hash, 60 positions from Noomen 2006/2008 test suites played from both sides.
- No table bases, resign after 3 successive scores larger than 5.00, draw after move 120 if both evaluations below 0.10.
- Hardware: AMD Opteron 6274 Server (SSE4) producing approx. 1 MN/sec per core with Houdini.
- The Houdini 3 version that is played can be considered a Houdini 3 "Beta": close to, but not exactly the final release.
- The error margin on the individual 120-game match results will be about ± 45 Elo.
Hello Robert,
What about threads? If I understand correctly, each AMD Opteron has one match running (pondering should be off, I hope). Engines select the number of threads (should be the number of CPUs) and use all the CPU power alternately, right? Then you have 32 AMD Opterons running the 32 concurrent matches?
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.