Stockfish 020114 - Houdini 4 x64A Testing 39 of 100 played.

Discussion of computer chess matches and engine tournaments.

Moderator: Ras

Dr.Wael Deeb
Posts: 9773
Joined: Wed Mar 08, 2006 8:44 pm
Location: Amman,Jordan

Re: Stockfish 020114 - Houdini 4 x64A 74 of 100 played.

Post by Dr.Wael Deeb »

mwyoung wrote:
Dr.Wael Deeb wrote:
mwyoung wrote:
Milos wrote:
lkaufman wrote:So it still appears to me that Houdini 4 is stronger on one core and SF Dec. 30 on four cores, regardless of hyperthreading, sleeping threads, etc. Milos observed that SF takes much more time than Houdini, but I think this is simply algorithmic; SF is much more aggressive about time use than any other top engine, which does help its middlegame play but leaves it short of time for the endgame. Maybe this tendency is exaggerated in MP play for some reason.
If you just go through pgns of Mark you will notice that despite weird TC (2'+12'') H4 TM is totally crazy i.e. it ends every game (winning or losing) with 5+ minutes on the clock. This is impossible to be normal behavior (algorithmic) and it doesn't behave like this at all (with the same TC) on my 6-core AMD machine and different GUI. I have neither Fritz nor Intel i7 to try so I can't debug the reason (even though I suspect it's GUI and if provided UCI logs, it would be easy to find where is the error), but something is obviously wrong.
My PGN have all depth, time, and move info in them. All you have to do is put them into a fritz GUI.

I am not sure what your problem is, or why you don't understand even some very basic concepts.

For the record there is nothing wrong with the PGN or Houdini 4's time.

The time control is 2+12, Houdini 4 is using table base. So it is gaining lots of time at the end of the game because the table base moves instantly and that gains Houdini 12s every time it moves like this in a long Table Base endgame.

In a Fritz GUI if you have the clock window open as you play over the game, you can see this as it was played in my test games.
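The clock arithmetic behind this explanation can be sketched in a few lines of Python. This is only an illustration: the 30 seconds left on the clock and the run of 25 instant tablebase moves are made-up numbers, not values taken from the posted games, and `clock_after_moves` is a hypothetical helper name.

```python
def clock_after_moves(start_s, increment_s, move_times_s):
    """Remaining clock time after each move under a per-move increment."""
    remaining = start_s
    history = []
    for used in move_times_s:
        # Time spent on the move comes off, then the increment is added back.
        remaining = remaining - used + increment_s
        history.append(remaining)
    return history


# Hypothetical endgame: 30 s left, 12 s increment, 25 moves played instantly.
history = clock_after_moves(30.0, 12.0, [0.0] * 25)
print(history[-1])  # 330.0 seconds, i.e. 5.5 minutes
```

Under those assumed numbers, a couple of dozen instant moves are enough to rebuild more than five minutes on the clock at a 12 second increment.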
A piece of advice Mark:

Manipulate one of your testing frameworks so that Houdini wins a direct match against the latest Stockfish version....

This time you'll suffer less attacks regards,
Dr.D
I could have stockfish show total contempt for houdini. Contempt=999. It is nice to have my results confirmed here and on ccrl. As predicted from my testing, stockfish holds the top spot.
I predicted 2-3 months for Houdini to be on top of the rating lists after its official release but it seems that this was way too optimistic....

I personally think that we've seen the last of Houdini's domination unless Robertolito applies a new strategy in the development of his chess engine regards,
Dr.D
No one can hit as hard as life. But it ain’t about how hard you can hit. It’s about how hard you can get hit and keep moving forward. How much you can take and keep moving forward….
bnculp
Posts: 69
Joined: Wed Mar 08, 2006 8:19 pm

Re: Stockfish 020114 - Houdini 4 x64A 74 of 100 played.

Post by bnculp »

I ran 3 engine matches on my I7-3720QM quad system with hyperthread enabled. Each match was 1000 games using Cutechess at time control 15 sec + .05 sec, hash at 128mb, Houdini contempt set to 0, opening book 8moves_v3.pgn all openings repeated with colors reversed, no egtbs used.

Stockfish 311213 8-threads vs Stockfish 311213 4-threads : ELO +7
Houdini 4Pro 8-threads vs Houdini 4Pro 4-threads : ELO -14
Stockfish 311213 8-threads vs Houdini 4Pro 4-threads : ELO +7

Summary - in this test Stockfish 8-threads beat Stockfish 4-threads, Houdini 4-threads beat Houdini 8-threads, and Stockfish 8-threads beat Houdini 4-threads.
ernest
Posts: 2053
Joined: Wed Mar 08, 2006 8:30 pm

Re: Stockfish 020114 - Houdini 4 x64A 74 of 100 played.

Post by ernest »

mwyoung wrote:For the record, the elo gap is not 124 elo. It is only +62 elo for Stockfish.
Right!!! 8-)
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: Stockfish 020114 - Houdini 4 x64A 74 of 100 played.

Post by Adam Hair »

Milos wrote:
mwyoung wrote:
lkaufman wrote:I'm rerunning my test using Fritz 11, but so far the results are similar; SF leading by 31 elo after 74 games. So the GUI doesn't seem to be the issue.
I am not having any issues in my test with the gui, even when I tested at 1m+0s. But I am running a very clean setup, my CPU usage is 0 to 2% before testing. I have nothing running in the background other than normal windows system usage.

So what is going on... are the results legit in our tests that show stockfish beating Houdini at these time controls? Are we missing something... I am stumped at this point. Are we somehow giving Stockfish an unfair advantage in our setup? You are testing Stockfish the standard way, 4 cpu on a 4 core system. I know I am testing HT, but the result is the same, meaning stockfish is winning by more than just noise if you add up all the results. Or not?

Maybe someone can see a problem in the games I posted. I left all move and time data in the PGN. So it could be put straight into a Fritz GUI for evaluation.
Well, if you just posted pgns earlier it would be easier to detect. Even though your pgn's are mostly hopeless (no depth or nps info) judging by the time (and despite ridiculous TC), SF is using 20-30% more time than H4 (this is a drastic difference), or simply said H4 is never using its time which is obviously GUI problem.
So my message to you don't use Fritz or similar crap (which is known for not following UCI standard properly) for serious engine testing.
I am going to assume that you were confused by the GUI's output. I almost never use the Fritz GUI myself, but I have looked at enough raw pgns from it to be able to help you out here :wink:

I extracted the search times recorded in Mark's pgns. I threw out the first 8 moves (all book moves) and all moves after the engines reported mate scores. Here are the average search times for each game:

SF	H4	Result
17.9	20.6	H4 lost
15.6	16.0	H4 lost
15.7	17.0	H4 lost
15.2	15.3	H4 lost
16.3	11.3	H4 won
15.3	12.2	H4 won
14.8	14.8	H4 lost
15.2	12.0	H4 won
14.6	12.5	H4 won
Houdini actually took more time than Stockfish in the games it lost, and less time in the games it won.
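
The comparison can be checked with a short Python sketch over the nine rows above. It assumes the first column is Stockfish and the second is Houdini 4, which is the only reading consistent with the sentence above; the `games` list and the `mean_times` helper are names made up for this illustration.

```python
# Per-game average search times copied from the table above.
# Assumption: (Stockfish, Houdini 4, result from Houdini's point of view).
games = [
    (17.9, 20.6, "lost"),
    (15.6, 16.0, "lost"),
    (15.7, 17.0, "lost"),
    (15.2, 15.3, "lost"),
    (16.3, 11.3, "won"),
    (15.3, 12.2, "won"),
    (14.8, 14.8, "lost"),
    (15.2, 12.0, "won"),
    (14.6, 12.5, "won"),
]


def mean_times(rows, result):
    """Mean (Stockfish, Houdini) search time over games with the given result."""
    picked = [(sf, h4) for sf, h4, r in rows if r == result]
    n = len(picked)
    return sum(sf for sf, _ in picked) / n, sum(h4 for _, h4 in picked) / n


for result in ("lost", "won"):
    sf_avg, h4_avg = mean_times(games, result)
    print(f"H4 {result}: SF {sf_avg:.2f}  H4 {h4_avg:.2f}")
```

On these rows Houdini averages about 16.7 against Stockfish's 15.8 in the games it lost, and 12.0 against about 15.35 in the games it won. One lost game (14.8 vs 14.8) is a tie on time, so the statement holds on average rather than for every single game.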
bnculp
Posts: 69
Joined: Wed Mar 08, 2006 8:19 pm

Re: Stockfish 020114 - Houdini 4 x64A 74 of 100 played.

Post by bnculp »

bnculp wrote:I ran 3 engine matches on my I7-3720QM quad system with hyperthread enabled. Each match was 1000 games using Cutechess at time control 15 sec + .05 sec, hash at 128mb, Houdini contempt set to 0, opening book 8moves_v3.pgn all openings repeated with colors reversed, no egtbs used.

Stockfish 311213 8-threads vs Stockfish 311213 4-threads : ELO +7
Houdini 4Pro 8-threads vs Houdini 4Pro 4-threads : ELO -14
Stockfish 311213 8-threads vs Houdini 4Pro 4-threads : ELO +7

Summary - in this test Stockfish 8-threads beat Stockfish 4-threads, Houdini 4-threads beat Houdini 8-threads, and Stockfish 8-threads beat Houdini 4-threads.

I am running another 1000 game test with the same conditions as above except:

1) I am running on an i7-2600k with NO hyperthreading
2) each engine is using 4 threads

It's still early, but the results so far are very different: ELO +55 in favor of Houdini.
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: Stockfish 020114 - Houdini 4 x64A 74 of 100 played.

Post by mwyoung »

bnculp wrote:I ran 3 engine matches on my I7-3720QM quad system with hyperthread enabled. Each match was 1000 games using Cutechess at time control 15 sec + .05 sec, hash at 128mb, Houdini contempt set to 0, opening book 8moves_v3.pgn all openings repeated with colors reversed, no egtbs used.

Stockfish 311213 8-threads vs Stockfish 311213 4-threads : ELO +7
Houdini 4Pro 8-threads vs Houdini 4Pro 4-threads : ELO -14
Stockfish 311213 8-threads vs Houdini 4Pro 4-threads : ELO +7

Summary - in this test Stockfish 8-threads beat Stockfish 4-threads, Houdini 4-threads beat Houdini 8-threads, and Stockfish 8-threads beat Houdini 4-threads.
+7 elo was the exact increase for ht that I also had in my ht test. I would recommend ht as best for stockfish. But people are biased against this setting, because most programmers have always said not to use ht as it hurts performance in chess programs. I detected the reverse in stockfish. So I test and use the ht setting. At worst it does not hurt stockfish, and it may help by a few elo.
"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: Stockfish 020114 - Houdini 4 x64A 74 of 100 played.

Post by mwyoung »

ernest wrote:
mwyoung wrote:For the record, the elo gap is not 124 elo. It is only +62 elo for Stockfish.
Right!!! 8-)
Did stockfish eat your candy again? It must be hard to be wrong so many times as you have been with your accusations... I am almost tired of being right.
lkaufman
Posts: 6284
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: Stockfish 020114 - Houdini 4 x64A 74 of 100 played.

Post by lkaufman »

mwyoung wrote:
bnculp wrote:I ran 3 engine matches on my I7-3720QM quad system with hyperthread enabled. Each match was 1000 games using Cutechess at time control 15 sec + .05 sec, hash at 128mb, Houdini contempt set to 0, opening book 8moves_v3.pgn all openings repeated with colors reversed, no egtbs used.

Stockfish 311213 8-threads vs Stockfish 311213 4-threads : ELO +7
Houdini 4Pro 8-threads vs Houdini 4Pro 4-threads : ELO -14
Stockfish 311213 8-threads vs Houdini 4Pro 4-threads : ELO +7

Summary - in this test Stockfish 8-threads beat Stockfish 4-threads, Houdini 4-threads beat Houdini 8-threads, and Stockfish 8-threads beat Houdini 4-threads.
+7 elo was the exact increase for ht that I also had in my ht test. I would recommend ht as best for stockfish. But people are biased against this setting, because most programmers have always said not to use ht as it hurts performance in chess programs. I detected the reverse in stockfish. So I test and use the ht setting. At worst it does not hurt stockfish, and it may help by a few elo.
None of these tests show anything about using HT vs no HT. They show that if you do have HT enabled, with SF it pays to use it by running 2 threads per core. This is very different from saying that HT helps. With single core testing I found that if you have HT enabled but don't use it (i.e. you keep threads = cores) you take a serious hit, maybe nearly 10%. So I think it is probably the case that HT off is still slightly better than HT on with threads = 2 x cores.
Still, with HT off, I'm showing +14 elo for SF after 775 games in MP mode (4t) at game/1' on Fritz 11 gui.
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: Stockfish 020114 - Houdini 4 x64A 74 of 100 played.

Post by mwyoung »

lkaufman wrote:
mwyoung wrote:
bnculp wrote:I ran 3 engine matches on my I7-3720QM quad system with hyperthread enabled. Each match was 1000 games using Cutechess at time control 15 sec + .05 sec, hash at 128mb, Houdini contempt set to 0, opening book 8moves_v3.pgn all openings repeated with colors reversed, no egtbs used.

Stockfish 311213 8-threads vs Stockfish 311213 4-threads : ELO +7
Houdini 4Pro 8-threads vs Houdini 4Pro 4-threads : ELO -14
Stockfish 311213 8-threads vs Houdini 4Pro 4-threads : ELO +7

Summary - in this test Stockfish 8-threads beat Stockfish 4-threads, Houdini 4-threads beat Houdini 8-threads, and Stockfish 8-threads beat Houdini 4-threads.
+7 elo was the exact increase for ht that I also had in my ht test. I would recommend ht as best for stockfish. But people are biased against this setting, because most programmers have always said not to use ht as it hurts performance in chess programs. I detected the reverse in stockfish. So I test and use the ht setting. At worst it does not hurt stockfish, and it may help by a few elo.
None of these tests show anything about using HT vs no HT. They show that if you do have HT enabled, with SF it pays to use it by running 2 threads per core. This is very different from saying that HT helps. With single core testing I found that if you have HT enabled but don't use it (i.e. you keep threads = cores) you take a serious hit, maybe nearly 10%. So I think it is probably the case that HT off is still slightly better than HT on with threads = 2 x cores.
Still, with HT off, I'm showing +14 elo for SF after 775 games in MP mode (4t) at game/1' on Fritz 11 gui.

You may be correct; it is hard to know when trying to measure such a small increase. I leave it to the user to decide. I don't think it hurts Stockfish, if used correctly, the way it hurts other programs, e.g. Houdini.
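
How hard a small gap like +7 elo is to measure can be estimated with a rough Python sketch. It uses the standard logistic elo model and a normal approximation for a near-even match. The draw ratio of the 1000-game matches was not posted, so the 50% used below is purely an assumption, and `elo_margin` is a hypothetical helper name.

```python
import math


def elo_margin(n_games, draw_ratio, z=1.96):
    """Approximate 95% margin of error, in elo, for a near-even match.

    Assumes independent games and an expected score close to 50%.
    """
    # Wins and losses split the non-drawn games; each is 0.5 away from the mean.
    per_game_variance = (1.0 - draw_ratio) * 0.25
    se_score = math.sqrt(per_game_variance / n_games)
    # Slope of the logistic elo curve at a 50% score: 1600 / ln(10) elo per unit score.
    return z * se_score * 1600.0 / math.log(10)


# Hypothetical 50% draw ratio; the real ratio was not reported in the thread.
for n in (100, 1000, 4000):
    print(n, round(elo_margin(n, 0.5), 1))
```

With that assumed draw ratio the margin for 1000 games comes out at roughly 15 elo, about twice the size of the +7 difference being discussed, and it only halves when the number of games is quadrupled.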
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: Stockfish 020114 - Houdini 4 x64A 100 of 100 Played.

Post by mwyoung »

Blitz, Blitz 2m+12s  0

                                      
1   Stockfish 020114 64 SSE4.2   +53  +25/=65/-10 57.50%   57.5/100
2   Houdini 4 Pro x64A           -53  +10/=65/-25 42.50%   42.5/100
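
The +53 figure follows from the 57.5% score under the usual logistic elo formula, and the same numbers give a feel for the uncertainty of a 100-game match. The sketch below is a minimal illustration, not the GUI's own calculation: it treats the games as independent and uses a normal approximation, and `elo_from_score` and `match_elo` are hypothetical helper names.

```python
import math


def elo_from_score(score):
    """Elo difference implied by an expected score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def match_elo(wins, draws, losses, z=1.96):
    """Elo estimate with an approximate 95% interval from a W/D/L record."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    variance = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / n
    se = math.sqrt(variance / n)
    return (
        elo_from_score(score),
        elo_from_score(score - z * se),
        elo_from_score(score + z * se),
    )


# Stockfish's record from the table above: +25 =65 -10.
elo, low, high = match_elo(25, 65, 10)
print(f"{elo:+.1f} elo, 95% interval {low:+.1f} to {high:+.1f}")
```

For +25/=65/-10 this gives about +52.5 elo, matching the rounded +53 in the table, with an approximate 95% interval from roughly +13 to +93 elo.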
