Stockfish 301213 - Houdini 4 x64A, 1 CPU Core Test

mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: Results at game 50

Post by mwyoung »

mwyoung wrote: I tested with it on in all my tests. In the one-core test I affinity-locked Houdini to its own core. I can do a 4-core test to check the HT theory: I can lock Houdini to one pipe on each core. This will tell us if there is an HT issue with Houdini, and it will not affect Stockfish. I can also test Stockfish with the sleeping threads option on and off, or you can if your system has this option. I run Windows 8.
lkaufman wrote: Your idea is worth doing, but I'm not so sure that locking Houdini to one pipe per core is the same as turning off HT. But it will definitely be interesting! By the way, my quad is also Windows 8, so I guess I could do this sort of thing too if I figure out how to do an affinity lock.
mwyoung wrote: To use the affinity option, go to Task Manager and right-click on the chess engine in Processes. In the popup menu, click Details. The next menu will have an option called affinity. Here you can tell the engine which CPU pipes to use. On an Intel CPU the pipes are numbered 01 23 45 67; 01, for example, is the 2 pipes of the first CPU core.
lkaufman wrote: Thanks, I'll try that later.
lkaufman wrote: I may have an answer to the mystery, though I won't really know until tonight or tomorrow. I'm testing with HT off, so this 8/4 core thing is not relevant for me; I just use 4 threads. But I am using SF (Dec 30 version) in default mode, which means "True" for the sleeping threads option.
So far I've run fifty games at 30" + .3" (I tried 15" but got many Houdini time forfeits) in 4-core MP mode, and Stockfish leads by 29 to 21! I need a lot more games, but this is confirming your results so far. Next I'll have to rerun with the option set to "false". But it sure looks now like the sleeping threads option is the key to your great MP results for SF, with or without hyperthreading and regardless of running 4 or 8 threads with hyperthreading on. I don't know enough about this option to guess why it would have such huge effects, but that is what it looks like now.
I suggest you rerun the MP match at your same time control but with the option set to false (and presumably also with threads set to 4). I predict that Houdini will win in that case, and our mystery will be solved.
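The one-pipe-per-core lock described above can also be written as an affinity bitmask, which is the form the Windows `start /affinity` command takes (a hex mask). A minimal sketch, assuming the pipe numbering given in the quote (pipes 0 and 1 on the first core, 2 and 3 on the second, and so on); the function name is only illustrative:

```python
def one_pipe_per_core_mask(physical_cores: int, pipes_per_core: int = 2) -> int:
    """Bitmask selecting the first logical CPU ("pipe") of each physical core.

    Assumes pipes are numbered core by core: 0,1 on core 0, 2,3 on core 1, ...
    """
    mask = 0
    for core in range(physical_cores):
        # Set the bit of the first pipe belonging to this core.
        mask |= 1 << (core * pipes_per_core)
    return mask


mask = one_pipe_per_core_mask(4)
print(f"affinity mask: {mask:X}")  # hex form, as used by `start /affinity`
```

For a 4-core CPU this selects pipes 0, 2, 4 and 6. Whether that behaves the same as switching HT off is exactly the open question raised in the quote.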
You mean I am not a thief of chess software, a liar, a fraud, a chess manipulator who needs to stop posting Stockfish results because they are clearly wrong, because everyone knows Houdini is king at blitz? That I need to stop posting my lies, and that Larry was going to expose me because I am the only tester who is getting this result?

I know they will be disappointed. I will not hold my breath for any apologies. And you guys know who you are, right, BJ and others?

I will do more testing; I am running a test now at longer time controls. Thanks Larry, always a huge fan.

Here is my current test. I was running this for the ones who say the 1+1 results are a fluke and mean nothing, that it can't do this at longer time controls.

Blitz, Blitz 2m+12s  0

                                       12345678901234
1   Stockfish 020114 64 SSE4.2  +102   1½½½1½1½½½½½1½   9.0/14
2   Houdini 4 Pro x64A          -102   0½½½0½0½½½½½0½   5.0/14
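
The +102/-102 figures in the table follow from the 9.0/14 score under the usual logistic Elo model. A minimal sketch, assuming that model (the function name is only illustrative):

```python
import math


def elo_diff(points: float, games: int) -> float:
    """Elo difference implied by a match score under the logistic model."""
    p = points / games  # fraction of available points scored
    return 400 * math.log10(p / (1 - p))


print(round(elo_diff(9.0, 14)))  # Stockfish's side of the table
print(round(elo_diff(5.0, 14)))  # Houdini's side of the table
```

With only 14 games the uncertainty on such a figure is very large, so it says little on its own.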

"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.
lkaufman
Posts: 6284
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: Results at game 50

Post by lkaufman »

My current MP result has SF leading by 19 Elo after about 300 games. That is not as good as your results, but close enough if HT helps SF vs. Houdini as you believe. I look forward to rerunning with the sleeping threads option set to false, to see if that's the key.
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: If you don't mind me budging in...

Post by Adam Hair »

arjuntemurnikar wrote: Book = AH_150_Opening_Suite
Milos wrote: Unfortunately AH150 is heavily biased in Stockfish's favor despite changing colors. Opening suite bias is at least 25 Elo.
Based on what evidence? Simply because Stockfish performed better than Houdini at longer time controls with these openings?
Milos wrote: Actually, it's really easy to create a biased opening suite without people noticing.
It's sufficient to take a bigger sample, like 1000 openings, replay them 5 times, and select the top 150 openings by score per pairing (10 games in total) of the engine that you want to favor. I'm pretty sure you can get a good mix where one engine has even more than a 30 Elo advantage.
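
The selection procedure described in the quote can be simulated to see how much bias it could plausibly produce. The sketch below is a toy model, not a reconstruction of any real suite: it assumes two otherwise equal engines, a 50% draw rate, and a hypothetical hidden per-opening edge with a standard deviation of 0.03 in expected score (all of these numbers are assumptions). It picks the top 150 of 1000 openings from 10 games each, then compares the score seen in the selection games with the score to expect when the chosen openings are replayed.

```python
import math
import random


def play(rng: random.Random, edge: float, draw_rate: float = 0.5) -> float:
    """One game result for engine A (1, 0.5 or 0), given its expected-score edge."""
    win = 0.5 + edge - draw_rate / 2
    r = rng.random()
    if r < win:
        return 1.0
    if r < win + draw_rate:
        return 0.5
    return 0.0


def simulate(n_openings=1000, games_each=10, keep=150, sigma=0.03, seed=1):
    rng = random.Random(seed)
    # Hypothetical hidden edge of engine A on each opening (assumed, clipped).
    edges = [max(-0.2, min(0.2, rng.gauss(0.0, sigma))) for _ in range(n_openings)]
    observed = [sum(play(rng, e) for _ in range(games_each)) / games_each
                for e in edges]
    # Keep the openings where engine A happened to score best.
    chosen = sorted(range(n_openings), key=lambda i: observed[i], reverse=True)[:keep]
    seen = sum(observed[i] for i in chosen) / keep
    expected = 0.5 + sum(edges[i] for i in chosen) / keep
    return seen, expected


def to_elo(p: float) -> float:
    return 400 * math.log10(p / (1 - p))


seen, expected = simulate()
print(f"score in the selection games: {seen:.3f} ({to_elo(seen):+.0f} Elo)")
print(f"expected score on a rerun:    {expected:.3f} ({to_elo(expected):+.0f} Elo)")
```

In this toy model much of the apparent advantage in the selection games is luck and does not carry over to a rerun; how much remains depends on the assumed `sigma`.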
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: If you don't mind me budging in...

Post by Milos »

arjuntemurnikar wrote: Book = AH_150_Opening_Suite
Milos wrote: Unfortunately AH150 is heavily biased in Stockfish's favor despite changing colors. Opening suite bias is at least 25 Elo.
Adam Hair wrote: Based on what evidence? Simply because Stockfish performed better than Houdini at longer time controls with these openings?
So Adam deleted my post, calling it libel, since I accused Mark Young, who is publishing a lot of results without a single proof (no PGN, no opening suite, no book, nothing), of using a biased set of openings.
He intentionally put me in an impossible position, requiring me to prove someone wrong when I don't have any material, basically to prove wrong results that exist only as numbers.

So here is my evidence, and now you, Adam, prove me wrong.
1' + 0.5'' single-core run, 5 games (GUIs) in parallel, machine Phenom X6 1055T OC 3.7 GHz, 256 MB hash, H4 x64 pro B (with Syz 5-men) vs. SF251213:
a) 1500 openings (my openings remain private and I can't publish them; a mix of all known sets: Nunn, Noomen, Silver, gambit suites, and many more, limited to 6 moves):
873+/1710=/417- +53.2 Elo for H4

b) 150 AH openings, repeated 10 times:
791+/1636=/573- +25.3 Elo for H4

AH opening suite bias = 53.2 - 25.3 = 27.9 Elo

I'm not getting into motives (since I don't want to be accused of libel), whether it is intentional or accidental bias...
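
The two Elo figures above can be recomputed from the win/draw/loss counts, together with a rough error margin. A minimal sketch, assuming the logistic Elo model and treating every game as independent (an assumption that repeated openings may not satisfy); the function names are only illustrative:

```python
import math


def elo(p: float) -> float:
    return 400 * math.log10(p / (1 - p))


def elo_with_interval(wins: int, draws: int, losses: int, z: float = 1.96):
    """Elo estimate plus an approximate 95% interval from W/D/L counts."""
    n = wins + draws + losses
    p = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1 - p) ** 2 + draws * (0.5 - p) ** 2 + losses * p ** 2) / n
    half = z * math.sqrt(var / n)
    return elo(p), elo(p - half), elo(p + half)


private = elo_with_interval(873, 1710, 417)  # run a) above
ah150 = elo_with_interval(791, 1636, 573)    # run b) above
for name, (mid, lo, hi) in (("private set", private), ("AH150", ah150)):
    print(f"{name}: {mid:+.1f} Elo (95% interval {lo:+.1f} to {hi:+.1f})")
print(f"gap: {private[0] - ah150[0]:.1f} Elo")
```

The gap between the two runs carries the uncertainty of both estimates, so it is less precise than either figure alone.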
Last edited by Milos on Fri Jan 03, 2014 2:05 am, edited 1 time in total.
arjuntemurnikar
Posts: 204
Joined: Tue Oct 15, 2013 10:22 pm
Location: Singapore

Re: If you don't mind me budging in...

Post by arjuntemurnikar »

arjuntemurnikar wrote: Mark & Larry,

I am doing a mini-test on my laptop.

=====
MacBook Air 4,2
Intel i7 2677M @ 1.8 GHz (Dual Core)
Running in a Windows 8.1 VM (Parallels Desktop 9) with 1 core and 1 GB RAM
=====
TC = 2' + 1" (Two minutes + One second)
Book = AH_150_Opening_Suite
GUI = Fritz 13 GUI
All Engines with 1 thread
Houdini Contempt=0
=====

SF-311213 Gauntlet (After 400/600 games)

Stockfish 311213 64 SSE4.2   - Houdini 4 Pro x64              100.5 - 99.5    +41/=119/-40      50.25%

Stockfish 311213 64 SSE4.2   - Komodo TCEC 64-bit             111.0 - 89.0    +65/=92/-43       55.50%
=====

Take these games with a grain of salt. It is not a perfect test, but it is still something.
1. It was played on a VM, so there was some (but not that much) performance overhead. I tried my best to keep the system as idle as possible so that background processes (Windows and Mac) did not interfere with the test.
2. It's a small sample of games.

The tournament is still ongoing and there are 200 games left. I'll post the end result when it's done. At the moment, though, it seems like SF has caught up with Houdini 4, if not surpassed it.

I will link the PGN once the test is complete.

Thanks
lkaufman wrote: How many games at once, one or two? Hyperthreading on or off? SF "idle threads" option true or false? All these may be important.
By the way, using 1 GB for hash for games this fast is way too much; it should be either 128 or 256. It probably doesn't hurt much, just a small slowdown, but it might favor one engine over the other.
No, you misunderstand. The 1 GB of RAM is for the VM. Each engine is using only 256 MB hash.

Hyperthreading is on on my MacBook, as I am not sure how to turn it off, but I have assigned only 1 of 4 cores to the VM, so it shouldn't really hyperthread. (The rest of the system is isolated.)

All options except Houdini's contempt are set to default.

As for the opening book, I am not sure if it's biased or not. It seems to be a balanced share of the openings most commonly played in GM games, and the evals always seem to be within 0.0-0.5 for either engine right out of book. Since all engines play all openings with both white and black, I don't see why this should be a problem. If SF is better than other engines in queen's pawn openings or whatever else you say, Milos, it's not my problem that most games at GM level are indeed queen's pawn. That's my stance.
Last edited by arjuntemurnikar on Fri Jan 03, 2014 2:10 am, edited 1 time in total.
bnculp
Posts: 69
Joined: Wed Mar 08, 2006 8:19 pm

Re: Results at game 50

Post by bnculp »

I am currently running a match on my laptop with an i7-3720QM CPU. It has 4 real cores and hyperthreading enabled. I am using cutechess to run 4 games simultaneously, with each engine having 2 threads. The current match is 1000 games long and pits a Stockfish 123113 development engine with idle threads sleep set to false against the same engine with idle threads sleep set to true. I am using a time control of 60 sec + 0.05 sec. After 760 games the Stockfish engine with idle threads sleep set to false is leading by 20 Elo.
mwyoung

Re: If you don't mind me budging in...

Post by mwyoung »

I don't use a set of openings to test, as stated in my testing post. I use a book made of GM games. Both programs play with the same book, and each program plays both sides of the opening. I have no idea what openings will be played. Again attacked for what... posting my test results, which I have been doing since CCC first came online.
bnculp

Re: If you don't mind me budging in...

Post by bnculp »

The insinuation that Mark Young would actually try to create an opening book that would be intentionally biased towards one engine is ridiculous.


The final score of my 1000-game match was 177-138-685 (wins-losses-draws) in favor of the Stockfish engine with Idle Threads Sleep set to false. That's 14 Elo over the Stockfish engine with Idle Threads Sleep set to true.
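
One way to gauge how decisive a result like 177 wins against 138 losses is: the likelihood of superiority (LOS) often quoted in engine testing, which depends only on wins and losses. A minimal sketch, assuming the usual normal approximation (the function name is only illustrative):

```python
import math


def likelihood_of_superiority(wins: int, losses: int) -> float:
    """Normal-approximation LOS; draws carry no information in this formula."""
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))


los = likelihood_of_superiority(177, 138)
print(f"LOS = {los:.1%}")
```

LOS only says how likely it is that one setting is better at all in this test setup; it says nothing about how large the difference is.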
lkaufman

Re: Results at game 50

Post by lkaufman »

bnculp wrote: I am currently running a match on my laptop with an i7-3720QM CPU. It has 4 real cores and hyperthreading enabled. I am using cutechess to run 4 games simultaneously, with each engine having 2 threads. The current match is 1000 games long and pits a Stockfish 123113 development engine with idle threads sleep set to false against the same engine with idle threads sleep set to true. I am using a time control of 60 sec + 0.05 sec. After 760 games the Stockfish engine with idle threads sleep set to false is leading by 20 Elo.
I'm not sure that is a valid test of TRUE vs FALSE, allowing only one core for each MP engine, but if it is then I should get as good or better results for SF when I set the option to false tomorrow. If that happens then it will pretty much prove that SF has a massively better MP implementation than Houdini.
lkaufman

Re: If you don't mind me budging in...

Post by lkaufman »

Regardless of the book used (I exclude anything obviously ridiculous, like say a book with 90% open games, but no one is using any silly books), the overall results are pretty clear that Houdini 4 is still comfortably stronger than any SF version at bullet speed on single core, but there is growing evidence that the contrary is true on 4 or more cores. We need more MP evidence from varied sources.