lkaufman wrote: I'm rerunning my test using Fritz 11, but so far the results are similar: SF is leading by 31 Elo after 74 games. So the GUI doesn't seem to be the issue.
I am not having any issues with the GUI in my test, even when I tested at 1m+0s. But I am running a very clean setup; my CPU usage is 0 to 2% before testing. I have nothing running in the background other than normal Windows system usage.
So what is going on? Are the results in our tests, which show Stockfish beating Houdini at these time controls, legitimate? Are we missing something? I am stumped at this point. Are we somehow giving Stockfish an unfair advantage in our setup? You are testing Stockfish the standard way, 4 CPUs on a 4-core system. I know I am testing with HT, but the result is the same: if you add up all the results, Stockfish is winning by more than just noise. Or not?
Maybe someone can see a problem in the games I posted. I left all move and time data in the PGN. So it could be put straight into a Fritz GUI for evaluation.
Well, if you had just posted PGNs earlier, it would have been easier to detect. Even though your PGNs are mostly hopeless (no depth or nps info), judging by the time (and despite the ridiculous TC), SF is using 20-30% more time than H4 (this is a drastic difference), or simply said, H4 is never using its time, which is obviously a GUI problem.
So my message to you: don't use Fritz or similar crap (which is known for not following the UCI standard properly) for serious engine testing.
I am going to assume that you were confused by the GUI's output. I almost never use the Fritz GUI myself, but I have looked at enough raw PGNs from it to be able to help you out here.
I extracted the search times recorded in Mark's pgns. I threw out the first 8 moves (all book moves) and all moves after the engines reported mate scores. Here are the average search times for each game:
SF    H4    Result
17.9  20.6  H4 lost
15.6  16.0  H4 lost
15.7  17.0  H4 lost
15.2  15.3  H4 lost
16.3  11.3  H4 won
15.3  12.2  H4 won
14.8  14.8  H4 lost
15.2  12.0  H4 won
14.6  12.5  H4 won
Houdini actually took more time than Stockfish in the games it lost, and less time in the games it won.
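The comparison can be checked directly from the nine rows above. This is a minimal sketch, assuming the first number in each row is Stockfish's average search time and the second is Houdini's (the ordering that matches the summary sentence); the function names are only illustrative.

```python
# Per-game average search times copied from the rows above.
# Assumption: first value is Stockfish, second value is Houdini 4.
games = [
    (17.9, 20.6, "lost"),
    (15.6, 16.0, "lost"),
    (15.7, 17.0, "lost"),
    (15.2, 15.3, "lost"),
    (16.3, 11.3, "won"),
    (15.3, 12.2, "won"),
    (14.8, 14.8, "lost"),
    (15.2, 12.0, "won"),
    (14.6, 12.5, "won"),
]


def mean(values):
    """Arithmetic mean of a non-empty iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def average_by_result(rows, result):
    """Return (mean SF time, mean H4 time) over games with the given H4 result."""
    selected = [(sf, h4) for sf, h4, outcome in rows if outcome == result]
    return mean(sf for sf, _ in selected), mean(h4 for _, h4 in selected)


for result in ("lost", "won"):
    sf_avg, h4_avg = average_by_result(games, result)
    print(f"H4 {result}: SF {sf_avg:.2f}, H4 {h4_avg:.2f}")
```

Under that column assumption, the averages are 15.84 (SF) against 16.74 (H4) in the games Houdini lost, and 15.35 (SF) against 12.00 (H4) in the games it won.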
Thank you for your analysis.
P.S. I don't think 2m+12s is a weird TC. It is played by thousands of players. I play this time control myself. It is a popular online TC. And it is one of the default preset testing TC in the Houdini 4 GUI.
"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.
mwyoung wrote:...i am almost tired of being right.
You were only right in giving the correct interpretation of the Elo in your table, as many people don't and I am surprised Ronald de Man also made that mistake.
For the rest, I still think you are a completely reckless tester, not even recognizing your results as completely off-base and looking at least for an explanation.
If my criticism and Larry's investigation had not helped, you would still be playing alone in your dark age cavern!
But redemption can come to all sinners...
Peace and love!
P.S. I don't think 2m+12s is a weird TC. It is played by thousands of players. I play this time control myself. It is a popular online TC. And it is one of the default preset testing TC in the Houdini 4 GUI.
It is not weird, it is simply inefficient. Many dead drawn endgames have to be played to the bitter end taking 12 seconds per move. Another characteristic of it is that it largely removes time management as a factor, since the allocation of the base time is no longer significant and using an increment is pretty straightforward. Whether this is a plus or a minus is a matter of personal preference.
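The point about the increment dominating can be made concrete with a little arithmetic. This is a rough sketch, assuming the whole clock is spent evenly over the game; the 60-move game length is an illustrative assumption, not a figure from the tests, and 1m+0s is included only as the other control mentioned earlier in the thread.

```python
def average_time_per_move(base_seconds, increment_seconds, moves):
    """Average seconds per move if the whole clock is spent evenly over `moves` moves."""
    return base_seconds / moves + increment_seconds


def increment_share(base_seconds, increment_seconds, moves):
    """Fraction of the total time budget that comes from the increment."""
    total = base_seconds + increment_seconds * moves
    return increment_seconds * moves / total


# Hypothetical 60-move game at the two controls mentioned in the thread.
for name, base, inc in (("2m+12s", 120, 12), ("1m+0s", 60, 0)):
    per_move = average_time_per_move(base, inc, 60)
    share = increment_share(base, inc, 60)
    print(f"{name}: {per_move:.1f} s/move, {share:.0%} of the time from the increment")
```

For a 60-move game at 2m+12s this gives 14 seconds per move, with about 86% of the available time coming from the increment, which is why the allocation of the base time matters so little.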
bnculp wrote: I ran 3 engine matches on my i7-3720QM quad system with hyperthreading enabled. Each match was 1000 games using Cutechess at time control 15 sec + .05 sec, hash at 128 MB, Houdini contempt set to 0, opening book 8moves_v3.pgn, all openings repeated with colors reversed, no EGTBs used.
Stockfish 311213 8-threads vs Stockfish 311213 4-threads : ELO +7
Houdini 4Pro 8-threads vs Houdini 4Pro 4-threads : ELO -14
Stockfish 311213 8-threads vs Houdini 4Pro 4-threads : ELO +7
Summary - in this test Stockfish 8-threads beat Stockfish 4-threads, Houdini 4-threads beat Houdini 8-threads, and Stockfish 8-threads beat Houdini 4-threads.
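To put Elo differences of this size in terms of match score, the usual logistic Elo model can be used. This is a minimal sketch of that conversion; the function names are illustrative, and the percentages it prints are derived from the reported Elo figures, not taken from the match logs.

```python
import math


def expected_score(elo_diff):
    """Expected score (0..1) for a player rated `elo_diff` above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_from_score(score):
    """Elo difference implied by a match score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


for elo in (7, -14):
    print(f"{elo:+d} Elo corresponds to a score of about {expected_score(elo):.1%}")
```

On this model, +7 Elo is a score of about 51% and -14 Elo is about 48%, so all three matches were close to even.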
+7 Elo was exactly the increase for HT that I also had in my HT test. I would recommend HT as best for Stockfish. But people are biased against this setting, because most programmers have always said not to use HT as it hurts performance in chess programs. I detected the reverse in Stockfish, so I test with and use the HT setting. At worst it does not hurt Stockfish, and it may help by a few Elo.
None of these tests show anything about using HT vs no HT. They show that if you do have HT enabled, with SF it pays to use it by running 2 threads per core. This is very different from saying that HT helps. With single core testing I found that if you have HT enabled but don't use it (i.e. you keep threads = cores) you take a serious hit, maybe nearly 10%. So I think it is probably the case that HT off is still slightly better than HT on with threads = 2 x cores.
Still, with HT off, I'm showing +14 Elo for SF after 775 games in MP mode (4t) at game/1' on the Fritz 11 GUI.
My latest test is the same conditions as before except:
1) I am running on an i7-2600k with NO hyperthreading
2) each engine is using 4 threads
Stockfish 311213 4-threads vs Houdini 4Pro 4-threads : ELO -16
Hyperthreading appears to be somewhat of a factor. With HT on and using 8 threads, SF won by 7 ELO. With HT off using 4 threads, SF loses by 16 ELO.
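How much of a gap like this could be sampling noise depends on the number of games and the draw rate. This is a rough sketch of a 95% error margin for a match near 50%, assuming independent games; the draw rates below are illustrative assumptions, since the thread does not report win/draw/loss counts, and games played from repeated openings with colors reversed are not strictly independent.

```python
import math


def elo_margin_95(games, draw_rate):
    """Approximate 95% Elo error margin for a match scoring close to 50%."""
    # Per-game score variance at 50%: wins and losses each deviate by 0.5,
    # draws do not deviate at all.
    variance = (1.0 - draw_rate) / 4.0
    std_error = math.sqrt(variance / games)
    # Slope of Elo with respect to score at 50%: 400 / (ln(10) * 0.25).
    elo_per_score = 1600.0 / math.log(10)
    return 1.96 * std_error * elo_per_score


for draw_rate in (0.0, 0.5):
    margin = elo_margin_95(1000, draw_rate)
    print(f"1000 games, draw rate {draw_rate:.0%}: about +/- {margin:.1f} Elo")
```

Under these assumptions a 1000-game match has a margin of roughly 22 Elo with no draws and roughly 15 Elo at a 50% draw rate.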
IMHO Stockfish is getting stronger almost daily. It's already gained 25 ELO since the DD release back in November 2013. Houdini used to destroy any engine at the time controls I tested at (15sec + .05sec). Those days appear to be over.
bnculp wrote:
1) I am running on an i7-2600k with NO hyperthreading
2) each engine is using 4 threads
In yet another attempt to clarify these issues, the i7-2600k is a 4 core/8 thread processor, so when you say you are running "no HT", you are saying you are using your GUI to set threads equal to 8 (HT) or threads equal to 4 (no HT)?
Hyperthreading is an option that I can control in the system BIOS for the i7-2600k.
I had it turned off for that test, so only 4 cores in total are available. If I turn it on, there are 8 logical cores available. Yes, I use the engine GUI to allocate threads to each engine, but that is limited by the BIOS hyperthreading option (either ON or OFF). Note that for the i7-3720QM in previous tests there is no BIOS option to turn HT off, so for those tests HT had to be enabled.
I know that Mark Young sometimes uses "HT on" (8 threads) or "HT off" (4 threads) in reference to the number of threads that the GUI allocates. That may be confusing. In my view HT on or HT off is a hardware BIOS setting.
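The two meanings of "HT on" can be kept apart by stating both the BIOS setting and the engine thread count. This is a minimal sketch using a hypothetical helper (not part of any GUI or engine), assuming two logical CPUs per physical core when hyperthreading is enabled.

```python
def describe_setup(physical_cores, ht_enabled_in_bios, engine_threads):
    """Return (logical CPUs, description) for a test configuration.

    Assumption: hyperthreading exposes two logical CPUs per physical core.
    """
    logical_cpus = physical_cores * 2 if ht_enabled_in_bios else physical_cores
    if engine_threads > logical_cpus:
        usage = "more threads than logical CPUs"
    elif engine_threads > physical_cores:
        usage = "uses hyperthreads"
    else:
        usage = "at most one thread per physical core"
    return logical_cpus, usage


# The three configurations discussed in the thread, all on a 4-core CPU.
print(describe_setup(4, False, 4))  # HT off in BIOS, 4 threads
print(describe_setup(4, True, 8))   # HT on in BIOS, 8 threads
print(describe_setup(4, True, 4))   # HT on in BIOS, but only 4 threads used
```

The third case, HT enabled in the BIOS with threads equal to cores, is the one described above as taking a hit, and it is different from turning HT off in the BIOS.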
P.S. I don't think 2m+12s is a weird TC. It is played by thousands of players. I play this time control myself. It is a popular online TC. And it is one of the default preset testing TC in the Houdini 4 GUI.
It is not weird, it is simply inefficient. Many dead drawn endgames have to be played to the bitter end taking 12 seconds per move. Another characteristic of it is that it largely removes time management as a factor, since the allocation of the base time is no longer significant and using an increment is pretty straightforward. Whether this is a plus or a minus is a matter of personal preference.
It is weird because you effectively reduce games to a fixed time per move (12 sec), so you completely exclude the TM component of the tested engines, and the ratio of game quality to game length is really bad. Testing at this kind of TC just shows that a person performing such a test has no clue at all about chess engines or chess itself.
mwyoung wrote:...i am almost tired of being right.
You were only right in giving the correct interpretation of the Elo in your table, as many people don't and I am surprised Ronald de Man also made that mistake.
For the rest, I still think you are a completely reckless tester, not even recognizing your results as completely off-base and looking at least for an explanation.
If my criticism and Larry's investigation had not helped, you would still be playing alone in your dark age cavern!
But redemption can come to all sinners...
Peace and love!
Larry's results changed nothing. They just confirmed my findings again. Your problem is that, again, you don't like the results. I test and post regardless of what others like or dislike about my testing. I have many years of experience testing engines. If you don't trust my results, don't read them. There are many other people who post.