Houdini beats Rybka4 with 57 – 43 %

Albert Silver · Post by **Albert Silver** » Sat Jun 05, 2010 3:04 am

kingliveson wrote:
Albert Silver wrote:
kingliveson wrote:
Albert Silver wrote:
kingliveson wrote:
Albert Silver wrote:
Please note that there are no universal settings AFAIK. For example, my best settings for repeating time controls (the fastest being CCRL/CEGT Blitz) were just published in the CCRL Blitz list, with a huge Elo leap, but they aren't anywhere near as good at 2+0.

For 2+0, no increments, my best is TC Buffer = 1, Normal Move = 72, Max Move = 115.

These were tested on a single CPU (no SSE42 or Large Pages, BTW), with ponder off and 512 MB hash.
As end users, we have fun finding optimal settings for particular play. It Should Be Absolutely Unacceptable For Independent Testers To Tweak Settings And Configurations For Any Single Engine. By doing so, you are distorting outcomes and perverting results. Are they also going to tweak Shredder, Naum, and Stockfish to find optimal settings, and would there be consensus that those are actually optimal?! If they keep tweaking a single engine, or any engine for that matter, they risk losing credibility as independent testers. Please, stop and think about it. Defaults should always be used. For us, however, it should be a fun thing to play with.
Ok, just to be clear, these are my best overall settings at 2+0 and co. I published the Houdini results, as that was the topic of the thread, but when I say best, I mean best overall after testing against Houdini, FB 1.2, and Stockfish. Results were improved against all 3 opponents.

As to your comment on the Blitz list. Clearly you never ever look at these lists or you would refrain from such a comment. There are TONS of variations of tons of engines. You'll find plenty of Chessmaster profile results, Shredder has indeed been tested with options on and off, Hiarcs too, and many more. You are way off base here.

I have built many books in the past using PGN databases from CCRL, CEGT, and SSDF. I am indeed familiar with these lists (games and results). Normally, for example, when Shredder OA (Opening Advice) is turned on or off, the results are kept separate. The potential issue here is tweaking parameters and combining the results -- that should not be done -- just as the results of R3 Dynamic, Human, and Default are not combined.
So you are saying they should stop testing what they want because of a potential issue you imagined?
If CCRL wants to become an arm of the Rybka enterprise, that is of course its business. But by tweaking Rybka's parameters to play against other engines at their defaults, again, you are perverting the ranking table. This should be easy to understand. They will simply not be known as independent testers once everyone knows how these results come about.

Ah... Now it becomes clear. It is ok to test options with other engines, just not Rybka. Kind of interesting perspective.

kingliveson · Post by **kingliveson** » Sat Jun 05, 2010 3:20 am

Albert Silver wrote:
Ah... Now it becomes clear. It is ok to test options with other engines, just not Rybka. Kind of interesting perspective.

If a separate table is created to note parameters were tweaked, there would not be an issue.

Albert Silver · Post by **Albert Silver** » Sat Jun 05, 2010 3:21 am

kingliveson wrote:
If a separate table is created to note parameters were tweaked, there would not be an issue.

I'm guessing you didn't look at the list at all, since the title of the engine has the parameters in the name? As opposed to the results of the default settings of course, for all to see.

http://www.computerchess.org.uk/ccrl/40 ... t_all.html

kingliveson · Post by **kingliveson** » Sat Jun 05, 2010 3:35 am

Albert Silver wrote:
I'm guessing you didn't look at the list at all, since the title of the engine has the parameters in the name? As opposed to the results of the default settings of course, for all to see.

http://www.computerchess.org.uk/ccrl/40 ... t_all.html

I was just at CCRL, looked at the live results and didn't see it. But good that it's now updated and the parameters do appear in the engine's name. Nothing personal -- I just want transparency.

Albert Silver · Post by **Albert Silver** » Sat Jun 05, 2010 3:44 am

kingliveson wrote:
Albert Silver wrote:
I'm guessing you didn't look at the list at all, since the title of the engine has the parameters in the name? As opposed to the results of the default settings of course, for all to see.

http://www.computerchess.org.uk/ccrl/40 ... t_all.html
I was just at CCRL, looked at the live results and didn't see it. But good that it's now updated and the parameters do appear in the engine's name. Nothing personal -- I just want transparency.

I'm guessing you were looking at either a non-updated list in your browser's cache, or the Best list, which shows only the best results of any particular engine, or variation thereof.

beram · Post by **beram** » Sat Jun 05, 2010 9:50 am

```
LTC games
T8100 (6,29 Fritzmark), LTC - 20m/40+10m/20+(10m+12s), Nunn2
so far, after 36 of 50 games played

1   Houdini 1.01 w32 2_CPU   +10/=22/-4 58,33 %    21/36
2   Deep Rybka 4 w32         +4/=22/-10 41,67 %    15/36
```
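The +W/=D/-L notation in these tables converts to a score in the usual way (win = 1, draw = ½, loss = 0). A minimal Python sketch that assumes only that record format; the function names are hypothetical, chosen for illustration:

```python
import re

def parse_wdl(record: str) -> tuple[int, int, int]:
    """Split a '+W/=D/-L' record such as '+10/=22/-4' into (wins, draws, losses)."""
    m = re.fullmatch(r"\+(\d+)/=(\d+)/-(\d+)", record.strip())
    if m is None:
        raise ValueError(f"not a +W/=D/-L record: {record!r}")
    wins, draws, losses = (int(g) for g in m.groups())
    return wins, draws, losses

def match_score(record: str) -> tuple[float, int, float]:
    """Return (points, games, percent); a win scores 1 point, a draw 1/2, a loss 0."""
    wins, draws, losses = parse_wdl(record)
    games = wins + draws + losses
    points = wins + 0.5 * draws
    return points, games, 100.0 * points / games

if __name__ == "__main__":
    # Records copied from the LTC table above
    for name, rec in [("Houdini 1.01 w32 2_CPU", "+10/=22/-4"),
                      ("Deep Rybka 4 w32", "+4/=22/-10")]:
        points, games, pct = match_score(rec)
        print(f"{name:24} {pct:6.2f} %  {points:g}/{games}")
```

Run as a script, it reproduces the 58,33 % (21/36) and 41,67 % (15/36) figures from the table.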

Regarding the special Rybka 4 time-control settings for short time controls: they don't apply to the LTC I used (20m/40+10m/20+(10m+12s)).

I also tested Fire 1.31 earlier at this LTC, and there Deep Rybka wins:

```
T8100 (6,29 Fritzmark) LTC 20m/40+10m/20+(10m+12s)  50 games Bram private suite 1.2

1   Deep Rybka 4 w32  +15/=28/-7 58.00%   29.0/50
2   Fire 1.3 w32      +7/=28/-15 42.00%   21.0/50
```
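Whether a 50-game result like this one is decisive can be gauged with a rough confidence interval. The sketch below uses a normal approximation over the per-game scores (1, ½, 0) and the logistic Elo formula. Both are simplifying assumptions, not how any particular rating list computes its error bars, and the function names are made up for illustration:

```python
import math

def score_interval(wins: int, draws: int, losses: int, z: float = 1.96):
    """Mean score per game and a normal-approximation confidence interval.

    Each game is scored 1, 0.5 or 0; the interval is mean +/- z * stderr,
    where stderr is the standard deviation of the per-game scores divided
    by sqrt(games). z = 1.96 gives roughly 95 % coverage.
    """
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    # Mean of the squared per-game scores: a win contributes 1, a draw 0.25, a loss 0
    mean_sq = (wins + 0.25 * draws) / n
    stderr = math.sqrt((mean_sq - mean * mean) / n)
    return mean, mean - z * stderr, mean + z * stderr

def elo_diff(score: float) -> float:
    """Logistic Elo difference implied by an expected score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)

if __name__ == "__main__":
    mean, lo, hi = score_interval(15, 28, 7)   # Deep Rybka 4 vs. Fire 1.3, 50 games
    print(f"score {mean:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
    print(f"Elo {elo_diff(mean):+.0f}, CI [{elo_diff(lo):+.0f}, {elo_diff(hi):+.0f}]")
```

For the +15/=28/-7 record the lower end of the interval falls just below 0.5, so under this approximation 50 games are not quite enough to call the Rybka vs. Fire gap significant at roughly the 95% level.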

IWB · Post by **IWB** » Sat Jun 05, 2010 12:08 pm

If I hypothetically played 1000 games with Houdini 1.01 on one thread at 5+3, ponder on, they would most likely end like this:

```
     Houdini 1.01 x64 1_CPU    2935 1000.0 (716.0 : 284.0)
                                    100.0 ( 49.5 :  50.5) Deep Rybka 4              2948
                                    100.0 ( 59.5 :  40.5) Stockfish 1.7.1 JA        2883
                                    100.0 ( 68.0 :  32.0) Naum 4.2                  2818
                                    100.0 ( 69.0 :  31.0) Komodo 1.2 JA             2801
                                    100.0 ( 68.0 :  32.0) Deep Shredder 12          2797
                                    100.0 ( 76.0 :  24.0) Critter 0.70              2788
                                    100.0 ( 82.5 :  17.5) HIARCS 13.1 MP 32b        2731
                                    100.0 ( 82.0 :  18.0) spark-0.4                 2713
                                    100.0 ( 78.0 :  22.0) Zappa Mexico II           2710
                                    100.0 ( 83.5 :  16.5) Deep Onno 1-2-70          2681
```
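The per-opponent lines add up to the 716 : 284 total, and because every mini-match has 100 games, the plain mean of the opponents' ratings is the 2787 that shows up as Houdini's average-opponent figure in the rating list below. A quick Python check using only numbers from the post:

```python
# Hypothetical 100-game mini-matches from the post: (opponent, Houdini points, opponent Elo)
results = [
    ("Deep Rybka 4", 49.5, 2948),
    ("Stockfish 1.7.1 JA", 59.5, 2883),
    ("Naum 4.2", 68.0, 2818),
    ("Komodo 1.2 JA", 69.0, 2801),
    ("Deep Shredder 12", 68.0, 2797),
    ("Critter 0.70", 76.0, 2788),
    ("HIARCS 13.1 MP 32b", 82.5, 2731),
    ("spark-0.4", 82.0, 2713),
    ("Zappa Mexico II", 78.0, 2710),
    ("Deep Onno 1-2-70", 83.5, 2681),
]
GAMES_PER_MATCH = 100

points = sum(p for _, p, _ in results)
games = GAMES_PER_MATCH * len(results)
# Every match has the same length, so an unweighted mean of opponent ratings is fine here
avg_opp = sum(elo for _, _, elo in results) / len(results)

print(f"{points:g} : {games - points:g} in {games} games ({100 * points / games:.1f} %)")
print(f"average opponent {avg_opp:.0f}")
```

With unequal match lengths, the average would need to be weighted by games played.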

This would put it in exactly the same 20-30 Elo range as all the other Littos since Robbo 83, although, agreed, at the upper end of that 20 Elo window:

```
   # Name                       Elo    +    - games score oppo. draws
   1 Deep Rybka 4              2948   15   15  2000   79%  2724   29%
   2 Rybka 3 mp 2T             2943   13   13  2100   75%  2764   34%
   3 Houdini 1.01 x64 1_CPU    2935   19   18  1000   72%  2787   39%
   4 Stockfish 1.7.1 JA 2T     2928   14   14  1800   73%  2766   38%
   5 Rybka 3 mp                2898    9    9  5000   74%  2725   34%
   6 Stockfish 1.7.1 JA        2883   11   11  3500   70%  2735   35%
   7 Naum 4.2 2T               2882   13   13  1900   64%  2786   42%
   8 Stockfish 1.6.x JA 2T     2863   14   14  1800   65%  2764   42%
   9 Rybka 3 32b               2848   14   14  1800   70%  2713   36%
  10 Deep Shredder 12 2T       2835   13   13  2100   58%  2777   39%
  11 Stockfish 1.6.x JA        2831   11   10  3200   65%  2723   39%
  12 Naum 4 2T                 2829   14   14  1600   60%  2761   41%
  13 Deep Fritz 12 32b 2T      2823   13   13  1900   55%  2790   44%
  14 Naum 4.2                  2818   10   10  3300   62%  2732   40%
  15 Rybka 2.3.2a mp           2802   11   11  3100   67%  2691   40%
  16 Komodo 1.2 JA             2801   12   12  2300   59%  2735   40%
  17 Deep Shredder 12 UCI 32b  2800    9    9  4000   62%  2720   38%
```
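The Elo column is not simply "average opponent plus the gap implied by the score". A plain logistic performance estimate, which is an assumption here and much simpler than what the list's rating tool does, gives about 2948 for Houdini's 71.6% (716 of 1000 points) against a 2787 average, versus the listed 2935. Tools that fit all games jointly (BayesElo, for example, also applies a prior) will generally land somewhat differently. The function name below is hypothetical:

```python
import math

def performance_rating(avg_opponent: float, score: float) -> float:
    """Logistic performance: average opponent rating plus the Elo gap implied by the score."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return avg_opponent + 400.0 * math.log10(score / (1.0 - score))

if __name__ == "__main__":
    # Houdini's row: 1000 games, 716 points, average opponent 2787
    print(round(performance_rating(2787, 0.716)))
```

This is only a back-of-the-envelope cross-check; it ignores the draw rate and the error bars shown in the list.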

It is a pity I don't run these tests; they would put an end to all that baseless and "amateurish" speculation (why test against only ONE engine? Looks intentional?).

Bye
Ingo

PS: I forgot to mention that the engine crashes from time to time and stays in memory under full load.

beram · Post by **beram** » Sat Jun 05, 2010 1:31 pm

Thanks for your very mature reply. It helps a lot when people clarify matters with hypothetical match results. It would contribute more, though, if you tested such things for real.
For instance, why are the LTC match results against Rybka 4 so different for Fire 1.31 and Houdini 1.01?
Houdart's Houdini is certainly more than just a copy-paste program. Like other people, I am fascinated by this new and incredibly strong program.

Gino Figlio · Post by **Gino Figlio** » Sat Jun 05, 2010 2:38 pm

Deep Rybka 4 SSE42 x64 TC3100150 vs. Houdini x64 POPCNT_4CPU
2 CPUs each, ponder off, TC 1/0
Quad i7-920 4.0 GHz
GUI Aquarium 4.0.5
Narrowbook, played each opening twice

117-113, +6 Elo
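The "+6 Elo" follows from the usual logistic relation between match score and rating difference. A minimal Python sketch, assuming that model and treating 117-113 as points for and against over 230 games (about 50.9%); the function name is hypothetical:

```python
import math

def elo_from_points(points_for: float, points_against: float) -> float:
    """Elo difference implied by a match score under the logistic model.

    With s = points_for / total, the difference is 400 * log10(s / (1 - s)),
    which simplifies to 400 * log10(points_for / points_against).
    """
    return 400.0 * math.log10(points_for / points_against)

if __name__ == "__main__":
    print(f"{elo_from_points(117, 113):+.1f}")
```

The formula is undefined for a 100% or 0% score, which never arises in matches this close.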

beram · Post by **beram** » Sat Jun 05, 2010 3:18 pm

This is fairly in line with my results from 100 games (Nunn2 and private book) at 4m+2s on my T4300, Win7 64-bit (Fritzmark 5,9):
Deep Rybka 4 64-bit 2CPU vs. Houdini 1.01 64-bit 2CPU: 48,5 - 51,5

It seems Houdini 32-bit does much better against Deep Rybka 4 on the 32-bit platform.

The difference between Deep Rybka 4 TC3100150 64-bit and standard DR4 64-bit is currently 21 Elo on the CCRL 40/4 list of 4 June.
So, judging by these blitz matches on 64-bit systems, DR4 and Houdini 1.01 are about equally strong.
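Bram's conclusion can be checked roughly by putting the thread's numbers on one Elo scale. The sketch below converts each result to an Elo gap with the logistic formula and subtracts the 21 Elo CCRL difference between the two DR4 configurations. It assumes Bram's DR4 ran at default settings, and because it mixes time controls (1/0 and 4m+2s), hardware, and small samples, it is indicative only:

```python
import math

def elo_from_points(points_for: float, points_against: float) -> float:
    """Logistic Elo difference implied by points for and against."""
    return 400.0 * math.log10(points_for / points_against)

# Figures quoted in this thread (Deep Rybka 4 vs. Houdini 1.01, 64-bit, 2 CPUs each)
tc_vs_houdini = elo_from_points(117, 113)     # DR4 TC3100150, TC 1/0 (Gino)
std_vs_houdini = elo_from_points(48.5, 51.5)  # DR4, presumably default, 4m+2s (Bram)
ccrl_gap = 21                                 # CCRL 40/4: TC3100150 minus standard DR4

# If the CCRL gap carried over, standard DR4 should trail Houdini by about this much
predicted_std = tc_vs_houdini - ccrl_gap

print(f"TC3100150 vs Houdini: {tc_vs_houdini:+.0f} Elo")
print(f"standard vs Houdini:  {std_vs_houdini:+.0f} Elo (predicted {predicted_std:+.0f})")
```

The predicted gap for standard DR4 (about -15) and the measured one (about -10) differ by roughly 5 Elo, well inside the noise of 100- and 230-game samples, which is consistent with "about equally strong".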

kind regards Bram
