Has Critter 1.4 been tuned to play against Houdini 2.0?

mwyoung · Post by **mwyoung** » Mon Jan 02, 2012 5:29 am

mwyoung wrote:Critter 1.4 played very well against Houdini 2.0. After 300 games it was 29 Elo behind Houdini 2.0.

But as I test Critter 1.4 against other engines, its rating is falling off against the lower-tier programs.

I expected Critter 1.4 to be close to Houdini 1.5a in rating and become the new top rated free program. I can already say with high confidence this will not happen.

Anyone else seeing this in their testing?
gerold wrote:After 150 games at 5/3 TC, Critter 1.4 is leading Houdini 1.5 by +20.

Best,
Gerold.

mwyoung wrote:I am at 136 games at 4m+1s, and Houdini 1.5a is winning against Critter 1.4: +38 =74 -24, TP=+36.

Both engines use the same opening book, and each side plays every opening as both White and Black.

Interesting results. Why the difference?
gerold wrote:Different books, different time control, different computers, different conditions all the way around. I am using the 32-bit version here.

Houdini has been in the lead by 14 Elo a couple of times. At 100 games Critter was +14. At 150 games Critter was +20. Now, at 190 games, Critter is +14.
mwyoung wrote:Only one factor accounts for Critter being better in your test, IMO: you are testing 32-bit and I am testing 64-bit. Time controls, books, and hardware should not have that big an effect if they are fair for both sides, since Critter and Houdini scale almost the same.
gerold wrote:In a few days I will have a better idea which 32-bit engine is the strongest.
I have a few hundred games to play yet.
Best,
Gerold.

I do also. After 174 games, Critter 64-bit is being clearly beaten by Houdini 1.5a 64-bit: +49 =93 -32.
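
For anyone who wants to check figures like TP=+36, here is a minimal sketch that turns a +W =D -L score into an Elo difference with a rough 95% margin. It assumes the standard logistic Elo formula and a normal approximation for the error. The function name `elo_from_wdl` is made up for this example, and whatever tool produced the TP figure may compute things differently.

```python
import math

def elo_from_wdl(wins, draws, losses):
    """Return (elo_diff, low, high) from the point of view of the side
    whose wins are counted first.

    Assumes the logistic Elo model and a normal approximation for the
    95% interval; illustrative only, not any rating tool's exact method.
    Undefined for 0% or 100% scores.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0), then the standard
    # error of the mean score over n games.
    variance = (wins * 1.0 + draws * 0.25) / n - score ** 2
    std_err = math.sqrt(variance / n)

    def to_elo(s):
        return -400.0 * math.log10(1.0 / s - 1.0)

    return (to_elo(score),
            to_elo(score - 1.96 * std_err),
            to_elo(score + 1.96 * std_err))

# Results quoted in this thread (Houdini 1.5a vs Critter 1.4, 64-bit).
for w, d, l in [(38, 74, 24), (49, 93, 32)]:
    elo, low, high = elo_from_wdl(w, d, l)
    print(f"+{w} ={d} -{l}: {elo:+.0f} Elo (95% roughly {low:+.0f} to {high:+.0f})")
```

On the 136-game result this reproduces the +36 figure, but under these assumptions the margin is roughly 40 Elo either way, which is as large as the gap between the two sets of results being compared here.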

rvida · Post by **rvida** » Mon Jan 02, 2012 5:36 am

mwyoung wrote:Critter 1.4 played very well against Houdini 2.0. After 300 games it was 29 Elo behind Houdini 2.0.

But as I test Critter 1.4 against other engines, its rating is falling off against the lower-tier programs.

Unlike Houdini or Komodo, Critter does not have built-in contempt. If it sees a position as drawn (or equal), it happily goes for a 3-fold repetition. IIRC Larry said that even a very small contempt (0.05cp) gives about 15-20 Elo for Komodo when playing against much weaker engines.

Indeed, I once saw Critter play Crafty in a very complicated middlegame that ended in a draw by repetition. With nonzero contempt, Critter would have avoided the repetition, and Crafty would sooner or later have made a blunder, changing the result from 1/2-1/2 to 1-0.

But I am not a very big fan of contempt; I see it as a factor that artificially inflates Elo performance.
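
To make the idea concrete, here is a toy sketch of how a contempt value can change the choice between repeating and playing on. It is not Critter's, Houdini's, or Komodo's actual code. The function names, the candidate moves, and the centipawn numbers are all hypothetical.

```python
def draw_value_for_engine(contempt_cp):
    """Value (in centipawns, from the engine's view) of a repetition draw.

    With contempt 0 a draw is worth exactly 0. With positive contempt
    the engine treats a draw as slightly worse than an equal position.
    """
    return -contempt_cp

def pick_move(candidates, contempt_cp):
    """Pick the best candidate for the engine.

    candidates: list of (move_name, eval_cp, is_repetition), where
    eval_cp is the search score from the engine's view and is ignored
    when the move leads to a draw by repetition.
    """
    def value(candidate):
        _name, eval_cp, is_repetition = candidate
        if is_repetition:
            return draw_value_for_engine(contempt_cp)
        return eval_cp
    return max(candidates, key=value)[0]

# Hypothetical position: repeating draws at once, playing on is scored
# as very slightly worse for the engine (-3 centipawns).
candidates = [("repeat", 0, True), ("play_on", -3, False)]
print(pick_move(candidates, contempt_cp=0))  # takes the repetition
print(pick_move(candidates, contempt_cp=5))  # keeps the game going
```

With zero contempt the draw (0) beats the slightly worse continuation (-3), so the engine repeats. With a contempt of 5 the draw is valued at -5, so the engine plays on and gives the weaker opponent more chances to go wrong.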

mwyoung · Post by **mwyoung** » Mon Jan 02, 2012 5:44 am

IGarcia wrote:The best "anti-Houdini" engine, if there is such a thing, is Komodo 4.

mwyoung wrote:I am not testing Komodo 4 SP. What have you seen to say this?

IGarcia wrote:Look at IPON:

1 Houdini 2.0 STD          3016 2900.0 (2277.5 : 622.5)
                                   100.0 ( 48.5 :  51.5) Komodo 4 SSE42           2977
                                   100.0 ( 50.0 :  50.0) Critter 1.4 SSE42        2975
                                   100.0 ( 57.0 :  43.0) Komodo64 3 SSE42         2965
                                   100.0 ( 57.5 :  42.5) Deep Rybka 4.1 SSE42     2956
                                   100.0 ( 57.5 :  42.5) Deep Rybka 4             2954
                                   100.0 ( 54.5 :  45.5) Critter 1.2              2952
                                   100.0 ( 63.5 :  36.5) Stockfish 2.1.1 JA       2941
                                   100.0 ( 77.5 :  22.5) Chiron 1.1a              2834
                                   100.0 ( 75.5 :  24.5) Naum 4.2                 2827
                                   100.0 ( 85.5 :  14.5) Fritz 13 32b             2819
                                   100.0 ( 74.5 :  25.5) Deep Shredder 12         2800
                                   100.0 ( 83.5 :  16.5) Gull 1.2                 2795
                                   100.0 ( 79.5 :  20.5) Deep Sjeng c't 2010 32b  2788
                                   100.0 ( 79.0 :  21.0) Spike 1.4 32b            2785
                                   100.0 ( 83.0 :  17.0) Deep Fritz 12 32b        2779
                                   100.0 ( 86.0 :  14.0) Protector 1.4.0          2759
                                   100.0 ( 85.5 :  14.5) Hannibal 1.1             2758
                                   100.0 ( 86.5 :  13.5) spark-1.0 SSE42          2755
                                   100.0 ( 84.5 :  15.5) HIARCS 13.2 MP 32b       2748
                                   100.0 ( 81.5 :  18.5) Deep Junior 12.5         2731
                                   100.0 ( 89.0 :  11.0) Zappa Mexico II          2716
                                   100.0 ( 88.5 :  11.5) Deep Onno 1-2-70         2684
                                   100.0 ( 89.0 :  11.0) Toga II 1.4 beta5c BB    2672
                                   100.0 ( 93.0 :   7.0) Strelka 2.0 B            2671
                                   100.0 ( 94.5 :   5.5) Umko 1.2 SSE42           2664
                                   100.0 ( 90.0 :  10.0) Loop 2007                2621
                                   100.0 ( 96.5 :   3.5) Jonny 4.00 32b           2614
                                   100.0 ( 92.0 :   8.0) Tornado 4.80             2608
                                   100.0 ( 94.5 :   5.5) Crafty 23.3 JA           2598

You are correct. That is a testing profile that reeks of an engine being tuned against a single engine or a small group of engines. Look at the drop-off in rating as Komodo 4 plays lower-tier programs.

Now look at Stockfish's profile. Note that the performance rating (TPR) is nearly flat across all tiers of programs.

Stockfish 2.2 JA - Houdini 2.0 STD (3016) 27.0 - 44.0 38.03% Perf=2932
Stockfish 2.2 JA - Critter 1.4 SSE42 (2977) 34.5 - 31.5 52.27% Perf=2992
Stockfish 2.2 JA - Komodo 4 SSE42 (2975) 35.0 - 36.0 49.30% Perf=2971
Stockfish 2.2 JA - Deep Rybka 4.1 SSE42 (2956) 36.5 - 34.5 51.41% Perf=2965
Stockfish 2.2 JA - Naum 4.2 (2827) 51.5 - 19.5 72.54% Perf=2995
Stockfish 2.2 JA - Deep Shredder 12 (2800) 50.0 - 21.0 70.42% Perf=2950
Stockfish 2.2 JA - Gull 1.2 (2795) 52.5 - 18.5 73.94% Perf=2976
Stockfish 2.2 JA - Deep Sjeng c't 2010 32b (2788) 49.0 - 21.0 70.00% Perf=2935
Stockfish 2.2 JA - Spike 1.4 32b (2785) 53.0 - 18.0 74.65% Perf=2972
Stockfish 2.2 JA - Protector 1.4.0 (2759) 54.5 - 16.5 76.76% Perf=2966
Stockfish 2.2 JA - Hannibal 1.1 (2758) 55.5 - 14.5 79.29% Perf=2991
Stockfish 2.2 JA - spark-1.0 SSE42 (2755) 52.5 - 18.5 73.94% Perf=2936
Stockfish 2.2 JA - HIARCS 13.2 MP 32b (2748) 57.0 - 13.0 81.43% Perf=3004
Stockfish 2.2 JA - Deep Junior 12.5 (2731) 52.0 - 18.0 74.29% Perf=2915
Stockfish 2.2 JA - Zappa Mexico II (2716) 52.0 - 17.0 75.36% Perf=2910
Stockfish 2.2 JA - Deep Onno 1-2-70 (2684) 58.0 - 12.0 82.86% Perf=2957
Stockfish 2.2 JA - Strelka 2.0 B (2671) 54.0 - 16.0 77.14% Perf=2882
Stockfish 2.2 JA - Umko 1.2 SSE42 (2664) 61.5 - 9.5 86.62% Perf=2988
Stockfish 2.2 JA - Loop 2007 (2621) 65.0 - 5.0 92.86% Perf=3066
Stockfish 2.2 JA - Jonny 4.00 32b (2614) 61.5 - 8.5 87.86% Perf=2957
Stockfish 2.2 JA - Tornado 4.80 (2608) 59.5 - 8.5 87.50% Perf=2946
Stockfish 2.2 JA - Crafty 23.3 JA (2598) 63.5 - 6.5 90.71% Perf=2993
1135.5 - 407.5 73.59% Perf=2943
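
The Perf column can be approximated per opponent from the opponent's rating and the match score. The sketch below assumes the plain logistic Elo formula. IPON's own method is not given in this thread, so the function `performance_rating` is only an illustration, checked against a few rows copied from the list above.

```python
import math

def performance_rating(opponent_rating, points_for, points_against):
    """Approximate performance rating against a single opponent.

    Assumes the logistic Elo model; undefined for 0% or 100% scores.
    """
    p = points_for / (points_for + points_against)
    return opponent_rating + 400.0 * math.log10(p / (1.0 - p))

# (opponent, opponent rating, points for, points against, listed Perf)
rows = [
    ("Houdini 2.0 STD", 3016, 27.0, 44.0, 2932),
    ("Naum 4.2", 2827, 51.5, 19.5, 2995),
    ("Loop 2007", 2621, 65.0, 5.0, 3066),
    ("Crafty 23.3 JA", 2598, 63.5, 6.5, 2993),
]
for name, rating, won, lost, listed in rows:
    computed = performance_rating(rating, won, lost)
    print(f"{name}: computed {computed:.0f}, listed {listed}")
```

On these rows the logistic formula lands within about a point of the listed values, so the small differences probably come from rounding or a slightly different rating curve.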

mwyoung · Post by **mwyoung** » Mon Jan 02, 2012 7:07 am

rvida wrote:
mwyoung wrote:Critter 1.4 played very well against Houdini 2.0. After 300 games it was 29 Elo behind Houdini 2.0.

But as I test Critter 1.4 against other engines, its rating is falling off against the lower-tier programs.

Unlike Houdini or Komodo, Critter does not have built-in contempt. If it sees a position as drawn (or equal), it happily goes for a 3-fold repetition. IIRC Larry said that even a very small contempt (0.05cp) gives about 15-20 Elo for Komodo when playing against much weaker engines.

Indeed, I once saw Critter play Crafty in a very complicated middlegame that ended in a draw by repetition. With nonzero contempt, Critter would have avoided the repetition, and Crafty would sooner or later have made a blunder, changing the result from 1/2-1/2 to 1-0.

But I am not a very big fan of contempt; I see it as a factor that artificially inflates Elo performance.

Interesting...
I never thought of the contempt factor as a kind of cheat. I have always considered contempt a very simple and effective form of AI that all human chess players also employ. I would like to see chess programs use some kind of intelligent contempt, if that is possible.
