program style, risk aversion

Post by **Adam Hair** » Fri Dec 21, 2012 5:16 pm

Don wrote:
Adam Hair wrote: I will probably start tomorrow night.
I finally got my players time-adjusted. I think I am pretty close.

There is a great disparity between the top programs and the weakest: Houdini 3 is adjusted to 6.5 seconds, whereas Toga II 2.0, the weakest program, requires 124 seconds.

By the way, an interesting side issue here:

If anyone thinks progress in computer chess is mostly about hardware, I think this disproves it: Toga II is stronger than the original Fruit program it is based on, which amazed everyone when it was released, and it needs about a 20-to-1 time handicap (124 s versus 6.5 s). That takes us back about 6.5 years, and I don't think hardware was 20 times slower then.

I have a similar time handicap for my weakest engine (Zappa Mexico II). I have to make an adjustment tonight or in the morning to get the scores a little closer to 50%.

By the way, are you playing game in x seconds or 40 moves in x seconds?

Post by **Don** » Fri Dec 21, 2012 5:37 pm

Adam Hair wrote: By the way, are you playing game in x seconds or 40 moves in x seconds?

I am playing Fischer time controls. The increment is 1/100 of the main time.

Here are my settings. This is the configuration file my own tester uses, and I would be curious to see how closely the handicaps match what you are getting:

cpus = 8
book = standard


player = kdev-4518.00
invoke = /home/drd/u/kom/komodo/VERSIONS/4518.00/komodo
fis = 10 0.10
Hash = 64

player = sf23
invoke = sf23
fis = 17.0 0.170
Hash = 64
Threads = 1

player = spike14
invoke = spike14
fis = 61.1 0.611
Hash = 64
CPUs = 1

player = spark1-0
invoke = spark_1.0
fis = 43.1 0.431
Hash = 64
Threads = 1

player = c16
invoke = c16
fis = 8.3 0.083
Hash = 64
Threads = 1

player = hiarcs14
invoke = hi14
fis = 38.9 0.389
Hash = 64

player = Ivanhoe9.47b
invoke = ivh
fis = 12.3 0.123
Threads = 1
Hash = 64

player = TogaII_2.0
invoke = togaII
fis = 124 1.24
Hash = 64

player = Houdini3
invoke = hou3
Threads = 1
Hash = 64
fis = 6.5 0.065
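For readers who want to compare these handicaps numerically, here is a minimal Python sketch that reads blocks in the format above. The format belongs to Don's own tester, so the parsing rules are assumptions inferred from the sample: a new block starts at each "player =" line, "fis" holds the base time and the increment in seconds, and every other key is ignored. The helper parse_players is hypothetical. The sketch also checks that each increment is 1/100 of the base time and prints each player's base time relative to the fastest one.

```python
# Hypothetical reader for the "player = ... / fis = <base> <increment>" blocks above.
# Assumption: each "player =" line starts a new block; keys other than "fis" are ignored.
CONFIG = """\
cpus = 8
book = standard

player = kdev-4518.00
fis = 10 0.10
player = sf23
fis = 17.0 0.170
player = spike14
fis = 61.1 0.611
player = spark1-0
fis = 43.1 0.431
player = c16
fis = 8.3 0.083
player = hiarcs14
fis = 38.9 0.389
player = Ivanhoe9.47b
fis = 12.3 0.123
player = TogaII_2.0
fis = 124 1.24
player = Houdini3
Threads = 1
Hash = 64
fis = 6.5 0.065
"""


def parse_players(text):
    """Map each player name to its (base_seconds, increment_seconds) Fischer setting."""
    players = {}
    current = None
    for raw in text.splitlines():
        if "=" not in raw:
            continue  # blank lines only separate blocks
        key, value = (part.strip() for part in raw.split("=", 1))
        if key == "player":
            current = value
        elif key == "fis" and current is not None:
            base, inc = (float(x) for x in value.split())
            players[current] = (base, inc)
    return players


players = parse_players(CONFIG)
fastest = min(base for base, _ in players.values())
for name, (base, inc) in sorted(players.items(), key=lambda kv: kv[1][0]):
    # Don's increment is 1/100 of the main time; flag any block that breaks this.
    if abs(inc - base / 100) > 1e-9:
        print(f"warning: {name} increment {inc} is not base/100")
    print(f"{name:14s} {base:6.1f} s + {inc:.3f} s   x{base / fastest:5.1f}")
```

Toga II comes out at roughly 19 times Houdini 3's base time (124 / 6.5), which is the "about 20 to 1" figure quoted earlier.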

Post by **Laskos** » Fri Dec 21, 2012 7:18 pm

Meanwhile I ran a quick test myself. The time handicaps were similar to Don's, except that my time controls were half as long and I used a fixed time per move to avoid time losses. I didn't adjust the handicaps very well, so there are Elo differences of a couple of dozen points. I used Houdini 3, Komodo 5, Critter 1.6, Stockfish 2.3.1, Rybka 4.1, Hiarcs 14 and Junior 13; Houdini 3 ran with contempt=0.

    Program                            Score     %      Elo    +   -    Draws

  1 Houdini 3                      : 891.5/1695  52.6   3016   13  13   35.9 %
  2 Junior 13                      : 879.0/1684  52.2   3013   14  14   28.1 %
  3 Komodo 5                       : 860.5/1691  50.9   3005   14  14   31.3 %
  4 Hiarcs 14                      : 858.0/1693  50.7   3004   14  14   27.8 %
  5 Critter 1.6                    : 846.5/1673  50.6   3004   14  14   33.5 %
  6 Rybka 4.1                      : 788.0/1651  47.7   2986   14  14   32.8 %
  7 Stockfish 2.3.1                : 760.5/1681  45.2   2972   14  14   31.0 %

Draw aversion measured as D/(u(1-u)), where D is the draw rate and u the score fraction (smaller means more draw-averse):

Engine        D/(u(1-u))

Hiarcs 14        1.11
Junior 13        1.13
Komodo 5         1.25
Stockfish 2.3.1  1.25
Rybka 4.1        1.31
Critter 1.6      1.34
Houdini 3        1.44
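The column above can be reproduced from the Score and Draws columns of the results table. The Python sketch below assumes D is the Draws percentage divided by 100 and u is points divided by games; under those assumptions it matches the posted values to two decimals. Because u(1-u) is close to 0.25 whenever u is near 50%, the ratio is roughly 4D, with a small correction for engines whose score strays from 50%.

```python
# Rows copied from the 1,695-game results table: (engine, points, games, draws in %).
RESULTS = [
    ("Hiarcs 14", 858.0, 1693, 27.8),
    ("Junior 13", 879.0, 1684, 28.1),
    ("Komodo 5", 860.5, 1691, 31.3),
    ("Stockfish 2.3.1", 760.5, 1681, 31.0),
    ("Rybka 4.1", 788.0, 1651, 32.8),
    ("Critter 1.6", 846.5, 1673, 33.5),
    ("Houdini 3", 891.5, 1695, 35.9),
]


def draw_ratio(points, games, draw_pct):
    """D / (u(1-u)) with u = score fraction and D = draw fraction."""
    u = points / games
    d = draw_pct / 100.0
    return d / (u * (1.0 - u))


for name, points, games, draw_pct in RESULTS:
    print(f"{name:16s} {draw_ratio(points, games, draw_pct):.2f}")
```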

Hiarcs and Junior seem the most draw-averse, and Houdini 3 at contempt 0 the most cautious. This was just a quick test ahead of your and Don's larger tests.

Kai

Post by **Ajedrecista** » Fri Dec 21, 2012 8:17 pm

Hello Kai:

Laskos wrote: I used Houdini 3, Komodo 5, Critter 1.6, Stockfish 2.3.1, Rybka 4.1, Hiarcs 14 and Junior 13. Houdini 3 is with contempt=0.

I calculated k*µ*(1 - µ) (rounded to 0.0001) using the 'Score' and 'Draws' columns:

Engine            D/[µ*(1 - µ)]      k*µ*(1 - µ)

Hiarcs 14              1.11             0.0704
Junior 13              1.13             0.0733
Komodo 5               1.25             0.0796
Stockfish 2.3.1        1.25             0.0849
Rybka 4.1              1.31             0.0857
Critter 1.6            1.34             0.0848
Houdini 3              1.44             0.0944

We agree on the placement of HIARCS, Junior, Komodo and Houdini.

Then I calculated the mean and the sample standard deviation (dividing by n - 1 = 7 - 1 = 6) from the exact values (the results below are rounded to 0.0001), using a Fortran programme that I wrote in a few minutes:

                    KAI:

                            (Mean) ~  1.2624
       (Sample standard deviation) ~  0.1166

(Mean)/(sample standard deviation) ~ 10.8234

--------------------------------------------

                    ME:

                            (Mean) ~  0.0819
       (Sample standard deviation) ~  0.0081

(Mean)/(sample standard deviation) ~ 10.0565

This last code box is only for comparison purposes. I hope there are no typos this time.
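Both columns and the summary statistics can also be recomputed in a few lines of Python. This sketch is a re-implementation, not the Fortran programme: it takes µ as points/games and D as the Draws percentage over 100 from the 1,695-game table, computes k*µ*(1 - µ) through the shortcut max(µ, 1 - µ)*D/2 that Ajedrecista states later in the thread, and uses statistics.stdev, which divides by n - 1. It should reproduce the box above to about the last printed digit.

```python
from statistics import mean, stdev

# (points, games, draws in %) from the 1,695-game results table.
TABLE = {
    "Hiarcs 14": (858.0, 1693, 27.8),
    "Junior 13": (879.0, 1684, 28.1),
    "Komodo 5": (860.5, 1691, 31.3),
    "Stockfish 2.3.1": (760.5, 1681, 31.0),
    "Rybka 4.1": (788.0, 1651, 32.8),
    "Critter 1.6": (846.5, 1673, 33.5),
    "Houdini 3": (891.5, 1695, 35.9),
}


def both_metrics(points, games, draw_pct):
    """Return (D/(mu(1-mu)), k*mu*(1-mu) computed as max(mu, 1-mu)*D/2)."""
    mu = points / games
    d = draw_pct / 100.0
    return d / (mu * (1.0 - mu)), max(mu, 1.0 - mu) * d / 2.0


# Split the per-engine pairs into one tuple per metric.
kai_values, aj_values = zip(*(both_metrics(*row) for row in TABLE.values()))

for name, k_col in zip(TABLE, aj_values):
    print(f"{name:16s} {k_col:.4f}")

for label, values in (("KAI", kai_values), ("ME", aj_values)):
    m, s = mean(values), stdev(values)  # stdev is the sample SD (n - 1)
    print(f"{label}: mean {m:.4f}  sd {s:.4f}  mean/sd {m / s:.4f}")
```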

@Everybody who is working hard in this thread: please keep up the good work!

Regards from Spain.

Ajedrecista.

Post by **BubbaTough** » Fri Dec 21, 2012 8:44 pm

It is not at all clear to me that % draws is a good measure of risk-averse behavior. Endgame knowledge plays a massive role in draw % in my opinion. If you watch an engine that does not understand that K+B v. K is drawn, you will see it draw a lot of games. Add some knowledge of common drawish endgames that other engines do not have, and again you will see the % of draws go up.

I also don't really understand what you mean by a risk-averse engine. Assuming white and black are treated symmetrically in the eval function, does it mean a high value for king safety, or a low value? My guess is that neither intrinsically leads to more decisive games. More important is how unique your eval is. If you are valuing the same things as other engines in roughly the same ways, both are going to be avoiding or pursuing the same goals (such as king attack or king safety), and that may reduce decisiveness. On the other hand, if two engines that value things very differently battle, there is much more chance of unbalanced features developing (such as a pawn for an attack, or pawn structure for mobility), which in turn seems likely to lead to fewer draws.

-Sam

Post by **Don** » Fri Dec 21, 2012 9:09 pm

BubbaTough wrote: It is not at all clear to me that % draws is a good measure of risk-averse behavior. Endgame knowledge plays a massive role in draw % in my opinion. If you watch an engine that does not understand that K+B v. K is drawn, you will see it draw a lot of games. Add some knowledge of common drawish endgames that other engines do not have, and again you will see the % of draws go up.

But this is still a program characteristic. We are not trying to define exactly what the term means; we are just studying the behavior. The term "risk averse" implies something a bit deeper, I admit, but if you prefer, just think of it as studying the draw percentage of various programs, and we can try to figure out what that means later.

BubbaTough wrote: I also don't really understand what you mean by a risk-averse engine. Assuming white and black are treated symmetrically in the eval function, does it mean a high value for king safety, or a low value? My guess is that neither intrinsically leads to more decisive games.

I have a wild theory but it's only a theory. I think if your evaluation function has a lot of things wrong, your program is going to look like it's taking more chances. But I might have it backwards, who knows.

But I do think evaluation can have an impact on the draw rate. You have already shown how a lack of knowledge can make you draw more, so extending your example implies that a better evaluation function will make you draw less, just the opposite of my wild theory. Maybe the point is that the more accurate the evaluation function, the more "opportunistic" a program can be about exploiting small advantages, even when time-adjusted. When time-adjusted, a program just has less technique but still has the same motives.

BubbaTough wrote: More important is how unique your eval is. If you are valuing the same things as other engines in roughly the same ways, both are going to be avoiding or pursuing the same goals (such as king attack or king safety), and that may reduce decisiveness. On the other hand, if two engines that value things very differently battle, there is much more chance of unbalanced features developing (such as a pawn for an attack, or pawn structure for mobility), which in turn seems likely to lead to fewer draws.

That sounds good to me! I think I would agree with that.

Post by **Laskos** » Sat Dec 22, 2012 3:37 am

Hello Jesus,

Yes, our recipes agree pretty well. I left this quick test running for several more hours and now have 10,000+ games. The engines are adjusted for strength (not very well), and Houdini 3 runs at contempt=0.

    Program                            Score       %     Elo    +   -    Draws

  1 Junior 13                      : 1525.5/2914  52.4   3014   11  11   27.7 %
  2 Houdini 3                      : 1524.0/2921  52.2   3013   10  10   36.1 %
  3 Hiarcs 14                      : 1485.5/2918  50.9   3005   11  11   27.9 %
  4 Komodo 5                       : 1482.0/2920  50.8   3004   10  10   32.4 %
  5 Critter 1.6                    : 1454.5/2893  50.3   3002   10  10   33.5 %
  6 Rybka 4.1                      : 1372.0/2853  48.1   2989   10  10   32.8 %
  7 Stockfish 2.3.1                : 1318.5/2905  45.4   2972   10  10   31.6 %

Draw aversion, which is D/(u(1-u)) in my case, is:

Engine        D/(u(1-u))

Junior 13        1.11
Hiarcs 14        1.12
Stockfish 2.3.1  1.27
Komodo 5         1.30
Rybka 4.1        1.31
Critter 1.6      1.34
Houdini 3        1.45

Smaller means more averse. Maybe you could compute the values with your recipe.

Post by **Ajedrecista** » Sat Dec 22, 2012 1:36 pm

Hello Kai!

Laskos wrote: Maybe you could compute the values with your recipe.

Of course I can! In fact, k*µ*(1 - µ) = [max(µ, 1 - µ)]*D/2; I used the 'Score' and 'Draws' columns again (rounding k*µ*(1 - µ) to 0.0001):

Engine            D/[µ*(1 - µ)]      k*µ*(1 - µ)

Junior 13              1.11             0.0725
Hiarcs 14              1.12             0.0710
Stockfish 2.3.1        1.27             0.0863
Komodo 5               1.30             0.0822
Rybka 4.1              1.31             0.0851
Critter 1.6            1.34             0.0842
Houdini 3              1.45             0.0942

I hope there are no typos. Unfortunately, this time our rankings agree only on Rybka (5th) and Houdini (7th), although we do agree that Junior and HIARCS are the most draw-averse by a large margin, and that Houdini is the least draw-averse, also by a large margin.
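Which engines hold the same place in both orderings can be checked mechanically. This Python sketch ranks the seven engines by the values printed in the two 10,162-game tables (smallest, i.e. most draw-averse, first) and lists the engines whose rank matches. It also prints Spearman's rank correlation as a one-number summary of how similar the two orderings are; that statistic is an added illustration, not something either poster computed, and the no-ties formula used here is valid because no two values in either column are equal.

```python
# Values as printed in the two 10,162-game tables above.
KAI = {"Junior 13": 1.11, "Hiarcs 14": 1.12, "Stockfish 2.3.1": 1.27, "Komodo 5": 1.30,
       "Rybka 4.1": 1.31, "Critter 1.6": 1.34, "Houdini 3": 1.45}
AJ = {"Junior 13": 0.0725, "Hiarcs 14": 0.0710, "Stockfish 2.3.1": 0.0863, "Komodo 5": 0.0822,
      "Rybka 4.1": 0.0851, "Critter 1.6": 0.0842, "Houdini 3": 0.0942}


def ranks(scores):
    """1-based rank of each engine, smallest value (most draw-averse) first."""
    order = sorted(scores, key=scores.get)
    return {name: i + 1 for i, name in enumerate(order)}


rk, ra = ranks(KAI), ranks(AJ)
same = [name for name in KAI if rk[name] == ra[name]]
print("same rank in both:", same)

# Spearman's rho without ties: 1 - 6 * sum(d^2) / (n (n^2 - 1)).
n = len(KAI)
rho = 1 - 6 * sum((rk[x] - ra[x]) ** 2 for x in KAI) / (n * (n * n - 1))
print(f"Spearman rho = {rho:.3f}")
```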

Thank you very much for running 10,162 games! Please keep up the good work.

Regards from Spain.

Ajedrecista.

Post by **Laskos** » Sat Dec 22, 2012 8:34 pm

Curious about the others' results, I left my little test running for a few more hours; it now has 16,000+ games:

    Program                            Score     %       Elo    +   -    Draws

  1 Junior 13                      : 2428.5/4620  52.6   3015    9   9   27.5 %
  2 Houdini 3                      : 2407.5/4630  52.0   3012    8   8   36.1 %
  3 Hiarcs 14                      : 2386.5/4629  51.6   3009    9   9   27.4 %
  4 Critter 1.6                    : 2339.0/4590  51.0   3006    8   8   33.3 %
  5 Komodo 5                       : 2348.5/4622  50.8   3005    8   8   31.4 %
  6 Rybka 4.1                      : 2152.0/4523  47.6   2986    8   8   33.1 %
  7 Stockfish 2.3.1                : 2048.0/4606  44.5   2967    8   8   31.8 %

Draw aversion with my scaling and Jesus's:

Engine        D/(u*(1-u))    k*u*(1-u)

Hiarcs 14        1.10         0.071
Junior 13        1.10         0.072
Komodo 5         1.26         0.080
Stockfish 2.3.1  1.29         0.088
Rybka 4.1        1.33         0.087
Critter 1.6      1.33         0.085
Houdini 3        1.45         0.094

The error bars for the first column are ±0.02 at 95% confidence, but everything is subject to my time controls, which are half as long as Don's.

Post by **Adam Hair** » Wed Dec 26, 2012 1:47 pm

My test is under way and ~1200 of 7800 games have been played so far.

Here are my handicaps (40 moves in X seconds):

Name                    TC
Houdini 3              40/14
Critter 1.4            40/22
Komodo 5               40/30
Rybka 4.1              40/30
Stockfish 2.2.2        40/40
Naum 4.2               40/84
Hannibal 1.2           40/108
Gull 1.2               40/110
Spike 1.4              40/130
Spark 1.0              40/140
Protector 1.4.0        40/175
Quazar 0.4             40/180
Zappa Mexico II        40/280
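To put these 40-move controls on a footing comparable with Don's list, the Python sketch below divides each engine's allotment by Houdini 3's. The per-move figure is only a nominal average (X seconds spread over 40 moves), since real engines do not use their clock evenly. Zappa Mexico II comes out at exactly 20 times Houdini 3's time (280 / 14), in line with the "similar time handicap" mentioned at the start of the thread.

```python
# Adam Hair's handicaps from the table above: seconds per 40 moves.
HANDICAPS = {
    "Houdini 3": 14, "Critter 1.4": 22, "Komodo 5": 30, "Rybka 4.1": 30,
    "Stockfish 2.2.2": 40, "Naum 4.2": 84, "Hannibal 1.2": 108, "Gull 1.2": 110,
    "Spike 1.4": 130, "Spark 1.0": 140, "Protector 1.4.0": 175,
    "Quazar 0.4": 180, "Zappa Mexico II": 280,
}

reference = HANDICAPS["Houdini 3"]
for name, seconds in HANDICAPS.items():
    per_move = seconds / 40  # nominal average, assuming the clock is spread evenly
    ratio = seconds / reference
    print(f"{name:16s} {per_move:5.2f} s/move   x{ratio:4.1f}")
```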
