Kempelen wrote:
jdart wrote:
I think you can also measure this by looking at evals.
Some programs have high king safety scores. Scorpio is one example, Stockfish also I think. Houdini's scores in similar positions seem to be much lower, in my experience (this leads to a somewhat different conclusion than yours about Houdini's style: I think it is very good at finding winning shots but not quick to make a sacrifice or risky move that may not win).
--Jon
I read here: http://www.chessbase.com/newsdetail.asp?newsid=8591 that Houdini does not score evals in 'pawn counts' as other engines do, but as the probability of winning the game:
For example, when Houdini 3 shows a +1.00 evaluation in the middle game it has an 80% chance to win the game against an equally strong opponent at blitz time controls. I believe this is a very useful aspect of the engine.
This is maybe a reason why Houdini has a different style.
Only Robert Houdart could clarify that, but I don't think it explains why Houdini has a different style. What most likely happens is that, internally, Houdini has an eval like everyone else (probably a better one than most engines), and when it displays scores, it tries to rescale them in this way. As to how the rescaling works, I don't know.
program style, risk aversion
Moderator: Ras
-
- Posts: 3241
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: program style, risk aversion
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
-
- Posts: 5106
- Joined: Tue Apr 29, 2008 4:27 pm
Re: My numeric method for determine draw trends of each engi
Ajedrecista wrote:
Hello Don:
Don wrote:
I'm running another test where Komodo is given a high contempt factor and Houdini's is set to zero.
I only have a few hundred games each, but this appears to upset the balance a bit. Houdini gets stronger because contempt 1 is ridiculous against evenly matched opponents and Komodo gets weaker for the same reason, but they are all within about 15 ELO of each other. I used 23 contempt in Komodo and it appears to have a much smaller effect on the results than changing it does for Houdini, probably due to the king safety issue Richard mentioned.
It appears from the data so far that Houdini is not particularly dynamic - the draw aversion was primarily a result of the contempt factor. It did not change Komodo very much.
Code: Select all
Percent   Percent   Percent   Percent   Risk
Decisive  Wins      Losses    Draws     Style     Player
--------  --------  --------  --------  --------  -------------------
60.56     29.47     31.09     39.44     3.07371   sf23
61.49     28.70     32.80     38.51     3.07323   kdev-4518.00
59.30     32.50     26.79     40.70     2.86312   hou3
Please take all of this with a grain of salt. I'm not sure of the significance of any of this. I do have a hypothesis though. The hypothesis is that no strong program is going to be particularly draw fearing. To play really exciting "go for broke" chess you have to have a somewhat unsound evaluation function and strong programs do not have that. Maybe you can do some things to make them more "fun" but if you want your program to play soundly you cannot just sacrifice material left and right.
I do not know how you computed the 'risk style' column this time. Does a higher number in this column mean more likelihood of drawing, or less? I think that a higher number in your 'risk style' column means less likelihood to draw. Anyway, I get the following column c (rounding up to 0.0001):
Code: Select all
Percent   Percent   Percent   Percent
Decisive  Wins      Losses    Draws     c         Player
--------  --------  --------  --------  --------  -------------------
60.56     29.47     31.09     39.44     0.2004    sf23
61.49     28.70     32.80     38.51     0.2005    kdev-4518.00
59.30     32.50     26.79     40.70     0.2151    hou3

SF:      (0.5 + |µ - 0.5|)*D = (0.5 + |0.4919 - 0.5|)*0.3944 ~ 0.2004
Komodo:  (0.5 + |µ - 0.5|)*D ~ (0.5 + |0.47955 - 0.5|)*0.3851 ~ 0.2005
Houdini: (0.5 + |µ - 0.5|)*D ~ (0.5 + |0.5285 - 0.5|)*0.407 ~ 0.2151
A higher number c means more likelihood to draw, and vice versa. In this case with three engines, Houdini is the least 'draw fearing' engine.
Regards from Spain.
Ajedrecista.
I'm using the exact same formula that you are; the only difference is that I am "normalizing" the values, comparing them to the total: I sum the whole column and then use the sum as the numerator to get my numbers.
By doing this the numbers are less arbitrary as they are compared to the sample.
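As a quick check of the c column, here is a minimal Python sketch of the formula c = (0.5 + |µ - 0.5|)*D applied to the three rows of the table. The names RESULTS, draw_tendency and C_VALUES are made up for this illustration, and the score µ is assumed to be wins plus half the draws, which matches the µ values printed in the worked lines.

```python
# Percentages copied from the three-engine table: (wins, losses, draws).
RESULTS = {
    "sf23": (29.47, 31.09, 39.44),
    "kdev-4518.00": (28.70, 32.80, 38.51),
    "hou3": (32.50, 26.79, 40.70),
}


def draw_tendency(wins_pct, losses_pct, draws_pct):
    """Return c = (0.5 + |mu - 0.5|) * D for one engine.

    mu is the score (wins plus half the draws) and D the draw ratio,
    both as fractions. losses_pct is accepted only to mirror the table
    columns; the formula itself does not use it.
    """
    mu = (wins_pct + draws_pct / 2.0) / 100.0
    d = draws_pct / 100.0
    return (0.5 + abs(mu - 0.5)) * d


# Hypothetical helper mapping: engine name -> c value.
C_VALUES = {name: draw_tendency(*row) for name, row in RESULTS.items()}

if __name__ == "__main__":
    for name, c in C_VALUES.items():
        print(f"{name:14s} c = {c:.4f}")
```

With ordinary rounding the Komodo value comes out a hair below the 0.2005 shown in the post, which is consistent with the post rounding up.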
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
-
- Posts: 2104
- Joined: Wed Jul 13, 2011 9:04 pm
- Location: Madrid, Spain.
My numeric method for determine draw trends of each engine.
Hello:
Don wrote:
I'm using the same exact formula that you are, the only difference is that I am "normalizing" them - comparing them to the total. Sum all the column and then use the sum as the numerator to get my numbers.
By doing this the numbers are less arbitrary as they are compared to the sample.
I see and I think it is a good idea:
Code: Select all
0.2004 + 0.2005 + 0.2151 = 0.616
0.616/0.2004 ~ 3.0739
0.616/0.2005 ~ 3.0723
0.616/0.2151 ~ 2.8638
(My numbers are slightly different from yours due to rounding.)
And then, by definition:
1/(3.07371) + 1/(3.07323) + 1/(2.86312) ~ 1
With this normalization, higher numbers now mean less risk aversion, if we define 'risk aversion' as 'being comfortable with draws', which is arguable.
Thanks again for your interest in my numerical idea! I stay tuned for further results and/or conclusions.
Regards from Spain.
Ajedrecista.
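The "by definition" step can be written out as a one-line derivation. The symbols S (the column sum) and R_i (the normalized 'risk style' value of engine i) are labels introduced only for this note; c_i is the value from the formula discussed in the thread.

```latex
\[
R_i = \frac{S}{c_i}, \qquad S = \sum_{j} c_j
\quad\Longrightarrow\quad
\sum_{i} \frac{1}{R_i} \;=\; \sum_{i} \frac{c_i}{S} \;=\; \frac{1}{S}\sum_{i} c_i \;=\; 1 .
\]
```

So the reciprocals of the normalized values always sum to 1, whatever the number of engines; any small deviation in practice comes only from rounding.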
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: My numeric method for determine draw trends of each engi
I took CCRL 40/4 and picked the engines around Rybka 1.0 strength. The strength must be accounted for, so I took engines of similar strength. For example, we know that stronger play (longer time control) increases the draw ratio, independent of the score ratio. A regression on strength (rating) can be made to account for it, but I am not ready to do it.
The draw ratio I divided by s(1-s), where s is the score ratio. Here is the draw averseness of the engines around Rybka 1.0 strength (smaller the number, more averse)
Code: Select all
Engine d / (1-s)s
Fritz 9 1.17
Junior 10 1.33
Shredder 10 1.33
Hiarcs 11 1.33
Fritz 10 1.36
Shredder 11 1.38
Zappa Mexico II 1.40
Fruit 2.3 1.42
Junior 11 1.43
Rybka 1.0 1.48
Doch 1.2 1.48
Naum 3.1 1.52
Fritz 11 1.55
It seems the artistic impression about these engines has some grounds in this "numeric style". But more engines (and maybe a regression) must be included.
Kai
-
- Posts: 5106
- Joined: Tue Apr 29, 2008 4:27 pm
Re: My numeric method for determine draw trends of each engi
Laskos wrote:
I took CCRL 40/4 and picked the engines around Rybka 1.0 strength. The strength must be accounted for, so I took engines of similar strength. For example, we know that stronger play (longer time control) increases the draw ratio, independent of the score ratio. A regression on strength (rating) can be made to account for it, but I am not ready to do it.
The draw ratio I divided by s(1-s), where s is the score ratio. Here is the draw averseness of the engines around Rybka 1.0 strength (smaller the number, more averse)
Code: Select all
Engine d / (1-s)s
Fritz 9 1.17
Junior 10 1.33
Shredder 10 1.33
Hiarcs 11 1.33
Fritz 10 1.36
Shredder 11 1.38
Zappa Mexico II 1.40
Fruit 2.3 1.42
Junior 11 1.43
Rybka 1.0 1.48
Doch 1.2 1.48
Naum 3.1 1.52
Fritz 11 1.55
It seems the artistic impression about these engines has some grounds in this "numeric style". But more engines (and maybe a regression) must be included.
Kai
There clearly needs to be a lot of variety to be able to see the bigger picture. But I see that the contempt factor obfuscates the issue because that seems to have a fairly large impact on how draw averse a program is. I am more interested in the base playing style and how it affects this, not how skilled a program is in avoiding draw by repetition.
-
- Posts: 2104
- Joined: Wed Jul 13, 2011 9:04 pm
- Location: Madrid, Spain.
My numeric method for determine draw trends of each engine.
Hello:
Don has posted more results in Open Chess Forum:
Draw aversion
I take the liberty to copy the info here because I think that these tests must be in TalkChess too:
Don wrote:
Here are the players in my study, the result of a match of a few thousand games. Note that I made a preliminary attempt to "time adjust" the ratings so that there would not be serious mismatches:
Code: Select all
Rank  ELO     +/-   Games  Score   Player
----  ------  ----  -----  ------  ------------
1     3027.2  10.6  2938   52.280  spike14
2     3019.5  10.6  2940   50.901  kdev-4518.00
3     3015.8  10.6  2938   50.221  c16
4     3010.2  10.6  2940   49.218  sf23
5     3000.0  10.6  2938   47.379  spark1-0
w/l/d: 2582 2332 2433  33.12 percent draws
In a previous 3 player run I was meticulous about adjusting the ratings, coming within 5 ELO of each other. But a formula was suggested by Jesús Muñoz which in his own words looks like this:
Code: Select all
µ_i: score of the i-th engine.
D_i: draw ratio of the i-th engine.
c_i = (0.5 + |µ_i - 0.5|)*D_i
(c')_i = (0.5 - |µ_i - 0.5|)*D_i
When one program is significantly stronger than another the draw rate naturally goes way down, so you cannot simply observe the draw rate. In tests I did this formula appears to compensate for that: I could not make one program appear more drawish than another by manipulating the handicaps.
I "normalized" the values output by this formula by displaying each result as the denominator (and the sum as the numerator) to get this table - Risk Style is positive if the program wants to avoid draws:
Code: Select all
Percent   Percent   Percent   Percent   Risk
Decisive  Wins      Losses    Draws     Style     Player
--------  --------  --------  --------  --------  -------------------
68.90     31.82     37.08     31.10     5.19589   spark1-0
66.52     33.48     33.04     33.48     5.05858   c16
66.47     32.44     34.04     33.53     4.99435   sf23
66.20     34.04     32.17     33.80     4.94098   kdev-4518.00
66.28     35.42     30.86     33.72     4.82530   spike14
I am cautious to assign any deeper meaning to this - partly due to the contempt factor issue. It is simply an attempt to measure the "draw aversion" of a program in relation to other programs. Komodo has a default contempt of 7, Stockfish 0 and I have not checked the others. Some programs do not allow you to change it. With a little experimentation you can usually figure out what the contempt factor of a program is simply by setting up positions where it can force a draw and should.
I would love to run this test with many programs with contempt factors of zero.
Thanks to Don for the credit given to me!
According to the numbers in the 'risk style' column, Spark is the most draw-fearing engine among these five while Spike is the most draw-friendly, on the assumption that these numbers are actually correlated with draw aversion, which is a huge supposition.
Sorry for the long cross-posting but I think that TalkChess is a reference site in computer chess and also deserves to have this info.
Regards from Spain.
Ajedrecista.
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: My numeric method for determine draw trends of each engi
I tested Houdini 3, Komodo 5 and Hiarcs 14 (all at default contempt), adjusted for strength:
Code: Select all
Program Score % Elo + - Draws
1 Hiarcs 14 : 1022.5/2001 51.1 3005 13 13 28.3 %
2 Houdini 3 : 1000.5/2002 50.0 3000 13 13 32.0 %
3 Komodo 5 : 978.0/1999 48.9 2995 13 13 31.6 %
The "draw averseness" (smaller means more averse) is defined as draws over score*(1-score):
Code: Select all
Engine d / s(1-s)
Hiarcs 14 1.13
Komodo 5 1.26
Houdini 3 1.28
Houdini and Komodo are almost equal, but Hiarcs is much more "draw averse". This is something of a confirmation of the artistic impression that the recent Rybkish/Fruitish engines play somewhat duller chess (but are much stronger).
Kai
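The d / s(1-s) figures can be reproduced from the match table with a few lines of Python. This is only a sketch: the names MATCH, draw_averseness and AVERSENESS are invented for the example, and the score s is taken as points divided by games.

```python
# (points scored, games played, draw percentage) from the three-engine table.
MATCH = {
    "Hiarcs 14": (1022.5, 2001, 28.3),
    "Houdini 3": (1000.5, 2002, 32.0),
    "Komodo 5": (978.0, 1999, 31.6),
}


def draw_averseness(points, games, draw_pct):
    """Return d / (s * (1 - s)); a smaller value means fewer draws
    than the score alone would suggest, i.e. a more draw-averse engine."""
    s = points / games          # score ratio
    d = draw_pct / 100.0        # draw ratio
    return d / (s * (1.0 - s))


# Hypothetical helper mapping: engine name -> draw averseness.
AVERSENESS = {name: draw_averseness(*row) for name, row in MATCH.items()}

if __name__ == "__main__":
    for name, value in sorted(AVERSENESS.items(), key=lambda kv: kv[1]):
        print(f"{name:10s} {value:.2f}")
```

Sorting by the value lists the most draw-averse engine first, in the same order as the second table.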
-
- Posts: 5106
- Joined: Tue Apr 29, 2008 4:27 pm
Re: My numeric method for determine draw trends of each engi
Please feel free to cross post my stuff - my intent is not to "punish" talkchess but to save myself some time and aggravation.
-
- Posts: 3226
- Joined: Wed May 06, 2009 10:31 pm
- Location: Fuquay-Varina, North Carolina
Re: My numeric method for determine draw trends of each engi
Ajedrecista wrote:
From the IPON data:
Code: Select all
Name of the engine µ D D_max k k*µ*(1 - µ)
Houdini 3 STD 82% 24% 36% 0.6667 0.0984
Komodo 5 73% 34% 54% 0.6296 0.1241
Critter 1.4a 71% 37% 58% 0.6379 0.1314
Stockfish 2.2.2 JA 69% 40% 62% 0.6452 0.138
Deep Rybka 4.1 68% 40% 64% 0.625 0.136
Chiron 1.5 52% 42% 96% 0.4375 0.1092
Deep Fritz 13 32b 51% 40% 98% 0.4082 0.102
Naum 4.2 50% 42% 100% 0.42 0.105
HIARCS 14 WCSC 32b 48% 40% 96% 0.4167 0.104
Hannibal 1.2 45% 40% 90% 0.4444 0.11
Gull 1.2 45% 39% 90% 0.4333 0.1073
Deep Shredder 12 45% 40% 90% 0.4444 0.11
Deep Sjeng c't 2010 32b 43% 41% 86% 0.4767 0.1169
Spike 1.4 32b 42% 40% 84% 0.4762 0.116
spark-1.0 41% 39% 82% 0.4756 0.1151
Protector 1.4.0 39% 39% 78% 0.5 0.119
Deep Junior 13.3 39% 34% 78% 0.4359 0.1037
Quazar 0.4 36% 37% 72% 0.5139 0.1184
Zappa Mexico II 32% 35% 64% 0.5469 0.119
MinkoChess 1.3 31% 36% 62% 0.5806 0.1242
Just so I would not be left out of the fun, I have done some work on this also. I compared the draw rates for each match to an estimate of draw rate as a function of Elo difference and found the average difference for each engine (throwing out the Zappa vs Fritz results due to being an outlier). I then adjusted the average differences due to the positive correlation of draw rates to Elo ratings. The resulting percentages represent the deviation of the IPON draw rates for each engine from the expected draw rates, given the Elo difference of each match and the strength of each engine. Here is Jesús' table with my draw deviation column added:
Code: Select all
Name of the engine µ D D_max k k*µ*(1 - µ) Draw deviation
Houdini 3 STD 82% 24% 36% 0.6667 0.0984 -2.05%
Komodo 5 73% 34% 54% 0.6296 0.1241 -0.19%
Critter 1.4a 71% 37% 58% 0.6379 0.1314 0.10%
Stockfish 2.2.2 JA 69% 40% 62% 0.6452 0.138 1.68%
Deep Rybka 4.1 68% 40% 64% 0.625 0.136 2.41%
Chiron 1.5 52% 42% 96% 0.4375 0.1092 0.87%
Deep Fritz 13 32b 51% 40% 98% 0.4082 0.102 -0.07%
Naum 4.2 50% 42% 100% 0.42 0.105 0.63%
HIARCS 14 WCSC 32b 48% 40% 96% 0.4167 0.104 -1.06%
Hannibal 1.2 45% 40% 90% 0.4444 0.11 -0.04%
Gull 1.2 45% 39% 90% 0.4333 0.1073 -1.93%
Deep Shredder 12 45% 40% 90% 0.4444 0.11 -0.08%
Deep Sjeng c't 2010 32b 43% 41% 86% 0.4767 0.1169 0.45%
Spike 1.4 32b 42% 40% 84% 0.4762 0.116 0.24%
spark-1.0 41% 39% 82% 0.4756 0.1151 -0.03%
Protector 1.4.0 39% 39% 78% 0.5 0.119 -0.16%
Deep Junior 13.3 39% 34% 78% 0.4359 0.1037 -5.21%
Quazar 0.4 36% 37% 72% 0.5139 0.1184 -1.07%
Zappa Mexico II 32% 35% 64% 0.5469 0.119 0.00%
MinkoChess 1.3 31% 36% 62% 0.5806 0.1242 0.69%
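The D_max and k columns are not defined in this part of the thread. The numbers are consistent with D_max = 2*min(µ, 1 - µ), the largest draw ratio possible at score µ (reached when the engine has no wins or no losses), and k = D/D_max. That reading is an inference from the table, not something stated by the posters. A small Python sketch under that assumption, with the hypothetical names ROWS and k_factor:

```python
# (score mu, draw ratio D) for a few rows of the IPON table, as fractions.
ROWS = {
    "Houdini 3 STD": (0.82, 0.24),
    "Komodo 5": (0.73, 0.34),
    "Naum 4.2": (0.50, 0.42),
    "MinkoChess 1.3": (0.31, 0.36),
}


def k_factor(mu, d):
    """Return (D_max, k, k*mu*(1-mu)) under the ASSUMED definitions
    D_max = 2*min(mu, 1-mu) and k = D / D_max."""
    d_max = 2.0 * min(mu, 1.0 - mu)
    k = d / d_max
    return d_max, k, k * mu * (1.0 - mu)


if __name__ == "__main__":
    for name, (mu, d) in ROWS.items():
        d_max, k, product = k_factor(mu, d)
        print(f"{name:16s} D_max={d_max:.2f} k={k:.4f} k*mu*(1-mu)={product:.4f}")
```

Under this reading k can never exceed 1, since the draw ratio cannot exceed D_max.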
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: My numeric method for determine draw trends of each engi
I didn't quite get this k factor, and I use only score*(1-score), or µ(1-µ) in the notation of Jesús, when comparing to the draw ratio. The problem with the assumption that k*score*(1-score) is somehow constant is evidenced by this:
Score = 0.5, Draw Ratio = d = 0.40
Then k=0.4
k*s*(1-s)=0.1 assumed to be a constant for an engine
Same engine:
Score = 0.9
k is smaller than 1 by definition
Then k*s*(1-s) is smaller than 0.09
There is no k to match the old k*s*(1-s) of 0.1, and even the maximum k=1 is unrealistic, as there would be lots of wins and draws, but no losses.
On the other hand, if the factor is d / (s*(1-s)) then
Score = 0.5, Draw Ratio = d = 0.40
d / (s*(1-s)) = 1.6
Same engine:
Score = 0.9
The prediction for the same 1.6 is that d / (0.9*0.1) must equal 1.6
Then d=0.09*1.6=0.144
So it predicts a result of 82.8% wins, 14.4% draws, and 2.8% losses, which is pretty realistic.
Therefore I think that d / (s*(1-s)) is a more useful quantity than k*s*(1-s).
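The arithmetic of this argument can be checked with a short Python sketch. The function name predict_wdl and the constants below are illustrative only; the one relation used is that the score equals wins plus half the draws.

```python
def predict_wdl(score, ratio):
    """Predict (wins, draws, losses) as fractions, assuming the engine
    keeps the same value of ratio = d / (s * (1 - s)) at a new score."""
    draws = ratio * score * (1.0 - score)
    wins = score - draws / 2.0      # score = wins + draws / 2
    losses = 1.0 - wins - draws
    return wins, draws, losses


# Baseline from the example: score 0.5 with 40% draws.
RATIO = 0.40 / (0.5 * (1.0 - 0.5))          # 1.6
WINS, DRAWS, LOSSES = predict_wdl(0.9, RATIO)

# For comparison: the k needed at score 0.9 to keep k*s*(1-s) = 0.1.
REQUIRED_K = 0.1 / (0.9 * (1.0 - 0.9))      # about 1.11, above the limit of 1

if __name__ == "__main__":
    print(f"wins={WINS:.3f} draws={DRAWS:.3f} losses={LOSSES:.3f}")
    print(f"required k = {REQUIRED_K:.3f}")
```

The prediction only makes sense while the predicted draw ratio stays at or below 2*min(s, 1-s); beyond that the wins or losses would turn negative.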
Anyway, I ran another test with engines adjusted for strength (not perfectly adjusted):
Code: Select all
Program Score % Elo Draws
1 Stockfish 2.3.1 : 520.5/834 62.4 3073 32.5 %
2 Komodo 5 : 421.5/842 50.1 3000 35.0 %
3 Rybka 4.1 : 404.5/819 49.4 2996 34.8 %
4 Hiarcs 14 : 415.0/845 49.1 2995 29.1 %
5 Houdini 3 : 394.0/840 46.9 2982 33.6 %
6 Junior 13 : 353.5/838 42.2 2954 26.1 %
And the draw averseness (smaller - more averse) is:
Code: Select all
Engine d / (s*(1-s))
Junior 13 1.07
Hiarcs 14 1.16
Houdini 3 1.35
Stockfish 2.3.1 1.39
Rybka 4.1 1.39
Komodo 5 1.40
Again, "older style" engines seem more draw-averse.
Kai