program style, risk aversion

Don · Post by **Don** » Thu Dec 13, 2012 5:38 pm

jdart wrote:I think you can also measure this by looking at evals.

Some programs have high king safety scores. Scorpio is one example, Stockfish also I think. Houdini's scores in similar positions seem to be much lower, in my experience (this leads to a somewhat different conclusion than yours about Houdini's style: I think it is very good at finding winning shots but not quick to make a sacrifice or risky move that may not win).

--Jon

I'm not drawing any conclusions here and you will probably notice that I made a disclaimer because I don't fully understand what it really means either. So when I say that Houdini seems more "willing" to take chances to win and lose rather than draw, I'm really saying that Houdini plays in such a way that it tends to lose more relative to these other two programs in order to get more wins. The mechanism behind it is not obvious to me. It could be something relatively simple that has little to do with style.

So I am very interested in knowing what makes some programs more conservative than others in their "draw fear" or whatever you want to call it. It could indeed have a lot to do with king safety but that does not imply a superiority in king safety because it leads to more losses too. Another test is to create versions of a program with king safety turned down or turned off and do a similar study.

Also, I only sampled 3 programs, I seriously doubt this exposes the full range of behaviors. In fact, even though Houdini seems to be a little more draw averse it does not seem to be a major thing - it may be the case that all 3 of these programs would be considered very similar with respect to this behavior if we studied a lot more programs. I don't believe that Houdini for example is especially reckless in this regard and Komodo is perfectly capable of speculative (and committal) moves.

This may simply come down to the quality of the evaluation function but I don't know in which direction - I can think of reasons why it would imply a stronger evaluation function as well as a weaker one.

Ajedrecista · Post by **Ajedrecista** » Thu Dec 13, 2012 6:00 pm

Hi Don:

Don wrote:I did not walk through the math yet, I just read it through once but can I assume that you are looking at what the maximum number of draws can be given the score and doing something with that? For example if you are winning 90% of the games you cannot be drawing more than 20% of them. Then you are somehow comparing the actual draws to the maximum possible draws and perhaps looking at that ratio to understand risk averseness?

You hit the spot! If µ = 0.9, the two extreme cases are no draws (+90% -10% =0%) and the maximum number of draws (+80% -0% =20%) and the real number of draws is obviously in the range [0, D_max].
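The bound discussed above can be sketched in a few lines of Python. The function names are hypothetical, and the ratio of actual to maximum draws is only an illustration of Don's description; it is not necessarily the exact quantity k from the original post.

```python
def max_draw_fraction(mu):
    """Largest possible draw fraction for a score fraction mu in [0, 1].

    With score mu = wins + draws / 2 and losses >= 0, wins >= 0,
    the draw fraction cannot exceed 2 * min(mu, 1 - mu).
    """
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must be between 0 and 1")
    return 2.0 * min(mu, 1.0 - mu)


def draw_ratio(mu, draws):
    """Actual draw fraction relative to the maximum possible (illustrative only)."""
    d_max = max_draw_fraction(mu)
    if d_max == 0.0:
        return 0.0  # a 0% or 100% score leaves no room for draws
    if draws > d_max + 1e-12:
        raise ValueError("draw fraction is not compatible with this score")
    return draws / d_max
```

With µ = 0.9 the maximum is 20% draws, matching the example above; an engine that scored 90% with 10% draws would have a ratio of about 0.5.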

If you read my original post again, you will notice that the word 'method' is in italics just before point a): that is because this 'method' is not a trustworthy method in itself but something I came up with. I think it is better to do it with Round Robin tournaments (sensible tournaments, not Houdini against a random move generator, of course!), so that there is the same number of games and each engine plays against all the other engines.

Don wrote:I'll try to wade through the math later and see if it makes sense to me but surely there is more noise in your method - any time you extrapolate (or interpolate) you lose some accuracy - so if it's sound I would suggest that you could use a hybrid approach - just make some educated guess adjustment to get you closer.

Good luck with the math! My 'method' is like Hocus-Pocus, so nothing really serious. My intention was to spare you all the time adjustments, which are difficult and time consuming IMHO. It would be an honour and an achievement for me if it were used in a hybrid approach: that would mean that my proposal is somewhat useful!

Please do not take it very seriously. It can work... or not (more likely not). But at least we reached the same conclusions: SF appears to be the most draw friendly while Houdini 3 is just the opposite. I know, it is only a coincidence...

I do not know where I extrapolate or interpolate (maybe with the original data rounded up to 1%?) and I agree that my 'method' will have tons of noise, although I do not know what noise can mean in this simple method.

Thank you very much for your interest. Good luck with the rewrite of Komodo!

Regards from Spain.

Ajedrecista.

Ajedrecista · Post by **Ajedrecista** » Thu Dec 13, 2012 6:19 pm

Hi Adam:

Adam Hair wrote:Hi Jesús,

As you undoubtedly have noticed, k is negatively correlated with the average rating difference between an engine and its opponents. The highest and lowest rated engines in a list will have a higher k (in general) because they will have a lower draw rate.

Also, k is correlated with the average rating of the opponents of an engine. IIRC, this presents no problem for the IPON data, for every engine played each other. If CCRL data was used, then it would be a factor. Kirill Kryukov has a page demonstrating the correlation between draw rate and average rating difference and with the average rating of the opponents.

Thanks for the link. I already knew that page but I only glance at it from time to time; I did not look at it when working out my proposal.

I agree that IPON is a good place to run my experiment, as opposed to CCRL and CEGT, because I prefer Round Robin tournaments: it is the same thing I just wrote when I replied to Don.

Adam Hair wrote:After stating the obvious, I would like to ask you about the motivation for using µ*(1-µ) as an adjustment factor and justification for using it. My brain is working slow this morning and I have not determined why it is okay to use µ*(1-µ).

I find your post to be very interesting (even if I have not understood the use of µ*(1-µ)).

Adam

Thanks for finding my post interesting!

The correction factor µ*(1 - µ) came to me when I was writing my original post. It is not something serious but just intuitive: as µ already affects k, I use µ again in an attempt to smooth things. And why µ*(1 - µ)? Because it is familiar to me (I use it in the calculation of the standard deviation of a match between two engines). In this case it is not about standard deviations. If you want a sincere answer, here it is: I liked that factor and I used it! You can see that it is not a formal justification... your brain may be working well after all, because there is no concrete reason why I used µ*(1 - µ) instead of anything else. Maybe someone can find a better factor or, even better, a better proposal or method! I wish them good luck. Those are my two cents.
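For readers who do not recognise the factor: µ*(1 - µ) is the per-game variance of the score when there are no draws, and with a draw fraction D the per-game variance becomes µ*(1 - µ) - D/4. The sketch below only illustrates that standard-deviation use; it says nothing about whether the factor is appropriate for k, and the function name is hypothetical.

```python
import math


def match_score_sd(mu, draw_frac, n_games):
    """Standard deviation of the mean score over n_games independent games.

    Per-game variance is mu * (1 - mu) - draw_frac / 4, which reduces to
    mu * (1 - mu) when there are no draws.
    """
    if n_games <= 0:
        raise ValueError("n_games must be positive")
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must be between 0 and 1")
    # The draw fraction cannot exceed 2 * min(mu, 1 - mu) for a given score.
    if draw_frac < 0.0 or draw_frac > 2.0 * min(mu, 1.0 - mu) + 1e-12:
        raise ValueError("draw fraction is not compatible with this score")
    var_game = max(mu * (1.0 - mu) - draw_frac / 4.0, 0.0)
    return math.sqrt(var_game / n_games)
```

For example, an even 100-game match with no draws has a standard deviation of 0.05 (5 percentage points) in the score, and draws can only reduce it.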

If you read my original post again, you will find that the word 'method' is in italics just before the point a): it is because my proposal is not a method itself but something weird and not serious that can give an idea, and of course it can work... or not! The title of my posts can be misleading and 'proposal' could be better than 'method'.

Regards from Spain.

Ajedrecista.

Don · Post by **Don** » Thu Dec 13, 2012 6:23 pm

Ajedrecista wrote:Hi Don:

Don wrote:I did not walk through the math yet, I just read it through once but can I assume that you are looking at what the maximum number of draws can be given the score and doing something with that? For example if you are winning 90% of the games you cannot be drawing more than 20% of them. Then you are somehow comparing the actual draws to the maximum possible draws and perhaps looking at that ratio to understand risk averseness?
You hit the spot! If µ = 0.9, the two extreme cases are no draws (+90% -10% =0%) and the maximum number of draws (+80% -0% =20%) and the real number of draws is obviously in the range [0, D_max].

If you read my original post again, you will notice that the word 'method' is in italics just before point a): that is because this 'method' is not a trustworthy method in itself but something I came up with. I think it is better to do it with Round Robin tournaments (sensible tournaments, not Houdini against a random move generator, of course!), so that there is the same number of games and each engine plays against all the other engines.

Don wrote:I'll try to wade through the math later and see if it makes sense to me but surely there is more noise in your method - any time you extrapolate (or interpolate) you lose some accuracy - so if it's sound I would suggest that you could use a hybrid approach - just make some educated guess adjustment to get you closer.
Good luck with the math! My 'method' is like Hocus-Pocus, so nothing really serious. My intention was to spare you all the time adjustments, which are difficult and time consuming IMHO. It would be an honour and an achievement for me if it were used in a hybrid approach: that would mean that my proposal is somewhat useful!

It would obviously be a good thing if we could take advantage of all the data already out there - so it's definitely worth taking a look at.

Please do not take it very seriously. It can work... or not (more likely not). But at least we reached the same conclusions: SF appears to be the most draw friendly while Houdini 3 is just the opposite. I know, it is only a coincidence...

I do not know where I extrapolate or interpolate (maybe with the original data rounded up to 1%?) and I agree that my 'method' will have tons of noise, although I do not know what noise can mean in this simple method.

Thank you very much for your interest. Good luck with the rewrite of Komodo!

Regards from Spain.

Ajedrecista.

Don · Post by **Don** » Thu Dec 13, 2012 6:34 pm

Ajedrecista wrote: The main advantage of my method is that it avoids the time adjustment; a drawback can be the calculations, but luckily they are easy (only multiplications and divisions!). Any comments, insights... are welcome.

Regards from Spain.

Ajedrecista.

It's not that difficult to make the adjustment but yes, it does take a little time. You start with an educated guess but once you have a few games you make major corrections on the second pass to get as close as possible. Once you are close you don't make the adjustments by trial and error, it's fairly easy to compute how much time you need to add or subtract. I simply assume that 1% more time is worth 2 ELO, which is about right. That 1% = 2 ELO doesn't work at all levels of course but once you have a feel for it you can zoom in pretty closely. The final pass is where you fine tune. My goal is to get everything within 5 ELO. I think it took 4 passes where I played more games with each pass. It can take more passes if one program does significantly better or worse than it should due to statistical noise.
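The computation described above might look roughly like the following. This is a sketch under stated assumptions: it uses the usual logistic Elo formula to turn a score into a rating difference, applies the 1% = 2 ELO rule of thumb linearly, and all names are hypothetical rather than taken from any actual test script.

```python
import math

# Rule of thumb from the post: 1% more thinking time is worth about 2 Elo.
ELO_PER_PERCENT_TIME = 2.0


def elo_from_score(mu):
    """Elo difference implied by a score fraction mu (standard logistic formula)."""
    if not 0.0 < mu < 1.0:
        raise ValueError("mu must be strictly between 0 and 1")
    return 400.0 * math.log10(mu / (1.0 - mu))


def time_correction_percent(mu):
    """Percent of time to add (positive) or remove (negative) to aim for 50%.

    Linear approximation; as noted in the post, it only holds near equality.
    """
    return -elo_from_score(mu) / ELO_PER_PERCENT_TIME
```

A program scoring 51% is about 7 Elo ahead, so this would suggest giving it roughly 3.5% less time on the next pass.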

If I did a lot of this I would create a more formalized system of getting it right with the least number of wasted games.

Don

Don · Post by **Don** » Thu Dec 13, 2012 6:46 pm

Don wrote:
Ajedrecista wrote: The main advantage of my method is that it avoids the time adjustment; a drawback can be the calculations, but luckily they are easy (only multiplications and divisions!). Any comments, insights... are welcome.

Regards from Spain.

Ajedrecista.
It's not that difficult to make the adjustment but yes, it does take a little time. You start with an educated guess but once you have a few games you make major corrections on the second pass to get as close as possible. Once you are close you don't make the adjustments by trial and error, it's fairly easy to compute how much time you need to add or subtract. I simply assume that 1% more time is worth 2 ELO, which is about right. That 1% = 2 ELO doesn't work at all levels of course but once you have a feel for it you can zoom in pretty closely. The final pass is where you fine tune. My goal is to get everything within 5 ELO. I think it took 4 passes where I played more games with each pass. It can take more passes if one program does significantly better or worse than it should due to statistical noise.

If I did a lot of this I would create a more formalized system of getting it right with the least number of wasted games.

Don

It also occurred to me that you don't have to waste any games if you make the adjustments dynamically during the run. For example if one program starts to pull ahead you can adjust it down. The goal would be to finish with an even score. However I resisted doing this as I don't know if I can trust the data to be as valid. I believe it probably would be more or less valid but with low order errors, perhaps low enough to be ignored. If I were to do this I would still want to start with something that was as close as possible - and you could automate this method too.

With a method to automate it you can always throw out early results if they are found to be ridiculously out of line but presumably an automated method would make the adjustments very quickly when that happens and the total influence of these results would dwindle to almost nothing over time.
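One possible shape for such an automated, dynamic adjustment is sketched below. Everything here is an assumption made for illustration: the proportional update rule, the `gain` value and the clamping bounds are hypothetical and were not part of the study described in this thread.

```python
def update_time_factor(factor, running_score, gain=0.5, lo=0.1, hi=10.0):
    """One step of a dynamic handicap: nudge a program's time multiplier.

    factor        -- current time multiplier for the program (1.0 = nominal)
    running_score -- its score fraction so far, between 0 and 1
    gain          -- hypothetical step size; smaller values react more slowly
    lo, hi        -- hypothetical bounds that keep the multiplier sane
    """
    if not 0.0 <= running_score <= 1.0:
        raise ValueError("running_score must be between 0 and 1")
    error = running_score - 0.5  # positive when the program is ahead
    new_factor = factor * (1.0 - gain * error)
    return min(hi, max(lo, new_factor))
```

Called after each game or batch of games, this shrinks the time of a program that is pulling ahead and grows it for one that is falling behind. Whether results gathered under a moving handicap are as valid as those from fixed settings is exactly the open question raised above.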

Kempelen · Post by **Kempelen** » Thu Dec 13, 2012 8:42 pm

jdart wrote:I think you can also measure this by looking at evals.

Some programs have high king safety scores. Scorpio is one example, Stockfish also I think. Houdini's scores in similar positions seem to be much lower, in my experience (this leads to a somewhat different conclusion than yours about Houdini's style: I think it is very good at finding winning shots but not quick to make a sacrifice or risky move that may not win).

--Jon

I read here: http://www.chessbase.com/newsdetail.asp?newsid=8591 that Houdini does not express its evals as 'pawn counts' the way other engines do, but as a probability of winning the game:

For example, when Houdini 3 shows a +1.00 evaluation in the middle game it has an 80% chance to win the game against an equally strong opponent at blitz time controls. I believe this is a very useful aspect of the engine.

This may be one reason why Houdini has a different style.

Ajedrecista · Post by **Ajedrecista** » Thu Dec 13, 2012 8:48 pm

Hi Fermín:

Kempelen wrote:
jdart wrote:I think you can also measure this by looking at evals.

Some programs have high king safety scores. Scorpio is one example, Stockfish also I think. Houdini's scores in similar positions seem to be much lower, in my experience (this leads to a somewhat different conclusion than yours about Houdini's style: I think it is very good at finding winning shots but not quick to make a sacrifice or risky move that may not win).

--Jon
I read here: http://www.chessbase.com/newsdetail.asp?newsid=8591 that Houdini does not express its evals as 'pawn counts' the way other engines do, but as a probability of winning the game:
For example, when Houdini 3 shows a +1.00 evaluation in the middle game it has an 80% chance to win the game against an equally strong opponent at blitz time controls. I believe this is a very useful aspect of the engine.
This may be one reason why Houdini has a different style.

A more detailed explanation can be found on the official Houdini website.

The engine evaluations have been carefully recalibrated so that +1.00 pawn advantage gives a 80% chance of winning the game against an equal opponent at blitz time control. At +2.00 the engine will win 95% of the time, and at +3.00 about 99% of the time. If the advantage is +0.50, expect to win nearly 50% of the time.

Regards from Spain.

Ajedrecista.

Laskos · Post by **Laskos** » Thu Dec 13, 2012 9:12 pm

I don't know if this was mentioned, isn't the draw rate related to contempt too?

Don · Post by **Don** » Thu Dec 13, 2012 9:48 pm

Laskos wrote:I don't know if this was mentioned, isn't the draw rate related to contempt too?

Yes, it would have an impact.

The contempt factors in my study are zero for Komodo, zero for Stockfish and the default for Houdini 3, which I think is -1. That should have virtually no impact on the results.

However if you wish you can view the results as an upper bound on how eager it is to avoid a draw. With zero it would be more willing to accept a draw and thus more like Stockfish in this regard so if anything Houdini's draw fear is over-stated.

In terms of playing style I don't know what the increased loss rate really means compared to the other programs. It goes from 59 to 62 percent decisive games for Houdini over the other 2 and I don't know if that is a lot or a little. I really wish I had the resources to test a variety of programs (not a bunch of ippolits, but different real families.) I could have added strelka, robbo, ivanhoe but that would be like adding komodo 3, komodo 4 and so on. I tried to add spike but it was going to require very heavy time odds which would have increased the run time dramatically. The other top programs are commercial or too low rated or I cannot get them working on Linux.
