Don wrote:I'm trying a test where Houdini is handicapped to the point that it is clearly weaker, then I will try the opposite - idea is to see if the numbers are roughly equivalent.
Don
Clarification: The idea is to see if the numbers produced by the formula Jesús proposes make any sense.
Thank you very much for your interest! I will stay tuned for the results. Honestly, I will be surprised if these numbers make sense.
------------
Going off-topic: I take a look at the Distributed Perft(14) Calculation almost daily and I notice that updates have been frozen for weeks! Is there any problem with the project? Thanks in advance for your attention.
Regards from Spain.
ajedrecista.
That project is still on, but we decided to re-engineer it to make the calculation more efficient. It was set up with about 9 million unique perft(6) positions and the idea was to do a perft(8) from all of these positions.
But it's more efficient to do it just the opposite and let one client work on a large set of positions. So the new scheme will start with all the unique perft(8) positions and the clients will do 1000 of them. The total amount of CPU power to do it this way will be substantially less. Unfortunately, that requires me to restructure the whole thing. I have all the unique 8 ply positions right now and compressed that is 5 gig. But I have to fix up the clients and the server to work with this new system.
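For readers who want to picture the new scheme, here is a minimal sketch of handing out work units of 1000 positions and adding the results back up. It is not the project's actual client or server code. The names `make_work_units`, `solve_work_unit`, `total_count` and `perft_fn` are hypothetical, and so is the assumption that each unique 8 ply position is stored together with the number of move sequences that reach it, so that transpositions are counted correctly in the final total.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical representation: a position key (e.g. a FEN string) paired
# with its multiplicity, i.e. how many move sequences reach that position.
WeightedPosition = Tuple[str, int]


def make_work_units(positions: Iterable[WeightedPosition],
                    unit_size: int = 1000) -> Iterator[List[WeightedPosition]]:
    """Yield consecutive batches of at most unit_size positions."""
    batch: List[WeightedPosition] = []
    for item in positions:
        batch.append(item)
        if len(batch) == unit_size:
            yield batch
            batch = []
    if batch:  # last, possibly shorter, batch
        yield batch


def solve_work_unit(unit: List[WeightedPosition],
                    perft_fn: Callable[[str], int]) -> int:
    """Client side: sum multiplicity * node count over one batch.

    perft_fn is a placeholder for whatever routine counts the leaf
    nodes below a single position to the remaining depth.
    """
    return sum(mult * perft_fn(pos) for pos, mult in unit)


def total_count(positions: Iterable[WeightedPosition],
                perft_fn: Callable[[str], int],
                unit_size: int = 1000) -> int:
    """Server side: add up the partial sums returned for every work unit."""
    return sum(solve_work_unit(unit, perft_fn)
               for unit in make_work_units(positions, unit_size))
```

With 2500 dummy positions this gives three work units (1000, 1000 and 500 positions), and the server only has to add three numbers at the end.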
Don
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
Ajedrecista wrote:
Thank you very much for your interest! I stay tuned for the results. Honestly, I will be surprised if these numbers make sense.
It doesn't make sense for me to test this by increasing or decreasing the strength of Houdini. In either case it only increases the number of decisive games for Houdini. It either starts winning more games if you strengthen it, or it starts losing more games if you weaken it, but in both cases your decisive game rate is lowest only when your wins and losses are balanced out.
So instead I decided to see if I could make Stockfish look like the most decisive program by weakening it. This makes more sense since it is already consuming the most testing resources to match Komodo and Houdini.
I only have a couple of hundred games so I need a lot more, but this appears to drop the Stockfish draw rate enough to be below Houdini's. And your formula is not "fooled" either, it still sees Houdini as the most draw-fearing program. I'll give a full report when I have a few thousand games.
If this works, we may be able to run any programs without messing with time adjustments, just as you set out to do. It probably wouldn't hurt to at least make a rough estimated adjustment anyway because I have a feeling there are still second-order errors in your way - but I admit I don't understand it completely. For example, maybe it works better if you get all programs within 100 Elo of each other.
Don
Ajedrecista wrote:
I hope there are no typos, although I went too fast with my calculator. I do not want to interpret these results very deeply because I am not sure about the validity of the last column (a higher number means a stronger tendency to draw, and vice versa).
Regards from Spain.
Ajedrecista.
I sorted the data for you. I'm guessing that the data makes no sense unless the games are played round-robin style, but for what it's worth here is what we get:
I decided to normalize the risk style numbers - so basically I use your values, but as the denominator, where the numerator is the total. This makes it possible to compare any 2 runs directly, even when the programs are at different time controls.
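One possible reading of this normalization, written out as a small sketch. It assumes that "the total" means the sum of the formula's values over all programs in the run, which the post does not spell out, so treat it as a guess rather than the exact calculation used. The function name `normalize_risk_style` is hypothetical.

```python
from typing import Dict


def normalize_risk_style(raw: Dict[str, float]) -> Dict[str, float]:
    """Return total / value for each program.

    ASSUMPTION: 'total' is the sum of the raw values of all programs
    in the same run. This is one reading of the post, not a confirmed one.
    """
    if any(value <= 0 for value in raw.values()):
        raise ValueError("raw values must be positive")
    total = sum(raw.values())
    return {name: total / value for name, value in raw.items()}


# Made-up placeholder values, NOT results from any of the runs.
example = normalize_risk_style({"A": 1.0, "B": 2.0, "C": 1.0})
```

Under this reading the scale is inverted: a program with a larger raw value gets a smaller normalized number. Since each number is a ratio within its own run, numbers from runs at different time controls can be put side by side.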
The first is the big run I did at fast time controls:
This is the run where I crippled Stockfish significantly in order to lower its draw rate below Houdini's. As you can see, the numbers are extremely similar.
I'm satisfied with this experiment and believe that your formula probably works acceptably although possibly it could be improved. At the 4x longer time control the differences compress a bit:
I'm running another test where Komodo is given a high contempt factor and Houdini's is set to zero.
I only have a few hundred games each, but this appears to upset the balance a bit. Houdini gets stronger because contempt 1 is ridiculous against evenly matched opponents, and Komodo gets weaker for the same reason, but they are all within about 15 Elo of each other. I used 23 contempt in Komodo and it appears to have a much smaller effect on the results than changing it does for Houdini, probably due to the king safety issue Richard mentioned.
It appears from the data so far that Houdini is not particularly dynamic - the draw aversion was primarily a result of the contempt factor. It did not change Komodo very much.
Please take all of this with a grain of salt. I'm not sure of the significance of any of this. I do have a hypothesis though. The hypothesis is that no strong program is going to be particularly draw-fearing. To play really exciting "go for broke" chess you have to have a somewhat unsound evaluation function, and strong programs do not have that. Maybe you can do some things to make them more "fun", but if you want your program to play soundly you cannot just sacrifice material left and right.
Don wrote:I'm running another test where Komodo is given a high contempt factor and Houdini's is set to zero.
I only have a few hundred games each, but this appears to upset the balance a bit. Houdini gets stronger because contempt 1 is ridiculous against evenly matched opponents, and Komodo gets weaker for the same reason, but they are all within about 15 Elo of each other. I used 23 contempt in Komodo and it appears to have a much smaller effect on the results than changing it does for Houdini, probably due to the king safety issue Richard mentioned.
It appears from the data so far that Houdini is not particularly dynamic - the draw aversion was primarily a result of the contempt factor. It did not change Komodo very much.
Please take all of this with a grain of salt. I'm not sure of the significance of any of this. I do have a hypothesis though. The hypothesis is that no strong program is going to be particularly draw-fearing. To play really exciting "go for broke" chess you have to have a somewhat unsound evaluation function, and strong programs do not have that. Maybe you can do some things to make them more "fun", but if you want your program to play soundly you cannot just sacrifice material left and right.
Have you finished the test (a few thousand games)? For now it seems that the contempt factor was the primary cause of Houdini's "aversion" to draws. My guess is that all three (Houdini, Komodo, Stockfish) are in the Fruit or Fruit/Rybka branch of engines, also artistically known as "dull" engines. When Fritz 11 (Fruit/Strelka/Rybka) appeared, many observed its dullness compared to the "brilliant" Fritz 10. Could you bring Junior 13 or Hiarcs 14 to roughly equal strength by adjusting their time controls, and compare one of them to the 3 super-engines above? They are not in the Fruit/Rybka branch, so maybe we will see differences.
Kai
I don't have Junior. I do have Hiarcs, but I was unable to get it working due to the key issue. I did purchase it, so it's just a matter of trying to make it work. I'm using Linux, and I do have it working on a different machine, but not one that I would use for this test. I'll look into getting it working.
Here is the final result of my last test - I set Komodo contempt to 23 and Houdini to 0 (which still is something like 13 but I don't know exactly how it works.)
Don wrote:I'm running another test where Komodo is given a high contempt factor and Houdini's is set to zero.
I only have a few hundred games each, but this appears to upset the balance a bit. Houdini gets stronger because contempt 1 is ridiculous against evenly matched opponents, and Komodo gets weaker for the same reason, but they are all within about 15 Elo of each other. I used 23 contempt in Komodo and it appears to have a much smaller effect on the results than changing it does for Houdini, probably due to the king safety issue Richard mentioned.
It appears from the data so far that Houdini is not particularly dynamic - the draw aversion was primarily a result of the contempt factor. It did not change Komodo very much.
Please take all of this with a grain of salt. I'm not sure of the significance of any of this. I do have a hypothesis though. The hypothesis is that no strong program is going to be particularly draw-fearing. To play really exciting "go for broke" chess you have to have a somewhat unsound evaluation function, and strong programs do not have that. Maybe you can do some things to make them more "fun", but if you want your program to play soundly you cannot just sacrifice material left and right.
I do not know how you computed the 'risk style' column this time. Does a higher number in this column mean a higher likelihood of drawing, or a lower one? I think that a higher number in your 'risk style' column means a lower likelihood of drawing. Anyway, I get the following column c (rounded to 0.0001):