Final result of the Houdini 3 - Stockfish 2.3.1 match: +44 -16 =60
74-46 (+82 Elo ± 42 Elo).
Final result of the Houdini 3 - Houdini 2.0c match: +48 -15 =57
76.5-43.5 (+94 Elo ± 42 Elo).
Mixing both matches, I get a performance of roughly +66 Elo over Houdini 2. That's very acceptable for a new version. Congratulations again!
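The Elo figures quoted above can be reproduced approximately from the raw match scores with the standard logistic model; the exact tool the posters used (e.g. a BayesElo-style program that models draws explicitly) may differ by a few points, so the sketch below is only an approximation:

```python
import math

def elo_diff(score_fraction):
    """Elo difference implied by a score fraction, standard logistic model."""
    return -400 * math.log10(1 / score_fraction - 1)

def score_stderr(wins, draws, losses):
    """Standard error of the mean per-game score (draws count as 0.5)."""
    n = wins + draws + losses
    p = (wins + 0.5 * draws) / n
    variance = (wins + 0.25 * draws) / n - p * p
    return math.sqrt(variance / n)

# Houdini 3 - Stockfish 2.3.1: +44 -16 =60, i.e. 74/120 points
print(elo_diff(74 / 120))        # about 82.6, close to the quoted +82
print(score_stderr(44, 60, 16))  # about 0.03 per game; converting an
                                 # interval of ~2 standard errors to Elo
                                 # gives roughly the quoted +/- 42
```

With only 120 games the error bars dwarf the differences between estimators, which is why the posters treat these numbers as rough.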
It's better to disregard the Houdini-Houdini match, as self-play can seriously inflate rating gains, though this is not always the case. With Stockfish 2.3, a direct hyperspeed match I ran against 2.2.2 showed enormous gains, much of which went away against other opponents, and the rest seems to have disappeared at longer time controls. How do you calculate a rating gain in the absence of long time control ratings for these versions? Are you perhaps using 40/40 CCRL ratings as the closest available? That would put the rating gain over the best Houdini version on that list (1.5) at 24 Elo based only on the SF match, if we assume 2.3.1 is no better than 2.2.2, as the IPON result suggests. But there are too many ifs here and too few games.
I took the unified list (around 1 min/move on a 2 GHz AMD): http://home.scarlet.be/vincentlejeune/r ... 120910.txt
There I count SF 2.3.1 = SF 2.2.2 = 3178.75 and Houdini 2.0 64-bit = 3219.66.
H3b = (3219.66+94+3178.75+82)/2 = 3287 (+67 over H2)
So, disregarding the self-play, it would be +41 over H2 based on that list, which would be a very respectable gain for a year if it turns out to be so.
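The arithmetic behind the +67 and +41 figures is straightforward given the list ratings quoted above:

```python
SF = 3178.75  # SF 2.3.1 = SF 2.2.2 on the unified list
H2 = 3219.66  # Houdini 2.0 64-bit on the same list

# Average the two match-based estimates of Houdini 3's rating:
h3_both = ((H2 + 94) + (SF + 82)) / 2
print(round(h3_both), h3_both - H2)  # 3287, about +67 over H2

# Disregarding the self-play match, only the Stockfish result counts:
h3_sf_only = SF + 82
print(h3_sf_only - H2)               # about +41 over H2
```

Averaging the two estimates implicitly gives the self-play match equal weight, which is exactly what the objection above argues against.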
This is just my advice; it and a buck will get you a Coke. But I think you should make yourself scarce: whether true or not, it looks like you are of the mind "any port in a storm". I have no idea how Houdini 3 and Komodo 5 or 6 will end up, but if Houdini slaughters and quarters Komodo, you are going to look awfully foolish. Best to say nothing and leave it to the pundits.
It doesn't matter whether you are worried or not; you give the perception of being terribly worried. Whether you actually are is moot. The perception is what people see and what matters. Let it go.
The results so far are pretty impressive. It looks like you have a nice gain here.
I think you have to run some long tests like you are doing here in order to really know that you have improved the program. I have noticed that a lot of programs are coming out with new versions that have impressive ELO gains until they are tested at "real" time controls. It's almost certainly a by-product of the fact that you are forced to test this fast to resolve small ELO improvements. It's more and more difficult to get big ELO improvements from a single change.
What makes your results impressive, ignoring the large error margin of course, is that at long time controls the relative ELO difference between programs tends to close up significantly.
Don
Don, thank you. We both know how much hard work every Elo point of gain takes.
Before this run my slowest test match with Houdini 3 was at 2'+2", so I'm very happy that at a roughly 30 times longer time control the gain is still significant.
Hopefully you'll catch up with Komodo, it's more fun for everyone if there's a good competition at the top.
Robert
Actually, I would not want everyone to lie down and die. I want the competition, and I doubt Komodo would be very strong if everything had stagnated 5 years ago.
So yes, we are trying to catch Houdini - our current dev version is almost certainly better than Houdini 1.5 but it's difficult to catch a moving target so please sit still for a minute or two.
Here are some results based on my distributed tester, where volunteers use their machines to help me test at much longer time controls (the time controls are adjusted to the hardware, where the stated time control represents a very fast overclocked machine). In these tests Komodo never plays other versions of itself.
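The details of this tester aren't public, so the following is only a plausible sketch of such a hardware adjustment; the function name, the nodes-per-second benchmark inputs, and the linear scaling are all my assumptions:

```python
def adjusted_time_control(base_time_s, increment_s, machine_nps, reference_nps):
    """Scale a nominal time control so a slower machine searches roughly the
    same number of nodes as the fast reference machine. machine_nps and
    reference_nps are nodes-per-second benchmark figures; the linear
    scaling is an assumption, not the tester's documented scheme."""
    factor = reference_nps / machine_nps
    return base_time_s * factor, increment_s * factor

# A machine half as fast as the reference gets twice the nominal time:
print(adjusted_time_control(60, 1, machine_nps=1_000_000,
                            reference_nps=2_000_000))  # (120.0, 2.0)
```

Node-based scaling like this keeps search depth roughly comparable across volunteers, at the cost of ignoring differences in memory and hash behavior.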
It seems, based on the results, that Komodo 4471.02 64-bit is significantly stronger than the other versions, but for some reason you tested it only in the 90+1 list.
We stopped testing it long ago because we knew it was superior; by the same token we no longer test Komodo 3 either. But we decided to put it on one of the tests, even though we are no longer interested, just to verify that we have made the progress we think we have. We rarely go back and test versions we lost interest in long ago, simply because it's a huge waste of resources.
I wonder whether it was really a big improvement relative to other versions of Komodo, or whether there is some mistake in the data or some problem with the machine that tested it.
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
It doesn't matter if you enter this discussion only to talk about the weather in Bora Bora; it is a no-no and a grave error for one of the top members of a competing team to be seen anywhere near this discussion.
I do not understand.
I understand that you stopped testing Komodo 3 because it is obviously inferior, but this is not similar, because 4471.02 64-bit has a significantly higher rating based on your results.
I could understand if you do not test it because you know there was a bug in the test and the result does not make sense (this is what Albert Silver suggests), but your post does not say that.
To all the programmers: please stop all these engine matches and try to concentrate on positions. I analyse with all of the top 5 engines and I can tell you that each of them sometimes sees things the others don't.
bupalo wrote: To all the programmers: please stop all these engine matches and try to concentrate on positions. I analyse with all of the top 5 engines and I can tell you that each of them sometimes sees things the others don't.
We are definitely interested in positions too, so every time you run across one of these you would be doing us a kindness by adding it to a database. When you get a few, post them for us with the move that you believe should have been played, or should have been avoided, and we will look at them.
Disregard the high result for version 4471.02: it was due to a flaw in the tester relating to the 50-move rule, which version 4471.02 happened to reveal. The 60+1 test uses the corrected version of the tester.
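The post doesn't say what the flaw was, but one common way 50-move-rule adjudication breaks in a tester is a plies-versus-moves mix-up: the rule allows a draw claim after 50 full moves, i.e. 100 plies without a capture or pawn move. A hypothetical illustration (not the actual tester code):

```python
def is_fifty_move_draw(halfmove_clock):
    """halfmove_clock counts plies since the last capture or pawn move,
    as in the FEN halfmove field. The draw can be claimed at 100 plies
    = 50 full moves. Comparing against 50 here instead -- a plies/moves
    mix-up -- would adjudicate draws 50 plies too early, the kind of
    tester flaw that could distort results as described above."""
    return halfmove_clock >= 100

print(is_fifty_move_draw(99), is_fifty_move_draw(100))  # False True
```

A bug in either direction (adjudicating too early or never) systematically favors whichever engine handles long maneuvering phases better, which could explain an inflated result for one version.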