TimoK wrote:b) Komodo 5 shows no improvement over Komodo 4 for the used test conditions, i.e. long TCs with low increment, AMD CPU, strong opponents, Noomen Opening Suite 2012.
Best regards
Timo
Well, after 500 games on CCRL 40/40, Komodo 5 shows just a 3 Elo improvement over Komodo 4. 400 of the 500 games are AMD SSE4. So that is another set of reasonably long time control results (on AMD) that do not favour Komodo 5. I'm inclined to disregard the AMD factor, which just leaves longer time controls where K5 does not shine over K4.
After 1000 games on CEGT at 40/20, Komodo 5 is 2 Elo ahead of Houdini 2 and 24 Elo ahead of Komodo 4. It just shows you need big samples to measure modest gains. Considering that we are still not close to Houdini 2 at 40/4, it shows that Komodo 5 does scale better than Houdini 2, at least up to 40/20.
Then, with 95% confidence based on Timo's games, it scales back badly at the very long 120/40 time control.
Kai
I think that Timo used 120+3, not 120/40.
The CCRL rating suggests that it scales back badly, but I am not sure that this is not just statistical error; we certainly need more games.
It would be almost impossible to design a chess program that scaled better than another from 40/4 up to 40/20 and then worse up to 40/120. The right way to look at this, due to the small samples, is just to pool all the games at 40/20, 40/40, and 40/120 and rate them, then compare the gap from Houdini 2 with the gap in the blitz lists. I expect it will show that we are much closer to Houdini 2 in the longer games.
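To make the sample-size point concrete, here is a minimal sketch of how pooled results could be turned into an Elo difference with a rough 95% margin. It uses the plain logistic Elo formula and a normal approximation, which is an assumption on my part; the rating lists use their own tools. The function names and the W/D/L counts are hypothetical placeholders, not numbers from CCRL, CEGT or Timo's test.

```python
import math

def elo_with_margin(wins, draws, losses, z=1.96):
    """Return (elo_diff, margin) from one engine's point of view.

    z=1.96 gives a two-sided 95% interval under a normal approximation.
    Assumes the score is far enough from 0 and 1 for the interval to be valid.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0 - score) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score

    def to_elo(s):
        return -400 * math.log10(1 / s - 1)

    elo = to_elo(score)
    margin = (to_elo(score + z * se) - to_elo(score - z * se)) / 2
    return elo, margin

def pool(*results):
    """Add up (wins, draws, losses) tuples from several lists."""
    return tuple(sum(col) for col in zip(*results))

# Hypothetical W/D/L counts, NOT taken from any rating list.
small = (150, 210, 140)   # 500 games
large = (300, 420, 280)   # 1000 games, same proportions
for name, wdl in (("500 games", small),
                  ("1000 games", large),
                  ("pooled", pool(small, large))):
    elo, margin = elo_with_margin(*wdl)
    print(f"{name}: {elo:+.1f} Elo, +/- {margin:.1f}")
```

With the same score percentage, the margin shrinks only with the square root of the number of games, so pooling the long time control games is the cheapest way to narrow it.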
I am not saying that this is the case, but I do not agree that it is almost impossible.
It is easy to have a bug that causes the program to play weaker only at longer time controls.
There can be a buffer overflow (an out-of-bounds write) when the program does something like A[n]=m and the array A is too small. It is possible that the problem never happens at blitz (so the program scales well from bullet to blitz) but does happen at long time controls.
Note that such an overflow does not always cause the program to crash; it may only make the program perform worse, because A[n]=m changes something that does not lead to a crash.
There may be other bugs that do not involve an overflow but still cause the program to play weaker at long time controls.
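A toy sketch of the A[n]=m scenario follows. Python checks list bounds, so the example models memory as one flat list in which a hypothetical per-ply table is followed directly by an unrelated evaluation weight; in C the same unchecked write would land in whatever happens to sit after the array. `MAX_PLY`, the table layout and the weight are assumptions made up for illustration, not taken from any real engine.

```python
MAX_PLY = 8  # hypothetical size of the per-ply table A; real engines use larger values

def make_memory():
    # One flat list stands in for adjacent memory:
    # slots 0..MAX_PLY-1 are the per-ply table A, and the next slot is an
    # unrelated evaluation weight that happens to sit right after it.
    return [0] * MAX_PLY + [100]

def store(memory, ply, value):
    # Unchecked write, like A[n] = m in C: nothing verifies ply < MAX_PLY.
    memory[ply] = value

def search(memory, depth):
    # Stand-in for a search that touches one table slot per ply.
    for ply in range(depth):
        store(memory, ply, ply + 1)
    return memory[MAX_PLY]  # the neighbouring weight after the search

print(search(make_memory(), 6))  # prints 100: shallow search stays inside the table
print(search(make_memory(), 9))  # prints 9: the deeper search overwrote the weight
```

Nothing crashes in either case. The deeper search simply continues with a corrupted weight, which is the kind of silent weakening that would only show up at time controls long enough to reach that depth.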
Thinking about it again, it may also happen without bugs, or with only chess-related bugs.
Imagine that we use very slow hardware and a program that uses selective search competes against a program that uses only brute force (both programs use the alpha-beta algorithm with no pruning except the pruning of the selective search).
The selective search program may win at blitz because it goes deeper, but at slower time controls it is going to lose because of the things that the selective search misses (the selective search program may even scale better from bullet to blitz).
Today's programs may also end up with some unintended selective search because of a bug. That pruning is not a problem at fast time controls (even at the big depths that programs reach today) because it prunes very little, but it becomes a bigger problem at longer time controls.
Uri Blass wrote:thinking about it again it may happen also without bugs or only with chess related bugs.
The only established "bug" is Larry's cherry-picking of rating lists, and the Komodo Team's tendency to make extrapolations without actually playing any games against Houdini 2.
The fact is that there is not a single rating list where Komodo is ahead of Houdini. Looking at the current evidence (IPON 5+3, CEGT 40/20, CCRL 40/40, Timo's 120+3) it seems that Komodo 5 is still about 20 Elo behind Houdini 2.
The results at 40/40 or 120+3 are not very supportive of the hypothesis that Komodo 5 would scale any better than Houdini 2.
Robert
The main problem is not Houdini.
The main problem is that Komodo 5 seems to scale worse than previous versions of Komodo (based on comparing CCRL 40/40 with CCRL 40/4).
Maybe this will change with more games; we certainly need more games to be sure.
Thx for your warm words concerning this little test! I'm happy that it was interesting for you, too. I also enjoyed the games and will continue my testing with future engines.
Well said, Robert, I fully agree on that.
Alas, Larry seems very stubborn on this and probably won't give up banging his single-tone drum.
TimoK wrote:b) Komodo 5 shows no improvement over Komodo 4 for the used test conditions, i.e. long TCs with low increment, AMD CPU, strong opponents, Noomen Opening Suite 2012.
Best regards
Timo
Well, after 500 games on CCRL 40/40, Komodo 5 shows just a 3 Elo improvement over Komodo 4. 400 of the 500 games are AMD SSE4. So that is another set of reasonably long time control results (on AMD) that do not favour Komodo 5. I'm inclined to disregard the AMD factor, which just leaves longer time controls where K5 does not shine over K4.
After 1000 games on CEGT at 40/20, Komodo 5 is 2 Elo ahead of Houdini 2 and 24 Elo ahead of Komodo 4. It just shows you need big samples to measure modest gains. Considering that we are still not close to Houdini 2 at 40/4, it shows that Komodo 5 does scale better than Houdini 2, at least up to 40/20.
Then, with 95% confidence based on Timo's games, it scales back badly at the very long 120/40 time control.
Kai
I think that Timo used 120+3, not 120/40.
The CCRL rating suggests that it scales back badly, but I am not sure that this is not just statistical error; we certainly need more games.
That is what I said: 95% confidence (not 100%) from Timo's games, and a similar confidence from the CCRL games, but at a shorter control. The CEGT 40/20 games are even faster and on weaker hardware than Timo's. It would be nice if Timo completed the round-robin of all engines, which means adding the Komodo 5 and Critter 1.6 matches to the previous round-robin.