Re: For CCRL Tester/s- One More.....
Posted: Mon Mar 25, 2013 6:07 am
Would you happen to know how CEGT would handle the same scenario as above?
And thanks again in advance-
george
Eelco de Groot wrote: I see that Teemu Pudas (Vempele) writes in the same thread that he adapted the code, so that would imply it is already in one of the CMLX versions. I did not remember that... He probably placed the code better; my version was placed somewhere as one of the 'win' recognizers/situations early in eval.cpp, but I don't know how Teemu did it.
It arrived not exactly later that day, but one year later there it was: Toga CMLX 1.4.5e. Teemu's Toga CMLX 1.4.5e version is hopefully still downloadable from this 2009 thread on the Rybka forum.
Ok thanks Eelco!
IWB wrote:
geots wrote: Ok- my mistake. I guess Ingo is running Z Mexico 64bit.
Yes, I do run the 64-bit version. If someone would bother to read my conditions, it becomes very clear. Unfortunately you are not the only one who doesn't care and just looks at the rank, the rating and, if in a good mood, the time control. That other conditions are at least as important seems to be ignored by many ... at least according to the emails I get from my web site ...
geots wrote: ... And don't listen to anyone who says 200 games mean nothing and you need over 1000. Stats are what they are, but if under 1000 meant nothing, you could throw out about 50% or more of CCRLs ratings.
Interesting thought! But not what I meant. I meant 150 (or 200) games against ONE opponent. That IS irrelevant!
This is an excerpt from the latest Toga run:
Code:
Toga II 3.0 32b - Zappa Mexico II (2703) 63.0 - 87.0 42.00% Perf=2647
Toga II 3.0 32b - Chiron 1.5 (2845) 57.5 - 92.5 38.33% Perf=2763
That is a difference of 116 Elo* - which can't be explained by statistics! This is simply the difference that occurs because the playing style suits better or not. So:
1. You can't draw conclusions out of 150 games against ONE opponent
2. You only get a valid average result if you run against a higher number of opponents (which makes lists with many times the same opponent and just a few others doubtful).
Of course, if you are just interested in how the engine performs against one particular engine you might consider the result interesting ...
Bye
Ingo
*some engines even have a bigger performance gap than 116 Elo between opponents
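For anyone who wants to check the arithmetic behind the two Perf values, here is a minimal sketch. It assumes the standard logistic Elo formula; the rating list itself may use a slightly different conversion table, so a point or so of deviation from the published Perf numbers is to be expected. The function names are made up for this illustration.

```python
import math

def elo_diff_from_score(score_fraction):
    """Elo difference implied by a score fraction (logistic Elo model, an assumption here)."""
    return 400.0 * math.log10(score_fraction / (1.0 - score_fraction))

def performance_rating(opponent_rating, points, games):
    """Opponent rating plus the Elo difference implied by the match score."""
    return opponent_rating + elo_diff_from_score(points / games)

# The two 150-game matches from the excerpt above.
perf_vs_zappa = performance_rating(2703, 63.0, 150)
perf_vs_chiron = performance_rating(2845, 57.5, 150)
gap = perf_vs_chiron - perf_vs_zappa

print(f"Perf vs Zappa Mexico II: {perf_vs_zappa:.1f}")
print(f"Perf vs Chiron 1.5:      {perf_vs_chiron:.1f}")
print(f"Gap between the two:     {gap:.1f} Elo")
```

The gap comes out close to the 116 Elo quoted above. Note that this only reproduces the performance numbers; it says nothing yet about how much of the gap could be sampling noise.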
Uri Blass wrote: I think that it is possible to draw conclusions based on 150 games against a single opponent; the question is what those conclusions are.
A program may perform better against one opponent because of playing style, but there is a limit to it.
If I see, for example, a result like 120-30, I can be practically sure that the winner is better.
If I see, for example, a result like 90-60, I cannot be practically sure that the winner is better, but I can be practically sure that the winner is not more than 100 Elo worse than the loser.
These conclusions are based on common sense and previous experience, and I was relatively careful here.
I believe that there is not a single practical case where A scores even 70% against B in 150 games while A scores worse than B against C.
Note that C should have a similar rating to the average of A and B (not more than 100 Elo difference).
If you think that I am wrong, then I would like to see a single case including the names of the programs A, B, C so everybody can reproduce the results (with possible small differences).
Note that I do not claim that it is impossible to build programs A, B, C where it happens, only that it practically does not happen.
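To put rough numbers on the 120-30 and 90-60 examples, here is a hedged sketch of the purely statistical error bar on a 150-game match. The assumptions are mine, not from the posts above: the logistic Elo formula, a normal approximation, and every game treated as a win or a loss (draws would make the real interval narrower). It covers sampling noise only and does not model the playing-style effect under discussion. The helper names are hypothetical.

```python
import math

def elo_diff_from_score(score_fraction):
    """Elo difference implied by a score fraction (logistic Elo model, an assumption)."""
    return 400.0 * math.log10(score_fraction / (1.0 - score_fraction))

def elo_interval(points, games, z=1.96):
    """Approximate (low, mid, high) Elo difference for a match result.

    Uses a normal approximation with a win/loss-only variance, which is an
    upper bound on the per-game variance when draws occur.
    """
    p = points / games
    se = math.sqrt(p * (1.0 - p) / games)
    # Clamp so the logarithm stays defined for very lopsided results.
    p_low = max(p - z * se, 1e-9)
    p_high = min(p + z * se, 1.0 - 1e-9)
    return (elo_diff_from_score(p_low),
            elo_diff_from_score(p),
            elo_diff_from_score(p_high))

for points in (120, 90):
    low, mid, high = elo_interval(points, 150)
    print(f"{points}-{150 - points}: {mid:+.0f} Elo, "
          f"roughly {low:+.0f} to {high:+.0f} at 95%")
```

Under these assumptions the lower bound for 90-60 stays well above -100 Elo, which is in line with the cautious reading given above. Whether it also rules out a style-driven swing is a separate question that this sketch cannot answer.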
I have no idea why people are so nitpicky. Your example of 70% corresponds to about a 150 Elo difference (or 105 to 45 in a 150-game match), while in the given case we have a 50 Elo difference.