IWB wrote:
Are you using books or starting positions for that?
Both.
THAT is the biggest disadvantage, as the test is not repeatable then! Engine A vs B is different from Engine A vs C! Whether the RATING is different (especially with enough games) is a completely different question.
Hi Ingo!
Even if you always use one and the same testset you
will never get the same result(s).
Do the following:
play a match Engine A vs Engine B with your testset.
After that reboot the machine and do exactly the same again.
I predict:
a.) you will not get the same result (+- 5%)
b.) you will not get the same games (probably more than 10%-15% different games ?)
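One way to put numbers on both predictions is to compare the two runs game by game. The sketch below is only an illustration: it assumes each run is available as a list of (moves, result) pairs in the same pairing order, and the names `Game` and `compare_runs` are made up for this example and do not belong to any GUI or tool.

```python
from typing import List, Tuple

# Hypothetical representation of one game:
# (moves in the order played, result string such as "1-0").
Game = Tuple[List[str], str]

def compare_runs(run1: List[Game], run2: List[Game]) -> Tuple[float, float]:
    """Return (fraction of games whose moves differ, fraction whose result differs)."""
    if len(run1) != len(run2) or not run1:
        raise ValueError("both runs must contain the same, non-zero number of games")
    # Games are compared pairwise, so both runs must list the pairings in the same order.
    diff_moves = sum(1 for (m1, _), (m2, _) in zip(run1, run2) if m1 != m2)
    diff_results = sum(1 for (_, r1), (_, r2) in zip(run1, run2) if r1 != r2)
    n = len(run1)
    return diff_moves / n, diff_results / n

# Tiny invented example: the second game deviates on move 3 and ends differently.
run_a = [(["e4", "e5", "Nf3"], "1-0"), (["d4", "d5", "c4"], "1/2-1/2")]
run_b = [(["e4", "e5", "Nf3"], "1-0"), (["d4", "d5", "Nf3"], "0-1")]
moves_changed, results_changed = compare_runs(run_a, run_b)
print(moves_changed, results_changed)  # 0.5 0.5
```

A game counts as "different" here as soon as a single move deviates, which is the strictest possible reading of "same games".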
ThatsIt wrote: you will not get the same games (probably more than 10%-15% different games ?)
Hi Gerhard,
What do you mean by same games?
Obviously at some point (15th move?, 20th move?...) the move will be different, even with 1 thread, mainly because the TC cannot work twice exactly the same (small timing differences)...
That's the point, Ernest.
And sometimes (4-5%, maybe?) the result will be different.
Keep in mind, Ingo wrote:
"THAT is the biggest disadvantage, as the test is not repeatable then!
Engine A vs B is different from Engine A vs C! Whether the RATING is
different (especially with enough games) is a completely different question."
The main difference is that in one case the engines "decide" to play something different, while in the other case YOU do that. Even if the result might be similar, it is a conceptual mistake, as YOU should interfere with the result as little as possible!
ThatsIt wrote:
Even if you always use one and the same testset you
will never get the same result(s).
Not the same, but when playing many games, nearly the same (with my 2000+ games, for sure much less than 5% difference).
ThatsIt wrote:
Do the following:
play a match Engine A vs Engine B with your testset.
After that reboot the machine and do exactly the same again.
I predict:
a.) you will not get the same result (+- 5%)
Hmm, it was around 5% with 100 games; my guess (not checked) is that I am below 5% with 150 now.
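For a rough feel for these numbers: if the games are treated as independent, each game's score (1, 0.5 or 0) has a standard deviation of at most 0.5, so the one-sigma scatter of a match score is at most 0.5/sqrt(N). The sketch below only evaluates this worst-case bound. The function name is made up here, and real matches with many draws will scatter less.

```python
import math

def max_score_std_error(num_games: int) -> float:
    """Worst-case one-sigma error of the match score fraction.

    Assumes independent games; the bound 0.5 per game is reached only
    with no draws and a 50% expected score.
    """
    if num_games <= 0:
        raise ValueError("num_games must be positive")
    return 0.5 / math.sqrt(num_games)

for n in (100, 150, 2000):
    print(f"{n:5d} games: at most +-{100 * max_score_std_error(n):.1f}% (one sigma)")
```

Under these assumptions the bound is about 5.0% for 100 games, 4.1% for 150 games and 1.1% for 2000 games.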
ThatsIt wrote:
b.) you will not get the same games (probably more than 10%-15% different games ?)
Actually I guess that I will get MUCH higher rates of different games!
But that is not the point. If you want structured testing, you have to have conditions which are repeatable (as far as possible; we all have to compromise sometimes, but openings are not necessary and do not belong to that compromise). If the engines decide to do something different, fine. If you "make" it different, that is a wrong concept of testing!
IWB wrote:
[...snip...]
But that is not the point. If you want structured testing, you have to have conditions which are repeatable (as far as possible; we all have to compromise sometimes, but openings are not necessary and do not belong to that compromise). If the engines decide to do something different, fine. If you "make" it different, that is a wrong concept of testing!
I do not agree, but that's unimportant. We're talking about +- 5 points!
Best wishes,
G.S.
There is nothing to "agree or not" about, unless you deny the basic principles of scientific work!
However, you are right about the result of +/- 5 Elo.
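To relate Elo points to score percentages, the usual logistic Elo model can be used, in which a rating difference d gives an expected score of 1 / (1 + 10^(-d/400)). The sketch below is a generic illustration of that formula, not the method of any particular rating list, and the function names are invented for this example.

```python
import math

def expected_score(elo_diff: float) -> float:
    """Expected score fraction for a given rating difference (logistic Elo model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def elo_diff_from_score(score: float) -> float:
    """Inverse of expected_score: rating difference implied by a score fraction."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))

# What a 5 Elo difference means in terms of score percentage.
print(f"5 Elo -> {100 * expected_score(5.0):.1f}% expected score")
```

Under this model a difference of 5 Elo corresponds to an expected score of about 50.7%.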