Similarity Detector Available

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

georgerifkin
Posts: 63
Joined: Fri Jun 17, 2011 4:51 pm

Re: Similarity Detector Available

Post by georgerifkin »

Laskos wrote:
georgerifkin wrote:I'm running some tests.
I'd like to ask a question: why, when I set a time of 5 ms, does the test take much more time with some engines, for example Stockfish, than with others?
The time set is always 5 ms.
I guess 5 ms is a little too short. Do you use time adjustment for strength? If not, I don't think there is much downside to using a minimum of 10-20 ms.

Kai
I wanted to test only the evaluation function of these engines.
Let's say an engine runs at 1 million nodes per second: in 100 ms it has already searched 100,000 nodes.
I didn't use time adjustment for strength; I'm testing top engines.
I'll try with 10 ms.
Thanks.
georgerifkin
Posts: 63
Joined: Fri Jun 17, 2011 4:51 pm

Re: Similarity Detector Available

Post by georgerifkin »

Adam Hair wrote:
georgerifkin wrote:I'm running some tests.
I'd like to ask a question: why, when I set a time of 5 ms, does the test take much more time with some engines, for example Stockfish, than with others?
The time set is always 5 ms.
If I remember correctly, Stockfish searches 10 plies deep before sending its best move.
Is this correct behaviour?
Shouldn't it always respect time constraints?
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Similarity Detector Available

Post by Don »

georgerifkin wrote:
Adam Hair wrote:
georgerifkin wrote:I'm running some tests.
I'd like to ask a question: why, when I set a time of 5 ms, does the test take much more time with some engines, for example Stockfish, than with others?
The time set is always 5 ms.
If I remember correctly, Stockfish searches 10 plies deep before sending its best move.
Is this correct behaviour?
Shouldn't it always respect time constraints?
The tester basically runs an infinite search and then stops the program when the time is exhausted. Fixed-time levels are not implemented in all programs, so this is a good way to handle it.
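
For the curious, the control flow is roughly this (a minimal Python sketch, not Sim's actual code; the handshake details are simplified):

Code:

import subprocess
import time

def send(proc, cmd):
    proc.stdin.write(cmd + "\n")
    proc.stdin.flush()

def timed_bestmove(engine_path, fen, ms):
    # Start a UCI engine, search "infinite", and stop it ourselves.
    proc = subprocess.Popen([engine_path], stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, text=True)
    send(proc, "uci")
    send(proc, "isready")
    while proc.stdout.readline().strip() != "readyok":
        pass                       # skip the id/option/uciok chatter
    send(proc, "position fen " + fen)
    send(proc, "go infinite")
    time.sleep(ms / 1000.0)        # the tester keeps time, not the engine
    send(proc, "stop")             # engine must answer with "bestmove ..."
    move = None
    for line in proc.stdout:
        if line.startswith("bestmove"):
            move = line.split()[1]
            break
    send(proc, "quit")
    proc.wait()
    return move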

Also, some programs have a small startup delay between moves, so if you test at millisecond levels you may get significantly better searches from some programs than from others, regardless of their relative strength.

There is no practical way to compare the evaluation functions of chess programs, for several reasons. Even at ridiculously fast searches some programs have far superior searches, and this can show up even in 1-ply searches through various quiescence tricks and such. Some programs extend threats (even in a 1-ply search), checks, etc.
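
To make that concrete, here is a toy example (no real engine's code, every number invented) where a quiescence search flips a 1-ply result:

Code:

# Static evals, always from the side-to-move's point of view.
EVALS = {"quiet": -0.2, "loose": -0.9, "after_QxQ": -8.0}
# (move, resulting position, is_capture); the opponent moves in the result.
MOVES = {
    "root":  [("safe_move", "quiet", False), ("greedy_move", "loose", False)],
    "loose": [("QxQ", "after_QxQ", True)],   # our queen hangs in "loose"
}

def qsearch(pos):
    # Negamax quiescence: stand pat, or keep capturing if it helps.
    best = EVALS[pos]
    for _, nxt, is_capture in MOVES.get(pos, []):
        if is_capture:
            best = max(best, -qsearch(nxt))
    return best

for move, nxt, _ in MOVES["root"]:
    print("%-11s 1-ply static: %+.1f   1-ply + quiescence: %+.1f"
          % (move, -EVALS[nxt], -qsearch(nxt)))
# greedy_move looks best on static eval (+0.9) but loses the queen (-8.0).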

Also, if you tune an evaluation function properly, it can be significantly weaker at (say) 3-ply searches than at realistic time controls, where 15-25 ply searches might be done.

In fact, I personally don't like to think of the "static evaluation function" as separate from the search. They are joined at the hip. Komodo's evaluation function weakly detects some trivial tactics that many programs consider a search feature. So there is the speed vs. quality tradeoff, which various programs make differently and which has no real bearing on the superiority of the programs; it's the mix that counts.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Similarity Detector Available

Post by Laskos »

georgerifkin wrote:
Laskos wrote:
georgerifkin wrote:I'm running some tests.
I'd like to ask a question: why, when I set a time of 5 ms, does the test take much more time with some engines, for example Stockfish, than with others?
The time set is always 5 ms.
I guess 5 ms is a little too short. Do you use time adjustment for strength? If not, I don't think there is much downside to using a minimum of 10-20 ms.

Kai
I wanted to test only the evaluation function of these engines.
Let's say an engine runs at 1 million nodes per second: in 100 ms it has already searched 100,000 nodes.
I didn't use time adjustment for strength; I'm testing top engines.
I'll try with 10 ms.
Thanks.
I think that engines and the sim tester handle a time of 5 ms badly, the standard Windows C clock resolution being ~15 ms. You could have one engine using 2 ms and another 13 ms instead of 5 ms, distorting the results a bit. The degree of randomness is higher at shorter times. I am curious: what is the self-similarity of Houdini at 5 ms? You will observe that some engines are more random than others, even on 1 core.
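
You can see the distortion with simple arithmetic (a sketch; 15.6 ms is the typical legacy Windows tick, the exact value varies):

Code:

TICK = 0.0156   # typical legacy Windows timer resolution, ~15.6 ms

def measured(elapsed_seconds):
    # A clock with TICK granularity only ever reports whole ticks.
    return int(elapsed_seconds / TICK) * TICK

for target in (0.005, 0.010, 0.020, 0.050):
    print("budget %.3f s -> clock reports %.4f s" % (target, measured(target)))
# A 5 ms budget reads as 0.0000 s, so an engine polling the clock
# never "sees" its time expire within the budget.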

Kai
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: Similarity Detector Available

Post by Adam Hair »

Laskos wrote:
georgerifkin wrote:
Laskos wrote:
georgerifkin wrote:I'm running some tests.
I'd like to ask a question: why, when I set a time of 5 ms, does the test take much more time with some engines, for example Stockfish, than with others?
The time set is always 5 ms.
I guess 5 ms is a little too short. Do you use time adjustment for strength? If not, I don't think there is much downside to using a minimum of 10-20 ms.

Kai
I wanted to test only the evaluation function of these engines.
Let's say an engine runs at 1 million nodes per second: in 100 ms it has already searched 100,000 nodes.
I didn't use time adjustment for strength; I'm testing top engines.
I'll try with 10 ms.
Thanks.
I think that engines and the sim tester handle a time of 5 ms badly, the standard Windows C clock resolution being ~15 ms. You could have one engine using 2 ms and another 13 ms instead of 5 ms, distorting the results a bit. The degree of randomness is higher at shorter times. I am curious: what is the self-similarity of Houdini at 5 ms? You will observe that some engines are more random than others, even on 1 core.

Kai
The randomness that you and I have observed is related to the positions the sim test uses. I replaced the positions with a new set and found Houdini's self-similarity to be approximately 97%, as opposed to 75% with the original positions.
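
For reference, self-similarity here is just the fraction of positions where two runs of the same engine, at the same settings, pick the same move. A sketch (assuming one move list per run, aligned by position):

Code:

def similarity(moves_run1, moves_run2):
    # Percentage of positions where the two runs chose the same move.
    assert len(moves_run1) == len(moves_run2)
    same = sum(1 for a, b in zip(moves_run1, moves_run2) if a == b)
    return 100.0 * same / len(moves_run1)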
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Similarity Detector Available

Post by Laskos »

Adam Hair wrote:
Laskos wrote:
georgerifkin wrote:
Laskos wrote:
georgerifkin wrote:I'm running some tests.
I'd like to ask a question: why, when I set a time of 5 ms, does the test take much more time with some engines, for example Stockfish, than with others?
The time set is always 5 ms.
I guess 5 ms is a little too short. Do you use time adjustment for strength? If not, I don't think there is much downside to using a minimum of 10-20 ms.

Kai
I wanted to test only the evaluation function of these engines.
Let's say an engine runs at 1 million nodes per second: in 100 ms it has already searched 100,000 nodes.
I didn't use time adjustment for strength; I'm testing top engines.
I'll try with 10 ms.
Thanks.
I think that engines and the sim tester handle a time of 5 ms badly, the standard Windows C clock resolution being ~15 ms. You could have one engine using 2 ms and another 13 ms instead of 5 ms, distorting the results a bit. The degree of randomness is higher at shorter times. I am curious: what is the self-similarity of Houdini at 5 ms? You will observe that some engines are more random than others, even on 1 core.

Kai
The randomness that you and I have observed is related to the positions the sim test uses. I replaced the positions with a new set and found Houdini's self-similarity to be approximately 97%, as opposed to 75% with the original positions.
And how did the general similarity between engines turn out? When Don first released Sim with 2,000 positions, they were not only too few, but some of them had pretty clear moves for almost all engines. That meant that the range between "unrelated" and "probably related" was too small, ~10%, with a standard deviation of ~2%: a bit messy. The last Sim was much better: 8,000 positions, sorted to be pretty neutral as far as move choice goes. The range from "unrelated" to "probably related" increased to ~20%, with the standard deviation decreasing to ~1% (more positions), which gave statistically significant results. When Houdini has 97% self-similarity, how does Sim generally fare? How many positions do you use? What is the typical range of the results?
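
For scale, the pure sampling error of such a percentage is sqrt(p(1-p)/n); the observed deviations are somewhat larger, presumably because engine randomness comes on top. A quick check (illustrative numbers only):

Code:

from math import sqrt

def stderr_pct(p, n):
    # Binomial standard error of a percentage, assuming independent positions.
    return 100.0 * sqrt(p * (1.0 - p) / n)

for n in (2000, 8000):
    print("n=%5d  worst-case (p=0.5) sampling error: %.2f%%"
          % (n, stderr_pct(0.5, n)))
# n= 2000 -> 1.12%, n= 8000 -> 0.56%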

Kai
georgerifkin
Posts: 63
Joined: Fri Jun 17, 2011 4:51 pm

Re: Similarity Detector Available

Post by georgerifkin »

Laskos wrote:
georgerifkin wrote:
Laskos wrote:
georgerifkin wrote:I'm running some tests.
I'd like to ask a question: why, when I set a time of 5 ms, does the test take much more time with some engines, for example Stockfish, than with others?
The time set is always 5 ms.
I guess 5 ms is a little too short. Do you use time adjustment for strength? If not, I don't think there is much downside to using a minimum of 10-20 ms.

Kai
I wanted to test only the evaluation function of these engines.
Let's say an engine runs at 1 million nodes per second: in 100 ms it has already searched 100,000 nodes.
I didn't use time adjustment for strength; I'm testing top engines.
I'll try with 10 ms.
Thanks.
I think that engines and the sim tester handle a time of 5 ms badly, the standard Windows C clock resolution being ~15 ms. You could have one engine using 2 ms and another 13 ms instead of 5 ms, distorting the results a bit. The degree of randomness is higher at shorter times. I am curious: what is the self-similarity of Houdini at 5 ms? You will observe that some engines are more random than others, even on 1 core.

Kai
I didn't test for self-similarity.
I've decided to use 20 ms. I'll test the top 10 engines that I own or that are free, and post the matrix generated by the program here.
georgerifkin
Posts: 63
Joined: Fri Jun 17, 2011 4:51 pm

Re: Similarity Detector Available

Post by georgerifkin »

Don wrote:
georgerifkin wrote:
Adam Hair wrote:
georgerifkin wrote:I'm running some tests.
I'd like to ask a question: why, when I set a time of 5 ms, does the test take much more time with some engines, for example Stockfish, than with others?
The time set is always 5 ms.
If I remember correctly, Stockfish searches 10 plies deep before sending its best move.
Is this correct behaviour?
Shouldn't it always respect time constraints?
The tester basically runs an infinite search and then stops the program when the time is exhausted. Fixed-time levels are not implemented in all programs, so this is a good way to handle it.

Also, some programs have a small startup delay between moves, so if you test at millisecond levels you may get significantly better searches from some programs than from others, regardless of their relative strength.

There is no practical way to compare the evaluation functions of chess programs, for several reasons. Even at ridiculously fast searches some programs have far superior searches, and this can show up even in 1-ply searches through various quiescence tricks and such. Some programs extend threats (even in a 1-ply search), checks, etc.

Also, if you tune an evaluation function properly, it can be significantly weaker at (say) 3-ply searches than at realistic time controls, where 15-25 ply searches might be done.

In fact, I personally don't like to think of the "static evaluation function" as separate from the search. They are joined at the hip. Komodo's evaluation function weakly detects some trivial tactics that many programs consider a search feature. So there is the speed vs. quality tradeoff, which various programs make differently and which has no real bearing on the superiority of the programs; it's the mix that counts.
Thank you, Mr. Dailey, for the explanation.
I didn't know that not all programs support fixed-time searches.
Perhaps a command could be added to the UCI protocol to make a program show the value of its static evaluation of a position, without any search.
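
(Some engines already accept a non-standard "eval" console command for exactly this. A rough sketch of querying it, assuming the engine supports the command; the output format is engine-specific:)

Code:

import subprocess

def static_eval(engine_path, fen):
    # Feed the position, ask for the static eval, and dump whatever
    # engine-specific text comes back.
    proc = subprocess.Popen([engine_path], stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, text=True)
    out, _ = proc.communicate("position fen %s\neval\nquit\n" % fen)
    return out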
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Similarity Detector Available

Post by Laskos »

georgerifkin wrote:
Laskos wrote:
georgerifkin wrote:
Laskos wrote:
georgerifkin wrote:I'm running some tests.
I'd like to ask a question: why, when I set a time of 5 ms, does the test take much more time with some engines, for example Stockfish, than with others?
The time set is always 5 ms.
I guess 5 ms is a little too short. Do you use time adjustment for strength? If not, I don't think there is much downside to using a minimum of 10-20 ms.

Kai
I wanted to test only the evaluation function of these engines.
Let's say an engine runs at 1 million nodes per second: in 100 ms it has already searched 100,000 nodes.
I didn't use time adjustment for strength; I'm testing top engines.
I'll try with 10 ms.
Thanks.
I think that engines and the sim tester handle a time of 5 ms badly, the standard Windows C clock resolution being ~15 ms. You could have one engine using 2 ms and another 13 ms instead of 5 ms, distorting the results a bit. The degree of randomness is higher at shorter times. I am curious: what is the self-similarity of Houdini at 5 ms? You will observe that some engines are more random than others, even on 1 core.

Kai
I didn't test for self-similarity.
I've decided to use 20 ms. I'll test the top 10 engines that I own or that are free, and post the matrix generated by the program here.
Good. Do you own Houdini 2.0 or Chiron 1.0? I am curious about those, especially Chiron. If you post the matrix here, I could feed it to SPSS and post the dendrogram.
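
(The same dendrogram can be built without SPSS; a sketch with scipy, where the engine names and similarity values are made up for illustration:)

Code:

import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import squareform
import matplotlib.pyplot as plt

names = ["EngineA", "EngineB", "EngineC"]   # hypothetical engines
sim = np.array([[100.0, 58.0, 45.0],        # similarity in percent
                [ 58.0, 100.0, 47.0],
                [ 45.0,  47.0, 100.0]])

dist = 100.0 - sim                          # similarity -> distance
np.fill_diagonal(dist, 0.0)                 # squareform wants a zero diagonal
links = linkage(squareform(dist), method="average")
dendrogram(links, labels=names)
plt.show()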

Kai
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: Similarity Detector Available

Post by Adam Hair »

Laskos wrote:
Adam Hair wrote:
Laskos wrote:
georgerifkin wrote:
Laskos wrote:
georgerifkin wrote:I'm running some tests.
I'd like to ask a question: why, when I set a time of 5 ms, does the test take much more time with some engines, for example Stockfish, than with others?
The time set is always 5 ms.
I guess 5 ms is a little too short. Do you use time adjustment for strength? If not, I don't think there is much downside to using a minimum of 10-20 ms.

Kai
I wanted to test only the evaluation function of these engines.
Let's say an engine runs at 1 million nodes per second: in 100 ms it has already searched 100,000 nodes.
I didn't use time adjustment for strength; I'm testing top engines.
I'll try with 10 ms.
Thanks.
I think that engines and the sim tester handle a time of 5 ms badly, the standard Windows C clock resolution being ~15 ms. You could have one engine using 2 ms and another 13 ms instead of 5 ms, distorting the results a bit. The degree of randomness is higher at shorter times. I am curious: what is the self-similarity of Houdini at 5 ms? You will observe that some engines are more random than others, even on 1 core.

Kai
The randomness that you and I have observed is related to the positions the sim test uses. I replaced the positions with a new set and found Houdini's self-similarity to be approximately 97%, as opposed to 75% with the original positions.
And how did the general similarity between engines turn out? When Don first released Sim with 2,000 positions, they were not only too few, but some of them had pretty clear moves for almost all engines. That meant that the range between "unrelated" and "probably related" was too small, ~10%, with a standard deviation of ~2%: a bit messy. The last Sim was much better: 8,000 positions, sorted to be pretty neutral as far as move choice goes. The range from "unrelated" to "probably related" increased to ~20%, with the standard deviation decreasing to ~1% (more positions), which gave statistically significant results. When Houdini has 97% self-similarity, how does Sim generally fare? How many positions do you use? What is the typical range of the results?

Kai
Hi Kai,

Sorry for the late response. I have not finished with the new set of positions; I will pick it back up at some point. I ran a quick test with Houdini to see if the low self-similarity still occurred, and it did not. I also ran Stockfish 2.0 with it. The similarity between Houdini and Stockfish was ~59%; with the old set it was ~50%, indicating that some culling is needed. I think there are ~16,000 positions in the new set. Anyway, I have to work on it some more.
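
One plausible culling rule (my guess at what is needed, thresholds invented): drop positions where nearly every engine picks the same move, since those inflate the similarity of unrelated engines:

Code:

from collections import Counter

def cull(positions, moves_by_engine, max_agreement=0.9):
    # moves_by_engine: one move list per engine, aligned with positions.
    kept = []
    for i, pos in enumerate(positions):
        votes = Counter(moves[i] for moves in moves_by_engine)
        top_share = votes.most_common(1)[0][1] / len(moves_by_engine)
        if top_share < max_agreement:   # keep only discriminating positions
            kept.append(pos)
    return kept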

Adam