TalkChess.com

Posted: **Sat May 26, 2007 4:07 pm**

It does not really matter if the individual scores for black or white are negative, and thus rounded towards -infinity rather than towards zero. As long as you do the same to both, the result will be symmetric.
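Here is a minimal C sketch of this point. It assumes the white and black terms are computed separately, and that >> on a negative int is an arithmetic shift. That behaviour is implementation-defined in C, but it is what mainstream compilers do. The function names are hypothetical.

Shifting the difference rounds the combined score toward -infinity, so swapping colours can change its magnitude. Shifting each side's term first rounds both terms the same way, so swapping colours exactly negates the result.

```c
/* Scale the white-relative difference: rounding happens on the combined
   value, so swapping colours can change the magnitude
   (14 -> 1 but -14 -> -2). */
int scale_difference(int white_term, int black_term)
{
    return (white_term - black_term) >> 3;
}

/* Scale each side's term separately, then subtract. Both terms are rounded
   the same way, so swapping colours exactly negates the result, even when
   an individual term is negative. */
int scale_each_side(int white_term, int black_term)
{
    return (white_term >> 3) - (black_term >> 3);
}
```

Each side's term may itself be negative (a penalty, say). The symmetry comes from treating both terms identically, not from their sign.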

Posted: **Sat May 26, 2007 7:08 pm**

Gerd Isenberg wrote:
bob wrote: I don't think so. I am searching about 1.7M nps on my Core 2 Duo at 2.0 GHz. I guess I could use the hardware counters to count the number of instructions, but it has to be way more than 1000 instructions per node. The last time I used the hardware counters it was around 2700 instructions per node, but that was probably 8-9 years ago, and the new version has slowed down as we have added some eval stuff...

I looked at an old run where I divided by a variable (when I had variable scaling for endgame terms) and there was no difference in NPS....

I've seen others do the same thing with the hash signature, taking it modulo the hash table size to get the table index. They apparently didn't feel it made enough of a difference compared with using an AND and limiting the table size to a power of 2.
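For reference, here are the two ways of deriving a table index, sketched in C with hypothetical function names. The modulo version works for any table size but compiles to a division. The mask version is a single AND, and it gives the same index only when the size is a power of 2.

```c
#include <stdint.h>

/* Index for a table of any size: needs an integer division. */
uint64_t index_by_modulo(uint64_t key, uint64_t table_size)
{
    return key % table_size;
}

/* Index for a power-of-2 table: one AND with size - 1. */
uint64_t index_by_mask(uint64_t key, uint64_t table_size)
{
    return key & (table_size - 1);
}

/* Returns 1 if n is a nonzero power of 2, i.e. the mask version is valid. */
int is_power_of_two(uint64_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}
```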
1.7M nodes per second at 2 GHz on a Core 2 Duo: ah yes, two processors. So I was off by at least a factor of two with my guess of 1000 cycles per node, sorry.

1E9 ns / 1.7E6 = 588 ns per node = 1176 cycles per node at 2 GHz. On two cores, that is about 2352 cycles per node per processor.
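Here is the same arithmetic as a small C sketch with hypothetical helper names. It assumes the nps figure is the total over all cores and that nodes are shared evenly between the cores.

```c
/* Wall-clock nanoseconds per node for the whole search. */
double ns_per_node(double nps)
{
    return 1e9 / nps;
}

/* Clock cycles each core spends per node: each core handles nps / cores
   nodes per second, so it has cores * ns_per_node ns, times ghz cycles/ns. */
double cycles_per_node_per_core(double nps, double ghz, int cores)
{
    return ns_per_node(nps) * cores * ghz;
}
```

With one core, as Bob clarifies below, 1.7M nps at 2 GHz comes to about 1176 cycles per node. 2M nps gives 500 ns, or 1000 cycles, per node.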

One idiv latency is still 1-2% of that. However, that latency is probably hidden behind some memory stalls or the like, and/or partly overlaps with the surrounding instructions on a Core 2 Duo. Also, you may not divide at every node you count. Empirical evidence is always right.

Actually my 1.7M was using one core.

And no, I don't divide at every node, but I do divide at most of 'em (at every call to Evaluate() but not at non-qsearch internal nodes).

However, if you round my NPS up to 2M, that is 1 usec/node, but on the Core 2, that is 1000 nsec/node, which is certainly well beyond 1000 instructions/node. On this box (4 MB L2) all instructions and most of the data fit into L2 (not counting hash tables, of course). I'm going to stick in a bogus divide of two int values (global values so the compiler can't optimize them away) and see exactly what happens...

Posted: **Sat May 26, 2007 7:25 pm**

OK, here are the results. I added an x = y / z; type operation at the very top of Evaluate(). The values were all global, and y and z are changed so that the compiler can't play any hanky-panky with getting rid of them and replacing the divide with a constant. I also use the value elsewhere, so it has to be computed correctly.

log.001: time=6.75 mat=0 n=12424532 fh=93% nps=1.8M
log.002: time=6.74 mat=0 n=12424532 fh=93% nps=1.8M

So I see absolutely no difference in speed at all...

Bob

BTW, both were compiled using Intel's 64-bit compiler, with PGO. The first run is without the divide, the second is with it.
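Here is a rough C sketch of this kind of experiment. It is not Bob's actual code: the globals, the stand-in work function and the clock()-based timer are all hypothetical. The globals are declared volatile, which is one way to make sure the compiler really executes the divide on every call. Bob instead relied on changing y and z and using x elsewhere.

```c
#include <time.h>

/* Hypothetical globals standing in for Bob's y, z and x. volatile forces
   every read and write to happen, so the compiler cannot replace the
   divide with a constant or drop it. */
volatile int g_y = 1000003;
volatile int g_z = 7;
volatile int g_x;

/* Hypothetical stand-in for the per-node work of Evaluate(). */
unsigned fake_evaluate(unsigned node)
{
    return (node * 2654435761u) ^ (node >> 7);
}

/* The same work with a bogus divide added at the very top. */
unsigned fake_evaluate_with_divide(unsigned node)
{
    g_x = g_y / g_z;        /* the integer divide being measured */
    g_y = g_y + 1;          /* vary the dividend ... */
    g_z = (g_z & 15) + 1;   /* ... and the divisor, kept in 1..16 */
    return (node * 2654435761u) ^ (node >> 7);
}

/* Calls fn for nodes 0..n-1 and returns the CPU time in seconds.
   The checksum goes to *sink so the loop cannot be optimized away. */
double time_calls(unsigned (*fn)(unsigned), unsigned n, unsigned long *sink)
{
    clock_t start = clock();
    unsigned long sum = 0;
    for (unsigned i = 0; i < n; i++)
        sum += fn(i);
    *sink = sum;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Both versions return identical values, so any difference in the measured times comes from the divide alone. This is the same with/without comparison as the two log lines.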

Posted: **Sat May 26, 2007 8:23 pm**

bob wrote: Actually my 1.7M was using one core.

And no, I don't divide at every node, but I do divide at most of 'em (at every call to Evaluate() but not at non-qsearch internal nodes).

However, if you round my NPS up to 2M, that is 1 usec/node, but on the Core 2, that is 1000 nsec/node, which is certainly well beyond 1000 instructions/node. On this box (4 MB L2) all instructions and most of the data fit into L2 (not counting hash tables, of course). I'm going to stick in a bogus divide of two int values (global values so the compiler can't optimize them away) and see exactly what happens...

Hmm, I don't get your math. With 2M nps I calculate 500 nsec/node, or 1000 cycles/node on a 2 GHz machine, where one cycle takes 0.5 nsec. The number of instructions per node depends on your average IPC (instructions per cycle). Using idiv tends to slow it down.

I still have trouble understanding why such an expensive instruction doesn't show up, but there's no need to inspect the assembly.
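Gerd's point about IPC can be written out as a small C sketch with hypothetical names. Cycles per node is the clock rate divided by the node rate, and instructions per node is that figure times the average instructions per cycle. At 2M nps and 2 GHz, "well beyond 1000 instructions/node" therefore implies an average IPC above 1.

```c
/* Cycles available per node on one core: clock rate over nodes per second. */
double cycles_per_node(double nps, double ghz)
{
    return ghz * 1e9 / nps;
}

/* Instructions executed per node for a given average IPC. */
double instructions_per_node(double nps, double ghz, double ipc)
{
    return cycles_per_node(nps, ghz) * ipc;
}
```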

Posted: **Sat May 26, 2007 9:27 pm**

Uri Blass wrote: I found that my evaluation is not symmetric for white and black.

One of the reasons is that in one part of my evaluation I simply calculate the score from white's point of view and later divide by 8.

Now (14>>3)=1 and (-14>>3)=-2, so I can get a different number for white and black.

My question is how you solve that type of problem, and whether there is a way to solve it without making the code slower. Of course it is easy to solve by making the code slower and writing something like:
if (score<0)
{
score=-score;
score=score>>3;
score=-score;
}
else
score=score>>3;

I also thought about using (score+4)>>3, but in that case (12+4)>>3=2 while (-12+4)>>3=-1.

Uri

Uri, which medicines are you using nowadays?

I really have no clue what you are talking about.
Can you explain what the problem is?

Thanks,
Vincent

Posted: **Sat May 26, 2007 10:20 pm**

diep wrote: I really have no clue what you are talking about.
Can you explain what the problem is?
Thanks,
Vincent

Obviously Uri scales white-relative eval terms, where the white and black values are already sign-flipped. Whether this is a good idea is another question, but it is no reason to make disparaging statements.

The problem, as you are probably aware, is arithmetic shift right versus idiv:

Code:

if ( x & 7 )
   abs(-x >> 3) != abs(x >> 3)
else
   abs(-x >> 3) == abs(x >> 3)
but 
   abs(-x / 8) == abs(x / 8)
is always true.
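Gerd's three statements can be checked exhaustively over a range with a small C function. This is a sketch with a hypothetical function name. It assumes that >> on a negative int is an arithmetic shift (implementation-defined in C) and that INT_MIN stays out of the range, because negating it overflows.

```c
#include <stdlib.h>

/* Returns 1 if all three statements hold for every x in [lo, hi]. */
int check_shift_vs_divide(int lo, int hi)
{
    for (int x = lo; x <= hi; x++) {
        /* unary minus binds tighter than >>, so -x >> 3 is (-x) >> 3 */
        int shifts_agree = abs(-x >> 3) == abs(x >> 3);
        if ((x & 7) != 0 && shifts_agree)
            return 0;   /* expected the magnitudes to differ */
        if ((x & 7) == 0 && !shifts_agree)
            return 0;   /* expected the magnitudes to match */
        if (abs(-x / 8) != abs(x / 8))
            return 0;   /* division truncates toward zero, so it is symmetric */
    }
    return 1;
}
```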

Posted: **Sat May 26, 2007 11:29 pm**

diep wrote:
Uri Blass wrote: I found that my evaluation is not symmetric for white and black.

One of the reasons is that in one part of my evaluation I simply calculate the score from white's point of view and later divide by 8.

Now (14>>3)=1 and (-14>>3)=-2, so I can get a different number for white and black.

My question is how you solve that type of problem, and whether there is a way to solve it without making the code slower. Of course it is easy to solve by making the code slower and writing something like:
if (score<0)
{
score=-score;
score=score>>3;
score=-score;
}
else
score=score>>3;

I also thought about using (score+4)>>3, but in that case (12+4)>>3=2 while (-12+4)>>3=-1.

Uri
Uri, which medicines are you using nowadays?

I really have no clue what you are talking about.
Can you explain what the problem is?

Thanks,
Vincent

See post number 4 on page 1.
It was about some code I have for preferring pawns in the endgame.

The problem was already solved by Bob Hyatt's post, where he suggested using / instead of >>.

Uri
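Here is a C sketch of the fix Uri settled on, together with a branchless variant of his if/else. It assumes 32-bit int32_t values and an arithmetic >> on negative values (implementation-defined in C). The function names are hypothetical.

Since C99, / truncates toward zero, so x / 8 is symmetric. Optimizing compilers typically turn a signed divide by the constant 8 into a short add-and-shift sequence, much like shift3_trunc, rather than an idiv. The change therefore need not cost speed.

```c
#include <stdint.h>

/* Plain arithmetic shift: floor(x / 8), so 14 -> 1 but -14 -> -2. */
int32_t shift3(int32_t x)
{
    return x >> 3;
}

/* Uri's branch version: shift the magnitude, then restore the sign.
   (Not valid for INT32_MIN, whose negation overflows.) */
int32_t shift3_branch(int32_t x)
{
    if (x < 0)
        return -((-x) >> 3);
    return x >> 3;
}

/* Branchless truncation toward zero: x >> 31 is -1 for negative x and 0
   otherwise, so 7 is added only to negative values before shifting. */
int32_t shift3_trunc(int32_t x)
{
    return (x + ((x >> 31) & 7)) >> 3;
}

/* Plain division by 8, truncating toward zero. */
int32_t div8(int32_t x)
{
    return x / 8;
}
```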

Posted: **Sun May 27, 2007 4:48 pm**

Gerd Isenberg wrote:
bob wrote: Actually my 1.7M was using one core.

And no, I don't divide at every node, but I do divide at most of 'em (at every call to Evaluate() but not at non-qsearch internal nodes).

However, if you round my NPS up to 2M, that is 1 usec/node, but on the Core 2, that is 1000 nsec/node, which is certainly well beyond 1000 instructions/node. On this box (4 MB L2) all instructions and most of the data fit into L2 (not counting hash tables, of course). I'm going to stick in a bogus divide of two int values (global values so the compiler can't optimize them away) and see exactly what happens...

Hmm, I don't get your math. With 2M nps I calculate 500 nsec/node, or 1000 cycles/node on a 2 GHz machine, where one cycle takes 0.5 nsec. The number of instructions per node depends on your average IPC (instructions per cycle). Using idiv tends to slow it down.

I still have trouble understanding why such an expensive instruction doesn't show up, but there's no need to inspect the assembly.

Total idiocy is all I can claim.

500 ns per node is right.

question about symmertic evaluation
