question about symmetric evaluation

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw, Ras

hgm
Posts: 28268
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: question about symmetric evaluation

Post by hgm »

It does not really matter if the individual scores for black or white are negative, and thus rounded towards -infinity rather than towards zero. As long as you do the same to both, the result will be symmetric.
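A minimal sketch of one reading of this point: each side's term is scaled on its own with the same rounding rule, and only then are the two subtracted. The names `floor_div8` and `scaled_eval` are hypothetical, and the floor division is written out explicitly so the sketch does not depend on how a compiler implements `>>` on negative values.

```c
/* Divide by 8, rounding toward -infinity (what an arithmetic >> 3 does).
 * Hypothetical helper, written without shifting a negative value. */
static int floor_div8(int x)
{
    return x >= 0 ? x / 8 : -((-x + 7) / 8);
}

/* Hypothetical evaluation fragment: both sides get the same rounding,
 * so swapping the two terms (mirroring the position) negates the result. */
static int scaled_eval(int white_term, int black_term)
{
    return floor_div8(white_term) - floor_div8(black_term);
}
```

Swapping the arguments gives exactly the negated score, even when one of the terms is negative and rounds toward -infinity.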
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: question about symmetric evaluation

Post by bob »

Gerd Isenberg wrote:
bob wrote:I don't think so. I am searching about 1.7M nps on my core-2 duo at 2.0ghz. I guess I could use the hardware counters to count the number of instructions, but it has to be way more than 1000 instructions per node. Last time I did use hardware it was around 2700 instructions per node but that was probably 8-9 years ago and the new version has slowed down as we have added some eval stuff...

I looked at an old run where I divided by a variable (when I had variable scaling for endgame terms) and there was no difference in NPS....

I've seen others doing the same thing with the hash signature where it is modulo the hash table size to get the table index, and they apparently didn't feel it made enough difference compared with the AND and power-of-2 size limitation.
1.7MNode per second and 2GHz on a core-2 duo - ahh yes, two processors. So I was at least factor two wrong with my 1000 cycles per node guess - sorry.

1E9 ns / 1.7E6 = 588 ns per node = 1176 cycles per node on two cores, which is about 2352 cycles per node and processor.

One idiv latency is still 1-2% of that, but probably this latency hides some memory stalls or whatever and/or schedules partly with some other instructions around on a core 2 duo. Also, you may not divide at every node you count. Empirical evidence is always right.
Actually my 1.7M was using one core.

And no, I don't divide at every node, but I do divide at most of 'em (at every call to Evaluate() but not at non-qsearch internal nodes).

However, if you round my NPS up to 2M, that is 1usec/node, but in the core2, that is 1000nsec/node, which is certainly something well beyond 1000 instructions/node. On this box (4M L2) all instructions and most of the data fits into L2 (not counting hash tables of course). I'm going to stick a bogus divide of two int values (global values so the compiler can't optimize them away) and see what happens exactly...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: question about symmetric evaluation

Post by bob »

OK, here are the results. I added an x = y / z; type operation at the very top of Evaluate(). The values were all global, and y and z are changed so that the compiler can't play any hanky-panky by getting rid of them and replacing the divide with a constant. I also use the value elsewhere so that it has to change correctly.

log.001: time=6.75 mat=0 n=12424532 fh=93% nps=1.8M
log.002: time=6.74 mat=0 n=12424532 fh=93% nps=1.8M

So I see absolutely no difference in speed at all...

Bob

BTW, both were compiled using Intel's 64-bit compiler, with PGO. The first run is without the divide, the second is with.
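A rough sketch of what such a bogus divide could look like. This is not the code from the test above: the names `bench_x`, `bench_y`, `bench_z` and `bogus_divide` are hypothetical, and `volatile` is used here as an assumption to force a real divide on every call, whereas the post only says the values were global and changed elsewhere.

```c
/* Hypothetical globals. volatile forces the compiler to reload the
 * operands and store the result on every call. */
volatile int bench_y = 1000;
volatile int bench_z = 7;
volatile int bench_x;

/* Would be called at the top of the evaluation function. */
static int bogus_divide(void)
{
    bench_x = bench_y / bench_z;   /* runtime divide, cannot be folded to a constant */
    bench_y = bench_y + 1;         /* change an operand so the result keeps changing */
    return bench_x;
}
```

Comparing NPS with and without the call, on the same position and search depth, is then the whole experiment.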
Gerd Isenberg
Posts: 2251
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: question about symmetric evaluation

Post by Gerd Isenberg »

bob wrote: Actually my 1.7M was using one core.

And no, I don't divide at every node, but I do divide at most of 'em (at every call to Evaluate() but not at non-qsearch internal nodes).

However, if you round my NPS up to 2M, that is 1usec/node, but in the core2, that is 1000nsec/node, which is certainly something well beyond 1000 instructions/node. On this box (4M L2) all instructions and most of the data fits into L2 (not counting hash tables of course). I'm going to stick a bogus divide of two int values (global values so the compiler can't optimize them away) and see what happens exactly...
Hmm, I don't get your math. With 2M nps I calculate 500 nsec/node, or 1000 cycles/node on a 2 GHz machine, where one cycle takes 0.5 nsec. The number of instructions per node depends on your average IPC. Using idiv tends to slow it down ;-)

I still have trouble understanding why such an expensive instruction doesn't show up, but there is no need to inspect the assembly.
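The arithmetic can be checked with a few lines of C. The helper names `ns_per_node` and `cycles_per_node` are made up for this sketch; the inputs are the figures quoted in the thread (2M nps rounded, 1.7M nps measured, 2 GHz clock, one core).

```c
/* Time spent per node in nanoseconds, given nodes per second. */
static double ns_per_node(double nps)
{
    return 1e9 / nps;
}

/* Clock cycles spent per node on one core, given the clock rate in Hz. */
static double cycles_per_node(double nps, double clock_hz)
{
    return clock_hz / nps;
}
```

With 2M nps at 2 GHz this gives 500 ns and 1000 cycles per node; with the measured 1.7M nps it is about 588 ns and about 1176 cycles per node.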
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: question about symmetric evaluation

Post by diep »

Uri Blass wrote: I found that my evaluation is not symmetric for white and black.

One of the reasons is that in some part of my evaluation I simply calculate the score from white's point of view and later divide by 8.

Now (14>>3)=1 and (-14>>3)=-2, so I can get a different number for white and black.

My question is how you solve that type of problem, and whether there is a way to solve it without making the code slower. Of course it is easy to solve it by making the code slower and writing something like:
if (score<0)
{
    score=-score;
    score=score>>3;
    score=-score;
}
else
    score=score>>3;

I also thought about using (score+4)>>3, but in that case (12+4)>>3=2 while (-12+4)>>3=-1.

Uri
Uri, which medicines are you using nowadays?

I really have no clue what you are talking about.
Can you explain what the problem is?

Thanks,
Vincent
Gerd Isenberg
Posts: 2251
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: question about symmetric evaluation

Post by Gerd Isenberg »

diep wrote: I really have no clue what you are talking about.
Can you explain what the problem is?
Thanks,
Vincent
Obviously Uri scales white-relative eval aspects, where white and black values are already sign-flipped. Whether this is a good idea is another question, but that is no reason to make disparaging remarks.

The problem, as you are probably aware, is arithmetic shift right versus idiv:

Code:

if ( x & 7 )
   abs(-x >> 3) != abs(x >> 3)
else
   abs(-x >> 3) == abs(x >> 3)
but
   abs(-x / 3) == abs(x / 3)
is always true.
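The difference can be spelled out in a short C sketch. Two assumptions to note: `>>` on a negative int is implementation-defined in C, and this sketch assumes the arithmetic shift that mainstream x86 compilers use; the function names are hypothetical. The third function shows a branch-free way to get the truncating result with a shift, which is roughly what compilers generate for a division by a constant 8.

```c
#include <limits.h>

/* Arithmetic shift: rounds toward -infinity
 * (assumes the compiler shifts negative ints arithmetically). */
static int shift_div8(int x)
{
    return x >> 3;
}

/* Integer division: truncates toward zero, so it is symmetric in sign. */
static int trunc_div8(int x)
{
    return x / 8;
}

/* Same result as x / 8 without a divide: add a bias of 7 to negative
 * inputs before shifting. sign_mask is -1 if x < 0, else 0. */
static int biased_shift_div8(int x)
{
    int sign_mask = x >> (sizeof(int) * CHAR_BIT - 1);
    return (x + (sign_mask & 7)) >> 3;
}
```

For x = 14 all three give 1; for x = -14 the plain shift gives -2 while the other two give -1.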
Uri Blass
Posts: 10661
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: question about symmetric evaluation

Post by Uri Blass »

diep wrote:
Uri Blass wrote: I found that my evaluation is not symmetric for white and black.

One of the reasons is that in some part of my evaluation I simply calculate the score from white's point of view and later divide by 8.

Now (14>>3)=1 and (-14>>3)=-2, so I can get a different number for white and black.

My question is how you solve that type of problem, and whether there is a way to solve it without making the code slower. Of course it is easy to solve it by making the code slower and writing something like:
if (score<0)
{
    score=-score;
    score=score>>3;
    score=-score;
}
else
    score=score>>3;

I also thought about using (score+4)>>3, but in that case (12+4)>>3=2 while (-12+4)>>3=-1.

Uri
Uri, which medicines are you using nowadays?

I really have no clue what you are talking about.
Can you explain what the problem is?

Thanks,
Vincent
See post number 4 on page 1.
It was about some code I have that prefers pawns in the endgame.

The problem was already solved after Bob Hyatt's post, in which he suggested using / instead of >>.

Uri
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: question about symmetric evaluation

Post by bob »

Gerd Isenberg wrote:
bob wrote: Actually my 1.7M was using one core.

And no, I don't divide at every node, but I do divide at most of 'em (at every call to Evaluate() but not at non-qsearch internal nodes).

However, if you round my NPS up to 2M, that is 1usec/node, but in the core2, that is 1000nsec/node, which is certainly something well beyond 1000 instructions/node. On this box (4M L2) all instructions and most of the data fits into L2 (not counting hash tables of course). I'm going to stick a bogus divide of two int values (global values so the compiler can't optimize them away) and see what happens exactly...
Hmm, I don't get your math. With 2M nps I calculate 500 nsec/node, or 1000 cycles/node on a 2 GHz machine, where one cycle takes 0.5 nsec. The number of instructions per node depends on your average IPC. Using idiv tends to slow it down ;-)

I still have trouble understanding why such an expensive instruction doesn't show up, but there is no need to inspect the assembly.
Total idiocy is all I can claim. :)

500 ns per node is right. :)