question about symmetric evaluation
Moderators: hgm, Rebel, chrisw, Ras
-
- Posts: 28268
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: question about symmetric evaluation
It does not really matter if the individual scores for black or white are negative, and thus rounded towards -infinity rather than towards zero. As long as you do the same to both, the result will be symmetric.
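A minimal sketch of this point in C, assuming the evaluation keeps separate white and black terms and that >> on a negative int is an arithmetic shift (implementation-defined in C, though common compilers behave this way). The names scaled_diff, shifted_diff, white_term and black_term are made up for illustration and do not come from any engine.

```c
#include <assert.h>

/* Hypothetical helper: scale each side's term by >> 3, then subtract.
   Both sides get the same rounding (toward -infinity), so swapping the
   arguments exactly negates the result. */
int scaled_diff(int white_term, int black_term)
{
    return (white_term >> 3) - (black_term >> 3);
}

/* For comparison: shifting the already-subtracted score.
   Here the rounding depends on the sign of the difference. */
int shifted_diff(int white_term, int black_term)
{
    return (white_term - black_term) >> 3;
}

/* Self-check over a small range, including negative per-side terms. */
void check_scaled_diff(int limit)
{
    for (int w = -limit; w <= limit; ++w)
        for (int b = -limit; b <= limit; ++b)
            assert(scaled_diff(w, b) == -scaled_diff(b, w));
}
```

With these definitions, scaled_diff(30, 16) is 1 and scaled_diff(16, 30) is -1, while shifted_diff gives 1 and -2 for the same two calls.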
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: question about symmetric evaluation
Gerd Isenberg wrote:
bob wrote: I don't think so. I am searching about 1.7M nps on my core-2 duo at 2.0ghz. I guess I could use the hardware counters to count the number of instructions, but it has to be way more than 1000 instructions per node. Last time I did use hardware it was around 2700 instructions per node but that was probably 8-9 years ago and the new version has slowed down as we have added some eval stuff...
I looked at an old run where I divided by a variable (when I had variable scaling for endgame terms) and there was no difference in NPS....
I've seen others doing the same thing with the hash signature where it is modulo the hash table size to get the table index, and they apparently didn't feel it made enough difference compared with the AND and power-of-2 size limitation.

1.7 MNodes per second at 2 GHz on a core-2 duo - ahh yes, two processors. So I was at least a factor of two wrong with my 1000 cycles per node guess - sorry.
1E9 ns / 1.7E6 = 588 ns per node = 1176 cycles per node on two cores, which is about 2352 cycles per node and processor.
One idiv latency is still 1-2% of that, but probably this latency hides some memory stalls or whatever and/or schedules partly with some other instructions around on a core 2 duo. Also, you may not divide at every node you count. Empirical evidence is always right.

Actually my 1.7M was using one core.
And no, I don't divide at every node, but I do divide at most of 'em (at every call to Evaluate() but not at non-qsearch internal nodes).
However, if you round my NPS up to 2M, that is 1usec/node, but in the core2, that is 1000nsec/node, which is certainly something well beyond 1000 instructions/node. On this box (4M L2) all instructions and most of the data fits into L2 (not counting hash tables of course). I'm going to stick a bogus divide of two int values (global values so the compiler can't optimize them away) and see what happens exactly...
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: question about symmetric evaluation
OK, here are the results. I added an x = y / z; type operation at the very top of Evaluate(). The values were all global, and y and z are changed so that the compiler can't play any hanky-panky with getting rid of them and replacing the divide with a constant. I also use the value elsewhere so that it has to change correctly.
log.001: time=6.75 mat=0 n=12424532 fh=93% nps=1.8M
log.002: time=6.74 mat=0 n=12424532 fh=93% nps=1.8M
So I see absolutely no difference in speed at all...
Bob
BTW, both were compiled using Intel's 64-bit compiler, with PGO. The first run is without the divide, the second is with.
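For readers who want to repeat the experiment, here is a rough sketch of what such a bogus divide could look like in C. It is not the code from the test above: the names bogus_num, bogus_den, bogus_sink and bogus_divide are hypothetical, and using volatile globals is an assumption, chosen as one simple way to stop the compiler from folding the division into a constant.

```c
#include <assert.h>

/* Hypothetical globals. volatile forces a real load and a real divide
   on every call instead of a precomputed constant. */
volatile int bogus_num = 1000;
volatile int bogus_den = 7;
volatile int bogus_sink;

/* Stand-in for the extra work placed at the top of an evaluation routine. */
int bogus_divide(void)
{
    int den = bogus_den;
    assert(den != 0);              /* avoid a divide-by-zero trap */
    int q = bogus_num / den;       /* runtime divide, not constant-folded */
    bogus_sink = q;                /* keep the result observable */
    bogus_num = bogus_num + 1;     /* change an operand between calls */
    return q;
}
```

Calling bogus_divide() once per evaluation and comparing nodes per second against a build without the call is the kind of before/after measurement described above.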
-
- Posts: 2251
- Joined: Wed Mar 08, 2006 8:47 pm
- Location: Hattingen, Germany
Re: question about symmetric evaluation
bob wrote: Actually my 1.7M was using one core.
And no, I don't divide at every node, but I do divide at most of 'em (at every call to Evaluate() but not at non-qsearch internal nodes).
However, if you round my NPS up to 2M, that is 1usec/node, but in the core2, that is 1000nsec/node, which is certainly something well beyond 1000 instructions/node. On this box (4M L2) all instructions and most of the data fits into L2 (not counting hash tables of course). I'm going to stick a bogus divide of two int values (global values so the compiler can't optimize them away) and see what happens exactly...

Hmm, I don't get your math. With 2M nps I calculate 500 nsec/node, or 1000 cycles/node on a 2 GHz machine, where one cycle takes 0.5 nsec. The number of instructions per node depends on your average IPC. Using idiv tends to slow it down.
I still have trouble understanding why such an expensive instruction doesn't show up - but no need to inspect assembly.
-
- Posts: 1822
- Joined: Thu Mar 09, 2006 11:54 pm
- Location: The Netherlands
Re: question about symmetric evaluation
Uri Blass wrote: I found that my evaluation is not symmetric for white and black.
One of the reasons is that in some part of my evaluation I simply calculate the score from white's point of view and later divide by 8.
Now (14>>3)=1 and (-14>>3)=-2, so I can get a different number for white and black.
My question is how you solve that type of problem, and whether there is a way to solve it without making the code slower. Of course it is easy to solve it by making the code slower and writing something like:
if (score<0)
{
    score=-score;
    score=score>>3;
    score=-score;
}
else
    score=score>>3;
I also thought about using (score+4)>>3, but in that case (12+4)>>3=2 while (-12+4)>>3=-1.
Uri

Uri, which medicines are you using nowadays?
I really have no clue what you are talking about.
Can you explain what the problem is?
Thanks,
Vincent
-
- Posts: 2251
- Joined: Wed Mar 08, 2006 8:47 pm
- Location: Hattingen, Germany
Re: question about symmetric evaluation
diep wrote: I really have no clue what you are talking about.
Can you explain what the problem is?
Thanks,
Vincent

Obviously Uri scales white-relative eval aspects, where white and black values are already sign flipped. Whether this is a good idea is another question - but no reason to make disparaging statements.
The problem, as you are probably aware, is arithmetic shift right versus idiv:
Code:
if ( x & 7 )
    abs(-x >> 3) != abs(x >> 3)
else
    abs(-x >> 3) == abs(x >> 3)
but
    abs(-x / 8) == abs(x / 8)
is always true.
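The relations above can be checked with a small C sketch. It assumes that >> on a negative int is an arithmetic shift (the C standard leaves this implementation-defined, but mainstream compilers do it) and uses division by 8 as the counterpart of a shift by 3. The function names are made up for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* Returns 1 if shifting x and -x right by 3 gives equal magnitudes. */
int shift_is_symmetric(int x)
{
    return abs(-x >> 3) == abs(x >> 3);
}

/* Returns 1 if dividing x and -x by 8 gives equal magnitudes.
   C99 and later truncate toward zero, so this holds for every x
   whose negation does not overflow. */
int div_is_symmetric(int x)
{
    return abs(-x / 8) == abs(x / 8);
}

/* Checks both relations for every x in [-limit, limit]. */
void check_range(int limit)
{
    for (int x = -limit; x <= limit; ++x) {
        assert(div_is_symmetric(x));
        /* The shift is symmetric exactly when the low three bits are zero. */
        assert(shift_is_symmetric(x) == ((x & 7) == 0));
    }
}
```

For example, shift_is_symmetric(14) is 0 because 14 >> 3 is 1 while -14 >> 3 is -2, whereas shift_is_symmetric(16) is 1.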
-
- Posts: 10661
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: question about symmetric evaluation
diep wrote:
Uri Blass wrote: I found that my evaluation is not symmetric for white and black.
One of the reasons is that in some part of my evaluation I simply calculate the score from white's point of view and later divide by 8.
Now (14>>3)=1 and (-14>>3)=-2, so I can get a different number for white and black.
My question is how you solve that type of problem, and whether there is a way to solve it without making the code slower. Of course it is easy to solve it by making the code slower and writing something like:
if (score<0)
{
    score=-score;
    score=score>>3;
    score=-score;
}
else
    score=score>>3;
I also thought about using (score+4)>>3, but in that case (12+4)>>3=2 while (-12+4)>>3=-1.
Uri

Uri, which medicines are you using nowadays?
I really have no clue what you are talking about.
Can you explain what the problem is?
Thanks,
Vincent

See post number 4 on page 1.
It was about some code that I have to prefer pawns in the endgame.
The problem was already solved after Bob Hyatt's post, in which he suggested using / instead of >>.
Uri
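As a side note on the speed question: a division by a constant power of two does not normally need an idiv at all. Compilers typically emit a short add-and-shift sequence along the lines of the sketch below. This is only an illustration under the assumptions of a 32-bit int and an arithmetic >> on negative values; div8_toward_zero and check_div8 are hypothetical names.

```c
#include <assert.h>

/* Symmetric (round-toward-zero) divide by 8 without a branch or idiv. */
int div8_toward_zero(int score)
{
    int bias = (score >> 31) & 7;   /* 7 if score is negative, else 0 */
    return (score + bias) >> 3;     /* matches score / 8 */
}

/* Compares the shift-based version against plain division. */
void check_div8(int limit)
{
    for (int s = -limit; s <= limit; ++s)
        assert(div8_toward_zero(s) == s / 8);
}
```

So div8_toward_zero(14) is 1 and div8_toward_zero(-14) is -1, which is the symmetric behaviour that plain score / 8 gives.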
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: question about symmetric evaluation
Gerd Isenberg wrote:
bob wrote: Actually my 1.7M was using one core.
And no, I don't divide at every node, but I do divide at most of 'em (at every call to Evaluate() but not at non-qsearch internal nodes).
However, if you round my NPS up to 2M, that is 1usec/node, but in the core2, that is 1000nsec/node, which is certainly something well beyond 1000 instructions/node. On this box (4M L2) all instructions and most of the data fits into L2 (not counting hash tables of course). I'm going to stick a bogus divide of two int values (global values so the compiler can't optimize them away) and see what happens exactly...

Hmm, I don't get your math. With 2M nps I calculate 500 nsec/node, or 1000 cycles/node on a 2 GHz machine, where one cycle takes 0.5 nsec. The number of instructions per node depends on your average IPC. Using idiv tends to slow it down.
I still have trouble understanding why such an expensive instruction doesn't show up - but no need to inspect assembly.

Total idiocy is all I can claim.
500 ns per node is right.
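The arithmetic behind that figure can be written out as a short C sketch. The function names are invented for this example, and the clock rate is taken per core.

```c
#include <assert.h>

/* Nanoseconds spent per node at a given nodes-per-second rate. */
double ns_per_node(double nps)
{
    assert(nps > 0.0);
    return 1e9 / nps;
}

/* Clock cycles per node on one core running at clock_ghz GHz. */
double cycles_per_node(double nps, double clock_ghz)
{
    return ns_per_node(nps) * clock_ghz;
}
```

With 2M nps at 2 GHz this gives 500 ns and 1000 cycles per node. With the measured 1.7M nps on one core it gives about 588 ns and about 1176 cycles per node.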