case 1. Near the top of the eval, I take the material score plus a pretty big margin (knight value in the middlegame, rook value in the endgame), and if the current material score + margin < alpha, or the current material score - margin > beta, I simply return the current material score and get out. The intent here is to skip positional evaluation when down major material, which is quite common (more on how common in a bit). Both cases are sketched in code after case 2.
case 2. Near the end of the eval, after I have done the pawn + passed pawn stuff, I test the current score +/- a margin (dynamic, roughly between one and two pawns), and if the current score + margin < alpha, I avoid doing the piece scoring; ditto if the current score - margin > beta.
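For concreteness, here is a minimal C sketch of the two cutoffs. The POSITION type, the helper functions, and the margin values are hypothetical stand-ins for the engine's real evaluation terms, not Crafty's actual code:

Code:

/* Minimal sketch of the two lazy cutoffs described above.  Everything
   here is an illustrative stand-in; scores and margins are centipawns. */
typedef struct position POSITION;

extern int MaterialScore(const POSITION *p);       /* material balance   */
extern int InEndgame(const POSITION *p);           /* game-phase test    */
extern int EvaluatePawns(const POSITION *p);       /* pawn structure     */
extern int EvaluatePassedPawns(const POSITION *p); /* passed pawns       */
extern int EvaluatePieces(const POSITION *p);      /* the expensive part */
extern int DynamicMargin(const POSITION *p);       /* ~1 to 2 pawns      */

#define KNIGHT_VALUE 300
#define ROOK_VALUE   500

int Evaluate(const POSITION *pos, int alpha, int beta) {
  int score = MaterialScore(pos);

  /* case 1: big margin, knight in the middlegame, rook in the endgame */
  int margin1 = InEndgame(pos) ? ROOK_VALUE : KNIGHT_VALUE;
  if (score + margin1 < alpha || score - margin1 > beta)
    return score;                   /* skip all positional evaluation */

  score += EvaluatePawns(pos) + EvaluatePassedPawns(pos);

  /* case 2: smaller, dynamic margin after the pawn terms */
  int margin2 = DynamicMargin(pos);
  if (score + margin2 < alpha || score - margin2 > beta)
    return score;                   /* skip the piece scoring */

  return score + EvaluatePieces(pos);
}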
I decided to test these to see how often each one is tried, how often it succeeds, and how often it succeeds but is wrong. What I chose to do was to eliminate the lazy cutoff at the top, but set a flag recording whether the alpha or the beta test triggered, and then keep right on going in the eval. At the point where I might skip evaluating pieces, I did the same, setting another flag recording whether the alpha or beta test suggested the piece scoring should be skipped.
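Sketched in C below (again with hypothetical names, reusing the declarations from the previous sketch), the instrumented variant sets the flags where the cutoffs used to be and checks them at the bottom, as described next:

Code:

/* Instrumented variant: the lazy tests only set a flag, evaluation runs
   to completion, and each prediction is checked against the full score
   just before returning.  Counter and flag names are illustrative. */
enum { NONE, BELOW_ALPHA, ABOVE_BETA };

unsigned long long tried;
unsigned long long alpha_done1, alpha_wrong1, beta_done1, beta_wrong1;
unsigned long long alpha_done2, alpha_wrong2, beta_done2, beta_wrong2;

int EvaluateInstrumented(const POSITION *pos, int alpha, int beta) {
  int flag1 = NONE, flag2 = NONE;
  tried++;

  int score = MaterialScore(pos);
  int margin1 = InEndgame(pos) ? ROOK_VALUE : KNIGHT_VALUE;
  if (score + margin1 < alpha)     flag1 = BELOW_ALPHA; /* case 1 would cut */
  else if (score - margin1 > beta) flag1 = ABOVE_BETA;

  score += EvaluatePawns(pos) + EvaluatePassedPawns(pos);
  int margin2 = DynamicMargin(pos);
  if (score + margin2 < alpha)     flag2 = BELOW_ALPHA; /* case 2 would skip pieces */
  else if (score - margin2 > beta) flag2 = ABOVE_BETA;

  score += EvaluatePieces(pos);

  /* verify each prediction against the full evaluation */
  if (flag1 == BELOW_ALPHA) { alpha_done1++; if (score >= alpha) alpha_wrong1++; }
  if (flag1 == ABOVE_BETA)  { beta_done1++;  if (score <= beta)  beta_wrong1++;  }
  if (flag2 == BELOW_ALPHA) { alpha_done2++; if (score >= alpha) alpha_wrong2++; }
  if (flag2 == ABOVE_BETA)  { beta_done2++;  if (score <= beta)  beta_wrong2++;  }
  return score;
}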
Finally, at the bottom of Evaluate(), just before returning the score, I checked each flag and compared the actual score to the corresponding bound. If the first lazy test had triggered on material + margin < alpha, I verified that the actual score really was < alpha. Ditto for material - margin > beta, and then the same two tests for the second lazy eval test as well. After searching a position, I just dumped the counters. Here is a sample:
Code:
lazy cutoffs:
#1: tried=94,909,941 <alpha_done=21,047,333 <alpha_wrong=117
#1: tried=94,909,941 >beta_done=17,374,265 >beta_wrong=107
#2: tried=94,909,941 <alpha_done=36,546,071 <alpha_wrong=33,954
#2: tried=94,909,941 >beta_done=27,702,218 >beta_wrong=20,568
tried is obviously the number of times Evaluate() was actually called. There is one way the tried count for #2 can fail to match #1: when mate scoring is done, the rest of the piece evaluation gets skipped completely. In the sample above that did not happen. The "beta_done" value means the score - margin > beta test triggered, and we would have returned that score (or beta, same thing) there. The "beta_wrong" value means that when I compared the final score to beta, it was actually <= beta even though the lazy test had predicted it would be > beta. Ditto for alpha, using the alpha bound. In the sample, for instance, case #1 predicted a fail-low 21,047,333 times and was wrong only 117 of those, an error rate well under 0.001%.
Now for some actual data. The first position is Kopec #22 (Bxe4), searched to depth=28 with one CPU, to avoid worrying about locking the counters above and such, not to mention the non-determinism.
The second position is Win At Chess #2, the Rxb2 position. It ends up in an endgame and even gets EGTB hits, with lots of checks and such too.
The third position is the initial starting position, searched to depth=24. It took a good bit longer than the others (bad depth choice) on my MacBook.
The fourth position is the classic Fine #70, the Kb1 position, searched for 2 minutes (I always forget that Crafty searches this to a forced mate after 1 minute, but I decided not to run it again and just left these numbers as-is).
Code:
lazy cutoffs (position 1):
#1: tried=232,765,181 <alpha_done=50,602,410 <alpha_wrong=485
#1: tried=232,765,181 >beta_done=44,333,281 >beta_wrong=348
#2: tried=232,765,181 <alpha_done=88,039,569 <alpha_wrong=86,823
#2: tried=232,765,181 >beta_done=71,048,470 >beta_wrong=55,624
lazy cutoffs (position 2):
#1: tried=216,176,727 <alpha_done=27,849,350 <alpha_wrong=1,429,110
#1: tried=216,176,727 >beta_done=14,305,040 >beta_wrong=2,048,679
#2: tried=215,925,197 <alpha_done=88,389,502 <alpha_wrong=328,371
#2: tried=215,925,197 >beta_done=53,822,321 >beta_wrong=373,325
lazy cutoffs (position 3):
#1: tried=274,505,203 <alpha_done=43,680,268 <alpha_wrong=4,364
#1: tried=274,505,203 >beta_done=30,890,091 >beta_wrong=2,532
#2: tried=274,505,203 <alpha_done=82,748,008 <alpha_wrong=160,751
#2: tried=274,505,203 >beta_done=59,762,057 >beta_wrong=180,485
lazy cutoffs (position 4):
#1: tried=137,919,807 <alpha_done=73,605,381 <alpha_wrong=549,500
#1: tried=137,919,807 >beta_done=57,146,523 >beta_wrong=587,823
#2: tried=132,816,452 <alpha_done=73,650,079 <alpha_wrong=14,119
#2: tried=132,816,452 >beta_done=56,001,456 >beta_wrong=17,247
Position number 2 suggests that the "pieceless" margin might be improved a bit, something I am going to test: its #1 wrong rates are about 5% on the alpha side (1,429,110 of 27,849,350) and 14% on the beta side (2,048,679 of 14,305,040), where every other position is a tiny fraction of a percent. Otherwise the number of failures is surprisingly low.
For performance, the difference is pretty clear. With lazy eval, 5.5M nodes per second on my MacBook (one CPU); with lazy eval disabled, 4.1M nodes per second, or about 75% as fast as with lazy eval.
Has anybody else done any similar testing?
edit:
Adjusting the margin for the no-pieces case reduced the percentage of wrong cutoffs (#1) to something reasonable. More tuning scheduled. It didn't seem to affect the NPS on WAC #2 at all, which was nice.