Then the challenge is not properly formulated. He asks for
an assembly-written perft() that beat
his C-code. Apparently any assemby-code would not do, and there are some restrictions on it. Like that the assembly code in question could be considered an optimized version of his C-code algorithm. But that is an ill-defined task. Modern optimizers do amazing things, and when I try to time multiplier performance, and write code like
Code: Select all
cnt=1e9; while(--cnt) a *= 1.00000000001;
it always times as 0, because the optimizer removes the entire loop. (Unless you actually print the value of a afterwards.)
So if the C-code perft is slow, because it Makes/UnMakes the last ply, refraining from making that ply causes an enormous speedup. But a clever optimizer (especially a human optimizer!) can easily recognize the set of assembly instructions that make the move and then immediately take it back as a set of instructions that as a group do nothing, and remove it for that reason.
I remember having to optimize PDP 11 assembly code for Akkerman's function for a course in computer organization, which is a heavily recursively defined functions. The best I could come up with was beaten by orders of magnitude by some others, who had used the fact that for some low value of the arguments there existed a formula for that function in closed form (something like Akk(0, N) = N, when the definition for that case would reduce to Akk(0,N) = Akk(0,N-1) and Akk(0,0)=0. So my optimized code was calculating N recursively by N recursive function calls each doing an INC on the passed value to produce N. Others has 'optimized that away'.