Developments of the last two years

Joost Buijs · Post by **Joost Buijs** » Mon Mar 04, 2013 10:01 am

Joost Buijs wrote: Zortech was the very first C compiler I ever used, as I recall this was in the beginning of the 80's. I used it for some years, after that I went to Watcom C/C++.

I remember I also used something called Lattice C, it was in 1983 or 1984.
I don't know anymore which compiler I used at first, it is too long ago.

Sven · Post by **Sven** » Mon Mar 04, 2013 10:13 am

lucasart wrote:For example, the switch argument. Let's say that your piece values range from 0 to 11. How about writing this instead?
Code: Select all
if (piece >= 12) goto skip;
switch (piece) {...};
skip:
The fact that 0 <= piece <= 11 can help to optimize the code is of course not known by a C compiler, nor is it known by an assembler...

An even better way in terms of optimization, when maintaining a DEBUG version using assert(), would be:

Code: Select all

assert(piece >= 0); // only needed if "piece" is signed
assert(piece <= 11);
switch (piece) {
    case 0:
        ...
    case 10:
        ...
    case 11:
        ...
}

where you rely on thoroughly testing the DEBUG version, of course. The code above is NOT recommended for safety-critical applications, or for applications where performance is not an important issue. Some compilers might even emit a warning for the missing default case. Another variant would be to have

Code: Select all

    default:
        assert(piece == 11);
        ...

instead of the two assert's and the "case 11", but see below why that might be slower.

lucasart wrote:In C++, with typed enum, you can let the compiler know the range of possible values
Code: Select all
enum Piece {wPawn, wKnight...};
And then the switch knows that the set of possible values for a "Piece" is realatively compact so the compiler may decide to use a jump table, to make branching a O(1) operation.

You do not need an enum to help the compiler use a jump table or a simple jump offset calculation for switch statements. It is sufficient to have a normal integer variable and consecutive switch cases from 0 to N and, to get even better results, to also have equal-sized code for each of these switch cases (to avoid additional jumps). The version I proposed above would be sufficient in my opinion since there is no "default" case. The compiler normally does not insert a "default" case on its own when generating optimized code.

Sven

mar · Post by **mar** » Mon Mar 04, 2013 10:45 am

lucasart wrote: How about writing this instead?
Code: Select all
if (piece >= 12) goto skip;
switch (piece) {...};
skip:
The fact that 0 <= piece <= 11 can help to optimize the code is of course not known by a C compiler, nor is it known by an assembler...

It is of course known by the compiler and it will do exactly this (except in assembly), just look at the assembly the compiler produces.

Rebel · Post by **Rebel** » Mon Mar 04, 2013 11:54 am

lucasart wrote: So let's make this a challenge. I challenge anyone to write an assembly perft() that will beat my C code. PM me if you are interested!

Anyone up for the challenge ?

I will take the bait provided:

1. As long as your 32-bit Perft (stand alone) code compiles with my compiler;

2. It is not too much code (work).

I will optimize and then we compare.

*** edit *** 32-bit
*** edit *** added stand alone

hgm · Post by **hgm** » Mon Mar 04, 2013 12:05 pm

lucasart wrote:So let's make this a challenge. I challenge anyone to write an assembly perft() that will beat my C code. PM me if you are interested!

Anyone up for the challenge ?

How fast is your perft(), then? So that we know what to beat.

Rebel · Post by **Rebel** » Mon Mar 04, 2013 12:08 pm

hgm wrote:
lucasart wrote:So let's make this a challenge. I challenge anyone to write an assembly perft() that will beat my C code. PM me if you are interested!

Anyone up for the challenge ?
How fast is your perft(), then? So that we know what to beat.

That's a separate subject, the challenge is about C versus ASM.

lucasart · Post by **lucasart** » Mon Mar 04, 2013 12:13 pm

Rebel wrote:
lucasart wrote: So let's make this a challenge. I challenge anyone to write an assembly perft() that will beat my C code. PM me if you are interested!

Anyone up for the challenge ?
I will take the bait provided:

1. As long as your 32-bit Perft (stand alone) code compiles with my compiler;

2. It is not too much code (work).

I will optimize and then we compare.

*** edit *** 32-bit
*** edit *** added stand alone

That's interesting, although it's not the original challenge. Basically you take my code, get your compiler to spue out assembly, and optimize manually the assembly. I'm sure the gain will be negligible if any (certainly not a factor 2 as was discussed earlier). Besides, this challenge is meaningless, and would simply prove that your compiler is deficient (if you manage to beat significantly GCC I will bow before you).

The reason why no one can win the originlal challenge is that:
- writing in assembly fro m scratch a perft (with all the board and move gen code of course) is an enormous task, that would take a huge amount of time
- optimozing that code manually until it beats what GCC generates for my C code is even harder

But fair enough, you certainly can optimize manually the asm produced by the compiler. I say you won't optimize much, and the gain (if any) is certainly not worth transforming your engine into an unmaintainable assembly mess.

I need to hack the code to remove all the DiscoCheck that is not useful and will send you a cleand up version that only does perft.

Rebel · Post by **Rebel** » Mon Mar 04, 2013 12:26 pm

lucasart wrote:
Rebel wrote:
lucasart wrote: So let's make this a challenge. I challenge anyone to write an assembly perft() that will beat my C code. PM me if you are interested!

Anyone up for the challenge ?
I will take the bait provided:

1. As long as your 32-bit Perft (stand alone) code compiles with my compiler;

2. It is not too much code (work).

I will optimize and then we compare.

*** edit *** 32-bit
*** edit *** added stand alone
That's interesting, although it's not the originla challenge. Basically you take my code, compile it, and optimize manually the assembly. I'm sure the gain will be negligible if any (certainly not a factor 2 as was discussed earlier).

A few posts earlier you stated:

Have you ever tried to beat the C compiler with your own hand-written assembly ? In the old days (80486 let's say) it was still possible. But nowadays, architectures have become so complicated, and compilers so powerful, that it's virtually impossible to beat the compiler on any significant portion of code.

It's exactly the challenge and I am interested to explore how to define negligible

Besides, I never claimed a factor 2.

Rebel · Post by **Rebel** » Mon Mar 04, 2013 12:29 pm

lucasart wrote: I need to hack the code to remove all the DiscoCheck that is not useful and will send you a cleand up version that only does perft.

Great

hgm · Post by **hgm** » Mon Mar 04, 2013 12:30 pm

Then the challenge is not properly formulated. He asks for an assembly-written perft() that beat his C-code. Apparently any assemby-code would not do, and there are some restrictions on it. Like that the assembly code in question could be considered an optimized version of his C-code algorithm. But that is an ill-defined task. Modern optimizers do amazing things, and when I try to time multiplier performance, and write code like

Code: Select all

cnt=1e9; while(--cnt) a *= 1.00000000001;

it always times as 0, because the optimizer removes the entire loop. (Unless you actually print the value of a afterwards.)

So if the C-code perft is slow, because it Makes/UnMakes the last ply, refraining from making that ply causes an enormous speedup. But a clever optimizer (especially a human optimizer!) can easily recognize the set of assembly instructions that make the move and then immediately take it back as a set of instructions that as a group do nothing, and remove it for that reason.

I remember having to optimize PDP 11 assembly code for Akkerman's function for a course in computer organization, which is a heavily recursively defined functions. The best I could come up with was beaten by orders of magnitude by some others, who had used the fact that for some low value of the arguments there existed a formula for that function in closed form (something like Akk(0, N) = N, when the definition for that case would reduce to Akk(0,N) = Akk(0,N-1) and Akk(0,0)=0. So my optimized code was calculating N recursively by N recursive function calls each doing an INC on the passed value to produce N. Others has 'optimized that away'.

Developments of the last two years

Re: Developments of the last two years

Re: Developments of the last two years

Re: Developments of the last two years

Re: Developments of the last two years

Re: Developments of the last two years

Re: Developments of the last two years

Re: Developments of the last two years

Re: Developments of the last two years

Re: Developments of the last two years

Re: Developments of the last two years