Thinker output

sje · Post by **sje** » Wed Mar 25, 2009 10:33 pm

CThinker wrote:

    // command strings

    static const char StrCmd&#91;&#93; =

        "computer"      "\0"
        "easy"          "\0"
        "hard"          "\0"
        "level"         "\0"
        "otim"          "\0"
        "time"          "\0"
        "post"          "\0"
        "nopost"        "\0"
        "st"            "\0"
        "sd"            "\0"
        "xboard"        "\0"
        "go"            "\0"
        "setboard"      "\0"
        "?"             "\0"
        "force"         "\0"
        "new"           "\0"
        "remove"        "\0"
        "undo"          "\0"
        "quit"          "\0"
        ;

Hint: Have the strings appear in ASCII order. Then you can use bsearch() or the like for a quick look-up.

sje · Post by **sje** » Wed Mar 25, 2009 10:37 pm

CThinker wrote:Actually, I was not saying that the PV array will go to the stack. Instead, what I was staying is that there is now less memory that can be given to you as stack memory because that memory has been allocated somewhere else. With smaller stack, you run in to the chance of overrunning it. You go a few plies deep, and your program could just crash.

It is not necessary to code the search recursively. In fact, there are some good reasons for an iterative coding and one of them is the elimination of a stack overflow hazard.

Some of the early CC pioneers used iterative coding in part because of the unavailability of high level languages supporting recursion.

bob · Post by **bob** » Wed Mar 25, 2009 10:55 pm

CThinker wrote:
bob wrote:
CThinker wrote:
bob wrote:
CThinker wrote:
bob wrote:
Zach Wegner wrote:
CThinker wrote:The ways that other engines collect PV arrays look very hacky to me. They either allocate a square array, but use only half of it, or, allocates PV arrays on the stack on each call to a search function. Neither is acceptable to me. So, until I device a really clean solution, I am not inclined on polluting the Thinker code.
One idea from Vincent is to make an array pv_stack[MAX_PLY * MAX_PLY / 2]; and then keep a pointer to the first and last entry into it for each ply. Whenever you get a new PV, you save it starting at the first pointer, and then save the last entry. You then pass the entry past the last move (also you need a marker to signify the end of a PV) to the next ply.

ZCT used to do this but I got rid of it because of the copying of PVs in SMP. One CPU could be searching a node with current PV of length 1, then another CPU backs up a PV of length 3. You either have to truncate it or shift the current PV forward in the array to make room for it.
I like the approach I use, which is a pv[maxply][maxply]

At any depth, you only store into pv[ply][ply] for a terminal move. As you back up you only back up a part of this array. When you back up from 10 to 9, you copy everything from pv[10][0 to endofpv] and then save the current move in pv[9[[9] to go along with the partial PV being backed up... This is executed _very_ few times and costs nothing as a result.
Yup, I am aware of this approach, but it does not appeal to me. Half of that array is unused.
And that is a problem because... ???
Well, I guess to you there is no problem then. That's OK.
This is similar to fretting around because you waste up to 63 bytes of memory to force your hash table to sit on a 64-byte memory alignment to improve cache efficiency. Do you worry about that potential loss of 63 bytes of memory? It _is_ wasted...

the 1/2 used array makes the code clean and elegant and fast. All much more important than losing 1/2 of a 64xx64 = 4096 element array. 2048 x 4 = 8K bytes on machines that nowadays have at least 1-2 gigabytes of memory. On that scale, properly aligning the hash table is just as bad... Yet I'd hope everyone does that...
Actually, 63 bytes is a lot in my world. To me, every byte counts, and I mean that literraly. So, if it is at all posible to avoid wasting it, I do take the extra steps, assuming that there is no additional hit on code (binary) size.

Just to illustrate that, I'll use here the constant strings as an example. Most compilers align the strings. And so, you end-up with unused bytes at the end of strings.

If you open the Fruit EXE in a binary editor, you will see unused bytes in between the strings.
Code: Select all
btime   depth   infinite    mate    movestogo   movetime    nodes   ponder  searchmoves winc    wtime   fen     moves   name  
In constrast, if you open any Thinker EXE in a binary editor, you will not see any unused bytes.
Code: Select all
computer easy hard level otim time post nopost st sd xboard go setboard ? force new remove undo quit
Some compilers will take care of this. Some won't, no matter what option you give it.

To make sure that "any compiler" will always not waste any byte, I actually deliberately code for it. Here is a snippet from the Thinker xboard shell:
Code: Select all
    // command strings

    static const char StrCmd&#91;&#93; =

        "computer"      "\0"
        "easy"          "\0"
        "hard"          "\0"
        "level"         "\0"
        "otim"          "\0"
        "time"          "\0"
        "post"          "\0"
        "nopost"        "\0"
        "st"            "\0"
        "sd"            "\0"
        "xboard"        "\0"
        "go"            "\0"
        "setboard"      "\0"
        "?"             "\0"
        "force"         "\0"
        "new"           "\0"
        "remove"        "\0"
        "undo"          "\0"
        "quit"          "\0"
        ;

    #define StrCmd_computer      ( StrCmd )
    #define StrCmd_easy          ( StrCmd_computer      +  8 + 1 )
    #define StrCmd_hard          ( StrCmd_easy          +  4 + 1 )
    #define StrCmd_level         ( StrCmd_hard          +  4 + 1 )
    #define StrCmd_otim          ( StrCmd_level         +  5 + 1 )
    #define StrCmd_time          ( StrCmd_otim          +  4 + 1 )
    #define StrCmd_post          ( StrCmd_time          +  4 + 1 )
    #define StrCmd_nopost        ( StrCmd_post          +  4 + 1 )
    #define StrCmd_st            ( StrCmd_nopost        +  6 + 1 )
    #define StrCmd_sd            ( StrCmd_st            +  2 + 1 )
    #define StrCmd_xboard        ( StrCmd_sd            +  2 + 1 )
    #define StrCmd_go            ( StrCmd_xboard        +  6 + 1 )
    #define StrCmd_setboard      ( StrCmd_go            +  2 + 1 )
    #define StrCmd_movenow       ( StrCmd_setboard      +  8 + 1 )
    #define StrCmd_force         ( StrCmd_movenow       +  1 + 1 )
    #define StrCmd_new           ( StrCmd_force         +  5 + 1 )
    #define StrCmd_remove        ( StrCmd_new           +  3 + 1 )
    #define StrCmd_undo          ( StrCmd_remove        +  6 + 1 )
    #define StrCmd_quit          ( StrCmd_undo          +  4 + 1 )
As you can see, there is extra source code, but that does not affect the actual binary size. These are just pre-processor text substitutions.

This also forces me not to use the strings directly. If the strings need to change (say, I need to Greek version of the xboard commands), then I only need to change in one known place.

Now back to the squre PV array..., in Thinker, the max ply is 128. The square PV array approach would translate to 64K. 32K of that will never be used. For others, its nothing. To me, that's a lot. On a Pocket PC, that may be the difference between running and running out of stack and terminating.

Some developers don't go though this type of detail, but I do. Some don't bother designing (the engine just ends up as one giant chunk of intermingled code), but I do.

But then again, that's just me. To each his own, of course.

Based on the above code, you could always use a union to equivalence those unused array elements with various scalar and small arrays you use, so that nothing goes unused.

Would not be very elegant IMHO because changing the size of the array would become problematic.

bob · Post by **bob** » Wed Mar 25, 2009 10:57 pm

CThinker wrote:
michiguel wrote: Miguel
Edit: You do not need to have the array in the stack.
Actually, I was not saying that the PV array will go to the stack. Instead, what I was staying is that there is now less memory that can be given to you as stack memory because that memory has been allocated somewhere else. With smaller stack, you run in to the chance of overrunning it. You go a few plies deep, and your program could just crash.

Memory is not free. If you use it in one place, you now have less for others. That 32K could be used for the pawn hash table, for example, which is just about right on a mobile phone.

Cheers...

Blowing the stack in x86-land is going to take a _big_ set of pushes...

sje · Post by **sje** » Wed Mar 25, 2009 11:23 pm

bob wrote:Blowing the stack in x86-land is going to take a _big_ set of pushes...

Maybe not, depending on the amount of automatic storage used for each recursive invocation.

The test string

Code: Select all

x = ((((((((((((((((((((((((((((((((((&#40;y&#41;))))))))))))))))))))))))))))))))));

written with a sufficient number of parentheses will take out a recursive descent parser. This has been useful for chastising overconfident students in compiler writing courses.

CThinker · Post by **CThinker** » Wed Mar 25, 2009 11:45 pm

bob wrote:
CThinker wrote:
bob wrote:
CThinker wrote:
bob wrote:
CThinker wrote:
bob wrote:
Zach Wegner wrote:
CThinker wrote:The ways that other engines collect PV arrays look very hacky to me. They either allocate a square array, but use only half of it, or, allocates PV arrays on the stack on each call to a search function. Neither is acceptable to me. So, until I device a really clean solution, I am not inclined on polluting the Thinker code.
One idea from Vincent is to make an array pv_stack[MAX_PLY * MAX_PLY / 2]; and then keep a pointer to the first and last entry into it for each ply. Whenever you get a new PV, you save it starting at the first pointer, and then save the last entry. You then pass the entry past the last move (also you need a marker to signify the end of a PV) to the next ply.

ZCT used to do this but I got rid of it because of the copying of PVs in SMP. One CPU could be searching a node with current PV of length 1, then another CPU backs up a PV of length 3. You either have to truncate it or shift the current PV forward in the array to make room for it.
I like the approach I use, which is a pv[maxply][maxply]

At any depth, you only store into pv[ply][ply] for a terminal move. As you back up you only back up a part of this array. When you back up from 10 to 9, you copy everything from pv[10][0 to endofpv] and then save the current move in pv[9[[9] to go along with the partial PV being backed up... This is executed _very_ few times and costs nothing as a result.
Yup, I am aware of this approach, but it does not appeal to me. Half of that array is unused.
And that is a problem because... ???
Well, I guess to you there is no problem then. That's OK.
This is similar to fretting around because you waste up to 63 bytes of memory to force your hash table to sit on a 64-byte memory alignment to improve cache efficiency. Do you worry about that potential loss of 63 bytes of memory? It _is_ wasted...

the 1/2 used array makes the code clean and elegant and fast. All much more important than losing 1/2 of a 64xx64 = 4096 element array. 2048 x 4 = 8K bytes on machines that nowadays have at least 1-2 gigabytes of memory. On that scale, properly aligning the hash table is just as bad... Yet I'd hope everyone does that...
Actually, 63 bytes is a lot in my world. To me, every byte counts, and I mean that literraly. So, if it is at all posible to avoid wasting it, I do take the extra steps, assuming that there is no additional hit on code (binary) size.

Just to illustrate that, I'll use here the constant strings as an example. Most compilers align the strings. And so, you end-up with unused bytes at the end of strings.

If you open the Fruit EXE in a binary editor, you will see unused bytes in between the strings.
Code: Select all
btime   depth   infinite    mate    movestogo   movetime    nodes   ponder  searchmoves winc    wtime   fen     moves   name  
In constrast, if you open any Thinker EXE in a binary editor, you will not see any unused bytes.
Code: Select all
computer easy hard level otim time post nopost st sd xboard go setboard ? force new remove undo quit
Some compilers will take care of this. Some won't, no matter what option you give it.

To make sure that "any compiler" will always not waste any byte, I actually deliberately code for it. Here is a snippet from the Thinker xboard shell:
Code: Select all
    // command strings

    static const char StrCmd&#91;&#93; =

        "computer"      "\0"
        "easy"          "\0"
        "hard"          "\0"
        "level"         "\0"
        "otim"          "\0"
        "time"          "\0"
        "post"          "\0"
        "nopost"        "\0"
        "st"            "\0"
        "sd"            "\0"
        "xboard"        "\0"
        "go"            "\0"
        "setboard"      "\0"
        "?"             "\0"
        "force"         "\0"
        "new"           "\0"
        "remove"        "\0"
        "undo"          "\0"
        "quit"          "\0"
        ;

    #define StrCmd_computer      ( StrCmd )
    #define StrCmd_easy          ( StrCmd_computer      +  8 + 1 )
    #define StrCmd_hard          ( StrCmd_easy          +  4 + 1 )
    #define StrCmd_level         ( StrCmd_hard          +  4 + 1 )
    #define StrCmd_otim          ( StrCmd_level         +  5 + 1 )
    #define StrCmd_time          ( StrCmd_otim          +  4 + 1 )
    #define StrCmd_post          ( StrCmd_time          +  4 + 1 )
    #define StrCmd_nopost        ( StrCmd_post          +  4 + 1 )
    #define StrCmd_st            ( StrCmd_nopost        +  6 + 1 )
    #define StrCmd_sd            ( StrCmd_st            +  2 + 1 )
    #define StrCmd_xboard        ( StrCmd_sd            +  2 + 1 )
    #define StrCmd_go            ( StrCmd_xboard        +  6 + 1 )
    #define StrCmd_setboard      ( StrCmd_go            +  2 + 1 )
    #define StrCmd_movenow       ( StrCmd_setboard      +  8 + 1 )
    #define StrCmd_force         ( StrCmd_movenow       +  1 + 1 )
    #define StrCmd_new           ( StrCmd_force         +  5 + 1 )
    #define StrCmd_remove        ( StrCmd_new           +  3 + 1 )
    #define StrCmd_undo          ( StrCmd_remove        +  6 + 1 )
    #define StrCmd_quit          ( StrCmd_undo          +  4 + 1 )
As you can see, there is extra source code, but that does not affect the actual binary size. These are just pre-processor text substitutions.

This also forces me not to use the strings directly. If the strings need to change (say, I need to Greek version of the xboard commands), then I only need to change in one known place.

Now back to the squre PV array..., in Thinker, the max ply is 128. The square PV array approach would translate to 64K. 32K of that will never be used. For others, its nothing. To me, that's a lot. On a Pocket PC, that may be the difference between running and running out of stack and terminating.

Some developers don't go though this type of detail, but I do. Some don't bother designing (the engine just ends up as one giant chunk of intermingled code), but I do.

But then again, that's just me. To each his own, of course.
Based on the above code, you could always use a union to equivalence those unused array elements with various scalar and small arrays you use, so that nothing goes unused.

Would not be very elegant IMHO because changing the size of the array would become problematic.

Yup sure, you can use all sorts of tricks. The point is that, most won't go that route. To most, saving a few bytes does not matter. To me, it does. And, I try to achieve that without impact on anything else.

Dr.Wael Deeb · Post by **Dr.Wael Deeb** » Thu Mar 26, 2009 12:02 am

Can you please go to the programmers forum and continue your discution there,I feel ashamed of my ignorance when I read your posts

Dr.D

CThinker · Post by **CThinker** » Thu Mar 26, 2009 12:19 am

bob wrote:
CThinker wrote:
michiguel wrote: Miguel
Edit: You do not need to have the array in the stack.
Actually, I was not saying that the PV array will go to the stack. Instead, what I was staying is that there is now less memory that can be given to you as stack memory because that memory has been allocated somewhere else. With smaller stack, you run in to the chance of overrunning it. You go a few plies deep, and your program could just crash.

Memory is not free. If you use it in one place, you now have less for others. That 32K could be used for the pawn hash table, for example, which is just about right on a mobile phone.

Cheers...
Blowing the stack in x86-land is going to take a _big_ set of pushes...

Well, x86 is no the only platform in this world. The context here is actually the Pocket PC.

CThinker · Post by **CThinker** » Thu Mar 26, 2009 12:26 am

sje wrote:
CThinker wrote:Actually, I was not saying that the PV array will go to the stack. Instead, what I was staying is that there is now less memory that can be given to you as stack memory because that memory has been allocated somewhere else. With smaller stack, you run in to the chance of overrunning it. You go a few plies deep, and your program could just crash.
It is not necessary to code the search recursively. In fact, there are some good reasons for an iterative coding and one of them is the elimination of a stack overflow hazard.

Some of the early CC pioneers used iterative coding in part because of the unavailability of high level languages supporting recursion.

I used to code that way even with C in environments with very limited memory (e.g., embedded systems). I allows my full control of memory usage.

With this approach however, you are just managing your own stack instead of making use of the machine stack. I both cases, you need memory.

If you manage your own stack, you have to allocate that, and if there is no available memory, you would still fail.

CThinker · Post by **CThinker** » Thu Mar 26, 2009 12:42 am

sje wrote:

CThinker wrote:

Code: Select all

    // command strings

    static const char StrCmd&#91;&#93; =

        "computer"      "\0"
        "easy"          "\0"
        "hard"          "\0"
        "level"         "\0"
        "otim"          "\0"
        "time"          "\0"
        "post"          "\0"
        "nopost"        "\0"
        "st"            "\0"
        "sd"            "\0"
        "xboard"        "\0"
        "go"            "\0"
        "setboard"      "\0"
        "?"             "\0"
        "force"         "\0"
        "new"           "\0"
        "remove"        "\0"
        "undo"          "\0"
        "quit"          "\0"
        ;

Hint: Have the strings appear in ASCII order. Then you can use bsearch() or the like for a quick look-up.

Speed is actually not a concern here. This is for use by the protocol handler. The dictionary look-up is twice more code than a simple iteration.

Of course, both are small code. But again, I would take 50 bytes over 100 bytes any day. You save 50 here, 40 there, and before you know it, your code is half the size.

The Thinker EXE is only 55K, and that is because it is compiled for speed. When compiled for size, it comes down to 45K. When the bitboard board class is replaced with the mailbox board class, it comes down to 35K, but is now 30% slower. This is what is being ported to the DS.

Thinker output

Re: Thinker output

Re: Thinker output

Re: Thinker output

Re: Thinker output

Re: Thinker output

Re: Thinker output

Re: Thinker output

Re: Thinker output

Re: Thinker output

Re: Thinker output