Hint: Have the strings appear in ASCII order. Then you can use bsearch() or the like for a quick look-up.CThinker wrote:Code: Select all
// command strings static const char StrCmd[] = "computer" "\0" "easy" "\0" "hard" "\0" "level" "\0" "otim" "\0" "time" "\0" "post" "\0" "nopost" "\0" "st" "\0" "sd" "\0" "xboard" "\0" "go" "\0" "setboard" "\0" "?" "\0" "force" "\0" "new" "\0" "remove" "\0" "undo" "\0" "quit" "\0" ;
Thinker output
Moderators: hgm, Rebel, chrisw
-
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Re: Thinker output
-
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Re: Thinker output
It is not necessary to code the search recursively. In fact, there are some good reasons for an iterative coding and one of them is the elimination of a stack overflow hazard.CThinker wrote:Actually, I was not saying that the PV array will go to the stack. Instead, what I was staying is that there is now less memory that can be given to you as stack memory because that memory has been allocated somewhere else. With smaller stack, you run in to the chance of overrunning it. You go a few plies deep, and your program could just crash.
Some of the early CC pioneers used iterative coding in part because of the unavailability of high level languages supporting recursion.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Thinker output
Based on the above code, you could always use a union to equivalence those unused array elements with various scalar and small arrays you use, so that nothing goes unused.CThinker wrote:Actually, 63 bytes is a lot in my world. To me, every byte counts, and I mean that literraly. So, if it is at all posible to avoid wasting it, I do take the extra steps, assuming that there is no additional hit on code (binary) size.bob wrote:This is similar to fretting around because you waste up to 63 bytes of memory to force your hash table to sit on a 64-byte memory alignment to improve cache efficiency. Do you worry about that potential loss of 63 bytes of memory? It _is_ wasted...CThinker wrote:Well, I guess to you there is no problem then. That's OK.bob wrote:And that is a problem because... ???CThinker wrote:Yup, I am aware of this approach, but it does not appeal to me. Half of that array is unused.bob wrote:I like the approach I use, which is a pv[maxply][maxply]Zach Wegner wrote:One idea from Vincent is to make an array pv_stack[MAX_PLY * MAX_PLY / 2]; and then keep a pointer to the first and last entry into it for each ply. Whenever you get a new PV, you save it starting at the first pointer, and then save the last entry. You then pass the entry past the last move (also you need a marker to signify the end of a PV) to the next ply.CThinker wrote:The ways that other engines collect PV arrays look very hacky to me. They either allocate a square array, but use only half of it, or, allocates PV arrays on the stack on each call to a search function. Neither is acceptable to me. So, until I device a really clean solution, I am not inclined on polluting the Thinker code.
ZCT used to do this but I got rid of it because of the copying of PVs in SMP. One CPU could be searching a node with current PV of length 1, then another CPU backs up a PV of length 3. You either have to truncate it or shift the current PV forward in the array to make room for it.
At any depth, you only store into pv[ply][ply] for a terminal move. As you back up you only back up a part of this array. When you back up from 10 to 9, you copy everything from pv[10][0 to endofpv] and then save the current move in pv[9[[9] to go along with the partial PV being backed up... This is executed _very_ few times and costs nothing as a result.
the 1/2 used array makes the code clean and elegant and fast. All much more important than losing 1/2 of a 64xx64 = 4096 element array. 2048 x 4 = 8K bytes on machines that nowadays have at least 1-2 gigabytes of memory. On that scale, properly aligning the hash table is just as bad... Yet I'd hope everyone does that...
Just to illustrate that, I'll use here the constant strings as an example. Most compilers align the strings. And so, you end-up with unused bytes at the end of strings.
If you open the Fruit EXE in a binary editor, you will see unused bytes in between the strings.In constrast, if you open any Thinker EXE in a binary editor, you will not see any unused bytes.Code: Select all
btime depth infinite mate movestogo movetime nodes ponder searchmoves winc wtime fen moves name
Some compilers will take care of this. Some won't, no matter what option you give it.Code: Select all
computer easy hard level otim time post nopost st sd xboard go setboard ? force new remove undo quit
To make sure that "any compiler" will always not waste any byte, I actually deliberately code for it. Here is a snippet from the Thinker xboard shell:
As you can see, there is extra source code, but that does not affect the actual binary size. These are just pre-processor text substitutions.Code: Select all
// command strings static const char StrCmd[] = "computer" "\0" "easy" "\0" "hard" "\0" "level" "\0" "otim" "\0" "time" "\0" "post" "\0" "nopost" "\0" "st" "\0" "sd" "\0" "xboard" "\0" "go" "\0" "setboard" "\0" "?" "\0" "force" "\0" "new" "\0" "remove" "\0" "undo" "\0" "quit" "\0" ; #define StrCmd_computer ( StrCmd ) #define StrCmd_easy ( StrCmd_computer + 8 + 1 ) #define StrCmd_hard ( StrCmd_easy + 4 + 1 ) #define StrCmd_level ( StrCmd_hard + 4 + 1 ) #define StrCmd_otim ( StrCmd_level + 5 + 1 ) #define StrCmd_time ( StrCmd_otim + 4 + 1 ) #define StrCmd_post ( StrCmd_time + 4 + 1 ) #define StrCmd_nopost ( StrCmd_post + 4 + 1 ) #define StrCmd_st ( StrCmd_nopost + 6 + 1 ) #define StrCmd_sd ( StrCmd_st + 2 + 1 ) #define StrCmd_xboard ( StrCmd_sd + 2 + 1 ) #define StrCmd_go ( StrCmd_xboard + 6 + 1 ) #define StrCmd_setboard ( StrCmd_go + 2 + 1 ) #define StrCmd_movenow ( StrCmd_setboard + 8 + 1 ) #define StrCmd_force ( StrCmd_movenow + 1 + 1 ) #define StrCmd_new ( StrCmd_force + 5 + 1 ) #define StrCmd_remove ( StrCmd_new + 3 + 1 ) #define StrCmd_undo ( StrCmd_remove + 6 + 1 ) #define StrCmd_quit ( StrCmd_undo + 4 + 1 )
This also forces me not to use the strings directly. If the strings need to change (say, I need to Greek version of the xboard commands), then I only need to change in one known place.
Now back to the squre PV array..., in Thinker, the max ply is 128. The square PV array approach would translate to 64K. 32K of that will never be used. For others, its nothing. To me, that's a lot. On a Pocket PC, that may be the difference between running and running out of stack and terminating.
Some developers don't go though this type of detail, but I do. Some don't bother designing (the engine just ends up as one giant chunk of intermingled code), but I do.
But then again, that's just me. To each his own, of course.
Would not be very elegant IMHO because changing the size of the array would become problematic.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Thinker output
Blowing the stack in x86-land is going to take a _big_ set of pushes...CThinker wrote:Actually, I was not saying that the PV array will go to the stack. Instead, what I was staying is that there is now less memory that can be given to you as stack memory because that memory has been allocated somewhere else. With smaller stack, you run in to the chance of overrunning it. You go a few plies deep, and your program could just crash.michiguel wrote: Miguel
Edit: You do not need to have the array in the stack.
Memory is not free. If you use it in one place, you now have less for others. That 32K could be used for the pawn hash table, for example, which is just about right on a mobile phone.
Cheers...
-
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Re: Thinker output
Maybe not, depending on the amount of automatic storage used for each recursive invocation.bob wrote:Blowing the stack in x86-land is going to take a _big_ set of pushes...
The test string
Code: Select all
x = (((((((((((((((((((((((((((((((((((y)))))))))))))))))))))))))))))))))));
-
- Posts: 388
- Joined: Wed Mar 08, 2006 10:08 pm
Re: Thinker output
Yup sure, you can use all sorts of tricks. The point is that, most won't go that route. To most, saving a few bytes does not matter. To me, it does. And, I try to achieve that without impact on anything else.bob wrote:Based on the above code, you could always use a union to equivalence those unused array elements with various scalar and small arrays you use, so that nothing goes unused.CThinker wrote:Actually, 63 bytes is a lot in my world. To me, every byte counts, and I mean that literraly. So, if it is at all posible to avoid wasting it, I do take the extra steps, assuming that there is no additional hit on code (binary) size.bob wrote:This is similar to fretting around because you waste up to 63 bytes of memory to force your hash table to sit on a 64-byte memory alignment to improve cache efficiency. Do you worry about that potential loss of 63 bytes of memory? It _is_ wasted...CThinker wrote:Well, I guess to you there is no problem then. That's OK.bob wrote:And that is a problem because... ???CThinker wrote:Yup, I am aware of this approach, but it does not appeal to me. Half of that array is unused.bob wrote:I like the approach I use, which is a pv[maxply][maxply]Zach Wegner wrote:One idea from Vincent is to make an array pv_stack[MAX_PLY * MAX_PLY / 2]; and then keep a pointer to the first and last entry into it for each ply. Whenever you get a new PV, you save it starting at the first pointer, and then save the last entry. You then pass the entry past the last move (also you need a marker to signify the end of a PV) to the next ply.CThinker wrote:The ways that other engines collect PV arrays look very hacky to me. They either allocate a square array, but use only half of it, or, allocates PV arrays on the stack on each call to a search function. Neither is acceptable to me. So, until I device a really clean solution, I am not inclined on polluting the Thinker code.
ZCT used to do this but I got rid of it because of the copying of PVs in SMP. One CPU could be searching a node with current PV of length 1, then another CPU backs up a PV of length 3. You either have to truncate it or shift the current PV forward in the array to make room for it.
At any depth, you only store into pv[ply][ply] for a terminal move. As you back up you only back up a part of this array. When you back up from 10 to 9, you copy everything from pv[10][0 to endofpv] and then save the current move in pv[9[[9] to go along with the partial PV being backed up... This is executed _very_ few times and costs nothing as a result.
the 1/2 used array makes the code clean and elegant and fast. All much more important than losing 1/2 of a 64xx64 = 4096 element array. 2048 x 4 = 8K bytes on machines that nowadays have at least 1-2 gigabytes of memory. On that scale, properly aligning the hash table is just as bad... Yet I'd hope everyone does that...
Just to illustrate that, I'll use here the constant strings as an example. Most compilers align the strings. And so, you end-up with unused bytes at the end of strings.
If you open the Fruit EXE in a binary editor, you will see unused bytes in between the strings.In constrast, if you open any Thinker EXE in a binary editor, you will not see any unused bytes.Code: Select all
btime depth infinite mate movestogo movetime nodes ponder searchmoves winc wtime fen moves name
Some compilers will take care of this. Some won't, no matter what option you give it.Code: Select all
computer easy hard level otim time post nopost st sd xboard go setboard ? force new remove undo quit
To make sure that "any compiler" will always not waste any byte, I actually deliberately code for it. Here is a snippet from the Thinker xboard shell:
As you can see, there is extra source code, but that does not affect the actual binary size. These are just pre-processor text substitutions.Code: Select all
// command strings static const char StrCmd[] = "computer" "\0" "easy" "\0" "hard" "\0" "level" "\0" "otim" "\0" "time" "\0" "post" "\0" "nopost" "\0" "st" "\0" "sd" "\0" "xboard" "\0" "go" "\0" "setboard" "\0" "?" "\0" "force" "\0" "new" "\0" "remove" "\0" "undo" "\0" "quit" "\0" ; #define StrCmd_computer ( StrCmd ) #define StrCmd_easy ( StrCmd_computer + 8 + 1 ) #define StrCmd_hard ( StrCmd_easy + 4 + 1 ) #define StrCmd_level ( StrCmd_hard + 4 + 1 ) #define StrCmd_otim ( StrCmd_level + 5 + 1 ) #define StrCmd_time ( StrCmd_otim + 4 + 1 ) #define StrCmd_post ( StrCmd_time + 4 + 1 ) #define StrCmd_nopost ( StrCmd_post + 4 + 1 ) #define StrCmd_st ( StrCmd_nopost + 6 + 1 ) #define StrCmd_sd ( StrCmd_st + 2 + 1 ) #define StrCmd_xboard ( StrCmd_sd + 2 + 1 ) #define StrCmd_go ( StrCmd_xboard + 6 + 1 ) #define StrCmd_setboard ( StrCmd_go + 2 + 1 ) #define StrCmd_movenow ( StrCmd_setboard + 8 + 1 ) #define StrCmd_force ( StrCmd_movenow + 1 + 1 ) #define StrCmd_new ( StrCmd_force + 5 + 1 ) #define StrCmd_remove ( StrCmd_new + 3 + 1 ) #define StrCmd_undo ( StrCmd_remove + 6 + 1 ) #define StrCmd_quit ( StrCmd_undo + 4 + 1 )
This also forces me not to use the strings directly. If the strings need to change (say, I need to Greek version of the xboard commands), then I only need to change in one known place.
Now back to the squre PV array..., in Thinker, the max ply is 128. The square PV array approach would translate to 64K. 32K of that will never be used. For others, its nothing. To me, that's a lot. On a Pocket PC, that may be the difference between running and running out of stack and terminating.
Some developers don't go though this type of detail, but I do. Some don't bother designing (the engine just ends up as one giant chunk of intermingled code), but I do.
But then again, that's just me. To each his own, of course.
Would not be very elegant IMHO because changing the size of the array would become problematic.
-
- Posts: 9773
- Joined: Wed Mar 08, 2006 8:44 pm
- Location: Amman,Jordan
Re: Thinker output
Can you please go to the programmers forum and continue your discution there,I feel ashamed of my ignorance when I read your posts
Dr.D
Dr.D
_No one can hit as hard as life.But it ain’t about how hard you can hit.It’s about how hard you can get hit and keep moving forward.How much you can take and keep moving forward….
-
- Posts: 388
- Joined: Wed Mar 08, 2006 10:08 pm
Re: Thinker output
Well, x86 is no the only platform in this world. The context here is actually the Pocket PC.bob wrote:Blowing the stack in x86-land is going to take a _big_ set of pushes...CThinker wrote:Actually, I was not saying that the PV array will go to the stack. Instead, what I was staying is that there is now less memory that can be given to you as stack memory because that memory has been allocated somewhere else. With smaller stack, you run in to the chance of overrunning it. You go a few plies deep, and your program could just crash.michiguel wrote: Miguel
Edit: You do not need to have the array in the stack.
Memory is not free. If you use it in one place, you now have less for others. That 32K could be used for the pawn hash table, for example, which is just about right on a mobile phone.
Cheers...
-
- Posts: 388
- Joined: Wed Mar 08, 2006 10:08 pm
Re: Thinker output
I used to code that way even with C in environments with very limited memory (e.g., embedded systems). I allows my full control of memory usage.sje wrote:It is not necessary to code the search recursively. In fact, there are some good reasons for an iterative coding and one of them is the elimination of a stack overflow hazard.CThinker wrote:Actually, I was not saying that the PV array will go to the stack. Instead, what I was staying is that there is now less memory that can be given to you as stack memory because that memory has been allocated somewhere else. With smaller stack, you run in to the chance of overrunning it. You go a few plies deep, and your program could just crash.
Some of the early CC pioneers used iterative coding in part because of the unavailability of high level languages supporting recursion.
With this approach however, you are just managing your own stack instead of making use of the machine stack. I both cases, you need memory.
If you manage your own stack, you have to allocate that, and if there is no available memory, you would still fail.
-
- Posts: 388
- Joined: Wed Mar 08, 2006 10:08 pm
Re: Thinker output
Speed is actually not a concern here. This is for use by the protocol handler. The dictionary look-up is twice more code than a simple iteration.sje wrote:Hint: Have the strings appear in ASCII order. Then you can use bsearch() or the like for a quick look-up.CThinker wrote:Code: Select all
// command strings static const char StrCmd[] = "computer" "\0" "easy" "\0" "hard" "\0" "level" "\0" "otim" "\0" "time" "\0" "post" "\0" "nopost" "\0" "st" "\0" "sd" "\0" "xboard" "\0" "go" "\0" "setboard" "\0" "?" "\0" "force" "\0" "new" "\0" "remove" "\0" "undo" "\0" "quit" "\0" ;
Of course, both are small code. But again, I would take 50 bytes over 100 bytes any day. You save 50 here, 40 there, and before you know it, your code is half the size.
The Thinker EXE is only 55K, and that is because it is compiled for speed. When compiled for size, it comes down to 45K. When the bitboard board class is replaced with the mailbox board class, it comes down to 35K, but is now 30% slower. This is what is being ported to the DS.