Thinker output

michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: PV generation

Post by michiguel »

sje wrote:
CThinker wrote:Thanks for the code. I have already mentioned this approach. That, to me, is a lot of stack allocation. In most cases you use barely a fraction of that local PV array. And if you implemented it as a dynamic list, that would be slow.
A PV is implemented as a list and it's not slow at all due to storage recycling; total allocation costs are too low to be measured. It's fast as well; remember that only links are changed; move data is never copied.
This is what I do. I do not claim this is efficient, or anything like that. It is just the way I understand it best. At least, it shows no impact at all when I profile it.

variation_calloc() is not a real "calloc". It just returns a pointer to a position in a buffer that is not on the stack. variation_append() and variation_copy() are self-explanatory.

When I call search(), it returns a PV through a pointer argument (the PV can be empty in non-PV nodes, etc.).

Two variations are kept: "pva_curr", the one currently being examined, and "pva_best".
When the best subvariation changes, the pointers are swapped.

I do not think this is too much code to pollute the search. I admit I have to have a module, variation.c, that contains function definitions such as variation_calloc(), variation_append(), etc.; but everything is hidden there. You can even limit PVs to 20 plies or whatever, or change how this is implemented, without touching anything but "variation.c". In fact, you could even have a compiler switch to eliminate all the code in case you want to port it to a wristwatch, but that is certainly beyond my intentions :-).
I am just happy understanding what I do.

Everybody is welcome to find flaws in this. Please do. I mean it.

Lance: I respect your decision to design your engine as you like, and I do not think you are breaking any protocol or any sacred "etiquette". However, I hope you realize that this decision means missing a feature as important as time management. For someone who uses an engine for analysis, the PV is as important as time management is for people who use an engine in competition.

Miguel

Code:


typedef struct {
	uint32_t	n;                              /* number of moves in the line */
	move_t		mv [ MAX_VARIATION_MOVES ];
} VARIATION;

eval_t
search  ( POSITION * po
		, int depth_input
		, int plylevel
		, eval_t alpha
		, eval_t beta
		, VARIATION *pva                    /* <== caller asking for a PV */
		, VARIATION *pPVAR
		, bool_t *timeup
		, bool_t *out_rep_flag
		, unsigned int sflags)
{
	VARIATION *	pva_best = variation_calloc (thread_id);    /*  ALLOCATION  OF  */
	VARIATION *	pva_curr = variation_calloc (thread_id);    /* VARIATION MEMORY */

	/* typical stuff here (locals, depth from depth_input, makemove, search, unmakemove, etc.) */

		value = -search (po
						, depth-1
						, plylevel+1
						, -(alpha+1)
						, -alpha
						, pva_curr            /* <== will get a PV returned */
						, NULL
						, timeup
						, &rep_flag
						, sflags);

		/* etc (if value > best) best = value etc. etc. */

		if (best > alpha) {
			alpha = best;
			variation_swap (&pva_best, &pva_curr);   /* <== swap pointers */
		}

	/* at the end of search in PV nodes... */
	if (alpha > alpha_ori) {              /* improving move found */
		variation_append (pva_best, bestmove);
		variation_copy   (pva, pva_best);
	}

	/* before returning */
	variation_free (thread_id, pva_curr);
	variation_free (thread_id, pva_best);
	return adjust_out(best);		/* adjust to drive the checkmates */

} /* end search */
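
For readers curious what the hidden variation.c module might contain, here is a minimal sketch, assuming a small per-thread, stack-like pool of VARIATION buffers. The function names and the VARIATION/move_t types come from the post above, while MAX_THREADS, MAX_VARIATION_BUFFERS and the pool layout are assumptions for illustration, not Miguel's actual code.

Code:

/* variation.c -- a sketch only; assumes a small stack-like buffer pool per search thread */
#include <string.h>

#define MAX_THREADS            8     /* assumed */
#define MAX_VARIATION_BUFFERS  256   /* assumed: at least two buffers per ply of search depth */

static VARIATION pool [MAX_THREADS][MAX_VARIATION_BUFFERS];
static int       top  [MAX_THREADS];               /* next free buffer per thread */

VARIATION *variation_calloc (int thread_id)
{
	VARIATION *pva = &pool[thread_id][top[thread_id]++];
	pva->n = 0;                                     /* start with an empty line */
	return pva;
}

void variation_free (int thread_id, VARIATION *pva)
{
	(void) pva;
	top[thread_id]--;                               /* buffers are released in LIFO order */
}

void variation_append (VARIATION *pva, move_t mv)
{
	if (pva->n < MAX_VARIATION_MOVES)
		pva->mv[pva->n++] = mv;                     /* add one move to the line */
}

void variation_copy (VARIATION *dst, const VARIATION *src)
{
	dst->n = src->n;
	memcpy (dst->mv, src->mv, src->n * sizeof (move_t));
}

void variation_swap (VARIATION **a, VARIATION **b)
{
	VARIATION *tmp = *a;                            /* only pointers move; the  */
	*a = *b;                                        /* move data is never copied */
	*b = tmp;
}

With a stack-like pool, variation_free() must be called in reverse order of allocation, which the search() above already does (pva_curr is released before pva_best). Note also that variation_append() here adds at the end of the array, so whether the line comes out root-first or leaf-first is left to whoever prints the PV.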

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Thinker output

Post by bob »

Zach Wegner wrote:
bob wrote:I like the approach I use, which is a pv[maxply][maxply]

At any depth, you only store into pv[ply][ply] for a terminal move. As you back up you only back up a part of this array. When you back up from 10 to 9, you copy everything from pv[10][0 to endofpv] and then save the current move in pv[9][9] to go along with the partial PV being backed up... This is executed _very_ few times and costs nothing as a result.
I never understood this [ply][ply] nonsense. It was in TSCP, which was one of the first programs I ever saw. When I changed from the approach I just described to the two-dimensional array, I used [ply][0]. When you back up a PV, you store the current move in [ply][0] and copy the next ply's starting at [ply][1]. Saves an add. :)
What add?

You do a 10 ply search and at ply 10, depth=0, you evaluate and start to back up. You have no move here, just a score.

At ply=9, when you complete, you store current move in pv[8][9] and you are done.

At ply=8, you store current move in pv[7][8] and copy the rest of pv[8][n], where n = 9 only.

At ply=7 you store the current move in pv[6][7], and then copy pv[7][8 & 9] to pv[6]

...

At ply=1 you store current move in pv[0][1] and copy pv[1][2,3,4,5,6,7,8,9] to pv[0].

Very little copying, hardly ever done...
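
For readers following along, here is a minimal sketch of the pv[maxply][maxply] back-up Bob describes, with a single memcpy() per ply; MAXPLY, pv_length and the helper names are assumed for illustration and are not Crafty's actual code.

Code:

/* Sketch of the triangular [ply][ply] PV scheme; only the upper half of pv[][] is ever used. */
#include <string.h>

#define MAXPLY 64

static move_t pv        [MAXPLY][MAXPLY];   /* pv[ply] holds the best line found at 'ply' */
static int    pv_length [MAXPLY];           /* one past the last move stored in pv[ply]   */

/* On entering a node at 'ply': no line yet; this node's moves start at column 'ply'. */
static void pv_enter (int ply)
{
	pv_length[ply] = ply;
}

/* A move at 'ply' raised alpha: record it and pull the child's line up one row. */
static void pv_update (int ply, move_t move)
{
	pv[ply][ply] = move;
	memcpy (&pv[ply][ply + 1],                                    /* one memcpy, no loop */
	        &pv[ply + 1][ply + 1],
	        (pv_length[ply + 1] - (ply + 1)) * sizeof (move_t));
	pv_length[ply] = pv_length[ply + 1];
}

Zach's variant packs each line at column 0 instead: the current move goes in pv[ply][0] and the child's moves are copied starting at pv[ply][1], which saves adding 'ply' to the column index.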
CThinker
Posts: 388
Joined: Wed Mar 08, 2006 10:08 pm

Re: Thinker output

Post by CThinker »

Graham Banks wrote:
CThinker wrote: Sometimes I get the feeling that nobody cares about code elegance anymore.
Whilst I can understand where you're coming from as a programmer, program features are important to users.
Yes, I agree with you. That is why I mentioned in another post that we would have to consider scenarios such as game analysis in future major redesigns. Who knows, we might actually improve on the PV-style analysis that people see today.

Developers have limited development resources (time, for one). So you pick a set of features that you can implement, and in our case, we implement them efficiently and cleanly.

As far as simple engines go, I think Thinker is not that bad in terms of feature set. Neither TSCP nor MicroMax ponders. Neither has a run-time configurable hash table size. Many other engines that are several times more complex than Thinker don't support multiple processors (including some commercial ones).

I have a feeling that if you asked Harm to add a parallel search feature to MicroMax, he would probably decline. It's just not in line with what MicroMax is.

For now, it may be best for users to think of Thinker as just a slightly better TSCP (Thinker Simple Chess Program).
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Thinker output

Post by bob »

CThinker wrote:
bob wrote:
Zach Wegner wrote:
CThinker wrote:The ways that other engines collect PV arrays look very hacky to me. They either allocate a square array but use only half of it, or allocate PV arrays on the stack on each call to the search function. Neither is acceptable to me. So, until I devise a really clean solution, I am not inclined to pollute the Thinker code.
One idea from Vincent is to make an array pv_stack[MAX_PLY * MAX_PLY / 2]; and then keep a pointer to the first and last entry into it for each ply. Whenever you get a new PV, you save it starting at the first pointer, and then save the last entry. You then pass the entry past the last move (also you need a marker to signify the end of a PV) to the next ply.

ZCT used to do this but I got rid of it because of the copying of PVs in SMP. One CPU could be searching a node with current PV of length 1, then another CPU backs up a PV of length 3. You either have to truncate it or shift the current PV forward in the array to make room for it.
I like the approach I use, which is a pv[maxply][maxply]

At any depth, you only store into pv[ply][ply] for a terminal move. As you back up you only back up a part of this array. When you back up from 10 to 9, you copy everything from pv[10][0 to endofpv] and then save the current move in pv[9][9] to go along with the partial PV being backed up... This is executed _very_ few times and costs nothing as a result.
Yup, I am aware of this approach, but it does not appeal to me. Half of that array is unused.
And that is a problem because... ???
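
For reference, here is one way to read the pv_stack idea Zach quotes above: the used half of the square array packed into a single buffer, with a fixed starting offset and a moving end index per ply. The helper names, the fixed per-ply slices, and pv_first[]/pv_last[] standing in for the end-of-line marker are all assumptions for illustration, not Vincent's or Zach's actual code.

Code:

/* Sketch of a packed single-buffer PV: one shrinking slice of pv_stack per ply. */
#include <string.h>

#define MAX_PLY 64

static move_t pv_stack [MAX_PLY * (MAX_PLY + 1) / 2];
static int    pv_first [MAX_PLY];   /* fixed start of this ply's slice in pv_stack */
static int    pv_last  [MAX_PLY];   /* one past the last move of this ply's line   */

/* Compute the per-ply offsets once at startup: a line at 'ply' holds at most MAX_PLY - ply moves. */
static void pv_stack_init (void)
{
	int ply, offset = 0;
	for (ply = 0; ply < MAX_PLY; ply++) {
		pv_first[ply] = pv_last[ply] = offset;
		offset += MAX_PLY - ply;
	}
}

/* On entering a node at 'ply': clear its line. */
static void pv_enter (int ply)
{
	pv_last[ply] = pv_first[ply];
}

/* A move at 'ply' raised alpha: new line = this move followed by the child's line. */
static void pv_update (int ply, move_t move)
{
	int n = pv_last[ply + 1] - pv_first[ply + 1];          /* length of the child's line */
	pv_stack[pv_first[ply]] = move;
	memcpy (&pv_stack[pv_first[ply] + 1],                  /* slices never overlap here  */
	        &pv_stack[pv_first[ply + 1]],
	        n * sizeof (move_t));
	pv_last[ply] = pv_first[ply] + 1 + n;
}

This stores the same lines in roughly half the memory of the square pv[maxply][maxply] array, which is the objection under discussion. Zach's description packs even tighter by starting each child's line just past the parent's current last move, which is what creates the shifting problem he mentions for SMP; the fixed slices above trade a little space to avoid that.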
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Thinker output

Post by bob »

CThinker wrote:
sje wrote:
CThinker wrote:
bob wrote:I like the approach I use, which is a pv[maxply][maxply]

At any depth, you only store into pv[ply][ply] for a terminal move. As you back up you only back up a part of this array. When you back up from 10 to 9, you copy everything from pv[10][0 to endofpv] and then save the current move in pv[9][9] to go along with the partial PV being backed up... This is executed _very_ few times and costs nothing as a result.
Yup, I am aware of this approach, but it does not appeal to me. Half of that array is unused.
Let's see...

maxply = 64; maxply * maxply = 4,096; move size = 4 bytes; total storage = 16 KB; wasted = 8 KB

RAM price per high quality, fast GB = US$32

8 KB / 1 GB = (ca) 7.6e-6

Cost of wasted RAM: about 1/40 US penny.
----

The [maxply][maxply] technique is not bad, and I think that many engines use it. I don't because I don't want any fixed maxply numbers in my program.
To me, every line of code counts. Every run-time memory byte counts.

When you move onto other platforms where memory is at a premium (not the memory cost, but run-time usage), you can see why the Thinker implementation pays off. Check out this comparison of Pocket PC engines by Alexandr. You will be surprised.
http://raitpref.hotbox.ru/ct001-r.htm

This explains why I'm not worried that, as Bob said, the Nintendo DS has a slow processor and little memory. The Thinker port to that platform will do just fine (I think).

Sometimes I get the feeling that nobody cares about code elegance anymore.
I care about elegance _and_ speed. The 2D array is extremely fast. I use a simple memcpy() to back up the PV from one ply to the previous one... not even a loop.
CThinker
Posts: 388
Joined: Wed Mar 08, 2006 10:08 pm

Re: Thinker output

Post by CThinker »

bob wrote:
CThinker wrote:
bob wrote:
Zach Wegner wrote:
CThinker wrote:The ways that other engines collect PV arrays look very hacky to me. They either allocate a square array but use only half of it, or allocate PV arrays on the stack on each call to the search function. Neither is acceptable to me. So, until I devise a really clean solution, I am not inclined to pollute the Thinker code.
One idea from Vincent is to make an array pv_stack[MAX_PLY * MAX_PLY / 2]; and then keep a pointer to the first and last entry into it for each ply. Whenever you get a new PV, you save it starting at the first pointer, and then save the last entry. You then pass the entry past the last move (also you need a marker to signify the end of a PV) to the next ply.

ZCT used to do this but I got rid of it because of the copying of PVs in SMP. One CPU could be searching a node with current PV of length 1, then another CPU backs up a PV of length 3. You either have to truncate it or shift the current PV forward in the array to make room for it.
I like the approach I use, which is a pv[maxply][maxply]

At any depth, you only store into pv[ply][ply] for a terminal move. As you back up you only back up a part of this array. When you back up from 10 to 9, you copy everything from pv[10][0 to endofpv] and then save the current move in pv[9][9] to go along with the partial PV being backed up... This is executed _very_ few times and costs nothing as a result.
Yup, I am aware of this approach, but it does not appeal to me. Half of that array is unused.
And that is a problem because... ???
Well, I guess to you there is no problem then. That's OK.
Tord Romstad
Posts: 1808
Joined: Wed Mar 08, 2006 9:19 pm
Location: Oslo, Norway

Re: Thinker output

Post by Tord Romstad »

bob wrote:
glorfindel wrote:So how fast does crafty run on a DS? Do you have an estimation of playing strength perhaps?
It's been a while; I believe it was in the 30K nodes per second range. I was asked to do this for a game company a couple of years ago. I had to get rid of rotated bitboards due to space and did a direct calculation to generate moves, which is about 10% slower than rotated/magic. I also had to remove lots of "extras" and use a _really_ small hash table.
30 kN/s? That's incredibly fast! The Nintendo DS, according to Wikipedia, has a 67 MHz ARM CPU. Glaurung, when running on the much faster 412 MHz ARM CPU of the Apple iPhone, searches about 12-15 kN/s. Of course, Crafty is a lot faster than Glaurung on all platforms, but not that much faster.

Do you remember what you used instead of rotated bitboards?

Tord
Spock

Re: Thinker output

Post by Spock »

CThinker wrote: Sometimes I get the feeling that nobody cares about code elegance anymore.
It's your engine, Lance, and your code; you must do what you feel is right for you. And furthermore, you share the engine with us at no cost. So, from my point of view, don't feel under any pressure over this issue.
Tord Romstad
Posts: 1808
Joined: Wed Mar 08, 2006 9:19 pm
Location: Oslo, Norway

Re: Thinker output

Post by Tord Romstad »

CThinker wrote:Sometimes I get the feeling that nobody cares about code elegance anymore.
My impression is that many still do, but they very rarely program in C or C++. Both languages have very little appeal for those who care about elegance.

Personally, I do care about code elegance, although you wouldn't believe it from looking at the Glaurung source code. I just lack the skills to write elegant C++, and moreover I care about features and am always impatient to implement them quickly, so my code ends up as a horrible mess.

It is interesting that your program is so different from mine: I think your goals are similar to mine in many ways, and yet we still end up doing almost everything differently. :)

Tord
CThinker
Posts: 388
Joined: Wed Mar 08, 2006 10:08 pm

Re: Thinker output

Post by CThinker »

Tord Romstad wrote:
bob wrote:
glorfindel wrote:So how fast does crafty run on a DS? Do you have an estimation of playing strength perhaps?
It's been a while; I believe it was in the 30K nodes per second range. I was asked to do this for a game company a couple of years ago. I had to get rid of rotated bitboards due to space and did a direct calculation to generate moves, which is about 10% slower than rotated/magic. I also had to remove lots of "extras" and use a _really_ small hash table.
30 kN/s? That's incredibly fast! The Nintendo DS, according to Wikipedia, has a 67 MHz ARM CPU. Glaurung, when running on the much faster 412 MHz ARM CPU of the Apple iPhone, searches about 12-15 kN/s. Of course, Crafty is a lot faster than Glaurung on all platforms, but not that much faster.

Do you remember what you used instead of rotated bitboards?

Tord
I think there is one extra zero there. My estimate is that it is something like 3 knps.

On a 625 MHz ARM (iPaq), Crafty searches at around 20 knps. That machine is about 7 or 8 times faster than a DS.