Gaviota EGTBs, interface proposal for programmers

mcostalba · Post by **mcostalba** » Mon Dec 14, 2009 12:54 pm

michiguel wrote:

#if !defined&#40;H_GTB&#41;
#define H_GTB
/*>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>*/

/* this includes a boolean type bool_t */
#include "bool_t.h"

/* this includes SQUARE and SQ_CONTENT types */
#include "tbtypes.h"

Please inline content and avoid headers, I would like to see what are the types I am using here.

michiguel wrote:

Code: Select all


enum TB_return_values &#123;	tb_DRAW = 0, tb_WMATE = 1, tb_BMATE = 2, tb_FORBID = 3, tb_UNKNOWN = 7&#125;;

Why piece definitions and side to move and castling are not defined here (as enum, not as defines) ?

michiguel wrote:

Code: Select all


		/*----------------------------------*\
		|	Initialization
		|
		|	Include somthing like this at 
		|	the beginning of the program.	
		\*----------------------------------*/

		tb_init&#40;);
		tb_setcompression &#40;1&#41;;
		tb_setpath ("/home/user/tb/");
		tb_addpath ("/media/bigdisk/tb/");
		tbcache_init &#40;64*1024*1024 /* cache size 64M */, 32*1024 /* block size 32k */);
		tbcache_on&#40;);

Why 5 init routines instead of one ?

It has a sense only if you foreseen the possibility to use the routines independently, and with different number of times, for instance you call tb_init() once and then call two times tb_setpath() duirng engine life time. Is is a possible secnario ?

Otherwise I suggest only _one_ init routine, it is more correct also from a theoretical point of view.

michiguel wrote:

Code: Select all

		/*--------------------------------------*\
		|
		|	ASSIGNING POSITIONAL VALUES for 
		|	one probing example
		|
		|	Position we try to probe&#58;
		|	
		|	1r6/6k1/8/8/8/8/1P6/1KR5 w - - 0 1 
		|	
		\*--------------------------------------*/

		stm =  WHITE_TO_MOVE; 	/* 0 = white to move, 1 = black to move */
		ep_square = NOSQUARE; 	/* no ep available */
		castl = 0; 				/* No castling available */

		ws&#91;0&#93; = B1;
		ws&#91;1&#93; = C1;
		ws&#91;2&#93; = B2;
		ws&#91;3&#93; = NOSQUARE;

		wp&#91;0&#93; = KING;
		wp&#91;1&#93; = ROOK;
		wp&#91;2&#93; = PAWN;
		wp&#91;3&#93; = NOPIECE;

		bs&#91;0&#93; = G7;
		bs&#91;1&#93; = B8;
		bs&#91;2&#93; = NOSQUARE;

		bp&#91;0&#93; = KING;
		bp&#91;1&#93; = ROOK;
		bp&#91;2&#93; = NOPIECE;

I don't know egbt internals, but this piece coding multiplexing it seems very slow and redundant. As a guide line I would try to avoid as much as possible positions transcoding, so that if internally your implementation uses an hash key to look up in the egbt table I woul prefer to pass directly that key instead of coding position in an array that will be decoded in a hash key (that is the functional entity) internally.

michiguel wrote:
Code: Select all
			return EXIT_SUCCESS;

I suggest to post some beta versions and to promote to 1.0 only when API is stabilized (it will take time, be sure) because any variation of the API from that point on will require a major release change.

Dann Corbit · Post by **Dann Corbit** » Mon Dec 14, 2009 9:30 pm

zamar wrote:Interface looks really nice and easy to use!!

I don't understand too well the internals of EGTB, so is there some reason why program using TBs should specify internal memory block size?

I guess because the EGTB designer has no way to know if you are on a Windows CE handheld with 256MB RAM total or on a 64-way server with 256TB RAM.

extern bool_t tbcache_init (size_t cache_mem, size_t block_mem);

Another thing is that for Nalimov TBs UCI specifies:

* NalimovPath, type string:
this is the path on the hard disk to the Nalimov compressed format.

* NalimovCache, type spin:
this is the size in MB for the cache for the nalimov table bases

I hope that when you release your EGTBs, you could give recommendations for the used UCI option names, so that we can get some standardization in here.

BallicoraPath, type string:
this is the path on the hard disk to the Ballicora compressed format.

BallicoraCache, type spin:
this is the size in MB for the cache for the Ballicora table bases

Dann Corbit · Post by **Dann Corbit** » Mon Dec 14, 2009 9:31 pm

P.S.
BallicoraPath and BallicoraCache are also much more musical than NalimovPath and NalimovCache.

jshriver · Post by **jshriver** » Tue Dec 15, 2009 4:27 am

Wonderful work. Still working on the 5men. Guess my spare server is a little slow, it's been cranking on the 5men for over a week and half now, but looks like it's near completion.

Anxious to test your access code but no stress

-Josh

michiguel · Post by **michiguel** » Tue Dec 15, 2009 6:01 am

Michel wrote:
As long as it is thread safe, I have no problem with that.
Thread safety has nothing to do with it. An api should not contain global state because it dictates global state in all clients.

This brings an interesting point.

There is nothing really global after tb_init(). It initializes read-only tables and things that are kept "static", hidden from the rest of the program, and private for that "module". The only problem with that is that it could become thread unsafe if there is a write/read block of memory. That part is the cache, but its access protected by a mutex to make it thread safe. Of course, this may become a performance bottleneck, but without the cache there will be several threads trying to access the hard drive all over the place making things even worse.

I do not know if this answers your concerns. The only pointer I can imagine could be returned is one to the cache. This could be useful if each thread is allowed to have their own cache. I do not think that it is a good idea. There will be data duplicated in several caches, which implies that parts of the HD were accessed more than it was needed.

Miguel

michiguel · Post by **michiguel** » Tue Dec 15, 2009 6:33 am

mcostalba wrote:
michiguel wrote:
Code: Select all
#if !defined&#40;H_GTB&#41;
#define H_GTB
/*>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>*/

/* this includes a boolean type bool_t */
#include "bool_t.h"

/* this includes SQUARE and SQ_CONTENT types */
#include "tbtypes.h"
Please inline content and avoid headers, I would like to see what are the types I am using here.

Even better, I am going to make everything an int, and keep the C spirit.

michiguel wrote:
Code: Select all
enum TB_return_values &#123;	tb_DRAW = 0, tb_WMATE = 1, tb_BMATE = 2, tb_FORBID = 3, tb_UNKNOWN = 7&#125;;
Why piece definitions and side to move and castling are not defined here (as enum, not as defines) ?

No valid reason, I just copied and pasted to quickly get an example. There were some reasons why I wanted to explicitly state that some pieces are unsigned, rather than regular ints, but not here.

michiguel wrote:
Code: Select all
		/*----------------------------------*\
		|	Initialization
		|
		|	Include somthing like this at 
		|	the beginning of the program.	
		\*----------------------------------*/

		tb_init&#40;);
		tb_setcompression &#40;1&#41;;
		tb_setpath ("/home/user/tb/");
		tb_addpath ("/media/bigdisk/tb/");
		tbcache_init &#40;64*1024*1024 /* cache size 64M */, 32*1024 /* block size 32k */);
		tbcache_on&#40;);
Why 5 init routines instead of one ?

It has a sense only if you foreseen the possibility to use the routines independently, and with different number of times, for instance you call tb_init() once and then call two times tb_setpath() duirng engine life time. Is is a possible secnario ?

Yes it is a possible scenario.

I can combine tb_init and tb_compression into one, but I would not like to combine the others. tb_addpath should be able to be called many times because there is a chance that I have several paths to access files spread in several disks. Allowing to call tb_addpath many times allows the "protocol" to be flexible enough so the number of paths is limited only by memory. I can eliminate tb_setpath and use only tb_addpath (that will assume that after tb_init there is no default path available).

tb_cache_init(), tb_cache_done() should be allowed to be called at all times. When there are 32 pieces on the board, maybe I do not want to have any cache at all and use that memory to have a bigger hash table. Or maybe I want to resize the cache as the game progresses.

Otherwise I suggest only _one_ init routine, it is more correct also from a theoretical point of view.
michiguel wrote:
Code: Select all
		/*--------------------------------------*\
		|
		|	ASSIGNING POSITIONAL VALUES for 
		|	one probing example
		|
		|	Position we try to probe&#58;
		|	
		|	1r6/6k1/8/8/8/8/1P6/1KR5 w - - 0 1 
		|	
		\*--------------------------------------*/

		stm =  WHITE_TO_MOVE; 	/* 0 = white to move, 1 = black to move */
		ep_square = NOSQUARE; 	/* no ep available */
		castl = 0; 				/* No castling available */

		ws&#91;0&#93; = B1;
		ws&#91;1&#93; = C1;
		ws&#91;2&#93; = B2;
		ws&#91;3&#93; = NOSQUARE;

		wp&#91;0&#93; = KING;
		wp&#91;1&#93; = ROOK;
		wp&#91;2&#93; = PAWN;
		wp&#91;3&#93; = NOPIECE;

		bs&#91;0&#93; = G7;
		bs&#91;1&#93; = B8;
		bs&#91;2&#93; = NOSQUARE;

		bp&#91;0&#93; = KING;
		bp&#91;1&#93; = ROOK;
		bp&#91;2&#93; = NOPIECE;
I don't know egbt internals, but this piece coding multiplexing it seems very slow and redundant. As a guide line I would try to avoid as much as possible positions transcoding, so that if internally your implementation uses an hash key to look up in the egbt table I woul prefer to pass directly that key instead of coding position in an array that will be decoded in a hash key (that is the functional entity) internally.

You really do not want to go low level on this because it is messy. Each file has their own function for it. Hiding the whole process will allow this interface to support 6-men TBs without any single change in the engine's code. You probe in exactly the same way.

Besides, you will not have any performance gain because to obtain the indexes you will need to go through a transformation like this anyway. In addition, the huge bottleneck is in the number of times the HD is accessed. Listing the pieces and squares is really cheap in comparison.

michiguel wrote:
Code: Select all
			return EXIT_SUCCESS;
I suggest to post some beta versions and to promote to 1.0 only when API is stabilized (it will take time, be sure) because any variation of the API from that point on will require a major release change.

That is the plan.

Miguel

michiguel · Post by **michiguel** » Tue Dec 15, 2009 6:46 am

An improved version based on comments. I removed the building routines (not needed for engines who would only probe).

Miguel

Code: Select all

#if !defined&#40;H_GTB&#41;
#define H_GTB
#ifdef __cplusplus
extern "C" &#123;
#endif
/*>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>*/

/* needed for size_t */
#include <stdlib.h>

/*----------------------------------*\
|				ENUMS
\*----------------------------------*/

enum TB_return_values &#123;	
	tb_DRAW = 0, 
	tb_WMATE = 1, 
	tb_BMATE = 2, 
	tb_FORBID = 3, 
	tb_UNKNOWN = 7
&#125;;

enum TB_pieces 	&#123; 
	tb_NOPIECE, tb_PAWN, tb_KNIGHT, tb_BISHOP, tb_ROOK, tb_QUEEN, tb_KING 
&#125;;

enum TB_sides &#123;
	tb_WHITETOMOVE, tb_BLACKTOMOVE
&#125;;

enum TB_squares &#123;
	tb_A1, tb_B1, tb_C1, tb_D1, tb_E1, tb_F1, tb_G1, tb_H1,
	tb_A2, tb_B2, tb_C2, tb_D2, tb_E2, tb_F2, tb_G2, tb_H2,
	tb_A3, tb_B3, tb_C3, tb_D3, tb_E3, tb_F3, tb_G3, tb_H3,
	tb_A4, tb_B4, tb_C4, tb_D4, tb_E4, tb_F4, tb_G4, tb_H4,
	tb_A5, tb_B5, tb_C5, tb_D5, tb_E5, tb_F5, tb_G5, tb_H5,
	tb_A6, tb_B6, tb_C6, tb_D6, tb_E6, tb_F6, tb_G6, tb_H6,
	tb_A7, tb_B7, tb_C7, tb_D7, tb_E7, tb_F7, tb_G7, tb_H7,
	tb_A8, tb_B8, tb_C8, tb_D8, tb_E8, tb_F8, tb_G8, tb_H8,
	tb_NOSQUARE
&#125;;

enum TB_castling &#123;
	tb_NOCASTLE = 0,
	tb_WOO  = 8, 
	tb_WOOO = 4, 	
	tb_BOO  = 2, 
	tb_BOOO = 1 
&#125;;

enum TB_compression_scheme &#123;
	tb_UNCOMPRESSED, tb_CP0, tb_CP1
&#125;;


/*----------------------------------*\
|			FUNCTIONS
\*----------------------------------*/

extern void		tb_init &#40;int compression_scheme&#41;;
extern void		tb_done &#40;void&#41;;

extern int		tb_probe 
							&#40;int stm, 
							 int castles,
							 int epsq,
							 const int *inp_wSQ, 
							 const int *inp_bSQ,
							 const int *inp_wPC, 
							 const int *inp_bPC,
							 /*@out@*/ int *tbinfo, 
							 /*@out@*/ int *plies&#41;;
							
/* tb cache routines */

extern int	tbcache_init &#40;size_t cache_mem, size_t block_mem&#41;;
extern void	tbcache_done &#40;void&#41;;
extern void	tbcache_on&#40;);
extern void	tbcache_off&#40;);


/* for path */

extern int	tb_setpath &#40;const char *path&#41;;
extern int	tb_addpath &#40;const char *path&#41;;


/* testing routines */

extern int	tb_set_testlongest &#40;int men&#41;;

/*<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<*/
#ifdef __cplusplus
&#125;
#endif 
#endif

Example code for probing:

Code: Select all

#include <stdlib.h>
#include "gtb.h"

int main &#40;void&#41;
&#123;
		/*----------------------------------*\
		|	Info to provide&#58;
		|	Elements that define a position
		\*----------------------------------*/

		int	stm;			/* side to move */
		int	ep_square;		/* target square for an en passant capture */ 
		int	castl; 			/* castling availability, 0 => no castles avail */
		int	ws&#91;17&#93;, bs&#91;17&#93;; /* list of squares for white and black */
		int	wp&#91;17&#93;, bp&#91;17&#93;; /* what pieces are on those squares */

		/*----------------------------------*\
		|	Info requested
		\*----------------------------------*/

		int	info = tb_UNKNOWN; 	/* default, no tbvalue */
		int	pliestomate = -1;	/* default&#58; -1, unknown value */
		int	tb_available = 0;	/* default&#58; 0, not available */

		/*----------------------------------*\
		|	Initialization&#58;
		|	Include something like this at 
		|	the beginning of the program.	
		\*----------------------------------*/

		tb_init&#40;tb_CP1&#41;;
		tb_addpath ("/home/user/tb/");
		tb_addpath ("/media/bigdisk/tb/");
		tb_cache_init &#40;64*1024*1024 /* cache size 64M */, 32*1024 /* block size 32k */);
		tb_cache_on&#40;);
	
		/*--------------------------------------*\
		|
		|	ASSIGNING POSITIONAL VALUES for 
		|	one probing example
		|
		|	Position we try to probe&#58;
		|	
		|	1r6/6k1/8/8/8/8/1P6/1KR5 w - - 0 1 
		|	
		\*--------------------------------------*/

		stm      = tb_WHITETOMOVE; 	/* 0 = white to move, 1 = black to move */
		epsquare = tb_NOSQUARE; 	/* no ep available */
		castling = tb_NOCASTLE; 	/* no castling available */

		ws&#91;0&#93; = tb_B1;
		ws&#91;1&#93; = tb_C1;
		ws&#91;2&#93; = tb_B2;
		ws&#91;3&#93; = tb_NOSQUARE;

		wp&#91;0&#93; = tb_KING;
		wp&#91;1&#93; = tb_ROOK;
		wp&#91;2&#93; = tb_PAWN;
		wp&#91;3&#93; = tb_NOPIECE;

		bs&#91;0&#93; = tb_G7;
		bs&#91;1&#93; = tb_B8;
		bs&#91;2&#93; = tb_NOSQUARE;

		bp&#91;0&#93; = tb_KING;
		bp&#91;1&#93; = tb_ROOK;
		bp&#91;2&#93; = tb_NOPIECE;


		/*----------------------------------*\
		|
		|		PROBING TBs
		|	
		\*----------------------------------*/

		tb_available = tb_probe &#40;stm, castl, ep_square, ws, bs, wp, bp, &info, &pliestomate&#41;;

		if &#40;tb_available&#41; &#123;

			if &#40;info == tb_DRAW&#41;
				printf ("Draw\n");
			else if &#40;info == tb_WMATE && stm == tb_WHITE_TO_MOVE&#41;
 				printf ("White mates, plies=%d\n", pliestomate&#41;;
			else if &#40;info == tb_BMATE && stm == tb_BLACK_TO_MOVE&#41;
 				printf ("Black mates, plies=%d\n", pliestomate&#41;;
			else if &#40;info == tb_WMATE && stm == tb_BLACK_TO_MOVE&#41;
 				printf ("Black is mated, plies=%d\n", pliestomate&#41;;
			else if &#40;info == tb_BMATE && stm == tb_WHITE_TO_MOVE&#41;
 				printf ("White is mated, plies=%d\n", pliestomate&#41;;			
			else &#123;
 				printf ("FATAL ERROR, This should never be reached\n");	
			&#125;
		&#125; else &#123;
 				printf ("Tablebase not available\n");	
		&#125;


		/*----------------------------------*\
		|			Clean up
		\*----------------------------------*/

		tb_cache_done&#40;);
		tb_done&#40;);

		/*----------------------------------*\
		|			Return
		\*----------------------------------*/

		if &#40;tb_available&#41;
			return EXIT_SUCCESS;
		else
			return EXIT_FAILURE;
&#125;

Aaron Becker · Post by **Aaron Becker** » Tue Dec 15, 2009 7:13 am

The stuff about compression schemes is not exactly clear to me. Are CP0 and CP1 alternate ways of compressing the tablebases? Maybe a more descriptive name is in order. Even better would be the ability to determine what kind of compression is used automatically. If you're using standard compressed file formats this is pretty easy. For example, gzipped files consistently begin with the bytes "1F 8B 08", and 7z files start with "37 7A BC AF 27 1C".

michiguel · Post by **michiguel** » Tue Dec 15, 2009 7:38 am

Aaron Becker wrote:The stuff about compression schemes is not exactly clear to me. Are CP0 and CP1 alternate ways of compressing the tablebases? Maybe a more descriptive name is in order. Even better would be the ability to determine what kind of compression is used automatically. If you're using standard compressed file formats this is pretty easy. For example, gzipped files consistently begin with the bytes "1F 8B 08", and 7z files start with "37 7A BC AF 27 1C".

I could use tb_COMPRESSED and tb_UNCOMPRESSED, but I was trying to emphasize that this enum was not "boolean" in nature, but that there could be alternative schemes in the future. Realistically, I think that once we settle with one scheme it won't change so easily.

Currently, I have an uncompressed version that ends in .gtb and two alternative compressed versions that end in .gtb.cp0 and .gtb.cp1. The first one was just a pilot experiment with huffman coding (38.5 Gb --> 12 Gb) and the second was a scheme using the lzma library (38.5 --> 6.5 Gb). Realistically, there will be two options for now, the uncompressed and the lzma. I could call it tb_LZMA rather than tb_CP1. Note that even though I use the lzma library, the files are not lzma. Some chunks are lzma compressed but the file needs to have those chunks arranged in a special format.

Automatic detection (tb_AUTO?) should be possible by looking at the extension, besides internal codes.

Miguel

mcostalba · Post by **mcostalba** » Tue Dec 15, 2009 10:04 am

michiguel wrote:

Code: Select all

		int	ws&#91;17&#93;, bs&#91;17&#93;; /* list of squares for white and black */
		int	wp&#91;17&#93;, bp&#91;17&#93;; /* what pieces are on those squares */

Why not

Code: Select all

int pieceList&#91;2&#93;&#91;8&#93;&#91;16&#93;; // &#91;color&#93;&#91;pieceType&#93;&#91;index&#93;
int index&#91;64&#93;; // &#91;square&#93;

Instead of ws, bs, wp, bp arrays ?

I understand this is a request specific to satisfy how lists are defined in Stockfish, but if it doesn't hurt performance I would propose that. And I don't think it will hurt performance (at least from user point of view) because in a real engine the piece lists will be updated incrementally in do_move(), not from scratch.

Anyhow, in case of this library will be used in SF the API (and of course internals) will be changed anyway

to be adapted to SF. So if you don't see negative side effects please consider this list layout in first instance.

In case data is on disk, I agree with you has no sense talking about this, but I am not sure a 6 TB on disk will result in a stronger engine then one using, say 4 TB in RAM, especially if this engine has _already_ a good handling of endgame positions.

Gaviota EGTBs, interface proposal for programmers

Re: Gaviota EGTBs, interface proposal for programmers

Re: Gaviota EGTBs, interface proposal for programmers

Re: Gaviota EGTBs, interface proposal for programmers

Re: Gaviota EGTBs, interface proposal for programmers

Re: Gaviota EGTBs, interface proposal for programmers

Re: Gaviota EGTBs, interface proposal for programmers

Re: Gaviota EGTBs, interface proposal for programmers

Re: Gaviota EGTBs, interface proposal for programmers

Re: Gaviota EGTBs, interface proposal for programmers

Re: Gaviota EGTBs, interface proposal for programmers