Speeding Up The Tuner

D Sceviour
Posts: 570
Joined: Mon Jul 20, 2015 5:06 pm

Re: Speeding Up The Tuner

Post by D Sceviour »

Alayan wrote: Sun Sep 06, 2020 8:43 pm With 5x64x64 values, you get over 20K parameters. Some of these combinations are very frequent, but many others will occur in fewer than 1 in a million positions. This makes overfitting a massive issue. You need not just quality but also quantity in the dataset for it to work out: tens of millions of positions is a lower bound, and hundreds of millions may be where diminishing returns on dataset size kick in.
This is true even for 6x64 parameters. There are never enough positions to cover all possible combinations. I have never been a big fan of statistical tuning, and I prefer to use experience and intuitive judgement. For 245,760 parameters I will probably add a curve smoothing technique.
Alayan wrote: So, you can't use a dataset generation method that requires a lot of hardware time; it's infeasible. You need fast dataset generation, even at the cost of accuracy.
What do you mean?
Alayan
Posts: 550
Joined: Tue Nov 19, 2019 8:48 pm
Full name: Alayan Feh

Re: Speeding Up The Tuner

Post by Alayan »

Overfitting can be an issue for 6x64 tables too, I agree. But it's easier to overcome.

And I meant that, to tune parameters that have a very low activation rate, it's more important to have a high number of dataset positions to limit overfitting than to have accurately evaluated positions. If a parameter is activated only once, in a position that was winning, correctly rating the position as winning instead of drawing or losing won't help much: the fact that the parameter is activated and the fact that the position is winning most likely have only a weak relationship. So trying to reduce the error of the engine eval against the dataset result by massively increasing the value of that parameter isn't going to do any good.
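One way to see how many parameters fall into that low-activation group is simply to count activations over the dataset. The sketch below is only an illustration: the `sample_t` layout (each position listing the indices of the parameters it activates) and the `min_count` threshold are assumptions, not something taken from any engine discussed here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: each training position lists the indices of the
 * evaluation parameters it activates. */
typedef struct {
    const int *active;    /* parameter indices active in this position */
    size_t     n_active;  /* number of entries in active[] */
} sample_t;

/* Count how many dataset positions activate each parameter. */
void count_activations(const sample_t *data, size_t n_samples,
                       unsigned *counts, size_t n_params)
{
    for (size_t p = 0; p < n_params; p++)
        counts[p] = 0;

    for (size_t i = 0; i < n_samples; i++) {
        for (size_t k = 0; k < data[i].n_active; k++) {
            int idx = data[i].active[k];
            assert(idx >= 0 && (size_t)idx < n_params);
            counts[idx]++;
        }
    }
}

/* Number of parameters seen in fewer than min_count positions. These are
 * the ones most exposed to overfitting. */
size_t count_rare(const unsigned *counts, size_t n_params, unsigned min_count)
{
    size_t rare = 0;
    for (size_t p = 0; p < n_params; p++)
        if (counts[p] < min_count)
            rare++;
    return rare;
}
```

What to do with the rare parameters (leave them at their old value, smooth them against neighbours, or collect more positions) is a separate design choice that the counts alone do not settle.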
Kieren Pearson
Posts: 70
Joined: Tue Dec 31, 2019 2:52 am
Full name: Kieren Pearson

Re: Speeding Up The Tuner

Post by Kieren Pearson »

chrisw wrote: Sun Sep 06, 2020 4:06 pm
Kieren Pearson wrote: Sun Sep 06, 2020 1:39 pm
chrisw wrote: Sun Sep 06, 2020 1:21 pm
D Sceviour wrote: Sun Sep 06, 2020 2:56 am I noticed that 99% of the calculation time for stochastic gradient tuning was spent parsing fen strings, and building positional structures. Only 1% of the time was spent on evaluation. To speed up the tuner I saved a binary file of positional structures to disk. Then, the binary was recalled for tuning. Instead of 5 hours spent tuning a variable, it now takes less than one second.

Most engines have a unique position structure, and a "position_t" is often all that is necessary to evaluate a position. It takes several hours to create the binary file, but it only needs to be done once. Instead of:

if (fgets(line, 256, fin) == NULL)

which programs like Ethereal use to read an epd file, the tuner now reads the binary file with:

while (fread(pos, sizeof(position_t), 1 , f)) {

The binary file could be re-opened into another array for tuning, but disk read is so fast that it is not necessary. Besides, if threading is used then the disk binary file can be used to save memory. The "quiet-labeled.epd" was an old common file used for tuning. It creates a binary of 113,282 KB for the Schooner position_t structure.

If there are newer and better tuning epd files, a link from someone would be appreciated.
+1

If there's a public repository to keep them in, I can contribute
42M evaluated EPDs from Ed Schroeder's Human database (distilled from Lichess, I think)
8.6M evaluated EPDs distilled from CCRL 40-40
13M evaluated EPDs distilled from CCRL-blitz

evals are at 25 ms, SF11, one thread

The game result is not normally saved in a position structure, so an unused variable is used to store it. Here is sample C code showing how to convert an epd file and build the binary file:

Code:

void CreateBinaryFile() {
    FILE *f;
    FILE *f2;
    int n = 0;        // lines read from the epd file
    int writes = 0;   // records actually written to the binary file
    double result;
    position_t *pos = rsd->pos;

    strcpy(Tfilename, "quiet-labeled.epd");
    strcpy(Wfilename, "tune_array.bin");

    // the epd file is only read, so "r" is enough
    f = fopen(Tfilename, "r");
    if (f == NULL) {
        printf("file not opened\n");
        return;
    }

    f2 = fopen(Wfilename, "wb");
    if (f2 == NULL) {
        printf("file f2 not opened\n");
        fclose(f);    // do not leak the input file handle
        return;
    }

    while (fgets(fen, 128, f)) {
        n += 1;

        // the parsing method here was taken from Crafty
        nargs = ReadParse(fen, " ;=");

        // this loads the fen and builds the position structure in my program
        Command(rsd);

        if (!strcmp(args[5], "\"0-1\"")) {
            result = 0.0;
        } else if (!strcmp(args[5], "\"1-0\"")) {
            result = 1.0;
        } else if (!strcmp(args[5], "\"1/2-1/2\"")) {
            result = 0.5;
        } else {
            printf("%d bad result\n", n);
            printf("args[5] %s\n", args[5]);
            printf("%s\n", fen);
            break;
        }

        // save the game result in an unused variable
        // (0 = black win, 1 = draw, 2 = white win)
        pos->static_score = (uint16_t) (result * 2);

        // and save the converted fen to disk with a sequential write
        writes += (int) fwrite(pos, sizeof(position_t), 1, f2);
    }

    fclose(f);
    fclose(f2);
    printf("%d writes\n", writes);
}
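For readers without the engine source, the write-once, stream-many-times idea can be tried in isolation. The following is a minimal sketch, not the Schooner code: `demo_position_t`, `write_positions` and `sum_results` are hypothetical names, and the two-field structure only stands in for a real `position_t`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for an engine's position_t; real structures differ per engine. */
typedef struct {
    uint64_t occupancy;     /* hypothetical bitboard field */
    uint16_t static_score;  /* reused for the result: 0, 1 or 2 */
} demo_position_t;

/* Dump n records to a binary file; returns the number actually written. */
size_t write_positions(const char *path, const demo_position_t *pos, size_t n)
{
    assert(path != NULL);
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return 0;
    size_t written = fwrite(pos, sizeof *pos, n, f);
    fclose(f);
    return written;
}

/* Stream the records back one at a time, as the tuner loop does, and
 * return the sum of game results scaled back to 0.0 / 0.5 / 1.0. */
double sum_results(const char *path, size_t *n_read)
{
    assert(path != NULL && n_read != NULL);
    demo_position_t pos;
    double total = 0.0;

    *n_read = 0;
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return 0.0;

    while (fread(&pos, sizeof pos, 1, f) == 1) {
        total += pos.static_score / 2.0;
        (*n_read)++;
    }
    fclose(f);
    return total;
}
```

A raw structure dump like this depends on the compiler's padding and the machine's byte order, so the binary file should be treated as a local cache and regenerated whenever the structure changes.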
I'm very interested in using these datasets. Currently I am using the 725K-position quiet-labeled epd dataset from Zurichess. This has worked well, but I am looking for a similar, larger dataset if possible.
Well, if we can find a public repository for swapping/exchanging/downloading, you’re welcome to them.
I made a public repo: https://github.com/KierenP/ChessTrainingSets. If possible, make a PR and I'll accept it, or if you'd rather, PM the datasets to me and I'll put them up.
chrisw
Posts: 4317
Joined: Tue Apr 03, 2012 4:28 pm

Re: Speeding Up The Tuner

Post by chrisw »

Kieren Pearson wrote: Mon Sep 07, 2020 8:29 am
I made a public repo: https://github.com/KierenP/ChessTrainingSets. If possible, make a PR and I'll accept it, or if you'd rather, PM the datasets to me and I'll put them up.
Oh cool, well done. I tried with GitHub some time ago but quickly overflowed their capacity limit. Do you have an upgraded account with them where capacity is no problem? Mine amount to a few GB, even compressed, btw.
AndrewGrant
Posts: 1754
Joined: Tue Apr 19, 2016 6:08 am
Location: U.S.A
Full name: Andrew Grant

Re: Speeding Up The Tuner

Post by AndrewGrant »

Joost Buijs wrote: Sun Sep 06, 2020 4:43 pm PST[piece][sq][enemy king] looks interesting, it is a bit like the NNUE approach, I suppose it will take a long time to tune.
So ... shamefully I tried tuning a table of [2][64][64], which is [0][ourKingSquare][ourPawnSquare], [1][theirKingSquare][ourPawnSquare]. And ... it passed STC testing rather quickly at +5.09 Elo. Which is ... (hopefully?) probably a lucky result. But the tune did massively reduce the error on the dataset. Of course you can condense this down to [2][64][48], as a side note ... since maybe adding +1500 lines to Ethereal is better than adding +2000.

I'm trying some simplified forms of this [64][64] idea before testing the original one at LTC. First up is [files between][ranks between].
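To make the table shapes above concrete, here is a hedged sketch of how such indices could be computed. It is not Ethereal's code: the a1 = 0 square mapping and the function names are assumptions, and "[files between][ranks between]" is interpreted here as absolute file and rank distance.

```c
#include <assert.h>
#include <stdlib.h>

/* Squares are 0..63 with a1 = 0 and h8 = 63 (an assumed mapping). */
static inline int file_of(int sq) { return sq & 7; }
static inline int rank_of(int sq) { return sq >> 3; }

/* Pawns never stand on the first or last rank, so squares 8..55
 * can be mapped onto 0..47. */
int pawn_index48(int pawn_sq)
{
    assert(pawn_sq >= 8 && pawn_sq < 56);
    return pawn_sq - 8;
}

/* Flat index into a [2][64][48] table.
 * side 0 = our king, side 1 = their king. */
int king_pawn_index(int side, int king_sq, int pawn_sq)
{
    assert(side == 0 || side == 1);
    assert(king_sq >= 0 && king_sq < 64);
    return (side * 64 + king_sq) * 48 + pawn_index48(pawn_sq);
}

/* Simplified form: flat index into an [8][8] table keyed only by the
 * file distance and rank distance between king and pawn. */
int distance_index(int king_sq, int pawn_sq)
{
    int df = abs(file_of(king_sq) - file_of(pawn_sq));
    int dr = abs(rank_of(king_sq) - rank_of(pawn_sq));
    return df * 8 + dr;
}
```

The [2][64][48] layout has 6,144 entries against 8,192 for [2][64][64], while the distance form collapses everything to 64 entries per king, which trades resolution for far more training positions per parameter.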
Kieren Pearson
Posts: 70
Joined: Tue Dec 31, 2019 2:52 am
Full name: Kieren Pearson

Re: Speeding Up The Tuner

Post by Kieren Pearson »

chrisw wrote: Mon Sep 07, 2020 8:44 am
Oh cool, well done. I tried with GitHub some time ago but quickly overflowed their capacity limit. Do you have an upgraded account with them where capacity is no problem? Mine amount to a few GB, even compressed, btw.
No, I don't have an upgraded account.
chrisw
Posts: 4317
Joined: Tue Apr 03, 2012 4:28 pm

Re: Speeding Up The Tuner

Post by chrisw »

Kieren Pearson wrote: Mon Sep 07, 2020 4:25 pm
No, I don't have an upgraded account.
Presumably the same as mine in that case: a 500 MB limit. I think Ed Schroeder will be able to host them on his site.
hgm
Posts: 27796
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Speeding Up The Tuner

Post by hgm »

If reading a FEN takes more time than evaluating the position, there is something badly wrong with your FEN reader... Reading a FEN should not take measurably more time than reading a binary representation of the position.
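As a rough illustration of why FEN reading is cheap, the piece-placement field can be decoded in a single pass over a few dozen characters. This is only a sketch under assumptions: `parse_fen_board` is a hypothetical helper, the board is a plain 64-character array with a1 at index 0, and the side-to-move, castling and en passant fields are ignored.

```c
#include <assert.h>
#include <string.h>

/* Fill board[64] (a1 = index 0) with piece letters, or '.' for empty
 * squares, from the piece-placement field of a FEN string.
 * Returns 1 on success and 0 on malformed input. */
int parse_fen_board(const char *fen, char board[64])
{
    assert(fen != NULL && board != NULL);

    int rank = 7, file = 0;   /* FEN starts at rank 8, file a */
    memset(board, '.', 64);

    for (const char *p = fen; *p != '\0' && *p != ' '; p++) {
        if (*p == '/') {
            /* a rank must be complete before moving to the next one */
            if (file != 8 || rank == 0)
                return 0;
            rank--;
            file = 0;
        } else if (*p >= '1' && *p <= '8') {
            file += *p - '0';          /* run of empty squares */
            if (file > 8)
                return 0;
        } else if (strchr("PNBRQKpnbrqk", *p) != NULL) {
            if (file >= 8)
                return 0;
            board[rank * 8 + file] = *p;
            file++;
        } else {
            return 0;                  /* unknown character */
        }
    }
    return rank == 0 && file == 8;
}
```

The work is proportional to the length of the string, so any large gap between FEN setup and evaluation time is more likely to come from whatever else the setup routine does.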
D Sceviour
Posts: 570
Joined: Mon Jul 20, 2015 5:06 pm

Re: Speeding Up The Tuner

Post by D Sceviour »

hgm wrote: Wed Sep 09, 2020 9:39 am If reading a FEN takes more time than evaluating the position, there is something badly wrong with your FEN reader... Reading a FEN should not take measurably more time than reading a binary representation of the position.
On the other hand, maybe my evaluator() is very fast. :D

One problem discovered was that the hash tables are cleared with memset() on every FEN read. This takes considerable time, although it is perhaps not really necessary. To fix this, a new setboard() routine would have to be written just for the tuner. Pre-processing the FENs has cured all this instead, and sped up the tuner.
hgm
Posts: 27796
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Speeding Up The Tuner

Post by hgm »

Even if your evaluator were piece-square only, it couldn't be much faster than reading a FEN. But if you clear a few GB of memory that was not used in the first place... That is just wrong design. There is no reason at all to clear the hash table on setboard. In fact there is never any need to clear the hash if you do not switch variants and calculate the key from scratch on setboard (i.e. use an absolute key). And even if you used a relative key (i.e. start with a random value unrelated to the true XOR sum of Zobrist keys, but update it incrementally in the normal way), it would in fact be self-invalidating. If you do want to clear it, though, the 'new' command would be the recommended place; GUIs should always send one before a setboard.
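The "absolute key" idea can be sketched as follows. This is an illustration under assumptions, not any particular engine's code: the piece encoding (0..11, with -1 for empty), the splitmix64-style generator and the function names are all hypothetical, and side-to-move, castling and en passant terms are left out for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* One random 64-bit number per (piece, square). */
static uint64_t zobrist[12][64];

/* Small splitmix64-style generator, used here only to fill the table. */
static uint64_t next_random(uint64_t *state)
{
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

void init_zobrist(uint64_t seed)
{
    for (int piece = 0; piece < 12; piece++)
        for (int sq = 0; sq < 64; sq++)
            zobrist[piece][sq] = next_random(&seed);
}

/* Absolute key: XOR of the keys of every piece on the board, computed
 * from scratch. board[sq] holds a piece code 0..11, or -1 if empty. */
uint64_t key_from_scratch(const int8_t board[64])
{
    uint64_t key = 0;
    for (int sq = 0; sq < 64; sq++) {
        if (board[sq] >= 0) {
            assert(board[sq] < 12);
            key ^= zobrist[board[sq]][sq];
        }
    }
    return key;
}
```

Because a setboard routine built this way derives the key from the position alone, a hash entry left over from an earlier position can only match when its stored key matches, so the table does not need a memset() between positions.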