Training data

maksimKorzh
Posts: 627
Joined: Sat Sep 08, 2018 3:37 pm
Location: Ukraine
Full name: Maksim Korzh

Re: Training data

Post by maksimKorzh » Tue Jan 12, 2021 11:17 pm

Desperado, here are my MSE test results for two sets of material + PST params:

My own PST: 0.13688435023553586
rofChade PST (3000+ engine): 0.13744401297556394

I calculated the MSE on 30,000 quiet positions from my gm2600.pgn.
Now, despite the fact that rofChade's error is slightly bigger, its PSTs score around 70 Elo stronger.
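
For anyone who wants to reproduce this kind of number, here is a minimal sketch of a Texel-style MSE. It assumes (the post does not say so) that the error compares each game result with a sigmoid of the static evaluation. The scaling constant `k`, the divisor 400, and the `evaluate` callback are assumptions for illustration, not the code that produced the values above.

```python
def sigmoid(score_cp, k=1.0):
    """Map a centipawn score (White's view) to an expected result in [0, 1].

    The 10^(-k*s/400) form and the default k are assumptions; k is
    normally fitted to the data set before comparing parameter sets.
    """
    return 1.0 / (1.0 + 10.0 ** (-k * score_cp / 400.0))


def mean_squared_error(samples, evaluate, k=1.0):
    """Average squared difference between game result and predicted result.

    samples  : iterable of (position, result) pairs, where result is
               1.0 / 0.5 / 0.0 for a White win / draw / loss
    evaluate : hypothetical callback returning the static evaluation of
               a position in centipawns from White's point of view
    """
    total = 0.0
    count = 0
    for position, result in samples:
        total += (result - sigmoid(evaluate(position), k)) ** 2
        count += 1
    if count == 0:
        raise ValueError("no samples given")
    return total / count
```

To compare two PST sets, the same `samples` would be passed twice with a different `evaluate` each time. The two errors are only comparable when the same `k` and the same positions are used for both.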
Wukong Xiangqi (Chinese chess engine + apps to embed into 3rd party websites):
https://github.com/maksimKorzh/wukong-xiangqi

Chess programming YouTube channel:
https://www.youtube.com/channel/UCB9-pr ... KKqDgXhsMQ

Desperado
Posts: 782
Joined: Mon Dec 15, 2008 10:45 am

Re: Training data

Post by Desperado » Wed Jan 13, 2021 7:54 am

Hello Maksim,

Is there a question involved? Sorry, I don't understand what you want to tell me.

In general, it is not unusual that a smaller error does not produce better gameplay.
There is nothing wrong with that. There can be many reasons for such an observation.

Re: Training data

Post by Desperado » Wed Jan 13, 2021 9:26 am

Here is what I am doing successfully now:

1. I generated 111,661,993 positions from the CCRL database, using games where both players are rated over 2800 Elo.
2. I shuffled the file and picked 4M positions at random.

That is what I did before too. Now the improvement...

3. I did a 3-ply search for each position.
4. I played out the PV to move 3 and created the resulting EPD entry with result and score (I just kept the result ?!)

Now validating...

5. I ran a training session for material on the new dataset. The result was stable now. No more diverging numbers.
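
Step 2 above (shuffle, then pick 4M at random) can also be done in a single pass without holding the full file in memory. The sketch below uses reservoir sampling, which is a swapped-in technique and not what the post describes. The file name in the usage note is hypothetical.

```python
import random


def reservoir_sample(lines, k, seed=None):
    """Return k items chosen uniformly at random from an iterable.

    Reads the input once and keeps only k items in memory (Algorithm R),
    so it also works on a file with more than 100M lines.
    """
    rng = random.Random(seed)
    sample = []
    for i, line in enumerate(lines):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(line)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = line
    # Shuffle so the output order carries no trace of the input order.
    rng.shuffle(sample)
    return sample
```

Usage would look like `with open("positions.epd") as f: picked = reservoir_sample(f, 4_000_000)`. If the input has fewer than `k` lines, all of them are returned.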

Perfect! This process allows me to take noisy input and create useful training data out of it.
That matters especially because the data reflects, statistically, the distribution of position types in real games.
That the data now correlates well with the engine is of course an extra bonus.

There is also something to play with: the search score of each position is output as well and could be used in the error
calculation in some way.
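
One possible way to use the score in the error calculation, purely as a sketch: blend the game result with the sigmoid of the search score and use that blend as the training target. The weight `lam`, the constant `k`, and the sigmoid form are all assumptions. The post leaves open how the score would be used.

```python
def sigmoid(score_cp, k=1.0):
    """Map a centipawn score to an expected result in [0, 1] (assumed form)."""
    return 1.0 / (1.0 + 10.0 ** (-k * score_cp / 400.0))


def blended_target(result, search_score_cp, lam=0.5, k=1.0):
    """Mix the game result with the result predicted by the search score.

    lam = 1.0 uses the game result only (result-only tuning);
    lam = 0.0 uses the search score only.
    """
    return lam * result + (1.0 - lam) * sigmoid(search_score_cp, k)


def squared_error(static_eval_cp, result, search_score_cp, lam=0.5, k=1.0):
    """Squared error of the static evaluation against the blended target."""
    target = blended_target(result, search_score_cp, lam, k)
    return (target - sigmoid(static_eval_cp, k)) ** 2
```

With `lam=1.0` this reduces to the result-only error, so the score's effect can be checked by sweeping `lam` and re-running the tuner.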

Re: Training data

Post by Desperado » Wed Jan 13, 2021 10:09 am

Something went wrong ... what I wrote in my previous post is not true. Sorry for the noise!

Re: Training data

Post by maksimKorzh » Wed Jan 13, 2021 12:31 pm

Desperado wrote:
Wed Jan 13, 2021 7:54 am
Is there a question involved? Sorry, I don't understand what you want to tell me.

Just shared my experiment.
No question here)
