Desperado wrote: ↑Fri Mar 11, 2022 6:25 pm
How was the data generated
====
1. 75000 games were played between three slightly different versions of zurichess
using the 2moves_v1.pgn opening book to ensure high play variability.
2. From each game 20 positions were sampled from the millions of positions evaluated by
the engine during the game play. This resulted in 1500k random positions which were stored in violent.epd.
3. From the set were removed all positions on which quiescence search found a winning capture.
The remaining positions were stored in quiet.epd.
4. Next, a game was played from each quiet position using Stockfish 080916.
The results were stored in quiet-labeled.epd.
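For illustration, the sampling in steps 1–2 could look roughly like the sketch below, assuming python-chess and a single PGN file of the self-play games. Note that zurichess sampled from positions evaluated during search, whereas this sketch only samples from the game line; the file names and the helper code are placeholders, not the original tooling.

[code]
import random
import chess.pgn

# Sketch of steps 1-2: sample up to 20 positions per self-play game and store them as EPD.
# "selfplay_games.pgn" and "violent.epd" are placeholder file names.
with open("selfplay_games.pgn") as pgn, open("violent.epd", "w") as out:
    while (game := chess.pgn.read_game(pgn)) is not None:
        board = game.board()
        epds = []
        for move in game.mainline_moves():
            board.push(move)
            epds.append(board.epd())
        # Pick up to 20 positions from this game, uniformly at random.
        for epd in random.sample(epds, min(20, len(epds))):
            out.write(epd + "\n")
[/code]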
1. Can someone explain the idea behind steps 1 and 2, which create violent.epd and quiet.epd?
Why these extra steps? How is this different from playing the Stockfish games right away and filtering the quiet positions out of them?
So what is the point of playing 75K games if the training signal (the game result) comes from a different engine anyway?
Quiet positions from different game phases could also be obtained by picking positions from any dataset, couldn't they?
2. And what was the time control for the Stockfish playouts?
3. quiet-labeled.epd contains 750K positions, so 750K playouts were played by Stockfish?
1. The idea is that the evaluation you are tuning should be as accurate as possible on exactly those positions that the engine you are optimizing has to evaluate statically during search. The positions an engine encounters in its static evaluation are not necessarily of the same class as normal positions in a game. For example, if you look at the leaves of quiescence search, you'll often see positions that, while no longer having winning captures, are still extremely winning for one side because of some tactic. Which kinds of positions an engine encounters at quiescence leaves (and wherever else a position needs to be statically evaluated during search) likely differ slightly from engine to engine, so it may help to sample these positions from the engine whose evaluation you are actually optimizing. Additionally, the sampled positions are representative of a typical game the engine will play, so they naturally contain the right proportion of opening/endgame positions.
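To make the "no winning capture" filter from step 3 concrete, here is a minimal sketch of that kind of check: a capture-only search with a material-only evaluation, written with python-chess. zurichess used its own quiescence search and evaluation for this, so the piece values, the margin, and the function names below are just assumptions for illustration.

[code]
import chess

# Rough piece values in centipawns; the king is never capturable in legal moves.
PIECE_VALUES = {chess.PAWN: 100, chess.KNIGHT: 300, chess.BISHOP: 300,
                chess.ROOK: 500, chess.QUEEN: 900, chess.KING: 0}

def material(board: chess.Board) -> int:
    """Material balance from the side to move's point of view."""
    score = 0
    for piece_type, value in PIECE_VALUES.items():
        score += value * (len(board.pieces(piece_type, board.turn))
                          - len(board.pieces(piece_type, not board.turn)))
    return score

def qsearch(board: chess.Board, alpha: int, beta: int) -> int:
    """Capture-only negamax with stand-pat and a material-only leaf eval."""
    stand_pat = material(board)
    if stand_pat >= beta:
        return beta
    alpha = max(alpha, stand_pat)
    for move in board.generate_legal_captures():
        board.push(move)
        score = -qsearch(board, -beta, -alpha)
        board.pop()
        if score >= beta:
            return beta
        alpha = max(alpha, score)
    return alpha

def is_quiet(board: chess.Board, margin: int = 50) -> bool:
    """Keep a position only if capture search cannot win material beyond a margin."""
    return qsearch(board, -100000, 100000) <= material(board) + margin
[/code]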
2. I don't know, but for the set I created for Nalwald (using almost the same method, if I understand correctly), the time per move during the playouts was 80 milliseconds. With Stockfish you'd obviously need less time per move to reach the same quality. (To put this into perspective: when I use my Nalwald set together with the zurichess set, the engine plays roughly 10 Elo better than when I use only the Nalwald set.)
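A single labeling playout (step 4) could be scripted roughly as below with python-chess. The engine path, the 80 ms per move, and the c9-style result annotation are assumptions for the sketch, not a description of the original scripts.

[code]
import chess
import chess.engine

def label_position(epd: str, engine_path: str = "./stockfish",
                   movetime: float = 0.08) -> str:
    """Play a game out from an EPD position and return the EPD with the result attached."""
    board, _ = chess.Board.from_epd(epd)
    with chess.engine.SimpleEngine.popen_uci(engine_path) as engine:
        while not board.is_game_over(claim_draw=True):
            result = engine.play(board, chess.engine.Limit(time=movetime))
            board.push(result.move)
        outcome = board.result(claim_draw=True)  # "1-0", "0-1" or "1/2-1/2"
    return f'{epd} c9 "{outcome}";'
[/code]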
3. I guess so. This process can take a very long time; I only did it myself because I had access to a machine with 80 threads, and it still took more than a whole day.
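If you want to spread the playouts over many cores, a rough sketch with Python's multiprocessing could look like this. It assumes the label_position function from the previous sketch lives in a (hypothetical) module playout.py; the chunk size and file names are arbitrary, and starting a fresh engine per position is simple but wasteful, so a real script would reuse one engine per worker.

[code]
import multiprocessing as mp

# Hypothetical import: label_position from the previous sketch, saved as playout.py.
from playout import label_position

if __name__ == "__main__":
    with open("quiet.epd") as f:
        epds = [line.strip() for line in f if line.strip()]
    # One worker per core; each call starts its own engine process.
    with mp.Pool(processes=mp.cpu_count()) as pool:
        labeled = pool.map(label_position, epds, chunksize=64)
    with open("quiet-labeled.epd", "w") as out:
        out.write("\n".join(labeled) + "\n")
[/code]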