jstanback wrote: ↑Fri Jan 01, 2021 7:37 pm
Tuning is now done in a similar fashion to back-propagation for neural networks rather than the gradient-descent method...
jstanback wrote: ↑Fri Jan 01, 2021 7:37 pm
...each pertinent term is tweaked by a small amount in the direction to reduce error.
This is gradient descent. Neural networks also use gradient descent; backpropagation just propagates the error back through the hidden layer(s).
When I did gradient-descent, for each position I tweaked every parameter a bit and ran the evaluation to compute the gradient gParam = dEval/dParam. Then I adjusted each parameter by lr*gParam where lr is the learn rate. For 1000 parameters this required doing 1000 evaluations for each training position. I know there are more efficient methods, but that's what I was doing.
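John's earlier finite-difference method can be sketched roughly like this (a minimal illustration, not Wasp's code; `evaluate` is a stand-in linear eval, and the error factor in the update step is an assumption filled in from the chain rule):

```python
# Sketch of the old finite-difference tuning: one extra evaluation per
# parameter per position (1000 parameters -> 1000 evals, as described above).
LR = 1e-3   # learning rate
EPS = 1.0   # tweak size, in centipawns

def evaluate(position, params):
    # position maps each feature to its (white - black) occurrence count;
    # a hand-crafted eval is just a weighted sum of those counts here.
    return sum(params[p] * position[p] for p in params)

def finite_difference_step(position, params, target):
    error = evaluate(position, params) - target
    grads = {}
    for p in params:
        tweaked = dict(params)
        tweaked[p] += EPS   # tweak one parameter a bit...
        # ...and re-run the eval to estimate gParam = dEval/dParam
        grads[p] = (evaluate(position, tweaked) - evaluate(position, params)) / EPS
    for p in params:
        params[p] -= LR * error * grads[p]   # step down the error gradient
    return params
```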
The method I'm using now just does one evaluation for each training position but keeps a count for each parameter that gets used when evaluating the position. Since white and black use the same eval terms the count is incremented for white and decremented for black. For example, if a position has 8 white pawns and 6 black pawns then the count for pawn_material would be +2. If the game result was 1.0 (win for white) and the eval was 0 centipawns (which gets converted to 0.5 expected win-fraction) then the error is 0.5-1.0 = -0.5 and I increase pawn_material by 2*lr*0.5. I'm using a learn rate of about 1e-3, so in this case pawn_material would be increased by only 1e-3 centipawns for this single position. But at a rate of 250K positions per second the eval terms converge quite quickly.
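The count-based update described above can be sketched as follows (a minimal sketch, not Wasp's actual code; the sigmoid form and its constant `K` are assumptions, since the post only says the centipawn eval "gets converted to" an expected win fraction):

```python
import math

LR = 1e-3   # learning rate from the post
K = 1.0     # assumed scaling constant for the centipawn-to-win-fraction map

def win_fraction(eval_cp):
    # Convert a centipawn eval to an expected win fraction in [0, 1];
    # 0 cp maps to 0.5, as in the example above.
    return 1.0 / (1.0 + math.exp(-K * eval_cp / 400.0))

def tune_position(weights, counts, result):
    """One update; counts[p] = (white uses - black uses) of parameter p."""
    eval_cp = sum(weights[p] * counts[p] for p in counts)  # one linear eval
    error = win_fraction(eval_cp) - result                 # e.g. 0.5 - 1.0
    for p, count in counts.items():
        weights[p] -= LR * count * error                   # count-weighted step
    return weights

# Worked example from the post: 8 white vs 6 black pawns (count +2), white
# wins, eval 0 cp -> expected 0.5, error -0.5, so pawn_material rises by
# 2 * 1e-3 * 0.5 = 1e-3 centipawns.
w = tune_position({"pawn_material": 0.0}, {"pawn_material": 2}, 1.0)
```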
Thank you, John! Nice news. What is the "proper" version number: 4.5 or 4.50 ?
Well, I don't know that there is a "proper" version number. But maybe, to match the names of the binaries, it would be best to use Wasp 4.50. I probably should have done that in my posts.
jstanback wrote: ↑Fri Jan 01, 2021 7:37 pm
Wasp 4.5 is released today. I estimate it to be about 50 Elo stronger than Wasp 4.0.
Many thanks, John. How do you use the config file? I used example_config with UCI_Elo = 1200 and set ConfigFilePath=./example_config in the Shredder 12 GUI. But it seems to search much more deeply than I'd expect for 1200 Elo, reaching a depth of around 8 ply.
There is a lower limit of 1500 for UCI_Elo. Since Wasp mainly reduces Elo by lowering nodes/sec, it didn't seem reasonable to drop below about 50 nodes/sec. For ConfigFilePath, Wasp doesn't add a ".txt" extension, so you need to put ConfigFilePath=./example_config.txt in the Shredder GUI.
I did the same, but this is gradient descent: it uses the derivative of the error for the weight-update direction and just assigns a coefficient to each weight, e.g. when the coefficient is zero, the weight doesn't change. I don't think it has a different name. Backpropagation is another thing; there may be a specialized term for this, but it isn't backpropagation. I think it's still called gradient descent.
Apart from that, the name is irrelevant; the point is that you understand what you're doing.
-Tamás
Hi Tamas,
Yes, it may amount to exactly the same thing, but it somehow seems different to me. I adopted this technique after experimenting with a tiny NN and learning to back-propagate. I realized that I could update the weights for an HCE exactly as is done for a single node of a NN, except that I could eliminate using the derivative of the activation function since the HCE is just a linear sum of weights. Anyway, compared to my previous approach it sped up the tuning by a factor of 1000 and also made the tuning function much simpler.

I have a method for using non-integer "counts" for some eval terms. For example, I have a single value for king safety and calculate a floating-point scaling factor during the eval based on the number of enemy threats. The eval gets updated by scale*king_safety and the "count" for king_safety gets incremented by scale. It might actually be better to have separate terms for every possible number of enemy threats and let the training come up with the weight for each term. I did this initially to derive an appropriate scaling function, but I kind of like having a smoothly scaled feature.
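The fractional-"count" idea for a smoothly scaled term like king_safety can be sketched as below (illustrative only; this particular scaling function is an assumption, not Wasp's actual formula):

```python
def king_safety_term(weight, n_threats):
    # Assumed smooth ramp in [0, 1] based on the number of enemy threats.
    scale = min(n_threats, 4) / 4.0
    contribution = scale * weight   # what gets added to the eval
    count = scale                   # fractional "count" used by the tuner
    return contribution, count

# With a king_safety weight of 8.0 cp and 2 enemy threats, the eval gains
# 4.0 cp and the tuner later adjusts king_safety by lr * 0.5 * error.
```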
Alayan wrote: ↑Sun Jan 03, 2021 8:23 pm
How do you think this automated tuning affected Wasp's offensive middlegame style ?
I think it's about the same or maybe a little more aggressive than the previous version. I guess we'll find out from Frank's upcoming tournament. I should probably start keeping some statistics from my gauntlet testing, such as wins/losses/draws before and after move 50, similar to what Frank does.
I added some examples in my "Engine Configuration". The entry for Wasp 4.5 will follow after the release.
One of the most important points for me:
Please keep in mind that Elo strength starts at 1500 (the same as on the DGT-Pi with 22 levels, Picochess 3).
After my v4.08 test games, the move average (without resign mode) went down.
No engine in the FCP Qualify Tourney-2021 has a lower move average than Wasp 4.08, at 79.2 (without resign).
Wasp 4.08 produced the same results in test games vs. stronger engines.
In my humble opinion, the middlegame dynamics are clearly improved and the pawn play in the middlegame is again more aggressive (if I compare it with Wasp 4.00). But all in all, I think most of the Elo gain comes from a better endgame.
So the style of Wasp 4.5 will again be more aggressive than its predecessors'.
This makes the engine very interesting for self-play with 22 levels and Picochess 3 on a DGT-Pi.