tpetzke wrote: Auto-tuning works to a certain extent. There are several methods, based either on already-played games or on playing new games.
This tuning just automates what a programmer would otherwise do manually: increasing and decreasing certain values and looking at which value performs best. Intuition is completely removed from that process; there it is worthless.
All the terms interact with each other somehow. It is impossible to say what the effect of a change is until you try it against a scoring method (game play, correctness of prediction of game outcomes, or something else). And this can be automated, or you can do it manually.
Chess knowledge is important for learning which parameters are useful, but it is not so useful for estimating the value you assign to a term.
Thomas...
Hi Thomas.
I went to your site; it helped me understand a bit of how auto-tuning works.
The point is: how inaccurate might an auto-tuned value become if it does not take into account other, unrelated factors that influence the score? OK, auto-tuners can count better than humans, but what makes them understand better which terms influence the result and which do not? The values might go quite far off if many influencing factors are not taken into account.
Humans also miss such factors, but they are better prepared to assess them. Auto-tuners count better. Which is more important: better assessment or better counting?
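For illustration, a minimal Python sketch of the kind of automated loop Thomas describes: nudge each exposed value up and down and keep whatever scores best. Here play_match is a hypothetical stand-in for whatever scoring method is used (game play, prediction of game outcomes, or something else):

Code:
def play_match(params):
    # Hypothetical scoring function: evaluate a candidate parameter
    # set (e.g. by a match or by outcome prediction) and return a
    # score in [0, 1], higher is better. Supply your own.
    raise NotImplementedError

def coordinate_tune(params, step=5, rounds=3):
    # Automated version of manual tuning: for each parameter, try
    # value - step and value + step, and keep whichever scores best.
    best, best_score = dict(params), play_match(params)
    for _ in range(rounds):
        for name in list(best):
            for candidate in (best[name] - step, best[name] + step):
                trial = {**best, name: candidate}
                score = play_match(trial)
                if score > best_score:      # keep only improvements
                    best, best_score = trial, score
    return best

This is exactly the blind increase-and-decrease process described above; no intuition is involved beyond choosing which parameters to expose.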
A programmer may increase and decrease some values by 20%, find almost no difference, and conclude that the parameter is not sensitive, so it is better not to tune it.
I do not know whether SPSA does this; at least I see no information about this type of result in the Stockfish framework.
The tuner will find reasonable values for the parameters you expose. If a term is useless or even hurts, you find out, because the tuner drives its weight down. Also, if you repeat the tuning process, the terms that fluctuate a lot are the ones of lesser importance; the terms whose weights settle early are the more important ones.
My auto-tuner does not guarantee me the best values (the algorithm I use cannot ensure that), but a reasonable set achieved with a large yet manageable effort. This is why I wrote that it works up to a certain extent. The effort needed to improve a good set towards the best set grows very fast, and you must decide for yourself where to stop. Diminishing returns.
Also, the best set has a disadvantage: it will hinder future development, because as soon as you change a small thing in your engine you have a regression, unless you produce another best set and spend the huge effort again (maybe only to then find out that your change really was a regression).
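A sketch of the "repeat the tuning and watch the fluctuation" idea Thomas mentions: rerun the tuner several times and rank the terms by how much their final weights spread across runs (run_tuner is a hypothetical function returning one tuned {name: value} dict per call):

Code:
import statistics

def rank_by_stability(run_tuner, n_runs=5):
    # Terms whose final weights fluctuate a lot across runs matter
    # least; terms whose weights settle early matter most.
    runs = [run_tuner() for _ in range(n_runs)]
    spread = {name: statistics.stdev(r[name] for r in runs)
              for name in runs[0]}
    return sorted(spread, key=spread.get)   # most stable (important) first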
Uri Blass wrote:
I did not see successful patches with SPSA, and my conclusion is that it probably does not work for some reason (maybe people do not use it correctly), because I do not believe that all the values the Stockfish team tested are optimal or insensitive.
I think that the first thing to do is to detect the sensitive parameters, where increasing or reducing the value by 20% causes a significant difference in playing strength, and to try to tune these values.
I admit that I did not learn how SPSA works, so I do not know whether SPSA tries to detect the sensitive values when it gets a list of parameters to tune; so far I have seen only red results with SPSA.
Uri
Hi Uri,
well, I'm pretty sure SPSA works as intended.
But I'm also sure there are many ways to mess up correctly setting up a tuning session. I find CLOP easier to use. But maybe that's just me ...
I think that when using fishtest for tuning sessions, we have to deal with two things.
First, we are talking about gaining 0-3 Elo most of the time, depending on the sensitivity of the parameters to be tuned.
Second, in a distributed framework like fishtest, there is a lot of variance.
To compensate for the second, I guess many more games have to be played.
So if we need to play 50k games on our local box to tune 2 parameters, we must play ten times as many on fishtest. At least. But that's just a guess of mine ...
Now let's imagine one tries to tune 10 parameters at once. I don't think 500k games are enough ...
Just my 2 cents. Regards,
Joerg.
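Joerg's numbers can be sanity-checked. Near a 50% score the Elo curve has a slope of about 400 / (ln 10 * 0.25), roughly 695 Elo per unit of score, so the 1-sigma resolution of a match is about 695 * sigma / sqrt(N). A small sketch, assuming a per-game score standard deviation of about 0.4 (a typical figure given engine draw rates):

Code:
import math

SIGMA = 0.4   # assumed per-game standard deviation of the score

def elo_resolution(n_games, sigma=SIGMA):
    # 1-sigma error of a measured Elo difference near a 50% score:
    # Elo(p) = -400*log10(1/p - 1), slope at p = 0.5 is ~695 Elo.
    slope = 400.0 / (math.log(10) * 0.25)
    return slope * sigma / math.sqrt(n_games)

for n in (50_000, 100_000, 500_000):
    print(f"{n:>7} games -> +/- {elo_resolution(n):.2f} Elo")

# ~1.24, ~0.88 and ~0.39 Elo respectively: resolving a 0-3 Elo
# tuning gain really does take game counts on this scale.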
I tried to find some simple explanations on the internet of how auto-tuning works, and did not find much. My understanding is that, whether CLOP, SPSA or another tuning facility, it adjusts (increases or decreases) the numerical value of a specific term based on the win/loss ratio in a large pool of games where that term occurs multiple times.
If that is so, and I would be glad to be corrected, then what makes auto-tuners better than human judgement? As I understand it, an auto-tuner can say that with a specific term present the winning ratio increases, so it will adjust the term's preset value upwards, the more so the bigger the winning ratio. However, the big question for me is how we can be certain that the winning ratio in positions with that term is not due to other, unrelated terms simultaneously present in the game pool. I have no certainty about that at all. It is quite possible that other terms influence the winning ratio.
So what makes an auto-tuner better than a human assessment? I think it is the other way round, for the simple fact that auto-tuners cannot conjecture what other terms might affect the winning ratio, but humans can, and their assessments are actually based on that.
So for me it is very improbable that an auto-tuner works successfully, unless of course it is very refined. But I think we are very far from that stage of auto-tuning perfection.
I understand that programmers would like to leave the dirty, hard work to machines and concentrate on creative solutions instead, but I do not think they can fully trust auto-tuners yet.
Can you imagine a time when everything is done by auto-tuners? What will programmers do then? For, if you can auto-tune some eval and search terms, then you can surely also tune all the rest of the code.
There is always interaction. That is why you play 100K+ games: the statistics take care of it.
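This is also what SPSA relies on: it perturbs all parameters at the same time with independent random signs and plays the two perturbed versions against each other, so over many iterations the influence of every other term carries a random sign and averages out, leaving an estimate of each parameter's own gradient. A minimal sketch, with match_result as a hypothetical function returning the score of the "plus" engine against the "minus" engine:

Code:
import random

def match_result(theta_plus, theta_minus):
    # Hypothetical: play a small match between two engine builds
    # and return the score of theta_plus in [0, 1].
    raise NotImplementedError

def spsa(theta, iterations=10_000, a=0.1, c=4.0):
    # Minimal SPSA with simplified gain schedules. All parameters
    # move together; each one's update divides the shared match
    # result by its own random sign, so cross-effects cancel out
    # on average while the parameter's own effect accumulates.
    theta = list(theta)
    for k in range(1, iterations + 1):
        ak, ck = a / k, c / k ** 0.25
        delta = [random.choice((-1, 1)) for _ in theta]
        plus  = [t + ck * d for t, d in zip(theta, delta)]
        minus = [t - ck * d for t, d in zip(theta, delta)]
        g = match_result(plus, minus) - 0.5     # >0: plus side better
        for i, d in enumerate(delta):
            theta[i] += ak * g / (ck * d)
    return theta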
No, SPSA does not measure sensitivity. That you have to measure yourself.
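A sketch of the separate sensitivity check Uri suggests: scale each parameter by +/-20% on its own and keep only those where the match shows a difference larger than the error bar (play_match_elo is a hypothetical function returning a measured Elo difference and its standard error):

Code:
def play_match_elo(params):
    # Hypothetical: play the modified engine against the unmodified
    # one and return (elo_diff, elo_error) from enough games.
    raise NotImplementedError

def sensitive_params(params, factor=0.2):
    # Flag parameters where a +/-20% change moves playing strength
    # by clearly more than the measurement noise; tune only those.
    found = []
    for name, value in params.items():
        for scaled in (value * (1 - factor), value * (1 + factor)):
            elo, err = play_match_elo({**params, name: scaled})
            if abs(elo) > 2 * err:          # ~95% significance
                found.append(name)
                break                       # one direction suffices
    return found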
It has to be set up properly. This takes a bit of experimentation.
I believe that at this point the fishtest implementation of SPSA is still at an "experimental" stage. It is crashing workers and has a lot of variance. I have much better success in local tests.
Uri Blass wrote: I think that maybe reducing the middlegame value of the knight by 2% may give a better result, if the big Elo loss of 4 rating points holds up with more games.
Stockfish sees an advantage for Black in the following position from my game, which is not correct:
r1bq1rk1/p3nppp/1pn1p3/3pP3/1P1N4/P1B5/2P2PPP/R2QKB1R w KQ - 1 11
Hi Uri.
I have seen so many games where SF underestimates knights in the middlegame, preferring instead a bishop, the bishop pair, pawns, the queen, etc. A really astounding number of games where that holds.
And yet the increase currently seems to fail convincingly. Some things in SF are really difficult for me to explain. I am sure a lot could be gained by tweaking piece values in SF, but if a sensible approach does not work, I do not know what will.
An intuitive approach is hardly sensible!
What is sensible is employing a mathematically proven algorithm to tune the values in a systematic way.
I already explained this to you, Lyudmil, but you never listen to people, even on a subject about which you know absolutely nothing (SPSA and CLOP). Your utter disregard and disrespect for people who give you good and patient advice is what annoys me most.
As for piece values, I believe these were tuned to death in the past by Joona, and there is hardly anything left there; maybe with 1,000,000 SPSA games we may gain 1 Elo or so (hardly something a sane person would pursue as potential for Elo!). And even if there is more Elo to be had, the way to find it is through algorithmic tuning, not "random fantasy value insertions".
It is the first time someone has accused me of being disrespectful of people here. I do not remember an occasion where I have insulted someone. But I stick to what I believe, of course.
How many successful patches have you had with SPSA?
This is an insult to Joona and all his efforts in tuning Stockfish long before fishtest. SF is 3000+ Elo now because of his tuning efforts.
Even if Joona found close-to-optimal values in his tuning before fishtest, I see no reason to assume that the values remained close to optimal after Stockfish gained hundreds of Elo and the code changed a lot.
I do not know how much rating Stockfish gained from tuning before fishtest; it would be interesting to know.
I believe that most recently Joona used SPSA for tuning KPSQT and NPSQT, and both gained Elo -- so there is your proof (post-fishtest) that it does work. I am pretty sure almost every parameter in SF has been tuned with some algorithm (SPSA, CLOP, genetic algorithms, etc.), except for the more recent additions.
Yes, I would agree that some values may be off the optimum after hundreds of patches. But I completely disagree with Lyudmil's assessment that there are 50 Elo in piece values. It is a few Elo at most.
So those auto-tuners work only privately, but not when they are used on fishtest.
My suggestion, then, to all who would use them: use them only at home, not for official tuning.
So why don't you then push those "much more successfully tuned" values into fishtest?
Do not misunderstand me: I am ready to say "sorry guys, SPSA and CLOP are great", but I have to see some evidence for it on fishtest. For the time being there is no such evidence.
I agree that perfected auto-tuners might be able to work, but for the time being there are no such auto-tuners; they are like Google translators, I hope you know what I mean.
Actually, quite the opposite. More resources should be put towards sorting out fishtest's quirks. Nobody has machines near the capability of the distributed resources fishtest commands. For example, if I run an SPSA tuning session at home on my 8-core machine (already a luxury most people can't afford), it takes about 15 days for 100K games at a 15-second time control. On fishtest, 100K games at the same time control finish in just 2-3 hours!
The more SPSA tests we do on the framework, the better an idea we get of its problems and limitations. That is one of the reasons I posted the mega_pawn_tune test: I wanted to see what the limit of the framework is. This information is useful in the long term, even if the test did not give Elo results immediately.
Many of the first, say, 5-15 SPSA attempts will probably not yield much, but every test gives useful hints for improving the setup, and eventually results will come. I think there is a lot of potential here, and I would bet on auto-tuning any day!
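(For scale, taking 2.5 hours as the midpoint of the quoted 2-3 hours: 100,000 / (15 * 24) ≈ 280 games per hour locally, versus 100,000 / 2.5 = 40,000 games per hour on fishtest, roughly a 140-fold speedup.)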