Double levers

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

arjuntemurnikar
Posts: 204
Joined: Tue Oct 15, 2013 10:22 pm
Location: Singapore

Re: Double levers

Post by arjuntemurnikar »

Lyudmil Tsvetkov wrote:
arjuntemurnikar wrote:
Lyudmil Tsvetkov wrote:
arjuntemurnikar wrote:
Lyudmil Tsvetkov wrote:
arjuntemurnikar wrote:
Lyudmil Tsvetkov wrote:
Uri Blass wrote:I think that maybe reducing the middlegame value of the knight by 2% may give a better result, if the big elo loss of 4 rating points holds with more games.

Stockfish sees an advantage for Black in the following position from my game, which is not correct:

r1bq1rk1/p3nppp/1pn1p3/3pP3/1P1N4/P1B5/2P2PPP/R2QKB1R w KQ - 1 11
Hi Uri.

I have seen so many games where SF underestimates knights in the mg, preferring instead a bishop, the bishop pair, pawns, the queen, etc. A really astounding number of games where that is true.

And yet the increase currently seems to fail convincingly. Some things in SF are really difficult for me to explain. I am sure a lot could be gained by tweaking piece values in SF, but if a sensible approach does not work, I do not know what will.
An intuitive approach is hardly sensible!

What is sensible is employing a mathematically-proven algorithm to tune the values in a systematic way.

I already explained this to you Lyudmil, but you will never listen to people even when it is regarding a subject which you know absolutely nothing about (SPSA and CLOP). Your utter disregard and disrespect for people who give you good and patient advice is what annoys me most.

As for the matter of piece values, I believe this has been tuned to death in the past by Joona, and there is hardly anything left there, but maybe with 1,000,000 SPSA games, we may gain 1 elo or something... (Hardly something a sane person will pursue as potential for elo!), and even if there is more elo, the way to find out is through algorithmic tuning, not "random fantasy value insertions".
It is the first time that someone accuses me of being disrespectful of people here. I do not remember an occasion where I have insulted someone. But I stick to what I believe of course.

How many successful patches have you had with SPSA?
This is an insult to Joona and all his efforts in tuning stockfish long before fishtest. SF is 3000+ elo now because of his tuning efforts. If it doesn't work -- either it is already well tuned, or the setup is wrong. And if you ask me, yes I have had success using CLOP and SPSA in my private testing. You on the other hand, have never used it, nor have you even the slightest clue how it works, but you see a RED color in fishtest, and you immediately jump to your "expert" conclusions!
Lyudmil Tsvetkov wrote: Piece values in SF are not tuned well at all. Actually, currently they are incredibly inaccurate. For example, you have the knight value increasing from mg to eg by some 30 cp, and the bishop value by just 20 cp, when everyone knows that the bishop gains strength relative to the knight in the eg. Similarly, the rook value increases from mg to eg by just 8 cp, when the rook is generally supposed to be worth much more in the eg. Those are not well-tuned values. Rather, they are the product of tuning in a larger pool of terms, with redundancies here and there.
There are other factors in eval that contribute to the apparent reduction and increase in piece values in different positions. So, completely wrong. If the piece values were so badly calibrated, SF would not be 3000+ right now. If there is any tuning left, it is a few elo at most!
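The mg/eg figures argued about above come from tapered evaluation: each piece has a middlegame and an endgame value, and the engine blends the two according to how much material is left on the board. A minimal sketch, assuming a phase scale from 0 (bare endgame) to 128 (full middlegame) and hypothetical piece values chosen only so that the mg-to-eg gains match the +30 / +20 / +8 quoted for knight, bishop and rook; these are not Stockfish's actual numbers.

```python
PHASE_MIDGAME = 128  # assumed phase scale: 128 = full middlegame, 0 = bare endgame

# Hypothetical (mg, eg) values, NOT Stockfish's: picked so that the mg->eg gain
# is +30 for the knight, +20 for the bishop and +8 for the rook, as quoted above.
PIECE_VALUES = {
    "knight": (800, 830),
    "bishop": (820, 840),
    "rook": (1250, 1258),
}


def tapered(mg, eg, phase):
    """Blend a middlegame and an endgame score linearly by game phase."""
    return (mg * phase + eg * (PHASE_MIDGAME - phase)) // PHASE_MIDGAME


if __name__ == "__main__":
    for phase in (128, 64, 0):
        values = {name: tapered(mg, eg, phase) for name, (mg, eg) in PIECE_VALUES.items()}
        print(phase, values, "bishop - knight =", values["bishop"] - values["knight"])
```

With these numbers the bishop's edge over the knight shrinks from 20 in the middlegame to 10 in the endgame, which is the pattern Lyudmil objects to. Arjun's counterpoint is that other eval terms (for example mobility or a bishop-pair bonus) are added on top, so the raw table is not the whole story.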
Lyudmil Tsvetkov wrote: However, this is not the biggest problem for me. I am almost certain there is something specific in the search that prevents some parameters from being corrected, even if this is the right approach. The old parameters, even if basically wrong, will resist alteration because they function in tune with some specific search parameters. That is how I explain the failed knight test.
... and now you blame the search, which has absolutely nothing to do with this matter! When you don't understand something, blame the magic unicorn. Sure!
Lyudmil Tsvetkov wrote: Considering the knight value, how can you say that this is a stupid intuitive approach, when in every second game involving an imbalance that SF lost and that I have looked at, SF underestimated knights and that was the reason for losing? Do you understand, every second game. I think this is a mathematically sound approach, not intuition. It simply cannot fail to work, but still it does not. So there is something in the search. I do not understand it, and people could help here, but I am certain it has to do with the search. What search features could prevent sensible tuning?

So again, here is an example: you look at 100 games (all available SF lost games), and in 50 of them you observe that SF loses because it evaluates positions with knights worse than positions with other pieces and pawns. What is the logical, mathematically based conclusion? That you should increase the knight value. You do that, and it does not work. How could you explain that?
"Every second game" is an exaggeration, but even then, I think here your mind is biasing you to see a pattern where there is none. You are so hard-bent on proving piece values are untuned that every other game you see, your mind leads you to the same false conclusion. There is a psychological term for this, but I forget what it is called.

Anyway, I have no idea how given just
1. The position
2. The final evaluation score

...you can come to the conclusion that in the given position SF over-values/under-values a knight! You must be GOD or something!!

There are 100+ eval features that are computed in a not-so-straightforward way by SF, so how can you break down exactly what is happening just by looking at the position and the final score??

If you ran the eval command and then made a few claims, I would take you seriously, but you don't. You make claims out of thin air.

To make it clearer (because I know you don't understand): how do you know that in a particular position SF is over-valuing the knight, when there are other features of the position, like pawn structure, king safety, etc., that may be counteracting the knight evaluation? Without running eval and without proper debugging, any claim, especially about code that has already been well tuned, is just silly.
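Arjun's argument is that a reported score is a sum of many terms, so the total alone cannot tell you which term is off. A toy illustration with made-up term names and values (this is not Stockfish's `eval` output): two positions receive the same final score for entirely different reasons, and only a per-term breakdown reveals which term dominates.

```python
# Hypothetical eval terms (centipawns, from White's point of view).
POSITION_A = {"material": -30, "pawn_structure": 10, "king_safety": 0, "mobility": 20}
POSITION_B = {"material": 20, "pawn_structure": -40, "king_safety": -10, "mobility": 30}


def final_score(terms):
    """What you see from the outside: only the sum of all terms."""
    return sum(terms.values())


def dominant_term(terms):
    """What a per-term breakdown shows: the term with the largest magnitude."""
    return max(terms, key=lambda name: abs(terms[name]))


for label, terms in (("A", POSITION_A), ("B", POSITION_B)):
    print(label, final_score(terms), dominant_term(terms))
```

Both positions score 0, yet in A the material term drives the result and in B it is pawn structure; from the total alone the two cases look identical.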

I am not trying to demotivate you. Suggestions are always good (and you do make very good suggestions now and then). It is just that, when most people make bad suggestions, and somebody explains to them nicely why they are bad, then that's the end of it. Everybody lives in peace. But when YOU make a bad suggestion, and somebody explains to you nicely why it is bad, you ignore the person, question his merits, and continue to claim that you are right!

If someone does not get annoyed by this, they must be Buddha or something.
There are lots of Buddhas around this forum, Arjun. :D

Not 1 or 5 elo, I am absolutely certain SF can gain 50 elo just on better tuned piece values. Absolutely certain. So SF 6 could be just a change of piece values.

When I say I have looked at 100 games, this means I have looked at 100 games, and when I say that in half of the games SF underestimates knights, that means it does so in half of the games. I have no reason to lie.

I do not need to run the eval command; besides, it shows just a couple of terms. For an experienced chess player accustomed to SF's playing style, it is obvious what the engine is doing. There are positions where king safety is irrelevant, or pawn structure is irrelevant, etc. That is the whole trick: to be able to see quickly what happens on the board. The quicker you get it, the more experienced you are. It is like that in every area. Experience comes with practice, that is it.

SF performing at 3000 does not mean that it has no glitches, but rather that other engines perform at a lower elo. While SF piece values are badly tuned (no offence to Marco here, you see :D), other engines' piece values are even worse. Very simple.

I tell you again: it is not logical that the knight value increases from mg to eg more than the bishop value, very simple. And therefore SF has problems with its knights in games. It is not logical that the rook value stays almost the same in mg and eg, and therefore SF underestimates rooks in the eg. Etc. You were lucky that in TCEC SF was much the better opponent overall; otherwise you would have seen what big problems it has with its piece values and imbalances! It is visible even now, but the much weaker Komodo was unable to convert. Look at the previous season: SF lost so many games because of its piece values. Its eval has not improved substantially in that area since then, so the problems have not disappeared. If SF misplayed 1/3 of the positions in the previous final because of its deficient understanding of piece values/imbalances, then there is a problem, and it is huge. In the last TCEC SF simply had too many other advantages for this to show clearly. But it will certainly show against stronger opponents.

So I stand by that: SF 6 could be just a change of piece values.
It is sad that all that experience gave you no sanity.

You can continue ignoring good advice, and you can continue re-iterating your same points over and over again. Sometimes innocent people will come around and will push a random test or two for you around piece values. They will fail. But, you will continue to claim you are right, and continue to claim that somehow magically, SF search is messing with your exact values.

I have come to know you very well now, Lyudmil, and I think sometimes you are just full of crap.
This time you are insulting me, Arjun, but I take no offence.

My ideas are bad, but a storming pawn idea worked, a low mobility idea worked, a chain rank bonus idea worked, a queen imbalance idea worked, an advanced levers idea worked.

How many of your creative/non-simplification ideas actually worked?
I told you that sometimes you do make good suggestions, and that is good, but sometimes you don't, but that is not the problem. Everyone has some ideas that work and some that don't.

The problem is that when nice people explain to you in nice terms why an idea is bad, you ignore them, question their merit and continue making claims that you are actually correct. And worse, it is often in a subject that you know less about, and the person explaining to you knows what he is talking about. And even WORSE, when one of those bad ideas is tested in fishtest, and it fails, you blame another part of the code for interfering with your "perfect intuition".

Either you have a big ego, or no sense at all.

I am sorry you feel insulted, but you insulted me first by ignoring everything I said. I put time and effort into typing those words for you, but they go in one ear and out the other like air...
Uri Blass
Posts: 11204
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Double levers

Post by Uri Blass »

arjuntemurnikar wrote:
Lyudmil Tsvetkov wrote:
arjuntemurnikar wrote:
Lyudmil Tsvetkov wrote:
Uri Blass wrote:I think that maybe reducing the middlegame value of the knight by 2% may give a better result, if the big elo loss of 4 rating points holds with more games.

Stockfish sees an advantage for Black in the following position from my game, which is not correct:

r1bq1rk1/p3nppp/1pn1p3/3pP3/1P1N4/P1B5/2P2PPP/R2QKB1R w KQ - 1 11
Hi Uri.

I have seen so many games where SF underestimates knights in the mg, preferring instead a bishop, the bishop pair, pawns, the queen, etc. A really astounding number of games where that is true.

And yet the increase currently seems to fail convincingly. Some things in SF are really difficult for me to explain. I am sure a lot could be gained by tweaking piece values in SF, but if a sensible approach does not work, I do not know what will.
An intuitive approach is hardly sensible!

What is sensible is employing a mathematically-proven algorithm to tune the values in a systematic way.

I already explained this to you Lyudmil, but you will never listen to people even when it is regarding a subject which you know absolutely nothing about (SPSA and CLOP). Your utter disregard and disrespect for people who give you good and patient advice is what annoys me most.

As for the matter of piece values, I believe this has been tuned to death in the past by Joona, and there is hardly anything left there, but maybe with 1,000,000 SPSA games, we may gain 1 elo or something... (Hardly something a sane person will pursue as potential for elo!), and even if there is more elo, the way to find out is through algorithmic tuning, not "random fantasy value insertions".
It is the first time that someone accuses me of being disrespectful of people here. I do not remember an occasion where I have insulted someone. But I stick to what I believe of course.

How many successful patches have you had with SPSA?
This is an insult to Joona and all his efforts in tuning stockfish long before fishtest. SF is 3000+ elo now because of his tuning efforts.
Even if Joona found close to optimal values in his tuning before fishtest, I see no reason to assume that the values remained close to optimal after Stockfish gained hundreds of elo and the code changed a lot.

I do not know how much elo Stockfish gained from tuning before fishtest; it would be interesting to know.

I did not see successful patches with SPSA, and my conclusion is that it probably does not work for some reason (maybe people do not use it correctly), because I do not believe that all the values the Stockfish team tested are already optimal or insensitive.

I think that the first thing to do is to detect the sensitive parameters, i.e. those where increasing or reducing the value by 20% causes a significant difference in playing strength, and then try to tune these values.
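Uri's screening step can be sketched as a loop that nudges each parameter by ±20% and keeps those whose measured strength moves by more than a threshold. This is a minimal sketch under stated assumptions: `measure_elo` is a hypothetical stand-in for running a match, the parameter names are invented, and the toy model is deterministic, whereas real measurements are noisy and each probe would need many games.

```python
def screen_sensitive(params, measure_elo, rel_step=0.20, threshold=5.0):
    """Return names of parameters whose +/-rel_step change moves strength by >= threshold."""
    base = measure_elo(params)
    sensitive = []
    for name, value in params.items():
        changes = []
        for factor in (1 + rel_step, 1 - rel_step):
            trial = dict(params)          # copy, so only one parameter changes
            trial[name] = value * factor
            changes.append(abs(measure_elo(trial) - base))
        if max(changes) >= threshold:
            sensitive.append(name)
    return sensitive


# Hypothetical parameters and a toy, noise-free strength model (not a real engine):
# the knight value and the mobility weight matter, the lever bonus does not.
PARAMS = {"knight_mg": 817, "lever_bonus": 20, "mobility_weight": 100}


def toy_elo(p):
    return -0.002 * (p["knight_mg"] - 830) ** 2 - 0.02 * (p["mobility_weight"] - 100) ** 2


if __name__ == "__main__":
    print(screen_sensitive(PARAMS, toy_elo))
```

Only the parameters that pass the screen would then be handed to a tuner, which keeps the number of simultaneously tuned values, and therefore the number of games, down.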

I admit that I did not learn how SPSA works, so I do not know whether SPSA tries to detect the sensitive values when it gets a list of parameters to tune, but so far I have seen only red results with SPSA.
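On Uri's question: textbook SPSA (Spall's simultaneous perturbation stochastic approximation) has no separate screening for sensitive parameters. Every iteration perturbs all parameters at once by random ±c steps, measures the two perturbed versions, and moves every parameter along the resulting gradient estimate; parameters with no real effect just wander with shrinking steps. A toy sketch, assuming a hypothetical noisy objective with a hidden optimum in place of a real match between the two perturbed engines. The gain exponents 0.602 and 0.101 are the values commonly recommended by Spall; the other constants are arbitrary, and this is not fishtest's implementation.

```python
import random


def spsa_maximize(f, theta, iters=2000, a=0.5, c=2.0, A=100,
                  alpha=0.602, gamma=0.101, seed=0):
    """Toy SPSA: maximise a noisy objective f(params, rng) -> float."""
    rng = random.Random(seed)
    theta = list(theta)
    for k in range(iters):
        a_k = a / (k + 1 + A) ** alpha   # step size, decays over time
        c_k = c / (k + 1) ** gamma       # perturbation size, decays more slowly
        # Perturb ALL parameters at once with random +/-1 signs.
        delta = [rng.choice((-1, 1)) for _ in theta]
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        # Two noisy measurements (in engine tuning, roughly: plus-engine vs minus-engine).
        diff = f(plus, rng) - f(minus, rng)
        # Per-parameter gradient estimate, then a step uphill.
        theta = [t + a_k * diff / (2 * c_k * d) for t, d in zip(theta, delta)]
    return theta


# Hypothetical objective: hidden optimum at (5, -3) plus measurement noise.
OPTIMUM = (5.0, -3.0)


def noisy_strength(params, rng):
    return -sum((p - o) ** 2 for p, o in zip(params, OPTIMUM)) + rng.gauss(0.0, 1.0)


if __name__ == "__main__":
    print(spsa_maximize(noisy_strength, [0.0, 0.0]))
```

Because both measurements in an iteration share the same code and differ only in the perturbed values, everything else in the engine cancels out of the comparison; what remains is noise, which is why so many games are needed.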

Uri
Uri Blass

Re: Double levers

Post by Uri Blass »

arjuntemurnikar wrote:
Lyudmil Tsvetkov wrote:
Uri Blass wrote:I think that maybe reducing the middlegame value of the knight by 2% may give a better result, if the big elo loss of 4 rating points holds with more games.

Stockfish sees an advantage for Black in the following position from my game, which is not correct:

r1bq1rk1/p3nppp/1pn1p3/3pP3/1P1N4/P1B5/2P2PPP/R2QKB1R w KQ - 1 11
Hi Uri.

I have seen so many games where SF underestimates knights in the mg, preferring instead a bishop, the bishop pair, pawns, the queen, etc. A really astounding number of games where that is true.

And yet the increase currently seems to fail convincingly. Some things in SF are really difficult for me to explain. I am sure a lot could be gained by tweaking piece values in SF, but if a sensible approach does not work, I do not know what will.
An intuitive approach is hardly sensible!

What is sensible is employing a mathematically-proven algorithm to tune the values in a systematic way.

I already explained this to you Lyudmil, but you will never listen to people even when it is regarding a subject which you know absolutely nothing about (SPSA and CLOP). Your utter disregard and disrespect for people who give you good and patient advice is what annoys me most.

As for the matter of piece values, I believe this has been tuned to death in the past by Joona, and there is hardly anything left there, but maybe with 1,000,000 SPSA games, we may gain 1 elo or something... (Hardly something a sane person will pursue as potential for elo!), and even if there is more elo, the way to find out is through algorithmic tuning, not "random fantasy value insertions".
Is there a reason to believe that tuning is a one-time job, and that the right weights from when Stockfish was at least 200 elo weaker are still right, so that there is nothing to gain from changing them?
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Double levers

Post by Lyudmil Tsvetkov »

Uri Blass wrote:
arjuntemurnikar wrote:
Lyudmil Tsvetkov wrote:
arjuntemurnikar wrote:
Lyudmil Tsvetkov wrote:
Uri Blass wrote:I think that maybe reducing the middlegame value of the knight by 2% may give a better result, if the big elo loss of 4 rating points holds with more games.

Stockfish sees an advantage for Black in the following position from my game, which is not correct:

r1bq1rk1/p3nppp/1pn1p3/3pP3/1P1N4/P1B5/2P2PPP/R2QKB1R w KQ - 1 11
Hi Uri.

I have seen so many games where SF underestimates knights in the mg, preferring instead a bishop, the bishop pair, pawns, the queen, etc. A really astounding number of games where that is true.

And yet the increase currently seems to fail convincingly. Some things in SF are really difficult for me to explain. I am sure a lot could be gained by tweaking piece values in SF, but if a sensible approach does not work, I do not know what will.
An intuitive approach is hardly sensible!

What is sensible is employing a mathematically-proven algorithm to tune the values in a systematic way.

I already explained this to you Lyudmil, but you will never listen to people even when it is regarding a subject which you know absolutely nothing about (SPSA and CLOP). Your utter disregard and disrespect for people who give you good and patient advice is what annoys me most.

As for the matter of piece values, I believe this has been tuned to death in the past by Joona, and there is hardly anything left there, but maybe with 1,000,000 SPSA games, we may gain 1 elo or something... (Hardly something a sane person will pursue as potential for elo!), and even if there is more elo, the way to find out is through algorithmic tuning, not "random fantasy value insertions".
It is the first time that someone accuses me of being disrespectful of people here. I do not remember an occasion where I have insulted someone. But I stick to what I believe of course.

How many successful patches have you had with SPSA?
This is an insult to Joona and all his efforts in tuning stockfish long before fishtest. SF is 3000+ elo now because of his tuning efforts.
Even if Joona found close to optimal values in his tuning before fishtest, I see no reason to assume that the values remained close to optimal after Stockfish gained hundreds of elo and the code changed a lot.

I do not know how much elo Stockfish gained from tuning before fishtest; it would be interesting to know.

I did not see successful patches with SPSA, and my conclusion is that it probably does not work for some reason (maybe people do not use it correctly), because I do not believe that all the values the Stockfish team tested are already optimal or insensitive.

I think that the first thing to do is to detect the sensitive parameters, i.e. those where increasing or reducing the value by 20% causes a significant difference in playing strength, and then try to tune these values.

I admit that I did not learn how SPSA works, so I do not know whether SPSA tries to detect the sensitive values when it gets a list of parameters to tune, but so far I have seen only red results with SPSA.

Uri
Actually, it is very logical to suppose that, after 2 years without being changed and a lot of other changes in the code, the piece values urgently need a change.

Regarding SPSA, I do not know how much time it will take for levers to converge, but a sensible proposal that would take less time and allow drawing easy conclusions about what to do next would be to change the current 20;40 values to 15;45 in one test and to 25;35 in another, to see whether the 5th or the 6th rank is more important for SF. Then you may proceed with further tests. This will take much less time than SPSA.
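Reading "20;40" as the lever bonuses for the 5th and 6th ranks (an assumption; the post does not spell it out), the two proposed tests keep the combined bonus at 60 and only shift weight between the ranks, so each one probes the rank question without also changing the overall size of the bonus. A trivial sketch with hypothetical names:

```python
# Assumption: key = relative rank, value = lever bonus (hypothetical layout).
BASE_LEVER_BONUS = {5: 20, 6: 40}


def shifted(bonus, shift):
    """Move `shift` points of bonus from the 5th rank to the 6th rank."""
    return {5: bonus[5] - shift, 6: bonus[6] + shift}


for shift in (5, -5):
    variant = shifted(BASE_LEVER_BONUS, shift)
    print(variant, "total:", sum(variant.values()))
```

If the 15;45 test passes and 25;35 fails, the 6th rank deserves more weight, and a next test could shift further in that direction.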
Joerg Oster
Posts: 994
Joined: Fri Mar 10, 2006 4:29 pm
Location: Germany
Full name: Jörg Oster

Re: Double levers

Post by Joerg Oster »

Lyudmil Tsvetkov wrote: Not 1 or 5 elo, I am absolutely certain SF can gain 50 elo just on better tuned piece values. Absolutely certain. So SF 6 could be just a change of piece values.
Hi Lyudmil,

this may be true for an engine at the beginning of its life, but definitely not for Stockfish.
You may well be right that there are better values, but we are talking about gaining 0 to 3 elo here. At best.

That's simply a truth you must face.

Best, Joerg.
Jörg Oster
Joerg Oster

Re: Double levers

Post by Joerg Oster »

Uri Blass wrote: I did not see successful patches with SPSA, and my conclusion is that it probably does not work for some reason (maybe people do not use it correctly), because I do not believe that all the values the Stockfish team tested are already optimal or insensitive.

I think that the first thing to do is to detect the sensitive parameters, i.e. those where increasing or reducing the value by 20% causes a significant difference in playing strength, and then try to tune these values.

I admit that I did not learn how SPSA works, so I do not know whether SPSA tries to detect the sensitive values when it gets a list of parameters to tune, but so far I have seen only red results with SPSA.

Uri
Hi Uri,

well, I'm pretty sure SPSA works as intended.
But I'm also sure there are many ways to mess things up when setting up a tuning session. I find CLOP easier to use. But maybe that's just me ... :lol:

I think that when using fishtest for tuning sessions, we have to deal with 2 things.
First, we are talking about gaining 0-3 elo most of the time, depending on the sensitivity of the parameters to be tuned.
Second, in a distributed framework like fishtest, there is a lot of variance.

To compensate for the second, I guess many more games have to be played.
So if we need to play 50k games on our local box to tune 2 parameters, we must play ten times as many in fishtest. At least. But that's just a guess of mine ...
Now let's imagine one tries to tune 10 parameters at once. I don't think 500k games are enough ...
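Joerg's point about game counts can be made concrete with the usual logistic Elo model. Near a 50% score, one unit of score is worth about 1600/ln 10 ≈ 695 elo, so resolving a difference of d elo at z standard errors needs roughly (z · σ · 695 / d)² games, where σ is the per-game standard deviation of the score: 0.5 with no draws, lower when draws are common (σ = 0.4 corresponds to about a 36% draw rate between evenly matched engines). A rough sketch assuming σ = 0.4 and z = 2; it ignores sequential testing and the extra variance of a distributed framework, so real numbers would likely be higher.

```python
import math

# Slope of the logistic Elo curve at a 50% score: 400 / (ln 10 * 0.5 * 0.5).
ELO_PER_SCORE = 1600 / math.log(10)  # about 695 elo per unit of score


def elo_from_score(score):
    """Elo difference implied by an average score strictly between 0 and 1."""
    return -400 * math.log10(1 / score - 1)


def games_needed(elo_diff, sigma=0.4, z=2.0):
    """Games needed so that z standard errors of the measured score equal elo_diff.

    sigma is the assumed per-game standard deviation of the score:
    0.5 with no draws, sqrt(1 - draw_rate) / 2 for evenly matched engines.
    """
    return math.ceil((z * sigma * ELO_PER_SCORE / elo_diff) ** 2)


if __name__ == "__main__":
    for diff in (1, 2, 3, 5):
        print(f"{diff} elo: about {games_needed(diff):,} games")
```

Halving the elo difference quadruples the number of games, which is why a 0-3 elo effect is so expensive to pin down.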

Just my 2 cents, regards,
Joerg.
Lyudmil Tsvetkov

Re: Double levers

Post by Lyudmil Tsvetkov »

Joerg Oster wrote:
Lyudmil Tsvetkov wrote: Not 1 or 5 elo, I am absolutely certain SF can gain 50 elo just on better tuned piece values. Absolutely certain. So SF 6 could be just a change of piece values.
Hi Lyudmil,

this may be true for an engine at the beginning of its life, but definitely not for Stockfish.
You may well be right that there are better values, but we are talking about gaining 0 to 3 elo here. At best.

That's simply a truth you must face.

Best, Joerg.
Hi Joerg,

I will tell you what I must face when I post an analysis of some SF 5 games later, and especially if at some point I find another willing person to push a more comprehensive patch of piece value changes. :D

For me, piece values/imbalance are the weakest part of the SF eval code. Its main rival, Komodo, is simply much better there. But of course other things compensate for SF.
Lyudmil Tsvetkov

Re: Double levers

Post by Lyudmil Tsvetkov »

Joerg Oster wrote:
Uri Blass wrote: I did not see successful patches with SPSA, and my conclusion is that it probably does not work for some reason (maybe people do not use it correctly), because I do not believe that all the values the Stockfish team tested are already optimal or insensitive.

I think that the first thing to do is to detect the sensitive parameters, i.e. those where increasing or reducing the value by 20% causes a significant difference in playing strength, and then try to tune these values.

I admit that I did not learn how SPSA works, so I do not know whether SPSA tries to detect the sensitive values when it gets a list of parameters to tune, but so far I have seen only red results with SPSA.

Uri
Hi Uri,

well, I'm pretty sure SPSA works as intended.
But I'm also sure there are many ways to mess things up when setting up a tuning session. I find CLOP easier to use. But maybe that's just me ... :lol:

I think that when using fishtest for tuning sessions, we have to deal with 2 things.
First, we are talking about gaining 0-3 elo most of the time, depending on the sensitivity of the parameters to be tuned.
Second, in a distributed framework like fishtest, there is a lot of variance.

To compensate for the second, I guess many more games have to be played.
So if we need to play 50k games on our local box to tune 2 parameters, we must play ten times as many in fishtest. At least. But that's just a guess of mine ...
Now let's imagine one tries to tune 10 parameters at once. I don't think 500k games are enough ...

Just my 2 cents, regards,
Joerg.
I tried to find on the internet some simple explanations of how autotuning works, and did not find much. My understanding is that, whether it is CLOP, SPSA or another tuning facility, the tuner adjusts (increases or decreases) the numerical value of a specific term based on the win/loss ratio in a large game database where that term occurs many times.

If that is so, and I would be glad to be corrected, then what makes autotuners better than human judgement? As I understand it, an autotuner can see that the winning ratio increases when a specific term is present, so it adjusts the preset value upwards, and more so the bigger the winning ratio. However, the big question for me is how we can be certain that the winning ratio in positions with that term is not due to other, unrelated terms present at the same time in the game pool. I have no certainty about that at all. It is quite possible that other terms influence the winning ratio.

So what makes an autotuner better than a human assessment? I think it is the other way round, for the simple reason that autotuners cannot guess which other terms could affect the winning ratio, but humans can, and their assessments are actually based on that.

So to me it is very improbable that an autotuner works successfully, unless of course it is very refined. But I think we are very far from that stage of autotuning perfection.

I understand that programmers would like to leave the dirty hard work to machines and concentrate on creative solutions instead, but I do not think they can fully trust autotuners yet.

Can you imagine a time when everything will be done by autotuners? What will programmers do then? For if you can autotune some eval and search terms, then you can surely also tune all the programming code.
tpetzke
Posts: 686
Joined: Thu Mar 03, 2011 4:57 pm
Location: Germany

Re: Double levers

Post by tpetzke »

Auto-tuning works to a certain extent. There are several methods, based either on already played games or on playing games.

This tuning just automates what a programmer would otherwise do manually: increasing and decreasing certain values and looking at which value performs best. Remove intuition from that process completely; it is worthless.

All the terms interact with each other somehow. It is impossible to say what the effect of a change is until you try it against a scoring method (game play, correctness of prediction of game outcomes, or something else). And this can be automated, or you can do it manually.
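The "correctness of prediction of game outcomes" option also speaks to Lyudmil's worry above that a term's win ratio may really be caused by other terms present at the same time. A toy sketch on made-up data (terms A and B are hypothetical): B really affects the result, while A merely tends to appear together with B. Judged by its own win ratio, A looks valuable; fitted jointly with B by logistic regression (log-loss, plain gradient descent), its weight lands near zero. The well-known Texel tuning method works in the same spirit but fits the engine's own evaluation with a squared-error loss; this is an illustration of the principle, not of any particular engine's tuner.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_games(n=4000, seed=1):
    """Hypothetical data: term B matters, term A only co-occurs with B."""
    rng = random.Random(seed)
    games = []
    for _ in range(n):
        b = rng.choice((0, 1))
        a = b if rng.random() < 0.8 else 1 - b  # A agrees with B 80% of the time
        p_win = sigmoid(1.5 * b - 0.75)         # true outcome model ignores A
        games.append((a, b, 1.0 if rng.random() < p_win else 0.0))
    return games


def marginal_gap(games):
    """Win rate with term A present minus win rate without it."""
    with_a = [r for a, _, r in games if a == 1]
    without_a = [r for a, _, r in games if a == 0]
    return sum(with_a) / len(with_a) - sum(without_a) / len(without_a)


def fit_jointly(games, steps=300, lr=1.0):
    """Fit weights for A and B together by gradient descent on log-loss."""
    w_a = w_b = bias = 0.0
    n = len(games)
    for _ in range(steps):
        g_a = g_b = g_bias = 0.0
        for a, b, r in games:
            err = sigmoid(w_a * a + w_b * b + bias) - r  # predicted minus actual
            g_a += err * a
            g_b += err * b
            g_bias += err
        w_a -= lr * g_a / n
        w_b -= lr * g_b / n
        bias -= lr * g_bias / n
    return w_a, w_b, bias


if __name__ == "__main__":
    games = make_games()
    print("marginal win-rate gap for A:", round(marginal_gap(games), 3))
    w_a, w_b, _ = fit_jointly(games)
    print("joint weights: A =", round(w_a, 2), " B =", round(w_b, 2))
```

The joint fit separates the two terms only because positions where A and B disagree exist in the data; with enough games, that is exactly the information a single-term win ratio throws away.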

Chess knowledge is important to learn what parameters are useful but it is not so useful for estimating the value you assign to a term.

Thomas...

=======
http://macechess.blogspot.com - iCE Chess Engine
Henk
Posts: 7259
Joined: Mon May 27, 2013 10:31 am

Re: Double levers

Post by Henk »

If a neural network played chess, the only thing a programmer would have to do is start and stop the tuner.