
Re: Komodo vs. Larry K on chess.com

Posted: Wed Sep 04, 2019 3:58 am
by lkaufman
Uri Blass wrote: Wed Sep 04, 2019 12:49 am
lkaufman wrote: Tue Sep 03, 2019 12:22 am
leavenfish wrote: Mon Sep 02, 2019 7:59 pm Seriously Larry - you think 1/10th of a pawn (.2) is "Very meaningful" compared to .1 in Stockfish? Being so close to 'zero' (and perhaps given their large contempt for play purposes) one might think it closer to negligible than very meaningful. And are you speaking of pure evaluation or chances in OTB play or even computer vs computer play?

I am watching Komodo MCTS (not Stockfish so, different animal) churn away right now on a position at 3 pv and 3 core.

1 .02
2 .09 (okay, not a full 1/10th...)
3 .39

Now to me, the difference between 1 and 2 is really tiny (negligible really - might even change with an eval tweaking in your next (or a previous) iteration)...given their relation to .39, well that I notice more...and would be more likely to toss it aside.
I'm not talking about a score of 0.2, but a score difference of 0.2, so Contempt doesn't enter into the discussion. If the top move is evaluated as 0.2 higher than the second move, the probability that it is really the better move is very high, maybe 90 % or so. That is true regardless of who is playing. Of course if the players are weak players, then playing the better move isn't likely to matter much, they make too many errors.
The term better move is not defined.
Better move based on what?
You can decide based on analysis by the computer for higher depth but the computer may be biased by the same wrong evaluation.

For example it is going to insist that Kxh1 is better than Ra2 in the following position but I consider both moves as equal.

[d]7k/8/8/1p1p1p1p/pPpPpPpP/P1P1P1P1/8/R5Kb w - - 0 1
Of course "better move" is not always clear, it's only clear when one move gets the win or draw and the other does not, with perfect play. But in most positions that are not yet clearly won or clearly drawn, one move is likely to have a higher win probability against unpredictable but strong opposition than another move, and that's what we generally call a "better move". Naturally sometimes there will be disagreement over which move is better even between players using the same engines and overnight analysis. Perhaps I should say that if one move is 0.2 above another, then if there is some consensus among correspondence players using top engines as to which is better, 90% of the time it will be the higher scoring one, even after say a ten second search or so. Another way to look at it, is this: if the top move is 0.2 above the second move, but a top GM not using an engine prefers the second one, which would you play with no other information? I would play the engine choice. 0.2 is meaningful enough to overrule the opinion of a 2800 GM not using an engine, in my mind at least.

Re: Komodo vs. Larry K on chess.com

Posted: Wed Sep 04, 2019 4:58 am
by jp
lkaufman wrote: Wed Sep 04, 2019 3:58 am Another way to look at it, is this: if the top move is 0.2 above the second move, but a top GM not using an engine prefers the second one, which would you play with no other information? I would play the engine choice. 0.2 is meaningful enough to overrule the opinion of a 2800 GM not using an engine, in my mind at least.
What about this: if the top move is 0.2 more than the second, but a top GM using an engine (in home prep) prefers the second one, which would you play with no other information? Is it that the top GM is deciding to play "worse" for practical purposes, or is it that he doesn't believe the engine?

Re: Komodo vs. Larry K on chess.com

Posted: Wed Sep 04, 2019 7:03 am
by lkaufman
jp wrote: Wed Sep 04, 2019 4:58 am
lkaufman wrote: Wed Sep 04, 2019 3:58 am Another way to look at it, is this: if the top move is 0.2 above the second move, but a top GM not using an engine prefers the second one, which would you play with no other information? I would play the engine choice. 0.2 is meaningful enough to overrule the opinion of a 2800 GM not using an engine, in my mind at least.
What about this: if the top move is 0.2 more than the second, but a top GM using an engine (in home prep) prefers the second one, which would you play with no other information? Is it that the top GM is deciding to play "worse" for practical purposes, or is it that he doesn't believe the engine?
If he chose the move for a specific OTB game for the first time, then his motivation may have been surprise or practical considerations. If he plays it a second time or in a correspondence game then he must really believe it is better, and I would assume he has good reason for this, perhaps having used multiple engines, or knowing practical results from the position, or knowing that the position type is one that engines aren't good at, etc. In other words, if he knows that the move he likes is 0.2 worse per Stockfish and still chooses it when surprise is not a factor, then I would trust his judgment. It doesn't mean that 0.2 is not quite significant, just that top GMs know this and must have good reason to overrule it.

Re: Komodo vs. Larry K on chess.com

Posted: Thu Sep 05, 2019 11:48 am
by Ovyron
lkaufman wrote: Mon Sep 02, 2019 5:41 pm Sf changed its scale rather noticeably in the past year or so. What used to be a +.20 might now get close to a .30.
Could you post examples of this? I have just perceived the opposite trend over the years (the scale has been approaching 0.00 consistently... some positions that used to be 0.60 are 0.30 now...), and since I have used Stockfish pretty much daily for analysis of positions since Stockfish 1.01 JA I find it bizarre that something of this magnitude happened right under my nose, and didn't affect me at all.
lkaufman wrote: Mon Sep 02, 2019 5:41 pm 0.1 not so much...I'm sure I'm not the only GM with this opinion.
But this makes no sense. A difference of 0.1 is significant because it means if you continue playing moves that are 0.1 worse than best, in 9 moves the difference will be 0.90 from best.

I guess such a mentality would explain how I'm able to build huge advantages against people on corr chess (if they don't think small mistakes pile up), but 0.1 has to be significant because it means in future positions you're going to restrict yourself to the best move or nothing (assuming you don't want a 1.00 advantage for your opponent in 10 moves, once you play nine 0.10 worse than best moves and one 0.05 worse than best move, you're at 0.95 disadvantage and then the 0.10 worse than best move puts you over the threshold and becomes significant, so it had to be significant before this point. In general for Stockfish, a 0.01 difference is significant once you're at 0.19 disadvantage, because the breaking point at which you're going to only draw or lose the game without chances of winning happens around there.)
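The arithmetic in the paragraph above can be sketched in a few lines. This is only an illustration under a strong assumption that is not established in the thread: that each per-move evaluation loss is exact and that the losses simply add up. The function names and the 1.00 threshold expressed as 100 centipawns are hypothetical choices made for this example.

```python
def cumulative_deficit(losses_cp):
    """Return the running total of evaluation lost (in centipawns) after each move."""
    total = 0
    running = []
    for loss in losses_cp:
        total += loss
        running.append(total)
    return running


def first_move_over(losses_cp, threshold_cp):
    """Return the 1-based number of the first move at which the running
    deficit reaches the threshold, or None if it never does."""
    for move_number, total in enumerate(cumulative_deficit(losses_cp), start=1):
        if total >= threshold_cp:
            return move_number
    return None


# Nine moves that are 0.10 worse than best, one that is 0.05 worse,
# then one more that is 0.10 worse (the scenario described above).
losses = [10] * 9 + [5] + [10]

print(cumulative_deficit(losses)[9])   # deficit after ten moves, in centipawns
print(first_move_over(losses, 100))    # move at which the 1.00 threshold is reached
```

Under these assumptions the deficit after ten moves is 95 centipawns, and the eleventh move is the one that crosses the threshold.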

Re: Komodo vs. Larry K on chess.com

Posted: Thu Sep 05, 2019 5:09 pm
by jp
Ovyron wrote: Thu Sep 05, 2019 11:48 am
lkaufman wrote: Mon Sep 02, 2019 5:41 pm 0.1 not so much...I'm sure I'm not the only GM with this opinion.
But this makes no sense. A difference of 0.1 is significant because it means if you continue playing moves that are 0.1 worse than best, in 9 moves the difference will be 0.90 from best.

I guess such a mentality would explain how I'm able to build huge advantages
This isn't a good argument. It assumes the engine has no error and won't change its evaluation at higher depths. Even if it doesn't change its evaluation, that doesn't mean it must be right. The question is in what position is its error in judgement less than 0.1 at CC depths.

And even if the engine is right, you won't have a 1.0 advantage after 10 moves, because that would need Stockfish to have 0.1 differences on every one of those 10 moves AND the human to overrule the engine's top choice every time.

Re: Komodo vs. Larry K on chess.com

Posted: Thu Sep 05, 2019 6:11 pm
by lkaufman
Ovyron wrote: Thu Sep 05, 2019 11:48 am
lkaufman wrote: Mon Sep 02, 2019 5:41 pm Sf changed its scale rather noticeably in the past year or so. What used to be a +.20 might now get close to a .30.
Could you post examples of this? I have just perceived the opposite trend over the years (the scale has been approaching 0.00 consistently... some positions that used to be 0.60 are 0.30 now...), and since I have used Stockfish pretty much daily for analysis of positions since Stockfish 1.01 JA I find it bizarre that something of this magnitude happened right under my nose, and didn't affect me at all.
lkaufman wrote: Mon Sep 02, 2019 5:41 pm 0.1 not so much...I'm sure I'm not the only GM with this opinion.
But this makes no sense. A difference of 0.1 is significant because it means if you continue playing moves that are 0.1 worse than best, in 9 moves the difference will be 0.90 from best.

I guess such a mentality would explain how I'm able to build huge advantages against people on corr chess (if they don't think small mistakes pile up), but 0.1 has to be significant because it means in future positions you're going to restrict yourself to the best move or nothing (assuming you don't want a 1.00 advantage for your opponent in 10 moves, once you play nine 0.10 worse than best moves and one 0.05 worse than best move, you're at 0.95 disadvantage and then the 0.10 worse than best move puts you over the threshold and becomes significant, so it had to be significant before this point. In general for Stockfish, a 0.01 difference is significant once you're at 0.19 disadvantage, because the breaking point at which you're going to only draw or lose the game without chances of winning happens around there.)
With SF it is the basic definition of what score to return when a clean pawn ahead on a full board that has changed many times. I usually use the opening position with f2 removed to measure this, because the slight weakening of the White king roughly offsets his first move advantage so a score of close to 1.00 would seem to me to be ideal. Just compare the score for different SF versions.
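For anyone who wants to repeat this comparison, here is a minimal sketch of how the test position could be prepared. It only builds the FEN for the opening position with the f2 pawn removed and the two standard UCI commands (`position fen ...` and `go depth ...`) to send to each engine version; it does not run any engine. The helper names and the search depth of 20 are arbitrary assumptions for illustration, not something specified in this thread.

```python
START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"


def remove_piece(fen, square):
    """Return the FEN with whatever stands on `square` (e.g. 'f2') removed."""
    placement, rest = fen.split(" ", 1)
    file_idx = "abcdefgh".index(square[0])
    rank_idx = 8 - int(square[1])  # FEN lists rank 8 first
    ranks = placement.split("/")

    # Expand digits into one "1" per empty square so files can be indexed.
    cells = []
    for ch in ranks[rank_idx]:
        cells.extend("1" * int(ch) if ch.isdigit() else ch)
    cells[file_idx] = "1"

    # Re-compress runs of empty squares back into digits.
    out, run = [], 0
    for cell in cells:
        if cell == "1":
            run += 1
        else:
            if run:
                out.append(str(run))
                run = 0
            out.append(cell)
    if run:
        out.append(str(run))

    ranks[rank_idx] = "".join(out)
    return "/".join(ranks) + " " + rest


def uci_commands(fen, depth):
    """Return the UCI commands that set up `fen` and start a fixed-depth search."""
    return ["position fen " + fen, "go depth " + str(depth)]


test_fen = remove_piece(START_FEN, "f2")
for command in uci_commands(test_fen, 20):  # depth 20 is an arbitrary choice
    print(command)
```

Sending the same two commands to different SF versions and comparing the reported score gives a rough picture of how the scale has moved between releases.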
I certainly don't think that a 0.1 change has no significance, just that it is small enough to allow a strongly held belief by a 2800 player that the second move is better to overrule it. For a 0.2 change he would have to have some objective evidence beyond just belief for me to accept his choice as better.

Re: Komodo vs. Larry K on chess.com

Posted: Fri Sep 06, 2019 1:04 am
by jp
For Stockfish with multiPV you often see two lines that are just transpositions of each other and have different evaluations, but I don't remember whether the differences are usually 0.1 or more.

Re: Komodo vs. Larry K on chess.com

Posted: Sat Sep 07, 2019 7:02 pm
by leavenfish
lkaufman wrote: Thu Sep 05, 2019 6:11 pm
With SF it is the basic definition of what score to return when a clean pawn ahead on a full board that has changed many times. I usually use the opening position with f2 removed to measure this, because the slight weakening of the White king roughly offsets his first move advantage so a score of close to 1.00 would seem to me to be ideal. Just compare the score for different SF versions.
I certainly don't think that a 0.1 change has no significance, just that it is small enough to allow a strongly held belief by a 2800 player that the second move is better to overrule it. For a 0.2 change he would have to have some objective evidence beyond just belief for me to accept his choice as better.
Let's grant for a moment that you are correct in what you say. OTB is a game of mistakes (great and small) and less than perfect moves during the course of a game...from duffer to GM.

If line A gives an eval of +.12 (or less...say .09) and line B gives an eval of +.14, line A could be harder for your opponent to navigate and ultimately be the better choice.

I know you are not arguing against that obvious point...just pointing out that such arguments as these occur within a vacuum of sorts.

I have never tried this, but by your argument it would seem that even in Stockfish 10 vs Stockfish 10 on the same hardware, once one side got an edge of say .02, the other side would never win. But perhaps I am taking your argument too far?

Re: Komodo vs. Larry K on chess.com

Posted: Sat Sep 07, 2019 7:14 pm
by lkaufman
leavenfish wrote: Sat Sep 07, 2019 7:02 pm
Let's grant for a moment that you are correct in what you say. OTB is a game of mistakes (great and small) and less than perfect moves during the course of a game...from duffer to GM.

If line A gives an eval of +.12 (or less...say .09) and line B gives an eval of +.14, line A could be harder for your opponent to navigate and ultimately be the better choice.

I know you are not arguing against that obvious point...just pointing out that such arguments as these occur within a vacuum of sorts.

I have never tried this, but by your argument it would seem that even in Stockfish 10 vs Stockfish 10 on the same hardware, once one side got an edge of say .02, the other side would never win. But perhaps I am taking your argument too far?
Of course that does not follow. Even with engines chess is still a game of mistakes. Having a plus score is no guarantee of victory or even of getting a draw. It just means that if you want to bet on which side will win, you would be foolish to bet on the side with the worse score unless you have some other information or insight.

Re: Komodo vs. Larry K on chess.com

Posted: Sat Sep 07, 2019 9:14 pm
by Ovyron
lkaufman wrote: Thu Sep 05, 2019 6:11 pm With SF it is the basic definition of what score to return when a clean pawn ahead on a full board that has changed many times. I usually use the opening position with f2 removed to measure this
I really don't think that works in practice for actual chess positions. I have full records of actual chess positions (usually openings) that I've analyzed over the years since 2007, with Glaurung and Stockfish versions more and more modern included, where it's clear the trend has been getting a score closer to 0.00 - positions where old Stockfish used to say "0.80" and today's Stockfish says "0.19". And I hadn't seen a jump in recent years in the scale.

If anything, over the years Stockfish's eval and the material on the board have become disjointed, 1.00 as an advantage has not meant "a pawn advantage" for a long while now, and I don't even know what it means anymore. All I know is that, all things being equal, you'd rather have that 1.00 advantage than a 0.90 advantage, and it usually correlates with chances of winning, especially against weak opposition.
lkaufman wrote: Thu Sep 05, 2019 6:11 pm I certainly don't think that a 0.1 change has no significance, just that it is small enough to allow a strongly held belief by a 2800 player that the second move is better to overrule it.
I'd agree, especially if the 0.1 better move is trading a very important piece that may be key to winning, or breaking a pawn structure that one just knows is worse than the alternative, etc. My question is, how many of such overrules are you going to allow per game, and then, why are the overrules closer to the limit more significant than the others? (say, if you allow 5 such overrules, once you have burned out 4, the last one is significant because it means you can't do another in the game.)

My claim is that overruling depends on the advantage or disadvantage that you already have in the game, so a "0.1" score difference can't be applied to all situations. A difference between 0.90 and 1.00 disadvantage might be the difference between saving and losing the game, while a difference between 0.00 and 0.10 might be the difference between increasing your chances of winning or giving the opponent an easy draw. Both are significant scenarios at the edge, and there's no point I can discern where a difference of this magnitude is insignificant.