Dylan Sharp Vs. Harvey Williamson (G4)

Discussion of computer chess matches and engine tournaments.

Moderator: Ras

jp
Posts: 1480
Joined: Mon Apr 23, 2018 7:54 am

Re: Dylan Sharp Vs. Harvey Williamson (G4)

Post by jp »

Ras wrote: Thu Oct 24, 2019 7:01 pm
zullil wrote: Thu Oct 24, 2019 6:01 pm Can you provide one example at a time control like this one? What you say is certainly true, but is much more common at very short time controls.
In correspondence chess, players who follow Stockfish blindly are not very hard opponents for experienced correspondence players. At least one, maybe even both of the players have experience in correspondence chess. LC0 was just an example that some structural issues do exist even with Stockfish, as incredibly strong as it is.
Of course engines of any type are far below perfect play.

But as for SF vs. Lc0, the examples in the main forum so far show that at long TC they agree most of the time (even when we humans suspect they both are "wrong", sometimes to 14 or 20 plies), and I'm still waiting for examples of great moves that Lc0 comes up with that SF cannot find. So far there are none. People have claimed examples and they have been shown to be wrong.
zullil
Posts: 6442
Joined: Tue Jan 09, 2007 12:31 am
Location: PA USA
Full name: Louis Zulli

Re: Dylan Sharp Vs. Harvey Williamson (G4)

Post by zullil »

Harvey Williamson wrote: Thu Oct 24, 2019 1:40 pm A fast reply today as I expected this.
[pgn]1. g4 d5 2. g5 e5 3. d4 exd4 4. Nf3 c5 5. Bg2 Ne7 6. c3 dxc3 7. Nxc3 Nbc6 8. O-O d4[/pgn] and another conditional if 9. Ne4 Ng6
-1.52 at depth 63.
Ovyron
Posts: 4558
Joined: Tue Jul 03, 2007 4:30 am

Re: Dylan Sharp Vs. Harvey Williamson (G4)

Post by Ovyron »

jp wrote: Thu Oct 24, 2019 1:58 pm
Ovyron wrote: Thu Oct 24, 2019 1:41 am A) The evaluation gets down very near 0. This meant the PV had a blunder by black somewhere and the eval from the root couldn't be trusted.
B) The evaluation will explode to the high 2.00 or more. This meant the PV had a blunder by white somewhere and the eval from the root couldn't be trusted.
C) The evaluation remains stable, so around where it is now, because of the law of averages or because the moves from both sides were very good.

If C happens then you'll have from this line what you'd have seen at Depth 90 from the root! :D

Is this what you meant when you wrote the post below?

Ovyron wrote: Mon Oct 14, 2019 5:12 pm you just need to have a third party software like Chess Openings Wizard and backsolve manually scores of positions, and then you'll have accurate scores of everything you analyze, and all you need to do for positions you already analyzed is exclude the moves you already have evaluated and ask the engine what it thinks about the rest with something like my patched McBrain X with Smarter Tactical setting, and this will beat the analysis of a hash full of relevant positions, something like Persistent Hash, or a Stockfish with learning, but you have to actually put the work into it and analyze the relevant positions to evaluate them.
No. The whole point was going to be that once you run the experiment and see what happens, which is practice and not theory, you'll end up with a line. In the A case you find the blunder and find the best line so black continues the attack. In case B you find the blunder and find the best line so white continues to defend. In case C you check for improvements for both sides and find the best line so the game continues on.

Whatever happens once you try this, the challenge, the question is "is there any way you can find this line with the eval you end with in a faster time?" And the answer is: Yes, yes you can. In fact, you can find this line with this score in a time faster than what it'd take to reach Depth 60 at the root.

Once you find this method that finds high-depth lines with accurate scores in a short time, you don't need to reach high depth ever again. But you don't need to stop there: the point of this method is that you can emulate any arbitrary depth. 60 is being thrown around because that's what Zullil is throwing at us, but the limit is self-refutation, the point where the evals of all moves flatten because the position is 0.00. Against an inferior opponent there could have been a line that defeated them, but you saw how the opponent would have defended it, so you don't play it, and the game is drawn. In this game I have no hopes of winning, so self-refutation will not be a problem.
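The backsolving idea above (evaluate the positions at the ends of the lines you analyzed, then propagate the scores back to the root) can be sketched as a plain minimax over a hand-built tree. This is only an illustration under stated assumptions: the `Node` structure, the move labels and the scores are all made up, and it is not how Chess Openings Wizard or any engine stores its data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """A position in a hand-built analysis tree (hypothetical structure)."""
    leaf_score: Optional[float] = None  # eval in pawns, from White's point of view
    children: Dict[str, "Node"] = field(default_factory=dict)  # move -> next position


def backsolve(node: Node, white_to_move: bool) -> Tuple[float, List[str]]:
    """Return (score, line): the minimax score from White's view and the line behind it."""
    if not node.children:
        if node.leaf_score is None:
            raise ValueError("leaf position has no evaluation")
        return node.leaf_score, []
    # White prefers the highest score, Black the lowest.
    pick = max if white_to_move else min
    results = {}
    for move, child in node.children.items():
        score, line = backsolve(child, not white_to_move)
        results[move] = (score, [move] + line)
    best_move = pick(results, key=lambda m: results[m][0])
    return results[best_move]


# Made-up tree: White to move at the root, placeholder move names and scores.
tree = Node(children={
    "W1": Node(children={"B1": Node(leaf_score=-1.5), "B2": Node(leaf_score=-0.4)}),
    "W2": Node(children={"B3": Node(leaf_score=-0.9)}),
})
score, line = backsolve(tree, white_to_move=True)
print(score, line)
```

In this toy tree W1 looks fine if Black answers B2, but Black would pick B1, so the root score comes from W2 instead. Adding or re-scoring a leaf and running the function again is the "interaction" step.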
Zenmastur wrote: Thu Oct 24, 2019 1:59 pm Your machine is so old there's probably not much you can do about it until you upgrade it. I'll give you one hint: It has nothing to do with a lack of processing power.
My machine is fast enough that I've been able to find the moves I'm playing against Harvey in 10-60 minutes, I've spent most of my time analyzing my other 16 games. And this is without ever reaching any depth near 60. The higher the Depth the fewer positions you can analyze. If Zullil's computer is 10 times faster than mine, he could do what I do and find the Depth 60 line with my eval (which has been more stable than his) in 6 minutes. Unless he's really reaching Depth 60 in 6 minutes, whatever time he's using to reach it is mostly a waste of resources. Because he doesn't interact.

Since there's no clock, I have all the time in the world to decide on my moves. If I upgraded my machine, all it'd do is allow me to find my moves faster (with a machine twice as fast I would find them in 5-30 minutes?), but I think what I have lets me play them in reasonable time. Someone even complained that we're playing too fast :shock:

The scariest thing in this game is Harvey's brain, nobody but him has access to that resource.
jp
Posts: 1480
Joined: Mon Apr 23, 2018 7:54 am

Re: Dylan Sharp Vs. Harvey Williamson (G4)

Post by jp »

Ovyron wrote: Fri Oct 25, 2019 3:05 am Whatever happens once you try this, the challenge, the question is "is there any way you can find this line with the eval you end with in a faster time?" And the answer is: Yes, yes you can. In fact, you can find this line with this score in a time faster than what it'd take to reach Depth 60 at the root.
The real question is: "What's the fastest way to find the PV and reliable eval and know with very high confidence that they are correct?"

There's much strange behavior in the main forum where people become attached to moves by trendy engines and don't care to check whether those moves are correct.
Ovyron
Posts: 4558
Joined: Tue Jul 03, 2007 4:30 am

Re: Dylan Sharp Vs. Harvey Williamson (G4)

Post by Ovyron »

jp wrote: Fri Oct 25, 2019 5:21 am The real question is: "What's the fastest way to find the PV and reliable eval and know with very high confidence that they are correct?"
As I've said in other threads there's no such thing as "reliable eval", because all that matters is the ranking of the moves. An eval would just be a number that helps you compare lines and variations, but it doesn't matter what number it is. That is, if you have a method to rank the best moves at the top every time, it doesn't matter if the eval number is twice as much, or half as much.

What I'm saying is that Zullil's -1.50 doesn't mean more than if it was -0.75 or -3.00. What -1.50 means is that all the other alternatives by black on the mainline have a score closer to zero than -1.50 and all the other alternatives by white have a score farther from zero than -1.50, so you'd rank them worse than the PV.
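A small sketch of the ranking argument: multiplying every score by the same positive factor changes the numbers but not the order of the moves. The move labels and scores below are made up, and `rank_moves` is a hypothetical helper, not an engine feature.

```python
def rank_moves(scores, black_to_move=True):
    """Sort moves best-first for the side to move.

    Scores are in pawns from White's point of view, so Black wants the
    lowest number and White the highest.
    """
    return sorted(scores, key=scores.get, reverse=not black_to_move)


# Made-up candidate moves for Black with made-up scores.
scores = {"m1": -1.50, "m2": -1.10, "m3": -0.60}
halved = {move: s / 2 for move, s in scores.items()}
doubled = {move: s * 2 for move, s in scores.items()}

print(rank_moves(scores))
print(rank_moves(halved))
print(rank_moves(doubled))
```

All three calls give the same order, which is the sense in which -1.50, -0.75 and -3.00 carry the same information for choosing a move.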

The only way to gain very high confidence in your eval is to actually play the moves on the board and check whether they're indeed best, and SPOILER ALERT: if all you did was sit at the root waiting for a PV and a score, you'll have very low confidence in its eval, because the PV will have a move somewhere that you'd not rank first if you were in that position and had to play a move.

I hope Zullil is saving the Depth 60 PVs and can confirm what I say: every time he reports a score it's coming from an entirely different PV from the last report and the first thing Stockfish dev does when starting the next iteration is realizing how dumb the moves at the tail of the variation are and has to find new ones that don't suck as badly :mrgreen:

Conditional accepted:

1. g4 d5 2. g5 e5 3. d4 exd4 4. Nf3 c5 5. Bg2 Ne7 6. c3 dxc3 7. Nxc3 Nbc6 8. O-O d4 9. Ne4 Ng6

[d]r1bqkb1r/pp3ppp/2n3n1/2p3P1/3pN3/5N2/PP2PPBP/R1BQ1RK1 w kq -

My turn again.
zullil
Posts: 6442
Joined: Tue Jan 09, 2007 12:31 am
Location: PA USA
Full name: Louis Zulli

Re: Dylan Sharp Vs. Harvey Williamson (G4)

Post by zullil »

jp wrote: Fri Oct 25, 2019 5:21 am
Ovyron wrote: Fri Oct 25, 2019 3:05 am Whatever happens once you try this, the challenge, the question is "is there any way you can find this line with the eval you end with in a faster time?" And the answer is: Yes, yes you can. In fact, you can find this line with this score in a time faster than what it'd take to reach Depth 60 at the root.
The real question is: "What's the fastest way to find the PV and reliable eval and know with very high confidence that they are correct?"

There's much strange behavior in the main forum where people become attached to moves by trendy engines and don't care to check whether those moves are correct.
Actually, the PV and evaluation are irrelevant to me. All I need is a "best" move in the current position. I'm posting evals from depths above 60 simply to fill the time waiting for the humans to respond.

What I'm hoping to see is evidence from White that his methods work better than what I've been doing, which is basically nothing.

-1.50 at depth 64, by the way.

[EDIT] Two more Stockfish moves have just been played. The eval above was from the position before those moves were made.
jp
Posts: 1480
Joined: Mon Apr 23, 2018 7:54 am

Re: Dylan Sharp Vs. Harvey Williamson (G4)

Post by jp »

Ovyron wrote: Fri Oct 25, 2019 10:28 am
jp wrote: Fri Oct 25, 2019 5:21 am The real question is: "What's the fastest way to find the PV and reliable eval and know with very high confidence that they are correct?"
As I've said in other threads there's no such thing as "reliable eval", because all that matters is the ranking of the moves. An eval would just be a number that helps you compare lines and variations, but it doesn't matter what number it is. That is, if you have a method to rank the best moves at the top every time, it doesn't matter if the eval number is twice as much, or half as much.
If it's not what matters to you, it doesn't mean there's "no such thing", but that's not the question here anyway. Here, you can take "reliable eval" to mean reliable ranking of moves.

The main point was the reliable part. Correct. True. Well, looking back, I see I put the important part in bold ("know with very high confidence that they are correct"), but that was ignored. :wink:
It's not much help getting an answer really fast if you don't know it's reliable.
MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

Re: Dylan Sharp Vs. Harvey Williamson (G4)

Post by MikeB »

zullil wrote: Fri Oct 25, 2019 10:52 am
jp wrote: Fri Oct 25, 2019 5:21 am
Ovyron wrote: Fri Oct 25, 2019 3:05 am Whatever happens once you try this, the challenge, the question is "is there any way you can find this line with the eval you end with in a faster time?" And the answer is: Yes, yes you can. In fact, you can find this line with this score in a time faster than what it'd take to reach Depth 60 at the root.
The real question is: "What's the fastest way to find the PV and reliable eval and know with very high confidence that they are correct?"

There's much strange behavior in the main forum where people become attached to moves by trendy engines and don't care to check whether those moves are correct.
Actually, the PV and evaluation are irrelevant to me. All I need is a "best" move in the current position. I'm posting evals from depths above 60 simply to fill the time waiting for the humans to respond.

What I'm hoping to see is evidence from White that his methods work better than what I've been doing, which is basically nothing.

-1.50 at depth 64, by the way.

[EDIT] Two more Stockfish moves have just been played. The eval above was from the position before those moves were made.

+1.5 for Black appears to be inflated; maybe it should be somewhere between +0.9 and +1.1 for Black. This kind of evaluation inflation is typical for Stockfish. What's the point? Well, +1.5 pawns would indicate a higher scoring percentage than is realistic for this position. For this position, the expected outcome of a draw is very high, about 80%. Komodo's evaluation is around 1.2 to 1.25 pawns after a few minutes, which is closer to the truth in my view. Others may disagree, which is fine - not here to debate, just giving my opinion.
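To make the scoring-percentage point concrete, here is a minimal sketch that maps an evaluation in pawns to an expected score with a logistic curve. The curve and its scale constant of 4 are a commonly quoted rule of thumb, not something taken from Stockfish or Komodo, and the function name is made up for illustration.

```python
def expected_score(pawns: float, scale: float = 4.0) -> float:
    """Rule-of-thumb expected score (0..1) for the side that is `pawns` ahead.

    Assumption: a logistic curve with base 10 and a scale of 4 pawns.
    Real engines and real positions can deviate a lot from this.
    """
    return 1.0 / (1.0 + 10.0 ** (-pawns / scale))


for eval_pawns in (0.9, 1.1, 1.2, 1.5):
    print(f"{eval_pawns:+.2f} -> {expected_score(eval_pawns):.2f}")
```

Under this assumed curve an eval of 1.5 corresponds to roughly a 70% expected score, while 1.0 corresponds to about 64%. For comparison, if 80% of games were drawn and Black won all the rest, Black's expected score would be 0.8 * 0.5 + 0.2 = 0.60.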
jp
Posts: 1480
Joined: Mon Apr 23, 2018 7:54 am

Re: Dylan Sharp Vs. Harvey Williamson (G4)

Post by jp »

Ovyron wrote: Fri Oct 25, 2019 10:28 am I hope Zullil is saving the Depth 60 PVs and can confirm what I say: every time he reports a score it's coming from an entirely different PV from the last report and the first thing Stockfish dev does when starting the next iteration is realizing how dumb the moves at the tail of the variation are and has to find new ones that don't suck as badly
Of course the tails of the variations should not be trusted. It's better to go to depth 60 not because you demand that the 60th ply makes sense but because its root choice at depth 60 is probably more reliable than its root choice at depth 30.
zullil
Posts: 6442
Joined: Tue Jan 09, 2007 12:31 am
Location: PA USA
Full name: Louis Zulli

Re: Dylan Sharp Vs. Harvey Williamson (G4)

Post by zullil »

jp wrote: Fri Oct 25, 2019 2:17 pm
Ovyron wrote: Fri Oct 25, 2019 10:28 am I hope Zullil is saving the Depth 60 PVs and can confirm what I say: every time he reports a score it's coming from an entirely different PV from the last report and the first thing Stockfish dev does when starting the next iteration is realizing how dumb the moves at the tail of the variation are and has to find new ones that don't suck as badly
Of course the tails of the variations should not be trusted. It's better to go to depth 60 not because you demand that the 60th ply makes sense but because its root choice at depth 60 is probably more reliable than its root choice at depth 30.
I'm not saving any of the PVs. As I said, I couldn't care less about anything other than the move at hand. FWIW, the PVs are like 60 plies long, but have been stable to maybe 30 plies for the last 12 plies of this game. Just waiting for someone to reveal a move that is different from Stockfish's choice. My hunch is that, if White ever does that, he loses.