hgm wrote: The 'unbalancing' of the openings was obviously done very poorly, and a complete bust. They were nowhere near unbalanced enough. Pawn odds is known to produce around a 20% advantage (i.e. a 30% score for the handicapped side), and even this falls short of the mark, since a 50-50 mix of wins and draws would already yield a 75% score.
So it was predictable that the out-of-book advantage would have to be a Pawn and then some, and the chosen lines provided only a fraction of that.
I'm not sure where you got your data, but a 70% score for a healthy extra pawn for Komodo from the start sounds pretty low, assuming you are averaging White and Black. Keep in mind that in the initial position removing any pawn other than the f-pawn automatically gives some compensation in mobility. An eval of about 0.7 to 0.75 in the opening seems to be roughly the level at which Komodo produces equal numbers of White wins and draws, which is about what an extra pawn in the opening would show if all else were equal. But yes, the positions were not unbalanced enough to avoid excessive draws.
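The score arithmetic in the discussion above is easy to check: a win counts 1, a draw 0.5, a loss 0. A minimal sketch (the W/D/L splits below are hypothetical round numbers, not results from the thread):

```python
def expected_score(win, draw, loss):
    """Expected score per game: win = 1, draw = 0.5, loss = 0."""
    total = win + draw + loss
    return (win + 0.5 * draw) / total

# Hypothetical pawn-odds split giving the ~70% score mentioned above:
print(expected_score(win=55, draw=30, loss=15))  # 0.7
# A 50-50 mix of wins and draws, no losses:
print(expected_score(win=50, draw=50, loss=0))   # 0.75
```

This is why "equal numbers of wins and draws" pins the expected score at 75%, not 70%.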
The problem is that there is no right amount of imbalance once the engines start to play close to "perfection".
If 0.72 out of the opening is still a theoretical draw, then perfect engines will draw it, no matter what. If the engines are only close to perfect, you might get a few wins and losses, but they will be rare.
Similarly, if 0.73 out of the opening is a theoretical win for white, then perfect engines will win it as white, no matter what.
A high win rate is very easy to achieve: just start with positions that have a winning advantage for white or for black. But this will not guarantee more interesting games than games starting from an equal position.
We are very far from this scenario. To show this, I plotted the expected performance as a function of eval for a recent Stockfish, at time controls separated by a factor of 60:
TCEC uses roughly another factor of 60 longer time control than 2' + 1.2'', so imagine the TCEC line somewhere above, separated by an amount similar to the spacing already seen. That is still nowhere near your scenario, and observe that over the relevant eval range [0.50, 1.00] the variation is quite small. So choosing good unbalanced openings for TCEC is feasible, and we are very far away from perfect play.
With the help of Ferdy (I had problems with the TCEC PGN file formatting), I made a similar plot for Komodo in the Superfinal. With only 100 games the result is very rough, but it is illuminating in the same sense: talking of behavior "close to a perfect engine" at TCEC doesn't make much sense. In fact, the behavior is more similar to that at the 2'+1.2'' time control, as I predicted in the previous post.
In fact, this evening I got curious about TCEC games versus ultra-fast games: how the expected outcome of Komodo's games depends on the eval. The comparison is the TCEC Season 8 Superfinal versus a 5 sec + 0.05 sec time control on one core.
The effective overall difference (in average searched nodes per move) is about a factor of 10,000 between the two cases. The Elo difference between these two scenarios is probably no less than 800 points. The shape of the curve seems to converge slowly toward the perfect engine proposed by Ronald, but it seems, at least visually ("visual regression"), that to get close to a perfect engine at least another 800 Elo points above TCEC are needed, probably more. Also, the "sweet spot", where roughly half the games are wins and half are draws (with a few losses), is at about eval 0.75 in fast games for Komodo; in TCEC games this value seems to increase to 0.90. I think the TCEC Superfinal needed openings unbalanced by something like 0.80, instead of the 0.30-0.50 used in the last 34 games.
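To put the 800-Elo figure in perspective, the standard Elo expectancy formula (general rating math, not something from this thread) gives the head-to-head expected score for a given rating gap:

```python
import math

def elo_expected_score(diff):
    """Standard Elo logistic: expected score of a player `diff` points stronger."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

print(elo_expected_score(800))  # ~0.99: an 800-point gap is near-total dominance
print(elo_expected_score(0))    # 0.5: equal opponents
```

So a further 800 Elo above TCEC is a very large amount of remaining strength, consistent with the claim that we are far from perfect play.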
Here's an opening I found with an exit eval of -0.79 from K9.3 at depth 29:
1. e4 c6 2. d4 d5 3. Nc3 dxe4 4. f3 exf3 5. Nxf3 Nf6 6. Bc4 Bf5 7. Ne5 e6 8. O-O
[d]rn1qkb1r/pp3ppp/2p1pn2/4Nb2/2BP4/2N5/PPP3PP/R1BQ1RK1 b kq - 0 8
Kai: I have a small quibble about that "perfect engine" step function-like curve.
I would say a perfect engine doesn't use a continuous eval function, with +1.00 (or +0.9 or +1.1 or whatever) equal to a win. A perfect engine has only three evals: win/draw/loss.
Now, you *can* translate this to real-world engine evals of +1/0/-1, but each engine has its own eval, and consequently a critical win/draw/loss step value of its own.
In other words, what matters is not where the eval curve has its centre, but how steep the climb from loss through draw to win is. So showing the same perfect step function for both engines might be slightly misleading.
If you see what I mean... I'm not sure I was clear. In fact, you're probably thinking the same as me... lol. There's another thing, but I'll keep that for later.
whereagles wrote: Kai: I have a small quibble about that "perfect engine" step function-like curve. [...]
Sure, you are right: this "perfect engine" step function is just a guide to the limit toward which heuristics-based engines like K and SF tend with infinite time control and/or infinite hardware. If a perfect engine is ever built, it will surely have nothing to do with the step function shown. As for it being identical for each engine: my goal was not to make it identical for K and SF (in fact the last plot has its step at eval = 0.9, the previous two at 0.8), but to place the step where the function is probably steepest. According to my previous experiments, the highest sensitivity is in the region where wins equal draws with a few losses, i.e. a total expected performance of about 70% (say 47% wins, 47% draws, 6% losses). This guidance is somewhat confirmed in the last plot, where TCEC Komodo indeed tends a bit toward the fantasy "perfect engine" step function at 0.90, compared to very short time controls.
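The contrast between a heuristic engine's smooth eval-to-score curve and the idealized step function can be sketched with a logistic model. This is purely illustrative (the threshold and steepness values are assumptions, not fitted to the actual data); the point is that as the steepness grows, the curve approaches a step at the critical eval:

```python
import math

def expected_score_model(eval_pawns, threshold=0.9, steepness=5.0):
    """Illustrative logistic model of self-play expected score vs. opening eval.
    Score ranges from 0.5 (sure draw) to 1.0 (sure win); losses are ignored.
    As `steepness` grows, the curve approaches a step function at `threshold`,
    the idealized 'perfect engine' limit discussed in the thread."""
    return 0.5 + 0.5 / (1.0 + math.exp(-steepness * (eval_pawns - threshold)))

# A stronger (slower) engine corresponds to a steeper curve:
for s in (5.0, 50.0, 1000.0):
    print(s, round(expected_score_model(0.6, steepness=s), 3),
          round(expected_score_model(1.2, steepness=s), 3))
```

At the threshold itself the model always gives 0.75, matching the "half wins, half draws" sweet spot described above.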
Laskos wrote:In fact, this evening I got curious about TCEC games versus ultra-fast games in Komodo's dependency of expected outcome of the games and the value of the eval.
Sorry if I have missed this, but what are your graphs showing exactly?
Does each point correspond to multiple self-plays from one and the same opening position? Or does each point correspond to self-plays from various opening positions, each with (approximately) the same initial evaluation?
(Of course different points on the same line correspond to different opening positions.)
Different middlegame positions (moves 20-30) within an eval interval (a shown eval of 0.4 means the interval [0.3, 0.5]). I can only work with these moving averages because I need to collect sufficient data, which is easy for many ultra-fast games but hard with only 100 TCEC games. The short games are self-play, done here locally; the TCEC Superfinal games are Komodo-Stockfish.
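The bucketing described above can be sketched as: collect (eval, score) pairs, then average the scores of all games whose middlegame eval falls within ±0.1 of the bucket center. The data below is hypothetical, just to show the mechanics:

```python
def binned_score(games, center, halfwidth=0.1):
    """Average score of the games whose middlegame eval lies in
    [center - halfwidth, center + halfwidth] -- the moving-average
    bucketing described above. `games` is a list of (eval, score)
    pairs, score being 1 / 0.5 / 0 for the side with the advantage."""
    hits = [score for ev, score in games
            if center - halfwidth <= ev <= center + halfwidth]
    return sum(hits) / len(hits) if hits else None

# Hypothetical (eval, score) pairs, not real TCEC data:
games = [(0.31, 0.5), (0.38, 1.0), (0.42, 0.5), (0.47, 0.5), (0.55, 1.0)]
print(binned_score(games, 0.4))  # averages the four games with eval in [0.3, 0.5]
```

With only 100 TCEC games, each bucket contains very few samples, which is why the Superfinal curve is so rough.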
If each point represents the average result over several starting positions (with approximately the same eval), then the graph for the "perfect engine" most likely won't look like a step function. Some positions evaluated at 0.40 will be theoretical draws (and will be drawn by a perfect engine in self-play), while others will be theoretical wins (and will be won by a perfect engine in self-play). So the "perfect" graph will look like those for SF and K.
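This mixture argument can be made concrete: if a fraction p of the positions in a bucket are theoretical wins and the rest theoretical draws (assuming no theoretical losses, for simplicity), a perfect engine's average self-play score over the bucket is p·1 + (1−p)·0.5, which varies smoothly with p even though every individual game is decided with certainty. The p(eval) values below are invented for illustration:

```python
def perfect_engine_avg_score(p_win):
    """Average self-play score of a perfect engine over a bucket of positions
    in which a fraction `p_win` are theoretical wins (score 1) and the rest
    theoretical draws (score 0.5). Theoretical losses are assumed absent."""
    return p_win * 1.0 + (1.0 - p_win) * 0.5

# Hypothetical fraction of theoretical wins rising with eval:
for ev, p in [(0.2, 0.05), (0.4, 0.25), (0.6, 0.55), (0.8, 0.85)]:
    print(ev, perfect_engine_avg_score(p))
```

The result is a smooth curve, not a step: the step would appear only if every position with the same eval had the same theoretical outcome.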