It is not going to be useful to start from positions where White always wins, because you get practically no information from them.
It is certainly possible to try lines like 1.e4 h6 or 1.d4 a6 in a generic book.
White gets a clear advantage out of the opening, but the main question is whether the variety of the results at long time control is higher than the variety of the results with normal books.
It is only possible to find that out by testing.
Even if the variety is higher, it does not mean that the testing is better, because some engine may be relatively better in lines like 1.e4 h6 and relatively worse in normal lines.
I guess that you get the biggest variety from positions where White wins only if it avoids a trap that would let Black win, and where half of the top engines fall into the trap at long time control. But that does not mean these positions are good for evaluating the level of the engines, because the ability to avoid tactical traps is only part of chess strength.
Notice- Some EXTREMELY Interesting Stockfish Results!
Moderator: Ras
-
Uri Blass
- Posts: 11150
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
-
hgm
- Posts: 28451
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Notice- Some EXTREMELY Interesting Stockfish Results!
I am not sure what exactly you mean by 'variety' here. Is that the same as 'fewer draws'?
Obviously it is just as much a waste of time to test from positions that are (nearly) always won as it is to test from positions that are nearly always drawn. My point was that you will get the best efficiency by starting from positions that are on the boundary between won and drawn, so that even the slightest difference in quality of play can swing the result.
Starting from positions that require one of the players to give up a large amount of score before it gets expressed in the result will become very inefficient once you reach the level of play where this will almost never happen. This applies just as well to positions that are too deep in the 'won sector' as it applies to positions that are too deep in the 'draw sector' (i.e. too far from the boundary between the two). The latter obviously extends over a much smaller range than the win or loss sector, though, as a position that is very far from being won is usually no longer a draw either, but plainly lost.
And you are right in pointing out that only looking at the win-vs-draw statistics for a position is not sufficient to guarantee it is a good starting position. If the difference is only due to avoiding a single trap or falling for it, after which the game is decided, it tests only a very limited aspect of the engine, compared to a game where every move could contribute to the result. Ability to avoid traps is much more efficiently tested through EPD suites of trap positions.
Nevertheless, traps do occur in Chess, and the ability to not fall for them is part of the engine strength. So an occasional position close to a decisive move is not disastrous, as long as you don't systematically select such positions.
Whether starting from 'on the edge' positions has an effect on quantitative rating measurement would have to be tested, and if it has, it would be debatable whether the effect is good or bad. The ability to defend a poor position and the ability to retain the win in a better position should be important factors that contribute to engine strength. Excluding positions where the engine is in such a situation (e.g. because the level of play is so high that, starting from an equal position, a significant disadvantage only arises once the early end-game is reached) could then fail to expose that an engine is quite poor at this, and fail to distinguish it from engines that are good at it.
Note that people do not only use engines to play engine-engine games from the standard FIDE setup. The overwhelming majority of engine users actually don't do that at all. Most people use engines to analyze human games. So the fact that an engine is too strong to ever find itself in a lost or nearly lost position soon after the opening is not really an argument for considering such positions irrelevant.
-
Uri Blass
- Posts: 11150
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: Notice- Some EXTREMELY Interesting Stockfish Results!
Variety is not about the number of draws, and it has a mathematical definition.
If White always wins, the variety is obviously 0.
I simply wrote that White always winning gives no information because you wrote:
"It is totally trivial to make a generic book where every engine comes out of the opening a Queen ahead."
My point is that it is not going to be useful, and it is not obvious what is going to be useful without testing.
Positions where the probability of a win is 50% and the probability of a draw is 50% may be useful, but you need testing to find them.
The biggest variety is of course in positions with no draws, 50% wins for White and 50% wins for Black, and my point is that using only those is not useful because they may consist only of traps that half of the top programs fall into.
-
hgm
- Posts: 28451
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Notice- Some EXTREMELY Interesting Stockfish Results!
Uri Blass wrote: variety is not about number of draws and it has a mathematical definition. always white wins is obviously 0 variety.
I did not know that as a mathematical concept. Is this a non-numeric version of 'variance'?
Uri Blass wrote: I simply wrote that always white win give no information because you wrote: "It is totally trivial to make a generic book where every engine comes out of the opening a Queen ahead." My point is that it is not going to be useful
Well, I never claimed it would be useful to do it. Just trivial. So I think we are in total agreement here.
Uri Blass wrote: and it is not obvious what is going to be useful without testing. positions when the probability for a win is 50% and the probability for a draw is 50% may be useful but you need testing to find them.
You could use the evaluation of the position as a first approximation. If you have a large collection of engine games annotated with scores, you could select all positions that occurred between move 10 and 15 and had a score between 0.7 and 1.3 (say). With the usual engine evaluation these should result in scores of 60-75%. (Pawn odds typically scores 68-72%.) These should make a good starting point, and as you gather statistics using them, you can weed out those that do not give sufficient variety.
Uri Blass wrote: The biggest variety is of course with positions with no draws and 50% wins for white and 50% win for black and my point is that only using them is not useful because they may include only traps that half of the top programs fall into them.
Indeed, I agree that such positions would be suspect. But note I did not propose anything that would lead to selection of such positions above their natural occurrence. What I suggested was to use positions closer to the win-draw boundary, rather than in the center of the draw range. Such positions would presumably increase the number of wins at the expense of primarily the number of losses, and to a lesser extent the number of draws. Like having a W/D/L of 49/50/1 instead of 10/80/10.
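The selection step suggested above (positions from moves 10 to 15 with a score between 0.7 and 1.3) could be sketched roughly as follows. The `Position` record, its field names and the sample labels are hypothetical, and the score is assumed to be in pawns from the point of view of the side that is ahead; a real collection of annotated games would need its own parsing.

```python
from dataclasses import dataclass

@dataclass
class Position:
    label: str        # hypothetical identifier for the position
    move_number: int  # full-move number at which the position occurred
    score: float      # engine score in pawns (assumed: from the better side's view)

def select_candidates(positions, min_move=10, max_move=15,
                      min_score=0.7, max_score=1.3):
    """Keep positions inside the move window and the score window."""
    return [
        p for p in positions
        if min_move <= p.move_number <= max_move
        and min_score <= p.score <= max_score
    ]

# Made-up sample records, for illustration only.
sample = [
    Position("pos-a", 12, 0.95),  # inside both windows
    Position("pos-b", 8, 1.00),   # too early in the game
    Position("pos-c", 14, 0.20),  # too close to equal
    Position("pos-d", 15, 1.30),  # on the edge of both windows, kept
]
selected = select_candidates(sample)
print([p.label for p in selected])
```

The windows are plain parameters so they can be narrowed once statistics show which positions give too little variation.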
-
lucasart
- Posts: 3243
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: Notice- Some EXTREMELY Interesting Stockfish Results!
Uri Blass wrote: variety is not about number of draws and it has a mathematical definition.
Can you enlighten us on that? I would be interested to see your mathematical definition.
PS: Variety has indeed a mathematical definition, but I don't see what it's got to do with chess openings...
http://en.wikipedia.org/wiki/Algebraic_variety
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
-
Uri Blass
- Posts: 11150
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: Notice- Some EXTREMELY Interesting Stockfish Results!
I meant variance when I wrote variety.
-
hgm
- Posts: 28451
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Notice- Some EXTREMELY Interesting Stockfish Results!
OK, I was confused by that. Because I could imagine there exists some numeric measure for the variation in a non-numeric stochastic process.
E.g. consider a 3-player game where the outcome can be a win by R, G or B. No way to define a variance, but intuitively you would think that when the respective probabilities are 33/33/33 it has an outcome that varies more than with 50/0/50, which again has more than with 100/0/0 (which does not vary at all). But would 50/0/50 be more variation than 70/10/20?
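One candidate for such a measure, which nobody in this thread proposed and which is offered here only as an illustration, is the Shannon entropy of the outcome distribution. It needs no numeric value for the outcomes, only their probabilities. A minimal sketch:

```python
import math

def outcome_entropy(weights):
    """Shannon entropy (in bits) of a categorical outcome distribution.

    The weights are normalized first, so percentages or counts both work.
    """
    total = sum(weights)
    entropy = 0.0
    for w in weights:
        q = w / total
        if q > 0:  # outcomes that never happen contribute nothing
            entropy -= q * math.log2(q)
    return entropy

# The R/G/B distributions mentioned above; (1, 1, 1) stands for equal thirds.
for dist in [(100, 0, 0), (50, 0, 50), (70, 10, 20), (1, 1, 1)]:
    print(dist, round(outcome_entropy(dist), 3))
```

Under this particular measure 70/10/20 comes out slightly above 50/0/50 (about 1.16 bits against exactly 1 bit), with equal thirds highest and 100/0/0 at zero. Whether that ordering matches the intuition is a separate question.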
-
Uri Blass
- Posts: 11150
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: Notice- Some EXTREMELY Interesting Stockfish Results!
I think that what we need is a variance rating list for different openings.
Take 10 top engines and let them play the same opening against each other at a 40/20 time control, with every pair of engines getting both White and Black.
That gives 90 game results.
Calculate the variance of the result for the white pieces.
Do the same thing for many openings.
You can also do the same at a faster time control like 40/2, to see whether the variance goes down with a longer time control.
I believe that the score top programs show out of book does not give us enough information, because a 0.00 score can go with 80% draws and 10% wins for each side, or with 60% draws and 20% wins for each side, or even with unbalanced results like 50% wins for White and only 10% for Black.
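As a rough sketch of the bookkeeping in this proposal: the snippet below enumerates the colour-ordered pairings of 10 engines and computes the mean and variance of White's result (win = 1, draw = 0.5, loss = 0) for the three result distributions mentioned above. The engine names and the function name are placeholders, and the distributions are the hypothetical percentages from the post, not measured data.

```python
from itertools import permutations

def white_score_stats(wins, draws, losses):
    """Mean and variance of White's result, scoring 1 / 0.5 / 0."""
    games = wins + draws + losses
    mean = (wins + 0.5 * draws) / games
    mean_of_squares = (wins + 0.25 * draws) / games  # 1^2 = 1, 0.5^2 = 0.25
    return mean, mean_of_squares - mean ** 2

# Every ordered (white, black) pair of 10 engines: 10 * 9 games per opening.
engines = [f"engine{i}" for i in range(1, 11)]  # placeholder names
pairings = list(permutations(engines, 2))
print(len(pairings), "games per opening")

# W/D/L percentages quoted in the post, all with a 0.00 engine score.
for wdl in [(10, 80, 10), (20, 60, 20), (50, 40, 10)]:
    mean, variance = white_score_stats(*wdl)
    print(wdl, "mean", round(mean, 3), "variance", round(variance, 3))
```

The three distributions separate clearly on variance even though an engine would report 0.00 for all of them, which is the point of ranking openings by measured results rather than by evaluation.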
-
hgm
- Posts: 28451
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Notice- Some EXTREMELY Interesting Stockfish Results!
Well, variance based on the numeric 1/0.5/0 result would not be a good measure, as it would be largest for 50% wins, 50% losses and 0% draws. Which is exactly what you don't want, as these are the trap positions we discussed. What you would want is positions with 0/50/50 and 50/50/0: low W or L, and the rest evenly distributed.
Otherwise your proposal is good. You could even handicap engines that are known to be stronger by giving them small time odds.
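The claim about where the variance peaks can be checked with a short calculation. Here $W$, $D$, $L$ are the win, draw and loss fractions for White (so $W + D + L = 1$) and $s \in \{1, \tfrac12, 0\}$ is White's result; the symbols are introduced only for this sketch.

```latex
\begin{align*}
\mathrm{E}[s]   &= W + \tfrac12 D, \qquad
\mathrm{E}[s^2]  = W + \tfrac14 D, \\
\mathrm{Var}(s) &= W + \tfrac14 D - \left(W + \tfrac12 D\right)^2 .
\end{align*}
% Since s^2 <= s on [0,1], Var(s) <= E[s](1 - E[s]) <= 1/4,
% with equality only for D = 0 and W = L = 1/2.
\begin{align*}
W/D/L = 50/0/50:  &\quad \mathrm{Var}(s) = 0.5 - 0.25 = 0.25, \\
W/D/L = 0/50/50:  &\quad \mathrm{Var}(s) = 0.125 - 0.0625 = 0.0625, \\
W/D/L = 50/50/0:  &\quad \mathrm{Var}(s) = 0.625 - 0.5625 = 0.0625 .
\end{align*}
```

So the no-draw 50/50 split has four times the variance of the win-or-draw positions, which is why plain variance would rank the trap positions highest.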
-
Uri Blass
- Posts: 11150
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: Notice- Some EXTREMELY Interesting Stockfish Results!
About time odds, I also think that there are positions that are less sensitive to time odds, and not because of a small variance in the results. In these positions programs gain less from doubling the time control in spite of a big variance in the results, because some of the top programs simply do not understand them and may go for the same wrong plan at all normal time controls (even correspondence time controls).
I think that finding these positions may be important for programmers, to get ideas about what they need to fix in their evaluation.
The same goes for positions that all programs evaluate wrongly.
You can see it, for example, when they show a 0.5 pawn advantage for White but in practice Black scores more than 50% in games: the programs are strong enough to find the correct moves, because their evaluation is good enough to compare correctly between positions that can happen later, but not good enough to evaluate the positions themselves correctly.
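A hedged sketch of how such positions might be flagged from test results: compare the average engine evaluation of an opening with White's practical score. The tuple layout, the function name and the sample numbers are all made up for illustration.

```python
def find_misevaluated(openings, min_eval=0.5):
    """Return names of openings where engines favor White by at least
    min_eval pawns although White scores below 50% in practice."""
    flagged = []
    for name, avg_eval, wins, draws, losses in openings:
        games = wins + draws + losses
        if games == 0:
            continue  # no results yet, nothing to compare
        white_score = (wins + 0.5 * draws) / games
        if avg_eval >= min_eval and white_score < 0.5:
            flagged.append(name)
    return flagged

# (name, average eval for White in pawns, White wins, draws, White losses)
sample = [
    ("opening A", 0.55, 20, 50, 30),  # eval likes White, but White scores 45%
    ("opening B", 0.60, 40, 50, 10),  # eval and results agree
    ("opening C", 0.10, 20, 50, 30),  # eval is near equal, so not flagged
]
print(find_misevaluated(sample))
```

With few games per opening a score just under 50% is well within noise, so a real check would also want a minimum game count or a confidence margin.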