Questions for the Stockfish team

bob · Post by **bob** » Wed Jul 21, 2010 7:17 pm

Daniel Shawul wrote: You said random evaluation at first, and then you started introducing order,
first via 0.01 * real eval, which I strenuously objected to,
then you said eval of white = -eval of black, which further breaks the random nature of the eval, period.
I will not try to convince anyone further. Anyone interested in my position can read all the issues
I raised here http://talkchess.com/forum/viewtopic.ph ... 66&t=35455 from the perspective of a random eval and draw their own conclusions.
It really doesn't help if you post voluminous game results with a different setup than what was discussed.
This is basically a strawman argument from you, which neglects the completely random evaluation criterion
you originally proposed.

They say insanity is doing the same thing over and over again and expecting different results.
I say it is expecting a consistent miracle from a random event.

I said, quite clearly, that I have tried _all_ of those things, if you read. The original discussion was about Crafty's "skill 1" performance. Just look at 23.2 and search for SKILL in evaluate.c. It is not _that_ complicated to understand what I do. Skill 100 simply uses 100% of the normal eval, which is negated at the return point in evaluate.c when it is not white to move (WTM), since the eval follows the convention that positive is good for white.

for skill 1, you get:

score = .01 * score + .99 * random()

then it does the normal

return (wtm) ? score : -score;

That has been there since the skill command was added. I assume that if you jump into a discussion, you at least know what it is about, which would include this information. I then pointed out that I had tried a pure random number with no positional component at all, just:

score = random();

with the usual return.
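For readers without the Crafty sources at hand, the blend bob describes can be sketched as below. This is a minimal sketch, not Crafty's actual code: the linear generalization to any skill value (skill/100 of the eval plus (100 - skill)/100 of a random term) is an assumption extrapolated from the skill 1 and skill 100 cases, the function names are hypothetical, and the random term is passed in by the caller because its range is never pinned down in the thread.

```c
/* Hypothetical sketch of a skill-weighted evaluation, NOT Crafty's code.
   score: normal evaluation, positive = good for white.
   rnd:   random term supplied by the caller (its range is an assumption).
   skill: 100 = full eval, 1 = 1% eval + 99% noise, 0 = pure noise. */
int skill_blend(int score, int rnd, int skill) {
    return (skill * score + (100 - skill) * rnd) / 100;
}

/* The "colorful" negamax return: flip the white-relative score so it is
   relative to the side to move (wtm = white to move). */
int negamax_return(int white_score, int wtm) {
    return wtm ? white_score : -white_score;
}
```

With skill = 1 and an eval of 100, the eval contributes only 1 point to the blended score, so the noise term dominates move choice; the side-to-move negation is still applied afterwards, as in the usual return.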

If you use negamax, then yes, you have to negate the score: black also wants the biggest score, but it has to be the opposite of what white wants. So the normal "colorful" return is used.

And that does give a pseudo-mobility that works just fine. Too fine, in fact...
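The "pseudo-mobility" from a random eval (usually called the Beal effect) comes from taking a maximum over random leaf scores: a side with more legal moves draws more random numbers, so its best one tends to be higher. For n independent uniform values on [0, 1] the expected maximum is n/(n+1), which grows with n. The Monte Carlo sketch below illustrates this; the xorshift generator, the 0..999 score range, and the trial counts are arbitrary illustrative choices.

```c
#include <stdint.h>

/* Small deterministic xorshift32 PRNG so results are reproducible. */
static uint32_t xorshift32(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Average, over `trials` positions, of the best of `moves` random scores
   in 0..999: the value a 1-ply search backs up when every child position
   gets an independent random score. */
double avg_best_random_score(int moves, int trials, uint32_t seed) {
    uint32_t state = seed ? seed : 1u;   /* xorshift state must not be 0 */
    double total = 0.0;
    for (int t = 0; t < trials; t++) {
        int best = -1;
        for (int m = 0; m < moves; m++) {
            int s = (int)(xorshift32(&state) % 1000u);
            if (s > best)
                best = s;
        }
        total += best;
    }
    return total / trials;
}
```

A side with 30 moves backs up a noticeably higher average score than a side with 5, so even a pure random eval steers the search toward positions where it has more options.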

So before we go on, how about looking at the statement following the "SKILL" token in evaluate.c, so that we are talking about the same thing. I am _always_ talking about what I do in Crafty, not what I imagine others are doing in their programs...

jwes · Post by **jwes** » Wed Jul 21, 2010 7:48 pm

bob wrote: It is definitely odd. Fortunately, to measure this stuff, I have the perfect facility here. The new thread I posted is the result of a 24-hour run testing 11 different skill settings for 30,000 games each.

Would it make any difference if you used the hash key to create the random score rather than random()? That would result in positions always having the same (random) evaluation.
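jwes's suggestion can be sketched by deriving the "random" score from the position's hash key instead of calling random(), so the same position always gets the same score regardless of search order. In this minimal sketch the 64-bit key is assumed to be a Zobrist-style position hash, the splitmix64 finalizer is just one common way to scramble its bits, and `hash_random_eval` is a hypothetical name.

```c
#include <stdint.h>

/* splitmix64 finalizer: scrambles the bits of a 64-bit value. */
static uint64_t mix64(uint64_t x) {
    x ^= x >> 30;
    x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27;
    x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

/* Hypothetical deterministic "random" eval: maps a position hash key to a
   score in [-range, +range], symmetric around zero. The same key always
   yields the same score. */
int hash_random_eval(uint64_t hash_key, int range) {
    uint64_t span = 2u * (uint64_t)range + 1u;
    return (int)(mix64(hash_key) % span) - range;
}
```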

AlvaroBegue · Post by **AlvaroBegue** » Wed Jul 21, 2010 7:53 pm

jwes wrote:
bob wrote: It is definitely odd. Fortunately, to measure this stuff, I have the perfect facility here. The new thread I posted is the result of a 24-hour run testing 11 different skill settings for 30,000 games each.
Would it make any difference if you used the hash key to create the random score rather than random()? That would result in positions always having the same (random) evaluation.

I don't think it would make a difference. Reproducibility of the evaluation function probably doesn't matter much for anything.

jwes · Post by **jwes** » Wed Jul 21, 2010 8:02 pm

I just had another idea about why Crafty plays too well. Since the search will always choose the first move that returns the minimax score, the move that your move ordering picks first is very likely to be chosen.
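jwes's point follows from how the best move is usually updated: with a strict "greater than" comparison, a later move that merely ties the current best never replaces it, so among equal scores the first one searched wins. The loop below is a minimal illustration over precomputed scores; the array is a hypothetical stand-in for the scores a real search would return, listed in move-ordering order. In a typical alpha-beta search the `score > alpha` test has the same effect.

```c
/* Returns the index of the chosen move, given the scores of the moves in
   the order the search examined them (n >= 1). Because the comparison is
   strict (>), ties go to the earliest move, i.e. the one the move
   ordering put first. */
int pick_best_move(const int *scores, int n) {
    int best = 0;
    for (int i = 1; i < n; i++) {
        if (scores[i] > scores[best])
            best = i;
    }
    return best;
}
```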

Dann Corbit · Post by **Dann Corbit** » Wed Jul 21, 2010 8:25 pm

Dann Corbit wrote:
Dann Corbit wrote:
Dann Corbit wrote:
Dann Corbit wrote: Truly a bizarre beginning (though we would expect the positive settings on top and the negative ones below, the sterling start of the zero evaluation has me scratching my head):
    Program                  Elo    +   -   Games   Score   Av.Op.  Draws
  1 Crafty-232ap00         : 3490    0   0     4   100.0 %   2890    0.0 %
  2 Crafty-232ap50         : 3343    0   0     4   100.0 %   2743    0.0 %
  3 Crafty-23.2a-skill-mod : 3082  497 415     4    62.5 %   2993   25.0 %
  4 Crafty-232ap10         : 3035  497 415     4    62.5 %   2946   25.0 %
  5 Crafty-232ap01         : 2981  415 497     4    37.5 %   3070   25.0 %
  6 Crafty-232am10         : 2776  675 409     4    25.0 %   2967    0.0 %
  7 Crafty-232am01         : 2752  318 262     4    12.5 %   3090   25.0 %
  8 Crafty-232am50         : 2541    0   0     4     0.0 %   3141    0.0 %
New chapter in the theatre of the bizarre:
   Program                  Elo    +   -   Games   Score   Av.Op.  Draws
 1 Crafty-232ap00         : 3525    0   0     7   100.0 %   2925    0.0 %
 2 Crafty-23.2a-skill-mod : 3197  374 317     7    78.6 %   2972   14.3 %
 3 Crafty-232ap50         : 3139  441 334     7    71.4 %   2980    0.0 %
 4 Crafty-232ap10         : 3058  342 316     6    58.3 %   3000   16.7 %
 5 Crafty-232ap01         : 2942  316 342     6    41.7 %   3000   16.7 %
 6 Crafty-232am01         : 2861  255 290     7    28.6 %   3020   28.6 %
 7 Crafty-232am10         : 2803  317 374     7    21.4 %   3028   14.3 %
 8 Crafty-232am50         : 2475    0   0     7     0.0 %   3075    0.0 %
Everything makes sense except the top entry: 100% skill beats 50%, which beats 10%, which beats 1%, which beats -1%, which beats -10%, which beats -50%.

However, the mighty clout of EVAL_ZERO is giving me pause. Surely it's just a statistical anomaly. Or perhaps somewhere in the code there is a test for (skill == 0) that exercises a different branch.
We are now approaching the cliffs of insanity.
   Program                  Elo    +   -   Games   Score   Av.Op.  Draws

 1 Crafty-232ap00         : 3526    0   0    15   100.0 %   2926    0.0 %
 2 Crafty-23.2a-skill-mod : 3207  234 202    15    80.0 %   2966   13.3 %
 3 Crafty-232ap50         : 3142  211 188    15    73.3 %   2966   13.3 %
 4 Crafty-232ap10         : 3077  195 181    15    66.7 %   2956   13.3 %
 5 Crafty-232ap01         : 2926  181 195    15    33.3 %   3046   13.3 %
 6 Crafty-232am01         : 2883  175 189    15    30.0 %   3031   20.0 %
 7 Crafty-232am10         : 2763  227 288    15    16.7 %   3042    6.7 %
 8 Crafty-232am50         : 2476    0   0    15     0.0 %   3076    0.0 %
I suggest it may be worthwhile for others to perform the simple test with the patch I posted above. +300 Elo for removing the eval seems odd, to say the least.
You said I would be surprised, but even astonished would not begin to cover it. At least it turns out that ZERO_EVAL is not indestructible:
   Program                  Elo    +   -   Games   Score   Av.Op.  Draws

 1 Crafty-232ap00         : 3523    0 301    18    97.2 %   2923    5.6 %
 2 Crafty-23.2a-skill-mod : 3183  184 168    18    75.0 %   2992   16.7 %
 3 Crafty-232ap50         : 3146  206 181    18    77.8 %   2928   11.1 %
 4 Crafty-232ap10         : 3109  178 166    18    66.7 %   2989   11.1 %
 5 Crafty-232ap01         : 2924  158 166    18    36.1 %   3023   16.7 %
 6 Crafty-232am01         : 2867  157 168    18    27.8 %   3033   22.2 %
 7 Crafty-232am10         : 2763  197 236    18    19.4 %   3010    5.6 %
 8 Crafty-232am50         : 2485    0   0    18     0.0 %   3085    0.0 %
 
p00 is skill=0
skill-mod is skill=100
p50 is skill=+50
p10 is skill=+10
p01 is skill=+1
m01 is skill=-1
m10 is skill=-10
m50 is skill=-50

Here is how it ended up:

   Program                  Elo    +   -   Games   Score   Av.Op.  Draws
 1 Crafty-232ap00         : 3344  133 121    55    90.0 %   2963   12.7 %
 2 Crafty-23.2a-skill-mod : 3270  113 105    55    83.6 %   2986   14.5 %
 3 Crafty-232ap50         : 3179  102  97    55    75.5 %   2984   12.7 %
 4 Crafty-232ap10         : 3100   88  86    55    63.6 %   3003   18.2 %
 5 Crafty-232ap01         : 2945   87  88    55    39.1 %   3022   16.4 %
 6 Crafty-232am01         : 2889   90  94    55    30.0 %   3036   16.4 %
 7 Crafty-232am10         : 2788  113 126    55    18.2 %   3049    3.6 %
 8 Crafty-232am50         : 2486    0   0    55     0.0 %   3086    0.0 %

Gerd Isenberg · Post by **Gerd Isenberg** » Wed Jul 21, 2010 8:38 pm

bob wrote:
So before we go on, how about looking at the statement following the "SKILL" token in evaluate.c, so that we are talking about the same thing. I am _always_ talking about what I do in Crafty, not what I imagine others are doing in their programs...

Not everybody likes to download the Crafty sources to inspect your code on the fly. It is much more appropriate and convenient for readers if you explain it patiently.

What is the value range of random? Is it symmetrical around zero or not?

How would these three more or less random evals play?

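#include <stdlib.h>               /* rand(), RAND_MAX */
#define MATE_SCORE 32000          /* assumed value; engine-specific */

/* C's rand() returns 0..RAND_MAX, so compare against RAND_MAX/2 for a coin flip */
int negamaxEval1(void)    { return rand() > RAND_MAX / 2 ? MATE_SCORE/2 : -MATE_SCORE/2; }

int negamaxEval2(int wtm) { return wtm ? MATE_SCORE/2 : -MATE_SCORE/2; }  /* white always winning */

int negamaxEval3(void)    { return MATE_SCORE/2; }  /* both are winning at a leaf */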

Gerd Isenberg · Post by **Gerd Isenberg** » Wed Jul 21, 2010 8:43 pm

Daniel Shawul wrote: Gerd,
My objection was to a completely random evaluation, which I tried to outline as well as I could here http://talkchess.com/forum/viewtopic.ph ... 66&t=35455. Now we seem to have reached consensus on how the score of one side should be negated for the other side for minimax to work... This was originally absent from his reply to me, yet he somehow expected me to understand it, even after giving me pseudocode for how to do it...

I conceded the point that it does some weird kind of mobility evaluation the minute Marco posted it. But I pointed out how bad that eval is and how one-sided it is, completely disregarding the perfect-information-game assumption. The supposed engine evaluates like 'poker', as if it can't see what the opponent has to offer. It just evaluates its own mobility and goes on... See points c & d of my post in the link above.

Did they (Don Beal) say an 1800 Elo engine can be constructed this way? Even he (Bob) himself didn't believe it when people first told him it plays like 1800. He thought it played like 800 (he said so in this thread, of course). I can't say what Crafty does or does not do; that is why I am sticking to what he says about the effect of a random eval, and I am definitely not getting an 1800 Elo engine.

Daniel

Hi Daniel,
yes, one cannot weigh one's words too carefully in these threads; there are tons of misunderstandings due to implicit knowledge and assumptions about the point, selective reading, impatience, etc.

Apparently, if the random range per side is symmetric around zero, you may use the same "evaluation" function for white and black, no matter whether you use explicit min versus max or negamax. If instead both white-to-move and black-to-move leaf nodes always get "random" winning scores from their negamax perspective, let's say in the 1000..2000 range (with mate scores above 15000), things are obviously different, especially with all the usual ID, TT, and extension stuff. I guess the latter has more search "instability problems", and I expect it to be weaker than the symmetric one.
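Gerd's concern about one-sided random scores can be made concrete with a toy negamax over a uniform tree. In the sketch below every leaf gets a hash-derived score, either symmetric in [-1000, 1000] or one-sided in [1000, 2000] from the side to move's perspective (the ranges follow Gerd's example; the branching factor, hash, and node numbering are arbitrary assumptions). With the one-sided eval the root score is negative after every odd depth and positive after every even depth, so iterative deepening sees the score swing by thousands of points between iterations.

```c
#include <stdint.h>

#define BRANCH 4u  /* arbitrary toy branching factor */

/* Integer hash so each node id maps to a fixed pseudo-random value. */
static uint32_t hash32(uint32_t x) {
    x ^= x >> 16;
    x *= 0x7feb352dU;
    x ^= x >> 15;
    x *= 0x846ca68bU;
    x ^= x >> 16;
    return x;
}

/* Leaf score from the side to move's point of view.
   one_sided == 0: symmetric around zero, in [-1000, 1000]
   one_sided != 0: always "winning" for the mover, in [1000, 2000] */
static int leaf_eval(uint32_t node, int one_sided) {
    int r = (int)(hash32(node) % 1001u);   /* 0..1000 */
    return one_sided ? 1000 + r : 2 * r - 1000;
}

/* Plain negamax (no pruning) over a uniform toy tree; the children of
   node n are n*BRANCH + 1 .. n*BRANCH + BRANCH. */
int negamax(uint32_t node, int depth, int one_sided) {
    if (depth == 0)
        return leaf_eval(node, one_sided);
    int best = -100000;
    for (uint32_t k = 1; k <= BRANCH; k++) {
        int v = -negamax(node * BRANCH + k, depth - 1, one_sided);
        if (v > best)
            best = v;
    }
    return best;
}
```

Calling `negamax(0, d, 1)` for d = 1, 2, 3, 4 gives negative, positive, negative, positive, because each negation flips an all-positive set of leaf scores to all-negative and back. The symmetric version stays within [-1000, 1000] with no built-in sign swing.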

I don't have that ICCA Journal handy, and I'm not sure I ever read Beal's article. I remember some discussions, so I have no idea about any Elo figures. Maybe Bob's 1800 claim is somewhat overestimated due to the huge error margin from a few "random" wins or draws against otherwise >2800 engines that don't apply any opponent-model approach. On the other hand, such random-eval engines still have mate scores.

Gerd

Daniel Shawul · Post by **Daniel Shawul** » Wed Jul 21, 2010 8:50 pm

What does it take to convince you it is bad??? Here are some games:

scorpio_random : return (hash_key % 1000) at beginning of eval
scorpio_regular : the regular eval

RESULT: Disastrous, 67-0. Every game tested my patience, and indeed I couldn't take the awfulness after 67 games and had to stop it...

The crap games are available for download here: http://sites.google.com/site/dshawul/te ... ects=0&d=1

If you insist, I can run a million games or whatever it takes to convince you...

Gerd Isenberg · Post by **Gerd Isenberg** » Wed Jul 21, 2010 9:09 pm

Daniel Shawul wrote: What does it take to convince you it is bad??? Here are some games:

scorpio_random : return (hash_key % 1000) at beginning of eval
scorpio_regular : the regular eval

RESULT: Disastrous, 67-0. Every game tested my patience, and indeed I couldn't take the awfulness after 67 games and had to stop it...

The crap games are available for download here: http://sites.google.com/site/dshawul/te ... ects=0&d=1

If you insist, I can run a million games or whatever it takes to convince you...

What you return, the % 1000: is it signed or zero-extended?
What does one expect with a 1000-1200 Elo difference?
How does scorpio_random n ply perform against n+1, n+2 ... ply?
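Gerd's first question matters because of how C's % behaves. If the hash key is an unsigned 64-bit integer (the usual choice for Zobrist keys, though Scorpio's actual key type is not stated in the thread), `hash_key % 1000` is always in 0..999, so, if eval's return value is read from the side to move's perspective, every leaf looks good for the mover: the one-sided case Gerd described earlier. If the key were a signed type, C99 truncating division gives the remainder the sign of the dividend, so the range becomes -999..999. The sketch below shows both, plus one hypothetical way to center the unsigned result; the function names are illustrative only.

```c
#include <stdint.h>

/* Remainder of an unsigned key: always 0..999 (one-sided). */
int unsigned_key_score(uint64_t key) {
    return (int)(key % 1000u);
}

/* Remainder of a signed key: C99 truncates toward zero, so the result
   has the sign of the key and lies in -999..999. */
int signed_key_score(int64_t key) {
    return (int)(key % 1000);
}

/* Hypothetical fix: shift the unsigned remainder to -500..499 so the
   random score is roughly symmetric around zero. */
int centered_key_score(uint64_t key) {
    return (int)(key % 1000u) - 500;
}
```

For example, the bit pattern of -1234567 gives -567 when read as a signed key but 49 when read as an unsigned 64-bit key.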

Ralph Stoesser · Post by **Ralph Stoesser** » Wed Jul 21, 2010 9:10 pm

Dann Corbit wrote: Here is how it ended up:

   Program                  Elo    +   -   Games   Score   Av.Op.  Draws
 1 Crafty-232ap00         : 3344  133 121    55    90.0 %   2963   12.7 %
 2 Crafty-23.2a-skill-mod : 3270  113 105    55    83.6 %   2986   14.5 %
 3 Crafty-232ap50         : 3179  102  97    55    75.5 %   2984   12.7 %
 4 Crafty-232ap10         : 3100   88  86    55    63.6 %   3003   18.2 %
 5 Crafty-232ap01         : 2945   87  88    55    39.1 %   3022   16.4 %
 6 Crafty-232am01         : 2889   90  94    55    30.0 %   3036   16.4 %
 7 Crafty-232am10         : 2788  113 126    55    18.2 %   3049    3.6 %
 8 Crafty-232am50         : 2486    0   0    55     0.0 %   3086    0.0 %

It seems you have found a strong setting. What was the time control?
