Questions for the Stockfish team

Discussion of chess software programming and technical issues.

Moderator: Ras

Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Questions for the Stockfish team

Post by Daniel Shawul »

First of all, with the `return random()` in eval, minimax is broken.
This is used in zero sum games http://en.wikipedia.org/wiki/Minimax .
So the search can't intelligently propagate anything sensible... If my rand()
gives me 370 (with 30 moves for white) when white is to play, and gives me 220 (with 10 moves for black) when black is to play, how can it do the propagation properly? You can get away with something asymmetrical, but not this radical.
EDIT: This example just gave me another idea: the mobility eval only considers the side to move. What about the mobility of the opponent? The more I think about it, the worse it gets.
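The two evaluations being argued over can be sketched like this (a minimal illustration of my own, the names are mine and not from any engine). The first returns a positive random number no matter who is to move, so both sides "want" large scores at the same time and negamax's sign convention is violated; the second is the zero-mean version that keeps the zero-sum assumption intact:

```cpp
#include <cstdlib>

// Always-positive random eval: breaks negamax, because negating it at
// each ply does not give the opponent's perspective of the same range.
int broken_eval() {
    return std::rand() % 1001;        // always in 0..1000, for either side
}

// Zero-mean random eval: symmetric around 0, so negating the score at
// every ply is consistent with a zero-sum game.
int symmetric_eval() {
    return std::rand() % 1001 - 500;  // uniform in [-500, 500]
}
```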


Even if we fix that and return the -ve of the score for the opposite side (like I suggested
in the previous thread), I don't see its Elo getting to 4 digits, let alone 1800. TSCP is around 1600 Elo according to
WBEC, and I know how many months it takes to reach its level.

The mobility eval that is claimed to exist is very poor, to say the least. Do you think this method
would distinguish a position with mobility 15 from one with mobility 20, with any reasonable degree of
certainty? I don't think so.
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: Questions for the Stockfish team

Post by AlvaroBegue »

I haven't read the whole thread (too long!), but I see you guys are talking about random evaluation. I have looked at this (alpha-beta search with `-10000+rand()%20001` as the evaluation function) in the past, and it is true that it captures a certain dynamic notion of mobility.
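Written out as code, the evaluation just described is simply a uniform random integer symmetric around zero, so neither side is systematically favored:

```cpp
#include <cstdlib>

// The random evaluation described above: a uniform random integer in
// [-10000, 10000], symmetric around zero.
int random_eval() {
    return -10000 + std::rand() % 20001;
}
```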

What I wanted to mention is that 15 years ago I modified my Spanish-checkers program to play losing Spanish checkers. Since it was completely unclear to me whether it was good or bad to have material, and the only thing that seemed to matter was having options, I tried a random evaluation function and it worked surprisingly well. It was immediately stronger than any human player I had access to.

As further evidence that there is some value to this random evaluation stuff, you can very easily implement a tic-tac-toe program that uses random values in some range for draws, so from the beginning of the game it will prefer to constrain the opponent as much as possible. Such a program plays significantly more naturally than something that considers all draws the same. For instance, it plays the first move in the center most of the time. You can test it yourself:

Code: Select all

#include <iostream>
#include <ctime>
#include <cstdlib>
#include <algorithm>

struct Board {
  static unsigned const code[9];
  
  unsigned player_sum[2];
  int board[9];
  
  Board() {
    player_sum[0]=0x11111111;
    player_sum[1]=0x11111111;
    for (int i=0; i<9; ++i)
      board[i]=-1;
  }
  
  void play(int player, int square) {
    player_sum[player]+=code[square];
    board[square]=player;
  }
  
  void undo(int player, int square) {
    player_sum[player]-=code[square];
    board[square]=-1;
  }
  
  bool finished(int &winner) {
    if (player_sum[0]&0x44444444){
      winner=0;
      return true;
    }
    if (player_sum[1]&0x44444444){
      winner=1;
      return true;
    }
    if (player_sum[0]+player_sum[1] == 0x55555555) {
      winner=-1;
      return true;
    }
    return false;
  }
};

unsigned const Board::code[9] = {
  0x10010010, 0x10001000, 0x10000101,
  0x01010000, 0x01001011, 0x01000100,
  0x00110001, 0x00101000, 0x00100110
};

int nega_max(Board &b, int player, int alpha, int beta, int from_root) {
  int winner;
  
  if (b.finished(winner)) {
    if (winner==-1)
      return -500+(std::rand()%1001);
    return (winner == player ? 1 : -1)
      *(1000000000 - from_root*1000 -500 + (std::rand()%1001));
  }

  for (int i=0; i<9; ++i) {
    if (b.board[i]==-1) {
      b.play(player, i);
      int v = -nega_max(b, 1-player, -beta, -alpha, from_root+1);
      b.undo(player, i);
      if (v>alpha) {
        alpha = v;
        if (alpha >= beta)  // beta cutoff (fail-soft)
          return alpha;
      }
    }
  }
  
  return alpha;
}

int pick_move(Board &b, int player) {
  int best_score=-1000000000;
  int best_move=0;
  
  for (int i=0; i<9; ++i) {
    if (b.board[i]==-1) {
      b.play(player, i);
      int v = -nega_max(b, 1-player, -1000000000, -best_score, 1);
      b.undo(player, i);
      if (v>best_score) {
        best_score=v;
        best_move=i;
      }
    }
  }
  std::cout << "best=" << best_move << " score=" << best_score << '\n';
  return best_move;
}

std::ostream &operator << (std::ostream &os, Board const &b) {
  for (int j=0; j<3; ++j) {
    for (int i=0; i<3; ++i)
      os << ".xo"[b.board[3*j+i]+1] << ' ';
    os << "   ";
    for (int i=0; i<3; ++i)
      os << (3*j+i) << ' ';
    os << '\n';
  }
  return os;
}

int main() {
  std::srand(std::time(0));
  Board b;
  int winner;
  
  int human_player;
  std::cout << "Enter which player you want to be (0 or 1): ";
  std::cin >> human_player;
  
  for (int i=0; !b.finished(winner); i=1-i) {
    std::cout << b;
    int m;
    if (i==human_player)
      std::cin >> m;
    else
      m=pick_move(b,i);
    b.play(i,m);
  }
  
  std::cout << b;
  
  switch(winner) {
  case -1:
    std::cout << "It's a draw!\n";
    break;
  case 0:
    std::cout << "`x' wins!\n";
    break;
  case 1:
    std::cout << "`o' wins!\n";
    break;
  }
}



Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Questions for the Stockfish team

Post by Daniel Shawul »

Tic-tac-toe, really? You don't need random evaluation for that, as you can
get the exact WDL result just by searching. That has already been mentioned in this thread, in fact.
That effect is not related to the random eval at all.

Answer me these questions.

a) How is minimax going to work with an eval() that returns a
positive number for everything? It is really a simple question.
This really breaks the essence of minimax, as is clearly outlined in the
wiki page I linked to.

b) The mobility eval is done for one side only. But chess is a game of perfect information
http://en.wikipedia.org/wiki/Perfect_information
and yet we are doing a gross mobility evaluation for one side only, which disregards the
mobility of the opponent.

c) The supposed mobility evaluation that it brings is very rough, to say the least.
Again, for reasons I have already mentioned in this thread.

That is two violations of chess game-tree search theory in a row, plus one so-so positional evaluation.
You must understand why I have difficulty accepting that one can get an 1800 Elo engine out of this mess.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Questions for the Stockfish team

Post by bob »

Dann Corbit wrote:Truly a bizarre beginning (though we would expect positive on top and negative below, the sterling start of zero evaluation has me scratching my head.):

Code: Select all

    Program                  Elo    +   -   Games   Score   Av.Op.  Draws
  1 Crafty-232ap00         : 3490    0   0     4   100.0 %   2890    0.0 %
  2 Crafty-232ap50         : 3343    0   0     4   100.0 %   2743    0.0 %
  3 Crafty-23.2a-skill-mod : 3082  497 415     4    62.5 %   2993   25.0 %
  4 Crafty-232ap10         : 3035  497 415     4    62.5 %   2946   25.0 %
  5 Crafty-232ap01         : 2981  415 497     4    37.5 %   3070   25.0 %
  6 Crafty-232am10         : 2776  675 409     4    25.0 %   2967    0.0 %
  7 Crafty-232am01         : 2752  318 262     4    12.5 %   3090   25.0 %
  8 Crafty-232am50         : 2541    0   0     4     0.0 %   3141    0.0 %
This doesn't surprise me that much. The "Beal effect" depends on randomness. When I first tested the skill stuff, I stopped at 50 because 50 was well over 400 Elo weaker than normal crafty. But as you get down to zero, which is purely random, you see maximal Beal effect, which may well be better than a 50-50 eval/random split...
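The skill mixing described above can be sketched roughly like this (a hypothetical illustration of my own, not Crafty's actual code): blend the real evaluation with noise in proportion to the skill setting, so skill 100 is the normal eval and skill 0 is purely random, where the Beal effect is maximal.

```cpp
#include <cstdlib>

// Hypothetical skill blend: skill = 100 keeps the real eval, skill = 0
// discards it entirely in favor of noise of roughly +/- one pawn.
int skilled_eval(int real_eval, int skill /* 0..100 */) {
    int noise = std::rand() % 201 - 100;  // uniform in [-100, 100]
    return (skill * real_eval + (100 - skill) * noise) / 100;
}
```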
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Questions for the Stockfish team

Post by bob »

Dann Corbit wrote:
Dann Corbit wrote:Truly a bizarre beginning (though we would expect positive on top and negative below, the sterling start of zero evaluation has me scratching my head.):

Code: Select all

    Program                  Elo    +   -   Games   Score   Av.Op.  Draws
  1 Crafty-232ap00         : 3490    0   0     4   100.0 %   2890    0.0 %
  2 Crafty-232ap50         : 3343    0   0     4   100.0 %   2743    0.0 %
  3 Crafty-23.2a-skill-mod : 3082  497 415     4    62.5 %   2993   25.0 %
  4 Crafty-232ap10         : 3035  497 415     4    62.5 %   2946   25.0 %
  5 Crafty-232ap01         : 2981  415 497     4    37.5 %   3070   25.0 %
  6 Crafty-232am10         : 2776  675 409     4    25.0 %   2967    0.0 %
  7 Crafty-232am01         : 2752  318 262     4    12.5 %   3090   25.0 %
  8 Crafty-232am50         : 2541    0   0     4     0.0 %   3141    0.0 %
New chapter in the theatre of the bizarre:

Code: Select all

   Program                  Elo    +   -   Games   Score   Av.Op.  Draws
 1 Crafty-232ap00         : 3525    0   0     7   100.0 %   2925    0.0 %
 2 Crafty-23.2a-skill-mod : 3197  374 317     7    78.6 %   2972   14.3 %
 3 Crafty-232ap50         : 3139  441 334     7    71.4 %   2980    0.0 %
 4 Crafty-232ap10         : 3058  342 316     6    58.3 %   3000   16.7 %
 5 Crafty-232ap01         : 2942  316 342     6    41.7 %   3000   16.7 %
 6 Crafty-232am01         : 2861  255 290     7    28.6 %   3020   28.6 %
 7 Crafty-232am10         : 2803  317 374     7    21.4 %   3028   14.3 %
 8 Crafty-232am50         : 2475    0   0     7     0.0 %   3075    0.0 %
Everything makes easy sense except the top entry. 100% skill better than 50% which is better than 10% which is better than 1% which is better than -1% which is better than -10% which is better than -50%.

However, the mighty clout of EVAL_ZERO is giving me pause. Surely, it's just a statistical abnormality. Or perhaps in the code somewhere there is a test for (skill == 0) and it is exercising a different branch.
No tests. Only place this is used is in option.c where it whacks up the extensions and stuff, and evaluate.c where the randomness is introduced into the eval result...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Questions for the Stockfish team

Post by bob »

Dann Corbit wrote:
Dann Corbit wrote:
Dann Corbit wrote:Truly a bizarre beginning (though we would expect positive on top and negative below, the sterling start of zero evaluation has me scratching my head.):

Code: Select all

    Program                  Elo    +   -   Games   Score   Av.Op.  Draws
  1 Crafty-232ap00         : 3490    0   0     4   100.0 %   2890    0.0 %
  2 Crafty-232ap50         : 3343    0   0     4   100.0 %   2743    0.0 %
  3 Crafty-23.2a-skill-mod : 3082  497 415     4    62.5 %   2993   25.0 %
  4 Crafty-232ap10         : 3035  497 415     4    62.5 %   2946   25.0 %
  5 Crafty-232ap01         : 2981  415 497     4    37.5 %   3070   25.0 %
  6 Crafty-232am10         : 2776  675 409     4    25.0 %   2967    0.0 %
  7 Crafty-232am01         : 2752  318 262     4    12.5 %   3090   25.0 %
  8 Crafty-232am50         : 2541    0   0     4     0.0 %   3141    0.0 %
New chapter in the theatre of the bizarre:

Code: Select all

   Program                  Elo    +   -   Games   Score   Av.Op.  Draws
 1 Crafty-232ap00         : 3525    0   0     7   100.0 %   2925    0.0 %
 2 Crafty-23.2a-skill-mod : 3197  374 317     7    78.6 %   2972   14.3 %
 3 Crafty-232ap50         : 3139  441 334     7    71.4 %   2980    0.0 %
 4 Crafty-232ap10         : 3058  342 316     6    58.3 %   3000   16.7 %
 5 Crafty-232ap01         : 2942  316 342     6    41.7 %   3000   16.7 %
 6 Crafty-232am01         : 2861  255 290     7    28.6 %   3020   28.6 %
 7 Crafty-232am10         : 2803  317 374     7    21.4 %   3028   14.3 %
 8 Crafty-232am50         : 2475    0   0     7     0.0 %   3075    0.0 %
Everything makes easy sense except the top entry. 100% skill better than 50% which is better than 10% which is better than 1% which is better than -1% which is better than -10% which is better than -50%.

However, the mighty clout of EVAL_ZERO is giving me pause. Surely, it's just a statistical abnormality. Or perhaps in the code somewhere there is a test for (skill == 0) and it is exercising a different branch.
We are now approaching the cliffs of insanity.

Code: Select all

   Program                  Elo    +   -   Games   Score   Av.Op.  Draws

 1 Crafty-232ap00         : 3526    0   0    15   100.0 %   2926    0.0 %
 2 Crafty-23.2a-skill-mod : 3207  234 202    15    80.0 %   2966   13.3 %
 3 Crafty-232ap50         : 3142  211 188    15    73.3 %   2966   13.3 %
 4 Crafty-232ap10         : 3077  195 181    15    66.7 %   2956   13.3 %
 5 Crafty-232ap01         : 2926  181 195    15    33.3 %   3046   13.3 %
 6 Crafty-232am01         : 2883  175 189    15    30.0 %   3031   20.0 %
 7 Crafty-232am10         : 2763  227 288    15    16.7 %   3042    6.7 %
 8 Crafty-232am50         : 2476    0   0    15     0.0 %   3076    0.0 %
I suggest that it may be worthwhile for others to perform the simple test with the patch I posted up above. + 300 Elo for removal of eval seems a bit odd at best.
I've not seen normal crafty lose to a skill-impaired version, so that is new and suggests something is wrong somewhere. I'll look again at your patch to see if anything stands out...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Questions for the Stockfish team

Post by bob »

Daniel Shawul wrote:
do the math first...

.01 * real eval + .99 * random() where random is between 0 and 100 (one pawn value).

don't know what your "completely random" comment means, but I have tested (and just did it again) with pure random scores.
Just take Crafty, and right at the top of evaluate.c return 100 * random_generator() (assuming random_generator() returns a float 0.0 <= N < 1.00). Then you won't be guessing.
Actually I did do the math, but it seems you don't comprehend. What was talked about
was complete randomness, but you suddenly decided to mix order into it, no matter
how insignificant you think it is! The skill-1 setting mixes 1% of 'order' with 99% of
'chaos', which roughly translates into a maximum 3-sigma (99.7% or so) result if we
are to say it is completely based on randomness. So when I say completely random,
I mean 0% order. I don't know why you thought otherwise.

99% of eval is pure random. The other 1% is insignificant, with respect to "The Beal Effect." If you generate 100 random numbers between 0 and 100, and then add 1% of a number that might reach 1000, you still have 100 random numbers for evaluations, where larger is better.

I have explained this several times already. And I explained that I tried a run with 100% random and found _zero_ difference statistically with the normal worst-case of 99% random. This by comparing two 30,000 game matches played by each.

How much simpler can this be explained? I am posting what "is", not what "I think it is".



There was recently news of a discovery of the Higgs boson based on a 3-sigma result.
Despite doubts about the source of the news, that statement was by itself enough to convince
scientists it is not a _discovery_, as that label is reserved for a 5-sigma experimental
result. More about the odds of discovery here: http://www.fnal.gov/pub/ferminews/fermi ... 16/p1.html

I did not guess anything. Tord and I just did this to our engines and got something light-years away
from 1800 Elo.
Exactly what test did you run and what were the results for how many games? Just playing one game does not make this "light years different." Dann is playing games using the skill version of Crafty and is seeing just as much unusual strength as I am.
I have no "eval cache". There is a pawn score cache (pawn hash) but the random trick is applied to the score after that is used,
so this has absolutely no effect on anything. yes it causes TT issues. Again, "so what"? We want worse play, not an optimal search. Let it fail low and then high on the same move, it just wastes more time.
You have to pay attention to details. The same evaluation should be assigned to the same position
when it is visited twice. What is the sense of giving two different scores to the same position?
Who says so? The whole idea here is probability. Using the hash signature is not going to provide the kind of randomness I want. In fact, it doesn't provide any randomness at all, which is exactly the _opposite_ of what I am trying to achieve here.

In fact I have one more detail that I think needs to be addressed. Just using random() gives out positive
(winning) values for either side to move. Both sides get the same values, which completely breaks the
_zero sum_ game notion. I am going to change this so that the score of white is the negative of the score of black

Doesn't break it for me. If it is BTM, I change the sign of the value before returning it, just as I do for all evaluations, since my evaluation is computed with +=good for white, and then negated if it is black to move.

for a given position.
We are using uniform PRNGs. The larger the sample, the greater the probability of getting a large PRN. That is pretty simple to understand.
Duh. Like I said, you roughly get some kind of extreme value distribution from taking the maximum of
random numbers. It is a little skewed to the left compared to normal. But if it were normal,
we could use the 1/sqrt(n) rule to make the comparison. You would need to multiply the sample size by 4
to double the certainty of getting a larger number from the sample. So if you compare 15 and 20, you see
it doesn't differ much. Maybe when the mobility difference is like 15 versus 60 you can start talking of
something, and that is a maybe. You can also take the exact mobility score (however you calculate it) if you like.
I do not believe mobility alone brings 1800 Elo; all it will do is properly place the queen on the most 'mobile' square, only
to be captured by the opponent... epic fail!

I am not going to continuously argue a point that has already been explained in detail and verified by others, dating back as far as 20 years. Your comment about 15 or 20 is irrelevant. I am searching millions of nodes per second. By the time you compare two root moves, how many millions of random evaluations were used to produce an evaluation for each?
Care to rephrase that? Who is talking about "amplifying Elo" anywhere? Just a simple way to introduce mobility into the eval, which does lead to decent play. Not GM play, but also not 1200-level play either. I want to get the ELo down to 800 or less. Right now, with 23.2, the best one can get is down to 1800, which is much too high. With a purely random eval, at that
Are you saying approximate mobility eval is the only ingredient added by the search?
I just want to make sure.
That is what Don Beal said, yes. I confirmed this myself. And it is not really added by the search. It is added by minimaxing random numbers, so that you have a greater probability of getting larger ones if you take a larger sample. Mobility is enough to give a pseudo-material score as well, because the more pieces you have, the greater your mobility...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Questions for the Stockfish team

Post by bob »

Daniel Shawul wrote:First of all with the return random() in eval, minimax is broken.
What are you basing this on? In my case, I generate random numbers from 0 to 100. If it is black to move, those turn into 0 to -100. Not that that is important.

In a zero-sum game, there is nothing that says that "zero" is equal. Minimax is designed to return the best score for the side on move, based on the idea that big numbers are good and small numbers are bad. Not that they are grouped around zero. This works perfectly with a purely random evaluation.

This is used in zero sum games http://en.wikipedia.org/wiki/Minimax .
So the search can't intelligently propagate anything sensible...

Before you go on, why don't you go find a copy of Beal's paper in the old ICCA and read it. This is not based on guesswork. This is based on experimental results by different programs. The search most definitely _does_ propagate "sensible" stuff. It might not be sensible to you, but once you understand the basic idea of "pseudo-mobility" which is also related to "pseudo-material" the light will come on.

If my rand()
gives me 370 (with 30 moves for white) when white is to play, and gives me 220 (with 10 moves for black) when black is to play, how can it do the propagation properly? You can get away with something asymmetrical, but not this radical.
I presume you can produce a citation for some research that proves this? :) Of course, no evaluation works like that. As I said, the scores my eval will produce, using pure random numbers, are -100 to +100. -100 is good for black, +100 is good for white. But the scores mean nothing. It is a function of mobility as to how large or small those scores are. The more positions where black has large mobility, the bigger the chance that minimax will choose a branch leading to a _small_ score. The more positions where white has large mobility, the bigger the chance that minimax will choose a branch leading to a large score. This really is simple. These are not scores in any sense of the word. You don't even need to use the +/- 100 trick; 0 to 100 will work just as well. The only notion that is important is that black wants a score closer to 0, white wants a score closer to 100.

But you need to stop writing and start thinking for a bit to understand this. It is odd, yes. But it is also a simple fact, not a theory.

EDIT: This example just gave me another idea: the mobility eval only considers the side to move. What about the mobility of the opponent? The more I think about it, the worse it gets.

Jeez. Then stop thinking.

What is the idea of a _normal_ search with a _normal_ eval? Suppose white is to move at the root. What does white do? Search all branches and choose the branch with the largest score? What does black do? Search all branches and choose the branch with the smallest score? That is what makes this work. The scores mean nothing. But as I (white) search thru the tree, if I choose moves that increase my mobility and decrease my opponent's mobility (a check is the simplest example, but capturing one of his pieces works just fine) then I have a greater chance of reaching large scores, because my opponent has fewer opportunities to back up small scores.




Even if we fix that and return the -ve of the score for the opposite side (like I suggested
in the previous thread), I don't see its Elo getting to 4 digits, let alone 1800. TSCP is around 1600 Elo according to
WBEC, and I know how many months it takes to reach its level.

Please stop the "I don't see...", "I don't believe", and such. As I said, this is not opinion; it really is observed fact.



The mobility eval that is claimed to exist is very poor, to say the least. Do you think this method
would distinguish a position with mobility 15 from one with mobility 20, with any reasonable degree of certainty? I don't think so.
So "Monte Carlo" won't work at all, correct? After all, that is what you just said with that infamous "I don't think so." Yet a lot of people certainly depend on the methodology. But of course, they _understand_ it first...
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: Questions for the Stockfish team

Post by AlvaroBegue »

I haven't said anything offensive, so please keep the tone down.
Daniel Shawul wrote: Answer me these questions.

a) how minimax is going to work with just eval() which returns
positive number for everything... It is really a simple question.
This really breaks the essence of minimax as is clearly outlined in the
wiki page I gave link to.
That's why I posted exactly what evaluation function I was using, which is a uniform distribution symmetric around 0. Perhaps you should read the posts you reply to.
b) The mobility eval is done for one side only..
Not true.
But chess is a game of perfect information
http://en.wikipedia.org/wiki/Perfect_information
Everybody here knows what a game of perfect information is. Don't patronize me. Statistical methods have been shown to be useful in games of perfect information, for instance in Go:
http://en.wikipedia.org/wiki/Statistics
http://en.wikipedia.org/wiki/Go_%28game%29
http://senseis.xmp.net/?MonteCarlo
And yet we are doing gross mobility evaluation for one side only, which disregards the
mobility of its opponent.
I don't know what makes you believe that random evaluation only does mobility for one side.
c) The supposed mobility evaluation that it brings is very rough, to say the least.
Yes, but that doesn't mean it doesn't have any merit.
Last edited by AlvaroBegue on Wed Jul 21, 2010 4:55 pm, edited 1 time in total.
Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Questions for the Stockfish team

Post by Daniel Shawul »

Your words, not mine. Re-read your first reply to me:

Code: Select all

No. I mean this:

int Evaluate() {
  return (random());
}
Is there anything like `(side == white) ? rand() : -rand()` in there?? NO.
That is what I tried in my engine when you said random eval... and probably
what Tord did too. The fact that you always give a +ve score to one side and a -ve score to the other is of course something NOT random.
You just inserted order out of nowhere. Not surprised there at all.