Questions for the Stockfish team

Discussion of chess software programming and technical issues.

Moderator: Ras

BubbaTough
Posts: 1154
Joined: Fri Jun 23, 2006 5:18 am

Re: Questions for the Stockfish team

Post by BubbaTough »

bob wrote:
BubbaTough wrote:
bob wrote:As I said, the scores my eval will produce, using pure random numbers, are -100 to +100. -100 is good for black, +100 is good for white.
only up to 100? Assuming integers, that is not much variety for millions of nodes to choose from. The whole thing makes sense to me if you were using a much larger number. I find it peculiar that you are seeing much effect here without a larger number, but if you are, you are. I wonder how it affects things to use smaller or larger ranges (I thought I knew, but apparently my instinct is off).

-Sam
What would be the point of a larger range? The random numbers are uniformly distributed. The range is pretty much irrelevant; you just want enough choices to differentiate between typical mobility extremes. Since move counts rarely exceed 100, a range of 0 to 100 covers almost all cases well enough. I tried other ranges when developing this and found absolutely no benefit. A smaller range also tends to make the tree a bit smaller, since there are fewer distinct scores and more "equal" scores, which allow pruning.
Well, clearly you are right, because you tested it and it showed... you are right. My thought was that if you have a large tree with millions of leaves, enough of them would be 100 at the edges that you couldn't really distinguish between large mobility and small mobility by the time you propagated back to the root. Clearly if you only generated numbers between 1 and 2 that would be the case. A large enough range counteracts this. But I guess somehow 100 is a large enough range, and you are right, and I am wrong.

-Sam
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Questions for the Stockfish team

Post by bob »

Daniel Shawul wrote:
Exactly what test did you run and what were the results for how many games? Just playing one game does not make this "light years different." Dann is playing games using the skill version of Crafty and is seeing just as much unusual strength as I am.
Well, he is playing with something else. You said random eval and in fact wrote pseudo-code for it.
No. I mean this:

int Evaluate() {
    return (random());
}
Another random quote from you:
I don't know what your "completely random" comment means, but I have tested (and just did it again) with pure random scores. Just take Crafty, and right at the top of evaluate.c return 100 * random_generator() (assuming random_generator() returns a float 0.0 <= N < 1.00). Then you won't be guessing.
etcetera...

Why did you suddenly decide to differentiate by side to move? I mean, it is really difficult to argue when your words change with every reply.
My words have not changed one time. I gave two alternative ways of doing the random eval, clearly explaining that _either_ will work equally well. STM is a normal minimax concept, is it not? Do I need to completely explain minimax and alpha/beta each time I write something here, or can I ASSUME that such ideas are understood by all and don't need to be regurgitated ad nauseam?

I also ASSUME that anyone interested in this discussion would first do a little research before trying to argue that this doesn't work. We have a well-done ICGA paper describing this. Others have chimed in explaining exactly _how_ and _why_ it does work. And we have a public-source chess program that _anyone_ (including yourself) can test to see if what we are saying is true or not. I had no idea random eval would play at +1800 until the thread here last month when someone complained that the skill 1 crafty option was playing far too strong and needed to be toned down. The issue is real, the explanation is not hard to understand. Sit back and digest what has been written before continuing to argue.
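To make the alternatives concrete, here is a minimal sketch of a pure-random evaluation of the kind described above. This is not Crafty's actual code; the function name is mine and `rand()` merely stands in for whatever PRNG the engine uses. It signs the score so that positive is good for white, matching the white-relative minimax convention mentioned earlier in the thread.

```c
#include <stdlib.h>

/* Sketch (hypothetical) of a pure-random evaluation as discussed in
 * this thread: a uniform score in 0..100, signed so that positive
 * favors white and negative favors black. rand() stands in for the
 * engine's real PRNG. */
int EvaluateRandom(int white_to_move) {
    int score = rand() % 101;               /* uniform 0..100 */
    return white_to_move ? score : -score;  /* white-relative score */
}
```

Under this convention the search machinery above it needs no changes; only the leaf values become noise.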
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Questions for the Stockfish team

Post by bob »

BubbaTough wrote:
bob wrote:
BubbaTough wrote:
bob wrote:As I said, the scores my eval will produce, using pure random numbers, are -100 to +100. -100 is good for black, +100 is good for white.
only up to 100? Assuming integers, that is not much variety for millions of nodes to choose from. The whole thing makes sense to me if you were using a much larger number. I find it peculiar that you are seeing much effect here without a larger number, but if you are, you are. I wonder how it affects things to use smaller or larger ranges (I thought I knew, but apparently my instinct is off).

-Sam
What would be the point of a larger range? The random numbers are uniformly distributed. The range is pretty much irrelevant; you just want enough choices to differentiate between typical mobility extremes. Since move counts rarely exceed 100, a range of 0 to 100 covers almost all cases well enough. I tried other ranges when developing this and found absolutely no benefit. A smaller range also tends to make the tree a bit smaller, since there are fewer distinct scores and more "equal" scores, which allow pruning.
Well, clearly you are right, because you tested it and it showed... you are right. My thought was that if you have a large tree with millions of leaves, enough of them would be 100 at the edges that you couldn't really distinguish between large mobility and small mobility by the time you propagated back to the root. Clearly if you only generated numbers between 1 and 2 that would be the case. A large enough range counteracts this. But I guess somehow 100 is a large enough range, and you are right, and I am wrong.

-Sam
You are not thinking at the right level. We are talking about a probability P of generating any specific number between 0 and 99 (100 numbers total). To start with, what is the probability of producing any number >= 90? Exactly 0.1, correct? What about any number < 90? Exactly 0.9, correct?

Now, to the tree. At any node, how many moves will you have on average? 35-38 is the generally quoted number. If I have 4 legal moves because I am escaping check, I get 4 chances in 100 of getting exactly 99, the best possible score. If I have 50 legal moves, I get 50 shots at that 99 value. If I often had 100+ moves I would want a bigger range, but we are talking about choices where one side has N moves, and N is almost always < 100 and most of the time < 50. That's why 100 works. In fact, I have been experimenting with smaller ranges to break this a bit, since the complaint about skill 1 is still pending. Shrinking the range from 100 to 10 certainly weakens the thing, because now I have, on average, more moves than possible scores: with 10 moves I get 10 chances at a 9 (a probability of about 0.65), and with 40 moves I still can't get a number bigger than 9, so I can't differentiate between those two cases nearly as well.

I am playing with the idea of shrinking the range as skill is reduced, to get around the "Beal effect." I'd prefer that to an artificial loop that slows the program down so much that the reduced depth minimizes the Beal effect. To see why depth matters, think about a 1-ply search: there the Beal effect does not work at all. I find it irritating that it is just as complicated to make the program play poorly as it is to make it play well...
Chan Rasjid
Posts: 588
Joined: Thu Mar 09, 2006 4:47 pm
Location: Singapore

Re: Questions for the Stockfish team

Post by Chan Rasjid »

Milos wrote:
Joost Buijs wrote:It is my feeling that everything depends on the quality of the evaluation. When I look at my own engine, it has an evaluation function comparable to a 1600 player, but it plays at the 2850 level just because it is very good at tactics. I'm pretty sure that when I'm able to improve the evaluation function to a higher level, its Elo will go up.
Evaluation is just a dead end. There are so many different positional things that you need to implement to noticeably increase Elo, and all that with the assumption that you don't lose any speed. However, with each new thing you lose more in speed than you gain from the positional knowledge. Simply put, evaluation as it stands in the strongest programs is really at its local optimum, and you can't improve it further without drastically changing the search.
On the other hand, you can improve search almost for free ;).
After a little thinking, I do see some meaning in what you say. "Dead end" is another term for the evaluation maximum that others refer to. As you did not give the details, I'll try.

Evaluation can reach a (local?) maximum because static evaluation is always an approximation - as others say, there is no perfect evaluation. We can construct an "error-free" chess engine:
1) search - no forward pruning etc., so it searches the whole tree intact, with normal good ordering and check extensions. This search cannot make any mistake.
2) a conservative evaluation, say that of Fruit with any dangerous bonus taken away.

If this engine plays at Elo 2000, then let an expert team test all manner of improvements to just the evaluation. I think the best Elo increase might be just 150-200. The engine cannot be made stronger except through changes to search.

I think Miguel would have to do some maths - if an engine sits far below Elo 3000, the deficit is in a weak search and not in the evaluation. Bob's pointing to the simple evaluation of Fruit is correct - most importantly, it is bug-free.

Rasjid
BubbaTough
Posts: 1154
Joined: Fri Jun 23, 2006 5:18 am

Re: Questions for the Stockfish team

Post by BubbaTough »

bob wrote:
BubbaTough wrote:
bob wrote:
BubbaTough wrote:
bob wrote:As I said, the scores my eval will produce, using pure random numbers, are -100 to +100. -100 is good for black, +100 is good for white.
only up to 100? Assuming integers, that is not much variety for millions of nodes to choose from. The whole thing makes sense to me if you were using a much larger number. I find it peculiar that you are seeing much effect here without a larger number, but if you are, you are. I wonder how it affects things to use smaller or larger ranges (I thought I knew, but apparently my instinct is off).

-Sam
What would be the point of a larger range? The random numbers are uniformly distributed. The range is pretty much irrelevant; you just want enough choices to differentiate between typical mobility extremes. Since move counts rarely exceed 100, a range of 0 to 100 covers almost all cases well enough. I tried other ranges when developing this and found absolutely no benefit. A smaller range also tends to make the tree a bit smaller, since there are fewer distinct scores and more "equal" scores, which allow pruning.
Well, clearly you are right, because you tested it and it showed... you are right. My thought was that if you have a large tree with millions of leaves, enough of them would be 100 at the edges that you couldn't really distinguish between large mobility and small mobility by the time you propagated back to the root. Clearly if you only generated numbers between 1 and 2 that would be the case. A large enough range counteracts this. But I guess somehow 100 is a large enough range, and you are right, and I am wrong.

-Sam
You are not thinking at the right level. We are talking about a probability P of generating any specific number between 0 and 99 (100 numbers total). To start with, what is the probability of producing any number >= 90? Exactly 0.1, correct? What about any number < 90? Exactly 0.9, correct?

Now, to the tree. At any node, how many moves will you have on average? 35-38 is the generally quoted number. If I have 4 legal moves because I am escaping check, I get 4 chances in 100 of getting exactly 99, the best possible score. If I have 50 legal moves, I get 50 shots at that 99 value. If I often had 100+ moves I would want a bigger range, but we are talking about choices where one side has N moves, and N is almost always < 100 and most of the time < 50. That's why 100 works. In fact, I have been experimenting with smaller ranges to break this a bit, since the complaint about skill 1 is still pending. Shrinking the range from 100 to 10 certainly weakens the thing, because now I have, on average, more moves than possible scores: with 10 moves I get 10 chances at a 9 (a probability of about 0.65), and with 40 moves I still can't get a number bigger than 9, so I can't differentiate between those two cases nearly as well.

I am playing with the idea of shrinking the range as skill is reduced, to get around the "Beal effect." I'd prefer that to an artificial loop that slows the program down so much that the reduced depth minimizes the Beal effect. To see why depth matters, think about a 1-ply search: there the Beal effect does not work at all. I find it irritating that it is just as complicated to make the program play poorly as it is to make it play well...
What you are saying makes sense to me for a 2-ply search, but if we crank the depth up to, say, 10, it seemed much less clear, since it seemed as if you were accumulating more chances to get a higher score. Now that I think about it, though, I think you are right (which you already knew, of course).

Overall, very interesting stuff.

-Sam
Chan Rasjid
Posts: 588
Joined: Thu Mar 09, 2006 4:47 pm
Location: Singapore

Re: Questions for the Stockfish team

Post by Chan Rasjid »

bob wrote:
Chan Rasjid wrote:The Beal Effect :-
I am not going to continuously argue a point that has already been explained in detail and verified by others, dating back as far as 20 years.
...
First disclaimer: I have not gone into the detailed explanations of the Beal Effect given here. My gut intuition just cannot accept an engine with a purely random evaluation gaining any Elo from randomness alone.

My question is: if every value search() returns is replaced with random(), is there any Beal Effect?

Code: Select all

// for all search()
x = search();
x = random();
I withdraw my question if search() is allowed to retain draws, mate scores etc that are real and part of search().

Rasjid
Search is unmodified, so yes, it recognizes draws and mates. The random eval is a simple approximation to mobility. You let the search choose among the random scores, and at any point in the tree, the more moves you have, the greater the probability you will get a good random score to back up, and vice-versa...
I think it is the "Beal Effect" - at any point in the tree, the more moves you have, the greater the probability you will get a good random score to back up.

I have not thought about it deeply, but I do seem to accept that it acts as a crude mobility evaluator - probably even when all draws and mates are also replaced by random numbers.

I think Daniel Shawul was not given this concise explanation (here) earlier.

Rasjid
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Questions for the Stockfish team

Post by bob »

BubbaTough wrote:
bob wrote:
BubbaTough wrote:
bob wrote:
BubbaTough wrote:
bob wrote:As I said, the scores my eval will produce, using pure random numbers, are -100 to +100. -100 is good for black, +100 is good for white.
only up to 100? Assuming integers, that is not much variety for millions of nodes to choose from. The whole thing makes sense to me if you were using a much larger number. I find it peculiar that you are seeing much effect here without a larger number, but if you are, you are. I wonder how it affects things to use smaller or larger ranges (I thought I knew, but apparently my instinct is off).

-Sam
What would be the point of a larger range? The random numbers are uniformly distributed. The range is pretty much irrelevant; you just want enough choices to differentiate between typical mobility extremes. Since move counts rarely exceed 100, a range of 0 to 100 covers almost all cases well enough. I tried other ranges when developing this and found absolutely no benefit. A smaller range also tends to make the tree a bit smaller, since there are fewer distinct scores and more "equal" scores, which allow pruning.
Well, clearly you are right, because you tested it and it showed... you are right. My thought was that if you have a large tree with millions of leaves, enough of them would be 100 at the edges that you couldn't really distinguish between large mobility and small mobility by the time you propagated back to the root. Clearly if you only generated numbers between 1 and 2 that would be the case. A large enough range counteracts this. But I guess somehow 100 is a large enough range, and you are right, and I am wrong.

-Sam
You are not thinking at the right level. We are talking about a probability P of generating any specific number between 0 and 99 (100 numbers total). To start with, what is the probability of producing any number >= 90? Exactly 0.1, correct? What about any number < 90? Exactly 0.9, correct?

Now, to the tree. At any node, how many moves will you have on average? 35-38 is the generally quoted number. If I have 4 legal moves because I am escaping check, I get 4 chances in 100 of getting exactly 99, the best possible score. If I have 50 legal moves, I get 50 shots at that 99 value. If I often had 100+ moves I would want a bigger range, but we are talking about choices where one side has N moves, and N is almost always < 100 and most of the time < 50. That's why 100 works. In fact, I have been experimenting with smaller ranges to break this a bit, since the complaint about skill 1 is still pending. Shrinking the range from 100 to 10 certainly weakens the thing, because now I have, on average, more moves than possible scores: with 10 moves I get 10 chances at a 9 (a probability of about 0.65), and with 40 moves I still can't get a number bigger than 9, so I can't differentiate between those two cases nearly as well.

I am playing with the idea of shrinking the range as skill is reduced, to get around the "Beal effect." I'd prefer that to an artificial loop that slows the program down so much that the reduced depth minimizes the Beal effect. To see why depth matters, think about a 1-ply search: there the Beal effect does not work at all. I find it irritating that it is just as complicated to make the program play poorly as it is to make it play well...
What you are saying makes sense to me for a 2-ply search, but if we crank the depth up to, say, 10, it seemed much less clear, since it seemed as if you were accumulating more chances to get a higher score. Now that I think about it, though, I think you are right (which you already knew, of course).

Overall, very interesting stuff.

-Sam
It is definitely odd. Fortunately, to measure this stuff, I have the perfect facility here. The new thread I posted is the result of a 24-hour run testing 11 different skill settings for 30,000 games each.
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Questions for the Stockfish team

Post by michiguel »

Chan Rasjid wrote:
Milos wrote:
Joost Buijs wrote:It is my feeling that everything depends on the quality of the evaluation. When I look at my own engine, it has an evaluation function comparable to a 1600 player, but it plays at the 2850 level just because it is very good at tactics. I'm pretty sure that when I'm able to improve the evaluation function to a higher level, its Elo will go up.
Evaluation is just a dead end. There are so many different positional things that you need to implement to noticeably increase Elo, and all that with the assumption that you don't lose any speed. However, with each new thing you lose more in speed than you gain from the positional knowledge. Simply put, evaluation as it stands in the strongest programs is really at its local optimum, and you can't improve it further without drastically changing the search.
On the other hand, you can improve search almost for free ;).
After a little thinking, I do see some meaning in what you say. "Dead end" is another term for the evaluation maximum that others refer to. As you did not give the details, I'll try.

Evaluation can reach a (local?) maximum because static evaluation is always an approximation
From a theoretical standpoint, that is wrong, as demonstrated by tablebases. I know this is an extreme case, but...


- as others say, there is no perfect evaluation. We can construct an "error-free" chess engine:
1) search - no forward pruning etc., so it searches the whole tree intact, with normal good ordering and check extensions. This search cannot make any mistake.
2) a conservative evaluation, say that of Fruit with any dangerous bonus taken away.

If this engine plays at Elo 2000, then let an expert team test all manner of improvements to just the evaluation. I think the best Elo increase might be just 150-200. The engine cannot be made stronger except through changes to search.
Fruit 1.0 with plain alpha-beta search (i.e. no null move) already plays at around 2400. I used it as a sparring partner.

I think Miguel would have to do some maths - if an engine sits far below Elo 3000, the deficit is in a weak search and not in the evaluation. Bob's pointing to the simple evaluation of Fruit is correct - most importantly, it is bug-free.
My intuition tells me that it is both. You cannot reach that number with good eval or good search alone. Moreover, I think that they need to complement each other (i.e. a good search from one program + a good eval from another will play worse than either of the originals).

Miguel

Rasjid
Last edited by michiguel on Wed Jul 21, 2010 5:54 pm, edited 1 time in total.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Questions for the Stockfish team

Post by bob »

Chan Rasjid wrote:
bob wrote:
Chan Rasjid wrote:The Beal Effect :-
I am not going to continuously argue a point that has already been explained in detail and verified by others, dating back as far as 20 years.
...
First disclaimer: I have not gone into the detailed explanations of the Beal Effect given here. My gut intuition just cannot accept an engine with a purely random evaluation gaining any Elo from randomness alone.

My question is: if every value search() returns is replaced with random(), is there any Beal Effect?

Code: Select all

// for all search()
x = search();
x = random();
I withdraw my question if search() is allowed to retain draws, mate scores etc that are real and part of search().

Rasjid
Search is unmodified, so yes, it recognizes draws and mates. The random eval is a simple approximation to mobility. You let the search choose among the random scores, and at any point in the tree, the more moves you have, the greater the probability you will get a good random score to back up, and vice-versa...
I think it is the "Beal Effect" - at any point in the tree, the more moves you have, the greater the probability you will get a good random score to back up.

I have not thought about it deeply, but I do seem to accept that it acts as a crude mobility evaluator - probably even when all draws and mates are also replaced by random numbers.

I think Daniel Shawul was not given this concise explanation (here) earlier.

Rasjid
I don't follow. The "Beal effect" has been explained by me at least a dozen times in this thread, and by a couple of others as well...
Chan Rasjid
Posts: 588
Joined: Thu Mar 09, 2006 4:47 pm
Location: Singapore

Re: Questions for the Stockfish team

Post by Chan Rasjid »

Chan Rasjid wrote:
bob wrote:
Chan Rasjid wrote:The Beal Effect :-
I am not going to continuously argue a point that has already been explained in detail and verified by others, dating back as far as 20 years.
...
First disclaimer: I have not gone into the detailed explanations of the Beal Effect given here. My gut intuition just cannot accept an engine with a purely random evaluation gaining any Elo from randomness alone.

My question is: if every value search() returns is replaced with random(), is there any Beal Effect?

Code: Select all

// for all search()
x = search();
x = random();
I withdraw my question if search() is allowed to retain draws, mate scores etc that are real and part of search().

Rasjid
Search is unmodified, so yes, it recognizes draws and mates. The random eval is a simple approximation to mobility. You let the search choose among the random scores, and at any point in the tree, the more moves you have, the greater the probability you will get a good random score to back up, and vice-versa...
I think it is the "Beal Effect" - at any point in the tree, the more moves you have, the greater the probability you will get a good random score to back up.

I have not thought about it deeply, but I do seem to accept that it acts as a crude mobility evaluator - probably even when all draws and mates are also replaced by random numbers.

I think Daniel Shawul was not given this concise explanation (here) earlier.

Rasjid
Unfair disadvantage to Daniel, as he was not given this explicit explanation earlier. :D But still I cannot accept Elo 1800, which is near that of a sane human player. I'd say dumb play is about Elo 800.

Rasjid