Interpreting Alpha-Beta pseudo-code

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Fguy64
Posts: 814
Joined: Sat May 09, 2009 4:51 pm
Location: Toronto

Interpreting Alpha-Beta pseudo-code

Post by Fguy64 »

OK, I am using Bruce Moreland's alpha-beta page as an guide, and I am trying to get a clear and stable picture in my mind of how is program might implement the algorithm he presents. here's the link again...
http://web.archive.org/web/200404200223 ... habeta.htm

ok first I will describe the scenario, and clarify some semantics...

There is a game between human and engine, and it is the engine's turn to move.
level0 describes the position after the human move, it consists of a single node (position).
level1 consists of n nodes, which are the positions which arise after each possible legal move that the engine might make "on the board"
I also use the terms search and alphaBeta somewhat interchangeably.

Now at this point I suppose we can consider that the problem in two ways...

1. Only one "external call" to search, in which we pass the level0 position to the search, and the search determines the level1 positions and somehow choses from them the final optimal move to be made on the board.

2. Outside of search we have some kind of non-recursive makeBestMove() method. It will determine the n legal moves (level1), then iterate through them and make n external calls to search. Each call will return an eval of the position after the move. And on the basis of these n evals the makeBestMove() method will choose the best move.

OK, here's the rub...

Moreland's alphabeta is an integer valued function, it returns an evaluation, as does every other pseudo-code alphabeta I have ever seen. That suggests to me method 2, n external calls to search, n evals get returned, then some other method chooses the level1 move with the maximum eval.

However, Moreland also describes an initial external call to search as val = AlphaBeta(5, -INFINITY, INFINITY);
Now that suggest to me method 1. It has to be, or else the fact that we are passing two constants (-INFINITY, INFINITY) as alpha and beta would mean we have lost out on a bg chunk of pruning. But if it is method 1, then surely the final optimal move is being determined by the alphaBeta() function, in which case it would be nice to see that represented in the pseudo-code, and it woyuld also be nice to see alphaBeta() return a move and not an eval.

Now there are a couple of ways I might resolve this discrepancy, but I feel that somewhere along the way I am unnecessarily complicating things. So how do people typically deal with this. Method 1 or method 2?
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Interpreting Alpha-Beta pseudo-code

Post by bob »

Fguy64 wrote:OK, I am using Bruce Moreland's alpha-beta page as an guide, and I am trying to get a clear and stable picture in my mind of how is program might implement the algorithm he presents. here's the link again...
http://web.archive.org/web/200404200223 ... habeta.htm

ok first I will describe the scenario, and clarify some semantics...

There is a game between human and engine, and it is the engine's turn to move.
level0 describes the position after the human move, it consists of a single node (position).
level1 consists of n nodes, which are the positions which arise after each possible legal move that the engine might make "on the board"
I also use the terms search and alphaBeta somewhat interchangeably.

Now at this point I suppose we can consider that the problem in two ways...

1. Only one "external call" to search, in which we pass the level0 position to the search, and the search determines the level1 positions and somehow choses from them the final optimal move to be made on the board.

2. Outside of search we have some kind of non-recursive makeBestMove() method. It will determine the n legal moves (level1), then iterate through them and make n external calls to search. Each call will return an eval of the position after the move. And on the basis of these n evals the makeBestMove() method will choose the best move.

OK, here's the rub...

Moreland's alphabeta is an integer valued function, it returns an evaluation, as does every other pseudo-code alphabeta I have ever seen. That suggests to me method 2, n external calls to search, n evals get returned, then some other method chooses the level1 move with the maximum eval.

However, Moreland also describes an initial external call to search as val = AlphaBeta(5, -INFINITY, INFINITY);
Now that suggest to me method 1. It has to be, or else the fact that we are passing two constants (-INFINITY, INFINITY) as alpha and beta would mean we have lost out on a bg chunk of pruning. But if it is method 1, then surely the final optimal move is being determined by the alphaBeta() function, in which case it would be nice to see that represented in the pseudo-code, and it woyuld also be nice to see alphaBeta() return a move and not an eval.

Now there are a couple of ways I might resolve this discrepancy, but I feel that somewhere along the way I am unnecessarily complicating things. So how do people typically deal with this. Method 1 or method 2?
First, AlphaBeta() is a recursive function. Think of it like this:

It is called with a position to search, and a depth to search it to. From the root, we call alpha/beta with the root position and the nominal search depth. Once we get into AlphaBeta(), if depth is > 0, we make a move, and then call AlphaBeta() with the opposite side on move, the new position, and a depth that has been reduced by 1 ply. Whenever we start to call AlphaBeta() and we notice that the current depth is 1, we now call Quiesce() instead of AlphaBeta() (which really does the same thing but just generates fewer moves, namely just captures).
Fguy64
Posts: 814
Joined: Sat May 09, 2009 4:51 pm
Location: Toronto

Re: Interpreting Alpha-Beta pseudo-code

Post by Fguy64 »

bob wrote:
Fguy64 wrote:OK, I am using Bruce Moreland's alpha-beta page as an guide, and I am trying to get a clear and stable picture in my mind of how is program might implement the algorithm he presents. here's the link again...
http://web.archive.org/web/200404200223 ... habeta.htm

ok first I will describe the scenario, and clarify some semantics...

There is a game between human and engine, and it is the engine's turn to move.
level0 describes the position after the human move, it consists of a single node (position).
level1 consists of n nodes, which are the positions which arise after each possible legal move that the engine might make "on the board"
I also use the terms search and alphaBeta somewhat interchangeably.

Now at this point I suppose we can consider that the problem in two ways...

1. Only one "external call" to search, in which we pass the level0 position to the search, and the search determines the level1 positions and somehow choses from them the final optimal move to be made on the board.

2. Outside of search we have some kind of non-recursive makeBestMove() method. It will determine the n legal moves (level1), then iterate through them and make n external calls to search. Each call will return an eval of the position after the move. And on the basis of these n evals the makeBestMove() method will choose the best move.

OK, here's the rub...

Moreland's alphabeta is an integer valued function, it returns an evaluation, as does every other pseudo-code alphabeta I have ever seen. That suggests to me method 2, n external calls to search, n evals get returned, then some other method chooses the level1 move with the maximum eval.

However, Moreland also describes an initial external call to search as val = AlphaBeta(5, -INFINITY, INFINITY);
Now that suggest to me method 1. It has to be, or else the fact that we are passing two constants (-INFINITY, INFINITY) as alpha and beta would mean we have lost out on a bg chunk of pruning. But if it is method 1, then surely the final optimal move is being determined by the alphaBeta() function, in which case it would be nice to see that represented in the pseudo-code, and it woyuld also be nice to see alphaBeta() return a move and not an eval.

Now there are a couple of ways I might resolve this discrepancy, but I feel that somewhere along the way I am unnecessarily complicating things. So how do people typically deal with this. Method 1 or method 2?
First, AlphaBeta() is a recursive function. Think of it like this:

It is called with a position to search, and a depth to search it to. From the root, we call alpha/beta with the root position and the nominal search depth. Once we get into AlphaBeta(), if depth is > 0, we make a move, and then call AlphaBeta() with the opposite side on move, the new position, and a depth that has been reduced by 1 ply. Whenever we start to call AlphaBeta() and we notice that the current depth is 1, we now call Quiesce() instead of AlphaBeta() (which really does the same thing but just generates fewer moves, namely just captures).
OK that seems clear enough, thanks. It tells me one and only one "external call" to the recursive search function. It also tells me that just because alphaBeta() is depicted as returning an integer (alpha), it does not mean that the original "root-call" to alphaBeta will make any meaningful use of a returned integer.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Interpreting Alpha-Beta pseudo-code

Post by bob »

Fguy64 wrote:
bob wrote:
Fguy64 wrote:OK, I am using Bruce Moreland's alpha-beta page as an guide, and I am trying to get a clear and stable picture in my mind of how is program might implement the algorithm he presents. here's the link again...
http://web.archive.org/web/200404200223 ... habeta.htm

ok first I will describe the scenario, and clarify some semantics...

There is a game between human and engine, and it is the engine's turn to move.
level0 describes the position after the human move, it consists of a single node (position).
level1 consists of n nodes, which are the positions which arise after each possible legal move that the engine might make "on the board"
I also use the terms search and alphaBeta somewhat interchangeably.

Now at this point I suppose we can consider that the problem in two ways...

1. Only one "external call" to search, in which we pass the level0 position to the search, and the search determines the level1 positions and somehow choses from them the final optimal move to be made on the board.

2. Outside of search we have some kind of non-recursive makeBestMove() method. It will determine the n legal moves (level1), then iterate through them and make n external calls to search. Each call will return an eval of the position after the move. And on the basis of these n evals the makeBestMove() method will choose the best move.

OK, here's the rub...

Moreland's alphabeta is an integer valued function, it returns an evaluation, as does every other pseudo-code alphabeta I have ever seen. That suggests to me method 2, n external calls to search, n evals get returned, then some other method chooses the level1 move with the maximum eval.

However, Moreland also describes an initial external call to search as val = AlphaBeta(5, -INFINITY, INFINITY);
Now that suggest to me method 1. It has to be, or else the fact that we are passing two constants (-INFINITY, INFINITY) as alpha and beta would mean we have lost out on a bg chunk of pruning. But if it is method 1, then surely the final optimal move is being determined by the alphaBeta() function, in which case it would be nice to see that represented in the pseudo-code, and it woyuld also be nice to see alphaBeta() return a move and not an eval.

Now there are a couple of ways I might resolve this discrepancy, but I feel that somewhere along the way I am unnecessarily complicating things. So how do people typically deal with this. Method 1 or method 2?
First, AlphaBeta() is a recursive function. Think of it like this:

It is called with a position to search, and a depth to search it to. From the root, we call alpha/beta with the root position and the nominal search depth. Once we get into AlphaBeta(), if depth is > 0, we make a move, and then call AlphaBeta() with the opposite side on move, the new position, and a depth that has been reduced by 1 ply. Whenever we start to call AlphaBeta() and we notice that the current depth is 1, we now call Quiesce() instead of AlphaBeta() (which really does the same thing but just generates fewer moves, namely just captures).
OK that seems clear enough, thanks. It tells me one and only one "external call" to the recursive search function. It also tells me that just because alphaBeta() is depicted as returning an integer (alpha), it does not mean that the original "root-call" to alphaBeta will make any meaningful use of a returned integer.
All that value is good for is you can display it and say "this is the score for the best move, based on the search I just completed."
User avatar
xsadar
Posts: 147
Joined: Wed Jun 06, 2007 10:01 am
Location: United States
Full name: Mike Leany

Re: Interpreting Alpha-Beta pseudo-code

Post by xsadar »

Both methods work fine. Eventually you will probably prefer method 2 because it lets you do special things in the root position, but at this point, I think it's best for you to pick whichever method you find easiest to understand and deal with.
Fguy64
Posts: 814
Joined: Sat May 09, 2009 4:51 pm
Location: Toronto

Re: Interpreting Alpha-Beta pseudo-code

Post by Fguy64 »

xsadar wrote:Both methods work fine. Eventually you will probably prefer method 2 because it lets you do special things in the root position, but at this point, I think it's best for you to pick whichever method you find easiest to understand and deal with.
well intuitively I can't see why method two wouldn't work, it seems reasonable, but something was going wrong, I just couldn't see what. So I tried method 1 as Bob described, and it appears to have come together nicely, I'm working my way through the test positions, and it's getting done a lot faster than my old "semi-alphabeta" search that I started out these threads with at the beginning of the week.

touch wood.

Anyways, I'll do a summary post once I get it all nailed down, so people can see the results of their assistance. :)
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Interpreting Alpha-Beta pseudo-code

Post by Sven »

I am not yet sure what your exact problem was. Was it the question how you get the best move returned by your search function (AlphaBeta)? In this case you have several choices:

1) Stay with only one search function (ignoring quiescence search here), and let this function return the best move via an output parameter in addition to the integer return value representing the value of the subtree. The best move might only be used by the "external" caller (at "Level 0" as you called it) and will be ignored otherwise, but avoiding different cases makes the code easier to read.

2) Same as 1 but with an additional condition "if (we are at level 0)" around the update of the best move.

3) Have two slightly different versions of the search function (I think this is what you called "method 2"), one for "Level 0" and one for all other levels. In this case, only the "Level 0" function needs to track the best move. Everything else is essentially the same in both functions, at least for your first approach. Disadvantage: some more lines of code. Advantage: you avoid to execute a tiny bit of code that is not necessary below level 0 (update of best move when current move is an improvement), and you save stack space by not maintaining the best move at lower levels.

4) Similar to choice 1 again, but extended to a data structure representing the whole PV (principal variation) of the subtree, instead of only the best move. It is necessary to have something like this if you ever want to display the current PV to the user.

If you would ask me, the easiest way in the beginning would be choice 1 for me.

Sven
Fguy64
Posts: 814
Joined: Sat May 09, 2009 4:51 pm
Location: Toronto

Re: Interpreting Alpha-Beta pseudo-code

Post by Fguy64 »

Sven Schüle wrote:I am not yet sure what your exact problem was. Was it the question how you get the best move returned by your search function (AlphaBeta)? In this case you have several choices:

...

If you would ask me, the easiest way in the beginning would be choice 1 for me.

Sven
Indeed, that was a big part of the problem. I haven't considered your choices in detail yet, but I will in due time. What I ended up doing was maintaining a move object at the class level, outside of my method1 search. Then, whenever alpha was updated or when a beta cutoff happened, both at depth==ply, I would update my external move object.

As it stands now, I have lost one thing in going to method1. It is entirely reasonable to expect that more than one level1 move would have the same eval, and I'd prefer to select randomly from those moves, for variety. Now that sort of thing seems much clearer with method2, but I'm sure it can't be too hard to do it under method1, I just haven't tried to work it out yet.

p.s. I intend to revisit method2 once I get method1 nailed down, and solve the problem. I'll let you know how it goes.
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Interpreting Alpha-Beta pseudo-code

Post by Sven »

Fguy64 wrote:As it stands now, I have lost one thing in going to method1. It is entirely reasonable to expect that more than one level1 move would have the same eval, and I'd prefer to select randomly from those moves, for variety. Now that sort of thing seems much clearer with method2, but I'm sure it can't be too hard to do it under method1, I just haven't tried to work it out yet.
It is not a good idea to consider several (root) moves that return the same evaluation, and choose randomly between all of them. Here is the problem.

Say you have moves A and B at the root which are searched in this order. Searching A with window (alpha, beta) returns value X which is the best value up to now. Now you search B with window (X, beta) and get X returned again, what does this mean?

The true value of B is "lower or equal than X", and it may happen frequently that it is lower than X but you will never get to know about it. As soon as the opponent has one response to B that evaluates at least to -X from his viewpoint, this causes a cutoff. The opponent might have a better response but this would be irrelevant, B is already refuted sufficiently so other responses are not even tried.

For this reason it is not suitable to compare the value of B, which may have been searched incompletely by intent due to a cutoff, to the value of A which has been searched completely since it has become the best move.

So you better save the effort and just track one best move, and only overwrite it if a new move improves the best value. Everything else might be counter-productive for playing strength.

Sven
Fguy64
Posts: 814
Joined: Sat May 09, 2009 4:51 pm
Location: Toronto

Re: Interpreting Alpha-Beta pseudo-code

Post by Fguy64 »

Sven Schüle wrote:
Fguy64 wrote:As it stands now, I have lost one thing in going to method1. It is entirely reasonable to expect that more than one level1 move would have the same eval, and I'd prefer to select randomly from those moves, for variety. Now that sort of thing seems much clearer with method2, but I'm sure it can't be too hard to do it under method1, I just haven't tried to work it out yet.
It is not a good idea to consider several (root) moves that return the same evaluation, and choose randomly between all of them. Here is the problem.

Say you have moves A and B at the root which are searched in this order. Searching A with window (alpha, beta) returns value X which is the best value up to now. Now you search B with window (X, beta) and get X returned again, what does this mean?

The true value of B is "lower or equal than X", and it may happen frequently that it is lower than X but you will never get to know about it. As soon as the opponent has one response to B that evaluates at least to -X from his viewpoint, this causes a cutoff. The opponent might have a better response but this would be irrelevant, B is already refuted sufficiently so other responses are not even tried.

For this reason it is not suitable to compare the value of B, which may have been searched incompletely by intent due to a cutoff, to the value of A which has been searched completely since it has become the best move.

So you better save the effort and just track one best move, and only overwrite it if a new move improves the best value. Everything else might be counter-productive for playing strength.

Sven
OK I think I see what you mean. It sounds like the value that gets assigned after a cutoff can't be reliable? That would make sense.

It is starting to sound like a bit of a tradeoff. If you want the speed benefits of full alphabeta, then you have to give something up, and that is the ability to consider two moves at the same level as equal.