Futility pruning, Ext futility pruning and Limited Razoring.

Discussion of chess software programming and technical issues.


rjgibert
Posts: 317
Joined: Mon Jun 26, 2006 9:44 am

Re: Futility pruning, Ext futility pruning and Limited Razor

Post by rjgibert »

My impression is that Romi uses a variant of the following called "MENACE":

http://delphiforfun.org/Programs/tic_ta ... achine.htm

Against a chess program that has a narrow opening book, Romi apparently learns steadily how to dominate it even when that program is stronger.

My assumption is that it essentially builds an exploitive opening book, one that zeros in on all of the opposing program's playing weaknesses, which boil down to horizon-effect errors. Since all programs more or less share the same weaknesses, the book it builds should, to varying degrees, be fairly effective against most programs.
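To make the MENACE analogy concrete, here is a minimal C sketch of that style of reinforcement, assuming a simple per-position table of move weights (all names and numbers below are hypothetical, not taken from Romi or from the MENACE page):

    /* MENACE-style "matchbox" learning sketch. Each known position keeps
       a weight (bead count) per legal move; the winner's choices gain
       beads, the loser's lose beads, and moves are picked with
       probability proportional to their weights. */
    #include <stdlib.h>

    #define MAX_MOVES 256

    typedef struct {
        int weights[MAX_MOVES];   /* bead count for each legal move */
        int num_moves;
    } PositionEntry;

    /* After a game, call this for every (position, move) pair a side played. */
    void reinforce(PositionEntry *e, int move_index, int we_won)
    {
        if (we_won)
            e->weights[move_index] += 3;          /* add beads on a win */
        else if (e->weights[move_index] > 1)
            e->weights[move_index] -= 1;          /* drop a bead on a loss */
    }

    /* Pick a move with probability proportional to its weight. */
    int pick_move(const PositionEntry *e)
    {
        int total = 0, i, r;
        for (i = 0; i < e->num_moves; i++)
            total += e->weights[i];
        r = rand() % total;
        for (i = 0; i < e->num_moves; i++) {
            r -= e->weights[i];
            if (r < 0)
                return i;
        }
        return e->num_moves - 1;   /* not reached while total > 0 */
    }

Over many games the weights concentrate on the replies that actually beat the opponent, which is exactly the exploitive-book behavior described above.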

Unless the program it plays has the same sort of algorithm, it has to resort to using a really huge opening book rather than trying to get an edge in the opening stage with some pet line. I call this the "moving target defense": keep playing something different, so that the learning algorithm takes too long to accomplish its aims in a practical time frame.

In the old days, human players would eventually memorize a way of always beating a computer program. This consisted of playing some reasonable move that got the program out of book early, then playing the damn thing over and over again until it finally lost a game; simply repeating that game would then be rewarded with a win every time. Such a method would create a "book" of maybe a dozen distinct games for beating the program in those days, depending on how much the program managed to vary its play. Romi appears to do, more or less, an elaboration of this scheme.
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Futility pruning, Ext futility pruning and Limited Razor

Post by Don »

One must defend against this type of learning, though. I don't think just having a super-variety book is quite enough, because a program like Romi could start with an extremely off-beat and tiny opening book designed to get you out of theory, then start learning from that point. For instance, if Romi always played 1. h3 on the first move, do you have heavy variety from that point? Probably not. 1. h3 may not be the best move in the world, but it's far from losing.

And one could have an equally bizarre response as black to all the common first opening moves. Anything that quickly moves you away from theory without getting too horrible a game.

The only defense is to do something that ensures variety even outside the opening book. You want to vary as quickly as possible from games you lost.
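One way to implement that, sketched in C with entirely hypothetical names and thresholds (not from any particular engine): remember position keys from lost games, and when the current root position lies on one of those paths, choose randomly among the root moves scoring within a small margin of the best instead of repeating the same choice.

    /* "Moving target" sketch: avoid replaying lines from lost games.
       lost_keys is filled with the position keys of games we lost. */
    #include <stdlib.h>

    #define LOST_TABLE_SIZE 4096
    #define VARIETY_MARGIN  15       /* centipawns; an assumed tolerance */

    static unsigned long long lost_keys[LOST_TABLE_SIZE];

    static int on_lost_path(unsigned long long key)
    {
        return lost_keys[key % LOST_TABLE_SIZE] == key;
    }

    /* scores[] holds root-move scores in centipawns; best is the index
       of the highest. Returns the index of the move to actually play. */
    int choose_root_move(unsigned long long key, const int *scores,
                         int num_moves, int best)
    {
        int candidates[256], n = 0, i;

        if (!on_lost_path(key))
            return best;             /* never lost from here: play best */

        for (i = 0; i < num_moves; i++)
            if (scores[best] - scores[i] <= VARIETY_MARGIN)
                candidates[n++] = i; /* near-equal alternatives */

        return candidates[rand() % n];  /* n >= 1: best always qualifies */
    }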



Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Futility pruning, Ext futility pruning and Limited Razor

Post by Michael Sherwin »

Don wrote:Michael,

What is your book learning idea? When you win a game you give every move you played a slight bonus, and when you lose you give it a slight penalty? And do you record all moves played in all games?

Do you give the same bonus and penalty everywhere? It seems like you have what is called the credit assignment problem, and one method used here is to give a higher bonus or penalty to later moves. Do you do something like that?
Hi Don,

There are two learning ideas that coexist for the purpose of human/computer sparring. Either one or both can be used in an engine, depending on what the goal is.

The one that you asked about is taken from Pavlov's dog experiments and adapted for computer chess. Essentially the entire game is remembered, but to be practical it is restricted to 180 plies. The winning side gets a slight bonus for every one of its moves and the losing side gets a slight penalty for every move. The bonuses/penalties are slightly larger at the deeper plies. The unintuitive thing about doing it this way is that bad moves can and do get bonuses, and good moves can and do get penalties. Just as humans learn bad moves from opening books or overlook good moves, the computer can do the same. In time, though, these get sorted out and the computer's openings become stronger. When playing stronger engines, the learning eventually differentiates all the way back to the root which move the engine performs best with. Then the learning starts once again to propagate back up the tree. It does not take as many games as it might seem to start getting improved results. This is not a book; it is just scores stored in the hash to a certain depth before the search is called.
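A rough C sketch of that scheme as described (the structure and names are my guesses, not RomiChess source): after each decisive game, walk the position keys up to 180 plies and nudge a stored score for each one, with slightly larger nudges deeper in the game.

    /* Pavlovian reward/penalty sketch. keys[ply] is the position after
       'ply' half-moves, so the move made from keys[ply] belongs to White
       when ply is even. learn_probe is an assumed lookup into the learn
       table whose scores later bias the search before it is called. */
    #define MAX_LEARN_PLY 180

    typedef struct {
        unsigned long long key;   /* position hash */
        int score;                /* accumulated bonus/penalty */
    } LearnEntry;

    extern LearnEntry *learn_probe(unsigned long long key);

    void learn_from_game(const unsigned long long *keys, int num_plies,
                         int white_won)   /* assumes a decisive game */
    {
        int ply;
        if (num_plies > MAX_LEARN_PLY)
            num_plies = MAX_LEARN_PLY;

        for (ply = 0; ply < num_plies; ply++) {
            LearnEntry *e = learn_probe(keys[ply]);
            int delta = 1 + ply / 20;          /* slightly more when deeper */
            int white_moved = (ply % 2 == 0);

            if (white_moved == white_won)
                e->score += delta;             /* winner's move: bonus */
            else
                e->score -= delta;             /* loser's move: penalty */
        }
    }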

The second method is simply monkey see, monkey do. If a move has been played enough times by either contestant (as little as once) and has a high enough winning average (as low as 40%), it is played immediately as a book move.
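That rule is simple enough to state in a few lines of C; a sketch with a hypothetical data layout, using the thresholds from the post:

    /* "Monkey see, monkey do": play a move instantly as book if it has
       been seen at least once and scores at least 40%. points2 stores
       doubled result points (win = 2, draw = 1) to stay in integers. */
    typedef struct {
        int games;     /* times this move occurred in stored games */
        int points2;   /* doubled score from those games */
    } MoveStats;

    int play_as_book(const MoveStats *m)
    {
        if (m->games < 1)
            return 0;
        /* points2 / (2 * games) >= 0.40  <=>  100 * points2 >= 80 * games */
        return 100 * m->points2 >= 80 * m->games;
    }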

So, in human/computer sparring the human can set up a position that they want to learn. If the computer wins the game, it will always repeat the same moves until the human wins, at which time it will modify its play until it wins again. If the human is willing to play both sides to learn the position well, then the computer will play the human's winning moves back against them until the human shows the computer how to win from the other side.

The combination of both methods also works well against computers, though if you have a very good shallow book and just the first method is used when out of book, that should be good enough for computer/computer games.

Edit: The late Marc Lacrosse (RIP) did a nice test of Romi's learning ability. He preloaded 100,000 games, picked several very strong top-ten opponents and a good book, and played IIRC 6 sets of 1,000 games. Romi showed an increased Elo performance for every set of games!