How would you notice this? Your only choice would be to play the game out with the move I played, and then again with the move you think is better. Takes too long...

Tony wrote: If the learning didn't notice you played an idiot move then I wouldn't call it learning but rather repeating.

bob wrote: Easily. Suppose we play an opening line where white (my side) has an easy winning position, but I make my program intentionally play badly so that yours learns that this is a good opening for black. Then, on tournament day, you walk right into this opening line that I "taught" you, but this time I don't have my program playing like an idiot; it is playing to win, and does so easily.

mathmoi wrote: Hi,

bob wrote: Here's the down-side... If I notice you using this on ICC, you can expect crafty to teach you a terrible lesson that will come home to roost when we next meet in a CCT/ACCA event.

sje wrote: A few decades ago, Martin Gardner wrote about a manually operated computer that played hexapawn (3x3 board, two rows of pawns) and included an effective learning algorithm. The computer was built from a couple of dozen small matchboxes and a bunch of colored jellybeans. Each matchbox was labeled with a possible hexapawn board position and contained jellybeans that corresponded to the available moves. Operation of the computer involved selectively removing jellybeans that represented moves in lost games.
Symbolic is now using a tangentially related algorithm for book learning. After a couple of hundred seed games using a conventional book, it has been building its current book based only on its past ICC play. This book now contains about 120,000 positions based on about 2,500 games and is increasing daily.
Each matchbox is a position hash, and the jellybeans are the historical win/loss/draw counts. The position list and jellybean distribution are updated semi-automatically (I have to run a batch file) every day or so.
The working hypothesis is that a small book tuned to a program's peculiarities is operationally superior to a much larger book generated from play in a general, high-strength population.
The two main disadvantages of this approach are:
1) A need to have a good variety of opponents,
2) Significant modifications of the search and evaluation code will likely degrade effectiveness of a book tuned to prior behavior.
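In code, the scheme sje describes might look like the minimal sketch below; all names here are hypothetical (this is not Symbolic's actual code): a book mapping position hashes to win/loss/draw "jellybean" counts, updated from finished games and consulted to score candidate book moves.

Code: Select all

#include <cstdint>
#include <unordered_map>
#include <vector>

// One "matchbox": historical outcome counts for a position reached in play.
struct BoxCounts {
    std::uint32_t wins = 0, losses = 0, draws = 0;
};

// The book: position hash (e.g. a Zobrist key) -> jellybean counts.
using MatchboxBook = std::unordered_map<std::uint64_t, BoxCounts>;

// After each game, credit every position the program reached with the
// final result, seen from the program's side.
void updateBook(MatchboxBook &book, const std::vector<std::uint64_t> &positions,
                int result /* +1 win, 0 draw, -1 loss */) {
    for (std::uint64_t key : positions) {
        BoxCounts &c = book[key];
        if (result > 0)      ++c.wins;
        else if (result < 0) ++c.losses;
        else                 ++c.draws;
    }
}

// Expected score in [0,1] of the position a candidate book move leads to;
// never-seen positions get a neutral prior of 0.5.
double expectedScore(const MatchboxBook &book, std::uint64_t childKey) {
    auto it = book.find(childKey);
    if (it == book.end()) return 0.5;
    const BoxCounts &c = it->second;
    double n = c.wins + c.losses + c.draws;
    return (c.wins + 0.5 * c.draws) / n;
}

The daily batch run might then amount to replaying the day's games through updateBook and pruning moves whose expected score has sunk too low, the software analogue of removing jellybeans.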
One has to be careful how one uses the things one learns, lest there be traps set.
I don't understand why you think this learning technique is flawed. Can you explain?
That's the down-side to learning, and it is why I don't depend on it for preparing a book for tournament play at all.
Tony
Matchbox learning
bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Matchbox learning
hgm
- Posts: 28353
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Matchbox learning
It is well known that even humans cannot properly learn if you expose them to non-representative learning material. If an infant grows up in a house with only a single book, he will not recognize other books as books when asked to take all the books from a bookshelf. The situation you sketch seems merely to be an example of that.
A proper learning algorithm would not draw conclusions from a single data item, no matter how often it is repeated. If every time it played move A in position P it won, because the opponent always replied with move B, it _might_ be justified in concluding that move B is a bad one. (Depending on whether there is sufficient diversity after B. But even then there is always the alternative conclusion that it was playing an idiot opponent, against whom it would win no matter what opening moves were played...)
But it should _never_ draw the conclusion that move A is a good one. It only has some indication of that if move A seems to work against a wide variety of replies, against opponents that are known to offer serious resistance (e.g. because you also lost sufficiently many games against these opponents, or because you know their rating).
Any learning algorithm that does not recognize this elementary principle is flawed. Learning is accumulation of evidence. If you see evidence where there is none, and accumulate it, it cannot possibly lead to learning. It will at best be an elaborate form of randomizing.
sje
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Re: Matchbox learning
Let us remember back to the 1980s and the tournaments among dedicated commercial chess computers. A clever author would analyze a competitor's machine's opening book via matchbox learning, performed possibly by automated play or even by ROM reverse engineering. And this worked! Well, it worked until the competitor revised the book.
It's a different story today, when just about all of extant GM/IM praxis can be kept on a US$20 USB thumb drive with room to spare for plenty of games played by strong computers. It's easy to amass millions of games to build a jumbo opening book. But is this any better than a small book built exclusively (and with frequent updates to provide feedback) from games played by the program itself?
The purpose of an opening repertoire is to get the player to a playable middlegame while avoiding (and inflicting) opening traps. A matchbox book should be able to do this, and do it more efficiently and reliably than just munging together five million GM games.
bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Matchbox learning
OK, so _all_ current learning algorithms are flawed? What is your point? I don't believe I have ever said anything to the contrary. I just pointed out that it is easy to dupe a learning program, if you want to...

hgm wrote: It is well known that even humans cannot properly learn if you expose them to non-representative learning material. [...]
When you develop a learning algorithm _without_ this flaw, I'll be interested. Until then, I've done what I can do, as have others. We don't claim that what we do is perfect. We do claim that what we do works in the environment we developed it for. Once you actually develop some sort of learning algorithm, then perhaps you might be qualified to offer opinions on what is good and bad, but until you actually do something in this area, pointing out obvious flaws doesn't earn any brownie points since it offers zero new information.
My "learning" in crafty was developed to address one specific problem, "cooked books". It works perfectly to solve that problem. It also happens to solve the problem of not losing the same game the same way multiple times. It was not designed to be an AI-type learning algorithm that discovers "new knowledge" as it learns...
hgm
- Posts: 28353
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Matchbox learning
Well, if you say so... I haven't studied any learning algorithms of chess engines beyond what I read here in this thread. If they all conclude from move A winning after a B reply that A is good, then yes, they are all flawed. So we agree on that. Please bear in mind it is not my sole calling in life to disagree with you.

bob wrote: OK, so _all_ current learning algorithms are flawed? What is your point? [...]

My point is most succinctly summarized by "if you teach it nonsense, you should not be surprised it behaves badly".
To prevent the flaw, I would do exactly as I wrote above: in back-propagating the game result towards the root, take in each node the evidence, or lack of evidence, into account on all moves leading from the node, minimax fashion. In particular, empirical evidence on the quality of a move should only affect the score of that node (and hence of the moves leading to the node) if there is reasonable evidence that that move was indeed the best move from the node. For this, the move should have been played by at least several different serious opponents.
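In code, this gated back-propagation might look like the sketch below. The node layout, the MIN_OPPONENTS threshold, and the neutral 0.5 prior are illustrative assumptions, not code from any actual engine.

Code: Select all

#include <algorithm>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical book-tree node. Scores are expected results in [0,1] from
// the point of view of the side to move at the node.
struct BookNode {
    std::vector<BookNode*> children; // positions after each move tried from here
    std::set<int> opponentIds;       // distinct opponents who played the move into this node
    double empiricalScore = 0.5;     // observed mean result at leaf nodes
};

const std::size_t MIN_OPPONENTS = 3; // "several different serious opponents"

// Minimax back-propagation with an evidence gate: a move's good empirical
// result is capped at neutral unless enough distinct opponents stand behind
// the data. Bad results still propagate, so a line can be learned as bad on
// thin evidence, but never learned as good from one cooperating opponent.
double propagate(BookNode *node) {
    if (node->children.empty())
        return node->empiricalScore;
    double best = 0.0;
    for (BookNode *child : node->children) {
        double forMover = 1.0 - propagate(child); // child's score is the opponent's view
        if (child->opponentIds.size() < MIN_OPPONENTS)
            forMover = std::min(forMover, 0.5);   // unproven success earns no credit
        best = std::max(best, forMover);
    }
    return best; // the side to move picks the best sufficiently proven move
}

Capping unproven results at neutral, rather than zeroing them, implements the asymmetry above: distrust success you cannot verify, but keep counting failures.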
bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Matchbox learning
Again, the above doesn't work, because you describe exactly what I do already. But I "learn" in two ways...

hgm wrote: [...] To prevent the flaw, I would do exactly as I wrote above: in back-propagating the game result towards the root, take in each node the evidence, or lack of evidence, into account on all moves leading from the node, minimax fashion. [...]
(1) When I leave book, I watch the evaluation over the next 10 moves to figure out what is happening. If it starts out good but drops steadily, this is probably a gambit that should not be accepted. If it starts out bad but climbs steadily, this is probably a gambit I played that works well. If the score just sucks right out of book, the move is probably bad; if the score is good and stays there, it is probably a good move. This gets adjusted and averaged over all the games played, and I factor in the rating of the opponent, since a weak player forcing a negative eval on me is much worse than a stronger player doing the same.
(2) If I lose a game, I back-propagate this through the opening tree to make certain I won't play this opening again in this "match". I don't learn "good results" directly, but a loss back-propagates as "good" for the other side, so it does learn both sides. (A rough sketch of both mechanisms follows.)
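A rough sketch of the two mechanisms just described; the trend formula, the 400-point rating scale, and every name here are assumptions for illustration, not Crafty's actual code.

Code: Select all

#include <cstdint>
#include <unordered_set>
#include <vector>

// (1) Judge a book exit by the evaluation trend over the next ~10 moves,
// not by the raw exit score alone. A steady fall after a good-looking exit
// suggests a gambit that should not have been accepted; a steady climb
// after a bad-looking exit suggests a gambit that works.
double bookExitLearnValue(const std::vector<double> &evalsAfterBook, // post-book evals, our view
                          int opponentRating, int ourRating) {
    if (evalsAfterBook.size() < 2) return 0.0;
    double trend = evalsAfterBook.back() - evalsAfterBook.front();
    double level = evalsAfterBook.back();
    double value = 0.5 * level + 0.5 * trend; // blend where we ended up with how we got there
    // A negative eval forced by a weaker opponent is worse news than the
    // same eval against a stronger one, so weight it more heavily.
    if (value < 0 && opponentRating < ourRating)
        value *= 1.0 + (ourRating - opponentRating) / 400.0;
    return value; // averaged into the line's learned score over all games
}

// (2) After a loss, flag every book position on the game's path so the
// same opening line cannot be repeated later in the same match or event.
void avoidLostLine(std::unordered_set<std::uint64_t> &blockedPositions,
                   const std::vector<std::uint64_t> &bookPathKeys) {
    blockedPositions.insert(bookPathKeys.begin(), bookPathKeys.end());
}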
All of this works flawlessly in matches or tournaments, but it can be manipulated if someone wants to. In my case, manipulation is impossible if your goal is to trick me into playing bad openings by letting me win and then springing the "real surprise" on me in a tournament, because I don't use learned results in tournaments. I use a book that has been pre-tested to eliminate the ugly stuff, so I only use learning during the event to avoid repetitive losses every other round to the same opening, something that has happened to many programs in the past...
I don't see a viable way to detect manipulation, other than to use something like crafty's "annotate" facility on each game played to see if something odd happened. And even that fails if the opponent is subtle enough to play bad moves that are only detectable as bad with _very_ deep searches, which would make the "annotate" run take too much time to be useful after each and every game...
Re: Matchbox learning
If my score jumps after your move, I shouldn't trust it.

bob wrote: How would you notice this? Your only choice would be to play the game out with the move I played, and then again with the move you think is better. Takes too long... [...]
Tony
bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Matchbox learning
??

Tony wrote: If my score jumps after your move, I shouldn't trust it. [...]
You have no score while in book, which is what we are talking about here... So how can you determine whether it jumped or not? And you certainly cannot tell whether a +1.5 can be trusted; you might have accepted a gambit that gets you killed in another 10-20 moves.
Re: Matchbox learning
My mistake. I thought it was about adding moves and concluded that you were not in the book and thus had an eval.
Tony
sje
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Re: Matchbox learning
I've been testing the matchbox book learning on FICS along with a larger transposition table (128 million entries) and the 64-bit code conversion. So far, the program has achieved personal-record results on FICS:

Code: Select all

             rating    RD   win  loss  draw  total  best
Blitz          2359  56.2  1029   401    98   1528  2359 (26-Mar-2008)
Standard       2437  59.2   815   379    98   1292  2491 (26-Mar-2008)
Lightning      2477  61.4   195    72    17    284  2477 (26-Mar-2008)

The results are not quite as good as they seem, as a number of wins were due to under-booked opponents: Symbolic would learn lines up to 64 ply deep, and by playing its half of them near instantly, the opponent would wind up with a significant time disadvantage. The effect is most pronounced in lightning (one-minute) games, and so Symbolic has made it to number six on the FICS lightning ranking chart.
I'll be adding another four GB of RAM to the battle machine as soon as I can get out to pick up the chips; I've been stuck inside recovering from a little outpatient surgery. (Expensive it was, but at least I get some marvelous pain killers to take home.) I'll likely take the main machine to 16 GB total within a month; the secondary machine has only 1 GB and can't be improved beyond 3 GB; it's the one currently playing on ICC.