Artificial stupidity - making a program play badly

Discussion of chess software programming and technical issues.

Moderator: Ras

mjlef
Posts: 1494
Joined: Thu Mar 30, 2006 2:08 pm

Re: Artificial stupidity - making a program play badly

Post by mjlef »

I went through the same thing with Zillions and earlier chess programs. What I settled on is limiting search depth (using autoplay to determine a rating for a 1-ply search, a 2-ply search, etc.). For even worse play, randomly toss out moves -- don't score them based on how likely you think a human would be to overlook them... just toss x% of moves with a 1-ply search. You can then use autoplay to score that as well. People overlook moves all the time... even strong players miss mate in 1 sometimes.
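A minimal sketch of the move-tossing idea, assuming a generic Move type, a percentage knob, and rand() seeded elsewhere (none of this is any particular engine's actual code):

#include <stdlib.h>

typedef int Move;  /* placeholder move encoding */

/* Keep each move with probability (100 - toss_percent)%.
   If everything was tossed, put one random move back so the
   engine always has something to play. Returns the new length. */
int toss_moves(Move *moves, int count, int toss_percent)
{
    int kept = 0;
    for (int i = 0; i < count; i++)
        if (rand() % 100 >= toss_percent)
            moves[kept++] = moves[i];
    if (kept == 0 && count > 0)
        moves[kept++] = moves[rand() % count];
    return kept;
}

You would then run the usual 1-ply search on the surviving moves, and calibrate toss_percent by autoplay exactly as described above.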
Tord Romstad
Posts: 1808
Joined: Wed Mar 08, 2006 9:19 pm
Location: Oslo, Norway

Re: Artificial stupidity - making a program play badly

Post by Tord Romstad »

Dann Corbit wrote:How about:

if (rand() % 17 == 0) eval = -eval;

I guess it will uncork a real funny one once in a while.
I didn't bother to try it.
Perhaps worth trying, but I suspect the games won't look very realistic.
Colin put a lot of effort into Beowulf to make it play at different levels. Have you looked at what he did?
No, I haven't. My current code is (like so much else in Glaurung) mostly inspired by Phalanx. I'll have a look at Beowulf, too. Thanks for the suggestion.

Tord
Tord Romstad
Posts: 1808
Joined: Wed Mar 08, 2006 9:19 pm
Location: Oslo, Norway

Re: Artificial stupidity - making a program play badly

Post by Tord Romstad »

Aleks Peshkov wrote:I think that a 1-ply search with the usual extensions and quiescence search can be a good approximation of a human in blitz.
I'm beginning to think the same, but I wish I could find something that I could vary more continuously. I want an adjustable Elo setting, and limiting the search to a single ply is just an on/off thing. That's one of the reasons I liked the idea of an adjustable blunder frequency.
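For what it's worth, the continuous knob could be as simple as a mapping from target Elo to a blunder probability. A sketch, with anchor points invented purely for illustration (they would have to be calibrated by autoplay or online games):

/* Map a target Elo to a per-move blunder probability.
   The endpoints (1000 -> 25%, 2200 -> 0%) are made up. */
double blunder_probability(int target_elo)
{
    if (target_elo >= 2200) return 0.0;
    if (target_elo <= 1000) return 0.25;
    return 0.25 * (2200 - target_elo) / 1200.0;
}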

Tord
Tord Romstad
Posts: 1808
Joined: Wed Mar 08, 2006 9:19 pm
Location: Oslo, Norway

Re: Artificial stupidity - making a program play badly

Post by Tord Romstad »

hgm wrote:
Tord Romstad wrote:Underpromotions and moves with negative SEE value are pruned with very high probability, long diagonal moves also have quite high probability, obvious recaptures have very low probability of being pruned, and so on. Finally, the frequency of pruning of course depends on the playing strength.
Pruning apparently bad moves will probably not have much effect against humans, as they prune those moves too. So you make the program more human-like by it, and vulnerable against full-width searchers which do find non-obvious tactics, but you don't weaken it against humans.
I don't prune only apparently bad moves, the probability is just a bit bigger for them. Any move can in principle be pruned as early as ply 2, and at the lowest level, the probability is quite big even for obvious recaptures.
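In outline, the category-dependent probabilities might look something like this (a sketch; the helper names and the permille numbers are invented, not Glaurung's actual values):

typedef int Move;  /* placeholder move encoding */

/* hypothetical classification helpers */
int is_underpromotion(Move m);
int has_negative_see(Move m);
int is_long_diagonal_move(Move m);
int is_obvious_recapture(Move m);

/* Per-move "overlook" probability in permille, scaled so that
   a higher strength setting (0..100) prunes less often. */
int prune_permille(Move m, int strength)
{
    int p;
    if (is_underpromotion(m))          p = 900;
    else if (has_negative_see(m))      p = 800;
    else if (is_long_diagonal_move(m)) p = 400;
    else if (is_obvious_recapture(m))  p = 20;
    else                               p = 150;
    return p * (100 - strength) / 100;
}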
I think it is very important that your prunings are correlated. If you decide independently at each node whether a move is pruned, the same move will be pruned in some branches but not in others, and the search will simply seek out the branches where it is not pruned. And search is very clever at that.
This is a very good point. Thanks for mentioning it.
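One way to get that correlation: hash the move together with a per-search seed instead of calling rand() at each node, so a given move is either "overlooked" everywhere in the tree or nowhere. A sketch that combines naturally with a permille probability like the one above (the mixing constants are arbitrary, and an integer move encoding is assumed):

#include <stdint.h>

/* Deterministic per-search "overlook" test. Since the result
   depends only on the move and the seed, the decision is the
   same in every branch, and the search cannot steer around it. */
int move_is_overlooked(uint32_t move, uint32_t search_seed,
                       uint32_t prune_permille)
{
    uint32_t h = move * 2654435761u ^ search_seed;
    h ^= h >> 16;
    return h % 1000 < prune_permille;
}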
See if you can tune it such that you manage to lose against NEG 0.3d! That is an engine that prunes every move! :lol:
Where can I find NEG? Unless it exists for Linux or OS X, I'm afraid I can't use it.

Tord
User avatar
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: Artificial stupidity - making a program play badly

Post by Zach Wegner »

Tord Romstad wrote: Trust me -- you wouldn't say that if you had tried to lose against 1900 rated humans on the ICC.
:wink:
My main concern was trying to play _any_ humans. Perhaps because I don't have an established rating, I could only really get computers to play ZCT.
It feels quite absurd to watch the games while they are played. I am nervous and excited, and always cheer for the opponent and hope my program will lose. Ultimately, I almost always end up disappointed. Here's a typical scenario:
  1. Glaurung hangs a rook.
  2. The opponent thinks for a few seconds, while I wait nervously and hope he will notice the hanging rook.
  3. The opponent ignores the hanging rook, and makes an irrelevant move somewhere else on the board.
  4. Glaurung thinks for a while, while I follow its PV, noticing that it still hasn't realized its rook is hanging, and hoping that it won't discover it at the last moment.
  5. Phew. Glaurung decides to leave the rook hanging.
  6. My relief is short-lived, because the opponent moves instantly, leaving the rook untouched.
  7. Repeat of step 4.
  8. Repeat of step 5. There is still hope. :)
  9. The opponent thinks for a very long time. I'm beginning to feel sure that he has finally spotted the hanging rook, and is just making sure there is no trap before he takes it.
  10. He finally moves, doesn't take the rook, but instead walks into an instant back-rank-mate. Game over.
What this looks like to me is simply psychological perception. A typical human's view of a computer is a simple-minded tactical machine that would never hang a rook. If you were to somehow hide the fact that they are playing a computer, I bet they would take the rook instantly. Unfortunately, this is cheating on ICC. Maybe if you played a bunch of unrated games without logging in, and simply calculated the Elo afterwards, it would work better. Having your Elo displayed as 2000+ does not help either.

Having a (C) label makes it pretty hard to beat the Turing test...
I think my approach should simulate human play a bit better, but admittedly it doesn't seem very successful so far. In fact I do something slightly similar to what Mike Byrne does, just in a more complex way: Exactly how erroneous the search should be is decided by chance before every move.
I think your way is much more accurate, but doing both together might help. If human play is to be emulated well, I think the search must become _very_ selective, and in an even more general way (e.g. prune moves that are on the other side of the board from the last move). Maybe another thing would be to randomly return a qsearch score instead of searching deeper.
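The distance idea could look something like this, with the thresholds invented for illustration (square encoding a1 = 0 ... h8 = 63 is an assumption):

#include <stdlib.h>

/* Chebyshev distance between two squares. */
int square_distance(int s1, int s2)
{
    int df = abs((s1 & 7) - (s2 & 7));
    int dr = abs((s1 >> 3) - (s2 >> 3));
    return df > dr ? df : dr;
}

/* Prune a quiet move whose destination is far from the
   opponent's last move, with probability growing with distance. */
int prune_far_move(int to_sq, int last_to_sq)
{
    int d = square_distance(to_sq, last_to_sq);
    return d >= 4 && rand() % 8 < d;  /* e.g. d = 7 -> 7/8 chance */
}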
That's some relief. Glaurung's ICC rating when playing at an Elo setting of 1000 seems to have stabilized around 2100. Right now, it's 2094, after 44 games (+33,-10,=1). It has a lost position against a 1400 rated player in the currently running game, but I have lost all faith in human players and expect Glaurung to win in the end.
Not too bad. That's about what ZCT's rating is from playing computers. (Though we are aiming for opposite ends of the spectrum. :))
Tord Romstad
Posts: 1808
Joined: Wed Mar 08, 2006 9:19 pm
Location: Oslo, Norway

Re: Artificial stupidity - making a program play badly

Post by Tord Romstad »

mjlef wrote:I went through the same thing with Zillions and earlier chess programs. What I settled on is limiting search depth (using autoplay to determine a rating for a 1-ply search, a 2-ply search, etc.).
That's similar to what Aleks suggested. It should work, but I'd like something more continuous. The difference in strength between a 1 ply search and a 2 ply search is probably huge. Another disadvantage is that the ratings would have to be calibrated again for each new time control. A 1 ply search at blitz will obviously do much better against humans than a 1 ply search at a tournament time control.
For even worse play, randomly toss out moves -- don't score them based on how likely you think a human would be to overlook them... just toss x% of moves with a 1-ply search. You can then use autoplay to score that as well. People overlook moves all the time... even strong players miss mate in 1 sometimes.
Glaurung never misses a mate in 1, even at the lowest level. Perhaps that alone is worth a considerable number of Elo points?
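A guard along these lines keeps that property no matter how aggressive the random pruning gets (gives_checkmate() and the types are hypothetical placeholders, not Glaurung's actual code):

typedef int Move;  /* placeholder move encoding */
struct Position;
int gives_checkmate(const struct Position *pos, Move m);  /* hypothetical */

/* After the weakened search has picked its (possibly bad) move,
   override it if any legal move mates on the spot. */
Move mate_in_one_guard(const struct Position *pos, Move chosen,
                       const Move *moves, int count)
{
    for (int i = 0; i < count; i++)
        if (gives_checkmate(pos, moves[i]))
            return moves[i];
    return chosen;
}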

Tord
Tord Romstad
Posts: 1808
Joined: Wed Mar 08, 2006 9:19 pm
Location: Oslo, Norway

Re: Artificial stupidity - making a program play badly

Post by Tord Romstad »

Zach Wegner wrote:
Tord Romstad wrote: Trust me -- you wouldn't say that if you had tried to lose against 1900 rated humans on the ICC.
:wink:
My main concern was trying to play _any_ humans. Perhaps because I don't have an established rating, I could only really get computers to play ZCT.
Strange. Glaurung instantly became extremely popular among human players after I registered last night. There are rarely more than a few seconds of idle time between games. Perhaps having "Glaurung 080519, Elo = 1000" in the finger notes helps to attract players, but I doubt they are fooled when the program's ICC rating is around 2100. :)
It feels quite absurd to watch the games while they are played. I am nervous and excited, and always cheer for the opponent and hope my program will lose. Ultimately, I almost always end up disappointed. Here's a typical scenario:
  1. Glaurung hangs a rook.
  2. The opponent thinks for a few seconds, while I wait nervously and hope he will notice the hanging rook.
  3. The opponent ignores the hanging rook, and makes an irrelevant move somewhere else on the board.
  4. Glaurung thinks for a while, while I follow its PV, noticing that it still hasn't realized its rook is hanging, and hoping that it won't discover it at the last moment.
  5. Phew. Glaurung decides to leave the rook hanging.
  6. My relief is short-lived, because the opponent moves instantly, leaving the rook untouched.
  7. Repeat of step 4.
  8. Repeat of step 5. There is still hope. :)
  9. The opponent thinks for a very long time. I'm beginning to feel sure that he has finally spotted the hanging rook, and is just making sure there is no trap before he takes it.
  10. He finally moves, doesn't take the rook, but instead walks into an instant back-rank-mate. Game over.
What this looks like to me is simply psychological perception. A typical human's view of a computer is a simple-minded tactical machine that would never hang a rook. If you were to somehow hide the fact that they are playing a computer, I bet they would take the rook instantly. Unfortunately, this is cheating on ICC. Maybe if you played a bunch of unrated games without logging in, and simply calculated the Elo afterwards, it would work better. Having your Elo displayed as 2000+ does not help either.

Perhaps you are right.
Having a (C) label makes it pretty hard to beat the Turing test...
Perhaps I could try to lie and write something in my finger notes about having been incorrectly and unjustly classified as a (C)heater? :wink:
I think my approach should simulate human play a bit better, but admittedly it doesn't seem very successful so far. In fact I do something slightly similar to what Mike Byrne does, just in a more complex way: Exactly how erroneous the search should be is decided by chance before every move.
I think your way is much more accurate, but doing both together might help. If human play is to be emulated well, I think the search must become _very_ selective, and in an even more general way (e.g. prune moves that are on the other side of the board from the last move). Maybe another thing would be to randomly return a qsearch score instead of searching deeper.
Perhaps worth trying.
That's some relief. Glaurung's ICC rating when playing at an Elo setting of 1000 seems to have stabilized around 2100. Right now, it's 2094, after 44 games (+33,-10,=1). It has a lost position against a 1400 rated player in the currently running game, but I have lost all faith in human players and expect Glaurung to win in the end.
Not too bad. That's about what ZCT's rating is from playing computers. (Though we are aiming for opposite ends of the spectrum. :))
I suspect that getting high ratings by playing against humans is far easier.

By the way, the 1400 rated player did manage to win, and briefly restored my faith in humanity, until Glaurung won against a 2220 rated player in the very next game.

Tord
Dann Corbit
Posts: 12777
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Artificial stupidity - making a program play badly

Post by Dann Corbit »

Tord Romstad wrote:
Dann Corbit wrote:How about:

if (rand() % 17 == 0) eval = -eval;

I guess it will uncork a real funny one once in a while.
I didn't bother to try it.
Perhaps worth trying, but I suspect the games won't look very realistic.
It was a little tongue-in-cheek, but I think a modification like that might be useful so that younger players can have fun with it. I don't think the effect will always be that noticeable, since 94% of the evals will be fine.
On the other hand, it seems an interesting experiment.

Younger players will sometimes plant a queen in front of an enemy pawn, so in that sense I think it has a certain realism anyway.
Colin put a lot of effort into Beowulf to make it play at different levels. Have you looked at what he did?
No, I haven't. My current code is (like so much else in Glaurung) mostly inspired by Phalanx. I'll have a look at Beowulf, too. Thanks for the suggestion.

Tord
User avatar
Roman Hartmann
Posts: 295
Joined: Wed Mar 08, 2006 8:29 pm

Re: Artificial stupidity - making a program play badly

Post by Roman Hartmann »

Well, I'm really not enough of a chess player to judge how human-like the play without a QS really is. At least some players stronger than me have claimed that they like to play against roce because it plays human-like.
I always assumed that this was due to the missing QS and the resulting rather planless play when there is no tactical shot on the board.

Like a human beginner, an engine which lacks a QS will shuffle the queen around the board chasing pawns and pieces. Maybe that's unwanted.

Anyway, the problem in making an engine play human-like is how to weaken it tactically, because that's where humans usually have problems: not seeing a pin or a fork 2 or 3 plies ahead. But tactics are where engines really shine, so I don't think there is an easy way to make it behave completely human-like.
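To illustrate the QS point, here is the leaf of a plain alpha-beta search with and without quiescence (evaluate() and qsearch() are generic placeholders):

struct Position;
int evaluate(struct Position *pos);                      /* static eval */
int qsearch(struct Position *pos, int alpha, int beta);  /* captures only */

int search(struct Position *pos, int alpha, int beta, int depth)
{
    if (depth == 0) {
#ifdef USE_QSEARCH
        return qsearch(pos, alpha, beta);  /* let captures settle first */
#else
        return evaluate(pos);  /* raw eval: the horizon effect in full */
#endif
    }
    /* ... normal alpha-beta recursion over all moves ... */
    return alpha;
}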

best regards
Roman
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Artificial stupidity - making a program play badly

Post by bob »

Tord Romstad wrote:The last few days, I've been working on the most important missing feature in my chess program: Adjustable playing strength. Strange though it might seem, making my program play very badly is by far the most difficult and frustrating thing I have attempted to do in computer chess, and I am now close to giving up in despair.

At ratings above 2200, I achieve limited strength simply by reducing the speed of calculation. This works fairly well, as one would expect. Below 2200, I try to emulate typical human blunders and tactical mistakes. This is where the problems begin. My approach seems very reasonable to me: I just prune random moves everywhere in the tree, and the probability that a move is pruned depends on how hard the move would be for a human player to see. Underpromotions and moves with negative SEE value are pruned with very high probability, long diagonal moves also have quite high probability, obvious recaptures have very low probability of being pruned, and so on. Finally, the frequency of pruning of course depends on the playing strength.

Tuning this turned out to be much trickier than I thought. I used TSCP as my sparring partner. The simple task of adjusting the blunder frequency so that my program scored somewhere above 0% and below 100% took a lot of time. After days of work, I finally began to hit close to the mark. I managed to find various settings which scored around 10%, 25%, 50%, 75% and 90% against TSCP. I was also quite pleased with the look of the games: Glaurung played positionally stronger than TSCP, but lost by making quite human-looking blunders. Many of the games looked almost like I would expect a game between TSCP and a similarly rated human to look.

Proud and happy with my work, I started an account on the ICC last night, in order to test against human players. I started with the settings which scored 50% against TSCP, which I thought (based on the WBEC ratings) should correspond to a strength of around 1700. At this level, the program plays positionally ugly chess and makes plenty of tactical blunders, but rarely hangs a piece or fails to capture a hanging piece. The result was terribly disappointing: Glaurung played about a dozen games against players rated around 1900-2100, and won all of them except for a single draw. Apparently, 2000 rated players on the ICC make elementary tactical blunders all the time.

I then adjusted the rating down to 1300, and tried again. At this level, the program drops a piece about once or twice per game, on average (at blitz time controls). It turned out that this hardly made any difference: Glaurung still scored close to 100%. Glaurung was frequently hanging pieces, but half the time the human opponents didn't see it, and half the time they quickly paid back the favor by blundering a piece themselves. With a blitz rating of around 2200, I gave up in disgust, logged off and went to bed.

Today, I logged on with the strength set to 1000 -- the lowest implemented level, which scores 0% against TSCP. Glaurung makes several horrible blunders in every single game. It is painful to watch, and it is difficult to imagine how it is possible to play much weaker without playing completely random moves. To my immense frustration, Glaurung still wins most of its games. The current blitz rating, after 37 games, is 2098.

How is this possible? TSCP is rated around 1700, and even when I make my program weak enough to lose every single game against TSCP, it still wins easily against most human players on the ICC. Are the ICC ratings 1000 points too high, or something? How can I manage to lose to average human players without playing completely random moves?

I'm not sure what the purpose of this post is, apart from venting my frustration, but any advice about how to achieve weak, but realistic-looking play by a computer program would be welcome.

Tord
I have added a "skill" feature to Crafty 22.1, which can be compiled in if desired.

It is a number between 1 and 100, where 100 is normal, and anything lower represents a degraded level of play...

Let's take skill 50 as an example. The first thing this does is reduce all extensions and reductions by 50%, which reduces tactical skill. This includes things like check extensions, null-move and LMR-type reductions, etc. It then reduces the evaluation numbers by 50% as well, and adds in a random value that is 50% of the value of a pawn.
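In rough terms, the eval side of that might be sketched like this (an illustration of the description above, not Crafty's actual code; PAWN_VALUE is a placeholder):

#include <stdlib.h>

#define PAWN_VALUE 100  /* placeholder pawn score */

/* Scale the real eval by skill% and mix in noise worth up to
   (100 - skill)% of a pawn. skill runs 1..100. */
int skill_adjust_eval(int eval, int skill)
{
    int noise = rand() % (PAWN_VALUE + 1);
    return eval * skill / 100 + noise * (100 - skill) / 100;
}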

I then ran a ton of cluster games to calibrate what skill does. Roughly, skill 70 reduces the playing level by about 200 Elo, and skill 50 by about 400 Elo. By the time you get down to 10 or so, it is still playing real chess, but with almost no sense of material or positional values... Note that a random eval still plays something that looks like chess, because it is a poor man's way of maintaining a mobility evaluation term...

There are other ideas left to try.