Artificial stupidity - making a program play badly
-
- Posts: 1494
- Joined: Thu Mar 30, 2006 2:08 pm
Re: Artificial stupidity - making a program play badly
I went through the same thing with Zillions and earlier chess programs. What I settled on is to limit search depth (using auto play to determine a rating for a 1 ply search, 2 ply search, and so on). For even worse play, randomly toss out moves---do not score them based on how likely you think a human would be to overlook them...just toss x% of moves with a 1 ply search. You can then use autoplay to score that as well. People overlook moves all the time...even strong players miss mate in 1 sometimes.
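To make the move-tossing idea concrete, here is a minimal sketch in C. The MoveList type, the 256-move bound and the drop_probability parameter are invented for the example; a real engine would hook something like this into its own move generator before running the fixed-depth search.

#include <stdlib.h>

typedef struct {
    int moves[256];   /* whatever move encoding the engine uses */
    int count;
} MoveList;

/* Randomly discard roughly drop_probability of the generated moves
   before handing the list to a shallow (e.g. 1 ply) search. */
void toss_random_moves(MoveList *list, double drop_probability)
{
    int kept = 0;
    for (int i = 0; i < list->count; i++) {
        if ((double)rand() / RAND_MAX >= drop_probability)
            list->moves[kept++] = list->moves[i];
    }
    if (kept == 0)        /* never throw away every legal move */
        kept = 1;
    list->count = kept;
}

Autoplay matches at different drop probabilities could then be used to attach an approximate rating to each setting, as described above.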
-
- Posts: 1808
- Joined: Wed Mar 08, 2006 9:19 pm
- Location: Oslo, Norway
Re: Artificial stupidity - making a program play badly
Dann Corbit wrote:
How about:
if (rand() % 17 == 0) eval = -eval;
I guess it will uncork a real funny one once in a while.
I didn't bother to try it.

Perhaps worth trying, but I suspect the games won't look very realistic.
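For completeness, here is the suggested one-liner placed in a self-contained wrapper; evaluate_normal() stands in for whatever evaluation function the engine already has, and the 1/17 constant is taken directly from the quote.

#include <stdlib.h>

extern int evaluate_normal(void);   /* the engine's usual static evaluation */

/* With probability 1/17 (about 6%), return the evaluation with its sign
   flipped, as suggested above. */
int evaluate_with_occasional_blunder(void)
{
    int eval = evaluate_normal();
    if (rand() % 17 == 0)
        eval = -eval;
    return eval;
}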
Dann Corbit wrote:
Colin put a lot of effort into Beowulf to make it play at different levels. Have you looked at what he did?

No, I haven't. My current code is (like so much else in Glaurung) mostly inspired by Phalanx. I'll have a look at Beowulf, too. Thanks for the suggestion.
Tord
-
- Posts: 1808
- Joined: Wed Mar 08, 2006 9:19 pm
- Location: Oslo, Norway
Re: Artificial stupidity - making a program play badly
Aleks Peshkov wrote:
I think that a 1-ply search with the usual extensions and quiescence search can be a good approximation of a human in blitz.

I'm beginning to think the same, but I wish I could find something that I could vary more continuously. I want an adjustable Elo setting, and limiting the search to a single ply is just an on/off thing. That's one of the reasons I liked the idea of an adjustable blunder frequency.
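One way to get a knob that varies continuously, rather than jumping from one search depth to the next, is to map the target Elo to a blunder (pruning) probability and interpolate between a few calibrated points. The anchor values below are pure placeholders; they would have to be calibrated by autoplay or ICC testing, as discussed elsewhere in the thread.

/* Hypothetical Elo-to-blunder-probability mapping with linear interpolation
   between calibration points.  All the numbers here are invented. */
double blunder_probability(int target_elo)
{
    static const int    elo[]  = { 1000, 1400, 1800, 2200 };
    static const double prob[] = { 0.40, 0.20, 0.08, 0.00 };
    const int n = 4;

    if (target_elo <= elo[0])
        return prob[0];
    if (target_elo >= elo[n - 1])
        return prob[n - 1];
    for (int i = 0; i < n - 1; i++) {
        if (target_elo <= elo[i + 1]) {
            double t = (double)(target_elo - elo[i]) / (elo[i + 1] - elo[i]);
            return prob[i] + t * (prob[i + 1] - prob[i]);
        }
    }
    return 0.0;   /* unreachable */
}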
Tord
-
- Posts: 1808
- Joined: Wed Mar 08, 2006 9:19 pm
- Location: Oslo, Norway
Re: Artificial stupidity - making a program play badly
Tord Romstad wrote:
Underpromotions and moves with negative SEE value are pruned with very high probability, long diagonal moves also have quite high probability, obvious recaptures have very low probability of being pruned, and so on. Finally, the frequency of pruning of course depends on the playing strength.

hgm wrote:
Pruning apparently bad moves will probably not have much effect against humans, as they prune those moves too. So you make the program more human-like by it, and vulnerable against full-width searchers which do find non-obvious tactics, but you don't weaken it against humans.

I don't prune only apparently bad moves; the probability is just a bit bigger for them. Any move can in principle be pruned as early as ply 2, and at the lowest level, the probability is quite big even for obvious recaptures.
hgm wrote:
I think it is very important that your prunings are correlated. If you independently decide whether the same move will be pruned in some branches, but not in others, the search will simply seek out the branches where it is not pruned. And search is very clever at that.

This is a very good point. Thanks for mentioning it.
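A simple way to get correlated prunings is to make the decision a deterministic function of the move itself (plus a per-game seed), so that a move which is "overlooked" is overlooked in every branch of the current search. This is only a sketch of the idea, not Glaurung's actual code; mix64(), game_seed and the integer move encoding are placeholders.

#include <stdint.h>

static uint64_t game_seed;   /* re-randomized once per game (or per root move) */

static uint64_t mix64(uint64_t x)   /* cheap integer hash */
{
    x ^= x >> 33;  x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33;  x *= 0xc4ceb9fe1a85ec53ULL;
    x ^= x >> 33;
    return x;
}

/* prune_probability would depend on the move type (underpromotion, negative
   SEE, long diagonal move, obvious recapture, ...) and on the target
   strength, as described above.  The same move always gets the same verdict
   for a given seed, so the search cannot sneak around the pruning. */
int move_is_overlooked(int move, double prune_probability)
{
    uint64_t h = mix64(game_seed ^ (uint64_t)(unsigned)move);
    double u = (double)(h >> 11) / (double)(1ULL << 53);   /* uniform in [0,1) */
    return u < prune_probability;
}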
hgm wrote:
Try if you can tune it such that you manage to lose from NEG 0.3d! That is an engine that prunes every move!

Where can I find NEG? Unless it exists for Linux or OS X, I'm afraid I can't use it.
Tord
-
- Posts: 1922
- Joined: Thu Mar 09, 2006 12:51 am
- Location: Earth
Re: Artificial stupidity - making a program play badly
Tord Romstad wrote:
Trust me -- you wouldn't say that if you had tried to lose against 1900 rated humans on the ICC.

My main concern was trying to play _any_ humans. Perhaps because I don't have an established rating, I could only really get computers to play ZCT.
Tord Romstad wrote:
It feels quite absurd to watch the games while they are played. I am nervous and excited, and always cheer for the opponent and hope my program will lose. Ultimately, I almost always end up disappointed. Here's a typical scenario:
1. Glaurung hangs a rook.
2. The opponent thinks for a few seconds, while I wait nervously and hope he will notice the hanging rook.
3. The opponent ignores the hanging rook, and makes an irrelevant move somewhere else on the board.
4. Glaurung thinks for a while, while I follow its PV, noticing that it still hasn't realized its rook is hanging, and hoping that it won't discover it at the last moment.
5. Phew. Glaurung decides to leave the rook hanging.
6. My relief is short-lived, because the opponent moves instantly, leaving the rook untouched.
7. Repeat of step 4.
8. Repeat of step 5. There is still hope.
9. The opponent thinks for a very long time. I'm beginning to feel sure he has finally spotted the hanging rook, and is just making sure there is no trap before he takes it.
10. He finally moves, doesn't take the rook, but instead walks into an instant back-rank mate. Game over.

What this looks like to me is simply psychological perception. A typical human's view of a computer is a simple-minded tactical machine that would never hang a rook. If you were to somehow hide the fact that they are playing a computer, I bet they would take the rook instantly. Unfortunately, this is cheating on ICC. Maybe if you played a bunch of unrated games without logging in, and simply calculated the Elo afterwards, it would work better. Having your Elo displayed as 2000+ does not help either.
Having a (C) label makes it pretty hard to beat the Turing test...
Tord Romstad wrote:
I think my approach should simulate human play a bit better, but admittedly it doesn't seem very successful so far. In fact I do something slightly similar to what Mike Byrne does, just in a more complex way: exactly how erroneous the search should be is decided by chance before every move.

I think your way is much more accurate, but doing both together might help. If human play is to be emulated well, I think the search must become _very_ selective, and in an even more general way (e.g. prune moves that are on the other side of the board from the last move). Maybe another thing would be to randomly return a qsearch score instead of searching deeper.
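Both suggestions can be sketched in a few lines of C. The square encoding (0..63), the 3-square "focus" radius, and the qsearch()/full_search() hooks are all assumptions made for the example, not code from ZCT or Glaurung.

#include <stdlib.h>

extern int qsearch(int alpha, int beta);
extern int full_search(int alpha, int beta, int depth);

static int king_distance(int a, int b)   /* Chebyshev distance, squares 0..63 */
{
    int df = abs(a % 8 - b % 8);
    int dr = abs(a / 8 - b / 8);
    return df > dr ? df : dr;
}

/* "Prune moves on the other side of the board from the last move":
   flag moves whose origin and destination are both far from the square
   the opponent just moved to. */
int far_from_last_move(int from, int to, int last_move_to)
{
    return king_distance(from, last_move_to) > 3
        && king_distance(to, last_move_to) > 3;
}

/* "Randomly return a qsearch score instead of searching deeper":
   with probability lazy_probability, stop calculating at this node
   and trust the quiescence score. */
int lazy_search(int alpha, int beta, int depth, double lazy_probability)
{
    if (depth <= 0 || (double)rand() / RAND_MAX < lazy_probability)
        return qsearch(alpha, beta);
    return full_search(alpha, beta, depth);
}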
Tord Romstad wrote:
That's some relief. Glaurung's ICC rating when playing at an Elo setting of 1000 seems to have stabilized around 2100. Right now, it's 2094, after 44 games (+33, -10, =1). It has a lost position against a 1400 rated player in the currently running game, but I have lost all faith in human players and expect Glaurung to win in the end.

Not too bad. That's about what ZCT's rating is from playing computers. (Though we are aiming for opposite ends of the spectrum.)

-
- Posts: 1808
- Joined: Wed Mar 08, 2006 9:19 pm
- Location: Oslo, Norway
Re: Artificial stupidity - making a program play badly
mjlef wrote:
I went through the same thing with Zillions and earlier chess programs. What I settled on is to limit search depth (auto play to determine a rating for a 1 ply search, 2 ply, etc.).

That's similar to what Aleks suggested. It should work, but I'd like something more continuous. The difference in strength between a 1 ply search and a 2 ply search is probably huge. Another disadvantage is that the ratings would have to be calibrated again for each new time control. A 1 ply search at blitz will obviously do much better against humans than a 1 ply search at a tournament time control.
mjlef wrote:
For even worse play, randomly toss out moves---do not score them based on how likely you think a human would be to overlook it...just toss x% of moves with a 1 ply search. You can then use autoplay to score that as well. People overlook moves all the time...even strong players miss mate in 1 sometimes.

Glaurung never misses a mate in 1, even at the lowest level. Perhaps that alone is worth a considerable number of Elo points?
Tord
-
- Posts: 1808
- Joined: Wed Mar 08, 2006 9:19 pm
- Location: Oslo, Norway
Re: Artificial stupidity - making a program play badly
Zach Wegner wrote:
My main concern was trying to play _any_ humans. Perhaps because I don't have an established rating, I could only really get computers to play ZCT.

Strange. Glaurung instantly became extremely popular among human players after I registered yesterday night. There are rarely more than a few seconds of idle time between the games. Perhaps having "Glaurung 080519, Elo = 1000" in the finger notes helps to attract players, but I doubt they are fooled when the program's ICC rating is around 2100.

Zach Wegner wrote:
What this looks like to me is simply psychological perception. A typical human's view of a computer is a simple-minded tactical machine that would never hang a rook. If you were to somehow hide the fact that they are playing a computer, I bet they would take the rook instantly. Unfortunately, this is cheating on ICC. Maybe if you played a bunch of unrated games without logging in, and simply calculated the Elo afterwards, it would work better. Having your Elo displayed as 2000+ does not help either.
Perhaps you are right.
Zach Wegner wrote:
Having a (C) label makes it pretty hard to beat the Turing test...

Perhaps I could try to lie and write something in my finger notes about having been incorrectly and unjustly classified as a (C)heater?

Zach Wegner wrote:
I think your way is much more accurate, but doing both together might help. If human play is to be emulated well, I think the search must become _very_ selective, and in an even more general way (e.g. prune moves that are on the other side of the board from the last move). Maybe another thing would be to randomly return a qsearch score instead of searching deeper.

Perhaps worth trying.
Zach Wegner wrote:
Not too bad. That's about what ZCT's rating is from playing computers. (Though we are aiming for opposite ends of the spectrum.)

I suspect that getting high ratings by playing against humans is far easier.
By the way, the 1400 rated player did manage to win, and briefly restored my faith in humanity, until Glaurung won against a 2220 rated player in the very next game.
Tord
-
- Posts: 12777
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: Artificial stupidity - making a program play badly
Dann Corbit wrote:
How about:
if (rand() % 17 == 0) eval = -eval;
I guess it will uncork a real funny one once in a while.
I didn't bother to try it.

Tord Romstad wrote:
Perhaps worth trying, but I suspect the games won't look very realistic.

It was a little tongue in cheek, but I think a modification like that might be useful so that younger players can have fun with it. I don't think that the effect will always be that noticeable, since 94% of the evals will be fine.
On the other hand, it seems an interesting experiment.
Younger players will sometimes plant a queen in front of an enemy pawn, so in that sense I think it has a certain realism anyway.
Dann Corbit wrote:
Colin put a lot of effort into Beowulf to make it play at different levels. Have you looked at what he did?

Tord Romstad wrote:
No, I haven't. My current code is (like so much else in Glaurung) mostly inspired by Phalanx. I'll have a look at Beowulf, too. Thanks for the suggestion.
Tord
-
- Posts: 295
- Joined: Wed Mar 08, 2006 8:29 pm
Re: Artificial stupidity - making a program play badly
Well, I'm really not enough of a chess player to judge how human-like the play without a QS really is. At least some players stronger than me claimed that they like to play against roce because it plays human-like.
I always assumed that this would be due to the missing QS and the resulting rather planless play when there was no tactical shot on the board.
Like a human beginner, an engine which lacks a QS will shuffle its queen around the board chasing pawns and pieces. Maybe that's unwanted.
Anyway, the problem with making an engine play human-like is how to weaken it tactically, because that's where humans usually have problems: not seeing a pin or a fork 2 or 3 plies ahead. But tactics is where engines really shine, so I don't think there is an easy way to make it behave completely human-like.
best regards
Roman
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Artificial stupidity - making a program play badly
Tord Romstad wrote:
The last few days, I've been working on the most important missing feature in my chess program: adjustable playing strength. Strange though it might seem, making my program play very badly is by far the most difficult and frustrating thing I have attempted to do in computer chess, and I am now close to giving up in despair.
At ratings above 2200, I achieve limited strength simply by reducing the speed of calculation. This works fairly well, as one would expect. Below 2200, I try to emulate typical human blunders and tactical mistakes. This is where the problems begin. My approach seems very reasonable to me: I just prune random moves everywhere in the tree, and the probability that a move is pruned depends on how hard the move would be to see for a human player. Underpromotions and moves with negative SEE value are pruned with very high probability, long diagonal moves also have quite high probability, obvious recaptures have very low probability of being pruned, and so on. Finally, the frequency of pruning of course depends on the playing strength.
Tuning this turned out to be much trickier than I thought. I used TSCP as my sparring partner. The simple task of adjusting the blunder frequency so that my program scored somewhere above 0% and below 100% took a lot of time. After days of work, I finally began to hit close to the mark. I managed to find various settings which scored around 10%, 25%, 50%, 75% and 90% against TSCP. I was also quite pleased with the look of the games: Glaurung played positionally stronger than TSCP, but lost by making quite human-looking blunders. Many of the games looked almost like I would expect a game between TSCP and a similarly rated human to look.
Proud and happy with my work, I started an account on the ICC last night, in order to test against human players. I started with the settings which scored 50% against TSCP, which I thought (based on the WBEC ratings) should have a strength around 1700. At this level, the program plays positionally ugly chess, and makes plenty of tactical blunders, but rarely hangs a piece or fails to capture a hanging piece. The result was terribly disappointing: Glaurung played about a dozen games against players around 1900-2100, and won all games except for a single draw. Apparently, 2000 rated players on the ICC make elementary tactical blunders all the time.
I then adjusted the rating down to 1300, and tried again. At this level, the program drops a piece about once or twice per game, on average (at blitz time controls). It turned out that this hardly made any difference: Glaurung still scored close to 100%. Glaurung was frequently hanging pieces, but half the time the human opponents didn't see it, and half the time they quickly paid back the favor by blundering a piece themselves. With a blitz rating of around 2200, I gave up in disgust, logged off and went to bed.
Today, I logged on with the strength set to 1000 -- the lowest implemented level, which scores 0% against TSCP. Glaurung makes several horrible blunders in every single game. It is painful to watch, and it is difficult to imagine how it is possible to play much weaker without playing completely random moves. To my immense frustration, Glaurung still wins most of its games. The current blitz rating, after 37 games, is 2098.
How is this possible? TSCP is rated around 1700, and even when I make my program weak enough to lose every single game against TSCP, it still wins easily against most human players on the ICC. Are the ICC ratings 1000 points too high, or something? How do I manage to lose against average human players, without playing completely random moves?
I'm not sure what the purpose of this post is, apart from venting my frustration, but any advice about how to achieve weak, but realistic-looking play by a computer program would be welcome.
Tord

I have added a "skill" feature to Crafty 22.1 that can be compiled in if desired.
It is a number between 1 and 100, where 100 is normal, and anything lower represents a degraded level of play...
Let's take skill 50 as an example. The first thing this does is reduce all extensions and reductions by 50%, which reduces tactical skill. This includes things like check extensions, null-move and LMR-type reductions, etc. It then reduces the evaluation numbers by 50% as well, and adds in a random value that is 50% of the value of a pawn.
I then ran a ton of cluster games to sort of calibrate what skill does. Roughly, skill 70 reduces the playing level by about 200 Elo. Skill 50 reduces it by 400 Elo. By the time you get to 10 or so it is playing real chess, but playing with almost no sense of material or positional values... Note that a random eval still plays something that looks like chess, because it is a poor man's way of maintaining a mobility evaluation term...
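A rough sketch of the evaluation part of that scheme, in C. The exact scaling of the random term is an assumption here (it is taken to grow as (100 - skill)% of a pawn, which matches the skill-50 example above); this illustrates the idea and is not Crafty's actual source.

#include <stdlib.h>

#define PAWN_VALUE 100   /* centipawns */

/* skill runs from 1 (weakest) to 100 (normal play).  The same skill/100
   factor would also be applied to extension and reduction amounts in the
   search, as described above. */
int apply_skill_to_eval(int eval, int skill)
{
    if (skill >= 100)
        return eval;
    int noise = rand() % (PAWN_VALUE + 1);   /* random value up to one pawn */
    return (skill * eval) / 100 + ((100 - skill) * noise) / 100;
}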
There are other ideas left to try.