how to track down a BUG?

elcabesa
Posts: 855
Joined: Sun May 23, 2010 1:32 pm

how to track down a BUG?

Post by elcabesa »

I have just found my engine has some bug :)

If I give it this position:
6rk/2p2R1p/2b4P/p1p1qP2/2p1P3/2N1Q3/P3r3/K5R1 w - - 2 37

it tells me it has a sure win when it doesn't.
info depth 1 seldepth 8 score cp -29 nodes 120 time 7 nps 120000 pv 1g8 h8g8
info depth 2 seldepth 9 score cp -273 nodes 294 time 17 nps 21000 pv 1g8 h8g8
info depth 3 seldepth 12 score cp -36 nodes 718 time 32 nps 26592 pv 7g7 g8g7
info depth 4 seldepth 12 score cp -96 nodes 1664 time 49 nps 38697 pv 7g7 g8b8 e3g3 e2e4 g7c7 e5g3 g1g3
info depth 5 seldepth 11 score cp -9 nodes 2600 time 69 nps 40625 pv f7g7 g8b8 e3g3 e5g3 g1g3
info depth 6 seldepth 10 score cp -39 nodes 3551 time 84 nps 44949 pv f7g7 g8b8 e3g3 e5g3 g7g3 e2e4 c3e4 c6e4
info depth 7 seldepth 15 score cp -14 nodes 5313 time 102 nps 54773 pv f7g7 g8g7
info depth 8 seldepth 24 score cp -20 nodes 11347 time 125 nps 93776 pv f7g7 g8g7
info depth 9 seldepth 16 score cp -5 nodes 19519 time 151 nps 130126 pv f7g7 g8g7
info depth 10 seldepth 7 score cp 14 nodes 47935 time 221 nps 218881 pv f7g7 g8b8 e3g3 e5g3 g7g3 e2f2 g3g7 f2f4 g7e7 b8d8 a1b1
info depth 11 seldepth 21 score cp -130 nodes 137101 time 434 nps 317363 pv g1g7 g8g7
info depth 12 seldepth 22 score cp -74 nodes 227219 time 666 nps 342197 pv g1g7 g8g7
info depth 13 seldepth 25 score cp -159 nodes 519839 time 1356 nps 383928 pv g1g7 g8g7
info depth 14 seldepth 37 score cp 2241 nodes 1567388 time 3598 nps 436112 pv g1g7 g8g7
info depth 15 seldepth 32 score cp 2279 nodes 1597021 time 3675 nps 434800 pv g1g7 g8g7
info depth 16 seldepth 36 score cp 2495 nodes 1656100 time 3811 nps 434900 pv g1g7 g8g7
info depth 17 seldepth 35 score cp 16675 nodes 1741345 time 4014 nps 434142 pv g1g7 g8g7
info depth 18 seldepth 31 score cp 16675 nodes 1934119 time 4376 nps 442185 pv g1g7 g8g7
info depth 19 seldepth 32 score cp 16675 nodes 2343496 time 5055 nps 463691 pv g1g7 g8g7
info depth 20 seldepth 32 score cp 16675 nodes 4306941 time 7919 nps 544011 pv g1g7 g8g7
bestmove g1g7 ponder g8g7
I have just checked that perft up to depth 6 is correct, and it does not seem to be an eval error (I tested replacing the eval with material only).
Do you have any idea how to track down an error that only shows up at such a high depth?
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: how to track down a BUG?

Post by Sven »

1. Disable TT and check again
=> if the problem disappears then perhaps have a look at how you store/retrieve mate scores in/from the TT

2. Explain the meaning of the value "16675": is that a mate score, or at least close to it?

3. If it is not an eval bug and not a TT bug then it might be a search bug. Perhaps you disable certain features of the search step by step until the error disappears, e.g. nullmove, QS, ...

4. PVs of first 4 plies look dubious in your post (first letter 'g' resp. 'f' of first PV move is missing?!), what about this? Perhaps you have some memory corruption?
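
Point 1 refers to a common pitfall: a mate score usually encodes the distance to mate from the root, so it has to be adjusted by the current ply when it is stored in and read back from the TT. A minimal sketch of that adjustment follows, assuming a root-relative mate encoding. The constants `MATE` and `MAX_PLY` and both function names are illustrative assumptions, not taken from the engine discussed here.

```python
MATE = 32000                 # assumed score for "mate at the root"
MAX_PLY = 128                # assumed maximum search depth in plies
MATE_BOUND = MATE - MAX_PLY  # scores beyond this are treated as mate scores


def score_to_tt(score, ply):
    """Make a root-relative mate score node-relative before storing it."""
    if score >= MATE_BOUND:
        return score + ply
    if score <= -MATE_BOUND:
        return score - ply
    return score  # ordinary evaluations are stored unchanged


def score_from_tt(score, ply):
    """Make a stored node-relative mate score root-relative again."""
    if score >= MATE_BOUND:
        return score - ply
    if score <= -MATE_BOUND:
        return score + ply
    return score


# A mate found 3 plies below a node at ply 5, later probed at ply 9.
stored = score_to_tt(MATE - 8, 5)
print(stored, score_from_tt(stored, 9))
```

If the adjustment is missing, a mate distance measured along one path is reused unchanged on a path of a different length, which can produce mate scores that do not correspond to any real line.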

Sven
elcabesa
Posts: 855
Joined: Sun May 23, 2010 1:32 pm

Re: how to track down a BUG?

Post by elcabesa »

Sven Schüle wrote:1. Disable TT and check again

2. Explain the meaning of the value "16675": is that a mate score, or at least close to it?
Anything bigger than 16000 is a sure win; it's returned by a specialized endgame function. It's bigger than a normal eval but smaller than a mate-in-x-plies score.
3. If it is not an eval bug and not a TT bug then it might be a search bug. Perhaps you disable certain features of the search step by step until the error disappears, e.g. nullmove, QS, ...
It looks like a null-move threat recognition bug; I'm investigating.
4. PVs of first 4 plies look dubious in your post (first letter 'g' resp. 'f' of first PV move is missing?!), what about this? Perhaps you have some memory corruption?
It's a transcription error: I did some cut & paste to remove some carriage returns, and I removed the f or g.

hgm
Posts: 27702
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: how to track down a BUG?

Post by hgm »

Any reproducible error can be found, but it can be very tedious. The way I do it is by keeping the current branch in an array moves[ply], assigned to in MakeMove. After UnMake I then put a conditional printf (which ends up in the GUI debug file) that prints ply, remaining depth, move, returned score and maximum score. The condition to print is that you are below a certain depth along a given path, so it prints only a very limited number of nodes. I start by printing only the root, to see which move has the suspect score; then I make that move the first move in the path, and so on, until I get to the node where the error occurs.

If the path ends in a hash hit, you print the hash key. Then you run again, printing the path at the point where an entry with that key is stored. And then you continue with that path.

At large depth this is a bit tedious. (It can be made less tedious by allowing the path (and hash key) to be specified on the command line of the engine, so that you don't have to recompile to add the next move to the path, but even then it is tedious at large depth.) But it is an absolutely certain way to zoom in on the error and diagnose it in a finite number of steps. Therefore it is usually still faster than any other (more speculative) method.
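
The path-restricted printing described in this post can be sketched on a toy game tree. This is only an illustration under stated assumptions: the tree, the `search` and `trace` functions and the log tuple layout are made up for the example, and a real engine would do the same thing with printf after UnMake.

```python
INF = 10**9
MAX_PLY = 64


def search(node, depth, alpha, beta, ply, moves, trace_path, log):
    """Negamax alpha-beta over a toy tree: a leaf is an int score for the
    side to move there, an inner node is a dict {move: subtree}."""
    if isinstance(node, int):
        return node
    best = -INF
    for move, child in node.items():
        moves[ply] = move  # stands in for MakeMove: record the current branch
        score = -search(child, depth - 1, -beta, -alpha, ply + 1,
                        moves, trace_path, log)
        best = max(best, score)
        # Stands in for the point just after UnMake: log only when this node
        # is the root or is reached by following a prefix of trace_path.
        if ply <= len(trace_path) and moves[:ply] == trace_path[:ply]:
            log.append((ply, depth, move, score, best))
        alpha = max(alpha, score)
        if alpha >= beta:
            break
    return best


def trace(tree, depth, trace_path):
    """Run one search and return (score, log) for the given path."""
    log = []
    score = search(tree, depth, -INF, INF, 0,
                   [None] * MAX_PLY, list(trace_path), log)
    return score, log


TOY_TREE = {
    "a": {"c": 3, "d": {"g": 50, "h": -4}},
    "b": {"e": -2, "f": 7},
}

# Step 1: print only the root to see which move carries the suspect score.
score, root_log = trace(TOY_TREE, 3, [])
for entry in root_log:
    print("ply=%d depth=%d move=%s score=%d best=%d" % entry)

# Step 2: add that move to the path and look one ply deeper.
_, deeper_log = trace(TOY_TREE, 3, ["a"])
for entry in deeper_log:
    print("ply=%d depth=%d move=%s score=%d best=%d" % entry)
```

Each run prints only the nodes along the chosen path, so the output stays small however large the tree is. Passing `trace_path` in from the command line, as suggested above, would avoid recompiling between steps.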
elcabesa
Posts: 855
Joined: Sun May 23, 2010 1:32 pm

Re: how to track down a BUG?

Post by elcabesa »

It's very similar to debugging perft using divide: you look for the subtree that gives the wrong perft number, and so on.
Antonio Torrecillas
Posts: 90
Joined: Sun Nov 02, 2008 4:43 pm
Location: Barcelona

Re: how to track down a BUG?

Post by Antonio Torrecillas »

I use the following method to track down problems and to verify the sanity of the search.
Depending on the method you use to retrieve the main line, you can add hooks to it that help you detect problems.
In debug mode, the path of the main line should be complete; it is better if the quiescence search is included.
In debug mode, also include a word such as NULLMOVE, TT or Repetition in the PV as the cause of the cut.
I usually traverse a large EPD file, doing these verifications after each search:
- The number of moves in the PV must be >= the nominal depth, unless there is a repetition, an endgame database hit or a mate.
- Move down the PV and do a static eval; the returned value must match the value returned by the search (from White's perspective).
- If one of these words (NULLMOVE or TT) shows up in the PV, you have a clue about your problem. (I don't cut in the PV on a TT hit.)

Finally, I write a log of the positions that go wrong.
I choose the shallowest one for the debugging session. ;-)
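
The three verifications listed above could be collected in one helper along these lines. It is a rough sketch: the function name, its parameters and the marker spellings are assumptions for illustration, and the static-eval comparison is skipped when no values are supplied.

```python
MARKERS = ("NULLMOVE", "TT", "REPETITION")  # assumed debug words inside the PV


def verify_search(depth, pv, score_white=None, eval_at_pv_end_white=None,
                  early_end_ok=False):
    """Return a list of problems found in one search result.

    depth                 nominal search depth
    pv                    PV as a list of moves, possibly containing MARKERS
    score_white           search score from White's perspective (optional)
    eval_at_pv_end_white  static eval after playing the PV (optional)
    early_end_ok          True if the PV ends in a mate, repetition or
                          endgame database hit
    """
    problems = []
    moves = [m for m in pv if m not in MARKERS]
    cuts = [m for m in pv if m in MARKERS]
    if len(moves) < depth and not early_end_ok:
        problems.append("PV has %d moves, nominal depth is %d"
                        % (len(moves), depth))
    if (score_white is not None and eval_at_pv_end_white is not None
            and score_white != eval_at_pv_end_white):
        problems.append("static eval at PV end differs from search score")
    if cuts:
        problems.append("PV cut by " + ", ".join(cuts))
    return problems


# The depth-20 line from the first post: two PV moves at nominal depth 20.
print(verify_search(20, ["g1g7", "g8g7"]))
```

Applied to the depth-20 line from the first post, only the length check fires. That shows the PV is truncated somewhere, not where the bug is.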
JVMerlino
Posts: 1352
Joined: Wed Mar 08, 2006 10:15 pm
Location: San Francisco, California

Re: how to track down a BUG?

Post by JVMerlino »

Not sure if this will help, but you never know....

While trying to find a weird but very similar bug in Myrddin, Dann Corbit found that it was caused (or possibly just made visible) by the "Whole Program Optimization" compiler setting in Visual Studio. The issue in Myrddin only happened in 32-bit release builds (so, not in 64-bit at all and not in 32-bit debug).

So you might want to check your compiler settings if the above applies to your development environment.

We never figured out WHY the setting caused the problem; we were satisfied when we confirmed that turning off the setting made the problem go away in all positions where we could reproduce it.

Be careful because the setting actually exists in two places in the IDE, in "General" in the top level and in "C/C++ -> Optimization".

jm
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: how to track down a BUG?

Post by lucasart »

elcabesa wrote:I have just found my engine has some bug :)

I have just checked that perft up to depth 6 is correct, and it does not seem to be an eval error (I tested replacing the eval with material only).
Do you have any idea how to track down an error that only shows up at such a high depth?
- perft proves that your board code is correct
- reproducing with a material-only eval proves that the bug is not in your eval (your eval may still have bugs too, but they are not what explains this)
- so the bug is in your search. Disable all the features, use a plain and stupid alpha/beta search, and see what happens. Then enable the features one by one, especially the transposition table, and see which one triggers the bug
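
The "enable the features one by one" step can be automated with a small driver like the one below. This is a sketch under assumptions: the feature names are hypothetical, and `bug_reproduces` is a stand-in for whatever runs the engine on the problem position with the given features and checks for the bogus score.

```python
BASELINE = "plain alpha-beta"


def find_triggering_feature(features, bug_reproduces):
    """Enable features cumulatively and return the first one whose
    activation makes the bug appear.

    Returns BASELINE if the bug already shows with no features enabled,
    or None if it never reproduces.
    """
    enabled = set()
    if bug_reproduces(frozenset(enabled)):
        return BASELINE
    for feature in features:
        enabled.add(feature)
        if bug_reproduces(frozenset(enabled)):
            return feature
    return None


# Hypothetical feature switches of an engine.
FEATURES = ["transposition_table", "null_move", "late_move_reductions"]


def fake_bug_reproduces(enabled):
    # Stand-in: a real check would search the position to a fixed depth
    # with only these features on and test for the impossible score.
    return "null_move" in enabled


print(find_triggering_feature(FEATURES, fake_bug_reproduces))
```

Because features are enabled cumulatively, the one reported may only misbehave in combination with those enabled before it, so it is worth retesting it on its own.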
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: how to track down a BUG?

Post by Sven »

elcabesa wrote:
Sven Schüle wrote:1. Disable TT and check again

2. Explain the meaning of the value "16675": is that a mate score, or at least close to it?
Anything bigger than 16000 is a sure win; it's returned by a specialized endgame function. It's bigger than a normal eval but smaller than a mate-in-x-plies score.
Sounds like you should mainly investigate in that particular direction. There must be a reason why your program handles the position as a "sure win".

Sven
elcabesa
Posts: 855
Joined: Sun May 23, 2010 1:32 pm

Re: how to track down a BUG?

Post by elcabesa »

Thank you, everyone, for the help you gave me!

The bug turned out to be in the search code, in the threat detection in nullmove.

When the nullmove test reported a checkmate, I returned beta-1; this was the problem.
I was inspired by the Stockfish code, where they return beta-1 if the move that gives the checkmate and the previous move are "connected".
Now I don't return beta-1; I only set a checkmateThreat flag and extend the search.
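
The change described above might look roughly like this as a stand-alone decision function. It is a guess at the shape of the fix, not the engine's actual code: the constants, the function names, the fail-hard cutoff and the one-ply extension are all assumptions.

```python
MATE = 32000                     # assumed score for "mate at the root"
MAX_PLY = 128                    # assumed maximum search depth in plies
MATED_BOUND = -(MATE - MAX_PLY)  # at or below this, the side to move gets mated


def after_null_move(null_score, beta, old_behaviour=False):
    """Decide what to do once the reduced null-move search has returned.

    Returns (score_to_return, mate_threat). score_to_return is None when
    the node should go on to search its real moves.
    """
    if null_score >= beta:
        return beta, False            # ordinary null-move cutoff (fail-hard)
    mate_threat = null_score <= MATED_BOUND
    if mate_threat and old_behaviour:
        return beta - 1, True         # the early return described as the bug
    return None, mate_threat


def threat_extension(mate_threat):
    """Extra plies to search when passing would allow a mate (assumed: 1)."""
    return 1 if mate_threat else 0


# Passing here would get the side to move mated: flag it and keep searching.
result, threat = after_null_move(-MATE + 4, 40)
print(result, threat, threat_extension(threat))
```

With `old_behaviour=True` the node fails low without looking at a single real move, which matches the early return described as the cause of the bogus score.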

Git helped me track down the bug.