bug?

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

flok

bug?

Post by flok »

Often it is suggested that the low elo rating of Embla is not only caused by the low nps count (150k single core on a recent laptop) but also by bugs.

I have quite a few times compared my code to that of other programs to see if I can find any such deficiency. I can't find any - probably been looking to long at that code.

Yesterday I created a new branch (from trunk) in which I stripped almost all special code. No threading, no null move, no iid. Only alpha/beta, iterative deepening, qs and tt (called tpt in the source).

http://vps001.vanheusden.com/~folkert/embla-trunk.tgz.

It is probably something stupid, a missing negative or so.

All help is greatly appreciated.

- all regular search is in search() - no seperate search for the root
- Scene contains 32 ChessPiece objects and a Board object
- the Board object contains an 8x8 array of ChessPiece pointers
- push/popState and (un-)doMove are used for make-unmake move
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: bug?

Post by Sven »

flok wrote:Often it is suggested that the low elo rating of Embla is not only caused by the low nps count (150k single core on a recent laptop) but also by bugs.

I have quite a few times compared my code to that of other programs to see if I can find any such deficiency. I can't find any - probably been looking to long at that code.

Yesterday I created a new branch (from trunk) in which I stripped almost all special code. No threading, no null move, no iid. Only alpha/beta, iterative deepening, qs and tt (called tpt in the source).

http://vps001.vanheusden.com/~folkert/embla-trunk.tgz.

It is probably something stupid, a missing negative or so.

All help is greatly appreciated.

- all regular search is in search() - no seperate search for the root
- Scene contains 32 ChessPiece objects and a Board object
- the Board object contains an 8x8 array of ChessPiece pointers
- push/popState and (un-)doMove are used for make-unmake move
Why do you do this (Brain.cpp, function evaluate()):

Code: Select all

    if (unlikely(s -> isInsufficientMaterialDraw()) || unlikely(s -> isStaleMate(me))) {
        *abort = true;
        return 0;
    }
If I assume that the "unlikely()" stuff works correctly (although I would refrain from using it, in my opinion it is nothing but a misleading concept by some developers from the GCC world while trying to optimize things based on branch prediction - a topic that should be left to the compiler today! -, and it also hurts readability a lot for me) then why do you set the "abort" flag (which immediately aborts the whole search when checked next time) to true once you find one drawn position due to insufficient material or stalemate? I guess this is not intended ...
User avatar
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: bug?

Post by hgm »

In the game against micro-Max in the latest on-line blitz tourney Embla sacrifices its Queen for a Pawn (move 18) in the most trivial way. This must be a bug, as even a 0-ply search (QS only) should have recognized this as a -800 score.

[pgn][Event "ICS Rated blitz match"]
[Site "winboard.nl"]
[Date "2016.08.28"]
[Round "-"]
[White "microMax"]
[Black "Embla"]
[Result "1-0"]
[WhiteElo "1324"]
[BlackElo "1022"]

1. d4 Nf6 2. c4 e6 3. g3 d5 4. Nf3 dxc4 5. Bg2 Nc6 6. Qa4 Qd5 7. Nc3 Bb4 8.
Nh4 Ne4 9. Bd2 Bxc3 10. bxc3 e5 11. Bxe4 Qxe4 12. f3 Qd5 13. e4 Qa5 14.
Qxc4 g5 15. Ng2 Bh3 16. Ne3 h5 17. Qb3 b5 18. d5 Qxa2 19. Rxa2 Ne7 20.
Qxb5+ Bd7 21. Qc5 c6 22. Nc4 f6 23. Rxa7 Rxa7 24. Nd6+ Kf8 25. Qxa7 Bh3 26.
Qc7 cxd5 27. Qd8+ Kg7 28. Qxe7+ Kh6 29. Qxf6+ Kh7 30. Qf7+ Kh6 31. Bxg5+
Kxg5 32. Qg7#
{Black checkmated} 1-0
[/pgn]

If you can reproduce that move, it should be possible to figure out exactly what the bug is. For bugs that do not reproduce it sometimes helps if you first try to reproduce the move the engine made before (in an attempt to recostruct the content of the hash table),and then enter the opponent move, (possibly after waiting foraslong as the opponent thought in the original game, to reproduce the effect of pondering on the hash table). If at any depth it thinks Qxa2 is a reasonable move, you have a bug for sure. And if you can reproduce it, it is just a matter of tracing the tree to the node where the misevaluation of this branch occurs.
flok

Re: bug?

Post by flok »

Sven Schüle wrote:Why do you do this (Brain.cpp, function evaluate()):

Code: Select all

    if (unlikely(s -> isInsufficientMaterialDraw()) || unlikely(s -> isStaleMate(me))) {
        *abort = true;
        return 0;
    }
(regarding the branch prediction: I've now removed it in this branch)
then why do you set the "abort" flag (which immediately aborts the whole search when checked next time) to true once you find one drawn position due to insufficient material or stalemate? I guess this is not intended ...
What it does is abort the current node. That is; in qs (where it is used), the node will not searched (the movelist won't be gone through) but instead 0 (draw) is returned. The assumption is that with insufficient material, it cannot win anyway.
flok

Re: bug?

Post by flok »

hgm wrote:In the game against micro-Max in the latest on-line blitz tourney Embla sacrifices its Queen for a Pawn (move 18) in the most trivial way. This must be a bug, as even a 0-ply search (QS only) should have recognized this as a -800 score.
Can reproduce it so luckily not a heisenbug:

Code: Select all

09:11:02 folkert@belle:~/Projects/QueenBee/tags/v0.9.1_2205/trunk↥ trunk(7) 18s ‡ ./Embla -p 'initial_fen=r3k2r/p1p2p2/2n5/qp1Pp1pp/4P3/1QP1NPPb/P2B3P/R3K2R b KQkq - 0 18'
Start this program as follows to see a list of options:
./Embla -h
info string Embla v0.9.3 trunk
info string Using 0 threads
go movetime 5000
info string color black
info depth 1 seldepth 5 score cp 27 time 13 nodes 79 pv c6e7 c3d4 e5d4
info depth 2 seldepth 8 score cp 2 time 19 nodes 203 pv a5a4 b3a4
info depth 2 seldepth 6 score cp 1 time 27 nodes 456 pv c6e7 e1e2
info depth 3 seldepth 11 score cp 20 time 35 nodes 864 pv c6e7 e1e2 a5b6 e4f5
info depth 4 seldepth 13 score cp -5 time 75 nodes 4311 pv a5c3 d2c3 a7a6 d5c6
info depth 4 seldepth 19 score cp -30 time 95 nodes 6368 pv c6e7 c3c4 b5c4
info depth 5 seldepth 15 score cp -55 time 121 nodes 8703 pv a5a2 a1a2 a7a5 d5c6
info depth 5 seldepth 20 score cp -98 time 157 nodes 11759 pv c6e7 c3c4 b5b4
info depth 6 seldepth 18 score cp -123 time 324 nodes 25753 pv a7a6 d5c6 a5a2 a1a2 a6a5 a2a5
info depth 6 seldepth 20 score cp -139 time 516 nodes 42339 pv c6e7 c3c4 b5b4
info depth 7 seldepth 21 score cp -118 time 787 nodes 66844 pv c6e7 c3c4 b5b4 d2b4 a5a6 g3g4
info depth 8 seldepth 25 score cp -135 time 1492 nodes 127302 pv c6e7 c3c4 b5b4 d2b4 a5b6 e1e2 a7a5
info depth 9 seldepth 27 score cp -110 time 2310 nodes 196968 pv c6e7 c3c4 a5a4 b3a4 b5a4 a1b1 c7c6 d5c6
info depth 9 seldepth 28 score cp -110 time 3315 nodes 283807 pv a5a2 a1a2 b5b4
info score cp -110
bestmove a5a2
Will look at it this evening. Thanks.
User avatar
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: bug?

Post by hgm »

The PV of the blunder does see that the Queen is immediately recaptured. This is not shown in the score, however, which is only -110. But the PV ends after only 3 ply. This strongly suggests that the wrong score comes from a hash hit.

This is always a bit cumbersome to debug. What I usually do is print the number of the hash entry and content (signature, score, depth,bound type) in the node after a5a2 a1a2 b5b4. Then I put a test with the hash store that detects all stores in that hash entry, and causes printing of the path leading to that node. I then look at the last store ito that entry before the probe that retrieved a wrong value. This allows me to verify if the stored and retreived infowas indeed the same.

In a third run I then shift printing the debug info to the node where the wrong info was stored, to trace how it obtained this wrong information.
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: bug?

Post by Sven »

flok wrote:
Sven Schüle wrote:Why do you do this (Brain.cpp, function evaluate()):

Code: Select all

    if (unlikely(s -> isInsufficientMaterialDraw()) || unlikely(s -> isStaleMate(me))) {
        *abort = true;
        return 0;
    }
(regarding the branch prediction: I've now removed it in this branch)
then why do you set the "abort" flag (which immediately aborts the whole search when checked next time) to true once you find one drawn position due to insufficient material or stalemate? I guess this is not intended ...
What it does is abort the current node. That is; in qs (where it is used), the node will not searched (the movelist won't be gone through) but instead 0 (draw) is returned. The assumption is that with insufficient material, it cannot win anyway.
Ok, I see. So it does not abort as much as I thought based on the variable name which sounds like aborting the whole search. But then this code seems to be redundant within QS since you have already checked for insufficient material or stalemate few lines above the call to evaluate(). On the other hand you need it for the normal search. Therefore I would remove it from evaluate(), also remove the "abort" parameter, and check these two draw cases directly in Brain::search(), just like you already do in QS. Checking for a draw by insufficient material, stalemate, repetition or 50 moves rule is usually something that you do during search, not within static eval. The way you currently do it also has some more redundancy and/or confusing order in it, e.g. in QS you do the following (pseudo code):

Code: Select all

generate moves
if (is_insufficient_material() or is_stalemate()) // where the latter checks "is_move_list_empty() and not is_my_king_in_check()"
    return 0
if (not is_my_king_in_check()) {
    bestVal = evaluate() // which also checks "is_insufficient_material() or is_stalemate()" and returns 0 if that is true
    if (maxDepthReached or (evaluate found a draw))
        return bestVal
    ...
}
// move loop ...
More typical, also "better", would be something like this for QS:

Code: Select all

if (is_my_king_in_check()) {
    return full_width_search_to_depth_1() // you do not want to do quiescence search at all when in check!
}
if (is_50_moves_draw() or is_insufficient_material() or is_repetition()) {
    // I would ignore stalemate in QS since this would require to generate all moves - but we only want to generate captures/promotions!
    // you need to check for repetition as well since you could be at the QS root where you did not check that so far
    return 0
}
bestVal = evaluate()
if (bestVal >= beta) {
    return bestVal;
}
generate moves for QS
// move loop ...
That would be much clearer and would also save a lot of redundant effort. For instance you never need to generate any moves if evaluate() already leads to a beta cutoff, which is quite often the case within QS.

I have also seen (already in previous versions of your engine, also in POS) that you generate moves for both colors. Again I'd propose to change that into generating only what is actually needed (only for the current side to move, and in case of QS only captures and promotions - for the latter I'm not sure whether you do that, at least it looks like you don't since otherwise your stalemate checking would not work correctly). Testing whether a king is attacked can be done much cheaper than by generating all moves for the opponent. I know that this touches your basic design and possibly data structures, but the requirements (improve the engine - I guess this is your goal) should be more important than keeping the current structure if that structure has disadvantages.

Some other issues I found:

1) Brain::search() must compare the 50 moves counter to 100 (2 * 50) instead of 50.
2) You should only update the PV if the score is > alpha and < beta. Currently you do that also if score >= beta, which will lead to wrong PVs.

As a general remark, I suggest that you reduce your static evaluation function to a minimum consisting of only material and piece-square table, and disable everything else. The reason is that this can help to isolate bugs in the search or in other areas. Currently you have quite a lot of evaluation features (and I suspect that the evaluation does some scoring that is not optimal) while the search obviously appears to have basic problems. At least that is the only explanation I have for the current (very low) playing strength of the engine. I would postpone working on the static evaluation until your engine has reached a level of, say, ELO 2000 and is stable and functional. My experience is that this strategy will result in much more fun while developing, if you can actually find and fix bugs and see substantial improvements afterwards. While there is some unreliable evaluation in the background it is very likely that this dominates the result so that you can change the search as you like but will never see the engine play reasonable chess.
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: bug?

Post by Sven »

Now I also looked at the move loop in QS. There you do this:

Code: Select all

		if (!newCause.causingMove -> isCatchMove && newCause.causingMove -> promoteTo != QUEEN && newCause.causingMove -> promoteTo != KNIGHT&#41;
			break;
That means, you stop the whole move loop after finding the first move that is neither a capture nor one of the promotion moves that you allow in QS. This will work if you have sorted the move list such that all captures and promotions to queen or knight come first, then all others. But is that really guaranteed? What about those moves you maintain in your putInFront list, are these always "QS compatible moves"? Is the "break" intended? I would have expected a "continue" instead (or even better, see my comment in my previous post above regarding generating only captures/promotions for QS).
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: bug?

Post by Sven »

Something is wrong about scorePassedPawns(). Internally it assumes that "whiteObjects" actually refers to the white pieces and "blackObjects" to the black ones, given all the minimum/maximum calculation for "y" (rank). But evaluate() passes "myObjects ... hisObjects" instead. So for all positions with black to move, the two parameters are in wrong order so that white passers are counted for black and vice versa, resulting in a wrong evaluation. To avoid having to change the implementation of scorePassedPawns() you might want to change its call in evaluate(), and deal correctly with the sign of the resulting score depending on the current side to move.

Furthermore, passed pawn evaluation - if you actually want to do it at the current stage - should really consider at least the distance of a passer to its promotion square, and then assign an appropriate score. More advanced would be to also check whether the passer is blocked by an enemy piece (which usually reduces a passer's value a lot), whether there is any friendly piece on its promotion path (which increases the "promotion distance"), or whether it is an unstoppable passer. In any case just counting the number of passers is very inaccurate. I am not sure whether this is "better than nothing", knowing that the real value of a passed pawn is often misunderstood by the engine.

The effort you spend to count all protected and all attacked pieces does not correlate well to the scoring you derive from that information. Also the gained information itself is not very precise. For instance, if you have 4 protected pieces but 3 attacked ones then you assign a bonus of +1 (4 - 3) although having 3 attacked pieces might already mean a lost position. But you don't know, since you do not make a difference between "hanging" pieces (either attacked and unprotected, or attacked by a minor piece) and others that are attacked but safe. This is already a complex task that would need much more thoughts than just counting attacks, which is one more reason why I would would vote for removing this feature from eval at the moment.

Also your penalty for a doubled pawn is 30 cp which is quite high (you increase the count by 2 for a doubled pawn - 3 for tripled etc. - and multiply the result by 15). Compare that to other scores, e.g. your piece-square table or mobility. A piece with increased mobility can be worth a lot, and this can actually outweigh one doubled pawn. Leaving out this feature (as well as isolated pawns) will hardly have any negative effect.

Your mobility scoring uses the same weight (3) for all piece types. Much better would be a lower weight for queen mobility and slightly heigher weights for knight and bishop mobility. Queen mobility will otherwise get too much influence.

Back to your QS implementation: since you are currently checking for stalemate and generating all legal moves, are you sure that you also detect being checkmated in QS? If not, that would be one more argument for doing a 1-ply full-width search when being in check during QS.
User avatar
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: bug?

Post by hgm »

Sven Schüle wrote:I have also seen (already in previous versions of your engine, also in POS) that you generate moves for both colors. Again I'd propose to change that into generating only what is actually needed (only for the current side to move, and in case of QS only captures and promotions - for the latter I'm not sure whether you do that, at least it looks like you don't since otherwise your stalemate checking would not work correctly).
Although this is unusual, some arguments can be made in favor of it. Mobility calculation is very closely related to move generation, and when you want the precise evaluation, you would need it for both sides. It can also help to identify which pieces are hanging for the side not to move, and which of the pieces side to move are threatened. Finally, when you do null move, you can skip move generation in the node after it, as all you have to do is swap the move lists of the players.

In KingSlayer I do it the other way around: each node generates only moves for the stm, but even nodes that do not want to search a nullmove (e.g. QS nodes) do make a recursive call to Search() with different side to move just to do move generation (and then return mobility and an attack map before searching anything).