bug?

flok · Post by **flok** » Mon Aug 29, 2016 11:17 pm

Ok I'm investigating this giving queen away problem.
If I revert a certain change (rev 3001) then it disappears.
This affects the following line:
for(size_t i=1; i<mlSize; i++)
in reorderAndSortList.

By starting the search at 1, the first move is not checked if it is in the "put in front"-list. by this, the sort-start-offset which is returned (moveToPos) is one less than what it should be. Because of that sometimes one important move (old pv hit, tt hit, sibbling) may be moved to the end of the move-list and thus never checked.

This together with that bug which could abort the q-search way earlier because the first move(s) (the ones from the put in front list) may not be attack/promote moves. This combined with the "stop node when a non-capture/promote move is found" may stop a qs search earlier than what is required.

flok · Post by **flok** » Mon Aug 29, 2016 11:34 pm

Sven Schüle wrote:Now I also looked at the move loop in QS. There you do this:
Code: Select all
		if (!newCause.causingMove -> isCatchMove && newCause.causingMove -> promoteTo != QUEEN && newCause.causingMove -> promoteTo != KNIGHT&#41;
			break;
That means, you stop the whole move loop after finding the first move that is neither a capture nor one of the promotion moves that you allow in QS. This will work if you have sorted the move list such that all captures and promotions to queen or knight come first, then all others. But is that really guaranteed? What about those moves you maintain in your putInFront list, are these always "QS compatible moves"? Is the "break" intended? I would have expected a "continue" instead (or even better, see my comment in my previous post above regarding generating only captures/promotions for QS).

That indeed is a bug. When I implemented that line I had forgotten that the put-in-front stuff may be moves that are non-capture/promotion moves. So that will cause premature aborts.

By the way: I read all msgs and process them accordingly (in case I do not put a response here).

hgm · Post by **hgm** » Tue Aug 30, 2016 9:23 am

flok wrote:Ok I'm investigating this giving queen away problem.
If I revert a certain change (rev 3001) then it disappears.

In general that doesn't mean a thing. The most innocent changes in evaluation or search extensions, or even just changing the size of the hash table will change the search tree, and the new tree might now avoid the node that triggered the bug, e.g. because it moved to a sub-tree that is alpha-beta pruned because of score changes in other branches. But the bug could still be there, and cause the same kind of blunder in other positions.

You really must understand why exactly it though Qxa2 was a not-so-bad move before you can be sure you fixed it.

Dann Corbit · Post by **Dann Corbit** » Tue Aug 30, 2016 9:50 am

hgm wrote:
flok wrote:Ok I'm investigating this giving queen away problem.
If I revert a certain change (rev 3001) then it disappears.
In general that doesn't mean a thing. The most innocent changes in evaluation or search extensions, or even just changing the size of the hash table will change the search tree, and the new tree might now avoid the node that triggered the bug, e.g. because it moved to a sub-tree that is alpha-beta pruned because of score changes in other branches. But the bug could still be there, and cause the same kind of blunder in other positions.

You really must understand why exactly it though Qxa2 was a not-so-bad move before you can be sure you fixed it.

I agree 100% with this.
First, you must reproduce the problem.
Then you must understand the problem.
Only then can you fix the problem.
Otherwise, you may simply be masking an even bigger problem of which the apparent problem is only a minor symptom.

flok · Post by **flok** » Thu Sep 01, 2016 12:38 pm

hgm wrote:In the game against micro-Max in the latest on-line blitz tourney Embla sacrifices its Queen for a Pawn (move 18) in the most trivial way. This must be a bug, as even a 0-ply search (QS only) should have recognized this as a -800 score.
(...)
If you can reproduce that move, it should be possible to figure out exactly what the bug is.

I can reproduce the problem and I found what triggered it.
For this I added a field "source" to the chain-of-moves (pv) which says "tt" or "syzygy" or "book" or "qs" or "search". With that I found that it comes from search. Then a few printfs later I found that it is qs return a score (and no move) because of a standing pat:

Code: Select all

        sd -> s -> generateMovelists&#40;c, true&#41;;

        bool abort = false;
        int bestVal = evaluate&#40;sd -> s, c, currentSearchDistance, &abort&#41;;

        // big_delta&#58; http&#58;//chessprogramming.wikispaces.com/Delta+Pruning
        int big_delta = ChessPiece&#58;&#58;evalVal&#91;QUEEN&#93;;

        if &#40;cause -> causingPiece -> isWasPromotion&#40;))
                big_delta += ChessPiece&#58;&#58;evalVal&#91;QUEEN&#93; - ChessPiece&#58;&#58;evalVal&#91;PAWN&#93; * 2;

        if &#40;bestVal < alpha - big_delta&#41; &#123;
printf&#40;"qs delta pruning return\n");
                return alpha;
        &#125;

        if &#40;bestVal > alpha&#41;
        &#123;
                alpha = bestVal;

                if &#40;bestVal >= beta&#41; &#123;
printf&#40;"qs stand pat return\n");
                        return bestVal;
                &#125;
        &#125;

So now I need to figure out why. I think it is either an implementation bug of that standing pat concept or a bug in the evaluation code. It probably is the evaluation code; I recently added an experimental factor which I forgot about:

Code: Select all

        int mNAtt = s -> countUnderAttack&#40;me&#41;, mNProt = s -> countProtected&#40;me&#41;;
        int hNAtt = s -> countUnderAttack&#40;opponent&#41;, hNProt = s -> countProtected&#40;opponent&#41;;
...
        score += (&#40;mNProt - mNAtt&#41; - &#40;hNProt - hNAtt&#41;)

hgm · Post by **hgm** » Thu Sep 01, 2016 9:18 pm

Well, root scores should always be derived from a stand pat at the end of the PV (or a checkmate), possibly transferred through the PV. Delta pruning is only done in nodes that fail low,meaning they don't propagate to the root.

The question is which position this was, and whether it was evaluated correctly. If you did end up in a position that was evaluated correctly as -110, after blundering a Queen away in the first 2 ply, you must have a search problem, because there should be no way it can force gaining a Queen back.

flok · Post by **flok** » Mon Sep 05, 2016 10:44 pm

Yes, that's what I'm trying to do; understanding the problem.
I can reproduce the problem. Not a slightest idea what may be causing it. Well, I've got a hunch; see a following mail.
Please note that I sometimes write things while investigating. So it is often full of assumptions that are later refuted.

flok · Post by **flok** » Mon Sep 05, 2016 10:53 pm

After a couple of days of totally ignoring my program I looked at it again, with the focus on this problem as I have the feeling it may more often wreaking havoc.

I already mentioned the experiment I had left in:

Code: Select all

score += (&#40;mNProt - mNAtt&#41; - &#40;hNProt - hNAtt&#41;)

prot = number of pieces that protect (can be > 8 for pieces that protect multiple others)
att = number of pieces that attack (idem)
m = mine, current player
h = his, opponent

Other things that stood out:
- qs does an tt update with search distance equal to the size of the pv but no direct noticable effect
- if iterative internal deepening is disabled, problem goes away
- with null move disabled, problem goes away
- if I use a search-depth reduction of 1 for null move (instead of 2), problem stays
- only 1 null move per branch: problem stays
- I found a deficiency in the move ordening code: the first move in the list would never be moved. solving this; problem goes away
- Sven S. found a bug where not all moves in QS were evaluated. fixing that does not solve _this_ problem though

To clarify "goes away":
- at depth 9 I get the following:

Code: Select all

info depth 9 seldepth 27 score cp -110 time 2379 nodes 190477 pv c6e7 c3c4 a5a4 b3a4 b5a4 a1b1 c7c6 d5c6
info depth 9 seldepth 28 score cp -110 time 3372 nodes 274113 pv a5a2 a1a2 a7a6

(aspiration window thing).
- when something fixes it, then I get a longer pv like

Code: Select all

info depth 9 seldepth 26 score cp -110 time 2383 nodes 191092 pv c6e7 c3c4 a5a4 b3b5 a4b5 c4b5 a7a6 a2a4 a6b5
info depth 9 seldepth 30 score cp -110 time 4425 nodes 360737 pv c6e7 c3c4 a5a4 b3c3 e8d7 c3e5 g5g4 e5f6 h5h4 c4b5 a4b5

In the "a5a2 a1a2 a7a6" case I verified that none of the 3 moves come out of the transposition table:

Code: Select all

move&#58; c6-e7 &#40;1&#41;, alpha/score/beta&#58; -10000/27/10000, took 0.01, 6076.92knps, pv&#58; c6e7/27/search c3d4/214/qs e5d4/-214/qs
move&#58; a5-a4 &#40;2&#41;, alpha/score/beta&#58; 2/2/52, took 0.02, 10150.00knps, pv&#58; a5a4/2/search b3a4/-2/search
move&#58; c6-e7 &#40;2&#41;, alpha/score/beta&#58; -10000/1/52, took 0.03, 15724.14knps, pv&#58; c6e7/1/search e1e2/-1/search
move&#58; c6-e7 &#40;3&#41;, alpha/score/beta&#58; -24/20/26, took 0.04, 24685.71knps, pv&#58; c6e7/20/search e1e2/-20/search a5b6/20/search e4f5/307/qs
move&#58; a5xc3 &#40;4&#41;, alpha/score/beta&#58; -5/-5/45, took 0.08, 56723.68knps, pv&#58; a5c3/-5/search d2c3/5/search a7a6/-5/search d5c6/5/search
move&#58; c6-e7 &#40;4&#41;, alpha/score/beta&#58; -10000/-30/45, took 0.10, 66333.33knps, pv&#58; c6e7/-30/search c3c4/30/search b5c4/-30/tpt
move&#58; a5xa2 &#40;5&#41;, alpha/score/beta&#58; -55/-55/-5, took 0.12, 70756.10knps, pv&#58; a5a2/-55/search a1a2/55/search a7a5/-55/search d5c6/55/search
move&#58; c6-e7 &#40;5&#41;, alpha/score/beta&#58; -10000/-98/-5, took 0.16, 73493.75knps, pv&#58; c6e7/-98/search c3c4/98/search b5b4/-98/tpt
move&#58; a7-a6 &#40;6&#41;, alpha/score/beta&#58; -123/-123/-73, took 0.37, 70363.39knps, pv&#58; a7a6/-123/search d5c6/123/search a5a2/-123/search a1a2/123/search a6a5/-123/search a2a5/123/search
move&#58; c6-e7 &#40;6&#41;, alpha/score/beta&#58; -10000/-139/-73, took 0.56, 74936.28knps, pv&#58; c6e7/-139/search c3c4/139/search b5b4/-139/tpt
move&#58; c6-e7 &#40;7&#41;, alpha/score/beta&#58; -164/-118/-114, took 0.85, 78271.66knps, pv&#58; c6e7/-118/search c3c4/118/search b5b4/-118/search d2b4/118/search a5a6/-118/search g3g4/118/tpt
move&#58; c6-e7 &#40;8&#41;, alpha/score/beta&#58; -143/-135/-93, took 1.58, 80775.38knps, pv&#58; c6e7/-135/search c3c4/135/search b5b4/-135/search d2b4/135/search a5b6/-135/search e1e2/135/search a7a5/-135/tpt
move&#58; c6-e7 &#40;9&#41;, alpha/score/beta&#58; -160/-110/-110, took 2.41, 81729.46knps, pv&#58; c6e7/-110/search c3c4/110/search a5a4/-110/search b3a4/110/search b5a4/-110/search a1b1/110/search c7c6/-95/search d5c6/95/search
move&#58; a5xa2 &#40;9&#41;, alpha/score/beta&#58; -160/-110/10000, took 3.41, 83154.70knps, pv&#58; a5a2/-110/search a1a2/110/search b5b4/-110/search

c6e7/27/search means:
- c6e7: the move of course
- 27: the score of the move
- search: where the move was made up. can be qs, search, tpt, syzygy, etc

I've added the fen of the last move:

Code: Select all

move&#58; a5xa2 &#40;9&#41;, alpha/score/beta&#58; -160/-110/10000, took 3.52, 80741.68knps, pv&#58; a5a2/-110/search a1a2/110/search b5b4/-110/search "r3k2r/p1p2p2/2n5/1p1Pp1pp/4P3/1QP1NPPb/R2B3P/4K2R b Kkq - 0 19"

So the PV shown is incorrect.
This affects the move it plays because the search() doesn't return a move but a PV. So in the root (other programs have a seperate searchRoot() for this) it may return a move which is gibberish and should've been an empty_pv/no_move, triggering a re-search with a wider window (aspiration windows). This is interesting because since a couple a while cutechess complains that moves in the pv returned are incorrect!

hgm · Post by **hgm** » Tue Sep 06, 2016 8:39 am

I don't understand that latest conclusion. Why is the PV incorrect? The FEN in the last node seems exactly what you would get after the two moves leading to it (a5a2 a1a2).

I don't see any how you can draw any conclusions from the data you have now. Everything so far is entirely consistent with the hypothesis that you get a (d >= 6) hash hit in the node after b5b4 which causes a hash cutoff, delivering the exact score -110. In that case the only question that can shed light on this,is how such an obviously wrong score could get in that entrie. (E.g. is it a key collision, so that the scorereally belongs toa totally different position, or is there a search bug that makes you erroneously think black can gain a Queen from that position.) Nothing of what you posted sheds any light on that.

flok · Post by **flok** » Tue Sep 06, 2016 11:09 am

hgm wrote:I don't understand that latest conclusion. Why is the PV incorrect? The FEN in the last node seems exactly what you would get after the two moves leading to it (a5a2 a1a2).

That pv may be correct (I meant PVs shown in general, not this specific one), others may not: what I did before was storing the move if score > bestVal instead if score > alpha.

I don't see any how you can draw any conclusions from the data you have now. Everything so far is entirely consistent with the hypothesis that you get a (d >= 6) hash hit in the node after b5b4 which causes a hash cutoff, delivering the exact score -110.

Hash hit? But the pv of the last iteration shows no hash hit at all. Only search-results (e.g. non-qs/tt/book/etc).

a5a2/-110/search a1a2/110/search b5b4/-110/search

All search results.

bug?

Re: bug?

Re: bug?

Re: bug?

Re: bug?

Re: bug?

Re: bug?

Re: bug?

Re: bug?

Re: bug?

Re: bug?