Changes in Andscacs 0.70

cdani · Post by **cdani** » Tue Nov 11, 2014 8:33 am

These are the tests (20+.04) I ran last night:

In quiescence, all hash moves even quiet ones:

Rank Name                Elo    +    - games score oppo. draws
 1 Gull 1.2 x64           236   27   25   562   79%     5   17%
 2 Gaviota v1.0           102   23   23   562   63%     5   21%
 3 Nirvanachess 1.6        19   22   23   562   52%     5   25%
 4 Naraku 1.4              17   23   23   563   52%     5   21%
 5 Critter 0.52b 64-bit    16   22   22   562   52%     5   26%
 6 Andscacs 0.71019         5    8    8  5058   51%    -1   22%
 7 cheng4 0.36c             2   22   22   561   49%     5   27%
 8 Atlas  3.60  x64       -47   23   23   562   43%     5   22%
 9 Spike 1.2 Turin       -144   24   24   562   30%     5   19%
10 Philou 3.7.1 64 bits  -207   25   26   562   23%     5   19%

In quiescence, only hash moves that are captures or promotions, unless king in check, and without the bug:

Rank Name                Elo    +    - games score oppo. draws
 1 Gull 1.2 x64           238   23   23   740   79%     3   16%
 2 Gaviota v1.0           106   20   20   740   64%     3   22%
 3 cheng4 0.36c            39   20   20   740   55%     3   26%
 4 Nirvanachess 1.6        15   20   20   740   52%     3   25%
 5 Critter 0.52b 64-bit    12   20   20   742   51%     3   21%
 6 Naraku 1.4               8   20   20   742   51%     3   18%
 7 Andscacs 0.71020         3    8    8  6668   51%     0   21%
 8 Atlas  3.60  x64       -56   20   20   741   42%     3   24%
 9 Spike 1.2 Turin       -157   21   21   741   29%     3   20%
10 Philou 3.7.1 64 bits  -207   22   22   742   22%     3   21%

They seem to be more or less equal.

I will also try Thomas Petzke's other idea of returning a draw score if a quiet move is found in the hash.

Evert · Post by **Evert** » Tue Nov 11, 2014 9:36 am

cdani wrote: I will also try Thomas Petzke's other idea of returning a draw score if a quiet move is found in the hash.

I think that's just supposed to be for the case of repeated positions. Otherwise it doesn't make much sense to me...

tpetzke · Post by **tpetzke** » Tue Nov 11, 2014 6:07 pm

Evert wrote: I think that's just supposed to be for the case of repeated positions. Otherwise it doesn't make much sense to me...

Yes, because in the case of a repetition I can imagine finding a quiet move in the hash, but in that case I don't use it, as I handle the repetition instead (before even probing ...).

Evert · Post by **Evert** » Tue Nov 11, 2014 8:18 pm

Thinking about it some more, if the best move in a position would be a quiet move, I guess in QS you would just stand pat, since apparently the captures are no good. So trying the quiet hash move should be about equal.
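To make the two variants being compared concrete, here is a minimal sketch of a quiescence search with stand pat and an optional quiet hash move. It runs on a hand-built toy tree rather than a real board, and every name in it (`Node`, `Move`, `qsearch`, `quiet_hash_moves`) is made up for illustration; it is not code from Andscacs or any other engine.

```python
from dataclasses import dataclass, field
from typing import List, Optional

INF = 10**6  # stands in for a mate score in this toy example


@dataclass
class Move:
    name: str
    is_tactical: bool   # capture or promotion
    child: "Node"       # position after the move (toy tree, no real board)


@dataclass
class Node:
    static_eval: int                  # from the side to move's point of view
    moves: List[Move] = field(default_factory=list)
    in_check: bool = False
    hash_move: Optional[str] = None   # name of the move stored in the hash, if any


def qsearch(node: Node, alpha: int, beta: int, quiet_hash_moves: bool) -> int:
    """Negamax quiescence search (fail-soft) on the toy tree."""
    if node.in_check:
        best = -INF                   # no stand pat when in check
    else:
        best = node.static_eval       # stand pat
        if best >= beta:
            return best
        alpha = max(alpha, best)

    # Select the moves to try: everything when in check, otherwise
    # captures/promotions, plus the hash move if the flag allows quiet ones.
    candidates = []
    for m in node.moves:
        is_hash = m.name == node.hash_move
        if node.in_check or m.is_tactical or (is_hash and quiet_hash_moves):
            candidates.append((not is_hash, m))
    candidates.sort(key=lambda pair: pair[0])   # hash move first

    for _, m in candidates:
        score = -qsearch(m.child, -beta, -alpha, quiet_hash_moves)
        if score > best:
            best = score
            if score > alpha:
                alpha = score
                if alpha >= beta:
                    break             # beta cutoff
    return best


# Toy position: no captures, but the hash holds a quiet move (say, castling)
# that leads to a position worth +40 for us (-40 for the opponent to move).
root = Node(static_eval=0,
            moves=[Move("O-O", is_tactical=False, child=Node(static_eval=-40))],
            hash_move="O-O")

print(qsearch(root, -INF, INF, quiet_hash_moves=True))    # 40
print(qsearch(root, -INF, INF, quiet_hash_moves=False))   # 0 (stand pat)
```

With the flag off, the search only stands pat; with it on, the quiet hash move can back up a score above the stand-pat value. Whether that happens often enough to matter in a real engine is exactly what the tests above are trying to measure.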

cdani · Post by **cdani** » Tue Nov 11, 2014 8:23 pm

lucasart wrote:
cdani wrote: And in quiescence:
• Hash move, even if it’s a quiet move. I tried only captures and promotions, and it was clearly weaker.
Interesting. Testing in SF:
http://tests.stockfishchess.org/tests/v ... 75593e5d7b

I don't understand the results of this test; I never took the time to learn those parameters. Can anyone explain it? I see that there are some more wins.
Thanks.

Ajedrecista · Post by **Ajedrecista** » Tue Nov 11, 2014 9:29 pm

Hello Daniel José:

cdani wrote: I don't understand the results of this test; I never took the time to learn those parameters. Can anyone explain it? I see that there are some more wins.
Thanks.

First of all, congratulations on your engine. You are improving it quite a lot!

Now, trying to answer your question:

Re: Type I error in LOS based early stopping rule.

Re: Stats and bench on Stockfish development site.

I wrote those two posts more than a year ago. The first one contains the Python code inside the SF testing framework, which computes the LLR (Log Likelihood Ratio); the second one tries to explain how the parameters work.

I will use an old explanation by someone else (probably Uri or Lucas, but I am not sure): after each number of games you have three values {wins, draws, losses} and, with the help of the parameters, you can compute the LLR in the way described in the Python code. Those LLR values typically start very near 0, and there are two bounds: an upper bound (a positive value) and a lower bound (a negative value). Once one of these bounds is crossed, the test is considered finished: if the upper bound was crossed, the patch succeeds; if the lower bound was crossed, it fails. The SF testing framework also adds a chi-squared test in an attempt to determine whether there are 'black sheep' computers degrading the results; in that case (p-value < 1% in the SF testing framework when the test has finished; I think this value was chosen arbitrarily) the test is purged by an admin: the games of the bad computers are replaced.

I wrote an SPRT simulator just to estimate the average number of games of an SPRT given certain parameters, as well as to collect other stats. I did not release it due to lack of interest.
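As a rough illustration of the procedure just described, here is a minimal Python sketch. It is not the code from the SF testing framework: it assumes a BayesElo-style draw model, hypotheses `elo0` and `elo1` expressed in BayesElo units, and the usual Wald bounds computed from error rates `alpha` and `beta`. All function and parameter names are mine.

```python
import math


def bayeselo_to_probs(elo, draw_elo):
    """Win/draw/loss probabilities under a BayesElo-style model (assumed here)."""
    p_win = 1.0 / (1.0 + 10.0 ** ((-elo + draw_elo) / 400.0))
    p_loss = 1.0 / (1.0 + 10.0 ** ((elo + draw_elo) / 400.0))
    return p_win, 1.0 - p_win - p_loss, p_loss


def estimate_draw_elo(wins, draws, losses):
    """Draw parameter fitted to the observed win and loss frequencies."""
    n = wins + draws + losses
    pw, pl = wins / n, losses / n
    return 200.0 * math.log10((1.0 - pl) / pl * (1.0 - pw) / pw)


def llr(wins, draws, losses, elo0, elo1):
    """Log likelihood ratio of H1 (elo = elo1) against H0 (elo = elo0)."""
    if min(wins, draws, losses) == 0:
        return 0.0  # too little data for the estimate; a simplification
    draw_elo = estimate_draw_elo(wins, draws, losses)
    p0 = bayeselo_to_probs(elo0, draw_elo)
    p1 = bayeselo_to_probs(elo1, draw_elo)
    counts = (wins, draws, losses)
    return sum(c * math.log(b / a) for c, a, b in zip(counts, p0, p1))


def sprt_bounds(alpha, beta):
    """Lower (negative) and upper (positive) stopping bounds."""
    return math.log(beta / (1.0 - alpha)), math.log((1.0 - beta) / alpha)


def sprt_state(wins, draws, losses, elo0, elo1, alpha, beta):
    """'pass', 'fail', or 'continue' for the current game counts."""
    lower, upper = sprt_bounds(alpha, beta)
    value = llr(wins, draws, losses, elo0, elo1)
    if value >= upper:
        return "pass"
    if value <= lower:
        return "fail"
    return "continue"


# Example with made-up counts and parameters:
print(sprt_state(200, 100, 50, elo0=0.0, elo1=5.0, alpha=0.05, beta=0.05))
```

With `alpha = beta = 0.05` the bounds are about -2.94 and +2.94, so the LLR has to drift that far from 0 before the test stops.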
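A simulator of that kind could look roughly like the sketch below. To be clear, this is not the unreleased program mentioned above, only a guess at the general shape: it draws game results at random from an assumed BayesElo-style model, updates the LLR after every game, and averages the stopping time over many runs. All names and parameter values are hypothetical.

```python
import math
import random


def probs(elo, draw_elo):
    """Win/draw/loss probabilities under a BayesElo-style model (assumed)."""
    p_win = 1.0 / (1.0 + 10.0 ** ((-elo + draw_elo) / 400.0))
    p_loss = 1.0 / (1.0 + 10.0 ** ((elo + draw_elo) / 400.0))
    return p_win, 1.0 - p_win - p_loss, p_loss


def llr(wins, draws, losses, elo0, elo1):
    """Log likelihood ratio of elo1 against elo0 for the given counts."""
    if min(wins, draws, losses) == 0:
        return 0.0  # simplification: wait until every result has occurred
    n = wins + draws + losses
    pw, pl = wins / n, losses / n
    draw_elo = 200.0 * math.log10((1.0 - pl) / pl * (1.0 - pw) / pw)
    p0, p1 = probs(elo0, draw_elo), probs(elo1, draw_elo)
    return sum(c * math.log(b / a)
               for c, a, b in zip((wins, draws, losses), p0, p1))


def run_sprt(true_elo, draw_elo, elo0, elo1, alpha, beta, rng,
             max_games=100000):
    """Play simulated games until a bound is crossed; return (outcome, games)."""
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    p_win, p_draw, _ = probs(true_elo, draw_elo)
    w = d = l = 0
    for games in range(1, max_games + 1):
        r = rng.random()
        if r < p_win:
            w += 1
        elif r < p_win + p_draw:
            d += 1
        else:
            l += 1
        value = llr(w, d, l, elo0, elo1)
        if value >= upper:
            return "pass", games
        if value <= lower:
            return "fail", games
    return "undecided", max_games


def average_games(runs, seed, **params):
    """Mean stopping time and pass rate over several simulated tests."""
    rng = random.Random(seed)
    results = [run_sprt(rng=rng, **params) for _ in range(runs)]
    mean_games = sum(g for _, g in results) / runs
    pass_rate = sum(1 for outcome, _ in results if outcome == "pass") / runs
    return mean_games, pass_rate


# Deliberately wide bounds so the example finishes quickly.
mean_games, pass_rate = average_games(
    runs=20, seed=1, true_elo=60.0, draw_elo=200.0,
    elo0=0.0, elo1=60.0, alpha=0.05, beta=0.05, max_games=5000)
print(mean_games, pass_rate)
```

With realistic bounds (a few Elo apart) the same loop needs tens of thousands of games per run, which is why an estimate of the average length is useful before starting a test.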

I hope you will understand my post. Please ask more if you still have doubts, although I have been very busy for weeks and I will continue in the same way.

Regards from Spain.

Ajedrecista.

Evert · Post by **Evert** » Tue Nov 11, 2014 9:29 pm

The result is colour-coded for easy comparison. The fact that it's green means it's an improvement (but you can't tell how much), with the usual caveats about false positives.

It'd be interesting to understand why. As I said, you'd normally expect that if the best move is a quiet move, the QS would just stand pat. However, that's not necessarily accurate if the move is a threat evasion (to save a hanging piece, say), has a double threat you wouldn't normally pick up (threaten mate and attack a piece) or affects some possibly large evaluation term (like king safety). Arguably some of those should be considered in QS anyway, but generating them explicitly is complicated/expensive.

It would be interesting to see some positions where the move from the TT defeats the stand-pat score, and why. Of course, all the above considerations aside, it could be something as trivial as improving the position of a piece in a quiet position and backing up a more accurate score, but I'd be surprised if that was the dominant effect.

cdani · Post by **cdani** » Tue Nov 11, 2014 10:12 pm

Thanks!
So the patch will be adopted? I suppose it depends on other things.

I will continue trying quiet hash moves in quiescence. I feel that it's better. Maybe in some time I will test again with more games. I think it's at least OK because it helps arrive at a better evaluation of the line, so in essence it produces a more exact quiescence when there are no tactics.

lucasart · Post by **lucasart** » Wed Nov 12, 2014 12:15 am

cdani wrote:
lucasart wrote:
cdani wrote: And in quiescence:
• Hash move, even if it’s a quiet move. I tried only captures and promotions, and it was clearly weaker.
Interesting. Testing in SF:
http://tests.stockfishchess.org/tests/v ... 75593e5d7b
I don't understand the results of this test; I never took the time to learn those parameters. Can anyone explain it? I see that there are some more wins.
Thanks.

Be careful when looking at stopped results, because they are biased. What this shows, in a nutshell, is that the patch is neither an Elo gain nor a regression. That's what we call a "zero Elo patch". In Stockfish it's a small simplification, and it corrects an erroneous comment about qsearch explosion as well. Perhaps it makes sense to commit.

PK · Post by **PK** » Wed Nov 12, 2014 1:03 am

I decided to test this idea overnight.

Besides that, I printed a couple of positions where a non-capturing hash move is tried in quiescence search. Most of the time these are moves that improve the static score significantly: castling, central pawn pushes (including a well-prepared d3-d4 in the middlegame), rook to the 7th rank, increasing the number of pieces attacking the enemy king. There were piece escapes, but far fewer than I expected. Somewhat surprisingly, there was also a fair share of moves attacking the enemy queen or offering a queen trade.

It seems that using hash moves in quiesce might act as a very cost-effective replacement for a move generator spitting out potential gainers.
