Curiosity Engine ??? (For Now It's Number 1 )

syzygy · Post by **syzygy** » Sun Nov 01, 2015 10:50 pm

IWB wrote:
syzygy wrote:PERHAPS Stockfish crashes if it reaches maximum depth while pondering (i.e. maximum number of iterations). I did not test this.
Don't know. It was 127 plies in the case below but only 70 in the earlier example, although there were only 3 pieces there ... I don't know

But it is not related to a low number of pieces on the board:

It could still be that the engine reached 127 and crashed before iteration 71 reached the GUI...

In any case there seems to be a relation with reaching very high depths.

Leto · Post by **Leto** » Sun Nov 01, 2015 11:56 pm

Don't know if this is relevant but in both screenshots Stockfish shows mate in 4.

syzygy · Post by **syzygy** » Mon Nov 02, 2015 12:59 am

Leto wrote:Don't know if this is relevant but in both screenshots Stockfish shows mate in 4.

If there is a quick mate, Stockfish does not need much time per iteration, so it reaches a high depth quickly. This is why I suspect that the crash occurs when Stockfish reaches the maximum iteration depth during pondering.

(I know that my private engine does not deal very well with this situation. Once it reaches max depth, it restarts the search from iteration 1 and completely spams the GUI until the opponent makes a move. It does not crash, but that is more a question of luck. Of course if it did crash, I would have had more reason to fix it...)
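A minimal sketch of one way to handle this case, not taken from Stockfish or from any engine in this thread: when the iteration cap is reached while pondering, the search simply idles until the GUI releases it, instead of returning early or restarting from iteration 1. The names `search_root`, `iterative_deepening`, `release` and the cap of 127 are assumptions made for illustration. The idling step follows the UCI rule that an engine must not exit the search on its own while in ponder mode.

```python
import threading

MAX_DEPTH = 127  # assumed iteration cap, matching the 127 plies reported above


def search_root(depth):
    """Hypothetical stand-in for a real fixed-depth search."""
    return f"move@{depth}"


def iterative_deepening(pondering, stop, release, max_depth=MAX_DEPTH):
    """Deepen until stopped or until the cap is reached.

    `stop` aborts the search. `release` is assumed to be set by the command
    handler when either "stop" or "ponderhit" arrives from the GUI.
    """
    best, depth = None, 0
    while depth < max_depth and not stop.is_set():
        depth += 1
        best = search_root(depth)
    if pondering:
        # Cap reached (or search aborted) while pondering: wait for the GUI
        # rather than sending a move early or restarting from iteration 1.
        release.wait()
    return best, depth
```

With `pondering=False` the function returns as soon as the cap is hit. With `pondering=True` it blocks until `release` is set, so nothing is sent to the GUI in the meantime.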

bob · Post by **bob** » Mon Nov 02, 2015 3:43 am

Ozymandias wrote:
Aser Huerga wrote:Ponder ON = since engines have no psychological capabilities and no foresight, ponder on only introduces some degree of random bias. It makes no sense to emulate human conditions; engines aren't humans.
Ponder OFF = a more precise and reproducible way of testing engines

That is my honest opinion.
This does not mean Ingo's work is invalid, but I think it would be more precise with ponder off, even with the same number of games (ponder off allows twice the number of games to be played in the same period of time, hence more accurate results).
I was hoping for some kind of insight into Ponder ON testing. What you say is basically what I thought, but maybe extensive testing has revealed that some engines benefit more than others because they guess the move played more often. It'd be interesting to see those stats.

The best idea is to simply test the way real games are played. Every tournament known to man has two clocks and pondering is a part of the game. You could make the same argument for max_threads = 1 that you do for ponder = off. But you will produce a skewed result compared to real tournament games...

michiguel · Post by **michiguel** » Mon Nov 02, 2015 4:01 am

syzygy wrote:
IWB wrote:
syzygy wrote:PERHAPS Stockfish crashes if it reaches maximum depth while pondering (i.e. maximum number of iterations). I did not test this.
Don't know. It was 127 plies in the case below but only 70 in the earlier example, although there were only 3 pieces there ... I don't know

But it is not related to a low number of pieces on the board:
It could still be that the engine reached 127 and crashed before iteration 71 reached the GUI...

In any case there seems to be a relation with reaching very high depths.

Engines should be debugged for those problems by reducing the maximum depth in a special debug mode to a very low number (16, for instance), which forces those situations to occur very often. That applies to anything that happens rarely.
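As a rough illustration of that idea, here is a small sketch in which a debug switch lowers the depth cap so that the "cap reached" path runs on almost every search. The class `Searcher`, the constants, and the per-ply array are assumptions for the example and do not come from any engine discussed here.

```python
MAX_PLY = 127        # assumed production cap
DEBUG_MAX_PLY = 16   # low cap used only in the debug mode


class Searcher:
    def __init__(self, debug=False):
        self.max_ply = DEBUG_MAX_PLY if debug else MAX_PLY
        # One slot per ply, plus one so that index max_ply is still valid.
        self.pv = [None] * (self.max_ply + 1)

    def run(self, budget):
        """Run up to `budget` iterations without deepening past max_ply."""
        depth = 0
        for _ in range(budget):
            if depth >= self.max_ply:
                break  # boundary case: hit constantly in debug mode
            depth += 1
            self.pv[depth] = f"move@{depth}"  # placeholder for real search
        return depth
```

Running `Searcher(debug=True).run(1000)` stops at depth 16, so any off-by-one in the array size or in the cap check would show up within the first few test positions instead of only in rare long mates.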

Miguel

IWB · Post by **IWB** » Mon Nov 02, 2015 8:04 am

To conclude:

By far most games were adjudicated by the GUI.
Overnight I had 5 GUI crashes, which means that a Ponder ON tourney like mine would get stuck sooner or later unless the GUI restarts automatically.
Only 66 games were played out to mate, and only 36 succeeded. 26 crashed - that is a 40% crash rate ...

I never had any problems with Stockfish 6!

Bye
Ingo

PS: Unfortunately there is no easy channel (email) to report bugs to the SF team! So I have to do it here and hope for the best.

IWB · Post by **IWB** » Mon Nov 02, 2015 8:08 am

bob wrote: The best idea is to simply test the way real games are played. Every tournament known to man has two clocks and pondering is a part of the game. You could make the same argument for max_threads = 1 that you do for ponder = off. But you will produce a skewed result compared to real tournament games...

Exactly my point. Ponder off was a compromise for engine games years ago, when resources were limited (which is no problem nowadays), but it is not the normal way to play chess. That is why I consider Ponder OFF games a subgroup of "real", "full" or "name it as you want" chess.

Bye
Ingo

IWB · Post by **IWB** » Mon Nov 02, 2015 8:20 am

I know it would belong in the other section, but I don't want to start another thread:

You are very close to your goal for a new release. If the current SF were a final release, the TOP of my list (if someone cares) would look like this:

   1 SF151031                     :   3229     11   2771.5    3520   78.7%      98
   2 Komodo 9.2                   :   3214     11   3281.5    4180   78.5%     100
   3 Komodo 9                     :   3190     10   3120.0    3960   78.8%      89
   4 Stockfish 6                  :   3182      9   3951.5    5280   74.8%     100
   5 Stockfish 5s                 :   3141      8   3972.5    5280   75.2%      53
   6 Komodo 8                     :   3141      9   3789.5    5060   74.9%     100
   7 Houdini 4                    :   3127      6   7639.5   10590   72.1%     100
   8 Komodo 7a                    :   3104      9   2798.5    3960   70.7%      79
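To put the gaps in the list into perspective, a rating difference can be converted into an expected score with the standard logistic Elo formula. This sketch assumes that the first numeric column of the list is an Elo rating, which the list itself does not label. The function name `expected_score` is made up for the example.

```python
def expected_score(elo_diff):
    """Expected score of the higher-rated side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


# Ratings taken from the first two rows of the list above.
top, second = 3229, 3214
print(f"expected score for a {top - second} Elo lead: "
      f"{expected_score(top - second):.3f}")
```

A lead of this size translates into an expected score only slightly above 50%, and it sits close to the second numeric column (11 for both engines), which would be the error margin if that column is read the usual way.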

Bye
Ingo

PS: I hope you will fix the bug prior to the release!

Ozymandias · Post by **Ozymandias** » Mon Nov 02, 2015 10:30 am

bob wrote:
Ozymandias wrote:
Aser Huerga wrote:Ponder ON = since engines have no psychological capabilities and no foresight, ponder on only introduces some degree of random bias. It makes no sense to emulate human conditions; engines aren't humans.
Ponder OFF = a more precise and reproducible way of testing engines

That is my honest opinion.
This does not mean Ingo's work is invalid, but I think it would be more precise with ponder off, even with the same number of games (ponder off allows twice the number of games to be played in the same period of time, hence more accurate results).
I was hoping for some kind of insight into Ponder ON testing. What you say is basically what I thought, but maybe extensive testing has revealed that some engines benefit more than others because they guess the move played more often. It'd be interesting to see those stats.
The best idea is to simply test the way real games are played. Every tournament known to man has two clocks and pondering is a part of the game. You could make the same argument for max_threads = 1 that you do for ponder = off. But you will produce a skewed result compared to real tournament games…

I will ask you the same question, then: how would you label Ponder=Off games? Unreal? I'm really curious.

IWB · Post by **IWB** » Mon Nov 02, 2015 10:35 am

Ozymandias wrote:
bob wrote:
Ozymandias wrote:
Aser Huerga wrote:Ponder ON = since engines have no psychological capabilities and no foresight, ponder on only introduces some degree of random bias. It makes no sense to emulate human conditions; engines aren't humans.
Ponder OFF = a more precise and reproducible way of testing engines

That is my honest opinion.
This does not mean Ingo's work is invalid, but I think it would be more precise with ponder off, even with the same number of games (ponder off allows twice the number of games to be played in the same period of time, hence more accurate results).
I was hoping for some kind of insight into Ponder ON testing. What you say is basically what I thought, but maybe extensive testing has revealed that some engines benefit more than others because they guess the move played more often. It'd be interesting to see those stats.
The best idea is to simply test the way real games are played. Every tournament known to man has two clocks and pondering is a part of the game. You could make the same argument for max_threads = 1 that you do for ponder = off. But you will produce a skewed result compared to real tournament games…
I will ask you the same question, then: how would you label Ponder=Off games? Unreal? I'm really curious.

I don't see the point in insisting on a name for it other than to distinguish it.
I don't mind the label/name, but it is not played the way chess is intended to be played! It is a subgroup of real/normal chess, as it was invented to make use of limited resources.
Again, if you like it - fine. It's just not my type of game.
