Stockfish 2.3.1 weaker than 2.2.2?

gladius
Posts: 568
Joined: Tue Dec 12, 2006 10:10 am
Full name: Gary Linscott

Re: Stockfish 2.3.1 weaker than 2.2.2?

Post by gladius »

zamar wrote:
gladius wrote: Yes, the progress on Stockfish has been great! However, with each change we made for 2.3.1, things looked quite positive. The sum of all those changes seems to be that 2.3.1 is about equal, and maybe a bit stronger. So, something is definitely amiss.
Not necessarily. Two things must be kept in mind:

- Self-play always exaggerates things. Against other engines the actual change is about 1/2 of the improvement measured in self-play.

- Selection bias. You cannot sum up Elo gains from separate tests; the result will be too high.
Yes, excellent points. However, given the patch tests, even if only one of the improvements holds up, 2.3.1 should be +4-5 Elo. And it certainly does seem to fare pretty well against 2.2.2, but in the wider world of engines it doesn't do as well. Perhaps the tests against other engines will not show much, but they could be illuminating. There are only 4 changes to test, so it won't take too long either.
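To make the selection-bias point concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration, not Stockfish test data: the spread of true patch gains (`true_sd`), the measurement noise (`noise_sd`), the acceptance threshold (`accept_above`), and the function name itself are hypothetical.

```python
import random

def simulate_selection_bias(n_patches=2000, true_sd=2.0, noise_sd=5.0,
                            accept_above=3.0, seed=1):
    """Accept patches whose *measured* gain clears a threshold, then
    compare the summed measured gains with the summed true gains."""
    rng = random.Random(seed)
    accepted = 0
    measured_total = 0.0
    true_total = 0.0
    for _ in range(n_patches):
        true_gain = rng.gauss(0.0, true_sd)                    # real effect (hypothetical)
        measured_gain = true_gain + rng.gauss(0.0, noise_sd)   # noisy test result
        if measured_gain > accept_above:                       # only "winners" are kept
            accepted += 1
            measured_total += measured_gain
            true_total += true_gain
    return accepted, measured_total, true_total

accepted, measured_total, true_total = simulate_selection_bias()
print(f"accepted {accepted} patches")
print(f"sum of measured gains: {measured_total:.0f} Elo")
print(f"sum of true gains:     {true_total:.0f} Elo")
```

Because a patch is more likely to be accepted when its noise happened to be positive, the measured total comes out above the true total, which is why adding up the Elo from separate patch tests overstates the combined gain.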
Peter Skinner
Posts: 1763
Joined: Sun Feb 26, 2006 1:49 pm
Location: Edmonton, Alberta, Canada
Full name: Peter Skinner

Re: Stockfish 2.3.1 weaker than 2.2.2?

Post by Peter Skinner »

One thing I have noticed with version 2.3.1, is that it draws or tries to draw extremely early in games as black. If it can take a repetition draw, it will.

Version 2.2.2 didn't seem to have this behavior.

Peter
I was kicked out of Chapters because I moved all the Bibles to the fiction section.
gladius
Posts: 568
Joined: Tue Dec 12, 2006 10:10 am
Full name: Gary Linscott

Re: Stockfish 2.3.1 weaker than 2.2.2?

Post by gladius »

Peter Skinner wrote:One thing I have noticed with version 2.3.1, is that it draws or tries to draw extremely early in games as black. If it can take a repetition draw, it will.

Version 2.2.2 didn't seem to have this behavior.

Peter
This could be because of evaluation changes. Do you have an example? Thanks.
Peter Skinner
Posts: 1763
Joined: Sun Feb 26, 2006 1:49 pm
Location: Edmonton, Alberta, Canada
Full name: Peter Skinner

Re: Stockfish 2.3.1 weaker than 2.2.2?

Post by Peter Skinner »

gladius wrote:
Peter Skinner wrote:One thing I have noticed with version 2.3.1, is that it draws or tries to draw extremely early in games as black. If it can take a repetition draw, it will.

Version 2.2.2 didn't seem to have this behavior.

Peter
This could be because of evaluation changes. Do you have an example? Thanks.

[Event "FICS rated lightning game"]
[Site "FICS, San Jose, California USA"]
[Date "2012.09.28"]
[Time "12:55:00"]
[Round "-"]
[White "TogaRouter"]
[Black "Webkikr"]
[WhiteElo "2211"]
[BlackElo "3016"]
[TimeControl "60+0"]
[Mode "ICS"]
[Result "1/2-1/2"]

1. d4 Nf6 2. c4 e6 3. Nc3 d5 4. cxd5 exd5 5. Bg5 c6 6. Qc2 Bd6 7. e3 O-O 8. Nf3 h6 9. Bh4 Re8 10. Bg3 Bg4 11. Qb3 Bxg3 12. hxg3 Qc7 13. Bd3 Nbd7 14. Kf1 Qb6 15. Qa3 a5 16. Rh4 Qb4 17. Qb3 Qxb3 18. axb3 g6 19. Ke2 Kg7 20. Kd2 Bh5 21. Ke2 Reb8 22. Kd2 g5 23. Rhh1 Bg6 24. Bxg6 Kxg6 25. Ra3 Ng4 26. Ke2 b5 27. Nb1 c5 28. Ra1 b4 29. Nbd2 Rc8 30. Rhe1 cxd4 31. Nxd4 Nc5 32. Rec1 Ne5 33. Ra2 Nb7 34. Raa1 Nc5 35. Ra2 Nb7 36. Raa1 Nc5 {Game drawn by repetition} 1/2-1/2

[Event "FICS rated lightning game"]
[Site "FICS, San Jose, California USA"]
[Date "2012.09.28"]
[Time "12:59:27"]
[Round "-"]
[White "TogaRouter"]
[Black "Webkikr"]
[WhiteElo "2219"]
[BlackElo "2980"]
[TimeControl "60+0"]
[Mode "ICS"]
[Result "1/2-1/2"]

1. c4 e5 2. Nc3 Nf6 3. Nf3 Nc6 4. a3 d6 5. e3 e4 6. Ng5 Bf5 7. Qb3 Ne5 8. Qxb7
Rb8 9. Qxa7 Ra8 10. Qb7 Rb8 11. Qa7 Ra8 12. Qb7 Rb8 13. Qa7 {Game drawn by repetition} 1/2-1/2
Neither of these draws is a book draw, yet Stockfish was more than happy to accept them.

When I looked at the games with Houdini and Crafty, both engines would have played to avoid the repetition.

Peter
Peter Skinner
Posts: 1763
Joined: Sun Feb 26, 2006 1:49 pm
Location: Edmonton, Alberta, Canada
Full name: Peter Skinner

Re: Stockfish 2.3.1 weaker than 2.2.2?

Post by Peter Skinner »

Does anyone have a link to the JA compiles of Stockfish 2.2.2?

For some ungodly reason I just deleted my folder of the engine..

Peter
Graham Banks
Posts: 41412
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

Re: Stockfish 2.3.1 weaker than 2.2.2?

Post by Graham Banks »

Peter Skinner wrote:Does anyone have a link to the JA compiles of Stockfish 2.2.2?

For some ungodly reason I just deleted my folder of the engine..

Peter
I'll put them in the CCRL public forum for you.
http://kirill-kryukov.com/chess/discuss ... p?id=27587
gbanksnz at gmail.com
Eelco de Groot
Posts: 4561
Joined: Sun Mar 12, 2006 2:40 am

Re: Stockfish 2.3.1 weaker than 2.2.2?

Post by Eelco de Groot »

There is an interesting discussion on GitHub right now about some possible further code changes. Ryan Taker has added some code to Stockfish 2.3 to take the 50-move rule into account in the evaluation. If I understand it correctly, scores are scaled towards a draw as the fifty-move limit approaches, so Stockfish can see earlier when a good eval is actually threatening to slip into a forced draw. However, Ryan had trouble actually proving it works under Arena. His results were good, but unfortunately they were too good to be believed :)
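For readers who want a picture of the idea, here is a rough sketch of scaling an evaluation towards a draw as the fifty-move counter runs up. It is not Ryan's code: his patch reportedly uses a table in evaluate.cpp, while this sketch assumes a simple linear ramp, and the function name and the `max_clock` parameter are made up for illustration.

```python
def scale_for_fifty_move_rule(score_cp, halfmove_clock, max_clock=100):
    """Shrink a centipawn score linearly towards 0 (a draw) as the
    halfmove clock approaches the fifty-move limit (100 plies).

    Assumption: linear scaling. A real engine might use a lookup table
    or only start scaling after some number of plies.
    """
    clock = min(max(halfmove_clock, 0), max_clock)   # clamp to [0, max_clock]
    remaining = max_clock - clock
    # int() truncates towards zero, so positive and negative scores
    # are treated symmetrically.
    return int(score_cp * remaining / max_clock)

for clock in (0, 50, 90, 100):
    print(clock, scale_for_fifty_move_rule(200, clock))
```

With a ramp like this, a +2.00 advantage that has made no progress for 45 moves is reported as only a small plus, so the search has a reason to prefer lines with a pawn move or capture that resets the counter.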

An added problem is that GitHub was confused by the added whitespace in Ryan's source. I suspect this comes from using the Linux sources of Stockfish under Windows, and GitHub is not totally Windows-friendly either. I can't post any of my own modified Stockfish sources on this forum after adding even a few lines with MSVC 2005 under Windows, because the added CR/LF line endings confuse the forum's code functions. I have to re-edit every line, and even then I usually don't get it 100% right. Big nuisance. The main point, though, is that Ryan would like help from people willing to do accurate testing, and I think the object is to get some useful statistical tests, so please, no volunteers to test his code changes on Playchess, with a private book or whatever. At this point we need more accurate Elo measurements.

I added Ryan's changes to the Stockfish Barracuda code just to see if they compile, and that did not give any problems. Only two files are changed, with one table added to evaluate.cpp that is used by one added pointer, I believe. The added carriage returns, however, confused GitHub into thinking every file was changed...
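As an aside on the carriage-return problem, here is a small sketch of how one might check a source file's contents for CR/LF endings and normalise them before diffing or posting. The helper names are hypothetical, and this only illustrates the idea; it is not what GitHub or the forum software does internally.

```python
def count_crlf_lines(data: bytes) -> int:
    """Count lines that end in CR/LF (Windows-style) in raw file bytes."""
    return data.count(b"\r\n")

def normalize_line_endings(text: str) -> str:
    """Convert CR/LF and bare CR line endings to plain LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")

# A file saved on Windows differs from the original on every line,
# even though only one line was really edited.
original = "int a;\nint b;\nint c;\n"
edited_on_windows = "int a;\r\nint b = 1;\r\nint c;\r\n"

raw_diff = sum(x != y for x, y in
               zip(original.splitlines(True), edited_on_windows.splitlines(True)))
clean_diff = sum(x != y for x, y in
                 zip(original.splitlines(True),
                     normalize_line_endings(edited_on_windows).splitlines(True)))
print(raw_diff, clean_diff)
```

After normalising, only the line that was actually edited shows up as changed, which is the behaviour one would want from a diff of Ryan's two files.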

I even started a little test with the Engine Research Tool from Lokasoft. I did not pick the best version of ERT and I had not used ERT for a long time, so maybe I did not pick the best settings for it. ERT is also not really suitable for very fast testing I suppose, not like Cutechess, so I just set it to a time control of two minutes per player with a Fischer bonus of ten seconds per move. I am not going to get significant results this way anytime soon, I know, but maybe other people have a few cores available to also do some testing? I am seeing a lot of draws so far, but no big problems with the '50 Moverule' version.

At the moment the match is like this, running on an Athlon with just one processor available, so also no pondering, 64 MB hash per engine:

                                       W   D   L
Stockfish 'Barracuda' No  50Moverule   0   9   2
Stockfish 'Barracuda'     50Moverule   2   9   0
Regards, Eelco
Debugging is twice as hard as writing the code in the first
place. Therefore, if you write the code as cleverly as possible, you
are, by definition, not smart enough to debug it.
-- Brian W. Kernighan
Eelco de Groot
Posts: 4561
Joined: Sun Mar 12, 2006 2:40 am

Re: Stockfish 2.3.1 weaker than 2.2.2?

Post by Eelco de Groot »

After a twenty-game match, the 50 Moverule version was still a bit in the lead but no longer unbeaten:

                                       W    D    L
Stockfish 'Barracuda' No  50Moverule   2    15   3      47.5%
Stockfish 'Barracuda'     50Moverule   3    15   2      52.5%
There were indeed quite a few threefold repetition draws in the Nunn match, which also means some very short drawn games of about twenty moves. I think this is a drawback of testing versions that are very closely related, and of choosing very balanced opening positions.
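To put the 52.5% in perspective, here is a small sketch that converts a match score into an Elo difference with a rough 95% interval. It assumes the usual logistic Elo formula and a normal approximation for the error of the mean score, which is crude for only twenty games; the function names are my own.

```python
import math

def elo_from_score(score):
    """Elo difference implied by an expected score (logistic model)."""
    return 400.0 * math.log10(score / (1.0 - score))

def match_summary(wins, draws, losses, z=1.96):
    """Return (elo, elo_low, elo_high) for a W/D/L result.

    Assumption: normal approximation for the standard error of the
    mean score, using the per-game variance of win/draw/loss outcomes.
    """
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / games
    stderr = math.sqrt(variance / games)
    low, high = score - z * stderr, score + z * stderr
    return elo_from_score(score), elo_from_score(low), elo_from_score(high)

# Result of the 50 Moverule version after twenty games: +3 =15 -2
elo, elo_low, elo_high = match_summary(wins=3, draws=15, losses=2)
print(f"{elo:+.1f} Elo, 95% interval roughly {elo_low:+.0f} to {elo_high:+.0f}")
```

The point estimate is a bit over +17 Elo, but the interval spans zero by a wide margin, so twenty games at this time control cannot separate the two versions.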

Eelco
Peter Skinner
Posts: 1763
Joined: Sun Feb 26, 2006 1:49 pm
Location: Edmonton, Alberta, Canada
Full name: Peter Skinner

Re: Stockfish 2.3.1 weaker than 2.2.2?

Post by Peter Skinner »

Graham Banks wrote:
Peter Skinner wrote:Does anyone have a link to the JA compiles of Stockfish 2.2.2?

For some ungodly reason I just deleted my folder of the engine..

Peter
I'll put them in the CCRL public forum for you.
http://kirill-kryukov.com/chess/discuss ... p?id=27587
Thanks Graham, but do you have the original archive? I need the linux binaries as well.

Peter
Graham Banks
Posts: 41412
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

Re: Stockfish 2.3.1 weaker than 2.2.2?

Post by Graham Banks »

Peter Skinner wrote:
Graham Banks wrote:
Peter Skinner wrote:Does anyone have a link to the JA compiles of Stockfish 2.2.2?

For some ungodly reason I just deleted my folder of the engine..

Peter
I'll put them in the CCRL public forum for you.
http://kirill-kryukov.com/chess/discuss ... p?id=27587
Thanks Graham, but do you have the original archive? I need the linux binaries as well.

Peter
Sorry. I have no need for the linux binaries, so I didn't extract them at the time. Perhaps somebody else will be kind enough to help you out.