Stockfish 4 running for the IPON

Discussion of computer chess matches and engine tournaments.

Moderator: Ras

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Stockfish 4 running for the IPON

Post by Laskos »

Uri Blass wrote:
Laskos wrote:
Vinvin wrote:From what I read, around +50 Elo over SF 3 ...
At ultra-fast. Probably 25-30 points at blitz. Let's see.
I see no reason to scale down everything that you get at ultra-fast time control,
especially when the testing method of Stockfish discourages changes that do not scale well, so such changes have a smaller chance of being accepted.

After 130 games
100.0 - 30.0 76.92% Perf=3046
and this performance, 70 Elo above Stockfish 3, does not include beating Junior 7-0.
From ultra-fast to blitz, rating differences compress significantly. I do not expect the 50-60 point improvement that they get in ultra-fast self-play. But let's see.
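To relate the figures quoted above (100.0 - 30.0 after 130 games, 76.92%) to an Elo difference, here is a minimal Python sketch assuming the standard logistic Elo model, where a score fraction p corresponds to d = -400 * log10(1/p - 1). It is only an illustration: the tool behind the rating list may compute "Perf" differently (draw modelling, error bars), so it is not meant to reproduce Perf=3046. The function names are hypothetical.

```python
import math


def score_fraction(points: float, games: int) -> float:
    """Fraction of available points scored (1 per win, 0.5 per draw)."""
    return points / games


def elo_diff_from_score(p: float) -> float:
    """Elo difference implied by score fraction p under the logistic model
    p = 1 / (1 + 10 ** (-d / 400)), solved for d."""
    if not 0.0 < p < 1.0:
        raise ValueError("score fraction must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / p - 1.0)


if __name__ == "__main__":
    p = score_fraction(100.0, 130)  # the 100.0 - 30.0 result after 130 games
    print(f"score = {p:.2%}")  # score = 76.92%
    print(f"Elo difference vs. average opponent = {elo_diff_from_score(p):.1f}")
```

Under this model a 76.92% score corresponds to roughly +209 Elo over the average opponent in that sample.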
Uri Blass
Posts: 11152
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Stockfish 4 running for the IPON

Post by Uri Blass »

Laskos wrote:
From ultra-fast to blitz, rating differences compress significantly. I do not expect the 50-60 point improvement that they get in ultra-fast self-play. But let's see.
I think it depends on the changes you make.
Improvements in move ordering may give more Elo at longer time controls, and part of the changes in Stockfish are move-ordering improvements.

The improvement is now smaller, only 51 Elo, but still clearly more than your 25-30 Elo estimate.

I am not sure it will stay at 50-60 Elo, but I think it will end up closer to 50-60 than to 25-30 (in other words, an improvement of at least 42 Elo, which is more than the midpoint ((50+60)/2 + (25+30)/2)/2 = 41.25).
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Stockfish 4 running for the IPON

Post by Laskos »

Uri Blass wrote:
I am not sure it will stay at 50-60 Elo, but I think it will end up closer to 50-60 than to 25-30 (in other words, an improvement of at least 42 Elo, which is more than the midpoint ((50+60)/2 + (25+30)/2)/2 = 41.25).
OK, let's bet: you are betting on >42, I am betting on <42.
Uri Blass
Posts: 11152
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Stockfish 4 running for the IPON

Post by Uri Blass »

I guessed >=42, but I expect to win even with >42 (though of course without much confidence).
Eelco de Groot
Posts: 4696
Joined: Sun Mar 12, 2006 2:40 am
Full name:   Eelco de Groot

Re: Stockfish 4 running for the IPON

Post by Eelco de Groot »

Uri, you are becoming something of a (betting) pool hustler. You should have bet some money. Or maybe that was the plan, and Kai doesn't know it yet. Of course, I think nobody relies on ultra-fast blitz alone anymore after the experience with the Stockfish framework. That is just the point. The minimum is 60" + 0.05" testing in Stage II. And I bet Houdini is doing something similar now. The difference is more that Stockfish does self-testing, while the other top programs do less of this. As for the time control, 1-minute games are still blitz, but on modern processors they suffer significantly fewer horizon effects than the Stage I games at 15" + 0.05". Uri knows this very well, but he is like Paul Newman in that 1961 film. Before you know it he will raise the stakes on you, Kai!

Eelco
Debugging is twice as hard as writing the code in the first
place. Therefore, if you write the code as cleverly as possible, you
are, by definition, not smart enough to debug it.
-- Brian W. Kernighan
gladius
Posts: 568
Joined: Tue Dec 12, 2006 10:10 am
Full name: Gary Linscott

Re: Stockfish 4 running for the IPON

Post by gladius »

IWB wrote:http://www.inwoba.de

According to the fora and the internal self test it might be a big improvement.

Have fun
Ingo
Thanks Ingo! It's interesting that so far the rating seems to be determined by the draw ratio against engines other than Houdini/Komodo. A default contempt setting could be interesting here. Maybe for SF 4+ :).
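The effect of the draw ratio on rating can be illustrated with a small Python sketch, again assuming the plain logistic Elo model. The win/draw/loss counts are hypothetical, not taken from the IPON games: two 100-game results with no losses against a weaker engine, differing only in how many games were drawn. The function names are illustrative.

```python
import math


def score_from_wdl(wins: int, draws: int, losses: int) -> float:
    """Score fraction: a win counts 1 point, a draw 0.5, a loss 0."""
    games = wins + draws + losses
    return (wins + 0.5 * draws) / games


def perf_elo_diff(wins: int, draws: int, losses: int) -> float:
    """Elo difference implied by the score under the logistic model.
    Requires a score strictly between 0 and 1."""
    p = score_from_wdl(wins, draws, losses)
    if not 0.0 < p < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / p - 1.0)


if __name__ == "__main__":
    # Hypothetical results: zero losses in both, different draw counts.
    for wins, draws in [(60, 40), (40, 60)]:
        p = score_from_wdl(wins, draws, 0)
        d = perf_elo_diff(wins, draws, 0)
        print(f"+{wins} ={draws} -0: score {p:.0%}, {d:+.0f} Elo")
```

In this toy comparison, turning 20 wins into draws lowers the implied performance from about +241 to about +147 Elo, even though no game was lost, which is consistent with the observation that drawing more against weaker engines pulls the rating down.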
Tennison
Posts: 183
Joined: Sat Nov 26, 2011 2:02 pm

Re: Stockfish 4 running for the IPON

Post by Tennison »

Ingo,

Why are you using Zappa and not Booot for your tests?
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: Stockfish 4 running for the IPON

Post by IWB »

Tennison wrote:Ingo,

Why are you using Zappa and not Booot for your tests?
Oops, good point! I edited an older tourney to start this, and it seems Booot slipped my mind since I had used Zappa for years!
I will start the Stockfish vs Booot match right now and include it later.

Thx for the remark
Ingo
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Stockfish 4 running for the IPON

Post by Laskos »

Eelco de Groot wrote:Uri, you are becoming something of a (betting) pool hustler. [...] Before you know it he will raise the stakes on you, Kai!
It seems it's me who should have raised the stakes.
lkaufman
Posts: 6284
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: Stockfish 4 running for the IPON

Post by lkaufman »

At first it seems we have a bit of a mystery: Stockfish 4 beats Houdini 3 in your test but does relatively worse against almost every other opponent. It could of course just be sampling error, but more generally, many tests seem to report recent SF versions as quite competitive with Houdini 3 at non-blitz levels, yet SF4 will probably rate below Houdini 3 on all the rating lists. I think the explanation is "contempt". Because Houdini 3 uses a super-high contempt while SF4 uses none, this handicaps Houdini 3 when it plays directly against SF but boosts its rating relative to SF when both play weaker engines. Komodo is in between, using contempt but much less than Houdini. Personally I would prefer that all testing be done with zero contempt, but I don't think that is even an option in Houdini 3, so it's not a practical proposal. Anyway, I think it means that Stockfish and Komodo have less of a gap to close than the rating lists suggest, but more than direct matches imply.
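The contempt mechanism described above can be sketched, in heavily simplified form, as an engine scoring a draw as slightly negative from its own point of view. The Python below is a toy model under that assumption; the function names and centipawn values are hypothetical, and it is not how Houdini, Komodo or Stockfish actually implement contempt.

```python
def draw_score(contempt_cp: int) -> int:
    """Value, in centipawns from the toy engine's own point of view, assigned
    to a drawn position. Positive contempt makes a draw look worse than 0.00."""
    return -contempt_cp


def prefers_playing_on(alternative_eval_cp: int, contempt_cp: int) -> bool:
    """True if the toy engine picks a non-drawing move evaluated at
    alternative_eval_cp over a move that forces a draw."""
    return alternative_eval_cp > draw_score(contempt_cp)


if __name__ == "__main__":
    # Hypothetical position: avoiding a repetition leaves us slightly worse (-0.20).
    alt = -20
    for contempt in (0, 50):
        choice = "plays on" if prefers_playing_on(alt, contempt) else "takes the draw"
        print(f"contempt {contempt:>2} cp: {choice}")
```

With zero contempt the toy engine accepts the draw; with 50 cp of contempt it plays on in a slightly worse position. Against weaker engines that extra risk can turn draws into wins, while against an equal opponent it can backfire, which is the asymmetry the post uses to explain the gap between rating lists and direct matches.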