Stockfish 4 running for the IPON
-
Laskos
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Stockfish 4 running for the IPON
Vinvin wrote:
From what I read, around +50 Elo over SF 3 ...
Laskos wrote:
At ultra-fast. Probably 25-30 points at blitz. Let's see.
Uri Blass wrote:
I see no reason to reduce everything that you get at ultra-fast time control, especially when the testing method of Stockfish discourages changes that do not scale well, so these changes have a smaller chance to be accepted.
After 130 games:
100.0 - 30.0 76.92% Perf=3046
and this performance of 70 Elo above Stockfish 3 does not include beating Junior 7-0.
From ultra-fast to blitz, ratings compress significantly. I do not expect the 50-60 points of improvement they get in ultra-fast self-play. But let's see.
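As a rough check on the 130-game IPON line quoted in this thread (100.0 - 30.0, 76.92%, Perf=3046), here is a minimal Python sketch assuming the standard logistic Elo model. IPON's own rating tool may compute performance differently, so the implied opponent average is only an approximation; the function names are illustrative.

```python
import math

def expected_score(elo_diff: float) -> float:
    """Logistic Elo model: expected score for a player rated elo_diff above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def elo_diff_from_score(score: float) -> float:
    """Inverse of expected_score: rating gap implied by a score fraction in (0, 1)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)

points, games = 100.0, 130          # "100.0 - 30.0" after 130 games
score = points / games
gap = elo_diff_from_score(score)

print(f"score = {score:.2%}")                          # score = 76.92%
print(f"implied gap = {gap:.0f} Elo")                  # implied gap = 209 Elo
print(f"implied average opponent = {3046 - gap:.0f}")  # implied average opponent = 2837
```

A 76.92% score thus corresponds to roughly +209 Elo over the field. The "70 Elo above Stockfish 3" figure is a comparison with Stockfish 3's rating on the same list, which this calculation does not reproduce.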
-
Uri Blass
- Posts: 11152
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: Stockfish 4 running for the IPON
Laskos wrote:
From ultra-fast to blitz, ratings compress significantly. I do not expect the 50-60 points of improvement they get in ultra-fast self-play. But let's see.
I think that it depends on the changes you make. Improvements in move ordering may give more Elo at longer time controls, and part of the changes in Stockfish are improvements in move ordering.
The improvement is now smaller, only 51 Elo, but still clearly more than your 25-30 Elo estimate.
I am not sure it is going to remain at 50-60 Elo, but I think it is going to end up closer to 50-60 than to 25-30 (in other words, an improvement of at least 42 Elo, which is more than ((50+60)/2 + (25+30)/2)/2).
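The 42 Elo cut-off is the average of the midpoints of the two ranges under discussion. A minimal Python check of that arithmetic (the variable names are illustrative):

```python
import math

ultra_fast_gain = (50, 60)   # gain reported in ultra-fast self-play
blitz_estimate = (25, 30)    # Laskos's blitz estimate

def midpoint(rng):
    lo, hi = rng
    return (lo + hi) / 2

threshold = (midpoint(ultra_fast_gain) + midpoint(blitz_estimate)) / 2
print(threshold)             # 41.25
print(math.ceil(threshold))  # 42, the smallest whole-Elo result above the midpoint
```

So the dividing line of the bet sits at 41.25 Elo, and 42 is the first whole number above it.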
-
Laskos
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Stockfish 4 running for the IPON
Uri Blass wrote:
I am not sure it is going to remain at 50-60 Elo, but I think it is going to end up closer to 50-60 than to 25-30 (in other words, an improvement of at least 42 Elo, which is more than ((50+60)/2 + (25+30)/2)/2).
OK, let's bet: you are betting on >42, I am on <42.
-
Uri Blass
- Posts: 11152
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: Stockfish 4 running for the IPON
I guessed >=42, but I expect to win even with >42 (though of course without much confidence).
-
Eelco de Groot
- Posts: 4696
- Joined: Sun Mar 12, 2006 2:40 am
- Full name: Eelco de Groot
Re: Stockfish 4 running for the IPON
Uri, you are becoming something of a (betting) pool hustler. You should have bet some money. Or was that the plan, and Kai doesn't know it yet? Of course, I think nobody relies on ultra-fast blitz alone anymore after the experiences with the Stockfish framework. That is just the point: the minimum is 60" + 0.05" testing in Stage II. And I bet Houdini is doing something similar now. The difference is more that Stockfish does self-testing, while the other top programs do less of it. As for the time control, 1-minute games are still blitz, but on modern processors they suffer from significantly fewer horizon effects than the Stage I 15" + 0.05" games. Uri knows this very well, but he is like Paul Newman in that 1961 film (The Hustler). Before you know it, he will raise the stakes on you, Kai!
Eelco
Debugging is twice as hard as writing the code in the first
place. Therefore, if you write the code as cleverly as possible, you
are, by definition, not smart enough to debug it.
-- Brian W. Kernighan
-
gladius
- Posts: 568
- Joined: Tue Dec 12, 2006 10:10 am
- Full name: Gary Linscott
Re: Stockfish 4 running for the IPON
IWB wrote:
http://www.inwoba.de
According to the fora and the internal self-test, it might be a big improvement.
Have fun
Ingo
Thanks Ingo! Interesting that so far the rating seems to be determined by the draw ratio against engines other than Houdini/Komodo. The default contempt setting could be interesting here. Maybe for SF 4+.
-
Tennison
- Posts: 183
- Joined: Sat Nov 26, 2011 2:02 pm
Re: Stockfish 4 running for the IPON
Ingo,
Why are you using Zappa and not Booot for your tests?
-
IWB
- Posts: 1539
- Joined: Thu Mar 09, 2006 2:02 pm
Re: Stockfish 4 running for the IPON
Tennison wrote:
Ingo,
Why are you using Zappa and not Booot for your tests?
Oops, good point! I edited an older tournament to start this, and it seems that Booot slipped my mind, as I had used Zappa for years!
I will start the match Stockfish vs Booot right now and include it later.
Thx for the remark
Ingo
-
Laskos
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Stockfish 4 running for the IPON
Eelco de Groot wrote:
Uri knows this very well, but he is like Paul Newman in that 1961 film. Before you know it, he will raise the stakes on you, Kai!
It seems it's me who should have raised the stakes.
-
lkaufman
- Posts: 6284
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
- Full name: Larry Kaufman
Re: Stockfish 4 running for the IPON
At first it seems we have a bit of a mystery, in that Stockfish 4 beats Houdini 3 in your test but does worse against almost every other opponent. It could, of course, just be sampling error, but more generally, many tests seem to report recent SF versions as quite competitive with Houdini 3 at non-blitz levels, yet SF4 will probably rate below Houdini 3 on all the rating lists. I think the explanation is "contempt". Because Houdini 3 uses a super-high contempt while SF4 uses none, this handicaps Houdini 3 when played directly against SF, but boosts its rating compared to SF when they both play lesser engines, since Houdini avoids draws that SF would accept. Komodo is in between, using contempt but much less than Houdini. Personally, I would prefer that all testing be done with zero contempt, but I don't think that is even an option in Houdini 3, so it's not a practical proposal. Anyway, I think it means that Stockfish and Komodo have less of a gap to close than the rating lists suggest, but more than direct matches imply.
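Contempt is commonly described as an engine scoring a draw as slightly negative for itself, so it keeps playing instead of accepting a repetition against a weaker opponent. The toy Python sketch below illustrates only that idea; the function names, centipawn values and the simple "draw = -contempt" rule are assumptions, not how Houdini 3, Komodo or Stockfish actually implement it (real engines also apply the sign relative to the side that started the search).

```python
def search_value(eval_cp: int, is_draw: bool, contempt_cp: int) -> int:
    """Value of a line from the engine's own point of view.

    Hypothetical simplification: a draw is scored as -contempt_cp instead of 0,
    so a positive contempt makes draws look slightly bad for the engine.
    """
    return -contempt_cp if is_draw else eval_cp

def choose(candidates, contempt_cp):
    """Pick the candidate (name, eval_cp, is_draw) with the best search value."""
    return max(candidates, key=lambda c: search_value(c[1], c[2], contempt_cp))[0]

# Toy position: repeat moves for a draw, or play on in a slightly worse position.
candidates = [("repeat", 0, True), ("play_on", -15, False)]
print(choose(candidates, contempt_cp=0))   # repeat
print(choose(candidates, contempt_cp=30))  # play_on
```

With zero contempt the draw is preferred; with 30 cp of contempt the engine plays on at -15 cp. Against a weaker opponent, that trade of draws for decisive games is consistent with the rating boost described above.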