18 days from SF4 release and about ~30+ ELO gain!

shrapnel · Post by **shrapnel** » Thu Sep 12, 2013 8:05 am

GenoM wrote:
mcostalba wrote:
Houdini wrote:
Marco, you sorry? Why? I can't understand... You're sharing all your ideas (if not code) with all commercial authors, right? That's more than enough for not feeling any remorse for that what you're doing.
Just my 2 cents.
Perhaps you never will...but let me try.
Let's say 'A' makes an income from developing and selling commercial chess programs. He works hard to do this and so has a right to make profits from his efforts. (Not to mention that a large percentage of his profits is eaten into by pirates.)
Now, enter Mr 'B', the philanthropist. He develops and publishes free chess software, simply out of love for the game of chess, which is indeed commendable. Now, the unforeseen happens! The free program becomes stronger than the commercial program. People stop buying A's commercial program... big loss for A!
Now, if B was actually selling his program, he would indeed have nothing to apologize for, following the principle of 'All is fair in love and war' or 'Survival of the fittest'!
But since B is actually gaining NO material benefit from his superior program and is causing a loss to 'A' to boot, for no good financial reason, he feels morally obliged to apologize to 'A'.
THIS is what Marco feels and that is why he is apologizing to Robert Houdart!

Eelco de Groot · Post by **Eelco de Groot** » Thu Sep 12, 2013 8:25 am

gladius wrote:
lkaufman wrote:
Masta wrote:Yeah...seems that SF will run over other engines like a damn TRUCK!

18 days from release date of SF4 and almost +30 ELO gain. -> http://95.47.140.100/tests/view/522bcb1 ... 2ee68dc04a

Have a nice day yo false magicians. Your days are counted.
Since I found this hard to believe, I ran a similar test myself (SF Sept. 8 vs SF4). While the details differ slightly (book, exact time limit, hardware) the test was quite similar. My result showed a gain of just 11.5 elo. The difference is too large to attribute to sample error. Any other theories?
What were your testing conditions (time control, threads, # of games)? I'm assuming it's 11.5 elo +- some error bar .

SF4 release version has a few changes that can influence self tests, the TT is not cleared between games, and Idle threads sleep is set to false, but that only affects matches with threads > 1. For this reason, our regression tests are performed against the non-release version.

Otherwise, I'm not really sure to be honest.

The "30 Elo" is probably a bit too much, because the latest regression test shows about 26 Elo, although some new patches are in this test. The time control may be different, but that should not have such a big impact on Larry's test. The 8moves_GM.pgn from Adam Hair is public; Adam recently gave a link in this forum again, and the experiences with it have been good, at least better than with variety.bin as a testing book... In case Larry uses an actual book instead of fixed openings, not clearing the TT between games would introduce some noise, but I believe that is not done in the non-release Stockfish 4? If Larry still measures this big discrepancy after switching to the non-release SF 4 and using the same openings, it must be something in the testing conditions that differs too much between the framework and Larry's tests?

Eelco

Uri Blass · Post by **Uri Blass** » Thu Sep 12, 2013 9:36 am

Eelco de Groot wrote:
gladius wrote:
lkaufman wrote:
Masta wrote:Yeah...seems that SF will run over other engines like a damn TRUCK!

18 days from release date of SF4 and almost +30 ELO gain. -> http://95.47.140.100/tests/view/522bcb1 ... 2ee68dc04a

Have a nice day yo false magicians. Your days are counted.
Since I found this hard to believe, I ran a similar test myself (SF Sept. 8 vs SF4). While the details differ slightly (book, exact time limit, hardware) the test was quite similar. My result showed a gain of just 11.5 elo. The difference is too large to attribute to sample error. Any other theories?
What were your testing conditions (time control, threads, # of games)? I'm assuming it's 11.5 elo +- some error bar .

SF4 release version has a few changes that can influence self tests, the TT is not cleared between games, and Idle threads sleep is set to false, but that only affects matches with threads > 1. For this reason, our regression tests are performed against the non-release version.

Otherwise, I'm not really sure to be honest.
The "30 Elo" is probably a bit too much, because the latest regression test shows about 26 Elo although some new patches are in this test. The timecontrol is maybe different, but that should not have such a big impact on Larry's test. The 8moves_GM.pgn from Adam Hair is public, Adam has given a link recently in this forum again and the experiences with it have been good at least better than with variety.bin as a testing book... In case Larry uses an actual book instead of fixed openings not clearing TT between games would introduce some noise but I believe that is not done in the non-release Stockfish 4? If there is still this big discrepancy that Larry measures, after a switch to the non release SF 4 and using same openings, it must be something in the testing conditions that differs too much between the framework and Larry's tests?

Eelco

The latest regression test includes some patches that are probably a regression, so let's wait for the next regression test.

lkaufman · Post by **lkaufman** » Thu Sep 12, 2013 4:15 pm

gladius wrote:
lkaufman wrote:
Masta wrote:Yeah...seems that SF will run over other engines like a damn TRUCK!

18 days from release date of SF4 and almost +30 ELO gain. -> http://95.47.140.100/tests/view/522bcb1 ... 2ee68dc04a

Have a nice day yo false magicians. Your days are counted.
Since I found this hard to believe, I ran a similar test myself (SF Sept. 8 vs SF4). While the details differ slightly (book, exact time limit, hardware) the test was quite similar. My result showed a gain of just 11.5 elo. The difference is too large to attribute to sample error. Any other theories?
What were your testing conditions (time control, threads, # of games)? I'm assuming it's 11.5 elo +- some error bar .

SF4 release version has a few changes that can influence self tests, the TT is not cleared between games, and Idle threads sleep is set to false, but that only affects matches with threads > 1. For this reason, our regression tests are performed against the non-release version.

Otherwise, I'm not really sure to be honest.

Time limit was 2' + 1.2" for about half the games and 30" + .3" on the other half (on faster hardware), so on average about like yours. Number of games was something like 6 or 7 thousand (I forget exact number and I don't have it handy right now), so error bar was somewhere around 4 elo I think. Does clearing TT make a measurable difference in these direct matches? Any other settings or factors that could explain the discrepancy? I used default settings for both versions.

lkaufman · Post by **lkaufman** » Thu Sep 12, 2013 7:31 pm

Uri Blass wrote:
Eelco de Groot wrote:
gladius wrote: The "30 Elo" is probably a bit too much, because the latest regression test shows about 26 Elo although some new patches are in this test. The timecontrol is maybe different, but that should not have such a big impact on Larry's test. The 8moves_GM.pgn from Adam Hair is public, Adam has given a link recently in this forum again and the experiences with it have been good at least better than with variety.bin as a testing book... In case Larry uses an actual book instead of fixed openings not clearing TT between games would introduce some noise but I believe that is not done in the non-release Stockfish 4? If there is still this big discrepancy that Larry measures, after a switch to the non release SF 4 and using same openings, it must be something in the testing conditions that differs too much between the framework and Larry's tests?

Eelco

How many positions are in the Adam Hair book you mention? Since the test shows 20,000 games, it should be at least 10,000 to avoid possible duplicated games; is it that big? I use our own set of over 35,000 opening positions, enough for 70k games. If the book used in the regression test was much smaller than 10k, this might mean that the true error margin was much larger than the reported one.

The latest regression test includes some patches that are probably a regression, so let's wait for the next regression test.

Ajedrecista · Post by **Ajedrecista** » Thu Sep 12, 2013 7:50 pm

Hello Larry:

lkaufman wrote:Number of games was something like 6 or 7 thousand (I forget exact number and I don't have it handy right now), so error bar was somewhere around 4 elo I think.

Are you sure? I think that this error bar of circa ± 4 Elo for 6000 or 7000 games corresponds to a one-sigma confidence level, that is, a ~ 68.27% confidence level. Since we are accustomed to the 95% confidence level (~ 1.96 sigma), and an Elo gap of 11.5 Elo translates into a score of 51.7%-48.3% (near 50%-50%), the error bars for 95% confidence are (to a first approximation) 1.96*(± 4), that is, around ± 8 Elo (from ± 7 to ± 9, because the original ± 4 could be ± 3.6 or ± 4.4 Elo). Please confirm my reasoning. Thanks in advance.

Regards from Spain.

Ajedrecista.
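
The arithmetic in this post can be checked with a few lines of Python. This is only a minimal sketch under the usual logistic Elo model; the function names `elo_to_score` and `scale_error_bar` are made up for illustration and do not come from any tool mentioned in the thread.

```python
def elo_to_score(elo_diff: float) -> float:
    """Expected score for a given Elo difference (logistic Elo model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def scale_error_bar(one_sigma: float, z: float = 1.96) -> float:
    """Turn a one-sigma error bar into a z-sigma one (first approximation)."""
    return z * one_sigma


if __name__ == "__main__":
    # An 11.5 Elo gap corresponds to a score only slightly above 50%.
    print(f"score for +11.5 Elo: {elo_to_score(11.5):.1%}")

    # One-sigma bars of 3.6, 4.0 and 4.4 Elo scaled to ~95% confidence.
    for one_sigma in (3.6, 4.0, 4.4):
        print(f"+-{one_sigma} (1 sigma) -> +-{scale_error_bar(one_sigma):.2f} (1.96 sigma)")
```

The linear scaling by 1.96 is reasonable here only because the score is close to 50%, where the Elo curve is nearly a straight line.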

gladius · Post by **gladius** » Thu Sep 12, 2013 8:22 pm

lkaufman wrote:
gladius wrote:
lkaufman wrote:
Masta wrote:Yeah...seems that SF will run over other engines like a damn TRUCK!

18 days from release date of SF4 and almost +30 ELO gain. -> http://95.47.140.100/tests/view/522bcb1 ... 2ee68dc04a

Have a nice day yo false magicians. Your days are counted.
Since I found this hard to believe, I ran a similar test myself (SF Sept. 8 vs SF4). While the details differ slightly (book, exact time limit, hardware) the test was quite similar. My result showed a gain of just 11.5 elo. The difference is too large to attribute to sample error. Any other theories?
What were your testing conditions (time control, threads, # of games)? I'm assuming it's 11.5 elo +- some error bar .

SF4 release version has a few changes that can influence self tests, the TT is not cleared between games, and Idle threads sleep is set to false, but that only affects matches with threads > 1. For this reason, our regression tests are performed against the non-release version.

Otherwise, I'm not really sure to be honest.
Time limit was 2' + 1.2" for about half the games and 30" + .3" on the other half (on faster hardware), so on average about like yours. Number of games was something like 6 or 7 thousand (I forget exact number and I don't have it handy right now), so error bar was somewhere around 4 elo I think. Does clearing TT make a measurable difference in these direct matches? Any other settings or factors that could explain the discrepancy? I used default settings for both versions.

With 7000 games the 95% error bar is 8 Elo or so, so it's entirely possible this was just an unlucky run.

The PGN has 48,491 games, so we should be okay there.

Uri Blass · Post by **Uri Blass** » Fri Sep 13, 2013 1:16 am

gladius wrote:
lkaufman wrote:
gladius wrote:
lkaufman wrote:
Masta wrote:Yeah...seems that SF will run over other engines like a damn TRUCK!

18 days from release date of SF4 and almost +30 ELO gain. -> http://95.47.140.100/tests/view/522bcb1 ... 2ee68dc04a

Have a nice day yo false magicians. Your days are counted.
Since I found this hard to believe, I ran a similar test myself (SF Sept. 8 vs SF4). While the details differ slightly (book, exact time limit, hardware) the test was quite similar. My result showed a gain of just 11.5 elo. The difference is too large to attribute to sample error. Any other theories?
What were your testing conditions (time control, threads, # of games)? I'm assuming it's 11.5 elo +- some error bar .

SF4 release version has a few changes that can influence self tests, the TT is not cleared between games, and Idle threads sleep is set to false, but that only affects matches with threads > 1. For this reason, our regression tests are performed against the non-release version.

Otherwise, I'm not really sure to be honest.
Time limit was 2' + 1.2" for about half the games and 30" + .3" on the other half (on faster hardware), so on average about like yours. Number of games was something like 6 or 7 thousand (I forget exact number and I don't have it handy right now), so error bar was somewhere around 4 elo I think. Does clearing TT make a measurable difference in these direct matches? Any other settings or factors that could explain the discrepancy? I used default settings for both versions.
7000 games is 95% error bar of 8 ELO or so, it's entirely possible this was just an unlucky run.

The PGN has 48,491 games, so we should be okay there.

I do not see how you get an error bar of 8 Elo for 7000 games; I think that it is 4-5 Elo.

You have a 2.8 Elo error bar after 20,000 games; see for example the regression test of the latest Stockfish:

http://tests.stockfishchess.org/tests/v ... 63f25cba49

You should have 2.8*sqrt(20,000/7000) after 7000 games, which is between 4 Elo and 5 Elo.
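
The square-root scaling used here is easy to reproduce. The snippet below is a minimal sketch that assumes both tests have a similar draw rate and a score near 50%, so that the error bar shrinks with the square root of the number of games; `rescale_error_bar` is a hypothetical helper name, not part of fishtest.

```python
import math


def rescale_error_bar(known_bar: float, known_games: int, target_games: int) -> float:
    """Scale an error bar measured over known_games to a test of target_games.

    Assumes the per-game variance is the same in both tests, so the
    standard error is proportional to 1 / sqrt(number of games).
    """
    return known_bar * math.sqrt(known_games / target_games)


if __name__ == "__main__":
    # 2.8 Elo at 20,000 games, rescaled to the 6000 to 7000 games of the smaller test.
    for games in (6000, 7000):
        bar = rescale_error_bar(2.8, 20_000, games)
        print(f"{games} games: +-{bar:.2f} Elo")
```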

Uri Blass · Post by **Uri Blass** » Fri Sep 13, 2013 1:23 am

Ajedrecista wrote:Hello Larry:

lkaufman wrote:Number of games was something like 6 or 7 thousand (I forget exact number and I don't have it handy right now), so error bar was somewhere around 4 elo I think.
Sure? I think that this error bar of circa ± 4 Elo for 6000 or 7000 games corresponds for a one-sigma confidence level, that is, ~ 68.27% confidence level. Since we are accustomed to 95% confidence level ~ 1.96-sigma confidence level, and an Elo gap of 11.5 Elo translates into a score 51.7%-48.3% (near 50%-50%), then the error bars for 95% confidence are (in first approximation) 1.96*(± 4), that is, around ± 8 Elo (from ± 7 to ± 9 because the original ± 4 could be ± 3.6 or ± 4.4 Elo). Please confirm my thought. Thanks in advance.

Regards from Spain.

Ajedrecista.

After 20,000 games the error bar is 2.8 Elo with 95% confidence.

see http://tests.stockfishchess.org/tests/v ... 63f25cba49

It means that in the worst case of 6000 games the error bar is 2.8*sqrt(20,000/6000), which is near 5 Elo.

gladius · Post by **gladius** » Fri Sep 13, 2013 1:42 am

Uri Blass wrote:
Ajedrecista wrote:Hello Larry:

lkaufman wrote:Number of games was something like 6 or 7 thousand (I forget exact number and I don't have it handy right now), so error bar was somewhere around 4 elo I think.
Sure? I think that this error bar of circa ± 4 Elo for 6000 or 7000 games corresponds for a one-sigma confidence level, that is, ~ 68.27% confidence level. Since we are accustomed to 95% confidence level ~ 1.96-sigma confidence level, and an Elo gap of 11.5 Elo translates into a score 51.7%-48.3% (near 50%-50%), then the error bars for 95% confidence are (in first approximation) 1.96*(± 4), that is, around ± 8 Elo (from ± 7 to ± 9 because the original ± 4 could be ± 3.6 or ± 4.4 Elo). Please confirm my thought. Thanks in advance.

Regards from Spain.

Ajedrecista.
after 20,000 games the error bar is 2.8 elo with 95% confidence after 20,000 games

see http://tests.stockfishchess.org/tests/v ... 63f25cba49

It means that in the worst case of 6000 games the error bar is
2.8*sqrt(20,000/6000) that is near 5 elo.

I used my rating calculator http://forwardcoding.com/projects/ajaxchess/rating.html, assuming a 60% draw rate and a 10 Elo advantage. It gives this:

ELO: 9.93 +- 8.15
LOS: 99.99%
Wins: 1600 Losses: 1400 Draws: 4000

However, I just tried it with fishtest's stat_util.py https://github.com/glinscott/fishtest/b ... at_util.py and it gives
ELO: 11.56 +- 6.2
LOS: 99.99%

I would tend to trust stat_util.py more, but honestly I'm not sure.
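
For anyone who wants to cross-check figures like these by hand, here is a minimal sketch of one textbook way to get an Elo estimate and a 95% margin from win/loss/draw counts. It is an assumption-laden illustration, not the method used by the rating calculator or by stat_util.py: it treats each game as an independent result of 1, 0.5 or 0, applies a normal approximation to the mean score, and converts the interval endpoints to Elo with the logistic formula. The function names are hypothetical.

```python
import math


def elo_from_score(score: float) -> float:
    """Elo difference implied by an average score (logistic Elo model)."""
    return 400.0 * math.log10(score / (1.0 - score))


def elo_with_error_bar(wins: int, losses: int, draws: int, z: float = 1.96):
    """Return (elo, margin) from W/L/D counts using a normal approximation."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    # Per-game variance of a result that is 1 (win), 0.5 (draw) or 0 (loss):
    # E[x^2] - E[x]^2, where E[x^2] = P(win) + 0.25 * P(draw).
    variance = (wins + 0.25 * draws) / games - score ** 2
    stderr = math.sqrt(variance / games)
    low = elo_from_score(score - z * stderr)
    high = elo_from_score(score + z * stderr)
    # Report half the width of the Elo interval as the "+-" margin.
    return elo_from_score(score), (high - low) / 2.0


if __name__ == "__main__":
    elo, margin = elo_with_error_bar(wins=1600, losses=1400, draws=4000)
    print(f"ELO: {elo:.2f} +- {margin:.2f}")
```

The margin depends on how the per-game variance is modelled (in particular how draws are counted), which may be one reason two tools can report different error bars for the same counts.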
