Stockfish Handicap Test

Discussion of computer chess matches and engine tournaments.

Moderators: bob, hgm, Harvey Williamson

Adam Hair
Posts: 3205
Joined: Wed May 06, 2009 8:31 pm
Location: Fuquay-Varina, North Carolina

Stockfish Handicap Test

Post by Adam Hair » Wed Jan 17, 2018 4:34 pm

This is an attempt (using an outdated computer) to determine how much Stockfish may have been "handicapped" against AlphaZero.

Engine: Brainfish 140118
CPU: Xeon L5420 @ 2.5 GHz
Number of cores used per engine: 1
Adjudication: none

The generic Stockfish 8 Linux compile from abrok.eu averages about 1100 Kn/s with 1 core on this computer. In comparison, DeepMind reported that Stockfish 8 with 64 threads averaged 70,000 Kn/s in their test setup.
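To put the two node rates side by side, here is a rough sketch that uses only the figures quoted above. It is a raw node-rate ratio. It does not account for parallel search overhead at 64 threads, so the effective speedup of DeepMind's setup over this machine is likely smaller than the number it prints.

```python
# Node rates as stated in the post, in kilonodes per second (Kn/s).
SF8_ONE_CORE_KNPS = 1_100   # generic SF8 compile, 1 core, Xeon L5420 (approximate)
SF8_DEEPMIND_KNPS = 70_000  # DeepMind's reported SF8 figure at 64 threads


def speed_ratio(fast_knps: float, slow_knps: float) -> float:
    """How many times more nodes per second the faster setup searches."""
    return fast_knps / slow_knps


ratio = speed_ratio(SF8_DEEPMIND_KNPS, SF8_ONE_CORE_KNPS)
# Raw NPS ratio only; SMP search efficiency is not modelled here.
print(f"DeepMind setup: about {ratio:.0f}x the node rate of this test machine")
```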


Handicap conditions:
1 minute per move
32 MB hash
No book
No Syzygy tablebases

Full Strength:
40 moves in 40 minutes repeating
2048 MB hash
Cerebellum Light polyglot book
5-man Syzygy tablebases

Link to PGN: http://www.mediafire.com/file/aykwmy1ak ... dicap_Test

Code:


   # PLAYER           :  RATING  ERROR  PLAYED   (%)  CFS(%)    W    D    L  D(%)
   1 Full Strength    :    85.5   30.6     100    62     100   23   77    0    77
   2 Handicapped      :     0.0   ----     100    38     ---    0   77   23    77

White advantage = 71.44 +/- 16.55
Draw rate (equal opponents) = 100.00 % +/- 0.76
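As a cross-check on the 85.5 rating gap, the plain logistic Elo formula can be applied to Full Strength's raw score of 61.5% (23 wins and 77 draws in 100 games, shown rounded to 62 in the table). This is a minimal sketch, assuming the standard logistic model with no color or draw terms. The table itself looks like Ordo output (an assumption), and it also reports a fitted white advantage, which is likely why its figure comes out somewhat higher than the simple formula.

```python
import math


def elo_from_score(score: float) -> float:
    """Elo difference implied by an expected score under the logistic Elo model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


# Full Strength result from the table: 23 wins, 77 draws, 0 losses in 100 games.
wins, draws, losses = 23, 77, 0
score = (wins + 0.5 * draws) / (wins + draws + losses)
print(f"score = {score:.3f}, Elo difference ~ {elo_from_score(score):.1f}")
```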

Code:

Number of games in Handicap_Test.pgn = 100  (without a result = 0)
Date range: 2018.01.14 - 2018.01.16
Number of: White Elos = 0  Black Elos = 0  Both = 0
Number of White wins = 21 ( 21.0 % )
Number of draws =      77 ( 77.0 % )
Number of Black wins = 2 ( 2.0 % )
White score = 59.5 %
Black score = 40.5 %
Number of ECOs = 100  A: 0  B: 0  C: 100  D: 0  E: 0
Number of PlyCounts = 100  range: 29-384  average = 138.88

Code:


                                        W H I T E                ::          B L A C K

          Name                   Win  : Draw  : Lose  :    %     ::   Win  : Draw  : Lose  :    %

Full Strength                    21+  :  29=  :   0-  :  71.0%   ::    2+  :  48=  :   0-  :  52.0%
Handicapped                       0+  :  48=  :   2-  :  48.0%   ::    0+  :  29=  :  21-  :  29.0%
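The per-color percentages follow directly from the win/draw/loss columns, with a draw counted as half a point. The small check below uses numbers copied from the table. It also adds up both engines' White results, which should reproduce the overall White score of 59.5% from the PGN statistics, since exactly 100 games were played.

```python
def score_pct(wins: int, draws: int, losses: int) -> float:
    """Percentage score with a draw counted as half a point."""
    games = wins + draws + losses
    return 100.0 * (wins + 0.5 * draws) / games


# (wins, draws, losses) per engine and color, copied from the table above.
results = {
    ("Full Strength", "White"): (21, 29, 0),
    ("Full Strength", "Black"): (2, 48, 0),
    ("Handicapped", "White"): (0, 48, 2),
    ("Handicapped", "Black"): (0, 29, 21),
}
for (engine, color), wdl in results.items():
    print(f"{engine:14} as {color}: {score_pct(*wdl):.1f}%")

# Total points scored by the White side over all 100 games.
white_points = sum(
    w + 0.5 * d for (engine, color), (w, d, lost) in results.items() if color == "White"
)
print(f"White overall: {white_points:.1f} points from 100 games")
```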

Code:


                  White       White                   White
ECO      Games    Score        Win    :    Draw   :    Loss

C48         1     50.0%   :      0+   :      1=   :      0-
C50         6     41.7%   :      0+   :      5=   :      1-
C53        11     50.0%   :      0+   :     11=   :      0-
C55        50     71.0%   :     21+   :     29=   :      0-
C65         8     43.8%   :      0+   :      7=   :      1-
C67        24     50.0%   :      0+   :     24=   :      0-
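The ECO breakdown can be cross-checked against the PGN totals (100 games, 21 White wins, 77 draws, 2 Black wins). This sketch simply re-adds the rows. It also shows that all 21 White wins came from the 50 C55 games. That these are exactly the games where Full Strength had White is an inference from the matching counts (21 wins and 29 draws as White), not something the table states directly.

```python
# ECO rows copied from the table above: code -> (games, white wins, draws, white losses).
eco_rows = {
    "C48": (1, 0, 1, 0),
    "C50": (6, 0, 5, 1),
    "C53": (11, 0, 11, 0),
    "C55": (50, 21, 29, 0),
    "C65": (8, 0, 7, 1),
    "C67": (24, 0, 24, 0),
}

# Column-wise sums over all ECO rows.
totals = [sum(col) for col in zip(*eco_rows.values())]
games, white_wins, draws, white_losses = totals
print(f"games={games} white_wins={white_wins} draws={draws} black_wins={white_losses}")

# Check each row is internally consistent and print its White score.
for eco, (n, w, d, lost) in eco_rows.items():
    assert n == w + d + lost, eco
    print(f"{eco}: {100.0 * (w + 0.5 * d) / n:.1f}%")
```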

Code:

Engine                 Depth       Time   Games     Moves  Average Forfeit
Full Strength          40.78  113:07:57      99      5089    80.03     0
Handicapped            38.51  109:12:25      99      6617    59.41     0


Time control comparison between engines

Depth     : Average search depth
Time      : Total time engine used
Moves     : Total moves engine played
Average   : Average time per move in seconds
Forfeit   : Games engine lost due to time forfeit

The list is sorted by average time, so the engine that uses the most time is at the top.

Note: Protools 1.4 does not count book moves.
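The Average column can be reproduced from the Time and Moves columns. Dividing total time by counted moves gives about 80.03 and 59.41, which match the table when the column is read as seconds per move. Under 40 moves in 40 minutes, the nominal average is 60 s per move. Book moves are excluded from the count and cost almost no clock time, so that is likely why Full Strength's searched moves average more than that. This is a minimal sketch and `hms_to_seconds` is a hypothetical helper.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Total time and counted (non-book) moves from the Protools table above.
rows = {
    "Full Strength": ("113:07:57", 5089),
    "Handicapped": ("109:12:25", 6617),
}
for engine, (total, moves) in rows.items():
    avg = hms_to_seconds(total) / moves
    print(f"{engine}: {avg:.2f} s per counted move")
```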

Code:

Unique positions after X moves

2 moves: 1
4 moves: 7
6 moves: 14
8 moves: 25
10 moves: 53
12 moves: 66

In all, there were 98 unique games out of 100 played.

Milos
Posts: 3389
Joined: Wed Nov 25, 2009 12:47 am

Re: Stockfish Handicap Test

Post by Milos » Thu Jan 18, 2018 12:47 pm

Adam Hair wrote:This is an attempt (using an outdated computer) to determine how much Stockfish may have been "handicapped" against AlphaZero.

Great one, Adam. It just proves what some of us were saying from the very beginning: full-strength SF is totally on par with A0. And all this using most probably only a dual Xeon E5-2697A v4, i.e. 32 real cores and 64 threads (the 70 Mnps from the paper was most probably the SF8 bench figure on that machine, not a sample from real games).
If they ran it on an octa Xeon E7-8894 v4, they'd get at least an additional 80-100 Elo and still use a much cheaper machine to run SF than what was used to run A0.
Nothing to take away from A0's achievement, but it is a much weaker result in terms of advertising. It is a very reasonable assumption that Google ppl are not dumb and that they also ran some test like this before actually choosing the match conditions.

Leo
Posts: 858
Joined: Fri Sep 16, 2016 4:55 pm
Location: USA/Minnesota
Full name: Leo Anger

Re: Stockfish Handicap Test

Post by Leo » Thu Jan 18, 2018 4:56 pm

I always thought AZ subtly handicapped SF.
Advanced Micro Devices fan.

Dann Corbit
Posts: 10243
Joined: Wed Mar 08, 2006 7:57 pm
Location: Redmond, WA USA

Re: Stockfish Handicap Test

Post by Dann Corbit » Thu Jan 18, 2018 7:05 pm

Now let's give the cerebellum book and the EGTB files to AZ also.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.

Ovyron
Posts: 3097
Joined: Tue Jul 03, 2007 2:30 am

Re: Stockfish Handicap Test

Post by Ovyron » Thu Jan 18, 2018 9:42 pm

Dann Corbit wrote:Now let's give the cerebellum book and the EGTB files to AZ also.
Unfortunately, all games would be started from this position due to Cerebellum's determinism:

[d]r5k1/1p4p1/1bp3Rp/p6r/3p1p2/2P5/PP3PKN/R1B5 b - -

Milos
Posts: 3389
Joined: Wed Nov 25, 2009 12:47 am

Re: Stockfish Handicap Test

Post by Milos » Thu Jan 18, 2018 9:52 pm

Dann Corbit wrote:Now let's give the cerebellum book and the EGTB files to AZ also.
Since A0 can't use them, that would be pretty useless.
That's the same kind of reasoning as "let's give SF 4 TPUs."

Milos
Posts: 3389
Joined: Wed Nov 25, 2009 12:47 am

Re: Stockfish Handicap Test

Post by Milos » Thu Jan 18, 2018 9:58 pm

Ovyron wrote:Unfortunately, all games would be started from this position due to Cerebellum's determinism:
A0 is already fully deterministic, so any idiot could learn how to beat it in no time. But ofc you wouldn't know that.

Ovyron
Posts: 3097
Joined: Tue Jul 03, 2007 2:30 am

Re: Stockfish Handicap Test

Post by Ovyron » Thu Jan 18, 2018 10:04 pm

Milos wrote:
Dann Corbit wrote:Now let's give the cerebellum book and the EGTB files to AZ also.
Since A0 can't use them, that would be pretty useless.
That's the same kind of reasoning as "let's give SF 4 TPUs."
You really think that Google ppl are so stupid that they don't know how to implement a trivial bin book reader or tablebase access? Something that would take 10 minutes tops to implement if you ever looked at such code in any modern engine?
Gee, I really thought you were a smarter person. What you believe seems about as convincing as believing in Santa Claus...

Milos
Posts: 3389
Joined: Wed Nov 25, 2009 12:47 am

Re: Stockfish Handicap Test

Post by Milos » Thu Jan 18, 2018 10:11 pm

Ovyron wrote:You really think that Google ppl are so stupid that they don't know how to implement a trivial bin book reader or tablebase access? Something that would take 10 minutes tops to implement if you ever looked at such code in any modern engine?
Whether or not they are able to implement it is irrelevant. A0 doesn't have that option, because the paper explicitly says that A0 does not use any domain knowledge, and both opening books and EGTBs are domain knowledge.
Or are you suggesting they are simply lying?
Your insistence on irrelevant and off-topic "arguments" is really silly. Careful, my friend, ppl might think you are just trolling. ;)

syzygy
Posts: 4460
Joined: Tue Feb 28, 2012 10:56 pm

Re: Stockfish Handicap Test

Post by syzygy » Thu Jan 18, 2018 11:02 pm

Milos wrote:
Dann Corbit wrote:Now let's give the cerebellum book and the EGTB files to AZ also.
Since A0 can't use them, that would be pretty useless.
That's the same kind of reasoning as "let's give SF 4 TPUs."
Stockfish can't use the Cerebellum book either.
