Fat Fritz destroyed Stockfish!


Guenther
Posts: 4693
Joined: Wed Oct 01, 2008 6:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: Fat Fritz destroyed Stockfish!

Post by Guenther »

schack wrote: Sun Nov 17, 2019 9:53 pm Link to review, citing this very thread at Talkchess!

https://new.uschess.org/news/fat-fritz- ... -review-i/

"ChessBase struggled along with Fritz 14 and 15, both rebranded and marginally improved versions of Rybka 4.1, and Fritz 16, a “skin” of the formerly private engine Pandix."

There is a small mistake in this quote from your review: Fritz 14 was based on Pandix, and Fritz 15 and Fritz 16 on Rybka.
https://rwbc-chess.de

Talkchess nowadays is a joke - it is full of trolls/idiots/wafflers/clone lovers/people stuck in the pleistocene > 70% of the posts fall into this category...
schack
Posts: 177
Joined: Thu May 27, 2010 3:32 am

Re: Fat Fritz destroyed Stockfish!

Post by schack »

Darn. Will fix. Thanks. :)
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Fat Fritz destroyed Stockfish!

Post by Laskos »

I tested Fat Fritz a bit today, and the results are as follows:

Strength at 60'' + 0.6'' against one of the best 20bx256 nets (the JHorthos one):

Code:

Score of FatFritz vs lc0_T40B4_200: 6 - 27 - 67  [0.395] 100
Elo difference: -74.06 +/- 38.21
Finished match
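
For readers who want to check the arithmetic, here is a minimal sketch of how an Elo difference and error margin can be derived from a match score. It assumes the three numbers in the score line are wins, losses, and draws from Fat Fritz's side (which is consistent with the 0.395 score), uses the standard logistic Elo formula, and estimates the margin from a normal approximation at 95% confidence. The function names are hypothetical, and the exact margin formula used by the match tool is an assumption, so the margin should come out close to the reported value rather than identical to it.

```python
import math
from statistics import NormalDist


def elo_from_score(score):
    """Logistic Elo difference implied by an expected score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def match_elo(wins, losses, draws, confidence=0.95):
    """Return (score, Elo difference, approximate error margin) for a match."""
    n = wins + losses + draws
    w, l, d = wins / n, losses / n, draws / n
    mu = w + 0.5 * d  # mean score per game
    # Per-game variance of the score around its mean (win=1, draw=0.5, loss=0).
    var = w * (1.0 - mu) ** 2 + l * (0.0 - mu) ** 2 + d * (0.5 - mu) ** 2
    stdev = math.sqrt(var / n)  # standard error of the mean score
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    lo, hi = mu - z * stdev, mu + z * stdev
    # Half the width of the confidence interval, expressed in Elo.
    margin = (elo_from_score(hi) - elo_from_score(lo)) / 2.0
    return mu, elo_from_score(mu), margin


score, elo, margin = match_elo(wins=6, losses=27, draws=67)
print(f"score={score:.3f}  Elo difference: {elo:.2f} +/- {margin:.2f}")
```

With 100 games and 67 draws the interval is wide, which is worth keeping in mind when comparing this result with matches run on other hardware.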
On test suites, tactical and positional:

Code:

Tactical
Arasan21beta

Fat Fritz:   score=106/199 [averages on correct positions: depth=6.3 time=1.11 nodes=15106]
Lc0 T40 B4:  score=116/199 [averages on correct positions: depth=6.8 time=1.09 nodes=11576]

Positional
Openings199

Fat Fritz:   score=158/199 [averages on correct positions: depth=4.7 time=0.95 nodes=14221]
Lc0 T40 B4:  score=170/199 [averages on correct positions: depth=4.6 time=0.70 nodes=9720]

And finally, regarding how differently Fat Fritz plays compared to the main Lc0 zero runs:

Sim03 (8200+ positions), the similarity matrix:

Code:

  Key:

  1) Fat Fritz (time: 100 ms  scale: 2.5)
  2) Lc0 11248 (time: 100 ms  scale: 2.5)
  3) Lc0 32930 (time: 100 ms  scale: 2.5)
  4) Lc0 42850 (time: 100 ms  scale: 2.5)
  5) SF dev    (time: 100 ms  scale: 2.5)

         1     2     3     4     5
  1.  ----- 71.17 73.20 73.36 53.18
  2.  71.17 ----- 72.66 72.04 53.79
  3.  73.20 72.66 ----- 76.22 52.62
  4.  73.36 72.04 76.22 ----- 52.97
  5.  53.18 53.79 52.62 52.97 -----
The text reads:
"Silver used “supervised learning” to train Fat Fritz: the engine was fed hand-picked data, mostly from MegaBase, correspondence games, and top-level engine battles. Reinforcement learning was then used to help refine the network and strengthen it."

The supervised learning seems to have brought little, as Fat Fritz is closer in move selection to the T30 and T40 zero runs than to the T10 zero run. Did Albert use many games from the T30 and T40 runs? The dendrogram is here:

[Image: dendrogram of the similarity matrix]

Also, strength-wise, it is probably at the level of the 11248 net, while being more similar in move choices to the 42850 net.

I also included SF_dev in the dendrogram, to see how far a really different engine of similar strength is from all these NN-based engines, be they Lc0 or Fat Fritz.
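
Since the dendrogram image is not reproduced here, the following is a rough sketch of how a merge order can be recovered from the similarity matrix above. It assumes average-linkage agglomerative clustering on the raw similarity percentages; the linkage method actually used for the original dendrogram is not stated, and the function name is hypothetical.

```python
from itertools import combinations

names = ["Fat Fritz", "Lc0 11248", "Lc0 32930", "Lc0 42850", "SF dev"]

# Similarity percentages copied from the Sim03 matrix (diagonal unused).
sim = [
    [0.0, 71.17, 73.20, 73.36, 53.18],
    [71.17, 0.0, 72.66, 72.04, 53.79],
    [73.20, 72.66, 0.0, 76.22, 52.62],
    [73.36, 72.04, 76.22, 0.0, 52.97],
    [53.18, 53.79, 52.62, 52.97, 0.0],
]


def average_linkage(names, sim):
    """Repeatedly merge the two clusters with the highest mean similarity."""

    def avg_sim(a, b):
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    clusters = [(i,) for i in range(len(names))]
    merges = []
    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2), key=lambda pair: avg_sim(*pair))
        merges.append((
            tuple(names[i] for i in a),
            tuple(names[i] for i in b),
            round(avg_sim(a, b), 2),
        ))
        # Replace the two merged clusters with their union.
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


merges = average_linkage(names, sim)
for left, right, level in merges:
    print(f"{level:6.2f}  {' + '.join(left)}  <->  {' + '.join(right)}")
```

Under these assumptions the two newer nets (32930 and 42850) pair up first, Fat Fritz joins them before 11248 does, and SF dev attaches last at a much lower similarity level, which matches the reading given in the post.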
George Tsavdaris
Posts: 1627
Joined: Thu Mar 09, 2006 12:35 pm

Re: Fat Fritz destroyed Stockfish!

Post by George Tsavdaris »

Laskos wrote: Mon Nov 18, 2019 12:17 am Also, strength-wise, it is probably at the level of the 11248 net, while being more similar in move choices to the 42850 net.

No way could 11248 beat the latest SF dev 52-48 in a 100-game match.
It would lose badly.

Hopefully tomorrow I will play a 100-game match at 1'+0.6" between FatFritz and lc0_T40B4_200 to see how it compares to your match.
After his son's birth they've asked him:
"Is it a boy or girl?"
YES! He replied.....
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Fat Fritz destroyed Stockfish!

Post by Laskos »

George Tsavdaris wrote: Mon Nov 18, 2019 12:47 am
Laskos wrote: Mon Nov 18, 2019 12:17 am Also, strength-wise, it is probably at the level of the 11248 net, while being more similar in move choices to the 42850 net.
No way could 11248 beat the latest SF dev 52-48 in a 100-game match.
It would lose badly.

I don't know what the outcome is on each hardware configuration. 11248 is no more than 100 Elo points weaker than the best T40 nets.
Raphexon
Posts: 476
Joined: Sun Mar 17, 2019 12:00 pm
Full name: Henk Drost

Re: Fat Fritz destroyed Stockfish!

Post by Raphexon »

Laskos wrote: Mon Nov 18, 2019 12:52 am
George Tsavdaris wrote: Mon Nov 18, 2019 12:47 am
Laskos wrote: Mon Nov 18, 2019 12:17 am Also, strength-wise, it is probably at the level of the 11248 net, while being more similar in move choices to the 42850 net.
No way could 11248 beat the latest SF dev 52-48 in a 100-game match.
It would lose badly.
I don't know what the outcome is on each hardware configuration. 11248 is no more than 100 Elo points weaker than the best T40 nets.

I think at long TC and on strong hardware the bigger T40 nets might very well be 100+ Elo stronger.
At 60+0.6, no. I don't think any net is more than 100 Elo stronger than 11248.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Fat Fritz destroyed Stockfish!

Post by Laskos »

Raphexon wrote: Mon Nov 18, 2019 1:18 am
Laskos wrote: Mon Nov 18, 2019 12:52 am
George Tsavdaris wrote: Mon Nov 18, 2019 12:47 am
Laskos wrote: Mon Nov 18, 2019 12:17 am Also, strength-wise, it is probably at the level of the 11248 net, while being more similar in move choices to the 42850 net.
No way could 11248 beat the latest SF dev 52-48 in a 100-game match.
It would lose badly.
I don't know what the outcome is on each hardware configuration. 11248 is no more than 100 Elo points weaker than the best T40 nets.
I think at long TC and on strong hardware the bigger T40 nets might very well be 100+ Elo stronger.
At 60+0.6, no. I don't think any net is more than 100 Elo stronger than 11248.

They are the same 20bx256 size nets, just in different formats. T40 might scale a bit better, but only a bit. I doubt that at any time control and on any hardware the best T40 nets are better than 11248 by more than 100 Elo points (Elo compression also comes into play at LTC).

Anyway, it is fairly clear that Fat Fritz is in the same pool as the main Lc0 zero runs and, if one wants to put it harshly, is just a crippled 42850 or T40B4_200. The supervised learning brought little even style wise.
Dann Corbit
Posts: 12638
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Fat Fritz destroyed Stockfish!

Post by Dann Corbit »

Re: "The supervised learning brought little even style wise."
My impression is that Fat Fritz is clearly better than the average NN tactically and worse positionally.

I have no idea how this translates to games.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Fat Fritz destroyed Stockfish!

Post by Laskos »

Dann Corbit wrote: Mon Nov 18, 2019 1:37 am Re: "The supervised learning brought little even style wise."
My impression is that Fat Fritz is clearly better than the average NN tactically and worse positionally.

I have no idea how this translates to games.
Not sure. I presented results in tactical and positional test-suites earlier. Might check with WAC suite later.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Fat Fritz destroyed Stockfish!

Post by Daniel Shawul »

Laskos wrote: Mon Nov 18, 2019 12:17 am
And finally, regarding how differently Fat Fritz plays compared to the main Lc0 zero runs:

Sim03 (8200+ positions), the similarity matrix:

Code:

  Key:

  1) Fat Fritz (time: 100 ms  scale: 2.5)
  2) Lc0 11248 (time: 100 ms  scale: 2.5)
  3) Lc0 32930 (time: 100 ms  scale: 2.5)
  4) Lc0 42850 (time: 100 ms  scale: 2.5)
  5) SF dev    (time: 100 ms  scale: 2.5)

         1     2     3     4     5
  1.  ----- 71.17 73.20 73.36 53.18
  2.  71.17 ----- 72.66 72.04 53.79
  3.  73.20 72.66 ----- 76.22 52.62
  4.  73.36 72.04 76.22 ----- 52.97
  5.  53.18 53.79 52.62 52.97 -----
The text reads:
"Silver used “supervised learning” to train Fat Fritz: the engine was fed hand-picked data, mostly from MegaBase, correspondence games, and top-level engine battles. Reinforcement learning was then used to help refine the network and strengthen it."

The supervised learning seems to have brought little, as Fat Fritz is closer in move selection to the T30 and T40 zero runs than to the T10 zero run. Did Albert use many games from the T30 and T40 runs? The dendrogram is here:

[Image: dendrogram of the similarity matrix]

Also, strength-wise, it is probably at the level of the 11248 net, while being more similar in move choices to the 42850 net.

I also included SF_dev in the dendrogram, to see how far a really different engine of similar strength is from all these NN-based engines, be they Lc0 or Fat Fritz.

Kai,

Albert provided TensorFlow training graphs (policy/value loss metrics, etc.) on the lczero Discord that show he did 240k steps of training on top of the supervised net (more than a third of what A0 did, which was about 700k steps), using 4 GPUs for 161 days (5 and 1/2 months!!). I have no reason not to believe him, unless you think the graphs are faked, which I highly doubt. I don't have the patience or the hardware to put in that kind of effort, but if someone wants to do it, all the power to them!

Even if Fat Fritz turned out to be similar to T30 / T40, who cares? Why are T30 and T40 similar to each other in the first place? Maybe many training roads lead to similar kinds of nets... and Leela doesn't own that style of net.

If the claim were that the net was this strong just from supervised training, I would have highly doubted it. In my experience with supervised training, you would be very lucky to get something 200 Elo weaker than T30. My effort in that regard got me to 3150 CCRL Elo, I think, and DarkQueen is probably 120 Elo stronger than that, but it uses Stockfish evaluations from filtered lichess games. Supervised training is really hard in my experience, I think because of the lack of a "coherent set of data" that will guide the training and cure the holes in your net. You could try to grab a set of games and train your net, but it will then weaken something in your net while improving something else, keeping you in a loop. Self-play keeps on fixing the holes in a net with the right learning rate, IMO.