Fat Fritz destroyed Stockfish!

Guenther · Post by **Guenther** » Sun Nov 17, 2019 10:42 pm

schack wrote: ↑Sun Nov 17, 2019 9:53 pm Link to review, citing this very thread at Talkchess!

https://new.uschess.org/news/fat-fritz- ... -review-i/

ChessBase struggled along with Fritz 14 and 15, both rebranded and marginally improved versions of Rybka 4.1, and Fritz 16, a “skin” of the formerly private engine Pandix.

There is a small error in this quote from your review:
F14 was based on Pandix, and F15/F16 on Rybka.

schack · Post by **schack** » Sun Nov 17, 2019 11:07 pm

Darn. Will fix. Thanks.

Laskos · Post by **Laskos** » Mon Nov 18, 2019 12:17 am

I tested Fat Fritz a bit today, and the results were as follows:

Strength at 60'' + 0.6'' against one of the best 20bx256 nets (the JHorthos one):

Score of FatFritz vs lc0_T40B4_200: 6 - 27 - 67  [0.395] 100
Elo difference: -74.06 +/- 38.21
Finished match
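
For anyone who wants to check the Elo figure in a match summary like the one above, here is a minimal Python sketch. It assumes the usual logistic Elo model and a normal-approximation 95% interval on the mean score. The helper names `elo_diff` and `match_elo` are made up for illustration, and the error-margin formula is an assumption about how the match tool computes it, so the margin may differ slightly from the reported one.

```python
import math

def elo_diff(score):
    """Elo difference implied by a mean score in (0, 1), logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def match_elo(wins, losses, draws, z=1.96):
    """Return (mean score, Elo difference, approximate 95% error margin).

    Assumption: the margin comes from a normal approximation on the mean
    score, using the per-game variance of win/draw/loss results. This fails
    if the interval reaches 0 or 1 (very lopsided or very short matches).
    """
    n = wins + losses + draws
    mu = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - mu) ** 2
           + losses * (0.0 - mu) ** 2
           + draws * (0.5 - mu) ** 2) / n
    stdev = math.sqrt(var / n)  # standard error of the mean score
    lo, hi = mu - z * stdev, mu + z * stdev
    margin = (elo_diff(hi) - elo_diff(lo)) / 2.0
    return mu, elo_diff(mu), margin

# Fat Fritz vs lc0_T40B4_200: 6 wins, 27 losses, 67 draws
mu, diff, margin = match_elo(6, 27, 67)
print(f"score={mu:.3f}  Elo={diff:+.2f}  +/- {margin:.2f}")
```

With 6 wins, 27 losses and 67 draws this gives the 0.395 score and the -74.06 Elo difference from the summary, with a margin of roughly ±38.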

On test suites, tactical and positional:

Tactical
Arasan21beta

Fat Fritz:   score=106/199 [averages on correct positions: depth=6.3 time=1.11 nodes=15106]
Lc0 T40 B4:  score=116/199 [averages on correct positions: depth=6.8 time=1.09 nodes=11576]

Positional
Openings199

Fat Fritz:   score=158/199 [averages on correct positions: depth=4.7 time=0.95 nodes=14221]
Lc0 T40 B4:  score=170/199 [averages on correct positions: depth=4.6 time=0.70 nodes=9720]

And finally, regarding how differently Fat Fritz plays compared to the main Lc0 zero runs:

Sim03 (8200+ positions), the similarity matrix:

  Key:

  1) Fat Fritz (time: 100 ms  scale: 2.5)
  2) Lc0 11248 (time: 100 ms  scale: 2.5)
  3) Lc0 32930 (time: 100 ms  scale: 2.5)
  4) Lc0 42850 (time: 100 ms  scale: 2.5)
  5) SF dev    (time: 100 ms  scale: 2.5)

         1     2     3     4     5
  1.  ----- 71.17 73.20 73.36 53.18
  2.  71.17 ----- 72.66 72.04 53.79
  3.  73.20 72.66 ----- 76.22 52.62
  4.  73.36 72.04 76.22 ----- 52.97
  5.  53.18 53.79 52.62 52.97 -----

The text reads:
"Silver used “supervised learning” to train Fat Fritz: the engine was fed hand-picked data, mostly from MegaBase, correspondence games, and top-level engine battles. Reinforcement learning was then used to help refine the network and strengthen it."

The supervised learning seems to have brought little, as Fat Fritz is closer in move selection to the T30 and T40 zero runs than to the T10 zero run. Did Albert use many games from the T30 and T40 runs? The dendrogram is here: [dendrogram image]

Also, strength-wise, it is probably at the level of the 11248 net, while being more similar in move choices to the 42850 net.

I also included SF_dev in the dendrogram, to show how far a genuinely different engine of similar strength sits from all these NN-based engines, be they Lc0 or Fat Fritz.
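
For readers without the image, the grouping can be roughly reconstructed from the similarity matrix posted above. The sketch below is only an illustration: it assumes average-linkage clustering on the raw similarity percentages, which may not be the method the similarity tool uses for its dendrogram. The names `NAMES`, `SIM` and `average_linkage` are made up for this example.

```python
from itertools import combinations

NAMES = ["Fat Fritz", "Lc0 11248", "Lc0 32930", "Lc0 42850", "SF dev"]

# Pairwise move-selection similarity in percent, copied from the matrix.
# The diagonal is never read, so 0.0 is just a placeholder.
SIM = [
    [0.00, 71.17, 73.20, 73.36, 53.18],
    [71.17, 0.00, 72.66, 72.04, 53.79],
    [73.20, 72.66, 0.00, 76.22, 52.62],
    [73.36, 72.04, 76.22, 0.00, 52.97],
    [53.18, 53.79, 52.62, 52.97, 0.00],
]

def average_linkage(names, sim):
    """Greedy agglomerative clustering: repeatedly merge the two clusters
    with the highest average pairwise similarity. Returns the merges in
    order as (cluster_a, cluster_b, average_similarity)."""
    def avg(a, b):
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    clusters = [(i,) for i in range(len(names))]
    merges = []
    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2), key=lambda pair: avg(*pair))
        merges.append((tuple(names[i] for i in a),
                       tuple(names[i] for i in b),
                       round(avg(a, b), 2)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

for left, right, similarity in average_linkage(NAMES, SIM):
    print(f"{similarity:6.2f}  {left} + {right}")
```

Under these assumptions, 32930 and 42850 merge first, Fat Fritz joins that pair next, 11248 joins after that, and SF dev is attached last at a much lower similarity.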

George Tsavdaris · Post by **George Tsavdaris** » Mon Nov 18, 2019 12:47 am

Laskos wrote: ↑Mon Nov 18, 2019 12:17 am Also, strength-wise, it is probably at the level of the 11248 net, while being more similar in move choices to the 42850 net.

No way that 11248 could be able to beat latest SF dev 52-48 in a 100 games match.
It would lose badly.

Hopefully tomorrow I will play a 100-game match at 1'+0.6" of FatFritz vs lc0_T40B4_200 to see how it compares to your match.

Laskos · Post by **Laskos** » Mon Nov 18, 2019 12:52 am

George Tsavdaris wrote: ↑Mon Nov 18, 2019 12:47 am
No way that 11248 could be able to beat latest SF dev 52-48 in a 100 games match.
It would lose badly.

I don't know what the outcome is on each hardware configuration. 11248 is no more than 100 Elo points weaker than the best T40 nets.

Raphexon · Post by **Raphexon** » Mon Nov 18, 2019 1:18 am

Laskos wrote: ↑Mon Nov 18, 2019 12:52 am
I don't know what the outcome is on each hardware configuration. 11248 is no more than 100 Elo points weaker than the best T40 nets.

I think at long TC and strong hardware the bigger T40 nets might very well be 100+ ELO stronger.
At 60+0.6, no. I don't think any net is more than 100 elo stronger than the 11248.

Laskos · Post by **Laskos** » Mon Nov 18, 2019 1:25 am

Raphexon wrote: ↑Mon Nov 18, 2019 1:18 am
I think at long TC and strong hardware the bigger T40 nets might very well be 100+ ELO stronger.
At 60+0.6, no. I don't think any net is more than 100 elo stronger than the 11248.

These are nets of the same 20bx256 size, just in different formats. T40 might scale a bit better, but only a bit. I doubt that at any time control and on any hardware the best T40 nets are better than 11248 by more than 100 Elo points (Elo compression also sets in at LTC).

Anyway, it is fairly clear that Fat Fritz is in the same pool as the main Lc0 zero runs and, to put it harshly, is just a crippled 42850 or T40B4_200. The supervised learning brought little even style wise.

Dann Corbit · Post by **Dann Corbit** » Mon Nov 18, 2019 1:37 am

Re: "The supervised learning brought little even style wise."
my impression is that fat fritz is clearly better than the average nn tactically and worse positionally.

I have no idea how this translates to games

Laskos · Post by **Laskos** » Mon Nov 18, 2019 1:40 am

Dann Corbit wrote: ↑Mon Nov 18, 2019 1:37 am Re: "The supervised learning brought little even style wise."
my impression is that fat fritz is clearly better than the average nn tactically and worse positionally.

I have no idea how this translates to games

Not sure. I presented results in tactical and positional test-suites earlier. Might check with WAC suite later.

Daniel Shawul · Post by **Daniel Shawul** » Mon Nov 18, 2019 3:47 am

Laskos wrote: ↑Mon Nov 18, 2019 12:17 am
And finally, regarding how differently Fat Fritz plays compared to the main Lc0 zero runs:

Sim03 (8200+ positions), the similarity matrix:
  Key:

  1) Fat Fritz (time: 100 ms  scale: 2.5)
  2) Lc0 11248 (time: 100 ms  scale: 2.5)
  3) Lc0 32930 (time: 100 ms  scale: 2.5)
  4) Lc0 42850 (time: 100 ms  scale: 2.5)
  5) SF dev    (time: 100 ms  scale: 2.5)

         1     2     3     4     5
  1.  ----- 71.17 73.20 73.36 53.18
  2.  71.17 ----- 72.66 72.04 53.79
  3.  73.20 72.66 ----- 76.22 52.62
  4.  73.36 72.04 76.22 ----- 52.97
  5.  53.18 53.79 52.62 52.97 -----
The text reads:
"Silver used “supervised learning” to train Fat Fritz: the engine was fed hand-picked data, mostly from MegaBase, correspondence games, and top-level engine battles. Reinforcement learning was then used to help refine the network and strengthen it."

The supervised learning seems to have brought little, as Fat Fritz is closer in move selection to the T30 and T40 zero runs than to the T10 zero run. Did Albert use many games from the T30 and T40 runs? The dendrogram is here: [dendrogram image]

Also, strength-wise, it is probably at the level of the 11248 net, while being more similar in move choices to the 42850 net.

I also included SF_dev in the dendrogram, to show how far a genuinely different engine of similar strength sits from all these NN-based engines, be they Lc0 or Fat Fritz.

Kai,

Albert provided TensorFlow training graphs (policy/value loss metrics etc.) on the lczero Discord that show he has done 240k steps of training (more than a third of the roughly 700k steps A0 did) on top of the supervised net, using 4 GPUs for 161 days (5 and 1/2 months!!). I have no reason not to believe him, unless you think the graphs are faked, which I highly doubt. I don't have the patience or the hardware to put in that kind of effort, but if someone wants to do it, all the power to them!

Even if Fat Fritz turned out to be similar to T30 / T40, who cares? Why are T30 and T40 similar to each other in the first place? Maybe many training roads lead to similar kinds of nets... and Leela doesn't own that style of net.

If the claim were that the net got this strong from supervised training alone, I would have highly doubted it. In my experience with supervised training, you would be very lucky to get something like 200 Elo weaker than T30. My own effort in that regard got me to about 3150 CCRL Elo, I think, and DarkQueen is probably 120 Elo stronger than that, but it uses Stockfish evaluations from filtered Lichess games. Supervised training is really hard in my experience, I think because of the lack of a "coherent set of data" that will guide training and cure the holes in your net. You could grab a set of games and train your net on them, but that will weaken something in your net while improving something else, keeping you in a loop. Selfplay keeps fixing the holes in a net, given the right learning rate, IMO.
