New 11258 distilled networks released.

dkappe
Posts: 1631
Joined: Tue Aug 21, 2018 7:52 pm
Full name: Dietrich Kappe

Post by dkappe »

48x5 and 120x10 https://github.com/dkappe/leela-chess-w ... d-Networks

The smaller end may work well on Raspberry Pi, the higher end is intended for CPU.
Graham Banks
Posts: 41423
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

Re: New 11258 distilled networks released.

Post by Graham Banks »

dkappe wrote: Mon Feb 11, 2019 2:53 am 48x5 and 120x10 https://github.com/dkappe/leela-chess-w ... d-Networks

The smaller end may work well on Raspberry Pi, the higher end is intended for CPU.
Thanks.
gbanksnz at gmail.com
Eduard
Posts: 1439
Joined: Sat Oct 27, 2018 12:58 am
Location: Germany
Full name: N.N.

Re: New 11258 distilled networks released.

Post by Eduard »

Thank you! On my slow PC, the 48x5 runs very well and plays decently. Surprisingly good results in the analysis mode.
jp
Posts: 1470
Joined: Mon Apr 23, 2018 7:54 am

Re: New 11258 distilled networks released.

Post by jp »

Eduard wrote: Mon Feb 11, 2019 5:49 am Surprisingly good results in the analysis mode.
Can you say more about what are "surprisingly good results"?
Eduard
Posts: 1439
Joined: Sat Oct 27, 2018 12:58 am
Location: Germany
Full name: N.N.

Re: New 11258 distilled networks released.

Post by Eduard »

jp wrote: Thu Feb 14, 2019 1:46 am
Eduard wrote: Mon Feb 11, 2019 5:49 am Surprisingly good results in the analysis mode.
Can you say more about what are "surprisingly good results"?
Here I have posted some analyses of other distilled networks:

forum3/viewtopic.php?f=2&t=69820

The 48x5 net is very good for its size. On my 2 x 2.4 GHz i3, it reaches 1-2 kn/s.

This gives an Lc0 ratio of about 1 against Rybka 4.1 on 1 core. Of course, this is not the same as with a big network. But should there be an Lc0 for Android, this little NN would be fantastic!

I am currently running a match on my PC against Rybka 4.1 x64 (1 core). The time control is 30+10. Rybka leads 3-2.

But Lc0 (48x5) is clearly better in the opening and middlegame! I see that the increment is too small for Lc0: it makes about 60 moves on its base time, then has to play the rest on the increment alone. :|

We will see how it stands after 20 games. Then I will post some games and analyses.
jp
Posts: 1470
Joined: Mon Apr 23, 2018 7:54 am

Re: New 11258 distilled networks released.

Post by jp »

Eduard wrote: Thu Feb 14, 2019 2:50 am Here I have posted some analyses of other distilled networks:

forum3/viewtopic.php?f=2&t=69820
Thanks. I'll take a look.
Spill_The_Tea
Posts: 24
Joined: Mon Dec 17, 2018 3:33 am
Full name: Jase de Lace

Re: New 11258 distilled networks released.

Post by Spill_The_Tea »

dkappe wrote: Mon Feb 11, 2019 2:53 am 48x5 and 120x10 https://github.com/dkappe/leela-chess-w ... d-Networks

The smaller end may work well on Raspberry Pi, the higher end is intended for CPU.
Can you define "distilled networks"?
Does this mean you are testing networks for best performance on CPU only (OpenBLAS backend)?
Raphexon
Posts: 476
Joined: Sun Mar 17, 2019 12:00 pm
Full name: Henk Drost

Re: New 11258 distilled networks released.

Post by Raphexon »

Does anyone know how fast the 48x5 is with an RTX 2080ti or similar?

Also wth happened?

MM50 (Mephisto) port vs Lc0.
Both on an i5-4460, single thread
TC was 30+0.3 or 20+0.2 (I forgot)

[pgn]
[Event "?"]
[Date "2019.04.12"]
[Round "?"]
[White "MM50-UCI"]
[Black "lc0"]
[Result "1/2-1/2"]
[WhiteElo "?"]
[BlackElo "?"]
[Variant "Standard"]
[TimeControl "-"]
[ECO "B91"]
[Opening "Sicilian Defense: Najdorf Variation, Zagreb (Fianchetto) Variation"]
[Termination "Normal"]
[Annotator "lichess.org"]

1. e4 c5 2. Nf3 d6 3. d4 cxd4 4. Nxd4 Nf6 5. Nc3 a6 6. g3
{ B91 Sicilian Defense: Najdorf Variation, Zagreb (Fianchetto) Variation }
e5 7. Nde2 Be6 8. f4 Nbd7 9. f5 Bc4 10. b3 Bb5 11. Nxb5 axb5 12. Bg5 d5
13. exd5 h6 14. Be3 Bb4+ 15. c3 Bc5 16. Bg1 Bxg1 17. Nxg1 Qb6 18. Qd3 O-O
19. Qxb5 Qe3+ 20. Ne2 Nc5 21. Bg2 Nd3+ 22. Kd1 Ng4 23. Rf1 Nxh2 24. Rg1 Ng4
25. Bf1 e4 26. Kc2 Qf2 27. Qxb7 Rfb8 28. Qe7 Re8 29. Qd7 Nde5 30. Qc7 Rac8
31. Qd6 Nf3 32. Rh1 Rcd8 33. Qa6 Rxd5 34. Qc6 Red8 35. Kb2 e3 36. Qa6 Rd2+
37. Ka3 R8d5 38. Bh3 Nge5 39. Bf1 Ng4 40. Bh3 Ngh2 41. Nf4 Rc5 42. Nd3 Qxg3
43. Nxc5 e2 44. Ne4 Qf4 45. Qc8+ Kh7 46. Nxd2 Qxd2 47. Qc7 Kg8 48. Qb8+ Kh7
49. Qe8 h5 50. Rhc1 Ng4 51. Bxg4 e1=Q 52. Rxe1 hxg4 53. Rh1+ Nh2 54. Qb8 g3
55. Qxg3 Qd6+ 56. Qxd6 f6 57. Rxh2+ Kg8 58. Rb1 Kf7 59. Ra1 Ke8 60. Rb1 Kf7
61. Ra1 Kg8 62. Rb1 Kf7 { The game is a draw. } 1/2-1/2
[/pgn]
Raphexon
Posts: 476
Joined: Sun Mar 17, 2019 12:00 pm
Full name: Henk Drost

Re: New 11258 distilled networks released.

Post by Raphexon »

Net I used was 16x2 btw. :oops:
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: New 11258 distilled networks released.

Post by Ferdy »

dkappe wrote: Mon Feb 11, 2019 2:53 am 48x5 and 120x10 https://github.com/dkappe/leela-chess-w ... d-Networks

The smaller end may work well on Raspberry Pi, the higher end is intended for CPU.
Thanks for creating these nets. I am trying to find the best net for my old CPU (i7-2600K, 3.4 GHz), so I will be using a BLAS backend.

I found 64x6 to be the best so far. The rating list is anchored on engines with CCRL 40/4 ratings. I will add more tests from time to time.

Code:

TC 3'+2s

   # PLAYER                             :  RATING  POINTS  PLAYED   (%)
   1 GreKo 2018.08 2720                 :  2720.0    11.0      20    55
   2 Lc0 v0.21.1 w11258-64x6-se blas    :  2695.6    40.5      70    58
   3 Wyldchess 1.51 2630                :  2630.0     6.0      20    30
   4 Glass 2.0 2610                     :  2610.0     5.5      10    55
   5 Floyd 0.9 2580                     :  2580.0     7.0      20    35

White advantage = -15.79
Draw rate (equal opponents) = 22.15 %
I saw your 11258 tests at https://github.com/dkappe/leela-chess-w ... tournament
64x6 is not really far behind.
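
As a rough cross-check of the (%) column in the table above, the score percentage is just points divided by games played, and the standard logistic Elo formula turns that into an approximate rating difference against the average opposition. This is only a sketch: the rating tool that produced the table anchors on several engines at once, so its numbers will not match this simple estimate exactly. The helper names `score_fraction` and `elo_diff` are made up for illustration.

```python
import math

# (points, games played) copied from the rating table above
results = {
    "GreKo 2018.08": (11.0, 20),
    "Lc0 v0.21.1 w11258-64x6-se blas": (40.5, 70),
    "Wyldchess 1.51": (6.0, 20),
    "Glass 2.0": (5.5, 10),
    "Floyd 0.9": (7.0, 20),
}


def score_fraction(points, played):
    """Fraction of the available points that were scored."""
    return points / played


def elo_diff(score):
    """Logistic Elo difference implied by a score fraction strictly between 0 and 1."""
    return 400.0 * math.log10(score / (1.0 - score))


for name, (points, played) in results.items():
    s = score_fraction(points, played)
    print(f"{name:34s} {100 * s:5.1f}%  {elo_diff(s):+7.1f} Elo vs. opposition")
```

A score fraction of exactly 0 or 1 has no finite Elo difference, so the sketch assumes every engine scored at least something and dropped at least something.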


I ran a multiple linear regression on that data to approximate the rating, with filters and blocks as the independent variables.

Data table source:
https://github.com/dkappe/leela-chess-w ... tournament

Code:

    Filters  Blocks  Rating
0       112      12    2988
1       120       9    2981
2       104       9    2977
3        80       7    2977
4       112      10    2966
5        64       6    2966
6       128       9    2958
7       120      10    2946
8        96       8    2945
9       128      10    2937
10       48       5    2909
11       32       4    2787
12       24       3    2703
13       16       2    2408

Same data sorted by Filters and Blocks in descending order.

Code:

    Filters  Blocks  Rating
9       128      10    2937
6       128       9    2958
7       120      10    2946
1       120       9    2981
0       112      12    2988
4       112      10    2966
2       104       9    2977
8        96       8    2945
3        80       7    2977
5        64       6    2966
10       48       5    2909
11       32       4    2787
12       24       3    2703
13       16       2    2408

Plot for rating vs filters
[Image: scatter plot, Rating vs Filters]

and rating vs blocks
[Image: scatter plot, Rating vs Blocks]

Then I used sklearn's linear regression to get the intercept and coefficients.

Code:

sklearn multiple linear regression results:
Intercept: 2584.2790502045655
Coefficients: [ 0.69001166 33.18384124]
It comes up with the formula

Code:

Rating = 2584 + (Filters x 0.69) + (Blocks x 33.18)
Independent variables: Filters and Blocks
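
The same fit can be reproduced without sklearn as an ordinary least-squares problem. The sketch below builds a design matrix with a column of ones for the intercept and solves it with `numpy.linalg.lstsq`; it assumes the 14 data rows from the table above and is meant only as a cross-check of the posted intercept and coefficients, not as a replacement for the script.

```python
import numpy as np

# Data rows from the 11258 focused tournament table above
filters = np.array([112, 120, 104, 80, 112, 64, 128, 120, 96, 128, 48, 32, 24, 16], dtype=float)
blocks = np.array([12, 9, 9, 7, 10, 6, 9, 10, 8, 10, 5, 4, 3, 2], dtype=float)
rating = np.array([2988, 2981, 2977, 2977, 2966, 2966, 2958, 2946, 2945, 2937,
                   2909, 2787, 2703, 2408], dtype=float)

# Design matrix: a leading column of ones models the intercept
A = np.column_stack([np.ones_like(filters), filters, blocks])

# Least-squares solution of A @ coef ~= rating
coef, _, _, _ = np.linalg.lstsq(A, rating, rcond=None)
intercept, b_filters, b_blocks = coef

print('Rating = {:0.0f} + (Filters x {:0.2f}) + (Blocks x {:0.2f})'.format(
    intercept, b_filters, b_blocks))
```

Filters and blocks grow together in this data set, so the two coefficients are hard to separate and should not be read as independent effects.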

Sample prediction calculation.

Code:

Sample prediction #1:
Filters = 120
Blocks = 8
Rating prediction: 2933

Sample prediction #2:
Filters = 64
Blocks = 8
Rating prediction: 2894
Not perfect, but perhaps prediction accuracy will improve as more data becomes available.
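
One way to put a number on "not perfect" is to look at the residuals (actual rating minus predicted rating) and the coefficient of determination R². The sketch below uses the intercept and coefficients exactly as printed in the sklearn output above; the names `predict_rating`, `residuals` and `r_squared` are made up for illustration.

```python
# Intercept and coefficients as printed in the sklearn output above
INTERCEPT = 2584.2790502045655
COEF_FILTERS = 0.69001166
COEF_BLOCKS = 33.18384124

# (filters, blocks, rating) rows from the data table above
DATA = [
    (112, 12, 2988), (120, 9, 2981), (104, 9, 2977), (80, 7, 2977),
    (112, 10, 2966), (64, 6, 2966), (128, 9, 2958), (120, 10, 2946),
    (96, 8, 2945), (128, 10, 2937), (48, 5, 2909), (32, 4, 2787),
    (24, 3, 2703), (16, 2, 2408),
]


def predict_rating(filters, blocks):
    """Apply the fitted linear formula to one network size."""
    return INTERCEPT + COEF_FILTERS * filters + COEF_BLOCKS * blocks


# Residual = actual rating minus predicted rating, keyed by (filters, blocks)
residuals = {(f, b): r - predict_rating(f, b) for f, b, r in DATA}

# R^2 = 1 - (residual sum of squares / total sum of squares)
mean_rating = sum(r for _, _, r in DATA) / len(DATA)
ss_res = sum(e * e for e in residuals.values())
ss_tot = sum((r - mean_rating) ** 2 for _, _, r in DATA)
r_squared = 1.0 - ss_res / ss_tot

# Network size with the largest absolute error
worst = max(residuals, key=lambda k: abs(residuals[k]))

for (f, b), e in sorted(residuals.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print('{:>3}x{:<2}  residual {:+7.1f}'.format(f, b, e))
print('R^2 = {:0.2f}, worst fit: {}x{}'.format(r_squared, *worst))
```

The largest miss is on the smallest net, where the rating drops much faster than a straight line in filters and blocks can follow, which suggests a non-linear model might fit this data better.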

Python source:

Code:

# -*- coding: utf-8 -*-
"""
mlr.py

Lc0 Distilled Networks Multiple Linear Regression

Credits:
    https://datatofish.com/multiple-linear-regression-python/
    https://github.com/dkappe/leela-chess-weights/wiki/Distilled-Networks
    
"""


from pandas import DataFrame
from sklearn import linear_model
import matplotlib.pyplot as plt


Lc0DistilledData = {
        'Filters': [112,120,104,80,112,64,128,120,96,128,48,32,24,16],
        'Blocks': [12,9,9,7,10,6,9,10,8,10,5,4,3,2],
        'Rating': [2988,2981,2977,2977,2966,2966,2958,2946,2945,2937,
                   2909,2787,2703,2408]
        }
 

def main():
    
    df = DataFrame(Lc0DistilledData, columns=['Filters','Blocks','Rating'])
    
    print('Data table source:')
    print('https://github.com/dkappe/leela-chess-weights/wiki/Distilled-Networks#11258-focused-tournament')
    print(df)
    
    print('\nSorted by Filters and Blocks descending:')
    # Sort on both keys in a single call; two chained sorts only work
    # if the second sort is stable, which the default algorithm does not guarantee.
    sorted_df = df.sort_values(by=['Filters', 'Blocks'], ascending=False)
    print(sorted_df)
     
    plt.figure(num=None, figsize=(6, 3), dpi=80)
    plt.scatter(df['Filters'], df['Rating'], color='red')
    plt.title('Rating Vs Filters', fontsize=14)
    plt.xlabel('Filters', fontsize=14)
    plt.ylabel('Rating', fontsize=14)
    plt.grid(True)
    plt.show()
     
    plt.figure(num=None, figsize=(6, 3), dpi=80)
    plt.scatter(df['Blocks'], df['Rating'], color='green')
    plt.title('Rating Vs Blocks', fontsize=14)
    plt.xlabel('Blocks', fontsize=14)
    plt.ylabel('Rating', fontsize=14)
    plt.grid(True) 
    plt.show()
    
    # Multiple regression: X holds the two features, Y the target rating
    X = df[['Filters','Blocks']]
    Y = df['Rating']
    
    # Using sklearn multiple linear regression
    regr = linear_model.LinearRegression()
    regr.fit(X, Y)
    
    print('sklearn multiple linear regression results:')    
    print('Intercept: {}'.format(regr.intercept_))
    print('Coefficients: {}'.format(regr.coef_))
    
    print('\nFormula:')
    print('Rating = {:0.0f} + (Filters x {:0.2f}) + (Blocks x {:0.2f})'.\
          format(regr.intercept_, regr.coef_[0], regr.coef_[1]))
    print('Independent variables: {} and {}'.format('Filters', 'Blocks'))
    
    # Prediction samples
    print('\nSample prediction #1:')
    new_filters = 120
    new_blocks = 8
    print('Filters = {}'.format(new_filters))
    print('Blocks = {}'.format(new_blocks))
    # Pass a DataFrame with the training column names; predict() returns
    # a 1-element array, so take element [0] rather than float(array).
    sample = DataFrame([[new_filters, new_blocks]], columns=['Filters', 'Blocks'])
    print('Rating prediction: {0:0.0f}'.format(regr.predict(sample)[0]))

    print('\nSample prediction #2:')
    new_filters = 64
    new_blocks = 8
    print('Filters = {}'.format(new_filters))
    print('Blocks = {}'.format(new_blocks))
    sample = DataFrame([[new_filters, new_blocks]], columns=['Filters', 'Blocks'])
    print('Rating prediction: {0:0.0f}'.format(regr.predict(sample)[0]))
    

if __name__ == '__main__':
    main()