Houston: We have lift off ...

Laskos · Post by **Laskos** » Mon Nov 19, 2018 6:39 am

Leto wrote: ↑Mon Nov 19, 2018 1:18 am
Laskos wrote: ↑Fri Nov 16, 2018 3:56 pm I don't know why you are so enthusiastic. Runs 20xxx and 30xxx are pretty pathetic, especially considering how many resources they have eaten up. Some folks there have overdone something. Just a quick check with the latest engine (rc4) and one of the latest nets:

TC: 60'' + 1''
Rank Name                          Elo     +/-   Games   Score   Draws
     SF8                           120      68      60   66.7%   43.3%
   
   1 lc0_v19_11261                   0     111      20   50.0%   50.0%
   2 lc0_v19_31214                -147     128      20   30.0%   40.0%
   3 lc0_v19_9155                 -241     127      20   20.0%   40.0%
Finished match
So, run 30xxx is still ~150 Elo points below run 10xxx, and barely ~100 Elo points above 6x64 net 9155 (run 9xxx). Taking into account that the games with the 6x64 net were 10-12 times faster, and taking into account the hardware resources allocated, the whole run 9xxx could have been completed in less than a day. Lame runs, these newest ones. But I still hope that they will improve some 200 real Elo points over the current level, although this is not guaranteed at all.
I don't think Test30 is this close to Test10 in strength; I still think it's several hundred Elo weaker. What's 60" + 1"? Is that a game in 1 minute with an extra second per move?

Yes, 1m + 1s. It is close, Test30 is about 100 Elo points weaker than Test10.

Laskos · Post by **Laskos** » Mon Nov 19, 2018 6:41 am

glennsamuel32 wrote: ↑Mon Nov 19, 2018 3:52 am I ran a 1000 game tournament at 1 node, using SALC v5 500 positions, reversed colors, 6-man egtb, no adj...
Flags used were --weights and --syzygy

Rank Name    Elo    +    - games score oppo. draws
   1 31311     9   10   10  1000   52%    -9   28%
   2 31255    -9   10   10  1000   48%     9   28%

Still self-play, and I'm not sure what to make of a 1-node result.

glennsamuel32 · Post by **glennsamuel32** » Mon Nov 19, 2018 7:39 am

Laskos wrote: ↑Mon Nov 19, 2018 6:41 am
glennsamuel32 wrote: ↑Mon Nov 19, 2018 3:52 am I ran a 1000 game tournament at 1 node, using SALC v5 500 positions, reversed colors, 6-man egtb, no adj...
Flags used were --weights and --syzygy

Rank Name    Elo    +    - games score oppo. draws
   1 31311     9   10   10  1000   52%    -9   28%
   2 31255    -9   10   10  1000   48%     9   28%
Still self-play, and I'm not sure what to make of a 1-node result.

Yes, I realized my mistake. Please ignore the last test.
Too many obvious wins became draws due to horrible endgame play.
I even saw a KQQQ vs K play out to a draw

Therefore, I used SF8 as a sparring partner.
I think 1-node games with a large sample test not only the network's policy but also its all-round strength.

Are the results below comparable to yours?

Score of 31311 vs stockfish_8_x64_bmi2: 58 - 806 - 136 [0.126]
Elo difference: -336.46 +/- 27.08
1000 of 1000 games finished.

Score of 31330 vs stockfish_8_x64_bmi2: 74 - 780 - 146 [0.147]
Elo difference: -305.45 +/- 25.68
1000 of 1000 games finished.

Bayeselo Ratings
================

Network   Selfplay ELO   Real ELO vs SF8
=======   ============   ===============
31311          5783.59              -161
31330          6043.22              -148
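
The "Elo difference" figures in the two match results above are consistent with the standard logistic Elo formula applied to the match score (wins plus half the draws, divided by games). A minimal sketch of that conversion follows; the function name `elo_diff` is made up for illustration, and the error margins (+/-) are not reproduced here.

```python
import math

def elo_diff(wins: int, losses: int, draws: int) -> float:
    """Elo difference implied by a match result under the logistic Elo model.

    Hypothetical helper for illustration; it is not taken from any
    particular tournament tool.
    """
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games  # fraction of points scored
    if not 0.0 < score < 1.0:
        raise ValueError("Elo difference is undefined for a 0% or 100% score")
    return -400.0 * math.log10(1.0 / score - 1.0)

# The two results quoted in the post (wins, losses, draws):
print(round(elo_diff(58, 806, 136), 2))   # about -336.46
print(round(elo_diff(74, 780, 146), 2))   # about -305.45
```

The same formula maps the 30.0% and 20.0% scores in Laskos's table to roughly -147 and -241 Elo, which matches the values listed there.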

jp · Post by **jp** » Mon Nov 19, 2018 8:13 am

glennsamuel32 wrote: ↑Mon Nov 19, 2018 7:39 am
I ran a 1000 game tournament at 1 node, using SALC v5 500 positions, reversed colors, 6-man egtb, no adj...
Flags used were --weights and --syzygy
---
Yes, I realized my mistake. Please ignore the last test.
Too many obvious wins became draws due to horrible endgame play.
I even saw a KQQQ vs K play out to a draw

I guess this was because it was 1 node. Did it just repeat moves?

glennsamuel32 · Post by **glennsamuel32** » Mon Nov 19, 2018 8:34 am

jp wrote: ↑Mon Nov 19, 2018 8:13 am
glennsamuel32 wrote: ↑Mon Nov 19, 2018 7:39 am
I ran a 1000 game tournament at 1 node, using SALC v5 500 positions, reversed colors, 6-man egtb, no adj...
Flags used were --weights and --syzygy
---
Yes, I realized my mistake. Please ignore the last test.
Too many obvious wins became draws due to horrible endgame play.
I even saw a KQQQ vs K play out to a draw
I guess this was because it was 1 node. Did it just repeat moves?

I didn't bother to check.
But basically it was ridiculous endgame play, with 2 networks in a tournament.

Werewolf · Post by **Werewolf** » Mon Nov 19, 2018 9:34 am

Over 1000 points now!

That graph

Laskos · Post by **Laskos** » Mon Nov 19, 2018 10:30 am

Werewolf wrote: ↑Mon Nov 19, 2018 9:34 am Over 1000 points now!

That graph

It's all fake. 0 gain, at most 50 Elo points with some nets.

It's not

It's

whereagles · Post by **whereagles** » Mon Nov 19, 2018 11:35 am

How can it rise to >6000 and still suck compared to 11248? I don't get it.

MikeB · Post by **MikeB** » Mon Nov 19, 2018 1:51 pm

Laskos wrote: ↑Mon Nov 19, 2018 10:30 am
Werewolf wrote: ↑Mon Nov 19, 2018 9:34 am Over 1000 points now!

That graph
It's all fake. 0 gain, at most 50 Elo points with some nets.

It's not
It's

You're right, of course. It's starting to remind me of the Seinfeld episode where Elaine says "...fake, fake, fake, fake..." and then smiles:

https://www.youtube.com/watch?v=ywi9-MGUCy8

chrisw · Post by **chrisw** » Mon Nov 19, 2018 1:55 pm

whereagles wrote: ↑Mon Nov 19, 2018 11:35 am How can it rise to >6000 and still suck compared to 11248? I don't get it.

Presumably because the self-play learning generalises well at first, but eventually ends up fitting only to itself and no longer generalises.

Actually, someone could possibly test this. As far as I know, the self-play Elo chart is generated by testing iteration x+1 against iteration x, and so on.
Did anybody try testing, say, iteration 50 against iteration 50 + 100 or whatever? In other words, does the additive Elo gain that shows up in the chart still exist when measured directly over a range of the chart?
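
The check proposed here can be sketched as follows. This is only an illustration, not how the project's chart is actually produced: the function names and all numbers are hypothetical, and it assumes (as the post does) that the chart is a running sum of gains measured between consecutive nets.

```python
from itertools import accumulate

def chained_elo(step_gains, start=0.0):
    """Chart values: start plus the running sum of adjacent-net Elo gains.

    step_gains[k] is the measured gain of net k+1 over net k.
    """
    return list(accumulate(step_gains, initial=start))

def additivity_gap(step_gains, i, j, direct_elo):
    """Chart-implied gain from net i to net j minus the directly measured gain.

    A value near zero means the chained gains really add up; a large
    positive value means the chart overstates the real improvement.
    """
    chart = chained_elo(step_gains)
    implied = chart[j] - chart[i]
    return implied - direct_elo

# Hypothetical numbers: 100 consecutive steps of +10 Elo each on the chart,
# but a direct match between net 0 and net 100 shows only +50 Elo.
gains = [10.0] * 100
print(additivity_gap(gains, 0, 100, direct_elo=50.0))  # 950.0
```

A direct match between two widely separated nets supplies `direct_elo`; comparing it with the chart-implied gain shows whether the per-step gains are additive over that range.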
