Re: Houston: We have lift off ...
Posted: Mon Nov 19, 2018 6:39 am
by Laskos
Leto wrote: ↑Mon Nov 19, 2018 1:18 am
Laskos wrote: ↑Fri Nov 16, 2018 3:56 pm
I don't know why you are so enthusiastic. Runs 20xxx and 30xxx are pretty pathetic, especially considering how many resources they have eaten up. Some folks there have overdone something. Just a quick check with the latest engine (rc4) and one of the latest nets:
TC: 60'' + 1''
Rank  Name            Elo   +/-  Games  Score  Draws
      SF8             120    68     60  66.7%  43.3%
   1  lc0_v19_11261     0   111     20  50.0%  50.0%
   2  lc0_v19_31214  -147   128     20  30.0%  40.0%
   3  lc0_v19_9155   -241   127     20  20.0%  40.0%
Finished match
So, run 30xxx is still ~150 Elo points below run 10xxx, and barely ~100 Elo points above 6x64 net 9155 (run 9xxx). Taking into account that the games with the 6x64 net were 10-12 times faster, and taking into account the hardware resources allocated, the whole run 9xxx could have been completed in less than a day. Lame runs, these newest ones. But I still hope that they will improve some 200 real Elo points over the current level, although this is not guaranteed at all.
I don't think Test30 is this close to Test10 in strength; I still think it's several hundred Elo weaker. What's 60" + 1"? Is that a game in 1 minute with an extra second per move?
Yes, 1m + 1s. It is close, Test30 is about 100 Elo points weaker than Test10.
Re: Houston: We have lift off ...
Posted: Mon Nov 19, 2018 6:41 am
by Laskos
glennsamuel32 wrote: ↑Mon Nov 19, 2018 3:52 am
I ran a 1000 game tournament at 1 node, using SALC v5 500 positions, reversed colors, 6-man egtb, no adj...
Flags used were --weights and --syzygy
Rank  Name   Elo   +   -  games  score  oppo.  draws
   1  31311    9  10  10   1000    52%     -9    28%
   2  31255   -9  10  10   1000    48%      9    28%
Still self-play, and I'm not sure what to make of a 1-node result.
Re: Houston: We have lift off ...
Posted: Mon Nov 19, 2018 7:39 am
by glennsamuel32
Laskos wrote: ↑Mon Nov 19, 2018 6:41 am
glennsamuel32 wrote: ↑Mon Nov 19, 2018 3:52 am
I ran a 1000 game tournament at 1 node, using SALC v5 500 positions, reversed colors, 6-man egtb, no adj...
Flags used were --weights and --syzygy
Rank  Name   Elo   +   -  games  score  oppo.  draws
   1  31311    9  10  10   1000    52%     -9    28%
   2  31255   -9  10  10   1000    48%      9    28%
Still self-play, and I'm not sure what to make of a 1-node result.
Yes, I realized my mistake. Please ignore the last test.
Too many obvious wins became draws due to horrible endgame play.
I even saw a KQQQ vs K play out to a draw
Therefore, I used SF8 as a sparring partner.
I think 1-node games with a high sample number test not only the policy of the network, but also its all-round strength.
Are the results below comparable to yours?
Score of 31311 vs stockfish_8_x64_bmi2: 58 - 806 - 136 [0.126]
Elo difference: -336.46 +/- 27.08
1000 of 1000 games finished.
Score of 31330 vs stockfish_8_x64_bmi2: 74 - 780 - 146 [0.147]
Elo difference: -305.45 +/- 25.68
1000 of 1000 games finished.
Bayeselo Ratings
================
Network  Selfplay ELO  Real ELO vs SF8
=======  ============  ===============
31311    5783.59       -161
31330    6043.22       -148
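For anyone wanting to check the "Elo difference" lines in the two SF8 matches above, here is a minimal sketch of the usual conversion from a win-loss-draw record to an Elo difference under the logistic model. The function names `score_fraction` and `elo_difference` are made up for illustration, and the sketch does not try to reproduce the +/- error margins or the Bayeselo figures.

```python
import math


def score_fraction(wins, losses, draws):
    """Fraction of points scored: a win counts 1, a draw counts 0.5."""
    games = wins + losses + draws
    return (wins + 0.5 * draws) / games


def elo_difference(score):
    """Elo difference implied by a score fraction (logistic Elo model).

    Undefined for a score of exactly 0 or 1, so those are rejected.
    """
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


# Win-loss-draw records taken from the two matches against SF8 above.
matches = {
    "31311": (58, 806, 136),
    "31330": (74, 780, 146),
}

for net, (w, l, d) in matches.items():
    s = score_fraction(w, l, d)
    print(f"{net}: score {s:.3f}, Elo difference {elo_difference(s):.2f}")
```

The same formula applied to the percentage scores in the 60'' + 1'' table earlier in the thread (for example 30.0% or 20.0%) gives roughly the Elo values listed there.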
Re: Houston: We have lift off ...
Posted: Mon Nov 19, 2018 8:13 am
by jp
glennsamuel32 wrote: ↑Mon Nov 19, 2018 7:39 am
I ran a 1000 game tournament at 1 node, using SALC v5 500 positions, reversed colors, 6-man egtb, no adj...
Flags used were --weights and --syzygy
---
Yes, I realized my mistake. Please ignore the last test.
Too many obvious wins became draws due to horrible endgame play.
I even saw a KQQQ vs K play out to a draw
I guess this was because it was 1 node. Did it just repeat moves?
Re: Houston: We have lift off ...
Posted: Mon Nov 19, 2018 8:34 am
by glennsamuel32
jp wrote: ↑Mon Nov 19, 2018 8:13 am
glennsamuel32 wrote: ↑Mon Nov 19, 2018 7:39 am
I ran a 1000 game tournament at 1 node, using SALC v5 500 positions, reversed colors, 6-man egtb, no adj...
Flags used were --weights and --syzygy
---
Yes, I realized my mistake. Please ignore the last test.
Too many obvious wins became draws due to horrible endgame play.
I even saw a KQQQ vs K play out to a draw
I guess this was because it was 1 node. Did it just repeat moves?
I didn't bother to check.
But basically ridiculous endgame play with 2 networks in a tournament.
Re: Houston: We have lift off ...
Posted: Mon Nov 19, 2018 9:34 am
by Werewolf
Over 1000 points now!
That graph
Re: Houston: We have lift off ...
Posted: Mon Nov 19, 2018 10:30 am
by Laskos
Werewolf wrote: ↑Mon Nov 19, 2018 9:34 am
Over 1000 points now!
That graph
It's all fake. 0 gain, at most 50 Elo points with some nets.
It's not
It's
Re: Houston: We have lift off ...
Posted: Mon Nov 19, 2018 11:35 am
by whereagles
How can it rise to >6000 and still suck compared to 11248? I don't get it.
Re: Houston: We have lift off ...
Posted: Mon Nov 19, 2018 1:51 pm
by MikeB
Laskos wrote: ↑Mon Nov 19, 2018 10:30 am
Werewolf wrote: ↑Mon Nov 19, 2018 9:34 am
Over 1000 points now!
That graph
It's all fake. 0 gain, at most 50 Elo points with some nets.
It's not
It's
You're right of course. It's starting to remind me of the Seinfeld episode where Elaine says "...fake, fake, fake, fake..." and then smiles:
https://www.youtube.com/watch?v=ywi9-MGUCy8
Re: Houston: We have lift off ...
Posted: Mon Nov 19, 2018 1:55 pm
by chrisw
whereagles wrote: ↑Mon Nov 19, 2018 11:35 am
How can it rise to >6000 and still suck compared to 11248? I don't get it.
Presumably because the self-play learning generalises well at first, but eventually ends up fitting only to itself and no longer generalises.
Actually, someone could possibly test this. As far as I know, the self-play Elo chart is generated by testing iteration x+1 against iteration x, and so on.
Did anybody try testing, say, iteration 50 against iteration 50 + 100 or whatever? In other words, does the additive Elo gain showing up in the chart still exist when measured directly over a range of the chart?
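A rough sketch of the check being proposed: sum the consecutive-iteration gains the chart is built from, then compare that chained total with an Elo difference measured in a direct match between the two endpoint nets. The helper names `chained_elo` and `additivity_gap` and every number below are invented placeholders for illustration, not measurements from any run.

```python
from itertools import accumulate


def chained_elo(step_gains):
    """Running self-play rating, built by adding up x -> x+1 gains from 0."""
    return [0.0] + list(accumulate(step_gains))


def additivity_gap(step_gains, start, end, direct_elo):
    """Chained gain from iteration `start` to `end` minus the directly
    measured Elo difference between those two iterations.

    A gap near 0 means the chart's gains really are additive; a large
    positive gap means the chart overstates the real improvement.
    """
    chained = sum(step_gains[start:end])
    return chained - direct_elo


# Placeholder values only: three consecutive gains and one direct match.
gains = [30.0, 25.0, 20.0]
print(chained_elo(gains))
print(additivity_gap(gains, 0, 3, direct_elo=40.0))
```

With real data, `direct_elo` would come from an actual match between the two nets, ideally against varied openings so the result is not dominated by self-play quirks.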