Laskos wrote: ↑Fri Nov 16, 2018 2:56 pm
I don't know why you are so enthusiastic. Runs 20xxx and 30xxx are pretty pathetic, especially considering how many resources they have eaten up. Some folks there have overdone something. Just a quick check with the latest engine (rc4) and one of the latest nets:
TC: 60'' + 1''

Rank  Name            Elo  +/-  Games  Score  Draws
      SF8             120   68     60  66.7%  43.3%
   1  lc0_v19_11261     0  111     20  50.0%  50.0%
   2  lc0_v19_31214  -147  128     20  30.0%  40.0%
   3  lc0_v19_9155   -241  127     20  20.0%  40.0%
Finished match

So, run 30xxx is still ~150 Elo points below run 10xxx, and barely ~100 Elo points above the 6x64 net 9155 (run 9xxx). Considering that the games with the 6x64 net were 10-12 times faster, and considering the hardware resources allocated, the whole run 9xxx could have been completed in less than a day. Lame runs, these newest ones. But I still hope that they will improve some 200 real Elo points over the current level, although this is not guaranteed at all.

Leto wrote: ↑Mon Nov 19, 2018 12:18 am
I don't think Test30 is this close to Test10 in strength; I still think it's several hundred Elo weaker. What's 60" + 1"? Is that a game in 1 minute with an extra second per move?

Yes, 1m + 1s. It is close: Test30 is about 100 Elo points weaker than Test10.
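The Elo column in the 60'' + 1'' gauntlet table appears to follow the standard logistic relation between expected score and rating difference, Elo = -400 · log10(1/score - 1). A minimal sketch, assuming that formula (the dictionary labels are just the table's engine names), recovers the column from the Score percentages. SF8's 66.7% is taken as exactly 40/60, since the three lc0 nets scored 10 + 6 + 4 = 20 of their 60 games.

```python
import math


def elo_from_score(score: float) -> float:
    """Logistic Elo difference implied by a score strictly between 0 and 1."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


# Scores from the gauntlet table (20 games per lc0 net against SF8).
results = {
    "SF8": 2 / 3,            # 40/60 over all three lc0 nets
    "lc0_v19_11261": 0.50,
    "lc0_v19_31214": 0.30,
    "lc0_v19_9155": 0.20,
}

for name, score in results.items():
    # round() also avoids printing "-0.0" for the 50% case
    print(f"{name:15s} {round(elo_from_score(score)):+d}")
```

After rounding this gives 120, 0, -147 and -241, which is where the "~150 below 10xxx" and "~100 above 9155" figures come from.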
Houston: We have lift off ...
Re: Houston: We have lift off ...
glennsamuel32 wrote: ↑Mon Nov 19, 2018 2:52 am
I ran a 1000-game tournament at 1 node, using SALC v5 500 positions, reversed colors, 6-man egtb, no adj...
Flags used were --weights and --syzygy

Rank  Name   Elo   +   -  games  score  oppo.  draws
   1  31311    9  10  10   1000    52%     -9    28%
   2  31255   -9  10  10   1000    48%      9    28%

Still self-play, and not sure what to make of a 1-node result.
Re: Houston: We have lift off ...
Laskos wrote: ↑Mon Nov 19, 2018 5:41 am
Still self-play, and not sure what to make of a 1-node result.

Yes, I realized my mistake. Please ignore the last test.
Too many obvious wins became draws due to horrible endgame play.
I even saw a KQQQ vs K play out to a draw.

Therefore, I used SF8 as a sparring partner.
I think 1-node games over a large number of games test not only the network's policy, but also its all-round strength.
Are the results below comparable to yours ?
Score of 31311 vs stockfish_8_x64_bmi2: 58 - 806 - 136 [0.126]
Elo difference: -336.46 +/- 27.08
1000 of 1000 games finished.
Score of 31330 vs stockfish_8_x64_bmi2: 74 - 780 - 146 [0.147]
Elo difference: -305.45 +/- 25.68
1000 of 1000 games finished.
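Both cutechess-style summaries can be reproduced from the win-loss-draw counts alone. The sketch below is modelled on how cutechess-cli appears to compute its numbers: score = (W + D/2)/N, the logistic Elo difference from that score, and a 95% margin from a normal approximation on the per-game result, converted back to Elo. The function names are hypothetical, and the exact cutechess internals are an assumption here.

```python
import math
from statistics import NormalDist


def elo_from_score(score: float) -> float:
    """Logistic Elo difference implied by a score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def match_elo(wins: int, losses: int, draws: int, confidence: float = 0.95):
    """Return (score, Elo difference, +/- margin) for a W-L-D record.

    Assumes the confidence interval on the score stays inside (0, 1).
    """
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Variance of a single game's result (1, 0.5 or 0) around the mean score.
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / n
    stderr = math.sqrt(variance / n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # about 1.96 for 95%
    low = elo_from_score(score - z * stderr)
    high = elo_from_score(score + z * stderr)
    return score, elo_from_score(score), (high - low) / 2.0


# W - L - D counts as printed by cutechess ("Score of X vs Y: W - L - D").
for net, (w, l, d) in {"31311": (58, 806, 136), "31330": (74, 780, 146)}.items():
    score, elo, margin = match_elo(w, l, d)
    print(f"{net}: score {score:.3f}, Elo {elo:.2f} +/- {margin:.2f}")
```

This reproduces the printed -336.46 and -305.45 exactly and lands within about a tenth of an Elo point of the printed margins; the small residual is presumably down to numerical details of the original implementation.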
Bayeselo Ratings
================

Network  Self-play Elo  Real Elo vs SF8
=======  =============  ===============
31311          5783.59             -161
31330          6043.22             -148
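A quick bit of arithmetic on this table shows the mismatch being discussed in the thread: if Elo were transitive across opponents, the gap between the two nets on the self-play scale should be roughly the same as their gap when both play SF8. A minimal sketch using only the numbers posted here (the Bayeselo and cutechess figures against SF8):

```python
def expected_score(delta: float) -> float:
    """Expected score of the stronger side for a logistic Elo gap `delta`."""
    return 1.0 / (1.0 + 10.0 ** (-delta / 400.0))


selfplay = {"31311": 5783.59, "31330": 6043.22}
bayeselo_vs_sf8 = {"31311": -161, "31330": -148}
cutechess_vs_sf8 = {"31311": -336.46, "31330": -305.45}

selfplay_gap = selfplay["31330"] - selfplay["31311"]
bayes_gap = bayeselo_vs_sf8["31330"] - bayeselo_vs_sf8["31311"]
cute_gap = cutechess_vs_sf8["31330"] - cutechess_vs_sf8["31311"]

print(f"self-play gap: {selfplay_gap:.2f} Elo "
      f"(would imply {expected_score(selfplay_gap):.1%} head-to-head)")
print(f"gap vs SF8, Bayeselo: {bayes_gap} Elo")
print(f"gap vs SF8, cutechess logistic: {cute_gap:.2f} Elo")
```

The self-play chart puts about 260 Elo between the two nets (roughly an 82% head-to-head score if taken at face value), while against SF8 the gap is only about 13 to 31 Elo.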
Judge without bias, or don't judge at all...
Re: Houston: We have lift off ...
glennsamuel32 wrote: ↑Mon Nov 19, 2018 6:39 am
Yes, I realized my mistake. Please ignore the last test.
Too many obvious wins became draws due to horrible endgame play.
I even saw a KQQQ vs K play out to a draw.

I guess this was because it was 1 node. Did it just repeat moves?
Re: Houston: We have lift off ...
jp wrote: ↑Mon Nov 19, 2018 7:13 am
I guess this was because it was 1 node. Did it just repeat moves?

I didn't bother to check.
But basically, the endgame play was ridiculous in a tournament between the 2 networks.
Re: Houston: We have lift off ...
Over 1000 points now!
That graph: [image not preserved]
Re: Houston: We have lift off ...
How can it rise to >6000 and still suck compared to 11248? I don't get it.

Re: Houston: We have lift off ...
You're right, of course. It's starting to remind me of the Seinfeld episode where Elaine says "...fake, fake, fake, fake..." and then smiles:
https://www.youtube.com/watch?v=ywi9-MGUCy8
Re: Houston: We have lift off ...
whereagles wrote: ↑Mon Nov 19, 2018 10:35 am
How can it rise to >6000 and still suck compared to 11248? I don't get it.

Presumably because the self-play learning generalises well at first, but eventually ends up fitting only to itself and no longer generalising.
Actually, someone could test this. As far as I know, the self-play Elo chart is generated by testing iteration x+1 against iteration x, and so on.
Did anybody try testing, say, iteration 50 against iteration 50 + 100 or so? In other words, does the additive Elo gain shown in the chart still hold when measured directly across a range of the chart?
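The proposed check can be sketched as follows, assuming (as the post describes) that the chart is built purely by chaining consecutive-net matches. The cumulative sum of those increments gives the chart's claimed gain from net i to net j, which is then compared with the gain implied by a direct i-vs-j match. All function names are hypothetical, and the numbers in the example (200 nets at +10 each, a 75% direct score) are made up for illustration only; they are not measured results.

```python
import math
from itertools import accumulate


def score_to_elo(score: float) -> float:
    """Logistic Elo difference implied by a score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def chart_ratings(increments):
    """Chained chart: rating[k] is the sum of the first k consecutive-match gains."""
    return [0.0] + list(accumulate(increments))


def additivity_check(increments, i, j, direct_score):
    """Compare the chart's gain from net i to net j with a direct match.

    direct_score is net j's score against net i when they play each other.
    Returns (chart gain, direct gain, chart gain minus direct gain).
    """
    ratings = chart_ratings(increments)
    chart_gain = ratings[j] - ratings[i]
    direct_gain = score_to_elo(direct_score)
    return chart_gain, direct_gain, chart_gain - direct_gain


# Hypothetical illustration only: every net measured +10 over its predecessor,
# and an assumed 75% score for net 150 against net 50 in a direct match.
increments = [10.0] * 200
chart_gain, direct_gain, shortfall = additivity_check(increments, 50, 150, 0.75)
print(f"chart: {chart_gain:.0f}  direct: {direct_gain:.0f}  shortfall: {shortfall:.0f}")
```

A shortfall that keeps growing as the distance j - i grows would be the signature of the "fitting only to itself" effect suggested above; a shortfall near zero would mean the chart's gains really are additive over that range.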