Houston: We have lift off ...

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Houston: We have lift off ...

Post by Laskos »

Leto wrote: Mon Nov 19, 2018 1:18 am
Laskos wrote: Fri Nov 16, 2018 3:56 pm I don't know why you are so enthusiastic. Runs 20xxx and 30xxx are pretty pathetic, especially considering how many resources they have eaten up. Some folks there have overdone something. Just a quick check with the latest engine (rc4) and one of the latest nets:

TC: 60'' + 1''

Rank Name                          Elo     +/-   Games   Score   Draws
     SF8                           120      68      60   66.7%   43.3%
   
   1 lc0_v19_11261                   0     111      20   50.0%   50.0%
   2 lc0_v19_31214                -147     128      20   30.0%   40.0%
   3 lc0_v19_9155                 -241     127      20   20.0%   40.0%
Finished match
So, run 30xxx is still ~150 Elo points below run 10xxx, and barely ~100 Elo points above the 6x64 net 9155 (run 9xxx). Given that games with the 6x64 net were 10-12 times faster, and given the hardware resources allocated, the whole run 9xxx could have been completed in less than a day. Lame runs, these newest ones. But I still hope that they will improve some 200 real Elo points over the current level, although this is not guaranteed at all.
I don't think Test30 is this close to Test10 in strength; I still think it's several hundred Elo weaker. What's 60" + 1"? Is that game in 1 minute with an extra second per move?
Yes, 1m + 1s. It is close, Test30 is about 100 Elo points weaker than Test10.
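
For anyone wanting to relate the score percentages in the table to the Elo column, here is a minimal sketch of the usual logistic Elo conversion. The function names are my own, and I am assuming the table was produced with this standard formula (the numbers are consistent with it).

```python
import math

def expected_score(elo_diff: float) -> float:
    """Expected score (0..1) for a player rated elo_diff above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def elo_from_score(score: float) -> float:
    """Inverse of expected_score: Elo difference implied by a score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

# Score percentages taken from the gauntlet table above.
for name, score in [("lc0_v19_31214", 0.30), ("lc0_v19_9155", 0.20)]:
    print(f"{name}: {elo_from_score(score):+.0f} Elo")  # -147 and -241

# A 100 Elo deficit corresponds to an expected score of roughly 36%.
print(f"{expected_score(-100):.3f}")
```

With only 20 games per net, a single result shifts the score by 2.5 to 5 percentage points, which is why the error bars in the table are so wide.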
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Houston: We have lift off ...

Post by Laskos »

glennsamuel32 wrote: Mon Nov 19, 2018 3:52 am I ran a 1000 game tournament at 1 node, using SALC v5 500 positions, reversed colors, 6-man egtb, no adj...
Flags used were --weights and --syzygy

Rank  Name   Elo   +   -  games  score  oppo.  draws
   1  31311    9  10  10   1000    52%     -9    28%
   2  31255   -9  10  10   1000    48%      9    28%
Still self-play, and I'm not sure what to make of a 1-node result.
glennsamuel32
Posts: 136
Joined: Sat Dec 04, 2010 5:31 pm
Location: 223

Re: Houston: We have lift off ...

Post by glennsamuel32 »

Laskos wrote: Mon Nov 19, 2018 6:41 am
glennsamuel32 wrote: Mon Nov 19, 2018 3:52 am I ran a 1000 game tournament at 1 node, using SALC v5 500 positions, reversed colors, 6-man egtb, no adj...
Flags used were --weights and --syzygy

Rank  Name   Elo   +   -  games  score  oppo.  draws
   1  31311    9  10  10   1000    52%     -9    28%
   2  31255   -9  10  10   1000    48%      9    28%
Still self-play, and I'm not sure what to make of a 1-node result.
Yes, I realized my mistake. Please ignore the last test.
Too many obvious wins became draws due to horrible endgame play.
I even saw a KQQQ vs K play out to a draw :)
Therefore, I used SF8 as a sparring partner.
I think 1-node games with a high sample number test not only the policy of the network, but also its all-round strength.

Are the results below comparable to yours?

Score of 31311 vs stockfish_8_x64_bmi2: 58 - 806 - 136 [0.126]
Elo difference: -336.46 +/- 27.08
1000 of 1000 games finished.

Score of 31330 vs stockfish_8_x64_bmi2: 74 - 780 - 146 [0.147]
Elo difference: -305.45 +/- 25.68
1000 of 1000 games finished.
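
The Elo differences and error margins above can be roughly reproduced from the win-loss-draw counts. This is a minimal sketch, assuming a normal approximation with a 95% interval (z = 1.96); the helper names are mine, and the match tool may compute its margin slightly differently.

```python
import math

def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_with_margin(wins: int, losses: int, draws: int, z: float = 1.96):
    """Return (Elo difference, approximate +/- margin) from W-L-D counts."""
    n = wins + losses + draws
    win_frac, loss_frac, draw_frac = wins / n, losses / n, draws / n
    mean = win_frac + 0.5 * draw_frac
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (win_frac * (1.0 - mean) ** 2
           + draw_frac * (0.5 - mean) ** 2
           + loss_frac * (0.0 - mean) ** 2)
    std_err = math.sqrt(var / n)
    low, high = mean - z * std_err, mean + z * std_err
    # Half the width of the interval after mapping both ends to Elo.
    margin = (elo_from_score(high) - elo_from_score(low)) / 2.0
    return elo_from_score(mean), margin

# W-L-D counts from the two matches against stockfish_8_x64_bmi2 above.
for name, wld in [("31311", (58, 806, 136)), ("31330", (74, 780, 146))]:
    elo, margin = elo_with_margin(*wld)
    print(f"{name}: {elo:.2f} +/- {margin:.1f}")
```

The Elo values match the reported -336.46 and -305.45; the margins come out within a few hundredths of the reported figures under this approximation.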

Bayeselo Ratings
================


Network   Selfplay ELO   Real ELO vs SF8
=======   ============   ===============
31311     5783.59        -161
31330     6043.22        -148
Judge without bias, or don't judge at all...
jp
Posts: 1470
Joined: Mon Apr 23, 2018 7:54 am

Re: Houston: We have lift off ...

Post by jp »

glennsamuel32 wrote: Mon Nov 19, 2018 7:39 am
I ran a 1000 game tournament at 1 node, using SALC v5 500 positions, reversed colors, 6-man egtb, no adj...
Flags used were --weights and --syzygy
---
Yes, I realized my mistake. Please ignore the last test.
Too many obvious wins became draws due to horrible endgame play.
I even saw a KQQQ vs K play out to a draw :)
I guess this was because it was 1 node. Did it just repeat moves?
glennsamuel32
Posts: 136
Joined: Sat Dec 04, 2010 5:31 pm
Location: 223

Re: Houston: We have lift off ...

Post by glennsamuel32 »

jp wrote: Mon Nov 19, 2018 8:13 am
glennsamuel32 wrote: Mon Nov 19, 2018 7:39 am
I ran a 1000 game tournament at 1 node, using SALC v5 500 positions, reversed colors, 6-man egtb, no adj...
Flags used were --weights and --syzygy
---
Yes, I realized my mistake. Please ignore the last test.
Too many obvious wins became draws due to horrible endgame play.
I even saw a KQQQ vs K play out to a draw :)
I guess this was because it was 1 node. Did it just repeat moves?
I didn't bother to check.
But basically ridiculous endgame play with 2 networks in a tournament.
Werewolf
Posts: 1795
Joined: Thu Sep 18, 2008 10:24 pm

Re: Houston: We have lift off ...

Post by Werewolf »

Over 1000 points now!

That graph :shock: :shock: :shock:
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Houston: We have lift off ...

Post by Laskos »

Werewolf wrote: Mon Nov 19, 2018 9:34 am Over 1000 points now!

That graph :shock: :shock: :shock:
It's all fake. 0 gain, at most 50 Elo points with some nets.

It's not :shock:
It's :mrgreen:
whereagles
Posts: 565
Joined: Thu Nov 13, 2014 12:03 pm

Re: Houston: We have lift off ...

Post by whereagles »

How can it rise to >6000 and still suck compared to 11248? I don't get it :shock:
MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

Re: Houston: We have lift off ...

Post by MikeB »

Laskos wrote: Mon Nov 19, 2018 10:30 am
Werewolf wrote: Mon Nov 19, 2018 9:34 am Over 1000 points now!

That graph :shock: :shock: :shock:
It's all fake. 0 gain, at most 50 Elo points with some nets.

It's not :shock:
It's :mrgreen:
You're right, of course. It's starting to remind me of the Seinfeld episode where Elaine says "...fake, fake, fake, fake..." and then smiles:

https://www.youtube.com/watch?v=ywi9-MGUCy8
chrisw
Posts: 4313
Joined: Tue Apr 03, 2012 4:28 pm

Re: Houston: We have lift off ...

Post by chrisw »

whereagles wrote: Mon Nov 19, 2018 11:35 am How can it rise to >6000 and still suck compared to 11248? I don't get it :shock:
Presumably because the self-play learning generalises well at first, but eventually ends up fitting only to itself and no longer generalises.

Actually, someone could possibly test this. As far as I know, the self-play Elo chart is generated by testing iteration x+1 against iteration x, and so on.
Did anybody try testing, say, iteration 50 against iteration 50 + 100 or whatever? In other words, does the additive Elo gain showing up in the chart still exist when measured across a range of the chart?
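
A rough sketch of how such a check could be scored, assuming the chart simply sums the Elo gain of each net over its predecessor. All numbers below are invented purely for illustration; they are not measurements from any run.

```python
import math

def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def chained_elo(step_scores):
    """Sum of step-by-step Elo gains, as a cumulative self-play chart would show."""
    return sum(elo_from_score(s) for s in step_scores)

# HYPOTHETICAL data: ten consecutive nets, each scoring 55% against its predecessor.
step_scores = [0.55] * 10
# HYPOTHETICAL data: the newest net scoring 70% in a direct match against the oldest.
direct_score = 0.70

chained = chained_elo(step_scores)
direct = elo_from_score(direct_score)
print(f"chained gain: {chained:.0f} Elo")
print(f"direct gain:  {direct:.0f} Elo")
print(f"inflation:    {chained - direct:.0f} Elo")
```

If the direct gain over a long range came out far below the chained sum, as in this made-up case, that would suggest the step-by-step gains are not additive.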