But as far as I can tell there's little reason to conclude variety is the issue. Again, at longer TC, with a far lower Leela ratio, test10 is performing fine against SF with what I understand to be a high-quality book (albeit one whose moves are likely weaker than what SF10 would play on its own). You can see in the PGNs I linked that there's significantly more opening variety in the SF+book set.
I'd still like to see BookX distributed, because as of now I believe that either the quality of its moves (which should be testable at STC vs SF10) or the openings it favors give Lc0 trouble.
Alphazero news
-
- Posts: 1473
- Joined: Mon Apr 23, 2018 7:54 am
Re: Alphazero news
Just looking at Kai's results for rough numbers, I see for 40 games the error is ~ +/- 80 elo.

yanquis1972 wrote: ↑Tue Dec 18, 2018 11:52 am
I plan to run more, but it was pretty consistent, and has been across the board. So far SF10 + test10 (11248) are both outperforming SF10+book, albeit in small samples. (3W 16D 1L for SF10, 2W 3D 0L for id11248 — four draws and a loss against SF10.)
To judge the perfect2017 book with the quite small elo margin you got, you need the error to be much smaller than that.
Isn't whatever software you're using for the elo telling you what the errors are too?
Last edited by jp on Tue Dec 18, 2018 7:10 pm, edited 1 time in total.
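For rough numbers like the ~+/- 80 above, here is a minimal sketch in Python of how the approximate 95% Elo margin of an n-game match scales with the number of games; the draw rates are my assumptions, not reported figures.

```python
import math

def elo_margin_95(n_games, draw_rate=0.5, score=0.5):
    """Approximate 95% Elo confidence margin for a match of n_games."""
    # Split the decisive games so the mean per-game result equals `score`.
    win = score - draw_rate / 2
    loss = 1.0 - win - draw_rate
    # Per-game variance of results in {1, 0.5, 0} around the mean score.
    var = (win * (1 - score) ** 2
           + draw_rate * (0.5 - score) ** 2
           + loss * (0 - score) ** 2)
    se_score = math.sqrt(var / n_games)
    # Convert a score deviation near `score` into Elo via the logistic slope
    # dElo/dscore = 400 / (ln 10 * score * (1 - score)).
    slope = 400 / (math.log(10) * score * (1 - score))
    return 1.96 * se_score * slope

print(round(elo_margin_95(40)))                   # ~76, in line with "~ +/- 80" for 40 games
print(round(elo_margin_95(500, draw_rate=0.38)))  # ~24, like the margins in the 500-game runs below
```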
-
- Posts: 10460
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: Alphazero news
Stockfish basically tests what people suggest to test, and there are no rules against testing patches that should help in the opening.

jp wrote: ↑Mon Dec 17, 2018 11:57 pm
We can look at the online SF log of suggested changes tested. Many are technical (e.g. pruning, time management, etc.) suggestions. Many are tactical suggestions. There are suggestions for endgame tweaks. I cannot see any that are even vaguely to do with the opening. (But have a look for yourself. I didn't look at every entry, of course, only 2018 ones.)

Uri Blass wrote: ↑Mon Dec 17, 2018 10:06 pm
I do not understand.

jp wrote: ↑Mon Dec 17, 2018 8:54 pm
They are not specifically testing for opening play & there is no code specific for opening play.

Uri Blass wrote: ↑Mon Dec 17, 2018 8:43 pm
I think that your assumption is wrong, at least for stockfish.
The authors of stockfish do not use a big book or a special strong book when they test changes in the code, because they assume stockfish will not need to play the opening by itself.
The book that they use is 2moves_v1.pgn, which, as I understand it, contains 2 random moves by white and black.

If they were trying to develop its opening play, they would not be using 2moves_v1.pgn.

2moves_v1.pgn means that stockfish plays the opening by itself in testing, so changes that help in the opening stage can be productive.
The point is not what they change in the code, but that they basically also test stockfish in the opening stage. So it is not that they do not care about the opening, start testing from middlegame positions, and would accept a patch that makes stockfish bad in the opening; on the contrary, they may accept patches that make stockfish stronger in the opening stage.
Pruning or time management patches can also help in the opening, and playing the opening better does not have to mean changing evaluation terms that are specific to the opening.
Patches that change something in the evaluation of the middlegame can also help in the opening, because through search stockfish reaches middlegame positions even from the opening.
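To illustrate what a 2moves_v1-style book is, here is a minimal sketch (assuming the python-chess package; this is not the script that actually produced 2moves_v1, and it does no filtering for balance or quality) that plays two random legal moves per side and writes the lines out as PGN.

```python
import random

import chess
import chess.pgn

random.seed(0)
lines = set()
while len(lines) < 100:          # the real 2moves_v1.pgn has 40,000+ lines
    board = chess.Board()
    for _ in range(4):           # two moves by White and two by Black
        board.push(random.choice(list(board.legal_moves)))
    lines.add(tuple(board.move_stack))

with open("2moves_sample.pgn", "w") as f:
    for moves in lines:
        game = chess.pgn.Game()
        node = game
        for move in moves:
            node = node.add_variation(move)
        print(game, file=f, end="\n\n")
```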
-
- Posts: 1473
- Joined: Mon Apr 23, 2018 7:54 am
Re: Alphazero news
I don't see any of the SF tests directly related to openings. Sure, if some idea makes the opening much worse or better, then the test will fail or pass. But there's no reason to think those ideas are changing the SF opening play a lot, especially when you see lots of tests about endgame stuff like passed pawns. If they were motivated to improve the bookless opening play, it'd be different. The SF team does not forbid opening ideas. It's just that I don't see any in the log. Look at the log.
Last edited by jp on Tue Dec 18, 2018 7:15 pm, edited 1 time in total.
-
- Posts: 1766
- Joined: Wed Jun 03, 2009 12:14 am
Re: Alphazero news
No, not in the GUI itself anyway, but you can probably find a program or site online that'll do so with w/d/l info. After 18 more games (59 vs each) the score is +2 =43 -14 vs SF10 and +4 =48 -7 against SF10+book (now perfect2018).

jp wrote: ↑Tue Dec 18, 2018 7:03 pm
Just looking at Kai's results for rough numbers, I see for 40 games the error is ~ +/- 80 elo.

yanquis1972 wrote: ↑Tue Dec 18, 2018 11:52 am
I plan to run more, but it was pretty consistent, and has been across the board. So far SF10 + test10 (11248) are both outperforming SF10+book, albeit in small samples. (3W 16D 1L for SF10, 2W 3D 0L for id11248 — four draws and a loss against SF10.)

To judge the perfect2017 book with the quite small elo margin you got, you need the error to be much smaller than that.
Isn't whatever software you're using for the elo telling you what the errors are too?
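For w/d/l scores like the ones above, a minimal sketch in Python of the usual logistic conversion to an Elo difference (this gives the point estimate only; error bars need the variance, as in the earlier sketch):

```python
import math

def elo_diff(wins, draws, losses):
    """Elo difference implied by a W/D/L score under the usual logistic model."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    return 400 * math.log10(score / (1 - score))

print(round(elo_diff(2, 43, 14)))  # ~ -72 for the +2 =43 -14 vs SF10 above
print(round(elo_diff(4, 48, 7)))   # ~ -18 for the +4 =48 -7 vs SF10+book
```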
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Alphazero news
Now, my results at ultra-fast time control: Lc0 at 10s+0.1s, SF10 at 8s+0.08s. Both with Overhead set to 0 ms and timemargin set to 100 ms in Cutechess-Cli, so that overstepping the clock by several milliseconds does not cause time losses.
500 games each match.
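For reference, a rough sketch (my reconstruction, not Kai's actual command) of a Cutechess-Cli invocation matching the settings described above, wrapped in Python; the binary paths, the opening file, and the engines' overhead option names are assumptions and vary by build.

```python
import subprocess

# Assumed option spellings: "Move Overhead" is Stockfish's UCI option,
# "MoveOverheadMs" is (I believe) Lc0's; timemargin=100 lets cutechess-cli
# tolerate up to 100 ms of clock overstep before scoring a time loss.
cmd = [
    "cutechess-cli",
    "-engine", "name=lc0_v191_11261", "cmd=./lc0", "proto=uci",
    "tc=10+0.1", "option.MoveOverheadMs=0",
    "-engine", "name=SF10", "cmd=./stockfish_10", "proto=uci",
    "tc=8+0.08", "option.Move Overhead=0",
    "-each", "timemargin=100",
    "-openings", "file=2moves_v1.pgn", "format=pgn", "order=random",
    "-rounds", "250", "-games", "2", "-repeat",   # 500 games, each opening with both colors
    "-pgnout", "lc0_vs_sf10.pgn",
]
subprocess.run(cmd, check=True)
```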
From the Initial Board position (their dominant opening in tests):
Score of lc0_v191_11261 vs SF10: 179 - 131 - 190 [0.548] 500
Elo difference: 33.46 +/- 24.07
Finished match
From 12 "human openings":
Score of lc0_v191_11261 vs SF10: 176 - 119 - 205 [0.557] 500
Elo difference: 39.78 +/- 23.44
Finished match
From 50 TCEC 9 positions:
Score of lc0_v191_11261 vs SF10: 146 - 168 - 186 [0.478] 500
Elo difference: -15.30 +/- 24.15
Finished match
=======================================================================
The above tests correspond to the conditions used in their paper. Now, here is what they could have done instead. The first test (Initial Board) has very restricted variety and a huge systematic error, but they used it extensively. The "human openings" seem a bit cherry-picked.
From 2moves_v1.pgn (40,000+ random 2-movers):
Score of lc0_v191_11261 vs SF10: 155 - 178 - 167 [0.477] 500
Elo difference: -15.99 +/- 24.87
Finished match
From AdamHair_8moves_100133.pgn (a large set of fairly balanced, very varied 8-movers, built and used by Adam Hair, who is, among other things, a CCRL tester):
Score of lc0_v191_11261 vs SF10: 126 - 195 - 179 [0.431] 500
Elo difference: -48.25 +/- 24.51
Finished match
SF10 using BookX.bin (not an 8-mover set, but a 15 MB, good-quality, small yet varied polyglot book; against Lc0, SF usually exits the book within 4-14 moves):
Score of lc0_v191_11261 vs SF10: 98 - 216 - 186 [0.382] 500
Elo difference: -83.57 +/- 24.41
Finished match
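A minimal sketch, assuming the python-chess package and a local copy of BookX.bin (file names here are placeholders), of one way to measure how many plies of a game stay within a polyglot book, i.e. where SF would drop out of book:

```python
import chess
import chess.pgn
import chess.polyglot

def book_depth(game: chess.pgn.Game, book_path: str = "BookX.bin") -> int:
    """Count plies from the start of `game` for which the book has at least
    one entry for the current position (regardless of the move actually played)."""
    board = game.board()
    depth = 0
    with chess.polyglot.open_reader(book_path) as reader:
        for move in game.mainline_moves():
            if not list(reader.find_all(board)):   # position no longer in book
                break
            board.push(move)
            depth += 1
    return depth

# Usage (hypothetical PGN file name):
# with open("lc0_vs_sf10.pgn") as f:
#     game = chess.pgn.read_game(f)
# print(book_depth(game))
```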
=======================================================================
All in all, as I already said, the paper's most representative result, the one with at least some opening variety, is their TCEC openings test. And that test shows that A0 is not really destroying SF8, and is probably not above SF10 in strength from those openings. They could have gotten even worse results for A0 by using very varied 8-mover PGN opening positions or by allowing SF8 a varied book.
-
- Posts: 1766
- Joined: Wed Jun 03, 2009 12:14 am
Re: Alphazero news
After 93 games at 5+5, test30 is stable at -70 elo vs SF10 & -21 elo vs SF10+book (perfect2017/2018). Will attach PGNs later; going to try to get a decent sample of SF10 vs SF10+book results.
-
- Posts: 3425
- Joined: Wed Mar 08, 2006 8:15 pm
Re: Alphazero news
We need a LIVE match of A0 vs SF10/SFdev to clarify the current status. A one-year-old paper isn't enough!
Jouni
-
- Posts: 1473
- Joined: Mon Apr 23, 2018 7:54 am
Re: Alphazero news
What is the size of perfect2017/2018?

yanquis1972 wrote: ↑Wed Dec 19, 2018 6:22 pm
After 93 games at 5+5, test30 is stable at -70 elo vs SF10 & -21 elo vs SF10+book (perfect2017/2018). Will attach PGNs later; going to try to get a decent sample of SF10 vs SF10+book results.
-
- Posts: 1766
- Joined: Wed Jun 03, 2009 12:14 am
Re: Alphazero news
There's a link and summary here: https://sites.google.com/site/computers ... t2017books
It’s meant for engine-engine matches, seems to be a good compromise of quality and diversity.