But as far as I can tell there's little reason to conclude variety is the issue. Again, at longer TC, with a far lower Leela ratio, test10 is performing fine against SF with what I understand to be a high-quality book (albeit one whose moves are likely weaker than what SF10 would play on its own). You can see in the PGNs I linked that there's significantly more opening variety in the SF+book set.
I'd still like to see BookX distributed, because as of now I believe that either the quality of its moves (which should be testable at STC vs SF10) or the openings it favors give Lc0 trouble.
Alphazero news
-
- Posts: 1473
- Joined: Mon Apr 23, 2018 7:54 am
Re: Alphazero news
Just looking at Kai's results for rough numbers, I see for 40 games the error is ~ +/- 80 elo.

yanquis1972 wrote: ↑Tue Dec 18, 2018 11:52 am
I plan to run more, but it was pretty consistent, and has been across the board. So far SF10 + test10 (11248) are both outperforming SF10+book, albeit in small samples. (3W 16D 1L for SF10, 2W 3D 0L for id11248 — four draws and a loss against SF10.)
To judge the perfect2017 book with the quite small elo margin you got, you need the error to be much smaller than that.
Isn't whatever software you're using for the elo telling you what the errors are too?
Last edited by jp on Tue Dec 18, 2018 7:10 pm, edited 1 time in total.
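For rough numbers like the ~+/- 80 above, here is a minimal sketch in Python of how the approximate 95% Elo margin of an n-game match scales with the number of games; the draw rates are my assumptions, not reported figures.

```python
import math

def elo_margin_95(n_games, draw_rate=0.5, score=0.5):
    """Approximate 95% Elo confidence margin for a match of n_games."""
    # Split the decisive games so the mean per-game result equals `score`.
    win = score - draw_rate / 2
    loss = 1.0 - win - draw_rate
    # Per-game variance of results in {1, 0.5, 0} around the mean score.
    var = (win * (1 - score) ** 2
           + draw_rate * (0.5 - score) ** 2
           + loss * (0 - score) ** 2)
    se_score = math.sqrt(var / n_games)
    # Convert a score deviation near `score` into Elo via the logistic slope
    # dElo/dscore = 400 / (ln 10 * score * (1 - score)).
    slope = 400 / (math.log(10) * score * (1 - score))
    return 1.96 * se_score * slope

print(round(elo_margin_95(40)))                   # ~76, in line with "~ +/- 80" for 40 games
print(round(elo_margin_95(500, draw_rate=0.38)))  # ~24, like the margins in the 500-game runs below
```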
-
- Posts: 10460
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: Alphazero news
Stockfish basically tests what people suggest to test, and there are no rules against testing patches that should help in the opening.

jp wrote: ↑Mon Dec 17, 2018 11:57 pm
We can look at the online SF log of suggested changes tested. Many are technical (e.g. pruning, time management, etc.) suggestions. Many are tactical suggestions. There are suggestions for endgame tweaks. I cannot see any that are even vaguely to do with the opening. (But have a look for yourself. I didn't look at every entry, of course, only 2018 ones.)

Uri Blass wrote: ↑Mon Dec 17, 2018 10:06 pm
I do not understand.

jp wrote: ↑Mon Dec 17, 2018 8:54 pm
They are not specifically testing for opening play & there is no code specific for opening play.

Uri Blass wrote: ↑Mon Dec 17, 2018 8:43 pm
I think that your assumption is wrong, at least for stockfish.
The authors of stockfish do not use a big book or a special strong book when they test changes in the code, because they assume stockfish will not need to play the opening by itself.
The book that they use is 2moves_v1.pgn, which, as I understand it, contains 2 random moves by white and black.

If they were trying to develop its opening play, they would not be using 2moves_v1.pgn.

2moves_v1.pgn means that stockfish plays the opening by itself in testing, so changes that help in the opening stage can be productive.
The point is not what they change in the code, but that they basically also test stockfish in the opening stage. So it is not that they do not care about the opening, start testing from middlegame positions, and would accept a patch that makes stockfish bad in the opening; on the contrary, they may accept patches that make stockfish stronger in the opening stage.
Pruning or time management patches can also help in the opening, and playing the opening better does not have to mean changing evaluation terms that are specific to the opening.
Patches that change something in the evaluation of the middlegame can also help in the opening, because through search stockfish reaches middlegame positions even from the opening.
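To illustrate what a 2moves_v1-style book is, here is a minimal sketch (assuming the python-chess package; this is not the script that actually produced 2moves_v1, and it does no filtering for balance or quality) that plays two random legal moves per side and writes the lines out as PGN.

```python
import random

import chess
import chess.pgn

random.seed(0)
lines = set()
while len(lines) < 100:          # the real 2moves_v1.pgn has 40,000+ lines
    board = chess.Board()
    for _ in range(4):           # two moves by White and two by Black
        board.push(random.choice(list(board.legal_moves)))
    lines.add(tuple(board.move_stack))

with open("2moves_sample.pgn", "w") as f:
    for moves in lines:
        game = chess.pgn.Game()
        node = game
        for move in moves:
            node = node.add_variation(move)
        print(game, file=f, end="\n\n")
```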
-
- Posts: 1473
- Joined: Mon Apr 23, 2018 7:54 am
Re: Alphazero news
I don't see any of the SF tests directly related to openings. Sure, if some idea makes the opening much worse or better, then the test will fail or pass. But there's no reason to think those ideas are changing the SF opening play a lot, especially when you see lots of tests about endgame stuff like passed pawns. If they were motivated to improve the bookless opening play, it'd be different. The SF team does not forbid opening ideas. It's just that I don't see any in the log. Look at the log.
Last edited by jp on Tue Dec 18, 2018 7:15 pm, edited 1 time in total.
-
- Posts: 1766
- Joined: Wed Jun 03, 2009 12:14 am
Re: Alphazero news
No, not in the GUI itself anyway, but you can probably find a program or site online that'll do so with w/d/l info. After 18 more games (59 vs each) the score is +2 =43 -14 vs SF10 and +4 =48 -7 against SF10+book (now perfect2018).

jp wrote: ↑Tue Dec 18, 2018 7:03 pm
Just looking at Kai's results for rough numbers, I see for 40 games the error is ~ +/- 80 elo.

yanquis1972 wrote: ↑Tue Dec 18, 2018 11:52 am
I plan to run more, but it was pretty consistent, and has been across the board. So far SF10 + test10 (11248) are both outperforming SF10+book, albeit in small samples. (3W 16D 1L for SF10, 2W 3D 0L for id11248 — four draws and a loss against SF10.)

To judge the perfect2017 book with the quite small elo margin you got, you need the error to be much smaller than that.
Isn't whatever software you're using for the elo telling you what the errors are too?
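For w/d/l scores like the ones above, a minimal sketch in Python of the usual logistic conversion to an Elo difference (this gives the point estimate only; error bars need the variance, as in the earlier sketch):

```python
import math

def elo_diff(wins, draws, losses):
    """Elo difference implied by a W/D/L score under the usual logistic model."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    return 400 * math.log10(score / (1 - score))

print(round(elo_diff(2, 43, 14)))  # ~ -72 for the +2 =43 -14 vs SF10 above
print(round(elo_diff(4, 48, 7)))   # ~ -18 for the +4 =48 -7 vs SF10+book
```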
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Alphazero news
Now, my results at ultra-fast time control: Lc0 at 10s+0.1s, SF10 at 8s+0.08s. Both with Overhead set to 0 ms and timemargin set to 100 ms in Cutechess-Cli, so that overstepping the clock by several milliseconds does not cause time losses.
500 games each match.
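For reference, a rough sketch (my reconstruction, not Kai's actual command) of a Cutechess-Cli invocation matching the settings described above, wrapped in Python; the binary paths, the opening file, and the engines' overhead option names are assumptions and vary by build.

```python
import subprocess

# Assumed option spellings: "Move Overhead" is Stockfish's UCI option,
# "MoveOverheadMs" is (I believe) Lc0's; timemargin=100 lets cutechess-cli
# tolerate up to 100 ms of clock overstep before scoring a time loss.
cmd = [
    "cutechess-cli",
    "-engine", "name=lc0_v191_11261", "cmd=./lc0", "proto=uci",
    "tc=10+0.1", "option.MoveOverheadMs=0",
    "-engine", "name=SF10", "cmd=./stockfish_10", "proto=uci",
    "tc=8+0.08", "option.Move Overhead=0",
    "-each", "timemargin=100",
    "-openings", "file=2moves_v1.pgn", "format=pgn", "order=random",
    "-rounds", "250", "-games", "2", "-repeat",   # 500 games, each opening with both colors
    "-pgnout", "lc0_vs_sf10.pgn",
]
subprocess.run(cmd, check=True)
```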
From the Initial Board position (their dominant opening in tests):
Score of lc0_v191_11261 vs SF10: 179 - 131 - 190 [0.548] 500
Elo difference: 33.46 +/- 24.07
Finished match
From 12 "human openings":
Score of lc0_v191_11261 vs SF10: 176 - 119 - 205 [0.557] 500
Elo difference: 39.78 +/- 23.44
Finished match
From 50 TCEC 9 positions:
Score of lc0_v191_11261 vs SF10: 146 - 168 - 186 [0.478] 500
Elo difference: -15.30 +/- 24.15
Finished match
=======================================================================
The above tests correspond to the conditions used in their paper. Now, here is what they could have done instead. The first test (Initial Board) has very restricted variety and a huge systematic error, but they used it extensively. The "human openings" seem a bit cherry-picked.
From 2moves_v1.pgn (40,000+ random 2-movers):
Score of lc0_v191_11261 vs SF10: 155 - 178 - 167 [0.477] 500
Elo difference: -15.99 +/- 24.87
Finished match
From AdamHair_8moves_100133.pgn (a large set of fairly balanced, very varied 8-movers, built and used by Adam Hair, who is, among other things, a CCRL tester):
Score of lc0_v191_11261 vs SF10: 126 - 195 - 179 [0.431] 500
Elo difference: -48.25 +/- 24.51
Finished match
SF10 using BookX.bin (not an 8-mover set, but a 15 MB, good-quality, small yet varied polyglot book; against Lc0, SF usually exits the book within 4-14 moves):
Score of lc0_v191_11261 vs SF10: 98 - 216 - 186 [0.382] 500
Elo difference: -83.57 +/- 24.41
Finished match
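A minimal sketch, assuming the python-chess package and a local copy of BookX.bin (file names here are placeholders), of one way to measure how many plies of a game stay within a polyglot book, i.e. where SF would drop out of book:

```python
import chess
import chess.pgn
import chess.polyglot

def book_depth(game: chess.pgn.Game, book_path: str = "BookX.bin") -> int:
    """Count plies from the start of `game` for which the book has at least
    one entry for the current position (regardless of the move actually played)."""
    board = game.board()
    depth = 0
    with chess.polyglot.open_reader(book_path) as reader:
        for move in game.mainline_moves():
            if not list(reader.find_all(board)):   # position no longer in book
                break
            board.push(move)
            depth += 1
    return depth

# Usage (hypothetical PGN file name):
# with open("lc0_vs_sf10.pgn") as f:
#     game = chess.pgn.read_game(f)
# print(book_depth(game))
```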
=======================================================================
All in all, as I already said, the paper's most representative result, the one with at least some opening variety, is their TCEC openings test. And that test shows that A0 is not really destroying SF8, and is probably not above SF10 in strength from those openings. They could have gotten even worse results for A0 by using very varied 8-mover PGN opening positions or by allowing SF8 a varied book.
-
- Posts: 1766
- Joined: Wed Jun 03, 2009 12:14 am
Re: Alphazero news
After 93 games at 5+5, test30 is stable at -70 elo vs SF10 & -21 elo vs SF10+book (perfect2017/2018). Will attach PGNs later; going to try to get a decent sample of SF10 vs SF10+book results.
-
- Posts: 3425
- Joined: Wed Mar 08, 2006 8:15 pm
Re: Alphazero news
We need a LIVE match of A0 vs SF10/SFdev to clarify the current status. A one-year-old paper isn't enough!
Jouni
-
- Posts: 1473
- Joined: Mon Apr 23, 2018 7:54 am
Re: Alphazero news
What is the size of perfect2017/2018?

yanquis1972 wrote: ↑Wed Dec 19, 2018 6:22 pm
After 93 games at 5+5, test30 is stable at -70 elo vs SF10 & -21 elo vs SF10+book (perfect2017/2018). Will attach PGNs later; going to try to get a decent sample of SF10 vs SF10+book results.
-
- Posts: 1766
- Joined: Wed Jun 03, 2009 12:14 am
Re: Alphazero news
There's a link and summary here: https://sites.google.com/site/computers ... t2017books
It’s meant for engine-engine matches, seems to be a good compromise of quality and diversity.