mvanthoor wrote: ↑Sat Jul 27, 2019 10:06 pm
Today I ran a short 20-game match between Stockfish 10 and Lc0. Specs of the match:
Stockfish 10 x64 BMI2 on Intel i7-6700K, 4 threads, 8GB hashtable
Lc0 0.21.3, w42850 on GTX 1070. 4 threads, everything else default.
Syzygy 5 men tablebase, 8 move Performance.bin opening book.
Adjudication by the GUI on overwhelming material advantage, or by Syzygy on win/draw/loss in the endgame.
The result was +4 -1 =15 in favor of Stockfish 10.
To be honest, after all the hype surrounding Lc0, I find the result disappointing; I had expected the reverse. When looking into networks, I found
https://www.sp-cc.de/lc0-testing.htm, and the network I used is stronger than the ones used there (+60 Elo).
I haven't looked into things such as Leela Ratio yet. I'm not trying to match one engine against another on the same hardware. What I wanted to know is: how much stronger or weaker is Lc0, running on a GTX 1070, compared to Stockfish running to the specifications of CCRL 40/4?
I ran the match at a time control of 40 moves in 85 seconds because, on my computer, that is the setting equivalent to CCRL 40/4. I wanted to know where a full-power Lc0 on a GTX 1070 would fall in the CCRL 40/4 list. Stockfish has a rating of 3547 there, and the result of +4 -1 =15 (11.5 out of 20) implies a rating advantage of about +52 for Stockfish over Lc0, putting Lc0 at 3495. That is only 9 points above the rating of 3486 which Lc0 attains in the CCRL 40/4 list (albeit with a different network), despite the GTX 1070 being a much more powerful card than the GTX 1050. That seems disappointing.
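For anyone who wants to check the +52 figure: a minimal sketch of the usual logistic Elo formula applied to a match score. The function name elo_difference is just something I made up for illustration; it is not taken from any GUI or rating tool, and it says nothing about the error margin, which is large for a 20-game match.

```python
import math

def elo_difference(wins, losses, draws):
    """Estimate the Elo difference implied by a match result.

    Uses the standard logistic model: expected score s = 1 / (1 + 10^(-d/400)),
    solved for d. Hypothetical helper for illustration only.
    """
    games = wins + losses + draws
    if games == 0:
        raise ValueError("no games played")
    # Draws count as half a point for each side.
    score = (wins + 0.5 * draws) / games
    if score <= 0.0 or score >= 1.0:
        # A 0% or 100% score has no finite Elo difference under this model.
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))

if __name__ == "__main__":
    stockfish_rating = 3547                 # CCRL 40/4 rating quoted in the post
    diff = elo_difference(4, 1, 15)         # Stockfish's result: +4 -1 =15
    print(f"Elo difference: {diff:+.1f}")
    print(f"Implied Lc0 rating: {stockfish_rating - diff:.0f}")
```

With the match result from this post, the difference comes out between 52 and 53 points in Stockfish's favour, which matches the number quoted above.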
Also, the games are not very interesting. Often, after 30-35 moves or so, everything has been traded down to an endgame. It's also often Stockfish that avoids a draw by threefold repetition (probably because of the default contempt), and even so, many games ended in threefold repetition. In some games, Leela makes exceedingly weird moves, and it lost game 1 in 21 moves because of a blunder. With Stockfish, I can mostly understand what it's trying to do with a move, but with Lc0, I'm often left guessing. Because Lc0 "only" searches 10K nodes or so in the endgame, while Stockfish is often already at 10+ million, Stockfish reaches the endgame tablebases much faster. I often see Leela struggling to look beyond 12 ply or so, while Stockfish is soaring into the 40 ply range, reaching the tablebases from the late middle game.
Of course, my expectation wasn't for Lc0 to blow Stockfish out of the water with a 20-0 result, but I did expect it to win with a +2 score or so. Could/should I be using a different network (I've seen some networks that were smaller, faster, and had a higher Elo rating than the 42850 I used)? Are my expectations wrong, or is a GTX 1070 just not powerful enough?
I don't play a lot of games. I always pick a midrange card; in this case I picked the GTX 1070 in 2016 because of The Witcher 3, and unless I acquire a newer game that needs a lot more power, this card is likely to end up in my next computer as well. I do use a lot of CPU power for some of my tasks, so the 6700K will probably be replaced by a machine with at least 12 cores. If Stockfish already wins by +4 -1 running on an old i7-6700K against Lc0 on a GTX 1070, I shudder to think how it would decimate Lc0 @ GTX 1070 when running on one of the new Zen3 CPUs with 12 or 16 cores, if I should get a new computer (but not a new graphics card).
PS: I found the JH.T6.532a net used for the CCRL rating. I'll rerun a longer test. The match will be run at 40 moves in 85 seconds repeating to comply with the CCRL 40/4 list, and Lc0 0.21.3 JH.T6.532 will run full-out on a GTX 1070. That should give an approximation of Elo-difference between the GTX 1050 and 1070, at least for this particular net.
PS2: I have also put the current list into a spreadsheet with a filter on the Elo field. The strongest network I've found is 10968, from August 22, 2018 (so it's an old one... how can it be so strong? Were all the other networks much weaker... meaning that old network Elo can't be related to new networks? I'll run a test using that network as well.)