Guess what: this probably happens because those who have a clue about how it may work wouldn't bother arguing with self-complacent experts about the future way of developing chess engines.
Come on, this is unfair. LCZero is the first public NN+MCTS engine, so nobody "has a clue". Even the most elementary things are unclear (like whether gating is good or bad), and I am not even talking about more nebulous things like selecting good meta-parameters or the training schedule.
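For readers unfamiliar with the term: "gating" means a newly trained net replaces the current best only if it clears an evaluation match by some margin, as in the AlphaGo Zero pipeline. A minimal sketch of such an acceptance rule (the 55% threshold and the example counts are illustrative, not LCZero's actual settings, which were exactly what was being debated here):

```python
def gate(wins, draws, losses, threshold=0.55):
    """Accept a candidate net only if its match score clears the gate.

    A win counts 1 and a draw 0.5, per standard chess scoring. The 55%
    threshold mirrors the AlphaGo Zero paper; it is an assumption here,
    not LCZero's actual rule.
    """
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    return score > threshold

print(gate(230, 120, 50))   # 72.5% score, clears the gate -> True
print(gate(100, 200, 100))  # exactly 50%, rejected -> False
```

The AlphaZero paper, by contrast, dropped gating entirely and always trained from the latest net, which is why "gating: good or bad" was an open question for this project.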
Those who are calling "Rollback! Rollback!" should realize that this means throwing away millions of user-contributed games. The developers are understandably anxious about this, especially since it is not clear whether such a rollback would provide any benefit.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
Seriously, where did you hear that a rollback is the same as wiping out all self-play games and starting over? What would actually get thrown away in a rollback is a batch of poorly performing network weights, the product of a bug or bad training parameters, all produced by ONE machine doing the training over a few days, plus somewhere around half a million toxic self-play games affected by various bugs. All of this can be done side by side with the main run until a beneficial outcome is achieved; then the retrained network can be published and development continues from there. If the developers only express their anxieties but do nothing in this regard, which is itself part of experimenting with how to better train the networks in general, that would be pure laziness.
With LC0 and proper settings, it had already surpassed this (on my rig with a quad i5 and a GTX 1060 6GB). With LCZero v0.10 it is another story. I am finishing a second 254-game gauntlet with default settings, and NN237 and NN314 are about equal in strength, scoring roughly -55 Elo against Spike 1.4 (3039 CCRL), so that is probably good for 3100 CCRL with LC0 optimized. It is an interesting result, really, since NN314 is clearly worse in tactics and still suffers from suicidal evaluations here and there, which NN237 did not, but results are results.
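For reference, the Elo figures quoted in results like this follow from the standard logistic model relating match score to rating difference; a sketch (the standard formula, not tied to any particular rating tool):

```python
import math

def elo_from_score(score_fraction):
    """Rating difference implied by a match score, logistic Elo model."""
    return -400 * math.log10(1 / score_fraction - 1)

def score_from_elo(elo_difference):
    """Inverse: expected score for a given rating difference."""
    return 1 / (1 + 10 ** (-elo_difference / 400))

# Scoring about 42% corresponds to roughly -55 Elo, so against a
# 3039-rated opponent that is a performance near 2984.
print(round(elo_from_score(0.4215)))  # -55
```

The gap between a ~2984 performance and the "good for 3100 with LC0 optimized" estimate is the assumed gain from the optimized binary and settings, not from the net itself.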
"Tactics are the bricks and sticks that make up a game, but positional play is the architectural blueprint."
It seems to be a bit more complicated. At 1'+1'' on a GTX 1060 with LC0 CUDA, the latest nets seem even stronger than NN237. But at 15'+15'', NN237 seems stronger. I left NN319 playing against Komodo 10.2, and it lost 5 games in a row to tactical blunders; the eval graph was also unstable. I interrupted the match and reverted to NN237, and in 5 games so far there are 2 wins for Komodo and 3 draws, with only one game lost to a blunder. I am still waiting for a single LC0 win in 10 games. The sample is too small, but I saw a similar thing in games against Houdini 1.5a. It seems NN237 scales better with TC or playouts, having a better value-head eval. It is strange, as at nodes=1 the latest nets are some 150-200 Elo points stronger than NN237. Really, they have to roll back to the v0.7 engine; the current nets are being trained in some schizophrenic way with v0.10.
I think the issue is purely one of sample size, Kai. I saw many ridiculous losses by NN314 that were effectively decided within 2 moves (though the games went on), but in the long run it evened out. For example: after 40 games it was performing at about -160 Elo against Spike, by game 130 it was 50-50, and by the end of the full 255-game run it was at -54 Elo.
Sure, the sample is very small. At the longer TC I have some 15 games of NN237 and 15 games of the latest nets combined. But the games lost to tactical blunders number some 9-10 for the latest nets versus 2-3 for NN237. Also, the eval graph is smoother with NN237. This is borderline significant, although you can surely dismiss it as anecdotal.
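Out of curiosity, rough blunder counts like these can be put through a one-sided Fisher exact test. The counts below are my reading of the post (taking the lower figures: 9 of 15 blunder losses for the latest nets vs 2 of 15 for NN237); the test itself is standard:

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact test on the 2x2 table [[a, b], [c, d]]:
    hypergeometric probability of seeing a or more in the top-left
    cell with all margins fixed."""
    row1, col1, n = a + b, a + c, a + b + c + d
    a_max = min(row1, col1)
    tail = sum(comb(row1, x) * comb(n - row1, col1 - x)
               for x in range(a, a_max + 1))
    return tail / comb(n, col1)

# 9/15 blunder losses (latest nets) vs 2/15 (NN237), assumed counts:
p = fisher_one_sided(9, 6, 2, 13)
print(f"{p:.3f}")  # 0.010
```

With these assumed counts the one-sided p comes out near 0.01, though the underlying game counts are approximate, so treating the result as borderline rather than conclusive is fair.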
Laskos wrote (Sun May 20, 2018, 8:24 pm):
There was a discussion about a rollback on Discord yesterday; it isn't happening. At low node counts (800), the current nets are far stronger than ID 237, although, as you observed, they don't scale quite as well (yet). But the quality of the value head is still improving, and that is the deciding factor in determining scaling properties. I'm not too worried; this should fix itself in the end, since we're going to upgrade to a 256x20 network eventually, once there's no more improvement on the 192x15 architecture. Lc0 beating Komodo on your setup may not be happening yet, but I'm optimistic that it will soon, either still on 192x15 or at the latest once we go to 256x20 (the AlphaZero size).
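As a back-of-envelope on that size jump: in a residual tower of the AlphaZero type, each block holds two 3x3 convolutions over F filters, so the tower weight count is roughly 18·B·F². A quick sketch (this deliberately ignores the input convolution, the policy/value heads, and batch-norm parameters):

```python
def tower_params(filters, blocks):
    """Approximate weight count of a residual tower: each block has two
    3x3 convolutions with `filters` input and output channels."""
    return blocks * 2 * 3 * 3 * filters * filters

small = tower_params(192, 15)
big = tower_params(256, 20)
print(small, big, round(big / small, 1))  # 9953280 23592960 2.4
```

So the 256x20 tower carries roughly 2.4x the weights of 192x15, and each evaluation costs correspondingly more; the bet is that the stronger per-node eval more than compensates at a fixed time budget.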
What's the latest on smart pruning on/off? I'd heard a few people say it made little difference, but I noticed that with the client update Lc0 chews through plies much faster. I wonder if it's a hindrance at the moment?
Edit: to expand on this, with specific regard to Kai's post, I'd be curious whether Elo improves significantly against top engines at long TC with it off.
Maybe this feature is mostly redundant, or even regressive, in match play on the whole.
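For context on what smart pruning does (as I understand the feature; the real Lc0 implementation also folds this into time management): the MCTS move played is the most-visited root move, so once the runner-up can no longer catch the leader even if it received every remaining playout, further search cannot change the move and the engine can stop early. A minimal sketch of that stopping test:

```python
def search_can_still_matter(root_visits, playouts_remaining):
    """True if the remaining playout budget could still change which
    root move has the most visits (i.e., smart pruning should NOT stop)."""
    ranked = sorted(root_visits, reverse=True)
    best, runner_up = ranked[0], ranked[1]
    return runner_up + playouts_remaining >= best

print(search_can_still_matter([9000, 4000, 500], 3000))  # False: stop early
print(search_can_still_matter([9000, 8000, 500], 3000))  # True: keep searching
```

Note this only saves clock time; the question raised above is whether the saved time, as redistributed by the time manager, actually buys Elo at long TC, or whether cutting searches short costs more than it saves.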