Laskos wrote:
Yes, some sort of list. This is for the ECM200.epd middlegame tactical suite (200 positions), analyzed at 20s/position. At this time control and on my hardware, LC0 performs overall (Elo-wise) comparably to GreKo 6.5, a standard A/B engine rated 2330 Elo on CCRL, which fares much better tactically (but much worse positionally). And it seems that on this tactical middlegame suite ID124 is still the best of the nets.
Having watched 100+ games of ID150+ and ~40 games of ID 156 versus opponents rated 2100-3100 CCRL Elo, I see that LC0 (with those IDs, as with previous ones) completely outplays the other engines positionally in many, many cases, only to miss a tactical shot in at least 80% of them, which costs LC0 the win, or even the draw so that it loses.
LC0 is, I dare say, on par with Stockfish dev in evaluation, but of course it is ultra weak in tactics. It is even better than Stockfish in King attacks, from what I have seen: in placing its pieces to attack, not in executing the attack, since in that respect it fails miserably due to bad tactics. The pattern recognition its NNs give it for seeing how to attack the King seems to be extremely fruitful.
Meanwhile ID160 had a good jump in self-play ELO.
I think the approach tries to summarize simulation results, which is good for handling general cases.
Those tactical lines are isolated incidents which it can never solve, while with minimax the search will develop such a line deep and fast enough to see it.
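To make the contrast above concrete, here is a minimal sketch with made-up numbers. It is not LC0's actual backup rule, only a visit-weighted average next to a plain max; the values and visit counts are hypothetical and chosen so that the single tactical move gets very few simulations.

```python
def averaged_value(child_values, visit_counts):
    """Visit-weighted mean of child values (a simplified 'summary of simulations')."""
    total = sum(visit_counts)
    return sum(v * n for v, n in zip(child_values, visit_counts)) / total


def minimax_value(child_values):
    """Value of the node if the side to move simply picks its best child."""
    return max(child_values)


# Hypothetical values for five moves, from the mover's point of view.
# The fourth move is the isolated tactical shot.
child_values = [-0.20, -0.30, -0.10, 0.90, -0.25]
# Hypothetical visit counts: the tactical move was barely explored.
visit_counts = [40, 30, 25, 1, 4]

avg = averaged_value(child_values, visit_counts)
best = minimax_value(child_values)
print(round(avg, 3), best)  # the average is about -0.196, the max is 0.9
```

In a real MCTS the visits drift toward the good move once it has been found, so the gap closes over time; the sketch only shows why one rarely visited line can stay hidden in an average.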
It is not that we cannot write better evaluation code, but once in a while it turns out that a simplification actually gains Elo because the search runs faster. LC0 is doing the opposite, and people seem to ignore the fact that its evaluation is just slow, and blame the hardware for poor performance. Even with A0's hardware you get some 80k NPS. Convert that naively to CPU: 1 TPU ~= 10x 1080TI, and one 1080TI ~= 32 CPU cores, so for A0 that's 4*10*32 = 1280 CPU cores.
Given that many CPU cores, I'm sure I can get more than +100 Elo against a 64-core SF8, enough to get that result.
In fact, my test shows that it would only need about 1/3 of A0's hardware performance to get there, not including all the training effort. Now you tell me, which is more efficient?
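The back-of-envelope conversion in this post can be written out as a small sketch. The ratios are the poster's naive estimates (1 TPU ~= 10x 1080TI, one 1080TI ~= 32 CPU cores), not measurements, and the function name is made up for illustration.

```python
# Naive conversion ratios quoted in the post (rough estimates, not benchmarks).
GPUS_PER_TPU = 10        # 1 TPU ~= 10x 1080TI
CPU_CORES_PER_GPU = 32   # one 1080TI ~= 32 CPU cores


def cpu_core_equivalent(tpus, gpus_per_tpu=GPUS_PER_TPU,
                        cores_per_gpu=CPU_CORES_PER_GPU):
    """Convert a TPU count into a rough CPU-core equivalent."""
    return tpus * gpus_per_tpu * cores_per_gpu


a0_cores = cpu_core_equivalent(4)      # A0 used 4 TPUs in the post's estimate
one_third = round(a0_cores / 3)        # "about 1/3 of A0's hardware"
print(a0_cores, one_third)             # 1280 427
```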
If the move probabilities are supposed to single out "unclear" moves, then things could work. But I don't really see how the whole updating process would work towards identifying "unclear" moves.
Well we will have to wait to see how good (or bad) LC0 will eventually become at tactics. I am hoping that the majority of chess tactics actually depend on fairly standard patterns and that the NN (value head and policy head) can learn to recognize those patterns. This would be similar to how humans handle tactics.
Recent experiments (by Kai and Killiani) show that the policy network of LC0 is on par with SF at depth 1 (with quiescence search). This might mean that LC0 already statically recognizes some recapture patterns. Unfortunately it may also mean that SF simply prunes too much at depth 1 to be competitive...
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
I don't think SF9 at depth=1 is excessively weak, or that it misses much compared to older engines that prune less. An older test showing depth=1 results in RR games from regular openings:
Yes, ID160 seems the strongest (at least in my test). I am now checking its scaling; it seems to scale nicely from 1s/move to 4s/move compared to the similarly strong Jabba 1.0 (in my conditions).
noobpwnftw wrote: In fact, my test shows that it would only need about 1/3 of A0's hardware performance to get there, not including all the training efforts, now you tell me which is more efficient?
The human cost of developing LC zero is zero... Well, more or less
I'd love to see your tactical results on ID 160.
My own tests are no longer totally negative, but are very mixed.
Below is the "easiest" position in my test suite, which I've posted many times but which LCZero ID 160 still cannot solve in 20 minutes.
This position was a real challenge until around 1993-1995 because dedicated chess computers thought 8.Ng5 was a simpler way to win (it's not), and 8.Bf7 requires seeing quite deeply in one line.
I wonder if some kind of hybrid NN/minimax algorithm might make sense, where the NN would guide the search but the search would validate that the so-called best line is tactically sound? I am not actually an expert in this area. But while A0 is very interesting, in practical terms, on current consumer hardware, it does not seem like pure NN is a promising approach. True, you can throw hardware at it to make it better. But on comparable hardware (dollars or FLOPS or however you want to measure it) Stockfish is still hard for it to beat.
For a hybrid approach, I have an idea: couldn't we run some MCTS threads and use their simulations for root move ordering? Say one GPU can be fed by 2 CPU threads; we let those run independently, and from time to time we reorder the root moves by their eval scores scaled to a win-rate estimate. That may help favor moves that score a few centipawns less but look more favorable in the NN's view.
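A minimal sketch of that reordering idea, under stated assumptions: the logistic centipawn-to-win-rate mapping with a scale of 400, the 50/50 blend weight, and all function names are hypothetical choices for illustration, not anything taken from Stockfish or LC0.

```python
def cp_to_win_prob(cp, scale=400.0):
    """Map a centipawn score to a win-rate estimate with a logistic curve.

    The scale of 400 is an assumption; an engine would tune it.
    """
    return 1.0 / (1.0 + 10.0 ** (-cp / scale))


def reorder_root_moves(ab_scores_cp, nn_win_rates, nn_weight=0.5):
    """Order root moves by a blend of A/B eval (as win rate) and NN win rate.

    Moves the MCTS threads have not evaluated yet fall back to the A/B estimate.
    """
    def blended(move):
        ab_prob = cp_to_win_prob(ab_scores_cp[move])
        nn_prob = nn_win_rates.get(move, ab_prob)
        return (1.0 - nn_weight) * ab_prob + nn_weight * nn_prob

    return sorted(ab_scores_cp, key=blended, reverse=True)


# Hypothetical root scores: d2d4 is 5 centipawns worse for the A/B search,
# but the NN-guided simulations like it more.
ab_scores_cp = {"e2e4": 30, "d2d4": 25, "a2a3": -40}
nn_win_rates = {"e2e4": 0.52, "d2d4": 0.60}

print(reorder_root_moves(ab_scores_cp, nn_win_rates))
print(reorder_root_moves(ab_scores_cp, nn_win_rates, nn_weight=0.0))
```

With the blend enabled, d2d4 moves ahead of e2e4; with `nn_weight=0.0` the order is the plain A/B order. How often to reorder, and how to keep the two searches from fighting each other, is left open here just as it is in the post.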