Dann Corbit wrote: ↑Sun May 13, 2018 3:43 am
The current graph is a projection of Malthus' Dismal Theory.
Time to admit that something got broken, backing up to a version that is not broken, and restarting.
Or, perhaps, this is just an imitation of nature that is too realistic, and the food supply is running out.

Sven wrote: ↑Sun May 13, 2018 1:09 pm
I basically agree. But my strategy would still involve to isolate the point where it got broken. Throwing away some hundreds of source code commits would not only be a waste but might also decrease motivation for the developers. So I guess the primary goal should be to find and fix the bugs, assuming that bugs are the main reason for the current regression (which appears to be most likely for me).

What I find puzzling is that neither on the LC Zero forum nor on github there is much if any sense of "how are we going to fight the bugs". The mood seems to be "what goes down will go up again" as if LC Zero could fix the bugs itself.
Sven wrote:
What puzzles me is that according to the graph the regression did not start within 24 hours after a new release but more than three days after the release of v0.8. Network IDs 255, 256, 257 show the first symptoms of regression and were created on May 6th in the evening while v0.8 was released on May 3rd early in the morning. How is that possible? Contributing users play >100,000 games per day so there should be sufficient significance to rule out any weird statistical explanations.

I don't know if this is the answer, but it seems not all contributing users upgrade their clients immediately when a new version comes out.
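On the significance point, here is a rough back-of-the-envelope check, not a statement about how LC Zero actually computes its graph. It assumes the games are independent match games with a score close to 50%, which self-play training games may well not be. The function name `elo_margin` and the sample sizes other than 100,000 are just for illustration.

```python
import math

def elo_margin(games, z=1.96):
    """Worst-case margin (in Elo) of a match result near a 50% score."""
    # A per-game score lies in [0, 1], so its standard deviation is at most 0.5.
    se_score = 0.5 / math.sqrt(games)
    # Slope of the logistic Elo curve at a 50% score: 400 / (ln(10) * 0.25).
    elo_per_score = 400.0 / (math.log(10) * 0.25)
    return z * se_score * elo_per_score

# Assumed sample sizes; only the 100,000 figure comes from the thread.
for n in (400, 10_000, 100_000):
    print(n, round(elo_margin(n), 1))
```

Under those assumptions the margin at 100,000 games is only a couple of Elo points, so plain sampling noise looks like an unlikely explanation for a drop that is clearly visible in the graph.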
The "Rxd7" bug started to show up around ID242 and probably earlier:
https://github.com/glinscott/leela-ches ... -388169418
This seems to coincide with v0.8.
On the other hand:
- this was fixed by changing code that had been committed on April 7 or so;
- it seems fixing it did not actually resolve LC's problems.
So this is again puzzling. If the bug was there since April 7, why did it not show up earlier? Was it triggered by some other change that was in v0.8 but not in v0.7?
Questions like these should get an answer. (Maybe they did get one, but I haven't seen it in any of the github threads.) Bugs don't just disappear by themselves but can be fixed only after having been understood.
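To make "isolate the point where it got broken" concrete, this is the kind of search I have in mind: a plain bisection over the commits between a known good and a known bad version. It is only a sketch. The commit ids are placeholders, `is_bad` is a hypothetical check (for LC Zero it would presumably mean building that commit and testing for the symptom), and it assumes there is exactly one good-to-bad transition.

```python
def first_bad(commits, is_bad):
    """Return the first commit for which is_bad() is True.

    Assumes commits are ordered oldest to newest, the first one is good,
    the last one is bad, and there is a single good-to-bad transition.
    """
    lo, hi = 0, len(commits) - 1  # commits[lo] known good, commits[hi] known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid   # breakage is at mid or earlier
        else:
            lo = mid   # breakage is after mid
    return commits[hi]

# Hypothetical history: placeholder ids, not real LC Zero commits.
history = [f"c{i:02d}" for i in range(20)]
culprit = first_bad(history, lambda c: c >= "c13")
print(culprit)
```

With 20 commits this needs about five checks instead of twenty, and `git bisect` automates the same procedure on a real repository. The single-transition assumption is the weak spot: if the "Rxd7" bug needed both the April 7 code and some later change to become visible, a simple bisection would only find the later change.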
Another thing I am wondering about: the sudden jump from ID226 to ID227. Larger network, they say, but does that actually explain the jump? I don't see how a larger network can lead to better play all by itself. It will first have to be trained. If the new larger network did not play better moves, then it seems the jump can only be explained by a calibration error. What was the larger network tested against? Is any of this fully understood by at least the core developers?
Sven wrote:
I still do not know enough about the project so maybe I misunderstand some of its concepts.

Same here.