My failed attempt to change TCEC NN clone rules

Gian-Carlo Pascutto · Post by **Gian-Carlo Pascutto** » Thu Sep 19, 2019 7:44 pm

dkappe wrote: ↑Thu Sep 19, 2019 6:13 pm If you run SIMEX on neural network engines (other than Stoofvlees), you get similarity of over 60%. That’s because they all derive from the alpha zero pseudo code which dictates certain neural network structures.

IIRC, I saw some statements that SIMEX rates Stockfish and lc0 rather closely, more so than other (non-NN) engines, presumably because there is some convergence to perfect play happening (high draw rates suggest this too). So I wouldn't be surprised if Stoofvlees was no different given that it is also pretty strong. On the other hand, if you are so strong that you know for sure which moves still draw, that gives more room for divergence even within perfect play? Not sure how that works.

Or is 60% so high you can really only get it from networks that have essentially converged to the same thing? Then you're probably right.

If you find this sort of hand waving argument from pictures compelling, I have a bridge to sell you.

Oh OK then.
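For readers unfamiliar with similarity testing, here is a minimal sketch of what a figure like "60%" could mean, assuming the score is simply the percentage of test positions on which two engines choose the same move. The function name `move_agreement` and the sample moves are hypothetical, and the real SIMEX tool may weight or filter positions differently.

```python
def move_agreement(moves_a, moves_b):
    """Percentage of positions on which two engines chose the same move.

    moves_a, moves_b: lists of moves (e.g. UCI strings), one per test
    position, in the same position order. Illustrative only; this is not
    the SIMEX implementation.
    """
    if len(moves_a) != len(moves_b) or not moves_a:
        raise ValueError("need two equally long, non-empty move lists")
    same = sum(a == b for a, b in zip(moves_a, moves_b))
    return 100.0 * same / len(moves_a)


# Hypothetical choices by two engines on five positions.
engine_a = ["e2e4", "d2d4", "g1f3", "c2c4", "e2e4"]
engine_b = ["e2e4", "d2d4", "b1c3", "c2c4", "d2d4"]
print(move_agreement(engine_a, engine_b))
```

Under this definition two strong engines can agree often simply because many positions have one clearly best move, which is the convergence effect raised above.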

Gian-Carlo Pascutto · Post by **Gian-Carlo Pascutto** » Thu Sep 19, 2019 7:53 pm

gonzochess75 wrote: ↑Thu Sep 19, 2019 6:54 pm One of the top Leela devs told me Ankan is the most important person in the project and, although he spent little time contributing, his code is really critical as no one else knows how to code for GPU, and without him the CUDA backend probably wouldn't have happened.

So for the record, you agree that the lc0 cuDNN/CUDA backend is a very critical part to the strength of an NN-engine?

dkappe · Post by **dkappe** » Thu Sep 19, 2019 7:55 pm

Gian-Carlo Pascutto wrote: ↑Thu Sep 19, 2019 7:25 pm
dkappe wrote: ↑Sun Sep 15, 2019 11:31 pm They all use essentially the same A0 network types, as that is dictated by the PUCT search algorithm.
Really? Well, you probably want at least a move probability and some kind of winrate eval. That's not even an A0 innovation, although Alpha Zero was the first time I saw the networks for those outputs combined, which was a nice trick. (But I do not necessarily think that is the best approach depending on the nature of your data)

Using separate networks for policy and value was in the original Alpha Go paper, but had already been suggested on the computer-go mailing-list long before.

The pieces were all there, Deep Mind put them all together and threw massive resources at it, but it wasn’t technically groundbreaking. When I say “A0 network types,” I mean the old familiar ResNet with chess state inputs and policy and value outputs. Nothing earthshaking about that, and value and policy are required by the PUCT algorithm, else it won’t run.

gonzochess75 · Post by **gonzochess75** » Thu Sep 19, 2019 7:59 pm

Gian-Carlo Pascutto wrote: ↑Thu Sep 19, 2019 7:53 pm
gonzochess75 wrote: ↑Thu Sep 19, 2019 6:54 pm One of the top Leela devs told me Ankan is the most important person in the project and, although he spent little time contributing, his code is really critical as no one else knows how to code for GPU, and without him the CUDA backend probably wouldn't have happened.
So for the record, you agree that the lc0 cuDNN/CUDA backend is a very critical part to the strength of an NN-engine?

I think I've been very clear and said *numerous times* that the cuDNN backend is a very important, critical piece of code. Doing it well vs. poorly can mean a large difference in nps. I did not want to try and rewrite this, nor did I see any reason to. I don't think I'd be learning much that interests me, and Ankan had a very nice one already written that he's very happy for others to use, including lc0 and Allie.

I just don't understand how I could ever hope to be more transparent about this. It has been published on Allie's GitHub page since the very beginning of her public release. I've said it to everyone who would listen. But to be very clear, the sentiment I was relating above was the unsolicited opinion of a top lc0 developer.

dkappe · Post by **dkappe** » Thu Sep 19, 2019 8:03 pm

Gian-Carlo Pascutto wrote: ↑Thu Sep 19, 2019 7:53 pm
gonzochess75 wrote: ↑Thu Sep 19, 2019 6:54 pm One of the top Leela devs told me Ankan is the most important person in the project and, although he spent little time contributing, his code is really critical as no one else knows how to code for GPU, and without him the CUDA backend probably wouldn't have happened.
So for the record, you agree that the lc0 cuDNN/CUDA backend is a very critical part to the strength of an NN-engine?

Yes. The speed of NN evaluation is critical to the strength of an NN-engine or the throughput of a medical image processing net. That’s why CUDA is the darling of GPU programming at the moment.

Gian-Carlo Pascutto · Post by **Gian-Carlo Pascutto** » Thu Sep 19, 2019 8:20 pm

dkappe wrote: ↑Thu Sep 19, 2019 7:55 pm The pieces were all there, Deep Mind put them all together and threw massive resources at it, but it wasn’t technically groundbreaking. When I say “A0 network types,” I mean the old familiar ResNet with chess state inputs and policy and value outputs. Nothing earthshaking about that, and value and policy are required by the PUCT algorithm, else it won’t run.

It's interesting to consider the constraints here.

First, we need to run the search on the CPU, because GPUs suck at this kind of code. The number of calls we can make to the GPU limits our speed in such a way that it is very beneficial to make *any* network really large. But we want to use the GPU for the network, because the number of TOPS on an RTX card dwarfs any CPU by ludicrous amounts.

We want to play chess, so you're going to need some chess-related inputs. Maybe it doesn't need to be the state though, but it's going to have to be reasonably close?

It's hard to search without at least getting an Expected Value for the current state, so you need the eval output.

The policy isn't required (you could just use regular UCT). But knowing the best move is handy for move ordering (which is the same as pruning in a strong engine). There were articles about doing this with neural networks in the ICGA Journal in 2001!

So, I think that it is hard to get away from having to include at least these. If CPUs get much faster at neural network inference, that could throw things around, though.
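To make the policy/value point concrete, here is a minimal sketch of the two selection rules being contrasted: plain UCT (UCB1), which needs only a value estimate, and the AlphaZero-style PUCT rule, which also weights exploration by a policy prior. The function names, the dict layout, and the constants `c` and `c_puct` are illustrative assumptions, not taken from lc0, Allie, or any other engine in this thread.

```python
import math


def uct_score(child_value, child_visits, parent_visits, c=1.4):
    """Plain UCT (UCB1): uses a value estimate only, no policy prior."""
    if child_visits == 0:
        return math.inf  # unvisited children are tried first
    return child_value + c * math.sqrt(math.log(parent_visits) / child_visits)


def puct_score(child_value, child_visits, parent_visits, prior, c_puct=1.5):
    """AlphaZero-style PUCT: exploration term scaled by the policy prior."""
    return child_value + c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)


def select(children, parent_visits, use_policy=True):
    """Return the index of the child with the highest selection score.

    children: list of dicts with keys "q" (mean value), "n" (visit count)
    and "p" (policy prior). The layout is hypothetical.
    """
    def score(ch):
        if use_policy:
            return puct_score(ch["q"], ch["n"], parent_visits, ch["p"])
        return uct_score(ch["q"], ch["n"], parent_visits)

    return max(range(len(children)), key=lambda i: score(children[i]))


# Two unvisited moves; the network's prior strongly prefers the second.
children = [{"q": 0.0, "n": 0, "p": 0.1}, {"q": 0.0, "n": 0, "p": 0.9}]
print(select(children, parent_visits=1, use_policy=True))
print(select(children, parent_visits=1, use_policy=False))
```

With the prior, the search goes straight to the move the network likes; without it, both unvisited moves tie and the search must visit each before it can rank them, which is the move-ordering benefit described above.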

Gian-Carlo Pascutto · Post by **Gian-Carlo Pascutto** » Thu Sep 19, 2019 9:11 pm

gonzochess75 wrote: ↑Thu Sep 19, 2019 7:59 pm So for the record, you agree that the lc0 cuDNN/CUDA backend is a very critical part to the strength of an NN-engine?
I think I've been very clear and said *numerous times* that the cuDNN backend is a very important, critical piece of code... But to be very clear, the sentiment I was relating above was the unsolicited opinion of a top lc0 developer.

I'll explain why I highlight this:

That code is great, very important and all praise to Ankan for writing it, but that code all by itself is a far cry from a chess engine. So you see Allie is not a clone or a copy of Lc0.

See, the thing is nobody has a good definition of "clone". It was pointed out in the beginning, and if anything, it looks like this thread was an attempt to define the term for NN engines. Good luck! I don't think anyone agreed on one for AB engines and those are quite a bit older and more understood.

I think I understand somewhat where crem's frustration comes from. He thinks computer chess is a programming competition. It's logical, because it superficially looks that way. But nothing could be further from the truth. From a competition point of view it seems most sensible to look at where an engine gets its strength from. Hence, it looks senseless from this POV that lc0 is losing to competitors that use lc0's code and get a lot of strength from it. (This is even more painful because the results are somewhat random and lc0 could easily lose to an engine using its code that is actually a bit weaker! It is literally possible to take lc0, make it a bit weaker, and still hope to win TCEC/CCC/whatever)

But computer chess is not a programming competition. Computer chess is entertainment. If it were a competition, there'd be no Komodo and KomodoMCTS in TCEC - and I'll leave it at those two most obvious examples to not open another dozen cans of worms. From an audience point of view it is more interesting to have lots of strong engines, preferably that still differ enough in how they see a position so there's enough decisive games. The audience doesn't give a f--- about the graphs posted on the first page. They care - at most a little - about engines that are so close that it's like watching a self-play game. Details like how the NN backend was written - even if it took months of blood, sweat, tears and access to non-public GPU programming details - are not relevant, as long as the engine plays a bit differently and the author can explain a bit what he did. So hoping that the rules address this is futile - there's a fundamental conflict of interest. There's quite a bit more people watching and discussing TCEC than there are watching the ICGA Computer Chess World Championship, right?

It took me quite a while to come to that realization when I was originally involved in computer chess, and after I did, I quit.

(Don't ask me what I'm doing here again now, it's a long story involving ataxx engines...)

gonzochess75 · Post by **gonzochess75** » Thu Sep 19, 2019 9:43 pm

Gian-Carlo Pascutto wrote: ↑Thu Sep 19, 2019 9:11 pm

gonzochess75 wrote: ↑Thu Sep 19, 2019 7:59 pm So for the record, you agree that the lc0 cuDNN/CUDA backend is a very critical part to the strength of an NN-engine?
I think I've been very clear and said *numerous times* that the cuDNN backend is a very important, critical piece of code... But to be very clear, the sentiment I was relating above was the unsolicited opinion of a top lc0 developer.
I'll explain why I highlight this:

...

It looks like we agree to a very large extent, GCP. I have been saying this for quite a long time. It is entertainment, not a coding competition.

What is very interesting is that the code in question *here* was written mainly by one person who is fine with the code being used by both engines... and more. So if it was a coding competition I guess Ankan wins if either Lc0 or Allie win? Or maybe Deepmind should get some of the credit if either win? Or maybe *you* should get some credit if either win? Or maybe the whole history of contributors to chess programming knowledge should get some credit since so many of these ideas that are so crucial and used by all current engines were not original to any of the *current* generation of engine devs? Hell, maybe there is plenty of credit to go around and we don't have to bash each other and play a zero sum game of who deserves the *real* or the *most* credit?

But really, if it *were* a coding competition and you wanted to know which one is better, well, hell, these tournaments are not the way to find out. Play them on some silent server farm for a couple hundred thousand games and sit back and wait for the answer. But that is actually done all the time by the various testers who test these engines, and you know what they've found? That the Lc0 binary is still slightly better than the Allie binary, by and large, if you aggregate the test data. I'm OK with that. credit --> Kudos to the lc0 team! <-- credit
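As a rough illustration of why a couple hundred thousand games settles what a short tournament cannot, the sketch below converts a match score into an Elo difference using the standard logistic model and attaches an approximate 95% interval. The function names and the win/draw/loss counts are made up for illustration; they are not actual Lc0 vs. Allie results.

```python
import math


def elo_diff(score):
    """Elo difference implied by a score fraction in (0, 1), logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def match_elo(wins, draws, losses, z=1.96):
    """Return (elo, low, high): point estimate and an approximate 95% interval."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0), then standard error of the mean.
    var = (wins * (1 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0 - score) ** 2) / n
    se = math.sqrt(var / n)
    return elo_diff(score), elo_diff(score - z * se), elo_diff(score + z * se)


# Hypothetical counts: the same 51% score over 100 games and over 200,000 games.
print(match_elo(22, 58, 20))
print(match_elo(44000, 116000, 40000))
```

Both matches give the same point estimate of roughly 7 Elo, but the 100-game interval comfortably includes zero, while the 200,000-game interval does not.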

chrisw · Post by **chrisw** » Thu Sep 19, 2019 10:08 pm

Gian-Carlo Pascutto wrote: ↑Thu Sep 19, 2019 9:11 pm

gonzochess75 wrote: ↑Thu Sep 19, 2019 7:59 pm So for the record, you agree that the lc0 cuDNN/CUDA backend is a very critical part to the strength of an NN-engine?
I think I've been very clear and said *numerous times* that the cuDNN backend is a very important, critical piece of code... But to be very clear, the sentiment I was relating above was the unsolicited opinion of a top lc0 developer.
I'll explain why I highlight this:

That code is great, very important and all praise to Ankan for writing it, but that code all by itself is a far cry from a chess engine. So you see Allie is not a clone or a copy of Lc0.
See, the thing is nobody has a good definition of "clone". It was pointed out in the beginning, and if anything, it looks like this thread was an attempt to define the term for NN engines. Good luck! I don't think anyone agreed on one for AB engines and those are quite a bit older and more understood.

I think I understand somewhat where crem's frustration comes from. He thinks computer chess is a programming competition. It's logical, because it superficially looks that way. But nothing could be further from the truth. From a competition point of view it seems most sensible to look at where an engine gets its strength from. Hence, it looks senseless from this POV that lc0 is losing to competitors that use lc0's code and get a lot of strength from it. (This is even more painful because the results are somewhat random and lc0 could easily lose to an engine using its code that is actually a bit weaker! It is literally possible to take lc0, make it a bit weaker, and still hope to win TCEC/CCC/whatever)

But computer chess is not a programming competition. Computer chess is entertainment. If it were a competition, there'd be no Komodo and KomodoMCTS in TCEC - and I'll leave it at those two most obvious examples to not open another dozen cans of worms. From an audience point of view it is more interesting to have lots of strong engines, preferably that still differ enough in how they see a position so there's enough decisive games. The audience doesn't give a f--- about the graphs posted on the first page. They care - at most a little - about engines that are so close that it's like watching a self-play game. Details like how the NN backend was written - even if it took months of blood, sweat, tears and access to non-public GPU programming details - are not relevant, as long as the engine plays a bit differently and the author can explain a bit what he did. So hoping that the rules address this is futile - there's a fundamental conflict of interest. There's quite a bit more people watching and discussing TCEC than there are watching the ICGA Computer Chess World Championship, right?

It took me quite a while to come to that realization when I was originally involved in computer chess, and after I did, I quit.

(Don't ask me what I'm doing here again now, it's a long story involving ataxx engines...)

Well, yes, that (what computer chess is about) is true in part, but it is also:

the poster child of the boom-bust AI investment development cycle. Each boom has been accompanied, propaganda-wise, by dreams of “well if they can beat humans in chess ...” and each bust by failure to actually deliver on the dreams. So, I would posit, computer chess is about bullshit also. Extracting copious investment sums from greedy capitalists, isn’t that what AZ is really about? Sure, Go now as well. And one of the reasons there are a few sleazy crooks hovering around the upper echelons of it.

second, at programmer level, it’s about ego and identity. You can tell by the huge importance given to the name of the thing. The objection to Alliestein (or any other equivalent example) is that the original has been renamed, an act of stripping identity away from the actual originators. If it were called LC0WithSomeChangedBits the whole thing would be different. But Alliestein and something in the notes about thanks to LC0 is not the same thing at all. Alliestein says “I own this, it's part of me now, my ego/identity is in it”, and “speak to me differently now, I am somebody, I created a chess engine”.
Likewise the people/person recognised for hard work in LC0 see this seizure of status (equivalent status to theirs) as a kind of cheat: “hey you can’t do that, I am in that engine, you are erasing me by the renaming and publicity act”.
See? It’s all about the naming. People in open source projects object when the project is renamed as some kind of new entity, and some new person gets a cheap route to status, which really belongs to the originators, on the originators' backs and hard work.
Same with DeusX. The renaming and attempts to hide origins. That case is more obviously about ego/identity because of the accompanying displays of narcissism.

Gian-Carlo Pascutto · Post by **Gian-Carlo Pascutto** » Thu Sep 19, 2019 10:22 pm

gonzochess75 wrote: ↑Thu Sep 19, 2019 9:43 pm So if it was a coding competition I guess Ankan wins if either Lc0 or Allie win?

Yep. (Which underlines that TCEC can't possibly be that - he's playing against himself)

Or maybe *you* should get some credit if either win?

Even better: I'll win TCEC if either lc0, Allie, Stockfish or Stoofvlees wins. But I'm still the underdog by far: Ronald De Man wins if lc0, Allie, Stockfish, Stoofvlees, Houdini, Komodo OR KomodoMCTS wins.

About the only thing that could go wrong for Ronald is ScorpioNN winning the superfinal. We shouldn't be too quick with celebrations, but things are looking up, I'd say.

Or we could dismiss the above as nonsense and conclude that TCEC is not a programming competition.

Or maybe the whole history of contributors to chess programming knowledge should get some credit since so many of these ideas that are so crucial and used by all current engines were not original to any of the *current* generation of engine devs?

It's not an ideas competition either - clearest counterexample of this is that you may have good ones but if I code an engine with 100x the NPS then that may not help you as much as you'd hope.

Hell, maybe there is plenty of credit to go around and we don't have to bash each other and play a zero sum game of who deserves the *real* or the *most* credit?

A competition is not useful if everyone shares first place. Some engine will win TCEC.
