Evidence That NNs Work Best With Multiple Modules

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

towforce
Posts: 12708
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK
Full name: Graham Laight

Evidence That NNs Work Best With Multiple Modules

Post by towforce »

Researchers at Fujitsu and MIT have shown that, for understanding images, NNs work best when broken into multiple modules, each of which does a separate part of the work.

https://www.techradar.com/news/boffins- ... like-we-do

It seems obvious to me that the same will apply to chess: one NN that does everything is clearly not the way. This partially explains why NNs trained on several orders of magnitude more games than any human will ever see are unable to beat top humans at ply 1.

(post first published on Ed's ProDeo forum earlier today).
Human chess is partly about tactics and strategy, but mostly about memory
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Evidence That NNs Work Best With Multiple Modules

Post by Milos »

towforce wrote: Sun Dec 12, 2021 1:26 pm Researchers at Fujitsu and MIT have shown that, for understanding images, NNs work best when broken into multiple modules, each of which does a separate part of the work.

https://www.techradar.com/news/boffins- ... like-we-do

It seems obvious to me that the same will apply to chess: one NN that does everything is clearly not the way. This partially explains why NNs trained on several orders of magnitude more games than any human will ever see are unable to beat top humans at ply 1.

(post first published on Ed's ProDeo forum earlier today).
As usual, your layman intuition doesn't at all apply.
What they did is not groundbreaking in any way (again, a layman tech journalist reporting), which can clearly be seen from the category of their paper at NeurIPS (just a regular poster, not chosen as best poster, let alone best paper). There are plenty of research groups working on the same idea.
However, that's not even the point. The point is that this has absolutely nothing to do with chess and is totally inapplicable to chess.
Human-based (classical) evaluation tends to separate categories (piece types, colors, ranks) because hand-coding the actual dependence between categories (features) in an efficient way is difficult, if not impossible. OTOH, NN(UE)-based evaluation tries to extract features that contain the interdependencies between categories that are crucial for accurate evaluation. Separating NNs doesn't help in any way, because the goal of an NN in chess is not classification, which is the main task of the approach you cited.
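To make the contrast concrete, here is a toy Python sketch (invented terms and weights, not any engine's actual evaluation): a classical eval sums independent hand-coded terms, while a learned feature can couple categories, e.g. weighting king exposure by whether queens are still on the board.

```python
# Toy illustration (invented weights, not a real engine's eval):
# a classical evaluation sums hand-coded terms independently,
# while a learned feature can encode an interaction between
# categories that hand-coding tends to miss.

def classical_eval(material, king_exposure):
    # Independent, hand-tuned terms: no cross-dependence.
    return material - 10 * king_exposure

def learned_eval(material, king_exposure, queens_on):
    # A learned feature couples the categories: exposure is
    # penalized far more heavily while queens are on the board.
    penalty = 30 if queens_on else 5
    return material - penalty * king_exposure
```

With material=100 and king_exposure=2, the classical form always gives 80, while the coupled form gives 40 with queens on and 90 with queens off: the interaction, not the categories, carries the information.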
dkappe
Posts: 1632
Joined: Tue Aug 21, 2018 7:52 pm
Full name: Dietrich Kappe

Re: Evidence That NNs Work Best With Multiple Modules

Post by dkappe »

Milos wrote: Sun Dec 12, 2021 5:06 pm However, that's not even the point. The point is that this has absolutely nothing to do with chess and is totally inapplicable to chess.
Human-based (classical) evaluation tends to separate categories (piece types, colors, ranks) because hand-coding the actual dependence between categories (features) in an efficient way is difficult, if not impossible. OTOH, NN(UE)-based evaluation tries to extract features that contain the interdependencies between categories that are crucial for accurate evaluation. Separating NNs doesn't help in any way, because the goal of an NN in chess is not classification, which is the main task of the approach you cited.
Milos,

I guess you can be forgiven for sleeping through the AlphaZero revolution. Like Athena emerging from the forehead of Zeus, the NN techniques used in AlphaZero were adapted from work on image classification and recognition. Even the latest NNUEs used by Stockfish have sub-modules corresponding to PSQTs and game phase.
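For the curious, the rough shape of that design can be sketched in Python (illustrative names and numbers only, not Stockfish's actual NNUE code): a material/PSQT module plus a small network module selected by game phase.

```python
# Toy sketch of an NNUE-style split (illustrative, not Stockfish source):
# the output is a PSQT-like linear term plus one of several small
# "layer stacks", selected by a game-phase bucket (piece count).

PSQT = {'P': 100, 'N': 320, 'B': 330, 'R': 500, 'Q': 900, 'K': 0}

def psqt_term(pieces):
    # pieces: list of (piece_type, sign), sign +1 for us, -1 for them.
    return sum(sign * PSQT[pt] for pt, sign in pieces)

def phase_bucket(pieces, n_buckets=8):
    # Pick a sub-network by how many pieces remain on the board.
    return min((len(pieces) - 1) // 4, n_buckets - 1)

def layer_stack(bucket_id, features):
    # Stand-in for the bucket's small fully-connected network:
    # here just a per-bucket weighted sum of the features.
    return (1.0 + 0.1 * bucket_id) * sum(features)

def evaluate(pieces, features):
    # Final eval = PSQT module + phase-selected network module.
    return psqt_term(pieces) + layer_stack(phase_bucket(pieces), features)
```

The design point is that the cheap linear PSQT part carries the bulk of the material signal, so the bucketed networks only have to learn the residual positional part for their slice of the game.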

Eagerly awaiting your next angry response. :D
Fat Titz by Stockfish, the engine with the bodaciously big net. Remember: size matters. If you want to learn more about this engine just google for "Fat Titz".
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Evidence That NNs Work Best With Multiple Modules

Post by Milos »

dkappe wrote: Sun Dec 12, 2021 5:31 pm
Milos wrote: Sun Dec 12, 2021 5:06 pm However, that's not even the point. The point is that this has absolutely nothing to do with chess and is totally inapplicable to chess.
Human-based (classical) evaluation tends to separate categories (piece types, colors, ranks) because hand-coding the actual dependence between categories (features) in an efficient way is difficult, if not impossible. OTOH, NN(UE)-based evaluation tries to extract features that contain the interdependencies between categories that are crucial for accurate evaluation. Separating NNs doesn't help in any way, because the goal of an NN in chess is not classification, which is the main task of the approach you cited.
Milos,

I guess you can be forgiven for sleeping through the AlphaZero revolution. Like Athena emerging from the forehead of Zeus, the NN techniques used in AlphaZero were adapted from work on image classification and recognition. Even the latest NNUEs used by Stockfish have sub-modules corresponding to PSQTs and game phase.

Eagerly awaiting your next angry response. :D
Why would I be angry because of someone's layman understanding? :lol:
I guess since a ResNet was used for determining policy and value, it must be that categorization-based image classification is also very useful for chess. Gee. I get it, you are just a hobbyist, but it's not too much to expect some basic understanding.
Thinking that different PSQTs based on the game phase have something to do with what the OP quoted is a bit clueless.
P.S. I analyzed the particular AlphaGo ResNet architecture when you didn't even know what deep learning was. :wink:
dkappe
Posts: 1632
Joined: Tue Aug 21, 2018 7:52 pm
Full name: Dietrich Kappe

Re: Evidence That NNs Work Best With Multiple Modules

Post by dkappe »

Milos wrote: Sun Dec 12, 2021 8:07 pm Why would I be angry because of someone's layman understanding? :lol:
I guess since a ResNet was used for determining policy and value, it must be that categorization-based image classification is also very useful for chess. Gee. I get it, you are just a hobbyist, but it's not too much to expect some basic understanding.
Thinking that different PSQTs based on the game phase have something to do with what the OP quoted is a bit clueless.
P.S. I analyzed the particular AlphaGo ResNet architecture when you didn't even know what deep learning was. :wink:
And you didn’t disappoint. Angry response, right on time. :D Since I was using ResNets for image classification in medical devices long before AlphaGo, when exactly were you analyzing that AlphaGo ResNet?

Also, basic reading comprehension: sub-modules corresponding to PSQTs are not the same thing as different PSQTs based on the game phase. Think before you write.

Eagerly awaiting your next angry response. :D
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Evidence That NNs Work Best With Multiple Modules

Post by Milos »

dkappe wrote: Sun Dec 12, 2021 8:57 pm And you didn’t disappoint. Angry response, right on time. :D Since I was using ResNets for image classification in medical devices long before AlphaGo, when exactly were you analyzing that AlphaGo ResNet?
Late 2015, when the first version of DeepMind's Nature paper appeared on arXiv (they later removed it in order to get it published in Nature).
Even though AlexNet had already won ILSVRC 2012, the first ResNets only appeared in 2015.
For your information, I have papers at ECCV and MICCAI on ML for medical images, so I kind of know the field. So excuse me for not believing you, but I can smell BS from a mile away, and you are kind of known for it.
So when were you using those ResNets for classification? What kind of classification? Which medical images? Which exact ResNet?
dkappe
Posts: 1632
Joined: Tue Aug 21, 2018 7:52 pm
Full name: Dietrich Kappe

Re: Evidence That NNs Work Best With Multiple Modules

Post by dkappe »

Milos,

since ResNets are a specialized kind of highway network, I suppose you are correct that I technically wasn’t using ResNets in early 2015. But don’t tell me the first time you heard of a ResNet was in a paper.
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Evidence That NNs Work Best With Multiple Modules

Post by Milos »

dkappe wrote: Sun Dec 12, 2021 11:51 pm Milos,

since ResNets are a specialized kind of highway network, I suppose you are correct that I technically wasn’t using ResNets in early 2015. But don’t tell me the first time you heard of a ResNet was in a paper.
Ofc, but how else?
I remember the discussion at that time. CNNs had been in for a while (AlexNet, GoogLeNet), but they were simple: a couple of convolutional layers with ReLUs and a fully-connected layer at the end. And they all suffered from saturation once you started adding more layers (usually more than 10 hidden layers).
There was a famous pre-print of the ResNet paper published on arXiv (and later in CVPR) that has a ridiculous number of citations (one of the most cited ML papers ever), claiming that introducing residual connections solves the issue of accuracy saturating during training. If I remember correctly, the discussion at the time was whether DeepMind had used residual connections in the policy and value nets of AlphaGo, because that one was first trained with supervised learning and there was no saturation in the training graphs in the paper. They were quite vague about the policy and value network architecture (just checked: the paper doesn't even mention BN or pooling layers), and they got away with it even in the Nature publication. Later, with the AlphaZero paper, it was confirmed that they had indeed used residual blocks.
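The effect of those residual connections is easy to show with a toy Python sketch (made-up numbers, not a real net): with y = x + f(x), each block contributes a factor of 1 + f'(x) to the gradient, so the identity path keeps it from shrinking to nothing as depth grows.

```python
# Toy illustration of gradient flow through a deep stack of layers.
# Each layer's local derivative is a small constant g; the chain
# rule multiplies these factors across the depth of the network.

def plain_gradient(g, depth):
    # Plain stack: dy/dx = g * g * ... * g, vanishing with depth.
    grad = 1.0
    for _ in range(depth):
        grad *= g
    return grad

def residual_gradient(g, depth):
    # Residual stack y = x + f(x): each block contributes (1 + g),
    # so the identity path keeps the gradient from vanishing.
    grad = 1.0
    for _ in range(depth):
        grad *= 1.0 + g
    return grad

print(plain_gradient(0.1, 20))     # ~1e-20: vanished
print(residual_gradient(0.1, 20))  # ~6.7: still usable
```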

And if you want to listen, I can explain in more detail what the problem is with the approach in the paper the OP cited and why it can't be applied to chess. First, this approach of having multiple categorized networks assumes that the objects to be classified can be more or less cleanly divided into categories that don't overlap. In practice that is of very little use, since many features in images usually overlap, and having networks trained separately for shape, color and size, for example, doesn't work that well. It's even more complicated with chess, because you can't train one separate network for each piece type, or for each side, and seriously expect any meaningful result. There are no cleanly separable categories that make sense. Even some that might sound interesting on paper were tried in Lc0 (like separate nets for pawns and other pieces, or PSQT for the material) and failed miserably.
dkappe
Posts: 1632
Joined: Tue Aug 21, 2018 7:52 pm
Full name: Dietrich Kappe

Re: Evidence That NNs Work Best With Multiple Modules

Post by dkappe »

Well, what were later termed “highway nets” had been around for a while (2013?) in my academic circles. It was just a matter of time before someone came up with a practical application such as ResNets. We employed them (highway nets) in a ridiculous application that used a smartphone to read the level of discharge in a graduated container normally placed under beds. You always remember the silly ones.

So, I’m curious what your take on that paper is. Let’s wipe the slate clean. All ears (or eyes).
Sopel
Posts: 391
Joined: Tue Oct 08, 2019 11:39 pm
Full name: Tomasz Sobczyk

Re: Evidence That NNs Work Best With Multiple Modules

Post by Sopel »

Went through the slides (https://neurips.cc/media/neurips-2021/Slides/26740.pdf). I agree with Milos, this is useless for chess.
dangi12012 wrote:No one wants to touch anything you have posted. That proves you now have negative reputations since everyone knows already you are a forum troll.

Maybe you copied your stockfish commits from someone else too?
I will look into that.