Has anyone here implemented one of the smaller binary neural networks?
Not Int8 or Float16, but Int1. That's extremely important for chess because it can be used directly with the 12 * 64-bit bitboards that engines already work on. These networks work the way you would expect matrix multiplication to work if you reduce everything down to one bit: the multiply becomes XNOR and the sum becomes popcount (XNOR + popcnt).
https://towardsdatascience.com/binary-n ... c926888f3f
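As a rough illustration of the XNOR + popcnt idea (a sketch, not taken from the linked article): if a set bit encodes +1 and a clear bit encodes -1, the dot product of two n-bit vectors is 2 * popcount(XNOR(a, b)) - n. The function names and the bit encoding below are assumptions made up for this example.

```python
def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def binary_dot(a: int, b: int, n_bits: int = 64) -> int:
    """Dot product of two {-1, +1} vectors packed as n_bits-wide integers.

    Assumed encoding: bit = 1 means +1, bit = 0 means -1.
    """
    mask = (1 << n_bits) - 1
    agree = ~(a ^ b) & mask        # XNOR: 1 wherever the two bits are equal
    matches = popcount(agree)
    return 2 * matches - n_bits    # (#matches) - (#mismatches)


def reference_dot(a: int, b: int, n_bits: int = 64) -> int:
    """Same value computed element by element, only used for checking."""
    total = 0
    for i in range(n_bits):
        x = 1 if (a >> i) & 1 else -1
        y = 1 if (b >> i) & 1 else -1
        total += x * y
    return total
```

On a GPU the same identity is applied to whole tiles of packed words at once; the Python version only shows the arithmetic.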
The coolest part is that these are natively supported by Tensor Cores. That means that on modern hardware you get 136 TFLOPS of float16 compute, but there is a whopping 1.8 Petaflops of 1-bit compute in your off-the-shelf 3080 GPU. This would be a good first layer since the input is binary and the output is forced to be an int32 matrix/vector.
cuBLAS GEMM support is not there yet, but NVIDIA CUTLASS (the template abstraction layer with different backends) supports 8-, 4- and 1-bit types:
https://github.com/NVIDIA/cutlass/blob/ ... l_types.md
Of course this only makes sense for chess engines that already run natively on the GPU, since transfer latency is a huge cost.
Does anyone here have experience with this, or can point me to a good existing git repo for a 1-bit network implementation?
It would be interesting to see a backpropagation learning approach, since I wouldn't know how a gradient is defined on a single-bit output.
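One common answer in the binary-network literature is the straight-through estimator (STE): keep real-valued "latent" weights, binarize them with sign() in the forward pass, and in the backward pass pretend sign() was the identity (usually zeroed where |w| > 1). A minimal pure-Python sketch of that idea, with hypothetical function names and a toy squared-error loss, not taken from any of the repositories linked in this thread:

```python
def sign(v: float) -> int:
    """Binarize to +1 / -1 (0 maps to +1)."""
    return 1 if v >= 0 else -1


def forward(latent_w, x):
    """Pre-activation using binarized weights; x holds +1/-1 inputs."""
    return sum(sign(w) * xi for w, xi in zip(latent_w, x))


def ste_step(latent_w, x, target, lr=0.1):
    """One SGD step on loss = 0.5 * (forward - target)^2.

    The true derivative of sign() is zero almost everywhere, so the STE
    replaces it with 1 for |w| <= 1 and 0 otherwise.
    """
    err = forward(latent_w, x) - target        # dLoss/dOutput
    new_w = []
    for w, xi in zip(latent_w, x):
        ste = 1.0 if abs(w) <= 1.0 else 0.0    # surrogate for d sign(w)/dw
        new_w.append(w - lr * err * xi * ste)
    return new_w
```

Only the latent weights are updated; the network that actually runs at inference time uses sign(latent_w), so it stays 1-bit.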
							Petaflop Binary Neural Networks? - on your home PC
Moderator: Ras
- 
				dangi12012
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Petaflop Binary Neural Networks? - on your home PC
Worlds-fastest-Bitboard-Chess-Movegenerator 
Daniel Inführ - Software Developer
- 
				Daniel Shawul
- Posts: 4186
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: Petaflop Binary Neural Networks? - on your home PC
Not INT1, but I do have INT8, which works pretty well on many GPUs that do not support FLOAT16.
The Elo loss for INT8 is ~40 Elo at the same node count against FLOAT16, but the speed gain does compensate for this
loss. I was planning to test INT4 when and if TensorRT supports it, but it looks difficult to go beyond that.
Big neural network search with MCTS relies so heavily on the "knowledge in the net" that speed is less of an issue.
Losing that knowledge through quantization may not be the best approach unless it is able to compensate for it tactically.
For speed-sensitive neural networks (NNUE), on the other hand, quantization seems to be the right choice.
- 
				dangi12012
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Petaflop Binary Neural Networks? - on your home PC
Well, it's useful only for the first layer anyway. But my understanding is that it could expand the input into a more optimal layout.
Who says that HalfKP is the perfect input for the other layers?
A binary neural network could expand any bitboard directly into a good board representation for the other layers, in int32 format, which is optimal for GPUs, and downsampling from int32 to fp16 is natively supported.
IMO it is the native format of bitboards. That's why the difference between 8 bits and 1 bit is huge here. With 8 bits you need to maintain an input list like NNUE does. With bitboards and int1, that list is the bitboard itself!
- 
				Daniel Shawul
- Posts: 4186
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: Petaflop Binary Neural Networks? - on your home PC
There is really no computation in the first layer because the input is a sparse matrix with 1s at the few places where pieces are.
The first layer output is simply the sum of the weights associated with the occupied piece-squares, without multiplication (every active input is 1).
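To make that concrete: with a 0/1 piece-square input, the dense matrix-vector product and "add up the weight rows of the occupied features" give the same result. A small sketch with made-up toy sizes and weights (not real NNUE dimensions; the function names are hypothetical):

```python
def first_layer_dense(weights, x):
    """Ordinary product: out[j] = sum_i x[i] * weights[i][j]."""
    n_out = len(weights[0])
    return [sum(x[i] * weights[i][j] for i in range(len(x)))
            for j in range(n_out)]


def first_layer_sparse(weights, active):
    """Same result, adding only the rows of the active (input = 1) features."""
    out = [0] * len(weights[0])
    for i in active:
        for j, w in enumerate(weights[i]):
            out[j] += w
    return out
```

The sparse version touches only as many rows as there are pieces on the board, which is why no multiplication is needed.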
- 
				dangi12012
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Petaflop Binary Neural Networks? - on your home PC
Daniel Shawul wrote: ↑Mon Oct 18, 2021 12:58 pm
There is really no computation in the first layer because the input is sparse matrix with 1s at few places where pieces are.
The first layer output is simply the weights associated with piece squares without multiplication (1 is the input).

Nope, that's wrong. It's non-linear because it's not just XNOR, it's XNOR + popcnt. So you have the dot product over all weighted input bits at once, like you do for fp16 or int8. But in binary the "sum" is equal to the popcnt.
If you want exactly the same uint8 representation as the NNUE input layer, a trained int1 model can generate that (int32 -> int8).
But the HalfKP input may not be the best possible representation for a network. The input bits would become part of the training, and that's the cool part.
The second cool part is that instead of transferring a huge uint8 array per position you only need 12 * 64 bits + 16 state bits, and you have the full 1.8 Petaflops for the first layer and 119 Teraflops for every other fp16 layer. That would be better than NNUE in terms of size, speed and accuracy.
Only latency is the remaining problem.
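The size part is easy to check: 12 * 64 + 16 = 784 bits, which is 98 bytes per position. A sketch using Python's struct module; the field order and the meaning of the 16 state bits (castling rights, en passant, side to move) are assumptions for illustration only, and pack_position is a hypothetical helper.

```python
import struct


def pack_position(bitboards, state_bits):
    """Pack 12 64-bit bitboards plus 16 state bits into a flat byte string.

    "<12QH" = little-endian, twelve unsigned 64-bit ints, one unsigned
    16-bit int, with no padding between fields.
    """
    if len(bitboards) != 12:
        raise ValueError("expected 12 bitboards (6 piece types x 2 colors)")
    return struct.pack("<12QH", *bitboards, state_bits)


# Example: only the white pawns on their starting rank, all else empty.
payload = pack_position([0x000000000000FF00] + [0] * 11, 0)
print(len(payload), "bytes =", len(payload) * 8, "bits")
```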
- 
				odomobo
- Posts: 96
- Joined: Fri Jul 06, 2018 1:09 am
- Location: Chicago, IL
- Full name: Josh Odom
Re: Petaflop Binary Neural Networks? - on your home PC
A bit pedantic, but it wouldn't be petaflops, because it's not floating-point operations. Maybe binary neural networks are worth exploring, but "faster" doesn't necessarily mean better.
						- 
				Daniel Shawul
- Posts: 4186
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: Petaflop Binary Neural Networks? - on your home PC
dangi12012 wrote: ↑Mon Oct 18, 2021 4:09 pm
Daniel Shawul wrote: ↑Mon Oct 18, 2021 12:58 pm
There is really no computation in the first layer because the input is sparse matrix with 1s at few places where pieces are.
The first layer output is simply the weights associated with piece squares without multiplication (1 is the input).

Nope thats wrong. Its non linear because its not XNOR - its XNor + popcnt. So you have the dot product over all weighted input bits at once - like you do for fp16 or int8. But in binary the "sum" is equal to the popcnt.
If you want a binary exact same uint8 representation like the inputlayer for NNUE - a trained int1 model can generate that. int32->int8
But halfking input may not be the best possible representation for a network - the input bits would become part of the training and thats the cool stuff.
The second cool part is that instead of transferring a huge uint8 array per position - you only need 12*64bits + 16 state bits - and have the full 1.8 Petaflops for the first layer and 119 Teraflops for every other fp16 layer. Which would be better that nnue in terms of size - speed and accuracy.
Only latency is the remaining problem.

Oops, I forgot the weights will be binary in the binary network. Yes, in that case.
For the current NNUE, the computation is vectorized additions that are further optimized by "accumulation" from previous moves,
which is where the incremental part of NNUE comes from.
So when a piece moves, only the weights for the part that changed (the from/to squares) are added or subtracted.
It is hard to see the binary network approach improving the computation in the first layer ...
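The incremental update described above can be sketched as follows. This is only a toy model with made-up sizes and hypothetical function names, not the actual NNUE code: the accumulator is refreshed once, and afterwards a move only subtracts the rows of features that turned off and adds the rows of features that turned on.

```python
def refresh(weights, active):
    """Full recomputation: sum the weight rows of all active features."""
    acc = [0] * len(weights[0])
    for f in active:
        acc = [a + w for a, w in zip(acc, weights[f])]
    return acc


def update(acc, weights, removed, added):
    """Incremental update after a move.

    removed: feature indices that turned off (e.g. piece on its from-square)
    added:   feature indices that turned on  (e.g. piece on its to-square)
    """
    acc = list(acc)
    for f in removed:
        acc = [a - w for a, w in zip(acc, weights[f])]
    for f in added:
        acc = [a + w for a, w in zip(acc, weights[f])]
    return acc
```

A quiet move changes two features, so the cost is two row additions regardless of how many inputs the layer has.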
- 
				Sopel
- Posts: 391
- Joined: Tue Oct 08, 2019 11:39 pm
- Full name: Tomasz Sobczyk
Re: Petaflop Binary Neural Networks? - on your home PC
dangi12012 wrote: ↑Mon Oct 18, 2021 4:09 pm
The second cool part is that instead of transferring a huge uint8 array per position - you only need 12*64bits + 16 state bits - and have the full 1.8 Petaflops for the first layer and 119 Teraflops for every other fp16 layer. Which would be better that nnue in terms of size - speed and accuracy.
Only latency is the remaining problem.

You have a fundamental misunderstanding about how NNUE relies on sparsity and board modification constraints between 2 subsequent positions to allow for incremental updates of the first layer. See https://github.com/glinscott/nnue-pytor ... ccumulator. There's also the aspect of overparametrization which is a key part of NNUE, and your "bitboard" input disallows that.
dangi12012 wrote:No one wants to touch anything you have posted. That proves you now have negative reputations since everyone knows already you are a forum troll.
Maybe you copied your stockfish commits from someone else too?
I will look into that.
- 
				dangi12012
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Petaflop Binary Neural Networks? - on your home PC
Sopel wrote: ↑Tue Oct 19, 2021 12:32 pm
dangi12012 wrote: ↑Mon Oct 18, 2021 4:09 pm
The second cool part is that instead of transferring a huge uint8 array per position - you only need 12*64bits + 16 state bits - and have the full 1.8 Petaflops for the first layer and 119 Teraflops for every other fp16 layer. Which would be better that nnue in terms of size - speed and accuracy.
Only latency is the remaining problem.

You have a fundamental misunderstanding about how NNUE relies on sparsity and board modification constraints between 2 subsequent positions to allow for incremental updates of the first layer. See https://github.com/glinscott/nnue-pytor ... ccumulator. There's also the aspect of overparametrization which is a key part of NNUE, and your "bitboard" input disallows that.

Nope, I think programmers need a more precise language than English. Maybe Neuralink will change that.
Overparametrisation can be done inherently by a binary network, since you choose yourself how many parameters you generate from the fundamental position by setting the number of columns in the 2nd matrix. Also, NNUE maintains the first layer, not the input; that's the HalfKP I was talking about. It's a very good idea, and no wonder it ended up in chess.
My question was whether someone has experience with binary networks, but it seems not.
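For the "number of columns" point, here is roughly what I mean, as a pure-Python sketch with hypothetical names and random weights (nothing trained, and no claim about playing strength): the 768 board bits go in as one packed integer, and the layer width is a free parameter that decides how many int outputs come out.

```python
import random


def binary_layer(weight_rows, x_bits, n_bits=768):
    """One XNOR + popcount output per packed weight row.

    Each row is an n_bits-wide integer; the result is a list of ints in
    the range [-n_bits, n_bits] (bit = 1 is treated as +1, bit = 0 as -1).
    """
    mask = (1 << n_bits) - 1
    out = []
    for w in weight_rows:
        matches = bin(~(w ^ x_bits) & mask).count("1")
        out.append(2 * matches - n_bits)
    return out


def random_weights(width, n_bits=768, seed=0):
    """Placeholder weights; a real network would learn these."""
    rng = random.Random(seed)
    return [rng.getrandbits(n_bits) for _ in range(width)]


# 12 bitboards * 64 bits = 768 input bits, expanded to 256 outputs.
position_bits = random.Random(1).getrandbits(768)
features = binary_layer(random_weights(256), position_bits)
```

Changing 256 to any other width changes the number of parameters without touching the input format.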
- 
				dangi12012
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Petaflop Binary Neural Networks? - on your home PC
So if anyone is interested in the topic - here is a current research paper with a github repo. 
https://arxiv.org/pdf/2006.16578.pdf
https://github.com/pnnl/TCBNN