Why is there no bulk evaluation happening in Alpha Beta?

Discussion of chess software programming and technical issues.

Moderator: Ras

smatovic
Posts: 3224
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: Why is there no bulk evaluation happening in Alpha Beta?

Post by smatovic »

Madeleine Birchfield wrote: Sat Nov 06, 2021 2:06 pm
smatovic wrote: Sat Nov 06, 2021 7:21 am It is the effective branching factor of ~2 of modern AB engines and the serial nature of the algorithm that make bulk evaluation/batches a poor fit for AB -> hence the MCTS-PUCT of Lc0 with NN-eval batches on GPU.
So bulk evaluation/batches would work with AB on the GPU much better for games with higher effective branching factor, such as Go?
Idk about AB and Go, but the higher the branching factor of the game (and the effective BF of the algorithm), the more there is to parallelize, and the higher the batch size you could run in theory. There are some papers (somewhat dated by now) on game tree search on the GPU:

https://zeta-chess.app26.de/post/papers ... ee-search/

And as Daniel mentioned, there is the rollout paradigm; MCTS converges to minimax and the like.
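As a back-of-the-envelope illustration (my own toy numbers, not taken from those papers): the effective branching factor bounds how many leaves exist at a given depth, and therefore how large a batch you could ever hope to assemble.

```python
# Toy illustration, my own numbers: with effective branching factor b,
# a tree of depth d has roughly b**d leaves, which bounds how large a
# batch you could assemble for bulk evaluation.
def approx_leaves(ebf, depth):
    return ebf ** depth

ab_chess = approx_leaves(2, 10)    # modern AB engine, EBF ~2: 1024 leaves at depth 10
raw_go = approx_leaves(250, 3)     # raw Go branching: millions of leaves after 3 plies
```

With an EBF near 2 the tree simply does not fan out fast enough to keep thousands of GPU lanes busy; a raw-branching game like Go does so after only a few plies.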

--
Srdja
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Why is there no bulk evaluation happening in Alpha Beta?

Post by dangi12012 »

Does Alphabeta have enough leaf nodes for bulk evaluation?

With MCTS you wouldn't need any evaluation network at all. The sampling of outcomes after a move converges to AB. For MCTS only perft speed would matter.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
smatovic
Posts: 3224
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: Why is there no bulk evaluation happening in Alpha Beta?

Post by smatovic »

dangi12012 wrote: Sat Nov 06, 2021 2:35 pm Does Alphabeta have enough leaf nodes for bulk evaluation?
...
I guess that is the issue several people tried to point you to: AB+QS with the EBF of an actual modern chess engine.

--
Srdja
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: Why is there no bulk evaluation happening in Alpha Beta?

Post by diep »

dangi12012 wrote: Sat Nov 06, 2021 2:35 pm Does Alphabeta have enough leaf nodes for bulk evaluation?

With MCTS you wouldn't need any evaluation network at all. The sampling of outcomes after a move converges to AB. For MCTS only perft speed would matter.
Not at all; perft speed is totally irrelevant. That is a very simplistic idea of what determines the speed of an engine.
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: Why is there no bulk evaluation happening in Alpha Beta?

Post by diep »

Madeleine Birchfield wrote: Sat Nov 06, 2021 2:06 pm
smatovic wrote: Sat Nov 06, 2021 7:21 am It is the effective branching factor of ~2 of modern AB engines and the serial nature of the algorithm that make bulk evaluation/batches a poor fit for AB -> hence the MCTS-PUCT of Lc0 with NN-eval batches on GPU.
So bulk evaluation/batches would work with AB on the GPU much better for games with higher effective branching factor, such as Go?
Bulk evaluation is not possible in Game Tree Search.

Please note that in Go, especially at the end of long lines, many moves are in practice forced. You can forward prune almost everything; only a handful of moves are 'plausible' at most. This makes the effective branching factor much smaller than in chess in practice. The one big difference is the huge length of each line in Go. So where the practical branching factor is smaller in Go, the lines are much longer.

Hard forward pruning is very easy in Go and totally impossible in chess. That effectively makes the branching factor smaller in Go than in chess.

Yet that is totally unrelated to bulk evaluation.

By definition, bulk evaluation implies you have a much worse branching factor - which is suicide in any game from a search perspective.

So from an absolute viewpoint Go is totally uninteresting from a search standpoint, as less money has been poured into it than into chess. Yet if we look at the past 20 years, the two things that boosted the Elo of chess engines most, besides improved hardware searching deeper, are additional chess knowledge and near-perfect parameter tuning (or, if you want, much better tuned chess engines).

So for the past 10+ years the classic chess engines have been at a standstill there, as they were parameter-tuned to near perfection around 2005-2008, and since then there has been no improved chess knowledge in Stockfish/Fruit. The commercial incentive to improve classically searching chess programs had disappeared completely.

Trying to massively vectorize chess or Go - there is no difference between the two from a theoretical search viewpoint. Typically Go programs were very badly coded compared to chess programs, yet used much more Go knowledge to get the job done. A big problem in Go was sales: it was tough to sell anything or to obtain any sort of knowledge, and the Go literature that exists is total RUBBISH. I have about ten books on Go there, and especially the writings of a 9-dan author from quite a long time ago are total RUBBISH.

For example, a typical Japanese/Korean maxim: attack the enemy at his strongest point.

Strategically, in both games, that is in GENERAL total IDIOCY.

This author mentions it, of course, because SOMETIMES it makes sense. But by and large it does not.

In short, the big difference between computer chess and computer Go is that computer Go was simply 20 years behind computer chess.
Selling anything in Japan is very complicated; it is a very closed society, and importing things from abroad is not easily done - mainly raw materials get imported.

So it is not an interesting game at all from a sales viewpoint.

Yet technically speaking, game tree search is exactly the same challenge. The parameter differences are: hard forward pruning is much easier; incremental makemove and unmakemove in Go are a bit harder, as the structures must be updated (knowing how many groups are alive or dead); yet effectively the NPS is much higher there than in chess. As a result, things were done more lazily in Go - so ANNs were used sooner there.

The code of Go programs is in general very slow compared to chess - very badly programmed, so to speak. Java, for example, was used a lot there some years ago, and only one or two engines really used assembler (HandTalk back then), whereas all that was very common in chess.

So more effort was put into computer chess. Yet it is exactly the same problem from a search viewpoint.

Now I'm not that strong in Go - some 5-dan player totally murdered me there.
Yet I'm titled in chess, and when I talked to an expert in Go,
we agreed the strategic maxims in chess and Go are pretty much the same.

You attack where your enemy is weakest and try to first chop off a finger instead of attacking him at his strongest point.

Draughts and checkers are different there: those are asymmetric games, which completely changes the perspective and the strategic goals.
Chess and Go are completely symmetric and very similar games.

A strong chess player can play very strongly in Go after some practice - dan level at FM strength, no problem.
A strong checkers player, even world-champion draughts players capable of perfect calculation, stays very weak in chess even when doing it full time (we have two examples of that in the Netherlands).
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Why is there no bulk evaluation happening in Alpha Beta?

Post by Daniel Shawul »

dangi12012 wrote: Sat Nov 06, 2021 11:26 am
j.t. wrote: Sat Nov 06, 2021 12:11 am To make any use of the GPU you need to collect a pretty big amount of positions that you want to evaluate. The longer you postpone evaluating positions, the more you lose the advantages of the alpha-beta. Together with the extreme pruning engines do today, the relative efficient NNUE algorithm, and the overhead that comes from using a GPU, and last but not least the complexity of GPU code, I assume that people just think that it's not worth it.

But I'll be honest and say that I've never tried this myself (I thought about it, but I've never got familiar enough with GPUs to actually try it), so maybe I misjudged the potential of GPUs.
I think this is the best answer here so far: the complexity of GPU code, and that not every developer is comfortable with it.
I am very, very certain that the GPU is the future of chess. As for the pruning, the GPU would only ever get the unevaluated leaf nodes, so algorithmically there is no difference.

GPUs have come a long way, and by bulk-sending all good leaf nodes at once, I think the network size can grow for free: you offload all the hard neural network GEMM work to the GPU, where it is calculated much faster. NNUE has the size it has because the later layers use AVX2 and the first layer is computed incrementally. Nothing I see prevents 1E6 leaf nodes from being bulk evaluated on the GPU side - except that it has not been done before.
You have no idea what you are talking about, sorry.
All that you have asked for in your OP, I have done and even more.
Instead of talking all over the place, pick one subject and do it, then you will learn the challenges of doing it.
Weren't you previously interested in binary neural networks? Maybe focus on that and do something with it instead of opening many topics
telling us about paradigm shifts.

When you ask a question, prepare yourself to listen when people tell you what they have done...etc
I think this perft stuff has gotten to your head to be honest :)
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Why is there no bulk evaluation happening in Alpha Beta?

Post by Daniel Shawul »

j.t. wrote: Sat Nov 06, 2021 12:11 am To make any use of the GPU you need to collect a pretty big amount of positions that you want to evaluate. The longer you postpone evaluating positions, the more you lose the advantages of the alpha-beta. Together with the extreme pruning engines do today, the relative efficient NNUE algorithm, and the overhead that comes from using a GPU, and last but not least the complexity of GPU code, I assume that people just think that it's not worth it.

But I'll be honest and say that I've never tried this myself (I thought about it, but I've never got familiar enough with GPUs to actually try it), so maybe I misjudged the potential of GPUs.
This is wrong! Modern AB engines use many threads, 256 threads with lazy SMP, so you can batch together eval requests from different threads.
Each thread evaluates exactly one position, and you then effectively have a batch size of 256, which should be enough to fill up a GPU with a decent net size (not NNUE though, since that is too tiny). I already do this in Scorpio and it "works", but it is darn slow even with a tiny 6x64 net ...
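A minimal sketch of that batching scheme (hypothetical Python of my own, not Scorpio's actual code): each search thread submits one position and blocks; the thread that completes the batch runs one bulk evaluation (standing in for a GPU inference call) and releases all the waiters.

```python
import threading

class BatchedEvaluator:
    """Collects one eval request per search thread; evaluates in bulk.
    Limitation of this sketch: it deadlocks if fewer than batch_size
    threads are searching; a real engine would add a timeout/flush."""

    def __init__(self, batch_size, eval_batch_fn):
        self.batch_size = batch_size
        self.eval_batch_fn = eval_batch_fn   # stands in for a GPU inference call
        self.cv = threading.Condition()
        self.pending = []                    # list of (ticket, position)
        self.results = {}

    def evaluate(self, pos):
        ticket = object()                    # unique handle for this request
        with self.cv:
            self.pending.append((ticket, pos))
            if len(self.pending) >= self.batch_size:
                # This thread completes the batch: run one bulk evaluation
                # and wake everyone who is waiting for a score.
                batch, self.pending = self.pending, []
                scores = self.eval_batch_fn([p for _, p in batch])
                for (t, _), s in zip(batch, scores):
                    self.results[t] = s
                self.cv.notify_all()
            else:
                self.cv.wait_for(lambda: ticket in self.results)
            return self.results.pop(ticket)
```

With 256 lazy-SMP threads this yields batches of 256, but every thread stalls until the slowest request of its batch is served, which is one reason the approach can end up "darn slow" in practice.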
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Why is there no bulk evaluation happening in Alpha Beta?

Post by Daniel Shawul »

dangi12012 wrote: Sat Nov 06, 2021 2:35 pm Does Alphabeta have enough leaf nodes for bulk evaluation?
What does this even mean? AB has the same leaf nodes as an MCTS tree ...
With MCTS you wouldn't need any evaluation network at all. The sampling of outcomes after a move converges to AB. For MCTS only perft speed would matter.
Sigh...
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Why is there no bulk evaluation happening in Alpha Beta?

Post by dangi12012 »

Daniel Shawul wrote: Sun Nov 07, 2021 10:02 pm
j.t. wrote: Sat Nov 06, 2021 12:11 am To make any use of the GPU you need to collect a pretty big amount of positions that you want to evaluate. The longer you postpone evaluating positions, the more you lose the advantages of the alpha-beta. Together with the extreme pruning engines do today, the relative efficient NNUE algorithm, and the overhead that comes from using a GPU, and last but not least the complexity of GPU code, I assume that people just think that it's not worth it.

But I'll be honest and say that I've never tried this myself (I thought about it, but I've never got familiar enough with GPUs to actually try it), so maybe I misjudged the potential of GPUs.
This is wrong! Modern AB engines use many threads, 256 threads with lazy SMP, so you can batch together eval requests from different threads.
Each thread evaluates exactly one position, and you then effectively have a batch size of 256, which should be enough to fill up a GPU with a decent net size (not NNUE though, since that is too tiny). I already do this in Scorpio and it "works", but it is darn slow even with a tiny 6x64 net ...
That's a cool idea. If the movegen + AB is happening on the GPU as well, you get exactly one eval request per warp thread; with sync barriers this could be very, very fast and evaluate thousands of positions with a few matrix-multiply + ReLU iterations (i.e. layers).
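To sketch the data flow (my own toy code; the sizes and weights are made up, and pure Python only stands in for the GPU's GEMM kernels): a batch of N leaf positions is an N-row feature matrix, and each layer is one matrix multiply plus a ReLU, which is exactly the shape GPUs are built for.

```python
# Toy sketch of "batch of leaves -> a few GEMM + ReLU layers".
def matmul(a, b):
    # a: n x k, b: k x m  ->  n x m (one row per leaf position)
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def relu(m):
    return [[v if v > 0.0 else 0.0 for v in row] for row in m]

def forward(batch, layers):
    # batch: one feature row per leaf position; layers: weight matrices
    x = batch
    for i, w in enumerate(layers):
        x = matmul(x, w)
        if i < len(layers) - 1:   # no ReLU after the output layer
            x = relu(x)
    return x                      # one evaluation row per position
```

The point is that the batch dimension comes for free: evaluating 1000 positions instead of 1 only grows the row count of each GEMM, not the number of kernel launches.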
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
User avatar
j.t.
Posts: 263
Joined: Wed Jun 16, 2021 2:08 am
Location: Berlin
Full name: Jost Triller

Re: Why is there no bulk evaluation happening in Alpha Beta?

Post by j.t. »

Daniel Shawul wrote: Sun Nov 07, 2021 10:02 pm
j.t. wrote: Sat Nov 06, 2021 12:11 am To make any use of the GPU you need to collect a pretty big amount of positions that you want to evaluate. The longer you postpone evaluating positions, the more you lose the advantages of the alpha-beta. Together with the extreme pruning engines do today, the relative efficient NNUE algorithm, and the overhead that comes from using a GPU, and last but not least the complexity of GPU code, I assume that people just think that it's not worth it.

But I'll be honest and say that I've never tried this myself (I thought about it, but I've never got familiar enough with GPUs to actually try it), so maybe I misjudged the potential of GPUs.
This is wrong! Modern AB engines use many threads, 256 threads with lazy SMP, so you can batch together eval requests from different threads.
Each thread evaluates exactly one position and you then effectively have a batch size of 256 that should be enough to fill up a GPU with a decent net size (not nnue though since that is too tiny). I already do this in Scorpio and it "works", but it is darn slow even with a tiny 6x64 net ...
The search speed doesn't scale linearly with thread count. At some point, when you use multiple thousands of threads instead of a few hundred, the algorithmic loss is practically the same as the advantage you get from more cores. That's what I meant by "the longer you postpone evaluating positions, the more you lose the advantages of the alpha-beta". Considering that GPUs are slower per thread and have additional overhead compared to CPU threads, there might be no advantage at all in using a GPU instead of hundreds of CPU threads to parallelize alpha-beta (but as I said earlier, this is mostly speculation on my part). Of course, using the GPU's many threads to evaluate a huge neural network is a viable solution. But I don't think that this is what the original question was about, as it would be basically the same as what Lc0 does.
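A toy model of that trade-off (the scaling exponent and the per-thread speeds below are my own assumptions for illustration, not measured figures): if parallel AB only yields roughly threads**alpha effective speedup, then many slow GPU-style threads can end up no better than a few hundred fast CPU threads.

```python
# Hypothetical sublinear-scaling model: effective speedup grows like
# threads**alpha (alpha < 1 models the algorithmic loss of parallel AB).
def effective_speedup(threads, alpha, per_thread_speed=1.0):
    return per_thread_speed * threads ** alpha

cpu = effective_speedup(128, 0.7)            # fast threads, modest count
gpu = effective_speedup(10240, 0.7, 0.05)    # many threads, each far slower
```

Under these made-up numbers the two come out in the same ballpark, which is the scenario described above: the extra threads roughly pay for the algorithmic loss and the slower cores.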