stegemma wrote: I've done a very simple estimate of how fast an FPGA could be. A starter board that costs about $30 has a 100 MHz clock. If you can do each step in one clock cycle, you can do:
Code:
1 - select the first move
2 - make the move
3 - next ply
...last ply:
4 - evaluate
5 - go back one ply
6 - undo last move
Of course this is over-simplified, but it means that you need 5 to 6 clock cycles per move. This is the minimal count, I think... but maybe there is a better design than the one I can imagine (I'm not an engineer).
If we can do a full make/unmake in only 5 clock cycles, at 100 MHz we can reach 20M nodes per second or less (for the cheapest FPGA board that I've found). That is not so impressive compared to today's CPUs. What is interesting is that we can build multiple processing units in one FPGA that work in parallel. I don't know the FPGA limits or how many "co-processors" we can fit on it, because that depends on the complexity of our project and of the FPGA itself.
Even if this speed is not so impressive, it would be more than anything I can personally do in C++, for now.
That's why I don't believe it's a good idea to do alpha-beta in hardware.
On the other hand, eval() takes about 1/3 of the time in a chess engine, for example, and it's very simple to parallelize. It should be possible to do a pretty complicated eval() in a few clock cycles.
With search, it's possible to search multiple children at the same time by mapping them into duplicated hardware, but then alpha-beta efficiency decreases, etc.
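To illustrate why searching several children at once costs alpha-beta efficiency, here is a rough Python sketch, not a hardware design. It assumes a random toy tree and a hypothetical `width` parameter for the number of children searched per batch. Every child in a batch gets the same (alpha, beta) window, as duplicated hardware units starting together would, so bounds found by a sibling arrive too late to prune inside the batch.

```python
import random

def build_tree(depth, branching, rng):
    """Return a nested-list game tree with random integer leaf scores."""
    if depth == 0:
        return rng.randint(-100, 100)
    return [build_tree(depth - 1, branching, rng) for _ in range(branching)]

def alphabeta(node, alpha, beta, width, counter):
    """Fail-hard negamax alpha-beta searching `width` children per batch.

    width=1 is the ordinary sequential algorithm. counter[0] accumulates
    the number of nodes visited.
    """
    counter[0] += 1
    if not isinstance(node, list):
        return node  # leaf score from the side to move's point of view
    for start in range(0, len(node), width):
        batch = node[start:start + width]
        # Every child in the batch sees the window as it was before the batch.
        scores = [-alphabeta(child, -beta, -alpha, width, counter)
                  for child in batch]
        best = max(scores)
        if best >= beta:
            return beta  # cutoff, but only after the whole batch has run
        alpha = max(alpha, best)
    return alpha

def search(tree, width):
    """Search the whole tree and return (root value, nodes visited)."""
    counter = [0]
    value = alphabeta(tree, float("-inf"), float("inf"), width, counter)
    return value, counter[0]

tree = build_tree(depth=6, branching=4, rng=random.Random(1))
seq_value, seq_nodes = search(tree, width=1)
par_value, par_nodes = search(tree, width=4)
print("sequential:", seq_value, seq_nodes)
print("batched:   ", par_value, par_nodes)
```

Both runs return the same root value, but the batched run never visits fewer nodes than the sequential one. Whether the extra work is paid back in wall-clock time depends on how many units actually fit on the device.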
The clock/oscillator you have on the board actually has little to do with how fast you can clock your design. All modern FPGAs have integrated PLL circuits that allow you to generate a very wide range of clocks from a fixed input clock.
The actual maximum speed you can achieve with your design depends on design complexity (especially number of layers of logic between flip flops), FPGA speed grade, FPGA architecture, etc.
Most careful designs on low cost FPGAs can go to 200 MHz or so. At 250 and above you have to be extremely careful, making sure EVERYTHING is pipelined, etc.
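As a back-of-the-envelope check, the quoted 5 to 6 cycles per node can be combined with the clock rates mentioned above. The function name and the `units` parameter (number of parallel search units) are made up for illustration, and perfect scaling across units is an optimistic assumption.

```python
def nodes_per_second(clock_hz, cycles_per_node, units=1):
    """Idealized throughput: one node every `cycles_per_node` clocks per unit."""
    return units * clock_hz / cycles_per_node

for clock_mhz in (100, 200, 250):
    for cycles in (5, 6):
        nps = nodes_per_second(clock_mhz * 1e6, cycles)
        print(f"{clock_mhz} MHz, {cycles} cycles/node: {nps / 1e6:.1f} Mnps")
```

At 100 MHz and 5 cycles per node this reproduces the 20M nodes per second figure from the quote; a design that closes timing at 200 MHz would double it.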
Even 20 Mnps would still be quite a feat, I would say, considering that a low-cost FPGA draws about 0.5 W, while a CPU searching 20 Mnps would draw on the order of 100 W.
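To put the power comparison in numbers, here is a small sketch using only the rough figures from this post (20 Mnps, about 0.5 W for the FPGA, on the order of 100 W for the CPU). These are order-of-magnitude estimates, not measurements, and the function name is made up for illustration.

```python
def nodes_per_joule(nodes_per_second, watts):
    """Energy efficiency: nodes searched per joule of energy consumed."""
    return nodes_per_second / watts

fpga = nodes_per_joule(20e6, 0.5)    # rough low-cost FPGA estimate
cpu = nodes_per_joule(20e6, 100.0)   # rough CPU estimate
print(f"FPGA: {fpga:.0f} nodes/J, CPU: {cpu:.0f} nodes/J, ratio: {fpga / cpu:.0f}x")
```

With these assumptions the FPGA comes out roughly 200 times more energy-efficient per node.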
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.