FPGA chess

matthewlai · Post by **matthewlai** » Wed Nov 26, 2014 7:30 pm

Just wondering if anyone is interested in creating custom hardware for chess.

Way back in the days of Deep Blue, there was a lot of custom hardware, but custom hardware at that time cost a lot of money.

Nowadays, thanks to FPGAs (can be thought of as "blank" chips that can be configured into different hardware designs), custom hardware is well within the budget of most hobbyists (a cheap FPGA board is less than $100).

A typical architecture is to synthesize a general purpose CPU to do most things, and modify the CPU to include custom instructions to accelerate inner loops.

For example, for chess, there could be an instruction for generating all moves for a piece, an instruction that applies or unapplies moves, or even an instruction that does eval().

The limiting factor would be chip space, and allocation of that space would be interesting. For example, given enough space, it's probably possible to do eval() in a few clock cycles thanks to how parallelizable most eval() functions are.
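To make the idea concrete, here is a minimal Python sketch (not from the original post) of two operations that are pure functions of their inputs, and so are the kind of thing a custom instruction could compute as combinational logic. The bitboard layout (bit 0 = a1, bit 63 = h8), the function names, and the piece values are all assumptions chosen for illustration.

```python
# Assumed layout: one bit per square, bit 0 = a1, bit 7 = h1, bit 63 = h8.
FULL = 0xFFFFFFFFFFFFFFFF
NOT_A = 0xFEFEFEFEFEFEFEFE   # every square except the a-file
NOT_AB = 0xFCFCFCFCFCFCFCFC  # every square except the a- and b-files
NOT_H = 0x7F7F7F7F7F7F7F7F   # every square except the h-file
NOT_GH = 0x3F3F3F3F3F3F3F3F  # every square except the g- and h-files


def knight_targets(square: int) -> int:
    """Return a bitboard of all squares a knight on `square` attacks.

    Each of the eight terms is a fixed shift plus a mask that discards
    moves wrapping around the board edge, so the terms are independent
    of one another and could be wired in parallel.
    """
    b = 1 << square
    targets = (
        ((b << 17) & NOT_A)     # up 2, right 1
        | ((b << 15) & NOT_H)   # up 2, left 1
        | ((b << 10) & NOT_AB)  # up 1, right 2
        | ((b << 6) & NOT_GH)   # up 1, left 2
        | ((b >> 17) & NOT_H)   # down 2, left 1
        | ((b >> 15) & NOT_A)   # down 2, right 1
        | ((b >> 10) & NOT_GH)  # down 1, left 2
        | ((b >> 6) & NOT_AB)   # down 1, right 2
    )
    return targets & FULL


# Hypothetical centipawn values, for illustration only.
PIECE_VALUES = {"P": 100, "N": 320, "B": 330, "R": 500, "Q": 900}


def material(white: dict, black: dict) -> int:
    """Material balance from White's point of view.

    `white` and `black` map piece letters to bitboards. Every term is a
    popcount times a constant, independent of the others, so hardware
    could evaluate all terms at once and sum them with an adder tree.
    """
    return sum(
        value * (bin(white[p]).count("1") - bin(black[p]).count("1"))
        for p, value in PIECE_VALUES.items()
    )
```

A real eval() has many more terms than material, but the same property (independent terms combined at the end) is what would make it fit in a few clock cycles given enough chip space.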

It's something I really want to look into, coming from an electrical engineering background, but I'm not sure if many people would be interested, and have the required hardware design knowledge.

Has anyone been working on something similar?

Custom/hybrid hardware is not prevalent in consumer computers yet, but Intel is developing CPUs with included FPGA fabric, so it will probably become a normal part of a CPU in 5-10 years' time.

Hybrid hardware is already very popular in the embedded space (e.g. Xilinx Zynq), where the CPU is usually on fixed hardware (instead of synthesized on the custom fabric) for cost and performance reasons (fixed hardware can be clocked faster).

Joost Buijs · Post by **Joost Buijs** » Wed Nov 26, 2014 7:47 pm

matthewlai wrote:Just wondering if anyone is interested in creating custom hardware for chess.

Way back in the days of Deep Blue, there was a lot of custom hardware, but custom hardware at that time cost a lot of money.

Nowadays, thanks to FPGAs (can be thought of as "blank" chips that can be configured into different hardware designs), custom hardware is well within the budget of most hobbyists (a cheap FPGA board is less than $100).

A typical architecture is to synthesize a general purpose CPU to do most things, and modify the CPU to include custom instructions to accelerate inner loops.

For example, for chess, there could be an instruction for generating all moves for a piece, an instruction that applies or unapplies moves, or even an instruction that does eval().

The limiting factor would be chip space, and allocation of that space would be interesting. For example, given enough space, it's probably possible to do eval() in a few clock cycles thanks to how parallelizable most eval() functions are.

It's something I really want to look into, coming from an electrical engineering background, but I'm not sure if many people would be interested, and have the required hardware design knowledge.

Has anyone been working on something similar?

Custom/hybrid hardware is not prevalent in consumer computers yet, but Intel is developing CPUs with included FPGA fabric, so it will probably become a normal part of a CPU in 5-10 years' time.

Hybrid hardware is already very popular in the embedded space (e.g. Xilinx Zynq), where the CPU is usually on fixed hardware (instead of synthesized on the custom fabric) for cost and performance reasons (fixed hardware can be clocked faster).

Chrilly Donninger already did this about 8 years ago with his 'Chess Machine Hydra'.
He more or less used the design from 'Belle' and programmed that in FPGA.
At a certain point he had 64 of these cards running and did about 150 Mnps.
Unfortunately he was easily beaten by Rybka and I assume that is why he abandoned the whole project.

BeyondCritics · Post by **BeyondCritics** » Thu Nov 27, 2014 3:35 am

Hi Matthew,

You said a cheap FPGA is less than $100. Could you use such a platform only for a mere demonstration project, or is it possible to house serious applications? I have no idea about that. Which board do you recommend?

Maybe I will buy myself one of the new Epiphany cell platforms:
http://www.parallella.org/parallella-models/
The price is around $100. You get your hands on cell programming and FPGA programming at the same time. What do you think about this?

stegemma · Post by **stegemma** » Thu Nov 27, 2014 12:31 pm

I'm interested. I talked about this some days ago with a friend, and then I searched for some information on the net. I found this interesting document:

http://numato.com/learning-fpga-and-ver ... troduction

I'm sorry that I'm not an electronics expert, but I have liked gate logic programming since I was young. When I was 15, I designed a circuit using only logic gates that would play chess (a single move, not a whole game). I never built the board back then, but now, with FPGAs, it would be possible for almost anyone to get some working result.

I think that the right approach is not to build a "CPU" with some chess instructions but a full "chess machine", where all the logic is embedded in the circuit itself. Only that way, IMHO, can you compete with modern general-purpose CPUs. This kind of circuit will be very complex, because you have to think in a very parallel paradigm while still implementing some kind of alpha-beta in hardware.

I should admit that I would need months just to find the time to start experimenting with FPGAs... but this is one of my "to do before I die" items.

matthewlai · Post by **matthewlai** » Thu Nov 27, 2014 2:40 pm

Joost Buijs wrote: Chrilly Donninger already did this about 8 years ago with his 'Chess Machine Hydra'.
He more or less used the design from 'Belle' and programmed that in FPGA.
At a certain point he had 64 of these cards running and did about 150 Mnps.
Unfortunately he was easily beaten by Rybka and I assume that is why he abandoned the whole project.

Ah, that's slightly different, I think. They used add-on cards as coprocessors, which require a much coarser level of task division due to bus latency.

For example, it would not be possible in that case to use FPGAs for eval only, because even if eval() is instantaneous on the FPGA, the round trip between the CPU and the FPGA would take orders of magnitude longer than just doing the eval on the CPU.

With a more tightly coupled approach (CPU and FPGA fabric on the same chip), it's possible to do task division on the instruction level with no latency penalty.
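The latency argument above can be written as a one-line inequality: offloading pays off only if the round trip plus the hardware eval time is shorter than the software eval. The sketch below is a minimal Python model of that; the function names and every number passed to it are placeholders for illustration, not measurements of any real bus or chip.

```python
def offload_time_ns(round_trip_ns: float, fpga_eval_ns: float) -> float:
    """Time for one offloaded eval: send position, compute, read score."""
    return round_trip_ns + fpga_eval_ns


def offload_wins(cpu_eval_ns: float, round_trip_ns: float,
                 fpga_eval_ns: float = 0.0) -> bool:
    """True if offloading a single eval beats doing it on the CPU."""
    return offload_time_ns(round_trip_ns, fpga_eval_ns) < cpu_eval_ns


# Placeholder numbers only (assumptions, not measurements):
# a software eval of 500 ns, an add-on card with a 2000 ns round trip,
# and an on-chip custom instruction with a 5 ns round trip.
print(offload_wins(cpu_eval_ns=500, round_trip_ns=2000))  # add-on card
print(offload_wins(cpu_eval_ns=500, round_trip_ns=5, fpga_eval_ns=20))  # on-chip
```

With a round trip longer than the software eval, the first case loses even when `fpga_eval_ns` is zero, which is the "instantaneous eval" scenario described above.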

matthewlai · Post by **matthewlai** » Thu Nov 27, 2014 3:25 pm

BeyondCritics wrote:Hi Matthew,

You said a cheap FPGA is less than $100. Could you use such a platform only for a mere demonstration project, or is it possible to house serious applications? I have no idea about that. Which board do you recommend?

Maybe I will buy myself one of the new Epiphany cell platforms:
http://www.parallella.org/parallella-models/
The price is around $100. You get your hands on cell programming and FPGA programming at the same time. What do you think about this?

FPGAs are certainly used in serious commercial applications, though those products don't usually use off-the-shelf boards. Companies integrate the actual FPGA chip into whatever product they are making. FPGA chips themselves cost anywhere from $5 to a few hundred dollars, depending on the size and speed required.

They are used in products that require the speed or power efficiency of custom hardware, but don't have the volume to warrant custom ICs (ASIC, application-specific integrated circuits).

Off-the-shelf FPGA boards are usually only used in very early development (performance evaluation, choosing an appropriate chip, etc), or for extremely low volume (<10) products. FPGA companies often give evaluation boards to their customers (bigger companies) free of charge. They are good for hobbyists too of course, but unfortunately we usually have to pay for them.

Usually the cost breakdown is like this:
FPGA - $10 per unit + $100 initial
ASIC - $1 per unit + $1M initial

So at 100,000 units or so, ASICs become more cost effective.
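As a quick check of that break-even point, here is a small Python sketch using only the figures quoted above ($10 per unit + $100 initial for the FPGA, $1 per unit + $1M initial for the ASIC). The function names are made up for this example.

```python
def total_cost(units: int, per_unit: int, initial: int) -> int:
    """Total cost in dollars for a production run of `units` chips."""
    return per_unit * units + initial


def break_even_units(fpga_unit: int, fpga_initial: int,
                     asic_unit: int, asic_initial: int) -> int:
    """Smallest volume at which the ASIC is no more expensive than the FPGA.

    Solves fpga_unit*n + fpga_initial >= asic_unit*n + asic_initial for n,
    rounding up with integer arithmetic.
    """
    extra_initial = asic_initial - fpga_initial
    saving_per_unit = fpga_unit - asic_unit
    return -(-extra_initial // saving_per_unit)  # ceiling division


n = break_even_units(10, 100, 1, 1_000_000)
print(n)  # 111100
print(total_cost(n, 10, 100), total_cost(n, 1, 1_000_000))  # 1111100 1111100
```

So with these round numbers the crossover is at about 111,000 units, in line with the "100,000 units or so" estimate.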

ASICs are also much riskier for the company, because mistakes are very hard (costly) to fix, and small chip design mistakes have actually bankrupted smaller companies. On FPGAs, if you make a mistake, you just have to update the FPGA configuration (almost like a firmware update).

Even very high volume companies usually do prototyping and development on FPGAs, before committing the final design into ASICs for mass production.

The board you linked looks pretty good. Another popular option is the ZYBO from Digilent (http://www.digilentinc.com/Products/Det ... &Prod=ZYBO). You don't get the coprocessor, but you get more IO options, I believe.

matthewlai · Post by **matthewlai** » Thu Nov 27, 2014 3:47 pm

stegemma wrote:I'm interested. I talked about this some days ago with a friend, and then I searched for some information on the net. I found this interesting document:

http://numato.com/learning-fpga-and-ver ... troduction

I'm sorry that I'm not an electronics expert, but I have liked gate logic programming since I was young. When I was 15, I designed a circuit using only logic gates that would play chess (a single move, not a whole game). I never built the board back then, but now, with FPGAs, it would be possible for almost anyone to get some working result.

I think that the right approach is not to build a "CPU" with some chess instructions but a full "chess machine", where all the logic is embedded in the circuit itself. Only that way, IMHO, can you compete with modern general-purpose CPUs. This kind of circuit will be very complex, because you have to think in a very parallel paradigm while still implementing some kind of alpha-beta in hardware.

I should admit that I would need months just to find the time to start experimenting with FPGAs... but this is one of my "to do before I die" items.

Yeah, logic gates are fun. HDL is definitely the way to go for larger-scale projects, though. I am a VHDL person, just because that's what's taught at my university.

It's very hard to do everything in hardware, because of data dependency issues. It's the same issue that stops parallel searches from scaling linearly.

One method is to hardcode the last few plies in hardware, and I believe that's the approach taken by a few chess computers.

Some things, like IO, are always better done on a CPU, so a CPU will probably still be needed in any event. Very few real-world FPGA designs have no CPU at all (either as an actual chip, or synthesized on the FPGA).

So it's just a question of what goes into software and what goes into hardware. Alpha-beta, I feel, would be better done in software because of the flexibility required, which both increases chip area usage and limits the possible speedup.

For example, one possible implementation is to basically map the search tree into hardware. It's fast, but it takes up FPGA space proportional to the number of nodes searched, so thanks to the branching factor, it can't be done for a significant number of plies (that's why previous chess computers limited it to 3 or so). Quiescence search would also be difficult because the tree shape is unpredictable.
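To give a feel for how fast that space requirement grows, here is a short Python sketch that counts the nodes in a full-width tree. The branching factor of 35 is the commonly quoted rough average for chess and is an assumption here, as is the function name; real positions vary widely.

```python
def full_tree_nodes(branching: int, plies: int) -> int:
    """Nodes in a uniform tree of the given depth, root included.

    This is the geometric series 1 + b + b^2 + ... + b^plies.
    """
    return sum(branching ** depth for depth in range(plies + 1))


# Assumed average branching factor of 35 (illustrative only).
for plies in range(1, 6):
    print(plies, full_tree_nodes(35, plies))
```

Under that assumption, 3 plies need about 44 thousand node circuits, and each additional ply multiplies the count by roughly 35.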

In fact, alpha-beta would not be needed if we are searching all nodes at the same time. It would just be negamax.
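A minimal Python sketch of that point, on a hypothetical toy tree: plain negamax looks at every leaf, while alpha-beta returns the same root value after skipping some of them. In hardware that evaluates all nodes simultaneously, the skipping saves nothing, which is why negamax alone would be enough. The tree encoding (nested lists with integer leaf scores from the side to move's perspective) is an assumption made for this example.

```python
INF = float("inf")


def negamax(node):
    """Full-width negamax: every leaf is visited, no pruning."""
    if isinstance(node, int):
        return node  # leaf score, from the side to move's perspective
    return max(-negamax(child) for child in node)


def alphabeta(node, alpha=-INF, beta=INF, visited=None):
    """Fail-soft alpha-beta in negamax form; records visited leaves."""
    if isinstance(node, int):
        if visited is not None:
            visited.append(node)
        return node
    best = -INF
    for child in node:
        score = -alphabeta(child, -beta, -alpha, visited)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # remaining siblings cannot change the result
    return best


# Hypothetical two-ply tree with four leaves.
tree = [[3, -2], [-4, 1]]
leaves_seen = []
print(negamax(tree))                          # -2
print(alphabeta(tree, visited=leaves_seen))   # -2
print(leaves_seen)                            # [3, -2, -4]
```

Alpha-beta skips the last leaf here, which matters for a sequential search but not for a circuit that has already spent the area on every node.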

Chip size requirement can be reduced by doing nodes in batches, but beyond some point, we won't be faster than a CPU anymore.

A CPU-FPGA hybrid design is always about space-speed tradeoffs, so it's a matter of finding the right things to accelerate to get the maximum total speed. I am not really convinced offloading alpha-beta is a good choice, but much smarter people than me have disagreed.

jdart · Post by **jdart** » Thu Nov 27, 2014 4:06 pm

Don't forget Hitech, implemented in the 1980s: http://www.amazon.com/All-Right-Moves-A ... 0262050358. It was a custom chip, not an FPGA. It claimed some improvements over Belle.

--Jon

bob · Post by **bob** » Thu Nov 27, 2014 5:23 pm

jdart wrote:Don't forget Hitech, implemented in the 1980s: http://www.amazon.com/All-Right-Moves-A ... 0262050358. It was a custom chip, not an FPGA. It claimed some improvements over Belle.

--Jon

It really doesn't fit here. (1) it used one chip per square (All The Right Moves, Ebeling's dissertation) but did nothing different than Belle in that regard; (2) it had hardware feature recognizers that they added one at a time, which were basically evaluation terms but they were 100% independent of each other with no "feed-over" so that one term could affect another.

Hsu's original comment was "It is the wrong way to go as it is not scalable at all and can't use multiple CPUs, period."

There were multiple hardware implementations over the years, starting with the early Belles that had hardware move generation and evaluation, but not search; the 1980 Belle that did everything in PLAs and hit 150K+ nodes per second; BeBe (BB, black box) by Scherzer, which was another hardware implementation but never approached Belle's strength (it peaked at maybe 30K nodes per second). Then came Hitech (roughly Belle with an improved evaluation) and Deep Thought (Belle on a chip with a much-improved hardware evaluation added in). The most recent was Hydra, which was strong, but not as strong as the best SMP engines. I think hardware speeds, along with the number of available cores, have reached a point where FPGA is not worth the effort and hassle.

Milos · Post by **Milos** » Thu Nov 27, 2014 5:25 pm

matthewlai wrote:Just wondering if anyone is interested in creating custom hardware for chess.

Way back in the days of Deep Blue, there was a lot of custom hardware, but custom hardware at that time cost a lot of money.

Nowadays, thanks to FPGAs (can be thought of as "blank" chips that can be configured into different hardware designs), custom hardware is well within the budget of most hobbyists (a cheap FPGA board is less than $100).

A typical architecture is to synthesize a general purpose CPU to do most things, and modify the CPU to include custom instructions to accelerate inner loops.

For example, for chess, there could be an instruction for generating all moves for a piece, an instruction that applies or unapplies moves, or even an instruction that does eval().

The limiting factor would be chip space, and allocation of that space would be interesting. For example, given enough space, it's probably possible to do eval() in a few clock cycles thanks to how parallelizable most eval() functions are.

It's something I really want to look into, coming from an electrical engineering background, but I'm not sure if many people would be interested, and have the required hardware design knowledge.

Has anyone been working on something similar?

Custom/hybrid hardware is not prevalent in consumer computers yet, but Intel is developing CPUs with included FPGA fabric, so it will probably become a normal part of a CPU in 5-10 years' time.

Hybrid hardware is already very popular in the embedded space (e.g. Xilinx Zynq), where the CPU is usually on fixed hardware (instead of synthesized on the custom fabric) for cost and performance reasons (fixed hardware can be clocked faster).

Before Intel comes out with a chip, there will be no cheap solutions. The only existing high-performance solution today (and it is very recent) is IBM's POWER8 with CAPI.
Zynq boards are great tools for prototyping, but they have 3 big disadvantages for chess:
1) they are relatively expensive (~$1000)
2) the CPUs are really not powerful (dual-core A9 at 667-866 MHz, low on cache, optimized for floating point)
3) maximum DDR3 memory is limited to 1 GB
