Discussion of Special Purpose hardware for chess search

Discussion of chess software programming and technical issues.

Moderator: Ras

User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Search in hardware

Post by sje »

Deep Blue, just like Belle, had search implemented in hardware. Just like Belle, DB handed off terminal subtrees to the special purpose hardware that had enough smarts to run an N ply search plus a basic quiescence search.
Gerd Isenberg
Posts: 2251
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Discussion of Special Purpose hardware for chess search

Post by Gerd Isenberg »

Zach Wegner wrote: That's pretty cool, I wish I had the time, knowledge, and resources to build an analog. That's my other hobby, you know, playing sythesizers. I guess it is monophonic? You'd need some pretty complex electronics to do a polyphonic, and with 3 voices, what's the point really...

Of course, the all-time best synthesizer solo is Don Preston's Moog solo on Waka/Jawaka. I'm sure you're familiar. I love cranking that one up on vinyl.

And to keep this running on chess, for each search it could play the PV as a melody, each note being the board position as you describe (we could call them "Godel waves"). Sounds pretty sweet... Now that I think about it, I could probably do this in software pretty easily. Maybe a little plugin-option for ZCT? ;)
Not that hard to re-build a Formant from a dutch/german electronic journal Elektor from the 70ties:
http://www.synthmuseum.com/elektor/eleform01.html
http://www.analoghell.com/studio/forsale/formant/

In the sixties they also had early construction manuals of a 4-bit ttl-based computer with some leds and switches for i/o. Later an 8088 based chess computer. More recently a chess computer with a 8051 based flash-board:
http://www.elektor.de/jahrgang/2003/feb ... 2049.lynkx
(german)
rfadden

Re: Discussion of Special Purpose hardware for chess search

Post by rfadden »

Zach Wegner wrote:
Dann Corbit wrote:[snip reference]
Hmm, not quite. That seems mainly about their parallel search, still software based. I want to say that they had some basic search logic in the chips, so that the last N plies are entirely hardware. Maybe this isn't the case, but I think that would probably be the most interesting application of special hardware. What's the use of having move generation and eval in hardware, when you need to call it every time from software and just wait for the result? For real efficiency you need to be able to just set the hardware in motion and let it search without needing software.
Umm well... as I said, I'm a hardware architect and "For real efficiency" you always work toward a balance between hardware and software, between the custom logic and the programmable processor that must be part of the system.

When you say "why develop hardware that you just have to wait for" this is not what I described.

If you have custom Eval logic and it takes a clock cycle per position to give you a result, and if the hardware operates as a pure pipeline with a certain number of clocks of transport delay through the pipelines, then you use the following software technique to feed the pipeline and to use the results of the pipeline.

Your engine controls search and it reaches the leaves of the tree, feeding the remainder of the problem to the custom logic by placing the current position into a FIFO (with the hardware consuming this FIFO). The software can search multiple branches of the tree and as a position is placed in the FIFO, search from the current Leaf is suspended until the answer which eventually comes out of the pipeline is available. At that time search from that Leaf proceeds where it has been temporarily suspended, in the intervening time the software has been fully busy with it's other such parallel searching.

This is a typical technique for feeding a fast pipeline and for using the results that come out at "one result per clock" at the other end of the pipeline.

Also when you refer to "waiting for the hardware", this is not really what we are talking about. The above described hardware is so fast that it is far, far faster than software Eval.

Actually I described this appraoch as possibly at times waiting for the software to feed positions into the input FIFO of the hardware, at one position per clock.

Well anyway, the general purpose processors these days can keep up to that rate and so I'm describing what I consider a "balanced system."

Another thing to keep in mind. When you work in this field of special purpose hardware you definately do not want to take a large piece of mostly sequential steps with complex interactions and conditionals, and then just attempt to place this in a custom chip. First, mostly sequential logic is implemented inside the chip by building a state machine that is driven by RAM, and note there are some really strong reasons to leave this logic back in the General Purpose computer.

It is not a good idea to deliberately put complex sequential logic into a custom chip. You end up building small custom programmable computers inside the chips (state machine engines) and these units tend to be where you may have nasty bugs hiding in the completed chip. Instead use the reliable and fully debugged general purpose processors, and there when there is a problem you can change the software.

The logic within an FPGA implementation can be changed, like software, but if a custom chip is developed these can not be changed without going through a lengthy and very expensive process.

It is very natural to put pipelined computation into hardware and leave complex control logic that implies sequential stepping - leave that out of the chip.

------------

To proceed in this discussion we need to ask the question: where is this supposed process of "Waiting?"

1. Is the pipelined Eval waiting due to the software's inability to keep up with supplying one position per hardware clock cycle?

or

2. Is the General Purpose computer (or set of 8 or 16 processors) waiting for the hardware due to it's ability to generate more than one position per clock cycle?

Keep in mind that I can buy an fast general purpose x86 processor for $80.

We always strive to create a balanced system, and the bulk of complex (mostly sequential) control logic should be left in the General Purpose computer.

Intense math and what may be thought of as parallel evaluation of formulas is what most naturally would be placed in custom logic.

Essentially the "Innermost Loop" goes in to custom logic. You do not place the entire vast chess search and evaluation system into custom logic.

How may years would you be willing to wait for this task to be completed? By the time this custom logic design is done the algorithm would need to change to reflect advancements in chess search.

One learns from experience to leave the vast bulk of complex sequential control logic in the fast general purpose computers.

--------------

Here's another angle: try the experiment of running an existing chess engine with a "Zero Time Eval" (I already described how to do this)...

Then ask if this engine is as fast as you would like. If not then consider multipe chips where separate search engines in software each use an FPGA or a custom chip in this fashion.

Most likely this approach will include a good initial balance between hardware and software.
Tony

Re: Discussion of Special Purpose hardware for chess search

Post by Tony »

Zach Wegner wrote:
Dann Corbit wrote:[snip reference]
Hmm, not quite. That seems mainly about their parallel search, still software based. I want to say that they had some basic search logic in the chips, so that the last N plies are entirely hardware. Maybe this isn't the case, but I think that would probably be the most interesting application of special hardware. What's the use of having move generation and eval in hardware, when you need to call it every time from software and just wait for the result? For real efficiency you need to be able to just set the hardware in motion and let it search without needing software.
You're correct. Simply put, Deep Blue aplha beta software search would fire off 6 ply hardware searches (including quiescence) where we would call quiescence.

Tony
User avatar
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: Discussion of Special Purpose hardware for chess search

Post by Zach Wegner »

rfadden wrote:Umm well... as I said, I'm a hardware architect and "For real efficiency" you always work toward a balance between hardware and software, between the custom logic and the programmable processor that must be part of the system.

When you say "why develop hardware that you just have to wait for" this is not what I described.
I knew someone would say this...

What I mean is that the way it works now, software basically goes to a position, sends a position to the hardware, and then waits for the result. You can't do much else in between. If you had something to do in the meantime, sure, go ahead. Balance is of course optimal. But if you consider that the main bottleneck here is the communication between HW and SW, you should minimize the dependence between HW and SW. SW takes a variable amount of time to find a node to evaluate, so it's not always best to have HW wait for SW to pass it a node. If HW can just churn along on it's own, then that's probably the best way to do it.

I know that I'm just a lowly 20 year old that has never designed any hardware, but I think I have a good enough idea of the theory behind it.

Here's another idea that might be worthwhile though:
To maximize the "pipeline", the SW could shoot off a bunch of qsearches at horizon level after searching the first N moves. This is analogous to a split point in SMP: you're assuming that none of the qsearches will fail high, so you're basically parallelizing them. You could even have a "branch misprediction", where the pipeline is cleared if a position gets a cutoff.

Does this make sense to anyone? It's getting pretty interesting to me at least...
rfadden

Re: Discussion of Special Purpose hardware for chess search

Post by rfadden »

Zach Wegner wrote:
rfadden wrote:Umm well... as I said, I'm a hardware architect and "For real efficiency" you always work toward a balance between hardware and software, between the custom logic and the programmable processor that must be part of the system.

When you say "why develop hardware that you just have to wait for" this is not what I described.
I knew someone would say this...

What I mean is that the way it works now, software basically goes to a position, sends a position to the hardware, and then waits for the result. You can't do much else in between. If you had something to do in the meantime, sure, go ahead. Balance is of course optimal. But if you consider that the main bottleneck here is the communication between HW and SW, you should minimize the dependence between HW and SW. SW takes a variable amount of time to find a node to evaluate, so it's not always best to have HW wait for SW to pass it a node. If HW can just churn along on it's own, then that's probably the best way to do it.

I know that I'm just a lowly 20 year old that has never designed any hardware, but I think I have a good enough idea of the theory behind it.

Here's another idea that might be worthwhile though:
To maximize the "pipeline", the SW could shoot off a bunch of qsearches at horizon level after searching the first N moves. This is analogous to a split point in SMP: you're assuming that none of the qsearches will fail high, so you're basically parallelizing them. You could even have a "branch misprediction", where the pipeline is cleared if a position gets a cutoff.

Does this make sense to anyone? It's getting pretty interesting to me at least...
Zach your ideas are good.

I did explain that with further software/searching technique the software doesn't have to wait for the hardware. I described a technique that we use in current hardware.

You feed a problem to the hardware by placing the Info in a FIFO that is consumed "automatically" by the hardware, and then in software you continue to prepeare the next problem.

In Chess and in many other problem spaces you actually want the result from the hardware *before* you make further decisions, but what you do in practice is you implement a compromise where you proceed without the answer and you make the best use of the software processing time to accomplish "overlap" between software computation and hardware computation.

Zach I think you and I are both talking about the same kind of solution, so as I said, your ideas are good.

The nice thing about being 20 is the creativity, man!

I have tried to hang on to that creativity as best as I can, and you've got it man! You've got the original thing...


You know... these discussions about "overlap" occur among designers as we are working on the architecture of proposed hardware/software. It's a typical subject in this field of Engineering. It is very natural to discus various alternatives.

Thanks.