The Gigatron project

Ras · Post by **Ras** » Tue Dec 05, 2017 10:04 pm

Wow, this is really awesome!

hgm · Post by **hgm** » Tue Dec 05, 2017 10:45 pm

To get back to the Chess program:

Road Map

The eventual goal would be to have something that uses an alpha-beta search with null-move pruning, LMR and IID, using staged move generation, with MVV/LVA sorting for the captures, and killer heuristic for non-captures. There is not enough RAM for a meaningful transposition table, but a poor-man's substitute could be to have each node hold TT entries for all its daughter nodes. And perhaps keep this info around for the PV in a tri-angular array. First goal would be to just have the alpha-beta search with move ordering, and add NMP, LMR, IID and killer later.

For evaluation I would want to include material/PST, mobility, passed-Pawn evaluation, discounting of drawish end-games, King safety. The first version would only have material/PST.

The game state will be held as mailbox board, piece list, neighbor table and attack map, mobility list, and Pawn passage. This will be incrementally updated. This game state includes a lot of redundancy. Update of the board and piece list is trivial, but that of the other elements is complex and involved. So MakeMove() will be slow. OTOH, move generation from this data will be fast, and the moves can be generated in the desired order. Captures can be generated in MVV order by running through the opponent piece list, and examining the attack map at their location for attackers.

Fulvio · Post by **Fulvio** » Wed Dec 06, 2017 9:31 am

hgm wrote:Marcel van Kervinck has developed this wonderful retro computer, the Gigatron:

Cool!!

mar · Post by **mar** » Wed Dec 06, 2017 12:42 pm

This is simply mindblowing...

I wonder about the instruction set.

Do I understand correctly that it only has a handful of ops (add, sub, and, or, xor), 3 regs (accumulator, X, Y) with fancy addressing modes and conditional branches compare accumulator to zero without flags?

hgm · Post by **hgm** » Wed Dec 06, 2017 1:15 pm

Indeed, that is about it. But that doesn't make it very different from the celebrated 6502. (The lack of condition is a bit of a downer, though.) And because of the Harvard architecture it tends to outperform the latter. Even though there is no 16-bit addressing mode, you can load the high address in Y, and use it as a page register. Then the access takes two clock cycles, but on 6502 absolute addressing took 4 clocks cycles: fetch of opcode and both address bytes, and then the access. Even when Y needs to be preserved you can afford to save a copy of it in zero-page memory, and load Y from it again after use as page register, ad you are still not off any worse. But in many cases, Y doesn't need to be preserved, so you are 2 cycles faster. And in other cases, Y has to be restored only after a bunch of uses as page register, or to the same value after a number of such bunches (so you can use the same stored value).

I have to get used a bit to the absence of a real indexed mode, however. It took me some time to discover that the best way to allocate global tables is 'vertically', i.e. not as contiguous memory, but one byte in each memory page (at the same offset), so that it can be indexed by the high byte of the address, through the Y register.

hgm · Post by **hgm** » Wed Dec 06, 2017 3:44 pm

Basic operation

The design I have in mind will use a neighbor table, which for each occupied square will hold the square number of the nearest neighbor in each of the 8 principal directions. This nearest neighbor can be an edge guard, when there is nothing between the square and the board edge. So the table will have to contain 10x10 elements, although for any particular direction the final element along that direction (always just beyond the board edge) will not have a further neighbor.

Then there will be an attack map, for each square holding bit flags to indicate attacks on it. There will actually be 4 such board size tables, for white and black attackers each a set of slider and a set of leaper attacks. The slider attacks will be flagged by direction, 1 bit for each direction of movement. The leaper attacks will be flagged by piece for King and Knights, (3 bits, perhaps 4 if we want to allow for a 3rd Knight), but by direction for Pawns (4 bits, one for each diagonal direction).

For the update of the attack map it will be handy to also record attacks on the edge guards, so that these can be treated exactly the same as on-board attacks. This means the slider attack maps will have to be 10x10. For leapers it would also be inconvenient having to test all the time whether a potential attack goes off-board or not, and the map will even have to be 12x12 to make sure it catches all Knight jumps. Leaper attacks will be valid for any square; we will just apply or remove them to the map whenever the leaper resposible for them gets moved or captured. Slider attacks will only be faithfully recorded on occupied squares, to limit the cost of updating (as sliders can attack very many unoccupied squares).

The piece list will be set up as a doubly-linked list, so that captured pieces can be easily removed from them. (And then re-inserted on UnMake().) The neighbor tables are in fact also a collection of doubly-linked lists, one for each ray, so that each occupied square is aware of its two neighbors on the ray through its forward and backward links. Evacuating a square will just remove if from the lists for all 4 rays it is on, and re-occupying it during UnMake will re-insert it, based on its old neighbors, which it still remembers.

Generating captures will be done by running through the piece list, which is kept in order of decreasing piece value. The piece numbers produce the piece locations, the locations produce the set of attacks on it, all through lookup in the appropriate table. The attacks will be extracted from the set also by table lookup. For leapers this would result in a piece number for the attacker (K or N), which finally is used to look up the location of the attacker. This leaves all 4 descriptive elements of the move (fromSqr, toSqr, piece and victim) known, so the capture can be made and searched. A Pawn would result in a recognizably invalid piece number (like negative or beyond the end of the piece table) when its turn comes up to be extracted. In this case the number would be used to index a table that contains the board step of the Pawn capture, which, added to the to-square, would produce the from-square, and finally through lookup on the board the number of the attacking piece (a Pawn).

Code: Select all

victim = nextPiece&#91;victim&#93;; // step through piece list
toSqr = location&#91;victim&#93;;
leaps = leaperAttacks&#91;stm&#93;&#91;toSqr&#93;;
while&#40;leaps&#41; &#123;
  piece = set2leaper&#91;leaps&#93;+stm; // extracts in order Pawns, Knights, King
  leaps = stripBit&#91;leaps&#93;; // clears lowest set bit, for next iteration
  if&#40;piece <= MAXPIECE&#41; &#123;
    fromSqr = location&#91;piece&#93;;
  &#125; else &#123;
    step = pawnSteps&#91;piece - MAXPIECE&#93;; // 4-entry table
    fromSqr = toSqr + step;
    piece = board&#91;toSqr&#93;;
  &#125;
  if&#40;SearchMove&#40;)) goto cutoff;
&#125;
// go on with slider attackers

For sliders the situation is a bit more complex. Because their attacks are kept by direction, we canot be sure they are extracted in LVA order, and it will be necessary to sort the attacks. This will be done by first converting the direction-wise attack set to an attackers set, where each bit corresponds to a particular slider. The latter will then be extracted in the order B,B,R,R,Q. to make the actual captures. So the data flow here also starts from victim to location to sliderAttacks, but then it extracts direction, neighbor, attacker, attackerBit to OR that ito the attackers set. nI the second pass then attackers -> piece -> fromSqr. Again, all through sequential table lookups:

Code: Select all

slides = sliderAttacks&#91;stm&#93;&#91;toSqr&#93;; // we already had determined toSqr and victim
attackers = 0;
while&#40;slides&#41; &#123;
  direction = set2dir&#91;slides&#93;; // actually we tabulate &neighbor&#91;direction&#93;...
  fromSqr = neighbor&#91;direction&#93;&#91;toSqr&#93;; // ... so we can do fromSqr = direction&#91;toSqr&#93; here
  piece = board&#91;fromSqr&#93;;
  attackers |= attBit&#91;piece&#93;;
  slides = stripBit&#91;slides&#93;;
&#125;
while&#40;attackers&#41; &#123;
  piece = set2slider&#91;attackers&#93;+stm;
  fromSqr = location&#91;piece&#93;;
  attackers = stripBit&#91;attackers&#93;;
  if&#40;SearchMove&#40;)) goto cutoff;
&#125;

This generates captures very fast, just 6 or 7 table lookups and an occasional ADD or OR. And most of that is what you would normally do anyway to fetch and/or unpack a move from a move list, to get fromSqr, toSqr, piece and victim, in order to perform a MakeMove(). And you ca stop generating as soon as you reach the futile victims, i.e. if victim > futilityThreshold (which you would determine in advanced based on the high byte of alpha - curEval, through another table lookup).

Non-capture generation could be done in the usual mailbox way, by running through your own piece list, and for each piece through a list of directions, and for each direction along the ray until you hit a non-empty squere (for a slider) or just one step (when it is a leaper). I am not sure if it would have any benefits to use history to order the non-captures; this would definitely cause a lot of complication. Maintaining a list of a dozen or so 'hot moves', and checking if these are possible in the current position, similar to what is done with killers, and then do them before non-capture generation might be faster than putting all non-captures in a move list, assigning them a history score, and sorting the lot.

hgm · Post by **hgm** » Thu Dec 07, 2017 12:05 am

Memory allocation

The Gigatron has only one addressing mode where it combines a constant from the instruction with a register, and in that case the register will deliver the high byte of the address. As we will need to access many tables indexed by square or piece numbers, and it would be a pain to have to load the fixed part of the table address into a register, the best way to do this is to store the pages 'vertically', i.e. let the low address byte determine the table, and the high byte the index. Because there is only 32KB RAM, and not 64KB, the high 'byte' of the address actually is just 7 bits; the 8th bit is ignored. This means that vertically stored tables can contain at most 128 bytes.

Now this is (barely) large enough for a board plus the surrounding band of edge guards. The square number of a square (x,y) (where (0,0) = a1) can be made sqrA1 + 10*y + x. Knight moves made from h8 can end up at (9,8), which gives sqrA1 + 98, while from a1 they hit (-2,-1), which gives sqrA1 - 21. This spans 120 bytes in total, which will fit vertically in memory with only 8 pages to spare. The leaper attack maps for black and white will also have this size. We will allocate such tables in the high bytes of of the memory pages. E.g. the board at addresses $xxFF ($ meaning hexadecimal), with xx = 08...7F, the black attack map at $xxFE, etc.

This means that there are only 8 memory pages that are only 8 memory pages still completely empty; all others will have to suffer some table bytes at their upper end. Tables that would need more than 128 bytes can only be stored horizontally, and tables >= 256 bytes would really be problematic. But fortunately we do not need the latter. We do need some tables of exactly 256 bytes, though. They must be stored horizontally, and only 8 pages are still available to do this.

These 'full-size' tables are a table to extract the directions from a slider attacks set, which can have all bits, and the table containing the index with the lowest bit cleared. Other sets of attackers do not use the upper bits, so bit extraction from those could in principle be done with tables of 128 bytes or smaller. Now perhaps I am exaggerating here, because it seems not possible that slider attacks come from all 8 directions at once. Since Bishops will be on different colors, at most 4 sliders can be aimed at the same square, and the highest code with four 1 bits is $F0. And there is room to be smart here: we can assign the upper 4 bits to diagonal directions, and the lower 4 bits to orthogobal directions. There can be at most 2 diagonal slider attacks on the same square. So the largest possible code is then $CC = 204. This leaves 51 bytes at the top of the page unused. Which should be enough for all vertically stored tables.

Now there is still another matter for which we will need full-size tables, which is related to 16-bit arithmetic for scores. Unfortunately the Gigatron has no condition codes, and cannot detect whether an addition or subtraction produces a carry. This makes multiple-precision arithmetic problematic. One way to solve that problem is to use a 2-byte score format where the lower byte only uses the lowest 7 bits, i.e. stores values ranging from -64 to +63, so that overflow can be detected from the 8th bit being unequal to the 7th bit. In the case of such an overflow 128 would have to be added or subtracted to get the lower byte back in range, and 1 would have to be subtracted or added to the upper byte to compensate that. To make that fast it can be done by lookup tables indexed by the low byte, one table to get the correction for the upper byte (0, 1 or -1), the other the corrected value of the lower byte. This would require two full-size tables. But this is not fatal, as 8 empty pages were still available.

Rémi Coulom · Post by **Rémi Coulom** » Thu Dec 07, 2017 12:51 am

That machine is fantastic! I want one for Christmas.

IanO · Post by **IanO** » Thu Dec 07, 2017 1:17 am

I really admire this project. (Though it feels like cheating to develop it with a super fancy digital oscilloscope!)

I also admire your project to develop a chess program for it. I've always wondered what the strength of engines on 80's hardware would be given last few decades' advances in search and evaluation techniques. I mean, I don't think there was a single dedicated computer that had the null-move heuristic, much less the LMR or automatically tuned eval params.

Ras · Post by **Ras** » Thu Dec 07, 2017 10:44 pm

What programming language will you choose? The interpreter? Assembly? I guess a C compiler is out of question.

The Gigatron project

Re: The Gigatron project

Re: The Gigatron project

Re: The Gigatron project

Re: The Gigatron project

Re: The Gigatron project

Re: The Gigatron project

Re: The Gigatron project

Re: The Gigatron project

Re: The Gigatron project

Re: The Gigatron project