The Gigatron project

hgm · Post by **hgm** » Wed Dec 06, 2017 3:44 pm

Basic operation

The design I have in mind will use a neighbor table, which for each occupied square will hold the square number of the nearest neighbor in each of the 8 principal directions. This nearest neighbor can be an edge guard, when there is nothing between the square and the board edge. So the table will have to contain 10x10 elements, although for any particular direction the final element along that direction (always just beyond the board edge) will not have a further neighbor.

Then there will be an attack map, for each square holding bit flags to indicate attacks on it. There will actually be 4 such board size tables, for white and black attackers each a set of slider and a set of leaper attacks. The slider attacks will be flagged by direction, 1 bit for each direction of movement. The leaper attacks will be flagged by piece for King and Knights, (3 bits, perhaps 4 if we want to allow for a 3rd Knight), but by direction for Pawns (4 bits, one for each diagonal direction).

For the update of the attack map it will be handy to also record attacks on the edge guards, so that these can be treated exactly the same as on-board attacks. This means the slider attack maps will have to be 10x10. For leapers it would also be inconvenient having to test all the time whether a potential attack goes off-board or not, and the map will even have to be 12x12 to make sure it catches all Knight jumps. Leaper attacks will be valid for any square; we will just apply or remove them to the map whenever the leaper resposible for them gets moved or captured. Slider attacks will only be faithfully recorded on occupied squares, to limit the cost of updating (as sliders can attack very many unoccupied squares).

The piece list will be set up as a doubly-linked list, so that captured pieces can be easily removed from them. (And then re-inserted on UnMake().) The neighbor tables are in fact also a collection of doubly-linked lists, one for each ray, so that each occupied square is aware of its two neighbors on the ray through its forward and backward links. Evacuating a square will just remove if from the lists for all 4 rays it is on, and re-occupying it during UnMake will re-insert it, based on its old neighbors, which it still remembers.

Generating captures will be done by running through the piece list, which is kept in order of decreasing piece value. The piece numbers produce the piece locations, the locations produce the set of attacks on it, all through lookup in the appropriate table. The attacks will be extracted from the set also by table lookup. For leapers this would result in a piece number for the attacker (K or N), which finally is used to look up the location of the attacker. This leaves all 4 descriptive elements of the move (fromSqr, toSqr, piece and victim) known, so the capture can be made and searched. A Pawn would result in a recognizably invalid piece number (like negative or beyond the end of the piece table) when its turn comes up to be extracted. In this case the number would be used to index a table that contains the board step of the Pawn capture, which, added to the to-square, would produce the from-square, and finally through lookup on the board the number of the attacking piece (a Pawn).

Code: Select all

victim = nextPiece[victim]; // step through piece list
toSqr = location[victim];
leaps = leaperAttacks[stm][toSqr];
while(leaps) {
  piece = set2leaper[leaps]+stm; // extracts in order Pawns, Knights, King
  leaps = stripBit[leaps]; // clears lowest set bit, for next iteration
  if(piece <= MAXPIECE) {
    fromSqr = location[piece];
  } else {
    step = pawnSteps[piece - MAXPIECE]; // 4-entry table
    fromSqr = toSqr + step;
    piece = board[toSqr];
  }
  if(SearchMove()) goto cutoff;
}
// go on with slider attackers

For sliders the situation is a bit more complex. Because their attacks are kept by direction, we canot be sure they are extracted in LVA order, and it will be necessary to sort the attacks. This will be done by first converting the direction-wise attack set to an attackers set, where each bit corresponds to a particular slider. The latter will then be extracted in the order B,B,R,R,Q. to make the actual captures. So the data flow here also starts from victim to location to sliderAttacks, but then it extracts direction, neighbor, attacker, attackerBit to OR that ito the attackers set. nI the second pass then attackers -> piece -> fromSqr. Again, all through sequential table lookups:

Code: Select all

slides = sliderAttacks[stm][toSqr]; // we already had determined toSqr and victim
attackers = 0;
while(slides) {
  direction = set2dir[slides]; // actually we tabulate &neighbor[direction]...
  fromSqr = neighbor[direction][toSqr]; // ... so we can do fromSqr = direction[toSqr] here
  piece = board[fromSqr];
  attackers |= attBit[piece];
  slides = stripBit[slides];
}
while(attackers) {
  piece = set2slider[attackers]+stm;
  fromSqr = location[piece];
  attackers = stripBit[attackers];
  if(SearchMove()) goto cutoff;
}

This generates captures very fast, just 6 or 7 table lookups and an occasional ADD or OR. And most of that is what you would normally do anyway to fetch and/or unpack a move from a move list, to get fromSqr, toSqr, piece and victim, in order to perform a MakeMove(). And you ca stop generating as soon as you reach the futile victims, i.e. if victim > futilityThreshold (which you would determine in advanced based on the high byte of alpha - curEval, through another table lookup).

Non-capture generation could be done in the usual mailbox way, by running through your own piece list, and for each piece through a list of directions, and for each direction along the ray until you hit a non-empty squere (for a slider) or just one step (when it is a leaper). I am not sure if it would have any benefits to use history to order the non-captures; this would definitely cause a lot of complication. Maintaining a list of a dozen or so 'hot moves', and checking if these are possible in the current position, similar to what is done with killers, and then do them before non-capture generation might be faster than putting all non-captures in a move list, assigning them a history score, and sorting the lot.

hgm · Post by **hgm** » Thu Dec 07, 2017 12:05 am

Memory allocation

The Gigatron has only one addressing mode where it combines a constant from the instruction with a register, and in that case the register will deliver the high byte of the address. As we will need to access many tables indexed by square or piece numbers, and it would be a pain to have to load the fixed part of the table address into a register, the best way to do this is to store the pages 'vertically', i.e. let the low address byte determine the table, and the high byte the index. Because there is only 32KB RAM, and not 64KB, the high 'byte' of the address actually is just 7 bits; the 8th bit is ignored. This means that vertically stored tables can contain at most 128 bytes.

Now this is (barely) large enough for a board plus the surrounding band of edge guards. The square number of a square (x,y) (where (0,0) = a1) can be made sqrA1 + 10*y + x. Knight moves made from h8 can end up at (9,8), which gives sqrA1 + 98, while from a1 they hit (-2,-1), which gives sqrA1 - 21. This spans 120 bytes in total, which will fit vertically in memory with only 8 pages to spare. The leaper attack maps for black and white will also have this size. We will allocate such tables in the high bytes of of the memory pages. E.g. the board at addresses $xxFF ($ meaning hexadecimal), with xx = 08...7F, the black attack map at $xxFE, etc.

This means that there are only 8 memory pages that are only 8 memory pages still completely empty; all others will have to suffer some table bytes at their upper end. Tables that would need more than 128 bytes can only be stored horizontally, and tables >= 256 bytes would really be problematic. But fortunately we do not need the latter. We do need some tables of exactly 256 bytes, though. They must be stored horizontally, and only 8 pages are still available to do this.

These 'full-size' tables are a table to extract the directions from a slider attacks set, which can have all bits, and the table containing the index with the lowest bit cleared. Other sets of attackers do not use the upper bits, so bit extraction from those could in principle be done with tables of 128 bytes or smaller. Now perhaps I am exaggerating here, because it seems not possible that slider attacks come from all 8 directions at once. Since Bishops will be on different colors, at most 4 sliders can be aimed at the same square, and the highest code with four 1 bits is $F0. And there is room to be smart here: we can assign the upper 4 bits to diagonal directions, and the lower 4 bits to orthogobal directions. There can be at most 2 diagonal slider attacks on the same square. So the largest possible code is then $CC = 204. This leaves 51 bytes at the top of the page unused. Which should be enough for all vertically stored tables.

Now there is still another matter for which we will need full-size tables, which is related to 16-bit arithmetic for scores. Unfortunately the Gigatron has no condition codes, and cannot detect whether an addition or subtraction produces a carry. This makes multiple-precision arithmetic problematic. One way to solve that problem is to use a 2-byte score format where the lower byte only uses the lowest 7 bits, i.e. stores values ranging from -64 to +63, so that overflow can be detected from the 8th bit being unequal to the 7th bit. In the case of such an overflow 128 would have to be added or subtracted to get the lower byte back in range, and 1 would have to be subtracted or added to the upper byte to compensate that. To make that fast it can be done by lookup tables indexed by the low byte, one table to get the correction for the upper byte (0, 1 or -1), the other the corrected value of the lower byte. This would require two full-size tables. But this is not fatal, as 8 empty pages were still available.

Rémi Coulom · Post by **Rémi Coulom** » Thu Dec 07, 2017 12:51 am

That machine is fantastic! I want one for Christmas.

IanO · Post by **IanO** » Thu Dec 07, 2017 1:17 am

I really admire this project. (Though it feels like cheating to develop it with a super fancy digital oscilloscope!)

I also admire your project to develop a chess program for it. I've always wondered what the strength of engines on 80's hardware would be given last few decades' advances in search and evaluation techniques. I mean, I don't think there was a single dedicated computer that had the null-move heuristic, much less the LMR or automatically tuned eval params.

Ras · Post by **Ras** » Thu Dec 07, 2017 10:44 pm

What programming language will you choose? The interpreter? Assembly? I guess a C compiler is out of question.

Rémi Coulom · Post by **Rémi Coulom** » Thu Dec 07, 2017 10:58 pm

Is your emulator available?

mar · Post by **mar** » Thu Dec 07, 2017 11:48 pm

The instruction set is minimalistic, sexy and also challenging.

I wonder how one does function calls; there's no stack so one would have to emulate it manually
which would be costly/tricky (thinking of how to compute return address, managing stack)

Are indirect jumps possible?

Even 6502 had a 256-byte stack IIRC plus it also was possible to make limited register transfers.

I wonder if old tricks like self-modifying code might be useful again?

(I don't want to distract you too much from the project but I simply find Gigatron fascinating)

hgm · Post by **hgm** » Fri Dec 08, 2017 9:45 am

Rémi Coulom wrote:Is your emulator available?

Marcel's 'official' emulator is on gtihub: https://github.com/kervinck/gigatron-ro ... er/gtemu.c . It does generate the video signal as a stream of ascii characters, though.

What I did was plug this emulator into XBoard, to use XBoard's graphical system to actually display the generated VGA signal as an image. This is very slow, though; XBoard is not able to completely redraw its window 60 times per second. I only draw what changes, but this causes extra overhead if a lot changes.

I did not publish the XBoard patch that does this (and makes XBoard unusable as a GUI for Chess, btw), but if there is interest in it I could do so. Problem, however, is that there is currently no emulation of input.

Ras wrote:What programming language will you choose? The interpreter? Assembly? I guess a C compiler is out of question.

Indeed, no C compiler.

I did not like the speed penalty we would suffer when using an interpreter. So I opted for assembly. I have already written an assembler.

Problem with this approach is that the program also becomes responsible for generating the sync pulses in time, while the interpreter would do that for you. I built a pseudo-instruction into the assembler that will expand to the 5 instructions needed to test if there still is time to execute the code block until the address it mentions (usually as symbolic label), assuming linear execution, and if not, jump to the video handler. (Which then delays for the remaining time, before generating a sync pulse and resuming the code.)

Tight loops (such as ray scans over the chess board) become a problem, however; performing this test in each iteration would slow them down by a large factor. Even just counting how may times you did the loop (under the assumption that there would be enough time to handle the worst case, so you won't have to test if time is up, but just measure it) takes 3 instructions (load, add, store). Sometimes you can determine the number of iterations made afterwards, e.g. by the distance between the start and end-square of a ray scan, which you get from a table after the loop completes.

mar wrote:The instruction set is minimalistic, sexy and also challenging.

I wonder how one does function calls; there's no stack so one would have to emulate it manually
which would be costly/tricky (thinking of how to compute return address, managing stack)

Indeed. The Gigatron has the quirk that it always executes the instruction following a branch, even if the branch is taken. (A result of the instruction-fetch pipelining.) When jumping to subroutines, I use this instruction to load the return address in the accumulator. The subroutine can then start saving that, in a fixed place when it is not called recursively, and on a software stack it maintains itself when it is. This assumes the subroutine is in the same memory page as its caller, or at least knows in which page the caller must be. This is usually not too much of a problem. Not many subroutines in a program are called from more than one or two places. E.g. Search would be called only from itself and from SearchRoot, and even if these are in different pages, it could just explicitly test for the ply level to decide if it should return to SearchRoot or to itself.

Are indirect jumps possible?

Yes, the target address can be taken from the accumulator, or fetched from memory. So my subroutines typically look like:

Code: Select all

routine:
  STA savedReturnAddr
  ...                       // do stuff
  BRA [savedReturnAddr]
  NOP                    // still executed (sometimes the last instruction of the task can be put here)

The Intel860 'supercomputer on a chip' worked similarly, btw. It was a risc machine without stack, and it left the return address of a subroutine call in one of the registers. It also executed the instruction after a branch, IIRC. So the Gigatron is in good company here!

Even 6502 had a 256-byte stack IIRC plus it also was possible to make limited register transfers.

In the Gigatron X and Y cannot be transferred anywhere, and only act as memory address. But there are programmig tricks, where through the use of a memory table you could effectively get them back into A. But you would hardly ever need those, because the Gigatron can transfer A to X or Y as a side effect of a memory store. So during such a trasfer you can make a mirror of the new X or Y content for free, and load that back into the accumulator when you need it. When you load X or Y directly from memory it can only be through direct addressing (meaning the value was already in memory) or as immediate constant (in which case you could simply load that same constant).

Absence of a stack is a bit of a nuisance. The 6502 stack was only a very limited help, though. To set up a stack for local variables you would need the Y register to access them all the time. Problem is that the addressing mode [constant,X] is missing, for no good reason. Connecting D to the now unused input of the high address multiplexer could replace mode [X] (which is really [0,X], forcing the output of the multiplexer to 0) by mode [const,X], and then X could be used as a frame pointer.

Nice thing about the CPU not being a single immutable LSI chip is that you can dust off your soldering iron and add new instructions!

What I plan to do is solder a second Y register ('Z') on top of the existing one (as far as inputs are concerned), and wire the outputs of that to the unused high address multiplexer input, so that the modes [const] and [X] become [Z,const] and [Z,X]. This allows relocation of the zero page, so that each instance of a recursive routine can have its own 'direct page'. Main problem is to find (and decode) an opcode to load this Z. Perhaps the highest bit of the data (now unused because there is only 32KB of RAM, so the high part of the address is only 7 bits wide) should be used to decide if the data is written in Y or Z.

I wonder if old tricks like self-modifying code might be useful again?

This is out of the question. Because of the Harvard architecture there is no write access to program memory. In fact the program memory is an EPROM. This is an advantage of the interpreter: there the program it interprets resides in RAM, and can thus be easily loaded. A Chess program in native machine code would reside in EPROM, and would make it a dedicated machine.

This is one aspect I don't like; when I am going to build a Gigatron I will probably replace the EPROM by two CMOS RAM chips, and a battery for data retention. I must figure out a way to load a program in there, however. But I would have to figure out a way to program EPROMs in the other case, which is not nessecarily easier.

stegemma · Post by **stegemma** » Sat Dec 09, 2017 12:07 am

The project is very interesting but I would suggest you to start with a simpler program than those proposed. For the first test, something simple as my old Drago would be enough, just to see if the whole things can works. Maybe even just a Perft function would gives some hint on how to proceed with the whole program.

hgm · Post by **hgm** » Sat Dec 09, 2017 2:16 pm

Well, one of the problems with programming in assembler is that it is quite error prone, and that there is an appreciable chance that you will make errors even when entering code for a trivial task. So even when you have something simple that works, it will be a poor stepping stone to the more complex thing you eventually want, if that requires replacing code that does things in a simple way by the complex code. In a sense you are starting from scratch all over.

So I want to have a road map along an evolutionary development, which can improve sophistication by adding code to something that is already working, with only minimal changes (if any) required in that working stuff. So if my eventual goal is to do staged capture generation from an incrementally updated attack map, I think it would be a waste of time to first write something that generates captures by making ray scans over the board, put them in a move list, and extract them MVV/LVA order. Even if you had the latter, and it worked perfectly with perft and all, it would not have brought you an inch closer to the final goal. What would make sense is to write code that does generate an attack map from scratch, for the current position, and initially run the staged move generation from there. Then you can later add code in MakeMove/UnMake to keep an incrementally updated attack map of the same format, and add code to compare the two in every node. And when all obvious bugs in the incremental update are cured, switch to using the incrementally updated map for capture generation, and after that limit from-scratch generation and comparison only to the root, and when the engine is fiished, throw it away altogether.

That means that in any case I want my loop over moves in Search() to look similar to what it should look eventually; this wil be the core of the engine, and I don't want to have to completely replace that later. The C code I presented above isn't especially complex. It relies on tables, which should be initialized without errors, though, in particullar stripBit, which would hold stripBit[index] = index & (index-1), and set2dir[(1<<d) + n*(2<<d)] = (&neighbor[d]) >> 8 (with d=0 to 7 and n all possible integer values that keep the table index in the range 0 to 255). And consistency between the attBit[piece] and set2slider[attackersSet].

But it definitely is a good plan to start by making a version that is stripped of almost everyting that is not absolutely essential. That would indeed be a perft-like thing, as that would do away with everything that has to do with evaluation and alpha-beta pruning. Initially I also don't want to bother with the complications of e.p. capture, castling and promotion, so the perft numbers would not be correct, but who cares? Then material evaluation, minimaxing of scores and alpha-beta pruning can be added. Then best-move tracking and IID, using the best move of the previous iteration as 'hash move'. Then tracking of the PV, using the PV move (when available) as hash move.

Promotions

The thing that is still most unclear to me is how to best handle promotion. Because the design at various stages uses bit sets to represent a collection of leapers or sliders, the number of these cannot be allowed to exceed 8. For the sliders that would leave room for 3 extra Q, R or B, (as you start with only 5 of those), and for the leapers 3 extra Knights (after K, N, N, left-P, right-P). The sets will be used in the lookup tables set2leaper[] and set2slider[] to find their index in the piece table. Eventually the value in this index will have a meaning in itself, because although the piece list is traversed as a linked list, following the links, and thus potentially visiting them in any order, I want to detect whether futile victims are reached just from that index, rather than having to consult a lookup table for the piece value to see if the piece at that index is below the futilityThreshold, for every potential victim. Instead I want to translate the (high byte of the) eval-alpha difference to a piece index through a lookup table at the top of Search(), which tabulates the index of the first piece capture of which would not exceed the futility threshold. So it is important that the indexes represent the pieces in order of descending value, and that the linked piece lists respect this order.

So I have to reserve some entries early in the piece list for extra Queens, which can be done directly behind the King. As the King is always present, it would also be easy to insert such a Queen in the piece list (and take out the Pawn that promoted to it) on promotion moves. But since there is only room for 3 extra Queens in the bit sets, that means I cannot reserve a 'promotion Queen' slot for every Pawn, so that the promotion slots should be organized in a way that makes it easy to assign a slot to the Pawn that happens to promote. I guess this can be done by linking the spare slots into a list (using the same links as normally used to make them part of the piece list), and just unlinking the first element of that list on promotion.

For underpromotions it is more trick, however. I am not sure I want to allow those, and even if I do, they would only come at the latest stage of development. But since the way they would be handled would affect the design of the piece list, I want to reserve the required assets for them now, so that I wouldn't have to change the handling of the piece lists for the other pieces later. Ideally pieces from under-promotion would be assigned indexes between Rook and Bishop, so that they are in the right place irrespective of whether they are Knight, Bishop and Rook. (I don't care about the order of Bishops and Knights). It would be difficult to insert them there in the doubly-linked list, however, as you don't know what the intended closest neighbor is that is still on the board. That would require you to run through the piece list, looking for pieces in the middle. (Well, perhaps not a big deal. Under-promotions will be very rare, not considered in QS. So who cares if their MakeMove is slow and cumbersome?)

An alternative is to put an always-present dummy cell in the piece lists, (linked) between Knights and Pawns. If the index of that cell is the highest of all, then even when Pawn capture is not futile, the futilityThreshold would be set to this dummy cell, so that the loop generating their capture would automatically terminate after the last minor. We then would have to test if Pawn capture was really futile (a simple compare between the futilityThreshold and the idex of the dummy cell), and if not enter a loop for the capture of the Pawns, starting with the successor of the dummy cell. A later advantage of doing capture of Pawns and pieces in a different loop is that a simplified MakeMove() could then be used for the latter, as capture of a Pawn will have ramifications for the passer accounting.

Such a dummy cell in the piece list creates the possibility to easily insert pieces just before or just after the cell. Before it the under-promoted pieces can be inserted. That would short-change a Rook obtained by under-promotion, as it would now be behind the minors, so that its capture in some cases could be considered futile, while in fact it isn't. Well, tough luck. In the root you could correct this on the next move. I am not even sure I would want to make the program ever search under-promotion to Rook. (Although you can bungle a drawn KBPK if you are unaware of it.)

It is also easy to insert pieces directly behind the dummy cell. This could be used by considering 7th-rank passers a different piece type from other Pawns. So 'pre-promote' Pawns already when they reach 7th rank, delete their original entry from the piece list, and insert a new entry for them just behind the dummy cell. This does justice to the fact their value is around 200cP, and futility pruning could then take heed of their elevated status. I this case it is possible to reserve a cell for every Pawn, as individual Pawns are never indicated by a bit in a set; they are included in the set of leaper attackers by the direction from which they attack, and in this respect there is no reason to distinguish pre-promoted Pawns from ordinary ones. Allocatig a cell in case of pre-promotion then ivolves simply subracting a constant to the piece-list index.

The dummy cell also offers a starting point for running through the Pawn list for the purpose of generating promotions. You could just run through the list until you encounter a Pawn with an index in the range of unpromoted Pawns. Every pre-promoted Pawn you encounter that way only has promotion moves. Typlically there would not be any, of course, but it is nice to have a quick way for establishing that (namely that next[DUMMY] >= ORDINARY_PAWN).

This leads to the following assigment of piece-list indexes:

Code: Select all

index piece type                  next    attBit
0x10 King                         0x12       0xC0  start of piece list
0x12 Queen                        0x1A  0x10
0x14 promo-Queen                  0x14  0x20       <- qSlots
0x16 promo-Queen                  0x16  0x40
0x18 promo-Queen                  0x80  0x80
0x1A Rook                         0x1C  0x08
0x1C Rook                         0x34  0x04
0x1E 
0x30 under-promotion slot         0x32       0xA0  <- nSlots
0x32 under-promotion slot         0x80       0x90
0x34 Bishop                       0x38  0x02
0x36 
0x38 Bishop                       0x3A  0x01
0x3A Knight                       0x3C       0x88
0x3C Knight                       0x7F       0x84
0x3E pre-promoted Pawn (for 0x5E)
0x50        "               "
0x52        "               "
0x54        "               "
0x56        "               "
0x58        "               "
0x5A        "               "
0x5C pre-promoted Pawn (for 0x7C)
0x5E Pawn                         0x70              ORDINARY_PAWN
0x70    "                         0x72
0x72    "                         0x74
0x74    "                         0x76
0x76    "                         0x78
0x78    "                         0x7A
0x7A    "                         0x7C
0x7C Pawn                         0x80
0x7E DUMMY                        0x5E

Note they are all even: the corresponding odd values will be for the opponent pieces. Also note they all have the 0x10 bit set; this is a kludge for facilitating calculation of mobility. At some point we will want to count how many of the moves of a given piece will hit occupied squares with friends or foes. (The friends should not be counted towards the piece mobility, the foes give valid captures.) This can now be done by ANDing the index of the occupant with 0x11, and adding the lot. The upper nibble will then count total number of occupied target squares, and the lower nibble the number of white ones. This total can then be used in a lookup table to get the the number of captures amongst them, and add that to the mobility. The edge guards would mimic empty squares here (use code 0x80, with both the 0x10 and 0x01 bit at 0).

Also note that the MSB of the leaper set is reserved for indicating the tabulated attBit represents a piece, and not the direction of a Pawn attack. This limits the number of representable leapers to 7, and thus the number of Knight under-promotions to 2 (if no other Knights are captured).

The Gigatron project

Re: The Gigatron project

Re: The Gigatron project

Re: The Gigatron project

Re: The Gigatron project

Re: The Gigatron project

Re: The Gigatron project

Re: The Gigatron project

Re: The Gigatron project

Re: The Gigatron project

Re: The Gigatron project