The Gigatron project

Rémi Coulom · Post by **Rémi Coulom** » Thu Dec 07, 2017 12:51 am

That machine is fantastic! I want one for Christmas.

IanO · Post by **IanO** » Thu Dec 07, 2017 1:17 am

I really admire this project. (Though it feels like cheating to develop it with a super fancy digital oscilloscope!)

I also admire your project to develop a chess program for it. I've always wondered what the strength of engines on 80's hardware would be given last few decades' advances in search and evaluation techniques. I mean, I don't think there was a single dedicated computer that had the null-move heuristic, much less the LMR or automatically tuned eval params.

Ras · Post by **Ras** » Thu Dec 07, 2017 10:44 pm

What programming language will you choose? The interpreter? Assembly? I guess a C compiler is out of question.

Rémi Coulom · Post by **Rémi Coulom** » Thu Dec 07, 2017 10:58 pm

Is your emulator available?

mar · Post by **mar** » Thu Dec 07, 2017 11:48 pm

The instruction set is minimalistic, sexy and also challenging.

I wonder how one does function calls; there's no stack so one would have to emulate it manually
which would be costly/tricky (thinking of how to compute return address, managing stack)

Are indirect jumps possible?

Even 6502 had a 256-byte stack IIRC plus it also was possible to make limited register transfers.

I wonder if old tricks like self-modifying code might be useful again?

(I don't want to distract you too much from the project but I simply find Gigatron fascinating)

hgm · Post by **hgm** » Fri Dec 08, 2017 9:45 am

Rémi Coulom wrote:Is your emulator available?

Marcel's 'official' emulator is on gtihub: https://github.com/kervinck/gigatron-ro ... er/gtemu.c . It does generate the video signal as a stream of ascii characters, though.

What I did was plug this emulator into XBoard, to use XBoard's graphical system to actually display the generated VGA signal as an image. This is very slow, though; XBoard is not able to completely redraw its window 60 times per second. I only draw what changes, but this causes extra overhead if a lot changes.

I did not publish the XBoard patch that does this (and makes XBoard unusable as a GUI for Chess, btw), but if there is interest in it I could do so. Problem, however, is that there is currently no emulation of input.

Ras wrote:What programming language will you choose? The interpreter? Assembly? I guess a C compiler is out of question.

Indeed, no C compiler.

I did not like the speed penalty we would suffer when using an interpreter. So I opted for assembly. I have already written an assembler.

Problem with this approach is that the program also becomes responsible for generating the sync pulses in time, while the interpreter would do that for you. I built a pseudo-instruction into the assembler that will expand to the 5 instructions needed to test if there still is time to execute the code block until the address it mentions (usually as symbolic label), assuming linear execution, and if not, jump to the video handler. (Which then delays for the remaining time, before generating a sync pulse and resuming the code.)

Tight loops (such as ray scans over the chess board) become a problem, however; performing this test in each iteration would slow them down by a large factor. Even just counting how may times you did the loop (under the assumption that there would be enough time to handle the worst case, so you won't have to test if time is up, but just measure it) takes 3 instructions (load, add, store). Sometimes you can determine the number of iterations made afterwards, e.g. by the distance between the start and end-square of a ray scan, which you get from a table after the loop completes.

mar wrote:The instruction set is minimalistic, sexy and also challenging.

I wonder how one does function calls; there's no stack so one would have to emulate it manually
which would be costly/tricky (thinking of how to compute return address, managing stack)

Indeed. The Gigatron has the quirk that it always executes the instruction following a branch, even if the branch is taken. (A result of the instruction-fetch pipelining.) When jumping to subroutines, I use this instruction to load the return address in the accumulator. The subroutine can then start saving that, in a fixed place when it is not called recursively, and on a software stack it maintains itself when it is. This assumes the subroutine is in the same memory page as its caller, or at least knows in which page the caller must be. This is usually not too much of a problem. Not many subroutines in a program are called from more than one or two places. E.g. Search would be called only from itself and from SearchRoot, and even if these are in different pages, it could just explicitly test for the ply level to decide if it should return to SearchRoot or to itself.

Are indirect jumps possible?

Yes, the target address can be taken from the accumulator, or fetched from memory. So my subroutines typically look like:

Code: Select all

routine&#58;
  STA savedReturnAddr
  ...                       // do stuff
  BRA &#91;savedReturnAddr&#93;
  NOP                    // still executed &#40;sometimes the last instruction of the task can be put here&#41;

The Intel860 'supercomputer on a chip' worked similarly, btw. It was a risc machine without stack, and it left the return address of a subroutine call in one of the registers. It also executed the instruction after a branch, IIRC. So the Gigatron is in good company here!

Even 6502 had a 256-byte stack IIRC plus it also was possible to make limited register transfers.

In the Gigatron X and Y cannot be transferred anywhere, and only act as memory address. But there are programmig tricks, where through the use of a memory table you could effectively get them back into A. But you would hardly ever need those, because the Gigatron can transfer A to X or Y as a side effect of a memory store. So during such a trasfer you can make a mirror of the new X or Y content for free, and load that back into the accumulator when you need it. When you load X or Y directly from memory it can only be through direct addressing (meaning the value was already in memory) or as immediate constant (in which case you could simply load that same constant).

Absence of a stack is a bit of a nuisance. The 6502 stack was only a very limited help, though. To set up a stack for local variables you would need the Y register to access them all the time. Problem is that the addressing mode [constant,X] is missing, for no good reason. Connecting D to the now unused input of the high address multiplexer could replace mode [X] (which is really [0,X], forcing the output of the multiplexer to 0) by mode [const,X], and then X could be used as a frame pointer.

Nice thing about the CPU not being a single immutable LSI chip is that you can dust off your soldering iron and add new instructions!

What I plan to do is solder a second Y register ('Z') on top of the existing one (as far as inputs are concerned), and wire the outputs of that to the unused high address multiplexer input, so that the modes [const] and [X] become [Z,const] and [Z,X]. This allows relocation of the zero page, so that each instance of a recursive routine can have its own 'direct page'. Main problem is to find (and decode) an opcode to load this Z. Perhaps the highest bit of the data (now unused because there is only 32KB of RAM, so the high part of the address is only 7 bits wide) should be used to decide if the data is written in Y or Z.

I wonder if old tricks like self-modifying code might be useful again?

This is out of the question. Because of the Harvard architecture there is no write access to program memory. In fact the program memory is an EPROM. This is an advantage of the interpreter: there the program it interprets resides in RAM, and can thus be easily loaded. A Chess program in native machine code would reside in EPROM, and would make it a dedicated machine.

This is one aspect I don't like; when I am going to build a Gigatron I will probably replace the EPROM by two CMOS RAM chips, and a battery for data retention. I must figure out a way to load a program in there, however. But I would have to figure out a way to program EPROMs in the other case, which is not nessecarily easier.

stegemma · Post by **stegemma** » Sat Dec 09, 2017 12:07 am

The project is very interesting but I would suggest you to start with a simpler program than those proposed. For the first test, something simple as my old Drago would be enough, just to see if the whole things can works. Maybe even just a Perft function would gives some hint on how to proceed with the whole program.

hgm · Post by **hgm** » Sat Dec 09, 2017 2:16 pm

Well, one of the problems with programming in assembler is that it is quite error prone, and that there is an appreciable chance that you will make errors even when entering code for a trivial task. So even when you have something simple that works, it will be a poor stepping stone to the more complex thing you eventually want, if that requires replacing code that does things in a simple way by the complex code. In a sense you are starting from scratch all over.

So I want to have a road map along an evolutionary development, which can improve sophistication by adding code to something that is already working, with only minimal changes (if any) required in that working stuff. So if my eventual goal is to do staged capture generation from an incrementally updated attack map, I think it would be a waste of time to first write something that generates captures by making ray scans over the board, put them in a move list, and extract them MVV/LVA order. Even if you had the latter, and it worked perfectly with perft and all, it would not have brought you an inch closer to the final goal. What would make sense is to write code that does generate an attack map from scratch, for the current position, and initially run the staged move generation from there. Then you can later add code in MakeMove/UnMake to keep an incrementally updated attack map of the same format, and add code to compare the two in every node. And when all obvious bugs in the incremental update are cured, switch to using the incrementally updated map for capture generation, and after that limit from-scratch generation and comparison only to the root, and when the engine is fiished, throw it away altogether.

That means that in any case I want my loop over moves in Search() to look similar to what it should look eventually; this wil be the core of the engine, and I don't want to have to completely replace that later. The C code I presented above isn't especially complex. It relies on tables, which should be initialized without errors, though, in particullar stripBit, which would hold stripBit[index] = index & (index-1), and set2dir[(1<<d) + n*(2<<d)] = (&neighbor[d]) >> 8 (with d=0 to 7 and n all possible integer values that keep the table index in the range 0 to 255). And consistency between the attBit[piece] and set2slider[attackersSet].

But it definitely is a good plan to start by making a version that is stripped of almost everyting that is not absolutely essential. That would indeed be a perft-like thing, as that would do away with everything that has to do with evaluation and alpha-beta pruning. Initially I also don't want to bother with the complications of e.p. capture, castling and promotion, so the perft numbers would not be correct, but who cares? Then material evaluation, minimaxing of scores and alpha-beta pruning can be added. Then best-move tracking and IID, using the best move of the previous iteration as 'hash move'. Then tracking of the PV, using the PV move (when available) as hash move.

Promotions

The thing that is still most unclear to me is how to best handle promotion. Because the design at various stages uses bit sets to represent a collection of leapers or sliders, the number of these cannot be allowed to exceed 8. For the sliders that would leave room for 3 extra Q, R or B, (as you start with only 5 of those), and for the leapers 3 extra Knights (after K, N, N, left-P, right-P). The sets will be used in the lookup tables set2leaper[] and set2slider[] to find their index in the piece table. Eventually the value in this index will have a meaning in itself, because although the piece list is traversed as a linked list, following the links, and thus potentially visiting them in any order, I want to detect whether futile victims are reached just from that index, rather than having to consult a lookup table for the piece value to see if the piece at that index is below the futilityThreshold, for every potential victim. Instead I want to translate the (high byte of the) eval-alpha difference to a piece index through a lookup table at the top of Search(), which tabulates the index of the first piece capture of which would not exceed the futility threshold. So it is important that the indexes represent the pieces in order of descending value, and that the linked piece lists respect this order.

So I have to reserve some entries early in the piece list for extra Queens, which can be done directly behind the King. As the King is always present, it would also be easy to insert such a Queen in the piece list (and take out the Pawn that promoted to it) on promotion moves. But since there is only room for 3 extra Queens in the bit sets, that means I cannot reserve a 'promotion Queen' slot for every Pawn, so that the promotion slots should be organized in a way that makes it easy to assign a slot to the Pawn that happens to promote. I guess this can be done by linking the spare slots into a list (using the same links as normally used to make them part of the piece list), and just unlinking the first element of that list on promotion.

For underpromotions it is more trick, however. I am not sure I want to allow those, and even if I do, they would only come at the latest stage of development. But since the way they would be handled would affect the design of the piece list, I want to reserve the required assets for them now, so that I wouldn't have to change the handling of the piece lists for the other pieces later. Ideally pieces from under-promotion would be assigned indexes between Rook and Bishop, so that they are in the right place irrespective of whether they are Knight, Bishop and Rook. (I don't care about the order of Bishops and Knights). It would be difficult to insert them there in the doubly-linked list, however, as you don't know what the intended closest neighbor is that is still on the board. That would require you to run through the piece list, looking for pieces in the middle. (Well, perhaps not a big deal. Under-promotions will be very rare, not considered in QS. So who cares if their MakeMove is slow and cumbersome?)

An alternative is to put an always-present dummy cell in the piece lists, (linked) between Knights and Pawns. If the index of that cell is the highest of all, then even when Pawn capture is not futile, the futilityThreshold would be set to this dummy cell, so that the loop generating their capture would automatically terminate after the last minor. We then would have to test if Pawn capture was really futile (a simple compare between the futilityThreshold and the idex of the dummy cell), and if not enter a loop for the capture of the Pawns, starting with the successor of the dummy cell. A later advantage of doing capture of Pawns and pieces in a different loop is that a simplified MakeMove() could then be used for the latter, as capture of a Pawn will have ramifications for the passer accounting.

Such a dummy cell in the piece list creates the possibility to easily insert pieces just before or just after the cell. Before it the under-promoted pieces can be inserted. That would short-change a Rook obtained by under-promotion, as it would now be behind the minors, so that its capture in some cases could be considered futile, while in fact it isn't. Well, tough luck. In the root you could correct this on the next move. I am not even sure I would want to make the program ever search under-promotion to Rook. (Although you can bungle a drawn KBPK if you are unaware of it.)

It is also easy to insert pieces directly behind the dummy cell. This could be used by considering 7th-rank passers a different piece type from other Pawns. So 'pre-promote' Pawns already when they reach 7th rank, delete their original entry from the piece list, and insert a new entry for them just behind the dummy cell. This does justice to the fact their value is around 200cP, and futility pruning could then take heed of their elevated status. I this case it is possible to reserve a cell for every Pawn, as individual Pawns are never indicated by a bit in a set; they are included in the set of leaper attackers by the direction from which they attack, and in this respect there is no reason to distinguish pre-promoted Pawns from ordinary ones. Allocatig a cell in case of pre-promotion then ivolves simply subracting a constant to the piece-list index.

The dummy cell also offers a starting point for running through the Pawn list for the purpose of generating promotions. You could just run through the list until you encounter a Pawn with an index in the range of unpromoted Pawns. Every pre-promoted Pawn you encounter that way only has promotion moves. Typlically there would not be any, of course, but it is nice to have a quick way for establishing that (namely that next[DUMMY] >= ORDINARY_PAWN).

This leads to the following assigment of piece-list indexes:

Code: Select all

index piece type                  next    attBit
0x10 King                         0x12       0xC0  start of piece list
0x12 Queen                        0x1A  0x10
0x14 promo-Queen                  0x14  0x20       <- qSlots
0x16 promo-Queen                  0x16  0x40
0x18 promo-Queen                  0x80  0x80
0x1A Rook                         0x1C  0x08
0x1C Rook                         0x34  0x04
0x1E 
0x30 under-promotion slot         0x32       0xA0  <- nSlots
0x32 under-promotion slot         0x80       0x90
0x34 Bishop                       0x38  0x02
0x36 
0x38 Bishop                       0x3A  0x01
0x3A Knight                       0x3C       0x88
0x3C Knight                       0x7F       0x84
0x3E pre-promoted Pawn &#40;for 0x5E&#41;
0x50        "               "
0x52        "               "
0x54        "               "
0x56        "               "
0x58        "               "
0x5A        "               "
0x5C pre-promoted Pawn &#40;for 0x7C&#41;
0x5E Pawn                         0x70              ORDINARY_PAWN
0x70    "                         0x72
0x72    "                         0x74
0x74    "                         0x76
0x76    "                         0x78
0x78    "                         0x7A
0x7A    "                         0x7C
0x7C Pawn                         0x80
0x7E DUMMY                        0x5E

Note they are all even: the corresponding odd values will be for the opponent pieces. Also note they all have the 0x10 bit set; this is a kludge for facilitating calculation of mobility. At some point we will want to count how many of the moves of a given piece will hit occupied squares with friends or foes. (The friends should not be counted towards the piece mobility, the foes give valid captures.) This can now be done by ANDing the index of the occupant with 0x11, and adding the lot. The upper nibble will then count total number of occupied target squares, and the lower nibble the number of white ones. This total can then be used in a lookup table to get the the number of captures amongst them, and add that to the mobility. The edge guards would mimic empty squares here (use code 0x80, with both the 0x10 and 0x01 bit at 0).

Also note that the MSB of the leaper set is reserved for indicating the tabulated attBit represents a piece, and not the direction of a Pawn attack. This limits the number of representable leapers to 7, and thus the number of Knight under-promotions to 2 (if no other Knights are captured).

stegemma · Post by **stegemma** » Sat Dec 09, 2017 2:44 pm

I really don't understand how do you generate moves, because your way to do it is too many complex for me. In my old programs, I simply have a piece list with a "piece type" member. When there are a promotion, I just change the piece type from pawn to Queen. For multiple promotions, in Sabrina I duplicate the move, changing the destination piece type and the under-promotion has been done.

hgm · Post by **hgm** » Sat Dec 09, 2017 3:21 pm

It is true that the generation of moves is somewhat less than trivial. But it is not so difficult that it should be impossible to understand. The key is that this is what I call 'inverted' capture generation, i.e. not done attacker by attacker, but instead victim by victim. So you run through the opponent piece list, King to last Pawn, to generate all captures in MVV order.

For each piece you then have to figure out which of your own pieces can capture it. This could be done by running through your own piece list, making 0x88 tests to see if any of them hits the intended victim. (This would be stupid for the Pawns, as there could be 8, and there are only 2 places they can come from, so it is faster t just check those two squares on the board for enemy Pawns.) But that is still slow, because you would have to check 8 pieces and 2 squares for each possible victim, most of the time not even one of those 10 being able to capture it.

So instead I use a attack map, which for each square keeps track of all pieces that attack that square (even with 'pseudo-attacks', that attack a friendly piece). All I have to do to generate captures on a given victim is then look in that attack map, where each attack will be represented by a bit flag set to 1. I extract these bits, which only takes me as many iterations as there are attacks, and stops after no 1 bits are left. (Which might be from the very start, if the victim is not attacked at all.) Each recorded attack then has to be deciphered, to figure out what piece was making the attack, and where it was located on the board. But you only have to do that for attacks that you are sure they exist. (And you don't even start doing it if the victim isn't worthwile.)

How to get the attacker from the attack-map bit depends on whether the attacker is a slider, a leaper or a Pawn. But that isn't much of a problem; there are't enough bits in a byte to store all 10 potential attacks, so you need to store the set of attacks (by one color) in 2 bytes anyway, and you can put the leaper attacks in one byte, and the slider attacks in another. And then use two different loops for extracting them. First the leaper attacks in the order P, N, K, and then the slider attacks in the order B, R, Q. This mimics LVA order. Except that K is done a bit early. Now I can live with that, because the reason to do LVA is that you usually gain more material by capturig with the lowest attacker when the victim is protected. But the King can capture only if the victim is unprotected, and this would be apparent from the attack map, when you judge if you really want to make a HxL capture, or that you'd better prune or LMR it as a losing capture. For capturing unprotected pieces LVA has no special advantage over a random order.

Of course this rises the question of how the (pseudo-)captures got to be flagged in the attack map in the first place. The answer to that is that they were generated in the 'direct' way, by taking the location of the piece that makes them, calculating all capture moves a piece like that can make from there, and setting the flags in the attack-map elements for the target square. That sounds like an inefficiet, roundabout way for doing things. Except that this was done way down in the tree, close to the root, when the piece last moved. So that the result is shared between thousands of leaves.

The Gigatron project

Re: The Gigatron project

Re: The Gigatron project

Re: The Gigatron project

Re: The Gigatron project

Re: The Gigatron project

Re: The Gigatron project

Re: The Gigatron project

Re: The Gigatron project

Re: The Gigatron project

Re: The Gigatron project