Porting micro-Max to a small teaching language

octogonz · Post by **octogonz** » Tue Jun 16, 2026 1:51 pm

The rationale of ISA design tends to be complicated.

Regarding the framed registers: I agree that memory-mapped registers aren't new. What may be novel is rather Chombit's approach and purpose that motivates it. The approach: aside from the specialized SP/FP/IP/GP, the framed registers entirely replace traditional registers rather than supplementing them. They can appear in every syntax role, even base-plus-displacement addressing like "store [i:4 + 12345], byte 123" or "load b:1, [i:4 + 12345]". The clock cycle timing model even roughly corresponds to traditional registers (2 cycles for MOVE, 3 cycles for STORE, 4 cycles for LOAD), intentionally downplaying the FP-based access time by allowing multiple bus reads/writes to happen on the same cycle when there is no data dependency. The docs admit that while electronically realizable, framed registers are not electronically defensible.

That's not the purpose, though. Hybrix aims to enable "total knowledge" of a nontrivial program (chess with a graphical UX and 6-voice music, for example), by showing how it reduces to a well-defined foundation of bytes and clock cycles. "I can easily make sense of the compiler's output" encourages "I can write assembly myself," although the main focus is high level code. To this end, framed registers completely eliminate the complexity of register spilling, well enough that register exhaustion can be a compile error. The design serves a second goal: that every variable, function parameter, and GC object can have a permanent unique memory address for its entire lifetime; this supports the claim that "you can explain every byte of your program" by inspecting the address space using the debugger. Thus, although Chombit arguably preserves much of the character of a real CPU, it accepts some compromises that would confound an EE curriculum.

Regarding the 5-byte limit: The machine language encodes each instruction as an opcode byte followed by one byte for each operand, with the high bits of the opcode determining the instruction length. The official release has 186 opcodes, or 211 if you include the FPU add-on, however almost every remaining opcode has "reserved" purpose anticipating a possible future need. Thus the opcode list is actually near capacity. Allowing 6-byte instructions is certainly possible, but it would need to justify displacing other opcodes. Allowing 6-byte forms raises a lot of why-this-not-that questions that could either overburden students with too many instruction forms, or if omitted would require more complex justifications than a simple rule "5 is the max." We can argue it's not strongly needed: XHEX and the 24-bit bus mostly eliminate pointers as a source of 32-bit literals. For other constants, the workaround "push int $1234_5679; pop i:0" is quite readable and fits in 7 bytes.

Regarding the absolute memory addressing: For a 24-bit memory bus, we could certainly support forms like "load i:0, [$12_3456]" or "store [$12_3456], byte 123" and stay within the 5-byte limit. The challenge is that this sort of variation contributes to a combinatorial explosion of possible forms, which an "easy to learn" instruction set must balance carefully. LOAD and STORE already have the most forms of any Chombit instructions (and the most reserved future possible forms).

In practice, absolute memory addressing seems most useful for global variables and memory-mapped I/O. This is easily handled by loading the base address into a register and using it for the duration of a function, and in my measurements it reduced the total ROM code size noticeably because base+offset operands require less bytes. (Going further: FAIL handler can put the base pointer directly into FP, elegant or ugly depending on your perspective.)

Regarding WITH: Even though we have over 200 opcodes, I've tried to leverage syntax to create a mental picture that "you only need to learn 25 instructions", where possible minimizing special modifiers on the instruction forms. Given the nature of Hybrix, it seemed that add-with-carry and euclidean-division were barely necessary features, included mainly because I thought they would be fruitful to discuss in one lesson rather than a general workhorse. WITH seemed a convenient way for one opcode to add an alternative semantics to any instruction without disrupting the canonical opcodes. Surprisingly, although Euclidean division is absent from most real CPUs, it turned out to be quite nice for certain operations like tilemap indexing, so in retrospect maybe it should have had its own opcode.

You are right that WITH looks like a prefix, however its semantics and interplay with IF may be easier to understand with the ability to step over it in the debugger and observe the SF and WF as control flags. Also, although Hybrix is designed to avoid interrupts, if you look closely at the event handler design, I've left open the possibility for Chombit to support interrupts in the future for specialized learning topics such as multithreading or video raster effects. If so, then it makes sense that "push flags" includes WF, and WITH is not secretly an exception to the 5-byte limit.

Signedness prefix: It's a good suggestion, however because "with carry" and "with euclid" share an opcode, I notice that introducing "with signed" would encounter a conflict for instructions like ADD that support both meanings. Also, the signedness of immediate operands is not a simple default: operands variously use "{s_byte}" (signed byte) or "{byte}" (unsigned or irrelevant) according to expected usage scenarios.

Regarding IF: Being able to step over IF in the debugger, with its own flag, really helps people understand the world of conditionals separately from the modified instruction. As you observed, we also get conditional execution of other instructions for free. (ARM Thumb does something like this with "it eq" and "it ne" if I recall.)

This was a tangent, so I'll reply separately about the chess program.

hgm · Post by **hgm** » Tue Jun 16, 2026 10:28 pm

Have you looked at the TMS9900 architecture? It seems almost exactly like Chombit to me. There are only 3 physical registers: Program Counter, Workspace Pointer and Status register. In Chombit you call these IP, FP and Flags.

The 'framed registers' in the memory pointed to by the WP can also be used with indexed mode (i.e. address = framed registerd content + offset). It even has auto-increment addressing modes using these registers as pointers.

I was surprised you do not support absolute addressing, because it is considered the most basic form of memory addressing, and I never have seen any architecture that does not support it. It is true that loading a register with a base address and using indexed modes with smaller offsets can achieve the same. But when using recursive routines (as is common in chess programs) every instance of the routine would have its own set of framed registers, used for local variables). So you cannot do a one-time load on a register reserved for accessing a certain relatively small memory block (which is affordable due to the large number of framed registers Chombit supports); you would have to initialize the register on every call, to make the global variables available in every instance of the subroutine, as the previously initialized value will get 'out of few' when you alter the FP. Of course when the recursive routine is the only routine (as in micro-Max), you would know in advance where, say, i:0 would end up for every recursion, and you could initialize that (e.g. to 0, if your globals start there) in advance. Then every time the routine requests a new stack frame, its i:0 would already be initialized to the desired value, and it would never change it.

Note that the TMS9900 achieves this in hardware: it does consider the framed register r0 to contain zero when it is used in indexed addressing (even if the memory location that corresponds to r0 given the current WP contains something else). Many RISC architectures (characterized by having many register) reserve one of their registers for holding the constant 0, which eliminates the need for many addressing modes and instructions.

octogonz · Post by **octogonz** » Wed Jun 17, 2026 12:33 pm

I had mainly studied SPARC and Infineon C166. For TMS9900, I assumed its workspace was intended more for context-switching for interrupts or system calls. Looking closer, it seems BLWP/RTWP can also represent a function call. Thanks for pointing me to it.

The design goals are different, though. Chombit's framed registers implement the Hybrix language's calling convention, which tries to approximate the classic cdecl layout. (Hybrix pushes left-to-right for readability in the debugger, but right-to-left can work too for a C-style language with variadic "printf".) Unlike TMS9900, the idea is not just to have registers backed by memory, but to make the compiler's activation record look like the programmer's model of the function. That pushes the design toward a larger register set and different register sizes, rather than a small fixed workspace used mainly as the processor's visible register set.

Regarding absolute addressing, I assume you mean "load i:0, [$12_3456]" where the instruction encodes the full address? Classic CPUs like 6502 could do "LDA $D012", but for example ARM load/store lacks this absolute addressing form. 32-bit address operands are too costly to encode, whereas 24-bit is debatable. Empirically, replacing Chombit's "load i:0, [$12_3456]" with "load i:0, [i:4 + $56]" generally reduces ROM size because there will tend to be a lot of repeated accesses near the base pointer (i:4 in this example). I know it was significant for 32-bit Chombit addresses; I changed the bus to 24-bit fairly late in the development, so I'm not as confident about the size tradeoff for 24-bit.

On TMS9900, the r0=0 trick nicely converts indexed addressing to absolute, however Chombit encodes each register operand as a separate byte, so for example "load i:4, [i:0 + $12_3456]" would be a 6-byte form. A specialized encoding seems unavoidable.

Supporting small addresses as "zero page" would look like "load i:0, [$12]" or maybe "load i:0, [$1234]". This encounters a problem that Hybrix reserves the $00_xxxx address range for hardware faults to trap null references.

Especially for chess-solver recursive loops, you are right that base+offset would work better with global registers that persist across function calls. One global register is not enough; Hybrix commonly references at least $d0_0000 for I/O memory and $10_1000 for the program's global variables. It would need to be more like SPARC's g1-g7 global registers. Although this solves the recursive call problem, a full suite of global registers will be a higher cognitive load for learners. Not just "more register types to know", but also more complex operand encodings, as well as a compiler convention for initializing and preserving them.

So, if I was going to modify the architecture to address this gap somehow, "load i:0, [$12_3456]" seems easier to integrate, more natural to explain. But I'll point out that the compiler tends to favor small code over fast code, because a ROM cartridge must fit in 64 kilobytes (after LZMA compression). Thus the compiler may prefer base+offset over absolute addressing for that reason.

octogonz · Post by **octogonz** » Wed Jun 17, 2026 12:44 pm

Returning to the topic of the chess game, a few questions:

You mentioned you are very busy lately: Would you prefer for me to code the Hybrix port of micro-Max myself, and you can give feedback about its code?

If we improve the hash replacement algorithm as suggested, would you still consider it to be "micro-Max"? Probably the main character of the algorithm is basically unchanged, in the same category as adjusting the RAM size, but I'm not too familiar with chess programming culture.

Do you expect KingSlayer to play significantly better than micro-Max on Hybrix hardware? If so, maybe I could consider eventually porting both of them, providing a "simple" and "advanced" version of the program. The two versions could share the same graphical front-end.

hgm · Post by **hgm** » Fri Jun 19, 2026 5:32 pm

I generally love programming in assembly language. But if I am honest, there is just no way I could realistically achieve a Hybrix micro-Max version in a reasonable time in my present situation. So It is probably better if you have a go at it. I will always be available to review the code, and give advice for debugging.

And I would consider a somewhat smarter replacement algorithm just an adaptation to the very cramped memory it had to run on. Basically it would just be an attempt to make a small hash table behave like it is the quite large one the micro-Max originally was designed to run with (256MB). Although of course the fact that it now runs on a very much slower machine than originally developed on causes the search tree to be so much smaller that the situation is parhaps not so bad as it at first glance seems.

I would not bother with KingSlayer. It basically has the same search algorithm as micro-Max, but a much more extensive evaluation. At the low search depth you will be able to reach at 6MHz, it would probably be much more important (and less memory consuming) to make the search more selective (putting emphasis on moves that are 'tactically connected' to the previous few moves).

octogonz · Post by **octogonz** » Fri Jun 19, 2026 9:52 pm

Sounds good!

One clarification: I won't be coding this in assembly language. Instead I will use the Hybrix high-level language, which is similar to Pascal but with a garbage collector and some interesting storage directives for memory-mapped I/O. The code is verbose but quite easy to read, and corresponds closely to the emitted assembly. (The runtime library's engine.hyb is a pretty good code sample to see what it's like.)

hgm · Post by **hgm** » Fri Jun 19, 2026 11:04 pm

Well, I suppose you have a compiler for that high-level language to turn it into Chombit assembly code. Micro-Max should not produce any garbage, so the garbage collector isn't relevant.

Denis P. Mendoza · Post by **Denis P. Mendoza** » Sat Jun 20, 2026 2:11 am

Well, this one's new. Just logged in from my hiatus. This is a fun project like ticalc.
I only read a few pages on this Hybrix Framework and the online tutorials
So for the curious user, the requirements are:
1. Create a workspace folder named micromax_chess
2. Create umax4_8_backend.asm and put it in that folder
3. Create umax4_8_frontend.hyx and put it inside the same folder
4. Run the Hybrix environment linker toolchain from your terminal or developer panel with the following script:
hybrix-cc umax4_8_frontend.hyx -o micromax_chess.rom
5. Drop the resulting .rom asset block file into the Hybrix virtual runner window frame (Hybrix web engine) to load the application.
If I'm right, it would be nice if I could try and run this on my own as I have 2 and 3 to verify and test.
The process I did was converting the obfuscated C-code to C++, transform to asm, then create the micromax hybrix frontend.
I hope someone corrects mistakes from curious guys like me. I'll drop the codes when it runs.

Chris Formula · Post by **Chris Formula** » Sat Jun 20, 2026 3:33 am

Denis P. Mendoza wrote: ↑Sat Jun 20, 2026 2:11 am Well, this one's new. Just logged in from my hiatus. This is a fun project like ticalc.
I only read a few pages on this Hybrix Framework and the online tutorials
So for the curious user, the requirements are:
1. Create a workspace folder named micromax_chess
2. Create umax4_8_backend.asm and put it in that folder
3. Create umax4_8_frontend.hyx and put it inside the same folder
4. Run the Hybrix environment linker toolchain from your terminal or developer panel with the following script:
hybrix-cc umax4_8_frontend.hyx -o micromax_chess.rom
5. Drop the resulting .rom asset block file into the Hybrix virtual runner window frame (Hybrix web engine) to load the application.
If I'm right, it would be nice if I could try and run this on my own as I have 2 and 3 to verify and test.
The process I did was converting the obfuscated C-code to C++, transform to asm, then create the micromax hybrix frontend.
I hope someone corrects mistakes from curious guys like me. I'll drop the codes when it runs.

musta na Denis? been a while since Toga days.

i'm glad na active ka uli dito.

Chris

hgm · Post by **hgm** » Sun Jun 21, 2026 11:04 am

The hash improvement I would recommend is this:

Original code

Code: Select all

 struct _*a=A+(J+k*E&U-1);                     /* lookup pos. in hash table*/

should become

Code: Select all

 struct _*a=A+(J+k*E&U-1);                     /* lookup pos. in hash table*/
 d=a->D;                                       /* depth of 1st entry       */
 a+=a->K!=Z;                                   /* 1st slot misses, try 2nd */
 a-=a->K-Z&&d<a->D;                            /* back to 1st if less deep */

Since each hash probe now considers a pair of entries, the hash table A[] has to be declared as a power of 2 plus 1. (Not shown.) The first line calculates the address of the first member of the pair where the entry could be stored. The depth of the entry there is remembered as d in line 2. Line 3 then shifts the pointer to the next entry, if the first entry is not a hit (signature a->K does not match the hash key Z).

The 4th line switches back to the primary slot if the second is also a miss, and the depth of the entry in that first slot (d) is smaller than that of the second. If the pointer a was still pointing to the first slot, because that was a hit, it is still a hit (a->K-Z == 0), and the line would do nothing. The result is that 'a' points to the entry of the pair that was the hit, or, if neither is a hit, to the one with the lowest depth. (Which will then be replaced.) The code then continues in the old way to test whether the selected entry is a usable hit.

This implements the 'shallowest of two' replacement algorithm. But wihout any provision for aging. I am not sure if that would be a problem. In theory the depth-preferred part of the table could clog up with the deepest entries of earlier search that refer to positions that can no longer be reached from the current game position. A simple way to build in some aging is 'undercut replacement', where you replace the deepest entry of the slot when the new entry is exactly one less deep. That would also allow the deepest entries to be flushed from the table, but at a rate that becomes successively lower when they are deeper. (Because then positions of depth d-1 that could replace them also are less frequent.) This could be achieved by replacing the 4th line with

Code: Select all

a-=a->K-Z&&d<a->D|n==d-1;

Then it does not only replace the primary entry in the case of a double miss when it has a lower depth than the second, but also when it has a depth (d) one higher than the new entry to store there (n).

Porting micro-Max to a small teaching language

Re: Porting micro-Max to a small teaching language

Re: Porting micro-Max to a small teaching language

Re: Porting micro-Max to a small teaching language

Re: Porting micro-Max to a small teaching language

Re: Porting micro-Max to a small teaching language

Re: Porting micro-Max to a small teaching language

Re: Porting micro-Max to a small teaching language

Re: Porting micro-Max to a small teaching language

Re: Porting micro-Max to a small teaching language

Re: Porting micro-Max to a small teaching language