# Harmonized Assembly Layer

The Harmonized Assembly Layer is a set of words implemented by all Dusk kernels
which have the same semantics and compile native code that has consistent
results on all architectures. For example, "RSP) 2 +) 16b) +," will, on all
arches, compile a set of instructions that adds the 16-bit value at RSP+2 to
the Work register. On i386, this is the same as "ax sp 2 d) 16b) add,".

This layer allows us to generate performant code in a cross-arch manner. It is
also what compilers such as the C compiler rely on to generate code. Of course,
as with any abstraction, we sometimes lose a little bit in speed and binary
space compared to direct assembler instructions, but in general, the result is
pretty good and direct assembler should be needed only in the tightest of loops.

The HAL is implemented at the kernel level and is available from the very
beginning of the boot sequence, which makes extensive use of it to bootstrap
into a usable system.

The HAL is always for the "live" system. It has not been designed with
cross-compiling in mind.

## Concepts

### Register allocation

The HAL has 5 virtual registers: W, A, S, PSP, RSP. Each architecture
implementing the HAL will need to map those virtual registers to actual
registers. Here is the list of mappings for all supported CPUs:

    i386  W=eax  A=ebx  S=edx  PSP=esi  RSP=esp
    ARM   W=r9   A=r11  S=r8   PSP=r10  RSP=r13

### RSP and PSP registers

RSP and PSP registers map directly to Forth's RS pointer and PS pointer. They're
the same.

### W register

The W register is the "work" register and the default destination of all HAL
instructions. When we say that "@," means "fetch", we mean "fetch into the
destination", which is the W register by default.

The W register is PS's top of stack. This means that, for example, increasing
the W register by 1 is the exact same thing as executing the "1+" word.
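As a rough mental model of that last point, the sketch below is a Python toy,
not Dusk code. The class and method names are made up for illustration. The
only idea taken from the text is that W holds PS's top of stack while the other
items sit in memory, so "1+" touches nothing but W.

```python
MASK32 = 0xFFFFFFFF

class ParamStackModel:
    """Toy model (hypothetical names): W caches PS's top, the rest is in memory."""

    def __init__(self, top=0):
        self.w = top & MASK32   # W register: always the current top of stack
        self.below = []         # items under the top, as they would sit at PSP

    def push(self, n):
        # Spill the old top to memory, then load the new top into W.
        self.below.append(self.w)
        self.w = n & MASK32

    def drop(self):
        # Reload W from memory, in the spirit of the "PSP) @+," pattern.
        self.w = self.below.pop()

    def one_plus(self):
        # "1+" only changes W: no memory access is involved.
        self.w = (self.w + 1) & MASK32


ps = ParamStackModel()
ps.push(41)
ps.one_plus()
print(ps.w)  # 42
```

The 32-bit mask is an assumption of the model, chosen to match the HAL's
32-bit operations.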
### A and S registers

The HAL has two extra registers that regular Forth doesn't have: the A (Address)
register and the S (Scratch) register. Both are general purpose registers that
can be used both as source and destination. To use them as a source, there are
the A) and S) words, which behave like W). To have an operand target one of
those registers as its destination, use the A>) and S>) words.

In terms of capabilities, the A register has the exact same ones as W. The S
register can also be used in the same way, but it's used by the "/mod,"
operation as a result register, so it's a tiny bit less "permanent".

### Register permanence

The A and S registers cannot be expected to keep their value across word calls.
As soon as another word is called, we must consider those values destroyed.

However, all HAL operations and macros must preserve A's value (unless A is the
destination, of course). Therefore, we can rely on A's value as long as the code
we generate doesn't branch to other words.

This applies to all "compiling" words such as dup, drop, if, then, etc. Those
words are supposed to leave the A register alone.

The S register doesn't have the same guarantees and is used by some HAL
operations and macros as temporary storage. That's why it's called "Scratch".
Operations and macros that use it mention it in their documentation.

Rule of thumb: if you're coding a HAL macro, use the S register. If you're
coding a regular word, use A because the S register might be swept out from
under your feet.

It goes without saying that W, being PS's top of stack, is preserved at all
times.

### Operands

All HAL instructions take either no operand (inherent) or one operand parameter.
That operand parameter is a 32-bit number with an arch-specific (that is,
opaque) bit structure that contains all the information the instruction needs
to know its source and destination. Operand words all end with ")".
For example, "A) +," means "add the 32-bit value at the location the A register
points to, to the W register".

Some operand words are not directly operands, but operand modifiers. For
example, "+)" adds a numerical offset to an operand. "W) 4 +)" refers to the
memory location W points to, with a 4-byte displacement. The "8b)" modifier
transforms the operand into an 8-bit operand.

By default, all operands refer to a memory location. Only through the "&)"
operand modifier (see below) can we refer directly to a value in a register.

### &) operand modifier

The &) word takes an input operand and returns its dereferenced counterpart. For
example, m) becomes i), W) becomes a direct reference to W, etc.

This also works with displacements. For example, "RSP) 4 +) &)" yields an
operand that points to RSP+4. This operand might not be addressable directly by
the host CPU. In that case, the HAL operator will compile two instructions. For
example, "RSP) 4 +) &) +," under i386 would yield "di sp 4 +) lea, ax di add,"
("di" being any unallocated register).

The "&)" word never writes instructions directly, only operator words do. The
"lea," above wouldn't be written when "&)" is called, but when "+," is.

The &) operand always results in a 32-bit operation. Don't try to apply 16b) or
8b) afterwards: this results in undefined behavior.

&) can't be used with i).

### <>) operand modifier

The <>) word inverts the destination and the source of the HAL instruction,
allowing the arithmetic result to be stored directly in memory. For example,
"$1234 m) 8b) <>) +," adds the 8-bit value at address $1234 to W and stores the
result directly at address $1234 without affecting W.

### 8b) and 16b) arithmetic

The 8b) and 16b) modifiers only apply to memory accesses: all arithmetic is
"upscaled" to 32-bit with regard to flag settings.
This also applies to compare, which means that, for example,
"$4242 i) @, RSP) 8b) compare," will never set the Z flag because even if RSP)
is $42, comparison is done on the whole W register.

### RSP) and [rcnt]

The only HAL operation that automatically adjusts [rcnt] (see "Local variables"
in [doc/usage/rs]) is "rs+,". Other HAL operations don't touch [rcnt].
Therefore, special care must be taken when using the RSP) operand.

If you're inside of a regular "code" word, you don't care about [rcnt], so you
can ignore this warning. However, if you're writing HAL as part of a macro that
could be used in a word that has local variables, then every time you write a
HAL operation that modifies RSP ("RSP) @+," for example), you need to adjust
[rcnt] accordingly or else you'll break local variables.

### Branching and flags

The HAL can generate branching, conditional or not, through its "branch"
instructions. "branchC,", the conditional branching generator, takes a "cond"
argument. This argument is generated by words like "Z)", ">)", etc. and the
number they yield is arch-specific. The idea is that through this number, the
"branchC," instruction knows the kind of native branch instruction to generate.

These conditions depend on flags being set (or not) and the conditions under
which these flags are set are not exactly the same across architectures.

As of now, when we refer to "flags", there's actually only the Z flag involved,
which is set when the operation yields a zero result. One day, maybe we'll add
the C flag (Carry), so that's why we refer to "flags" as plural. Of course, CPUs
have more flags than that, but they are opaque to the HAL.

To be able to rely on consistent conditional branching, HAL instructions make
guarantees on the flags set by certain instructions. If an instruction has a "Z"
next to it in the listing below, it's safe to conditionally branch using "Z)" or
"NZ)" right after having called it.
Even if the native instruction for a particular HAL word doesn't supply that
flag, the HAL instruction will generate the necessary native instructions to
make it so, at the cost of speed. For this reason, we minimize flag guarantees
in HAL words.

Condition flags are only valid right after the instruction that's supposed to
set them. Flags are considered destroyed as soon as you compile another
instruction... with one exception: branching preserves flags. This means that
after a "branch,", "branchC," or "branchR,", flags are the same as they were
before.

Arithmetic conditions (">)", "<=)", etc.) have no associated flag and can only
be used after a "compare,".

If you look at branching words' signatures, you'll notice something weird: they
take an address parameter and yield an address result. This is because those
words can be used for both backward and forward branching. What they do is
write a branch to the supplied address, but also yield an address to a memory
location that can then be used by "branch!". Therefore, a backward branch looks
like "begin .. branch, drop" and a forward branch looks like
"0 branch, .. here branch!".

All addresses passed to branching words are absolute addresses. If the native
instructions use relative branching addressing, the HAL takes care of the
translation.

## pushret, popret, and popexit,

In Dusk, "Call" means "Push the address of the instruction following the current
one to RSP and then jump to the address being called". "Return" means "Pop RSP
and jump to that address".

On "traditional" CPU architectures, this maps exactly to the behavior of the
native "call" and "return" instructions, so we can live a happy life of blissful
ignorance when using these CPUs.

On some CPUs such as ARM, the native "call" model is to save the address we'll
want to return to in a register and leave the task of pushing to and popping
from RSP to the programmer.
Of course, one thing we could do is to simply wrap all calls and returns in Dusk
into RSP push/pop operations, but that would squander a wonderful speedup
opportunity: with such an approach to calling, we can avoid one push and one pop
on each "leaf" routine call, that is, on each call to a routine that doesn't
call any other routine. That adds up to quite a lot of pushes and pops.

To grab this opportunity, the HAL has three words: "pushret,", "popret," and
"popexit,".

On "traditional" CPUs, these are hollow shells. The first two are noops and the
last one is an alias to "exit,". On ARM, these words push and pop the return
address register to and from RSP.

Words defined through "high level" mechanisms such as ":" call those words
automatically, no need to worry. However, words created with "code" don't,
because such a word could be a "leaf" word. This means that if you create a
"code" word that happens to not be a "leaf" (that is, it calls another word),
it needs to call "pushret," as a prelude and "popret," before it returns. Leaf
words don't need to do that, which makes them faster.

## Word marks

On some architectures (on WASM), there is a strong separation between "code" and
"data" and memory areas containing executable code have to be "marked" as such.
We do so with "wordmark,". Calling this results in an arbitrary number of bytes
being written to "here" to serve as such a mark (it's a noop on architectures
not needing it).

This mark also serves as JIT status, which means that when a piece of code
changes (for example in "realias"), its word mark should be re-written.

These marks apply to:

- every word on sysdict (as well as their code16 and code8 metadata)
- every location that is targeted by a CALL instruction

The "code", "code8b", "code16b" words will automatically write a word mark,
while the "entry" word will not.
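To put a rough number on the saving described in the "pushret, popret, and
popexit," section above, here is a small Python model (not Dusk code). The call
graph, the routine names and the count_pushes helper are all invented for the
illustration. It only counts how many return addresses end up pushed to RS
under each calling model.

```python
def count_pushes(calls, entry, link_register):
    """Count RS pushes caused by one call to `entry`.

    `calls` maps a routine name to the routines it calls, in order.
    Routines absent from `calls` (or mapped to an empty list) are leaves.
    """
    callees = calls.get(entry, [])
    if link_register:
        # Link-register model: only a non-leaf routine saves its return
        # address (the role "pushret," plays on such CPUs).
        pushes = 1 if callees else 0
    else:
        # Push-on-call model: the call itself always pushes.
        pushes = 1
    for callee in callees:
        pushes += count_pushes(calls, callee, link_register)
    return pushes


# Hypothetical call graph: "main" calls "a" and "b", "a" calls "leaf" twice.
calls = {"main": ["a", "b"], "a": ["leaf", "leaf"]}
print(count_pushes(calls, "main", link_register=False))  # 5
print(count_pushes(calls, "main", link_register=True))   # 2
```

In this made-up graph, three of the five calls target leaves, and those are
exactly the pushes (and matching pops) that the link-register model avoids.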
## HAL API

Operand words:

    W)       -- op         Indirect W register
    A)       -- op         Indirect A register
    S)       -- op         Indirect S register
    PSP)     -- op         Indirect PSP register
    RSP)     -- op         Indirect RSP register
    i)       n -- op       Immediate operand. Can't use with <>)
    m)       addr -- op    Absolute address
    +)       op disp -- op Apply displacement to op. Can be applied multiple
                           times. Displacement can be negative.
    W>)      op -- op      Set destination to W
    A>)      op -- op      Set destination to A
    S>)      op -- op      Set destination to S
    &)       op -- op      Dereference operand (see above)
    <>)      op -- op      Direction of the operation is inverted (see above)
    8b)      op -- op      Make op 8-bit
    16b)     op -- op      Make op 16-bit
    32b)     op -- op      Make op 32-bit (default)

Operand query words:

    (W? ( op -- f )
        Yields whether "op"'s base register is W, regardless of its
        direct/displacement/invert flags. Because W is the top of stack,
        there's often special processing to do in that case.

    (split ( op -- width dst src )
        Split "op" in three components. The broad idea is that "or"-ing those
        components together will yield "op" back.

        "src" is the "heaviest" component. It includes displacement/invert
        flags. "src" is "destination-less" and "width-less" (*not* the same
        thing as 32-bit) and *cannot* be used as-is with an operation. Either
        "or" it back with a "dst/width" or re-apply explicit destination/width
        words on it. Conversely, you shouldn't "or" a component back with a
        "full" op, only a "split" one.

        "dst" only includes the destination register.

        "width" is a *set of flags*, not a number of bytes. To get a number of
        bytes, apply "(sz" to it.

    (sz ( op -- n )
        Yields "op" width in bytes, that is, 4, 2 or 1.

Branching and conditions:

    Z)                     "Zero" flag set. On "compare,", this means "equal".
    NZ)                    "Zero" flag not set. On "compare,", this means "not
                           equal".
    <)  <=)  >)  >=)
    s<) s<=) s>) s>=)      Signed comparison
    C>W,     cond --       If cond is met, W=1. Otherwise, W=0.
    branch,  a -- a        Branch to address a, yielding a "forward" address for
                           "branch!"
    branchC, a cond -- a   Branch to address a if condition is met, yielding "a"
                           for "branch!"
    branch!  braddr tgtaddr --
                           Given "braddr" yielded by a previous "branch"
                           instruction, change the reference at the address so
                           that it targets "tgtaddr". Used for forward
                           branching.
    branchR, a --          Compile a branch to address a while at the same time
                           setting the "return address" (commonly, that means
                           pushing to RSP, but not always) to the instruction
                           directly following this one. This is commonly called
                           a "call".
    branchA, --            Branch to the address held in the A register.
    exit,    --            Compile a return from a call.
    pushret, --            Push the current return address to RSP (on relevant
                           CPUs)
    popret,  --            Pop RSP into the return address register (on relevant
                           CPUs)
    popexit, --            Equivalent to "popret, exit," but faster.
    wordmark, --           Write a "word mark". See section above.

Instructions:

    @,       op --         Read source into dest
    @!,      op --         Swap dest and source
    +,       op -- Z       dest + source
    -,       op -- Z       dest - source
    *,       op --         dest * source
    /mod,    op --         Divide dest by source and put remainder in S
                           register. Can't be used with S>).
    <<,      op --         dest lshift source
    >>,      op --         dest rshift source
    s>>,     op --         Arithmetic ("signed") shift right. Instead of filling
                           the vacated high bits of dest with zeroes, it fills
                           them with its b31.
    &,       op -- Z       dest and source
    |,       op -- Z       dest or source
    ^,       op -- Z       dest xor source
    @+,      op --         Read source into dest and then add 4/2/1 to operand's
                           dereferenced source. Cannot be used with m) i) &).
                           If source is the same as dest, behavior is undefined.
    -@,      op --         Subtract 4/2/1 from operand's dereferenced source and
                           then read source into dest. Decrement happens before
                           fetch, hence the symbol order being the opposite of
                           "@+". Cannot be used with m) i) &).
    compare, op -- *       Compare source to dest (all flags set).
                           Example: if W=1 and A=2, "A) &) compare," makes the
                           "<)" condition true.
    +n,      n op -- Z     Add n to source without affecting dest. Can't use
                           with i) or <>)
    -W,      --            W = -W

## HAL macros

These words below aren't implemented in kernels and are combinations of the
words above, but they're pretty useful nonetheless.

    (src     op -- src     Same as "(split rot> 2drop"
    (dst     op -- dst     Same as "(split rot 2drop"
    (width   op -- width   Same as "(split 2drop"
    ps+,     n --          Add n to PSP
    rs+,     n --          Add n to RSP
    !,       op --         Write dest to source. Shortcut for "<>) @,"
    !+,      op --         Equivalent to "<>) @+,". Source==dest is weird, but
                           fine.
    -!,      op --         Equivalent to "<>) -@,".
    field+)  op "x" --     Equivalent to "0 to' +)" with "x" input stream. In
                           other words: add the offset of the typed field to the
                           HAL operand. Doesn't work with methods.

    [@+], ( op -- )
        Do an indirect fetch+increase, that is: fetch a 32-bit address at op's
        src and perform a "@," on that address. Then, increase op's src by op's
        size in bytes. Cannot be used with "S)", "&)", "i)" or "<>)". Destroys
        the S register, but using it with "S>)" is fine.

    [!+], ( op -- )
        Do an indirect store+increase, that is: fetch a 32-bit address at op's
        src and perform a "!," on that address. Then, increase op's src by op's
        size in bytes. Cannot be used with "S)", "S>)", "&)", "i)" or "<>)".
        Destroys the S register.
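The two-step behavior of "[@+]," can be easier to follow on a toy memory. The
Python sketch below is a model, not the macro's implementation. The function
name, the flat bytearray memory and the little-endian byte order are
assumptions made for the illustration. It follows the description above: read
the 32-bit pointer stored at op's src, fetch through it with the operand's
width, then advance the stored pointer by that width.

```python
def indirect_fetch_inc(mem, src, size=4):
    """Model of "[@+],": returns the value that would land in dest."""
    # Step 1: the pointer cell at "src" is always read as a 32-bit address.
    ptr = int.from_bytes(mem[src:src + 4], "little")
    # Step 2: the "@," through that address, using the operand width (4/2/1).
    value = int.from_bytes(mem[ptr:ptr + size], "little")
    # Step 3: increase the pointer cell by the operand size in bytes.
    mem[src:src + 4] = ((ptr + size) & 0xFFFFFFFF).to_bytes(4, "little")
    return value


mem = bytearray(32)
mem[0:4] = (16).to_bytes(4, "little")          # pointer cell at 0 points to 16
mem[16:20] = bytes([0x11, 0x22, 0x33, 0x44])   # data being walked
print(hex(indirect_fetch_inc(mem, 0, size=1)))  # 0x11
print(int.from_bytes(mem[0:4], "little"))       # 17
```

"[!+]," would follow the same three steps with a store in place of the fetch in
step 2.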
## Examples

To give a better idea of how the HAL works, here are examples with their
corresponding i386 instructions:

    PSP) @,                 ax si 0 d) mov,
    A) 8b) !,               bx 0 d) al mov,
    RSP) 4 +) A>) +,        bx sp 4 d) add,
    PSP) &) A>) @!,         bx si xchg,
    PSP) <>) <<,            cx ax mov, si 0 d) cl shl,
    RSP) @+,                ax sp 0 d) mov, sp 4 i) add,
    A) 16b) !+,             bx 0 d) 16b) ax mov, bx 2 i) add,
    A) 16 +) &) @,          bx 16 d) lea,
    42 $1234 m) +n,         $1234 m) 42 i) add,
    42 PSP) &) +n,          si 42 i) add,
    54 i) -,                ax 54 i) sub,

Here are actual word implementations:

    code drop    PSP) @+, exit,
    code dup     PSP) -!, exit,
    code swap    PSP) @!, exit,
    code nip     4 ps+, exit,
    code over    PSP) -!, PSP) 4 +) @, exit,
    code 1+      1 i) +, exit,
    code lshift  PSP) <>) <<, PSP) @+, exit,
    code c@      W) 8b) @, exit,
    code ,       HERE m) A>) @, 4 HERE m) +n, A) !, PSP) @+, exit,
    code not     0 i) compare, Z) C>W, exit,
    code execute A) &) !, PSP) @+, branchA,

Branching:

    ' foo branchR,                          \ call "foo"
    ' foo branch, drop                      \ jump to "foo"
    42 i) compare, ' foo Z) branchC, drop   \ jump to "foo" if W=42
    here branch, drop                       \ infinite loop
    \ Execute code "..." only if W <= A
    A) &) compare, 0 >) branchC, ... here branch!

## HAL number bank

Numbers supplied to i) m) and +) can be any number of the 32-bit range.
Nevertheless, as per HAL API constraints, all operands occupy only one PS slot.
Therein lies a problem: how can a 32-bit operand include its necessary metadata
along with a possible offset that can be anything in the 32-bit range?

It does so through a number bank mechanism. The number bank is a global, static
rolling buffer of 16 slots of 4 bytes each. This allows us to assign arbitrary
numbers to slots numbered from 0 to 15. This slot number occupies only 4 bits in
our HAL operand, which is much more manageable.

This allows up to 16 operands associated with numbers to coexist at once on PS,
making HAL and assemblers (which piggy-back on this API) pretty macro-able.

Every kernel implements this number bank and exposes this API:

    hbank' ( slot -- a )  Get address associated to bank slot.
    hbank! ( n -- slot )  Reserve a new slot and write "n" to it. Yield the ID
                          of the new slot.
    hbank@ ( slot -- n )  Yield number in slot.
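
The "rolling" aspect is what limits the bank to 16 live operands. The Python
sketch below models it under stated assumptions: the class name, the method
names and the simple "next slot" cursor are hypothetical, since the text above
does not specify the order in which kernels hand out slots. What it does show
is why a 17th reservation silently recycles the oldest slot.

```python
class NumberBankModel:
    """Toy model of a 16-slot rolling number bank (not the kernel's code)."""

    SLOTS = 16

    def __init__(self):
        self.slots = [0] * self.SLOTS
        self.cursor = 0  # assumed allocation cursor, wraps around after 15

    def store(self, n):
        # Models "hbank!": reserve the next slot, write n, yield the slot ID.
        slot = self.cursor
        self.slots[slot] = n & 0xFFFFFFFF
        self.cursor = (slot + 1) % self.SLOTS
        return slot

    def fetch(self, slot):
        # Models "hbank@": the slot ID fits in 4 bits, hence the mask.
        return self.slots[slot & 0xF]


bank = NumberBankModel()
slot = bank.store(0x12345678)
print(slot, hex(bank.fetch(slot)))  # 0 0x12345678
```

In this model, an operand only has to carry the 4-bit slot ID next to its
metadata. The flip side is that an operand left on PS while 16 newer numbers
are reserved would read back a recycled value.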