Features making Dusk OS special

A whole OS built from source on boot

One thing that makes Dusk OS special is that it boots from a tiny kernel weighing less than 4 kilobytes. From this tiny core, on boot, it builds its way up, from source code, to a system that has a functional C compiler, which then allows it to bootstrap itself some more, from source code.

This peculiarity of Dusk OS has interesting properties. The nicest one, in my humble opinion, is that it allows us to sidestep the entire problem of binary compatibility and relocation and only deal with source compatibility. So, no ELF, no binutils, only code that is designed to run from where it was generated in the first place. This is so much simpler!

Object files? Global symbols? Nah. C functions are simple Forth words.

Harmonized Assembly Layer

Dusk features what we call the Harmonized Assembly Layer (HAL for short). This is a cross-CPU assembler, on which the C compiler relies, which prioritizes implementation and usage simplicity, but is also designed to generate efficient native code.

Shortest path to self-hosting for an "almost C" compiler

Dusk OS self-hosts in about 500 lines of assembly and a few hundred lines of Forth (the exact number depends on the target machine). From there, it bootstraps to DuskCC, which is roughly 1000 lines of Forth code. To my knowledge, Dusk OS is unique in that regard.

You can pick any C compiler that requires POSIX and it will automatically require orders of magnitude more lines of code to bootstrap because you need that POSIX system in addition to the C compiler. So even if you pick a small C compiler such as tcc, you still need a POSIX system to build it, which is usually in the millions of LOCs.

To be fair, Dusk OS is not the first project to think of optimizing that path. Efforts at making our modern software world bootstrappable led to an "almost C" compiler, M2-Planet, with a feature set comparable to DuskCC in very few lines of code. M2-Planet itself is about 5K lines of code and the various stages that lead to it are generally a few hundred lines each. The project initially ran on top of regular kernels (as in "fat kernels with lots of code"), but some bare metal stages (1, 2) were created and now this little chain ends up being comparable to Dusk in terms of lines of code. Still more than Dusk, but in the same ballpark.

Although this path is short and technically leads you to an "almost C" compiler, you can hardly use it because it has no "real kernel" and no shell. Those bare metal stages mentioned above are enough to compile M2-Planet, but really not much else: they're extremely limited. You'll need a kernel and a shell if you want to use your shiny compiler.

One of your best picks, should you try this path, would be Fiwix, a minimal POSIX i386 kernel weighing less than 50K lines of C+asm. But then, M2-Planet is not enough. You need to compile tcc (which M2-Planet can compile after a few patches have been applied), which weighs 80K lines. Userspace is worse. Bash+coreutils are 400K lines, even busybox is 190K. We still end up with a pretty minimal and simple system, but it's still a lot more code than Dusk.

So, unless someone tells me about some option I don't know about, DuskCC is quite innovative on the aspect of self-hosting path length.

Dusk OS is pretty fast

The code generated by Dusk OS holds up pretty well against modern compilers with fancy optimizations and millions of lines of code.

In a Byte Sieve benchmark done on an i386 NetBSD system, the DuskCC version of the sieve is almost as fast as GCC's unoptimized build and the HAL's translation of the sieve algorithm blows past GCC's unoptimized build to nearly reach the speed of GCC -O2!
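For readers who don't know the Byte Sieve, here is a minimal C sketch of the classic kernel, the kind of code a compiler is asked to translate in such a benchmark. It is not the benchmark source used for the numbers above: the array size (8190), the function names sieve_once and run_sieve, and the iteration wrapper are assumptions taken from the traditional form of the benchmark.

```c
#include <assert.h>
#include <string.h>

#define SIZE 8190                /* traditional Byte Sieve array size (assumption) */

static char flags[SIZE + 1];     /* flags[i] stands for the odd number 2*i + 3 */

/* One pass of the sieve. Returns the number of odd primes found. */
int sieve_once(void)
{
    int i, k, prime, count = 0;

    memset(flags, 1, sizeof flags);          /* every candidate starts as prime */
    for (i = 0; i <= SIZE; i++) {
        if (flags[i]) {
            prime = i + i + 3;
            /* strike out the odd multiples of prime, starting at 3*prime */
            for (k = i + prime; k <= SIZE; k += prime)
                flags[k] = 0;
            count++;
        }
    }
    return count;
}

/* Hypothetical benchmark wrapper: repeat the pass so the run is long
   enough to time. Every pass must report the same count. */
int run_sieve(int iterations)
{
    int n, count = 0;

    for (n = 0; n < iterations; n++) {
        int c = sieve_once();
        assert(n == 0 || c == count);
        count = c;
    }
    return count;
}
```

With SIZE set to 8190, each pass counts 1899 primes (the odd primes below 16384). The inner loop is nothing but byte stores and integer additions, which is why this benchmark mostly measures how tight the generated loop code is.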

And it gets better! HAL's design allows efficiency to naturally "bubble up" to higher level code with impressive results. For example, there's the "charcount" example which pits Dusk's rfind against POSIX's regex(3). For this particular and simple use case (counting occurrences of a character range in a big file), Dusk is 15% faster than Debian bookworm amd64's regex(3) implementation.
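To give an idea of what the POSIX side of such a comparison looks like, here is a minimal sketch that counts matches of a bracket expression with regex(3). It is not the actual charcount source, and it shows nothing of Dusk's rfind: the function name count_matches is hypothetical, and it works on an in-memory string rather than a big file.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Count non-overlapping matches of pattern in text using POSIX regex(3).
   Intended for simple bracket expressions such as "[a-f]".
   Returns -1 if the pattern does not compile. */
long count_matches(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m;
    long count = 0;

    assert(pattern != NULL && text != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* Search again from the end of each match until nothing is left. */
    while (*text != '\0' && regexec(&re, text, 1, &m, 0) == 0) {
        if (m.rm_eo == 0)        /* empty match: stop rather than loop forever */
            break;
        count++;
        text += m.rm_eo;
    }

    regfree(&re);
    return count;
}
```

For example, count_matches("[a-f]", "abcxyz fed") returns 6 in the default C locale. Every hit goes through a separate regexec call here, which is the general-purpose machinery a specialized routine gets to skip.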