Dusk's Byte Sieve benchmark

The code generated by Dusk OS holds pretty well to modern compilers with fancy optimizations. For example, let's use the famous Byte Sieve benchmark and modify it slightly for modern CPU capabilities:

#include <stdio.h>
#define size 6000000
char flags[size+1];
void main() {
  int i, prime, k, count, iter;
  printf("10 iterations\n");
  for (iter=1; iter<=10; iter++) {
    count = 0;
    for (i=0; i<=size; i++) flags[i] = 1;
    for (i=0; i<=size; i++) {
      if (flags[i]) {
        prime = i + i + 3;
        k = i + prime;
        while (k <= size) {
          flags[k] = 0;
          k += prime;
        }
        count++;
      }
    }
  }
  printf("\n%d primes", count);
}

On my 10 years old desktop on a NetBSD 10.0 i386 machine, doing gcc -o sieve sieve.c (GCC version is 10.5.0) produces a 15752 bytes sieve ELF. Doing time ./sieve completes in 1.22 seconds.

Doing the same with gcc -O2 yields a 15780 bytes ELF that completes in 0.46 seconds.

Now let's take the same C file and compile it through DuskCC. The only modifications needed are the removal of the #include and renaming main() to sieve(). The rest stays the same.

To have a fair comparison, the code will need to run "raw" on the CPU. A good way to do so is to use Usermode Dusk. The Dusk package examples repository even has examples that build the exact code shown below. So, if we do:

git clone https://git.sr.ht/~vdupras/dusk-examples
(verify PGP signature)
cd dusk-examples
make sievec
time ./sievec

This produces a 339 bytes word named sieve and running this takes 1.23 seconds. Roughly as good as GCC without a -O! The result of the computation is the same of course.

What about Forth code? Let's see:

6000000 const size
create flags size 1+ allot

variable count
variable prime
: sieve ( -- )
  ."10 iterations\n"
  10 for
    !{'count #0}
    flags size cells/ $01010101 fill
    0 size for2 @A{R} @{Ac+flags} if
      lshift{R#1} +{&#3;} dup !{'prime}
      +{R} begin ( k ) dup size <= while
        dup !{c+flags #0} ( k )
        +{'prime} repeat drop ( )
    +{'count !#1} then next next
  nl> count @ . ." primes" ;

That's pure Forth code taking advantage of Big Moustache for speed. Guess what? a 347 bytes word running in 1.30 seconds. A few milliseconds shy of GCC's unoptimized build!

We want to squeeze some speed out of Dusk? Sure, let's use the HAL:

6000000 const size
create flags size 1+ allot

variable count

: sieve ( -- )
  ."10 iterations\n"
  10 for
    0 count !
    flags size cells/ $01010101 fill
    0 r! [ ( W=0 ) 0 i) A>) @, begin
      W) flags +) 8b) A>) compare, 0 Z) branchC,
        1 i) <<, 3 i) +, S) &) !, RSP) +,
        size i) compare, 0 >) branchC, begin
          W) flags +) A>) 8b) !, S) &) +,
          size i) compare, <=) branchC, drop then
        1 count m) +n, then 1 RSP) +n,
      RSP) @, size i) compare, NZ) branchC, drop ] ( W )
    rdrop drop next nl> count @ . ." primes" ;

Verdict? A 287 bytes word completing in 0.51 seconds... blowing past unoptimized GCC builds and almost reaching the speed of its -O2 builds!

So, Dusk OS provides you with speed not too far away from what modern compilers with their millions upon millions of lines of code provide you, but through an other-worldly simpler system.