Dusk Gopher Daemon

INET daemon that serves Gopher requests

This project is a Dusk package designed to serve Gopher requests using POSIX's accept(2) and listen(2). It handles I/O through INET sockets in a non-blocking way, allowing it to serve many requests concurrently.

The package isn't designed to be very configurable, but rather to have source code simple enough that it can easily be adapted to your needs.

This code is available from Dusk OS' files folder or as a Git repository (without SSL) at:

git://git.duskos.org/dusk-gopherd.git

Build and run

To build this project, you need:

Running make will download Dusk OS and compile the package using its Usermode. The end result is a standalone dkgopherd executable that takes no arguments.

It launches a server that binds a socket to BINDADDR:BINDPORT (constants in the source code) and listens for Gopher requests on that socket forever.

Before you build, you might want to change a few constants such as BINDADDR, HOSTNAME or BASEDIR. These constants are at the top of either dkgopherd.fs or dkgopherd.c.

Theory of operation

This server has a fixed number of active connections with static memory associated with each of those connections. The number of such connections is controlled by CONNCNT (16 by default).

Before accepting connections, we chroot ourselves to BASEDIR (default /var/gopher, see dkgopherd.c) and change our user to TGTUSER (default nobody).
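The chroot-then-drop sequence could look roughly like the sketch below. The helper name jail_and_drop is hypothetical, and the sketch uses setgid(2)/setuid(2) (which, when called as root, change the real, effective and saved IDs) rather than whatever exact calls dkgopherd.c makes. Note that the user lookup has to happen before the chroot, since /etc/passwd is generally unreachable afterwards.

```c
#define _DEFAULT_SOURCE  /* for chroot(2) and setgroups(2) on glibc */
#include <grp.h>
#include <pwd.h>
#include <unistd.h>

/* Hypothetical helper: confine the process to basedir, then give up
 * root by switching to the given user. Returns 0 on success, -1 on error. */
static int jail_and_drop(const char *basedir, const char *user)
{
    /* Resolve the target user first, while /etc/passwd is still visible. */
    struct passwd *pw = getpwnam(user);
    if (pw == NULL)
        return -1;

    /* Enter the jail and make sure the working directory is inside it. */
    if (chroot(basedir) != 0 || chdir("/") != 0)
        return -1;

    /* Drop supplementary groups, then group, then user (order matters:
     * once the user ID is dropped, the other calls would fail). */
    if (setgroups(0, NULL) != 0)
        return -1;
    if (setgid(pw->pw_gid) != 0 || setuid(pw->pw_uid) != 0)
        return -1;
    return 0;
}
```

A caller would typically treat any failure here as fatal and exit before accepting connections.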

Then, we start the main loop, which does two things: accept new connections and "talk" with the active connections.

When we accept a new connection, we try to find an unused connection in the pool. If all connections are used, we drop one. Therefore, we always handle incoming connections.

TODO: For now, we always drop the first one of the array, but eventually, a better selection system based on time and remote address will be devised.
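The slot selection described above can be sketched as follows. The struct layout, the "fd < 0 means unused" convention and the name pick_slot are assumptions made for illustration, not taken from the source.

```c
#include <stddef.h>

#define CONNCNT 16  /* pool size, matching the documented default */

/* Hypothetical connection record: fd < 0 marks an unused slot. */
struct conn {
    int fd;
};

/* Return the index of the slot a new connection should use.
 * Prefers an unused slot; if the pool is full, picks slot 0 and sets
 * *evicted to 1 so the caller knows to close that connection first. */
static size_t pick_slot(const struct conn pool[CONNCNT], int *evicted)
{
    for (size_t i = 0; i < CONNCNT; i++) {
        if (pool[i].fd < 0) {
            *evicted = 0;
            return i;
        }
    }
    *evicted = 1;
    return 0;
}
```

Because the pool is a fixed array, this lookup needs no allocation and its cost is bounded by CONNCNT.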

Talking with active connections

For this server to be non-blocking, we need to divide our request handling into chunks that stop at the point where we have to wait for the remote end. This means that we need to keep track of the state every connection is in, along with an associated data buffer.

The data buffer, Conn buf, is used for both reading and writing and is BUFSZ bytes in size (default 16KB). We use the same buffer for both because once we've handled the incoming request, we don't need it anymore, so it doesn't matter that we overwrite it as we spit our contents.

Connections can be in two states: REQRD (read request) or RESPWR (write response).

When the connection is initiated, it's placed in REQRD mode, that is, we're waiting for a Gopher selector to come in. We do so by reading into the connection's buffer up to MAXREQUESTSZ bytes, or until a whitespace (character smaller than $21) ends up in the buffer.

If we reach MAXREQUESTSZ, the request is invalid and we close the connection. Otherwise, the contents up to the whitespace are the selector. If the selector is empty, we set it to index (the index of the whole site).
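The selector rules above can be sketched in C as a function that inspects whatever has been accumulated so far. The names parse_selector and req_status are hypothetical, and the MAXREQUESTSZ value of 64 is an assumption for this example.

```c
#include <stddef.h>
#include <string.h>

#define MAXREQUESTSZ 64  /* assumed value for this sketch */

enum req_status { REQ_INCOMPLETE, REQ_READY, REQ_INVALID };

/* Examine the len bytes accumulated in buf so far.
 * On REQ_READY, buf holds the NUL-terminated selector ("index" if the
 * request was empty). buf must have room for at least 6 bytes. */
static enum req_status parse_selector(char *buf, size_t len)
{
    for (size_t i = 0; i < len && i < MAXREQUESTSZ; i++) {
        /* Any byte below $21 (space, CR, LF, tab, ...) ends the selector. */
        if ((unsigned char)buf[i] < 0x21) {
            if (i == 0)
                strcpy(buf, "index");  /* empty selector: site index */
            else
                buf[i] = '\0';
            return REQ_READY;
        }
    }
    /* No terminator yet: too long means invalid, otherwise keep reading. */
    return len >= MAXREQUESTSZ ? REQ_INVALID : REQ_INCOMPLETE;
}
```

A REQ_INCOMPLETE result simply means the connection stays in REQRD mode until the next pass of the main loop.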

We have our selector and the connection is about to switch to RESPWR mode, in which we write contents to the connection. In the description below, when we say "spit", we mean place the contents in the buffer so that they can be picked up by the non-blocking loop and then written to the connection.

So, we look up that selector in the chrooted filesystem. If not found, we spit "not found" then close.

Otherwise, we check if the path is a directory. If it is, we automatically generate a directory listing and spit it.

If it's a file, we don't spit it right away; we simply save the opened file descriptor. The idea is that the requested file might be bigger than the connection buffer and will have to be spit in multiple chunks.

And that completes request handling.

Chunked writes

Spitting contents to the connection buffer is one thing, but those contents haven't reached the connection yet. Sending them is done in the main loop.

When "talking" to an active connection that is in RESPWR mode, we first check if there's anything to send in the memory buffer. We know that by looking at the connection's a and u fields which represent a range of the contents that is still left to send.

We try to send as much of that range as possible down the internet tubes, then check how many bytes we've managed to send. We reduce the range by that many bytes.

If the range is empty, we check if we have an associated file descriptor (Conn filefd) to feed our output buffer. If we have, we fill the buffer again with contents read from Conn filefd. When we have no more contents to send, we close the connection.
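One pass of this send-then-refill logic might look like the sketch below. It assumes that a is the offset of the first unsent byte in the buffer and u is the number of unsent bytes; the struct layout and the name respwr_step are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

#define BUFSZ 16384  /* assumed to match the documented default */

/* Hypothetical connection record for this sketch. */
struct conn {
    int fd;        /* non-blocking connection descriptor */
    int filefd;    /* file feeding the buffer, or -1 if none */
    size_t a;      /* assumed: offset of the first unsent byte in buf */
    size_t u;      /* assumed: number of unsent bytes */
    char buf[BUFSZ];
};

/* One non-blocking step for a connection in RESPWR mode.
 * Returns 1 to keep the connection open, 0 when it should be closed. */
static int respwr_step(struct conn *c)
{
    if (c->u > 0) {
        ssize_t sent = write(c->fd, c->buf + c->a, c->u);
        if (sent < 0)  /* would block: try again on the next pass */
            return errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR;
        c->a += (size_t)sent;  /* shrink the range by what went out */
        c->u -= (size_t)sent;
    }
    if (c->u == 0) {
        if (c->filefd < 0)
            return 0;  /* nothing buffered, no file: response is done */
        ssize_t n = read(c->filefd, c->buf, BUFSZ);
        if (n <= 0)
            return 0;  /* end of file or read error */
        c->a = 0;
        c->u = (size_t)n;
    }
    return 1;
}
```

A partial write leaves the remaining range in place, so the next pass of the main loop resumes exactly where this one stopped.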

Directory listing

If the selector corresponds to a directory, we auto-generate an index by iterating over its contents, ignoring all entries whose name begins with a dot (.). In this listing, we differentiate between the "text" type and the "bin" type through a predetermined list of file extensions for text files, by default ".txt" and ".md".
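The per-entry decision can be sketched as below. Mapping "text" to Gopher item type 0 and "bin" to item type 9 follows RFC 1436, but that the daemon emits exactly these characters is an assumption; the name entry_type is hypothetical and the sketch only covers regular files.

```c
#include <stddef.h>
#include <string.h>

/* Extensions treated as text, matching the documented defaults. */
static const char *const text_exts[] = { ".txt", ".md" };

/* Classify a directory entry by name.
 * Returns 0 if the entry must be skipped (name starts with a dot),
 * '0' for a text file, '9' for a binary file. */
static char entry_type(const char *name)
{
    if (name[0] == '.')
        return 0;

    /* Compare the last extension against the text list. */
    const char *dot = strrchr(name, '.');
    if (dot != NULL) {
        for (size_t i = 0; i < sizeof text_exts / sizeof text_exts[0]; i++) {
            if (strcmp(dot, text_exts[i]) == 0)
                return '0';
        }
    }
    return '9';
}
```

Adding another text extension then only requires extending the text_exts array.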

Security considerations

The contents of the request cannot be trusted and could be malicious. To protect the host against malicious requests, the following mechanisms have been put in place:

  1. The program chroot(2)s itself into BASEDIR to prevent a malicious path from accessing files outside it.
  2. This program is designed to be run as root, and once it is done chrooting itself, it changes its effective user to TGTUSER (by default nobody).
  3. The selector is accumulated in a static buffer of a fixed size (by default 64 bytes). As soon as input goes over that threshold, we go in "bad request" condition.

Running as a daemon

This program doesn't daemonize itself. However, if you want to run it as a service, you'll most likely want this. To do so, you can use daemonize.