INET daemon that serves Gopher requests
This project is a Dusk package designed to serve Gopher
requests using POSIX's accept(2) and listen(2). It handles I/O through INET
sockets in a non-blocking way, allowing it to serve many requests concurrently.
The package isn't designed to be very configurable; rather, its source code is meant to be simple enough that it can easily be adapted to your needs.
This code is available from Dusk OS' files folder or as a Git repository (without SSL) at:
git://git.duskos.org/dusk-gopherd.git
To build this project, you need:
Running make will download Dusk OS and compile the package using its
Usermode. The end result is a standalone dkgopherd executable that takes no
arguments. It launches the server, which binds a socket to BINDADDR:BINDPORT
(constants in the source code) and listens for Gopher requests on that socket
forever.
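For readers adapting the server, here is a minimal sketch of how a non-blocking listening socket bound to an address and port can be set up with POSIX calls. It is not taken from dkgopherd.c or dkgopherd.fs; the function name open_listener and the listen backlog of 16 are illustrative assumptions.

```c
#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket bound to addr:port, listening,
 * and switched to non-blocking mode. Returns the fd, or -1 on error. */
int open_listener(const char *addr, unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    int one = 1;   /* allow quick restarts on the same port */
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));

    struct sockaddr_in sa;
    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);

    if (inet_pton(AF_INET, addr, &sa.sin_addr) != 1 ||
        bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
        listen(fd, 16) < 0 ||   /* backlog of 16 is an arbitrary choice */
        fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

It would be called as open_listener(BINDADDR, BINDPORT). Because the socket is non-blocking, accept(2) on it returns -1 (errno EAGAIN or EWOULDBLOCK) when no client is waiting, so the main loop can move on to the active connections instead of stalling.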
Before you build, you might want to change a few constants such as BINDADDR,
HOSTNAME or BASEDIR. These constants are at the top of either dkgopherd.fs
or dkgopherd.c.
This server has a fixed number of active connections with static memory
associated with each of those connections. The number of such connections is
controlled by CONNCNT (16 by default).
Before accepting connections, we chroot ourselves to BASEDIR (default
/var/gopher, see dkgopherd.c) and change our user to TGTUSER (default
nobody).
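The confinement step can be sketched as follows. This is an assumption-laden illustration, not the project's code: the helper name confine is hypothetical, and the setgroups/setgid calls before setuid are a common hardening order that the text above does not mention (it only says we change our user).

```c
#define _DEFAULT_SOURCE   /* exposes chroot() and setgroups() on glibc */
#include <grp.h>
#include <pwd.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: chroot into basedir, then drop to user.
 * Returns 0 on success, -1 on any failure (the caller should exit). */
int confine(const char *basedir, const char *user)
{
    /* Resolve the user first: /etc/passwd is unreachable after chroot. */
    struct passwd *pw = getpwnam(user);
    if (pw == NULL) return -1;
    uid_t uid = pw->pw_uid;
    gid_t gid = pw->pw_gid;

    /* chdir("/") so the working directory is inside the new root. */
    if (chroot(basedir) < 0 || chdir("/") < 0) return -1;

    /* Drop groups before the uid: after setuid() we lack the privilege. */
    if (setgroups(0, NULL) < 0 || setgid(gid) < 0 || setuid(uid) < 0)
        return -1;
    return 0;
}
```

In this sketch the call would be confine(BASEDIR, TGTUSER), made after the listening socket is bound (binding port 70 needs root) and before the first accept.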
Then, we start the main loop, which does two things: accept new connections and "talk" with the active connections.
When we accept a new connection, we try to find an unused connection in the pool. If all connections are used, we drop one. Therefore, we always handle incoming connections.
TODO: For now, we always drop the first one of the array, but eventually, a better selection system based on time and remote address will be devised.
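The slot selection policy described above (first free slot, otherwise evict slot 0) fits in a few lines. This sketch models only a socket field per slot and uses hypothetical names (struct slot, pick_slot); the real Conn structure carries more state.

```c
#define CONNCNT 16   /* default pool size given in the text */

/* Minimal stand-in for a pool slot: only what selection needs. */
struct slot {
    int sockfd;      /* -1 when the slot is free */
};

/* Return the index of a free slot. If the pool is full, return 0 and set
 * *evict to 1: the caller must close the connection held there first. */
int pick_slot(const struct slot pool[CONNCNT], int *evict)
{
    for (int i = 0; i < CONNCNT; i++) {
        if (pool[i].sockfd < 0) {
            *evict = 0;
            return i;
        }
    }
    *evict = 1;   /* pool full: drop the first connection, per the TODO */
    return 0;
}
```

Replacing the final `return 0` with a smarter choice (oldest connection, or one per remote address) is where the better selection system from the TODO would plug in.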
For this server to be non-blocking, we need to divide our request-handling tasks into chunks that stop at the moment we would have to wait on the remote connection. This means that we need to keep track of the state each connection is in, along with an associated data buffer.
The data buffer, Conn buf, is used for both reading and writing and is BUFSZ
bytes in size (16KB by default). We use the same buffer for reading and writing
because once we've handled the incoming request, we don't need it anymore, so it
doesn't matter that we overwrite it as we spit our contents.
Connections can be in two states: REQRD (read request) or RESPWR (write
response).
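A per-connection record in C might look like the sketch below. The layout is hypothetical; in particular, it assumes that a and u follow the usual Forth address/count convention, with a pointing at the first unsent byte and u holding how many bytes remain.

```c
#include <stddef.h>

#define BUFSZ (16 * 1024)   /* default buffer size given in the text */

enum connstate { REQRD, RESPWR };

/* Hypothetical layout of one pool slot; the real Conn may differ. */
struct conn {
    int sockfd;            /* -1 when the slot is unused */
    enum connstate state;  /* REQRD: reading selector; RESPWR: writing reply */
    char *a;               /* assumed: start of the data still to send */
    size_t u;              /* assumed: number of bytes still to send */
    int filefd;            /* open file feeding buf, or -1 */
    char buf[BUFSZ];       /* shared by request reading and response writing */
};

/* Prepare a slot for a freshly accepted connection. */
void conn_reset(struct conn *c, int sockfd)
{
    c->sockfd = sockfd;
    c->state = REQRD;
    c->a = c->buf;
    c->u = 0;
    c->filefd = -1;
}
```

Keeping the buffer inline in the struct means the whole pool is one static array of CONNCNT entries, which matches the "static memory per connection" design.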
When the connection is initiated, it's placed in REQRD mode, that is, we're
waiting for a Gopher selector to come in. We do so by reading into the
connection's buffer up to MAXREQUESTSZ bytes, or until a whitespace character
(any character smaller than $21) ends up in the buffer.
If we reach MAXREQUESTSZ, the request is invalid and we close the connection.
Otherwise, the contents up to the whitespace are the selector. If the selector is
empty, we set it to index (the index of the whole site).
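The REQRD parsing rules can be expressed as a pure function over the bytes received so far, which the main loop would call after each non-blocking read. The function and enum names are hypothetical, and MAXREQUESTSZ is given a placeholder value of 256; check the source for the real one.

```c
#include <stddef.h>
#include <string.h>

#define MAXREQUESTSZ 256   /* placeholder value, not the project's */

enum reqstatus { REQ_INCOMPLETE, REQ_DONE, REQ_INVALID };

/* Scan buf[0..len) for the first byte below 0x21 (space, tab, CR, LF...).
 * On REQ_DONE, copy the selector into sel as a C string, using "index"
 * when it is empty. sel must hold at least MAXREQUESTSZ bytes. */
enum reqstatus parse_selector(const char *buf, size_t len, char *sel)
{
    for (size_t i = 0; i < len && i < MAXREQUESTSZ; i++) {
        if ((unsigned char)buf[i] < 0x21) {
            if (i == 0) {
                strcpy(sel, "index");   /* empty selector: site index */
            } else {
                memcpy(sel, buf, i);
                sel[i] = '\0';
            }
            return REQ_DONE;
        }
    }
    /* No whitespace yet: give up once MAXREQUESTSZ bytes have arrived. */
    return len >= MAXREQUESTSZ ? REQ_INVALID : REQ_INCOMPLETE;
}
```

REQ_INCOMPLETE simply means "read more next time around the loop"; REQ_INVALID maps to closing the connection.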
Once we have our selector, the connection is about to switch to RESPWR mode,
that is, the mode in which it writes contents to the connection. In the
description below, when we say "spit", we mean placing the contents in the
buffer so that they can be picked up by the non-blocking loop and then written
to the connection.
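"Spitting" can be illustrated with a small helper that appends bytes to the pending output range. This is a sketch under the same assumptions as before: the names are hypothetical, and a/u are taken to be the start and length of the unsent data.

```c
#include <stddef.h>
#include <string.h>

#define BUFSZ (16 * 1024)

/* Only the fields this helper needs (a/u meaning assumed). */
struct conn {
    char *a;          /* start of pending output */
    size_t u;         /* pending output length */
    char buf[BUFSZ];
};

/* Append up to len bytes to the pending output. Returns how many bytes
 * were accepted; fewer than len means the buffer is full. */
size_t spit(struct conn *c, const char *data, size_t len)
{
    /* Slide pending bytes to the front of buf to free space at the end. */
    if (c->a != c->buf && c->u > 0) memmove(c->buf, c->a, c->u);
    c->a = c->buf;

    size_t room = BUFSZ - c->u;
    if (len > room) len = room;
    memcpy(c->buf + c->u, data, len);
    c->u += len;
    return len;
}
```

Nothing is written to the socket here; the main loop later drains [a, a+u) whenever the socket is ready.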
So, we look up that selector in the chrooted filesystem. If not found, we spit "not found" then close.
Otherwise, we check if the path is a directory. If it is, we automatically generate a directory listing and spit it.
If it's a file, we don't spit it right away; we simply save the opened file descriptor. The idea is that the requested file might be bigger than the connection buffer and would then have to be spit in multiple chunks.
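The three-way lookup (not found, directory, file) maps naturally onto stat(2). The sketch below is illustrative only: lookup_selector and its enum are hypothetical names, and it also treats non-regular files (devices, sockets) as not found, which is an extra assumption.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

enum lookup { LK_NOTFOUND, LK_DIR, LK_FILE };

/* Classify a selector path inside the chroot. For a regular file, open it
 * and store the descriptor in *filefd so the main loop can stream it in
 * buffer-sized chunks later. */
enum lookup lookup_selector(const char *path, int *filefd)
{
    struct stat st;
    *filefd = -1;
    if (stat(path, &st) < 0) return LK_NOTFOUND;
    if (S_ISDIR(st.st_mode)) return LK_DIR;        /* caller lists it */
    if (!S_ISREG(st.st_mode)) return LK_NOTFOUND;  /* assumption */
    int fd = open(path, O_RDONLY);
    if (fd < 0) return LK_NOTFOUND;
    *filefd = fd;
    return LK_FILE;
}
```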
That completes request handling.
Spitting contents to the connection buffer is one thing, but those contents haven't reached the connection yet. That is done in the main loop.
When "talking" to an active connection that is in RESPWR mode, we first check
if there's anything to send in the memory buffer. We know that by looking at the
connection's a and u fields, which represent the range of contents that is
still left to send.
We try to send as much of that range as possible down the internet tubes, then check how many bytes we actually managed to send and shrink the range by that many bytes.
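One non-blocking write step can be sketched like this, again assuming a is the start of the unsent data and u its length. flush_pending is a hypothetical name, and the sketch assumes SIGPIPE is ignored (for example with signal(SIGPIPE, SIG_IGN)) so that a peer hanging up yields an error instead of killing the process.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write as much of [*a, *a + *u) as the socket accepts right now, then
 * advance *a and shrink *u by the number of bytes written.
 * Returns 0 on progress or "would block", -1 on a real error. */
int flush_pending(int sockfd, char **a, size_t *u)
{
    if (*u == 0) return 0;
    ssize_t n = write(sockfd, *a, *u);
    if (n < 0) {
        /* Socket buffer full (or interrupted): retry on the next loop pass. */
        if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
            return 0;
        return -1;   /* e.g. EPIPE: the caller closes the connection */
    }
    *a += n;
    *u -= (size_t)n;
    return 0;
}
```

A partial write is normal on a non-blocking socket; the remaining bytes stay in the range for the next pass.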
If the range is empty, we check whether we have an associated file descriptor
(Conn filefd) to feed our output buffer. If we do, we fill the buffer again with
contents read from Conn filefd. When we have no more contents to send, we
close the connection.
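Refilling the output buffer from the saved file descriptor could look like the following sketch (hypothetical name refill, same a/u assumption). Reads from regular files don't return EAGAIN, so a plain read(2) is enough here.

```c
#include <stddef.h>
#include <unistd.h>

#define BUFSZ (16 * 1024)

/* Called when the pending range is empty. Refill buf from *filefd.
 * Returns 1 if new data is pending, 0 when the file is exhausted (the
 * caller then closes the connection), -1 on a read error. */
int refill(int *filefd, char *buf, char **a, size_t *u)
{
    if (*filefd < 0) return 0;   /* nothing feeds this connection */
    ssize_t n = read(*filefd, buf, BUFSZ);
    if (n <= 0) {                /* EOF or error: release the file */
        close(*filefd);
        *filefd = -1;
        return n < 0 ? -1 : 0;
    }
    *a = buf;
    *u = (size_t)n;
    return 1;
}
```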
If the selector corresponds to a directory, we auto-generate an index by
iterating over its contents, ignoring all entries whose name begins with ".".
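A directory index generator along these lines, covering both the dotfile filter and the text/bin distinction described next, could be sketched as follows. It assumes the standard Gopher menu line format from RFC 1436 (type, display string, selector, host, port, separated by tabs and ended by CRLF) with type '0' for text and '9' for binary; the function names are hypothetical, and sub-directory entries (Gopher type '1') would need an extra stat call that is left out here.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Extensions treated as text, per the defaults described in the text. */
static const char *textexts[] = { ".txt", ".md" };

/* Return 1 if name ends with one of the text extensions. */
int is_text(const char *name)
{
    size_t nlen = strlen(name);
    for (size_t i = 0; i < sizeof(textexts) / sizeof(textexts[0]); i++) {
        size_t elen = strlen(textexts[i]);
        if (nlen >= elen && strcmp(name + nlen - elen, textexts[i]) == 0)
            return 1;
    }
    return 0;
}

/* Write Gopher menu lines for dir (selector prefix sel) into out, which
 * must have outsz > 0. Entries starting with '.' are skipped. Stops when
 * out is full. Returns the number of bytes written, or -1. */
int list_dir(const char *dir, const char *sel, const char *host, int port,
             char *out, size_t outsz)
{
    DIR *d = opendir(dir);
    if (d == NULL) return -1;
    size_t len = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (e->d_name[0] == '.') continue;   /* ".", ".." and dotfiles */
        int n = snprintf(out + len, outsz - len, "%c%s\t%s/%s\t%s\t%d\r\n",
                         is_text(e->d_name) ? '0' : '9', e->d_name,
                         sel, e->d_name, host, port);
        if (n < 0 || (size_t)n >= outsz - len) break;   /* out of room */
        len += (size_t)n;
    }
    out[len] = '\0';   /* drop any partially written line */
    closedir(d);
    return (int)len;
}
```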
In this listing, we differentiate between the "text" type and the "bin" type
through a predetermined list of file extensions for text files, by default
".txt" and ".md".
The contents of the request cannot be trusted and could be malicious. To protect the host against malicious requests, the following mechanisms have been put in place. First, the server chroot(2)s itself in BASEDIR to prevent a malicious path from accessing files outside it. Second, it changes its user to TGTUSER (by default nobody).

This program doesn't daemonize itself. However, if you want to run it as a service, you'll most likely want this. To do so, you can use daemonize.