March 21st 2024

Cross-referencing code & specs for maintainability

You’re implementing a network protocol. You read the specs, make a mental model and a design to fulfill the requirements, do some prototyping, check against existing software implementing the protocol, then continue with the details until the implementation is finished. You write tests. Perhaps even integration tests, talking to other existing software. Now you’re done, and you move on to other interesting software projects.

Haha, if only.

No… you’ll be maintaining your implementation for a long time. Sooner or later you’ll run into software that doesn’t interoperate. It could be your fault, or perhaps you’ll have to add a workaround for other software. Not convinced yet? New extensions to the protocols will be published that you’ll want to implement. Security issues are found in the protocol that need mitigation. Changes to your implementation will be needed.

When you make those changes, do you still have that mental picture of the requirements and design of the protocol and specs in mind? How do you know the implementation and its design still adhere to all old and new requirements after the changes? Can other people (including your future self) safely make changes to the code? Comments will help explain the big picture and design choices. Tests will help to lock existing behaviour down.

Cross-referencing the code with the specifications will also be very helpful: with getting up to speed with the code quickly, with understanding why the code is the way it is (by pointing to the relevant requirement in the spec), and with verifying that an implementation has taken all requirements from the spec into account.

I’m writing mox, a modern mail server. It implements protocols like SMTP, IMAP4 (plus extensions), SASL, SPF, DKIM, DMARC, DANE, MTA-STS, and more. These protocols are specified in RFCs. Mox is cross-referenced with these RFCs through comments in the source code.

Example

Here’s an example from the IMAP server code in mox, for the ID command, from imapserver/server.go.

// Clients can use ID to tell the server which software they are using. Servers can
// respond with their version. For statistics/logging/debugging purposes.
//
// State: any
func (c *conn) cmdID(tag, cmd string, p *parser) {
        // Command: ../rfc/2971:129

        // Request syntax: ../rfc/2971:241
        p.xspace()
        var params map[string]string
        if p.take("(") {

This snippet has two references to RFC 2971, "../rfc/2971:129" and "../rfc/2971:241". The references point to a file in the file system, "../rfc/2971", with a line number. Contents of that file starting with line 129:

3.1. ID Command                                                                   ../imapserver/server.go:1406

   Arguments:  client parameter list or NIL

   Responses:  OPTIONAL untagged response: ID

It has a reference back to the code, on the right. The reference in the RFC is added automatically by the small program linked below.

Referencing RFCs by line number works because RFCs don’t change after they are published. Lines are never longer than 80 characters. Adding references to each line is easy: Rewrite the RFC line by line, removing anything after position 80 (previously generated references), then adding current references.
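
The line rewriting described above can be sketched in a few lines of Go. This is a minimal illustration, not the code from mox: the function name annotate, the two-space separator and the trimming of trailing spaces are assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// width is the column after which generated references are placed. The RFC
// text itself is assumed to never extend beyond this column.
const width = 80

// annotate (hypothetical name) removes anything after column 80, such as
// references from a previous run, then appends the current references, if any.
func annotate(line string, refs []string) string {
	r := []rune(line)
	if len(r) > width {
		r = r[:width]
	}
	// Assumption for this sketch: trailing spaces are not significant.
	text := strings.TrimRight(string(r), " ")
	if len(refs) == 0 {
		return text
	}
	pad := strings.Repeat(" ", width-len([]rune(text)))
	return text + pad + "  " + strings.Join(refs, " ")
}

func main() {
	line := "3.1. ID Command"
	once := annotate(line, []string{"../imapserver/server.go:1406"})
	// Running again with a changed reference replaces the old one.
	twice := annotate(once, []string{"../imapserver/server.go:1410"})
	fmt.Printf("%q\n%q\n", once, twice)
}
```

Because old references are always cut off before new ones are added, running the tool repeatedly gives the same result and never accumulates stale references.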

The rfc directory in the mox repo has an index.txt with relevant RFCs, and a Makefile that runs link.go (200 lines). It reads all source code files, finds the references, and rewrites all RFC files.
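
Finding the references in source code could look roughly like the sketch below. It is not the actual link.go: the ref type, the findRefs function and the regular expression are assumptions for illustration, and the pattern only recognizes the plain "../rfc/NUMBER:LINE" form shown in the examples.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// ref (hypothetical type) ties a line in an RFC to a line in a source file.
type ref struct {
	RFC     string // RFC file name, e.g. "2971".
	RFCLine int    // Line number in the RFC file.
	Src     string // Source file path, e.g. "imapserver/server.go".
	SrcLine int    // Line number in the source file.
}

// refRE matches references of the form "../rfc/2971:129".
var refRE = regexp.MustCompile(`\.\./rfc/([0-9]+):([0-9]+)`)

// findRefs scans source text line by line and collects all RFC references.
func findRefs(srcPath, src string) []ref {
	var refs []ref
	sc := bufio.NewScanner(strings.NewReader(src))
	for n := 1; sc.Scan(); n++ {
		for _, m := range refRE.FindAllStringSubmatch(sc.Text(), -1) {
			rfcLine, err := strconv.Atoi(m[2])
			if err != nil {
				continue // Line number too large to be real, skip it.
			}
			refs = append(refs, ref{m[1], rfcLine, srcPath, n})
		}
	}
	return refs
}

func main() {
	src := "func (c *conn) cmdID(tag, cmd string, p *parser) {\n" +
		"\t// Command: ../rfc/2971:129\n" +
		"\n" +
		"\t// Request syntax: ../rfc/2971:241\n"
	for _, r := range findRefs("imapserver/server.go", src) {
		fmt.Printf("rfc/%s:%d <- %s:%d\n", r.RFC, r.RFCLine, r.Src, r.SrcLine)
	}
}
```

A real tool would read the files from disk and group the results per RFC file before rewriting it.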

Linking

Comment lines mentioning multiple RFCs/line numbers cause them to be linked together. The back reference in the RFC points to the other reference too. Another example, again from imapserver/server.go.

// Append adds a message to a mailbox.
//
// State: Authenticated and selected.
func (c *conn) cmdAppend(tag, cmd string, p *parser) {
        // Command: ../rfc/9051:3406 ../rfc/6855:204 ../rfc/3501:2527
        // Examples: ../rfc/9051:3482 ../rfc/3501:2589

        // Request syntax: ../rfc/9051:6325 ../rfc/6855:219 ../rfc/3501:4547
        p.xspace()
        name := p.xmailbox()
        p.xspace()
        var storeFlags store.Flags
        var keywords []string
        if p.hasPrefix("(") {

For context: RFC 9051 is about IMAP4 revision 2, RFC 3501 about IMAP4 revision 1, and RFC 6855 about IMAP support for UTF-8. The line "// Command: ../rfc/9051:3406 ../rfc/6855:204 ../rfc/3501:2527" points to the description of the command in these RFCs. If I’m reading the code and need to refresh my memory, I’ll open those links with a single right-click in my editor of choice (acme), and read the descriptions. RFC 9051 on line 3406 looks like this:

6.3.12.  APPEND Command                                           6855:204 3501:2527 ../imapserver/server.go:2703

It points back to the code, but also to the other RFCs (just files in the same “rfc” directory, with line numbers). When reading the latest RFC, it’s easy to open the description of the APPEND command in previous revisions of the protocol. This is also a useful way to link between RFCs that update each other, such as extensions, and to RFC errata. These RFC cross-references may also be useful if you’re implementing your own email-related code, ignoring references to mox code.

Referencing the same RFC line from multiple places in the code also links together the code. I can’t do that by adding line-number-based references between code directly, because those line numbers change.
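
The linking of multiple references on one comment line can be sketched as follows. This is an illustration under assumptions, not the mox implementation: the rfcRef type and the backRefs function are hypothetical names, and the output format simply mirrors the APPEND example above.

```go
package main

import (
	"fmt"
	"strings"
)

// rfcRef (hypothetical type) identifies a line in an RFC file.
type rfcRef struct {
	RFC  string
	Line int
}

// backRefs takes the references found on a single source code line and
// returns, for each referenced RFC line, the text to add to that RFC line:
// the other RFC references first, then the location in the code.
func backRefs(refs []rfcRef, srcPath string, srcLine int) map[rfcRef]string {
	out := map[rfcRef]string{}
	for i, r := range refs {
		var parts []string
		for j, o := range refs {
			if i != j {
				parts = append(parts, fmt.Sprintf("%s:%d", o.RFC, o.Line))
			}
		}
		parts = append(parts, fmt.Sprintf("../%s:%d", srcPath, srcLine))
		out[r] = strings.Join(parts, " ")
	}
	return out
}

func main() {
	// The references from the "// Command:" comment of cmdAppend.
	refs := []rfcRef{{"9051", 3406}, {"6855", 204}, {"3501", 2527}}
	m := backRefs(refs, "imapserver/server.go", 2703)
	fmt.Println(m[rfcRef{"9051", 3406}])
	// Prints: 6855:204 3501:2527 ../imapserver/server.go:2703
}
```

Each RFC in the group ends up pointing to all the others, so you can start reading from any of them.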

Annotations

Annotations like “todo:” in references in the code make it into the RFC. When you’re reading the RFC and you see such an annotated reference, you’ll know some behaviour may not be up to spec, or not yet implemented. For example, in the implementation of the IMAP “CREATE” command:

	// todo: support CREATE-SPECIAL-USE ../rfc/6154:296

Resulting in this corresponding line in the RFC:

   An IMAP server that supports this OPTIONAL feature will advertise the         todo: ../imapserver/server.go:2235
   "CREATE-SPECIAL-USE" capability string.

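Carrying an annotation over to the RFC could be done along these lines. This is only a sketch: how mox actually recognizes annotations is not described here, so the marginNote function and its rule (a comment that starts with "todo:") are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// marginNote (hypothetical name) returns the text to place next to the RFC
// line for a reference found in the given source comment. Assumption: a
// comment starting with "todo:" marks the reference as a todo.
func marginNote(comment, srcPath string, srcLine int) string {
	text := strings.TrimSpace(comment)
	text = strings.TrimSpace(strings.TrimPrefix(text, "//"))
	loc := fmt.Sprintf("../%s:%d", srcPath, srcLine)
	if strings.HasPrefix(strings.ToLower(text), "todo:") {
		return "todo: " + loc
	}
	return loc
}

func main() {
	comment := "\t// todo: support CREATE-SPECIAL-USE ../rfc/6154:296"
	fmt.Println(marginNote(comment, "imapserver/server.go", 2235))
	// Prints: todo: ../imapserver/server.go:2235
}
```
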
Try it online

HTML versions of the code and RFCs are on the mox website. Try this link to the code snippet of the IMAP4 “append” command from earlier, side-by-side with the first RFC reference:

https://www.xmox.nl/xr/v0.0.10/#imapserver/server.go:2691,rfc/9051:3406

The code is always on the left, the RFC on the right. There is also an admittedly dense index page listing code and RFC files. Heads up: browsing through code and RFCs for the first time can feel a bit disorienting. The page jumps around when you click a link, because line numbers are implemented as id anchors, but you’ll get the hang of it.

Maintainability & quality

So does this improve maintainability? I don’t have hard evidence, but it feels like it does ☺. Some code in mox is already getting close to two years old. When I need to make changes, it’s easier to jump back in and understand what’s going on.

I believe quality also goes up with this approach. After the bulk of the implementation work (including writing tests), I tend to do a last round of adding references from code to specs. Both by reading the code and adding references to the RFCs, and by reading the RFC again and checking that requirements have references to the code (disclaimer: mox does not have 100% requirement coverage!). This usually finds some bugs, e.g. some requirement that wasn’t implemented yet, or an implementation assumption that has no basis in the specification.

Adding references takes time. But time is saved again because bugs are found early.

Questions

I don’t remember seeing other projects that do this. This technique happens to work well with RFCs: text files that don’t change. Perhaps less so with other types of specifications that are updated more frequently and/or come in different formats.

Questions: Would this work for you? Do you know of other projects that do this, or that use a different approach to cross-referencing code and specs? Are the references in the code, or kept separately? If separate, how do you stay in sync? Are special tools needed? Perhaps IDE integration? How convenient is adding references and following them? How would you improve on the approach used in mox? Other potentially uncommon/unconventional experiences around improving maintainability/quality of code are welcome too.