mjl blog
April 27th 2014

Tokenizing lines with (quoted) strings in Go

While programming some new tools, I needed to parse configuration files. Normally, I would read a file, skip all empty lines and lines starting with a “#”, then tokenize each remaining line and treat the tokens as a command. I could even treat them as actual commands with flags and such.
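As a rough sketch of that first step, the filtering could look like the code below. The function names configLines and configLinesFromString are made up for illustration, and trimming surrounding whitespace before checking for “#” is my assumption, not something the description above requires.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// configLines returns the lines of r that are neither empty nor comments.
// A comment is a line starting with "#". Trimming surrounding whitespace
// before the check is an assumption of this sketch.
func configLines(r io.Reader) ([]string, error) {
	var lines []string
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		lines = append(lines, line)
	}
	return lines, scanner.Err()
}

// configLinesFromString is a convenience wrapper for in-memory input.
func configLinesFromString(text string) ([]string, error) {
	return configLines(strings.NewReader(text))
}

func main() {
	const sample = "# example config\n\nlisten localhost 8080\nlog /var/log/app.log\n"
	lines, err := configLinesFromString(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, line := range lines {
		fmt.Printf("%q\n", line)
	}
}
```

Each returned line is then ready to be tokenized.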

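So I went looking in the Go libraries for a function to do this. There is strings.Fields, but that isn’t enough, because it always splits on whitespace, and I want to be able to include whitespace in tokens. That’s also why the variant that takes a split function, strings.FieldsFunc, won’t work: it decides where to split one character at a time, so it cannot tell whether a space sits inside a quoted string.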
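To make the problem concrete, here is a small example of what strings.Fields does with a quoted string. The wrapper naiveTokens is only a name I picked for the illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// naiveTokens splits a line with strings.Fields, which knows nothing
// about quotes and breaks the line at every run of whitespace.
func naiveTokens(line string) []string {
	return strings.Fields(line)
}

func main() {
	// The quoted string is cut in two, and the quote characters
	// stay attached to the pieces.
	for _, tok := range naiveTokens(`title "hello world"`) {
		fmt.Printf("%q\n", tok)
	}
}
```

This yields three tokens instead of the two I want: title, "hello and world".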

I couldn’t find an existing scanner that was useful for implementing this, so I wrote a small new library: tokenize. It splits on whitespace too, but you can use double quotes to make strings that include whitespace. Two consecutive double quotes inside such a string are unescaped into a single double quote. This is somewhat similar to what Plan 9’s tokenize() function does.
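The rules are simple enough to sketch in a few dozen lines. The code below is not the library itself, just a minimal illustration of the same rules. The name tokenizeLine, the error on an unterminated quote, and the choice to let a quoted part join directly onto adjacent unquoted text are all assumptions of this sketch.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"unicode"
)

// tokenizeLine splits line on whitespace. Double quotes start and end a
// string that may contain whitespace; inside such a string, two consecutive
// double quotes stand for one literal double quote.
// Assumption: a quoted part directly next to unquoted text joins the same token.
func tokenizeLine(line string) ([]string, error) {
	var tokens []string
	var cur strings.Builder
	inToken := false  // a token has been started (it may still be empty, as in "")
	inQuotes := false // currently inside a double-quoted string
	runes := []rune(line)
	for i := 0; i < len(runes); i++ {
		r := runes[i]
		switch {
		case inQuotes:
			if r != '"' {
				cur.WriteRune(r)
			} else if i+1 < len(runes) && runes[i+1] == '"' {
				cur.WriteRune('"') // doubled quote: emit one, skip the other
				i++
			} else {
				inQuotes = false // closing quote
			}
		case r == '"':
			inQuotes = true
			inToken = true
		case unicode.IsSpace(r):
			if inToken {
				tokens = append(tokens, cur.String())
				cur.Reset()
				inToken = false
			}
		default:
			cur.WriteRune(r)
			inToken = true
		}
	}
	if inQuotes {
		return nil, errors.New("unterminated quoted string")
	}
	if inToken {
		tokens = append(tokens, cur.String())
	}
	return tokens, nil
}

func main() {
	tokens, err := tokenizeLine(`say "hello ""world""" now`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, tok := range tokens {
		fmt.Printf("%q\n", tok)
	}
}
```

For the line in main, this gives three tokens: say, hello "world" and now.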

For more complex configuration files, I would recommend using a JSON unmarshaller, or maybe an INI-file parser. They do more for you.
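For comparison, this is roughly what the JSON route looks like with the standard encoding/json package. The Config struct and its fields are invented for this example, as is the parseConfig helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config is a hypothetical configuration layout, made up for this example.
type Config struct {
	Listen  string `json:"listen"`
	LogFile string `json:"log_file"`
	Workers int    `json:"workers"`
}

// parseConfig decodes JSON data into a Config.
func parseConfig(data []byte) (Config, error) {
	var cfg Config
	err := json.Unmarshal(data, &cfg)
	return cfg, err
}

func main() {
	data := []byte(`{"listen": "localhost:8080", "log_file": "/var/log/app.log", "workers": 4}`)
	cfg, err := parseConfig(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

The unmarshaller handles types, nesting and quoting for you, which is the “more” you get over a line tokenizer.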
