Why Vim syntax highlighting breaks sometimes

Vim was my preferred text editor for nearly eighteen years, until I switched to aretext in 2021. I appreciated vim’s efficiency and ubiquity, the way I could rely on it regardless of what project I was working on or what machine I had ssh’d into. Like any software, however, vim reflects the time in which it was written. In many cases, vim optimizes for speed above all else, an approach that made sense given the limitations of late ’90s computers. Nowhere is this trade-off more apparent than in vim’s implementation of syntax highlighting.

Vim syntax highlighting first appeared in version 5, which was released in 1998. Syntax highlighting quirks have confused vim users ever since. A quick Internet search yields many bug reports, such as this rather plaintive Reddit post from 2015 (“[Vim] breaks the syntax highlighting. All the time. It’s unbearable…”). The Vim Tips Wiki has a full page titled simply “Fix syntax highlighting”.

With its typical candor, the vim user guide explains how syntax highlighting can go awry:

“Vim doesn’t read the whole file to parse the text. It starts parsing wherever you are viewing the file. That saves a lot of time, but sometimes the colors are wrong. A simple fix is hitting CTRL-L. Or scroll back a bit and then forward again.”

It is easy to confuse vim’s syntax highlighter, as shown in the screencast below:

When the user jumps to the last line, vim starts parsing at a point after the opening /* token. Having never seen the start of the comment, it does not know it is inside one, so it fails to recognize the end of the comment. In this case, the simple heuristic of parsing only around the current view produces an incorrect result.

Vim provides several knobs to control the “sync point” from which the syntax highlighter begins parsing. One can set the sync point to the beginning of the document using :syntax sync fromstart, or to at least a fixed number of lines before the first line being redrawn using :syntax sync minlines={N}. Parsing from the start, or from far enough back to be reliable, is often prohibitively slow for large documents, especially since the parser needs to rerun after every edit. Alternatively, :syntax sync ccomment uses a heuristic specifically for C-style comments (given a group name such as javaComment, it covers other languages with the same comment syntax), but this doesn’t solve the general case.
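As a rough sketch, the sync settings above might be applied from a vimrc like this. The augroup name is hypothetical and the value 200 is an arbitrary illustration, not a recommendation. A filetype's own syntax file may also set its own sync rules, so results can vary by language.

```vim
" Hypothetical group name; the group keeps the autocommand from
" being registered twice if the vimrc is sourced again.
augroup SyntaxSyncSketch
  autocmd!

  " Option 1: always parse from the first line of the buffer.
  " Accurate, but can be slow on large files.
  autocmd BufEnter * syntax sync fromstart

  " Option 2 (use instead of option 1): look back at least 200 lines.
  " 200 is an arbitrary example value.
  " autocmd BufEnter * syntax sync minlines=200
augroup END
```

Running :syntax sync with no arguments should list the sync settings currently in effect for the buffer, which helps when checking whether a setting actually took hold.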

It is also possible to set the sync point based on a regular expression. We can see an example in the default Python syntax highlighting rules:

" Sync at the beginning of class, function, or method definition.
syn sync match pythonSync grouphere NONE "^\%(def\|class\)\s\+\h\w*\s*[(:]"

Such rules are easy to get wrong, either by missing edge cases or slowing down the editor. This isn’t theoretical: I’ve seen incorrect highlights from several of vim’s built-in syntax languages, including mainstream ones like JSON and Python.

Another anomaly can occur when highlighting a large document. After a timeout controlled by the redrawtime setting, vim will stop highlighting. The result is disorienting:

[Figure: Vim stops highlighting a large JSON document after a timeout]

Users have reported a few workarounds: increasing redrawtime, configuring a different regular expression engine using set re=1… or maybe it’s set re=0 according to someone else? Both of vim’s regular expression engines (backtracking and NFA-based) have worst-case exponential time complexity, so your mileage may vary.
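For reference, the workarounds mentioned above correspond to the following settings. This is a sketch rather than a recommendation: 10000 is an arbitrary example value, and which regexpengine value helps, if any, depends on the syntax file and the document.

```vim
" Allow up to 10 seconds of redraw time before vim gives up on highlighting.
" The default is 2000 milliseconds; 10000 is an arbitrary example value.
set redrawtime=10000

" Choose the regular expression engine ('re' is the short option name):
"   0 = automatic selection (the default)
"   1 = backtracking engine
"   2 = NFA engine
set regexpengine=0
```

According to vim's documentation for redrawtime, once the limit is exceeded, syntax highlighting stays disabled until CTRL-L is used, so changing these settings may not visibly help until the screen is redrawn.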

So where do we go from here? The neovim developers are attempting to replace vim’s syntax highlighter with tree-sitter, a widely used incremental parsing library. Tree-sitter maintains a full parse tree for the document, using some clever algorithms from Tim Wagner’s PhD dissertation[1] to update the tree as the document changes. When experimental support for tree-sitter was announced in neovim 0.5 last summer, the small part of the Internet that cares about this stuff nearly lost its mind. I haven’t tried it myself, though.

Aretext, the minimalist vim clone I have been working on, has its own approach for fast and accurate incremental syntax highlighting. More on that in a future post.