In my career I’ve used many tools which keep local file caches of one kind or
another: pip, Maven, npm, clj, go, staticcheck, and dozens of others. I’ve
also written a few myself.
Besides running them on my local computers, I’ve also maintained automated test
suites and CI systems that use these tools.
These experiences have led me to a set of ideas about what makes a tool cache
good or bad.
Tool caches must be correct
The most important job of a tool cache is to be correct. Correctness means that
the user cannot tell that the tool is making use of a cache—other than that
the tool is faster.
Here are some specific behaviors of correct tool caches:
- The tool invisibly recovers from being interrupted or killed.
Typically, this means that it writes files atomically or else handles
partially-written files.
- The tool invisibly recovers from having its cache directory deleted.
Excellent cache implementations also recover if you selectively delete or even
mutate cache files, but we can call this a stretch goal.
- The tool just works when you update to a new version. Usually this
means that the tool understands the version of the cached data and partially
or completely invalidates it when the tool changes.
- Concurrent executions of the tool just work. They should never cause
errors or break the cache.
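Several of these behaviors fall out of one pattern: write each entry to a temporary file and rename it into place, keep the format version in the path, and treat anything unreadable as a cache miss. Here is a minimal sketch of that pattern, not a reference implementation. The names `CACHE_FORMAT_VERSION`, `entry_path`, `cache_put`, and `cache_get` are made up for illustration, and it assumes entries are small JSON documents.

```python
import contextlib
import hashlib
import json
import os
import tempfile
from pathlib import Path

# Hypothetical version tag; bump it whenever the cached format changes.
CACHE_FORMAT_VERSION = "v2"


def entry_path(cache_root: Path, key: str) -> Path:
    # The version is part of the path, so a newer tool never reads entries
    # written by an older one.
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return cache_root / CACHE_FORMAT_VERSION / digest[:2] / digest


def cache_put(cache_root: Path, key: str, value: dict) -> None:
    path = entry_path(cache_root, key)
    # Recreate directories on every write so a deleted cache heals itself.
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file in the same directory, then rename it over
    # the final name. On POSIX file systems the rename is atomic, so readers
    # and concurrent writers see either the old entry or the new one.
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(value, f)
        os.replace(tmp, path)
    except BaseException:
        # Clean up the temporary file if the write or rename failed.
        with contextlib.suppress(OSError):
            os.unlink(tmp)
        raise


def cache_get(cache_root: Path, key: str):
    path = entry_path(cache_root, key)
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except (OSError, ValueError):
        # Missing, unreadable, and corrupt entries are all just a miss;
        # the caller recomputes the value and calls cache_put again.
        return None
```

A process killed mid-write can leave a stray `.tmp-` file behind, but never a half-written entry under its final name. A real tool would also want to sweep old temporary files and old version directories from time to time.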
Here are some smells that may indicate a poor tool cache:
- The tool fails with an error message that mentions cache files.
- The tool authors encourage the deletion of the cache as a troubleshooting step.
- The tool has provisions for initializing, manipulating, or interacting with
the cache.
Caching is a privilege, not a right
If a tool’s cache behavior is not correct, it’s usually best to disable the
cache. Better to be slow and correct than fast and broken.
Disabling the cache can mean running the tool in a no-cache mode or wiping the
cache before each invocation.
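When a tool has no no-cache mode, one way to get the "wipe before each invocation" effect is to point it at a throwaway directory on every run. The sketch below assumes the tool reads its cache location from an environment variable; `run_without_cache` and the variable name passed to it are hypothetical, so check what your tool actually honors.

```python
import os
import subprocess
import tempfile
from typing import Sequence


def run_without_cache(argv: Sequence[str], cache_env_var: str) -> subprocess.CompletedProcess:
    """Run argv with cache_env_var pointing at a fresh, empty directory."""
    with tempfile.TemporaryDirectory(prefix="nocache-") as scratch:
        env = dict(os.environ)
        env[cache_env_var] = scratch
        # The directory, and anything the tool wrote into it, is removed
        # when the with-block exits, so no state survives between runs.
        return subprocess.run(argv, env=env, capture_output=True, text=True)
```

Where a tool documents a real no-cache option, such as pip's `--no-cache-dir`, that is usually the simpler choice. Timing a few runs with and without the wrapper is also a cheap way to find out how much the cache was buying you.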
Occasionally when you do this you’ll discover that a tool cache is not just
buggy, but also unnecessary, because the cache hardly makes the tool faster at
all.
Cache location
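By default, tools should locate their caches according to the system’s
conventions. For example:
- Linux and other Unix-like systems: $XDG_CACHE_HOME, falling back to ~/.cache
- macOS: ~/Library/Caches
- Windows: %LocalAppData%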
Dumping cache files in /tmp or ~/.bespoke may have been fine in 1995, but
it’s not acceptable anymore.
It should be easy for the user to override a cache location, typically via an
environment variable. This should be clearly and prominently documented.
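As a rough sketch of that lookup order: an explicit override first, then the platform's usual cache directory. The function name `cache_dir` and the `<TOOL>_CACHE_DIR` variable are my own inventions for illustration. The platform defaults follow the XDG Base Directory convention on Unix-like systems, `~/Library/Caches` on macOS, and `%LocalAppData%` on Windows.

```python
import os
import sys
from pathlib import Path
from typing import Mapping, Optional


def cache_dir(tool: str,
              env: Optional[Mapping[str, str]] = None,
              platform: Optional[str] = None) -> Path:
    """Return the directory where `tool` should keep its cache."""
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform

    # 1. An explicit override always wins (hypothetical variable name).
    override = env.get(f"{tool.upper()}_CACHE_DIR")
    if override:
        return Path(override)

    # 2. Otherwise follow the platform convention.
    if platform == "win32":
        base = env.get("LOCALAPPDATA")
        root = Path(base) if base else Path.home() / "AppData" / "Local"
    elif platform == "darwin":
        root = Path.home() / "Library" / "Caches"
    else:
        xdg = env.get("XDG_CACHE_HOME")
        # The XDG spec says relative paths are invalid and must be ignored.
        if xdg and os.path.isabs(xdg):
            root = Path(xdg)
        else:
            root = Path.home() / ".cache"
    return root / tool
```

The `env` and `platform` parameters exist only to make the function easy to test; normal callers would write `cache_dir("mytool")`.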
Dependencies
If your tool has a code dependency that writes a file cache, you own that
cache.
If a dependency’s caching behavior is broken, your tool needs to shield the user
from it. For example, the tool might disable the cache, it might selectively
delete the cache when it’s in a bad state that confuses the dependency, or it
might add concurrency control if the dependency doesn’t properly handle
concurrent cache use.
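For the concurrency case, one option is to serialize every call into the dependency behind an advisory file lock. This is only a sketch under a few assumptions: it uses `fcntl.flock`, so it is POSIX-only, and `cache_lock` and `run_with_lock` are hypothetical helper names, not part of any library.

```python
import fcntl
import os
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def cache_lock(cache_root: Path):
    """Hold an exclusive advisory lock on cache_root for the with-block."""
    cache_root.mkdir(parents=True, exist_ok=True)
    fd = os.open(cache_root / ".lock", os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Blocks until every other holder has released the lock.
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield
    finally:
        # Closing the descriptor releases the lock. The kernel does the same
        # if the process dies, so a crash cannot leave a stale lock behind.
        os.close(fd)


def run_with_lock(cache_root: Path, dependency_call):
    """Call into the cache-writing dependency while holding the lock."""
    with cache_lock(cache_root):
        return dependency_call()
```

This trades parallelism for safety: two invocations of the tool take turns instead of corrupting each other's cache. It only helps if every process that touches the cache goes through the same lock.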
If a dependency places its cache in a weird location, your tool should override
it.
If your tool has multiple caches, they should be consolidated in a single
directory rather than scattered across the file system.
The best cache is no cache
Modern hardware can read files and process data very quickly. A tool can do a
huge amount of work in 5 seconds (or even 50 ms).
Good tools are fast. When a good tool uses a file cache for performance reasons,
it’s because the problem at hand is fundamentally expensive enough that a cache
makes a real difference after the work has been well-optimized.
Adding a cache to a slow, inefficient tool can result in a tool which is still
slow some of the time but also has new correctness issues. If a tool’s authors
haven’t delivered respectable performance, I’m skeptical that they can implement
a solid file cache.
Doing tool caches right
I’d love to point to an article which explains how best to implement file caches
for tools. Unfortunately, I don’t think it exists.
Each use case has its own particular needs; doing this task well requires
thinking through those needs and coming up with a design that plays nicely with
the underlying capabilities of the OSes you are targeting.
I can recommend studying the Go tool’s cache; it works extremely well.
Apenwarr’s blog post “mtime comparison considered harmful” is well worth a
read. It’s focused on make-like tools, but many of the pitfalls he points out
also await implementers of file caches.