Content-Addressable Cache

One of the most frustrating things about traditional build tools is the phantom re-run: you check out a branch, and everything rebuilds even though nothing actually changed. OxyMake eliminates this by using file content as the source of truth, not timestamps.

How It Works

Every time OxyMake runs a job, it computes a cache key from the inputs it tracks as affecting the job's output:

cache_key = blake3(
    format_version ||
    rule_source_hash ||
    sorted((input_path, input_content_hash) pairs) ||
    params_hash ||
    env_content_hash ||
    shell_executable ||
    platform
)

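Every field is length-framed and tagged with a domain-separation label, so no two distinct job specifications serialize to the same byte string; two different jobs could share a key only through a BLAKE3 collision, which is computationally infeasible. If the key matches a previously computed result, the job is skipped. Besides the format version, the key includes: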

  • Rule source hash -- if you change the shell command, inline code, or function reference, the cache is invalidated
  • Input content hashes -- blake3 of every input file's contents, bound to its path; parameter files and (in script mode) the script file itself count as inputs, so editing script.py invalidates the cache
  • Params hash -- any parameters passed via --set or [config]
  • Environment content hash -- the content of the referenced spec file (requirements.txt, conda YAML, nix expression), or the container image reference for Docker/Apptainer
  • Shell executable -- the same command under /bin/bash and /bin/zsh can behave differently
  • Platform -- OS and architecture (a Linux build is not reusable on macOS)

Two exclusions to know about: call-mode function bodies are tracked only if you declare the module as an input, and mutable container tags are hashed as written (pin images by digest -- python@sha256:... -- if you need re-pushed tags to invalidate the cache).

Why Not Timestamps?

Timestamps lie. Here are common situations where they cause phantom re-runs in tools like Make or Snakemake:

Scenario | What happens to mtime | Content changed?
git checkout | Reset to now | No
cp without -p | Reset to now | No
NFS clock skew | Arbitrary | No
CI fresh clone | All files are "new" | No
touch command | Updated | No
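The `touch` row can be reproduced with Python's standard library: the sketch backdates a file by an hour, then bumps its mtime to now without changing a byte. `blake2b` again stands in for BLAKE3.

```python
import hashlib
import os
import tempfile
import time

def content_hash(path):
    with open(path, "rb") as f:
        return hashlib.blake2b(f.read(), digest_size=32).hexdigest()

def touch_scenario():
    """Return (mtime, content hash) before and after a `touch`."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "input.csv")
        with open(path, "wb") as f:
            f.write(b"a,b\n1,2\n")
        # Backdate by an hour so the timestamp change is unmistakable.
        old = time.time() - 3600
        os.utime(path, (old, old))
        before = (os.stat(path).st_mtime, content_hash(path))
        os.utime(path)  # same effect as `touch input.csv`: times set to now
        after = (os.stat(path).st_mtime, content_hash(path))
    return before, after

before, after = touch_scenario()
print("mtime changed:  ", before[0] != after[0])  # True
print("content changed:", before[1] != after[1])  # False
```

A timestamp-based tool sees a newer input and rebuilds; a content-addressed cache sees the same hash and skips.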

Validation Strategies (ADR-006)

OxyMake's cache validation is pluggable, so you can choose the speed/correctness trade-off that fits your workflow:

Strategy | Flag | Behavior
mtime+hash (default) | --cache-validation=mtime+hash | If mtime or size differs from the recorded value, compute the BLAKE3 hash and compare it. Fast in the steady state, content-checked on change.
mtime (opt-in) | --cache-validation=mtime | Filesystem metadata only (stat calls). Fastest, but never verifies content, so it is unsuitable for shared or multi-user caches.
hash | --cache-validation=hash | Always compute the BLAKE3 hash. Bit-exact. Required for shared and remote caches.

ox run                                  # default: mtime+hash (fast + content-verifying)
ox run --cache-validation=mtime         # Make-parity opt-in (no content check)
ox run --cache-validation=hash          # strict mode (CI)
OX_CACHE_VALIDATION=hash ox run         # via environment variable

Configure per project in Oxymakefile.toml:

[config]
cache_validation = "mtime+hash"

Remote caches always use hash validation, regardless of the configured strategy, because mtime values are not comparable across machines.

The Cache on Disk

Cached outputs live in .oxymake/cache/, organized by hash prefix:

.oxymake/cache/
  a3/
    a3f7b2c1...   # cached output file
  b1/
    b1e9d4a8...   # another cached output

This directory is independent of the SQLite state database. You can share it across machines, back it up, or delete it without losing execution state (jobs will simply re-run and repopulate the cache).
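A sketch of how a digest might map onto this layout, assuming (as the listing suggests) that each file is named by its full hex digest and placed in a subdirectory named by its first two characters. The `put` helper and its write-then-rename step are illustrative assumptions, not documented OxyMake behavior.

```python
import os
import tempfile
from pathlib import Path

def cache_path(cache_root, key_hex):
    """Hypothetical mapping: <root>/<first two hex chars>/<full digest>."""
    if len(key_hex) < 3 or any(c not in "0123456789abcdef" for c in key_hex):
        raise ValueError("expected a lowercase hex digest")
    # Two hex characters give 256 shards, keeping each directory small.
    return Path(cache_root) / key_hex[:2] / key_hex

def put(cache_root, key_hex, data):
    dest = cache_path(cache_root, key_hex)
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same directory, then rename it over the
    # destination, so readers see either no entry or a complete one.
    fd, tmp = tempfile.mkstemp(dir=dest.parent)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    os.replace(tmp, dest)
    return dest
```

Because entries are self-named files with no index, such a directory can be copied, backed up, or deleted wholesale, which is consistent with it being independent of the state database.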

Sharing Across Machines

Because the cache key is deterministic -- same inputs, same rule, same environment, same platform produce the same key -- you can share cached outputs via S3, GCS, or any shared filesystem:

# Production: everything cached locally
ox run

# CI: pull from shared remote cache
ox run --cache-remote s3://my-bucket/oxymake-cache

For remote caches, OxyMake adds a trust_scope to prevent cache poisoning: cached outputs from untrusted branches cannot be used by production builds.
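One way to realize a trust scope is to file every remote entry under the scope of the build that wrote it and restrict which scopes each build may read. The sketch below is a hypothetical illustration of that idea, with an in-memory dictionary standing in for the bucket; the scope names and the rule that untrusted builds may reuse trusted entries are assumptions, not OxyMake's documented scheme.

```python
TRUSTED = "trusted"      # hypothetical scope, e.g. main-branch CI
UNTRUSTED = "untrusted"  # hypothetical scope, e.g. pull-request branches

class RemoteCache:
    """In-memory stand-in for an S3/GCS bucket, partitioned by scope."""

    def __init__(self):
        self._store = {}

    def put(self, scope, key, data):
        # Entries are always filed under the writer's own scope.
        self._store[(scope, key)] = data

    def get(self, scope, key):
        # Trusted builds read only trusted entries; untrusted builds may
        # also reuse trusted ones, which an untrusted writer cannot touch.
        readable = [TRUSTED] if scope == TRUSTED else [scope, TRUSTED]
        for s in readable:
            if (s, key) in self._store:
                return self._store[(s, key)]
        return None
```

A poisoned entry written from a branch build is then invisible to production lookups, even under a key that matches exactly.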

Cache and Materialization

The cache interacts with the materialization policy:

Policy | Written to disk? | Cached?
always (default) | Yes | Yes
auto | Only if needed | Yes, when materialized
never | No (memory only) | No
final | Only if DAG leaf | Yes, when materialized

Outputs with materialize = "never" are kept in memory and never enter the cache. This is a deliberate trade-off: skipping the disk write is faster, but nothing is persisted, so the next ox run will recompute them.
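The materialization table reduces to a small decision function. `needed_on_disk` is a hypothetical flag standing in for whatever condition makes `auto` write a file; the table does not spell that condition out.

```python
def materialization(policy, is_leaf, needed_on_disk):
    """Return (written_to_disk, cached) for one output, per the table."""
    if policy == "always":
        written = True
    elif policy == "auto":
        written = needed_on_disk  # hypothetical stand-in for "if needed"
    elif policy == "never":
        written = False           # kept in memory only
    elif policy == "final":
        written = is_leaf         # only DAG leaves reach disk
    else:
        raise ValueError(f"unknown materialize policy: {policy!r}")
    # Across all rows, an output is cached exactly when it is written.
    return written, written
```

The single rule "cached if and only if materialized" is what makes `never` outputs recompute on the next run.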

Managing the Cache

# See cache size
ox gc --dry-run

# Limit cache to 10 GB (removes oldest entries)
ox gc --max-cache-size 10G

# Remove all cached outputs
ox clean --cache
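A size-bounded cleanup like ox gc --max-cache-size can be sketched as: sort entries oldest first and evict until the total fits. What "oldest" means (write time or last access) and whether 10G uses binary or decimal units are not specified above, so both are assumptions here; `Entry`, `parse_size`, and `plan_gc` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    key: str
    size: int       # bytes
    created: float  # hypothetical age signal, epoch seconds

def parse_size(text):
    """Parse sizes like '10G' or '512M' (binary units assumed)."""
    units = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}
    text = text.strip().upper()
    if text and text[-1] in units:
        return int(float(text[:-1]) * units[text[-1]])
    return int(text)

def plan_gc(entries, max_size):
    """Return keys to delete, oldest first, until total size <= max_size."""
    total = sum(e.size for e in entries)
    victims = []
    for e in sorted(entries, key=lambda e: e.created):
        if total <= max_size:
            break
        victims.append(e.key)
        total -= e.size
    return victims
```

Returning a plan instead of deleting files mirrors what a dry run would report; a real run would then remove the listed entries.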

Why This Matters

The content-addressable cache means you can:

  1. Switch branches freely without phantom re-runs
  2. Add new rules without invalidating existing cached results
  3. Share computation across machines and CI
  4. Resume interrupted runs -- completed work is preserved
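  5. Trust the result -- if OxyMake says "cached," the output came from the same rule, inputs, parameters, environment, and platform as the current job, so for a deterministic rule it is bit-for-bit identical to what a fresh run would produce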