Oxymakefile Format

OxyMake workflows are defined in Oxymakefile.toml, a declarative TOML file. This page is the complete format reference.

Top-Level Fields

ox_version = "0.1"           # Required. OxyMake format version.

Config Section

The [config] section defines workflow-level variables used for wildcard expansion:

[config]
samples = ["A", "B", "C"]
chromosomes = ["chr1", "chr2", "chr3"]
models = ["linear", "ridge", "lasso"]

Config values are arrays of strings. They drive wildcard expansion in rules.

Rule Definitions

Each rule is a [rule.<name>] table:

[rule.process]
input = ["data/{sample}.csv"]
output = ["results/{sample}.txt"]
shell = "python process.py {input} {output}"

Rule Fields

| Field | Type | Required | Description |
|---|---|---|---|
| input | Array of strings | No | Input file patterns with {wildcards} |
| output | Array of strings | Yes | Output file patterns with {wildcards} |
| shell | String | One of shell/run/script/call | Opaque shell command |
| run | String | One of shell/run/script/call | Inline script (with lang) |
| script | String | One of shell/run/script/call | Path to script file |
| call | String | One of shell/run/script/call | Python function reference |
| lang | String | With run/script | Language: python, r, julia |
| tags | Array of strings | No | Tags for filtering and grouping |
| resources | Table | No | Resource requirements |
| env | String | No | Environment to use |
| when | String | No | Conditional guard expression |
| materialize | String | No | always, auto, never, final |
| params | Table | No | Rule-specific parameters |
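
The constraints in the field table above can be checked mechanically. The following is a minimal sketch, not OxyMake's own validator: it assumes a rule has already been parsed into a Python dict (for example with the standard-library `tomllib`), and it reads "One of shell/run/script/call" as "exactly one". The function name `validate_rule` is hypothetical.

```python
# Hypothetical helper: checks the constraints listed in the Rule Fields table.
# Assumption: "one of shell/run/script/call" means exactly one is present.
EXEC_MODES = ("shell", "run", "script", "call")


def validate_rule(name, rule):
    """Return a list of problems found in one parsed [rule.<name>] table."""
    problems = []
    # output is the only unconditionally required field.
    if not rule.get("output"):
        problems.append(f"rule.{name}: 'output' is required")
    # Exactly one execution mode must be chosen.
    modes = [m for m in EXEC_MODES if m in rule]
    if len(modes) != 1:
        problems.append(
            f"rule.{name}: expected exactly one of shell/run/script/call, "
            f"found {modes or 'none'}"
        )
    # lang accompanies run and script.
    if ("run" in rule or "script" in rule) and "lang" not in rule:
        problems.append(f"rule.{name}: 'lang' is required with run/script")
    return problems


good = {
    "input": ["data/{sample}.csv"],
    "output": ["results/{sample}.txt"],
    "shell": "python process.py {input} {output}",
}
bad = {"run": "print('hi')", "shell": "echo hi"}  # no output, two modes, no lang
```

Here `validate_rule("process", good)` returns an empty list, while `bad` is reported for all three constraints.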

Execution Modes

The four execution modes form a spectrum from most flexible (shell) to most optimizable (call):

shell -- Opaque shell command. Maximum flexibility, no optimization.

[rule.align]
shell = "bwa mem ref.fa {input} > {output}"

run -- Inline script with language specification.

[rule.stats]
lang = "python"
run = """
import pandas as pd
df = pd.read_csv("{input}")
df.describe().to_csv("{output}")
"""

script -- External script file.

[rule.analyze]
lang = "python"
script = "scripts/analyze.py"

call -- Pure function reference. Supports in-memory Arrow IPC passing.

[rule.features]
input = [{ path = "data/{sample}.parquet", format = "parquet" }]
output = [{ path = "features/{sample}.parquet", format = "parquet", materialize = "auto" }]
call = "pipeline.features:compute_features"
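
The `call` value above has the shape `module.path:function`. As an illustration only, a reference of that shape can be resolved to a Python callable with `importlib`. How OxyMake itself resolves and invokes the function is not described on this page, and the helper name `resolve_call` is hypothetical.

```python
import importlib


def resolve_call(reference):
    """Resolve a 'package.module:function' string to a callable.

    Assumption: the text before ':' is an importable module and the text
    after it is an attribute of that module.
    """
    module_name, sep, attr = reference.partition(":")
    if not sep or not module_name or not attr:
        raise ValueError(f"expected 'module:function', got {reference!r}")
    module = importlib.import_module(module_name)
    func = getattr(module, attr)
    if not callable(func):
        raise TypeError(f"{reference!r} does not name a callable")
    return func


# Demonstrated with a standard-library function, because the
# pipeline.features module from the example is not available here.
dumps = resolve_call("json:dumps")
```

With a project on the import path, `resolve_call("pipeline.features:compute_features")` would return the function named in the rule.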

Wildcards

Wildcards in {braces} are resolved from [config] arrays or inferred from existing files:

[config]
samples = ["A", "B"]

[rule.process]
input = ["data/{sample}.csv"]     # {sample} expanded from config.samples
output = ["results/{sample}.txt"]
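
To make the expansion concrete, here is a small sketch of how one rule's patterns could be turned into per-job file lists. It is an illustration under stated assumptions, not OxyMake's implementation: the caller supplies the wildcard-to-values mapping explicitly (the page does not spell out how `{sample}` is matched to `config.samples`), and multiple wildcards are assumed to combine as a Cartesian product. The name `expand` is hypothetical.

```python
import itertools
import re

WILDCARD = re.compile(r"\{(\w+)\}")


def expand(patterns, values):
    """Expand every pattern once per combination of wildcard values.

    `values` maps a wildcard name to its list of strings,
    e.g. {"sample": ["A", "B"]}.
    Assumption: several wildcards combine as a Cartesian product.
    """
    # Collect the distinct wildcard names used across all patterns.
    names = sorted({n for p in patterns for n in WILDCARD.findall(p)})
    jobs = []
    for combo in itertools.product(*(values[n] for n in names)):
        binding = dict(zip(names, combo))
        # Substitute the bound value for each {wildcard} in each pattern.
        jobs.append(
            [WILDCARD.sub(lambda m: binding[m.group(1)], p) for p in patterns]
        )
    return jobs


jobs = expand(
    ["data/{sample}.csv", "results/{sample}.txt"],
    {"sample": ["A", "B"]},
)
# jobs == [["data/A.csv", "results/A.txt"], ["data/B.csv", "results/B.txt"]]
```

Each inner list corresponds to one concrete job: the input and output paths for a single value of `{sample}`.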

Resources

[rule.heavy_job]
output = ["results/big.txt"]
shell = "compute_heavy"
resources = { cpus = 4, mem_gb = 16, gpu = 1, time_min = 60 }

Conditional Guards

[rule.expensive]
output = ["results/{seed}.txt"]
shell = "compute {seed}"
when = "seed in @selected_seeds"

Guards are evaluated at DAG resolution time. Jobs whose guard is false are never created.
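
The effect of evaluating guards at resolution time can be sketched as a filter applied before any job is created. This sketch handles only the single form shown above, `<wildcard> in @<name>`, and assumes that `@selected_seeds` refers to a named list of strings supplied elsewhere. The full guard grammar and where such lists are defined are not covered on this page, and the names `guard_passes` and `create_jobs` are hypothetical.

```python
import re

# Matches only guards of the form "<wildcard> in @<list_name>".
GUARD = re.compile(r"^\s*(\w+)\s+in\s+@(\w+)\s*$")


def guard_passes(when, binding, lists):
    """Evaluate a '<wildcard> in @<list>' guard for one wildcard binding.

    Assumption: '@name' looks up a list of strings in `lists`.
    """
    match = GUARD.match(when)
    if match is None:
        raise ValueError(f"unsupported guard: {when!r}")
    wildcard, list_name = match.groups()
    return binding[wildcard] in lists[list_name]


def create_jobs(bindings, when, lists):
    """Keep only bindings whose guard is true; the rest never become jobs."""
    return [b for b in bindings if guard_passes(when, b, lists)]


candidates = [{"seed": "1"}, {"seed": "2"}, {"seed": "3"}]
kept = create_jobs(
    candidates,
    "seed in @selected_seeds",
    {"selected_seeds": ["1", "3"]},
)
# kept == [{"seed": "1"}, {"seed": "3"}]
```

The binding for seed `"2"` is dropped before job creation, so it never appears in the DAG.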

Include Directives

Split large workflows across files:

include = ["rules/alignment.toml", "rules/qc.toml"]

Environment Specification

[env.analysis]
type = "uv"
requirements = "requirements.txt"

[rule.analyze]
env = "analysis"

Supported environment types: system, uv, conda, docker, nix.
