AST Compiler
The AST compiler transforms formula strings into IR DAG nodes. It uses Python's ast module to parse mathematical expressions, then walks the syntax tree to produce a list of IRNode objects — each representing one computation step. This is the bridge between human-readable formulas and the backend-agnostic IR.
Why Formulas
Before the formula compiler, FeatureTypes were defined with explicit steps: lists — manual DAG construction in YAML. The formula compiler replaces this with:
# Old: explicit steps
steps:
- primitive: TRANSFORM
op: sub
params: { a: best_bid_size, b: best_ask_size }
- primitive: WINDOW
op: diff
params: { input: bid_ask_diff, window: 1 }
# New: single formula string
formula: "diff(best_bid_size - best_ask_size, $window)"
The formula is more concise, less error-prone, and allows the compiler to optimize the resulting DAG. A single formula line replaces dozens of YAML lines.
Formula Language Reference
Arithmetic
| Operator | Primitive | Example |
|---|---|---|
+ | TRANSFORM.add | a + b |
- | TRANSFORM.sub | a - b |
* | TRANSFORM.mul | a * b |
/ | TRANSFORM.div | a / b |
** | TRANSFORM.pow | a ** 2 |
Unary Functions (TRANSFORM)
| Function | Description |
|---|---|
sign(x) | Sign: -1, 0, or +1 |
abs(x) | Absolute value |
sqrt(x) | Square root |
log(x) | Natural logarithm |
tanh(x) | Hyperbolic tangent (bounds to [-1, 1]) |
clip(x, lo, hi) | Clamp to [lo, hi] |
round(x), floor(x), ceil(x) | Rounding |
Comparisons (TRANSFORM)
| Operator | Description |
|---|---|
a > b, a < b | Greater-than, less-than |
a >= b, a <= b | Greater/less-or-equal |
a == b, a != b | Equality, inequality |
Conditional
if(condition, true_value, false_value)
Compiles to TRANSFORM.conditional — evaluated row-by-row.
Window Functions (WINDOW)
Window functions operate over a fixed-size trailing window of rows.
| Function | Description | Extra Params |
|---|---|---|
rolling_mean(x, window) | Rolling average | — |
rolling_std(x, window) | Rolling standard deviation | — |
rolling_sum(x, window) | Rolling sum | — |
rolling_min(x, window) | Rolling minimum | — |
rolling_max(x, window) | Rolling maximum | — |
rolling_median(x, window) | Rolling median | — |
rolling_skew(x, window) | Rolling skewness | — |
rolling_kurt(x, window) | Rolling kurtosis | — |
rolling_zscore(x, window) | Rolling z-score: (x - μ) / σ | — |
rolling_corr(x, y, window) | Rolling correlation between two series | — |
zscore(x, window) | Alias for rolling_zscore | — |
Lag / Diff (WINDOW)
| Function | Description |
|---|---|
diff(x, n) | Difference from n periods ago: x[t] - x[t-n] |
lag(x, n) | Value n periods ago: x[t-n] |
shift(x, n) | Shift forward by n periods |
pct_change(x, n) | Percentage change: (x[t] - x[t-n]) / x[t-n] |
Autocorrelation (WINDOW)
| Function | Description |
|---|---|
autocorr(x, lag) | Positive autocorrelation at given lag |
neg_autocorr(x, lag) | Negative autocorrelation (for mean-reversion signals) |
Entropy (WINDOW)
| Function | Description |
|---|---|
shannon_entropy(x, window) | Shannon entropy over rolling window |
sample_entropy(x, window) | Sample entropy (pattern regularity) |
Special Window Functions
| Function | Description |
|---|---|
rolling_percentile(x, window, q) | q-th percentile over rolling window |
historical_var(x, window, q) | Historical Value-at-Risk (quantile) |
consecutive_count(x, window) | Count of consecutive above-threshold values |
percentile_rank(x, window) | Percentile rank of current value in rolling distribution |
time_under_water(x, window) | Duration since last peak |
linear_regression_slope(x, window) | Slope of linear regression over window |
half_life(x, window) | Estimated half-life of mean reversion |
adx(high, low, close, window) | Wilder's ADX trend strength indicator |
State Accumulators (STATE)
State functions maintain cumulative memory across all prior rows — no fixed window.
| Function | Description |
|---|---|
ema(x, alpha) | Exponential moving average: S[t] = α·x[t] + (1-α)·S[t-1] |
decay_accum(x, decay) | Decaying accumulator: S[t] = decay·S[t-1] + x[t] |
cumsum(x, init) | Cumulative sum starting from init |
Utility
| Function | Description |
|---|---|
rsi_from_rs(rs) | Convert Relative Strength to RSI: 100 - 100/(1+rs) |
min(a, b), max(a, b) | Two-argument min/max (decomposed to compare + conditional) |
Column References vs Parameters
In a formula, bare names refer to CDM columns. Names prefixed with $ refer to configurable parameters from the FeatureType's parameters: block.
# FeatureType definition
name: ofi
required_inputs:
- best_bid_size # ← column reference in formula
- best_ask_size # ← column reference in formula
parameters:
window:
type: integer
default: 1
formula: "cumsum((diff(best_bid_size, $window) - diff(best_ask_size, $window)), 0)"
# ^^^ columns ^^^^^^ parameter ^^^^^^ parameter
During compilation, $window is resolved to its concrete value (e.g., 1) from the parameter defaults or feature overrides. Column names pass through as column references in the IR.
Python keywords that conflict with formula identifiers (e.g., return) are handled with a _kw_ prefix: _kw_return_ before parsing, restored to return after.
Compilation Pipeline
Step 1: Parse
Python's ast.parse() converts the formula string into an AST:
"sign(OFI) * abs(momentum)"
│
▼
BinOp(op=Mult,
left= Call(func=Name('sign'), args=[Name('OFI')]),
right=Call(func=Name('abs'), args=[Name('momentum')]))
Step 2: Walk
FormulaIRCompiler._walk() recursively traverses the AST, producing IRNodes at each node:
| AST Node | IRNode Produced |
|---|---|
ast.Constant | TRANSFORM.lit — literal value |
ast.Name | TRANSFORM.col (if column) or TRANSFORM.lit (if $param) |
ast.Attribute | TRANSFORM.col with flattened dotted name (a.b.c) |
ast.BinOp | TRANSFORM.{add, sub, mul, div, pow} |
ast.UnaryOp | TRANSFORM.neg |
ast.Compare | TRANSFORM.{gt, lt, eq, neq, lte, gte} |
ast.IfExp | TRANSFORM.conditional |
ast.Call | Dispatched to handler by function name |
Step 3: Dispatch
Function calls (ast.Call) are dispatched by category using frozenset lookups from builtins.py:
if func_name in _UNARY_TRANSFORM_FUNCS: # sign, abs, sqrt, log, tanh
return make_transform_node(func_name, args[0])
elif func_name in _UNARY_WINDOW_FUNCS: # rolling_mean, rolling_std, ...
return make_window_node(func_name, args[0], args[1])
elif func_name in _STATE_FUNCS: # ema, decay_accum, cumsum
return make_state_node(func_name, args[0], args[1])
elif func_name in _AUTOCORR_FUNCS: # autocorr, neg_autocorr
return make_autocorr_node(func_name, args[0], args[1])
...
Step 4: Output
The compiler returns (nodes, root_id) — the full list of IRNodes and the ID of the root (final) node. The IR builder then assembles these into the full DAG.
Example: Full Compilation Walkthrough
Formula: "cumsum((diff(best_bid_size, $window) - diff(best_ask_size, $window)), 0)"
With window = 5:
Parse:
Call(func='cumsum', args=[
BinOp(op=Sub,
left= Call(func='diff', args=[Name('best_bid_size'), Constant(5)]),
right=Call(func='diff', args=[Name('best_ask_size'), Constant(5)])),
Constant(0)
])
Compiled IRNodes:
1. TRANSFORM.col → best_bid_size
2. WINDOW.diff → best_bid_size[t] - best_bid_size[t-5]
3. TRANSFORM.col → best_ask_size
4. WINDOW.diff → best_ask_size[t] - best_ask_size[t-5]
5. TRANSFORM.sub → diff_bid - diff_ask
6. STATE.cumsum → running total from 0
DAG edges:
1 → 2 (diff depends on col)
3 → 4 (diff depends on col)
2 → 5 (sub depends on both diffs)
4 → 5
5 → 6 (cumsum depends on sub)
The resulting DAG has 6 nodes. The compiler handles all dependency wiring automatically — no manual inputs: declarations needed.
Utility Functions
The ir/ast/utils.py module provides high-level entry points:
| Function | Purpose |
|---|---|
formula_to_ir(formula, feature_columns, param_names) | Compile formula → (nodes, root_id) |
formula_to_steps(formula, output_name, param_names) | Compile formula → List[ComputationStep] (for legacy interop) |
extract_formula_variables(formula, param_names) | Extract column references (excluding known functions and params) |
ir_nodes_to_steps(nodes) | Convert IRNode list → ComputationStep list |
File Reference
| File | Purpose |
|---|---|
ir/ast/compiler.py | FormulaIRCompiler — main AST → IR compilation |
ir/ast/handlers.py | FormulaHandlersMixin — per-function handler dispatch |
ir/ast/builtins.py | Function classification frozensets + helpers |
ir/ast/utils.py | High-level formula_to_ir / formula_to_steps API |