Formula Language Reference
The FeatureDAG formula language is a mathematical DSL for declaring feature computations. Formula strings are compiled to an IR DAG via Python's `ast` module and then lowered to either Polars (`pl.Expr`) or DolphinDB DSL. Because the same expression compiles to both backends, feature logic never has to be written twice.
Arithmetic
| Operator | Primitive | Example |
|---|---|---|
| `+` | `TRANSFORM.add` | `a + b` |
| `-` | `TRANSFORM.sub` | `a - b` |
| `*` | `TRANSFORM.mul` | `a * b` |
| `/` | `TRANSFORM.div` | `a / b` |
| `**` | `TRANSFORM.pow` | `a ** 2` |
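Because formulas are parsed with Python's `ast` module, operator precedence and associativity follow Python's own grammar. The sketch below shows how `ast.BinOp` nodes could be mapped onto the primitives in the table. `BINOP_PRIMITIVES` and `top_level_primitive` are hypothetical names used for illustration, not FeatureDAG's API.

```python
import ast

# Hypothetical mapping from Python AST operator classes to the
# TRANSFORM primitives listed in the arithmetic table.
BINOP_PRIMITIVES = {
    ast.Add: "TRANSFORM.add",
    ast.Sub: "TRANSFORM.sub",
    ast.Mult: "TRANSFORM.mul",
    ast.Div: "TRANSFORM.div",
    ast.Pow: "TRANSFORM.pow",
}


def top_level_primitive(formula: str) -> str:
    """Return the primitive for the outermost binary operator of `formula`."""
    node = ast.parse(formula, mode="eval").body
    if not isinstance(node, ast.BinOp):
        raise ValueError(f"not a binary operation: {formula!r}")
    primitive = BINOP_PRIMITIVES.get(type(node.op))
    if primitive is None:
        raise ValueError(f"unsupported operator in {formula!r}")
    return primitive


# Python's precedence rules decide the tree shape: `*` binds tighter than `+`,
# so the outermost node of "a + b * c" is the addition.
print(top_level_primitive("a + b * c"))  # TRANSFORM.add
print(top_level_primitive("a ** 2"))     # TRANSFORM.pow
```

Parentheses in a formula change the tree in the usual way. `(a + b) * c` has `TRANSFORM.mul` at its root.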
Unary Functions
| Function | Description |
|---|---|
| `sign(x)` | Sign: -1, 0, or +1 |
| `abs(x)` | Absolute value |
| `sqrt(x)` | Square root |
| `log(x)` | Natural logarithm |
| `tanh(x)` | Hyperbolic tangent (bounds output to [-1, 1]) |
| `clip(x, lo, hi)` | Clamp to [lo, hi] |
| `round(x)`, `floor(x)`, `ceil(x)` | Rounding |
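When you check backend output by hand, a scalar reference for a few of these functions can help. This is a plain-Python sketch and not FeatureDAG's implementation. The lowered Polars or DolphinDB expressions presumably apply the same definitions element-wise to whole columns.

```python
import math


def sign(x: float) -> int:
    """-1, 0, or +1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def clip(x: float, lo: float, hi: float) -> float:
    """Clamp x into the closed interval [lo, hi]."""
    if lo > hi:
        raise ValueError("lo must not exceed hi")
    return max(lo, min(x, hi))


# tanh squashes any input into [-1, 1]; sign and clip are piecewise.
for v in (-2.5, 0.0, 0.4, 3.0):
    print(v, sign(v), clip(v, -1.0, 1.0), round(math.tanh(v), 4))
```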
Comparisons
| Operator | Description |
|---|---|
| `a > b`, `a < b` | Greater-than, less-than |
| `a >= b`, `a <= b` | Greater-than-or-equal, less-than-or-equal |
| `a == b`, `a != b` | Equality, inequality |
Conditional
```
if(condition, true_value, false_value)
```
Compiles to `TRANSFORM.conditional` and is evaluated row by row.
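Row-by-row evaluation means each output row picks from the branch selected by that same row's condition. The sketch below shows these semantics on plain lists. `conditional` is a hypothetical helper, and it assumes the condition and both branches are already-evaluated columns of equal length. In Polars, a natural lowering would be `pl.when(cond).then(a).otherwise(b)`, but this page does not say which construct FeatureDAG actually emits.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def conditional(cond: Sequence[bool], true_value: Sequence[T], false_value: Sequence[T]) -> List[T]:
    """Pick true_value[i] where cond[i] holds, else false_value[i]."""
    if not (len(cond) == len(true_value) == len(false_value)):
        raise ValueError("all inputs must have the same length")
    return [t if c else f for c, t, f in zip(cond, true_value, false_value)]


close = [10.0, 12.0, 11.0]
open_ = [11.0, 11.0, 11.0]
# Equivalent of the formula if(close > open, 1, -1)
up = [c > o for c, o in zip(close, open_)]
print(conditional(up, [1] * 3, [-1] * 3))  # [-1, 1, -1]
```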
Window Functions
Operate over a fixed-size trailing window of rows.
| Function | Description |
|---|---|
| `rolling_mean(x, window)` | Rolling average |
| `rolling_std(x, window)` | Rolling standard deviation |
| `rolling_sum(x, window)` | Rolling sum |
| `rolling_min(x, window)` | Rolling minimum |
| `rolling_max(x, window)` | Rolling maximum |
| `rolling_median(x, window)` | Rolling median |
| `rolling_skew(x, window)` | Rolling skewness |
| `rolling_kurt(x, window)` | Rolling kurtosis |
| `rolling_zscore(x, window)` | Rolling z-score: (x - μ) / σ |
| `rolling_corr(x, y, window)` | Rolling Pearson correlation between two series |
| `rolling_rank_corr(x, y, window)` | Rolling rank (Spearman) correlation between two series |
| `zscore(x, window)` | Alias for `rolling_zscore` |
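The sketch below pins down what "fixed-size trailing window" means for `rolling_mean` and `rolling_zscore`. It rests on several assumptions that this page does not confirm. The window at row t covers rows t-window+1 through t, including the current row. Warm-up rows without a full window are null (`None`). σ is the sample standard deviation (ddof=1). A zero σ yields `None`. `rolling_apply` is a hypothetical helper.

```python
import statistics
from typing import Callable, List, Optional, Sequence

Series = Sequence[float]


def rolling_apply(x: Series, window: int, fn: Callable[[Series], Optional[float]]) -> List[Optional[float]]:
    """Apply fn to each trailing window of `window` rows; warm-up rows are None."""
    if window < 1:
        raise ValueError("window must be >= 1")
    out: List[Optional[float]] = []
    for t in range(len(x)):
        if t + 1 < window:
            out.append(None)  # not enough history yet
        else:
            out.append(fn(x[t - window + 1 : t + 1]))  # rows t-window+1 .. t
    return out


def rolling_mean(x: Series, window: int) -> List[Optional[float]]:
    return rolling_apply(x, window, statistics.fmean)


def rolling_zscore(x: Series, window: int) -> List[Optional[float]]:
    """(x[t] - mean) / std over the trailing window, using the sample std (ddof=1)."""
    if window < 2:
        raise ValueError("window must be >= 2 for a sample standard deviation")

    def z(w: Series) -> Optional[float]:
        sd = statistics.stdev(w)
        return None if sd == 0 else (w[-1] - statistics.fmean(w)) / sd

    return rolling_apply(x, window, z)


print(rolling_mean([1, 2, 3, 4], 2))    # [None, 1.5, 2.5, 3.5]
print(rolling_zscore([1, 2, 3, 4], 3))  # [None, None, 1.0, 1.0]
```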
Lag / Diff
| Function | Description |
|---|---|
| `diff(x, n)` | Difference from n periods ago: x[t] - x[t-n] |
| `lag(x, n)` | Value n periods ago: x[t-n] |
| `shift(x, n)` | Shift forward by n periods |
| `pct_change(x, n)` | Percentage change: (x[t] - x[t-n]) / x[t-n] |
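The formulas in the table build on one another: `diff` and `pct_change` are both defined in terms of x[t-n], the value `lag` returns. The plain-Python sketch below makes that relationship explicit. It assumes that the first n rows, which have no history, are null, and that `pct_change` returns null when x[t-n] is zero. Neither behavior is confirmed by this page.

```python
from typing import List, Optional, Sequence


def lag(x: Sequence[float], n: int) -> List[Optional[float]]:
    """x[t-n]; the first n rows have no history and are None."""
    if n < 0:
        raise ValueError("n must be non-negative")
    k = min(n, len(x))
    return [None] * k + list(x[: len(x) - k])


def diff(x: Sequence[float], n: int) -> List[Optional[float]]:
    """x[t] - x[t-n]."""
    return [None if prev is None else cur - prev for cur, prev in zip(x, lag(x, n))]


def pct_change(x: Sequence[float], n: int) -> List[Optional[float]]:
    """(x[t] - x[t-n]) / x[t-n]; None where x[t-n] is missing or zero (assumed)."""
    return [
        None if prev is None or prev == 0 else (cur - prev) / prev
        for cur, prev in zip(x, lag(x, n))
    ]


prices = [1.0, 2.0, 4.0]
print(lag(prices, 1))         # [None, 1.0, 2.0]
print(diff(prices, 1))        # [None, 1.0, 2.0]
print(pct_change(prices, 1))  # [None, 1.0, 1.0]
```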
Autocorrelation
| Function | Description |
|---|---|
| `autocorr(x, lag)` | Positive autocorrelation at given lag |
| `neg_autocorr(x, lag)` | Negative autocorrelation (for mean-reversion signals) |
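This page does not define autocorrelation precisely, so the sketch below picks one common definition: the Pearson correlation between x[t] and x[t-lag] over the whole series. It also assumes that `neg_autocorr` simply flips the sign, so that mean-reverting series score high. Both assumptions should be checked against the actual implementation. `pearson` is a hypothetical helper.

```python
import math
from typing import Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    if va == 0 or vb == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(va * vb)


def autocorr(x: Sequence[float], lag: int) -> float:
    """Correlation between x[t] and x[t-lag] over the whole series (assumed definition)."""
    if not 0 < lag < len(x) - 1:
        raise ValueError("need 0 < lag < len(x) - 1")
    return pearson(x[lag:], x[:-lag])


def neg_autocorr(x: Sequence[float], lag: int) -> float:
    """Sign-flipped autocorrelation (assumed): large when the series mean-reverts."""
    return -autocorr(x, lag)


trend = [1.0, 2.0, 3.0, 4.0, 5.0]
zigzag = [1.0, -1.0, 1.0, -1.0, 1.0]
print(autocorr(trend, 1), neg_autocorr(zigzag, 1))  # 1.0 1.0
```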
Entropy
| Function | Description |
|---|---|
| `shannon_entropy(x, window)` | Shannon entropy over rolling window |
| `sample_entropy(x, window)` | Sample entropy (pattern regularity) |
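Shannon entropy is defined for a discrete distribution, H = -Σ p·log2(p). The sketch below therefore assumes the input is already discrete. For instance, it might be applied to a formula such as `sign(diff(close, 1))`, and it measures entropy in bits over the empirical value frequencies in each trailing window. How FeatureDAG discretizes continuous inputs, and which log base it uses, are not stated here. `rolling_shannon_entropy` is a hypothetical name.

```python
import math
from collections import Counter
from typing import Hashable, List, Optional, Sequence


def shannon_entropy(values: Sequence[Hashable]) -> float:
    """Entropy in bits of the empirical distribution of discrete values."""
    n = len(values)
    probs = [count / n for count in Counter(values).values()]
    return sum(-p * math.log2(p) for p in probs)


def rolling_shannon_entropy(x: Sequence[Hashable], window: int) -> List[Optional[float]]:
    """Entropy over each trailing window; warm-up rows are None."""
    return [
        None if t + 1 < window else shannon_entropy(x[t - window + 1 : t + 1])
        for t in range(len(x))
    ]


moves = [1, 1, -1, 1, -1, -1]  # e.g. signs of successive price changes
print(rolling_shannon_entropy(moves, 4))
```

A window split evenly between two values gives the maximum of 1 bit. A constant window gives 0.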
Special Window Functions
| Function | Description |
|---|---|
| `rolling_percentile(x, window, q)` | q-th percentile over rolling window |
| `historical_var(x, window, q)` | Historical Value-at-Risk (quantile) |
| `consecutive_count(x, window)` | Count of consecutive above-threshold values |
| `percentile_rank(x, window)` | Percentile rank of current value in rolling distribution |
| `time_under_water(x, window)` | Duration since last peak |
| `linear_regression_slope(x, window)` | Slope of linear regression over window |
| `half_life(x, window)` | Estimated half-life of mean reversion |
| `adx(high, low, close, window)` | Wilder's ADX trend strength indicator |
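As one worked case from this table, the sketch below computes `linear_regression_slope` as the ordinary-least-squares slope of the window's values regressed on their positions 0..window-1. That regressor choice is an assumption, as is returning null during warm-up. `ols_slope` is a hypothetical helper.

```python
from typing import List, Optional, Sequence


def ols_slope(y: Sequence[float]) -> float:
    """OLS slope of y against its index 0..n-1."""
    n = len(y)
    if n < 2:
        raise ValueError("need at least two points")
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(y))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def linear_regression_slope(x: Sequence[float], window: int) -> List[Optional[float]]:
    """Slope over each trailing window; warm-up rows are None (assumed)."""
    return [
        None if t + 1 < window else ols_slope(x[t - window + 1 : t + 1])
        for t in range(len(x))
    ]


print(linear_regression_slope([1.0, 3.0, 5.0, 7.0], 4))  # [None, None, None, 2.0]
```

Because the regressor is a row index, the slope is expressed in units of x per row rather than per unit of time.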
State Accumulators
State functions maintain cumulative memory across all prior rows instead of looking at a fixed window.
| Function | Description |
|---|---|
| `ema(x, alpha)` | Exponential moving average: S[t] = α·x[t] + (1-α)·S[t-1] |
| `decay_accum(x, decay)` | Decaying accumulator: S[t] = decay·S[t-1] + x[t] |
| `cumsum(x, init)` | Cumulative sum starting from init |
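The recurrences in the table translate directly into loops that carry one value of state forward. The sketch below implements them in plain Python. The seeds are assumptions, since the table gives only the recurrence: S[0] = x[0] for `ema`, S[-1] = 0 for `decay_accum`, and S[-1] = init for `cumsum`.

```python
from typing import List, Sequence


def ema(x: Sequence[float], alpha: float) -> List[float]:
    """S[t] = alpha*x[t] + (1-alpha)*S[t-1], seeded with S[0] = x[0] (assumed)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    out: List[float] = []
    for t, v in enumerate(x):
        s = v if t == 0 else alpha * v + (1 - alpha) * out[-1]
        out.append(s)
    return out


def decay_accum(x: Sequence[float], decay: float) -> List[float]:
    """S[t] = decay*S[t-1] + x[t], starting from S[-1] = 0 (assumed)."""
    out: List[float] = []
    s = 0.0
    for v in x:
        s = decay * s + v
        out.append(s)
    return out


def cumsum(x: Sequence[float], init: float) -> List[float]:
    """Running total: S[t] = S[t-1] + x[t], with S[-1] = init (assumed)."""
    out: List[float] = []
    s = init
    for v in x:
        s += v
        out.append(s)
    return out


print(ema([1.0, 2.0, 3.0], 0.5))          # [1.0, 1.5, 2.25]
print(decay_accum([1.0, 1.0, 1.0], 0.5))  # [1.0, 1.5, 1.75]
print(cumsum([1.0, 2.0, 3.0], 10.0))      # [11.0, 13.0, 16.0]
```

Each S[t] depends on every earlier row through S[t-1]. Unlike the window functions, the value at row t therefore cannot be recomputed from a fixed-size slice of recent rows.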
Utility
| Function | Description |
|---|---|
| `rsi_from_rs(rs)` | Convert Relative Strength to RSI: 100 - 100/(1+rs) |
| `min(a, b)`, `max(a, b)` | Two-argument min/max |
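As a quick check of the `rsi_from_rs` formula: RS is the ratio of average gain to average loss, so equal gains and losses give RS = 1 and RSI = 50. The Python function below mirrors the table's formula. Only the conversion step is shown, because this page does not specify how the averages themselves are computed.

```python
def rsi_from_rs(rs: float) -> float:
    """RSI = 100 - 100 / (1 + RS), where RS = average gain / average loss."""
    if rs < 0:
        raise ValueError("RS is a ratio of non-negative averages")
    return 100 - 100 / (1 + rs)


avg_gain, avg_loss = 1.5, 0.5
print(rsi_from_rs(avg_gain / avg_loss))  # 75.0
```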
Column References vs Parameters
In a formula, bare names refer to CDM columns. Names prefixed with `$` refer to configurable parameters from the FeatureType's `parameters:` block.
```yaml
required_inputs:
  - best_bid_size
  - best_ask_size
parameters:
  window:
    type: integer
    default: 1
formula: "cumsum((diff(best_bid_size, $window) - diff(best_ask_size, $window)), 0)"
```
During compilation, $window is resolved to its concrete value from parameter defaults or feature overrides. Column names pass through as column references in the IR.
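`$` is not valid Python syntax, so `ast.parse` would reject `$window`. A `$name` therefore has to be resolved or rewritten before the string reaches the parser. The sketch below shows one way to do this: a regex substitution in which per-feature overrides take precedence over defaults. `resolve_parameters` and `PARAM_RE` are hypothetical names, and FeatureDAG's actual resolution step may work differently.

```python
import ast
import re
from typing import Any, Dict, Optional

# A `$` followed by a Python-style identifier.
PARAM_RE = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def resolve_parameters(formula: str, defaults: Dict[str, Any],
                       overrides: Optional[Dict[str, Any]] = None) -> str:
    """Replace each $name with its value; overrides win over defaults."""
    values = {**defaults, **(overrides or {})}

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"unknown parameter ${name}")
        return repr(values[name])

    resolved = PARAM_RE.sub(substitute, formula)
    ast.parse(resolved, mode="eval")  # raises SyntaxError if the result is not an expression
    return resolved


FORMULA = "cumsum((diff(best_bid_size, $window) - diff(best_ask_size, $window)), 0)"
print(resolve_parameters(FORMULA, {"window": 1}))
print(resolve_parameters(FORMULA, {"window": 1}, {"window": 5}))
```

Column names such as `best_bid_size` are untouched by the substitution. They stay as bare names, which the parser then sees as `ast.Name` nodes.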
Compilation
Formulas compile through FeatureDAG's 4-stage pipeline:
[Formula string] → [AST Compiler] → [IR DAG] → [Lowering] → [Pipeline] → [Execution]
The AST Compiler uses Python's ast module to parse the formula, walks the syntax tree, and dispatches each function call to an IR node. Function dispatch uses frozenset lookups: rolling_mean → WINDOW.rolling_mean, cumsum → STATE.cumsum, etc. The resulting DAG is frozen and validated with 50+ compile-time schema contracts before lowering to backend expressions.
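The dispatch step can be sketched compactly. The text confirms only two mappings, `rolling_mean` → `WINDOW.rolling_mean` and `cumsum` → `STATE.cumsum`. The other groupings below, the nested-tuple IR shape, and the names `compile_node` and `compile_formula` are assumptions for illustration. The sketch does no schema validation, and it expects `$` parameters to be resolved already.

```python
import ast
from typing import Tuple

# Hypothetical dispatch sets; only rolling_mean -> WINDOW and cumsum -> STATE
# are confirmed by the reference text, the rest of the grouping is assumed.
WINDOW_FUNCS = frozenset({"rolling_mean", "rolling_std", "rolling_sum", "rolling_min", "rolling_max"})
STATE_FUNCS = frozenset({"ema", "decay_accum", "cumsum"})
TRANSFORM_FUNCS = frozenset({"sign", "abs", "sqrt", "log", "tanh", "clip"})
BINOPS = {ast.Add: "add", ast.Sub: "sub", ast.Mult: "mul", ast.Div: "div", ast.Pow: "pow"}

IRNode = Tuple  # ("KIND", *children): a nested, hashable tuple


def compile_node(node: ast.AST) -> IRNode:
    if isinstance(node, ast.Name):  # bare name -> column reference
        return ("COLUMN", node.id)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return ("CONST", node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in BINOPS:
        op = "TRANSFORM." + BINOPS[type(node.op)]
        return (op, compile_node(node.left), compile_node(node.right))
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name) and not node.keywords:
        name = node.func.id
        args = tuple(compile_node(arg) for arg in node.args)
        # frozenset membership tests choose the primitive family
        if name in WINDOW_FUNCS:
            return ("WINDOW." + name, *args)
        if name in STATE_FUNCS:
            return ("STATE." + name, *args)
        if name in TRANSFORM_FUNCS:
            return ("TRANSFORM." + name, *args)
        raise ValueError(f"unknown function: {name}")
    raise ValueError(f"unsupported syntax: {ast.dump(node)}")


def compile_formula(formula: str) -> IRNode:
    """Parse a formula whose $parameters are already resolved and build its IR."""
    return compile_node(ast.parse(formula, mode="eval").body)


print(compile_formula("cumsum(rolling_mean(close, 20) - close, 0)"))
```

Because the IR nodes here are plain tuples, identical subexpressions compare equal; in the example, both occurrences of `("COLUMN", "close")` do. Equality of this kind is one simple way to detect sharing when turning the tree into a DAG. Unary minus (for example `-1`) is an `ast.UnaryOp` and is rejected by this sketch.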