
Formula Language Reference

The FeatureDAG formula language is a mathematical DSL for declaring feature computations. Formula strings are compiled to an IR DAG via Python's ast module, then lowered to either Polars (pl.Expr) or DolphinDB DSL. The same expression compiles to both backends, so no backend-specific code has to be duplicated.


Arithmetic

| Operator | Primitive | Example |
|---|---|---|
| `+` | TRANSFORM.add | `a + b` |
| `-` | TRANSFORM.sub | `a - b` |
| `*` | TRANSFORM.mul | `a * b` |
| `/` | TRANSFORM.div | `a / b` |
| `**` | TRANSFORM.pow | `a ** 2` |
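To illustrate how these operators could be mapped onto IR primitives with Python's `ast` module, here is a minimal sketch. The nested-tuple IR and the `lower_arith` helper are hypothetical; only the `TRANSFORM.*` names come from the table above.

```python
import ast

# Map Python AST operator classes to the TRANSFORM primitives in the table.
BINOP_PRIMITIVES = {
    ast.Add: "TRANSFORM.add",
    ast.Sub: "TRANSFORM.sub",
    ast.Mult: "TRANSFORM.mul",
    ast.Div: "TRANSFORM.div",
    ast.Pow: "TRANSFORM.pow",
}


def lower_arith(node: ast.AST):
    """Turn an arithmetic AST into nested (primitive, lhs, rhs) tuples (hypothetical IR)."""
    if isinstance(node, ast.BinOp):
        prim = BINOP_PRIMITIVES.get(type(node.op))
        if prim is None:
            raise ValueError(f"unsupported operator: {type(node.op).__name__}")
        return (prim, lower_arith(node.left), lower_arith(node.right))
    if isinstance(node, ast.Name):
        return node.id  # bare name: a column reference
    if isinstance(node, ast.Constant):
        return node.value
    raise ValueError(f"unsupported node: {type(node).__name__}")


tree = ast.parse("(a + b) * c ** 2", mode="eval")
print(lower_arith(tree.body))
# ('TRANSFORM.mul', ('TRANSFORM.add', 'a', 'b'), ('TRANSFORM.pow', 'c', 2))
```

Because parsing is delegated to Python, operator precedence comes for free: `**` binds tighter than `*`, which binds tighter than `+`. Unary minus (`-a`) arrives as an `ast.UnaryOp` and is not handled in this sketch.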

Unary Functions

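| Function | Description |
|---|---|
| `sign(x)` | Sign: -1, 0, or +1 |
| `abs(x)` | Absolute value |
| `sqrt(x)` | Square root |
| `log(x)` | Natural logarithm |
| `tanh(x)` | Hyperbolic tangent (bounds output to [-1, 1]) |
| `clip(x, lo, hi)` | Clamp to [lo, hi] |
| `round(x)`, `floor(x)`, `ceil(x)` | Round to nearest integer, round down, round up |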

Comparisons

| Operator | Description |
|---|---|
| `a > b`, `a < b` | Greater-than, less-than |
| `a >= b`, `a <= b` | Greater/less-or-equal |
| `a == b`, `a != b` | Equality, inequality |
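One subtlety of reusing Python's parser is that `a < b < c` becomes a single `ast.Compare` node with two operators. The sketch below shows one way a compiler might handle comparisons by rejecting chained forms. The short primitive names (`gt`, `ge`, ...) and the `lower_compare` helper are hypothetical; the reference above does not list the actual comparison primitives.

```python
import ast

# Hypothetical primitive names for the six comparison operators.
CMP_PRIMITIVES = {
    ast.Gt: "gt", ast.Lt: "lt", ast.GtE: "ge",
    ast.LtE: "le", ast.Eq: "eq", ast.NotEq: "ne",
}


def lower_compare(src: str):
    node = ast.parse(src, mode="eval").body
    if not isinstance(node, ast.Compare):
        raise ValueError("not a comparison")
    # Python parses `a < b < c` as ONE Compare node with two operators,
    # so a compiler must either reject it or expand it to (a < b) and (b < c).
    if len(node.ops) != 1:
        raise ValueError("chained comparisons are not supported in this sketch")
    prim = CMP_PRIMITIVES[type(node.ops[0])]
    return (prim, ast.unparse(node.left), ast.unparse(node.comparators[0]))


print(lower_compare("spread >= 0.5"))  # ('ge', 'spread', '0.5')
```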

Conditional

if(condition, true_value, false_value)

Compiles to TRANSFORM.conditional and is evaluated row by row.
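Note that `if` is a reserved keyword in Python, so `ast.parse` rejects `if(...)` as written; a compiler built on the `ast` module presumably rewrites it before parsing. The sketch below assumes a regex rename to a hypothetical `_if` name, then shows the row-by-row semantics in pure Python. Both `preprocess` and `conditional` are illustrative helpers, not FeatureDAG's API.

```python
import ast
import re

# Match the keyword call `if(` but not names that merely contain "if" (e.g. `diff(`).
IF_CALL = re.compile(r"\bif\s*\(")


def preprocess(formula: str) -> str:
    """Rename the `if(` keyword call to a parseable (hypothetical) `_if(` name."""
    return IF_CALL.sub("_if(", formula)


def conditional(cond, true_vals, false_vals):
    """Row-by-row select: true_vals[i] where cond[i] holds, else false_vals[i]."""
    return [t if c else f for c, t, f in zip(cond, true_vals, false_vals)]


src = "if(a > b, 1, -1)"
call = ast.parse(preprocess(src), mode="eval").body
print(call.func.id, len(call.args))  # _if 3

a, b = [3, 1, 2], [2, 2, 2]
print(conditional([x > y for x, y in zip(a, b)], [1] * 3, [-1] * 3))  # [1, -1, -1]
```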


Window Functions

Operate over a fixed-size trailing window of rows.

| Function | Description |
|---|---|
| `rolling_mean(x, window)` | Rolling average |
| `rolling_std(x, window)` | Rolling standard deviation |
| `rolling_sum(x, window)` | Rolling sum |
| `rolling_min(x, window)` | Rolling minimum |
| `rolling_max(x, window)` | Rolling maximum |
| `rolling_median(x, window)` | Rolling median |
| `rolling_skew(x, window)` | Rolling skewness |
| `rolling_kurt(x, window)` | Rolling kurtosis |
| `rolling_zscore(x, window)` | Rolling z-score: (x - μ) / σ, where μ and σ are the rolling mean and standard deviation |
| `rolling_corr(x, y, window)` | Rolling Pearson correlation between two series |
| `rolling_rank_corr(x, y, window)` | Rolling rank (Spearman) correlation between two series |
| `zscore(x, window)` | Alias for `rolling_zscore` |
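As a pure-Python reference for what "trailing window" means, here is a minimal sketch of `rolling_mean` and `rolling_zscore`. It assumes the window includes the current row, that rows before the first full window produce `None`, and that σ is the sample standard deviation (ddof=1); the reference above does not pin down warm-up or ddof, and the backends may differ.

```python
import statistics


def rolling(values, window, fn):
    """Apply fn to each trailing window (current row included).

    Assumption: rows before the first full window yield None.
    """
    return [
        None if i + 1 < window else fn(values[i + 1 - window : i + 1])
        for i in range(len(values))
    ]


def rolling_mean(x, window):
    return rolling(x, window, statistics.mean)


def rolling_zscore(x, window):
    # (x - mu) / sigma over the trailing window; sigma is the sample
    # standard deviation (ddof=1), and a zero sigma yields None.
    def z(w):
        mu, sigma = statistics.mean(w), statistics.stdev(w)
        return (w[-1] - mu) / sigma if sigma else None

    return rolling(x, window, z)


print(rolling_mean([1, 2, 3, 4], 2))  # [None, 1.5, 2.5, 3.5]
print(rolling_zscore([1, 2, 3], 3))   # [None, None, 1.0]
```

`statistics.stdev` needs at least two points, so `rolling_zscore` requires `window >= 2`.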

Lag / Diff

| Function | Description |
|---|---|
| `diff(x, n)` | Difference from n periods ago: x[t] - x[t-n] |
| `lag(x, n)` | Value n periods ago: x[t-n] |
| `shift(x, n)` | Shift forward by n periods |
| `pct_change(x, n)` | Percentage change: (x[t] - x[t-n]) / x[t-n] |
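The three formulas above can be written directly in terms of `lag`. This is a minimal pure-Python sketch that pads the first `n` rows with `None` (an assumption about warm-up behavior) and, as a further assumption, returns `None` when the base value is zero.

```python
def lag(x, n):
    """x[t-n], padded with None for the first n rows."""
    k = min(n, len(x))  # guard against n longer than the series
    return [None] * k + list(x[: len(x) - k])


def diff(x, n):
    """x[t] - x[t-n]."""
    return [None if p is None else c - p for c, p in zip(x, lag(x, n))]


def pct_change(x, n):
    """(x[t] - x[t-n]) / x[t-n]; None when x[t-n] is missing or zero (assumption)."""
    return [None if p is None or p == 0 else (c - p) / p for c, p in zip(x, lag(x, n))]


prices = [100.0, 110.0, 99.0]
print(diff(prices, 1))        # [None, 10.0, -11.0]
print(pct_change(prices, 1))  # [None, 0.1, -0.1]
```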

Autocorrelation

| Function | Description |
|---|---|
| `autocorr(x, lag)` | Autocorrelation at the given lag |
| `neg_autocorr(x, lag)` | Negated autocorrelation (for mean-reversion signals) |

Entropy

| Function | Description |
|---|---|
| `shannon_entropy(x, window)` | Shannon entropy over rolling window |
| `sample_entropy(x, window)` | Sample entropy (pattern regularity) |
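The reference does not say how continuous values are discretized before entropy is computed. The sketch below assumes the simplest case: values are already discrete symbols (for example, the sign of each return), and entropy is -Σ p·log2(p) over the empirical distribution in each trailing window, measured in bits. Both the binning and the log base are assumptions; `sample_entropy` is not covered here.

```python
import math
from collections import Counter


def shannon_entropy(window_values):
    """Entropy in bits of the empirical distribution of discrete symbols."""
    n = len(window_values)
    counts = Counter(window_values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def rolling_shannon_entropy(x, window):
    # None until the first full trailing window (assumed warm-up behavior).
    return [
        None if i + 1 < window else shannon_entropy(x[i + 1 - window : i + 1])
        for i in range(len(x))
    ]


signs = [1, 1, -1, 1, -1, -1]
print(rolling_shannon_entropy(signs, 4))
```

A window split evenly between two symbols scores 1 bit (maximally unpredictable), while a constant window scores 0.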

Special Window Functions

| Function | Description |
|---|---|
| `rolling_percentile(x, window, q)` | q-th percentile over rolling window |
| `historical_var(x, window, q)` | Historical Value-at-Risk (quantile) |
| `consecutive_count(x, window)` | Count of consecutive above-threshold values |
| `percentile_rank(x, window)` | Percentile rank of current value in rolling distribution |
| `time_under_water(x, window)` | Duration since last peak |
| `linear_regression_slope(x, window)` | Slope of linear regression over window |
| `half_life(x, window)` | Estimated half-life of mean reversion |
| `adx(high, low, close, window)` | Wilder's ADX trend strength indicator |
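Of the functions above, `linear_regression_slope` has the most self-evident definition: the ordinary least-squares slope of the window's values against time. The sketch assumes time is the row offset 0, 1, ..., window-1 within each trailing window and that incomplete windows yield `None`; the library may use a different time index or warm-up rule.

```python
def ols_slope(y):
    """Least-squares slope of y against t = 0, 1, ..., n-1 (requires n >= 2)."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    num = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def linear_regression_slope(x, window):
    # Slope over each trailing window; None until the window is full (assumption).
    return [
        None if i + 1 < window else ols_slope(x[i + 1 - window : i + 1])
        for i in range(len(x))
    ]


print(linear_regression_slope([1.0, 3.0, 5.0, 7.0, 8.0], 3))
```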

State Accumulators

State functions carry memory across all prior rows rather than over a fixed window.

| Function | Description |
|---|---|
| `ema(x, alpha)` | Exponential moving average: S[t] = α·x[t] + (1-α)·S[t-1] |
| `decay_accum(x, decay)` | Decaying accumulator: S[t] = decay·S[t-1] + x[t] |
| `cumsum(x, init)` | Cumulative sum starting from init |
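The recurrences above translate directly into single-pass loops. The initial states are not specified in the reference, so this sketch assumes S[0] = x[0] for `ema`, S[-1] = 0 for `decay_accum`, and S[-1] = init for `cumsum` (so the first output is init + x[0]).

```python
def ema(x, alpha):
    """S[t] = alpha*x[t] + (1-alpha)*S[t-1]; assumes S[0] = x[0]."""
    out, s = [], None
    for v in x:
        s = v if s is None else alpha * v + (1 - alpha) * s
        out.append(s)
    return out


def decay_accum(x, decay):
    """S[t] = decay*S[t-1] + x[t]; assumes S[-1] = 0."""
    out, s = [], 0.0
    for v in x:
        s = decay * s + v
        out.append(s)
    return out


def cumsum(x, init):
    """S[t] = S[t-1] + x[t]; assumes S[-1] = init."""
    out, s = [], init
    for v in x:
        s += v
        out.append(s)
    return out


print(ema([1.0, 2.0, 3.0], 0.5))          # [1.0, 1.5, 2.25]
print(decay_accum([1.0, 1.0, 1.0], 0.5))  # [1.0, 1.5, 1.75]
print(cumsum([1, 2, 3], 10))               # [11, 13, 16]
```

Each output depends on every earlier row through `s`, which is why these functions take no window argument.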

Utility

| Function | Description |
|---|---|
| `rsi_from_rs(rs)` | Convert Relative Strength to RSI: 100 - 100/(1+rs) |
| `min(a, b)`, `max(a, b)` | Two-argument min/max |
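As a quick sanity check of the RSI conversion: RS is the ratio of average gain to average loss, so RS = 1 (gains equal losses) maps to the neutral midpoint of 50, and the output approaches 100 as RS grows. A one-line sketch of the formula from the table:

```python
def rsi_from_rs(rs: float) -> float:
    """RSI = 100 - 100 / (1 + RS); maps RS in [0, inf) onto [0, 100)."""
    return 100.0 - 100.0 / (1.0 + rs)


print(rsi_from_rs(1.0))  # 50.0
print(rsi_from_rs(3.0))  # 75.0
```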

Column References vs Parameters

In a formula, bare names refer to CDM columns. Names prefixed with $ refer to configurable parameters declared in the FeatureType's `parameters:` block.

```yaml
required_inputs:
  - best_bid_size   # column reference in formula
  - best_ask_size   # column reference in formula
parameters:
  window:
    type: integer
    default: 1
formula: "cumsum((diff(best_bid_size, $window) - diff(best_ask_size, $window)), 0)"
# best_bid_size, best_ask_size: column references; $window: parameter
```

During compilation, $window is resolved to its concrete value from parameter defaults or feature overrides. Column names pass through as column references in the IR.
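Since `$` is not valid Python syntax, parameter references must be resolved before the string reaches `ast.parse`. The sketch below assumes a simple regex substitution in which feature overrides take precedence over defaults, then separates column references (bare names that are never called) from function names. `resolve_params` and `column_refs` are hypothetical helpers, not FeatureDAG's API.

```python
import ast
import re

PARAM = re.compile(r"\$([A-Za-z_]\w*)")


def resolve_params(formula, defaults, overrides=None):
    """Substitute $name with its concrete value; feature overrides win over defaults."""
    values = {**defaults, **(overrides or {})}

    def repl(m):
        name = m.group(1)
        if name not in values:
            raise KeyError(f"unknown parameter ${name}")
        return str(values[name])

    return PARAM.sub(repl, formula)


def column_refs(resolved):
    """Bare names that are not called as functions are column references."""
    tree = ast.parse(resolved, mode="eval")
    called = {n.func.id for n in ast.walk(tree)
              if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)}
    names = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
    return names - called


formula = "cumsum((diff(best_bid_size, $window) - diff(best_ask_size, $window)), 0)"
resolved = resolve_params(formula, defaults={"window": 1}, overrides={"window": 5})
print(resolved)
# cumsum((diff(best_bid_size, 5) - diff(best_ask_size, 5)), 0)
print(sorted(column_refs(resolved)))
# ['best_ask_size', 'best_bid_size']
```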


Compilation

Formulas compile through FeatureDAG's 4-stage pipeline:

[Formula string] → [AST Compiler] → [IR DAG] → [Lowering] → [Pipeline] → [Execution]

The AST Compiler uses Python's ast module to parse the formula, walks the syntax tree, and dispatches each function call to an IR node. Function dispatch uses frozenset lookups: rolling_mean → WINDOW.rolling_mean, cumsum → STATE.cumsum, and so on. The resulting DAG is frozen and validated with 50+ compile-time schema contracts before lowering to backend expressions.
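The dispatch step can be sketched as membership tests against frozensets, one per function family, producing immutable IR nodes. The abbreviated sets, the `Node` dataclass, and `compile_formula` are hypothetical, and the family assigned to `diff` and `lag` is an assumption; only the `WINDOW.rolling_mean` and `STATE.cumsum` mappings come from the text above.

```python
import ast
from dataclasses import dataclass

# Hypothetical, abbreviated dispatch tables (the real sets are larger).
WINDOW_FUNCS = frozenset({"rolling_mean", "rolling_std", "rolling_sum"})
STATE_FUNCS = frozenset({"ema", "decay_accum", "cumsum"})
TRANSFORM_FUNCS = frozenset({"abs", "sqrt", "log", "diff", "lag"})


@dataclass(frozen=True)
class Node:
    """Immutable IR node: an op name plus child nodes or literal payloads."""
    op: str
    args: tuple


def _visit(n: ast.AST) -> Node:
    if isinstance(n, ast.Call) and isinstance(n.func, ast.Name):
        name = n.func.id
        if name in WINDOW_FUNCS:
            family = "WINDOW"
        elif name in STATE_FUNCS:
            family = "STATE"
        elif name in TRANSFORM_FUNCS:
            family = "TRANSFORM"
        else:
            raise ValueError(f"unknown function: {name}")
        return Node(f"{family}.{name}", tuple(_visit(a) for a in n.args))
    if isinstance(n, ast.Name):
        return Node("COLUMN", (n.id,))
    if isinstance(n, ast.Constant):
        return Node("CONST", (n.value,))
    raise ValueError(f"unsupported syntax: {type(n).__name__}")


def compile_formula(src: str) -> Node:
    return _visit(ast.parse(src, mode="eval").body)


print(compile_formula("cumsum(diff(x, 1), 0)").op)  # STATE.cumsum
```

Frozen dataclasses give structural equality and hashing for free, which is convenient for deduplicating shared subexpressions when building a DAG.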

AST Compiler internals