Declarative YAML → IR DAG → Polars (batch) + DolphinDB (streaming)
FeatureDAG is a compiler for quantitative finance features. It eliminates the research-to-production gap by treating financial features as declarative computation graphs, not hand-coded functions. A single YAML specification compiles through a 4-layer pipeline and produces both batch (Polars) and streaming (DolphinDB) execution from the same definition.
Quant researchers write features in notebooks with pandas and one-off scripts. Production engineers rewrite them for streaming pipelines. The two implementations diverge: results drift, bugs creep in, and every new feature requires a full rewrite cycle. FeatureDAG removes the rewrite step, because the definition a researcher writes is the one that runs in production.
AST Compiler
Formula string → IR nodes. Parses Python AST, dispatches ~40 built-in functions.
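A minimal sketch of the formula-to-IR step, using only Python's standard `ast` module. The node types (`Col`, `Const`, `Call`) and the three-function `BUILTINS` set are hypothetical stand-ins for illustration, not FeatureDAG's actual classes or its full built-in list.

```python
import ast
from dataclasses import dataclass


# Hypothetical IR node types, for illustration only.
@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class Call:
    func: str
    args: tuple


# Assumed subset of built-in functions; the real set is larger.
BUILTINS = {"ema", "lag", "rolling_vol"}
BINOPS = {ast.Add: "add", ast.Sub: "sub", ast.Mult: "mul", ast.Div: "div"}


def compile_formula(formula: str):
    """Parse a formula string as a Python expression and lower it to IR nodes."""
    tree = ast.parse(formula, mode="eval")
    return _lower(tree.body)


def _lower(node):
    # Bare names are treated as column references.
    if isinstance(node, ast.Name):
        return Col(node.id)
    # Numeric literals become constants (booleans are rejected).
    if (
        isinstance(node, ast.Constant)
        and isinstance(node.value, (int, float))
        and not isinstance(node.value, bool)
    ):
        return Const(float(node.value))
    # Arithmetic operators map to named ops.
    if isinstance(node, ast.BinOp) and type(node.op) in BINOPS:
        return Call(BINOPS[type(node.op)], (_lower(node.left), _lower(node.right)))
    # Function calls are dispatched by name against the built-in set.
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        if node.func.id not in BUILTINS:
            raise ValueError(f"unknown function: {node.func.id}")
        if node.keywords:
            raise ValueError("keyword arguments are not supported in this sketch")
        return Call(node.func.id, tuple(_lower(a) for a in node.args))
    raise ValueError(f"unsupported syntax: {type(node).__name__}")


mid_price = compile_formula("(bid + ask) / 2")
smoothed = compile_formula("ema(close, 20)")
```

Rejecting anything outside a small whitelist of node types keeps the formula language closed: a formula either lowers to known IR or fails at compile time.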
IR DAG
Frozen IR nodes. 50+ schema contracts. Column aliasing. Rustworkx DAG.
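To illustrate what frozen nodes and a dependency DAG buy you, here is a small sketch. It assumes a hypothetical `IRNode` shape and uses the standard library's `graphlib.TopologicalSorter` in place of Rustworkx, so it shows the idea rather than FeatureDAG's actual graph code.

```python
from dataclasses import dataclass, FrozenInstanceError
from graphlib import TopologicalSorter


# Hypothetical node shape: an output alias, an op name, and upstream aliases.
@dataclass(frozen=True)
class IRNode:
    name: str
    op: str
    inputs: tuple = ()


def execution_order(nodes):
    """Return node names so that every node appears after all of its inputs."""
    by_name = {n.name: n for n in nodes}
    for n in nodes:
        missing = [i for i in n.inputs if i not in by_name]
        if missing:
            raise KeyError(f"{n.name} depends on undefined nodes: {missing}")
    # TopologicalSorter takes a mapping of node -> predecessors.
    sorter = TopologicalSorter({n.name: n.inputs for n in nodes})
    return list(sorter.static_order())


# Nodes are declared out of order on purpose; the DAG recovers the order.
nodes = [
    IRNode("vol_20", "rolling_vol", ("ret",)),
    IRNode("ret", "pct_change", ("mid_price",)),
    IRNode("quotes", "source"),
    IRNode("mid_price", "transform", ("quotes",)),
]
order = execution_order(nodes)

# Frozen nodes cannot be mutated after construction.
try:
    nodes[0].op = "something_else"
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Because the nodes are immutable and hashable, they can be shared safely between compiler passes and used as dictionary keys.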
Lowering
Backend protocol. Decorator-based registry. 30+ agnostic ops.
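A decorator-based registry of this kind can be sketched as follows. The `lowers` decorator, the `lower` lookup function, and the emitted expression strings are assumptions made for illustration; FeatureDAG's real backend protocol may look quite different.

```python
from typing import Callable, Dict, Tuple

# Maps (backend, op) to a function that emits a backend-specific expression.
_REGISTRY: Dict[Tuple[str, str], Callable[..., str]] = {}


def lowers(backend: str, op: str):
    """Register the decorated function as the lowering of `op` for `backend`."""
    def decorator(fn):
        key = (backend, op)
        if key in _REGISTRY:
            raise ValueError(f"duplicate lowering for {key}")
        _REGISTRY[key] = fn
        return fn
    return decorator


def lower(backend: str, op: str, *args) -> str:
    """Look up and apply the lowering for a backend-agnostic op."""
    try:
        fn = _REGISTRY[(backend, op)]
    except KeyError:
        raise KeyError(f"no lowering for op {op!r} on backend {backend!r}") from None
    return fn(*args)


# The strings below are illustrative expression text, not executed code.
@lowers("polars", "lag")
def _polars_lag(col: str, n: int) -> str:
    return f'pl.col("{col}").shift({n})'


@lowers("dolphindb", "lag")
def _dolphindb_lag(col: str, n: int) -> str:
    return f"move({col}, {n})"


batch_expr = lower("polars", "lag", "close", 1)
stream_expr = lower("dolphindb", "lag", "close", 1)
```

The IR refers only to the agnostic op name (`lag`); adding a backend means registering more functions, with no change to the IR or the compiler front end.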
Execution
Polars lazy pipeline (batch) + DolphinDB stream engines (streaming).
Every feature computation falls into one of six primitives. The compiler understands the semantics of each and optimizes accordingly.
| Primitive | Nature |
|---|---|
| SOURCE | Data ingestion from CDM tables |
| TRANSFORM | Stateless row-wise mapping (mid_price, trade_sign, depth_imbalance) |
| WINDOW | Rolling/windowed aggregation (rolling_vol, rolling_corr, pct_change, lag) |
| STATE | Recursive computations with memory (ema, decay_accum, rolling_zscore) |
| SINK | Marks output columns: final feature values written to storage |
| EVENT | Bar trigger generation for information-driven sampling |
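The difference between WINDOW and STATE is easiest to see side by side. The sketch below is plain Python, not FeatureDAG code: a rolling mean depends only on the last `window` inputs, while an EMA depends on its own previous output. Seeding the EMA with the first observation is an assumed convention.

```python
from collections import deque


def rolling_mean(values, window):
    """WINDOW: each output is a function of the last `window` inputs only."""
    buf = deque(maxlen=window)
    out = []
    for v in values:
        buf.append(v)
        # Emit None until the window is full.
        out.append(sum(buf) / window if len(buf) == window else None)
    return out


def ema(values, alpha):
    """STATE: each output depends on the previous output (recursive memory)."""
    out = []
    state = None
    for v in values:
        # Assumed convention: the first observation seeds the state.
        state = v if state is None else alpha * v + (1 - alpha) * state
        out.append(state)
    return out


window_out = rolling_mean([1, 2, 3, 4], window=2)
state_out = ema([1.0, 2.0, 3.0], alpha=0.5)
```

A compiler that knows this distinction can bound the memory of a WINDOW node by its window length, whereas a STATE node must carry its accumulator across the whole stream.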
Same feature definitions, two execution models. The divergence happens only at the expression-generation layer.
| | Batch | Streaming |
|---|---|---|
| Runtime | Polars (Python) | DolphinDB cluster |
| Data | Static Arrow tables | Unbounded stream tables |
| Trigger | Explicit run() | Continuous, on data arrival |
| Grouping | Sequential per-feature | Consolidated engines (~60% fewer) |
| Use | Research | Live trading, real-time signals |
Same definitions, two runtimes: no duplicate implementations.
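The idea of one definition driving two execution models can be sketched in plain Python. Here a single step function defines the feature, and two hypothetical runners (`run_batch` and `StreamEngine`, neither of which is a FeatureDAG API) apply it either over a static list or one tick at a time.

```python
def pct_change_step(prev_price, price):
    """Single feature definition: returns (new_state, output) for one tick."""
    value = None if prev_price is None else price / prev_price - 1.0
    return price, value


def run_batch(step, prices):
    """Batch model: explicit run over a static, bounded input."""
    state, out = None, []
    for p in prices:
        state, value = step(state, p)
        out.append(value)
    return out


class StreamEngine:
    """Streaming model: state lives in the engine, output is emitted per tick."""

    def __init__(self, step):
        self._step = step
        self._state = None

    def on_tick(self, price):
        self._state, value = self._step(self._state, price)
        return value


prices = [100.0, 110.0, 99.0]
batch_out = run_batch(pct_change_step, prices)

engine = StreamEngine(pct_change_step)
stream_out = [engine.on_tick(p) for p in prices]
```

Because both runners call the same step function, a parity check between `batch_out` and `stream_out` tests the runners, not two separate implementations of the feature.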
FeatureDAG ships with 133 FeatureTypes across 6 dimensions (Signal, Execution, Quality, Regime, Stability, Technical).
Each feature is documented with its inputs, parameters, computation type, and economic meaning. Features are consumed directly by the compiler; no manual implementation is needed.