Overview
FeatureDAG is a compiler for quantitative finance features. Formula strings compile through a four-stage pipeline (AST compiler, IR DAG, lowering, execution) and produce batch (Polars) and streaming (DolphinDB) results from the same YAML specification.
Why
In a typical quant organization, the same features — OFI, rolling z-score, VPIN, momentum signals — are written and maintained independently by multiple teams. Each researcher authors them in notebooks. Each production engineer rewrites them for the live pipeline. Each new hire rediscovers them from scratch. The result:
- Duplicated effort — teams across the organization rebuild the same core logic, with no shared library or single source of truth
- Divergent implementations — the same feature computed in research and production can produce different numbers because each version encodes subtly different logic
- Quality blind spots — without centralized validation, bugs in feature code go undetected; a researcher's notebook error might surface only months later when a model underperforms
- Notebook → production gap — the research version of a feature doesn't match what runs in production; every new feature requires a full rewrite cycle to bridge this gap
FeatureDAG addresses this by treating features as declarative computation graphs, not hand-coded functions spread across notebooks and scripts. A formula string in YAML compiles through Python's AST module into an Intermediate Representation (IR), which lowers to backend-specific expressions. Same formula, same IR, two backends — one source of truth for the entire organization.
[Formula YAML] → [AST Compiler] → [IR DAG] → [Lowering] → [Execution]
                                                  │
                                       ┌──────────┼──────────┐
                                       ▼                     ▼
                                Polars (batch)     DolphinDB (streaming)
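To make the first two stages concrete, here is a minimal sketch of how a formula string could be parsed with Python's `ast` module and turned into immutable IR nodes. The `IRNode` fields, the `BUILTINS` set, and the `compile_formula` helper are hypothetical names chosen for illustration. They are not FeatureDAG's actual API, and the real compiler dispatches many more functions.

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class IRNode:
    """Hypothetical immutable IR node: an op name plus child nodes or literals."""
    op: str
    args: tuple = ()


# Hypothetical subset of built-in functions; the real set is much larger.
BUILTINS = {"rolling_mean", "rolling_std"}
BINOPS = {ast.Add: "add", ast.Sub: "sub", ast.Mult: "mul", ast.Div: "div"}


def compile_formula(formula: str) -> IRNode:
    """Parse a formula string and convert its syntax tree into IRNodes."""
    tree = ast.parse(formula, mode="eval")
    return _visit(tree.body)


def _visit(node: ast.AST) -> IRNode:
    if isinstance(node, ast.Name):        # bare identifier: a source column
        return IRNode("source", (node.id,))
    if isinstance(node, ast.Constant):    # literal such as a window length
        return IRNode("const", (node.value,))
    if isinstance(node, ast.BinOp) and type(node.op) in BINOPS:
        return IRNode(BINOPS[type(node.op)],
                      (_visit(node.left), _visit(node.right)))
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        if node.func.id not in BUILTINS:
            raise ValueError(f"unknown function: {node.func.id}")
        return IRNode(node.func.id, tuple(_visit(a) for a in node.args))
    raise ValueError(f"unsupported syntax: {ast.dump(node)}")


# Rolling z-score expressed as a formula string.
zscore = compile_formula(
    "(close - rolling_mean(close, 20)) / rolling_std(close, 20)"
)
print(zscore.op)  # root of the tree is the division
```

Because the nodes are frozen dataclasses, any attempt to reassign a field after construction raises an error, which is the property the "Frozen IRNode" design decision relies on.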
How It Works
| Stage | Role |
|---|---|
| 1. AST Compiler | Parses formula strings via Python's ast module, walks the syntax tree, dispatches ~40 built-in functions to IR nodes |
| 2. IR DAG | Frozen, validated DAG nodes (rustworkx) with 50+ compile-time schema contracts that catch type errors before execution |
| 3. Lowering | Translates IR into backend expressions: pl.Expr objects (Polars) or DolphinDB DSL strings, dispatched via a decorator-based registry |
| 4. Execution | Runs the lowered expressions — Polars lazy DataFrame pipeline (batch) or DolphinDB stream engines (streaming) |
→ Type System · AST Compiler · IR DAG · Lowering · Execution
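The lowering stage's decorator-based registry can be sketched as follows. This is an illustration under assumptions: the `lowers` decorator, the `_LOWERINGS` table, and the `IRNode` shape are hypothetical. Plain strings stand in for `pl.Expr` objects so the sketch runs without Polars installed, and the emitted DolphinDB text is only meant to suggest what a DSL string might look like.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class IRNode:
    """Hypothetical IR node: op name plus child nodes or literal arguments."""
    op: str
    args: tuple = ()


# (backend, op) -> lowering rule; populated by the decorator below.
_LOWERINGS: dict[tuple[str, str], Callable[..., str]] = {}


def lowers(backend: str, op: str):
    """Register a function as the lowering rule for one op on one backend."""
    def register(fn):
        _LOWERINGS[(backend, op)] = fn
        return fn
    return register


def lower(node: IRNode, backend: str) -> str:
    """Lower children first, then apply the rule registered for this op."""
    try:
        rule = _LOWERINGS[(backend, node.op)]
    except KeyError:
        raise NotImplementedError(
            f"no {backend} lowering for op {node.op!r}") from None
    lowered = [lower(a, backend) if isinstance(a, IRNode) else a
               for a in node.args]
    return rule(*lowered)


@lowers("polars", "source")
def _polars_source(name):
    return f'pl.col("{name}")'


@lowers("dolphindb", "source")
def _ddb_source(name):
    return name


@lowers("polars", "rolling_mean")
def _polars_rolling_mean(expr, window):
    return f"{expr}.rolling_mean({window})"


@lowers("dolphindb", "rolling_mean")
def _ddb_rolling_mean(expr, window):
    return f"mavg({expr}, {window})"


@lowers("polars", "sub")
@lowers("dolphindb", "sub")
def _sub(a, b):
    # Subtraction is written the same way in both targets.
    return f"({a} - {b})"


close = IRNode("source", ("close",))
feature = IRNode("sub", (close, IRNode("rolling_mean", (close, 20))))
print(lower(feature, "polars"))
print(lower(feature, "dolphindb"))
```

Keying the registry on the (backend, op) pair means a new backend only needs to register rules for the ops it supports; an unregistered pair fails loudly rather than silently falling through.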
Integration
state_engine                  label_engine
     │                             │
     │ (enriched CDM tables)       │ (target labels)
     ▼                             ▼
              feature_engine
                    │
                    │ (computed features)
                    ▼
 sinks (warehouse, Kafka, stream tables)
FeatureDAG sits downstream of MarketState. It consumes enriched CDM tables and target labels, and outputs computed features.
- state_engine — Produces enriched CDM tables consumed as SOURCE inputs
- label_engine — Generates target labels for supervised learning features
- batch_runner — End-to-end batch pipeline: load data → compile formulas → build IR DAG → lower → execute → write to sinks
- streaming/dolphindb — Deploys features as DolphinDB streaming pipelines for live trading
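The batch_runner flow listed above can be sketched as a loop that processes each feature independently, so that one failure is recorded rather than aborting the run. Everything here is a simplified stand-in: `run_batch`, `RunReport`, and the toy compile and execute functions are hypothetical, and the real compile, IR, and lowering stages are collapsed into a single `compile_fn` call.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RunReport:
    results: dict = field(default_factory=dict)  # feature name -> values
    errors: dict = field(default_factory=dict)   # feature name -> message


def run_batch(formulas: dict[str, str],
              data: dict[str, list[float]],
              compile_fn: Callable,
              execute_fn: Callable) -> RunReport:
    """Compile and execute each feature; isolate failures per feature."""
    report = RunReport()
    for name, formula in formulas.items():
        try:
            plan = compile_fn(formula)  # compile -> IR -> lower (collapsed)
            report.results[name] = execute_fn(plan, data)
        except Exception as exc:
            # Record the failure and keep going with the next feature.
            report.errors[name] = f"{type(exc).__name__}: {exc}"
    return report


# Toy stages that only understand "a - b" over two columns.
def toy_compile(formula: str) -> tuple[str, str]:
    left, right = (part.strip() for part in formula.split("-"))
    return left, right


def toy_execute(plan: tuple[str, str], data: dict[str, list[float]]):
    left, right = plan
    return [a - b for a, b in zip(data[left], data[right])]


data = {"ask": [10.5, 10.75], "bid": [10.25, 10.25]}
formulas = {"spread": "ask - bid", "broken": "ask - missing_col"}
report = run_batch(formulas, data, toy_compile, toy_execute)
print(report.results)
print(report.errors)
```

In this run the `spread` feature is computed while `broken` lands in `report.errors`, which is the behaviour per-feature error isolation is meant to guarantee.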
Two Execution Paths
| Aspect | Batch (Polars) | Streaming (DolphinDB) |
|---|---|---|
| Runtime | Python process | DolphinDB cluster |
| Data model | Static Arrow tables | Unbounded stream tables |
| Trigger | Explicit run() | Continuous, on data arrival |
| Expression model | pl.Expr objects (lazy) | DolphinDB DSL strings |
| Primary use | Research | Live trading, real-time signals |
Both paths consume the same feature definitions — no duplicate implementations, no research-production drift.
Key Design Decisions
- Template → Instance separation — One FeatureType blueprint yields many FeatureInstance parameterizations
- Schema contracts over runtime errors — 50+ OP_CONTRACTS validate column types at DAG construction time
- Frozen IRNode — Immutable after construction; no accidental mutation during optimization passes
- Protocol-based backends — Python Protocol (structural subtyping), no inheritance required
- Per-feature error isolation — One broken feature doesn't block the entire pipeline run
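Three of the decisions above (template and instance separation, schema contracts, and Protocol-based backends) can be illustrated together in a short sketch. The field names, the two-entry `OP_CONTRACTS` table, and the `Backend` method signature are assumptions made for the example; only the names `FeatureType`, `FeatureInstance`, and `OP_CONTRACTS` come from the text.

```python
from dataclasses import dataclass
from itertools import product
from typing import Protocol


@dataclass(frozen=True)
class FeatureType:
    """Blueprint: a formula template plus the parameter values it accepts."""
    name: str
    template: str
    param_grid: dict  # parameter name -> list of values


@dataclass(frozen=True)
class FeatureInstance:
    name: str
    formula: str


def instantiate(ft: FeatureType) -> list[FeatureInstance]:
    """Expand one blueprint into one instance per parameter combination."""
    keys = list(ft.param_grid)
    instances = []
    for values in product(*(ft.param_grid[k] for k in keys)):
        suffix = "_".join(str(v) for v in values)
        formula = ft.template.format(**dict(zip(keys, values)))
        instances.append(FeatureInstance(f"{ft.name}_{suffix}", formula))
    return instances


# Tiny hypothetical stand-in for the contract table: op -> required dtype.
OP_CONTRACTS = {"rolling_mean": "float", "rolling_std": "float"}


def check_contract(op: str, column: str, schema: dict[str, str]) -> None:
    """Fail at graph-construction time if the column type violates the op."""
    expected, actual = OP_CONTRACTS[op], schema.get(column)
    if actual != expected:
        raise TypeError(f"{op}({column}): expected {expected}, got {actual}")


class Backend(Protocol):
    def lower(self, op: str, column: str, window: int) -> str: ...


class DslBackend:  # satisfies Backend structurally, no inheritance
    def lower(self, op: str, column: str, window: int) -> str:
        return f"{op}({column}, {window})"


def lower_with(backend: Backend, op: str, column: str, window: int) -> str:
    return backend.lower(op, column, window)


zscore_type = FeatureType(
    "zscore",
    "(close - rolling_mean(close, {window})) / rolling_std(close, {window})",
    {"window": [20, 60]},
)
instances = instantiate(zscore_type)
schema = {"close": "float", "symbol": "str"}
check_contract("rolling_mean", "close", schema)  # passes silently
print([inst.name for inst in instances])
```

Calling `check_contract("rolling_mean", "symbol", schema)` raises a `TypeError` before any data is touched, which is the point of validating at DAG construction time rather than at execution.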
See Also
- Feature Library — 133 FeatureTypes