QuantFlow FeatureDAG

Declarative YAML → IR DAG → Polars (batch) + DolphinDB (streaming)

Where It Fits

DataInfra → MarketState → FeatureDAG → Research / Trading

Overview

FeatureDAG is a compiler for quantitative finance features. It closes the research-to-production gap by treating financial features as declarative computation graphs rather than hand-coded functions. A single YAML specification compiles through a 4-stage pipeline and produces both batch (Polars) and streaming (DolphinDB) execution from the same definition.

Quant researchers write features in notebooks with pandas and one-off scripts. Production engineers then rewrite them for streaming pipelines. The two implementations diverge: results drift, bugs creep in, and every new feature requires a full rewrite cycle. Compiling both paths from one specification removes that rewrite step.

4-Stage Compilation Pipeline

[Formula YAML] → [AST Compiler] → [IR DAG] → [Lowering] → [Execution]
                                                          ├─ Polars (batch)
                                                          └─ DolphinDB (streaming)

Stage 1: AST Compiler

Formula string → IR nodes. Parses Python AST, dispatches ~40 built-in functions.
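
As a rough illustration of this stage, the sketch below parses a formula string with Python's standard `ast` module and dispatches function calls through a small lookup table. The `IRNode` shape, the `BUILTINS` table, and the `compile_formula` name are assumptions made for this example and are not FeatureDAG's actual API. Only three function names from the primitives table are registered.

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class IRNode:
    """Hypothetical IR node: an operation name plus child nodes or literals."""
    op: str
    args: tuple = ()


# Hypothetical dispatch table: built-in function name -> expected argument count.
BUILTINS = {"ema": 2, "lag": 2, "rolling_vol": 2}

# Arithmetic operators the sketch understands, mapped to IR op names.
BINOPS = {ast.Add: "add", ast.Sub: "sub", ast.Mult: "mul", ast.Div: "div"}


def _to_ir(node: ast.AST) -> IRNode:
    """Recursively convert a Python AST expression into IR nodes."""
    if isinstance(node, ast.Name):
        return IRNode("col", (node.id,))  # bare names are treated as columns
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return IRNode("const", (node.value,))
    if isinstance(node, ast.BinOp) and type(node.op) in BINOPS:
        return IRNode(BINOPS[type(node.op)], (_to_ir(node.left), _to_ir(node.right)))
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        name = node.func.id
        if name not in BUILTINS:
            raise ValueError(f"unknown function: {name}")
        if node.keywords or len(node.args) != BUILTINS[name]:
            raise ValueError(f"{name} expects {BUILTINS[name]} positional arguments")
        return IRNode(name, tuple(_to_ir(a) for a in node.args))
    raise ValueError(f"unsupported syntax: {type(node).__name__}")


def compile_formula(formula: str) -> IRNode:
    """Parse a formula string as a single expression and build its IR tree."""
    tree = ast.parse(formula, mode="eval")
    return _to_ir(tree.body)


root = compile_formula("ema((bid + ask) / 2, 20)")
print(root.op, root.args[1])
```

Parsing in `eval` mode restricts the input to a single expression, and rejecting every node type that is not explicitly handled keeps arbitrary Python out of the formula language.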

Stage 2: IR DAG

Frozen IR nodes. 50+ schema contracts. Column aliasing. Rustworkx DAG.
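
A minimal sketch of what frozen nodes arranged in a DAG might look like follows. To stay self-contained it uses the standard library's `graphlib.TopologicalSorter` in place of Rustworkx, and the `FeatureNode` fields and the `execution_order` helper are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from graphlib import CycleError, TopologicalSorter


@dataclass(frozen=True)
class FeatureNode:
    """Hypothetical immutable IR node identified by its output column name."""
    name: str
    op: str
    inputs: tuple = ()  # names of upstream nodes this node reads from


def execution_order(nodes):
    """Return node names so that every node appears after all of its inputs."""
    known = {n.name for n in nodes}
    for n in nodes:
        for dep in n.inputs:
            if dep not in known:
                raise KeyError(f"{n.name} depends on undefined node {dep!r}")
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {n.name: set(n.inputs) for n in nodes}
    return list(TopologicalSorter(graph).static_order())


nodes = [
    FeatureNode("vol_20", "rolling_vol", ("ret",)),
    FeatureNode("ret", "pct_change", ("mid_price",)),
    FeatureNode("mid_price", "mid_price", ("quotes",)),
    FeatureNode("quotes", "source"),
]
print(execution_order(nodes))
```

Freezing the dataclass means a node cannot be mutated after it enters the graph, so it can be hashed and shared safely. A cyclic definition surfaces as `CycleError` when the order is computed.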

Stage 3: Lowering

Backend protocol. Decorator-based registry. 30+ agnostic ops.
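
One way to picture a decorator-based registry is sketched below. The `lowers` decorator, the `LOWERINGS` table, and the tuple-based IR are assumptions for illustration only. The lowerings emit expression text rather than calling either engine, so the `pl.col`, `pl.lit`, and `rolling_mean` strings (Polars) and the `mavg` string (DolphinDB) are output text here, not executed calls.

```python
from typing import Callable

# Hypothetical registry: (backend, op) -> function that builds expression text.
LOWERINGS: dict[tuple[str, str], Callable[..., str]] = {}


def lowers(backend: str, op: str):
    """Register the decorated function as the lowering of `op` for `backend`."""
    def decorator(fn):
        LOWERINGS[(backend, op)] = fn
        return fn  # returning fn allows stacking for ops shared by backends
    return decorator


@lowers("polars", "col")
def polars_col(name):
    return f'pl.col("{name}")'


@lowers("polars", "const")
def polars_const(value):
    return f"pl.lit({value})"


@lowers("polars", "rolling_mean")
def polars_rolling_mean(x, window):
    return f"{x}.rolling_mean({window})"


@lowers("dolphindb", "col")
def ddb_col(name):
    return name


@lowers("dolphindb", "const")
def ddb_const(value):
    return str(value)


@lowers("dolphindb", "rolling_mean")
def ddb_rolling_mean(x, window):
    return f"mavg({x}, {window})"


@lowers("polars", "add")
@lowers("dolphindb", "add")
def lower_add(a, b):
    return f"({a} + {b})"


@lowers("polars", "div")
@lowers("dolphindb", "div")
def lower_div(a, b):
    return f"({a} / {b})"


def lower(node, backend):
    """Lower a nested-tuple IR node, children first, for the given backend."""
    op, *args = node
    fn = LOWERINGS.get((backend, op))
    if fn is None:
        raise NotImplementedError(f"{backend} has no lowering for {op!r}")
    return fn(*[lower(a, backend) if isinstance(a, tuple) else a for a in args])


ir = ("rolling_mean", ("div", ("add", ("col", "bid"), ("col", "ask")), ("const", 2)), 20)
print(lower(ir, "polars"))
print(lower(ir, "dolphindb"))
```

Under this design, adding a backend means registering new functions against the same op names. The IR and the `lower` walker stay unchanged.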

Stage 4: Execution

Polars lazy pipeline (batch) + DolphinDB stream engines (streaming).

6 Computation Primitives

Every feature computation falls into one of six primitives. The compiler understands the semantics of each and optimizes accordingly.

Primitive	Nature
SOURCE	Data ingestion from CDM tables
TRANSFORM	Stateless row-wise mapping (mid_price, trade_sign, depth_imbalance)
WINDOW	Rolling/windowed aggregation (rolling_vol, rolling_corr, pct_change, lag)
STATE	Recursive computations with memory (ema, decay_accum, rolling_zscore)
SINK	Marks output columns — final feature values written to storage
EVENT	Bar trigger generation for information-driven sampling
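
To make the distinction between TRANSFORM, WINDOW, and STATE concrete, here is a plain-Python sketch of one example from each row. These are illustrative reference implementations, not FeatureDAG's own. In particular, the EMA smoothing factor `2 / (span + 1)` and seeding with the first observation are assumptions, since the source does not specify either.

```python
from enum import Enum


class Primitive(Enum):
    SOURCE = "source"
    TRANSFORM = "transform"
    WINDOW = "window"
    STATE = "state"
    SINK = "sink"
    EVENT = "event"


def mid_price(bid, ask):
    """TRANSFORM: each output row depends only on the same input row."""
    return [(b + a) / 2 for b, a in zip(bid, ask)]


def lag(xs, n):
    """WINDOW: each output row reads a fixed, bounded lookback of n rows."""
    return ([None] * n + list(xs))[: len(xs)]


def ema(xs, span):
    """STATE: each output row depends on the previous output (recursive)."""
    alpha = 2 / (span + 1)  # assumed smoothing convention
    out, prev = [], None
    for x in xs:
        prev = x if prev is None else alpha * x + (1 - alpha) * prev
        out.append(prev)
    return out


print(mid_price([1, 3], [3, 5]))
print(lag([1, 2, 3], 1))
print(ema([1, 2, 3], span=3))
```

The practical difference shows up in streaming: a TRANSFORM needs no memory, a WINDOW needs a buffer of bounded size, and a STATE computation must carry its previous output forward between events.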

Batch + Streaming Consistency

Same feature definitions, two execution models. The divergence happens only at the expression-generation layer.

Aspect	Batch	Streaming
Runtime	Polars (Python)	DolphinDB cluster
Data	Static Arrow tables	Unbounded stream tables
Trigger	Explicit run()	Continuous, on data arrival
Grouping	Sequential per-feature	Consolidated engines (~60% fewer)
Use	Research	Live trading, real-time signals

Same definitions, two runtimes — no duplicate implementations.
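
The consolidated-engine grouping on the streaming side can be pictured as a group-by on the input table. The sketch below is a hypothetical illustration: the `consolidate` helper, the feature tuples, and the table names are invented for the example, and the reduction it shows depends entirely on the sample data rather than reproducing the ~60% figure.

```python
from collections import defaultdict


def consolidate(features):
    """Group (feature_name, input_table, expression) triples by input table.

    Each resulting group stands for one streaming engine that computes
    every feature reading from that table.
    """
    engines = defaultdict(list)
    for name, table, expr in features:
        engines[table].append((name, expr))
    return dict(engines)


features = [
    ("mid_price", "quotes", "(bid + ask) / 2"),
    ("spread", "quotes", "ask - bid"),
    ("depth_imbalance", "quotes", "(bid_size - ask_size) / (bid_size + ask_size)"),
    ("trade_sign", "trades", "sign(price - prev_price)"),
    ("notional", "trades", "price * size"),
]

engines = consolidate(features)
print(f"{len(features)} features -> {len(engines)} engines")
for table, outputs in engines.items():
    print(table, [name for name, _ in outputs])
```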

Key Design Decisions

Declarative feature definitions: YAML specs compile to IR, so the pipeline contains no hand-coded computation logic.
Single IR, multiple backends: The IR is backend-agnostic. Lowering functions produce engine-specific expressions for Polars and DolphinDB from the same IR nodes.
Feature-level error isolation: A misconfigured feature is logged and skipped while the rest of the pipeline continues, which keeps research iteration fast.
LazyFrame chaining in batch: Polars expressions are chained via successive with_columns calls, letting the query optimizer fuse operations and minimize allocations.
Consolidated engine deployment in streaming: Multiple features sharing the same input table deploy as a single DolphinDB engine, reducing engine count by ~60%.
Expression folding in streaming: Intermediate computation steps are inlined into terminal expressions via regex substitution, so deployed engines only see final outputs.
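
Expression folding by regex substitution could look roughly like the sketch below. The `fold` function and the sample expressions are hypothetical. It assumes intermediates are plain identifiers, that wrapping each inlined definition in parentheses is enough to preserve precedence, and that a bounded number of passes is an acceptable guard against cyclic definitions.

```python
import re


def fold(terminal, intermediates, max_passes=32):
    """Inline intermediate definitions into a terminal expression.

    `intermediates` maps an intermediate column name to its expression.
    Substitution repeats until the expression stops changing, so that
    intermediates defined in terms of other intermediates are resolved.
    """
    if not intermediates:
        return terminal
    names = sorted(intermediates, key=len, reverse=True)
    # \b keeps "mid" from matching inside a longer identifier like "mid_price".
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
    expr = terminal
    for _ in range(max_passes):
        folded = pattern.sub(lambda m: f"({intermediates[m.group(1)]})", expr)
        if folded == expr:
            return expr
        expr = folded
    raise ValueError("intermediate definitions appear to be cyclic")


intermediates = {
    "mid": "(bid + ask) / 2",
    "rel_spread": "(ask - bid) / mid",
}
print(fold("ema(rel_spread, 20)", intermediates))
# → ema(((ask - bid) / ((bid + ask) / 2)), 20)
```

After folding, the terminal expression refers only to source columns, so the intermediate names never need to exist as columns in the deployed engine.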

Standard Feature Library

FeatureDAG ships with 133 FeatureTypes across 6 dimensions (Signal, Execution, Quality, Regime, Stability, Technical).

Each feature is documented with its inputs, parameters, computation type, and economic meaning. Features are consumed directly by the compiler — no manual implementation needed.

Browse the Standard Feature Library →

Design Docs

FeatureDAG Overview: 4-stage pipeline (AST compiler, IR DAG, lowering, execution)
Execution Layer: Batch (Polars) and streaming (DolphinDB) paths in detail
Formula Language Reference: ~40 math functions compiled to IR DAG
