Overview
How feature DAGs become running computations: the Polars batch path for research, and the DolphinDB streaming path for live production. Same feature definitions, two execution models.
Architecture
The execution layer sits below the type system and IR compiler. It receives resolved FeatureInstance objects and produces computed feature values at the destination — Arrow tables for batch, shared stream tables for streaming.
FeatureInstance[]
│
┌───────────────┴───────────────┐
▼ ▼
Batch Path (Polars) Streaming Path (DolphinDB)
│ │
▼ ▼
Warehouse tables DolphinDB stream tables
Key principle: feature definitions (YAML → FeatureType → FeatureInstance) are identical across both paths. The divergence happens only at the expression-generation layer.
Engine-Agnostic by Design
Polars and DolphinDB are the current backends — they are not the only possible backends. The execution layer is built on a backend protocol (structural subtyping, ~20 methods) and a lowering registry (decorator-based dispatch). Adding a new engine means:
- Implement the
BaseBackendprotocol for the target engine - Register lowering functions via
@register(Primitive, op)for engine-specific operations - Backend-agnostic operations (~30 in
base.py) come free — they call only protocol methods
Same IR, same semantics, any backend. The compiler doesn't care whether it's targeting a warehouse, a stream processor, or a GPU runtime.
Batch vs. Streaming
| Aspect | Batch (Polars) | Streaming (DolphinDB) |
|---|---|---|
| Runtime | Python process (single machine) | DolphinDB cluster (distributed) |
| Data model | Static Arrow tables | Unbounded stream tables |
| Execution trigger | Explicit run() call | Continuous — data arrival triggers computation |
| Expression model | pl.Expr objects (lazy) | DolphinDB DSL strings (deployed as engines) |
| Feature grouping | Sequential per-feature | Consolidated engines by input table |
| Array handling | map_elements Python UDFs | Handler functions (DolphinDB script) |
| Intermediate state | In-memory LazyFrame columns | Shared stream tables between engines |
| Error handling | Feature-level try/except, skip on failure | Engine-level monitoring, re-deployment |
| Backfilling | Natural — run over any date range | Requires historical replay path |
| Primary use | Research, batch scoring | Live trading, real-time signal generation |
Both paths consume the same FeatureInstance objects — single definition, dual runtime.
Mode Polymorphism
Both backends support three computation modes, configured per feature:
| Mode | When Features Compute | DDB Implementation | Polars Implementation |
|---|---|---|---|
tick | Per-event (every tick) | createReactiveStateEngine | Direct column expressions |
bar | Per-bar (OHLCV boundary) | createTimeSeriesEngine | Filtered by bar_type |
tick_to_bar | Tick features → bar aggregate | Two-phase: tick engines + bar createTimeSeriesEngine | Feature DAG + temporal join + group-by agg |
tick: Lowest latency. Features update on every market data event. Used for HFT signals, real-time spread monitoring, and execution algorithms that need per-event updates.
bar: Features compute on bar boundaries. Used for bar-native strategies (OHLCV pattern recognition, bar-level momentum) where tick granularity is unnecessary and would create noise.
tick_to_bar: The most common intraday mode. Features are computed at tick resolution (capturing microstructure dynamics), then projected onto a bar grid via temporal aggregation with per-feature aggregation methods (mean, std, last, sum, min, max). This captures tick-level information while producing bar-aligned outputs suitable for ML model training.
Shared IR + Lowering Registry
Both execution paths share the same compilation infrastructure:
Feature definitions
│
FeatureType definitions (shared metadata)
│
QuantflowIRBuilder (shared IR DAG via rustworkx)
│
Lowering Registry (@register dispatch)
/ \
/ \
DolphinDB Polars
streaming batch
Why this matters: A feature's formula is compiled to an IR DAG once. The lowering registry dispatches the same (Primitive, op) pairs to backend-specific functions. @register(Primitive.WINDOW, "rolling_mean") registers the Polars implementation. @register(Primitive.WINDOW, "rolling_mean", backend="dolphindb") registers the DolphinDB implementation. resolve_lowering(node, backend_name) selects the correct one at runtime. Adding a new backend means registering lowering functions — no IR changes needed.
Deployment Lifecycle
Both paths follow a similar lifecycle, adapted to their runtime models:
Batch (Polars) — run-once lifecycle:
- Load project metadata (features, engine config)
- Read source CDM tables from warehouse per symbol
- Merge tables on time/symbol keys
- Compile formulas → IR DAG via
QuantflowIRBuilder - Execute DAG via
PolarsDAGExecutor(lazy,collect()at end) - Apply post-processing: transforms → bar alignment (if tick_to_bar) → normalization
- Write results to warehouse sinks
- Next symbol — cleared memory, fresh computation
Streaming (DolphinDB) — deploy-and-forget lifecycle:
- Load project metadata (features, engine config)
- Connect to DolphinDB cluster (pool, retry)
- Deploy ingest (create stream tables, start data sources)
- Deploy process (state engine: bars, snapshots, trade enrichment)
- Deploy features (IR DAG → layer-based topological engine deployment)
- Deploy feature output tables (dimension-aware consolidation)
- Python disconnects — pipeline runs autonomously inside DolphinDB
- Monitor via
getStreamEngineStat(), tear down in reverse order on stop
The batch path is run-once-per-symbol. The streaming path is deploy-once-run-continuously. This fundamental difference shapes every design decision in the execution layer.
→ Batch Execution (Polars) · Streaming Execution (DolphinDB)