Overview

How feature DAGs become running computations: the Polars batch path for research, and the DolphinDB streaming path for live production. Same feature definitions, two execution models.

Architecture

The execution layer sits below the type system and IR compiler. It receives resolved FeatureInstance objects and produces computed feature values at the destination — Arrow tables for batch, shared stream tables for streaming.

                        FeatureInstance[]
                              │
              ┌───────────────┴───────────────┐
              ▼                               ▼
     Batch Path (Polars)            Streaming Path (DolphinDB)
              │                               │
              ▼                               ▼
     Warehouse tables           DolphinDB stream tables

Key principle: feature definitions (YAML → FeatureType → FeatureInstance) are identical across both paths. The divergence happens only at the expression-generation layer.

Engine-Agnostic by Design

Polars and DolphinDB are the current backends — they are not the only possible backends. The execution layer is built on a backend protocol (structural subtyping, ~20 methods) and a lowering registry (decorator-based dispatch). Adding a new engine means:

Implement the BaseBackend protocol for the target engine
Register lowering functions via @register(Primitive, op) for engine-specific operations
Backend-agnostic operations (~30 in base.py) come free — they call only protocol methods

Same IR, same semantics, any backend. The compiler doesn't care whether it's targeting a warehouse, a stream processor, or a GPU runtime.

Batch vs. Streaming

Aspect	Batch (Polars)	Streaming (DolphinDB)
Runtime	Python process (single machine)	DolphinDB cluster (distributed)
Data model	Static Arrow tables	Unbounded stream tables
Execution trigger	Explicit `run()` call	Continuous — data arrival triggers computation
Expression model	`pl.Expr` objects (lazy)	DolphinDB DSL strings (deployed as engines)
Feature grouping	Sequential per-feature	Consolidated engines by input table
Array handling	`map_elements` Python UDFs	Handler functions (DolphinDB script)
Intermediate state	In-memory `LazyFrame` columns	Shared stream tables between engines
Error handling	Feature-level try/except, skip on failure	Engine-level monitoring, re-deployment
Backfilling	Natural — run over any date range	Requires historical replay path
Primary use	Research, batch scoring	Live trading, real-time signal generation

Both paths consume the same FeatureInstance objects — single definition, dual runtime.

Mode Polymorphism

Both backends support three computation modes, configured per feature:

Mode	When Features Compute	DDB Implementation	Polars Implementation
`tick`	Per-event (every tick)	`createReactiveStateEngine`	Direct column expressions
`bar`	Per-bar (OHLCV boundary)	`createTimeSeriesEngine`	Filtered by `bar_type`
`tick_to_bar`	Tick features → bar aggregate	Two-phase: tick engines + bar `createTimeSeriesEngine`	Feature DAG + temporal join + group-by agg

tick: Lowest latency. Features update on every market data event. Used for HFT signals, real-time spread monitoring, and execution algorithms that need per-event updates.

bar: Features compute on bar boundaries. Used for bar-native strategies (OHLCV pattern recognition, bar-level momentum) where tick granularity is unnecessary and would create noise.

tick_to_bar: The most common intraday mode. Features are computed at tick resolution (capturing microstructure dynamics), then projected onto a bar grid via temporal aggregation with per-feature aggregation methods (mean, std, last, sum, min, max). This captures tick-level information while producing bar-aligned outputs suitable for ML model training.

Shared IR + Lowering Registry

Both execution paths share the same compilation infrastructure:

Feature definitions
        │
  FeatureType definitions (shared metadata)
        │
  QuantflowIRBuilder (shared IR DAG via rustworkx)
        │
  Lowering Registry (@register dispatch)
       / \
      /   \
DolphinDB    Polars
streaming    batch

Why this matters: A feature's formula is compiled to an IR DAG once. The lowering registry dispatches the same (Primitive, op) pairs to backend-specific functions. @register(Primitive.WINDOW, "rolling_mean") registers the Polars implementation. @register(Primitive.WINDOW, "rolling_mean", backend="dolphindb") registers the DolphinDB implementation. resolve_lowering(node, backend_name) selects the correct one at runtime. Adding a new backend means registering lowering functions — no IR changes needed.

Deployment Lifecycle

Both paths follow a similar lifecycle, adapted to their runtime models:

Batch (Polars) — run-once lifecycle:

Load project metadata (features, engine config)
Read source CDM tables from warehouse per symbol
Merge tables on time/symbol keys
Compile formulas → IR DAG via QuantflowIRBuilder
Execute DAG via PolarsDAGExecutor (lazy, collect() at end)
Apply post-processing: transforms → bar alignment (if tick_to_bar) → normalization
Write results to warehouse sinks
Next symbol — cleared memory, fresh computation

Streaming (DolphinDB) — deploy-and-forget lifecycle:

Load project metadata (features, engine config)
Connect to DolphinDB cluster (pool, retry)
Deploy ingest (create stream tables, start data sources)
Deploy process (state engine: bars, snapshots, trade enrichment)
Deploy features (IR DAG → layer-based topological engine deployment)
Deploy feature output tables (dimension-aware consolidation)
Python disconnects — pipeline runs autonomously inside DolphinDB
Monitor via getStreamEngineStat(), tear down in reverse order on stop

The batch path is run-once-per-symbol. The streaming path is deploy-once-run-continuously. This fundamental difference shapes every design decision in the execution layer.

→ Batch Execution (Polars) · Streaming Execution (DolphinDB)

Architecture​

Engine-Agnostic by Design​

Batch vs. Streaming​

Mode Polymorphism​

Shared IR + Lowering Registry​

Deployment Lifecycle​