QuantFlow Execution Layer

Single definition, dual runtime — batch and streaming from the same YAML

DataInfra → MarketState → FeatureDAG → Execution

The Execution Layer sits below the FeatureDAG compiler. It receives resolved FeatureInstance objects and produces computed feature values at the destination — Arrow tables for batch (research, model training) and shared stream tables for streaming (live trading).

                FeatureInstance[]
                        │
          ┌─────────────┴─────────────┐
          ▼                           ▼
   Batch (Polars)           Streaming (DolphinDB)
          │                           │
  Warehouse tables         DolphinDB stream tables

Feature definitions (YAML → FeatureType → FeatureInstance) are identical across both paths. The divergence happens only at the expression-generation layer.

Polars — high-performance DataFrame library built on Apache Arrow and Rust.

Lazy evaluation — builds a full query plan then executes in a single optimized collect() call
Zero-copy Arrow — pipeline stages pass Arrow tables with no serialization tax
Multi-threaded — automatic parallelism, Rust-powered data plane
In-process — embedded library, no server, ideal for notebooks and research

Features are computed via with_columns chaining — Polars' query optimizer applies predicate pushdown, column pruning, and operator fusion across the entire plan.

DolphinDB — high-performance distributed time-series database with built-in stream processing.

ReactiveStateEngine — declarative stream processing with windowed aggregations and stateful computations
Shared stream tables — zero-copy pub/sub between pipeline stages, no serialization, no broker
Consolidated deployment — ~60% fewer engines, features sharing input tables deploy as one engine
Python-free hot path — once deployed, all computation runs inside DolphinDB server processes

Sub-millisecond feature latency. Idempotent deployment with auto-cleanup. Expression folding inlines intermediate steps.
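
The consolidation idea can be sketched in a few lines of Python: features that read the same input table are grouped, and each group would become one engine. FeatureSpec and consolidate are hypothetical names for this illustration, not QuantFlow's API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureSpec:
    """Hypothetical stand-in for a resolved FeatureInstance."""

    name: str
    input_table: str


def consolidate(features):
    """Group feature names by input table; one group per engine."""
    groups = defaultdict(list)
    for feature in features:
        groups[feature.input_table].append(feature.name)
    return dict(groups)


features = [
    FeatureSpec("mid_price", "quotes"),
    FeatureSpec("spread", "quotes"),
    FeatureSpec("trade_imbalance", "trades"),
    FeatureSpec("vwap", "trades"),
    FeatureSpec("rel_spread", "quotes"),
]
engines = consolidate(features)
```

In this toy input, five features collapse into two engine groups. The reduction seen in practice depends on how many features share an input table.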

Both backends support three computation modes, configured per feature:

Mode	When Features Compute	Use Case
tick	Per-event (every tick)	HFT signals, real-time spread monitoring, execution algorithms
bar	Per-bar (OHLCV boundary)	Bar-native strategies — pattern recognition, bar-level momentum
tick_to_bar	Tick features → bar aggregate	Microstructure features projected onto a bar grid for ML training
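
To make the tick_to_bar mode concrete, here is a small sketch that projects per-tick feature values onto a fixed bar grid. The function name, the integer-second timestamps, and the choice of aggregations (last value, mean) are assumptions for illustration; the source does not specify how QuantFlow aggregates ticks into bars.

```python
def tick_to_bar(ticks, bar_seconds, agg):
    """Aggregate (timestamp_seconds, value) pairs into bars.

    Each tick is assigned to the bar whose start is the timestamp
    floored to a multiple of bar_seconds. Ticks must be time-ordered
    so that order-sensitive aggregations such as `last` are meaningful.
    """
    buckets = {}
    for ts, value in ticks:
        bar_start = (ts // bar_seconds) * bar_seconds
        buckets.setdefault(bar_start, []).append(value)
    return {start: agg(values) for start, values in sorted(buckets.items())}


def last(values):
    return values[-1]


def mean(values):
    return sum(values) / len(values)


# Per-tick feature values (for example, a spread computed on every tick).
ticks = [(0, 0.5), (20, 1.0), (59, 1.5), (60, 2.0), (125, 4.0)]

bars_last = tick_to_bar(ticks, 60, last)
bars_mean = tick_to_bar(ticks, 60, mean)
```

The tick mode corresponds to using the per-tick values directly, while bar mode would compute the feature from bar-level inputs instead of aggregating tick-level outputs.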

Both execution paths share the same compilation infrastructure:

Feature YAML → FeatureTypeRegistry → IR DAG (rustworkx) → Lowering Registry (@register dispatch)
                                                                      /           \
                                                         DolphinDB streaming    Polars batch

A feature's formula is compiled to an IR DAG once. The lowering registry dispatches (Primitive, op) pairs to backend-specific functions via @register. Adding a new engine means registering lowering functions — no IR changes needed.
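
A decorator-based dispatch table of this kind might look like the sketch below. It assumes string keys for the primitive, op, and backend, and it lowers to expression strings for readability; the real registry's key types and return values are not described in the source. The helper names (lower, _LOWERINGS) are hypothetical.

```python
from typing import Callable, Dict, Tuple

# (primitive, op, backend) -> lowering function
_LOWERINGS: Dict[Tuple[str, str, str], Callable[..., str]] = {}


def register(primitive: str, op: str, backend: str):
    """Register the decorated function as the lowering for one key."""

    def decorator(fn: Callable[..., str]) -> Callable[..., str]:
        _LOWERINGS[(primitive, op, backend)] = fn
        return fn

    return decorator


def lower(primitive: str, op: str, backend: str, *args) -> str:
    """Dispatch an IR node to the backend-specific lowering."""
    try:
        fn = _LOWERINGS[(primitive, op, backend)]
    except KeyError:
        raise NotImplementedError(
            f"no lowering for {primitive}.{op} on {backend}"
        ) from None
    return fn(*args)


@register("Rolling", "mean", backend="polars")
def _rolling_mean_polars(column: str, window: int) -> str:
    return f'pl.col("{column}").rolling_mean({window})'


@register("Rolling", "mean", backend="dolphindb")
def _rolling_mean_dolphindb(column: str, window: int) -> str:
    return f"mavg({column}, {window})"


polars_expr = lower("Rolling", "mean", "polars", "price", 5)
ddb_expr = lower("Rolling", "mean", "dolphindb", "price", 5)
```

Supporting a third engine in this sketch means adding more decorated functions with a new backend value; the lower function and the IR node keys stay unchanged.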

DolphinDB is a high-performance distributed time-series database and streaming compute engine purpose-built for financial data. It combines a columnar storage engine, a vectorized computation runtime, and a built-in pub/sub streaming framework — all in a single platform.

Capability	What It Enables
Shared Stream Tables	Zero-copy pub/sub between pipeline stages. Upstream writes, downstream subscribes — no serialization, no broker.
Reactive State Engine	Built-in streaming engine with windowed aggregations and stateful computations. Deploy via declarative script, no JVM.
Vectorized Execution	Columnar operations run at C++ speed on entire batches. SIMD-optimized and memory-contiguous.
In-Process Deployment	All engines, tables, and subscriptions live in a single DolphinDB process. No Python, no serialization in the hot path.
Idempotent Deployment	Every engine deployment script begins with try/catch cleanup. Safe to re-deploy after crashes or config changes.

Aspect	Batch (Polars)	Streaming (DolphinDB)
Runtime	Python process (single machine)	DolphinDB cluster (distributed)
Data model	Static Arrow tables	Unbounded stream tables
Trigger	Explicit `run()` call	Continuous — data arrival triggers computation
Feature grouping	Sequential per-feature	Consolidated engines by input table (~60% fewer)
Intermediate state	In-memory LazyFrame columns	Shared stream tables between engines
Error handling	Feature-level try/except, skip on failure	Engine-level monitoring, re-deployment
Primary use	Research, batch scoring	Live trading, real-time signal generation

Both paths consume the same FeatureInstance objects — single definition, dual runtime.
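
The batch path's feature-level error handling can be illustrated with a short sketch: each feature runs inside its own try/except, so one failure is recorded and the rest still compute. The compute_all function and the dictionary-of-callables shape are assumptions for this example, not the actual batch engine.

```python
def compute_all(features, data):
    """Run every feature; collect results and failures separately."""
    results, failures = {}, {}
    for name, fn in features.items():
        try:
            results[name] = fn(data)
        except Exception as exc:  # isolate the failure to this feature
            failures[name] = repr(exc)
    return results, failures


data = {"bid": [10.0, 10.5], "ask": [10.5, 11.5]}

features = {
    "spread": lambda d: [a - b for a, b in zip(d["ask"], d["bid"])],
    # Deliberately broken: "volume" is not present in the input.
    "volume_ratio": lambda d: [v / 2 for v in d["volume"]],
    "mid": lambda d: [(a + b) / 2 for a, b in zip(d["ask"], d["bid"])],
}

results, failures = compute_all(features, data)
```

Here volume_ratio ends up in failures while spread and mid are still returned, which is the behavior the table describes as "skip on failure".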

Backend protocol — structural subtyping (~20 methods). Implement the protocol, register the engine.
Lowering registry — @register(Primitive, op, backend=...) decorator-based dispatch. Same IR, any backend.
Consolidated deployment — features sharing input tables merge into single engines. ~60% fewer engines, lower overhead.
Expression folding — intermediate computation steps inlined into terminal expressions. Deployed engines only see final outputs.
Deploy-and-forget streaming — Python disconnects after deployment. Pipeline runs autonomously inside DolphinDB.
Per-feature error isolation — batch path skips broken features, continues with rest. Critical for research iteration speed.
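
Structural subtyping for backends can be sketched with typing.Protocol. The two methods shown (lower_expression, deploy) and the register_engine helper are hypothetical; the source only says the real protocol has roughly 20 methods and does not name them.

```python
from typing import Dict, List, Protocol, runtime_checkable


@runtime_checkable
class Backend(Protocol):
    """Hypothetical two-method slice of the backend protocol."""

    def lower_expression(self, expr: str) -> str: ...

    def deploy(self, expressions: List[str]) -> int: ...


class InMemoryBackend:
    """Matches Backend by shape only; it does not inherit from it."""

    def __init__(self) -> None:
        self.deployed: List[str] = []

    def lower_expression(self, expr: str) -> str:
        return expr.strip()

    def deploy(self, expressions: List[str]) -> int:
        self.deployed.extend(expressions)
        return len(expressions)


ENGINES: Dict[str, Backend] = {}


def register_engine(name: str, backend: object) -> None:
    # runtime_checkable makes isinstance verify that the methods exist;
    # it does not check their signatures.
    if not isinstance(backend, Backend):
        raise TypeError(f"{name!r} does not implement the Backend protocol")
    ENGINES[name] = backend


register_engine("memory", InMemoryBackend())
```

Because the check is structural, a new engine needs no base class or import from the framework beyond the registration call.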

Execution Layer Overview — Architecture, batch vs streaming, mode polymorphism, lowering registry, deployment lifecycle
Batch Execution (Polars) — Lazy evaluation, Arrow zero-copy, with_columns chaining, feature compute engine
Streaming Execution (DolphinDB) — Connection management, schema management, 3-stage pipeline, expression compiler
FeatureDAG — The compiler that feeds the execution layer: AST → IR → Lowering
