QuantFlow Execution Layer

Single definition, dual runtime — batch and streaming from the same YAML

🔗 Where It Fits

DataInfra → MarketState → FeatureDAG → Execution

📋 Overview

The Execution Layer sits below the FeatureDAG compiler. It receives resolved FeatureInstance objects and produces computed feature values at the destination — Arrow tables for batch (research, model training) and shared stream tables for streaming (live trading).

             FeatureInstance[]
                     │
        ┌────────────┴────────────┐
        ▼                         ▼
 Batch (Polars)         Streaming (DolphinDB)
        │                         │
Warehouse tables       DolphinDB stream tables

Feature definitions (YAML → FeatureType → FeatureInstance) are identical across both paths. The divergence happens only at the expression-generation layer.

📊 Batch Execution

Polars — high-performance DataFrame library built on Apache Arrow and Rust.

  • Lazy evaluation — builds a full query plan then executes in a single optimized collect() call
  • Zero-copy Arrow — pipeline stages pass Arrow tables with no serialization tax
  • Multi-threaded — automatic parallelism, Rust-powered data plane
  • In-process — embedded library, no server, ideal for notebooks and research

Features are computed via with_columns chaining — Polars' query optimizer applies predicate pushdown, column pruning, and operator fusion across the entire plan.

⚡ Streaming Execution

DolphinDB — high-performance distributed time-series database with built-in stream processing.

  • ReactiveStateEngine — declarative stream processing with windowed aggregations and stateful computations
  • Shared stream tables — zero-copy pub/sub between pipeline stages, no serialization, no broker
  • Consolidated deployment — ~60% fewer engines, features sharing input tables deploy as one engine
  • Python-free hot path — once deployed, all computation runs inside DolphinDB server processes

Feature latency is sub-millisecond. Deployment is idempotent with automatic cleanup, and expression folding inlines intermediate steps into the final expressions.
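Expression folding can be pictured as substituting every reference to an intermediate step with that step's definition, so only raw input columns remain. The sketch below assumes a tiny expression tree (Ref, Call) and a textual render; these types and the operator names are hypothetical and are not the QuantFlow IR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Reference to an input column or a named intermediate step."""
    name: str


@dataclass(frozen=True)
class Call:
    """Function application over sub-expressions or literal constants."""
    fn: str
    args: tuple


def fold(expr, steps):
    """Recursively replace references to intermediate steps with their definitions."""
    if isinstance(expr, Ref):
        # Raw input columns are not in `steps` and stay as references.
        return fold(steps[expr.name], steps) if expr.name in steps else expr
    if isinstance(expr, Call):
        return Call(expr.fn, tuple(fold(a, steps) for a in expr.args))
    return expr  # literal constant


def render(expr):
    """Render an expression tree as a flat string."""
    if isinstance(expr, Ref):
        return expr.name
    if isinstance(expr, Call):
        return f"{expr.fn}({', '.join(render(a) for a in expr.args)})"
    return repr(expr)


# Intermediate steps (hypothetical) and a terminal expression that uses them.
steps = {
    "mid": Call("div", (Call("add", (Ref("bid"), Ref("ask"))), 2)),
    "spread": Call("sub", (Ref("ask"), Ref("bid"))),
}
terminal = Call("div", (Ref("spread"), Ref("mid")))

folded = render(fold(terminal, steps))
# folded == "div(sub(ask, bid), div(add(bid, ask), 2))"
```

After folding, the terminal expression mentions only bid and ask, so an engine evaluating it needs no separate outputs for mid or spread.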

🎯 Mode Polymorphism

Both backends support three computation modes, configured per feature:

| Mode | When Features Compute | Use Case |
|------|-----------------------|----------|
| tick | Per-event (every tick) | HFT signals, real-time spread monitoring, execution algorithms |
| bar | Per-bar (OHLCV boundary) | Bar-native strategies: pattern recognition, bar-level momentum |
| tick_to_bar | Tick features → bar aggregate | Microstructure features projected onto a bar grid for ML training |
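To make the tick_to_bar mode concrete, here is a rough sketch that evaluates a tick-level feature on every event and then projects it onto a fixed bar grid. The 60-second bar width, the spread feature, and the use of a mean as the bar aggregate are all assumptions for illustration; the source does not specify which aggregation is applied.

```python
from collections import defaultdict
from statistics import mean

BAR_SECONDS = 60  # hypothetical bar width

# Toy ticks as (epoch_seconds, bid, ask).
ticks = [
    (0, 99.0, 101.0),
    (30, 99.5, 100.5),
    (61, 100.0, 101.0),
    (119, 100.0, 104.0),
]


def spread(bid, ask):
    """Tick-mode feature: evaluated on every event."""
    return ask - bid


def tick_to_bar(ticks, bar_seconds=BAR_SECONDS):
    """Evaluate the tick feature, then average it within each bar bucket."""
    buckets = defaultdict(list)
    for ts, bid, ask in ticks:
        bar_start = ts - ts % bar_seconds  # left edge of the bar containing ts
        buckets[bar_start].append(spread(bid, ask))
    return {start: mean(values) for start, values in sorted(buckets.items())}


bars = tick_to_bar(ticks)
# bars == {0: 1.5, 60: 2.5}
```

In tick mode the per-event values would be emitted as they are; in tick_to_bar mode only the per-bar aggregates reach the destination.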

🔌 Lowering Registry

Both execution paths share the same compilation infrastructure:

Feature YAML → FeatureTypeRegistry → IR DAG (rustworkx) → Lowering Registry (@register dispatch)
                                                          ├── DolphinDB streaming
                                                          └── Polars batch

A feature's formula is compiled to an IR DAG once. The lowering registry dispatches (Primitive, op) pairs to backend-specific functions via @register. Adding a new engine means registering lowering functions — no IR changes needed.
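The dispatch described above can be sketched with a dictionary keyed by (Primitive, op, backend). The decorator signature follows the @register(Primitive, op, backend=...) form mentioned in this document; the Primitive members, the lower helper, and the emitted expression strings are hypothetical stand-ins for illustration.

```python
from enum import Enum
from typing import Callable, Dict, Tuple


class Primitive(Enum):
    # Hypothetical primitive kinds; the real enum is not shown in this document.
    ROLLING = "rolling"
    ARITH = "arith"


_REGISTRY: Dict[Tuple[Primitive, str, str], Callable[..., str]] = {}


def register(primitive: Primitive, op: str, *, backend: str):
    """Decorator that records a lowering function for one (primitive, op, backend) key."""
    def decorator(fn: Callable[..., str]) -> Callable[..., str]:
        key = (primitive, op, backend)
        if key in _REGISTRY:
            raise ValueError(f"duplicate lowering for {key}")
        _REGISTRY[key] = fn
        return fn
    return decorator


def lower(primitive: Primitive, op: str, backend: str, *args) -> str:
    """Look up the backend-specific lowering and apply it to the node's arguments."""
    try:
        fn = _REGISTRY[(primitive, op, backend)]
    except KeyError:
        raise NotImplementedError(
            f"no lowering for {primitive.name}.{op} on {backend}"
        ) from None
    return fn(*args)


@register(Primitive.ROLLING, "mean", backend="polars")
def _rolling_mean_polars(col: str, window: int) -> str:
    # Emits expression text only; nothing is evaluated here.
    return f'pl.col("{col}").rolling_mean({window})'


@register(Primitive.ROLLING, "mean", backend="dolphindb")
def _rolling_mean_dolphindb(col: str, window: int) -> str:
    return f"mavg({col}, {window})"


polars_expr = lower(Primitive.ROLLING, "mean", "polars", "price", 20)
ddb_expr = lower(Primitive.ROLLING, "mean", "dolphindb", "price", 20)
```

Supporting a third engine in this sketch would mean adding more @register functions with a new backend value; the IR node and the lower call site stay unchanged.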

🐬 Why DolphinDB?

DolphinDB is a high-performance distributed time-series database and streaming compute engine purpose-built for financial data. It combines a columnar storage engine, a vectorized computation runtime, and a built-in pub/sub streaming framework — all in a single platform.

| Capability | What It Enables |
|------------|-----------------|
| Shared Stream Tables | Zero-copy pub/sub between pipeline stages. Upstream writes, downstream subscribes; no serialization, no broker. |
| Reactive State Engine | Built-in streaming engine with windowed aggregations and stateful computations. Deploy via declarative script, no JVM. |
| Vectorized Execution | Columnar operations run at C++ speed on entire batches. SIMD-optimized and memory-contiguous. |
| In-Process Deployment | All engines, tables, and subscriptions live in a single DolphinDB process. No Python, no serialization in the hot path. |
| Idempotent Deployment | Every engine deployment script begins with try/catch cleanup. Safe to re-deploy after crashes or config changes. |

🔄 Batch vs Streaming — Same Definitions, Two Runtimes

| Aspect | Batch (Polars) | Streaming (DolphinDB) |
|--------|----------------|-----------------------|
| Runtime | Python process (single machine) | DolphinDB cluster (distributed) |
| Data model | Static Arrow tables | Unbounded stream tables |
| Trigger | Explicit run() call | Continuous: data arrival triggers computation |
| Feature grouping | Sequential per-feature | Consolidated engines by input table (~60% fewer) |
| Intermediate state | In-memory LazyFrame columns | Shared stream tables between engines |
| Error handling | Feature-level try/except, skip on failure | Engine-level monitoring, re-deployment |
| Primary use | Research, batch scoring | Live trading, real-time signal generation |

Both paths consume the same FeatureInstance objects — single definition, dual runtime.
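The "consolidated engines by input table" grouping on the streaming side can be approximated as a group-by over the instances' input tables. This is a sketch under assumptions: the FeatureInstance fields shown here (name, input_table, expression) and the engine naming scheme are hypothetical, and the real class carries more information.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureInstance:
    # Hypothetical minimal shape, for illustration only.
    name: str
    input_table: str
    expression: str


def consolidate(features):
    """Group features by input table, yielding one engine spec per table."""
    by_table = defaultdict(list)
    for feature in features:
        by_table[feature.input_table].append(feature)
    return {
        f"engine_{table}": [f.name for f in group]
        for table, group in by_table.items()
    }


features = [
    FeatureInstance("spread", "quotes", "ask - bid"),
    FeatureInstance("mid", "quotes", "(ask + bid) / 2"),
    FeatureInstance("trade_size_avg", "trades", "mavg(size, 20)"),
    FeatureInstance("signed_volume", "trades", "side * size"),
]

engines = consolidate(features)
# Four features collapse into two engines, one per input table.
```

A sequential per-feature layout would have produced four engines for the same input; the actual saving in a real deployment depends on how many features share each table.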

💡 Key Design Decisions

  • Backend protocol — structural subtyping (~20 methods). Implement the protocol, register the engine.
  • Lowering registry: @register(Primitive, op, backend=...) decorator-based dispatch. Same IR, any backend.
  • Consolidated deployment — features sharing input tables merge into single engines. ~60% fewer engines, lower overhead.
  • Expression folding — intermediate computation steps inlined into terminal expressions. Deployed engines only see final outputs.
  • Deploy-and-forget streaming — Python disconnects after deployment. Pipeline runs autonomously inside DolphinDB.
  • Per-feature error isolation — batch path skips broken features, continues with rest. Critical for research iteration speed.
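The backend protocol decision relies on structural subtyping: an engine qualifies by having the right methods, not by inheriting from a base class. The sketch below uses typing.Protocol with two placeholder methods (lower, deploy) standing in for the roughly 20 methods mentioned above; the method names, the register_engine helper, and the InMemoryBackend class are hypothetical.

```python
from typing import Dict, List, Protocol, runtime_checkable


@runtime_checkable
class Backend(Protocol):
    """Two placeholder methods; the real protocol is larger."""

    def lower(self, op: str, *args: object) -> str: ...

    def deploy(self, expressions: List[str]) -> None: ...


class InMemoryBackend:
    """Does not inherit from Backend; it only provides matching methods."""

    def __init__(self) -> None:
        self.deployed: List[str] = []

    def lower(self, op: str, *args: object) -> str:
        return f"{op}({', '.join(map(str, args))})"

    def deploy(self, expressions: List[str]) -> None:
        self.deployed.extend(expressions)


ENGINES: Dict[str, Backend] = {}


def register_engine(name: str, backend: Backend) -> None:
    """Accept any object that structurally satisfies the Backend protocol."""
    # isinstance on a runtime_checkable Protocol checks method presence only,
    # not signatures; a static type checker covers the rest.
    if not isinstance(backend, Backend):
        raise TypeError(f"{type(backend).__name__} does not implement Backend")
    ENGINES[name] = backend


backend = InMemoryBackend()
register_engine("memory", backend)
backend.deploy([backend.lower("mavg", "price", 20)])
# backend.deployed == ["mavg(price, 20)"]
```

Keeping the protocol structural means a third-party engine can be plugged in without importing or subclassing anything from the core package.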

📚 Design Docs