QuantFlow MarketState

Market structure reconstruction — bars, order books, and labeling

🔗 Where It Fits

DataInfra → MarketState → FeatureDAG → Execution

📋 Overview

MarketState is the bridge between raw data and feature computation.

It transforms validated CDM data from DataInfra into structured market representations — bars, order book snapshots, and labeled datasets — that FeatureDAG consumes. MarketState constructs canonical market states; it does NOT compute features.

This strict separation ensures that feature logic is never entangled with market reconstruction or labeling, enabling clean, auditable, and reproducible pipelines.

⚡ Single-Pass Numba Fused Kernel

At the core of MarketState is a Numba JIT-compiled fused kernel that processes raw tick data in a single pass, producing all outputs simultaneously — enriched trades, fixed bars, event-driven bars, order book snapshots, and derived quotes.

  • ~100× faster than native Python
  • Near C++: LLVM-compiled machine code
  • 1 pass: all outputs from a single scan

Numba JIT-compiles Python functions to machine code through LLVM. Because the kernel is fused, the raw data is scanned only once, and that one scan emits enriched trades, every bar type, snapshots, and quotes, rather than one pass per output. The result is performance close to C++ and far ahead of interpreted Python loops, while development stays in Python.
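The single-pass idea can be illustrated with a small pure-Python sketch. It is not the MarketState kernel: the function names (`fused_bars`, `_update`), the two bar types chosen, and the thresholds are assumptions made for illustration. The same loop shape is what a Numba-compiled version would typically run over NumPy arrays.

```python
from typing import List, Optional, Sequence, Tuple

# (open, high, low, close, volume); a hypothetical bar layout for this sketch
Bar = Tuple[float, float, float, float, float]


def _update(state: Optional[list], price: float, volume: float) -> list:
    """Fold one trade into a running bar state [o, h, l, c, volume, n_ticks]."""
    if state is None:
        return [price, price, price, price, volume, 1]
    state[1] = max(state[1], price)
    state[2] = min(state[2], price)
    state[3] = price
    state[4] += volume
    state[5] += 1
    return state


def fused_bars(
    prices: Sequence[float],
    volumes: Sequence[float],
    tick_threshold: int,
    volume_threshold: float,
) -> Tuple[List[Bar], List[Bar]]:
    """Build tick bars and volume bars in one scan over the trades."""
    tick_bars: List[Bar] = []
    volume_bars: List[Bar] = []
    t_state = None  # running state of the open tick bar
    v_state = None  # running state of the open volume bar
    for price, volume in zip(prices, volumes):
        # Each trade is read once and applied to every bar type.
        t_state = _update(t_state, price, volume)
        v_state = _update(v_state, price, volume)
        if t_state[5] >= tick_threshold:
            tick_bars.append(tuple(t_state[:5]))
            t_state = None
        if v_state[4] >= volume_threshold:
            volume_bars.append(tuple(v_state[:5]))
            v_state = None
    return tick_bars, volume_bars


tick_bars, volume_bars = fused_bars(
    [10.0, 11.0, 9.0, 10.0, 12.0, 13.0],
    [5.0, 5.0, 5.0, 5.0, 5.0, 5.0],
    tick_threshold=3,
    volume_threshold=10.0,
)
```

Adding another bar type means adding another running state inside the same loop, not another pass over the data.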

⚙️ State Engine

Constructs canonical market bars and order book snapshots from raw CDM tick data.

  • Activity-Sampled Bars — time, tick, volume, dollar bars at configurable thresholds
  • Information-Driven Bars — imbalance, run, volatility, dollar imbalance, and CUSUM bars
  • Order Book Snapshots — periodic L2/L3 depth snapshots with configurable levels and intervals, sparse book representation
  • Single-pass fused kernel — Numba JIT-compiled replay engine produces all bar types simultaneously
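As an illustration of information-driven sampling, the sketch below closes a bar whenever a symmetric CUSUM of price changes drifts past a threshold. It is a minimal example under stated assumptions: the function name `cusum_bar_ends`, the use of raw price differences instead of returns, and the choice to reset both accumulators at each bar close are all illustrative, not taken from the State Engine.

```python
from typing import List, Sequence


def cusum_bar_ends(prices: Sequence[float], threshold: float) -> List[int]:
    """Return the indices at which a CUSUM-triggered bar would close.

    Two accumulators track upward and downward drift. A bar closes when
    either one moves beyond the threshold; both are then reset so the next
    bar starts from a clean state (an assumption of this sketch).
    """
    ends: List[int] = []
    s_pos = 0.0  # cumulative upward drift, floored at zero
    s_neg = 0.0  # cumulative downward drift, capped at zero
    for i in range(1, len(prices)):
        diff = prices[i] - prices[i - 1]
        s_pos = max(0.0, s_pos + diff)
        s_neg = min(0.0, s_neg + diff)
        if s_pos > threshold or s_neg < -threshold:
            ends.append(i)
            s_pos = 0.0
            s_neg = 0.0
    return ends


ends = cusum_bar_ends(
    [100.0, 101.0, 102.0, 103.0, 102.0, 100.0, 99.0, 99.5],
    threshold=2.5,
)
```

In this example bars close at indices 3 (after a cumulative rise of 3.0) and 5 (after a cumulative fall of 3.0). Quiet stretches produce no bars, which is the point of sampling by information rather than by clock time.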

🏷️ Label Engine

Generates supervised learning labels from structured bar data.

  • Triple Barrier — profit-taking, stop-loss, and time-based barriers with full price path tracking and dynamic threshold adjustment
  • Trend Scanning — CUSUM-based trend detection for directional labeling with adaptive threshold calibration
  • Fixed Horizon Return — continuous or quantile-binned forward returns over configurable horizons
  • Time-Series Label — direction classifier with configurable noise band and threshold
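The triple-barrier idea can be sketched in a few lines. This is a simplified illustration, not the Label Engine's implementation: the function name `triple_barrier_label` and its parameters are hypothetical, the barriers are fixed percentages of the entry price, and the dynamic threshold adjustment mentioned above is left out.

```python
from typing import Sequence, Tuple


def triple_barrier_label(
    prices: Sequence[float],
    start: int,
    upper_pct: float,
    lower_pct: float,
    max_holding: int,
) -> Tuple[int, int]:
    """Label one entry point and report where the path ended.

    Returns (label, exit_index), where label is +1 if the profit-taking
    barrier is touched first, -1 if the stop-loss barrier is touched first,
    and 0 if the time barrier expires before either.
    """
    entry = prices[start]
    upper = entry * (1.0 + upper_pct)
    lower = entry * (1.0 - lower_pct)
    # The time barrier, clipped to the end of the available data.
    last = min(start + max_holding, len(prices) - 1)
    for i in range(start + 1, last + 1):
        if prices[i] >= upper:
            return 1, i
        if prices[i] <= lower:
            return -1, i
    return 0, last


closes = [100.0, 101.0, 103.0, 99.0, 97.0, 98.0]
profit = triple_barrier_label(closes, 0, 0.02, 0.02, 4)
stop = triple_barrier_label(closes, 2, 0.02, 0.02, 3)
timeout = triple_barrier_label(closes, 3, 0.05, 0.05, 2)
```

Returning the exit index alongside the label keeps the price path recoverable, in the spirit of the path tracking described in the Triple Barrier bullet.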

💡 Key Design Decisions

  • Bars are not features — bars are structured market observations; features are computed on top of them by FeatureDAG
  • Labels are a research concern — labels are generated in batch mode only; trading uses the same bar structure without label generation
  • Single-pass kernel — the Numba fused kernel produces all outputs (enriched trades, bars, snapshots, quotes) in a single scan of the raw data
  • Event-based sampling — bars can be closed by market activity or information arrival rather than by clock time, following methodologies from modern financial ML that yield more IID-like samples from non-stationary market data

🔄 Pipeline Mode

Mode | State Engine | Label Engine
Batch (Research) | Numba fused kernel over historical CDM data | Labels computed and persisted
Streaming (Trading) | DolphinDB reactive state engine, continuous bar formation | Not applicable (research-only)
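Both modes are described as producing the same bar structure, with only the engine differing. One way to picture that contract, independent of Numba or DolphinDB, is an incremental builder that accepts one trade at a time, with batch replay expressed as a loop over the same builder. Everything in this sketch (`Bar`, `VolumeBarBuilder`, `replay`) is hypothetical and does not reflect either engine's actual API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Bar:
    """Hypothetical shared bar schema emitted by both modes."""
    open: float
    high: float
    low: float
    close: float
    volume: float


class VolumeBarBuilder:
    """Incremental builder: one trade in, zero or one completed bar out."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self._reset()

    def _reset(self) -> None:
        self._open: Optional[float] = None
        self._high = self._low = self._close = 0.0
        self._volume = 0.0

    def on_trade(self, price: float, size: float) -> Optional[Bar]:
        if self._open is None:
            # First trade of a new bar seeds open, high, and low.
            self._open = self._high = self._low = price
        self._high = max(self._high, price)
        self._low = min(self._low, price)
        self._close = price
        self._volume += size
        if self._volume < self.threshold:
            return None
        bar = Bar(self._open, self._high, self._low, self._close, self._volume)
        self._reset()
        return bar


def replay(trades: Iterable[Tuple[float, float]], threshold: float) -> List[Bar]:
    """Batch-style use: feed historical trades through the same builder."""
    builder = VolumeBarBuilder(threshold)
    bars = []
    for price, size in trades:
        bar = builder.on_trade(price, size)
        if bar is not None:
            bars.append(bar)
    return bars


history = [(10.0, 4.0), (11.0, 6.0), (9.0, 10.0)]
batch_bars = replay(history, threshold=10.0)

# Streaming-style use: the caller pushes trades as they arrive.
live = VolumeBarBuilder(threshold=10.0)
first = live.on_trade(10.0, 4.0)
second = live.on_trade(11.0, 6.0)
```

Labels would be computed only on the batch output; the streaming side stops at bar emission, matching the table above.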

📚 Design Docs

  • State Engine — Numba fused-kernel design, sparse LOB book, fixed and event-driven bar types
  • Label Engine — Triple barrier, trend scanning, fixed horizon, and meta-labeling methods