Market structure reconstruction — bars, order books, and labeling
MarketState is the bridge between raw data and feature computation.
It transforms validated CDM data from DataInfra into structured market representations — bars, order book snapshots, and labeled datasets — that FeatureDAG consumes. MarketState constructs canonical market states; it does NOT compute features.
This strict separation ensures that feature logic is never entangled with market reconstruction or labeling, enabling clean, auditable, and reproducible pipelines.
At the core of MarketState is a Numba JIT-compiled fused kernel that processes raw tick data in a single pass, producing all outputs simultaneously — enriched trades, fixed bars, event-driven bars, order book snapshots, and derived quotes.
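Numba JIT-compiles a numeric subset of Python to machine code through LLVM. Because the kernel is fused, the tick data is scanned once rather than once per output type, so enriched trades, every bar type, snapshots, and quotes are all built from the same read of each tick. For loop-heavy numeric code of this kind, compiled Numba functions typically run far faster than equivalent interpreted Python loops and can approach the speed of hand-written C++, while development stays in Python.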
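To make the single-pass idea concrete, here is a minimal sketch of a fused loop that builds fixed time bars and detects volume-bar boundaries in the same scan. It is not the MarketState kernel: the function name `fused_bars`, its parameters, the output layout, and the sample ticks are all assumptions chosen for illustration. If Numba is not installed, the sketch falls back to plain Python so it still runs.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback: run the same loop as ordinary Python (slower, same results).
    def njit(fn):
        return fn


@njit
def fused_bars(ts, price, size, bar_width, vol_threshold):
    """Hypothetical fused kernel: one scan yields time bars and volume-bar ends."""
    n = ts.shape[0]
    starts = np.empty(n, dtype=np.int64)         # start timestamp of each time bar
    ohlcv = np.empty((n, 5), dtype=np.float64)   # open, high, low, close, volume
    vol_end = np.empty(n, dtype=np.int64)        # tick index that closes each volume bar
    n_time = 0
    n_vol = 0
    cur_bucket = 0
    cum_vol = 0.0
    for i in range(n):
        p = price[i]
        s = size[i]
        # Output 1: fixed time bars.
        bucket = ts[i] // bar_width
        if n_time == 0 or bucket != cur_bucket:
            cur_bucket = bucket
            starts[n_time] = bucket * bar_width
            ohlcv[n_time, 0] = p
            ohlcv[n_time, 1] = p
            ohlcv[n_time, 2] = p
            ohlcv[n_time, 4] = 0.0
            n_time += 1
        row = n_time - 1
        if p > ohlcv[row, 1]:
            ohlcv[row, 1] = p
        if p < ohlcv[row, 2]:
            ohlcv[row, 2] = p
        ohlcv[row, 3] = p
        ohlcv[row, 4] += s
        # Output 2: event-driven (volume) bars, updated in the same iteration.
        cum_vol += s
        if cum_vol >= vol_threshold:
            vol_end[n_vol] = i
            n_vol += 1
            cum_vol = 0.0
    return starts[:n_time], ohlcv[:n_time], vol_end[:n_vol]


# Made-up ticks: timestamps in arbitrary integer units, sorted ascending.
ts = np.array([0, 400, 900, 1000, 1500, 2100], dtype=np.int64)
price = np.array([10.0, 10.5, 10.2, 10.3, 10.1, 10.4])
size = np.array([5.0, 3.0, 4.0, 2.0, 6.0, 1.0])

starts, ohlcv, vol_end = fused_bars(ts, price, size, 1000, 10.0)
```

Both outputs are updated inside the same loop iteration, which is the property the fused design relies on: adding another bar type means adding state and a branch to the loop, not another pass over the ticks.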
State Engine: constructs canonical market bars and order book snapshots from raw CDM tick data.
Label Engine: generates supervised learning labels from structured bar data.
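As an illustration of what generating labels from bars can mean, the sketch below applies a fixed-horizon forward-return rule to a series of bar closes. The source does not say which labeling scheme MarketState uses, so the rule itself, the function name `fixed_horizon_labels`, the `horizon` and `threshold` parameters, and the sample closes are all assumptions.

```python
def fixed_horizon_labels(closes, horizon, threshold):
    """Label each bar by the sign of its forward return over `horizon` bars.

    Returns 1 (up), -1 (down), 0 (within +/- threshold), or None when
    there is no bar `horizon` steps ahead to compare against.
    """
    labels = []
    for i in range(len(closes)):
        j = i + horizon
        if j >= len(closes):
            labels.append(None)  # not enough future data for this bar
            continue
        forward_return = closes[j] / closes[i] - 1.0
        if forward_return > threshold:
            labels.append(1)
        elif forward_return < -threshold:
            labels.append(-1)
        else:
            labels.append(0)
    return labels


# Made-up bar closes for illustration only.
closes = [100.0, 99.0, 100.2, 101.0, 98.0, 100.9]
labels = fixed_horizon_labels(closes, horizon=2, threshold=0.005)
```

Because each label looks `horizon` bars into the future, the trailing bars cannot be labeled. A rule of this kind needs completed future bars, which is consistent with labeling being limited to historical data in batch mode.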
| Mode | State Engine | Label Engine |
|---|---|---|
| Batch (Research) | Numba fused kernel over historical CDM data | Labels computed and persisted |
| Streaming (Trading) | DolphinDB reactive state engine, continuous bar formation | Not applicable (research-only) |