LOB Book, Trade Enrichment & Snapshots

The order book is the State Engine's most performance-critical data structure. Every trade is enriched against the book state at its exact timestamp, producing L1 analytics and full-depth L2 snapshots.


The Sparse LOB Book

A fixed-size, direct-indexed structure optimized for Numba's JIT compiler:

| Structure | Size | Purpose |
| --- | --- | --- |
| Price map | 20M int32 slots (~80 MB) | Maps tick price → level index; O(1) lookup without hashing |
| Level arrays | 3 × 200,000 entries | Price, size, and side arrays; active levels stored contiguously |
| Free list | Stack of freed indices | O(1) slot reuse after deletions |

Key Operations

  • Lookup: Direct array access at price_map[tick_price] — returns level index or -1
  • Insert (ADD): Pop free index, write level data, update price map
  • Delete: Write -1 into price map, push index onto free list
  • Modify: Update size in-place (price unchanged)
  • Best bid/ask: Eagerly updated on every ADD/MODIFY that beats the current best. Cold-path re-scan (linear scan of active levels) only triggers when the previously-best level is deleted.
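The operations above can be sketched in plain Python. This is a minimal illustration, not the engine's kernel: the class name `SparseBook`, its method names, the `BID`/`ASK` encoding, and the shrunken array sizes are all assumptions, and the standard-library `array` module stands in for the Numba-compiled arrays.

```python
from array import array

EMPTY = -1          # sentinel: no level at this tick / no best price yet
BID, ASK = 1, -1    # hypothetical side encoding (0 marks an unused slot)


class SparseBook:
    """Direct-indexed book sketch; sizes are shrunk for illustration."""

    def __init__(self, n_ticks=1_000, max_levels=64):
        self.price_map = array("i", [EMPTY]) * n_ticks   # tick price -> level index
        self.price = array("i", [0]) * max_levels
        self.size = array("q", [0]) * max_levels
        self.side = array("b", [0]) * max_levels
        self.free = list(range(max_levels - 1, -1, -1))  # stack of free indices
        self.best_bid = EMPTY
        self.best_ask = EMPTY

    def lookup(self, tick):
        # Direct array access: level index, or -1 if no level exists.
        return self.price_map[tick]

    def add(self, tick, size, side):
        idx = self.free.pop()                            # O(1) slot reuse
        self.price[idx], self.size[idx], self.side[idx] = tick, size, side
        self.price_map[tick] = idx
        # Hot path: update the best price only if this level beats it.
        if side == BID and (self.best_bid == EMPTY or tick > self.best_bid):
            self.best_bid = tick
        elif side == ASK and (self.best_ask == EMPTY or tick < self.best_ask):
            self.best_ask = tick

    def modify(self, tick, size):
        self.size[self.price_map[tick]] = size           # price unchanged

    def delete(self, tick):
        idx = self.price_map[tick]
        side = self.side[idx]
        self.price_map[tick] = EMPTY
        self.side[idx] = 0
        self.free.append(idx)
        # Cold path: linear re-scan only when the best level was removed.
        if side == BID and tick == self.best_bid:
            live = (p for p, s in zip(self.price, self.side) if s == BID)
            self.best_bid = max(live, default=EMPTY)
        elif side == ASK and tick == self.best_ask:
            live = (p for p, s in zip(self.price, self.side) if s == ASK)
            self.best_ask = min(live, default=EMPTY)
```

The sketch omits bounds checks and duplicate-price handling. Its re-scan walks every slot and filters on side, whereas a contiguous active-level layout would let a real implementation scan only live entries.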

The book is initialized fresh per micro-batch — state carries across all events within one batch but is not persisted between batches.


Trade Enrichment

For each trade, the kernel computes L1-derived analytics from the book state at trade time:

| Field | Formula or range | Purpose |
| --- | --- | --- |
| mid_price | (best_bid + best_ask) / 2 | Fair value estimate |
| spread | best_ask - best_bid | Liquidity cost |
| effective_spread | 2 × abs(trade_price - mid_price) | Actual execution cost |
| book_imbalance | (bid_sz - ask_sz) / (bid_sz + ask_sz) | Pressure asymmetry |
| micro_price | (best_bid × ask_sz + best_ask × bid_sz) / (bid_sz + ask_sz) | Volume-weighted fair value |
| p_buy | 0.0–1.0 | Probability trade was buyer-initiated |
| signed_volume | size × (2 × p_buy - 1) | Continuous signed volume |
| trade_direction | +1 / -1 | Discrete buy/sell classification |
| ret | price_t - price_{t-1} | Trade-to-trade return |
| log_return | log(price_t / price_{t-1}) | Log return |
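As a worked illustration of the formulas in the table, the following sketch computes the fields for a single trade. The function name `enrich_trade` and its argument names are hypothetical. Deriving `trade_direction` by thresholding `p_buy` at 0.5 is an assumption made for the example, and the sketch assumes a non-empty book (`bid_sz + ask_sz > 0`).

```python
import math


def enrich_trade(price, prev_price, size, p_buy,
                 best_bid, best_ask, bid_sz, ask_sz):
    """Compute L1-derived fields for one trade (illustrative only)."""
    mid = (best_bid + best_ask) / 2
    total_sz = bid_sz + ask_sz          # assumed > 0 in this sketch
    return {
        "mid_price": mid,
        "spread": best_ask - best_bid,
        "effective_spread": 2 * abs(price - mid),
        "book_imbalance": (bid_sz - ask_sz) / total_sz,
        # Each price is weighted by the size on the opposite side.
        "micro_price": (best_bid * ask_sz + best_ask * bid_sz) / total_sz,
        "signed_volume": size * (2 * p_buy - 1),
        # Assumption: discrete direction from thresholding p_buy.
        "trade_direction": 1 if p_buy >= 0.5 else -1,
        "ret": price - prev_price,
        "log_return": math.log(price / prev_price),
    }


row = enrich_trade(price=101.5, prev_price=101.0, size=4, p_buy=0.75,
                   best_bid=100.0, best_ask=102.0, bid_sz=30, ask_sz=10)
```

With a heavier bid side (30 vs. 10), `micro_price` comes out at 101.5, above the 101.0 mid, which is the intended effect of weighting each price by the opposite side's size.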

Trade direction is inferred via a priority chain of three signing methods (configurable via sign_methods): DSIDE (exchange-reported flag) → QUOTE_INFER (trade vs. mid-price) → LEE_READY (quote + tick test). The sign_confidence column records which method produced the final classification.
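A priority chain of this kind might look like the sketch below. The trade field names (`dside`, `price`, `mid_price`, `prev_price`), the function names, and the returned `(direction, method)` pair are assumptions for illustration. The Lee-Ready step is simplified: it compares against the immediately preceding price only, where the full tick test looks back to the last different price.

```python
def sign_dside(trade):
    # Exchange-reported aggressor flag, if present.
    flag = trade.get("dside")
    return flag if flag in (1, -1) else None


def sign_quote_infer(trade):
    # Trade price relative to the mid-price; undecided at the mid.
    mid = trade.get("mid_price")
    if mid is None or trade["price"] == mid:
        return None
    return 1 if trade["price"] > mid else -1


def sign_lee_ready(trade):
    # Quote rule first, then a (simplified) tick test as the tie-breaker.
    mid = trade.get("mid_price")
    if mid is not None and trade["price"] != mid:
        return 1 if trade["price"] > mid else -1
    prev = trade.get("prev_price")
    if prev is None or trade["price"] == prev:
        return None
    return 1 if trade["price"] > prev else -1


SIGNERS = {
    "DSIDE": sign_dside,
    "QUOTE_INFER": sign_quote_infer,
    "LEE_READY": sign_lee_ready,
}


def sign_trade(trade, sign_methods=("DSIDE", "QUOTE_INFER", "LEE_READY")):
    """Return (direction, method) from the first method that decides."""
    for name in sign_methods:
        direction = SIGNERS[name](trade)
        if direction is not None:
            return direction, name
    return 0, None  # assumption: unsigned trades get direction 0
```

Reordering or shortening `sign_methods` changes which method wins, which is the point of making the chain configurable.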

The full enriched trade output has 28 columns and maps to the cdm_trade_enriched CDM table.


LOB Snapshots

Snapshots capture the full depth of the order book at configurable intervals and map to the cdm_lob_l2 CDM table:

snapshots:
  period_seconds: 60.0
  depth_levels: 10
  on_every_trade: false
  interval: 100

| Parameter | Description |
| --- | --- |
| snapshot_on_every_trade | Emit snapshot after every trade (high frequency, high storage) |
| snapshot_period_seconds | Time-based interval (0 = disabled) |
| snapshot_interval | Trade-count-based interval (0 = disabled) |
| depth | Number of price levels to capture per side (default: 10) |

Snapshots fire on four independent triggers:

  • Every trade (snapshot_on_every_trade: true)
  • Time interval (snapshot_period_seconds elapsed)
  • Trade count (snapshot_interval trades)
  • CUSUM threshold (cusum_snapshot_threshold > 0 — emits when CUSUM accumulator crosses threshold)
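The four triggers can be read as an OR over independent conditions. The sketch below assumes the check runs once per trade, that the config is a flat dict keyed by the parameter names above, and that the CUSUM test compares the accumulator's absolute value to the threshold. The function name and the state arguments are hypothetical.

```python
def should_snapshot(cfg, now, last_snapshot_time, trades_since_snapshot, cusum):
    """Return True if any enabled trigger fires (illustrative only)."""
    if cfg.get("snapshot_on_every_trade", False):
        return True

    period = cfg.get("snapshot_period_seconds", 0)
    if period > 0 and now - last_snapshot_time >= period:
        return True

    interval = cfg.get("snapshot_interval", 0)
    if interval > 0 and trades_since_snapshot >= interval:
        return True

    # Assumption: the accumulator "crosses" when its magnitude reaches the threshold.
    threshold = cfg.get("cusum_snapshot_threshold", 0)
    if threshold > 0 and abs(cusum) >= threshold:
        return True

    return False
```

A value of 0 disables the time, count, and CUSUM triggers, matching the "0 = disabled" convention in the parameter table.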

Each snapshot row contains:

  • L1 data: best bid/ask prices, sizes, spread, mid, weighted_mid
  • L2 arrays: bids and asks as arrays of {level, price, size, order_count} structs
  • Depth metrics: total_bid_depth, total_ask_depth, depth_imbalance, vwap_bid, vwap_ask
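The depth metrics are named here but not defined, so the following sketch shows one plausible reading rather than the documented formulas. It assumes total depth is the sum of level sizes, `depth_imbalance` mirrors the `book_imbalance` formula over total depth, and each side's VWAP is the size-weighted average price over the captured levels. The function name and the `(price, size)` tuple layout are hypothetical.

```python
def depth_metrics(bids, asks):
    """bids/asks: lists of (price, size) tuples, best level first."""

    def total(levels):
        return sum(size for _, size in levels)

    def vwap(levels):
        depth = total(levels)
        if depth == 0:
            return float("nan")  # empty side: no meaningful average
        return sum(price * size for price, size in levels) / depth

    bid_depth, ask_depth = total(bids), total(asks)
    both = bid_depth + ask_depth
    return {
        "total_bid_depth": bid_depth,
        "total_ask_depth": ask_depth,
        "depth_imbalance": (bid_depth - ask_depth) / both if both else float("nan"),
        "vwap_bid": vwap(bids),
        "vwap_ask": vwap(asks),
    }


metrics = depth_metrics(bids=[(100.0, 30), (99.0, 10)],
                        asks=[(102.0, 10), (103.0, 10)])
```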

Integration Points

  • StateEngine.process(): Calls fused_kernel which handles all LOB book operations, trade enrichment, and snapshot emission in one pass
  • StateEngineReader: Queries raw trades + LOB from source engine via engine-specific SQL generators (DuckDB, Trino, BigQuery, Snowflake, Databricks)
  • StateOutputWriter: Routes trades_enriched → cdm_trade_enriched and snapshots → cdm_lob_l2 CDM tables
  • StateEngineConfig: Controls all book, enrichment, and snapshot parameters with three-tier resolution (fallbacks → metadata → overrides)
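The three-tier resolution can be pictured as a layered merge in which later tiers win. This is a sketch under assumptions: the function name `resolve_config` is hypothetical, each tier is taken to be a flat dict, and a `None` value is treated as "not set" so it does not mask a lower tier.

```python
def resolve_config(fallbacks, metadata, overrides):
    """Merge tiers in order: fallbacks, then metadata, then overrides."""
    resolved = dict(fallbacks)
    for tier in (metadata, overrides):
        # Assumption: None means "unset" and leaves the lower tier's value.
        resolved.update({k: v for k, v in tier.items() if v is not None})
    return resolved


resolved = resolve_config(
    fallbacks={"depth": 10, "snapshot_interval": 0, "snapshot_period_seconds": 0},
    metadata={"snapshot_period_seconds": 60.0, "depth": None},
    overrides={"snapshot_interval": 100},
)
```

Here `depth` keeps its fallback of 10, the period comes from metadata, and the trade-count interval comes from the explicit override.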