Bar Build

The State Engine produces 9 bar types in two families: activity-sampled (triggered by trade count, volume, dollar volume, or elapsed time) and information-driven (triggered by statistical conditions). Each bar type has init_* and update_* functions in the Numba kernel. The update_* function takes the current state and a new trade and returns (next_state, is_emitted, bar_tuple).
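The init/update pattern can be illustrated with a minimal pure-Python sketch of a tick bar. This is not the kernel code: the names TickState, init_tick_bar, and update_tick_bar, the state fields, and the shortened bar tuple are assumptions made for illustration, and the real kernel is compiled with Numba.

```python
from typing import NamedTuple, Optional, Tuple


class TickState(NamedTuple):
    """Accumulator for one in-progress bar (field names are illustrative)."""
    trade_count: int
    open: float
    high: float
    low: float
    close: float
    volume: float


def init_tick_bar() -> TickState:
    # Empty accumulator: no trades seen yet.
    return TickState(0, 0.0, 0.0, 0.0, 0.0, 0.0)


def update_tick_bar(state: TickState, price: float, size: float, count: int
                    ) -> Tuple[TickState, bool, Optional[tuple]]:
    """Fold one trade into the state; emit a bar once `count` trades are in."""
    if state.trade_count == 0:
        nxt = TickState(1, price, price, price, price, size)
    else:
        nxt = TickState(state.trade_count + 1, state.open,
                        max(state.high, price), min(state.low, price),
                        price, state.volume + size)
    if nxt.trade_count >= count:
        # Shortened bar tuple; the real output has 13 columns.
        bar = (nxt.open, nxt.high, nxt.low, nxt.close, nxt.volume, nxt.trade_count)
        return init_tick_bar(), True, bar
    return nxt, False, None


state = init_tick_bar()
bars = []
for price, size in [(100.0, 1.0), (101.0, 2.0), (99.5, 1.5), (100.5, 1.0)]:
    state, is_emitted, bar = update_tick_bar(state, price, size, count=3)
    if is_emitted:
        bars.append(bar)
```

With count=3, the first three trades close one bar and the fourth starts a fresh accumulator. Returning a new state instead of mutating shared objects keeps the function easy to compile and to test in isolation.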


Activity-Sampled Bars

Activity-sampled bars close when a predetermined threshold is reached. Apart from time bars, they produce more observations when activity is high and fewer when it is quiet.

| Bar Type | Trigger | Default Param | Kernel File |
| --- | --- | --- | --- |
| Tick | trade_count >= count | count: 500 | kernel/bars/tick_bar.py |
| Volume | cumulative_volume >= threshold | threshold: 10000.0 | kernel/bars/volume_bar.py |
| Dollar | cumulative_dollar_vol >= threshold | threshold: 500000.0 | kernel/bars/dollar_bar.py |
| Time | elapsed >= interval_seconds | interval: 30.0s | kernel/bars/time_bar.py |

Tick, volume, and dollar bars adapt to market tempo. Dollar bars additionally account for price changes: for the same trade sizes, a bar at $50k BTC takes fewer trades than one at $20k BTC.
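The price effect on dollar bars can be checked with a short calculation. The helper name trades_per_dollar_bar and the fixed trade size of 0.5 BTC are assumptions chosen for illustration; the threshold is the default from the table above.

```python
import math


def trades_per_dollar_bar(threshold: float, price: float, trade_size: float) -> int:
    """Trades of a fixed size needed for cumulative notional to reach `threshold`."""
    return math.ceil(threshold / (price * trade_size))


# Default dollar threshold of 500000.0; a constant trade size of 0.5 BTC is assumed.
at_50k = trades_per_dollar_bar(500_000.0, 50_000.0, 0.5)  # 20 trades
at_20k = trades_per_dollar_bar(500_000.0, 20_000.0, 0.5)  # 50 trades
```

A volume bar with a fixed threshold would need the same number of trades at both price levels, which is the difference the paragraph above describes.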


Information-Driven Bars

Information-driven bars trigger on statistical conditions rather than fixed activity thresholds. Each maintains an exponentially weighted running estimate of the expected threshold. The approach follows the information-driven bars described in the financial machine learning literature, which aim to produce samples closer to IID from non-stationary data.

| Bar Type | Trigger | Default Param | event_value |
| --- | --- | --- | --- |
| Imbalance | abs(cum signed volume) >= K | k: 100.0 | Signed volume at trigger |
| Run | Consecutive same-direction trades >= W | window: 10 | Run length |
| Volatility | EWMA variance > threshold (n > 2) | threshold: 0.0001 | EWMA variance |
| Dollar Imbalance | abs(cum signed dollar vol) >= K | k: 500000.0 | Signed dollar volume |
| CUSUM | s_pos >= h or s_neg <= -h | threshold: 0.5 | CUSUM value |

Trigger Details

  • Imbalance Bars: Track cumulative signed volume (buy=+vol, sell=-vol). Trigger when the absolute imbalance reaches K. Captures order flow toxicity: large directional pressure can signal informed trading.
  • Run Bars: Count consecutive same-direction trades. Trigger when the run length reaches W. The kernel also supports directional variants (buy-run, sell-run) that count only one side.
  • Volatility Bars: Maintain an EWMA of squared returns. Trigger when the running variance exceeds the threshold, so bars fire during volatile periods and stay quiet during calm ones.
  • Dollar Imbalance Bars: Same as imbalance bars but computed on signed dollar volume, so large-dollar trades carry more weight.
  • CUSUM Bars: Dual-sided filter in which s_pos and s_neg track cumulative deviation from a drift term. When s_pos >= h or s_neg <= -h, a bar is emitted and both reset. Detects structural breaks.
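As an illustration of the CUSUM trigger, the sketch below implements the standard two-sided CUSUM recursion. The function names, the choice of input series x (for example per-trade returns), and the drift handling are assumptions; the kernel's exact update may differ.

```python
from typing import List, Sequence, Tuple


def cusum_update(s_pos: float, s_neg: float, x: float, drift: float, h: float
                 ) -> Tuple[float, float, bool, float]:
    """One step of a two-sided CUSUM filter.

    Returns (s_pos, s_neg, is_emitted, event_value); both sums reset on emission.
    """
    s_pos = max(0.0, s_pos + x - drift)  # accumulates upward deviations
    s_neg = min(0.0, s_neg + x + drift)  # accumulates downward deviations
    if s_pos >= h:
        return 0.0, 0.0, True, s_pos
    if s_neg <= -h:
        return 0.0, 0.0, True, s_neg
    return s_pos, s_neg, False, 0.0


def cusum_events(xs: Sequence[float], drift: float = 0.0, h: float = 0.5
                 ) -> List[Tuple[int, float]]:
    """Return (index, event_value) for every observation that emits a bar."""
    s_pos = s_neg = 0.0
    events = []
    for i, x in enumerate(xs):
        s_pos, s_neg, emitted, value = cusum_update(s_pos, s_neg, x, drift, h)
        if emitted:
            events.append((i, value))
    return events


events = cusum_events([0.25, 0.25, -0.125, -0.25, -0.25], drift=0.0, h=0.5)
```

In this run the upward sum reaches h at index 1 and, after the reset, the downward sum crosses -h at index 4. A larger drift makes the filter ignore small persistent moves.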

Trade Signing

Trade direction is inferred via a priority chain of signing methods (configurable via sign_methods):

  1. DSIDE: use the exchange-reported aggressor flag if available
  2. QUOTE_INFER: compare trade price to the prevailing bid and ask; buy if trade ≥ ask, sell if trade ≤ bid
  3. LEE_READY: quote test with a tick test fallback
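The priority chain can be sketched as a loop that returns the first decisive answer. The trade dictionary layout, the field names (aggressor, bid, ask, prev_price), and the function names are hypothetical, and the tick test here uses only the immediately preceding price, which is a simplification.

```python
from typing import Callable, Dict, Optional, Sequence

# Each method returns +1 (buy), -1 (sell), or None when it cannot decide.
Signer = Callable[[dict], Optional[int]]


def dside(trade: dict) -> Optional[int]:
    # "aggressor" is a hypothetical field name for the exchange-reported flag.
    return {"buy": 1, "sell": -1}.get(trade.get("aggressor"))


def quote_infer(trade: dict) -> Optional[int]:
    bid, ask = trade.get("bid"), trade.get("ask")
    if bid is None or ask is None:
        return None
    if trade["price"] >= ask:
        return 1
    if trade["price"] <= bid:
        return -1
    return None  # inside the spread: undecided


def lee_ready(trade: dict) -> Optional[int]:
    price, bid, ask = trade["price"], trade.get("bid"), trade.get("ask")
    if bid is not None and ask is not None:
        mid = (bid + ask) / 2.0
        if price != mid:  # quote test against the midpoint
            return 1 if price > mid else -1
    prev = trade.get("prev_price")
    if prev is not None and price != prev:  # tick test fallback
        return 1 if price > prev else -1
    return None


SIGNERS: Dict[str, Signer] = {
    "DSIDE": dside, "QUOTE_INFER": quote_infer, "LEE_READY": lee_ready,
}


def sign_trade(trade: dict,
               sign_methods: Sequence[str] = ("DSIDE", "QUOTE_INFER", "LEE_READY")
               ) -> int:
    """Return the first decisive sign in priority order, or 0 if none decides."""
    for name in sign_methods:
        sign = SIGNERS[name](trade)
        if sign is not None:
            return sign
    return 0


flagged = sign_trade({"price": 100.05, "bid": 100.0, "ask": 100.05, "aggressor": "sell"})
at_ask = sign_trade({"price": 100.05, "bid": 100.0, "ask": 100.05})
at_mid = sign_trade({"price": 100.0, "bid": 99.5, "ask": 100.5, "prev_price": 99.75})
```

The first trade is signed by the aggressor flag even though it printed at the ask, which shows why the order of sign_methods matters.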

The p_buy column stores the continuous probability (0.0–1.0) that a trade was buyer-initiated, enabling signed_volume = size * (2*p_buy - 1) as a continuous alternative to discrete +1/-1 direction.
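The continuous signed volume can feed an imbalance trigger directly. The sketch below is an illustration under assumptions: the function names are hypothetical, trades are given as (size, p_buy) pairs, and the accumulator is reset to zero after each emission.

```python
from typing import List, Sequence, Tuple


def signed_volume(size: float, p_buy: float) -> float:
    """Map p_buy in [0, 1] to a signed volume in [-size, +size]."""
    return size * (2.0 * p_buy - 1.0)


def imbalance_events(trades: Sequence[Tuple[float, float]], k: float
                     ) -> List[Tuple[int, float]]:
    """Emit (index, cumulative signed volume) whenever abs(cum) >= k."""
    cum = 0.0
    events = []
    for i, (size, p_buy) in enumerate(trades):
        cum += signed_volume(size, p_buy)
        if abs(cum) >= k:
            events.append((i, cum))
            cum = 0.0  # start a new bar
    return events


# (size, p_buy) pairs; p_buy = 0.5 contributes nothing to the imbalance.
trades = [(50.0, 1.0), (40.0, 0.75), (60.0, 0.25), (80.0, 1.0), (20.0, 0.5)]
events = imbalance_events(trades, k=100.0)
```

A trade with p_buy = 0.75 counts as half its size on the buy side, so uncertain signs move the imbalance less than confident ones.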


Bar Output Schema

Every bar has 13 columns in the kernel output (plus metadata columns added by the writer):

| Column | Description |
| --- | --- |
| start_time | Bar opening timestamp |
| end_time | Bar closing timestamp |
| open | Opening price |
| high | Highest price in bar |
| low | Lowest price in bar |
| close | Closing price |
| volume | Total traded volume |
| dollar_volume | Total traded notional |
| trade_count | Number of trades |
| vwap | Volume-weighted average price |
| avg_trade_size | Average size per trade |
| bar_index | Sequential bar number |
| event_value | Trigger value (0 for activity-sampled, trigger-specific for information-driven) |
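To make the column definitions concrete, the sketch below builds one 13-column record from a list of trades. The Bar type and build_bar helper are hypothetical, and taking start_time and end_time from the first and last trade is an assumption (time bars, for instance, may close on an interval boundary instead).

```python
from typing import NamedTuple, Sequence, Tuple


class Bar(NamedTuple):
    start_time: float
    end_time: float
    open: float
    high: float
    low: float
    close: float
    volume: float
    dollar_volume: float
    trade_count: int
    vwap: float
    avg_trade_size: float
    bar_index: int
    event_value: float


def build_bar(trades: Sequence[Tuple[float, float, float]], bar_index: int,
              event_value: float = 0.0) -> Bar:
    """trades: non-empty sequence of (timestamp, price, size) in time order."""
    prices = [p for _, p, _ in trades]
    volume = sum(s for _, _, s in trades)
    dollar_volume = sum(p * s for _, p, s in trades)
    return Bar(
        start_time=trades[0][0], end_time=trades[-1][0],
        open=prices[0], high=max(prices), low=min(prices), close=prices[-1],
        volume=volume, dollar_volume=dollar_volume, trade_count=len(trades),
        vwap=dollar_volume / volume,            # notional divided by volume
        avg_trade_size=volume / len(trades),    # volume divided by trade count
        bar_index=bar_index, event_value=event_value,
    )


bar = build_bar([(0.0, 100.0, 2.0), (1.0, 102.0, 1.0), (2.5, 101.0, 1.0)], bar_index=0)
```

Here vwap and avg_trade_size are derived from the other columns rather than accumulated separately.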

param_set_id strings encode configuration parameters (e.g. "k_100", "t_0.0001", "w_10") to distinguish bars generated with different thresholds.
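One way to produce strings of this shape is to join a short prefix with a compactly formatted value. The helper below is a hypothetical sketch that reproduces the three examples above; the engine's actual encoding rules are not documented here and may differ for very large or very small values.

```python
def param_set_id(prefix: str, value: float) -> str:
    """Join a parameter prefix and value, dropping trailing zeros (100.0 -> "100")."""
    # The "g" format switches to scientific notation for very large or small
    # values (for example 1e-05), which may not match the real encoding.
    return f"{prefix}_{value:g}"


ids = [param_set_id("k", 100.0), param_set_id("t", 0.0001), param_set_id("w", 10)]
```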


Configuration

state_engine:
  micro_batch_size: 200000
  bar_groups:
    - name: liquid_equities
      symbols: [SPY, QQQ, NVDA, AAPL]
      bars:
        - type: time
          interval_minutes: 1
        - type: tick
          count: 200
        - type: dollar
          threshold: 100000.0
        - type: volume
          threshold: 10000.0
        - type: imbalance
          k: 100.0
        - type: run
          window: 10
        - type: volatility
          threshold: 0.0001
        - type: dollar_imbalance
          k: 100000.0
        - type: cusum
          threshold: 0.5
  snapshots:
    period_seconds: 60.0
    depth_levels: 10
  trade_signing:
    method: quote

Bar groups allow per-symbol bar configurations — different instruments get different thresholds based on their liquidity profiles.

The active_bar_types config field (default: ["tick", "volume", "dollar", "time"]) controls which bar types the kernel actually computes. Only listed types allocate accumulator arrays and emit output. Set it explicitly to include information-driven bars in production.
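The effect of active_bar_types can be pictured as a filter over the configured bars. The function select_active_bars is a hypothetical illustration of that behaviour, not the engine's API; the default tuple mirrors the documented default.

```python
from typing import Dict, List, Sequence

DEFAULT_ACTIVE_BAR_TYPES = ("tick", "volume", "dollar", "time")


def select_active_bars(bar_configs: Sequence[Dict],
                       active_bar_types: Sequence[str] = DEFAULT_ACTIVE_BAR_TYPES
                       ) -> List[Dict]:
    """Keep only bar configs whose type is listed in active_bar_types."""
    active = set(active_bar_types)
    return [cfg for cfg in bar_configs if cfg["type"] in active]


configured = [
    {"type": "time", "interval_minutes": 1},
    {"type": "imbalance", "k": 100.0},
    {"type": "tick", "count": 200},
]
default_selection = select_active_bars(configured)
with_imbalance = select_active_bars(
    configured, active_bar_types=("tick", "volume", "dollar", "time", "imbalance"))
```

Under the default, the imbalance entry is configured but never computed, which is why information-driven bars need an explicit active_bar_types setting.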


Batchers

Two batcher implementations feed data to the State Engine:

| Batcher | Source | Use Case |
| --- | --- | --- |
| MockBatcher | Synthetic NumPy arrays (60% trades, 40% LOB) | Testing and benchmarking (no database needed) |
| SourceBatcher | StateEngineReader over a real (venue, symbol) pair | Production: streams RecordBatches from the target engine |
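For a sense of what a synthetic batch with a 60/40 split might look like, here is a small NumPy sketch. It does not reproduce MockBatcher: the function name make_mock_batch, the column names, the random-walk price model, and the timestamp spacing are all assumptions.

```python
import numpy as np


def make_mock_batch(n: int, trade_fraction: float = 0.6, seed: int = 0) -> dict:
    """Build n synthetic events, a fixed fraction of which are trades."""
    rng = np.random.default_rng(seed)
    n_trades = int(round(n * trade_fraction))
    is_trade = np.zeros(n, dtype=np.bool_)
    is_trade[:n_trades] = True
    rng.shuffle(is_trade)  # interleave trades and LOB updates
    return {
        "timestamp": np.arange(n, dtype=np.int64) * 1_000_000,  # evenly spaced
        "is_trade": is_trade,
        "price": 100.0 + np.cumsum(rng.normal(0.0, 0.01, size=n)),  # random walk
        "size": rng.integers(1, 100, size=n).astype(np.float64),
    }


batch = make_mock_batch(1000)
```

Fixing the seed makes benchmark runs repeatable, which is useful when comparing kernel changes.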