Bar Build
The State Engine produces 9 bar types in two families: activity-sampled (triggered by volume, trade count, or time) and information-driven (triggered by statistical conditions). Each bar type has init_* and update_* functions in the Numba kernel following the pattern: receive current state + new trade → (next_state, is_emitted, bar_tuple).
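The init/update pattern can be illustrated with a minimal plain-Python sketch of a tick bar. This is not the actual Numba kernel: `TickState`, `init_tick_state`, `update_tick_state`, and the reduced six-field bar tuple are hypothetical names and shapes chosen for illustration, and the reset-on-emit behavior is an assumption.

```python
from typing import NamedTuple, Optional, Tuple


class TickState(NamedTuple):
    """Hypothetical accumulator state for a tick bar (not the kernel's layout)."""
    trade_count: int
    open: float
    high: float
    low: float
    close: float
    volume: float


def init_tick_state() -> TickState:
    # Empty accumulator: no trades folded in yet.
    return TickState(0, 0.0, 0.0, 0.0, 0.0, 0.0)


def update_tick_state(
    state: TickState, price: float, size: float, count: int
) -> Tuple[TickState, bool, Optional[tuple]]:
    """Fold one trade into the state and return (next_state, is_emitted, bar_tuple)."""
    if state.trade_count == 0:
        # First trade of the bar sets open, high, low, and close.
        nxt = TickState(1, price, price, price, price, size)
    else:
        nxt = TickState(
            state.trade_count + 1,
            state.open,
            max(state.high, price),
            min(state.low, price),
            price,
            state.volume + size,
        )
    if nxt.trade_count >= count:
        # Trigger reached: emit the bar and start from a fresh state (assumed).
        bar = (nxt.open, nxt.high, nxt.low, nxt.close, nxt.volume, nxt.trade_count)
        return init_tick_state(), True, bar
    return nxt, False, None


# Usage: stream four trades through a 3-trade tick bar.
state = init_tick_state()
bars = []
for price, size in [(100.0, 1.0), (101.0, 2.0), (99.5, 1.5), (100.5, 1.0)]:
    state, is_emitted, bar = update_tick_state(state, price, size, count=3)
    if is_emitted:
        bars.append(bar)
```

Returning the next state instead of mutating it keeps each update a pure function, which is the shape the pattern above describes; the fourth trade in the usage example starts the next bar.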
Activity-Sampled Bars
Sample the market at predetermined thresholds. Except for time bars, they produce more observations when activity is high and fewer when it is quiet.
| Bar Type | Trigger | Default Param | Kernel File |
|---|---|---|---|
| Tick | trade_count >= count | count: 500 | kernel/bars/tick_bar.py |
| Volume | cumulative_volume >= threshold | threshold: 10000.0 | kernel/bars/volume_bar.py |
| Dollar | cumulative_dollar_vol >= threshold | threshold: 500000.0 | kernel/bars/dollar_bar.py |
| Time | elapsed >= interval_seconds | interval: 30.0s | kernel/bars/time_bar.py |
Tick, volume, and dollar bars adapt to market tempo. Dollar bars additionally account for price changes: at equal trade sizes, a bar closes after fewer trades with BTC at $50k than at $20k.
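The price sensitivity of dollar bars can be checked with a small sketch. It assumes every trade has the same size and uses the default threshold of 500000.0 from the table; `trades_to_fill_dollar_bar` is a hypothetical helper, not part of the kernel.

```python
def trades_to_fill_dollar_bar(
    price: float, trade_size: float, threshold: float = 500_000.0
) -> int:
    """Count equal-sized trades needed until cumulative dollar volume >= threshold."""
    cumulative_dollar_vol = 0.0
    n_trades = 0
    while cumulative_dollar_vol < threshold:
        cumulative_dollar_vol += price * trade_size
        n_trades += 1
    return n_trades


# Usage: 0.5 BTC per trade at two price levels.
trades_at_50k = trades_to_fill_dollar_bar(price=50_000.0, trade_size=0.5)
trades_at_20k = trades_to_fill_dollar_bar(price=20_000.0, trade_size=0.5)
```

With 0.5 BTC per trade, each trade contributes $25,000 at the higher price and $10,000 at the lower one, so the bar closes in fewer trades when the price is higher.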
Information-Driven Bars
Trigger on statistical conditions rather than fixed thresholds. Each maintains an exponentially weighted running estimate of the expected threshold. The design is inspired by the financial machine learning literature, where such bars are used to produce IID-like samples from non-stationary data.
| Bar Type | Trigger | Default Param | event_value |
|---|---|---|---|
| Imbalance | abs(cum signed volume) >= K | k: 100.0 | Signed volume at trigger |
| Run | Consecutive same-direction trades >= W | window: 10 | Run length |
| Volatility | EWMA variance > threshold (n > 2) | threshold: 0.0001 | EWMA variance |
| Dollar Imbalance | abs(cum signed dollar vol) >= K | k: 500000.0 | Signed dollar volume |
| CUSUM | s_pos >= h or s_neg <= -h | threshold: 0.5 | CUSUM value |
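The imbalance and run triggers in the table can be sketched as follows. This is a simplified illustration, not the kernel code: it uses a fixed K and W, assumes the accumulators reset after each emission, and expects pre-signed inputs. The function names are hypothetical.

```python
from typing import List, Sequence


def imbalance_bar_ends(signed_volumes: Sequence[float], k: float = 100.0) -> List[int]:
    """Indices of trades at which abs(cumulative signed volume) >= k."""
    ends = []
    cum_signed = 0.0
    for i, sv in enumerate(signed_volumes):
        cum_signed += sv
        if abs(cum_signed) >= k:
            ends.append(i)
            cum_signed = 0.0  # assumed: imbalance resets when a bar is emitted
    return ends


def run_bar_ends(directions: Sequence[int], window: int = 10) -> List[int]:
    """Indices at which `window` consecutive same-direction trades are reached.

    `directions` holds +1 for buys and -1 for sells.
    """
    ends = []
    run_len = 0
    prev = 0
    for i, d in enumerate(directions):
        # Extend the run if the direction repeats, otherwise start a new run.
        run_len = run_len + 1 if d == prev else 1
        prev = d
        if run_len >= window:
            ends.append(i)
            run_len = 0  # assumed: run counter resets when a bar is emitted
            prev = 0
    return ends


# Usage
imbalance_ends = imbalance_bar_ends([60.0, -20.0, 70.0, -50.0, -60.0], k=100.0)
run_ends = run_bar_ends([1, 1, -1, -1, -1, 1], window=3)
```

The same imbalance loop would apply to dollar imbalance bars if the inputs were signed dollar volumes instead of signed volumes.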
Trigger Details
- Imbalance Bars: Track cumulative signed volume (buy=+vol, sell=-vol). Trigger when absolute imbalance reaches K. Captures order flow toxicity: large directional pressure signals informed trading.
- Run Bars: Count consecutive same-direction trades. Trigger when run length reaches W. The kernel also supports directional variants (buy-run, sell-run) that count only one side.
- Volatility Bars: EWMA of squared returns. Trigger when running variance exceeds threshold. Fire during volatile periods, stay quiet during calm.
- Dollar Imbalance Bars: Dollar-volume-weighted imbalance — large-dollar trades carry more weight.
- CUSUM Bars: Dual-sided filter — s_pos and s_neg track deviation from a drift term. When either crosses the threshold, a bar is emitted and both reset. Detects structural breaks.
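The volatility and CUSUM triggers described above can be sketched in plain Python. The recursions below are the textbook EWMA and symmetric CUSUM forms and are an assumption about what the kernel computes; the `alpha` and `drift` parameters, the function names, the reading of `n > 2` as "more than two returns observed", and the reset of the variance estimate after emission are all hypothetical.

```python
from typing import List, Sequence, Tuple


def volatility_bar_ends(
    returns: Sequence[float], threshold: float = 0.0001, alpha: float = 0.1
) -> List[int]:
    """Indices where the EWMA of squared returns exceeds threshold (after n > 2)."""
    ends = []
    ewma_var = 0.0
    n = 0
    for i, r in enumerate(returns):
        ewma_var = alpha * r * r + (1.0 - alpha) * ewma_var
        n += 1
        if n > 2 and ewma_var > threshold:
            ends.append(i)
            ewma_var = 0.0  # assumed: estimate restarts with each new bar
            n = 0
    return ends


def cusum_bar_ends(
    returns: Sequence[float], h: float = 0.5, drift: float = 0.0
) -> List[Tuple[int, float]]:
    """(index, CUSUM value) pairs where s_pos >= h or s_neg <= -h."""
    ends = []
    s_pos = 0.0
    s_neg = 0.0
    for i, r in enumerate(returns):
        # Upper sum accumulates moves above the drift, lower sum moves below it.
        s_pos = max(0.0, s_pos + r - drift)
        s_neg = min(0.0, s_neg + r + drift)
        if s_pos >= h or s_neg <= -h:
            ends.append((i, s_pos if s_pos >= h else s_neg))
            s_pos = 0.0  # both sides reset on emission
            s_neg = 0.0
    return ends


# Usage
vol_ends = volatility_bar_ends([0.1, 0.1, 0.1, 0.0], threshold=0.0001, alpha=0.5)
cusum_ends = cusum_bar_ends([0.25, 0.25, -0.25, -0.5, 0.125], h=0.5)
```

In the CUSUM example the first bar closes on the upper side and the second on the lower side, showing that either accumulator can trigger and that both restart from zero afterwards.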
Trade Signing
Trade direction is inferred via a priority chain of signing methods (configurable via sign_methods):
- DSIDE: use the exchange-reported aggressor flag if available
- QUOTE_INFER: compare trade price to the prevailing quotes; buy if trade ≥ ask, sell if trade ≤ bid
- LEE_READY: quote test with tick test fallback
The p_buy column stores the continuous probability (0.0–1.0) that a trade was buyer-initiated, enabling signed_volume = size * (2*p_buy - 1) as a continuous alternative to discrete +1/-1 direction.
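The priority chain and the continuous signing formula can be sketched as below. The trade dictionary fields (`aggressor`, `price`, `bid`, `ask`), the function names, and the neutral fallback of 0.5 when no method can decide are assumptions made for illustration; only the formula `size * (2*p_buy - 1)` comes from the text above.

```python
from typing import Callable, Optional, Sequence


def sign_dside(trade: dict) -> Optional[float]:
    """Use the exchange-reported aggressor flag (hypothetical field name)."""
    flag = trade.get("aggressor")
    if flag == "buy":
        return 1.0
    if flag == "sell":
        return 0.0
    return None  # flag missing: let the next method try


def sign_quote_infer(trade: dict) -> Optional[float]:
    """Buy if the trade printed at or above the ask, sell if at or below the bid."""
    bid, ask = trade.get("bid"), trade.get("ask")
    if bid is None or ask is None:
        return None
    if trade["price"] >= ask:
        return 1.0
    if trade["price"] <= bid:
        return 0.0
    return None  # inside the spread: undecided


def infer_p_buy(
    trade: dict,
    methods: Sequence[Callable[[dict], Optional[float]]] = (sign_dside, sign_quote_infer),
) -> float:
    """Return the first decisive answer in priority order."""
    for method in methods:
        p = method(trade)
        if p is not None:
            return p
    return 0.5  # assumed neutral value when every method abstains


def signed_volume(size: float, p_buy: float) -> float:
    return size * (2.0 * p_buy - 1.0)


# Usage
at_ask = infer_p_buy({"price": 101.0, "bid": 100.0, "ask": 101.0})
flagged_sell = infer_p_buy({"aggressor": "sell", "price": 101.0, "bid": 100.0, "ask": 101.0})
```

A `p_buy` of 0.5 maps to zero signed volume, so an undecided trade contributes nothing to the imbalance accumulators, while intermediate probabilities contribute a fraction of the trade size.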
Bar Output Schema
Every bar has 13 columns in the kernel output (plus metadata columns added by the writer):
| Column | Description |
|---|---|
| start_time | Bar opening timestamp |
| end_time | Bar closing timestamp |
| open | Opening price |
| high | Highest price in bar |
| low | Lowest price in bar |
| close | Closing price |
| volume | Total traded volume |
| dollar_volume | Total traded notional |
| trade_count | Number of trades |
| vwap | Volume-weighted average price |
| avg_trade_size | Average size per trade |
| bar_index | Sequential bar number |
| event_value | Trigger value (0 for activity-sampled, trigger-specific for information-driven) |
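As an illustration of how the 13 columns relate to each other, the sketch below builds one bar from a list of trades. It assumes the usual definitions (dollar_volume as the sum of price times size, vwap as dollar_volume divided by volume, avg_trade_size as volume divided by trade_count); `summarize_trades` is a hypothetical helper and the kernel's actual accumulation may differ.

```python
from typing import Dict, List, Tuple


def summarize_trades(
    trades: List[Tuple[float, float, float]],
    bar_index: int = 0,
    event_value: float = 0.0,
) -> Dict[str, float]:
    """Build one bar from (timestamp, price, size) tuples in time order."""
    prices = [price for _, price, _ in trades]
    volume = sum(size for _, _, size in trades)
    dollar_volume = sum(price * size for _, price, size in trades)
    trade_count = len(trades)
    return {
        "start_time": trades[0][0],
        "end_time": trades[-1][0],
        "open": prices[0],
        "high": max(prices),
        "low": min(prices),
        "close": prices[-1],
        "volume": volume,
        "dollar_volume": dollar_volume,
        "trade_count": trade_count,
        "vwap": dollar_volume / volume,
        "avg_trade_size": volume / trade_count,
        "bar_index": bar_index,
        "event_value": event_value,  # 0 for activity-sampled bars
    }


# Usage
example_bar = summarize_trades(
    [(1.0, 100.0, 2.0), (2.0, 102.0, 1.0), (3.0, 101.0, 1.0)]
)
```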
param_set_id strings encode configuration parameters (e.g. "k_100", "t_0.0001", "w_10") to distinguish bars generated with different thresholds.
Configuration
state_engine:
  micro_batch_size: 200000
  bar_groups:
    - name: liquid_equities
      symbols: [SPY, QQQ, NVDA, AAPL]
      bars:
        - type: time
          interval_minutes: 1
        - type: tick
          count: 200
        - type: dollar
          threshold: 100000.0
        - type: volume
          threshold: 10000.0
        - type: imbalance
          k: 100.0
        - type: run
          window: 10
        - type: volatility
          threshold: 0.0001
        - type: dollar_imbalance
          k: 100000.0
        - type: cusum
          threshold: 0.5
  snapshots:
    period_seconds: 60.0
    depth_levels: 10
  trade_signing:
    method: quote
Bar groups allow per-symbol bar configurations — different instruments get different thresholds based on their liquidity profiles.
The active_bar_types config field (default: ["tick", "volume", "dollar", "time"]) controls which bar types the kernel actually computes. Only listed types allocate accumulator arrays and emit output. Set it explicitly to include information-driven bars in production.
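The effect of active_bar_types can be pictured as a filter over the configured bar types. The sketch below is a hypothetical illustration of that gating, not the State Engine's code; `split_active` and the constant name are invented here, and only the default list comes from the text above.

```python
from typing import List, Sequence, Tuple

# Default taken from the documentation above.
DEFAULT_ACTIVE_BAR_TYPES = ("tick", "volume", "dollar", "time")


def split_active(
    configured_types: Sequence[str],
    active_bar_types: Sequence[str] = DEFAULT_ACTIVE_BAR_TYPES,
) -> Tuple[List[str], List[str]]:
    """Separate configured bar types into those computed and those skipped."""
    active = [t for t in configured_types if t in active_bar_types]
    skipped = [t for t in configured_types if t not in active_bar_types]
    return active, skipped


# Usage: with the default, configured information-driven bars are not computed.
configured = ["time", "tick", "imbalance", "cusum"]
active_default, skipped_default = split_active(configured)

# Listing the types explicitly enables them.
active_all, skipped_all = split_active(
    configured, active_bar_types=("time", "tick", "imbalance", "cusum")
)
```

A check like this at startup would surface bar types that appear under bar_groups but are missing from active_bar_types.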
Batchers
Two batcher implementations feed data to the State Engine:
| Batcher | Source | Use Case |
|---|---|---|
| MockBatcher | Synthetic NumPy arrays (60% trades, 40% LOB) | Testing and benchmarking — no database needed |
| SourceBatcher | StateEngineReader over a real (venue, symbol) pair | Production — streams RecordBatches from the target engine |