Custom SQL Backends
Add support for new database engines in the State Engine.
Custom SQL Backends (State Engine)
The State Engine generates SQL to read source data. DuckDB and BigQuery are built-in. To add another database:
Steps
- Subclass
StateEngineSQLGenerator(abstract base instate_engine/sql/base.py). - Implement three query builder methods.
- Register:
register_generator("my_backend", MyBackendGenerator).
Required Methods
Each method receives (table, symbol, venue, start_time, end_time) and returns a SQL string:
| Method | Query must return |
|---|---|
build_trades_sql | event_type (2), time (epoch float), price (double), size (double), side (int), update_type (int), is_buyer_maker (int), trade_id (int64) |
build_lob_sql | event_type (1), time (epoch float), price (integer ticks), size, side, update_type (1/2/3), trade_id (null) |
build_lob_snapshot_sql | Same as LOB, but rows exploded from JSON array snapshots |
Implementation Checklist
- Price-to-tick: Multiply float price by
1/tick_size, cast to integer. Use your DB's native cast (::BIGINT,SAFE_CAST,CAST, etc.). - Epoch extraction: Convert timestamps to epoch seconds as float64.
- LOB update type inference: Determine ADD/MODIFY/DELETE in SQL from quantity changes (quantity=0 → DELETE, quantity>0 with existing level → MODIFY, else → ADD). The Numba kernel expects a clean integer.
- Snapshot explosion: For JSON array storage, explode each element into individual rows with UNNEST/JSON_EXTRACT equivalents.
- Time filtering: Apply venue/symbol/time-range filters in WHERE clauses for efficient querying.
Custom Generator Registration
from quantflow.state_engine.sql.registry import register_generator
from quantflow.state_engine.sql.base import StateEngineSQLGenerator
class MyBackendGenerator(StateEngineSQLGenerator):
def build_trades_sql(self, table, symbol, venue, start_time, end_time):
...
def build_lob_sql(self, table, symbol, venue, start_time, end_time):
...
def build_lob_snapshot_sql(self, table, symbol, venue, start_time, end_time):
...
register_generator("my_backend", MyBackendGenerator)
The historical_data_engine setting in the project config (data_processing.historical_data_engine) determines which generator is used.