Configuration Guide
Every setting in quantflow_project.yml and .local_config.yml, explained.
1. Project Config (quantflow_project.yml)
The project config is organized into pipeline stages, each with an enabled flag. Configure them top to bottom.
1.1 Top-Level Settings
name: my_project
default_pipeline_mode: batch  # "batch" = research, "trade" = streaming
symbols:
  - BTCUSDT
  - ETHUSDT
| Field | Options | Purpose |
|---|---|---|
| name | Any string | Project identifier |
| default_pipeline_mode | batch, trade | Which mode qf run uses by default |
| symbols | List of strings | Trading instruments to process |
1.2 Ingest
ingest:
  enabled: true
  feeds:
    - name: crypto_binance
      historical_data_provider: binance_future_historical
      streaming_data_provider: binance_streaming
      symbols:
        - BTCUSDT
        - ETHUSDT
      data_types:
        - cdm_trades
        - cdm_lob_l1
    - name: equities_databento
      historical_data_provider: equities_databento_historical
      streaming_data_provider: equities_ig_paper_streaming
      symbols:
        - QQQ
      data_types:
        - cdm_trades
Each feed pairs a historical provider (batch downloads) with a streaming provider (live WebSocket). Provider names must match definitions in .definitions/feed_providers/.
| Field | Required | Description |
|---|---|---|
| name | Yes | Feed identifier |
| historical_data_provider | Yes | Provider for historical data |
| streaming_data_provider | Yes | Provider for streaming data |
| symbols | No | Override global symbols for this feed |
| data_types | No | Data types to ingest (e.g. cdm_trades, cdm_lob_l1) |
| disabled | No | Set true to skip this feed |
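The field rules in this table can be sketched in Python as follows. This is a minimal illustration, not QuantFlow's own loader: `resolve_feeds` is a hypothetical helper, and it assumes the project config has already been parsed into a plain dict.

```python
from __future__ import annotations

REQUIRED_FEED_FIELDS = ("name", "historical_data_provider", "streaming_data_provider")


def resolve_feeds(project: dict) -> list[dict]:
    """Return the active feeds with their effective symbol lists.

    Hypothetical helper that mirrors the field table above.
    """
    global_symbols = project.get("symbols", [])
    resolved = []
    for feed in project.get("ingest", {}).get("feeds", []):
        missing = [f for f in REQUIRED_FEED_FIELDS if f not in feed]
        if missing:
            raise ValueError(f"feed {feed.get('name', '?')!r} is missing {missing}")
        if feed.get("disabled", False):
            continue  # disabled: true skips the feed entirely
        # A per-feed symbols list overrides the global one.
        symbols = feed.get("symbols") or global_symbols
        resolved.append({**feed, "symbols": list(symbols)})
    return resolved


project = {
    "symbols": ["BTCUSDT", "ETHUSDT"],
    "ingest": {
        "feeds": [
            {
                "name": "crypto_binance",
                "historical_data_provider": "binance_future_historical",
                "streaming_data_provider": "binance_streaming",
            },
            {
                "name": "equities_databento",
                "historical_data_provider": "equities_databento_historical",
                "streaming_data_provider": "equities_ig_paper_streaming",
                "symbols": ["QQQ"],
            },
        ]
    },
}

feeds = resolve_feeds(project)
```

Here the first feed inherits the global symbols because it declares none, while the second keeps its own `QQQ` override.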
1.3 Data Processing
data_processing:
  enabled: true
  historical_data_engine: openlakehouse
  streaming_data_engine: dolphindb
  feeds:
    - crypto_binance
    - equities_databento
Controls which engine backends process raw data into CDM tables and which feeds flow through the processing pipeline.
| Field | Required | Description |
|---|---|---|
| historical_data_engine | Yes | Engine for batch CDM processing (duckdb, openlakehouse, bigquery, snowflake, databricks) |
| streaming_data_engine | Yes | Engine for streaming CDM processing (dolphindb) |
| feeds | No | Feed names to process (defaults to all) |
1.4 State Engine
Constructs bars, enriched trades, and LOB snapshots from raw CDM data:
state_engine:
  enabled: true
  micro_batch_size: 200000
  bar_groups:
    - name: liquid_equities
      symbols: [SPY, QQQ, NVDA, AAPL, MSFT]
      trade_signing:
        method: quote
      bars:
        - type: time
          interval_minutes: 1
        - type: dollar
          threshold: 100000.0
        - type: tick
          count: 200
        - type: imbalance
          k: 100.0
        - type: run
          window: 10
        - type: volatility
          threshold: 0.0001
        - type: dollar_imbalance
          k: 100000.0
        - type: cusum
          threshold: 0.5
      snapshots:
        period_seconds: 60.0
        depth_levels: 10
    - name: medium_equities
      symbols: [COIN, MSTR, RKLB]
      trade_signing:
        method: quote
      bars:
        - type: time
          interval_minutes: 1
        - type: dollar
          threshold: 10000.0
        - type: tick
          count: 100
      snapshots:
        period_seconds: 60.0
        depth_levels: 10
bar_groups allow per-symbol bar configurations. Each group specifies which symbols use which bar types and thresholds — useful when different instruments have different liquidity profiles.
State Engine parameters:
| Parameter | Default | Effect |
|---|---|---|
| enabled | true | Enable/disable state engine stage |
| micro_batch_size | 200000 | Events per batch; larger is faster but uses more memory |
| bar_groups | - | List of CanonicalBarGroup entries with per-symbol bar configs |
Bar types (within each bar_group.bars):
Each bar is a list entry with a type and type-specific parameters. All 9 bar types are optional; only those listed are deployed.
| type | Parameter | Description |
|---|---|---|
| tick | count | Trades per bar |
| volume | threshold | Volume units per bar |
| dollar | threshold | Dollar volume per bar |
| time | interval_minutes | Minutes per bar |
| imbalance | k | Signed volume threshold trigger |
| run | window | Consecutive same-direction trades |
| volatility | threshold | EWMA variance threshold |
| dollar_imbalance | k | Signed dollar volume trigger |
| cusum | threshold | CUSUM filter threshold |
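To make the threshold semantics concrete, here is a minimal Python sketch of how tick and dollar bars could be cut from a trade stream. It is an illustration under stated assumptions, not the state engine's implementation: `build_bars` is a hypothetical function, trades are assumed to be `(price, size)` tuples, and a bar is assumed to close on the trade that reaches the threshold.

```python
def build_bars(trades, bar_type, **params):
    """Group (price, size) trades into bars and return a list of trade lists.

    Illustrative only: supports the "tick" and "dollar" types from the table.
    """
    bars, current, dollars = [], [], 0.0
    for price, size in trades:
        current.append((price, size))
        dollars += price * size
        if bar_type == "tick":
            closed = len(current) >= params["count"]
        elif bar_type == "dollar":
            closed = dollars >= params["threshold"]
        else:
            raise ValueError(f"unsupported bar type: {bar_type}")
        if closed:
            bars.append(current)
            current, dollars = [], 0.0
    return bars  # a trailing partial bar is not emitted


trades = [(100.0, 1.0), (101.0, 2.0), (102.0, 1.0), (101.0, 3.0), (100.0, 2.0)]
tick_bars = build_bars(trades, "tick", count=2)
dollar_bars = build_bars(trades, "dollar", threshold=400.0)
```

With these five trades, `count=2` yields two complete tick bars and leaves the last trade pending, while `threshold=400.0` yields one three-trade bar followed by one two-trade bar. Lower thresholds suit less liquid symbols, which is why the two bar groups above use different dollar thresholds.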
Snapshot parameters (within each bar_group.snapshots):
| Parameter | Default | Effect |
|---|---|---|
| period_seconds | 60.0 | Time-based snapshot interval (0 = disabled) |
| depth_levels | 10 | LOB depth levels per side |
| on_every_trade | false | Emit a snapshot after every trade |
Trade signing (within each bar_group.trade_signing):
| Parameter | Options | Description |
|---|---|---|
| method | quote, tick, volume | Algorithm for inferring trade direction |
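For readers unfamiliar with trade signing, the sketch below shows the textbook quote rule and tick rule in Python. These are the standard definitions and are assumed, not confirmed, to correspond to the `quote` and `tick` methods; the function names are hypothetical, and the `volume` method is omitted because this guide does not define it.

```python
def sign_quote(price, bid, ask):
    """Quote rule: +1 (buy) above the midpoint, -1 (sell) below, 0 at the midpoint."""
    mid = (bid + ask) / 2.0
    return (price > mid) - (price < mid)


def sign_tick(prices):
    """Tick rule: sign of the most recent non-zero price change.

    The first trade, and any trade before the first price change, gets 0.
    """
    signs, last = [0], 0
    for prev, cur in zip(prices, prices[1:]):
        if cur != prev:
            last = 1 if cur > prev else -1
        signs.append(last)  # unchanged price carries the previous sign forward
    return signs


quote_signs = [sign_quote(p, bid=99.0, ask=101.0) for p in (100.5, 99.5, 100.0)]
tick_signs = sign_tick([100.0, 101.0, 101.0, 100.0])
```

The quote rule needs L1 quotes alongside trades (for example `cdm_lob_l1`), whereas the tick rule works from trade prices alone.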
1.5 Label Engine
Label computation for ML training (batch only):
label_engine:
  enabled: true
  historical_label_engine: polars
  labels:
    - name: triple_barrier_20_2pct
      type: triple_barrier
      description: 20-period triple barrier with 2% thresholds
      parameters:
        horizon: 20
        upper_barrier: 0.02
        lower_barrier: 0.02
        vertical_barrier: 20
      inputs:
        close: close
        high: high
        low: low
      bar_types: [time_1m]
label_engine is a LabelEngineConfig with an engine choice and a list of LabelDefinition entries:
| Field | Required | Description |
|---|---|---|
| historical_label_engine | No (default: polars) | Engine for label computation |
| labels | No | List of label definitions |
Each label definition:
| Field | Required | Description |
|---|---|---|
| name | Yes | Unique identifier |
| type | Yes | Label method type |
| description | No | Human-readable description |
| parameters | No | Type-specific settings |
| inputs | No | Maps logical names (close, high) to CDM column names |
| dependencies | No | Required CDM tables or features |
| bar_types | No | Filter bars by type (e.g. [time_1m]) |
| output_name | No (default: label) | Output column name |
Built-in label types:
| type | What it produces | Key params |
|---|---|---|
| triple_barrier | +1 (TP), -1 (SL), 0 (expiry) | horizon, upper_barrier, lower_barrier, vertical_barrier |
| fixed_horizon_return | Forward return; optional class binning | horizon, return_type (simple/log), binning |
| trend_scanning | +1/-1/0 regime detection (no look-ahead) | threshold, drift |
| ts_label | Direction classifier with noise band | horizon, threshold |
| meta_label | Secondary model of primary label correctness | primary_label, features |
| quantile_label | Quantile-based categorical labels | num_quantiles, horizon |
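As an illustration of the `triple_barrier` outputs, here is a simplified Python sketch of the classic triple-barrier rule. It is not the label engine's code: the function name is hypothetical, it checks close prices only (the real label also maps `high` and `low` inputs), and it ignores `horizon` because this guide does not say how that differs from `vertical_barrier`.

```python
def triple_barrier(closes, i, upper_barrier, lower_barrier, vertical_barrier):
    """Label bar i: +1 if the upper barrier is hit first, -1 if the lower, 0 on expiry."""
    entry = closes[i]
    # Look at most vertical_barrier bars ahead of the entry bar.
    for price in closes[i + 1 : i + 1 + vertical_barrier]:
        ret = price / entry - 1.0
        if ret >= upper_barrier:
            return 1  # take-profit touched first
        if ret <= -lower_barrier:
            return -1  # stop-loss touched first
    return 0  # neither barrier touched before the vertical barrier


params = {"upper_barrier": 0.02, "lower_barrier": 0.02, "vertical_barrier": 20}
label_up = triple_barrier([100.0, 101.0, 102.5, 99.0], 0, **params)
label_down = triple_barrier([100.0, 99.0, 97.5], 0, **params)
label_flat = triple_barrier([100.0, 100.5, 100.2], 0, **params)
```

With 2% barriers, the first series is labeled +1 (a 2.5% gain arrives before any loss), the second -1, and the third 0 because the price never moves 2% either way.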
1.6 Feature Engine
feature_engine:
  enabled: true
  historical_feature_engine: polars
  streaming_feature_engine: dolphindb
  features:
    - name: breakout_strength
      type: price_velocity_volume_ratio
      normalization:
        method: rolling_zscore
        clip: [-5, 5]
      bar_aggregation: max
    - name: order_flow_imbalance
      type: ofi
      params:
        window: 5
        decay: 0.95
      bar: imbalance_k_10
      staleness:
        ttl_seconds: 5
        on_stale: decay
| Field | Required | Default | Description |
|---|---|---|---|
| enabled | No | true | Enable/disable feature engine stage |
| historical_feature_engine | Yes | - | Batch engine (polars) |
| streaming_feature_engine | Yes | - | Streaming engine (dolphindb) |
| features | No | [] | List of FeatureDefinition entries (FeatureType reference with parameter overrides) |
| feature_types_dir | No | .definitions/feature_types | Directory for FeatureType YAML definitions |
FeatureDefinition fields:
| Field | Required | Description |
|---|---|---|
| name | Yes | Unique feature instance name |
| type | Yes | FeatureType name (matches .definitions/feature_types/*.yml) |
| params | No | Parameter overrides for the FeatureType |
| normalization | No | {method, clip, window}: z-score, min-max, or none |
| bar_aggregation | No | Aggregation for tick_to_bar mode (mean, std, last, sum, max, min) |
| bar | No | Bar clock identifier (e.g. imbalance_k_10) |
| staleness | No | {ttl_seconds, on_stale}: stale data handling policy |
| inputs | No | Upstream feature dependencies |
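The `normalization` block with `method: rolling_zscore` and `clip: [-5, 5]` can be pictured with the following Python sketch. It is an assumption-laden illustration, not the feature engine's code: `rolling_zscore` here is a hypothetical function that scores each value against the preceding `window` values (excluding the value itself) using the population standard deviation, and returns 0.0 when there is too little history.

```python
from statistics import mean, pstdev


def rolling_zscore(values, window, clip=(-5.0, 5.0)):
    """Z-score each value against the preceding `window` values, then clip."""
    out = []
    for i, x in enumerate(values):
        history = values[max(0, i - window):i]
        if len(history) < 2 or pstdev(history) == 0:
            out.append(0.0)  # not enough history, or no variation to scale by
            continue
        z = (x - mean(history)) / pstdev(history)
        out.append(max(clip[0], min(clip[1], z)))  # clamp into [clip[0], clip[1]]
    return out


scores = rolling_zscore([1.0, 2.0, 3.0, 100.0], window=3)
```

The final value is an extreme outlier relative to its history, so its raw z-score is far above 5 and the clip bounds it at 5.0. Clipping of this kind keeps a single spike from dominating downstream models.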
1.7 Sinks
sinks:
  - historical_sinks:
      - duckdb
    streaming_sinks:
      - kafka
Each SinkConfig specifies output destinations. At least one of historical_sinks or streaming_sinks must be set.
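The "at least one" rule can be checked before a run with a few lines of Python. This is a hypothetical pre-flight check, assuming the `sinks` section has been parsed into a list of dicts; it is not part of QuantFlow.

```python
def validate_sinks(sinks):
    """Raise ValueError if any sink entry sets neither destination list."""
    for i, sink in enumerate(sinks):
        if not sink.get("historical_sinks") and not sink.get("streaming_sinks"):
            raise ValueError(f"sinks[{i}] needs historical_sinks or streaming_sinks")
    return True


ok = validate_sinks([{"historical_sinks": ["duckdb"], "streaming_sinks": ["kafka"]}])
```

An entry with only one of the two lists passes; an empty entry raises `ValueError`.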
2. Local Config (.local_config.yml)
engine:
  - name: duckdb
    database: ./data/my_project/quantflow.duckdb
  - name: dolphindb
    host: localhost
    port: 8848
    auth: password
    key:
      username: admin
      password: "123456"
  - name: kafka
    host: localhost
    port: 9092
feed_provider_credentials:
  - provider: cryptohftdata
    key: "your-api-key-here"
local_cache:
  path: ./.cache/raw
| Section | Contents |
|---|---|
| engine | One block per engine with name, host, port, database, auth, key |
| feed_provider_credentials | List of Credential objects (provider, key, username, password, token) |
| local_cache | Dict with path key: where ingested raw data files are cached |
This file is git-ignored. Never commit it.