
Configuration Guide

Every setting in quantflow_project.yml and .local_config.yml, explained.


1. Project Config (quantflow_project.yml)

The project config is organized into pipeline stages, each with an enabled flag. Configure them top to bottom.

1.1 Top-Level Settings

```yaml
name: my_project
default_pipeline_mode: batch  # "batch" = research, "trade" = streaming
symbols:
  - BTCUSDT
  - ETHUSDT
```

| Field | Options | Purpose |
|---|---|---|
| name | Any string | Project identifier |
| default_pipeline_mode | batch, trade | Mode that qf run uses by default |
| symbols | List of strings | Trading instruments to process |

1.2 Ingest

```yaml
ingest:
  enabled: true
  feeds:
    - name: crypto_binance
      historical_data_provider: binance_future_historical
      streaming_data_provider: binance_streaming
      symbols:
        - BTCUSDT
        - ETHUSDT
      data_types:
        - cdm_trades
        - cdm_lob_l1
    - name: equities_databento
      historical_data_provider: equities_databento_historical
      streaming_data_provider: equities_ig_paper_streaming
      symbols:
        - QQQ
      data_types:
        - cdm_trades
```

Each feed pairs a historical provider (batch downloads) with a streaming provider (live WebSocket). Provider names must match definitions in .definitions/feed_providers/.

| Field | Required | Description |
|---|---|---|
| name | Yes | Feed identifier |
| historical_data_provider | Yes | Provider for historical (batch) data |
| streaming_data_provider | Yes | Provider for streaming data |
| symbols | No | Overrides the global symbols list for this feed |
| data_types | No | Data types to ingest (e.g. cdm_trades, cdm_lob_l1) |
| disabled | No | Set to true to skip this feed |
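The override and skip rules above can be illustrated with a short Python sketch. It assumes the YAML has already been parsed into plain dicts, and that a feed without its own symbols list inherits the top-level one. resolve_feeds is a hypothetical helper for illustration, not part of QuantFlow's API, and QuantFlow's own merge logic may differ.

```python
from typing import Any


def resolve_feeds(project: dict[str, Any]) -> dict[str, list[str]]:
    """Return {feed_name: symbols} for every active ingest feed.

    Hypothetical helper: assumes a feed without its own `symbols`
    falls back to the project-level `symbols` list.
    """
    ingest = project.get("ingest", {})
    if not ingest.get("enabled", True):
        return {}  # the whole ingest stage is switched off

    global_symbols = project.get("symbols", [])
    resolved: dict[str, list[str]] = {}
    for feed in ingest.get("feeds", []):
        if feed.get("disabled", False):
            continue  # `disabled: true` skips this feed
        # A feed-level `symbols` list overrides the global one.
        resolved[feed["name"]] = list(feed.get("symbols", global_symbols))
    return resolved


project = {
    "symbols": ["BTCUSDT", "ETHUSDT"],
    "ingest": {
        "enabled": True,
        "feeds": [
            {"name": "crypto_binance"},  # inherits the global symbols
            {"name": "equities_databento", "symbols": ["QQQ"]},
            {"name": "paused_feed", "disabled": True},
        ],
    },
}
print(resolve_feeds(project))
# {'crypto_binance': ['BTCUSDT', 'ETHUSDT'], 'equities_databento': ['QQQ']}
```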

1.3 Data Processing

```yaml
data_processing:
  enabled: true
  historical_data_engine: openlakehouse
  streaming_data_engine: dolphindb
  feeds:
    - crypto_binance
    - equities_databento
```

Controls which engine backends process raw data into CDM tables and which feeds flow through the processing pipeline.

| Field | Required | Description |
|---|---|---|
| historical_data_engine | Yes | Engine for batch CDM processing (duckdb, openlakehouse, bigquery, snowflake, databricks) |
| streaming_data_engine | Yes | Engine for streaming CDM processing (dolphindb) |
| feeds | No | Feed names to process (defaults to all feeds) |
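As a sketch of the checks this table implies, the Python below validates the two engine choices against the options listed above and applies the "defaults to all" rule for feeds. It assumes that "all" means every feed declared under ingest.feeds and that an unknown feed name is an error. processing_feeds is a hypothetical helper, not QuantFlow's validator.

```python
HISTORICAL_ENGINES = {"duckdb", "openlakehouse", "bigquery", "snowflake", "databricks"}
STREAMING_ENGINES = {"dolphindb"}


def processing_feeds(data_processing: dict, ingest_feed_names: list[str]) -> list[str]:
    """Validate engine choices and return the feeds that will be processed.

    Hypothetical helper: assumes an omitted `feeds` key means every feed
    declared under ingest.feeds, and that unknown names are rejected.
    """
    hist = data_processing["historical_data_engine"]
    if hist not in HISTORICAL_ENGINES:
        raise ValueError(f"unsupported historical_data_engine: {hist!r}")

    stream = data_processing["streaming_data_engine"]
    if stream not in STREAMING_ENGINES:
        raise ValueError(f"unsupported streaming_data_engine: {stream!r}")

    feeds = data_processing.get("feeds")
    if feeds is None:
        return list(ingest_feed_names)  # omitted: process every feed

    unknown = sorted(set(feeds) - set(ingest_feed_names))
    if unknown:
        raise ValueError(f"feeds not defined under ingest: {unknown}")
    return list(feeds)
```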

1.4 State Engine

Constructs bars, enriched trades, and LOB snapshots from raw CDM data:

```yaml
state_engine:
  enabled: true
  micro_batch_size: 200000
  bar_groups:
    - name: liquid_equities
      symbols: [SPY, QQQ, NVDA, AAPL, MSFT]
      trade_signing:
        method: quote
      bars:
        - type: time
          interval_minutes: 1
        - type: dollar
          threshold: 100000.0
        - type: tick
          count: 200
        - type: imbalance
          k: 100.0
        - type: run
          window: 10
        - type: volatility
          threshold: 0.0001
        - type: dollar_imbalance
          k: 100000.0
        - type: cusum
          threshold: 0.5
      snapshots:
        period_seconds: 60.0
        depth_levels: 10
    - name: medium_equities
      symbols: [COIN, MSTR, RKLB]
      trade_signing:
        method: quote
      bars:
        - type: time
          interval_minutes: 1
        - type: dollar
          threshold: 10000.0
        - type: tick
          count: 100
      snapshots:
        period_seconds: 60.0
        depth_levels: 10
```

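bar_groups allow per-symbol bar configurations. Each group specifies which symbols use which bar types and thresholds, which is useful when instruments have different liquidity profiles: in the example above, liquid_equities uses a 100000.0 dollar-bar threshold while medium_equities uses 10000.0.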

State Engine parameters:

| Parameter | Default | Effect |
|---|---|---|
| enabled | true | Enables or disables the state engine stage |
| micro_batch_size | 200000 | Events per batch; larger batches run faster but use more memory |
| bar_groups | | List of CanonicalBarGroup entries with per-symbol bar configs |
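The micro_batch_size trade-off can be pictured as chunking an event stream: at most micro_batch_size events are held in memory at once, and fewer, larger chunks mean less per-batch overhead. This is a minimal sketch under that assumption; micro_batches is a hypothetical helper, and QuantFlow's actual batching may work differently.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def micro_batches(events: Iterable[T], micro_batch_size: int = 200_000) -> Iterator[list[T]]:
    """Yield consecutive lists of at most `micro_batch_size` events.

    Hypothetical helper: only one batch is materialised at a time,
    so peak memory grows with the batch size.
    """
    if micro_batch_size <= 0:
        raise ValueError("micro_batch_size must be positive")
    it = iter(events)
    # islice pulls the next chunk lazily; an empty list ends the loop.
    while batch := list(islice(it, micro_batch_size)):
        yield batch
```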

Bar types (within each bar_group.bars): each bar is a list entry with a type and type-specific parameters. All 9 bar types are optional, and only those listed in a group are deployed:

| type | Parameter | Description |
|---|---|---|
| tick | count | Trades per bar |
| volume | threshold | Volume units per bar |
| dollar | threshold | Dollar volume per bar |
| time | interval_minutes | Minutes per bar |
| imbalance | k | Signed volume threshold trigger |
| run | window | Consecutive same-direction trades |
| volatility | threshold | EWMA variance threshold |
| dollar_imbalance | k | Signed dollar volume trigger |
| cusum | threshold | CUSUM filter threshold |
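To make the count and threshold parameters concrete, here is a sketch of the two simplest sampling rules from the table: tick bars close after count trades, and dollar bars close once accumulated price × quantity reaches threshold. It is a plain-Python illustration, assuming trades arrive as (price, quantity) tuples and that a trailing partial bar is dropped; Bar, tick_bars, and dollar_bars are hypothetical names, not the state engine's implementation.

```python
from dataclasses import dataclass


@dataclass
class Bar:
    open: float
    high: float
    low: float
    close: float
    volume: float
    dollar_volume: float
    n_trades: int


def _make_bar(trades: list[tuple[float, float]]) -> Bar:
    """Summarise a run of (price, quantity) trades as one OHLCV bar."""
    prices = [p for p, _ in trades]
    return Bar(
        open=prices[0],
        high=max(prices),
        low=min(prices),
        close=prices[-1],
        volume=sum(q for _, q in trades),
        dollar_volume=sum(p * q for p, q in trades),
        n_trades=len(trades),
    )


def tick_bars(trades: list[tuple[float, float]], count: int) -> list[Bar]:
    """Close a bar every `count` trades (trailing partial bar is dropped)."""
    bars, buf = [], []
    for trade in trades:
        buf.append(trade)
        if len(buf) == count:
            bars.append(_make_bar(buf))
            buf = []
    return bars


def dollar_bars(trades: list[tuple[float, float]], threshold: float) -> list[Bar]:
    """Close a bar once accumulated price * quantity reaches `threshold`."""
    bars, buf, dollars = [], [], 0.0
    for price, qty in trades:
        buf.append((price, qty))
        dollars += price * qty
        if dollars >= threshold:
            bars.append(_make_bar(buf))
            buf, dollars = [], 0.0
    return bars
```

Sampling by trade count or traded value, rather than wall-clock time, produces more bars when the market is active, which is why the liquid and less liquid groups in the example use different thresholds.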

Snapshot parameters (within each bar_group.snapshots):

| Parameter | Default | Effect |
|---|---|---|
| period_seconds | 60.0 | Time-based snapshot interval in seconds (0 = disabled) |
| depth_levels | 10 | LOB depth levels per side |
| on_every_trade | false | Emit a snapshot after every trade |
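One way to read how these three parameters interact is sketched below: a trade forces a snapshot when on_every_trade is set, otherwise a snapshot is due once period_seconds have elapsed (never, if it is 0), and each snapshot keeps the best depth_levels on each side. should_snapshot and truncate_book are hypothetical helpers, and the precedence between the two triggers is an assumption, not documented QuantFlow behaviour.

```python
def should_snapshot(
    now_s: float,
    last_snapshot_s: float,
    trade_arrived: bool,
    period_seconds: float = 60.0,
    on_every_trade: bool = False,
) -> bool:
    """Decide whether to emit a LOB snapshot at time `now_s` (seconds)."""
    if on_every_trade and trade_arrived:
        return True
    if period_seconds <= 0:
        return False  # period_seconds = 0 disables time-based snapshots
    return now_s - last_snapshot_s >= period_seconds


def truncate_book(
    bids: list[tuple[float, float]],
    asks: list[tuple[float, float]],
    depth_levels: int = 10,
) -> tuple[list[tuple[float, float]], list[tuple[float, float]]]:
    """Keep the best `depth_levels` (price, size) levels on each side."""
    best_bids = sorted(bids, key=lambda level: level[0], reverse=True)[:depth_levels]
    best_asks = sorted(asks, key=lambda level: level[0])[:depth_levels]
    return best_bids, best_asks
```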

Trade signing (within each bar_group.trade_signing):

| Parameter | Options | Description |
|---|---|---|
| method | quote, tick, volume | Algorithm for inferring trade direction |
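For intuition about the quote and tick options, the sketch below implements the two textbook rules: the tick rule signs a trade by comparing its price with the previous different price, and the quote rule compares the price with the bid/ask midpoint, falling back to the tick rule at the midpoint (the Lee-Ready approach, here without any quote-lag adjustment). These are standard techniques offered as an assumption about what the options mean; tick_rule and quote_rule are hypothetical names, and the volume method is not covered.

```python
def tick_rule(prices: list[float]) -> list[int]:
    """+1 on an uptick, -1 on a downtick, previous sign on a zero tick."""
    signs: list[int] = []
    sign = 0  # unknown until the first price change
    prev = None
    for price in prices:
        if prev is not None and price != prev:
            sign = 1 if price > prev else -1
        # zero tick: carry the previous sign forward
        signs.append(sign)
        prev = price
    return signs


def quote_rule(prices: list[float], bids: list[float], asks: list[float]) -> list[int]:
    """Buy above the midpoint, sell below it, tick rule exactly at it."""
    fallback = tick_rule(prices)
    signs: list[int] = []
    for price, bid, ask, tick_sign in zip(prices, bids, asks, fallback):
        mid = (bid + ask) / 2.0
        if price > mid:
            signs.append(1)
        elif price < mid:
            signs.append(-1)
        else:
            signs.append(tick_sign)
    return signs
```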

1.5 Label Engine

Label computation for ML training (batch only):

```yaml
label_engine:
  enabled: true
  historical_label_engine: polars
  labels:
    - name: triple_barrier_20_2pct
      type: triple_barrier
      description: 20-period triple barrier with 2% thresholds
      parameters:
        horizon: 20
        upper_barrier: 0.02
        lower_barrier: 0.02
        vertical_barrier: 20
      inputs:
        close: close
        high: high
        low: low
      bar_types: [time_1m]
```

label_engine is a LabelEngineConfig with an engine choice and a list of LabelDefinition entries:

| Field | Required | Description |
|---|---|---|
| historical_label_engine | No (default: polars) | Engine for label computation |
| labels | No | List of label definitions |

Each label definition:

| Field | Required | Description |
|---|---|---|
| name | Yes | Unique identifier |
| type | Yes | Label method type |
| description | No | Human-readable description |
| parameters | No | Type-specific settings |
| inputs | No | Maps logical names (close, high) to CDM column names |
| dependencies | No | Required CDM tables or features |
| bar_types | No | Filter bars by type (e.g. [time_1m]) |
| output_name | No (default: label) | Output column name |

Built-in label types:

| type | What it produces | Key params |
|---|---|---|
| triple_barrier | +1 (take profit), -1 (stop loss), 0 (expiry) | horizon, upper_barrier, lower_barrier, vertical_barrier |
| fixed_horizon_return | Forward return; optional class binning | horizon, return_type (simple/log), binning |
| trend_scanning | +1/-1/0 regime detection (no look-ahead) | threshold, drift |
| ts_label | Direction classifier with noise band | horizon, threshold |
| meta_label | Secondary model of primary label correctness | primary_label, features |
| quantile_label | Quantile-based categorical labels | num_quantiles, horizon |
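The triple_barrier row can be read as: from each bar, look forward until the price rises by upper_barrier (+1), falls by lower_barrier (-1), or vertical_barrier bars pass (0). The sketch below is a simplified version of that textbook method using close prices only; the config above also maps high and low, which a fuller implementation may use to detect intrabar touches. It returns None when the forward window runs past the data and no barrier was hit. triple_barrier_labels is a hypothetical function, not the Polars-backed label engine.

```python
from typing import Optional


def triple_barrier_labels(
    close: list[float],
    upper_barrier: float = 0.02,
    lower_barrier: float = 0.02,
    vertical_barrier: int = 20,
) -> list[Optional[int]]:
    """Label each bar +1 (take profit), -1 (stop loss), 0 (expiry) or None."""
    n = len(close)
    labels: list[Optional[int]] = []
    for i, entry in enumerate(close):
        upper = entry * (1.0 + upper_barrier)  # take-profit level
        lower = entry * (1.0 - lower_barrier)  # stop-loss level
        end = i + vertical_barrier             # last bar before expiry
        # Without a full forward window, an unhit barrier is "unknown",
        # not "expired", so the default is None instead of 0.
        label: Optional[int] = 0 if end < n else None
        for j in range(i + 1, min(end, n - 1) + 1):
            if close[j] >= upper:
                label = 1
                break
            if close[j] <= lower:
                label = -1
                break
        labels.append(label)
    return labels
```

For example, with close prices [100.0, 101.0, 103.0, 99.0, 97.0, 98.0], 2% barriers, and vertical_barrier=3, the first bar hits the upper barrier at 103.0 and is labelled +1, while the last two bars lack a full window and get None.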

1.6 Feature Engine

```yaml
feature_engine:
  enabled: true
  historical_feature_engine: polars
  streaming_feature_engine: dolphindb
  features:
    - name: breakout_strength
      type: price_velocity_volume_ratio
      normalization:
        method: rolling_zscore
        clip: [-5, 5]
      bar_aggregation: max
    - name: order_flow_imbalance
      type: ofi
      params:
        window: 5
        decay: 0.95
      bar: imbalance_k_10
      staleness:
        ttl_seconds: 5
        on_stale: decay
```

| Field | Required | Default | Description |
|---|---|---|---|
| enabled | No | true | Enable/disable feature engine stage |
| historical_feature_engine | Yes | | Batch engine (polars) |
| streaming_feature_engine | Yes | | Streaming engine (dolphindb) |
| features | No | [] | List of FeatureDefinition entries (a FeatureType reference with parameter overrides) |
| feature_types_dir | No | .definitions/feature_types | Directory for FeatureType YAML definitions |

FeatureDefinition fields:

| Field | Required | Description |
|---|---|---|
| name | Yes | Unique feature instance name |
| type | Yes | FeatureType name (matches .definitions/feature_types/*.yml) |
| params | No | Parameter overrides for the FeatureType |
| normalization | No | {method, clip, window}: z-score (e.g. rolling_zscore), min-max, or none |
| bar_aggregation | No | Aggregation for tick_to_bar mode (mean, std, last, sum, max, min) |
| bar | No | Bar clock identifier (e.g. imbalance_k_10) |
| staleness | No | {ttl_seconds, on_stale}: stale-data handling policy |
| inputs | No | Upstream feature dependencies |
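As an illustration of the normalization block (method: rolling_zscore, clip: [-5, 5]), the sketch below z-scores each value against a trailing window that includes it, then clips the result. The window length, the use of the sample standard deviation, and returning None during warm-up are assumptions for this example; rolling_zscore here is a hypothetical pure-Python function, not the engine's Polars or DolphinDB code.

```python
import math
from collections import deque
from typing import Optional


def rolling_zscore(
    values: list[float],
    window: int = 20,
    clip: tuple[float, float] = (-5.0, 5.0),
) -> list[Optional[float]]:
    """Trailing-window z-score of each value, clipped to `clip`."""
    if window < 2:
        raise ValueError("window must be at least 2")
    lo, hi = clip
    buf: deque[float] = deque(maxlen=window)
    out: list[Optional[float]] = []
    for x in values:
        buf.append(x)
        if len(buf) < 2:
            out.append(None)  # warm-up: not enough history for a std
            continue
        mean = sum(buf) / len(buf)
        var = sum((v - mean) ** 2 for v in buf) / (len(buf) - 1)  # sample variance
        std = math.sqrt(var)
        z = 0.0 if std == 0.0 else (x - mean) / std
        out.append(min(max(z, lo), hi))  # clip outliers to [lo, hi]
    return out
```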

1.7 Sinks

```yaml
sinks:
  - historical_sinks:
      - duckdb
    streaming_sinks:
      - kafka
```

Each SinkConfig specifies output destinations. At least one of historical_sinks or streaming_sinks must be set.
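The at-least-one rule can be expressed as a small validation pass over the parsed sinks list. This sketch also flattens each entry into (mode, destination) pairs; sink_destinations is a hypothetical helper written for illustration, and QuantFlow's own SinkConfig validation may report errors differently.

```python
def sink_destinations(sinks: list[dict]) -> list[tuple[str, str]]:
    """Flatten sink configs into (mode, destination) pairs.

    Hypothetical helper: raises ValueError for any entry that sets
    neither historical_sinks nor streaming_sinks.
    """
    destinations: list[tuple[str, str]] = []
    for i, sink in enumerate(sinks):
        historical = sink.get("historical_sinks") or []
        streaming = sink.get("streaming_sinks") or []
        if not historical and not streaming:
            raise ValueError(f"sinks[{i}] must set historical_sinks or streaming_sinks")
        destinations += [("historical", d) for d in historical]
        destinations += [("streaming", d) for d in streaming]
    return destinations
```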


2. Local Config (.local_config.yml)

```yaml
engine:
  - name: duckdb
    database: ./data/my_project/quantflow.duckdb

  - name: dolphindb
    host: localhost
    port: 8848
    auth: password
    key:
      username: admin
      password: "123456"

  - name: kafka
    host: localhost
    port: 9092

feed_provider_credentials:
  - provider: cryptohftdata
    key: "your-api-key-here"

local_cache:
  path: ./.cache/raw
```

| Section | Contents |
|---|---|
| engine | One block per engine, with name, host, port, database, auth, and key |
| feed_provider_credentials | List of Credential objects (provider, key, username, password, token) |
| local_cache | Dict with a path key: the directory where ingested raw data files are cached |
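Because engine and feed_provider_credentials are lists keyed by name and provider, a consumer has to search them rather than index them directly. The sketch below shows that lookup on the example file, already parsed into a dict; engine_block, provider_credential, and _find are hypothetical helpers, not QuantFlow's loader.

```python
from typing import Any

local_config: dict[str, Any] = {
    "engine": [
        {"name": "duckdb", "database": "./data/my_project/quantflow.duckdb"},
        {"name": "dolphindb", "host": "localhost", "port": 8848, "auth": "password",
         "key": {"username": "admin", "password": "123456"}},
        {"name": "kafka", "host": "localhost", "port": 9092},
    ],
    "feed_provider_credentials": [
        {"provider": "cryptohftdata", "key": "your-api-key-here"},
    ],
    "local_cache": {"path": "./.cache/raw"},
}


def _find(entries: list[dict], field: str, value: str) -> dict:
    """Return the first entry whose `field` equals `value`."""
    for entry in entries:
        if entry.get(field) == value:
            return entry
    raise KeyError(f"no entry with {field}={value!r}")


def engine_block(cfg: dict[str, Any], name: str) -> dict:
    """Connection settings for one engine, looked up by name."""
    return _find(cfg.get("engine", []), "name", name)


def provider_credential(cfg: dict[str, Any], provider: str) -> dict:
    """Credential object for one feed provider."""
    return _find(cfg.get("feed_provider_credentials", []), "provider", provider)
```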

This file holds credentials and is git-ignored. Never commit it.