All QuantFlow configuration is governed by a Pydantic model hierarchy. Every YAML config is validated against these models at load time, so type safety and structural correctness are enforced before any pipeline runs. The root container is QuantflowMetadata, which composes a ProjectConfig, a dictionary of FeedProvider models, a dictionary of CDMEntity models, and an optional LocalConfig.
Model Hierarchy
QuantflowMetadata
├── project: ProjectConfig (NamedModel)
│   ├── ingest: IngestConfig
│   │   └── feeds[]: IngestFeedConfig
│   ├── data_processing: DataProcessingConfig
│   ├── state_engine: CanonicalStateEngineConfig
│   │   └── bar_groups[]: CanonicalBarGroup
│   │       ├── bars[]: BarConfig
│   │       ├── snapshots: SnapshotConfig
│   │       └── trade_signing: TradeSigningConfig
│   ├── label_engine?: LabelEngineConfig
│   │   └── labels[]: LabelDefinition
│   ├── feature_engine: FeatureEngine
│   │   └── features[]: FeatureDefinition
│   └── sinks[]: SinkConfig
├── feed_providers: Dict[str, FeedProvider] (NamedModel)
│   └── data_types: Dict[str, DataTypeSchema]
│       ├── field_mappings[]: FieldMapping
│       ├── schema: Dict[str, SchemaAttribute]
│       └── tests: TestSuite
├── cdm_entities: Dict[str, CDMEntity]
│   └── attributes: Dict[str, Attribute]
│       └── constraints: AttributeConstraint
└── local?: LocalConfig
    ├── feed_provider_credentials[]: Credential
    └── engine[]: EngineConfig
Base Types
DataType
STRING | TIMESTAMP | DECIMAL | BOOLEAN | BIGINT | ARRAY | JSON | INT64 | INTEGER
PartitionGranularity
hour | day | month | year
SourceType
http | streaming | websocket | database | file
NamedModel
Base for all named entities.
| Field | Type | Required | Description |
|---|---|---|---|
name | str | Yes | 1–100 characters |
description | str | No | Max 500 characters |
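
The length limits above can be sketched in plain Python. This is only an illustration of the documented bounds: `validate_named` is a hypothetical helper, not QuantFlow's actual Pydantic validator.

```python
from typing import Optional


def validate_named(name: str, description: Optional[str] = None) -> None:
    """Raise ValueError if the documented NamedModel length limits are violated."""
    # name: required, 1-100 characters
    if not 1 <= len(name) <= 100:
        raise ValueError("name must be 1-100 characters")
    # description: optional, at most 500 characters
    if description is not None and len(description) > 500:
        raise ValueError("description must be at most 500 characters")
```
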
QuantflowMetadata
Root metadata container for the entire system.
| Field | Type | Required | Description |
|---|---|---|---|
project | ProjectConfig | Yes | Project configuration |
feed_providers | Dict[str, FeedProvider] | No | Feed provider definitions keyed by name |
cdm_entities | Dict[str, CDMEntity] | No | CDM entity definitions keyed by table name |
local | LocalConfig | No | Local runtime overrides (never committed) |
version | str | No | Metadata version |
description | str | No | Human-readable description |
ProjectConfig
Root project model for quantflow_project.yml. Extends NamedModel.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
name | str | Yes | — | Project name (inherited from NamedModel) |
default_pipeline_mode | str | No | "batch" | Pipeline mode: batch or trade |
symbols | List[str] | No | [] | Traded symbols |
ingest | IngestConfig | No | — | Ingestion pipeline configuration |
data_processing | DataProcessingConfig | No | — | CDM processing engine selection |
state_engine | CanonicalStateEngineConfig | No | — | State engine with bar groups |
label_engine | LabelEngineConfig | No | — | Label computation config |
feature_engine | FeatureEngine | No | — | Feature definitions |
sinks | List[SinkConfig] | No | [] | Output sink configurations |
IngestConfig
Ingestion pipeline configuration.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
enabled | bool | No | true | Enable/disable ingestion |
feeds | List[IngestFeedConfig] | No | [] | Feed configurations |
IngestFeedConfig
| Field | Type | Required | Description |
|---|---|---|---|
name | str | Yes | Feed identifier |
historical_data_provider | str | No | Provider for historical data |
streaming_data_provider | str | No | Provider for streaming data |
symbols | List[str] | No | Symbols for this feed |
data_types | List[str] | No | CDM data types to ingest |
disabled | bool | No | Set true to skip this feed |
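
As a rough sketch of how the `enabled` and `disabled` flags might interact, the hypothetical helper below works on the parsed ingest section as a plain dict. It assumes that a disabled IngestConfig skips every feed and that a missing `disabled` key means the feed runs; neither behavior is confirmed by the tables above.

```python
from typing import Any, Dict, List


def active_feed_names(ingest: Dict[str, Any]) -> List[str]:
    """Return the names of feeds that would run under the documented flags."""
    # `enabled` defaults to true per the IngestConfig table.
    if not ingest.get("enabled", True):
        return []
    return [
        feed["name"]
        for feed in ingest.get("feeds", [])
        # Assumption: an absent `disabled` key means the feed is active.
        if not feed.get("disabled", False)
    ]
```
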
DataProcessingConfig
CDM processing engine selection.
| Field | Type | Required | Description |
|---|---|---|---|
enabled | bool | No | Enable/disable processing |
historical_data_engine | str | No | Engine for batch CDM processing |
streaming_data_engine | str | No | Engine for streaming CDM processing |
feeds | List[str] | No | Feed names to process |
FeedProvider
Defines an external data source connection. Extends NamedModel.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
name | str | Yes | — | Provider identifier (inherited) |
description | str | No | — | Provider description (inherited) |
type | SourceType | Yes | — | Connection protocol |
location | str | Yes | — | Endpoint or resource URI (must start with http://, https://, file://, gs://, s3://, wss://, or ws://) |
auth | str | No | — | Authentication method |
format | str | No | — | Data format |
compression | str | No | — | Compression |
update_frequency | str | No | — | Refresh interval |
data_types | Dict[str, DataTypeSchema] | No | {} | Data types keyed by name |
protocol | str | No | — | "combined_stream" or "subscription" |
orderbook_mode | str | No | — | "snapshot" or "snapshot_delta" |
reconstruction_interval | int | No | — | ms between reconstructed snapshots |
max_depth_levels | int | No | — | Depth levels to emit |
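
The prefix rule on `location` can be illustrated with a small check. The helper name `check_location` is hypothetical, and the real model may perform stricter URL validation than this prefix test.

```python
# Prefixes listed in the FeedProvider table for `location`.
ALLOWED_PREFIXES = (
    "http://", "https://", "file://", "gs://", "s3://", "wss://", "ws://",
)


def check_location(location: str) -> str:
    """Return the location unchanged, or raise ValueError on an unknown scheme."""
    # str.startswith accepts a tuple and matches any of its entries.
    if not location.startswith(ALLOWED_PREFIXES):
        raise ValueError(f"unsupported location prefix: {location!r}")
    return location
```
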
DataTypeSchema
Maps a provider data type to CDM fields.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
name | str | Yes | — | Data type name (e.g. trades, orderbook) |
stream | str | No | — | Exchange stream name for URL substitution |
unique_key | List[str] | No | [] | Columns forming a unique row identifier |
tests | TestSuite | No | TestSuite() | Table-level tests |
schema_attributes | Dict[str, SchemaAttribute] | No | {} | Column name → attribute definition (YAML key: schema) |
field_mappings | List[FieldMapping] | No | [] | Raw field → CDM column transformations |
enabled | bool | No | true | Whether this data type is active |
SchemaAttribute
Column-level schema definition used in DataTypeSchema.schema_attributes.
| Field | Type | Required | Description |
|---|---|---|---|
name | str | Yes | Column name |
dtype | str | Yes | Data type (YAML key: dtype) |
description | str | No | Column description |
tests | List | No | Column-level test specifications |
FieldMapping
Maps a raw source field to a CDM target column with optional QFSQL transformation.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
target | str | Yes | — | CDM column name (output) |
source | str | No | — | Raw source column name (input) |
transformation | str | No | — | QFSQL expression applied to source |
is_time_filter_field | bool | No | false | Designates this field as the incremental filter key |
custom_cast | str | No | — | Engine-specific cast override |
Rules:
- At least one of source or transformation should be specified
- source without transformation implies direct passthrough
- transformation without source implies a computed column
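
The rules above can be sketched as a small classifier. `mapping_kind` and its return labels are hypothetical, and raising an error when both fields are missing is an assumption, since the rule only says one of them "should" be specified.

```python
from typing import Optional


def mapping_kind(source: Optional[str], transformation: Optional[str]) -> str:
    """Classify a field mapping according to the documented rules."""
    if source and not transformation:
        return "passthrough"   # source copied directly to target
    if transformation and not source:
        return "computed"      # column derived purely from the expression
    if source and transformation:
        return "transformed"   # expression applied to the source column
    # Assumption: a mapping with neither field is rejected.
    raise ValueError("specify at least one of source or transformation")
```
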
TestSuite
Collection of tests for a data type or column.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
column_tests | Dict[str, List] | No | {} | Column name → list of test specs |
table_tests | List | No | [] | Table-level test specs |
Test specs can be simple strings ("not_null") or dicts with parameters:

    column_tests:
      trade_price:
        - not_null
        - dbt_utils.accepted_range:
            min_value: 0
            inclusive: false
    table_tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns: [symbol, trade_time, trade_id]
      - dbt_utils.recency:
          datepart: hour
          field: event_time
          interval: 24
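
Because a test spec can be either a bare string or a one-key dict, consumers usually need to normalize both shapes. The sketch below shows one way to do that on already-parsed YAML; `normalize_test_spec` is a hypothetical helper, not part of QuantFlow.

```python
from typing import Any, Dict, Tuple, Union

Spec = Union[str, Dict[str, Any]]


def normalize_test_spec(spec: Spec) -> Tuple[str, Dict[str, Any]]:
    """Turn either spec shape into a (test_name, parameters) pair."""
    if isinstance(spec, str):
        # Simple form, e.g. "not_null": no parameters.
        return spec, {}
    if isinstance(spec, dict) and len(spec) == 1:
        # Dict form: the single key is the test name, its value the parameters.
        (name, params), = spec.items()
        return name, dict(params or {})
    raise TypeError(f"unsupported test spec: {spec!r}")
```
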
CanonicalStateEngineConfig
State engine configuration with per-symbol bar groups.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
enabled | bool | No | true | Enable/disable state engine |
micro_batch_size | int | No | 200000 | Rows per micro-batch |
bar_groups | List[CanonicalBarGroup] | No | [] | Per-symbol bar configurations |
CanonicalBarGroup
| Field | Type | Required | Description |
|---|---|---|---|
name | str | Yes | Group identifier |
symbols | List[str] | Yes | Symbols in this group |
bars | List[BarConfig] | Yes | Bar type configurations |
snapshots | SnapshotConfig | No | Snapshot parameters |
trade_signing | TradeSigningConfig | No | Trade direction inference |
BarConfig
| Field | Type | Description |
|---|---|---|
type | str | Bar type: time, tick, volume, dollar, imbalance, run, volatility, dollar_imbalance, cusum |
interval_minutes | float | Time bar interval (time bars) |
count | int | Trade count threshold (tick bars) |
threshold | float | Volume/dollar threshold |
k | float | Imbalance/dollar_imbalance threshold |
window | int | Run bar consecutive trade count |
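
The table pairs each parameter with specific bar types, which the following sketch turns into a lookup. The pairing is read from the descriptions above; `missing_bar_param` is a hypothetical helper, and the volatility and cusum types are left out because the table does not say which parameter they use.

```python
from typing import Any, Dict, Optional

# Bar type -> parameter that the table associates with it.
PARAM_FOR_BAR_TYPE = {
    "time": "interval_minutes",
    "tick": "count",
    "volume": "threshold",
    "dollar": "threshold",
    "imbalance": "k",
    "dollar_imbalance": "k",
    "run": "window",
}


def missing_bar_param(bar: Dict[str, Any]) -> Optional[str]:
    """Return the name of the expected-but-absent parameter, or None."""
    expected = PARAM_FOR_BAR_TYPE.get(bar.get("type"))
    if expected is not None and bar.get(expected) is None:
        return expected
    return None
```
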
SnapshotConfig
| Field | Type | Default | Description |
|---|---|---|---|
on_every_trade | bool | false | Snapshot on every trade |
period_seconds | float | 60.0 | Snapshot interval in seconds |
interval | int | 0 | Nth-trade interval (0 = disabled) |
depth_levels | int | 10 | Order book depth levels captured |
LabelEngineConfig
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
historical_label_engine | str | No | "polars" | Engine for label computation |
labels | List[LabelDefinition] | No | [] | Label definitions |
LabelDefinition
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
name | str | Yes | — | Unique label name |
type | str | Yes | — | Label method type |
description | str | No | "" | Human-readable description |
parameters | Dict | No | {} | Type-specific parameters |
inputs | Dict[str, str] | No | {} | Input role → CDM column name |
dependencies | List[str] | No | [] | Required CDM tables or features |
bar_types | List[str] | No | — | Filter bars by type |
output_name | str | No | "label" | Output column name |
Label types: triple_barrier, fixed_horizon_return, ts_label, trend_scanning, quantile_label, meta_label
FeatureEngine
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
historical_feature_engine | str | Yes | — | Engine for historical feature computation |
streaming_feature_engine | str | Yes | — | Engine for streaming feature computation |
features | List[FeatureDefinition] | No | [] | Feature definitions |
FeatureDefinition
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
name | str | Yes | — | Unique feature name |
type | str | No | auto from name | Feature type (references a registered FeatureType) |
parameters | Dict[str, Any] | No | {} | Feature-specific parameters |
dependencies | List[str] | No | [] | Names of features this one depends on |
input_entity | str | No | — | Single CDM input entity |
input_entities | Dict[str, List[str]] | No | {} | Multiple input entity mappings |
column_mapping | Dict[str, str] | No | {} | Column name overrides |
input_entity_overrides | Dict | No | {} | Per-input entity column overrides |
steps | List[Dict] | No | — | Inline computation steps (alternative to type reference) |
Two modes:
- Type reference: set type to a registered FeatureType name; parameters are merged with that type's defaults
- Inline: provide steps directly; type is auto-generated from name
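
The two modes can be sketched as follows. The registry `FEATURE_TYPE_DEFAULTS`, its `rolling_mean` entry, and the helper `resolve_feature` are all hypothetical, and the sketch assumes that the auto-generated type simply reuses the feature name.

```python
from typing import Any, Dict

# Hypothetical registry: FeatureType name -> default parameters.
FEATURE_TYPE_DEFAULTS: Dict[str, Dict[str, Any]] = {
    "rolling_mean": {"window": 20, "column": "close"},
}


def resolve_feature(definition: Dict[str, Any]) -> Dict[str, Any]:
    """Resolve a feature definition into its mode, type, and parameters."""
    name = definition["name"]
    params = dict(definition.get("parameters", {}))
    if definition.get("steps"):
        # Inline mode. Assumption: the generated type reuses the name.
        return {"mode": "inline", "type": name, "parameters": params}
    # Type-reference mode: `type` falls back to the name when omitted.
    type_name = definition.get("type") or name
    # User-supplied parameters override the registered defaults.
    merged = {**FEATURE_TYPE_DEFAULTS[type_name], **params}
    return {"mode": "type_reference", "type": type_name, "parameters": merged}
```

For example, a definition with `type: rolling_mean` and `parameters: {window: 5}` keeps the default `column` while overriding `window`.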
SinkConfig
| Field | Type | Required | Description |
|---|---|---|---|
historical_sinks | List[str] | No | Historical sink targets |
streaming_sinks | List[str] | No | Streaming sink targets |
CDMEntity
Common Data Model entity definition.
| Field | Type | Required | Description |
|---|---|---|---|
table_name | str | Yes | Target table name |
description | str | Yes | Entity description |
entity_type | str | Yes | Entity category |
attributes | Dict[str, Attribute] | Yes | Column name → attribute definition |
partition_by | str | Yes | Partition column (must exist in attributes) |
partition_granularity | PartitionGranularity | Yes | Partition bucket size |
cluster_by | List[str] | Yes | Clustering columns |
unique_key | List[str] | No | Composite unique key columns |
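
The cross-field rule on `partition_by` can be illustrated on a parsed entity dict. `check_partition_column` is a hypothetical helper, and the table name, column names, and `entity_type` value in the sample entity are made up for illustration.

```python
from typing import Any, Dict


def check_partition_column(entity: Dict[str, Any]) -> None:
    """Raise ValueError if partition_by is not a declared attribute."""
    if entity["partition_by"] not in entity["attributes"]:
        raise ValueError(
            f"partition_by {entity['partition_by']!r} is not in attributes"
        )


# Illustrative entity; names and entity_type are invented for this sketch.
trades = {
    "table_name": "trades",
    "description": "Executed trades",
    "entity_type": "event",
    "attributes": {
        "event_time": {"name": "event_time", "dtype": "TIMESTAMP"},
        "symbol": {"name": "symbol", "dtype": "STRING"},
    },
    "partition_by": "event_time",
    "partition_granularity": "day",
    "cluster_by": ["symbol"],
}
check_partition_column(trades)  # passes: event_time is a declared attribute
```
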
Attribute
Extends NamedModel. Defines a single CDM column.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
name | str | Yes | — | Attribute name (inherited) |
description | str | No | — | Attribute description (inherited) |
dtype | DataType | Yes | — | Column data type |
required | bool | No | true | Whether the column is NOT NULL |
unique | bool | No | — | Whether values must be unique |
constraints | AttributeConstraint | No | AttributeConstraint() | Value constraints |
AttributeConstraint
| Field | Type | Description |
|---|---|---|
max_length | int | Maximum string length |
min_length | int | Minimum string length |
precision | int | Decimal precision |
scale | int | Decimal scale |
min_value | int \| float | Minimum numeric value |
max_value | int \| float | Maximum numeric value |
pattern | str | Regex pattern |
element_type | str | Array element type |
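
To make the constraint fields concrete, here is a sketch that checks a single value against the length, pattern, and range constraints. It is not QuantFlow's implementation: `constraint_violations` is hypothetical, the pattern is assumed to match the whole string, the numeric bounds are assumed to be inclusive, and precision, scale, and element_type are not covered.

```python
import re
from typing import Any, Dict, List


def constraint_violations(value: Any, constraints: Dict[str, Any]) -> List[str]:
    """Return the names of the constraints that `value` fails."""
    failed: List[str] = []
    if isinstance(value, str):
        if "min_length" in constraints and len(value) < constraints["min_length"]:
            failed.append("min_length")
        if "max_length" in constraints and len(value) > constraints["max_length"]:
            failed.append("max_length")
        # Assumption: the pattern must match the whole string.
        if "pattern" in constraints and re.fullmatch(constraints["pattern"], value) is None:
            failed.append("pattern")
    elif isinstance(value, (int, float)) and not isinstance(value, bool):
        # Assumption: numeric bounds are inclusive.
        if "min_value" in constraints and value < constraints["min_value"]:
            failed.append("min_value")
        if "max_value" in constraints and value > constraints["max_value"]:
            failed.append("max_value")
    return failed
```
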
EngineConfig
Database engine connection configuration.
| Field | Type | Required | Description |
|---|---|---|---|
name | str | Yes | Engine identifier |
host | str | No | Host address |
port | int | No | Port number |
database | str | No | Database name or file path |
auth | str | No | Authentication method |
key | Dict | No | Credentials |
LocalConfig
Separate configuration (.local_config.yml) for credentials and local overrides. Never committed to version control.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
feed_provider_credentials | List[Credential] | No | [] | API credentials per provider |
engine | List[EngineConfig] | No | [] | Local engine connection overrides |
local_cache | Dict[str, str] | No | {} | Local cache settings (e.g. path) |
Credential
| Field | Type | Required | Description |
|---|---|---|---|
provider | str | Yes | Provider name matching a FeedProvider |
key | SecretStr | No | API key |
username | str | No | Username |
password | SecretStr | No | Password |
token | SecretStr | No | Bearer token |
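
Since each credential's `provider` is meant to match a FeedProvider name, a consistency check along these lines can catch typos early. The helper `unmatched_credential_providers` is hypothetical and operates on parsed dicts; whether QuantFlow enforces this at load time is not stated above.

```python
from typing import Any, Dict, List


def unmatched_credential_providers(
    feed_providers: Dict[str, Any], credentials: List[Dict[str, Any]]
) -> List[str]:
    """Return provider names in credentials that have no FeedProvider entry."""
    return [
        cred["provider"]
        for cred in credentials
        if cred["provider"] not in feed_providers
    ]
```
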
Configuration Layering
Precedence runs from lowest to highest: Fallback Defaults < Project Metadata < Inline Overrides
- Fallback Defaults: sensible defaults built into each Pydantic model
- Project Metadata: quantflow_project.yml values
- Inline Overrides: CLI flags, environment variables, .local_config.yml
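
The layering order can be sketched as a merge in which later layers win. This is a minimal illustration that assumes nested sections are merged key by key; whether QuantFlow merges deeply or replaces whole sections is not specified here, and `deep_merge` and `layer_config` are hypothetical names.

```python
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where values from `override` take precedence."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Assumption: nested sections are merged rather than replaced.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def layer_config(*layers: Dict[str, Any]) -> Dict[str, Any]:
    """Merge layers in order of increasing precedence."""
    result: Dict[str, Any] = {}
    for layer in layers:
        result = deep_merge(result, layer)
    return result


defaults = {"state_engine": {"enabled": True, "micro_batch_size": 200000}}
project = {"state_engine": {"micro_batch_size": 50000}}
overrides = {"state_engine": {"enabled": False}}
effective = layer_config(defaults, project, overrides)
```

Here `effective` keeps the project's `micro_batch_size` and the override's `enabled` flag, while the input dicts are left unmodified.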