
Metadata Specifications

All QuantFlow configuration is governed by a Pydantic model hierarchy. Every YAML config is validated against these models at load time, so type safety and structural correctness are enforced before any pipeline runs. The root container is QuantflowMetadata, which composes a ProjectConfig, a dictionary of FeedProvider models, a dictionary of CDMEntity models, and an optional LocalConfig.


Model Hierarchy

QuantflowMetadata
├── project: ProjectConfig (NamedModel)
│   ├── ingest: IngestConfig
│   │   └── feeds[]: IngestFeedConfig
│   ├── data_processing: DataProcessingConfig
│   ├── state_engine: CanonicalStateEngineConfig
│   │   └── bar_groups[]: CanonicalBarGroup
│   │       ├── bars[]: BarConfig
│   │       ├── snapshots: SnapshotConfig
│   │       └── trade_signing: TradeSigningConfig
│   ├── label_engine?: LabelEngineConfig
│   │   └── labels[]: LabelDefinition
│   ├── feature_engine: FeatureEngine
│   │   └── features[]: FeatureDefinition
│   └── sinks[]: SinkConfig
├── feed_providers: Dict[str, FeedProvider] (NamedModel)
│   └── data_types: Dict[str, DataTypeSchema]
│       ├── field_mappings[]: FieldMapping
│       ├── schema: Dict[str, SchemaAttribute]
│       └── tests: TestSuite
├── cdm_entities: Dict[str, CDMEntity]
│   └── attributes: Dict[str, Attribute]
│       └── constraints: AttributeConstraint
└── local?: LocalConfig
    ├── feed_provider_credentials[]: Credential
    └── engine[]: EngineConfig

Base Types

DataType

STRING | TIMESTAMP | DECIMAL | BOOLEAN | BIGINT | ARRAY | JSON | INT64 | INTEGER

PartitionGranularity

hour | day | month | year

SourceType

http | streaming | websocket | database | file

NamedModel

Base for all named entities.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | str | Yes | 1–100 characters |
| description | str | No | Max 500 characters |
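
The length limits in this table could be enforced roughly as follows. This is a minimal stdlib sketch that uses a dataclass as a stand-in for the Pydantic model; the class name `NamedModelSketch` and the error messages are assumptions for illustration, not QuantFlow's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NamedModelSketch:
    """Hypothetical stand-in for NamedModel; not QuantFlow's actual class."""

    name: str
    description: Optional[str] = None

    def __post_init__(self) -> None:
        # name is required and must be 1-100 characters long.
        if not 1 <= len(self.name) <= 100:
            raise ValueError("name must be 1-100 characters")
        # description is optional but capped at 500 characters.
        if self.description is not None and len(self.description) > 500:
            raise ValueError("description must be at most 500 characters")
```

Any model described as extending NamedModel would inherit both checks under this sketch.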

QuantflowMetadata

Root metadata container for the entire system.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| project | ProjectConfig | Yes | Project configuration |
| feed_providers | Dict[str, FeedProvider] | No | Feed provider definitions keyed by name |
| cdm_entities | Dict[str, CDMEntity] | No | CDM entity definitions keyed by table name |
| local | LocalConfig | No | Local runtime overrides (never committed) |
| version | str | No | Metadata version |
| description | str | No | Human-readable description |

ProjectConfig

Root project model for quantflow_project.yml. Extends NamedModel.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | str | Yes |  | Project name (inherited from NamedModel) |
| default_pipeline_mode | str | No | "batch" | Pipeline mode: batch or trade |
| symbols | List[str] | No | [] | Traded symbols |
| ingest | IngestConfig | No |  | Ingestion pipeline configuration |
| data_processing | DataProcessingConfig | No |  | CDM processing engine selection |
| state_engine | CanonicalStateEngineConfig | No |  | State engine with bar groups |
| label_engine | LabelEngineConfig | No |  | Label computation config |
| feature_engine | FeatureEngine | No |  | Feature definitions |
| sinks | List[SinkConfig] | No | [] | Output sink configurations |

IngestConfig

Ingestion pipeline configuration.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| enabled | bool | No | true | Enable/disable ingestion |
| feeds | List[IngestFeedConfig] | No | [] | Feed configurations |

IngestFeedConfig

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | str | Yes | Feed identifier |
| historical_data_provider | str | No | Provider for historical data |
| streaming_data_provider | str | No | Provider for streaming data |
| symbols | List[str] | No | Symbols for this feed |
| data_types | List[str] | No | CDM data types to ingest |
| disabled | bool | No | Set true to skip this feed |

DataProcessingConfig

CDM processing engine selection.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| enabled | bool | No | Enable/disable processing |
| historical_data_engine | str | No | Engine for batch CDM processing |
| streaming_data_engine | str | No | Engine for streaming CDM processing |
| feeds | List[str] | No | Feed names to process |

FeedProvider

Defines an external data source connection. Extends NamedModel.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | str | Yes |  | Provider identifier (inherited) |
| description | str | No |  | Provider description (inherited) |
| type | SourceType | Yes |  | Connection protocol |
| location | str | Yes |  | API endpoint (must start with http://, https://, file://, gs://, s3://, wss://, ws://) |
| auth | str | No |  | Authentication method |
| format | str | No |  | Data format |
| compression | str | No |  | Compression |
| update_frequency | str | No |  | Refresh interval |
| data_types | Dict[str, DataTypeSchema] | No | {} | Data types keyed by name |
| protocol | str | No |  | "combined_stream" or "subscription" |
| orderbook_mode | str | No |  | "snapshot" or "snapshot_delta" |
| reconstruction_interval | int | No |  | ms between reconstructed snapshots |
| max_depth_levels | int | No |  | Depth levels to emit |
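
The prefix rule on location can be illustrated with a small check. This is a sketch under the assumption that the rule is a plain string-prefix test; the function name `check_location` and the example URLs are hypothetical and not taken from QuantFlow.

```python
# Prefixes listed in the FeedProvider table for the location field.
ALLOWED_LOCATION_PREFIXES = (
    "http://", "https://", "file://", "gs://", "s3://", "wss://", "ws://",
)


def check_location(location: str) -> str:
    """Return location unchanged if it starts with an allowed prefix."""
    # str.startswith accepts a tuple and matches any of its entries.
    if not location.startswith(ALLOWED_LOCATION_PREFIXES):
        raise ValueError(f"unsupported location prefix: {location!r}")
    return location


# Hypothetical endpoints, used only to exercise the check.
streaming_ok = check_location("wss://stream.example.com/ws")
bucket_ok = check_location("s3://example-bucket/trades/")
```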

DataTypeSchema

Maps a provider data type to CDM fields.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | str | Yes |  | Data type name (e.g. trades, orderbook) |
| stream | str | No |  | Exchange stream name for URL substitution |
| unique_key | List[str] | No | [] | Columns forming a unique row identifier |
| tests | TestSuite | No | TestSuite() | Table-level tests |
| schema_attributes | Dict[str, SchemaAttribute] | No | {} | Column name → attribute definition (YAML key: schema) |
| field_mappings | List[FieldMapping] | No | [] | Raw field → CDM column transformations |
| enabled | bool | No | true | Whether this data type is active |

SchemaAttribute

Column-level schema definition used in DataTypeSchema.schema_attributes.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | str | Yes | Column name |
| dtype | str | Yes | Data type (YAML key: dtype) |
| description | str | No | Column description |
| tests | List | No | Column-level test specifications |

FieldMapping

Maps a raw source field to a CDM target column with optional QFSQL transformation.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| target | str | Yes |  | CDM column name (output) |
| source | str | No |  | Raw source column name (input) |
| transformation | str | No |  | QFSQL expression applied to source |
| is_time_filter_field | bool | No | false | Designates this field as the incremental filter key |
| custom_cast | str | No |  | Engine-specific cast override |

Rules:

  • At least one of source or transformation should be specified
  • source without transformation implies direct passthrough
  • transformation without source implies a computed column
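
The three rules above can be sketched as a small classifier. This is an illustration only: the class name `FieldMappingSketch`, the `kind()` method, and its return labels are hypothetical, and treating a mapping with neither field as a hard error is an assumption, since the rule only says one of them should be specified.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldMappingSketch:
    """Hypothetical stand-in for FieldMapping; not QuantFlow's actual class."""

    target: str
    source: Optional[str] = None
    transformation: Optional[str] = None

    def kind(self) -> str:
        """Classify the mapping according to the rules listed above."""
        if self.source is None and self.transformation is None:
            # Assumption: the "should" in the rules is enforced as an error.
            raise ValueError("specify at least one of source or transformation")
        if self.transformation is None:
            return "passthrough"  # source only: copied through unchanged
        if self.source is None:
            return "computed"  # transformation only: derived column
        return "transformed"  # both: expression applied to the source


# Column names are placeholders; no real QFSQL syntax is shown here.
direct = FieldMappingSketch(target="trade_price", source="raw_price")
derived = FieldMappingSketch(target="notional", transformation="<QFSQL expression>")
```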

TestSuite

Collection of tests for a data type or column.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| column_tests | Dict[str, List] | No | {} | Column name → list of test specs |
| table_tests | List | No | [] | Table-level test specs |

Test specs can be simple strings ("not_null") or dicts with parameters:

column_tests:
  trade_price:
    - not_null
    - dbt_utils.accepted_range:
        min_value: 0
        inclusive: false

table_tests:
  - dbt_utils.unique_combination_of_columns:
      combination_of_columns: [symbol, trade_time, trade_id]
  - dbt_utils.recency:
      datepart: hour
      field: event_time
      interval: 24
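
Because a test spec can be either a bare string or a single-key dict, consumers need to normalize both shapes. The following is a minimal sketch of one way to do that on the already-parsed YAML; the helper name `normalize_test_spec` is hypothetical and nothing here is confirmed to match how QuantFlow handles specs internally.

```python
from typing import Any, Dict, Tuple, Union

# A spec is a test name, or a one-entry mapping of test name to parameters.
Spec = Union[str, Dict[str, Dict[str, Any]]]


def normalize_test_spec(spec: Spec) -> Tuple[str, Dict[str, Any]]:
    """Return (test_name, parameters) for either spec shape."""
    if isinstance(spec, str):
        return spec, {}  # e.g. "not_null" carries no parameters
    if isinstance(spec, dict) and len(spec) == 1:
        name, params = next(iter(spec.items()))
        return name, dict(params or {})
    raise TypeError(f"unsupported test spec: {spec!r}")


# The trade_price column tests from the example above, as parsed YAML.
trade_price_tests = [
    "not_null",
    {"dbt_utils.accepted_range": {"min_value": 0, "inclusive": False}},
]
normalized = [normalize_test_spec(s) for s in trade_price_tests]
```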

CanonicalStateEngineConfig

State engine configuration with per-symbol bar groups.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| enabled | bool | No | true | Enable/disable state engine |
| micro_batch_size | int | No | 200000 | Rows per micro-batch |
| bar_groups | List[CanonicalBarGroup] | No | [] | Per-symbol bar configurations |

CanonicalBarGroup

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | str | Yes | Group identifier |
| symbols | List[str] | Yes | Symbols in this group |
| bars | List[BarConfig] | Yes | Bar type configurations |
| snapshots | SnapshotConfig | No | Snapshot parameters |
| trade_signing | TradeSigningConfig | No | Trade direction inference |

BarConfig

| Field | Type | Description |
| --- | --- | --- |
| type | str | Bar type: time, tick, volume, dollar, imbalance, run, volatility, dollar_imbalance, cusum |
| interval_minutes | float | Time bar interval (time bars) |
| count | int | Trade count threshold (tick bars) |
| threshold | float | Volume/dollar threshold |
| k | float | Imbalance/dollar_imbalance threshold |
| window | int | Run bar consecutive trade count |
SnapshotConfig

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| on_every_trade | bool | false | Snapshot on every trade |
| period_seconds | float | 60.0 | Snapshot interval in seconds |
| interval | int | 0 | Nth-trade interval (0 = disabled) |
| depth_levels | int | 10 | Order book depth levels captured |

LabelEngineConfig

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| historical_label_engine | str | No | "polars" | Engine for label computation |
| labels | List[LabelDefinition] | No | [] | Label definitions |

LabelDefinition

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | str | Yes |  | Unique label name |
| type | str | Yes |  | Label method type |
| description | str | No | "" | Human-readable description |
| parameters | Dict | No | {} | Type-specific parameters |
| inputs | Dict[str, str] | No | {} | Input role → CDM column name |
| dependencies | List[str] | No | [] | Required CDM tables or features |
| bar_types | List[str] | No |  | Filter bars by type |
| output_name | str | No | "label" | Output column name |

Label types: triple_barrier, fixed_horizon_return, ts_label, trend_scanning, quantile_label, meta_label


FeatureEngine

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| historical_feature_engine | str | Yes |  | Engine for historical feature computation |
| streaming_feature_engine | str | Yes |  | Engine for streaming feature computation |
| features | List[FeatureDefinition] | No | [] | Feature definitions |

FeatureDefinition

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | str | Yes |  | Unique feature name |
| type | str | No | auto from name | Feature type (references a registered FeatureType) |
| parameters | Dict[str, Any] | No | {} | Feature-specific parameters |
| dependencies | List[str] | No | [] | Names of features this one depends on |
| input_entity | str | No |  | Single CDM input entity |
| input_entities | Dict[str, List[str]] | No | {} | Multiple input entity mappings |
| column_mapping | Dict[str, str] | No | {} | Column name overrides |
| input_entity_overrides | Dict | No | {} | Per-input entity column overrides |
| steps | List[Dict] | No |  | Inline computation steps (alternative to type reference) |

Two modes:

  • Type reference: set type to a registered FeatureType name — parameters merged with defaults
  • Inline: provide steps directly — type auto-generated from name
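
The two modes can be sketched as a single resolution function. Everything named here is hypothetical: the registry `FEATURE_TYPE_DEFAULTS`, its `rolling_mean` entry, the step dict shape, and the assumption that an auto-generated type is simply the feature name. It only illustrates the merge-with-defaults and inline-steps behaviour described above.

```python
from typing import Any, Dict, List, Optional

# Hypothetical registry of FeatureType defaults; this entry is invented.
FEATURE_TYPE_DEFAULTS: Dict[str, Dict[str, Any]] = {
    "rolling_mean": {"window": 20, "min_periods": 1},
}


def resolve_feature(
    name: str,
    feature_type: Optional[str] = None,
    parameters: Optional[Dict[str, Any]] = None,
    steps: Optional[List[Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    """Resolve a feature definition in inline or type-reference mode."""
    params = dict(parameters or {})
    # Assumption: a missing type is derived by reusing the feature name.
    effective_type = feature_type or name
    if steps:
        # Inline mode: the steps themselves define the computation.
        return {"name": name, "type": effective_type,
                "parameters": params, "steps": list(steps)}
    # Type-reference mode: explicit parameters override registered defaults.
    if effective_type not in FEATURE_TYPE_DEFAULTS:
        raise KeyError(f"unregistered feature type: {effective_type!r}")
    merged = {**FEATURE_TYPE_DEFAULTS[effective_type], **params}
    return {"name": name, "type": effective_type,
            "parameters": merged, "steps": None}


by_type = resolve_feature("px_mean_50", feature_type="rolling_mean",
                          parameters={"window": 50})
inline = resolve_feature("spread", steps=[{"op": "subtract"}])
```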

SinkConfig

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| historical_sinks | List[str] | No | Historical sink targets |
| streaming_sinks | List[str] | No | Streaming sink targets |

CDMEntity

Common Data Model entity definition.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| table_name | str | Yes | Target table name |
| description | str | Yes | Entity description |
| entity_type | str | Yes | Entity category |
| attributes | Dict[str, Attribute] | Yes | Column name → attribute definition |
| partition_by | str | Yes | Partition column (must exist in attributes) |
| partition_granularity | PartitionGranularity | Yes | Partition bucket size |
| cluster_by | List[str] | Yes | Clustering columns |
| unique_key | List[str] | No | Composite unique key columns |
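
The cross-field rule on partition_by, together with the PartitionGranularity values, can be sketched as a check over an already-parsed entity dict. The helper `check_partitioning` and the example entity are hypothetical; whether QuantFlow also validates cluster_by or unique_key against attributes is not stated, so this sketch leaves them alone.

```python
from typing import Any, Dict

# Values listed under PartitionGranularity.
PARTITION_GRANULARITIES = ("hour", "day", "month", "year")


def check_partitioning(entity: Dict[str, Any]) -> None:
    """Raise ValueError if the partition settings are inconsistent."""
    attributes = entity.get("attributes", {})
    partition_by = entity.get("partition_by")
    # partition_by must name one of the entity's own attributes.
    if partition_by not in attributes:
        raise ValueError(f"partition_by {partition_by!r} is not in attributes")
    granularity = entity.get("partition_granularity")
    if granularity not in PARTITION_GRANULARITIES:
        raise ValueError(f"unknown partition_granularity: {granularity!r}")


# Illustrative entity; table and column names are examples only.
trades_entity = {
    "table_name": "trades",
    "attributes": {
        "symbol": {"dtype": "STRING"},
        "trade_time": {"dtype": "TIMESTAMP"},
        "trade_price": {"dtype": "DECIMAL"},
    },
    "partition_by": "trade_time",
    "partition_granularity": "day",
    "cluster_by": ["symbol"],
}
check_partitioning(trades_entity)
```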

Attribute

Extends NamedModel. Defines a single CDM column.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | str | Yes |  | Attribute name (inherited) |
| description | str | No |  | Attribute description (inherited) |
| dtype | DataType | Yes |  | Column data type |
| required | bool | No | true | Whether the column is NOT NULL |
| unique | bool | No |  | Whether values must be unique |
| constraints | AttributeConstraint | No | AttributeConstraint() | Value constraints |

AttributeConstraint

| Field | Type | Description |
| --- | --- | --- |
| max_length | int | Maximum string length |
| min_length | int | Minimum string length |
| precision | int | Decimal precision |
| scale | int | Decimal scale |
| min_value | int \| float | Minimum numeric value |
| max_value | int \| float | Maximum numeric value |
| pattern | str | Regex pattern |
| element_type | str | Array element type |

EngineConfig

Database engine connection configuration.

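| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | str | Yes | Engine identifier |
| host | str | No | Host address |
| port | int | No | Port number |
| database | str | No | Database name or file path |
| auth | str | No | Authentication method |
| key | Dict | No | Credentials |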

LocalConfig

Separate configuration (.local_config.yml) for credentials and local overrides. Never committed to version control.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| feed_provider_credentials | List[Credential] | No | [] | API credentials per provider |
| engine | List[EngineConfig] | No | [] | Local engine connection overrides |
| local_cache | Dict[str, str] | No | {} | Local cache settings (e.g. path) |

Credential

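| Field | Type | Required | Description |
| --- | --- | --- | --- |
| provider | str | Yes | Provider name matching a FeedProvider |
| key | SecretStr | No | API key |
| username | str | No | Username |
| password | SecretStr | No | Password |
| token | SecretStr | No | Bearer token |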

Configuration Layering

Fallback Defaults < Project Metadata < Inline Overrides
  1. Fallback Defaults: sensible defaults built into each Pydantic model
  2. Project Metadata: quantflow_project.yml values
  3. Inline Overrides: CLI flags, environment variables, .local_config.yml
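
The precedence order above can be sketched as a dictionary merge in which later layers win. This is a simplified illustration: it assumes a shallow, key-by-key merge, and whether QuantFlow merges nested sections more deeply is not stated here. The function name `layer_config` and the sample values are hypothetical, apart from the two documented CanonicalStateEngineConfig defaults.

```python
from typing import Any, Dict


def layer_config(
    defaults: Dict[str, Any],
    project: Dict[str, Any],
    overrides: Dict[str, Any],
) -> Dict[str, Any]:
    """Merge three layers; keys in later layers replace earlier ones."""
    # Unpacking order encodes: Fallback Defaults < Project Metadata < Inline Overrides.
    return {**defaults, **project, **overrides}


# Defaults taken from the CanonicalStateEngineConfig table.
fallback_defaults = {"enabled": True, "micro_batch_size": 200000}
# Hypothetical project value and inline override.
project_metadata = {"micro_batch_size": 50000}
inline_overrides = {"enabled": False}

resolved = layer_config(fallback_defaults, project_metadata, inline_overrides)
```

Here `resolved` keeps the project's micro_batch_size while the inline override disables the engine.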