Overview
QuantFlow's declarative specification system — two engine-agnostic languages plus the config schema and quality rules they operate within. Everything is defined in YAML, validated at compile time, and compiled to backend-specific execution.
| | QFSQL | Formula Language | Metadata Specs | Test Specs |
|---|---|---|---|---|
| What | SQL dialect | Math DSL | Config schema | Quality rules |
| Stage | DataInfra | FeatureDAG | All layers | DataInfra |
| Audience | Data engineers | Quant researchers | Platform engineers | Data engineers |
| Used in | transformation: | formula: | quantflow_project.yml | tests: blocks |
QFSQL
Engine-agnostic SQL dialect for DataInfra field mappings. Each function translates to native SQL across four engines — the same transformation string produces correct SQL everywhere.
# Feed provider field mapping
field_mapping:
  - target: trade_price
    source: raw_price
    transformation: "cast_safe(raw_price, numeric)"
  - target: event_time
    transformation: "timestamp_ms(cast_safe(trade_timestamp, bigint))"
Functions are organized into cast/convert, date/time, string, numeric/aggregate, conditional, array/JSON, and hash categories. Each function's documentation lists its per-engine SQL output.
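To make the per-engine translation concrete, here is a minimal sketch of how one transformation string could be rendered for two backends. It is not QuantFlow's implementation: the engine names (`duckdb`, `bigquery`), the `TEMPLATES` table, and the `translate` helper are assumptions for illustration, and the sketch borrows Python's `ast` module for parsing only because QFSQL call syntax happens to be valid Python expression syntax. Type names are passed through unchanged, which a real adapter would also need to map.

```python
import ast

# Hypothetical per-engine templates. The real QFSQL function table and the
# four supported engines are not listed in this overview.
TEMPLATES = {
    "duckdb": {
        "cast_safe": "TRY_CAST({0} AS {1})",
        "timestamp_ms": "epoch_ms({0})",
    },
    "bigquery": {
        "cast_safe": "SAFE_CAST({0} AS {1})",
        "timestamp_ms": "TIMESTAMP_MILLIS({0})",
    },
}


def render(node: ast.AST, engine: str) -> str:
    """Recursively render a parsed expression as SQL text for one engine."""
    if isinstance(node, ast.Name):
        # Bare identifiers are column names or type names; emit them as-is.
        return node.id
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        args = [render(arg, engine) for arg in node.args]
        return TEMPLATES[engine][node.func.id].format(*args)
    raise ValueError(f"unsupported syntax: {ast.dump(node)}")


def translate(transformation: str, engine: str) -> str:
    """Translate one transformation string into engine-specific SQL."""
    tree = ast.parse(transformation, mode="eval")
    return render(tree.body, engine)


expr = "timestamp_ms(cast_safe(trade_timestamp, bigint))"
for engine in ("duckdb", "bigquery"):
    print(engine, "->", translate(expr, engine))
```

Keeping the templates in a per-engine table means a new function or engine is added as data, without touching the field mappings that use it.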
Formula Language
Mathematical DSL for FeatureType definitions. Formula strings are parsed with Python's ast module, compiled to an IR DAG, and lowered to Polars or DolphinDB expressions, so one definition serves both runtimes without duplicate implementations.
# FeatureType YAML definition
formula: cumsum((diff(best_bid_size, $window) - diff(best_ask_size, $window)), 0)
Functions span arithmetic, unary transforms, window aggregates, lag/diff, autocorrelation, entropy, state accumulators, and specialized indicators. Parameters are prefixed with $ and resolved at compile time from the parameters: block.
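The parse-and-compile step can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual compiler: the tuple-based IR, the `resolve_params` and `compile_formula` helpers, and the value `window = 5` are all hypothetical. It assumes `$name` parameters are substituted textually before parsing, since `$` is not valid in a Python expression, and it builds a plain expression tree without merging shared subexpressions into a true DAG.

```python
import ast
import re

# Map Python AST operator types to hypothetical IR op names.
BINOPS = {ast.Add: "add", ast.Sub: "sub", ast.Mult: "mul", ast.Div: "div"}


def resolve_params(formula: str, parameters: dict) -> str:
    """Replace each $name with the literal value from the parameters block."""
    return re.sub(r"\$(\w+)", lambda m: repr(parameters[m.group(1)]), formula)


def to_ir(node: ast.AST) -> tuple:
    """Convert a Python AST node into a nested-tuple IR node."""
    if isinstance(node, ast.Constant):
        return ("const", node.value)
    if isinstance(node, ast.Name):
        return ("col", node.id)
    if isinstance(node, ast.BinOp) and type(node.op) in BINOPS:
        return (BINOPS[type(node.op)], to_ir(node.left), to_ir(node.right))
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        return (node.func.id, *(to_ir(arg) for arg in node.args))
    raise ValueError(f"unsupported syntax: {ast.dump(node)}")


def compile_formula(formula: str, parameters: dict) -> tuple:
    """Resolve parameters, parse the formula, and return its IR."""
    resolved = resolve_params(formula, parameters)
    return to_ir(ast.parse(resolved, mode="eval").body)


ir = compile_formula(
    "cumsum((diff(best_bid_size, $window) - diff(best_ask_size, $window)), 0)",
    {"window": 5},  # hypothetical parameter value
)
print(ir)
```

A backend lowering would then walk this IR and emit the matching Polars or DolphinDB expression for each node.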
Metadata Specifications
Pydantic model hierarchy governing all QuantFlow YAML configuration. Every config file is validated against these models at load time — type safety and structural correctness enforced before any pipeline runs.
QuantflowMetadata
├── project: ProjectConfig
│   ├── ingest: IngestConfig
│   │   └── feeds[]: IngestFeed
│   ├── data_processing: DataProcessingConfig
│   │   ├── historical_data_engine: str
│   │   └── streaming_data_engine: str
│   ├── state_engine: CanonicalStateEngineConfig
│   │   └── bar_groups[]: CanonicalBarGroup
│   │       ├── bars[]: BarConfig
│   │       ├── snapshots: SnapshotConfig
│   │       └── trade_signing: TradeSigningConfig
│   ├── label_engine?: LabelEngineConfig
│   │   └── labels[]: LabelDefinition
│   ├── feature_engine: FeatureEngine
│   │   └── features[]: FeatureDefinition
│   └── sinks[]: SinkConfig
├── feed_providers: Dict[str, FeedProvider]
│   └── data_types: Dict[str, DataTypeSchema]
│       ├── field_mappings[]: FieldMapping
│       └── schema: Dict[str, SchemaAttribute]
├── cdm_entities: Dict[str, CDMEntity]
│   └── attributes: Dict[str, Attribute]
└── local?: LocalConfig
    ├── engine[]: EngineConfig
    └── feed_provider_credentials[]: Credential
Covers the root ProjectConfig, ingest and data processing configs, feed providers, CDM entities with field schemas, state/label/feature engine configs, engine connections, and local overrides.
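As a rough illustration of load-time validation, the sketch below models a small slice of the hierarchy with Pydantic. The class names follow the tree above, but the `name` field on `IngestFeed`, the `load_project` helper, the engine values, and the use of a plain `dict` in place of `LabelEngineConfig` are assumptions made for the example; the real models carry more fields.

```python
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class IngestFeed(BaseModel):
    name: str  # hypothetical field; the real IngestFeed schema is not shown


class IngestConfig(BaseModel):
    feeds: List[IngestFeed]


class DataProcessingConfig(BaseModel):
    historical_data_engine: str
    streaming_data_engine: str


class ProjectConfig(BaseModel):
    ingest: IngestConfig
    data_processing: DataProcessingConfig
    # Optional, matching the `?` marker in the tree; a plain dict stands in
    # for LabelEngineConfig to keep the sketch short.
    label_engine: Optional[dict] = None


def load_project(raw: dict) -> ProjectConfig:
    """Validate an already-parsed YAML mapping against the model hierarchy."""
    return ProjectConfig(**raw)


good = {
    "ingest": {"feeds": [{"name": "trades"}]},
    "data_processing": {
        "historical_data_engine": "duckdb",  # hypothetical engine names
        "streaming_data_engine": "polars",
    },
}
project = load_project(good)
print(project.data_processing.historical_data_engine)

# A missing required key is rejected before any pipeline would run.
bad = {"ingest": {"feeds": []}, "data_processing": {"historical_data_engine": "duckdb"}}
try:
    load_project(bad)
except ValidationError as err:
    print("rejected:", len(err.errors()), "error(s)")
```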
Data Quality Tests
Four-layer validation framework, with the primary user-facing layer being YAML-declared dbt tests in feed provider configs. Tests are defined per-column or per-table and auto-generated into dbt test cases.
tests:
  - not_null
  - dbt_utils.unique_combination_of_columns:
      combination_of_columns: [symbol, trade_time, trade_id]
  - dbt_utils.accepted_range:
      min_value: 0
      inclusive: false
  - dbt_utils.recency:
      datepart: hour
      field: event_time
      interval: 24
Covers column-level tests (not_null, accepted_range, accepted_values) and table-level tests (unique_combination, recency, expression_is_true).
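One way the generation step might work is sketched below: declared tests are split into table-level and column-level groups and arranged in the shape dbt expects for a model entry in a properties file. The `build_dbt_model` helper, the `TABLE_LEVEL` set, and the model and column names are hypothetical; the actual generator is not described in this overview.

```python
# Test names treated as table-level, following the split described above.
TABLE_LEVEL = {
    "dbt_utils.unique_combination_of_columns",
    "dbt_utils.recency",
    "dbt_utils.expression_is_true",
}


def name_of(declared) -> str:
    """Return the test name for either a bare string or a {name: args} mapping."""
    return declared if isinstance(declared, str) else next(iter(declared))


def build_dbt_model(model_name: str, column: str, declared_tests: list) -> dict:
    """Arrange declared tests into a dbt-style model entry (hypothetical helper)."""
    table_tests = [t for t in declared_tests if name_of(t) in TABLE_LEVEL]
    column_tests = [t for t in declared_tests if name_of(t) not in TABLE_LEVEL]
    return {
        "name": model_name,
        "tests": table_tests,
        "columns": [{"name": column, "tests": column_tests}],
    }


declared = [
    "not_null",
    {"dbt_utils.accepted_range": {"min_value": 0, "inclusive": False}},
    {"dbt_utils.recency": {"datepart": "hour", "field": "event_time", "interval": 24}},
]
model = build_dbt_model("trades", "trade_price", declared)  # hypothetical names
print(model)
```

Serializing the resulting dictionary to YAML would yield a model entry that dbt can pick up alongside its other properties files.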
Shared Design
All four specifications follow the same principles:
- Declarative — intent in YAML, not imperative code
- Engine-agnostic — one definition, multiple backend targets (SQL adapters for QFSQL, lowering registry for formula language)
- Compile-time validated — errors caught before execution: Pydantic models for config schemas, Pandera for ingestion, IR DAG contracts for formulas, dbt tests for warehouse data
- Registry-dispatched — backend implementations registered via decorators; add a new engine without changing definitions
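
The registry-dispatch principle can be illustrated with a small decorator-based registry. This is a sketch, not QuantFlow's code: the `register` and `lower` names are hypothetical, and the lowerings return expression strings rather than real Polars or DolphinDB objects so that the example runs with the standard library alone.

```python
from typing import Callable, Dict, Tuple

# Maps (operation, backend) to the function that lowers it.
LOWERINGS: Dict[Tuple[str, str], Callable[..., str]] = {}


def register(op: str, backend: str):
    """Decorator that records a lowering function for one op on one backend."""
    def decorator(fn: Callable[..., str]) -> Callable[..., str]:
        LOWERINGS[(op, backend)] = fn
        return fn
    return decorator


@register("diff", "polars")
def diff_polars(col: str, n: int) -> str:
    # Illustrative expression text for an n-step difference in Polars.
    return f"pl.col({col!r}).diff({n})"


@register("diff", "dolphindb")
def diff_dolphindb(col: str, n: int) -> str:
    # Illustrative expression text for an n-step difference in DolphinDB.
    return f"{col} - move({col}, {n})"


def lower(op: str, backend: str, *args) -> str:
    """Look up and apply the lowering registered for this op and backend."""
    try:
        fn = LOWERINGS[(op, backend)]
    except KeyError:
        raise NotImplementedError(f"no lowering for {op!r} on {backend!r}") from None
    return fn(*args)


print(lower("diff", "polars", "best_bid_size", 5))
print(lower("diff", "dolphindb", "best_bid_size", 5))
```

Supporting another engine then amounts to registering new functions under a new backend key; the YAML definitions that name `diff` stay unchanged.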