Overview

QuantFlow's declarative specification system consists of two engine-agnostic languages (QFSQL and the Formula Language) plus the config schema and quality rules they operate within. Everything is defined in YAML, validated at compile time, and compiled to backend-specific code.

| | QFSQL | Formula Language | Metadata Specs | Test Specs |
|---|---|---|---|---|
| What | SQL dialect | Math DSL | Config schema | Quality rules |
| Stage | DataInfra | FeatureDAG | All layers | DataInfra |
| Audience | Data engineers | Quant researchers | Platform engineers | Data engineers |
| Used in | `transformation:` | `formula:` | `quantflow_project.yml` | `tests:` blocks |

QFSQL

Engine-agnostic SQL dialect for DataInfra field mappings. Each function translates to native SQL across four engines — the same transformation string produces correct SQL everywhere.

# Feed provider field mapping
field_mapping:
  - target: trade_price
    source: raw_price
    transformation: "cast_safe(raw_price, numeric)"
  - target: event_time
    transformation: "timestamp_ms(cast_safe(trade_timestamp, bigint))"

Functions are organized into cast/convert, date/time, string, numeric/aggregate, conditional, array/JSON, and hash categories. Each function's reference entry documents its per-engine SQL output.
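To make the per-engine translation concrete, here is a minimal sketch of how one function, `cast_safe`, might render differently per engine. The engine names (`duckdb`, `bigquery`), the template table, and the helpers `render_cast_safe` and `render_select` are assumptions for illustration only; they are not QuantFlow's documented engines or API.

```python
# Hypothetical per-engine templates for cast_safe. The engine names and SQL
# spellings are illustrative assumptions, not QuantFlow's documented output.
CAST_SAFE_TEMPLATES = {
    "duckdb": "TRY_CAST({expr} AS {type})",
    "bigquery": "SAFE_CAST({expr} AS {type})",
}


def render_cast_safe(engine: str, expr: str, sql_type: str) -> str:
    """Render cast_safe(expr, type) as native SQL for one engine."""
    try:
        template = CAST_SAFE_TEMPLATES[engine]
    except KeyError:
        raise ValueError(f"no cast_safe translation for engine {engine!r}") from None
    return template.format(expr=expr, type=sql_type)


def render_select(engine: str, mappings: list) -> str:
    """Build a SELECT list from (target, source expression, type) triples."""
    columns = [
        f"{render_cast_safe(engine, expr, sql_type)} AS {target}"
        for target, expr, sql_type in mappings
    ]
    return "SELECT " + ", ".join(columns)


mappings = [("trade_price", "raw_price", "NUMERIC")]
duckdb_sql = render_select("duckdb", mappings)
bigquery_sql = render_select("bigquery", mappings)
```

The same mapping list yields a different SQL string per engine, which is the property the transformation strings rely on.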

→ QFSQL Reference

Formula Language

Mathematical DSL for FeatureType definitions. Formula strings are parsed via Python's ast module, compiled to an IR DAG, and lowered to Polars or DolphinDB expressions, so one definition serves both runtimes without duplicate implementations.

# FeatureType YAML definition
formula: cumsum((diff(best_bid_size, $window) - diff(best_ask_size, $window)), 0)

Functions span arithmetic, unary transforms, window aggregates, lag/diff, autocorrelation, entropy, state accumulators, and specialized indicators. Parameters are prefixed with $ and resolved at compile time from the parameters: block.
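The parse step can be sketched with the standard library alone. The example below assumes `$`-prefixed parameters are substituted textually before parsing (since `$name` is not valid Python syntax) and uses a nested-tuple tree as a stand-in for the IR DAG. The helper names `resolve_parameters`, `to_ir`, and `compile_formula` are hypothetical, not QuantFlow's API.

```python
import ast
import re


def resolve_parameters(formula: str, parameters: dict) -> str:
    """Replace each $name with its value from the parameters block."""
    def substitute(match):
        name = match.group(1)
        if name not in parameters:
            raise KeyError(f"unresolved parameter ${name}")
        return repr(parameters[name])

    return re.sub(r"\$([A-Za-z_]\w*)", substitute, formula)


def to_ir(node: ast.AST):
    """Convert a Python AST node into a nested-tuple IR (a tree, not a shared DAG)."""
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        return ("call", node.func.id, tuple(to_ir(arg) for arg in node.args))
    if isinstance(node, ast.BinOp):
        return ("binop", type(node.op).__name__, to_ir(node.left), to_ir(node.right))
    if isinstance(node, ast.Name):
        return ("column", node.id)
    if isinstance(node, ast.Constant):
        return ("const", node.value)
    raise ValueError(f"unsupported syntax: {ast.dump(node)}")


def compile_formula(formula: str, parameters: dict):
    """Resolve parameters, parse the expression, and build the IR."""
    resolved = resolve_parameters(formula, parameters)
    tree = ast.parse(resolved, mode="eval")
    return to_ir(tree.body)


ir = compile_formula(
    "diff(best_bid_size, $window) - diff(best_ask_size, $window)",
    {"window": 5},
)
```

A real compiler would also deduplicate shared subexpressions to form a DAG and check each call against the function catalogue; this sketch only shows parameter resolution and the AST walk.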

→ Formula Language Reference

Metadata Specifications

Pydantic model hierarchy governing all QuantFlow YAML configuration. Every config file is validated against these models at load time — type safety and structural correctness enforced before any pipeline runs.

QuantflowMetadata
├── project: ProjectConfig
│   ├── ingest: IngestConfig
│   │   └── feeds[]: IngestFeed
│   ├── data_processing: DataProcessingConfig
│   │   ├── historical_data_engine: str
│   │   └── streaming_data_engine: str
│   ├── state_engine: CanonicalStateEngineConfig
│   │   └── bar_groups[]: CanonicalBarGroup
│   │       ├── bars[]: BarConfig
│   │       ├── snapshots: SnapshotConfig
│   │       └── trade_signing: TradeSigningConfig
│   ├── label_engine?: LabelEngineConfig
│   │   └── labels[]: LabelDefinition
│   ├── feature_engine: FeatureEngine
│   │   └── features[]: FeatureDefinition
│   └── sinks[]: SinkConfig
├── feed_providers: Dict[str, FeedProvider]
│   └── data_types: Dict[str, DataTypeSchema]
│       ├── field_mappings[]: FieldMapping
│       └── schema: Dict[str, SchemaAttribute]
├── cdm_entities: Dict[str, CDMEntity]
│   └── attributes: Dict[str, Attribute]
└── local?: LocalConfig
    ├── engine[]: EngineConfig
    └── feed_provider_credentials[]: Credential

Covers the root QuantflowMetadata model, ProjectConfig with its ingest and data processing configs, feed providers, CDM entities with field schemas, state/label/feature engine configs, engine connections, and local overrides.
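As an illustration of load-time validation, the sketch below models a small slice of the hierarchy with Pydantic. The class and field names for `DataProcessingConfig`, `LabelEngineConfig`, and `ProjectConfig` follow the tree above. The `name` field on `LabelDefinition`, the helpers `load_project` and `validation_errors`, and the engine values are assumptions made for the example.

```python
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class DataProcessingConfig(BaseModel):
    historical_data_engine: str
    streaming_data_engine: str


class LabelDefinition(BaseModel):
    name: str  # hypothetical field, for illustration only


class LabelEngineConfig(BaseModel):
    labels: List[LabelDefinition] = []


class ProjectConfig(BaseModel):
    data_processing: DataProcessingConfig
    label_engine: Optional[LabelEngineConfig] = None  # optional, as in the tree


def load_project(raw: dict) -> ProjectConfig:
    """Validate a parsed YAML mapping; raises ValidationError on bad input."""
    return ProjectConfig(**raw)


def validation_errors(raw: dict) -> list:
    """Return dotted paths of the fields that failed validation."""
    try:
        load_project(raw)
    except ValidationError as exc:
        return [".".join(str(part) for part in err["loc"]) for err in exc.errors()]
    return []


# Engine names here are placeholder values, not documented QuantFlow engines.
valid = {
    "data_processing": {
        "historical_data_engine": "engine_a",
        "streaming_data_engine": "engine_b",
    }
}
invalid = {"data_processing": {"historical_data_engine": "engine_a"}}

project = load_project(valid)
errors = validation_errors(invalid)
```

Because nested mappings are coerced into nested models, a missing key deep in the config is reported with its full path before any pipeline code runs.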

→ Metadata Specifications

Data Quality Tests

Four-layer validation framework. The primary user-facing layer is YAML-declared dbt tests in feed provider configs: tests are declared per column or per table and generated automatically into dbt test cases.

tests:
  - not_null
  - dbt_utils.unique_combination_of_columns:
      combination_of_columns: [symbol, trade_time, trade_id]
  - dbt_utils.accepted_range:
      min_value: 0
      inclusive: false
  - dbt_utils.recency:
      datepart: hour
      field: event_time
      interval: 24

Covers column-level tests (not_null, accepted_range, accepted_values) and table-level tests (unique_combination_of_columns, recency, expression_is_true).
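One step a generator would plausibly need is separating column-level from table-level declarations before emitting dbt test cases. The sketch below assumes the split follows the grouping listed above. The `TABLE_LEVEL` set and the helpers `declared_name` and `split_tests` are hypothetical, not part of QuantFlow or dbt.

```python
# Table-level test names, following the grouping described in this section.
TABLE_LEVEL = {
    "dbt_utils.unique_combination_of_columns",
    "dbt_utils.recency",
    "dbt_utils.expression_is_true",
}


def declared_name(entry) -> str:
    """Return the test name from a bare string or a single-key mapping."""
    if isinstance(entry, str):
        return entry
    if isinstance(entry, dict) and len(entry) == 1:
        return next(iter(entry))
    raise ValueError(f"malformed test declaration: {entry!r}")


def split_tests(entries: list):
    """Partition declarations into (column_level, table_level), keeping order."""
    column_level, table_level = [], []
    for entry in entries:
        bucket = table_level if declared_name(entry) in TABLE_LEVEL else column_level
        bucket.append(entry)
    return column_level, table_level


# The parsed form of a tests: block, as a YAML loader would produce it.
declared = [
    "not_null",
    {"dbt_utils.recency": {"datepart": "hour", "field": "event_time", "interval": 24}},
    {"dbt_utils.accepted_range": {"min_value": 0, "inclusive": False}},
]
column_tests, table_tests = split_tests(declared)
```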

→ Data Quality Tests

Shared Design

All four specifications follow the same principles:

Declarative — intent in YAML, not imperative code
Engine-agnostic — one definition, multiple backend targets (SQL adapters for QFSQL, lowering registry for formula language)
Compile-time validated — errors caught before execution: Pydantic models for config schemas, Pandera for ingestion, IR DAG contracts for formulas, dbt tests for warehouse data
Registry-dispatched — backend implementations registered via decorators; add a new engine without changing definitions
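The registry-dispatch principle can be sketched as a decorator that files each backend implementation under a (function, backend) key. Everything named here (`LOWERINGS`, `lowering`, `lower`) is hypothetical, and the lowerings return expression strings rather than real Polars or DolphinDB objects so the example stays dependency-free.

```python
from typing import Callable, Dict, Tuple

# Maps (function name, backend name) to the implementation that lowers it.
LOWERINGS: Dict[Tuple[str, str], Callable[..., str]] = {}


def lowering(function: str, backend: str):
    """Decorator that registers a backend implementation for a function."""
    def decorator(fn: Callable[..., str]) -> Callable[..., str]:
        key = (function, backend)
        if key in LOWERINGS:
            raise ValueError(f"duplicate lowering for {key}")
        LOWERINGS[key] = fn
        return fn

    return decorator


@lowering("diff", "polars")
def _diff_polars(column: str, window: int) -> str:
    # Illustrative expression string, not an actual Polars object.
    return f'pl.col("{column}").diff({window})'


@lowering("diff", "dolphindb")
def _diff_dolphindb(column: str, window: int) -> str:
    # Illustrative expression string for a DolphinDB script.
    return f"{column} - move({column}, {window})"


def lower(function: str, backend: str, *args) -> str:
    """Dispatch to the registered implementation for the chosen backend."""
    try:
        impl = LOWERINGS[(function, backend)]
    except KeyError:
        raise NotImplementedError(f"{function} has no {backend} lowering") from None
    return impl(*args)


polars_expr = lower("diff", "polars", "best_bid_size", 5)
dolphindb_expr = lower("diff", "dolphindb", "best_bid_size", 5)
```

Supporting another engine then means registering new decorated functions; the YAML definitions and the dispatch code stay unchanged.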
