Overview

QuantFlow's declarative specification system consists of two engine-agnostic languages (QFSQL and the Formula Language) plus the config schema and quality rules they operate within. Everything is defined in YAML, validated at compile time, and then compiled for execution on a specific backend.

Aspect      QFSQL            Formula Language    Metadata Specs         Test Specs
What        SQL dialect      Math DSL            Config schema          Quality rules
Stage       DataInfra        FeatureDAG          All layers             DataInfra
Audience    Data engineers   Quant researchers   Platform engineers     Data engineers
Used in     transformation:  formula:            quantflow_project.yml  tests: blocks

QFSQL

Engine-agnostic SQL dialect for DataInfra field mappings. Each function translates to native SQL across four engines — the same transformation string produces correct SQL everywhere.

# Feed provider field mapping
field_mapping:
  - target: trade_price
    source: raw_price
    transformation: "cast_safe(raw_price, numeric)"
  - target: event_time
    transformation: "timestamp_ms(cast_safe(trade_timestamp, bigint))"

Functions are organized into cast/convert, date/time, string, numeric/aggregate, conditional, array/JSON, and hash categories. The reference documents the SQL that each function produces on each engine.
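The per-engine translation described above can be pictured as a lookup from (function, engine) to a renderer. The following is a minimal sketch under stated assumptions, not QuantFlow's actual adapter code: the engine names engine_a and engine_b, the renders decorator, the render helper, and the SQL spellings are all hypothetical, since this page does not name the four engines or show their output.

```python
from typing import Callable, Dict, Tuple

# Registry keyed by (function name, engine name). The engine names used below
# are placeholders; this page does not list the four supported engines.
_RENDERERS: Dict[Tuple[str, str], Callable[..., str]] = {}


def renders(function: str, engine: str):
    """Decorator that registers a SQL renderer for one function on one engine."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        _RENDERERS[(function, engine)] = fn
        return fn
    return register


@renders("cast_safe", "engine_a")
def _cast_safe_a(expr: str, type_name: str) -> str:
    # Assumption: engine_a offers TRY_CAST, which yields NULL on failure.
    return f"TRY_CAST({expr} AS {type_name})"


@renders("cast_safe", "engine_b")
def _cast_safe_b(expr: str, type_name: str) -> str:
    # Assumption: engine_b spells the same operation SAFE_CAST.
    return f"SAFE_CAST({expr} AS {type_name})"


def render(function: str, engine: str, *args: str) -> str:
    """Render one function call as native SQL for the chosen engine."""
    try:
        renderer = _RENDERERS[(function, engine)]
    except KeyError:
        raise ValueError(
            f"{function!r} is not implemented for engine {engine!r}"
        ) from None
    return renderer(*args)


# The same call rendered for two hypothetical engines.
print(render("cast_safe", "engine_a", "raw_price", "numeric"))
print(render("cast_safe", "engine_b", "raw_price", "numeric"))
```

With a layout like this, supporting another engine means registering more renderers; the transformation strings in the YAML stay untouched.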

QFSQL Reference


Formula Language

Mathematical DSL for FeatureType definitions. Formula strings are parsed via Python's ast module, compiled to an IR DAG, and lowered to Polars or DolphinDB expressions, so one definition serves both runtimes without duplicate implementations.

# FeatureType YAML definition
formula: cumsum((diff(best_bid_size, window) - diff(best_ask_size, window)), 0)

Functions span arithmetic, unary transforms, window aggregates, lag/diff, autocorrelation, entropy, state accumulators, and specialized indicators. Parameters are prefixed with $ and resolved at compile time from the parameters: block.
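To make the parse step concrete, here is a rough sketch of how $-prefixed parameters might be resolved and a formula string turned into a small intermediate form using Python's ast module. The helper names (resolve_parameters, to_ir, compile_formula) and the nested-tuple IR are illustrative assumptions; QuantFlow's real IR DAG and its lowering to Polars or DolphinDB are not shown on this page. The example formula uses $window, following the parameter convention described above.

```python
import ast
import re
from typing import Any, Dict, Tuple


def resolve_parameters(formula: str, parameters: Dict[str, Any]) -> str:
    """Replace each $name with the literal value from the parameters block."""
    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in parameters:
            raise KeyError(f"undefined formula parameter: ${name}")
        return repr(parameters[name])
    return re.sub(r"\$([A-Za-z_]\w*)", substitute, formula)


def to_ir(node: ast.AST) -> Tuple:
    """Convert a Python expression AST into nested tuples (a toy IR)."""
    if isinstance(node, ast.Expression):
        return to_ir(node.body)
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        return ("call", node.func.id, *[to_ir(arg) for arg in node.args])
    if isinstance(node, ast.BinOp):
        op = type(node.op).__name__.lower()  # e.g. "sub", "add"
        return (op, to_ir(node.left), to_ir(node.right))
    if isinstance(node, ast.Name):
        return ("column", node.id)
    if isinstance(node, ast.Constant):
        return ("const", node.value)
    # Anything else (attribute access, lambdas, ...) is rejected up front.
    raise SyntaxError(f"unsupported syntax: {type(node).__name__}")


def compile_formula(formula: str, parameters: Dict[str, Any]) -> Tuple:
    """Resolve parameters, parse the result as an expression, and build the IR."""
    resolved = resolve_parameters(formula, parameters)
    return to_ir(ast.parse(resolved, mode="eval"))


ir = compile_formula(
    "diff(best_bid_size, $window) - diff(best_ask_size, $window)",
    {"window": 5},
)
print(ir)
```

Substituting parameters before parsing is needed in this sketch because $window is not a valid Python identifier. A real IR would likely also deduplicate shared subexpressions to form a DAG rather than a tree.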

Formula Language Reference


Metadata Specifications

Pydantic model hierarchy governing all QuantFlow YAML configuration. Every config file is validated against these models at load time — type safety and structural correctness enforced before any pipeline runs.

QuantflowMetadata
├── project: ProjectConfig
│   ├── ingest: IngestConfig
│   │   └── feeds[]: IngestFeed
│   ├── data_processing: DataProcessingConfig
│   │   ├── historical_data_engine: str
│   │   └── streaming_data_engine: str
│   ├── state_engine: CanonicalStateEngineConfig
│   │   └── bar_groups[]: CanonicalBarGroup
│   │       ├── bars[]: BarConfig
│   │       ├── snapshots: SnapshotConfig
│   │       └── trade_signing: TradeSigningConfig
│   ├── label_engine?: LabelEngineConfig
│   │   └── labels[]: LabelDefinition
│   ├── feature_engine: FeatureEngine
│   │   └── features[]: FeatureDefinition
│   └── sinks[]: SinkConfig
├── feed_providers: Dict[str, FeedProvider]
│   └── data_types: Dict[str, DataTypeSchema]
│       ├── field_mappings[]: FieldMapping
│       └── schema: Dict[str, SchemaAttribute]
├── cdm_entities: Dict[str, CDMEntity]
│   └── attributes: Dict[str, Attribute]
└── local?: LocalConfig
    ├── engine[]: EngineConfig
    └── feed_provider_credentials[]: Credential

Covers the root QuantflowMetadata model and its ProjectConfig, ingest and data processing configs, feed providers, CDM entities with field schemas, state/label/feature engine configs, engine connections, and local overrides.
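As a rough illustration of load-time validation, the sketch below models a small slice of the hierarchy with Pydantic. Only the class names and the two data_processing fields come from the tree above; the name field on LabelDefinition, the load_metadata helper, and the engine values are assumptions made for the example.

```python
from typing import Any, Dict, List, Optional

from pydantic import BaseModel, Field, ValidationError


class DataProcessingConfig(BaseModel):
    historical_data_engine: str
    streaming_data_engine: str


class LabelDefinition(BaseModel):
    name: str  # hypothetical field; the real model's fields are not shown here


class LabelEngineConfig(BaseModel):
    labels: List[LabelDefinition] = Field(default_factory=list)


class ProjectConfig(BaseModel):
    data_processing: DataProcessingConfig
    # Marked optional in the tree (label_engine?), so it defaults to None.
    label_engine: Optional[LabelEngineConfig] = None


class QuantflowMetadata(BaseModel):
    project: ProjectConfig


def load_metadata(raw: Dict[str, Any]) -> QuantflowMetadata:
    """Validate an already-parsed YAML document against the model tree."""
    return QuantflowMetadata(**raw)


# A well-formed document loads into typed objects (engine names are placeholders).
good = {
    "project": {
        "data_processing": {
            "historical_data_engine": "polars",
            "streaming_data_engine": "dolphindb",
        }
    }
}
print(load_metadata(good).project.data_processing.historical_data_engine)

# A document missing a required field fails before any pipeline would run.
broken = {"project": {"data_processing": {"historical_data_engine": "polars"}}}
try:
    load_metadata(broken)
except ValidationError as exc:
    print(exc.errors()[0]["loc"])
```

The error location points at the exact missing key, which is what makes structural mistakes cheap to find at load time.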

Metadata Specifications


Data Quality Tests

Four-layer validation framework, with the primary user-facing layer being YAML-declared dbt tests in feed provider configs. Tests are defined per-column or per-table and auto-generated into dbt test cases.

tests:
  - not_null
  - dbt_utils.unique_combination_of_columns:
      combination_of_columns: [symbol, trade_time, trade_id]
  - dbt_utils.accepted_range:
      min_value: 0
      inclusive: false
  - dbt_utils.recency:
      datepart: hour
      field: event_time
      interval: 24

Covers column-level tests (not_null, accepted_range, accepted_values) and table-level tests (unique_combination_of_columns, recency, expression_is_true).
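A tests: block mixes bare names (not_null) with single-key mappings that carry arguments. One plausible first step in generating dbt test cases is to normalize both shapes into (name, arguments) pairs. The sketch below assumes the YAML has already been parsed into Python objects; normalize_tests and DeclaredTest are hypothetical names, not QuantFlow's generator.

```python
from typing import Any, Dict, List, Tuple, Union

# A declared test is either a bare name or a one-key mapping of name -> arguments.
DeclaredTest = Union[str, Dict[str, Dict[str, Any]]]


def normalize_tests(entries: List[DeclaredTest]) -> List[Tuple[str, Dict[str, Any]]]:
    """Turn mixed test declarations into uniform (name, arguments) pairs."""
    normalized: List[Tuple[str, Dict[str, Any]]] = []
    for entry in entries:
        if isinstance(entry, str):
            # Bare name such as not_null: no arguments.
            normalized.append((entry, {}))
        elif isinstance(entry, dict) and len(entry) == 1:
            (name, kwargs), = entry.items()
            normalized.append((name, dict(kwargs or {})))
        else:
            raise ValueError(f"malformed test entry: {entry!r}")
    return normalized


# Parsed form of part of the tests: block shown above.
declared = [
    "not_null",
    {"dbt_utils.accepted_range": {"min_value": 0, "inclusive": False}},
    {"dbt_utils.recency": {"datepart": "hour", "field": "event_time", "interval": 24}},
]
for name, kwargs in normalize_tests(declared):
    print(name, kwargs)
```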

Data Quality Tests


Shared Design

All four specifications follow the same principles:

  1. Declarative — intent in YAML, not imperative code
  2. Engine-agnostic — one definition, multiple backend targets (SQL adapters for QFSQL, lowering registry for formula language)
  3. Compile-time validated — errors caught before execution: Pydantic models for config schemas, Pandera for ingestion, IR DAG contracts for formulas, dbt tests for warehouse data
  4. Registry-dispatched — backend implementations registered via decorators; add a new engine without changing definitions