dbt Generator
QuantFlow automatically generates complete dbt projects from metadata definitions, with no manual SQL modeling.
Why dbt?
dbt (data build tool) is the standard for data transformation in analytics engineering. Rather than building a proprietary transformation layer, QuantFlow generates dbt projects to leverage its entire ecosystem:
| Capability | What QuantFlow Gets For Free |
|---|---|
| Materializations | Incremental, ephemeral, table, view — engine-specific strategies (insert_overwrite on BigQuery, merge on Snowflake) |
| Testing framework | Auto-generated uniqueness, not-null, referential integrity, and custom expression tests — dbt test runs them all |
| Data lineage | `dbt docs generate` produces model-level lineage graphs (the DAG) across staging → CDM → feature models |
| CI/CD integration | Native GitHub Actions, pre-commit hooks, Slim CI for incremental testing |
| Observability | Elementary integration for anomaly detection, alerting, and reporting |
| Extensibility | Users add custom models that ref() the auto-generated CDM — best of both worlds |
The generated project is a standard dbt project — users own it, extend it, and run it with the tools they already know. No lock-in.
Generator Architecture
QuantflowMetadata → DBTConfigAdapter → 6 Sub-Generators → dbt Project
| Generator | Output | Purpose |
|---|---|---|
| `MacrosGenerator` | `macros/cdm_adapter.sql` | Engine-specific SQL macros with dispatch wrappers |
| `ProjectGenerator` | `dbt_project.yml` | Materialization strategies (staging=ephemeral, cdm=incremental) |
| `SourcesGenerator` | `models/sources.yml` | Raw data sources with column types and tests |
| `ProcessingGenerator` | `models/staging/*.sql` | Per-vendor staging models with field mapping SQL |
| `UnionGenerator` | `models/cdm/*.sql` | CDM union models merging data across all venues |
| `ProfilesGenerator` | `profiles.yml` | Connection profiles per engine |
Generation Pipeline
- Metadata ingestion → `QuantflowMetadata` converted to normalized config by `DBTConfigAdapter`
- Engine resolution → Target engine (bigquery, snowflake, duckdb, trino) resolved from config
- Template rendering → Each generator renders Jinja2 templates with the normalized config
- SQL translation → QFSQL expressions translated to engine-native SQL by `QFSQLTranslator`
- Project output → Complete dbt project written to disk, ready for `dbt run`
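The pipeline steps above can be sketched as a small orchestration loop. This is an illustrative outline only: `NormalizedConfig`, `generate_project`, the two example generators, and the `stg_<vendor>.sql` file naming are hypothetical stand-ins, not QuantFlow's actual classes or output layout.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Dict, List

SUPPORTED_ENGINES = {"bigquery", "snowflake", "duckdb", "trino"}


@dataclass
class NormalizedConfig:
    """Hypothetical stand-in for the config that DBTConfigAdapter produces."""
    project_name: str
    engine: str
    vendors: List[str] = field(default_factory=list)


# A generator maps the normalized config to {relative_path: file_contents}.
Generator = Callable[[NormalizedConfig], Dict[str, str]]


def project_generator(cfg: NormalizedConfig) -> Dict[str, str]:
    return {"dbt_project.yml": f"name: {cfg.project_name}\n"}


def processing_generator(cfg: NormalizedConfig) -> Dict[str, str]:
    # One staging model per vendor (file naming is an assumption).
    return {
        f"models/staging/stg_{vendor}.sql": f"-- staging model for {vendor}\n"
        for vendor in cfg.vendors
    }


def generate_project(
    cfg: NormalizedConfig, generators: List[Generator], out_dir: Path
) -> List[Path]:
    """Resolve the engine, run every generator, and write the files to disk."""
    if cfg.engine not in SUPPORTED_ENGINES:
        raise ValueError(f"unsupported engine: {cfg.engine}")
    written = []
    for generate in generators:
        for rel_path, contents in generate(cfg).items():
            target = out_dir / rel_path
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(contents)
            written.append(target)
    return written
```

Keeping each generator a pure function from config to file contents makes the sub-generators easy to test in isolation; only `generate_project` touches the filesystem.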
QFSQL Translation
Field mappings use QFSQL (QuantFlow SQL), an engine-agnostic expression language. Each engine adapter defines 60+ translation rules:
# QFSQL:
transformation: "str_concat('binance_', cast_safe(trade_id, string))"
# BigQuery: CONCAT('binance_', SAFE_CAST(trade_id AS STRING))
# Snowflake: CONCAT('binance_', TRY_CAST(trade_id AS STRING))
# DuckDB: CONCAT('binance_', TRY_CAST(trade_id AS STRING))
QFSQL rule categories (60 functions):
- Cast/Convert (7): `cast_safe`, `cast`, `try_cast`, `to_string`, `to_integer`, `to_decimal`, `to_boolean`
- Date/Time (11): `timestamp_ms`, `timestamp_us`, `timestamp_ns`, `timestamp_seconds`, `date_add`, `date_diff`, `date_trunc`, `extract`, `format_timestamp`, `current_timestamp`, `current_date`
- String (15): `str_concat`, `str_length`, `str_lower`, `str_upper`, `str_trim`, `str_replace`, `str_substring`, `str_contains`, `str_starts_with`, `str_ends_with`, `regexp_extract`, `regexp_replace`, `regexp_contains`, `str_split`, `str_join`
- Numeric/Aggregate (16): `coalesce`, `null_if`, `if_null`, `round`, `floor`, `ceiling`, `abs`, `greatest`, `least`, `sum`, `avg`, `count`, `count_distinct`, `min`, `max`, `str_agg`
- Conditional (2): `if`, `case_when`
- Array/JSON (6): `array`, `array_length`, `array_contains`, `struct`, `json_extract`, `json_serialize`
- Hash (3): `md5`, `sha256`, `uuid`
For the complete syntax, see the QFSQL Reference.
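To make the translation idea concrete, here is a minimal sketch of a recursive QFSQL translator covering only the two functions from the example above (`str_concat` and `cast_safe`). The `RULES` table, `split_args`, and `translate` are hypothetical names for illustration; the real `QFSQLTranslator` may be structured quite differently.

```python
from typing import Callable, Dict, List

# Per-engine rules: function name -> callable that builds engine-native SQL
# from already-translated arguments. Only two of the 60 functions are shown.
RULES: Dict[str, Dict[str, Callable[[List[str]], str]]] = {
    "bigquery": {
        "str_concat": lambda a: f"CONCAT({', '.join(a)})",
        "cast_safe": lambda a: f"SAFE_CAST({a[0]} AS {a[1].upper()})",
    },
    "snowflake": {
        "str_concat": lambda a: f"CONCAT({', '.join(a)})",
        "cast_safe": lambda a: f"TRY_CAST({a[0]} AS {a[1].upper()})",
    },
    "duckdb": {
        "str_concat": lambda a: f"CONCAT({', '.join(a)})",
        "cast_safe": lambda a: f"TRY_CAST({a[0]} AS {a[1].upper()})",
    },
}


def split_args(text: str) -> List[str]:
    """Split on top-level commas, ignoring commas inside parentheses or quotes."""
    args: List[str] = []
    current: List[str] = []
    depth, in_quote = 0, False
    for ch in text:
        if ch == "'":
            in_quote = not in_quote
        elif not in_quote:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch == "," and depth == 0:
                args.append("".join(current).strip())
                current = []
                continue
        current.append(ch)
    tail = "".join(current).strip()
    if tail:
        args.append(tail)
    return args


def translate(expr: str, engine: str) -> str:
    """Translate a QFSQL expression by recursing into function arguments."""
    expr = expr.strip()
    open_paren = expr.find("(")
    # String literals and bare column or type names pass through unchanged.
    if expr.startswith("'") or open_paren == -1 or not expr.endswith(")"):
        return expr
    name, inner = expr[:open_paren], expr[open_paren + 1 : -1]
    args = [translate(arg, engine) for arg in split_args(inner)]
    return RULES[engine][name](args)
```

Calling `translate("str_concat('binance_', cast_safe(trade_id, string))", "bigquery")` yields the BigQuery form shown in the example above. An unknown function name raises `KeyError` in this sketch; a production translator would presumably report a clearer error.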
Field Mappings from Feed Providers
Field mappings are defined in feed provider YAML and consumed by the dbt generator to produce staging models. Each mapping specifies a target CDM column, an optional source field, and a QFSQL transformation:
field_mappings:
  - target: trade_id
    source: trade_id
    transformation: "str_concat('binance_', cast_safe(trade_id, string))"
  - target: price
    source: price
    transformation: "cast_safe(price, numeric)"
  - target: event_time
    source: trade_time
    transformation: "timestamp_ms(cast_safe(trade_time, bigint))"
    is_time_filter_field: true
  - target: venue
    transformation: "'binance'"  # constant injection, no source field
  - target: processed_time
    transformation: "current_timestamp()"  # computed column, no source field
At least one mapping must set is_time_filter_field: true for incremental loading. Mappings without a source field inject constants or computed values. See Ingestion & Feed Providers for the full feed provider configuration.
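As a rough illustration of how such mappings could become the column list of a staging model, the sketch below validates the time-filter rule and renders one `<expression> AS <target>` line per mapping. The function names and the `translate` hook are assumptions made for this example, not the generator's actual API.

```python
from typing import Callable, Dict, List

Mapping = Dict[str, object]


def validate_mappings(mappings: List[Mapping]) -> None:
    """Enforce the rule that incremental loading needs a time filter field."""
    if not any(m.get("is_time_filter_field") for m in mappings):
        raise ValueError("at least one mapping must set is_time_filter_field: true")


def render_select_list(
    mappings: List[Mapping],
    translate: Callable[[str], str] = lambda expr: expr,
) -> str:
    """Render a SELECT column list; `translate` stands in for QFSQL translation."""
    validate_mappings(mappings)
    columns = [
        f"    {translate(str(m['transformation']))} AS {m['target']}"
        for m in mappings
    ]
    return "SELECT\n" + ",\n".join(columns)
```

Mappings without a `source` key need no special handling here, because the transformation expression alone determines the column value.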
Engine Adapter System
Engine support is plugin-based. Each engine adapter (BigQueryAdapter, SnowflakeAdapter, DuckDBAdapter, TrinoAdapter) defines the following capabilities:
| Capability | BigQuery | Snowflake | DuckDB/Trino |
|---|---|---|---|
| `incremental_strategy` | insert_overwrite | merge | delete+insert |
| Partition support | Yes (hour) | DDL-based | No |
| Cluster support | Yes | DDL-based | No |
| `source_database()` | Project ID | Database | None |
| `source_schema()` | Dataset | Schema | Schema |
All adapters register via @register_adapter and provide macro_definitions() using the {engine}__{macro_name} dbt dispatch convention.
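A registry of this kind might look roughly like the following sketch. Only `@register_adapter`, `macro_definitions()`, and the `{engine}__{macro_name}` naming come from the description above; the `EngineAdapter` base class, the `ADAPTERS` dict, the `macro_bodies()` helper, and the macro bodies themselves (which assume an epoch-milliseconds input) are illustrative assumptions.

```python
from typing import Dict, Type

# Hypothetical registry: engine name -> adapter class.
ADAPTERS: Dict[str, Type["EngineAdapter"]] = {}


def register_adapter(cls):
    """Class decorator that records the adapter under its engine name."""
    ADAPTERS[cls.engine] = cls
    return cls


class EngineAdapter:
    engine = ""
    incremental_strategy = ""

    def macro_bodies(self) -> Dict[str, str]:
        """Macro name -> SQL body; overridden per engine."""
        return {}

    def macro_definitions(self) -> str:
        # Emit macros named {engine}__{macro_name} so dbt dispatch can find them.
        blocks = [
            f"{{% macro {self.engine}__{name}(expr) %}}{body}{{% endmacro %}}"
            for name, body in self.macro_bodies().items()
        ]
        return "\n".join(blocks)


@register_adapter
class BigQueryAdapter(EngineAdapter):
    engine = "bigquery"
    incremental_strategy = "insert_overwrite"

    def macro_bodies(self) -> Dict[str, str]:
        # Illustrative body: assumes expr holds epoch milliseconds.
        return {"to_timestamp": "TIMESTAMP_MILLIS({{ expr }})"}


@register_adapter
class SnowflakeAdapter(EngineAdapter):
    engine = "snowflake"
    incremental_strategy = "merge"

    def macro_bodies(self) -> Dict[str, str]:
        # Illustrative body: scale 3 treats the number as milliseconds.
        return {"to_timestamp": "TO_TIMESTAMP({{ expr }}, 3)"}
```

With this shape, adding an engine means adding one decorated class; the generators look the adapter up by name, for example `ADAPTERS["snowflake"]()`.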
Jinja2 Templates
| Template | Renders |
|---|---|
| `project.yml.j2` | `dbt_project.yml` with model materializations, vars, paths |
| `sources.yml.j2` | Source definitions with column types, tags, meta |
| `vendor_dataprocessing.sql.j2` | `WITH source AS ({{ source(...) }}) SELECT ...` with QFSQL field mappings |
| `union_model.sql.j2` | `UNION ALL` across vendors, composite `cdm_id`, incremental filter |
| `profiles.yml.j2` | Engine-specific connection blocks |
Templates use dbt pass-through globals (source(), ref(), config(), is_incremental()) and dispatch macros (to_timestamp(), concat(), cdm_staging_config()) — all rendered as literal {{ ... }} Jinja2 strings that dbt resolves at runtime.
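The pass-through idea can be sketched without any template engine: each helper simply returns the literal dbt expression as text, so the generated file still contains `{{ ... }}` for dbt to resolve later. The `passthrough` factory and `render_staging_model` below are hypothetical; in a Jinja2 setup, functions like these could be registered on the environment's `globals` so templates can call them.

```python
from typing import Callable


def passthrough(name: str) -> Callable[..., str]:
    """Build a helper that emits a literal dbt call instead of evaluating it."""
    def emit(*args: str) -> str:
        rendered_args = ", ".join(repr(arg) for arg in args)
        return "{{ " + f"{name}({rendered_args})" + " }}"
    return emit


# Helpers named after the dbt functions they stand in for.
source = passthrough("source")
ref = passthrough("ref")


def render_staging_model(source_name: str, table: str, select_list: str) -> str:
    """Assemble a staging model whose source() call is left for dbt to resolve."""
    return (
        "WITH source AS (\n"
        f"    SELECT * FROM {source(source_name, table)}\n"
        ")\n"
        "SELECT\n"
        f"    {select_list}\n"
        "FROM source\n"
    )
```

The generator never needs a warehouse connection for this step, since nothing is resolved until dbt compiles the project.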