dbt Generator

DataInfra automatically generates complete dbt projects from metadata definitions — no manual SQL modeling.

Why dbt?

dbt (data build tool) is the standard for data transformation in analytics engineering. Rather than building a proprietary transformation layer, QuantFlow generates dbt projects to leverage its entire ecosystem:

Capability	What QuantFlow Gets For Free
Materializations	Incremental, ephemeral, table, view — engine-specific strategies (`insert_overwrite` on BigQuery, `merge` on Snowflake)
Testing framework	Auto-generated uniqueness, not-null, referential integrity, and custom expression tests — `dbt test` runs them all
Data lineage	`dbt docs generate` produces column-level lineage graphs across staging → CDM → feature models
CI/CD integration	Native GitHub Actions, pre-commit hooks, Slim CI for incremental testing
Observability	Elementary integration for anomaly detection, alerting, and reporting
Extensibility	Users add custom models that `ref()` the auto-generated CDM — best of both worlds

The generated project is a standard dbt project — users own it, extend it, and run it with the tools they already know. No lock-in.

Generator Architecture

QuantflowMetadata → DBTConfigAdapter → 6 Sub-Generators → dbt Project

Generator	Output	Purpose
`MacrosGenerator`	`macros/cdm_adapter.sql`	Engine-specific SQL macros with dispatch wrappers
`ProjectGenerator`	`dbt_project.yml`	Materialization strategies (staging=ephemeral, cdm=incremental)
`SourcesGenerator`	`models/sources.yml`	Raw data sources with column types and tests
`ProcessingGenerator`	`models/staging/*.sql`	Per-vendor staging models with field mapping SQL
`UnionGenerator`	`models/cdm/*.sql`	CDM union models merging data across all venues
`ProfilesGenerator`	`profiles.yml`	Connection profiles per engine

Generation Pipeline

Metadata ingestion → QuantflowMetadata converted to normalized config by DBTConfigAdapter
Engine resolution → Target engine (bigquery, snowflake, duckdb, trino) resolved from config
Template rendering → Each generator renders Jinja2 templates with the normalized config
SQL translation → QFSQL expressions translated to engine-native SQL by QFSQLTranslator
Project output → Complete dbt project written to disk, ready for dbt run

QFSQL Translation

Field mappings use QFSQL (QuantFlow SQL), an engine-agnostic expression language. Each engine adapter defines 60+ translation rules:

# QFSQL:
transformation: "str_concat('binance_', cast_safe(trade_id, string))"

# BigQuery:  CONCAT('binance_', SAFE_CAST(trade_id AS STRING))
# Snowflake: CONCAT('binance_', TRY_CAST(trade_id AS STRING))
# DuckDB:    CONCAT('binance_', TRY_CAST(trade_id AS STRING))

QFSQL rule categories (60 functions):

Cast/Convert (7): cast_safe, cast, try_cast, to_string, to_integer, to_decimal, to_boolean
Date/Time (11): timestamp_ms, timestamp_us, timestamp_ns, timestamp_seconds, date_add, date_diff, date_trunc, extract, format_timestamp, current_timestamp, current_date
String (15): str_concat, str_length, str_lower, str_upper, str_trim, str_replace, str_substring, str_contains, str_starts_with, str_ends_with, regexp_extract, regexp_replace, regexp_contains, str_split, str_join
Numeric/Aggregate (16): coalesce, null_if, if_null, round, floor, ceiling, abs, greatest, least, sum, avg, count, count_distinct, min, max, str_agg
Conditional (2): if, case_when
Array/JSON (6): array, array_length, array_contains, struct, json_extract, json_serialize
Hash (3): md5, sha256, uuid

For the complete syntax, see the QFSQL Reference.

Field Mappings from Feed Providers

Field mappings are defined in feed provider YAML and consumed by the dbt generator to produce staging models. Each mapping specifies a target CDM column, an optional source field, and a QFSQL transformation:

field_mappings:
  - target: trade_id
    source: trade_id
    transformation: "str_concat('binance_', cast_safe(trade_id, string))"
  - target: price
    source: price
    transformation: "cast_safe(price, numeric)"
  - target: event_time
    source: trade_time
    transformation: "timestamp_ms(cast_safe(trade_time, bigint))"
    is_time_filter_field: true
  - target: venue
    transformation: "'binance'"           # constant injection — no source field
  - target: processed_time
    transformation: "current_timestamp()"  # computed column — no source field

At least one mapping must set is_time_filter_field: true for incremental loading. Mappings without a source field inject constants or computed values. See Ingestion & Feed Providers for the full feed provider configuration.

Engine Adapter System

A plugin-based architecture. Each engine adapter (BigQueryAdapter, SnowflakeAdapter, DuckDBAdapter, TrinoAdapter) defines:

Capability	BigQuery	Snowflake	DuckDB/Trino
`incremental_strategy`	`insert_overwrite`	`merge`	`delete+insert`
Partition support	Yes (hour)	DDL-based	No
Cluster support	Yes	DDL-based	No
`source_database()`	Project ID	Database	`None`
`source_schema()`	Dataset	Schema	Schema

All adapters register via @register_adapter and provide macro_definitions() using the {engine}__{macro_name} dbt dispatch convention.

Jinja2 Templates

Template	Renders
`project.yml.j2`	`dbt_project.yml` with model materializations, vars, paths
`sources.yml.j2`	Source definitions with column types, tags, meta
`vendor_dataprocessing.sql.j2`	`WITH source AS ({{ source(...) }}) SELECT ...` with QFSQL field mappings
`union_model.sql.j2`	`UNION ALL` across vendors, composite `cdm_id`, incremental filter
`profiles.yml.j2`	Engine-specific connection blocks

Templates use dbt pass-through globals (source(), ref(), config(), is_incremental()) and dispatch macros (to_timestamp(), concat(), cdm_staging_config()) — all rendered as literal {{ ... }} Jinja2 strings that dbt resolves at runtime.

Why dbt?​

Generator Architecture​

Generation Pipeline​

QFSQL Translation​

Field Mappings from Feed Providers​

Engine Adapter System​

Jinja2 Templates​