Skip to main content

dbt Generator

DataInfra automatically generates complete dbt projects from metadata definitions — no manual SQL modeling.


Why dbt?

dbt (data build tool) is the standard for data transformation in analytics engineering. Rather than building a proprietary transformation layer, QuantFlow generates dbt projects to leverage its entire ecosystem:

CapabilityWhat QuantFlow Gets For Free
MaterializationsIncremental, ephemeral, table, view — engine-specific strategies (insert_overwrite on BigQuery, merge on Snowflake)
Testing frameworkAuto-generated uniqueness, not-null, referential integrity, and custom expression tests — dbt test runs them all
Data lineagedbt docs generate produces column-level lineage graphs across staging → CDM → feature models
CI/CD integrationNative GitHub Actions, pre-commit hooks, Slim CI for incremental testing
ObservabilityElementary integration for anomaly detection, alerting, and reporting
ExtensibilityUsers add custom models that ref() the auto-generated CDM — best of both worlds

The generated project is a standard dbt project — users own it, extend it, and run it with the tools they already know. No lock-in.


Generator Architecture

QuantflowMetadata → DBTConfigAdapter → 6 Sub-Generators → dbt Project
GeneratorOutputPurpose
MacrosGeneratormacros/cdm_adapter.sqlEngine-specific SQL macros with dispatch wrappers
ProjectGeneratordbt_project.ymlMaterialization strategies (staging=ephemeral, cdm=incremental)
SourcesGeneratormodels/sources.ymlRaw data sources with column types and tests
ProcessingGeneratormodels/staging/*.sqlPer-vendor staging models with field mapping SQL
UnionGeneratormodels/cdm/*.sqlCDM union models merging data across all venues
ProfilesGeneratorprofiles.ymlConnection profiles per engine

Generation Pipeline

  1. Metadata ingestionQuantflowMetadata converted to normalized config by DBTConfigAdapter
  2. Engine resolution → Target engine (bigquery, snowflake, duckdb, trino) resolved from config
  3. Template rendering → Each generator renders Jinja2 templates with the normalized config
  4. SQL translation → QFSQL expressions translated to engine-native SQL by QFSQLTranslator
  5. Project output → Complete dbt project written to disk, ready for dbt run

QFSQL Translation

Field mappings use QFSQL (QuantFlow SQL), an engine-agnostic expression language. Each engine adapter defines 60+ translation rules:

# QFSQL:
transformation: "str_concat('binance_', cast_safe(trade_id, string))"

# BigQuery: CONCAT('binance_', SAFE_CAST(trade_id AS STRING))
# Snowflake: CONCAT('binance_', TRY_CAST(trade_id AS STRING))
# DuckDB: CONCAT('binance_', TRY_CAST(trade_id AS STRING))

QFSQL rule categories (60 functions):

  • Cast/Convert (7): cast_safe, cast, try_cast, to_string, to_integer, to_decimal, to_boolean
  • Date/Time (11): timestamp_ms, timestamp_us, timestamp_ns, timestamp_seconds, date_add, date_diff, date_trunc, extract, format_timestamp, current_timestamp, current_date
  • String (15): str_concat, str_length, str_lower, str_upper, str_trim, str_replace, str_substring, str_contains, str_starts_with, str_ends_with, regexp_extract, regexp_replace, regexp_contains, str_split, str_join
  • Numeric/Aggregate (16): coalesce, null_if, if_null, round, floor, ceiling, abs, greatest, least, sum, avg, count, count_distinct, min, max, str_agg
  • Conditional (2): if, case_when
  • Array/JSON (6): array, array_length, array_contains, struct, json_extract, json_serialize
  • Hash (3): md5, sha256, uuid

For the complete syntax, see the QFSQL Reference.


Field Mappings from Feed Providers

Field mappings are defined in feed provider YAML and consumed by the dbt generator to produce staging models. Each mapping specifies a target CDM column, an optional source field, and a QFSQL transformation:

field_mappings:
- target: trade_id
source: trade_id
transformation: "str_concat('binance_', cast_safe(trade_id, string))"
- target: price
source: price
transformation: "cast_safe(price, numeric)"
- target: event_time
source: trade_time
transformation: "timestamp_ms(cast_safe(trade_time, bigint))"
is_time_filter_field: true
- target: venue
transformation: "'binance'" # constant injection — no source field
- target: processed_time
transformation: "current_timestamp()" # computed column — no source field

At least one mapping must set is_time_filter_field: true for incremental loading. Mappings without a source field inject constants or computed values. See Ingestion & Feed Providers for the full feed provider configuration.


Engine Adapter System

A plugin-based architecture. Each engine adapter (BigQueryAdapter, SnowflakeAdapter, DuckDBAdapter, TrinoAdapter) defines:

CapabilityBigQuerySnowflakeDuckDB/Trino
incremental_strategyinsert_overwritemergedelete+insert
Partition supportYes (hour)DDL-basedNo
Cluster supportYesDDL-basedNo
source_database()Project IDDatabaseNone
source_schema()DatasetSchemaSchema

All adapters register via @register_adapter and provide macro_definitions() using the {engine}__{macro_name} dbt dispatch convention.


Jinja2 Templates

TemplateRenders
project.yml.j2dbt_project.yml with model materializations, vars, paths
sources.yml.j2Source definitions with column types, tags, meta
vendor_dataprocessing.sql.j2WITH source AS ({{ source(...) }}) SELECT ... with QFSQL field mappings
union_model.sql.j2UNION ALL across vendors, composite cdm_id, incremental filter
profiles.yml.j2Engine-specific connection blocks

Templates use dbt pass-through globals (source(), ref(), config(), is_incremental()) and dispatch macros (to_timestamp(), concat(), cdm_staging_config()) — all rendered as literal {{ ... }} Jinja2 strings that dbt resolves at runtime.