Metadata-driven data infrastructure for quantitative finance
DataInfra is the engine-agnostic, metadata-driven data infrastructure layer of QuantFlow. It provides a unified system for ingesting market data from multiple sources, normalizing it into a Common Data Model (CDM), generating production-ready dbt pipelines, and enforcing data quality controls.
Without a CDM, each dataset requires custom transformation logic: raw tick data, order book snapshots, and reference data arrive in incompatible schemas. DataInfra eliminates this fragmentation by enforcing a unified schema across all sources, so every downstream component operates on consistent, validated inputs.
The central metadata registry that defines and governs all data assets:
One metadata module. All engines. No schema duplication across environments.
External data sources configured declaratively with field mappings and quality tests:
Providers are configured once in YAML. New venues or data sources are a configuration change, not a code change.
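As a rough illustration of what a declarative provider definition with field mappings and quality tests could look like, the sketch below uses a plain Python dict standing in for a parsed YAML file. The key names (`field_mappings`, `tests`, `not_null`), the field names, and both helper functions are assumptions made for this example, not DataInfra's actual schema or API.

```python
from typing import Any

# Hypothetical provider definition, shown as the dict a YAML loader would return.
# Every key and field name here is an illustrative assumption.
PROVIDER = {
    "name": "example_venue",
    "field_mappings": {  # source field -> CDM field
        "ts_event": "event_time",
        "px": "price",
        "sz": "size",
        "sym": "symbol",
    },
    "tests": {"not_null": ["event_time", "symbol", "price"]},
}


def normalize(record: dict[str, Any], provider: dict[str, Any]) -> dict[str, Any]:
    """Rename source fields to CDM names; drop fields that have no mapping."""
    mapping = provider["field_mappings"]
    return {cdm: record[src] for src, cdm in mapping.items() if src in record}


def failed_tests(row: dict[str, Any], provider: dict[str, Any]) -> list[str]:
    """Return the names of the not_null tests that this row fails."""
    return [
        f"not_null:{field}"
        for field in provider["tests"]["not_null"]
        if row.get(field) is None
    ]


raw = {"ts_event": 1700000000, "px": 101.25, "sz": 300, "sym": None, "extra": "x"}
row = normalize(raw, PROVIDER)
print(row)
print(failed_tests(row, PROVIDER))
```

Under this sketch, onboarding another venue means adding another dict (or YAML file) with its own mappings; `normalize` and `failed_tests` stay unchanged.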
| Stage | What It Does | Options |
|---|---|---|
| Reader | Download raw data from external source | DatabentoReader, HTTPReader |
| Processor | Transform and validate | Decompressor (gzip/snappy/zstd) |
| Writer | Persist to target engine | DuckDB, OpenLakehouse, BigQuery, Snowflake, Databricks |
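The three stages in the table above can be sketched as small interchangeable components. This is a minimal sketch using only the Python standard library: `InMemoryReader` and `ListWriter` are hypothetical stand-ins for a real reader and engine writer, the `Reader`/`Processor`/`Writer` protocols are assumed interfaces rather than DataInfra's own, and the newline-delimited JSON payload format is also an assumption.

```python
import gzip
import json
from typing import Protocol


class Reader(Protocol):
    def read(self) -> bytes: ...


class Processor(Protocol):
    def process(self, payload: bytes) -> bytes: ...


class Writer(Protocol):
    def write(self, rows: list[dict]) -> int: ...


class InMemoryReader:
    """Stand-in for a real reader; returns a fixed payload instead of downloading."""

    def __init__(self, payload: bytes) -> None:
        self._payload = payload

    def read(self) -> bytes:
        return self._payload


class GzipDecompressor:
    """Processor that inflates a gzip-compressed payload."""

    def process(self, payload: bytes) -> bytes:
        return gzip.decompress(payload)


class ListWriter:
    """Stand-in for an engine writer; appends rows to a Python list."""

    def __init__(self) -> None:
        self.rows: list[dict] = []

    def write(self, rows: list[dict]) -> int:
        self.rows.extend(rows)
        return len(rows)


def run_pipeline(reader: Reader, processors: list[Processor], writer: Writer) -> int:
    """Read once, apply each processor in order, then persist the parsed rows."""
    payload = reader.read()
    for processor in processors:
        payload = processor.process(payload)
    # Assumption: the processed payload is newline-delimited JSON.
    rows = [json.loads(line) for line in payload.splitlines() if line.strip()]
    return writer.write(rows)


source = gzip.compress(
    b'{"symbol": "ESZ4", "price": 5000.25}\n{"symbol": "NQZ4", "price": 17800.5}\n'
)
writer = ListWriter()
written = run_pipeline(InMemoryReader(source), [GzipDecompressor()], writer)
print(written, writer.rows)
```

Because each stage depends only on a narrow interface, swapping the target engine in this sketch means passing a different writer object, with no change to the reader or processors.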

| Layer | What It Does |
|---|---|
| Staging | Raw source → typed, validated staging tables with field-level tests |
| Intermediate | Cross-source joins, deduplication, enrichment, business logic |
| Mart | Analysis-ready tables for feature computation and downstream consumption |
Auto-generated from CDM definitions. Follows dbt best practices. Zero manual SQL.
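To make the idea of generating models from CDM definitions concrete, here is a minimal sketch that renders a staging-layer `select` from a column list. The `CDM_TRADES` structure, its key names, and `render_staging_model` are hypothetical; the only real dbt construct used is the `source()` Jinja function, and the actual generator may work quite differently.

```python
# Hypothetical CDM entity definition; key names and types are illustrative assumptions.
CDM_TRADES = {
    "entity": "trades",
    "source": "raw_trades",
    "columns": [
        {"name": "event_time", "type": "timestamp", "source_field": "ts_event"},
        {"name": "symbol", "type": "varchar", "source_field": "sym"},
        {"name": "price", "type": "double", "source_field": "px"},
    ],
}


def render_staging_model(cdm: dict) -> str:
    """Render a dbt staging model that casts each source field to its CDM type."""
    select_lines = [
        f"    cast({col['source_field']} as {col['type']}) as {col['name']}"
        for col in cdm["columns"]
    ]
    return (
        "select\n"
        + ",\n".join(select_lines)
        # Quadruple braces emit the literal {{ ... }} that dbt's Jinja expects.
        + f"\nfrom {{{{ source('raw', '{cdm['source']}') }}}}\n"
    )


sql = render_staging_model(CDM_TRADES)
print(sql)
```

The rendered text could be written to a file such as `stg_trades.sql` in a dbt project; the file name and the `'raw'` source name are likewise assumptions for the example.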
OpenLakehouse
DolphinDB
Four-layer validation ensures data integrity at every stage, powered by Elementary for dbt-native monitoring and anomaly detection:
Test failures are reported with row-level diagnostics. Configurable severity: warn, error, or abort. Elementary provides dashboards, alerts, and lineage-aware monitoring out of the box.
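The interaction between row-level diagnostics and severity levels can be sketched as follows. The semantics assumed here (warn only logs, error marks the run as failed, abort stops immediately) are one plausible reading of the three levels, and `CheckResult`, `check_not_null`, `enforce`, and `PipelineAborted` are hypothetical names, not part of DataInfra or Elementary.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    WARN = "warn"
    ERROR = "error"
    ABORT = "abort"


@dataclass
class CheckResult:
    test: str
    severity: Severity
    failing_rows: list[int]  # indices of offending rows, for row-level diagnostics


class PipelineAborted(RuntimeError):
    """Raised when a failing check is configured with abort severity."""


def check_not_null(rows: list[dict], column: str, severity: Severity) -> CheckResult:
    """Collect the index of every row where the column is missing or None."""
    failing = [i for i, row in enumerate(rows) if row.get(column) is None]
    return CheckResult(f"not_null:{column}", severity, failing)


def enforce(results: list[CheckResult]) -> bool:
    """Assumed semantics: warn logs, error fails the run, abort stops at once."""
    passed = True
    for result in results:
        if not result.failing_rows:
            continue
        print(f"[{result.severity.value}] {result.test} failed at rows {result.failing_rows}")
        if result.severity is Severity.ABORT:
            raise PipelineAborted(result.test)
        if result.severity is Severity.ERROR:
            passed = False
    return passed


rows = [
    {"symbol": "ESZ4", "price": 5000.25},
    {"symbol": None, "price": 1.0},
    {"symbol": "NQZ4", "price": None},
]
results = [
    check_not_null(rows, "symbol", Severity.WARN),
    check_not_null(rows, "price", Severity.ERROR),
]
passed = enforce(results)
print("run passed:", passed)
```

In this sketch the warn-level failure is reported but does not affect the outcome, while the error-level failure makes `enforce` return `False`; configuring either check with `Severity.ABORT` would raise `PipelineAborted` at the first failing check.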