Challenges Facing Quantitative Teams
Quant teams don't lack models—they're bottlenecked by data, scale, and repetitive work.
Most Effort Goes to Non-Alpha Work
Teams spend most of their time preparing data, rebuilding pipelines, and recomputing features. None of this repetitive work generates alpha, yet it consumes the time and compute that alpha research needs.
Data Is Fragmented and Unstandardized
Tick data, order books, and cross-venue feeds arrive in incompatible schemas. Turning raw market data into usable signals is slow and repetitive, and it requires custom transformation logic for every source.
Scale Is Crushing
Macroscopic alpha is elusive, so focus shifts to microscopic data: tick-level trades and order book updates that are fine-grained but massive across instruments and time. Most tools buckle under the load, forcing trade-offs between data fidelity and runtime in both research and live trading.
Research and Production Diverge
Features built in research notebooks must be rewritten for production streaming. The two implementations drift—results diverge, bugs go undetected, and every new feature requires a full rewrite cycle.

Pipeline Orchestration
Declarative pipeline definition with automated dependency resolution, scheduling, and monitoring via Dagster.
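As a rough illustration of what automated dependency resolution means here, the sketch below declares stages and their upstream dependencies, then derives a valid run order. It uses Python's standard-library `graphlib` as a stand-in, not Dagster, and the stage names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical stage names. Each stage maps to the set of stages it depends on.
PIPELINE = {
    "ingest": set(),
    "normalize": {"ingest"},
    "bars": {"normalize"},
    "labels": {"bars"},
    "features": {"bars"},
    "training_set": {"features", "labels"},
}


def execution_order(deps):
    """Return one valid run order in which every stage follows its dependencies.

    Raises graphlib.CycleError if the declared dependencies contain a cycle.
    """
    return list(TopologicalSorter(deps).static_order())


order = execution_order(PIPELINE)
```

The point of the declarative form is that nobody writes the ordering by hand: adding a stage means adding one entry, and the order is recomputed.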

Platform Components
Four components, one pipeline. Each stage communicates through the Common Data Model, so no component is tightly coupled to another.
DataInfra
Metadata-driven data infrastructure that ingests, normalizes, and validates market data, then lands it in a unified Common Data Model (CDM)
- Multi-source ingestion & normalization
- Automatic dbt pipeline generation
- Four-layer data quality enforcement
- Engine-agnostic: Snowflake, Databricks, BigQuery, DuckDB, and more
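To make "metadata-driven normalization" concrete, here is a minimal sketch in which a per-source mapping table, rather than per-source code, drives the conversion into one record type. The `Trade` record, the venue names, and every field name are assumptions for illustration; the actual Common Data Model schema is not described in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trade:
    """Hypothetical unified trade record (not the real CDM schema)."""
    venue: str
    symbol: str
    ts_ns: int    # event time in nanoseconds since the Unix epoch
    price: float
    size: float


# Per-source metadata: which raw field feeds each unified field, plus the
# factor that converts the source timestamp unit to nanoseconds.
SOURCE_SPECS = {
    "venue_a": {"symbol": "sym", "ts": "t_ms", "ts_scale": 1_000_000,
                "price": "px", "size": "qty"},
    "venue_b": {"symbol": "instrument", "ts": "timestamp_us", "ts_scale": 1_000,
                "price": "price", "size": "amount"},
}


def normalize(venue: str, raw: dict) -> Trade:
    """Map one raw message to the unified record using only the source's spec."""
    spec = SOURCE_SPECS[venue]
    trade = Trade(
        venue=venue,
        symbol=str(raw[spec["symbol"]]).upper(),
        ts_ns=int(raw[spec["ts"]]) * spec["ts_scale"],
        price=float(raw[spec["price"]]),
        size=float(raw[spec["size"]]),
    )
    # A single basic validation rule; a real quality layer would check far more.
    if trade.price <= 0 or trade.size <= 0:
        raise ValueError(f"non-positive price or size from {venue}: {raw!r}")
    return trade
```

Under this design, onboarding a new source means adding one entry to `SOURCE_SPECS` instead of writing another transformation function.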
MarketState
Market structure reconstruction: bars, order books, and supervised-learning labels built from raw CDM market data
- 11 bar types (fixed + information-driven)
- Order book snapshot reconstruction
- Triple barrier & trend scanning labels
- Single-pass Numba fused kernel
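For readers unfamiliar with triple barrier labeling, the sketch below shows one common convention: starting from an entry price, set a profit-taking barrier above, a stop-loss barrier below, and a time limit, then label by whichever barrier is touched first. This is plain Python for clarity, not the fused Numba kernel mentioned above, and the function name, parameters, and the choice of `0` for a timeout are assumptions.

```python
from typing import Sequence


def triple_barrier_label(
    prices: Sequence[float],
    start: int,
    profit_take: float,
    stop_loss: float,
    max_holding: int,
) -> int:
    """Label one event by the first barrier touched after `start`.

    Returns +1 if the upper (profit-taking) barrier is hit first,
    -1 if the lower (stop-loss) barrier is hit first, and
    0 if neither is hit within `max_holding` bars (vertical barrier).
    `profit_take` and `stop_loss` are fractional moves, e.g. 0.01 for 1%.
    """
    entry = prices[start]
    upper = entry * (1.0 + profit_take)
    lower = entry * (1.0 - stop_loss)
    # The vertical barrier is capped at the end of the available data.
    end = min(start + max_holding, len(prices) - 1)
    for i in range(start + 1, end + 1):
        if prices[i] >= upper:
            return 1
        if prices[i] <= lower:
            return -1
    return 0
```

Other conventions label a timeout by the sign of the return at the vertical barrier instead of `0`; which one fits depends on the downstream model.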
FeatureDAG
Compiler-based feature engine — define features in YAML, generate engine-agnostic DAG, compile to batch and streaming execution
- Formula DSL — ~40 math functions compiled to optimized DAG
- 4-stage compiler — AST → IR → Lowering → Execution
- 133 FeatureTypes across 6 dimensions
- 50+ compile-time schema contracts — catches errors before execution
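The idea of compiling a formula into a DAG can be illustrated with a toy example. The sketch below parses a formula, lowers it into a flat list of nodes in which shared subexpressions are stored once, and evaluates that list in order. It borrows Python's own `ast` parser and assumes a Python-like formula syntax; the node layout, function set, and names are hypothetical and say nothing about how FeatureDAG's compiler is actually built.

```python
import ast
import math
import operator

FUNCS = {"log": math.log, "sqrt": math.sqrt, "abs": abs}
BINOPS = {"Add": operator.add, "Sub": operator.sub,
          "Mult": operator.mul, "Div": operator.truediv}


def compile_formula(formula: str):
    """Lower a formula to (nodes, root): a deduplicated DAG in topological order."""
    nodes = []   # node i is a tuple: (op, payload or operand node ids...)
    index = {}   # structural key -> node id, so equal subexpressions share a node

    def intern(key):
        if key not in index:
            index[key] = len(nodes)
            nodes.append(key)
        return index[key]

    def lower(node):
        # Operands are lowered before their parent, which keeps `nodes` ordered.
        if isinstance(node, ast.Name):
            return intern(("input", node.id))
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return intern(("const", float(node.value)))
        if isinstance(node, ast.BinOp) and type(node.op).__name__ in BINOPS:
            return intern((type(node.op).__name__, lower(node.left), lower(node.right)))
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in FUNCS and not node.keywords):
            return intern(("call", node.func.id, *[lower(a) for a in node.args]))
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    root = lower(ast.parse(formula, mode="eval").body)
    return nodes, root


def evaluate(nodes, root, inputs):
    """Execute the DAG once, computing each node exactly one time."""
    values = []
    for op, *args in nodes:
        if op == "input":
            values.append(inputs[args[0]])
        elif op == "const":
            values.append(args[0])
        elif op == "call":
            values.append(FUNCS[args[0]](*(values[i] for i in args[1:])))
        else:
            values.append(BINOPS[op](values[args[0]], values[args[1]]))
    return values[root]


nodes, root = compile_formula("log(close / open) + log(close / open) * 2")
result = evaluate(nodes, root, {"close": 110.0, "open": 100.0})
```

Because `log(close / open)` appears twice but is stored as one node, it is computed once. Rejecting unsupported syntax at compile time is also a small-scale analogue of catching errors before execution.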
Execution Layer
Dual-backend execution — Polars for batch research, DolphinDB for live streaming. Same feature definitions, two runtimes.
- Single definition, dual runtime — no duplicate implementations
- Deploy once, run continuously — Python-free hot path in DolphinDB
- Mode polymorphism: tick / bar / tick_to_bar
- Extensible by design — one protocol, any engine
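A small sketch can show why a single definition removes research/production drift. Below, one hypothetical feature spec owns the math, and two runtimes that satisfy the same protocol execute it: one over a full history, one tick by tick. Plain Python stands in for both Polars and DolphinDB here, and all class and method names are assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Protocol


@dataclass(frozen=True)
class EmaSpec:
    """Hypothetical feature definition: exponential moving average of price."""
    alpha: float

    def step(self, prev: Optional[float], price: float) -> float:
        # The single source of truth for the feature's math.
        if prev is None:
            return price
        return self.alpha * price + (1.0 - self.alpha) * prev


class Engine(Protocol):
    """Any backend able to execute a spec over a sequence of prices."""
    def run(self, spec: EmaSpec, prices: Iterable[float]) -> Iterable[float]: ...


class BatchEngine:
    """Research-style runtime: consumes the whole history, returns a full column."""
    def run(self, spec: EmaSpec, prices: Iterable[float]) -> list[float]:
        out: list[float] = []
        state: Optional[float] = None
        for price in prices:
            state = spec.step(state, price)
            out.append(state)
        return out


class StreamingEngine:
    """Live-style runtime: lazily emits one value per incoming tick."""
    def run(self, spec: EmaSpec, prices: Iterable[float]) -> Iterator[float]:
        state: Optional[float] = None
        for price in prices:
            state = spec.step(state, price)
            yield state


def run_feature(engine: Engine, spec: EmaSpec, prices: Iterable[float]) -> list[float]:
    """Run a spec on any engine that satisfies the protocol."""
    return list(engine.run(spec, prices))
```

Since neither engine re-implements the formula, the batch and streaming outputs cannot disagree on the math, and adding a third backend only requires another class with a matching `run` method.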
Why QuantFlow
Designed for the realities of production quantitative finance — not just research notebooks.
Handles Real-World Scale
Built for tick-level trades and order book updates across thousands of instruments. Columnar engines for batch. Streaming engines for real-time. One platform for both.
Define Once, Deploy Everywhere
Define data schemas and features in YAML. No DAG wiring. No pipeline orchestration code. No separate batch and streaming implementations. One definition, two runtimes.
Engine-Agnostic by Design
DataInfra already supports popular data engines, including BigQuery, Snowflake, Databricks, and DuckDB. Add new execution engines without rewriting pipelines. The intermediate representation (IR) layer keeps features portable across backends.
Ready to transform your quantitative workflow?
Stop rebuilding infrastructure. Define your data and features once, execute everywhere — from research notebooks to live trading.