Distributed processing for large-scale market data

QuantFlow

Quantitative Data & Feature Platform

Transform raw market data into production-ready machine learning features — reducing effort from months to hours.

Challenges Facing Quantitative Teams

Quant teams don't lack models—they're bottlenecked by data, scale, and repetitive work.

🛠️
01

Most Effort Goes to Non-Alpha Work

Teams spend the majority of their time preparing data, rebuilding pipelines, and recomputing features. This repetitive work generates no alpha, yet it consumes the resources that alpha research needs.

Efficiency
🗄️
02

Data Is Fragmented and Unstandardized

Tick data, order books, and cross-venue feeds arrive in incompatible schemas. Turning raw market data into usable signals is slow, repetitive, and requires custom transformation logic for every source.

Complexity
03

Scale Is Crushing

Because macroscopic alpha is elusive, focus shifts to microscopic data, which is fine-grained but massive across instruments and time. Most tools buckle under the load, forcing trade-offs between data fidelity and runtime in both research and live trading.

Scale
🔄
04

Research and Production Diverge

Features built in research notebooks must be rewritten for production streaming. The two implementations drift—results diverge, bugs go undetected, and every new feature requires a full rewrite cycle.

Consistency
QuantFlow pipeline: DataInfra → MarketState → FeatureDAG → Execution

Pipeline Orchestration

Declarative pipeline definition with automated dependency resolution, scheduling, and monitoring via Dagster.

Dagster pipeline orchestration for QuantFlow

Platform Components

Four components, one pipeline. Each stage communicates through the Common Data Model, so the stages are not tightly coupled.

🏗️

DataInfra

Metadata-driven data infrastructure — ingest, normalize, and validate market data into a unified Common Data Model

  • Multi-source ingestion & normalization
  • Automatic dbt pipeline generation
  • Four-layer data quality enforcement
  • Engine-agnostic — Snowflake, Databricks, BigQuery...
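
As a rough illustration of metadata-driven normalization, the sketch below maps two differently shaped trade records onto one common record type. The `CdmTrade` fields, the `SOURCE_SPECS` table, and the venue names are all hypothetical; the actual Common Data Model and DataInfra's generated dbt pipelines are not shown on this page.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CdmTrade:
    """Hypothetical common trade record; field names are assumptions."""
    venue: str
    symbol: str
    ts: datetime  # always timezone-aware UTC
    price: float
    size: float


# Hypothetical per-source metadata: which raw field feeds each common field,
# and how many timestamp ticks make up one second for that source.
SOURCE_SPECS = {
    "venue_a": {"symbol": "sym", "ts": "t_ms", "price": "px",
                "size": "qty", "ticks_per_second": 1_000},
    "venue_b": {"symbol": "instrument", "ts": "time_ns", "price": "price",
                "size": "amount", "ticks_per_second": 1_000_000_000},
}


def normalize(venue: str, raw: dict) -> CdmTrade:
    """Convert one raw record to the common shape, driven only by metadata."""
    spec = SOURCE_SPECS[venue]
    price = float(raw[spec["price"]])
    size = float(raw[spec["size"]])
    # A single basic quality check; a real pipeline would enforce many more.
    if price <= 0 or size <= 0:
        raise ValueError(f"{venue}: non-positive price or size in {raw!r}")
    seconds = raw[spec["ts"]] / spec["ticks_per_second"]
    ts = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return CdmTrade(venue, str(raw[spec["symbol"]]), ts, price, size)
```

Under this approach, supporting a new source means adding one entry to the metadata table rather than writing new transformation code.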
📊

MarketState

Market structure reconstruction — bars, order books, and supervised labeling from raw CDM market data

  • 11 bar types (fixed + information-driven)
  • Order book snapshot reconstruction
  • Triple barrier & trend scanning labels
  • Single-pass Numba fused kernel
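
For readers unfamiliar with triple barrier labeling, here is a minimal plain-Python sketch of the general technique. The function name, the parameters, and the use of fractional barriers around the entry price are assumptions for illustration; it is not MarketState's Numba kernel and makes no claim about its exact conventions.

```python
from typing import Sequence


def triple_barrier_label(
    prices: Sequence[float],
    start: int,
    up: float,
    down: float,
    horizon: int,
) -> int:
    """Label one observation by whichever barrier is touched first.

    Returns +1 if the upper (profit-taking) barrier is hit first,
    -1 if the lower (stop-loss) barrier is hit first, and 0 if neither
    is hit before the vertical (time) barrier `horizon` steps ahead.
    """
    entry = prices[start]
    upper = entry * (1.0 + up)    # e.g. up=0.02 puts the barrier 2% above entry
    lower = entry * (1.0 - down)  # e.g. down=0.02 puts the barrier 2% below entry
    end = min(start + horizon, len(prices) - 1)
    for i in range(start + 1, end + 1):
        if prices[i] >= upper:
            return 1
        if prices[i] <= lower:
            return -1
    return 0


prices = [100.0, 101.0, 103.0, 99.0, 95.0]
labels = [triple_barrier_label(prices, i, up=0.02, down=0.02, horizon=3)
          for i in range(len(prices) - 1)]
```

A production implementation would typically scale the barriers by a volatility estimate and process all observations in one pass, which is where a compiled kernel helps.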
🧠

FeatureDAG

Compiler-based feature engine: define features in YAML, generate an engine-agnostic DAG, and compile it to batch and streaming execution

  • Formula DSL: ~40 math functions compiled to an optimized DAG
  • 4-stage compiler — AST → IR → Lowering → Execution
  • 133 FeatureTypes across 6 dimensions
  • 50+ compile-time schema contracts that catch errors before execution
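
To make the idea of compiling formulas into a dependency DAG concrete, the sketch below parses arithmetic formulas with Python's standard `ast` module, rejects references to unknown columns before anything runs, and orders the features with `graphlib.TopologicalSorter`. The `FEATURES` mapping and function names are hypothetical, and Python expression syntax stands in for the Formula DSL, whose actual grammar is not described here.

```python
import ast
from graphlib import TopologicalSorter

# Hypothetical feature definitions, as they might look after loading YAML.
FEATURES = {
    "rel_spread": "spread / mid",
    "mid": "(bid + ask) / 2",
    "spread": "ask - bid",
}


def dependencies(formula: str) -> set[str]:
    """Collect every bare name a formula refers to.

    Only arithmetic formulas are handled: a function call such as log(x)
    would also report the name 'log'.
    """
    tree = ast.parse(formula, mode="eval")
    return {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}


def compile_order(features: dict[str, str], inputs: set[str]) -> list[str]:
    """Validate references, then return features in a computable order."""
    known = set(features)
    graph = {}
    for name, formula in features.items():
        deps = dependencies(formula)
        unknown = deps - inputs - known
        if unknown:
            # A simple compile-time contract: fail before any data is touched.
            raise ValueError(f"{name}: unknown columns {sorted(unknown)}")
        graph[name] = deps & known  # edges only between derived features
    # static_order() raises graphlib.CycleError on circular definitions.
    return list(TopologicalSorter(graph).static_order())


order = compile_order(FEATURES, inputs={"bid", "ask"})
```

Here `rel_spread` always comes after `mid` and `spread` in `order`, regardless of the order in which the definitions were written.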

Execution Layer

Dual-backend execution — Polars for batch research, DolphinDB for live streaming. Same feature definitions, two runtimes.

  • Single definition, dual runtime — no duplicate implementations
  • Deploy once, run continuously — Python-free hot path in DolphinDB
  • Mode polymorphism: tick / bar / tick_to_bar
  • Extensible by design — one protocol, any engine
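
The "one protocol, any engine" idea can be sketched with a `typing.Protocol` and two toy backends that execute the same feature definition. Everything here is hypothetical (`RollingMean`, `Engine`, `BatchEngine`, `StreamingEngine`); the toy engines are plain-Python stand-ins and do not call Polars or DolphinDB.

```python
from collections import deque
from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass(frozen=True)
class RollingMean:
    """Hypothetical engine-neutral feature definition."""
    column: str
    window: int


class Engine(Protocol):
    """The single interface every backend is assumed to implement."""
    def run(self, feature: RollingMean, rows: Iterable[dict]) -> list[float]: ...


class BatchEngine:
    """Stand-in for a batch backend: sees the whole dataset at once."""
    def run(self, feature: RollingMean, rows: Iterable[dict]) -> list[float]:
        values = [row[feature.column] for row in rows]
        out = []
        for i in range(len(values)):
            window = values[max(0, i - feature.window + 1): i + 1]
            out.append(sum(window) / len(window))
        return out


class StreamingEngine:
    """Stand-in for a streaming backend: one row at a time, bounded state."""
    def run(self, feature: RollingMean, rows: Iterable[dict]) -> list[float]:
        buf: deque = deque(maxlen=feature.window)
        out = []
        for row in rows:
            buf.append(row[feature.column])  # oldest value drops automatically
            out.append(sum(buf) / len(buf))
        return out


feature = RollingMean(column="price", window=2)
ticks = [{"price": p} for p in (1.0, 2.0, 3.0, 4.0)]
results = {type(e).__name__: e.run(feature, ticks)
           for e in (BatchEngine(), StreamingEngine())}
```

Because both backends consume the same definition, comparing their outputs on recorded data is a cheap check against research and production drift.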

Why QuantFlow

Designed for the realities of production quantitative finance — not just research notebooks.

📈

Handles Real-World Scale

Built for tick-level trades and order book updates across thousands of instruments. Columnar engines for batch. Streaming engines for real-time. One platform for both.

Define Once, Deploy Everywhere

Define data schemas and features in YAML. No DAG wiring. No pipeline orchestration code. No separate batch and streaming implementations. One definition, two runtimes.

🧩

Engine-Agnostic by Design

DataInfra already supports popular data engines — BigQuery, Snowflake, Databricks, DuckDB, and more. Add new execution engines without rewriting pipelines. The IR layer keeps features portable across backends.

Ready to transform your quantitative workflow?

Stop rebuilding infrastructure. Define your data and features once, execute everywhere — from research notebooks to live trading.