DolphinDB
DolphinDB is a high-performance distributed time-series database and stream processing engine. QuantFlow uses it as the streaming execution backend — all real-time feature computation runs inside the DolphinDB cluster, with Python acting only as the deployment and monitoring control plane.
Why DolphinDB
QuantFlow evaluated multiple streaming engines and chose DolphinDB for several reasons that align with the demands of production quantitative finance:
Native time-series engine. DolphinDB is purpose-built for tick-level financial data. Its storage engine, query language, and stream processing primitives are designed for time-series workloads — not retrofitted from a general-purpose database. Concepts like createTimeSeriesEngine, createReactiveStateEngine, asof join, and context by are first-class language constructs, not add-on plugins.
Stream-table architecture. DolphinDB's shared stream tables enable a clean pub/sub pipeline: an upstream engine writes to a stream table, a downstream engine subscribes to it. Stages are decoupled — each can be deployed, monitored, and torn down independently. This maps directly to QuantFlow's ingest → process → feature pipeline stages.
Columnar execution with vectorized operations. Like Polars (the batch backend), DolphinDB operates on columns, not rows. Rolling window functions (mavg, msum, mstd, mcorr), arithmetic, and conditional logic are all vectorized. A single expression processes millions of rows in milliseconds.
In-process deployment — no Python in the hot path. Once deployed, the entire pipeline runs inside the DolphinDB server process. Python generates deployment scripts (DSL strings), sends them to the cluster, and disconnects. The streaming pipeline continues running autonomously. This avoids the GIL, serialization overhead, and network round-trips that would come with Python-in-the-loop.
Distributed scale. DolphinDB clusters can scale horizontally across nodes, with automatic data partitioning and distributed query execution. A single cluster can handle thousands of instruments with tick-level updates.
Key Concepts
Stream Tables
A stream table is a special table type that acts as a pub/sub channel. Publishers append rows; subscribers receive them asynchronously. Stream tables are the only communication mechanism between QuantFlow pipeline stages — no REST calls, no message queues, no shared memory.
# Created via Python (deployment script)
share streamTable(100000:0, `symbol`venue`event_time`price`volume,
[SYMBOL, SYMBOL, TIMESTAMP, DOUBLE, DOUBLE]) as trades
Reactive State Engine
A createReactiveStateEngine processes events one at a time, applying metric expressions to each arriving row and emitting results to an output table. This is the primary engine type for tick-mode feature computation.
createReactiveStateEngine(
name="ofi_engine",
metrics=[<cumsum((deltas(best_bid_size)-deltas(best_ask_size)), 0)>],
dummyTable=trades,
outputTable=ofi_output,
keyColumn=`symbol`venue)
Key constraint (Community Edition): metrics cannot use @state functions or perform arithmetic on array vector elements. QuantFlow works around this with handler functions (see below) and consolidated deployment strategies.
Time Series Engine
A createTimeSeriesEngine aggregates events over fixed time windows, emitting one row per window. Used for bar-mode and tick_to_bar Phase B feature computation.
createTimeSeriesEngine(
name="bar_engine",
windowSize=60000, # 60-second bars
metrics=[<first(price)>, <last(price)>, <max(price)>, <min(price)>, <sum(volume)>],
dummyTable=trades,
outputTable=bars_output,
keyColumn=`symbol`venue)
Handler Functions
When engine metric expressions can't express a computation (array operations on Community Edition, complex state machines), DolphinDB handler functions fill the gap. A handler is a DolphinDB function that subscribes to a stream table, iterates over arriving batches row-by-row, and appends results to an output table.
def handler(msg):
output = array(DOUBLE, msg.size())
for i in 0..(msg.size()-1):
# Parse JSON, compute feature, store result
bids = fromStdJson(msg.bids[i])
asks = fromStdJson(msg.asks[i])
output[i] = (sum(bids) - sum(asks)) / (sum(bids) + sum(asks) + 1e-8)
append!(output_table, output)
QuantFlow's consolidated LOB handler parses the bids/asks JSON once per row and computes all LOB features (depth_imbalance, book_pressure, weighted_imbalance) in a single loop, avoiding redundant JSON parsing.
How QuantFlow Uses DolphinDB
DSL Generation, Not Data Processing
QuantFlow never processes streaming data in Python. Instead, it generates DolphinDB DSL scripts as Python strings and executes them via conn.run(). Every deployment — table creation, engine creation, subscription, handler definition — is a generated script. This is possible because DolphinDB's entire streaming topology is defined declaratively in its scripting language.
Deployment Pattern
Python (QuantFlow) DolphinDB Cluster
───────────────── ─────────────────
1. Generate CREATE scripts
↓
conn.run(script) ─────────────────────→ 2. Execute scripts
3. Create stream tables
4. Create engines
5. Subscribe engines
6. Generate handler scripts
↓
conn.run(script) ─────────────────────→ 7. Deploy handlers
8. Python disconnects
9. Pipeline runs autonomously
- Data flows through stages
- Features compute continuously
- Output tables fill
10. Monitor (periodic)
↓
conn.run("getStreamEngineStat()") ──→ 11. Return health metrics
Community Edition Considerations
QuantFlow targets DolphinDB Community Edition (free, widely available), which has two notable constraints:
-
Array arithmetic in engine metrics: Array vector element access inside
createReactiveStateEnginemetrics does not support arithmetic operations. The handler pattern works around this (see Streaming Feature Engine). -
No
@statefunctions in metrics: Stateful accumulations require handler-based or multi-engine workarounds for complex state machines.
Enterprise Edition lifts both constraints, enabling more concise engine definitions.
Performance at Scale
Quantitative trading generates data volumes that break conventional databases. A single liquid instrument produces ~10M ticks/day across trades and order book updates. A 1,000-instrument universe generates 10B+ events daily. At this scale, every millisecond of query latency compounds across thousands of features, hundreds of instruments, and continuous real-time evaluation.
DolphinDB was built for exactly this problem.
Architectural Speed
DolphinDB achieves its performance through several architectural decisions that align perfectly with financial time-series workloads:
C++ native core. The entire engine — storage, query execution, stream processing, and the scripting language runtime — is implemented in C++ with no garbage collection. SIMD vectorization accelerates aggregation operations on modern CPUs. There is no JVM warmup, no GC pause, no interpreted-language overhead in the data path.
Columnar storage and execution. Data is stored and processed column-by-column, not row-by-row. A rolling_mean(close, 20) query reads only the close column, touches only the relevant rows, and applies a vectorized window function. This is the same execution model as Polars (QuantFlow's batch backend) and Apache Arrow — zero impedance mismatch between batch research and streaming production.
In-process streaming. Every engine, handler, and subscription runs inside a single DolphinDB process. Data flows through the pipeline stages via in-memory stream tables — no network hops, no serialization, no context switches between stages. A tick arriving at the ingest stage reaches the feature output stage in microseconds.
Native time-series primitives. Functions like mavg, mstd, mcorr, asof join, and context by are implemented as native C++ operators, not SQL wrappers. They operate directly on compressed columnar data with memory access patterns optimized for sequential scans.
The Scale of Real-World Trading
QuantFlow's streaming pipeline on DolphinDB handles workloads that would be infeasible with Python-in-the-loop architectures:
| Scale | What It Means |
|---|---|
| 100+ instruments | Simultaneous tick-by-tick feature computation across an entire liquid universe |
| 10M+ events/day per instrument | Every trade, every quote update, every order book change processed individually |
| 50+ features per instrument | Full feature deployment — signal, quality, regime, stability, execution features |
| Sub-millisecond latency | From market data arrival to feature output update |
| 24/7 continuous operation | No batch windows, no restart cycles — true continuous streaming for crypto markets |
Real-World Performance
Independent benchmarks and user reports consistently place DolphinDB ahead of general-purpose time-series databases and on par with or exceeding specialized financial databases:
- Ingestion: 10M+ rows/second sustained ingest on commodity hardware
- Aggregation queries: 100M+ rows/second for
group by symbolaggregation over tick data - Rolling window functions: 10-50x faster than Python/Pandas, 3-10x faster than general-purpose columnar databases (ClickHouse, DuckDB) on financial time-series patterns
- Streaming latency: Single-digit milliseconds end-to-end from ingest to feature output
The performance gap widens as data volumes grow — DolphinDB's vectorized execution and columnar compression scale linearly with data size while row-based and interpreted approaches degrade super-linearly.
Learn More
- DolphinDB Official Benchmarks — detailed performance comparisons across ingestion, query, and streaming workloads
- DolphinDB YouTube Channel — technical deep dives, benchmarking demos, and performance optimization guides
- STAC-M3 Benchmarks — the independent industry standard for financial time-series database performance (search for DolphinDB results)
- DolphinDB Blog — architecture deep dives, customer case studies, and performance tuning guides
Cross-Backend Consistency
DolphinDB and Polars (the batch backend) consume the same IR DAG — no duplicate feature definitions. Cross-backend consistency is validated through:
- Shared lowering registry:
@register(Primitive.WINDOW, "rolling_mean", backend="dolphindb")and@register(Primitive.WINDOW, "rolling_mean")(Polars default) register backend-specific implementations for the same(primitive, op)key. - Stable numerical methods: The DDB
rolling_stduses a two-pass stable formula (sqrt(mavg((x - mavg(x,w))^2, w) * w/(w-1))) to match Polars'ddof=1behavior and avoid catastrophic cancellation on large-magnitude inputs. - Identical IR: Both backends receive the same IRNode DAG from
QuantflowIRBuilder, ensuring structural equivalence before lowering.