Single definition, dual runtime — batch and streaming from the same YAML
The Execution Layer sits below the FeatureDAG compiler. It receives resolved FeatureInstance objects and produces computed feature values at the destination — Arrow tables for batch (research, model training) and shared stream tables for streaming (live trading).
Feature definitions (YAML → FeatureType → FeatureInstance) are identical across both paths. The divergence happens only at the expression-generation layer.
Polars — high-performance DataFrame library built on Apache Arrow and Rust.
Features are computed via with_columns chaining and materialized with a collect() call. Polars' query optimizer applies predicate pushdown, column pruning, and operator fusion across the entire plan.
DolphinDB — high-performance distributed time-series database with built-in stream processing.
Sub-millisecond feature latency. Idempotent deployment with auto-cleanup. Expression folding inlines intermediate steps.
Both backends support three computation modes, configured per feature:
| Mode | When Features Compute | Use Case |
|---|---|---|
| tick | Per-event (every tick) | HFT signals, real-time spread monitoring, execution algorithms |
| bar | Per-bar (OHLCV boundary) | Bar-native strategies — pattern recognition, bar-level momentum |
| tick_to_bar | Tick features → bar aggregate | Microstructure features projected onto a bar grid for ML training |
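The difference between the three modes can be sketched in plain Python. This is a toy model, not the engine's implementation: the tick data, the 60-second bar width, the choice of spread and mid-price range as features, and the mean aggregation are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical ticks: (timestamp in seconds, bid, ask).
TICKS = [
    (0, 10.0, 10.2),
    (30, 10.1, 10.2),
    (60, 10.1, 10.4),
    (90, 10.2, 10.4),
]
BAR_SECONDS = 60  # assumed bar width


def bar_start(ts, bar_seconds=BAR_SECONDS):
    """Map a timestamp to the start of the bar that contains it."""
    return ts // bar_seconds * bar_seconds


def tick_spread(ticks):
    """tick mode: one feature value per event."""
    return [(ts, ask - bid) for ts, bid, ask in ticks]


def bar_range(ticks):
    """bar mode: one value per bar (here, the mid-price range within the bar)."""
    mids = defaultdict(list)
    for ts, bid, ask in ticks:
        mids[bar_start(ts)].append((bid + ask) / 2)
    return {start: max(m) - min(m) for start, m in sorted(mids.items())}


def tick_to_bar(tick_values, agg=mean):
    """tick_to_bar mode: aggregate per-tick feature values onto the bar grid."""
    buckets = defaultdict(list)
    for ts, value in tick_values:
        buckets[bar_start(ts)].append(value)
    return {start: agg(vals) for start, vals in sorted(buckets.items())}


per_tick = tick_spread(TICKS)      # four values, one per tick
per_bar = bar_range(TICKS)         # two values, one per bar
projected = tick_to_bar(per_tick)  # tick-level spread averaged per bar
```

The tick_to_bar output has the same index as the bar output, which is what allows tick-derived features to sit alongside bar-native ones in a training table.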
Both execution paths share the same compilation infrastructure:
A feature's formula is compiled to an IR DAG once. The lowering registry dispatches (Primitive, op) pairs to backend-specific functions via @register. Adding a new engine means registering lowering functions — no IR changes needed.
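A minimal sketch of such a registry is shown below. Only the `@register(Primitive, op, backend=...)` shape comes from the text; the `Primitive` members, the `lower` helper, and the emitted expression strings are hypothetical stand-ins for the real lowering functions.

```python
from enum import Enum
from typing import Callable, Dict, Tuple


class Primitive(Enum):
    # Hypothetical primitive kinds; the real IR may define others.
    ROLLING = "rolling"
    ARITH = "arith"


# (primitive, op, backend) -> lowering function
_LOWERINGS: Dict[Tuple[Primitive, str, str], Callable[..., str]] = {}


def register(primitive: Primitive, op: str, backend: str):
    """Decorator that records a lowering function for one backend."""
    def decorator(fn: Callable[..., str]) -> Callable[..., str]:
        _LOWERINGS[(primitive, op, backend)] = fn
        return fn
    return decorator


def lower(primitive: Primitive, op: str, backend: str, *args) -> str:
    """Dispatch an IR node to the lowering registered for the given backend."""
    key = (primitive, op, backend)
    if key not in _LOWERINGS:
        raise NotImplementedError(f"no lowering registered for {key}")
    return _LOWERINGS[key](*args)


@register(Primitive.ROLLING, "mean", backend="polars")
def _rolling_mean_polars(column: str, window: int) -> str:
    # Emits a Polars expression as text, for illustration only.
    return f'pl.col("{column}").rolling_mean({window})'


@register(Primitive.ROLLING, "mean", backend="dolphindb")
def _rolling_mean_dolphindb(column: str, window: int) -> str:
    # Emits a DolphinDB moving-average call as text, for illustration only.
    return f"mavg({column}, {window})"


polars_expr = lower(Primitive.ROLLING, "mean", "polars", "price", 20)
ddb_expr = lower(Primitive.ROLLING, "mean", "dolphindb", "price", 20)
```

In this sketch, supporting a third engine amounts to adding more decorated functions; the `lower` call sites and the IR node definitions stay untouched.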
DolphinDB is a high-performance distributed time-series database and streaming compute engine purpose-built for financial data. It combines a columnar storage engine, a vectorized computation runtime, and a built-in pub/sub streaming framework — all in a single platform.
| Capability | What It Enables |
|---|---|
| Shared Stream Tables | Zero-copy pub/sub between pipeline stages. Upstream writes, downstream subscribes — no serialization, no broker. |
| Reactive State Engine | Built-in streaming engine with windowed aggregations and stateful computations. Deploy via declarative script, no JVM. |
| Vectorized Execution | Columnar operations run at C++ speed on entire batches. SIMD-optimized and memory-contiguous. |
| In-Process Deployment | All engines, tables, and subscriptions live in a single DolphinDB process. No Python, no serialization in the hot path. |
| Idempotent Deployment | Every engine deployment script begins with try/catch cleanup. Safe to re-deploy after crashes or config changes. |
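The idempotent deployment row can be illustrated with a small script generator. This is a sketch under assumptions: `build_deploy_script` is a hypothetical helper, the engine name and the create statement are made up, and the generated DolphinDB script has not been run against a live server. It relies only on DolphinDB's `try { ... } catch(ex) { ... }` syntax and the `dropStreamEngine` function.

```python
def build_deploy_script(engine_name: str, create_stmt: str) -> str:
    """Return a DolphinDB script that drops any stale engine before creating it.

    The cleanup is wrapped in try/catch so the script also succeeds on a
    fresh server where the engine does not exist yet.
    """
    cleanup = f'try {{ dropStreamEngine("{engine_name}") }} catch(ex) {{ }}'
    return "\n".join([cleanup, create_stmt])


# Hypothetical engine definition; table and column names are illustrative.
CREATE_STMT = (
    'createReactiveStateEngine(name="spread_engine", metrics=<[ask - bid]>, '
    'dummyTable=ticks, outputTable=features, keyColumn="symbol")'
)

script = build_deploy_script("spread_engine", CREATE_STMT)
```

Running the same script twice leaves the server in the same state as running it once, which is the property that makes re-deployment after a crash or config change safe.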
| Aspect | Batch (Polars) | Streaming (DolphinDB) |
|---|---|---|
| Runtime | Python process (single machine) | DolphinDB cluster (distributed) |
| Data model | Static Arrow tables | Unbounded stream tables |
| Trigger | Explicit run() call | Continuous — data arrival triggers computation |
| Feature grouping | Sequential per-feature | Engines consolidated by input table (~60% fewer engines) |
| Intermediate state | In-memory LazyFrame columns | Shared stream tables between engines |
| Error handling | Feature-level try/except, skip on failure | Engine-level monitoring, re-deployment |
| Primary use | Research, batch scoring | Live trading, real-time signal generation |
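The consolidation in the feature grouping row can be pictured as a group-by on the input table: features that read the same stream share one engine instead of each getting their own. The sketch below is a hypothetical illustration; `FeatureSpec`, `consolidate`, and the feature and table names are not part of the real codebase, and the example makes no claim about the actual reduction ratio.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class FeatureSpec:
    # Hypothetical stand-in for a resolved FeatureInstance.
    name: str
    input_table: str


def consolidate(features: Iterable[FeatureSpec]) -> Dict[str, List[str]]:
    """Group feature names by input table; each group maps to one engine."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for feature in features:
        groups[feature.input_table].append(feature.name)
    return dict(groups)


FEATURES = [
    FeatureSpec("spread", "quotes"),
    FeatureSpec("mid_price", "quotes"),
    FeatureSpec("quote_imbalance", "quotes"),
    FeatureSpec("trade_volume", "trades"),
    FeatureSpec("vwap", "trades"),
]

engines = consolidate(FEATURES)  # five features, two engines
```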
Both paths consume the same FeatureInstance objects — single definition, dual runtime.
Dispatch is decorator-based, via @register(Primitive, op, backend=...): same IR, any backend.