Type System
The type system defines the vocabulary for declaring features: 6 computation primitives, formula-based FeatureTypes, the template→instance lifecycle, typed parameter validation, and registries for loading YAML specs.
Computation Primitives
Every IR node is classified into one of 6 primitives. The primitive determines the execution model — how data flows through the node.
| Primitive | Role | Execution Model | Example |
|---|---|---|---|
| SOURCE | Data input boundary | Reads from external CDM table | cdm_trade_enriched, cdm_time_bars |
| TRANSFORM | Single-row stateless | Each output row depends only on current input row | a + b, sign(x), abs(x), sqrt(x) |
| WINDOW | Fixed-size trailing window | Each output row depends on last N input rows | rolling_mean(close, 20), diff(price, 1) |
| STATE | Cumulative across all prior rows | Each output row depends on entire prior history | ema(price, 0.94), cumsum(ofi, 0) |
| EVENT | Event-driven trigger | Computation triggered by barrier/condition (reserved) | Threshold crossing, barrier hit |
| SINK | Data output boundary | Marks final output column | export |
Why these six? The feature computations FeatureDAG targets each fit one of these patterns. TRANSFORM handles arithmetic and scalar math (fast, parallelizable). WINDOW handles rolling aggregations (fixed memory, O(N) per row for an N-row window). STATE handles unbounded accumulators (expensive but necessary for EMA and cumulative sums). EVENT is reserved for future event-driven triggers. SOURCE and SINK demarcate the boundaries: where data enters and exits the computation graph.
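To make the execution models concrete, here is a minimal pure-Python sketch of how TRANSFORM, WINDOW, and STATE differ in what each output row may depend on. The `Primitive` enum and the `run_*` helpers are hypothetical illustrations, not FeatureDAG's classes, and the EMA convention in `run_state` (decay weights the previous value) is an assumption.

```python
from enum import Enum
from typing import Callable, List, Optional


class Primitive(Enum):
    # Illustrative enum mirroring the table; not the library's actual class.
    SOURCE = "source"
    TRANSFORM = "transform"
    WINDOW = "window"
    STATE = "state"
    EVENT = "event"
    SINK = "sink"


def run_transform(xs: List[float], fn: Callable[[float], float]) -> List[float]:
    # TRANSFORM: output row i depends only on input row i, so rows are independent.
    return [fn(x) for x in xs]


def run_window(xs: List[float], n: int) -> List[Optional[float]]:
    # WINDOW: output row i depends on the trailing n input rows (a rolling mean here).
    # Rows before the window is full have no value and are emitted as None.
    if n < 1:
        raise ValueError("window must be >= 1")
    return [None if i + 1 < n else sum(xs[i + 1 - n : i + 1]) / n for i in range(len(xs))]


def run_state(xs: List[float], decay: float) -> List[float]:
    # STATE: output row i depends on every prior row through a carried accumulator.
    # Assumed EMA convention: decay weights the previous value, (1 - decay) the new one.
    out: List[float] = []
    acc: Optional[float] = None
    for x in xs:
        acc = x if acc is None else decay * acc + (1 - decay) * x
        out.append(acc)
    return out
```

For example, `run_window([1.0, 2.0, 3.0, 4.0], 2)` gives `[None, 1.5, 2.5, 3.5]`, while `run_state([1.0, 2.0, 3.0, 4.0], 0.5)` gives `[1.0, 1.5, 2.25, 3.125]`: the STATE output at the last row still carries a contribution from the first input, which no fixed window can.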
FeatureType (Template / Blueprint)
A FeatureType is a parameterized feature definition — a blueprint, not a concrete computation. It declares:
- Inputs — CDM data sources or upstream feature columns
- Parameters — typed, validated, with defaults and constraints
- Formula — a mathematical expression compiled to an IR DAG by the AST compiler
- Output column — what this feature produces
Formula-Based Definition (Preferred)
```yaml
name: ofi
description: Order Flow Imbalance - measures net order flow
category: order_flow
dimension: signal
version: v0.9.0 (Beta)
required_inputs:
  - best_bid_size
  - best_ask_size
output_column: ofi
parameters:
  window:
    type: integer
    description: Lookback window for differences
    default: 1
    constraints:
      min: 1
      max: 100
  decay:
    type: float
    description: Decay factor for EMA smoothing
    default: 0.95
    constraints:
      min: 0.0
      max: 1.0
formula: "cumsum(diff(best_bid_size, $window) - diff(best_ask_size, $window), 0)"
```
The formula: field compiles through Python's ast module into IRNodes — no manual DAG wiring needed. The compiler automatically extracts column references, resolves $parameter placeholders, constructs the dependency graph, and produces a validated IR DAG ready for lowering.
$placeholder Resolution
In a formula, bare names refer to CDM columns. Names prefixed with $ refer to configurable parameters. Resolution happens at compile time:
- best_bid_size → column reference (TRANSFORM.col)
- $window → parameter value (e.g., 5) → literal (TRANSFORM.lit)
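The compilation steps described above (parse with `ast`, classify bare names as columns, resolve `$` parameters to literals, build a dependency-ordered graph) can be sketched in a few dozen lines. This is a minimal illustration, not the AST compiler's code: the `__param_` mangling is an assumption (a bare `$` is not valid Python syntax, so some rewrite must happen before `ast.parse`), and the `(node_id, kind, payload, input_ids)` tuple stands in for the real IRNode.

```python
import ast
import re
from typing import Any, Dict, List, Tuple

# Hypothetical mangling: "$" is not valid Python syntax, so "$name" is rewritten to an
# identifier before ast.parse and recognised again during the walk.
PARAM_PREFIX = "__param_"
BINOPS = {ast.Add: "add", ast.Sub: "sub", ast.Mult: "mul", ast.Div: "div"}

Node = Tuple[int, str, Any, Tuple[int, ...]]  # (node_id, kind, payload, input_ids)


def compile_formula(formula: str, params: Dict[str, Any]) -> List[Node]:
    """Compile a formula into a flat list of IR-like nodes in topological order."""
    source = re.sub(r"\$([A-Za-z_]\w*)", PARAM_PREFIX + r"\1", formula)
    tree = ast.parse(source, mode="eval")
    nodes: List[Node] = []

    def emit(kind: str, payload: Any, inputs: Tuple[int, ...] = ()) -> int:
        nodes.append((len(nodes), kind, payload, inputs))
        return len(nodes) - 1

    def visit(node: ast.AST) -> int:
        # Post-order walk: inputs are emitted before the node that consumes them,
        # so node ids already form a valid execution order for the DAG.
        if isinstance(node, ast.Name):
            if node.id.startswith(PARAM_PREFIX):
                name = node.id[len(PARAM_PREFIX):]
                if name not in params:
                    raise KeyError(f"unresolved parameter ${name}")
                return emit("lit", params[name])  # $param -> literal
            return emit("col", node.id)  # bare name -> column reference
        if isinstance(node, ast.Constant):
            return emit("lit", node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in BINOPS:
            left, right = visit(node.left), visit(node.right)
            return emit("binop", BINOPS[type(node.op)], (left, right))
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            args = tuple(visit(arg) for arg in node.args)
            return emit("call", node.func.id, args)
        raise SyntaxError(f"unsupported expression: {ast.dump(node)}")

    visit(tree.body)
    return nodes
```

Compiling the OFI formula with `{"window": 5}` yields nine nodes ending in `(8, "call", "cumsum", (6, 7))`, with `best_bid_size` and `best_ask_size` as the only column nodes. The sketch does no common-subexpression elimination, so the literal 5 appears twice; a production compiler might deduplicate it.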
Steps-Based Definition (Legacy)
The older steps: format is still supported for backward compatibility:
```yaml
steps:
  - primitive: TRANSFORM
    op: sub
    params: { a: best_bid_size, b: best_ask_size }
  - primitive: WINDOW
    op: diff
    params: { input: bid_ask_diff, window: $window }
  - primitive: STATE
    op: cumsum
    params: { input: diff_result, init: 0 }
```
All new FeatureTypes use the formula format. See the AST Compiler for the full formula language reference.
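The legacy steps and the OFI formula order the operations differently (subtract then diff, versus diff then subtract), but because `diff` is linear they compute the same series. The sketch below checks this on toy data; the list-based `sub`, `diff`, and `cumsum` helpers and their null policy (nulls propagate through `sub`/`diff`, and `cumsum` skips them) are assumptions for illustration, not the library's kernels.

```python
from typing import List, Optional

Series = List[Optional[float]]


def sub(a: Series, b: Series) -> Series:
    # TRANSFORM: element-wise a - b; a null on either side stays null.
    return [None if x is None or y is None else x - y for x, y in zip(a, b)]


def diff(x: Series, window: int) -> Series:
    # WINDOW: x[i] - x[i - window]; rows without a full lag (or with null inputs) are null.
    out: Series = []
    for i, v in enumerate(x):
        prev = x[i - window] if i >= window else None
        out.append(None if v is None or prev is None else v - prev)
    return out


def cumsum(x: Series, init: float) -> Series:
    # STATE: running total starting at init; nulls are skipped (assumed null policy).
    out: Series = []
    acc = init
    for v in x:
        if v is not None:
            acc += v
        out.append(acc)
    return out


def legacy_steps(bid: Series, ask: Series, window: int) -> Series:
    bid_ask_diff = sub(bid, ask)              # step 1: TRANSFORM sub
    diff_result = diff(bid_ask_diff, window)  # step 2: WINDOW diff
    return cumsum(diff_result, 0)             # step 3: STATE cumsum


def formula_pipeline(bid: Series, ask: Series, window: int) -> Series:
    # cumsum(diff(best_bid_size, $window) - diff(best_ask_size, $window), 0)
    return cumsum(sub(diff(bid, window), diff(ask, window)), 0)
```

With `bid = [10.0, 12.0, 11.0, 15.0, 14.0]`, `ask = [8.0, 9.0, 13.0, 10.0, 12.0]`, and `window = 1`, both functions return `[0, 1.0, -4.0, 3.0, 0.0]`.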
ComputationStep (Internal IR Bridge)
ComputationStep is an internal representation that bridges FeatureType definitions and the IR DAG. When a formula is compiled, the resulting IRNodes are converted to ComputationSteps for backward compatibility with existing pipeline infrastructure. When a FeatureType is instantiated with concrete parameters, each ComputationStep resolves its $placeholders and becomes part of the IR DAG.
Most users never interact with ComputationStep directly — it's an implementation detail of the formula → AST → IR compilation pipeline. The public interface is the formula: field in the FeatureType YAML.
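As an illustration of the instantiation step described above, the sketch below models a step as an immutable record whose `$placeholder` parameters are substituted to produce a resolved copy. The class name, the `primitive`/`op`/`params` fields (borrowed from the legacy steps format), and the `resolve` method are hypothetical; the real ComputationStep's fields and API may differ.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Dict


@dataclass(frozen=True)
class ComputationStep:
    """Illustrative stand-in for the internal ComputationStep (field names are assumptions)."""

    primitive: str  # e.g. "WINDOW"
    op: str  # e.g. "diff"
    params: Dict[str, Any] = field(default_factory=dict)

    def resolve(self, values: Dict[str, Any]) -> "ComputationStep":
        """Return a copy with every "$name" string replaced by values[name]."""
        resolved: Dict[str, Any] = {}
        for key, raw in self.params.items():
            if isinstance(raw, str) and raw.startswith("$"):
                name = raw[1:]
                if name not in values:
                    raise KeyError(f"no value for placeholder {raw!r} in step {self.op!r}")
                resolved[key] = values[name]
            else:
                resolved[key] = raw  # column names and literals pass through unchanged
        return replace(self, params=resolved)
```

Returning a new object rather than mutating in place keeps the template reusable: the same unresolved step can be resolved once per FeatureInstance.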
FeatureInstance (Concrete Feature)
A FeatureInstance is a fully resolved feature, ready to execute. It binds a FeatureType template to specific parameter values, data sources, and runtime context:
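```python
# Assumes FeatureInstance and the rolling_zscore FeatureType are already in scope.
close_zscore_20 = FeatureInstance(
    name="close_zscore_20",
    feature_type=rolling_zscore,  # Reference to FeatureType template
    params={"column": "close", "window": 20},
    sources=["cdm_time_bars"],  # Concrete data sources
)
```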
Template → Instance separation is a core design decision. One FeatureType (e.g., rolling_zscore) can produce many FeatureInstance objects — close_zscore_20, volume_zscore_60, etc. — each with different parameterizations. This avoids code duplication and ensures consistent semantics across parameterizations.
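The template-to-instance relationship can be sketched with two small dataclasses: the template holds parameter defaults, and `instantiate` merges explicit values over them and rejects unknown keys. `FeatureTypeSketch`, `FeatureInstanceSketch`, and `instantiate` are hypothetical names for illustration only; the real classes also carry inputs, formulas, and validation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(frozen=True)
class FeatureTypeSketch:
    name: str
    defaults: Dict[str, Any] = field(default_factory=dict)

    def instantiate(self, name: str, sources: List[str], **params: Any) -> "FeatureInstanceSketch":
        # Only parameters the template declares are allowed.
        unknown = set(params) - set(self.defaults)
        if unknown:
            raise ValueError(f"{self.name}: unknown parameters {sorted(unknown)}")
        merged = {**self.defaults, **params}  # explicit values override template defaults
        return FeatureInstanceSketch(name=name, feature_type=self, params=merged, sources=list(sources))


@dataclass(frozen=True)
class FeatureInstanceSketch:
    name: str
    feature_type: FeatureTypeSketch
    params: Dict[str, Any]
    sources: List[str]
```

One template then yields many instances that share the same definition:

```python
rolling_zscore = FeatureTypeSketch("rolling_zscore", defaults={"column": "close", "window": 20})
close_z = rolling_zscore.instantiate("close_zscore_20", ["cdm_time_bars"], column="close", window=20)
volume_z = rolling_zscore.instantiate("volume_zscore_60", ["cdm_time_bars"], column="volume", window=60)
```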
ParameterDef (Typed Parameters)
Parameters are validated via ParameterDef — a type-safe parameter definition with 7 dtype categories:
| Dtype | Constraints | Example |
|---|---|---|
| INTEGER | min, max | window size, lag, horizon |
| FLOAT | min, max | decay factor, threshold, epsilon |
| STRING | pattern (regex) | bar identifier, column name |
| BOOLEAN | — | feature flags, toggles |
| COLUMN | must be valid identifier | column name references |
| LIST | — | list of values |
| DICT | — | key-value parameter maps |
Each parameter in a FeatureType YAML definition is validated at parse time against its ParameterDef. Invalid values (e.g., window: "abc" for an INTEGER parameter) fail immediately during project loading, not mid-computation.
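A stdlib-only sketch of the checks in the table above follows. The document states that the real validation goes through Pydantic; `ParamDefSketch`, its `min_value`/`max_value` field names, and the rule that FLOAT also accepts integers are assumptions made to keep the example dependency-free.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class Dtype(Enum):
    INTEGER = "integer"
    FLOAT = "float"
    STRING = "string"
    BOOLEAN = "boolean"
    COLUMN = "column"
    LIST = "list"
    DICT = "dict"


# Python types accepted for each dtype; FLOAT also accepts ints such as "decay: 1".
PY_TYPES = {
    Dtype.INTEGER: (int,),
    Dtype.FLOAT: (int, float),
    Dtype.STRING: (str,),
    Dtype.BOOLEAN: (bool,),
    Dtype.COLUMN: (str,),
    Dtype.LIST: (list,),
    Dtype.DICT: (dict,),
}


@dataclass(frozen=True)
class ParamDefSketch:
    name: str
    dtype: Dtype
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    pattern: Optional[str] = None

    def validate(self, value: Any) -> Any:
        # bool is a subclass of int in Python, so reject it explicitly for non-BOOLEAN dtypes.
        if isinstance(value, bool) and self.dtype is not Dtype.BOOLEAN:
            raise TypeError(f"{self.name}: expected {self.dtype.value}, got bool")
        if not isinstance(value, PY_TYPES[self.dtype]):
            raise TypeError(f"{self.name}: expected {self.dtype.value}, got {type(value).__name__}")
        if self.min_value is not None and value < self.min_value:
            raise ValueError(f"{self.name}: {value!r} < min {self.min_value}")
        if self.max_value is not None and value > self.max_value:
            raise ValueError(f"{self.name}: {value!r} > max {self.max_value}")
        if self.dtype is Dtype.STRING and self.pattern and not re.fullmatch(self.pattern, value):
            raise ValueError(f"{self.name}: {value!r} does not match {self.pattern!r}")
        if self.dtype is Dtype.COLUMN and not value.isidentifier():
            raise ValueError(f"{self.name}: {value!r} is not a valid identifier")
        return value
```

With `ParamDefSketch("window", Dtype.INTEGER, min_value=1, max_value=100)`, `validate(5)` returns 5, `validate("abc")` raises `TypeError`, and `validate(0)` raises `ValueError`, so a bad YAML value is rejected at load time rather than deep inside a computation.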
TypeResolver
The TypeResolver maps canonical dtypes (STRING, INT, FLOAT, DECIMAL, TIMESTAMP) to backend-specific runtime types. Since FeatureDAG targets multiple backends (Polars, DolphinDB), the same canonical type must resolve correctly in each context:
| Canonical | Polars | DolphinDB |
|---|---|---|
| STRING | pl.Utf8 | STRING |
| INT | pl.Int64 | LONG |
| FLOAT | pl.Float64 | DOUBLE |
| DECIMAL | pl.Decimal | DECIMAL64 |
| TIMESTAMP | pl.Datetime | TIMESTAMP |
Type resolution is part of the lowering phase — canonical types flow through the IR unchanged, and the lowering registry translates them at expression-generation time.
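The lowering-time lookup can be sketched as a two-level table keyed by backend and canonical type. `Canonical`, `TypeResolverSketch`, and the backend keys `"polars"`/`"dolphindb"` are hypothetical names; backend types are kept as strings copied from the table above so the sketch runs without Polars or DolphinDB installed.

```python
from enum import Enum
from typing import Dict, Optional


class Canonical(Enum):
    STRING = "STRING"
    INT = "INT"
    FLOAT = "FLOAT"
    DECIMAL = "DECIMAL"
    TIMESTAMP = "TIMESTAMP"


# Backend type names from the table above, kept as strings so the sketch runs without
# Polars or DolphinDB installed. A real resolver would return pl.Int64 etc. directly.
DEFAULT_TABLE: Dict[str, Dict[Canonical, str]] = {
    "polars": {
        Canonical.STRING: "pl.Utf8",
        Canonical.INT: "pl.Int64",
        Canonical.FLOAT: "pl.Float64",
        Canonical.DECIMAL: "pl.Decimal",
        Canonical.TIMESTAMP: "pl.Datetime",
    },
    "dolphindb": {
        Canonical.STRING: "STRING",
        Canonical.INT: "LONG",
        Canonical.FLOAT: "DOUBLE",
        Canonical.DECIMAL: "DECIMAL64",
        Canonical.TIMESTAMP: "TIMESTAMP",
    },
}


class TypeResolverSketch:
    def __init__(self, table: Optional[Dict[str, Dict[Canonical, str]]] = None) -> None:
        self._table = table if table is not None else DEFAULT_TABLE

    def resolve(self, dtype: Canonical, backend: str) -> str:
        # Called only at lowering time; the IR itself carries the Canonical value unchanged.
        try:
            return self._table[backend][dtype]
        except KeyError:
            raise ValueError(f"no {backend!r} type registered for {dtype.name}") from None
```

Keeping the canonical type in the IR and resolving it only here means adding a backend is one new table entry, with no change to the compiler.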
Registries
Two registries load definitions from YAML/JSON into validated Python objects at project startup:
| Registry | Loads | Keys |
|---|---|---|
| FeatureTypeRegistry | FeatureType definitions from feature_types/ | feature type name |
| ParameterTypeRegistry | Custom parameter validators | dtype |
All registries validate at load time via Pydantic. A malformed definition never reaches the pipeline — it fails during from_project().
```python
from quantflow.metadata import load_metadata

meta = load_metadata(project_dir=".")
registry = meta.feature_type_registry
ft = registry.get("ofi")  # Validated FeatureType object
```
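The load-time guarantee ("a malformed definition never reaches the pipeline") can be sketched as a registry that validates every document before committing any of them. This is a hypothetical, dependency-free stand-in: it reads JSON with the standard library instead of YAML with Pydantic, and `REQUIRED_KEYS`, `FeatureTypeRecord`, and `FeatureTypeRegistrySketch` are assumed names with a deliberately minimal schema.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Tuple

# Assumed minimal schema for the sketch; the real FeatureType model has more fields.
REQUIRED_KEYS = ("name", "output_column", "formula")


@dataclass(frozen=True)
class FeatureTypeRecord:
    name: str
    output_column: str
    formula: str
    required_inputs: Tuple[str, ...] = ()


class FeatureTypeRegistrySketch:
    """Keyed by feature type name; definitions are validated when loaded, not when used."""

    def __init__(self) -> None:
        self._types: Dict[str, FeatureTypeRecord] = {}

    def load(self, documents: Iterable[str]) -> None:
        staged: Dict[str, FeatureTypeRecord] = {}
        for doc in documents:
            raw: Dict[str, Any] = json.loads(doc)
            missing = [key for key in REQUIRED_KEYS if key not in raw]
            if missing:
                raise ValueError(f"{raw.get('name', '<unnamed>')}: missing keys {missing}")
            name = raw["name"]
            if name in staged or name in self._types:
                raise ValueError(f"duplicate feature type {name!r}")
            staged[name] = FeatureTypeRecord(
                name=name,
                output_column=raw["output_column"],
                formula=raw["formula"],
                required_inputs=tuple(raw.get("required_inputs", ())),
            )
        # Commit only after every document validated, so a bad file leaves the registry untouched.
        self._types.update(staged)

    def get(self, name: str) -> FeatureTypeRecord:
        return self._types[name]
```

Staging before committing is the design choice that matters here: if any file in a batch is malformed, `load` raises and none of the batch becomes visible through `get`.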