Type System

The type system defines the vocabulary for declaring features: 6 computation primitives, formula-based FeatureTypes, the template→instance lifecycle, typed parameter validation, and registries for loading YAML specs.

Computation Primitives

Every IR node is classified into one of 6 primitives. The primitive determines the execution model — how data flows through the node.

Primitive	Role	Execution Model	Example
`SOURCE`	Data input boundary	Reads from external CDM table	`cdm_trade_enriched`, `cdm_time_bars`
`TRANSFORM`	Single-row stateless	Each output row depends only on current input row	`a + b`, `sign(x)`, `abs(x)`, `sqrt(x)`
`WINDOW`	Fixed-size trailing window	Each output row depends on last N input rows	`rolling_mean(close, 20)`, `diff(price, 1)`
`STATE`	Cumulative across all prior rows	Each output row depends on entire prior history	`ema(price, 0.94)`, `cumsum(ofi, 0)`
`EVENT`	Event-driven trigger	Computation triggered by barrier/condition (reserved)	Threshold crossing, barrier hit
`SINK`	Data output boundary	Marks final output column	`export`

Why these 6? The design assumes every feature computation fits one of these patterns. TRANSFORM handles arithmetic and scalar math (fast, parallelizable). WINDOW handles rolling aggregations (fixed memory, O(N) per row for a window of N rows). STATE handles unbounded accumulators (expensive but necessary for EMA, cumulative sums). EVENT is reserved for future event-driven triggers. SOURCE and SINK mark the boundaries where data enters and exits the computation graph.
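The difference between the three row-level execution models is easiest to see on a toy column. The following is a minimal sketch in plain Python, not the engine's implementation: the function names are hypothetical, and returning `None` until the window fills is an assumption about warm-up behavior.

```python
from collections import deque
from enum import Enum


class Primitive(Enum):
    """The six primitives from the table above."""
    SOURCE = "SOURCE"
    TRANSFORM = "TRANSFORM"
    WINDOW = "WINDOW"
    STATE = "STATE"
    EVENT = "EVENT"
    SINK = "SINK"


def transform_abs(rows):
    """TRANSFORM: each output depends only on the current input row."""
    return [abs(x) for x in rows]


def window_rolling_mean(rows, n):
    """WINDOW: each output depends on the last n rows (None until the window fills)."""
    buf = deque(maxlen=n)  # fixed memory: at most n rows are retained
    out = []
    for x in rows:
        buf.append(x)
        out.append(sum(buf) / n if len(buf) == n else None)
    return out


def state_cumsum(rows, init=0):
    """STATE: each output depends on the entire prior history via a running total."""
    total = init
    out = []
    for x in rows:
        total += x
        out.append(total)
    return out


data = [1, -2, 3, -4]
abs_out = transform_abs(data)            # [1, 2, 3, 4]
mean_out = window_rolling_mean(data, 2)  # [None, -0.5, 0.5, -0.5]
cumsum_out = state_cumsum(data)          # [1, -1, 2, -2]
```

Reordering the input leaves each TRANSFORM output tied to its own row, while WINDOW and STATE outputs change, which is why only TRANSFORM parallelizes trivially across rows.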

FeatureType (Template / Blueprint)

A FeatureType is a parameterized feature definition — a blueprint, not a concrete computation. It declares:

Inputs — CDM data sources or upstream feature columns
Parameters — typed, validated, with defaults and constraints
Formula — a mathematical expression compiled to an IR DAG by the AST compiler
Output column — what this feature produces

Formula-Based Definition (Preferred)

name: ofi
description: Order Flow Imbalance - measures net order flow
category: order_flow
dimension: signal
version: v0.9.0 (Beta)
required_inputs:
  - best_bid_size
  - best_ask_size
output_column: ofi
parameters:
  window:
    type: integer
    description: Lookback window for differences
    default: 1
    constraints:
      min: 1
      max: 100
  decay:
    type: float
    description: Decay factor for EMA smoothing
    default: 0.95
    constraints:
      min: 0.0
      max: 1.0
formula: "cumsum((diff(best_bid_size, $window) - diff(best_ask_size, $window)), 0)"

The formula: field compiles through Python's ast module into IRNodes — no manual DAG wiring needed. The compiler automatically extracts column references, resolves $parameter placeholders, constructs the dependency graph, and produces a validated IR DAG ready for lowering.
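As a rough illustration of the column-extraction step, the sketch below walks a formula with Python's standard `ast` module and separates operator names from column references. It assumes parameters have already been replaced by literals, and `extract_references` is a hypothetical helper, not part of the compiler's API.

```python
import ast


def extract_references(formula: str):
    """Return (function names, column names) referenced by a formula."""
    tree = ast.parse(formula, mode="eval")
    # Name nodes sitting in call position are operators, not columns.
    func_nodes = {
        id(node.func)
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    }
    functions, columns = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            (functions if id(node) in func_nodes else columns).add(node.id)
    return sorted(functions), sorted(columns)


functions, columns = extract_references(
    "cumsum(diff(best_bid_size, 1) - diff(best_ask_size, 1), 0)"
)
# functions == ["cumsum", "diff"]
# columns == ["best_ask_size", "best_bid_size"]
```

The extracted columns correspond to what a FeatureType lists under `required_inputs`, so a compiler could use a pass like this to cross-check the two.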

`$placeholder` Resolution

In a formula, bare names refer to CDM columns. Names prefixed with $ refer to configurable parameters. Resolution happens at compile time:

best_bid_size → column reference (TRANSFORM.col)
$window → parameter value (e.g., 5) → literal (TRANSFORM.lit)
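One way to picture compile-time resolution is as a substitution pass that runs before parsing. This is a minimal sketch under that assumption; the real compiler may resolve placeholders at the AST level instead, and `resolve_placeholders` is a hypothetical name.

```python
import re

# A placeholder is "$" followed by an identifier, e.g. $window.
PLACEHOLDER = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def resolve_placeholders(formula: str, params: dict) -> str:
    """Replace each $name with the literal value of params[name]."""
    def substitute(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"unknown parameter: ${name}")
        return repr(params[name])

    return PLACEHOLDER.sub(substitute, formula)


resolved = resolve_placeholders("diff(best_bid_size, $window)", {"window": 5})
# resolved == "diff(best_bid_size, 5)"
```

Bare names such as `best_bid_size` pass through untouched, so after this step every remaining name in the formula is either an operator or a column reference.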

Steps-Based Definition (Legacy)

The older steps: format is still supported for backward compatibility:

steps:
  - primitive: TRANSFORM
    op: sub
    params: { a: best_bid_size, b: best_ask_size }
  - primitive: WINDOW
    op: diff
    params: { input: bid_ask_diff, window: $window }
  - primitive: STATE
    op: cumsum
    params: { input: diff_result, init: 0 }

All new FeatureTypes use the formula format. See the AST Compiler for the full formula language reference.

ComputationStep (Internal IR Bridge)

ComputationStep is an internal representation that bridges FeatureType definitions and the IR DAG. When a formula is compiled, the resulting IRNodes are converted to ComputationSteps for backward compatibility with existing pipeline infrastructure. When a FeatureType is instantiated with concrete parameters, each ComputationStep resolves its $placeholders and becomes part of the IR DAG.

Most users never interact with ComputationStep directly — it's an implementation detail of the formula → AST → IR compilation pipeline. The public interface is the formula: field in the FeatureType YAML.

FeatureInstance (Concrete Feature)

A FeatureInstance is a fully resolved feature, ready to execute. It binds a FeatureType template to specific parameter values, data sources, and runtime context:

FeatureInstance(
    name="close_zscore_20",
    feature_type=rolling_zscore,       # Reference to FeatureType template
    params={"column": "close", "window": 20},
    sources=["cdm_time_bars"],        # Concrete data sources
)

Template → Instance separation is a core design decision. One FeatureType (e.g., rolling_zscore) can produce many FeatureInstance objects — close_zscore_20, volume_zscore_60, etc. — each with different parameterizations. This avoids code duplication and ensures consistent semantics across parameterizations.
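The separation can be sketched with two small dataclasses. The class names, the `instantiate` method, and the default values are hypothetical stand-ins for illustration and do not mirror the real FeatureType or FeatureInstance signatures.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureInstanceSketch:
    """A concrete feature: a template plus fully resolved parameters."""
    name: str
    feature_type: "FeatureTypeSketch"
    params: dict


@dataclass
class FeatureTypeSketch:
    """A template: declares parameters and their defaults, computes nothing."""
    name: str
    defaults: dict = field(default_factory=dict)

    def instantiate(self, name, **overrides):
        unknown = set(overrides) - set(self.defaults)
        if unknown:
            raise ValueError(f"unknown parameters: {sorted(unknown)}")
        # Merge into a new dict so the template's defaults stay untouched.
        return FeatureInstanceSketch(name, self, {**self.defaults, **overrides})


rolling_zscore = FeatureTypeSketch(
    "rolling_zscore", defaults={"column": "close", "window": 20}
)
close_z20 = rolling_zscore.instantiate("close_zscore_20")
volume_z60 = rolling_zscore.instantiate(
    "volume_zscore_60", column="volume", window=60
)
```

Both instances point back to the same template object, so a change to the template's definition applies to every parameterization at once.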

ParameterDef (Typed Parameters)

Parameters are validated via ParameterDef — a type-safe parameter definition with 7 dtype categories:

Dtype	Constraints	Example
`INTEGER`	`min`, `max`	window size, lag, horizon
`FLOAT`	`min`, `max`	decay factor, threshold, epsilon
`STRING`	`pattern` (regex)	bar identifier, column name
`BOOLEAN`	—	feature flags, toggles
`COLUMN`	must be valid identifier	column name references
`LIST`	—	list of values
`DICT`	—	key-value parameter maps

Each parameter in a FeatureType YAML definition is validated at parse time against its ParameterDef. Invalid values (e.g., window: "abc" for an INTEGER parameter) fail immediately during project loading, not mid-computation.
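To make the fail-fast behavior concrete, here is a minimal sketch of range-checked validation for the two numeric dtypes. It uses a plain dataclass rather than the real ParameterDef; the class name, field names, and error types are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Python types accepted for each dtype in this sketch (numeric dtypes only).
_PYTHON_TYPES = {"INTEGER": int, "FLOAT": (int, float)}


@dataclass
class ParameterDefSketch:
    name: str
    dtype: str
    default: object = None
    min_value: Optional[float] = None
    max_value: Optional[float] = None

    def validate(self, value):
        """Return value unchanged if it satisfies dtype and range, else raise."""
        expected = _PYTHON_TYPES[self.dtype]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise TypeError(f"{self.name}: expected {self.dtype}, got {value!r}")
        if self.min_value is not None and value < self.min_value:
            raise ValueError(f"{self.name}: {value} is below min {self.min_value}")
        if self.max_value is not None and value > self.max_value:
            raise ValueError(f"{self.name}: {value} is above max {self.max_value}")
        return value


window = ParameterDefSketch("window", "INTEGER", default=1, min_value=1, max_value=100)
checked = window.validate(20)  # passes and returns 20
```

Calling `window.validate("abc")` raises `TypeError` and `window.validate(0)` raises `ValueError`, so a bad value surfaces when the definition is loaded rather than when the feature runs.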

TypeResolver

The TypeResolver maps canonical dtypes (STRING, INT, FLOAT, DECIMAL, TIMESTAMP) to backend-specific runtime types. Since FeatureDAG targets multiple backends (Polars, DolphinDB), the same canonical type must resolve correctly in each context:

Canonical	Polars	DolphinDB
`STRING`	`pl.Utf8`	`STRING`
`INT`	`pl.Int64`	`LONG`
`FLOAT`	`pl.Float64`	`DOUBLE`
`DECIMAL`	`pl.Decimal`	`DECIMAL64`
`TIMESTAMP`	`pl.Datetime`	`TIMESTAMP`

Type resolution is part of the lowering phase — canonical types flow through the IR unchanged, and the lowering registry translates them at expression-generation time.
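The table above amounts to a per-backend lookup. The sketch below models it as a nested dictionary, holding backend types as strings so it runs without Polars or DolphinDB installed; `resolve_type` and the backend keys are hypothetical names, not the TypeResolver API.

```python
# Canonical dtype -> backend type name, copied from the table above.
CANONICAL_TO_BACKEND = {
    "polars": {
        "STRING": "pl.Utf8",
        "INT": "pl.Int64",
        "FLOAT": "pl.Float64",
        "DECIMAL": "pl.Decimal",
        "TIMESTAMP": "pl.Datetime",
    },
    "dolphindb": {
        "STRING": "STRING",
        "INT": "LONG",
        "FLOAT": "DOUBLE",
        "DECIMAL": "DECIMAL64",
        "TIMESTAMP": "TIMESTAMP",
    },
}


def resolve_type(canonical: str, backend: str) -> str:
    """Translate a canonical dtype into the given backend's type name."""
    try:
        return CANONICAL_TO_BACKEND[backend][canonical]
    except KeyError:
        raise ValueError(
            f"no mapping for {canonical!r} on backend {backend!r}"
        ) from None


polars_int = resolve_type("INT", "polars")        # "pl.Int64"
dolphindb_int = resolve_type("INT", "dolphindb")  # "LONG"
```

Keeping the IR in canonical types means a new backend only needs a new inner mapping; nothing upstream of lowering has to change.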

Registries

Two registries load definitions from YAML/JSON into validated Python objects at project startup:

Registry	Loads	Keys
`FeatureTypeRegistry`	FeatureType definitions from `feature_types/`	feature type name
`ParameterTypeRegistry`	Custom parameter validators	dtype

Both registries validate at load time via Pydantic. A malformed definition never reaches the pipeline; it fails during from_project().

from quantflow.metadata import load_metadata

meta = load_metadata(project_dir=".")
registry = meta.feature_type_registry
ft = registry.get("ofi")  # Validated FeatureType object
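The load-time guarantee can be sketched without Pydantic: validate every definition first, then commit them all, so one malformed entry rejects the whole load. The class name and the set of required fields below are assumptions made for illustration.

```python
# Fields this sketch treats as mandatory (an assumption, not the real schema).
REQUIRED_FIELDS = ("name", "output_column")


class FeatureTypeRegistrySketch:
    """Keeps only definitions that passed validation, keyed by name."""

    def __init__(self):
        self._types = {}

    def load(self, definitions):
        staged = {}
        for definition in definitions:
            missing = [f for f in REQUIRED_FIELDS if f not in definition]
            if missing:
                raise ValueError(f"definition is missing fields: {missing}")
            name = definition["name"]
            if name in staged or name in self._types:
                raise ValueError(f"duplicate feature type: {name}")
            staged[name] = definition
        # Commit only after every definition has passed validation.
        self._types.update(staged)

    def get(self, name):
        return self._types[name]


sketch_registry = FeatureTypeRegistrySketch()
sketch_registry.load([{"name": "ofi", "output_column": "ofi"}])
ofi_definition = sketch_registry.get("ofi")
```

Staging before committing keeps the registry consistent: after a failed load it holds exactly what it held before.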
