Type System

The type system defines the vocabulary for declaring features: 6 computation primitives, formula-based FeatureTypes, the template→instance lifecycle, typed parameter validation, and registries for loading YAML specs.


Computation Primitives

Every IR node is classified into one of 6 primitives. The primitive determines the execution model — how data flows through the node.

| Primitive | Role | Execution Model | Example |
|---|---|---|---|
| SOURCE | Data input boundary | Reads from external CDM table | cdm_trade_enriched, cdm_time_bars |
| TRANSFORM | Single-row stateless | Each output row depends only on current input row | a + b, sign(x), abs(x), sqrt(x) |
| WINDOW | Fixed-size trailing window | Each output row depends on last N input rows | rolling_mean(close, 20), diff(price, 1) |
| STATE | Cumulative across all prior rows | Each output row depends on entire prior history | ema(price, 0.94), cumsum(ofi, 0) |
| EVENT | Event-driven trigger | Computation triggered by barrier/condition (reserved) | Threshold crossing, barrier hit |
| SINK | Data output boundary | Marks final output column | export |

Why these 6? Every financial computation fits one of these patterns. TRANSFORM handles arithmetic and scalar math (fast, parallelizable). WINDOW handles rolling aggregations (fixed memory, O(N) per row). STATE handles unbounded accumulators (expensive but necessary for EMA, cumulative sums). EVENT is reserved for future event-driven triggers. SOURCE and SINK demarcate the boundaries — where data enters and exits the computation graph.
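The difference between the TRANSFORM, WINDOW, and STATE execution models can be sketched in plain Python. This is only an illustration of the data-flow patterns described above; the function names and the choice to emit None until a window fills are assumptions of this sketch, not the engine's implementation.

```python
from collections import deque
from enum import Enum


class Primitive(Enum):
    """The six primitives listed in the table above."""
    SOURCE = "SOURCE"
    TRANSFORM = "TRANSFORM"
    WINDOW = "WINDOW"
    STATE = "STATE"
    EVENT = "EVENT"
    SINK = "SINK"


def transform_abs(rows):
    """TRANSFORM: each output depends only on the current input row."""
    return [abs(x) for x in rows]


def window_rolling_mean(rows, n):
    """WINDOW: each output depends on the last n input rows (fixed memory)."""
    buf = deque(maxlen=n)  # holds at most n values, older ones drop off
    out = []
    for x in rows:
        buf.append(x)
        # Emit None until the window is full (a choice made for this sketch).
        out.append(sum(buf) / n if len(buf) == n else None)
    return out


def state_cumsum(rows, init=0):
    """STATE: each output depends on the entire prior history."""
    total = init
    out = []
    for x in rows:
        total += x
        out.append(total)
    return out
```

TRANSFORM needs no memory between rows, WINDOW keeps a bounded buffer, and STATE carries an accumulator that summarizes everything seen so far.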


FeatureType (Template / Blueprint)

A FeatureType is a parameterized feature definition — a blueprint, not a concrete computation. It declares:

  • Inputs — CDM data sources or upstream feature columns
  • Parameters — typed, validated, with defaults and constraints
  • Formula — a mathematical expression compiled to an IR DAG by the AST compiler
  • Output column — what this feature produces

Formula-Based Definition (Preferred)

name: ofi
description: Order Flow Imbalance - measures net order flow
category: order_flow
dimension: signal
version: v0.9.0 (Beta)
required_inputs:
  - best_bid_size
  - best_ask_size
output_column: ofi
parameters:
  window:
    type: integer
    description: Lookback window for differences
    default: 1
    constraints:
      min: 1
      max: 100
  decay:
    type: float
    description: Decay factor for EMA smoothing
    default: 0.95
    constraints:
      min: 0.0
      max: 1.0
formula: "cumsum((diff(best_bid_size, $window) - diff(best_ask_size, $window)), 0)"

The formula: field compiles through Python's ast module into IRNodes — no manual DAG wiring needed. The compiler automatically extracts column references, resolves $parameter placeholders, constructs the dependency graph, and produces a validated IR DAG ready for lowering.
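As a rough illustration of the extraction step, the sketch below parses a formula with Python's ast module and separates column references, parameter placeholders, and operator names. The helper extract_references and the PARAM_ marker prefix are hypothetical names introduced here; the real compiler may handle placeholders differently.

```python
import ast


def extract_references(formula: str):
    """Return (columns, parameters, operators) referenced by a formula."""
    # "$window" is not a valid Python identifier, so this sketch swaps the
    # "$" for a marker prefix before handing the text to ast.parse.
    tree = ast.parse(formula.replace("$", "PARAM_"), mode="eval")

    # Names that appear in call position are operators such as diff or cumsum.
    called = {
        node.func.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    }
    names = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}

    params = {n[len("PARAM_"):] for n in names if n.startswith("PARAM_")}
    columns = {n for n in names - called if not n.startswith("PARAM_")}
    return sorted(columns), sorted(params), sorted(called)


formula = "cumsum((diff(best_bid_size, $window) - diff(best_ask_size, $window)), 0)"
columns, params, operators = extract_references(formula)
```

For the ofi formula this yields the two input columns, the single window parameter, and the cumsum and diff operators, which is the information a compiler needs before it can wire up dependencies.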

$placeholder Resolution

In a formula, bare names refer to CDM columns. Names prefixed with $ refer to configurable parameters. Resolution happens at compile time:

  • best_bid_size → column reference (TRANSFORM.col)
  • $window → parameter value (e.g., 5) → literal (TRANSFORM.lit)
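The two resolution rules above can be sketched as a substitution pass followed by a leaf classification. The helpers resolve_placeholders and classify_leaves are hypothetical, and the TRANSFORM.col / TRANSFORM.lit labels are used only as strings here; this is a minimal sketch, not the compiler's actual code path.

```python
import ast
import re


def resolve_placeholders(formula: str, params: dict) -> str:
    """Replace each $name with the literal value bound to that parameter."""
    def substitute(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"unknown parameter: ${name}")
        return repr(params[name])

    return re.sub(r"\$([A-Za-z_]\w*)", substitute, formula)


def classify_leaves(formula: str, params: dict):
    """Label each leaf of the resolved formula as a column or a literal."""
    tree = ast.parse(resolve_placeholders(formula, params), mode="eval")
    leaves = []

    def visit(node):
        if isinstance(node, ast.Call):
            # Skip the function name itself; only the arguments are leaves.
            for arg in node.args:
                visit(arg)
        elif isinstance(node, ast.Name):
            leaves.append(("TRANSFORM.col", node.id))
        elif isinstance(node, ast.Constant):
            leaves.append(("TRANSFORM.lit", node.value))
        else:
            for child in ast.iter_child_nodes(node):
                visit(child)

    visit(tree.body)
    return leaves


resolved = resolve_placeholders("diff(best_bid_size, $window)", {"window": 5})
leaves = classify_leaves("diff(best_bid_size, $window)", {"window": 5})
```

Because substitution happens before parsing, a placeholder with no bound value is rejected at compile time, before any data is read.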

Steps-Based Definition (Legacy)

The older steps: format is still supported for backward compatibility:

steps:
  - primitive: TRANSFORM
    op: sub
    params: { a: best_bid_size, b: best_ask_size }
  - primitive: WINDOW
    op: diff
    params: { input: bid_ask_diff, window: $window }
  - primitive: STATE
    op: cumsum
    params: { input: diff_result, init: 0 }

All new FeatureTypes use the formula format. See the AST Compiler for the full formula language reference.


ComputationStep (Internal IR Bridge)

ComputationStep is an internal representation that bridges FeatureType definitions and the IR DAG. When a formula is compiled, the resulting IRNodes are converted to ComputationSteps for backward compatibility with existing pipeline infrastructure. When a FeatureType is instantiated with concrete parameters, each ComputationStep resolves its $placeholders and becomes part of the IR DAG.

Most users never interact with ComputationStep directly — it's an implementation detail of the formula → AST → IR compilation pipeline. The public interface is the formula: field in the FeatureType YAML.
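For readers curious about what placeholder resolution on a step might look like, here is a minimal sketch. The field names mirror the primitive, op, and params keys of the steps: format; the class layout and the resolve method are assumptions made for illustration, not the internal ComputationStep API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ComputationStep:
    """Hypothetical stand-in for the internal step representation."""
    primitive: str
    op: str
    params: dict = field(default_factory=dict)

    def resolve(self, values: dict) -> "ComputationStep":
        """Return a copy with every "$name" param replaced by values[name]."""
        resolved = {
            key: values[val[1:]]
            if isinstance(val, str) and val.startswith("$")
            else val
            for key, val in self.params.items()
        }
        return ComputationStep(self.primitive, self.op, resolved)


template_step = ComputationStep(
    "WINDOW", "diff", {"input": "bid_ask_diff", "window": "$window"}
)
concrete_step = template_step.resolve({"window": 5})
```

Returning a new step instead of mutating the template keeps the template reusable across several instantiations.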


FeatureInstance (Concrete Feature)

A FeatureInstance is a fully resolved feature, ready to execute. It binds a FeatureType template to specific parameter values, data sources, and runtime context:

FeatureInstance(
    name="close_zscore_20",
    feature_type=rolling_zscore,  # Reference to FeatureType template
    params={"column": "close", "window": 20},
    sources=["cdm_time_bars"],  # Concrete data sources
)

Template → Instance separation is a core design decision. One FeatureType (e.g., rolling_zscore) can produce many FeatureInstance objects — close_zscore_20, volume_zscore_60, etc. — each with different parameterizations. This avoids code duplication and ensures consistent semantics across parameterizations.
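The one-template, many-instances relationship can be sketched with two small dataclasses. These classes are simplified stand-ins that share only their names with the real FeatureType and FeatureInstance; the defaults field and the instantiate helper are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureType:
    """Simplified template: a name plus default parameter values."""
    name: str
    defaults: dict


@dataclass(frozen=True)
class FeatureInstance:
    """Simplified instance: a template bound to concrete values."""
    name: str
    feature_type: FeatureType
    params: dict
    sources: list


def instantiate(ft, name, sources, **overrides):
    """Bind a template to concrete parameters, rejecting unknown names."""
    unknown = set(overrides) - set(ft.defaults)
    if unknown:
        raise ValueError(f"unknown parameters for {ft.name}: {sorted(unknown)}")
    return FeatureInstance(name, ft, {**ft.defaults, **overrides}, sources)


rolling_zscore = FeatureType("rolling_zscore", {"column": "close", "window": 20})

close_z = instantiate(rolling_zscore, "close_zscore_20", ["cdm_time_bars"])
volume_z = instantiate(
    rolling_zscore, "volume_zscore_60", ["cdm_time_bars"],
    column="volume", window=60,
)
```

Both instances point at the same template object, so a change to the template's definition applies to every parameterization at once.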


ParameterDef (Typed Parameters)

Parameters are validated via ParameterDef — a type-safe parameter definition with 7 dtype categories:

| Dtype | Constraints | Example |
|---|---|---|
| INTEGER | min, max | window size, lag, horizon |
| FLOAT | min, max | decay factor, threshold, epsilon |
| STRING | pattern (regex) | bar identifier, column name |
| BOOLEAN | | feature flags, toggles |
| COLUMN | must be valid identifier | column name references |
| LIST | | list of values |
| DICT | | key-value parameter maps |

Each parameter in a FeatureType YAML definition is validated at parse time against its ParameterDef. Invalid values (e.g., window: "abc" for an INTEGER parameter) fail immediately during project loading, not mid-computation.
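A minimal sketch of parse-time validation for the two numeric dtypes follows. The class below is a hypothetical stand-in written with the standard library only; its field names follow the YAML keys (default, min, max), and the real ParameterDef may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParameterDef:
    """Hypothetical typed parameter covering INTEGER and FLOAT only."""
    name: str
    dtype: str
    default: object = None
    min: Optional[float] = None
    max: Optional[float] = None

    def validate(self, value):
        expected = {"INTEGER": int, "FLOAT": (int, float)}[self.dtype]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise TypeError(f"{self.name}: expected {self.dtype}, got {value!r}")
        if self.min is not None and value < self.min:
            raise ValueError(f"{self.name}: {value} is below min {self.min}")
        if self.max is not None and value > self.max:
            raise ValueError(f"{self.name}: {value} is above max {self.max}")
        return value


window = ParameterDef("window", "INTEGER", default=1, min=1, max=100)
decay = ParameterDef("decay", "FLOAT", default=0.95, min=0.0, max=1.0)
```

Calling window.validate("abc") raises TypeError and window.validate(0) raises ValueError, so both kinds of mistake surface before any computation starts.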


TypeResolver

The TypeResolver maps canonical dtypes (STRING, INT, FLOAT, DECIMAL, TIMESTAMP) to backend-specific runtime types. Since FeatureDAG targets multiple backends (Polars, DolphinDB), the same canonical type must resolve correctly in each context:

| Canonical | Polars | DolphinDB |
|---|---|---|
| STRING | pl.Utf8 | STRING |
| INT | pl.Int64 | LONG |
| FLOAT | pl.Float64 | DOUBLE |
| DECIMAL | pl.Decimal | DECIMAL64 |
| TIMESTAMP | pl.Datetime | TIMESTAMP |

Type resolution is part of the lowering phase — canonical types flow through the IR unchanged, and the lowering registry translates them at expression-generation time.
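The mapping in the table can be expressed as a simple lookup. In this sketch the backend types are kept as strings so that it runs without Polars or DolphinDB installed; the function name resolve_type and the backend keys "polars" and "dolphindb" are assumptions, not the TypeResolver API.

```python
# Canonical dtype -> backend name -> backend type, copied from the table above.
_TYPE_TABLE = {
    "STRING": {"polars": "pl.Utf8", "dolphindb": "STRING"},
    "INT": {"polars": "pl.Int64", "dolphindb": "LONG"},
    "FLOAT": {"polars": "pl.Float64", "dolphindb": "DOUBLE"},
    "DECIMAL": {"polars": "pl.Decimal", "dolphindb": "DECIMAL64"},
    "TIMESTAMP": {"polars": "pl.Datetime", "dolphindb": "TIMESTAMP"},
}


def resolve_type(canonical: str, backend: str) -> str:
    """Translate a canonical dtype into the named backend's type."""
    try:
        return _TYPE_TABLE[canonical][backend]
    except KeyError:
        raise ValueError(
            f"no mapping for {canonical!r} on backend {backend!r}"
        ) from None


polars_int = resolve_type("INT", "polars")
dolphin_int = resolve_type("INT", "dolphindb")
```

Keeping the table in one place means the IR only ever carries canonical names, and adding a backend is a matter of adding one more key per row.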


Registries

Two registries load feature definitions from YAML/JSON into validated Python objects at project startup:

| Registry | Loads | Keys |
|---|---|---|
| FeatureTypeRegistry | FeatureType definitions from feature_types/ | feature type name |
| ParameterTypeRegistry | Custom parameter validators | dtype |

Both registries validate at load time via Pydantic. A malformed definition never reaches the pipeline; it fails during from_project().

from quantflow.metadata import load_metadata

meta = load_metadata(project_dir=".")
registry = meta.feature_type_registry
ft = registry.get("ofi") # Validated FeatureType object
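The validate-at-load behavior can be sketched without Pydantic. The class below is a hypothetical, standard-library stand-in that only borrows the FeatureTypeRegistry name; the required keys and the register method are assumptions chosen to match the YAML examples on this page.

```python
class FeatureTypeRegistry:
    """Hypothetical registry keyed by feature type name."""

    REQUIRED_KEYS = {"name", "output_column"}

    def __init__(self):
        self._items = {}

    def register(self, spec: dict) -> None:
        """Validate a parsed definition and store it under its name."""
        missing = self.REQUIRED_KEYS - spec.keys()
        if missing:
            raise ValueError(f"missing required keys: {sorted(missing)}")
        # A definition must carry a formula or, for legacy specs, steps.
        if "formula" not in spec and "steps" not in spec:
            raise ValueError(f"{spec['name']}: needs 'formula' or 'steps'")
        if spec["name"] in self._items:
            raise ValueError(f"duplicate feature type: {spec['name']}")
        self._items[spec["name"]] = spec

    def get(self, name: str) -> dict:
        return self._items[name]


sketch_registry = FeatureTypeRegistry()
sketch_registry.register(
    {"name": "ofi", "output_column": "ofi", "formula": "cumsum(x, 0)"}
)
```

Since every check runs inside register, a definition that is missing a field is rejected while the project loads, and get only ever returns entries that passed validation.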