Labeling Methods

There are six labeling methods. Each is registered via @register() and dispatched by name from the project YAML. All methods share the same signature: (cdm_data: dict, params: dict, inputs: dict) → LabelResult.

Triple Barrier

The classic method from modern financial ML (AFML Ch. 3). For each bar at index i:

Upper barrier: close[i] * (1 + upper_barrier)
Lower barrier: close[i] * (1 - lower_barrier)
Vertical barrier: i + vertical_barrier (time-based expiry)

The algorithm scans forward through subsequent bars:

Label = +1  →  upper barrier hit first (profit take)
Label = -1  →  lower barrier hit first (stop loss)
Label = 0   →  vertical barrier hit first (time expiration)

Parameter	Description
`horizon`	Number of bars forward to scan
`upper_barrier`	Profit target (e.g. 0.02 = +2%)
`lower_barrier`	Stop loss (e.g. 0.02 = -2%)
`vertical_barrier`	Max holding period in bars

Why triple barrier: Fixed-horizon labels only look at the endpoint of the window. Triple barrier labels incorporate the full price path, so they match strategies with explicit stop-loss and take-profit rules more closely.

Input columns: close (return calculation), high (upper barrier path), low (lower barrier path). The scan loop is compiled with Numba JIT and runs over Polars columns converted to NumPy arrays. Output: label_value, barrier_hit, forward_return.
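
The forward scan can be sketched in plain Python as below. This is an illustration of the rules above, not the Numba implementation. Two behaviors are assumptions that the text does not specify: when a single bar touches both barriers the stop loss wins, and bars whose vertical barrier falls beyond the end of the data get `None` unless a price barrier is hit first.

```python
from typing import List, Optional, Sequence


def triple_barrier_labels(
    close: Sequence[float],
    high: Sequence[float],
    low: Sequence[float],
    upper_barrier: float,
    lower_barrier: float,
    vertical_barrier: int,
) -> List[Optional[int]]:
    n = len(close)
    labels: List[Optional[int]] = []
    for i in range(n):
        upper = close[i] * (1 + upper_barrier)
        lower = close[i] * (1 - lower_barrier)
        last = i + vertical_barrier  # index of the vertical barrier
        label: Optional[int] = None
        # Scan forward bar by bar, stopping at the first barrier touched.
        for j in range(i + 1, min(last, n - 1) + 1):
            if low[j] <= lower:  # assumption: stop loss wins a same-bar tie
                label = -1
                break
            if high[j] >= upper:
                label = 1
                break
        # Time expiry only counts when the full window was observed.
        if label is None and last <= n - 1:
            label = 0
        labels.append(label)
    return labels
```

Checking `high` against the upper barrier and `low` against the lower barrier, rather than `close` against both, is what makes the label depend on the intrabar path.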

Fixed Horizon Return

The simplest labeling method. Computes forward return over a fixed number of periods using Polars' shift(-horizon).over("symbol").

Parameter	Description
`horizon`	Number of bars forward
`return_type`	`simple` or `log`
`binning.method`	`quantile`, `std`, `fixed`, or none (continuous)
`binning.n_bins`	Number of bins for quantile binning
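
A plain-Python sketch of the per-symbol forward return follows. It mirrors what shift(-horizon).over("symbol") does without depending on Polars. The `(symbol, close)` row layout and the `forward_returns` name are assumptions for the example. Rows are expected in time order within each symbol.

```python
import math
from collections import defaultdict
from typing import List, Optional, Sequence, Tuple


def forward_returns(
    rows: Sequence[Tuple[str, float]],
    horizon: int,
    return_type: str = "simple",
) -> List[Optional[float]]:
    if return_type not in ("simple", "log"):
        raise ValueError("return_type must be 'simple' or 'log'")
    # Group row positions by symbol so the shift never crosses symbols.
    positions = defaultdict(list)
    for idx, (symbol, _) in enumerate(rows):
        positions[symbol].append(idx)
    out: List[Optional[float]] = [None] * len(rows)
    for idxs in positions.values():
        for k, idx in enumerate(idxs):
            if k + horizon >= len(idxs):
                continue  # no bar `horizon` steps ahead: leave as None
            now = rows[idx][1]
            future = rows[idxs[k + horizon]][1]
            if return_type == "log":
                out[idx] = math.log(future / now)
            else:
                out[idx] = future / now - 1
    return out
```

The last `horizon` rows of every symbol have no label, which is the same effect as the nulls produced by a negative shift.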

Binning Methods

quantile: Uses Polars qcut to bin returns into equal-frequency buckets, stored as Int8 categories.
std: Thresholds at multiples of the return standard deviation, giving labels +2, +1, 0, -1, -2.
fixed: Explicit, user-supplied threshold boundaries used as bin edges.

Without binning, outputs raw forward_return as a regression target.
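
The std and fixed methods can be illustrated with the sketch below. The exact thresholds used by the std method are not spelled out above, so this example assumes bands measured from zero: less than one standard deviation maps to 0, one to two maps to ±1, and two or more maps to ±2. The function names are hypothetical.

```python
import statistics
from bisect import bisect_right
from typing import List, Sequence


def std_bins(returns: Sequence[float]) -> List[int]:
    """Label each return by whole multiples of sigma, capped at +/-2."""
    sigma = statistics.pstdev(returns)
    labels = []
    for r in returns:
        if sigma == 0:
            labels.append(0)
            continue
        k = min(int(abs(r) / sigma), 2)  # 0, 1 or 2 sigma bands
        labels.append(k if r > 0 else -k)
    return labels


def fixed_bins(returns: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Bin index = number of edges at or below the return (edges ascending)."""
    return [bisect_right(edges, r) for r in returns]
```

With `edges = [-0.01, 0.01]`, `fixed_bins` yields three bins: below -1%, between -1% and +1%, and +1% or above.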

Trend Scanning

A backward-looking CUSUM-based method. Does NOT look into the future — suitable for regime-detection without look-ahead bias.

Two accumulators track deviation from a drift term:

s_pos: cumulative positive deviation, floored at 0 (never negative)
s_neg: cumulative negative deviation, capped at 0 (never positive)

s_pos > threshold  → emit +1, reset both
s_neg < -threshold → emit -1, reset both
Otherwise           → emit 0

Parameter	Description
`threshold`	CUSUM sensitivity — lower values detect shorter/weaker trends
`drift`	Expected drift to subtract (typically 0)

Input: log returns computed per symbol via log().diff().over("symbol"). Horizon: 0 — label assigned at detection timestamp. Uses Numba JIT. Output: label_value, cusum_value.
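
The accumulator logic can be sketched for a single symbol as follows. This is plain Python rather than the Numba version. It assumes the deviation is the log return minus `drift` for both accumulators, and it emits 0 for the first bar, which has no return. Both are choices made for the example.

```python
import math
from typing import List, Sequence


def cusum_labels(closes: Sequence[float], threshold: float, drift: float = 0.0) -> List[int]:
    if not closes:
        return []
    labels = [0]  # assumption: the first bar has no return, so emit 0
    s_pos = 0.0
    s_neg = 0.0
    for prev, cur in zip(closes, closes[1:]):
        dev = math.log(cur / prev) - drift  # deviation from expected drift
        s_pos = max(0.0, s_pos + dev)       # floored at 0
        s_neg = min(0.0, s_neg + dev)       # capped at 0
        if s_pos > threshold:
            labels.append(1)
            s_pos = s_neg = 0.0             # reset both accumulators
        elif s_neg < -threshold:
            labels.append(-1)
            s_pos = s_neg = 0.0
        else:
            labels.append(0)
    return labels
```

Each label depends only on returns up to and including its own bar, which is why the method carries no look-ahead bias.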

Quantile Label

Cross-sectional relative-strength ranking across assets at each point in time:

Compute forward return over horizon periods per symbol
At each timestamp, compute cross-sectional quantile thresholds (default upper=0.8, lower=0.2)
Label: +1 if return ≥ upper quantile, -1 if ≤ lower quantile, 0 otherwise

A symbol is "good" only if it outperforms its peers, not merely if its return is positive. This is a standard approach for long-short portfolio construction. Output: label_value, forward_return.
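
The cross-sectional step for a single timestamp can be sketched as below. The quantile uses linear interpolation between sorted values. That interpolation rule is an assumption, since the text does not say which one the method applies, and the function names are hypothetical.

```python
from typing import Dict, List


def _quantile(sorted_vals: List[float], q: float) -> float:
    """Linearly interpolated quantile of an ascending list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def cross_sectional_labels(
    returns_by_symbol: Dict[str, float],
    upper: float = 0.8,
    lower: float = 0.2,
) -> Dict[str, int]:
    """Label one timestamp: +1 top quantile, -1 bottom quantile, else 0."""
    if not returns_by_symbol:
        return {}
    vals = sorted(returns_by_symbol.values())
    hi_thr = _quantile(vals, upper)
    lo_thr = _quantile(vals, lower)
    labels = {}
    for symbol, r in returns_by_symbol.items():
        if r >= hi_thr:
            labels[symbol] = 1
        elif r <= lo_thr:
            labels[symbol] = -1
        else:
            labels[symbol] = 0
    return labels
```

For example, with forward returns of 6%, 4%, 3%, 2% and 1% across five symbols, the 1% symbol is labeled -1 even though its return is positive, because it is the weakest of its peers.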

Time-Series Sign

Direction classifier with a noise band:

Compute forward return over horizon periods
Label: +1 if forward return > threshold, -1 if < -threshold, 0 otherwise

Parameter	Description
`horizon`	Number of bars forward
`threshold`	Minimum absolute return to count as up/down
`return_type`	`simple` or `log`

The threshold prevents labeling tiny noise-driven movements as directional signals. Output: label_value, forward_return.
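
A minimal single-symbol sketch of the noise band follows. The function name is hypothetical, and the last `horizon` bars are given `None` because they have no forward return.

```python
import math
from typing import List, Optional, Sequence


def sign_labels(
    closes: Sequence[float],
    horizon: int,
    threshold: float,
    return_type: str = "simple",
) -> List[Optional[int]]:
    if return_type not in ("simple", "log"):
        raise ValueError("return_type must be 'simple' or 'log'")
    labels: List[Optional[int]] = []
    for i, now in enumerate(closes):
        if i + horizon >= len(closes):
            labels.append(None)  # no forward return available
            continue
        future = closes[i + horizon]
        r = math.log(future / now) if return_type == "log" else future / now - 1
        if r > threshold:
            labels.append(1)
        elif r < -threshold:
            labels.append(-1)
        else:
            labels.append(0)     # inside the noise band
    return labels
```

With a 1% threshold, a move from 100 to 100.1 is labeled 0 rather than +1, while a move from 100.1 to 103 is labeled +1.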

Meta Labeling

A secondary model that predicts whether the primary model is correct (AFML Ch. 3). It requires an existing primary model's predictions:

A primary model produces trade predictions (e.g. buy/sell signals)
The meta-label model predicts whether each primary prediction will be correct
The meta-label filters primary predictions — only trades where the meta-model predicts success are executed

Parameter	Description
`primary_label`	Column name of the primary model's predictions
`features`	List of feature column names to use as predictors

The method compares the primary model's predicted direction against the sign of the actual forward return. Output: label_value (1 if the primary prediction matched the actual direction, 0 otherwise). The result is used to size bets or filter trades in a secondary ML layer.
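
The comparison can be sketched as follows. The example assumes the primary prediction is encoded as +1 (buy), -1 (sell) or 0 (no trade), and that rows with no trade or no forward return get `None`. Neither convention is stated in the text above.

```python
from typing import List, Optional, Sequence


def meta_labels(
    primary_side: Sequence[int],
    forward_return: Sequence[Optional[float]],
) -> List[Optional[int]]:
    labels: List[Optional[int]] = []
    for side, r in zip(primary_side, forward_return):
        if side == 0 or r is None:
            labels.append(None)  # assumption: nothing to evaluate
        else:
            # Correct when predicted direction and realized return agree in sign.
            labels.append(1 if side * r > 0 else 0)
    return labels
```

A sell signal followed by a negative return therefore counts as correct (1), and a zero return counts as incorrect for either side.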
