
Labeling Methods

Six labeling paradigms, each registered via @register() and dispatched by name from the project YAML. All methods share the same signature: (cdm_data: dict, params: dict, inputs: dict) → LabelResult.


Triple Barrier

The classic method from modern financial ML (AFML Ch. 3). For each bar at index i:

  • Upper barrier: close[i] * (1 + upper_pct)
  • Lower barrier: close[i] * (1 - lower_pct)
  • Vertical barrier: i + max_horizon (time-based expiry)

The algorithm scans forward through subsequent bars:

  • Label = +1 → upper barrier hit first (profit take)
  • Label = -1 → lower barrier hit first (stop loss)
  • Label = 0 → vertical barrier hit first (time expiration)

Parameter         Description
horizon           Number of bars forward to scan
upper_barrier     Profit target (e.g. 0.02 = +2%)
lower_barrier     Stop loss (e.g. 0.02 = -2%)
vertical_barrier  Max holding period in bars

Why triple barrier: Fixed-horizon labels look only at the endpoint of the window. Triple barrier labels incorporate the full price path, which makes them more robust for strategies with explicit stop-loss and take-profit rules.

Input columns: close (return calculation), high (upper barrier path), low (lower barrier path). The scan loop is JIT-compiled with Numba and runs over Polars columns converted to NumPy arrays. Output: label_value, barrier_hit, forward_return.
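A plain-Python sketch of the forward scan may help make the barrier logic concrete. It is written without Numba or Polars so it runs anywhere, and several details are assumptions rather than documented behavior: the upper barrier is checked before the lower one when both are touched in the same bar, bars near the end of the series are labeled 0 even though their window is truncated, and forward_return is measured from close[i] to the close of the exit bar. The function name triple_barrier_labels is hypothetical.

```python
def triple_barrier_labels(close, high, low, upper_pct, lower_pct, max_horizon):
    """Label each bar by which barrier is touched first (sketch)."""
    n = len(close)
    labels = [0] * n
    barrier_hit = ["vertical"] * n
    forward_return = [0.0] * n
    for i in range(n):
        upper = close[i] * (1 + upper_pct)
        lower = close[i] * (1 - lower_pct)
        end = min(i + max_horizon, n - 1)  # vertical barrier, clipped to the data
        exit_idx = end
        for j in range(i + 1, end + 1):
            # Assumption: upper barrier wins if both are touched in one bar.
            if high[j] >= upper:
                labels[i], barrier_hit[i], exit_idx = 1, "upper", j
                break
            if low[j] <= lower:
                labels[i], barrier_hit[i], exit_idx = -1, "lower", j
                break
        # Assumption: return is measured close-to-close at the exit bar.
        forward_return[i] = close[exit_idx] / close[i] - 1.0
    return labels, barrier_hit, forward_return


close = [100.0, 101.0, 103.0, 99.0, 97.0, 100.0]
high = [100.5, 101.5, 103.5, 100.0, 98.0, 100.5]
low = [99.5, 100.5, 101.0, 98.5, 96.5, 99.5]

labels, hits, rets = triple_barrier_labels(close, high, low, 0.02, 0.02, max_horizon=1)
print(labels)  # [0, 1, -1, -1, 1, 0]
print(hits)
```

With max_horizon=1 the first bar expires at the vertical barrier because the next bar's high (101.5) stays below 102 and its low (100.5) stays above 98. Widening the window to two bars lets the same bar reach its profit target.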


Fixed Horizon Return

The simplest labeling method. Computes forward return over a fixed number of periods using Polars' shift(-horizon).over("symbol").

Parameter       Description
horizon         Number of bars forward
return_type     simple or log
binning.method  quantile, std, fixed, or none (continuous)
binning.n_bins  Number of bins for quantile binning

Binning Methods

  • quantile: Polars qcut to bin returns into equal-frequency buckets → Int8 categories.
  • std: Standard deviation thresholds. Returns are assigned to bins +2, +1, 0, -1, -2 based on multiples of the return standard deviation.
  • fixed: Explicit threshold boundaries for bin edges.

Without binning, outputs raw forward_return as a regression target.
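The forward-return calculation and the std binning option can be sketched for a single symbol in plain Python. This is an illustration under stated assumptions: the function names forward_returns and std_bin are hypothetical, the standard deviation is taken over the whole sample, and bin edges are placed at ±1 and ±2 standard deviations around zero. The page does not specify these details, so the actual implementation may differ.

```python
import math
from statistics import pstdev


def forward_returns(close, horizon, return_type="simple"):
    """Forward return over `horizon` bars; None where no future bar exists."""
    out = []
    for i in range(len(close)):
        j = i + horizon
        if j >= len(close):
            out.append(None)  # mirrors the nulls a negative shift leaves at the tail
        elif return_type == "log":
            out.append(math.log(close[j] / close[i]))
        else:
            out.append(close[j] / close[i] - 1.0)
    return out


def std_bin(returns):
    """Bin returns into -2..+2 by whole multiples of the sample std (assumed rule)."""
    valid = [r for r in returns if r is not None]
    sigma = pstdev(valid)
    labels = []
    for r in returns:
        if r is None:
            labels.append(None)
            continue
        z = r / sigma if sigma > 0 else 0.0
        # int() truncates toward zero, so |z| < 1 -> 0, 1 <= |z| < 2 -> ±1, else ±2.
        labels.append(max(-2, min(2, int(z))))
    return labels


simple = forward_returns([100.0, 110.0, 121.0], horizon=1)
logret = forward_returns([100.0, 110.0, 121.0], horizon=1, return_type="log")
bins = std_bin([0.01, -0.01, 0.03, -0.03, None])
print(simple, logret, bins)
```

In a multi-symbol frame the same calculation would be applied per symbol, which is what the .over("symbol") window in the text achieves.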


Trend Scanning

A backward-looking CUSUM-based method. It does NOT look into the future, which makes it suitable for regime detection without look-ahead bias.

Two accumulators track deviation from a drift term:

  • s_pos: cumulative positive deviation, clamped at 0
  • s_neg: cumulative negative deviation, clamped at 0

Emission rules:

  • s_pos > threshold → emit +1, reset both
  • s_neg < -threshold → emit -1, reset both
  • Otherwise → emit 0

Parameter  Description
threshold  CUSUM sensitivity; lower values detect shorter/weaker trends
drift      Expected drift to subtract (typically 0)

Input: log returns computed per symbol via log().diff().over("symbol"). Horizon: 0 — label assigned at detection timestamp. Uses Numba JIT. Output: label_value, cusum_value.
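The two-accumulator filter can be sketched in plain Python as below. The function names are hypothetical, and a few choices are assumptions: drift is subtracted from both accumulators, the first bar gets a return of 0.0 where a diff would produce a null, and cusum_value records whichever accumulator has the larger magnitude before any reset.

```python
import math


def log_returns(close):
    """Per-bar log returns; the first bar has no predecessor, so it gets 0.0."""
    return [0.0] + [math.log(b / a) for a, b in zip(close, close[1:])]


def cusum_labels(returns, threshold, drift=0.0):
    """Backward-looking CUSUM filter: emits +1/-1 at detection bars, else 0."""
    s_pos = 0.0
    s_neg = 0.0
    labels = []
    cusum_value = []
    for r in returns:
        # Assumption: the same drift term is subtracted on both sides.
        s_pos = max(0.0, s_pos + r - drift)  # clamped at 0 from below
        s_neg = min(0.0, s_neg + r - drift)  # clamped at 0 from above
        cusum_value.append(s_pos if s_pos >= -s_neg else s_neg)
        if s_pos > threshold:
            labels.append(1)
            s_pos, s_neg = 0.0, 0.0  # reset both accumulators
        elif s_neg < -threshold:
            labels.append(-1)
            s_pos, s_neg = 0.0, 0.0
        else:
            labels.append(0)
    return labels, cusum_value


labels, values = cusum_labels([0.01, 0.02, -0.01, -0.03, 0.0], threshold=0.025)
print(labels)  # [0, 1, 0, -1, 0]
```

Each label depends only on returns up to and including its own bar, which is why the method carries no look-ahead.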


Quantile Label

Cross-sectional relative-strength ranking across assets at each point in time:

  1. Compute forward return over horizon periods per symbol
  2. At each timestamp, compute cross-sectional quantile thresholds (default upper=0.8, lower=0.2)
  3. Label: +1 if return ≥ upper quantile, -1 if ≤ lower quantile, 0 otherwise

A symbol is "good" only if it outperforms its peers, not just if its return is positive. Standard approach for long-short portfolio construction. Output: label_value, forward_return.
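For a single timestamp, the cross-sectional labeling step can be sketched like this. The helper names are hypothetical, and the quantile uses linear interpolation between order statistics, which is an assumption; the page does not say which interpolation rule the implementation uses.

```python
def quantile(sorted_vals, q):
    """Linearly interpolated quantile of an already sorted list (assumed rule)."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def quantile_labels(returns_by_symbol, upper=0.8, lower=0.2):
    """Label symbols at one timestamp relative to their peers."""
    vals = sorted(returns_by_symbol.values())
    upper_thr = quantile(vals, upper)
    lower_thr = quantile(vals, lower)
    labels = {}
    for symbol, r in returns_by_symbol.items():
        if r >= upper_thr:
            labels[symbol] = 1
        elif r <= lower_thr:
            labels[symbol] = -1
        else:
            labels[symbol] = 0
    return labels


# Forward returns of five symbols at the same timestamp (illustrative values).
snapshot = {"A": 0.05, "B": 0.01, "C": -0.02, "D": 0.00, "E": -0.04}
labels = quantile_labels(snapshot)
print(labels)  # {'A': 1, 'B': 0, 'C': 0, 'D': 0, 'E': -1}
```

Symbol B has a positive return but still receives 0, because it does not clear the upper quantile of its peer group.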


Time-Series Sign

Direction classifier with a noise band:

  1. Compute forward return over horizon periods
  2. Label: +1 if forward return > threshold, -1 if < -threshold, 0 otherwise

Parameter    Description
horizon      Number of bars forward
threshold    Minimum absolute return to count as up/down
return_type  simple or log

The threshold prevents labeling tiny noise-driven movements as directional signals. Output: label_value, forward_return.
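A single-symbol sketch of the noise band is shown below. The function name sign_labels is hypothetical, and emitting None for bars without a full forward window is an assumption about how the tail is handled.

```python
import math


def sign_labels(close, horizon, threshold, return_type="simple"):
    """+1 / -1 / 0 by forward return versus a symmetric noise band."""
    labels = []
    for i in range(len(close)):
        j = i + horizon
        if j >= len(close):
            labels.append(None)  # no forward window available
            continue
        if return_type == "log":
            r = math.log(close[j] / close[i])
        else:
            r = close[j] / close[i] - 1.0
        if r > threshold:
            labels.append(1)
        elif r < -threshold:
            labels.append(-1)
        else:
            labels.append(0)  # inside the band: treated as noise
    return labels


labels = sign_labels([100.0, 101.0, 100.5, 98.0], horizon=1, threshold=0.008)
print(labels)  # [1, 0, -1, None]
```

The second bar falls about 0.5%, which is inside the 0.8% band, so it is labeled 0 instead of -1.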


Meta Labeling

A secondary model that predicts whether the primary model's predictions are correct (AFML Ch. 3). It requires an existing primary model's predictions:

  1. A primary model produces trade predictions (e.g. buy/sell signals)
  2. The meta-label model predicts whether each primary prediction will be correct
  3. The meta-label filters primary predictions: only trades where the meta-model predicts success are executed

Parameter      Description
primary_label  Column name of the primary model's predictions
features       List of feature column names to use as predictors

The method compares the primary model's predicted direction against the actual forward return sign. Output: label_value (1 if primary prediction matched actual direction, 0 otherwise). Used to size bets or filter trades in a secondary ML layer.
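The comparison of predicted direction against realized direction can be sketched as follows. The function name meta_labels is hypothetical. Two behaviors are assumptions: a primary prediction of 0 (no trade) receives a meta-label of 0, and bars without a forward return receive None.

```python
def meta_labels(primary_label, forward_return):
    """1 where the primary model's side matched the realized sign, else 0."""
    out = []
    for side, ret in zip(primary_label, forward_return):
        if ret is None:
            out.append(None)  # no realized outcome to compare against
            continue
        actual = (ret > 0) - (ret < 0)  # sign of the forward return: -1, 0, or +1
        # Assumption: a flat primary prediction (0) never counts as correct.
        out.append(1 if side != 0 and side == actual else 0)
    return out


primary = [1, -1, 1, 0, -1, 1]
realized = [0.02, 0.01, -0.03, 0.01, -0.02, None]
labels = meta_labels(primary, realized)
print(labels)  # [1, 0, 0, 0, 1, None]
```

The resulting binary column is the training target for the secondary model, whose predicted probability can then be used to filter or size the primary model's trades.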