Labeling Methods
Six labeling paradigms, each registered via @register() and dispatched by name from the project YAML. All methods share the same signature: (cdm_data: dict, params: dict, inputs: dict) → LabelResult.
Triple Barrier
The classic method from modern financial ML (AFML Ch. 3). For each bar at index i:
- Upper barrier: close[i] * (1 + upper_pct)
- Lower barrier: close[i] * (1 - lower_pct)
- Vertical barrier: i + max_horizon (time-based expiry)
The algorithm scans forward through subsequent bars:
- Label = +1 → upper barrier hit first (profit take)
- Label = -1 → lower barrier hit first (stop loss)
- Label = 0 → vertical barrier hit first (time expiration)
| Parameter | Description |
|---|---|
| horizon | Number of bars forward to scan |
| upper_barrier | Profit target (e.g. 0.02 = +2%) |
| lower_barrier | Stop loss (e.g. 0.02 = -2%) |
| vertical_barrier | Max holding period in bars |
Why triple barrier: Unlike fixed-horizon labels that only look at the endpoint, triple barrier labels incorporate the full price path — more robust for strategies with explicit stop-loss and take-profit rules.
Input columns: close (return calculation), high (upper barrier path), low (lower barrier path). The scan loop runs under Numba JIT on Polars columns converted to NumPy arrays. Output: label_value, barrier_hit, forward_return.
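The forward scan can be sketched in plain Python as below (the text says the real loop is Numba-compiled). Several details are assumptions made for illustration: the lower barrier wins when both barriers are touched in the same bar, forward_return is measured from close[i] to the close of the exit bar, bars near the end of the data scan only as far as data exists, and barrier_hit is encoded as a string.

```python
def triple_barrier(close, high, low, upper_pct, lower_pct, max_horizon):
    """Return (label_value, barrier_hit, forward_return) lists, one entry per bar."""
    n = len(close)
    labels, hits, fwd = [], [], []
    for i in range(n):
        upper = close[i] * (1 + upper_pct)
        lower = close[i] * (1 - lower_pct)
        end = min(i + max_horizon, n - 1)  # assumption: truncate at end of data
        label, hit, exit_idx = 0, "vertical", end
        for j in range(i + 1, end + 1):
            if low[j] <= lower:  # assumption: stop loss wins same-bar ties
                label, hit, exit_idx = -1, "lower", j
                break
            if high[j] >= upper:
                label, hit, exit_idx = 1, "upper", j
                break
        labels.append(label)
        hits.append(hit)
        # assumption: return measured at the close of the exit bar
        fwd.append(close[exit_idx] / close[i] - 1.0)
    return labels, hits, fwd


close = [100.0, 101.0, 103.0, 99.0, 97.0, 98.0]
high = [100.5, 101.5, 103.5, 100.0, 98.0, 98.5]
low = [99.5, 100.5, 101.5, 98.5, 96.5, 97.5]
labels, hits, fwd = triple_barrier(close, high, low, 0.02, 0.02, max_horizon=2)
# labels == [1, 1, -1, -1, 0, 0]
```

Checking high and low rather than close alone is what lets the label reflect intrabar touches of a barrier.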
Fixed Horizon Return
The simplest labeling method. Computes forward return over a fixed number of periods using Polars' shift(-horizon).over("symbol").
| Parameter | Description |
|---|---|
| horizon | Number of bars forward |
| return_type | simple or log |
| binning.method | quantile, std, fixed, or none (continuous) |
| binning.n_bins | Number of bins for quantile binning |
Binning Methods
- quantile: Uses Polars qcut to bin returns into equal-frequency buckets → Int8 categories.
- std: Standard deviation thresholds: +2, +1, 0, -1, -2 based on multiples of std.
- fixed: Explicit threshold boundaries for bin edges.
Without binning, outputs raw forward_return as a regression target.
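As a rough single-symbol illustration of the forward return and the fixed binning option, consider the sketch below. It is plain Python rather than the Polars expression the text describes, and the function names, the use of None for bars without a full horizon, and the bin numbering (0 for the lowest bucket) are assumptions.

```python
import bisect
import math


def forward_returns(close, horizon, return_type="simple"):
    """Forward return per bar; None where fewer than `horizon` bars remain."""
    out = []
    for i in range(len(close)):
        j = i + horizon
        if j >= len(close):
            out.append(None)
        elif return_type == "log":
            out.append(math.log(close[j] / close[i]))
        else:
            out.append(close[j] / close[i] - 1.0)
    return out


def bin_fixed(returns, edges):
    """Map each return to the index of its bucket given sorted bin edges."""
    return [None if r is None else bisect.bisect_right(edges, r) for r in returns]


close = [100.0, 102.0, 101.0, 105.0, 104.0]
fwd = forward_returns(close, horizon=2)
bins = bin_fixed(fwd, edges=[-0.02, 0.02])
# fwd is approximately [0.01, 0.0294, 0.0297, None, None]
# bins == [1, 2, 2, None, None]
```

With multiple symbols, the same computation would be applied within each symbol's series, which is the role of over("symbol") in the Polars version.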
Trend Scanning
A backward-looking CUSUM-based method. Does NOT look into the future — suitable for regime-detection without look-ahead bias.
Two accumulators track deviation from a drift term:
- s_pos: cumulative positive deviation, clamped at 0
- s_neg: cumulative negative deviation, clamped at 0

The label at each bar follows from the accumulators:
- s_pos > threshold → emit +1, reset both
- s_neg < -threshold → emit -1, reset both
- Otherwise → emit 0
| Parameter | Description |
|---|---|
| threshold | CUSUM sensitivity; lower values detect shorter/weaker trends |
| drift | Expected drift to subtract (typically 0) |
Input: log returns computed per symbol via log().diff().over("symbol"). Horizon: 0 — label assigned at detection timestamp. Uses Numba JIT. Output: label_value, cusum_value.
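The two-accumulator update can be sketched in plain Python as follows. This is an illustration under assumptions: drift is subtracted from each return before it feeds both accumulators, and cusum_value reports the triggering accumulator (or the larger-magnitude one when nothing triggers). The source does not spell out either detail.

```python
def cusum_labels(returns, threshold, drift=0.0):
    """Symmetric CUSUM over a return series; returns (label_value, cusum_value)."""
    s_pos, s_neg = 0.0, 0.0
    labels, cusum = [], []
    for r in returns:
        dev = r - drift  # assumption: same drift adjustment on both sides
        s_pos = max(0.0, s_pos + dev)  # never drops below 0
        s_neg = min(0.0, s_neg + dev)  # never rises above 0
        if s_pos > threshold:
            labels.append(1)
            cusum.append(s_pos)
            s_pos, s_neg = 0.0, 0.0
        elif s_neg < -threshold:
            labels.append(-1)
            cusum.append(s_neg)
            s_pos, s_neg = 0.0, 0.0
        else:
            labels.append(0)
            # assumption: report whichever accumulator is larger in magnitude
            cusum.append(s_pos if s_pos >= -s_neg else s_neg)
    return labels, cusum


returns = [0.01, 0.015, -0.005, -0.02, -0.01, 0.002]
labels, cusum = cusum_labels(returns, threshold=0.02)
# labels == [0, 1, 0, -1, 0, 0]
```

Each label depends only on returns up to and including the current bar, which is why the method carries no look-ahead.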
Quantile Label
Cross-sectional relative-strength ranking across assets at each point in time:
- Compute forward return over horizon periods per symbol
- At each timestamp, compute cross-sectional quantile thresholds (default upper=0.8, lower=0.2)
- Label: +1 if return ≥ upper quantile, -1 if ≤ lower quantile, 0 otherwise
A symbol is "good" only if it outperforms its peers, not just if its return is positive. Standard approach for long-short portfolio construction. Output: label_value, forward_return.
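For a single timestamp, the cross-sectional step can be sketched as below. The linear-interpolation quantile and the function names are assumptions for illustration; the quantile method used by the real implementation is not stated in the text.

```python
def quantile(values, q):
    """Quantile with linear interpolation between order statistics (assumed method)."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def quantile_labels(returns_by_symbol, upper=0.8, lower=0.2):
    """Label each symbol against its peers' forward returns at one timestamp."""
    vals = list(returns_by_symbol.values())
    up = quantile(vals, upper)
    low = quantile(vals, lower)
    return {
        sym: 1 if r >= up else (-1 if r <= low else 0)
        for sym, r in returns_by_symbol.items()
    }


# Forward returns of five symbols at one timestamp (made-up numbers).
snapshot = {"A": 0.05, "B": 0.01, "C": -0.02, "D": 0.03, "E": -0.04}
labels = quantile_labels(snapshot)
# labels == {"A": 1, "B": 0, "C": 0, "D": 0, "E": -1}
```

Note that D earns a positive return yet is labeled 0, because it does not clear the upper cross-sectional threshold.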
Time-Series Sign
Direction classifier with a noise band:
- Compute forward return over horizon periods
- Label: +1 if forward return > threshold, -1 if < -threshold, 0 otherwise
| Parameter | Description |
|---|---|
| horizon | Number of bars forward |
| threshold | Minimum absolute return to count as up/down |
| return_type | simple or log |
The threshold prevents labeling tiny noise-driven movements as directional signals. Output: label_value, forward_return.
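A minimal single-symbol sketch of the noise band, assuming simple returns and None for bars without a full horizon (the function name and the None convention are illustrative choices, not taken from the source):

```python
def sign_labels(close, horizon, threshold):
    """Label each bar +1/-1/0 from its forward simple return and a noise band."""
    labels = []
    for i in range(len(close)):
        j = i + horizon
        if j >= len(close):
            labels.append(None)  # assumption: no label without a full horizon
            continue
        r = close[j] / close[i] - 1.0
        if r > threshold:
            labels.append(1)
        elif r < -threshold:
            labels.append(-1)
        else:
            labels.append(0)
    return labels


close = [100.0, 100.2, 101.5, 100.9, 99.0]
labels = sign_labels(close, horizon=1, threshold=0.005)
# labels == [0, 1, -1, -1, None]
```

The first bar's +0.2% move falls inside the ±0.5% band and is labeled 0 rather than +1.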
Meta Labeling
A secondary model that predicts whether a primary model's predictions are correct (AFML Ch. 3). Requires an existing primary model's predictions:
- A primary model produces trade predictions (e.g. buy/sell signals)
- The meta-label model predicts whether each primary prediction will be correct
- The meta-label filters primary predictions — only trades where the meta-model predicts success are executed
| Parameter | Description |
|---|---|
| primary_label | Column name of the primary model's predictions |
| features | List of feature column names to use as predictors |
The method compares the primary model's predicted direction against the actual forward return sign. Output: label_value (1 if primary prediction matched actual direction, 0 otherwise). Used to size bets or filter trades in a secondary ML layer.
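The comparison between predicted direction and realized return sign can be sketched as follows. The +1/-1 encoding of the primary signal, and the choice to emit None when the primary model gives no signal (0) or the forward return is unavailable, are assumptions for illustration.

```python
def meta_labels(primary, forward_returns):
    """1 if the primary direction matched the realized return sign, else 0."""
    out = []
    for p, r in zip(primary, forward_returns):
        if p == 0 or r is None:
            out.append(None)  # assumption: nothing to judge without a trade
        else:
            out.append(1 if p * r > 0 else 0)
    return out


primary = [1, -1, 1, 0, -1]                # hypothetical primary signals
fwd = [0.02, 0.01, -0.03, 0.01, -0.02]     # realized forward returns
labels = meta_labels(primary, fwd)
# labels == [1, 0, 0, None, 1]
```

A secondary classifier trained on these 0/1 targets then estimates the probability that a given primary signal is worth acting on.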