Common Workflows & Troubleshooting

Day-to-day tasks and what to do when things go wrong.


1. Generating ML Feature Sets

  1. Pick features from the built-in catalog or define custom ones.
  2. Add your chosen features to quantflow_project.yml under feature_engine.features.
  3. Configure labels — e.g., triple_barrier for TP/SL-based targets.
  4. Run: qf run --start-date 2024-01-01 --end-date 2025-12-31
  5. Query features and labels in DuckDB: features are in {project}_feature.features, labels in cdm.cdm_labels. Join on symbol and timestamp to build the ML training set.
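The triple_barrier label in step 3 follows the standard triple-barrier idea: a take-profit barrier above the entry price, a stop-loss barrier below it, and a time limit. The sketch below illustrates that technique only. It is not QuantFlow's implementation, and the function name and the tp_pct, sl_pct and horizon parameters are hypothetical; they may not match the label's actual config keys.

```python
# Illustrative triple-barrier labeling (hypothetical; not QuantFlow's code).
# +1 if the take-profit barrier is touched first, -1 if the stop-loss barrier
# is touched first, 0 if neither is touched within `horizon` bars.
from typing import Sequence


def triple_barrier_label(prices: Sequence[float], entry_idx: int,
                         tp_pct: float, sl_pct: float, horizon: int) -> int:
    entry = prices[entry_idx]
    upper = entry * (1 + tp_pct)  # take-profit barrier
    lower = entry * (1 - sl_pct)  # stop-loss barrier
    last = min(entry_idx + horizon, len(prices) - 1)  # time (vertical) barrier
    for price in prices[entry_idx + 1:last + 1]:
        if price >= upper:
            return 1
        if price <= lower:
            return -1
    return 0
```

Applied at every bar (for example `[triple_barrier_label(p, i, 0.02, 0.02, 10) for i in range(len(p))]`), this yields one target per row, which is the shape a label table joined to features on symbol and timestamp needs.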

2. Switching from Research to Production

  1. Verify batch mode works: qf run --mode research
  2. Ensure DolphinDB is running — check host/port in .local_config.yml.
  3. Configure streaming feed providers in .definitions/feed_providers/.
  4. Deploy: qf run --mode trade
  5. Monitor: qf pipeline status

Features defined once in YAML run across batch and streaming modes — no duplicate implementations. Only the execution backend differs.
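As a rough illustration of "one definition, two backends", the sketch below evaluates the same rolling-mean feature twice: once over a full batch, and once incrementally, one value at a time, as a streaming engine would. This is plain Python, not QuantFlow or DolphinDB code, and the WINDOW parameter is hypothetical. The point is that both paths must produce identical values for the same input.

```python
# Same feature definition, two execution styles (illustrative only).
from collections import deque
from typing import List, Optional

WINDOW = 3  # hypothetical parameter, as if read from the feature's YAML


def rolling_mean_batch(values: List[float], window: int) -> List[Optional[float]]:
    """Batch path: compute every output from the full history at once."""
    out: List[Optional[float]] = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


class RollingMeanStream:
    """Streaming path: keep a running total and update it per event."""

    def __init__(self, window: int):
        self.window = window
        self.buf: deque = deque(maxlen=window)
        self.total = 0.0

    def update(self, value: float) -> Optional[float]:
        if len(self.buf) == self.window:
            self.total -= self.buf[0]  # this value is evicted by the append below
        self.buf.append(value)
        self.total += value
        if len(self.buf) < self.window:
            return None
        return self.total / self.window
```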


3. Changing Data Providers

  1. Find or create a feed provider YAML in .definitions/feed_providers/.
  2. Set the provider name in quantflow_project.yml under sources[].historical_feed_provider or sources[].streaming_feed_provider.
  3. Add credentials to .local_config.yml under feed_provider_credentials.
  4. Run qf validate to check the configuration.
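The sketch below approximates the kind of cross-check these steps set up: every provider named in sources[] should have a definition file and a credentials entry. It is a hypothetical helper, not what qf validate actually runs. It assumes that the provider name equals the YAML file stem and that credentials are keyed by provider name, and it takes already-parsed config dicts so it needs no YAML library.

```python
# Hypothetical provider consistency check (not the qf validate implementation).
from pathlib import Path
from typing import Dict, List

PROVIDER_KEYS = ("historical_feed_provider", "streaming_feed_provider")


def check_sources(project: Dict, providers_dir: Path,
                  credentials: Dict[str, Dict]) -> List[str]:
    # Assumption: provider name == file stem of its YAML definition.
    defined = {p.stem for p in providers_dir.glob("*.yml")}
    defined |= {p.stem for p in providers_dir.glob("*.yaml")}
    errors: List[str] = []
    for i, source in enumerate(project.get("sources", [])):
        for key in PROVIDER_KEYS:
            name = source.get(key)
            if name is None:
                continue  # this source does not use that kind of feed
            if name not in defined:
                errors.append(f"sources[{i}].{key}: no definition for '{name}'")
            elif name not in credentials:
                # Assumption: every configured provider needs credentials.
                errors.append(f"sources[{i}].{key}: no credentials for '{name}'")
    return errors
```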

4. Adding Features

The crypto template includes 133 built-in FeatureTypes. To configure features:

  1. In quantflow_project.yml, add entries under feature_engine.features:
     feature_engine:
       features:
         - name: my_ofi
           type: ofi
           parameters:
             bar: imbalance_k_10
  2. Override any feature parameters as needed (horizon, bar type, etc.).
  3. Run qf validate and then qf run.
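Step 2 implies merging the feature type's defaults with the overrides from quantflow_project.yml. Here is a minimal sketch of that merge. It assumes a hypothetical type-definition shape in which each parameter may carry a default or required: true (the flag named in the "Missing required parameter" entry under Troubleshooting). The real definition schema may differ.

```python
# Hypothetical parameter resolution: type defaults, then project overrides.
from typing import Any, Dict


def resolve_parameters(type_def: Dict[str, Any],
                       overrides: Dict[str, Any]) -> Dict[str, Any]:
    specs = type_def.get("parameters", {})
    # Start from the defaults declared by the feature type.
    params = {name: spec["default"] for name, spec in specs.items() if "default" in spec}
    # Values from quantflow_project.yml win over defaults.
    params.update(overrides)
    missing = [name for name, spec in specs.items()
               if spec.get("required", False) and name not in params]
    if missing:
        raise ValueError(f"Missing required parameter(s): {', '.join(missing)}")
    return params
```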

5. Reprocessing After Config Changes

After changing state engine thresholds, label definitions, or feature parameters:

# Re-run state engine for the affected date range
qf run --engine state --start-date 2026-01-01 --end-date 2026-01-31

# Re-compute features and labels
qf run --engine feature --start-date 2026-01-01 --end-date 2026-01-31

State engine results for the re-run date range replace the existing ones. Feature and label results are instead appended under a new run_id, so earlier runs are kept.
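The two write behaviors can be modeled as follows. This is a hypothetical in-memory model, not QuantFlow's storage layer. In particular, run_id is assumed here to be an increasing integer so that "latest run" is simply the maximum.

```python
# Hypothetical model of the reprocessing semantics.
from datetime import date
from typing import Dict, List


def write_state(store: Dict[date, dict], start: date, end: date,
                rows: Dict[date, dict]) -> None:
    """State engine: results for [start, end] replace whatever was there."""
    for day in [d for d in store if start <= d <= end]:
        del store[day]
    store.update(rows)


def append_features(table: List[dict], rows: List[dict], run_id: int) -> None:
    """Features/labels: append-only, each row tagged with its producing run."""
    table.extend({**row, "run_id": run_id} for row in rows)


def latest_run(table: List[dict]) -> List[dict]:
    """Rows from the most recent run (assumes run_id increases over time)."""
    newest = max(row["run_id"] for row in table)
    return [row for row in table if row["run_id"] == newest]
```

Under this model, a training query that does not filter on run_id would see rows from every run, so selecting one run explicitly is the safer default.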


6. Running Batch Pipelines via Dagster

For research workflows that benefit from asset lineage and observability:

# Start Dagster UI
dagster dev -w dagster_workspace.yaml

# Trigger full pipeline
python -c "
from quantflow.pipeline import create_runner
from quantflow.metadata import load_metadata

meta = load_metadata(project_dir='.')
runner = create_runner(meta)
runner.run(stage='all')
"

# Or run individual stages from a Python session, reusing a runner created as above
runner.run(stage='ingest')
runner.run(stage='dbt')
runner.run(stage='state_engine')
runner.run(stage='feature_engine')

Dagster tracks run history, asset materializations, and per-stage failures — useful for debugging and reproducibility. Only the batch path goes through Dagster; streaming runs independently in DolphinDB.
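One way to picture stage='all' is as the four batch stages run in the order listed above, stopping at the first failure because each stage consumes the previous one's output. The sketch below models that with a hypothetical run_stages function. It is not the quantflow.pipeline runner or Dagster's API; it only illustrates per-stage status tracking.

```python
# Hypothetical model of ordered batch stages with per-stage status.
from typing import Callable, Dict

STAGES = ["ingest", "dbt", "state_engine", "feature_engine"]  # order from the docs


def run_stages(stage: str, handlers: Dict[str, Callable[[], None]]) -> Dict[str, str]:
    if stage != "all" and stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    selected = STAGES if stage == "all" else [stage]
    status: Dict[str, str] = {}
    for name in selected:
        try:
            handlers[name]()
            status[name] = "success"
        except Exception as exc:  # record the failure, then stop
            status[name] = f"failed: {exc}"
            break  # downstream stages would read incomplete inputs
    return status
```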


7. Troubleshooting

"Feature type not in registry"

The type in your feature definition doesn't match any name in .definitions/feature_types/. Check for typos, and verify the YAML file exists in that directory (subdirectories are searched recursively).
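To see which names the registry could contain, and to catch typos, a sketch like the following can help. It assumes, hypothetically, that a feature type's name is its YAML file's stem. If your definitions carry the name in a field instead, key the dict on that field. The "did you mean" hints come from difflib.get_close_matches in the Python standard library.

```python
# Hypothetical registry scan with typo suggestions.
import difflib
from pathlib import Path
from typing import Dict


def load_registry(root: Path) -> Dict[str, Path]:
    registry: Dict[str, Path] = {}
    # rglob walks subdirectories recursively, like the registry lookup.
    for path in sorted(root.rglob("*")):
        if path.suffix in (".yml", ".yaml"):
            registry[path.stem] = path  # assumption: name == file stem
    return registry


def lookup_feature_type(name: str, registry: Dict[str, Path]) -> Path:
    if name in registry:
        return registry[name]
    hints = difflib.get_close_matches(name, list(registry), n=3)
    hint = f" Did you mean: {', '.join(hints)}?" if hints else ""
    raise LookupError(f"Feature type '{name}' not in registry.{hint}")
```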

"Missing required parameter"

A feature type declares a parameter as required: true, but your configuration provides no value for it. Add it under the feature's parameters section in quantflow_project.yml, or in the feature YAML definition under .definitions/features/.

"Required input column not available"

A feature's required_inputs lists a column that doesn't exist in the source CDM tables. Verify the state engine ran successfully for the date range, and check that the column name matches what the state engine produces.
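A quick way to pinpoint the offending column is a set difference between the feature's required_inputs and the columns actually present in the source table. Here is a minimal sketch with hypothetical column names. In practice the available list would come from the source table's schema.

```python
# Which required inputs are absent from the source columns? (illustrative)
from typing import Iterable, List


def missing_inputs(required_inputs: Iterable[str],
                   available_columns: Iterable[str]) -> List[str]:
    available = set(available_columns)
    # Preserve the order of required_inputs so the report matches the YAML.
    return [col for col in required_inputs if col not in available]
```

For example, `missing_inputs(["bid_size", "ask_size"], ["bid_size", "ask_sz"])` reports `["ask_size"]`, which points at a naming mismatch between the feature definition and the state engine output.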

"No source columns available" (feature skipped)

The feature's first step references columns that don't exist in the merged source data. Check column name mappings and ensure the state engine completed before the feature engine runs.

DolphinDB connection refused

  • Verify host/port in .local_config.yml.
  • Default credentials: admin / 123456.
  • Ensure the DolphinDB server has enough memory for the deployed engines.
  • Check that the DolphinDB service is running.
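To tell "server down or wrong port" apart from other failures, a plain TCP connect to the configured host and port is often enough. This is a minimal sketch using Python's standard socket module. Pass in the host and port from .local_config.yml (8848 is DolphinDB's usual default port). A successful connect only proves that something is listening; it does not validate credentials.

```python
# Basic TCP reachability check for the configured DolphinDB endpoint.
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, DNS failure, ...
        return False
```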

Pipeline stage hangs or times out

  • For batch mode: try reducing micro_batch_size in state engine config.
  • For streaming mode: check qf pipeline status for queue depth — backed-up queues indicate a downstream bottleneck.
  • Verify data is actually available for the requested date range.
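A single queue-depth reading is hard to interpret. A depth that keeps rising across consecutive readings is the clearer sign that arrivals outpace downstream processing. The helper below is hypothetical. It works on depth samples you record yourself from repeated qf pipeline status checks, and it makes no assumption about that command's output format.

```python
# Flag a queue whose depth grew at each of the last few sampling intervals.
from typing import Sequence


def is_backing_up(depths: Sequence[int], min_samples: int = 3) -> bool:
    if len(depths) < min_samples + 1:
        return False  # not enough readings to judge a trend
    recent = depths[-(min_samples + 1):]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))
```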