Custom Feed Providers
Connect new data sources to ingest market data. Providers are defined declaratively in YAML, so no custom ingestion code is required.
1. Overview
Feed providers are defined as YAML files in .definitions/feed_providers/ and referenced from the project config by sources[].historical_feed_provider or sources[].streaming_feed_provider. Each provider specification is validated against the FeedProvider Pydantic model.
2. FeedProvider YAML
```yaml
name: cryptohftdata
description: Historical crypto trade and LOB data from CryptoHFTData
type: http
location: https://api.cryptohftdata.com/v1
auth: api_key
format: parquet
compression: zstd
update_frequency: 1d
data_types:
  trades:
    name: trades
    stream: trade
    unique_key: [symbol, trade_time, trade_id]
    enabled: true
    schema:
      symbol:
        dtype: string
        description: Trading pair symbol
        tests:
          - not_null
      price:
        dtype: decimal
        description: Trade price
        tests:
          - not_null
          - dbt_utils.accepted_range:
              min_value: 0
              inclusive: false
      size:
        dtype: decimal
        description: Trade quantity
        tests:
          - not_null
      trade_time:
        dtype: timestamp
        description: Trade timestamp
        tests:
          - not_null
    field_mappings:
      - target: trade_id
        source: t
        transformation: "cast_safe(t, bigint)"
      - target: price
        source: p
        transformation: "cast_safe(p, decimal)"
      - target: size
        source: q
        transformation: "cast_safe(q, decimal)"
      - target: event_time
        source: "E"
        transformation: "timestamp_ms(cast_safe(E, bigint))"
        is_time_filter_field: true
    tests:
      table_tests:
        - dbt_utils.unique_combination_of_columns:
            combination_of_columns: [symbol, trade_time, trade_id]
        - dbt_utils.recency:
            datepart: hour
            field: event_time
            interval: 24
  orderbook:
    name: orderbook
    stream: depth@100ms
    unique_key: [symbol, snapshot_time]
    enabled: true
    schema:
      symbol:
        dtype: string
        tests: [not_null]
      best_bid_price:
        dtype: decimal
        tests: [not_null]
      best_ask_price:
        dtype: decimal
        tests: [not_null]
      snapshot_time:
        dtype: timestamp
        tests: [not_null]
    field_mappings:
      - target: symbol
        source: s
      - target: best_bid_price
        source: b
        transformation: "cast_safe(b, decimal)"
      - target: best_ask_price
        source: a
        transformation: "cast_safe(a, decimal)"
      - target: event_time
        source: "E"
        transformation: "timestamp_ms(cast_safe(E, bigint))"
        is_time_filter_field: true
    tests:
      table_tests:
        - dbt_utils.unique_combination_of_columns:
            combination_of_columns: [symbol, snapshot_time]
```
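The dbt_utils.recency table test in the trades example passes only when the newest event_time is no older than 24 hours. The Python sketch below mirrors that check outside dbt so the window arithmetic is explicit. recency_passes and DATEPART_UNITS are hypothetical names, only three dateparts are handled, and treating an empty table as stale is an assumption of this sketch.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical subset of dateparts handled by this sketch.
DATEPART_UNITS = {
    "minute": timedelta(minutes=1),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
}


def recency_passes(event_times, now, datepart="hour", interval=24):
    """True if the newest event_time is no older than interval * datepart before now."""
    if not event_times:
        return False  # assumption: an empty table counts as stale
    threshold = now - DATEPART_UNITS[datepart] * interval
    return max(event_times) >= threshold


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
fresh = [
    datetime(2024, 1, 1, 18, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 2, 9, 30, tzinfo=timezone.utc),
]
stale = [datetime(2024, 1, 1, 11, 0, tzinfo=timezone.utc)]
```

With the settings from the example (datepart: hour, interval: 24), the threshold is 2024-01-01 12:00 UTC, so fresh passes and stale fails.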
Top-Level Fields
| Field | Required | Description |
|---|---|---|
| name | Yes | Unique provider identifier (inherited from NamedModel) |
| description | No | Human-readable description |
| type | Yes | SourceType: http, streaming, websocket, database, or file |
| location | Yes | Endpoint URL; must start with http://, https://, file://, gs://, s3://, wss://, or ws:// |
| auth | No | Authentication method (e.g. api_key) |
| format | No | Data format: parquet, csv, json, or protobuf |
| compression | No | Compression: none, gzip, snappy, or zstd |
| update_frequency | No | Refresh interval (e.g. 1d) |
| data_types | No | Mapping of data type name → DataTypeSchema |
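The constraints in the table can be approximated with a plain-Python stand-in for the FeedProvider Pydantic model. This is only a sketch: FeedProviderSketch and is_valid are hypothetical names, the allowed values are copied from the table, and the real model may enforce additional or different rules.

```python
from dataclasses import dataclass, field
from typing import Optional

# Allowed values copied from the Top-Level Fields table.
SOURCE_TYPES = {"http", "streaming", "websocket", "database", "file"}
LOCATION_PREFIXES = ("http://", "https://", "file://", "gs://", "s3://", "wss://", "ws://")
FORMATS = {"parquet", "csv", "json", "protobuf"}
COMPRESSIONS = {"none", "gzip", "snappy", "zstd"}


@dataclass
class FeedProviderSketch:
    """Hypothetical stdlib stand-in for the FeedProvider Pydantic model."""
    name: str
    type: str
    location: str
    description: Optional[str] = None
    auth: Optional[str] = None
    format: Optional[str] = None
    compression: Optional[str] = None
    update_frequency: Optional[str] = None
    data_types: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not self.name:
            raise ValueError("name must be non-empty")
        if self.type not in SOURCE_TYPES:
            raise ValueError(f"unknown type {self.type!r}")
        # str.startswith accepts a tuple of allowed prefixes.
        if not self.location.startswith(LOCATION_PREFIXES):
            raise ValueError(f"unsupported location scheme: {self.location!r}")
        if self.format is not None and self.format not in FORMATS:
            raise ValueError(f"unknown format {self.format!r}")
        if self.compression is not None and self.compression not in COMPRESSIONS:
            raise ValueError(f"unknown compression {self.compression!r}")


def is_valid(**kwargs) -> bool:
    """Return True if the given fields pass the sketch's checks."""
    try:
        FeedProviderSketch(**kwargs)
    except ValueError:
        return False
    return True


provider = FeedProviderSketch(
    name="cryptohftdata",
    type="http",
    location="https://api.cryptohftdata.com/v1",
    auth="api_key",
    format="parquet",
    compression="zstd",
    update_frequency="1d",
)
```

Running these checks when the object is constructed mirrors how a Pydantic model rejects an invalid specification at load time rather than at ingestion time.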
Streaming-Specific Fields
| Field | Description |
|---|---|
| protocol | Stream protocol: "combined_stream" or "subscription" |
| orderbook_mode | Order book mode: "snapshot" or "snapshot_delta" |
| reconstruction_interval | Interval between reconstructed snapshots, in milliseconds |
| max_depth_levels | Number of order book depth levels to emit |
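With orderbook_mode: snapshot_delta, a consumer generally loads a full snapshot, applies incremental deltas to a local book, and emits a truncated view on a fixed time grid. The sketch below illustrates that general idea using reconstruction_interval and max_depth_levels; it is not the platform's implementation. BookSketch, reconstruct, the (ts_ms, kind, bids, asks) event shape, and the rule that a zero quantity removes a price level (a common exchange convention) are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class BookSketch:
    """Hypothetical local order book: price -> quantity for each side."""
    max_depth_levels: int
    bids: dict = field(default_factory=dict)
    asks: dict = field(default_factory=dict)

    def load_snapshot(self, bids, asks):
        # A full snapshot replaces the whole book.
        self.bids = {p: q for p, q in bids if q > 0}
        self.asks = {p: q for p, q in asks if q > 0}

    def apply_delta(self, bids, asks):
        for side, updates in ((self.bids, bids), (self.asks, asks)):
            for price, qty in updates:
                if qty == 0:
                    side.pop(price, None)  # assumed convention: zero quantity removes the level
                else:
                    side[price] = qty

    def top(self):
        # Best bids are the highest prices; best asks are the lowest.
        best_bids = sorted(self.bids.items(), reverse=True)[: self.max_depth_levels]
        best_asks = sorted(self.asks.items())[: self.max_depth_levels]
        return best_bids, best_asks


def reconstruct(events, reconstruction_interval, max_depth_levels):
    """Replay (ts_ms, kind, bids, asks) events, emitting the book every reconstruction_interval ms."""
    book = BookSketch(max_depth_levels)
    out = []
    next_emit = None
    for ts_ms, kind, bids, asks in events:
        # Emit every grid point that falls strictly before this event.
        while next_emit is not None and next_emit < ts_ms:
            out.append((next_emit, *book.top()))
            next_emit += reconstruction_interval
        if kind == "snapshot":
            book.load_snapshot(bids, asks)
        else:
            book.apply_delta(bids, asks)
        if next_emit is None:
            next_emit = ts_ms  # align the grid to the first event
    if next_emit is not None:
        out.append((next_emit, *book.top()))  # state after the last event
    return out


events = [
    (0, "snapshot", [(100.0, 1.0), (99.5, 2.0)], [(100.5, 1.5), (101.0, 3.0)]),
    (150, "delta", [(100.0, 0.0)], [(100.5, 0.5)]),
]
snapshots = reconstruct(events, reconstruction_interval=100, max_depth_levels=1)
```

Floats keep the example short. A real book would use Decimal prices to match the decimal dtype in the schema.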
3. DataTypeSchema
Each entry under data_types maps a provider data type to CDM fields.
| Field | Required | Default | Description |
|---|---|---|---|
| name | Yes | (none) | Data type name (e.g. trades, orderbook) |
| stream | No | (none) | Exchange stream name used for URL substitution (e.g. trade, depth@100ms) |
| unique_key | No | [] | Columns that together uniquely identify a row |
| enabled | No | true | Whether this data type is active |
| schema | No | {} | Column name → SchemaAttribute (dtype, description, tests) |
| field_mappings | No | [] | Raw field → CDM column transformations |
| tests | No | TestSuite() | Test suite with table_tests and column_tests |
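The defaults above, along with the uniqueness property that unique_key describes, can be sketched in plain Python. DataTypeSchemaSketch and duplicate_keys are hypothetical names, and tests is a plain dict standing in for TestSuite(). duplicate_keys reports the same rows that a dbt_utils.unique_combination_of_columns test over the same columns would flag.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataTypeSchemaSketch:
    """Hypothetical stand-in mirroring the DataTypeSchema defaults."""
    name: str
    stream: Optional[str] = None
    unique_key: list = field(default_factory=list)
    enabled: bool = True
    schema: dict = field(default_factory=dict)
    field_mappings: list = field(default_factory=list)
    tests: dict = field(default_factory=dict)  # stand-in for TestSuite()

    def duplicate_keys(self, rows):
        """Return unique_key tuples that occur more than once in rows."""
        if not self.unique_key:
            return []
        counts = Counter(tuple(row[col] for col in self.unique_key) for row in rows)
        return [key for key, n in counts.items() if n > 1]


trades = DataTypeSchemaSketch(
    name="trades", stream="trade", unique_key=["symbol", "trade_time", "trade_id"]
)
rows = [
    {"symbol": "BTCUSDT", "trade_time": 1, "trade_id": 10},
    {"symbol": "BTCUSDT", "trade_time": 1, "trade_id": 11},
    {"symbol": "BTCUSDT", "trade_time": 1, "trade_id": 10},
]
```

Here trades.duplicate_keys(rows) flags ("BTCUSDT", 1, 10) because two rows share that key.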
SchemaAttribute
Per-column definition under the schema key:
| Field | Required | Description |
|---|---|---|
| dtype | Yes | Column data type (YAML key), e.g. string, decimal, timestamp |
| description | No | Column description |
| tests | No | Column-level test specs (e.g. not_null) |
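The sketch below shows what the not_null and dbt_utils.accepted_range specs on the price column assert, evaluated in Python instead of dbt. The function names and the COLUMN_TESTS registry are hypothetical. accepted_range skips None and leaves nulls to not_null, which matches SQL semantics: a comparison against NULL never selects the row.

```python
from decimal import Decimal


def not_null(values):
    """Return the values a not_null test would flag (every None)."""
    return [v for v in values if v is None]


def accepted_range(values, min_value=None, max_value=None, inclusive=True):
    """Return values outside the bounds; bounds are strict when inclusive is False.

    None values are skipped and left to not_null.
    """
    failures = []
    for v in values:
        if v is None:
            continue
        below = min_value is not None and (v < min_value if inclusive else v <= min_value)
        above = max_value is not None and (v > max_value if inclusive else v >= max_value)
        if below or above:
            failures.append(v)
    return failures


# Hypothetical registry from YAML test names to the functions above.
COLUMN_TESTS = {"not_null": not_null, "dbt_utils.accepted_range": accepted_range}


def run_column_tests(values, tests):
    """Run YAML-style specs (plain strings or one-key dicts) against one column."""
    results = {}
    for spec in tests:
        if isinstance(spec, str):
            name, kwargs = spec, {}
        else:
            ((name, kwargs),) = spec.items()
        results[name] = COLUMN_TESTS[name](values, **kwargs)
    return results


# The price column's tests from the trades example, as parsed YAML.
price_tests = ["not_null", {"dbt_utils.accepted_range": {"min_value": 0, "inclusive": False}}]
prices = [Decimal("101.5"), Decimal("0"), None, Decimal("-3")]
results = run_column_tests(prices, price_tests)
```

Because inclusive is false, a price of exactly 0 fails the range test along with the negative value.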
FieldMapping
| Field | Required | Description |
|---|---|---|
| target | Yes | CDM column name |
| source | No | Raw source column name; passed through unchanged when no transformation is given |
| transformation | No | QFSQL expression that computes the target value (e.g. cast_safe(p, decimal)) |
| is_time_filter_field | No | Marks this field as the incremental filter key |
| custom_cast | No | Engine-specific type cast override |
At least one of source or transformation must be specified.
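The sketch below applies the trades mappings to a single raw message to show how source, transformation, and is_time_filter_field fit together. QFSQL expressions are replaced by Python callables. cast_safe and timestamp_ms are hypothetical re-implementations whose behavior is inferred from their names (a cast that returns None on failure; epoch milliseconds to UTC), not taken from the QFSQL reference. Passing the whole raw record when only transformation is set is also an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal, InvalidOperation
from typing import Any, Callable, Optional


def cast_safe(value, target):
    """Hypothetical stand-in for QFSQL cast_safe: None instead of an error."""
    converters = {"bigint": int, "decimal": lambda v: Decimal(str(v))}
    try:
        return converters[target](value)
    except (ValueError, TypeError, InvalidOperation):
        return None


def timestamp_ms(ms):
    """Hypothetical stand-in for QFSQL timestamp_ms: epoch milliseconds to UTC."""
    return None if ms is None else datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


@dataclass
class FieldMappingSketch:
    target: str
    source: Optional[str] = None
    transformation: Optional[Callable[[Any], Any]] = None  # stands in for a QFSQL string
    is_time_filter_field: bool = False

    def __post_init__(self):
        # Mirrors the rule that at least one of source or transformation is set.
        if self.source is None and self.transformation is None:
            raise ValueError(f"{self.target}: set source, transformation, or both")

    def apply(self, raw):
        # Assumption: without a source, the transformation receives the whole raw record.
        value = raw.get(self.source) if self.source is not None else raw
        return self.transformation(value) if self.transformation is not None else value


TRADE_MAPPINGS = [
    FieldMappingSketch("trade_id", "t", lambda v: cast_safe(v, "bigint")),
    FieldMappingSketch("price", "p", lambda v: cast_safe(v, "decimal")),
    FieldMappingSketch("size", "q", lambda v: cast_safe(v, "decimal")),
    FieldMappingSketch("event_time", "E", lambda v: timestamp_ms(cast_safe(v, "bigint")),
                       is_time_filter_field=True),
]


def map_record(raw, mappings):
    """Produce one CDM row from one raw message."""
    return {m.target: m.apply(raw) for m in mappings}


def time_filter_field(mappings):
    """Return the target marked as the incremental filter key, if any."""
    return next((m.target for m in mappings if m.is_time_filter_field), None)


# A hypothetical raw trade message using the source keys from the example.
raw = {"t": "12345", "p": "42000.10", "q": "0.5", "E": 1700000000000}
row = map_record(raw, TRADE_MAPPINGS)
```

A mapping with only a source, such as symbol from s in the orderbook example, is a plain passthrough.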
4. Credentials
Add credentials to .local_config.yml under feed_provider_credentials:
```yaml
feed_provider_credentials:
  - provider: cryptohftdata
    key: "your-api-key"
  - provider: my_exchange
    key: "your-key"
    username: "your-username"
    password: "your-password"
```
The provider field must match the name in the feed provider YAML.
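The matching rule can be sketched as a lookup over the parsed .local_config.yml. credentials_for and ProviderCredentials are hypothetical names. To keep the example self-contained, the config is shown already loaded into Python dicts rather than read from disk.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ProviderCredentials:
    """One feed_provider_credentials entry; secrets are excluded from repr()."""
    provider: str
    key: Optional[str] = field(default=None, repr=False)
    username: Optional[str] = None
    password: Optional[str] = field(default=None, repr=False)


def credentials_for(provider_name, local_config):
    """Return the entry whose provider exactly matches FeedProvider.name."""
    for entry in local_config.get("feed_provider_credentials", []):
        if entry.get("provider") == provider_name:
            return ProviderCredentials(**entry)
    raise KeyError(f"no feed_provider_credentials entry for {provider_name!r}")


# The .local_config.yml example, already parsed into Python objects.
local_config = {
    "feed_provider_credentials": [
        {"provider": "cryptohftdata", "key": "your-api-key"},
        {"provider": "my_exchange", "key": "your-key",
         "username": "your-username", "password": "your-password"},
    ]
}
```

Marking key and password with repr=False keeps them out of log lines that print the credentials object.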
5. Activate in Project Config
```yaml
sources:
  - name: binance_spot
    exchange: binance_spot
    historical_feed_provider: cryptohftdata  # Matches FeedProvider.name
    streaming_feed_provider: binance_streaming  # Matches FeedProvider.name
```
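Each provider reference must match a FeedProvider.name, so a pre-flight check can list references that point to no definition. unresolved_providers is a hypothetical helper, and provider_names is assumed to be the set of name values collected from the YAML files in .definitions/feed_providers/.

```python
PROVIDER_FIELDS = ("historical_feed_provider", "streaming_feed_provider")


def unresolved_providers(sources, provider_names):
    """List (source name, field, reference) triples that name no defined FeedProvider."""
    missing = []
    for source in sources:
        for field_name in PROVIDER_FIELDS:
            ref = source.get(field_name)
            if ref is not None and ref not in provider_names:
                missing.append((source["name"], field_name, ref))
    return missing


# The project config example above, already parsed into Python objects.
sources = [{
    "name": "binance_spot",
    "exchange": "binance_spot",
    "historical_feed_provider": "cryptohftdata",
    "streaming_feed_provider": "binance_streaming",
}]
```

If only cryptohftdata is defined, the check reports the streaming reference to binance_streaming as unresolved.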
See also: QFSQL Reference (all transformation functions); Data Quality Tests (test catalog).