
Custom Feed Providers

Connect new data sources for market data ingestion. Providers are defined declaratively in YAML, so no custom ingestion code is needed.


1. Overview

Feed providers are defined as YAML files in .definitions/feed_providers/ and referenced from the project config through sources[].historical_feed_provider or sources[].streaming_feed_provider. Each provider specification is validated against the FeedProvider Pydantic model.


2. FeedProvider YAML

```yaml
name: cryptohftdata
description: Historical crypto trade and LOB data from CryptoHFTData
type: http
location: https://api.cryptohftdata.com/v1
auth: api_key
format: parquet
compression: zstd
update_frequency: 1d

data_types:
  trades:
    name: trades
    stream: trade
    unique_key: [symbol, trade_time, trade_id]
    enabled: true

    schema:
      symbol:
        dtype: string
        description: Trading pair symbol
        tests:
          - not_null
      price:
        dtype: decimal
        description: Trade price
        tests:
          - not_null
          - dbt_utils.accepted_range:
              min_value: 0
              inclusive: false
      size:
        dtype: decimal
        description: Trade quantity
        tests:
          - not_null
      trade_time:
        dtype: timestamp
        description: Trade timestamp
        tests:
          - not_null

    field_mappings:
      - target: trade_id
        source: t
        transformation: "cast_safe(t, bigint)"
      - target: price
        source: p
        transformation: "cast_safe(p, decimal)"
      - target: size
        source: q
        transformation: "cast_safe(q, decimal)"
      - target: event_time
        source: "E"
        transformation: "timestamp_ms(cast_safe(E, bigint))"
        is_time_filter_field: true

    tests:
      table_tests:
        - dbt_utils.unique_combination_of_columns:
            combination_of_columns: [symbol, trade_time, trade_id]
        - dbt_utils.recency:
            datepart: hour
            field: event_time
            interval: 24

  orderbook:
    name: orderbook
    stream: depth@100ms
    unique_key: [symbol, snapshot_time]
    enabled: true

    schema:
      symbol:
        dtype: string
        tests: [not_null]
      best_bid_price:
        dtype: decimal
        tests: [not_null]
      best_ask_price:
        dtype: decimal
        tests: [not_null]
      snapshot_time:
        dtype: timestamp
        tests: [not_null]

    field_mappings:
      - target: symbol
        source: s
      - target: best_bid_price
        source: b
        transformation: "cast_safe(b, decimal)"
      - target: best_ask_price
        source: a
        transformation: "cast_safe(a, decimal)"
      - target: event_time
        source: "E"
        transformation: "timestamp_ms(cast_safe(E, bigint))"
        is_time_filter_field: true

    tests:
      table_tests:
        - dbt_utils.unique_combination_of_columns:
            combination_of_columns: [symbol, snapshot_time]
```
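The table-level tests in the example come from the dbt_utils package: unique_combination_of_columns fails if any combination of the listed columns occurs more than once, and recency fails if the newest value of field is older than interval units of datepart. The in-memory version below is only a sketch of what those two tests assert over the trades data; the function signatures are hypothetical and only a few dateparts are handled.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def unique_combination_of_columns(rows, combination_of_columns):
    """Return the key tuples that appear more than once (empty list means the test passes)."""
    counts = Counter(tuple(row[col] for col in combination_of_columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Only a subset of dbt dateparts is supported in this sketch.
_DATEPARTS = {
    "minute": timedelta(minutes=1),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
}


def recency(rows, field, datepart, interval, now=None):
    """Pass (True) if the newest value of `field` is no older than interval * datepart."""
    now = now or datetime.now(timezone.utc)
    newest = max(row[field] for row in rows)
    return now - newest <= interval * _DATEPARTS[datepart]
```

With the trades settings, a call would look like recency(rows, field="event_time", datepart="hour", interval=24), where rows hold timezone-aware datetimes.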

Top-Level Fields

| Field | Required | Description |
|---|---|---|
| name | Yes | Unique provider identifier (inherited from NamedModel) |
| description | No | Human-readable description |
| type | Yes | SourceType: http, streaming, websocket, database, or file |
| location | Yes | Endpoint URL; must start with http://, https://, file://, gs://, s3://, wss://, or ws:// |
| auth | No | Authentication method (e.g. api_key) |
| format | No | Data format: parquet, csv, json, or protobuf |
| compression | No | Compression: none, gzip, snappy, or zstd |
| update_frequency | No | Refresh interval (e.g. 1d) |
| data_types | No | Mapping of data type name to DataTypeSchema |
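The constraints in the table can be expressed as a small validator. The sketch below uses a standard-library dataclass as a stand-in for the FeedProvider Pydantic model, whose source is not shown here; the class name FeedProviderSketch and its error messages are hypothetical, and only the required fields and allowed values come from the table.

```python
from dataclasses import dataclass, field
from typing import Optional

# Allowed values, copied from the Top-Level Fields table.
SOURCE_TYPES = {"http", "streaming", "websocket", "database", "file"}
LOCATION_PREFIXES = ("http://", "https://", "file://", "gs://", "s3://", "wss://", "ws://")
FORMATS = {"parquet", "csv", "json", "protobuf"}
COMPRESSIONS = {"none", "gzip", "snappy", "zstd"}


@dataclass
class FeedProviderSketch:
    """Hypothetical stand-in for the FeedProvider model's top-level fields."""

    # Required fields have no default, so omitting one raises TypeError.
    name: str
    type: str
    location: str
    description: Optional[str] = None
    auth: Optional[str] = None
    format: Optional[str] = None
    compression: Optional[str] = None
    update_frequency: Optional[str] = None
    data_types: dict = field(default_factory=dict)

    def __post_init__(self):
        # Reject values outside the documented sets.
        if self.type not in SOURCE_TYPES:
            raise ValueError(f"unknown type: {self.type!r}")
        if not self.location.startswith(LOCATION_PREFIXES):
            raise ValueError(f"unsupported location scheme: {self.location!r}")
        if self.format is not None and self.format not in FORMATS:
            raise ValueError(f"unknown format: {self.format!r}")
        if self.compression is not None and self.compression not in COMPRESSIONS:
            raise ValueError(f"unknown compression: {self.compression!r}")
```

A parsed provider file could then be checked with FeedProviderSketch(**provider_dict); the real model presumably also validates nested data_types entries, which this sketch leaves as a plain dict.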

Streaming-Specific Fields

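| Field | Description |
|---|---|
| protocol | "combined_stream" or "subscription" |
| orderbook_mode | "snapshot" or "snapshot_delta" |
| reconstruction_interval | Interval between reconstructed snapshots, in milliseconds |
| max_depth_levels | Number of order book depth levels to emit |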
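The streaming fields above can be checked in the same way. This sketch assumes every field is optional and that reconstruction_interval and max_depth_levels must be positive integers when set, which the table implies but does not state. top_levels shows one plausible reading of "depth levels to emit" (keep the best N price levels per side); the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

PROTOCOLS = {"combined_stream", "subscription"}
ORDERBOOK_MODES = {"snapshot", "snapshot_delta"}


@dataclass
class StreamingSettingsSketch:
    """Hypothetical container for the streaming-specific provider fields."""

    protocol: Optional[str] = None
    orderbook_mode: Optional[str] = None
    reconstruction_interval: Optional[int] = None  # milliseconds
    max_depth_levels: Optional[int] = None

    def __post_init__(self):
        if self.protocol is not None and self.protocol not in PROTOCOLS:
            raise ValueError(f"unknown protocol: {self.protocol!r}")
        if self.orderbook_mode is not None and self.orderbook_mode not in ORDERBOOK_MODES:
            raise ValueError(f"unknown orderbook_mode: {self.orderbook_mode!r}")
        # Assumption: numeric settings must be positive integers (bool is rejected).
        for name in ("reconstruction_interval", "max_depth_levels"):
            value = getattr(self, name)
            if value is not None and (isinstance(value, bool) or not isinstance(value, int) or value <= 0):
                raise ValueError(f"{name} must be a positive integer, got {value!r}")

    @property
    def reconstruction_interval_seconds(self) -> Optional[float]:
        """reconstruction_interval converted from milliseconds to seconds."""
        if self.reconstruction_interval is None:
            return None
        return self.reconstruction_interval / 1000


def top_levels(bids, asks, max_depth_levels):
    """Keep the best N (price, size) levels: bids highest-first, asks lowest-first."""
    best_bids = sorted(bids, key=lambda level: level[0], reverse=True)[:max_depth_levels]
    best_asks = sorted(asks, key=lambda level: level[0])[:max_depth_levels]
    return best_bids, best_asks
```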
max_depth_levelsDepth levels to emit

3. DataTypeSchema

Each entry under data_types maps a provider data type to CDM fields.

| Field | Required | Default | Description |
|---|---|---|---|
| name | Yes | | Data type name (e.g. trades, orderbook) |
| stream | No | | Exchange stream name used for URL substitution |
| unique_key | No | [] | Columns that together uniquely identify a row |
| enabled | No | true | Whether this data type is active |
| schema | No | {} | Column name → SchemaAttribute (dtype + tests) |
| field_mappings | No | [] | Raw field → CDM column transformations |
| tests | No | TestSuite() | Test suite for the data type (table_tests and column_tests) |
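Because every field except name has a default, a data type entry can be very small. The dataclass below mirrors the documented defaults as a sketch; it is not the actual DataTypeSchema class, and TestSuite() is modelled as a plain dict holding empty table_tests and column_tests lists (an assumption). The helper active_data_types, also hypothetical, keeps only entries whose enabled flag is true.

```python
from dataclasses import dataclass, field
from typing import Optional


def _empty_test_suite() -> dict:
    # Assumption: TestSuite() starts with no table-level or column-level tests.
    return {"table_tests": [], "column_tests": []}


@dataclass
class DataTypeSchemaSketch:
    """Hypothetical mirror of DataTypeSchema and its documented defaults."""

    name: str
    stream: Optional[str] = None
    unique_key: list = field(default_factory=list)
    enabled: bool = True
    schema: dict = field(default_factory=dict)
    field_mappings: list = field(default_factory=list)
    tests: dict = field(default_factory=_empty_test_suite)


def active_data_types(data_types: dict) -> list:
    """Parse a data_types mapping and return the names of enabled entries, in order."""
    parsed = [DataTypeSchemaSketch(**spec) for spec in data_types.values()]
    return [dt.name for dt in parsed if dt.enabled]
```

Using default_factory gives each instance its own empty list or dict, avoiding the shared-mutable-default pitfall.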

SchemaAttribute

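Per-column definition under schema:

| Field | Required | Description |
|---|---|---|
| dtype | Yes | Column data type, e.g. string, decimal, timestamp (dtype is the YAML key) |
| description | No | Column description |
| tests | No | Column-level test specs, given as bare names (not_null) or as mappings with arguments |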
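Column-level tests are written either as a bare name (not_null) or as a one-key mapping with arguments (dbt_utils.accepted_range). The sketch below evaluates those two tests in memory for the price column from the example; it follows dbt_utils semantics for accepted_range (inclusive defaults to true and nulls are ignored), but the runner itself is hypothetical and supports only these two tests.

```python
from decimal import Decimal


def _in_range(value, min_value=None, max_value=None, inclusive=True):
    """dbt_utils.accepted_range check for a single non-null value."""
    if min_value is not None and (value < min_value if inclusive else value <= min_value):
        return False
    if max_value is not None and (value > max_value if inclusive else value >= max_value):
        return False
    return True


def run_column_tests(values, tests):
    """Return the names of the tests that fail for one column's values."""
    failures = []
    for spec in tests:
        # A test spec is either "name" or {"name": {argument: value, ...}}.
        if isinstance(spec, str):
            name, args = spec, {}
        else:
            name, args = next(iter(spec.items()))
        if name == "not_null":
            ok = all(v is not None for v in values)
        elif name == "dbt_utils.accepted_range":
            ok = all(v is None or _in_range(v, **args) for v in values)
        else:
            raise NotImplementedError(f"test not covered by this sketch: {name}")
        if not ok:
            failures.append(name)
    return failures


# Tests for the price column in the example YAML: a zero price violates inclusive: false.
PRICE_TESTS = ["not_null", {"dbt_utils.accepted_range": {"min_value": 0, "inclusive": False}}]
print(run_column_tests([Decimal("101.5"), Decimal("0")], PRICE_TESTS))  # prints ['dbt_utils.accepted_range']
```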

FieldMapping

| Field | Required | Description |
|---|---|---|
| target | Yes | CDM column name |
| source | No | Raw source column name; copied through unchanged when no transformation is given |
| transformation | No | QFSQL expression applied to source |
| is_time_filter_field | No | Marks this field as the incremental filter key |
| custom_cast | No | Engine-specific type cast override |

At least one of source or transformation must be specified.
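The mapping rules can be sketched as follows: a mapping with only source copies the raw value, a mapping with a transformation computes the target, and the field flagged is_time_filter_field drives incremental loading. QFSQL is not reproduced here. The example's expressions are hand-translated into Python callables under stated assumptions: cast_safe returns None instead of raising, timestamp_ms reads epoch milliseconds as UTC, and incremental loading keeps rows newer than a watermark. The symbol passthrough is borrowed from the orderbook example, and all names here are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from decimal import Decimal, InvalidOperation
from typing import Any, Callable, Optional


def cast_safe_bigint(value):
    # Assumed behaviour of cast_safe(x, bigint): None instead of an error.
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def cast_safe_decimal(value):
    # Assumed behaviour of cast_safe(x, decimal): None instead of an error.
    try:
        return Decimal(str(value))
    except (InvalidOperation, TypeError, ValueError):
        return None


def timestamp_ms(ms):
    # Assumed: epoch milliseconds -> timezone-aware UTC datetime (exact, no float rounding).
    if ms is None:
        return None
    return datetime.fromtimestamp(ms // 1000, tz=timezone.utc) + timedelta(milliseconds=ms % 1000)


@dataclass
class FieldMappingSketch:
    target: str
    source: Optional[str] = None
    transformation: Optional[Callable[[dict], Any]] = None  # stands in for a QFSQL expression
    is_time_filter_field: bool = False

    def __post_init__(self):
        # Documented rule: at least one of source or transformation is required.
        if self.source is None and self.transformation is None:
            raise ValueError(f"{self.target}: specify source, transformation, or both")

    def apply(self, raw: dict):
        if self.transformation is not None:
            return self.transformation(raw)
        return raw.get(self.source)  # passthrough


def map_record(raw: dict, mappings: list) -> dict:
    return {m.target: m.apply(raw) for m in mappings}


def new_rows(rows: list, mappings: list, watermark: datetime) -> list:
    """Keep mapped rows whose time-filter field is later than the last watermark."""
    time_field = next(m.target for m in mappings if m.is_time_filter_field)
    return [r for r in rows if r[time_field] is not None and r[time_field] > watermark]


TRADE_MAPPINGS = [
    FieldMappingSketch("symbol", "s"),
    FieldMappingSketch("trade_id", "t", lambda r: cast_safe_bigint(r.get("t"))),
    FieldMappingSketch("price", "p", lambda r: cast_safe_decimal(r.get("p"))),
    FieldMappingSketch("size", "q", lambda r: cast_safe_decimal(r.get("q"))),
    FieldMappingSketch("event_time", "E", lambda r: timestamp_ms(cast_safe_bigint(r.get("E"))),
                       is_time_filter_field=True),
]
```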


4. Credentials

Add credentials to .local_config.yml under feed_provider_credentials:

```yaml
feed_provider_credentials:
  - provider: cryptohftdata
    key: "your-api-key"
  - provider: my_exchange
    key: "your-key"
    username: "your-username"
    password: "your-password"
```

The provider field must match the name in the feed provider YAML.
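Since credentials are matched to providers by name, the lookup amounts to a scan of the feed_provider_credentials list. The helper below is a sketch that assumes .local_config.yml has already been parsed into Python objects; its name, and the choice to raise KeyError for a missing entry and ValueError for duplicates, are illustrative rather than the platform's documented behaviour.

```python
def credentials_for(local_config: dict, provider_name: str) -> dict:
    """Return the credential fields for the entry whose provider equals provider_name."""
    entries = local_config.get("feed_provider_credentials", [])
    matches = [entry for entry in entries if entry.get("provider") == provider_name]
    if not matches:
        raise KeyError(f"no credentials configured for provider {provider_name!r}")
    if len(matches) > 1:
        # Assumption: ambiguous configuration is treated as an error.
        raise ValueError(f"duplicate credentials for provider {provider_name!r}")
    # Drop the lookup key; the rest (key, username, password, ...) is the secret material.
    return {k: v for k, v in matches[0].items() if k != "provider"}
```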


5. Activate in Project Config

```yaml
sources:
  - name: binance_spot
    exchange: binance_spot
    historical_feed_provider: cryptohftdata  # Matches FeedProvider.name
    streaming_feed_provider: binance_streaming  # Matches FeedProvider.name
```
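Each *_feed_provider value has to name a defined FeedProvider. The check below is a sketch that assumes sources and provider names are already loaded, and that a source may omit either key (the page does not say whether both are required); the function name is hypothetical.

```python
PROVIDER_KEYS = ("historical_feed_provider", "streaming_feed_provider")


def unresolved_providers(sources: list, provider_names: set) -> list:
    """List (source name, key, value) triples whose provider has no definition."""
    problems = []
    for source in sources:
        for key in PROVIDER_KEYS:
            value = source.get(key)
            if value is not None and value not in provider_names:
                problems.append((source["name"], key, value))
    return problems
```

Running this before ingestion turns a misspelled provider name into a clear configuration error instead of a failure at fetch time.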

See also: QFSQL Reference (all transformation functions) and Data Quality Tests (test catalog).