Architecture

High-level design of Matyan: storage, ingestion, and serving.

Components

Service	Role
matyan-backend	FastAPI REST API. Reads from FoundationDB; handles control operations (delete run, rename experiment, etc.) with synchronous FDB writes and Kafka events for async side effects.
matyan-frontier	Ingestion gateway. WebSocket for metrics/params; presigned S3 URLs for large blobs. Publishes to Kafka only.
ingestion-workers	Consume `data-ingestion` topic; write runs, sequences, and metadata to FoundationDB.
control-workers	Consume `control-events` topic; perform S3 cleanup and other side effects.
matyan-client	Python SDK. Sends tracking data to frontier; uses backend for metadata and queries.
matyan-ui	React UI; talks to matyan-backend.

Data flow

Reads: UI or client → backend (REST) → FoundationDB.
Writes (training): Client → frontier (WebSocket or presigned S3) → Kafka → ingestion workers → FoundationDB (and S3 for blobs).
Control: UI → backend (REST) → FDB + Kafka control-events → control workers (e.g. S3 cleanup).

Storage (FoundationDB)

Key space: Runs, sequences, and indexes live under FDB directories (e.g. data/runs, data/indexes, system). Keys use fdb.tuple encoding.
Indexes: Tier 1 (archived, experiment, tag, created_at, etc.) and Tier 2 (scalar hyperparameters) are maintained on write and used by the query planner for MatyanQL.

The following section documents the concrete key layout for reference and debugging.

FoundationDB key structure

The backend uses three top-level directories (FDB directory layer), each exposing a subspace for key-value storage. All keys are tuple-encoded with fdb.tuple; values are msgpack-serialized (see storage/encoding.py). Nested structures (e.g. run metadata, attributes) are stored as trees: each leaf is a key ending with a sentinel (e.g. __leaf__) so that subspace.range(path) returns all keys under that path.

Top-level directories

Directory path	Subspace usage
`("data", "runs")`	Run data: metadata, attributes, traces, contexts, run–tag links, and all time-series sequences.
`("data", "indexes")`	Secondary indexes (Tier 1–3) and reverse index for deindexing; deletion tombstones.
`("system",)`	Entities (experiments, tags, dashboards, reports, notes), run–experiment and run–tag mappings, project settings, ping key.

Data directory: `data/runs`

All keys are under a single runs subspace. run_hash is the run’s unique id (e.g. UUID or hash).

Key tuple pattern	Description
`(run_hash, "meta", <field>, "__leaf__")`	Run metadata: `name`, `description`, `created_at`, `updated_at`, `finalized_at`, `is_archived`, `active`, `experiment_id`, `client_start_ts`, `duration`, `pending_deletion`. Stored as a flat key per field (tree with leaf sentinel).
`(run_hash, "attrs", <path...>, "__leaf__")` or list/dict sentinels	Run attributes (e.g. hyperparameters under `hparams`). Nested dicts/lists flattened; scalars use `__leaf__`. Special key `attrs.__blobs__` holds blob references.
`(run_hash, "traces", ctx_id, name, "dtype")`	Trace metadata: `dtype`, optional `last`, `last_step` per (context_id, metric_name).
`(run_hash, "contexts", ctx_id)`	Context dict for the given context id (deterministic id from context hash).
`(run_hash, "tags", tag_uuid)`	Run–tag association (value is truthy; used for “this run has this tag”).
`(run_hash, "seqs", ctx_id, name, col, step)`	Time-series columns: `col` is one of `"val"`, `"step"`, `"epoch"`, `"time"`. `step` is the step index. One key per (context, sequence name, column, step).

So for a given run you have: meta (run-level fields), attrs (tree of attributes), traces (per-context per-metric metadata), contexts (context id → dict), tags (set of tag UUIDs), and seqs (time-series values and optional step/epoch/time columns).

Data directory: `data/indexes`

Index entries live in the indexes subspace. Values are empty bytes; the payload is the run hash (and optionally other fields) in the key. Range scans on a prefix return matching run hashes. A reverse index under _rev allows O(1) removal of all index entries for a run when the run is deleted or updated.

Tier 1 (structured fields):

Key tuple	Purpose
`("archived", <bool>, run_hash)`	Filter by archived flag.
`("active", <bool>, run_hash)`	Filter by active (e.g. live runs).
`("experiment", <exp_name>, run_hash)`	Filter by experiment name.
`("created_at", <timestamp>, run_hash)`	Range scan by creation time.
`("tag", <tag_name>, run_hash)`	Filter by tag name.

Tier 2 (hyperparameters):

Key tuple	Purpose
`("hparam", <param_name>, <value>, run_hash)`	Equality/range on top-level scalar hparams.

Tier 3 (metric trace names):

Key tuple	Purpose
`("trace", <metric_name>, run_hash)`	Lookup runs that have a given metric.

Maintenance:

Key tuple	Purpose
`("_rev", run_hash, <forward_key_elements>...)`	Reverse index: same elements as the forward key with `run_hash` prepended; used to delete all index entries for a run.
`("_deleted", run_hash)`	Tombstone: run was deleted; ingestion workers skip re-creating it. See Understanding — Tombstones.

System directory: `system`

Entities (experiments, tags, dashboards, dashboard apps, reports, notes) are stored as one key per field per entity; a “by name” index gives UUID from name where applicable.

Key tuple pattern	Description
`("experiments", uuid, field)`	Experiment record fields (e.g. name, description).
`("experiments_by_name", name)`	Name → experiment UUID.
`("experiment_runs", exp_uuid, run_hash)`	Experiment ↔ run association.
`("run_experiment", run_hash)`	Run → experiment UUID (reverse lookup).
`("tags", uuid, field)`	Tag record fields.
`("tags_by_name", name)`	Name → tag UUID.
`("run_tags", run_hash, tag_uuid)`	Run–tag link (also stored in runs_dir for co-location).
`("tag_runs", tag_uuid, run_hash)`	Tag → runs.
`("dashboards", uuid, field)`	Dashboard entity fields.
`("dashboard_apps", uuid, field)`	Dashboard app fields.
`("reports", uuid, field)`	Report fields.
`("notes", uuid, field)`	Note fields.

Project and UI state:

Key tuple pattern	Description
`("project", key, "__leaf__")`	Project-level settings (e.g. name, description).
`("pinned_sequences", "__leaf__")`	Pinned sequence list for the UI.
`("__ping__",)`	Used by the backend for a minimal read to verify FDB connectivity.

Encoding and tree convention

Values: msgpack via storage/encoding.py (with datetime extension).
Tree storage: Nested dicts/lists are flattened into key paths. Scalars are stored with a final __leaf__ component so that subspace.range((run_hash, "meta")) returns all meta keys. Empty dict/list use sentinels __empty_dict__ / __empty_list__.