Architecture
High-level design of Matyan: storage, ingestion, and serving.
Components
| Service | Role |
|---|---|
| matyan-backend | FastAPI REST API. Reads from FoundationDB; handles control operations (delete run, rename experiment, etc.) with synchronous FDB writes and Kafka events for async side effects. |
| matyan-frontier | Ingestion gateway. WebSocket for metrics/params; presigned S3 URLs for large blobs. Publishes to Kafka only. |
| ingestion-workers | Consume data-ingestion topic; write runs, sequences, and metadata to FoundationDB. |
| control-workers | Consume control-events topic; perform S3 cleanup and other side effects. |
| matyan-client | Python SDK. Sends tracking data to frontier; uses backend for metadata and queries. |
| matyan-ui | React UI; talks to matyan-backend. |
Data flow
- Reads: UI or client → backend (REST) → FoundationDB.
- Writes (training): Client → frontier (WebSocket or presigned S3) → Kafka → ingestion workers → FoundationDB (and S3 for blobs).
- Control: UI → backend (REST) → FDB + Kafka control-events → control workers (e.g. S3 cleanup).
Storage (FoundationDB)
- Key space: Runs, sequences, and indexes live under FDB directories (e.g.
data/runs,data/indexes,system). Keys usefdb.tupleencoding. - Indexes: Tier 1 (archived, experiment, tag, created_at, etc.) and Tier 2 (scalar hyperparameters) are maintained on write and used by the query planner for MatyanQL.
The following section documents the concrete key layout for reference and debugging.
FoundationDB key structure
The backend uses three top-level directories (FDB directory layer), each exposing a subspace for key-value storage. All keys are tuple-encoded with fdb.tuple; values are msgpack-serialized (see storage/encoding.py). Nested structures (e.g. run metadata, attributes) are stored as trees: each leaf is a key ending with a sentinel (e.g. __leaf__) so that subspace.range(path) returns all keys under that path.
Top-level directories
| Directory path | Subspace usage |
|---|---|
("data", "runs") |
Run data: metadata, attributes, traces, contexts, run–tag links, and all time-series sequences. |
("data", "indexes") |
Secondary indexes (Tier 1–3) and reverse index for deindexing; deletion tombstones. |
("system",) |
Entities (experiments, tags, dashboards, reports, notes), run–experiment and run–tag mappings, project settings, ping key. |
Data directory: data/runs
All keys are under a single runs subspace. run_hash is the run’s unique id (e.g. UUID or hash).
| Key tuple pattern | Description |
|---|---|
(run_hash, "meta", <field>, "__leaf__") |
Run metadata: name, description, created_at, updated_at, finalized_at, is_archived, active, experiment_id, client_start_ts, duration, pending_deletion. Stored as a flat key per field (tree with leaf sentinel). |
(run_hash, "attrs", <path...>, "__leaf__") or list/dict sentinels |
Run attributes (e.g. hyperparameters under hparams). Nested dicts/lists flattened; scalars use __leaf__. Special key attrs.__blobs__ holds blob references. |
(run_hash, "traces", ctx_id, name, "dtype") |
Trace metadata: dtype, optional last, last_step per (context_id, metric_name). |
(run_hash, "contexts", ctx_id) |
Context dict for the given context id (deterministic id from context hash). |
(run_hash, "tags", tag_uuid) |
Run–tag association (value is truthy; used for “this run has this tag”). |
(run_hash, "seqs", ctx_id, name, col, step) |
Time-series columns: col is one of "val", "step", "epoch", "time". step is the step index. One key per (context, sequence name, column, step). |
So for a given run you have: meta (run-level fields), attrs (tree of attributes), traces (per-context per-metric metadata), contexts (context id → dict), tags (set of tag UUIDs), and seqs (time-series values and optional step/epoch/time columns).
Data directory: data/indexes
Index entries live in the indexes subspace. Values are empty bytes; the payload is the run hash (and optionally other fields) in the key. Range scans on a prefix return matching run hashes. A reverse index under _rev allows O(1) removal of all index entries for a run when the run is deleted or updated.
Tier 1 (structured fields):
| Key tuple | Purpose |
|---|---|
("archived", <bool>, run_hash) |
Filter by archived flag. |
("active", <bool>, run_hash) |
Filter by active (e.g. live runs). |
("experiment", <exp_name>, run_hash) |
Filter by experiment name. |
("created_at", <timestamp>, run_hash) |
Range scan by creation time. |
("tag", <tag_name>, run_hash) |
Filter by tag name. |
Tier 2 (hyperparameters):
| Key tuple | Purpose |
|---|---|
("hparam", <param_name>, <value>, run_hash) |
Equality/range on top-level scalar hparams. |
Tier 3 (metric trace names):
| Key tuple | Purpose |
|---|---|
("trace", <metric_name>, run_hash) |
Lookup runs that have a given metric. |
Maintenance:
| Key tuple | Purpose |
|---|---|
("_rev", run_hash, <forward_key_elements>...) |
Reverse index: same elements as the forward key with run_hash prepended; used to delete all index entries for a run. |
("_deleted", run_hash) |
Tombstone: run was deleted; ingestion workers skip re-creating it. See Understanding — Tombstones. |
System directory: system
Entities (experiments, tags, dashboards, dashboard apps, reports, notes) are stored as one key per field per entity; a “by name” index gives UUID from name where applicable.
| Key tuple pattern | Description |
|---|---|
("experiments", uuid, field) |
Experiment record fields (e.g. name, description). |
("experiments_by_name", name) |
Name → experiment UUID. |
("experiment_runs", exp_uuid, run_hash) |
Experiment ↔ run association. |
("run_experiment", run_hash) |
Run → experiment UUID (reverse lookup). |
("tags", uuid, field) |
Tag record fields. |
("tags_by_name", name) |
Name → tag UUID. |
("run_tags", run_hash, tag_uuid) |
Run–tag link (also stored in runs_dir for co-location). |
("tag_runs", tag_uuid, run_hash) |
Tag → runs. |
("dashboards", uuid, field) |
Dashboard entity fields. |
("dashboard_apps", uuid, field) |
Dashboard app fields. |
("reports", uuid, field) |
Report fields. |
("notes", uuid, field) |
Note fields. |
Project and UI state:
| Key tuple pattern | Description |
|---|---|
("project", key, "__leaf__") |
Project-level settings (e.g. name, description). |
("pinned_sequences", "__leaf__") |
Pinned sequence list for the UI. |
("__ping__",) |
Used by the backend for a minimal read to verify FDB connectivity. |
Encoding and tree convention
- Values: msgpack via
storage/encoding.py(with datetime extension). - Tree storage: Nested dicts/lists are flattened into key paths. Scalars are stored with a final
__leaf__component so thatsubspace.range((run_hash, "meta"))returns all meta keys. Empty dict/list use sentinels__empty_dict__/__empty_list__.
Next
- Getting started — Run the stack locally.
- Advanced — Per-component design and architectural decisions (backend, frontier, workers, FDB, Kafka, S3, UI, client SDK).
- API — Endpoint reference.