Skip to content

S3 and blob storage

Large binary objects (images, audio, artifacts, and other blobs) are stored in an S3-compatible object store. Matyan uses S3 only for blob content; metadata (which run, which sequence, step index) lives in FoundationDB. The frontier issues presigned URLs so clients upload directly to S3; the backend fetches from S3 when serving blob content to the UI.

Role

  • Blob storage — Actual bytes (PNG, WAV, serialized figures, etc.) are stored in a single bucket (e.g. matyan-artifacts) under keys that include run and sequence identity. No blob bytes go through Kafka or the frontier.
  • Metadata in FDB — Run attrs and sequence metadata (including S3 key references) are written by ingestion workers to FDB after the client uploads and the frontier publishes a blob_ref to Kafka.
  • Serving — When the UI (or API) requests blob content (e.g. image batch, audio), the backend reads the S3 key from FDB, fetches the object from S3, and streams it back, often using encrypted blob URIs so that the client does not see raw S3 keys.

Architectural decisions

Why S3 (not FDB) for blobs

  • Size and cost — FDB is optimized for smaller values and transactional reads/writes. Large blobs (megabytes) would blow up transaction size and storage cost. S3 is designed for large objects and is cheaper at scale.
  • Streaming — Serving a large file is a simple GET; the backend streams from S3 to the client without loading the whole object into memory. FDB would require reading the value into memory or chunking.
  • Ecosystem — S3 (or MinIO, RustFS, etc.) is a standard choice for ML artifacts and is often already present in data platforms. Matyan reuses it instead of inventing a custom blob store.

Presigned URLs (client → S3 directly)

Clients do not upload blob bytes to the frontier. Instead:

  1. Client asks the frontier for a presigned PUT URL (e.g. POST /api/v1/rest/artifacts/presign with run id, sequence, step, etc.).
  2. Frontier generates an S3 key, creates a presigned URL with a time-limited expiry, and publishes a blob_ref message to Kafka (so workers can write the key into FDB).
  3. Client uploads the blob directly to S3 using the presigned URL.
  4. Ingestion worker consumes the blob_ref and writes the S3 key (and any metadata) into FDB.

This avoids the frontier (or backend) being a bottleneck or proxy for large uploads and keeps the frontier stateless and lightweight.

Single bucket, key layout

All blobs live in one configurable bucket. Key structure (e.g. run hash, sequence type, step) is chosen so that:

  • Listing by run — Control worker (or cleanup jobs) can delete all objects under a run prefix when a run is deleted.
  • Uniqueness — Key includes enough context (run, sequence, step, maybe index) so that concurrent uploads do not collide.

The exact key format is an implementation detail; the important point is that “all blobs for run X” can be enumerated and deleted by prefix.

Backend fetches on read

When the UI requests blob content (e.g. for the image explorer), the backend looks up the S3 key from FDB, fetches the object from S3 using the same credentials/config as the workers, and streams it back. So S3 credentials are needed by both the frontier (to generate presigned URLs), the backend (to serve blobs), and the control worker (to delete blobs). The frontier does not need to read from S3; it only generates URLs.

Encrypted blob URIs

Responses that reference blobs (e.g. custom object search results) may expose an encrypted blob URI (Fernet) instead of the raw S3 key. The UI or client sends this token to a backend endpoint (e.g. blob-batch), which decrypts it, fetches from S3, and returns the bytes. This hides S3 keys and bucket details from the client and allows the backend to enforce access control and audit.

Control worker: S3 cleanup on run_deleted

When a run is deleted, the backend writes the deletion to FDB and publishes run_deleted to the control-events topic. The control worker consumes it and deletes all S3 objects under that run’s prefix. So blob lifecycle is tied to run lifecycle; if the control worker is down or the event is lost, blobs can be orphaned. A periodic job (e.g. cleanup-orphan-s3 CronJob) can scan FDB for deletion tombstones and remove leftover S3 objects to avoid unbounded storage growth.

  • Frontier — Issues presigned URLs and publishes blob_ref.
  • Backend — Fetches blobs from S3 for API responses.
  • Workers — Control worker performs S3 cleanup; ingestion worker writes blob refs to FDB.
  • FoundationDB — Blob metadata and keys stored in FDB.