Tombstones (deletion markers)
When a run is deleted, Matyan keeps a tombstone in FoundationDB. The tombstone is a small record that says “this run was deleted.” It prevents deleted runs from being recreated by late or out-of-order ingestion messages and drives periodic S3 cleanup. This page explains how tombstones work and how they are used.
What is a tombstone?
A deletion tombstone is a key-value pair in the FDB indexes subspace:
- Key:
("_deleted", run_hash) - Value: Timestamp (float) when the run was deleted (msgpack-encoded)
The run’s actual data (metadata, attributes, sequences, tags) is removed from FDB at delete time. Only this marker remains. Its roles are:
- Guard against late ingestion — Kafka messages can arrive out of order or after a run is deleted. The ingestion worker checks the tombstone before creating or updating a run; if present, it skips those messages so the run is not “resurrected.”
- Drive orphan S3 cleanup — The periodic job cleanup-orphan-s3 lists all tombstones and deletes S3 objects under each run’s prefix. That catches any blobs that were not removed by the control worker (e.g. eventual consistency, failed runs).
- Optional cleanup of tombstones — Tombstones can be removed later by the cleanup-tombstones job (or manually) so the
_deletedindex does not grow forever. This is done only after S3 cleanup has had time to run (e.g. tombstones older than 7 days).
Lifecycle
1. Run is deleted
- The user (or API client) deletes a run via the backend REST API (e.g.
DELETE /api/v1/runs/{run_id}or batch delete). - The backend does not delete the run in FDB itself. It publishes a
delete_runmessage to the data-ingestion Kafka topic. - An ingestion worker consumes that message and runs
runs.delete_run(), which in one transaction: - Removes all index entries for the run (Tier 1, Tier 2, Tier 3, reverse index),
- Removes run–experiment and run–tag associations,
- Deletes all keys under that run in the runs subspace (meta, attrs, traces, contexts, tags, seqs),
-
Writes the tombstone
("_deleted", run_hash) → timestamp. -
After processing the batch, the ingestion worker publishes a
run_deletedevent to the control-events topic. The control worker consumes it and deletes S3 objects under that run’s prefix (immediate S3 cleanup).
So: tombstone is written by the ingestion worker when it applies the delete. The control worker only cleans S3; it does not read or write tombstones.
2. Late ingestion messages
- If a create_run, log_metric, or other ingestion message for that run arrives later (e.g. delayed in Kafka), the ingestion worker groups messages by run_id and, before applying any of them, checks
is_run_deleted(tr, run_id). - If the tombstone exists, the worker skips the whole batch for that run (no create, no sequences written). So the run stays deleted.
- The worker uses a small in-memory cache of “deleted” run ids to avoid repeated FDB reads for the same run.
3. Orphan S3 cleanup (periodic)
- The cleanup-orphan-s3 CronJob (or manual
matyan-backend cleanup-orphan-s3) callslist_tombstones()to get all(run_hash, deleted_at)pairs. - For each run hash, it deletes S3 objects under the prefix
{run_hash}/. That way, any blobs that were not removed by the control worker (e.g. objects created after the event, or missed due to failures) are eventually removed. - The tombstone is not removed by this job; it only uses the list of tombstoned runs to drive S3 deletion.
4. Tombstone cleanup (periodic or manual)
- The cleanup-tombstones CronJob (or manual
matyan-backend cleanup-tombstones) lists tombstones, keeps only those older than N hours (e.g. 168 = 7 days), and callsclear_run_tombstone(db, run_hash)for each. - Clearing removes the
("_deleted", run_hash)key from FDB. After that, the run hash is no longer in the tombstone index (saving space and keeping the index bounded). - You should only clear tombstones after S3 cleanup has had time to run for that run (e.g. run cleanup-orphan-s3 daily and cleanup-tombstones weekly with
olderThanHours: 168).
5. Manual clear (re-ingest a deleted run)
- If you need to re-create a run that was deleted (e.g. restore from backup and re-ingest), you can clear its tombstone so the ingestion worker will accept new messages for that run again.
- REST API:
POST /api/v1/runs/{run_id}/clear-tombstone/(returns 204). No body; idempotent if the run has no tombstone. - CLI / code:
indexes.clear_run_tombstone(db, run_hash). - Direct restore: When you restore a run from a backup with matyan-backend restore, the restore code clears the tombstone for that run after restoring its data so the run is visible and not treated as deleted.
Summary
| Step | Who | What |
|---|---|---|
| Delete run | API → Kafka delete_run → ingestion worker |
Worker runs delete_run(): removes run data and indexes, writes tombstone. |
| S3 cleanup (immediate) | Ingestion worker → Kafka run_deleted → control worker |
Control worker deletes S3 prefix for that run. |
| Late ingestion | Ingestion worker | Before applying messages, checks tombstone; skips if run is deleted. |
| Orphan S3 cleanup | CronJob / cleanup-orphan-s3 |
Lists tombstones, deletes S3 prefix for each; does not remove tombstones. |
| Tombstone cleanup | CronJob / cleanup-tombstones |
Removes old tombstones from FDB (after S3 cleanup has run). |
| Re-ingest deleted run | API POST .../clear-tombstone/ or restore |
Clears tombstone so new ingestion for that run is accepted. |
See Architecture — FoundationDB key structure for where _deleted lives in the key layout, and Periodic cleanup jobs (CronJobs) for enabling and scheduling the cleanup jobs.