Periodic cleanup jobs (CronJobs)
The Helm chart can create Kubernetes CronJobs that run the matyan-backend CLI in one-off pods on a schedule. These jobs perform periodic maintenance: cleaning up orphan S3 objects for deleted runs and removing old deletion tombstones from FoundationDB. Both jobs use the same FDB and (where needed) S3 configuration as the control worker. For how tombstones work and why they exist, see Understanding — Tombstones.
Overview
| CronJob | CLI command | Purpose |
|---|---|---|
| cleanup-orphan-s3 | matyan-backend cleanup-orphan-s3 | Delete S3 objects under run prefixes for runs that have a deletion tombstone in FDB. Complements the control worker’s immediate S3 cleanup by catching any objects that were missed (e.g. eventual consistency, failed control-worker runs). |
| cleanup-tombstones | matyan-backend cleanup-tombstones | Remove old deletion tombstones from the indexes subspace. Tombstones prevent deleted runs from being recreated by late ingestion messages; after the run’s S3 data has been cleaned (e.g. by control worker or cleanup-orphan-s3), the tombstone can be removed to avoid unbounded growth of the _deleted index. |
Both CronJobs are enabled by default with their default schedules. Disable them in values.yaml (or an overlay) by setting periodicJobs.<name>.enabled: false or schedule: "".
Enabling and configuring
Configuration lives under periodicJobs in the chart values.
cleanup-orphan-s3
periodicJobs:
  cleanupOrphanS3:
    enabled: true
    schedule: "0 3 * * *"   # e.g. daily at 03:00 UTC
    lockTtlSeconds: 3600    # FDB lock TTL; only one instance runs at a time when > 0
    limit: 0                # max run prefixes to process (0 = no limit)
    # image: {}             # override; defaults to controlWorker.image
    successfulJobsHistoryLimit: 3
    failedJobsHistoryLimit: 1
- schedule: Cron expression (required when enabled: true). Example: "0 3 * * *" = daily at 03:00.
- lockTtlSeconds: When > 0, the job acquires an FDB-based lock before running, so only one instance runs at a time. This prevents overlapping runs when the previous job is still executing. Default in chart: 3600 (1 hour).
- limit: Process at most this many run prefixes per run (0 = no limit). Useful to cap runtime or throttle the S3 deletion rate.
The CronJob runs in a pod that has FDB and S3 env/volumes (same as control worker). It lists tombstones from FDB, then for each tombstone deletes S3 objects under {bucket}/{run_hash}/.
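The flow described above (list tombstoned runs, then delete everything under each run's prefix, stopping after limit prefixes) can be sketched roughly as follows. This is an illustration only, not the matyan-backend implementation: the function name cleanup_orphan_s3, the in-memory dict standing in for a bucket, and the dry_run parameter shape are all assumptions made for the example.

```python
from typing import Dict, Iterable, List


def cleanup_orphan_s3(
    tombstoned_runs: Iterable[str],
    objects: Dict[str, bytes],
    limit: int = 0,
    dry_run: bool = False,
) -> List[str]:
    """Delete every object stored under "<run_hash>/" for each tombstoned run.

    `objects` is a hypothetical in-memory stand-in for a bucket (key -> body).
    Returns the keys that were deleted (or would be deleted in dry-run mode).
    """
    deleted: List[str] = []
    for processed, run_hash in enumerate(tombstoned_runs):
        # limit == 0 means "no limit"; otherwise stop after `limit` prefixes.
        if limit > 0 and processed >= limit:
            break
        # The trailing slash keeps run "abc" from matching keys of run "abcd".
        prefix = f"{run_hash}/"
        matching = [key for key in objects if key.startswith(prefix)]
        for key in matching:
            if not dry_run:
                del objects[key]
            deleted.append(key)
    return deleted


if __name__ == "__main__":
    bucket = {"abc123/blob/0": b"x", "abc123/blob/1": b"y", "live999/blob/0": b"z"}
    print(cleanup_orphan_s3(["abc123"], bucket, dry_run=True))
```

Matching on the prefix with its trailing slash is the important detail: objects of runs without a tombstone are never touched, even when their hashes share leading characters.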
cleanup-tombstones
periodicJobs:
  cleanupTombstones:
    enabled: true
    schedule: "0 4 * * 0"   # e.g. weekly Sunday 04:00 UTC
    olderThanHours: 168     # only clear tombstones older than 7 days
    lockTtlSeconds: 600     # FDB lock TTL
    # image: {}
    successfulJobsHistoryLimit: 3
    failedJobsHistoryLimit: 1
- schedule: Cron expression (required when enabled: true). Example: "0 4 * * 0" = weekly on Sunday at 04:00.
- olderThanHours: Only tombstones older than this many hours are removed (default 168 = 7 days). This ensures the control worker and cleanup-orphan-s3 have had time to delete S3 data before the tombstone is dropped.
- lockTtlSeconds: TTL of the FDB lock that ensures only one instance runs at a time (default in chart: 600).
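To make the olderThanHours cutoff concrete, here is a minimal sketch of the age filter. It assumes each tombstone carries a deletion timestamp; how the backend actually stores or derives a tombstone's age is not described here, and the name expired_tombstones is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def expired_tombstones(
    tombstones: Dict[str, datetime],
    older_than_hours: int = 168,
    now: Optional[datetime] = None,
) -> List[str]:
    """Return run hashes whose tombstone is strictly older than the cutoff.

    `tombstones` maps run hash -> deletion time (an assumed representation).
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=older_than_hours)
    return [run for run, deleted_at in tombstones.items() if deleted_at < cutoff]


if __name__ == "__main__":
    now = datetime(2024, 1, 15, tzinfo=timezone.utc)
    sample = {
        "old-run": now - timedelta(hours=200),  # past the 7-day window
        "new-run": now - timedelta(hours=1),    # deleted an hour ago
    }
    print(expired_tombstones(sample, older_than_hours=168, now=now))
```

With the default of 168 hours and a daily cleanup-orphan-s3 schedule, a tombstone would typically see several orphan-cleanup passes before it becomes eligible for removal.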
Concurrency and locking
- Each CronJob template sets concurrencyPolicy: Forbid, so Kubernetes does not start a new job while the previous run is still active.
- When lockTtlSeconds is > 0, the CLI acquires a distributed lock in FDB (see matyan_backend.jobs.lock) before doing work. If the lock cannot be acquired (e.g. another replica or manual run holds it), the process exits with a non-zero code. This avoids duplicate cleanup when multiple clusters or manual invocations exist.
- To disable a CronJob without removing the block (e.g. in an env-specific overlay), set schedule: "" or enabled: false.
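The locking behaviour can be illustrated with a small in-memory model. This is not the code in matyan_backend.jobs.lock: the TtlLock class, the run_job helper, the exit code 1, and releasing the lock when work finishes are all assumptions chosen to show the idea of a TTL-bounded lock that a second instance cannot take until it expires.

```python
import time
from typing import Callable, Dict, Tuple


class TtlLock:
    """Hypothetical in-memory stand-in for a TTL-based distributed lock."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._holders: Dict[str, Tuple[str, float]] = {}  # name -> (owner, expires_at)

    def acquire(self, name: str, owner: str, ttl_seconds: float) -> bool:
        now = self._clock()
        current = self._holders.get(name)
        # Refuse only if someone else holds a lock that has not expired yet.
        if current is not None and current[0] != owner and current[1] > now:
            return False
        self._holders[name] = (owner, now + ttl_seconds)
        return True

    def release(self, name: str, owner: str) -> None:
        current = self._holders.get(name)
        if current is not None and current[0] == owner:
            del self._holders[name]


def run_job(lock: TtlLock, name: str, owner: str, ttl_seconds: float,
            work: Callable[[], None]) -> int:
    """Return a process-style exit code: 0 if work ran, 1 if the lock was held."""
    if ttl_seconds <= 0:  # locking disabled
        work()
        return 0
    if not lock.acquire(name, owner, ttl_seconds):
        return 1
    try:
        work()
    finally:
        lock.release(name, owner)
    return 0
```

The TTL matters when a job dies without releasing the lock: once the TTL elapses, the next scheduled run can acquire it again, which is why the TTL should cover the typical run duration.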
Running the commands manually
You can run the same commands outside Kubernetes (e.g. from a host with FDB and S3 access):
# Preview what would be deleted (no S3 or FDB writes)
matyan-backend cleanup-orphan-s3 --dry-run
matyan-backend cleanup-tombstones --dry-run --older-than-hours=168
# Run with FDB lock (recommended when scheduled via cron or multiple nodes)
matyan-backend cleanup-orphan-s3 --lock-ttl-seconds=3600
matyan-backend cleanup-tombstones --older-than-hours=168 --lock-ttl-seconds=600
See the backend CLI help (matyan-backend cleanup-orphan-s3 --help, matyan-backend cleanup-tombstones --help) and References — CLI for all options.
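If you script manual runs, a small wrapper that assembles the argument list can help keep a dry run and the real run consistent. The sketch below uses only the flags shown in the commands above; the build_command helper is hypothetical, and restricting --older-than-hours to cleanup-tombstones is a conservative assumption based on where that flag appears in this page.

```python
import shlex
from typing import List, Optional

JOBS = ("cleanup-orphan-s3", "cleanup-tombstones")


def build_command(
    job: str,
    dry_run: bool = False,
    lock_ttl_seconds: Optional[int] = None,
    older_than_hours: Optional[int] = None,
) -> List[str]:
    """Assemble an argv list for one of the two cleanup commands."""
    if job not in JOBS:
        raise ValueError(f"unknown job: {job!r}")
    if older_than_hours is not None and job != "cleanup-tombstones":
        raise ValueError("--older-than-hours is only used with cleanup-tombstones here")
    argv = ["matyan-backend", job]
    if dry_run:
        argv.append("--dry-run")
    if older_than_hours is not None:
        argv.append(f"--older-than-hours={older_than_hours}")
    if lock_ttl_seconds is not None:
        argv.append(f"--lock-ttl-seconds={lock_ttl_seconds}")
    return argv


if __name__ == "__main__":
    # Print the preview command first, then the real one, for review.
    preview = build_command("cleanup-tombstones", dry_run=True, older_than_hours=168)
    real = build_command("cleanup-tombstones", older_than_hours=168, lock_ttl_seconds=600)
    print(shlex.join(preview))
    print(shlex.join(real))
```

The resulting list can be passed to subprocess.run(argv, check=True) on a host where the CLI is installed; building a list rather than a shell string avoids quoting problems.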
When to enable
- cleanup-orphan-s3: Enable in production if you rely on S3 for blob storage and want a safety net for orphan objects (e.g. after control worker failures or eventual consistency). Schedule it after peak hours, and set the lock TTL to cover the typical run duration. The job uses the tombstone list to decide which run prefixes to delete from S3.
- cleanup-tombstones: Enable to prevent the _deleted index from growing indefinitely. Run it after S3 cleanup (e.g. cleanup-orphan-s3 daily at 03:00, cleanup-tombstones weekly at 04:00) and set olderThanHours so that a tombstone is only cleared once S3 cleanup has had time to run for that run.
Summary
| Job | Default schedule (when enabled) | Main options |
|---|---|---|
| cleanup-orphan-s3 | 0 3 * * * (daily 03:00) | lockTtlSeconds, limit |
| cleanup-tombstones | 0 4 * * 0 (weekly Sun 04:00) | olderThanHours, lockTtlSeconds |
Both jobs are rendered when periodicJobs.<name>.enabled is true and schedule is non-empty (both default to true with preset schedules). They also require a valid FDB cluster file configuration. See Production (Helm) for required FDB and S3 configuration.
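The rendering rule can be stated as a one-line predicate. The sketch below models it in Python for a values dictionary; it is not the chart's template code, and the is_rendered name is hypothetical.

```python
from typing import Any, Mapping


def is_rendered(job_values: Mapping[str, Any]) -> bool:
    """A job is rendered only when it is enabled AND has a non-empty schedule."""
    return bool(job_values.get("enabled")) and bool(job_values.get("schedule"))


if __name__ == "__main__":
    values = {
        "cleanupOrphanS3": {"enabled": True, "schedule": "0 3 * * *"},
        "cleanupTombstones": {"enabled": True, "schedule": ""},  # disabled via empty schedule
    }
    for name, job in values.items():
        print(name, is_rendered(job))
```

Either switch is enough to turn a job off, which is why an overlay can disable a job with schedule: "" without touching the rest of the block.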