Search and MatyanQL
MatyanQL is the query language for filtering runs, metrics, and custom objects (images, audios, texts, distributions, figures) in the UI and API. This page is the single reference: Quick reference below gives syntax and examples by tab; Reference explains the pipeline, property tables, and why some queries are fast or slow.
Quick reference
How it works
- MatyanQL is a single Python expression evaluated in a boolean context (like an `if` condition).
- The expression is evaluated for each candidate (a run, or a run + sequence pair). If it is truthy, the entity is included.
- Queries are filter-only; they do not define sorting or aggregation.
- An optional `SELECT ... IF <expr>` prefix is stripped, so you can also write the condition directly.
- If you do not mention `run.is_archived` or `run.archived`, the backend automatically adds `run.is_archived == False`, so archived runs are hidden by default.
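The prefix stripping and default-predicate rules above can be sketched in a few lines of Python. This is a hypothetical helper, not the backend's code: the name `normalize_query`, the regular expressions, and the way the `SELECT ... IF` prefix and archived references are detected are all assumptions.

```python
import re

# Hypothetical sketch of MatyanQL normalization; the real detection rules may differ.
SELECT_PREFIX = re.compile(r"^\s*SELECT\b.*?\bIF\b", re.IGNORECASE | re.DOTALL)
ARCHIVED_REF = re.compile(r"\brun\.(?:is_archived|archived)\b")
DEFAULT_PREDICATE = "run.is_archived == False"


def normalize_query(query: str) -> str:
    """Return the boolean expression that would actually be evaluated."""
    # Strip an optional "SELECT ... IF" prefix; only the condition matters.
    expr = SELECT_PREFIX.sub("", query, count=1).strip()
    if not expr:
        # Empty input: show non-archived runs.
        return DEFAULT_PREDICATE
    expr = f"({expr})"
    # Hide archived runs unless the user mentioned the archived flag.
    if not ARCHIVED_REF.search(expr):
        expr = f"{expr} and {DEFAULT_PREDICATE}"
    return expr


print(normalize_query('SELECT run IF run.experiment == "baseline"'))
# (run.experiment == "baseline") and run.is_archived == False
```

Parenthesizing the user's condition before appending the default predicate matters: without it, a query like `a or b` would become `a or b and run.is_archived == False`, and the archived filter would only apply to the second branch.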
Which namespaces in which tab
| Tab / context | Namespaces you can use | Use for |
|---|---|---|
| Runs | run only | Filter by run properties and hyperparameters. |
| Metrics | run, metric | Filter by run and by metric name/context/last. |
| Images | run, images | Filter by run and by image sequence name/context. |
| Audios | run, audios | Same with audios. |
| Texts | run, texts | Same with texts. |
| Distributions | run, distributions | Same with distributions. |
| Figures | run, figures | Same with figures. |
Use the namespace that matches the tab (e.g. metric on Metrics, images on Images). Full property lists are in the Reference section below.
Run search (Runs tab)
Use the run namespace. Common properties: run.hash, run.name, run.experiment, run.tags, run.is_archived, run.active, run.created_at, run.duration, run.hparams.<name>.
Examples (indexed — fast):
run.experiment == "baseline"
"production" in run.tags
run.hparams.lr > 0.0001
run.hparams.batch_size == 32
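Examples (not fully index-backed, so slower; the first two force a full scan, while the third narrows by experiment and then filters in memory):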
run.duration > 60
run.created_at >= datetime(2026, 3, 1)
run.experiment == "baseline" and run.duration > 10
Metric search (Metrics tab)
Use run and metric (metric.name, metric.context, metric.last, metric.last_step). Only metric.name == "literal" is index-backed.
Examples:
metric.name == "loss"
run.experiment == "baseline" and metric.name == "accuracy"
metric.name == "loss" and metric.context.subset == "train"
Custom object search (Images, Audios, Texts, etc.)
Use run and the tab’s namespace: images, audios, texts, distributions, or figures. Same shape as metric: name, context, last, last_step.
Examples:
run.experiment == "exp1" and images.name == "samples"
audios.name == "waveform" and run.active == True
Why some queries are fast and others slow (summary)
- Very fast: only index-backed predicates (e.g. `run.experiment == "x"`, `run.hparams.lr == 0.001`, `"tag" in run.tags`, `metric.name == "loss"`).
- Slower: at least one non-indexed predicate; the backend may still use an index for a subset, then filter in memory.
- Slow (full scan): no index-backed predicate, or an OR with an unindexed branch.
See Index-backed predicates below for the full list.
Security
MatyanQL is evaluated with RestrictedPython: only a safe subset of Python is allowed. User-written queries cannot execute arbitrary code.
Reference
Pipeline overview
When you submit a MatyanQL string (e.g. in the Search box or via the run/metric search API), the backend runs a fixed pipeline:
- String normalization: an optional `SELECT ... IF <expr>` prefix is stripped and the result is wrapped in parentheses. Empty input is treated as "show non-archived runs."
- Default predicate: unless the query already mentions `run.is_archived` or `run.archived`, the backend ANDs the expression with `run.is_archived == False`, so archived runs are hidden by default.
- Parse: the string is parsed into a Python AST (abstract syntax tree) with `compile(..., mode="eval")`. Invalid syntax raises `SyntaxError` (reported to the UI as, e.g., a 400 error).
- AST rewrites: two rewrites run on the AST:
  - `datetime(...)` is replaced with a numeric UTC timestamp using the request's timezone offset (`x_timezone_offset`), so e.g. `datetime(2026, 3, 10) <= run.created_at` becomes a float comparison.
  - Chained comparisons: `a <= b < c` is split into `(a <= b) and (b < c)` so the planner can match each half against index patterns.
- Planner: the prepared AST is passed to `plan_query(db, prepared_ast)`. The planner walks the AST and matches index-backed predicates (experiment, tag, archived, active, hparams, metric.name). It returns `PlanResult(candidates, exact, trace_names)`:
  - `candidates`: a list of run hashes from the index(es), or `None` for "no index use" (full scan).
  - `exact`: if `True`, every candidate satisfies the full query; if `False`, the candidates are a superset and the backend must run `RestrictedPythonQuery.check()` on each.
  - `trace_names`: when the query has `metric.name == "..."`, the set of metric names to stream; `None` means "all traces."
- Execution: depending on the endpoint (run search, metric search, or custom-object search):
  - If `candidates` is a list and `exact` is `True`: iterate only those hashes (no per-run filter).
  - If `candidates` is a list and `exact` is `False`: iterate the candidates, load run (and optionally sequence) data, and call `q.check(run=..., metric=...)` (or the matching namespace) to filter.
  - If `candidates` is `None`: lazy path. Iterate all run hashes from the `created_at` index (no predicate index), load each run (and sequences if needed), and call `q.check()` for each. This is a full scan.
- Security: the expression is compiled with RestrictedPython, so only a safe subset of Python is allowed (no file access, no arbitrary imports). The only allowed namespaces are `run`, `metric`, and the custom-object names (`images`, `audios`, `distributions`, `figures`, `texts`), plus builtins like `datetime`, `min`, `max`, `sorted`, etc.
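The two AST rewrites can be sketched with Python's standard `ast` module. This is an illustration under assumptions, not the backend's implementation: the class `PrepareQuery`, the helper `prepare`, and reading the timezone offset as minutes east of UTC are all hypothetical (the real `x_timezone_offset` unit and sign convention may differ).

```python
import ast
from datetime import datetime, timedelta, timezone


class PrepareQuery(ast.NodeTransformer):
    """Sketch of the two rewrites: datetime folding and chain splitting."""

    def __init__(self, tz_offset_minutes: int = 0):
        # Assumption: offset is in minutes east of UTC (UTC+1 -> 60).
        self.tz = timezone(timedelta(minutes=tz_offset_minutes))

    def visit_Call(self, node: ast.Call) -> ast.AST:
        self.generic_visit(node)
        if (isinstance(node.func, ast.Name) and node.func.id == "datetime"
                and not node.keywords):
            try:
                args = [ast.literal_eval(arg) for arg in node.args]
            except ValueError:
                return node  # non-literal arguments: leave the call alone
            # Interpret the literal in the user's timezone, emit a UTC timestamp.
            ts = datetime(*args, tzinfo=self.tz).timestamp()
            return ast.copy_location(ast.Constant(ts), node)
        return node

    def visit_Compare(self, node: ast.Compare) -> ast.AST:
        self.generic_visit(node)
        if len(node.ops) == 1:
            return node
        # a <= b < c  ->  (a <= b) and (b < c)
        parts, left = [], node.left
        for op, right in zip(node.ops, node.comparators):
            parts.append(ast.Compare(left=left, ops=[op], comparators=[right]))
            left = right
        return ast.copy_location(ast.BoolOp(op=ast.And(), values=parts), node)


def prepare(expr: str, tz_offset_minutes: int = 0) -> ast.Expression:
    tree = ast.parse(expr, mode="eval")
    tree = PrepareQuery(tz_offset_minutes).visit(tree)
    return ast.fix_missing_locations(tree)


print(ast.unparse(prepare("datetime(2026, 3, 10) <= run.created_at < datetime(2026, 3, 11)")))
```

Splitting the chain makes the middle operand (`run.created_at`) appear in both halves. That is harmless for a plain attribute read, and it lets the planner see two simple comparisons instead of one chained one.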
Namespaces and which tab uses them
MatyanQL expressions can reference one or more namespaces. The backend passes only the namespaces that exist in the current context; referencing a namespace that wasn't passed (e.g. metric on the Runs tab) makes that part of the expression evaluate against a missing name and can lead to errors or no matches.
| UI / API context | Namespaces passed to check() | Typical use |
|---|---|---|
| Runs tab / run search | run only | Filter by run properties and hyperparameters. |
| Metrics tab / metric search | run, metric | Filter by run and by metric name/context/last. |
| Images tab | run, images | Filter by run and by image sequence name/context. |
| Audios tab | run, audios | Same with audios. |
| Texts tab | run, texts | Same with texts. |
| Distributions tab | run, distributions | Same with distributions. |
| Figures tab | run, figures | Same with figures. |
In every sequence context (metrics, images, audios, etc.), the second namespace is a sequence view with the same shape: name, context (dict-like), last, last_step.
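A sequence view with this shape can be mocked in plain Python to see how expressions resolve, and what happens when a namespace is not passed. `Context` and `SequenceView` are hypothetical names, plain `eval` stands in for RestrictedPython, and raising `AttributeError` for a missing context key is an assumption about the real view.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


class Context(dict):
    """Dict-like context that also allows attribute access (context.subset)."""

    def __getattr__(self, key: str) -> Any:
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key) from None


@dataclass
class SequenceView:
    """Hypothetical stand-in for a metric/images/audios/... sequence view."""
    name: str
    context: Context = field(default_factory=Context)
    last: Any = None
    last_step: Optional[int] = None


metric = SequenceView("loss", Context(subset="train"), last=0.42, last_step=1000)
query = 'metric.name == "loss" and metric.context.subset == "train"'

# Metrics tab: both run and metric are passed, so the query resolves.
print(eval(query, {"__builtins__": {}}, {"run": None, "metric": metric}))  # True

# Runs tab: only run is passed, so "metric" is a missing name.
try:
    eval(query, {"__builtins__": {}}, {"run": None})
except NameError as exc:
    print("NameError:", exc)
```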
Run namespace (run)
Available in all tabs. Properties (and examples) include:
| Property | Type / notes | Example |
|---|---|---|
| `run.hash` | str | `run.hash == "abc123"` |
| `run.name` | str | `run.name != ""` |
| `run.experiment` | str or None | `run.experiment == "baseline"` |
| `run.tags` | container | `"production" in run.tags` |
| `run.archived` / `run.is_archived` | bool | `run.is_archived == False` (default) |
| `run.active` | bool | `run.active == True` |
| `run.created_at` | float (timestamp) | `run.created_at >= datetime(2026, 3, 10)` |
| `run.duration` | float | `run.duration > 60` |
| `run.hparams.<name>` | any (top-level scalar) | `run.hparams.lr > 0.001` |
| `run["hparams"]["<name>"]` | same | `run["hparams"]["batch_size"] == 32` |
Hyperparameters are only indexed when they are top-level scalar attributes under hparams (e.g. `run.hparams.lr`, not nested objects). The planner recognizes both dot and bracket syntax, so either form can use the hparam index.
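The "top-level scalar" rule can be sketched as a filter over a run's hyperparameter dict. `indexable_hparams` is a hypothetical helper, and treating bool, int, float, and str as the scalar types (with `None` and containers excluded) is an assumption.

```python
# Assumption: "scalar" means bool, int, float, or str; None and containers
# (dicts, lists) are treated as non-indexable.
SCALAR_TYPES = (bool, int, float, str)


def indexable_hparams(hparams: dict) -> dict:
    """Return the top-level scalar hyperparameters, i.e. the ones an
    hparam index could cover under the rule described above."""
    return {name: value for name, value in hparams.items()
            if isinstance(value, SCALAR_TYPES)}


hparams = {
    "lr": 0.001,
    "batch_size": 32,
    "optimizer": {"name": "adam", "beta1": 0.9},  # nested: not indexed
    "layers": [64, 64],                           # container: not indexed
}
print(indexable_hparams(hparams))  # {'lr': 0.001, 'batch_size': 32}
```

Under this rule, `run.hparams.lr > 0.001` can use the hparam index, while a condition on a nested value such as the optimizer settings is evaluated only in `check()`.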
Metric namespace (metric)
Available only in the Metrics tab (and metric search API). Use metric in the expression.
| Property | Type / notes | Example |
|---|---|---|
| `metric.name` | str | `metric.name == "loss"` |
| `metric.context` | dict-like | `metric.context.subset == "train"` |
| `metric.last` | last value | `metric.last >= 0.5` |
| `metric.last_step` | int or None | `metric.last_step > 100` |
Only `metric.name == "literal"` is index-backed (Tier 3 trace-name index). All other metric conditions (`context`, `last`, `last_step`, or `metric.name.startswith(...)`) are not indexed: the planner cannot use them to narrow candidates, and in metric search they make the backend fall back to scanning all runs and traces and filtering in memory.
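Deriving the set of trace names to stream can be sketched by collecting `metric.name == "..."` literals from the top-level `and` chain. `trace_names` and its helpers are hypothetical, and the sketch is deliberately conservative: only conjuncts count, so any `or` falls back to "all traces" (`None`). The real planner may handle more shapes.

```python
import ast
from typing import Iterator, Optional, Set


def _conjuncts(node: ast.AST) -> Iterator[ast.AST]:
    """Flatten nested `and` chains: (a and b) and c -> a, b, c."""
    if isinstance(node, ast.BoolOp) and isinstance(node.op, ast.And):
        for value in node.values:
            yield from _conjuncts(value)
    else:
        yield node


def _metric_name_literal(node: ast.AST) -> Optional[str]:
    """Return "loss" for metric.name == "loss", else None."""
    if not (isinstance(node, ast.Compare) and len(node.ops) == 1
            and isinstance(node.ops[0], ast.Eq)):
        return None
    left, right = node.left, node.comparators[0]
    if (isinstance(left, ast.Attribute) and left.attr == "name"
            and isinstance(left.value, ast.Name) and left.value.id == "metric"
            and isinstance(right, ast.Constant) and isinstance(right.value, str)):
        return right.value
    return None


def trace_names(expr: str) -> Optional[Set[str]]:
    """None means "stream all traces"; a set restricts streaming to those names."""
    names: Optional[Set[str]] = None
    for part in _conjuncts(ast.parse(expr, mode="eval").body):
        literal = _metric_name_literal(part)
        if literal is not None:
            names = {literal} if names is None else names & {literal}
    return names


print(trace_names('run.experiment == "baseline" and metric.name == "loss"'))  # {'loss'}
print(trace_names('metric.name.startswith("lo")'))  # None
```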
Custom-object namespaces (images, audios, texts, distributions, figures)
Same shape as metric: name, context, last, last_step. Use the name that matches the tab (e.g. `images` on the Images tab). There is no index on sequence name or context for custom objects: a query such as `images.name == "x" and run.experiment == "y"` uses the run-level index for `run.experiment` and then filters sequences in memory.
Index-backed predicates (why some queries are fast)
The planner only recognizes the following patterns. Anything else does not contribute to candidates and can force a full scan or a superset + filter.
| Predicate shape | Index | Notes |
|---|---|---|
| `run.experiment == "name"` | Tier 1 (experiment) | Exact experiment name. |
| `"tag" in run.tags` | Tier 1 (tag) | Run has this tag. |
| `run.active == True` or `False` | Tier 1 (active) | Live vs finished. |
| `run.is_archived == True` or `False` (or `run.archived`) | Tier 1 (archived) | Archived flag. |
| `run.hash == "hash"` | n/a | Single run by hash. |
| `run.hparams.<name> == <val>` | Tier 2 (hparam) | Top-level scalar hparam equality. |
| `run.hparams.<name> < \| <= \| > \| >= <val>` | Tier 2 (hparam range) | Range on top-level scalar hparam. |
| `run["hparams"]["<name>"]` with the same ops | Tier 2 | Bracket form. |
| `metric.name == "name"` | Tier 3 (trace name) | Only in metric search; restricts which traces are streamed. |
Not indexed (so they don't narrow the candidate set; used only in check()):
- `run.name`, `run.description`, `run.created_at`, `run.duration`, `run.tags.contains(...)` (anything other than `in run.tags` for a single tag)
- `metric.context.*`, `metric.last`, `metric.last_step`, `metric.name.startswith(...)`, or any non-equality on `metric.name`
- Any predicate the planner doesn't match (e.g. complex expressions, or method calls not listed above)
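A matcher for the single-comparison shapes in the index table can be sketched with the standard `ast` module (Python 3.9+ AST layout). `match_index` and its return labels are hypothetical; the sketch recognizes only the literal-on-the-right forms in the table and treats dot and bracket hparam syntax alike.

```python
import ast
from typing import List, Optional

RANGE_OPS = (ast.Lt, ast.LtE, ast.Gt, ast.GtE)


def _path(node: ast.AST) -> Optional[List[str]]:
    """run.hparams.lr and run["hparams"]["lr"] both -> ["run", "hparams", "lr"]."""
    parts: List[str] = []
    while True:
        if isinstance(node, ast.Attribute):
            parts.append(node.attr)
            node = node.value
        elif (isinstance(node, ast.Subscript) and isinstance(node.slice, ast.Constant)
              and isinstance(node.slice.value, str)):
            parts.append(node.slice.value)
            node = node.value
        elif isinstance(node, ast.Name):
            parts.append(node.id)
            return parts[::-1]
        else:
            return None


def match_index(node: ast.AST) -> Optional[str]:
    """Return which index a single comparison could use, or None if unindexed."""
    if not (isinstance(node, ast.Compare) and len(node.ops) == 1):
        return None
    op, left, right = node.ops[0], node.left, node.comparators[0]
    if (isinstance(op, ast.In) and isinstance(left, ast.Constant)
            and isinstance(left.value, str) and _path(right) == ["run", "tags"]):
        return "tag"
    # Remaining shapes need a literal on the right. Negative numbers parse
    # as UnaryOp, so this sketch does not match them.
    path = _path(left)
    if path is None or not isinstance(right, ast.Constant):
        return None
    value = right.value
    if isinstance(op, ast.Eq):
        if path == ["run", "experiment"] and isinstance(value, str):
            return "experiment"
        if path == ["run", "active"] and isinstance(value, bool):
            return "active"
        if path in (["run", "is_archived"], ["run", "archived"]) and isinstance(value, bool):
            return "archived"
        if path == ["run", "hash"] and isinstance(value, str):
            return "hash"
        if path == ["metric", "name"] and isinstance(value, str):
            return "trace_name"
    if len(path) == 3 and path[:2] == ["run", "hparams"]:
        if isinstance(op, ast.Eq):
            return "hparam"
        if isinstance(op, RANGE_OPS):
            return "hparam_range"
    return None


for q in ['run.experiment == "baseline"', 'run["hparams"]["batch_size"] == 32',
          'run.hparams.lr > 0.0001', 'run.duration > 60']:
    print(q, "->", match_index(ast.parse(q, mode="eval").body))
```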
AND logic: for `A and B` where both A and B are index-backed, the planner intersects the two candidate sets, which is often very fast. If only A is index-backed, the planner returns the candidates for A with `exact=False`, and the backend loads each of those runs and runs `check()` to apply B.
OR logic: if any branch of an OR is not index-backed, the planner returns None (full scan). So run.experiment == "a" or run.name == "b" cannot use the experiment index and falls back to the lazy path.
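The AND and OR rules can be sketched over a simplified plan type. `Plan`, `plan_and`, and `plan_or` are hypothetical; run hashes are modeled as frozensets rather than the list the real `PlanResult` uses, and the union for two index-backed OR branches is an assumption (the text above only states the unindexed-branch case).

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Plan:
    candidates: Optional[FrozenSet[str]]  # None: no index use (full scan)
    exact: bool  # True: every candidate satisfies this sub-expression


FULL_SCAN = Plan(None, False)


def plan_and(a: Plan, b: Plan) -> Plan:
    if a.candidates is None and b.candidates is None:
        return FULL_SCAN
    if b.candidates is None:
        return Plan(a.candidates, False)  # B must be applied by check()
    if a.candidates is None:
        return Plan(b.candidates, False)  # A must be applied by check()
    return Plan(a.candidates & b.candidates, a.exact and b.exact)


def plan_or(a: Plan, b: Plan) -> Plan:
    if a.candidates is None or b.candidates is None:
        return FULL_SCAN  # one unindexed branch forces a full scan
    # Assumption: two index-backed branches are combined by union.
    return Plan(a.candidates | b.candidates, a.exact and b.exact)


experiment = Plan(frozenset({"r1", "r2", "r3"}), True)  # run.experiment == "a"
lr = Plan(frozenset({"r2", "r4"}), True)                # run.hparams.lr == 0.001
name = FULL_SCAN                                        # run.name == "b" (unindexed)

print(plan_and(experiment, lr))    # candidates {'r2'}, exact=True
print(plan_and(experiment, name))  # experiment's candidates, exact=False
print(plan_or(experiment, name))   # full scan
```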
Why some queries are fast and others slow (detailed)
- Very fast (index-only, exact): the query uses only index-backed predicates and the planner can resolve the full expression (e.g. `run.experiment == "baseline" and run.hparams.lr == 0.001`). The backend iterates only the returned run hashes and does not call `check()`; no per-run meta or attrs are loaded for filtering.
- Fast (index superset + filter): the query mixes index-backed and non-indexed predicates (e.g. `run.experiment == "baseline" and run.name != ""`). The planner returns candidates for `experiment == "baseline"` with `exact=False`. The backend iterates only those hashes but loads run meta (and optionally traces) and runs `check()` for each. This is slower than the exact path but still touches only a subset of runs.
- Slow (full scan / lazy path): the query has no index-backed predicate, has an OR with at least one unindexed branch, or uses only run properties the planner does not index (e.g. `run.created_at > X`, `run.duration > 10`, `run.name.startswith("x")`). The planner returns `candidates=None`. The backend iterates all run hashes from the `created_at` index, loads each run (and, for metric/custom-object search, all traces), and runs `check()` for each. Time is proportional to the total number of runs (and of traces in metric/object search).
- Metric search, unindexed sequence predicate: if the query uses `metric.context`, `metric.last`, or `metric.name.startswith(...)` (or any non-index-backed metric condition), the backend resets the candidate list to `None` for metric search and takes the lazy path, so every run and trace is considered. Even if `run.experiment == "x"` is indexed, adding `metric.context.subset == "train"` forces a full scan for that request.
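The three execution paths can be sketched as a generator. `PlanResult` mirrors the fields named in the pipeline, but `execute` and its callback parameters (`all_run_hashes`, `load_run`, `check`) are hypothetical stand-ins for the real storage access and `RestrictedPythonQuery.check()` calls.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Iterator, List, Optional, Set


@dataclass
class PlanResult:
    candidates: Optional[List[str]]  # None -> lazy path (full scan)
    exact: bool
    trace_names: Optional[Set[str]] = None


def execute(
    plan: PlanResult,
    all_run_hashes: Callable[[], Iterable[str]],  # e.g. created_at order
    load_run: Callable[[str], Any],
    check: Callable[[Any], bool],
) -> Iterator[str]:
    """Yield matching run hashes along the three paths described above."""
    if plan.candidates is None:
        hashes, needs_check = all_run_hashes(), True  # slow: every run
    else:
        hashes, needs_check = plan.candidates, not plan.exact
    for run_hash in hashes:
        if not needs_check:
            yield run_hash  # very fast: no load, no check()
        elif check(load_run(run_hash)):
            yield run_hash  # superset or full scan: load, then filter


# Toy store: run hash -> run name; `loaded` records every run load.
names = {"a": "x", "b": "", "c": "y"}
loaded = []


def load(run_hash):
    loaded.append(run_hash)
    return names[run_hash]


plan = PlanResult(["a", "b"], exact=False)  # e.g. experiment index + run.name != ""
print(list(execute(plan, lambda: names, load, lambda name: name != "")))  # ['a']
print(loaded)  # ['a', 'b']
```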
Summary: prefer index-backed predicates (experiment, tag, active, archived, top-level hparam equality/range, and for metrics metric.name == "..."). Avoid relying only on run.created_at, run.duration, run.name, or metric context/last in hot paths; combine them with at least one indexed predicate when possible.
Security and builtins
MatyanQL is compiled and executed with RestrictedPython. Only a restricted set of builtins is available (e.g. datetime, timedelta, min, max, sum, sorted, any, all). Attribute access is guarded (e.g. no format on strings, no attributes starting with _). This prevents arbitrary code execution from the query string.
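The attribute rules can be illustrated with a simplified AST check. This is not how RestrictedPython is implemented or configured, and it is not a sandbox on its own; it only shows which attribute accesses the rules above reject. `attribute_violations` and `BLOCKED_ATTRS` are hypothetical names.

```python
import ast
from typing import List

# Illustrative block list; `format` is blocked because format strings such as
# "{0.__class__}" can reach attributes that a plain `_` check would not see.
BLOCKED_ATTRS = {"format"}


def attribute_violations(expr: str) -> List[str]:
    """List attribute names the guard would reject: '_'-prefixed or blocked."""
    tree = ast.parse(expr, mode="eval")
    return [node.attr for node in ast.walk(tree)
            if isinstance(node, ast.Attribute)
            and (node.attr.startswith("_") or node.attr in BLOCKED_ATTRS)]


print(attribute_violations("run.hparams.lr > 0.001"))        # []
print(attribute_violations('"{0.__class__}".format(run)'))  # ['format']
```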