Overview
The SDK returns a thin Job handle on success. After that, the web dashboard is where progress tracking happens — but you can still read status and artefacts from Python if you need to.
States
JobState enum values: PENDING, RUNNING, FINISHED, FAILED, CANCELLED. Jobs start in PENDING while the kernel allocates resources, then transition to RUNNING as soon as workers are ready.
What rt_submit Returns
Open job.dashboard_url to get the full tracking UI: rounds, logs, file browser, model download, worker telemetry.
Checking Status From Python
Use sdk.jobs.get(job.id) for a one-shot refresh — no polling helpers are built into the SDK.
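Because the SDK ships no polling helper, one possible loop is sketched below. It assumes FINISHED, FAILED, and CANCELLED are the terminal states, and it takes a caller-supplied function that returns the current state name. `poll_until_terminal` and its parameters are hypothetical names, not part of the SDK.

```python
import time
from typing import Callable

# Assumption: these three JobState values are terminal; PENDING and RUNNING are not.
TERMINAL_STATES = frozenset({"FINISHED", "FAILED", "CANCELLED"})


def poll_until_terminal(
    fetch_state: Callable[[], str],
    interval_s: float = 5.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_state repeatedly until it returns a terminal state name.

    Raises TimeoutError if no terminal state is seen within timeout_s seconds.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {state} after {timeout_s} seconds")
        sleep(interval_s)
```

With the SDK, `fetch_state` could be something like `lambda: sdk.jobs.get(job.id).state.name`, assuming the returned handle exposes its `JobState` as a `state` attribute. Check the actual attribute name on your SDK version.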
Job.stream() and Job.wait() were removed in 0.2.0 to keep the SDK focused on submission. If you want polling logic in your code, wrap sdk.jobs.get(job.id) in your own loop.
Listing Your Jobs
list() paginates with limit / offset.
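A minimal sketch of walking every page with limit / offset, assuming the list call returns a plain sequence and that a short page signals the end. `iter_all` and `list_page` are hypothetical names; the exact signature and return type of the SDK's list() may differ.

```python
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def iter_all(
    list_page: Callable[[int, int], Sequence[T]],
    page_size: int = 50,
) -> Iterator[T]:
    """Yield every item by requesting successive (limit, offset) pages."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        page = list_page(page_size, offset)
        yield from page
        # A page shorter than the requested limit means nothing is left.
        if len(page) < page_size:
            return
        offset += page_size
```

Usage might look like `iter_all(lambda limit, offset: sdk.jobs.list(limit=limit, offset=offset))`, assuming list() lives on `sdk.jobs` and accepts those keyword arguments.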
Output Locations
Once a job reaches FINISHED:
| Path | Contents |
|---|---|
| jobs/<slug>/files_out/ | Training logs, per-round metrics, anything your train() wrote. |
| jobs/<slug>/model_out/ | Final aggregated .pt checkpoint. |
Everything under jobs/<slug>/ stays in your bucket until you delete it (subject to your storage quota).
From the web UI
- Dashboard → Jobs → click your job → Browse Job Files opens the file browser at jobs/<slug>/.
- Click files_out/training_log.txt to open it in the Monaco viewer.
- Click model_out/checkpoint.pt → Download for the final model.
From Python
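A minimal sketch that builds the object keys for a job's outputs, assuming only the bucket layout and file names described in this section. `job_output_keys` is a hypothetical helper, not an SDK function, and the download call itself depends on the storage client you use, so it is not shown.

```python
from pathlib import PurePosixPath
from typing import Dict


def job_output_keys(slug: str) -> Dict[str, str]:
    """Return the bucket prefixes and well-known file keys for one job."""
    if not slug or "/" in slug:
        raise ValueError("slug must be a non-empty name without '/'")
    root = PurePosixPath("jobs") / slug
    return {
        # Prefixes (trailing slash) for listing everything a job wrote.
        "files_out": f"{root / 'files_out'}/",
        "model_out": f"{root / 'model_out'}/",
        # Individual files named in the web UI steps above.
        "training_log": str(root / "files_out" / "training_log.txt"),
        "checkpoint": str(root / "model_out" / "checkpoint.pt"),
    }
```

Pass the resulting keys to whichever client you use to read your bucket.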
Failed Jobs
On FAILED, the dashboard shows the error message and worker logs. Common causes:
- Worker hit CUDA OOM → raise estimated_memory_mb or pick a larger GPU class.
- Shard count mismatch → backend needs exactly N workers for N shards. Re-upload to match or use auto_select_workers=True.
- model_def.py missing / broken: see source-extraction notes.
- requirements.txt resolution failure: check spelling and version pins; prefer exact versions for reproducibility.
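For the last cause, a quick local check can flag requirements that are not pinned to an exact version before you submit. This is a rough sketch that only looks for `==` on each line. `unpinned_requirements` is a hypothetical helper, and it does not replicate pip's full requirements-file parsing.

```python
from typing import List


def unpinned_requirements(text: str) -> List[str]:
    """Return requirement lines that lack an exact '==' version pin."""
    flagged = []
    for raw in text.splitlines():
        # Drop inline comments and surrounding whitespace.
        line = raw.split(" #", 1)[0].strip()
        # Skip blanks, full-line comments, and pip options such as "-r other.txt".
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        # Ignore environment markers (after ';') when looking for the pin.
        spec = line.split(";", 1)[0]
        if "==" not in spec:
            flagged.append(line)
    return flagged
```

Running it over the contents of requirements.txt returns the lines worth tightening; an empty list means every requirement has an exact pin.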
Cancelling
Stop a running job from the dashboard (Job detail → Stop) or via the backend API directly:
The SDK doesn't expose a dedicated stop() wrapper yet — this is the same endpoint the UI uses.
Archival
Bucket contents are not auto-deleted. Once you have the outputs you want:
Or use the file browser's folder delete in the web UI (recursive by default).