Auth & Login
AuthError: 401 Unauthorized: Invalid credentials
Wrong email or password. The SDK retries with a refresh token once before raising.
AuthError: Login response did not contain an access token
Almost always a wrong base_url. Open it in a browser — a healthy deployment returns a 404 page, not a login portal or a blank 200.
Storage / Bucket
StorageError: No S3 bucket configured
You haven't provisioned a bucket yet. Open the web UI → Profile → Storage → Provision Bucket, pick an alias and quota, and paste the returned secret into ResonTechConfig.
StorageError: Could not resolve your storage bucket
Login succeeded but GET /api/users/storage/bucket returned 404. Same fix — provision a bucket.
StorageError: s3_access_key_id and s3_secret_access_key are required
You constructed ResonTechConfig with empty strings. Pass both values before creating ResonTech(config).
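One way to catch empty credentials before they reach ResonTechConfig is to validate them where they are read. This is a minimal sketch that assumes the keys live in environment variables; the helper require_env and the variable names in the comments are hypothetical, not part of the SDK.

```python
import os


def require_env(name: str) -> str:
    """Return a non-empty environment variable or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is unset or empty; set it before building the config")
    return value


# Hypothetical variable names; use whatever your deployment defines.
# access_key = require_env("RT_S3_ACCESS_KEY_ID")
# secret_key = require_env("RT_S3_SECRET_ACCESS_KEY")
# Then pass them as s3_access_key_id / s3_secret_access_key to ResonTechConfig.
```

Failing at read time points at the missing variable directly instead of surfacing later as a StorageError.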
StorageError: … AccessDenied
Credentials in s3_access_key_id / s3_secret_access_key don't match the bucket on your account. Rotate the key from Profile → Storage → Rotate Key and update your config.
StorageError: … InvalidAccessKeyId
The access key no longer exists in Garage. Someone (maybe you) already rotated it. Rotate again to get a fresh pair.
Slow or stalled shard upload
Check your upstream bandwidth. The SDK uses 50 MB parts × 5 concurrent threads — a 50 MB part on a 10 Mbit/s link takes ~40 s. Watch stderr — the _ProgressPrinter shows per-file percent.
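The ~40 s figure above is plain arithmetic, and the same estimate can be applied to your own link. A rough sketch, assuming decimal units (1 MB = 8 Mbit), no protocol overhead, and concurrent parts sharing the link evenly; the function name is illustrative only.

```python
def part_upload_seconds(part_mb: float, link_mbit_per_s: float, concurrent_parts: int = 1) -> float:
    """Estimate seconds to upload one part when the link is shared evenly.

    Uses decimal units (1 MB = 8 Mbit) and ignores protocol overhead.
    """
    per_part_rate = link_mbit_per_s / concurrent_parts
    return part_mb * 8 / per_part_rate


print(part_upload_seconds(50, 10))     # 40.0, one part with the full link
print(part_upload_seconds(50, 10, 5))  # 200.0, five parts sharing the link
```

With five parts in flight, each part finishes more slowly, so a progress percentage that appears stalled on a slow link may simply be waiting on a full batch of parts.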
Submission
ValidationError: No shard zip files found in './shards'
shards_dir must contain at least one *.zip. The filename pattern is free-form — shard_0.zip, part-01.zip, anything with a .zip suffix works.
ValidationError: shards_dir is not a directory
Typo or wrong path — confirm Path(shards_dir).is_dir() locally first.
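Both shard checks can be reproduced locally before submitting. The sketch below mirrors the two rules described above (the path is a directory, and it holds at least one *.zip); find_shards is a hypothetical helper, not an SDK function.

```python
from pathlib import Path


def find_shards(shards_dir: str) -> list[Path]:
    """Return the sorted *.zip files in shards_dir, raising early on bad input."""
    root = Path(shards_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"{root.resolve()} is not a directory")
    shards = sorted(p for p in root.glob("*.zip") if p.is_file())
    if not shards:
        raise FileNotFoundError(f"no *.zip files in {root.resolve()}")
    return shards
```

Printing the resolved absolute path in the error makes a wrong working directory obvious, which is a common cause in notebooks.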
ValidationError: model_checkpoint must be a .pt file
The backend enforces exactly one .pt in model/. Rename or convert your checkpoint before passing it.
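A similar local check for the checkpoint path, as a sketch only: it tests the .pt suffix and that the file exists, and does not attempt to reproduce whatever else the backend validates. check_checkpoint is a hypothetical name.

```python
from pathlib import Path


def check_checkpoint(model_checkpoint: str) -> Path:
    """Return the checkpoint path if it is an existing file ending in .pt."""
    path = Path(model_checkpoint)
    if path.suffix != ".pt":
        raise ValueError(f"model_checkpoint must be a .pt file, got {path.name!r}")
    if not path.is_file():
        raise FileNotFoundError(f"{path.resolve()} does not exist")
    return path
```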
HTTP 400: Folder not found in your S3 bucket: "/jobs/foo/scripts"
The submit backend re-verifies every path. Usually means an upload silently failed earlier — re-run rt_submit or inspect the bucket with sdk.storage.list("jobs/foo/").
HTTP 400: Shard count (5) does not match worker count (3)
You passed explicit worker_ids=[...] whose length differs from the number of shards. Either upload a matching number of shards or pass one worker per shard. With auto_select_workers=True, the backend picks the workers for you, so the counts cannot drift apart.
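The mismatch can be caught before the request is sent. A minimal sketch, assuming the rule is simply one worker per shard as the error message suggests; check_worker_count is a hypothetical helper and the worker ID type is left open.

```python
from typing import Optional, Sequence


def check_worker_count(shard_count: int, worker_ids: Optional[Sequence] = None) -> None:
    """Raise if explicit worker_ids cannot be paired one-to-one with shards."""
    if worker_ids is None:
        return  # nothing explicit to check, e.g. when auto_select_workers=True
    if len(worker_ids) != shard_count:
        raise ValueError(
            f"Shard count ({shard_count}) does not match worker count ({len(worker_ids)})"
        )
```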
Source Extraction
ValueError: Could not extract source of 'MyModel'
Happens in some notebook kernels where torch patches inspect. Workarounds: save the class to a .py file and set ModelConfig(model_class="my_module.MyModel"), or restart the kernel and define the class before importing torch.
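To see whether the current kernel can recover a class's source at all, you can probe it with the standard library. This sketch assumes the SDK relies on something equivalent to inspect.getsource, which this page implies but does not state; can_extract_source is a hypothetical helper.

```python
import inspect


def can_extract_source(obj) -> bool:
    """Return True if inspect can locate the source code of obj."""
    try:
        inspect.getsource(obj)
    except (OSError, TypeError):
        # OSError: source file or definition not found; TypeError: built-in object.
        return False
    return True
```

If this returns False for your class in the notebook but True after moving the class into a .py file, the file-based workaround is the one to use.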
NameError: name 'nn' is not defined (server-side)
Imports live in a different cell from the class. Consolidate them into the same cell and resubmit. Applies to model=, executor=, persistor=.
NameError: name 'torch' is not defined on the worker
Same fix — put import torch in the same cell as your class.
Network / Infrastructure
ResonTechError: Cannot reach <base_url>
Backend is down or your network blocks it. Try the same URL in a browser.
Presigned shard URLs 403 the worker
Garage CORS or the bucket's platform-write-key grant is misconfigured. This is a platform-side issue — contact your admin.
Uploads fail with InvalidRequest: The Content-Md5 you specified was invalid
Rare boto3 regression on old versions. Upgrade: pip install -U boto3 (≥ 1.34 required).
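To confirm which boto3 version the current environment actually loads, the standard library's importlib.metadata is enough. A small sketch; meets_minimum is a hypothetical helper that compares only the major and minor components.

```python
from importlib.metadata import PackageNotFoundError, version


def meets_minimum(installed: str, minimum: tuple = (1, 34)) -> bool:
    """Compare the leading major.minor of a version string against a minimum."""
    major, minor = (int(part) for part in installed.split(".")[:2])
    return (major, minor) >= minimum


try:
    installed = version("boto3")
    print(installed, "ok" if meets_minimum(installed) else "too old, upgrade")
except PackageNotFoundError:
    print("boto3 is not installed in this environment")
```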
Jupyter / Notebook Issues
ModuleNotFoundError: No module named 'resontech'
The SDK is not visible to the Python interpreter that the notebook kernel is running. Compare sys.executable inside the notebook against the Python that the SDK was installed into; they need to match. The example notebooks use sys.executable when wiring up dependencies to avoid this mismatch.
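The comparison can be run in a single notebook cell. A sketch using only the standard library; kernel_report is a hypothetical helper, and the module name defaults to resontech as it appears in the error above.

```python
import importlib.util
import sys


def kernel_report(module_name: str = "resontech") -> dict:
    """Show which interpreter is running and whether it can import module_name."""
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,
        "importable": spec is not None,
        "location": getattr(spec, "origin", None),
    }


print(kernel_report())
# To install into this exact interpreter from a notebook cell, one common
# pattern is:  !{sys.executable} -m pip install <your SDK package or path>
```

If "importable" is False, the interpreter shown under "executable" is the one that needs the SDK installed.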
Kernel ignores SDK source changes after editing
Use importlib.reload after editing SDK source — the example notebooks include a reload cell at the top.
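A reload cell might look like the following. This is a sketch rather than the cell shipped in the example notebooks; reload_package is a hypothetical helper that reloads every already-imported module under a package name.

```python
import importlib
import sys


def reload_package(package_name: str) -> list:
    """Reload every already-imported module of a package, deepest first."""
    names = [
        name for name in list(sys.modules)
        if name == package_name or name.startswith(package_name + ".")
    ]
    # Deeper submodules first so parents re-import the refreshed children.
    names.sort(key=lambda name: name.count("."), reverse=True)
    for name in names:
        importlib.reload(sys.modules[name])
    return names


# In a notebook: reload_package("resontech")
```

Objects created before the reload, and names bound earlier with from ... import ..., still point at the old code, so re-run the cells that import from the SDK and construct its objects afterwards.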
Where Do I See the Real Error?
The job dashboard (job.dashboard_url) is the source of truth for runtime errors. Worker logs, training stdout, and Python tracebacks all end up there. The SDK only surfaces errors from the submission pipeline itself.