SDK — Troubleshooting | ResonTech Docs

Auth & Login

`AuthError: 401 Unauthorized: Invalid credentials`

Wrong email or password. The SDK retries with a refresh token once before raising.

`AuthError: Login response did not contain an access token`

Almost always a wrong base_url. Open it in a browser — a healthy deployment returns a 404 page, not a login portal or a blank 200.

Storage / Bucket

`StorageError: No S3 bucket configured`

You haven't provisioned a bucket yet. Open the web UI → Profile → Storage → Provision Bucket, pick an alias and quota, and paste the returned secret into ResonTechConfig.

`StorageError: Could not resolve your storage bucket`

`StorageError: s3_access_key_id and s3_secret_access_key are required`

You constructed ResonTechConfig with empty strings. Pass both values before creating ResonTech(config).

`StorageError: … AccessDenied`

Credentials in s3_access_key_id / s3_secret_access_key don't match the bucket on your account. Rotate the key from Profile → Storage → Rotate Key and update your config.

`StorageError: … InvalidAccessKeyId`

The access key no longer exists in Garage. Someone (maybe you) already rotated it. Rotate again to get a fresh pair.

Slow or stalled shard upload

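Check your upstream bandwidth. The SDK uploads 50 MB parts on 5 concurrent threads. A single 50 MB part takes ~40 s on a 10 Mbit/s link even with the whole link to itself, so five parts sharing that link need roughly 200 s to finish. Watch stderr: the _ProgressPrinter shows per-file percent.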
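The timing above is plain arithmetic, and a small helper makes it easy to plug in your own link speed. This is a rough sketch that assumes decimal units (1 MB = 8 Mbit), an evenly shared link, and no protocol overhead; `part_upload_seconds` is a hypothetical name, not part of the SDK.

```python
def part_upload_seconds(part_mb: float, link_mbit_s: float, concurrent_parts: int = 1) -> float:
    """Estimate seconds for one part to finish when `concurrent_parts` share the link equally."""
    part_mbit = part_mb * 8  # 1 MB = 8 Mbit (decimal units)
    per_part_rate = link_mbit_s / concurrent_parts  # each part gets an equal slice of the link
    return part_mbit / per_part_rate


print(part_upload_seconds(50, 10))     # one 50 MB part alone on a 10 Mbit/s link -> 40.0
print(part_upload_seconds(50, 10, 5))  # five parts sharing the same link -> 200.0
```

If the estimate for your link runs to minutes per part, a progress display that appears frozen may simply be a slow upload rather than a stalled one.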

Submission

`ValidationError: No shard zip files found in './shards'`

shards_dir must contain at least one *.zip. The filename pattern is free-form — shard_0.zip, part-01.zip, anything with a .zip suffix works.

`ValidationError: shards_dir is not a directory`

Typo or wrong path — confirm Path(shards_dir).is_dir() locally first.

`ValidationError: model_checkpoint must be a .pt file`

The backend enforces exactly one .pt in model/. Rename or convert your checkpoint before passing it.
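The three validation errors above can be caught locally before submitting. The following is a minimal sketch of such a check, not the SDK's own validation: `preflight` is a hypothetical helper, and it only tests the checkpoint's file suffix, not the backend's "exactly one .pt in model/" rule.

```python
from pathlib import Path


def preflight(shards_dir: str, model_checkpoint: str) -> list[str]:
    """Return a list of problems found; an empty list means the local checks passed."""
    problems = []
    shards = Path(shards_dir)
    if not shards.is_dir():
        problems.append(f"shards_dir is not a directory: {shards}")
    elif not any(shards.glob("*.zip")):
        # Any filename is fine as long as it ends in .zip
        problems.append(f"no *.zip files in {shards}")
    if Path(model_checkpoint).suffix != ".pt":
        problems.append(f"model_checkpoint is not a .pt file: {model_checkpoint}")
    return problems
```

Calling `preflight("./shards", "model/checkpoint.pt")` and printing the result before submission shows every local problem at once, rather than one error per attempt.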

`HTTP 400: Folder not found in your S3 bucket: "/jobs/foo/scripts"`

The submit backend re-verifies every path. Usually means an upload silently failed earlier — re-run rt_submit or inspect the bucket with sdk.storage.list("jobs/foo/").

`HTTP 400: Shard count (5) does not match worker count (3)`

You passed explicit worker_ids=[...] with a different length than the shard count. Either change the number of shards you upload or pass one worker per shard. With auto_select_workers=True, the backend selects a matching number of workers for you.

Source Extraction

`ValueError: Could not extract source of 'MyModel'`

Happens in some notebook kernels where torch patches inspect. Workarounds: save the class to a .py file and set ModelConfig(model_class="my_module.MyModel"); or restart the kernel before importing torch and define the class first.
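You can test whether a class's source is readable in the current kernel before submitting. This sketch assumes the SDK relies on the standard `inspect.getsource` mechanism, which the source does not confirm; `can_extract_source` is a hypothetical helper.

```python
import inspect


def can_extract_source(obj) -> bool:
    """Return True if inspect.getsource can read the source of `obj`."""
    try:
        inspect.getsource(obj)
    except (OSError, TypeError):
        # OSError: source file or notebook cell not found; TypeError: built-in object
        return False
    return True
```

If `can_extract_source(MyModel)` returns False in your notebook, moving the class into a .py file is likely the more reliable of the two workarounds.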

`NameError: name 'nn' is not defined` (server-side)

Imports live in a different cell from the class. Consolidate them into the same cell and resubmit. Applies to model=, executor=, persistor=.

`NameError: name 'torch' is not defined` on the worker

Same fix — put import torch in the same cell as your class.

Network / Infrastructure

`ResonTechError: Cannot reach <base_url>`

Backend is down or your network blocks it. Try the same URL in a browser.

Presigned shard URLs 403 the worker

Garage CORS or the bucket's platform-write-key grant is misconfigured. This is a platform-side issue — contact your admin.

Uploads fail with `InvalidRequest: The Content-Md5 you specified was invalid`

Rare boto3 regression on old versions. Upgrade: pip install -U boto3 (≥ 1.34 required).
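To confirm which boto3 version the current environment actually has, you can query the installed package metadata. A small sketch follows; `meets_minimum` is a hypothetical helper that compares only the major and minor components and assumes a dotted version string such as 1.34.0.

```python
from importlib.metadata import PackageNotFoundError, version


def meets_minimum(installed: str, minimum: tuple[int, int] = (1, 34)) -> bool:
    """Compare the major.minor prefix of a version string against `minimum`."""
    major, minor = installed.split(".")[:2]
    return (int(major), int(minor)) >= minimum


try:
    installed = version("boto3")
    print(f"boto3 {installed}:", "ok" if meets_minimum(installed) else "upgrade needed")
except PackageNotFoundError:
    print("boto3 is not installed in this environment")
```

Run this in the same interpreter that runs the SDK, since a notebook kernel and a terminal can see different installations.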

Jupyter / Notebook Issues

`ModuleNotFoundError: No module named 'resontech'`

The SDK is not visible to the Python interpreter the notebook kernel is running. Compare sys.executable inside the notebook against the Python that your SDK was installed into; they need to match. The example notebooks use sys.executable when wiring up dependencies to avoid this mismatch.
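A quick diagnostic cell can show which interpreter the kernel runs and whether the package is importable from it. This is a sketch using only the standard library; `describe_environment` is a hypothetical helper, and the package name `resontech` is taken from the error message above.

```python
import importlib.util
import sys


def describe_environment(package: str = "resontech") -> dict:
    """Report which interpreter is running and whether `package` is importable from it."""
    spec = importlib.util.find_spec(package)  # None when the top-level package is missing
    return {
        "interpreter": sys.executable,
        "importable": spec is not None,
        "location": getattr(spec, "origin", None),
    }


print(describe_environment())
```

Compare the reported interpreter with the output of `python -c "import sys; print(sys.executable)"` in the shell where you installed the SDK. Installing from inside the notebook through `sys.executable -m pip` targets the kernel's own interpreter.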

Kernel ignores SDK source changes after editing

Use importlib.reload after editing SDK source — the example notebooks include a reload cell at the top.
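If your notebook lacks such a cell, something like the following may work. It is a sketch, not the example notebooks' actual reload cell: `reload_package` is a hypothetical helper that reloads every already-imported module under a package name, deepest submodules first so the parent package re-imports fresh code.

```python
import importlib
import sys


def reload_package(prefix: str) -> list[str]:
    """Reload every already-imported module named `prefix` or starting with `prefix.`."""
    names = sorted(
        (name for name in sys.modules
         if name == prefix or name.startswith(prefix + ".")),
        key=lambda name: name.count("."),
        reverse=True,  # submodules before their parent package
    )
    reloaded = []
    for name in names:
        module = sys.modules.get(name)
        if module is not None:
            importlib.reload(module)
            reloaded.append(name)
    return reloaded
```

Call `reload_package("resontech")` after editing the source. Objects created before the reload keep their old class definitions, so recreate them afterwards; if sibling modules import from each other in a complex order, restarting the kernel is the safer option.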

Where Do I See the Real Error?

The dashboard job detail page (job.dashboard_url) is the source of truth for runtime errors. Worker logs, training stdout, Python tracebacks — they all end up there. The SDK only surfaces errors from the submission pipeline itself.
