## How Workers Receive Data
At job dispatch, each worker receives a presigned one-hour GET URL for its assigned shard ZIP in your S3 bucket. The worker:

- Downloads the ZIP using the presigned URL (direct from Garage, no proxy)
- Unpacks it flat into `/var/tmp/nvflare/data/{shard_index}/`
- Your training code accesses this path via `payload["dataset"]["data_root"]`
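The download-and-unpack step can be sketched as follows. This is an illustration of what each worker does, not platform code; `fetch_shard` and its `root` parameter are hypothetical names. In a local test you can pass a `file://` URL in place of the presigned one:

```python
import io
import urllib.request
import zipfile
from pathlib import Path

def fetch_shard(presigned_url, shard_index, root="/var/tmp/nvflare/data"):
    """Mimic the worker: download the shard ZIP and unpack it flat into data_root."""
    dest = Path(root) / str(shard_index)
    dest.mkdir(parents=True, exist_ok=True)
    # Fetch the whole archive, then extract its entries directly into dest
    with urllib.request.urlopen(presigned_url) as resp:
        zipfile.ZipFile(io.BytesIO(resp.read())).extractall(dest)
    return dest
```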
The number of shard ZIPs you upload to `jobs/{name}/shards/` equals the number of GPU workers allocated. Upload 4 shards → 4 workers run in parallel, each training on its own partition.
Shards are named `shard_0001.zip`, `shard_0002.zip`, and so on.

## ZIP Structure Rules
The ZIP is unpacked flat into `data_root`. There must be no wrapper folder.
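A quick way to catch the wrapper-folder mistake before uploading is to check that `manifest.ndjson` sits at the ZIP root. `check_shard` is an illustrative helper, not part of the platform:

```python
import zipfile

def check_shard(zip_path):
    """Raise if manifest.ndjson is not at the ZIP root (usually a wrapper-folder mistake)."""
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
    if "manifest.ndjson" not in names:
        raise ValueError(
            f"{zip_path}: no manifest.ndjson at ZIP root; entries start with {names[:3]}"
        )
```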
## Manifest Format (`manifest.ndjson`)
The default dataset adapter (used by `rt_submit()`) expects a `manifest.ndjson` file at the root of each shard ZIP. Each line is a JSON object describing one training sample:
| Field | Type | Required | Description |
|---|---|---|---|
| `id` | `str` | Yes | Unique identifier for this sample (any string) |
| `uri` | `str` | Yes | Image URI. Use `file://` for local paths relative to the worker, `https://` for remote URLs (downloaded and cached on first access) |
| `y` | `list[int]` | Yes | Label indices. Multi-label supported. Must be 0-indexed integers. |
| `meta` | `dict` | No | Optional metadata, available in `payload` but not used by the default adapter |
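For example, two manifest lines (file names and labels illustrative), one local single-label sample and one remote multi-label sample with metadata:

```json
{"id": "img_0001", "uri": "file://images/cat_01.jpg", "y": [0]}
{"id": "img_0002", "uri": "https://example.com/dog_02.jpg", "y": [1, 3], "meta": {"source": "web"}}
```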
### Local file URIs

When using `file://` URIs, make paths relative to the worker's `data_root`. Do not hardcode absolute paths from the machine where you built the shard.
The manifest is only required by the default adapter used by `rt_submit()`. If you write your own `model_def.py` manually, you can use any dataset format as long as your code reads from `payload["dataset"]["data_root"]`.

## Creating Shards
Split your dataset into N roughly equal partitions, one ZIP per worker:
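One way to do the split is round-robin over the manifest entries, writing one flat ZIP per worker. This is a sketch under assumptions: `make_shards` is a hypothetical helper, `entries` is a list of manifest dicts as described above, and `src_dir` holds the files referenced by the `file://` URIs:

```python
import json
import zipfile
from pathlib import Path

def make_shards(entries, src_dir, out_dir, n_workers):
    """Round-robin split of manifest entries into one flat ZIP per worker."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for i in range(n_workers):
        part = entries[i::n_workers]  # roughly equal partition for worker i
        with zipfile.ZipFile(Path(out_dir) / f"shard_{i + 1:04d}.zip", "w") as zf:
            # Manifest goes at the ZIP root, alongside the data files
            zf.writestr("manifest.ndjson",
                        "".join(json.dumps(e) + "\n" for e in part))
            for e in part:
                if e["uri"].startswith("file://"):
                    rel = e["uri"][len("file://"):]
                    zf.write(Path(src_dir) / rel, arcname=rel)  # no wrapper folder
```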
### Creating a manifest programmatically
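A minimal sketch, assuming images are organized in one folder per class and the label index comes from the folder name. `build_manifest` and its parameters are illustrative, not a platform API:

```python
import json
from pathlib import Path

def build_manifest(image_dir, class_names, out_path="manifest.ndjson"):
    """Write one JSON line per image; label index derived from the parent folder name."""
    label = {name: i for i, name in enumerate(class_names)}
    with open(out_path, "w") as f:
        for p in sorted(Path(image_dir).rglob("*.jpg")):
            entry = {
                "id": p.stem,
                # file:// paths are relative to data_root on the worker
                "uri": f"file://{p.relative_to(image_dir)}",
                "y": [label[p.parent.name]],
            }
            f.write(json.dumps(entry) + "\n")
```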
The number of manifest lines should match the `samples` value from your `fl_train_model()` call.

## Uploading Shards
Upload shard ZIPs to your bucket under `jobs/{name}/shards/`.
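With an S3-compatible client such as boto3, the upload might look like this. Everything here is an assumption except the `jobs/{name}/shards/` prefix: `shard_key`, `upload_shards`, and the `endpoint_url` pointing at your Garage S3 endpoint are illustrative:

```python
from pathlib import Path

def shard_key(job_name, zip_path):
    """Object key for a shard ZIP under the expected prefix."""
    return f"jobs/{job_name}/shards/{Path(zip_path).name}"

def upload_shards(bucket, job_name, shard_dir, endpoint_url):
    import boto3  # deferred import so shard_key works without boto3 installed
    s3 = boto3.client("s3", endpoint_url=endpoint_url)
    for zip_path in sorted(Path(shard_dir).glob("shard_*.zip")):
        s3.upload_file(str(zip_path), bucket, shard_key(job_name, zip_path))
```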
### Verifying shard count
Before submitting, verify that the number of shards in your bucket matches your intended worker count. The platform will allocate exactly N workers for N shard files.
## Non-Image Datasets
The manifest format and default adapter are designed for image classification. For other data types (text, tabular, audio, time series), write a custom `model_def.py` that reads directly from `payload["dataset"]["data_root"]`:
Pack your data files directly into the shard ZIP without a manifest; `data_root` will then contain whatever files you put at the ZIP root.
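For example, a custom loader for a text shard might read every `.txt` file the ZIP placed in `data_root`. This is a sketch assuming a hypothetical `load_text_shard` function inside your `model_def.py`:

```python
from pathlib import Path

def load_text_shard(payload):
    """Custom loader sketch: read every .txt file the shard placed in data_root."""
    data_root = Path(payload["dataset"]["data_root"])
    return {p.stem: p.read_text() for p in sorted(data_root.glob("*.txt"))}
```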