Dataset Format & Sharding | ResonTech Docs

How Workers Receive Data

At job dispatch, each worker receives a presigned 1-hour GET URL for its assigned shard ZIP from your S3 bucket. The worker:

Downloads the ZIP using the presigned URL (direct from Garage, no proxy)
Unpacks it flat into /var/tmp/nvflare/data/{shard_index}/
Exposes this path to your training code as payload["dataset"]["data_root"]

The number of shard ZIPs you upload to jobs/{name}/shards/ equals the number of GPU workers allocated. Upload 4 shards → 4 workers run in parallel, each training on its own partition.

Shards are assigned to workers in sort order by filename. Name them consistently: shard_0001.zip, shard_0002.zip, etc.
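The sketch below shows why zero-padded names matter. It assumes the platform's sort is a plain lexicographic sort on the filename, which this page does not state explicitly; shard_name is a hypothetical helper, not part of the platform.

```python
def shard_name(index: int, width: int = 4) -> str:
    """Return a zero-padded shard filename such as shard_0001.zip."""
    return f"shard_{index:0{width}d}.zip"


# Unpadded names sort lexicographically, so shard_10 lands before shard_2.
unpadded = sorted(f"shard_{i}.zip" for i in (1, 2, 10))
# Zero-padded names keep numeric order under the same sort.
padded = sorted(shard_name(i) for i in (1, 2, 10))

print(unpadded)  # ['shard_1.zip', 'shard_10.zip', 'shard_2.zip']
print(padded)    # ['shard_0001.zip', 'shard_0002.zip', 'shard_0010.zip']
```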

ZIP Structure Rules

The ZIP is unpacked flat into data_root, so do not add a wrapper folder inside the archive. The manifest and data must sit at the root of the ZIP.
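A local check before upload can catch a wrapper folder early. The following is a minimal sketch using only the Python standard library; check_flat_shard is a hypothetical helper, and the require_manifest flag is an assumption that lets you reuse it for shards without a manifest.

```python
import zipfile


def check_flat_shard(zip_path: str, require_manifest: bool = True) -> list[str]:
    """Return a list of structural problems found in a shard ZIP (empty if none)."""
    problems = []
    with zipfile.ZipFile(zip_path) as zf:
        # Ignore explicit directory entries; only files matter here.
        names = [n for n in zf.namelist() if not n.endswith("/")]
    if require_manifest and "manifest.ndjson" not in names:
        problems.append("manifest.ndjson is not at the archive root")
    # If every file shares one top-level directory, the ZIP has a wrapper folder.
    tops = {n.split("/", 1)[0] for n in names if "/" in n}
    if names and all("/" in n for n in names) and len(tops) == 1:
        problems.append(f"all files sit under a wrapper folder: {tops.pop()}/")
    return problems
```

An empty list means the archive root looks as described above; anything else is worth fixing before you upload.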

Manifest Format (manifest.ndjson)

The default dataset adapter (used by rt_submit()) expects a manifest.ndjson file at the root of each shard ZIP. Each line is a JSON object describing one training sample:

Field	Type	Required	Description
id	str	Yes	Unique identifier for this sample (any string)
uri	str	Yes	Image URI. Use file:// for files on the worker's local disk, or https:// for remote URLs (downloaded and cached on first access)
y	list[int]	Yes	Label indices. Multi-label supported. Must be 0-indexed integers.
meta	dict	No	Optional metadata — available in payload but not used by the default adapter
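To make the table concrete, here is a hedged sketch that parses one manifest line and checks it against the fields listed above. The sample values (img-001, the example.com URL, the meta content) are illustrative only, and validate_manifest_line is a hypothetical helper; the default adapter may apply different or additional checks.

```python
import json


def validate_manifest_line(line: str) -> dict:
    """Parse one manifest.ndjson line and check the documented fields."""
    record = json.loads(line)
    for field in ("id", "uri", "y"):
        if field not in record:
            raise ValueError(f"missing required field: {field}")
    if not isinstance(record["id"], str) or not isinstance(record["uri"], str):
        raise ValueError("id and uri must be strings")
    labels = record["y"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(labels, list) or not all(
        isinstance(v, int) and not isinstance(v, bool) and v >= 0 for v in labels
    ):
        raise ValueError("y must be a list of non-negative integers")
    if "meta" in record and not isinstance(record["meta"], dict):
        raise ValueError("meta must be a JSON object")
    return record


example = '{"id": "img-001", "uri": "https://example.com/cat.jpg", "y": [0, 3], "meta": {"split": "train"}}'
print(validate_manifest_line(example)["y"])  # [0, 3]
```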

Local file URIs

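When using file:// URIs, the path must resolve on the worker, inside its data_root. Because each shard is unpacked into /var/tmp/nvflare/data/{shard_index}/, write the URI with the absolute path the file will have on the worker.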
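As an illustration, the sketch below builds such a URI from a data_root and a path inside the ZIP. The data_root value shown assumes the shard index is rendered as 0, and images/cat.jpg is a made-up path; check the actual data_root on your workers before relying on either. worker_file_uri is a hypothetical helper.

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def worker_file_uri(data_root: str, relative_path: str) -> str:
    """Build an absolute file:// URI for a file unpacked under data_root."""
    absolute = PurePosixPath(data_root) / relative_path
    # quote() percent-encodes spaces and other unsafe characters, keeping "/".
    return "file://" + quote(str(absolute))


# Assumed data_root for shard index 0; the real value comes from the platform.
print(worker_file_uri("/var/tmp/nvflare/data/0", "images/cat.jpg"))
# file:///var/tmp/nvflare/data/0/images/cat.jpg
```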

The manifest-based format is required for the default adapter and rt_submit(). If you write your own model_def.py manually, you can use any dataset format as long as your code reads from payload["dataset"]["data_root"].

Creating Shards

Split your dataset into N roughly equal partitions, one ZIP per worker:
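One way to do the split is a round-robin deal followed by packing each partition into its own ZIP. This is a sketch under assumptions: the function names and the dataset/ source directory are hypothetical, and for the default adapter you would also add a manifest.ndjson to each archive root.

```python
import zipfile
from pathlib import Path


def split_round_robin(items: list, n_shards: int) -> list[list]:
    """Deal items into n_shards partitions whose sizes differ by at most one."""
    return [items[i::n_shards] for i in range(n_shards)]


def write_shard_zip(zip_path: str, source_dir: str, relative_files: list[str]) -> None:
    """Pack files into a ZIP using paths relative to source_dir (no wrapper folder)."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for rel in relative_files:
            # arcname keeps the entry relative, so it unpacks flat into data_root.
            zf.write(Path(source_dir) / rel, arcname=rel)


# Example usage (paths are placeholders):
# parts = split_round_robin(sorted(all_relative_files), 4)
# for i, part in enumerate(parts, start=1):
#     write_shard_zip(f"shard_{i:04d}.zip", "dataset", part)
```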

Creating a manifest programmatically
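A manifest is one JSON object per line, so the standard json module is enough to write it. The sketch below is a minimal example; make_record and write_manifest are hypothetical helpers, and the field names follow the manifest table on this page.

```python
import json
from typing import Optional


def make_record(sample_id: str, uri: str, labels: list[int],
                meta: Optional[dict] = None) -> dict:
    """Build one manifest entry with the required id, uri, and y fields."""
    record = {"id": sample_id, "uri": uri, "y": labels}
    if meta is not None:
        record["meta"] = meta  # optional; not used by the default adapter
    return record


def write_manifest(records: list[dict], manifest_path: str) -> int:
    """Write one JSON object per line and return the number of samples written."""
    with open(manifest_path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(records)
```

Write the file as manifest.ndjson and place it at the root of the shard ZIP, next to the data it references.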

Shards do not need to be equal in size. FedAvg weights each worker's contribution by its sample count, so return the correct samples value from your fl_train_model().
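To see why the samples value matters, here is the standard sample-weighted average that FedAvg is built on, written for plain Python lists. It is an illustration of the formula only; the platform's aggregator is not shown on this page and may differ in detail. The fedavg function is a hypothetical name.

```python
def fedavg(updates: list[tuple[list[float], int]]) -> list[float]:
    """Average parameter vectors, weighting each by its worker's sample count."""
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]


# A worker with 300 samples pulls the average three times as hard as one with 100.
print(fedavg([([1.0, 2.0], 100), ([3.0, 6.0], 300)]))  # [2.5, 5.0]
```

If a worker reports the wrong sample count, its update is over- or under-weighted in the average by the same proportion.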

Uploading Shards

Upload shard ZIPs to your bucket under jobs/{name}/shards/.
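Any S3-compatible client can do the upload. The sketch below takes a client object that exposes upload_file(Filename, Bucket, Key), as a boto3 S3 client does; the bucket name, job name, and local directory are placeholders you supply, and upload_shards is a hypothetical helper.

```python
from pathlib import Path


def upload_shards(s3_client, bucket: str, job_name: str, shard_dir: str) -> list[str]:
    """Upload every *.zip in shard_dir to jobs/{job_name}/shards/ and return the keys."""
    keys = []
    for path in sorted(Path(shard_dir).glob("*.zip")):
        key = f"jobs/{job_name}/shards/{path.name}"
        # Same positional signature as boto3's S3 client upload_file.
        s3_client.upload_file(str(path), bucket, key)
        keys.append(key)
    return keys
```

With boto3 installed, the client would typically come from boto3.client("s3", endpoint_url=...), where the endpoint and credentials are specific to your deployment and are not given here.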

Verifying shard count

Before submitting, verify that the number of shards in your bucket matches your intended worker count. The platform will allocate exactly N workers for N shard files.
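A quick way to verify is to count the ZIP objects under the job's shard prefix. This sketch assumes a client with a boto3-style list_objects_v2 paginator; count_shards is a hypothetical helper, and the bucket and job names are placeholders.

```python
def count_shards(s3_client, bucket: str, job_name: str) -> int:
    """Count the .zip objects stored under jobs/{job_name}/shards/."""
    prefix = f"jobs/{job_name}/shards/"
    paginator = s3_client.get_paginator("list_objects_v2")
    count = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from the response when the prefix is empty.
        for obj in page.get("Contents", []):
            if obj["Key"].endswith(".zip"):
                count += 1
    return count
```

Compare the result with your intended worker count before submitting; a stray or missing ZIP changes how many workers are allocated.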

Non-Image Datasets

The manifest format and default adapter are designed for image classification. For other data types (text, tabular, audio, time series), write a custom model_def.py that reads directly from payload["dataset"]["data_root"].

Pack your data files directly into the shard ZIP without a manifest. The data_root will contain whatever files you put in the ZIP root.
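For tabular data, reading from data_root can be as simple as the sketch below. The file name train.csv and the load_rows helper are assumptions for illustration; only the payload["dataset"]["data_root"] lookup comes from this page, and how you call this from your model_def.py depends on your own training code.

```python
import csv
from pathlib import Path


def load_rows(payload: dict, filename: str = "train.csv") -> list[dict]:
    """Read a CSV packed at the shard ZIP root from the worker's data_root."""
    data_root = Path(payload["dataset"]["data_root"])
    with open(data_root / filename, newline="", encoding="utf-8") as f:
        # Each row becomes a dict keyed by the CSV header; values are strings.
        return list(csv.DictReader(f))
```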
