Submit Wizard

Step-by-step walkthrough of the 7-step job submission wizard — from naming a job to launching workers.

Overview

The Submit wizard at /dashboard/submit walks you through 7 steps. Progress is auto-saved to localStorage at every step — close the browser and resume later from the Resume Draft list.

| Step | Storage writes | What you do |
| --- | --- | --- |
| 1. Name | None | Enter a job name |
| 2. Training | None | Set rounds, batch size, learning rate, local epochs |
| 3. Federation | None | Set min clients and aggregation strategy |
| 4. Model | None | Set model class, pretrained flag, num_classes |
| 5. Advanced | On "Build" | Review & edit generated configs and scripts, then click Build to write to bucket |
| 6. Validate | User uploads | Upload model_def.py, shard zips, optional checkpoint |
| 7. Launch | Read only | Submit job; workers are dispatched |

[Figure: Submit wizard step progress indicator]

Steps 1–4: Name, Training, Federation, Model

Step 1 — Name

Enter a human-readable name, e.g. mnist-round-1. The platform sanitizes it to an S3-safe prefix: jobs/mnist-round-1/.
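
The exact sanitization rules are not documented here. As a rough sketch of what turning a display name into an S3-safe prefix might look like, the hypothetical `job_prefix` helper below lowercases the name and collapses anything outside `[a-z0-9._-]` into a hyphen. Both the function name and the rules are assumptions for illustration, not the platform's actual behaviour.

```python
import re


def job_prefix(name: str) -> str:
    """Turn a display name into a jobs/<slug>/ prefix (hypothetical rules)."""
    slug = name.strip().lower()
    # Collapse each run of characters outside [a-z0-9._-] into one hyphen.
    slug = re.sub(r"[^a-z0-9._-]+", "-", slug)
    # Drop separators left dangling at either end.
    slug = slug.strip("-.")
    if not slug:
        raise ValueError("job name has no usable characters")
    return f"jobs/{slug}/"


print(job_prefix("mnist-round-1"))   # jobs/mnist-round-1/
print(job_prefix("MNIST Round #1"))  # jobs/mnist-round-1/
```

Under these assumed rules, two different display names can map to the same prefix, so a real implementation would also need a collision check.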

Step 2 — Training

A flexible key-value hyperparameter editor. Every row becomes a field in config_fed_client.json → executors[0].executor.args. You can add, remove, or rename any parameter. Defaults:

| Default param | Value | Where it ends up |
| --- | --- | --- |
| local_epochs | 2 | config_fed_client.json → args.local_epochs |
| batch_size | 32 | config_fed_client.json → args.batch_size |
| learning_rate | 0.001 | config_fed_client.json → args.learning_rate |

Note: Add num_classes here if your model needs it. It is injected into both args (executor) and persistor.model.args (server).

[Figure: Step 2, Training Config form]
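
To make the mapping concrete, here is a minimal sketch of how the editor rows could be merged into the two config files. The `apply_training_rows` helper, the skeleton config dictionaries, and the lookup of the persistor by an `id` of `"persistor"` are assumptions for illustration; only the target paths come from the tables in this page.

```python
import json

# Defaults listed in the Step 2 table.
DEFAULT_ROWS = {"local_epochs": 2, "batch_size": 32, "learning_rate": 0.001}


def apply_training_rows(client_cfg, server_cfg, rows):
    """Copy editor rows into the client executor args.

    num_classes, if present, is also mirrored into the server persistor's
    model args. Hypothetical helper, not the platform's code.
    """
    client_cfg["executors"][0]["executor"]["args"].update(rows)
    if "num_classes" in rows:
        for component in server_cfg["components"]:
            if component.get("id") == "persistor":  # assumed component id
                model = component["args"]["model"]
                model.setdefault("args", {})["num_classes"] = rows["num_classes"]
    return client_cfg, server_cfg


# Skeleton configs holding only the fields this sketch touches.
client_cfg = {"executors": [{"executor": {"args": {}}}]}
server_cfg = {
    "components": [
        {"id": "persistor", "args": {"model": {"path": "model_def.MyModelWrapper"}}}
    ]
}

rows = dict(DEFAULT_ROWS, num_classes=10)
apply_training_rows(client_cfg, server_cfg, rows)
print(json.dumps(client_cfg, indent=2))
print(json.dumps(server_cfg, indent=2))
```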

Step 3 — Federation

| Field | Default | Where it ends up |
| --- | --- | --- |
| Num Rounds | 5 | config_fed_server.json → workflows[0].args.num_rounds |
| Min Clients | 1 | config_fed_server.json → workflows[0].args.min_clients |
| Wait After Min Received | 10 s | config_fed_server.json → workflows[0].args.wait_time_after_min_received |
| Heartbeat Timeout | 600 s | config_fed_server.json → server.heart_beat_timeout |

[Figure: Step 3, Federation Config form]
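
As an illustration of where these four fields land, the sketch below builds only the affected part of config_fed_server.json. The `federation_fragment` function and its input checks are assumptions; the key names and defaults are taken from the table above, and every other field of the real file is omitted.

```python
import json


def federation_fragment(num_rounds=5, min_clients=1,
                        wait_after_min_received=10, heartbeat_timeout=600):
    """Return the slice of config_fed_server.json that Step 3 controls.

    Hypothetical helper; timeouts are in seconds.
    """
    if num_rounds < 1 or min_clients < 1:
        raise ValueError("num_rounds and min_clients must be at least 1")
    return {
        "server": {"heart_beat_timeout": heartbeat_timeout},
        "workflows": [
            {
                "args": {
                    "num_rounds": num_rounds,
                    "min_clients": min_clients,
                    "wait_time_after_min_received": wait_after_min_received,
                }
            }
        ],
    }


print(json.dumps(federation_fragment(num_rounds=20, min_clients=2), indent=2))
```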

Step 4 — Model

| Field | Default | Where it ends up |
| --- | --- | --- |
| Model Class | model_def.MyModelWrapper | config_fed_server.json → components[persistor].args.model.path |
| Adapter Module | model_def | config_fed_client.json → args.adapter_module |
| Train Function | fl_train_model | config_fed_client.json → args.train_fn |

From Step 4 you can click Next (go to Step 5 to review configs) or Skip Advanced — both paths build the workspace. Skip Advanced writes the auto-generated defaults immediately and jumps straight to Validate.

[Figure: Step 4, Model Config form]
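
The Model Class and Train Function values have to name things that actually exist in the model_def.py you upload later. A local pre-check along the following lines can catch a mismatch before upload. This is a sketch only: `check_model_def` is a hypothetical helper, it inspects top-level definitions without importing the file, and it is not the platform's own validation.

```python
import ast


def check_model_def(source, model_class="model_def.MyModelWrapper",
                    train_fn="fl_train_model"):
    """Return a list of problems; an empty list means both names were found."""
    # "model_def.MyModelWrapper" -> class name "MyModelWrapper"
    class_name = model_class.rpartition(".")[2]
    tree = ast.parse(source)
    classes = {n.name for n in tree.body if isinstance(n, ast.ClassDef)}
    functions = {
        n.name for n in tree.body
        if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
    problems = []
    if class_name not in classes:
        problems.append(f"class {class_name} not defined")
    if train_fn not in functions:
        problems.append(f"function {train_fn} not defined")
    return problems


# Placeholder file contents; the real signature of fl_train_model is not shown here.
SAMPLE = '''
class MyModelWrapper:
    pass

def fl_train_model(*args, **kwargs):
    pass
'''

print(check_model_def(SAMPLE))                                   # []
print(check_model_def(SAMPLE, model_class="model_def.ResNet"))   # one problem
```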

Step 5: Advanced

Step 5 shows Monaco editors for the three NVFlare config files and two generated scripts — pre-populated from your Steps 2–4 settings. Existing files in your bucket are loaded automatically if the job was previously created.

Edit anything you need, then click Build workspace. This writes the config files and scripts shown in the editors directly to your Garage bucket.

Note: If you don't need to customise configs, use Skip Advanced on Step 4; it builds the workspace with generated defaults and skips directly to Validate.

Step 6: Validate

User uploads required. The platform blocks launch until required files are detected in your bucket.

Required

| File | What it does |
| --- | --- |
| model_def.py | Your model class plus fl_train_model(); the class name must match Step 4 |
| shard_0.zip, shard_1.zip, … | One .zip per GPU worker; the count determines worker allocation |

Optional

| File | What it does |
| --- | --- |
| checkpoint.pt | Pre-trained weights loaded at round 0; exactly one .pt file if provided |

Warning: Number of shard zips = number of workers reserved. If more than one .pt exists in model/, submission is blocked.

[Figure: Step 6, all assets uploaded, validation passing]
Green checkmarks appear when each required file is detected.
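
The checklist logic can be pictured as a function over the object keys found in the bucket. The sketch below is an assumption about how such a check might work: `validate_uploads` and the example key layout are hypothetical, and files are matched by basename because the exact upload paths are not specified here (only the model/ folder for checkpoints is).

```python
import re
from pathlib import PurePosixPath

SHARD_RE = re.compile(r"shard_\d+\.zip")


def validate_uploads(keys):
    """Summarise which Step 6 requirements a list of object keys satisfies."""
    paths = [PurePosixPath(k) for k in keys]
    shard_count = sum(1 for p in paths if SHARD_RE.fullmatch(p.name))
    # Checkpoints are .pt files that sit under a model/ folder.
    checkpoints = [p for p in paths if p.suffix == ".pt" and "model" in p.parts[:-1]]
    checks = {
        "model_def.py": any(p.name == "model_def.py" for p in paths),
        "shards": shard_count >= 1,
        "checkpoint": len(checkpoints) <= 1,  # optional, but never more than one
    }
    return {"checks": checks, "workers": shard_count, "ready": all(checks.values())}


# Hypothetical key layout for illustration.
keys = [
    "jobs/mnist-round-1/model_def.py",
    "jobs/mnist-round-1/shard_0.zip",
    "jobs/mnist-round-1/shard_1.zip",
    "jobs/mnist-round-1/model/checkpoint.pt",
]
result = validate_uploads(keys)
print(result["ready"], result["workers"])
```

With the four keys above, every check passes and two workers would be reserved, one per shard zip.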

Step 7: Launch

Click Launch. The API validates the bucket, counts shards, allocates workers, generates presigned shard URLs, and dispatches the NVFlare server + workers.

  1. Path validation. Verifies scripts/, configs/, requirements/ exist.
  2. model/ check. If present: exactly one .pt file required.
  3. Shard count → worker allocation. N zips → N workers reserved.
  4. Presigned shard URLs. 1-hour GET URL per (worker, shard) pair.
  5. Dispatch. NVFlare server + worker containers started.
[Figure: Step 7, launch confirmation, job created as PENDING]

Note: After launch you land on the Jobs page. The job starts as PENDING while workers spin up, then transitions to RUNNING once all required workers connect to the NVFlare server.
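
The first four launch checks can be sketched as one function that either raises or returns a per-worker plan. Everything here is an illustrative assumption rather than the API's implementation: the `plan_launch` name, the example keys, the basename match for shards, and the pairing of worker i with shard i. URL signing and container dispatch are left out.

```python
import re

REQUIRED_DIRS = ("scripts", "configs", "requirements")
URL_TTL_SECONDS = 3600  # 1-hour GET URLs
SHARD_RE = re.compile(r"(?:^|/)shard_(\d+)\.zip$")


def plan_launch(prefix, keys):
    """Run the pre-dispatch checks and return one entry per worker."""
    rel = [k[len(prefix):] for k in keys if k.startswith(prefix)]

    # 1. Path validation: each required folder must hold at least one object.
    missing = [d for d in REQUIRED_DIRS if not any(r.startswith(d + "/") for r in rel)]
    if missing:
        raise ValueError(f"missing required paths: {missing}")

    # 2. model/ check: if the folder is present, exactly one .pt file.
    model_files = [r for r in rel if r.startswith("model/")]
    if model_files and sum(r.endswith(".pt") for r in model_files) != 1:
        raise ValueError("model/ must contain exactly one .pt file")

    # 3. Shard count -> worker allocation, ordered by shard number.
    shards = sorted((int(m.group(1)), r) for r in rel if (m := SHARD_RE.search(r)))
    if not shards:
        raise ValueError("no shard zips found")

    # 4. One URL request per (worker, shard) pair; signing itself is not shown.
    return [
        {"worker": w, "shard_key": prefix + r, "expires_in": URL_TTL_SECONDS}
        for w, (_, r) in enumerate(shards)
    ]


prefix = "jobs/mnist-round-1/"
keys = [
    prefix + "scripts/train.py",               # hypothetical file name
    prefix + "configs/config_fed_server.json",
    prefix + "requirements/requirements.txt",  # hypothetical file name
    prefix + "shard_1.zip",
    prefix + "shard_0.zip",
]
plan = plan_launch(prefix, keys)
print(len(plan), plan[0]["shard_key"])
```

Against an S3-compatible store, each plan entry could then be turned into a URL with a client call such as boto3's `generate_presigned_url("get_object", ..., ExpiresIn=3600)`. Whether the platform uses boto3 is not stated here.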

Draft Jobs

The wizard saves progress to localStorage at every step.

  • Close the browser mid-wizard — progress is restored when you return.
  • Drafts are listed on the Submit page under Resume Draft.
  • Drafts are removed automatically after a successful launch.
  • You can delete a draft manually from the draft list at any time.