Platform

Train your model.
Serve it.

Submit a training job from a Python notebook. When it finishes, promote the checkpoint to an inference endpoint — one SDK, one bucket, one bill.

Live Platform Preview

The actual platform.
Click through it yourself.

Deploy an endpoint, scale replicas, walk through the submit wizard, filter the cluster — all live.

Inference
R
ResonTech
Aalice@acme.com
Inference Endpoints LIVE
Total RPS
89
Avg P50
52ms
Active
3 / 4
Replicas
6
EndpointRPSTrendP50Replicas
llama3-8b-chat
48
38ms
whisper-large-v3
72
210ms
sdxl-turbo
97
1.2s
mistral-7b-instruct
0
scaled-0
How it works

Two flows. One platform.

Submit training jobs or deploy inference endpoints — same SDK, same bucket.

DATA
01

Shard Dataset

Split your dataset into .zip shards — one per GPU worker. Upload to your S3 bucket via browser, rclone, or SDK.

FETCH
02

Bucket Pull

Workers get a short-lived presigned URL at dispatch and pull shards straight from storage. Raw data never traverses the control plane.

SUBMIT
03

Submit Job

Drop your scripts, pick GPU count and topology, hit submit. The fabric provisions clusters and starts distributed training.

RUN
04

Parallel Execution

Workers train in parallel inside each cluster. Cross-cluster sync runs over whatever interconnect you have. Checkpoints stream back every epoch.

ARTIFACTS
05

Get Your Model

Final weights land in jobs/<name>/model_out/ in your bucket. Download, deploy as an endpoint, or keep training.

Python SDK

Drive everything from Python.

One install. Two calls. Train any model on the pool you composed, then promote the checkpoint to an inference endpoint — same SDK, same auth, same bucket.

1 · Submittrain_and_serve.py
train_and_serve.py
2 · Call the endpointshell
curl
pip install resontech
What the SDK gives you
One install

pip install resontech. No CUDA, drivers, or worker dependencies on your machine.

Submit + serve in one file

sdk.rt_submit() dispatches training. sdk.rt_deploy() promotes the checkpoint to an OpenAI-compatible endpoint.

Live telemetry

job.logs(stream=True), job.status, job.metrics — tail GPU util, loss, and per-worker state from the REPL.

Bucket-native

sdk.bucket gives you a boto3-compatible handle on your S3 prefix. Workers fetch shards via presigned URLs — bytes never cross our API.

Works in notebooks, scripts, CI

No daemon, no background process. Drop it in a notebook, a GitHub Action, or a remote SSH session — same SDK, same auth.

Override anything

Custom Executor, custom Persistor, custom aggregator — pass any NVFlare subclass and the platform runs it.

Full SDK reference
What runs on the fabric

Any model. Any task. Any modality.

These are families we already see on the platform. None of them is a constraint — train whatever you want, in whatever framework you want. If it runs in a Python container, it runs here.

Computer Vision

Object detection (YOLOv5, YOLO11), classification (ViT-B/16), instance segmentation, custom architectures. Endpoints serve raw tensors or annotated outputs.

Language & LLMs

Full-parameter fine-tuning or LoRA / PEFT. Instruction tuning, function-calling agents, embeddings, RAG components. 4-bit / 8-bit quantized training supported.

Medical & Scientific

MONAI integration for medical imaging. DenseNet, U-Net, classification and segmentation. HIPAA-compatible deployment paths via Private Fabric mode.

Speech & Audio

Speech recognition (Whisper family), audio classification, custom audio models. LoRA fine-tuning supported.

Generative & Diffusion

Diffusion fine-tuning (SDXL and successors). LoRA adapters for style and subject. Trained checkpoints serve through standard diffusion APIs.

Your model

Anything else — RL agents, time-series, robotics policies, custom multimodal stacks. Bring your model class and a Dockerfile; the platform runs it.

Bring your own

Same SDK across every tier.
Start free on the public pool.

Python SDK, no credit card. Talk to engineering when you need multi-cluster orchestration or a Private Fabric inside your perimeter.

Talk to engineering