Train your model.
Serve it.
Submit a training job from a Python notebook. When it finishes, promote the checkpoint to an inference endpoint — one SDK, one bucket, one bill.
The actual platform.
Click through it yourself.
Deploy an endpoint, scale replicas, walk through the submit wizard, filter the cluster — all live.
Two flows. One platform.
Submit training jobs or deploy inference endpoints — same SDK, same bucket.
Shard Dataset
Split your dataset into .zip shards — one per GPU worker. Upload to your S3 bucket via browser, rclone, or SDK.
Bucket Pull
Workers get a short-lived presigned URL at dispatch and pull shards straight from storage. Raw data never traverses the control plane.
Submit Job
Drop your scripts, pick GPU count and topology, hit submit. The fabric provisions clusters and starts distributed training.
Parallel Execution
Workers train in parallel inside each cluster. Cross-cluster sync runs over whatever interconnect you have. Checkpoints stream back every epoch.
Get Your Model
Final weights land in jobs/<name>/model_out/ in your bucket. Download, deploy as an endpoint, or keep training.
Drive everything from Python.
One install. Two calls. Train any model on the pool you composed, then promote the checkpoint to an inference endpoint — same SDK, same auth, same bucket.
pip install resontech. No CUDA, drivers, or worker dependencies on your machine.
sdk.rt_submit() dispatches training. sdk.rt_deploy() promotes the checkpoint to an OpenAI-compatible endpoint.
job.logs(stream=True), job.status, job.metrics — tail GPU util, loss, and per-worker state from the REPL.
sdk.bucket gives you a boto3-compatible handle on your S3 prefix. Workers fetch shards via presigned URLs — bytes never cross our API.
No daemon, no background process. Drop it in a notebook, a GitHub Action, or a remote SSH session — same SDK, same auth.
Custom Executor, custom Persistor, custom aggregator — pass any NVFlare subclass and the platform runs it.
Any model. Any task. Any modality.
These are families we already see on the platform. None of them is a constraint — train whatever you want, in whatever framework you want. If it runs in a Python container, it runs here.
Computer Vision
Object detection (YOLOv5, YOLO11), classification (ViT-B/16), instance segmentation, custom architectures. Endpoints serve raw tensors or annotated outputs.
Language & LLMs
Full-parameter fine-tuning or LoRA / PEFT. Instruction tuning, function-calling agents, embeddings, RAG components. 4-bit / 8-bit quantized training supported.
Medical & Scientific
MONAI integration for medical imaging. DenseNet, U-Net, classification and segmentation. HIPAA-compatible deployment paths via Private Fabric mode.
Speech & Audio
Speech recognition (Whisper family), audio classification, custom audio models. LoRA fine-tuning supported.
Generative & Diffusion
Diffusion fine-tuning (SDXL and successors). LoRA adapters for style and subject. Trained checkpoints serve through standard diffusion APIs.
Your model
Anything else — RL agents, time-series, robotics policies, custom multimodal stacks. Bring your model class and a Dockerfile; the platform runs it.
Same SDK across every tier.
Start free on the public pool.
Python SDK, no credit card. Talk to engineering when you need multi-cluster orchestration or a Private Fabric inside your perimeter.