Platform

The End-to-End
ML Ecosystem

Distributed training with multiple parallelism strategies. Inference that's simply input in, tensors out. The full lifecycle from raw data to deployed model — in one platform.

Platform includes
Distributed Training
Inference Engine
Python SDK
Live Telemetry
Multi-GPU Scheduling
Fault Recovery
Live Platform Preview

The actual platform.
Click through it yourself.

Deploy an endpoint, scale replicas, walk through the submit wizard, filter the cluster — all live.

Distributed Training

Why we're faster:
True distribution.

We don't just run your script on multiple GPUs. We orchestrate genuine distributed training across a cluster — gradient synchronization, topology-aware node placement, and automatic parallelism configuration. Set your GPU count in the platform and we handle the rest.

  • Throughput gain (vs single-GPU baseline)
  • Avg boot time < 10s (cold cluster start)
  • Recovery time < 15s (on node failure)
  • Data throughput (via parallel sharding)
  • Topology-aware node placement — NVLink / InfiniBand preferred
  • Automatic NCCL configuration — no manual rank setup
  • Gradient checkpointing and mixed precision by default
  • Checkpoint recovery resumes from last epoch — not zero
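Topology-aware placement can be pictured as a simple preference ordering over interconnects. A minimal sketch, not the platform's actual scheduler — the node fields and greedy fill strategy here are assumptions for illustration:

```python
# Hypothetical sketch of topology-aware placement: prefer NVLink,
# then InfiniBand, then Ethernet when filling a job's GPU request.
INTERCONNECT_RANK = {"nvlink": 0, "infiniband": 1, "ethernet": 2}

def place_workers(nodes, gpus_needed):
    """Greedily take GPUs from the best-connected nodes first.

    `nodes` is a list of dicts like {"name", "gpus", "link"};
    these field names are illustrative, not the platform's schema.
    """
    placement, remaining = [], gpus_needed
    for node in sorted(nodes, key=lambda n: INTERCONNECT_RANK[n["link"]]):
        if remaining == 0:
            break
        take = min(node["gpus"], remaining)
        placement.append((node["name"], take))
        remaining -= take
    if remaining:
        raise RuntimeError("not enough free GPUs in the pool")
    return placement
```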
01
Shard

Prepare Data

Split dataset into .zip shards — one per GPU worker. Manual or via the Python SDK. Upload to your S3 bucket once.
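The manual path is a few lines of stdlib Python. A sketch, assuming a flat list of sample files and round-robin assignment — the shard naming is illustrative, not a platform requirement:

```python
# Sketch of manual sharding: split a list of sample files into one
# .zip per GPU worker, round-robin so shards stay roughly balanced.
import zipfile
from pathlib import Path

def make_shards(files, num_workers, out_dir="shards"):
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    shard_paths = [out / f"shard_{i:03d}.zip" for i in range(num_workers)]
    archives = [zipfile.ZipFile(p, "w", zipfile.ZIP_DEFLATED) for p in shard_paths]
    try:
        # Round-robin: sample i goes to shard i % num_workers.
        for i, f in enumerate(files):
            archives[i % num_workers].write(f, arcname=Path(f).name)
    finally:
        for a in archives:
            a.close()
    return shard_paths
```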

02
Fetch

Bucket Pull

We host the bucket — your private Garage storage. At dispatch each worker gets a short-lived presigned URL for its shard and pulls it straight from storage, bypassing the control plane.
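The short-lived URL idea can be illustrated with an HMAC signature plus an expiry timestamp. This is a conceptual stdlib sketch, not the platform's actual S3-style presigning on Garage:

```python
# Conceptual sketch of a short-lived signed URL: the control plane
# signs "path + expiry" with a secret; storage verifies the signature
# and rejects anything expired or tampered with.
import hashlib
import hmac
import time

SECRET = b"demo-signing-key"  # hypothetical; never hard-code real keys

def presign(path, ttl_seconds=300, now=None):
    expires = int(now if now is not None else time.time()) + ttl_seconds
    payload = f"{path}?expires={expires}"
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}&sig={sig}"

def verify(url, now=None):
    payload, _, sig = url.rpartition("&sig=")
    expires = int(payload.rsplit("expires=", 1)[1])
    good = hmac.compare_digest(
        sig, hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    )
    fresh = (now if now is not None else time.time()) < expires
    return good and fresh
```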

03
Train

Distributed Run

Workers run in parallel. Gradients sync. Checkpoints stream back to your bucket under the job folder.
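The gradient-sync step is, at its core, an all-reduce that averages each parameter's gradient across workers so every replica applies the same update. A toy plain-Python version of the arithmetic — real runs do this on-GPU via NCCL:

```python
# Toy all-reduce: average each gradient position across workers.
def allreduce_mean(per_worker_grads):
    """per_worker_grads: one gradient list per worker, equal lengths."""
    n = len(per_worker_grads)
    return [sum(col) / n for col in zip(*per_worker_grads)]
```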

04
Recover

Auto-Recovery

Node fails? Job reschedules. Resumes from last checkpoint in your bucket — not zero.
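Resume-from-checkpoint comes down to finding the newest checkpoint in the job folder and continuing from that epoch. A sketch, assuming a hypothetical `ckpt_epoch_N.pt` naming scheme:

```python
# Sketch of resume logic: scan checkpoint filenames, continue from the
# highest epoch found. The naming scheme here is an assumption.
import re

def latest_checkpoint(filenames):
    best, best_epoch = None, -1
    for name in filenames:
        m = re.fullmatch(r"ckpt_epoch_(\d+)\.pt", name)
        if m and int(m.group(1)) > best_epoch:
            best, best_epoch = name, int(m.group(1))
    return best, best_epoch  # (None, -1) means start from scratch
```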

05
Artifacts

Get Your Model

Final weights land in jobs/<name>/model_out/ in your bucket. Download via browser, rclone, or SDK.

Dataset Sharding

One shard
per GPU worker.

ResonTech uses shard-based data distribution. Your dataset is split into .zip shards — one per GPU worker. Each worker loads only its shard, with no centralized bottleneck. Do it manually, or let the SDK handle it.

Why shards?

Each worker reads from its own .zip independently — no data loader bottleneck, no network contention. Training throughput scales near-linearly with GPU count.

🗂️ Manual

Split your dataset into .zip shards yourself. One shard = one GPU worker. Drop into the shards folder path.

shard.sh
🐍 Via Python SDK
Coming Soon

Use the ResonTech Python library to split and upload your dataset to the workspace programmatically. Handles sharding, compression, and path registration automatically.

prepare_data.py
Python SDK

One import,
a whole cluster.

The resontech Python SDK handles authentication, job submission, live telemetry, and artifact retrieval — straight from your notebook or script.

Install
pip install resontech
  • One-call job submission — rt.submit(model=..., training=...)
  • Native PyTorch, HuggingFace, and federated workflows
  • Stream logs, metrics, and checkpoints from any running job
  • Artifacts synced back to your workspace when training finishes
Coming Soon
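Since the SDK isn't released yet, the following is only a sketch of the job spec a one-call `rt.submit(model=..., training=...)` might assemble behind the scenes. Every field name below is hypothetical:

```python
# Hypothetical sketch of a job spec: one call bundles model, training
# config, and GPU count; the platform picks the parallelism setup.
import json

def build_job_spec(model, training, gpus=4):
    return json.dumps(
        {
            "model": model,        # e.g. a HuggingFace repo id or checkpoint
            "training": training,  # hyperparameters for the run
            "gpus": gpus,          # platform derives placement from this
        },
        sort_keys=True,
    )
```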
Inference Engine

Model + Input
→ Tensor out.

Push your trained model. Send your input. Get tensors back. No servers to manage, no scaling to configure. We handle all of it.

01

Push Model

Upload a checkpoint, point to a HuggingFace repo, or reference a completed training job. PyTorch, TensorFlow, HuggingFace, and Triton are all supported.

02

Send Input

Send any input via our SDK or platform. Text, images, audio, embeddings. Format-agnostic. Batching handled automatically.

03

Get Tensors

Receive raw tensors or decoded outputs. Sub-10s cold start. Autoscales to zero when idle — pay per request.
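The send/receive halves of that loop boil down to encoding arbitrary input bytes into a request and reading tensors back out. A hypothetical wire format, sketched with the stdlib — the real API may differ:

```python
# Sketch of a format-agnostic request/response pair: arbitrary input
# bytes are base64-encoded into JSON; the response carries raw tensor
# values plus a shape. Field names are illustrative assumptions.
import base64
import json

def encode_request(data: bytes, endpoint="llama3-8b-chat"):
    return json.dumps(
        {
            "endpoint": endpoint,
            "input": base64.b64encode(data).decode("ascii"),
        }
    )

def decode_response(body: str):
    payload = json.loads(body)
    return payload["shape"], payload["tensor"]
```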

classify.py · Coming Soon
latency: 18ms · throughput: 3,200 img/s · cold start: 6.1s
Performance Benchmarks

From 4 days to 12 hours.

DeepLab fine-tuning · 61M parameters · 30GB dataset · identical final quality

8× Faster Training
87.5% Less Time
100% Same Quality
✗ Self-Managed: Dataset 30 GB · Batch size 6 · Duration 4 days · Multi-node —
✓ ResonTech (3-node DDP): Dataset 30 GB (3 shards) · Batch size 10 · Duration 12 hours · Multi-node ✓ auto
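The headline numbers follow directly from the two durations — 4 days is 96 hours:

```python
# Checking the benchmark math: 4 days -> 12 hours.
baseline_h = 4 * 24              # 96 hours self-managed
resontech_h = 12
speedup = baseline_h / resontech_h
reduction = 1 - resontech_h / baseline_h
assert speedup == 8.0            # 8x faster
assert reduction == 0.875        # i.e. 87.5% less time
```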
Supported Technologies

Works with what
you already use.

  • PyTorch (Framework)
  • TensorFlow (Framework)
  • HuggingFace (Models)
  • Scikit-learn (ML Library)
  • Weights & Biases (Observability)
  • CUDA (Runtime)
More coming soon

Ready to run?
Start in 60 seconds.

No credit card required for the public pool. Book a demo for a managed or private cluster.

BOOK A DEMO