The End-to-End
ML Ecosystem
Distributed training with multiple parallelism strategies. Inference: input in, tensors out. The full lifecycle from raw data to deployed model, all in one platform.
The actual platform.
Click through it yourself.
Deploy an endpoint, scale replicas, walk through the submit wizard, filter the cluster — all live.
Why we're faster:
True distribution.
We don't just run your script on multiple GPUs. We orchestrate genuine distributed training across a cluster — gradient synchronization, topology-aware node placement, and automatic parallelism configuration. Set your GPU count in the platform and we handle the rest.
- ✓ Topology-aware node placement — NVLink / InfiniBand preferred
- ✓ Automatic NCCL configuration — no manual rank setup
- ✓ Gradient checkpointing and mixed precision by default
- ✓ Checkpoint recovery resumes from last epoch — not zero
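In data-parallel terms, the gradient synchronization above is an averaging all-reduce: every worker contributes its local gradients and every worker receives the mean. A toy pure-Python sketch of that math (a real run does this with NCCL across GPUs, not Python lists):

```python
# Toy simulation of data-parallel gradient synchronization.
# In production this is an NCCL all-reduce across GPUs; here we
# average per-worker gradients in plain Python to show the math.

def all_reduce_mean(worker_grads):
    """Average each parameter's gradient across all workers."""
    n_workers = len(worker_grads)
    n_params = len(worker_grads[0])
    return [
        sum(g[i] for g in worker_grads) / n_workers
        for i in range(n_params)
    ]

# Four workers, each holding gradients for two parameters.
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
synced = all_reduce_mean(grads)
print(synced)  # [4.0, 5.0]
```

After the all-reduce, every worker applies the identical averaged gradient, which keeps model replicas in sync without any parameter server.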
Prepare Data
Split your dataset into .zip shards — one per GPU worker. Do it manually or via the Python SDK. Upload to your S3 bucket once.
Bucket Pull
We host the bucket — your private Garage storage. At dispatch each worker gets a short-lived presigned URL for its shard and pulls it straight from storage, bypassing the control plane.
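The idea behind a presigned URL: the control plane signs (path, expiry) with a secret it shares with storage, so a worker can pull its shard without ever holding bucket credentials. A conceptual stdlib sketch — this is not the real S3 SigV4 scheme Garage implements, and the key is hypothetical:

```python
# Conceptual sketch of presigned URLs: sign (path, expiry) with a
# shared secret; storage verifies the signature and the clock, so
# the bearer needs no bucket credentials. NOT real S3 SigV4.
import hashlib
import hmac

SECRET = b"demo-secret"  # hypothetical signing key

def presign(path, expires_at):
    msg = f"{path}|{expires_at}".encode()
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"{path}?expires={expires_at}&sig={sig}"

def verify(path, expires_at, sig, now):
    if now > expires_at:
        return False  # URL expired
    msg = f"{path}|{expires_at}".encode()
    expected = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)
```

Because the signature covers both path and expiry, a worker can fetch exactly one shard for a short window — and the download bypasses the control plane entirely.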
Distributed Run
Workers run in parallel. Gradients sync. Checkpoints stream back to your bucket under the job folder.
Auto-Recovery
Node fails? Job reschedules. Resumes from last checkpoint in your bucket — not zero.
Get Your Model
Final weights land in jobs/<name>/model_out/ in your bucket. Download via browser, rclone, or SDK.
One import,
a whole cluster.
The resontech Python SDK handles authentication, job submission, live telemetry, and artifact retrieval — straight from your notebook or script.
- ✓One-call job submission — rt.submit(model=..., training=...)
- ✓Native PyTorch, HuggingFace, and federated workflows
- ✓Stream logs, metrics, and checkpoints from any running job
- ✓Artifacts synced back to your workspace when training finishes
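As a sketch of what a one-call submission might assemble under the hood, here is a hypothetical job-spec builder. The field names below are assumptions for illustration, not the SDK's documented schema:

```python
# Hypothetical sketch of the job spec a call like
# rt.submit(model=..., training=...) might assemble before
# sending it to the cluster. Field names are assumptions.
def build_job_spec(model, training, gpus=1):
    return {
        "model": model,
        "training": training,
        "resources": {"gpus": gpus},
    }

spec = build_job_spec(
    model="hf://org/deeplab-v3",  # illustrative model reference
    training={"epochs": 10, "mixed_precision": True},
    gpus=8,
)
```

The point of the one-call shape is that GPU count is just a field: the platform derives rank assignment and parallelism configuration from it.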
Model + Input
→ Tensor out.
Push your trained model. Send your input. Get tensors back. No servers to manage, no scaling to configure. We handle all of it.
Push Model
Upload a checkpoint, point to a HuggingFace repo, or reference a completed training job. PyTorch, TensorFlow, HuggingFace, Triton.
Send Input
Send any input via our SDK or platform. Text, images, audio, embeddings. Format-agnostic. Batching handled automatically.
Get Tensors
Receive raw tensors or decoded outputs. Sub-10s cold start. Autoscales to zero when idle — pay per request.
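"Batching handled automatically" means requests are buffered and the model runs once per batch instead of once per request. A toy sketch of that pattern — the stand-in "model" just doubles each value:

```python
# Toy sketch of automatic batching: group incoming requests and
# run the model once per batch. The "model" here is a stand-in
# that doubles each value.
def batched_infer(requests, model, max_batch=4):
    outputs = []
    for i in range(0, len(requests), max_batch):
        batch = requests[i:i + max_batch]
        outputs.extend(model(batch))
    return outputs

double = lambda batch: [2 * x for x in batch]
print(batched_infer([1, 2, 3, 4, 5], double))  # [2, 4, 6, 8, 10]
```

Batching amortizes per-call overhead across requests, which is where much of the GPU-side throughput comes from.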
From 4 days to 12 hours.
DeepLab fine-tuning · 61M parameters · 30GB dataset · identical final quality
Works with what
you already use.
Ready to run?
Start in 60 seconds.
No credit card required for the public pool. Book a demo for a managed or private cluster.