ML & AI operating ecosystem

Compose your cluster.
Train your model.
Serve it.

Distributed training and inference, across any GPU pool. From one GPU to multiple clusters. No infrastructure to manage.

Read the docs
The proof

From notebook to multi-worker training in one call.

01

Compose your cluster

Pick GPUs from the shared pool, your reservation, or your own hardware. Mix RTX 3090 to H100 — a tuned cluster of mid-tier cards often matches a smaller cluster of flagships at the same hourly spend. Heterogeneous composition lets you actually choose.

02

Bucket as workspace

Per-account S3 bucket on signup. Browse folders, edit Python and configs in the browser, pin files side-by-side. boto3-native end-to-end.

03

Recovery without restart

Worker drops mid-run? The fabric evicts the bad node, reschedules onto a healthy one, and resumes from your last checkpoint. Not epoch zero.

04

Any framework, any model

PyTorch, TensorFlow, JAX, HuggingFace — bring your stack. YOLO11, Phi-3.5-mini LoRA, Llama-3, DenseNet on HAM10000, Whisper, SDXL LoRA, BGE — ready-to-run examples you can lift.

A spread of GPU cards
Training

Train across GPUs around the world.

Submit one job. The platform coordinates training across every worker you picked — shared pool, dedicated cluster, on-prem rack, or a partner site half a continent away — and writes outputs directly to your S3 bucket. Write standard PyTorch, TensorFlow, JAX, or HuggingFace — no rewrites.

  • PyTorch, TensorFlow, JAX, HuggingFace, MONAI, Ultralytics, Diffusers — no custom operators.
  • Choose an aggregation algorithm: FedAvg, FedOpt, Scaffold, FedProx, DiLoCo.
  • Workers self-shard the dataset via presigned URLs from your bucket.
  • Auto-resume from the last checkpoint on node failure or preemption.
  • Stream logs, metrics, and artifacts back to your S3 prefix in real time.
  • Submit from a notebook or CI — the SDK is one HTTP call.
See the training pipeline
Datacenter racks with fiber-optic interconnect
Inference

Serve from a mesh, not a single endpoint.

Push a checkpoint or HuggingFace repo. Get an OpenAI-compatible endpoint with scale-to-zero, health-aware load balancing, and routing across replicas in the regions you choose. Run vanilla HF models or bring a custom script — same deploy path.

  • Drop-in replacement for OpenAI clients — point base_url at your endpoint.
  • Scale to zero when idle; sub-10s warm boot when traffic returns.
  • Multi-region mesh with automatic failover across replicas.
  • Token streaming, request batching, and per-endpoint rate limits.
  • Bring your own script for custom routing, A/B tests, canary rollouts.
  • One short-lived API key per deployment — rotate from the dashboard.
See the inference layer
Where you run it

Three deployment tiers.

Same SDK across every tier. Start free on the Public Pool, graduate to a reserved cluster when you need predictability, drop into your own perimeter when compliance demands it.

Public Pool
Shared GPUs

Shared GPU capacity contributed by suppliers. Start with one GPU on the same SDK an enterprise uses.

Best for
Experiments, prototyping, single-GPU LoRA fine-tunes
Get started
Managed Cluster
Dedicated capacity

Dedicated GPU capacity on our infrastructure. Reserved for your team. Provisioned, monitored, and recovered by us.

Best for
Production training pipelines, inference deployments at scale
Talk to engineering
Early access
Private Fabric
Your hardware

Run the platform inside your perimeter. Connect your on-prem clusters and cloud accounts under one control plane.

Best for
Regulated industries, IP-sensitive workloads, residency rules
Contact us
FAQ

Common questions.

How is this different from Modal, RunPod, AWS, or Lambda Labs?

They are infrastructure providers — they rent you GPUs from their own datacenters and regions. ResonTech is the layer above: a decentralized datacenter you compose yourself. Plug in capacity from any of those providers, plus your on-prem racks, your reservations, or partner clusters, and treat them as one pool. Submit a single job; the platform places workers across the mix, coordinates training, and serves the resulting model — with minimal cross-cluster overhead. We aren't competing with their infra; we sit on top of it.

Do I need to rewrite my training code?

Not a rewrite, but a thin adaptation — typically 50–100 lines. You define a model class, a dataloader, and the training step the way you normally would in PyTorch / TensorFlow / JAX, then expose them through a small Executor + Persistor scaffold so the platform can shard data, dispatch workers, and aggregate model deltas across clusters. We're shipping a Claude plugin that ports an existing training script to that scaffold automatically — paste your repo, get back a submit-ready job in minutes.

How does billing work? Are there hidden idle costs?

Public Pool bills per-minute on allocated GPU time, the same model as RunPod or Lambda — you pay for the minutes a worker is reserved to your job, with no monthly minimums and no pre-purchased credits. Managed Cluster is a flat monthly reservation for dedicated capacity. Private Fabric is per-cluster licensing on your own hardware. Inference endpoints bill per active replica-minute and can scale replicas down between traffic. Egress, storage, and NAT are itemized line-by-line — no hidden tail in the invoice.

What happens if a node crashes mid-training?

The fabric detects the failure, evicts the bad node from the worker pool, and resumes from the last checkpoint automatically. You receive a notification but no manual intervention is required. For long multi-cluster runs this can save dozens of GPU-hours you would otherwise rerun from scratch.

Where does my data live? Can we run air-gapped or data-sovereign?

You can host the bucket yourself (any S3-compatible store). Workers fetch only their assigned shard through short-lived presigned URLs; the control plane never proxies raw bytes. Managed Cluster lets you pin storage per region. Private Fabric runs the entire control plane inside your perimeter, with air-gapped mode for sensitive environments. Data-sovereign mode keeps data anchored across organizations for federated training and federated inference.

How fast can we get started?

Install the Python SDK with one pip install, point it at your model or training script, and submit. First job typically runs within minutes on the public pool. No infrastructure provisioning, no cloud-account setup, no support ticket to request quota.

What GPU types are available on the Public Pool?

H100 SXM5, A100 80GB, and A40 nodes depending on availability and priority tier. Managed Cluster reserves specific GPU types — H100 NVLink, A100 PCIe, L40S — for your team. Private Fabric runs on whatever you bring (B100/B200, H200, A100, L40S, RTX-class, mixed pools all supported).

How does cross-cluster training actually work over slow links?

Across clusters the platform runs federated-style coordination — each cluster trains a local round (multiple SGD steps), then exchanges only model deltas with the central aggregator over gRPC + TLS. That replaces per-step all-reduce, which would die on WAN latency. The aggregator combines updates using a chosen algorithm — FedAvg by default, with FedOpt, FedProx, Scaffold, and DiLoCo selectable per job. Deltas are compressed and the sync interval auto-adjusts to the observed link bandwidth, so it tolerates anything from 10 Gbps cloud interconnect down to public internet. Final model quality typically lands within a few percent of centralized training; the wall-clock cost is the round-trip latency between sites.

Compose your first cluster.
Free on the Public Pool.

One GPU or a hundred. pip install resontech. No credit card to start.