Yeti Claw


Mission Control Dispatch | Published May 7, 2026

Cloud vs. local inference economics: Mac mini, DGX Spark, and BeastMode

We took the SemiAnalysis / Nebius GPU-cluster study you supplied, matched it against our own fleet benchmarks, and then rebuilt the answer around the boxes actually powering Yeti Claw today. The Mac mini comes out as the strongest small-surface economics play, DGX Spark becomes compelling when it stays busy, and the BeastMode lanes make the most sense as shared-capacity text workers rather than pretend GPU replacements.

  • Mac mini local cost: $0.110/h. 24x7 fully loaded local cost using current-market Apple capital and California power.
  • Spark local cost: $0.254/h. 24x7 fully loaded local cost using current NVIDIA price and a conservative 240W power envelope.
  • Study inference ratio: 2.13x. AWS vs. Nebius TCO in the study’s H200 inference-endpoint scenario.
  • Cheapest cloud anchor: $0.39/h. Runpod Secure Cloud L4 starting rate used as the budget GPU baseline.

Executive read

What changed once we used our own fleet data

  • The Mac mini is already cheaper than every cloud accelerator baseline in this report once it is amortized, even if we only count a normal workweek and use a conservative max-power assumption.
  • DGX Spark is not the cheapest sticker-price answer, but it becomes rational quickly when its text and image workloads actually keep the box occupied.
  • The BeastMode lanes are useful because they harvest spare ESXi capacity. Their economics are good only if we treat them as shared text workers, not premium real-time lanes.
  • The SemiAnalysis / Nebius study’s core thesis holds: support, setup, storage, and utilization move the answer more than raw GPU-hour price alone.

Study context

Why the provided study still matters

The study you provided is useful because it refuses to stop at the headline GPU sticker price. In its inference-endpoint scenario, it concludes that the total cost lands at 1.00x for Nebius, 2.13x for AWS, and 1.04x for the silver-tier composite provider. The report argues that support, storage, setup, and the cost of operating the surrounding stack are what make inference economics widen out, not just the accelerator line item.

| Study scenario | Nebius | AWS | Silver-tier | Read |
| --- | --- | --- | --- | --- |
| Large LLM pretrain | 1.00x | 1.09x | 1.08x | Support and setup already move the answer, even before cloud prices diverge. |
| Multimodal RL research | 1.00x | 1.43x | 1.08x | GPU price, storage, and ops overhead widen the spread fast. |
| Inference endpoints | 1.00x | 2.13x | 1.04x | The operating stack is what makes hyperscaler inference expensive in real terms. |

Cost curves

Local hourly cost vs. published cloud pricing

These local numbers include hardware amortization and electricity. The Mac mini uses Apple’s published maximum continuous power. Spark uses the full 240W external PSU as a deliberate upper-bound assumption, which means its true sustained local cost is often better than shown here.

[Chart: local hourly cost of the Mac mini and DGX Spark vs. current cloud GPU pricing]
Mac mini beats every cloud baseline in this set. Spark beats the enterprise baselines and needs utilization to beat the cheapest L4 rental.

[Chart: hardware-only breakeven days for the Mac mini and DGX Spark against cloud GPU pricing]
Spark pays back very fast against H200-class cloud rates, but much slower against bargain L4 rentals. That is the utilization story in one picture.
| Platform | Capital anchor | 24x7 local $/h | 40h/week local $/h | Role |
| --- | --- | --- | --- | --- |
| Mac mini | $1,609 | $0.110 | $0.307 | Primary small-surface text lane |
| DGX Spark | $4,699 | $0.254 | $0.829 | Text + image development and inference box |
| Runpod L4 | n/a | $0.390 | $0.390 | Budget cloud GPU baseline |
| Nebius L40S | n/a | $1.350 | $1.350 | Enterprise mid-range GPU baseline |
| Nebius H200 | n/a | $3.500 | $3.500 | Premium inference GPU baseline |
| AWS p5en H200 | n/a | $5.721 | $5.721 | Hyperscaler premium GPU baseline |
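As a cross-check, the fully loaded local rates above can be approximated from three inputs: hardware price, an amortization horizon, and a max-power electricity draw. A minimal sketch follows; the 3-year horizon, the $0.31/kWh California rate, and the 155 W Mac mini draw are our back-solved assumptions (only the Spark's 240W envelope is stated above), not figures published in the study.

```python
def fully_loaded_hourly(capex_usd, watts, hours_per_week,
                        years=3.0, usd_per_kwh=0.31, weeks_per_year=52):
    """Fully loaded local $/h: capital amortized over the hours the box
    actually runs, plus electricity at the assumed max-power draw."""
    used_hours = years * weeks_per_year * hours_per_week
    capital_per_hour = capex_usd / used_hours
    power_per_hour = watts / 1000 * usd_per_kwh
    return capital_per_hour + power_per_hour

# 24x7 duty cycle is 168 hours per week
mac_24x7 = fully_loaded_hourly(1609, 155, 168)    # ≈ $0.109/h
spark_24x7 = fully_loaded_hourly(4699, 240, 168)  # ≈ $0.254/h
```

With these assumptions the sketch lands within a cent of every published local rate, which is why we treat the table's figures as reproducible rather than hand-tuned.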

Breakeven window

How long until the hardware pays for itself

This breakeven view uses the simplest clean formula: hardware price divided by published cloud hourly rate. It is not a full TCO replacement. It is the fastest way to answer the buy-vs-rent question before we add staff time, storage, or incident overhead.

| Platform | Runpod L4 | Nebius L40S | Nebius H200 | AWS p5en H200 |
| --- | --- | --- | --- | --- |
| Mac mini | 4,125.6h / 171.9d | 1,191.9h / 49.7d | 459.7h / 19.2d | 281.2h / 11.7d |
| DGX Spark | 12,048.7h / 502.0d | 3,480.7h / 145.0d | 1,342.6h / 55.9d | 821.4h / 34.2d |
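The breakeven values follow directly from the stated formula. A one-function sketch, using the capital anchors and cloud rates from the tables above:

```python
def breakeven(hardware_price_usd, cloud_rate_per_hour):
    """Hardware-only breakeven: hours of cloud rental the purchase price
    covers, and the equivalent days of 24x7 use."""
    hours = hardware_price_usd / cloud_rate_per_hour
    return hours, hours / 24

mac_vs_l4 = breakeven(1609, 0.39)      # ≈ 4,125.6 h / 171.9 d
spark_vs_h200 = breakeven(4699, 3.50)  # ≈ 1,342.6 h / 55.9 d
```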

Token view

What the local fleet actually buys you per output token

We sampled representative local text requests and converted them into output-token throughput. This is a better normalization than requests alone because these boxes are serving very different models. For BeastMode, the numbers are explicitly labeled as lower-bound because ESXi does not expose clean per-host watt telemetry through the management path used here.

[Chart: local cost per one million output tokens across Mac mini, DGX Spark, Chewbacuh, and LiL-Beastly]
Mac mini is the cheapest token engine in the fleet. Spark is still strong when it stays busy. BeastMode is viable because the lane shares are small, not because CPU inference is inherently fast.
| Lane | Representative model | Output tok/s | 24x7 $/1M tok | 40h/week $/1M tok | Read |
| --- | --- | --- | --- | --- | --- |
| Mac mini | qwen2.5:7b | 47.77 | $0.64 | $1.78 | Best small-box economics in the fleet. |
| DGX Spark | qwen3:8b | 41.97 | $1.68 | $5.48 | Best when text and image duty cycles both keep the box active. |
| Chewbacuh* | qwen3:8b | 3.32 | $1.04 | $4.37 | Lower-bound capital-only estimate on shared infrastructure. |
| LiL-Beastly* | qwen3:14b | 1.78 | $3.98 | $16.78 | Better for larger-model availability than for cheap real-time output. |
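Each $/1M-token figure is the lane's fully loaded hourly rate multiplied by how long one million output tokens takes at the measured throughput. A sketch of that conversion, using rates straight from the tables above:

```python
def usd_per_million_tokens(output_tok_per_s, local_usd_per_hour):
    """Convert measured throughput and a fully loaded hourly rate
    into cost per one million output tokens."""
    hours_per_million = 1_000_000 / output_tok_per_s / 3600
    return hours_per_million * local_usd_per_hour

mac_24x7 = usd_per_million_tokens(47.77, 0.110)    # ≈ $0.64
spark_24x7 = usd_per_million_tokens(41.97, 0.254)  # ≈ $1.68
```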

BeastMode appendix

How the CPU-only lanes compare to CPU cloud

The right cloud comparison for BeastMode is not an H200. It is a CPU-only instance with similar vCPU and RAM. We used Nebius non-GPU AMD EPYC pricing as the baseline and priced each lane by its resource footprint.

[Chart: local BeastMode lane capex per hour vs. a Nebius CPU-cloud equivalent]
Chewbacuh and LiL-Beastly are good because they consume already-owned shared capacity. That is a different economic story than buying a box just for them.
| Lane | VM shape | Shared-host capital share | CPU-cloud equivalent | Breakeven |
| --- | --- | --- | --- | --- |
| Chewbacuh | 8 vCPU / 48 GiB | $326 | $0.250/h | 1,304.8h / 54.4d |
| LiL-Beastly | 12 vCPU / 96 GiB | $670 | $0.451/h | 1,484.6h / 61.9d |
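The footprint pricing can be sketched with per-vCPU and per-GiB rates. The two rates below are hypothetical values we back-solved so that both lanes' published composite rates reproduce; they are not Nebius's actual price card, and the resulting breakevens differ slightly from the table because the published composite rates are rounded.

```python
# Hypothetical per-resource rates back-solved from the two composite
# rates above; NOT Nebius's actual price card.
USD_PER_VCPU_HOUR = 0.01225
USD_PER_GIB_HOUR = 0.152 / 48  # ≈ $0.00317

def lane_economics(vcpus, ram_gib, capital_share_usd):
    """CPU-cloud-equivalent hourly rate for a VM shape, plus the
    hardware-only breakeven hours for the lane's capital share."""
    cloud_rate = vcpus * USD_PER_VCPU_HOUR + ram_gib * USD_PER_GIB_HOUR
    return cloud_rate, capital_share_usd / cloud_rate

chewbacuh = lane_economics(8, 48, 326)     # ≈ ($0.250/h, 1,304 h)
lil_beastly = lane_economics(12, 96, 670)  # ≈ ($0.451/h, 1,486 h)
```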