Mission Control Dispatch | Published May 7, 2026
Yeti Claw fleet map: where each inference lane fits
This is the operator view of the fleet as it exists today. We took the economics dispatch, the Mac mini and Spark text-capacity study, the BeastMode lane benchmark, and the Spark image sweep, then collapsed them into one practical answer: which box should take which work, what each lane costs, and where queueing should start instead of pretending everything is a real-time surface.
- $0.110/h at 24x7 with the lowest measured text-token cost in the fleet.
- 8 of 10 open-source image models completed in the concurrent burst study.
- 10.37s baseline latency with a flat throughput plateau around 0.098 rps.
- Two simultaneous conversations stay in the premium responsiveness zone before queueing dominates.
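For scale, here is how an hourly rate converts into token-normalized cost. This is a minimal sketch: the $0.110/h rate is the measured figure above, but the sustained throughput is a hypothetical placeholder, not a number from the capacity study.

```python
# Token-normalized cost sketch. The hourly rate is the measured figure
# above; the throughput is a HYPOTHETICAL placeholder -- substitute the
# measured decode rate from the capacity study.
HOURLY_COST_USD = 0.110        # measured 24x7 hourly cost
ASSUMED_TOKENS_PER_SEC = 25.0  # hypothetical sustained throughput

tokens_per_hour = ASSUMED_TOKENS_PER_SEC * 3600
usd_per_million_tokens = HOURLY_COST_USD / tokens_per_hour * 1_000_000

print(f"${usd_per_million_tokens:.2f} per 1M tokens")  # ~$1.22 at 25 tok/s
```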
Executive read
What the fleet map says in plain language
- Use the Mac mini as the default public text lane when you want the best cost-to-latency ratio on a small box.
- Use DGX Spark whenever the job is image-first, multimodal, or premium enough to justify the more expensive but more capable box.
- Use Chewbacuh as the first BeastMode overflow lane when the Mac mini is saturated or when you want dedicated public-route capacity on ESXi.
- Use LiL-Beastly when the larger 14B lane is worth the extra wait time. It is an availability lane, not the cheap real-time lane.
Lane map
The fleet at a glance
Mac mini
The strongest small-surface economics play in the fleet and the least dramatic operator lane for public text.
Use it for: everyday chat, premium low-latency text, and the cheapest local inference in the current fleet.
DGX Spark
The only lane with benchmark-proven image-generation coverage, and still a strong text lane when the box is kept busy.
Use it for: image generation, multimodal experiments, and premium shared text where queueing is acceptable past two live sessions.
Chewbacuh
The faster of the two ESXi text workers, useful for public-route overflow and queue-friendly text traffic.
Use it for: supplemental public text capacity once the primary small-box lane is saturated.
LiL-Beastly
The heavier ESXi worker. Stable, but deliberately slower because the larger model is the point.
Use it for: larger-model availability where users will tolerate queue-heavy behavior in exchange for model quality.
* BeastMode token economics are lower-bound capital-only estimates because this management path does not expose clean per-host watt telemetry.
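For reference, the lower-bound arithmetic is plain capital amortization with power excluded. A minimal sketch, with both inputs as hypothetical placeholders:

```python
# Capital-only hourly cost: purchase price amortized over a service
# window, with power excluded because per-host watt telemetry is not
# available on this management path. Both inputs are HYPOTHETICAL.
CAPEX_USD = 2000.0   # hypothetical host purchase price
SERVICE_YEARS = 3    # hypothetical amortization window

service_hours = SERVICE_YEARS * 365 * 24
lower_bound_usd_per_hour = CAPEX_USD / service_hours

print(f"${lower_bound_usd_per_hour:.3f}/h capital-only lower bound")
```

Any measured power draw would be added on top, which is why the published BeastMode numbers can only move up.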
Routing matrix
What should take which job
| Scenario | Primary lane | Fallback lane | Why |
|---|---|---|---|
| Fast public chat | Mac mini | Chewbacuh | Best cost-to-latency ratio with the least operational drama. |
| Shared premium text | DGX Spark | Mac mini | Spark remains strong for text as long as the live band stays small. |
| Larger-model public text | LiL-Beastly | Chewbacuh | The 14B lane exists for model availability, not for cheapest real-time output. |
| Image generation | DGX Spark | Queue on Spark | No other current public lane has benchmark-proven image-generation coverage. |
| Burst traffic / overflow | Chewbacuh | LiL-Beastly | BeastMode turns spare ESXi capacity into a shared text safety valve. |
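A dispatcher can encode this matrix directly and also enforce the two-live-session Spark comfort band from the capacity study. The sketch below is illustrative only: the lane names and the threshold come from this dispatch, while the route-table shape and the live-session counter are hypothetical.

```python
# Minimal routing sketch for the matrix above. Lane names and the
# two-live-session Spark comfort band come from this dispatch; the
# table shape and the live-session counter are hypothetical.
ROUTES = {
    "fast_public_chat":    ("mac-mini",    "chewbacuh"),
    "shared_premium_text": ("dgx-spark",   "mac-mini"),
    "larger_model_text":   ("lil-beastly", "chewbacuh"),
    "image_generation":    ("dgx-spark",   "dgx-spark-queue"),
    "burst_overflow":      ("chewbacuh",   "lil-beastly"),
}

SPARK_COMFORT_BAND = 2  # live sessions before queueing dominates

def pick_lane(scenario: str, live_sessions: dict) -> str:
    primary, fallback = ROUTES[scenario]
    # Text traffic spills off Spark past the comfort band; image jobs
    # stay on Spark and queue there instead.
    if primary == "dgx-spark" and scenario != "image_generation":
        if live_sessions.get(primary, 0) >= SPARK_COMFORT_BAND:
            return fallback
    return primary

print(pick_lane("shared_premium_text", {"dgx-spark": 3}))  # -> mac-mini
```

Keeping the matrix as data rather than branching logic means routing changes ship as a config update, not a code change.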
Series bridge
The four reports that shape this map
Cloud vs. local inference economics
The buy-vs-rent model: local hourly cost, breakeven windows, and token-normalized economics. A sketch of the breakeven arithmetic follows these cards.
Mac mini vs DGX Spark text capacity
The concurrency study that defines the comfort band for the two primary text boxes.
Chewbacuh vs LiL-Beastly
The ESXi lane benchmark that tells us when BeastMode is useful and when it is just queueing.
DGX Spark image model sweep
The evidence that Spark is more than a text box and where its queue-design pressure begins.
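As a companion to the first card, here is a minimal sketch of the breakeven-window arithmetic. All three inputs are hypothetical placeholders; substitute the figures from the economics report.

```python
# Buy-vs-rent breakeven sketch: utilized hours after which owned hardware
# undercuts cloud rental. All three inputs are HYPOTHETICAL placeholders;
# substitute the figures from the economics report.
CAPEX_USD = 2000.0          # hypothetical purchase price
LOCAL_USD_PER_HOUR = 0.05   # hypothetical marginal (power) cost per hour
CLOUD_USD_PER_HOUR = 0.80   # hypothetical comparable cloud rate

# Each utilized hour saves (cloud - local); breakeven when savings = capex.
breakeven_hours = CAPEX_USD / (CLOUD_USD_PER_HOUR - LOCAL_USD_PER_HOUR)
print(f"{breakeven_hours:.0f} h, ~{breakeven_hours / (24 * 30):.1f} months at 24x7")
```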
Operator playbook
How this should change product behavior
- Default the main public text experience to the Mac mini unless a route explicitly needs a BeastMode or Spark lane.
- Queue image jobs on Spark instead of treating them like synchronous chat replies, because the image sweep showed that generation times vary wildly across models (see the queue sketch after this list).
- Expose BeastMode as deliberate routing, not hidden magic. Users should understand when they are choosing the faster 8B lane versus the slower 14B lane.
- Use Spark for premium multimodal and image-heavy flows, then use Mission Control data to justify when the queue should widen or when new hardware is warranted.
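To make the image-queue point concrete, here is a minimal sketch of handing back a ticket instead of a synchronous reply. Everything here is hypothetical: generate_image() stands in for the real Spark backend call, and the single worker models one GPU lane.

```python
import asyncio

# Minimal sketch of queueing image jobs instead of answering them like
# synchronous chat replies. generate_image() is a HYPOTHETICAL stand-in
# for the real Spark backend; the single worker models one GPU lane.

async def generate_image(prompt: str) -> str:
    await asyncio.sleep(1.0)  # stand-in for widely varying model runtimes
    return f"image for {prompt!r}"

async def spark_image_worker(queue: asyncio.Queue) -> None:
    while True:
        prompt, ticket = await queue.get()
        ticket.set_result(await generate_image(prompt))
        queue.task_done()

async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    worker = asyncio.create_task(spark_image_worker(queue))
    # The caller gets a ticket (future) back immediately and awaits it
    # later, so a slow model stalls the queue, not the chat surface.
    ticket = asyncio.get_running_loop().create_future()
    await queue.put(("yeti on a glacier", ticket))
    print(await ticket)
    worker.cancel()

asyncio.run(main())
```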