Hardware LLM Performance Simulator

Simulates realistic time to first token (TTFT), decode speed, memory fit, and hardware behavior.

Device Presets
Model Presets
Backend
Manual Model Config
Simulation Results
Total VRAM: 6.0 GB
Model VRAM required: 7.7 GB
Fits in VRAM: No
Fits in RAM: Yes
Execution mode: CPU
Decode speed: 2.3 tok/s
Prefill speed: 3.4 tok/s
TTFT: 1.80 s
Max context: 8192
Perceived speed: Very slow
Paragraph length: 120 tokens
Perceived speed: 2.3 tok/s (0–80 scale)
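
The fit and speed figures above can be related with a little arithmetic. The sketch below is not the simulator's actual logic; it assumes a simple rule (run on the GPU if the model fits in VRAM, otherwise on the CPU if it fits in RAM) that happens to match the reported result, and it assumes 16 GB of system RAM, taken from the profile snapshot at the end of this page. The function names `execution_mode` and `paragraph_seconds` are hypothetical.

```python
def execution_mode(model_gb: float, vram_gb: float, ram_gb: float) -> str:
    """Pick where the model would run under a simple fit rule (an assumption)."""
    if model_gb <= vram_gb:
        return "GPU"
    if model_gb <= ram_gb:
        return "CPU"
    return "Does not fit"


def paragraph_seconds(paragraph_tokens: int, decode_tok_per_s: float) -> float:
    """Time to stream a paragraph at a steady decode rate, excluding TTFT."""
    return paragraph_tokens / decode_tok_per_s


# Values from the results above; ram_gb=16.0 is assumed from the profile snapshot.
mode = execution_mode(model_gb=7.7, vram_gb=6.0, ram_gb=16.0)
seconds = paragraph_seconds(120, 2.3)
print(mode, round(seconds, 1))  # CPU 52.2
```

At 2.3 tok/s, a 120-token paragraph takes roughly 52 seconds to stream after the first token arrives, which is consistent with the "Very slow" rating.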
Live Response Simulation
😄
How fast will this model feel on my hardware?
🤖
Run a simulation to see a live response.

Compare Any Two Devices

Select two hardware presets and compare their performance for the same model and backend.

Left device
Model + backend
Right device
Desktop — RTX 4090 + i9-13900K
TTFT: 0.86 s; Decode: 83.0 tok/s; Prefill: 124.4 tok/s; Fits VRAM; Fits RAM; Mode: GPU; Context: 49152
Desktop — RTX 4080 + Ryzen 9 7900X
TTFT: 0.86 s; Decode: 61.7 tok/s; Prefill: 92.5 tok/s; Fits VRAM; Fits RAM; Mode: GPU; Context: 32768
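
To make the side-by-side numbers easier to read, the two result rows can be reduced to ratios. This is a minimal sketch, not part of the tool: the `DeviceResult` class and the `compare` function are hypothetical names, and the only inputs are the figures listed for the two desktops above.

```python
from dataclasses import dataclass


@dataclass
class DeviceResult:
    name: str
    ttft_s: float
    decode_tok_s: float
    prefill_tok_s: float
    max_context: int


def compare(left: DeviceResult, right: DeviceResult) -> dict:
    """Express the left device relative to the right one."""
    return {
        "decode_ratio": left.decode_tok_s / right.decode_tok_s,
        "prefill_ratio": left.prefill_tok_s / right.prefill_tok_s,
        "ttft_delta_s": left.ttft_s - right.ttft_s,
        "context_ratio": left.max_context / right.max_context,
    }


rtx4090 = DeviceResult("RTX 4090 + i9-13900K", 0.86, 83.0, 124.4, 49152)
rtx4080 = DeviceResult("RTX 4080 + Ryzen 9 7900X", 0.86, 61.7, 92.5, 32768)

for key, value in compare(rtx4090, rtx4080).items():
    print(f"{key}: {value:.2f}")
```

For these two presets the decode and prefill ratios both come out near 1.35, the maximum context is 1.5 times larger on the RTX 4090, and the reported TTFT is identical.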

Benchmark All Devices

Run a full performance simulation across every hardware preset.

Model
Backend

Performance Charts

Visualize how hardware, model size, and backend affect performance.

Decode Speed vs Model Size

VRAM vs Max Context

TTFT vs Model Size

Backend Comparison

Export / Import Profile

Save your current hardware profile or load a custom one.

Current profile snapshot:
{
  "cpuVendor": "Intel",
  "cpuModel": "Core i7-12700H",
  "cpuCores": 6,
  "cpuThreads": 12,
  "cpuBaseGHz": 2.3,
  "cpuBoostGHz": 4.7,
  "cpuTdpWatts": 45,
  "hasAVX2": true,
  "hasAVX512": false,
  "ramGB": 16,
  "ramType": "DDR5",
  "ramSpeedMT": 4800,
  "gpuVendor": "NVIDIA",
  "gpuModel": "RTX 3060 Laptop",
  "gpuArchitecture": "Ampere",
  "vramGB": 6,
  "vramType": "GDDR6",
  "memoryBandwidthGBs": 192,
  "tflopsFP16": 20,
  "tflopsFP32": 10,
  "tflopsINT8": 40,
  "os": "Windows",
  "isLaptop": true,
  "coolingClass": "Thin",
  "pcieGen": 4
}
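
A profile in this shape can be checked before it is imported. The sketch below is an illustration only: `load_profile` and `REQUIRED_FIELDS` are hypothetical, and the choice of which fields to treat as required is an assumption, since the page does not document the import rules. Field names are taken from the snapshot above.

```python
import json

# Assumed minimal set of fields; the real importer may require more or fewer.
REQUIRED_FIELDS = {
    "cpuModel": str,
    "cpuCores": int,
    "ramGB": (int, float),
    "vramGB": (int, float),
    "memoryBandwidthGBs": (int, float),
    "isLaptop": bool,
}


def load_profile(text: str) -> dict:
    """Parse a profile JSON string and check a few field types."""
    profile = json.loads(text)
    if not isinstance(profile, dict):
        raise ValueError("profile must be a JSON object")
    for field, expected in REQUIRED_FIELDS.items():
        if field not in profile:
            raise ValueError(f"missing field: {field}")
        value = profile[field]
        # bool is a subclass of int in Python, so reject it for numeric fields.
        if isinstance(value, bool) and expected is not bool:
            raise ValueError(f"wrong type for field: {field}")
        if not isinstance(value, expected):
            raise ValueError(f"wrong type for field: {field}")
    return profile


sample = """{
  "cpuModel": "Core i7-12700H",
  "cpuCores": 6,
  "ramGB": 16,
  "vramGB": 6,
  "memoryBandwidthGBs": 192,
  "isLaptop": true
}"""
profile = load_profile(sample)
print(profile["cpuModel"], profile["vramGB"])
```

Extra fields such as `coolingClass` or `pcieGen` pass through untouched, so a full snapshot loads the same way as the trimmed sample.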