Hardware LLM Performance Simulator

Simulates realistic time to first token (TTFT), decode speed, memory fit, and hardware behavior.

Device Presets

Model Presets

Backend

Manual Model Config

Model size (Billion params)

Quantization

Simulation Results

Total VRAM

6.0 GB

Model VRAM required

7.7 GB

Fits in VRAM

No

Fits in RAM

Yes

Execution mode

CPU
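The fit results above follow from comparing the model's memory requirement against the available VRAM and RAM. Below is a minimal sketch of such a rule, not the simulator's actual logic: the function name choose_execution_mode and the all-or-nothing rule (no partial GPU offload) are assumptions, and the 16 GB RAM figure is taken from the profile snapshot further down the page.

```python
def choose_execution_mode(required_gb: float, vram_gb: float, ram_gb: float) -> dict:
    """Hypothetical fit rule: use the GPU if the model fits in VRAM,
    otherwise fall back to the CPU if it fits in system RAM."""
    fits_vram = required_gb <= vram_gb
    fits_ram = required_gb <= ram_gb
    if fits_vram:
        mode = "GPU"
    elif fits_ram:
        mode = "CPU"
    else:
        mode = "Does not fit"
    return {"fits_vram": fits_vram, "fits_ram": fits_ram, "mode": mode}


# Values shown in the results above: 7.7 GB required, 6.0 GB VRAM, 16 GB RAM.
result = choose_execution_mode(required_gb=7.7, vram_gb=6.0, ram_gb=16)
print(result)
```

With these inputs the sketch reports that the model does not fit in VRAM, does fit in RAM, and therefore runs in CPU mode, matching the results shown.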

Decode speed

2.3 tok/s

Prefill speed

3.4 tok/s

TTFT

1.80 s

Max context

8192

Perceived speed

Very slow

Paragraph length

120 tokens

Perceived speed

2.3 tok/s (0–80 scale)
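To put the perceived-speed figures in concrete terms, the time to stream one paragraph can be approximated as TTFT plus paragraph length divided by decode speed. This is a rough sketch under that assumption; the helper name paragraph_time_s is hypothetical, and the formula treats every token of the paragraph as decoded after the TTFT delay.

```python
def paragraph_time_s(ttft_s: float, decode_tok_s: float, paragraph_tokens: int) -> float:
    """Approximate seconds until a full paragraph has streamed:
    initial wait (TTFT) plus tokens divided by decode speed."""
    if decode_tok_s <= 0:
        raise ValueError("decode speed must be positive")
    return ttft_s + paragraph_tokens / decode_tok_s


# Values from the results above: TTFT 1.80 s, 2.3 tok/s, 120-token paragraph.
total = paragraph_time_s(ttft_s=1.80, decode_tok_s=2.3, paragraph_tokens=120)
print(f"{total:.1f} s")
```

Under this approximation a 120-token paragraph takes roughly 54 seconds at 2.3 tok/s, which is consistent with the "Very slow" rating.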

Slow‑motion stream (for demo)

Live Response Simulation

😄

How fast will this model feel on my hardware?

🤖

Run a simulation to see a live response.

Compare Any Two Devices

Select two hardware presets and compare their performance for the same model and backend.

Left device

Model + backend

Right device

Desktop — RTX 4090 + i9-13900K

TTFT: 0.86 s | Decode: 83.0 tok/s | Prefill: 124.4 tok/s | Fits VRAM: ✔ | Fits RAM: ✔ | Mode: GPU | Context: 49152

Desktop — RTX 4080 + Ryzen 9 7900X

TTFT: 0.86 s | Decode: 61.7 tok/s | Prefill: 92.5 tok/s | Fits VRAM: ✔ | Fits RAM: ✔ | Mode: GPU | Context: 32768
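One way to read a two-device comparison is as left-to-right ratios per metric. The sketch below uses the figures from the two cards above; the dictionary keys and the compare helper are illustrative names, not part of the tool.

```python
# Figures copied from the two comparison cards above.
left = {"ttft_s": 0.86, "decode_tok_s": 83.0, "prefill_tok_s": 124.4, "context": 49152}
right = {"ttft_s": 0.86, "decode_tok_s": 61.7, "prefill_tok_s": 92.5, "context": 32768}


def compare(a: dict, b: dict) -> dict:
    """Return a/b for every metric the two devices share."""
    return {key: a[key] / b[key] for key in a if key in b}


ratios = compare(left, right)
for metric, ratio in ratios.items():
    print(f"{metric}: {ratio:.2f}x")
```

For this pair the TTFT ratio is 1.00 and the context ratio is 1.50, while decode and prefill both come out at roughly a third faster on the RTX 4090 system.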

Benchmark All Devices

Run a full performance simulation across every hardware preset.

Model

Backend

Performance Charts

Visualize how hardware, model size, and backend affect performance.

Decode Speed vs Model Size

VRAM vs Max Context

TTFT vs Model Size

Backend Comparison

Export / Import Profile

Save your current hardware profile or load a custom one.

Current profile snapshot:

{
  "cpuVendor": "Intel",
  "cpuModel": "Core i7-12700H",
  "cpuCores": 6,
  "cpuThreads": 12,
  "cpuBaseGHz": 2.3,
  "cpuBoostGHz": 4.7,
  "cpuTdpWatts": 45,
  "hasAVX2": true,
  "hasAVX512": false,
  "ramGB": 16,
  "ramType": "DDR5",
  "ramSpeedMT": 4800,
  "gpuVendor": "NVIDIA",
  "gpuModel": "RTX 3060 Laptop",
  "gpuArchitecture": "Ampere",
  "vramGB": 6,
  "vramType": "GDDR6",
  "memoryBandwidthGBs": 192,
  "tflopsFP16": 20,
  "tflopsFP32": 10,
  "tflopsINT8": 40,
  "os": "Windows",
  "isLaptop": true,
  "coolingClass": "Thin",
  "pcieGen": 4
}
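Before importing a custom profile, it may help to sanity-check the JSON. The following is a minimal sketch, assuming the field names shown in the snapshot above; the load_profile function and its specific checks are hypothetical and not the tool's own import logic.

```python
import json

# Fields from the snapshot that this sketch expects to be positive numbers.
REQUIRED_NUMERIC = ("cpuCores", "cpuThreads", "ramGB", "vramGB", "memoryBandwidthGBs")


def load_profile(text: str) -> dict:
    """Parse a hardware profile and reject obviously invalid values."""
    profile = json.loads(text)
    if not isinstance(profile, dict):
        raise ValueError("profile must be a JSON object")
    for key in REQUIRED_NUMERIC:
        value = profile.get(key)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)) or value <= 0:
            raise ValueError(f"{key} must be a positive number")
    if profile["cpuThreads"] < profile["cpuCores"]:
        raise ValueError("cpuThreads cannot be smaller than cpuCores")
    return profile


# A subset of the snapshot above, enough to pass the checks.
snippet = '{"cpuCores": 6, "cpuThreads": 12, "ramGB": 16, "vramGB": 6, "memoryBandwidthGBs": 192}'
profile = load_profile(snippet)
print(profile["vramGB"], profile["ramGB"])
```

Fields not listed in REQUIRED_NUMERIC pass through unchanged, so the full snapshot loads as well.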