
GPT-5.4

OpenAI · GPT · 2026-04-19

51/100
Strict suite average · Legacy 67 · 1 benchmark

Single-benchmark historical packet. The Car Wash scaffold has enough recoverable model identity for a packet, but multiple primary canary failures keep it in interesting-but-unreliable territory.


GPT-5.4 against the field

How GPT-5.4 handled each benchmark

Score, capability radar, and the honest read on what it nailed and where it slipped.

Car Wash Operations

A filthy operational dataset — ghost records, orphaned orders, typo'd customers, raw enum variants. Tests judgment under messy real-world data: what gets fixed, quarantined, or wrongly promoted.

51 · legacy 67
Interesting but Unreliable

GPT-5.4 showed better file-level rigor than Opus on SVC-007 and on duplicate-customer conflict evidence, but strict scoring centers on migration safety, and that is where it failed. Ghost and test records survived as canonical data, Terrence Blackwood was promoted to a customer, status and payment values stayed raw enough that magic values and case variants survived, and the canonical customer count ballooned. The output is a review scaffold, not a trustworthy migration.
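To make the migration-safety failure concrete, here is a minimal sketch of the kind of gate that keeps ghost records and orphaned orders out of canonical data. Everything in it is an assumption for illustration: the `Customer` type, the `PLACEHOLDER_NAMES` denylist, the field names, and both helper functions are hypothetical and are not taken from the benchmark's schema or scoring code.

```python
from dataclasses import dataclass

# Hypothetical denylist built only from the names quoted in this review;
# a real pipeline would need a broader, reviewed rule set.
PLACEHOLDER_NAMES = {"mickey mouse", "test customer", "asdf asdf"}


@dataclass
class Customer:
    customer_id: str
    name: str


def triage_customers(rows):
    """Split customers into (canonical, quarantined) lists."""
    canonical, quarantined = [], []
    for row in rows:
        # Collapse case and whitespace so "  MICKEY  mouse " still matches.
        key = " ".join(row.name.lower().split())
        if key in PLACEHOLDER_NAMES:
            quarantined.append(row)
        else:
            canonical.append(row)
    return canonical, quarantined


def split_orphan_orders(orders, canonical):
    """Separate orders whose customer_id has no canonical customer.

    Orphans are returned for review instead of being used to
    create new canonical customers.
    """
    known = {c.customer_id for c in canonical}
    linked = [o for o in orders if o["customer_id"] in known]
    orphans = [o for o in orders if o["customer_id"] not in known]
    return linked, orphans
```

The design choice this illustrates is that quarantine is a separate output, not a deletion: suspicious rows stay available to a reviewer, but nothing reaches the canonical table by default.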

Capability radar axes: Instruction Following, Artifact Validity, Source Integrity, Semantic Judgment, Quantitative Reasoning, UX Reviewability, Production Readiness, Speed
1. Claude Opus 4.8 · 86
2. GPT-5.5 · 55
3. Gemini 3.5 Flash (High) Fast · 51
4. GPT-5.4 · 51
5. Opus 4.7 · 48

What it nailed

  • Completed the required artifact set.
  • Accounted for the full file corpus in the cross-review analysis.
  • Parsed deshawn_services.tsv and surfaced the SVC-007 conflict.
  • Preserved useful conflict evidence for some duplicate customer cases.

Where it slipped

  • Promoted Mickey Mouse, Test Customer, and Asdf Asdf into canonical data.
  • Promoted Terrence Blackwood to a canonical customer.
  • Left status and payment-method values raw enough that case variants and magic values survived.
  • Over-expanded the canonical customer table.
Misses Three Or More Primary Canaries · Promotes Ghost Records · Promotes Orphan Order
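The raw status and payment values called out above are usually handled by mapping every observed variant to a closed set of canonical values and refusing to guess at anything else. The following is a sketch under stated assumptions: the alias table and the `normalize_status` helper are hypothetical, and the actual enum values in the Car Wash dataset are not given in this review.

```python
# Hypothetical alias table; the dataset's real status values are not
# listed in this review, so these entries are illustrative only.
STATUS_ALIASES = {
    "completed": "completed",
    "complete": "completed",
    "done": "completed",
    "cancelled": "cancelled",
    "canceled": "cancelled",
    "pending": "pending",
}


def normalize_status(raw):
    """Return the canonical status, or None when the value needs review.

    Case and surrounding whitespace are ignored. Anything not in the
    alias table (including non-string input) is left unresolved rather
    than passed through raw.
    """
    if not isinstance(raw, str):
        return None
    return STATUS_ALIASES.get(raw.strip().lower())
```

Returning `None` for unknown input forces unexpected placeholder values into a review queue instead of letting them pass into the migrated table unchanged.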