
GPT-5.4 Just Beat Humans at Real Computer Work — Scoring 75% Where Experts Scored 72.4%


OpenAI's GPT-5.4 scored 75% on the OSWorld-Verified benchmark, which measures how well an AI agent completes real desktop productivity tasks such as navigating software, filling out forms, and managing files. Human experts scored 72.4%.

This is the first time an AI model has outperformed the human expert baseline on this test of autonomous computer work.
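For scale, here is how the two headline scores compare, assuming both are success rates (the percentage of benchmark tasks completed):

$$75.0\% - 72.4\% = 2.6 \text{ percentage points}, \qquad \frac{2.6}{72.4} \approx 0.036 \;(\text{about } 3.6\% \text{ relative})$$

In other words, the model clears the human baseline, but by a narrow margin.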

What makes this result different:

  • It's not trivia or math — it's real-world desktop productivity
  • GPT-5.4 can see your screen, move the mouse, type, and complete multi-step workflows
  • It jumped from GPT-5.2's 47.3% to 75%, a leap of 27.7 percentage points over its predecessor
  • A 1-million-token context window, enough to process large codebases or long documents in a single request
  • Three variants: Standard, Thinking (deep reasoning), and Pro
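Seeing the screen, moving the mouse, and typing add up to a repeating loop: observe the screen, decide on one action, perform it, and observe again until the task is done. The Python sketch here shows only that general loop shape. It is not OpenAI's implementation or API. `FakeDesktop`, `Action`, `run_agent`, and `scripted_policy` are hypothetical names, the "screenshot" is a text description instead of pixels, and a hard-coded rule-based policy stands in for the model's decision step.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Action:
    """One low-level UI action (hypothetical format): 'click', 'type', or 'done'."""
    kind: str
    target: Optional[str] = None
    text: Optional[str] = None


@dataclass
class FakeDesktop:
    """Hypothetical stand-in for a real desktop: a form with named text fields."""
    focused: Optional[str] = None
    fields: Dict[str, str] = field(default_factory=dict)

    def screenshot(self) -> str:
        # A real computer-use agent would receive an image; text keeps the sketch runnable.
        return f"focused={self.focused} fields={self.fields}"

    def apply(self, action: Action) -> None:
        # Clicking focuses a field; typing appends text to the focused field.
        if action.kind == "click":
            self.focused = action.target
        elif action.kind == "type" and self.focused is not None:
            self.fields[self.focused] = self.fields.get(self.focused, "") + (action.text or "")


# A policy maps (goal, observation) to the next action; in a real agent this is the model.
Policy = Callable[[str, str], Action]


def run_agent(goal: str, desktop: FakeDesktop, policy: Policy, max_steps: int = 10) -> List[Action]:
    """Observe -> decide -> act loop; stops on 'done' or after max_steps actions."""
    history: List[Action] = []
    for _ in range(max_steps):
        observation = desktop.screenshot()
        action = policy(goal, observation)
        history.append(action)
        if action.kind == "done":
            break
        desktop.apply(action)
    return history


def scripted_policy(goal: str, observation: str) -> Action:
    """Hypothetical rule-based policy for one fixed task (it ignores `goal`)."""
    if "focused=name" not in observation:
        return Action("click", target="name")
    if "'name': 'Ada'" not in observation:
        return Action("type", text="Ada")
    return Action("done")


desktop = FakeDesktop()
steps = run_agent("Fill in the name field with 'Ada'", desktop, scripted_policy)
print([a.kind for a in steps])  # ['click', 'type', 'done']
print(desktop.fields)           # {'name': 'Ada'}
```

The `max_steps` cap matters in practice: a policy that never declares the task finished would otherwise loop forever.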

This isn't about replacing people. It's about what becomes possible when AI can operate a computer as well as you can. Think automated testing, data entry at scale, customer onboarding, report generation — all running autonomously.

Source: OpenAI