OpenAI’s GPT-5 Matches Humans in Over 40% of Professional Tasks, Signaling a New Era in AI Capabilities

OpenAI has announced striking new results revealing that its latest language model, GPT-5, now performs at or above human expert levels in a significant portion of economically valuable work. According to OpenAI’s recent GDPval benchmark, GPT-5 achieved human-level performance in **40.6% of professional tasks** across 44 different occupations spanning nine industries, including finance, healthcare, law, logistics, and engineering.

This marks a substantial leap compared to the previous generation, GPT-4o, which managed just 13.7% in similar benchmarking. The jump from 13.7% to 40.6%, **roughly a threefold increase**, signals a rapid advance in AI models’ ability to understand, reason, and produce work that can rival skilled human professionals. OpenAI’s evaluation lead, Tejal Patwardhan, described this rapid improvement as “really encouraging,” underscoring how quickly AI is closing the gap with what some call artificial general intelligence (AGI).

The GDPval benchmark tests models on real-world tasks rather than synthetic challenges. For instance, GPT-5 was compared against investment bankers writing competitive analyses, nurses documenting patient care, and other specialized professionals performing high-level tasks. The model’s success rate indicates it can either tie with or outperform human experts about two out of every five times in these complex work scenarios.
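To make the "two out of every five" figure concrete, here is a minimal sketch of how a win-or-tie rate could be computed from head-to-head comparisons. The labels `"win"`, `"tie"`, and `"loss"`, the function name `win_or_tie_rate`, and the sample data are all illustrative assumptions; this is not OpenAI's grading code, and the article does not describe the scoring pipeline in detail.

```python
from collections import Counter

VALID_LABELS = {"win", "tie", "loss"}  # hypothetical labels, from the model's point of view


def win_or_tie_rate(judgments):
    """Return the fraction of comparisons the model won or tied against the expert."""
    if not judgments:
        raise ValueError("no judgments to score")
    counts = Counter(judgments)
    unknown = set(counts) - VALID_LABELS
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    return (counts["win"] + counts["tie"]) / len(judgments)


# Made-up sample: five tasks, each with one model-vs-expert judgment.
sample = ["win", "tie", "loss", "loss", "loss"]
print(f"{win_or_tie_rate(sample):.1%}")  # prints 40.0%
```

Counting ties together with wins matches the article's framing of "tie with or outperform"; a stricter wins-only metric would report a lower number on the same data.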

Beyond raw performance, GPT-5 is also noted for its **improved reasoning skills and multimodal understanding**: it can analyze not just text but also images, diagrams, and charts, which makes it more useful for tasks that combine diverse inputs. This multimodal ability supports use cases like interpreting medical imaging alongside textual data or summarizing visual presentations, opening doors for more versatile applications.

OpenAI’s GPT-5 Pro variant extends these capabilities with **deeper reasoning and more reliable task execution**, reducing error rates significantly compared to earlier models. For example, in health-related prompts where accuracy is critical, GPT-5 hallucinates and makes mistakes less often, making it a safer option for sensitive domains. GPT-5 also uses a “real-time router” that dynamically switches between fast responses and a more thorough “thinking mode” depending on the task.

While GPT-5 isn’t yet surpassing humans in the majority of professional roles, the trajectory suggests AI is transitioning from a supplementary tool to a direct collaborator in knowledge work. This shift will likely reshape industries by automating routine or data-intensive tasks, freeing experts to focus on higher-level problem-solving and creativity. GPT-5’s near-expert performance in functions like legal counsel, software engineering, and sales forecasting has profound implications for workforce dynamics and productivity.

Competing models, such as Anthropic’s Claude Opus 4.1, have shown similar capabilities and in some areas outperform GPT-5, for example by producing integrated visual graphics alongside written content, which highlights ongoing innovation across the AI ecosystem.

In conclusion, OpenAI’s release of GDPval test results indicates that GPT-5 has entered a pivotal phase where AI can match human specialists in a substantial share of complex professional tasks. This advancement not only challenges traditional notions of work and expertise but also sets the stage for widespread adoption of AI-assisted workflows across sectors, accelerating a new era of productivity and innovation.

This is a summary. Read the full story on the original publication.