Agents prefer guessing to asking — ProactiveBench + frontier differentiation
Your AI agent would rather deliver a wrong answer than admit it's missing data. For most models in the wild, that's still true — even as the picture at the top is starting to shift.
ProactiveBench systematically tested 22 multimodal models. With complete context, the stronger models land around 80% accuracy. Once relevant information is missing, that picture breaks: instead of asking, the models keep guessing. That is less a model defect in the narrow sense than a training artifact: answering gets rewarded, "I don't know" doesn't.
At the frontier, all major labs have now addressed this, but in different ways:
Gemini 3 ships an explicit tool for it (ask_user), and Google's agent guidelines are clear: if information is missing, ask, don't guess. Claude Opus 4.6 and 4.7 ask for clarification on underspecified tasks as their default behavior and state their assumptions explicitly. GPT-5.4 can ask up to three clarifying questions. OpenAI's own agent guidance, however, often recommends the opposite: don't interrupt, don't hand back, keep going.
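To make the pattern concrete: a minimal sketch of what an "ask, don't guess" gate can look like in an agent loop. Only the tool name ask_user comes from the Gemini guidance above; the schema, field names, and gate function here are illustrative assumptions, not any vendor's actual API.

```python
# Hypothetical sketch of an ask_user-style clarification gate.
# Assumption: the agent framework accepts JSON-schema tool definitions
# and the orchestrator routes {"tool": "ask_user", ...} back to the user.

ASK_USER_TOOL = {
    "name": "ask_user",
    "description": "Ask the user one clarifying question when required input is missing.",
    "parameters": {
        "type": "object",
        "properties": {"question": {"type": "string"}},
        "required": ["question"],
    },
}

def clarification_gate(task: dict, required_fields: list[str]) -> dict:
    """Return an ask_user call if required fields are missing, else a proceed signal."""
    missing = [f for f in required_fields if task.get(f) in (None, "")]
    if missing:
        # Ask instead of guessing: hand the gap back to the user.
        return {
            "tool": "ask_user",
            "args": {"question": f"Please provide: {', '.join(missing)}."},
        }
    # All required context present: the agent may proceed.
    return {"tool": None, "args": {}}

if __name__ == "__main__":
    task = {"invoice_id": "INV-1042", "amount": None}
    print(clarification_gate(task, ["invoice_id", "amount", "cost_center"]))
```

The point of the sketch: the "ask" behavior lives in an explicit, checkable gate rather than in the model's mood, which is exactly what makes it auditable in compliance settings.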
Deploy an agent into a verification-critical process with the vendor's "productive assistant" prompt frame, and you may get precisely the behavior that isn't acceptable in compliance reviews or financial workflows.
Before any agent rollout, it pays to check the vendor's default prompts: they decide whether the agent asks when data is missing or just keeps going.
Background: https://openreview.net/forum?id=e5a1ZlVcjN