LLM Homogenisation — Why Your LLM Makes You Average
ChatGPT makes you average. Not because it's bad — but because it does exactly what it was built to do.
Harvard researchers recently measured what many have long suspected. They asked GPT-5.1 an open-ended question — 1,000 times. The model delivered exactly 19 different ideas. Nineteen. A different querying method extracted over 1,300 ideas from the same model.
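The measurement idea itself is simple: sample the same question many times, then count how many genuinely distinct ideas come back. A minimal sketch, assuming crude whitespace/case normalisation stands in for the semantic matching a real study would use; the response strings are invented for illustration:

```python
from collections import Counter

def normalise(idea: str) -> str:
    """Crude normalisation so trivially rephrased ideas count as one."""
    return " ".join(idea.lower().split())

def count_distinct(responses: list[str]) -> int:
    """Count distinct ideas in a batch of model responses."""
    return len(Counter(normalise(r) for r in responses))

# Hypothetical batch: 1,000 queries that keep landing on the same few ideas.
responses = (
    ["Gamify onboarding"] * 600
    + ["Add a tutorial"] * 350
    + ["gamify  onboarding"] * 50  # same idea, different surface form
)
print(count_distinct(responses))  # → 2
```

The gap between queries sent and distinct ideas returned is the homogenisation the study quantifies.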
This is not a bug. It's a side effect of optimisation.
Modern language models are tuned for precision — the most probable next word wins. For facts, that's brilliant. For strategy, product development, or positioning, it's a problem — because "statistically probable" means: conventional. Mainstream. Exactly what your competitor gets too.
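Why "the most probable next word wins" collapses diversity is easiest to see in the sampling maths. A minimal sketch with made-up logits for four candidate continuations: lowering the sampling temperature (as precision tuning effectively does) concentrates almost all probability mass on the single most likely option.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into next-token probabilities at a given temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits: one conventional continuation, three less likely ones.
logits = [4.0, 2.0, 1.0, 0.5]

for t in (1.0, 0.3):
    probs = softmax(logits, temperature=t)
    print(t, [round(p, 3) for p in probs])
# At t=1.0 the top option gets ~82% of the mass;
# at t=0.3 it gets over 99% — the alternatives effectively vanish.
```

The same mechanism that makes factual answers reliable makes open-ended answers converge on the statistically safe middle.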
And it's getting worse, not better: the higher a model scores on benchmarks, the narrower its output becomes on open-ended questions. In the tests, the latest models were more homogeneous than their predecessors. More compute, less originality.
The Harvard study also shows a way out. With a modified querying method, you suddenly get 1,307 different results instead of 19. At comparable relevance, but across a completely different search space.
What does this mean in practice?
→ Anyone using language models for ideation should be aware: the first five suggestions are the most conventional — not the best.
→ Use different models in parallel. Claude, GPT, Gemini, an open-source model — they have different blind spots.
→ Deliberately shift the entry point. Don't ask the same question five times — ask from the perspective of a completely unrelated field. Instead of "How do we improve our onboarding?" ask: "What would a game designer criticise about our onboarding?" Every perspective shift forces the model into a different part of its knowledge space.
→ Never treat AI output as the result — treat it as the starting point. The interesting ideas lie beyond the top 5.
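The perspective-shift tactic above is easy to mechanise. A minimal sketch: the function, its phrasing, and the personas are all illustrative — swap in any field unrelated to your own.

```python
def perspective_prompts(question: str, personas: list[str]) -> list[str]:
    """Rephrase one question from several unrelated perspectives.

    Each reframing pushes the model into a different part of its
    knowledge space instead of re-sampling the same conventional answers.
    """
    return [f"What would {p} criticise about this? {question}" for p in personas]

# Hypothetical personas for an onboarding question.
personas = ["a game designer", "an air-traffic controller", "a kindergarten teacher"]
for prompt in perspective_prompts("How do we improve our onboarding?", personas):
    print(prompt)
```

Feeding each variant to the model (or to several models in parallel) turns one conventional answer set into several partially overlapping ones.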
The problem is not the technology. The problem is that most people treat it like an oracle instead of a tool.