When the Machine Invents a Category for You

There is a number that should make every B2B SaaS founder uncomfortable.

It is not a churn rate or a CAC ratio. It is a cosine similarity score, and it sits at the intersection of how you describe your company and how AI systems understand it.

In a recent audit, we measured the cosine similarity between a company's own self-description (the language it uses on its website and in press) and the category an AI model assigned to that same company when asked about it independently.

−0.014

The cosine similarity between how this company described itself and the category the model independently assigned to it. A score of 1.0 means the two vectors point in exactly the same direction. A score of 0.00 means they are orthogonal, at a 90-degree angle in vector space, with no meaningful conceptual relationship. At −0.014, this pair sits fractionally past orthogonal.
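For readers who want the arithmetic: cosine similarity is the dot product of two vectors divided by the product of their lengths, and it equals the cosine of the angle between them. The minimal Python sketch below (the function names are ours, chosen for illustration) computes the score and converts it back into an angle, which shows why −0.014 reads as "just past 90 degrees."

```python
import math
from typing import Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of a and b divided by the product of their lengths."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def angle_degrees(cos_sim: float) -> float:
    """Convert a cosine similarity back into the angle between the vectors."""
    # Clamp into [-1, 1] so floating-point drift cannot break math.acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_sim))))

print(cosine_similarity([3.0, 4.0], [6.0, 8.0]))  # same direction -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))  # orthogonal -> 0.0
print(round(angle_degrees(-0.014), 2))            # -> 90.8
```

Note that only direction matters: [6.0, 8.0] is twice as long as [3.0, 4.0], yet the score is a perfect 1.0.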

This company was not invisible to the model. The model had an opinion about them. It just invented an entirely different category to put them in.

What is actually happening

When a language model encounters your brand, it does not read your website the way a human does. In effect, it represents the text it has seen about you as high-dimensional vectors, or embeddings: points in a geometric space where meaning is expressed as distance.

Two ideas that are semantically close end up near each other in that space. Two ideas that are unrelated end up far apart. The cosine of the angle between their vectors, called cosine similarity, measures how closely the model relates them.
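To make "near" and "far" concrete, here is a deliberately crude stand-in for a learned embedding: each text becomes a vector of word counts. Real models use learned dense embeddings that capture synonyms and paraphrase, which word counts cannot, so treat this only as an illustration of the geometry. The phrases are hypothetical examples.

```python
import math
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Count lowercase words; a crude stand-in for a learned embedding."""
    return Counter(text.lower().split())

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    # Counter returns 0 for missing words, so absent terms add nothing.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

self_description = bag_of_words("judicial reasoning layer for litigation strategy")
close_phrase = bag_of_words("reasoning layer for litigation teams")
far_phrase = bag_of_words("legal research tool")

print(round(cosine_similarity(self_description, close_phrase), 3))  # -> 0.73
print(round(cosine_similarity(self_description, far_phrase), 3))    # -> 0.0
```

Word counts are never negative, so this toy can only score between 0 and 1. A slightly negative score like −0.014 arises only with embeddings whose components can take either sign.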

The problem is not that models are wrong. The problem is that they are precise about imprecise inputs.

When your public signals are fragmented — when your website says one thing, your press coverage implies another, and your founder interviews suggest a third — the model does not average them into a coherent picture. It makes a probabilistic inference from whatever has the most weight in its training data. And that inference often lands somewhere you have never been.

The three failure modes we see repeatedly

Failure mode 01

Legacy category lock

A company launched in 2019 as a "legal research tool." They have since built something fundamentally different — a judicial reasoning layer, a strategic inference engine. But the 2019 framing still dominates authoritative sources. So models retrieve them as a legal research tool, and newer queries — the ones high-intent buyers are actually typing — don't surface them at all.

Failure mode 02

Gap-filling

The model knows roughly what the company does but does not know how it works. The company has deliberately kept that vague to protect IP. So the model fills the gap using adjacent signals — competitor descriptions, category norms, surface-level inference. The company ends up described as doing something they have never confirmed and may explicitly reject.

Failure mode 03

Category invention

This is the −0.014 case. The model cannot reconcile the conflicting signals, so it generates its own category from scratch — something that partially overlaps with the company's actual positioning but is not anchored to anything they have ever said. The company is now being retrieved under a label they did not create, cannot control, and may not even recognize.

A pipeline problem, not a PR problem

The natural instinct is to treat this as a content issue. Write more. Publish more. Get more press.

That instinct is wrong.

The problem is not volume. It is coherence. A company can have hundreds of public mentions and still score near 0.00 on the measure that matters — the alignment between their stated positioning and the model's actual representation of them.
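One hypothetical way to put a number on "coherence, not volume": score a company's public footprint as the average cosine similarity between every pair of its mentions, assuming each mention has already been embedded. The vectors below are invented 3-dimensional stand-ins, not real embeddings, and this metric is our illustration rather than a standard one.

```python
import math
from itertools import combinations

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def mean_pairwise_similarity(vectors):
    """Hypothetical coherence score: average cosine similarity over all pairs."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two mentions")
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)

# Twelve invented mentions scattered across three unrelated directions...
fragmented = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)] * 4
# ...versus three invented mentions pointing almost the same way.
coherent = [(1.0, 0.1, 0.0), (1.0, 0.0, 0.1), (0.9, 0.1, 0.1)]

print(len(fragmented), round(mean_pairwise_similarity(fragmented), 2))  # -> 12 0.27
print(len(coherent), round(mean_pairwise_similarity(coherent), 2))      # -> 3 0.99
```

Four times as many mentions produce a far lower score, because adding mentions in new directions dilutes agreement rather than strengthening it.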

When a buyer asks an AI system for a shortlist of tools in your category, the system is not counting your press hits. It is, in effect, running semantic retrieval against its internal representation of your entity. If that representation is fragmented, inconsistent, or anchored to the wrong category, you are filtered out, not because you are unknown, but because what you are is unclear.
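A simplified model of that filtering step, assuming an embedding-based retriever that keeps the top-k entities by cosine similarity to the buyer's query. The entity names, the 2-D vectors, and k=2 are all hypothetical, and production AI systems mix retrieval with generation in ways this sketch does not capture.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def shortlist(query, entities, k=2):
    """Return the k entity names whose vectors point closest to the query."""
    ranked = sorted(
        entities.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]

# Invented axes: x ~ "litigation strategy", y ~ "legal research".
query = (1.0, 0.2)
entities = {
    "CompetitorA": (0.9, 0.3),
    "CompetitorB": (0.8, 0.1),
    # A long vector (lots of signal) that points toward the old category.
    "YourCompany": (2.0, 10.0),
}

print(shortlist(query, entities))  # -> ['CompetitorB', 'CompetitorA']
```

Making the "YourCompany" vector longer changes nothing: cosine similarity ignores magnitude, so only a change in direction would move it onto the shortlist.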

Invisibility is not the risk. Misrepresentation is.

What the diagnostic looks like

At Visivle, we measure entity coherence across three axes:

Axis 01

Source-to-model drift

The distance between how a company describes itself and how models classify it independently.

Axis 02

Cross-model variance

The degree to which different AI systems produce different representations of the same entity.

Axis 03

Category anchor strength

How consistently the model retrieves the company under the intended category versus adjacent or invented ones.
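The three axes are described qualitatively above. As a purely hypothetical sketch of how each could be reduced to a number, the Python below makes assumptions about its inputs: one embedding per model, plus category labels collected from repeated queries. It is not Visivle's methodology. Cross-model variance is approximated here by mean pairwise cosine distance, a dispersion measure chosen because it registers models that disagree in direction, not just in score.

```python
import math
from itertools import combinations
from statistics import mean

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def source_to_model_drift(self_vec, model_vec):
    """Axis 01 (hypothetical): cosine distance; 0 aligned, 1 orthogonal, 2 opposite."""
    return 1.0 - cosine_similarity(self_vec, model_vec)

def cross_model_dispersion(model_vecs):
    """Axis 02 (hypothetical): mean pairwise cosine distance across models."""
    pairs = list(combinations(model_vecs.values(), 2))
    if not pairs:
        raise ValueError("need representations from at least two models")
    return mean(1.0 - cosine_similarity(a, b) for a, b in pairs)

def category_anchor_strength(retrieved_labels, intended):
    """Axis 03 (hypothetical): share of retrievals under the intended category."""
    if not retrieved_labels:
        raise ValueError("no retrievals to score")
    hits = sum(label == intended for label in retrieved_labels)
    return hits / len(retrieved_labels)

# Invented inputs: one self-description vector, one vector per model,
# and the category labels returned by five repeated queries.
self_vec = (1.0, 0.0)
model_vecs = {"model_a": (0.6, 0.8), "model_b": (0.0, 1.0), "model_c": (1.0, 0.0)}
labels = ["judicial reasoning", "legal research", "legal research",
          "judicial reasoning", "contract analytics"]

print(round(source_to_model_drift(self_vec, model_vecs["model_a"]), 2))  # -> 0.4
print(round(cross_model_dispersion(model_vecs), 2))                      # -> 0.53
print(category_anchor_strength(labels, "judicial reasoning"))            # -> 0.4
```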

The goal is not to game the model. The goal is to give the model enough consistent, unambiguous signal that it does not have to invent anything.

When category anchors are strong, models retrieve accurately. When they are weak, models fill the gap — and they fill it with whatever is loudest in the surrounding noise.

The only number that matters

The question worth asking about your company is not "Are we ranking?" or "Are we being mentioned?" It is this:

What is the cosine similarity between what we think we are and what the machine thinks we are?

If you do not know that number, you do not know how AI systems are representing you to the buyers who are searching for you right now.