This is a working glossary. It's not exhaustive — categories this young don't have stable canonical terminology, and any list claiming to be exhaustive is either six months out of date or guessing what the field will look like in six months. The 32 terms below are the ones you'll actually hear in serious conversations about AI search visibility, ordered by how often they come up.
Where multiple terms refer to the same thing, we name the synonyms and pick a canonical version. Where the field disagrees, we say so. Where someone is using a term sloppily, we call that out too.
The terms you'll use most
GEO (Generative Engine Optimisation)
The practice of making a brand more likely to appear inside the answers AI engines (ChatGPT, Claude, Gemini, Perplexity, AI Overviews) generate when users ask commercial questions. Coined in a 2023 academic paper by Princeton, Georgia Tech, and Allen Institute for AI researchers. The most widely used canonical term for the discipline; preferred over AEO and LLMO because it names the actual mechanism rather than the surface.
AEO (Answer Engine Optimisation)
Functionally synonymous with GEO. AEO emphasises that AI engines deliver answers, not lists of links. Some practitioners insist on the distinction; most use the terms interchangeably. If you're in a meeting and someone says AEO, hear "GEO." If you're writing for SEO buyers, AEO sometimes lands better; for marketing leaders, GEO does.
LLMO (Large Language Model Optimisation)
Less common synonym for GEO. Emphasises the underlying technology (LLMs) rather than the user-facing surface (engines). Tends to be preferred by technical audiences and AI engineers; less common in marketing departments. Same practice as GEO and AEO — the field hasn't fully consolidated on one term.
AI search
Umbrella term for buyer-research queries directed at conversational AI engines (ChatGPT, Claude, Gemini, Perplexity) and at AI-augmented traditional search (Google AI Overviews, Bing Copilot). Distinct from traditional search, which produces ranked-link results. Buyer behaviour is migrating from traditional to AI search at a measurable rate; the question for marketers is how fast the migration is in their specific category, not whether it's happening.
Total search visibility
The aggregate measure of a brand's discoverability across all search channels — Google rankings, AI engine answers, AI Overviews, and increasingly other discovery surfaces (TikTok search, Reddit search, etc.). Distinct from "AI visibility" (which only covers AI engines) or "SEO visibility" (which only covers Google). Useful as a single number to track over time as channels evolve.
Citation rate
The percentage of AI engine responses that link to your domain as a source. Distinct from Share of AI Voice — your brand can be mentioned without your URL being cited, and vice versa. Citation rate matters because cited URLs receive direct referral traffic, while mentioned-but-not-cited brand placements influence the broader AI conversation but generate fewer clicks.
Brand mention rate
Loose synonym for Share of AI Voice, used informally. Some practitioners use "brand mention rate" to mean raw mentions before normalising for response volume; in this context, prefer Share of AI Voice as the precise term. Reserve "brand mention rate" for the simpler unnormalised count if you must.
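The mention-versus-citation distinction is easy to see in code. A minimal sketch, assuming a hypothetical audit format in which each engine response carries its answer text and the list of URLs it cited — all names, URLs, and data here are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class EngineResponse:
    """One AI engine answer from an audit run (hypothetical structure)."""
    text: str
    cited_urls: list

def mention_rate(responses, brand_name):
    """Share of responses that mention the brand by name, cited or not."""
    hits = sum(brand_name.lower() in r.text.lower() for r in responses)
    return hits / len(responses)

def citation_rate(responses, domain):
    """Share of responses that cite the brand's domain as a source."""
    hits = sum(any(domain in url for url in r.cited_urls) for r in responses)
    return hits / len(responses)

responses = [
    EngineResponse("Acme and Globex both offer this.",
                   ["https://review-site.example/roundup"]),
    EngineResponse("Acme is a popular choice.",
                   ["https://acme.example/pricing"]),
    EngineResponse("Globex leads this category.", []),
]
print(mention_rate(responses, "Acme"))           # 2 of 3 responses mention the brand
print(citation_rate(responses, "acme.example"))  # only 1 of 3 cite the domain
```

The second response is both a mention and a citation; the first is a mention only — exactly the gap the two metrics are meant to separate.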
The engines you're optimising for
ChatGPT
The dominant AI engine by user volume. Built on the GPT family of models (currently GPT-4o and successors). Heavy reliance on training-data baseline knowledge for general queries, with selective live retrieval via "browse with Bing" mode for current-state queries. The engine most marketers test first when checking AI visibility, and the one most CMOs personally use, which gives it disproportionate weight in internal "are we visible?" conversations.
Claude
The second most widely used AI engine among professionals. Built on the Claude model family (currently Claude 4.7). Used heavily by knowledge workers and developers; the engine your buyers are most likely to use for serious research vs. casual queries. Live retrieval via Claude's web tool. Important for enterprise B2B audiences specifically.
Gemini
Google's AI engine. Distinct from AI Overviews (which appear in Google Search). Heavy integration with Google Workspace; benefits from Google's index for live retrieval. Often underweighted in audits because marketing teams don't use it personally — but consumer-facing brands targeting Android-heavy markets should not skip it.
Perplexity
An AI engine that prioritises live retrieval over training-data knowledge. Every response includes inline source citations (numbered references in the text). Heaviest use case is research-style buyer queries — people who want sources, not just answers. Disproportionately influential among researchers, journalists, analysts. Cite-by-default model means strong organic content can drive direct attribution traffic.
AI Overviews
The AI-generated summary that appears at the top of Google search results for many queries. Replaces the user's need to click through to source pages for many informational queries. Drives the largest single shift in click-through-rate behaviour in Google's history. Critical for brands whose acquisition channel is Google organic search; less critical for brands whose buyers are already AI-engine-native.
SGE (Search Generative Experience)
The previous name for Google AI Overviews, used during the experimental rollout phase (2023–2024). You'll still see "SGE" in older articles and tool documentation. Treat as a synonym for AI Overviews when reading anything pre-2025.
The mechanics
Training data
The corpus of text (and increasingly images, code, etc.) used to train an AI model. For LLMs, this is typically a frozen snapshot of the public web plus books, papers, code repositories, and conversational data. Training data shapes the model's baseline knowledge — what it knows about your brand without looking anything up. Cannot be updated after training; only changed by training a new model with a new snapshot.
Knowledge cutoff
The date past which an AI model has no knowledge from training data. For example, GPT-4o has a cutoff of October 2023; events after that date are unknown to the baseline model unless retrieved live. Brands launching after a cutoff are functionally invisible to the engine's baseline knowledge until the next model snapshot. Engines update at different cadences — typically every 6–12 months for major model releases.
RAG (Retrieval-Augmented Generation)
The technical mechanism by which AI engines fetch live web content, read it, and include it in their generated answer. The engine runs a search query, retrieves 5–8 high-relevance pages, reads them, then writes an answer that synthesises across them. RAG is what makes "live retrieval" possible. Brands optimising for AI engines need to optimise for both the training-data layer and the retrieval-by-RAG layer.
Grounding
The process of an AI engine basing its answer on retrieved source material rather than purely on training data. A "grounded" answer cites specific sources; an "ungrounded" answer is generated from the model's parametric knowledge alone. Grounded answers are typically more accurate for current-state queries; ungrounded answers can be more fluent but more prone to hallucination. Engines differ in how aggressively they ground.
Hallucination
When an AI model generates a statement that's confidently presented but factually false. Includes invented brand attributes, fabricated quotes, made-up statistics, and false claims about real entities. From a GEO perspective, hallucinations matter because an engine might confidently misattribute features, pricing, or positioning to your brand. The fix is at the source layer (more accurate, easily-extractable facts about you on authoritative pages) — not at the engine.
Query fan-out
When an AI engine breaks a single user query into multiple sub-queries, each retrieved separately, and synthesises across them. Example: a user asks "what's the best CRM for a 20-person sales team?" The engine fans out into queries like "best CRM 2026", "CRM for small sales teams", "Salesforce alternatives", etc. Fan-out is why optimising for the exact phrase a user types is less important than being well-represented across the cluster of sub-queries the engine generates.
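The fan-out step can be sketched with fixed templates. Real engines generate sub-queries with a model, not templates; these rewrites are purely illustrative:

```python
def fan_out(user_query: str) -> list[str]:
    """Expand one user query into a cluster of sub-queries.
    Hypothetical templates standing in for the engine's learned rewrites."""
    templates = [
        "{q}",
        "best {q}",
        "{q} alternatives",
        "{q} comparison",
    ]
    return [t.format(q=user_query) for t in templates]

sub_queries = fan_out("crm for small sales teams")
print(sub_queries)
```

Being "well-represented across the cluster" means your content is retrievable for most of these sub-queries, not just the literal phrase the user typed.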
Prompt set
The collection of test queries used to audit a brand's AI visibility. Typically 50–150 commercial queries, balanced across the funnel (awareness, consideration, decision). A representative prompt set is the foundation of any rigorous Share of AI Voice measurement; an unrepresentative one will give misleading results. Prompt sets should be designed by category and refreshed quarterly as buyer language evolves.
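A minimal sketch of what a funnel-balanced prompt set looks like in practice, using an invented CRM vendor; a real set runs 50–150 queries drawn from actual buyer language:

```python
from collections import Counter

# Illustrative prompt set: (funnel stage, query). All queries invented.
PROMPT_SET = [
    ("awareness",     "what is a crm and do small teams need one"),
    ("awareness",     "how do sales teams keep track of leads"),
    ("consideration", "best crm for a 20-person sales team"),
    ("consideration", "hubspot vs salesforce for startups"),
    ("decision",      "acme crm pricing"),
    ("decision",      "is acme crm worth it reddit"),
]

def stage_counts(prompt_set):
    """Sanity check: how many queries target each funnel stage."""
    return Counter(stage for stage, _ in prompt_set)

print(stage_counts(PROMPT_SET))
```

Checking the stage counts before each quarterly refresh is a cheap guard against the set drifting toward one funnel stage and skewing the measurement.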
The bots reading your site
GPTBot
OpenAI's web crawler, used to gather content for training future GPT models and for ChatGPT's live-retrieval features. Identifies itself with the user-agent string starting GPTBot. Respects robots.txt. Allowing GPTBot means your content can be used in training and retrieved live; disallowing means you opt out of both. Most brands should allow it; the IP-protection cases are narrow and specific.
ClaudeBot
Anthropic's web crawler. User-agent string contains ClaudeBot. Functions equivalently to GPTBot — used for training data gathering and Claude's web-tool retrieval. Respects robots.txt. Block-or-allow decision should generally mirror your GPTBot decision.
PerplexityBot
Perplexity's crawler. Particularly important to allow because Perplexity is heavily retrieval-driven — content not crawled by PerplexityBot is unlikely to be cited in Perplexity's answers. Respects robots.txt. Blocking it is one of the most common reasons brands are conspicuously absent from Perplexity specifically.
Google-Extended
Google's user-agent for Bard/Gemini training and AI-feature data gathering. Distinct from Googlebot (which is for Google Search). You can block Google-Extended without affecting your Search rankings — but doing so opts you out of being trained on for Gemini and used in AI Overview retrieval. Brands occasionally block Google-Extended thinking it'll preserve their content; it preserves the content but makes them invisible to Google's AI surfaces.
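Putting the four crawler decisions above in one place, a robots.txt that allows all four AI crawlers while keeping an illustrative private path off-limits might look like this (the user-agent tokens are the real ones; the paths are invented):

```
# robots.txt — allow AI crawlers site-wide, restrict everyone else's
# access to one private path. Paths are illustrative.
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

# Default rule for all other crawlers
User-agent: *
Disallow: /internal/
```

Note that under robots.txt matching rules, a crawler uses the most specific user-agent group that names it, so the four named groups override the `*` default for those bots.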
llms.txt
An emerging convention: a markdown file at the root of your domain (yourdomain.com/llms.txt) that tells AI crawlers what to index and how to interpret your site. Modelled on robots.txt but with structured intent. Not yet widely respected by all engines, but adoption is increasing rapidly. Cheap to add, no downside, marginal upside that may grow significantly. Worth doing.
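A sketch following the general shape of the llms.txt proposal — an H1 title, a blockquote summary, then sections of annotated links. All brand details and URLs here are invented, and because the convention is still emerging, engines differ in how (or whether) they interpret the file:

```markdown
# Acme CRM

> Acme is a CRM for small B2B sales teams. Key facts: founded 2021,
> pricing from $20/seat/month, integrates with Gmail and Slack.

## Product
- [Product overview](https://acme.example/product): what Acme does and for whom
- [Pricing](https://acme.example/pricing): current plans and limits

## Optional
- [Blog](https://acme.example/blog): long-form articles
```

The summary blockquote is the highest-leverage part: it is the one place you state your core facts in a form explicitly intended for machine consumption.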
The handles you can actually pull
Entity
In the SEO/GEO context, a thing the AI recognises as a distinct named concept — your brand, your competitor, a city, a product category. Entities are the unit of representation in a knowledge graph. Strong entity recognition for your brand means the AI knows what you are, where you fit, and who you compete with. Weak entity recognition means the AI confuses you with similarly-named things or fails to place you in the right category.
Knowledge graph
A structured representation of entities and their relationships, used by search engines and AI engines to ground generated content. Google has its Knowledge Graph (the source for the right-hand info panel in search results); other engines maintain their own. Wikipedia and Wikidata are the dominant inputs to most knowledge graphs. Brands with weak knowledge-graph presence have weaker AI visibility almost regardless of other optimisation work.
Digital PR
The practice of earning editorial coverage in third-party publications — trade press, mainstream news, podcasts, YouTube — to build entity associations and authority. Historically valued for direct referral traffic and link-building. Now valued primarily for shaping the source corpus that AI engines synthesise from. The most underrated GEO discipline in 2026; the brands investing heavily in Digital PR are pulling ahead in AI visibility faster than the brands focusing only on on-site SEO.
Citation laundering
The pattern by which a claim originates in a low-authority source (a Reddit comment, a press release, your own blog post), gets picked up by a higher-authority source (a trade publication), then propagates as if originating in the higher-authority source. AI engines lose track of original provenance during this process. Understanding citation laundering is key to understanding why "seeding" a claim into the AI conversation requires getting it into authoritative third-party sources, not just publishing it on your own site.
Schema markup
Standardised structured-data annotations embedded in HTML that tell search and AI engines exactly what type of content a page contains, what entities it describes, what facts it asserts. Schema.org is the standard vocabulary; JSON-LD is the recommended format. Strong schema markup makes your facts machine-extractable, which AI engines reward when synthesising answers. Specifically valuable types for GEO: Organization, Product, Service, FAQPage, HowTo, DefinedTerm.
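A minimal JSON-LD sketch using the Organization type for an invented brand. The properties shown (name, url, sameAs, description) are standard schema.org vocabulary; sameAs is the one that does the entity-disambiguation work, linking your pages to your knowledge-graph presence:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Acme CRM",
  "url": "https://acme.example",
  "description": "CRM software for small B2B sales teams.",
  "sameAs": [
    "https://en.wikipedia.org/wiki/Acme_CRM",
    "https://www.linkedin.com/company/acme-crm"
  ]
}
</script>
```

The block sits in the page's `<head>` or `<body>`; it is invisible to human readers but directly machine-extractable.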
Funnel-Stage Share of Voice
Share of AI Voice broken down by where in the buyer journey each query sits — awareness, consideration, decision. Most brands' AI visibility is uneven across stages; a typical pattern is "strong awareness, weak decision," meaning the AI knows what category you're in but doesn't recommend you when buyers ask for a specific tool. Funnel-stage breakdown surfaces the gap location, which determines the fix: more category-defining content for awareness gaps; more comparison and decision content for decision gaps.
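Once audit responses are tagged by funnel stage, the breakdown is a one-pass computation. A sketch over invented results showing the "strong awareness, weak decision" pattern described above:

```python
from collections import defaultdict

# Hypothetical audit rows: (funnel stage, brand mentioned in the response?)
results = [
    ("awareness", True), ("awareness", True), ("awareness", False),
    ("consideration", True), ("consideration", False), ("consideration", False),
    ("decision", False), ("decision", False), ("decision", False),
]

def share_of_voice_by_stage(rows):
    """Fraction of responses mentioning the brand, per funnel stage."""
    totals, hits = defaultdict(int), defaultdict(int)
    for stage, mentioned in rows:
        totals[stage] += 1
        hits[stage] += mentioned
    return {stage: hits[stage] / totals[stage] for stage in totals}

print(share_of_voice_by_stage(results))
# -> awareness 2/3, consideration 1/3, decision 0
```

The invented numbers deliberately decline down the funnel: this brand would need comparison and decision content, not more category-defining material.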
Sentiment frame
The implicit positioning an AI engine assigns to your brand when it mentions you. "X is the enterprise leader" and "X is the cheaper alternative for small teams" are different frames for the same brand. Sentiment frame is more granular than sentiment (positive/neutral/negative); it captures the qualitative narrative the AI applies. Surveying sentiment frames across hundreds of responses reveals how the AI thinks about your brand at a level no individual response shows.
Updates
This glossary is maintained quarterly. Last updated 25 April 2026. Terms added in the most recent revision: Funnel-Stage Share of Voice, Sentiment Frame, Citation Laundering, llms.txt.
Disagreements, suggested additions, terms we've defined badly: gareth@visible.md.