This is a working glossary. It's not exhaustive — categories this young don't have stable canonical terminology, and any list claiming to be exhaustive is either six months out of date or guessing what the field will look like in six months. The 32 terms below are the ones you'll actually hear in serious conversations about AI search visibility, ordered by how often they come up.
Where multiple terms refer to the same thing, we name the synonyms and pick a canonical version. Where the field disagrees, we say so. Where someone is using a term sloppily, we call that out too.
The terms you'll use most
GEO (Generative Engine Optimisation)
The practice of making a brand more likely to appear inside the answers AI engines (ChatGPT, Claude, Gemini, Perplexity, AI Overviews) generate when users ask commercial questions. Coined in a 2023 academic paper by Princeton, Georgia Tech, and Allen Institute for AI researchers. The most widely used canonical term for the discipline; preferred over AEO and LLMO because it names the actual mechanism rather than the surface.
AEO (Answer Engine Optimisation)
Functionally synonymous with GEO. AEO emphasises that AI engines deliver answers, not lists of links. Some practitioners insist on the distinction; most use the terms interchangeably. If you're in a meeting and someone says AEO, hear "GEO." If you're writing for SEO buyers, AEO sometimes lands better; for marketing leaders, GEO does.
LLMO (Large Language Model Optimisation)
Less common synonym for GEO. Emphasises the underlying technology (LLMs) rather than the user-facing surface (engines). Tends to be preferred by technical audiences and AI engineers; less common in marketing departments. Same practice as GEO and AEO — the field hasn't fully consolidated on one term.
AI search
Umbrella term for buyer-research queries directed at conversational AI engines (ChatGPT, Claude, Gemini, Perplexity) and at AI-augmented traditional search (Google AI Overviews, Bing Copilot). Distinct from traditional search, which produces ranked-link results. Buyer behaviour is migrating from traditional to AI search at a measurable rate; the question for marketers is how fast the migration is in their specific category, not whether it's happening.
Total search visibility
The aggregate measure of a brand's discoverability across all search channels — Google rankings, AI engine answers, AI Overviews, and increasingly other discovery surfaces (TikTok search, Reddit search, etc.). Distinct from "AI visibility" (which only covers AI engines) or "SEO visibility" (which only covers Google). Useful as a single number to track over time as channels evolve.
Citation rate
The percentage of AI engine responses that link to your domain as a source. Distinct from Share of AI Voice — your brand can be mentioned without your URL being cited, and vice versa. Citation rate matters because cited URLs receive direct referral traffic, while mentioned-but-not-cited brand placements influence the broader AI conversation but generate fewer clicks.
Brand mention rate
Loose synonym for Share of AI Voice, used informally. Some practitioners use "brand mention rate" to mean raw mentions before normalising for response volume; in this context, prefer Share of AI Voice as the precise term. Reserve "brand mention rate" for the simpler unnormalised count if you must.
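The mention-versus-citation distinction is easy to see in code. A minimal sketch, assuming a hypothetical audit format in which each engine response carries its answer text and the list of URLs it cited — all names, URLs, and data here are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class EngineResponse:
    """One AI engine answer from an audit run (hypothetical structure)."""
    text: str
    cited_urls: list

def mention_rate(responses, brand_name):
    """Share of responses that mention the brand by name, cited or not."""
    hits = sum(brand_name.lower() in r.text.lower() for r in responses)
    return hits / len(responses)

def citation_rate(responses, domain):
    """Share of responses that cite the brand's domain as a source."""
    hits = sum(any(domain in url for url in r.cited_urls) for r in responses)
    return hits / len(responses)

responses = [
    EngineResponse("Acme and Globex both offer this.",
                   ["https://review-site.example/roundup"]),
    EngineResponse("Acme is a popular choice.",
                   ["https://acme.example/pricing"]),
    EngineResponse("Globex leads this category.", []),
]
print(mention_rate(responses, "Acme"))           # 2 of 3 responses mention the brand
print(citation_rate(responses, "acme.example"))  # only 1 of 3 cite the domain
```

The second response is both a mention and a citation; the first is a mention only — exactly the gap the two metrics are meant to separate.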
The engines you're optimising for
ChatGPT
The dominant AI engine by user volume. Built on the GPT family of models (currently GPT-4o and successors). Heavy reliance on training-data baseline knowledge for general queries, with selective live retrieval via "browse with Bing" mode for current-state queries. The engine most marketers test first when checking AI visibility, and the one most CMOs personally use, which gives it disproportionate weight in internal "are we visible?" conversations.
Claude
The second most widely used AI engine among professionals. Built on the Claude model family (currently Claude 4.7). Used heavily by knowledge workers and developers; the engine your buyers are most likely to use for serious research vs. casual queries. Live retrieval via Claude's web tool. Important for enterprise B2B audiences specifically.
Gemini
Google's AI engine. Distinct from AI Overviews (which appear in Google Search). Heavy integration with Google Workspace; benefits from Google's index for live retrieval. Often underweighted in audits because marketing teams don't use it personally — but consumer-facing brands targeting Android-heavy markets should not skip it.
Perplexity
An AI engine that prioritises live retrieval over training-data knowledge. Every response includes inline source citations (numbered references in the text). Heaviest use case is research-style buyer queries — people who want sources, not just answers. Disproportionately influential among researchers, journalists, analysts. Cite-by-default model means strong organic content can drive direct attribution traffic.
AI Overviews
The AI-generated summary that appears at the top of Google search results for many queries. Replaces the user's need to click through to source pages for many informational queries. Drives the largest single shift in click-through-rate behaviour in Google's history. Critical for brands whose acquisition channel is Google organic search; less critical for brands whose buyers are already AI-engine-native.
SGE (Search Generative Experience)
The previous name for Google AI Overviews, used during the experimental rollout phase (2023–2024). You'll still see "SGE" in older articles and tool documentation. Treat as a synonym for AI Overviews when reading anything pre-2025.
The mechanics
Training data
The corpus of text (and increasingly images, code, etc.) used to train an AI model. For LLMs, this is typically a frozen snapshot of the public web plus books, papers, code repositories, and conversational data. Training data shapes the model's baseline knowledge — what it knows about your brand without looking anything up. Cannot be updated after training; only changed by training a new model with a new snapshot.
Knowledge cutoff
The date past which an AI model has no knowledge from training data. For example, GPT-4o has a cutoff of October 2023; events after that date are unknown to the baseline model unless retrieved live. Brands launching after a cutoff are functionally invisible to the engine's baseline knowledge until the next model snapshot. Engines update at different cadences — typically every 6–12 months for major model releases.
RAG (Retrieval-Augmented Generation)
The technical mechanism by which AI engines fetch live web content, read it, and include it in their generated answer. The engine runs a search query, retrieves 5–8 high-relevance pages, reads them, then writes an answer that synthesises across them. RAG is what makes "live retrieval" possible. Brands optimising for AI engines need to optimise for both the training-data layer and the retrieval-by-RAG layer.
Grounding
The process of an AI engine basing its answer on retrieved source material rather than purely on training data. A "grounded" answer cites specific sources; an "ungrounded" answer is generated from the model's parametric knowledge alone. Grounded answers are typically more accurate for current-state queries; ungrounded answers can be more fluent but more prone to hallucination. Engines differ in how aggressively they ground.
Hallucination
When an AI model generates a statement that's confidently presented but factually false. Includes invented brand attributes, fabricated quotes, made-up statistics, and false claims about real entities. From a GEO perspective, hallucinations matter because an engine might confidently misattribute features, pricing, or positioning to your brand. The fix is at the source layer (more accurate, easily-extractable facts about you on authoritative pages) — not at the engine.
Query fan-out
When an AI engine breaks a single user query into multiple sub-queries, each retrieved separately, and synthesises across them. Example: a user asks "what's the best CRM for a 20-person sales team?" The engine fans out into queries like "best CRM 2026", "CRM for small sales teams", "Salesforce alternatives", etc. Fan-out is why optimising for the exact phrase a user types is less important than being well-represented across the cluster of sub-queries the engine generates.
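The fan-out step can be sketched with fixed templates. Real engines generate sub-queries with a model, not templates; these rewrites are purely illustrative:

```python
def fan_out(user_query: str) -> list[str]:
    """Expand one user query into a cluster of sub-queries.
    Hypothetical templates standing in for the engine's learned rewrites."""
    templates = [
        "{q}",
        "best {q}",
        "{q} alternatives",
        "{q} comparison",
    ]
    return [t.format(q=user_query) for t in templates]

sub_queries = fan_out("crm for small sales teams")
print(sub_queries)
```

Being "well-represented across the cluster" means your content is retrievable for most of these sub-queries, not just the literal phrase the user typed.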
Prompt set
The collection of test queries used to audit a brand's AI visibility. Typically 50–150 commercial queries, balanced across the funnel (awareness, consideration, decision). A representative prompt set is the foundation of any rigorous Share of AI Voice measurement; an unrepresentative one will give misleading results. Prompt sets should be designed by category and refreshed quarterly as buyer language evolves.
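A minimal sketch of what a funnel-balanced prompt set looks like in practice, using an invented CRM vendor; a real set runs 50–150 queries drawn from actual buyer language:

```python
from collections import Counter

# Illustrative prompt set: (funnel stage, query). All queries invented.
PROMPT_SET = [
    ("awareness",     "what is a crm and do small teams need one"),
    ("awareness",     "how do sales teams keep track of leads"),
    ("consideration", "best crm for a 20-person sales team"),
    ("consideration", "hubspot vs salesforce for startups"),
    ("decision",      "acme crm pricing"),
    ("decision",      "is acme crm worth it reddit"),
]

def stage_counts(prompt_set):
    """Sanity check: how many queries target each funnel stage."""
    return Counter(stage for stage, _ in prompt_set)

print(stage_counts(PROMPT_SET))
```

Checking the stage counts before each quarterly refresh is a cheap guard against the set drifting toward one funnel stage and skewing the measurement.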
The bots reading your site
GPTBot
OpenAI's web crawler, used to gather content for training future GPT models and for ChatGPT's live-retrieval features. Identifies itself with the user-agent string starting GPTBot. Respects robots.txt. Allowing GPTBot means your content can be used in training and retrieved live; disallowing means you opt out of both. Most brands should allow it; the IP-protection cases are narrow and specific.
ClaudeBot
Anthropic's web crawler. User-agent string contains ClaudeBot. Functions equivalently to GPTBot — used for training data gathering and Claude's web-tool retrieval. Respects robots.txt. Block-or-allow decision should generally mirror your GPTBot decision.
PerplexityBot
Perplexity's crawler. Particularly important to allow because Perplexity is heavily retrieval-driven — content not crawled by PerplexityBot is unlikely to be cited in Perplexity's answers. Respects robots.txt. Blocking it is one of the most common reasons brands are conspicuously absent from Perplexity specifically.
Google-Extended
Google's user-agent for Bard/Gemini training and AI-feature data gathering. Distinct from Googlebot (which is for Google Search). You can block Google-Extended without affecting your Search rankings — but doing so opts you out of being trained on for Gemini and used in AI Overview retrieval. Brands occasionally block Google-Extended thinking it'll preserve their content; it preserves the content but makes them invisible to Google's AI surfaces.
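Putting the four crawler decisions above in one place, a robots.txt that allows all four AI crawlers while keeping an illustrative private path off-limits might look like this (the user-agent tokens are the real ones; the paths are invented):

```
# robots.txt — allow AI crawlers site-wide, restrict everyone else's
# access to one private path. Paths are illustrative.
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

# Default rule for all other crawlers
User-agent: *
Disallow: /internal/
```

Note that under robots.txt matching rules, a crawler uses the most specific user-agent group that names it, so the four named groups override the `*` default for those bots.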
llms.txt
An emerging convention: a markdown file at the root of your domain (yourdomain.com/llms.txt) that tells AI crawlers what to index and how to interpret your site. Modelled on robots.txt but with structured intent. Not yet widely respected by all engines, but adoption is increasing rapidly. Cheap to add, no downside, marginal upside that may grow significantly. Worth doing.
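A sketch following the general shape of the llms.txt proposal — an H1 title, a blockquote summary, then sections of annotated links. All brand details and URLs here are invented, and because the convention is still emerging, engines differ in how (or whether) they interpret the file:

```markdown
# Acme CRM

> Acme is a CRM for small B2B sales teams. Key facts: founded 2021,
> pricing from $20/seat/month, integrates with Gmail and Slack.

## Product
- [Product overview](https://acme.example/product): what Acme does and for whom
- [Pricing](https://acme.example/pricing): current plans and limits

## Optional
- [Blog](https://acme.example/blog): long-form articles
```

The summary blockquote is the highest-leverage part: it is the one place you state your core facts in a form explicitly intended for machine consumption.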
The handles you can actually pull
Entity
In the SEO/GEO context, a thing the AI recognises as a distinct named concept — your brand, your competitor, a city, a product category. Entities are the unit of representation in a knowledge graph. Strong entity recognition for your brand means the AI knows what you are, where you fit, and who you compete with. Weak entity recognition means the AI confuses you with similarly-named things or fails to place you in the right category.
Knowledge graph
A structured representation of entities and their relationships, used by search engines and AI engines to ground generated content. Google has its Knowledge Graph (the source for the right-hand info panel in search results); other engines maintain their own. Wikipedia and Wikidata are the dominant inputs to most knowledge graphs. Brands with weak knowledge-graph presence have weaker AI visibility almost regardless of other optimisation work.
Digital PR
The practice of earning editorial coverage in third-party publications — trade press, mainstream news, podcasts, YouTube — to build entity associations and authority. Historically valued for direct referral traffic and link-building. Now valued primarily for shaping the source corpus that AI engines synthesise from. The most underrated GEO discipline in 2026; the brands investing heavily in Digital PR are pulling ahead in AI visibility faster than the brands focusing only on on-site SEO.
Citation laundering
The pattern by which a claim originates in a low-authority source (a Reddit comment, a press release, your own blog post), gets picked up by a higher-authority source (a trade publication), then propagates as if originating in the higher-authority source. AI engines lose track of original provenance during this process. Understanding citation laundering is key to understanding why "seeding" a claim into the AI conversation requires getting it into authoritative third-party sources, not just publishing it on your own site.
Schema markup
Standardised structured-data annotations embedded in HTML that tell search and AI engines exactly what type of content a page contains, what entities it describes, what facts it asserts. Schema.org is the standard vocabulary; JSON-LD is the recommended format. Strong schema markup makes your facts machine-extractable, which AI engines reward when synthesising answers. Specifically valuable types for GEO: Organization, Product, Service, FAQPage, HowTo, DefinedTerm.
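A minimal JSON-LD sketch using the Organization type for an invented brand. The properties shown (name, url, sameAs, description) are standard schema.org vocabulary; sameAs is the one that does the entity-disambiguation work, linking your pages to your knowledge-graph presence:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Acme CRM",
  "url": "https://acme.example",
  "description": "CRM software for small B2B sales teams.",
  "sameAs": [
    "https://en.wikipedia.org/wiki/Acme_CRM",
    "https://www.linkedin.com/company/acme-crm"
  ]
}
</script>
```

The block sits in the page's `<head>` or `<body>`; it is invisible to human readers but directly machine-extractable.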
Funnel-Stage Share of Voice
Share of AI Voice broken down by where in the buyer journey each query sits — awareness, consideration, decision. Most brands' AI visibility is uneven across stages; a typical pattern is "strong awareness, weak decision," meaning the AI knows what category you're in but doesn't recommend you when buyers ask for a specific tool. Funnel-stage breakdown surfaces the gap location, which determines the fix: more category-defining content for awareness gaps; more comparison and decision content for decision gaps.
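Once audit responses are tagged by funnel stage, the breakdown is a one-pass computation. A sketch over invented results showing the "strong awareness, weak decision" pattern described above:

```python
from collections import defaultdict

# Hypothetical audit rows: (funnel stage, brand mentioned in the response?)
results = [
    ("awareness", True), ("awareness", True), ("awareness", False),
    ("consideration", True), ("consideration", False), ("consideration", False),
    ("decision", False), ("decision", False), ("decision", False),
]

def share_of_voice_by_stage(rows):
    """Fraction of responses mentioning the brand, per funnel stage."""
    totals, hits = defaultdict(int), defaultdict(int)
    for stage, mentioned in rows:
        totals[stage] += 1
        hits[stage] += mentioned
    return {stage: hits[stage] / totals[stage] for stage in totals}

print(share_of_voice_by_stage(results))
# -> awareness 2/3, consideration 1/3, decision 0
```

The invented numbers deliberately decline down the funnel: this brand would need comparison and decision content, not more category-defining material.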
Sentiment frame
The implicit positioning an AI engine assigns to your brand when it mentions you. "X is the enterprise leader" and "X is the cheaper alternative for small teams" are different frames for the same brand. Sentiment frame is more granular than sentiment (positive/neutral/negative); it captures the qualitative narrative the AI applies. Surveying sentiment frames across hundreds of responses reveals how the AI thinks about your brand at a level no individual response shows.
Updates
This glossary is maintained quarterly. Last updated 25 April 2026. Terms added in the most recent revision: Funnel-Stage Share of Voice, Sentiment Frame, Citation Laundering, llms.txt.
Disagreements, suggested additions, terms we've defined badly: gareth@visible.md.