If you’ve spent any time optimizing for AI search, you’ve probably hit the same frustrating question: is it actually working?
Traditional SEO gave us relatively clean answers: rankings, impressions, clicks, positions. You could open Search Console and Google Analytics, see where you stood, and track progress week over week. AI visibility doesn’t work like that.
The outputs from ChatGPT, Gemini, and Claude are variable, personalized, and often impossible to reproduce exactly. There’s no fixed “position 1.” A query that mentions your brand today might not mention it tomorrow, even with the same prompt.
The AI might cite your page as a source in one session and ignore it completely in the next. And unlike organic search, there’s no unified analytics dashboard that tells you how visible you are across these platforms.
That’s the core challenge – and it’s why most marketing teams are either ignoring AI visibility measurement entirely or cobbling together unreliable manual checks. Neither approach is good enough anymore.
Continue reading: How ChatGPT Decides Which Businesses to Recommend
This guide covers the full measurement framework, platform-specific tactics, the metrics that actually matter, and a practical comparison of every tool worth considering in 2026.
Why AI visibility measurement is different from SEO
In traditional SEO, you’re working with a relatively stable system. Google indexes your page, assigns it a position for specific queries, and you can track that position over time. The feedback loop is clean: optimize content, watch rankings change, measure traffic impact.
AI visibility breaks almost every part of that loop. LLMs don’t produce the same response twice, even for identical prompts. Ask ChatGPT “what’s the best project management tool for startups?” three times and you’ll likely get three different answers with different brands mentioned. There’s no canonical “ranking” to track.
Platform fragmentation makes it worse. ChatGPT, Gemini, Claude, Perplexity, and Copilot all generate answers differently, pull from different sources, and offer different levels of measurement support. What works for tracking one platform is often useless for another.
Then there’s the attribution gap. Even when an AI mentions your brand or cites your page, you often can’t tell whether that mention drove a visit or did nothing at all. And every commercial tool that claims to measure AI visibility is working with a proxy — a modeled database, automated prompt runs, or browser automation snapshots.
None of them can tell you exactly what every user sees in every session. Once you accept that, you can start building a system that’s honest about its limitations but still genuinely useful.
Continue reading: How AI visibility and GEO are going to change in Q2 2026
The 4-layer measurement model
The most practical way to think about AI visibility is as four distinct layers, each measuring something different. Tracking all four matters because they can diverge badly — you might be mentioned frequently but never cited, cited but never clicked, or clicked but never converted.
| Layer | What it measures | How to track it | Key question |
|---|---|---|---|
| Presence | Does the AI mention your brand? | Prompt testing — run queries and record brand mentions | Are we in the conversation at all? |
| Citation | Does the AI link to your pages? | Prompt testing — note when your domain appears as a cited source | Is the AI sending anyone our way? |
| Traffic | Do AI users actually visit your site? | GA4 referral tracking, server logs, referral analysis | Are those citations turning into visits? |
| Business impact | Do AI-referred visitors convert? | Conversion tracking in GA4, pipeline attribution | Does AI visibility actually move the business? |
Layer 1: Presence
This is the most basic question: does the AI mention your brand at all? When someone asks ChatGPT about your category, do you show up in the answer?
Presence tracking means running a set of prompts repeatedly and recording whether your brand name appears in the response. It’s binary at its simplest — mentioned or not — but you can also capture sentiment, framing, and how prominently you’re featured relative to competitors.
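At its simplest, presence tracking is just whole-word matching of brand names against a response. Here's a minimal sketch; the brand names and sample response are placeholders, not real data:

```python
import re

def detect_mentions(response_text, brands):
    """Return brand -> True/False for whole-word, case-insensitive
    mentions in a single AI response."""
    results = {}
    for brand in brands:
        pattern = r"\b" + re.escape(brand) + r"\b"
        results[brand] = bool(re.search(pattern, response_text, re.IGNORECASE))
    return results

response = "For startups, tools like Asana and Trello are popular choices."
print(detect_mentions(response, ["Asana", "Trello", "Basecamp"]))
```

Word-boundary matching avoids false positives from partial matches, which matters once your brand name is a common word or a substring of a competitor's.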
Layer 2: Citation
Being mentioned isn’t the same as being cited. Citation means the AI linked to your page or domain as a source for its answer. This matters because citations are the mechanism that can actually drive traffic.
An AI might say “tools like Acme and Contoso are popular” without linking to either — that’s presence without citation. You want to track both, but citation is where the measurable value starts.
Layer 3: Traffic
Does the AI platform actually send visitors to your site? This is where first-party analytics become essential.
For ChatGPT, tracking is straightforward: the platform appends a UTM parameter to cited links, so ChatGPT-referred sessions show up as a clearly labeled source in Google Analytics. For other platforms it's harder; you'll need to look at referral sources, server logs, and landing page patterns to piece things together.
Layer 4: Business impact
Traffic alone doesn’t tell you much. The real question is whether AI-referred visitors convert — do they sign up, buy, request a demo, subscribe? This layer connects AI visibility to actual business outcomes.
You’ll track this the same way you track any channel: conversion rates, assisted conversions, pipeline influence, and revenue attribution. The difference is that AI referral volumes are still small enough that you need to be careful about drawing conclusions from thin data.
Optimizing for one layer without watching the others can be misleading. A brand might celebrate high presence scores while their citation rate is near zero — the AI talks about them but never sends anyone their way.
Or they might see strong citation rates but discover that AI-referred visitors bounce immediately because the cited page doesn’t match what the AI promised.
How to track AI visibility by platform
Each major AI platform offers a different level of measurement support, and the practical tracking approach varies significantly across them.
ChatGPT
ChatGPT is the most measurement-friendly platform right now. When ChatGPT cites a source and a user clicks through, the referral URL includes utm_source=chatgpt.com, which means you can track these sessions cleanly in GA4 or any analytics platform.
Start by checking eligibility. Verify that you haven’t blocked OAI-SearchBot in your robots.txt — blocking it means you won’t appear in search-grounded answers.
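You can check this programmatically with Python's standard-library robots.txt parser. The robots.txt body below is a made-up example to show both outcomes:

```python
from urllib.robotparser import RobotFileParser

def crawler_allowed(robots_txt, user_agent, path="/"):
    """Check whether a robots.txt body permits a given crawler
    to fetch a path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)

robots = """User-agent: OAI-SearchBot
Disallow:

User-agent: BadBot
Disallow: /
"""
print(crawler_allowed(robots, "OAI-SearchBot"))  # allowed (empty Disallow)
print(crawler_allowed(robots, "BadBot"))         # blocked site-wide
```

Running this against your live robots.txt (fetched with any HTTP client) is a quick eligibility smoke test before you invest in any deeper measurement.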
For traffic measurement, set up a dedicated segment or report in GA4 filtering for utm_source=chatgpt.com. Track landing pages, session duration, and conversion events from these sessions. Compare the conversion rate against your other channels to understand the quality of ChatGPT-referred traffic.
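If you work with raw landing-page URLs (from server logs or an analytics export) rather than inside the GA4 UI, the same filter is a one-liner. A minimal sketch:

```python
from urllib.parse import urlparse, parse_qs

def is_chatgpt_referral(landing_url):
    """Flag a session as ChatGPT-referred if its landing URL carries
    utm_source=chatgpt.com, the tag appended to cited-source clicks."""
    params = parse_qs(urlparse(landing_url).query)
    return params.get("utm_source", [""])[0] == "chatgpt.com"

print(is_chatgpt_referral("https://example.com/blog/post?utm_source=chatgpt.com"))
print(is_chatgpt_referral("https://example.com/blog/post"))
```

Applied across a sessions export, this gives you the ChatGPT segment to compare against other channels on duration and conversion rate.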
For visibility beyond traffic, you’ll need to run prompt tests. Pick a set of prompts relevant to your category — we’ll cover how to build a proper prompt library in the next section — and record whether ChatGPT mentions your brand, cites your pages, and where you appear relative to competitors.
Gemini
Gemini’s measurement story is more complicated because it’s deeply integrated with Google Search. Traffic from Gemini and AI Overviews shows up in Google Search Console under the “Web” search type, blended with traditional organic results. There’s no separate filter that lets you isolate “this click came from an AI Overview” versus “this click came from a classic blue link.”
The practical approach is a delta method: monitor your Search Console performance trends and look for shifts in impressions, clicks, and CTR that correlate with AI feature rollouts. If you see a query where impressions increased but CTR dropped, that’s often a signal that an AI Overview is answering the query directly and reducing click-through.
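The delta method can be automated over two Search Console exports. In this sketch, the input shape (query mapped to impressions and clicks) and the lift/drop thresholds are illustrative choices, not official guidance:

```python
def flag_ai_overview_candidates(current, previous,
                                impression_lift=1.1, ctr_drop=0.9):
    """Compare two Search Console exports, each mapping
    query -> (impressions, clicks), and flag queries where impressions
    rose but CTR fell -- the pattern that often accompanies an AI
    Overview answering the query directly."""
    flagged = []
    for query, (imp_now, clicks_now) in current.items():
        if query not in previous or imp_now == 0:
            continue
        imp_prev, clicks_prev = previous[query]
        if imp_prev == 0 or clicks_prev == 0:
            continue
        ctr_now = clicks_now / imp_now
        ctr_prev = clicks_prev / imp_prev
        if imp_now >= imp_prev * impression_lift and ctr_now <= ctr_prev * ctr_drop:
            flagged.append(query)
    return flagged

previous = {"best crm for startups": (1000, 50)}
current = {"best crm for startups": (1500, 30)}
print(flag_ai_overview_candidates(current, previous))
```

Flagged queries are candidates for the manual SERP check described next, not proof of AI Overview impact on their own.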
Supplement this with manual SERP snapshots. For your priority queries, regularly check whether Google is displaying an AI Overview, what sources it cites, and whether your pages appear among them. Tools like Semrush can help automate parts of this, but the ground truth comes from actually looking at the SERPs.
Claude
Claude is the hardest platform to measure. Anthropic’s crawler (ClaudeBot) indexes content, and Claude’s web search tool can pull live results, but there’s no native referral parameter equivalent to ChatGPT’s utm_source tagging. When a Claude user clicks a cited link, there’s no reliable way to distinguish that visit from direct traffic in your analytics.
Your best options are indirect. Check your server logs for ClaudeBot and Claude-SearchBot activity to understand what Anthropic is crawling on your site. Run manual prompt tests on Claude to track presence and citation. And monitor your referral traffic for any patterns that suggest Claude-originated visits — though this will always be imprecise.
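A simple way to quantify that crawl activity is to tally user-agent matches across your access logs. The log lines in this sketch are fabricated examples; adapt the crawler list to whatever bots you care about:

```python
from collections import Counter

# User-agent tokens to look for; extend as needed.
AI_CRAWLERS = ["Claude-SearchBot", "ClaudeBot", "OAI-SearchBot", "GPTBot"]

def count_ai_crawler_hits(log_lines):
    """Tally hits per AI crawler by substring-matching the user-agent
    token in each access-log line (combined-log style assumed).
    Longer tokens are checked first so names don't shadow each other."""
    hits = Counter()
    for line in log_lines:
        for bot in AI_CRAWLERS:
            if bot in line:
                hits[bot] += 1
                break
    return hits

logs = [
    '1.2.3.4 - - "GET /pricing HTTP/1.1" 200 "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '5.6.7.8 - - "GET /blog HTTP/1.1" 200 "Mozilla/5.0 (compatible; Claude-SearchBot/1.0)"',
    '9.9.9.9 - - "GET / HTTP/1.1" 200 "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
]
print(count_ai_crawler_hits(logs))
```

Tracking these counts by page over time tells you which parts of your site Anthropic is actually ingesting, even though you can't tie it to downstream visits.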
The honest assessment is that Claude tracking today is closer to research-style observability than proper analytics. You can build a picture of whether Claude knows about you and cites you, but tying that to traffic and conversions requires more inference than measurement.
How to build a prompt tracking system
Prompt tracking is the backbone of any AI visibility measurement program. Without it, you might know that ChatGPT sends you some traffic, but you won’t know which topics you’re visible for, where competitors are beating you, or whether your optimization efforts are moving the needle.
The biggest mistake teams make is starting with arbitrary keyword guesses. Build your prompt library from real data instead.
Gagan Ghotra outlined a practical approach that starts with your existing Search Console queries — the terms real people actually use to find you — and structures them by funnel stage: top-of-funnel awareness queries, mid-funnel consideration and comparison queries, and bottom-funnel high-intent queries.
This gives you a prompt set grounded in actual user behavior rather than speculation.
What a good prompt library looks like
Your prompt library should cover several categories: branded informational queries (“what does [your brand] do?”), branded comparison queries (“[your brand] vs [competitor]”), non-branded category queries (“best tools for X”), high-intent commercial queries (“which X should I buy for Y?”), and local or regional variants if relevant.
Don’t limit yourself to exact-match keyword prompts. Real users ask conversational questions — “I’m looking for a project management tool for remote teams under 20 people” — so include natural-language variants alongside more structured queries.
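A starter library can be generated from templates per category. The brand, competitor, and category values below are placeholders you'd replace with your own, ideally seeded from real Search Console queries as described above:

```python
def build_prompt_library(brand, competitors, category):
    """Generate a starter prompt set covering the categories above.
    Templates are illustrative; extend with conversational variants."""
    return {
        "branded_informational": [f"what does {brand} do?"],
        "branded_comparison": [f"{brand} vs {c}" for c in competitors],
        "category": [
            f"best {category} tools",
            f"I'm looking for a {category} tool for a remote team under 20 people",
        ],
        "high_intent": [f"which {category} tool should I buy?"],
    }

library = build_prompt_library("Acme", ["Contoso", "Fabrikam"], "project management")
for group, prompts in library.items():
    print(group, prompts)
```

Treat the output as a seed list to prune and enrich by hand, not a finished library.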
What to capture per prompt
For each prompt, record:
- Whether your brand was mentioned (yes/no)
- Whether your domain was cited as a source (yes/no)
- Citation position if cited
- Which competitor brands and domains appeared
- The overall sentiment of how you were described
- The timestamp, geographic location, and language setting
This structured data is what lets you calculate meaningful metrics over time.
Recommended cadence
Run your highest-priority prompts — typically 10–20 core queries — daily. Expand to a broader set of 50–100 prompts weekly.
Run an extended long-tail sweep monthly to catch emerging topics or shifting patterns. Daily runs give you trend data; weekly and monthly runs give you coverage and discovery.
The core metrics to track
Once you have prompt tracking data flowing alongside your first-party analytics, these are the 10 metrics that give you the clearest picture of AI visibility performance.
| Metric | What it measures | Layer |
|---|---|---|
| Prompt coverage | % of tracked prompts where your brand appears in the response | Presence |
| Citation coverage | % of tracked prompts where your domain or page is cited as a source | Citation |
| Owned source rate | % of prompts where your own domain (not a third-party site) is the cited source | Citation |
| Average citation position | Where you appear in the source list when you are cited | Citation |
| Competitor citation gap | Prompts where a competitor is cited and you’re not — your opportunity backlog | Presence / Citation |
| AI referral sessions | Actual visits from AI platforms, particularly ChatGPT-tagged sessions | Traffic |
| AI referral conversion rate | How AI-referred visitors convert compared to other channels | Business impact |
| Top cited pages | Which pages on your site power your AI visibility | Citation / Traffic |
| Source opportunity rate | External sources cited for competitors’ prompts but not for yours | Citation |
| Technical eligibility score | Are crawlers blocked? Is structured data valid? Are noindex directives limiting your content? | Foundation |
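The two headline metrics, prompt coverage and citation coverage, fall straight out of the tracking records. A minimal sketch assuming each record is a dict with `brand_mentioned` and `domain_cited` booleans:

```python
def coverage_metrics(results):
    """Compute prompt coverage and citation coverage (as percentages
    of the tracked prompt set) from a list of result records."""
    total = len(results)
    if total == 0:
        return {"prompt_coverage": 0.0, "citation_coverage": 0.0}
    mentioned = sum(1 for r in results if r["brand_mentioned"])
    cited = sum(1 for r in results if r["domain_cited"])
    return {
        "prompt_coverage": round(100 * mentioned / total, 1),
        "citation_coverage": round(100 * cited / total, 1),
    }

results = [
    {"brand_mentioned": True,  "domain_cited": True},
    {"brand_mentioned": True,  "domain_cited": False},
    {"brand_mentioned": False, "domain_cited": False},
    {"brand_mentioned": True,  "domain_cited": False},
]
print(coverage_metrics(results))
```

A gap between the two numbers is exactly the presence-without-citation pattern warned about earlier: the AI talks about you but never sends anyone your way.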
Tools for tracking AI visibility
The tooling market has matured rapidly, but no tool gives you ground truth about AI visibility. They all work with proxies, models, or snapshots. Understanding what each category actually does — and can’t do — is essential.
Free / DIY first-party approach
Before spending on any commercial tool, set up the basics. GA4 gives you ChatGPT referral tracking. Google Search Console shows performance trends that may reflect AI feature impact.
Server logs reveal which AI crawlers are accessing your site. And manual prompt testing — asking ChatGPT, Gemini, and Claude your key queries and recording results — gives you the most honest snapshot of what users actually see.
This baseline is mandatory regardless of budget.
Continue reading: How to Evaluate an AI Visibility Provider Before Signing
Semrush AI Visibility Toolkit
Semrush’s AI Visibility Toolkit is the most comprehensive option for most SEO and marketing teams. It covers ChatGPT, Gemini, Google AI Overviews, AI Mode, SearchGPT, and Perplexity, drawing from a database of 239M+ prompts and responses.
The key features include prompt tracking for monitoring specific queries, cited source and page analysis, competitor benchmarking, share of voice reporting, sentiment tracking, and a site audit that checks AI-specific technical issues.
It’s part of the Semrush One subscription, which makes it accessible for teams already using the platform.
That said, Semrush’s data is modeled and directional, not real-time observation. It tells you what’s likely happening based on a large sample, not what’s definitely happening in any specific user’s session. Think of it as market intelligence, not a live monitoring feed.
Best for: SEO teams, agencies, and SMB to mid-market companies who want a practical all-in-one stack.
DataForSEO LLM Mentions API
DataForSEO’s LLM Mentions API takes a fundamentally different approach — it’s a developer-friendly API, not a turnkey dashboard. You get access to endpoints for search mentions, aggregated metrics, cross-aggregated competitor benchmarking, and top domains/pages analysis across multiple AI models.
The pricing model is $100/month minimum commitment, but it’s non-subscription — your balance stays and can be used across all DataForSEO APIs, making the per-query cost very low. This makes it attractive for technical teams who want raw data to feed into their own dashboards and reporting systems.
One important distinction: the LLM Mentions API is database-driven, similar to Semrush’s model. If you need real-time tracking of ChatGPT responses with web search enabled, DataForSEO’s separate LLM Scraper API is the relevant product.
Best for: technical teams and developers who want raw data and API access to build custom monitoring.
Scrunch
Scrunch positions itself as enterprise-grade prompt and citation monitoring with full raw data access. It uses a combination of browser automation, platform APIs, and raw response capture to cover eight platforms: ChatGPT, Claude, Gemini, Perplexity, Google AI Mode, Google AI Overviews, Meta AI, and Copilot.
The standout features are its Query API and Responses API for raw AI responses — the actual text the AI generated, not summaries or scores. It also offers Agent Traffic monitoring (bot activity), AI Referrals tracking (human visitors), persona-based prompt filtering, and SOC 2 Type II compliance.
Best for: enterprise teams, agencies managing multiple brands, and teams that need API-level access to raw response data.
Peec AI
Peec AI uses UI scraping and browser automation to mirror what real users actually see in AI interfaces — as opposed to database-modeled approaches. It covers ChatGPT, Gemini, Copilot, Perplexity, and Claude, tracking visibility, position, sentiment, source analysis, and competitor presence.
What sets Peec apart is its focus on prompt-level observability. You can track specific prompts over time and see exactly how AI responses change. It also offers a Looker connector and enterprise API for teams that need to integrate the data into existing reporting workflows.
Best for: modern SEO teams running dedicated GEO programs, mid-market agencies, and teams who prioritize seeing what the AI interface actually shows users.
Otterly
Otterly is a lightweight, practical monitoring tool covering ChatGPT, Perplexity, Google AI Overviews, and AI Mode. It focuses on the essentials: prompt monitoring, brand mention tracking, website citation tracking, domain rankings, and weekly reporting.
It’s not trying to be a full strategic platform — it’s a monitoring layer that tells you whether you’re showing up, how often, and whether that’s changing. The simplicity is the selling point for teams that don’t need enterprise-grade infrastructure.
Best for: SMBs, consultants, agencies, and teams in the early stages of building a GEO program.
Goodie
Goodie takes a different angle entirely, focusing on brand narrative and reputation monitoring across LLMs. Rather than tracking citations and traffic, it’s designed to answer the question: “How do AI assistants describe our brand?”
Key features include sentiment analysis, visibility trend tracking, competitor benchmarking, cross-LLM scanning, and country/language filtering. It’s less about driving clicks and more about understanding and shaping the story AI tells about you.
Best for: brand and reputation teams, PR/comms teams, and marketing leadership who care about narrative control.
Rankscale
Rankscale is a lightweight monitoring option that covers rank tracking, mention and citation monitoring, engine-specific tracking, competitive visibility insights, and API access. It’s positioned for teams that want basic AI visibility monitoring without enterprise overhead or pricing.
Documentation and public methodology are thinner than larger competitors, but for scrappy teams that need a quick-start option, it’s worth considering.
Best for: smaller teams, tighter budgets, and quick monitoring needs.
Tools at a glance
| Tool | Platforms covered | Data approach | Pricing tier | Best for |
|---|---|---|---|---|
| GA4 + Search Console | ChatGPT (via UTM), Google | First-party, real | Free | Everyone — mandatory baseline |
| Semrush AI Toolkit | ChatGPT, Gemini, AI Overviews, Perplexity, SearchGPT | Modeled database (239M+ prompts) | Semrush One subscription | SEO teams, agencies, SMB–mid-market |
| DataForSEO LLM API | Multiple LLMs | Database + scraper API options | $100/mo minimum (non-subscription) | Technical teams, developers |
| Scrunch | 8 platforms incl. Meta AI, Copilot | Browser automation + raw response capture | Enterprise | Enterprise teams, multi-brand agencies |
| Peec AI | ChatGPT, Gemini, Copilot, Perplexity, Claude | UI scraping (mirrors real user view) | Mid-market | GEO-focused SEO teams |
| Otterly | ChatGPT, Perplexity, AI Overviews, AI Mode | Automated monitoring | SMB-friendly | Early-stage GEO programs, consultants |
| Goodie | Multiple LLMs | Cross-LLM scanning | Mid-market | PR/brand/comms teams |
| Rankscale | Multiple LLMs | Automated monitoring | Budget-friendly | Small teams, tight budgets |
Recommended stack by budget
| Budget level | Stack | What you get |
|---|---|---|
| Low-cost (free) | GA4 + Search Console + spreadsheet for manual prompt checks + server log analysis | Real first-party data — more honest than any commercial tool. Don’t skip this baseline. |
| Mid-tier | Everything above + Semrush AI Toolkit (if already a Semrush user) or Peec AI | First-party truth plus directional intelligence from a larger dataset |
| Serious / enterprise | Custom prompt runner + DataForSEO API + Scrunch or Semrush for benchmarking + Peec for prompt-level observability | Full BI dashboard pulling from multiple sources |
Best combination for serious GEO programs: Semrush for the market map and competitor benchmarking, paired with Peec or Scrunch for prompt-level observability. Semrush tells you the landscape; Peec or Scrunch tells you exactly what’s happening for your specific prompts.
FAQ
Can I track AI visibility for free?
Yes — and you should, regardless of whether you also use paid tools. GA4 tracks ChatGPT referral traffic automatically. Google Search Console shows performance trends that reflect AI feature impact. Server logs reveal AI crawler activity. And manual prompt testing is free beyond your time. This first-party baseline is the most reliable data you’ll have.
What is the most important metric to track?
It depends on your goal. For awareness, prompt coverage matters most. For traffic, citation coverage and AI referral sessions are your priority. For ROI, AI referral conversion rate is the focus. Start with citation coverage — it’s the clearest leading indicator of whether AI visibility will translate into business value.
How often should I run AI visibility checks?
Run your top 10–20 priority prompts daily to build trend data. Expand to 50–100 prompts weekly for broader coverage. Do a full long-tail sweep monthly. For first-party analytics (GA4, Search Console), check weekly at minimum and set up automated reports or alerts for significant changes.
Is utm_source=chatgpt.com available for all ChatGPT users?
The parameter is appended when ChatGPT users click cited sources in search-grounded responses. It works across ChatGPT’s web interface and apps, but only applies when ChatGPT uses its web search feature and cites sources — not for every conversational mention. See OpenAI’s publisher FAQ for details.
What is the best AI visibility tool for a small business?
Start with the free first-party approach: GA4, Search Console, and manual prompt checks. If you want to add a paid tool, Otterly is the most lightweight and practical option for small teams.
If you’re already using Semrush for SEO, its AI Visibility Toolkit adds significant value without requiring another subscription.
If you're comfortable with code, DataForSEO can be another excellent low-cost option.
Avoid jumping straight to enterprise-grade tools — you’ll get more out of your investment once you have baseline data and know which metrics actually move your business.