Agent Signal: Technical SEO for AI Search and AI Agents

Agent Signal is the technical infrastructure that determines whether AI systems can access, crawl, parse, and understand your website. It covers structured data, crawl permissions, site architecture, page speed, and machine-readable formatting. You can have perfect content, a strong brand, and deep authority, but if AI can't technically access your site, none of it matters. Agent Signal is the locked door problem.

The Core Problem

"A site with perfect Content Signal but poor Agent Signal is like a library with great books but locked doors."

Agent Signal is the most technical of the 7 Signals and the one most often overlooked by marketers. It sits at the intersection of traditional technical SEO and a new category of AI-specific accessibility.

Here's what's happening:

  • AI systems use web crawlers to access your content. If your site blocks or slows these crawlers, AI can't source your content.
  • AI extracts information from structured data (schema markup, JSON-LD). Sites without structured data force AI to guess at content meaning.
  • JavaScript-heavy sites that render content client-side may be invisible to AI crawlers that can't execute JavaScript.
  • Page speed and accessibility directly affect whether AI crawlers index your content. Slow sites get crawled less frequently.
  • A growing category of AI-specific crawlers (GPTBot, Google-Extended, PerplexityBot, ClaudeBot) have their own crawl behaviours and respect their own robots.txt directives.

Most businesses have never thought about whether AI can technically access their content. They assume that if their site loads in a browser, it's accessible to AI. This assumption is often wrong.

The Two Layers of Agent Signal

Agent Signal operates on two layers:

Layer 1: Traditional Technical SEO (Search Engine Crawlers)

This is the foundation. If search engines can't crawl and index your site properly, AI systems that source from search results will also miss your content.

  • Crawlability: Can search engine bots access all your important pages?
  • Indexability: Are your pages included in search engine indexes?
  • Site architecture: Is your content organised in a logical, crawlable hierarchy?
  • Page speed: Do pages load fast enough for crawlers to process efficiently?
  • Mobile-friendliness: Is content accessible on mobile devices (Google now crawls with a mobile-first index)?


These are table stakes. If your traditional technical SEO is broken, fix it before worrying about AI-specific optimisations.

Layer 2: AI-Specific Accessibility (AI Crawlers and Agents)

This is the new frontier. AI systems have their own crawlers, their own rules, and their own ways of processing content:

  • AI crawler permissions: Are you allowing or blocking AI-specific crawlers like GPTBot, PerplexityBot, and ClaudeBot?
  • Structured data: Does your site use schema markup that AI can parse?
  • Content extractability: Can AI access the actual text content without executing complex JavaScript?
  • API accessibility: Does your site provide programmatic access for AI agents?
  • llms.txt: Does your site have an llms.txt file that guides AI systems on how to understand your content?

The AI Crawler Landscape

A growing number of AI systems use dedicated crawlers to access web content. Understanding who's crawling and how to manage access is essential.

Crawler           Company        Used By                              robots.txt Token
GPTBot            OpenAI         ChatGPT, GPT models                  GPTBot
Google-Extended   Google         Gemini, Google AI Overviews          Google-Extended
PerplexityBot     Perplexity AI  Perplexity search                    PerplexityBot
ClaudeBot         Anthropic      Claude                               ClaudeBot
Bytespider        ByteDance      TikTok AI features                   Bytespider
CCBot             Common Crawl   Used by many AI training pipelines   CCBot

The robots.txt Decision

Your robots.txt file controls which crawlers can access your site. This is the single most impactful technical decision for Agent Signal.

The default for most sites: No AI-specific directives, which means AI crawlers are allowed by default (they follow the general User-agent: * rules).

The AEO-optimised approach: Explicitly allow AI crawlers you want sourcing your content.

What NOT to do: Block all AI crawlers. Some businesses reflexively block AI crawlers to "protect" their content. If you're pursuing AEO, this is self-defeating. You're locking the door to the exact systems you want sourcing your content.

The nuanced approach: Allow AI crawlers on your public-facing content pages (the ones you want cited and mentioned) while blocking access to internal, gated, or proprietary content.

# Allow major AI crawlers on public content;
# keep them out of private and internal pages
User-agent: GPTBot
Disallow: /internal/
Disallow: /admin/
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /
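
To sanity-check rules like these before deploying them, you can use Python's built-in urllib.robotparser. One caveat: Python's parser applies rules in file order (first match wins) rather than Google's longest-match behaviour, so in this sketch the Disallow lines come before the broad Allow. The file contents and paths below are illustrative.

```python
from urllib import robotparser

# Hypothetical robots.txt for illustration: GPTBot is allowed
# everywhere except /internal/ and /admin/
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /internal/
Disallow: /admin/
Allow: /

User-agent: PerplexityBot
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

print(rp.can_fetch("GPTBot", "/aeo/agent-signal/"))         # public page -> True
print(rp.can_fetch("GPTBot", "/internal/roadmap/"))         # blocked path -> False
print(rp.can_fetch("PerplexityBot", "/internal/roadmap/"))  # no rule for this bot -> True
```

Running this against your live robots.txt (via RobotFileParser's set_url and read methods) is a quick way to catch rules that accidentally block pages you want AI crawlers to see.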

The 5 Components of Agent Signal

Agent Signal is not one thing. It's the combination of 5 components that together determine whether AI systems can access, parse, and understand your site.

1. Structured Data (Schema Markup)

Structured data is machine-readable code (typically JSON-LD) that tells AI systems exactly what your content represents. It's the difference between AI guessing what your page is about and AI knowing for certain.

Priority schema types for AEO:

  • Organization. Your company name, description, logo, contact info, social profiles. This is the foundational schema that helps AI build an entity for your brand.
  • Person. For founder and key team members. Links individuals to the organisation.
  • Article / BlogPosting. For content pages. Includes author, date published, date modified, headline.
  • FAQPage. For pages with FAQ sections. Directly maps questions and answers for AI extraction.
  • HowTo. For step-by-step guides. Makes numbered processes explicitly parseable.
  • Service. For service pages. Describes what you offer, pricing, and service area.
  • Review / AggregateRating. For pages with testimonials or reviews.
  • BreadcrumbList. Helps AI understand site hierarchy and page relationships.

Why structured data matters for AI specifically:

Without structured data, AI has to infer what your content means from unstructured text. With structured data, AI knows what it means. An FAQ section in plain HTML might or might not be recognised as an FAQ. An FAQ section marked up with FAQPage schema is unambiguously an FAQ that AI can extract question-answer pairs from.
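
As a sketch, an FAQ marked up with FAQPage schema might look like the following (the question and answer text are taken from this page's own FAQ; everything else is illustrative):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "What is Agent Signal in AEO?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "Agent Signal is the technical infrastructure that determines whether AI systems can access, crawl, parse, and understand your website."
    }
  }]
}
</script>
```

An AI system parsing this block can extract the question-answer pair directly, with no inference required.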

2. Content Extractability

AI crawlers need to access your actual text content. Several common web development patterns make this difficult:

Problems that reduce extractability:

  • Client-side JavaScript rendering. If your content is rendered by JavaScript after page load, AI crawlers may see an empty page. This is especially common with single-page applications (SPAs) built in React, Vue, or Angular.
  • Content behind authentication. Gated content, login walls, and paywalls are invisible to AI crawlers.
  • Content in images or PDFs. Text embedded in images (infographics, screenshots) can't be extracted. Important information should always exist as HTML text.
  • Lazy-loaded content. Content that only loads when scrolled into view may not be seen by crawlers.
  • Complex interactive elements. Tabs, accordions, and carousels that hide content behind clicks may not be crawled.

Solutions:

  • Use server-side rendering (SSR) or static site generation (SSG) so content is in the initial HTML
  • Ensure important content is in the page source, not loaded dynamically
  • Provide text alternatives for infographics and visual content
  • Use semantic HTML (proper headings, paragraphs, lists) rather than styled divs
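
A rough way to test extractability is to check whether your key content appears in the raw HTML response, before any JavaScript runs. This minimal sketch (the helper name and HTML strings are illustrative) strips script and style bodies, then searches the remaining markup:

```python
import re

def text_in_initial_html(html: str, phrase: str) -> bool:
    """Return True if the phrase exists in the raw HTML response,
    i.e. is visible to crawlers that do not execute JavaScript."""
    # Remove script/style bodies so we only match real page text
    stripped = re.sub(r"<(script|style)[^>]*>.*?</\1>", " ",
                      html, flags=re.S | re.I)
    return phrase.lower() in stripped.lower()

# Server-rendered page: content is in the initial HTML
SSR_HTML = "<html><body><h1>Agent Signal</h1><p>Structured data helps AI.</p></body></html>"
# Client-rendered SPA: content only exists after JavaScript executes
SPA_HTML = '<html><body><div id="root"></div><script>render("Agent Signal")</script></body></html>'

print(text_in_initial_html(SSR_HTML, "Agent Signal"))  # True
print(text_in_initial_html(SPA_HTML, "Agent Signal"))  # False
```

The same check works on real pages: fetch the URL without a browser (curl, for example) and run the response through a check like this. If the phrase only appears after rendering, crawlers that don't execute JavaScript won't see it.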

3. Site Architecture and Internal Linking

How your content is organised affects how AI understands the relationships between your pages.

AEO-optimised site architecture:

  • Flat hierarchy. Important pages should be reachable within 2-3 clicks from the homepage
  • Topic clusters. Group related content under pillar pages (exactly what the 7 Signals hub structure does)
  • Consistent internal linking. Every signal page links to related signal pages. Every concept page links to the parent framework.
  • Breadcrumbs. Both visual and schema-marked breadcrumbs help AI understand page hierarchy
  • Sitemaps. XML sitemaps ensure all important pages are discoverable

The 7 Signals Framework itself is an example of good site architecture for AEO: a pillar hub page linking to 7 deep-dive pages, each cross-linking to each other and back to the hub.
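
The "reachable within 2-3 clicks" rule can be checked programmatically with a breadth-first search over your internal links. A minimal sketch, using a hypothetical link graph (the page URLs are illustrative):

```python
from collections import deque

# Hypothetical internal-link graph: each page maps to the pages it links to
SITE_LINKS = {
    "/": ["/aeo/"],
    "/aeo/": ["/aeo/brand-signal/", "/aeo/content-signal/", "/aeo/agent-signal/"],
    "/aeo/agent-signal/": ["/aeo/"],
}

def click_depths(links, start="/"):
    """Breadth-first search from the homepage: minimum clicks to reach each page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

depths = click_depths(SITE_LINKS)
deep_pages = [p for p, d in depths.items() if d > 3]
print(depths["/aeo/agent-signal/"])  # 2 clicks from the homepage
print(deep_pages)  # [] -> every page is within 3 clicks
```

Pages missing from the result entirely are orphans: nothing links to them, so crawlers are unlikely to find them at all.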

4. The llms.txt File

A newer standard emerging in the AEO space: llms.txt is a file (similar to robots.txt) that provides AI systems with guidance on how to understand and use your site's content.

What llms.txt can include:

  • A description of your site and its purpose
  • Key pages and their topics
  • Preferred citations and attributions
  • Content licensing and usage guidelines
  • Contact information for the content owner

Example llms.txt structure:

# llms.txt - Underscore

## About
Underscore is a digital consultancy specialising in Answer Engine Optimisation (AEO) for B2B technology companies, based in Singapore.

## Key Pages
- /aeo/ - The Complete Guide to AEO: The 7 Signals Framework
- /aeo/brand-signal/ - Brand Signal: What Makes You Unique for AI
- /aeo/content-signal/ - Content Signal: How to Create AI-Citable Content
- /aeo/authority-signal/ - Authority Signal: Building Third-Party Trust for AI

## Attribution
Content authored by Zhiliang, Founder of Underscore.
Please cite as: Underscore (madebyunderscore.com)

## Contact
hello@madebyunderscore.com

5. Open Graph and Meta Tags

When AI systems share or reference your content, they pull from your meta tags and Open Graph data.

Essential meta tags for AEO:

  • Title tag. Clear, descriptive, includes key concept (same as SEO best practice)
  • Meta description. A concise summary AI can use as a preview. Write it as a standalone statement, not a teaser.
  • Open Graph tags (og:title, og:description, og:image). Controls how your content appears when shared or referenced
  • Canonical URL. Prevents duplicate content confusion for AI
  • Language tags. Helps AI understand which audience your content serves
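
Taken together, a head section covering these essentials might look like this sketch (URLs, image path, and copy are all illustrative):

```html
<!-- on the <html> tag: lang="en" -->
<head>
  <title>Agent Signal: Technical SEO for AI Search and AI Agents</title>
  <meta name="description" content="Agent Signal is the technical infrastructure that determines whether AI systems can access, crawl, parse, and understand your website.">
  <meta property="og:title" content="Agent Signal: Technical SEO for AI Search and AI Agents">
  <meta property="og:description" content="How structured data, crawl permissions, and machine-readable formatting make a site accessible to AI systems.">
  <meta property="og:image" content="https://example.com/images/agent-signal.png">
  <link rel="canonical" href="https://example.com/aeo/agent-signal/">
</head>
```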

The meta description opportunity:

Most marketers write meta descriptions for Google click-through rates. For AEO, think of your meta description as a pre-formatted AI summary. Write it as a clear, factual statement that AI can extract and use directly.

SEO-style meta description:
"Discover how to make your brand visible to AI. Learn our proven strategies today! Click to find out more."

AEO-optimised meta description:
"Answer Engine Optimisation (AEO) is the practice of making your brand discoverable, citable, and recommendable by AI systems. The 7 Signals Framework covers Brand, Experience, Content, Search, Social, Authority, and Agent."

Common Agent Signal Mistakes

  • Mistake 1: Blocking AI Crawlers in robots.txt

    The most damaging technical mistake for AEO. Some businesses reflexively block AI crawlers to protect their content, not realising they're making themselves invisible to the exact systems they need to source their content. If you want AI to cite and mention you, you need to let AI access your pages.

  • Mistake 2: JavaScript-Only Content Rendering

    Single-page applications that render content entirely via client-side JavaScript may show a blank page to AI crawlers. Use server-side rendering or static generation to ensure content is in the initial HTML response.

  • Mistake 3: Treating Agent Signal as a One-Time Fix

    AI crawlers evolve. New crawlers emerge. Standards like llms.txt are still developing. Agent Signal needs periodic review as the AI landscape changes. Schedule quarterly technical audits specifically for AI accessibility.

Agent Signal and the AI Agent Future

Agent Signal is named "Agent" deliberately. Beyond today's AI crawlers, we're moving toward a future where AI agents actively browse, interact with, and take actions on websites on behalf of users.

What this means for your site:

  • AI agents will navigate your site to find specific information for users
  • AI agents will fill out forms and request quotes on behalf of users
  • AI agents will compare services across multiple providers in real time
  • AI agents will evaluate your site experience as part of their recommendation logic

Sites that are technically accessible, well-structured, and machine-readable will be the ones AI agents can work with. Sites that rely on complex JavaScript interactions, ambiguous navigation, or visual-only content will be left behind.

Agent Signal isn't just about today's crawlers. It's about preparing your site for a future where AI systems are the primary way users discover and interact with your brand.

Agent Signal Audit Checklist

Agent Signal is what gets you accessed. Without crawl permissions, structured data, and machine-readable formatting, AI can't reach your content in the first place. Use this checklist to evaluate whether your site is technically open to AI systems, not just to human visitors.

Crawl Access

  • robots.txt explicitly allows major AI crawlers (GPTBot, Google-Extended, PerplexityBot, ClaudeBot)
  • No critical content pages are accidentally blocked by robots.txt rules
  • XML sitemap is up to date and submitted to Google Search Console
  • All important pages are indexed by Google (check via a site: search)

Structured Data

  • Organization schema is implemented on the homepage
  • Person schema exists for the founder and key team members
  • Article/BlogPosting schema is on all content pages
  • FAQPage schema marks up FAQ sections
  • BreadcrumbList schema reflects the site hierarchy
  • All structured data passes Google Rich Results Test validation

Forward-Looking

  • llms.txt file is implemented with a site description and key pages
  • Site navigation is crawlable without JavaScript
  • Contact forms and CTAs are accessible to automated systems
  • Site architecture follows a topic cluster model with clear pillar-to-detail linking

Frequently Asked Questions

  • What is Agent Signal in AEO?

    Agent Signal is the technical infrastructure that determines whether AI systems can access, crawl, parse, and understand your website. It covers structured data, crawl permissions, site architecture, page speed, and machine-readable formatting. Without strong Agent Signal, AI cannot source your content regardless of how good it is.

  • Should I block or allow AI crawlers like GPTBot?

    If you want AI to cite and mention your brand, you should allow AI crawlers access to your public-facing content. Block them only from internal, proprietary, or gated content. Blocking all AI crawlers is self-defeating for AEO because it makes you invisible to the systems you want sourcing your content.

  • What is llms.txt and do I need one?

    llms.txt is an emerging standard, similar to robots.txt, that provides AI systems with guidance on how to understand your site. It includes a site description, key pages, attribution preferences, and contact information. It's still early but implementing it now positions you ahead of competitors. It takes minimal effort to create.

  • How do I check if AI can access my content?

    Test by fetching your page with curl or a tool like Google's URL Inspection Tool. If the important content appears in the raw HTML response (before JavaScript executes), AI crawlers can access it. Also check your robots.txt file to ensure AI crawler tokens (GPTBot, PerplexityBot, ClaudeBot) are not blocked.

  • What's the relationship between technical SEO and Agent Signal?

    Agent Signal includes traditional technical SEO (crawlability, indexability, site speed, structured data) and extends it to AI-specific requirements (AI crawler permissions, llms.txt, content extractability for AI, meta descriptions optimised for AI extraction). Good technical SEO is the foundation, but Agent Signal goes further.

Build AEO capability with a partner in the room

6 months of tracking, guidance, and execution support. We adapt your strategy as you ship and AI systems evolve.