Agent Signal: Technical SEO for AI Search and AI Agents
Agent Signal is the technical infrastructure that determines whether AI systems can access, crawl, parse, and understand your website. It covers structured data, crawl permissions, site architecture, page speed, and machine-readable formatting. You can have perfect content, a strong brand, and deep authority, but if AI can't technically access your site, none of it matters. Agent Signal is the locked door problem.

"A site with perfect Content Signal but poor Agent Signal is like a library with great books but locked doors."
Agent Signal is the most technical of the 7 Signals and the one most often overlooked by marketers. It sits at the intersection of traditional technical SEO and a new category of AI-specific accessibility.
Here's what's happening:
- AI systems use web crawlers to access your content. If your site blocks or slows these crawlers, AI can't source your content.
- AI extracts information from structured data (schema markup, JSON-LD). Sites without structured data force AI to guess at content meaning.
- JavaScript-heavy sites that render content client-side may be invisible to AI crawlers that can't execute JavaScript.
- Page speed and accessibility directly affect whether AI crawlers index your content. Slow sites get crawled less frequently.
- A growing category of AI-specific crawlers (GPTBot, Google-Extended, PerplexityBot, ClaudeBot) has emerged, each with its own crawl behaviour and its own robots.txt directives to respect.
Most businesses have never thought about whether AI can technically access their content. They assume that if their site loads in a browser, it's accessible to AI. This assumption is often wrong.
The Two Layers of Agent Signal
Agent Signal operates on two layers:
Layer 1: Traditional Technical SEO (Search Engine Crawlers)
This is the foundation. If search engines can't crawl and index your site properly, AI systems that source from search results will also miss your content.
- Crawlability: Can search engine bots access all your important pages?
- Indexability: Are your pages included in search engine indexes?
- Site architecture: Is your content organised in a logical, crawlable hierarchy?
- Page speed: Do pages load fast enough for crawlers to process efficiently?
- Mobile-friendliness: Is content accessible on mobile devices (increasingly used as the primary crawl target)?
These are table stakes. If your traditional technical SEO is broken, fix it before worrying about AI-specific optimisations.
Layer 2: AI-Specific Accessibility (AI Crawlers and Agents)
This is the new frontier. AI systems have their own crawlers, their own rules, and their own ways of processing content:
- AI crawler permissions: Are you allowing or blocking AI-specific crawlers like GPTBot, PerplexityBot, and ClaudeBot?
- Structured data: Does your site use schema markup that AI can parse?
- Content extractability: Can AI access the actual text content without executing complex JavaScript?
- API accessibility: Does your site provide programmatic access for AI agents?
- llms.txt: Does your site have an llms.txt file that guides AI systems on how to understand your content?
The AI Crawler Landscape
A growing number of AI systems use dedicated crawlers to access web content. Understanding who's crawling and how to manage access is essential.
The robots.txt Decision
Your robots.txt file controls which crawlers can access your site. This is the single most impactful technical decision for Agent Signal.
The default for most sites: No AI-specific directives, which means AI crawlers are allowed by default (they follow the general User-agent: * rules).
The AEO-optimised approach: Explicitly allow AI crawlers you want sourcing your content.
What NOT to do: Block all AI crawlers. Some businesses reflexively block AI crawlers to "protect" their content. If you're pursuing AEO, this is self-defeating. You're locking the door to the exact systems you want sourcing your content.
The nuanced approach: Allow AI crawlers on your public-facing content pages (the ones you want cited and mentioned) while blocking access to internal, gated, or proprietary content.
# Allow all major AI crawlers
User-agent: GPTBot
Allow: /
User-agent: Google-Extended
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: ClaudeBot
Allow: /
# Block AI crawlers from private/internal pages
User-agent: GPTBot
Disallow: /internal/
Disallow: /admin/
The 5 Components of Agent Signal
Agent Signal is not one thing. It's the combination of 5 components that together determine whether AI systems can access, parse, and understand your site.
1. Structured Data (Schema Markup)
Structured data is machine-readable code (typically JSON-LD) that tells AI systems exactly what your content represents. It's the difference between AI guessing what your page is about and AI knowing for certain.
Priority schema types for AEO:
- Organization. Your company name, description, logo, contact info, social profiles. This is the foundational schema that helps AI build an entity for your brand.
- Person. For founder and key team members. Links individuals to the organisation.
- Article / BlogPosting. For content pages. Includes author, date published, date modified, headline.
- FAQPage. For pages with FAQ sections. Directly maps questions and answers for AI extraction.
- HowTo. For step-by-step guides. Makes numbered processes explicitly parseable.
- Service. For service pages. Describes what you offer, pricing, and service area.
- Review / AggregateRating. For pages with testimonials or reviews.
- BreadcrumbList. Helps AI understand site hierarchy and page relationships.
Why structured data matters for AI specifically:
Without structured data, AI has to infer what your content means from unstructured text. With structured data, AI knows what it means. An FAQ section in plain HTML might or might not be recognised as an FAQ. An FAQ section marked up with FAQPage schema is unambiguously an FAQ that AI can extract question-answer pairs from.
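As an illustration, here is what FAQPage markup might look like for one of the questions answered later on this page, embedded in a script tag with type "application/ld+json". This is a minimal sketch; real markup would list every question-answer pair on the page:

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is Agent Signal in AEO?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Agent Signal is the technical infrastructure that determines whether AI systems can access, crawl, parse, and understand your website."
      }
    }
  ]
}
```

With this in place, the question-answer pair is unambiguous to any parser, with no inference required.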
2. Content Extractability
AI crawlers need to access your actual text content. Several common web development patterns make this difficult:
Problems that reduce extractability:
- Client-side JavaScript rendering. If your content is rendered by JavaScript after page load, AI crawlers may see an empty page. This is especially common with single-page applications (SPAs) built in React, Vue, or Angular.
- Content behind authentication. Gated content, login walls, and paywalls are invisible to AI crawlers.
- Content in images or PDFs. Text embedded in images (infographics, screenshots) can't be extracted. Important information should always exist as HTML text.
- Lazy-loaded content. Content that only loads when scrolled into view may not be seen by crawlers.
- Complex interactive elements. Tabs, accordions, and carousels that hide content behind clicks may not be crawled.
Solutions:
- Use server-side rendering (SSR) or static site generation (SSG) so content is in the initial HTML
- Ensure important content is in the page source, not loaded dynamically
- Provide text alternatives for infographics and visual content
- Use semantic HTML (proper headings, paragraphs, lists) rather than styled divs
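To see the difference extractability makes, the sketch below uses Python's standard-library HTMLParser to mimic a crawler that reads raw HTML without executing JavaScript. The sample HTML is purely illustrative:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text the way a non-JS crawler would: from the
    raw HTML only, skipping anything inside <script> or <style>."""
    def __init__(self):
        super().__init__()
        self.in_skipped = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.in_skipped = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.in_skipped = False

    def handle_data(self, data):
        if not self.in_skipped and data.strip():
            self.chunks.append(data.strip())

# Server-rendered content sits in the initial HTML; JS-rendered content does not.
html = """
<h1>Agent Signal</h1>
<p>Visible to crawlers without JavaScript.</p>
<script>document.body.innerHTML += '<p>Only visible after JS runs.</p>';</script>
"""
parser = TextExtractor()
parser.feed(html)
print(parser.chunks)
# The JS-injected paragraph never appears: it exists only as script source.
```

The paragraph injected by the script never reaches the extractor, which is exactly what happens to client-side-rendered content when a crawler that doesn't execute JavaScript visits your page.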
3. Site Architecture and Internal Linking
How your content is organised affects how AI understands the relationships between your pages.
AEO-optimised site architecture:
- Flat hierarchy. Important pages should be reachable within 2-3 clicks from the homepage
- Topic clusters. Group related content under pillar pages (exactly what the 7 Signals hub structure does)
- Consistent internal linking. Every signal page links to related signal pages. Every concept page links to the parent framework.
- Breadcrumbs. Both visual and schema-marked breadcrumbs help AI understand page hierarchy
- Sitemaps. XML sitemaps ensure all important pages are discoverable
The 7 Signals Framework itself is an example of good site architecture for AEO: a pillar hub page linking to 7 deep-dive pages, each cross-linking to each other and back to the hub.
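For the breadcrumb piece specifically, BreadcrumbList markup might look like the sketch below. The /aeo/agent-signal/ path is an assumption that follows the key-page pattern used elsewhere on this site; substitute your own URLs:

```json
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    {"@type": "ListItem", "position": 1, "name": "AEO", "item": "https://madebyunderscore.com/aeo/"},
    {"@type": "ListItem", "position": 2, "name": "Agent Signal", "item": "https://madebyunderscore.com/aeo/agent-signal/"}
  ]
}
```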
4. The llms.txt File
llms.txt is a newer standard emerging in the AEO space: a file (similar to robots.txt) that provides AI systems with guidance on how to understand and use your site's content.
What llms.txt can include:
- A description of your site and its purpose
- Key pages and their topics
- Preferred citations and attributions
- Content licensing and usage guidelines
- Contact information for the content owner
Example llms.txt structure:
# llms.txt - Underscore
## About
Underscore is a digital consultancy specialising in Answer Engine Optimisation (AEO) for B2B technology companies, based in Singapore.
## Key Pages
- /aeo/ - The Complete Guide to AEO: The 7 Signals Framework
- /aeo/brand-signal/ - Brand Signal: What Makes You Unique for AI
- /aeo/content-signal/ - Content Signal: How to Create AI-Citable Content
- /aeo/authority-signal/ - Authority Signal: Building Third-Party Trust for AI
## Attribution
Content authored by Zhiliang, Founder of Underscore.
Please cite as: Underscore (madebyunderscore.com)
## Contact
hello@madebyunderscore.com
5. Open Graph and Meta Tags
When AI systems share or reference your content, they pull from your meta tags and Open Graph data.
Essential meta tags for AEO:
- Title tag. Clear, descriptive, includes key concept (same as SEO best practice)
- Meta description. A concise summary AI can use as a preview. Write it as a standalone statement, not a teaser.
- Open Graph tags (og:title, og:description, og:image). Controls how your content appears when shared or referenced
- Canonical URL. Prevents duplicate content confusion for AI
- Language tags. Helps AI understand which audience your content serves
The meta description opportunity:
Most marketers write meta descriptions for Google click-through rates. For AEO, think of your meta description as a pre-formatted AI summary. Write it as a clear, factual statement that AI can extract and use directly.
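Put together, the head of an AEO-optimised page might look like the sketch below. The URLs are placeholders; the title and description reuse this page's own wording:

```html
<head>
  <title>Agent Signal: Technical SEO for AI Search and AI Agents</title>
  <!-- A standalone, factual summary AI can lift directly -->
  <meta name="description" content="Agent Signal is the technical infrastructure that determines whether AI systems can access, crawl, parse, and understand your website.">
  <meta property="og:title" content="Agent Signal: Technical SEO for AI Search and AI Agents">
  <meta property="og:description" content="Agent Signal is the technical infrastructure that determines whether AI systems can access, crawl, parse, and understand your website.">
  <meta property="og:image" content="https://example.com/images/agent-signal-og.png">
  <!-- One canonical URL prevents duplicate-content confusion -->
  <link rel="canonical" href="https://example.com/aeo/agent-signal/">
</head>
```

The language tag lives on the html element itself, e.g. html lang="en".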
Common Agent Signal Mistakes
Mistake 1: Blocking AI Crawlers in robots.txt
The most damaging technical mistake for AEO. Some businesses reflexively block AI crawlers to protect their content, not realising they're making themselves invisible to the exact systems they need to source their content. If you want AI to cite and mention you, you need to let AI access your pages.
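You can sanity-check your rules before deploying them. The sketch below uses Python's standard-library robots.txt parser against rules mirroring the nuanced example earlier on this page. Note that Python's parser applies the first matching rule, so the more specific Disallow line is listed before the general Allow:

```python
from urllib.robotparser import RobotFileParser

# Rules mirroring the nuanced robots.txt example earlier on this page.
# Python's parser applies the first matching rule, hence the specific
# Disallow line comes before the general Allow.
rules = """User-agent: GPTBot
Disallow: /internal/
Allow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("GPTBot", "https://example.com/aeo/"))        # True: public content stays open
print(rp.can_fetch("GPTBot", "https://example.com/internal/x"))  # False: internal pages stay blocked
```

Running the same check for each AI crawler token you care about confirms that you are allowing the systems you want sourcing your content while protecting internal paths.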
Mistake 2: JavaScript-Only Content Rendering
Single-page applications that render content entirely via client-side JavaScript may show a blank page to AI crawlers. Use server-side rendering or static generation to ensure content is in the initial HTML response.
Mistake 3: Treating Agent Signal as a One-Time Fix
AI crawlers evolve. New crawlers emerge. Standards like llms.txt are still developing. Agent Signal needs periodic review as the AI landscape changes. Schedule quarterly technical audits specifically for AI accessibility.
Agent Signal and the AI Agent Future
Agent Signal is named "Agent" deliberately. Beyond today's AI crawlers, we're moving toward a future where AI agents actively browse, interact with, and take actions on websites on behalf of users.
What this means for your site:
Sites that are technically accessible, well-structured, and machine-readable will be the ones AI agents can work with. Sites that rely on complex JavaScript interactions, ambiguous navigation, or visual-only content will be left behind.
Agent Signal isn't just about today's crawlers. It's about preparing your site for a future where AI systems are the primary way users discover and interact with your brand.
Agent Signal Audit Checklist
Agent Signal is what gets you accessed. Without crawlable, structured, machine-readable infrastructure, AI never sees your content in the first place. Use this checklist to evaluate whether your site is built for AI access, not just human visitors.
Crawl Access
- robots.txt explicitly allows major AI crawlers (GPTBot, Google-Extended, PerplexityBot, ClaudeBot)
- No critical content pages are accidentally blocked by robots.txt rules
- XML sitemap is up to date and submitted to Google Search Console
- All important pages are indexed by Google (check via site: search)
Structured Data
- Organization schema is implemented on the homepage
- Person schema exists for founder and key team members
- Article/BlogPosting schema is on all content pages
- FAQPage schema marks up FAQ sections
- BreadcrumbList schema reflects site hierarchy
- All structured data passes Google Rich Results Test validation
Forward-Looking
- llms.txt file is implemented with site description and key pages
- Site navigation is crawlable without JavaScript
- Contact forms and CTAs are accessible to automated systems
- Site architecture follows a topic cluster model with clear pillar-to-detail linking
Frequently Asked Questions
What is Agent Signal in AEO?
Agent Signal is the technical infrastructure that determines whether AI systems can access, crawl, parse, and understand your website. It covers structured data, crawl permissions, site architecture, page speed, and machine-readable formatting. Without strong Agent Signal, AI cannot source your content regardless of how good it is.
Should I block or allow AI crawlers like GPTBot?
If you want AI to cite and mention your brand, you should allow AI crawlers access to your public-facing content. Block them only from internal, proprietary, or gated content. Blocking all AI crawlers is self-defeating for AEO because it makes you invisible to the systems you want sourcing your content.
What is llms.txt and do I need one?
llms.txt is an emerging standard, similar to robots.txt, that provides AI systems with guidance on how to understand your site. It includes a site description, key pages, attribution preferences, and contact information. It's still early but implementing it now positions you ahead of competitors. It takes minimal effort to create.
How do I check if AI can access my content?
Test by fetching your page with curl or a tool like Google's URL Inspection Tool. If the important content appears in the raw HTML response (before JavaScript executes), AI crawlers can access it. Also check your robots.txt file to ensure AI crawler tokens (GPTBot, PerplexityBot, ClaudeBot) are not blocked.
What's the relationship between technical SEO and Agent Signal?
Agent Signal includes traditional technical SEO (crawlability, indexability, site speed, structured data) and extends it to AI-specific requirements (AI crawler permissions, llms.txt, content extractability for AI, meta descriptions optimised for AI extraction). Good technical SEO is the foundation, but Agent Signal goes further.
Build AEO capability with a partner in the room
6 months of tracking, guidance, and execution support. We adapt your strategy as you ship and AI systems evolve.
