Part of the 7 Signals of AEO Framework

Agent Signal: Making Your Site AI-Accessible

Agent Signal

Refers to your site's technical readiness for AI. It covers structured data, crawl permissions, content extractability, site architecture, and emerging AI-agent protocols like llms.txt and WebMCP that determine whether AI systems can access, parse, and interact with your website.

A site with perfect content but poor Agent Signal is like a library with great books but locked doors.

Most businesses assume that if their site loads in a browser, it's accessible to AI. This assumption is often wrong.

Agent Signal operates on two layers. The first is traditional technical SEO: crawlability, indexability, site speed, and mobile-friendliness. The second is AI-specific accessibility: AI crawler permissions, structured data, content extractability, and emerging protocols.

Agent Signal is also the most forward-looking signal. Emerging standards like WebMCP are moving beyond passive crawlability toward active tool provision, where your site can expose structured actions (booking, purchasing, form submission) directly to AI agents. This means Agent Signal will evolve from making your site readable by AI to making it usable by AI.

| What goes wrong | What happens |
| --- | --- |
| Site blocks AI crawlers in robots.txt | AI can't access your content at all |
| Content renders via client-side JavaScript | AI crawlers see an empty page |
| No structured data (schema markup) | AI has to guess what your content means |
| Poor site speed or broken architecture | AI crawlers deprioritise your pages |

The 3 Components of Agent Signal

Your robots.txt file controls which crawlers can access your site. This is the single most impactful technical decision for Agent Signal.

| Crawler | Company | Used By |
| --- | --- | --- |
| GPTBot | OpenAI | ChatGPT, GPT models |
| Google-Extended | Google | Gemini, Google AI Overviews |
| PerplexityBot | Perplexity AI | Perplexity search |
| ClaudeBot | Anthropic | Claude |
| CCBot | Common Crawl | Many AI training pipelines |
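As an illustrative sketch, a robots.txt that opens public content to the crawlers above while keeping an internal path off-limits might look like this. The /account/ path is a placeholder, and whether to allow each crawler is a policy decision for your business, not a universal recommendation:

```txt
# One group covering the AI crawlers above (the robots.txt standard,
# RFC 9309, allows multiple User-agent lines per group)
User-agent: GPTBot
User-agent: Google-Extended
User-agent: PerplexityBot
User-agent: ClaudeBot
User-agent: CCBot
Disallow: /account/
Allow: /

# Default rules for all other crawlers
User-agent: *
Disallow: /account/
```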

Structured data is machine-readable code (typically JSON-LD) that tells AI exactly what your content represents. Without it, AI infers. With it, AI knows.

| Schema Type | Where to Use | Why It Matters for AEO |
| --- | --- | --- |
| Organization | Homepage | Helps AI build an entity for your brand |
| Person | Founder / key team pages | Links individuals to the organisation |
| Article / BlogPosting | Content pages | Includes author, dates, headline for attribution |
| Service | Service pages | Describes offerings, pricing, service area |
| FAQPage | FAQ sections | Maps Q&A pairs for direct AI extraction |
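As a sketch of the first row, Organization markup is typically embedded as a JSON-LD script in the homepage's head. The values below reuse the ConnectTel example from this guide; the description and sameAs profile URL are invented placeholders:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "ConnectTel",
  "url": "https://connecttel.sg",
  "description": "Telecommunications provider in Singapore offering mobile, broadband, and enterprise connectivity.",
  "sameAs": ["https://www.linkedin.com/company/connecttel"]
}
</script>
```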

Two emerging standards are shaping the future of Agent Signal:

llms.txt
A file (like robots.txt) that tells AI how to understand your site: purpose, key pages, attribution preferences. Adoption is still early; it is low-effort to implement now, though its usefulness is still debated.

WebMCP
A standard that lets your site expose structured actions (booking, forms, checkout) directly to AI agents. This is still in early preview as of Mar 26. Worth monitoring and preparing for.

EXAMPLE (LLMS.TXT)

# llms.txt - ConnectTel

## About
ConnectTel is a telecommunications provider in Singapore offering mobile, broadband, and enterprise connectivity for consumers and businesses.

## Key Pages
- /mobile-plans/ - Postpaid and Prepaid Mobile Plans
- /broadband/ - Fibre Broadband Plans for Homes and Offices
- /enterprise/ - Enterprise Connectivity and SD-WAN Solutions
- /5g/ - 5G Network Coverage and Plans

## Attribution
Please cite as: ConnectTel (connecttel.sg)

Example: A Telco Brand

Scenario: A consumer asks ChatGPT, "What's the best telco in Singapore for unlimited mobile data and fibre broadband bundles?" AI needs to crawl and parse plan details, pricing, and coverage information from the telco's website to include it in its response.

TYPICAL SIGNAL

Crawl access
Blanket block on all non-Google crawlers in robots.txt. GPTBot, PerplexityBot, and ClaudeBot are denied access. Only Google can index the site.

Schema
No structured data. AI has to guess that "/mobile-plans/" is a page about mobile plans, and can't extract individual plan details in a structured way.

llms.txt
None. AI has no structured guide to which pages matter or how to describe the company.

STRONGER SIGNAL

Crawl access
GPTBot, PerplexityBot, ClaudeBot, and Google-Extended are explicitly allowed. AI systems can crawl plan pages, coverage maps, and support articles.

Schema
Organization schema on the homepage identifies ConnectTel as a telecommunications provider in Singapore. Product schema on each plan page describes plan name, price, data allowance, and contract terms. FAQPage schema on the support page maps common questions like "What's the best plan for heavy data users?"

llms.txt
llms.txt lists ConnectTel as a Singapore telco, points AI to key pages (/mobile-plans/, /broadband/, /enterprise/, /5g/), and specifies attribution preferences.
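The Product schema described above might be sketched like this on a plan page. A minimal illustration; the plan name, price, and URL are invented for the example:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Unlimited 5G Postpaid Plan",
  "description": "Postpaid mobile plan with unlimited 5G data.",
  "brand": { "@type": "Organization", "name": "ConnectTel" },
  "offers": {
    "@type": "Offer",
    "price": "45.00",
    "priceCurrency": "SGD",
    "url": "https://connecttel.sg/mobile-plans/unlimited-5g"
  }
}
</script>
```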

Unsure where your brand signal stands? Get a complimentary 7-day Snapshot audit.

The Snapshot gives you a clear picture of where you stand today.

Apply for a complimentary Snapshot

How to define your Agent Signal

Use this to evaluate whether your site is technically ready for AI systems to access, parse, and understand your content.

1. AI Crawler Permissions

"Our robots.txt explicitly allows GPTBot, Google-Extended, PerplexityBot, and ClaudeBot: [yes/no]. We block AI crawlers on the following pages: [list pages or 'none']."

2. Structured Data

"We have implemented Organization schema on our homepage: [yes/no]. We have Article/BlogPosting schema on content pages: [yes/no]. We have FAQPage schema on FAQ sections: [yes/no]. We have Service or Product schema on service/product pages: [yes/no]."

3. Emerging Protocols

"We have implemented an llms.txt file with site description and key pages: [yes/no]. We are monitoring WebMCP for future implementation: [yes/no]."

Learn more about the impact of llms.txt and how to write your own

Read More
AUDIT CHECKLIST

Common mistakes to avoid:

  1. Blocking AI crawlers in robots.txt. The most damaging technical mistake for AEO. If you want AI to cite you, let AI crawlers access your public content.
  2. No structured data. Forces AI to guess what your content means. Add Organization, Article, FAQPage, and Service schema at minimum.
  3. JavaScript-only rendering. AI crawlers may see a blank page. Use server-side rendering to ensure content is in the initial HTML.
Download the full checklist

FAQ

What is Agent Signal?

Agent Signal is the technical infrastructure that determines whether AI systems can access, crawl, parse, and understand your website. Without strong Agent Signal, AI cannot source your content, regardless of how good it is.

Should I allow AI crawlers to access my site?

If you want AI to cite and mention your brand, allow AI crawlers access to your public-facing content. Block them only from internal or proprietary pages.

What is llms.txt?

An emerging standard, similar to robots.txt, that tells AI systems how to understand your site. It takes minimal effort to create and positions you ahead of competitors.

How do I check whether AI crawlers can read my pages?

Fetch your page with curl or Google's URL Inspection Tool. If the content appears in the raw HTML before JavaScript executes, AI crawlers can access it. Also check robots.txt for blocked AI crawler tokens.
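The robots.txt half of that check can be scripted. A minimal sketch using Python's standard urllib.robotparser; the rules shown are an invented example, and in practice you would fetch your own site's robots.txt:

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt body -- in practice, fetch yoursite.com/robots.txt
rules = """
User-agent: GPTBot
Disallow: /account/

User-agent: *
Allow: /
""".strip().splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Public page: no rule matches, so GPTBot is allowed
print(parser.can_fetch("GPTBot", "https://example.com/mobile-plans/"))
# Internal page: matches Disallow, so GPTBot is blocked
print(parser.can_fetch("GPTBot", "https://example.com/account/billing"))
```

Repeating the check with each crawler token from the table above gives a quick picture of which AI systems your current rules shut out.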

What is WebMCP?

WebMCP is an emerging standard that lets websites expose structured actions directly to AI agents. It represents the next evolution of Agent Signal, moving from passive crawlability to active AI interaction.

Not sure how AI sees your brand? Find out in 7 days.

We'll assess how AI systems currently see your brand, identify the infrastructure gaps that matter, and show you what's possible.

Apply for a consultation