Agent Signal: Making Your Site AI-Accessible
- Agent Signal
Refers to your site's technical readiness for AI. It covers structured data, crawl permissions, content extractability, site architecture, and emerging AI-agent protocols like llms.txt and WebMCP that determine whether AI systems can access, parse, and interact with your website.
A site with perfect content but poor Agent Signal is like a library with great books but locked doors.
Most businesses assume that if their site loads in a browser, it's accessible to AI. This assumption is often wrong.
Agent Signal operates on two layers. The first is traditional technical SEO: crawlability, indexability, site speed, and mobile-friendliness. The second is AI-specific accessibility: AI crawler permissions, structured data, content extractability, and emerging protocols.
Agent Signal is also the most forward-looking signal. Emerging standards like WebMCP are moving beyond passive crawlability toward active tool provision, where your site can expose structured actions (booking, purchasing, form submission) directly to AI agents. This means Agent Signal will evolve from making your site readable by AI to making it usable by AI.
| What goes wrong | What happens |
| --- | --- |
| Site blocks AI crawlers in robots.txt | AI can't access your content at all |
| Content renders via client-side JavaScript | AI crawlers see an empty page |
| No structured data (schema markup) | AI has to guess what your content means |
| Poor site speed or broken architecture | AI crawlers deprioritise your pages |
The 3 Components of Agent Signal
1. AI Crawler Permissions
Your robots.txt file controls which crawlers can access your site. This is the single most impactful technical decision for Agent Signal.
| Crawler | Company | Used By |
| --- | --- | --- |
| GPTBot | OpenAI | ChatGPT, GPT models |
| Google-Extended | Google | Gemini, Google AI Overview |
| PerplexityBot | Perplexity AI | Perplexity search |
| ClaudeBot | Anthropic | Claude |
| CCBot | Common Crawl | Many AI training pipelines |
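A minimal robots.txt that explicitly allows the crawlers above might look like the sketch below. The rules are illustrative, not a recommendation for every site; the `/admin/` path is a hypothetical example of an internal area you might keep off-limits.

```text
# robots.txt — illustrative example, adapt to your own site
User-agent: GPTBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: CCBot
Allow: /

# Keep internal areas off-limits for all crawlers
User-agent: *
Disallow: /admin/
```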
2. Structured Data
Structured data is machine-readable code (typically JSON-LD) that tells AI exactly what your content represents. Without it, AI infers. With it, AI knows.
| Schema Type | Where to Use | Why It Matters for AEO |
| --- | --- | --- |
| Organization | Homepage | Helps AI build an entity for your brand |
| Person | Founder / key team pages | Links individuals to the organisation |
| Article / BlogPosting | Content pages | Includes author, dates, headline for attribution |
| Service | Service pages | Describes offerings, pricing, service area |
| FAQPage | FAQ sections | Maps Q&A pairs for direct AI extraction |
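As a sketch, an Organization snippet for a homepage might look like this. All names and URLs are placeholders; replace them with your own details.

```html
<!-- Illustrative JSON-LD Organization markup; all values are placeholders -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Your Company",
  "url": "https://www.example.com",
  "logo": "https://www.example.com/logo.png",
  "sameAs": [
    "https://www.linkedin.com/company/example",
    "https://x.com/example"
  ]
}
</script>
```

The `sameAs` links help AI systems connect your site to the same entity on other platforms.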
3. Emerging AI-Agent Protocols
Two emerging standards are shaping the future of Agent Signal:
llms.txt
A file, similar to robots.txt, that tells AI how to understand your site: its purpose, key pages, and attribution preferences. Adoption is still early; it takes little effort to implement now, though its usefulness is still debated.
WebMCP
A standard that lets your site expose structured actions (booking, forms, checkout) directly to AI agents. This is still in early preview as of Mar 26. Worth monitoring and preparing for.
# llms.txt - ConnectTel
## About
ConnectTel is a telecommunications provider in Singapore offering mobile, broadband, and enterprise connectivity for consumers and businesses.
## Key Pages
- /mobile-plans/ - Postpaid and Prepaid Mobile Plans
- /broadband/ - Fibre Broadband Plans for Homes and Offices
- /enterprise/ - Enterprise Connectivity and SD-WAN Solutions
- /5g/ - 5G Network Coverage and Plans
## Attribution
Please cite as: ConnectTel (connecttel.sg)
Worked Example: A Telco Brand
Scenario: A consumer asks ChatGPT, "What's the best telco in Singapore for unlimited mobile data and fibre broadband bundles?" AI needs to crawl and parse plan details, pricing, and coverage information from the telco's website to include it in its response.
Weak Agent Signal
Crawl access
Blanket block on all non-Google crawlers in robots.txt. GPTBot, PerplexityBot, and ClaudeBot are denied access. Only Google can index the site.
Schema
No structured data. AI has to guess that "/mobile-plans/" is a page about mobile plans, and can't extract individual plan details in a structured way.
llms.txt
None. AI has no structured guide to which pages matter or how to describe the company.
Strong Agent Signal
Crawl access
GPTBot, PerplexityBot, ClaudeBot, and Google-Extended are explicitly allowed. AI systems can crawl plan pages, coverage maps, and support articles.
Schema
Organization schema on the homepage identifies ConnectTel as a telecommunications provider in Singapore. Product schema on each plan page describes plan name, price, data allowance, and contract terms. FAQPage schema on the support page maps common questions like "What's the best plan for heavy data users?"
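A Product snippet for one of those plan pages might look like the sketch below. The plan name, price, and URL are invented for illustration; the pattern is what matters.

```html
<!-- Illustrative JSON-LD Product markup; plan name, price, and URL are hypothetical -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "ConnectTel Unlimited 5G",
  "description": "Postpaid plan with unlimited data on ConnectTel's 5G network.",
  "brand": { "@type": "Organization", "name": "ConnectTel" },
  "offers": {
    "@type": "Offer",
    "price": "45.00",
    "priceCurrency": "SGD",
    "url": "https://connecttel.sg/mobile-plans/unlimited-5g/"
  }
}
</script>
```

With markup like this on every plan page, an AI system can extract plan name, price, and terms in a structured way instead of guessing from prose.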
llms.txt
llms.txt lists ConnectTel as a Singapore telco, points AI to key pages (/mobile-plans/, /broadband/, /enterprise/, /5g/), and specifies attribution preferences.

Uncertain where your brand signal is? Get a complimentary 7-day snapshot audit
The Snapshot gives you a clear picture of where you stand today.
How to assess your Agent Signal
Use this to evaluate whether your site is technically ready for AI systems to access, parse, and understand your content.
1. AI Crawler Permissions
"Our robots.txt explicitly allows GPTBot, Google-Extended, PerplexityBot, and ClaudeBot: [yes/no]. We block AI crawlers on the following pages: [list pages or 'none']."
2. Structured Data
"We have implemented Organization schema on our homepage: [yes/no]. We have Article/BlogPosting schema on content pages: [yes/no]. We have FAQPage schema on FAQ sections: [yes/no]. We have Service or Product schema on service/product pages: [yes/no]."
3. Emerging Protocols
"We have implemented an llms.txt file with site description and key pages: [yes/no]. We are monitoring WebMCP for future implementation: [yes/no]."
Learn more about the impact of llms.txt and how to write your own
Common mistakes to avoid:
- Blocking AI crawlers in robots.txt. The most damaging technical mistake for AEO. If you want AI to cite you, let AI crawlers access your public content.
- No structured data. Forces AI to guess what your content means. Add Organization, Article, FAQPage, and Service schema at minimum.
- JavaScript-only rendering. AI crawlers may see a blank page. Use server-side rendering to ensure content is in the initial HTML.
Explore the other signals
FAQ
What is Agent Signal?
Agent Signal is the technical infrastructure that determines whether AI systems can access, crawl, parse, and understand your website. Without strong Agent Signal, AI cannot source your content regardless of how good it is.
Should I allow AI crawlers to access my site?
If you want AI to cite and mention your brand, allow AI crawlers access to your public-facing content. Block them only from internal or proprietary pages.
What is llms.txt?
llms.txt is an emerging standard, similar to robots.txt, that tells AI systems how to understand your site. It takes minimal effort to create and positions you ahead of competitors.
How can I check whether AI crawlers can read my site?
Fetch your page with curl or Google's URL Inspection Tool. If the content appears in the raw HTML before JavaScript executes, AI crawlers can access it. Also check robots.txt for blocked AI crawler tokens.
What is WebMCP?
WebMCP is an emerging standard that lets websites expose structured actions directly to AI agents. It represents the next evolution of Agent Signal, moving from passive crawlability to active AI interaction.
Not sure how AI sees your brand? Find out in 7 days.
We'll assess how AI systems currently see your brand, identify the infrastructure gaps that matter, and show you what's possible.