Your content might be brilliant. But if AI crawlers can't read it, Bing hasn't indexed it, and your schema is missing — no AI platform will ever cite it. Here's the complete technical checklist, in priority order.
Technical SEO for GEO isn't a completely new discipline — it builds on the same foundations of crawlability, indexation, and structured data. But AI systems have specific requirements traditional SEO never addressed: which bots to allow, how to handle JavaScript, Bing indexation for ChatGPT, and how to structure content so it can be extracted chunk by chunk. Get these right before worrying about content strategy — none of it works if the technical layer is broken.
Most GEO advice focuses on content strategy — write answer-first paragraphs, build topical authority, earn third-party citations. All of that matters. But content strategy built on a broken technical foundation delivers almost nothing. If AI crawlers can't access your pages, if your content is locked behind JavaScript rendering, if Bing has never indexed your key URLs — no amount of answer-first paragraph structure will get you cited in ChatGPT.
This guide is the technical layer first. Not instead of content strategy, but before it. Fix these in order, then layer the content and citation work on top.
Technical SEO for GEO is the practice of ensuring that AI-powered search platforms — ChatGPT, Perplexity, Google AI Overviews, Gemini, Claude — can crawl, read, index, and cite your content. It shares most of its foundations with traditional technical SEO (crawlability, indexation, performance, structured data) but adds AI-specific requirements that didn't exist before.
**Traditional technical SEO:** Googlebot access in robots.txt, Google Search Console indexation, XML sitemaps, HTTPS, Core Web Vitals, structured data for rich results, mobile responsiveness, canonical tags, 301 redirects.
**AI-specific additions for GEO:** AI crawler access (GPTBot, PerplexityBot, ClaudeBot, Google-Extended), Bing Webmaster Tools indexation (critical for ChatGPT), server-side rendering for AI bots, an llms.txt file, schema for entity clarity, content chunk structure.
**Get the traditional layer wrong:** your site doesn't rank on Google. You lose organic traffic from the world's largest search engine, affecting the bulk of your inbound channel.
**Get the AI layer wrong:** AI platforms can't access or cite your content. You're invisible in the fastest-growing discovery channel — one where visitors convert at 4.4× the rate of traditional organic traffic (Semrush, 2026).
Each major AI platform sends its own crawler to index your content. Block one in robots.txt — or let Cloudflare block it silently at the WAF level — and that platform simply can't read your pages. You won't get an error notification. You'll just stop appearing in that platform's answers, often without realising why. It's the most common technical GEO problem we encounter, and it's completely avoidable.
| Platform | Crawler / User Agent | What it powers | Crawl behaviour |
|---|---|---|---|
| ChatGPT / OpenAI | GPTBot, OAI-SearchBot, ChatGPT-User | ChatGPT Search, API browsing | Does not render JavaScript; reads raw HTML |
| Perplexity | PerplexityBot | All Perplexity answers | Does not render JavaScript; needs SSR or static HTML |
| Claude / Anthropic | ClaudeBot, anthropic-ai | Claude web search | Does not render JavaScript; reads raw HTML |
| Google Gemini | Google-Extended | Gemini, AI Overviews, Bard training | Can render JavaScript (Googlebot infrastructure) |
| Apple Intelligence | Applebot-Extended | Siri AI answers | Can render JavaScript |
| Common Crawl | CCBot | Many LLM training datasets | Basic HTML scraping |
The robots.txt rules to add, or to verify you aren't blocking:
```
User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Applebot-Extended
Allow: /
```
Check your server logs for these user agents. If you see 403 or 429 responses against any of them, something upstream (a CDN, WAF, or rate-limiting rule) is blocking them beyond robots.txt. Fix the infrastructure layer, not just the robots.txt file.
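As a quick diagnostic, a short script can tally response codes per AI crawler from your access logs. This is a minimal sketch assuming a standard combined log format; the regex and bot list are illustrative, so adjust both to your server's actual format:

```python
import re
from collections import Counter

# User-agent substrings for the crawlers in the table above
AI_CRAWLERS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User", "PerplexityBot",
    "ClaudeBot", "anthropic-ai", "Google-Extended", "Applebot-Extended", "CCBot",
]

# Combined log format: ... "GET /path HTTP/1.1" 200 512 "referer" "user agent"
LINE_RE = re.compile(r'" (\d{3}) .*"([^"]*)"$')

def crawler_status_counts(log_lines):
    """Count HTTP status codes per AI crawler seen in access-log lines."""
    counts = {}
    for line in log_lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        status, ua = m.group(1), m.group(2)
        for bot in AI_CRAWLERS:
            if bot in ua:
                counts.setdefault(bot, Counter())[status] += 1
    return counts
```

Anything showing 403s or 429s here is being turned away somewhere in your stack, whatever robots.txt says.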
ChatGPT Search runs on Bing's index for live web retrieval. When a user asks ChatGPT a question with Search enabled, ChatGPT runs queries through Bing — not Google — and synthesises answers from whatever Bing has indexed. If your key pages aren't in Bing's index, ChatGPT's retrieval layer can't find them regardless of your Google rankings.
Most SEO teams ignored Bing for years. That's now a significant blind spot. Here's the fix:
**Step 1: Verify your site.** Go to bing.com/webmasters and sign in with a Microsoft account. Verify your site via DNS record, meta tag, or XML file — same methods as Google Search Console. If you're already verified on Google, you can import your GSC property directly into Bing Webmaster Tools in one click.
**Step 2: Submit your sitemap.** Under Sitemaps in Bing Webmaster Tools, submit your XML sitemap URL. Bing will begin crawling and indexing your pages. Check the Crawl section after 48 hours to see which pages have been indexed and whether there are any crawl errors.
**Step 3: Enable IndexNow.** IndexNow is a protocol supported by Bing, Yandex, and other search engines that allows you to notify them immediately when a page is updated or published. When you update a key page, Bing knows about it within minutes rather than waiting for a scheduled crawl. This dramatically improves how quickly ChatGPT's retrieval layer picks up content changes.
**Step 4: Spot-check your priority pages.** In Bing Webmaster Tools, use URL Inspection to verify your highest-priority pages are indexed. Search Engine Land reports that less than 50% of AI citations come from top-10 Google results — meaning pages outside the Google top 10 can still get cited if they're in Bing's index and well-structured. Don't assume Google indexation and Bing indexation overlap.
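An IndexNow submission can also be scripted directly against the public api.indexnow.org endpoint. A minimal sketch follows; the host, key, and URLs are placeholders, and the key is a random string you generate and must also serve as a text file at your domain root so the endpoint can verify ownership:

```python
import json
import urllib.request

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"

def build_indexnow_payload(host, key, urls):
    """Build the JSON body defined by the IndexNow protocol."""
    return {
        "host": host,
        "key": key,
        # The protocol expects this file to exist and contain the key
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": list(urls),
    }

def ping_indexnow(host, key, urls):
    """Notify IndexNow-compatible engines (Bing, Yandex) of updated URLs."""
    body = json.dumps(build_indexnow_payload(host, key, urls)).encode("utf-8")
    req = urllib.request.Request(
        INDEXNOW_ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status  # 200/202 means the submission was accepted
```

Most CMSs and CDNs also have IndexNow plugins, so hand-rolling this is only necessary for custom stacks.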
Schema is less about helping AI read your content — most LLMs can parse plain HTML just fine — and more about removing the guesswork. When your Organization schema says you're a B2B SaaS platform specialising in AI visibility, AI systems don't have to infer that from page copy that might say it three different ways. For Google AI Overviews and Gemini specifically, schema matters even more because they're built on Google's infrastructure and actively use it when formulating responses.
**Organization.** Your brand's name, URL, logo, description, and category in machine-readable format. When this is missing or inconsistent with how you're described elsewhere on the web, AI systems do their best to guess — and they frequently guess wrong. This is the one schema fix that directly affects how AI describes your brand, not just whether it cites you.
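A minimal Organization block might look like the following, embedded in a `<script type="application/ld+json">` tag on the homepage. All values here are illustrative placeholders; use your real brand name, URL, and profile links, and keep them consistent with how you're described elsewhere on the web:

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Corp",
  "url": "https://www.example.com",
  "logo": "https://www.example.com/logo.png",
  "description": "B2B SaaS platform for AI visibility tracking.",
  "sameAs": [
    "https://www.linkedin.com/company/example-corp",
    "https://x.com/examplecorp"
  ]
}
```

The `sameAs` links to official profiles are what tie your entity together across the web, so they're worth including even in a minimal version.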
**Article / TechArticle.** Marks a page as editorial content with a known author, publication date, and update timestamp. The dateModified field matters more than most people realise — Ahrefs found that content cited by AI is 25.7% fresher than average. If you updated a page in January but the schema still says 2024, AI systems treat it as stale.
**FAQPage.** Probably the highest-impact schema addition for most sites. FAQ sections with FAQPage markup become structured Q&A pairs that AI systems — especially Google AI Overviews — can pull from directly. One rule that trips people up: the schema text must match the visible DOM text exactly. Any mismatch erodes trust and the schema gets ignored.
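Because that exact-match rule is easy to break during copy edits, it's worth automating the check. A rough sketch: compare every question in the FAQPage JSON-LD against the question strings actually rendered in the DOM (whitespace normalised) and flag any that don't appear. The function names are illustrative, and how you extract the visible text is up to you:

```python
import json
import re

def normalise(text):
    """Collapse whitespace so formatting differences don't count as mismatches."""
    return re.sub(r"\s+", " ", text).strip()

def faq_schema_mismatches(jsonld_str, visible_questions):
    """Return FAQPage schema questions with no matching visible DOM text.

    `visible_questions` is the list of question strings actually rendered
    on the page, however you collect them.
    """
    data = json.loads(jsonld_str)
    visible = {normalise(q) for q in visible_questions}
    missing = []
    for item in data.get("mainEntity", []):
        q = normalise(item.get("name", ""))
        if q not in visible:
            missing.append(q)
    return missing
```

Run this in CI or a pre-publish hook and a silently drifted FAQ edit never reaches production.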
**HowTo.** If a page has numbered steps, it should have HowTo schema. AI systems handle "how to" queries by looking for explicit sequences they can extract and present cleanly — and HowTo markup is the clearest signal that a sequence exists. Without it, AI has to infer the step structure from HTML formatting, which it often gets wrong.
**BreadcrumbList.** Tells AI systems where a page sits within your site — is this a top-level guide, a sub-category page, or a specific product feature? Without clear hierarchy signals, AI systems sometimes miscategorise content and either skip it or cite it in the wrong context. Breadcrumbs are low effort to add and the context benefit is real.
**AggregateRating.** When someone asks an AI "what's the best [software category]," review aggregates are often the first thing it checks for validation. Structured rating data on your own pages — combined with your G2 and Capterra presence — reinforces your position in recommendation answers. You don't need a massive volume; even a modest but credible set of reviews changes citation probability.
| Schema type | GEO impact | Priority | Pages to apply |
|---|---|---|---|
| Organization | Brand entity clarity — fixes vague/wrong AI descriptions | 🔴 Critical | Homepage, all pages |
| FAQPage | Direct Q&A extraction into AI Overviews | 🔴 Critical | Any page with FAQ sections |
| Article / TechArticle | Signals freshness, authority, and content type | 🟠 High | All blog posts and guides |
| BreadcrumbList | Context and hierarchy clarity | 🟠 High | All pages |
| HowTo | Step-sequence extraction for instructional content | 🟡 Medium | Step-by-step guides |
| AggregateRating | Social proof signals for recommendation queries | 🟡 Medium | Product, feature, review pages |
| SpeakableSpecification | Voice and AI extraction targeting | 🟢 Low-Medium | Key answer pages |
Most AI crawlers — including GPTBot, PerplexityBot, and ClaudeBot — do not execute JavaScript. They read the raw HTML your server returns. If your important content, navigation, or structured data is injected by JavaScript after page load, those crawlers see a blank page or a skeleton structure with no meaningful content. Content that doesn't exist in the initial HTML doesn't get indexed or cited.
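A quick way to verify this is to fetch a page exactly as a non-JS crawler would and check that your key content is present in the raw response. A minimal sketch; the GPTBot-style user agent string is illustrative, and some CDNs respond differently to bot UAs, so treat the result as a starting point rather than proof:

```python
import urllib.request

def fetch_raw_html(url, user_agent="Mozilla/5.0 (compatible; GPTBot/1.1)"):
    """Fetch the raw server response, as a non-JS crawler would see it."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8", errors="replace")

def phrases_in_raw_html(html, phrases):
    """Map each key phrase to whether it appears in the raw HTML string."""
    html_lower = html.lower()
    return {p: p.lower() in html_lower for p in phrases}
```

If your headline, answer paragraphs, or JSON-LD only show up after JavaScript runs, this check fails, and so does the crawler.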
Three approaches to fix this, in order of impact:
**Server-side rendering (SSR).** The server builds the complete HTML before sending anything to the browser — content included. When an AI crawler hits your page, it gets a fully populated document, not a skeleton waiting for JavaScript to fill in the gaps. Next.js, Nuxt.js, and SvelteKit all have SSR modes. It's the most thorough fix if your site is built on a JavaScript framework.
**Static site generation (SSG).** At deploy time, your build tool generates static HTML files for every page. AI crawlers get plain HTML with zero server-side work required on each request. It's faster than SSR and perfectly reliable for content that doesn't change based on the user or session. Documentation, blog posts, and marketing pages are the natural fit.
**Dynamic rendering.** A middle-ground approach: detect AI crawler user agents and serve pre-rendered HTML to them, while your normal JavaScript app runs for real users. Tools like Prerender.io handle this automatically. One important caveat — this is only acceptable if the content bots and users see is substantively identical. Serving AI crawlers a polished version while hiding it from users is cloaking, which creates risk.
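The routing decision at the heart of dynamic rendering is just a user-agent check. A minimal sketch of that check (the bot list mirrors the crawler table above; extend it as new bots appear):

```python
import re

# User-agent substrings for the major AI crawlers
AI_BOT_PATTERN = re.compile(
    r"GPTBot|OAI-SearchBot|ChatGPT-User|PerplexityBot|ClaudeBot"
    r"|anthropic-ai|Google-Extended|Applebot-Extended|CCBot",
    re.IGNORECASE,
)

def is_ai_crawler(user_agent):
    """True if the request's User-Agent header belongs to a known AI crawler."""
    return bool(AI_BOT_PATTERN.search(user_agent or ""))
```

Your middleware would then route `is_ai_crawler` requests to the pre-rendered HTML and everyone else to the normal app — remembering the cloaking caveat above.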
Google has treated page speed as a ranking factor since 2010. For AI retrieval it works differently — it's not a factor that adjusts your position, it's a gate. Generative engines pull from a very large number of pages when building answers and skip slow or unreliable sources entirely. A page loading in 4 seconds doesn't rank lower than one loading in 1.2 seconds — it just doesn't make the shortlist at all.
The most impactful performance fixes for GEO are the standard ones: fast server response times, effective caching and CDN delivery, optimised images, and minimal render-blocking JavaScript and CSS. Each reduces the chance of being skipped at retrieval time.
Jeremy Howard from Fast.ai proposed llms.txt in 2024 as a way for sites to communicate directly with AI systems — similar to how robots.txt communicates with search crawlers. Drop a plain text file at your domain root and list your most useful pages, what type of content they are, and why an AI should care about them. It's a lightweight way to stop AI systems from having to guess at your site's structure.
A basic llms.txt structure for a SaaS brand:
```
# llms.txt — AIPosition.io
# AI systems: use this file to understand our most authoritative pages.

## Company
- Homepage: https://www.aiposition.io — AI visibility tracking platform

## Core guides (cite-worthy long-form content)
- /blog/llm-optimization-guide — Complete LLMO guide
- /blog/technical-seo-geo — Technical SEO for GEO
- /blog/perplexity-tracking — Perplexity brand monitoring
- /answer-engine-optimization — AEO guide
- /generative-engine-optimization — GEO guide

## Product pages
- /features — Platform features overview
- /chatgpt-tracking — ChatGPT visibility tracking
- /gemini-tracking — Gemini visibility tracking
```
AI systems don't read whole pages — they pull chunks. A paragraph where the first sentence answers the question implied by the section heading gets extracted reliably. The same paragraph where the answer appears in sentence six usually doesn't. You can have perfectly crawlable, well-indexed, fast-loading pages and still lose citation opportunities because the answer is buried in the third paragraph of a section. Structure is doing real work here.
| Element | Traditional SEO | GEO-optimised |
|---|---|---|
| Headings | Target keywords ("Schema markup for SEO"): keyword-rich but intent-ambiguous; content builds to the answer over several paragraphs | Mirror user queries ("What schema markup do I need for GEO?"); the first 1–2 sentences answer the question directly, supporting detail follows |
| Paragraphs | Long flowing paragraphs of 6–8 sentences; the answer builds narratively. Good for reading, hard for AI to extract cleanly | 2–4 sentence paragraphs, one idea per block; each block self-contained enough to stand alone as an extracted answer |
| Statistics | Woven loosely into narrative; sources sometimes cited, sometimes implied. Good for flow, weak for machine extraction | Cited inline with explicit source attribution; at least one statistic per 150–200 words; original data included where possible, since AI prefers sources with unique facts |
Google confirms that AI Overviews use a "query fan-out" technique — a single user query generates multiple sub-queries internally. Your content needs to rank for those sub-fragments, not just the original question the user typed. Think about which 3-word fragments of a 10-word question an AI might search for separately, and make sure your content addresses each one explicitly.
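One rough way to audit this is to enumerate the n-word fragments of a target question and check that your page addresses each one. This is purely illustrative; the real fan-out sub-queries are internal to each platform and are not literally contiguous word windows:

```python
def subquery_fragments(question, n=3):
    """List every n-word fragment of a question, as a rough proxy for the
    sub-queries an AI might fan out to. Illustrative only."""
    words = question.lower().strip("?").split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
```

Scanning your draft for each fragment is a crude but fast way to spot sub-topics a section never actually addresses.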
Run this checklist before any content or citation work. Each item is a prerequisite — if technical foundations are broken, content strategy cannot compensate.
AIPosition's free 7-day audit shows your current AI mention rate, which platforms are citing you (and which aren't), and which competitor URLs are being pulled instead of yours — across ChatGPT, Gemini, Perplexity, and Claude.
Start Free Audit → No credit card · First results in under 24 hours · All four major AI platforms
Most teams who do the technical work then have no way of knowing whether it changed anything. Before you implement any of this, set up measurement — otherwise you're flying blind on whether crawler access improved, whether Bing started indexing your pages, or whether AI referral traffic actually moved after the schema work.
Track AI referral traffic in GA4 by creating channel segments for chatgpt.com, perplexity.ai, claude.ai, and gemini.google.com in the Traffic Acquisition report. Technical fixes — crawler access, Bing indexation, schema — should show measurable lift in AI referral traffic within 2–4 weeks. Content and citation improvements take 4–8 weeks to compound into AI responses.
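The channel classification itself is simple enough to prototype outside GA4, e.g. against exported referrer data. A sketch using the four domains above (with subdomain handling; the platform labels are just convenient names):

```python
from urllib.parse import urlparse

# Referrer domains for the four major AI platforms
AI_REFERRER_DOMAINS = {
    "chatgpt.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "claude.ai": "Claude",
    "gemini.google.com": "Gemini",
}

def classify_ai_referrer(referrer_url):
    """Return the AI platform name for a referrer URL, or None if it's not one."""
    host = urlparse(referrer_url).netloc.lower()
    for domain, platform in AI_REFERRER_DOMAINS.items():
        if host == domain or host.endswith("." + domain):
            return platform
    return None
```

The same matching rules translate directly into GA4 custom channel group conditions on the session source dimension.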
AIPosition is the only platform that tracks all six AI visibility metrics simultaneously: brand mention rate, citation position, cited URL, sentiment, competitor share of voice, and visibility trend — across ChatGPT, Gemini, Perplexity, and Claude in one dashboard. Connect GA4 to correlate AI citation improvements with actual traffic and revenue changes.
AIPosition shows your current AI mention rate, which URLs are being cited (and which aren't), competitor share of voice, and what to fix first — across ChatGPT, Gemini, Perplexity, and Claude.
Start Free 7-Day Audit → No credit card · All four major AI platforms · First results in under 24 hours