
Add robots.txt crawl rules #57

Merged
AnnatarHe merged 1 commit into master from codex/add-robotstxt-crawl-rules on Apr 19, 2026

Conversation

@AnnatarHe (Member)

Summary

  • Add a root robots.txt for the site
  • Allow crawling of published posts and RSS
  • Disallow preview and generated Open Graph image paths
  • Include the sitemap location

Testing

  • Not run (not requested)

@AnnatarHe AnnatarHe temporarily deployed to codex/add-robotstxt-crawl-rules - asynctalk-website PR #57 April 19, 2026 02:31 — with Render Destroyed
@AnnatarHe AnnatarHe merged commit 01d7a6a into master Apr 19, 2026
1 check passed
@AnnatarHe AnnatarHe deleted the codex/add-robotstxt-crawl-rules branch April 19, 2026 02:32

claude Bot commented Apr 19, 2026

Code Review — PR #57: Add robots.txt crawl rules

Note: This PR is already merged. Feedback is provided for awareness and future reference.

Overview

Adds a robots.txt to public/ to control crawler access — allowing content pages and RSS while blocking OG image/preview paths. The intent and placement are correct for an Astro static site.


What's Good

  • Correct placement in public/ so Astro serves it as a static root-level file.
  • Disallowing OG image paths (/og-preview, /posts/*-og.png) is a good practice — these are generated assets, not useful to index.
  • Sitemap declaration helps search engines discover content.

Issues & Suggestions

1. `Allow: /` makes the explicit `Allow` lines redundant

Crawlers allow everything by default; `Disallow` rules are the exceptions. Adding `Allow: /posts/` and `Allow: /rss.xml` alongside `Allow: /` is harmless but noisy. If the intent is to be explicit, that's fine, but it's worth knowing these lines have no effect beyond documentation.

```diff
 User-agent: *
 Allow: /
-Allow: /posts/
-Allow: /rss.xml
 Disallow: /og-preview
 Disallow: /posts/*-og.png
```
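The default-allow behavior is easy to confirm with Python's stdlib parser (a quick sketch; note that `urllib.robotparser` uses first-match semantics rather than Google's longest-match rule, so the `Disallow` rules alone are enough here):

```python
from urllib.robotparser import RobotFileParser

# Only the Disallow rules are needed; everything else is crawlable by default.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /og-preview",
    "Disallow: /posts/*-og.png",
])

print(rp.can_fetch("*", "/posts/some-article"))  # True: allowed without any Allow line
print(rp.can_fetch("*", "/rss.xml"))             # True: same
print(rp.can_fetch("*", "/og-preview"))          # False: blocked by the Disallow rule
```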

2. The wildcard `*` in `Disallow: /posts/*-og.png` is non-standard

The original robots.txt spec does not support wildcards. Major crawlers (Google, Bing) do support *, but obscure or legacy bots may ignore the rule entirely and crawl those paths. This is a low-risk concern for this site, but worth noting.
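Python's stdlib `urllib.robotparser` happens to be one such spec-only implementation, which makes the gap easy to demonstrate (a sketch; the bot name is hypothetical):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /posts/*-og.png",
])

# Google and Bing treat * as a wildcard and would block this URL,
# but a spec-only parser matches the * literally, so the rule never applies.
print(rp.can_fetch("legacybot", "/posts/hello-og.png"))  # True, i.e. crawlable
```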

3. Verify /og-preview path exists

The Astro project uses Satori for OG image generation. Please confirm that /og-preview matches the actual route used in the codebase (e.g., is it /og-preview/ with a trailing slash, or a different path entirely?). A mismatch means the rule has no effect.

4. Sitemap filename — verify sitemap-index.xml vs sitemap.xml

Astro's @astrojs/sitemap integration generates sitemap-index.xml when multiple sitemaps exist, but for a single-sitemap site it may generate sitemap.xml directly. Verify the actual output filename at build time to ensure the declared URL is reachable.

Sitemap: https://asynctalk.com/sitemap-index.xml  ← confirm this file exists after build
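One way to make this check concrete is a small post-build script (a hypothetical helper; assumes Astro's default `dist/` output directory):

```python
from pathlib import Path

def find_sitemaps(out_dir: Path) -> list[str]:
    """List the sitemap files the build actually produced."""
    return sorted(p.name for p in out_dir.glob("sitemap*.xml"))

# After `astro build`, confirm the filename declared in robots.txt is really there:
#   assert "sitemap-index.xml" in find_sitemaps(Path("dist"))
```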

Summary

| Area | Status |
| --- | --- |
| File placement | ✅ Correct |
| Crawl intent | ✅ Sensible |
| Redundant Allow rules | ⚠️ Minor cleanup opportunity |
| Wildcard support | ⚠️ Non-standard; works with major crawlers |
| Path correctness | ❓ Should be verified against actual routes |
| Sitemap URL | ❓ Should be verified against build output |

Overall this is a low-risk, useful addition. The main actionable follow-up is verifying that /og-preview and the sitemap filename match what the Astro build actually produces.

