workingmodel/wm-wp-plugin-llms-file-generator

WM LLMs File Generator

A WordPress plugin that automatically generates and maintains /llms.txt and /llms-full.txt files for your site — a machine-readable index that helps AI language models understand your content.

Developed by Working Model Inc

Based on the emerging llms.txt specification.


What It Does

  • Generates a spec-compliant /llms.txt at your site root
  • Optionally generates /llms-full.txt with full page body content
  • Auto-regenerates whenever you publish, update, or delete content
  • Serves files via PHP with correct UTF-8 encoding — works on any host, no server configuration required
  • Respects noindex settings from Yoast SEO and RankMath
  • Supports ACF — extracts text from custom field content when post_content is empty

Installation

  1. Upload the wm-llms-file-generator folder to /wp-content/plugins/
  2. Activate the plugin via Plugins → Installed Plugins
  3. Go to Settings → LLMs File to configure

Click Generate Now after activation to prime the cache immediately.


Settings

| Setting | Description |
| --- | --- |
| Enable | Master on/off switch for file generation |
| Site Description | A paragraph describing your site, included at the top of llms.txt. Falls back to your WP tagline if left blank. |
| Include Post Types | Which post types (Pages, Posts, custom types) appear in the file |
| Include Taxonomy Archives | Optionally include category, tag, and custom taxonomy archive URLs |
| Exclude Posts / Pages | Comma-separated post IDs to omit |
| Auto-Regenerate | Regenerate automatically on publish, update, delete, theme switch, or permalink change |
| Generate llms-full.txt | Also produce /llms-full.txt with complete page body text |
| Max File Length | Character cap for llms-full.txt (default: 100,000) |

How It Works

File serving

Files are served via WordPress rewrite rules and PHP, which always sends Content-Type: text/plain; charset=UTF-8. Generated content is cached in a WordPress transient (refreshed weekly or on regeneration) so requests are fast. No physical files are written to the filesystem — no Apache or Nginx configuration needed.
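
The serving path described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the query variable wm_llms_sketch_file, the transient keys, and the function name are hypothetical, and the hooks are only registered when WordPress is loaded.

```php
<?php
// Hypothetical names throughout; the plugin's real identifiers may differ.

/** Map a requested file name to an (assumed) transient key, or null if unknown. */
function wm_llms_sketch_cache_key(string $file): ?string
{
    $keys = [
        'llms.txt'      => 'wm_llms_sketch_index',
        'llms-full.txt' => 'wm_llms_sketch_full',
    ];
    return $keys[$file] ?? null;
}

// Register hooks only inside WordPress, so this file is inert elsewhere.
if (function_exists('add_action')) {
    add_action('init', function (): void {
        // Route /llms.txt and /llms-full.txt to index.php with a custom query var.
        add_rewrite_rule(
            '^(llms(?:-full)?\.txt)$',
            'index.php?wm_llms_sketch_file=$matches[1]',
            'top'
        );
    });

    add_filter('query_vars', function (array $vars): array {
        $vars[] = 'wm_llms_sketch_file'; // Let WordPress pass the var through.
        return $vars;
    });

    add_action('template_redirect', function (): void {
        $key = wm_llms_sketch_cache_key((string) get_query_var('wm_llms_sketch_file'));
        if ($key === null) {
            return; // Not one of our files; let WordPress continue.
        }
        $body = get_transient($key);
        if ($body === false) {
            $body = ''; // Cache miss: a real implementation would regenerate here.
        }
        header('Content-Type: text/plain; charset=UTF-8');
        echo $body;
        exit;
    });
}
```

In a sketch like this, rewrite rules would need to be flushed once (for example on plugin activation) before the new URLs resolve.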

Auto-regeneration

The plugin hooks into save_post, transition_post_status, wp_trash_post, before_delete_post, wp_update_nav_menu, switch_theme, and permalink_structure_changed. A debounce transient with a 5-second lifetime prevents excessive regeneration during bulk operations. A WP-Cron job also runs daily as a safety net.
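
One way to picture the debounce is the sketch below. It assumes a leading-edge debounce (the first event regenerates, later events inside the window are skipped); whether the plugin skips or defers is not stated here. The function name and marker key are hypothetical, and storage is injected so the logic can run outside WordPress.

```php
<?php
// Hypothetical names; $get and $set mirror the shape of
// get_transient($key) and set_transient($key, $value, $expiration).

/**
 * Run $regenerate unless a debounce marker is still present.
 * Returns true when regeneration ran, false when it was skipped.
 */
function wm_llms_sketch_debounced(
    callable $get,
    callable $set,
    callable $regenerate,
    int $window = 5
): bool {
    $marker = 'wm_llms_sketch_debounce';
    if ($get($marker) !== false) {
        return false; // A regeneration already ran within the window.
    }
    $set($marker, 1, $window); // Marker expires after $window seconds.
    $regenerate();
    return true;
}

// Inside WordPress, wire a few of the hooks named above to the debounced call.
if (function_exists('add_action')) {
    $run = function (): void {
        wm_llms_sketch_debounced('get_transient', 'set_transient', function (): void {
            // Rebuild and cache the files here (omitted in this sketch).
        });
    };
    foreach (['save_post', 'wp_trash_post', 'before_delete_post', 'switch_theme'] as $hook) {
        add_action($hook, $run);
    }
}
```

With this shape, a bulk update that fires save_post many times within five seconds triggers a single rebuild, and the daily cron run picks up anything that was skipped.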

ACF support

When a post's post_content is empty or sparse (common with ACF-driven pages), the plugin calls get_fields() and recursively extracts text from all field values. Media fields, attachment objects, URLs, and metadata strings are automatically filtered out.
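
The recursive extraction can be illustrated with the sketch below, which walks an array shaped like the return value of get_fields(). The function name and the filtering heuristics (treating arrays with both ID and url keys as attachments, dropping bare URLs and numeric strings) are assumptions for illustration; the plugin's actual rules may differ.

```php
<?php
// Hypothetical helper; operates on plain arrays, so it runs without ACF installed.

/** Collect human-readable strings from a nested field value. */
function wm_llms_sketch_extract_text(mixed $value): array
{
    if (is_array($value)) {
        // Assumption: arrays carrying both 'ID' and 'url' are attachment objects.
        if (isset($value['ID'], $value['url'])) {
            return [];
        }
        $out = [];
        foreach ($value as $child) {
            $out = array_merge($out, wm_llms_sketch_extract_text($child));
        }
        return $out;
    }
    if (!is_string($value)) {
        return []; // Skip numbers, booleans, null, and objects.
    }
    $text = trim(strip_tags($value));
    // Drop empty strings, purely numeric strings (likely IDs), and bare URLs.
    if ($text === '' || is_numeric($text) || preg_match('#^https?://\S+$#i', $text)) {
        return [];
    }
    return [$text];
}
```

In WordPress, the input would come from a call such as get_fields($post_id), and the returned strings could then be joined into the page body.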

Noindex respect

Posts marked as noindex in Yoast SEO or RankMath are automatically excluded.

Content cleaning (llms-full.txt)

Full page content is cleaned before output: Gutenberg block comments, shortcodes, <style> and <script> blocks, and all HTML tags are stripped. HTML entities are decoded and whitespace is normalised. Content is truncated with a notice if it exceeds the configured character cap.
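
The cleaning steps listed above can be approximated in plain PHP as follows. This is a sketch under stated assumptions: the function name and the truncation notice text are hypothetical, and shortcodes are removed with a simplified regular expression that drops only the bracketed markers.

```php
<?php
// Hypothetical helper illustrating the cleaning pipeline for llms-full.txt.

function wm_llms_sketch_clean(string $html, int $cap = 100000): string
{
    // Remove <style> and <script> blocks together with their contents.
    $text = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', '', $html);
    // Remove HTML comments, which covers Gutenberg delimiters like <!-- wp:paragraph -->.
    $text = preg_replace('/<!--.*?-->/s', '', $text);
    // Simplified shortcode removal: drops [tag ...] and [/tag] markers only.
    $text = preg_replace('/\[\/?[A-Za-z][\w-]*[^\]]*\]/', '', $text);
    // Strip remaining tags, then decode entities such as &amp; and &nbsp;.
    $text = strip_tags($text);
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse whitespace runs, including non-breaking spaces, to one space.
    $text = trim(preg_replace('/[\s\x{00A0}]+/u', ' ', $text));
    // Multibyte-safe truncation at the configured character cap.
    if (mb_strlen($text, 'UTF-8') > $cap) {
        $text = mb_substr($text, 0, $cap, 'UTF-8') . "\n\n[Content truncated]";
    }
    return $text;
}
```

Using mb_strlen and mb_substr keeps the cap in characters, not bytes, so multibyte text is never cut in the middle of a character.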


Output Format

# Site Name

> Site tagline

Optional site description paragraph.

## Pages

- [About](https://example.com/about/): Brief description.
- [Contact](https://example.com/contact/): Get in touch.

## Posts

- [Post Title](https://example.com/post-slug/): Post excerpt.

## Categories

- [News](https://example.com/category/news/): Latest news articles.
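
As an illustration of how output in this shape could be assembled, the sketch below renders a title, tagline, optional description, and sections of links. The function name and the array layout (title, url, summary keys) are hypothetical and chosen only for this example.

```php
<?php
// Hypothetical renderer; $sections maps a heading to a list of entries.

function wm_llms_sketch_render(
    string $name,
    string $tagline,
    string $description,
    array $sections
): string {
    $lines = ['# ' . $name, ''];
    if ($tagline !== '') {
        $lines[] = '> ' . $tagline;
        $lines[] = '';
    }
    if ($description !== '') {
        $lines[] = $description;
        $lines[] = '';
    }
    foreach ($sections as $heading => $entries) {
        if ($entries === []) {
            continue; // Skip sections with nothing to list.
        }
        $lines[] = '## ' . $heading;
        $lines[] = '';
        foreach ($entries as $entry) {
            $line = sprintf('- [%s](%s)', $entry['title'], $entry['url']);
            if (($entry['summary'] ?? '') !== '') {
                $line .= ': ' . $entry['summary'];
            }
            $lines[] = $line;
        }
        $lines[] = '';
    }
    // Trim trailing blank lines and end with a single newline.
    return rtrim(implode("\n", $lines)) . "\n";
}
```

Entries without a summary are rendered as a bare link, and empty sections are omitted entirely.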

Requirements

  • WordPress 6.4+
  • PHP 8.1+

Running Tests

Requires wp-env and Composer.

npx wp-env start
composer install
composer test

Changelog

1.0.2

  • Renamed plugin slug and all identifiers to wm-llms
  • Fixed character encoding issues (…) by serving files via PHP instead of writing physical files — PHP always sends Content-Type: text/plain; charset=UTF-8
  • Generated content now cached in WordPress transients instead of the filesystem

1.0.1

  • Fixed .htaccess charset block for Apache UTF-8 serving
  • Fixed truncation notice reporting incorrect omitted entry count
  • Fixed mb_substr / mb_strlen for multibyte-safe excerpt truncation
  • Fixed HTML entity encoding (&amp;, non-breaking spaces) in output
  • Fixed Unicode line separator (U+2028) corruption in post titles
  • Fixed attachment metadata polluting ACF content extraction

1.0.0

  • Initial release
  • llms.txt generation with post types and taxonomy archives
  • llms-full.txt optional variant with full body content
  • Yoast SEO and RankMath noindex support
  • ACF field content extraction
  • Auto-regeneration hooks + WP-Cron daily fallback
  • 57 PHPUnit tests

License

GPL-2.0-or-later — see LICENSE.
