llms optimisations #2865
Pull request overview
This PR adds comprehensive LLM optimization features to the Deno documentation site, introducing new resource files and improving frontmatter parsing for better LLM consumption of the documentation.
Changes:
- Added new LLM resource outputs: `llms-summary.txt` (compact index) and `llms.json` (structured JSON from Orama), and enhanced the existing `llms.txt` and `llms-full.txt` generation
- Improved LLM generation to parse YAML frontmatter, respect frontmatter URLs, extract summaries from content, and handle missing descriptions
- Added an AI entrypoint page at `/ai/` with links to LLM resources, and updated `robots.txt` to allow LLM-related endpoints
- Fixed malformed frontmatter title in tunnel database tutorial
- Updated standard library package versions (auto-generated from JSR)
Reviewed changes
Copilot reviewed 25 out of 25 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| generate_llms_files.ts | Core changes: added YAML parsing, URL resolution from frontmatter, summary extraction, new functions for generating llms-summary.txt and llms.json, scoring logic for summary candidates |
| test_llms_gen.ts | Updated test to include new generation functions (generateLlmsSummaryTxt, generateLlmsJson, loadOramaSummaryIndex) |
| _config.ts | Updated build script to generate new LLM resource files during site build |
| static/robots.txt | Added Allow directives for /ai/ and all LLM resource files |
| ai/index.md | New AI entrypoint page documenting available LLM resources and usage notes |
| examples/tutorials/tunnel_database.md | Fixed duplicate "title:" prefix in frontmatter |
| runtime/reference/std/*.md | Auto-generated version bumps for standard library packages (legitimate updates from JSR) |
    while ((match = H2_REGEX.exec(markdownContent)) !== null) {
      h2Sections.push(match[1]);
    }
Copilot AI, Feb 5, 2026
The H2_REGEX uses the global flag (/gm) and is reused across multiple files. Regular expressions with the global flag maintain a lastIndex property that persists between calls. Although the regex should be reset naturally when processing different content, it's safer to either: (1) reset H2_REGEX.lastIndex = 0 before the while loop, or (2) use markdownContent.matchAll(H2_REGEX) instead of exec() in a loop. This prevents potential bugs where the regex might not match correctly if its lastIndex wasn't properly reset.
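The two options in this comment can be sketched as follows. This is a minimal illustration, not the PR's code: the `H2_REGEX` pattern and both function names are assumptions, since the review only shows the `exec()` loop and states that the regex carries the `/gm` flags.

```typescript
// Assumed pattern: the PR's actual H2_REGEX definition is not shown in the review.
const H2_REGEX = /^## (.+)$/gm;

// Option 1: keep the exec() loop, but reset lastIndex first so a stale
// position left by an earlier, interrupted scan cannot skip matches.
function extractH2WithExec(markdownContent: string): string[] {
  const h2Sections: string[] = [];
  H2_REGEX.lastIndex = 0;
  let match: RegExpExecArray | null;
  while ((match = H2_REGEX.exec(markdownContent)) !== null) {
    h2Sections.push(match[1]);
  }
  return h2Sections;
}

// Option 2: matchAll() iterates over an internal clone of the regex, so it
// never advances lastIndex on the shared H2_REGEX. Note that the clone starts
// from the shared regex's current lastIndex, so this is only safe when no
// other code leaves that value non-zero.
function extractH2WithMatchAll(markdownContent: string): string[] {
  return Array.from(markdownContent.matchAll(H2_REGEX), (m) => m[1]);
}
```

A stale `lastIndex` arises when a scan stops before `exec()` returns `null`, for example after an early `break` or a thrown error. A loop that always runs to completion resets `lastIndex` to 0 on its own, which is why the comment describes the change as a safety measure rather than a confirmed bug.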
    function extractSummary(markdownContent: string): string | null {
      const withoutCode = markdownContent.replace(/```[\s\S]*?```/g, "");
      const withoutHtml = withoutCode.replace(/<[^>]+>/g, "");
Copilot AI, Feb 5, 2026
The regex pattern for removing code blocks only matches triple backtick code blocks (```). This means inline code with single backticks will remain in the summary. While this is likely acceptable, consider whether inline code should be preserved or stripped from summaries. If inline code should be preserved (which is reasonable for technical documentation), this is fine. Otherwise, you may want to strip single backticks as well.
Suggested change:

    - const withoutHtml = withoutCode.replace(/<[^>]+>/g, "");
    + const withoutInlineCode = withoutCode.replace(/`([^`]+)`/g, "$1");
    + const withoutHtml = withoutInlineCode.replace(/<[^>]+>/g, "");
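To see what the suggested ordering does to a sample page, the three replacements can be pulled into a standalone helper. This is a sketch under assumptions: `stripMarkup` is a hypothetical name, and the rest of `extractSummary` (how a summary is chosen from the cleaned text) is not shown in the review, so it is left out.

```typescript
// Hypothetical helper mirroring the three replace() calls from the suggestion.
function stripMarkup(markdownContent: string): string {
  // Drop fenced code blocks. `{3} matches three backticks, which is
  // equivalent to the literal triple-backtick pattern in the diff.
  const withoutCode = markdownContent.replace(/`{3}[\s\S]*?`{3}/g, "");
  // Unwrap inline code: the backticks are removed, the text inside is kept.
  const withoutInlineCode = withoutCode.replace(/`([^`]+)`/g, "$1");
  // Remove anything that looks like an HTML tag.
  const withoutHtml = withoutInlineCode.replace(/<[^>]+>/g, "");
  return withoutHtml;
}
```

One side effect worth knowing: the HTML pattern also removes angle-bracketed text inside inline code, so a type such as `Array<string>` is reduced to `Array`. The original two-step version behaves the same way, so the suggestion does not introduce this.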
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
donjo left a comment
lgtm, I'll let you decide if you want to address any of the copilot questions