Hreflang fallback option #282

Open
rathboma wants to merge 14 commits into untra:main from rathboma:hreflang-fallback-option

rathboma (Contributor) commented Jan 13, 2026

Fix #281

Overview

This PR changes Polyglot's default SEO output so that hreflang and canonical tags reflect the translations that actually exist. These changes are breaking for sites that relied on the previous fallback behavior.

Breaking Changes

hreflang tags now only generated for actual translations

Previously, Polyglot generated hreflang tags for all configured languages, even when a page fell back to the default language content. Now, hreflang tags are only generated for languages that have actual translations.

Before: A page with only English content would get hreflang tags for all configured languages (en, es, fr, de, etc.)

After: The same page only gets hreflang="en" and hreflang="x-default"
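To illustrate the before/after difference, here is a minimal sketch of the new rule. The `hreflang_tags` helper and its arguments are hypothetical and are not Polyglot's actual internals; it only assumes we already know which languages have a real translation.

```ruby
# Hypothetical helper, not Polyglot's actual code: builds hreflang <link>
# tags from a hash of language code => permalink that contains only the
# languages with a real translation.
def hreflang_tags(site_url, default_lang, translations)
  tags = translations.map do |lang, permalink|
    prefix = lang == default_lang ? '' : "/#{lang}"
    %(<link rel="alternate" hreflang="#{lang}" href="#{site_url}#{prefix}#{permalink}">)
  end
  # x-default mirrors the default-language URL when that version exists.
  if translations.key?(default_lang)
    default_url = "#{site_url}#{translations[default_lang]}"
    tags << %(<link rel="alternate" hreflang="x-default" href="#{default_url}">)
  end
  tags
end

# An English-only page yields just the "en" and "x-default" tags,
# no matter how many languages the site configures.
puts hreflang_tags('https://example.com', 'en', { 'en' => '/about' })
```

Languages that are configured but missing from the hash simply produce no tag, which is the behavior change described above.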

New Feature: Fallback Canonical URLs

Added fallback_canonical_to_default_lang option to control canonical URL behavior for fallback pages:

fallback_canonical_to_default_lang: true

When enabled:

  • Pages with actual translations: canonical points to translated URL (e.g., /es/sobre-nosotros/)
  • Fallback pages (no translation): canonical points to default language URL (e.g., /about/ instead of /es/about/)
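The two cases above can be sketched roughly as follows. `canonical_url` and its parameters are illustrative names, not the plugin's real API, and the sketch assumes the default language is served without a prefix.

```ruby
# Hypothetical sketch of the canonical decision, not Polyglot's real code.
# translations maps language code => permalink for real translations only.
def canonical_url(site_url, active_lang, default_lang, translations, fallback_to_default:)
  if translations.key?(active_lang)
    # A real translation exists: canonical is the translated URL.
    prefix = active_lang == default_lang ? '' : "/#{active_lang}"
    return "#{site_url}#{prefix}#{translations[active_lang]}"
  end

  default_permalink = translations.fetch(default_lang)
  if fallback_to_default
    # Fallback page: point at the default-language original.
    "#{site_url}#{default_permalink}"
  else
    # Option disabled: canonical stays on the language-prefixed URL.
    "#{site_url}/#{active_lang}#{default_permalink}"
  end
end

puts canonical_url('https://example.com', 'es', 'en',
                   { 'en' => '/about/' }, fallback_to_default: true)
```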

Recommended: Use with jekyll-seo-tag

For best results with canonical URLs, we recommend using jekyll-seo-tag's canonical=false option (proposed in jekyll/jekyll-seo-tag#521) combined with Polyglot's I18n_Headers tag:

{% seo canonical=false %}
{% I18n_Headers %}

This allows Polyglot's I18n_Headers to handle canonical URLs with proper translation detection, while jekyll-seo-tag handles all other SEO tags.

Improvements Included

Extended translation detection to include site.pages

  • Previously only searched site.collections for translations with matching page_id
  • Now also searches site.pages, so standalone pages (not in collections) properly detect translations

Permalink-based translation matching

  • When page_id is not set, falls back to matching translations by permalink
  • Common pattern for sites that use the same permalink across language-specific files
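As a rough sketch of the lookup order described in the two sections above: the `Doc` struct and `find_translations` are stand-ins invented for this example, not Jekyll's or Polyglot's classes.

```ruby
# Stand-in for a Jekyll page or collection document (assumption for this sketch).
Doc = Struct.new(:lang, :page_id, :permalink, keyword_init: true)

# Searches standalone pages and collection documents together.
# Matches on page_id when the current document has one, otherwise on permalink.
def find_translations(current, pages, collection_docs)
  candidates = pages + collection_docs
  if current.page_id
    candidates.select { |doc| doc.page_id == current.page_id }
  else
    candidates.select { |doc| doc.permalink == current.permalink }
  end
end

pages = [
  Doc.new(lang: 'en', page_id: 'about', permalink: '/about/'),
  Doc.new(lang: 'es', page_id: 'about', permalink: '/sobre-nosotros/'),
  Doc.new(lang: 'en', permalink: '/pricing/'),
  Doc.new(lang: 'fr', permalink: '/pricing/')
]
posts = [Doc.new(lang: 'en', page_id: 'launch', permalink: '/blog/launch/')]

puts find_translations(pages[0], pages, posts).map(&:lang).inspect
puts find_translations(pages[2], pages, posts).map(&:lang).inspect
```

The first lookup matches by page_id across different permalinks; the second has no page_id and relies on the shared permalink.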

Proper handling of pages without lang frontmatter

  • Pages without explicit lang in frontmatter are treated as belonging to default_lang
  • Fixes edge case where single-language pages would not be properly identified
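A small sketch of that defaulting rule, using plain hashes in place of page data. The `lang_to_permalink` name comes from the commit message; the function body is an assumption made for illustration.

```ruby
# Illustrative only: builds the language => permalink hash, treating
# documents without a :lang key as belonging to the default language.
def lang_to_permalink(docs, default_lang)
  docs.each_with_object({}) do |doc, map|
    lang = doc[:lang] || default_lang
    map[lang] = doc[:permalink]
  end
end

docs = [
  { permalink: '/about/' },                      # no lang in frontmatter
  { lang: 'es', permalink: '/sobre-nosotros/' }
]
puts lang_to_permalink(docs, 'en').inspect
```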

Language prefix normalization

  • Strips active language prefix from permalinks when matching (e.g., /es/about becomes /about)
  • Ensures consistent matching during non-default language builds
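The normalization could look something like this. `normalize_permalink` is a hypothetical name, and the sketch only strips the prefix when it is a whole path segment.

```ruby
# Hypothetical sketch: removes the active language prefix from a permalink
# so it can be compared against the stored base permalink.
def normalize_permalink(permalink, active_lang, default_lang)
  return permalink if active_lang == default_lang

  prefix = "/#{active_lang}"
  return '/' if permalink == prefix

  # Only strip when the prefix is a full path segment, so a page such as
  # /espresso/ is left alone during an "es" build.
  permalink.start_with?("#{prefix}/") ? permalink.delete_prefix(prefix) : permalink
end

puts normalize_permalink('/es/about', 'es', 'en')
```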

Fixed x-default URL relativization

  • Prevents x-default URLs from being incorrectly rewritten with language prefixes
  • x-default now correctly points to the default language version
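For reviewers, here is a simplified sketch of the idea behind this fix. It is not the plugin's actual absolute_url_regex; `relativize_links` is an illustrative name and the pattern is deliberately minimal. Negative lookbehinds keep the default-language and x-default hreflang links out of the rewrite.

```ruby
# Simplified illustration, not Polyglot's real regex: prefixes absolute
# site links with the active language, except links that directly follow
# hreflang="<default_lang>" or hreflang="x-default".
def relativize_links(html, site_url, active_lang, default_lang)
  pattern = %r{
    (?<!hreflang="#{Regexp.escape(default_lang)}"\s)  # skip default-language alternate
    (?<!hreflang="x-default"\s)                       # skip x-default alternate
    href="#{Regexp.escape(site_url)}/
    (?!#{Regexp.escape(active_lang)}/)                # skip already-prefixed URLs
  }x
  html.gsub(pattern, %(href="#{site_url}/#{active_lang}/))
end

html = <<~HTML
  <link rel="alternate" hreflang="en" href="https://example.com/about">
  <link rel="alternate" hreflang="x-default" href="https://example.com/about">
  <a href="https://example.com/contact">Contact</a>
HTML

puts relativize_links(html, 'https://example.com', 'es', 'en')
```

Only the ordinary contact link gains the /es/ prefix; both alternate links keep pointing at the default-language URL.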

Example Results

Page with English and Spanish translations:

<link rel="canonical" href="https://example.com/es/sobre-nosotros/">
<link rel="alternate" hreflang="en" href="https://example.com/about">
<link rel="alternate" hreflang="x-default" href="https://example.com/about">
<link rel="alternate" hreflang="es" href="https://example.com/es/sobre-nosotros/">
<!-- No hreflang for fr, de, etc. since no translations exist -->

Fallback page (no translations) with fallback_canonical_to_default_lang: true:

<link rel="canonical" href="https://example.com/about">
<link rel="alternate" hreflang="en" href="https://example.com/about">
<link rel="alternate" hreflang="x-default" href="https://example.com/about">
<!-- Canonical points to default language, not /es/about -->

Type of change

  • Docs update (changes to the readme or a site page, no code changes)
  • Ops wrangling (automation or test improvements)
  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Sweet release (needs a lot of work and effort)
  • Something else (explain please)

Checklists

  • If modifying code, at least one test has been added to the suite
  • Backwards compatible (defaults to existing behavior)
  • All existing tests pass

rathboma and others added 9 commits January 13, 2026 09:40
When set to false, hreflang tags are only generated for languages that have
actual translations, not for fallback pages that just use the default language
content. This improves SEO correctness by not advertising language alternatives
that don't actually exist.

Default is true (existing behavior) for backward compatibility.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add three tests for hreflang_fallback behavior
- Add documentation in README explaining the feature

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Clarify hreflang behavior for fallback pages in README.

This change improves the hreflang_fallback feature by:

1. Searching site.pages in addition to collection documents when looking
   for translations with matching page_id
2. Falling back to permalink matching when page_id is not set

This ensures that standalone pages (not in collections) are properly
recognized as translations when using hreflang_fallback: false.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

When a page doesn't have `lang` set in its frontmatter, assume it
belongs to the default language when building the lang_to_permalink
hash. This fixes hreflang generation for standalone pages that don't
explicitly set their language.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

During non-default language builds, the page context's permalink may
include the language prefix (e.g., /es/about), while the stored page
data has the base permalink (/about). Strip the active language prefix
before matching to ensure documents are found correctly.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Change fallback from `permalink` to `normalized_permalink` for
current_permalink, default_lang_permalink, and alt_permalink to ensure
consistent behavior when matching documents fails. This fixes the
x-default hreflang URL during non-default language builds.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The URL relativization regex was only excluding hreflang="default_lang"
and rel="canonical" from being rewritten with language prefixes. This
caused hreflang="x-default" URLs to be incorrectly modified.

Added hreflang="x-default" to the negative lookbehind pattern to ensure
x-default URLs always point to the default language version.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Adds a new `relativize_canonical` config option (defaults to false) that
controls whether canonical URLs from other plugins (like jekyll-seo-tag)
are relativized with language prefixes.

When `relativize_canonical: true`:
- Canonical URLs from external plugins get the language prefix added
- Useful when using jekyll-seo-tag alongside polyglot's i18n_headers

When `relativize_canonical: false` (default):
- Canonical URLs are NOT relativized (preserves backwards compatibility)
- Canonical URLs from external plugins remain unchanged

This allows sites using jekyll-seo-tag to have their canonical URLs
properly prefixed with the active language, solving duplicate canonical
tag conflicts when using both plugins together.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

untra (Owner) commented Jan 13, 2026

👋 heya @rathboma, thanks for the PR and contribution!

I have a big overall suggestion for this, which is I would rather there not be an added config option to maintain backwards compatibility with hreflang_fallback: false, but rather I would encourage forward compatibility with a more accurate SEO output. If there is an SEO improvement to be had with more accurate meta headers, then I'm all for it. Especially considering:

> However every page contains hreflang meta tags, even when we're falling back to the default language.
> I'm concerned this would mislead search agents and readers of the site, and I should only have the meta tags when I have a legit translation.

the challenge with this ruby project has been writing tests, and you've helped tremendously here adding tests to the PR. bravo sir! finding improvements like this is important. I have some other code suggestions to look into as well.

Thanks again for this contribution! feel free to add your site to the readme contribution list if you want! bono cross-promotion is a reward for contribution.

untra (Owner) commented Jan 13, 2026

polyglot is due for a patch release with a few misc changes and more tests. I might get this done later this week with these additions, and a new blogpost to announce this update.

rathboma (Contributor, Author) commented

@untra sounds good! I didn't want to change default behavior, but I can update to do that if you like.

I'm still working through a couple of bugs (notably with x-default). Just for transparency: Claude Code has been helping a lot with both tests and finding bugs.

rathboma and others added 2 commits January 14, 2026 09:19
Co-authored-by: Samuel Volin <untra.sam@gmail.com>

BREAKING CHANGE: Remove configuration options and make better behavior default

- Remove `hreflang_fallback` option - now always only generates hreflang
  tags for languages with actual translations (previously required setting
  `hreflang_fallback: false`)

- Remove `relativize_canonical` option - now always relativizes canonical
  URLs from external plugins like jekyll-seo-tag (previously required
  setting `relativize_canonical: true`)

These changes improve SEO accuracy out of the box:
- hreflang tags only advertise language versions that actually exist
- Canonical URLs correctly include language prefix on translated pages
- x-default and default language hreflang URLs are preserved as-is

Updated README documentation to reflect the new default behavior.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

rathboma (Contributor, Author) commented

This ended up being a bigger PR than I expected. I'm currently adding a new option to customize which page should be labeled as canonical. I think if /fr/about is using the default_lang of en, then the canonical should be /about, not /fr/about, because the content is the same. I'm adding an option to allow this. I can see not everyone would want it.

rathboma and others added 2 commits January 14, 2026 09:43
When enabled, canonical URLs on fallback pages (pages without actual
translations) point to the default language URL instead of the current
language URL. This improves SEO by:

- Preventing search engines from indexing duplicate fallback content
- Consolidating SEO authority to the original content
- Signaling which version is the authoritative source

The option also excludes canonical URLs from relativization when enabled,
ensuring they correctly point to the default language.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Instead of trying to exclude canonical URLs from relativization via
regex (which was complex and error-prone), recommend using jekyll-seo-tag's
new `canonical=false` option combined with Polyglot's I18n_Headers tag.

This provides cleaner separation of concerns:
- jekyll-seo-tag handles all SEO tags except canonical
- Polyglot's I18n_Headers handles canonical and hreflang tags with
  proper translation detection

Changes:
- Remove canonical exclusion from absolute_url_regex
- Update README to document jekyll-seo-tag integration
- Remove test for canonical regex exclusion

Related: jekyll/jekyll-seo-tag#521

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

rathboma (Contributor, Author) commented

ok, I think this is good for final review now. Please test it out. I have tested the Beekeeper Studio site build and it works as expected for me!

rathboma (Contributor, Author) commented

@untra this is done! It's working great.

```
{% I18n_Headers %}
```

The `canonical=false` option is available in jekyll-seo-tag v2.9.0+ (see [PR #521](https://github.com/jekyll/jekyll-seo-tag/pull/521)).
untra (Owner) commented

I see what you're doing here, and I respect it 😁

I will merge this in once jekyll-seo-tag is updated

untra (Owner) commented Jan 23, 2026

@rathboma if you can follow up on jekyll/jekyll-seo-tag#521 and see that merged in, I will approve this PR and the other approved PRs you've made for polyglot will get merged in.

I might punch up the docs and remove the link to that PR from this README.md after the fact. But this feature work depends on the jekyll-seo-tag PR being merged first.

I really appreciate your effort to refine adjustments to both of these projects, so that {% seo canonical=false %} can be used and polyglot brings the page canonical. This is a great feature, but there's a build order to this.

untra (Owner) commented

Heya @rathboma give this PR and jekyll/jekyll-seo-tag#521 another pass, and we can get them merged in for the next polyglot release after this one

rathboma (Contributor, Author) commented

@untra sorry for the delay! I updated my jekyll-seo-tag PR to fix their feedback, hopefully it should be merged soon.

Resolve conflicts in README.md and site_spec.rb, keeping both
the hreflang fallback docs/tests and the upstream rendered_lang
tests and netlify redirects docs.

Successfully merging this pull request may close these issues.

How to only generate hreflang when a translation is actually available