Add i18n support for web UI (German + English)#5
Conversation
Introduce a JSON-based translation system with cookie-persisted language switching. All UI strings are now translatable via server-side Jinja2 _() and client-side window.__t() helpers. New files: - i18n module with LRU-cached translation loading and TypeGuard narrowing - Translation files (de.json, en.json) with 53 keys each - Client-side i18n.js helper with global placeholder replacement - Unit tests (19) and integration tests (17) for i18n Key changes: - Language detection: query param > cookie > default (German) - Secure cookie (httponly, secure, samesite=lax) set on explicit switch - XSS prevention: html.escape on all interpolated values, </script> breakout escaping in embedded JSON - TypeGuard is_supported_lang() eliminates type: ignore comments - Existing tests pinned to explicit language via cookies
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughAdds full internationalization: a new i18n Python module with German/English translation files, server-side language detection and Jinja integration, client-side translation bootstrap and helpers, template/JS text replaced with translation keys, and tests validating translations and language switching. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Server
participant Templates
participant FS as Translations FS
Client->>Server: HTTP GET / (may include ?lang=)
Server->>Server: _get_lang(request) (query -> cookie -> default)
Server->>FS: _load_translations_cached(lang)
FS-->>Server: translations JSON
Server->>Templates: render template with _ (jinja_translate) and translations_json
Templates-->>Server: rendered HTML (window.__translations embedded)
Server-->>Client: HTML response (sets lang cookie if requested)
Client->>Client: window.__t(key) uses window.__translations for runtime UI strings
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (4)
src/document_anonymizer/i18n/__init__.py (1)
16-16: Consider using the project'sget_loggerhelper for consistency.The codebase has a
get_loggerutility insrc/document_anonymizer/audit/logging.pythat wrapsstructlog.get_logger. For consistency across modules, consider using it here instead of callingstructlog.get_loggerdirectly.♻️ Suggested change
-import structlog +from document_anonymizer.audit.logging import get_logger from jinja2 import pass_context -logger = structlog.get_logger(__name__) +logger = get_logger(__name__)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/document_anonymizer/i18n/__init__.py` at line 16, Replace the direct call to structlog.get_logger in the module-level assignment (logger = structlog.get_logger(__name__)) with the project's wrapper get_logger to keep logging consistent; import get_logger from document_anonymizer.audit.logging at the top of the file and change the assignment to use get_logger(__name__) (retain the existing logger variable name) so the module uses the centralized logging helper instead of structlog.get_logger directly.tests/test_web/test_i18n_integration.py (1)
8-8: Minor: Module-level TestClient instead of fixture.The module creates a
TestClientdirectly rather than using theclientfixture fromconftest.py. This works but is inconsistent with the existing test pattern. Since these tests are read-only and don't modify state, it's functionally fine.🔧 Suggested alignment with conftest pattern
-client = TestClient(app) +import pytest + +@pytest.fixture +def client() -> TestClient: + return TestClient(app)Then inject
clientas a parameter in each test method.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_web/test_i18n_integration.py` at line 8, Replace the module-level TestClient instantiation ("client = TestClient(app)") with the existing pytest fixture by removing that line and updating each test function in this file to accept the "client" fixture as a parameter; ensure tests use the injected "client" (from conftest.py) instead of the removed TestClient and keep test bodies unchanged other than the parameter.src/document_anonymizer/web/templates/base.html (1)
8-8: Consider translating meta description for SEO.The
<meta name="description">remains hardcoded in German while other UI text is now internationalized. This affects SEO for non-German users.🌐 Suggested fix
- <meta name="description" content="Datenschutzkonforme Dokumentenanonymisierung mit deutscher PII-Erkennung. Zero-Persistence-Architektur."> + <meta name="description" content="{{ _('meta.description') }}">Then add
meta.descriptionkeys to both translation files.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/document_anonymizer/web/templates/base.html` at line 8, The meta description in the base.html template is hardcoded in German; replace the static content attribute with the i18n lookup used elsewhere (e.g., use the same translation helper/tag used for UI strings) and add a new translation key meta.description to both language resource files (e.g., en and de) so SEO uses localized descriptions; ensure the template uses the translation key (meta.description) consistently and that both translation files contain appropriate localized texts.src/document_anonymizer/web/static/js/app.js (1)
24-39: Consider: German-only example text may confuse English users.The
EXAMPLE_TEXTis hardcoded German sample data. While this is demo content rather than UI text, English users may find it less intuitive. This is a minor UX consideration that could be addressed in a follow-up if needed.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/document_anonymizer/web/static/js/app.js` around lines 24 - 39, EXAMPLE_TEXT in src/document_anonymizer/web/static/js/app.js currently contains only German sample data; replace it with an English example or provide locale-aware examples so English users aren’t confused. Update the code to define two constants (e.g., EXAMPLE_TEXT_EN and EXAMPLE_TEXT_DE) and set EXAMPLE_TEXT dynamically based on navigator.language (or a simple language toggle) so the UI uses the appropriate sample; ensure the variable name EXAMPLE_TEXT remains the one consumed elsewhere so no further changes are needed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/document_anonymizer/web/routes.py`:
- Around line 111-118: The cookie is being set with secure=True unconditionally
in the response.set_cookie call (where query_lang is written) which prevents the
cookie in local HTTP dev; change the secure flag to be conditional on the
environment (use APP_ENV) so secure=False for development and secure=True
otherwise, and ensure os.getenv("APP_ENV") (or the existing config variable) is
imported/used when constructing the set_cookie arguments in the same function
that handles the query_lang and response.set_cookie call.
---
Nitpick comments:
In `@src/document_anonymizer/i18n/__init__.py`:
- Line 16: Replace the direct call to structlog.get_logger in the module-level
assignment (logger = structlog.get_logger(__name__)) with the project's wrapper
get_logger to keep logging consistent; import get_logger from
document_anonymizer.audit.logging at the top of the file and change the
assignment to use get_logger(__name__) (retain the existing logger variable
name) so the module uses the centralized logging helper instead of
structlog.get_logger directly.
In `@src/document_anonymizer/web/static/js/app.js`:
- Around line 24-39: EXAMPLE_TEXT in
src/document_anonymizer/web/static/js/app.js currently contains only German
sample data; replace it with an English example or provide locale-aware examples
so English users aren’t confused. Update the code to define two constants (e.g.,
EXAMPLE_TEXT_EN and EXAMPLE_TEXT_DE) and set EXAMPLE_TEXT dynamically based on
navigator.language (or a simple language toggle) so the UI uses the appropriate
sample; ensure the variable name EXAMPLE_TEXT remains the one consumed elsewhere
so no further changes are needed.
In `@src/document_anonymizer/web/templates/base.html`:
- Line 8: The meta description in the base.html template is hardcoded in German;
replace the static content attribute with the i18n lookup used elsewhere (e.g.,
use the same translation helper/tag used for UI strings) and add a new
translation key meta.description to both language resource files (e.g., en and
de) so SEO uses localized descriptions; ensure the template uses the translation
key (meta.description) consistently and that both translation files contain
appropriate localized texts.
In `@tests/test_web/test_i18n_integration.py`:
- Line 8: Replace the module-level TestClient instantiation ("client =
TestClient(app)") with the existing pytest fixture by removing that line and
updating each test function in this file to accept the "client" fixture as a
parameter; ensure tests use the injected "client" (from conftest.py) instead of
the removed TestClient and keep test bodies unchanged other than the parameter.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f1a4c9be-01d3-4239-bf6e-998664bc9700
📒 Files selected for processing (18)
src/document_anonymizer/i18n/__init__.pysrc/document_anonymizer/i18n/translations/de.jsonsrc/document_anonymizer/i18n/translations/en.jsonsrc/document_anonymizer/web/routes.pysrc/document_anonymizer/web/static/js/app.jssrc/document_anonymizer/web/static/js/i18n.jssrc/document_anonymizer/web/static/js/review.jssrc/document_anonymizer/web/templates/anonymized.htmlsrc/document_anonymizer/web/templates/base.htmlsrc/document_anonymizer/web/templates/index.htmlsrc/document_anonymizer/web/templates/results.htmltests/test_i18n/__init__.pytests/test_i18n/test_translations.pytests/test_web/test_highlighted_text.pytests/test_web/test_i18n_integration.pytests/test_web/test_review_helpers.pytests/test_web/test_review_routes.pytests/test_web/test_routes.py
| response.set_cookie( | ||
| "lang", | ||
| query_lang, | ||
| httponly=True, | ||
| secure=True, | ||
| samesite="lax", | ||
| max_age=365 * 24 * 3600, | ||
| ) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if there's any environment-based configuration for secure cookies
rg -n "secure.*cookie|SECURE.*COOKIE|https.*local" --type=pyRepository: JuliusScheuerer/document-anonymizer
Length of output: 61
🏁 Script executed:
# Examine the routes.py file to understand the full context
head -n 125 src/document_anonymizer/web/routes.py | tail -n 30Repository: JuliusScheuerer/document-anonymizer
Length of output: 1036
🏁 Script executed:
# Search for environment variable usage and configuration patterns
rg -n "environ|config|getenv|FLASK_ENV|DEBUG" --type=py src/document_anonymizer/web/ | head -20Repository: JuliusScheuerer/document-anonymizer
Length of output: 61
🏁 Script executed:
# Look for any development or setup documentation
find . -type f \( -name "*.md" -o -name "*.txt" -o -name "*.rst" \) | xargs rg -l "development\|local\|https\|secure" 2>/dev/null | head -10Repository: JuliusScheuerer/document-anonymizer
Length of output: 61
🏁 Script executed:
# Search for Flask app initialization and configuration
rg -n "Flask\|app\s*=\|create_app" --type=py src/document_anonymizer/web/ | head -20Repository: JuliusScheuerer/document-anonymizer
Length of output: 61
🏁 Script executed:
# Check main entry point and app setup
find src -name "main.py" -o -name "__main__.py" -o -name "app.py" | xargs catRepository: JuliusScheuerer/document-anonymizer
Length of output: 3518
🏁 Script executed:
# Look for README or setup documentation
find . -maxdepth 2 -name "README*" -o -name "SETUP*" -o -name "DEVELOP*" | xargs cat 2>/dev/null | head -100Repository: JuliusScheuerer/document-anonymizer
Length of output: 4474
🏁 Script executed:
# Check if there's any Docker or deployment configuration
find . -maxdepth 2 \( -name "Dockerfile" -o -name "docker-compose*" -o -name ".env*" \) | head -5Repository: JuliusScheuerer/document-anonymizer
Length of output: 127
🏁 Script executed:
# Check .env.example and docker-compose.yml for HTTPS/secure configuration
cat .env.example docker-compose.ymlRepository: JuliusScheuerer/document-anonymizer
Length of output: 680
🏁 Script executed:
# Search for any environment variable controlling secure cookies or development mode
rg -n "ENVIRONMENT|DEVELOPMENT|PRODUCTION|DEBUG" --type=py src/ | head -20Repository: JuliusScheuerer/document-anonymizer
Length of output: 150
🏁 Script executed:
# Check if secure flag is used anywhere else in the codebase
rg -n "secure\s*=" --type=py src/Repository: JuliusScheuerer/document-anonymizer
Length of output: 145
Make the secure cookie flag conditional on environment — it's set unconditionally but local development uses HTTP.
The secure=True flag prevents the cookie from being sent over HTTP. Since the README shows local development runs on http://localhost:8000 (via uvicorn --reload), the language preference cookie will silently fail to be set in development. Use APP_ENV (already defined in .env.example) to conditionally set secure=False for development:
response.set_cookie(
"lang",
query_lang,
httponly=True,
secure=os.getenv("APP_ENV") != "development",
samesite="lax",
max_age=365 * 24 * 3600,
)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/document_anonymizer/web/routes.py` around lines 111 - 118, The cookie is
being set with secure=True unconditionally in the response.set_cookie call
(where query_lang is written) which prevents the cookie in local HTTP dev;
change the secure flag to be conditional on the environment (use APP_ENV) so
secure=False for development and secure=True otherwise, and ensure
os.getenv("APP_ENV") (or the existing config variable) is imported/used when
constructing the set_cookie arguments in the same function that handles the
query_lang and response.set_cookie call.
Greptile SummaryThis PR adds a solid JSON-based i18n module with LRU-cached translation loading, server-side Jinja2 integration ( However, the PR contains a critical regression introduced while implementing the optimization suggested in a previous review thread:
Key changes:
Confidence Score: 0/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant Browser
participant FastAPI as FastAPI (routes.py)
participant i18n as i18n module
participant Jinja2
participant JS as Client JS (i18n.js / app.js / review.js)
Browser->>FastAPI: GET / (optional ?lang=en or lang cookie)
FastAPI->>i18n: _get_lang(request) → Lang
FastAPI->>i18n: get_translations(lang) → dict
FastAPI->>Jinja2: TemplateResponse("index.html", {lang, translations_json})
Note over Jinja2: index.html extends base.html<br/>_("key") calls jinja_translate(context, key)<br/>window.__translations = translations_json
Jinja2-->>Browser: Full HTML page with embedded translations JSON
Browser->>JS: i18n.js loaded → sets window.__t()
Browser->>JS: app.js loaded (defer) → uses window.__t()
Browser->>FastAPI: POST /detect (HX-Request, lang cookie)
FastAPI->>i18n: _get_lang(request) → Lang
FastAPI->>Jinja2: TemplateResponse("results.html", {lang, ...})
Note over Jinja2: Fragment — no base.html<br/>_("key") uses lang from context
Jinja2-->>Browser: HTML fragment (HTMX swap into #results)
Browser->>JS: review.js loaded → uses window.__t() (already set)
Browser->>FastAPI: GET /?lang=en (language switch)
FastAPI->>i18n: is_supported_lang("en") → True
FastAPI->>FastAPI: set_cookie("lang", "en", httponly, secure, samesite=lax)
FastAPI-->>Browser: Full page in English + Set-Cookie: lang=en
Last reviewed commit: 821adb4 |
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
| # Only embed translations for full-page responses that include base.html | ||
| if template_name in ("index.html", "base.html"): # or check explicitly | ||
| all_translations = get_translations(lang) | ||
| ctx["translations_json"] = json.dumps(all_translations, ensure_ascii=False).replace( | ||
| "</", r"<\/" | ||
| ) | ||
| else: | ||
| ctx["translations_json"] = "{}" # or omit entirely | ||
|
|
||
| response = templates.TemplateResponse( | ||
| request, template_name, ctx, status_code=status_code | ||
| ) | ||
| # Only set cookie when user explicitly requests a language via query param | ||
| query_lang = request.query_params.get("lang", "") | ||
| if is_supported_lang(query_lang): | ||
| response.set_cookie( | ||
| "lang", | ||
| query_lang, | ||
| httponly=True, | ||
| secure=True, | ||
| samesite="lax", | ||
| max_age=365 * 24 * 3600, | ||
| ) | ||
| return response |
There was a problem hiding this comment.
SyntaxError: if/else block is at module level, not inside _template_response
Lines 100–107 have zero indentation, so Python parses them as module-level statements rather than as the body of _template_response. As a result:
- The function body silently ends at line 98 (after
ctx.update(context)), returningNoneunconditionally instead of anHTMLResponse. - The module-level
else:block at line 106 containsreturn responseat line 123 (still 4-space indented, so inside theelse), which is aSyntaxError: 'return' outside function— Python raises this at compile time, preventing the module from being imported at all and crashing the application on startup.
The entire block from line 99 to 123 must be indented four spaces so it belongs inside _template_response:
| # Only embed translations for full-page responses that include base.html | |
| if template_name in ("index.html", "base.html"): # or check explicitly | |
| all_translations = get_translations(lang) | |
| ctx["translations_json"] = json.dumps(all_translations, ensure_ascii=False).replace( | |
| "</", r"<\/" | |
| ) | |
| else: | |
| ctx["translations_json"] = "{}" # or omit entirely | |
| response = templates.TemplateResponse( | |
| request, template_name, ctx, status_code=status_code | |
| ) | |
| # Only set cookie when user explicitly requests a language via query param | |
| query_lang = request.query_params.get("lang", "") | |
| if is_supported_lang(query_lang): | |
| response.set_cookie( | |
| "lang", | |
| query_lang, | |
| httponly=True, | |
| secure=True, | |
| samesite="lax", | |
| max_age=365 * 24 * 3600, | |
| ) | |
| return response | |
| # Provide all translations as JSON for client-side window.__t() | |
| # Only embed translations for full-page responses that include base.html | |
| if template_name == "index.html": | |
| all_translations = get_translations(lang) | |
| ctx["translations_json"] = json.dumps(all_translations, ensure_ascii=False).replace( | |
| "</", r"<\/" | |
| ) | |
| else: | |
| ctx["translations_json"] = "{}" | |
| response = templates.TemplateResponse( | |
| request, template_name, ctx, status_code=status_code | |
| ) | |
| # Only set cookie when user explicitly requests a language via query param | |
| query_lang = request.query_params.get("lang", "") | |
| if is_supported_lang(query_lang): | |
| response.set_cookie( | |
| "lang", | |
| query_lang, | |
| httponly=True, | |
| secure=True, | |
| samesite="lax", | |
| max_age=365 * 24 * 3600, | |
| ) | |
| return response |
Note: the condition is also corrected from template_name in ("index.html", "base.html") to template_name == "index.html" — see the separate comment below.
| ctx.update(context) | ||
| # Provide all translations as JSON for client-side window.__t() | ||
| # Only embed translations for full-page responses that include base.html | ||
| if template_name in ("index.html", "base.html"): # or check explicitly |
There was a problem hiding this comment.
Incorrect template name guard — "base.html" is never a direct template argument
base.html is only ever used as a parent template (extended via {% extends "base.html" %}); no route calls _template_response(request, "base.html", ...) directly. The only full-page template among the current route handlers is index.html.
Including "base.html" in the set gives a false sense of coverage, and any future full-page template added (e.g. an error page) would silently receive an empty translations_json, breaking client-side window.__t() on that page.
Prefer an explicit allowlist or, for safety, flip the logic and only skip serialization for known fragment templates:
| if template_name in ("index.html", "base.html"): # or check explicitly | |
| if template_name == "index.html": |
Or, if you expect additional full-page templates to be added:
_FULL_PAGE_TEMPLATES = frozenset({"index.html"})
...
if template_name in _FULL_PAGE_TEMPLATES:…#5 followup) The i18n PR (#5) introduced two bugs: (1) broken indentation in _template_response() causing SyntaxError at import, and (2) an inline <script> violating script-src 'self' CSP policy. Fixes: - Re-indent lines 100-107 of routes.py back inside _template_response() - Add per-request CSP nonce via secrets.token_urlsafe() in middleware - Add nonce attribute to inline translations <script> in base.html - Add rate limiter reset fixture to prevent 429 cascade across tests - Add tests verifying nonce presence, uniqueness, and header/HTML match
…#5 followup) (#6) The i18n PR (#5) introduced two bugs: (1) broken indentation in _template_response() causing SyntaxError at import, and (2) an inline <script> violating script-src 'self' CSP policy. Fixes: - Re-indent lines 100-107 of routes.py back inside _template_response() - Add per-request CSP nonce via secrets.token_urlsafe() in middleware - Add nonce attribute to inline translations <script> in base.html - Add rate limiter reset fixture to prevent 429 cascade across tests - Add tests verifying nonce presence, uniqueness, and header/HTML match
Summary
src/document_anonymizer/i18n/) with LRU-cached translation loading,TypeGuardnarrowing, and Jinja2 integration?lang=query param or persisted cookie (httponly, secure, samesite=lax)_("key")(Jinja2) and client-sidewindow.__t("key")callsSecurity
html.escape()on all translated strings interpolated into HTML attributes</→<\/escaping in embedded translations JSON to prevent</script>breakout XSSis_supported_lang()TypeGuard — notype: ignoreneededhttponly=True, secure=True, samesite="lax"Test plan
cookies={"lang": "de"}where asserting German stringsmatch=patterns removed frompytest.raisesfor language-dependent error messagesSummary by CodeRabbit
New Features
Bug Fixes
Tests