
Blasp v4: Driver-based architecture rewrite #48

Merged
deemonic merged 26 commits into main from feature/v4-refactor
Mar 26, 2026


@deemonic (Collaborator) commented Mar 26, 2026

Summary

  • Driver-based architecture: regex, pattern, phonetic, and pipeline drivers replace the monolithic detection engine
  • Severity scoring: profanities categorised as mild/moderate/high/extreme with a 0-100 score calculation
  • Multi-language support: English, Spanish, German, French with language-specific normalizers
  • Laravel integrations: Blaspable Eloquent trait, CheckProfanity middleware, Blade directive, Str/Stringable macros, validation rule
  • Masking strategies: character mask, grawlix, or custom callback
  • Testing utilities: Blasp::fake() with assertions
  • Events: ProfanityDetected, ContentBlocked, ModelProfanityDetected
  • Namespace flattened: Blaspsoft\Blasp\Laravel\ merged into Blaspsoft\Blasp\
  • Bug fixes: UTF-8 safety, multibyte word counting, regex JIT stack overflow prevention
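To illustrate what the three masking strategies differ in, here is a standalone sketch. The function names and the grawlix symbol set are hypothetical, not Blasp's API; the point is that a mask should count characters with `mb_*` functions so a multibyte word gets one symbol per character.

```php
<?php

// Hypothetical helpers illustrating the three masking strategies (not Blasp's API).

// Character mask: one mask symbol per character of the word.
function maskWithCharacter(string $word, string $mask = '*'): string
{
    // mb_strlen() counts characters, so "cabrón" gets 6 symbols, not 7.
    return str_repeat($mask, mb_strlen($word, 'UTF-8'));
}

// Grawlix: cycle through a set of ASCII symbols, one per character.
function maskWithGrawlix(string $word, string $symbols = '!@#$%&'): string
{
    $out = '';
    $length = mb_strlen($word, 'UTF-8');
    for ($i = 0; $i < $length; $i++) {
        $out .= $symbols[$i % strlen($symbols)];
    }
    return $out;
}

// Custom callback: the caller decides how the word is rendered.
function maskWithCallback(string $word, callable $callback): string
{
    return $callback($word);
}

echo maskWithCharacter('cabrón'), PHP_EOL; // ******
echo maskWithGrawlix('damn'), PHP_EOL;     // !@#$
echo maskWithCallback('damn', fn (string $w) => mb_substr($w, 0, 1) . str_repeat('*', mb_strlen($w) - 1)), PHP_EOL; // d***
```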

Test plan

  • Run full test suite (composer test)
  • Verify installation in a fresh Laravel 13 project
  • Test each driver individually (regex, pattern, phonetic, pipeline)
  • Test multi-language detection
  • Test Blaspable trait on an Eloquent model
  • Test middleware with severity filtering
  • Verify Blasp::fake() works in test doubles

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Prevented self-referential pipeline driver configs and added validation.
    • Improved UTF‑8 handling for matching and masking to better support multilingual text.
    • Preserved nested disablement state when temporarily turning off checks.
    • Applied severity filters earlier and avoided overlapping match re-detection.
    • Early-exit validation for non-string/empty inputs.
    • Adjusted request field inclusion/exclusion precedence.
    • Bounded and deduplicated tracked cache keys to limit growth.
  • Configuration

    • Updated French and German severity lists.
  • Tests

    • Updated string casting assertion in a unit test.
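The nested-disablement fix follows a save-and-restore pattern: remember the previous flag, disable checking, and restore the saved value in `finally` so an inner call cannot re-enable checking for an outer one. A minimal sketch, assuming a static flag; `CheckingToggle` is a hypothetical name, not the actual `Blaspable` implementation:

```php
<?php

// Hypothetical stand-in for a trait-level "checking disabled" flag.
final class CheckingToggle
{
    private static bool $disabled = false;

    public static function isDisabled(): bool
    {
        return self::$disabled;
    }

    // Runs $callback with checking disabled, then restores whatever state
    // was active before, so nested calls do not clobber the outer scope.
    public static function withoutChecking(callable $callback): mixed
    {
        $previous = self::$disabled;
        self::$disabled = true;

        try {
            return $callback();
        } finally {
            self::$disabled = $previous;
        }
    }
}
```

A naive version that simply resets the flag to `false` at the end would turn checking back on inside the outer callback as soon as the inner call returned.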

deemonic and others added 22 commits February 12, 2026 19:37
…ction

Adds a `Blaspable` trait that hooks into the Eloquent `saving` event to
automatically check and sanitize (or reject) profanity on specified model
attributes. Supports per-model language, mask, and mode overrides.

- Blaspable trait with sanitize/reject modes and helper methods
- ProfanityRejectedException for reject mode
- ModelProfanityDetected event fired on detection
- `model.mode` config key in blasp.php
- 21 tests covering all trait functionality

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove all v3-era source files that have been replaced by the new v4
architecture: Abstracts, Config, Contracts, Facades, Generators,
Normalizers, Registries, and the monolithic BlaspService/ProfanityDetector.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New modular core with Analyzer, Dictionary, Result, and driver-based
detection (RegexDriver, PatternDriver). Includes normalizers per language,
configurable masking strategies, severity levels, and false positive filtering.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
BlaspManager with fluent PendingCheck API, Facade, ServiceProvider,
middleware, validation rule, artisan commands (clear, test, languages),
events (ProfanityDetected, ContentBlocked), and BlaspFake for testing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update composer.json laravel extra to point to new BlaspServiceProvider
and Facade namespaces. Add severity tiers to English language config.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Migrate all tests to use the new v4 Facade, PendingCheck fluent API,
and Result methods. Simplify TestCase base class to use BlaspServiceProvider.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Full rewrite covering the new driver architecture, fluent API, Result
object, Blaspable trait, middleware, validation rules, testing utilities,
events, artisan commands, configuration reference, and v3 migration guide.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Register 'blasp' as a short middleware alias, add @clean Blade directive
for XSS-safe profanity masking in views, and register isProfane/cleanProfanity
macros on Str and Stringable for fluent usage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move all classes from Blaspsoft\Blasp\Laravel\* to Blaspsoft\Blasp\* and
update imports across src and tests to match.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Catches sound-alike profanity evasions (e.g. "phuck", "fuk", "sheit")
that bypass the regex and pattern drivers. Uses PHP's metaphone() for
indexing and levenshtein() for confirmation, with a curated false-positive
list to protect common words like "fork", "duck", and "beach".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
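The metaphone-then-levenshtein approach can be sketched with PHP's built-in `metaphone()` and `levenshtein()`. The exact-key lookup, the edit-distance threshold of 2 and `phoneticMatch()` are assumptions for illustration; the real PhoneticDriver's thresholds and indexing may differ.

```php
<?php

/**
 * Hypothetical phonetic matcher: returns the profanity that $word sounds
 * like, or null. Candidates are found by metaphone key and confirmed by
 * edit distance; words on the false-positive list are never matched.
 */
function phoneticMatch(string $word, array $profanities, array $falsePositives = [], int $maxDistance = 2): ?string
{
    $word = mb_strtolower($word, 'UTF-8');

    if (in_array($word, $falsePositives, true)) {
        return null; // protected words such as "fork", "duck" or "beach"
    }

    // Index profanities by metaphone key (a real driver would build this once).
    $index = [];
    foreach ($profanities as $profanity) {
        $index[metaphone($profanity)][] = $profanity;
    }

    // Confirm each sound-alike candidate with a byte-level edit distance,
    // which is adequate for ASCII input like these examples.
    foreach ($index[metaphone($word)] ?? [] as $candidate) {
        if (levenshtein($word, $candidate) <= $maxDistance) {
            return $candidate;
        }
    }

    return null;
}
```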
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Allows combining regex, pattern, and phonetic drivers so a single
check() call catches obfuscated text, exact matches, and sound-alikes
in one pass. Supports config-based (`driver('pipeline')`) and ad-hoc
(`pipeline('regex', 'phonetic')`) usage with union merge semantics.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
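Union merge semantics means a span is reported if any driver in the pipeline reports it. A minimal sketch, assuming each driver is a callable returning matches shaped like `['start' => int, 'length' => int, 'word' => string]`; that shape and `pipelineMatches()` are hypothetical, not Blasp's `PipelineDriver` interface:

```php
<?php

/**
 * Hypothetical union merge: run every driver and keep each distinct
 * (start, length) span once, ordered by position, longest first.
 *
 * @param callable[] $drivers each returns list<array{start:int,length:int,word:string}>
 */
function pipelineMatches(string $text, array $drivers): array
{
    $merged = [];
    foreach ($drivers as $driver) {
        foreach ($driver($text) as $match) {
            $key = $match['start'] . ':' . $match['length'];
            $merged[$key] ??= $match; // first driver to report a span wins
        }
    }

    $merged = array_values($merged);
    usort($merged, fn (array $a, array $b) => [$a['start'], $b['length']] <=> [$b['start'], $a['length']]);

    return $merged;
}
```

Overlapping (but not identical) spans are deliberately left in the result here; removing them is a separate step before masking.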
Add severity maps (mild/moderate/extreme) for Spanish, French, and German
so withSeverity() filtering works correctly for all languages instead of
defaulting everything to High.

Implement result caching in PendingCheck — check() results are cached by
a hash of all parameters (text, driver, language, severity, allow/block
lists, mask strategy). CallbackMask bypasses the cache since closures can't
be serialized. Add Result::fromArray() for deserialization, extend
Dictionary::clearCache() to also clear result cache, and add
cache.results config toggle.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
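The caching rule (hash every input that can change the result, and skip caching when a callback mask is involved) can be sketched as follows. `resultCacheKey()`, the `blasp_result:` prefix and the flat parameter array are assumptions, not the actual PendingCheck code:

```php
<?php

/**
 * Hypothetical cache-key builder. Returns null when the result must not be
 * cached because a closure (e.g. a callback mask) cannot be serialized.
 */
function resultCacheKey(array $params): ?string
{
    foreach ($params as $value) {
        if ($value instanceof Closure) {
            return null;
        }
    }

    // Sort keys so the same parameters always produce the same hash.
    ksort($params);

    return 'blasp_result:' . hash('sha256', json_encode($params, JSON_THROW_ON_ERROR));
}
```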
…lists

Non-English severity maps (Spanish, French, German) only had 3 tiers
(mild, moderate, extreme) while English had 4. Added 'high' tier with
representative strong profanity words to each.

Also added 39 words that appeared in severity maps but were missing
from profanities arrays (21 English, 5 French, 13 German), which
meant they could never be detected.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Dictionary: sanitize language parameter to prevent path traversal
  via loadLanguageConfig(), forLanguage(), and forLanguages()
- TestCommand: rename --verbose to --detail to avoid conflict with
  Symfony Console's built-in -v|--verbose flag
- PatternDriver, PhoneticDriver, RegexDriver: convert PREG_OFFSET_CAPTURE
  byte offsets to character offsets for correct multibyte string handling
- PatternDriver, PhoneticDriver, RegexDriver: apply severity filter before
  masking so low-severity words aren't masked in cleanText when filtered out
- Blasp facade: throw RuntimeException in assertChecked() and
  assertCheckedTimes() when fake() hasn't been called, instead of silently
  passing
- Profanity rule: convert static factory methods to instance methods with
  __callStatic for backward compat, enabling chaining like
  Profanity::in('spanish')->severity(Severity::High)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
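`PREG_OFFSET_CAPTURE` reports byte offsets, which diverge from character offsets as soon as a multibyte character precedes the match. The conversion can be done by measuring the prefix with `mb_strlen()`. `findWithCharOffsets()` is a hypothetical helper; it keeps both offsets because, as a later commit notes, some helpers still operate on bytes.

```php
<?php

/**
 * Hypothetical helper: find all matches of $pattern and report both the
 * byte offset (from PREG_OFFSET_CAPTURE) and the character offset.
 */
function findWithCharOffsets(string $pattern, string $text): array
{
    $results = [];
    if (preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE)) {
        foreach ($matches[0] as [$matched, $byteOffset]) {
            $results[] = [
                'text' => $matched,
                'byte' => $byteOffset,
                // Count the characters before the match to get its character offset.
                'char' => mb_strlen(substr($text, 0, $byteOffset), 'UTF-8'),
            ];
        }
    }
    return $results;
}
```

For example, in `'Señor mierda'` the match starts at byte 7 but character 6, because `ñ` occupies two bytes.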
- Remove 'knob' from false_positives list (conflicts with profanities)
- PatternDriver: deduplicate overlapping matches before masking to
  prevent double-masking (e.g., "motherfucker" matching both
  "motherfucker" and "fuck")
- PhoneticDriver, RegexDriver: pass byte offsets to FalsePositiveFilter
  methods (isInsideHexToken, isSpanningWordBoundary, getFullWordContext)
  which use byte-level operations, while keeping character offsets for
  MatchedWord positions and mb_substr masking

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
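The ordering these fixes converge on (filter by severity, drop overlapping spans, then mask) can be sketched as below. The match shape, the integer severity levels and `filterDedupeMask()` are hypothetical. Masking runs right to left so that a replacement of a different length would not shift the offsets of matches still waiting to be applied.

```php
<?php

/**
 * Hypothetical post-processing for matches with character offsets:
 * ['start' => int, 'length' => int, 'severity' => int].
 */
function filterDedupeMask(string $text, array $matches, int $minSeverity, string $mask = '*'): string
{
    // 1. Drop matches below the threshold before anything else.
    $matches = array_filter($matches, fn (array $m) => $m['severity'] >= $minSeverity);

    // 2. Sort by start, longest first, and skip any span that overlaps a kept
    //    one, so "motherfucker" is masked once rather than again for "fuck".
    usort($matches, fn (array $a, array $b) => [$a['start'], $b['length']] <=> [$b['start'], $a['length']]);
    $kept = [];
    $coveredUntil = -1;
    foreach ($matches as $m) {
        if ($m['start'] < $coveredUntil) {
            continue;
        }
        $kept[] = $m;
        $coveredUntil = $m['start'] + $m['length'];
    }

    // 3. Apply masks from the last match to the first with multibyte-safe slicing.
    foreach (array_reverse($kept) as $m) {
        $text = mb_substr($text, 0, $m['start'], 'UTF-8')
            . str_repeat($mask, $m['length'])
            . mb_substr($text, $m['start'] + $m['length'], null, 'UTF-8');
    }

    return $text;
}
```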
…n PatternDriver

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…icDriver matches

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replaces unbounded lazy quantifier (*?) with {0,3} in the separator
expression between profanity characters. This prevents PHP-FPM worker
segfaults caused by PCRE JIT stack overflow when processing 1,300+
complex patterns with nested lazy quantifiers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tack overflow

Each branch in the separator group now matches exactly one character,
with the outer {0,3}? handling repetition. Removes redundant (?:\s)
alternative since \s is already in the character class.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
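A simplified version of the bounded-separator idea: each profanity character may be followed by at most three single-character separators, matched lazily by `{0,3}?`, instead of an unbounded `*?`. The separator class and `buildObfuscationPattern()` are assumptions for illustration, not the actual RegexDriver expression.

```php
<?php

/**
 * Hypothetical builder for an obfuscation-tolerant pattern such as
 * f[sep]{0,3}?u[sep]{0,3}?c[sep]{0,3}?k. Bounding the separator run limits
 * backtracking, which is what keeps PCRE JIT stack usage small.
 */
function buildObfuscationPattern(string $word): string
{
    // A single character class, so each repetition consumes exactly one character.
    $separator = '[\s\-_.*]{0,3}?';
    $chars = array_map(
        fn (string $c) => preg_quote($c, '/'),
        mb_str_split($word, 1, 'UTF-8')
    );

    return '/' . implode($separator, $chars) . '/iu';
}
```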
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Mar 26, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 20544b4f-badf-4d27-a23b-fcb5ee796937

📥 Commits

Reviewing files that changed from the base of the PR and between a03f977 and d94740b.

📒 Files selected for processing (1)
  • src/BlaspManager.php

📝 Walkthrough


Validates pipeline driver config, guards against self-referential pipeline names, early-exits non-string validator inputs, preserves nested disable state, improves UTF-8 handling in matchers/drivers, reorders severity filtering before deduplication/masking, adjusts middleware field selection order, trims language lists, and limits tracked cache keys.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Manager & Provider & State | src/BlaspManager.php, src/BlaspServiceProvider.php, src/Blaspable.php | Validate blasp.drivers.pipeline.drivers shape and forbid "pipeline" name; validator rule early-returns for non-string/empty values; withoutBlaspChecking() restores previous flag state; removed unused import. |
| UTF‑8 / Phonetic Matching | src/Core/Matchers/PhoneticMatcher.php | Replaced ASCII strtolower/strlen usage with mb_strtolower(..., 'UTF-8') and mb_strlen(..., 'UTF-8') for multibyte-safe normalization and threshold calculations. |
| Regex Driver: masking & overlap | src/Drivers/RegexDriver.php | Use immutable normalized string for positions; derive matched text via mb_substr; track/skip already-masked character ranges to avoid overlapping matches; apply severity filter earlier. |
| Pattern Driver: severity ordering | src/Drivers/PatternDriver.php | Apply severity threshold filter immediately after collecting matches and before overlap deduplication. |
| Pipeline Driver: UTF‑8 substrings | src/Drivers/PipelineDriver.php | Use explicit 'UTF-8' encoding in mb_substr calls during right-to-left mask application for correct multibyte slicing. |
| Middleware: field selection logic | src/Middleware/CheckProfanity.php | When fields['*'], build input via only() then except() (via collection) so inclusion is limited first; otherwise use prior except() behavior. |
| Pending checks & cache trimming | src/PendingCheck.php | Deduplicate tracked cache keys, enforce config('blasp.cache.max_tracked_keys') limit, evict oldest entries when over limit, and persist trimmed list. |
| Language config edits | config/languages/french.php, config/languages/german.php | Removed three mild French terms; moved several German terms from extreme to moderate. |
| Tests / Stringable behavior | tests/StrMacroTest.php | Assert cleaned Stringable by casting to string ((string)) instead of calling toString(). |
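The cache-key trimming in PendingCheck amounts to a bounded, de-duplicated FIFO list. A sketch under the assumption that keys are stored oldest-first in a plain array; `trackCacheKey()` is hypothetical, and in the package the limit would come from `blasp.cache.max_tracked_keys`:

```php
<?php

/**
 * Hypothetical bounded key registry: appends $key if it is new, then evicts
 * the oldest entries so at most $maxKeys remain. Keys are stored oldest-first.
 */
function trackCacheKey(array $keys, string $key, int $maxKeys): array
{
    if (!in_array($key, $keys, true)) {
        $keys[] = $key;
    }

    $overflow = count($keys) - $maxKeys;
    if ($overflow > 0) {
        $keys = array_slice($keys, $overflow); // drop the oldest entries
    }

    return array_values($keys);
}
```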

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

🐰
I nibble code with careful paws,
UTF‑8 stitched without a flaw,
pipelines checked, masks aligned,
nested flags I tend and bind—
hop, the meadow’s running law!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 6.43% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Blasp v4: Driver-based architecture rewrite' accurately describes the main change: a comprehensive rewrite implementing a driver-based architecture replacing the monolithic detection engine, which is the primary focus across all modified files. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 13

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
config/languages/spanish.php (1)

149-149: ⚠️ Potential issue | 🟡 Minor

Duplicate entry: cabronazo appears twice in the profanities list.

This term is listed at both line 149 and line 186.

Proposed fix (drop the later occurrence at line 186)
-        'cabronazo',

Remove one of the duplicate cabronazo entries (line 149 or 186).

Also applies to: 186-186

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@config/languages/spanish.php` at line 149, The Spanish profanities list
contains a duplicated entry 'cabronazo'; remove one of the two occurrences so
the array contains a single 'cabronazo' entry (locate the profanities array
where 'cabronazo' appears and delete either the item at the earlier occurrence
or the later one), ensuring the array syntax/commas remain valid after removal.
src/Core/Normalizers/SpanishNormalizer.php (1)

21-33: ⚠️ Potential issue | 🟠 Major

Handle preg_replace_callback() failure paths to maintain string contract.

Both preg_replace_callback() calls (lines 21 and 28) can return null on PCRE errors or invalid UTF-8. The first failure leaves $normalizedString as null, which the second callback receives instead of a string, and the method's return type declaration (string at line 7) is violated when null is returned at line 35.

Proposed fix
         $normalizedString = preg_replace_callback('/\bll(?=[aeiouáéíóúü])/i', function ($matches) {
             $match = $matches[0];
             if ($match === 'LL') return 'Y';
             if ($match === 'Ll') return 'Y';
             return 'y';
-        }, $normalizedString);
+        }, $normalizedString) ?? $normalizedString;

         $normalizedString = preg_replace_callback('/rr/i', function ($matches) {
             $match = $matches[0];
             if ($match === 'RR') return 'R';
             if ($match === 'Rr') return 'R';
             return 'r';
-        }, $normalizedString);
+        }, $normalizedString) ?? $normalizedString;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Core/Normalizers/SpanishNormalizer.php` around lines 21 - 33, The
preg_replace_callback calls can return null and thus break the method's string
return contract; change each assignment to use a temporary variable (e.g.
$result = preg_replace_callback(...)) and then check if $result === null — if
so, keep the previous string value of $normalizedString (or cast to string) and
log/handle the PCRE failure as appropriate; otherwise assign $normalizedString =
$result. Apply this for both callbacks that touch $normalizedString so the
method (in SpanishNormalizer, the function using $normalizedString and declared
to return string) never ends up returning null.
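The temporary-variable approach can be wrapped in a small helper that falls back to the input and reports the PCRE error. `replaceOrKeep()` is a hypothetical name, and `error_log()` is just one way to "log/handle" the failure:

```php
<?php

/**
 * Hypothetical wrapper: run preg_replace_callback() and fall back to the
 * original subject if PCRE fails (null return), so callers always get a string.
 */
function replaceOrKeep(string $pattern, callable $callback, string $subject): string
{
    $result = preg_replace_callback($pattern, $callback, $subject);

    if ($result === null) {
        error_log('PCRE failure in normalizer: ' . preg_last_error_msg());
        return $subject;
    }

    return $result;
}
```

Note that with the `/u` flag, invalid UTF-8 in the subject is one reliable way to trigger the null return.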
🟡 Minor comments (12)
src/Core/Matchers/FalsePositiveFilter.php-125-139 (1)

125-139: ⚠️ Potential issue | 🟡 Minor

Use the /u regex flag (e.g. /\w/u) for proper Unicode word-character detection in getFullWordContext().

The method uses the /\w/ pattern without the /u flag to detect word character boundaries. This matches only ASCII word characters [A-Za-z0-9_], not Unicode letters. When expanding to full word context for strings containing accented or non-ASCII word characters, the regex will stop at the first non-ASCII character, potentially producing incorrect context. Add the /u flag to the regex patterns on lines 130 and 134 to properly support UTF-8 word boundaries.
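If the expansion loops test one byte at a time, adding `/u` alone may not be enough: a single byte of a multibyte character is invalid UTF-8 and makes `preg_match()` fail. One alternative that still accepts byte offsets (which an earlier commit passes into FalsePositiveFilter) is to match whole runs of word characters on either side. `fullWordContext()` is a hypothetical stand-in for `getFullWordContext()`:

```php
<?php

/**
 * Hypothetical stand-in for getFullWordContext(): expands a match given by
 * byte offset/length to the whole surrounding word. In PHP the /u flag also
 * enables Unicode properties, so \w covers letters such as "é" or "ü".
 */
function fullWordContext(string $text, int $byteStart, int $byteLength): string
{
    $before = substr($text, 0, $byteStart);
    $after = substr($text, $byteStart + $byteLength);

    // Word characters immediately before and after the match.
    preg_match('/\w*\z/u', $before, $left);
    preg_match('/^\w*/u', $after, $right);

    return ($left[0] ?? '') . substr($text, $byteStart, $byteLength) . ($right[0] ?? '');
}
```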

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Core/Matchers/FalsePositiveFilter.php` around lines 125 - 139, In
getFullWordContext, the preg_match calls using '/\w/' only match ASCII word
characters and break on UTF-8; update both regexes in the left and right
expansion loops to use the Unicode flag (e.g. change '/\w/' to '/\w/u') so
preg_match properly recognizes non-ASCII word characters when expanding the left
and right bounds around the match.
tests/BlaspCheckTest.php-163-163 (1)

163-163: ⚠️ Potential issue | 🟡 Minor

Typo in test method name: boudary → boundary.

Proposed fix
-    public function test_word_boudary()
+    public function test_word_boundary()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/BlaspCheckTest.php` at line 163, Rename the test method
test_word_boudary to test_word_boundary to fix the typo; update the method
declaration and any references/annotations that call or refer to
test_word_boudary so PHPUnit runs the corrected test name, ensuring the function
signature in BlaspCheckTest (public function test_word_boudary) is changed to
public function test_word_boundary.
tests/BlaspCheckTest.php-175-175 (1)

175-175: ⚠️ Potential issue | 🟡 Minor

Typo in test method name: pural → plural.

Proposed fix
-    public function test_pural_profanity()
+    public function test_plural_profanity()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/BlaspCheckTest.php` at line 175, Rename the test method named
test_pural_profanity to test_plural_profanity (fixing the "pural" → "plural"
typo) and update any references to that method (calls, annotations, or data
providers) so PHPUnit discovers and runs it correctly; ensure method name in
class BlaspCheckTest and any related docblocks or test-suite references are
adjusted accordingly.
tests/BlaspCheckTest.php-193-193 (1)

193-193: ⚠️ Potential issue | 🟡 Minor

Typo in test method name: subtitution → substitution.

Proposed fix
-    public function test_ass_subtitution()
+    public function test_ass_substitution()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/BlaspCheckTest.php` at line 193, Rename the test method
test_ass_subtitution to test_ass_substitution to fix the typo; update the method
declaration in BlaspCheckTest (and any references, data providers, or
annotations that refer to test_ass_subtitution) so the test runner and any
callers use the corrected name test_ass_substitution.
tests/CacheDriverConfigurationTest.php-18-32 (1)

18-32: ⚠️ Potential issue | 🟡 Minor

These tests don't actually prove the cache behavior yet.

test_dictionary_can_be_created_without_cache() never asserts that no cache entries were written, and both clear-cache tests call Dictionary::clearCache() before warming any dictionary. They will still pass if caching is broken or clearCache() is a no-op. Prime a language first and assert the registry changes around the clear call.

🧪 Suggested test hardening
     public function test_dictionary_can_be_created_without_cache(): void
     {
         Config::set('blasp.cache.driver', null);

         $dictionary = Dictionary::forLanguage('english');

         $this->assertNotNull($dictionary);
         $this->assertNotEmpty($dictionary->getProfanities());
+        $this->assertFalse(Cache::has('blasp_cache_keys'));
     }

     public function test_clear_cache_works(): void
     {
+        Config::set('blasp.cache.driver', 'array');
+        Dictionary::forLanguage('english');
+        $this->assertNotEmpty(Cache::store('array')->get('blasp_cache_keys', []));
+
         Dictionary::clearCache();
         $this->assertFalse(Cache::has('blasp_cache_keys'));
     }
...
     public function test_clear_cache_with_custom_driver(): void
     {
         Config::set('blasp.cache.driver', 'array');
+        Dictionary::forLanguage('english');
+        $this->assertNotEmpty(Cache::store('array')->get('blasp_cache_keys', []));

         Dictionary::clearCache();

         $keys = Cache::store('array')->get('blasp_cache_keys', []);
         $this->assertEmpty($keys);
     }

Also applies to: 51-58

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/CacheDriverConfigurationTest.php` around lines 18 - 32, Both tests are
not validating caching behavior: update
test_dictionary_can_be_created_without_cache to set
Config::set('blasp.cache.driver', null), call Dictionary::forLanguage('english')
to prime the dictionary, then assert Cache::has('blasp_cache_keys') is false (no
entries written) while still asserting $dictionary and getProfanities() are
valid; update test_clear_cache_works to first prime the cache by calling
Dictionary::forLanguage('english') (or another language), assert
Cache::has('blasp_cache_keys') is true, then call Dictionary::clearCache() and
assert Cache::has('blasp_cache_keys') is false to prove clearCache() actually
removes registry entries (use Dictionary::forLanguage, Dictionary::clearCache,
Cache::has and Config::set references).
tests/MultiLanguageProfanityTest.php-29-31 (1)

29-31: ⚠️ Potential issue | 🟡 Minor

Keep a native UTF-8 case in the regression set.

The PR calls out multibyte/UTF-8 fixes, but these assertions now only prove the ASCII transliterations. If accent/ß normalization regresses, this file can still stay green.

Proposed fix
         $testCases = [
             'mierda' => 'Esta es una mierda',
             'joder' => 'No quiero joder',
             'cabron' => 'Eres un cabron',
+            'cabrón' => 'Eres un cabrón',
             'puta' => 'La puta madre',
         ];
@@
         $testCases = [
             'english' => ['FUCK', 'FuCk', 'fUcK'],
-            'spanish' => ['MIERDA', 'MiErDa', 'mIeRdA'],
-            'german' => ['SCHEISSE', 'ScHeIsSe', 'schEISSE'],
+            'spanish' => ['MIERDA', 'MiErDa', 'mIeRdA', 'CABRÓN'],
+            'german' => ['SCHEISSE', 'ScHeIsSe', 'schEISSE', 'SchEiße'],
             'french' => ['MERDE', 'MeRdE', 'mErDe'],
         ];

Also applies to: 93-96

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/MultiLanguageProfanityTest.php` around lines 29 - 31, The test
regression set currently uses ASCII transliterations (e.g. "'cabron' => 'Eres un
cabron'") which hides multibyte/UTF‑8 behavior; update those entries to use
native UTF‑8 characters (for example change "cabron"/"Eres un cabron" to
"cabrón"/"Eres un cabrón" and any similar ASCII forms) and apply the same UTF‑8
replacements to the other entries referred to (the ones around the 93-96 block)
so the tests exercise accent/ß/multibyte normalization end-to-end.
tests/ConfigurationLoaderTest.php-80-83 (1)

80-83: ⚠️ Potential issue | 🟡 Minor

Seed the cache before asserting clearCache().

As written, this test passes even if Dictionary::clearCache() is a no-op, because blasp_cache_keys is never created first.

Proposed fix
     public function test_clear_cache()
     {
+        Cache::put('blasp_cache_keys', ['blasp.test']);
+        $this->assertTrue(Cache::has('blasp_cache_keys'));
+
         Dictionary::clearCache();
         $this->assertFalse(Cache::has('blasp_cache_keys'));
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ConfigurationLoaderTest.php` around lines 80 - 83, The test
test_clear_cache currently calls Dictionary::clearCache() then asserts
Cache::has('blasp_cache_keys') is false but never seeds that cache key first;
modify the test to seed the cache (e.g. Cache::put('blasp_cache_keys', ['dummy'
=> true]) or Cache::forever('blasp_cache_keys', ['dummy' => true])) before
calling Dictionary::clearCache(), then call Dictionary::clearCache() and assert
Cache::has('blasp_cache_keys') is false to verify the method actually removes
the key.
tests/MultiLanguageProfanityTest.php-43-45 (1)

43-45: ⚠️ Potential issue | 🟡 Minor

Remove the duplicate German test key.

PHP keeps only the last value for duplicate string keys, so one of these cases is silently discarded before the loop runs. Replace it with the UTF-8 variant 'scheiße' (with ß) to add coverage for the multibyte character handling:

Proposed fix
         $testCases = [
-            'scheisse' => 'Das ist scheisse',
-            'scheisse' => 'Das ist scheisse',
+            'scheisse' => 'Das ist scheisse',
+            'scheiße' => 'Das ist scheiße',
             'arsch' => 'Du bist ein arsch',
             'ficken' => 'Ich will ficken',
             'verdammt' => 'Verdammt noch mal',
         ];
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/MultiLanguageProfanityTest.php` around lines 43 - 45, The $testCases
array in MultiLanguageProfanityTest.php contains duplicate string keys
('scheisse') so PHP drops the first entry; update the array assigned to
$testCases to remove the duplicate key and replace one of them with the UTF-8
variant 'scheiße' (use 'scheiße' as the key and a corresponding value like 'Das
ist scheiße') so the test covers multibyte character handling; adjust only the
$testCases entries (refer to the $testCases variable in this file/test) and keep
other test logic unchanged.
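The silent overwrite is easy to confirm in isolation:

```php
<?php

// Duplicate string keys in an array literal: the later value silently
// replaces the earlier one, and no warning is emitted.
$testCases = [
    'scheisse' => 'first',
    'scheisse' => 'second',
];

var_dump(count($testCases));      // int(1)
var_dump($testCases['scheisse']); // string(6) "second"
```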
tests/ResultCachingTest.php-107-118 (1)

107-118: ⚠️ Potential issue | 🟡 Minor

This test loses the keys it needs to verify.

Lines 112-113 reload $keys after Dictionary::clearCache(), so the foreach iterates the cleared list and never asserts that the previously cached result entries were removed. Keep the pre-clear key set and assert against that.

♻️ Suggested fix
         $keys = Cache::get('blasp_result_cache_keys', []);
         $this->assertNotEmpty($keys);
 
         Dictionary::clearCache();
 
-        $keys = Cache::get('blasp_result_cache_keys', []);
         $this->assertNull(Cache::get('blasp_result_cache_keys'));
 
         // Verify the cached result data was also cleared
         foreach ($keys as $key) {
             $this->assertNull(Cache::get($key));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ResultCachingTest.php` around lines 107 - 118, The test reloads $keys
after calling Dictionary::clearCache(), losing the original key list needed for
verification; change the test to capture the pre-clear keys (e.g. $originalKeys
= $keys) before calling Dictionary::clearCache(), remove the second assignment
to $keys, keep the assertion that Cache::get('blasp_result_cache_keys') is null,
and iterate $originalKeys in the foreach to assert Cache::get($key) is null for
each previously cached entry (references: $keys, $originalKeys,
Dictionary::clearCache(), Cache::get()).
src/Core/Result.php-156-157 (1)

156-157: ⚠️ Potential issue | 🟡 Minor

Edge case: Empty text with matches could produce incorrect score.

When $originalText is empty (or whitespace-only), preg_split with PREG_SPLIT_NO_EMPTY returns an empty array, so count() is 0. The max(1, ...) ensures $totalWords >= 1, but if $words is non-empty while $originalText is empty, the score calculation may not reflect the expected context.

This is defensive but worth documenting the expected behavior when withMatches() is called with words but an empty original text.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Core/Result.php` around lines 156 - 157, The current total-words
calculation in Result.php can miscount when $originalText is empty but $words is
non-empty; modify the logic used before calling Score::calculate so that if
trim($originalText) is empty and $words is provided, $totalWords is derived from
count($words) (or count(preg_split(..., implode(' ', $words)))) instead of
defaulting to 1; update the block that computes $totalWords (the line using
preg_split on $originalText ?: implode(' ', $words)) and ensure
Score::calculate($matchedWords, $totalWords) receives this corrected value, and
add a short comment in the withMatches-related area documenting this edge-case
behavior.
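A sketch of the word count with the suggested fallback, assuming whitespace tokenisation as in the current `preg_split` call; `countTotalWords()` is a hypothetical helper, not the actual `Result` code:

```php
<?php

/**
 * Hypothetical word counter for score calculation. Falls back to the matched
 * words when the original text is empty or whitespace-only, and keeps the
 * existing max(1, ...) floor.
 */
function countTotalWords(string $originalText, array $words): int
{
    $source = trim($originalText) !== '' ? $originalText : implode(' ', $words);
    $tokens = preg_split('/\s+/u', $source, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split() returns false on failure; treat that as no tokens.
    return max(1, count($tokens ?: []));
}
```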
src/PendingCheck.php-151-157 (1)

151-157: ⚠️ Potential issue | 🟡 Minor

Unused $falsePositives parameter in configure().

The $falsePositives parameter is accepted but ignored. This appears to be incomplete backward-compatibility implementation. Either implement false positive filtering or document that this parameter is deprecated/ignored.

🔧 Proposed fix to document or implement

Option 1: Document as ignored (if intentional):

+    /**
+     * `@deprecated` Use allow() for false positives. The $falsePositives parameter is ignored.
+     */
     public function configure(?array $profanities = null, ?array $falsePositives = null): self

Option 2: Implement the functionality:

     public function configure(?array $profanities = null, ?array $falsePositives = null): self
     {
         if ($profanities !== null) {
             $this->blockList = array_merge($this->blockList, $profanities);
         }
+        if ($falsePositives !== null) {
+            $this->allowList = array_merge($this->allowList, $falsePositives);
+        }
         return $this;
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/PendingCheck.php` around lines 151 - 157, The configure() method
currently ignores the $falsePositives parameter; update it to apply
false-positive handling by merging $falsePositives into a dedicated property and
removing any false positives from the current block list: inside
PendingCheck::configure add handling like if ($falsePositives !== null) {
$this->falsePositives = array_merge($this->falsePositives ?? [],
$falsePositives); $this->blockList = array_values(array_diff($this->blockList,
$falsePositives)); } so the unique symbols to change are the configure() method,
the $falsePositives parameter, the $this->blockList property, and add/use
$this->falsePositives property to store the allowed items.
src/Core/Dictionary.php-49-66 (1)

49-66: ⚠️ Potential issue | 🟡 Minor

Use mb_strtolower() for UTF-8 safety.

The PR objectives mention UTF-8/multibyte safety fixes, but this code uses strtolower() which doesn't handle multibyte characters correctly. Additionally, line 55 performs a case-sensitive in_array() check against $this->profanities which may contain mixed-case words.

🛡️ Proposed fix
-        $this->allowList = array_map('strtolower', $allowList);
-        $this->blockList = array_map('strtolower', $blockList);
+        $this->allowList = array_map(fn($w) => mb_strtolower($w, 'UTF-8'), $allowList);
+        $this->blockList = array_map(fn($w) => mb_strtolower($w, 'UTF-8'), $blockList);
         $this->language = $language;

         // Apply block list — add extra words to profanities
         foreach ($this->blockList as $word) {
-            if (!in_array($word, $this->profanities)) {
+            if (!in_array($word, array_map(fn($p) => mb_strtolower($p, 'UTF-8'), $this->profanities))) {
                 $this->profanities[] = $word;
                 $this->severityMap[$word] = Severity::High;
             }
         }

         // Remove allow-listed words
         if (!empty($this->allowList)) {
             $this->profanities = array_values(array_filter(
                 $this->profanities,
-                fn($p) => !in_array(strtolower($p), $this->allowList)
+                fn($p) => !in_array(mb_strtolower($p, 'UTF-8'), $this->allowList)
             ));
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Core/Dictionary.php` around lines 49 - 66, Replace all uses of
strtolower() with mb_strtolower() for UTF-8 safety when normalizing $allowList
and $blockList (e.g. the array_map calls that assign $this->allowList and
$this->blockList), and ensure comparisons against $this->profanities are done
case-insensitively by comparing normalized values (use mb_strtolower on both
sides). Specifically, update the block-list loop that checks in_array($word,
$this->profanities) to compare mb_strtolower($word) against a lowercased
profanities set (or normalize $this->profanities once), and change the
allow-list filter closure (fn($p) => !in_array(strtolower($p),
$this->allowList)) to use mb_strtolower($p) and mb_strtolower for list entries
so all checks are multibyte-safe.
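A minimal sketch of the "normalize $this->profanities once" option, assuming the profanity, block, and allow lists are plain string arrays. `applyLists()` is a hypothetical helper, and the severity-map update from the proposed fix is left out. Lowercasing each list once into a lookup set avoids re-running `array_map()` over all profanities on every block-list iteration, which the inline fix above does.

```php
<?php

/**
 * Hypothetical helper: merge a block list into a profanity list and drop
 * allow-listed entries, comparing words case-insensitively in UTF-8.
 *
 * @param string[] $profanities
 * @param string[] $blockList
 * @param string[] $allowList
 * @return string[]
 */
function applyLists(array $profanities, array $blockList, array $allowList): array
{
    $lower = static fn (string $word): string => mb_strtolower($word, 'UTF-8');

    // Lowercase each list once; array keys act as O(1) lookup sets.
    $allowed = array_flip(array_map($lower, $allowList));
    $known = array_flip(array_map($lower, $profanities));

    // Append block-listed words that are not already known (any casing).
    foreach (array_map($lower, $blockList) as $word) {
        if (!isset($known[$word])) {
            $profanities[] = $word;
            $known[$word] = true;
        }
    }

    // Remove allow-listed words, then reindex.
    return array_values(array_filter(
        $profanities,
        static fn (string $p): bool => !isset($allowed[$lower($p)])
    ));
}
```

With `strtolower()`, a block-list entry such as `ÉNORME` would keep its uppercase `É`; `mb_strtolower()` yields `énorme`.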
🧹 Nitpick comments (16)
src/Core/Matchers/CompoundWordDetector.php (1)

39-43: Use array_keys() to avoid the unused $_ variable.

Static analysis flagged the unused $_. Since only the keys are needed, iterating over array_keys() is cleaner.

♻️ Proposed fix
-        foreach ($profanityExpressions as $profanity => $_) {
-            if (strlen($profanity) >= 3 && stripos($remainder, $profanity) !== false) {
+        foreach (array_keys($profanityExpressions) as $profanity) {
+            if (strlen($profanity) >= 3 && stripos($remainder, $profanity) !== false) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Core/Matchers/CompoundWordDetector.php` around lines 39 - 43, In
CompoundWordDetector (the loop over $profanityExpressions), replace the foreach
that uses an unused value slot (foreach ($profanityExpressions as $profanity =>
$_)) by iterating only the keys (e.g., foreach
(array_keys($profanityExpressions) as $profanity)) so static analysis warnings
go away while preserving the existing logic that checks strlen($profanity) >= 3
and stripos($remainder, $profanity) !== false before returning false.
src/Core/Score.php (1)

7-16: Add type documentation for the $matchedWords parameter.

The method assumes array elements are MatchedWord instances (accessing $word->severity->weight()), but this isn't documented. Consider adding a PHPDoc annotation for IDE support and clarity.

📝 Suggested documentation
+    /**
+     * @param MatchedWord[] $matchedWords
+     */
     public static function calculate(array $matchedWords, int $totalWordCount): int
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Core/Score.php` around lines 7 - 16, Add a PHPDoc type annotation for the
$matchedWords parameter on the calculate method to document that the array
contains MatchedWord instances (e.g., `@param` MatchedWord[] $matchedWords).
Update the docblock above the public static function calculate(array
$matchedWords, int $totalWordCount): int to reference the MatchedWord class so
IDEs and static analyzers know the element type used when calling
$word->severity->weight().
composer.json (1)

18-24: Potential testbench/Laravel version mismatch.

illuminate/support supports Laravel 8–12, but orchestra/testbench ^10.0 only supports Laravel 12. This may cause issues when testing against older Laravel versions in CI. Consider either narrowing illuminate/support to ^12.0 or using a wider testbench version range if backward compatibility is needed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@composer.json` around lines 18 - 24, The composer.json currently allows
"illuminate/support" ^8–12 while "orchestra/testbench" is fixed to ^10.0,
causing a testbench/Laravel version mismatch; update composer.json so both
requirements align—either restrict "illuminate/support" to ^12.0 if you only
intend to support Laravel 12, or broaden "orchestra/testbench" to a range that
supports the older Laravel majors you need (so the "require" entry for
"illuminate/support" and the "require-dev" entry for "orchestra/testbench"
reference compatible major versions).
src/Core/Normalizers/EnglishNormalizer.php (1)

5-11: Consider extending or delegating to NullNormalizer.

This implementation is identical to NullNormalizer. While having separate language-specific normalizer classes allows independent evolution (e.g., adding English-specific transformations later), you could reduce duplication by extending NullNormalizer or delegating to it until English-specific logic is needed.

♻️ Optional: extend NullNormalizer
-class EnglishNormalizer implements StringNormalizer
+class EnglishNormalizer extends NullNormalizer
 {
-    public function normalize(string $string): string
-    {
-        return $string;
-    }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Core/Normalizers/EnglishNormalizer.php` around lines 5 - 11, The
EnglishNormalizer currently duplicates NullNormalizer behavior; update
EnglishNormalizer to reuse NullNormalizer by either extending NullNormalizer
(e.g., class EnglishNormalizer extends NullNormalizer implements
StringNormalizer) or delegating its normalize(string $string): string to an
internal NullNormalizer instance, keeping the public normalize method and the
EnglishNormalizer class name so language-specific logic can be added later.
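For the delegation option, a minimal sketch follows. The interface and `NullNormalizer` are simplified stand-ins written for this example (the real ones live under `src/Core/Normalizers/`), and injecting the fallback through the constructor is an assumption, not the package's current design.

```php
<?php

// Simplified stand-ins for the package's normalizer types (assumed shape).
interface StringNormalizer
{
    public function normalize(string $string): string;
}

class NullNormalizer implements StringNormalizer
{
    public function normalize(string $string): string
    {
        return $string; // pass-through
    }
}

class EnglishNormalizer implements StringNormalizer
{
    private StringNormalizer $fallback;

    public function __construct(?StringNormalizer $fallback = null)
    {
        // Defer to the pass-through normalizer until English rules exist.
        $this->fallback = $fallback ?? new NullNormalizer();
    }

    public function normalize(string $string): string
    {
        // English-specific transformations would go here, before delegating.
        return $this->fallback->normalize($string);
    }
}
```

Delegation lets `EnglishNormalizer` diverge from `NullNormalizer` later without changing its parent class; extending is shorter but couples the two.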
config/languages/spanish.php (1)

4-33: Severity map added - verify coverage alignment with profanities list.

The new severity mapping provides good categorization. However, some terms in profanities (e.g., homosexual at line 74, jodido/jodida at lines 44-45) are not present in any severity tier. Consider whether all profanities should have a severity assignment for consistent scoring behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@config/languages/spanish.php` around lines 4 - 33, The severity map
('severity' array) is missing entries for some profanities declared elsewhere
(e.g., the profanities list contains "homosexual" and "jodido"/"jodida" which
are not present in any severity tier), so update the severity mapping to include
those terms with appropriate tiers; locate the 'severity' array in this diff and
add the missing tokens (or remove/normalize them from the profanities list if
intentionally excluded) so every profanity in the profanities collection has a
corresponding severity level and consistent scoring behavior.
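One way to find the gaps is a small check that lists every profanity with no severity tier. This sketch assumes the config shape read by `buildSeverityMap()` (a `severity` array of tier => words plus a flat `profanities` list); `missingSeverity()` is a hypothetical helper and the sample config is illustrative, not the real `spanish.php`.

```php
<?php

/**
 * Hypothetical helper: return profanities that appear in no severity tier.
 * Comparison is case-insensitive and UTF-8 safe.
 *
 * @return string[]
 */
function missingSeverity(array $config): array
{
    $covered = [];
    foreach ($config['severity'] ?? [] as $words) {
        foreach ($words as $word) {
            $covered[mb_strtolower($word, 'UTF-8')] = true;
        }
    }

    $missing = [];
    foreach ($config['profanities'] ?? [] as $word) {
        if (!isset($covered[mb_strtolower($word, 'UTF-8')])) {
            $missing[] = $word;
        }
    }

    return $missing;
}

// Illustrative config in the assumed shape.
$config = [
    'severity' => [
        'mild' => ['tonto'],
        'high' => ['cabrón'],
    ],
    'profanities' => ['tonto', 'Cabrón', 'jodido', 'jodida'],
];

print_r(missingSeverity($config)); // lists jodido and jodida
```

Unmapped words would otherwise fall back to `Severity::High` in `buildSeverityMap()`, so an empty result means every word has an explicit tier.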
src/Core/Normalizers/GermanNormalizer.php (1)

18-23: Case-preserving callback handles common cases; consider edge cases.

The callback handles SCH, Sch, and defaults to sh. Mixed-case variants like sCH or ScH will normalize to lowercase sh. This is likely acceptable for profanity detection, but if full case preservation is desired:

Alternative: preserve casing per character
         $normalizedString = preg_replace_callback('/sch/i', function ($matches) {
             $match = $matches[0];
-            if ($match === 'SCH') return 'SH';
-            if ($match === 'Sch') return 'Sh';
-            return 'sh';
+            return (ctype_upper($match[0]) ? 'S' : 's') . (ctype_upper($match[2]) ? 'H' : 'h');
         }, $normalizedString);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Core/Normalizers/GermanNormalizer.php` around lines 18 - 23, The sch
replacement callback in GermanNormalizer (the preg_replace_callback on
$normalizedString) only special-cases 'SCH' and 'Sch' and otherwise returns
'sh', which loses mixed-case patterns like 'sCH' or 'ScH'; update the callback
to compute the replacement per-character by examining the original $matches[0]
characters and producing 's'/'S' + 'h'/'H' according to each source character's
case (preserve full per-character casing) so mixed-case inputs map to
corresponding mixed-case outputs while keeping the existing checks for 'SCH' and
'Sch'.
src/Console/ClearCommand.php (1)

13-17: Consider adding error handling for cache clear failures.

The command assumes Dictionary::clearCache() always succeeds. While cache clearing is typically safe, adding minimal error handling would improve robustness.

♻️ Optional: Add try-catch for robustness
-public function handle(): void
+public function handle(): int
 {
-    Dictionary::clearCache();
-    $this->info('Blasp cache cleared successfully!');
+    try {
+        Dictionary::clearCache();
+        $this->info('Blasp cache cleared successfully!');
+    } catch (\Throwable $e) {
+        $this->error('Failed to clear cache: ' . $e->getMessage());
+        return self::FAILURE;
+    }
+
+    return self::SUCCESS;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Console/ClearCommand.php` around lines 13 - 17, Wrap the call to
Dictionary::clearCache() in a try-catch inside ClearCommand::handle so failures
are caught; on success keep $this->info('Blasp cache cleared successfully!'), on
exception call $this->error(...) with the exception message (e.g.
$e->getMessage()) and return a non-zero status (return 1) so the command reports
failure, otherwise return 0. Ensure you reference Dictionary::clearCache,
ClearCommand::handle, $this->info and $this->error when making the change.
tests/AllLanguagesDetectionTest.php (1)

102-118: Use accented spellings here so this remains an end-to-end normalizer test.

Line 103 says “umlauts and eszett”, but scheisse is already the ASCII fallback; Line 114 calls out accents, but connard has none. A broken normalizer wiring in Blasp::...()->check() would still pass. Swap at least one case to scheiße and an accented French term like enculé or mèrde.

♻️ Suggested update
-        $germanTests = ['scheisse', 'Scheisse', 'SCHEISSE'];
+        $germanTests = ['scheiße', 'Scheiße', 'SCHEISSE'];
...
-        $frenchTests = ['connard', 'CONNARD', 'Connard'];
+        $frenchTests = ['enculé', 'ENCULÉ', 'Enculé'];
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/AllLanguagesDetectionTest.php` around lines 102 - 118, The test uses
ASCII fallbacks so the normalization pipeline isn't truly exercised; update the
test inputs used with Blasp::german()->check and Blasp::french()->check to
include at least one real-accented form (e.g., replace or add "scheiße" among
$germanTests and replace or add a French accented term like "enculé" in
$frenchTests) so the end-to-end normalizer is validated rather than only ASCII
variants.
tests/DetectionStrategyRegistryTest.php (1)

47-60: Consider suppressing the unused-parameter warnings or prefixing the parameter names with an underscore.

The extend() closure and the anonymous class implementing DriverInterface have unused parameters ($app in the closure; $dictionary, $mask, and $options in detect()) flagged by PHPMD. While these are required by the extend() callback and DriverInterface contracts, underscore-prefixed names would signal intentional non-use.

🔧 Optional: Use underscore prefix for intentionally unused parameters
-        $this->manager->extend('custom', function ($app) {
-            return new class implements DriverInterface {
-                public function detect(string $text, Dictionary $dictionary, MaskStrategyInterface $mask, array $options = []): Result
+        $this->manager->extend('custom', function ($_app) {
+            return new class implements DriverInterface {
+                public function detect(string $text, Dictionary $_dictionary, MaskStrategyInterface $_mask, array $_options = []): Result
                 {
                     return new Result($text, $text, [], 0);
                 }
             };
         });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/DetectionStrategyRegistryTest.php` around lines 47 - 60, The anonymous
DriverInterface implementation in the test (inside
test_extend_registers_custom_driver and the closure passed to
$this->manager->extend) has unused parameters in detect causing PHPMD warnings;
update the detect signature to mark them intentionally unused by renaming
parameters to _app, _dictionary, _mask, and _options (or add a PHPMD suppression
comment on the anonymous class) so the interface contract is preserved but
static analysis warnings are silenced.
tests/SeverityMapTest.php (1)

8-8: Missing TestCase import.

The class extends TestCase but there's no explicit import statement. This relies on the class being in the same namespace (Blaspsoft\Blasp\Tests), which should work if TestCase.php exists there, but explicit imports improve clarity.

♻️ Add explicit import for clarity
 use Blaspsoft\Blasp\Facades\Blasp;
 use Blaspsoft\Blasp\Enums\Severity;
+use Blaspsoft\Blasp\Tests\TestCase;
 
 class SeverityMapTest extends TestCase
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/SeverityMapTest.php` at line 8, The test class SeverityMapTest
currently extends TestCase without an explicit import; add a use statement for
the project's base test case (use Blaspsoft\Blasp\Tests\TestCase;, defined in
tests/TestCase.php) so the class reference is explicit. Do not import
PHPUnit\Framework\TestCase instead: the test uses the Blasp facade, which needs
the Laravel application that the project's TestCase presumably boots.
src/Core/Matchers/RegexMatcher.php (1)

103-112: Verify default quantifier usage.

The default quantifier *? could potentially cause performance issues with certain inputs. While all current callers in this file pass explicit quantifiers ('', '+'), external consumers could trigger the default.

Consider making the quantifier parameter required or documenting that *? is the default:

♻️ Proposed documentation improvement
-    private function generateEscapedExpression(array $characters = [], array $escapedCharacters = [], string $quantifier = '*?'): string
+    /**
+     * @param string $quantifier Repetition quantifier (default '*?' for zero-or-more lazy)
+     */
+    private function generateEscapedExpression(array $characters = [], array $escapedCharacters = [], string $quantifier = '*?'): string
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Core/Matchers/RegexMatcher.php` around lines 103 - 112, The method
generateEscapedExpression currently uses a risky default quantifier '*?' which
may cause performance problems; make the quantifier explicit by removing the
default and requiring callers to pass it (update the signature of
generateEscapedExpression to require string $quantifier; since PHP 8.0
deprecates optional parameters declared before a required one, also remove the
defaults from $characters and $escapedCharacters or move $quantifier to the
front), then update all call sites in this class to pass their explicit
quantifiers (noting existing callers already pass '' and '+'), and add/update
the PHPDoc for generateEscapedExpression to state that the quantifier must be
provided and what values are expected.
src/PendingCheck.php (1)

241-259: Lenient mode silently overrides driver selection.

When lenientMode is true, the driver is forced to 'pattern' regardless of any explicit driver() call. This could be surprising to users who call ->driver('phonetic')->lenient().

Consider documenting this behavior or logging a debug message:

         if ($this->lenientMode) {
+            // Lenient mode uses pattern driver for looser matching
             $driverName = 'pattern';
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/PendingCheck.php` around lines 241 - 259, The current resolveDriver()
silently forces the driver to 'pattern' whenever $this->lenientMode is true,
overriding any explicit ->driver() selection; update resolveDriver() so that it
only falls back to the 'pattern' driver when lenientMode is true AND no explicit
driver was chosen (i.e. $this->driverName is null), or alternatively emit a
debug/log message when lenientMode overrides an explicit driver; adjust the
logic around $driverName, keeping references to resolveDriver(),
$this->driverName, $this->lenientMode, manager->getDefaultDriver(), and the
'pattern' literal to implement the conditional override or add a logging
statement.
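A sketch of the conditional override, assuming `driverName` stays `null` until `->driver()` is called. `DriverSelection` is a hypothetical stand-in for the relevant `PendingCheck` state, and the default driver is passed in rather than read from `manager->getDefaultDriver()`.

```php
<?php

/**
 * Hypothetical stand-in for PendingCheck's driver-selection state.
 */
final class DriverSelection
{
    public function __construct(
        private ?string $driverName,  // set by ->driver(), null otherwise
        private bool $lenientMode,    // set by ->lenient()
        private string $defaultDriver // e.g. manager->getDefaultDriver()
    ) {
    }

    public function resolveDriverName(): string
    {
        // An explicit ->driver() call always wins.
        if ($this->driverName !== null) {
            return $this->driverName;
        }

        // Lenient mode only selects 'pattern' when no driver was chosen.
        if ($this->lenientMode) {
            return 'pattern';
        }

        return $this->defaultDriver;
    }
}
```

With this ordering, `->driver('phonetic')->lenient()` keeps the phonetic driver. The logging alternative would instead keep the override and emit the debug message in the lenient branch.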
tests/PhoneticDriverTest.php (1)

112-126: Consider more specific assertion for phonetic variant detection.

The test uses str_contains to check if any matched word contains 'fuck' or 'phuk'. While flexible, this could match unintended substrings. Consider asserting on the exact base word if the dictionary/matcher behavior is deterministic.

-        $matched = false;
-        foreach ($result->uniqueWords() as $word) {
-            if (str_contains($word, 'fuck') || str_contains($word, 'phuk')) {
-                $matched = true;
-                break;
-            }
-        }
-        $this->assertTrue($matched, 'Expected a fuck/phuk variant in uniqueWords: ' . implode(', ', $result->uniqueWords()));
+        $this->assertContains('fuck', $result->uniqueWords(), 'Expected base word "fuck" in uniqueWords');

However, if the base word can legitimately vary, the current approach is acceptable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/PhoneticDriverTest.php` around lines 112 - 126, The
test_detects_phonetic_evasion currently uses str_contains on each entry of
$result->uniqueWords() which may produce false positives; update the assertion
to check for exact matches against the expected deterministic base word(s)
instead: retrieve the array from $result->uniqueWords() and assert that it
contains one of the exact allowed variants (e.g., 'fucking' or 'phuking' or
whatever the phonetic driver is expected to produce) using equality comparisons
or in_array, or if the dictionary can legitimately vary keep a small explicit
whitelist of acceptable exact base words and assert intersection with that
whitelist is non-empty; locate this change in the test_detects_phonetic_evasion
function and replace the str_contains loop with an exact-match check against the
whitelist of expected phonetic variants.
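The whitelist approach could look like the helper below. `hasExpectedVariant()` is hypothetical and the accepted variants are illustrative; the real list depends on what `PhoneticDriver` actually returns from `uniqueWords()`.

```php
<?php

/**
 * Hypothetical helper: true if any detected word exactly matches an
 * accepted variant (no substring matching).
 *
 * @param string[] $uniqueWords words reported by the driver
 * @param string[] $accepted    exact variants the test will accept
 */
function hasExpectedVariant(array $uniqueWords, array $accepted): bool
{
    // array_intersect() compares values as strings, so matches are exact.
    return array_intersect($uniqueWords, $accepted) !== [];
}
```

In the test, the loop would become `$this->assertTrue(hasExpectedVariant($result->uniqueWords(), ['fuck', 'phuk']), 'Expected an exact fuck/phuk variant');`.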
src/Core/Dictionary.php (3)

194-198: Use mb_strtolower() in getSeverity() for consistency.

For UTF-8 safety, this should use mb_strtolower() to match the multibyte handling elsewhere in the codebase.

♻️ Proposed fix
     public function getSeverity(string $word): Severity
     {
-        $lower = strtolower($word);
+        $lower = mb_strtolower($word, 'UTF-8');
         return $this->severityMap[$lower] ?? Severity::High;
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Core/Dictionary.php` around lines 194 - 198, The getSeverity method uses
strtolower which is not multibyte-safe; change it to use mb_strtolower to match
the rest of the codebase and ensure UTF-8 characters are handled correctly.
Update the implementation in function getSeverity(string $word): Severity to
call mb_strtolower($word) (optionally passing 'UTF-8') before looking up
$this->severityMap[$lower]; keep the fallback to Severity::High unchanged so
behavior remains the same for missing entries.

294-318: Use mb_strtolower() in buildSeverityMap() for UTF-8 safety.

Lines 302 and 310 use strtolower() which isn't multibyte-safe. This should be consistent with the PR's UTF-8 safety goals.

♻️ Proposed fix
     private static function buildSeverityMap(array $config): array
     {
         $map = [];

         if (isset($config['severity']) && is_array($config['severity'])) {
             foreach ($config['severity'] as $level => $words) {
                 $severity = Severity::tryFrom($level) ?? Severity::High;
                 foreach ($words as $word) {
-                    $map[strtolower($word)] = $severity;
+                    $map[mb_strtolower($word, 'UTF-8')] = $severity;
                 }
             }
         }

         // Words only in profanities (not in severity map) default to High
         if (isset($config['profanities'])) {
             foreach ($config['profanities'] as $word) {
-                $lower = strtolower($word);
+                $lower = mb_strtolower($word, 'UTF-8');
                 if (!isset($map[$lower])) {
                     $map[$lower] = Severity::High;
                 }
             }
         }

         return $map;
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Core/Dictionary.php` around lines 294 - 318, In buildSeverityMap(),
strtolower() is used on words from $config['severity'] and
$config['profanities'], which is not multibyte-safe; replace those calls with
mb_strtolower($word, 'UTF-8') (or mb_strtolower($word) with default encoding if
project-wide) so UTF-8 characters are handled correctly when populating the $map
and checking !isset($map[$lower]); ensure the same change is made for both
places where strtolower() is used in this method.

16-16: Unused constant CACHE_TTL.

This constant is defined but never referenced elsewhere in the class. Either remove it or use it in the caching methods (e.g., when storing cache entries).

🧹 Proposed fix: Remove unused constant
 class Dictionary
 {
-    private const CACHE_TTL = 86400;
-
     private array $profanities;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Core/Dictionary.php` at line 16, The private constant CACHE_TTL is
declared but never used in the Dictionary class; either remove the constant or
apply it where cache entries are stored/returned. If you want to keep it, update
the caching methods (e.g., methods that set or save cache entries in class
Dictionary such as any putCache/saveToCache/setCache method) to use CACHE_TTL as
the TTL value when calling the cache store operation; otherwise delete the
unused CACHE_TTL declaration to eliminate dead code. Ensure references use the
constant name CACHE_TTL and that any cache store call passes it as the TTL
argument.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 72fbb9b0-bf10-4996-82b6-8fec9c9dd973

📥 Commits

Reviewing files that changed from the base of the PR and between f24d341 and a67ccd8.

📒 Files selected for processing (95)
  • README.md
  • composer.json
  • config/blasp.php
  • config/config.php
  • config/languages/english.php
  • config/languages/french.php
  • config/languages/german.php
  • config/languages/spanish.php
  • src/Abstracts/BaseDetectionStrategy.php
  • src/Abstracts/StringNormalizer.php
  • src/BlaspManager.php
  • src/BlaspService.php
  • src/BlaspServiceProvider.php
  • src/Blaspable.php
  • src/Config/ConfigurationLoader.php
  • src/Config/DetectionConfig.php
  • src/Config/MultiLanguageDetectionConfig.php
  • src/Console/ClearCommand.php
  • src/Console/Commands/BlaspClearCommand.php
  • src/Console/LanguagesCommand.php
  • src/Console/TestCommand.php
  • src/Contracts/DetectionConfigInterface.php
  • src/Contracts/DetectionStrategyInterface.php
  • src/Contracts/ExpressionGeneratorInterface.php
  • src/Contracts/MultiLanguageConfigInterface.php
  • src/Contracts/RegistryInterface.php
  • src/Core/Analyzer.php
  • src/Core/Contracts/DriverInterface.php
  • src/Core/Contracts/MaskStrategyInterface.php
  • src/Core/Dictionary.php
  • src/Core/Masking/CallbackMask.php
  • src/Core/Masking/CharacterMask.php
  • src/Core/Masking/GrawlixMask.php
  • src/Core/MatchedWord.php
  • src/Core/Matchers/CompoundWordDetector.php
  • src/Core/Matchers/FalsePositiveFilter.php
  • src/Core/Matchers/PhoneticMatcher.php
  • src/Core/Matchers/RegexMatcher.php
  • src/Core/Normalizers/EnglishNormalizer.php
  • src/Core/Normalizers/FrenchNormalizer.php
  • src/Core/Normalizers/GermanNormalizer.php
  • src/Core/Normalizers/NullNormalizer.php
  • src/Core/Normalizers/SpanishNormalizer.php
  • src/Core/Normalizers/StringNormalizer.php
  • src/Core/Result.php
  • src/Core/Score.php
  • src/Drivers/PatternDriver.php
  • src/Drivers/PhoneticDriver.php
  • src/Drivers/PipelineDriver.php
  • src/Drivers/RegexDriver.php
  • src/Enums/Severity.php
  • src/Events/ContentBlocked.php
  • src/Events/ModelProfanityDetected.php
  • src/Events/ProfanityDetected.php
  • src/Exceptions/ProfanityRejectedException.php
  • src/Facades/Blasp.php
  • src/Middleware/CheckProfanity.php
  • src/Normalizers/EnglishStringNormalizer.php
  • src/Normalizers/GermanStringNormalizer.php
  • src/Normalizers/Normalize.php
  • src/PendingCheck.php
  • src/ProfanityDetector.php
  • src/Registries/DetectionStrategyRegistry.php
  • src/Registries/LanguageNormalizerRegistry.php
  • src/Rules/Profanity.php
  • src/ServiceProvider.php
  • src/Testing/BlaspFake.php
  • tests/AllLanguagesApiTest.php
  • tests/AllLanguagesDetectionTest.php
  • tests/BladeDirectiveTest.php
  • tests/BlaspCheckTest.php
  • tests/BlaspableTest.php
  • tests/CacheDriverConfigurationTest.php
  • tests/ConfigurationLoaderLanguageTest.php
  • tests/ConfigurationLoaderTest.php
  • tests/CustomMaskCharacterTest.php
  • tests/DetectionStrategyRegistryTest.php
  • tests/EdgeCaseTest.php
  • tests/EmptyInputTest.php
  • tests/FrenchStringNormalizerTest.php
  • tests/GermanStringNormalizerTest.php
  • tests/Issue24Test.php
  • tests/Issue32FalsePositiveTest.php
  • tests/MiddlewareAliasTest.php
  • tests/MultiLanguageDetectionConfigTest.php
  • tests/MultiLanguageProfanityTest.php
  • tests/PhoneticDriverTest.php
  • tests/PipelineDriverTest.php
  • tests/ProfanityExpressionGeneratorTest.php
  • tests/ResultCachingTest.php
  • tests/SeverityMapTest.php
  • tests/SpanishStringNormalizerTest.php
  • tests/StrMacroTest.php
  • tests/TestCase.php
  • tests/UuidFalsePositiveTest.php
💤 Files with no reviewable changes (21)
  • src/Abstracts/StringNormalizer.php
  • src/Normalizers/EnglishStringNormalizer.php
  • src/Contracts/ExpressionGeneratorInterface.php
  • tests/UuidFalsePositiveTest.php
  • src/Contracts/RegistryInterface.php
  • src/Console/Commands/BlaspClearCommand.php
  • src/Contracts/DetectionStrategyInterface.php
  • src/Normalizers/GermanStringNormalizer.php
  • src/Normalizers/Normalize.php
  • src/ProfanityDetector.php
  • src/Abstracts/BaseDetectionStrategy.php
  • config/config.php
  • src/Contracts/DetectionConfigInterface.php
  • src/Contracts/MultiLanguageConfigInterface.php
  • src/Config/DetectionConfig.php
  • src/Registries/DetectionStrategyRegistry.php
  • src/Config/ConfigurationLoader.php
  • src/Registries/LanguageNormalizerRegistry.php
  • src/ServiceProvider.php
  • src/BlaspService.php
  • src/Config/MultiLanguageDetectionConfig.php

deemonic and others added 2 commits March 26, 2026 09:16
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Preserve previous state in Blaspable::withoutBlaspChecking() for nested calls
- Reject recursive pipeline driver configuration
- Guard validation rule against non-string input
- Respect except fields when middleware fields config is set
- Apply severity filter before overlap dedup in PatternDriver
- Apply severity filter before masking in RegexDriver
- Use mb_strtolower/mb_strlen in PhoneticMatcher for UTF-8 safety
- Remove unused Dictionary import from BlaspServiceProvider

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/BlaspManager.php`:
- Around lines 81-83: Normalize and validate the drivers list before the
self-reference check in BlaspManager: ensure $config['drivers'] exists and is an
array of strings (filter non-strings and cast items to string or throw) and
build $driverNames using strtolower on each entry so the in_array check detects
any case variants of "pipeline"; then perform the in_array('pipeline',
$driverNames, true) guard and throw the InvalidArgumentException if found. This
prevents miscased values from bypassing the guard (which would let
createDriver() / createPipelineDriver() recurse) and avoids a TypeError from
strtolower() inside array_map() on non-string entries.
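A minimal sketch of the normalize-then-guard step, written as a standalone hypothetical function (`pipelineDriverNames()`); the real check would sit in `BlaspManager` before `createPipelineDriver()` resolves the sub-drivers. This version throws on non-string entries rather than filtering them, and it also trims whitespace, which the review does not ask for.

```php
<?php

/**
 * Hypothetical helper: validate a pipeline config's driver list and
 * return the normalized (trimmed, lowercased) driver names.
 *
 * @throws InvalidArgumentException
 */
function pipelineDriverNames(array $config): array
{
    $drivers = $config['drivers'] ?? null;
    if (!is_array($drivers)) {
        throw new InvalidArgumentException('Pipeline driver requires a "drivers" array.');
    }

    $names = [];
    foreach ($drivers as $driver) {
        if (!is_string($driver)) {
            throw new InvalidArgumentException('Pipeline driver names must be strings.');
        }
        $names[] = strtolower(trim($driver));
    }

    // Reject any casing of "pipeline" so the driver cannot recurse into itself.
    if (in_array('pipeline', $names, true)) {
        throw new InvalidArgumentException('The pipeline driver cannot include itself.');
    }

    return $names;
}
```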

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7e68e2a6-9fe2-4aa6-af22-3c68f10b933d

📥 Commits

Reviewing files that changed from the base of the PR and between a67ccd8 and 2728501.

📒 Files selected for processing (7)
  • src/BlaspManager.php
  • src/BlaspServiceProvider.php
  • src/Blaspable.php
  • src/Core/Matchers/PhoneticMatcher.php
  • src/Drivers/PatternDriver.php
  • src/Drivers/RegexDriver.php
  • src/Middleware/CheckProfanity.php
🚧 Files skipped from review as they are similar to previous changes (4)
  • src/Drivers/PatternDriver.php
  • src/Middleware/CheckProfanity.php
  • src/Drivers/RegexDriver.php
  • src/Core/Matchers/PhoneticMatcher.php

- RegexDriver: track masked ranges and use immutable normalized string
  for position lookups to prevent offset drift across mutated buffers
- RegexDriver: extract matched text from original input instead of
  normalized string to preserve original casing/characters
- PipelineDriver: pass explicit UTF-8 encoding to mb_substr calls
- PendingCheck: cap cache key tracking with configurable max_tracked_keys
  to prevent unbounded growth
- French config: remove mince/flûte/flute from profanity list as they
  are common benign words that cause false positives
- German config: move schwul/schwule/schwuler/schwules from extreme to
  moderate severity as they are also neutral self-identifiers
- StrMacroTest: replace toString() with (string) cast for Laravel 8 compat

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/Drivers/PipelineDriver.php (1)

63-69: Apply the same UTF-8 fix to PatternDriver and RegexDriver for consistency.

PatternDriver (lines 87–89 in src/Drivers/PatternDriver.php) and RegexDriver (lines 152–154 in src/Drivers/RegexDriver.php) call mb_substr() without an explicit encoding parameter, while PipelineDriver explicitly specifies 'UTF-8'. When these drivers are composed in a pipeline, the inconsistent encoding handling could produce different masking results for multibyte input if the internal encoding differs from UTF-8.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Drivers/PipelineDriver.php` around lines 63 - 69, PatternDriver and
RegexDriver currently call mb_substr() without the explicit encoding, causing
inconsistent behavior with PipelineDriver which uses 'UTF-8'; update the
mb_substr() calls in src/Drivers/PatternDriver.php and
src/Drivers/RegexDriver.php to pass 'UTF-8' as the final argument (same as in
PipelineDriver) so multibyte strings are handled consistently in PatternDriver
and RegexDriver (look for the mb_substr usages inside the classes PatternDriver
and RegexDriver and add the 'UTF-8' parameter).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/PendingCheck.php`:
- Around line 321-327: The current use of array_unique($keys) keeps the first
occurrence so frequently reused keys stay at the front and may be FIFO-evicted;
to refresh recency before dedupe/trim, dedupe while preserving the last
occurrence instead of the first: transform $keys so duplicates are removed but
the most recent appearance is kept (e.g. reverse $keys, call array_unique, then
reverse back or use a keyed map to keep last), then apply the existing
max-tracked-keys trimming using the same $maxKeys from
config('blasp.cache.max_tracked_keys', 1000); update the code around $keys,
array_unique and array_slice in PendingCheck (the block manipulating $keys) to
perform this reverse-dedupe-restore sequence so hot keys are considered recent
and not evicted.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ff8e6473-56ae-490d-b05c-1e687662615c

📥 Commits

Reviewing files that changed from the base of the PR and between 2728501 and a03f977.

📒 Files selected for processing (6)
  • config/languages/french.php
  • config/languages/german.php
  • src/Drivers/PipelineDriver.php
  • src/Drivers/RegexDriver.php
  • src/PendingCheck.php
  • tests/StrMacroTest.php
✅ Files skipped from review due to trivial changes (1)
  • config/languages/french.php
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/Drivers/RegexDriver.php

Comment thread src/PendingCheck.php
Comment on lines +321 to +327
$keys = array_unique($keys);

// Evict oldest keys when exceeding the configured limit
$maxKeys = config('blasp.cache.max_tracked_keys', 1000);
if (count($keys) > $maxKeys) {
    $keys = array_slice($keys, -$maxKeys);
}


⚠️ Potential issue | 🟠 Major

Refresh key recency before dedupe/trim to avoid stale cache survivors.

array_unique() keeps the first occurrence, so repeated keys are not moved to the “recent” end. With FIFO-style trimming, actively reused keys can be evicted from blasp_result_cache_keys, and then Dictionary::clearCache() may miss clearing them.

🔧 Proposed fix
-        $keys[] = $key;
-        $keys = array_unique($keys);
+        // Move key to most-recent position while preserving uniqueness
+        $keys = array_values(array_filter($keys, static fn ($k) => $k !== $key));
+        $keys[] = $key;

         // Evict oldest keys when exceeding the configured limit
-        $maxKeys = config('blasp.cache.max_tracked_keys', 1000);
+        $maxKeys = max(1, (int) config('blasp.cache.max_tracked_keys', 1000));
         if (count($keys) > $maxKeys) {
             $keys = array_slice($keys, -$maxKeys);
         }
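An alternative to the filter-and-append fix is to append the key, reverse the list, call `array_unique()`, and reverse back, so the most recent occurrence is the one kept. The sketch below shows that approach. `trackKey()` is a hypothetical standalone function, not the actual PendingCheck code; in the package the limit would come from `config('blasp.cache.max_tracked_keys', 1000)`.

```php
<?php
// Hypothetical standalone version of the key-tracking step (not the actual
// PendingCheck code). Appends $key, dedupes so that the LAST occurrence wins,
// then keeps only the $maxKeys most recent entries.
function trackKey(array $keys, string $key, int $maxKeys): array
{
    $keys[] = $key;

    // array_unique() keeps the first occurrence, so reverse first to make it
    // keep the most recent one, then reverse back (this also reindexes 0..n-1).
    $keys = array_reverse(array_unique(array_reverse($keys)));

    // Evict the oldest keys when exceeding the limit (guard against 0 or negatives).
    $maxKeys = max(1, $maxKeys);
    if (count($keys) > $maxKeys) {
        $keys = array_slice($keys, -$maxKeys);
    }

    return $keys;
}
```

Starting from `['a', 'b', 'c']` with a limit of 3, re-tracking `'a'` yields `['b', 'c', 'a']`; tracking `'d'` next evicts `'b'` rather than the recently used `'a'`, giving `['c', 'a', 'd']`. With plain `array_unique()`, `'a'` would stay at the front and be the key evicted. The result matches the proposed `array_filter()` fix; the choice between them is mostly readability.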

… type validation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@deemonic deemonic merged commit a6758c9 into main Mar 26, 2026
3 checks passed
@deemonic deemonic deleted the feature/v4-refactor branch March 26, 2026 12:49