Description
Check for existing issues
- Completed
Describe the feature
Summary
Vale currently relies on regex and dictionary-based matching, which makes it difficult to handle inflected word forms (e.g., pluralization, conjugation, declension). This can lead to incomplete rule coverage or complex and hard-to-maintain patterns.
Problem
When defining rules, users often need to account for multiple inflected forms of the same word manually. For example, a rule targeting a base form like run will not match:
- runs
- ran
- running
To compensate, users must either:
- enumerate all variants explicitly, or
- write complex regex patterns (e.g., run(s|ning)?)
This approach is error-prone and reduces the readability and maintainability of rules.
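To illustrate how easily a hand-written pattern misses a form, here is a minimal Python sketch that checks the example pattern against the forms listed above. The word-boundary anchors are my addition for the test, not something taken from Vale itself.

```python
import re

# The example pattern from the list above, wrapped in word boundaries
# (the anchors are an assumption made for this illustration).
PATTERN = re.compile(r"\brun(s|ning)?\b")

forms = ["run", "runs", "running", "ran"]

# Record, for each inflected form, whether the pattern finds it.
matched = {form: bool(PATTERN.search(form)) for form in forms}

for form, hit in matched.items():
    print(f"{form}: {'match' if hit else 'no match'}")
```

The irregular past tense "ran" slips through, so even this small pattern would need another alternative to cover all four forms.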
This limitation becomes significantly more severe in languages with richer morphology, such as German.
This video gives a short overview of what you can expect for most languages other than English: https://youtu.be/ettP9Ayrho8?is=g0hMRMZqVJE74K8p
Minimal example (in German)
Rule:
extends: substitution
message: Consider using '%s' instead of '%s'
level: warning
ignorecase: false
swap:
  gut: hervorragend
Text:
Das ist eine gute Lösung.
Current behavior:
- No match
Expected behavior:
- Match based on shared lemma (“gut” → “gute”)
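The difference between the current and the expected behavior can be sketched in a few lines of Python. This is only an illustration: it assumes the swap key is matched as a whole word, and the `LEMMAS` table and both helper functions are hypothetical names, not part of Vale.

```python
import re

# Hypothetical, hand-written lemma table. A real implementation would
# load this from a morphological dictionary instead.
LEMMAS = {"gute": "gut", "guten": "gut", "gutem": "gut", "guter": "gut"}


def literal_matches(key: str, text: str) -> list[str]:
    """Whole-word literal matching, as assumed for the current behavior."""
    return re.findall(rf"\b{re.escape(key)}\b", text)


def lemma_matches(key: str, text: str) -> list[str]:
    """Match every token whose lemma equals the rule key."""
    tokens = re.findall(r"\w+", text)
    return [token for token in tokens if LEMMAS.get(token, token) == key]


text = "Das ist eine gute Lösung."
print(literal_matches("gut", text))  # no hit: "gute" is a different token
print(lemma_matches("gut", text))    # hit via the shared lemma
```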
For example, the adjective „gut“ can appear in many forms depending on case, gender, and number:
- gut
- gute
- guten
- gutem
- guter
Similarly, verbs like „gehen“ produce forms such as:
- gehe, gehst, geht
- ging, gegangen
Covering these via regex or explicit lists quickly becomes impractical. As a result:
- rule definitions become bloated
- important variants are easily missed
- false negatives increase significantly
This makes Vale harder to use effectively for non-English content and limits its usefulness in multilingual environments.
Discussion / possible approaches
I understand that Vale is intentionally lightweight and primarily regex/dictionary-based, and that performance and simplicity are key design goals.
With that in mind, a possible direction could be:
- Macro expansion: a dedicated syntax in a rule triggers the expansion of a word into all of its valid forms through a dictionary, so that "Word A" in a rule becomes "Word A - Variant 1|Word A - Variant 2| ... |Word A - Variant n".
  - All of that happens before the actual linting, so it is only done once per linting run. Potentially, Vale could even cache the results for performance.
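A rough Python sketch of the expansion idea, to make the proposal more concrete. Everything here is an assumption for discussion: the `{{lemma:...}}` macro syntax, the `INFLECTIONS` dictionary, and the function names are hypothetical, and an actual implementation in Vale would look different.

```python
import re
from functools import lru_cache

# Hypothetical inflection dictionary (lemma -> known forms). In practice
# this would be loaded from a per-language dictionary file.
INFLECTIONS = {
    "gut": ("gut", "gute", "guten", "gutem", "guter"),
    "gehen": ("gehe", "gehst", "geht", "ging", "gegangen"),
}

# Hypothetical macro syntax that marks a word for expansion.
MACRO = re.compile(r"\{\{lemma:(\w+)\}\}")


@lru_cache(maxsize=None)
def expand(lemma: str) -> str:
    """Turn a lemma into an alternation of all its known forms (cached)."""
    forms = set(INFLECTIONS.get(lemma, ())) | {lemma}
    # Longest forms first, so the alternation prefers the longest match.
    ordered = sorted(forms, key=lambda form: (-len(form), form))
    return r"\b(?:" + "|".join(re.escape(form) for form in ordered) + r")\b"


def compile_rule(pattern: str) -> re.Pattern:
    """Expand all macros once, before linting, and compile the result."""
    return re.compile(MACRO.sub(lambda m: expand(m.group(1)), pattern))


rule = compile_rule("{{lemma:gut}}")
print(rule.findall("Das ist eine gute Lösung mit gutem Ergebnis."))
```

Because the expansion runs once when the rule is loaded, the linting step itself still only sees an ordinary regular expression.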
Benefits
- Simpler and more maintainable rules
- Better coverage with fewer false negatives
- Improved support for morphologically rich languages (German, Finnish, Slavic languages, etc.)
- Better usability in multilingual teams
Question
Would morphology-aware matching be considered within Vale’s scope? I could also try working on it if we agree on an approach that would be accepted as a PR.