Morphology-aware matching (inflection / lemmatization support) #1091

### Check for existing issues

- [x] Completed

### Describe the feature

**Summary**
Vale currently relies on regex and dictionary-based matching, which makes it difficult to handle inflected word forms (e.g., pluralization, conjugation, declension). This can lead to incomplete rule coverage or complex and hard-to-maintain patterns.

**Problem**
When defining rules, users often need to account for multiple inflected forms of the same word manually. For example, a rule targeting a base form like `run` will not match:

* runs
* ran
* running

To compensate, users must either:

* enumerate all variants explicitly, or
* write complex regex patterns (e.g., `run(s|ning)?`)

This approach is error-prone and reduces the readability and maintainability of rules. The limitation becomes significantly more severe in languages with richer morphology, such as German.

This video gives a short overview of what to expect for most languages other than English: https://youtu.be/ettP9Ayrho8?is=g0hMRMZqVJE74K8p
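To illustrate how easily a hand-written pattern misses a form, here is a small check of the `run(s|ning)?` pattern from the list above against the four English forms. It uses Python's `re` module purely for illustration; it is not Vale's regex engine, and the word-boundary anchors are an assumption about how such a pattern would be applied to whole words.

```python
import re

# The hand-written pattern from the example above, anchored to whole words.
# (The \b anchors are an assumption made for this illustration.)
pattern = re.compile(r"\brun(s|ning)?\b")

forms = ["run", "runs", "running", "ran"]

# Collect every form the pattern fails to cover.
missed = [word for word in forms if not pattern.fullmatch(word)]
print(missed)  # ['ran']
```

The irregular past tense `ran` slips through, because a suffix-based pattern cannot describe a stem change.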

**Minimal example (in German)**

Rule:

```yaml
extends: substitution
message: Consider using '%s' instead of '%s'
level: warning
ignorecase: false
swap:
  gut: hervorragend
```

Text:

```
Das ist eine gute Lösung.
```

Current behavior:

* No match

Expected behavior:

* Match based on the shared lemma (“gute” is an inflected form of “gut”)

For example, the adjective *„gut“* can appear in many forms depending on case, gender, and number:

* gut
* gute
* guten
* gutem
* guter

Similarly, verbs like *„gehen“* produce forms such as:

* gehe, gehst, geht
* ging, gegangen

Covering these via regex or explicit lists quickly becomes impractical. As a result:

* rule definitions become bloated
* important variants are easily missed
* false negatives increase significantly

This makes Vale harder to use effectively for non-English content and limits its usefulness in multilingual environments.

**Discussion / possible approaches**
I understand that Vale is intentionally lightweight and primarily regex/dictionary-based, and that performance and simplicity are key design goals.

With that in mind, a possible direction could be:

* Rules could use a macro-like syntax that triggers the expansion of a word into all of its valid forms via a dictionary: "Word A" in a rule becomes "Word A - Variant 1|Word A - Variant 2| ... |Word A - Variant n".
* The expansion happens before the actual linting, so it is done only once per linting run. Vale could potentially also cache the results for performance.
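A minimal sketch of that expansion idea, written in Python only to make the proposal concrete. Everything here is hypothetical: the `FORMS` table stands in for a real morphological dictionary, and `expand` and `compile_swap` are illustrative names, not part of Vale.

```python
import re
from functools import lru_cache

# Hypothetical inflection dictionary: lemma -> known surface forms.
# A real implementation would load this from a morphological dictionary.
FORMS = {
    "gut": ("gut", "gute", "guten", "gutem", "guter"),
    "gehen": ("gehen", "gehe", "gehst", "geht", "ging", "gegangen"),
}

@lru_cache(maxsize=None)  # each lemma is expanded only once per run
def expand(lemma: str) -> str:
    """Return a regex alternation covering every known form of `lemma`."""
    forms = FORMS.get(lemma, (lemma,))  # unknown words expand to themselves
    # Longest forms first, then alphabetical, for a deterministic pattern.
    ordered = sorted(set(forms), key=lambda f: (-len(f), f))
    return r"\b(?:" + "|".join(re.escape(f) for f in ordered) + r")\b"

def compile_swap(swap: dict[str, str]) -> list[tuple[re.Pattern, str]]:
    """Expand every key of a substitution table before linting starts."""
    return [(re.compile(expand(key)), value) for key, value in swap.items()]

rules = compile_swap({"gut": "hervorragend"})
text = "Das ist eine gute Lösung."
hits = [(m.group(0), repl) for pat, repl in rules for m in pat.finditer(text)]
print(hits)  # [('gute', 'hervorragend')]
```

Because the expansion produces an ordinary alternation, the matching step itself stays regex-based; only the rule-loading step would need access to a dictionary.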

**Benefits**

* Simpler and more maintainable rules
* Better coverage with fewer false negatives
* Improved support for morphologically rich languages (German, Finnish, Slavic languages, etc.)
* Better usability in multilingual teams

**Question**
Would morphology-aware matching be considered within Vale’s scope? I could also try working on it if we agree on an approach that would be accepted as a PR.
