1 change: 1 addition & 0 deletions docs/parsers.md
@@ -106,6 +106,7 @@ def test_my_parser_parse():

- [JSON Parser](parsers/json_parser.md): extracts structured fields from JSON-formatted logs.
- [Template Matcher](parsers/template_matcher.md): matches logs against a predefined set of `<*>` templates.
- [Template Tree Matcher](parsers/template_tree_matcher.md): matches logs against a predefined set of `<*>` templates using a tree structure.
- [LogBatcher Parser](parsers/logbatcher_parser.md): LLM-based parser that infers templates from raw logs with no training data.

Go back to [Index](index.md)
4 changes: 2 additions & 2 deletions docs/parsers/template_matcher.md
@@ -1,6 +1,6 @@
# Template matcher

The remplate matcher is a parser that takes a set of templates and matches them to incoming logs. It extracts parameters from positions marked with the <*> wildcard and returns a ParserSchema with the matched template and the extracted variables.
The template matcher is a parser that takes a set of templates and matches them to incoming logs. It extracts parameters from positions marked with the <*> wildcard and returns a ParserSchema with the matched template and the extracted variables.

| | Schema | Description |
|------------|----------------------------|--------------------|
@@ -47,7 +47,7 @@ login success: user=<*> source=<*>
Typical MatcherParser config options (fields in config class):

- `method_type`: must match the parser type ("matcher_parser" or configured name).
- `path_templates`: path to the newline-delimited template file.
- `path_templates`: path to the newline-delimited template file. If `null`, the matcher runs without templates.
- `remove_spaces` (bool, default True): remove all spaces during matching.
- `remove_punctuation` (bool, default True): strip punctuation except the `<*>` token.
- `lowercase` (bool, default True): lowercase logs and templates before matching.
79 changes: 79 additions & 0 deletions docs/parsers/template_tree_matcher.md
@@ -0,0 +1,79 @@
# Template Tree Matcher

The Template Tree Matcher is a parser that matches incoming logs against a set of templates using a tree-based depth-first search. It provides faster matching for batch workloads compared with the streaming Template Matcher.

This parser wraps functionality from the DetectMatePerformance project: https://github.com/ait-detectmate/DetectMatePerformance. Prefer it over the Template Matcher when parsing many log lines in non-streaming (batch) mode.
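The tree-based depth-first search can be illustrated with a small, self-contained sketch. It is not the DetectMatePerformance implementation: it assumes templates and logs are already normalized and split on whitespace, and that each `<*>` consumes exactly one token. `Node`, `build_tree`, and `match` are hypothetical names used only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field

WILDCARD = "<*>"


@dataclass
class Node:
    # Children reached by a literal token.
    children: dict[str, Node] = field(default_factory=dict)
    # Child reached by a <*> slot (at most one per node).
    wildcard: Node | None = None
    # Index of the template that ends at this node, if any.
    template_id: int | None = None


def build_tree(templates: list[str]) -> Node:
    """Insert each whitespace-tokenized template into a prefix tree."""
    root = Node()
    for tid, template in enumerate(templates):
        node = root
        for token in template.split():
            if token == WILDCARD:
                if node.wildcard is None:
                    node.wildcard = Node()
                node = node.wildcard
            else:
                node = node.children.setdefault(token, Node())
        if node.template_id is None:  # on duplicates, keep the first template
            node.template_id = tid
    return root


def match(root: Node, tokens: list[str]) -> tuple[int, list[str]]:
    """Depth-first search that tries literal edges before the wildcard edge.

    Returns (template_id, variables), or (-1, []) if no template matches.
    """

    def dfs(node: Node, i: int, variables: list[str]) -> tuple[int, list[str]] | None:
        if i == len(tokens):
            if node.template_id is None:
                return None
            return node.template_id, variables
        child = node.children.get(tokens[i])
        if child is not None:
            found = dfs(child, i + 1, variables)
            if found is not None:
                return found
        if node.wildcard is not None:
            # In this sketch a wildcard consumes exactly one token.
            return dfs(node.wildcard, i + 1, variables + [tokens[i]])
        return None

    result = dfs(root, 0, [])
    return result if result is not None else (-1, [])


root = build_tree(["pid <*> uid <*>", "login success user <*> source <*>"])
print(match(root, "pid 9699 uid 0".split()))        # (0, ['9699', '0'])
print(match(root, "this is not matching".split()))  # (-1, [])
```

Trying literal children before the wildcard, and backtracking when a branch fails, means a more specific template wins over a more generic one that shares its prefix.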

| | Schema | Description |
|------------|----------------------------|--------------------|
| **Input** | [LogSchema](../schemas.md) | Unstructured log |
| **Output** | [ParserSchema](../schemas.md) | Structured log |

WARNING: This parser is not yet in a stable release and may behave differently across platforms or hardware.

## Template format

- Templates are plain text lines in a template file.
- Use the token `<*>` to mark wildcard slots.

Example template file (templates.txt):
```text
pid=<*> uid=<*> auid=<*> ses=<*> msg='op=PAM:<*> acct=<*>
login success: user=<*> source=<*>
```

## Configuration

Typical TemplateTreeMatcher config options:

- `method_type` (string): parser type identifier (for example `"tree_matcher"`).
- `path_templates` (string or null): path to the newline-delimited template file. If `null`, the matcher runs without templates.
- `auto_config` (bool): whether to attempt an optional auto-configuration phase (not required).

Note: this matcher removes non-alphanumeric characters from logs and templates before matching, except for the `<*>` token. Ensure your templates are compatible with that normalization.
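A rough sketch of that normalization follows. The rule that each run of non-alphanumeric characters becomes a single space is an assumption, chosen because it turns the example template above into the `pid <*> uid <*> auid <*> ses <*> msg op PAM <*> acct <*>` form used in this PR's test; `normalize` is a hypothetical helper, not a DetectMatePerformance function.

```python
import re

WILDCARD = "<*>"


def normalize(text: str) -> str:
    """Turn each run of non-alphanumeric characters into one space, keeping <*>.

    Assumption for illustration: the real matcher may treat non-ASCII letters,
    underscores, or whitespace differently.
    """
    # Split around the wildcard so its own punctuation is not stripped.
    pieces = text.split(WILDCARD)
    cleaned = [re.sub(r"[^0-9A-Za-z]+", " ", piece) for piece in pieces]
    # Re-insert the wildcard as a separate token and collapse whitespace.
    return " ".join(f" {WILDCARD} ".join(cleaned).split())


print(normalize("pid=<*> uid=<*> auid=<*> ses=<*> msg='op=PAM:<*> acct=<*>"))
# pid <*> uid <*> auid <*> ses <*> msg op PAM <*> acct <*>
```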

Example YAML fragment:
```yaml
parsers:
  TemplateTreeMatcher:
    method_type: tree_matcher
    auto_config: False
    params:
      path_templates: path/to/templates.txt
```

## Usage example

Simple usage: load templates and match a log.

```python
from detectmatelibrary.parsers.tree_matcher import TemplateTreeMatcher
from detectmatelibrary import schemas

# instantiate parser (config can be a dict or a config object)
cfg = {
    "parsers": {
        "TreeMatcher": {
            "method_type": "tree_matcher",
            "params": {
                "path_templates": "tests/test_folder/test_templates.txt"
            }
        }
    }
}

parser = TemplateTreeMatcher(name="TreeMatcher", config=cfg)

# match a log: parse() fills the ParserSchema passed as the second argument
input_log = schemas.LogSchema({
    "logID": "0",
    "log": "pid=9699 uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:accounting acct=\"root\"'"
})
parsed = schemas.ParserSchema()
parser.parse(input_log, parsed)

# Inspect the filled fields:
print(parsed.template)   # matched template text, or "template not found"
print(parsed.variables)  # list of extracted parameters
print(parsed.EventID)    # template index, or -1 if no template matched
```

Go back to [Index](../index.md)
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -18,6 +18,7 @@ nav:
- Detectors: detectors.md
- Parsers Methods:
- Template Matcher: parsers/template_matcher.md
- Template Tree Matcher: parsers/template_tree_matcher.md
- Json Parser: parsers/json_parser.md
- LogBatcher Parser: parsers/logbatcher_parser.md
- Detectors Methods:
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -10,7 +10,6 @@ dependencies = [
"pydantic>=2.11.7",
"pyyaml>=6.0.3",
"regex>=2025.11.3",
"kafka-python>=2.3.0",
"openai>=2.26.0",
"tenacity>=9.1.4",
"scipy>=1.17.1",
@@ -19,6 +18,7 @@ dependencies = [
"numpy>=2.3.2",
"pandas>=2.3.2",
"polars>=1.38.1",
"detectmateperformance @ git+https://github.com/ait-detectmate/DetectMatePerformance@main",
]

[dependency-groups]
46 changes: 46 additions & 0 deletions src/detectmatelibrary/parsers/tree_matcher.py
@@ -0,0 +1,46 @@
from detectmatelibrary.common.parser import CoreParser, CoreParserConfig
from detectmatelibrary import schemas

from detectmateperformance.match_tree import TreeMatcher
from detectmateperformance.types_ import LogTemplates

from typing import Any
import warnings


class TemplateTreeMatcherConfig(CoreParserConfig):
    method_type: str = "tree_matcher"

    path_templates: str | None = None


class TemplateTreeMatcher(CoreParser):
    def __init__(
        self,
        name: str = "TreeMatcher",
        config: TemplateTreeMatcherConfig | dict[str, Any] = TemplateTreeMatcherConfig()
    ) -> None:

        if isinstance(config, dict):
            config = TemplateTreeMatcherConfig.from_dict(config, name)
        super().__init__(name=name, config=config)

        self.config: TemplateTreeMatcherConfig
        if self.config.path_templates is None:
            # No template file: build an empty tree, so no log will match.
            self.tree = TreeMatcher(LogTemplates([]))
        else:
            self.tree = TreeMatcher.from_file(self.config.path_templates)
        warnings.warn("Warning, this method is still in prototype phase, it may not be stable")

    def parse(
        self,
        input_: schemas.LogSchema,
        output_: schemas.ParserSchema
    ) -> None:

        parsed = self.tree.match_log(input_["log"], get_var=True)

        output_["EventID"] = parsed.get_all_events_ids()[0]
        values = parsed[0]
        # values[1] holds the extracted variables as one space-separated string.
        output_["variables"].extend(values[1].split(" "))
        # The tree marks wildcard slots as "VAR"; map them back to <*>.
        output_["template"] = values[0].replace("VAR", "<*>")
47 changes: 47 additions & 0 deletions tests/test_parsers/test_tree_macther.py
@@ -0,0 +1,47 @@
"""Most of the functionality is tested in DetectMatePerformance."""
from detectmatelibrary.parsers.tree_matcher import TemplateTreeMatcher
from detectmatelibrary import schemas


test_template = [
    "pid <*> uid <*> auid <*> ses <*> msg op PAM <*> acct <*>",
]
test_log_match = 'pid=9699 uid=0 auid=4294967295 ses=4294967295 msg=\'op=PAM:accounting acct="root"'
test_log_no_match = 'this is not matching'


class TestMatcherParserBasic:
    def test_no_path_templates(self):
        config_dict = {
            "parsers": {
                "TreeMatcher": {
                    "method_type": "tree_matcher",
                }
            }
        }
        parser = TemplateTreeMatcher(name="TreeMatcher", config=config_dict)
        input_log = schemas.LogSchema({"log": test_log_match})
        output_data = schemas.ParserSchema()
        parser.parse(input_log, output_data)

        assert output_data.template == "template not found"
        assert output_data.EventID == -1
        assert output_data.variables == [""]

    def test_successful_match(self):
        config_dict = {
            "parsers": {
                "TreeMatcher": {
                    "method_type": "tree_matcher",
                    "path_templates": "tests/test_folder/test_templates.txt"
                }
            }
        }
        parser = TemplateTreeMatcher(name="TreeMatcher", config=config_dict)
        input_log = schemas.LogSchema({"log": test_log_match})
        output_data = schemas.ParserSchema()
        parser.parse(input_log, output_data)

        assert output_data.template == test_template[0]
        assert output_data.EventID == 0
        assert len(output_data.variables) > 0