Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .claude-plugin/plugin.json
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
{
"name": "posthog",
"description": "Access PostHog analytics, feature flags, experiments, error tracking, and insights directly from Claude Code. Optionally capture Claude Code sessions to PostHog LLM Analytics.",
"version": "1.1.21",
"version": "1.1.22",
"author": {
"name": "PostHog",
"email": "hey@posthog.com",
Expand Down
2 changes: 1 addition & 1 deletion .codex-plugin/plugin.json
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
{
"name": "posthog",
"version": "1.0.20",
"version": "1.0.21",
"description": "Access PostHog analytics, feature flags, experiments, error tracking, and insights directly from Codex",
"author": {
"name": "PostHog",
Expand Down
2 changes: 1 addition & 1 deletion .cursor-plugin/plugin.json
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
{
"name": "posthog",
"displayName": "PostHog",
"version": "1.1.17",
"version": "1.1.18",
"description": "Access PostHog analytics, feature flags, experiments, error tracking, and insights directly from Cursor",
"author": {
"name": "PostHog",
Expand Down
2 changes: 1 addition & 1 deletion gemini-extension.json
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
{
"name": "posthog",
"version": "1.0.19",
"version": "1.0.20",
"description": "Access PostHog analytics, feature flags, experiments, error tracking, and insights directly from Gemini CLI",
"mcpServers": {
"posthog": {
Expand Down
2 changes: 2 additions & 0 deletions skills/.sync-manifest
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,9 @@ instrument-product-analytics
investigate-metric
investigating-replay
managing-experiment-lifecycle
managing-path-cleaning-rules
managing-subscriptions
planning-user-interviews
querying-posthog-data
setting-up-a-data-warehouse-source
signals
Expand Down
180 changes: 180 additions & 0 deletions skills/managing-path-cleaning-rules/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,180 @@
---
name: managing-path-cleaning-rules
description: 'Inspects URL paths and proposes, tests, orders, and applies project-level path cleaning rules so dynamic segments (numeric IDs, UUIDs, slugs, dates) collapse into readable aliases. Use when the user says "clean the paths", "normalize URLs", "group similar pages", "too many distinct paths", "/users/123 and /users/456 are the same page", "set up path cleaning", or asks why a Web analytics or Paths breakdown is fragmented across thousands of nearly-identical URLs. Covers regex syntax (re2), alias placeholder convention, rule ordering, the test workflow, and applying rules via the project-settings-update MCP tool.'
---

# Managing path cleaning rules

Path cleaning rules normalize `$pathname` and `$entry_pathname` so that pages
sharing the same template (`/users/123/profile`, `/users/456/profile`, …) collapse
into one row (`/users/<id>/profile`) in Web analytics tiles, Paths insights, and
any HogQL query that calls `apply_path_cleaning`. They are the right answer when
a breakdown is fragmented across thousands of near-identical URLs.

This skill teaches you how to:

- recognize when path cleaning is the right tool
- inspect real paths to find what needs cleaning
- write `regex` + `alias` rules in re2 syntax with the project's placeholder
convention
- test rules before saving them
- order rules so specific patterns aren't swallowed by generic ones
- apply the rules via MCP

## Data model

`Team.path_cleaning_filters` is a JSON list of `PathCleaningFilter` objects:

```json
{
"regex": "/users/\\d+/profile",
"alias": "/users/<id>/profile",
"order": 0
}
```

- **`regex`** — a [re2](https://github.com/google/re2/wiki/Syntax) pattern. No
need to escape `/`. Anchor with `^` / `$` when you mean it.
- **`alias`** — the literal replacement. Use angle-bracket placeholders
(`<id>`, `<slug>`, `<uuid>`, `<date>`) by convention so the cleaned path stays
human-readable. The alias is _not_ a regex template — backreferences are not
supported.
- **`order`** — integer. Rules apply **sequentially** in `order` ascending,
each rule's output feeds the next.

Application is `replaceRegexpAll(pathname, regex, alias)` per rule, chained.
Source: `posthog/hogql/property.py:613`.

## Workflow

### 1. Confirm path cleaning is the right move

Ask yourself: is the user complaining about cardinality (too many distinct paths
in a chart), or do they want a per-URL drill-down? Path cleaning is for the
former. If they want per-URL data, suggest a property filter on `$pathname`
instead.

### 2. Inspect the real paths

Don't guess at patterns — query them. With the `execute-sql` MCP tool:

```sql
SELECT properties.$pathname AS path, count() AS views
FROM events
WHERE event = '$pageview'
AND timestamp > now() - INTERVAL 7 DAY
GROUP BY path
ORDER BY views DESC
LIMIT 200
```

Scan the result for:

- numeric IDs: `/users/123`, `/orders/4242`
- UUIDs: `/sessions/8f3c1a3b-…`
- slugs: `/posts/why-i-love-posthog`
- dates: `/archive/2024-09-12`
- locales: `/en-US/`, `/fr-FR/`
- pagination: `?page=3`, `/page/3/`

### 3. Draft regex + alias

| Pattern | Example match | `regex` | `alias` |
| ------------------- | ---------------------- | ---------------------------- | ---------------------- |
| Numeric segment | `/users/123/profile` | `/users/\d+/profile` | `/users/<id>/profile` |
| UUID v4 | `/sessions/8f3c1a3b-…` | `/sessions/[0-9a-f-]{36}` | `/sessions/<uuid>` |
| Slug | `/posts/why-posthog` | `/posts/[a-z0-9-]+$` | `/posts/<slug>` |
| ISO date | `/archive/2024-09-12` | `/archive/\d{4}-\d{2}-\d{2}` | `/archive/<date>` |
| Locale prefix | `/en-US/about` | `^/[a-z]{2}-[A-Z]{2}/` | `/<locale>/` |
| Trailing query/page | `/blog?page=3` | `\?page=\d+$` | (empty alias drops it) |

Anchoring rules of thumb:

- start the regex with `^` only when the segment must be at the beginning of
the path
- end with `$` to keep a generic rule (e.g. `\d+$`) from matching mid-path
segments

### 4. Test before saving

Three options, pick one:

- **Settings page tester**: `/settings/project#path_cleaning` has a built-in
"test path" input that replays the full ordered chain.
- **Project HogQL** (via `execute-sql`):

```sql
SELECT replaceRegexpAll('/users/42/profile', '/users/\d+/profile', '/users/<id>/profile')
```

Chain `replaceRegexpAll` calls in the same order the rules will run if you
want to verify multi-rule interaction.

- **Built-in AI helper**: there is already an `AiRegexHelper` modal accessible
from the rule editor (`Help me with Regex` button) that turns natural
language into a regex. Suggest it to the user when they say "I don't know
regex" — but always validate the output against real paths via the tester.

### 5. Order rules from most-specific to most-general

Sequential application means a generic rule placed first will swallow
everything that should have hit a specific rule.

```text
order=0 /users/me/profile → /users/me/profile (specific, runs first)
order=1 /users/\d+/profile → /users/<id>/profile
order=2 /users/[a-z0-9-]+ → /users/<slug> (catch-all, runs last)
```

If `/users/[a-z0-9-]+` ran first it would also match `/users/me/profile` and
make the more specific rule unreachable.

### 6. Apply via MCP

Use the `project-settings-update` tool with the full list (the field is
replaced, not merged):

```json
{
"path_cleaning_filters": [
{ "regex": "/users/me/profile", "alias": "/users/me/profile", "order": 0 },
{ "regex": "/users/\\d+/profile", "alias": "/users/<id>/profile", "order": 1 },
{ "regex": "/users/[a-z0-9-]+", "alias": "/users/<slug>", "order": 2 }
]
}
```

Always **read the existing rules first** (project settings include
`path_cleaning_filters`) and merge — overwriting silently destroys whatever the
team has already configured.

## Where the rules apply

When the user (or a HogQL query) opts in:

- Web analytics: the **Path cleaning** toggle in the page header
(`PathCleaningToggle.tsx`)
- Paths insights: the path cleaning toggle in the insight filters
- HogQL: any query that calls `apply_path_cleaning(path_expr, team)`

The rules are stored once per project — they are not insight-scoped.

## Common pitfalls

- **Backreferences in `alias` need double-escaping** — ClickHouse's
`replaceRegexpAll` supports `\0` (whole match) and `\1`–`\9` (capture
groups). In a JSON field or SQL string literal the backslash must be
doubled, so use `\\1` in `path_cleaning_filters` / HogQL to get the `\1`
backreference at the ClickHouse layer.
- **Forgetting `$`** — `\d+` without an end anchor matches every numeric run
in any path, so `/blog/2024-09-12/post` becomes
`/blog/<num>-<num>-<num>/post` when you only meant to match the year
segment. Use `\d+$` or `\d+(/|$)` depending on intent.
- **Escaping `/`** — re2 does not require it. `\/` works but adds noise.
- **Case sensitivity** — re2 is case-sensitive by default. Use `(?i)` at the
start of the pattern for case-insensitive matching, e.g. `(?i)/users/\d+`.
- **Replacing the whole list** — `path_cleaning_filters` is overwrite, not
append. Always start from the current list.
- **Rules apply globally** — adding a rule can change historical numbers in
every Web analytics / Paths chart that has cleaning enabled. Warn the user
before applying anything destructive.
Loading