@jaffrepaul jaffrepaul commented Feb 2, 2026

Enhance traffic classification for docs site metrics with better bot detection and visibility into unknown traffic.

Summary

  • Add Next.js userAgent() for improved bot detection at middleware level
  • Track "unknown" traffic explicitly instead of silently skipping it
  • Add device type (mobile/tablet/desktop) to metrics for richer segmentation
  • Extract shared patterns and types to src/lib/trafficClassification.ts

Changes

Middleware (src/middleware.ts)

  • Use Next.js built-in userAgent() which has comprehensive bot detection (Google, Bing, social crawlers, etc.)
  • Classify traffic at middleware level and pass via request headers (not response headers)
  • Remove speculative markdown client patterns (vscode, intellij, sublime, got) - keep only confirmed AI tools
  • Clients can still request markdown via Accept header (text/markdown, text/plain)
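The middleware flow above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `ParsedUserAgent` mirrors only a subset of what `userAgent()` from `next/server` returns, the pattern contents are placeholders, and `classifyRequest`, `withClassificationHeaders`, the check order, and the `crawler` device type for bots are all assumptions.

```typescript
type TrafficType = 'ai_agent' | 'bot' | 'user' | 'unknown';
type DeviceType = 'desktop' | 'mobile' | 'tablet' | 'crawler' | 'unknown';

// Subset of the object returned by userAgent() from 'next/server'.
interface ParsedUserAgent {
  ua: string;
  isBot: boolean;
  device: {type?: string};
}

// Placeholder pattern (assumption); the real list lives in the shared module.
const AI_AGENT_PATTERN = /claudebot|cursor/i;

// Hypothetical helper: AI agents are checked before generic bots.
function classifyRequest(parsed: ParsedUserAgent): {
  trafficType: TrafficType;
  deviceType: DeviceType;
} {
  if (!parsed.ua) {
    return {trafficType: 'unknown', deviceType: 'unknown'};
  }
  if (AI_AGENT_PATTERN.test(parsed.ua)) {
    return {trafficType: 'ai_agent', deviceType: 'crawler'};
  }
  if (parsed.isBot) {
    return {trafficType: 'bot', deviceType: 'crawler'};
  }
  // Assumption: anything that is not mobile or tablet counts as desktop.
  const type = parsed.device.type;
  const deviceType: DeviceType =
    type === 'mobile' || type === 'tablet' ? type : 'desktop';
  return {trafficType: 'user', deviceType};
}

// Hypothetical helper: returns a copy of the REQUEST headers with the
// classification attached. In Next.js middleware the copy would be forwarded
// with NextResponse.next({request: {headers}}) so downstream code can read it.
function withClassificationHeaders(
  requestHeaders: Record<string, string>,
  parsed: ParsedUserAgent
): Record<string, string> {
  const {trafficType, deviceType} = classifyRequest(parsed);
  return {
    ...requestHeaders,
    'x-traffic-type': trafficType,
    'x-device-type': deviceType,
  };
}
```

Setting the values on the forwarded request rather than on the response is what lets code that runs later in the same request see them.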

Traces Sampler (src/tracesSampler.ts)

  • Read middleware classification from x-traffic-type and x-device-type request headers
  • Validate header values to prevent injection of invalid traffic types
  • Fall back to user-agent pattern matching when middleware headers are unavailable
  • Track "unknown" traffic explicitly (previously silently skipped, causing blind spots)
  • Add emitSamplingMetric() helper to reduce code repetition
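A rough sketch of the read, validate, and fall-back steps listed above. It assumes headers arrive as a plain object with lowercase keys; `isTrafficType`, `resolveTrafficType`, and the pattern contents are hypothetical names and placeholders, not the PR's actual code.

```typescript
type TrafficType = 'ai_agent' | 'bot' | 'user' | 'unknown';

const VALID_TRAFFIC_TYPES: readonly string[] = ['ai_agent', 'bot', 'user', 'unknown'];

// Placeholder patterns (assumption); the real ones live in the shared module.
const AI_AGENT_PATTERN = /claudebot|cursor/i;
const BOT_PATTERN = /googlebot|slackbot/i;

// Type guard: only values from the known set are accepted, so a client cannot
// inject an arbitrary traffic type through the header.
function isTrafficType(value: string | undefined): value is TrafficType {
  return value !== undefined && VALID_TRAFFIC_TYPES.indexOf(value) !== -1;
}

// Hypothetical resolver: trust a valid middleware header first, then fall back
// to user-agent pattern matching, then to "unknown".
function resolveTrafficType(
  headers: Record<string, string | undefined>
): TrafficType {
  const fromMiddleware = headers['x-traffic-type'];
  if (isTrafficType(fromMiddleware)) {
    return fromMiddleware;
  }
  const ua = headers['user-agent'];
  if (!ua) {
    return 'unknown';
  }
  if (AI_AGENT_PATTERN.test(ua)) {
    return 'ai_agent';
  }
  if (BOT_PATTERN.test(ua)) {
    return 'bot';
  }
  return 'user';
}
```

An invalid header value is ignored rather than rejected, so the request still gets classified from its user-agent.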

Shared Module (src/lib/trafficClassification.ts)

  • TrafficType - shared type definition
  • AI_AGENT_PATTERN - single source of truth for AI agent detection
  • BOT_PATTERN - single source of truth for bot detection
  • SAMPLE_RATES - lookup object for sample rates by traffic type
  • matchPattern() - shared utility function
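For orientation, the shared module might look something like the sketch below. Only the exported names come from the list above; the pattern contents, the mapping of rates to traffic types, and the `matchPattern()` signature are assumptions for illustration.

```typescript
// In the real module these declarations would be exported.
type TrafficType = 'ai_agent' | 'bot' | 'user' | 'unknown';

// Placeholder patterns (assumption): the real lists are longer.
const AI_AGENT_PATTERN = /claudebot|cursor/i;
const BOT_PATTERN = /googlebot|slackbot/i;

// The rates 0, 0.3 and 1 appear in the metrics table; which traffic type gets
// which rate is an assumption here.
const SAMPLE_RATES: Record<TrafficType, number> = {
  ai_agent: 1,
  user: 0.3,
  bot: 0,
  unknown: 0.3,
};

// Assumed signature: returns the lowercased token that matched, or undefined.
function matchPattern(userAgent: string, pattern: RegExp): string | undefined {
  const match = userAgent.match(pattern);
  return match ? match[0].toLowerCase() : undefined;
}
```

Returning the matched token instead of a boolean gives callers a value they can report as `agent_match` or `bot_match`.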

Metrics Attributes

The docs.trace.sampled metric now includes:

| Attribute | Values | Description |
| --- | --- | --- |
| traffic_type | ai_agent, bot, user, unknown | Classification of the request |
| device_type | desktop, mobile, tablet, crawler, unknown | Device type from user-agent |
| sample_rate | 0, 0.3, 1 | The applied sample rate |
| agent_match | e.g. claudebot, cursor | Which AI agent pattern matched |
| bot_match | e.g. googlebot, slackbot | Which bot pattern matched |
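As an illustration of how these attributes could be assembled before being attached to the docs.trace.sampled count, here is a small sketch. `SamplingDecision` and `buildSamplingAttributes` are hypothetical names, and omitting the match attributes when nothing matched is an assumption.

```typescript
interface SamplingDecision {
  trafficType: 'ai_agent' | 'bot' | 'user' | 'unknown';
  deviceType: 'desktop' | 'mobile' | 'tablet' | 'crawler' | 'unknown';
  sampleRate: number;
  agentMatch?: string;
  botMatch?: string;
}

// Hypothetical helper: maps a sampling decision onto the attribute names from
// the table. agent_match and bot_match are added only when a pattern matched.
function buildSamplingAttributes(
  decision: SamplingDecision
): Record<string, string | number> {
  const attributes: Record<string, string | number> = {
    traffic_type: decision.trafficType,
    device_type: decision.deviceType,
    sample_rate: decision.sampleRate,
  };
  if (decision.agentMatch) {
    attributes['agent_match'] = decision.agentMatch;
  }
  if (decision.botMatch) {
    attributes['bot_match'] = decision.botMatch;
  }
  return attributes;
}
```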

IS YOUR CHANGE URGENT?

  • Urgent deadline (GA date, etc.):
  • Other deadline:
  • None: Not urgent, can wait up to 1 week+

SLA

  • Teamwork makes the dream work, so please add a reviewer to your PRs.
  • Please give the docs team up to 1 week to review your PR unless you've added an urgent due date to it.

PRE-MERGE CHECKLIST

  • Checked Vercel preview for correctness, including links
  • PR was reviewed and approved by any necessary SMEs (subject matter experts)
  • PR was reviewed and approved by a member of the Sentry docs team

vercel bot commented Feb 2, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| develop-docs | Ready | Preview, Comment | Feb 9, 2026 9:44pm |
| sentry-docs | Ready | Preview, Comment | Feb 9, 2026 9:44pm |

jaffrepaul and others added 2 commits February 9, 2026 16:05
Middleware traces (Edge runtime) don't receive normalizedRequest from
Sentry SDK, causing all middleware traffic to be classified as "unknown".
Since each request generates both a middleware trace and a handler trace,
this was inflating the "unknown" count significantly.

Instead of emitting "unknown" when we can't classify traffic, skip the
metric entirely. The handler trace (Node.js) will emit the properly
classified metric.

Also adds case-insensitive header lookup since HTTP headers are
case-insensitive but JS objects are case-sensitive.
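A minimal sketch of the two ideas in this commit message: case-insensitive header lookup on a plain object, and skipping the metric when no user-agent is available. `getHeader` and `shouldEmitMetric` are hypothetical names, and the headers-as-plain-object shape is an assumption.

```typescript
// Looks up a header by name regardless of how the key is capitalized.
function getHeader(
  headers: Record<string, string | undefined>,
  name: string
): string | undefined {
  const wanted = name.toLowerCase();
  for (const key of Object.keys(headers)) {
    if (key.toLowerCase() === wanted) {
      return headers[key];
    }
  }
  return undefined;
}

// Hypothetical guard: when there are no headers or no user-agent (as in the
// Edge middleware trace described above), report that the metric should be
// skipped rather than counted as "unknown".
function shouldEmitMetric(
  headers: Record<string, string | undefined> | undefined
): boolean {
  if (!headers) {
    return false;
  }
  return Boolean(getHeader(headers, 'user-agent'));
}
```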

Co-Authored-By: Claude <noreply@anthropic.com>

- Use Next.js userAgent() for improved bot detection at middleware level
- Track "unknown" traffic explicitly (previously silently skipped)
- Add device type to metrics for richer segmentation
- Fix critical bug: set classification on request headers (not response)
  so tracesSampler can actually read them
- Extract shared patterns/types to src/lib/trafficClassification.ts
- Add header validation to prevent injection of invalid traffic types
- Remove speculative markdown client patterns (vscode, intellij, sublime, got)
- Add emitSamplingMetric helper to reduce repetition in tracesSampler

Co-Authored-By: Claude <noreply@anthropic.com>

Merge MARKDOWN_CLIENT_PATTERNS and AI_AGENT_PATTERN into a single
shared pattern in trafficClassification.ts. Both markdown serving
and traffic classification now use the same pattern.
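One way the single pattern could serve the markdown decision as well, sketched under assumptions: `wantsMarkdown` is a hypothetical name, the pattern contents are placeholders, and the Accept header check follows the text/markdown and text/plain types mentioned in the PR description.

```typescript
// Placeholder for the shared pattern (assumption); the real list is longer.
const AI_AGENT_PATTERN = /claudebot|cursor/i;

// Hypothetical helper: serve markdown when the client asks for it through the
// Accept header, or when its user-agent matches the shared AI agent pattern.
function wantsMarkdown(headers: Record<string, string | undefined>): boolean {
  const accept = (headers['accept'] || '').toLowerCase();
  if (
    accept.indexOf('text/markdown') !== -1 ||
    accept.indexOf('text/plain') !== -1
  ) {
    return true;
  }
  const ua = headers['user-agent'] || '';
  return AI_AGENT_PATTERN.test(ua);
}
```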

Co-Authored-By: Claude <noreply@anthropic.com>

Remove unnecessary abstraction - the helper was only wrapping a single
Sentry.metrics.count() call across 5 call sites.

Co-Authored-By: Claude <noreply@anthropic.com>
@jaffrepaul jaffrepaul force-pushed the fix/trace-sampler-unknown-metrics branch from cd700dc to 7edfa70 Compare February 9, 2026 21:34
@sergical sergical left a comment

🤖

@jaffrepaul jaffrepaul changed the title from "fix(metrics): Skip traffic metrics when user-agent unavailable" to "fix(metrics): Update handling metrics when user-agent unavailable" Feb 9, 2026
@jaffrepaul jaffrepaul merged commit e72afb0 into master Feb 9, 2026
15 checks passed
@jaffrepaul jaffrepaul deleted the fix/trace-sampler-unknown-metrics branch February 9, 2026 22:17