Replace page classifier with dit, add -fpt flag #2404

Merged
Mzack9999 merged 10 commits into dev from feat/dit-page-classifier
Mar 4, 2026
@dogancanbakir (Member) commented Feb 16, 2026

Proposed changes

Replace the built-in Naive Bayes page classifier with dit (20 page types, 8 form types, 79 field types). Add a -fpt/-filter-page-type flag that filters results by one or more page types. Deprecate -fep, which now acts as an alias for -fpt error.

  • Replace common/pagetypeclassifier/ with dit.Classifier
  • Add -fpt flag (e.g. -fpt login,captcha,parked)
  • Deprecate -fep with info message
  • KnowledgeBase now includes Forms with form type and field classifications
  • Bump Go to 1.25.7, update CI/CD workflows and Dockerfile
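
The -fep deprecation described above can be sketched as a small normalization step. This is a minimal illustration, not the code from this PR: the `Options` struct, its field names, `resolvePageTypeFilters`, and the wording of the notice are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// Options is a hypothetical stand-in for the CLI options.
// The field names are illustrative and not taken from the httpx source.
type Options struct {
	FilterErrorPage bool     // deprecated -fep
	FilterPageType  []string // values passed to -fpt
}

// resolvePageTypeFilters folds the deprecated -fep flag into the -fpt list.
// It returns the lowercased, de-duplicated page types and reports whether
// a deprecation notice should be shown.
func resolvePageTypeFilters(o Options) (filters []string, deprecated bool) {
	seen := make(map[string]struct{})
	add := func(v string) {
		v = strings.ToLower(strings.TrimSpace(v))
		if v == "" {
			return
		}
		if _, ok := seen[v]; ok {
			return
		}
		seen[v] = struct{}{}
		filters = append(filters, v)
	}
	for _, v := range o.FilterPageType {
		add(v)
	}
	if o.FilterErrorPage {
		// -fep behaves as an alias for "-fpt error".
		add("error")
		deprecated = true
	}
	return filters, deprecated
}

func main() {
	opts := Options{FilterErrorPage: true, FilterPageType: []string{"Login"}}
	filters, deprecated := resolvePageTypeFilters(opts)
	if deprecated {
		// Illustrative message text only.
		fmt.Println("-fep is deprecated, use -fpt error instead")
	}
	fmt.Println(strings.Join(filters, ","))
}
```

Folding the old flag into the new list keeps a single filtering code path, so -fep and -fpt error cannot drift apart in behavior.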

Closes #2403

Proof

  • httpx -u https://github.com/login -json — KnowledgeBase shows PageType: login + Forms
  • -fpt login filters login pages, -fpt error filters error pages
  • -fpt login,error filters multiple types, case-insensitive
  • -fep backward compat filters error pages + shows deprecation message
  • go build ./... and go test ./... pass
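
The multi-value, case-insensitive matching checked above could look roughly like the sketch below. It assumes that "filter" means excluding matching results, as with httpx's other filter flags; `parsePageTypes` and `shouldFilter` are hypothetical helpers, and the page type string stands in for whatever the classifier returns.

```go
package main

import (
	"fmt"
	"strings"
)

// parsePageTypes splits a comma-separated -fpt value into a set of
// lowercase page types, skipping empty entries.
func parsePageTypes(raw string) map[string]struct{} {
	set := make(map[string]struct{})
	for _, part := range strings.Split(raw, ",") {
		if p := strings.ToLower(strings.TrimSpace(part)); p != "" {
			set[p] = struct{}{}
		}
	}
	return set
}

// shouldFilter reports whether a result with the given page type matches
// one of the requested filters. Comparison is case-insensitive.
func shouldFilter(pageType string, filters map[string]struct{}) bool {
	_, ok := filters[strings.ToLower(pageType)]
	return ok
}

func main() {
	filters := parsePageTypes("Login,ERROR")
	for _, pt := range []string{"login", "error", "captcha"} {
		fmt.Printf("%s filtered=%v\n", pt, shouldFilter(pt, filters))
	}
}
```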

Checklist

  • Pull request is created against the dev branch
  • All checks passed (lint, unit/integration/regression tests etc.) with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Summary by CodeRabbit

  • Chores

    • Bumped Go toolchain to 1.25.7 and updated base builder image; refreshed dependency set.
  • New Features

    • Added -fpt / --filter-page-type flag to filter output by page type.
  • Deprecations

    • Deprecated -fep / --filter-error-page; retained for backward compatibility with a deprecation notice.
  • Removals

    • Removed the previous page-type classifier, its dataset, and associated tests.
  • Documentation

    • README updated with new flag examples and Go requirement.


Labels

Type: Enhancement
