feat: add file content identification with MIME type detection#5

Draft
Copilot wants to merge 6 commits into main from copilot/add-file-content-identification

Copilot AI commented Dec 3, 2025

Adds automatic content type detection to identify HTTP requests, URL lists, and file MIME types.

Changes

New pkg/contentid package

  • MIME type detection via gabriel-vasile/mimetype
  • HTTP request detection (GET, POST, etc., with headers)
  • URL list detection (content that is at least 70% URLs)
  • Pure Go implementation, no CGO dependencies
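
The two heuristics listed above could look roughly like the following. This is a minimal sketch using only the standard library, not the code in pkg/contentid: the function names, the set of recognized methods, and the choice to measure the 70% threshold over non-empty lines are all assumptions made for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"net/url"
	"strings"
)

// httpMethods is an assumed set of request methods; the PR may accept others.
var httpMethods = map[string]bool{
	"GET": true, "POST": true, "PUT": true, "DELETE": true,
	"PATCH": true, "HEAD": true, "OPTIONS": true,
}

// looksLikeHTTPRequest reports whether data starts with a request line
// ("METHOD target HTTP/x.y") followed by at least one "Name: value" header.
func looksLikeHTTPRequest(data string) bool {
	lines := strings.Split(strings.ReplaceAll(data, "\r\n", "\n"), "\n")
	if len(lines) < 2 {
		return false
	}
	parts := strings.Fields(lines[0])
	if len(parts) != 3 || !httpMethods[parts[0]] || !strings.HasPrefix(parts[2], "HTTP/") {
		return false
	}
	name, _, ok := strings.Cut(lines[1], ":")
	return ok && name != "" && !strings.ContainsAny(name, " \t")
}

// urlRatio returns the fraction of non-empty lines that parse as absolute
// http(s) URLs with a host. Scanner errors (e.g. very long lines) are
// ignored in this sketch.
func urlRatio(data string) float64 {
	var total, urls int
	sc := bufio.NewScanner(strings.NewReader(data))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		total++
		u, err := url.Parse(line)
		if err == nil && (u.Scheme == "http" || u.Scheme == "https") && u.Host != "" {
			urls++
		}
	}
	if total == 0 {
		return 0
	}
	return float64(urls) / float64(total)
}

// urlListThreshold mirrors the 70% figure from the description; applying it
// per line is an assumption.
const urlListThreshold = 0.7

// looksLikeURLList reports whether enough of the content is URLs.
func looksLikeURLList(data string) bool {
	return urlRatio(data) >= urlListThreshold
}

func main() {
	req := "GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"
	list := "https://example.com/a\nhttps://example.com/b\nnot a url\nhttp://example.org\n"
	fmt.Println(looksLikeHTTPRequest(req), looksLikeURLList(list))
}
```

Checking for an HTTP request before falling back to the URL-list and MIME checks seems like the natural order, since a raw request would otherwise be reported only as text/plain.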

CLI integration

  • Added -identify flag for standalone content identification
  • Supports both human-readable and JSON output (-jsonl)
  • Works with existing -f and -l file input flags

Usage

# Human-readable output
ph -identify -f /path/to/files/*.txt

# JSON output
ph -identify -jsonl -f /path/to/files/*.txt

Example output:

/tmp/request.txt: HTTP request [text/plain; charset=utf-8] (HTTP Request)
/tmp/urls.txt: URL list [text/plain; charset=utf-8] (URL List)
/tmp/data.json: JSON data [application/json]
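
The example output lines above follow a "path: description [MIME] (kind)" pattern, with the parenthesized kind omitted for ordinary files. The sketch below shows one way both output modes might be produced from a single record. The Result type, its field names, and the JSON keys are hypothetical, since the PR description does not show the -jsonl schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Result is a hypothetical record for one identified file. The field and
// JSON key names are assumptions, not the PR's actual schema.
type Result struct {
	Path        string `json:"path"`
	Description string `json:"description"`
	MIME        string `json:"mime"`
	Kind        string `json:"kind,omitempty"` // e.g. "HTTP Request", "URL List"
}

// Human renders the "path: description [mime] (kind)" form, omitting the
// parenthesized kind when it is empty.
func (r Result) Human() string {
	s := fmt.Sprintf("%s: %s [%s]", r.Path, r.Description, r.MIME)
	if r.Kind != "" {
		s += fmt.Sprintf(" (%s)", r.Kind)
	}
	return s
}

// JSONLine renders the result as a single JSON object on one line.
func (r Result) JSONLine() (string, error) {
	b, err := json.Marshal(r)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	r := Result{
		Path:        "/tmp/request.txt",
		Description: "HTTP request",
		MIME:        "text/plain; charset=utf-8",
		Kind:        "HTTP Request",
	}
	fmt.Println(r.Human())
	if line, err := r.JSONLine(); err == nil {
		fmt.Println(line)
	}
}
```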

Note: This uses the gabriel-vasile/mimetype library instead of magika, which the issue suggested, to avoid the complexity of CGO and ONNX Runtime while providing equivalent functionality for this use case.

Warning

I tried to connect to the following addresses, but was blocked by firewall rules:

  • https://api.github.com//advisories
    • Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node /home/REDACTED/work/_temp/ghcca-node/node/bin/node --enable-source-maps /home/REDACTED/work/_temp/copilot-developer-action-main/dist/index.js (http block)

Original prompt

This section details the original issue you should resolve

<issue_title>feat: add file content identification</issue_title>
<issue_description>ph should be able to detect what type of content it is given (raw request, file with URLs, etc.)

We can use magika for this.</issue_description>

Comments on the Issue (you are @copilot in this section)


Copilot AI and others added 5 commits December 3, 2025 19:35
Co-authored-by: ocervell <9629314+ocervell@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add file content identification feature" to "feat: add file content identification with MIME type detection" Dec 3, 2025
Copilot AI requested a review from ocervell December 3, 2025 19:44

Development

Successfully merging this pull request may close these issues.

feat: add file content identification

2 participants