fix(onboarding): parse X URLs with www. and bare-domain prefixes #994

Open
vimzh wants to merge 3 commits into supermemoryai:main from vimzh:fix/research-extract-handle-urls
vimzh commented May 24, 2026

The extractHandle helper in the onboarding research route relies on a few naive .replace() calls to strip protocol+domain prefixes, then splits on / and ?. Anything other than the exact form https://x.com/handle or https://twitter.com/handle produces garbage. Confirmed locally:

  • https://www.x.com/vansh returns "https:"
  • x.com/vansh returns "x.com"
  • https://mobile.twitter.com/foo returns "https:"

That junk handle then flows into the Grok call, so we burn an LLM request and the user gets back research about the wrong "handle".
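
To make the failure mode concrete, here is a minimal sketch of a helper that behaves the way the description says the old one did. The previous source is not shown in this PR text, so `naiveExtractHandle` and its exact `.replace()` calls are a hypothetical reconstruction, not the original code.

```typescript
// Hypothetical reconstruction: strips only the two exact prefixes and one "@",
// then cuts at the first "/" and the first "?".
function naiveExtractHandle(input: string): string {
  const stripped = input
    .trim()
    .replace("https://x.com/", "")
    .replace("https://twitter.com/", "")
    .replace("@", "")
  const beforeSlash = stripped.split("/")[0] ?? ""
  return beforeSlash.split("?")[0] ?? ""
}

console.log(naiveExtractHandle("https://x.com/vansh"))            // "vansh"
console.log(naiveExtractHandle("https://www.x.com/vansh"))        // "https:"
console.log(naiveExtractHandle("x.com/vansh"))                    // "x.com"
console.log(naiveExtractHandle("https://mobile.twitter.com/foo")) // "https:"
```

Because neither prefix matches, the whole URL survives the `.replace()` calls and the split on `/` returns whatever sits before the first slash.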

This rewrites extractHandle using the URL constructor, mirroring the parseXAccount pattern that already exists in apps/web/app/api/onboarding/account-status/route.ts. A regex fallback handles inputs the URL constructor refuses. After extraction the handle is validated against the 1-15 char alphanumeric+underscore X handle rule, so anything still malformed returns a clean 400 instead of a wasted Grok call.
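
The validation step can be sketched roughly as follows. The names `X_HANDLE_PATTERN`, `isValidXHandle`, `checkHandle`, and the error message are assumptions for illustration; the route's actual response shape is not shown in this PR text, so a plain result object stands in for the HTTP response.

```typescript
// X handle rule as stated in the PR: 1-15 characters, letters, digits, or underscore.
const X_HANDLE_PATTERN = /^[A-Za-z0-9_]{1,15}$/

function isValidXHandle(handle: string): boolean {
  return X_HANDLE_PATTERN.test(handle)
}

// Hypothetical result type standing in for the route's 400 response.
type HandleCheck =
  | { ok: true; handle: string }
  | { ok: false; status: 400; error: string }

function checkHandle(handle: string): HandleCheck {
  if (!isValidXHandle(handle)) {
    // Fail fast before any model call is made.
    return { ok: false, status: 400, error: "Invalid X handle" }
  }
  return { ok: true, handle }
}
```

Under this rule the junk values from the list above ("https:" and "x.com") are rejected, since `:` and `.` fall outside the allowed character set.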

The parsing logic is now duplicated between this route and account-status. A shared helper would be nice but felt out of scope for this one.

vimzh added 2 commits May 24, 2026 11:44
…ch extractHandle

The previous extractHandle only stripped exact "https://x.com/" / "https://twitter.com/"
prefixes (and one leading "@"), so common URL formats produced garbage:

  https://www.x.com/vansh        -> "https:"
  x.com/vansh                    -> "x.com"
  https://mobile.twitter.com/foo -> "https:"

That junk handle was then passed straight to the Grok call, wasting an LLM
request and returning unusable research to the user.

Rewrite the parser to use the URL constructor (mirroring the more robust
parseXAccount in apps/web/app/api/onboarding/account-status/route.ts) so any
subdomain or bare-domain form works. Validate the resulting handle against the
1-15 char alphanumeric+underscore rule before invoking the model; fail fast
with a 400 instead of burning a model call on a bad input.
Copilot AI review requested due to automatic review settings May 24, 2026 07:03
@graphite-app graphite-app Bot requested a review from Dhravya May 24, 2026 07:03
Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Updates the onboarding research API to more robustly extract an X/Twitter handle from various input formats and reject invalid handles early.

Changes:

  • Reworked extractHandle to handle @handle, bare handles, and URL inputs via URL parsing + fallback regex.
  • Added server-side validation enforcing X/Twitter handle constraints (1–15 chars, alphanumeric/underscore) and returning 400 on failure.

Comment on lines +10 to +26
function extractHandle(input: string): string {
  const trimmed = input.trim()
  if (!trimmed) return ""

  let handle = trimmed.replace(/^@+/, "")
  const lower = handle.toLowerCase()

  if (lower.includes("x.com") || lower.includes("twitter.com")) {
    try {
      const parsed = new URL(
        handle.startsWith("http://") || handle.startsWith("https://")
          ? handle
          : `https://${handle}`,
      )
      handle = parsed.pathname.split("/").filter(Boolean)[0] ?? ""
    } catch {
      handle = handle.match(/(?:x\.com|twitter\.com)\/([^/\s?#]+)/i)?.[1] ?? ""

Copilot review flagged that 'lower.includes("x.com")' and the regex
fallback match any substring of the input, so a URL like
'https://examplex.com/foo' or 'notwitter.com/foo' would parse, extract
'foo' as a handle, pass the 1-15 alphanumeric guard, and burn a Grok
call on a wrong handle.

Add an isXHost helper that checks the parsed hostname against x.com /
twitter.com (and their subdomains) instead of relying on substring
matches. Anchor the regex fallback to '^', '.', or '/' before the
domain so the same substring trap does not apply there either.
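
A minimal sketch of what such a host check and anchored fallback might look like is given below. The commit's actual implementation is not shown here, so the bodies of `isXHost` and the helper names `X_HOSTS`, `X_URL_FALLBACK`, and `fallbackHandle` are assumptions based only on the description above.

```typescript
const X_HOSTS = ["x.com", "twitter.com"]

// True for x.com / twitter.com and any of their subdomains, compared on the
// full hostname rather than by substring.
function isXHost(hostname: string): boolean {
  const host = hostname.toLowerCase()
  return X_HOSTS.some((domain) => host === domain || host.endsWith(`.${domain}`))
}

// Fallback pattern: the domain must sit at the start of the input or directly
// after a "." or "/", so "examplex.com" and "notwitter.com" no longer match.
const X_URL_FALLBACK = /(?:^|[./])(?:x\.com|twitter\.com)\/([^/\s?#]+)/i

function fallbackHandle(input: string): string {
  return input.match(X_URL_FALLBACK)?.[1] ?? ""
}
```

In the parser, `isXHost` would be applied to the `hostname` of the parsed URL before the first path segment is read. Note that the anchored regex still accepts an X domain that appears as a path segment of another site, so it is best treated as a last resort after URL parsing fails.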
2 participants