fix(onboarding): parse X URLs with www. and bare-domain prefixes #994
Open
vimzh wants to merge 3 commits into
…ch extractHandle

The previous extractHandle only stripped exact "https://x.com/" / "https://twitter.com/" prefixes (and one leading "@"), so common URL formats produced garbage:

- https://www.x.com/vansh -> "https:"
- x.com/vansh -> "x.com"
- https://mobile.twitter.com/foo -> "https:"

That junk handle was then passed straight to the Grok call, wasting an LLM request and returning unusable research to the user.

Rewrite the parser to use the URL constructor (mirroring the more robust parseXAccount in apps/web/app/api/onboarding/account-status/route.ts) so any subdomain or bare-domain form works. Validate the resulting handle against the 1-15 char alphanumeric+underscore rule before invoking the model; fail fast with a 400 instead of burning a model call on a bad input.
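To make the failure mode concrete, here is a rough reconstruction of the kind of prefix stripping described above. The original function body is not shown in this PR view, so naiveExtractHandle and its exact steps are assumptions chosen only to reproduce the reported outputs.

```typescript
// Hypothetical reconstruction of the old behaviour; not the actual removed code.
function naiveExtractHandle(input: string): string {
  const stripped = input
    .trim()
    // Only these two exact prefixes are recognised.
    .replace("https://x.com/", "")
    .replace("https://twitter.com/", "")
    // Strip a single leading "@".
    .replace(/^@/, "")

  // Keep whatever precedes the first "/" and then the first "?".
  const beforeSlash = stripped.split("/")[0] ?? ""
  return beforeSlash.split("?")[0] ?? ""
}

console.log(naiveExtractHandle("https://x.com/vansh"))            // "vansh"
console.log(naiveExtractHandle("https://www.x.com/vansh"))        // "https:"
console.log(naiveExtractHandle("x.com/vansh"))                    // "x.com"
console.log(naiveExtractHandle("https://mobile.twitter.com/foo")) // "https:"
```

Because no prefix matches in the last three cases, the first "/"-delimited segment of the untouched input is returned, which is where "https:" and "x.com" come from.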
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Updates the onboarding research API to more robustly extract an X/Twitter handle from various input formats and reject invalid handles early.
Changes:
- Reworked extractHandle to handle @handle, bare handles, and URL inputs via URL parsing + fallback regex.
- Added server-side validation enforcing X/Twitter handle constraints (1–15 chars, alphanumeric/underscore) and returning 400 on failure.
Comment on lines +10 to +26
    function extractHandle(input: string): string {
      const trimmed = input.trim()
      if (!trimmed) return ""

      let handle = trimmed.replace(/^@+/, "")
      const lower = handle.toLowerCase()

      if (lower.includes("x.com") || lower.includes("twitter.com")) {
        try {
          const parsed = new URL(
            handle.startsWith("http://") || handle.startsWith("https://")
              ? handle
              : `https://${handle}`,
          )
          handle = parsed.pathname.split("/").filter(Boolean)[0] ?? ""
        } catch {
          handle = handle.match(/(?:x\.com|twitter\.com)\/([^/\s?#]+)/i)?.[1] ?? ""
Copilot review flagged that 'lower.includes("x.com")' and the regex
fallback match any substring of the input, so a URL like
'https://examplex.com/foo' or 'notwitter.com/foo' would parse, extract
'foo' as a handle, pass the 1-15 alphanumeric guard, and burn a Grok
call on a wrong handle.
Add an isXHost helper that checks the parsed hostname against x.com /
twitter.com (and their subdomains) instead of relying on substring
matches. Anchor the regex fallback to '^', '.', or '/' before the
domain so the same substring trap does not apply there either.
The extractHandle helper in the onboarding research route relies on a few naive .replace() calls to strip protocol+domain prefixes, then splits on / and ?. Anything other than the exact form https://x.com/handle or https://twitter.com/handle produces garbage. Confirmed locally:

- https://www.x.com/vansh returns "https:"
- x.com/vansh returns "x.com"
- https://mobile.twitter.com/foo returns "https:"

That junk handle then flows into the Grok call, so we burn an LLM request and the user gets back research about the wrong "handle".

This rewrites extractHandle using the URL constructor, mirroring the parseXAccount pattern that already exists in apps/web/app/api/onboarding/account-status/route.ts. A regex fallback handles inputs the URL constructor refuses. After extraction the handle is validated against the 1-15 char alphanumeric+underscore X handle rule, so anything still malformed returns a clean 400 instead of a wasted Grok call.

The parsing logic is now duplicated between this route and account-status. A shared helper would be nice but felt out of scope for this one.
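For reference, one possible shape for the shared helper mentioned above. Everything here is hypothetical: the name parseXHandle, the null-on-failure contract, and the decision to omit the regex fallback for brevity are assumptions, and the existing parseXAccount may look quite different.

```typescript
// Hypothetical shared helper; not code from this PR.
const X_DOMAINS = ["x.com", "twitter.com"]
// X handle rule as stated in the PR: 1-15 chars, alphanumeric or underscore.
const HANDLE_RULE = /^[A-Za-z0-9_]{1,15}$/

function parseXHandle(input: string): string | null {
  const trimmed = input.trim().replace(/^@+/, "")
  if (!trimmed) return null

  let candidate = trimmed
  const hasScheme = /^https?:\/\//i.test(trimmed)

  // Treat the input as a URL only if it has a scheme or a path separator.
  if (hasScheme || trimmed.includes("/")) {
    try {
      const url = new URL(hasScheme ? trimmed : `https://${trimmed}`)
      const host = url.hostname.toLowerCase()
      const onX = X_DOMAINS.some((d) => host === d || host.endsWith(`.${d}`))
      if (!onX) return null
      // First non-empty path segment is the handle candidate.
      candidate = url.pathname.split("/").filter(Boolean)[0] ?? ""
    } catch {
      return null
    }
  }

  return HANDLE_RULE.test(candidate) ? candidate : null
}

console.log(parseXHandle("https://www.x.com/vansh"))  // "vansh"
console.log(parseXHandle("x.com/vansh"))              // "vansh"
console.log(parseXHandle("https://examplex.com/foo")) // null
```

A route using a helper like this could respond with a 400 whenever it returns null, so the model call only happens for inputs that already satisfy the handle rule.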