Skip to content

Reject canonical duplicate bounty submission URLs#254

Closed
yui-stingray wants to merge 1 commit into
ramimbo:mainfrom
yui-stingray:yui/bounty-228-canonical-submission-url
Closed

Reject canonical duplicate bounty submission URLs#254
yui-stingray wants to merge 1 commit into
ramimbo:mainfrom
yui-stingray:yui/bounty-228-canonical-submission-url

Conversation

@yui-stingray
Copy link
Copy Markdown
Contributor

Bounty #228

Summary

  • Canonicalize bounty submission URLs before duplicate checks and storage.
  • Normalize GitHub issue/pull URLs for scheme, host, owner, repo, issue/pull segment casing, and trailing-slash variants.
  • Compare existing legacy submission URLs through the same canonical form before allowing another bounty payment.
  • Preserve query strings and fragments, and keep non-GitHub trailing slashes unchanged.

Evidence

Before this change, a multi-award bounty could treat equivalent GitHub proof URLs such as https://GITHUB.com/Ramimbo/MergeWork/PULL/12/ and https://github.com/ramimbo/mergework/pull/12 as separate submissions.

Validation

  • uv run --extra dev python -m pytest tests/test_ledger.py -q --capture=no -> 19 passed
  • uv run --extra dev python -m pytest -q --capture=no -> 208 passed, 2 warnings
  • uv run --extra dev ruff check app/ledger/service.py tests/test_ledger.py -> passed
  • uv run --extra dev ruff format --check app/ledger/service.py tests/test_ledger.py -> passed
  • uv run --extra dev python -m mypy app -> passed
  • uv run --extra dev python scripts/docs_smoke.py -> docs smoke ok
  • uv run --extra dev python scripts/check_agents.py -> AGENTS.md ok
  • git diff --check -> clean

No secrets, wallet material, private vulnerability details, deployment values, payout details, or MRWK price claims are included.

Copy link
Copy Markdown

@weilixiong weilixiong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review ✅ APPROVED

QUALITY_GATE: LOW — 2f/20+, reject canonical duplicate submission URLs.

  • Uses urlunparse to normalize URLs before comparison
  • GITHUB_SUBMISSION_PATH_RE regex for GitHub issue/PR URL matching
  • Prevents duplicate submissions with same canonical URL
  • Test coverage confirmed

Bounty: MW #219 (round 7, 40 MRWK)

Copy link
Copy Markdown
Contributor

@MolhamHamwi MolhamHamwi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No blockers found on this change. Evidence checked:

  • Inspected app/ledger/service.py canonicalization and duplicate-detection path: pay_bounty() now stores canonical submission URLs, performs an exact canonical match first, and scans legacy submissions through the same canonicalizer for backward compatibility.
  • Checked that GitHub issue/PR URLs normalize scheme/host/repo/kind case and trailing slash while preserving query/fragment and leaving non-GitHub path semantics intact.
  • Inspected tests/test_ledger.py coverage for current canonical duplicates, legacy mixed-case/trailing-slash duplicates, and non-GitHub trailing slash preservation.
  • Local validation on PR head 270b3be: uv run --python 3.12 --extra dev python -m pytest tests/test_ledger.py -q -> 19 passed; uv run --python 3.12 --extra dev ruff check app/ledger/service.py tests/test_ledger.py -> passed; uv run --python 3.12 --extra dev ruff format --check app/ledger/service.py tests/test_ledger.py -> already formatted; git diff --check origin/main...HEAD -> clean.
  • Hosted CI Quality, readiness, docs, and image checks is green.

No secrets, wallet material, payout details, private deployment values, private vulnerability details, or MRWK price claims were reviewed or disclosed.

Copy link
Copy Markdown

@Thanhdn1984 Thanhdn1984 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review for Bounty #219 / PR #254:

I reviewed the canonical submission URL guard and the regression coverage. The change looks sound.

Checks performed:

  • Confirmed canonical_submission_url() validates via validate_public_url() first, then only normalizes GitHub issue/pull URLs when host is exactly github.com and URL params are absent.
  • Confirmed normalization covers scheme/host casing, owner/repo casing, issues|pull segment casing, and one trailing slash via GITHUB_SUBMISSION_PATH_RE.fullmatch().
  • Confirmed non-GitHub URLs keep path semantics, including trailing slash, via test_submission_url_canonicalization_preserves_non_github_trailing_slash.
  • Confirmed both new-payment and legacy-stored-payment duplicate paths are tested: test_multi_award_bounty_rejects_canonical_duplicate_submission_url and test_multi_award_bounty_rejects_legacy_canonical_duplicate_submission_url.

One small note, not blocking: the PR body says query strings/fragments are preserved. They are preserved, so GitHub URLs that differ only by query/fragment still remain distinct. That matches the stated behavior, but if the product goal is strict duplicate prevention for PR/issue evidence, maintainers may later want to decide whether ?utm=... or #... variants should collapse too.

No security/privacy concerns found in the diff.

@ramimbo ramimbo added the mrwk:rejected Submission rejected label May 26, 2026
@ramimbo
Copy link
Copy Markdown
Owner

ramimbo commented May 26, 2026

Closing this stale small-fix PR to clear the filled #228/#292 queue. #292 is now filled, no fresh small-fixes round is open, and this remaining work is not accepted in this pass.

@ramimbo ramimbo closed this May 26, 2026
@yui-stingray yui-stingray deleted the yui/bounty-228-canonical-submission-url branch May 26, 2026 01:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

mrwk:rejected Submission rejected

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants