
Add PII-safe EIN hash lookup for importers #413

Open
drsteinerdj wants to merge 1 commit into vjuliaife:main from drsteinerdj:bounty/ein-hash-importers


@drsteinerdj


Summary

Fixes #243.

Adds a PII-safe EIN equality lookup path for importers by storing a SHA-256 hex digest alongside the existing plaintext/encrypted EIN migration path.

Changes

  • Adds ein_hash TEXT to the fresh importers table DDL.
  • Adds an idempotent migration for existing deployments:
    • ALTER TABLE importers ADD COLUMN IF NOT EXISTS ein_hash TEXT
    • backfills ein_hash from existing plaintext ein values with encode(sha256(ein::bytea), 'hex')
    • creates the partial unique index idx_importers_ein_hash for non-null hashes.
  • Computes ein_hash with Node crypto.createHash('sha256') during importer registration.
  • Updates the admin importer registration script so scripted/imported records also populate ein_hash.
  • Documents in db.ts that plaintext EIN is scheduled for AES-GCM encryption while ein_hash handles PII-safe lookup.
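The changes listed above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual diff: the `MIGRATION_STATEMENTS` name and the `WHERE` clause on the backfill are assumptions, and the real `db.ts` may structure the migration differently.

```typescript
import { createHash } from "node:crypto";

// Hypothetical list of idempotent statements; names and ordering are assumptions.
const MIGRATION_STATEMENTS: readonly string[] = [
  // Safe to rerun: does nothing when the column already exists.
  "ALTER TABLE importers ADD COLUMN IF NOT EXISTS ein_hash TEXT",
  // Assumed predicate: only touch rows that have an EIN but no hash yet,
  // so rerunning the backfill is a no-op.
  "UPDATE importers SET ein_hash = encode(sha256(ein::bytea), 'hex') " +
    "WHERE ein IS NOT NULL AND ein_hash IS NULL",
  // Partial unique index: rows without a hash are not constrained.
  "CREATE UNIQUE INDEX IF NOT EXISTS idx_importers_ein_hash " +
    "ON importers(ein_hash) WHERE ein_hash IS NOT NULL",
];

// Application-side digest used at registration time.
// Returns null when no EIN was supplied so the column stays NULL.
function hashEin(ein?: string): string | null {
  return ein ? createHash("sha256").update(ein, "utf8").digest("hex") : null;
}
```

For lookups to work, the SQL backfill and the Node helper must hash byte-identical input, so any normalization of the EIN (for example, stripping hyphens) would need to be applied consistently on both sides.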

Validation

  • git diff --check
  • rg 'WHERE\s+ein\s*=|INSERT INTO importers|ein_hash|createHash|sha256\(ein::bytea\)' apps/api/src scripts -S
  • npm run build --workspace=apps/api currently fails on pre-existing main-branch TypeScript errors unrelated to this change:
    • src/routes/erasure.ts: possibly undefined request
    • src/routes/importers.ts: existing PRICE_ORACLE_CONTRACT_ID env typing mismatch
    • src/stellar.ts: missing @tariffshield/sdk type resolution

@vercel

vercel Bot commented Jun 27, 2026


Deployment failed with the following error:

The `vercel.json` schema validation failed with the following message: should NOT have additional property `rootDirectory`

Learn More: https://vercel.com/docs/concepts/projects/project-configuration

@vercel

vercel Bot commented Jun 27, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
tariff-shield-web | Error | Error | Jun 27, 2026 7:28pm

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 92a4677b38


Comment thread apps/api/src/db.ts
Comment on lines +328 to +329
CREATE UNIQUE INDEX IF NOT EXISTS idx_importers_ein_hash
ON importers(ein_hash) WHERE ein_hash IS NOT NULL;


[P1] Avoid bricking migrations on duplicate legacy EINs

On any existing deployment that already has two importer rows with the same non-null ein, the backfill assigns both rows the same ein_hash, so creating this unique index aborts migrate() and prevents the service from starting. The previous schema allowed such duplicates: importers.ein was not unique and registration only checked user_id. If global EIN uniqueness is required, the migration needs to detect and dedupe legacy conflicts first, or defer the unique constraint until the data has been cleaned.
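One way to act on this suggestion is a pre-check that looks for duplicate legacy EINs before the unique index is attempted. The sketch below is only an illustration: `ImporterRow`, `FIND_DUPLICATE_EINS_SQL`, `findDuplicateEins`, and `planUniqueIndex` are hypothetical names, and the in-memory grouping stands in for what the SQL query would return.

```typescript
// Hypothetical row shape; only the columns needed for the check.
interface ImporterRow {
  id: number;
  ein: string | null;
}

// Assumed query a migration could run to find conflicts directly in Postgres.
const FIND_DUPLICATE_EINS_SQL =
  "SELECT ein, COUNT(*) AS n FROM importers " +
  "WHERE ein IS NOT NULL GROUP BY ein HAVING COUNT(*) > 1";

const UNIQUE_INDEX_DDL =
  "CREATE UNIQUE INDEX IF NOT EXISTS idx_importers_ein_hash " +
  "ON importers(ein_hash) WHERE ein_hash IS NOT NULL";

// Groups row ids by EIN and keeps only EINs shared by more than one row.
function findDuplicateEins(rows: readonly ImporterRow[]): Map<string, number[]> {
  const byEin = new Map<string, number[]>();
  rows.forEach((row) => {
    if (row.ein === null) return; // NULL EINs never collide in a partial index
    const ids = byEin.get(row.ein) ?? [];
    ids.push(row.id);
    byEin.set(row.ein, ids);
  });
  const duplicates = new Map<string, number[]>();
  byEin.forEach((ids, ein) => {
    if (ids.length > 1) duplicates.set(ein, ids);
  });
  return duplicates;
}

// Returns the index DDL when the data is clean, or null to defer the
// unique constraint until the conflicts have been resolved.
function planUniqueIndex(duplicates: Map<string, number[]>): string | null {
  return duplicates.size === 0 ? UNIQUE_INDEX_DDL : null;
}
```

With this shape, migrate() could log the conflicting row ids and skip the index instead of aborting startup; whether to dedupe automatically or leave that to an operator is a policy choice the PR would need to make.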


Comment on lines +25 to +27
function hashEin(ein?: string): string | null {
  return ein ? createHash("sha256").update(ein).digest("hex") : null;
}


[P2] Use a keyed digest for EIN lookup

For the PII-safe lookup path, a plain SHA-256 of an EIN can be reversed by offline enumeration if a database snapshot or read-only access leaks: EINs have a small, fixed format (nine digits, so at most 10^9 candidates). The same digest is persisted for new registrations and backfilled in the migration, so the lookup column still exposes the identifier even after plaintext EINs are encrypted or removed. Use a secret-keyed HMAC (pepper) for deterministic equality instead.
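A keyed variant of the helper could look like the sketch below. It is an assumption-laden illustration rather than a proposed patch: `hmacEin` is a hypothetical name, the 32-byte minimum is an arbitrary choice made here, and how the pepper is stored and rotated is left open.

```typescript
import { createHmac } from "node:crypto";

// Hypothetical keyed replacement for hashEin. The pepper is a server-side
// secret that must never be stored in the same database as ein_hash.
function hmacEin(ein: string | undefined, pepper: Buffer): string | null {
  if (!ein) return null; // keep the column NULL when no EIN was supplied
  if (pepper.length < 32) {
    // Assumed minimum for this sketch: reject short or empty keys.
    throw new Error("pepper must be at least 32 bytes");
  }
  // Deterministic for a fixed pepper, so equality lookups still work,
  // but enumerating EINs offline now also requires the secret.
  return createHmac("sha256", pepper).update(ein, "utf8").digest("hex");
}
```

One consequence of this design is that the backfill can no longer use the built-in `sha256()` in SQL; it would have to run in application code, or possibly through an extension such as pgcrypto with the key supplied at migration time.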


Development

Successfully merging this pull request may close these issues.

Add ein_hash Column (SHA-256 of EIN) to importers for PII-Safe Queries

1 participant