Add PII-safe EIN hash lookup for importers #413
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 92a4677b38
    CREATE UNIQUE INDEX IF NOT EXISTS idx_importers_ein_hash
      ON importers(ein_hash) WHERE ein_hash IS NOT NULL;
Avoid bricking migrations on duplicate legacy EINs
On any existing deployment that already has two importer rows with the same non-null ein (which the previous schema allowed, since importers.ein was not unique and registration only checked user_id), the backfill will assign the same ein_hash to both rows and this unique index creation will abort migrate(), preventing the service from starting. If global EIN uniqueness is required, the migration needs to detect/dedupe legacy conflicts first or defer the unique constraint until data has been cleaned.
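One way to detect the legacy conflicts before creating the index is sketched below. This is a minimal illustration, not the PR's code: the `ImporterRow` shape and the `findDuplicateEinHashes` and `chooseIndexDdl` helpers are hypothetical, and it assumes the migration can read the existing rows before it issues the index DDL.

```typescript
import { createHash } from "node:crypto";

// Hypothetical row shape; only id and ein matter for this check.
interface ImporterRow {
  id: number;
  ein: string | null;
}

// Same digest the PR stores in ein_hash.
function hashEin(ein: string): string {
  return createHash("sha256").update(ein).digest("hex");
}

// Group row ids by ein_hash and keep only hashes shared by two or more rows.
function findDuplicateEinHashes(rows: ImporterRow[]): Map<string, number[]> {
  const byHash = new Map<string, number[]>();
  for (const row of rows) {
    if (!row.ein) continue; // no EIN means no hash, mirroring hashEin in the diff
    const hash = hashEin(row.ein);
    const ids = byHash.get(hash) ?? [];
    ids.push(row.id);
    byHash.set(hash, ids);
  }
  const duplicates = new Map<string, number[]>();
  byHash.forEach((ids, hash) => {
    if (ids.length > 1) duplicates.set(hash, ids);
  });
  return duplicates;
}

// Pick index DDL that cannot abort on legacy conflicts:
// UNIQUE only when no two rows share a hash, a plain index otherwise.
function chooseIndexDdl(rows: ImporterRow[]): string {
  const unique = findDuplicateEinHashes(rows).size === 0 ? "UNIQUE " : "";
  return (
    `CREATE ${unique}INDEX IF NOT EXISTS idx_importers_ein_hash ` +
    "ON importers(ein_hash) WHERE ein_hash IS NOT NULL;"
  );
}
```

If the fallback non-unique index is created under the same name, a later `CREATE UNIQUE INDEX IF NOT EXISTS` would be skipped because the name already exists, so the upgrade to a unique index after cleanup would need to drop and recreate it.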
    function hashEin(ein?: string): string | null {
      return ein ? createHash("sha256").update(ein).digest("hex") : null;
    }
Use a keyed digest for EIN lookup
For the PII-safe lookup path, storing a plain SHA-256 of an EIN is reversible by offline enumeration if a database snapshot or read-only access leaks, because EINs have a small, fixed format (nine digits, so at most 10^9 candidates). This same digest is persisted for new registrations and backfilled in the migration, so once plaintext EINs are encrypted or removed, the lookup column still exposes the identifier. Use a secret-keyed HMAC (a pepper) for deterministic equality instead.
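A keyed variant of `hashEin` might look like the sketch below. It assumes a secret pepper is available to the API process from outside the database (an environment variable or secret manager, for example); the `hmacEin` name and the way the pepper is passed in are hypothetical, not part of the PR.

```typescript
import { createHmac } from "node:crypto";

// Keyed, deterministic digest for equality lookup.
// `pepper` is an assumed secret that must never be stored next to the data.
function hmacEin(ein: string | undefined, pepper: Buffer): string | null {
  if (!ein) return null; // same null behavior as hashEin in the diff
  return createHmac("sha256", pepper).update(ein).digest("hex");
}

// Illustrative usage with throwaway keys.
const pepperA = Buffer.from("example-pepper-a");
const pepperB = Buffer.from("example-pepper-b");
const sameKeyTwice =
  hmacEin("12-3456789", pepperA) === hmacEin("12-3456789", pepperA);
const differentKeys =
  hmacEin("12-3456789", pepperA) === hmacEin("12-3456789", pepperB);
console.log({ sameKeyTwice, differentKeys });
```

The digest stays deterministic for a given key, so equality lookup still works, but an attacker holding only the database cannot enumerate EINs without the pepper. The SQL backfill would then also need the key, for instance through pgcrypto's `hmac()` or by running the backfill application-side.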
Summary
Fixes #243.
Adds a PII-safe EIN equality lookup path for importers by storing a SHA-256 hex digest alongside the existing plaintext/encrypted EIN migration path.
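The lookup path described above can be sketched roughly as follows. The `hashEin` function matches the one in the diff; everything else (the in-memory `importersByEinHash` map standing in for the `importers` table, and the `registerImporter` and `findImporterByEin` helpers) is hypothetical and only illustrates equality lookup by digest.

```typescript
import { createHash } from "node:crypto";

function hashEin(ein?: string): string | null {
  return ein ? createHash("sha256").update(ein).digest("hex") : null;
}

// Hypothetical in-memory stand-in for importers rows, keyed by ein_hash.
const importersByEinHash = new Map<string, { id: number }>();

// Returns false when another importer already holds the same EIN digest.
function registerImporter(id: number, ein?: string): boolean {
  const einHash = hashEin(ein);
  if (einHash === null) return true; // no EIN, nothing to index
  if (importersByEinHash.has(einHash)) return false;
  importersByEinHash.set(einHash, { id });
  return true;
}

// Equality lookup never compares plaintext EINs, only digests.
function findImporterByEin(ein: string): { id: number } | undefined {
  const einHash = hashEin(ein);
  return einHash === null ? undefined : importersByEinHash.get(einHash);
}
```

Because the digest covers the exact input string, `12-3456789` and `123456789` produce different hashes, so callers would presumably need to agree on one EIN format before hashing.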
Changes
- Adds `ein_hash TEXT` to the fresh `importers` table DDL.
- Adds `ALTER TABLE importers ADD COLUMN IF NOT EXISTS ein_hash TEXT`.
- Backfills `ein_hash` from existing plaintext `ein` values with `encode(sha256(ein::bytea), 'hex')`.
- Creates `idx_importers_ein_hash` for non-null hashes.
- Computes `ein_hash` with Node `crypto.createHash('sha256')` during importer registration.
- [...] `ein_hash`.
- Notes in `db.ts` that plaintext EIN is scheduled for AES-GCM encryption while `ein_hash` handles PII-safe lookup.
Validation
- `git diff --check`
- `rg 'WHERE\\s+ein\\s*=|INSERT INTO importers|ein_hash|createHash|sha256\\(ein::bytea\\)' apps/api/src scripts -S`
- `npm run build --workspace=apps/api` currently reaches pre-existing main-branch TypeScript errors unrelated to this change:
  - `src/routes/erasure.ts`: possibly undefined `request`
  - `src/routes/importers.ts`: existing `PRICE_ORACLE_CONTRACT_ID` env typing mismatch
  - `src/stellar.ts`: missing `@tariffshield/sdk` type resolution