Skip to content

refactor: migrate to AWS SDK v3 (S3)#2293

Draft
aalemayhu wants to merge 1 commit into
mainfrom
refactor/aws-sdk-v3-s3
Draft

refactor: migrate to AWS SDK v3 (S3)#2293
aalemayhu wants to merge 1 commit into
mainfrom
refactor/aws-sdk-v3-s3

Conversation

@aalemayhu
Copy link
Copy Markdown
Contributor

@aalemayhu aalemayhu commented May 15, 2026

Summary

Migrate the S3 surface from aws-sdk v2 (end-of-support 2024-09-08) to the modular @aws-sdk/client-s3 v3. The prod box's pm2 logs print the EOL warning on every restart; this clears it and re-enables security updates.

Files migrated

  • src/lib/storage/StorageHandler.ts — central S3Client; getFileContents now drains the v3 stream into a Buffer so consumers that read .Body keep working unchanged. getContents properly paginates via ContinuationToken. getPresignedUrl uses @aws-sdk/s3-request-presigner.
  • src/services/DownloadService.ts — return type S3.BodyBuffer; isMissingDownloadError structural check.
  • src/infrastracture/adapters/fileConversion/convertPDFToImages.ts — param type S3.BodyBuffer | Uint8Array | string.
  • src/infrastracture/adapters/fileConversion/ConvertPPTToPDF.ts — same.
  • src/usecases/uploads/getPackagesFromZip.ts — same.
  • src/lib/zip/zip.tsxFile.contents typed against Buffer | Uint8Array | string. Outside the original migration plan — flagged because it was a touch-up the worker did to satisfy types, not part of the file list from the research phase. Worth a careful read.
  • package.json — drop aws-sdk ^2.1502.0; add @aws-sdk/client-s3 ^3.1048.0 and @aws-sdk/s3-request-presigner ^3.1048.0 (both pinned to the same minor — mismatched versions fail tsc on @smithy/types).

Static e2e checks (all 7 passed locally)

  • pnpm test — 169/169 suites green, 1179 tests pass
  • pnpm typecheck — clean
  • pnpm build — clean
  • grep -rE \"from 'aws-sdk'\" src/ — 0 hits
  • grep -rE \"require\\('aws-sdk'\\)\" src/ — 0 hits
  • grep '\"aws-sdk\":' package.json — 0 hits
  • grep '@aws-sdk/client-s3' package.json — 1 hit

What the static checks DON'T cover (and why this PR is held in draft)

The Jest suite mocks the SDK at the module boundary, so no test in this PR actually exercises the v3 client against real S3 stream-handling behavior. The greps confirm "no v2 left," not "v3 works correctly." The risky paths are dynamic and need manual e2e:

Manual e2e recipe (run in a fresh session before flipping ready)

  1. pnpm dev — start server + web.
  2. PPTX with embedded images. Upload a .pptx that has at least one embedded image. Exercises: ConvertPPTToPDF → S3 putObject → convertPDFToImages → S3 getObject → image array. Confirm: the .apkg downloads, imports into Anki, cards render with the right media.
  3. ZIP upload. Upload a multi-deck .zip. Exercises: getPackagesFromZip → S3 list/get. Confirm: each deck inside the zip converts and downloads.
  4. Large file (>50 MB). Upload one big .apkg or PDF. Watch pnpm dev's memory footprint. The new getFileContents drains the v3 stream into a Buffer in one shot via transformToByteArray() — large files now occupy full memory before any consumer sees them, vs. v2 which also returned a Buffer but with different backpressure characteristics. Mostly a yellow flag, not red — but worth watching once.
  5. Showcase / deck preview. Open any showcase deck on the running app. Exercises StorageHandler.getFileContents in the read-only path.
  6. Confirm pm2 logs on the dev terminal shows no AWS SDK v2 EOL warning at startup.

If any of the above breaks, the fix is local to the worker's diff — see "Notes for reviewer" below for the most likely culprits.

Notes for reviewer

  • Pagination bug fix (behavior change beyond the SDK swap). The old getContents called listObjects with the same params every iteration (no Marker advancement), so it could only ever return page 1 — capped at 1000 objects. v3 uses ContinuationToken correctly. If any downstream code assumed "showcase will never have more than 1000 objects" or "this list is always exhaustive on first call," it now sees more results. Worth a grep for callers of getContents() and a sanity check.
  • Stream-to-Buffer drain. getFileContents uses transformToByteArray() then wraps in Buffer.from(...). This loads the entire object into memory before returning. Fine for typical decks (single-digit MB) but a regression risk for very large .apkgs. If the deploy reveals memory pressure, the fix is to stream directly to the consumer rather than buffer first — but that's a follow-up, not a blocker.
  • zip.tsx change is wider than the file list. The original research plan listed 5 source files; the worker added src/lib/zip/zip.tsx to satisfy types. Quick read recommended.

Sonar

sonar-scanner was not installed in the worker environment; expect a possible Sonar bounce on the first CI run after flipping ready. Happy to address any cognitive-complexity smells in a follow-up commit.

Pre-merge checklist

Per CLAUDE.md ("Touching auth, payments, or external-API integration? Run /security-review before merge."):

  • Manual e2e recipe (steps 1–6 above) executed locally with no regressions
  • /security-review run against the diff (S3 = external-API integration)
  • All statusCheckRollup entries non-FAILURE
  • pm2 logs on prod confirmed no v2 EOL warning post-deploy

🤖 Generated with Claude Code


<a href="https://app.blacksmith.sh/2anki/codesmith/server/pr/2293\"><source media="(prefers-color-scheme: dark)" srcset="https://pr-comments-assets.blacksmith.sh/codesmith/view-in-codesmith-dark.svg\"><source media="(prefers-color-scheme: light)" srcset="https://pr-comments-assets.blacksmith.sh/codesmith/view-in-codesmith-light.svg\"><img alt="View in Codesmith" src="https://pr-comments-assets.blacksmith.sh/codesmith/view-in-codesmith-dark.svg\">
Need help on this PR? Tag @codesmith with what you need.

  • Let Codesmith autofix CI failures and bot reviews

aws-sdk v2 has reached end-of-support (the prod box's pm2 logs show the
EOL warning on every restart). Replace it with the modular v3 client so
security updates resume and the warning clears.

Why now: AWS announced v2 end-of-support on 2024-09-08; security patches
have stopped. https://aws.amazon.com/blogs/developer/announcing-end-of-support-for-aws-sdk-for-javascript-v2/

Changes:
- StorageHandler: rebuilt on @aws-sdk/client-s3 S3Client; getFileContents
  now eagerly drains the v3 Readable body into a Buffer so existing
  consumers that read `.Body` keep working unchanged.
- getContents: paginates via ContinuationToken (the v2 loop never
  advanced — IsTruncated alone could not move past page 1).
- getPresignedUrl: uses @aws-sdk/s3-request-presigner.
- DownloadService, convertPDFToImages, ConvertPPTToPDF, getPackagesFromZip,
  zip.tsx: replace `S3.Body`/`Body` type with `Buffer | Uint8Array | string`
  (the body shapes the conversion paths actually receive).
- DownloadService.isMissingDownloadError: switch from the removed
  `AWS.AWSError` ambient type to a structural check on `error.name`.

Deps:
- Drop `aws-sdk` ^2.1502.0.
- Add `@aws-sdk/client-s3` ^3.1048.0 and `@aws-sdk/s3-request-presigner`
  ^3.1048.0 (pinned to the same version to keep `@smithy/types` deduped;
  mismatched minors fail tsc with HandlerExecutionContext incompatibility).

No user-visible behavior change; no changelog entry.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@sonarqubecloud
Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant