
feat: add interceptor to bypass serialization for heavy resp #2524

Draft

minottic wants to merge 1 commit into master from fast_pub

Conversation

@minottic
Member

@minottic minottic commented Feb 6, 2026

Description

The thumbnail in the publication data can be quite big (we have cases of >8MB). In certain envs (e.g. a pod in k8s) this causes the backend to be killed with OOM. The cause seems to be nestjs's post-serialization logic. This PR allows bypassing the additional nestjs interceptors and avoids the memory peak.

If this feature is not desired by other facilities, I will be happy to make it conditional or configurable.
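As a rough illustration of the bypass idea (a sketch, not this PR's actual code: the real interceptor would implement NestJS's `NestInterceptor` and work with the framework's response object, which is stubbed out here):

```typescript
// Sketch only. In the real PR the interceptor implements NestJS's
// NestInterceptor (intercept(context, next) with rxjs); here the
// Express-style response is reduced to the two methods we need, so the
// core bypass can be shown standalone.

// Minimal stand-in for the part of Express's Response used here.
interface RawResponse {
  setHeader(name: string, value: string): void;
  end(body: string): void;
}

// Serialize the handler's result exactly once and write it straight to
// the response, so NestJS's serialization interceptors never receive
// (and never re-copy) the multi-megabyte payload.
function sendFastResponse(res: RawResponse, payload: unknown): void {
  const body = JSON.stringify(payload); // single serialization pass
  res.setHeader("Content-Type", "application/json");
  res.end(body); // response is finished; later interceptors see nothing
}
```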

@minottic minottic requested a review from a team as a code owner February 6, 2026 15:32
@Junjiequan
Member

This looks like quite a hacky solution to me.
In which field of publishedData do you store the images? I'm not very familiar with the publishedData flow

@minottic
Member Author

it's stored in the thumbnail, which, as long as it's under the 16MB limit, is fine for mongo. What I could do is make the fastResponse thingy parametric and check if the provided field has length > 5 MB; otherwise keep the nestjs serialisers

@HayenNico
Member

The thumbnail field for publishedData is deprecated for APIv4 and only kept for backwards compatibility with APIv3 - the intended way to do this is to ingest an attachment separately (attachments can point to publishedData, proposals and samples besides datasets).
My guess would be the memory spike comes from running the publishedData.metadata including the encoded image through validation after

@minottic
Member Author

minottic commented Feb 18, 2026

My guess would be the memory spike comes from running the publishedData.metadata including the encoded image through validation after

yes, the spike happens after all the logic has run, from retrieving from the DB to running the class-transform from the DB schema to v3. It also appears with both v3 and v4, so, even though I also suspected it had to do with class-transform, it seems to happen after that. Removing the afterAll nestJS interceptors avoids it, so it feels like the problem is there, maybe when nestJS tries to compute the content-length before sending the response.

I am happy to change this PR though, if it's too PSI specific.

What I could see is:

  1. the FastInterceptor takes another field in the constructor, and I call it in V3 with new FastResponseInterceptor("thumbnail") (and similarly in v4). It will then check if the thumbnail in the data is > 5MB; if not, it continues with the nestjs defaults, if yes, it runs the current logic
  2. I resize our historical thumbnails and modify the DTO to only allow thumbnails smaller than a given size (say 3 or 5 MB)

Both 1 and 2 would work for us, so I'm happy to do either.
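The threshold check from option 1 could be sketched like this (the name `shouldBypassSerialization` and the 5 MB default are assumptions for illustration, not code from this PR):

```typescript
// Sketch of option 1's decision logic (assumed names and default, not code
// from this PR): bypass the nestjs serializers only when the configured
// field is heavier than the threshold, otherwise keep the default pipeline.
const DEFAULT_LIMIT_BYTES = 5 * 1024 * 1024; // 5 MB, as proposed above

function shouldBypassSerialization(
  data: Record<string, unknown>,
  field: string, // e.g. "thumbnail", passed via the interceptor constructor
  limitBytes: number = DEFAULT_LIMIT_BYTES,
): boolean {
  const value = data[field];
  if (typeof value !== "string") return false; // no field -> default pipeline
  // base64-encoded thumbnails are ASCII, so string length equals byte count
  return value.length > limitBytes;
}
```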

@fpotier
Member

fpotier commented Feb 18, 2026

The thumbnail field for publishedData is deprecated for APIv4 and only kept for backwards compatibility with APIv3 - the intended way to do this is to ingest an attachment separately (attachments can point to publishedData, proposals and samples besides datasets). My guess would be the memory spike comes from running the publishedData.metadata including the encoded image through validation after

We weren't aware that the new way to add a thumbnail was via a dedicated Attachment. I think it makes sense, but the migration script doesn't reflect that (it simply copies thumbnail to metadata).
Anyway, the v3 endpoints are still used by the landing page, so the issue would still arise in this case.

@Junjiequan
Member

Junjiequan commented Feb 18, 2026

Just to share my 2 cents..

I don't have a strong opinion on having this change as a temporary hotfix, as long as we are aware of it and don't abuse it or rely on it as a permanent pattern.
My main concern is that bypassing the built-in NestJS serialization process is not a sustainable or officially supported long-term solution. The root cause seems to be the size of the raw data, which we should address in the future, i.e. the way we manage/store it.

@nitrosx
Member

nitrosx commented Feb 27, 2026

@minottic Apologies for the late reply.
I'm not sure I understand what you are trying to do here. Are you just trying to avoid nestjs serialization because big thumbnails cause a memory spike, but you will still use the thumbnail, or not?
If you do not use the thumbnail, can you/we just exclude it from the data returned?
Would that be a possible solution for you?

@fpotier apologies for the incomplete migration script. Will you be able to update the current one to reflect the best practice?

So the code changes look good, but I second @Junjiequan that it might be, and feels like, a hacky solution to the problem.
Of course, I do not want to stop PSI from completing the migration. Please reach out via DM if we need to discuss further.

@minottic minottic marked this pull request as draft February 27, 2026 11:23
@nitrosx
Member

nitrosx commented Feb 27, 2026

After talking with @minottic, we agreed that the most viable solution at the moment is solution 2 highlighted above.
Such a solution should prevent memory spikes and will reduce thumbnail sizes.

Leading up to SciCatCon 2026, we should do some research and gather information on how to address this issue and what the best practices are, specifically for nestjs. At SciCatCon 2026, we should make sure to have a breakout discussion and decide on a course of action.

@nitrosx
Member

nitrosx commented Feb 27, 2026

Some material that can help us find a viable solution: https://medium.com/@duckweave/nestjs-large-uploads-stream-it-dont-spike-ram-a9cc4b0f1b74

