Conversation
|
This looks quite hacky solution to me. |
|
it's stored in the thumbnail, which, as long as it's under the 16MB limit, it's fine for mongo. What I could do is making the fastResponse thingy parametric and test if the provided field has length > 5 MB otheriwse keep the nestjs serialisers |
|
The thumbnail field for publishedData is deprecated for APIv4 and only kept for backwards compatibility with APIv3 - the intended way to do this is to ingest an attachment separately (attachments can point to publishedData, proposals and samples besides datasets). |
yes, the spike is after all logic has run, from retrieving from the DB to running the class-transform from DB schema to v3. It also appears both with v3 and v4, so, even if I also suspected it had to do with class-transform, it seems to be after that. Removing the afterAll nestJS interceptors avoids this, so it feels it's there, maybe when nestJS tries to compute the content-length before the response. I am happy to change this PR though, since it's too PSI specific? what I could see is:
Both 1 or 2 would work for us, so happy to do any of the 2 |
We weren't aware that the new way to add a thumbnail was using a dedicated Attachment, I think it makes sense but the migration script doesn't reflect that (simply copies thumbnail to metadata). |
|
Just to share my 2cents.. I don’t have a strong opinion having this change as a temporary hotfix, as long as we aware do not abuse it or rely on it as a permanent pattern.. |
|
@minottic Apologies for the late reply. @fpotier apologies for the incomplete migration script. Will you be able to update the current one to reflect the best practice? So the code changes look good, but I second @Junjiequan that it might be and feel a hacky solution to the problem. |
|
After talking with @minottic, we agreed that the most viable solution at the moment is solution #2 highlighted in here. Leading to SciCatCon2026, we should some research and gather information on how to address this issue and what is the best practices specifically for nestjs. At SciCatCon 2026, we should make sure to have a breakout discussion and decide the course of action. |
|
Some material that can help us finding a viable solution: https://medium.com/@duckweave/nestjs-large-uploads-stream-it-dont-spike-ram-a9cc4b0f1b74 |
Description
The thumbnail in the publication data can be quite big (we have cases of >8MB). This causes, in certain envs (e.g. a pod in k8s) the backend to be killed with OOM. The cause seems to be due to nestjs after serialization logics. This allow to bypass the additional interceptors from nestjs and avoid the memory peak.
If this feature is not desired by other facilities, I will be happy to make it conditional or configurable.