fix(backend): batch processed file ownership validation#6725
Conversation
3d25b1b to
8f9ed32
Compare
8f9ed32 to
93f70d4
Compare
| .validateFilenameCharacters(fileMapping) | ||
| .validateFilenamesAreUnique(fileMapping) | ||
| .validateCategoriesMatchOutputSchema(fileMapping, organism) | ||
| .validateMultipartUploads(fileMapping.fileIds) |
There was a problem hiding this comment.
validateMultipartUploads also calls the db (FilesTable), so this will still be done once per entry
| .validateFilenamesAreUnique(fileMapping) | ||
| .validateCategoriesMatchOutputSchema(fileMapping, organism) | ||
| .validateMultipartUploads(fileMapping.fileIds) | ||
| .validateFilesExist(fileMapping.fileIds) |
There was a problem hiding this comment.
this also calls the FilesTable twice actually for all fileIds
| organism, | ||
| processedFiles, | ||
| ) | ||
| validateFilesBelongToSubmittingGroups(filesToValidate) |
There was a problem hiding this comment.
I think the other file validations that call the db: validateMultipartUploads and validateFilesExist can be added here, so that they only call the db once per batch (they take a set of fileIds as input so I dont see an issue moving them unless I overlooked sth)
There was a problem hiding this comment.
Moved validateMultipartUploads and validateFilesExist out of the per-entry loop.
|
@claude review |
|
Claude finished @anna-parker's task in 2m 43s —— View job Code Review
SummaryThe PR correctly batches Issues1.
|
There was a problem hiding this comment.
this is a preexisting issue but Im now concerned the stream is not being closed correctly, I believe ServletInputStream (the inputStream) handles closing itsself but Im not sure this BufferedReader class does... probably this isnt an issue as weve been doing it but might be good to check
There was a problem hiding this comment.
There was a problem hiding this comment.
doesnt seem to close the stream on its own? https://docs.oracle.com/javase/8/docs/api/java/io/BufferedReader.html
There was a problem hiding this comment.
@claude what are your thoughts? do we need to close this stream explicitly?
There was a problem hiding this comment.
Claude finished @anna-parker's task in 1m 7s —— View job
- Read the relevant code around line 274
- Analyze BufferedReader / ServletInputStream lifecycle
- Post answer
Stream Closing Analysis
Short answer: no resource leak in practice, but the use {} idiom is still the right fix.
Here's the chain:
request.inputStream ← ServletInputStream (managed by servlet container)
└─ InputStreamReader(...)
└─ BufferedReader(...) ← never closed
Why it isn't technically a leak:
request.inputStream is a ServletInputStream whose lifecycle is owned by the servlet container (Tomcat/Jetty etc.). Per the Servlet spec (§3.1.1), the container closes the stream after the response is committed — regardless of whether application code closed it first. So even if BufferedReader is never closed, no file descriptor or socket leaks.
Why it's still worth fixing:
BufferedReaderimplementsAutoCloseable, so leaving it unclosed will trip static analysis (e.g. detekt) and is a bad pattern to copy.- If an exception is thrown mid-stream (which does happen here — validation errors), the
BufferedReaderis abandoned without flushing any internal buffer. That's harmless for a read-only reader, but structurally sloppy. - The
lineSequence()on an unclosed reader is fine as long as the sequence is fully consumed (it is, via.forEach), butuse {}makes the ownership explicit.
The fix is a one-liner — wrap with Kotlin's use {}:
BufferedReader(InputStreamReader(inputStream)).use { reader ->
reader.lineSequence().chunked(streamBatchSize).forEach { lines ->
...
}
}Calling close() on the BufferedReader will propagate to InputStreamReader → ServletInputStream. The servlet container handles a double-close gracefully (it's a no-op after the first), so this is safe.
Verdict: Preexisting issue, low severity, but worth a one-line fix to keep the code idiomatic and static-analysis clean. Since this is marked as a preexisting issue, it could go in this PR or a follow-up chore.
There was a problem hiding this comment.
added use {} here so the lifecycle is explicit
Resolves #3951
Batches file ownership validation in
/submit-processed-dataso the backend checks file groups once per stream chunk instead of once per sequence entry.This keeps the existing safety check that processed data cannot reference files from another group, while reducing many small DB lookups during large preprocessing batches.
🚀 Preview: Add
previewlabel to enable