rust: Add identify_reader_sync/async for Read + Seek types by withbitsinmind · Pull Request #1313 · google/magika

withbitsinmind · 2026-03-02T20:26:41Z

This PR adds two new functions to session, identify_reader_sync and identify_reader_async, which make it possible to pass in any type that implements Read and Seek. They are analogous to the identify_stream function in the Python implementation (#979).

An alternative, and perhaps cleaner, solution would be to drop the sealed trait pattern for SyncInput and SyncInputApi (and the async equivalents) and instead expose SyncInputApi so that users can implement it for arbitrary types.
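
For readers unfamiliar with the sealed trait pattern, here is a minimal sketch of the general technique. The names `Input`, `private::Sealed`, and `length` are hypothetical and are not taken from magika's source; the point is only that a public trait with a private supertrait cannot be implemented outside the defining crate.

```rust
mod private {
    // The module is private, so code outside this crate cannot name
    // `Sealed` and therefore cannot implement it.
    pub trait Sealed {}
}

/// Hypothetical public trait. Because `private::Sealed` is a supertrait,
/// only types chosen by this crate can implement `Input`.
pub trait Input: private::Sealed {
    fn length(&self) -> usize;
}

// The crate opts in each supported type explicitly.
impl private::Sealed for Vec<u8> {}

impl Input for Vec<u8> {
    fn length(&self) -> usize {
        self.len()
    }
}
```

Unsealing, in this sketch, would amount to removing the `private::Sealed` bound so that downstream crates can write their own `impl Input for ...`.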

google-cla · 2026-03-02T20:26:46Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

gemini-code-assist

Code Review

This pull request introduces identify_reader_sync and identify_reader_async to allow identifying file types from Read + Seek sources, which is a great addition for flexibility. The implementation looks solid, but I have a few suggestions. There's a potential issue with integer truncation on 32-bit systems when getting the file length, which could lead to problems with large files. I've also pointed out a small readability improvement in the read_at method and suggested refactoring the new test to reduce code duplication. Overall, these are good changes that enhance the library's usability.
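
The truncation concern can be illustrated without any magika code. In the sketch below, `u32` stands in for `usize` on a 32-bit target, and the helper names `truncating` and `checked` are hypothetical. An `as` cast silently drops the high bits, while `try_from` reports that the length does not fit.

```rust
use std::convert::TryFrom;

/// Mimics `len as usize` on a 32-bit target: high bits are silently dropped.
fn truncating(len: u64) -> u32 {
    len as u32
}

/// Checked alternative: returns `None` when the length does not fit.
fn checked(len: u64) -> Option<u32> {
    u32::try_from(len).ok()
}
```

With a length of 4 GiB plus 5 bytes, `truncating` returns 5 whereas `checked` returns `None`, which lets the caller surface an error instead of reading the wrong range.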

rust/lib/src/input.rs

gemini-code-assist · 2026-03-02T20:29:10Z

rust/lib/src/lib.rs

+    fn identify_by_reader_reference() {
+        #[derive(Debug, Deserialize)]
+        #[serde(deny_unknown_fields)]
+        struct Test {
+            prediction_mode: String,
+            content_base64: String,
+            status: String,
+            prediction: Option<Prediction>,
+        }
+        let path = format!(
+            "../../tests_data/reference/{MODEL_NAME}-inference_examples_by_content.json.gz"
+        );
+        let mut tests = String::new();
+        GzDecoder::new(File::open(path).unwrap()).read_to_string(&mut tests).unwrap();
+        let tests: Vec<Test> = serde_json::from_str(&tests).unwrap();
+        let mut session = Session::new().unwrap();
+        for test in tests {
+            if test.prediction_mode != "high-confidence" {
+                continue;
+            }
+            assert_eq!(test.status, "ok");
+            let expected = test.prediction.unwrap();
+            let content = BASE64.decode(test.content_base64.as_bytes()).unwrap();
+            let actual = session.identify_reader_sync(std::io::Cursor::new(content)).unwrap();
+            assert_prediction(actual, expected, &test.content_base64);
+        }
+    }


There is significant code duplication between this new test identify_by_reader_reference and the existing identify_by_content_reference test. The struct definition, test data loading, and test loop are nearly identical.

To improve maintainability, you could refactor the common parts. For example, define the Test struct once outside the test functions and create a helper function to load the test data. This would make the tests cleaner and easier to manage.
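
One possible shape for that refactoring, sketched with standard-library stand-ins only: `Case`, `load_cases`, `check_cases`, and `toy_identify` are hypothetical names, the in-memory data replaces the gzipped JSON reference file, and the toy classifier replaces the real session calls. The shared loop takes a closure, so each test only supplies how content is handed to the identifier.

```rust
/// Stand-in for the deserialized `Test` struct, defined once at module level.
struct Case {
    prediction_mode: &'static str,
    content: &'static [u8],
    expected: &'static str,
}

/// Stand-in for loading the reference data (toy values, not real test data).
fn load_cases() -> Vec<Case> {
    vec![
        Case { prediction_mode: "high-confidence", content: b"#!/bin/sh\n", expected: "shell" },
        Case { prediction_mode: "best-guess", content: b"???", expected: "unknown" },
        Case { prediction_mode: "high-confidence", content: b"{\"a\": 1}", expected: "json" },
    ]
}

/// Shared loop: filters by prediction mode, checks each case, and returns
/// how many cases were checked.
fn check_cases(mut identify: impl FnMut(&[u8]) -> String) -> usize {
    let mut checked = 0;
    for case in load_cases() {
        if case.prediction_mode != "high-confidence" {
            continue;
        }
        assert_eq!(identify(case.content), case.expected);
        checked += 1;
    }
    checked
}

/// Toy classifier used only to make the sketch self-contained.
fn toy_identify(content: &[u8]) -> String {
    if content.starts_with(b"#!") {
        "shell".to_string()
    } else if content.starts_with(b"{") {
        "json".to_string()
    } else {
        "unknown".to_string()
    }
}
```

A by-content test would then call `check_cases(toy_identify)`, and a by-reader test would pass a closure that wraps the bytes in a `std::io::Cursor` first.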

ia0 · 2026-03-03T08:39:05Z

Thanks for the PR!

Do you have an example of an object that implements Read and Seek and is not File? This could help figure out the best solution.

I see at least the following options:

Add the identify_reader_{sync,async} functions as you did. I don't think we should do that. We should use the identify_content_{sync,async} functions directly.
Add the ReadSeek wrapper as you did. This doesn't look great.
Unseal {Sync,Async}Input as you proposed. I guess that's an option, but then we need to decide if it should be an unsafe trait.
Provide a blanket implementation for Read and Seek. We need to make sure this is not a breaking change.
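
To make the blanket-implementation option concrete, here is a minimal sketch. The trait below is a hypothetical stand-in named `SyncInput` for illustration; its methods `length` and `read_at` are assumptions and not magika's actual definitions.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical stand-in for a synchronous input trait.
pub trait SyncInput {
    /// Total number of bytes in the input.
    fn length(&mut self) -> io::Result<u64>;
    /// Fills `buffer` with the bytes starting at `offset`.
    fn read_at(&mut self, buffer: &mut [u8], offset: u64) -> io::Result<()>;
}

// Blanket implementation: every `Read + Seek` type becomes a `SyncInput`.
impl<T: Read + Seek> SyncInput for T {
    fn length(&mut self) -> io::Result<u64> {
        // Seek to the end to learn the length, then restore the position.
        let pos = self.stream_position()?;
        let len = self.seek(SeekFrom::End(0))?;
        self.seek(SeekFrom::Start(pos))?;
        Ok(len)
    }

    fn read_at(&mut self, buffer: &mut [u8], offset: u64) -> io::Result<()> {
        self.seek(SeekFrom::Start(offset))?;
        self.read_exact(buffer)
    }
}
```

The breaking-change question arises because Rust rejects overlapping implementations: a dedicated `impl SyncInput for std::fs::File` could not coexist with this blanket impl, since `File` is itself `Read + Seek`.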

withbitsinmind · 2026-03-03T12:40:27Z

The example object in my case is HttpRangeReader, a private struct, which implements Read and Seek (or the async equivalents) using HTTP Range requests towards a URL. This implementation would make it possible to run magika on said resource without having to download it to a file beforehand. It could for example be the presigned URL for a large S3 object.
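
A rough sketch of what such a reader could look like, using only the standard library. `RangeReader` and its `fetch` callback are hypothetical; the callback stands in for an HTTP request with a `Range` header and is not the private HttpRangeReader mentioned above.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical reader backed by a range-fetching callback.
/// `fetch(start, len)` stands in for an HTTP Range request.
pub struct RangeReader<F> {
    fetch: F,
    len: u64,
    pos: u64,
}

impl<F: FnMut(u64, usize) -> io::Result<Vec<u8>>> RangeReader<F> {
    /// `len` is the total resource size (for example from Content-Length).
    pub fn new(len: u64, fetch: F) -> Self {
        Self { fetch, len, pos: 0 }
    }
}

impl<F: FnMut(u64, usize) -> io::Result<Vec<u8>>> Read for RangeReader<F> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Never request bytes past the end of the resource.
        let remaining = self.len.saturating_sub(self.pos);
        let want = (buf.len() as u64).min(remaining) as usize;
        if want == 0 {
            return Ok(0);
        }
        let bytes = (self.fetch)(self.pos, want)?;
        let n = bytes.len().min(want);
        buf[..n].copy_from_slice(&bytes[..n]);
        self.pos += n as u64;
        Ok(n)
    }
}

impl<F> Seek for RangeReader<F> {
    fn seek(&mut self, from: SeekFrom) -> io::Result<u64> {
        // Compute the target in i128 so signed offsets cannot overflow.
        let target: i128 = match from {
            SeekFrom::Start(n) => n as i128,
            SeekFrom::End(d) => self.len as i128 + d as i128,
            SeekFrom::Current(d) => self.pos as i128 + d as i128,
        };
        if target < 0 || target > u64::MAX as i128 {
            return Err(io::Error::new(io::ErrorKind::InvalidInput, "invalid seek position"));
        }
        self.pos = target as u64;
        Ok(self.pos)
    }
}
```

Seeking only updates the stored position, so no request is issued until the next read. Only the byte ranges that are actually read get fetched, which is what makes this approach attractive for large remote objects.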

As for the unseal alternative, my take is that it should not be marked as unsafe because it is not in itself unsafe, though an implementor could of course provide an incorrect or unsafe implementation.

ia0 · 2026-03-03T13:08:44Z

> The example object in my case is HttpRangeReader, a private struct, which implements Read and Seek (or the async equivalents) using HTTP Range requests towards a URL.

Thanks! This makes sense. Could you tell me if #1315 would work for you?

withbitsinmind · 2026-03-03T14:09:27Z

Brilliant! I can confirm that #1315 works for me 👍

Add identify_reader_sync/async for Read + Seek types

08edd98

withbitsinmind requested a review from ia0 as a code owner March 2, 2026 20:26

gemini-code-assist bot reviewed Mar 2, 2026

Merge branch 'main' into main

278e3dc

ia0 mentioned this pull request Mar 3, 2026

Unseal the Sync- and AsyncInput traits #1315

Merged

ia0 closed this in #1315 Mar 3, 2026

withbitsinmind wants to merge 2 commits into google:main from withbitsinmind:main

2 participants