test(perf): add microbenchmarks for put operations #939
Introduces performance microbenchmarks for GCS upload (put) operations. Includes:
- PutConfigurator and YAML configurations for single-threaded and multi-process put benchmarks.
- Single-threaded and multi-process put benchmark test cases.
- Unit tests for PutConfigurator.
- Fixtures and setup in conftest.py.
- Updated README documentation.
- Fixed ResourceMonitor import and multi-process argument mismatch.
Code Review
This pull request introduces a new 'put' benchmark group to evaluate the performance of uploading local files to GCS, including single-threaded and multi-process test cases, configuration files, and setup fixtures. The review feedback highlights three key areas for improvement: correcting the parameter name in gcs.put from chunksize to block_size to ensure configured chunk sizes are respected, optimizing the CPU-intensive os.urandom file generation by repeating a smaller pre-generated block, and reducing or diversifying the default 4 GB benchmark file size to prevent excessively slow and expensive runs in CI/CD environments.
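The second suggestion, replacing a full-size os.urandom call with a repeated smaller block, could look roughly like the sketch below. It is only an illustration: write_test_file and its block_bytes parameter are hypothetical names, not code from this PR, and it assumes the repeated content is acceptable for the benchmark (for example, that no compression along the upload path would skew the results).

```python
import os


def write_test_file(path, size_bytes, block_bytes=1024 * 1024):
    """Write size_bytes of data to path by repeating one random block.

    Hypothetical helper: os.urandom is called once for a small block
    instead of once for the whole file.
    """
    # Generate the random block a single time (empty for a zero-byte file).
    block = os.urandom(min(block_bytes, size_bytes)) if size_bytes > 0 else b""
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            # The final write may be a partial block.
            n = min(len(block), remaining)
            f.write(block[:n])
            remaining -= n
    return path
```

With a 1 MiB block, a 4 GB file needs one 1 MiB urandom call and about four thousand writes, so the cost of file generation is dominated by disk I/O rather than by random-number generation.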
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@           Coverage Diff           @@
##             main     #939   +/-   ##
=======================================
  Coverage   89.77%   89.77%
=======================================
  Files          16       16
  Lines        3569     3569
=======================================
  Hits         3204     3204
  Misses        365      365
def _put_op(gcs, local_path, remote_path, chunk_size):
    """Upload a local file to a single remote path."""
    try:
        gcs.put(local_path, remote_path, chunksize=chunk_size)
Should we pass block_size here instead of chunksize?
put routes through _put_file, which takes chunksize (used directly as the resumable-upload part size). block_size is only a parameter of the open()/GCSFile path, so passing it to put would land in **kwargs and be ignored.
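The failure mode described in this reply can be illustrated with a toy sketch. This is not gcsfs code: put_sketch and _put_file_sketch are hypothetical stand-ins, and the default part size is an arbitrary placeholder. It only shows how a keyword the inner function does not name ends up in **kwargs and has no effect.

```python
def _put_file_sketch(lpath, rpath, chunksize=8 * 2**20, **kwargs):
    """Hypothetical stand-in for an upload helper.

    Only `chunksize` is consulted; any other keyword is collected in
    kwargs and never read. The default value here is arbitrary.
    """
    return {"part_size": chunksize, "ignored": sorted(kwargs)}


def put_sketch(lpath, rpath, **kwargs):
    """Hypothetical stand-in for put: forwards all keywords unchanged."""
    return _put_file_sketch(lpath, rpath, **kwargs)


# Passing the name the inner function expects changes the part size.
respected = put_sketch("local.bin", "bucket/remote.bin", chunksize=2**20)

# Passing a different name raises no error and leaves the default in place.
ignored = put_sketch("local.bin", "bucket/remote.bin", block_size=2**20)
```

Because the misnamed keyword fails silently, a benchmark parameterized on chunk size would report identical behavior for every configured value, which is why the parameter name matters here.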