Skip to content

Add compare sync async benchmark#933

Open
mohe2015 wants to merge 2 commits intotafia:masterfrom
mohe2015:benchmark-sync-async
Open

Add compare sync async benchmark#933
mohe2015 wants to merge 2 commits intotafia:masterfrom
mohe2015:benchmark-sync-async

Conversation

@mohe2015
Copy link

It seems like the async implementation is much slower that the synchronous implementation. As a first start, add an benchmark to more easily compare them and be able to track improvements / regressions.

My actual usage does not read from static data but I would expect that reading from static data should have very similar performance for sync and async.

cargo bench --features async-tokio --bench compare_sync_async
   Compiling quick-xml v0.39.0 (/home/moritz/Documents/quick-xml)
    Finished `bench` profile [optimized + debuginfo] target(s) in 30.10s
     Running benches/compare_sync_async.rs (target/release/deps/compare_sync_async-ded794078866a808)
Gnuplot not found, using plotters backend
compare_sync/sample_rss.xml
                        time:   [103.25 µs 103.51 µs 103.77 µs]
                        thrpt:  [1.7749 GiB/s 1.7794 GiB/s 1.7839 GiB/s]
                 change:
                        time:   [−0.5326% −0.2991% −0.0553%] (p = 0.02 < 0.05)
                        thrpt:  [+0.0553% +0.2999% +0.5355%]
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

compare_async/sample_rss.xml
                        time:   [176.42 µs 176.71 µs 177.05 µs]
                        thrpt:  [1.0403 GiB/s 1.0423 GiB/s 1.0440 GiB/s]
                 change:
                        time:   [−0.1489% +1.0627% +2.0229%] (p = 0.05 < 0.05)
                        thrpt:  [−1.9828% −1.0515% +0.1491%]
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild

@codecov-commenter
Copy link

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 0% with 28 lines in your changes missing coverage. Please review.
✅ Project coverage is 54.41%. Comparing base (d1acdb5) to head (244f588).
⚠️ Report is 33 commits behind head on master.

Files with missing lines Patch % Lines
benches/compare_sync_async.rs 0.00% 28 Missing ⚠️
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #933      +/-   ##
==========================================
- Coverage   58.08%   54.41%   -3.68%     
==========================================
  Files          42       45       +3     
  Lines       15513    16959    +1446     
==========================================
+ Hits         9011     9228     +217     
- Misses       6502     7731    +1229     
Flag Coverage Δ
unittests 54.41% <0.00%> (-3.68%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Collaborator

@Mingun Mingun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that reading from static data should have very similar performance for sync and async.

async have some runtime costs to handle all the machinery and if data is static (i.e no I/O) it is expected that it would be slower than sync interface. async is not about increasing performance of an individual task, but about using less threads to process more I/O bound tasks.

Cargo.toml Outdated
serde_derive = { version = "1.0.206" }
serde-value = "0.7"
tokio = { version = "1.21", default-features = false, features = ["macros", "rt"] }
tokio = { version = "1.21", default-features = false, features = ["macros", "rt", "rt-multi-thread"] }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not like to include new features that required just for internal tests / benches. Move benchmark to the compare sub-project to avoid this. It also would be useful to benchmark other libraries that provides async interface if there are so.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed to a current-thread runtime. As this in in dev-dependencies, is it fine that criterion still has an added feature flag? Otherwise I could also just call block_on myself but the measurement code of criterion takes async into account

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed to a current-thread runtime. As this in in dev-dependencies, is it fine that criterion still has an added feature flag?

Yes, that is fine. Actually, I was wrong about my initial comment. Because this is dev-dependencies, it was possible to add a new feature, but for testing purposes I think that single-thread runtime is enough anyway.

I still think, that it would be better to place this benchmark into compare project.

@mohe2015 mohe2015 force-pushed the benchmark-sync-async branch from 244f588 to 1fde3e4 Compare January 25, 2026 19:19
@mohe2015
Copy link
Author

async have some runtime costs to handle all the machinery and if data is static (i.e no I/O) it is expected that it would be slower than sync interface. async is not about increasing performance of an individual task, but about using less threads to process more I/O bound tasks.

I am fully aware that it is expected to be slower but without IO (e.g. Tokio's File implementation is extremely inefficient on Linux because it uses thread pools and not io_uring) I wanted to avoid factors outside of quick-xml's control. Also https://docs.rs/tokio/1.49.0/src/tokio/io/async_buf_read.rs.html#99-107 and https://doc.rust-lang.org/src/std/io/impls.rs.html#406-416 seems to be relatively similar in their implementation so I would expect the compiler to be able to mostly remove the async state machine overhead. There are likely several places where the compiler is not able to optimize properly but I would've expected a smaller overhead.

My actual usage is significantly more complex. I also shortly looked at the async implementation in quick-xml but I couldn't find an obvious place to improve.

@mohe2015
Copy link
Author

So personally I would prefer the benchmark at the current location because I intend it as a comparison of the crates own implementations. The compare subcrate seems to be more for comparison between different crates

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants