-
Notifications
You must be signed in to change notification settings - Fork 152
Speed up refundable orders query #4061
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request provides an excellent optimization for the refundable_orders query. The changes made are sound and follow best practices for improving SQL performance in PostgreSQL. Switching the query to start from ethflow_orders, replacing LEFT JOIN with NOT EXISTS for checking non-existence, and moving filter conditions into the ON clauses of the JOINs are all effective strategies that have clearly paid off, as shown by the dramatic reduction in execution time. The resulting query is not only more performant but also more readable and maintainable. This is a high-quality contribution.
|
I experimented a bit more and the biggest culprit appears to be that the query does a sequential scan over the entire trades table. I think the most pragmatic way forward is to limit the look back window to 1 week or so. This reliably results in ~300ms runtimes on the prod DB while still being very generous. |
|
These are the changes in DB metrics after temporarily deploying this + 1 week lookback window + #4062. The deployment happened only on mainnet (yellow bars in second graph) so since all network DBs run on the same postgres instance the improvement will be even more pronounced when the fix gets rolled out to all networks. While I assumed the change would reduce the CPU usage of the DB it actually resolved a IO bottle neck. Due to constant full table scans the DB had to page memory in and out of disk a ton.
On the services side this appeared to lower the variance in latency of some DB queries (but not all). |
squadgazzz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great improvement!



Description
In the RDS UI you can show the DB active sessions grouped by specific queries.

It turns out the purple, dark blue, and one other color all belong to the query fetching the yet to be refunded ethflow orders.
To confirm the data I ran the query on the read replica and it took ~130s in total. Luckily the query could be fixed with a few simple changes.
Changes
orderstable andLEFT JOINing theethflow_orderstable onto it we can simple select from theethflow_orderstable and unconditionally JOIN the orders table onto it. That's because every row in theethflow_orderstable has a corresponding row in theorderstable but not the other way around.ethflow_orders.uidinstead of theorders.uidso there is no more dependency on first doing theordersjoin.wherefilters got moved into the joins to eliminate as many rows as early as possiblewhere not exists (select 1 where ...)instead of joining a bunch of tables and doing regularwherestatementsHow to test
compared the queries on the read replica.
The old one always takes at least 2 minutes (this run even 2.5m)
original execution plan
The new query takes ~1.5s. One query I previously thought was this fast was actually only that fast with a warm cache. So far I was not able to make this query run significantly slower but just to be sure I'll run another test tomorrow.
After some more testing it looks like there might be some pathological table locking behavior or sth like that. For runs that take a long time to finish the execution plan actually still says 1.5s execution time. So I think those runs might only be so slow because the tables are locked for an extended period of time which does not get accounted for in the execution time.
new execution plan