8312182: THPs cause huge RSS due to thread start timing issue #3210
tabata-d wants to merge 1 commit into
👋 Welcome back dtabata! A progress list of the required criteria for merging this PR into

❗ This change is not yet ready to be integrated.

This backport pull request has now been updated with issues from the original commit.

Hi @tabata-d,

First off, thanks for preparing the patches. I know that is a lot of work.

To save you the work and frustration of having something work-intensive rejected later, it is always good to discuss larger backport projects on the OpenJDK mailing lists first. Not sure which mailing list would be best for that, @jerboaa?

In this case, I am not yet convinced that these backports are necessary, at least not in their full form. Both @jerboaa and @gnu-andrew have reservations, and they are the maintainers who need convincing.

JDK-8312182 bypasses a problem caused by the behavior of Java, the kernel, and glibc:
But that was then. Things are different now, since Java was certainly not the only program suffering from THP RSS bloat. Therefore:
So we have the following workarounds:
@tabata-d where, exactly, can you reproduce the problem? Which OS, kernel version, and glibc release?

Hi @tstuefe, Thank you for the detailed explanation. I now understand that this is not a fatal issue on newer glibc and kernel versions, because useful workarounds are available.
The issue can be reproduced in the following environments:
It can be reproduced on the RHEL 8 and 9 series, which still have a large number of users.

@tabata-d Yes, I ran my own tests today and still saw the bug very clearly on RHEL 8, but less clearly on RHEL 9. On RHEL 10 it should be pretty much gone (I cannot reproduce the problem on kernels >= 6), since newer khugepaged versions are a lot smarter about coalescing large sparse memory areas into huge pages. khugepaged on older releases is far too aggressive. If possible, could you share some more details about the RHEL 9 tests?
Sorry, we can't really take the full chain of backports you did - it is just too risky and invasive. I would ask that you close your THP-related PRs - I know it feels bad to have spent time on something that gets rejected, but we need to keep the risk minimal. I will fix this problem in JDK 11 and 8 with a small, targeted fix specific to these older releases. Your work helped make us aware that this was a real problem, thanks for that. The problem will be fixed for your users (probably not in the July update, but in the October update).

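Since the behavior discussed above depends on the THP and khugepaged configuration of each machine, it can help to record the active settings alongside any test result. The following is a minimal C++ sketch, not part of this PR: the helper names `active_thp_mode` and `read_first_line` are hypothetical, and it assumes the usual sysfs format in which the active value is shown in square brackets (for example `always [madvise] never`).

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper: returns the bracketed (active) value from a sysfs
// THP line such as "always [madvise] never".
// Returns an empty string if no bracketed value is found.
std::string active_thp_mode(const std::string& line) {
    const std::string::size_type open = line.find('[');
    if (open == std::string::npos) return "";
    const std::string::size_type close = line.find(']', open + 1);
    if (close == std::string::npos) return "";
    return line.substr(open + 1, close - open - 1);
}

// Hypothetical helper: reads the first line of a file.
// Returns an empty string if the file cannot be read.
std::string read_first_line(const std::string& path) {
    std::ifstream in(path);
    std::string line;
    std::getline(in, line);
    return line;
}
```

On a Linux machine, `active_thp_mode(read_first_line("/sys/kernel/mm/transparent_hugepage/enabled"))` would give the current THP mode, and `read_first_line` can be pointed at khugepaged tunables such as `/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none` in the same way.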
@tstuefe Sorry for the late reply.
OS: RHEL 9.6
khugepaged settings:
Reproducer program:
Test.java prints
When I ran Test.java in the above environment, I got the following result:
After changing the THP configuration from
In my RHEL 9 environment, I observed a significant difference in RSS depending on the THP configuration.
I will close the series of PRs. @tstuefe @jerboaa @gnu-andrew
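For comparing RSS between THP configurations as described in the comment above, one option is to sample `VmRSS` from `/proc/<pid>/status` of the running JVM. The sketch below is an illustration only and is not the `Test.java` reproducer mentioned above; the function names `parse_vm_rss_kb` and `rss_kb_of` are hypothetical.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: extracts the VmRSS value (in kB) from text in the
// /proc/<pid>/status format. Returns -1 if there is no parsable VmRSS line.
long parse_vm_rss_kb(const std::string& status_text) {
    std::istringstream in(status_text);
    std::string line;
    while (std::getline(in, line)) {
        if (line.rfind("VmRSS:", 0) == 0) {        // line starts with "VmRSS:"
            std::istringstream fields(line.substr(6));
            long kb = -1;
            if (fields >> kb) return kb;           // skips leading whitespace
            return -1;
        }
    }
    return -1;
}

// Hypothetical helper: reads /proc/<pid>/status and returns VmRSS in kB,
// or -1 if the file cannot be read. Pass "self" for the calling process.
long rss_kb_of(const std::string& pid) {
    std::ifstream in("/proc/" + pid + "/status");
    std::stringstream buf;
    buf << in.rdbuf();
    return parse_vm_rss_kb(buf.str());
}
```

Calling `rss_kb_of` with the PID of the Java process once per THP setting gives numbers that can be compared directly.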
JDK-8312182 is a bug fix that addresses significant Resident Set Size (RSS) bloat in Java applications when Transparent Huge Pages (THP) are enabled on Linux.
The fix was originally implemented in JDK 21; this PR backports it to JDK 11.
Unclean Backport
Manual adjustments were required (`globals_linux.hpp`, `os_linux.cpp`) due to the evolution of internal HotSpot VM APIs and syntax between JDK 11 and JDK 21.
- In JDK 21, the `product` macro supports a `DIAGNOSTIC` attribute. This attribute was introduced by JDK-8243208, which is not present in JDK 11. Directly using the JDK 21 syntax would cause a compilation error in JDK 11. For `DisableTHPStackMitigation`, instead of attempting to force the `DIAGNOSTIC` attribute into the `product` macro, it was declared using the `diagnostic` macro directly in `globals_linux.hpp`. This achieves the same intent of marking the flag as diagnostic without requiring the backport of JDK-8243208, which has a broad impact and would introduce significant complexity and risk to the JDK 11 codebase.
- The way `THPMode` enum members are accessed changed. In JDK 21, they are accessed using the scope resolution operator (`THPMode::always`), while in JDK 11, they are accessed directly (`THPMode_always`). The code in `os_linux.cpp` was adjusted to use the direct access form (`THPMode_always`) to align with JDK 11's syntax.
- The `FLAG_SET_ERGO` macro in JDK 21 no longer requires the type (`bool`) as its first argument, but JDK 11 explicitly requires it. The `FLAG_SET_ERGO` call in `os_linux.cpp` was modified to explicitly include the `bool` type argument, conforming to JDK 11's macro definition.
Testing
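The syntax differences listed under Unclean Backport can be illustrated with a small, self-contained C++ sketch. Everything in it is a simplified stand-in and not HotSpot's real code: the namespaces, the `_JDK11`/`_JDK21` macro suffixes, the `ExampleFlag` variable, and the order of the enum members are assumptions made for illustration, and the real `FLAG_SET_ERGO` does more than a plain assignment.

```cpp
#include <cassert>

// JDK 11 style (stand-in): plain enum, members carry a prefix.
namespace jdk11_style {
enum THPMode { THPMode_always, THPMode_never, THPMode_madvise };
}

// JDK 21 style (stand-in): scoped enum, members need THPMode::.
namespace jdk21_style {
enum class THPMode { always, never, madvise };
}

// Stand-ins for the two macro shapes described in the text.
// JDK 11 shape: the flag type is the first argument.
#define FLAG_SET_ERGO_JDK11(type, name, value) ((name) = static_cast<type>(value))
// JDK 21 shape: no type argument.
#define FLAG_SET_ERGO_JDK21(name, value) ((name) = (value))

// Hypothetical bool flag used only for this illustration.
bool ExampleFlag = false;

// Flag declaration shapes, shown as comments because the real macros live
// in HotSpot headers (default value and description are placeholders):
//   JDK 21: product(bool, DisableTHPStackMitigation, false, DIAGNOSTIC, "...")
//   JDK 11: diagnostic(bool, DisableTHPStackMitigation, false, "...")

// JDK 11 form: prefixed enum member, typed FLAG_SET_ERGO.
bool apply_jdk11_style(jdk11_style::THPMode mode) {
    if (mode == jdk11_style::THPMode_always) {
        FLAG_SET_ERGO_JDK11(bool, ExampleFlag, true);
    }
    return ExampleFlag;
}

// JDK 21 form: scoped enum member, untyped FLAG_SET_ERGO.
bool apply_jdk21_style(jdk21_style::THPMode mode) {
    if (mode == jdk21_style::THPMode::always) {
        FLAG_SET_ERGO_JDK21(ExampleFlag, true);
    }
    return ExampleFlag;
}
```

Both functions express the same logic; only the spelling of the enum member and the macro call differs, which is the kind of mechanical adjustment the backport describes.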
- System: Ran on Red Hat Enterprise Linux 9.4 (x86_64).
- jtreg: A comprehensive `jtreg` run on the entire `hotspot/jtreg` test suite confirmed that all HotSpot tests passed.
- New Test: The newly added test `THPsInThreadStackPreventionTest.java` passed successfully in both enabled and disabled configurations, confirming the effectiveness of the fix.
Progress
Integration blocker
Issues
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk11u-dev.git pull/3210/head:pull/3210
$ git checkout pull/3210
Update a local copy of the PR:
$ git checkout pull/3210
$ git pull https://git.openjdk.org/jdk11u-dev.git pull/3210/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 3210
View PR using the GUI difftool:
$ git pr show -t 3210
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk11u-dev/pull/3210.diff
Using Webrev
Link to Webrev Comment