
8312182: THPs cause huge RSS due to thread start timing issue #3210

Closed
tabata-d wants to merge 1 commit into openjdk:pr/3171 from tabata-d:JDK-8312182

Conversation

@tabata-d

@tabata-d tabata-d commented May 29, 2026

Member

JDK-8312182 is a bug fix that addresses significant Resident Set Size (RSS) bloat in Java applications when Transparent Huge Pages (THP) are enabled on Linux.

The fix was originally implemented in JDK 21. Now we are backporting this into JDK 11.

Unclean Backport

  • The backport was not entirely clean for the C++ source files (globals_linux.hpp, os_linux.cpp) due to the evolution of internal HotSpot VM APIs and syntax between JDK 11 and JDK 21.
  • Specific Unclean Parts:
    1. Flag Declaration: In JDK 21, the product macro supports a DIAGNOSTIC attribute. This attribute was introduced by JDK-8243208, which is not present in JDK 11. Directly using the JDK 21 syntax would cause a compilation error in JDK 11. For DisableTHPStackMitigation, instead of attempting to force the DIAGNOSTIC attribute into the product macro, it was declared using the diagnostic macro directly in globals_linux.hpp. This achieves the same intent of marking the flag as diagnostic without requiring the backport of JDK-8243208, which has a broad impact and would introduce significant complexity and risk to the JDK 11 codebase.
    2. Enum Access: Between JDK 11 and JDK 21, the way THPMode enum members are accessed changed. In JDK 21, they are accessed using the scope resolution operator (THPMode::always), while in JDK 11, they are accessed directly (THPMode_always). The code was adjusted to use the direct access method (THPMode_always) for enum members in os_linux.cpp to align with JDK 11's syntax.
    3. Macro Signature: The FLAG_SET_ERGO macro in JDK 21 no longer requires the type (bool) as its first argument, but JDK 11 explicitly requires it. The FLAG_SET_ERGO macro call in os_linux.cpp was modified to explicitly include the bool type argument, conforming to JDK 11's macro definition.

Testing

System: Ran on Red Hat Enterprise Linux 9.4 (x86_64).
jtreg: A comprehensive jtreg run on the entire hotspot/jtreg test suite confirmed that all HotSpot tests passed.
New Test: The newly added test THPsInThreadStackPreventionTest.java passed successfully in both enabled and disabled configurations, confirming the effectiveness of the fix.



Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • JDK-8310687 needs maintainer approval
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • JDK-8312182 needs maintainer approval

Integration blocker

 ⚠️ Dependency #3171 must be integrated first

Issues

  • JDK-8312182: THPs cause huge RSS due to thread start timing issue (Bug - P3)
  • JDK-8310687: JDK-8303215 is incomplete (Bug - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk11u-dev.git pull/3210/head:pull/3210
$ git checkout pull/3210

Update a local copy of the PR:
$ git checkout pull/3210
$ git pull https://git.openjdk.org/jdk11u-dev.git pull/3210/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 3210

View PR using the GUI difftool:
$ git pr show -t 3210

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk11u-dev/pull/3210.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper Bot commented May 29, 2026


👋 Welcome back dtabata! A progress list of the required criteria for merging this PR into pr/3171 will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk Bot commented May 29, 2026


❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk openjdk Bot changed the title from "backport 84b325b844c08809448a9c073a11443d9e3c3f8e" to "8312182: THPs cause huge RSS due to thread start timing issue" May 29, 2026
@openjdk

openjdk Bot commented May 29, 2026


This backport pull request has now been updated with issues from the original commit.

@openjdk openjdk Bot added the labels backport (Port of a pull request already in a different code base) and rfr (Pull request is ready for review) May 29, 2026
@mlbridge

mlbridge Bot commented May 29, 2026


Webrevs

@tstuefe

tstuefe commented Jun 2, 2026

Member

Hi @tabata-d ,

First off, thanks for preparing the patches. I know that is a lot of work. To save you the work and frustration of getting something work-intensive rejected later, it is always good to discuss larger backport projects upfront in the OpenJDK mailing lists first. Not sure which mailing list would be best for that @jerboaa ?

In this case, I am not yet convinced that these backports are necessary, at least not in their full form. Both @jerboaa and @gnu-andrew have reservations, and they are the maintainers who need convincing.


JDK-8312182 bypasses a problem caused by the behavior of Java, kernel, and glibc:

  • Java asks glibc to create thread stacks without glibc guard pages, since we create our own guard pages.
  • glibc mmaps the memory for the thread stack
  • on the system level, we use THP mode "always"
  • (concurrently) khugepaged decides to coalesce the start of the stacks into huge pages. If multiple stacks are directly adjacent, from the kernel's perspective they are one VMA, and it aggressively converts that VMA into huge pages, making the memory resident as a result
  • Java then places its stack guard, which splinters the VMA again, but the damage is already done - all that memory is resident, and thus RSS goes up.
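Whether a machine is exposed to this sequence depends on the system-wide THP mode. As a minimal sketch (not part of the patch), the following reads the standard sysfs file and reports the active mode; the kernel lists all modes in that file and puts the active one in brackets. The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, for illustration only.
class ThpModeCheck {
    // Standard sysfs location of the system-wide THP setting on Linux.
    static final Path THP_ENABLED =
            Path.of("/sys/kernel/mm/transparent_hugepage/enabled");

    // The file lists every mode and brackets the active one,
    // e.g. "[always] madvise never". Returns "unknown" if no bracket is found.
    static String activeMode(String fileContent) {
        int open = fileContent.indexOf('[');
        int close = fileContent.indexOf(']', open + 1);
        if (open < 0 || close < 0) {
            return "unknown";
        }
        return fileContent.substring(open + 1, close);
    }

    public static void main(String[] args) throws IOException {
        if (!Files.isReadable(THP_ENABLED)) {
            System.out.println("THP sysfs file not readable (not Linux, or THP not built in?)");
            return;
        }
        String mode = activeMode(Files.readString(THP_ENABLED));
        System.out.println("THP mode: " + mode);
        if (mode.equals("always")) {
            System.out.println("Thread stacks are eligible for huge pages without any madvise call.");
        }
    }
}
```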

But that was then. Things are different now, since java sure was not the only program suffering from THP RSS bloat. Therefore:

  1. glibc has a tunable since glibc 2.38, called glibc.pthread.stack_hugetlb. That is by default 1, but can be set to 0, and that causes glibc to mmap the thread stacks with MADV_NOHUGEPAGE, which should completely prevent the problem.
  2. Kernels are a lot smarter now about when to do MMAP coalescing. They are a lot less likely to promote sparsely populated VMAs to THP, and the stacks of freshly created threads are just that.

So we have the following workarounds:

  • on new kernels (starting at around 6.14... ??) nothing should be needed at all
  • on new glibcs >= 2.38 we can use the tunable
  • obviously, we can switch THP mode to "madvise", which affects the whole machine. But tbh on older kernels where the THP daemon is that aggressive this may be good advice anyway, because the problem does not only affect Java stacks.
  • Yet another obvious workaround is to use stack sizes just a wee bit smaller than THP size, so not 2MB but e.g. 1.9MB.
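Two of these workarounds can be applied per process at launch time. The sketch below is one way to do that from Java, under some assumptions: it sets the `GLIBC_TUNABLES` environment variable (only effective with glibc >= 2.38, as noted above) and passes a stack size slightly below 2 MB. The class name, the `build` helper, and the `Test` main class are placeholders, not anything from the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical launcher, for illustration only.
class LaunchWithThpWorkarounds {
    // Builds a child JVM command line. stackSize may be null to keep the default.
    static ProcessBuilder build(String javaBin, String mainClass,
                                boolean disableStackHugetlb, String stackSize) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        if (stackSize != null) {
            cmd.add("-Xss" + stackSize);   // e.g. 1900k, just under the 2 MB THP size
        }
        cmd.add(mainClass);
        ProcessBuilder pb = new ProcessBuilder(cmd).inheritIO();
        if (disableStackHugetlb) {
            // Note: this overwrites any GLIBC_TUNABLES value inherited from the parent.
            pb.environment().put("GLIBC_TUNABLES", "glibc.pthread.stack_hugetlb=0");
        }
        return pb;
    }

    public static void main(String[] args) {
        ProcessBuilder pb = build("java", "Test", true, "1900k");
        // Only print what would be run; call pb.start() to actually launch it.
        System.out.println("command: " + String.join(" ", pb.command()));
        System.out.println("GLIBC_TUNABLES=" + pb.environment().get("GLIBC_TUNABLES"));
    }
}
```

The same effect is available from a shell by exporting `GLIBC_TUNABLES` and passing `-Xss` directly; the Java form is shown only because it makes the two knobs explicit.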

@tabata-d where, exactly, can you reproduce the problem? Which OS/kernel version/glibc release?

@jerboaa

jerboaa commented Jun 2, 2026

Contributor

Not sure which mailing list would be best for that @jerboaa ?

jdk8u-dev@openjdk.org for OpenJDK 8u backports, jdk-updates-dev@openjdk.org for any other (newer) release.

@tabata-d

tabata-d commented Jun 3, 2026

Member Author

Hi @tstuefe,

Thank you for the detailed explanation. I now understand that this is not a fatal issue on newer glibc/kernel versions, because there are useful workarounds available.

@tabata-d where, exactly, can you reproduce the problem? Which OS/kernel version/glibc release?

The issue can be reproduced in the following environments:

  1. RHEL 8.10
    kernel: 4.18.0
    glibc: 2.28

  2. RHEL 9.6
    kernel: 5.14.0
    glibc: 2.34

It can be reproduced on RHEL 8 and 9 series, which still have a large number of users.
On RHEL 10 (glibc 2.39), it might be possible to work around it using glibc.pthread.stack_hugetlb, but I haven’t tried it yet.

@tstuefe

tstuefe commented Jun 3, 2026

Member

@tabata-d Yes, I did my own tests today and saw the bug (very clearly) still on RHEL8 but not that clearly on RHEL9.

On RHEL 10 it should be pretty much gone (I cannot reproduce the problem on kernels >= 6), since newer khugepaged versions are a lot smarter about coalescing large sparse memory areas to huge pages. khugepaged on older releases is way too aggressive.

If you can, can you share some more details about the RHEL 9 tests?

  • what architecture (arm64 or x64?)
  • how many cores does the box have
  • what are the khugepaged settings (basically everything inside /sys/kernel/mm/transparent_hugepage/khugepaged/)?
  • How exactly did you reproduce the bug? What are the RSS numbers you see?

Sorry, we can't really take the full chain of backports you did - just too risky and invasive. I would ask that you close your THP-related PRs - I know it feels bad to have spent time on something that gets rejected, but we need to keep the risk minimal.

I will fix this problem in JDK 11 and 8 with a small, targeted fix specific to these older releases. Your work helped to make us aware that this was a real problem, thanks for that. The problem will be fixed for your users (probably not for the July update, but for the October update).

@tabata-d

tabata-d commented Jun 8, 2026

Member Author

@tstuefe Sorry for the late reply.

If you can, can you share some more details about the RHEL 9 tests?

OS: RHEL 9.6
Architecture: x64
Machine cores: 4 cores

khugepaged settings:

$ find /sys/kernel/mm/transparent_hugepage/khugepaged/ -type f | while read f; do echo "$f"; cat "$f"; done
/sys/kernel/mm/transparent_hugepage/khugepaged/defrag
1
/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_shared
256
/sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs
10000
/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none
511
/sys/kernel/mm/transparent_hugepage/khugepaged/pages_to_scan
4096
/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_swap
64
/sys/kernel/mm/transparent_hugepage/khugepaged/alloc_sleep_millisecs
60000
/sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed
69
/sys/kernel/mm/transparent_hugepage/khugepaged/full_scans
1

Reproducer program:
Test.java

Test.java prints /proc/self/status at runtime.
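The attached Test.java is not reproduced in this thread. As a rough sketch of what such a reproducer might look like (an assumption, not the author's actual program), the following starts many idle threads and then prints the VmSize, VmRSS, and Threads lines from /proc/self/status. The class name, the default thread count, and the output layout are illustrative guesses modeled on the output shown below.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.stream.Collectors;

// Hypothetical reproducer sketch; not the Test.java attached to the PR.
class ThpStackRepro {
    // Keep only the status lines that matter for the RSS comparison.
    static List<String> interestingLines(List<String> statusLines) {
        return statusLines.stream()
                .filter(l -> l.startsWith("VmSize:")
                          || l.startsWith("VmRSS:")
                          || l.startsWith("Threads:"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Thread count is an assumption; pass a smaller number for a quick check.
        int threadCount = args.length > 0 ? Integer.parseInt(args[0]) : 10_000;
        CountDownLatch started = new CountDownLatch(threadCount);
        CountDownLatch release = new CountDownLatch(1);

        for (int i = 0; i < threadCount; i++) {
            Thread t = new Thread(() -> {
                started.countDown();
                try {
                    release.await();          // stay alive so the stack stays mapped
                } catch (InterruptedException ignored) {
                    // exit quietly
                }
            });
            t.setDaemon(true);                // let the JVM exit without joining
            t.start();
        }
        started.await();                      // all threads are running on their stacks

        System.out.println("Process PID: " + ProcessHandle.current().pid());
        System.out.println("--- /proc/self/status ---");
        interestingLines(Files.readAllLines(Path.of("/proc/self/status")))
                .forEach(System.out::println);
        System.out.println("-------------------------");
        release.countDown();
    }
}
```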

When I ran Test.java in the above environment, I got the following result:

$ ./jdk-11.0.31+11/bin/java -Xss2m -XX:+AlwaysPreTouch Test
Process PID: 100323

--- /proc/self/status ---
VmSize: 28563792 kB
VmRSS:  13053880 kB
Threads:        10020
-------------------------

After changing the THP configuration from always to madvise, I got the following result:

$ ./jdk-11.0.31+11/bin/java -Xss2m -XX:+AlwaysPreTouch Test
Process PID: 12219

--- /proc/self/status ---
VmSize: 28566808 kB
VmRSS:    693716 kB
Threads:        10020
-------------------------

On my RHEL 9 environment, I observed a significant difference in RSS depending on the THP configuration.
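To put the two measurements side by side: the RSS with THP "always" is roughly 19 times the RSS with "madvise", and the difference averages out to a bit over 1.2 MB per thread. The small sketch below just redoes that arithmetic from the numbers reported above; the per-thread figure is a crude average over all 10020 threads (including JVM-internal ones), not a measurement of any single stack.

```java
// Arithmetic on the figures reported in this comment; names are illustrative.
class RssDelta {
    static long extraKb(long rssAlwaysKb, long rssMadviseKb) {
        return rssAlwaysKb - rssMadviseKb;
    }

    // Integer average of the extra resident memory per thread, in kB.
    static long perThreadKb(long extraKb, int threads) {
        return extraKb / threads;
    }

    public static void main(String[] args) {
        long rssAlwaysKb = 13_053_880L;   // VmRSS with THP mode "always"
        long rssMadviseKb = 693_716L;     // VmRSS with THP mode "madvise"
        int threads = 10_020;             // Threads line from /proc/self/status

        long extra = extraKb(rssAlwaysKb, rssMadviseKb);
        System.out.printf("ratio: %.1fx%n", (double) rssAlwaysKb / rssMadviseKb);
        System.out.println("extra RSS: " + extra + " kB");
        System.out.println("average extra per thread: " + perThreadKb(extra, threads) + " kB");
    }
}
```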

@tabata-d

tabata-d commented Jun 8, 2026

Member Author

Sorry, we can't really take the full chain of backports you did - just too risky and invasive. I would ask that you close your THP-related PRs - I know it feels bad to have spent time on something that gets rejected, but we need to keep the risk minimal.

I will close the series of PRs.
I was reminded that JDK 11 and 8 are very stable releases, and that unclean and large backports like this one are considered too risky to accept. From now on, I will consult the mailing list before working on such changes.

@tstuefe @jerboaa @gnu-andrew
Thank you very much for the courteous responses and also for your continuous maintenance work.


Labels

backport (Port of a pull request already in a different code base), rfr (Pull request is ready for review)


3 participants