forked from torvalds/linux
-
Notifications
You must be signed in to change notification settings - Fork 2
Update README #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
alchark
wants to merge
122
commits into
flipper-devel
Choose a base branch
from
alchark-stupid-pr
base: flipper-devel
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
To configure audio registers, the clock of the video port in use must be enabled. As those clocks are managed by the VOP driver, they can't be enabled here to write the registers even when the HDMI cable is disconnected. Furthermore, the registers values are computed from the TMDS char rate, which is not available when disconnected. Returning -ENODEV seemed reasonable at first, but ASoC will log an error multiple times if dw_hdmi_qp_audio_prepare() return an error. Userspace might also retry multiple times, filling the kernel log with: hdmi-audio-codec hdmi-audio-codec.0.auto: ASoC error (-19): at snd_soc_dai_prepare() on i2s-hifi This has become even worse with the support of the second HDMI TX port. Activating the clocks to write fake data (fake because the TMDS char rate is unavailable) would require API changes to communicate between VOP and HDMI, which doesn't really make sense. Using a cached regmap to be dumped when a cable is connected won't work because writing order is important and some data needs to be retrieved from registers to write others. Returning 0 to silently fail sounds like the best and simplest solution. Fixes: fd0141d ("drm/bridge: synopsys: Add audio support for dw-hdmi-qp") Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
There is no need to warn about non pre-computed values, just change it to dbg. Fixes: fd0141d ("drm/bridge: synopsys: Add audio support for dw-hdmi-qp") Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
Add a new DIV_ROUND_UP helper, which cannot overflow when big numbers are being used. Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
The clock framework handles clock rates as "unsigned long", so u32 on 32-bit architectures and u64 on 64-bit architectures. The current code casts the dividend to u64 on 32-bit to avoid a potential overflow. For example DIV_ROUND_UP(3000000000, 1500000000) = (3.0G + 1.5G - 1) / 1.5G = = OVERFLOW / 1.5G, which has been introduced in commit 9556f9d ("clk: divider: handle integer overflow when dividing large clock rates"). On 64 bit platforms this masks the divisor, so that only the lower 32 bit are used. Thus requesting a frequency >= 4.3GHz results in incorrect values. For example requesting 4300000000 (4.3 GHz) will effectively request ca. 5 MHz. Requesting clk_round_rate(clk, ULONG_MAX) is a bit of a special case, since that still returns correct values as long as the parent clock is below 8.5 GHz. Fix this by switching to DIV_ROUND_UP_NO_OVERFLOW, which cannot overflow. This avoids any requirements on the arguments (except that divisor should not be 0 obviously). Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Replace the open coded abs_diff() with the existing helper function. Suggested-by: Andy Shevchenko <andriy.shevchenko@intel.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Add rfkill support for bluetooth. Bluetooth support itself is still missing, but this ensures bluetooth can be powered off properly. Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Also describe clkreq and wake signals in the PCIe pinmux used by the onboard LAN card. Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
This drops to hs200 mode and 150Mhz as this is actually stable across eMMC modules. There exist some that are incompatible at higher rates with the rk3588 and to avoid your filesystem corrupting due to IO errors, be more conservative and reduce the max. speed. Signed-off-by: Carsten Haitzler <raster@rasterman.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
When I converted rk808 to device managed resources I converted the rk808 specific pm_power_off handler to devm_register_sys_off_handler() using SYS_OFF_MODE_POWER_OFF_PREPARE, which is allowed to sleep. I did this because the driver's poweroff function makes use of regmap and the backend of that might sleep. But the PMIC poweroff function will kill off the board power and the kernel does some extra steps after the prepare handler. Thus the prepare handler should not be used for the PMIC's poweroff routine. Instead the normal SYS_OFF_MODE_POWER_OFF phase should be used. The old pm_power_off method is also being called from there, so this would have been a cleaner conversion anyways. But it still makes sense to investigate the sleep handling and check if there are any issues. Apparently the Rockchip and Meson I2C drivers (the only platforms using the PMICs handled by this driver) both have support for atomic transfers and thus may be called from the atomic poweroff context. Things are different on the SPI side. That is so far only used by rk806 and that one is only used by Rockchip RK3588. Unfortunately the Rockchip SPI driver does not support atomic transfers. That means this change will introduce an error splash directly before doing the final power off on all upstream supported RK3588 boards: [ 13.761353] ------------[ cut here ]------------ [ 13.761764] Voluntary context switch within RCU read-side critical section! [ 13.761776] WARNING: CPU: 0 PID: 1 at kernel/rcu/tree_plugin.h:330 rcu_note_context_switch+0x3ac/0x404 [ 13.763219] Modules linked in: [ 13.763498] CPU: 0 UID: 0 PID: 1 Comm: systemd-shutdow Not tainted 6.10.0-12284-g2818a9a19514 #1499 [ 13.764297] Hardware name: Rockchip RK3588 EVB1 V10 Board (DT) [ 13.764812] pstate: 604000c9 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 13.765427] pc : rcu_note_context_switch+0x3ac/0x404 [ 13.765871] lr : rcu_note_context_switch+0x3ac/0x404 [ 13.766314] sp : ffff800084f4b5b0 [ 13.766609] x29: ffff800084f4b5b0 x28: ffff00040139b800 x27: 00007dfb4439ae80 [ 13.767245] x26: ffff00040139bc80 x25: 0000000000000000 x24: ffff800082118470 [ 13.767880] x23: 0000000000000000 x22: ffff000400300000 x21: ffff000400300000 [ 13.768515] x20: ffff800083a9d600 x19: ffff0004fee48600 x18: fffffffffffed448 [ 13.769151] x17: 000000040044ffff x16: 005000f2b5503510 x15: 0000000000000048 [ 13.769787] x14: fffffffffffed490 x13: ffff80008473b3c0 x12: 0000000000000900 [ 13.770421] x11: 0000000000000300 x10: ffff800084797bc0 x9 : ffff80008473b3c0 [ 13.771057] x8 : 00000000ffffefff x7 : ffff8000847933c0 x6 : 0000000000000300 [ 13.771692] x5 : 0000000000000301 x4 : 40000000fffff300 x3 : 0000000000000000 [ 13.772328] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000400300000 [ 13.772964] Call trace: [ 13.773184] rcu_note_context_switch+0x3ac/0x404 [ 13.773598] __schedule+0x94/0xb0c [ 13.773907] schedule+0x34/0x104 [ 13.774198] schedule_timeout+0x84/0xfc [ 13.774544] wait_for_completion_timeout+0x78/0x14c [ 13.774980] spi_transfer_one_message+0x588/0x690 [ 13.775403] __spi_pump_transfer_message+0x19c/0x4ec [ 13.775846] __spi_sync+0x2a8/0x3c4 [ 13.776161] spi_write_then_read+0x120/0x208 [ 13.776543] rk806_spi_bus_read+0x54/0x88 [ 13.776905] _regmap_raw_read+0xec/0x16c [ 13.777257] _regmap_bus_read+0x44/0x7c [ 13.777601] _regmap_read+0x60/0xd8 [ 13.777915] _regmap_update_bits+0xf4/0x13c [ 13.778289] regmap_update_bits_base+0x64/0x98 [ 13.778686] rk808_power_off+0x70/0xfc [ 13.779024] sys_off_notify+0x40/0x6c [ 13.779356] atomic_notifier_call_chain+0x60/0x90 [ 13.779776] do_kernel_power_off+0x54/0x6c [ 13.780146] machine_power_off+0x18/0x24 [ 13.780499] kernel_power_off+0x70/0x7c [ 13.780845] __do_sys_reboot+0x210/0x270 [ 13.781198] __arm64_sys_reboot+0x24/0x30 [ 13.781558] invoke_syscall+0x48/0x10c [ 13.781897] el0_svc_common+0x3c/0xe8 [ 13.782228] do_el0_svc+0x20/0x2c [ 13.782528] el0_svc+0x34/0xd8 [ 13.782806] el0t_64_sync_handler+0x120/0x12c [ 13.783197] el0t_64_sync+0x190/0x194 [ 13.783527] ---[ end trace 0000000000000000 ]--- The board will shutdown nevertheless, since this also re-enables interrupts. A proper fix for this requires changes to the core SPI subsystem and will be done as a follow-up series. Note, that this patch also fixes a problem for the Asus C201. Without the function being registered as a proper shutdown handler the syscall for poweroff exits early and does not even call the shutdown prepare handler. This in turn means the system can no longer poweroff properly since my original change. Fixes: 4fec8a5 ("mfd: rk808: Convert to device managed resources") Cc: stable@vger.kernel.org Reported-by: Urja <urja@urja.dev> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Add the documentation for VOP2 video ports reset clocks. One reset can be set per video port. Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
At the end of initialization, each VP clock needs to be reset before they can be used. Failing to do so can put the VOP in an undefined state where the generated HDMI signal is either lost or not matching the selected mode. This issue can be reproduced by switching modes multiple times. Depending on the setup, after about 10 mode switches, the signal will be lost and the value in register 0x890 (VSYNCWIDTH + VFRONT) will take the value `0x0000018c`. That makes VSYNCWIDTH=0, which is wrong. Adding the clock resets after the VOP configuration fixes the issue. Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
This adds the needed clock resets for all rk3588(s) based SOCs. Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
The RK3588 EVB1 comes with a W552793DBA-V10 Touchscreen/Display combination. It contains a Wanchanglong W552793BAA panel and a Goodix GT1158 touchscreen. This adds the DT description of it. Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
A previous from Detlev Casanova adds reset handling for the video ports. This also resets the AHB and AXI interface when the system binds the VOP2 controller. This fixes issues when the bootloader (or a previously running kernel when using kexec) left the VOP2 initialized to some degree. Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Fix the USB-C connector description, so that it follows the binding: port@0 is the high-speed lanes port@1 is the super-speed lanes port@2 is the SBU lanes Right now the high-speed and super-speed links are swapped and for the high-speed lanes the link points to the controller instead of the PHY. I'm still investigating if this should be changed. This also updates the port naming, so that it describes the hardware instead of how the drivers are using the information. These are effectively the same, but the DT should describe hardware and not software. Fixes: b37146b ("arm64: dts: rockchip: add USB3 to rk3588-evb1") Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Any camera related IP of the RK3588 is not yet supported and the cameras must be handled via overlays anyways, but it is sensible to expose the related I2C interfaces by default. This allows using i2cdetect to investigate anything connected to the CSI connectors right now. Since the Rockchip I2C driver implements proper power management there are no disadvantages, if nothing is connected to the port. Note, that the second CSI port's I2C in the Rock 5B+ and Rock 5T reuse I2C4, which is already used by fusb302 and thus already enabled. Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
With USB type C connectors, the vbus detect pin of the OTG controller attached to it is pulled high by a USB Type C controller chip such as the fusb302. This means USB enumeration on Type-C ports never works, as the vbus is always seen as high. Rockchip added some GRF register flags to deal with this situation. The RK3576 TRM calls these "soft_vbusvalid_bvalid" (con0 bit index 15) and "soft_vbusvalid_bvalid_sel" (con0 bit index 14). Downstream introduces a new vendor property which tells the USB 2 PHY that it's connected to a type C port, but we can do better. Since in such an arrangement, we'll have an OF graph connection from the USB controller to the USB connector anyway, we can walk said OF graph and check the connector's compatible to determine this without adding any further vendor properties. Do keep in mind that the usbdp PHY driver seemingly fiddles with these register fields as well, but what it does doesn't appear to be enough for us to get working USB enumeration, presumably because the whole vbus_attach logic needs to be adjusted as well either way. Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com> Link: https://lore.kernel.org/r/20250610-rk3576-sige5-usb-v4-1-7e7f779619c1@collabora.com Signed-off-by: Sebastian Reichel <sre@kernel.org>
adc-keys, unlike gpio-keys, does not allow linux,input-type as a valid property. This makes it impossible to model devices that have ADC inputs that should generate switch events. Add the property to the binding with the same default as gpio-keys. Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Link: https://lore.kernel.org/r/20250630-rock4d-audio-v1-1-0b3c8e8fda9c@collabora.com Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Instead of doing something like what gpio-keys is doing, adc-keys hardcodes that all keycodes must be of type EV_KEY. This limits the usefulness of adc-keys, and overcomplicates the code with manual bit-setting logic. Instead, refactor the code to read the linux,input-type fwnode property, and get rid of the custom bit setting logic, replacing it with input_set_capability instead. input_report_key is replaced with input_event, which allows us to explicitly pass the type. Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Link: https://lore.kernel.org/r/20250630-rock4d-audio-v1-2-0b3c8e8fda9c@collabora.com Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
The RADXA ROCK 4D, like many other Rockchip-based boards, uses an ES8388 analog audio codec. On the production version of the board, the codec's LOUT1 and ROUT1 pins are tied to the headphone jack, whereas pins LOUT2 and ROUT2 lead to a non-populated speaker amplifier that itself leads to a non-populated speaker jack. The schematic is still haunted by the ghosts of those symbols, but it clearly marks them as "NC". The 3.5mm TRRS jack has its microphone ring (and ground ring) wired to the codec's LINPUT1 and RINPUT1 pins for differential signalling. Furthermore, it uses the SoCs ADC to detect whether the inserted cable is of headphones (i.e., no microphone), or a headset (i.e., with microphone). The way this is done is that the ADC input taps the output of a 100K/100K resistor divider that divides the microphone ring pin that's pulled up to 3.3V. There is no ADC level difference between a completely empty jack and one with a set of headphones (i.e., ones that don't have a microphone) connected. Consequently headphone insertion detection isn't something that can be done. Add the necessary codec and audio card nodes. The non-populated parts, i.e. LOUT2 and ROUT2, are not modeled at all, as they are not present on the hardware. Also, add an adc-keys node for the headset detection, which uses an input type of EV_SW with the SW_MICROPHONE_INSERT keycode. Below the 220mV pressed voltage level of our SW_MICROPHONE_INSERT switch, we also define a button that emits a KEY_RESERVED code, which is there to model this part of the voltage range as not just being extra legroom for the button above it, but actually a state that is encountered in the real world, and should be recognised as a valid state for the ADC range to be in so that no "closer" ADC button is chosen. Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com> Link: https://lore.kernel.org/r/20250630-rock4d-audio-v1-3-0b3c8e8fda9c@collabora.com Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
On Radxa ROCK 4D boards we are seeing some issues with PHY detection and stability (e.g. link loss or not capable of transceiving packages) after new board revisions switched from a dedicated crystal to providing the 25 MHz PHY input clock from the SoC instead. This board is using a RTL8211F PHY, which is connected to an always-on regulator. Unfortunately the datasheet does not explicitly mention the power-up sequence regarding the clock, but it seems to assume that the clock is always-on (i.e. dedicated crystal). By doing an explicit reset after enabling the clock, the issue on the boards could no longer be observed. Note, that the RK3576 SoC used by the ROCK 4D board does not yet support system level PM, so the resume path has not been tested. Cc: stable@vger.kernel.org Fixes: 7300c9b ("net: phy: realtek: Add optional external PHY clock") Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
According to the Ethernet controller device tree binding "rgmii-id"
means, that the PCB does not have extra long lines to add the required
delays. This is indeed the case for the ROCK 4D.
The problem is, that the Rockchip MAC Linux driver interprets the
interface type differently and abuses the information to configure
RX and TX delays in the MAC using (vendor) properties 'rx_delay' and
'tx_delay'.
When Detlev Casanova upstreamed the ROCK 4D device tree, he used the
correct description for the board ("rgmii-id"). This results in no delays
being configured in the MAC. At the same time the PHY will provide
some delays.
This works to some degree, but is not a stable configuration. All five
ROCK 4D production boards, which have recently been added to the Collabora
LAVA lab for CI purposes have trouble with data not getting through
after a connection has been established.
Using the same delay setup as the vendor device tree fixes the
functionality (at the cost of not properly following the DT binding).
As we cannot fix the driver behavior for RK3576 (some other boards
already depend on this), let's update the ROCK 4D DT instead.
Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Rename rockchip_pcie_get_ltssm to rockchip_pcie_get_ltssm_status_reg to avoid confusion after introducing the .get_ltssm operation support, which requires further processing of the register. Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Implement .get_ltssm operation function pointer, which will be needed for system suspend support. As the driver used to have a function called rockchip_pcie_get_ltssm() with different behavior the new function is named rockchip_pcie_get_ltssm_state(), so that issues can easily be detected when porting patches. Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
…r_get_enable_optional()" This reverts commit c930b10. We need the vpcie3v3 regulator handle in the system suspend routine. Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
By moving devm_phy_get() to the probe routine, rockchip_pcie_phy_init() can be used to re-initialize the PCIe PHY, which is for example needed after a system suspend/resume cycle. Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Remove code duplocation and improve readability by introducing a new function to setup the enhanced LTSSM mode. Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Remove code duplication and improve readability by introducing a new function to setup the controller mode. Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Remove code duplication and improve readability by introducing a new function to setup the DLL indicator. Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Prepare Rockchip PCIe controller for system suspend support by adding the PME turn off operation. Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Fix the mismatch between the simple-audio-card routing table vs. widget names, which caused the following error at boot preventing the sound card from getting added: [ 6.625634] asoc-simple-card sound: ASoC: DAPM unknown pin Headphones [ 6.627247] asoc-simple-card sound: ASoC: Failed to add route HPOL -> Headphones(*) [ 6.627988] asoc-simple-card sound: ASoC: Failed to add route HPOR -> Headphones(*) Signed-off-by: Alexey Charkov <alchark@gmail.com>
In environment with too many APs, the buffer of 64k is not enough to keep all BSS entries. Let's suppress the message (can be enabled via debugfs).
'simple-audio-card,name' ends up in user visible places such as ALSA mixer names, so use a more human-readable name instead of realtek,rt5616-codec Signed-off-by: Alexey Charkov <alchark@gmail.com>
NanoPi M5 derives its analog sound signal from SAI2 in M0 pin mode, so the MCLK pin should be configured accordingly for the sound codec to get its I2S signal from the SoC. Request the required pin config. The clock itself should also be CLK_SAI2_MCLKOUT_TO_IO for the sound to work. Signed-off-by: Alexey Charkov <alchark@gmail.com>
Rockchip RK3576 EVB1 board uses the typical configuration with an ES8388 analog codec driven from built-in SAI I2S. Add device tree nodes for it. Signed-off-by: Alexey Charkov <alchark@gmail.com>
… on EVB1 Headphone detect pin needs to be pulled up internally for input events to trigger. While at that, rename pins to follow the schematic. Signed-off-by: Alexey Charkov <alchark@gmail.com>
These allow selective muting/unmuting of inputs and outputs, as well as setting mutually-exclusive rules in ALSA UCM. Signed-off-by: Alexey Charkov <alchark@gmail.com>
Hardcode always show single logo regardless of CPU count Signed-off-by: Alexey Charkov <alchark@gmail.com>
The DisplayPort found on RK3576 is very similar to that of RK3588, but work in dual pixel mode. And itself does not depend on the I2S clock or the SPDIF clock when transmit audio. Signed-off-by: Andy Yan <andy.yan@rock-chips.com>
The DW DisplayPort hardware block can be configured to work in single, dual,quad pixel mode on differnt platforms, so make the pixel mode set by plat_data to support the upcoming rk3576 variant. Signed-off-by: Andy Yan <andy.yan@rock-chips.com>
The i2s/spdif clk are mandatory for rk3588, but not used for the upcoming rk3576, so make it optional here. Signed-off-by: Andy Yan <andy.yan@rock-chips.com>
The DisplayPort of the RK3576 is basically the same as that of the RK3588, but it operates in dual-pixel mode and also support MST. This patch only enable the SST output now. Signed-off-by: Andy Yan <andy.yan@rock-chips.com>
The DisplayPort on rk3576 is compliant with DisplayPort Specification Version 1.4 with MST support, and share the USBDP combo PHY with USB 3.1 OTG0 controller. Signed-off-by: Andy Yan <andy.yan@rock-chips.com>
Signed-off-by: Andy Yan <andy.yan@rock-chips.com>
A super smart change to test PRs notifications
9041732 to
3b3e4f0
Compare
alchark
pushed a commit
that referenced
this pull request
Jan 12, 2026
The GPIO controller is configured as non-sleeping but it uses generic
pinctrl helpers which use a mutex for synchronization.
This can cause the following lockdep splat with shared GPIOs enabled on
boards which have multiple devices using the same GPIO:
BUG: sleeping function called from invalid context at
kernel/locking/mutex.c:591
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 12, name:
kworker/u16:0
preempt_count: 1, expected: 0
RCU nest depth: 0, expected: 0
6 locks held by kworker/u16:0/12:
#0: ffff0001f0018d48 ((wq_completion)events_unbound#2){+.+.}-{0:0},
at: process_one_work+0x18c/0x604
#1: ffff8000842dbdf0 (deferred_probe_work){+.+.}-{0:0}, at:
process_one_work+0x1b4/0x604
#2: ffff0001f18498f8 (&dev->mutex){....}-{4:4}, at:
__device_attach+0x38/0x1b0
#3: ffff0001f75f1e90 (&gdev->srcu){.+.?}-{0:0}, at:
gpiod_direction_output_raw_commit+0x0/0x360
#4: ffff0001f46e3db8 (&shared_desc->spinlock){....}-{3:3}, at:
gpio_shared_proxy_direction_output+0xd0/0x144 [gpio_shared_proxy]
#5: ffff0001f180ee90 (&gdev->srcu){.+.?}-{0:0}, at:
gpiod_direction_output_raw_commit+0x0/0x360
irq event stamp: 81450
hardirqs last enabled at (81449): [<ffff8000813acba4>]
_raw_spin_unlock_irqrestore+0x74/0x78
hardirqs last disabled at (81450): [<ffff8000813abfb8>]
_raw_spin_lock_irqsave+0x84/0x88
softirqs last enabled at (79616): [<ffff8000811455fc>]
__alloc_skb+0x17c/0x1e8
softirqs last disabled at (79614): [<ffff8000811455fc>]
__alloc_skb+0x17c/0x1e8
CPU: 2 UID: 0 PID: 12 Comm: kworker/u16:0 Not tainted
6.19.0-rc4-next-20260105+ #11975 PREEMPT
Hardware name: Hardkernel ODROID-M1 (DT)
Workqueue: events_unbound deferred_probe_work_func
Call trace:
show_stack+0x18/0x24 (C)
dump_stack_lvl+0x90/0xd0
dump_stack+0x18/0x24
__might_resched+0x144/0x248
__might_sleep+0x48/0x98
__mutex_lock+0x5c/0x894
mutex_lock_nested+0x24/0x30
pinctrl_get_device_gpio_range+0x44/0x128
pinctrl_gpio_direction+0x3c/0xe0
pinctrl_gpio_direction_output+0x14/0x20
rockchip_gpio_direction_output+0xb8/0x19c
gpiochip_direction_output+0x38/0x94
gpiod_direction_output_raw_commit+0x1d8/0x360
gpiod_direction_output_nonotify+0x7c/0x230
gpiod_direction_output+0x34/0xf8
gpio_shared_proxy_direction_output+0xec/0x144 [gpio_shared_proxy]
gpiochip_direction_output+0x38/0x94
gpiod_direction_output_raw_commit+0x1d8/0x360
gpiod_direction_output_nonotify+0x7c/0x230
gpiod_configure_flags+0xbc/0x480
gpiod_find_and_request+0x1a0/0x574
gpiod_get_index+0x58/0x84
devm_gpiod_get_index+0x20/0xb4
devm_gpiod_get_optional+0x18/0x30
rockchip_pcie_probe+0x98/0x380
platform_probe+0x5c/0xac
really_probe+0xbc/0x298
Fixes: 936ee26 ("gpio/rockchip: add driver for rockchip gpio")
Cc: stable@vger.kernel.org
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Closes: https://lore.kernel.org/all/d035fc29-3b03-4cd6-b8ec-001f93540bc6@samsung.com/
Acked-by: Heiko Stuebner <heiko@sntech.de>
Link: https://lore.kernel.org/r/20260106090011.21603-1-bartosz.golaszewski@oss.qualcomm.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
alchark
pushed a commit
that referenced
this pull request
Jan 12, 2026
…ked_inode()
In btrfs_read_locked_inode() we are calling btrfs_init_file_extent_tree()
while holding a path with a read locked leaf from a subvolume tree, and
btrfs_init_file_extent_tree() may do a GFP_KERNEL allocation, which can
trigger reclaim.
This can create a circular lock dependency which lockdep warns about with
the following splat:
[6.1433] ======================================================
[6.1574] WARNING: possible circular locking dependency detected
[6.1583] 6.18.0+ #4 Tainted: G U
[6.1591] ------------------------------------------------------
[6.1599] kswapd0/117 is trying to acquire lock:
[6.1606] ffff8d9b6333c5b8 (&delayed_node->mutex){+.+.}-{3:3}, at: __btrfs_release_delayed_node.part.0+0x39/0x2f0
[6.1625]
but task is already holding lock:
[6.1633] ffffffffa4ab8ce0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x195/0xc60
[6.1646]
which lock already depends on the new lock.
[6.1657]
the existing dependency chain (in reverse order) is:
[6.1667]
-> #2 (fs_reclaim){+.+.}-{0:0}:
[6.1677] fs_reclaim_acquire+0x9d/0xd0
[6.1685] __kmalloc_cache_noprof+0x59/0x750
[6.1694] btrfs_init_file_extent_tree+0x90/0x100
[6.1702] btrfs_read_locked_inode+0xc3/0x6b0
[6.1710] btrfs_iget+0xbb/0xf0
[6.1716] btrfs_lookup_dentry+0x3c5/0x8e0
[6.1724] btrfs_lookup+0x12/0x30
[6.1731] lookup_open.isra.0+0x1aa/0x6a0
[6.1739] path_openat+0x5f7/0xc60
[6.1746] do_filp_open+0xd6/0x180
[6.1753] do_sys_openat2+0x8b/0xe0
[6.1760] __x64_sys_openat+0x54/0xa0
[6.1768] do_syscall_64+0x97/0x3e0
[6.1776] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[6.1784]
-> #1 (btrfs-tree-00){++++}-{3:3}:
[6.1794] lock_release+0x127/0x2a0
[6.1801] up_read+0x1b/0x30
[6.1808] btrfs_search_slot+0x8e0/0xff0
[6.1817] btrfs_lookup_inode+0x52/0xd0
[6.1825] __btrfs_update_delayed_inode+0x73/0x520
[6.1833] btrfs_commit_inode_delayed_inode+0x11a/0x120
[6.1842] btrfs_log_inode+0x608/0x1aa0
[6.1849] btrfs_log_inode_parent+0x249/0xf80
[6.1857] btrfs_log_dentry_safe+0x3e/0x60
[6.1865] btrfs_sync_file+0x431/0x690
[6.1872] do_fsync+0x39/0x80
[6.1879] __x64_sys_fsync+0x13/0x20
[6.1887] do_syscall_64+0x97/0x3e0
[6.1894] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[6.1903]
-> #0 (&delayed_node->mutex){+.+.}-{3:3}:
[6.1913] __lock_acquire+0x15e9/0x2820
[6.1920] lock_acquire+0xc9/0x2d0
[6.1927] __mutex_lock+0xcc/0x10a0
[6.1934] __btrfs_release_delayed_node.part.0+0x39/0x2f0
[6.1944] btrfs_evict_inode+0x20b/0x4b0
[6.1952] evict+0x15a/0x2f0
[6.1958] prune_icache_sb+0x91/0xd0
[6.1966] super_cache_scan+0x150/0x1d0
[6.1974] do_shrink_slab+0x155/0x6f0
[6.1981] shrink_slab+0x48e/0x890
[6.1988] shrink_one+0x11a/0x1f0
[6.1995] shrink_node+0xbfd/0x1320
[6.1002] balance_pgdat+0x67f/0xc60
[6.1321] kswapd+0x1dc/0x3e0
[6.1643] kthread+0xff/0x240
[6.1965] ret_from_fork+0x223/0x280
[6.1287] ret_from_fork_asm+0x1a/0x30
[6.1616]
other info that might help us debug this:
[6.1561] Chain exists of:
&delayed_node->mutex --> btrfs-tree-00 --> fs_reclaim
[6.1503] Possible unsafe locking scenario:
[6.1110] CPU0 CPU1
[6.1411] ---- ----
[6.1707] lock(fs_reclaim);
[6.1998] lock(btrfs-tree-00);
[6.1291] lock(fs_reclaim);
[6.1581] lock(&delayed_node->mutex);
[6.1874]
*** DEADLOCK ***
[6.1716] 2 locks held by kswapd0/117:
[6.1999] #0: ffffffffa4ab8ce0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x195/0xc60
[6.1294] #1: ffff8d998344b0e0 (&type->s_umount_key#40){++++}- {3:3}, at: super_cache_scan+0x37/0x1d0
[6.1596]
stack backtrace:
[6.1183] CPU: 11 UID: 0 PID: 117 Comm: kswapd0 Tainted: G U 6.18.0+ #4 PREEMPT(lazy)
[6.1185] Tainted: [U]=USER
[6.1186] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 2001 02/01/2023
[6.1187] Call Trace:
[6.1187] <TASK>
[6.1189] dump_stack_lvl+0x6e/0xa0
[6.1192] print_circular_bug.cold+0x17a/0x1c0
[6.1194] check_noncircular+0x175/0x190
[6.1197] __lock_acquire+0x15e9/0x2820
[6.1200] lock_acquire+0xc9/0x2d0
[6.1201] ? __btrfs_release_delayed_node.part.0+0x39/0x2f0
[6.1204] __mutex_lock+0xcc/0x10a0
[6.1206] ? __btrfs_release_delayed_node.part.0+0x39/0x2f0
[6.1208] ? __btrfs_release_delayed_node.part.0+0x39/0x2f0
[6.1211] ? __btrfs_release_delayed_node.part.0+0x39/0x2f0
[6.1213] __btrfs_release_delayed_node.part.0+0x39/0x2f0
[6.1215] btrfs_evict_inode+0x20b/0x4b0
[6.1217] ? lock_acquire+0xc9/0x2d0
[6.1220] evict+0x15a/0x2f0
[6.1222] prune_icache_sb+0x91/0xd0
[6.1224] super_cache_scan+0x150/0x1d0
[6.1226] do_shrink_slab+0x155/0x6f0
[6.1228] shrink_slab+0x48e/0x890
[6.1229] ? shrink_slab+0x2d2/0x890
[6.1231] shrink_one+0x11a/0x1f0
[6.1234] shrink_node+0xbfd/0x1320
[6.1236] ? shrink_node+0xa2d/0x1320
[6.1236] ? shrink_node+0xbd3/0x1320
[6.1239] ? balance_pgdat+0x67f/0xc60
[6.1239] balance_pgdat+0x67f/0xc60
[6.1241] ? finish_task_switch.isra.0+0xc4/0x2a0
[6.1246] kswapd+0x1dc/0x3e0
[6.1247] ? __pfx_autoremove_wake_function+0x10/0x10
[6.1249] ? __pfx_kswapd+0x10/0x10
[6.1250] kthread+0xff/0x240
[6.1251] ? __pfx_kthread+0x10/0x10
[6.1253] ret_from_fork+0x223/0x280
[6.1255] ? __pfx_kthread+0x10/0x10
[6.1257] ret_from_fork_asm+0x1a/0x30
[6.1260] </TASK>
This is because:
1) The fsync task is holding an inode's delayed node mutex (for a
directory) while calling __btrfs_update_delayed_inode() and that needs
to do a search on the subvolume's btree (therefore read lock some
extent buffers);
2) The lookup task, at btrfs_lookup(), triggered reclaim with the
GFP_KERNEL allocation done by btrfs_init_file_extent_tree() while
holding a read lock on a subvolume leaf;
3) The reclaim triggered kswapd which is doing inode eviction for the
directory inode the fsync task is using as an argument to
btrfs_commit_inode_delayed_inode() - but in that call chain we are
trying to read lock the same leaf that the lookup task is holding
while calling btrfs_init_file_extent_tree() and doing the GFP_KERNEL
allocation.
Fix this by calling btrfs_init_file_extent_tree() after we don't need the
path anymore and release it in btrfs_read_locked_inode().
Reported-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://lore.kernel.org/linux-btrfs/6e55113a22347c3925458a5d840a18401a38b276.camel@linux.intel.com/
Fixes: 8679d26 ("btrfs: initialize inode::file_extent_tree after i_mode has been set")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
alchark
pushed a commit
that referenced
this pull request
Jan 12, 2026
When forward-porting Rust Binder to 6.18, I neglected to take commit fb56fdf ("mm/list_lru: split the lock to per-cgroup scope") into account, and apparently I did not end up running the shrinker callback when I sanity tested the driver before submission. This leads to crashes like the following: ============================================ WARNING: possible recursive locking detected 6.18.0-mainline-maybe-dirty #1 Tainted: G IO -------------------------------------------- kswapd0/68 is trying to acquire lock: ffff956000fa18b0 (&l->lock){+.+.}-{2:2}, at: lock_list_lru_of_memcg+0x128/0x230 but task is already holding lock: ffff956000fa18b0 (&l->lock){+.+.}-{2:2}, at: rust_helper_spin_lock+0xd/0x20 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&l->lock); lock(&l->lock); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by kswapd0/68: #0: ffffffff90d2e260 (fs_reclaim){+.+.}-{0:0}, at: kswapd+0x597/0x1160 #1: ffff956000fa18b0 (&l->lock){+.+.}-{2:2}, at: rust_helper_spin_lock+0xd/0x20 #2: ffffffff90cf3680 (rcu_read_lock){....}-{1:2}, at: lock_list_lru_of_memcg+0x2d/0x230 To fix this, remove the spin_lock() call from rust_shrink_free_page(). Cc: stable <stable@kernel.org> Fixes: eafedbc ("rust_binder: add Rust Binder driver") Signed-off-by: Alice Ryhl <aliceryhl@google.com> Link: https://patch.msgid.link/20251202-binder-shrink-unspin-v1-1-263efb9ad625@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
586f4d7 to
2bff7da
Compare
alchark
pushed a commit
that referenced
this pull request
Jan 21, 2026
When a page is freed it coalesces with a buddy into a higher order page while possible. When the buddy page migrate type differs, it is expected to be updated to match the one of the page being freed. However, only the first pageblock of the buddy page is updated, while the rest of the pageblocks are left unchanged. That causes warnings in later expand() and other code paths (like below), since an inconsistency between migration type of the list containing the page and the page-owned pageblocks migration types is introduced. [ 308.986589] ------------[ cut here ]------------ [ 308.987227] page type is 0, passed migratetype is 1 (nr=256) [ 308.987275] WARNING: CPU: 1 PID: 5224 at mm/page_alloc.c:812 expand+0x23c/0x270 [ 308.987293] Modules linked in: algif_hash(E) af_alg(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) nf_tables(E) s390_trng(E) vfio_ccw(E) mdev(E) vfio_iommu_type1(E) vfio(E) sch_fq_codel(E) drm(E) i2c_core(E) drm_panel_orientation_quirks(E) loop(E) nfnetlink(E) vsock_loopback(E) vmw_vsock_virtio_transport_common(E) vsock(E) ctcm(E) fsm(E) diag288_wdt(E) watchdog(E) zfcp(E) scsi_transport_fc(E) ghash_s390(E) prng(E) aes_s390(E) des_generic(E) des_s390(E) libdes(E) sha3_512_s390(E) sha3_256_s390(E) sha_common(E) paes_s390(E) crypto_engine(E) pkey_cca(E) pkey_ep11(E) zcrypt(E) rng_core(E) pkey_pckmo(E) pkey(E) autofs4(E) [ 308.987439] Unloaded tainted modules: hmac_s390(E):2 [ 308.987650] CPU: 1 UID: 0 PID: 5224 Comm: mempig_verify Kdump: loaded Tainted: G E 6.18.0-gcc-bpf-debug torvalds#431 PREEMPT [ 308.987657] Tainted: [E]=UNSIGNED_MODULE [ 308.987661] Hardware name: IBM 3906 M04 704 (z/VM 7.3.0) [ 308.987666] Krnl PSW : 0404f00180000000 00000349976fa600 (expand+0x240/0x270) [ 308.987676] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 RI:0 EA:3 [ 308.987682] Krnl GPRS: 0000034980000004 0000000000000005 0000000000000030 000003499a0e6d88 [ 308.987688] 0000000000000005 0000034980000005 000002be803ac000 0000023efe6c8300 [ 308.987692] 0000000000000008 0000034998d57290 000002be00000100 0000023e00000008 [ 308.987696] 0000000000000000 0000000000000000 00000349976fa5fc 000002c99b1eb6f0 [ 308.987708] Krnl Code: 00000349976fa5f0: c020008a02f2 larl %r2,000003499883abd4 00000349976fa5f6: c0e5ffe3f4b5 brasl %r14,0000034997378f60 #00000349976fa5fc: af000000 mc 0,0 >00000349976fa600: a7f4ff4c brc 15,00000349976fa498 00000349976fa604: b9040026 lgr %r2,%r6 00000349976fa608: c0300088317f larl %r3,0000034998800906 00000349976fa60e: c0e5fffdb6e1 brasl %r14,00000349976b13d0 00000349976fa614: af000000 mc 0,0 [ 308.987734] Call Trace: [ 308.987738] [<00000349976fa600>] expand+0x240/0x270 [ 308.987744] ([<00000349976fa5fc>] expand+0x23c/0x270) [ 308.987749] [<00000349976ff95e>] rmqueue_bulk+0x71e/0x940 [ 308.987754] [<00000349976ffd7e>] __rmqueue_pcplist+0x1fe/0x2a0 [ 308.987759] [<0000034997700966>] rmqueue.isra.0+0xb46/0xf40 [ 308.987763] [<0000034997703ec8>] get_page_from_freelist+0x198/0x8d0 [ 308.987768] [<0000034997706fa8>] __alloc_frozen_pages_noprof+0x198/0x400 [ 308.987774] [<00000349977536f8>] alloc_pages_mpol+0xb8/0x220 [ 308.987781] [<0000034997753bf6>] folio_alloc_mpol_noprof+0x26/0xc0 [ 308.987786] [<0000034997753e4c>] vma_alloc_folio_noprof+0x6c/0xa0 [ 308.987791] [<0000034997775b22>] vma_alloc_anon_folio_pmd+0x42/0x240 [ 308.987799] [<000003499777bfea>] __do_huge_pmd_anonymous_page+0x3a/0x210 [ 308.987804] [<00000349976cb08e>] __handle_mm_fault+0x4de/0x500 [ 308.987809] [<00000349976cb14c>] handle_mm_fault+0x9c/0x3a0 [ 308.987813] [<000003499734d70e>] do_exception+0x1de/0x540 [ 308.987822] [<0000034998387390>] __do_pgm_check+0x130/0x220 [ 308.987830] [<000003499839a934>] pgm_check_handler+0x114/0x160 [ 308.987838] 3 locks held by mempig_verify/5224: [ 308.987842] #0: 0000023ea44c1e08 (vm_lock){++++}-{0:0}, at: lock_vma_under_rcu+0xb2/0x2a0 [ 308.987859] #1: 0000023ee4d41b18 (&pcp->lock){+.+.}-{2:2}, at: rmqueue.isra.0+0xad6/0xf40 [ 308.987871] #2: 0000023efe6c8998 (&zone->lock){..-.}-{2:2}, at: rmqueue_bulk+0x5a/0x940 [ 308.987886] Last Breaking-Event-Address: [ 308.987890] [<0000034997379096>] __warn_printk+0x136/0x140 [ 308.987897] irq event stamp: 52330356 [ 308.987901] hardirqs last enabled at (52330355): [<000003499838742e>] __do_pgm_check+0x1ce/0x220 [ 308.987907] hardirqs last disabled at (52330356): [<000003499839932e>] _raw_spin_lock_irqsave+0x9e/0xe0 [ 308.987913] softirqs last enabled at (52329882): [<0000034997383786>] handle_softirqs+0x2c6/0x530 [ 308.987922] softirqs last disabled at (52329859): [<0000034997382f86>] __irq_exit_rcu+0x126/0x140 [ 308.987929] ---[ end trace 0000000000000000 ]--- [ 308.987936] ------------[ cut here ]------------ [ 308.987940] page type is 0, passed migratetype is 1 (nr=256) [ 308.987951] WARNING: CPU: 1 PID: 5224 at mm/page_alloc.c:860 __del_page_from_free_list+0x1be/0x1e0 [ 308.987960] Modules linked in: algif_hash(E) af_alg(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) nf_tables(E) s390_trng(E) vfio_ccw(E) mdev(E) vfio_iommu_type1(E) vfio(E) sch_fq_codel(E) drm(E) i2c_core(E) drm_panel_orientation_quirks(E) loop(E) nfnetlink(E) vsock_loopback(E) vmw_vsock_virtio_transport_common(E) vsock(E) ctcm(E) fsm(E) diag288_wdt(E) watchdog(E) zfcp(E) scsi_transport_fc(E) ghash_s390(E) prng(E) aes_s390(E) des_generic(E) des_s390(E) libdes(E) sha3_512_s390(E) sha3_256_s390(E) sha_common(E) paes_s390(E) crypto_engine(E) pkey_cca(E) pkey_ep11(E) zcrypt(E) rng_core(E) pkey_pckmo(E) pkey(E) autofs4(E) [ 308.988070] Unloaded tainted modules: hmac_s390(E):2 [ 308.988087] CPU: 1 UID: 0 PID: 5224 Comm: mempig_verify Kdump: loaded Tainted: G W E 6.18.0-gcc-bpf-debug torvalds#431 PREEMPT [ 308.988095] Tainted: [W]=WARN, [E]=UNSIGNED_MODULE [ 308.988100] Hardware name: IBM 3906 M04 704 (z/VM 7.3.0) [ 308.988105] Krnl PSW : 0404f00180000000 00000349976f9e32 (__del_page_from_free_list+0x1c2/0x1e0) [ 308.988118] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 RI:0 EA:3 [ 308.988127] Krnl GPRS: 0000034980000004 0000000000000005 0000000000000030 000003499a0e6d88 [ 308.988133] 0000000000000005 0000034980000005 0000034998d57290 0000023efe6c8300 [ 308.988139] 0000000000000001 0000000000000008 000002be00000100 000002be803ac000 [ 308.988144] 0000000000000000 0000000000000001 00000349976f9e2e 000002c99b1eb728 [ 308.988153] Krnl Code: 00000349976f9e22: c020008a06d9 larl %r2,000003499883abd4 00000349976f9e28: c0e5ffe3f89c brasl %r14,0000034997378f60 #00000349976f9e2e: af000000 mc 0,0 >00000349976f9e32: a7f4ff4e brc 15,00000349976f9cce 00000349976f9e36: b904002b lgr %r2,%r11 00000349976f9e3a: c030008a06e7 larl %r3,000003499883ac08 00000349976f9e40: c0e5fffdbac8 brasl %r14,00000349976b13d0 00000349976f9e46: af000000 mc 0,0 [ 308.988184] Call Trace: [ 308.988188] [<00000349976f9e32>] __del_page_from_free_list+0x1c2/0x1e0 [ 308.988195] ([<00000349976f9e2e>] __del_page_from_free_list+0x1be/0x1e0) [ 308.988202] [<00000349976ff946>] rmqueue_bulk+0x706/0x940 [ 308.988208] [<00000349976ffd7e>] __rmqueue_pcplist+0x1fe/0x2a0 [ 308.988214] [<0000034997700966>] rmqueue.isra.0+0xb46/0xf40 [ 308.988221] [<0000034997703ec8>] get_page_from_freelist+0x198/0x8d0 [ 308.988227] [<0000034997706fa8>] __alloc_frozen_pages_noprof+0x198/0x400 [ 308.988233] [<00000349977536f8>] alloc_pages_mpol+0xb8/0x220 [ 308.988240] [<0000034997753bf6>] folio_alloc_mpol_noprof+0x26/0xc0 [ 308.988247] [<0000034997753e4c>] vma_alloc_folio_noprof+0x6c/0xa0 [ 308.988253] [<0000034997775b22>] vma_alloc_anon_folio_pmd+0x42/0x240 [ 308.988260] [<000003499777bfea>] __do_huge_pmd_anonymous_page+0x3a/0x210 [ 308.988267] [<00000349976cb08e>] __handle_mm_fault+0x4de/0x500 [ 308.988273] [<00000349976cb14c>] handle_mm_fault+0x9c/0x3a0 [ 308.988279] [<000003499734d70e>] do_exception+0x1de/0x540 [ 308.988286] [<0000034998387390>] __do_pgm_check+0x130/0x220 [ 308.988293] [<000003499839a934>] pgm_check_handler+0x114/0x160 [ 308.988300] 3 locks held by mempig_verify/5224: [ 308.988305] #0: 0000023ea44c1e08 (vm_lock){++++}-{0:0}, at: lock_vma_under_rcu+0xb2/0x2a0 [ 308.988322] #1: 0000023ee4d41b18 (&pcp->lock){+.+.}-{2:2}, at: rmqueue.isra.0+0xad6/0xf40 [ 308.988334] #2: 0000023efe6c8998 (&zone->lock){..-.}-{2:2}, at: rmqueue_bulk+0x5a/0x940 [ 308.988346] Last Breaking-Event-Address: [ 308.988350] [<0000034997379096>] __warn_printk+0x136/0x140 [ 308.988356] irq event stamp: 52330356 [ 308.988360] hardirqs last enabled at (52330355): [<000003499838742e>] __do_pgm_check+0x1ce/0x220 [ 308.988366] hardirqs last disabled at (52330356): [<000003499839932e>] _raw_spin_lock_irqsave+0x9e/0xe0 [ 308.988373] softirqs last enabled at (52329882): [<0000034997383786>] handle_softirqs+0x2c6/0x530 [ 308.988380] softirqs last disabled at (52329859): [<0000034997382f86>] __irq_exit_rcu+0x126/0x140 [ 308.988388] ---[ end trace 0000000000000000 ]--- Link: https://lkml.kernel.org/r/20251215081002.3353900A9c-agordeev@linux.ibm.com Link: https://lkml.kernel.org/r/20251212151457.3898073Add-agordeev@linux.ibm.com Fixes: e6cf9e1 ("mm: page_alloc: fix up block types when merging compatible blocks") Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com> Closes: https://lore.kernel.org/linux-mm/87wmalyktd.fsf@linux.ibm.com/ Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Cc: Marc Hartmayer <mhartmay@linux.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
alchark
pushed a commit
that referenced
this pull request
Jan 21, 2026
When running the Rust maple tree kunit tests with lockdep, you may trigger a warning that looks like this: lib/maple_tree.c:780 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 no locks held by kunit_try_catch/344. stack backtrace: CPU: 3 UID: 0 PID: 344 Comm: kunit_try_catch Tainted: G N 6.19.0-rc1+ #2 NONE Tainted: [N]=TEST Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x71/0x90 lockdep_rcu_suspicious+0x150/0x190 mas_start+0x104/0x150 mas_find+0x179/0x240 _RINvNtCs5QSdWC790r4_4core3ptr13drop_in_placeINtNtCs1cdwasc6FUb_6kernel10maple_tree9MapleTreeINtNtNtBL_5alloc4kbox3BoxlNtNtB1x_9allocator7KmallocEEECsgxAQYCfdR72_25doctests_kernel_generated+0xaf/0x130 rust_doctest_kernel_maple_tree_rs_0+0x600/0x6b0 ? lock_release+0xeb/0x2a0 ? kunit_try_catch_run+0x210/0x210 kunit_try_run_case+0x74/0x160 ? kunit_try_catch_run+0x210/0x210 kunit_generic_run_threadfn_adapter+0x12/0x30 kthread+0x21c/0x230 ? __do_trace_sched_kthread_stop_ret+0x40/0x40 ret_from_fork+0x16c/0x270 ? __do_trace_sched_kthread_stop_ret+0x40/0x40 ret_from_fork_asm+0x11/0x20 </TASK> This is because the destructor of maple tree calls mas_find() without taking rcu_read_lock() or the spinlock. Doing that is actually ok in this case since the destructor has exclusive access to the entire maple tree, but it triggers a lockdep warning. To fix that, take the rcu read lock. In the future, it's possible that memory reclaim could gain a feature where it reallocates entries in maple trees even if no user-code is touching it. If that feature is added, then this use of rcu read lock would become load-bearing, so I did not make it conditional on lockdep. We have to repeatedly take and release rcu because the destructor of T might perform operations that sleep. Link: https://lkml.kernel.org/r/20251217-maple-drop-rcu-v1-1-702af063573f@google.com Fixes: da939ef ("rust: maple_tree: add MapleTree") Signed-off-by: Alice Ryhl <aliceryhl@google.com> Reported-by: Andreas Hindborg <a.hindborg@kernel.org> Closes: https://rust-for-linux.zulipchat.com/#narrow/channel/x/topic/x/near/564215108 Reviewed-by: Gary Guo <gary@garyguo.net> Reviewed-by: Daniel Almeida <daniel.almeida@collabora.com> Cc: Andrew Ballance <andrewjballance@gmail.com> Cc: Björn Roy Baron <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Trevor Gross <tmgross@umich.edu> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
alchark
pushed a commit
that referenced
this pull request
Jan 21, 2026
… to macb_open() In the non-RT kernel, local_bh_disable() merely disables preemption, whereas it maps to an actual spin lock in the RT kernel. Consequently, when attempting to refill RX buffers via netdev_alloc_skb() in macb_mac_link_up(), a deadlock scenario arises as follows: WARNING: possible circular locking dependency detected 6.18.0-08691-g2061f18ad76e torvalds#39 Not tainted ------------------------------------------------------ kworker/0:0/8 is trying to acquire lock: ffff00080369bbe0 (&bp->lock){+.+.}-{3:3}, at: macb_start_xmit+0x808/0xb7c but task is already holding lock: ffff000803698e58 (&queue->tx_ptr_lock){+...}-{3:3}, at: macb_start_xmit +0x148/0xb7c which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (&queue->tx_ptr_lock){+...}-{3:3}: rt_spin_lock+0x50/0x1f0 macb_start_xmit+0x148/0xb7c dev_hard_start_xmit+0x94/0x284 sch_direct_xmit+0x8c/0x37c __dev_queue_xmit+0x708/0x1120 neigh_resolve_output+0x148/0x28c ip6_finish_output2+0x2c0/0xb2c __ip6_finish_output+0x114/0x308 ip6_output+0xc4/0x4a4 mld_sendpack+0x220/0x68c mld_ifc_work+0x2a8/0x4f4 process_one_work+0x20c/0x5f8 worker_thread+0x1b0/0x35c kthread+0x144/0x200 ret_from_fork+0x10/0x20 -> #2 (_xmit_ETHER#2){+...}-{3:3}: rt_spin_lock+0x50/0x1f0 sch_direct_xmit+0x11c/0x37c __dev_queue_xmit+0x708/0x1120 neigh_resolve_output+0x148/0x28c ip6_finish_output2+0x2c0/0xb2c __ip6_finish_output+0x114/0x308 ip6_output+0xc4/0x4a4 mld_sendpack+0x220/0x68c mld_ifc_work+0x2a8/0x4f4 process_one_work+0x20c/0x5f8 worker_thread+0x1b0/0x35c kthread+0x144/0x200 ret_from_fork+0x10/0x20 -> #1 ((softirq_ctrl.lock)){+.+.}-{3:3}: lock_release+0x250/0x348 __local_bh_enable_ip+0x7c/0x240 __netdev_alloc_skb+0x1b4/0x1d8 gem_rx_refill+0xdc/0x240 gem_init_rings+0xb4/0x108 macb_mac_link_up+0x9c/0x2b4 phylink_resolve+0x170/0x614 process_one_work+0x20c/0x5f8 worker_thread+0x1b0/0x35c kthread+0x144/0x200 ret_from_fork+0x10/0x20 -> #0 (&bp->lock){+.+.}-{3:3}: __lock_acquire+0x15a8/0x2084 lock_acquire+0x1cc/0x350 rt_spin_lock+0x50/0x1f0 macb_start_xmit+0x808/0xb7c dev_hard_start_xmit+0x94/0x284 sch_direct_xmit+0x8c/0x37c __dev_queue_xmit+0x708/0x1120 neigh_resolve_output+0x148/0x28c ip6_finish_output2+0x2c0/0xb2c __ip6_finish_output+0x114/0x308 ip6_output+0xc4/0x4a4 mld_sendpack+0x220/0x68c mld_ifc_work+0x2a8/0x4f4 process_one_work+0x20c/0x5f8 worker_thread+0x1b0/0x35c kthread+0x144/0x200 ret_from_fork+0x10/0x20 other info that might help us debug this: Chain exists of: &bp->lock --> _xmit_ETHER#2 --> &queue->tx_ptr_lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&queue->tx_ptr_lock); lock(_xmit_ETHER#2); lock(&queue->tx_ptr_lock); lock(&bp->lock); *** DEADLOCK *** Call trace: show_stack+0x18/0x24 (C) dump_stack_lvl+0xa0/0xf0 dump_stack+0x18/0x24 print_circular_bug+0x28c/0x370 check_noncircular+0x198/0x1ac __lock_acquire+0x15a8/0x2084 lock_acquire+0x1cc/0x350 rt_spin_lock+0x50/0x1f0 macb_start_xmit+0x808/0xb7c dev_hard_start_xmit+0x94/0x284 sch_direct_xmit+0x8c/0x37c __dev_queue_xmit+0x708/0x1120 neigh_resolve_output+0x148/0x28c ip6_finish_output2+0x2c0/0xb2c __ip6_finish_output+0x114/0x308 ip6_output+0xc4/0x4a4 mld_sendpack+0x220/0x68c mld_ifc_work+0x2a8/0x4f4 process_one_work+0x20c/0x5f8 worker_thread+0x1b0/0x35c kthread+0x144/0x200 ret_from_fork+0x10/0x20 Notably, invoking the mog_init_rings() callback upon link establishment is unnecessary. Instead, we can exclusively call mog_init_rings() within the ndo_open() callback. This adjustment resolves the deadlock issue. Furthermore, since MACB_CAPS_MACB_IS_EMAC cases do not use mog_init_rings() when opening the network interface via at91ether_open(), moving mog_init_rings() to macb_open() also eliminates the MACB_CAPS_MACB_IS_EMAC check. Fixes: 633e98a ("net: macb: use resolved link config in mac_link_up()") Cc: stable@vger.kernel.org Suggested-by: Kevin Hao <kexin.hao@windriver.com> Signed-off-by: Xiaolei Wang <xiaolei.wang@windriver.com> Link: https://patch.msgid.link/20251222015624.1994551-1-xiaolei.wang@windriver.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
alchark
pushed a commit
that referenced
this pull request
Jan 21, 2026
ctx->tcxt_list holds the tasks using this ring, and it's currently
protected by the normal ctx->uring_lock. However, this can cause a
circular locking issue, as reported by syzbot, where cancelations off
exec end up needing to remove an entry from this list:
======================================================
WARNING: possible circular locking dependency detected
syzkaller #0 Tainted: G L
------------------------------------------------------
syz.0.9999/12287 is trying to acquire lock:
ffff88805851c0a8 (&ctx->uring_lock){+.+.}-{4:4}, at: io_uring_del_tctx_node+0xf0/0x2c0 io_uring/tctx.c:179
but task is already holding lock:
ffff88802db5a2e0 (&sig->cred_guard_mutex){+.+.}-{4:4}, at: prepare_bprm_creds fs/exec.c:1360 [inline]
ffff88802db5a2e0 (&sig->cred_guard_mutex){+.+.}-{4:4}, at: bprm_execve+0xb9/0x1400 fs/exec.c:1733
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (&sig->cred_guard_mutex){+.+.}-{4:4}:
__mutex_lock_common kernel/locking/mutex.c:614 [inline]
__mutex_lock+0x187/0x1350 kernel/locking/mutex.c:776
proc_pid_attr_write+0x547/0x630 fs/proc/base.c:2837
vfs_write+0x27e/0xb30 fs/read_write.c:684
ksys_write+0x145/0x250 fs/read_write.c:738
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xec/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
-> #1 (sb_writers#3){.+.+}-{0:0}:
percpu_down_read_internal include/linux/percpu-rwsem.h:53 [inline]
percpu_down_read_freezable include/linux/percpu-rwsem.h:83 [inline]
__sb_start_write include/linux/fs/super.h:19 [inline]
sb_start_write+0x4d/0x1c0 include/linux/fs/super.h:125
mnt_want_write+0x41/0x90 fs/namespace.c:499
open_last_lookups fs/namei.c:4529 [inline]
path_openat+0xadd/0x3dd0 fs/namei.c:4784
do_filp_open+0x1fa/0x410 fs/namei.c:4814
io_openat2+0x3e0/0x5c0 io_uring/openclose.c:143
__io_issue_sqe+0x181/0x4b0 io_uring/io_uring.c:1792
io_issue_sqe+0x165/0x1060 io_uring/io_uring.c:1815
io_queue_sqe io_uring/io_uring.c:2042 [inline]
io_submit_sqe io_uring/io_uring.c:2320 [inline]
io_submit_sqes+0xbf4/0x2140 io_uring/io_uring.c:2434
__do_sys_io_uring_enter io_uring/io_uring.c:3280 [inline]
__se_sys_io_uring_enter+0x2e0/0x2b60 io_uring/io_uring.c:3219
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xec/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
-> #0 (&ctx->uring_lock){+.+.}-{4:4}:
check_prev_add kernel/locking/lockdep.c:3165 [inline]
check_prevs_add kernel/locking/lockdep.c:3284 [inline]
validate_chain kernel/locking/lockdep.c:3908 [inline]
__lock_acquire+0x15a6/0x2cf0 kernel/locking/lockdep.c:5237
lock_acquire+0x107/0x340 kernel/locking/lockdep.c:5868
__mutex_lock_common kernel/locking/mutex.c:614 [inline]
__mutex_lock+0x187/0x1350 kernel/locking/mutex.c:776
io_uring_del_tctx_node+0xf0/0x2c0 io_uring/tctx.c:179
io_uring_clean_tctx+0xd4/0x1a0 io_uring/tctx.c:195
io_uring_cancel_generic+0x6ca/0x7d0 io_uring/cancel.c:646
io_uring_task_cancel include/linux/io_uring.h:24 [inline]
begin_new_exec+0x10ed/0x2440 fs/exec.c:1131
load_elf_binary+0x9f8/0x2d70 fs/binfmt_elf.c:1010
search_binary_handler fs/exec.c:1669 [inline]
exec_binprm fs/exec.c:1701 [inline]
bprm_execve+0x92e/0x1400 fs/exec.c:1753
do_execveat_common+0x510/0x6a0 fs/exec.c:1859
do_execve fs/exec.c:1933 [inline]
__do_sys_execve fs/exec.c:2009 [inline]
__se_sys_execve fs/exec.c:2004 [inline]
__x64_sys_execve+0x94/0xb0 fs/exec.c:2004
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xec/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
other info that might help us debug this:
Chain exists of:
&ctx->uring_lock --> sb_writers#3 --> &sig->cred_guard_mutex
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&sig->cred_guard_mutex);
lock(sb_writers#3);
lock(&sig->cred_guard_mutex);
lock(&ctx->uring_lock);
*** DEADLOCK ***
1 lock held by syz.0.9999/12287:
#0: ffff88802db5a2e0 (&sig->cred_guard_mutex){+.+.}-{4:4}, at: prepare_bprm_creds fs/exec.c:1360 [inline]
#0: ffff88802db5a2e0 (&sig->cred_guard_mutex){+.+.}-{4:4}, at: bprm_execve+0xb9/0x1400 fs/exec.c:1733
stack backtrace:
CPU: 0 UID: 0 PID: 12287 Comm: syz.0.9999 Tainted: G L syzkaller #0 PREEMPT(full)
Tainted: [L]=SOFTLOCKUP
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
Call Trace:
<TASK>
dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
print_circular_bug+0x2e2/0x300 kernel/locking/lockdep.c:2043
check_noncircular+0x12e/0x150 kernel/locking/lockdep.c:2175
check_prev_add kernel/locking/lockdep.c:3165 [inline]
check_prevs_add kernel/locking/lockdep.c:3284 [inline]
validate_chain kernel/locking/lockdep.c:3908 [inline]
__lock_acquire+0x15a6/0x2cf0 kernel/locking/lockdep.c:5237
lock_acquire+0x107/0x340 kernel/locking/lockdep.c:5868
__mutex_lock_common kernel/locking/mutex.c:614 [inline]
__mutex_lock+0x187/0x1350 kernel/locking/mutex.c:776
io_uring_del_tctx_node+0xf0/0x2c0 io_uring/tctx.c:179
io_uring_clean_tctx+0xd4/0x1a0 io_uring/tctx.c:195
io_uring_cancel_generic+0x6ca/0x7d0 io_uring/cancel.c:646
io_uring_task_cancel include/linux/io_uring.h:24 [inline]
begin_new_exec+0x10ed/0x2440 fs/exec.c:1131
load_elf_binary+0x9f8/0x2d70 fs/binfmt_elf.c:1010
search_binary_handler fs/exec.c:1669 [inline]
exec_binprm fs/exec.c:1701 [inline]
bprm_execve+0x92e/0x1400 fs/exec.c:1753
do_execveat_common+0x510/0x6a0 fs/exec.c:1859
do_execve fs/exec.c:1933 [inline]
__do_sys_execve fs/exec.c:2009 [inline]
__se_sys_execve fs/exec.c:2004 [inline]
__x64_sys_execve+0x94/0xb0 fs/exec.c:2004
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xec/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7ff3a8b8f749
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ff3a9a97038 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
RAX: ffffffffffffffda RBX: 00007ff3a8de5fa0 RCX: 00007ff3a8b8f749
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000200000000400
RBP: 00007ff3a8c13f91 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ff3a8de6038 R14: 00007ff3a8de5fa0 R15: 00007ff3a8f0fa28
</TASK>
Add a separate lock just for the tctx_list, tctx_lock. This can nest
under ->uring_lock, where necessary, and be used separately for list
manipulation. For the cancelation off exec side, this removes the
need to grab ->uring_lock, hence fixing the circular locking
dependency.
Reported-by: syzbot+b0e3b77ffaa8a4067ce5@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
alchark
pushed a commit
that referenced
this pull request
Jan 21, 2026
…te in qfq_reset `qfq_class->leaf_qdisc->q.qlen > 0` does not imply that the class itself is active. Two qfq_class objects may point to the same leaf_qdisc. This happens when: 1. one QFQ qdisc is attached to the dev as the root qdisc, and 2. another QFQ qdisc is temporarily referenced (e.g., via qdisc_get() / qdisc_put()) and is pending to be destroyed, as in function tc_new_tfilter. When packets are enqueued through the root QFQ qdisc, the shared leaf_qdisc->q.qlen increases. At the same time, the second QFQ qdisc triggers qdisc_put and qdisc_destroy: the qdisc enters qfq_reset() with its own q->q.qlen == 0, but its class's leaf qdisc->q.qlen > 0. Therefore, the qfq_reset would wrongly deactivate an inactive aggregate and trigger a null-deref in qfq_deactivate_agg: [ 0.903172] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 0.903571] #PF: supervisor write access in kernel mode [ 0.903860] #PF: error_code(0x0002) - not-present page [ 0.904177] PGD 10299b067 P4D 10299b067 PUD 10299c067 PMD 0 [ 0.904502] Oops: Oops: 0002 [#1] SMP NOPTI [ 0.904737] CPU: 0 UID: 0 PID: 135 Comm: exploit Not tainted 6.19.0-rc3+ #2 NONE [ 0.905157] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014 [ 0.905754] RIP: 0010:qfq_deactivate_agg (include/linux/list.h:992 (discriminator 2) include/linux/list.h:1006 (discriminator 2) net/sched/sch_qfq.c:1367 (discriminator 2) net/sched/sch_qfq.c:1393 (discriminator 2)) [ 0.906046] Code: 0f 84 4d 01 00 00 48 89 70 18 8b 4b 10 48 c7 c2 ff ff ff ff 48 8b 78 08 48 d3 e2 48 21 f2 48 2b 13 48 8b 30 48 d3 ea 8b 4b 18 0 Code starting with the faulting instruction =========================================== 0: 0f 84 4d 01 00 00 je 0x153 6: 48 89 70 18 mov %rsi,0x18(%rax) a: 8b 4b 10 mov 0x10(%rbx),%ecx d: 48 c7 c2 ff ff ff ff mov $0xffffffffffffffff,%rdx 14: 48 8b 78 08 mov 0x8(%rax),%rdi 18: 48 d3 e2 shl %cl,%rdx 1b: 48 21 f2 and %rsi,%rdx 1e: 48 2b 13 sub (%rbx),%rdx 21: 48 8b 30 mov (%rax),%rsi 24: 48 d3 ea shr %cl,%rdx 27: 8b 4b 18 mov 0x18(%rbx),%ecx ... [ 0.907095] RSP: 0018:ffffc900004a39a0 EFLAGS: 00010246 [ 0.907368] RAX: ffff8881043a0880 RBX: ffff888102953340 RCX: 0000000000000000 [ 0.907723] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 0.908100] RBP: ffff888102952180 R08: 0000000000000000 R09: 0000000000000000 [ 0.908451] R10: ffff8881043a0000 R11: 0000000000000000 R12: ffff888102952000 [ 0.908804] R13: ffff888102952180 R14: ffff8881043a0ad8 R15: ffff8881043a0880 [ 0.909179] FS: 000000002a1a0380(0000) GS:ffff888196d8d000(0000) knlGS:0000000000000000 [ 0.909572] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 0.909857] CR2: 0000000000000000 CR3: 0000000102993002 CR4: 0000000000772ef0 [ 0.910247] PKRU: 55555554 [ 0.910391] Call Trace: [ 0.910527] <TASK> [ 0.910638] qfq_reset_qdisc (net/sched/sch_qfq.c:357 net/sched/sch_qfq.c:1485) [ 0.910826] qdisc_reset (include/linux/skbuff.h:2195 include/linux/skbuff.h:2501 include/linux/skbuff.h:3424 include/linux/skbuff.h:3430 net/sched/sch_generic.c:1036) [ 0.911040] __qdisc_destroy (net/sched/sch_generic.c:1076) [ 0.911236] tc_new_tfilter (net/sched/cls_api.c:2447) [ 0.911447] rtnetlink_rcv_msg (net/core/rtnetlink.c:6958) [ 0.911663] ? __pfx_rtnetlink_rcv_msg (net/core/rtnetlink.c:6861) [ 0.911894] netlink_rcv_skb (net/netlink/af_netlink.c:2550) [ 0.912100] netlink_unicast (net/netlink/af_netlink.c:1319 net/netlink/af_netlink.c:1344) [ 0.912296] ? __alloc_skb (net/core/skbuff.c:706) [ 0.912484] netlink_sendmsg (net/netlink/af_netlink.c:1894) [ 0.912682] sock_write_iter (net/socket.c:727 (discriminator 1) net/socket.c:742 (discriminator 1) net/socket.c:1195 (discriminator 1)) [ 0.912880] vfs_write (fs/read_write.c:593 fs/read_write.c:686) [ 0.913077] ksys_write (fs/read_write.c:738) [ 0.913252] do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1)) [ 0.913438] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:131) [ 0.913687] RIP: 0033:0x424c34 [ 0.913844] Code: 89 02 48 c7 c0 ff ff ff ff eb bd 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 80 3d 2d 44 09 00 00 74 13 b8 01 00 00 00 0f 05 9 Code starting with the faulting instruction =========================================== 0: 89 02 mov %eax,(%rdx) 2: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax 9: eb bd jmp 0xffffffffffffffc8 b: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1) 12: 00 00 00 15: 90 nop 16: f3 0f 1e fa endbr64 1a: 80 3d 2d 44 09 00 00 cmpb $0x0,0x9442d(%rip) # 0x9444e 21: 74 13 je 0x36 23: b8 01 00 00 00 mov $0x1,%eax 28: 0f 05 syscall 2a: 09 .byte 0x9 [ 0.914807] RSP: 002b:00007ffea1938b78 EFLAGS: 00000202 ORIG_RAX: 0000000000000001 [ 0.915197] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 0000000000424c34 [ 0.915556] RDX: 000000000000003c RSI: 000000002af378c0 RDI: 0000000000000003 [ 0.915912] RBP: 00007ffea1938bc0 R08: 00000000004b8820 R09: 0000000000000000 [ 0.916297] R10: 0000000000000001 R11: 0000000000000202 R12: 00007ffea1938d28 [ 0.916652] R13: 00007ffea1938d38 R14: 00000000004b3828 R15: 0000000000000001 [ 0.917039] </TASK> [ 0.917158] Modules linked in: [ 0.917316] CR2: 0000000000000000 [ 0.917484] ---[ end trace 0000000000000000 ]--- [ 0.917717] RIP: 0010:qfq_deactivate_agg (include/linux/list.h:992 (discriminator 2) include/linux/list.h:1006 (discriminator 2) net/sched/sch_qfq.c:1367 (discriminator 2) net/sched/sch_qfq.c:1393 (discriminator 2)) [ 0.917978] Code: 0f 84 4d 01 00 00 48 89 70 18 8b 4b 10 48 c7 c2 ff ff ff ff 48 8b 78 08 48 d3 e2 48 21 f2 48 2b 13 48 8b 30 48 d3 ea 8b 4b 18 0 Code starting with the faulting instruction =========================================== 0: 0f 84 4d 01 00 00 je 0x153 6: 48 89 70 18 mov %rsi,0x18(%rax) a: 8b 4b 10 mov 0x10(%rbx),%ecx d: 48 c7 c2 ff ff ff ff mov $0xffffffffffffffff,%rdx 14: 48 8b 78 08 mov 0x8(%rax),%rdi 18: 48 d3 e2 shl %cl,%rdx 1b: 48 21 f2 and %rsi,%rdx 1e: 48 2b 13 sub (%rbx),%rdx 21: 48 8b 30 mov (%rax),%rsi 24: 48 d3 ea shr %cl,%rdx 27: 8b 4b 18 mov 0x18(%rbx),%ecx ... [ 0.918902] RSP: 0018:ffffc900004a39a0 EFLAGS: 00010246 [ 0.919198] RAX: ffff8881043a0880 RBX: ffff888102953340 RCX: 0000000000000000 [ 0.919559] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 0.919908] RBP: ffff888102952180 R08: 0000000000000000 R09: 0000000000000000 [ 0.920289] R10: ffff8881043a0000 R11: 0000000000000000 R12: ffff888102952000 [ 0.920648] R13: ffff888102952180 R14: ffff8881043a0ad8 R15: ffff8881043a0880 [ 0.921014] FS: 000000002a1a0380(0000) GS:ffff888196d8d000(0000) knlGS:0000000000000000 [ 0.921424] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 0.921710] CR2: 0000000000000000 CR3: 0000000102993002 CR4: 0000000000772ef0 [ 0.922097] PKRU: 55555554 [ 0.922240] Kernel panic - not syncing: Fatal exception [ 0.922590] Kernel Offset: disabled Fixes: 0545a30 ("pkt_sched: QFQ - quick fair queue scheduler") Signed-off-by: Xiang Mei <xmei5@asu.edu> Link: https://patch.msgid.link/20260106034100.1780779-1-xmei5@asu.edu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
alchark
pushed a commit
that referenced
this pull request
Jan 21, 2026
During device unmapping (triggered by module unload or explicit unmap), a refcount underflow occurs causing a use-after-free warning: [14747.574913] ------------[ cut here ]------------ [14747.574916] refcount_t: underflow; use-after-free. [14747.574917] WARNING: lib/refcount.c:28 at refcount_warn_saturate+0x55/0x90, CPU#9: kworker/9:1/378 [14747.574924] Modules linked in: rnbd_client(-) rtrs_client rnbd_server rtrs_server rtrs_core ... [14747.574998] CPU: 9 UID: 0 PID: 378 Comm: kworker/9:1 Tainted: G O N 6.19.0-rc3lblk-fnext+ torvalds#42 PREEMPT(voluntary) [14747.575005] Workqueue: rnbd_clt_wq unmap_device_work [rnbd_client] [14747.575010] RIP: 0010:refcount_warn_saturate+0x55/0x90 [14747.575037] Call Trace: [14747.575038] <TASK> [14747.575038] rnbd_clt_unmap_device+0x170/0x1d0 [rnbd_client] [14747.575044] process_one_work+0x211/0x600 [14747.575052] worker_thread+0x184/0x330 [14747.575055] ? __pfx_worker_thread+0x10/0x10 [14747.575058] kthread+0x10d/0x250 [14747.575062] ? __pfx_kthread+0x10/0x10 [14747.575066] ret_from_fork+0x319/0x390 [14747.575069] ? __pfx_kthread+0x10/0x10 [14747.575072] ret_from_fork_asm+0x1a/0x30 [14747.575083] </TASK> [14747.575096] ---[ end trace 0000000000000000 ]--- Befor this patch :- The bug is a double kobject_put() on dev->kobj during device cleanup. Kobject Lifecycle: kobject_init_and_add() sets kobj.kref = 1 (initialization) kobject_put() sets kobj.kref = 0 (should be called once) * Before this patch: rnbd_clt_unmap_device() rnbd_destroy_sysfs() kobject_del(&dev->kobj) [remove from sysfs] kobject_put(&dev->kobj) PUT #1 (WRONG!) kref: 1 to 0 rnbd_dev_release() kfree(dev) [DEVICE FREED!] rnbd_destroy_gen_disk() [use-after-free!] rnbd_clt_put_dev() refcount_dec_and_test(&dev->refcount) kobject_put(&dev->kobj) PUT #2 (UNDERFLOW!) kref: 0 to -1 [WARNING!] The first kobject_put() in rnbd_destroy_sysfs() prematurely frees the device via rnbd_dev_release(), then the second kobject_put() in rnbd_clt_put_dev() causes refcount underflow. * After this patch :- Remove kobject_put() from rnbd_destroy_sysfs(). This function should only remove sysfs visibility (kobject_del), not manage object lifetime. Call Graph (FIXED): rnbd_clt_unmap_device() rnbd_destroy_sysfs() kobject_del(&dev->kobj) [remove from sysfs only] [kref unchanged: 1] rnbd_destroy_gen_disk() [device still valid] rnbd_clt_put_dev() refcount_dec_and_test(&dev->refcount) kobject_put(&dev->kobj) ONLY PUT (CORRECT!) kref: 1 to 0 [BALANCED] rnbd_dev_release() kfree(dev) [CLEAN DESTRUCTION] This follows the kernel pattern where sysfs removal (kobject_del) is separate from object destruction (kobject_put). Fixes: 581cf83 ("block: rnbd: add .release to rnbd_dev_ktype") Signed-off-by: Chaitanya Kulkarni <ckulkarnilinux@gmail.com> Acked-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
alchark
pushed a commit
that referenced
this pull request
Jan 21, 2026
Patch series "mm/hugetlb: fixes for PMD table sharing (incl. using mmu_gather)", v3. One functional fix, one performance regression fix, and two related comment fixes. I cleaned up my prototype I recently shared [1] for the performance fix, deferring most of the cleanups I had in the prototype to a later point. While doing that I identified the other things. The goal of this patch set is to be backported to stable trees "fairly" easily. At least patch #1 and #4. Patch #1 fixes hugetlb_pmd_shared() not detecting any sharing Patch #2 + #3 are simple comment fixes that patch #4 interacts with. Patch #4 is a fix for the reported performance regression due to excessive IPI broadcasts during fork()+exit(). The last patch is all about TLB flushes, IPIs and mmu_gather. Read: complicated There are plenty of cleanups in the future to be had + one reasonable optimization on x86. But that's all out of scope for this series. Runtime tested, with a focus on fixing the performance regression using the original reproducer [2] on x86. This patch (of 4): We switched from (wrongly) using the page count to an independent shared count. Now, shared page tables have a refcount of 1 (excluding speculative references) and instead use ptdesc->pt_share_count to identify sharing. We didn't convert hugetlb_pmd_shared(), so right now, we would never detect a shared PMD table as such, because sharing/unsharing no longer touches the refcount of a PMD table. Page migration, like mbind() or migrate_pages() would allow for migrating folios mapped into such shared PMD tables, even though the folios are not exclusive. In smaps we would account them as "private" although they are "shared", and we would be wrongly setting the PM_MMAP_EXCLUSIVE in the pagemap interface. Fix it by properly using ptdesc_pmd_is_shared() in hugetlb_pmd_shared(). Link: https://lkml.kernel.org/r/20251223214037.580860-1-david@kernel.org Link: https://lkml.kernel.org/r/20251223214037.580860-2-david@kernel.org Link: https://lore.kernel.org/all/8cab934d-4a56-44aa-b641-bfd7e23bd673@kernel.org/ [1] Link: https://lore.kernel.org/all/8cab934d-4a56-44aa-b641-bfd7e23bd673@kernel.org/ [2] Fixes: 59d9094 ("mm: hugetlb: independent PMD page table shared count") Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Rik van Riel <riel@surriel.com> Reviewed-by: Lance Yang <lance.yang@linux.dev> Tested-by: Lance Yang <lance.yang@linux.dev> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Tested-by: Laurence Oberman <loberman@redhat.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Oscar Salvador <osalvador@suse.de> Cc: Liu Shixin <liushixin2@huawei.com> Cc: Uschakow, Stanislav" <suschako@amazon.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
08fe4b4 to
17fff13
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
A super smart change to test PRs notifications