Author: Nathan Chancellor <nathan@kernel.org>
Date: Wed Jan 14 16:27:11 2026 -0700
ACPI: APEI: GHES: Disable KASAN instrumentation when compile testing with clang < 18
[ Upstream commit b584bfbd7ec417f257f651cc00a90c66e31dfbf1 ]
After a recent innocuous change to drivers/acpi/apei/ghes.c, building
ARCH=arm64 allmodconfig with clang-17 or older (which has both
CONFIG_KASAN=y and CONFIG_WERROR=y) fails with:
drivers/acpi/apei/ghes.c:902:13: error: stack frame size (2768) exceeds limit (2048) in 'ghes_do_proc' [-Werror,-Wframe-larger-than]
902 | static void ghes_do_proc(struct ghes *ghes,
| ^
A KASAN pass that removes unneeded stack instrumentation, enabled by
default in clang-18 [1], drastically improves stack usage in this case.
To avoid the warning in the common allmodconfig case when it can break
the build, disable KASAN for ghes.o when compile testing with clang-17
and older. Disabling KASAN outright may hide legitimate runtime issues,
so live with the warning in that case; the user can either increase the
frame warning limit or disable -Werror, which they should probably do
when debugging with KASAN anyways.
Closes: https://github.com/ClangBuiltLinux/linux/issues/2148
Link: https://github.com/llvm/llvm-project/commit/51fbab134560ece663517bf1e8c2a30300d08f1a [1]
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Cc: All applicable <stable@vger.kernel.org>
Link: https://patch.msgid.link/20260114-ghes-avoid-wflt-clang-older-than-18-v1-1-9c8248bfe4f4@kernel.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Takashi Iwai <tiwai@suse.de>
Date: Sat Feb 7 14:13:17 2026 +0100
ALSA: hda/conexant: Add quirk for HP ZBook Studio G4
[ Upstream commit 1585cf83e98db32463e5d54161b06a5f01fe9976 ]
It was reported that we need the same quirk for HP ZBook Studio G4
(SSID 103c:826b) as other HP models to make the mute-LED working.
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/64d78753-b9ff-4c64-8920-64d8d31cd20c@gmail.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=221002
Link: https://patch.msgid.link/20260207131324.2428030-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Takashi Iwai <tiwai@suse.de>
Date: Tue Feb 17 11:44:11 2026 +0100
ALSA: hda/conexant: Fix headphone jack handling on Acer Swift SF314
[ Upstream commit 7bc0df86c2384bc1e2012a2c946f82305054da64 ]
Acer Swift SF314 (SSID 1025:136d) needs a bit of tweaks of the pin
configurations for NID 0x16 and 0x19 to make the headphone / headset
jack working. NID 0x17 can remain as is for the working speaker, and
the built-in mic is supported via SOF.
Cc: <stable@vger.kernel.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=221086
Link: https://patch.msgid.link/20260217104414.62911-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Eric Naim <dnaim@cachyos.org>
Date: Tue Feb 10 17:34:02 2026 +0800
ALSA: hda/realtek: Add quirk for Gigabyte G5 KF5 (2023)
[ Upstream commit 405d59fdd2038a65790eaad8c1013d37a2af6561 ]
Fixes microphone detection when a headset is connected to the audio jack
using the ALC256.
Cc: stable@vger.kernel.org
Signed-off-by: Eric Naim <dnaim@cachyos.org>
Link: https://patch.msgid.link/20260210093403.21514-1-dnaim@cachyos.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Lewis Mason <mason8110@gmail.com>
Date: Tue Feb 10 23:13:37 2026 +0000
ALSA: hda/realtek: Add quirk for Samsung Galaxy Book3 Pro 360 (NP965QFG)
[ Upstream commit 3a6b7dc431aab90744e973254604855e654294ae ]
The Samsung Galaxy Book3 Pro 360 NP965QFG (subsystem ID 0x144d:0xc1cb)
uses the same Realtek ALC298 codec and amplifier configuration as the
NP960QFG (0x144d:0xc1ca). Apply the same ALC298_FIXUP_SAMSUNG_AMP_V2_4_AMPS
fixup to enable the internal speakers.
Cc: stable@vger.kernel.org
Signed-off-by: Lewis Mason <lewis@ocuru.co.uk>
Link: https://patch.msgid.link/20260210231337.7265-1-lewis@ocuru.co.uk
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Richard Fitzgerald <rf@opensource.cirrus.com>
Date: Thu Feb 26 11:17:28 2026 +0000
ALSA: hda: cs35l56: Fix signedness error in cs35l56_hda_posture_put()
[ Upstream commit 003ce8c9b2ca28fbb4860651e76fb1c9a91f2ea1 ]
In cs35l56_hda_posture_put() assign ucontrol->value.integer.value[0] to
a long instead of an unsigned long. ucontrol->value.integer.value[0] is
a long.
This fixes the sparse warning:
sound/hda/codecs/side-codecs/cs35l56_hda.c:256:20: warning: unsigned value
that used to be signed checked against zero?
sound/hda/codecs/side-codecs/cs35l56_hda.c:252:29: signed value source
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: 73cfbfa9caea8 ("ALSA: hda/cs35l56: Add driver for Cirrus Logic CS35L56 amplifier")
Link: https://patch.msgid.link/20260226111728.1700431-1-rf@opensource.cirrus.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Date: Wed May 7 04:59:45 2025 +0000
ALSA: pci: hda: use snd_kcontrol_chip()
[ Upstream commit 483dd12dbe34c6d4e71d4d543bcb1292bcb62d08 ]
We can use snd_kcontrol_chip(). Let's use it.
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/87plglauda.wl-kuninori.morimoto.gx@renesas.com
Stable-dep-of: 003ce8c9b2ca ("ALSA: hda: cs35l56: Fix signedness error in cs35l56_hda_posture_put()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Geoffrey D. Bennett <g@b4.vu>
Date: Fri Feb 20 21:58:48 2026 +1030
ALSA: scarlett2: Fix DSP filter control array handling
[ Upstream commit 1d241483368f2fd87fbaba64d6aec6bad3a1e12e ]
scarlett2_add_dsp_ctls() was incorrectly storing the precomp and PEQ
filter coefficient control pointers into the precomp_flt_switch_ctls
and peq_flt_switch_ctls arrays instead of the intended targets
precomp_flt_ctls and peq_flt_ctls. Pass NULL instead, as the filter
coefficient control pointers are not used, and remove the unused
precomp_flt_ctls and peq_flt_ctls arrays from struct scarlett2_data.
Additionally, scarlett2_update_filter_values() was reading
dsp_input_count * peq_flt_count values for
SCARLETT2_CONFIG_PEQ_FLT_SWITCH, but the peq_flt_switch array is
indexed only by dsp_input_count (one switch per DSP input, not per
filter). Fix the read count.
Fixes: b64678eb4e70 ("ALSA: scarlett2: Add DSP controls")
Signed-off-by: Geoffrey D. Bennett <g@b4.vu>
Link: https://patch.msgid.link/86497b71db060677d97c38a6ce5f89bb3b25361b.1771581197.git.g@b4.vu
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Geoffrey D. Bennett <g@b4.vu>
Date: Fri Oct 4 23:57:54 2024 +0930
ALSA: scarlett2: Fix redeclaration of loop variable
[ Upstream commit 5e7b782259fd396c7802948f5901bb2d769ddff8 ]
Was using both "for (i = 0, ..." and "for (int i = 0, ..." in
scarlett2_update_autogain(). Remove "int" to fix.
Signed-off-by: Geoffrey D. Bennett <g@b4.vu>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/ecb0a8931c1883abd6c0e335c63961653bef85f0.1727971672.git.g@b4.vu
Stable-dep-of: 1d241483368f ("ALSA: scarlett2: Fix DSP filter control array handling")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Takashi Iwai <tiwai@suse.de>
Date: Wed Feb 25 09:52:28 2026 +0100
ALSA: usb-audio: Cap the packet size pre-calculations
[ Upstream commit 7fe8dec3f628e9779f1631576f8e693370050348 ]
We calculate the possible packet sizes beforehand for adaptive and
synchronous endpoints, but we didn't take care of the max frame size
for those pre-calculated values. When a device or a bus limits the
packet size, a high sample rate or a high number of channels may lead
to the packet sizes that are larger than the given limit, which
results in an error from the USB core at submitting URBs.
As a simple workaround, just add the sanity checks of pre-calculated
packet sizes to have the upper boundary of ep->maxframesize.
Fixes: f0bd62b64016 ("ALSA: usb-audio: Improve frames size computation")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=221076
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/20260225085233.316306-2-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Geoffrey D. Bennett <g@b4.vu>
Date: Sat Feb 21 02:34:48 2026 +1030
ALSA: usb-audio: Remove VALIDATE_RATES quirk for Focusrite devices
[ Upstream commit a8cc55bf81a45772cad44c83ea7bb0e98431094a ]
Remove QUIRK_FLAG_VALIDATE_RATES for Focusrite. With the previous
commit, focusrite_valid_sample_rate() produces correct rate tables
without USB probing.
QUIRK_FLAG_VALIDATE_RATES sends SET_CUR requests for each rate (~25ms
each) and leaves the device at 192kHz. This is a problem because that
rate: 1) disables the internal mixer, so outputs are silent until an
application opens the PCM and sets a lower rate, and 2) the Air and
Safe modes get disabled.
Fixes: 5963e5262180 ("ALSA: usb-audio: Enable rate validation for Scarlett devices")
Signed-off-by: Geoffrey D. Bennett <g@b4.vu>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/09b9c012024c998c4ca14bd876ef0dce0d0b6101.1771594828.git.g@b4.vu
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Jun Seo <jun.seo.93@proton.me>
Date: Thu Feb 26 10:08:20 2026 +0900
ALSA: usb-audio: Use correct version for UAC3 header validation
commit 54f9d645a5453d0bfece0c465d34aaf072ea99fa upstream.
The entry of the validators table for UAC3 AC header descriptor is
defined with the wrong protocol version UAC_VERSION_2, while it should
have been UAC_VERSION_3. This results in the validator never matching
for actual UAC3 devices (protocol == UAC_VERSION_3), causing their
header descriptors to bypass validation entirely. A malicious USB
device presenting a truncated UAC3 header could exploit this to cause
out-of-bounds reads when the driver later accesses unvalidated
descriptor fields.
The bug was introduced in the same commit as the recently fixed UAC3
feature unit sub-type typo, and appears to be from the same copy-paste
error when the UAC3 section was created from the UAC2 section.
Fixes: 57f8770620e9 ("ALSA: usb-audio: More validations of descriptor units")
Cc: <stable@vger.kernel.org>
Signed-off-by: Jun Seo <jun.seo.93@proton.me>
Link: https://patch.msgid.link/20260226010820.36529-1-jun.seo.93@proton.me
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Takashi Iwai <tiwai@suse.de>
Date: Wed Feb 25 09:52:31 2026 +0100
ALSA: usb-audio: Use inclusive terms
[ Upstream commit 4e9113c533acee2ba1f72fd68ee6ecd36b64484e ]
Replace the remaining with inclusive terms; it's only this function
name we overlooked at the previous conversion.
Fixes: 53837b4ac2bd ("ALSA: usb-audio: Replace slave/master terms")
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/20260225085233.316306-5-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Raju Rangoju <Raju.Rangoju@amd.com>
Date: Thu Feb 26 22:37:53 2026 +0530
amd-xgbe: fix MAC_TCR_SS register width for 2.5G and 10M speeds
[ Upstream commit 9439a661c2e80485406ce2c90b107ca17858382d ]
Extend the MAC_TCR_SS (Speed Select) register field width from 2 bits
to 3 bits to properly support all speed settings.
The MAC_TCR register's SS field encoding requires 3 bits to represent
all supported speeds:
- 0x00: 10Gbps (XGMII)
- 0x02: 2.5Gbps (GMII) / 100Mbps
- 0x03: 1Gbps / 10Mbps
- 0x06: 2.5Gbps (XGMII) - P100a only
With only 2 bits, values 0x04-0x07 cannot be represented, which breaks
2.5G XGMII mode on newer platforms and causes incorrect speed select
values to be programmed.
Fixes: 07445f3c7ca1 ("amd-xgbe: Add support for 10 Mbps speed")
Co-developed-by: Guruvendra Punugupati <Guruvendra.Punugupati@amd.com>
Signed-off-by: Guruvendra Punugupati <Guruvendra.Punugupati@amd.com>
Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Link: https://patch.msgid.link/20260226170753.250312-1-Raju.Rangoju@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Raju Rangoju <Raju.Rangoju@amd.com>
Date: Mon Mar 2 09:51:24 2026 +0530
amd-xgbe: fix sleep while atomic on suspend/resume
[ Upstream commit e2f27363aa6d983504c6836dd0975535e2e9dba0 ]
The xgbe_powerdown() and xgbe_powerup() functions use spinlocks
(spin_lock_irqsave) while calling functions that may sleep:
- napi_disable() can sleep waiting for NAPI polling to complete
- flush_workqueue() can sleep waiting for pending work items
This causes a "BUG: scheduling while atomic" error during suspend/resume
cycles on systems using the AMD XGBE Ethernet controller.
The spinlock protection in these functions is unnecessary as these
functions are called from suspend/resume paths which are already serialized
by the PM core
Fix this by removing the spinlock. Since only code that takes this lock
is xgbe_powerdown() and xgbe_powerup(), remove it completely.
Fixes: c5aa9e3b8156 ("amd-xgbe: Initial AMD 10GbE platform driver")
Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Link: https://patch.msgid.link/20260302042124.1386445-1-Raju.Rangoju@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: John Johansen <john.johansen@canonical.com>
Date: Fri Oct 17 01:53:00 2025 -0700
apparmor: fix differential encoding verification
commit 39440b137546a3aa383cfdabc605fb73811b6093 upstream.
Differential encoding allows loops to be created if it is abused. To
prevent this the unpack should verify that a diff-encode chain
terminates.
Unfortunately the differential encode verification had two bugs.
1. it conflated states that had gone through check and already been
marked, with states that were currently being checked and marked.
This means that loops in the current chain being verified are treated
as a chain that has already been verified.
2. the order bailout on already checked states compared current chain
check iterators j,k instead of using the outer loop iterator i.
Meaning a step backwards in states in the current chain verification
was being mistaken for moving to an already verified state.
Move to a double mark scheme where already verified states get a
different mark, than the current chain being kept. This enables us
to also drop the backwards verification check that was the cause of
the second error as any already verified state is already marked.
Fixes: 031dcc8f4e84 ("apparmor: dfa add support for state differential encoding")
Reported-by: Qualys Security Advisory <qsa@qualys.com>
Tested-by: Salvatore Bonaccorso <carnil@debian.org>
Reviewed-by: Georgia Garcia <georgia.garcia@canonical.com>
Reviewed-by: Cengiz Can <cengiz.can@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: John Johansen <john.johansen@canonical.com>
Date: Wed Sep 10 06:22:17 2025 -0700
apparmor: Fix double free of ns_name in aa_replace_profiles()
commit 5df0c44e8f5f619d3beb871207aded7c78414502 upstream.
if ns_name is NULL after
1071 error = aa_unpack(udata, &lh, &ns_name);
and if ent->ns_name contains an ns_name in
1089 } else if (ent->ns_name) {
then ns_name is assigned the ent->ns_name
1095 ns_name = ent->ns_name;
however ent->ns_name is freed at
1262 aa_load_ent_free(ent);
and then again when freeing ns_name at
1270 kfree(ns_name);
Fix this by NULLing out ent->ns_name after it is transferred to ns_name
Fixes: 145a0ef21c8e9 ("apparmor: fix blob compression when ns is forced on a policy load")
Reported-by: Qualys Security Advisory <qsa@qualys.com>
Tested-by: Salvatore Bonaccorso <carnil@debian.org>
Reviewed-by: Georgia Garcia <georgia.garcia@canonical.com>
Reviewed-by: Cengiz Can <cengiz.can@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Massimiliano Pellizzer <massimiliano.pellizzer@canonical.com>
Date: Tue Jan 20 15:24:04 2026 +0100
apparmor: fix memory leak in verify_header
commit e38c55d9f834e5b848bfed0f5c586aaf45acb825 upstream.
The function sets `*ns = NULL` on every call, leaking the namespace
string allocated in previous iterations when multiple profiles are
unpacked. This also breaks namespace consistency checking since *ns
is always NULL when the comparison is made.
Remove the incorrect assignment.
The caller (aa_unpack) initializes *ns to NULL once before the loop,
which is sufficient.
Fixes: dd51c8485763 ("apparmor: provide base for multiple profiles to be replaced at once")
Reported-by: Qualys Security Advisory <qsa@qualys.com>
Tested-by: Salvatore Bonaccorso <carnil@debian.org>
Reviewed-by: Georgia Garcia <georgia.garcia@canonical.com>
Reviewed-by: Cengiz Can <cengiz.can@canonical.com>
Signed-off-by: Massimiliano Pellizzer <massimiliano.pellizzer@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Massimiliano Pellizzer <massimiliano.pellizzer@canonical.com>
Date: Thu Jan 29 16:51:11 2026 +0100
apparmor: fix missing bounds check on DEFAULT table in verify_dfa()
commit d352873bbefa7eb39995239d0b44ccdf8aaa79a4 upstream.
The verify_dfa() function only checks DEFAULT_TABLE bounds when the state
is not differentially encoded.
When the verification loop traverses the differential encoding chain,
it reads k = DEFAULT_TABLE[j] and uses k as an array index without
validation. A malformed DFA with DEFAULT_TABLE[j] >= state_count,
therefore, causes both out-of-bounds reads and writes.
[ 57.179855] ==================================================================
[ 57.180549] BUG: KASAN: slab-out-of-bounds in verify_dfa+0x59a/0x660
[ 57.180904] Read of size 4 at addr ffff888100eadec4 by task su/993
[ 57.181554] CPU: 1 UID: 0 PID: 993 Comm: su Not tainted 6.19.0-rc7-next-20260127 #1 PREEMPT(lazy)
[ 57.181558] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 57.181563] Call Trace:
[ 57.181572] <TASK>
[ 57.181577] dump_stack_lvl+0x5e/0x80
[ 57.181596] print_report+0xc8/0x270
[ 57.181605] ? verify_dfa+0x59a/0x660
[ 57.181608] kasan_report+0x118/0x150
[ 57.181620] ? verify_dfa+0x59a/0x660
[ 57.181623] verify_dfa+0x59a/0x660
[ 57.181627] aa_dfa_unpack+0x1610/0x1740
[ 57.181629] ? __kmalloc_cache_noprof+0x1d0/0x470
[ 57.181640] unpack_pdb+0x86d/0x46b0
[ 57.181647] ? srso_alias_return_thunk+0x5/0xfbef5
[ 57.181653] ? srso_alias_return_thunk+0x5/0xfbef5
[ 57.181656] ? aa_unpack_nameX+0x1a8/0x300
[ 57.181659] aa_unpack+0x20b0/0x4c30
[ 57.181662] ? srso_alias_return_thunk+0x5/0xfbef5
[ 57.181664] ? stack_depot_save_flags+0x33/0x700
[ 57.181681] ? kasan_save_track+0x4f/0x80
[ 57.181683] ? kasan_save_track+0x3e/0x80
[ 57.181686] ? __kasan_kmalloc+0x93/0xb0
[ 57.181688] ? __kvmalloc_node_noprof+0x44a/0x780
[ 57.181693] ? aa_simple_write_to_buffer+0x54/0x130
[ 57.181697] ? policy_update+0x154/0x330
[ 57.181704] aa_replace_profiles+0x15a/0x1dd0
[ 57.181707] ? srso_alias_return_thunk+0x5/0xfbef5
[ 57.181710] ? __kvmalloc_node_noprof+0x44a/0x780
[ 57.181712] ? aa_loaddata_alloc+0x77/0x140
[ 57.181715] ? srso_alias_return_thunk+0x5/0xfbef5
[ 57.181717] ? _copy_from_user+0x2a/0x70
[ 57.181730] policy_update+0x17a/0x330
[ 57.181733] profile_replace+0x153/0x1a0
[ 57.181735] ? rw_verify_area+0x93/0x2d0
[ 57.181740] vfs_write+0x235/0xab0
[ 57.181745] ksys_write+0xb0/0x170
[ 57.181748] do_syscall_64+0x8e/0x660
[ 57.181762] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 57.181765] RIP: 0033:0x7f6192792eb2
Remove the MATCH_FLAG_DIFF_ENCODE condition to validate all DEFAULT_TABLE
entries unconditionally.
Fixes: 031dcc8f4e84 ("apparmor: dfa add support for state differential encoding")
Reported-by: Qualys Security Advisory <qsa@qualys.com>
Tested-by: Salvatore Bonaccorso <carnil@debian.org>
Reviewed-by: Georgia Garcia <georgia.garcia@canonical.com>
Reviewed-by: Cengiz Can <cengiz.can@canonical.com>
Signed-off-by: Massimiliano Pellizzer <massimiliano.pellizzer@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: John Johansen <john.johansen@canonical.com>
Date: Sun Mar 1 16:10:51 2026 -0800
apparmor: fix race between freeing data and fs accessing it
commit 8e135b8aee5a06c52a4347a5a6d51223c6f36ba3 upstream.
AppArmor was putting the reference to i_private data on its end after
removing the original entry from the file system. However the inode
can aand does live beyond that point and it is possible that some of
the fs call back functions will be invoked after the reference has
been put, which results in a race between freeing the data and
accessing it through the fs.
While the rawdata/loaddata is the most likely candidate to fail the
race, as it has the fewest references. If properly crafted it might be
possible to trigger a race for the other types stored in i_private.
Fix this by moving the put of i_private referenced data to the correct
place which is during inode eviction.
Fixes: c961ee5f21b20 ("apparmor: convert from securityfs to apparmorfs for policy ns files")
Reported-by: Qualys Security Advisory <qsa@qualys.com>
Reviewed-by: Georgia Garcia <georgia.garcia@canonical.com>
Reviewed-by: Maxime Bélair <maxime.belair@canonical.com>
Reviewed-by: Cengiz Can <cengiz.can@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: John Johansen <john.johansen@canonical.com>
Date: Tue Feb 24 10:20:02 2026 -0800
apparmor: fix race on rawdata dereference
commit a0b7091c4de45a7325c8780e6934a894f92ac86b upstream.
There is a race condition that leads to a use-after-free situation:
because the rawdata inodes are not refcounted, an attacker can start
open()ing one of the rawdata files, and at the same time remove the
last reference to this rawdata (by removing the corresponding profile,
for example), which frees its struct aa_loaddata; as a result, when
seq_rawdata_open() is reached, i_private is a dangling pointer and
freed memory is accessed.
The rawdata inodes weren't refcounted to avoid a circular refcount and
were supposed to be held by the profile rawdata reference. However
during profile removal there is a window where the vfs and profile
destruction race, resulting in the use after free.
Fix this by moving to a double refcount scheme. Where the profile
refcount on rawdata is used to break the circular dependency. Allowing
for freeing of the rawdata once all inode references to the rawdata
are put.
Fixes: 5d5182cae401 ("apparmor: move to per loaddata files, instead of replicating in profiles")
Reported-by: Qualys Security Advisory <qsa@qualys.com>
Reviewed-by: Georgia Garcia <georgia.garcia@canonical.com>
Reviewed-by: Maxime Bélair <maxime.belair@canonical.com>
Reviewed-by: Cengiz Can <cengiz.can@canonical.com>
Tested-by: Salvatore Bonaccorso <carnil@debian.org>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Massimiliano Pellizzer <massimiliano.pellizzer@canonical.com>
Date: Thu Jan 29 17:08:25 2026 +0100
apparmor: fix side-effect bug in match_char() macro usage
commit 8756b68edae37ff546c02091989a4ceab3f20abd upstream.
The match_char() macro evaluates its character parameter multiple
times when traversing differential encoding chains. When invoked
with *str++, the string pointer advances on each iteration of the
inner do-while loop, causing the DFA to check different characters
at each iteration and therefore skip input characters.
This results in out-of-bounds reads when the pointer advances past
the input buffer boundary.
[ 94.984676] ==================================================================
[ 94.985301] BUG: KASAN: slab-out-of-bounds in aa_dfa_match+0x5ae/0x760
[ 94.985655] Read of size 1 at addr ffff888100342000 by task file/976
[ 94.986319] CPU: 7 UID: 1000 PID: 976 Comm: file Not tainted 6.19.0-rc7-next-20260127 #1 PREEMPT(lazy)
[ 94.986322] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 94.986329] Call Trace:
[ 94.986341] <TASK>
[ 94.986347] dump_stack_lvl+0x5e/0x80
[ 94.986374] print_report+0xc8/0x270
[ 94.986384] ? aa_dfa_match+0x5ae/0x760
[ 94.986388] kasan_report+0x118/0x150
[ 94.986401] ? aa_dfa_match+0x5ae/0x760
[ 94.986405] aa_dfa_match+0x5ae/0x760
[ 94.986408] __aa_path_perm+0x131/0x400
[ 94.986418] aa_path_perm+0x219/0x2f0
[ 94.986424] apparmor_file_open+0x345/0x570
[ 94.986431] security_file_open+0x5c/0x140
[ 94.986442] do_dentry_open+0x2f6/0x1120
[ 94.986450] vfs_open+0x38/0x2b0
[ 94.986453] ? may_open+0x1e2/0x2b0
[ 94.986466] path_openat+0x231b/0x2b30
[ 94.986469] ? __x64_sys_openat+0xf8/0x130
[ 94.986477] do_file_open+0x19d/0x360
[ 94.986487] do_sys_openat2+0x98/0x100
[ 94.986491] __x64_sys_openat+0xf8/0x130
[ 94.986499] do_syscall_64+0x8e/0x660
[ 94.986515] ? count_memcg_events+0x15f/0x3c0
[ 94.986526] ? srso_alias_return_thunk+0x5/0xfbef5
[ 94.986540] ? handle_mm_fault+0x1639/0x1ef0
[ 94.986551] ? vma_start_read+0xf0/0x320
[ 94.986558] ? srso_alias_return_thunk+0x5/0xfbef5
[ 94.986561] ? srso_alias_return_thunk+0x5/0xfbef5
[ 94.986563] ? fpregs_assert_state_consistent+0x50/0xe0
[ 94.986572] ? srso_alias_return_thunk+0x5/0xfbef5
[ 94.986574] ? arch_exit_to_user_mode_prepare+0x9/0xb0
[ 94.986587] ? srso_alias_return_thunk+0x5/0xfbef5
[ 94.986588] ? irqentry_exit+0x3c/0x590
[ 94.986595] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 94.986597] RIP: 0033:0x7fda4a79c3ea
Fix by extracting the character value before invoking match_char,
ensuring single evaluation per outer loop.
Fixes: 074c1cd798cb ("apparmor: dfa move character match into a macro")
Reported-by: Qualys Security Advisory <qsa@qualys.com>
Tested-by: Salvatore Bonaccorso <carnil@debian.org>
Reviewed-by: Georgia Garcia <georgia.garcia@canonical.com>
Reviewed-by: Cengiz Can <cengiz.can@canonical.com>
Signed-off-by: Massimiliano Pellizzer <massimiliano.pellizzer@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: John Johansen <john.johansen@canonical.com>
Date: Fri Nov 7 08:36:04 2025 -0800
apparmor: fix unprivileged local user can do privileged policy management
commit 6601e13e82841879406bf9f369032656f441a425 upstream.
An unprivileged local user can load, replace, and remove profiles by
opening the apparmorfs interfaces, via a confused deputy attack, by
passing the opened fd to a privileged process, and getting the
privileged process to write to the interface.
This does require a privileged target that can be manipulated to do
the write for the unprivileged process, but once such access is
achieved full policy management is possible and all the possible
implications that implies: removing confinement, DoS of system or
target applications by denying all execution, by-passing the
unprivileged user namespace restriction, to exploiting kernel bugs for
a local privilege escalation.
The policy management interface can not have its permissions simply
changed from 0666 to 0600 because non-root processes need to be able
to load policy to different policy namespaces.
Instead ensure the task writing the interface has privileges that
are a subset of the task that opened the interface. This is already
done via policy for confined processes, but unconfined can delegate
access to the opened fd, by-passing the usual policy check.
Fixes: b7fd2c0340eac ("apparmor: add per policy ns .load, .replace, .remove interface files")
Reported-by: Qualys Security Advisory <qsa@qualys.com>
Tested-by: Salvatore Bonaccorso <carnil@debian.org>
Reviewed-by: Georgia Garcia <georgia.garcia@canonical.com>
Reviewed-by: Cengiz Can <cengiz.can@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: John Johansen <john.johansen@canonical.com>
Date: Tue Mar 3 11:08:02 2026 -0800
apparmor: fix: limit the number of levels of policy namespaces
commit 306039414932c80f8420695a24d4fe10c84ccfb2 upstream.
Currently the number of policy namespaces is not bounded relying on
the user namespace limit. However policy namespaces aren't strictly
tied to user namespaces and it is possible to create them and nest
them arbitrarily deep which can be used to exhaust system resource.
Hard cap policy namespaces to the same depth as user namespaces.
Fixes: c88d4c7b049e8 ("AppArmor: core policy routines")
Reported-by: Qualys Security Advisory <qsa@qualys.com>
Reviewed-by: Ryan Lee <ryan.lee@canonical.com>
Reviewed-by: Cengiz Can <cengiz.can@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Massimiliano Pellizzer <massimiliano.pellizzer@canonical.com>
Date: Tue Jan 13 09:09:43 2026 +0100
apparmor: replace recursive profile removal with iterative approach
commit ab09264660f9de5d05d1ef4e225aa447c63a8747 upstream.
The profile removal code uses recursion when removing nested profiles,
which can lead to kernel stack exhaustion and system crashes.
Reproducer:
$ pf='a'; for ((i=0; i<1024; i++)); do
echo -e "profile $pf { \n }" | apparmor_parser -K -a;
pf="$pf//x";
done
$ echo -n a > /sys/kernel/security/apparmor/.remove
Replace the recursive __aa_profile_list_release() approach with an
iterative approach in __remove_profile(). The function repeatedly
finds and removes leaf profiles until the entire subtree is removed,
maintaining the same removal semantic without recursion.
Fixes: c88d4c7b049e ("AppArmor: core policy routines")
Reported-by: Qualys Security Advisory <qsa@qualys.com>
Tested-by: Salvatore Bonaccorso <carnil@debian.org>
Reviewed-by: Georgia Garcia <georgia.garcia@canonical.com>
Reviewed-by: Cengiz Can <cengiz.can@canonical.com>
Signed-off-by: Massimiliano Pellizzer <massimiliano.pellizzer@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Massimiliano Pellizzer <massimiliano.pellizzer@canonical.com>
Date: Thu Jan 15 15:30:50 2026 +0100
apparmor: validate DFA start states are in bounds in unpack_pdb
commit 9063d7e2615f4a7ab321de6b520e23d370e58816 upstream.
Start states are read from untrusted data and used as indexes into the
DFA state tables. The aa_dfa_next() function call in unpack_pdb() will
access dfa->tables[YYTD_ID_BASE][start], and if the start state exceeds
the number of states in the DFA, this results in an out-of-bound read.
==================================================================
BUG: KASAN: slab-out-of-bounds in aa_dfa_next+0x2a1/0x360
Read of size 4 at addr ffff88811956fb90 by task su/1097
...
Reject policies with out-of-bounds start states during unpacking
to prevent the issue.
Fixes: ad5ff3db53c6 ("AppArmor: Add ability to load extended policy")
Reported-by: Qualys Security Advisory <qsa@qualys.com>
Tested-by: Salvatore Bonaccorso <carnil@debian.org>
Reviewed-by: Georgia Garcia <georgia.garcia@canonical.com>
Reviewed-by: Cengiz Can <cengiz.can@canonical.com>
Signed-off-by: Massimiliano Pellizzer <massimiliano.pellizzer@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Shawn Lin <shawn.lin@rock-chips.com>
Date: Mon Jan 5 16:15:28 2026 +0800
arm64: dts: rockchip: Fix rk356x PCIe range mappings
[ Upstream commit f63ea193a404481f080ca2958f73e9f364682db9 ]
The pcie bus address should be mapped 1:1 to the cpu side MMIO address, so
that there is no same address allocated from normal system memory. Otherwise
it's broken if the same address assigned to the EP for DMA purpose.Fix it to
sync with the vendor BSP.
Fixes: 568a67e742df ("arm64: dts: rockchip: Fix rk356x PCIe register and range mappings")
Fixes: 66b51ea7d70f ("arm64: dts: rockchip: Add rk3568 PCIe2x1 controller")
Cc: stable@vger.kernel.org
Cc: Andrew Powers-Holmes <aholmes@omnom.net>
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Link: https://patch.msgid.link/1767600929-195341-1-git-send-email-shawn.lin@rock-chips.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Shawn Lin <shawn.lin@rock-chips.com>
Date: Mon Jan 5 16:15:29 2026 +0800
arm64: dts: rockchip: Fix rk3588 PCIe range mappings
[ Upstream commit 46c56b737161060dfa468f25ae699749047902a2 ]
The pcie bus address should be mapped 1:1 to the cpu side MMIO address, so
that there is no same address allocated from normal system memory. Otherwise
it's broken if the same address assigned to the EP for DMA purpose.Fix it to
sync with the vendor BSP.
Fixes: 0acf4fa7f187 ("arm64: dts: rockchip: add PCIe3 support for rk3588")
Fixes: 8d81b77f4c49 ("arm64: dts: rockchip: add rk3588 PCIe2 support")
Cc: stable@vger.kernel.org
Cc: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Link: https://patch.msgid.link/1767600929-195341-2-git-send-email-shawn.lin@rock-chips.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Date: Fri Feb 13 08:39:29 2026 +0100
ARM: clean up the memset64() C wrapper
commit b52343d1cb47bb27ca32a3f4952cc2fd3cd165bf upstream.
The current logic to split the 64-bit argument into its 32-bit halves is
byte-order specific and a bit clunky. Use a union instead which is
easier to read and works in all cases.
GCC still generates the same machine code.
While at it, rename the arguments of the __memset64() prototype to
actually reflect their semantics.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Ben Hutchings <ben@decadent.org.uk> # for -stable
Link: https://lore.kernel.org/all/1a11526ae3d8664f705b541b8d6ea57b847b49a8.camel@decadent.org.uk/
Suggested-by: https://lore.kernel.org/all/aZonkWMwpbFhzDJq@casper.infradead.org/ # for -stable
Link: https://lore.kernel.org/all/aZonkWMwpbFhzDJq@casper.infradead.org/
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Damien Le Moal <dlemoal@kernel.org>
Date: Fri Feb 20 12:09:12 2026 +0900
ata: libata-core: fix cancellation of a port deferred qc work
commit 55db009926634b20955bd8abbee921adbc8d2cb4 upstream.
cancel_work_sync() is a sleeping function so it cannot be called with
the spin lock of a port being held. Move the call to this function in
ata_port_detach() after EH completes, with the port lock released,
together with other work cancellation calls.
Fixes: 0ea84089dbf6 ("ata: libata-scsi: avoid Non-NCQ command starvation")
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Igor Pylypiv <ipylypiv@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Damien Le Moal <dlemoal@kernel.org>
Date: Fri Feb 20 13:43:00 2026 +0900
ata: libata-eh: correctly handle deferred qc timeouts
commit eddb98ad9364b4e778768785d46cfab04ce52100 upstream.
A deferred qc may timeout while waiting for the device queue to drain
to be submitted. In such case, since the qc is not active,
ata_scsi_cmd_error_handler() ends up calling scsi_eh_finish_cmd(),
which frees the qc. But as the port deferred_qc field still references
this finished/freed qc, the deferred qc work may eventually attempt to
call ata_qc_issue() against this invalid qc, leading to errors such as
reported by UBSAN (syzbot run):
UBSAN: shift-out-of-bounds in drivers/ata/libata-core.c:5166:24
shift exponent 4210818301 is too large for 64-bit type 'long long unsigned int'
...
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x100/0x190 lib/dump_stack.c:120
ubsan_epilogue+0xa/0x30 lib/ubsan.c:233
__ubsan_handle_shift_out_of_bounds+0x279/0x2a0 lib/ubsan.c:494
ata_qc_issue.cold+0x38/0x9f drivers/ata/libata-core.c:5166
ata_scsi_deferred_qc_work+0x154/0x1f0 drivers/ata/libata-scsi.c:1679
process_one_work+0x9d7/0x1920 kernel/workqueue.c:3275
process_scheduled_works kernel/workqueue.c:3358 [inline]
worker_thread+0x5da/0xe40 kernel/workqueue.c:3439
kthread+0x370/0x450 kernel/kthread.c:467
ret_from_fork+0x754/0xd80 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
</TASK>
Fix this by checking if the qc of a timed out SCSI command is a deferred
one, and in such case, clear the port deferred_qc field and finish the
SCSI command with DID_TIME_OUT.
Reported-by: syzbot+1f77b8ca15336fff21ff@syzkaller.appspotmail.com
Fixes: 0ea84089dbf6 ("ata: libata-scsi: avoid Non-NCQ command starvation")
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Igor Pylypiv <ipylypiv@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Guenter Roeck <linux@roeck-us.net>
Date: Thu Mar 5 18:48:05 2026 -0800
ata: libata-eh: Fix detection of deferred qc timeouts
commit ee0e6e69a772d601e152e5368a1da25d656122a8 upstream.
If the ata_qc_for_each_raw() loop finishes without finding a matching SCSI
command for any QC, the variable qc will hold a pointer to the last element
examined, which has the tag i == ATA_MAX_QUEUE - 1. This qc can match the
port deferred QC (ap->deferred_qc).
If that happens, the condition qc == ap->deferred_qc evaluates to true
despite the loop not breaking with a match on the SCSI command for this QC.
In that case, the error handler mistakenly intercepts a command that has
not been issued yet and that has not timed out, and thus erroneously
returning a timeout error.
Fix the problem by checking for i < ATA_MAX_QUEUE in addition to
qc == ap->deferred_qc.
The problem was found by an experimental code review agent based on
gemini-3.1-pro while reviewing backports into v6.18.y.
Assisted-by: Gemini:gemini-3.1-pro
Fixes: eddb98ad9364 ("ata: libata-eh: correctly handle deferred qc timeouts")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
[cassel: modified commit log as suggested by Damien]
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Damien Le Moal <dlemoal@kernel.org>
Date: Wed Dec 17 16:40:48 2025 +0900
ata: libata-scsi: avoid Non-NCQ command starvation
[ Upstream commit 0ea84089dbf62a92dc7889c79e6b18fc89260808 ]
When a non-NCQ command is issued while NCQ commands are being executed,
ata_scsi_qc_issue() indicates to the SCSI layer that the command issuing
should be deferred by returning SCSI_MLQUEUE_XXX_BUSY. This command
deferring is correct and as mandated by the ACS specifications since
NCQ and non-NCQ commands cannot be mixed.
However, in the case of a host adapter using multiple submission queues,
when the target device is under a constant load of NCQ commands, there
are no guarantees that requeueing the non-NCQ command will be executed
later and it may be deferred again repeatedly as other submission queues
can constantly issue NCQ commands from different CPUs ahead of the
non-NCQ command. This can lead to very long delays for the execution of
non-NCQ commands, and even complete starvation for these commands in the
worst case scenario.
Since the block layer and the SCSI layer do not distinguish between
queueable (NCQ) and non queueable (non-NCQ) commands, libata-scsi SAT
implementation must ensure forward progress for non-NCQ commands in the
presence of NCQ command traffic. This is similar to what SAS HBAs with a
hardware/firmware based SAT implementation do.
Implement such forward progress guarantee by limiting requeueing of
non-NCQ commands from ata_scsi_qc_issue(): when a non-NCQ command is
received and NCQ commands are in-flight, do not force a requeue of the
non-NCQ command by returning SCSI_MLQUEUE_XXX_BUSY and instead return 0
to indicate that the command was accepted but hold on to the qc using
the new deferred_qc field of struct ata_port.
This deferred qc will be issued using the work item deferred_qc_work
running the function ata_scsi_deferred_qc_work() once all in-flight
commands complete, which is checked with the port qc_defer() callback
return value indicating that no further delay is necessary. This check
is done using the helper function ata_scsi_schedule_deferred_qc() which
is called from ata_scsi_qc_complete(). This thus excludes this mechanism
from all internal non-NCQ commands issued by ATA EH.
When a port deferred_qc is non NULL, that is, the port has a command
waiting for the device queue to drain, the issuing of all incoming
commands (both NCQ and non-NCQ) is deferred using the regular busy
mechanism. This simplifies the code and also avoids potential denial of
service problems if a user issues too many non-NCQ commands.
Finally, whenever ata EH is scheduled, regardless of the reason, a
deferred qc is always requeued so that it can be retried once EH
completes. This is done by calling the function
ata_scsi_requeue_deferred_qc() from ata_eh_set_pending(). This avoids
the need for any special processing for the deferred qc in case of NCQ
error, link or device reset, or device timeout.
Reported-by: Xingui Yang <yangxingui@huawei.com>
Reported-by: Igor Pylypiv <ipylypiv@google.com>
Fixes: bdb01301f3ea ("scsi: Add host and host template flag 'host_tagset'")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Tested-by: Igor Pylypiv <ipylypiv@google.com>
Tested-by: Xingui Yang <yangxingui@huawei.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Damien Le Moal <dlemoal@kernel.org>
Date: Tue Oct 22 11:45:35 2024 +0900
ata: libata-scsi: Document all VPD page inquiry actors
[ Upstream commit 47000e84b3d0630d7d86eeb115894205be68035d ]
Add the missing kdoc comments for the ata_scsiop_inq_XX functions used
to emulate access to VPD pages.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20241022024537.251905-5-dlemoal@kernel.org
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Stable-dep-of: 0ea84089dbf6 ("ata: libata-scsi: avoid Non-NCQ command starvation")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Damien Le Moal <dlemoal@kernel.org>
Date: Tue Oct 22 11:45:32 2024 +0900
ata: libata-scsi: Refactor ata_scsi_simulate()
[ Upstream commit b055e3be63bebc3c50d0fb1830de9bf4f2be388d ]
Factor out the code handling the INQUIRY command in ata_scsi_simulate()
using the function ata_scsi_rbuf_fill() with the new actor
ata_scsiop_inquiry(). This new actor function calls the existing actors
to handle the standard inquiry as well as extended inquiry (VPD page
access).
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20241022024537.251905-2-dlemoal@kernel.org
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Stable-dep-of: 0ea84089dbf6 ("ata: libata-scsi: avoid Non-NCQ command starvation")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Damien Le Moal <dlemoal@kernel.org>
Date: Tue Oct 22 11:45:34 2024 +0900
ata: libata-scsi: Refactor ata_scsiop_maint_in()
[ Upstream commit 4ab7bb97634351914a18f3c4533992c99eb6edb6 ]
Move the check for MI_REPORT_SUPPORTED_OPERATION_CODES from
ata_scsi_simulate() into ata_scsiop_maint_in() to simplify
ata_scsi_simulate() code.
Furthermore, since an rbuff fill actor function returning a non-zero
value causes no data to be returned for the command, directly return
an error (return 1) for invalid command formt after setting the invalid
field in cdb error.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20241022024537.251905-4-dlemoal@kernel.org
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Stable-dep-of: 0ea84089dbf6 ("ata: libata-scsi: avoid Non-NCQ command starvation")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Damien Le Moal <dlemoal@kernel.org>
Date: Tue Oct 22 11:45:33 2024 +0900
ata: libata-scsi: Refactor ata_scsiop_read_cap()
[ Upstream commit 44bdde151a6f5b34993c570a8f6508e2e00b56e1 ]
Move the check for the scsi command service action being
SAI_READ_CAPACITY_16 from ata_scsi_simulate() into ata_scsiop_read_cap()
to simplify ata_scsi_simulate() for processing capacity reading commands
(READ_CAPACITY and SERVICE_ACTION_IN_16).
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20241022024537.251905-3-dlemoal@kernel.org
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Stable-dep-of: 0ea84089dbf6 ("ata: libata-scsi: avoid Non-NCQ command starvation")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Damien Le Moal <dlemoal@kernel.org>
Date: Tue Oct 22 11:45:36 2024 +0900
ata: libata-scsi: Remove struct ata_scsi_args
[ Upstream commit 2365278e03916b6b9a65df91e9f7c7afe5a6cf2e ]
The data structure struct ata_scsi_args is used to pass the target ATA
device, the SCSI command to simulate and the device identification data
to ata_scsi_rbuf_fill() and to its actor function. This method of
passing information does not improve the code in any way and in fact
increases the number of pointer dereferences for no gains.
Drop this data structure by modifying the interface of
ata_scsi_rbuf_fill() and its actor function to take an ATA device and a
SCSI command as argument.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20241022024537.251905-6-dlemoal@kernel.org
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Stable-dep-of: 0ea84089dbf6 ("ata: libata-scsi: avoid Non-NCQ command starvation")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Niklas Cassel <cassel@kernel.org>
Date: Tue Mar 3 11:03:42 2026 +0100
ata: libata: cancel pending work after clearing deferred_qc
commit aac9b27f7c1f2b2cf7f50a9ca633ecbbcaf22af9 upstream.
Syzbot reported a WARN_ON() in ata_scsi_deferred_qc_work(), caused by
ap->ops->qc_defer() returning non-zero before issuing the deferred qc.
ata_scsi_schedule_deferred_qc() is called during each command completion.
This function will check if there is a deferred QC, and if
ap->ops->qc_defer() returns zero, meaning that it is possible to queue the
deferred qc at this time (without being deferred), then it will queue the
work which will issue the deferred qc.
Once the work get to run, which can potentially be a very long time after
the work was scheduled, there is a WARN_ON() if ap->ops->qc_defer() returns
non-zero.
While we hold the ap->lock both when assigning and clearing deferred_qc,
and the work itself holds the ap->lock, the code currently does not cancel
the work after clearing the deferred qc.
This means that the following scenario can happen:
1) One or several NCQ commands are queued.
2) A non-NCQ command is queued, gets stored in ap->deferred_qc.
3) Last NCQ command gets completed, work is queued to issue the deferred
qc.
4) Timeout or error happens, ap->deferred_qc is cleared. The queued work is
currently NOT canceled.
5) Port is reset.
6) One or several NCQ commands are queued.
7) A non-NCQ command is queued, gets stored in ap->deferred_qc.
8) Work is finally run. Yet at this time, there is still NCQ commands in
flight.
The work in 8) really belongs to the non-NCQ command in 2), not to the
non-NCQ command in 7). The reason why the work is executed when it is not
supposed to, is because it was never canceled when ap->deferred_qc was
cleared in 4). Thus, ensure that we always cancel the work after clearing
ap->deferred_qc.
Another potential fix would have been to let ata_scsi_deferred_qc_work() do
nothing if ap->ops->qc_defer() returns non-zero. However, canceling the
work when clearing ap->deferred_qc seems slightly more logical, as we hold
the ap->lock when clearing ap->deferred_qc, so we know that the work cannot
be holding the lock. (The function could be waiting for the lock, but that
is okay since it will do nothing if ap->deferred_qc is not set.)
Reported-by: syzbot+bcaf842a1e8ead8dfb89@syzkaller.appspotmail.com
Fixes: 0ea84089dbf6 ("ata: libata-scsi: avoid Non-NCQ command starvation")
Fixes: eddb98ad9364 ("ata: libata-eh: correctly handle deferred qc timeouts")
Reviewed-by: Igor Pylypiv <ipylypiv@google.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Damien Le Moal <dlemoal@kernel.org>
Date: Fri Jul 4 19:46:00 2025 +0900
ata: libata: Introduce ata_port_eh_scheduled()
[ Upstream commit 7aae547bbe442affc4afe176b157fab820a12437 ]
Introduce the inline helper function ata_port_eh_scheduled() to test if
EH is pending (ATA_PFLAG_EH_PENDING port flag is set) or running
(ATA_PFLAG_EH_IN_PROGRESS port flag is set) for a port. Use this helper
in ata_port_wait_eh() and __ata_scsi_queuecmd() to replace the hardcoded
port flag tests.
No functional changes.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Link: https://lore.kernel.org/r/20250704104601.310643-1-dlemoal@kernel.org
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Stable-dep-of: 0ea84089dbf6 ("ata: libata-scsi: avoid Non-NCQ command starvation")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Damien Le Moal <dlemoal@kernel.org>
Date: Fri Jun 27 09:42:33 2025 +0900
ata: libata: Remove ATA_DFLAG_ZAC device flag
[ Upstream commit a0f26fcc383965e0522b81269062a9278bc802fe ]
The ATA device flag ATA_DFLAG_ZAC is used to indicate if a devie is a
host managed or host aware zoned device. However, this flag is not used
in the hot path and only used during device scanning/revalidation and
for inquiry and sense SCSI command translation.
Save one bit from struct ata_device flags field by replacing this flag
with the internal helper function ata_dev_is_zac(). This function
returns true if the device class is ATA_DEV_ZAC (host managed ZAC device
case) or if its identify data reports it supports the zoned command set
(host aware ZAC device case).
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Stable-dep-of: 0ea84089dbf6 ("ata: libata-scsi: avoid Non-NCQ command starvation")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Jiayuan Chen <jiayuan.chen@shopee.com>
Date: Wed Feb 25 20:32:40 2026 +0800
atm: lec: fix null-ptr-deref in lec_arp_clear_vccs
[ Upstream commit 101bacb303e89dc2e0640ae6a5e0fb97c4eb45bb ]
syzkaller reported a null-ptr-deref in lec_arp_clear_vccs().
This issue can be easily reproduced using the syzkaller reproducer.
In the ATM LANE (LAN Emulation) module, the same atm_vcc can be shared by
multiple lec_arp_table entries (e.g., via entry->vcc or entry->recv_vcc).
When the underlying VCC is closed, lec_vcc_close() iterates over all
ARP entries and calls lec_arp_clear_vccs() for each matched entry.
For example, when lec_vcc_close() iterates through the hlists in
priv->lec_arp_empty_ones or other ARP tables:
1. In the first iteration, for the first matched ARP entry sharing the VCC,
lec_arp_clear_vccs() frees the associated vpriv (which is vcc->user_back)
and sets vcc->user_back to NULL.
2. In the second iteration, for the next matched ARP entry sharing the same
VCC, lec_arp_clear_vccs() is called again. It obtains a NULL vpriv from
vcc->user_back (via LEC_VCC_PRIV(vcc)) and then attempts to dereference it
via `vcc->pop = vpriv->old_pop`, leading to a null-ptr-deref crash.
Fix this by adding a null check for vpriv before dereferencing
it. If vpriv is already NULL, it means the VCC has been cleared
by a previous call, so we can safely skip the cleanup and just
clear the entry's vcc/recv_vcc pointers.
The entire cleanup block (including vcc_release_async()) is placed inside
the vpriv guard because a NULL vpriv indicates the VCC has already been
fully released by a prior iteration — repeating the teardown would
redundantly set flags and trigger callbacks on an already-closing socket.
The Fixes tag points to the initial commit because the entry->vcc path has
been vulnerable since the original code. The entry->recv_vcc path was later
added by commit 8d9f73c0ad2f ("atm: fix a memory leak of vcc->user_back")
with the same pattern, and both paths are fixed here.
Reported-by: syzbot+72e3ea390c305de0e259@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/68c95a83.050a0220.3c6139.0e5c.GAE@google.com/T/
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Suggested-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
Link: https://patch.msgid.link/20260225123250.189289-1-jiayuan.chen@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Fuad Tabba <tabba@google.com>
Date: Thu Feb 26 07:55:25 2026 +0000
bpf, arm64: Force 8-byte alignment for JIT buffer to prevent atomic tearing
[ Upstream commit ef06fd16d48704eac868441d98d4ef083d8f3d07 ]
struct bpf_plt contains a u64 target field. Currently, the BPF JIT
allocator requests an alignment of 4 bytes (sizeof(u32)) for the JIT
buffer.
Because the base address of the JIT buffer can be 4-byte aligned (e.g.,
ending in 0x4 or 0xc), the relative padding logic in build_plt() fails
to ensure that target lands on an 8-byte boundary.
This leads to two issues:
1. UBSAN reports misaligned-access warnings when dereferencing the
structure.
2. More critically, target is updated concurrently via WRITE_ONCE() in
bpf_arch_text_poke() while the JIT'd code executes ldr. On arm64,
64-bit loads/stores are only guaranteed to be single-copy atomic if
they are 64-bit aligned. A misaligned target risks a torn read,
causing the JIT to jump to a corrupted address.
Fix this by increasing the allocation alignment requirement to 8 bytes
(sizeof(u64)) in bpf_jit_binary_pack_alloc(). This anchors the base of
the JIT buffer to an 8-byte boundary, allowing the relative padding math
in build_plt() to correctly align the target field.
Fixes: b2ad54e1533e ("bpf, arm64: Implement bpf_arch_text_poke() for arm64")
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20260226075525.233321-1-tabba@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Jiayuan Chen <jiayuan.chen@shopee.com>
Date: Thu Feb 26 16:03:01 2026 +0800
bpf/bonding: reject vlan+srcmac xmit_hash_policy change when XDP is loaded
[ Upstream commit 479d589b40b836442bbdadc3fdb37f001bb67f26 ]
bond_option_mode_set() already rejects mode changes that would make a
loaded XDP program incompatible via bond_xdp_check(). However,
bond_option_xmit_hash_policy_set() has no such guard.
For 802.3ad and balance-xor modes, bond_xdp_check() returns false when
xmit_hash_policy is vlan+srcmac, because the 802.1q payload is usually
absent due to hardware offload. This means a user can:
1. Attach a native XDP program to a bond in 802.3ad/balance-xor mode
with a compatible xmit_hash_policy (e.g. layer2+3).
2. Change xmit_hash_policy to vlan+srcmac while XDP remains loaded.
This leaves bond->xdp_prog set but bond_xdp_check() now returning false
for the same device. When the bond is later destroyed, dev_xdp_uninstall()
calls bond_xdp_set(dev, NULL, NULL) to remove the program, which hits
the bond_xdp_check() guard and returns -EOPNOTSUPP, triggering:
WARN_ON(dev_xdp_install(dev, mode, bpf_op, NULL, 0, NULL))
Fix this by rejecting xmit_hash_policy changes to vlan+srcmac when an
XDP program is loaded on a bond in 802.3ad or balance-xor mode.
commit 39a0876d595b ("net, bonding: Disallow vlan+srcmac with XDP")
introduced bond_xdp_check() which returns false for 802.3ad/balance-xor
modes when xmit_hash_policy is vlan+srcmac. The check was wired into
bond_xdp_set() to reject XDP attachment with an incompatible policy, but
the symmetric path -- preventing xmit_hash_policy from being changed to an
incompatible value after XDP is already loaded -- was left unguarded in
bond_option_xmit_hash_policy_set().
Note:
commit 094ee6017ea0 ("bonding: check xdp prog when set bond mode")
later added a similar guard to bond_option_mode_set(), but
bond_option_xmit_hash_policy_set() remained unprotected.
Reported-by: syzbot+5a287bcdc08104bc3132@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/6995aff6.050a0220.2eeac1.014e.GAE@google.com/T/
Fixes: 39a0876d595b ("net, bonding: Disallow vlan+srcmac with XDP")
Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
Link: https://patch.msgid.link/20260226080306.98766-2-jiayuan.chen@linux.dev
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Lang Xu <xulang@uniontech.com>
Date: Tue Mar 3 17:52:17 2026 +0800
bpf: Fix a UAF issue in bpf_trampoline_link_cgroup_shim
[ Upstream commit 56145d237385ca0e7ca9ff7b226aaf2eb8ef368b ]
The root cause of this bug is that when 'bpf_link_put' reduces the
refcount of 'shim_link->link.link' to zero, the resource is considered
released but may still be referenced via 'tr->progs_hlist' in
'cgroup_shim_find'. The actual cleanup of 'tr->progs_hlist' in
'bpf_shim_tramp_link_release' is deferred. During this window, another
process can cause a use-after-free via 'bpf_trampoline_link_cgroup_shim'.
Based on Martin KaFai Lau's suggestions, I have created a simple patch.
To fix this:
Add an atomic non-zero check in 'bpf_trampoline_link_cgroup_shim'.
Only increment the refcount if it is not already zero.
Testing:
I verified the fix by adding a delay in
'bpf_shim_tramp_link_release' to make the bug easier to trigger:
static void bpf_shim_tramp_link_release(struct bpf_link *link)
{
/* ... */
if (!shim_link->trampoline)
return;
+ msleep(100);
WARN_ON_ONCE(bpf_trampoline_unlink_prog(&shim_link->link,
shim_link->trampoline, NULL));
bpf_trampoline_put(shim_link->trampoline);
}
Before the patch, running a PoC easily reproduced the crash(almost 100%)
with a call trace similar to KaiyanM's report.
After the patch, the bug no longer occurs even after millions of
iterations.
Fixes: 69fd337a975c ("bpf: per-cgroup lsm flavor")
Reported-by: Kaiyan Mei <M202472210@hust.edu.cn>
Closes: https://lore.kernel.org/bpf/3c4ebb0b.46ff8.19abab8abe2.Coremail.kaiyanm@hust.edu.cn/
Signed-off-by: Lang Xu <xulang@uniontech.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/279EEE1BA1DDB49D+20260303095217.34436-1-xulang@uniontech.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Kohei Enju <kohei@enjuk.jp>
Date: Wed Feb 25 05:34:44 2026 +0000
bpf: Fix stack-out-of-bounds write in devmap
[ Upstream commit b7bf516c3ecd9a2aae2dc2635178ab87b734fef1 ]
get_upper_ifindexes() iterates over all upper devices and writes their
indices into an array without checking bounds.
Also the callers assume that the max number of upper devices is
MAX_NEST_DEV and allocate excluded_devices[1+MAX_NEST_DEV] on the stack,
but that assumption is not correct and the number of upper devices could
be larger than MAX_NEST_DEV (e.g., many macvlans), causing a
stack-out-of-bounds write.
Add a max parameter to get_upper_ifindexes() to avoid the issue.
When there are too many upper devices, return -EOVERFLOW and abort the
redirect.
To reproduce, create more than MAX_NEST_DEV(8) macvlans on a device with
an XDP program attached using BPF_F_BROADCAST | BPF_F_EXCLUDE_INGRESS.
Then send a packet to the device to trigger the XDP redirect path.
Reported-by: syzbot+10cc7f13760b31bd2e61@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/698c4ce3.050a0220.340abe.000b.GAE@google.com/T/
Fixes: aeea1b86f936 ("bpf, devmap: Exclude XDP broadcast to master device")
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Kohei Enju <kohei@enjuk.jp>
Link: https://lore.kernel.org/r/20260225053506.4738-1-kohei@enjuk.jp
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Qu Wenruo <wqu@suse.com>
Date: Tue Feb 4 13:46:09 2025 +1030
btrfs: always fallback to buffered write if the inode requires checksum
commit 968f19c5b1b7d5595423b0ac0020cc18dfed8cb5 upstream.
[BUG]
It is a long known bug that VM image on btrfs can lead to data csum
mismatch, if the qemu is using direct-io for the image (this is commonly
known as cache mode 'none').
[CAUSE]
Inside the VM, if the fs is EXT4 or XFS, or even NTFS from Windows, the
fs is allowed to dirty/modify the folio even if the folio is under
writeback (as long as the address space doesn't have AS_STABLE_WRITES
flag inherited from the block device).
This is a valid optimization to improve the concurrency, and since these
filesystems have no extra checksum on data, the content change is not a
problem at all.
But the final write into the image file is handled by btrfs, which needs
the content not to be modified during writeback, or the checksum will
not match the data (checksum is calculated before submitting the bio).
So EXT4/XFS/NTRFS assume they can modify the folio under writeback, but
btrfs requires no modification, this leads to the false csum mismatch.
This is only a controlled example, there are even cases where
multi-thread programs can submit a direct IO write, then another thread
modifies the direct IO buffer for whatever reason.
For such cases, btrfs has no sane way to detect such cases and leads to
false data csum mismatch.
[FIX]
I have considered the following ideas to solve the problem:
- Make direct IO to always skip data checksum
This not only requires a new incompatible flag, as it breaks the
current per-inode NODATASUM flag.
But also requires extra handling for no csum found cases.
And this also reduces our checksum protection.
- Let hardware handle all the checksum
AKA, just nodatasum mount option.
That requires trust for hardware (which is not that trustful in a lot
of cases), and it's not generic at all.
- Always fallback to buffered write if the inode requires checksum
This was suggested by Christoph, and is the solution utilized by this
patch.
The cost is obvious, the extra buffer copying into page cache, thus it
reduces the performance.
But at least it's still user configurable, if the end user still wants
the zero-copy performance, just set NODATASUM flag for the inode
(which is a common practice for VM images on btrfs).
Since we cannot trust user space programs to keep the buffer
consistent during direct IO, we have no choice but always falling back
to buffered IO. At least by this, we avoid the more deadly false data
checksum mismatch error.
Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Miquel Sabaté Solà <mssola@mssola.com>
Date: Fri Oct 24 12:21:41 2025 +0200
btrfs: define the AUTO_KFREE/AUTO_KVFREE helper macros
[ Upstream commit d00cbce0a7d5de5fc31bf60abd59b44d36806b6e ]
These are two simple macros which ensure that a pointer is initialized
to NULL and with the proper cleanup attribute for it.
Signed-off-by: Miquel Sabaté Solà <mssola@mssola.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Stable-dep-of: 52ee9965d09b ("btrfs: zoned: fixup last alloc pointer after extent removal for RAID0/10")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: David Sterba <dsterba@suse.com>
Date: Wed Oct 9 16:31:04 2024 +0200
btrfs: drop unused parameter fs_info from do_reclaim_sweep()
[ Upstream commit 343a63594bb6a49d094860705817aad6663b1f8f ]
The parameter is unused and we can get it from space info if needed.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Stable-dep-of: 19eff93dc738 ("btrfs: fix periodic reclaim condition")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Mark Harmstone <mark@harmstone.com>
Date: Tue Feb 17 17:46:41 2026 +0000
btrfs: fix compat mask in error messages in btrfs_check_features()
[ Upstream commit 587bb33b10bda645a1028c1737ad3992b3d7cf61 ]
Commit d7f67ac9a928 ("btrfs: relax block-group-tree feature dependency
checks") introduced a regression when it comes to handling unsupported
incompat or compat_ro flags. Beforehand we only printed the flags that
we didn't recognize, afterwards we printed them all, which is less
useful. Fix the error handling so it behaves like it used to.
Fixes: d7f67ac9a928 ("btrfs: relax block-group-tree feature dependency checks")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Mark Harmstone <mark@harmstone.com>
Date: Tue Feb 17 10:21:44 2026 +0000
btrfs: fix incorrect key offset in error message in check_dev_extent_item()
[ Upstream commit 511dc8912ae3e929c1a182f5e6b2326516fd42a0 ]
Fix the error message in check_dev_extent_item(), when an overlapping
stripe is encountered. For dev extents, objectid is the disk number and
offset the physical address, so prev_key->objectid should actually be
prev_key->offset.
(I can't take any credit for this one - this was discovered by Chris and
his friend Claude.)
Reported-by: Chris Mason <clm@fb.com>
Fixes: 008e2512dc56 ("btrfs: tree-checker: add dev extent item checks")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Mark Harmstone <mark@harmstone.com>
Date: Tue Feb 17 14:39:46 2026 +0000
btrfs: fix objectid value in error message in check_extent_data_ref()
[ Upstream commit a10172780526c2002e062102ad4f2aabac495889 ]
Fix a copy-paste error in check_extent_data_ref(): we're printing root
as in the message above, we should be printing objectid.
Fixes: f333a3c7e832 ("btrfs: tree-checker: validate dref root and objectid")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Sun YangKai <sunk67188@gmail.com>
Date: Wed Jan 14 11:47:02 2026 +0800
btrfs: fix periodic reclaim condition
[ Upstream commit 19eff93dc738e8afaa59cb374b44bb5a162e6c2d ]
Problems with current implementation:
1. reclaimable_bytes is signed while chunk_sz is unsigned, causing
negative reclaimable_bytes to trigger reclaim unexpectedly
2. The "space must be freed between scans" assumption breaks the
two-scan requirement: first scan marks block groups, second scan
reclaims them. Without the second scan, no reclamation occurs.
Instead, track actual reclaim progress: pause reclaim when block groups
will be reclaimed, and resume only when progress is made. This ensures
reclaim continues until no further progress can be made. And resume
periodic reclaim when there's enough free space.
And we take care if reclaim is making any progress now, so it's
unnecessary to set periodic_reclaim_ready to false when failed to reclaim
a block group.
Fixes: 813d4c6422516 ("btrfs: prevent pathological periodic reclaim loops")
CC: stable@vger.kernel.org # 6.12+
Suggested-by: Boris Burkov <boris@bur.io>
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Sun YangKai <sunk67188@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Filipe Manana <fdmanana@suse.com>
Date: Mon Feb 24 16:22:22 2025 +0000
btrfs: fix reclaimed bytes accounting after automatic block group reclaim
[ Upstream commit 620768704326c9a71ea9c8324ffda8748d8d4f10 ]
We are considering the used bytes counter of a block group as the amount
to update the space info's reclaim bytes counter after relocating the
block group, but this value alone is often not enough. This is because we
may have a reserved extent (or more) and in that case its size is
reflected in the reserved counter of the block group - the size of the
extent is only transferred from the reserved counter to the used counter
of the block group when the delayed ref for the extent is run - typically
when committing the transaction (or when flushing delayed refs due to
ENOSPC on space reservation). Such call chain for data extents is:
btrfs_run_delayed_refs_for_head()
run_one_delayed_ref()
run_delayed_data_ref()
alloc_reserved_file_extent()
alloc_reserved_extent()
btrfs_update_block_group()
-> transfers the extent size from the reserved
counter to the used counter
For metadata extents:
btrfs_run_delayed_refs_for_head()
run_one_delayed_ref()
run_delayed_tree_ref()
alloc_reserved_tree_block()
alloc_reserved_extent()
btrfs_update_block_group()
-> transfers the extent size from the reserved
counter to the used counter
Since relocation flushes delalloc, waits for ordered extent completion
and commits the current transaction before doing the actual relocation
work, the correct amount of reclaimed space is therefore the sum of the
"used" and "reserved" counters of the block group before we call
btrfs_relocate_chunk() at btrfs_reclaim_bgs_work().
So fix this by taking the "reserved" counter into consideration.
Fixes: 243192b67649 ("btrfs: report reclaim stats in sysfs")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Stable-dep-of: 19eff93dc738 ("btrfs: fix periodic reclaim condition")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Mark Harmstone <mark@harmstone.com>
Date: Tue Feb 17 17:46:13 2026 +0000
btrfs: fix warning in scrub_verify_one_metadata()
[ Upstream commit 44e2fda66427a0442d8d2c0e6443256fb458ab6b ]
Commit b471965fdb2d ("btrfs: fix replace/scrub failure with
metadata_uuid") fixed the comparison in scrub_verify_one_metadata() to
use metadata_uuid rather than fsid, but left the warning as it was. Fix
it so it matches what we're doing.
Fixes: b471965fdb2d ("btrfs: fix replace/scrub failure with metadata_uuid")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Filipe Manana <fdmanana@suse.com>
Date: Mon Feb 24 15:40:26 2025 +0000
btrfs: get used bytes while holding lock at btrfs_reclaim_bgs_work()
[ Upstream commit ba5d06440cae63edc4f49465baf78f1f43e55c77 ]
At btrfs_reclaim_bgs_work(), we are grabbing twice the used bytes counter
of the block group while not holding the block group's spinlock. This can
result in races, reported by KCSAN and similar tools, since a concurrent
task can be updating that counter while at btrfs_update_block_group().
So avoid these races by grabbing the counter in a critical section
delimited by the block group's spinlock after setting the block group to
RO mode. This also avoids using two different values of the counter in
case it changes in between each read. This silences KCSAN and is required
for the next patch in the series too.
Fixes: 243192b67649 ("btrfs: report reclaim stats in sysfs")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Stable-dep-of: 19eff93dc738 ("btrfs: fix periodic reclaim condition")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Mark Harmstone <mark@harmstone.com>
Date: Tue Feb 17 17:32:39 2026 +0000
btrfs: print correct subvol num if active swapfile prevents deletion
[ Upstream commit 1c7e9111f4e6d6d42bc47759c9af1ef91f03ac2c ]
Fix the error message in btrfs_delete_subvolume() if we can't delete a
subvolume because it has an active swapfile: we were printing the number
of the parent rather than the target.
Fixes: 60021bd754c6 ("btrfs: prevent subvol with swapfile from being deleted")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Date: Fri Jun 6 09:17:41 2025 +0200
btrfs: zoned: fix alloc_offset calculation for partly conventional block groups
[ Upstream commit c0d90a79e8e65b89037508276b2b31f41a1b3783 ]
When one of two zones composing a DUP block group is a conventional zone,
we have the zone_info[i]->alloc_offset = WP_CONVENTIONAL. That will, of
course, not match the write pointer of the other zone, and fails that
block group.
This commit solves that issue by properly recovering the emulated write
pointer from the last allocated extent. The offset for the SINGLE, DUP,
and RAID1 are straight-forward: it is same as the end of last allocated
extent. The RAID0 and RAID10 are a bit tricky that we need to do the math
of striping.
This is the kernel equivalent of Naohiro's user-space commit:
"btrfs-progs: zoned: fix alloc_offset calculation for partly
conventional block groups".
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Stable-dep-of: dda3ec9ee6b3 ("btrfs: zoned: fixup last alloc pointer after extent removal for RAID1")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Naohiro Aota <naohiro.aota@wdc.com>
Date: Tue Sep 16 11:46:11 2025 +0900
btrfs: zoned: fix stripe width calculation
[ Upstream commit 6a1ab50135ce829b834b448ce49867b5210a1641 ]
The stripe offset calculation in the zoned code for raid0 and raid10
wrongly uses map->stripe_size to calculate it. In fact, map->stripe_size is
the size of the device extent composing the block group, which always is
the zone_size on the zoned setup.
Fix it by using BTRFS_STRIPE_LEN and BTRFS_STRIPE_LEN_SHIFT. Also, optimize
the calculation a bit by doing the common calculation only once.
Fixes: c0d90a79e8e6 ("btrfs: zoned: fix alloc_offset calculation for partly conventional block groups")
CC: stable@vger.kernel.org # 6.17+
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Stable-dep-of: 52ee9965d09b ("btrfs: zoned: fixup last alloc pointer after extent removal for RAID0/10")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Naohiro Aota <naohiro.aota@wdc.com>
Date: Fri Jan 23 21:41:35 2026 +0900
btrfs: zoned: fixup last alloc pointer after extent removal for DUP
[ Upstream commit e2d848649e64de39fc1b9c64002629b4daa1105d ]
When a block group is composed of a sequential write zone and a
conventional zone, we recover the (pseudo) write pointer of the
conventional zone using the end of the last allocated position.
However, if the last extent in a block group is removed, the last extent
position will be smaller than the other real write pointer position.
Then, that will cause an error due to mismatch of the write pointers.
We can fixup this case by moving the alloc_offset to the corresponding
write pointer position.
Fixes: c0d90a79e8e6 ("btrfs: zoned: fix alloc_offset calculation for partly conventional block groups")
CC: stable@vger.kernel.org # 6.16+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Naohiro Aota <naohiro.aota@wdc.com>
Date: Fri Jan 23 21:41:36 2026 +0900
btrfs: zoned: fixup last alloc pointer after extent removal for RAID0/10
[ Upstream commit 52ee9965d09b2c56a027613db30d1fb20d623861 ]
When a block group is composed of a sequential write zone and a
conventional zone, we recover the (pseudo) write pointer of the
conventional zone using the end of the last allocated position.
However, if the last extent in a block group is removed, the last extent
position will be smaller than the other real write pointer position.
Then, that will cause an error due to mismatch of the write pointers.
We can fixup this case by moving the alloc_offset to the corresponding
write pointer position.
Fixes: 568220fa9657 ("btrfs: zoned: support RAID0/1/10 on top of raid stripe tree")
CC: stable@vger.kernel.org # 6.12+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Naohiro Aota <naohiro.aota@wdc.com>
Date: Wed Dec 17 20:14:04 2025 +0900
btrfs: zoned: fixup last alloc pointer after extent removal for RAID1
[ Upstream commit dda3ec9ee6b3e120603bff1b798f25b51e54ac5d ]
When a block group is composed of a sequential write zone and a
conventional zone, we recover the (pseudo) write pointer of the
conventional zone using the end of the last allocated position.
However, if the last extent in a block group is removed, the last extent
position will be smaller than the other real write pointer position.
Then, that will cause an error due to mismatch of the write pointers.
We can fixup this case by moving the alloc_offset to the corresponding
write pointer position.
Fixes: 568220fa9657 ("btrfs: zoned: support RAID0/1/10 on top of raid stripe tree")
CC: stable@vger.kernel.org # 6.12+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Oliver Hartkopp <socketcan@hartkopp.net>
Date: Wed Feb 18 11:58:06 2026 +0100
can: bcm: fix locking for bcm_op runtime updates
[ Upstream commit c35636e91e392e1540949bbc67932167cb48bc3a ]
Commit c2aba69d0c36 ("can: bcm: add locking for bcm_op runtime updates")
added a locking for some variables that can be modified at runtime when
updating the sending bcm_op with a new TX_SETUP command in bcm_tx_setup().
Usually the RX_SETUP only handles and filters incoming traffic with one
exception: When the RX_RTR_FRAME flag is set a predefined CAN frame is
sent when a specific RTR frame is received. Therefore the rx bcm_op uses
bcm_can_tx() which uses the bcm_tx_lock that was only initialized in
bcm_tx_setup(). Add the missing spin_lock_init() when allocating the
bcm_op in bcm_rx_setup() to handle the RTR case properly.
Fixes: c2aba69d0c36 ("can: bcm: add locking for bcm_op runtime updates")
Reported-by: syzbot+5b11eccc403dd1cea9f8@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-can/699466e4.a70a0220.2c38d7.00ff.GAE@google.com/
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20260218-bcm_spin_lock_init-v1-1-592634c8a5b5@hartkopp.net
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: Mon Feb 23 17:51:17 2026 +0100
can: ems_usb: ems_usb_read_bulk_callback(): check the proper length of a message
commit 38a01c9700b0dcafe97dfa9dc7531bf4a245deff upstream.
When looking at the data in a USB urb, the actual_length is the size of
the buffer passed to the driver, not the transfer_buffer_length which is
set by the driver as the max size of the buffer.
When parsing the messages in ems_usb_read_bulk_callback() properly check
the size both at the beginning of parsing the message to make sure it is
big enough for the expected structure, and at the end of the message to
make sure we don't overflow past the end of the buffer for the next
message.
Cc: Vincent Mailhol <mailhol@kernel.org>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: stable@kernel.org
Assisted-by: gkh_clanker_2000
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://patch.msgid.link/2026022316-answering-strainer-a5db@gregkh
Fixes: 702171adeed3 ("ems_usb: Added support for EMS CPC-USB/ARM7 CAN/USB interface")
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Alban Bedel <alban.bedel@lht.dlh.de>
Date: Mon Feb 9 15:47:05 2026 +0100
can: mcp251x: fix deadlock in error path of mcp251x_open
[ Upstream commit ab3f894de216f4a62adc3b57e9191888cbf26885 ]
The mcp251x_open() function call free_irq() in its error path with the
mpc_lock mutex held. But if an interrupt already occurred the
interrupt handler will be waiting for the mpc_lock and free_irq() will
deadlock waiting for the handler to finish.
This issue is similar to the one fixed in commit 7dd9c26bd6cf ("can:
mcp251x: fix deadlock if an interrupt occurs during mcp251x_open") but
for the error path.
To solve this issue move the call to free_irq() after the lock is
released. Setting `priv->force_quit = 1` beforehand ensure that the IRQ
handler will exit right away once it acquired the lock.
Signed-off-by: Alban Bedel <alban.bedel@lht.dlh.de>
Link: https://patch.msgid.link/20260209144706.2261954-1-alban.bedel@lht.dlh.de
Fixes: bf66f3736a94 ("can: mcp251x: Move to threaded interrupts instead of workqueues.")
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: Mon Feb 23 17:30:20 2026 +0100
can: ucan: Fix infinite loop from zero-length messages
commit 1e446fd0582ad8be9f6dafb115fc2e7245f9bea7 upstream.
If a broken ucan device gets a message with the message length field set
to 0, then the driver will loop for forever in
ucan_read_bulk_callback(), hanging the system. If the length is 0, just
skip the message and go on to the next one.
This has been fixed in the kvaser_usb driver in the past in commit
0c73772cd2b8 ("can: kvaser_usb: leaf: Fix potential infinite loop in
command parsers"), so there must be some broken devices out there like
this somewhere.
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: Vincent Mailhol <mailhol@kernel.org>
Cc: stable@kernel.org
Assisted-by: gkh_clanker_2000
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://patch.msgid.link/2026022319-huff-absurd-6a18@gregkh
Fixes: 9f2d3eae88d2 ("can: ucan: add driver for Theobroma Systems UCAN devices")
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: Mon Feb 23 17:39:20 2026 +0100
can: usb: etas_es58x: correctly anchor the urb in the read bulk callback
commit 5eaad4f768266f1f17e01232ffe2ef009f8129b7 upstream.
When submitting an urb, that is using the anchor pattern, it needs to be
anchored before submitting it otherwise it could be leaked if
usb_kill_anchored_urbs() is called. This logic is correctly done
elsewhere in the driver, except in the read bulk callback so do that
here also.
Cc: Vincent Mailhol <mailhol@kernel.org>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: stable@kernel.org
Assisted-by: gkh_clanker_2000
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Vincent Mailhol <mailhol@kernel.org>
Tested-by: Vincent Mailhol <mailhol@kernel.org>
Link: https://patch.msgid.link/2026022320-poser-stiffly-9d84@gregkh
Fixes: 8537257874e9 ("can: etas_es58x: add core support for ETAS ES58X CAN USB interfaces")
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: Mon Feb 23 13:10:32 2026 +0100
can: usb: f81604: correctly anchor the urb in the read bulk callback
commit 952caa5da10bed22be09612433964f6877ba0dde upstream.
When submitting an urb, that is using the anchor pattern, it needs to be
anchored before submitting it otherwise it could be leaked if
usb_kill_anchored_urbs() is called. This logic is correctly done
elsewhere in the driver, except in the read bulk callback so do that
here also.
Cc: Ji-Ze Hong (Peter Hong) <peter_hong@fintek.com.tw>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: Vincent Mailhol <mailhol@kernel.org>
Cc: stable@kernel.org
Assisted-by: gkh_clanker_2000
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://patch.msgid.link/2026022334-starlight-scaling-2cea@gregkh
Fixes: 88da17436973 ("can: usb: f81604: add Fintek F81604 support")
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: Mon Feb 23 13:10:31 2026 +0100
can: usb: f81604: handle bulk write errors properly
commit 51f94780720fa90c424f67e3e9784cb8ef8190e5 upstream.
If a write urb fails then more needs to be done other than just logging
the message, otherwise the transmission could be stalled. Properly
increment the error counters and wake up the queues so that data will
continue to flow.
Cc: Ji-Ze Hong (Peter Hong) <peter_hong@fintek.com.tw>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: Vincent Mailhol <mailhol@kernel.org>
Cc: stable@kernel.org
Assisted-by: gkh_clanker_2000
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://patch.msgid.link/2026022334-slackness-dynamic-9195@gregkh
Fixes: 88da17436973 ("can: usb: f81604: add Fintek F81604 support")
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: Mon Feb 23 13:10:30 2026 +0100
can: usb: f81604: handle short interrupt urb messages properly
commit 7299b1b39a255f6092ce4ec0b65f66e9d6a357af upstream.
If an interrupt urb is received that is not the correct length, properly
detect it and don't attempt to treat the data as valid.
Cc: Ji-Ze Hong (Peter Hong) <peter_hong@fintek.com.tw>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: Vincent Mailhol <mailhol@kernel.org>
Cc: stable@kernel.org
Assisted-by: gkh_clanker_2000
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://patch.msgid.link/2026022331-opal-evaluator-a928@gregkh
Fixes: 88da17436973 ("can: usb: f81604: add Fintek F81604 support")
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Waiman Long <longman@redhat.com>
Date: Sat Feb 21 13:54:12 2026 -0500
cgroup/cpuset: Fix incorrect use of cpuset_update_tasks_cpumask() in update_cpumasks_hier()
[ Upstream commit 68230aac8b9aad243626fbaf3ca170012c17fec5 ]
Commit e2ffe502ba45 ("cgroup/cpuset: Add cpuset.cpus.exclusive for v2")
incorrectly changed the 2nd parameter of cpuset_update_tasks_cpumask()
from tmp->new_cpus to cp->effective_cpus. This second parameter is just
a temporary cpumask for internal use. The cpuset_update_tasks_cpumask()
function was originally called update_tasks_cpumask() before commit
381b53c3b549 ("cgroup/cpuset: rename functions shared between v1
and v2").
This mistake can incorrectly change the effective_cpus of the
cpuset when it is the top_cpuset or in arm64 architecture where
task_cpu_possible_mask() may differ from cpu_possible_mask. So far
top_cpuset hasn't been passed to update_cpumasks_hier() yet, but arm64
arch can still be impacted. Fix it by reverting the incorrect change.
Fixes: e2ffe502ba45 ("cgroup/cpuset: Add cpuset.cpus.exclusive for v2")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Johan Hovold <johan@kernel.org>
Date: Fri Nov 21 17:40:03 2025 +0100
clk: tegra: tegra124-emc: fix device leak on set_rate()
[ Upstream commit da61439c63d34ae6503d080a847f144d587e3a48 ]
Make sure to drop the reference taken when looking up the EMC device and
its driver data on first set_rate().
Note that holding a reference to a device does not prevent its driver
data from going away so there is no point in keeping the reference.
Fixes: 2db04f16b589 ("clk: tegra: Add EMC clock driver")
Fixes: 6d6ef58c2470 ("clk: tegra: tegra124-emc: Fix missing put_device() call in emc_ensure_emc_driver")
Cc: stable@vger.kernel.org # 4.2: 6d6ef58c2470
Cc: Mikko Perttunen <mperttunen@nvidia.com>
Cc: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Guenter Roeck <linux@roeck-us.net>
Date: Thu Feb 26 21:58:12 2026 -0800
dpaa2-switch: Fix interrupt storm after receiving bad if_id in IRQ handler
[ Upstream commit 74badb9c20b1a9c02a95c735c6d3cd6121679c93 ]
Commit 31a7a0bbeb00 ("dpaa2-switch: add bounds check for if_id in IRQ
handler") introduces a range check for if_id to avoid an out-of-bounds
access. If an out-of-bounds if_id is detected, the interrupt status is
not cleared. This may result in an interrupt storm.
Clear the interrupt status after detecting an out-of-bounds if_id to avoid
the problem.
Found by an experimental AI code review agent at Google.
Fixes: 31a7a0bbeb00 ("dpaa2-switch: add bounds check for if_id in IRQ handler")
Cc: Junrui Luo <moonafterrain@outlook.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://patch.msgid.link/20260227055812.1777915-1-linux@roeck-us.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Lars Ellenberg <lars.ellenberg@linbit.com>
Date: Thu Feb 19 15:20:12 2026 +0100
drbd: fix "LOGIC BUG" in drbd_al_begin_io_nonblock()
commit ab140365fb62c0bdab22b2f516aff563b2559e3b upstream.
Even though we check that we "should" be able to do lc_get_cumulative()
while holding the device->al_lock spinlock, it may still fail,
if some other code path decided to do lc_try_lock() with bad timing.
If that happened, we logged "LOGIC BUG for enr=...",
but still did not return an error.
The rest of the code now assumed that this request has references
for the relevant activity log extents.
The implcations are that during an active resync, mutual exclusivity of
resync versus application IO is not guaranteed. And a potential crash
at this point may not realizs that these extents could have been target
of in-flight IO and would need to be resynced just in case.
Also, once the request completes, it will give up activity log references it
does not even hold, which will trigger a BUG_ON(refcnt == 0) in lc_put().
Fix:
Do not crash the kernel for a condition that is harmless during normal
operation: also catch "e->refcnt == 0", not only "e == NULL"
when being noisy about "al_complete_io() called on inactive extent %u\n".
And do not try to be smart and "guess" whether something will work, then
be surprised when it does not.
Deal with the fact that it may or may not work. If it does not, remember a
possible "partially in activity log" state (only possible for requests that
cross extent boundaries), and return an error code from
drbd_al_begin_io_nonblock().
A latter call for the same request will then resume from where we left off.
Cc: stable@vger.kernel.org
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Date: Fri Feb 20 12:39:37 2026 +0100
drbd: fix null-pointer dereference on local read error
commit 0d195d3b205ca90db30d70d09d7bb6909aac178f upstream.
In drbd_request_endio(), READ_COMPLETED_WITH_ERROR is passed to
__req_mod() with a NULL peer_device:
__req_mod(req, what, NULL, &m);
The READ_COMPLETED_WITH_ERROR handler then unconditionally passes this
NULL peer_device to drbd_set_out_of_sync(), which dereferences it,
causing a null-pointer dereference.
Fix this by obtaining the peer_device via first_peer_device(device),
matching how drbd_req_destroy() handles the same situation.
Cc: stable@vger.kernel.org
Reported-by: Tuo Li <islituo@gmail.com>
Link: https://lore.kernel.org/linux-block/20260104165355.151864-1-islituo@gmail.com
Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Mario Limonciello <mario.limonciello@amd.com>
Date: Thu Feb 5 10:42:54 2026 -0600
drm/amd: Fix hang on amdgpu unload by using pci_dev_is_disconnected()
[ Upstream commit f7afda7fcd169a9168695247d07ad94cf7b9798f ]
The commit 6a23e7b4332c ("drm/amd: Clean up kfd node on surprise
disconnect") introduced early KFD cleanup when drm_dev_is_unplugged()
returns true. However, this causes hangs during normal module unload
(rmmod amdgpu).
The issue occurs because drm_dev_unplug() is called in amdgpu_pci_remove()
for all removal scenarios, not just surprise disconnects. This was done
intentionally in commit 39934d3ed572 ("Revert "drm/amdgpu: TA unload
messages are not actually sent to psp when amdgpu is uninstalled"") to
fix IGT PCI software unplug test failures. As a result,
drm_dev_is_unplugged() returns true even during normal module unload,
triggering the early KFD cleanup inappropriately.
The correct check should distinguish between:
- Actual surprise disconnect (eGPU unplugged): pci_dev_is_disconnected()
returns true
- Normal module unload (rmmod): pci_dev_is_disconnected() returns false
Replace drm_dev_is_unplugged() with pci_dev_is_disconnected() to ensure
the early cleanup only happens during true hardware disconnect events.
Cc: stable@vger.kernel.org
Reported-by: Cal Peake <cp@absolutedigital.net>
Closes: https://lore.kernel.org/all/b0c22deb-c0fa-3343-33cf-fd9a77d7db99@absolutedigital.net/
Fixes: 6a23e7b4332c ("drm/amd: Clean up kfd node on surprise disconnect")
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Bart Van Assche <bvanassche@acm.org>
Date: Mon Feb 23 13:50:23 2026 -0800
drm/amdgpu: Fix locking bugs in error paths
[ Upstream commit 480ad5f6ead4a47b969aab6618573cd6822bb6a4 ]
Do not unlock psp->ras_context.mutex if it has not been locked. This has
been detected by the Clang thread-safety analyzer.
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: YiPeng Chai <YiPeng.Chai@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Fixes: b3fb79cda568 ("drm/amdgpu: add mutex to protect ras shared memory")
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 6fa01b4335978051d2cd80841728fd63cc597970)
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Thorsten Blum <thorsten.blum@linux.dev>
Date: Mon Sep 8 23:15:54 2025 +0200
drm/amdgpu: Replace kzalloc + copy_from_user with memdup_user
[ Upstream commit 99eeb8358e6cdb7050bf2956370c15dcceda8c7e ]
Replace kzalloc() followed by copy_from_user() with memdup_user() to
improve and simplify ta_if_load_debugfs_write() and
ta_if_invoke_debugfs_write().
No functional changes intended.
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stable-dep-of: 480ad5f6ead4 ("drm/amdgpu: Fix locking bugs in error paths")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Bart Van Assche <bvanassche@acm.org>
Date: Mon Feb 23 14:00:07 2026 -0800
drm/amdgpu: Unlock a mutex before destroying it
[ Upstream commit 5e0bcc7b88bcd081aaae6f481b10d9ab294fcb69 ]
Mutexes must be unlocked before these are destroyed. This has been detected
by the Clang thread-safety analyzer.
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Yang Wang <kevinyang.wang@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Fixes: f5e4cc8461c4 ("drm/amdgpu: implement RAS ACA driver framework")
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 270258ba320beb99648dceffb67e86ac76786e55)
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Wentao Liang <vulab@iscas.ac.cn>
Date: Thu Mar 6 12:27:20 2025 +0800
drm/exynos/vidi: Remove redundant error handling in vidi_get_modes()
[ Upstream commit 0253dadc772e83aaa67aea8bf24a71e7ffe13cb0 ]
In the vidi_get_modes() function, if either drm_edid_dup() or
drm_edid_alloc() fails, the function will immediately return 0,
indicating that no display modes can be retrieved. However, in
the event of failure in these two functions, it is still necessary
to call the subsequent drm_edid_connector_update() function with
a NULL drm_edid as an argument. This ensures that operations such
as connector settings are performed in its callee function,
_drm_edid_connector_property_update. To maintain the integrity of
the operation, redundant error handling needs to be removed.
Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Stable-dep-of: 52b330799e2d ("drm/exynos: vidi: use ctx->lock to protect struct vidi_context member variables related to memory alloc/free")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Jeongjun Park <aha310510@gmail.com>
Date: Mon Jan 19 17:25:52 2026 +0900
drm/exynos: vidi: fix to avoid directly dereferencing user pointer
[ Upstream commit d4c98c077c7fb2dfdece7d605e694b5ea2665085 ]
In vidi_connection_ioctl(), vidi->edid(user pointer) is directly
dereferenced in the kernel.
This allows arbitrary kernel memory access from the user space, so instead
of directly accessing the user pointer in the kernel, we should modify it
to copy edid to kernel memory using copy_from_user() and use it.
Cc: <stable@vger.kernel.org>
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Jeongjun Park <aha310510@gmail.com>
Date: Mon Jan 19 17:25:53 2026 +0900
drm/exynos: vidi: use ctx->lock to protect struct vidi_context member variables related to memory alloc/free
[ Upstream commit 52b330799e2d6f825ae2bb74662ec1b10eb954bb ]
Exynos Virtual Display driver performs memory alloc/free operations
without lock protection, which easily causes concurrency problem.
For example, use-after-free can occur in race scenario like this:
```
CPU0 CPU1 CPU2
---- ---- ----
vidi_connection_ioctl()
if (vidi->connection) // true
drm_edid = drm_edid_alloc(); // alloc drm_edid
...
ctx->raw_edid = drm_edid;
...
drm_mode_getconnector()
drm_helper_probe_single_connector_modes()
vidi_get_modes()
if (ctx->raw_edid) // true
drm_edid_dup(ctx->raw_edid);
if (!drm_edid) // false
...
vidi_connection_ioctl()
if (vidi->connection) // false
drm_edid_free(ctx->raw_edid); // free drm_edid
...
drm_edid_alloc(drm_edid->edid)
kmemdup(edid); // UAF!!
...
```
To prevent these vulns, at least in vidi_context, member variables related
to memory alloc/free should be protected with ctx->lock.
Cc: <stable@vger.kernel.org>
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Felix Gu <ustc.gu@gmail.com>
Date: Fri Jan 30 00:21:19 2026 +0800
drm/logicvc: Fix device node reference leak in logicvc_drm_config_parse()
[ Upstream commit fef0e649f8b42bdffe4a916dd46e1b1e9ad2f207 ]
The logicvc_drm_config_parse() function calls of_get_child_by_name() to
find the "layers" node but fails to release the reference, leading to a
device node reference leak.
Fix this by using the __free(device_node) cleanup attribute to automatic
release the reference when the variable goes out of scope.
Fixes: efeeaefe9be5 ("drm: Add support for the LogiCVC display controller")
Signed-off-by: Felix Gu <ustc.gu@gmail.com>
Reviewed-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20260130-logicvc_drm-v1-1-04366463750c@gmail.com
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Yujie Liu <yujie.liu@intel.com>
Date: Fri Feb 27 16:24:52 2026 +0800
drm/sched: Fix kernel-doc warning for drm_sched_job_done()
[ Upstream commit 61ded1083b264ff67ca8c2de822c66b6febaf9a8 ]
There is a kernel-doc warning for the scheduler:
Warning: drivers/gpu/drm/scheduler/sched_main.c:367 function parameter 'result' not described in 'drm_sched_job_done'
Fix the warning by describing the undocumented error code.
Fixes: 539f9ee4b52a ("drm/scheduler: properly forward fence errors")
Signed-off-by: Yujie Liu <yujie.liu@intel.com>
[phasta: Flesh out commit message]
Signed-off-by: Philipp Stanner <phasta@kernel.org>
Link: https://patch.msgid.link/20260227082452.1802922-1-yujie.liu@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Francesco Lavra <flavra@baylibre.com>
Date: Tue Feb 10 19:09:32 2026 +0100
drm/solomon: Fix page start when updating rectangle in page addressing mode
[ Upstream commit 36d9579fed6c9429aa172f77bd28c58696ce8e2b ]
In page addressing mode, the pixel values of a dirty rectangle must be sent
to the display controller one page at a time. The range of pages
corresponding to a given rectangle is being incorrectly calculated as if
the Y value of the top left coordinate of the rectangle was 0. This can
result in rectangle updates being displayed on wrong parts of the screen.
Fix the above issue by consolidating the start page calculation in a single
place at the beginning of the update_rect function, and using the
calculated value for all addressing modes.
Fixes: b0daaa5cfaa5 ("drm/ssd130x: Support page addressing mode")
Signed-off-by: Francesco Lavra <flavra@baylibre.com>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://patch.msgid.link/20260210180932.736502-1-flavra@baylibre.com
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Johan Hovold <johan@kernel.org>
Date: Fri Nov 21 17:42:01 2025 +0100
drm/tegra: dsi: fix device leak on probe
[ Upstream commit bfef062695570842cf96358f2f46f4c6642c6689 ]
Make sure to drop the reference taken when looking up the companion
(ganged) device and its driver data during probe().
Note that holding a reference to a device does not prevent its driver
data from going away so there is no point in keeping the reference.
Fixes: e94236cde4d5 ("drm/tegra: dsi: Add ganged mode support")
Fixes: 221e3638feb8 ("drm/tegra: Fix reference leak in tegra_dsi_ganged_probe")
Cc: stable@vger.kernel.org # 3.19: 221e3638feb8
Cc: Thierry Reding <treding@nvidia.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Link: https://patch.msgid.link/20251121164201.13188-1-johan@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Brad Spengler <brad.spengler@opensrcsec.com>
Date: Wed Jan 7 12:12:36 2026 -0500
drm/vmwgfx: Fix invalid kref_put callback in vmw_bo_dirty_release
[ Upstream commit 211ecfaaef186ee5230a77d054cdec7fbfc6724a ]
The kref_put() call uses (void *)kvfree as the release callback, which
is incorrect. kref_put() expects a function with signature
void (*release)(struct kref *), but kvfree has signature
void (*)(const void *). Calling through an incompatible function pointer
is undefined behavior.
The code only worked by accident because ref_count is the first member
of vmw_bo_dirty, making the kref pointer equal to the struct pointer.
Fix this by adding a proper release callback that uses container_of()
to retrieve the containing structure before freeing.
Fixes: c1962742ffff ("drm/vmwgfx: Use kref in vmw_bo_dirty")
Signed-off-by: Brad Spengler <brad.spengler@opensrcsec.com>
Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Cc: Ian Forbes <ian.forbes@broadcom.com>
Link: https://patch.msgid.link/20260107171236.3573118-1-zack.rusin@broadcom.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Ian Forbes <ian.forbes@broadcom.com>
Date: Tue Jan 13 11:53:57 2026 -0600
drm/vmwgfx: Return the correct value in vmw_translate_ptr functions
[ Upstream commit 5023ca80f9589295cb60735016e39fc5cc714243 ]
Before the referenced fixes these functions used a lookup function that
returned a pointer. This was changed to another lookup function that
returned an error code with the pointer becoming an out parameter.
The error path when the lookup failed was not changed to reflect this
change and the code continued to return the PTR_ERR of the now
uninitialized pointer. This could cause the vmw_translate_ptr functions
to return success when they actually failed causing further uninitialized
and OOB accesses.
Reported-by: Kuzey Arda Bulut <kuzeyardabulut@gmail.com>
Fixes: a309c7194e8a ("drm/vmwgfx: Remove rcu locks from user resources")
Signed-off-by: Ian Forbes <ian.forbes@broadcom.com>
Reviewed-by: Zack Rusin <zack.rusin@broadcom.com>
Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Link: https://patch.msgid.link/20260113175357.129285-1-ian.forbes@broadcom.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Shuicheng Lin <shuicheng.lin@intel.com>
Date: Wed Feb 4 17:28:11 2026 +0000
drm/xe/reg_sr: Fix leak on xa_store failure
[ Upstream commit 3091723785def05ebfe6a50866f87a044ae314ba ]
Free the newly allocated entry when xa_store() fails to avoid a memory
leak on the error path.
v2: use goto fail_free. (Bala)
Fixes: e5283bd4dfec ("drm/xe/reg_sr: Remove register pool")
Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patch.msgid.link/20260204172810.1486719-2-shuicheng.lin@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
(cherry picked from commit 6bc6fec71ac45f52db609af4e62bdb96b9f5fadb)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Matthew Brost <matthew.brost@intel.com>
Date: Wed Jan 14 16:45:46 2026 -0800
drm/xe: Do not preempt fence signaling CS instructions
[ Upstream commit cdc8a1e11f4d5b480ec750e28010c357185b95a6 ]
If a batch buffer is complete, it makes little sense to preempt the
fence signaling instructions in the ring, as the largest portion of the
work (the batch buffer) is already done and fence signaling consists of
only a few instructions. If these instructions are preempted, the GuC
would need to perform a context switch just to signal the fence, which
is costly and delays fence signaling. Avoid this scenario by disabling
preemption immediately after the BB start instruction and re-enabling it
after executing the fence signaling instructions.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Carlos Santa <carlos.santa@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patch.msgid.link/20260115004546.58060-1-matthew.brost@intel.com
(cherry picked from commit 2bcbf2dcde0c839a73af664a3c77d4e77d58a3eb)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Vitaly Lifshits <vitaly.lifshits@intel.com>
Date: Tue Jan 6 16:14:20 2026 +0200
e1000e: clear DPG_EN after reset to avoid autonomous power-gating
[ Upstream commit 0942fc6d324eb9c6b16187b2aa994c0823557f06 ]
Panther Lake systems introduced an autonomous power gating feature for
the integrated Gigabit Ethernet in shutdown state (S5) state. As part of
it, the reset value of DPG_EN bit was changed to 1. Clear this bit after
performing hardware reset to avoid errors such as Tx/Rx hangs, or packet
loss/corruption.
Fixes: 0c9183ce61bc ("e1000e: Add support for the next LOM generation")
Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Avigail Dahan <avigailx.dahan@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Jann Horn <jannh@google.com>
Date: Mon Feb 23 20:59:33 2026 +0100
eventpoll: Fix integer overflow in ep_loop_check_proc()
commit fdcfce93073d990ed4b71752e31ad1c1d6e9d58b upstream.
If a recursive call to ep_loop_check_proc() hits the `result = INT_MAX`,
an integer overflow will occur in the calling ep_loop_check_proc() at
`result = max(result, ep_loop_check_proc(ep_tovisit, depth + 1) + 1)`,
breaking the recursion depth check.
Fix it by using a different placeholder value that can't lead to an
overflow.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Fixes: f2e467a48287 ("eventpoll: Fix semi-unbounded recursion")
Cc: stable@vger.kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
Link: https://patch.msgid.link/20260223-epoll-int-overflow-v1-1-452f35132224@google.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Baokun Li <libaokun1@huawei.com>
Date: Mon Jul 14 21:03:11 2025 +0800
ext4: add ext4_try_lock_group() to skip busy groups
[ Upstream commit e9eec6f33971fbfcdd32fd1c7dd515ff4d2954c0 ]
When ext4 allocates blocks, we used to just go through the block groups
one by one to find a good one. But when there are tons of block groups
(like hundreds of thousands or even millions) and not many have free space
(meaning they're mostly full), it takes a really long time to check them
all, and performance gets bad. So, we added the "mb_optimize_scan" mount
option (which is on by default now). It keeps track of some group lists,
so when we need a free block, we can just grab a likely group from the
right list. This saves time and makes block allocation much faster.
But when multiple processes or containers are doing similar things, like
constantly allocating 8k blocks, they all try to use the same block group
in the same list. Even just two processes doing this can cut the IOPS in
half. For example, one container might do 300,000 IOPS, but if you run two
at the same time, the total is only 150,000.
Since we can already look at block groups in a non-linear way, the first
and last groups in the same list are basically the same for finding a block
right now. Therefore, add an ext4_try_lock_group() helper function to skip
the current group when it is locked by another process, thereby avoiding
contention with other processes. This helps ext4 make better use of having
multiple block groups.
Also, to make sure we don't skip all the groups that have free space
when allocating blocks, we won't try to skip busy groups anymore when
ac_criteria is CR_ANY_FREE.
Performance test data follows:
Test: Running will-it-scale/fallocate2 on CPU-bound containers.
Observation: Average fallocate operations per container per second.
|CPU: Kunpeng 920 | P80 |
|Memory: 512GB |-------------------------|
|960GB SSD (0.5GB/s)| base | patched |
|-------------------|-------|-----------------|
|mb_optimize_scan=0 | 2667 | 4821 (+80.7%) |
|mb_optimize_scan=1 | 2643 | 4784 (+81.0%) |
|CPU: AMD 9654 * 2 | P96 |
|Memory: 1536GB |-------------------------|
|960GB SSD (1GB/s) | base | patched |
|-------------------|-------|-----------------|
|mb_optimize_scan=0 | 3450 | 15371 (+345%) |
|mb_optimize_scan=1 | 3209 | 6101 (+90.0%) |
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20250714130327.1830534-2-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Stable-dep-of: 4865c768b563 ("ext4: always allocate blocks only from groups inode can use")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Jan Kara <jack@suse.cz>
Date: Wed Jan 14 19:28:18 2026 +0100
ext4: always allocate blocks only from groups inode can use
[ Upstream commit 4865c768b563deff1b6a6384e74a62f143427b42 ]
For filesystems with more than 2^32 blocks inodes using indirect block
based format cannot use blocks beyond the 32-bit limit.
ext4_mb_scan_groups_linear() takes care to not select these unsupported
groups for such inodes however other functions selecting groups for
allocation don't. So far this is harmless because the other selection
functions are used only with mb_optimize_scan and this is currently
disabled for inodes with indirect blocks however in the following patch
we want to enable mb_optimize_scan regardless of inode format.
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Pedro Falcato <pfalcato@suse.de>
Cc: stable@kernel.org
Link: https://patch.msgid.link/20260114182836.14120-3-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Baokun Li <libaokun1@huawei.com>
Date: Mon Jul 14 21:03:25 2025 +0800
ext4: convert free groups order lists to xarrays
[ Upstream commit f7eaacbb4e54f8a6c6674c16eff54f703ea63d5e ]
While traversing the list, holding a spin_lock prevents load_buddy, making
direct use of ext4_try_lock_group impossible. This can lead to a bouncing
scenario where spin_is_locked(grp_A) succeeds, but ext4_try_lock_group()
fails, forcing the list traversal to repeatedly restart from grp_A.
In contrast, linear traversal directly uses ext4_try_lock_group(),
avoiding this bouncing. Therefore, we need a lockless, ordered traversal
to achieve linear-like efficiency.
Therefore, this commit converts both average fragment size lists and
largest free order lists into ordered xarrays.
In an xarray, the index represents the block group number and the value
holds the block group information; a non-empty value indicates the block
group's presence.
While insertion and deletion complexity remain O(1), lookup complexity
changes from O(1) to O(nlogn), which may slightly reduce single-threaded
performance.
Additionally, xarray insertions might fail, potentially due to memory
allocation issues. However, since we have linear traversal as a fallback,
this isn't a major problem. Therefore, we've only added a warning message
for insertion failures here.
A helper function ext4_mb_find_good_group_xarray() is added to find good
groups in the specified xarray starting at the specified position start,
and when it reaches ngroups-1, it wraps around to 0 and then to start-1.
This ensures an ordered traversal within the xarray.
Performance test results are as follows: Single-process operations
on an empty disk show negligible impact, while multi-process workloads
demonstrate a noticeable performance gain.
|CPU: Kunpeng 920 | P80 | P1 |
|Memory: 512GB |------------------------|-------------------------|
|960GB SSD (0.5GB/s)| base | patched | base | patched |
|-------------------|-------|----------------|--------|----------------|
|mb_optimize_scan=0 | 20097 | 19555 (-2.6%) | 316141 | 315636 (-0.2%) |
|mb_optimize_scan=1 | 13318 | 15496 (+16.3%) | 325273 | 323569 (-0.5%) |
|CPU: AMD 9654 * 2 | P96 | P1 |
|Memory: 1536GB |------------------------|-------------------------|
|960GB SSD (1GB/s) | base | patched | base | patched |
|-------------------|-------|----------------|--------|----------------|
|mb_optimize_scan=0 | 53603 | 53192 (-0.7%) | 214243 | 212678 (-0.7%) |
|mb_optimize_scan=1 | 20887 | 37636 (+80.1%) | 213632 | 214189 (+0.2%) |
[ Applied spelling fixes per discussion on the ext4-list see thread
referened in the Link tag. --tytso]
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20250714130327.1830534-16-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Stable-dep-of: 4865c768b563 ("ext4: always allocate blocks only from groups inode can use")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Yang Erkun <yangerkun@huawei.com>
Date: Wed Nov 12 16:45:38 2025 +0800
ext4: correct the comments place for EXT4_EXT_MAY_ZEROOUT
[ Upstream commit cc742fd1d184bb2a11bacf50587d2c85290622e4 ]
Move the comments just before we set EXT4_EXT_MAY_ZEROOUT in
ext4_split_convert_extents.
Signed-off-by: Yang Erkun <yangerkun@huawei.com>
Message-ID: <20251112084538.1658232-4-yangerkun@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Stable-dep-of: feaf2a80e78f ("ext4: don't set EXT4_GET_BLOCKS_CONVERT when splitting before submitting I/O")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Zhang Yi <yi.zhang@huawei.com>
Date: Sat Nov 29 18:32:35 2025 +0800
ext4: don't set EXT4_GET_BLOCKS_CONVERT when splitting before submitting I/O
[ Upstream commit feaf2a80e78f89ee8a3464126077ba8683b62791 ]
When allocating blocks during within-EOF DIO and writeback with
dioread_nolock enabled, EXT4_GET_BLOCKS_PRE_IO was set to split an
existing large unwritten extent. However, EXT4_GET_BLOCKS_CONVERT was
set when calling ext4_split_convert_extents(), which may potentially
result in stale data issues.
Assume we have an unwritten extent, and then DIO writes the second half.
[UUUUUUUUUUUUUUUU] on-disk extent U: unwritten extent
[UUUUUUUUUUUUUUUU] extent status tree
|<- ->| ----> dio write this range
First, ext4_iomap_alloc() call ext4_map_blocks() with
EXT4_GET_BLOCKS_PRE_IO, EXT4_GET_BLOCKS_UNWRIT_EXT and
EXT4_GET_BLOCKS_CREATE flags set. ext4_map_blocks() find this extent and
call ext4_split_convert_extents() with EXT4_GET_BLOCKS_CONVERT and the
above flags set.
Then, ext4_split_convert_extents() calls ext4_split_extent() with
EXT4_EXT_MAY_ZEROOUT, EXT4_EXT_MARK_UNWRIT2 and EXT4_EXT_DATA_VALID2
flags set, and it calls ext4_split_extent_at() to split the second half
with EXT4_EXT_DATA_VALID2, EXT4_EXT_MARK_UNWRIT1, EXT4_EXT_MAY_ZEROOUT
and EXT4_EXT_MARK_UNWRIT2 flags set. However, ext4_split_extent_at()
failed to insert extent since a temporary lack -ENOSPC. It zeroes out
the first half but convert the entire on-disk extent to written since
the EXT4_EXT_DATA_VALID2 flag set, but left the second half as unwritten
in the extent status tree.
[0000000000SSSSSS] data S: stale data, 0: zeroed
[WWWWWWWWWWWWWWWW] on-disk extent W: written extent
[WWWWWWWWWWUUUUUU] extent status tree
Finally, if the DIO failed to write data to the disk, the stale data in
the second half will be exposed once the cached extent entry is gone.
Fix this issue by not passing EXT4_GET_BLOCKS_CONVERT when splitting
an unwritten extent before submitting I/O, and make
ext4_split_convert_extents() to zero out the entire extent range
to zero for this case, and also mark the extent in the extent status
tree for consistency.
Fixes: b8a8684502a0 ("ext4: Introduce FALLOC_FL_ZERO_RANGE flag for fallocate")
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Cc: stable@kernel.org
Message-ID: <20251129103247.686136-4-yi.zhang@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Baokun Li <libaokun1@huawei.com>
Date: Mon Jul 14 21:03:22 2025 +0800
ext4: factor out __ext4_mb_scan_group()
[ Upstream commit 45704f92e55853fe287760e019feb45eeb9c988e ]
Extract __ext4_mb_scan_group() to make the code clearer and to
prepare for the later conversion of 'choose group' to 'scan groups'.
No functional changes.
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20250714130327.1830534-13-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Stable-dep-of: 4865c768b563 ("ext4: always allocate blocks only from groups inode can use")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Baokun Li <libaokun1@huawei.com>
Date: Mon Jul 14 21:03:23 2025 +0800
ext4: factor out ext4_mb_might_prefetch()
[ Upstream commit 5abd85f667a19ef7d880ed00c201fc22de6fa707 ]
Extract ext4_mb_might_prefetch() to make the code clearer and to
prepare for the later conversion of 'choose group' to 'scan groups'.
No functional changes.
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20250714130327.1830534-14-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Stable-dep-of: 4865c768b563 ("ext4: always allocate blocks only from groups inode can use")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Baokun Li <libaokun1@huawei.com>
Date: Mon Jul 14 21:03:24 2025 +0800
ext4: factor out ext4_mb_scan_group()
[ Upstream commit 9c08e42db9056d423dcef5e7998c73182180ff83 ]
Extract ext4_mb_scan_group() to make the code clearer and to
prepare for the later conversion of 'choose group' to 'scan groups'.
No functional changes.
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20250714130327.1830534-15-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Stable-dep-of: 4865c768b563 ("ext4: always allocate blocks only from groups inode can use")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Baokun Li <libaokun1@huawei.com>
Date: Mon Aug 25 11:38:30 2025 +0800
ext4: fix potential null deref in ext4_mb_init()
commit 3c3fac6bc0a9c00dbe65d8dc0d3a282afe4d3188 upstream.
In ext4_mb_init(), ext4_mb_avg_fragment_size_destroy() may be called
when sbi->s_mb_avg_fragment_size remains uninitialized (e.g., if groupinfo
slab cache allocation fails). Since ext4_mb_avg_fragment_size_destroy()
lacks null pointer checking, this leads to a null pointer dereference.
==================================================================
EXT4-fs: no memory for groupinfo slab cache
BUG: kernel NULL pointer dereference, address: 0000000000000000
PGD 0 P4D 0
Oops: Oops: 0002 [#1] SMP PTI
CPU:2 UID: 0 PID: 87 Comm:mount Not tainted 6.17.0-rc2 #1134 PREEMPT(none)
RIP: 0010:_raw_spin_lock_irqsave+0x1b/0x40
Call Trace:
<TASK>
xa_destroy+0x61/0x130
ext4_mb_init+0x483/0x540
__ext4_fill_super+0x116d/0x17b0
ext4_fill_super+0xd3/0x280
get_tree_bdev_flags+0x132/0x1d0
vfs_get_tree+0x29/0xd0
do_new_mount+0x197/0x300
__x64_sys_mount+0x116/0x150
do_syscall_64+0x50/0x1c0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
==================================================================
Therefore, add necessary null check to ext4_mb_avg_fragment_size_destroy()
to prevent this issue. The same fix is also applied to
ext4_mb_largest_free_orders_destroy().
Reported-by: syzbot+1713b1aa266195b916c2@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=1713b1aa266195b916c2
Cc: stable@kernel.org
Fixes: f7eaacbb4e54 ("ext4: convert free groups order lists to xarrays")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Baokun Li <libaokun1@huawei.com>
Date: Mon Jul 14 21:03:27 2025 +0800
ext4: implement linear-like traversal across order xarrays
[ Upstream commit a3ce570a5d6a70df616ae9a78635a188e6b5fd2f ]
Although we now perform ordered traversal within an xarray, this is
currently limited to a single xarray. However, we have multiple such
xarrays, which prevents us from guaranteeing a linear-like traversal
where all groups on the right are visited before all groups on the left.
For example, suppose we have 128 block groups, with a target group of 64,
a target length corresponding to an order of 1, and available free groups
of 16 (order 1) and group 65 (order 8):
For linear traversal, when no suitable free block is found in group 64, it
will search in the next block group until group 127, then start searching
from 0 up to block group 63. It ensures continuous forward traversal, which
is consistent with the unidirectional rotation behavior of HDD platters.
Additionally, the block group lock contention during freeing block is
unavoidable. The goal increasing from 0 to 64 indicates that previously
scanned groups (which had no suitable free space and are likely to free
blocks later) and skipped groups (which are currently in use) have newly
freed some used blocks. If we allocate blocks in these groups, the
probability of competing with other processes increases.
For non-linear traversal, we first traverse all groups in order_1. If only
group 16 has free space in this list, we first traverse [63, 128), then
traverse [0, 64) to find the available group 16, and then allocate blocks
in group 16. Therefore, it cannot guarantee continuous traversal in one
direction, thus increasing the probability of contention.
So refactor ext4_mb_scan_groups_xarray() to ext4_mb_scan_groups_xa_range()
to only traverse a fixed range of groups, and move the logic for handling
wrap around to the caller. The caller first iterates through all xarrays
in the range [start, ngroups) and then through the range [0, start). This
approach simulates a linear scan, which reduces contention between freeing
blocks and allocating blocks.
Assume we have the following groups, where "|" denotes the xarray traversal
start position:
order_1_groups: AB | CD
order_2_groups: EF | GH
Traversal order:
Before: C > D > A > B > G > H > E > F
After: C > D > G > H > A > B > E > F
Performance test data follows:
|CPU: Kunpeng 920 | P80 | P1 |
|Memory: 512GB |------------------------|-------------------------|
|960GB SSD (0.5GB/s)| base | patched | base | patched |
|-------------------|-------|----------------|--------|----------------|
|mb_optimize_scan=0 | 19555 | 20049 (+2.5%) | 315636 | 316724 (-0.3%) |
|mb_optimize_scan=1 | 15496 | 19342 (+24.8%) | 323569 | 328324 (+1.4%) |
|CPU: AMD 9654 * 2 | P96 | P1 |
|Memory: 1536GB |------------------------|-------------------------|
|960GB SSD (1GB/s) | base | patched | base | patched |
|-------------------|-------|----------------|--------|----------------|
|mb_optimize_scan=0 | 53192 | 52125 (-2.0%) | 212678 | 215136 (+1.1%) |
|mb_optimize_scan=1 | 37636 | 50331 (+33.7%) | 214189 | 209431 (-2.2%) |
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20250714130327.1830534-18-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Stable-dep-of: 4865c768b563 ("ext4: always allocate blocks only from groups inode can use")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Baokun Li <libaokun1@huawei.com>
Date: Mon Jul 14 21:03:26 2025 +0800
ext4: refactor choose group to scan group
[ Upstream commit 6347558764911f88acac06ab996e162f0c8a212d ]
This commit converts the `choose group` logic to `scan group` using
previously prepared helper functions. This allows us to leverage xarrays
for ordered non-linear traversal, thereby mitigating the "bouncing" issue
inherent in the `choose group` mechanism.
This also decouples linear and non-linear traversals, leading to cleaner
and more readable code.
Key changes:
* ext4_mb_choose_next_group() is refactored to ext4_mb_scan_groups().
* Replaced ext4_mb_good_group() with ext4_mb_scan_group() in non-linear
traversals, and related functions now return error codes instead of
group info.
* Added ext4_mb_scan_groups_linear() for performing linear scans starting
from a specific group for a set number of times.
* Linear scans now execute up to sbi->s_mb_max_linear_groups times,
so ac_groups_linear_remaining is removed as it's no longer used.
* ac->ac_criteria is now used directly instead of passing cr around.
Also, ac->ac_criteria is incremented directly after groups scan fails
for the corresponding criteria.
* Since we're now directly scanning groups instead of finding a good group
then scanning, the following variables and flags are no longer needed,
s_bal_cX_groups_considered is sufficient.
s_bal_p2_aligned_bad_suggestions
s_bal_goal_fast_bad_suggestions
s_bal_best_avail_bad_suggestions
EXT4_MB_CR_POWER2_ALIGNED_OPTIMIZED
EXT4_MB_CR_GOAL_LEN_FAST_OPTIMIZED
EXT4_MB_CR_BEST_AVAIL_LEN_OPTIMIZED
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20250714130327.1830534-17-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Stable-dep-of: 4865c768b563 ("ext4: always allocate blocks only from groups inode can use")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: Thu Feb 19 15:33:54 2026 +0100
HID: Add HID_CLAIMED_INPUT guards in raw_event callbacks missing them
commit ecfa6f34492c493a9a1dc2900f3edeb01c79946b upstream.
In commit 2ff5baa9b527 ("HID: appleir: Fix potential NULL dereference at
raw event handle"), we handle the fact that raw event callbacks
can happen even for a HID device that has not been "claimed" causing a
crash if a broken device were attempted to be connected to the system.
Fix up the remaining in-tree HID drivers that forgot to add this same
check to resolve the same issue.
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Benjamin Tissoires <bentiss@kernel.org>
Cc: Bastien Nocera <hadess@hadess.net>
Cc: linux-input@vger.kernel.org
Cc: stable <stable@kernel.org>
Assisted-by: gkh_clanker_2000
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Brian Howard <blhoward2@gmail.com>
Date: Tue Dec 2 21:35:47 2025 -0500
HID: multitouch: add quirks for Lenovo Yoga Book 9i
[ Upstream commit 822bc5b3744b0b2c2c9678aa1d80b2cf04fdfabf ]
The Lenovo Yoga Book 9i is a dual-screen laptop, with a single composite
USB device providing both touch and tablet interfaces for both screens.
All inputs report through a single device, differentiated solely by report
numbers. As there is no way for udev to differentiate the inputs based on
USB vendor/product ID or interface numbers, custom naming is required to
match against for downstream configuration. A firmware bug also results
in an erroneous InRange message report being received after the stylus
leaves proximity, blocking later touch events. Add required quirks for
Gen 8 to Gen 10 models, including a new quirk providing for custom input
device naming and dropping erroneous InRange reports.
Signed-off-by: Brian Howard <blhoward2@gmail.com>
Tested-by: Brian Howard <blhoward2@gmail.com>
Tested-by: Kris Fredrick <linux.baguette800@slmail.me>
Reported-by: Andrei Shumailov <gentoo1993@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220386
Signed-off-by: Jiri Kosina <jkosina@suse.com>
Stable-dep-of: a2e70a89fa58 ("HID: multitouch: new class MT_CLS_EGALAX_P80H84")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Ian Ray <ian.ray@gehealthcare.com>
Date: Tue Feb 17 13:51:51 2026 +0200
HID: multitouch: new class MT_CLS_EGALAX_P80H84
[ Upstream commit a2e70a89fa58133521b2deae4427d35776bda935 ]
Fixes: f9e82295eec1 ("HID: multitouch: add eGalaxTouch P80H84 support")
Signed-off-by: Ian Ray <ian.ray@gehealthcare.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Akhilesh Patil <akhilesh@ee.iitb.ac.in>
Date: Sun Nov 2 15:13:20 2025 +0530
hwmon: (aht10) Add support for dht20
[ Upstream commit 3eaf1b631506e8de2cb37c278d5bc042521e82c1 ]
Add support for dht20 temperature and humidity sensor from Aosong.
Modify aht10 driver to handle different init command for dht20 sensor by
adding init_cmd entry in the driver data. dht20 sensor is compatible with
aht10 hwmon driver with this change.
Tested on TI am62x SK board with dht20 sensor connected at i2c-2 port.
Signed-off-by: Akhilesh Patil <akhilesh@ee.iitb.ac.in>
Link: https://lore.kernel.org/r/2025112-94320-906858@bhairav-test.ee.iitb.ac.in
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Stable-dep-of: b7497b5a99f5 ("hwmon: (aht10) Fix initialization commands for AHT20")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Hao Yu <haoyufine@gmail.com>
Date: Mon Feb 23 01:03:31 2026 +0800
hwmon: (aht10) Fix initialization commands for AHT20
[ Upstream commit b7497b5a99f54ab8dcda5b14a308385b2fb03d8d ]
According to the AHT20 datasheet (updated to V1.0 after the 2023.09
version), the initialization command for AHT20 is 0b10111110 (0xBE).
The previous sequence (0xE1) used in earlier versions is no longer
compatible with newer AHT20 sensors. Update the initialization
command to ensure the sensor is properly initialized.
While at it, use binary notation for DHT20_CMD_INIT to match the notation
used in the datasheet.
Fixes: d2abcb5cc885 ("hwmon: (aht10) Add support for compatible aht20")
Signed-off-by: Hao Yu <haoyufine@gmail.com>
Link: https://lore.kernel.org/r/20260222170332.1616-3-haoyufine@gmail.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Bart Van Assche <bvanassche@acm.org>
Date: Mon Feb 23 14:00:14 2026 -0800
hwmon: (it87) Check the it87_lock() return value
[ Upstream commit 07ed4f05bbfd2bc014974dcc4297fd3aa1cb88c0 ]
Return early in it87_resume() if it87_lock() fails instead of ignoring the
return value of that function. This patch suppresses a Clang thread-safety
warning.
Cc: Frank Crawford <frank@crawford.emu.id.au>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: linux-hwmon@vger.kernel.org
Fixes: 376e1a937b30 ("hwmon: (it87) Add calls to smbus_enable/smbus_disable as required")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20260223220102.2158611-15-bart.vanassche@linux.dev
[groeck: Declare 'ret' at the beginning of it87_resume()]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Gui-Dong Han <hanguidong02@gmail.com>
Date: Tue Feb 3 20:14:43 2026 +0800
hwmon: (max16065) Use READ/WRITE_ONCE to avoid compiler optimization induced race
[ Upstream commit 007be4327e443d79c9dd9e56dc16c36f6395d208 ]
Simply copying shared data to a local variable cannot prevent data
races. The compiler is allowed to optimize away the local copy and
re-read the shared memory, causing a Time-of-Check Time-of-Use (TOCTOU)
issue if the data changes between the check and the usage.
To enforce the use of the local variable, use READ_ONCE() when reading
the shared data and WRITE_ONCE() when updating it. Apply these macros to
the three identified locations (curr_sense, adc, and fault) where local
variables are used for error validation, ensuring the value remains
consistent.
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Closes: https://lore.kernel.org/all/6fe17868327207e8b850cf9f88b7dc58b2021f73.camel@decadent.org.uk/
Fixes: f5bae2642e3d ("hwmon: Driver for MAX16065 System Manager and compatibles")
Fixes: b8d5acdcf525 ("hwmon: (max16065) Use local variable to avoid TOCTOU")
Cc: stable@vger.kernel.org
Signed-off-by: Gui-Dong Han <hanguidong02@gmail.com>
Link: https://lore.kernel.org/r/20260203121443.5482-1-hanguidong02@gmail.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Naresh Solanki <naresh.solanki@9elements.com>
Date: Mon Oct 7 14:34:24 2024 +0530
hwmon: (max6639) : Configure based on DT property
[ Upstream commit 7506ebcd662b868780774d191a7c024c18c557a8 ]
Remove platform data & initialize with defaults
configuration & overwrite based on DT properties.
Signed-off-by: Naresh Solanki <naresh.solanki@9elements.com>
Message-ID: <20241007090426.811736-1-naresh.solanki@9elements.com>
[groeck: Dropped some unnecessary empty lines]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Stable-dep-of: 170a4b21f49b ("hwmon: (max6639) fix inverted polarity")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Olivier Sobrie <olivier@sobrie.be>
Date: Wed Mar 4 22:20:39 2026 +0100
hwmon: (max6639) fix inverted polarity
[ Upstream commit 170a4b21f49b3dcff3115b4c90758f0a0d77375a ]
According to MAX6639 documentation:
D1: PWM Output Polarity. PWM output is low at
100% duty cycle when this bit is set to zero. PWM
output is high at 100% duty cycle when this bit is set
to 1.
Up to commit 0f33272b60ed ("hwmon: (max6639) : Update hwmon init using
info structure"), the polarity was set to high (0x2) when no platform
data was set. After the patch, the polarity register wasn't set anymore
if no platform data was specified. Nowadays, since commit 7506ebcd662b
("hwmon: (max6639) : Configure based on DT property"), it is always set
to low which doesn't match with the comment above and change the
behavior compared to versions prior 0f33272b60ed.
Fixes: 0f33272b60ed ("hwmon: (max6639) : Update hwmon init using info structure")
Signed-off-by: Olivier Sobrie <olivier@sobrie.be>
Link: https://lore.kernel.org/r/20260304212039.570274-1-olivier@sobrie.be
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Charles Haithcock <chaithco@redhat.com>
Date: Fri Feb 27 18:41:15 2026 -0700
i2c: i801: Revert "i2c: i801: replace acpi_lock with I2C bus lock"
[ Upstream commit cfc69c2e6c699c96949f7b0455195b0bfb7dc715 ]
This reverts commit f707d6b9e7c18f669adfdb443906d46cfbaaa0c1.
Under rare circumstances, multiple udev threads can collect i801 device
info on boot and walk i801_acpi_io_handler somewhat concurrently. The
first will note the area is reserved by acpi to prevent further touches.
This ultimately causes the area to be deregistered. The second will
enter i801_acpi_io_handler after the area is unregistered but before a
check can be made that the area is unregistered. i2c_lock_bus relies on
the now unregistered area containing lock_ops to lock the bus. The end
result is a kernel panic on boot with the following backtrace;
[ 14.971872] ioatdma 0000:09:00.2: enabling device (0100 -> 0102)
[ 14.971873] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 14.971880] #PF: supervisor read access in kernel mode
[ 14.971884] #PF: error_code(0x0000) - not-present page
[ 14.971887] PGD 0 P4D 0
[ 14.971894] Oops: 0000 [#1] PREEMPT SMP PTI
[ 14.971900] CPU: 5 PID: 956 Comm: systemd-udevd Not tainted 5.14.0-611.5.1.el9_7.x86_64 #1
[ 14.971905] Hardware name: XXXXXXXXXXXXXXXXXXXXXXX BIOS 1.20.10.SV91 01/30/2023
[ 14.971908] RIP: 0010:i801_acpi_io_handler+0x2d/0xb0 [i2c_i801]
[ 14.971929] Code: 00 00 49 8b 40 20 41 57 41 56 4d 8b b8 30 04 00 00 49 89 ce 41 55 41 89 d5 41 54 49 89 f4 be 02 00 00 00 55 4c 89 c5 53 89 fb <48> 8b 00 4c 89 c7 e8 18 61 54 e9 80 bd 80 04 00 00 00 75 09 4c 3b
[ 14.971933] RSP: 0018:ffffbaa841483838 EFLAGS: 00010282
[ 14.971938] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff9685e01ba568
[ 14.971941] RDX: 0000000000000008 RSI: 0000000000000002 RDI: 0000000000000000
[ 14.971944] RBP: ffff9685ca22f028 R08: ffff9685ca22f028 R09: ffff9685ca22f028
[ 14.971948] R10: 000000000000000b R11: 0000000000000580 R12: 0000000000000580
[ 14.971951] R13: 0000000000000008 R14: ffff9685e01ba568 R15: ffff9685c222f000
[ 14.971954] FS: 00007f8287c0ab40(0000) GS:ffff96a47f940000(0000) knlGS:0000000000000000
[ 14.971959] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 14.971963] CR2: 0000000000000000 CR3: 0000000168090001 CR4: 00000000003706f0
[ 14.971966] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 14.971968] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 14.971972] Call Trace:
[ 14.971977] <TASK>
[ 14.971981] ? show_trace_log_lvl+0x1c4/0x2df
[ 14.971994] ? show_trace_log_lvl+0x1c4/0x2df
[ 14.972003] ? acpi_ev_address_space_dispatch+0x16e/0x3c0
[ 14.972014] ? __die_body.cold+0x8/0xd
[ 14.972021] ? page_fault_oops+0x132/0x170
[ 14.972028] ? exc_page_fault+0x61/0x150
[ 14.972036] ? asm_exc_page_fault+0x22/0x30
[ 14.972045] ? i801_acpi_io_handler+0x2d/0xb0 [i2c_i801]
[ 14.972061] acpi_ev_address_space_dispatch+0x16e/0x3c0
[ 14.972069] ? __pfx_i801_acpi_io_handler+0x10/0x10 [i2c_i801]
[ 14.972085] acpi_ex_access_region+0x5b/0xd0
[ 14.972093] acpi_ex_field_datum_io+0x73/0x2e0
[ 14.972100] acpi_ex_read_data_from_field+0x8e/0x230
[ 14.972106] acpi_ex_resolve_node_to_value+0x23d/0x310
[ 14.972114] acpi_ds_evaluate_name_path+0xad/0x110
[ 14.972121] acpi_ds_exec_end_op+0x321/0x510
[ 14.972127] acpi_ps_parse_loop+0xf7/0x680
[ 14.972136] acpi_ps_parse_aml+0x17a/0x3d0
[ 14.972143] acpi_ps_execute_method+0x137/0x270
[ 14.972150] acpi_ns_evaluate+0x1f4/0x2e0
[ 14.972158] acpi_evaluate_object+0x134/0x2f0
[ 14.972164] acpi_evaluate_integer+0x50/0xe0
[ 14.972173] ? vsnprintf+0x24b/0x570
[ 14.972181] acpi_ac_get_state.part.0+0x23/0x70
[ 14.972189] get_ac_property+0x4e/0x60
[ 14.972195] power_supply_show_property+0x90/0x1f0
[ 14.972205] add_prop_uevent+0x29/0x90
[ 14.972213] power_supply_uevent+0x109/0x1d0
[ 14.972222] dev_uevent+0x10e/0x2f0
[ 14.972228] uevent_show+0x8e/0x100
[ 14.972236] dev_attr_show+0x19/0x40
[ 14.972246] sysfs_kf_seq_show+0x9b/0x100
[ 14.972253] seq_read_iter+0x120/0x4b0
[ 14.972262] ? selinux_file_permission+0x106/0x150
[ 14.972273] vfs_read+0x24f/0x3a0
[ 14.972284] ksys_read+0x5f/0xe0
[ 14.972291] do_syscall_64+0x5f/0xe0
...
The kernel panic is mitigated by setting limiting the count of udev
children to 1. Revert to using the acpi_lock to continue protecting
marking the area as owned by firmware without relying on a lock in
a potentially unmapped region of memory.
Fixes: f707d6b9e7c1 ("i2c: i801: replace acpi_lock with I2C bus lock")
Signed-off-by: Charles Haithcock <chaithco@redhat.com>
[wsa: added Fixes-tag and updated comment stating the importance of the lock]
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Thomas Gleixner <tglx@kernel.org>
Date: Sat Feb 7 11:50:23 2026 +0100
i40e: Fix preempt count leak in napi poll tracepoint
[ Upstream commit 4b3d54a85bd37ebf2d9836f0d0de775c0ff21af9 ]
Using get_cpu() in the tracepoint assignment causes an obvious preempt
count leak because nothing invokes put_cpu() to undo it:
softirq: huh, entered softirq 3 NET_RX with preempt_count 00000100, exited with 00000101?
This clearly has seen a lot of testing in the last 3+ years...
Use smp_processor_id() instead.
Fixes: 6d4d584a7ea8 ("i40e: Add i40e_napi_poll tracepoint")
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org
Reviewed-by: Joe Damato <joe@dama.to>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Larysa Zaremba <larysa.zaremba@intel.com>
Date: Thu Mar 5 12:12:46 2026 +0100
i40e: fix registering XDP RxQ info
[ Upstream commit 8f497dc8a61429cc004720aa8e713743355d80cf ]
Current way of handling XDP RxQ info in i40e has a problem, where frag_size
is not updated when xsk_buff_pool is detached or when MTU is changed, this
leads to growing tail always failing for multi-buffer packets.
Couple XDP RxQ info registering with buffer allocations and unregistering
with cleaning the ring.
Fixes: a045d2f2d03d ("i40e: set xdp_rxq_info::frag_size")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://patch.msgid.link/20260305111253.2317394-6-larysa.zaremba@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Larysa Zaremba <larysa.zaremba@intel.com>
Date: Thu Mar 5 12:12:47 2026 +0100
i40e: use xdp.frame_sz as XDP RxQ info frag_size
[ Upstream commit c69d22c6c46a1d792ba8af3d8d6356fdc0e6f538 ]
The only user of frag_size field in XDP RxQ info is
bpf_xdp_frags_increase_tail(). It clearly expects whole buffer size instead
of DMA write size. Different assumptions in i40e driver configuration lead
to negative tailroom.
Set frag_size to the same value as frame_sz in shared pages mode, use new
helper to set frag_size when AF_XDP ZC is active.
Fixes: a045d2f2d03d ("i40e: set xdp_rxq_info::frag_size")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://patch.msgid.link/20260305111253.2317394-7-larysa.zaremba@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Kohei Enju <kohei@enjuk.jp>
Date: Tue Feb 10 15:57:14 2026 +0000
iavf: fix netdev->max_mtu to respect actual hardware limit
[ Upstream commit b84852170153671bb0fa6737a6e48370addd8e1a ]
iavf sets LIBIE_MAX_MTU as netdev->max_mtu, ignoring vf_res->max_mtu
from PF [1]. This allows setting an MTU beyond the actual hardware
limit, causing TX queue timeouts [2].
Set correct netdev->max_mtu using vf_res->max_mtu from the PF.
Note that currently PF drivers such as ice/i40e set the frame size in
vf_res->max_mtu, not MTU. Convert vf_res->max_mtu to MTU before setting
netdev->max_mtu.
[1]
# ip -j -d link show $DEV | jq '.[0].max_mtu'
16356
[2]
iavf 0000:00:05.0 enp0s5: NETDEV WATCHDOG: CPU: 1: transmit queue 0 timed out 5692 ms
iavf 0000:00:05.0 enp0s5: NIC Link is Up Speed is 10 Gbps Full Duplex
iavf 0000:00:05.0 enp0s5: NETDEV WATCHDOG: CPU: 6: transmit queue 3 timed out 5312 ms
iavf 0000:00:05.0 enp0s5: NIC Link is Up Speed is 10 Gbps Full Duplex
...
Fixes: 5fa4caff59f2 ("iavf: switch to Page Pool")
Signed-off-by: Kohei Enju <kohei@enjuk.jp>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Jason Gunthorpe <jgg@ziepe.ca>
Date: Mon Feb 16 11:02:48 2026 -0400
IB/mthca: Add missed mthca_unmap_user_db() for mthca_create_srq()
commit 117942ca43e2e3c3d121faae530989931b7f67e1 upstream.
Fix a user triggerable leak on the system call failure path.
Cc: stable@vger.kernel.org
Fixes: ec34a922d243 ("[PATCH] IB/mthca: Add SRQ implementation")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://patch.msgid.link/2-v1-83e918d69e73+a9-rdma_udata_rc_jgg@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Brian Vazquez <brianvv@google.com>
Date: Mon Jan 26 21:55:59 2026 +0000
idpf: change IRQ naming to match netdev and ethtool queue numbering
[ Upstream commit 1500a8662d2d41d6bb03e034de45ddfe6d7d362d ]
The code uses the vidx for the IRQ name but that doesn't match ethtool
reporting nor netdev naming, this makes it hard to tune the device and
associate queues with IRQs. Sequentially requesting irqs starting from
'0' makes the output consistent.
This commit changes the interrupt numbering but preserves the name
format, maintaining ABI compatibility. Existing tools relying on the old
numbering are already non-functional, as they lack a useful correlation
to the interrupts.
Before:
ethtool -L eth1 tx 1 combined 3
grep . /proc/irq/*/*idpf*/../smp_affinity_list
/proc/irq/67/idpf-Mailbox-0/../smp_affinity_list:0-55,112-167
/proc/irq/68/idpf-eth1-TxRx-1/../smp_affinity_list:0
/proc/irq/70/idpf-eth1-TxRx-3/../smp_affinity_list:1
/proc/irq/71/idpf-eth1-TxRx-4/../smp_affinity_list:2
/proc/irq/72/idpf-eth1-Tx-5/../smp_affinity_list:3
ethtool -S eth1 | grep -v ': 0'
NIC statistics:
tx_q-0_pkts: 1002
tx_q-1_pkts: 2679
tx_q-2_pkts: 1113
tx_q-3_pkts: 1192 <----- tx_q-3 vs idpf-eth1-Tx-5
rx_q-0_pkts: 1143
rx_q-1_pkts: 3172
rx_q-2_pkts: 1074
After:
ethtool -L eth1 tx 1 combined 3
grep . /proc/irq/*/*idpf*/../smp_affinity_list
/proc/irq/67/idpf-Mailbox-0/../smp_affinity_list:0-55,112-167
/proc/irq/68/idpf-eth1-TxRx-0/../smp_affinity_list:0
/proc/irq/70/idpf-eth1-TxRx-1/../smp_affinity_list:1
/proc/irq/71/idpf-eth1-TxRx-2/../smp_affinity_list:2
/proc/irq/72/idpf-eth1-Tx-3/../smp_affinity_list:3
ethtool -S eth1 | grep -v ': 0'
NIC statistics:
tx_q-0_pkts: 118
tx_q-1_pkts: 134
tx_q-2_pkts: 228
tx_q-3_pkts: 138 <--- tx_q-3 matches idpf-eth1-Tx-3
rx_q-0_pkts: 111
rx_q-1_pkts: 366
rx_q-2_pkts: 120
Fixes: d4d558718266 ("idpf: initialize interrupts and enable vport")
Signed-off-by: Brian Vazquez <brianvv@google.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Steven Chen <chenste@linux.microsoft.com>
Date: Mon Apr 21 15:25:08 2025 -0700
ima: define and call ima_alloc_kexec_file_buf()
[ Upstream commit c95e1acb6d7f00efab73e41b31e0560751e3f469 ]
In the current implementation, the ima_dump_measurement_list() API is
called during the kexec "load" phase, where a buffer is allocated and
the measurement records are copied. Due to this, new events added after
kexec load but before kexec execute are not carried over to the new kernel
during kexec operation
Carrying the IMA measurement list across kexec requires allocating a
buffer and copying the measurement records. Separate allocating the
buffer and copying the measurement records into separate functions in
order to allocate the buffer at kexec 'load' and copy the measurements
at kexec 'execute'.
After moving the vfree() here at this stage in the patch set, the IMA
measurement list fails to verify when doing two consecutive "kexec -s -l"
with/without a "kexec -s -u" in between. Only after "ima: kexec: move
IMA log copy from kexec load to execute" the IMA measurement list verifies
properly with the vfree() here.
Co-developed-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Signed-off-by: Steven Chen <chenste@linux.microsoft.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
Tested-by: Stefan Berger <stefanb@linux.ibm.com> # ppc64/kvm
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Stable-dep-of: 10d1c75ed438 ("ima: verify the previous kernel's IMA buffer lies in addressable RAM")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Steven Chen <chenste@linux.microsoft.com>
Date: Mon Apr 21 15:25:11 2025 -0700
ima: kexec: define functions to copy IMA log at soft boot
[ Upstream commit f18e502db673c75f762d47101dafcf58f30e2733 ]
The IMA log is currently copied to the new kernel during kexec 'load'
using ima_dump_measurement_list(). However, the log copied at kexec
'load' may result in loss of IMA measurements that only occurred after
kexec "load'. Setup the needed infrastructure to move the IMA log copy
from kexec 'load' to 'execute'.
Define a new IMA hook ima_update_kexec_buffer() as a stub function.
It will be used to call ima_dump_measurement_list() during kexec 'execute'.
Implement ima_kexec_post_load() function to be invoked after the new
Kernel image has been loaded for kexec. ima_kexec_post_load() maps the
IMA buffer to a segment in the newly loaded Kernel. It also registers
the reboot notifier_block to trigger ima_update_kexec_buffer() at
kexec 'execute'.
Set the priority of register_reboot_notifier to INT_MIN to ensure that the
IMA log copy operation will happen at the end of the operation chain, so
that all the IMA measurement records extended into the TPM are copied
Co-developed-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Signed-off-by: Steven Chen <chenste@linux.microsoft.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
Tested-by: Stefan Berger <stefanb@linux.ibm.com> # ppc64/kvm
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Stable-dep-of: 10d1c75ed438 ("ima: verify the previous kernel's IMA buffer lies in addressable RAM")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Breno Leitao <leitao@debian.org>
Date: Thu Nov 21 01:57:12 2024 -0800
ima: kexec: silence RCU list traversal warning
[ Upstream commit 68af44a71975688b881ea524e2526bb7c7ad0e9a ]
The ima_measurements list is append-only and doesn't require
rcu_read_lock() protection. However, lockdep issues a warning when
traversing RCU lists without the read lock:
security/integrity/ima/ima_kexec.c:40 RCU-list traversed in non-reader section!!
Fix this by using the variant of list_for_each_entry_rcu() with the last
argument set to true. This tells the RCU subsystem that traversing this
append-only list without the read lock is intentional and safe.
This change silences the lockdep warning while maintaining the correct
semantics for the append-only list traversal.
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Stable-dep-of: 10d1c75ed438 ("ima: verify the previous kernel's IMA buffer lies in addressable RAM")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Steven Chen <chenste@linux.microsoft.com>
Date: Mon Apr 21 15:25:07 2025 -0700
ima: rename variable the seq_file "file" to "ima_kexec_file"
[ Upstream commit cb5052282c65dc998d12e4eea8d5133249826c13 ]
Before making the function local seq_file "file" variable file static
global, rename it to "ima_kexec_file".
Signed-off-by: Steven Chen <chenste@linux.microsoft.com>
Acked-by: Baoquan He <bhe@redhat.com>
Tested-by: Stefan Berger <stefanb@linux.ibm.com> # ppc64/kvm
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Stable-dep-of: 10d1c75ed438 ("ima: verify the previous kernel's IMA buffer lies in addressable RAM")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Date: Tue Dec 30 22:16:07 2025 -0800
ima: verify the previous kernel's IMA buffer lies in addressable RAM
[ Upstream commit 10d1c75ed4382a8e79874379caa2ead8952734f9 ]
Patch series "Address page fault in ima_restore_measurement_list()", v3.
When the second-stage kernel is booted via kexec with a limiting command
line such as "mem=<size>" we observe a pafe fault that happens.
BUG: unable to handle page fault for address: ffff97793ff47000
RIP: ima_restore_measurement_list+0xdc/0x45a
#PF: error_code(0x0000) not-present page
This happens on x86_64 only, as this is already fixed in aarch64 in
commit: cbf9c4b9617b ("of: check previous kernel's ima-kexec-buffer
against memory bounds")
This patch (of 3):
When the second-stage kernel is booted with a limiting command line (e.g.
"mem=<size>"), the IMA measurement buffer handed over from the previous
kernel may fall outside the addressable RAM of the new kernel. Accessing
such a buffer can fault during early restore.
Introduce a small generic helper, ima_validate_range(), which verifies
that a physical [start, end] range for the previous-kernel IMA buffer lies
within addressable memory:
- On x86, use pfn_range_is_mapped().
- On OF based architectures, use page_is_ram().
Link: https://lkml.kernel.org/r/20251231061609.907170-1-harshit.m.mogalapalli@oracle.com
Link: https://lkml.kernel.org/r/20251231061609.907170-2-harshit.m.mogalapalli@oracle.com
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Cc: Alexander Graf <graf@amazon.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Borislav Betkov <bp@alien8.de>
Cc: guoweikang <guoweikang.kernel@gmail.com>
Cc: Henry Willard <henry.willard@oracle.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Bohac <jbohac@suse.cz>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: Jonathan McDowell <noodles@fb.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Paul Webb <paul.x.webb@oracle.com>
Cc: Sohil Mehta <sohil.mehta@intel.com>
Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Yifei Liu <yifei.l.liu@oracle.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Eric Dumazet <edumazet@google.com>
Date: Fri Feb 27 17:26:03 2026 +0000
indirect_call_wrapper: do not reevaluate function pointer
[ Upstream commit 710f5c76580306cdb9ec51fac8fcf6a8faff7821 ]
We have an increasing number of READ_ONCE(xxx->function)
combined with INDIRECT_CALL_[1234]() helpers.
Unfortunately this forces INDIRECT_CALL_[1234]() to read
xxx->function many times, which is not what we wanted.
Fix these macros so that xxx->function value is not reloaded.
$ scripts/bloat-o-meter -t vmlinux.0 vmlinux
add/remove: 0/0 grow/shrink: 1/65 up/down: 122/-1084 (-962)
Function old new delta
ip_push_pending_frames 59 181 +122
ip6_finish_output 687 681 -6
__udp_enqueue_schedule_skb 1078 1072 -6
ioam6_output 2319 2312 -7
xfrm4_rcv_encap_finish2 64 56 -8
xfrm4_output 297 289 -8
vrf_ip_local_out 278 270 -8
vrf_ip6_local_out 278 270 -8
seg6_input_finish 64 56 -8
rpl_output 700 692 -8
ipmr_forward_finish 124 116 -8
ip_forward_finish 143 135 -8
ip6mr_forward2_finish 100 92 -8
ip6_forward_finish 73 65 -8
input_action_end_bpf 1091 1083 -8
dst_input 52 44 -8
__xfrm6_output 801 793 -8
__xfrm4_output 83 75 -8
bpf_input 500 491 -9
__tcp_check_space 530 521 -9
input_action_end_dt6 291 280 -11
vti6_tnl_xmit 1634 1622 -12
bpf_xmit 1203 1191 -12
rpl_input 497 483 -14
rawv6_send_hdrinc 1355 1341 -14
ndisc_send_skb 1030 1016 -14
ipv6_srh_rcv 1377 1363 -14
ip_send_unicast_reply 1253 1239 -14
ip_rcv_finish 226 212 -14
ip6_rcv_finish 300 286 -14
input_action_end_x_core 205 191 -14
input_action_end_x 355 341 -14
input_action_end_t 205 191 -14
input_action_end_dx6_finish 127 113 -14
input_action_end_dx4_finish 373 359 -14
input_action_end_dt4 426 412 -14
input_action_end_core 186 172 -14
input_action_end_b6_encap 292 278 -14
input_action_end_b6 198 184 -14
igmp6_send 1332 1318 -14
ip_sublist_rcv 864 848 -16
ip6_sublist_rcv 1091 1075 -16
ipv6_rpl_srh_rcv 1937 1920 -17
xfrm_policy_queue_process 1246 1228 -18
seg6_output_core 903 885 -18
mld_sendpack 856 836 -20
NF_HOOK 756 736 -20
vti_tunnel_xmit 1447 1426 -21
input_action_end_dx6 664 642 -22
input_action_end 1502 1480 -22
sock_sendmsg_nosec 134 111 -23
ip6mr_forward2 388 364 -24
sock_recvmsg_nosec 134 109 -25
seg6_input_core 836 810 -26
ip_send_skb 172 146 -26
ip_local_out 140 114 -26
ip6_local_out 140 114 -26
__sock_sendmsg 162 136 -26
__ip_queue_xmit 1196 1170 -26
__ip_finish_output 405 379 -26
ipmr_queue_fwd_xmit 373 346 -27
sock_recvmsg 173 145 -28
ip6_xmit 1635 1607 -28
xfrm_output_resume 1418 1389 -29
ip_build_and_send_pkt 625 591 -34
dst_output 504 432 -72
Total: Before=25217686, After=25216724, chg -0.00%
Fixes: 283c16a2dfd3 ("indirect call wrappers: helpers to speed-up indirect calls of builtin")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260227172603.1700433-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Minseong Kim <ii4gsp@gmail.com>
Date: Wed Jan 21 10:02:02 2026 -0800
Input: synaptics_i2c - guard polling restart in resume
[ Upstream commit 870c2e7cd881d7a10abb91f2b38135622d9f9f65 ]
synaptics_i2c_resume() restarts delayed work unconditionally, even when
the input device is not opened. Guard the polling restart by taking the
input device mutex and checking input_device_enabled() before re-queuing
the delayed work.
Fixes: eef3e4cab72ea ("Input: add driver for Synaptics I2C touchpad")
Signed-off-by: Minseong Kim <ii4gsp@gmail.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260121063738.799967-1-ii4gsp@gmail.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Marco Crivellari <marco.crivellari@suse.com>
Date: Thu Nov 6 15:19:54 2025 +0100
Input: synaptics_i2c - replace use of system_wq with system_dfl_wq
[ Upstream commit b3ee88e27798f0e8dd3a81867804d693da74d57d ]
Currently if a user enqueues a work item using schedule_delayed_work() the
used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.
This patch continues the effort to refactor worqueue APIs, which has begun
with the change introducing new workqueues and a new alloc_workqueue flag:
commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")
This specific workload do not benefit from a per-cpu workqueue, so use
the default unbound workqueue (system_dfl_wq) instead.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Link: https://patch.msgid.link/20251106141955.218911-4-marco.crivellari@suse.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Stable-dep-of: 870c2e7cd881 ("Input: synaptics_i2c - guard polling restart in resume")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Jinhui Guo <guojinhui.liam@bytedance.com>
Date: Thu Jan 22 09:48:50 2026 +0800
iommu/vt-d: Skip dev-iotlb flush for inaccessible PCIe device without scalable mode
[ Upstream commit 42662d19839f34735b718129ea200e3734b07e50 ]
PCIe endpoints with ATS enabled and passed through to userspace
(e.g., QEMU, DPDK) can hard-lock the host when their link drops,
either by surprise removal or by a link fault.
Commit 4fc82cd907ac ("iommu/vt-d: Don't issue ATS Invalidation
request when device is disconnected") adds pci_dev_is_disconnected()
to devtlb_invalidation_with_pasid() so ATS invalidation is skipped
only when the device is being safely removed, but it applies only
when Intel IOMMU scalable mode is enabled.
With scalable mode disabled or unsupported, a system hard-lock
occurs when a PCIe endpoint's link drops because the Intel IOMMU
waits indefinitely for an ATS invalidation that cannot complete.
Call Trace:
qi_submit_sync
qi_flush_dev_iotlb
__context_flush_dev_iotlb.part.0
domain_context_clear_one_cb
pci_for_each_dma_alias
device_block_translation
blocking_domain_attach_dev
iommu_deinit_device
__iommu_group_remove_device
iommu_release_device
iommu_bus_notifier
blocking_notifier_call_chain
bus_notify
device_del
pci_remove_bus_device
pci_stop_and_remove_bus_device
pciehp_unconfigure_device
pciehp_disable_slot
pciehp_handle_presence_or_link_change
pciehp_ist
Commit 81e921fd3216 ("iommu/vt-d: Fix NULL domain on device release")
adds intel_pasid_teardown_sm_context() to intel_iommu_release_device(),
which calls qi_flush_dev_iotlb() and can also hard-lock the system
when a PCIe endpoint's link drops.
Call Trace:
qi_submit_sync
qi_flush_dev_iotlb
__context_flush_dev_iotlb.part.0
intel_context_flush_no_pasid
device_pasid_table_teardown
pci_pasid_table_teardown
pci_for_each_dma_alias
intel_pasid_teardown_sm_context
intel_iommu_release_device
iommu_deinit_device
__iommu_group_remove_device
iommu_release_device
iommu_bus_notifier
blocking_notifier_call_chain
bus_notify
device_del
pci_remove_bus_device
pci_stop_and_remove_bus_device
pciehp_unconfigure_device
pciehp_disable_slot
pciehp_handle_presence_or_link_change
pciehp_ist
Sometimes the endpoint loses connection without a link-down event
(e.g., due to a link fault); killing the process (virsh destroy)
then hard-locks the host.
Call Trace:
qi_submit_sync
qi_flush_dev_iotlb
__context_flush_dev_iotlb.part.0
domain_context_clear_one_cb
pci_for_each_dma_alias
device_block_translation
blocking_domain_attach_dev
__iommu_attach_device
__iommu_device_set_domain
__iommu_group_set_domain_internal
iommu_detach_group
vfio_iommu_type1_detach_group
vfio_group_detach_container
vfio_group_fops_release
__fput
pci_dev_is_disconnected() only covers safe-removal paths;
pci_device_is_present() tests accessibility by reading
vendor/device IDs and internally calls pci_dev_is_disconnected().
On a ConnectX-5 (8 GT/s, x2) this costs ~70 µs.
Since __context_flush_dev_iotlb() is only called on
{attach,release}_dev paths (not hot), add pci_device_is_present()
there to skip inaccessible devices and avoid the hard-lock.
Fixes: 37764b952e1b ("iommu/vt-d: Global devTLB flush when present context entry changed")
Fixes: 81e921fd3216 ("iommu/vt-d: Fix NULL domain on device release")
Cc: stable@vger.kernel.org
Signed-off-by: Jinhui Guo <guojinhui.liam@bytedance.com>
Link: https://lore.kernel.org/r/20251211035946.2071-2-guojinhui.liam@bytedance.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Jakub Kicinski <kuba@kernel.org>
Date: Sun Mar 1 11:45:48 2026 -0800
ipv6: fix NULL pointer deref in ip6_rt_get_dev_rcu()
[ Upstream commit 2ffb4f5c2ccb2fa1c049dd11899aee7967deef5a ]
l3mdev_master_dev_rcu() can return NULL when the slave device is being
un-slaved from a VRF. All other callers deal with this, but we lost
the fallback to loopback in ip6_rt_pcpu_alloc() -> ip6_rt_get_dev_rcu()
with commit 4832c30d5458 ("net: ipv6: put host and anycast routes on
device with address").
KASAN: null-ptr-deref in range [0x0000000000000108-0x000000000000010f]
RIP: 0010:ip6_rt_pcpu_alloc (net/ipv6/route.c:1418)
Call Trace:
ip6_pol_route (net/ipv6/route.c:2318)
fib6_rule_lookup (net/ipv6/fib6_rules.c:115)
ip6_route_output_flags (net/ipv6/route.c:2607)
vrf_process_v6_outbound (drivers/net/vrf.c:437)
I was tempted to rework the un-slaving code to clear the flag first
and insert synchronize_rcu() before we remove the upper. But looks like
the explicit fallback to loopback_dev is an established pattern.
And I guess avoiding the synchronize_rcu() is nice, too.
Fixes: 4832c30d5458 ("net: ipv6: put host and anycast routes on device with address")
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260301194548.927324-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Nam Cao <namcao@linutronix.de>
Date: Thu Feb 12 12:41:25 2026 +0100
irqchip/sifive-plic: Fix frozen interrupt due to affinity setting
[ Upstream commit 1072020685f4b81f6efad3b412cdae0bd62bb043 ]
PLIC ignores interrupt completion message for disabled interrupt, explained
by the specification:
The PLIC signals it has completed executing an interrupt handler by
writing the interrupt ID it received from the claim to the
claim/complete register. The PLIC does not check whether the completion
ID is the same as the last claim ID for that target. If the completion
ID does not match an interrupt source that is currently enabled for
the target, the completion is silently ignored.
This caused problems in the past, because an interrupt can be disabled
while still being handled and plic_irq_eoi() had no effect. That was fixed
by checking if the interrupt is disabled, and if so enable it, before
sending the completion message. That check is done with irqd_irq_disabled().
However, that is not sufficient because the enable bit for the handling
hart can be zero despite irqd_irq_disabled(d) being false. This can happen
when affinity setting is changed while a hart is still handling the
interrupt.
This problem is easily reproducible by dumping a large file to uart (which
generates lots of interrupts) and at the same time keep changing the uart
interrupt's affinity setting. The uart port becomes frozen almost
instantaneously.
Fix this by checking PLIC's enable bit instead of irqd_irq_disabled().
Fixes: cc9f04f9a84f ("irqchip/sifive-plic: Implement irq_set_affinity() for SMP host")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260212114125.3148067-1-namcao@linutronix.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Steven Chen <chenste@linux.microsoft.com>
Date: Mon Apr 21 15:25:09 2025 -0700
kexec: define functions to map and unmap segments
[ Upstream commit 0091d9241ea24c5275be4a3e5a032862fd9de9ec ]
Implement kimage_map_segment() to enable IMA to map the measurement log
list to the kimage structure during the kexec 'load' stage. This function
gathers the source pages within the specified address range, and maps them
to a contiguous virtual address range.
This is a preparation for later usage.
Implement kimage_unmap_segment() for unmapping segments using vunmap().
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Co-developed-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Signed-off-by: Steven Chen <chenste@linux.microsoft.com>
Acked-by: Baoquan He <bhe@redhat.com>
Tested-by: Stefan Berger <stefanb@linux.ibm.com> # ppc64/kvm
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Stable-dep-of: 10d1c75ed438 ("ima: verify the previous kernel's IMA buffer lies in addressable RAM")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Wake Liu <wakel@google.com>
Date: Wed Dec 24 16:41:20 2025 +0800
kselftest/harness: Use helper to avoid zero-size memset warning
[ Upstream commit 19b8a76cd99bde6d299e60490f3e62b8d3df3997 ]
When building kselftests with a toolchain that enables source
fortification (e.g., Android's build environment, which uses
-D_FORTIFY_SOURCE=3), a build failure occurs in tests that use an
empty FIXTURE().
The root cause is that an empty fixture struct results in
`sizeof(self_private)` evaluating to 0. The compiler's fortification
checks then detect the `memset()` call with a compile-time constant size
of 0, issuing a `-Wuser-defined-warnings` which is promoted to an error
by `-Werror`.
An initial attempt to guard the call with `if (sizeof(self_private) > 0)`
was insufficient. The compiler's static analysis is aggressive enough
to flag the `memset(..., 0)` pattern before evaluating the conditional,
thus still triggering the error.
To resolve this robustly, this change introduces a `static inline`
helper function, `__kselftest_memset_safe()`. This function wraps the
size check and the `memset()` call. By replacing the direct `memset()`
in the `__TEST_F_IMPL` macro with a call to this helper, we create an
abstraction boundary. This prevents the compiler's static analyzer from
"seeing" the problematic pattern at the macro expansion site, resolving
the build failure.
Build Context:
Compiler: Android (14488419, +pgo, +bolt, +lto, +mlgo, based on r584948) clang version 22.0.0 (https://android.googlesource.com/toolchain/llvm-project 2d65e4108033380e6fe8e08b1f1826cd2bfb0c99)
Relevant Options: -O2 -Wall -Werror -D_FORTIFY_SOURCE=3 -target i686-linux-android10000
Test: m kselftest_futex_futex_requeue_pi
Removed Gerrit Change-Id
Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/20251224084120.249417-1-wakel@google.com
Signed-off-by: Wake Liu <wakel@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Stable-dep-of: 6be268151426 ("selftests/harness: order TEST_F and XFAIL_ADD constructors")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Namjae Jeon <linkinjeon@kernel.org>
Date: Mon Feb 9 10:43:19 2026 +0900
ksmbd: add chann_lock to protect ksmbd_chann_list xarray
[ Upstream commit 4f3a06cc57976cafa8c6f716646be6c79a99e485 ]
ksmbd_chann_list xarray lacks synchronization, allowing use-after-free in
multi-channel sessions (between lookup_chann_list() and ksmbd_chann_del).
Adds rw_semaphore chann_lock to struct ksmbd_session and protects
all xa_load/xa_store/xa_erase accesses.
Cc: stable@vger.kernel.org
Reported-by: Igor Stepansky <igor.stepansky@orca.security>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Namjae Jeon <linkinjeon@kernel.org>
Date: Mon Jul 21 14:29:43 2025 +0900
ksmbd: check return value of xa_store() in krb5_authenticate
[ Upstream commit ecd9d6bf88ddd64e3dc7beb9a065fd5fa4714f72 ]
xa_store() may fail so check its return value and return error code if
error occurred.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Stable-dep-of: 4f3a06cc5797 ("ksmbd: add chann_lock to protect ksmbd_chann_list xarray")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Shuvam Pandey <shuvampandey1@gmail.com>
Date: Thu Feb 26 21:14:10 2026 +0545
kunit: tool: copy caller args in run_kernel to prevent mutation
[ Upstream commit 40804c4974b8df2adab72f6475d343eaff72b7f6 ]
run_kernel() appended KUnit flags directly to the caller-provided args
list. When exec_tests() calls run_kernel() repeatedly (e.g. with
--run_isolated), each call mutated the same list, causing later runs
to inherit stale filter_glob values and duplicate kunit.enable flags.
Fix this by copying args at the start of run_kernel(). Add a regression
test that calls run_kernel() twice with the same list and verifies the
original remains unchanged.
Fixes: ff9e09a3762f ("kunit: tool: support running each suite/test separately")
Signed-off-by: Shuvam Pandey <shuvampandey1@gmail.com>
Reviewed-by: David Gow <david@davidgow.net>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Fuad Tabba <tabba@google.com>
Date: Fri Feb 13 14:38:12 2026 +0000
KVM: arm64: Hide S1POE from guests when not supported by the host
[ Upstream commit f66857bafd4f151c5cc6856e47be2e12c1721e43 ]
When CONFIG_ARM64_POE is disabled, KVM does not save/restore POR_EL1.
However, ID_AA64MMFR3_EL1 sanitisation currently exposes the feature to
guests whenever the hardware supports it, ignoring the host kernel
configuration.
If a guest detects this feature and attempts to use it, the host will
fail to context-switch POR_EL1, potentially leading to state corruption.
Fix this by masking ID_AA64MMFR3_EL1.S1POE in the sanitised system
registers, preventing KVM from advertising the feature when the host
does not support it (i.e. system_supports_poe() is false).
Fixes: 70ed7238297f ("KVM: arm64: Sanitise ID_AA64MMFR3_EL1")
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260213143815.1732675-2-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Sean Christopherson <seanjc@google.com>
Date: Thu Jan 8 19:06:57 2026 -0800
KVM: x86: Ignore -EBUSY when checking nested events from vcpu_block()
[ Upstream commit ead63640d4e72e6f6d464f4e31f7fecb79af8869 ]
Ignore -EBUSY when checking nested events after exiting a blocking state
while L2 is active, as exiting to userspace will generate a spurious
userspace exit, usually with KVM_EXIT_UNKNOWN, and likely lead to the VM's
demise. Continuing with the wakeup isn't perfect either, as *something*
has gone sideways if a vCPU is awakened in L2 with an injected event (or
worse, a nested run pending), but continuing on gives the VM a decent
chance of surviving without any major side effects.
As explained in the Fixes commits, it _should_ be impossible for a vCPU to
be put into a blocking state with an already-injected event (exception,
IRQ, or NMI). Unfortunately, userspace can stuff MP_STATE and/or injected
events, and thus put the vCPU into what should be an impossible state.
Don't bother trying to preserve the WARN, e.g. with an anti-syzkaller
Kconfig, as WARNs can (hopefully) be added in paths where _KVM_ would be
violating x86 architecture, e.g. by WARNing if KVM attempts to inject an
exception or interrupt while the vCPU isn't running.
Cc: Alessandro Ratti <alessandro@0x65c.net>
Cc: stable@vger.kernel.org
Fixes: 26844fee6ade ("KVM: x86: never write to memory from kvm_vcpu_check_block()")
Fixes: 45405155d876 ("KVM: x86: WARN if a vCPU gets a valid wakeup that KVM can't yet inject")
Link: https://syzkaller.appspot.com/text?tag=ReproC&x=10d4261a580000
Reported-by: syzbot+1522459a74d26b0ac33a@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/671bc7a7.050a0220.455e8.022a.GAE@google.com
Link: https://patch.msgid.link/20260109030657.994759-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: Fri Mar 13 17:20:49 2026 +0100
Linux 6.12.77
Link: https://lore.kernel.org/r/20260312201018.128816016@linuxfoundation.org
Tested-by: Brett A C Sheffield <bacs@librecast.net>
Tested-by: Shuah Khan <skhan@linuxfoundation.org>
Tested-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date: Wed Jan 8 10:04:47 2025 +0100
LoongArch/orc: Use RCU in all users of __module_address().
[ Upstream commit f99d27d9feb755aee9350fc89f57814d7e1b4880 ]
__module_address() can be invoked within a RCU section, there is no
requirement to have preemption disabled.
Replace the preempt_disable() section around __module_address() with
RCU.
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: loongarch@lists.linux.dev
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250108090457.512198-19-bigeasy@linutronix.de
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Stable-dep-of: 055c7e75190e ("LoongArch: Handle percpu handler address for ORC unwinder")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Tiezhu Yang <yangtiezhu@loongson.cn>
Date: Tue Feb 10 19:31:13 2026 +0800
LoongArch: Handle percpu handler address for ORC unwinder
[ Upstream commit 055c7e75190e0be43037bd663a3f6aced194416e ]
After commit 4cd641a79e69 ("LoongArch: Remove unnecessary checks for ORC
unwinder"), the system can not boot normally under some configs (such as
enable KASAN), there are many error messages "cannot find unwind pc".
The kernel boots normally with the defconfig, so no problem found out at
the first time. Here is one way to reproduce:
cd linux
make mrproper defconfig -j"$(nproc)"
scripts/config -e KASAN
make olddefconfig all -j"$(nproc)"
sudo make modules_install
sudo make install
sudo reboot
The address that can not unwind is not a valid kernel address which is
between "pcpu_handlers[cpu]" and "pcpu_handlers[cpu] + vec_sz" due to
the code of eentry was copied to the new area of pcpu_handlers[cpu] in
setup_tlb_handler(), handle this special case to get the valid address
to unwind normally.
Cc: stable@vger.kernel.org
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Tiezhu Yang <yangtiezhu@loongson.cn>
Date: Tue Feb 10 19:31:14 2026 +0800
LoongArch: Remove some extern variables in source files
[ Upstream commit 0e6f596d6ac635e80bb265d587b2287ef8fa1cd6 ]
There are declarations of the variable "eentry", "pcpu_handlers[]" and
"exception_handlers[]" in asm/setup.h, the source files already include
this header file directly or indirectly, so no need to declare them in
the source files, just remove the code.
Cc: stable@vger.kernel.org
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Tiezhu Yang <yangtiezhu@loongson.cn>
Date: Wed Dec 31 15:19:19 2025 +0800
LoongArch: Remove unnecessary checks for ORC unwinder
[ Upstream commit 4cd641a79e69270a062777f64a0dd330abb9044a ]
According to the following function definitions, __kernel_text_address()
already checks __module_text_address(), so it should remove the check of
__module_text_address() in bt_address() at least.
int __kernel_text_address(unsigned long addr)
{
if (kernel_text_address(addr))
return 1;
...
return 0;
}
int kernel_text_address(unsigned long addr)
{
bool no_rcu;
int ret = 1;
...
if (is_module_text_address(addr))
goto out;
...
return ret;
}
bool is_module_text_address(unsigned long addr)
{
guard(rcu)();
return __module_text_address(addr) != NULL;
}
Furthermore, there are two checks of __kernel_text_address(), one is in
bt_address() and the other is after calling bt_address(), it looks like
redundant.
Handle the exception address first and then use __kernel_text_address()
to validate the calculated address for exception or the normal address
in bt_address(), then it can remove the check of __kernel_text_address()
after calling bt_address().
Just remove unnecessary checks, no functional changes intended.
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Stable-dep-of: 055c7e75190e ("LoongArch: Handle percpu handler address for ORC unwinder")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Anup Patel <apatel@ventanamicro.com>
Date: Mon Aug 18 09:39:01 2025 +0530
mailbox: Allow controller specific mapping using fwnode
[ Upstream commit ba879dfc0574878f3e08f217b2b4fdf845c426c0 ]
Introduce optional fw_node() callback which allows a mailbox controller
driver to provide controller specific mapping using fwnode.
The Linux OF framework already implements fwnode operations for the
Linux DD framework so the fw_xlate() callback works fine with device
tree as well.
Acked-by: Jassi Brar <jassisinghbrar@gmail.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/r/20250818040920.272664-6-apatel@ventanamicro.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
Stable-dep-of: fcd7f96c7836 ("mailbox: Prevent out-of-bounds access in fw_mbox_index_xlate()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Tudor Ambarus <tudor.ambarus@linaro.org>
Date: Mon Feb 24 08:27:14 2025 +0000
mailbox: don't protect of_parse_phandle_with_args with con_mutex
[ Upstream commit 8c71c61fc613657d785a3377b4b34484bd978374 ]
There are no concurrency problems if multiple consumers parse the
phandle, don't gratuiously protect the parsing with the mutex used
for the controllers list.
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
Stable-dep-of: fcd7f96c7836 ("mailbox: Prevent out-of-bounds access in fw_mbox_index_xlate()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Joonwon Kang <joonwonkang@google.com>
Date: Wed Nov 26 06:22:50 2025 +0000
mailbox: Prevent out-of-bounds access in fw_mbox_index_xlate()
[ Upstream commit fcd7f96c783626c07ee3ed75fa3739a8a2052310 ]
Although it is guided that `#mbox-cells` must be at least 1, there are
many instances of `#mbox-cells = <0>;` in the device tree. If that is
the case and the corresponding mailbox controller does not provide
`fw_xlate` and of_xlate` function pointers, `fw_mbox_index_xlate()` will
be used by default and out-of-bounds accesses could occur due to lack of
bounds check in that function.
Cc: stable@vger.kernel.org
Signed-off-by: Joonwon Kang <joonwonkang@google.com>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Tudor Ambarus <tudor.ambarus@linaro.org>
Date: Mon Feb 24 08:27:17 2025 +0000
mailbox: remove unused header files
[ Upstream commit 4de14ec76b5e67d824896f774b3a23d86a2ebc87 ]
There's nothing used from these header files, remove their inclusion.
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
Stable-dep-of: fcd7f96c7836 ("mailbox: Prevent out-of-bounds access in fw_mbox_index_xlate()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Tudor Ambarus <tudor.ambarus@linaro.org>
Date: Mon Feb 24 08:27:15 2025 +0000
mailbox: sort headers alphabetically
[ Upstream commit db824c1119fc16556a84cb7a771ca6553b3c3a45 ]
Sorting headers alphabetically helps locating duplicates,
and makes it easier to figure out where to insert new headers.
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
Stable-dep-of: fcd7f96c7836 ("mailbox: Prevent out-of-bounds access in fw_mbox_index_xlate()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Peng Fan <peng.fan@nxp.com>
Date: Fri Apr 11 21:14:09 2025 +0800
mailbox: Use dev_err when there is error
[ Upstream commit 8da4988b6e645f3eaa590ea16f433583364fd09c ]
Use dev_err to show the error log instead of using dev_dbg.
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
Stable-dep-of: fcd7f96c7836 ("mailbox: Prevent out-of-bounds access in fw_mbox_index_xlate()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Peng Fan <peng.fan@nxp.com>
Date: Fri Apr 11 21:14:13 2025 +0800
mailbox: Use guard/scoped_guard for con_mutex
[ Upstream commit 16da9a653c5bf5d97fb296420899fe9735aa9c3c ]
Use guard and scoped_guard for con_mutex to simplify code.
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
Stable-dep-of: fcd7f96c7836 ("mailbox: Prevent out-of-bounds access in fw_mbox_index_xlate()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Jens Axboe <axboe@kernel.dk>
Date: Tue Feb 24 11:51:16 2026 -0700
media: dvb-core: fix wrong reinitialization of ringbuffer on reopen
commit bfbc0b5b32a8f28ce284add619bf226716a59bc0 upstream.
dvb_dvr_open() calls dvb_ringbuffer_init() when a new reader opens the
DVR device. dvb_ringbuffer_init() calls init_waitqueue_head(), which
reinitializes the waitqueue list head to empty.
Since dmxdev->dvr_buffer.queue is a shared waitqueue (all opens of the
same DVR device share it), this orphans any existing waitqueue entries
from io_uring poll or epoll, leaving them with stale prev/next pointers
while the list head is reset to {self, self}.
The waitqueue and spinlock in dvr_buffer are already properly
initialized once in dvb_dmxdev_init(). The open path only needs to
reset the buffer data pointer, size, and read/write positions.
Replace the dvb_ringbuffer_init() call in dvb_dvr_open() with direct
assignment of data/size and a call to dvb_ringbuffer_reset(), which
properly resets pread, pwrite, and error with correct memory ordering
without touching the waitqueue or spinlock.
Cc: stable@vger.kernel.org
Fixes: 34731df288a5f ("V4L/DVB (3501): Dmxdev: use dvb_ringbuffer")
Reported-by: syzbot+ab12f0c08dd7ab8d057c@syzkaller.appspotmail.com
Tested-by: syzbot+ab12f0c08dd7ab8d057c@syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/698a26d3.050a0220.3b3015.007d.GAE@google.com/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Matthias Fend <matthias.fend@emfend.at>
Date: Thu Jun 12 08:54:12 2025 +0200
media: dw9714: add support for powerdown pin
[ Upstream commit 03dca1842421b068d6a65b8ae16e2191882c7753 ]
Add support for the powerdown pin (xSD), which can be used to put the VCM
driver into power down mode. This is useful, for example, if the VCM
driver's power supply cannot be controlled. The use of the powerdown pin is
optional.
Signed-off-by: Matthias Fend <matthias.fend@emfend.at>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Stable-dep-of: 401aec35ac7b ("media: dw9714: Fix powerup sequence")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Ricardo Ribalda <ribalda@chromium.org>
Date: Wed Dec 10 07:53:43 2025 +0000
media: dw9714: Fix powerup sequence
[ Upstream commit 401aec35ac7bd04b4018a519257b945abb88e26c ]
We have experienced seen multiple I2C errors while doing stress test on
the module:
dw9714 i2c-PRP0001:01: dw9714_vcm_resume I2C failure: -5
dw9714 i2c-PRP0001:01: I2C write fail
Inspecting the powerup sequence we found that it does not match the
documentation at:
https://blog.arducam.com/downloads/DW9714A-DONGWOON(Autofocus_motor_manual).pdf
"""
(2) DW9714A requires waiting time of 12ms after power on. During this
waiting time, the offset calibration of internal amplifier is
operating for minimization of output offset current .
"""
This patch increases the powerup delay to follow the documentation.
Fixes: 9d00ccabfbb5 ("media: i2c: dw9714: Fix occasional probe errors")
Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
Reviewed-by: Hans de Goede <johannes.goede@oss.qualcomm.com>
Tested-by: Neil Sun <neil.sun@lcfuturecenter.com>
Reported-by: Naomi Huang <naomi.huang@lcfuturecenter.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Matthias Fend <matthias.fend@emfend.at>
Date: Thu Jun 12 08:54:11 2025 +0200
media: dw9714: move power sequences to dedicated functions
[ Upstream commit 1eefe42e9de503e422a9c925eebdbd215ee28966 ]
Move the power-up and power-down sequences to their own functions. This is
a preparation for the upcoming powerdown pin support.
Signed-off-by: Matthias Fend <matthias.fend@emfend.at>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Stable-dep-of: 401aec35ac7b ("media: dw9714: Fix powerup sequence")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Zilin Guan <zilin@seu.edu.cn>
Date: Fri Nov 14 09:12:57 2025 +0000
media: tegra-video: Fix memory leak in __tegra_channel_try_format()
[ Upstream commit 43e5302d22334f1183dec3e0d5d8007eefe2817c ]
The state object allocated by __v4l2_subdev_state_alloc() must be freed
with __v4l2_subdev_state_free() when it is no longer needed.
In __tegra_channel_try_format(), two error paths return directly after
v4l2_subdev_call() fails, without freeing the allocated 'sd_state'
object. This violates the requirement and causes a memory leak.
Fix this by introducing a cleanup label and using goto statements in the
error paths to ensure that __v4l2_subdev_state_free() is always called
before the function returns.
Fixes: 56f64b82356b7 ("media: tegra-video: Use zero crop settings if subdev has no get_selection")
Fixes: 1ebaeb09830f3 ("media: tegra-video: Add support for external sensor capture")
Cc: stable@vger.kernel.org
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Johan Hovold <johan@kernel.org>
Date: Fri Nov 21 17:46:23 2025 +0100
memory: mtk-smi: fix device leak on larb probe
[ Upstream commit 9dae65913b32d05dbc8ff4b8a6bf04a0e49a8eb6 ]
Make sure to drop the reference taken when looking up the SMI device
during larb probe on late probe failure (e.g. probe deferral) and on
driver unbind.
Fixes: cc8bbe1a8312 ("memory: mediatek: Add SMI driver")
Fixes: 038ae37c510f ("memory: mtk-smi: add missing put_device() call in mtk_smi_device_link_common")
Cc: stable@vger.kernel.org # 4.6: 038ae37c510f
Cc: stable@vger.kernel.org # 4.6
Cc: Yong Wu <yong.wu@mediatek.com>
Cc: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20251121164624.13685-3-johan@kernel.org
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Johan Hovold <johan@kernel.org>
Date: Fri Nov 21 17:46:22 2025 +0100
memory: mtk-smi: fix device leaks on common probe
[ Upstream commit 6cfa038bddd710f544076ea2ef7792fc82fbedd6 ]
Make sure to drop the reference taken when looking up the SMI device
during common probe on late probe failure (e.g. probe deferral) and on
driver unbind.
Fixes: 47404757702e ("memory: mtk-smi: Add device link for smi-sub-common")
Fixes: 038ae37c510f ("memory: mtk-smi: add missing put_device() call in mtk_smi_device_link_common")
Cc: stable@vger.kernel.org # 5.16: 038ae37c510f
Cc: stable@vger.kernel.org # 5.16
Cc: Yong Wu <yong.wu@mediatek.com>
Cc: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20251121164624.13685-2-johan@kernel.org
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Christian Brauner <brauner@kernel.org>
Date: Thu Jan 29 14:52:22 2026 +0100
namespace: fix proc mount iteration
commit 4a403d7aa9074f527f064ef0806aaab38d14b07c upstream.
The m->index isn't updated when m->show() overflows and retains its
value before the current mount causing a restart to start at the same
value. If that happens in short order to due a quickly expanding mount
table this would cause the same mount to be shown again and again.
Ensure that *pos always equals the mount id of the mount that was
returned by start/next. On restart after overflow mnt_find_id_at(*pos)
finds the exact mount. This should avoid duplicates, avoid skips and
should handle concurrent modification just fine.
Cc: <stable@vger.kernel.org>
Fixed: 2eea9ce4310d8 ("mounts: keep list of mounts in an rbtree")
Link: https://patch.msgid.link/20260129-geleckt-treuhand-4bb940acacd9@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Allison Henderson <achender@kernel.org>
Date: Fri Feb 27 13:23:36 2026 -0700
net/rds: Fix circular locking dependency in rds_tcp_tune
[ Upstream commit 6a877ececd6daa002a9a0002cd0fbca6592a9244 ]
syzbot reported a circular locking dependency in rds_tcp_tune() where
sk_net_refcnt_upgrade() is called while holding the socket lock:
======================================================
WARNING: possible circular locking dependency detected
======================================================
kworker/u10:8/15040 is trying to acquire lock:
ffffffff8e9aaf80 (fs_reclaim){+.+.}-{0:0},
at: __kmalloc_cache_noprof+0x4b/0x6f0
but task is already holding lock:
ffff88805a3c1ce0 (k-sk_lock-AF_INET6){+.+.}-{0:0},
at: rds_tcp_tune+0xd7/0x930
The issue occurs because sk_net_refcnt_upgrade() performs memory
allocation (via get_net_track() -> ref_tracker_alloc()) while the
socket lock is held, creating a circular dependency with fs_reclaim.
Fix this by moving sk_net_refcnt_upgrade() outside the socket lock
critical section. This is safe because the fields modified by the
sk_net_refcnt_upgrade() call (sk_net_refcnt, ns_tracker) are not
accessed by any concurrent code path at this point.
v2:
- Corrected fixes tag
- check patch line wrap nits
- ai commentary nits
Reported-by: syzbot+2e2cf5331207053b8106@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=2e2cf5331207053b8106
Fixes: 3a58f13a881e ("net: rds: acquire refcount on TCP sockets")
Signed-off-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/20260227202336.167757-1-achender@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Jamal Hadi Salim <jhs@mojatatu.com>
Date: Wed Mar 4 09:06:02 2026 -0500
net/sched: act_ife: Fix metalist update behavior
[ Upstream commit e2cedd400c3ec0302ffca2490e8751772906ac23 ]
Whenever an ife action replace changes the metalist, instead of
replacing the old data on the metalist, the current ife code is appending
the new metadata. Aside from being innapropriate behavior, this may lead
to an unbounded addition of metadata to the metalist which might cause an
out of bounds error when running the encode op:
[ 138.423369][ C1] ==================================================================
[ 138.424317][ C1] BUG: KASAN: slab-out-of-bounds in ife_tlv_meta_encode (net/ife/ife.c:168)
[ 138.424906][ C1] Write of size 4 at addr ffff8880077f4ffe by task ife_out_out_bou/255
[ 138.425778][ C1] CPU: 1 UID: 0 PID: 255 Comm: ife_out_out_bou Not tainted 7.0.0-rc1-00169-gfbdfa8da05b6 #624 PREEMPT(full)
[ 138.425795][ C1] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 138.425800][ C1] Call Trace:
[ 138.425804][ C1] <IRQ>
[ 138.425808][ C1] dump_stack_lvl (lib/dump_stack.c:122)
[ 138.425828][ C1] print_report (mm/kasan/report.c:379 mm/kasan/report.c:482)
[ 138.425839][ C1] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221)
[ 138.425844][ C1] ? __virt_addr_valid (./arch/x86/include/asm/preempt.h:95 (discriminator 1) ./include/linux/rcupdate.h:975 (discriminator 1) ./include/linux/mmzone.h:2207 (discriminator 1) arch/x86/mm/physaddr.c:54 (discriminator 1))
[ 138.425853][ C1] ? ife_tlv_meta_encode (net/ife/ife.c:168)
[ 138.425859][ C1] kasan_report (mm/kasan/report.c:221 mm/kasan/report.c:597)
[ 138.425868][ C1] ? ife_tlv_meta_encode (net/ife/ife.c:168)
[ 138.425878][ C1] kasan_check_range (mm/kasan/generic.c:186 (discriminator 1) mm/kasan/generic.c:200 (discriminator 1))
[ 138.425884][ C1] __asan_memset (mm/kasan/shadow.c:84 (discriminator 2))
[ 138.425889][ C1] ife_tlv_meta_encode (net/ife/ife.c:168)
[ 138.425893][ C1] ? ife_tlv_meta_encode (net/ife/ife.c:171)
[ 138.425898][ C1] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221)
[ 138.425903][ C1] ife_encode_meta_u16 (net/sched/act_ife.c:57)
[ 138.425910][ C1] ? __pfx_do_raw_spin_lock (kernel/locking/spinlock_debug.c:114)
[ 138.425916][ C1] ? __asan_memcpy (mm/kasan/shadow.c:105 (discriminator 3))
[ 138.425921][ C1] ? __pfx_ife_encode_meta_u16 (net/sched/act_ife.c:45)
[ 138.425927][ C1] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221)
[ 138.425931][ C1] tcf_ife_act (net/sched/act_ife.c:847 net/sched/act_ife.c:879)
To solve this issue, fix the replace behavior by adding the metalist to
the ife rcu data structure.
Fixes: aa9fd9a325d51 ("sched: act: ife: update parameters via rcu handling")
Reported-by: Ruitong Liu <cnitlrt@gmail.com>
Tested-by: Ruitong Liu <cnitlrt@gmail.com>
Co-developed-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20260304140603.76500-1-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Davide Caratti <dcaratti@redhat.com>
Date: Tue Feb 24 21:28:32 2026 +0100
net/sched: ets: fix divide by zero in the offload path
commit e35626f610f3d2b7953ccddf6a77453da22b3a9e upstream.
Offloading ETS requires computing each class' WRR weight: this is done by
averaging over the sums of quanta as 'q_sum' and 'q_psum'. Using unsigned
int, the same integer size as the individual DRR quanta, can overflow and
even cause division by zero, like it happened in the following splat:
Oops: divide error: 0000 [#1] SMP PTI
CPU: 13 UID: 0 PID: 487 Comm: tc Tainted: G E 6.19.0-virtme #45 PREEMPT(full)
Tainted: [E]=UNSIGNED_MODULE
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
RIP: 0010:ets_offload_change+0x11f/0x290 [sch_ets]
Code: e4 45 31 ff eb 03 41 89 c7 41 89 cb 89 ce 83 f9 0f 0f 87 b7 00 00 00 45 8b 08 31 c0 45 01 cc 45 85 c9 74 09 41 6b c4 64 31 d2 <41> f7 f2 89 c2 44 29 fa 45 89 df 41 83 fb 0f 0f 87 c7 00 00 00 44
RSP: 0018:ffffd0a180d77588 EFLAGS: 00010246
RAX: 00000000ffffff38 RBX: ffff8d3d482ca000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffd0a180d77660
RBP: ffffd0a180d77690 R08: ffff8d3d482ca2d8 R09: 00000000fffffffe
R10: 0000000000000000 R11: 0000000000000000 R12: 00000000fffffffe
R13: ffff8d3d472f2000 R14: 0000000000000003 R15: 0000000000000000
FS: 00007f440b6c2740(0000) GS:ffff8d3dc9803000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000003cdd2000 CR3: 0000000007b58002 CR4: 0000000000172ef0
Call Trace:
<TASK>
ets_qdisc_change+0x870/0xf40 [sch_ets]
qdisc_create+0x12b/0x540
tc_modify_qdisc+0x6d7/0xbd0
rtnetlink_rcv_msg+0x168/0x6b0
netlink_rcv_skb+0x5c/0x110
netlink_unicast+0x1d6/0x2b0
netlink_sendmsg+0x22e/0x470
____sys_sendmsg+0x38a/0x3c0
___sys_sendmsg+0x99/0xe0
__sys_sendmsg+0x8a/0xf0
do_syscall_64+0x111/0xf80
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f440b81c77e
Code: 4d 89 d8 e8 d4 bc 00 00 4c 8b 5d f8 41 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 11 c9 c3 0f 1f 80 00 00 00 00 48 8b 45 10 0f 05 <c9> c3 83 e2 39 83 fa 08 75 e7 e8 13 ff ff ff 0f 1f 00 f3 0f 1e fa
RSP: 002b:00007fff951e4c10 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000481820 RCX: 00007f440b81c77e
RDX: 0000000000000000 RSI: 00007fff951e4cd0 RDI: 0000000000000003
RBP: 00007fff951e4c20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000202 R12: 00007fff951f4fa8
R13: 00000000699ddede R14: 00007f440bb01000 R15: 0000000000486980
</TASK>
Modules linked in: sch_ets(E) netdevsim(E)
---[ end trace 0000000000000000 ]---
RIP: 0010:ets_offload_change+0x11f/0x290 [sch_ets]
Code: e4 45 31 ff eb 03 41 89 c7 41 89 cb 89 ce 83 f9 0f 0f 87 b7 00 00 00 45 8b 08 31 c0 45 01 cc 45 85 c9 74 09 41 6b c4 64 31 d2 <41> f7 f2 89 c2 44 29 fa 45 89 df 41 83 fb 0f 0f 87 c7 00 00 00 44
RSP: 0018:ffffd0a180d77588 EFLAGS: 00010246
RAX: 00000000ffffff38 RBX: ffff8d3d482ca000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffd0a180d77660
RBP: ffffd0a180d77690 R08: ffff8d3d482ca2d8 R09: 00000000fffffffe
R10: 0000000000000000 R11: 0000000000000000 R12: 00000000fffffffe
R13: ffff8d3d472f2000 R14: 0000000000000003 R15: 0000000000000000
FS: 00007f440b6c2740(0000) GS:ffff8d3dc9803000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000003cdd2000 CR3: 0000000007b58002 CR4: 0000000000172ef0
Kernel panic - not syncing: Fatal exception
Kernel Offset: 0x30000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
---[ end Kernel panic - not syncing: Fatal exception ]---
Fix this using 64-bit integers for 'q_sum' and 'q_psum'.
Cc: stable@vger.kernel.org
Fixes: d35eb52bd2ac ("net: sch_ets: Make the ETS qdisc offloadable")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/28504887df314588c7255e9911769c36f751edee.1771964872.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Victor Nogueira <victor@mojatatu.com>
Date: Wed Feb 25 10:43:48 2026 -0300
net/sched: Only allow act_ct to bind to clsact/ingress qdiscs and shared blocks
commit 11cb63b0d1a0685e0831ae3c77223e002ef18189 upstream.
As Paolo said earlier [1]:
"Since the blamed commit below, classify can return TC_ACT_CONSUMED while
the current skb being held by the defragmentation engine. As reported by
GangMin Kim, if such packet is that may cause a UaF when the defrag engine
later on tries to tuch again such packet."
act_ct was never meant to be used in the egress path, however some users
are attaching it to egress today [2]. Attempting to reach a middle
ground, we noticed that, while most qdiscs are not handling
TC_ACT_CONSUMED, clsact/ingress qdiscs are. With that in mind, we
address the issue by only allowing act_ct to bind to clsact/ingress
qdiscs and shared blocks. That way it's still possible to attach act_ct to
egress (albeit only with clsact).
[1] https://lore.kernel.org/netdev/674b8cbfc385c6f37fb29a1de08d8fe5c2b0fbee.1771321118.git.pabeni@redhat.com/
[2] https://lore.kernel.org/netdev/cc6bfb4a-4a2b-42d8-b9ce-7ef6644fb22b@ovn.org/
Reported-by: GangMin Kim <km.kim1503@gmail.com>
Fixes: 3f14b377d01d ("net/sched: act_ct: fix skb leak and crash on ooo frags")
CC: stable@vger.kernel.org
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20260225134349.1287037-1-victor@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Date: Thu Feb 12 20:55:09 2026 -0800
net: arcnet: com20020-pci: fix support for 2.5Mbit cards
[ Upstream commit c7d9be66b71af490446127c6ffcb66d6bb71b8b9 ]
Commit 8c14f9c70327 ("ARCNET: add com20020 PCI IDs with metadata")
converted the com20020-pci driver to use a card info structure instead
of a single flag mask in driver_data. However, it failed to take into
account that in the original code, driver_data of 0 indicates a card
with no special flags, not a card that should not have any card info
structure. This introduced a null pointer dereference when cards with
no flags were probed.
Commit bd6f1fd5d33d ("net: arcnet: com20020: Fix null-ptr-deref in
com20020pci_probe()") then papered over this issue by rejecting cards
with no driver_data instead of resolving the problem at its source.
Fix the original issue by introducing a new card info structure for
2.5Mbit cards that does not set any flags and using it if no
driver_data is present.
Fixes: 8c14f9c70327 ("ARCNET: add com20020 PCI IDs with metadata")
Fixes: bd6f1fd5d33d ("net: arcnet: com20020: Fix null-ptr-deref in com20020pci_probe()")
Cc: stable@vger.kernel.org
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Link: https://patch.msgid.link/20260213045510.32368-1-enelsonmoore@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Fernando Fernandez Mancera <fmancera@suse.de>
Date: Wed Mar 4 13:03:56 2026 +0100
net: bridge: fix nd_tbl NULL dereference when IPv6 is disabled
[ Upstream commit e5e890630533bdc15b26a34bb8e7ef539bdf1322 ]
When booting with the 'ipv6.disable=1' parameter, the nd_tbl is never
initialized because inet6_init() exits before ndisc_init() is called
which initializes it. Then, if neigh_suppress is enabled and an ICMPv6
Neighbor Discovery packet reaches the bridge, br_do_suppress_nd() will
dereference ipv6_stub->nd_tbl which is NULL, passing it to
neigh_lookup(). This causes a kernel NULL pointer dereference.
BUG: kernel NULL pointer dereference, address: 0000000000000268
Oops: 0000 [#1] PREEMPT SMP NOPTI
[...]
RIP: 0010:neigh_lookup+0x16/0xe0
[...]
Call Trace:
<IRQ>
? neigh_lookup+0x16/0xe0
br_do_suppress_nd+0x160/0x290 [bridge]
br_handle_frame_finish+0x500/0x620 [bridge]
br_handle_frame+0x353/0x440 [bridge]
__netif_receive_skb_core.constprop.0+0x298/0x1110
__netif_receive_skb_one_core+0x3d/0xa0
process_backlog+0xa0/0x140
__napi_poll+0x2c/0x170
net_rx_action+0x2c4/0x3a0
handle_softirqs+0xd0/0x270
do_softirq+0x3f/0x60
Fix this by replacing IS_ENABLED(IPV6) call with ipv6_mod_enabled() in
the callers. This is in essence disabling NS/NA suppression when IPv6 is
disabled.
Fixes: ed842faeb2bd ("bridge: suppress nd pkts on BR_NEIGH_SUPPRESS ports")
Reported-by: Guruprasad C P <gurucp2005@gmail.com>
Closes: https://lore.kernel.org/netdev/CAHXs0ORzd62QOG-Fttqa2Cx_A_VFp=utE2H2VTX5nqfgs7LDxQ@mail.gmail.com/
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20260304120357.9778-1-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Mieczyslaw Nalewaj <namiltd@yahoo.com>
Date: Sun Mar 1 18:13:14 2026 -0300
net: dsa: realtek: rtl8365mb: fix rtl8365mb_phy_ocp_write return value
[ Upstream commit 7cbe98f7bef965241a5908d50d557008cf998aee ]
Function rtl8365mb_phy_ocp_write() always returns 0, even when an error
occurs during register access. This patch fixes the return value to
propagate the actual error code from regmap operations.
Link: https://lore.kernel.org/netdev/a2dfde3c-d46f-434b-9d16-1e251e449068@yahoo.com/
Fixes: 2796728460b8 ("net: dsa: realtek: rtl8365mb: serialize indirect PHY register access")
Signed-off-by: Mieczyslaw Nalewaj <namiltd@yahoo.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Link: https://patch.msgid.link/20260301-realtek_namiltd_fix1-v1-1-43a6bb707f9c@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Tue Mar 3 18:56:39 2026 +0100
net: ethernet: mtk_eth_soc: Reset prog ptr to old_prog in case of error in mtk_xdp_setup()
[ Upstream commit 0abc73c8a40fd64ac1739c90bb4f42c418d27a5e ]
Reset eBPF program pointer to old_prog and do not decrease its ref-count
if mtk_open routine in mtk_xdp_setup() fails.
Fixes: 7c26c20da5d42 ("net: ethernet: mtk_eth_soc: add basic XDP support")
Suggested-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20260303-mtk-xdp-prog-ptr-fix-v2-1-97b6dbbe240f@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Chintan Vankar <c-vankar@ti.com>
Date: Tue Feb 24 23:43:59 2026 +0530
net: ethernet: ti: am65-cpsw-nuss/cpsw-ale: Fix multicast entry handling in ALE table
[ Upstream commit be11a537224d72b906db6b98510619770298c8a4 ]
In the current implementation, flushing multicast entries in MAC mode
incorrectly deletes entries for all ports instead of only the target port,
disrupting multicast traffic on other ports. The cause is adding multicast
entries by setting only host port bit, and not setting the MAC port bits.
Fix this by setting the MAC port's bit in the port mask while adding the
multicast entry. Also fix the flush logic to preserve the host port bit
during removal of MAC port and free ALE entries when mask contains only
host port.
Fixes: 5c50a856d550 ("drivers: net: ethernet: cpsw: add multicast address to ALE table")
Signed-off-by: Chintan Vankar <c-vankar@ti.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260224181359.2055322-1-c-vankar@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Yung Chih Su <yuuchihsu@gmail.com>
Date: Mon Mar 2 14:02:47 2026 +0800
net: ipv4: fix ARM64 alignment fault in multipath hash seed
[ Upstream commit 4ee7fa6cf78ff26d783d39e2949d14c4c1cd5e7f ]
`struct sysctl_fib_multipath_hash_seed` contains two u32 fields
(user_seed and mp_seed), making it an 8-byte structure with a 4-byte
alignment requirement.
In `fib_multipath_hash_from_keys()`, the code evaluates the entire
struct atomically via `READ_ONCE()`:
mp_seed = READ_ONCE(net->ipv4.sysctl_fib_multipath_hash_seed).mp_seed;
While this silently works on GCC by falling back to unaligned regular
loads which the ARM64 kernel tolerates, it causes a fatal kernel panic
when compiled with Clang and LTO enabled.
Commit e35123d83ee3 ("arm64: lto: Strengthen READ_ONCE() to acquire
when CONFIG_LTO=y") strengthens `READ_ONCE()` to use Load-Acquire
instructions (`ldar` / `ldapr`) to prevent compiler reordering bugs
under Clang LTO. Since the macro evaluates the full 8-byte struct,
Clang emits a 64-bit `ldar` instruction. ARM64 architecture strictly
requires `ldar` to be naturally aligned, thus executing it on a 4-byte
aligned address triggers a strict Alignment Fault (FSC = 0x21).
Fix the read side by moving the `READ_ONCE()` directly to the `u32`
member, which emits a safe 32-bit `ldar Wn`.
Furthermore, Eric Dumazet pointed out that `WRITE_ONCE()` on the entire
struct in `proc_fib_multipath_hash_set_seed()` is also flawed. Analysis
shows that Clang splits this 8-byte write into two separate 32-bit
`str` instructions. While this avoids an alignment fault, it destroys
atomicity and exposes a tear-write vulnerability. Fix this by
explicitly splitting the write into two 32-bit `WRITE_ONCE()`
operations.
Finally, add the missing `READ_ONCE()` when reading `user_seed` in
`proc_fib_multipath_hash_seed()` to ensure proper pairing and
concurrency safety.
Fixes: 4ee2a8cace3f ("net: ipv4: Add a sysctl to set multipath hash seed")
Signed-off-by: Yung Chih Su <yuuchihsu@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260302060247.7066-1-yuuchihsu@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Jiayuan Chen <jiayuan.chen@shopee.com>
Date: Wed Mar 4 19:38:13 2026 +0800
net: ipv6: fix panic when IPv4 route references loopback IPv6 nexthop
[ Upstream commit 21ec92774d1536f71bdc90b0e3d052eff99cf093 ]
When a standalone IPv6 nexthop object is created with a loopback device
(e.g., "ip -6 nexthop add id 100 dev lo"), fib6_nh_init() misclassifies
it as a reject route. This is because nexthop objects have no destination
prefix (fc_dst=::), causing fib6_is_reject() to match any loopback
nexthop. The reject path skips fib_nh_common_init(), leaving
nhc_pcpu_rth_output unallocated. If an IPv4 route later references this
nexthop, __mkroute_output() dereferences NULL nhc_pcpu_rth_output and
panics.
Simplify the check in fib6_nh_init() to only match explicit reject
routes (RTF_REJECT) instead of using fib6_is_reject(). The loopback
promotion heuristic in fib6_is_reject() is handled separately by
ip6_route_info_create_nh(). After this change, the three cases behave
as follows:
1. Explicit reject route ("ip -6 route add unreachable 2001:db8::/64"):
RTF_REJECT is set, enters reject path, skips fib_nh_common_init().
No behavior change.
2. Implicit loopback reject route ("ip -6 route add 2001:db8::/32 dev lo"):
RTF_REJECT is not set, takes normal path, fib_nh_common_init() is
called. ip6_route_info_create_nh() still promotes it to reject
afterward. nhc_pcpu_rth_output is allocated but unused, which is
harmless.
3. Standalone nexthop object ("ip -6 nexthop add id 100 dev lo"):
RTF_REJECT is not set, takes normal path, fib_nh_common_init() is
called. nhc_pcpu_rth_output is properly allocated, fixing the crash
when IPv4 routes reference this nexthop.
Suggested-by: Ido Schimmel <idosch@nvidia.com>
Fixes: 493ced1ac47c ("ipv4: Allow routes to use nexthop objects")
Reported-by: syzbot+334190e097a98a1b81bb@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/698f8482.a70a0220.2c38d7.00ca.GAE@google.com/T/
Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260304113817.294966-2-jiayuan.chen@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Ian Ray <ian.ray@gehealthcare.com>
Date: Mon Mar 2 18:32:37 2026 +0200
net: nfc: nci: Fix zero-length proprietary notifications
[ Upstream commit f7d92f11bd33a6eb49c7c812255ef4ab13681f0f ]
NCI NFC controllers may have proprietary OIDs with zero-length payload.
One example is: drivers/nfc/nxp-nci/core.c, NXP_NCI_RF_TXLDO_ERROR_NTF.
Allow a zero length payload in proprietary notifications *only*.
Before:
-- >8 --
kernel: nci: nci_recv_frame: len 3
-- >8 --
After:
-- >8 --
kernel: nci: nci_recv_frame: len 3
kernel: nci: nci_ntf_packet: NCI RX: MT=ntf, PBF=0, GID=0x1, OID=0x23, plen=0
kernel: nci: nci_ntf_packet: unknown ntf opcode 0x123
kernel: nfc nfc0: NFC: RF transmitter couldn't start. Bad power and/or configuration?
-- >8 --
After fixing the hardware:
-- >8 --
kernel: nci: nci_recv_frame: len 27
kernel: nci: nci_ntf_packet: NCI RX: MT=ntf, PBF=0, GID=0x1, OID=0x5, plen=24
kernel: nci: nci_rf_intf_activated_ntf_packet: rf_discovery_id 1
-- >8 --
Fixes: d24b03535e5e ("nfc: nci: Fix uninit-value in nci_dev_up and nci_ntf_packet")
Signed-off-by: Ian Ray <ian.ray@gehealthcare.com>
Link: https://patch.msgid.link/20260302163238.140576-1-ian.ray@gehealthcare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Koichiro Den <den@valinux.co.jp>
Date: Sat Feb 28 23:53:07 2026 +0900
net: sched: avoid qdisc_reset_all_tx_gt() vs dequeue race for lockless qdiscs
[ Upstream commit 7f083faf59d14c04e01ec05a7507f036c965acf8 ]
When shrinking the number of real tx queues,
netif_set_real_num_tx_queues() calls qdisc_reset_all_tx_gt() to flush
qdiscs for queues which will no longer be used.
qdisc_reset_all_tx_gt() currently serializes qdisc_reset() with
qdisc_lock(). However, for lockless qdiscs, the dequeue path is
serialized by qdisc_run_begin/end() using qdisc->seqlock instead, so
qdisc_reset() can run concurrently with __qdisc_run() and free skbs
while they are still being dequeued, leading to UAF.
This can easily be reproduced on e.g. virtio-net by imposing heavy
traffic while frequently changing the number of queue pairs:
iperf3 -ub0 -c $peer -t 0 &
while :; do
ethtool -L eth0 combined 1
ethtool -L eth0 combined 2
done
With KASAN enabled, this leads to reports like:
BUG: KASAN: slab-use-after-free in __qdisc_run+0x133f/0x1760
...
Call Trace:
<TASK>
...
__qdisc_run+0x133f/0x1760
__dev_queue_xmit+0x248f/0x3550
ip_finish_output2+0xa42/0x2110
ip_output+0x1a7/0x410
ip_send_skb+0x2e6/0x480
udp_send_skb+0xb0a/0x1590
udp_sendmsg+0x13c9/0x1fc0
...
</TASK>
Allocated by task 1270 on cpu 5 at 44.558414s:
...
alloc_skb_with_frags+0x84/0x7c0
sock_alloc_send_pskb+0x69a/0x830
__ip_append_data+0x1b86/0x48c0
ip_make_skb+0x1e8/0x2b0
udp_sendmsg+0x13a6/0x1fc0
...
Freed by task 1306 on cpu 3 at 44.558445s:
...
kmem_cache_free+0x117/0x5e0
pfifo_fast_reset+0x14d/0x580
qdisc_reset+0x9e/0x5f0
netif_set_real_num_tx_queues+0x303/0x840
virtnet_set_channels+0x1bf/0x260 [virtio_net]
ethnl_set_channels+0x684/0xae0
ethnl_default_set_doit+0x31a/0x890
...
Serialize qdisc_reset_all_tx_gt() against the lockless dequeue path by
taking qdisc->seqlock for TCQ_F_NOLOCK qdiscs, matching the
serialization model already used by dev_reset_queue().
Additionally clear QDISC_STATE_NON_EMPTY after reset so the qdisc state
reflects an empty queue, avoiding needless re-scheduling.
Fixes: 6b3ba9146fe6 ("net: sched: allow qdiscs to handle locking")
Signed-off-by: Koichiro Den <den@valinux.co.jp>
Link: https://patch.msgid.link/20260228145307.3955532-1-den@valinux.co.jp
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Huacai Chen <chenhuacai@kernel.org>
Date: Tue Feb 3 14:29:01 2026 +0800
net: stmmac: dwmac-loongson: Set clk_csr_i to 100-150MHz
commit e1aa5ef892fb4fa9014a25e87b64b97347919d37 upstream.
Current clk_csr_i setting of Loongson STMMAC (including LS7A1000/2000
and LS2K1000/2000/3000) are copy & paste from other drivers. In fact,
Loongson STMMAC use 125MHz clocks and need 62 freq division to within
2.5MHz, meeting most PHY MDC requirement. So fix by setting clk_csr_i
to 100-150MHz, otherwise some PHYs may link fail.
Cc: stable@vger.kernel.org
Fixes: 30bba69d7db40e7 ("stmmac: pci: Add dwmac support for Loongson")
Signed-off-by: Hongliang Wang <wanghongliang@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Link: https://patch.msgid.link/20260203062901.2158236-1-chenhuacai@loongson.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Ovidiu Panait <ovidiu.panait.rb@renesas.com>
Date: Tue Mar 3 14:58:25 2026 +0000
net: stmmac: Fix error handling in VLAN add and delete paths
[ Upstream commit 35dfedce442c4060cfe5b98368bc9643fb995716 ]
stmmac_vlan_rx_add_vid() updates active_vlans and the VLAN hash
register before writing the HW filter entry. If the filter write
fails, it leaves a stale VID in active_vlans and the hash register.
stmmac_vlan_rx_kill_vid() has the reverse problem: it clears
active_vlans before removing the HW filter. On failure, the VID is
gone from active_vlans but still present in the HW filter table.
To fix this, reorder the operations to update the hash table first,
then attempt the HW filter operation. If the HW filter fails, roll
back both the active_vlans bitmap and the hash table by calling
stmmac_vlan_update() again.
Fixes: ed64639bc1e0 ("net: stmmac: Add support for VLAN Rx filtering")
Signed-off-by: Ovidiu Panait <ovidiu.panait.rb@renesas.com>
Link: https://patch.msgid.link/20260303145828.7845-2-ovidiu.panait.rb@renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: MD Danish Anwar <danishanwar@ti.com>
Date: Thu Feb 26 15:53:56 2026 +0530
net: ti: icssg-prueth: Fix ping failure after offload mode setup when link speed is not 1G
[ Upstream commit 147792c395db870756a0dc87ce656c75ae7ab7e8 ]
When both eth interfaces with links up are added to a bridge or hsr
interface, ping fails if the link speed is not 1Gbps (e.g., 100Mbps).
The issue is seen because when switching to offload (bridge/hsr) mode,
prueth_emac_restart() restarts the firmware and clears DRAM with
memset_io(), setting all memory to 0. This includes PORT_LINK_SPEED_OFFSET
which firmware reads for link speed. The value 0 corresponds to
FW_LINK_SPEED_1G (0x00), so for 1Gbps links the default value is correct
and ping works. For 100Mbps links, the firmware needs FW_LINK_SPEED_100M
(0x01) but gets 0 instead, causing ping to fail. The function
emac_adjust_link() is called to reconfigure, but it detects no state change
(emac->link is still 1, speed/duplex match PHY) so new_state remains false
and icssg_config_set_speed() is never called to correct the firmware speed
value.
The fix resets emac->link to 0 before calling emac_adjust_link() in
prueth_emac_common_start(). This forces new_state=true, ensuring
icssg_config_set_speed() is called to write the correct speed value to
firmware memory.
Fixes: 06feac15406f ("net: ti: icssg-prueth: Fix emac link speed handling")
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Link: https://patch.msgid.link/20260226102356.2141871-1-danishanwar@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: Mon Feb 23 13:59:26 2026 +0100
net: usb: kalmia: validate USB endpoints
commit c58b6c29a4c9b8125e8ad3bca0637e00b71e2693 upstream.
The kalmia driver should validate that the device it is probing has the
proper number and types of USB endpoints it is expecting before it binds
to it. If a malicious device were to not have the same urbs the driver
will crash later on when it blindly accesses these endpoints.
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Fixes: d40261236e8e ("net/usb: Add Samsung Kalmia driver for Samsung GT-B3730")
Link: https://patch.msgid.link/2026022326-shack-headstone-ef6f@gregkh
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: Mon Feb 23 14:00:06 2026 +0100
net: usb: kaweth: validate USB endpoints
commit 4b063c002ca759d1b299988ee23f564c9609c875 upstream.
The kaweth driver should validate that the device it is probing has the
proper number and types of USB endpoints it is expecting before it binds
to it. If a malicious device were to not have the same urbs the driver
will crash later on when it blindly accesses these endpoints.
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Link: https://patch.msgid.link/2026022305-substance-virtual-c728@gregkh
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: Mon Feb 23 13:58:48 2026 +0100
net: usb: pegasus: validate USB endpoints
commit 11de1d3ae5565ed22ef1f89d73d8f2d00322c699 upstream.
The pegasus driver should validate that the device it is probing has the
proper number and types of USB endpoints it is expecting before it binds
to it. If a malicious device were to not have the same urbs the driver
will crash later on when it blindly accesses these endpoints.
Cc: Petko Manolov <petkan@nucleusys.com>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://patch.msgid.link/2026022347-legibly-attest-cc5c@gregkh
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Fernando Fernandez Mancera <fmancera@suse.de>
Date: Wed Mar 4 13:03:57 2026 +0100
net: vxlan: fix nd_tbl NULL dereference when IPv6 is disabled
[ Upstream commit 168ff39e4758897d2eee4756977d036d52884c7e ]
When booting with the 'ipv6.disable=1' parameter, the nd_tbl is never
initialized because inet6_init() exits before ndisc_init() is called
which initializes it. If an IPv6 packet is injected into the interface,
route_shortcircuit() is called and a NULL pointer dereference happens on
neigh_lookup().
BUG: kernel NULL pointer dereference, address: 0000000000000380
Oops: Oops: 0000 [#1] SMP NOPTI
[...]
RIP: 0010:neigh_lookup+0x20/0x270
[...]
Call Trace:
<TASK>
vxlan_xmit+0x638/0x1ef0 [vxlan]
dev_hard_start_xmit+0x9e/0x2e0
__dev_queue_xmit+0xbee/0x14e0
packet_sendmsg+0x116f/0x1930
__sys_sendto+0x1f5/0x200
__x64_sys_sendto+0x24/0x30
do_syscall_64+0x12f/0x1590
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Fix this by adding an early check on route_shortcircuit() when protocol
is ETH_P_IPV6. Note that ipv6_mod_enabled() cannot be used here because
VXLAN can be built-in even when IPv6 is built as a module.
Fixes: e15a00aafa4b ("vxlan: add ipv6 route short circuit support")
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Link: https://patch.msgid.link/20260304120357.9778-2-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Eric Dumazet <edumazet@google.com>
Date: Wed Mar 4 01:56:40 2026 +0000
net_sched: sch_fq: clear q->band_pkt_count[] in fq_reset()
[ Upstream commit a4c2b8be2e5329e7fac6e8f64ddcb8958155cfcb ]
When/if a NIC resets, queues are deactivated by dev_deactivate_many(),
then reactivated when the reset operation completes.
fq_reset() removes all the skbs from various queues.
If we do not clear q->band_pkt_count[], these counters keep growing
and can eventually reach sch->limit, preventing new packets to be queued.
Many thanks to Praveen for discovering the root cause.
Fixes: 29f834aa326e ("net_sched: sch_fq: add 3 bands and WRR scheduling")
Diagnosed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260304015640.961780-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Florian Westphal <fw@strlen.de>
Date: Tue Mar 3 16:31:32 2026 +0100
netfilter: nft_set_pipapo: split gc into unlink and reclaim phase
[ Upstream commit 9df95785d3d8302f7c066050117b04cd3c2048c2 ]
Yiming Qian reports Use-after-free in the pipapo set type:
Under a large number of expired elements, commit-time GC can run for a very
long time in a non-preemptible context, triggering soft lockup warnings and
RCU stall reports (local denial of service).
We must split GC in an unlink and a reclaim phase.
We cannot queue elements for freeing until pointers have been swapped.
Expired elements are still exposed to both the packet path and userspace
dumpers via the live copy of the data structure.
call_rcu() does not protect us: dump operations or element lookups starting
after call_rcu has fired can still observe the free'd element, unless the
commit phase has made enough progress to swap the clone and live pointers
before any new reader has picked up the old version.
This a similar approach as done recently for the rbtree backend in commit
35f83a75529a ("netfilter: nft_set_rbtree: don't gc elements on insert").
Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Reported-by: Yiming Qian <yimingqian591@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Jakub Kicinski <kuba@kernel.org>
Date: Tue Mar 3 08:23:44 2026 -0800
nfc: nci: clear NCI_DATA_EXCHANGE before calling completion callback
[ Upstream commit 0efdc02f4f6d52f8ca5d5889560f325a836ce0a8 ]
Move clear_bit(NCI_DATA_EXCHANGE) before invoking the data exchange
callback in nci_data_exchange_complete().
The callback (e.g. rawsock_data_exchange_complete) may immediately
schedule another data exchange via schedule_work(tx_work). On a
multi-CPU system, tx_work can run and reach nci_transceive() before
the current nci_data_exchange_complete() clears the flag, causing
test_and_set_bit(NCI_DATA_EXCHANGE) to return -EBUSY and the new
transfer to fail.
This causes intermittent flakes in nci/nci_dev in NIPA:
# # RUN NCI.NCI1_0.t4t_tag_read ...
# # t4t_tag_read: Test terminated by timeout
# # FAIL NCI.NCI1_0.t4t_tag_read
# not ok 3 NCI.NCI1_0.t4t_tag_read
Fixes: 38f04c6b1b68 ("NFC: protect nci_data_exchange transactions")
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20260303162346.2071888-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Jakub Kicinski <kuba@kernel.org>
Date: Tue Mar 3 08:23:41 2026 -0800
nfc: nci: free skb on nci_transceive early error paths
[ Upstream commit 7bd4b0c4779f978a6528c9b7937d2ca18e936e2c ]
nci_transceive() takes ownership of the skb passed by the caller,
but the -EPROTO, -EINVAL, and -EBUSY error paths return without
freeing it.
Due to issues clearing NCI_DATA_EXCHANGE fixed by subsequent changes
the nci/nci_dev selftest hits the error path occasionally in NIPA,
and kmemleak detects leaks:
unreferenced object 0xff11000015ce6a40 (size 640):
comm "nci_dev", pid 3954, jiffies 4295441246
hex dump (first 32 bytes):
6b 6b 6b 6b 00 a4 00 0c 02 e1 03 6b 6b 6b 6b 6b kkkk.......kkkkk
6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
backtrace (crc 7c40cc2a):
kmem_cache_alloc_node_noprof+0x492/0x630
__alloc_skb+0x11e/0x5f0
alloc_skb_with_frags+0xc6/0x8f0
sock_alloc_send_pskb+0x326/0x3f0
nfc_alloc_send_skb+0x94/0x1d0
rawsock_sendmsg+0x162/0x4c0
do_syscall_64+0x117/0xfc0
Fixes: 6a2968aaf50c ("NFC: basic NCI protocol implementation")
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20260303162346.2071888-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: Mon Feb 23 12:28:30 2026 +0100
nfc: pn533: properly drop the usb interface reference on disconnect
commit 12133a483dfa832241fbbf09321109a0ea8a520e upstream.
When the device is disconnected from the driver, there is a "dangling"
reference count on the usb interface that was grabbed in the probe
callback. Fix this up by properly dropping the reference after we are
done with it.
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Fixes: c46ee38620a2 ("NFC: pn533: add NXP pn533 nfc device driver")
Link: https://patch.msgid.link/2026022329-flashing-ought-7573@gregkh
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Jakub Kicinski <kuba@kernel.org>
Date: Tue Mar 3 08:23:45 2026 -0800
nfc: rawsock: cancel tx_work before socket teardown
[ Upstream commit d793458c45df2aed498d7f74145eab7ee22d25aa ]
In rawsock_release(), cancel any pending tx_work and purge the write
queue before orphaning the socket. rawsock_tx_work runs on the system
workqueue and calls nfc_data_exchange which dereferences the NCI
device. Without synchronization, tx_work can race with socket and
device teardown when a process is killed (e.g. by SIGKILL), leading
to use-after-free or leaked references.
Set SEND_SHUTDOWN first so that if tx_work is already running it will
see the flag and skip transmitting, then use cancel_work_sync to wait
for any in-progress execution to finish, and finally purge any
remaining queued skbs.
Fixes: 23b7869c0fd0 ("NFC: add the NFC socket raw protocol")
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20260303162346.2071888-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Kuniyuki Iwashima <kuniyu@google.com>
Date: Sat Jan 24 04:18:40 2026 +0000
nfsd: Fix cred ref leak in nfsd_nl_threads_set_doit().
commit 1cb968a2013ffa8112d52ebe605009ea1c6a582c upstream.
syzbot reported memory leak of struct cred. [0]
nfsd_nl_threads_set_doit() passes get_current_cred() to
nfsd_svc(), but put_cred() is not called after that.
The cred is finally passed down to _svc_xprt_create(),
which calls get_cred() with the cred for struct svc_xprt.
The ownership of the refcount by get_current_cred() is not
transferred to anywhere and is just leaked.
nfsd_svc() is also called from write_threads(), but it does
not bump file->f_cred there.
nfsd_nl_threads_set_doit() is called from sendmsg() and
current->cred does not go away.
Let's use current_cred() in nfsd_nl_threads_set_doit().
[0]:
BUG: memory leak
unreferenced object 0xffff888108b89480 (size 184):
comm "syz-executor", pid 5994, jiffies 4294943386
hex dump (first 32 bytes):
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace (crc 369454a7):
kmemleak_alloc_recursive include/linux/kmemleak.h:44 [inline]
slab_post_alloc_hook mm/slub.c:4958 [inline]
slab_alloc_node mm/slub.c:5263 [inline]
kmem_cache_alloc_noprof+0x412/0x580 mm/slub.c:5270
prepare_creds+0x22/0x600 kernel/cred.c:185
copy_creds+0x44/0x290 kernel/cred.c:286
copy_process+0x7a7/0x2870 kernel/fork.c:2086
kernel_clone+0xac/0x6e0 kernel/fork.c:2651
__do_sys_clone+0x7f/0xb0 kernel/fork.c:2792
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xa4/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Fixes: 924f4fb003ba ("NFSD: convert write_threads to netlink command")
Cc: stable@vger.kernel.org
Reported-by: syzbot+dd3b43aa0204089217ee@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/69744674.a00a0220.33ccc7.0000.GAE@google.com/
Tested-by: syzbot+dd3b43aa0204089217ee@syzkaller.appspotmail.com
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Ming Lei <ming.lei@redhat.com>
Date: Sat Jan 31 22:48:08 2026 +0800
nvme: fix admin queue leak on controller reset
[ Upstream commit b84bb7bd913d8ca2f976ee6faf4a174f91c02b8d ]
When nvme_alloc_admin_tag_set() is called during a controller reset,
a previous admin queue may still exist. Release it properly before
allocating a new one to avoid orphaning the old queue.
This fixes a regression introduced by commit 03b3bcd319b3 ("nvme: fix
admin request_queue lifetime").
Cc: Keith Busch <kbusch@kernel.org>
Fixes: 03b3bcd319b3 ("nvme: fix admin request_queue lifetime").
Reported-and-tested-by: Yi Zhang <yi.zhang@redhat.com>
Closes: https://lore.kernel.org/linux-block/CAHj4cs9wv3SdPo+N01Fw2SHBYDs9tj2M_e1-GdQOkRy=DsBB1w@mail.gmail.com/
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Sungwoo Kim <iam@sung-woo.kim>
Date: Fri Feb 27 19:19:28 2026 -0500
nvme: fix memory allocation in nvme_pr_read_keys()
[ Upstream commit c3320153769f05fd7fe9d840cb555dd3080ae424 ]
nvme_pr_read_keys() takes num_keys from userspace and uses it to
calculate the allocation size for rse via struct_size(). The upper
limit is PR_KEYS_MAX (64K).
A malicious or buggy userspace can pass a large num_keys value that
results in a 4MB allocation attempt at most, causing a warning in
the page allocator when the order exceeds MAX_PAGE_ORDER.
To fix this, use kvzalloc() instead of kzalloc().
This bug has the same reasoning and fix with the patch below:
https://lore.kernel.org/linux-block/20251212013510.3576091-1-kartikey406@gmail.com/
Warning log:
WARNING: mm/page_alloc.c:5216 at __alloc_frozen_pages_noprof+0x5aa/0x2300 mm/page_alloc.c:5216, CPU#1: syz-executor117/272
Modules linked in:
CPU: 1 UID: 0 PID: 272 Comm: syz-executor117 Not tainted 6.19.0 #1 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
RIP: 0010:__alloc_frozen_pages_noprof+0x5aa/0x2300 mm/page_alloc.c:5216
Code: ff 83 bd a8 fe ff ff 0a 0f 86 69 fb ff ff 0f b6 1d f9 f9 c4 04 80 fb 01 0f 87 3b 76 30 ff 83 e3 01 75 09 c6 05 e4 f9 c4 04 01 <0f> 0b 48 c7 85 70 fe ff ff 00 00 00 00 e9 8f fd ff ff 31 c0 e9 0d
RSP: 0018:ffffc90000fcf450 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 1ffff920001f9ea0
RDX: 0000000000000000 RSI: 000000000000000b RDI: 0000000000040dc0
RBP: ffffc90000fcf648 R08: ffff88800b6c3380 R09: 0000000000000001
R10: ffffc90000fcf840 R11: ffff88807ffad280 R12: 0000000000000000
R13: 0000000000040dc0 R14: 0000000000000001 R15: ffffc90000fcf620
FS: 0000555565db33c0(0000) GS:ffff8880be26c000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000002000000c CR3: 0000000003b72000 CR4: 00000000000006f0
Call Trace:
<TASK>
alloc_pages_mpol+0x236/0x4d0 mm/mempolicy.c:2486
alloc_frozen_pages_noprof+0x149/0x180 mm/mempolicy.c:2557
___kmalloc_large_node+0x10c/0x140 mm/slub.c:5598
__kmalloc_large_node_noprof+0x25/0xc0 mm/slub.c:5629
__do_kmalloc_node mm/slub.c:5645 [inline]
__kmalloc_noprof+0x483/0x6f0 mm/slub.c:5669
kmalloc_noprof include/linux/slab.h:961 [inline]
kzalloc_noprof include/linux/slab.h:1094 [inline]
nvme_pr_read_keys+0x8f/0x4c0 drivers/nvme/host/pr.c:245
blkdev_pr_read_keys block/ioctl.c:456 [inline]
blkdev_common_ioctl+0x1b71/0x29b0 block/ioctl.c:730
blkdev_ioctl+0x299/0x700 block/ioctl.c:786
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:597 [inline]
__se_sys_ioctl fs/ioctl.c:583 [inline]
__x64_sys_ioctl+0x1bf/0x220 fs/ioctl.c:583
x64_sys_call+0x1280/0x21b0 mnt/fuzznvme_1/fuzznvme/linux-build/v6.19/./arch/x86/include/generated/asm/syscalls_64.h:17
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x71/0x330 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fb893d3108d
Code: 28 c3 e8 46 1e 00 00 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffff61f2f38 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007ffff61f3138 RCX: 00007fb893d3108d
RDX: 0000000020000040 RSI: 00000000c01070ce RDI: 0000000000000003
RBP: 0000000000000001 R08: 0000000000000000 R09: 00007ffff61f3138
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
R13: 00007ffff61f3128 R14: 00007fb893dae530 R15: 0000000000000001
</TASK>
Fixes: 5fd96a4e15de (nvme: Add pr_ops read_keys support)
Acked-by: Chao Shi <cshi008@fiu.edu>
Acked-by: Weidong Zhu <weizhu@fiu.edu>
Acked-by: Dave Tian <daveti@purdue.edu>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Sungwoo Kim <iam@sung-woo.kim>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Stefan Hajnoczi <stefanha@redhat.com>
Date: Mon Dec 1 16:43:27 2025 -0500
nvme: reject invalid pr_read_keys() num_keys values
[ Upstream commit 38ec8469f39e0e96e7dd9b76f05e0f8eb78be681 ]
The pr_read_keys() interface has a u32 num_keys parameter. The NVMe
Reservation Report command has a u32 maximum length. Reject num_keys
values that are too large to fit.
This will become important when pr_read_keys() is exposed to untrusted
userspace via an <linux/pr.h> ioctl.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Stable-dep-of: c3320153769f ("nvme: fix memory allocation in nvme_pr_read_keys()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Vimlesh Kumar <vimleshk@marvell.com>
Date: Fri Feb 27 09:13:58 2026 +0000
octeon_ep: avoid compiler and IQ/OQ reordering
[ Upstream commit 43b3160cb639079a15daeb5f080120afbfbfc918 ]
Utilize READ_ONCE and WRITE_ONCE APIs for IO queue Tx/Rx
variable access to prevent compiler optimization and reordering.
Additionally, ensure IO queue OUT/IN_CNT registers are flushed
by performing a read-back after writing.
The compiler could reorder reads/writes to pkts_pending, last_pkt_count,
etc., causing stale values to be used when calculating packets to process
or register updates to send to hardware. The Octeon hardware requires a
read-back after writing to OUT_CNT/IN_CNT registers to ensure the write
has been flushed through any posted write buffers before the interrupt
resend bit is set. Without this, we have observed cases where the hardware
didn't properly update its internal state.
wmb/rmb only provides ordering guarantees but doesn't prevent the compiler
from performing optimizations like caching in registers, load tearing etc.
Fixes: 37d79d0596062 ("octeon_ep: add Tx/Rx processing and interrupt support")
Signed-off-by: Sathesh Edara <sedara@marvell.com>
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Signed-off-by: Vimlesh Kumar <vimleshk@marvell.com>
Link: https://patch.msgid.link/20260227091402.1773833-3-vimleshk@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Vimlesh Kumar <vimleshk@marvell.com>
Date: Fri Feb 27 09:13:57 2026 +0000
octeon_ep: Relocate counter updates before NAPI
[ Upstream commit 18c04a808c436d629d5812ce883e3822a5f5a47f ]
Relocate IQ/OQ IN/OUT_CNTS updates to occur before NAPI completion,
and replace napi_complete with napi_complete_done.
Moving the IQ/OQ counter updates before napi_complete_done ensures
1. Counter registers are updated before re-enabling interrupts.
2. Prevents a race where new packets arrive but counters aren't properly
synchronized.
napi_complete_done (vs napi_complete) allows for better
interrupt coalescing.
Fixes: 37d79d0596062 ("octeon_ep: add Tx/Rx processing and interrupt support")
Signed-off-by: Sathesh Edara <sedara@marvell.com>
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Signed-off-by: Vimlesh Kumar <vimleshk@marvell.com>
Link: https://patch.msgid.link/20260227091402.1773833-2-vimleshk@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Vimlesh Kumar <vimleshk@marvell.com>
Date: Fri Feb 27 09:14:00 2026 +0000
octeon_ep_vf: avoid compiler and IQ/OQ reordering
[ Upstream commit 6c73126ecd1080351b468fe43353b2f705487f44 ]
Utilize READ_ONCE and WRITE_ONCE APIs for IO queue Tx/Rx
variable access to prevent compiler optimization and reordering.
Additionally, ensure IO queue OUT/IN_CNT registers are flushed
by performing a read-back after writing.
The compiler could reorder reads/writes to pkts_pending, last_pkt_count,
etc., causing stale values to be used when calculating packets to process
or register updates to send to hardware. The Octeon hardware requires a
read-back after writing to OUT_CNT/IN_CNT registers to ensure the write
has been flushed through any posted write buffers before the interrupt
resend bit is set. Without this, we have observed cases where the hardware
didn't properly update its internal state.
wmb/rmb only provides ordering guarantees but doesn't prevent the compiler
from performing optimizations like caching in registers, load tearing etc.
Fixes: 1cd3b407977c3 ("octeon_ep_vf: add Tx/Rx processing and interrupt support")
Signed-off-by: Sathesh Edara <sedara@marvell.com>
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Signed-off-by: Vimlesh Kumar <vimleshk@marvell.com>
Link: https://patch.msgid.link/20260227091402.1773833-5-vimleshk@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Vimlesh Kumar <vimleshk@marvell.com>
Date: Fri Feb 27 09:13:59 2026 +0000
octeon_ep_vf: Relocate counter updates before NAPI
[ Upstream commit 2ae7d20fb24f598f60faa8f6ecc856dac782261a ]
Relocate IQ/OQ IN/OUT_CNTS updates to occur before NAPI completion.
Moving the IQ/OQ counter updates before napi_complete_done ensures
1. Counter registers are updated before re-enabling interrupts.
2. Prevents a race where new packets arrive but counters aren't properly
synchronized.
Fixes: 1cd3b407977c3 ("octeon_ep_vf: add Tx/Rx processing and interrupt support")
Signed-off-by: Sathesh Edara <sedara@marvell.com>
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Signed-off-by: Vimlesh Kumar <vimleshk@marvell.com>
Link: https://patch.msgid.link/20260227091402.1773833-4-vimleshk@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Date: Tue Dec 30 22:16:08 2025 -0800
of/kexec: refactor ima_get_kexec_buffer() to use ima_validate_range()
[ Upstream commit 4d02233235ed0450de9c10fcdcf3484e3c9401ce ]
Refactor the OF/DT ima_get_kexec_buffer() to use a generic helper to
validate the address range. No functional change intended.
Link: https://lkml.kernel.org/r/20251231061609.907170-3-harshit.m.mogalapalli@oracle.com
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Cc: Alexander Graf <graf@amazon.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Betkov <bp@alien8.de>
Cc: guoweikang <guoweikang.kernel@gmail.com>
Cc: Henry Willard <henry.willard@oracle.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Bohac <jbohac@suse.cz>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: Jonathan McDowell <noodles@fb.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Paul Webb <paul.x.webb@oracle.com>
Cc: Sohil Mehta <sohil.mehta@intel.com>
Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Yifei Liu <yifei.l.liu@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Bjorn Helgaas <bhelgaas@google.com>
Date: Fri Feb 27 06:10:08 2026 -0600
PCI: Correct PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 value
[ Upstream commit 39195990e4c093c9eecf88f29811c6de29265214 ]
fb82437fdd8c ("PCI: Change capability register offsets to hex") incorrectly
converted the PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 value from decimal 52 to hex
0x32:
-#define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 52 /* v2 endpoints with link end here */
+#define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 0x32 /* end of v2 EPs w/ link */
This broke PCI capabilities in a VMM because subsequent ones weren't
DWORD-aligned.
Change PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 to the correct value of 0x34.
fb82437fdd8c was from Baruch Siach <baruch@tkos.co.il>, but this was not
Baruch's fault; it's a mistake I made when applying the patch.
Fixes: fb82437fdd8c ("PCI: Change capability register offsets to hex")
Reported-by: David Woodhouse <dwmw2@infradead.org>
Closes: https://lore.kernel.org/all/3ae392a0158e9d9ab09a1d42150429dd8ca42791.camel@infradead.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Niklas Cassel <cassel@kernel.org>
Date: Mon Jan 13 11:59:34 2025 +0100
PCI: dw-rockchip: Don't wait for link since we can detect Link Up
[ Upstream commit ec9fd499b9c60a187ac8d6414c3c343c77d32e42 ]
The Root Complex specific device tree binding for pcie-dw-rockchip has the
'sys' interrupt marked as required.
The driver requests the 'sys' IRQ unconditionally, and errors out if not
provided.
Thus, we can unconditionally set 'use_linkup_irq', so dw_pcie_host_init()
doesn't wait for the link to come up.
This will skip the wait for link up (since the bus will be enumerated once
the link up IRQ is triggered), which reduces the bootup time.
Link: https://lore.kernel.org/r/20250113-rockchip-no-wait-v1-1-25417f37b92f@kernel.org
Signed-off-by: Niklas Cassel <cassel@kernel.org>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Stable-dep-of: fc6298086bfa ("Revert "PCI: dw-rockchip: Don't wait for link since we can detect Link Up"")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Damien Le Moal <dlemoal@kernel.org>
Date: Sat Oct 12 20:32:46 2024 +0900
PCI: dwc: endpoint: Implement the pci_epc_ops::align_addr() operation
[ Upstream commit e73ea1c2d4d8f7ba5daaf7aa51171f63cf79bcd8 ]
The function dw_pcie_prog_outbound_atu() used to program outbound ATU
entries for mapping RC PCI addresses to local CPU addresses does not
allow PCI addresses that are not aligned to the value of region_align
of struct dw_pcie. This value is determined from the iATU hardware
registers during probing of the iATU (done by dw_pcie_iatu_detect()).
This value is thus valid for all DWC PCIe controllers, and valid
regardless of the hardware configuration used when synthesizing the
DWC PCIe controller.
Implement the ->align_addr() endpoint controller operation to allow
this mapping alignment to be transparently handled by endpoint function
drivers through the function pci_epc_mem_map().
Link: https://lore.kernel.org/linux-pci/20241012113246.95634-7-dlemoal@kernel.org
Link: https://lore.kernel.org/linux-pci/20241015090712.112674-1-dlemoal@kernel.org
Link: https://lore.kernel.org/linux-pci/20241017132052.4014605-5-cassel@kernel.org
Co-developed-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
[mani: squashed the patch that changed phy_addr_t to u64]
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
[kwilczynski: squashed patch that updated the pci_size variable]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Stable-dep-of: c22533c66cca ("PCI: dwc: ep: Flush MSI-X write before unmapping its ATU entry")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Niklas Cassel <cassel@kernel.org>
Date: Wed Feb 11 18:55:41 2026 +0100
PCI: dwc: ep: Flush MSI-X write before unmapping its ATU entry
[ Upstream commit c22533c66ccae10511ad6a7afc34bb26c47577e3 ]
Endpoint drivers use dw_pcie_ep_raise_msix_irq() to raise an MSI-X
interrupt to the host using a writel(), which generates a PCI posted write
transaction. There's no completion for posted writes, so the writel() may
return before the PCI write completes. dw_pcie_ep_raise_msix_irq() also
unmaps the outbound ATU entry used for the PCI write, so the write races
with the unmap.
If the PCI write loses the race with the ATU unmap, the write may corrupt
host memory or cause IOMMU errors, e.g., these when running fio with a
larger queue depth against nvmet-pci-epf:
arm-smmu-v3 fc900000.iommu: 0x0000010000000010
arm-smmu-v3 fc900000.iommu: 0x0000020000000000
arm-smmu-v3 fc900000.iommu: 0x000000090000f040
arm-smmu-v3 fc900000.iommu: 0x0000000000000000
arm-smmu-v3 fc900000.iommu: event: F_TRANSLATION client: 0000:01:00.0 sid: 0x100 ssid: 0x0 iova: 0x90000f040 ipa: 0x0
arm-smmu-v3 fc900000.iommu: unpriv data write s1 "Input address caused fault" stag: 0x0
Flush the write by performing a readl() of the same address to ensure that
the write has reached the destination before the ATU entry is unmapped.
The same problem was solved for dw_pcie_ep_raise_msi_irq() in commit
8719c64e76bf ("PCI: dwc: ep: Cache MSI outbound iATU mapping"), but there
it was solved by dedicating an outbound iATU only for MSI. We can't do the
same for MSI-X because each vector can have a different msg_addr and the
msg_addr may be changed while the vector is masked.
Fixes: beb4641a787d ("PCI: dwc: Add MSI-X callbacks handler")
Signed-off-by: Niklas Cassel <cassel@kernel.org>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260211175540.105677-2-cassel@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Niklas Cassel <cassel@kernel.org>
Date: Thu Oct 17 15:20:55 2024 +0200
PCI: dwc: ep: Use align addr function for dw_pcie_ep_raise_{msi,msix}_irq()
[ Upstream commit 3fafc38b77bebeeea5faa2a588b92353775bb390 ]
Use the dw_pcie_ep_align_addr() function to calculate the alignment in
dw_pcie_ep_raise_{msi,msix}_irq() instead of open coding the same.
Link: https://lore.kernel.org/r/20241017132052.4014605-6-cassel@kernel.org
Link: https://lore.kernel.org/r/20241104205144.409236-2-cassel@kernel.org
Tested-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
[kwilczynski: squashed patch that fixes memory map sizes]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Stable-dep-of: c22533c66cca ("PCI: dwc: ep: Flush MSI-X write before unmapping its ATU entry")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Damien Le Moal <dlemoal@kernel.org>
Date: Sat Oct 12 20:32:41 2024 +0900
PCI: endpoint: Introduce pci_epc_function_is_valid()
[ Upstream commit ca3c342fb3c76eee739a1cfc4ff59841722ebee7 ]
Introduce the epc core helper function pci_epc_function_is_valid() to
verify that an epc pointer, a physical function number and a virtual
function number are all valid. This avoids repeating the code pattern:
if (IS_ERR_OR_NULL(epc) || func_no >= epc->max_functions)
return err;
if (vfunc_no > 0 && (!epc->max_vfs || vfunc_no > epc->max_vfs[func_no]))
return err;
in many functions of the endpoint controller core code.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Link: https://lore.kernel.org/r/20241012113246.95634-2-dlemoal@kernel.org
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Stable-dep-of: c22533c66cca ("PCI: dwc: ep: Flush MSI-X write before unmapping its ATU entry")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Damien Le Moal <dlemoal@kernel.org>
Date: Sat Oct 12 20:32:43 2024 +0900
PCI: endpoint: Introduce pci_epc_mem_map()/unmap()
[ Upstream commit ce1dfe6d328966b75821c1f043a940eb2569768a ]
Some endpoint controllers have requirements on the alignment of the
controller physical memory address that must be used to map a RC PCI
address region. For instance, the endpoint controller of the RK3399 SoC
uses at most the lower 20 bits of a physical memory address region as
the lower bits of a RC PCI address region. For mapping a PCI address
region of size bytes starting from pci_addr, the exact number of
address bits used is the number of address bits changing in the address
range [pci_addr..pci_addr + size - 1]. For this example, this creates
the following constraints:
1) The offset into the controller physical memory allocated for a
mapping depends on the mapping size *and* the starting PCI address
for the mapping.
2) A mapping size cannot exceed the controller windows size (1MB) minus
the offset needed into the allocated physical memory, which can end
up being a smaller size than the desired mapping size.
Handling these constraints independently of the controller being used
in an endpoint function driver is not possible with the current EPC
API as only the ->align field in struct pci_epc_features is provided
but used for BAR (inbound ATU mappings) mapping only. A new API is
needed for function drivers to discover mapping constraints and handle
non-static requirements based on the RC PCI address range to access.
Introduce the endpoint controller operation ->align_addr() to allow
the EPC core functions to obtain the size and the offset into a
controller address region that must be allocated and mapped to access
a RC PCI address region. The size of the mapping provided by the
align_addr() operation can then be used as the size argument for the
function pci_epc_mem_alloc_addr() and the offset into the allocated
controller memory provided can be used to correctly handle data
transfers. For endpoint controllers that have PCI address alignment
constraints, the align_addr() operation may indicate upon return an
effective PCI address mapping size that is smaller (but not 0) than the
requested PCI address region size.
The controller ->align_addr() operation is optional: controllers that
do not have any alignment constraints for mapping RC PCI address regions
do not need to implement this operation. For such controllers, it is
always assumed that the mapping size is equal to the requested size of
the PCI region and that the mapping offset is 0.
The function pci_epc_mem_map() is introduced to use this new controller
operation (if it is defined) to handle controller memory allocation and
mapping to a RC PCI address region in endpoint function drivers.
This function first uses the ->align_addr() controller operation to
determine the controller memory address size (and offset into) needed
for mapping an RC PCI address region. The result of this operation is
used to allocate a controller physical memory region using
pci_epc_mem_alloc_addr() and then to map that memory to the RC PCI
address space with pci_epc_map_addr().
Since ->align_addr() () may indicate that not all of a RC PCI address
region can be mapped, pci_epc_mem_map() may only partially map the RC
PCI address region specified. It is the responsibility of the caller
(an endpoint function driver) to handle such smaller mapping by
repeatedly using pci_epc_mem_map() over the desried PCI address range.
The counterpart of pci_epc_mem_map() to unmap and free a mapped
controller memory address region is pci_epc_mem_unmap().
Both functions operate using the new struct pci_epc_map data structure.
This new structure represents a mapping PCI address, mapping effective
size, the size of the controller memory needed for the mapping as well
as the physical and virtual CPU addresses of the mapping (phys_base and
virt_base fields). For convenience, the physical and virtual CPU
addresses within that mapping to use to access the target RC PCI address
region are also provided (phys_addr and virt_addr fields).
Endpoint function drivers can use struct pci_epc_map to access the
mapped RC PCI address region using the ->virt_addr and ->pci_size
fields.
Co-developed-by: Rick Wertenbroek <rick.wertenbroek@gmail.com>
Signed-off-by: Rick Wertenbroek <rick.wertenbroek@gmail.com>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20241012113246.95634-4-dlemoal@kernel.org
[mani: squashed the patch that changed phy_addr_t to u64]
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Stable-dep-of: c22533c66cca ("PCI: dwc: ep: Flush MSI-X write before unmapping its ATU entry")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Krishna chaitanya chundru <quic_krichai@quicinc.com>
Date: Sat Nov 23 00:40:00 2024 +0530
PCI: qcom: Don't wait for link if we can detect Link Up
[ Upstream commit 36971d6c5a9a134c15760ae9fd13c6d5f9a36abb ]
If we have a 'global' IRQ for Link Up events, we need not wait for the
link to be up during PCI initialization, which reduces startup time.
Check for 'global' IRQ, and if present, set 'use_linkup_irq',
so dw_pcie_host_init() doesn't wait for the link to come up.
Link: https://lore.kernel.org/r/20241123-remove_wait2-v5-2-b5f9e6b794c2@quicinc.com
Signed-off-by: Krishna chaitanya chundru <quic_krichai@quicinc.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Stable-dep-of: e9ce5b380443 ("Revert "PCI: qcom: Don't wait for link if we can detect Link Up"")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Date: Mon Dec 8 16:56:54 2025 +0200
PCI: Use resource_set_range() that correctly sets ->end
[ Upstream commit 11721c45a8266a9d0c9684153d20e37159465f96 ]
__pci_read_base() sets resource start and end addresses when resource
is larger than 4G but pci_bus_addr_t or resource_size_t are not capable
of representing 64-bit PCI addresses. This creates a problematic
resource that has non-zero flags but the start and end addresses do not
yield to resource size of 0 but 1.
Replace custom resource addresses setup with resource_set_range()
that correctly sets end address as -1 which results in resource_size()
returning 0.
For consistency, also use resource_set_range() in the other branch that
does size based resource setup.
Fixes: 23b13bc76f35 ("PCI: Fail safely if we can't handle BARs larger than 4GB")
Link: https://lore.kernel.org/all/20251207215359.28895-1-ansuelsmth@gmail.com/T/#m990492684913c5a158ff0e5fc90697d8ad95351b
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Cc: stable@vger.kernel.org
Cc: Christian Marangi <ansuelsmth@gmail.com>
Link: https://patch.msgid.link/20251208145654.5294-1-ilpo.jarvinen@linux.intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Peter Zijlstra <peterz@infradead.org>
Date: Tue Feb 24 13:29:09 2026 +0100
perf: Fix __perf_event_overflow() vs perf_remove_from_context() race
[ Upstream commit c9bc1753b3cc41d0e01fbca7f035258b5f4db0ae ]
Make sure that __perf_event_overflow() runs with IRQs disabled for all
possible callchains. Specifically the software events can end up running
it with only preemption disabled.
This opens up a race vs perf_event_exit_event() and friends that will go
and free various things the overflow path expects to be present, like
the BPF program.
Fixes: 592903cdcbf6 ("perf_counter: add an event_list")
Reported-by: Simond Hu <cmdhh1767@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Simond Hu <cmdhh1767@gmail.com>
Link: https://patch.msgid.link/20260224122909.GV1395416@noisy.programming.kicks-ass.net
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Felix Gu <ustc.gu@gmail.com>
Date: Mon Feb 23 17:39:07 2026 +0800
pinctrl: cirrus: cs42l43: Fix double-put in cs42l43_pin_probe()
[ Upstream commit fd5bed798f45eb3a178ad527b43ab92705faaf8a ]
devm_add_action_or_reset() already invokes the action on failure,
so the explicit put causes a double-put.
Fixes: 9b07cdf86a0b ("pinctrl: cirrus: Fix fwnode leak in cs42l43_pin_probe()")
Signed-off-by: Felix Gu <ustc.gu@gmail.com>
Reviewed-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Signed-off-by: Linus Walleij <linusw@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Florian Eckert <fe@dev.tdt.de>
Date: Thu Feb 5 13:55:46 2026 +0100
pinctrl: equilibrium: fix warning trace on load
[ Upstream commit 3e00b1b332e54ba50cca6691f628b9c06574024f ]
The callback functions 'eqbr_irq_mask()' and 'eqbr_irq_ack()' are also
called in the callback function 'eqbr_irq_mask_ack()'. This is done to
avoid source code duplication. The problem, is that in the function
'eqbr_irq_mask()' also calles the gpiolib function 'gpiochip_disable_irq()'
This generates the following warning trace in the log for every gpio on
load.
[ 6.088111] ------------[ cut here ]------------
[ 6.092440] WARNING: CPU: 3 PID: 1 at drivers/gpio/gpiolib.c:3810 gpiochip_disable_irq+0x39/0x50
[ 6.097847] Modules linked in:
[ 6.097847] CPU: 3 UID: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.12.59+ #0
[ 6.097847] Tainted: [W]=WARN
[ 6.097847] RIP: 0010:gpiochip_disable_irq+0x39/0x50
[ 6.097847] Code: 39 c6 48 19 c0 21 c6 48 c1 e6 05 48 03 b2 38 03 00 00 48 81 fe 00 f0 ff ff 77 11 48 8b 46 08 f6 c4 02 74 06 f0 80 66 09 fb c3 <0f> 0b 90 0f 1f 40 00 c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40
[ 6.097847] RSP: 0000:ffffc9000000b830 EFLAGS: 00010046
[ 6.097847] RAX: 0000000000000045 RBX: ffff888001be02a0 RCX: 0000000000000008
[ 6.097847] RDX: ffff888001be9000 RSI: ffff888001b2dd00 RDI: ffff888001be02a0
[ 6.097847] RBP: ffffc9000000b860 R08: 0000000000000000 R09: 0000000000000000
[ 6.097847] R10: 0000000000000001 R11: ffff888001b2a154 R12: ffff888001be0514
[ 6.097847] R13: ffff888001be02a0 R14: 0000000000000008 R15: 0000000000000000
[ 6.097847] FS: 0000000000000000(0000) GS:ffff888041d80000(0000) knlGS:0000000000000000
[ 6.097847] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6.097847] CR2: 0000000000000000 CR3: 0000000003030000 CR4: 00000000001026b0
[ 6.097847] Call Trace:
[ 6.097847] <TASK>
[ 6.097847] ? eqbr_irq_mask+0x63/0x70
[ 6.097847] ? no_action+0x10/0x10
[ 6.097847] eqbr_irq_mask_ack+0x11/0x60
In an other driver (drivers/pinctrl/starfive/pinctrl-starfive-jh7100.c) the
interrupt is not disabled here.
To fix this, do not call the 'eqbr_irq_mask()' and 'eqbr_irq_ack()'
function. Implement instead this directly without disabling the interrupts.
Fixes: 52066a53bd11 ("pinctrl: equilibrium: Convert to immutable irq_chip")
Signed-off-by: Florian Eckert <fe@dev.tdt.de>
Signed-off-by: Linus Walleij <linusw@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Florian Eckert <fe@dev.tdt.de>
Date: Thu Feb 5 13:55:45 2026 +0100
pinctrl: equilibrium: rename irq_chip function callbacks
[ Upstream commit 1f96b84835eafb3e6f366dc3a66c0e69504cec9d ]
Renaming of the irq_chip callback functions to improve clarity.
Signed-off-by: Florian Eckert <fe@dev.tdt.de>
Signed-off-by: Linus Walleij <linusw@kernel.org>
Stable-dep-of: 3e00b1b332e5 ("pinctrl: equilibrium: fix warning trace on load")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Thorsten Blum <thorsten.blum@linux.dev>
Date: Tue Mar 3 12:30:51 2026 +0100
platform/x86: dell-wmi-sysman: Don't hex dump plaintext password data
commit d1a196e0a6dcddd03748468a0e9e3100790fc85c upstream.
set_new_password() hex dumps the entire buffer, which contains plaintext
password data, including current and new passwords. Remove the hex dump
to avoid leaking credentials.
Fixes: e8a60aa7404b ("platform/x86: Introduce support for Systems Management Driver over WMI for Dell Systems")
Cc: stable@vger.kernel.org
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Link: https://patch.msgid.link/20260303113050.58127-2-thorsten.blum@linux.dev
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Kurt Borja <kuurtb@gmail.com>
Date: Sat Feb 7 12:16:34 2026 -0500
platform/x86: dell-wmi: Add audio/mic mute key codes
commit 26a7601471f62b95d56a81c3a8ccb551b5a6630f upstream.
Add audio/mic mute key codes found in Alienware m18 r1 AMD.
Cc: stable@vger.kernel.org
Tested-by: Olexa Bilaniuk <obilaniu@gmail.com>
Suggested-by: Olexa Bilaniuk <obilaniu@gmail.com>
Signed-off-by: Kurt Borja <kuurtb@gmail.com>
Acked-by: Pali Rohár <pali@kernel.org>
Link: https://patch.msgid.link/20260207-mute-keys-v2-1-c55e5471c9c1@gmail.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Jonathan Teh <jonathan.teh@outlook.com>
Date: Mon Feb 16 01:01:29 2026 +0000
platform/x86: thinkpad_acpi: Fix errors reading battery thresholds
[ Upstream commit 53e977b1d50c46f2c4ec3865cd13a822f58ad3cd ]
Check whether the battery supports the relevant charge threshold before
reading the value to silence these errors:
thinkpad_acpi: acpi_evalf(BCTG, dd, ...) failed: AE_NOT_FOUND
ACPI: \_SB_.PCI0.LPC_.EC__.HKEY: BCTG: evaluate failed
thinkpad_acpi: acpi_evalf(BCSG, dd, ...) failed: AE_NOT_FOUND
ACPI: \_SB_.PCI0.LPC_.EC__.HKEY: BCSG: evaluate failed
when reading the charge thresholds via sysfs on platforms that do not
support them such as the ThinkPad T400.
Fixes: 2801b9683f74 ("thinkpad_acpi: Add support for battery thresholds")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=202619
Signed-off-by: Jonathan Teh <jonathan.teh@outlook.com>
Reviewed-by: Mark Pearson <mpearson-lenovo@squebb.ca>
Link: https://patch.msgid.link/MI0P293MB01967B206E1CA6F337EBFB12926CA@MI0P293MB0196.ITAP293.PROD.OUTLOOK.COM
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Jason Gunthorpe <jgg@ziepe.ca>
Date: Mon Feb 16 11:02:49 2026 -0400
RDMA/irdma: Fix kernel stack leak in irdma_create_user_ah()
commit 74586c6da9ea222a61c98394f2fc0a604748438c upstream.
struct irdma_create_ah_resp { // 8 bytes, no padding
__u32 ah_id; // offset 0 - SET (uresp.ah_id = ah->sc_ah.ah_info.ah_idx)
__u8 rsvd[4]; // offset 4 - NEVER SET <- LEAK
};
rsvd[4]: 4 bytes of stack memory leaked unconditionally. Only ah_id is assigned before ib_respond_udata().
The reserved members of the structure were not zeroed.
Cc: stable@vger.kernel.org
Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://patch.msgid.link/3-v1-83e918d69e73+a9-rdma_udata_rc_jgg@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Date: Fri Jun 14 13:06:03 2024 +0300
resource: Add resource set range and size helpers
[ Upstream commit 9fb6fef0fb49124291837af1da5028f79d53f98e ]
Setting the end address for a resource with a given size lacks a helper and
is therefore coded manually unlike the getter side which has a helper for
resource size calculation. Also, almost all callsites that calculate the
end address for a resource also set the start address right before it like
this:
res->start = start_addr;
res->end = res->start + size - 1;
Add resource_set_range(res, start_addr, size) that sets the start address
and calculates the end address to simplify this often repeated fragment.
Also add resource_set_size() for the cases where setting the start address
of the resource is not necessary but mention in its kerneldoc that
resource_set_range() is preferred when setting both addresses.
Link: https://lore.kernel.org/r/20240614100606.15830-2-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Stable-dep-of: 11721c45a826 ("PCI: Use resource_set_range() that correctly sets ->end")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Niklas Cassel <cassel@kernel.org>
Date: Mon Dec 22 07:42:08 2025 +0100
Revert "PCI: dw-rockchip: Don't wait for link since we can detect Link Up"
[ Upstream commit fc6298086bfacaa7003b0bd1da4e4f42b29f7d77 ]
This reverts commit ec9fd499b9c60a187ac8d6414c3c343c77d32e42.
While this fake hotplugging was a nice idea, it has shown that this feature
does not handle PCIe switches correctly:
pci_bus 0004:43: busn_res: can not insert [bus 43-41] under [bus 42-41] (conflicts with (null) [bus 42-41])
pci_bus 0004:43: busn_res: [bus 43-41] end is updated to 43
pci_bus 0004:43: busn_res: can not insert [bus 43] under [bus 42-41] (conflicts with (null) [bus 42-41])
pci 0004:42:00.0: devices behind bridge are unusable because [bus 43] cannot be assigned for them
pci_bus 0004:44: busn_res: can not insert [bus 44-41] under [bus 42-41] (conflicts with (null) [bus 42-41])
pci_bus 0004:44: busn_res: [bus 44-41] end is updated to 44
pci_bus 0004:44: busn_res: can not insert [bus 44] under [bus 42-41] (conflicts with (null) [bus 42-41])
pci 0004:42:02.0: devices behind bridge are unusable because [bus 44] cannot be assigned for them
pci_bus 0004:45: busn_res: can not insert [bus 45-41] under [bus 42-41] (conflicts with (null) [bus 42-41])
pci_bus 0004:45: busn_res: [bus 45-41] end is updated to 45
pci_bus 0004:45: busn_res: can not insert [bus 45] under [bus 42-41] (conflicts with (null) [bus 42-41])
pci 0004:42:06.0: devices behind bridge are unusable because [bus 45] cannot be assigned for them
pci_bus 0004:46: busn_res: can not insert [bus 46-41] under [bus 42-41] (conflicts with (null) [bus 42-41])
pci_bus 0004:46: busn_res: [bus 46-41] end is updated to 46
pci_bus 0004:46: busn_res: can not insert [bus 46] under [bus 42-41] (conflicts with (null) [bus 42-41])
pci 0004:42:0e.0: devices behind bridge are unusable because [bus 46] cannot be assigned for them
pci_bus 0004:42: busn_res: [bus 42-41] end is updated to 46
pci_bus 0004:42: busn_res: can not insert [bus 42-46] under [bus 41] (conflicts with (null) [bus 41])
pci 0004:41:00.0: devices behind bridge are unusable because [bus 42-46] cannot be assigned for them
pcieport 0004:40:00.0: bridge has subordinate 41 but max busn 46
During the initial scan, PCI core doesn't see the switch and since the Root
Port is not hot plug capable, the secondary bus number gets assigned as the
subordinate bus number. This means, the PCI core assumes that only one bus
will appear behind the Root Port since the Root Port is not hot plug
capable.
This works perfectly fine for PCIe endpoints connected to the Root Port,
since they don't extend the bus. However, if a PCIe switch is connected,
then there is a problem when the downstream busses starts showing up and
the PCI core doesn't extend the subordinate bus number and bridge resources
after initial scan during boot.
The long term plan is to migrate this driver to the upcoming pwrctrl APIs
that are supposed to handle this problem elegantly.
Suggested-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Tested-by: Shawn Lin <shawn.lin@rock-chips.com>
Acked-by: Shawn Lin <shawn.lin@rock-chips.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251222064207.3246632-9-cassel@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Niklas Cassel <cassel@kernel.org>
Date: Mon Dec 22 07:42:10 2025 +0100
Revert "PCI: qcom: Don't wait for link if we can detect Link Up"
[ Upstream commit e9ce5b3804436301ab343bc14203a4c14b336d1b ]
This reverts commit 36971d6c5a9a134c15760ae9fd13c6d5f9a36abb.
While this fake hotplugging was a nice idea, it has shown that this feature
does not handle PCIe switches correctly:
pci_bus 0004:43: busn_res: can not insert [bus 43-41] under [bus 42-41] (conflicts with (null) [bus 42-41])
pci_bus 0004:43: busn_res: [bus 43-41] end is updated to 43
pci_bus 0004:43: busn_res: can not insert [bus 43] under [bus 42-41] (conflicts with (null) [bus 42-41])
pci 0004:42:00.0: devices behind bridge are unusable because [bus 43] cannot be assigned for them
pci_bus 0004:44: busn_res: can not insert [bus 44-41] under [bus 42-41] (conflicts with (null) [bus 42-41])
pci_bus 0004:44: busn_res: [bus 44-41] end is updated to 44
pci_bus 0004:44: busn_res: can not insert [bus 44] under [bus 42-41] (conflicts with (null) [bus 42-41])
pci 0004:42:02.0: devices behind bridge are unusable because [bus 44] cannot be assigned for them
pci_bus 0004:45: busn_res: can not insert [bus 45-41] under [bus 42-41] (conflicts with (null) [bus 42-41])
pci_bus 0004:45: busn_res: [bus 45-41] end is updated to 45
pci_bus 0004:45: busn_res: can not insert [bus 45] under [bus 42-41] (conflicts with (null) [bus 42-41])
pci 0004:42:06.0: devices behind bridge are unusable because [bus 45] cannot be assigned for them
pci_bus 0004:46: busn_res: can not insert [bus 46-41] under [bus 42-41] (conflicts with (null) [bus 42-41])
pci_bus 0004:46: busn_res: [bus 46-41] end is updated to 46
pci_bus 0004:46: busn_res: can not insert [bus 46] under [bus 42-41] (conflicts with (null) [bus 42-41])
pci 0004:42:0e.0: devices behind bridge are unusable because [bus 46] cannot be assigned for them
pci_bus 0004:42: busn_res: [bus 42-41] end is updated to 46
pci_bus 0004:42: busn_res: can not insert [bus 42-46] under [bus 41] (conflicts with (null) [bus 41])
pci 0004:41:00.0: devices behind bridge are unusable because [bus 42-46] cannot be assigned for them
pcieport 0004:40:00.0: bridge has subordinate 41 but max busn 46
During the initial scan, PCI core doesn't see the switch and since the Root
Port is not hot plug capable, the secondary bus number gets assigned as the
subordinate bus number. This means, the PCI core assumes that only one bus
will appear behind the Root Port since the Root Port is not hot plug
capable.
This works perfectly fine for PCIe endpoints connected to the Root Port,
since they don't extend the bus. However, if a PCIe switch is connected,
then there is a problem when the downstream busses starts showing up and
the PCI core doesn't extend the subordinate bus number and bridge resources
after initial scan during boot.
The long term plan is to migrate this driver to the upcoming pwrctrl APIs
that are supposed to handle this problem elegantly.
Suggested-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Tested-by: Shawn Lin <shawn.lin@rock-chips.com>
Acked-by: Shawn Lin <shawn.lin@rock-chips.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251222064207.3246632-11-cassel@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Date: Fri Feb 20 15:06:40 2026 -0500
rseq: Clarify rseq registration rseq_size bound check comment
[ Upstream commit 26d43a90be81fc90e26688a51d3ec83188602731 ]
The rseq registration validates that the rseq_size argument is greater
or equal to 32 (the original rseq size), but the comment associated with
this check does not clearly state this.
Clarify the comment to that effect.
Fixes: ee3e3ac05c26 ("rseq: Introduce extensible rseq ABI")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260220200642.1317826-2-mathieu.desnoyers@efficios.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Alexandre Courbot <acourbot@nvidia.com>
Date: Tue Feb 24 19:37:56 2026 +0900
rust: kunit: fix warning when !CONFIG_PRINTK
[ Upstream commit 7dd34dfc8dfa92a7244242098110388367996ac3 ]
If `CONFIG_PRINTK` is not set, then the following warnings are issued
during build:
warning: unused variable: `args`
--> ../rust/kernel/kunit.rs:16:12
|
16 | pub fn err(args: fmt::Arguments<'_>) {
| ^^^^ help: if this is intentional, prefix it with an underscore: `_args`
|
= note: `#[warn(unused_variables)]` (part of `#[warn(unused)]`) on by default
warning: unused variable: `args`
--> ../rust/kernel/kunit.rs:32:13
|
32 | pub fn info(args: fmt::Arguments<'_>) {
| ^^^^ help: if this is intentional, prefix it with an underscore: `_args`
Fix this by adding a no-op assignment using `args` when `CONFIG_PRINTK`
is not set.
Fixes: a66d733da801 ("rust: support running Rust documentation tests as KUnit ones")
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: David Gow <david@davidgow.net>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Heiko Carstens <hca@linux.ibm.com>
Date: Wed Feb 18 15:20:04 2026 +0100
s390/idle: Fix cpu idle exit cpu time accounting
[ Upstream commit 0d785e2c324c90662baa4fe07a0d02233ff92824 ]
With the conversion to generic entry [1] cpu idle exit cpu time accounting
was converted from assembly to C. This introduced an reversed order of cpu
time accounting.
On cpu idle exit the current accounting happens with the following call
chain:
-> do_io_irq()/do_ext_irq()
-> irq_enter_rcu()
-> account_hardirq_enter()
-> vtime_account_irq()
-> vtime_account_kernel()
vtime_account_kernel() accounts the passed cpu time since last_update_timer
as system time, and updates last_update_timer to the current cpu timer
value.
However the subsequent call of
-> account_idle_time_irq()
will incorrectly subtract passed cpu time from timer_idle_enter to the
updated last_update_timer value from system_timer. Then last_update_timer
is updated to a sys_enter_timer, which means that last_update_timer goes
back in time.
Subsequently account_hardirq_exit() will account too much cpu time as
hardirq time. The sum of all accounted cpu times is still correct, however
some cpu time which was previously accounted as system time is now
accounted as hardirq time, plus there is the oddity that last_update_timer
goes back in time.
Restore previous behavior by extracting cpu time accounting code from
account_idle_time_irq() into a new update_timer_idle() function and call it
before irq_enter_rcu().
Fixes: 56e62a737028 ("s390: convert to generic entry") [1]
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Heiko Carstens <hca@linux.ibm.com>
Date: Wed Feb 18 15:20:05 2026 +0100
s390/vtime: Fix virtual timer forwarding
[ Upstream commit dbc0fb35679ed5d0adecf7d02137ac2c77244b3b ]
Since delayed accounting of system time [1] the virtual timer is
forwarded by do_account_vtime() but also vtime_account_kernel(),
vtime_account_softirq(), and vtime_account_hardirq(). This leads
to double accounting of system, guest, softirq, and hardirq time.
Remove accounting from the vtime_account*() family to restore old behavior.
There is only one user of the vtimer interface, which might explain
why nobody noticed this so far.
Fixes: b7394a5f4ce9 ("sched/cputime, s390: Implement delayed accounting of system time") [1]
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Junxiao Bi <junxiao.bi@oracle.com>
Date: Mon Feb 23 15:27:28 2026 -0800
scsi: core: Fix refcount leak for tagset_refcnt
commit 1ac22c8eae81366101597d48360718dff9b9d980 upstream.
This leak will cause a hang when tearing down the SCSI host. For example,
iscsid hangs with the following call trace:
[130120.652718] scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured
PID: 2528 TASK: ffff9d0408974e00 CPU: 3 COMMAND: "iscsid"
#0 [ffffb5b9c134b9e0] __schedule at ffffffff860657d4
#1 [ffffb5b9c134ba28] schedule at ffffffff86065c6f
#2 [ffffb5b9c134ba40] schedule_timeout at ffffffff86069fb0
#3 [ffffb5b9c134bab0] __wait_for_common at ffffffff8606674f
#4 [ffffb5b9c134bb10] scsi_remove_host at ffffffff85bfe84b
#5 [ffffb5b9c134bb30] iscsi_sw_tcp_session_destroy at ffffffffc03031c4 [iscsi_tcp]
#6 [ffffb5b9c134bb48] iscsi_if_recv_msg at ffffffffc0292692 [scsi_transport_iscsi]
#7 [ffffb5b9c134bb98] iscsi_if_rx at ffffffffc02929c2 [scsi_transport_iscsi]
#8 [ffffb5b9c134bbf0] netlink_unicast at ffffffff85e551d6
#9 [ffffb5b9c134bc38] netlink_sendmsg at ffffffff85e554ef
Fixes: 8fe4ce5836e9 ("scsi: core: Fix a use-after-free")
Cc: stable@vger.kernel.org
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20260223232728.93350-1-junxiao.bi@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Mathias Krause <minipli@grsecurity.net>
Date: Thu Feb 12 11:23:27 2026 -0800
scsi: lpfc: Properly set WC for DPP mapping
[ Upstream commit bffda93a51b40afd67c11bf558dc5aae83ca0943 ]
Using set_memory_wc() to enable write-combining for the DPP portion of
the MMIO mapping is wrong as set_memory_*() is meant to operate on RAM
only, not MMIO mappings. In fact, as used currently triggers a BUG_ON()
with enabled CONFIG_DEBUG_VIRTUAL.
Simply map the DPP region separately and in addition to the already
existing mappings, avoiding any possible negative side effects for
these.
Fixes: 1351e69fc6db ("scsi: lpfc: Add push-to-adapter support to sli4")
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Reviewed-by: Mathias Krause <minipli@grsecurity.net>
Link: https://patch.msgid.link/20260212192327.141104-1-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Salomon Dushimirimana <salomondush@google.com>
Date: Fri Feb 13 19:28:06 2026 +0000
scsi: pm8001: Fix use-after-free in pm8001_queue_command()
[ Upstream commit 38353c26db28efd984f51d426eac2396d299cca7 ]
Commit e29c47fe8946 ("scsi: pm8001: Simplify pm8001_task_exec()") refactors
pm8001_queue_command(), however it introduces a potential cause of a double
free scenario when it changes the function to return -ENODEV in case of phy
down/device gone state.
In this path, pm8001_queue_command() updates task status and calls
task_done to indicate to upper layer that the task has been handled.
However, this also frees the underlying SAS task. A -ENODEV is then
returned to the caller. When libsas sas_ata_qc_issue() receives this error
value, it assumes the task wasn't handled/queued by LLDD and proceeds to
clean up and free the task again, resulting in a double free.
Since pm8001_queue_command() handles the SAS task in this case, it should
return 0 to the caller indicating that the task has been handled.
Fixes: e29c47fe8946 ("scsi: pm8001: Simplify pm8001_task_exec()")
Signed-off-by: Salomon Dushimirimana <salomondush@google.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://patch.msgid.link/20260213192806.439432-1-salomondush@google.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Prithvi Tambewagh <activprithvi@gmail.com>
Date: Mon Feb 16 11:50:02 2026 +0530
scsi: target: Fix recursive locking in __configfs_open_file()
commit 14d4ac19d1895397532eec407433c5d74d9da53b upstream.
In flush_write_buffer, &p->frag_sem is acquired and then the loaded store
function is called, which, here, is target_core_item_dbroot_store(). This
function called filp_open(), following which these functions were called
(in reverse order), according to the call trace:
down_read
__configfs_open_file
do_dentry_open
vfs_open
do_open
path_openat
do_filp_open
file_open_name
filp_open
target_core_item_dbroot_store
flush_write_buffer
configfs_write_iter
target_core_item_dbroot_store() tries to validate the new file path by
trying to open the file path provided to it; however, in this case, the bug
report shows:
db_root: not a directory: /sys/kernel/config/target/dbroot
indicating that the same configfs file was tried to be opened, on which it
is currently working on. Thus, it is trying to acquire frag_sem semaphore
of the same file of which it already holds the semaphore obtained in
flush_write_buffer(), leading to acquiring the semaphore in a nested manner
and a possibility of recursive locking.
Fix this by modifying target_core_item_dbroot_store() to use kern_path()
instead of filp_open() to avoid opening the file using filesystem-specific
function __configfs_open_file(), and further modifying it to make this fix
compatible.
Reported-by: syzbot+f6e8174215573a84b797@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=f6e8174215573a84b797
Tested-by: syzbot+f6e8174215573a84b797@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Prithvi Tambewagh <activprithvi@gmail.com>
Reviewed-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Link: https://patch.msgid.link/20260216062002.61937-1-activprithvi@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Peter Wang <peter.wang@mediatek.com>
Date: Mon Feb 23 18:37:57 2026 +0800
scsi: ufs: core: Move link recovery for hibern8 exit failure to wl_resume
[ Upstream commit 62c015373e1cdb1cdca824bd2dbce2dac0819467 ]
Move the link recovery trigger from ufshcd_uic_pwr_ctrl() to
__ufshcd_wl_resume(). Ensure link recovery is only attempted when hibern8
exit fails during resume, not during hibern8 enter in suspend. Improve
error handling and prevent unnecessary link recovery attempts.
Fixes: 35dabf4503b9 ("scsi: ufs: core: Use link recovery when h8 exit fails during runtime resume")
Signed-off-by: Peter Wang <peter.wang@mediatek.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20260223103906.2533654-1-peter.wang@mediatek.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Yifan Wu <wuyifan50@huawei.com>
Date: Thu Mar 5 09:36:37 2026 +0800
selftest/arm64: Fix sve2p1_sigill() to hwcap test
[ Upstream commit d87c828daa7ead9763416f75cc416496969cf1dc ]
The FEAT_SVE2p1 is indicated by ID_AA64ZFR0_EL1.SVEver. However,
the BFADD requires the FEAT_SVE_B16B16, which is indicated by
ID_AA64ZFR0_EL1.B16B16. This could cause the test to incorrectly
fail on a CPU that supports FEAT_SVE2.1 but not FEAT_SVE_B16B16.
LD1Q Gather load quadwords which is decoded from SVE encodings and
implied by FEAT_SVE2p1.
Fixes: c5195b027d29 ("kselftest/arm64: Add SVE 2.1 to hwcap test")
Signed-off-by: Yifan Wu <wuyifan50@huawei.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Sun Jian <sun.jian.kdev@gmail.com>
Date: Wed Feb 25 19:14:50 2026 +0800
selftests/harness: order TEST_F and XFAIL_ADD constructors
[ Upstream commit 6be2681514261324c8ee8a1c6f76cefdf700220f ]
TEST_F() allocates and registers its struct __test_metadata via mmap()
inside its constructor, and only then assigns the
_##fixture_##test##_object pointer.
XFAIL_ADD() runs in a constructor too and reads
_##fixture_##test##_object to initialize xfail->test. If XFAIL_ADD runs
first, xfail->test can be NULL and the expected failure will be reported
as FAIL.
Use constructor priorities to ensure TEST_F registration runs before
XFAIL_ADD, without adding extra state or runtime lookups.
Fixes: 2709473c9386 ("selftests: kselftest_harness: support using xfail")
Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
Link: https://patch.msgid.link/20260225111451.347923-1-sun.jian.kdev@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date: Tue Mar 3 11:56:06 2026 +0100
selftests: mptcp: join: check removing signal+subflow endp
commit 1777f349ff41b62dfe27454b69c27b0bc99ffca5 upstream.
This validates the previous commit: endpoints with both the signal and
subflow flags should always be marked as used even if it was not
possible to create new subflows due to the MPTCP PM limits.
For this test, an extra endpoint is created with both the signal and the
subflow flags, and limits are set not to create extra subflows. In this
case, an ADD_ADDR is sent, but no subflows are created. Still, the local
endpoint is marked as used, and no warning is fired when removing the
endpoint, after having sent a RM_ADDR.
The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.
Fixes: 85df533a787b ("mptcp: pm: do not ignore 'subflow' if 'signal' flag is also set")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260303-net-mptcp-misc-fixes-7-0-rc2-v1-5-4b5462b6f016@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Paolo Abeni <pabeni@redhat.com>
Date: Tue Mar 3 11:56:02 2026 +0100
selftests: mptcp: more stable simult_flows tests
commit 8c09412e584d9bcc0e71d758ec1008d1c8d1a326 upstream.
By default, the netem qdisc can keep up to 1000 packets under its belly
to deal with the configured rate and delay. The simult flows test-case
simulates very low speed links, to avoid problems due to slow CPUs and
the TCP stack tend to transmit at a slightly higher rate than the
(virtual) link constraints.
All the above causes a relatively large amount of packets being enqueued
in the netem qdiscs - the longer the transfer, the longer the queue -
producing increasingly high TCP RTT samples and consequently increasingly
larger receive buffer size due to DRS.
When the receive buffer size becomes considerably larger than the needed
size, the tests results can flake, i.e. because minimal inaccuracy in the
pacing rate can lead to a single subflow usage towards the end of the
connection for a considerable amount of data.
Address the issue explicitly setting netem limits suitable for the
configured link speeds and unflake all the affected tests.
Fixes: 1a418cb8e888 ("mptcp: simult flow self-tests")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260303-net-mptcp-misc-fixes-7-0-rc2-v1-1-4b5462b6f016@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: ZhangGuoDong <zhangguodong@kylinos.cn>
Date: Tue Mar 3 15:13:11 2026 +0000
smb/client: fix buffer size for smb311_posix_qinfo in smb2_compound_op()
[ Upstream commit 12c43a062acb0ac137fc2a4a106d4d084b8c5416 ]
Use `sizeof(struct smb311_posix_qinfo)` instead of sizeof its pointer,
so the allocated buffer matches the actual struct size.
Fixes: 6a5f6592a0b6 ("SMB311: Add support for query info using posix extensions (level 100)")
Reported-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: ZhangGuoDong <zhangguodong@kylinos.cn>
Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: ZhangGuoDong <zhangguodong@kylinos.cn>
Date: Tue Mar 3 15:13:12 2026 +0000
smb/client: fix buffer size for smb311_posix_qinfo in SMB311_posix_query_info()
[ Upstream commit 9621b996e4db1dbc2b3dc5d5910b7d6179397320 ]
SMB311_posix_query_info() is currently unused, but it may still be used in
some stable versions, so these changes are submitted as a separate patch.
Use `sizeof(struct smb311_posix_qinfo)` instead of sizeof its pointer,
so the allocated buffer matches the actual struct size.
Fixes: b1bc1874b885 ("smb311: Add support for SMB311 query info (non-compounded)")
Reported-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: ZhangGuoDong <zhangguodong@kylinos.cn>
Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Thorsten Blum <thorsten.blum@linux.dev>
Date: Thu Feb 26 22:28:45 2026 +0100
smb: client: Don't log plaintext credentials in cifs_set_cifscreds
commit 2f37dc436d4e61ff7ae0b0353cf91b8c10396e4d upstream.
When debug logging is enabled, cifs_set_cifscreds() logs the key
payload and exposes the plaintext username and password. Remove the
debug log to avoid exposing credentials.
Fixes: 8a8798a5ff90 ("cifs: fetch credentials out of keyring for non-krb5 auth multiuser mounts")
Cc: stable@vger.kernel.org
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Paulo Alcantara <pc@manguebit.org>
Date: Wed Feb 25 21:34:55 2026 -0300
smb: client: fix broken multichannel with krb5+signing
commit d9d1e319b39ea685ede59319002d567c159d23c3 upstream.
When mounting a share with 'multichannel,max_channels=n,sec=krb5i',
the client was duplicating signing key for all secondary channels,
thus making the server fail all commands sent from secondary channels
due to bad signatures.
Every channel has its own signing key, so when establishing a new
channel with krb5 auth, make sure to use the new session key as the
derived key to generate channel's signing key in SMB2_auth_kerberos().
Repro:
$ mount.cifs //srv/share /mnt -o multichannel,max_channels=4,sec=krb5i
$ sleep 5
$ umount /mnt
$ dmesg
...
CIFS: VFS: sign fail cmd 0x5 message id 0x2
CIFS: VFS: \\srv SMB signature verification returned error = -13
CIFS: VFS: sign fail cmd 0x5 message id 0x2
CIFS: VFS: \\srv SMB signature verification returned error = -13
CIFS: VFS: sign fail cmd 0x4 message id 0x2
CIFS: VFS: \\srv SMB signature verification returned error = -13
Reported-by: Xiaoli Feng <xifeng@redhat.com>
Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Cc: David Howells <dhowells@redhat.com>
Cc: linux-cifs@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Henrique Carvalho <henrique.carvalho@suse.com>
Date: Sat Feb 21 01:59:44 2026 -0300
smb: client: fix cifs_pick_channel when channels are equally loaded
commit 663c28469d3274d6456f206a6671c91493d85ff1 upstream.
cifs_pick_channel uses (start % chan_count) when channels are equally
loaded, but that can return a channel that failed the eligibility
checks.
Drop the fallback and return the scan-selected channel instead. If none
is eligible, keep the existing behavior of using the primary channel.
Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com>
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Acked-by: Meetakshi Setiya <msetiya@microsoft.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Phillip Lougher <phillip@squashfs.org.uk>
Date: Tue Feb 17 05:09:55 2026 +0000
Squashfs: check metadata block offset is within range
commit fdb24a820a5832ec4532273282cbd4f22c291a0d upstream.
Syzkaller reports a "general protection fault in squashfs_copy_data"
This is ultimately caused by a corrupted index look-up table, which
produces a negative metadata block offset.
This is subsequently passed to squashfs_copy_data (via
squashfs_read_metadata) where the negative offset causes an out of bounds
access.
The fix is to check that the offset is within range in
squashfs_read_metadata. This will trap this and other cases.
Link: https://lkml.kernel.org/r/20260217050955.138351-1-phillip@squashfs.org.uk
Fixes: f400e12656ab ("Squashfs: cache operations")
Reported-by: syzbot+a9747fe1c35a5b115d3f@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/699234e2.a70a0220.2c38d7.00e2.GAE@google.com/
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Guenter Roeck <linux@roeck-us.net>
Date: Thu Mar 5 11:33:39 2026 -0800
tracing: Add NULL pointer check to trigger_data_free()
[ Upstream commit 457965c13f0837a289c9164b842d0860133f6274 ]
If trigger_data_alloc() fails and returns NULL, event_hist_trigger_parse()
jumps to the out_free error path. While kfree() safely handles a NULL
pointer, trigger_data_free() does not. This causes a NULL pointer
dereference in trigger_data_free() when evaluating
data->cmd_ops->set_filter.
Fix the problem by adding a NULL pointer check to trigger_data_free().
The problem was found by an experimental code review agent based on
gemini-3.1-pro while reviewing backports into v6.18.y.
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://patch.msgid.link/20260305193339.2810953-1-linux@roeck-us.net
Fixes: 0550069cc25f ("tracing: Properly process error handling in event_hist_trigger_parse()")
Assisted-by: Gemini:gemini-3.1-pro
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Qing Wang <wangqing7171@gmail.com>
Date: Fri Feb 27 10:58:42 2026 +0800
tracing: Fix WARN_ON in tracing_buffers_mmap_close
commit e39bb9e02b68942f8e9359d2a3efe7d37ae6be0e upstream.
When a process forks, the child process copies the parent's VMAs but the
user_mapped reference count is not incremented. As a result, when both the
parent and child processes exit, tracing_buffers_mmap_close() is called
twice. On the second call, user_mapped is already 0, causing the function to
return -ENODEV and triggering a WARN_ON.
Normally, this isn't an issue as the memory is mapped with VM_DONTCOPY set.
But this is only a hint, and the application can call
madvise(MADVISE_DOFORK) which resets the VM_DONTCOPY flag. When the
application does that, it can trigger this issue on fork.
Fix it by incrementing the user_mapped reference count without re-mapping
the pages in the VMA's open callback.
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Link: https://patch.msgid.link/20260227025842.1085206-1-wangqing7171@gmail.com
Fixes: cf9f0f7c4c5bb ("tracing: Allow user-space mapping of the ring-buffer")
Reported-by: syzbot+3b5dd2030fe08afdf65d@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=3b5dd2030fe08afdf65d
Tested-by: syzbot+3b5dd2030fe08afdf65d@syzkaller.appspotmail.com
Signed-off-by: Qing Wang <wangqing7171@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Breno Leitao <leitao@debian.org>
Date: Wed Jan 28 10:16:11 2026 -0800
uprobes: Fix incorrect lockdep condition in filter_chain()
[ Upstream commit a56a38fd9196fc89401e498d70b7aa9c9679fa6e ]
The list_for_each_entry_rcu() in filter_chain() uses
rcu_read_lock_trace_held() as the lockdep condition, but the function
holds consumer_rwsem, not the RCU trace lock.
This gives me the following output when running with some locking debug
option enabled:
kernel/events/uprobes.c:1141 RCU-list traversed in non-reader section!!
filter_chain
register_for_each_vma
uprobe_unregister_nosync
__probe_event_disable
Remove the incorrect lockdep condition since the rwsem provides
sufficient protection for the list traversal.
Fixes: cc01bd044e6a ("uprobes: travers uprobe's consumer list locklessly under SRCU protection")
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260128-uprobe_rcu-v2-1-994ea6d32730@debian.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Andrii Nakryiko <andrii@kernel.org>
Date: Tue Sep 10 10:43:12 2024 -0700
uprobes: switch to RCU Tasks Trace flavor for better performance
[ Upstream commit 87195a1ee332add27bd51448c6b54aad551a28f5 ]
This patch switches uprobes SRCU usage to RCU Tasks Trace flavor, which
is optimized for more lightweight and quick readers (at the expense of
slower writers, which for uprobes is a fine tradeof) and has better
performance and scalability with number of CPUs.
Similarly to baseline vs SRCU, we've benchmarked SRCU-based
implementation vs RCU Tasks Trace implementation.
SRCU
====
uprobe-nop ( 1 cpus): 3.276 ± 0.005M/s ( 3.276M/s/cpu)
uprobe-nop ( 2 cpus): 4.125 ± 0.002M/s ( 2.063M/s/cpu)
uprobe-nop ( 4 cpus): 7.713 ± 0.002M/s ( 1.928M/s/cpu)
uprobe-nop ( 8 cpus): 8.097 ± 0.006M/s ( 1.012M/s/cpu)
uprobe-nop (16 cpus): 6.501 ± 0.056M/s ( 0.406M/s/cpu)
uprobe-nop (32 cpus): 4.398 ± 0.084M/s ( 0.137M/s/cpu)
uprobe-nop (64 cpus): 6.452 ± 0.000M/s ( 0.101M/s/cpu)
uretprobe-nop ( 1 cpus): 2.055 ± 0.001M/s ( 2.055M/s/cpu)
uretprobe-nop ( 2 cpus): 2.677 ± 0.000M/s ( 1.339M/s/cpu)
uretprobe-nop ( 4 cpus): 4.561 ± 0.003M/s ( 1.140M/s/cpu)
uretprobe-nop ( 8 cpus): 5.291 ± 0.002M/s ( 0.661M/s/cpu)
uretprobe-nop (16 cpus): 5.065 ± 0.019M/s ( 0.317M/s/cpu)
uretprobe-nop (32 cpus): 3.622 ± 0.003M/s ( 0.113M/s/cpu)
uretprobe-nop (64 cpus): 3.723 ± 0.002M/s ( 0.058M/s/cpu)
RCU Tasks Trace
===============
uprobe-nop ( 1 cpus): 3.396 ± 0.002M/s ( 3.396M/s/cpu)
uprobe-nop ( 2 cpus): 4.271 ± 0.006M/s ( 2.135M/s/cpu)
uprobe-nop ( 4 cpus): 8.499 ± 0.015M/s ( 2.125M/s/cpu)
uprobe-nop ( 8 cpus): 10.355 ± 0.028M/s ( 1.294M/s/cpu)
uprobe-nop (16 cpus): 7.615 ± 0.099M/s ( 0.476M/s/cpu)
uprobe-nop (32 cpus): 4.430 ± 0.007M/s ( 0.138M/s/cpu)
uprobe-nop (64 cpus): 6.887 ± 0.020M/s ( 0.108M/s/cpu)
uretprobe-nop ( 1 cpus): 2.174 ± 0.001M/s ( 2.174M/s/cpu)
uretprobe-nop ( 2 cpus): 2.853 ± 0.001M/s ( 1.426M/s/cpu)
uretprobe-nop ( 4 cpus): 4.913 ± 0.002M/s ( 1.228M/s/cpu)
uretprobe-nop ( 8 cpus): 5.883 ± 0.002M/s ( 0.735M/s/cpu)
uretprobe-nop (16 cpus): 5.147 ± 0.001M/s ( 0.322M/s/cpu)
uretprobe-nop (32 cpus): 3.738 ± 0.008M/s ( 0.117M/s/cpu)
uretprobe-nop (64 cpus): 4.397 ± 0.002M/s ( 0.069M/s/cpu)
Peak throughput for uprobes increases from 8 mln/s to 10.3 mln/s
(+28%!), and for uretprobes from 5.3 mln/s to 5.8 mln/s (+11%), as we
have more work to do on uretprobes side.
Even single-thread (no contention) performance is slightly better: 3.276
mln/s to 3.396 mln/s (+3.5%) for uprobes, and 2.055 mln/s to 2.174 mln/s
(+5.8%) for uretprobes.
We also select TASKS_TRACE_RCU for UPROBES in Kconfig due to the new
dependency.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lkml.kernel.org/r/20240910174312.3646590-1-andrii@kernel.org
Stable-dep-of: a56a38fd9196 ("uprobes: Fix incorrect lockdep condition in filter_chain()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Théo Lebrun <theo.lebrun@bootlin.com>
Date: Wed Feb 5 18:36:49 2025 +0100
usb: cdns3: call cdns_power_is_lost() only once in cdns_resume()
[ Upstream commit 17c6526b333cfd89a4c888a6f7c876c8c326e5ae ]
cdns_power_is_lost() does a register read.
Call it only once rather than twice.
Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com>
Link: https://lore.kernel.org/r/20250205-s2r-cdns-v7-4-13658a271c3c@bootlin.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stable-dep-of: 87e4b043b98a ("usb: cdns3: fix role switching during resume")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Thomas Richard (TI) <thomas.richard@bootlin.com>
Date: Fri Jan 30 11:05:45 2026 +0100
usb: cdns3: fix role switching during resume
[ Upstream commit 87e4b043b98a1d269be0b812f383881abee0ca45 ]
If the role change while we are suspended, the cdns3 driver switches to the
new mode during resume. However, switching to host mode in this context
causes a NULL pointer dereference.
The host role's start() operation registers a xhci-hcd device, but its
probe is deferred while we are in the resume path. The host role's resume()
operation assumes the xhci-hcd device is already probed, which is not the
case, leading to the dereference. Since the start() operation of the new
role is already called, the resume operation can be skipped.
So skip the resume operation for the new role if a role switch occurs
during resume. Once the resume sequence is complete, the xhci-hcd device
can be probed in case of host mode.
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000208
Mem abort info:
...
Data abort info:
...
[0000000000000208] pgd=0000000000000000, p4d=0000000000000000
Internal error: Oops: 0000000096000004 [#1] SMP
Modules linked in:
CPU: 0 UID: 0 PID: 146 Comm: sh Not tainted
6.19.0-rc7-00013-g6e64f4aabfae-dirty #135 PREEMPT
Hardware name: Texas Instruments J7200 EVM (DT)
pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : usb_hcd_is_primary_hcd+0x0/0x1c
lr : cdns_host_resume+0x24/0x5c
...
Call trace:
usb_hcd_is_primary_hcd+0x0/0x1c (P)
cdns_resume+0x6c/0xbc
cdns3_controller_resume.isra.0+0xe8/0x17c
cdns3_plat_resume+0x18/0x24
platform_pm_resume+0x2c/0x68
dpm_run_callback+0x90/0x248
device_resume+0x100/0x24c
dpm_resume+0x190/0x2ec
dpm_resume_end+0x18/0x34
suspend_devices_and_enter+0x2b0/0xa44
pm_suspend+0x16c/0x5fc
state_store+0x80/0xec
kobj_attr_store+0x18/0x2c
sysfs_kf_write+0x7c/0x94
kernfs_fop_write_iter+0x130/0x1dc
vfs_write+0x240/0x370
ksys_write+0x70/0x108
__arm64_sys_write+0x1c/0x28
invoke_syscall+0x48/0x10c
el0_svc_common.constprop.0+0x40/0xe0
do_el0_svc+0x1c/0x28
el0_svc+0x34/0x108
el0t_64_sync_handler+0xa0/0xe4
el0t_64_sync+0x198/0x19c
Code: 52800003 f9407ca5 d63f00a0 17ffffe4 (f9410401)
---[ end trace 0000000000000000 ]---
Cc: stable <stable@kernel.org>
Fixes: 2cf2581cd229 ("usb: cdns3: add power lost support for system resume")
Signed-off-by: Thomas Richard (TI) <thomas.richard@bootlin.com>
Acked-by: Peter Chen <peter.chen@kernel.org>
Link: https://patch.msgid.link/20260130-usb-cdns3-fix-role-switching-during-resume-v1-1-44c456852b52@bootlin.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Hongyu Xie <xiehongyu1@kylinos.cn>
Date: Tue Dec 31 09:36:41 2024 +0800
usb: cdns3: remove redundant if branch
[ Upstream commit dedab674428f8a99468a4864c067128ba9ea83a6 ]
cdns->role_sw->dev->driver_data gets set in routines showing below,
cdns_init
sw_desc.driver_data = cdns;
cdns->role_sw = usb_role_switch_register(dev, &sw_desc);
dev_set_drvdata(&sw->dev, desc->driver_data);
In cdns_resume,
cdns->role = cdns_role_get(cdns->role_sw); //line redundant
struct cdns *cdns = usb_role_switch_get_drvdata(sw);
dev_get_drvdata(&sw->dev)
return dev->driver_data
return cdns->role;
"line redundant" equals to,
cdns->role = cdns->role;
So fix this if branch.
Signed-off-by: Hongyu Xie <xiehongyu1@kylinos.cn>
Acked-by: Peter Chen <peter.chen@kernel.org>
Link: https://lore.kernel.org/r/20241231013641.23908-1-xiehongyu1@kylinos.cn
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stable-dep-of: 87e4b043b98a ("usb: cdns3: fix role switching during resume")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Daniil Dulov <d.dulov@aladdin.ru>
Date: Wed Feb 11 11:20:24 2026 +0300
wifi: cfg80211: cancel rfkill_block work in wiphy_unregister()
commit 767d23ade706d5fa51c36168e92a9c5533c351a1 upstream.
There is a use-after-free error in cfg80211_shutdown_all_interfaces found
by syzkaller:
BUG: KASAN: use-after-free in cfg80211_shutdown_all_interfaces+0x213/0x220
Read of size 8 at addr ffff888112a78d98 by task kworker/0:5/5326
CPU: 0 UID: 0 PID: 5326 Comm: kworker/0:5 Not tainted 6.19.0-rc2 #2 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Workqueue: events cfg80211_rfkill_block_work
Call Trace:
<TASK>
dump_stack_lvl+0x116/0x1f0
print_report+0xcd/0x630
kasan_report+0xe0/0x110
cfg80211_shutdown_all_interfaces+0x213/0x220
cfg80211_rfkill_block_work+0x1e/0x30
process_one_work+0x9cf/0x1b70
worker_thread+0x6c8/0xf10
kthread+0x3c5/0x780
ret_from_fork+0x56d/0x700
ret_from_fork_asm+0x1a/0x30
</TASK>
The problem arises due to the rfkill_block work is not cancelled when wiphy
is being unregistered. In order to fix the issue cancel the corresponding
work in wiphy_unregister().
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: 1f87f7d3a3b4 ("cfg80211: add rfkill support")
Cc: stable@vger.kernel.org
Signed-off-by: Daniil Dulov <d.dulov@aladdin.ru>
Link: https://patch.msgid.link/20260211082024.1967588-1-d.dulov@aladdin.ru
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Bart Van Assche <bvanassche@acm.org>
Date: Mon Feb 23 14:00:24 2026 -0800
wifi: cw1200: Fix locking in error paths
[ Upstream commit d98c24617a831e92e7224a07dcaed2dd0b02af96 ]
cw1200_wow_suspend() must only return with priv->conf_mutex locked if it
returns zero. This mutex must be unlocked if an error is returned. Add
mutex_unlock() calls to the error paths from which that call is missing.
This has been detected by the Clang thread-safety analyzer.
Fixes: a910e4a94f69 ("cw1200: add driver for the ST-E CW1100 & CW1200 WLAN chipsets")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20260223220102.2158611-25-bart.vanassche@linux.dev
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Ariel Silver <arielsilver77@gmail.com>
Date: Fri Feb 20 10:11:29 2026 +0000
wifi: mac80211: bounds-check link_id in ieee80211_ml_reconfiguration
commit 162d331d833dc73a3e905a24c44dd33732af1fc5 upstream.
link_id is taken from the ML Reconfiguration element (control & 0x000f),
so it can be 0..15. link_removal_timeout[] has IEEE80211_MLD_MAX_NUM_LINKS
(15) elements, so index 15 is out-of-bounds. Skip subelements with
link_id >= IEEE80211_MLD_MAX_NUM_LINKS to avoid a stack out-of-bounds
write.
Fixes: 8eb8dd2ffbbb ("wifi: mac80211: Support link removal using Reconfiguration ML element")
Reported-by: Ariel Silver <arielsilver77@gmail.com>
Signed-off-by: Ariel Silver <arielsilver77@gmail.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260220101129.1202657-1-Ariel.Silver@cybereason.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Vahagn Vardanian <vahagn@redrays.io>
Date: Mon Feb 23 00:00:00 2026 +0000
wifi: mac80211: fix NULL pointer dereference in mesh_rx_csa_frame()
commit 017c1792525064a723971f0216e6ef86a8c7af11 upstream.
In mesh_rx_csa_frame(), elems->mesh_chansw_params_ie is dereferenced
at lines 1638 and 1642 without a prior NULL check:
ifmsh->chsw_ttl = elems->mesh_chansw_params_ie->mesh_ttl;
...
pre_value = le16_to_cpu(elems->mesh_chansw_params_ie->mesh_pre_value);
The mesh_matches_local() check above only validates the Mesh ID,
Mesh Configuration, and Supported Rates IEs. It does not verify the
presence of the Mesh Channel Switch Parameters IE (element ID 118).
When a received CSA action frame omits that IE, ieee802_11_parse_elems()
leaves elems->mesh_chansw_params_ie as NULL, and the unconditional
dereference causes a kernel NULL pointer dereference.
A remote mesh peer with an established peer link (PLINK_ESTAB) can
trigger this by sending a crafted SPECTRUM_MGMT/CHL_SWITCH action frame
that includes a matching Mesh ID and Mesh Configuration IE but omits the
Mesh Channel Switch Parameters IE. No authentication beyond the default
open mesh peering is required.
Crash confirmed on kernel 6.17.0-5-generic via mac80211_hwsim:
BUG: kernel NULL pointer dereference, address: 0000000000000000
Oops: Oops: 0000 [#1] SMP NOPTI
RIP: 0010:ieee80211_mesh_rx_queued_mgmt+0x143/0x2a0 [mac80211]
CR2: 0000000000000000
Fix by adding a NULL check for mesh_chansw_params_ie after
mesh_matches_local() returns, consistent with how other optional IEs
are guarded throughout the mesh code.
The bug has been present since v3.13 (released 2014-01-19).
Fixes: 8f2535b92d68 ("mac80211: process the CSA frame for mesh accordingly")
Cc: stable@vger.kernel.org
Signed-off-by: Vahagn Vardanian <vahagn@redrays.io>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Thu Feb 26 20:11:16 2026 +0100
wifi: mt76: Fix possible oob access in mt76_connac2_mac_write_txwi_80211()
[ Upstream commit 4e10a730d1b511ff49723371ed6d694dd1b2c785 ]
Check frame length before accessing the mgmt fields in
mt76_connac2_mac_write_txwi_80211 in order to avoid a possible oob
access.
Fixes: 577dbc6c656d ("mt76: mt7915: enable offloading of sequence number assignment")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20260226-mt76-addba-req-oob-access-v1-3-b0f6d1ad4850@kernel.org
[fix check to also cover mgmt->u.action.u.addba_req.capab,
correct Fixes tag]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Thu Feb 26 20:11:15 2026 +0100
wifi: mt76: mt7925: Fix possible oob access in mt7925_mac_write_txwi_80211()
[ Upstream commit c41a9abd6ae31d130e8f332e7c8800c4c866234b ]
Check frame length before accessing the mgmt fields in
mt7925_mac_write_txwi_80211 in order to avoid a possible oob access.
Fixes: c948b5da6bbec ("wifi: mt76: mt7925: add Mediatek Wi-Fi7 driver for mt7925 chips")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20260226-mt76-addba-req-oob-access-v1-2-b0f6d1ad4850@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Thu Feb 26 20:11:14 2026 +0100
wifi: mt76: mt7996: Fix possible oob access in mt7996_mac_write_txwi_80211()
[ Upstream commit 60862846308627e9e15546bb647a00de44deb27b ]
Check frame length before accessing the mgmt fields in
mt7996_mac_write_txwi_80211 in order to avoid a possible oob access.
Fixes: 98686cd21624c ("wifi: mt76: mt7996: add driver for MediaTek Wi-Fi 7 (802.11be) devices")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20260226-mt76-addba-req-oob-access-v1-1-b0f6d1ad4850@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Johannes Berg <johannes.berg@intel.com>
Date: Tue Feb 17 13:05:26 2026 +0100
wifi: radiotap: reject radiotap with unknown bits
commit c854758abe0b8d86f9c43dc060ff56a0ee5b31e0 upstream.
The radiotap parser is currently only used with the radiotap
namespace (not with vendor namespaces), but if the undefined
field 18 is used, the alignment/size is unknown as well. In
this case, iterator->_next_ns_data isn't initialized (it's
only set for skipping vendor namespaces), and syzbot points
out that we later compare against this uninitialized value.
Fix this by moving the rejection of unknown radiotap fields
down to after the in-namespace lookup, so it will really use
iterator->_next_ns_data only for vendor namespaces, even in
case undefined fields are present.
Cc: stable@vger.kernel.org
Fixes: 33e5a2f776e3 ("wireless: update radiotap parser")
Reported-by: syzbot+b09c1af8764c0097bb19@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/r/69944a91.a70a0220.2c38d7.00fc.GAE@google.com
Link: https://patch.msgid.link/20260217120526.162647-2-johannes@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Sebastian Krzyszkowiak <sebastian.krzyszkowiak@puri.sm>
Date: Sat Feb 21 17:28:04 2026 +0100
wifi: rsi: Don't default to -EOPNOTSUPP in rsi_mac80211_config
[ Upstream commit d973b1039ccde6b241b438d53297edce4de45b5c ]
This triggers a WARN_ON in ieee80211_hw_conf_init and isn't the expected
behavior from the driver - other drivers default to 0 too.
Fixes: 0a44dfc07074 ("wifi: mac80211: simplify non-chanctx drivers")
Signed-off-by: Sebastian Krzyszkowiak <sebastian.krzyszkowiak@puri.sm>
Link: https://patch.msgid.link/20260221-rsi-config-ret-v1-1-9a8f805e2f31@puri.sm
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Bart Van Assche <bvanassche@acm.org>
Date: Mon Feb 23 14:00:25 2026 -0800
wifi: wlcore: Fix a locking bug
[ Upstream commit 72c6df8f284b3a49812ce2ac136727ace70acc7c ]
Make sure that wl->mutex is locked before it is unlocked. This has been
detected by the Clang thread-safety analyzer.
Fixes: 45aa7f071b06 ("wlcore: Use generic runtime pm calls for wowlan elp configuration")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20260223220102.2158611-26-bart.vanassche@linux.dev
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Marco Crivellari <marco.crivellari@suse.com>
Date: Sat Jun 14 15:35:29 2025 +0200
workqueue: Add system_percpu_wq and system_dfl_wq
[ Upstream commit 128ea9f6ccfb6960293ae4212f4f97165e42222d ]
Currently, if a user enqueue a work item using schedule_delayed_work() the
used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.
This lack of consistentcy cannot be addressed without refactoring the API.
system_wq is a per-CPU worqueue, yet nothing in its name tells about that
CPU affinity constraint, which is very often not required by users. Make it
clear by adding a system_percpu_wq.
system_unbound_wq should be the default workqueue so as not to enforce
locality constraints for random work whenever it's not required.
Adding system_dfl_wq to encourage its use when unbound work should be used.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Stable-dep-of: 870c2e7cd881 ("Input: synaptics_i2c - guard polling restart in resume")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Yazen Ghannam <yazen.ghannam@amd.com>
Date: Tue Nov 11 14:53:57 2025 +0000
x86/acpi/boot: Correct acpi_is_processor_usable() check again
[ Upstream commit adbf61cc47cb72b102682e690ad323e1eda652c2 ]
ACPI v6.3 defined a new "Online Capable" MADT LAPIC flag. This bit is
used in conjunction with the "Enabled" MADT LAPIC flag to determine if
a CPU can be enabled/hotplugged by the OS after boot.
Before the new bit was defined, the "Enabled" bit was explicitly
described like this (ACPI v6.0 wording provided):
"If zero, this processor is unusable, and the operating system
support will not attempt to use it"
This means that CPU hotplug (based on MADT) is not possible. Many BIOS
implementations follow this guidance. They may include LAPIC entries in
MADT for unavailable CPUs, but since these entries are marked with
"Enabled=0" it is expected that the OS will completely ignore these
entries.
However, QEMU will do the same (include entries with "Enabled=0") for
the purpose of allowing CPU hotplug within the guest.
Comment from QEMU function pc_madt_cpu_entry():
/* ACPI spec says that LAPIC entry for non present
* CPU may be omitted from MADT or it must be marked
* as disabled. However omitting non present CPU from
* MADT breaks hotplug on linux. So possible CPUs
* should be put in MADT but kept disabled.
*/
Recent Linux topology changes broke the QEMU use case. A following fix
for the QEMU use case broke bare metal topology enumeration.
Rework the Linux MADT LAPIC flags check to allow the QEMU use case only
for guests and to maintain the ACPI spec behavior for bare metal.
Remove an unnecessary check added to fix a bare metal case introduced by
the QEMU "fix".
[ bp: Change logic as Michal suggested. ]
[ mingo: Removed misapplied -stable tag. ]
Fixes: fed8d8773b8e ("x86/acpi/boot: Correct acpi_is_processor_usable() check")
Fixes: f0551af02130 ("x86/topology: Ignore non-present APIC IDs in a present package")
Closes: https://lore.kernel.org/r/20251024204658.3da9bf3f.michal.pecio@gmail.com
Reported-by: Michal Pecio <michal.pecio@gmail.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Michal Pecio <michal.pecio@gmail.com>
Tested-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Link: https://lore.kernel.org/20251111145357.4031846-1-yazen.ghannam@amd.com
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Mike Rapoport (Microsoft) <rppt@kernel.org>
Date: Wed Feb 25 08:55:55 2026 +0200
x86/efi: defer freeing of boot services memory
commit a4b0bf6a40f3c107c67a24fbc614510ef5719980 upstream.
efi_free_boot_services() frees memory occupied by EFI_BOOT_SERVICES_CODE
and EFI_BOOT_SERVICES_DATA using memblock_free_late().
There are two issue with that: memblock_free_late() should be used for
memory allocated with memblock_alloc() while the memory reserved with
memblock_reserve() should be freed with free_reserved_area().
More acutely, with CONFIG_DEFERRED_STRUCT_PAGE_INIT=y
efi_free_boot_services() is called before deferred initialization of the
memory map is complete.
Benjamin Herrenschmidt reports that this causes a leak of ~140MB of
RAM on EC2 t3a.nano instances which only have 512MB or RAM.
If the freed memory resides in the areas that memory map for them is
still uninitialized, they won't be actually freed because
memblock_free_late() calls memblock_free_pages() and the latter skips
uninitialized pages.
Using free_reserved_area() at this point is also problematic because
__free_page() accesses the buddy of the freed page and that again might
end up in uninitialized part of the memory map.
Delaying the entire efi_free_boot_services() could be problematic
because in addition to freeing boot services memory it updates
efi.memmap without any synchronization and that's undesirable late in
boot when there is concurrency.
More robust approach is to only defer freeing of the EFI boot services
memory.
Split efi_free_boot_services() in two. First efi_unmap_boot_services()
collects ranges that should be freed into an array then
efi_free_boot_services() later frees them after deferred init is complete.
Link: https://lore.kernel.org/all/ec2aaef14783869b3be6e3c253b2dcbf67dbc12a.camel@kernel.crashing.org
Fixes: 916f676f8dc0 ("x86, efi: Retain boot service code until after switching to virtual mode")
Cc: <stable@vger.kernel.org>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Tue Jan 6 13:15:04 2026 +0000
x86/fred: Correct speculative safety in fred_extint()
[ Upstream commit aa280a08e7d8fae58557acc345b36b3dc329d595 ]
array_index_nospec() is no use if the result gets spilled to the stack, as
it makes the believed safe-under-speculation value subject to memory
predictions.
For all practical purposes, this means array_index_nospec() must be used in
the expression that accesses the array.
As the code currently stands, it's the wrong side of irqentry_enter(), and
'index' is put into %ebp across the function call.
Remove the index variable and reposition array_index_nospec(), so it's
calculated immediately before the array access.
Fixes: 14619d912b65 ("x86/fred: FRED entry/exit and dispatch code")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260106131504.679932-1-andrew.cooper3@citrix.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Al Viro <viro@zeniv.linux.org.uk>
Date: Mon Sep 30 14:49:47 2024 -0400
xattr: switch to CLASS(fd)
commit a71874379ec8c6e788a61d71b3ad014a8d9a5c08 upstream.
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Larysa Zaremba <larysa.zaremba@intel.com>
Date: Thu Mar 5 12:12:50 2026 +0100
xdp: produce a warning when calculated tailroom is negative
[ Upstream commit 8821e857759be9db3cde337ad328b71fe5c8a55f ]
Many ethernet drivers report xdp Rx queue frag size as being the same as
DMA write size. However, the only user of this field, namely
bpf_xdp_frags_increase_tail(), clearly expects a truesize.
Such difference leads to unspecific memory corruption issues under certain
circumstances, e.g. in ixgbevf maximum DMA write size is 3 KB, so when
running xskxceiver's XDP_ADJUST_TAIL_GROW_MULTI_BUFF, 6K packet fully uses
all DMA-writable space in 2 buffers. This would be fine, if only
rxq->frag_size was properly set to 4K, but value of 3K results in a
negative tailroom, because there is a non-zero page offset.
We are supposed to return -EINVAL and be done with it in such case, but due
to tailroom being stored as an unsigned int, it is reported to be somewhere
near UINT_MAX, resulting in a tail being grown, even if the requested
offset is too much (it is around 2K in the abovementioned test). This later
leads to all kinds of unspecific calltraces.
[ 7340.337579] xskxceiver[1440]: segfault at 1da718 ip 00007f4161aeac9d sp 00007f41615a6a00 error 6
[ 7340.338040] xskxceiver[1441]: segfault at 7f410000000b ip 00000000004042b5 sp 00007f415bffecf0 error 4
[ 7340.338179] in libc.so.6[61c9d,7f4161aaf000+160000]
[ 7340.339230] in xskxceiver[42b5,400000+69000]
[ 7340.340300] likely on CPU 6 (core 0, socket 6)
[ 7340.340302] Code: ff ff 01 e9 f4 fe ff ff 0f 1f 44 00 00 4c 39 f0 74 73 31 c0 ba 01 00 00 00 f0 0f b1 17 0f 85 ba 00 00 00 49 8b 87 88 00 00 00 <4c> 89 70 08 eb cc 0f 1f 44 00 00 48 8d bd f0 fe ff ff 89 85 ec fe
[ 7340.340888] likely on CPU 3 (core 0, socket 3)
[ 7340.345088] Code: 00 00 00 ba 00 00 00 00 be 00 00 00 00 89 c7 e8 31 ca ff ff 89 45 ec 8b 45 ec 85 c0 78 07 b8 00 00 00 00 eb 46 e8 0b c8 ff ff <8b> 00 83 f8 69 74 24 e8 ff c7 ff ff 8b 00 83 f8 0b 74 18 e8 f3 c7
[ 7340.404334] Oops: general protection fault, probably for non-canonical address 0x6d255010bdffc: 0000 [#1] SMP NOPTI
[ 7340.405972] CPU: 7 UID: 0 PID: 1439 Comm: xskxceiver Not tainted 6.19.0-rc1+ #21 PREEMPT(lazy)
[ 7340.408006] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.17.0-5.fc42 04/01/2014
[ 7340.409716] RIP: 0010:lookup_swap_cgroup_id+0x44/0x80
[ 7340.410455] Code: 83 f8 1c 73 39 48 ba ff ff ff ff ff ff ff 03 48 8b 04 c5 20 55 fa bd 48 21 d1 48 89 ca 83 e1 01 48 d1 ea c1 e1 04 48 8d 04 90 <8b> 00 48 83 c4 10 d3 e8 c3 cc cc cc cc 31 c0 e9 98 b7 dd 00 48 89
[ 7340.412787] RSP: 0018:ffffcc5c04f7f6d0 EFLAGS: 00010202
[ 7340.413494] RAX: 0006d255010bdffc RBX: ffff891f477895a8 RCX: 0000000000000010
[ 7340.414431] RDX: 0001c17e3fffffff RSI: 00fa070000000000 RDI: 000382fc7fffffff
[ 7340.415354] RBP: 00fa070000000000 R08: ffffcc5c04f7f8f8 R09: ffffcc5c04f7f7d0
[ 7340.416283] R10: ffff891f4c1a7000 R11: ffffcc5c04f7f9c8 R12: ffffcc5c04f7f7d0
[ 7340.417218] R13: 03ffffffffffffff R14: 00fa06fffffffe00 R15: ffff891f47789500
[ 7340.418229] FS: 0000000000000000(0000) GS:ffff891ffdfaa000(0000) knlGS:0000000000000000
[ 7340.419489] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7340.420286] CR2: 00007f415bfffd58 CR3: 0000000103f03002 CR4: 0000000000772ef0
[ 7340.421237] PKRU: 55555554
[ 7340.421623] Call Trace:
[ 7340.421987] <TASK>
[ 7340.422309] ? softleaf_from_pte+0x77/0xa0
[ 7340.422855] swap_pte_batch+0xa7/0x290
[ 7340.423363] zap_nonpresent_ptes.constprop.0.isra.0+0xd1/0x270
[ 7340.424102] zap_pte_range+0x281/0x580
[ 7340.424607] zap_pmd_range.isra.0+0xc9/0x240
[ 7340.425177] unmap_page_range+0x24d/0x420
[ 7340.425714] unmap_vmas+0xa1/0x180
[ 7340.426185] exit_mmap+0xe1/0x3b0
[ 7340.426644] __mmput+0x41/0x150
[ 7340.427098] exit_mm+0xb1/0x110
[ 7340.427539] do_exit+0x1b2/0x460
[ 7340.427992] do_group_exit+0x2d/0xc0
[ 7340.428477] get_signal+0x79d/0x7e0
[ 7340.428957] arch_do_signal_or_restart+0x34/0x100
[ 7340.429571] exit_to_user_mode_loop+0x8e/0x4c0
[ 7340.430159] do_syscall_64+0x188/0x6b0
[ 7340.430672] ? __do_sys_clone3+0xd9/0x120
[ 7340.431212] ? switch_fpu_return+0x4e/0xd0
[ 7340.431761] ? arch_exit_to_user_mode_prepare.isra.0+0xa1/0xc0
[ 7340.432498] ? do_syscall_64+0xbb/0x6b0
[ 7340.433015] ? __handle_mm_fault+0x445/0x690
[ 7340.433582] ? count_memcg_events+0xd6/0x210
[ 7340.434151] ? handle_mm_fault+0x212/0x340
[ 7340.434697] ? do_user_addr_fault+0x2b4/0x7b0
[ 7340.435271] ? clear_bhb_loop+0x30/0x80
[ 7340.435788] ? clear_bhb_loop+0x30/0x80
[ 7340.436299] ? clear_bhb_loop+0x30/0x80
[ 7340.436812] ? clear_bhb_loop+0x30/0x80
[ 7340.437323] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 7340.437973] RIP: 0033:0x7f4161b14169
[ 7340.438468] Code: Unable to access opcode bytes at 0x7f4161b1413f.
[ 7340.439242] RSP: 002b:00007ffc6ebfa770 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
[ 7340.440173] RAX: fffffffffffffe00 RBX: 00000000000005a1 RCX: 00007f4161b14169
[ 7340.441061] RDX: 00000000000005a1 RSI: 0000000000000109 RDI: 00007f415bfff990
[ 7340.441943] RBP: 00007ffc6ebfa7a0 R08: 0000000000000000 R09: 00000000ffffffff
[ 7340.442824] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 7340.443707] R13: 0000000000000000 R14: 00007f415bfff990 R15: 00007f415bfff6c0
[ 7340.444586] </TASK>
[ 7340.444922] Modules linked in: rfkill intel_rapl_msr intel_rapl_common intel_uncore_frequency_common skx_edac_common nfit libnvdimm kvm_intel vfat fat kvm snd_pcm irqbypass rapl iTCO_wdt snd_timer intel_pmc_bxt iTCO_vendor_support snd ixgbevf virtio_net soundcore i2c_i801 pcspkr libeth_xdp net_failover i2c_smbus lpc_ich failover libeth virtio_balloon joydev 9p fuse loop zram lz4hc_compress lz4_compress 9pnet_virtio 9pnet netfs ghash_clmulni_intel serio_raw qemu_fw_cfg
[ 7340.449650] ---[ end trace 0000000000000000 ]---
The issue can be fixed in all in-tree drivers, but we cannot just trust OOT
drivers to not do this. Therefore, make tailroom a signed int and produce a
warning when it is negative to prevent such mistakes in the future.
Fixes: bf25146a5595 ("bpf: add frags support to the bpf_xdp_adjust_tail() API")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://patch.msgid.link/20260305111253.2317394-10-larysa.zaremba@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Larysa Zaremba <larysa.zaremba@intel.com>
Date: Thu Mar 5 12:12:42 2026 +0100
xdp: use modulo operation to calculate XDP frag tailroom
[ Upstream commit 88b6b7f7b216108a09887b074395fa7b751880b1 ]
The current formula for calculating XDP tailroom in mbuf packets works only
if each frag has its own page (if rxq->frag_size is PAGE_SIZE), this
defeats the purpose of the parameter overall and without any indication
leads to negative calculated tailroom on at least half of frags, if shared
pages are used.
There are not many drivers that set rxq->frag_size. Among them:
* i40e and enetc always split page uniformly between frags, use shared
pages
* ice uses page_pool frags via libeth, those are power-of-2 and uniformly
distributed across page
* idpf has variable frag_size with XDP on, so current API is not applicable
* mlx5, mtk and mvneta use PAGE_SIZE or 0 as frag_size for page_pool
As for AF_XDP ZC, only ice, i40e and idpf declare frag_size for it. Modulo
operation yields good results for aligned chunks, they are all power-of-2,
between 2K and PAGE_SIZE. Formula without modulo fails when chunk_size is
2K. Buffers in unaligned mode are not distributed uniformly, so modulo
operation would not work.
To accommodate unaligned buffers, we could define frag_size as
data + tailroom, and hence do not subtract offset when calculating
tailroom, but this would necessitate more changes in the drivers.
Define rxq->frag_size as an even portion of a page that fully belongs to a
single frag. When calculating tailroom, locate the data start within such
portion by performing a modulo operation on page offset.
Fixes: bf25146a5595 ("bpf: add frags support to the bpf_xdp_adjust_tail() API")
Acked-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://patch.msgid.link/20260305111253.2317394-2-larysa.zaremba@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: David Thomson <dt@linux-mail.net>
Date: Tue Feb 24 09:37:11 2026 +0000
xen/acpi-processor: fix _CST detection using undersized evaluation buffer
[ Upstream commit 8b57227d59a86fc06d4f09de08f98133680f2cae ]
read_acpi_id() attempts to evaluate _CST using a stack buffer of
sizeof(union acpi_object) (48 bytes), but _CST returns a nested Package
of sub-Packages (one per C-state, each containing a register descriptor,
type, latency, and power) requiring hundreds of bytes. The evaluation
always fails with AE_BUFFER_OVERFLOW.
On modern systems using FFH/MWAIT entry (where pblk is zero), this
causes the function to return before setting the acpi_id_cst_present
bit. In check_acpi_ids(), flags.power is then zero for all Phase 2 CPUs
(physical CPUs beyond dom0's vCPU count), so push_cxx_to_hypervisor() is
never called for them.
On a system with dom0_max_vcpus=2 and 8 physical CPUs, only PCPUs 0-1
receive C-state data. PCPUs 2-7 are stuck in C0/C1 idle, unable to
enter C2/C3. This costs measurable wall power (4W observed on an Intel
Core Ultra 7 265K with Xen 4.20).
The function never uses the _CST return value -- it only needs to know
whether _CST exists. Replace the broken acpi_evaluate_object() call with
acpi_has_method(), which correctly detects _CST presence using
acpi_get_handle() without any buffer allocation. This brings C-state
detection to parity with the P-state path, which already works correctly
for Phase 2 CPUs.
Fixes: 59a568029181 ("xen/acpi-processor: C and P-state driver that uploads said data to hypervisor.")
Signed-off-by: David Thomson <dt@linux-mail.net>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Message-ID: <20260224093707.19679-1-dt@linux-mail.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Nikhil P. Rao <nikhil.rao@amd.com>
Date: Wed Feb 25 00:00:26 2026 +0000
xsk: Fix fragment node deletion to prevent buffer leak
[ Upstream commit 60abb0ac11dccd6b98fd9182bc5f85b621688861 ]
After commit b692bf9a7543 ("xsk: Get rid of xdp_buff_xsk::xskb_list_node"),
the list_node field is reused for both the xskb pool list and the buffer
free list, this causes a buffer leak as described below.
xp_free() checks if a buffer is already on the free list using
list_empty(&xskb->list_node). When list_del() is used to remove a node
from the xskb pool list, it doesn't reinitialize the node pointers.
This means list_empty() will return false even after the node has been
removed, causing xp_free() to incorrectly skip adding the buffer to the
free list.
Fix this by using list_del_init() instead of list_del() in all fragment
handling paths, this ensures the list node is reinitialized after removal,
allowing the list_empty() to work correctly.
Fixes: b692bf9a7543 ("xsk: Get rid of xdp_buff_xsk::xskb_list_node")
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Nikhil P. Rao <nikhil.rao@amd.com>
Link: https://patch.msgid.link/20260225000456.107806-2-nikhil.rao@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: f7387d6579d6 ("xsk: Fix zero-copy AF_XDP fragment drop")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Nikhil P. Rao <nikhil.rao@amd.com>
Date: Wed Feb 25 00:00:27 2026 +0000
xsk: Fix zero-copy AF_XDP fragment drop
[ Upstream commit f7387d6579d65efd490a864254101cb665f2e7a7 ]
AF_XDP should ensure that only a complete packet is sent to application.
In the zero-copy case, if the Rx queue gets full as fragments are being
enqueued, the remaining fragments are dropped.
For the multi-buffer case, add a check to ensure that the Rx queue has
enough space for all fragments of a packet before starting to enqueue
them.
Fixes: 24ea50127ecf ("xsk: support mbuf on ZC RX")
Signed-off-by: Nikhil P. Rao <nikhil.rao@amd.com>
Link: https://patch.msgid.link/20260225000456.107806-3-nikhil.rao@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Date: Mon Oct 7 14:24:53 2024 +0200
xsk: Get rid of xdp_buff_xsk::xskb_list_node
[ Upstream commit b692bf9a7543af7ad11a59d182a3757578f0ba53 ]
Let's bring xdp_buff_xsk back to occupying 2 cachelines by removing
xskb_list_node - for the purpose of gathering the xskb frags
free_list_node can be used, head of the list (xsk_buff_pool::xskb_list)
stays as-is, just reuse the node ptr.
It is safe to do as a single xdp_buff_xsk can never reside in two
pool's lists simultaneously.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20241007122458.282590-2-maciej.fijalkowski@intel.com
Stable-dep-of: f7387d6579d6 ("xsk: Fix zero-copy AF_XDP fragment drop")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Larysa Zaremba <larysa.zaremba@intel.com>
Date: Thu Mar 5 12:12:43 2026 +0100
xsk: introduce helper to determine rxq->frag_size
[ Upstream commit 16394d80539937d348dd3b9ea32415c54e67a81b ]
rxq->frag_size is basically a step between consecutive strictly aligned
frames. In ZC mode, chunk size fits exactly, but if chunks are unaligned,
there is no safe way to determine accessible space to grow tailroom.
Report frag_size to be zero, if chunks are unaligned, chunk_size otherwise.
Fixes: 24ea50127ecf ("xsk: support mbuf on ZC RX")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://patch.msgid.link/20260305111253.2317394-3-larysa.zaremba@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Date: Mon Oct 7 14:24:54 2024 +0200
xsk: s/free_list_node/list_node/
[ Upstream commit 30ec2c1baaead43903ad63ff8e3083949059083c ]
Now that free_list_node's purpose is two-folded, make it just a
'list_node'.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20241007122458.282590-3-maciej.fijalkowski@intel.com
Stable-dep-of: f7387d6579d6 ("xsk: Fix zero-copy AF_XDP fragment drop")
Signed-off-by: Sasha Levin <sashal@kernel.org>