The function idle_thread_get() can return an error pointer and is not
checked for it. Add check for error pointer.
Detected by Smatch:
arch/x86/hyperv/hv_vtl.c:126 hv_vtl_bringup_vcpu() error:
'idle' dereferencing possible ERR_PTR()
Fixes: 2b4b90e053 ("x86/hyperv: Use per cpu initial stack for vtl context")
Signed-off-by: Ethan Tidmore <ethantidmore06@gmail.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
MSVC compiler, used to compile the Microsoft Hypervisor, currently
has an assert intrinsic that uses interrupt vector 0x29 to create an
exception. This will cause hypervisor to then crash and collect core. As
such, if this interrupt number is assigned to a device by Linux and the
device generates it, hypervisor will crash. There are two other such
vectors hard coded in the hypervisor, 0x2C and 0x2D for debug purposes.
Fortunately, the three vectors are part of the kernel driver space and
that makes it feasible to reserve them early so they are not assigned
later.
Signed-off-by: Mukesh Rathor <mrathor@linux.microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
The verification signature header generation requires converting a
binary certificate to a C array. Previously this only worked with xxd,
and a switch to hexdump has been done in commit b640d556a2
("selftests/bpf: Remove xxd util dependency").
hexdump is a more common utility program, yet it might not be installed
by default. When it is not installed, BPF selftests build without
errors, but tests_progs is unusable: it exits with the 255 code and
without any error messages. When manually reproducing the issue, it is
not too hard to find out that the generated verification_cert.h file is
incorrect, but that's time consuming. When digging the BPF selftests
build logs, this line can be seen amongst thousands others, but ignored:
/bin/sh: 2: hexdump: not found
Here, od is used instead of hexdump. od is coming from the coreutils
package, and this new od command produces the same output when using od
from GNU coreutils, uutils, and even busybox. This is more portable, and
it produces a similar results to what was done before with hexdump:
there is an extra comma at the end instead of trailing whitespaces,
but the C code is not impacted.
Fixes: b640d556a2 ("selftests/bpf: Remove xxd util dependency")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/r/20260218-bpf-sft-hexdump-od-v2-1-2f9b3ee5ab86@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Pull Kbuild fixes from Nathan Chancellor:
- Ensure tools/objtool is cleaned by 'make clean' and 'make mrproper'
- Fix test program for CONFIG_CC_CAN_LINK to avoid a warning, which is
made fatal by -Werror
- Drop explicit LZMA parallel compression in scripts/make_fit.py
- Several fixes for commit 62089b8048 ("kbuild: rpm-pkg: Generate
debuginfo package manually")
* tag 'kbuild-fixes-7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux:
kbuild: rpm-pkg: Disable automatic requires for manual debuginfo package
kbuild: rpm-pkg: Fix manual debuginfo generation when using .src.rpm
kernel: rpm-pkg: Restore find-debuginfo.sh approach to -debuginfo package
kbuild: rpm-pkg: Restrict manual debug package creation
scripts/make_fit.py: Drop explicit LZMA parallel compression
kbuild: Fix CC_CAN_LINK detection
kbuild: Add objtool to top-level clean target
Ihor Solodrai says:
====================
libbpf: Remove extern declaration of bpf_stream_vprintk()
The first patch adjusts a selftest that has been using
bpf_stream_printk() macro. The second patch removes the declaration.
====================
Link: https://patch.msgid.link/20260218215651.2057673-1-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Pull thermal control fix from Rafael Wysocki:
"This fixes a sysfs group leak on DLVR registration failure in the
Intel int340x thermal driver (Kaushlendra Kumar)"
* tag 'thermal-7.0-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: int340x: Fix sysfs group leak on DLVR registration failure
Pull more ACPI support updates from Rafael J. Wysocki:
"These are mostly fixes and cleanups on top of the ACPI support updates
merged recently, including two new quirks, an ACPI CPPC library fix,
and fixes and cleanups of a few core ACPI device drivers:
- Add an unused power resource handling quirk for THUNDEROBOT ZERO
(Zhai Can)
- Fix remaining for_each_possible_cpu() in the ACPI CPPC library to
use online CPUs (Sean V Kelley)
- Drop redundant checks from the ACPI notify handler and the driver
remove callback in the ACPI battery driver (Rafael Wysocki)
- Move the creation of the wakeup source during the ACPI button
driver probe to an earlier point to avoid missing a wakeup event
due to a race and clean up system wakeup handling and remove
callback in that driver (Rafael Wysocki)
- Drop unnecessary driver_data pointer clearing from the ACPI EC and
SMBUS HC drivers and make the ACPI backlight (video) driver clear
the device's driver_data pointer on remove (Rafael Wysocki)
- Force enabling of PWM2 on the Yogabook YB1-X90 tablets (Yauhen
Kharuzhy)"
* tag 'acpi-7.0-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: PM: Add unused power resource quirk for THUNDEROBOT ZERO
ACPI: driver: Drop driver_data pointer clearing from two drivers
ACPI: video: Clear driver_data pointer on remove
ACPI: button: Tweak acpi_button_remove()
ACPI: button: Tweak system wakeup handling
ACPI: battery: Drop redundant checks from acpi_battery_remove()
ACPI: CPPC: Fix remaining for_each_possible_cpu() to use online CPUs
ACPI: x86: Force enabling of PWM2 on the Yogabook YB1-X90
ACPI: button: Call device_init_wakeup() earlier during probe
ACPI: battery: Drop redundant check from acpi_battery_notify()
Pull more power management updates from Rafael Wysocki:
"These are mostly fixes on top of the power management updates merged
recently in cpuidle governors, in the Intel RAPL power capping driver
and in the wake IRQ management code:
- Fix the handling of package-scope MSRs in the intel_rapl power
capping driver when called from the PMU subsystem and make it add
all package CPUs to the PMU cpumask to allow tools to read RAPL
events from any CPU in the package (Kuppuswamy Satharayananyan)
- Rework the invalid version check in the intel_rapl_tpmi power
capping driver to account for the fact that on partitioned systems,
multiple TPMI instances may exist per package, but RAPL registers
are only valid on one instance (Kuppuswamy Satharayananyan)
- Describe the new intel_idle.table command line option in the
admin-guide intel_idle documentation (Artem Bityutskiy)
- Fix a crash in the ladder cpuidle governor on systems with only one
(polling) idle state available by making the cpuidle core bypass
the governor in those cases and adjust the other existing governors
to that change (Aboorva Devarajan, Christian Loehle)
- Update kerneldoc comments for wake IRQ management functions that
have not been matching the code (Wang Jiayue)"
* tag 'pm-7.0-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpuidle: menu: Remove single state handling
cpuidle: teo: Remove single state handling
cpuidle: haltpoll: Remove single state handling
cpuidle: Skip governor when only one idle state is available
powercap: intel_rapl_tpmi: Remove FW_BUG from invalid version check
PM: sleep: wakeirq: Update outdated documentation comments
Documentation: PM: Document intel_idle.table command line option
powercap: intel_rapl: Expose all package CPUs in PMU cpumask
powercap: intel_rapl: Remove incorrect CPU check in PMU context
I finally got a big endian PPC64 kernel to boot in QEMU. The PPC64 VSX
optimized AES library code does work in that case, with the exception of
rndkey_from_vsx() which doesn't take into account that the order in
which the VSX code stores the round key words depends on the endianness.
So fix rndkey_from_vsx() to do the right thing on big endian CPUs.
Fixes: 7cf2082e74 ("lib/crypto: powerpc/aes: Migrate POWER8 optimized code into library")
Link: https://lore.kernel.org/r/20260216022104.332991-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Smatch static checker warning:
security/apparmor/policy_unpack.c:966 unpack_pdb()
warn: unsigned 'unpack_tags(e, &pdb->tags, info)' is never less than zero.
unpack_tags() is declared with return type size_t (unsigned) but returns
negative errno values on failure. The caller in unpack_pdb() tests the
return with `< 0`, which is always false for an unsigned type, making
error handling dead code. Malformed tag data would be silently accepted
instead of causing a load failure.
Change return type of unpack_tags() from size_t to int to match the
functions's actual semantic.
Fixes: 3d28e2397a ("apparmor: add support loading per permission tagging")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Massimiliano Pellizzer <mpellizzer.dev@gmail.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Merge additional updates of multiple core ACPI device drivers (battery,
button, video, EC, SMBUS HC) for 7.0-rc1:
- Drop redundant checks from the ACPI notify handler and the driver
remove callback in the ACPI battery driver (Rafael Wysocki)
- Move the creation of the wakeup source during the ACPI button driver
probe to an earlier point to avoid missing a wakeup event due to a
race and clean up system wakeup handling and remove callback in that
driver (Rafael Wysocki)
- Drop unnecessary driver_data pointer clearing from the ACPI EC and
SMBUS HC drivers and make the ACPI backlight (video) driver clear the
device's driver_data pointer on remove (Rafael Wysocki)
* acpi-battery:
ACPI: battery: Drop redundant checks from acpi_battery_remove()
ACPI: battery: Drop redundant check from acpi_battery_notify()
* acpi-button:
ACPI: button: Tweak acpi_button_remove()
ACPI: button: Tweak system wakeup handling
ACPI: button: Call device_init_wakeup() earlier during probe
* acpi-driver:
ACPI: driver: Drop driver_data pointer clearing from two drivers
ACPI: video: Clear driver_data pointer on remove
Merge an ACPI power management update and an ACPI CPPC library update
for 7.0-rc1:
- Add an unused power resource handling quirk for THUNDEROBOT ZERO (Zhai
Can)
- Fix remaining for_each_possible_cpu() in the ACPI CPPC library to use
online CPUs (Sean V Kelley)
* acpi-pm:
ACPI: PM: Add unused power resource quirk for THUNDEROBOT ZERO
* acpi-cppc:
ACPI: CPPC: Fix remaining for_each_possible_cpu() to use online CPUs
Merge additional power capping and cpuidle updates for 7.0-rc1:
- Fix the handling of package-scope MSRs in the intel_rapl power
capping driver when called from the PMU subsystem and make it add all
package CPUs to the PMU cpumask to allow tools to read RAPL events
from any CPU in the package (Kuppuswamy Sathyanarayanan)
- Rework the invalid version check in the intel_rapl_tpmi power capping
driver to account for the fact that on partitioned systems, multiple
TPMI instances may exist per package, but RAPL registers are only
valid on one instance (Kuppuswamy Satharayananyan)
- Describe the new intel_idle.table command line option in the
admin-guide intel_idle documentation (Artem Bityutskiy)
- Fix a crash in the ladder cpuidle governor on systems with only one
(polling) idle state available by making the cpuidle core bypass the
governor in those cases and adjust the other existing governors to
that change (Aboorva Devarajan, Christian Loehle)
* pm-powercap:
powercap: intel_rapl_tpmi: Remove FW_BUG from invalid version check
powercap: intel_rapl: Expose all package CPUs in PMU cpumask
powercap: intel_rapl: Remove incorrect CPU check in PMU context
* pm-cpuidle:
cpuidle: menu: Remove single state handling
cpuidle: teo: Remove single state handling
cpuidle: haltpoll: Remove single state handling
cpuidle: Skip governor when only one idle state is available
Documentation: PM: Document intel_idle.table command line option
Pull sysctl updates from Joel Granados:
- Remove macros from proc handler converters
Replace the proc converter macros with "regular" functions. Though it
is more verbose than the macro version, it helps when debugging and
better aligns with coding-style.rst.
- General cleanup
Remove superfluous ctl_table forward declarations. Const qualify the
memory_allocation_profiling_sysctl and loadpin_sysctl_table arrays.
Add missing kernel doc to proc_dointvec_conv.
- Testing
This series was run through sysctl selftests/kunit test suite in
x86_64. And went into linux-next after rc4, giving it a good 3 weeks
of testing
* tag 'sysctl-7.00-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl:
sysctl: replace SYSCTL_INT_CONV_CUSTOM macro with functions
sysctl: Replace unidirectional INT converter macros with functions
sysctl: Add kernel doc to proc_douintvec_conv
sysctl: Replace UINT converter macros with functions
sysctl: Add CONFIG_PROC_SYSCTL guards for converter macros
sysctl: clarify proc_douintvec_minmax doc
sysctl: Return -ENOSYS from proc_douintvec_conv when CONFIG_PROC_SYSCTL=n
sysctl: Remove unused ctl_table forward declarations
loadpin: Implement custom proc_handler for enforce
alloc_tag: move memory_allocation_profiling_sysctls into .rodata
sysctl: Add missing kernel-doc for proc_dointvec_conv
Pull turbostat fix from Len Brown:
"Fix a recent AMD regression due to errant code cleanup"
* tag 'turbostat-2026.02.14-AMD-RAPL-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
tools/power turbostat: Fix AMD RAPL regression
The io_zcrx_put_niov_uref() function uses a non-atomic
check-then-decrement pattern (atomic_read followed by separate
atomic_dec) to manipulate user_refs. This is serialized against other
callers by rq_lock, but io_zcrx_scrub() modifies the same counter with
atomic_xchg() WITHOUT holding rq_lock.
On SMP systems, the following race exists:
CPU0 (refill, holds rq_lock) CPU1 (scrub, no rq_lock)
put_niov_uref:
atomic_read(uref) - 1
// window opens
atomic_xchg(uref, 0) - 1
return_niov_freelist(niov) [PUSH #1]
// window closes
atomic_dec(uref) - wraps to -1
returns true
return_niov(niov)
return_niov_freelist(niov) [PUSH #2: DOUBLE-FREE]
The same niov is pushed to the freelist twice, causing free_count to
exceed nr_iovs. Subsequent freelist pushes then perform an out-of-bounds
write (a u32 value) past the kvmalloc'd freelist array into the adjacent
slab object.
Fix this by replacing the non-atomic read-then-dec in
io_zcrx_put_niov_uref() with an atomic_try_cmpxchg loop that atomically
tests and decrements user_refs. This makes the operation safe against
concurrent atomic_xchg from scrub without requiring scrub to acquire
rq_lock.
Fixes: 34a3e60821 ("io_uring/zcrx: implement zerocopy receive pp memory provider")
Cc: stable@vger.kernel.org
Signed-off-by: Kai Aizen <kai@snailsploit.com>
[pavel: removed a warning and a comment]
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In the original txt format binding document ak4458.txt, the supply names
are 'AVDD-supply', 'DVDD-supply', and they are also used in driver. But in
the commit converting to yaml format, they are changed to 'avdd-supply',
'dvdd-supply'. After search all the dts file, these names 'AVDD-supply',
'DVDD-supply', 'avdd-supply', 'dvdd-supply' are not used in any dts
file. So it is safe to fix the yaml binding document.
Fixes: 829d78e3ea ("ASoC: dt-bindings: ak5558: Convert to dtschema")
Cc: stable@vger.kernel.org
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Link: https://patch.msgid.link/20260212021829.3244736-4-shengjiu.wang@nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
In the original txt format binding document ak4458.txt, the supply names
are 'AVDD-supply', 'DVDD-supply', and they are also used in driver. But in
the commit converting to yaml format, they are changed to 'avdd-supply',
'dvdd-supply'. After search all the dts file, these names 'AVDD-supply',
'DVDD-supply', 'avdd-supply', 'dvdd-supply' are not used in any dts
file. So it is safe to fix this yaml binding document.
Fixes: 009e83b591 ("ASoC: dt-bindings: ak4458: Convert to dtschema")
Cc: stable@vger.kernel.org
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Link: https://patch.msgid.link/20260212021829.3244736-3-shengjiu.wang@nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
nr_cpu_ids / 2 produces stride 0 on a single-CPU system, which later
causes SCX_BUG_ON(i == j) to fire. Validate stride after option
parsing to also catch invalid user-supplied values via -S.
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Use CPU_SET_S() instead of CPU_SET() on the dynamically allocated
cpuset to avoid a potential out-of-bounds write when nr_cpu_ids
exceeds CPU_SETSIZE.
Also destroy the skeleton before returning on invalid central CPU ID
to prevent a resource leak.
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Some GPUs have analog connectors that work with a DP bridge chip
and don't actually have an internal DAC: Those should not use
the analog stream encoders.
Fixes: 5834c33fd3 ("drm/amd/display: Add concept of analog encoders (v2)")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Some GPUs have analog connectors that work with a DP bridge chip
and don't actually have an internal DAC: Those should not use
the analog link encoder code path.
Fixes: 0fbe321a93 ("drm/amd/display: Implement DCE analog link encoders (v2)")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
DCE 6 should use the DCE 6 specific link encoder.
This was a copy paste mistake.
Fixes: 0fbe321a93 ("drm/amd/display: Implement DCE analog link encoders (v2)")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If btrfs_search_slot_for_read() returns 1, it means we did not find any
key greater than or equals to the key we asked for, meaning we have
reached the end of the tree and therefore the path is not valid. If
this happens we need to break out of the loop and stop, instead of
continuing and accessing an invalid path.
Fixes: 5223cc60b4 ("btrfs: drop the path before adding qgroup items when enabling qgroups")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If the root nodes for the chunk root, tree root or log root are not sector
size aligned, we are logging a warning message but these are in fact
errors that makes the super block validation fail. So change the level of
the messages from warning to error.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When initializing the delayed refs block reserve for a transaction handle
we are passing a type of BTRFS_BLOCK_RSV_DELOPS, which is meant for
delayed items and not for delayed refs. The correct type for delayed refs
is BTRFS_BLOCK_RSV_DELREFS.
On release of any excess space reserved in a local delayed refs reserve,
we also should transfer that excess space to the global block reserve
(it it's full, we return to the space info for general availability).
By initializing a transaction's local delayed refs block reserve with a
type of BTRFS_BLOCK_RSV_DELOPS, we were also causing any excess space
released from the delayed block reserve (fs_info->delayed_block_rsv, used
for delayed inodes and items) to be transferred to the global block
reserve instead of the global delayed refs block reserve. This was an
unintentional change in commit 28270e25c6 ("btrfs: always reserve space
for delayed refs when starting transaction"), but it's not particularly
serious as things tend to cancel out each other most of the time and it's
relatively rare to be anywhere near exhaustion of the global reserve.
Fix this by initializing a transaction's local delayed refs reserve with
a type of BTRFS_BLOCK_RSV_DELREFS and making btrfs_block_rsv_release()
attempt to transfer unused space from such a reserve into the global block
reserve, just as we did before that commit for when the block reserve is
a delayed refs rsv.
Reported-by: Alex Lyakas <alex.lyakas@zadara.com>
Link: https://lore.kernel.org/linux-btrfs/CAOcd+r0FHG5LWzTSu=LknwSoqxfw+C00gFAW7fuX71+Z5AfEew@mail.gmail.com/
Fixes: 28270e25c6 ("btrfs: always reserve space for delayed refs when starting transaction")
Reviewed-by: Alex Lyakas <alex.lyakas@zadara.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
There is a bug report that when btrfs hits ENOSPC error in a critical
path, btrfs flips RO (this part is expected, although the ENOSPC bug
still needs to be addressed).
The problem is after the RO flip, if there is a read repair pending, we
can hit the ASSERT() inside btrfs_repair_io_failure() like the following:
BTRFS info (device vdc): relocating block group 30408704 flags metadata|raid1
------------[ cut here ]------------
BTRFS: Transaction aborted (error -28)
WARNING: fs/btrfs/extent-tree.c:3235 at __btrfs_free_extent.isra.0+0x453/0xfd0, CPU#1: btrfs/383844
Modules linked in: kvm_intel kvm irqbypass
[...]
---[ end trace 0000000000000000 ]---
BTRFS info (device vdc state EA): 2 enospc errors during balance
BTRFS info (device vdc state EA): balance: ended with status: -30
BTRFS error (device vdc state EA): parent transid verify failed on logical 30556160 mirror 2 wanted 8 found 6
BTRFS error (device vdc state EA): bdev /dev/nvme0n1 errs: wr 0, rd 0, flush 0, corrupt 10, gen 0
[...]
assertion failed: !(fs_info->sb->s_flags & SB_RDONLY) :: 0, in fs/btrfs/bio.c:938
------------[ cut here ]------------
assertion failed: !(fs_info->sb->s_flags & SB_RDONLY) :: 0, in fs/btrfs/bio.c:938
kernel BUG at fs/btrfs/bio.c:938!
Oops: invalid opcode: 0000 [#1] SMP NOPTI
CPU: 0 UID: 0 PID: 868 Comm: kworker/u8:13 Tainted: G W N 6.19.0-rc6+ #4788 PREEMPT(full)
Tainted: [W]=WARN, [N]=TEST
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
Workqueue: btrfs-endio simple_end_io_work
RIP: 0010:btrfs_repair_io_failure.cold+0xb2/0x120
RSP: 0000:ffffc90001d2bcf0 EFLAGS: 00010246
RAX: 0000000000000051 RBX: 0000000000001000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff8305cf42 RDI: 00000000ffffffff
RBP: 0000000000000002 R08: 00000000fffeffff R09: ffffffff837fa988
R10: ffffffff8327a9e0 R11: 6f69747265737361 R12: ffff88813018d310
R13: ffff888168b8a000 R14: ffffc90001d2bd90 R15: ffff88810a169000
FS: 0000000000000000(0000) GS:ffff8885e752c000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
------------[ cut here ]------------
[CAUSE]
The cause of -ENOSPC error during the test case btrfs/124 is still
unknown, although it's known that we still have cases where metadata can
be over-committed but can not be fulfilled correctly, thus if we hit
such ENOSPC error inside a critical path, we have no choice but abort
the current transaction.
This will mark the fs read-only.
The problem is inside the btrfs_repair_io_failure() path that we require
the fs not to be mount read-only. This is normally fine, but if we are
doing a read-repair meanwhile the fs flips RO due to a critical error,
we can enter btrfs_repair_io_failure() with super block set to
read-only, thus triggering the above crash.
[FIX]
Just replace the ASSERT() with a proper return if the fs is already
read-only.
Reported-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/linux-btrfs/20260126045555.GB31641@lst.de/
Tested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Block group size classes are managed consistently everywhere.
Currently, btrfs_use_block_group_size_class() sets a block group's size
class to specialize it for a specific allocation size. However, this
size class remains "stale" even if the block group becomes completely
empty (both used and reserved bytes reach zero).
This happens in two scenarios:
1. When space reservations are freed (e.g., due to errors or transaction
aborts) via btrfs_free_reserved_bytes().
2. When the last extent in a block group is freed via
btrfs_update_block_group().
While size classes are advisory, a stale size class can cause
find_free_extent to unnecessarily skip candidate block groups during
initial search loops. This undermines the purpose of size classes to
reduce fragmentation by keeping block groups restricted to a specific
size class when they could be reused for any size.
Fix this by resetting the size class to BTRFS_BG_SZ_NONE whenever a
block group's used and reserved counts both reach zero. This ensures
that empty block groups are fully available for any allocation size in
the next cycle.
Fixes: 52bb7a2166 ("btrfs: introduce size class to block group allocator")
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We search with offset (u64)-1 which should never match exactly.
Previously this was handled with BUG(). Now logs an error
and return -EUCLEAN.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Adarsh Das <adarshdas950@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We search with offset (u64)-1 which should never match exactly.
Previously the code silently returned success without setting the index
count. Now logs an error and return -EUCLEAN instead.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Adarsh Das <adarshdas950@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>,
Signed-off-by: David Sterba <dsterba@suse.com>
With PREEMPT_RT as potential configuration option, spinlock_t is now
considered as a sleeping lock, and thus might cause issues when used in
an atomic context. But even with PREEMPT_RT as potential configuration
option, raw_spinlock_t remains as a true spinning lock/atomic context.
This creates potential issues with the s390 debug/tracing feature. The
functions to trace errors are called in various contexts, including
under lock of raw_spinlock_t, and thus the used spinlock_t in each debug
area is in violation of the locking semantics.
Here are two examples involving failing PCI Read accesses that are
traced while holding `pci_lock` in `drivers/pci/access.c`:
=============================
[ BUG: Invalid wait context ]
6.19.0-devel #18 Not tainted
-----------------------------
bash/3833 is trying to lock:
0000027790baee30 (&rc->lock){-.-.}-{3:3}, at: debug_event_common+0xfc/0x300
other info that might help us debug this:
context-{5:5}
5 locks held by bash/3833:
#0: 0000027efbb29450 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0x7c/0xf0
#1: 00000277f0504a90 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x13e/0x260
#2: 00000277beed8c18 (kn->active#339){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x164/0x260
#3: 00000277e9859190 (&dev->mutex){....}-{4:4}, at: pci_dev_lock+0x2e/0x40
#4: 00000383068a7708 (pci_lock){....}-{2:2}, at: pci_bus_read_config_dword+0x4a/0xb0
stack backtrace:
CPU: 6 UID: 0 PID: 3833 Comm: bash Kdump: loaded Not tainted 6.19.0-devel #18 PREEMPTLAZY
Hardware name: IBM 9175 ME1 701 (LPAR)
Call Trace:
[<00000383048afec2>] dump_stack_lvl+0xa2/0xe8
[<00000383049ba166>] __lock_acquire+0x816/0x1660
[<00000383049bb1fa>] lock_acquire+0x24a/0x370
[<00000383059e3860>] _raw_spin_lock_irqsave+0x70/0xc0
[<00000383048bbb6c>] debug_event_common+0xfc/0x300
[<0000038304900b0a>] __zpci_load+0x17a/0x1f0
[<00000383048fad88>] pci_read+0x88/0xd0
[<00000383054cbce0>] pci_bus_read_config_dword+0x70/0xb0
[<00000383054d55e4>] pci_dev_wait+0x174/0x290
[<00000383054d5a3e>] __pci_reset_function_locked+0xfe/0x170
[<00000383054d9b30>] pci_reset_function+0xd0/0x100
[<00000383054ee21a>] reset_store+0x5a/0x80
[<0000038304e98758>] kernfs_fop_write_iter+0x1e8/0x260
[<0000038304d995da>] new_sync_write+0x13a/0x180
[<0000038304d9c5d0>] vfs_write+0x200/0x330
[<0000038304d9c88c>] ksys_write+0x7c/0xf0
[<00000383059cfa80>] __do_syscall+0x210/0x500
[<00000383059e4c06>] system_call+0x6e/0x90
INFO: lockdep is turned off.
=============================
[ BUG: Invalid wait context ]
6.19.0-devel #3 Not tainted
-----------------------------
bash/6861 is trying to lock:
0000009da05c7430 (&rc->lock){-.-.}-{3:3}, at: debug_event_common+0xfc/0x300
other info that might help us debug this:
context-{5:5}
5 locks held by bash/6861:
#0: 000000acff404450 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0x7c/0xf0
#1: 000000acff41c490 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x13e/0x260
#2: 0000009da36937d8 (kn->active#75){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x164/0x260
#3: 0000009dd15250d0 (&zdev->state_lock){+.+.}-{4:4}, at: enable_slot+0x2e/0xc0
#4: 000001a19682f708 (pci_lock){....}-{2:2}, at: pci_bus_read_config_byte+0x42/0xa0
stack backtrace:
CPU: 16 UID: 0 PID: 6861 Comm: bash Kdump: loaded Not tainted 6.19.0-devel #3 PREEMPTLAZY
Hardware name: IBM 9175 ME1 701 (LPAR)
Call Trace:
[<000001a194837ec2>] dump_stack_lvl+0xa2/0xe8
[<000001a194942166>] __lock_acquire+0x816/0x1660
[<000001a1949431fa>] lock_acquire+0x24a/0x370
[<000001a19596b810>] _raw_spin_lock_irqsave+0x70/0xc0
[<000001a194843b6c>] debug_event_common+0xfc/0x300
[<000001a194888b0a>] __zpci_load+0x17a/0x1f0
[<000001a194882d88>] pci_read+0x88/0xd0
[<000001a195453b88>] pci_bus_read_config_byte+0x68/0xa0
[<000001a195457bc2>] pci_setup_device+0x62/0xad0
[<000001a195458e70>] pci_scan_single_device+0x90/0xe0
[<000001a19488a0f6>] zpci_bus_scan_device+0x46/0x80
[<000001a19547f958>] enable_slot+0x98/0xc0
[<000001a19547f134>] power_write_file+0xc4/0x110
[<000001a194e20758>] kernfs_fop_write_iter+0x1e8/0x260
[<000001a194d215da>] new_sync_write+0x13a/0x180
[<000001a194d245d0>] vfs_write+0x200/0x330
[<000001a194d2488c>] ksys_write+0x7c/0xf0
[<000001a195957a30>] __do_syscall+0x210/0x500
[<000001a19596cbb6>] system_call+0x6e/0x90
INFO: lockdep is turned off.
Since it is desired to keep it possible to create trace records in most
situations, including this particular case (failing PCI config space
accesses are relevant), convert the used spinlock_t in `struct
debug_info` to raw_spinlock_t.
The impact is small, as the debug area lock only protects bounded memory
access without external dependencies, apart from one function
debug_set_size() where kfree() is implicitly called with the lock held.
Move debug_info_free() out of this lock, to keep remove this external
dependency.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The accept_memory() and range_contains_unaccepted_memory() functions
employ a "guard page" logic to prevent crashes with load_unaligned_zeropad().
This logic extends the range to be accepted (or checked) by one unit_size
if the end of the range is aligned to a unit_size boundary.
However, if the caller passes a range that is not page-aligned, the
'end' of the range might not be numerically aligned to unit_size, even
if it covers the last page of a unit. This causes the "if (!(end % unit_size))"
check to fail, skipping the necessary extension and leaving the next
unit unaccepted, which can lead to a kernel panic when accessed by
load_unaligned_zeropad().
Align the start address down and the size up to the nearest page
boundary before performing the unit_size alignment check. This ensures
that the guard unit is correctly added when the range effectively ends
on a unit boundary.
Signed-off-by: Kiryl Shutsemau (Meta) <kas@kernel.org>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
The reserve_unaccepted() function incorrectly calculates the size of the
memblock reservation for the unaccepted memory table. It aligns the
size of the table, but fails to account for cases where the table's
starting physical address (efi.unaccepted) is not page-aligned.
If the table starts at an offset within a page and its end crosses into
a subsequent page that the aligned size does not cover, the end of the
table will not be reserved. This can lead to the table being overwritten
or inaccessible, causing a kernel panic in accept_memory().
This issue was observed when starting Intel TDX VMs with specific memory
sizes (e.g., > 64GB).
Fix this by calculating the end address first (including the unaligned
start) and then aligning it up, ensuring the entire range is covered
by the reservation.
Fixes: 8dbe33956d ("efi/unaccepted: Make sure unaccepted table is mapped")
Reported-by: Moritz Sanft <ms@edgeless.systems>
Signed-off-by: Kiryl Shutsemau (Meta) <kas@kernel.org>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Over the years I've contributed patches to the EFI subsystem
mostly around TPM and EFI variables. Add me as a reviewer.
Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
The 'struct efivar_operations' is not modified by the driver after
initialization, so it should follow typical practice of being static
const for increased code safety and readability.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Dave reports that kexec may fail when the first kernel boots via the EFI
stub but without EFI runtime services, as in that case, the RSDP address
field in struct bootparams is never assigned. Kexec copies this value
into the version of struct bootparams that it provides to the incoming
kernel, which may have no other means to locate the ACPI root pointer.
So take the value from the EFI config tables if no root pointer has been
set in the first kernel's struct bootparams.
Fixes: a1b87d54f4 ("x86/efistub: Avoid legacy decompressor when doing EFI boot")
Cc: <stable@vger.kernel.org> # v6.1
Reported-by: Dave Young <dyoung@redhat.com>
Tested-by: Dave Young <dyoung@redhat.com>
Link: https://lore.kernel.org/linux-efi/aZQg_tRQmdKNadCg@darkstar.users.ipa.redhat.com/
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
A recent patch moving the call of sparse_init() to common mm code
broke booting as a Xen PV guest.
Reason is that the Xen PV specific boot code relied on struct page area
being accessible rather early, but this changed by the move of the call
of sparse_init().
Fortunately the fix is rather easy: there is a static branch available
indicating whether struct page contents are usable by Xen. This static
branch just needs to be tested in some places for avoiding the access
of struct page.
Fixes: 4267739cab ("arch, mm: consolidate initialization of SPARSE memory model")
Signed-off-by: Juergen Gross <jgross@suse.com>
Message-ID: <20260214135035.119357-1-jgross@suse.com>
Currently if we export a GPIO over sysfs and unbind the parent GPIO
controller, the exported attribute will remain under /sys/class/gpio
because once we remove the parent device, we can no longer associate the
descriptor with it in gpiod_unexport() and never drop the final
reference.
Rework the teardown code: provide an unlocked variant of
gpiod_unexport() and remove all exported GPIOs with the sysfs_lock taken
before unregistering the parent device itself. This is done to prevent
any new exports happening before we unregister the device completely.
Cc: stable@vger.kernel.org
Fixes: 1cd53df733 ("gpio: sysfs: don't look up exported lines as class devices")
Link: https://patch.msgid.link/20260212133505.81516-1-bartosz.golaszewski@oss.qualcomm.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Using the remote firmware node for software node lookup is the right
thing to do. The GPIO controller we want to resolve should have the
software node we scooped out of the reference attached to it. However,
there are existing users who abuse the software node API by creating
dummy swnodes whose name is set to the expected label string of the GPIO
controller whose pins they want to control and use them in their local
swnode references as GPIO properties.
This used to work when we compared the software node's name to the
chip's label. When we switched to using a real fwnode lookup, these
users broke down because the firmware nodes in question were never
attached to the controllers they were looking for.
Restore the label matching as a fallback to fix the broken users but add
a big FIXME urging for a better solution.
Cc: stable@vger.kernel.org # v6.18, v6.19
Fixes: 216c120475 ("gpio: swnode: allow referencing GPIO chips by firmware nodes")
Link: https://lore.kernel.org/all/aYkdKfP5fg6iywgr@jekhomev/
Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Hans de Goede <johannes.goede@oss.qualcomm.com>
Link: https://patch.msgid.link/20260211085313.16792-1-bartosz.golaszewski@oss.qualcomm.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>