The ciphertext stealing logic in the AES-XTS implementation creates a
separate ksimd scope to call into the FP/SIMD core routines, and in some
cases (CONFIG_KASAN_STACK is one, but there might be others), the 528
byte kernel mode FP/SIMD buffer that is allocated inside this scope is
not shared with the preceding ksimd scope, resulting in unnecessary
stack bloat.
Considering that
a) the XTS ciphertext stealing logic is never called for block
encryption use cases, and XTS is rarely used for anything else,
b) in the vast majority of cases, the entire input block is processed
during the first iteration of the loop,
we can combine both ksimd scopes into a single one with no practical
impact on how often/how long FP/SIMD is en/disabled, allowing us to
reuse the same stack slot for both FP/SIMD routine calls.
Fixes: ba3c1b3b5a ("crypto/arm64: aes-blk - Switch to 'ksimd' scoped guard API")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20251203163803.157541-5-ardb@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
BLAKE2b has a state of 16 64-bit words. Add the message data in and
there are 32 64-bit words. With the current code where all the rounds
are unrolled to enable constant-folding of the blake2b_sigma values,
this results in a very large code size on 32-bit kernels, including a
recurring issue where gcc uses a large amount of stack.
There's just not much benefit to this unrolling when the code is already
so large. Let's roll up the rounds when !CONFIG_64BIT.
To avoid having to duplicate the code, just write the code once using a
loop, and conditionally use 'unrolled_full' from <linux/unroll.h>.
Then, fold the now-unneeded ROUND() macro into the loop. Finally, also
remove the now-unneeded override of the stack frame size warning.
Code size improvements for blake2b_compress_generic():
Size before (bytes) Size after (bytes)
------------------- ------------------
i386, gcc 27584 3632
i386, clang 18208 3248
arm32, gcc 19912 2860
arm32, clang 21336 3344
Running the BLAKE2b benchmark on a !CONFIG_64BIT kernel on an x86_64
processor shows a 16384B throughput change of 351 => 340 MB/s (gcc) or
442 MB/s => 375 MB/s (clang). So clearly not much of a slowdown either.
But also that microbenchmark also effectively disregards cache usage,
which is important in practice and is far better in the smaller code.
Note: If we rolled up the loop on x86_64 too, the change would be
7024 bytes => 1584 bytes and 1960 MB/s => 1396 MB/s (gcc), or
6848 bytes => 1696 bytes and 1920 MB/s => 1263 MB/s (clang).
Maybe still worth it, though not quite as clearly beneficial.
Fixes: 91d689337f ("crypto: blake2b - add blake2b generic implementation")
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251205050330.89704-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Replace the RISCV_ISA_V dependency of the RISC-V crypto code with
RISCV_EFFICIENT_VECTOR_UNALIGNED_ACCESS, which implies RISCV_ISA_V as
well as vector unaligned accesses being efficient.
This is necessary because this code assumes that vector unaligned
accesses are supported and are efficient. (It does so to avoid having
to use lots of extra vsetvli instructions to switch the element width
back and forth between 8 and either 32 or 64.)
This was omitted from the code originally just because the RISC-V kernel
support for detecting this feature didn't exist yet. Support has now
been added, but it's fragmented into per-CPU runtime detection, a
command-line parameter, and a kconfig option. The kconfig option is the
only reasonable way to do it, though, so let's just rely on that.
Fixes: eb24af5d7a ("crypto: riscv - add vector crypto accelerated AES-{ECB,CBC,CTR,XTS}")
Fixes: bb54668837 ("crypto: riscv - add vector crypto accelerated ChaCha20")
Fixes: 600a3853df ("crypto: riscv - add vector crypto accelerated GHASH")
Fixes: 8c8e40470f ("crypto: riscv - add vector crypto accelerated SHA-{256,224}")
Fixes: b3415925a0 ("crypto: riscv - add vector crypto accelerated SHA-{512,384}")
Fixes: 563a5255af ("crypto: riscv - add vector crypto accelerated SM3")
Fixes: b8d06352bb ("crypto: riscv - add vector crypto accelerated SM4")
Cc: stable@vger.kernel.org
Reported-by: Vivian Wang <wangruikang@iscas.ac.cn>
Closes: https://lore.kernel.org/r/b3cfcdac-0337-4db0-a611-258f2868855f@iscas.ac.cn/
Reviewed-by: Jerry Shih <jerry.shih@sifive.com>
Link: https://lore.kernel.org/r/20251206213750.81474-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Pull i2c updates from Wolfram Sang:
- general cleanups in bcm2835, designware, pcf8584, and stm32
- amd-mp2: fix device refcount
- designware: avoid interrupt storms caused by bad firmware
- spacemit: fix device detection failures
- new devices: Intel Diamond Rapids, Rockchip RK3506, Qualcomm
Kaanapali and MSM8953
- minor fixes to i801, core documentation, elektor Kconfig dependencies
- at24 updates: add new compatible for Belling BL24S64
* tag 'i2c-for-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (21 commits)
i2c: qcom-cci: Add msm8953 compatible
i2c: spacemit: fix detect issue
i2c: amd-mp2: fix reference leak in MP2 PCI device
i2c: i2c.h: fix a bad kernel-doc line
i2c: i2c-elektor: Allow building on SMP kernels
dt-bindings: i2c: qcom-cci: Document Kaanapali compatible
dt-bindings: i2c: qcom-cci: Document msm8953 compatible
dt-bindings: eeprom: at24: Add compatible for Belling BL24S64
i2c: i801: Fix the Intel Diamond Rapids features
i2c: pcf8584: Change pcf_doAdress() to pcf_send_address()
i2c: pcf8584: Make pcf_doAddress() function void
i2c: pcf8584: Move 'ret' variable inside for loop, goto out if ret < 0.
i2c: designware: Disable SMBus interrupts to prevent storms from mis-configured firmware
dt-bindings: i2c: i2c-rk3x: Add compatible string for RK3506
i2c: i801: Add support for Intel Diamond Rapids
i2c: stm32: Omit two variable reassignments in stm32_i2c_dma_request()
i2c: designware: Omit a variable reassignment in dw_i2c_plat_probe()
i2c: pcf8584: Fix do not use assignment inside if conditional
i2c: pcf8584: Remove debug macros from i2c-algo-pcf.c
i2c: busses: bcm2835: convert from round_rate() to determine_rate()
...
Pull x86 platform driver updates from Ilpo Järvinen:
- acer-wmi: Add PH16-72, PHN16-72, and PT14-51 fan control support
- acpi: platform_profile: Add max-power profile option (power draw
limited by the cooling hardware, may exceed battery power draw limit
when on AC power)
- amd/hsmp: Allow more than one data-fabric per socket
- asus-armoury: Add WMI attributes driver to expose miscellaneous WMI
functions through fw_attributes (deprecates the custom BIOS features
interface through asus-wmi)
- asus-wmi: Use brightness_set_blocking() for kbd led
- ayaneo-ec: Add Ayaneo Embedded Controller driver
- fs/nls:
- Fix utf16 to utf8 string conversion when output size restricted
- Improve error code consistency for utf8 to utf32 conversions
- ideapad-laptop: Fast (Rapid Charge) charge type support
- intel/hid: Add Dell Pro Rugged 10/12 tablet to VGBS DMI quirks
- intel/pmc:
- Arrow Lake telemetry GUID improvements
- Add support for Wildcat Lake PMC information
- intel_pmc_ipc: Fix ACPI buffer memleak
- intel/punit_ipc: Fix memory corruption
- intel/vsec: Wildcat Lake PMT telemetry support
- lenovo-wmi-gamezone: Map "Extreme" performance mode to max-power
- lg-laptop: Add support for the HDAP opregion field
- serial-multi-instantiate: Add IRQ_RESOURCE_OPT for IRQ missing
projects
- thinkpad-t14s-ec: Improve suspend/resume support (lid LEDs, keyboard
backlight)
- uniwill: Add Uniwill laptop driver
- wmi: Move under drivers/platform/wmi as non-x86 WMI support is around
the corner and other WMI features will require adding more C files as
well
- tools/power/x86/intel-speed-select: v1.24
- Check feature status to check if the feature enablement was
successful
- Reset SST-TF bucket structure to display valid bucket info
- Miscellaneous cleanups / refactoring / improvements
* tag 'platform-drivers-x86-v6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (73 commits)
tools/power/x86/intel-speed-select: v1.24 release
tools/power/x86/intel-speed-select: Reset isst_turbo_freq_info for invalid buckets
tools/power/x86/intel-speed-select: Check feature status
platform/x86: asus-wmi: use brightness_set_blocking() for kbd led
fs/nls: Fix inconsistency between utf8_to_utf32() and utf32_to_utf8()
platform/x86: asus-armoury: add support for GA503QR
platform/x86: intel_pmc_ipc: fix ACPI buffer memory leak
platform/x86: hp-wmi: Order DMI board name arrays
platform/x86/intel/hid: Add Dell Pro Rugged 10/12 tablet to VGBS DMI quirks
platform: surface: replace use of system_wq with system_percpu_wq
platform: x86: replace use of system_wq with system_percpu_wq
platform/surface: acpi-notify: add WQ_PERCPU to alloc_workqueue users
platform/x86: wmi-gamezone: Add Legion Go 2 Quirks
platform/x86: lenovo-wmi-gamezone Use max-power rather than balanced-performance
acpi: platform_profile - Add max-power profile option
platform/x86/amd/pmf: Use devm_mutex_init() for mutex initialization
platform/x86/amd/pmf: Add BIOS_INPUTS_MAX macro to replace hardcoded array size
platform/x86: serial-multi-instantiate: Add IRQ_RESOURCE_OPT for IRQ missing projects
platform/x86/amd/pmf: Refactor repetitive BIOS output handling
platform/x86/uniwill: Add TUXEDO devices
...
Pull more power management updates from Rafael Wysocki:
"Fix a runtime PM unit test added during the 6.18 development cycle and
change the pm_runtime_barrier() return type to void (Brian Norris)"
* tag 'pm-6.19-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
coccinelle: Drop pm_runtime_barrier() error code checks
PM: runtime: Make pm_runtime_barrier() return void
PM: runtime: Stop checking pm_runtime_barrier() return code
Add a cond_lock annotation for lockref_put_or_lock to make sparse
happy with using it. Note that for this the return value has to be
double-inverted as the return value convention of lockref_put_or_lock
is inverted compared to _trylock conventions expected by __cond_lock,
as lockref_put_or_lock returns true when it did not need to take the
lock.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull __auto_type to auto conversion from Peter Anvin:
"Convert '__auto_type' to 'auto', defining a macro for 'auto' unless
C23+ is in use"
* tag 'auto-type-conversion-for-v6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/hpa/linux-auto:
tools/virtio: replace "__auto_type" with "auto"
selftests/bpf: replace "__auto_type" with "auto"
arch/x86: replace "__auto_type" with "auto"
arch/nios2: replace "__auto_type" and adjacent equivalent with "auto"
fs/proc: replace "__auto_type" with "const auto"
include/linux: change "__auto_type" to "auto"
compiler_types.h: add "auto" as a macro for "__auto_type"
The newly added damos_test_commit() constructs multiple large structures
on the stack, which exceeds the warning limit in some cases:
In file included from mm/damon/core.c:2941:
mm/damon/tests/core-kunit.h: In function 'damos_test_commit':
mm/damon/tests/core-kunit.h:965:1: error: the frame size of 1520 bytes is larger than 1280 bytes [-Werror=frame-larger-than=]
Split this function up into two separate ones that are called
sequentially, so they can occupy the same stack slots.
Link: https://lkml.kernel.org/r/20251204100403.1034980-1-arnd@kernel.org
Fixes: 299a88f6ec ("mm/damon/tests/core-kunit: add damos_commit() test")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: Quanmin Yan <yanquanmin1@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The files in Documentation/core-api/ are by virtue of their top-level
directory part of the Documentation section in MAINTAINERS. Each file in
Documentation/core-api/ should however also have a further section in
MAINTAINERS it belongs to, which fits to the technical area of the
documented API in that file.
idr.rst provides some explanation to the ID allocation API defined in
lib/idr.c, which itself is part of the XARRAY section.
Add this core-api document to XARRAY.
Link: https://lkml.kernel.org/r/20251105105857.156950-1-lukas.bulwahn@redhat.com
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The function hugetlb_reserve_pages() returns the number of pages added
to the reservation map on success and a negative error code on failure
(e.g. -EINVAL, -ENOMEM). However, in some error paths, it may return -1
directly.
For example, a failure at:
if (hugetlb_acct_memory(h, gbl_reserve) < 0)
goto out_put_pages;
results in returning -1 (since add = -1), which may be misinterpreted
in userspace as -EPERM.
Fix this by explicitly capturing and propagating the return values from
helper functions, and using -EINVAL for all other failure cases.
Link: https://lkml.kernel.org/r/20251125171350.86441-1-skolothumtho@nvidia.com
Fixes: 986f5f2b4b ("mm/hugetlb: make hugetlb_reserve_pages() return nr of entries updated")
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Cc: Matthew R. Ochs <mochs@nvidia.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicolin Chen <nicolinc@nvidia.com>
Cc: Vivek Kasireddy <vivek.kasireddy@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
min_order_for_split() returns -EBUSY when the folio is truncated and
cannot be split. In commit 77008e1b2e ("mm/huge_memory: do not change
split_huge_page*() target order silently"), memory_failure() does not
handle it and pass -EBUSY to try_to_split_thp_page() directly.
try_to_split_thp_page() returns -EINVAL since -EBUSY becomes 0xfffffff0 as
new_order is unsigned int in __folio_split() and this large new_order is
rejected as an invalid input. The code does not cause a bug.
soft_offline_in_use_page() also uses min_order_for_split() but it always
passes 0 as new_order for split.
Fix it by making min_order_for_split() always return an order. When the
given folio is truncated, namely folio->mapping == NULL, return 0 and let
a subsequent split function handle the situation and return -EBUSY.
Add kernel-doc to min_order_for_split() to clarify its use.
Link: https://lkml.kernel.org/r/20251126210618.1971206-4-ziy@nvidia.com
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Balbir Singh <balbirs@nvidia.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "Improve folio split related functions", v4.
This patchset improves several folio split related functions to avoid
future misuse. The changes are:
1. Consolidated folio splittable checks by moving truncated folio check,
huge zero folio check, and writeback folio check into
folio_split_supported(). Changed the function return type. Renamed it
to folio_check_splittable() for clarification.
2. Replaced can_split_folio() with open coded folio_expected_ref_count()
and folio_ref_count() and introduced folio_cache_ref_count().
3. Changed min_order_for_split() to always return an order.
4. Fixed folio split stats counting.
Motivation
==========
This is based on Wei's observation[1] and solves several potential
issues:
1. Dereferencing NULL folio->mapping in try_folio_split_to_order() if it
is called on truncated folios.
2. Not handling of negative return value of min_order_for_split() in
mm/memory-failure.c
There is no bug in the current code.
This patch (of 4):
folio_split_supported() used in try_folio_split_to_order() requires
folio->mapping to be non NULL, but current try_folio_split_to_order() does
not check it. There is no issue in the current code, since
try_folio_split_to_order() is only used in truncate_inode_partial_folio(),
where folio->mapping is not NULL.
To prevent future misuse, move folio->mapping NULL check (i.e., folio is
truncated) into folio_split_supported(). Since folio->mapping NULL check
returns -EBUSY and folio_split_supported() == false means -EINVAL, change
folio_split_supported() return type from bool to int and return error
numbers accordingly. Rename folio_split_supported() to
folio_check_splittable() to match the return type change.
While at it, move is_huge_zero_folio() check and folio_test_writeback()
check into folio_check_splittable() and add kernel-doc.
Remove all warnings inside folio_check_splittable() and give warnings
in __folio_split() instead, so that bool warns parameter can be removed.
Link: https://lkml.kernel.org/r/20251126210618.1971206-1-ziy@nvidia.com
Link: https://lkml.kernel.org/r/20251126210618.1971206-2-ziy@nvidia.com
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Acked-by: Balbir Singh <balbirs@nvidia.com>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
blk_mq_{add,del}_queue_tag_set() functions add and remove queues from
tagset, the functions make sure that tagset and queues are marked as
shared when two or more queues are attached to the same tagset.
Initially a tagset starts as unshared and when the number of added
queues reaches two, blk_mq_add_queue_tag_set() marks it as shared along
with all the queues attached to it. When the number of attached queues
drops to 1 blk_mq_del_queue_tag_set() need to mark both the tagset and
the remaining queues as unshared.
Both functions need to freeze current queues in tagset before setting on
unsetting BLK_MQ_F_TAG_QUEUE_SHARED flag. While doing so, both functions
hold set->tag_list_lock mutex, which makes sense as we do not want
queues to be added or deleted in the process. This used to work fine
until commit 98d81f0df7 ("nvme: use blk_mq_[un]quiesce_tagset")
made the nvme driver quiesce tagset instead of quiscing individual
queues. blk_mq_quiesce_tagset() does the job and quiesce the queues in
set->tag_list while holding set->tag_list_lock also.
This results in deadlock between two threads with these stacktraces:
__schedule+0x47c/0xbb0
? timerqueue_add+0x66/0xb0
schedule+0x1c/0xa0
schedule_preempt_disabled+0xa/0x10
__mutex_lock.constprop.0+0x271/0x600
blk_mq_quiesce_tagset+0x25/0xc0
nvme_dev_disable+0x9c/0x250
nvme_timeout+0x1fc/0x520
blk_mq_handle_expired+0x5c/0x90
bt_iter+0x7e/0x90
blk_mq_queue_tag_busy_iter+0x27e/0x550
? __blk_mq_complete_request_remote+0x10/0x10
? __blk_mq_complete_request_remote+0x10/0x10
? __call_rcu_common.constprop.0+0x1c0/0x210
blk_mq_timeout_work+0x12d/0x170
process_one_work+0x12e/0x2d0
worker_thread+0x288/0x3a0
? rescuer_thread+0x480/0x480
kthread+0xb8/0xe0
? kthread_park+0x80/0x80
ret_from_fork+0x2d/0x50
? kthread_park+0x80/0x80
ret_from_fork_asm+0x11/0x20
__schedule+0x47c/0xbb0
? xas_find+0x161/0x1a0
schedule+0x1c/0xa0
blk_mq_freeze_queue_wait+0x3d/0x70
? destroy_sched_domains_rcu+0x30/0x30
blk_mq_update_tag_set_shared+0x44/0x80
blk_mq_exit_queue+0x141/0x150
del_gendisk+0x25a/0x2d0
nvme_ns_remove+0xc9/0x170
nvme_remove_namespaces+0xc7/0x100
nvme_remove+0x62/0x150
pci_device_remove+0x23/0x60
device_release_driver_internal+0x159/0x200
unbind_store+0x99/0xa0
kernfs_fop_write_iter+0x112/0x1e0
vfs_write+0x2b1/0x3d0
ksys_write+0x4e/0xb0
do_syscall_64+0x5b/0x160
entry_SYSCALL_64_after_hwframe+0x4b/0x53
The top stacktrace is showing nvme_timeout() called to handle nvme
command timeout. timeout handler is trying to disable the controller and
as a first step, it needs to blk_mq_quiesce_tagset() to tell blk-mq not
to call queue callback handlers. The thread is stuck waiting for
set->tag_list_lock as it tries to walk the queues in set->tag_list.
The lock is held by the second thread in the bottom stack which is
waiting for one of queues to be frozen. The queue usage counter will
drop to zero after nvme_timeout() finishes, and this will not happen
because the thread will wait for this mutex forever.
Given that [un]quiescing queue is an operation that does not need to
sleep, update blk_mq_[un]quiesce_tagset() to use RCU instead of taking
set->tag_list_lock, update blk_mq_{add,del}_queue_tag_set() to use RCU
safe list operations. Also, delete INIT_LIST_HEAD(&q->tag_set_list)
in blk_mq_del_queue_tag_set() because we can not re-initialize it while
the list is being traversed under RCU. The deleted queue will not be
added/deleted to/from a tagset and it will be freed in blk_free_queue()
after the end of RCU grace period.
Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
Fixes: 98d81f0df7 ("nvme: use blk_mq_[un]quiesce_tagset")
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
__bio_for_each_segment() uses the returned struct bio_vec's bv_len field
to advance the struct bvec_iter at the end of each loop iteration. So
it's incorrect to modify it during the loop. Don't assign to bv_len (or
bv_offset, for that matter) in ublk_copy_user_pages().
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Fixes: e87d66ab27 ("ublk: use rq_for_each_segment() for user copy")
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Now that all potential callers of bio_chain_endio have been
eliminated, completely prohibit any future calls to this function.
Suggested-by: Ming Lei <ming.lei@redhat.com>
Suggested-by: Andreas Gruenbacher <agruenba@redhat.com>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Shida Zhang <zhangshida@kylinos.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Don't call bio->bi_end_io() directly. Use the bio_endio() helper
function instead, which handles completion more safely and uniformly.
Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Shida Zhang <zhangshida@kylinos.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
GCC notices that the 16-byte uabi_name field could theoretically be too
small for the formatted string if the instance number exceeds 100.
So grow the field to 20 bytes.
drivers/gpu/drm/i915/intel_memory_region.c: In function ‘intel_memory_region_create’:
drivers/gpu/drm/i915/intel_memory_region.c:273:61: error: ‘%u’ directive output may be truncated writing between 1 and 5 bytes into a region of size between 3 and 11 [-Werror=format-truncation=]
273 | snprintf(mem->uabi_name, sizeof(mem->uabi_name), "%s%u",
| ^~
drivers/gpu/drm/i915/intel_memory_region.c:273:58: note: directive argument in the range [0, 65535]
273 | snprintf(mem->uabi_name, sizeof(mem->uabi_name), "%s%u",
| ^~~~~~
drivers/gpu/drm/i915/intel_memory_region.c:273:9: note: ‘snprintf’ output between 7 and 19 bytes into a destination of size 16
273 | snprintf(mem->uabi_name, sizeof(mem->uabi_name), "%s%u",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
274 | intel_memory_type_str(type), instance);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Fixes: 3b38d35157 ("drm/i915: Add stable memory region names")
Cc: <stable@vger.kernel.org> # v6.8+
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
Link: https://lore.kernel.org/r/20251205113500.684286-2-ardb@kernel.org
(cherry picked from commit 18476087f1)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
For events with inherit_stat enabled, a "read" event will be generated
to collect per task event counts on task exit.
The call chain is as follows:
do_exit
-> perf_event_exit_task
-> perf_event_exit_task_context
-> perf_event_exit_event
-> perf_remove_from_context
-> perf_child_detach
-> sync_child_event
-> perf_event_read_event
However, the child event context detaches the task too early in
perf_event_exit_task_context, which causes sync_child_event to never
generate the read event in this case, since child_event->ctx->task is
always set to TASK_TOMBSTONE. Fix that by moving context lock section
backward to ensure ctx->task is not set to TASK_TOMBSTONE before
generating the read event.
Because perf_event_free_task calls perf_event_exit_task_context with
exit = false to tear down all child events from the context, and the
task never lived, accessing the task PID can lead to a use-after-free.
To fix that, let sync_child_event read task from argument and move the
call to the only place it should be triggered to avoid the effect of
setting ctx->task to TASK_TOMESTONE, and add a task parameter to
perf_event_exit_event to trigger the sync_child_event properly when
needed.
This bug can be reproduced by running "perf record -s" and attaching to
any program that generates perf events in its child tasks. If we check
the result with "perf report -T", the last line of the report will leave
an empty table like "# PID TID", which is expected to contain the
per-task event counts by design.
Fixes: ef54c1a476 ("perf: Rework perf_event_exit_event()")
Signed-off-by: Thaumy Cheng <thaumy.love@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: linux-perf-users@vger.kernel.org
Link: https://patch.msgid.link/20251209041600.963586-1-thaumy.love@gmail.com
Group is_permission_fault() with is_translation_fault(), which is
needed to use is_permission_fault() in __do_kernel_fault(). As
this is static inline, there is no need for this to be under
CONFIG_MMU.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
In the inline assembly inside load_unaligned_zeropad(), the "addr" is
constrained as input-only operand. The compiler assumes that on exit
from the asm statement these operands contain the same values as they
had before executing the statement, but when kernel page fault happened, the assembly fixup code "bic %2 %2, #0x3" modify the value of "addr", which may lead to an unexpected behavior.
Use a temporary variable "tmp" to handle it, instead of modifying the
input-only operand, just like what arm64's load_unaligned_zeropad()
does.
Fixes: b9a50f7490 ("ARM: 7450/1: dcache: select DCACHE_WORD_ACCESS for little-endian ARMv6+ CPUs")
Co-developed-by: Xie Yuanbin <xieyuanbin1@huawei.com>
Signed-off-by: Xie Yuanbin <xieyuanbin1@huawei.com>
Signed-off-by: Liyuan Pang <pangliyuan1@huawei.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Pull smb client updates from Steve French:
- multichannel fixes, including enabling ability to change multichannel
settings with remount
- debugging improvements: adding additional tracepoints, improving log
messages
- cleanup, including restructuring some of the transport layer for the
client to make it clearer, and cleanup of status code table to be
more consistent with protocol documentation
- fixes for reads that start beyond end of file use cases
- fix to backoff reconnects to reduce reconnect storms
- locking improvement for getting mid entries
- fixes for missing status code error mappings
- performance improvement for status code to error mappings
* tag 'v6.19-rc-part1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: (22 commits)
smb/client: update some SMB2 status strings
cifs: Remove dead function prototypes
smb/client: add two elements to smb2_error_map_table array
smb: rename to STATUS_SMB_NO_PREAUTH_INTEGRITY_HASH_OVERLAP
smb/client: remove unused elements from smb2_error_map_table array
smb/client: reduce loop count in map_smb2_to_linux_error() by half
smb: client: Add tracepoint for krb5 auth
smb: client: improve error message when creating SMB session
smb: client: relax session and tcon reconnect attempts
cifs: Fix handling of a beyond-EOF DIO/unbuffered read over SMB2
cifs: client: allow changing multichannel mount options on remount
cifs: Do some preparation prior to organising the function declarations
cifs: Add a tracepoint to log EIO errors
cifs: Don't need state locking in smb2_get_mid_entry()
cifs: Remove the server pointer from smb_message
cifs: Fix specification of function pointers
cifs: Replace SendReceiveBlockingLock() with SendReceive() plus flags
cifs: Clean up some places where an extra kvec[] was required for rfc1002
cifs: Make smb1's SendReceive() wrap cifs_send_recv()
cifs: Remove the RFC1002 header from smb_hdr
...
The global gpio_shared_lock has caused some issues when recursively
removing GPIO shared proxies. The thing is: we don't really need it.
Once created from an init-call, the shared GPIO data structures are
never removed, there's no need to protect the linked lists of entries
and references. All we need to protect is the shared GPIO descriptor
(which we already do with a per-entry mutex) and the auxiliary device
data in struct gpio_shared_ref.
Remove the global gpio_shared_lock and use a per-reference mutex to
protect shared references when adding/removing the auxiliary devices and
their GPIO lookup entries.
Link: https://lore.kernel.org/r/20251206-gpio-shared-teardown-fixes-v1-4-35ac458cfce1@oss.qualcomm.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Initializing automatic __free variables to NULL without need (e.g.
branches with different allocations), followed by actual allocation is
in contrary to explicit coding rules guiding cleanup.h:
"Given that the "__free(...) = NULL" pattern for variables defined at
the top of the function poses this potential interdependency problem the
recommendation is to always define and assign variables in one statement
and not group variable definitions at the top of the function when
__free() is used."
Code does not have a bug, but is less readable and uses discouraged
coding practice, so fix that by moving declaration to the place of
assignment.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20251208020807.5043-2-krzysztof.kozlowski@oss.qualcomm.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull f2fs updates from Jaegeuk Kim:
"This series focuses on minor clean-ups and performance optimizations
across sysfs, documentation, debugfs, tracepoints, slab allocation,
and GC. Furthermore, it resolves several corner-case bugs caught by
xfstests, as well as issues related to 16KB page support and
f2fs_enable_checkpoint.
Enhancement:
- wrap ASCII tables in literal blocks to fix LaTeX build
- optimize trace_f2fs_write_checkpoint with enums
- support to show curseg.next_blkoff in debugfs
- add a sysfs entry to show max open zones
- add fadvise tracepoint
- use global inline_xattr_slab instead of per-sb slab cache
- set default valid_thresh_ratio to 80 for zoned devices
- maintain one time GC mode is enabled during whole zoned GC cycle
Bug fix:
- ensure node page reads complete before f2fs_put_super() finishes
- do not account invalid blocks in get_left_section_blocks()
- revert summary entry count from 2048 to 512 in 16kb block support
- detect recoverable inode during dryrun of find_fsync_dnodes()
- fix age extent cache insertion skip on counter overflow
- add sanity checks before unlinking and loading inodes
- ensure minimum trim granularity accounts for all devices
- block cache/dio write during f2fs_enable_checkpoint()
- propagate error from f2fs_enable_checkpoint()
- invalidate dentry cache on failed whiteout creation
- avoid updating compression context during writeback
- avoid updating zero-sized extent in extent cache
- avoid potential deadlock"
* tag 'f2fs-for-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (39 commits)
f2fs: ignore discard return value
f2fs: optimize trace_f2fs_write_checkpoint with enums
f2fs: fix to not account invalid blocks in get_left_section_blocks()
f2fs: support to show curseg.next_blkoff in debugfs
docs: f2fs: wrap ASCII tables in literal blocks to fix LaTeX build
f2fs: expand scalability of f2fs mount option
f2fs: change default schedule timeout value
f2fs: introduce f2fs_schedule_timeout()
f2fs: use memalloc_retry_wait() as much as possible
f2fs: add a sysfs entry to show max open zones
f2fs: wrap all unusable_blocks_per_sec code in CONFIG_BLK_DEV_ZONED
f2fs: simplify list initialization in f2fs_recover_fsync_data()
f2fs: revert summary entry count from 2048 to 512 in 16kb block support
f2fs: fix to detect recoverable inode during dryrun of find_fsync_dnodes()
f2fs: fix return value of f2fs_recover_fsync_data()
f2fs: add fadvise tracepoint
f2fs: fix age extent cache insertion skip on counter overflow
f2fs: Add sanity checks before unlinking and loading inodes
f2fs: Rename f2fs_unlink exit label
f2fs: ensure minimum trim granularity accounts for all devices
...
If scsi_dh_attached_handler_name() fails to allocate the handler name,
dm-multipath (its only caller) assumes there is no attached device
handler, and sets the device up incorrectly. Return an error pointer
instead, so multipath can distinguish between failure, success where
there is no attached device handler, or when the path device is not a
SCSI device at all.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Link: https://patch.msgid.link/20251206010015.1595225-1-bmarzins@redhat.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>