Pull fsverity fixes from Eric Biggers:
- Fix a build error on parisc
- Remove the non-large-folio-aware function fsverity_verify_page()
* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux:
fsverity: fix build error by adding fsverity_readahead() stub
fsverity: remove fsverity_verify_page()
f2fs: make f2fs_verify_cluster() partially large-folio-aware
f2fs: remove unnecessary ClearPageUptodate in f2fs_verify_cluster()
This converts some of the visually simpler cases that have been split
over multiple lines. I only did the ones that are easy to verify the
resulting diff by having just that final GFP_KERNEL argument on the next
line.
Somebody should probably do a proper coccinelle script for this, but for
me the trivial script actually resulted in an assertion failure in the
middle of the script. I probably had made it a bit _too_ trivial.
So after fighting that far a while I decided to just do some of the
syntactically simpler cases with variations of the previous 'sed'
scripts.
The more syntactically complex multi-line cases would mostly really want
whitespace cleanup anyway.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the exact same thing as the 'alloc_obj()' version, only much
smaller because there are a lot fewer users of the *alloc_flex()
interface.
As with alloc_obj() version, this was done entirely with mindless brute
force, using the same script, except using 'flex' in the pattern rather
than 'objs*'.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This was done entirely with mindless brute force, using
git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'
to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.
Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.
For the same reason the 'flex' versions will be done as a separate
conversion.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull kmalloc_obj conversion from Kees Cook:
"This does the tree-wide conversion to kmalloc_obj() and friends using
coccinelle, with a subsequent small manual cleanup of whitespace
alignment that coccinelle does not handle.
This uncovered a clang bug in __builtin_counted_by_ref(), so the
conversion is preceded by disabling that for current versions of
clang. The imminent clang 22.1 release has the fix.
I've done allmodconfig build tests for x86_64, arm64, i386, and arm. I
did defconfig builds for alpha, m68k, mips, parisc, powerpc, riscv,
s390, sparc, sh, arc, csky, xtensa, hexagon, and openrisc"
* tag 'kmalloc_obj-treewide-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
kmalloc_obj: Clean up after treewide replacements
treewide: Replace kmalloc with kmalloc_obj for non-scalar types
compiler_types: Disable __builtin_counted_by_ref for Clang
Pull io_uring fixes from Jens Axboe:
- A fix for a missing URING_CMD128 opcode check, fixing an issue with
the SQE mixed mode support introduced in 6.19. Merged late due to
having multiple dependencies
- Add sqe->cmd size checking for big SQEs, similar to what we have for
normal sized SQEs
- Fix a race condition in zcrx, that leads to a double free
* tag 'io_uring-20260221' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
io_uring: Add size check for sqe->cmd
io_uring: add IORING_OP_URING_CMD128 to opcode checks
io_uring/zcrx: fix user_ref race between scrub and refill paths
Pull smb server fixes from Steve French:
"Two small fixes:
- fix potential deadlock
- minor cleanup"
* tag 'v7.0-rc-part2-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: call ksmbd_vfs_kern_path_end_removing() on some error paths
smb: server: Remove duplicate include of misc.h
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:
Single allocations: kmalloc(sizeof(TYPE), ...)
are replaced with: kmalloc_obj(TYPE, ...)
Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with: kmalloc_objs(TYPE, COUNT, ...)
Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...)
(where TYPE may also be *VAR)
The resulting allocations no longer return "void *", instead returning
"TYPE *".
Signed-off-by: Kees Cook <kees@kernel.org>
Pull btrfs fixes from David Sterba:
- multiple error handling fixes of unexpected conditions
- reset block group size class once it becomes empty so that
its class can be changed
- error message level adjustments
- fixes of returned error values
- use correct block reserve for delayed refs
* tag 'for-7.0-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix invalid leaf access in btrfs_quota_enable() if ref key not found
btrfs: fix lost error return in btrfs_find_orphan_roots()
btrfs: fix lost return value on error in finish_verity()
btrfs: change unaligned root messages to error level in btrfs_validate_super()
btrfs: use the correct type to initialize block reserve for delayed refs
btrfs: do not ASSERT() when the fs flips RO inside btrfs_repair_io_failure()
btrfs: reset block group size class when it becomes empty
btrfs: replace BUG() with error handling in __btrfs_balance()
btrfs: handle unexpected exact match in btrfs_set_inode_index_count()
Pull ecryptfs updates from Tyler Hicks:
"This consists of some really minor typo fixes that fell through the
cracks and some more recent code cleanups:
- Comment typo fixes
- Removal of an unused function declaration
- Use strscpy() instead of the deprecated strcpy()
- Use string copying helpers instead of memcpy() and manually
terminating strings"
* tag 'ecryptfs-7.0-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
ecryptfs: Replace memcpy + NUL termination in ecryptfs_copy_filename
ecryptfs: Drop redundant NUL terminations after calling ecryptfs_to_hex
ecryptfs: Replace memcpy + NUL termination in ecryptfs_new_file_context
ecryptfs: Replace strcpy with strscpy in ecryptfs_validate_options
ecryptfs: Replace strcpy with strscpy in ecryptfs_cipher_code_to_string
ecryptfs: Replace strcpy with strscpy in ecryptfs_set_default_crypt_stat_vals
ecryptfs: simplify list initialization in ecryptfs_parse_packet_set()
ecryptfs: Remove unused declartion ecryptfs_fill_zeros()
ecryptfs: Fix packet format comment in parse_tag_67_packet()
ecryptfs: comment typo fix
ecryptfs: keystore: Fix typo 'the the' in comment
For SQE128, sqe->cmd provides 80 bytes for uring_cmd. Add macro to
check if size of user struct does not exceed 80 bytes at compile time.
User doesn't have to track this manually during development.
Replace io_uring_sqe_cmd() inline func with macro and add
io_uring_sqe128_cmd() which checks struct
size for 16 bytes cmd and 80 bytes cmd respectively.
Signed-off-by: Govindarajulu Varadarajan <govind.varadar@gmail.com>
Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull more MM updates from Andrew Morton:
- "mm/vmscan: fix demotion targets checks in reclaim/demotion" fixes a
couple of issues in the demotion code - pages were failed demotion
and were finding themselves demoted into disallowed nodes (Bing Jiao)
- "Remove XA_ZERO from error recovery of dup_mmap()" fixes a rare
mapledtree race and performs a number of cleanups (Liam Howlett)
- "mm: add bitmap VMA flag helpers and convert all mmap_prepare to use
them" implements a lot of cleanups following on from the conversion
of the VMA flags into a bitmap (Lorenzo Stoakes)
- "support batch checking of references and unmapping for large folios"
implements batching to greatly improve the performance of reclaiming
clean file-backed large folios (Baolin Wang)
- "selftests/mm: add memory failure selftests" does as claimed (Miaohe
Lin)
* tag 'mm-stable-2026-02-18-19-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (36 commits)
mm/page_alloc: clear page->private in free_pages_prepare()
selftests/mm: add memory failure dirty pagecache test
selftests/mm: add memory failure clean pagecache test
selftests/mm: add memory failure anonymous page test
mm: rmap: support batched unmapping for file large folios
arm64: mm: implement the architecture-specific clear_flush_young_ptes()
arm64: mm: support batch clearing of the young flag for large folios
arm64: mm: factor out the address and ptep alignment into a new helper
mm: rmap: support batched checks of the references for large folios
tools/testing/vma: add VMA userland tests for VMA flag functions
tools/testing/vma: separate out vma_internal.h into logical headers
tools/testing/vma: separate VMA userland tests into separate files
mm: make vm_area_desc utilise vma_flags_t only
mm: update all remaining mmap_prepare users to use vma_flags_t
mm: update shmem_[kernel]_file_*() functions to use vma_flags_t
mm: update secretmem to use VMA flags on mmap_prepare
mm: update hugetlbfs to use VMA flags on mmap_prepare
mm: add basic VMA flag operation helper functions
tools: bitmap: add missing bitmap_[subset(), andnot()]
mm: add mk_vma_flags() bitmap flag macro helper
...
Pull sysctl updates from Joel Granados:
- Remove macros from proc handler converters
Replace the proc converter macros with "regular" functions. Though it
is more verbose than the macro version, it helps when debugging and
better aligns with coding-style.rst.
- General cleanup
Remove superfluous ctl_table forward declarations. Const qualify the
memory_allocation_profiling_sysctl and loadpin_sysctl_table arrays.
Add missing kernel doc to proc_dointvec_conv.
- Testing
This series was run through sysctl selftests/kunit test suite in
x86_64. And went into linux-next after rc4, giving it a good 3 weeks
of testing
* tag 'sysctl-7.00-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl:
sysctl: replace SYSCTL_INT_CONV_CUSTOM macro with functions
sysctl: Replace unidirectional INT converter macros with functions
sysctl: Add kernel doc to proc_douintvec_conv
sysctl: Replace UINT converter macros with functions
sysctl: Add CONFIG_PROC_SYSCTL guards for converter macros
sysctl: clarify proc_douintvec_minmax doc
sysctl: Return -ENOSYS from proc_douintvec_conv when CONFIG_PROC_SYSCTL=n
sysctl: Remove unused ctl_table forward declarations
loadpin: Implement custom proc_handler for enforce
alloc_tag: move memory_allocation_profiling_sysctls into .rodata
sysctl: Add missing kernel-doc for proc_dointvec_conv
If btrfs_search_slot_for_read() returns 1, it means we did not find any
key greater than or equals to the key we asked for, meaning we have
reached the end of the tree and therefore the path is not valid. If
this happens we need to break out of the loop and stop, instead of
continuing and accessing an invalid path.
Fixes: 5223cc60b4 ("btrfs: drop the path before adding qgroup items when enabling qgroups")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If the root nodes for the chunk root, tree root or log root are not sector
size aligned, we are logging a warning message but these are in fact
errors that makes the super block validation fail. So change the level of
the messages from warning to error.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When initializing the delayed refs block reserve for a transaction handle
we are passing a type of BTRFS_BLOCK_RSV_DELOPS, which is meant for
delayed items and not for delayed refs. The correct type for delayed refs
is BTRFS_BLOCK_RSV_DELREFS.
On release of any excess space reserved in a local delayed refs reserve,
we also should transfer that excess space to the global block reserve
(it it's full, we return to the space info for general availability).
By initializing a transaction's local delayed refs block reserve with a
type of BTRFS_BLOCK_RSV_DELOPS, we were also causing any excess space
released from the delayed block reserve (fs_info->delayed_block_rsv, used
for delayed inodes and items) to be transferred to the global block
reserve instead of the global delayed refs block reserve. This was an
unintentional change in commit 28270e25c6 ("btrfs: always reserve space
for delayed refs when starting transaction"), but it's not particularly
serious as things tend to cancel out each other most of the time and it's
relatively rare to be anywhere near exhaustion of the global reserve.
Fix this by initializing a transaction's local delayed refs reserve with
a type of BTRFS_BLOCK_RSV_DELREFS and making btrfs_block_rsv_release()
attempt to transfer unused space from such a reserve into the global block
reserve, just as we did before that commit for when the block reserve is
a delayed refs rsv.
Reported-by: Alex Lyakas <alex.lyakas@zadara.com>
Link: https://lore.kernel.org/linux-btrfs/CAOcd+r0FHG5LWzTSu=LknwSoqxfw+C00gFAW7fuX71+Z5AfEew@mail.gmail.com/
Fixes: 28270e25c6 ("btrfs: always reserve space for delayed refs when starting transaction")
Reviewed-by: Alex Lyakas <alex.lyakas@zadara.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
There is a bug report that when btrfs hits ENOSPC error in a critical
path, btrfs flips RO (this part is expected, although the ENOSPC bug
still needs to be addressed).
The problem is after the RO flip, if there is a read repair pending, we
can hit the ASSERT() inside btrfs_repair_io_failure() like the following:
BTRFS info (device vdc): relocating block group 30408704 flags metadata|raid1
------------[ cut here ]------------
BTRFS: Transaction aborted (error -28)
WARNING: fs/btrfs/extent-tree.c:3235 at __btrfs_free_extent.isra.0+0x453/0xfd0, CPU#1: btrfs/383844
Modules linked in: kvm_intel kvm irqbypass
[...]
---[ end trace 0000000000000000 ]---
BTRFS info (device vdc state EA): 2 enospc errors during balance
BTRFS info (device vdc state EA): balance: ended with status: -30
BTRFS error (device vdc state EA): parent transid verify failed on logical 30556160 mirror 2 wanted 8 found 6
BTRFS error (device vdc state EA): bdev /dev/nvme0n1 errs: wr 0, rd 0, flush 0, corrupt 10, gen 0
[...]
assertion failed: !(fs_info->sb->s_flags & SB_RDONLY) :: 0, in fs/btrfs/bio.c:938
------------[ cut here ]------------
assertion failed: !(fs_info->sb->s_flags & SB_RDONLY) :: 0, in fs/btrfs/bio.c:938
kernel BUG at fs/btrfs/bio.c:938!
Oops: invalid opcode: 0000 [#1] SMP NOPTI
CPU: 0 UID: 0 PID: 868 Comm: kworker/u8:13 Tainted: G W N 6.19.0-rc6+ #4788 PREEMPT(full)
Tainted: [W]=WARN, [N]=TEST
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
Workqueue: btrfs-endio simple_end_io_work
RIP: 0010:btrfs_repair_io_failure.cold+0xb2/0x120
RSP: 0000:ffffc90001d2bcf0 EFLAGS: 00010246
RAX: 0000000000000051 RBX: 0000000000001000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff8305cf42 RDI: 00000000ffffffff
RBP: 0000000000000002 R08: 00000000fffeffff R09: ffffffff837fa988
R10: ffffffff8327a9e0 R11: 6f69747265737361 R12: ffff88813018d310
R13: ffff888168b8a000 R14: ffffc90001d2bd90 R15: ffff88810a169000
FS: 0000000000000000(0000) GS:ffff8885e752c000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
------------[ cut here ]------------
[CAUSE]
The cause of -ENOSPC error during the test case btrfs/124 is still
unknown, although it's known that we still have cases where metadata can
be over-committed but can not be fulfilled correctly, thus if we hit
such ENOSPC error inside a critical path, we have no choice but abort
the current transaction.
This will mark the fs read-only.
The problem is inside the btrfs_repair_io_failure() path that we require
the fs not to be mount read-only. This is normally fine, but if we are
doing a read-repair meanwhile the fs flips RO due to a critical error,
we can enter btrfs_repair_io_failure() with super block set to
read-only, thus triggering the above crash.
[FIX]
Just replace the ASSERT() with a proper return if the fs is already
read-only.
Reported-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/linux-btrfs/20260126045555.GB31641@lst.de/
Tested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Block group size classes are managed consistently everywhere.
Currently, btrfs_use_block_group_size_class() sets a block group's size
class to specialize it for a specific allocation size. However, this
size class remains "stale" even if the block group becomes completely
empty (both used and reserved bytes reach zero).
This happens in two scenarios:
1. When space reservations are freed (e.g., due to errors or transaction
aborts) via btrfs_free_reserved_bytes().
2. When the last extent in a block group is freed via
btrfs_update_block_group().
While size classes are advisory, a stale size class can cause
find_free_extent to unnecessarily skip candidate block groups during
initial search loops. This undermines the purpose of size classes to
reduce fragmentation by keeping block groups restricted to a specific
size class when they could be reused for any size.
Fix this by resetting the size class to BTRFS_BG_SZ_NONE whenever a
block group's used and reserved counts both reach zero. This ensures
that empty block groups are fully available for any allocation size in
the next cycle.
Fixes: 52bb7a2166 ("btrfs: introduce size class to block group allocator")
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We search with offset (u64)-1 which should never match exactly.
Previously this was handled with BUG(). Now logs an error
and return -EUCLEAN.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Adarsh Das <adarshdas950@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We search with offset (u64)-1 which should never match exactly.
Previously the code silently returned success without setting the index
count. Now logs an error and return -EUCLEAN instead.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Adarsh Das <adarshdas950@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>,
Signed-off-by: David Sterba <dsterba@suse.com>
f2fs_verify_cluster() is the only remaining caller of the
non-large-folio-aware function fsverity_verify_page(). To unblock the
removal of that function, change f2fs_verify_cluster() to verify the
entire folio of each page and mark it up-to-date.
Note that this doesn't actually make f2fs_verify_cluster()
large-folio-aware, as it is still passed an array of pages. Currently,
it's never called with large folios.
Suggested-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20260218010630.7407-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Pull ntfs3 updates from Konstantin Komarov:
"New code:
- improve readahead for bitmap initialization and large directory scans
- fsync files by syncing parent inodes
- drop of preallocated clusters for sparse and compressed files
- zero-fill folios beyond i_valid in ntfs_read_folio()
- implement llseek SEEK_DATA/SEEK_HOLE by scanning data runs
- implement iomap-based file operations
- allow explicit boolean acl/prealloc mount options
- fall-through between switch labels
- delayed-allocation (delalloc) support
Fixes:
- check return value of indx_find to avoid infinite loop
- initialize new folios before use
- infinite loop in attr_load_runs_range on inconsistent metadata
- infinite loop triggered by zero-sized ATTR_LIST
- ntfs_mount_options leak in ntfs_fill_super()
- deadlock in ni_read_folio_cmpr
- circular locking dependency in run_unpack_ex
- prevent infinite loops caused by the next valid being the same
- restore NULL folio initialization in ntfs_writepages()
- slab-out-of-bounds read in DeleteIndexEntryRoot
Updates:
- allow readdir() to finish after directory mutations without rewinddir()
- handle attr_set_size() errors when truncating files
- make ntfs_writeback_ops static
- refactor duplicate kmemdup pattern in do_action()
- avoid calling run_get_entry() when run == NULL in ntfs_read_run_nb_ra()
Replaced:
- use wait_on_buffer() directly
- rename ni_readpage_cmpr into ni_read_folio_cmpr"
* tag 'ntfs3_for_7.0' of https://github.com/Paragon-Software-Group/linux-ntfs3: (26 commits)
fs/ntfs3: add delayed-allocation (delalloc) support
fs/ntfs3: avoid calling run_get_entry() when run == NULL in ntfs_read_run_nb_ra()
fs/ntfs3: add fall-through between switch labels
fs/ntfs3: allow explicit boolean acl/prealloc mount options
fs/ntfs3: Fix slab-out-of-bounds read in DeleteIndexEntryRoot
ntfs3: Restore NULL folio initialization in ntfs_writepages()
ntfs3: Refactor duplicate kmemdup pattern in do_action()
fs/ntfs3: prevent infinite loops caused by the next valid being the same
fs/ntfs3: make ntfs_writeback_ops static
ntfs3: fix circular locking dependency in run_unpack_ex
fs/ntfs3: implement iomap-based file operations
fs/ntfs3: fix deadlock in ni_read_folio_cmpr
fs/ntfs3: implement llseek SEEK_DATA/SEEK_HOLE by scanning data runs
fs/ntfs3: zero-fill folios beyond i_valid in ntfs_read_folio()
fs/ntfs3: handle attr_set_size() errors when truncating files
fs/ntfs3: drop preallocated clusters for sparse and compressed files
fs/ntfs3: fsync files by syncing parent inodes
fs/ntfs3: fix ntfs_mount_options leak in ntfs_fill_super()
fs/ntfs3: allow readdir() to finish after directory mutations without rewinddir()
fs/ntfs3: improve readahead for bitmap initialization and large directory scans
...
Pull ceph updates from Ilya Dryomov:
"This adds support for the upcoming aes256k key type in CephX that is
based on Kerberos 5 and brings a bunch of assorted CephFS fixes from
Ethan and Sam. One of Sam's patches in particular undoes a change in
the fscrypt area that had an inadvertent side effect of making CephFS
behave as if mounted with wsize=4096 and leading to the corresponding
degradation in performance, especially for sequential writes"
* tag 'ceph-for-7.0-rc1' of https://github.com/ceph/ceph-client:
ceph: assert loop invariants in ceph_writepages_start()
ceph: remove error return from ceph_process_folio_batch()
ceph: fix write storm on fscrypted files
ceph: do not propagate page array emplacement errors as batch errors
ceph: supply snapshot context in ceph_uninline_data()
ceph: supply snapshot context in ceph_zero_partial_object()
libceph: adapt ceph_x_challenge_blob hashing and msgr1 message signing
libceph: add support for CEPH_CRYPTO_AES256KRB5
libceph: introduce ceph_crypto_key_prepare()
libceph: generalize ceph_x_encrypt_offset() and ceph_x_encrypt_buflen()
libceph: define and enforce CEPH_MAX_KEY_LEN
Pull overlayfs update from Amir Goldstein:
"Relax the semantics of uuid=off to cater to a use case of overlayfs
lower layers on btrfs clones, whose UUID are ephemeral and an upper
layer on a different filesystem"
* tag 'ovl-update-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
ovl: relax requirement for uuid=off,index=on
Pull smb client fixes from Steve French:
- Fix three potential double free vulnerabilities
- Fix data corruption due to racy lease checks
- Enforce SMB1 signing verification checks
- Fix invalid mount option parsing
- Remove unneeded tracepoint
- Various minor error code corrections
- Minor cleanup
* tag 'v7.0-rc-part2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: terminate session upon failed client required signing
cifs: some missing initializations on replay
cifs: remove unnecessary tracing after put tcon
cifs: update internal module version number
smb: client: fix data corruption due to racy lease checks
smb/client: move NT_STATUS_MORE_ENTRIES
smb/client: rename to NT_ERROR_INVALID_DATATYPE
smb/client: rename to NT_STATUS_SOME_NOT_MAPPED
smb/client: map NT_STATUS_PRIVILEGE_NOT_HELD
smb/client: map NT_STATUS_MORE_PROCESSING_REQUIRED
smb/client: map NT_STATUS_BUFFER_OVERFLOW
smb/client: map NT_STATUS_NOTIFY_ENUM_DIR
cifs: SMB1 split: Remove duplicate include of cifs_debug.h
smb: client: fix regression with mount options parsing
Pull more misc vfs updates from Christian Brauner:
"Features:
- Optimize close_range() from O(range size) to O(active FDs) by using
find_next_bit() on the open_fds bitmap instead of linearly scanning
the entire requested range. This is a significant improvement for
large-range close operations on sparse file descriptor tables.
- Add FS_XFLAG_VERITY file attribute for fs-verity files, retrievable
via FS_IOC_FSGETXATTR and file_getattr(). The flag is read-only.
Add tracepoints for fs-verity enable and verify operations,
replacing the previously removed debug printk's.
- Prevent nfsd from exporting special kernel filesystems like pidfs
and nsfs. These filesystems have custom ->open() and ->permission()
export methods that are designed for open_by_handle_at(2) only and
are incompatible with nfsd. Update the exportfs documentation
accordingly.
Fixes:
- Fix KMSAN uninit-value in ovl_fill_real() where strcmp() was used
on a non-null-terminated decrypted directory entry name from
fscrypt. This triggered on encrypted lower layers when the
decrypted name buffer contained uninitialized tail data.
The fix also adds VFS-level name_is_dot(), name_is_dotdot(), and
name_is_dot_dotdot() helpers, replacing various open-coded "." and
".." checks across the tree.
- Fix read-only fsflags not being reset together with xflags in
vfs_fileattr_set(). Currently harmless since no read-only xflags
overlap with flags, but this would cause inconsistencies for any
future shared read-only flag
- Return -EREMOTE instead of -ESRCH from PIDFD_GET_INFO when the
target process is in a different pid namespace. This lets userspace
distinguish "process exited" from "process in another namespace",
matching glibc's pidfd_getpid() behavior
Cleanups:
- Use C-string literals in the Rust seq_file bindings, replacing the
kernel::c_str!() macro (available since Rust 1.77)
- Fix typo in d_walk_ret enum comment, add porting notes for the
readlink_copy() calling convention change"
* tag 'vfs-7.0-rc1.misc.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: add porting notes about readlink_copy()
pidfs: return -EREMOTE when PIDFD_GET_INFO is called on another ns
nfsd: do not allow exporting of special kernel filesystems
exportfs: clarify the documentation of open()/permission() expotrfs ops
fsverity: add tracepoints
fs: add FS_XFLAG_VERITY for fs-verity files
rust: seq_file: replace `kernel::c_str!` with C-Strings
fs: dcache: fix typo in enum d_walk_ret comment
ovl: use name_is_dot* helpers in readdir code
fs: add helpers name_is_dot{,dot,_dotdot}
ovl: Fix uninit-value in ovl_fill_real
fs: reset read-only fsflags together with xflags
fs/file: optimize close_range() complexity from O(N) to O(Sparse)
Pull pidfs updates from Christian Brauner:
- pid: introduce task_ppid_vnr() helper
- pidfs: convert rb-tree to rhashtable
Mateusz reported performance penalties during task creation because
pidfs uses pidmap_lock to add elements into the rbtree. Switch to an
rhashtable to have separate fine-grained locking and to decouple from
pidmap_lock moving all heavy manipulations outside of it
Also move inode allocation outside of pidmap_lock. With this there's
nothing happening for pidfs under pidmap_lock
- pid: reorder fields in pid_namespace to reduce false sharing
- Revert "pid: make __task_pid_nr_ns(ns => NULL) safe for zombie
callers"
- ipc: Add SPDX license id to mqueue.c
* tag 'kernel-7.0-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
pid: introduce task_ppid_vnr() helper
pidfs: implement ino allocation without the pidmap lock
Revert "pid: make __task_pid_nr_ns(ns => NULL) safe for zombie callers"
pid: reorder fields in pid_namespace to reduce false sharing
pidfs: convert rb-tree to rhashtable
ipc: Add SPDX license id to mqueue.c
This patch implements delayed allocation (delalloc) in ntfs3 driver.
It introduces an in-memory delayed-runlist (run_da) and the helpers to
track, reserve and later convert those delayed reservations into real
clusters at writeback time. The change keeps on-disk formats untouched and
focuses on pagecache integration, correctness and safe interaction with
fallocate, truncate, and dio/iomap paths.
Key points:
- add run_da (delay-allocated run tree) and bookkeeping for delayed clusters.
- mark ranges as delalloc (DELALLOC_LCN) instead of immediately allocating.
Actual allocation performed later (writeback / attr_set_size_ex / explicit
flush paths).
- direct i/o / iomap paths updated to avoid dio collisions with
delalloc: dio falls back or forces allocation of delayed blocks before
proceeding.
- punch/collapse/truncate/fallocate check and cancel delay-alloc reservations.
Sparse/compressed files handled specially.
- free-space checks updated (ntfs_check_free_space) to account for reserved
delalloc clusters and MFT record budgeting.
- delayed allocations are committed on last writer (file release) and on
explicit allocation flush paths.
Tested-by: syzbot@syzkaller.appspotmail.com
Reported-by: syzbot+2bd8e813c7f767aa9bb1@syzkaller.appspotmail.com
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Currently, when smb signature verification fails, the behaviour is to log
the failure without any action to terminate the session.
Call cifs_reconnect() when client required signature verification fails.
Otherwise, log the error without reconnecting.
Signed-off-by: Aaditya Kansal <aadityakansal390@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
In several places in the code, we have a label to signify
the start of the code where a request can be replayed if
necessary. However, some of these places were missing the
necessary reinitializations of certain local variables
before replay.
This change makes sure that these variables get initialized
after the label.
Cc: stable@vger.kernel.org
Reported-by: Yuchan Nam <entropy1110@gmail.com>
Tested-by: Yuchan Nam <entropy1110@gmail.com>
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
This code was recently changed from manually decrementing
tcon ref to using cifs_put_tcon. But even before that change
this tracing happened after decrementing the ref count, which
is wrong. With cifs_put_tcon, tracing already happens inside it.
So just removing the extra tracing here.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
There are two places where ksmbd_vfs_kern_path_end_removing() needs to be
called in order to balance what the corresponding successful call to
ksmbd_vfs_kern_path_start_removing() has done, i.e. drop inode locks and
put the taken references. Otherwise there might be potential deadlocks
and unbalanced locks which are caught like:
BUG: workqueue leaked lock or atomic: kworker/5:21/0x00000000/7596
last function: handle_ksmbd_work
2 locks held by kworker/5:21/7596:
#0: ffff8881051ae448 (sb_writers#3){.+.+}-{0:0}, at: ksmbd_vfs_kern_path_locked+0x142/0x660
#1: ffff888130e966c0 (&type->i_mutex_dir_key#3/1){+.+.}-{4:4}, at: ksmbd_vfs_kern_path_locked+0x17d/0x660
CPU: 5 PID: 7596 Comm: kworker/5:21 Not tainted 6.1.162-00456-gc29b353f383b #138
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014
Workqueue: ksmbd-io handle_ksmbd_work
Call Trace:
<TASK>
dump_stack_lvl+0x44/0x5b
process_one_work.cold+0x57/0x5c
worker_thread+0x82/0x600
kthread+0x153/0x190
ret_from_fork+0x22/0x30
</TASK>
Found by Linux Verification Center (linuxtesting.org).
Fixes: d5fc1400a3 ("smb/server: avoid deadlock when linking with ReplaceIfExists")
Cc: stable@vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull exfat updates from Namjae Jeon:
- Improve error code handling and four cleanups
- Reduce unnecessary valid_size extension during mmap write to avoid
over-extending writes
- Optimize consecutive FAT entry reads by caching buffer heads in
__exfat_ent_get to significantly reduce sb_bread() calls
- Add multi-cluster (contiguous cluster) support to exfat_get_cluster()
and exfat_map_cluster() for better sequential read performance,
especially on small cluster sizes
* tag 'exfat-for-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
exfat: add blank line after declarations
exfat: remove unnecessary else after return statement
exfat: support multi-cluster for exfat_get_cluster
exfat: return the start of next cache in exfat_cache_lookup
exfat: tweak cluster cache to support zero offset
exfat: support multi-cluster for exfat_map_cluster
exfat: remove handling of non-file types in exfat_map_cluster
exfat: reuse cache to improve exfat_get_cluster
exfat: reduce the number of parameters for exfat_get_cluster()
exfat: remove the unreachable warning for cache miss cases
exfat: remove the check for infinite cluster chain loop
exfat: improve exfat_find_last_cluster
exfat: improve exfat_count_num_clusters
exfat: support reuse buffer head for exfat_ent_get
exfat: add cache option for __exfat_ent_get
exfat: reduce unnecessary writes during mmap write
exfat: improve error code handling in exfat_find_empty_entry()
Pull f2fs updates from Jaegeuk Kim:
"In this development cycle, we focused on several key performance
optimizations:
- introducing large folio support to enhance read speeds for
immutable files
- reducing checkpoint=enable latency by flushing only committed dirty
pages
- implementing tracepoints to diagnose and resolve lock priority
inversion.
Additionally, we introduced the packed_ssa feature to optimize the SSA
footprint when utilizing large block sizes.
Detail summary:
Enhancements:
- support large folio for immutable non-compressed case
- support non-4KB block size without packed_ssa feature
- optimize f2fs_enable_checkpoint() to avoid long delay
- optimize f2fs_overwrite_io() for f2fs_iomap_begin
- optimize NAT block loading during checkpoint write
- add write latency stats for NAT and SIT blocks in
f2fs_write_checkpoint
- pin files do not require sbi->writepages lock for ordering
- avoid f2fs_map_blocks() for consecutive holes in readpages
- flush plug periodically during GC to maximize readahead effect
- add tracepoints to catch lock overheads
- add several sysfs entries to tune internal lock priorities
Fixes:
- fix lock priority inversion issue
- fix incomplete block usage in compact SSA summaries
- fix to show simulate_lock_timeout correctly
- fix to avoid mapping wrong physical block for swapfile
- fix IS_CHECKPOINTED flag inconsistency issue caused by
concurrent atomic commit and checkpoint writes
- fix to avoid UAF in f2fs_write_end_io()"
* tag 'f2fs-for-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (61 commits)
f2fs: sysfs: introduce critical_task_priority
f2fs: introduce trace_f2fs_priority_update
f2fs: fix lock priority inversion issue
f2fs: optimize f2fs_overwrite_io() for f2fs_iomap_begin
f2fs: fix incomplete block usage in compact SSA summaries
f2fs: decrease maximum flush retry count in f2fs_enable_checkpoint()
f2fs: optimize NAT block loading during checkpoint write
f2fs: change size parameter of __has_cursum_space() to unsigned int
f2fs: add write latency stats for NAT and SIT blocks in f2fs_write_checkpoint
f2fs: pin files do not require sbi->writepages lock for ordering
f2fs: fix to show simulate_lock_timeout correctly
f2fs: introduce FAULT_SKIP_WRITE
f2fs: check skipped write in f2fs_enable_checkpoint()
Revert "f2fs: add timeout in f2fs_enable_checkpoint()"
f2fs: fix to unlock folio in f2fs_read_data_large_folio()
f2fs: fix error path handling in f2fs_read_data_large_folio()
f2fs: use folio_end_read
f2fs: fix to avoid mapping wrong physical block for swapfile
f2fs: avoid f2fs_map_blocks() for consecutive holes in readpages
f2fs: advance index and offset after zeroing in large folio read
...
Pull MM fixes from Andrew Morton:
"Three MM hotfixes, all three are cc:stable"
* tag 'mm-hotfixes-stable-2026-02-13-07-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
procfs: fix possible double mmput() in do_procmap_query()
mm/page_alloc: skip debug_check_no_{obj,locks}_freed with FPI_TRYLOCK
mm/hugetlb: restore failed global reservations to subpool
Pull NFS client updates from Anna Schumaker:
"New Features:
- Use an LRU list for returning unused delegations
- Introduce a KConfig option to disable NFS v4.0 and make NFS v4.1
the default
Bugfixes:
- NFS/localio:
- Handle short writes by retrying
- Prevent direct reclaim recursion into NFS via nfs_writepages
- Use GFP_NOIO and non-memreclaim workqueue in nfs_local_commit
- Remove -EAGAIN handling in nfs_local_doio()
- pNFS: fix a missing wake up while waiting on NFS_LAYOUT_DRAIN
- fs/nfs: Fix a readdir slow-start regression
- SUNRPC: fix gss_auth kref leak in gss_alloc_msg error path
Other cleanups and improvements:
- A few other NFS/localio cleanups
- Various other delegation handling cleanups from Christoph
- Unify security_inode_listsecurity() calls
- Improvements to NFSv4 lease handling
- Clean up SUNRPC *_debug fields when CONFIG_SUNRPC_DEBUG is not set"
* tag 'nfs-for-7.0-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (60 commits)
SUNRPC: fix gss_auth kref leak in gss_alloc_msg error path
nfs: nfs4proc: Convert comma to semicolon
SUNRPC: Change list definition method
sunrpc: rpc_debug and others are defined even if CONFIG_SUNRPC_DEBUG unset
NFSv4: limit lease period in nfs4_set_lease_period()
NFSv4: pass lease period in seconds to nfs4_set_lease_period()
nfs: unify security_inode_listsecurity() calls
fs/nfs: Fix readdir slow-start regression
pNFS: fix a missing wake up while waiting on NFS_LAYOUT_DRAIN
NFS: fix delayed delegation return handling
NFS: simplify error handling in nfs_end_delegation_return
NFS: fold nfs_abort_delegation_return into nfs_end_delegation_return
NFS: remove the delegation == NULL check in nfs_end_delegation_return
NFS: use bool for the issync argument to nfs_end_delegation_return
NFS: return void from ->return_delegation
NFS: return void from nfs4_inode_make_writeable
NFS: Merge CONFIG_NFS_V4_1 with CONFIG_NFS_V4
NFS: Add a way to disable NFS v4.0 via KConfig
NFS: Move sequence slot operations into minorversion operations
NFS: Pass a struct nfs_client to nfs4_init_sequence()
...
Customer reported data corruption in some of their files. It turned
out the client would end up calling cacheless IO functions while
having RHW lease, bypassing the pagecache and then leaving gaps in the
file while writing to it. It was related to concurrent opens changing
the lease state while having writes in flight. Lease breaks and
re-opens due to reconnect could also cause same issue.
Fix this by serialising the lease updates with
cifsInodeInfo::open_file_lock. When handling oplock break, make sure
to use the downgraded oplock value rather than one in cifsInodeinfo as
it could be changed concurrently.
Reported-by: Frank Sorenson <sorenson@redhat.com>
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Reviewed-by: David Howells <dhowells@redhat.com>
Cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>