linux-cryptodev-2.6

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git synced 2026-05-02 18:17:50 -04:00

Author	SHA1	Message	Date
Linus Torvalds	8113b3998d	Merge tag 'vfs-7.0-rc1.atomic_open' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs atomic_open updates from Christian Brauner: "Allow knfsd to use atomic_open() While knfsd offers combined exclusive create and open results to clients, on some filesystems those results are not atomic. The separate vfs_create() + vfs_open() sequence in dentry_create() can produce races and unexpected errors. For example, open O_CREAT with mode 0 will succeed in creating the file but return -EACCES from vfs_open(). Additionally, network filesystems benefit from reducing remote round-trip operations by using a single atomic_open() call. Teach dentry_create() -- whose sole caller is knfsd -- to use atomic_open() for filesystems that support it" * tag 'vfs-7.0-rc1.atomic_open' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs/namei: fix kernel-doc markup for dentry_create VFS/knfsd: Teach dentry_create() to use atomic_open() VFS: Prepare atomic_open() for dentry_create() VFS: move dentry_create() from fs/open.c to fs/namei.c	2026-02-09 14:25:37 -08:00
Linus Torvalds	f0b9d8eb98	Merge tag 'nfsd-6.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: "A set of NFSD fixes for stable that arrived after the merge window: - Remove an invalid NFS status code - Fix an fstests failure when using pNFS - Fix a UAF in v4_end_grace() - Fix the administrative interface used to revoke NFSv4 state - Fix a memory leak reported by syzbot" * tag 'nfsd-6.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: NFSD: net ref data still needs to be freed even if net hasn't startup nfsd: check that server is running in unlock_filesystem nfsd: use correct loop termination in nfsd4_revoke_states() nfsd: provide locking for v4_end_grace NFSD: Fix permission check for read access to executable-only files NFSD: Remove NFSERR_EAGAIN	2026-01-06 09:12:52 -08:00
Chuck Lever	c6c209ceb8	NFSD: Remove NFSERR_EAGAIN I haven't found an NFSERR_EAGAIN in RFCs 1094, 1813, 7530, or 8881. None of these RFCs have an NFS status code that match the numeric value "11". Based on the meaning of the EAGAIN errno, I presume the use of this status in NFSD means NFS4ERR_DELAY. So replace the one usage of nfserr_eagain, and remove it from NFSD's NFS status conversion tables. As far as I can tell, NFSERR_EAGAIN has existed since the pre-git era, but was not actually used by any code until commit `f4e44b3933` ("NFSD: delay unmount source's export after inter-server copy completed."), at which time it become possible for NFSD to return a status code of 11 (which is not valid NFS protocol). Fixes: `f4e44b3933` ("NFSD: delay unmount source's export after inter-server copy completed.") Cc: stable@vger.kernel.org Reviewed-by: NeilBrown <neil@brown.name> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2026-01-02 13:43:41 -05:00
Benjamin Coddington	64a989dbd1	VFS/knfsd: Teach dentry_create() to use atomic_open() While knfsd offers combined exclusive create and open results to clients, on some filesystems those results may not be atomic. This behavior can be observed. For example, an open O_CREAT with mode 0 will succeed in creating the file but unexpectedly return -EACCES from vfs_open(). Additionally reducing the number of remote RPC calls required for O_CREAT on network filesystem provides a performance benefit in the open path. Teach knfsd's helper dentry_create() to use atomic_open() for filesystems that support it. The previously const @path is passed up to atomic_open() and may be modified depending on whether an existing entry was found or if the atomic_open() returned an error and consumed the passed-in dentry. Signed-off-by: Benjamin Coddington <bcodding@hammerspace.com> Link: https://patch.msgid.link/8e449bfb64ab055abb9fd82641a171531415a88c.1764259052.git.bcodding@hammerspace.com Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-12-15 14:12:45 +01:00
Linus Torvalds	a8058f8442	Merge tag 'vfs-6.19-rc1.directory.locking' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull directory locking updates from Christian Brauner: "This contains the work to add centralized APIs for directory locking operations. This series is part of a larger effort to change directory operation locking to allow multiple concurrent operations in a directory. The ultimate goal is to lock the target dentry(s) rather than the whole parent directory. To help with changing the locking protocol, this series centralizes locking and lookup in new helper functions. The helpers establish a pattern where it is the dentry that is being locked and unlocked (currently the lock is held on dentry->d_parent->d_inode, but that can change in the future). This also changes vfs_mkdir() to unlock the parent on failure, as well as dput()ing the dentry. This allows end_creating() to only require the target dentry (which may be IS_ERR() after vfs_mkdir()), not the parent" * tag 'vfs-6.19-rc1.directory.locking' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: nfsd: fix end_creating() conversion VFS: introduce end_creating_keep() VFS: change vfs_mkdir() to unlock on failure. ecryptfs: use new start_creating/start_removing APIs Add start_renaming_two_dentries() VFS/ovl/smb: introduce start_renaming_dentry() VFS/nfsd/ovl: introduce start_renaming() and end_renaming() VFS: add start_creating_killable() and start_removing_killable() VFS: introduce start_removing_dentry() smb/server: use end_removing_noperm for for target of smb2_create_link() VFS: introduce start_creating_noperm() and start_removing_noperm() VFS/nfsd/cachefiles/ovl: introduce start_removing() and end_removing() VFS/nfsd/cachefiles/ovl: add start_creating() and end_creating() VFS: tidy up do_unlinkat() VFS: introduce start_dirop() and end_dirop() debugfs: rename end_creating() to debugfs_end_creating()	2025-12-01 16:13:46 -08:00
Linus Torvalds	db74a7d02a	Merge tag 'vfs-6.19-rc1.directory.delegations' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull directory delegations update from Christian Brauner: "This contains the work for recall-only directory delegations for knfsd. Add support for simple, recallable-only directory delegations. This was decided at the fall NFS Bakeathon where the NFS client and server maintainers discussed how to merge directory delegation support. The approach starts with recallable-only delegations for several reasons: 1. RFC8881 has gaps that are being addressed in RFC8881bis. In particular, it requires directory position information for CB_NOTIFY callbacks, which is difficult to implement properly under Linux. The spec is being extended to allow that information to be omitted. 2. Client-side support for CB_NOTIFY still lags. The client side involves heuristics about when to request a delegation. 3. Early indication shows simple, recallable-only delegations can help performance. Anna Schumaker mentioned seeing a multi-minute speedup in xfstests runs with them enabled. With these changes, userspace can also request a read lease on a directory that will be recalled on conflicting accesses. This may be useful for applications like Samba. Users can disable leases altogether via the fs.leases-enable sysctl if needed. VFS changes: - Dedicated Type for Delegations Introduce struct delegated_inode to track inodes that may have delegations that need to be broken. This replaces the previous approach of passing raw inode pointers through the delegation breaking code paths, providing better type safety and clearer semantics for the delegation machinery. - Break parent directory delegations in open(..., O_CREAT) codepath - Allow mkdir to wait for delegation break on parent - Allow rmdir to wait for delegation break on parent - Add try_break_deleg calls for parents to vfs_link(), vfs_rename(), and vfs_unlink() - Make vfs_create(), vfs_mknod(), and vfs_symlink() break delegations on parent directory - Clean up argument list for vfs_create() - Expose delegation support to userland Filelock changes: - Make lease_alloc() take a flags argument - Rework the __break_lease API to use flags - Add struct delegated_inode - Push the S_ISREG check down to ->setlease handlers - Lift the ban on directory leases in generic_setlease NFSD changes: - Allow filecache to hold S_IFDIR files - Allow DELEGRETURN on directories - Wire up GET_DIR_DELEGATION handling Fixes: - Fix kernel-doc warnings in __fcntl_getlease - Add needed headers for new struct delegation definition" * tag 'vfs-6.19-rc1.directory.delegations' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: vfs: add needed headers for new struct delegation definition filelock: __fcntl_getlease: fix kernel-doc warnings vfs: expose delegation support to userland nfsd: wire up GET_DIR_DELEGATION handling nfsd: allow DELEGRETURN on directories nfsd: allow filecache to hold S_IFDIR files filelock: lift the ban on directory leases in generic_setlease vfs: make vfs_symlink break delegations on parent dir vfs: make vfs_mknod break delegations on parent directory vfs: make vfs_create break delegations on parent directory vfs: clean up argument list for vfs_create() vfs: break parent dir delegations in open(..., O_CREAT) codepath vfs: allow rmdir to wait for delegation break on parent vfs: allow mkdir to wait for delegation break on parent vfs: add try_break_deleg calls for parents to vfs_{link,rename,unlink} filelock: push the S_ISREG check down to ->setlease handlers filelock: add struct delegated_inode filelock: rework the __break_lease API to use flags filelock: make lease_alloc() take a flags argument	2025-12-01 15:34:41 -08:00
NeilBrown	fe497f0759	VFS: change vfs_mkdir() to unlock on failure. vfs_mkdir() already drops the reference to the dentry on failure but it leaves the parent locked. This complicates end_creating() which needs to unlock the parent even though the dentry is no longer available. If we change vfs_mkdir() to unlock on failure as well as releasing the dentry, we can remove the "parent" arg from end_creating() and simplify the rules for calling it. Note that cachefiles_get_directory() can choose to substitute an error instead of actually calling vfs_mkdir(), for fault injection. In that case it needs to call end_creating(), just as vfs_mkdir() now does on error. ovl_create_real() will now unlock on error. So the conditional end_creating() after the call is removed, and end_creating() is called internally on error. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: syzbot@syzkaller.appspotmail.com Signed-off-by: NeilBrown <neil@brown.name> Link: https://patch.msgid.link/20251113002050.676694-15-neilb@ownmail.net Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-11-14 13:15:58 +01:00
NeilBrown	7ab96df840	VFS/nfsd/cachefiles/ovl: add start_creating() and end_creating() start_creating() is similar to simple_start_creating() but is not so simple. It takes a qstr for the name, includes permission checking, and does NOT report an error if the name already exists, returning a positive dentry instead. This is currently used by nfsd, cachefiles, and overlayfs. end_creating() is called after the dentry has been used. end_creating() drops the reference to the dentry as it is generally no longer needed. This is exactly the first section of end_creating_path() so that function is changed to call the new end_creating() These calls help encapsulate locking rules so that directory locking can be changed. Occasionally this change means that the parent lock is held for a shorter period of time, for example in cachefiles_commit_tmpfile(). As this function now unlocks after an unlink and before the following lookup, it is possible that the lookup could again find a positive dentry, so a while loop is introduced there. In overlayfs the ovl_lookup_temp() function has ovl_tempname() split out to be used in ovl_start_creating_temp(). The other use of ovl_lookup_temp() is preparing for a rename. When rename handling is updated, ovl_lookup_temp() will be removed. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: NeilBrown <neil@brown.name> Link: https://patch.msgid.link/20251113002050.676694-5-neilb@ownmail.net Tested-by: syzbot@syzkaller.appspotmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-11-14 13:15:56 +01:00
Jeff Layton	8b99f6a8c1	nfsd: wire up GET_DIR_DELEGATION handling Add a new routine for acquiring a read delegation on a directory. These are recallable-only delegations with no support for CB_NOTIFY. That will be added in a later phase. Since the same CB_RECALL/DELEGRETURN infrastructure is used for regular and directory delegations, a normal nfs4_delegation is used to represent a directory delegation. Reviewed-by: NeilBrown <neil@brown.name> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20251111-dir-deleg-ro-v6-16-52f3feebb2f2@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-11-12 09:38:37 +01:00
Chuck Lever	3e7f011c25	Revert "NFSD: Remove the cap on number of operations per NFSv4 COMPOUND" I've found that pynfs COMP6 now leaves the connection or lease in a strange state, which causes CLOSE9 to hang indefinitely. I've dug into it a little, but I haven't been able to root-cause it yet. However, I bisected to commit `48aab1606f` ("NFSD: Remove the cap on number of operations per NFSv4 COMPOUND"). Tianshuo Han also reports a potential vulnerability when decoding an NFSv4 COMPOUND. An attacker can place an arbitrarily large op count in the COMPOUND header, which results in: [ 51.410584] nfsd: vmalloc error: size 1209533382144, exceeds total pages, mode:0xdc0(GFP_KERNEL\|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0 when NFSD attempts to allocate the COMPOUND op array. Let's restore the operation-per-COMPOUND limit, but increased to 200 for now. Reported-by: tianshuo han <hantianshuo233@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: stable@vger.kernel.org Tested-by: Tianshuo Han <hantianshuo233@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-10-21 11:03:50 -04:00
Chuck Lever	abb1f08a21	NFSD: Fix crash in nfsd4_read_release() When tracing is enabled, the trace_nfsd_read_done trace point crashes during the pynfs read.testNoFh test. Fixes: `15a8b55dbb` ("nfsd: call op_release, even when op_func returns an error") Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-10-21 11:03:19 -04:00
Chuck Lever	d6e80d48f9	NFSD: Do the grace period check in ->proc_layoutget RFC 8881 Section 18.43.3 states: > If the metadata server is in a grace period, and does not persist > layouts and device ID to device address mappings, then it MUST > return NFS4ERR_GRACE (see Section 8.4.2.1). Jeff observed that this suggests the grace period check is better done by the individual layout type implementations, because checking for the server grace period is unnecessary for some layout types. Suggested-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/linux-nfs/7h5p5ktyptyt37u6jhpbjfd5u6tg44lriqkdc7iz7czeeabrvo@ijgxz27dw4sg/T/#t Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-10-01 15:54:01 -04:00
Sergey Bashirov	e0963ce53b	NFSD: Allow layoutcommit during grace period If the loca_reclaim field is set to TRUE, this indicates that the client is attempting to commit changes to a layout after the restart of the metadata server during the metadata server's recovery grace period. This type of request may be necessary when the client has uncommitted writes to provisionally allocated byte-ranges of a file that were sent to the storage devices before the restart of the metadata server. See RFC 8881, section 18.42.3. Without this, the client is not able to increase the file size and commit preallocated extents when the block/scsi layout server is restarted during a write and is in a grace period. And when the grace period ends, the client also cannot perform layoutcommit because the old layout state becomes invalid, resulting in file corruption. Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-10-01 15:54:01 -04:00
Sergey Bashirov	db155b7c7c	NFSD: Disallow layoutget during grace period When the server is recovering from a reboot and is in a grace period, any operation that may result in deletion or reallocation of block extents should not be allowed. See RFC 8881, section 18.43.3. If multiple clients write data to the same file, rebooting the server during writing may result in file corruption. In the worst case, the exported XFS may also become corrupted. Observed this behavior while testing pNFS block volume setup. Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-25 10:01:24 -04:00
Thorsten Blum	ab1c282c01	NFSD: Fix destination buffer size in nfsd4_ssc_setup_dul() Commit `5304877936` ("NFSD: Fix strncpy() fortify warning") replaced strncpy(,, sizeof(..)) with strlcpy(,, sizeof(..) - 1), but strlcpy() already guaranteed NUL-termination of the destination buffer and subtracting one byte potentially truncated the source string. The incorrect size was then carried over in commit `72f78ae00a` ("NFSD: move from strlcpy with unused retval to strscpy") when switching from strlcpy() to strscpy(). Fix this off-by-one error by using the full size of the destination buffer again. Cc: stable@vger.kernel.org Fixes: `5304877936` ("NFSD: Fix strncpy() fortify warning") Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Jeff Layton	e5e9b24ab8	nfsd: freeze c/mtime updates with outstanding WRITE_ATTRS delegation Instead of allowing the ctime to roll backward with a WRITE_ATTRS delegation, set FMODE_NOCMTIME on the file and have it skip mtime and ctime updates. It is possible that the client will never send a SETATTR to set the times before returning the delegation. Add two new bools to struct nfs4_delegation: dl_written: tracks whether the file has been written since the delegation was granted. This is set in the WRITE and LAYOUTCOMMIT handlers. dl_setattr: tracks whether the client has sent at least one valid mtime that can also update the ctime in a SETATTR. When unlocking the lease for the delegation, clear FMODE_NOCMTIME. If the file has been written, but no setattr for the delegated mtime and ctime has been done, update the timestamps to current_time(). Suggested-by: NeilBrown <neil@brown.name> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Jeff Layton	3952f1cbcb	nfsd: fix SETATTR updates for delegated timestamps SETATTRs containing delegated timestamp updates are currently not being vetted properly. Since we no longer need to compare the timestamps vs. the current timestamps, move the vetting of delegated timestamps wholly into nfsd. Rename the set_cb_time() helper to nfsd4_vet_deleg_time(), and make it non-static. Add a new vet_deleg_attrs() helper that is called from nfsd4_setattr that uses nfsd4_vet_deleg_time() to properly validate the all the timestamps. If the validation indicates that the update should be skipped, unset the appropriate flags in ia_valid. Fixes: `7e13f4f8d2` ("nfsd: handle delegated timestamps in SETATTR") Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Sergey Bashirov	d68886bae7	NFSD: Fix last write offset handling in layoutcommit The data type of loca_last_write_offset is newoffset4 and is switched on a boolean value, no_newoffset, that indicates if a previous write occurred or not. If no_newoffset is FALSE, an offset is not given. This means that client does not try to update the file size. Thus, server should not try to calculate new file size and check if it fits into the segment range. See RFC 8881, section 12.5.4.2. Sometimes the current incorrect logic may cause clients to hang when trying to sync an inode. If layoutcommit fails, the client marks the inode as dirty again. Fixes: `9cf514ccfa` ("nfsd: implement pNFS operations") Cc: stable@vger.kernel.org Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Sergey Bashirov	f963cf2b91	NFSD: Implement large extent array support in pNFS When pNFS client in the block or scsi layout mode sends layoutcommit to MDS, a variable length array of modified extents is supplied within the request. This patch allows the server to accept such extent arrays if they do not fit within single memory page. The issue can be reproduced when writing to a 1GB file using FIO with O_DIRECT, 4K block and large I/O depth without preallocation of the file. In this case, the server returns NFSERR_BADXDR to the client. Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Sergey Bashirov	274365a51d	NFSD: Minor cleanup in layoutcommit processing Remove dprintk in nfsd4_layoutcommit. These are not needed in day to day usage, and the information is also available in Wireshark when capturing NFS traffic. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Chuck Lever	e58691ea4c	Revert "NFSD: Force all NFSv4.2 COPY requests to be synchronous" In the past several kernel releases, we've made NFSv4.2 async copy reliable: - The Linux NFS client and server now both implement and use the NFSv4.2 OFFLOAD_STATUS operation - The Linux NFS server keeps copy stateids around longer - The Linux NFS client and server now both implement referring call lists And resilient against DoS: - The Linux NFS server limits the number of concurrent async copy operations Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-07-14 12:46:46 -04:00
Chuck Lever	48aab1606f	NFSD: Remove the cap on number of operations per NFSv4 COMPOUND This limit has always been a sanity check; in nearly all cases a large COMPOUND is a sign of a malfunctioning client. The only real limit on COMPOUND size and complexity is the size of NFSD's send and receive buffers. However, there are a few cases where a large COMPOUND is sane. For example, when a client implementation wants to walk down a long file pathname in a single round trip. A small risk is that now a client can construct a COMPOUND request that can keep a single nfsd thread busy for quite some time. Suggested-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-07-14 12:46:41 -04:00
Linus Torvalds	2c26b68cd5	Merge tag 'nfsd-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd updates from Chuck Lever: "The marquee feature for this release is that the limit on the maximum rsize and wsize has been raised to 4MB. The default remains at 1MB, but risk-seeking administrators now have the ability to try larger I/O sizes with NFS clients that support them. Eventually the default setting will be increased when we have confidence that this change will not have negative impact. With v6.16, NFSD now has its own debugfs file system where we can add experimental features and make them available outside of our development community without impacting production deployments. The first experimental setting added is one that makes all NFS READ operations use vfs_iter_read() instead of the NFSD splice actor. The plan is to eventually retire the splice actor, as that will enable a number of new capabilities such as the use of struct bio_vec from the top to the bottom of the NFSD stack. Jeff Layton contributed a number of observability improvements. The use of dprintk() in a number of high-traffic code paths has been replaced with static trace points. This release sees the continuation of efforts to harden the NFSv4.2 COPY operation. Soon, the restriction on async COPY operations can be lifted. Many thanks to the contributors, reviewers, testers, and bug reporters who participated during the v6.16 development cycle" * tag 'nfsd-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (60 commits) xdrgen: Fix code generated for counted arrays SUNRPC: Bump the maximum payload size for the server NFSD: Add a "default" block size NFSD: Remove NFSSVC_MAXBLKSIZE_V2 macro NFSD: Remove NFSD_BUFSIZE sunrpc: Remove the RPCSVC_MAXPAGES macro svcrdma: Adjust the number of entries in svc_rdma_send_ctxt::sc_pages svcrdma: Adjust the number of entries in svc_rdma_recv_ctxt::rc_pages sunrpc: Adjust size of socket's receive page array dynamically SUNRPC: Remove svc_rqst :: rq_vec SUNRPC: Remove svc_fill_write_vector() NFSD: Use rqstp->rq_bvec in nfsd_iter_write() SUNRPC: Export xdr_buf_to_bvec() NFSD: De-duplicate the svc_fill_write_vector() call sites NFSD: Use rqstp->rq_bvec in nfsd_iter_read() sunrpc: Replace the rq_bvec array with dynamically-allocated memory sunrpc: Replace the rq_pages array with dynamically-allocated memory sunrpc: Remove backchannel check in svc_init_buffer() sunrpc: Add a helper to derive maxpages from sv_max_mesg svcrdma: Reduce the number of rdma_rw contexts per-QP ...	2025-05-28 12:21:12 -07:00
Chuck Lever	58d721684d	NFSD: Remove NFSD_BUFSIZE Clean up: The documenting comment for NFSD_BUFSIZE is quite stale. NFSD_BUFSIZE is used only for NFSv4 Reply these days; never for NFSv2 or v3, and never for RPC Calls. Even so, the byte count estimate does not include the size of the NFSv4 COMPOUND Reply HEADER or the RPC auth flavor. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-05-15 16:16:27 -04:00
Chuck Lever	f2e597353d	NFSD: De-duplicate the svc_fill_write_vector() call sites All three call sites do the same thing. I'm struggling with this a bit, however. struct xdr_buf is an XDR layer object and unmarshaling a WRITE payload is clearly a task intended to be done by the proc and xdr functions, not by VFS. This feels vaguely like a layering violation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-05-15 16:16:23 -04:00
Jeff Layton	0d885f7e79	nfsd: add tracepoint for getattr and statfs events There isn't a common helper for getattrs, so add these into the protocol-specific helpers. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-05-11 19:48:34 -04:00
Jeff Layton	a91bfc4571	nfsd: add tracepoint to nfsd_readdir Observe the start of NFS READDIR operations. The NFS READDIR's count argument can be interesting when tuning a client's readdir behavior. However, the count argument is not passed to nfsd_readdir(). To properly capture the count argument, this tracepoint must appear in each proc function before the nfsd_readdir() call. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-05-11 19:48:34 -04:00
NeilBrown	1244f0b2c3	nfsd: nfsd4_spo_must_allow() must check this is a v4 compound request If the request being processed is not a v4 compound request, then examining the cstate can have undefined results. This patch adds a check that the rpc procedure being executed (rq_procinfo) is the NFSPROC4_COMPOUND procedure. Reported-by: Olga Kornievskaia <okorniev@redhat.com> Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-05-11 19:48:25 -04:00
Guoqing Jiang	c447d2ac98	nfsd: remove redundant WARN_ON_ONCE in nfsd4_write It can be removed since svc_fill_write_vector already has the same WARN_ON_ONCE. Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-05-11 19:48:24 -04:00
Chuck Lever	b53a42970f	NFSD: Record each NFSv4 call's session slot index Help the client resolve the race between the reply to an asynchronous COPY reply and the associated CB_OFFLOAD callback by planting the session, slot, and sequence number of the COPY in the CB_SEQUENCE contained in the CB_OFFLOAD COMPOUND. Suggested-by: Trond Myklebust <trondmy@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-05-11 19:48:21 -04:00
Chuck Lever	71aeab7bd9	NFSD: Shorten CB_OFFLOAD response to NFS4ERR_DELAY Try not to prolong the wait for completion of a COPY or COPY_NOTIFY operation. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-05-11 19:48:20 -04:00
Chuck Lever	2f2b6d0b9b	NFSD: OFFLOAD_CANCEL should mark an async COPY as completed Update the status of an async COPY operation when it has been stopped. OFFLOAD_STATUS needs to indicate that the COPY is no longer running. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-05-11 19:48:19 -04:00
NeilBrown	8ad9248471	nfsd: Use lookup_one() rather than lookup_one_len() nfsd uses some VFS interfaces (such as vfs_mkdir) which take an explicit mnt_idmap, and it passes &nop_mnt_idmap as nfsd doesn't yet support idmapped mounts. It also uses the lookup_one_len() family of functions which implicitly use &nop_mnt_idmap. This mixture of implicit and explicit could be confusing. When we eventually update nfsd to support idmap mounts it would be best if all places which need an idmap determined from the mount point were similar and easily found. So this patch changes nfsd to use lookup_one(), lookup_one_unlocked(), and lookup_one_positive_unlocked(), passing &nop_mnt_idmap. This has the benefit of removing some uses of the lookup_one_len functions where permission checking is actually needed. Many callers don't care about permission checking and using these function only where permission checking is needed is a valuable simplification. This change requires passing the name in a qstr. Currently this is a little clumsy, but if nfsd is changed to use qstr more broadly it will result in a net improvement. Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/r/20250319031545.2999807-3-neil@brown.name Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-07 09:25:32 +02:00
Jeff Layton	1054e8ffc5	nfsd: prevent callback tasks running concurrently The nfsd4_callback workqueue jobs exist to queue backchannel RPCs to rpciod. Because they run in different workqueue contexts, the rpc_task can run concurrently with the workqueue job itself, should it become requeued. This is problematic as there is no locking when accessing the fields in the nfsd4_callback. Add a new unsigned long to nfsd4_callback and declare a new NFSD4_CALLBACK_RUNNING flag to be set in it. When attempting to run a workqueue job, do a test_and_set_bit() on that flag first, and don't queue the workqueue job if it returns true. Clear NFSD4_CALLBACK_RUNNING in nfsd41_destroy_cb(). This also gives us a more reliable mechanism for handling queueing failures in codepaths where we have to take references under spinlocks. We can now do the test_and_set_bit on NFSD4_CALLBACK_RUNNING first, and only take references to the objects if that returns false. Most of the nfsd4_run_cb() callers are converted to use this new flag or the nfsd4_try_run_cb() wrapper. The main exception is the callback channel probe, which has its own synchronization. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-03-10 09:11:09 -04:00
Jeff Layton	7e13f4f8d2	nfsd: handle delegated timestamps in SETATTR Allow SETATTR to handle delegated timestamps. This patch assumes that only the delegation holder has the ability to set the timestamps in this way, so we allow this only if the SETATTR stateid refers to a *_ATTRS_DELEG delegation. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-01-21 15:30:01 -05:00
Olga Kornievskaia	9048cf05a1	NFSD: fix management of pending async copies Currently the pending_async_copies count is decremented just before a struct nfsd4_copy is destroyed. After commit `aa0ebd21df` ("NFSD: Add nfsd4_copy time-to-live") nfsd4_copy structures sticks around for 10 lease periods after the COPY itself has completed, the pending_async_copies count stays high for a long time. This causes NFSD to avoid the use of background copy even though the actual background copy workload might no longer be running. In this patch, decrement pending_async_copies once async copy thread is done processing the copy work. Fixes: `aa0ebd21df` ("NFSD: Add nfsd4_copy time-to-live") Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-12-17 16:35:53 -05:00
Chuck Lever	aa0ebd21df	NFSD: Add nfsd4_copy time-to-live Keep async copy state alive for a few lease cycles after the copy completes so that OFFLOAD_STATUS returns something meaningful. This means that NFSD's client shutdown processing needs to purge any of this state that happens to be waiting to die. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-11-18 20:23:11 -05:00
Chuck Lever	ac0514f4d1	NFSD: Add a laundromat reaper for async copy state RFC 7862 Section 4.8 states: > A copy offload stateid will be valid until either (A) the client > or server restarts or (B) the client returns the resource by > issuing an OFFLOAD_CANCEL operation or the client replies to a > CB_OFFLOAD operation. Instead of releasing async copy state when the CB_OFFLOAD callback completes, now let it live until the next laundromat run after the callback completes. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-11-18 20:23:11 -05:00
Chuck Lever	b44ffa4c4f	NFSD: Block DESTROY_CLIENTID only when there are ongoing async COPY operations Currently __destroy_client() consults the nfs4_client's async_copies list to determine whether there are ongoing async COPY operations. However, NFSD now keeps copy state in that list even when the async copy has completed, to enable OFFLOAD_STATUS to find the COPY results for a while after the COPY has completed. DESTROY_CLIENTID should not be blocked if the client's async_copies list contains state for only completed copy operations. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-11-18 20:23:10 -05:00
Chuck Lever	5c41f32147	NFSD: Handle an NFS4ERR_DELAY response to CB_OFFLOAD RFC 7862 permits callback services to respond to CB_OFFLOAD with NFS4ERR_DELAY. Currently NFSD drops the CB_OFFLOAD in that case. To improve the reliability of COPY offload, NFSD should rather send another CB_OFFLOAD completion notification. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-11-18 20:23:10 -05:00
Chuck Lever	409d6f52bd	NFSD: Free async copy information in nfsd4_cb_offload_release() RFC 7862 Section 4.8 states: > A copy offload stateid will be valid until either (A) the client > or server restarts or (B) the client returns the resource by > issuing an OFFLOAD_CANCEL operation or the client replies to a > CB_OFFLOAD operation. Currently, NFSD purges the metadata for an async COPY operation as soon as the CB_OFFLOAD callback has been sent. It does not wait even for the client's CB_OFFLOAD response, as the paragraph above suggests that it should. This makes the OFFLOAD_STATUS operation ineffective during the window between the completion of an asynchronous COPY and the server's receipt of the corresponding CB_OFFLOAD response. This is important if, for example, the client responds with NFS4ERR_DELAY, or the transport is lost before the server receives the response. A client might use OFFLOAD_STATUS to query the server about the still pending asynchronous COPY, but NFSD will respond to OFFLOAD_STATUS as if it had never heard of the presented copy stateid. This patch starts to address this issue by extending the lifetime of struct nfsd4_copy at least until the server has seen the client's CB_OFFLOAD response, or the CB_OFFLOAD has timed out. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-11-18 20:23:10 -05:00
Chuck Lever	62a8642ba0	NFSD: Fix nfsd4_shutdown_copy() nfsd4_shutdown_copy() is just this: while ((copy = nfsd4_get_copy(clp)) != NULL) nfsd4_stop_copy(copy); nfsd4_get_copy() bumps @copy's reference count, preventing nfsd4_stop_copy() from releasing @copy. A while loop like this usually works by removing the first element of the list, but neither nfsd4_get_copy() nor nfsd4_stop_copy() alters the async_copies list. Best I can tell, then, is that nfsd4_shutdown_copy() continues to loop until other threads manage to remove all the items from this list. The spinning loop blocks shutdown until these items are gone. Possibly the reason we haven't seen this issue in the field is because client_has_state() prevents __destroy_client() from calling nfsd4_shutdown_copy() if there are any items on this list. In a subsequent patch I plan to remove that restriction. Fixes: `e0639dc580` ("NFSD introduce async copy feature") Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-11-18 20:23:09 -05:00
Chuck Lever	a4452e661b	NFSD: Add a tracepoint to record canceled async COPY operations Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-11-18 20:23:09 -05:00
Pali Rohár	bb4f07f240	nfsd: Fix NFSD_MAY_BYPASS_GSS and NFSD_MAY_BYPASS_GSS_ON_ROOT Currently NFSD_MAY_BYPASS_GSS and NFSD_MAY_BYPASS_GSS_ON_ROOT do not bypass only GSS, but bypass any method. This is a problem specially for NFS3 AUTH_NULL-only exports. The purpose of NFSD_MAY_BYPASS_GSS_ON_ROOT is described in RFC 2623, section 2.3.2, to allow mounting NFS2/3 GSS-only export without authentication. So few procedures which do not expose security risk used during mount time can be called also with AUTH_NONE or AUTH_SYS, to allow client mount operation to finish successfully. The problem with current implementation is that for AUTH_NULL-only exports, the NFSD_MAY_BYPASS_GSS_ON_ROOT is active also for NFS3 AUTH_UNIX mount attempts which confuse NFS3 clients, and make them think that AUTH_UNIX is enabled and is working. Linux NFS3 client never switches from AUTH_UNIX to AUTH_NONE on active mount, which makes the mount inaccessible. Fix the NFSD_MAY_BYPASS_GSS and NFSD_MAY_BYPASS_GSS_ON_ROOT implementation and really allow to bypass only exports which have enabled some real authentication (GSS, TLS, or any other). The result would be: For AUTH_NULL-only export if client attempts to do mount with AUTH_UNIX flavor then it will receive access errors, which instruct client that AUTH_UNIX flavor is not usable and will either try other auth flavor (AUTH_NULL if enabled) or fails mount procedure. Similarly if client attempt to do mount with AUTH_NULL flavor and only AUTH_UNIX flavor is enabled then the client will receive access error. This should fix problems with AUTH_NULL-only or AUTH_UNIX-only exports if client attempts to mount it with other auth flavor (e.g. with AUTH_NULL for AUTH_UNIX-only export, or with AUTH_UNIX for AUTH_NULL-only export). Signed-off-by: Pali Rohár <pali@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-11-18 20:22:59 -05:00
Pali Rohár	600020927b	nfsd: Fill NFSv4.1 server implementation fields in OP_EXCHANGE_ID response NFSv4.1 OP_EXCHANGE_ID response from server may contain server implementation details (domain, name and build time) in optional nfs_impl_id4 field. Currently nfsd does not fill this field. Send these information in NFSv4.1 OP_EXCHANGE_ID response. Fill them with the same values as what is Linux NFSv4.1 client doing. Domain is hardcoded to "kernel.org", name is composed in the same way as "uname -srvm" output and build time is hardcoded to zeros. NFSv4.1 client and server implementation fields are useful for statistic purposes or for identifying type of clients and servers. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-11-18 20:22:58 -05:00
Jeff Layton	b9376c7e42	nfsd: new tracepoint for after op_func in compound processing Turn nfsd_compound_encode_err tracepoint into a class and add a new nfsd_compound_op_err tracepoint. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-11-18 20:22:58 -05:00
Chuck Lever	8286f8b622	NFSD: Never decrement pending_async_copies on error The error flow in nfsd4_copy() calls cleanup_async_copy(), which already decrements nn->pending_async_copies. Reported-by: Olga Kornievskaia <okorniev@redhat.com> Fixes: `aadc3bbea1` ("NFSD: Limit the number of concurrent async COPY operations") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-10-30 14:12:16 -04:00
Chuck Lever	63fab04cbd	NFSD: Initialize struct nfsd4_copy earlier Ensure the refcount and async_copies fields are initialized early. cleanup_async_copy() will reference these fields if an error occurs in nfsd4_copy(). If they are not correctly initialized, at the very least, a refcount underflow occurs. Reported-by: Olga Kornievskaia <okorniev@redhat.com> Fixes: `aadc3bbea1` ("NFSD: Limit the number of concurrent async COPY operations") Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-10-29 15:31:18 -04:00
Chuck Lever	0505de9615	NFSD: Wrap async copy operations with trace points Add an nfsd_copy_async_done to record the timestamp, the final status code, and the callback stateid of an async copy. Rename the nfsd_copy_do_async tracepoint to match that naming convention to make it easier to enable both of these with a single glob. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-09-20 19:31:03 -04:00
Chuck Lever	aadc3bbea1	NFSD: Limit the number of concurrent async COPY operations Nothing appears to limit the number of concurrent async COPY operations that clients can start. In addition, AFAICT each async COPY can copy an unlimited number of 4MB chunks, so can run for a long time. Thus IMO async COPY can become a DoS vector. Add a restriction mechanism that bounds the number of concurrent background COPY operations. Start simple and try to be fair -- this patch implements a per-namespace limit. An async COPY request that occurs while this limit is exceeded gets NFS4ERR_DELAY. The requesting client can choose to send the request again after a delay or fall back to a traditional read/write style copy. If there is need to make the mechanism more sophisticated, we can visit that in future patches. Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-09-20 19:31:03 -04:00

1 2 3 4 5 ...

501 Commits