Commit 1db4b360 authored Jan 13, 2026 by Yongpeng Yang, committed by Jaegeuk Kim on Jan 27, 2026

f2fs: optimize NAT block loading during checkpoint write



Under stress tests with frequent metadata operations, checkpoint write
time can become excessively long. Analysis shows that the slowdown is
caused by synchronous, one-by-one reads of NAT blocks during checkpoint
processing.

The issue can be reproduced with the following workload:
1. seq 1 650000 | xargs -P 16 -n 1 touch
2. sync # avoid checkpoint write during deleting
3. delete 1 file every 455 files
4. echo 3 > /proc/sys/vm/drop_caches
5. sync # trigger checkpoint write
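The stride of 455 in step 3 matches the number of NAT entries in one 4 KiB NAT block: an on-disk `struct f2fs_nat_entry` is 9 bytes packed (version, ino, block_addr), so a block holds 4096 / 9 = 455 entries. The sketch below estimates how many distinct NAT blocks the deletions dirty. It assumes, hypothetically, that file i received nid `first_nid + i - 1` (sequential allocation in file order, which the parallel `xargs -P 16` in step 1 does not guarantee); `count_dirty_nat_blocks` and `first_nid` are illustrative names, not f2fs code. The measured counts in the table below need not match this idealized estimate.

```c
#include <assert.h>

/* On-disk struct f2fs_nat_entry: u8 version + __le32 ino + __le32 block_addr,
 * packed, i.e. 9 bytes. */
#define NAT_ENTRY_SIZE 9u
#define F2FS_BLKSIZE 4096u
/* 4096 / 9 = 455 NAT entries per 4 KiB NAT block. */
#define NAT_ENTRY_PER_BLOCK (F2FS_BLKSIZE / NAT_ENTRY_SIZE)

/*
 * Hypothetical helper: count distinct NAT blocks dirtied when every
 * `stride`-th of `nfiles` files is deleted, ASSUMING file i was given
 * nid first_nid + i - 1 (sequential allocation).
 */
static unsigned int count_dirty_nat_blocks(unsigned int nfiles,
					   unsigned int stride,
					   unsigned int first_nid)
{
	unsigned int blocks = 0;
	long last_block = -1;

	assert(stride > 0);
	for (unsigned int i = stride; i <= nfiles; i += stride) {
		unsigned int nid = first_nid + i - 1;
		long block = nid / NAT_ENTRY_PER_BLOCK;

		/* nids are visited in increasing order, so a block index
		 * that differs from the previous one is a new block. */
		if (block != last_block) {
			blocks++;
			last_block = block;
		}
	}
	return blocks;
}
```

With a stride equal to NAT_ENTRY_PER_BLOCK, consecutive deleted nids are exactly one block apart, so `count_dirty_nat_blocks(650000, 455, first_nid)` yields 1428 (one block per deletion) for any `first_nid`.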

This patch submits read I/O for all NAT blocks required in the
__flush_nat_entry_set() phase in advance, reducing the overhead of
synchronous waiting for individual NAT block reads.
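The core idea, issuing reads for every block that will be needed before waiting on any of them, can be illustrated in userspace. This is an analogue, not the kernel patch: `posix_fadvise(..., POSIX_FADV_WILLNEED)` stands in for `f2fs_ra_meta_pages()`, and `read_blocks_with_readahead` and `BLK_SIZE` are hypothetical names chosen for the sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define BLK_SIZE 4096

/*
 * Hypothetical userspace analogue: hint all needed blocks first, then read
 * them one by one. Returns the number of blocks read, or -1 on a short read.
 */
static int read_blocks_with_readahead(int fd, const off_t *blknos, size_t n,
				      unsigned char (*bufs)[BLK_SIZE])
{
	assert(fd >= 0);

	/* Phase 1: queue asynchronous readahead for every needed block.
	 * The hint is advisory, so a failure here is not fatal. */
	for (size_t i = 0; i < n; i++)
		(void)posix_fadvise(fd, blknos[i] * BLK_SIZE, BLK_SIZE,
				    POSIX_FADV_WILLNEED);

	/* Phase 2: the real reads, which can now be served from the page
	 * cache instead of each waiting on its own disk I/O. */
	for (size_t i = 0; i < n; i++) {
		ssize_t r = pread(fd, bufs[i], BLK_SIZE, blknos[i] * BLK_SIZE);

		if (r != BLK_SIZE)
			return -1;
	}
	return (int)n;
}
```

Without phase 1, each `pread()` on a cold cache would block on its own I/O, which is the one-by-one pattern described above.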

The NAT block flush latency before and after the change is as follows:

|             |NAT blocks accessed|NAT blocks read|Flush time (ms)|
|-------------|-------------------|---------------|---------------|
|Before change|1205               |1191           |158            |
|After change |1264               |1242           |11             |

With a similar number of NAT blocks accessed and read from disk in both
runs, adding NAT block readahead reduces the total NAT block flush time
from 158 ms to 11 ms, a reduction of about 93%.
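The figures in the table can be checked with a few lines of arithmetic. The helper names are hypothetical, and dividing total flush time by the number of blocks read is a simplification, since the flush phase also does work other than reading.

```c
#include <assert.h>

/* Percentage reduction going from `before` to `after`. */
static double pct_reduction(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before * 100.0;
}

/* Average flush time per NAT block read from disk, in ms
 * (simplification: attributes all flush time to reads). */
static double ms_per_block(double flush_ms, unsigned int blocks_read)
{
	assert(blocks_read > 0);
	return flush_ms / blocks_read;
}
```

For the measured values, `pct_reduction(158.0, 11.0)` is about 93.0, and the average cost per block read drops from about 0.133 ms (158 / 1191) to about 0.0089 ms (11 / 1242).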

Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

parent 7c9ee0ed

fs/f2fs/node.c

+13 −1

@@ -3179,7 +3179,7 @@ int f2fs_flush_nat_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc)
 	struct f2fs_journal *journal = curseg->journal;
 	struct nat_entry_set *setvec[NAT_VEC_SIZE];
 	struct nat_entry_set *set, *tmp;
-	unsigned int found;
+	unsigned int found, entry_count = 0;
 	nid_t set_idx = 0;
 	LIST_HEAD(sets);
 	int err = 0;
@@ -3219,6 +3219,18 @@ int f2fs_flush_nat_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc)
 						MAX_NAT_JENTRIES(sbi, journal));
 	}
 
+	/*
+	 * Readahead the current NAT block to prevent read requests from
+	 * being issued and waited on one by one.
+	 */
+	list_for_each_entry(set, &sets, set_list) {
+		entry_count += set->entry_cnt;
+		if (!enabled_nat_bits(sbi, cpc) &&
+				__has_cursum_space(sbi, journal,
+				entry_count, NAT_JOURNAL))
+			continue;
+		f2fs_ra_meta_pages(sbi, set->set, 1, META_NAT, true);
+	}
 	/* flush dirty nats in nat entry set */
 	list_for_each_entry_safe(set, tmp, &sets, set_list) {
 		err = __flush_nat_entry_set(sbi, set, cpc);
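The selection logic of the new readahead loop can be modeled in userspace. In this sketch, `struct set_model` is a hypothetical stand-in for `struct nat_entry_set`, and `journal_has_space()` with an assumed free capacity `journal_cap` stands in for `__has_cursum_space()`; neither is the real f2fs API. Sets are walked in list order, and once the running entry count no longer fits in the journal (or nat_bits is enabled), each set's NAT block is chosen for readahead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct nat_entry_set. */
struct set_model {
	unsigned int set;       /* NAT block index (set->set in the patch) */
	unsigned int entry_cnt; /* dirty NAT entries in this set */
};

/* Hypothetical journal model: true if `count` entries fit in the
 * assumed free capacity `journal_cap`. */
static bool journal_has_space(unsigned int journal_cap, unsigned int count)
{
	return count <= journal_cap;
}

/*
 * Mirror of the patch's readahead loop: accumulate entry_count, skip sets
 * predicted to go to the journal, and record every other set's NAT block
 * as a readahead target. Returns the number of targets written.
 */
static size_t pick_readahead(const struct set_model *sets, size_t nsets,
			     bool nat_bits_enabled, unsigned int journal_cap,
			     unsigned int *ra_blocks)
{
	unsigned int entry_count = 0;
	size_t nra = 0;

	for (size_t i = 0; i < nsets; i++) {
		entry_count += sets[i].entry_cnt;
		if (!nat_bits_enabled &&
		    journal_has_space(journal_cap, entry_count))
			continue;
		/* The patch calls f2fs_ra_meta_pages(sbi, set->set, 1,
		 * META_NAT, true) here. */
		ra_blocks[nra++] = sets[i].set;
	}
	return nra;
}
```

With sets `{10, 2}, {20, 3}, {30, 5}, {40, 1}` and a journal capacity of 6, the running counts are 2, 5, 10 and 11, so only blocks 30 and 40 are read ahead; note that block 40 is selected even though its own set is small, because the count is cumulative. The sketch only models which blocks get prefetched; the final journal-versus-NAT-block decision is still made per set inside `__flush_nat_entry_set()`.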