From c36748e8733ef9c5f4cd1d7c4327994e5b88b8df Mon Sep 17 00:00:00 2001 From: Wake Liu Date: Thu, 7 Aug 2025 16:09:32 +0800 Subject: [PATCH 0001/1536] selftests/net: Replace non-standard __WORDSIZE with sizeof(long) * 8 The `__WORDSIZE` macro, defined in the non-standard `` header, is a GNU extension and not universally available with all toolchains, such as Clang when used with musl libc. This can lead to build failures in environments where this header is missing. The intention of the code is to determine the bit width of a C `long`. Replace the non-portable `__WORDSIZE` with the standard and portable `sizeof(long) * 8` expression to achieve the same result. This change also removes the inclusion of the now-unused `` header. Signed-off-by: Wake Liu Signed-off-by: Jakub Kicinski --- tools/testing/selftests/net/psock_tpacket.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/tools/testing/selftests/net/psock_tpacket.c b/tools/testing/selftests/net/psock_tpacket.c index 221270cee3ea..0dd909e325d9 100644 --- a/tools/testing/selftests/net/psock_tpacket.c +++ b/tools/testing/selftests/net/psock_tpacket.c @@ -33,7 +33,6 @@ #include #include #include -#include #include #include #include @@ -785,7 +784,7 @@ static int test_kernel_bit_width(void) static int test_user_bit_width(void) { - return __WORDSIZE; + return sizeof(long) * 8; } static const char *tpacket_str[] = { From bc4c0a48bdad7f225740b8e750fdc1da6d85e1eb Mon Sep 17 00:00:00 2001 From: Wake Liu Date: Sat, 9 Aug 2025 14:20:13 +0800 Subject: [PATCH 0002/1536] selftests/net: Ensure assert() triggers in psock_tpacket.c The get_next_frame() function in psock_tpacket.c was missing a return statement in its default switch case, leading to a compiler warning. This was caused by a `bug_on(1)` call, which is defined as an `assert()`, being compiled out because NDEBUG is defined during the build. Instead of adding a `return NULL;` which would silently hide the error and could lead to crashes later, this change restores the original author's intent. By adding `#undef NDEBUG` before including , we ensure the assertion is active and will cause the test to abort if this unreachable code is ever executed. Signed-off-by: Wake Liu Link: https://patch.msgid.link/20250809062013.2407822-1-wakel@google.com Signed-off-by: Jakub Kicinski --- tools/testing/selftests/net/psock_tpacket.c | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/testing/selftests/net/psock_tpacket.c b/tools/testing/selftests/net/psock_tpacket.c index 0dd909e325d9..2938045c5cf9 100644 --- a/tools/testing/selftests/net/psock_tpacket.c +++ b/tools/testing/selftests/net/psock_tpacket.c @@ -22,6 +22,7 @@ * - TPACKET_V3: RX_RING */ +#undef NDEBUG #include #include #include From dd5d5a11bacb2eb0e00e0f6a9bc919afed2ec751 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Tue, 10 Jun 2025 11:42:30 +0200 Subject: [PATCH 0003/1536] docs: netlink: netlink-raw.rst: use :ref: instead of :doc: Currently, rt documents are referred with: Documentation/userspace-api/netlink/netlink-raw.rst: :doc:`rt-link<../../networking/netlink_spec/rt-link>` Documentation/userspace-api/netlink/netlink-raw.rst: :doc:`tc<../../networking/netlink_spec/tc>` Documentation/userspace-api/netlink/netlink-raw.rst: :doc:`tc<../../networking/netlink_spec/tc>` Having :doc: references with relative paths doesn't always work, as it may have troubles when O= is used. Also that's hard to maintain, and may break if we change the way rst files are generated from yaml. Better to use instead a reference for the netlink family. So, replace them by Sphinx cross-reference tag that are created by ynl_gen_rst.py. Signed-off-by: Mauro Carvalho Chehab Reviewed-by: Donald Hunter --- Documentation/userspace-api/netlink/netlink-raw.rst | 6 +++--- tools/net/ynl/pyynl/ynl_gen_rst.py | 5 +++-- 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/Documentation/userspace-api/netlink/netlink-raw.rst b/Documentation/userspace-api/netlink/netlink-raw.rst index 31fc91020eb3..aae296c170c5 100644 --- a/Documentation/userspace-api/netlink/netlink-raw.rst +++ b/Documentation/userspace-api/netlink/netlink-raw.rst @@ -62,8 +62,8 @@ Sub-messages ------------ Several raw netlink families such as -:doc:`rt-link<../../networking/netlink_spec/rt-link>` and -:doc:`tc<../../networking/netlink_spec/tc>` use attribute nesting as an +:ref:`rt-link` and +:ref:`tc` use attribute nesting as an abstraction to carry module specific information. Conceptually it looks as follows:: @@ -162,7 +162,7 @@ then this is an error. Nested struct definitions ------------------------- -Many raw netlink families such as :doc:`tc<../../networking/netlink_spec/tc>` +Many raw netlink families such as :ref:`tc` make use of nested struct definitions. The ``netlink-raw`` schema makes it possible to embed a struct within a struct definition using the ``struct`` property. For example, the following struct definition embeds the diff --git a/tools/net/ynl/pyynl/ynl_gen_rst.py b/tools/net/ynl/pyynl/ynl_gen_rst.py index 0cb6348e28d3..7bfb8ceeeefc 100755 --- a/tools/net/ynl/pyynl/ynl_gen_rst.py +++ b/tools/net/ynl/pyynl/ynl_gen_rst.py @@ -314,10 +314,11 @@ def parse_yaml(obj: Dict[str, Any]) -> str: # Main header - lines.append(rst_header()) - family = obj['name'] + lines.append(rst_header()) + lines.append(rst_label("netlink-" + family)) + title = f"Family ``{family}`` netlink specification" lines.append(rst_title(title)) lines.append(rst_paragraph(".. contents:: :depth: 3\n")) From f25f39e6d26644aeeef72433f67fc6d9e638c800 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Thu, 12 Jun 2025 08:27:00 +0200 Subject: [PATCH 0004/1536] tools: ynl_gen_rst.py: Split library from command line tool As we'll be using the Netlink specs parser inside a Sphinx extension, move the library part from the command line parser. While here, change the code which generates an index file to parse inputs from both .rst and .yaml extensions. With that, the tool can easily be tested with: tools/net/ynl/pyynl/ynl_gen_rst.py -x -o Documentation/netlink/specs/foo.rst Without needing to first generate a temp directory with the rst files. Signed-off-by: Mauro Carvalho Chehab Reviewed-by: Donald Hunter --- tools/net/ynl/pyynl/lib/__init__.py | 2 + tools/net/ynl/pyynl/lib/doc_generator.py | 382 +++++++++++++++++++++++ tools/net/ynl/pyynl/ynl_gen_rst.py | 375 +--------------------- 3 files changed, 400 insertions(+), 359 deletions(-) create mode 100644 tools/net/ynl/pyynl/lib/doc_generator.py diff --git a/tools/net/ynl/pyynl/lib/__init__.py b/tools/net/ynl/pyynl/lib/__init__.py index 71518b9842ee..5f266ebe4526 100644 --- a/tools/net/ynl/pyynl/lib/__init__.py +++ b/tools/net/ynl/pyynl/lib/__init__.py @@ -4,6 +4,8 @@ from .nlspec import SpecAttr, SpecAttrSet, SpecEnumEntry, SpecEnumSet, \ SpecFamily, SpecOperation, SpecSubMessage, SpecSubMessageFormat from .ynl import YnlFamily, Netlink, NlError +from .doc_generator import YnlDocGenerator + __all__ = ["SpecAttr", "SpecAttrSet", "SpecEnumEntry", "SpecEnumSet", "SpecFamily", "SpecOperation", "SpecSubMessage", "SpecSubMessageFormat", "YnlFamily", "Netlink", "NlError"] diff --git a/tools/net/ynl/pyynl/lib/doc_generator.py b/tools/net/ynl/pyynl/lib/doc_generator.py new file mode 100644 index 000000000000..80e468086693 --- /dev/null +++ b/tools/net/ynl/pyynl/lib/doc_generator.py @@ -0,0 +1,382 @@ +#!/usr/bin/env python3 +# SPDX-License-Identifier: GPL-2.0 +# -*- coding: utf-8; mode: python -*- + +""" + Class to auto generate the documentation for Netlink specifications. + + :copyright: Copyright (C) 2023 Breno Leitao + :license: GPL Version 2, June 1991 see linux/COPYING for details. + + This class performs extensive parsing to the Linux kernel's netlink YAML + spec files, in an effort to avoid needing to heavily mark up the original + YAML file. + + This code is split in two classes: + 1) RST formatters: Use to convert a string to a RST output + 2) YAML Netlink (YNL) doc generator: Generate docs from YAML data +""" + +from typing import Any, Dict, List +import os.path +import sys +import argparse +import logging +import yaml + + +# ============== +# RST Formatters +# ============== +class RstFormatters: + SPACE_PER_LEVEL = 4 + + @staticmethod + def headroom(level: int) -> str: + """Return space to format""" + return " " * (level * RstFormatters.SPACE_PER_LEVEL) + + + @staticmethod + def bold(text: str) -> str: + """Format bold text""" + return f"**{text}**" + + + @staticmethod + def inline(text: str) -> str: + """Format inline text""" + return f"``{text}``" + + + @staticmethod + def sanitize(text: str) -> str: + """Remove newlines and multiple spaces""" + # This is useful for some fields that are spread across multiple lines + return str(text).replace("\n", " ").strip() + + + def rst_fields(self, key: str, value: str, level: int = 0) -> str: + """Return a RST formatted field""" + return self.headroom(level) + f":{key}: {value}" + + + def rst_definition(self, key: str, value: Any, level: int = 0) -> str: + """Format a single rst definition""" + return self.headroom(level) + key + "\n" + self.headroom(level + 1) + str(value) + + + def rst_paragraph(self, paragraph: str, level: int = 0) -> str: + """Return a formatted paragraph""" + return self.headroom(level) + paragraph + + + def rst_bullet(self, item: str, level: int = 0) -> str: + """Return a formatted a bullet""" + return self.headroom(level) + f"- {item}" + + + @staticmethod + def rst_subsection(title: str) -> str: + """Add a sub-section to the document""" + return f"{title}\n" + "-" * len(title) + + + @staticmethod + def rst_subsubsection(title: str) -> str: + """Add a sub-sub-section to the document""" + return f"{title}\n" + "~" * len(title) + + + @staticmethod + def rst_section(namespace: str, prefix: str, title: str) -> str: + """Add a section to the document""" + return f".. _{namespace}-{prefix}-{title}:\n\n{title}\n" + "=" * len(title) + + + @staticmethod + def rst_subtitle(title: str) -> str: + """Add a subtitle to the document""" + return "\n" + "-" * len(title) + f"\n{title}\n" + "-" * len(title) + "\n\n" + + + @staticmethod + def rst_title(title: str) -> str: + """Add a title to the document""" + return "=" * len(title) + f"\n{title}\n" + "=" * len(title) + "\n\n" + + + def rst_list_inline(self, list_: List[str], level: int = 0) -> str: + """Format a list using inlines""" + return self.headroom(level) + "[" + ", ".join(self.inline(i) for i in list_) + "]" + + + @staticmethod + def rst_ref(namespace: str, prefix: str, name: str) -> str: + """Add a hyperlink to the document""" + mappings = {'enum': 'definition', + 'fixed-header': 'definition', + 'nested-attributes': 'attribute-set', + 'struct': 'definition'} + if prefix in mappings: + prefix = mappings[prefix] + return f":ref:`{namespace}-{prefix}-{name}`" + + + def rst_header(self) -> str: + """The headers for all the auto generated RST files""" + lines = [] + + lines.append(self.rst_paragraph(".. SPDX-License-Identifier: GPL-2.0")) + lines.append(self.rst_paragraph(".. NOTE: This document was auto-generated.\n\n")) + + return "\n".join(lines) + + + @staticmethod + def rst_toctree(maxdepth: int = 2) -> str: + """Generate a toctree RST primitive""" + lines = [] + + lines.append(".. toctree::") + lines.append(f" :maxdepth: {maxdepth}\n\n") + + return "\n".join(lines) + + + @staticmethod + def rst_label(title: str) -> str: + """Return a formatted label""" + return f".. _{title}:\n\n" + +# ======= +# Parsers +# ======= +class YnlDocGenerator: + + fmt = RstFormatters() + + def parse_mcast_group(self, mcast_group: List[Dict[str, Any]]) -> str: + """Parse 'multicast' group list and return a formatted string""" + lines = [] + for group in mcast_group: + lines.append(self.fmt.rst_bullet(group["name"])) + + return "\n".join(lines) + + + def parse_do(self, do_dict: Dict[str, Any], level: int = 0) -> str: + """Parse 'do' section and return a formatted string""" + lines = [] + for key in do_dict.keys(): + lines.append(self.fmt.rst_paragraph(self.fmt.bold(key), level + 1)) + if key in ['request', 'reply']: + lines.append(self.parse_do_attributes(do_dict[key], level + 1) + "\n") + else: + lines.append(self.fmt.headroom(level + 2) + do_dict[key] + "\n") + + return "\n".join(lines) + + + def parse_do_attributes(self, attrs: Dict[str, Any], level: int = 0) -> str: + """Parse 'attributes' section""" + if "attributes" not in attrs: + return "" + lines = [self.fmt.rst_fields("attributes", self.fmt.rst_list_inline(attrs["attributes"]), level + 1)] + + return "\n".join(lines) + + + def parse_operations(self, operations: List[Dict[str, Any]], namespace: str) -> str: + """Parse operations block""" + preprocessed = ["name", "doc", "title", "do", "dump", "flags"] + linkable = ["fixed-header", "attribute-set"] + lines = [] + + for operation in operations: + lines.append(self.fmt.rst_section(namespace, 'operation', operation["name"])) + lines.append(self.fmt.rst_paragraph(operation["doc"]) + "\n") + + for key in operation.keys(): + if key in preprocessed: + # Skip the special fields + continue + value = operation[key] + if key in linkable: + value = self.fmt.rst_ref(namespace, key, value) + lines.append(self.fmt.rst_fields(key, value, 0)) + if 'flags' in operation: + lines.append(self.fmt.rst_fields('flags', self.fmt.rst_list_inline(operation['flags']))) + + if "do" in operation: + lines.append(self.fmt.rst_paragraph(":do:", 0)) + lines.append(self.parse_do(operation["do"], 0)) + if "dump" in operation: + lines.append(self.fmt.rst_paragraph(":dump:", 0)) + lines.append(self.parse_do(operation["dump"], 0)) + + # New line after fields + lines.append("\n") + + return "\n".join(lines) + + + def parse_entries(self, entries: List[Dict[str, Any]], level: int) -> str: + """Parse a list of entries""" + ignored = ["pad"] + lines = [] + for entry in entries: + if isinstance(entry, dict): + # entries could be a list or a dictionary + field_name = entry.get("name", "") + if field_name in ignored: + continue + type_ = entry.get("type") + if type_: + field_name += f" ({self.fmt.inline(type_)})" + lines.append( + self.fmt.rst_fields(field_name, self.fmt.sanitize(entry.get("doc", "")), level) + ) + elif isinstance(entry, list): + lines.append(self.fmt.rst_list_inline(entry, level)) + else: + lines.append(self.fmt.rst_bullet(self.fmt.inline(self.fmt.sanitize(entry)), level)) + + lines.append("\n") + return "\n".join(lines) + + + def parse_definitions(self, defs: Dict[str, Any], namespace: str) -> str: + """Parse definitions section""" + preprocessed = ["name", "entries", "members"] + ignored = ["render-max"] # This is not printed + lines = [] + + for definition in defs: + lines.append(self.fmt.rst_section(namespace, 'definition', definition["name"])) + for k in definition.keys(): + if k in preprocessed + ignored: + continue + lines.append(self.fmt.rst_fields(k, self.fmt.sanitize(definition[k]), 0)) + + # Field list needs to finish with a new line + lines.append("\n") + if "entries" in definition: + lines.append(self.fmt.rst_paragraph(":entries:", 0)) + lines.append(self.parse_entries(definition["entries"], 1)) + if "members" in definition: + lines.append(self.fmt.rst_paragraph(":members:", 0)) + lines.append(self.parse_entries(definition["members"], 1)) + + return "\n".join(lines) + + + def parse_attr_sets(self, entries: List[Dict[str, Any]], namespace: str) -> str: + """Parse attribute from attribute-set""" + preprocessed = ["name", "type"] + linkable = ["enum", "nested-attributes", "struct", "sub-message"] + ignored = ["checks"] + lines = [] + + for entry in entries: + lines.append(self.fmt.rst_section(namespace, 'attribute-set', entry["name"])) + for attr in entry["attributes"]: + type_ = attr.get("type") + attr_line = attr["name"] + if type_: + # Add the attribute type in the same line + attr_line += f" ({self.fmt.inline(type_)})" + + lines.append(self.fmt.rst_subsubsection(attr_line)) + + for k in attr.keys(): + if k in preprocessed + ignored: + continue + if k in linkable: + value = self.fmt.rst_ref(namespace, k, attr[k]) + else: + value = self.fmt.sanitize(attr[k]) + lines.append(self.fmt.rst_fields(k, value, 0)) + lines.append("\n") + + return "\n".join(lines) + + + def parse_sub_messages(self, entries: List[Dict[str, Any]], namespace: str) -> str: + """Parse sub-message definitions""" + lines = [] + + for entry in entries: + lines.append(self.fmt.rst_section(namespace, 'sub-message', entry["name"])) + for fmt in entry["formats"]: + value = fmt["value"] + + lines.append(self.fmt.rst_bullet(self.fmt.bold(value))) + for attr in ['fixed-header', 'attribute-set']: + if attr in fmt: + lines.append(self.fmt.rst_fields(attr, + self.fmt.rst_ref(namespace, attr, fmt[attr]), + 1)) + lines.append("\n") + + return "\n".join(lines) + + + def parse_yaml(self, obj: Dict[str, Any]) -> str: + """Format the whole YAML into a RST string""" + lines = [] + + # Main header + + family = obj['name'] + + lines.append(self.fmt.rst_header()) + lines.append(self.fmt.rst_label("netlink-" + family)) + + title = f"Family ``{family}`` netlink specification" + lines.append(self.fmt.rst_title(title)) + lines.append(self.fmt.rst_paragraph(".. contents:: :depth: 3\n")) + + if "doc" in obj: + lines.append(self.fmt.rst_subtitle("Summary")) + lines.append(self.fmt.rst_paragraph(obj["doc"], 0)) + + # Operations + if "operations" in obj: + lines.append(self.fmt.rst_subtitle("Operations")) + lines.append(self.parse_operations(obj["operations"]["list"], family)) + + # Multicast groups + if "mcast-groups" in obj: + lines.append(self.fmt.rst_subtitle("Multicast groups")) + lines.append(self.parse_mcast_group(obj["mcast-groups"]["list"])) + + # Definitions + if "definitions" in obj: + lines.append(self.fmt.rst_subtitle("Definitions")) + lines.append(self.parse_definitions(obj["definitions"], family)) + + # Attributes set + if "attribute-sets" in obj: + lines.append(self.fmt.rst_subtitle("Attribute sets")) + lines.append(self.parse_attr_sets(obj["attribute-sets"], family)) + + # Sub-messages + if "sub-messages" in obj: + lines.append(self.fmt.rst_subtitle("Sub-messages")) + lines.append(self.parse_sub_messages(obj["sub-messages"], family)) + + return "\n".join(lines) + + + # Main functions + # ============== + + + def parse_yaml_file(self, filename: str) -> str: + """Transform the YAML specified by filename into an RST-formatted string""" + with open(filename, "r", encoding="utf-8") as spec_file: + yaml_data = yaml.safe_load(spec_file) + content = self.parse_yaml(yaml_data) + + return content diff --git a/tools/net/ynl/pyynl/ynl_gen_rst.py b/tools/net/ynl/pyynl/ynl_gen_rst.py index 7bfb8ceeeefc..010315fad498 100755 --- a/tools/net/ynl/pyynl/ynl_gen_rst.py +++ b/tools/net/ynl/pyynl/ynl_gen_rst.py @@ -10,354 +10,17 @@ This script performs extensive parsing to the Linux kernel's netlink YAML spec files, in an effort to avoid needing to heavily mark up the original - YAML file. - - This code is split in three big parts: - 1) RST formatters: Use to convert a string to a RST output - 2) Parser helpers: Functions to parse the YAML data structure - 3) Main function and small helpers + YAML file. It uses the library code from scripts/lib. """ -from typing import Any, Dict, List import os.path +import pathlib import sys import argparse import logging -import yaml - - -SPACE_PER_LEVEL = 4 - - -# RST Formatters -# ============== -def headroom(level: int) -> str: - """Return space to format""" - return " " * (level * SPACE_PER_LEVEL) - - -def bold(text: str) -> str: - """Format bold text""" - return f"**{text}**" - - -def inline(text: str) -> str: - """Format inline text""" - return f"``{text}``" - - -def sanitize(text: str) -> str: - """Remove newlines and multiple spaces""" - # This is useful for some fields that are spread across multiple lines - return str(text).replace("\n", " ").strip() - - -def rst_fields(key: str, value: str, level: int = 0) -> str: - """Return a RST formatted field""" - return headroom(level) + f":{key}: {value}" - - -def rst_definition(key: str, value: Any, level: int = 0) -> str: - """Format a single rst definition""" - return headroom(level) + key + "\n" + headroom(level + 1) + str(value) - - -def rst_paragraph(paragraph: str, level: int = 0) -> str: - """Return a formatted paragraph""" - return headroom(level) + paragraph - - -def rst_bullet(item: str, level: int = 0) -> str: - """Return a formatted a bullet""" - return headroom(level) + f"- {item}" - - -def rst_subsection(title: str) -> str: - """Add a sub-section to the document""" - return f"{title}\n" + "-" * len(title) - - -def rst_subsubsection(title: str) -> str: - """Add a sub-sub-section to the document""" - return f"{title}\n" + "~" * len(title) - - -def rst_section(namespace: str, prefix: str, title: str) -> str: - """Add a section to the document""" - return f".. _{namespace}-{prefix}-{title}:\n\n{title}\n" + "=" * len(title) - - -def rst_subtitle(title: str) -> str: - """Add a subtitle to the document""" - return "\n" + "-" * len(title) + f"\n{title}\n" + "-" * len(title) + "\n\n" - - -def rst_title(title: str) -> str: - """Add a title to the document""" - return "=" * len(title) + f"\n{title}\n" + "=" * len(title) + "\n\n" - - -def rst_list_inline(list_: List[str], level: int = 0) -> str: - """Format a list using inlines""" - return headroom(level) + "[" + ", ".join(inline(i) for i in list_) + "]" - - -def rst_ref(namespace: str, prefix: str, name: str) -> str: - """Add a hyperlink to the document""" - mappings = {'enum': 'definition', - 'fixed-header': 'definition', - 'nested-attributes': 'attribute-set', - 'struct': 'definition'} - if prefix in mappings: - prefix = mappings[prefix] - return f":ref:`{namespace}-{prefix}-{name}`" - - -def rst_header() -> str: - """The headers for all the auto generated RST files""" - lines = [] - - lines.append(rst_paragraph(".. SPDX-License-Identifier: GPL-2.0")) - lines.append(rst_paragraph(".. NOTE: This document was auto-generated.\n\n")) - - return "\n".join(lines) - - -def rst_toctree(maxdepth: int = 2) -> str: - """Generate a toctree RST primitive""" - lines = [] - - lines.append(".. toctree::") - lines.append(f" :maxdepth: {maxdepth}\n\n") - - return "\n".join(lines) - - -def rst_label(title: str) -> str: - """Return a formatted label""" - return f".. _{title}:\n\n" - - -# Parsers -# ======= - - -def parse_mcast_group(mcast_group: List[Dict[str, Any]]) -> str: - """Parse 'multicast' group list and return a formatted string""" - lines = [] - for group in mcast_group: - lines.append(rst_bullet(group["name"])) - - return "\n".join(lines) - - -def parse_do(do_dict: Dict[str, Any], level: int = 0) -> str: - """Parse 'do' section and return a formatted string""" - lines = [] - for key in do_dict.keys(): - lines.append(rst_paragraph(bold(key), level + 1)) - if key in ['request', 'reply']: - lines.append(parse_do_attributes(do_dict[key], level + 1) + "\n") - else: - lines.append(headroom(level + 2) + do_dict[key] + "\n") - - return "\n".join(lines) - - -def parse_do_attributes(attrs: Dict[str, Any], level: int = 0) -> str: - """Parse 'attributes' section""" - if "attributes" not in attrs: - return "" - lines = [rst_fields("attributes", rst_list_inline(attrs["attributes"]), level + 1)] - - return "\n".join(lines) - - -def parse_operations(operations: List[Dict[str, Any]], namespace: str) -> str: - """Parse operations block""" - preprocessed = ["name", "doc", "title", "do", "dump", "flags"] - linkable = ["fixed-header", "attribute-set"] - lines = [] - - for operation in operations: - lines.append(rst_section(namespace, 'operation', operation["name"])) - lines.append(rst_paragraph(operation["doc"]) + "\n") - - for key in operation.keys(): - if key in preprocessed: - # Skip the special fields - continue - value = operation[key] - if key in linkable: - value = rst_ref(namespace, key, value) - lines.append(rst_fields(key, value, 0)) - if 'flags' in operation: - lines.append(rst_fields('flags', rst_list_inline(operation['flags']))) - - if "do" in operation: - lines.append(rst_paragraph(":do:", 0)) - lines.append(parse_do(operation["do"], 0)) - if "dump" in operation: - lines.append(rst_paragraph(":dump:", 0)) - lines.append(parse_do(operation["dump"], 0)) - - # New line after fields - lines.append("\n") - - return "\n".join(lines) - - -def parse_entries(entries: List[Dict[str, Any]], level: int) -> str: - """Parse a list of entries""" - ignored = ["pad"] - lines = [] - for entry in entries: - if isinstance(entry, dict): - # entries could be a list or a dictionary - field_name = entry.get("name", "") - if field_name in ignored: - continue - type_ = entry.get("type") - if type_: - field_name += f" ({inline(type_)})" - lines.append( - rst_fields(field_name, sanitize(entry.get("doc", "")), level) - ) - elif isinstance(entry, list): - lines.append(rst_list_inline(entry, level)) - else: - lines.append(rst_bullet(inline(sanitize(entry)), level)) - - lines.append("\n") - return "\n".join(lines) - - -def parse_definitions(defs: Dict[str, Any], namespace: str) -> str: - """Parse definitions section""" - preprocessed = ["name", "entries", "members"] - ignored = ["render-max"] # This is not printed - lines = [] - - for definition in defs: - lines.append(rst_section(namespace, 'definition', definition["name"])) - for k in definition.keys(): - if k in preprocessed + ignored: - continue - lines.append(rst_fields(k, sanitize(definition[k]), 0)) - - # Field list needs to finish with a new line - lines.append("\n") - if "entries" in definition: - lines.append(rst_paragraph(":entries:", 0)) - lines.append(parse_entries(definition["entries"], 1)) - if "members" in definition: - lines.append(rst_paragraph(":members:", 0)) - lines.append(parse_entries(definition["members"], 1)) - - return "\n".join(lines) - - -def parse_attr_sets(entries: List[Dict[str, Any]], namespace: str) -> str: - """Parse attribute from attribute-set""" - preprocessed = ["name", "type"] - linkable = ["enum", "nested-attributes", "struct", "sub-message"] - ignored = ["checks"] - lines = [] - - for entry in entries: - lines.append(rst_section(namespace, 'attribute-set', entry["name"])) - for attr in entry["attributes"]: - type_ = attr.get("type") - attr_line = attr["name"] - if type_: - # Add the attribute type in the same line - attr_line += f" ({inline(type_)})" - - lines.append(rst_subsubsection(attr_line)) - - for k in attr.keys(): - if k in preprocessed + ignored: - continue - if k in linkable: - value = rst_ref(namespace, k, attr[k]) - else: - value = sanitize(attr[k]) - lines.append(rst_fields(k, value, 0)) - lines.append("\n") - - return "\n".join(lines) - - -def parse_sub_messages(entries: List[Dict[str, Any]], namespace: str) -> str: - """Parse sub-message definitions""" - lines = [] - - for entry in entries: - lines.append(rst_section(namespace, 'sub-message', entry["name"])) - for fmt in entry["formats"]: - value = fmt["value"] - - lines.append(rst_bullet(bold(value))) - for attr in ['fixed-header', 'attribute-set']: - if attr in fmt: - lines.append(rst_fields(attr, - rst_ref(namespace, attr, fmt[attr]), - 1)) - lines.append("\n") - - return "\n".join(lines) - - -def parse_yaml(obj: Dict[str, Any]) -> str: - """Format the whole YAML into a RST string""" - lines = [] - - # Main header - - family = obj['name'] - - lines.append(rst_header()) - lines.append(rst_label("netlink-" + family)) - - title = f"Family ``{family}`` netlink specification" - lines.append(rst_title(title)) - lines.append(rst_paragraph(".. contents:: :depth: 3\n")) - - if "doc" in obj: - lines.append(rst_subtitle("Summary")) - lines.append(rst_paragraph(obj["doc"], 0)) - - # Operations - if "operations" in obj: - lines.append(rst_subtitle("Operations")) - lines.append(parse_operations(obj["operations"]["list"], family)) - - # Multicast groups - if "mcast-groups" in obj: - lines.append(rst_subtitle("Multicast groups")) - lines.append(parse_mcast_group(obj["mcast-groups"]["list"])) - - # Definitions - if "definitions" in obj: - lines.append(rst_subtitle("Definitions")) - lines.append(parse_definitions(obj["definitions"], family)) - - # Attributes set - if "attribute-sets" in obj: - lines.append(rst_subtitle("Attribute sets")) - lines.append(parse_attr_sets(obj["attribute-sets"], family)) - - # Sub-messages - if "sub-messages" in obj: - lines.append(rst_subtitle("Sub-messages")) - lines.append(parse_sub_messages(obj["sub-messages"], family)) - - return "\n".join(lines) - - -# Main functions -# ============== +sys.path.append(pathlib.Path(__file__).resolve().parent.as_posix()) +from lib import YnlDocGenerator # pylint: disable=C0413 def parse_arguments() -> argparse.Namespace: """Parse arguments from user""" @@ -392,15 +55,6 @@ def parse_arguments() -> argparse.Namespace: return args -def parse_yaml_file(filename: str) -> str: - """Transform the YAML specified by filename into an RST-formatted string""" - with open(filename, "r", encoding="utf-8") as spec_file: - yaml_data = yaml.safe_load(spec_file) - content = parse_yaml(yaml_data) - - return content - - def write_to_rstfile(content: str, filename: str) -> None: """Write the generated content into an RST file""" logging.debug("Saving RST file to %s", filename) @@ -409,21 +63,22 @@ def write_to_rstfile(content: str, filename: str) -> None: rst_file.write(content) -def generate_main_index_rst(output: str) -> None: +def generate_main_index_rst(parser: YnlDocGenerator, output: str) -> None: """Generate the `networking_spec/index` content and write to the file""" lines = [] - lines.append(rst_header()) - lines.append(rst_label("specs")) - lines.append(rst_title("Netlink Family Specifications")) - lines.append(rst_toctree(1)) + lines.append(parser.fmt.rst_header()) + lines.append(parser.fmt.rst_label("specs")) + lines.append(parser.fmt.rst_title("Netlink Family Specifications")) + lines.append(parser.fmt.rst_toctree(1)) index_dir = os.path.dirname(output) logging.debug("Looking for .rst files in %s", index_dir) for filename in sorted(os.listdir(index_dir)): - if not filename.endswith(".rst") or filename == "index.rst": + base, ext = os.path.splitext(filename) + if filename == "index.rst" or ext not in [".rst", ".yaml"]: continue - lines.append(f" {filename.replace('.rst', '')}\n") + lines.append(f" {base}\n") logging.debug("Writing an index file at %s", output) write_to_rstfile("".join(lines), output) @@ -434,10 +89,12 @@ def main() -> None: args = parse_arguments() + parser = YnlDocGenerator() + if args.input: logging.debug("Parsing %s", args.input) try: - content = parse_yaml_file(os.path.join(args.input)) + content = parser.parse_yaml_file(os.path.join(args.input)) except Exception as exception: logging.warning("Failed to parse %s.", args.input) logging.warning(exception) @@ -447,7 +104,7 @@ def main() -> None: if args.index: # Generate the index RST file - generate_main_index_rst(args.output) + generate_main_index_rst(parser, args.output) if __name__ == "__main__": From beaea9c4ba2d8ef1b10223dc3a75a7d7be3e5cd9 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Thu, 12 Jun 2025 10:34:30 +0200 Subject: [PATCH 0005/1536] docs: netlink: index.rst: add a netlink index file Instead of generating the index file, use glob to automatically include all data from yaml. Signed-off-by: Mauro Carvalho Chehab Reviewed-by: Donald Hunter --- Documentation/netlink/specs/index.rst | 13 +++++++++++++ 1 file changed, 13 insertions(+) create mode 100644 Documentation/netlink/specs/index.rst diff --git a/Documentation/netlink/specs/index.rst b/Documentation/netlink/specs/index.rst new file mode 100644 index 000000000000..7f7cf4a096f2 --- /dev/null +++ b/Documentation/netlink/specs/index.rst @@ -0,0 +1,13 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. _specs: + +============================= +Netlink Family Specifications +============================= + +.. toctree:: + :maxdepth: 1 + :glob: + + * From 3a3b8a144754858b59f108c7dc80eb613bf6fe64 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Fri, 13 Jun 2025 16:09:05 +0200 Subject: [PATCH 0006/1536] tools: ynl_gen_rst.py: cleanup coding style Cleanup some coding style issues pointed by pylint and flake8. No functional changes. Signed-off-by: Mauro Carvalho Chehab Reviewed-by: Breno Leitao Reviewed-by: Donald Hunter --- tools/net/ynl/pyynl/lib/doc_generator.py | 72 ++++++++---------------- 1 file changed, 25 insertions(+), 47 deletions(-) diff --git a/tools/net/ynl/pyynl/lib/doc_generator.py b/tools/net/ynl/pyynl/lib/doc_generator.py index 80e468086693..474b2b78c7bc 100644 --- a/tools/net/ynl/pyynl/lib/doc_generator.py +++ b/tools/net/ynl/pyynl/lib/doc_generator.py @@ -18,17 +18,12 @@ """ from typing import Any, Dict, List -import os.path -import sys -import argparse -import logging import yaml -# ============== -# RST Formatters -# ============== class RstFormatters: + """RST Formatters""" + SPACE_PER_LEVEL = 4 @staticmethod @@ -36,81 +31,67 @@ class RstFormatters: """Return space to format""" return " " * (level * RstFormatters.SPACE_PER_LEVEL) - @staticmethod def bold(text: str) -> str: """Format bold text""" return f"**{text}**" - @staticmethod def inline(text: str) -> str: """Format inline text""" return f"``{text}``" - @staticmethod def sanitize(text: str) -> str: """Remove newlines and multiple spaces""" # This is useful for some fields that are spread across multiple lines return str(text).replace("\n", " ").strip() - def rst_fields(self, key: str, value: str, level: int = 0) -> str: """Return a RST formatted field""" return self.headroom(level) + f":{key}: {value}" - def rst_definition(self, key: str, value: Any, level: int = 0) -> str: """Format a single rst definition""" return self.headroom(level) + key + "\n" + self.headroom(level + 1) + str(value) - def rst_paragraph(self, paragraph: str, level: int = 0) -> str: """Return a formatted paragraph""" return self.headroom(level) + paragraph - def rst_bullet(self, item: str, level: int = 0) -> str: """Return a formatted a bullet""" return self.headroom(level) + f"- {item}" - @staticmethod def rst_subsection(title: str) -> str: """Add a sub-section to the document""" return f"{title}\n" + "-" * len(title) - @staticmethod def rst_subsubsection(title: str) -> str: """Add a sub-sub-section to the document""" return f"{title}\n" + "~" * len(title) - @staticmethod def rst_section(namespace: str, prefix: str, title: str) -> str: """Add a section to the document""" return f".. _{namespace}-{prefix}-{title}:\n\n{title}\n" + "=" * len(title) - @staticmethod def rst_subtitle(title: str) -> str: """Add a subtitle to the document""" return "\n" + "-" * len(title) + f"\n{title}\n" + "-" * len(title) + "\n\n" - @staticmethod def rst_title(title: str) -> str: """Add a title to the document""" return "=" * len(title) + f"\n{title}\n" + "=" * len(title) + "\n\n" - def rst_list_inline(self, list_: List[str], level: int = 0) -> str: """Format a list using inlines""" return self.headroom(level) + "[" + ", ".join(self.inline(i) for i in list_) + "]" - @staticmethod def rst_ref(namespace: str, prefix: str, name: str) -> str: """Add a hyperlink to the document""" @@ -122,7 +103,6 @@ class RstFormatters: prefix = mappings[prefix] return f":ref:`{namespace}-{prefix}-{name}`" - def rst_header(self) -> str: """The headers for all the auto generated RST files""" lines = [] @@ -132,7 +112,6 @@ class RstFormatters: return "\n".join(lines) - @staticmethod def rst_toctree(maxdepth: int = 2) -> str: """Generate a toctree RST primitive""" @@ -143,16 +122,13 @@ class RstFormatters: return "\n".join(lines) - @staticmethod def rst_label(title: str) -> str: """Return a formatted label""" return f".. _{title}:\n\n" -# ======= -# Parsers -# ======= class YnlDocGenerator: + """YAML Netlink specs Parser""" fmt = RstFormatters() @@ -164,7 +140,6 @@ class YnlDocGenerator: return "\n".join(lines) - def parse_do(self, do_dict: Dict[str, Any], level: int = 0) -> str: """Parse 'do' section and return a formatted string""" lines = [] @@ -177,16 +152,16 @@ class YnlDocGenerator: return "\n".join(lines) - def parse_do_attributes(self, attrs: Dict[str, Any], level: int = 0) -> str: """Parse 'attributes' section""" if "attributes" not in attrs: return "" - lines = [self.fmt.rst_fields("attributes", self.fmt.rst_list_inline(attrs["attributes"]), level + 1)] + lines = [self.fmt.rst_fields("attributes", + self.fmt.rst_list_inline(attrs["attributes"]), + level + 1)] return "\n".join(lines) - def parse_operations(self, operations: List[Dict[str, Any]], namespace: str) -> str: """Parse operations block""" preprocessed = ["name", "doc", "title", "do", "dump", "flags"] @@ -194,7 +169,8 @@ class YnlDocGenerator: lines = [] for operation in operations: - lines.append(self.fmt.rst_section(namespace, 'operation', operation["name"])) + lines.append(self.fmt.rst_section(namespace, 'operation', + operation["name"])) lines.append(self.fmt.rst_paragraph(operation["doc"]) + "\n") for key in operation.keys(): @@ -206,7 +182,8 @@ class YnlDocGenerator: value = self.fmt.rst_ref(namespace, key, value) lines.append(self.fmt.rst_fields(key, value, 0)) if 'flags' in operation: - lines.append(self.fmt.rst_fields('flags', self.fmt.rst_list_inline(operation['flags']))) + lines.append(self.fmt.rst_fields('flags', + self.fmt.rst_list_inline(operation['flags']))) if "do" in operation: lines.append(self.fmt.rst_paragraph(":do:", 0)) @@ -220,7 +197,6 @@ class YnlDocGenerator: return "\n".join(lines) - def parse_entries(self, entries: List[Dict[str, Any]], level: int) -> str: """Parse a list of entries""" ignored = ["pad"] @@ -235,17 +211,19 @@ class YnlDocGenerator: if type_: field_name += f" ({self.fmt.inline(type_)})" lines.append( - self.fmt.rst_fields(field_name, self.fmt.sanitize(entry.get("doc", "")), level) + self.fmt.rst_fields(field_name, + self.fmt.sanitize(entry.get("doc", "")), + level) ) elif isinstance(entry, list): lines.append(self.fmt.rst_list_inline(entry, level)) else: - lines.append(self.fmt.rst_bullet(self.fmt.inline(self.fmt.sanitize(entry)), level)) + lines.append(self.fmt.rst_bullet(self.fmt.inline(self.fmt.sanitize(entry)), + level)) lines.append("\n") return "\n".join(lines) - def parse_definitions(self, defs: Dict[str, Any], namespace: str) -> str: """Parse definitions section""" preprocessed = ["name", "entries", "members"] @@ -270,7 +248,6 @@ class YnlDocGenerator: return "\n".join(lines) - def parse_attr_sets(self, entries: List[Dict[str, Any]], namespace: str) -> str: """Parse attribute from attribute-set""" preprocessed = ["name", "type"] @@ -279,7 +256,8 @@ class YnlDocGenerator: lines = [] for entry in entries: - lines.append(self.fmt.rst_section(namespace, 'attribute-set', entry["name"])) + lines.append(self.fmt.rst_section(namespace, 'attribute-set', + entry["name"])) for attr in entry["attributes"]: type_ = attr.get("type") attr_line = attr["name"] @@ -301,13 +279,13 @@ class YnlDocGenerator: return "\n".join(lines) - def parse_sub_messages(self, entries: List[Dict[str, Any]], namespace: str) -> str: """Parse sub-message definitions""" lines = [] for entry in entries: - lines.append(self.fmt.rst_section(namespace, 'sub-message', entry["name"])) + lines.append(self.fmt.rst_section(namespace, 'sub-message', + entry["name"])) for fmt in entry["formats"]: value = fmt["value"] @@ -315,13 +293,14 @@ class YnlDocGenerator: for attr in ['fixed-header', 'attribute-set']: if attr in fmt: lines.append(self.fmt.rst_fields(attr, - self.fmt.rst_ref(namespace, attr, fmt[attr]), - 1)) + self.fmt.rst_ref(namespace, + attr, + fmt[attr]), + 1)) lines.append("\n") return "\n".join(lines) - def parse_yaml(self, obj: Dict[str, Any]) -> str: """Format the whole YAML into a RST string""" lines = [] @@ -344,7 +323,8 @@ class YnlDocGenerator: # Operations if "operations" in obj: lines.append(self.fmt.rst_subtitle("Operations")) - lines.append(self.parse_operations(obj["operations"]["list"], family)) + lines.append(self.parse_operations(obj["operations"]["list"], + family)) # Multicast groups if "mcast-groups" in obj: @@ -368,11 +348,9 @@ class YnlDocGenerator: return "\n".join(lines) - # Main functions # ============== - def parse_yaml_file(self, filename: str) -> str: """Transform the YAML specified by filename into an RST-formatted string""" with open(filename, "r", encoding="utf-8") as spec_file: From bb1e3629b2e684e03adb21d75c7839f3afeb488e Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Thu, 12 Jun 2025 09:58:56 +0200 Subject: [PATCH 0007/1536] docs: sphinx: add a parser for yaml files for Netlink specs Add a simple sphinx.Parser to handle yaml files and add the the code to handle Netlink specs. All other yaml files are ignored. The code was written in a way that parsing yaml for different subsystems and even for different parts of Netlink are easy. All it takes to have a different parser is to add an import line similar to: from doc_generator import YnlDocGenerator adding the corresponding parser somewhere at the extension: netlink_parser = YnlDocGenerator() And then add a logic inside parse() to handle different doc outputs, depending on the file location, similar to: if "/netlink/specs/" in fname: msg = self.netlink_parser.parse_yaml_file(fname) Signed-off-by: Mauro Carvalho Chehab Reviewed-by: Donald Hunter --- Documentation/sphinx/parser_yaml.py | 104 ++++++++++++++++++++++++++++ 1 file changed, 104 insertions(+) create mode 100755 Documentation/sphinx/parser_yaml.py diff --git a/Documentation/sphinx/parser_yaml.py b/Documentation/sphinx/parser_yaml.py new file mode 100755 index 000000000000..fa2e6da17617 --- /dev/null +++ b/Documentation/sphinx/parser_yaml.py @@ -0,0 +1,104 @@ +# SPDX-License-Identifier: GPL-2.0 +# Copyright 2025 Mauro Carvalho Chehab + +""" +Sphinx extension for processing YAML files +""" + +import os +import re +import sys + +from pprint import pformat + +from docutils.parsers.rst import Parser as RSTParser +from docutils.statemachine import ViewList + +from sphinx.util import logging +from sphinx.parsers import Parser + +srctree = os.path.abspath(os.environ["srctree"]) +sys.path.insert(0, os.path.join(srctree, "tools/net/ynl/pyynl/lib")) + +from doc_generator import YnlDocGenerator # pylint: disable=C0413 + +logger = logging.getLogger(__name__) + +class YamlParser(Parser): + """ + Kernel parser for YAML files. + + This is a simple sphinx.Parser to handle yaml files inside the + Kernel tree that will be part of the built documentation. + + The actual parser function is not contained here: the code was + written in a way that parsing yaml for different subsystems + can be done from a single dispatcher. + + All it takes to have parse YAML patches is to have an import line: + + from some_parser_code import NewYamlGenerator + + To this module. Then add an instance of the parser with: + + new_parser = NewYamlGenerator() + + and add a logic inside parse() to handle it based on the path, + like this: + + if "/foo" in fname: + msg = self.new_parser.parse_yaml_file(fname) + """ + + supported = ('yaml', ) + + netlink_parser = YnlDocGenerator() + + def rst_parse(self, inputstring, document, msg): + """ + Receives a ReST content that was previously converted by the + YAML parser, adding it to the document tree. + """ + + self.setup_parse(inputstring, document) + + result = ViewList() + + try: + # Parse message with RSTParser + for i, line in enumerate(msg.split('\n')): + result.append(line, document.current_source, i) + + rst_parser = RSTParser() + rst_parser.parse('\n'.join(result), document) + + except Exception as e: + document.reporter.error("YAML parsing error: %s" % pformat(e)) + + self.finish_parse() + + # Overrides docutils.parsers.Parser. See sphinx.parsers.RSTParser + def parse(self, inputstring, document): + """Check if a YAML is meant to be parsed.""" + + fname = document.current_source + + # Handle netlink yaml specs + if "/netlink/specs/" in fname: + msg = self.netlink_parser.parse_yaml_file(fname) + self.rst_parse(inputstring, document, msg) + + # All other yaml files are ignored + +def setup(app): + """Setup function for the Sphinx extension.""" + + # Add YAML parser + app.add_source_parser(YamlParser) + app.add_source_suffix('.yaml', 'yaml') + + return { + 'version': '1.0', + 'parallel_read_safe': True, + 'parallel_write_safe': True, + } From 1ce4da3dd99e98bd4a8b396c291041080e0fe85e Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Thu, 12 Jun 2025 10:34:30 +0200 Subject: [PATCH 0008/1536] docs: use parser_yaml extension to handle Netlink specs Instead of manually calling ynl_gen_rst.py, use a Sphinx extension. This way, no .rst files would be written to the Kernel source directories. We are using here a toctree with :glob: property. This way, there is no need to touch the netlink/specs/index.rst file every time a new Netlink spec is added/renamed/removed. Signed-off-by: Mauro Carvalho Chehab Reviewed-by: Donald Hunter --- Documentation/Makefile | 17 ---------------- Documentation/conf.py | 20 ++++++++++++++----- Documentation/networking/index.rst | 2 +- .../networking/netlink_spec/readme.txt | 4 ---- 4 files changed, 16 insertions(+), 27 deletions(-) delete mode 100644 Documentation/networking/netlink_spec/readme.txt diff --git a/Documentation/Makefile b/Documentation/Makefile index b98477df5ddf..820f07e0afe6 100644 --- a/Documentation/Makefile +++ b/Documentation/Makefile @@ -104,22 +104,6 @@ quiet_cmd_sphinx = SPHINX $@ --> file://$(abspath $(BUILDDIR)/$3/$4) cp $(if $(patsubst /%,,$(DOCS_CSS)),$(abspath $(srctree)/$(DOCS_CSS)),$(DOCS_CSS)) $(BUILDDIR)/$3/_static/; \ fi -YNL_INDEX:=$(srctree)/Documentation/networking/netlink_spec/index.rst -YNL_RST_DIR:=$(srctree)/Documentation/networking/netlink_spec -YNL_YAML_DIR:=$(srctree)/Documentation/netlink/specs -YNL_TOOL:=$(srctree)/tools/net/ynl/pyynl/ynl_gen_rst.py - -YNL_RST_FILES_TMP := $(patsubst %.yaml,%.rst,$(wildcard $(YNL_YAML_DIR)/*.yaml)) -YNL_RST_FILES := $(patsubst $(YNL_YAML_DIR)%,$(YNL_RST_DIR)%, $(YNL_RST_FILES_TMP)) - -$(YNL_INDEX): $(YNL_RST_FILES) - $(Q)$(YNL_TOOL) -o $@ -x - -$(YNL_RST_DIR)/%.rst: $(YNL_YAML_DIR)/%.yaml $(YNL_TOOL) - $(Q)$(YNL_TOOL) -i $< -o $@ - -htmldocs texinfodocs latexdocs epubdocs xmldocs: $(YNL_INDEX) - htmldocs: @$(srctree)/scripts/sphinx-pre-install --version-check @+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,html,$(var),,$(var))) @@ -186,7 +170,6 @@ refcheckdocs: $(Q)cd $(srctree);scripts/documentation-file-ref-check cleandocs: - $(Q)rm -f $(YNL_INDEX) $(YNL_RST_FILES) $(Q)rm -rf $(BUILDDIR) $(Q)$(MAKE) BUILDDIR=$(abspath $(BUILDDIR)) $(build)=Documentation/userspace-api/media clean diff --git a/Documentation/conf.py b/Documentation/conf.py index 700516238d3f..f9828f3862f9 100644 --- a/Documentation/conf.py +++ b/Documentation/conf.py @@ -42,6 +42,15 @@ exclude_patterns = [] dyn_include_patterns = [] dyn_exclude_patterns = ["output"] +# Currently, only netlink/specs has a parser for yaml. +# Prefer using include patterns if available, as it is faster +if has_include_patterns: + dyn_include_patterns.append("netlink/specs/*.yaml") +else: + dyn_exclude_patterns.append("netlink/*.yaml") + dyn_exclude_patterns.append("devicetree/bindings/**.yaml") + dyn_exclude_patterns.append("core-api/kho/bindings/**.yaml") + # Properly handle include/exclude patterns # ---------------------------------------- @@ -102,12 +111,12 @@ extensions = [ "kernel_include", "kfigure", "maintainers_include", + "parser_yaml", "rstFlatTable", "sphinx.ext.autosectionlabel", "sphinx.ext.ifconfig", "translations", ] - # Since Sphinx version 3, the C function parser is more pedantic with regards # to type checking. Due to that, having macros at c:function cause problems. # Those needed to be escaped by using c_id_attributes[] array @@ -204,10 +213,11 @@ else: # Add any paths that contain templates here, relative to this directory. templates_path = ["sphinx/templates"] -# The suffix(es) of source filenames. -# You can specify multiple suffix as a list of string: -# source_suffix = ['.rst', '.md'] -source_suffix = '.rst' +# The suffixes of source filenames that will be automatically parsed +source_suffix = { + ".rst": "restructuredtext", + ".yaml": "yaml", +} # The encoding of source files. # source_encoding = 'utf-8-sig' diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst index ac90b82f3ce9..b7a4969e9bc9 100644 --- a/Documentation/networking/index.rst +++ b/Documentation/networking/index.rst @@ -57,7 +57,7 @@ Contents: filter generic-hdlc generic_netlink - netlink_spec/index + ../netlink/specs/index gen_stats gtp ila diff --git a/Documentation/networking/netlink_spec/readme.txt b/Documentation/networking/netlink_spec/readme.txt deleted file mode 100644 index 030b44aca4e6..000000000000 --- a/Documentation/networking/netlink_spec/readme.txt +++ /dev/null @@ -1,4 +0,0 @@ -SPDX-License-Identifier: GPL-2.0 - -This file is populated during the build of the documentation (htmldocs) by the -tools/net/ynl/pyynl/ynl_gen_rst.py script. From 11d137aaef802fb2a22b917b4d91e800141140c7 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Thu, 12 Jun 2025 15:27:39 +0200 Subject: [PATCH 0009/1536] docs: uapi: netlink: update netlink specs link With the recent parser_yaml extension, and the removal of the auto-generated ReST source files, the location of netlink specs changed. Update uAPI accordingly. Signed-off-by: Mauro Carvalho Chehab Reviewed-by: Donald Hunter --- Documentation/userspace-api/netlink/index.rst | 2 +- Documentation/userspace-api/netlink/specs.rst | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/userspace-api/netlink/index.rst b/Documentation/userspace-api/netlink/index.rst index c1b6765cc963..83ae25066591 100644 --- a/Documentation/userspace-api/netlink/index.rst +++ b/Documentation/userspace-api/netlink/index.rst @@ -18,4 +18,4 @@ Netlink documentation for users. See also: - :ref:`Documentation/core-api/netlink.rst ` - - :ref:`Documentation/networking/netlink_spec/index.rst ` + - :ref:`Documentation/netlink/specs/index.rst ` diff --git a/Documentation/userspace-api/netlink/specs.rst b/Documentation/userspace-api/netlink/specs.rst index 1b50d97d8d7c..debb4bfca5c4 100644 --- a/Documentation/userspace-api/netlink/specs.rst +++ b/Documentation/userspace-api/netlink/specs.rst @@ -15,7 +15,7 @@ kernel headers directly. Internally kernel uses the YAML specs to generate: - the C uAPI header - - documentation of the protocol as a ReST file - see :ref:`Documentation/networking/netlink_spec/index.rst ` + - documentation of the protocol as a ReST file - see :ref:`Documentation/netlink/specs/index.rst ` - policy tables for input attribute validation - operation tables From dc2f50796a78061480a658e2b7c2ebacc4c8d84e Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Tue, 17 Jun 2025 09:34:39 +0200 Subject: [PATCH 0010/1536] tools: ynl_gen_rst.py: drop support for generating index files As we're now using an index file with a glob, there's no need to generate index files anymore. Signed-off-by: Mauro Carvalho Chehab Reviewed-by: Donald Hunter --- tools/net/ynl/pyynl/ynl_gen_rst.py | 28 ---------------------------- 1 file changed, 28 deletions(-) diff --git a/tools/net/ynl/pyynl/ynl_gen_rst.py b/tools/net/ynl/pyynl/ynl_gen_rst.py index 010315fad498..90ae19aac89d 100755 --- a/tools/net/ynl/pyynl/ynl_gen_rst.py +++ b/tools/net/ynl/pyynl/ynl_gen_rst.py @@ -31,9 +31,6 @@ def parse_arguments() -> argparse.Namespace: # Index and input are mutually exclusive group = parser.add_mutually_exclusive_group() - group.add_argument( - "-x", "--index", action="store_true", help="Generate the index page" - ) group.add_argument("-i", "--input", help="YAML file name") args = parser.parse_args() @@ -63,27 +60,6 @@ def write_to_rstfile(content: str, filename: str) -> None: rst_file.write(content) -def generate_main_index_rst(parser: YnlDocGenerator, output: str) -> None: - """Generate the `networking_spec/index` content and write to the file""" - lines = [] - - lines.append(parser.fmt.rst_header()) - lines.append(parser.fmt.rst_label("specs")) - lines.append(parser.fmt.rst_title("Netlink Family Specifications")) - lines.append(parser.fmt.rst_toctree(1)) - - index_dir = os.path.dirname(output) - logging.debug("Looking for .rst files in %s", index_dir) - for filename in sorted(os.listdir(index_dir)): - base, ext = os.path.splitext(filename) - if filename == "index.rst" or ext not in [".rst", ".yaml"]: - continue - lines.append(f" {base}\n") - - logging.debug("Writing an index file at %s", output) - write_to_rstfile("".join(lines), output) - - def main() -> None: """Main function that reads the YAML files and generates the RST files""" @@ -102,10 +78,6 @@ def main() -> None: write_to_rstfile(content, args.output) - if args.index: - # Generate the index RST file - generate_main_index_rst(parser, args.output) - if __name__ == "__main__": main() From 6abc773747a80dfcbf8aa05fce7bb8d5d815fdb3 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Tue, 10 Jun 2025 10:44:45 +0200 Subject: [PATCH 0011/1536] docs: netlink: remove obsolete .gitignore from unused directory The previous code was generating source rst files under Documentation/networking/netlink_spec/. With the Sphinx YAML parser, this is now gone. So, stop ignoring *.rst files inside netlink specs directory. Signed-off-by: Mauro Carvalho Chehab Reviewed-by: Donald Hunter --- Documentation/networking/netlink_spec/.gitignore | 1 - 1 file changed, 1 deletion(-) delete mode 100644 Documentation/networking/netlink_spec/.gitignore diff --git a/Documentation/networking/netlink_spec/.gitignore b/Documentation/networking/netlink_spec/.gitignore deleted file mode 100644 index 30d85567b592..000000000000 --- a/Documentation/networking/netlink_spec/.gitignore +++ /dev/null @@ -1 +0,0 @@ -*.rst From 778756819af1e5aca9e3c9f77a3c4cfb4b2a5fb8 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Fri, 13 Jun 2025 13:02:01 +0200 Subject: [PATCH 0012/1536] MAINTAINERS: add netlink_yml_parser.py to linux-doc The documentation build depends on the parsing code at ynl_gen_rst.py. Ensure that changes to it will be c/c to linux-doc ML and maintainers by adding an entry for it. This way, if a change there would affect the build, or the minimal version required for Python, doc developers may know in advance. Signed-off-by: Mauro Carvalho Chehab Reviewed-by: Breno Leitao Reviewed-by: Donald Hunter --- MAINTAINERS | 1 + 1 file changed, 1 insertion(+) diff --git a/MAINTAINERS b/MAINTAINERS index fe168477caa4..39d5e4e09273 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -7307,6 +7307,7 @@ F: scripts/get_abi.py F: scripts/kernel-doc* F: scripts/lib/abi/* F: scripts/lib/kdoc/* +F: tools/net/ynl/pyynl/lib/doc_generator.py F: scripts/sphinx-pre-install X: Documentation/ABI/ X: Documentation/admin-guide/media/ From ad06a878a328f230e9bf62ec0042eea1798d2cb0 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Tue, 17 Jun 2025 17:28:04 +0200 Subject: [PATCH 0013/1536] tools: netlink_yml_parser.py: add line numbers to parsed data When something goes wrong, we want Sphinx error to point to the right line number from the original source, not from the processed ReST data. Signed-off-by: Mauro Carvalho Chehab --- tools/net/ynl/pyynl/lib/doc_generator.py | 34 ++++++++++++++++++++++-- 1 file changed, 32 insertions(+), 2 deletions(-) diff --git a/tools/net/ynl/pyynl/lib/doc_generator.py b/tools/net/ynl/pyynl/lib/doc_generator.py index 474b2b78c7bc..658759a527a6 100644 --- a/tools/net/ynl/pyynl/lib/doc_generator.py +++ b/tools/net/ynl/pyynl/lib/doc_generator.py @@ -20,6 +20,16 @@ from typing import Any, Dict, List import yaml +LINE_STR = '__lineno__' + +class NumberedSafeLoader(yaml.SafeLoader): # pylint: disable=R0901 + """Override the SafeLoader class to add line number to parsed data""" + + def construct_mapping(self, node, *args, **kwargs): + mapping = super().construct_mapping(node, *args, **kwargs) + mapping[LINE_STR] = node.start_mark.line + + return mapping class RstFormatters: """RST Formatters""" @@ -127,6 +137,11 @@ class RstFormatters: """Return a formatted label""" return f".. _{title}:\n\n" + @staticmethod + def rst_lineno(lineno: int) -> str: + """Return a lineno comment""" + return f".. LINENO {lineno}\n" + class YnlDocGenerator: """YAML Netlink specs Parser""" @@ -144,6 +159,9 @@ class YnlDocGenerator: """Parse 'do' section and return a formatted string""" lines = [] for key in do_dict.keys(): + if key == LINE_STR: + lines.append(self.fmt.rst_lineno(do_dict[key])) + continue lines.append(self.fmt.rst_paragraph(self.fmt.bold(key), level + 1)) if key in ['request', 'reply']: lines.append(self.parse_do_attributes(do_dict[key], level + 1) + "\n") @@ -174,6 +192,10 @@ class YnlDocGenerator: lines.append(self.fmt.rst_paragraph(operation["doc"]) + "\n") for key in operation.keys(): + if key == LINE_STR: + lines.append(self.fmt.rst_lineno(operation[key])) + continue + if key in preprocessed: # Skip the special fields continue @@ -233,6 +255,9 @@ class YnlDocGenerator: for definition in defs: lines.append(self.fmt.rst_section(namespace, 'definition', definition["name"])) for k in definition.keys(): + if k == LINE_STR: + lines.append(self.fmt.rst_lineno(definition[k])) + continue if k in preprocessed + ignored: continue lines.append(self.fmt.rst_fields(k, self.fmt.sanitize(definition[k]), 0)) @@ -268,6 +293,9 @@ class YnlDocGenerator: lines.append(self.fmt.rst_subsubsection(attr_line)) for k in attr.keys(): + if k == LINE_STR: + lines.append(self.fmt.rst_lineno(attr[k])) + continue if k in preprocessed + ignored: continue if k in linkable: @@ -306,6 +334,8 @@ class YnlDocGenerator: lines = [] # Main header + lineno = obj.get('__lineno__', 0) + lines.append(self.fmt.rst_lineno(lineno)) family = obj['name'] @@ -354,7 +384,7 @@ class YnlDocGenerator: def parse_yaml_file(self, filename: str) -> str: """Transform the YAML specified by filename into an RST-formatted string""" with open(filename, "r", encoding="utf-8") as spec_file: - yaml_data = yaml.safe_load(spec_file) - content = self.parse_yaml(yaml_data) + numbered_yaml = yaml.load(spec_file, Loader=NumberedSafeLoader) + content = self.parse_yaml(numbered_yaml) return content From 0b24dfdd12f4ca3ef5efa0cfaad3fc14f5b3fbf1 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Tue, 17 Jun 2025 17:54:03 +0200 Subject: [PATCH 0014/1536] docs: parser_yaml.py: add support for line numbers from the parser Instead of printing line numbers from the temp converted ReST file, get them from the original source. Signed-off-by: Mauro Carvalho Chehab --- Documentation/sphinx/parser_yaml.py | 12 ++++++++++-- tools/net/ynl/pyynl/lib/doc_generator.py | 16 ++++++++++++---- 2 files changed, 22 insertions(+), 6 deletions(-) diff --git a/Documentation/sphinx/parser_yaml.py b/Documentation/sphinx/parser_yaml.py index fa2e6da17617..8288e2ff7c7c 100755 --- a/Documentation/sphinx/parser_yaml.py +++ b/Documentation/sphinx/parser_yaml.py @@ -54,6 +54,8 @@ class YamlParser(Parser): netlink_parser = YnlDocGenerator() + re_lineno = re.compile(r"\.\. LINENO ([0-9]+)$") + def rst_parse(self, inputstring, document, msg): """ Receives a ReST content that was previously converted by the @@ -66,8 +68,14 @@ class YamlParser(Parser): try: # Parse message with RSTParser - for i, line in enumerate(msg.split('\n')): - result.append(line, document.current_source, i) + lineoffset = 0; + for line in msg.split('\n'): + match = self.re_lineno.match(line) + if match: + lineoffset = int(match.group(1)) + continue + + result.append(line, document.current_source, lineoffset) rst_parser = RSTParser() rst_parser.parse('\n'.join(result), document) diff --git a/tools/net/ynl/pyynl/lib/doc_generator.py b/tools/net/ynl/pyynl/lib/doc_generator.py index 658759a527a6..403abf1a2eda 100644 --- a/tools/net/ynl/pyynl/lib/doc_generator.py +++ b/tools/net/ynl/pyynl/lib/doc_generator.py @@ -158,9 +158,11 @@ class YnlDocGenerator: def parse_do(self, do_dict: Dict[str, Any], level: int = 0) -> str: """Parse 'do' section and return a formatted string""" lines = [] + if LINE_STR in do_dict: + lines.append(self.fmt.rst_lineno(do_dict[LINE_STR])) + for key in do_dict.keys(): if key == LINE_STR: - lines.append(self.fmt.rst_lineno(do_dict[key])) continue lines.append(self.fmt.rst_paragraph(self.fmt.bold(key), level + 1)) if key in ['request', 'reply']: @@ -187,13 +189,15 @@ class YnlDocGenerator: lines = [] for operation in operations: + if LINE_STR in operation: + lines.append(self.fmt.rst_lineno(operation[LINE_STR])) + lines.append(self.fmt.rst_section(namespace, 'operation', operation["name"])) lines.append(self.fmt.rst_paragraph(operation["doc"]) + "\n") for key in operation.keys(): if key == LINE_STR: - lines.append(self.fmt.rst_lineno(operation[key])) continue if key in preprocessed: @@ -253,10 +257,12 @@ class YnlDocGenerator: lines = [] for definition in defs: + if LINE_STR in definition: + lines.append(self.fmt.rst_lineno(definition[LINE_STR])) + lines.append(self.fmt.rst_section(namespace, 'definition', definition["name"])) for k in definition.keys(): if k == LINE_STR: - lines.append(self.fmt.rst_lineno(definition[k])) continue if k in preprocessed + ignored: continue @@ -284,6 +290,9 @@ class YnlDocGenerator: lines.append(self.fmt.rst_section(namespace, 'attribute-set', entry["name"])) for attr in entry["attributes"]: + if LINE_STR in attr: + lines.append(self.fmt.rst_lineno(attr[LINE_STR])) + type_ = attr.get("type") attr_line = attr["name"] if type_: @@ -294,7 +303,6 @@ class YnlDocGenerator: for k in attr.keys(): if k == LINE_STR: - lines.append(self.fmt.rst_lineno(attr[k])) continue if k in preprocessed + ignored: continue From d90555ef0603935529837e490d5acc8436297c56 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Thu, 19 Jun 2025 14:55:33 +0200 Subject: [PATCH 0015/1536] docs: parser_yaml.py: fix backward compatibility with old docutils As reported by Akira, older docutils versions are not compatible with the way some Sphinx versions send tab_width. Add a code to address it. Reported-by: Akira Yokosawa Closes: https://lore.kernel.org/linux-doc/598b2cb7-2fd7-4388-96ba-2ddf0ab55d2a@gmail.com/ Tested-by: Akira Yokosawa Signed-off-by: Mauro Carvalho Chehab --- Documentation/sphinx/parser_yaml.py | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/Documentation/sphinx/parser_yaml.py b/Documentation/sphinx/parser_yaml.py index 8288e2ff7c7c..1602b31f448e 100755 --- a/Documentation/sphinx/parser_yaml.py +++ b/Documentation/sphinx/parser_yaml.py @@ -77,6 +77,10 @@ class YamlParser(Parser): result.append(line, document.current_source, lineoffset) + # Fix backward compatibility with docutils < 0.17.1 + if "tab_width" not in vars(document.settings): + document.settings.tab_width = 8 + rst_parser = RSTParser() rst_parser.parse('\n'.join(result), document) From 47459937be8031aae6aaa17ac5f60985f7c9e1bd Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Thu, 10 Jul 2025 18:36:40 +0200 Subject: [PATCH 0016/1536] sphinx: parser_yaml.py: fix line numbers information As reported by Donald, this code: rst_parser = RSTParser() rst_parser.parse('\n'.join(result), document) breaks line parsing. As an alternative, I tested a variant of it: rst_parser.parse(result, document) but still line number was not preserved. As Donald noted, standard Parser classes don't have a direct mechanism to preserve line numbers from ViewList(). So, instead, let's use a mechanism similar to what we do already at kerneldoc.py: call the statemachine mechanism directly there. I double-checked when states and statemachine were introduced: both were back in 2002. I also tested doc build with docutils 0.16 and 0.21.2. It worked with both, so it seems to be stable enough for our needs. Reported-by: Donald Hunter Closes: https://lore.kernel.org/linux-doc/m24ivk78ng.fsf@gmail.com/T/#u Signed-off-by: Mauro Carvalho Chehab --- Documentation/sphinx/parser_yaml.py | 21 ++++++++++++++------- 1 file changed, 14 insertions(+), 7 deletions(-) diff --git a/Documentation/sphinx/parser_yaml.py b/Documentation/sphinx/parser_yaml.py index 1602b31f448e..634d84a202fc 100755 --- a/Documentation/sphinx/parser_yaml.py +++ b/Documentation/sphinx/parser_yaml.py @@ -11,7 +11,9 @@ import sys from pprint import pformat +from docutils import statemachine from docutils.parsers.rst import Parser as RSTParser +from docutils.parsers.rst import states from docutils.statemachine import ViewList from sphinx.util import logging @@ -56,6 +58,8 @@ class YamlParser(Parser): re_lineno = re.compile(r"\.\. LINENO ([0-9]+)$") + tab_width = 8 + def rst_parse(self, inputstring, document, msg): """ Receives a ReST content that was previously converted by the @@ -66,10 +70,18 @@ class YamlParser(Parser): result = ViewList() + self.statemachine = states.RSTStateMachine(state_classes=states.state_classes, + initial_state='Body', + debug=document.reporter.debug_flag) + try: # Parse message with RSTParser lineoffset = 0; - for line in msg.split('\n'): + + lines = statemachine.string2lines(msg, self.tab_width, + convert_whitespace=True) + + for line in lines: match = self.re_lineno.match(line) if match: lineoffset = int(match.group(1)) @@ -77,12 +89,7 @@ class YamlParser(Parser): result.append(line, document.current_source, lineoffset) - # Fix backward compatibility with docutils < 0.17.1 - if "tab_width" not in vars(document.settings): - document.settings.tab_width = 8 - - rst_parser = RSTParser() - rst_parser.parse('\n'.join(result), document) + self.statemachine.run(result, document) except Exception as e: document.reporter.error("YAML parsing error: %s" % pformat(e)) From 58de1f91e033b1fface8d8948984583125f93736 Mon Sep 17 00:00:00 2001 From: Ping-Ke Shih Date: Thu, 24 Jul 2025 08:48:15 +0800 Subject: [PATCH 0017/1536] wifi: rtw88: sdio: use indirect IO for device registers before power-on The register REG_SYS_CFG1 is used to determine chip basic information as arguments of following flows, such as download firmware and load PHY parameters, so driver read the value early (before power-on). However, the direct IO is disallowed before power-on, or it causes wrong values, which driver recognizes a chip as a wrong type RF_1T1R, but actually RF_2T2R, causing driver warns: rtw88_8822cs mmc1:0001:1: unsupported rf path (1) Fix it by using indirect IO before power-on. Reported-by: Piotr Oniszczuk Closes: https://lore.kernel.org/linux-wireless/699C22B4-A3E3-4206-97D0-22AB3348EBF6@gmail.com/T/#t Suggested-by: Bitterblue Smith Tested-by: Piotr Oniszczuk Reviewed-by: Martin Blumenstingl Signed-off-by: Ping-Ke Shih Link: https://patch.msgid.link/20250724004815.7043-1-pkshih@realtek.com --- drivers/net/wireless/realtek/rtw88/sdio.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/net/wireless/realtek/rtw88/sdio.c b/drivers/net/wireless/realtek/rtw88/sdio.c index cc2d4fef3587..99d7c629eac6 100644 --- a/drivers/net/wireless/realtek/rtw88/sdio.c +++ b/drivers/net/wireless/realtek/rtw88/sdio.c @@ -144,6 +144,10 @@ static u32 rtw_sdio_to_io_address(struct rtw_dev *rtwdev, u32 addr, static bool rtw_sdio_use_direct_io(struct rtw_dev *rtwdev, u32 addr) { + if (!test_bit(RTW_FLAG_POWERON, rtwdev->flags) && + !rtw_sdio_is_bus_addr(addr)) + return false; + return !rtw_sdio_is_sdio30_supported(rtwdev) || rtw_sdio_is_bus_addr(addr); } From 26a8bf978ae9cd7688af1d08bc8760674d372e22 Mon Sep 17 00:00:00 2001 From: Bitterblue Smith Date: Fri, 1 Aug 2025 23:08:24 +0300 Subject: [PATCH 0018/1536] wifi: rtw88: Lock rtwdev->mutex before setting the LED Some users report that the LED blinking breaks AP mode somehow. Most likely the LED code and the dynamic mechanism are trying to access the hardware registers at the same time. Fix it by locking rtwdev->mutex before setting the LED and unlocking it after. Fixes: 4b6652bc6d8d ("wifi: rtw88: Add support for LED blinking") Closes: https://github.com/lwfinger/rtw88/issues/305 Signed-off-by: Bitterblue Smith Acked-by: Ping-Ke Shih Signed-off-by: Ping-Ke Shih Link: https://patch.msgid.link/ed69fa07-8678-4a40-af44-65e7b1862197@gmail.com --- drivers/net/wireless/realtek/rtw88/led.c | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/drivers/net/wireless/realtek/rtw88/led.c b/drivers/net/wireless/realtek/rtw88/led.c index 25aa6cbaa728..7f9ace351a5b 100644 --- a/drivers/net/wireless/realtek/rtw88/led.c +++ b/drivers/net/wireless/realtek/rtw88/led.c @@ -6,13 +6,23 @@ #include "debug.h" #include "led.h" -static int rtw_led_set_blocking(struct led_classdev *led, - enum led_brightness brightness) +static void rtw_led_set(struct led_classdev *led, + enum led_brightness brightness) { struct rtw_dev *rtwdev = container_of(led, struct rtw_dev, led_cdev); + mutex_lock(&rtwdev->mutex); + rtwdev->chip->ops->led_set(led, brightness); + mutex_unlock(&rtwdev->mutex); +} + +static int rtw_led_set_blocking(struct led_classdev *led, + enum led_brightness brightness) +{ + rtw_led_set(led, brightness); + return 0; } @@ -37,7 +47,7 @@ void rtw_led_init(struct rtw_dev *rtwdev) return; if (rtw_hci_type(rtwdev) == RTW_HCI_TYPE_PCIE) - led->brightness_set = rtwdev->chip->ops->led_set; + led->brightness_set = rtw_led_set; else led->brightness_set_blocking = rtw_led_set_blocking; From 7e1c44fe4c2e1e01fa47d9490893d95309a99687 Mon Sep 17 00:00:00 2001 From: Ping-Ke Shih Date: Mon, 4 Aug 2025 09:22:33 +0800 Subject: [PATCH 0019/1536] wifi: rtw89: print just once for unknown C2H events When driver receives new or unknown C2H events, it print out messages repeatedly once events are received, like rtw89_8922ae 0000:81:00.0: PHY c2h class 2 not support To avoid the thousands of messages, use rtw89_info_once() instead. Also, print out class/func for unknown (undefined) class. Reported-by: Sean Anderson Closes: https://lore.kernel.org/linux-wireless/20250729204437.164320-1-sean.anderson@linux.dev/ Reviewed-by: Sean Anderson Signed-off-by: Ping-Ke Shih Link: https://patch.msgid.link/20250804012234.8913-2-pkshih@realtek.com --- drivers/net/wireless/realtek/rtw89/debug.h | 1 + drivers/net/wireless/realtek/rtw89/mac.c | 7 +++---- drivers/net/wireless/realtek/rtw89/phy.c | 7 +++---- 3 files changed, 7 insertions(+), 8 deletions(-) diff --git a/drivers/net/wireless/realtek/rtw89/debug.h b/drivers/net/wireless/realtek/rtw89/debug.h index fc690f7c55dc..a364e7adb079 100644 --- a/drivers/net/wireless/realtek/rtw89/debug.h +++ b/drivers/net/wireless/realtek/rtw89/debug.h @@ -56,6 +56,7 @@ static inline void rtw89_debugfs_deinit(struct rtw89_dev *rtwdev) {} #endif #define rtw89_info(rtwdev, a...) dev_info((rtwdev)->dev, ##a) +#define rtw89_info_once(rtwdev, a...) dev_info_once((rtwdev)->dev, ##a) #define rtw89_warn(rtwdev, a...) dev_warn((rtwdev)->dev, ##a) #define rtw89_err(rtwdev, a...) dev_err((rtwdev)->dev, ##a) diff --git a/drivers/net/wireless/realtek/rtw89/mac.c b/drivers/net/wireless/realtek/rtw89/mac.c index 5a5da9d9c0c5..ef17a307b770 100644 --- a/drivers/net/wireless/realtek/rtw89/mac.c +++ b/drivers/net/wireless/realtek/rtw89/mac.c @@ -5813,12 +5813,11 @@ void rtw89_mac_c2h_handle(struct rtw89_dev *rtwdev, struct sk_buff *skb, case RTW89_MAC_C2H_CLASS_ROLE: return; default: - rtw89_info(rtwdev, "MAC c2h class %d not support\n", class); - return; + break; } if (!handler) { - rtw89_info(rtwdev, "MAC c2h class %d func %d not support\n", class, - func); + rtw89_info_once(rtwdev, "MAC c2h class %d func %d not support\n", + class, func); return; } handler(rtwdev, skb, len); diff --git a/drivers/net/wireless/realtek/rtw89/phy.c b/drivers/net/wireless/realtek/rtw89/phy.c index d607577b353c..01a03d2de3ff 100644 --- a/drivers/net/wireless/realtek/rtw89/phy.c +++ b/drivers/net/wireless/realtek/rtw89/phy.c @@ -3626,12 +3626,11 @@ void rtw89_phy_c2h_handle(struct rtw89_dev *rtwdev, struct sk_buff *skb, handler = rtw89_phy_c2h_dm_handler[func]; break; default: - rtw89_info(rtwdev, "PHY c2h class %d not support\n", class); - return; + break; } if (!handler) { - rtw89_info(rtwdev, "PHY c2h class %d func %d not support\n", class, - func); + rtw89_info_once(rtwdev, "PHY c2h class %d func %d not support\n", + class, func); return; } handler(rtwdev, skb, len); From 04a2de8cfc95076d6c65d4d6d06d0f9d964a2105 Mon Sep 17 00:00:00 2001 From: Ping-Ke Shih Date: Mon, 4 Aug 2025 09:22:34 +0800 Subject: [PATCH 0020/1536] wifi: rtw89: add dummy C2H handlers for BCN resend and update done Two C2H events are not listed, and driver throws MAC c2h class 0 func 6 not support MAC c2h class 1 func 3 not support Since the implementation in vendor driver does nothing, add two dummy functions for them. Reported-by: Bitterblue Smith Closes: https://lore.kernel.org/linux-wireless/d2d62793-046c-4b55-93ed-1d1f43cff7f2@gmail.com/ Reviewed-by: Sean Anderson Signed-off-by: Ping-Ke Shih Link: https://patch.msgid.link/20250804012234.8913-3-pkshih@realtek.com --- drivers/net/wireless/realtek/rtw89/mac.c | 13 ++++++++++++- drivers/net/wireless/realtek/rtw89/mac.h | 1 + 2 files changed, 13 insertions(+), 1 deletion(-) diff --git a/drivers/net/wireless/realtek/rtw89/mac.c b/drivers/net/wireless/realtek/rtw89/mac.c index ef17a307b770..33a7dd9d6f0e 100644 --- a/drivers/net/wireless/realtek/rtw89/mac.c +++ b/drivers/net/wireless/realtek/rtw89/mac.c @@ -5235,6 +5235,11 @@ rtw89_mac_c2h_bcn_cnt(struct rtw89_dev *rtwdev, struct sk_buff *c2h, u32 len) { } +static void +rtw89_mac_c2h_bcn_upd_done(struct rtw89_dev *rtwdev, struct sk_buff *c2h, u32 len) +{ +} + static void rtw89_mac_c2h_pkt_ofld_rsp(struct rtw89_dev *rtwdev, struct sk_buff *skb_c2h, u32 len) @@ -5257,6 +5262,11 @@ rtw89_mac_c2h_pkt_ofld_rsp(struct rtw89_dev *rtwdev, struct sk_buff *skb_c2h, rtw89_complete_cond(wait, cond, &data); } +static void +rtw89_mac_c2h_bcn_resend(struct rtw89_dev *rtwdev, struct sk_buff *c2h, u32 len) +{ +} + static void rtw89_mac_c2h_tx_duty_rpt(struct rtw89_dev *rtwdev, struct sk_buff *skb_c2h, u32 len) { @@ -5646,7 +5656,7 @@ void (* const rtw89_mac_c2h_ofld_handler[])(struct rtw89_dev *rtwdev, [RTW89_MAC_C2H_FUNC_EFUSE_DUMP] = NULL, [RTW89_MAC_C2H_FUNC_READ_RSP] = NULL, [RTW89_MAC_C2H_FUNC_PKT_OFLD_RSP] = rtw89_mac_c2h_pkt_ofld_rsp, - [RTW89_MAC_C2H_FUNC_BCN_RESEND] = NULL, + [RTW89_MAC_C2H_FUNC_BCN_RESEND] = rtw89_mac_c2h_bcn_resend, [RTW89_MAC_C2H_FUNC_MACID_PAUSE] = rtw89_mac_c2h_macid_pause, [RTW89_MAC_C2H_FUNC_SCANOFLD_RSP] = rtw89_mac_c2h_scanofld_rsp, [RTW89_MAC_C2H_FUNC_TX_DUTY_RPT] = rtw89_mac_c2h_tx_duty_rpt, @@ -5661,6 +5671,7 @@ void (* const rtw89_mac_c2h_info_handler[])(struct rtw89_dev *rtwdev, [RTW89_MAC_C2H_FUNC_DONE_ACK] = rtw89_mac_c2h_done_ack, [RTW89_MAC_C2H_FUNC_C2H_LOG] = rtw89_mac_c2h_log, [RTW89_MAC_C2H_FUNC_BCN_CNT] = rtw89_mac_c2h_bcn_cnt, + [RTW89_MAC_C2H_FUNC_BCN_UPD_DONE] = rtw89_mac_c2h_bcn_upd_done, }; static diff --git a/drivers/net/wireless/realtek/rtw89/mac.h b/drivers/net/wireless/realtek/rtw89/mac.h index 241e89983c4a..25fe5e5c8a97 100644 --- a/drivers/net/wireless/realtek/rtw89/mac.h +++ b/drivers/net/wireless/realtek/rtw89/mac.h @@ -419,6 +419,7 @@ enum rtw89_mac_c2h_info_func { RTW89_MAC_C2H_FUNC_DONE_ACK, RTW89_MAC_C2H_FUNC_C2H_LOG, RTW89_MAC_C2H_FUNC_BCN_CNT, + RTW89_MAC_C2H_FUNC_BCN_UPD_DONE = 0x06, RTW89_MAC_C2H_FUNC_INFO_MAX, }; From 5846154126541b27281fbc9aba0c8ee97e22597b Mon Sep 17 00:00:00 2001 From: Akhilesh Patil Date: Fri, 8 Aug 2025 11:09:18 +0530 Subject: [PATCH 0021/1536] wifi: rtw89: 8852bt: Use standard helper for string choice Use standard helper str_on_off() defined at string_choices.h in _dpk_information() to improve code reusability. Reduce hardcoding of repeated use of the same strings to save code space. Reported-by: kernel test robot Closes: https://lore.kernel.org/r/202507282341.drTGfLWA-lkp@intel.com/ Signed-off-by: Akhilesh Patil Signed-off-by: Ping-Ke Shih Link: https://patch.msgid.link/aJWNhu9bAkcjEyb4@bhairav-test.ee.iitb.ac.in --- drivers/net/wireless/realtek/rtw89/rtw8852bt_rfk.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/wireless/realtek/rtw89/rtw8852bt_rfk.c b/drivers/net/wireless/realtek/rtw89/rtw8852bt_rfk.c index d0e299803225..b01f921b4224 100644 --- a/drivers/net/wireless/realtek/rtw89/rtw8852bt_rfk.c +++ b/drivers/net/wireless/realtek/rtw89/rtw8852bt_rfk.c @@ -1883,8 +1883,8 @@ static void _dpk_information(struct rtw89_dev *rtwdev, enum rtw89_phy_idx phy, rtw89_debug(rtwdev, RTW89_DBG_RFK, "[DPK] S%d[%d] (PHY%d): TSSI %s/ DBCC %s/ %s/ CH%d/ %s\n", path, dpk->cur_idx[path], phy, - rtwdev->is_tssi_mode[path] ? "on" : "off", - rtwdev->dbcc_en ? "on" : "off", + str_on_off(rtwdev->is_tssi_mode[path]), + str_on_off(rtwdev->dbcc_en), dpk->bp[path][kidx].band == 0 ? "2G" : dpk->bp[path][kidx].band == 1 ? "5G" : "6G", dpk->bp[path][kidx].ch, From 526c2530cbf84428a0a2b5ca7800986c0912ac35 Mon Sep 17 00:00:00 2001 From: Qianfeng Rong Date: Sun, 10 Aug 2025 15:29:40 +0800 Subject: [PATCH 0022/1536] tcp: cdg: remove redundant __GFP_NOWARN GFP_NOWAIT already includes __GFP_NOWARN, so let's remove the redundant __GFP_NOWARN. Signed-off-by: Qianfeng Rong Reviewed-by: Eric Dumazet Link: https://patch.msgid.link/20250810072944.438574-2-rongqianfeng@vivo.com Signed-off-by: Jakub Kicinski --- net/ipv4/tcp_cdg.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/tcp_cdg.c b/net/ipv4/tcp_cdg.c index ba4d98e510e0..fbad6c35dee9 100644 --- a/net/ipv4/tcp_cdg.c +++ b/net/ipv4/tcp_cdg.c @@ -379,7 +379,7 @@ static void tcp_cdg_init(struct sock *sk) /* We silently fall back to window = 1 if allocation fails. */ if (window > 1) ca->gradients = kcalloc(window, sizeof(ca->gradients[0]), - GFP_NOWAIT | __GFP_NOWARN); + GFP_NOWAIT); ca->rtt_seq = tp->snd_nxt; ca->shadow_wnd = tcp_snd_cwnd(tp); } From 7792232a4ea1a8b6fe34d0d99a1e02aac185e633 Mon Sep 17 00:00:00 2001 From: Qianfeng Rong Date: Sun, 10 Aug 2025 15:29:41 +0800 Subject: [PATCH 0023/1536] RDS: remove redundant __GFP_NOWARN GFP_NOWAIT already includes __GFP_NOWARN, so let's remove the redundant __GFP_NOWARN. Signed-off-by: Qianfeng Rong Reviewed-by: Allison Henderson Link: https://patch.msgid.link/20250810072944.438574-3-rongqianfeng@vivo.com Signed-off-by: Jakub Kicinski --- net/rds/ib_recv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/rds/ib_recv.c b/net/rds/ib_recv.c index e53b7f266bd7..4248dfa816eb 100644 --- a/net/rds/ib_recv.c +++ b/net/rds/ib_recv.c @@ -1034,7 +1034,7 @@ void rds_ib_recv_cqe_handler(struct rds_ib_connection *ic, rds_ib_stats_inc(s_ib_rx_ring_empty); if (rds_ib_ring_low(&ic->i_recv_ring)) { - rds_ib_recv_refill(conn, 0, GFP_NOWAIT | __GFP_NOWARN); + rds_ib_recv_refill(conn, 0, GFP_NOWAIT); rds_ib_stats_inc(s_ib_rx_refill_from_cq); } } From 63fe077c21d323cbc8d56114e9139bbadab33ce5 Mon Sep 17 00:00:00 2001 From: Thorsten Blum Date: Mon, 11 Aug 2025 11:34:40 +0200 Subject: [PATCH 0024/1536] caif: Replace memset(0) + strscpy() with strscpy_pad() Replace memset(0) followed by strscpy() with strscpy_pad() to improve cfctrl_linkup_request(). This avoids zeroing the memory before copying the string and ensures the destination buffer is only written to once, simplifying the code and improving efficiency. Signed-off-by: Thorsten Blum Link: https://patch.msgid.link/20250811093442.5075-2-thorsten.blum@linux.dev Signed-off-by: Jakub Kicinski --- net/caif/cfctrl.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/net/caif/cfctrl.c b/net/caif/cfctrl.c index 06b604cf9d58..2aa1e7d46eb2 100644 --- a/net/caif/cfctrl.c +++ b/net/caif/cfctrl.c @@ -257,9 +257,7 @@ int cfctrl_linkup_request(struct cflayer *layer, cfpkt_add_body(pkt, &tmp16, 2); tmp16 = cpu_to_le16(param->u.utility.fifosize_bufs); cfpkt_add_body(pkt, &tmp16, 2); - memset(utility_name, 0, sizeof(utility_name)); - strscpy(utility_name, param->u.utility.name, - UTILITY_NAME_LENGTH); + strscpy_pad(utility_name, param->u.utility.name); cfpkt_add_body(pkt, utility_name, UTILITY_NAME_LENGTH); tmp8 = param->u.utility.paramlen; cfpkt_add_body(pkt, &tmp8, 1); From 11b99886d1948503ccc3f24548b3dfb7ff0a5261 Mon Sep 17 00:00:00 2001 From: Colin Ian King Date: Mon, 11 Aug 2025 12:12:11 +0100 Subject: [PATCH 0025/1536] net: stmmac: make variable data a u32 Make data a u32 instead of an unsigned long, this way it is explicitly the same width as the operations performed on it and the same width as a writel store, and it cleans up sign extention warnings when 64 bit static analysis is performed on the code. Signed-off-by: Colin Ian King Reviewed-by: Russell King (Oracle) Link: https://patch.msgid.link/20250811111211.1646600-1-colin.i.king@gmail.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/stmicro/stmmac/dwmac_lib.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac_lib.c b/drivers/net/ethernet/stmicro/stmmac/dwmac_lib.c index 4846bf49c576..467f1a05747e 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac_lib.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac_lib.c @@ -251,7 +251,7 @@ void dwmac_dma_flush_tx_fifo(void __iomem *ioaddr) void stmmac_set_mac_addr(void __iomem *ioaddr, const u8 addr[6], unsigned int high, unsigned int low) { - unsigned long data; + u32 data; data = (addr[5] << 8) | addr[4]; /* For MAC Addr registers we have to set the Address Enable (AE) From f8262b8dadfa635cc8f203c40eb0b75b8625f44c Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Mon, 11 Aug 2025 16:22:36 +0200 Subject: [PATCH 0026/1536] dt-bindings: nfc: ti,trf7970a: Drop 'db' suffix duplicating dtschema A common property unit suffix '-db' was added to dtschema, thus in-kernel bindings should not reference the type. Signed-off-by: Krzysztof Kozlowski Acked-by: Conor Dooley Link: https://patch.msgid.link/20250811142235.170407-2-krzysztof.kozlowski@linaro.org Signed-off-by: Jakub Kicinski --- Documentation/devicetree/bindings/net/nfc/ti,trf7970a.yaml | 1 - 1 file changed, 1 deletion(-) diff --git a/Documentation/devicetree/bindings/net/nfc/ti,trf7970a.yaml b/Documentation/devicetree/bindings/net/nfc/ti,trf7970a.yaml index 5f49bd9ac5e6..783a85b84893 100644 --- a/Documentation/devicetree/bindings/net/nfc/ti,trf7970a.yaml +++ b/Documentation/devicetree/bindings/net/nfc/ti,trf7970a.yaml @@ -56,7 +56,6 @@ properties: Regulator for supply voltage to VIN pin ti,rx-gain-reduction-db: - $ref: /schemas/types.yaml#/definitions/uint32 description: | Specify an RX gain reduction to reduce antenna sensitivity with 5dB per increment, with a maximum of 15dB. Supported values: [0, 5, 10, 15]. From 86e3d52bd3e919181d5f7e5107065d16e694c8d8 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 11 Aug 2025 14:52:52 +0000 Subject: [PATCH 0027/1536] phonet: add __rcu annotations MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Removes following sparse errors. make C=2 net/phonet/socket.o net/phonet/af_phonet.o CHECK net/phonet/socket.c net/phonet/socket.c:619:14: error: incompatible types in comparison expression (different address spaces): net/phonet/socket.c:619:14: struct sock [noderef] __rcu * net/phonet/socket.c:619:14: struct sock * net/phonet/socket.c:642:17: error: incompatible types in comparison expression (different address spaces): net/phonet/socket.c:642:17: struct sock [noderef] __rcu * net/phonet/socket.c:642:17: struct sock * net/phonet/socket.c:658:17: error: incompatible types in comparison expression (different address spaces): net/phonet/socket.c:658:17: struct sock [noderef] __rcu * net/phonet/socket.c:658:17: struct sock * net/phonet/socket.c:677:25: error: incompatible types in comparison expression (different address spaces): net/phonet/socket.c:677:25: struct sock [noderef] __rcu * net/phonet/socket.c:677:25: struct sock * net/phonet/socket.c:726:21: warning: context imbalance in 'pn_res_seq_start' - wrong count at exit net/phonet/socket.c:741:13: warning: context imbalance in 'pn_res_seq_stop' - wrong count at exit CHECK net/phonet/af_phonet.c net/phonet/af_phonet.c:35:14: error: incompatible types in comparison expression (different address spaces): net/phonet/af_phonet.c:35:14: struct phonet_protocol const [noderef] __rcu * net/phonet/af_phonet.c:35:14: struct phonet_protocol const * net/phonet/af_phonet.c:474:17: error: incompatible types in comparison expression (different address spaces): net/phonet/af_phonet.c:474:17: struct phonet_protocol const [noderef] __rcu * net/phonet/af_phonet.c:474:17: struct phonet_protocol const * net/phonet/af_phonet.c:486:9: error: incompatible types in comparison expression (different address spaces): net/phonet/af_phonet.c:486:9: struct phonet_protocol const [noderef] __rcu * net/phonet/af_phonet.c:486:9: struct phonet_protocol const * Signed-off-by: Eric Dumazet Acked-by: Rémi Denis-Courmont Link: https://patch.msgid.link/20250811145252.1007242-1-edumazet@google.com Signed-off-by: Jakub Kicinski --- net/phonet/af_phonet.c | 4 ++-- net/phonet/socket.c | 23 ++++++++++++----------- 2 files changed, 14 insertions(+), 13 deletions(-) diff --git a/net/phonet/af_phonet.c b/net/phonet/af_phonet.c index a27efa4faa4e..238a9638d2b0 100644 --- a/net/phonet/af_phonet.c +++ b/net/phonet/af_phonet.c @@ -22,7 +22,7 @@ #include /* Transport protocol registration */ -static const struct phonet_protocol *proto_tab[PHONET_NPROTO] __read_mostly; +static const struct phonet_protocol __rcu *proto_tab[PHONET_NPROTO] __read_mostly; static const struct phonet_protocol *phonet_proto_get(unsigned int protocol) { @@ -482,7 +482,7 @@ void phonet_proto_unregister(unsigned int protocol, const struct phonet_protocol *pp) { mutex_lock(&proto_tab_lock); - BUG_ON(proto_tab[protocol] != pp); + BUG_ON(rcu_access_pointer(proto_tab[protocol]) != pp); RCU_INIT_POINTER(proto_tab[protocol], NULL); mutex_unlock(&proto_tab_lock); synchronize_rcu(); diff --git a/net/phonet/socket.c b/net/phonet/socket.c index ea4d5e6533db..2b61a40b568e 100644 --- a/net/phonet/socket.c +++ b/net/phonet/socket.c @@ -602,7 +602,7 @@ const struct seq_operations pn_sock_seq_ops = { #endif static struct { - struct sock *sk[256]; + struct sock __rcu *sk[256]; } pnres; /* @@ -654,7 +654,7 @@ int pn_sock_unbind_res(struct sock *sk, u8 res) return -EPERM; mutex_lock(&resource_mutex); - if (pnres.sk[res] == sk) { + if (rcu_access_pointer(pnres.sk[res]) == sk) { RCU_INIT_POINTER(pnres.sk[res], NULL); ret = 0; } @@ -673,7 +673,7 @@ void pn_sock_unbind_all_res(struct sock *sk) mutex_lock(&resource_mutex); for (res = 0; res < 256; res++) { - if (pnres.sk[res] == sk) { + if (rcu_access_pointer(pnres.sk[res]) == sk) { RCU_INIT_POINTER(pnres.sk[res], NULL); match++; } @@ -688,7 +688,7 @@ void pn_sock_unbind_all_res(struct sock *sk) } #ifdef CONFIG_PROC_FS -static struct sock **pn_res_get_idx(struct seq_file *seq, loff_t pos) +static struct sock __rcu **pn_res_get_idx(struct seq_file *seq, loff_t pos) { struct net *net = seq_file_net(seq); unsigned int i; @@ -697,7 +697,7 @@ static struct sock **pn_res_get_idx(struct seq_file *seq, loff_t pos) return NULL; for (i = 0; i < 256; i++) { - if (pnres.sk[i] == NULL) + if (rcu_access_pointer(pnres.sk[i]) == NULL) continue; if (!pos) return pnres.sk + i; @@ -706,7 +706,7 @@ static struct sock **pn_res_get_idx(struct seq_file *seq, loff_t pos) return NULL; } -static struct sock **pn_res_get_next(struct seq_file *seq, struct sock **sk) +static struct sock __rcu **pn_res_get_next(struct seq_file *seq, struct sock __rcu **sk) { struct net *net = seq_file_net(seq); unsigned int i; @@ -728,7 +728,7 @@ static void *pn_res_seq_start(struct seq_file *seq, loff_t *pos) static void *pn_res_seq_next(struct seq_file *seq, void *v, loff_t *pos) { - struct sock **sk; + struct sock __rcu **sk; if (v == SEQ_START_TOKEN) sk = pn_res_get_idx(seq, 0); @@ -747,11 +747,12 @@ static void pn_res_seq_stop(struct seq_file *seq, void *v) static int pn_res_seq_show(struct seq_file *seq, void *v) { seq_setwidth(seq, 63); - if (v == SEQ_START_TOKEN) + if (v == SEQ_START_TOKEN) { seq_puts(seq, "rs uid inode"); - else { - struct sock **psk = v; - struct sock *sk = *psk; + } else { + struct sock __rcu **psk = v; + struct sock *sk = rcu_dereference_protected(*psk, + lockdep_is_held(&resource_mutex)); seq_printf(seq, "%02X %5u %lu", (int) (psk - pnres.sk), From b3ba7d929ce197ff2651046798b94bd62eb0e680 Mon Sep 17 00:00:00 2001 From: Thorsten Blum Date: Mon, 11 Aug 2025 18:40:38 +0200 Subject: [PATCH 0028/1536] net/sched: Remove redundant memset(0) call in reset_policy() The call to nla_strscpy() already zero-pads the tail of the destination buffer which makes the additional memset(0) call redundant. Remove it. Signed-off-by: Thorsten Blum Reviewed-by: Joe Damato Link: https://patch.msgid.link/20250811164039.43250-1-thorsten.blum@linux.dev Signed-off-by: Jakub Kicinski --- net/sched/act_simple.c | 1 - 1 file changed, 1 deletion(-) diff --git a/net/sched/act_simple.c b/net/sched/act_simple.c index f3abe0545989..8e69a919b4fe 100644 --- a/net/sched/act_simple.c +++ b/net/sched/act_simple.c @@ -72,7 +72,6 @@ static int reset_policy(struct tc_action *a, const struct nlattr *defdata, d = to_defact(a); spin_lock_bh(&d->tcf_lock); goto_ch = tcf_action_set_ctrlact(a, p->action, goto_ch); - memset(d->tcfd_defdata, 0, SIMP_MAX_DATA); nla_strscpy(d->tcfd_defdata, defdata, SIMP_MAX_DATA); spin_unlock_bh(&d->tcf_lock); if (goto_ch) From 75f26257667588260d067fa022ebd6ac39b4a9c1 Mon Sep 17 00:00:00 2001 From: Florian Fainelli Date: Mon, 11 Aug 2025 09:59:21 -0700 Subject: [PATCH 0029/1536] net: mdio: mdio-bcm-unimac: Refine incorrect clock message In light of a81649a4efd3 ("net: mdio: mdio-bcm-unimac: Correct rate fallback logic"), it became clear that the warning should be specific to the MDIO controller instance, and there should be further information provided to indicate what is wrong, whether the requested clock frequency or the rate calculation. Clarify the message accordingly. Signed-off-by: Florian Fainelli Link: https://patch.msgid.link/20250811165921.392030-1-florian.fainelli@broadcom.com Signed-off-by: Jakub Kicinski --- drivers/net/mdio/mdio-bcm-unimac.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/net/mdio/mdio-bcm-unimac.c b/drivers/net/mdio/mdio-bcm-unimac.c index 7baab230008a..37e35f282d9a 100644 --- a/drivers/net/mdio/mdio-bcm-unimac.c +++ b/drivers/net/mdio/mdio-bcm-unimac.c @@ -215,7 +215,9 @@ static int unimac_mdio_clk_set(struct unimac_mdio_priv *priv) div = (rate / (2 * priv->clk_freq)) - 1; if (div & ~MDIO_CLK_DIV_MASK) { - pr_warn("Incorrect MDIO clock frequency, ignoring\n"); + dev_warn(priv->mii_bus->parent, + "Ignoring MDIO clock frequency request: %d vs. rate: %ld\n", + priv->clk_freq, rate); ret = 0; goto out; } From fa38524ca5a783aa0cfc8ffa5730c5012edfccf4 Mon Sep 17 00:00:00 2001 From: Breno Leitao Date: Mon, 11 Aug 2025 11:13:25 -0700 Subject: [PATCH 0030/1536] netconsole: move netpoll_parse_ip_addr() earlier for reuse Move netpoll_parse_ip_addr() earlier in the file to be reused in other functions, such as local_ip_store(). This avoids duplicate address parsing logic and centralizes validation for both IPv4 and IPv6 string input. No functional changes intended. Signed-off-by: Breno Leitao Reviewed-by: Simon Horman Link: https://patch.msgid.link/20250811-netconsole_ref-v4-1-9c510d8713a2@debian.org Signed-off-by: Jakub Kicinski --- drivers/net/netconsole.c | 40 ++++++++++++++++++++-------------------- 1 file changed, 20 insertions(+), 20 deletions(-) diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c index e3722de08ea9..8d1b93264e0f 100644 --- a/drivers/net/netconsole.c +++ b/drivers/net/netconsole.c @@ -300,6 +300,26 @@ static void netconsole_print_banner(struct netpoll *np) np_info(np, "remote ethernet address %pM\n", np->remote_mac); } +static int netpoll_parse_ip_addr(const char *str, union inet_addr *addr) +{ + const char *end; + + if (!strchr(str, ':') && + in4_pton(str, -1, (void *)addr, -1, &end) > 0) { + if (!*end) + return 0; + } + if (in6_pton(str, -1, addr->in6.s6_addr, -1, &end) > 0) { +#if IS_ENABLED(CONFIG_IPV6) + if (!*end) + return 1; +#else + return -1; +#endif + } + return -1; +} + #ifdef CONFIG_NETCONSOLE_DYNAMIC /* @@ -1742,26 +1762,6 @@ static void write_msg(struct console *con, const char *msg, unsigned int len) spin_unlock_irqrestore(&target_list_lock, flags); } -static int netpoll_parse_ip_addr(const char *str, union inet_addr *addr) -{ - const char *end; - - if (!strchr(str, ':') && - in4_pton(str, -1, (void *)addr, -1, &end) > 0) { - if (!*end) - return 0; - } - if (in6_pton(str, -1, addr->in6.s6_addr, -1, &end) > 0) { -#if IS_ENABLED(CONFIG_IPV6) - if (!*end) - return 1; -#else - return -1; -#endif - } - return -1; -} - static int netconsole_parser_cmdline(struct netpoll *np, char *opt) { bool ipversion_set = false; From 364213b736e313fab963b272ebe5a2ffeb888831 Mon Sep 17 00:00:00 2001 From: Breno Leitao Date: Mon, 11 Aug 2025 11:13:26 -0700 Subject: [PATCH 0031/1536] netconsole: add support for strings with new line in netpoll_parse_ip_addr The current IP address parsing logic fails when the input string contains a trailing newline character. This can occur when IP addresses are provided through configfs, which contains newlines in a const buffer. Teach netpoll_parse_ip_addr() how to ignore newlines at the end of the IPs. Also, simplify the code by: * No need to check for separators. Try to parse ipv4, if it fails try ipv6 similarly to ceph_pton() * If ipv6 is not supported, don't call in6_pton() at all. Signed-off-by: Breno Leitao Link: https://patch.msgid.link/20250811-netconsole_ref-v4-2-9c510d8713a2@debian.org Signed-off-by: Jakub Kicinski --- drivers/net/netconsole.c | 33 ++++++++++++++++++++------------- 1 file changed, 20 insertions(+), 13 deletions(-) diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c index 8d1b93264e0f..2919522d963e 100644 --- a/drivers/net/netconsole.c +++ b/drivers/net/netconsole.c @@ -300,23 +300,30 @@ static void netconsole_print_banner(struct netpoll *np) np_info(np, "remote ethernet address %pM\n", np->remote_mac); } +/* Parse the string and populate the `inet_addr` union. Return 0 if IPv4 is + * populated, 1 if IPv6 is populated, and -1 upon failure. + */ static int netpoll_parse_ip_addr(const char *str, union inet_addr *addr) { - const char *end; + const char *end = NULL; + int len; - if (!strchr(str, ':') && - in4_pton(str, -1, (void *)addr, -1, &end) > 0) { - if (!*end) - return 0; - } - if (in6_pton(str, -1, addr->in6.s6_addr, -1, &end) > 0) { -#if IS_ENABLED(CONFIG_IPV6) - if (!*end) - return 1; -#else + len = strlen(str); + if (!len) return -1; -#endif - } + + if (str[len - 1] == '\n') + len -= 1; + + if (in4_pton(str, len, (void *)addr, -1, &end) > 0 && + (!end || *end == 0 || *end == '\n')) + return 0; + + if (IS_ENABLED(CONFIG_IPV6) && + in6_pton(str, len, (void *)addr, -1, &end) > 0 && + (!end || *end == 0 || *end == '\n')) + return 1; + return -1; } From 60cb69214148fbe7fc50239c28e4d052eec6ae61 Mon Sep 17 00:00:00 2001 From: Breno Leitao Date: Mon, 11 Aug 2025 11:13:27 -0700 Subject: [PATCH 0032/1536] netconsole: use netpoll_parse_ip_addr in local_ip_store Replace manual IP address parsing with a call to netpoll_parse_ip_addr in local_ip_store(), simplifying the code and reducing the chance of errors. Also, remove the pr_err() if the user enters an invalid value in configfs entries. pr_err() is not the best way to alert user that the configuration is invalid. Signed-off-by: Breno Leitao Reviewed-by: Simon Horman Signed-off-by: Jakub Kicinski --- drivers/net/netconsole.c | 22 +++++----------------- 1 file changed, 5 insertions(+), 17 deletions(-) diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c index 2919522d963e..a9b30b5891d7 100644 --- a/drivers/net/netconsole.c +++ b/drivers/net/netconsole.c @@ -757,6 +757,7 @@ static ssize_t local_ip_store(struct config_item *item, const char *buf, { struct netconsole_target *nt = to_target(item); ssize_t ret = -EINVAL; + int ipv6; mutex_lock(&dynamic_netconsole_mutex); if (nt->enabled) { @@ -765,23 +766,10 @@ static ssize_t local_ip_store(struct config_item *item, const char *buf, goto out_unlock; } - if (strnchr(buf, count, ':')) { - const char *end; - - if (in6_pton(buf, count, nt->np.local_ip.in6.s6_addr, -1, &end) > 0) { - if (*end && *end != '\n') { - pr_err("invalid IPv6 address at: <%c>\n", *end); - goto out_unlock; - } - nt->np.ipv6 = true; - } else - goto out_unlock; - } else { - if (!nt->np.ipv6) - nt->np.local_ip.ip = in_aton(buf); - else - goto out_unlock; - } + ipv6 = netpoll_parse_ip_addr(buf, &nt->np.local_ip); + if (ipv6 == -1) + goto out_unlock; + nt->np.ipv6 = !!ipv6; ret = strnlen(buf, count); out_unlock: From 4aeb452c237afa6d7e8d09185bca6e34454ca333 Mon Sep 17 00:00:00 2001 From: Breno Leitao Date: Mon, 11 Aug 2025 11:13:28 -0700 Subject: [PATCH 0033/1536] netconsole: use netpoll_parse_ip_addr in local_ip_store Replace manual IP address parsing with a call to netpoll_parse_ip_addr in remote_ip_store(), simplifying the code and reducing the chance of errors. The error message got removed, since it is not a good practice to pr_err() if used pass a wrong value in configfs. Signed-off-by: Breno Leitao Reviewed-by: Simon Horman Signed-off-by: Jakub Kicinski --- drivers/net/netconsole.c | 22 +++++----------------- 1 file changed, 5 insertions(+), 17 deletions(-) diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c index a9b30b5891d7..194570443493 100644 --- a/drivers/net/netconsole.c +++ b/drivers/net/netconsole.c @@ -782,6 +782,7 @@ static ssize_t remote_ip_store(struct config_item *item, const char *buf, { struct netconsole_target *nt = to_target(item); ssize_t ret = -EINVAL; + int ipv6; mutex_lock(&dynamic_netconsole_mutex); if (nt->enabled) { @@ -790,23 +791,10 @@ static ssize_t remote_ip_store(struct config_item *item, const char *buf, goto out_unlock; } - if (strnchr(buf, count, ':')) { - const char *end; - - if (in6_pton(buf, count, nt->np.remote_ip.in6.s6_addr, -1, &end) > 0) { - if (*end && *end != '\n') { - pr_err("invalid IPv6 address at: <%c>\n", *end); - goto out_unlock; - } - nt->np.ipv6 = true; - } else - goto out_unlock; - } else { - if (!nt->np.ipv6) - nt->np.remote_ip.ip = in_aton(buf); - else - goto out_unlock; - } + ipv6 = netpoll_parse_ip_addr(buf, &nt->np.remote_ip); + if (ipv6 == -1) + goto out_unlock; + nt->np.ipv6 = !!ipv6; ret = strnlen(buf, count); out_unlock: From 942224e6baca0d1bdd1801e935f3544f2df37baf Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Mon, 11 Aug 2025 21:53:05 +0000 Subject: [PATCH 0034/1536] selftest: af_unix: Silence -Wflex-array-member-not-at-end warning for scm_inq.c. scm_inq.c has no problem in functionality, but when compiled with -Wflex-array-member-not-at-end, it shows this warning: scm_inq.c:15:24: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] 15 | struct cmsghdr cmsghdr; | ^~~~~~~ Let's silence it. Signed-off-by: Kuniyuki Iwashima Link: https://patch.msgid.link/20250811215432.3379570-3-kuniyu@google.com Signed-off-by: Jakub Kicinski --- tools/testing/selftests/net/af_unix/scm_inq.c | 26 +++++++++---------- 1 file changed, 12 insertions(+), 14 deletions(-) diff --git a/tools/testing/selftests/net/af_unix/scm_inq.c b/tools/testing/selftests/net/af_unix/scm_inq.c index 9d22561e7b8f..fc467714387e 100644 --- a/tools/testing/selftests/net/af_unix/scm_inq.c +++ b/tools/testing/selftests/net/af_unix/scm_inq.c @@ -11,11 +11,6 @@ #define NR_CHUNKS 100 #define MSG_LEN 256 -struct scm_inq { - struct cmsghdr cmsghdr; - int inq; -}; - FIXTURE(scm_inq) { int fd[2]; @@ -70,35 +65,38 @@ static void send_chunks(struct __test_metadata *_metadata, static void recv_chunks(struct __test_metadata *_metadata, FIXTURE_DATA(scm_inq) *self) { + char cmsg_buf[CMSG_SPACE(sizeof(int))]; struct msghdr msg = {}; struct iovec iov = {}; - struct scm_inq cmsg; + struct cmsghdr *cmsg; char buf[MSG_LEN]; int i, ret; int inq; msg.msg_iov = &iov; msg.msg_iovlen = 1; - msg.msg_control = &cmsg; - msg.msg_controllen = CMSG_SPACE(sizeof(cmsg.inq)); + msg.msg_control = cmsg_buf; + msg.msg_controllen = sizeof(cmsg_buf); iov.iov_base = buf; iov.iov_len = sizeof(buf); for (i = 0; i < NR_CHUNKS; i++) { memset(buf, 0, sizeof(buf)); - memset(&cmsg, 0, sizeof(cmsg)); + memset(cmsg_buf, 0, sizeof(cmsg_buf)); ret = recvmsg(self->fd[1], &msg, 0); ASSERT_EQ(MSG_LEN, ret); - ASSERT_NE(NULL, CMSG_FIRSTHDR(&msg)); - ASSERT_EQ(CMSG_LEN(sizeof(cmsg.inq)), cmsg.cmsghdr.cmsg_len); - ASSERT_EQ(SOL_SOCKET, cmsg.cmsghdr.cmsg_level); - ASSERT_EQ(SCM_INQ, cmsg.cmsghdr.cmsg_type); + + cmsg = CMSG_FIRSTHDR(&msg); + ASSERT_NE(NULL, cmsg); + ASSERT_EQ(CMSG_LEN(sizeof(int)), cmsg->cmsg_len); + ASSERT_EQ(SOL_SOCKET, cmsg->cmsg_level); + ASSERT_EQ(SCM_INQ, cmsg->cmsg_type); ret = ioctl(self->fd[1], SIOCINQ, &inq); ASSERT_EQ(0, ret); - ASSERT_EQ(cmsg.inq, inq); + ASSERT_EQ(*(int *)CMSG_DATA(cmsg), inq); } } From 9a58d8e68252af07138a2530a22f081ef0b070c1 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Mon, 11 Aug 2025 21:53:06 +0000 Subject: [PATCH 0035/1536] selftest: af_unix: Silence -Wflex-array-member-not-at-end warning for scm_rights.c. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit scm_rights.c has no problem in functionality, but when compiled with -Wflex-array-member-not-at-end, it shows this warning: scm_rights.c: In function ‘__send_fd’: scm_rights.c:275:32: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] 275 | struct cmsghdr cmsghdr; | ^~~~~~~ Let's silence it. Signed-off-by: Kuniyuki Iwashima Link: https://patch.msgid.link/20250811215432.3379570-4-kuniyu@google.com Signed-off-by: Jakub Kicinski --- .../selftests/net/af_unix/scm_rights.c | 28 +++++++++---------- 1 file changed, 13 insertions(+), 15 deletions(-) diff --git a/tools/testing/selftests/net/af_unix/scm_rights.c b/tools/testing/selftests/net/af_unix/scm_rights.c index 8b015f16c03d..914f99d153ce 100644 --- a/tools/testing/selftests/net/af_unix/scm_rights.c +++ b/tools/testing/selftests/net/af_unix/scm_rights.c @@ -271,20 +271,11 @@ void __send_fd(struct __test_metadata *_metadata, { #define MSG "x" #define MSGLEN 1 - struct { - struct cmsghdr cmsghdr; - int fd[2]; - } cmsg = { - .cmsghdr = { - .cmsg_len = CMSG_LEN(sizeof(cmsg.fd)), - .cmsg_level = SOL_SOCKET, - .cmsg_type = SCM_RIGHTS, - }, - .fd = { - self->fd[inflight * 2], - self->fd[inflight * 2], - }, + int fds[2] = { + self->fd[inflight * 2], + self->fd[inflight * 2], }; + char cmsg_buf[CMSG_SPACE(sizeof(fds))]; struct iovec iov = { .iov_base = MSG, .iov_len = MSGLEN, @@ -294,11 +285,18 @@ void __send_fd(struct __test_metadata *_metadata, .msg_namelen = 0, .msg_iov = &iov, .msg_iovlen = 1, - .msg_control = &cmsg, - .msg_controllen = CMSG_SPACE(sizeof(cmsg.fd)), + .msg_control = cmsg_buf, + .msg_controllen = sizeof(cmsg_buf), }; + struct cmsghdr *cmsg; int ret; + cmsg = CMSG_FIRSTHDR(&msg); + cmsg->cmsg_level = SOL_SOCKET; + cmsg->cmsg_type = SCM_RIGHTS; + cmsg->cmsg_len = CMSG_LEN(sizeof(fds)); + memcpy(CMSG_DATA(cmsg), fds, sizeof(fds)); + ret = sendmsg(self->fd[receiver * 2 + 1], &msg, variant->flags); if (variant->disabled) { From fd9faac372cc5f95ca537be5e3daf044c87ffae1 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Mon, 11 Aug 2025 21:53:07 +0000 Subject: [PATCH 0036/1536] selftest: af_unix: Silence -Wall warning for scm_pid.c. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit -Wall found 2 unused variables in scm_pid.c: scm_pidfd.c: In function ‘parse_cmsg’: scm_pidfd.c:140:13: warning: unused variable ‘data’ [-Wunused-variable] 140 | int data = 0; | ^~~~ scm_pidfd.c: In function ‘cmsg_check_dead’: scm_pidfd.c:246:15: warning: unused variable ‘client_pid’ [-Wunused-variable] 246 | pid_t client_pid; | ^~~~~~~~~~ Let's remove these variables. Signed-off-by: Kuniyuki Iwashima Link: https://patch.msgid.link/20250811215432.3379570-5-kuniyu@google.com Signed-off-by: Jakub Kicinski --- tools/testing/selftests/net/af_unix/scm_pidfd.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/tools/testing/selftests/net/af_unix/scm_pidfd.c b/tools/testing/selftests/net/af_unix/scm_pidfd.c index 37e034874034..ef2921988e5f 100644 --- a/tools/testing/selftests/net/af_unix/scm_pidfd.c +++ b/tools/testing/selftests/net/af_unix/scm_pidfd.c @@ -137,7 +137,6 @@ struct cmsg_data { static int parse_cmsg(struct msghdr *msg, struct cmsg_data *res) { struct cmsghdr *cmsg; - int data = 0; if (msg->msg_flags & (MSG_TRUNC | MSG_CTRUNC)) { log_err("recvmsg: truncated"); @@ -243,7 +242,6 @@ static int cmsg_check_dead(int fd, int expected_pid) int data = 0; char control[CMSG_SPACE(sizeof(struct ucred)) + CMSG_SPACE(sizeof(int))] = { 0 }; - pid_t client_pid; struct pidfd_info info = { .mask = PIDFD_INFO_EXIT, }; From 1838731f1072ce859ae3350547de4792f365760c Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Mon, 11 Aug 2025 21:53:04 +0000 Subject: [PATCH 0037/1536] selftest: af_unix: Add -Wall and -Wflex-array-member-not-at-end to CFLAGS. -Wall and -Wflex-array-member-not-at-end caught some warnings that will be fixed in later patches. Signed-off-by: Kuniyuki Iwashima Link: https://patch.msgid.link/20250811215432.3379570-2-kuniyu@google.com Signed-off-by: Jakub Kicinski --- tools/testing/selftests/net/af_unix/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/net/af_unix/Makefile b/tools/testing/selftests/net/af_unix/Makefile index a4b61c6d0290..0a20c98bbcfd 100644 --- a/tools/testing/selftests/net/af_unix/Makefile +++ b/tools/testing/selftests/net/af_unix/Makefile @@ -1,4 +1,4 @@ -CFLAGS += $(KHDR_INCLUDES) +CFLAGS += $(KHDR_INCLUDES) -Wall -Wflex-array-member-not-at-end TEST_GEN_PROGS := diag_uid msg_oob scm_inq scm_pidfd scm_rights unix_connect include ../../lib.mk From 07bbbfe7addf5b032e04f3c38f0b183d067a3f0d Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Mon, 11 Aug 2025 19:50:43 +0100 Subject: [PATCH 0038/1536] net: stmmac: add suspend()/resume() platform ops Add suspend/resume platform operations, which, when populated, override the init/exit platform operations when we suspend and resume. These suspend()/resume() methods are called by core code, and thus are designed to support any struct device, not just platform devices. This allows them to be used by the PCI drivers we have. Signed-off-by: Russell King (Oracle) Reviewed-by: Maxime Chevallier Link: https://patch.msgid.link/E1ulXbX-008gqZ-Bb@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 9 +++++++++ .../net/ethernet/stmicro/stmmac/stmmac_platform.c | 12 ++++++++---- include/linux/stmmac.h | 2 ++ 3 files changed, 19 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index f1abf4242cd2..2da4f7bb2899 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -7879,6 +7879,9 @@ int stmmac_suspend(struct device *dev) if (stmmac_fpe_supported(priv)) ethtool_mmsv_stop(&priv->fpe_cfg.mmsv); + if (priv->plat->suspend) + return priv->plat->suspend(dev, priv->plat->bsp_priv); + return 0; } EXPORT_SYMBOL_GPL(stmmac_suspend); @@ -7931,6 +7934,12 @@ int stmmac_resume(struct device *dev) struct stmmac_priv *priv = netdev_priv(ndev); int ret; + if (priv->plat->resume) { + ret = priv->plat->resume(dev, priv->plat->bsp_priv); + if (ret) + return ret; + } + if (!netif_running(ndev)) return 0; diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c index 030fcf1b5993..21df052eeed0 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c @@ -901,7 +901,9 @@ static int __maybe_unused stmmac_pltfr_suspend(struct device *dev) struct platform_device *pdev = to_platform_device(dev); ret = stmmac_suspend(dev); - stmmac_pltfr_exit(pdev, priv->plat); + + if (!priv->plat->suspend) + stmmac_pltfr_exit(pdev, priv->plat); return ret; } @@ -920,9 +922,11 @@ static int __maybe_unused stmmac_pltfr_resume(struct device *dev) struct platform_device *pdev = to_platform_device(dev); int ret; - ret = stmmac_pltfr_init(pdev, priv->plat); - if (ret) - return ret; + if (!priv->plat->resume) { + ret = stmmac_pltfr_init(pdev, priv->plat); + if (ret) + return ret; + } return stmmac_resume(dev); } diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h index 26ddf95d23f9..22c24dacbc65 100644 --- a/include/linux/stmmac.h +++ b/include/linux/stmmac.h @@ -248,6 +248,8 @@ struct plat_stmmacenet_data { void (*ptp_clk_freq_config)(struct stmmac_priv *priv); int (*init)(struct platform_device *pdev, void *priv); void (*exit)(struct platform_device *pdev, void *priv); + int (*suspend)(struct device *dev, void *priv); + int (*resume)(struct device *dev, void *priv); struct mac_device_info *(*setup)(void *priv); int (*clks_config)(void *priv, bool enabled); int (*crosststamp)(ktime_t *device, struct system_counterval_t *system, From 7e84b3fae58c5425c324277268cad431ad181599 Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Mon, 11 Aug 2025 19:50:48 +0100 Subject: [PATCH 0039/1536] net: stmmac: provide a set of simple PM ops Several drivers will want to make use of simple PM operations, so provide these from the core driver. Signed-off-by: Russell King (Oracle) Reviewed-by: Maxime Chevallier Link: https://patch.msgid.link/E1ulXbc-008gqf-GJ@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/stmicro/stmmac/stmmac.h | 2 ++ drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 3 +++ 2 files changed, 5 insertions(+) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h index cda09cf5dcca..bf95f03dd33f 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h @@ -374,6 +374,8 @@ enum stmmac_state { STMMAC_SERVICE_SCHED, }; +extern const struct dev_pm_ops stmmac_simple_pm_ops; + int stmmac_mdio_unregister(struct net_device *ndev); int stmmac_mdio_register(struct net_device *ndev); int stmmac_mdio_reset(struct mii_bus *mii); diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index 2da4f7bb2899..4a82045ea6eb 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -8013,6 +8013,9 @@ int stmmac_resume(struct device *dev) } EXPORT_SYMBOL_GPL(stmmac_resume); +EXPORT_GPL_SIMPLE_DEV_PM_OPS(stmmac_simple_pm_ops, stmmac_suspend, + stmmac_resume); + #ifndef MODULE static int __init stmmac_cmdline_opt(char *str) { From b51f34bc85e3b8c24f091a8e216b28213758021d Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Mon, 11 Aug 2025 19:50:53 +0100 Subject: [PATCH 0040/1536] net: stmmac: platform: legacy hooks for suspend()/resume() methods Add legacy hooks for the suspend() and resume() methods to forward these calls to the init() and exit() methods when the platform code hasn't populated the two former methods. This allows us to get rid of stmmac_pltfr_suspend() and stmmac_pltfr_resume(). Signed-off-by: Russell King (Oracle) Reviewed-by: Maxime Chevallier Link: https://patch.msgid.link/E1ulXbh-008gql-LO@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski --- .../ethernet/stmicro/stmmac/stmmac_platform.c | 68 ++++++------------- 1 file changed, 22 insertions(+), 46 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c index 21df052eeed0..c849676d98e8 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c @@ -811,6 +811,22 @@ static void stmmac_pltfr_exit(struct platform_device *pdev, plat->exit(pdev, plat->bsp_priv); } +static int stmmac_plat_suspend(struct device *dev, void *bsp_priv) +{ + struct stmmac_priv *priv = netdev_priv(dev_get_drvdata(dev)); + + stmmac_pltfr_exit(to_platform_device(dev), priv->plat); + + return 0; +} + +static int stmmac_plat_resume(struct device *dev, void *bsp_priv) +{ + struct stmmac_priv *priv = netdev_priv(dev_get_drvdata(dev)); + + return stmmac_pltfr_init(to_platform_device(dev), priv->plat); +} + /** * stmmac_pltfr_probe * @pdev: platform device pointer @@ -825,6 +841,11 @@ int stmmac_pltfr_probe(struct platform_device *pdev, { int ret; + if (!plat->suspend && plat->exit) + plat->suspend = stmmac_plat_suspend; + if (!plat->resume && plat->init) + plat->resume = stmmac_plat_resume; + ret = stmmac_pltfr_init(pdev, plat); if (ret) return ret; @@ -886,51 +907,6 @@ void stmmac_pltfr_remove(struct platform_device *pdev) } EXPORT_SYMBOL_GPL(stmmac_pltfr_remove); -/** - * stmmac_pltfr_suspend - * @dev: device pointer - * Description: this function is invoked when suspend the driver and it direcly - * call the main suspend function and then, if required, on some platform, it - * can call an exit helper. - */ -static int __maybe_unused stmmac_pltfr_suspend(struct device *dev) -{ - int ret; - struct net_device *ndev = dev_get_drvdata(dev); - struct stmmac_priv *priv = netdev_priv(ndev); - struct platform_device *pdev = to_platform_device(dev); - - ret = stmmac_suspend(dev); - - if (!priv->plat->suspend) - stmmac_pltfr_exit(pdev, priv->plat); - - return ret; -} - -/** - * stmmac_pltfr_resume - * @dev: device pointer - * Description: this function is invoked when resume the driver before calling - * the main resume function, on some platforms, it can call own init helper - * if required. - */ -static int __maybe_unused stmmac_pltfr_resume(struct device *dev) -{ - struct net_device *ndev = dev_get_drvdata(dev); - struct stmmac_priv *priv = netdev_priv(ndev); - struct platform_device *pdev = to_platform_device(dev); - int ret; - - if (!priv->plat->resume) { - ret = stmmac_pltfr_init(pdev, priv->plat); - if (ret) - return ret; - } - - return stmmac_resume(dev); -} - static int __maybe_unused stmmac_runtime_suspend(struct device *dev) { struct net_device *ndev = dev_get_drvdata(dev); @@ -998,7 +974,7 @@ static int __maybe_unused stmmac_pltfr_noirq_resume(struct device *dev) } const struct dev_pm_ops stmmac_pltfr_pm_ops = { - SET_SYSTEM_SLEEP_PM_OPS(stmmac_pltfr_suspend, stmmac_pltfr_resume) + SET_SYSTEM_SLEEP_PM_OPS(stmmac_suspend, stmmac_resume) SET_RUNTIME_PM_OPS(stmmac_runtime_suspend, stmmac_runtime_resume, NULL) SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(stmmac_pltfr_noirq_suspend, stmmac_pltfr_noirq_resume) }; From 062b428017332adf8cee09612db2ae0495b4eb4e Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Mon, 11 Aug 2025 19:50:58 +0100 Subject: [PATCH 0041/1536] net: stmmac: intel: convert to suspend()/resume() methods Convert intel to use the new suspend() and resume() methods rather than implementing these in custom wrappers around the main driver's suspend/resume methods. This allows this driver to use the stmmac simple PM ops structure. Signed-off-by: Russell King (Oracle) Link: https://patch.msgid.link/E1ulXbm-008gqs-P9@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski --- .../net/ethernet/stmicro/stmmac/dwmac-intel.c | 74 +++++++++---------- 1 file changed, 35 insertions(+), 39 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c index ea33ae39be6b..3fac3945cbfa 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c @@ -1231,6 +1231,37 @@ static int stmmac_config_multi_msi(struct pci_dev *pdev, return 0; } +static int intel_eth_pci_suspend(struct device *dev, void *bsp_priv) +{ + struct pci_dev *pdev = to_pci_dev(dev); + int ret; + + ret = pci_save_state(pdev); + if (ret) + return ret; + + pci_wake_from_d3(pdev, true); + pci_set_power_state(pdev, PCI_D3hot); + return 0; +} + +static int intel_eth_pci_resume(struct device *dev, void *bsp_priv) +{ + struct pci_dev *pdev = to_pci_dev(dev); + int ret; + + pci_restore_state(pdev); + pci_set_power_state(pdev, PCI_D0); + + ret = pcim_enable_device(pdev); + if (ret) + return ret; + + pci_set_master(pdev); + + return 0; +} + /** * intel_eth_pci_probe * @@ -1292,6 +1323,9 @@ static int intel_eth_pci_probe(struct pci_dev *pdev, pci_set_master(pdev); plat->bsp_priv = intel_priv; + plat->suspend = intel_eth_pci_suspend; + plat->resume = intel_eth_pci_resume; + intel_priv->mdio_adhoc_addr = INTEL_MGBE_ADHOC_ADDR; intel_priv->crossts_adj = 1; @@ -1355,44 +1389,6 @@ static void intel_eth_pci_remove(struct pci_dev *pdev) clk_unregister_fixed_rate(priv->plat->stmmac_clk); } -static int __maybe_unused intel_eth_pci_suspend(struct device *dev) -{ - struct pci_dev *pdev = to_pci_dev(dev); - int ret; - - ret = stmmac_suspend(dev); - if (ret) - return ret; - - ret = pci_save_state(pdev); - if (ret) - return ret; - - pci_wake_from_d3(pdev, true); - pci_set_power_state(pdev, PCI_D3hot); - return 0; -} - -static int __maybe_unused intel_eth_pci_resume(struct device *dev) -{ - struct pci_dev *pdev = to_pci_dev(dev); - int ret; - - pci_restore_state(pdev); - pci_set_power_state(pdev, PCI_D0); - - ret = pcim_enable_device(pdev); - if (ret) - return ret; - - pci_set_master(pdev); - - return stmmac_resume(dev); -} - -static SIMPLE_DEV_PM_OPS(intel_eth_pm_ops, intel_eth_pci_suspend, - intel_eth_pci_resume); - #define PCI_DEVICE_ID_INTEL_QUARK 0x0937 #define PCI_DEVICE_ID_INTEL_EHL_RGMII1G 0x4b30 #define PCI_DEVICE_ID_INTEL_EHL_SGMII1G 0x4b31 @@ -1442,7 +1438,7 @@ static struct pci_driver intel_eth_pci_driver = { .probe = intel_eth_pci_probe, .remove = intel_eth_pci_remove, .driver = { - .pm = &intel_eth_pm_ops, + .pm = &stmmac_simple_pm_ops, }, }; From 38772638d6d1989655f5a05b273dc62a026e0ef1 Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Mon, 11 Aug 2025 19:51:04 +0100 Subject: [PATCH 0042/1536] net: stmmac: loongson: convert to suspend()/resume() methods Convert loongson to use the new suspend() and resume() methods rather than implementing these in custom wrappers around the main driver's suspend/resume methods. This allows this driver to use the stmmac simple PM ops structure. Signed-off-by: Russell King (Oracle) Link: https://patch.msgid.link/E1ulXbs-008gqy-16@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski --- .../ethernet/stmicro/stmmac/dwmac-loongson.c | 73 +++++++++---------- 1 file changed, 34 insertions(+), 39 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c index e1591e6217d4..5769165ee5ba 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c @@ -521,6 +521,37 @@ static int loongson_dwmac_fix_reset(void *priv, void __iomem *ioaddr) 10000, 2000000); } +static int loongson_dwmac_suspend(struct device *dev, void *bsp_priv) +{ + struct pci_dev *pdev = to_pci_dev(dev); + int ret; + + ret = pci_save_state(pdev); + if (ret) + return ret; + + pci_disable_device(pdev); + pci_wake_from_d3(pdev, true); + return 0; +} + +static int loongson_dwmac_resume(struct device *dev, void *bsp_priv) +{ + struct pci_dev *pdev = to_pci_dev(dev); + int ret; + + pci_restore_state(pdev); + pci_set_power_state(pdev, PCI_D0); + + ret = pci_enable_device(pdev); + if (ret) + return ret; + + pci_set_master(pdev); + + return 0; +} + static int loongson_dwmac_probe(struct pci_dev *pdev, const struct pci_device_id *id) { struct plat_stmmacenet_data *plat; @@ -565,6 +596,8 @@ static int loongson_dwmac_probe(struct pci_dev *pdev, const struct pci_device_id plat->bsp_priv = ld; plat->setup = loongson_dwmac_setup; plat->fix_soc_reset = loongson_dwmac_fix_reset; + plat->suspend = loongson_dwmac_suspend; + plat->resume = loongson_dwmac_resume; ld->dev = &pdev->dev; ld->loongson_id = readl(res.addr + GMAC_VERSION) & 0xff; @@ -621,44 +654,6 @@ static void loongson_dwmac_remove(struct pci_dev *pdev) pci_disable_device(pdev); } -static int __maybe_unused loongson_dwmac_suspend(struct device *dev) -{ - struct pci_dev *pdev = to_pci_dev(dev); - int ret; - - ret = stmmac_suspend(dev); - if (ret) - return ret; - - ret = pci_save_state(pdev); - if (ret) - return ret; - - pci_disable_device(pdev); - pci_wake_from_d3(pdev, true); - return 0; -} - -static int __maybe_unused loongson_dwmac_resume(struct device *dev) -{ - struct pci_dev *pdev = to_pci_dev(dev); - int ret; - - pci_restore_state(pdev); - pci_set_power_state(pdev, PCI_D0); - - ret = pci_enable_device(pdev); - if (ret) - return ret; - - pci_set_master(pdev); - - return stmmac_resume(dev); -} - -static SIMPLE_DEV_PM_OPS(loongson_dwmac_pm_ops, loongson_dwmac_suspend, - loongson_dwmac_resume); - static const struct pci_device_id loongson_dwmac_id_table[] = { { PCI_DEVICE_DATA(LOONGSON, GMAC1, &loongson_gmac_pci_info) }, { PCI_DEVICE_DATA(LOONGSON, GMAC2, &loongson_gmac_pci_info) }, @@ -673,7 +668,7 @@ static struct pci_driver loongson_dwmac_driver = { .probe = loongson_dwmac_probe, .remove = loongson_dwmac_remove, .driver = { - .pm = &loongson_dwmac_pm_ops, + .pm = &stmmac_simple_pm_ops, }, }; From c91918a1e9767fc40488f69436f61af90221615b Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Mon, 11 Aug 2025 19:51:09 +0100 Subject: [PATCH 0043/1536] net: stmmac: pci: convert to suspend()/resume() methods Convert pci to use the new suspend() and resume() methods rather than implementing these in custom wrappers around the main driver's suspend/resume methods. Signed-off-by: Russell King (Oracle) Link: https://patch.msgid.link/E1ulXbx-008gr4-5H@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski --- .../net/ethernet/stmicro/stmmac/stmmac_pci.c | 73 +++++++++---------- 1 file changed, 35 insertions(+), 38 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c index 9c1b54b701f7..e6a7d0ddac2a 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c @@ -138,6 +138,37 @@ static const struct stmmac_pci_info snps_gmac5_pci_info = { .setup = snps_gmac5_default_data, }; +static int stmmac_pci_suspend(struct device *dev, void *bsp_priv) +{ + struct pci_dev *pdev = to_pci_dev(dev); + int ret; + + ret = pci_save_state(pdev); + if (ret) + return ret; + + pci_disable_device(pdev); + pci_wake_from_d3(pdev, true); + return 0; +} + +static int stmmac_pci_resume(struct device *dev, void *bsp_priv) +{ + struct pci_dev *pdev = to_pci_dev(dev); + int ret; + + pci_restore_state(pdev); + pci_set_power_state(pdev, PCI_D0); + + ret = pci_enable_device(pdev); + if (ret) + return ret; + + pci_set_master(pdev); + + return 0; +} + /** * stmmac_pci_probe * @@ -217,6 +248,9 @@ static int stmmac_pci_probe(struct pci_dev *pdev, plat->safety_feat_cfg->prtyen = 1; plat->safety_feat_cfg->tmouten = 1; + plat->suspend = stmmac_pci_suspend; + plat->resume = stmmac_pci_resume; + return stmmac_dvr_probe(&pdev->dev, plat, &res); } @@ -231,43 +265,6 @@ static void stmmac_pci_remove(struct pci_dev *pdev) stmmac_dvr_remove(&pdev->dev); } -static int __maybe_unused stmmac_pci_suspend(struct device *dev) -{ - struct pci_dev *pdev = to_pci_dev(dev); - int ret; - - ret = stmmac_suspend(dev); - if (ret) - return ret; - - ret = pci_save_state(pdev); - if (ret) - return ret; - - pci_disable_device(pdev); - pci_wake_from_d3(pdev, true); - return 0; -} - -static int __maybe_unused stmmac_pci_resume(struct device *dev) -{ - struct pci_dev *pdev = to_pci_dev(dev); - int ret; - - pci_restore_state(pdev); - pci_set_power_state(pdev, PCI_D0); - - ret = pci_enable_device(pdev); - if (ret) - return ret; - - pci_set_master(pdev); - - return stmmac_resume(dev); -} - -static SIMPLE_DEV_PM_OPS(stmmac_pm_ops, stmmac_pci_suspend, stmmac_pci_resume); - /* synthetic ID, no official vendor */ #define PCI_VENDOR_ID_STMMAC 0x0700 @@ -289,7 +286,7 @@ static struct pci_driver stmmac_pci_driver = { .probe = stmmac_pci_probe, .remove = stmmac_pci_remove, .driver = { - .pm = &stmmac_pm_ops, + .pm = &stmmac_simple_pm_ops, }, }; From d7a276a5768ffe4f3dbcb0a2d9d037f117b53784 Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Mon, 11 Aug 2025 19:51:14 +0100 Subject: [PATCH 0044/1536] net: stmmac: rk: convert to suspend()/resume() methods Convert rk to use the new suspend() and resume() methods rather than implementing these in custom wrappers around the main driver's suspend/resume methods. This allows this driver to use the simmac simple PM ops structure. We can further simplify the driver as there is no need to track whether the device was suspended, we only need to check whether the device is wakeup capable in the resume method. This is because the resume method will only be called after the suspend method. Signed-off-by: Russell King (Oracle) Link: https://patch.msgid.link/E1ulXc2-008grB-9k@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski --- .../net/ethernet/stmicro/stmmac/dwmac-rk.c | 58 ++++++++----------- 1 file changed, 25 insertions(+), 33 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c index 79b92130a03f..ac8288301994 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c @@ -71,7 +71,6 @@ struct rk_priv_data { phy_interface_t phy_iface; int id; struct regulator *regulator; - bool suspended; const struct rk_gmac_ops *ops; bool clk_enabled; @@ -1706,6 +1705,28 @@ static int rk_set_clk_tx_rate(void *bsp_priv_, struct clk *clk_tx_i, return -EINVAL; } +static int rk_gmac_suspend(struct device *dev, void *bsp_priv_) +{ + struct rk_priv_data *bsp_priv = bsp_priv_; + + /* Keep the PHY up if we use Wake-on-Lan. */ + if (!device_may_wakeup(dev)) + rk_gmac_powerdown(bsp_priv); + + return 0; +} + +static int rk_gmac_resume(struct device *dev, void *bsp_priv_) +{ + struct rk_priv_data *bsp_priv = bsp_priv_; + + /* The PHY was up for Wake-on-Lan. */ + if (!device_may_wakeup(dev)) + rk_gmac_powerup(bsp_priv); + + return 0; +} + static int rk_gmac_probe(struct platform_device *pdev) { struct plat_stmmacenet_data *plat_dat; @@ -1738,6 +1759,8 @@ static int rk_gmac_probe(struct platform_device *pdev) plat_dat->get_interfaces = rk_get_interfaces; plat_dat->set_clk_tx_rate = rk_set_clk_tx_rate; + plat_dat->suspend = rk_gmac_suspend; + plat_dat->resume = rk_gmac_resume; plat_dat->bsp_priv = rk_gmac_setup(pdev, plat_dat, data); if (IS_ERR(plat_dat->bsp_priv)) @@ -1772,37 +1795,6 @@ static void rk_gmac_remove(struct platform_device *pdev) rk_gmac_powerdown(bsp_priv); } -#ifdef CONFIG_PM_SLEEP -static int rk_gmac_suspend(struct device *dev) -{ - struct rk_priv_data *bsp_priv = get_stmmac_bsp_priv(dev); - int ret = stmmac_suspend(dev); - - /* Keep the PHY up if we use Wake-on-Lan. */ - if (!device_may_wakeup(dev)) { - rk_gmac_powerdown(bsp_priv); - bsp_priv->suspended = true; - } - - return ret; -} - -static int rk_gmac_resume(struct device *dev) -{ - struct rk_priv_data *bsp_priv = get_stmmac_bsp_priv(dev); - - /* The PHY was up for Wake-on-Lan. */ - if (bsp_priv->suspended) { - rk_gmac_powerup(bsp_priv); - bsp_priv->suspended = false; - } - - return stmmac_resume(dev); -} -#endif /* CONFIG_PM_SLEEP */ - -static SIMPLE_DEV_PM_OPS(rk_gmac_pm_ops, rk_gmac_suspend, rk_gmac_resume); - static const struct of_device_id rk_gmac_dwmac_match[] = { { .compatible = "rockchip,px30-gmac", .data = &px30_ops }, { .compatible = "rockchip,rk3128-gmac", .data = &rk3128_ops }, @@ -1828,7 +1820,7 @@ static struct platform_driver rk_gmac_dwmac_driver = { .remove = rk_gmac_remove, .driver = { .name = "rk_gmac-dwmac", - .pm = &rk_gmac_pm_ops, + .pm = &stmmac_simple_pm_ops, .of_match_table = rk_gmac_dwmac_match, }, }; From c7308b2f3d0d045005bc9d2110ac5d5085a4fe0c Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Mon, 11 Aug 2025 19:51:19 +0100 Subject: [PATCH 0045/1536] net: stmmac: stm32: convert to suspend()/resume() methods Convert stm32 to use the new suspend() and resume() methods rather than implementing these in custom wrappers around the main driver's suspend/resume methods. This allows this driver to use the stmmac simple PM ops structure. Signed-off-by: Russell King (Oracle) Link: https://patch.msgid.link/E1ulXc7-008grH-Dh@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski --- .../net/ethernet/stmicro/stmmac/dwmac-stm32.c | 68 +++++++------------ 1 file changed, 23 insertions(+), 45 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-stm32.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-stm32.c index 1eb16eec9c0d..77a04c4579c9 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-stm32.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-stm32.c @@ -498,6 +498,26 @@ static int stm32mp1_parse_data(struct stm32_dwmac *dwmac, return err; } +static int stm32_dwmac_suspend(struct device *dev, void *bsp_priv) +{ + struct stm32_dwmac *dwmac = bsp_priv; + + stm32_dwmac_clk_disable(dwmac); + + return dwmac->ops->suspend ? dwmac->ops->suspend(dwmac) : 0; +} + +static int stm32_dwmac_resume(struct device *dev, void *bsp_priv) +{ + struct stmmac_priv *priv = netdev_priv(dev_get_drvdata(dev)); + struct stm32_dwmac *dwmac = bsp_priv; + + if (dwmac->ops->resume) + dwmac->ops->resume(dwmac); + + return stm32_dwmac_init(priv->plat); +} + static int stm32_dwmac_probe(struct platform_device *pdev) { struct plat_stmmacenet_data *plat_dat; @@ -535,6 +555,8 @@ static int stm32_dwmac_probe(struct platform_device *pdev) plat_dat->flags |= STMMAC_FLAG_EN_TX_LPI_CLK_PHY_CAP; plat_dat->bsp_priv = dwmac; + plat_dat->suspend = stm32_dwmac_suspend; + plat_dat->resume = stm32_dwmac_resume; ret = stm32_dwmac_init(plat_dat); if (ret) @@ -600,50 +622,6 @@ static void stm32mp1_resume(struct stm32_dwmac *dwmac) clk_disable_unprepare(dwmac->clk_ethstp); } -#ifdef CONFIG_PM_SLEEP -static int stm32_dwmac_suspend(struct device *dev) -{ - struct net_device *ndev = dev_get_drvdata(dev); - struct stmmac_priv *priv = netdev_priv(ndev); - struct stm32_dwmac *dwmac = priv->plat->bsp_priv; - - int ret; - - ret = stmmac_suspend(dev); - if (ret) - return ret; - - stm32_dwmac_clk_disable(dwmac); - - if (dwmac->ops->suspend) - ret = dwmac->ops->suspend(dwmac); - - return ret; -} - -static int stm32_dwmac_resume(struct device *dev) -{ - struct net_device *ndev = dev_get_drvdata(dev); - struct stmmac_priv *priv = netdev_priv(ndev); - struct stm32_dwmac *dwmac = priv->plat->bsp_priv; - int ret; - - if (dwmac->ops->resume) - dwmac->ops->resume(dwmac); - - ret = stm32_dwmac_init(priv->plat); - if (ret) - return ret; - - ret = stmmac_resume(dev); - - return ret; -} -#endif /* CONFIG_PM_SLEEP */ - -static SIMPLE_DEV_PM_OPS(stm32_dwmac_pm_ops, - stm32_dwmac_suspend, stm32_dwmac_resume); - static struct stm32_ops stm32mcu_dwmac_data = { .set_mode = stm32mcu_set_mode }; @@ -691,7 +669,7 @@ static struct platform_driver stm32_dwmac_driver = { .remove = stm32_dwmac_remove, .driver = { .name = "stm32-dwmac", - .pm = &stm32_dwmac_pm_ops, + .pm = &stmmac_simple_pm_ops, .of_match_table = stm32_dwmac_match, }, }; From d6e1f2272960cdb5081e9318d6a04de478d80400 Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Mon, 11 Aug 2025 19:51:24 +0100 Subject: [PATCH 0046/1536] net: stmmac: mediatek: convert to resume() method Convert mediatek to use the resume() platform method rather than the init() platform method as mediatek_dwmac_init() is only called from the resume paths. This will ensure that in a future commit, mediatek_dwmac_init() won't be called when probing the main part of the stmmac driver. Signed-off-by: Russell King (Oracle) Link: https://patch.msgid.link/E1ulXcC-008grN-Hc@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/stmicro/stmmac/dwmac-mediatek.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-mediatek.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-mediatek.c index 39421d6a34e4..f1b36f0a401d 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-mediatek.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-mediatek.c @@ -523,7 +523,7 @@ static int mediatek_dwmac_clk_init(struct mediatek_dwmac_plat_data *plat) return ret; } -static int mediatek_dwmac_init(struct platform_device *pdev, void *priv) +static int mediatek_dwmac_init(struct device *dev, void *priv) { struct mediatek_dwmac_plat_data *plat = priv; const struct mediatek_dwmac_variant *variant = plat->variant; @@ -532,7 +532,7 @@ static int mediatek_dwmac_init(struct platform_device *pdev, void *priv) if (variant->dwmac_set_phy_interface) { ret = variant->dwmac_set_phy_interface(plat); if (ret) { - dev_err(plat->dev, "failed to set phy interface, err = %d\n", ret); + dev_err(dev, "failed to set phy interface, err = %d\n", ret); return ret; } } @@ -540,7 +540,7 @@ static int mediatek_dwmac_init(struct platform_device *pdev, void *priv) if (variant->dwmac_set_delay) { ret = variant->dwmac_set_delay(plat); if (ret) { - dev_err(plat->dev, "failed to set delay value, err = %d\n", ret); + dev_err(dev, "failed to set delay value, err = %d\n", ret); return ret; } } @@ -589,7 +589,7 @@ static int mediatek_dwmac_common_data(struct platform_device *pdev, plat->maxmtu = ETH_DATA_LEN; plat->host_dma_width = priv_plat->variant->dma_bit_mask; plat->bsp_priv = priv_plat; - plat->init = mediatek_dwmac_init; + plat->resume = mediatek_dwmac_init; plat->clks_config = mediatek_dwmac_clks_config; plat->safety_feat_cfg = devm_kzalloc(&pdev->dev, @@ -654,7 +654,7 @@ static int mediatek_dwmac_probe(struct platform_device *pdev) return PTR_ERR(plat_dat); mediatek_dwmac_common_data(pdev, plat_dat, priv_plat); - mediatek_dwmac_init(pdev, priv_plat); + mediatek_dwmac_init(&pdev->dev, priv_plat); ret = mediatek_dwmac_clks_config(priv_plat, true); if (ret) From 27e5b560a86e479bef247a1327182d155ed0bad6 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Mon, 11 Aug 2025 16:13:30 -0700 Subject: [PATCH 0047/1536] selftests: drv-net: add configs for zerocopy Rx Looks like neither IO_URING nor UDMABUF are enabled even tho iou-zcrx.py and devmem.py (respectively) need those. IO_URING gets enabled by default but UDMABUF is missing. Reviewed-by: Joe Damato Reviewed-by: Mina Almasry Acked-by: Stanislav Fomichev Link: https://patch.msgid.link/20250811231334.561137-2-kuba@kernel.org Signed-off-by: Jakub Kicinski --- tools/testing/selftests/drivers/net/hw/config | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/testing/selftests/drivers/net/hw/config b/tools/testing/selftests/drivers/net/hw/config index 88ae719e6f8f..e8a06aa1471c 100644 --- a/tools/testing/selftests/drivers/net/hw/config +++ b/tools/testing/selftests/drivers/net/hw/config @@ -1,5 +1,7 @@ +CONFIG_IO_URING=y CONFIG_IPV6=y CONFIG_IPV6_GRE=y CONFIG_NET_IPGRE=y CONFIG_NET_IPGRE_DEMUX=y +CONFIG_UDMABUF=y CONFIG_VXLAN=y From a94e9cf79ceee04c3e2d042fdf417b8c235c6117 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Mon, 11 Aug 2025 16:13:31 -0700 Subject: [PATCH 0048/1536] selftests: drv-net: devmem: remove sudo from system() calls The general expectations for network HW selftests is that they will be run as root. sudo doesn't seem to work on NIPA VMs. While it's probably something solvable in the setup I think we should remove the sudos. devmem is the only networking test using sudo. Reviewed-by: Joe Damato Reviewed-by: Mina Almasry Acked-by: Stanislav Fomichev Link: https://patch.msgid.link/20250811231334.561137-3-kuba@kernel.org Signed-off-by: Jakub Kicinski --- tools/testing/selftests/drivers/net/hw/ncdevmem.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/tools/testing/selftests/drivers/net/hw/ncdevmem.c b/tools/testing/selftests/drivers/net/hw/ncdevmem.c index 72f828021f83..be937542b4c0 100644 --- a/tools/testing/selftests/drivers/net/hw/ncdevmem.c +++ b/tools/testing/selftests/drivers/net/hw/ncdevmem.c @@ -287,10 +287,10 @@ static int reset_flow_steering(void) * the exit status. */ - run_command("sudo ethtool -K %s ntuple off >&2", ifname); - run_command("sudo ethtool -K %s ntuple on >&2", ifname); + run_command("ethtool -K %s ntuple off >&2", ifname); + run_command("ethtool -K %s ntuple on >&2", ifname); run_command( - "sudo ethtool -n %s | grep 'Filter:' | awk '{print $2}' | xargs -n1 ethtool -N %s delete >&2", + "ethtool -n %s | grep 'Filter:' | awk '{print $2}' | xargs -n1 ethtool -N %s delete >&2", ifname, ifname); return 0; } @@ -351,12 +351,12 @@ static int configure_headersplit(bool on) static int configure_rss(void) { - return run_command("sudo ethtool -X %s equal %d >&2", ifname, start_queue); + return run_command("ethtool -X %s equal %d >&2", ifname, start_queue); } static int configure_channels(unsigned int rx, unsigned int tx) { - return run_command("sudo ethtool -L %s rx %u tx %u", ifname, rx, tx); + return run_command("ethtool -L %s rx %u tx %u", ifname, rx, tx); } static int configure_flow_steering(struct sockaddr_in6 *server_sin) @@ -374,7 +374,7 @@ static int configure_flow_steering(struct sockaddr_in6 *server_sin) } /* Try configure 5-tuple */ - if (run_command("sudo ethtool -N %s flow-type %s %s %s dst-ip %s %s %s dst-port %s queue %d >&2", + if (run_command("ethtool -N %s flow-type %s %s %s dst-ip %s %s %s dst-port %s queue %d >&2", ifname, type, client_ip ? "src-ip" : "", @@ -384,7 +384,7 @@ static int configure_flow_steering(struct sockaddr_in6 *server_sin) client_ip ? port : "", port, start_queue)) /* If that fails, try configure 3-tuple */ - if (run_command("sudo ethtool -N %s flow-type %s dst-ip %s dst-port %s queue %d >&2", + if (run_command("ethtool -N %s flow-type %s dst-ip %s dst-port %s queue %d >&2", ifname, type, server_addr, From 424e96de30230aac2061f941961be645cf0070d5 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Mon, 11 Aug 2025 16:13:32 -0700 Subject: [PATCH 0049/1536] selftests: drv-net: devmem: add / correct the IPv6 support We need to use bracketed IPv6 addresses for socat. Reviewed-by: Joe Damato Reviewed-by: Mina Almasry Acked-by: Stanislav Fomichev Link: https://patch.msgid.link/20250811231334.561137-4-kuba@kernel.org Signed-off-by: Jakub Kicinski --- tools/testing/selftests/drivers/net/hw/devmem.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/drivers/net/hw/devmem.py b/tools/testing/selftests/drivers/net/hw/devmem.py index baa2f24240ba..0a2533a3d6d6 100755 --- a/tools/testing/selftests/drivers/net/hw/devmem.py +++ b/tools/testing/selftests/drivers/net/hw/devmem.py @@ -24,7 +24,7 @@ def check_rx(cfg) -> None: require_devmem(cfg) port = rand_port() - socat = f"socat -u - TCP{cfg.addr_ipver}:{cfg.addr}:{port},bind={cfg.remote_addr}:{port}" + socat = f"socat -u - TCP{cfg.addr_ipver}:{cfg.baddr}:{port},bind={cfg.remote_baddr}:{port}" listen_cmd = f"{cfg.bin_local} -l -f {cfg.ifname} -s {cfg.addr} -p {port} -c {cfg.remote_addr} -v 7" with bkg(listen_cmd, exit_wait=True) as ncdevmem: From 6e9a12f85a7567bb9a41d5230468886bd6a27b20 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Mon, 11 Aug 2025 16:13:33 -0700 Subject: [PATCH 0050/1536] selftests: net: terminate bkg() commands on exception There is a number of: with bkg("socat ..LISTEN..", exit_wait=True) uses in the tests. If whatever is supposed to send the traffic fails we will get stuck in the bkg(). Try to kill the process in case of exception, to avoid the long wait. A specific example where this happens is the devmem Tx tests. Reviewed-by: Joe Damato Acked-by: Stanislav Fomichev Link: https://patch.msgid.link/20250811231334.561137-5-kuba@kernel.org Signed-off-by: Jakub Kicinski --- tools/testing/selftests/net/lib/py/utils.py | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/net/lib/py/utils.py b/tools/testing/selftests/net/lib/py/utils.py index f395c90fb0f1..4ac9249c85ab 100644 --- a/tools/testing/selftests/net/lib/py/utils.py +++ b/tools/testing/selftests/net/lib/py/utils.py @@ -117,6 +117,7 @@ class bkg(cmd): shell=shell, fail=fail, ns=ns, host=host, ksft_wait=ksft_wait) self.terminate = not exit_wait and not ksft_wait + self._exit_wait = exit_wait self.check_fail = fail if shell and self.terminate: @@ -127,7 +128,9 @@ class bkg(cmd): return self def __exit__(self, ex_type, ex_value, ex_tb): - return self.process(terminate=self.terminate, fail=self.check_fail) + # Force termination on exception + terminate = self.terminate or (self._exit_wait and ex_type) + return self.process(terminate=terminate, fail=self.check_fail) global_defer_queue = [] From c378c497f3fe8dc8f08b487fce49c3d96e4cada8 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Mon, 11 Aug 2025 16:13:34 -0700 Subject: [PATCH 0051/1536] selftests: drv-net: devmem: flip the direction of Tx tests The Device Under Test should always be the local system. While the Rx test gets this right the Tx test is sending from remote to local. So Tx of DMABUF memory happens on remote. These tests never run in NIPA since we don't have a compatible device so we haven't caught this. Reviewed-by: Joe Damato Reviewed-by: Mina Almasry Acked-by: Stanislav Fomichev Link: https://patch.msgid.link/20250811231334.561137-6-kuba@kernel.org Signed-off-by: Jakub Kicinski --- tools/testing/selftests/drivers/net/hw/devmem.py | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/tools/testing/selftests/drivers/net/hw/devmem.py b/tools/testing/selftests/drivers/net/hw/devmem.py index 0a2533a3d6d6..45c2d49d55b6 100755 --- a/tools/testing/selftests/drivers/net/hw/devmem.py +++ b/tools/testing/selftests/drivers/net/hw/devmem.py @@ -42,9 +42,9 @@ def check_tx(cfg) -> None: port = rand_port() listen_cmd = f"socat -U - TCP{cfg.addr_ipver}-LISTEN:{port}" - with bkg(listen_cmd) as socat: - wait_port_listen(port) - cmd(f"echo -e \"hello\\nworld\"| {cfg.bin_remote} -f {cfg.ifname} -s {cfg.addr} -p {port}", host=cfg.remote, shell=True) + with bkg(listen_cmd, host=cfg.remote, exit_wait=True) as socat: + wait_port_listen(port, host=cfg.remote) + cmd(f"echo -e \"hello\\nworld\"| {cfg.bin_local} -f {cfg.ifname} -s {cfg.remote_addr} -p {port}", shell=True) ksft_eq(socat.stdout.strip(), "hello\nworld") @@ -56,9 +56,9 @@ def check_tx_chunks(cfg) -> None: port = rand_port() listen_cmd = f"socat -U - TCP{cfg.addr_ipver}-LISTEN:{port}" - with bkg(listen_cmd, exit_wait=True) as socat: - wait_port_listen(port) - cmd(f"echo -e \"hello\\nworld\"| {cfg.bin_remote} -f {cfg.ifname} -s {cfg.addr} -p {port} -z 3", host=cfg.remote, shell=True) + with bkg(listen_cmd, host=cfg.remote, exit_wait=True) as socat: + wait_port_listen(port, host=cfg.remote) + cmd(f"echo -e \"hello\\nworld\"| {cfg.bin_local} -f {cfg.ifname} -s {cfg.remote_addr} -p {port} -z 3", shell=True) ksft_eq(socat.stdout.strip(), "hello\nworld") From cebd717d8f010a9897105434dbc6bdbd85693596 Mon Sep 17 00:00:00 2001 From: Lorenzo Bianconi Date: Mon, 11 Aug 2025 17:31:36 +0200 Subject: [PATCH 0052/1536] dt-bindings: net: airoha: npu: Add memory regions used for wlan offload Document memory regions used by Airoha EN7581 NPU for wlan traffic offloading. The brand new added memory regions do not introduce any backward compatibility issues since they will be used just to offload traffic to/from the MT76 wireless NIC and the MT76 probing will not fail if these memory regions are not provide, it will just disable offloading via the NPU module. Reviewed-by: Krzysztof Kozlowski Signed-off-by: Lorenzo Bianconi Link: https://patch.msgid.link/20250811-airoha-en7581-wlan-offlaod-v7-1-58823603bb4e@kernel.org Signed-off-by: Jakub Kicinski --- .../bindings/net/airoha,en7581-npu.yaml | 22 +++++++++++++++---- 1 file changed, 18 insertions(+), 4 deletions(-) diff --git a/Documentation/devicetree/bindings/net/airoha,en7581-npu.yaml b/Documentation/devicetree/bindings/net/airoha,en7581-npu.yaml index 76dd97c3fb40..c7644e6586d3 100644 --- a/Documentation/devicetree/bindings/net/airoha,en7581-npu.yaml +++ b/Documentation/devicetree/bindings/net/airoha,en7581-npu.yaml @@ -41,9 +41,21 @@ properties: - description: wlan irq line5 memory-region: - maxItems: 1 - description: - Memory used to store NPU firmware binary. + oneOf: + - items: + - description: NPU firmware binary region + - items: + - description: NPU firmware binary region + - description: NPU wlan offload RX buffers region + - description: NPU wlan offload TX buffers region + - description: NPU wlan offload TX packet identifiers region + + memory-region-names: + items: + - const: firmware + - const: pkt + - const: tx-pkt + - const: tx-bufid required: - compatible @@ -79,6 +91,8 @@ examples: , , ; - memory-region = <&npu_binary>; + memory-region = <&npu_firmware>, <&npu_pkt>, <&npu_txpkt>, + <&npu_txbufid>; + memory-region-names = "firmware", "pkt", "tx-pkt", "tx-bufid"; }; }; From 564923b02c1d2fe02ee789f9849ff79979b63b9f Mon Sep 17 00:00:00 2001 From: Lorenzo Bianconi Date: Mon, 11 Aug 2025 17:31:37 +0200 Subject: [PATCH 0053/1536] net: airoha: npu: Add NPU wlan memory initialization commands Introduce wlan_init_reserved_memory callback used by MT76 driver during NPU wlan offloading setup. This is a preliminary patch to enable wlan flowtable offload for EN7581 SoC with MT76 driver. Reviewed-by: Simon Horman Signed-off-by: Lorenzo Bianconi Link: https://patch.msgid.link/20250811-airoha-en7581-wlan-offlaod-v7-2-58823603bb4e@kernel.org Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/airoha/airoha_npu.c | 82 ++++++++++++++++++++++++ drivers/net/ethernet/airoha/airoha_npu.h | 38 +++++++++++ 2 files changed, 120 insertions(+) diff --git a/drivers/net/ethernet/airoha/airoha_npu.c b/drivers/net/ethernet/airoha/airoha_npu.c index a802f95df99d..731e0119d988 100644 --- a/drivers/net/ethernet/airoha/airoha_npu.c +++ b/drivers/net/ethernet/airoha/airoha_npu.c @@ -124,6 +124,13 @@ struct ppe_mbox_data { }; }; +struct wlan_mbox_data { + u32 ifindex:4; + u32 func_type:4; + u32 func_id; + DECLARE_FLEX_ARRAY(u8, d); +}; + static int airoha_npu_send_msg(struct airoha_npu *npu, int func_id, void *p, int size) { @@ -390,6 +397,80 @@ out: return err; } +static int airoha_npu_wlan_msg_send(struct airoha_npu *npu, int ifindex, + enum airoha_npu_wlan_set_cmd func_id, + void *data, int data_len, gfp_t gfp) +{ + struct wlan_mbox_data *wlan_data; + int err, len; + + len = sizeof(*wlan_data) + data_len; + wlan_data = kzalloc(len, gfp); + if (!wlan_data) + return -ENOMEM; + + wlan_data->ifindex = ifindex; + wlan_data->func_type = NPU_OP_SET; + wlan_data->func_id = func_id; + memcpy(wlan_data->d, data, data_len); + + err = airoha_npu_send_msg(npu, NPU_FUNC_WIFI, wlan_data, len); + kfree(wlan_data); + + return err; +} + +static int +airoha_npu_wlan_set_reserved_memory(struct airoha_npu *npu, + int ifindex, const char *name, + enum airoha_npu_wlan_set_cmd func_id) +{ + struct device *dev = npu->dev; + struct resource res; + int err; + u32 val; + + err = of_reserved_mem_region_to_resource_byname(dev->of_node, name, + &res); + if (err) + return err; + + val = res.start; + return airoha_npu_wlan_msg_send(npu, ifindex, func_id, &val, + sizeof(val), GFP_KERNEL); +} + +static int airoha_npu_wlan_init_memory(struct airoha_npu *npu) +{ + enum airoha_npu_wlan_set_cmd cmd = WLAN_FUNC_SET_WAIT_NPU_BAND0_ONCPU; + u32 val = 0; + int err; + + err = airoha_npu_wlan_msg_send(npu, 1, cmd, &val, sizeof(val), + GFP_KERNEL); + if (err) + return err; + + cmd = WLAN_FUNC_SET_WAIT_TX_BUF_CHECK_ADDR; + err = airoha_npu_wlan_set_reserved_memory(npu, 0, "tx-bufid", cmd); + if (err) + return err; + + cmd = WLAN_FUNC_SET_WAIT_PKT_BUF_ADDR; + err = airoha_npu_wlan_set_reserved_memory(npu, 0, "pkt", cmd); + if (err) + return err; + + cmd = WLAN_FUNC_SET_WAIT_TX_PKT_BUF_ADDR; + err = airoha_npu_wlan_set_reserved_memory(npu, 0, "tx-pkt", cmd); + if (err) + return err; + + cmd = WLAN_FUNC_SET_WAIT_IS_FORCE_TO_CPU; + return airoha_npu_wlan_msg_send(npu, 0, cmd, &val, sizeof(val), + GFP_KERNEL); +} + struct airoha_npu *airoha_npu_get(struct device *dev, dma_addr_t *stats_addr) { struct platform_device *pdev; @@ -493,6 +574,7 @@ static int airoha_npu_probe(struct platform_device *pdev) npu->ops.ppe_deinit = airoha_npu_ppe_deinit; npu->ops.ppe_flush_sram_entries = airoha_npu_ppe_flush_sram_entries; npu->ops.ppe_foe_commit_entry = airoha_npu_foe_commit_entry; + npu->ops.wlan_init_reserved_memory = airoha_npu_wlan_init_memory; npu->regmap = devm_regmap_init_mmio(dev, base, ®map_config); if (IS_ERR(npu->regmap)) diff --git a/drivers/net/ethernet/airoha/airoha_npu.h b/drivers/net/ethernet/airoha/airoha_npu.h index 98ec3be74ce4..0cb5356b00e9 100644 --- a/drivers/net/ethernet/airoha/airoha_npu.h +++ b/drivers/net/ethernet/airoha/airoha_npu.h @@ -6,6 +6,43 @@ #define NPU_NUM_CORES 8 +enum airoha_npu_wlan_set_cmd { + WLAN_FUNC_SET_WAIT_PCIE_ADDR, + WLAN_FUNC_SET_WAIT_DESC, + WLAN_FUNC_SET_WAIT_NPU_INIT_DONE, + WLAN_FUNC_SET_WAIT_TRAN_TO_CPU, + WLAN_FUNC_SET_WAIT_BA_WIN_SIZE, + WLAN_FUNC_SET_WAIT_DRIVER_MODEL, + WLAN_FUNC_SET_WAIT_DEL_STA, + WLAN_FUNC_SET_WAIT_DRAM_BA_NODE_ADDR, + WLAN_FUNC_SET_WAIT_PKT_BUF_ADDR, + WLAN_FUNC_SET_WAIT_IS_TEST_NOBA, + WLAN_FUNC_SET_WAIT_FLUSHONE_TIMEOUT, + WLAN_FUNC_SET_WAIT_FLUSHALL_TIMEOUT, + WLAN_FUNC_SET_WAIT_IS_FORCE_TO_CPU, + WLAN_FUNC_SET_WAIT_PCIE_STATE, + WLAN_FUNC_SET_WAIT_PCIE_PORT_TYPE, + WLAN_FUNC_SET_WAIT_ERROR_RETRY_TIMES, + WLAN_FUNC_SET_WAIT_BAR_INFO, + WLAN_FUNC_SET_WAIT_FAST_FLAG, + WLAN_FUNC_SET_WAIT_NPU_BAND0_ONCPU, + WLAN_FUNC_SET_WAIT_TX_RING_PCIE_ADDR, + WLAN_FUNC_SET_WAIT_TX_DESC_HW_BASE, + WLAN_FUNC_SET_WAIT_TX_BUF_SPACE_HW_BASE, + WLAN_FUNC_SET_WAIT_RX_RING_FOR_TXDONE_HW_BASE, + WLAN_FUNC_SET_WAIT_TX_PKT_BUF_ADDR, + WLAN_FUNC_SET_WAIT_INODE_TXRX_REG_ADDR, + WLAN_FUNC_SET_WAIT_INODE_DEBUG_FLAG, + WLAN_FUNC_SET_WAIT_INODE_HW_CFG_INFO, + WLAN_FUNC_SET_WAIT_INODE_STOP_ACTION, + WLAN_FUNC_SET_WAIT_INODE_PCIE_SWAP, + WLAN_FUNC_SET_WAIT_RATELIMIT_CTRL, + WLAN_FUNC_SET_WAIT_HWNAT_INIT, + WLAN_FUNC_SET_WAIT_ARHT_CHIP_INFO, + WLAN_FUNC_SET_WAIT_TX_BUF_CHECK_ADDR, + WLAN_FUNC_SET_WAIT_TOKEN_ID_SIZE, +}; + struct airoha_npu { struct device *dev; struct regmap *regmap; @@ -29,6 +66,7 @@ struct airoha_npu { dma_addr_t foe_addr, u32 entry_size, u32 hash, bool ppe2); + int (*wlan_init_reserved_memory)(struct airoha_npu *npu); } ops; }; From f97fc66185b2004ad5f393f78b3e645009ddd1d0 Mon Sep 17 00:00:00 2001 From: Lorenzo Bianconi Date: Mon, 11 Aug 2025 17:31:38 +0200 Subject: [PATCH 0054/1536] net: airoha: npu: Add wlan_{send,get}_msg NPU callbacks Introduce wlan_send_msg() and wlan_get_msg() NPU wlan callbacks used by the wlan driver (MT76) to initialize NPU module registers in order to offload wireless-wired traffic. This is a preliminary patch to enable wlan flowtable offload for EN7581 SoC with MT76 driver. Signed-off-by: Lorenzo Bianconi Link: https://patch.msgid.link/20250811-airoha-en7581-wlan-offlaod-v7-3-58823603bb4e@kernel.org Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/airoha/airoha_npu.c | 52 ++++++++++++++++++++++++ drivers/net/ethernet/airoha/airoha_npu.h | 22 ++++++++++ 2 files changed, 74 insertions(+) diff --git a/drivers/net/ethernet/airoha/airoha_npu.c b/drivers/net/ethernet/airoha/airoha_npu.c index 731e0119d988..2a337f00386f 100644 --- a/drivers/net/ethernet/airoha/airoha_npu.c +++ b/drivers/net/ethernet/airoha/airoha_npu.c @@ -42,6 +42,22 @@ #define REG_CR_MBQ8_CTRL(_n) (NPU_MBOX_BASE_ADDR + 0x0b0 + ((_n) << 2)) #define REG_CR_NPU_MIB(_n) (NPU_MBOX_BASE_ADDR + 0x140 + ((_n) << 2)) +#define NPU_WLAN_BASE_ADDR 0x30d000 + +#define REG_IRQ_STATUS (NPU_WLAN_BASE_ADDR + 0x030) +#define REG_IRQ_RXDONE(_n) (NPU_WLAN_BASE_ADDR + ((_n) << 2) + 0x034) +#define NPU_IRQ_RX_MASK(_n) ((_n) == 1 ? BIT(17) : BIT(16)) + +#define REG_TX_BASE(_n) (NPU_WLAN_BASE_ADDR + ((_n) << 4) + 0x080) +#define REG_TX_DSCP_NUM(_n) (NPU_WLAN_BASE_ADDR + ((_n) << 4) + 0x084) +#define REG_TX_CPU_IDX(_n) (NPU_WLAN_BASE_ADDR + ((_n) << 4) + 0x088) +#define REG_TX_DMA_IDX(_n) (NPU_WLAN_BASE_ADDR + ((_n) << 4) + 0x08c) + +#define REG_RX_BASE(_n) (NPU_WLAN_BASE_ADDR + ((_n) << 4) + 0x180) +#define REG_RX_DSCP_NUM(_n) (NPU_WLAN_BASE_ADDR + ((_n) << 4) + 0x184) +#define REG_RX_CPU_IDX(_n) (NPU_WLAN_BASE_ADDR + ((_n) << 4) + 0x188) +#define REG_RX_DMA_IDX(_n) (NPU_WLAN_BASE_ADDR + ((_n) << 4) + 0x18c) + #define NPU_TIMER_BASE_ADDR 0x310100 #define REG_WDT_TIMER_CTRL(_n) (NPU_TIMER_BASE_ADDR + ((_n) * 0x100)) #define WDT_EN_MASK BIT(25) @@ -420,6 +436,30 @@ static int airoha_npu_wlan_msg_send(struct airoha_npu *npu, int ifindex, return err; } +static int airoha_npu_wlan_msg_get(struct airoha_npu *npu, int ifindex, + enum airoha_npu_wlan_get_cmd func_id, + void *data, int data_len, gfp_t gfp) +{ + struct wlan_mbox_data *wlan_data; + int err, len; + + len = sizeof(*wlan_data) + data_len; + wlan_data = kzalloc(len, gfp); + if (!wlan_data) + return -ENOMEM; + + wlan_data->ifindex = ifindex; + wlan_data->func_type = NPU_OP_GET; + wlan_data->func_id = func_id; + + err = airoha_npu_send_msg(npu, NPU_FUNC_WIFI, wlan_data, len); + if (!err) + memcpy(data, wlan_data->d, data_len); + kfree(wlan_data); + + return err; +} + static int airoha_npu_wlan_set_reserved_memory(struct airoha_npu *npu, int ifindex, const char *name, @@ -471,6 +511,15 @@ static int airoha_npu_wlan_init_memory(struct airoha_npu *npu) GFP_KERNEL); } +static u32 airoha_npu_wlan_queue_addr_get(struct airoha_npu *npu, int qid, + bool xmit) +{ + if (xmit) + return REG_TX_BASE(qid + 2); + + return REG_RX_BASE(qid); +} + struct airoha_npu *airoha_npu_get(struct device *dev, dma_addr_t *stats_addr) { struct platform_device *pdev; @@ -575,6 +624,9 @@ static int airoha_npu_probe(struct platform_device *pdev) npu->ops.ppe_flush_sram_entries = airoha_npu_ppe_flush_sram_entries; npu->ops.ppe_foe_commit_entry = airoha_npu_foe_commit_entry; npu->ops.wlan_init_reserved_memory = airoha_npu_wlan_init_memory; + npu->ops.wlan_send_msg = airoha_npu_wlan_msg_send; + npu->ops.wlan_get_msg = airoha_npu_wlan_msg_get; + npu->ops.wlan_get_queue_addr = airoha_npu_wlan_queue_addr_get; npu->regmap = devm_regmap_init_mmio(dev, base, ®map_config); if (IS_ERR(npu->regmap)) diff --git a/drivers/net/ethernet/airoha/airoha_npu.h b/drivers/net/ethernet/airoha/airoha_npu.h index 0cb5356b00e9..7b9ff370c879 100644 --- a/drivers/net/ethernet/airoha/airoha_npu.h +++ b/drivers/net/ethernet/airoha/airoha_npu.h @@ -43,6 +43,20 @@ enum airoha_npu_wlan_set_cmd { WLAN_FUNC_SET_WAIT_TOKEN_ID_SIZE, }; +enum airoha_npu_wlan_get_cmd { + WLAN_FUNC_GET_WAIT_NPU_INFO, + WLAN_FUNC_GET_WAIT_LAST_RATE, + WLAN_FUNC_GET_WAIT_COUNTER, + WLAN_FUNC_GET_WAIT_DBG_COUNTER, + WLAN_FUNC_GET_WAIT_RXDESC_BASE, + WLAN_FUNC_GET_WAIT_WCID_DBG_COUNTER, + WLAN_FUNC_GET_WAIT_DMA_ADDR, + WLAN_FUNC_GET_WAIT_RING_SIZE, + WLAN_FUNC_GET_WAIT_NPU_SUPPORT_MAP, + WLAN_FUNC_GET_WAIT_MDC_LOCK_ADDRESS, + WLAN_FUNC_GET_WAIT_NPU_VERSION, +}; + struct airoha_npu { struct device *dev; struct regmap *regmap; @@ -67,6 +81,14 @@ struct airoha_npu { u32 entry_size, u32 hash, bool ppe2); int (*wlan_init_reserved_memory)(struct airoha_npu *npu); + int (*wlan_send_msg)(struct airoha_npu *npu, int ifindex, + enum airoha_npu_wlan_set_cmd func_id, + void *data, int data_len, gfp_t gfp); + int (*wlan_get_msg)(struct airoha_npu *npu, int ifindex, + enum airoha_npu_wlan_get_cmd func_id, + void *data, int data_len, gfp_t gfp); + u32 (*wlan_get_queue_addr)(struct airoha_npu *npu, int qid, + bool xmit); } ops; }; From 03b7ca3ee5e1b700c462aed5b6cb88f616d6ba7f Mon Sep 17 00:00:00 2001 From: Lorenzo Bianconi Date: Mon, 11 Aug 2025 17:31:39 +0200 Subject: [PATCH 0055/1536] net: airoha: npu: Add wlan irq management callbacks Introduce callbacks used by the MT76 driver to configure NPU SoC interrupts. This is a preliminary patch to enable wlan flowtable offload for EN7581 SoC with MT76 driver. Reviewed-by: Simon Horman Signed-off-by: Lorenzo Bianconi Link: https://patch.msgid.link/20250811-airoha-en7581-wlan-offlaod-v7-4-58823603bb4e@kernel.org Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/airoha/airoha_npu.c | 27 ++++++++++++++++++++++++ drivers/net/ethernet/airoha/airoha_npu.h | 4 ++++ 2 files changed, 31 insertions(+) diff --git a/drivers/net/ethernet/airoha/airoha_npu.c b/drivers/net/ethernet/airoha/airoha_npu.c index 2a337f00386f..5d1355126d16 100644 --- a/drivers/net/ethernet/airoha/airoha_npu.c +++ b/drivers/net/ethernet/airoha/airoha_npu.c @@ -520,6 +520,29 @@ static u32 airoha_npu_wlan_queue_addr_get(struct airoha_npu *npu, int qid, return REG_RX_BASE(qid); } +static void airoha_npu_wlan_irq_status_set(struct airoha_npu *npu, u32 val) +{ + regmap_write(npu->regmap, REG_IRQ_STATUS, val); +} + +static u32 airoha_npu_wlan_irq_status_get(struct airoha_npu *npu, int q) +{ + u32 val; + + regmap_read(npu->regmap, REG_IRQ_STATUS, &val); + return val; +} + +static void airoha_npu_wlan_irq_enable(struct airoha_npu *npu, int q) +{ + regmap_set_bits(npu->regmap, REG_IRQ_RXDONE(q), NPU_IRQ_RX_MASK(q)); +} + +static void airoha_npu_wlan_irq_disable(struct airoha_npu *npu, int q) +{ + regmap_clear_bits(npu->regmap, REG_IRQ_RXDONE(q), NPU_IRQ_RX_MASK(q)); +} + struct airoha_npu *airoha_npu_get(struct device *dev, dma_addr_t *stats_addr) { struct platform_device *pdev; @@ -627,6 +650,10 @@ static int airoha_npu_probe(struct platform_device *pdev) npu->ops.wlan_send_msg = airoha_npu_wlan_msg_send; npu->ops.wlan_get_msg = airoha_npu_wlan_msg_get; npu->ops.wlan_get_queue_addr = airoha_npu_wlan_queue_addr_get; + npu->ops.wlan_set_irq_status = airoha_npu_wlan_irq_status_set; + npu->ops.wlan_get_irq_status = airoha_npu_wlan_irq_status_get; + npu->ops.wlan_enable_irq = airoha_npu_wlan_irq_enable; + npu->ops.wlan_disable_irq = airoha_npu_wlan_irq_disable; npu->regmap = devm_regmap_init_mmio(dev, base, ®map_config); if (IS_ERR(npu->regmap)) diff --git a/drivers/net/ethernet/airoha/airoha_npu.h b/drivers/net/ethernet/airoha/airoha_npu.h index 7b9ff370c879..84c83753c2bd 100644 --- a/drivers/net/ethernet/airoha/airoha_npu.h +++ b/drivers/net/ethernet/airoha/airoha_npu.h @@ -89,6 +89,10 @@ struct airoha_npu { void *data, int data_len, gfp_t gfp); u32 (*wlan_get_queue_addr)(struct airoha_npu *npu, int qid, bool xmit); + void (*wlan_set_irq_status)(struct airoha_npu *npu, u32 val); + u32 (*wlan_get_irq_status)(struct airoha_npu *npu, int q); + void (*wlan_enable_irq)(struct airoha_npu *npu, int q); + void (*wlan_disable_irq)(struct airoha_npu *npu, int q); } ops; }; From a1740b16c83729d908c760eaa821f27b51e58a13 Mon Sep 17 00:00:00 2001 From: Lorenzo Bianconi Date: Mon, 11 Aug 2025 17:31:40 +0200 Subject: [PATCH 0056/1536] net: airoha: npu: Read NPU wlan interrupt lines from the DTS Read all NPU wlan IRQ lines from the NPU device-tree node. NPU module fires wlan irq lines when the traffic to/from the WiFi NIC is not hw accelerated (these interrupts will be consumed by the MT76 driver in subsequent patches). This is a preliminary patch to enable wlan flowtable offload for EN7581 SoC. Signed-off-by: Lorenzo Bianconi Link: https://patch.msgid.link/20250811-airoha-en7581-wlan-offlaod-v7-5-58823603bb4e@kernel.org Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/airoha/airoha_npu.c | 9 +++++++++ drivers/net/ethernet/airoha/airoha_npu.h | 3 +++ 2 files changed, 12 insertions(+) diff --git a/drivers/net/ethernet/airoha/airoha_npu.c b/drivers/net/ethernet/airoha/airoha_npu.c index 5d1355126d16..e0448e1225b8 100644 --- a/drivers/net/ethernet/airoha/airoha_npu.c +++ b/drivers/net/ethernet/airoha/airoha_npu.c @@ -690,6 +690,15 @@ static int airoha_npu_probe(struct platform_device *pdev) INIT_WORK(&core->wdt_work, airoha_npu_wdt_work); } + /* wlan IRQ lines */ + for (i = 0; i < ARRAY_SIZE(npu->irqs); i++) { + irq = platform_get_irq(pdev, i + ARRAY_SIZE(npu->cores) + 1); + if (irq < 0) + return irq; + + npu->irqs[i] = irq; + } + err = dma_set_coherent_mask(dev, DMA_BIT_MASK(32)); if (err) return err; diff --git a/drivers/net/ethernet/airoha/airoha_npu.h b/drivers/net/ethernet/airoha/airoha_npu.h index 84c83753c2bd..a448c74208a9 100644 --- a/drivers/net/ethernet/airoha/airoha_npu.h +++ b/drivers/net/ethernet/airoha/airoha_npu.h @@ -5,6 +5,7 @@ */ #define NPU_NUM_CORES 8 +#define NPU_NUM_IRQ 6 enum airoha_npu_wlan_set_cmd { WLAN_FUNC_SET_WAIT_PCIE_ADDR, @@ -68,6 +69,8 @@ struct airoha_npu { struct work_struct wdt_work; } cores[NPU_NUM_CORES]; + int irqs[NPU_NUM_IRQ]; + struct airoha_foe_stats __iomem *stats; struct { From 29c4a3ce508961a02d185ead2d52699b16d82c6d Mon Sep 17 00:00:00 2001 From: Lorenzo Bianconi Date: Mon, 11 Aug 2025 17:31:41 +0200 Subject: [PATCH 0057/1536] net: airoha: npu: Enable core 3 for WiFi offloading NPU core 3 is responsible for WiFi offloading so enable it during NPU probe. Reviewed-by: Simon Horman Signed-off-by: Lorenzo Bianconi Link: https://patch.msgid.link/20250811-airoha-en7581-wlan-offlaod-v7-6-58823603bb4e@kernel.org Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/airoha/airoha_npu.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/net/ethernet/airoha/airoha_npu.c b/drivers/net/ethernet/airoha/airoha_npu.c index e0448e1225b8..66a8a992dbf2 100644 --- a/drivers/net/ethernet/airoha/airoha_npu.c +++ b/drivers/net/ethernet/airoha/airoha_npu.c @@ -720,8 +720,7 @@ static int airoha_npu_probe(struct platform_device *pdev) usleep_range(1000, 2000); /* enable NPU cores */ - /* do not start core3 since it is used for WiFi offloading */ - regmap_write(npu->regmap, REG_CR_BOOT_CONFIG, 0xf7); + regmap_write(npu->regmap, REG_CR_BOOT_CONFIG, 0xff); regmap_write(npu->regmap, REG_CR_BOOT_TRIGGER, 0x1); msleep(100); From b3ef7bdec66fb1813e865fd39d179a93cefd2015 Mon Sep 17 00:00:00 2001 From: Lorenzo Bianconi Date: Mon, 11 Aug 2025 17:31:42 +0200 Subject: [PATCH 0058/1536] net: airoha: Add airoha_offload.h header Move NPU definitions to airoha_offload.h in include/linux/soc/airoha/ in order to allow the MT76 driver to access the callback definitions. Signed-off-by: Lorenzo Bianconi Link: https://patch.msgid.link/20250811-airoha-en7581-wlan-offlaod-v7-7-58823603bb4e@kernel.org Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/airoha/airoha_npu.c | 2 +- drivers/net/ethernet/airoha/airoha_npu.h | 103 --------- drivers/net/ethernet/airoha/airoha_ppe.c | 2 +- include/linux/soc/airoha/airoha_offload.h | 260 ++++++++++++++++++++++ 4 files changed, 262 insertions(+), 105 deletions(-) delete mode 100644 drivers/net/ethernet/airoha/airoha_npu.h create mode 100644 include/linux/soc/airoha/airoha_offload.h diff --git a/drivers/net/ethernet/airoha/airoha_npu.c b/drivers/net/ethernet/airoha/airoha_npu.c index 66a8a992dbf2..1a6b191ae0b0 100644 --- a/drivers/net/ethernet/airoha/airoha_npu.c +++ b/drivers/net/ethernet/airoha/airoha_npu.c @@ -11,9 +11,9 @@ #include #include #include +#include #include "airoha_eth.h" -#include "airoha_npu.h" #define NPU_EN7581_FIRMWARE_DATA "airoha/en7581_npu_data.bin" #define NPU_EN7581_FIRMWARE_RV32 "airoha/en7581_npu_rv32.bin" diff --git a/drivers/net/ethernet/airoha/airoha_npu.h b/drivers/net/ethernet/airoha/airoha_npu.h deleted file mode 100644 index a448c74208a9..000000000000 --- a/drivers/net/ethernet/airoha/airoha_npu.h +++ /dev/null @@ -1,103 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0-only */ -/* - * Copyright (c) 2025 AIROHA Inc - * Author: Lorenzo Bianconi - */ - -#define NPU_NUM_CORES 8 -#define NPU_NUM_IRQ 6 - -enum airoha_npu_wlan_set_cmd { - WLAN_FUNC_SET_WAIT_PCIE_ADDR, - WLAN_FUNC_SET_WAIT_DESC, - WLAN_FUNC_SET_WAIT_NPU_INIT_DONE, - WLAN_FUNC_SET_WAIT_TRAN_TO_CPU, - WLAN_FUNC_SET_WAIT_BA_WIN_SIZE, - WLAN_FUNC_SET_WAIT_DRIVER_MODEL, - WLAN_FUNC_SET_WAIT_DEL_STA, - WLAN_FUNC_SET_WAIT_DRAM_BA_NODE_ADDR, - WLAN_FUNC_SET_WAIT_PKT_BUF_ADDR, - WLAN_FUNC_SET_WAIT_IS_TEST_NOBA, - WLAN_FUNC_SET_WAIT_FLUSHONE_TIMEOUT, - WLAN_FUNC_SET_WAIT_FLUSHALL_TIMEOUT, - WLAN_FUNC_SET_WAIT_IS_FORCE_TO_CPU, - WLAN_FUNC_SET_WAIT_PCIE_STATE, - WLAN_FUNC_SET_WAIT_PCIE_PORT_TYPE, - WLAN_FUNC_SET_WAIT_ERROR_RETRY_TIMES, - WLAN_FUNC_SET_WAIT_BAR_INFO, - WLAN_FUNC_SET_WAIT_FAST_FLAG, - WLAN_FUNC_SET_WAIT_NPU_BAND0_ONCPU, - WLAN_FUNC_SET_WAIT_TX_RING_PCIE_ADDR, - WLAN_FUNC_SET_WAIT_TX_DESC_HW_BASE, - WLAN_FUNC_SET_WAIT_TX_BUF_SPACE_HW_BASE, - WLAN_FUNC_SET_WAIT_RX_RING_FOR_TXDONE_HW_BASE, - WLAN_FUNC_SET_WAIT_TX_PKT_BUF_ADDR, - WLAN_FUNC_SET_WAIT_INODE_TXRX_REG_ADDR, - WLAN_FUNC_SET_WAIT_INODE_DEBUG_FLAG, - WLAN_FUNC_SET_WAIT_INODE_HW_CFG_INFO, - WLAN_FUNC_SET_WAIT_INODE_STOP_ACTION, - WLAN_FUNC_SET_WAIT_INODE_PCIE_SWAP, - WLAN_FUNC_SET_WAIT_RATELIMIT_CTRL, - WLAN_FUNC_SET_WAIT_HWNAT_INIT, - WLAN_FUNC_SET_WAIT_ARHT_CHIP_INFO, - WLAN_FUNC_SET_WAIT_TX_BUF_CHECK_ADDR, - WLAN_FUNC_SET_WAIT_TOKEN_ID_SIZE, -}; - -enum airoha_npu_wlan_get_cmd { - WLAN_FUNC_GET_WAIT_NPU_INFO, - WLAN_FUNC_GET_WAIT_LAST_RATE, - WLAN_FUNC_GET_WAIT_COUNTER, - WLAN_FUNC_GET_WAIT_DBG_COUNTER, - WLAN_FUNC_GET_WAIT_RXDESC_BASE, - WLAN_FUNC_GET_WAIT_WCID_DBG_COUNTER, - WLAN_FUNC_GET_WAIT_DMA_ADDR, - WLAN_FUNC_GET_WAIT_RING_SIZE, - WLAN_FUNC_GET_WAIT_NPU_SUPPORT_MAP, - WLAN_FUNC_GET_WAIT_MDC_LOCK_ADDRESS, - WLAN_FUNC_GET_WAIT_NPU_VERSION, -}; - -struct airoha_npu { - struct device *dev; - struct regmap *regmap; - - struct airoha_npu_core { - struct airoha_npu *npu; - /* protect concurrent npu memory accesses */ - spinlock_t lock; - struct work_struct wdt_work; - } cores[NPU_NUM_CORES]; - - int irqs[NPU_NUM_IRQ]; - - struct airoha_foe_stats __iomem *stats; - - struct { - int (*ppe_init)(struct airoha_npu *npu); - int (*ppe_deinit)(struct airoha_npu *npu); - int (*ppe_flush_sram_entries)(struct airoha_npu *npu, - dma_addr_t foe_addr, - int sram_num_entries); - int (*ppe_foe_commit_entry)(struct airoha_npu *npu, - dma_addr_t foe_addr, - u32 entry_size, u32 hash, - bool ppe2); - int (*wlan_init_reserved_memory)(struct airoha_npu *npu); - int (*wlan_send_msg)(struct airoha_npu *npu, int ifindex, - enum airoha_npu_wlan_set_cmd func_id, - void *data, int data_len, gfp_t gfp); - int (*wlan_get_msg)(struct airoha_npu *npu, int ifindex, - enum airoha_npu_wlan_get_cmd func_id, - void *data, int data_len, gfp_t gfp); - u32 (*wlan_get_queue_addr)(struct airoha_npu *npu, int qid, - bool xmit); - void (*wlan_set_irq_status)(struct airoha_npu *npu, u32 val); - u32 (*wlan_get_irq_status)(struct airoha_npu *npu, int q); - void (*wlan_enable_irq)(struct airoha_npu *npu, int q); - void (*wlan_disable_irq)(struct airoha_npu *npu, int q); - } ops; -}; - -struct airoha_npu *airoha_npu_get(struct device *dev, dma_addr_t *stats_addr); -void airoha_npu_put(struct airoha_npu *npu); diff --git a/drivers/net/ethernet/airoha/airoha_ppe.c b/drivers/net/ethernet/airoha/airoha_ppe.c index 47411d2cbd28..82163392332c 100644 --- a/drivers/net/ethernet/airoha/airoha_ppe.c +++ b/drivers/net/ethernet/airoha/airoha_ppe.c @@ -7,10 +7,10 @@ #include #include #include +#include #include #include -#include "airoha_npu.h" #include "airoha_regs.h" #include "airoha_eth.h" diff --git a/include/linux/soc/airoha/airoha_offload.h b/include/linux/soc/airoha/airoha_offload.h new file mode 100644 index 000000000000..117c63c2448d --- /dev/null +++ b/include/linux/soc/airoha/airoha_offload.h @@ -0,0 +1,260 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (c) 2025 AIROHA Inc + * Author: Lorenzo Bianconi + */ +#ifndef AIROHA_OFFLOAD_H +#define AIROHA_OFFLOAD_H + +#include +#include + +#define NPU_NUM_CORES 8 +#define NPU_NUM_IRQ 6 +#define NPU_RX0_DESC_NUM 512 +#define NPU_RX1_DESC_NUM 512 + +/* CTRL */ +#define NPU_RX_DMA_DESC_LAST_MASK BIT(29) +#define NPU_RX_DMA_DESC_LEN_MASK GENMASK(28, 15) +#define NPU_RX_DMA_DESC_CUR_LEN_MASK GENMASK(14, 1) +#define NPU_RX_DMA_DESC_DONE_MASK BIT(0) +/* INFO */ +#define NPU_RX_DMA_PKT_COUNT_MASK GENMASK(31, 28) +#define NPU_RX_DMA_PKT_ID_MASK GENMASK(28, 26) +#define NPU_RX_DMA_SRC_PORT_MASK GENMASK(25, 21) +#define NPU_RX_DMA_CRSN_MASK GENMASK(20, 16) +#define NPU_RX_DMA_FOE_ID_MASK GENMASK(15, 0) +/* DATA */ +#define NPU_RX_DMA_SID_MASK GENMASK(31, 16) +#define NPU_RX_DMA_FRAG_TYPE_MASK GENMASK(15, 14) +#define NPU_RX_DMA_PRIORITY_MASK GENMASK(13, 10) +#define NPU_RX_DMA_RADIO_ID_MASK GENMASK(9, 6) +#define NPU_RX_DMA_VAP_ID_MASK GENMASK(5, 2) +#define NPU_RX_DMA_FRAME_TYPE_MASK GENMASK(1, 0) + +struct airoha_npu_rx_dma_desc { + u32 ctrl; + u32 info; + u32 data; + u32 addr; + u64 rsv; +} __packed; + +/* CTRL */ +#define NPU_TX_DMA_DESC_SCHED_MASK BIT(31) +#define NPU_TX_DMA_DESC_LEN_MASK GENMASK(30, 18) +#define NPU_TX_DMA_DESC_VEND_LEN_MASK GENMASK(17, 1) +#define NPU_TX_DMA_DESC_DONE_MASK BIT(0) + +#define NPU_TXWI_LEN 192 + +struct airoha_npu_tx_dma_desc { + u32 ctrl; + u32 addr; + u64 rsv; + u8 txwi[NPU_TXWI_LEN]; +} __packed; + +enum airoha_npu_wlan_set_cmd { + WLAN_FUNC_SET_WAIT_PCIE_ADDR, + WLAN_FUNC_SET_WAIT_DESC, + WLAN_FUNC_SET_WAIT_NPU_INIT_DONE, + WLAN_FUNC_SET_WAIT_TRAN_TO_CPU, + WLAN_FUNC_SET_WAIT_BA_WIN_SIZE, + WLAN_FUNC_SET_WAIT_DRIVER_MODEL, + WLAN_FUNC_SET_WAIT_DEL_STA, + WLAN_FUNC_SET_WAIT_DRAM_BA_NODE_ADDR, + WLAN_FUNC_SET_WAIT_PKT_BUF_ADDR, + WLAN_FUNC_SET_WAIT_IS_TEST_NOBA, + WLAN_FUNC_SET_WAIT_FLUSHONE_TIMEOUT, + WLAN_FUNC_SET_WAIT_FLUSHALL_TIMEOUT, + WLAN_FUNC_SET_WAIT_IS_FORCE_TO_CPU, + WLAN_FUNC_SET_WAIT_PCIE_STATE, + WLAN_FUNC_SET_WAIT_PCIE_PORT_TYPE, + WLAN_FUNC_SET_WAIT_ERROR_RETRY_TIMES, + WLAN_FUNC_SET_WAIT_BAR_INFO, + WLAN_FUNC_SET_WAIT_FAST_FLAG, + WLAN_FUNC_SET_WAIT_NPU_BAND0_ONCPU, + WLAN_FUNC_SET_WAIT_TX_RING_PCIE_ADDR, + WLAN_FUNC_SET_WAIT_TX_DESC_HW_BASE, + WLAN_FUNC_SET_WAIT_TX_BUF_SPACE_HW_BASE, + WLAN_FUNC_SET_WAIT_RX_RING_FOR_TXDONE_HW_BASE, + WLAN_FUNC_SET_WAIT_TX_PKT_BUF_ADDR, + WLAN_FUNC_SET_WAIT_INODE_TXRX_REG_ADDR, + WLAN_FUNC_SET_WAIT_INODE_DEBUG_FLAG, + WLAN_FUNC_SET_WAIT_INODE_HW_CFG_INFO, + WLAN_FUNC_SET_WAIT_INODE_STOP_ACTION, + WLAN_FUNC_SET_WAIT_INODE_PCIE_SWAP, + WLAN_FUNC_SET_WAIT_RATELIMIT_CTRL, + WLAN_FUNC_SET_WAIT_HWNAT_INIT, + WLAN_FUNC_SET_WAIT_ARHT_CHIP_INFO, + WLAN_FUNC_SET_WAIT_TX_BUF_CHECK_ADDR, + WLAN_FUNC_SET_WAIT_TOKEN_ID_SIZE, +}; + +enum airoha_npu_wlan_get_cmd { + WLAN_FUNC_GET_WAIT_NPU_INFO, + WLAN_FUNC_GET_WAIT_LAST_RATE, + WLAN_FUNC_GET_WAIT_COUNTER, + WLAN_FUNC_GET_WAIT_DBG_COUNTER, + WLAN_FUNC_GET_WAIT_RXDESC_BASE, + WLAN_FUNC_GET_WAIT_WCID_DBG_COUNTER, + WLAN_FUNC_GET_WAIT_DMA_ADDR, + WLAN_FUNC_GET_WAIT_RING_SIZE, + WLAN_FUNC_GET_WAIT_NPU_SUPPORT_MAP, + WLAN_FUNC_GET_WAIT_MDC_LOCK_ADDRESS, + WLAN_FUNC_GET_WAIT_NPU_VERSION, +}; + +struct airoha_npu { +#if (IS_BUILTIN(CONFIG_NET_AIROHA_NPU) || IS_MODULE(CONFIG_NET_AIROHA_NPU)) + struct device *dev; + struct regmap *regmap; + + struct airoha_npu_core { + struct airoha_npu *npu; + /* protect concurrent npu memory accesses */ + spinlock_t lock; + struct work_struct wdt_work; + } cores[NPU_NUM_CORES]; + + int irqs[NPU_NUM_IRQ]; + + struct airoha_foe_stats __iomem *stats; + + struct { + int (*ppe_init)(struct airoha_npu *npu); + int (*ppe_deinit)(struct airoha_npu *npu); + int (*ppe_flush_sram_entries)(struct airoha_npu *npu, + dma_addr_t foe_addr, + int sram_num_entries); + int (*ppe_foe_commit_entry)(struct airoha_npu *npu, + dma_addr_t foe_addr, + u32 entry_size, u32 hash, + bool ppe2); + int (*wlan_init_reserved_memory)(struct airoha_npu *npu); + int (*wlan_send_msg)(struct airoha_npu *npu, int ifindex, + enum airoha_npu_wlan_set_cmd func_id, + void *data, int data_len, gfp_t gfp); + int (*wlan_get_msg)(struct airoha_npu *npu, int ifindex, + enum airoha_npu_wlan_get_cmd func_id, + void *data, int data_len, gfp_t gfp); + u32 (*wlan_get_queue_addr)(struct airoha_npu *npu, int qid, + bool xmit); + void (*wlan_set_irq_status)(struct airoha_npu *npu, u32 val); + u32 (*wlan_get_irq_status)(struct airoha_npu *npu, int q); + void (*wlan_enable_irq)(struct airoha_npu *npu, int q); + void (*wlan_disable_irq)(struct airoha_npu *npu, int q); + } ops; +#endif +}; + +#if (IS_BUILTIN(CONFIG_NET_AIROHA_NPU) || IS_MODULE(CONFIG_NET_AIROHA_NPU)) +struct airoha_npu *airoha_npu_get(struct device *dev, dma_addr_t *stats_addr); +void airoha_npu_put(struct airoha_npu *npu); + +static inline int airoha_npu_wlan_init_reserved_memory(struct airoha_npu *npu) +{ + return npu->ops.wlan_init_reserved_memory(npu); +} + +static inline int airoha_npu_wlan_send_msg(struct airoha_npu *npu, + int ifindex, + enum airoha_npu_wlan_set_cmd cmd, + void *data, int data_len, gfp_t gfp) +{ + return npu->ops.wlan_send_msg(npu, ifindex, cmd, data, data_len, gfp); +} + +static inline int airoha_npu_wlan_get_msg(struct airoha_npu *npu, int ifindex, + enum airoha_npu_wlan_get_cmd cmd, + void *data, int data_len, gfp_t gfp) +{ + return npu->ops.wlan_get_msg(npu, ifindex, cmd, data, data_len, gfp); +} + +static inline u32 airoha_npu_wlan_get_queue_addr(struct airoha_npu *npu, + int qid, bool xmit) +{ + return npu->ops.wlan_get_queue_addr(npu, qid, xmit); +} + +static inline void airoha_npu_wlan_set_irq_status(struct airoha_npu *npu, + u32 val) +{ + npu->ops.wlan_set_irq_status(npu, val); +} + +static inline u32 airoha_npu_wlan_get_irq_status(struct airoha_npu *npu, int q) +{ + return npu->ops.wlan_get_irq_status(npu, q); +} + +static inline void airoha_npu_wlan_enable_irq(struct airoha_npu *npu, int q) +{ + npu->ops.wlan_enable_irq(npu, q); +} + +static inline void airoha_npu_wlan_disable_irq(struct airoha_npu *npu, int q) +{ + npu->ops.wlan_disable_irq(npu, q); +} +#else +static inline struct airoha_npu *airoha_npu_get(struct device *dev, + dma_addr_t *foe_stats_addr) +{ + return NULL; +} + +static inline void airoha_npu_put(struct airoha_npu *npu) +{ +} + +static inline int airoha_npu_wlan_init_reserved_memory(struct airoha_npu *npu) +{ + return -EOPNOTSUPP; +} + +static inline int airoha_npu_wlan_send_msg(struct airoha_npu *npu, + int ifindex, + enum airoha_npu_wlan_set_cmd cmd, + void *data, int data_len, gfp_t gfp) +{ + return -EOPNOTSUPP; +} + +static inline int airoha_npu_wlan_get_msg(struct airoha_npu *npu, int ifindex, + enum airoha_npu_wlan_get_cmd cmd, + void *data, int data_len, gfp_t gfp) +{ + return -EOPNOTSUPP; +} + +static inline u32 airoha_npu_wlan_get_queue_addr(struct airoha_npu *npu, + int qid, bool xmit) +{ + return 0; +} + +static inline void airoha_npu_wlan_set_irq_status(struct airoha_npu *npu, + u32 val) +{ +} + +static inline u32 airoha_npu_wlan_get_irq_status(struct airoha_npu *npu, + int q) +{ + return 0; +} + +static inline void airoha_npu_wlan_enable_irq(struct airoha_npu *npu, int q) +{ +} + +static inline void airoha_npu_wlan_disable_irq(struct airoha_npu *npu, int q) +{ +} +#endif + +#endif /* AIROHA_OFFLOAD_H */ From 6896c2449a1858acb643014894d01b3a1223d4e5 Mon Sep 17 00:00:00 2001 From: Tiezhu Yang Date: Mon, 11 Aug 2025 15:35:04 +0800 Subject: [PATCH 0059/1536] net: stmmac: Check stmmac_hw_setup() in stmmac_resume() stmmac_hw_setup() may return 0 on success and an appropriate negative integer as defined in errno.h file on failure, just check it and then return early if failed in stmmac_resume(). Signed-off-by: Tiezhu Yang Reviewed-by: Maxime Chevallier Reviewed-by: Huacai Chen Link: https://patch.msgid.link/20250811073506.27513-2-yangtiezhu@loongson.cn Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index 4a82045ea6eb..9a77390b7f9d 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -7984,7 +7984,14 @@ int stmmac_resume(struct device *dev) stmmac_free_tx_skbufs(priv); stmmac_clear_descriptors(priv, &priv->dma_conf); - stmmac_hw_setup(ndev, false); + ret = stmmac_hw_setup(ndev, false); + if (ret < 0) { + netdev_err(priv->dev, "%s: Hw setup failed\n", __func__); + mutex_unlock(&priv->lock); + rtnl_unlock(); + return ret; + } + stmmac_init_coalesce(priv); phylink_rx_clk_stop_block(priv->phylink); stmmac_set_rx_mode(ndev); From 139235103f6039c2c77cb8f51cb2e7e610fe0114 Mon Sep 17 00:00:00 2001 From: Tiezhu Yang Date: Mon, 11 Aug 2025 15:35:05 +0800 Subject: [PATCH 0060/1536] net: stmmac: Change first parameter of fix_soc_reset() In order to use netdev_err() to print message in the callback function of fix_soc_reset(), change fix_soc_reset() to have "struct stmmac_priv *" as its first parameter. This is preparation for later patch, no functionality change. Suggested-by: Andrew Lunn Signed-off-by: Tiezhu Yang Link: https://patch.msgid.link/20250811073506.27513-3-yangtiezhu@loongson.cn Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/stmicro/stmmac/dwmac-imx.c | 6 +++--- drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c | 2 +- drivers/net/ethernet/stmicro/stmmac/hwif.c | 2 +- include/linux/stmmac.h | 2 +- 4 files changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-imx.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-imx.c index 889e2bb6f7f5..c2d9e89f0063 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-imx.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-imx.c @@ -49,7 +49,7 @@ struct imx_dwmac_ops { u32 flags; bool mac_rgmii_txclk_auto_adj; - int (*fix_soc_reset)(void *priv, void __iomem *ioaddr); + int (*fix_soc_reset)(struct stmmac_priv *priv, void __iomem *ioaddr); int (*set_intf_mode)(struct plat_stmmacenet_data *plat_dat); void (*fix_mac_speed)(void *priv, int speed, unsigned int mode); }; @@ -265,9 +265,9 @@ static void imx93_dwmac_fix_speed(void *priv, int speed, unsigned int mode) writel(old_ctrl, dwmac->base_addr + MAC_CTRL_REG); } -static int imx_dwmac_mx93_reset(void *priv, void __iomem *ioaddr) +static int imx_dwmac_mx93_reset(struct stmmac_priv *priv, void __iomem *ioaddr) { - struct plat_stmmacenet_data *plat_dat = priv; + struct plat_stmmacenet_data *plat_dat = priv->plat; u32 value = readl(ioaddr + DMA_BUS_MODE); /* DMA SW reset */ diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c index 5769165ee5ba..4c477d6ce1e6 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c @@ -509,7 +509,7 @@ static int loongson_dwmac_acpi_config(struct pci_dev *pdev, } /* Loongson's DWMAC device may take nearly two seconds to complete DMA reset */ -static int loongson_dwmac_fix_reset(void *priv, void __iomem *ioaddr) +static int loongson_dwmac_fix_reset(struct stmmac_priv *priv, void __iomem *ioaddr) { u32 value = readl(ioaddr + DMA_BUS_MODE); diff --git a/drivers/net/ethernet/stmicro/stmmac/hwif.c b/drivers/net/ethernet/stmicro/stmmac/hwif.c index 99635b37044a..3f7c765dcb79 100644 --- a/drivers/net/ethernet/stmicro/stmmac/hwif.c +++ b/drivers/net/ethernet/stmicro/stmmac/hwif.c @@ -100,7 +100,7 @@ int stmmac_reset(struct stmmac_priv *priv, void __iomem *ioaddr) return -EINVAL; if (plat && plat->fix_soc_reset) - return plat->fix_soc_reset(plat, ioaddr); + return plat->fix_soc_reset(priv, ioaddr); return stmmac_do_callback(priv, dma, reset, ioaddr); } diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h index 22c24dacbc65..e284f04964bf 100644 --- a/include/linux/stmmac.h +++ b/include/linux/stmmac.h @@ -238,7 +238,7 @@ struct plat_stmmacenet_data { int (*set_clk_tx_rate)(void *priv, struct clk *clk_tx_i, phy_interface_t interface, int speed); void (*fix_mac_speed)(void *priv, int speed, unsigned int mode); - int (*fix_soc_reset)(void *priv, void __iomem *ioaddr); + int (*fix_soc_reset)(struct stmmac_priv *priv, void __iomem *ioaddr); int (*serdes_powerup)(struct net_device *ndev, void *priv); void (*serdes_powerdown)(struct net_device *ndev, void *priv); int (*mac_finish)(struct net_device *ndev, From bfd9d893edfa8278064c4e914e29cf98e290532e Mon Sep 17 00:00:00 2001 From: Tiezhu Yang Date: Mon, 11 Aug 2025 15:35:06 +0800 Subject: [PATCH 0061/1536] net: stmmac: Return early if invalid in loongson_dwmac_fix_reset() If the MAC controller does not connect to any PHY interface, there is a missing clock, then the DMA reset fails. For this case, the DMA_BUS_MODE_SFT_RESET bit is 1 before software reset, just print an error message which gives a hint the PHY clock is missing, and then return -EINVAL immediately to avoid waiting for the timeout when the DMA reset fails in loongson_dwmac_fix_reset(). With this patch, for the normal end user, the computer start faster with reducing boot time for 2 seconds on the specified mainboard. Signed-off-by: Tiezhu Yang Link: https://patch.msgid.link/20250811073506.27513-4-yangtiezhu@loongson.cn Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c index 4c477d6ce1e6..6fca0fca4892 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c @@ -513,6 +513,11 @@ static int loongson_dwmac_fix_reset(struct stmmac_priv *priv, void __iomem *ioad { u32 value = readl(ioaddr + DMA_BUS_MODE); + if (value & DMA_BUS_MODE_SFT_RESET) { + netdev_err(priv->dev, "the PHY clock is missing\n"); + return -EINVAL; + } + value |= DMA_BUS_MODE_SFT_RESET; writel(value, ioaddr + DMA_BUS_MODE); From 4d18083d6b2c02e89d833c92c3fb79e2fe1e6795 Mon Sep 17 00:00:00 2001 From: Wang Liang Date: Tue, 12 Aug 2025 09:59:29 +0800 Subject: [PATCH 0062/1536] vsock: use sizeof(struct sockaddr_storage) instead of magic value Previous commit 230b183921ec ("net: Use standard structures for generic socket address structures.") use 'struct sockaddr_storage address;' to replace 'char address[MAX_SOCK_ADDR];'. The macro MAX_SOCK_ADDR is removed by commit 01893c82b4e6 ("net: Remove MAX_SOCK_ADDR constant"). The comment in vsock_getname() is outdated, use sizeof(struct sockaddr_storage) instead of magic value 128. Signed-off-by: Wang Liang Link: https://patch.msgid.link/20250812015929.1419896-1-wangliang74@huawei.com Signed-off-by: Jakub Kicinski --- net/vmw_vsock/af_vsock.c | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index ead6a3c14b87..f7b2d61d1d16 100644 --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -1028,12 +1028,7 @@ static int vsock_getname(struct socket *sock, vm_addr = &vsk->local_addr; } - /* sys_getsockname() and sys_getpeername() pass us a - * MAX_SOCK_ADDR-sized buffer and don't set addr_len. Unfortunately - * that macro is defined in socket.c instead of .h, so we hardcode its - * value here. - */ - BUILD_BUG_ON(sizeof(*vm_addr) > 128); + BUILD_BUG_ON(sizeof(*vm_addr) > sizeof(struct sockaddr_storage)); memcpy(addr, vm_addr, sizeof(*vm_addr)); err = sizeof(*vm_addr); From 96326447d466ca62b713f660cfc73ef7879151a0 Mon Sep 17 00:00:00 2001 From: Lorenzo Bianconi Date: Tue, 12 Aug 2025 06:57:23 +0200 Subject: [PATCH 0063/1536] net: mediatek: wed: Introduce MT7992 WED support to MT7988 SoC Introduce the second WDMA RX ring in WED driver for MT7988 SoC since the Mediatek MT7992 WiFi chipset supports two separated WDMA rings. Add missing MT7988 configurations to properly support WED for MT7992 in MT76 driver. Co-developed-by: Rex Lu Signed-off-by: Rex Lu Signed-off-by: Lorenzo Bianconi Link: https://patch.msgid.link/20250812-mt7992-wed-support-v3-1-9ada78a819a4@kernel.org Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/mediatek/mtk_wed.c | 33 +++++++++++++++---- drivers/net/ethernet/mediatek/mtk_wed.h | 2 +- .../net/wireless/mediatek/mt76/mt7915/mmio.c | 6 ++-- .../net/wireless/mediatek/mt76/mt7996/mmio.c | 12 +++---- include/linux/soc/mediatek/mtk_wed.h | 2 +- 5 files changed, 38 insertions(+), 17 deletions(-) diff --git a/drivers/net/ethernet/mediatek/mtk_wed.c b/drivers/net/ethernet/mediatek/mtk_wed.c index 0a80d8f8cff7..3dbb113b792c 100644 --- a/drivers/net/ethernet/mediatek/mtk_wed.c +++ b/drivers/net/ethernet/mediatek/mtk_wed.c @@ -59,7 +59,9 @@ struct mtk_wed_flow_block_priv { static const struct mtk_wed_soc_data mt7622_data = { .regmap = { .tx_bm_tkid = 0x088, - .wpdma_rx_ring0 = 0x770, + .wpdma_rx_ring = { + 0x770, + }, .reset_idx_tx_mask = GENMASK(3, 0), .reset_idx_rx_mask = GENMASK(17, 16), }, @@ -70,7 +72,9 @@ static const struct mtk_wed_soc_data mt7622_data = { static const struct mtk_wed_soc_data mt7986_data = { .regmap = { .tx_bm_tkid = 0x0c8, - .wpdma_rx_ring0 = 0x770, + .wpdma_rx_ring = { + 0x770, + }, .reset_idx_tx_mask = GENMASK(1, 0), .reset_idx_rx_mask = GENMASK(7, 6), }, @@ -81,7 +85,10 @@ static const struct mtk_wed_soc_data mt7986_data = { static const struct mtk_wed_soc_data mt7988_data = { .regmap = { .tx_bm_tkid = 0x0c8, - .wpdma_rx_ring0 = 0x7d0, + .wpdma_rx_ring = { + 0x7d0, + 0x7d8, + }, .reset_idx_tx_mask = GENMASK(1, 0), .reset_idx_rx_mask = GENMASK(7, 6), }, @@ -621,8 +628,8 @@ mtk_wed_amsdu_init(struct mtk_wed_device *dev) return ret; } - /* eagle E1 PCIE1 tx ring 22 flow control issue */ - if (dev->wlan.id == 0x7991) + /* Kite and Eagle E1 PCIE1 tx ring 22 flow control issue */ + if (dev->wlan.id == 0x7991 || dev->wlan.id == 0x7992) wed_clr(dev, MTK_WED_AMSDU_FIFO, MTK_WED_AMSDU_IS_PRIOR0_RING); wed_set(dev, MTK_WED_CTRL, MTK_WED_CTRL_TX_AMSDU_EN); @@ -1239,7 +1246,11 @@ mtk_wed_set_wpdma(struct mtk_wed_device *dev) return; wed_w32(dev, MTK_WED_WPDMA_RX_GLO_CFG, dev->wlan.wpdma_rx_glo); - wed_w32(dev, dev->hw->soc->regmap.wpdma_rx_ring0, dev->wlan.wpdma_rx); + wed_w32(dev, dev->hw->soc->regmap.wpdma_rx_ring[0], + dev->wlan.wpdma_rx[0]); + if (mtk_wed_is_v3_or_greater(dev->hw)) + wed_w32(dev, dev->hw->soc->regmap.wpdma_rx_ring[1], + dev->wlan.wpdma_rx[1]); if (!dev->wlan.hw_rro) return; @@ -2323,6 +2334,16 @@ mtk_wed_start(struct mtk_wed_device *dev, u32 irq_mask) if (!dev->rx_wdma[i].desc) mtk_wed_wdma_rx_ring_setup(dev, i, 16, false); + if (dev->wlan.hw_rro) { + for (i = 0; i < MTK_WED_RX_PAGE_QUEUES; i++) { + u32 addr = MTK_WED_RRO_MSDU_PG_CTRL0(i) + + MTK_WED_RING_OFS_COUNT; + + if (!wed_r32(dev, addr)) + wed_w32(dev, addr, 1); + } + } + mtk_wed_hw_init(dev); mtk_wed_configure_irq(dev, irq_mask); diff --git a/drivers/net/ethernet/mediatek/mtk_wed.h b/drivers/net/ethernet/mediatek/mtk_wed.h index c1f0479d7a71..b49aee9a8b65 100644 --- a/drivers/net/ethernet/mediatek/mtk_wed.h +++ b/drivers/net/ethernet/mediatek/mtk_wed.h @@ -17,7 +17,7 @@ struct mtk_wed_wo; struct mtk_wed_soc_data { struct { u32 tx_bm_tkid; - u32 wpdma_rx_ring0; + u32 wpdma_rx_ring[MTK_WED_RX_QUEUES]; u32 reset_idx_tx_mask; u32 reset_idx_rx_mask; } regmap; diff --git a/drivers/net/wireless/mediatek/mt76/mt7915/mmio.c b/drivers/net/wireless/mediatek/mt76/mt7915/mmio.c index 4a82f8e4c118..36488aa6cc20 100644 --- a/drivers/net/wireless/mediatek/mt76/mt7915/mmio.c +++ b/drivers/net/wireless/mediatek/mt76/mt7915/mmio.c @@ -664,8 +664,8 @@ int mt7915_mmio_wed_init(struct mt7915_dev *dev, void *pdev_ptr, MT_RXQ_WED_RING_BASE; wed->wlan.wpdma_rx_glo = pci_resource_start(pci_dev, 0) + MT_WPDMA_GLO_CFG; - wed->wlan.wpdma_rx = pci_resource_start(pci_dev, 0) + - MT_RXQ_WED_DATA_RING_BASE; + wed->wlan.wpdma_rx[0] = pci_resource_start(pci_dev, 0) + + MT_RXQ_WED_DATA_RING_BASE; } else { struct platform_device *plat_dev = pdev_ptr; struct resource *res; @@ -687,7 +687,7 @@ int mt7915_mmio_wed_init(struct mt7915_dev *dev, void *pdev_ptr, wed->wlan.wpdma_tx = res->start + MT_TXQ_WED_RING_BASE; wed->wlan.wpdma_txfree = res->start + MT_RXQ_WED_RING_BASE; wed->wlan.wpdma_rx_glo = res->start + MT_WPDMA_GLO_CFG; - wed->wlan.wpdma_rx = res->start + MT_RXQ_WED_DATA_RING_BASE; + wed->wlan.wpdma_rx[0] = res->start + MT_RXQ_WED_DATA_RING_BASE; } wed->wlan.nbuf = MT7915_HW_TOKEN_SIZE; wed->wlan.tx_tbit[0] = is_mt7915(&dev->mt76) ? 4 : 30; diff --git a/drivers/net/wireless/mediatek/mt76/mt7996/mmio.c b/drivers/net/wireless/mediatek/mt76/mt7996/mmio.c index 30b40f4a91be..fb2428a9b877 100644 --- a/drivers/net/wireless/mediatek/mt76/mt7996/mmio.c +++ b/drivers/net/wireless/mediatek/mt76/mt7996/mmio.c @@ -503,9 +503,9 @@ int mt7996_mmio_wed_init(struct mt7996_dev *dev, void *pdev_ptr, } wed->wlan.wpdma_rx_glo = wed->wlan.phy_base + hif1_ofs + MT_WFDMA0_GLO_CFG; - wed->wlan.wpdma_rx = wed->wlan.phy_base + hif1_ofs + - MT_RXQ_RING_BASE(MT7996_RXQ_BAND0) + - MT7996_RXQ_BAND0 * MT_RING_SIZE; + wed->wlan.wpdma_rx[0] = wed->wlan.phy_base + hif1_ofs + + MT_RXQ_RING_BASE(MT7996_RXQ_BAND0) + + MT7996_RXQ_BAND0 * MT_RING_SIZE; wed->wlan.id = MT7996_DEVICE_ID_2; wed->wlan.tx_tbit[0] = ffs(MT_INT_TX_DONE_BAND2) - 1; @@ -518,9 +518,9 @@ int mt7996_mmio_wed_init(struct mt7996_dev *dev, void *pdev_ptr, wed->wlan.wpdma_rx_glo = wed->wlan.phy_base + MT_WFDMA0_GLO_CFG; - wed->wlan.wpdma_rx = wed->wlan.phy_base + - MT_RXQ_RING_BASE(MT7996_RXQ_BAND0) + - MT7996_RXQ_BAND0 * MT_RING_SIZE; + wed->wlan.wpdma_rx[0] = wed->wlan.phy_base + + MT_RXQ_RING_BASE(MT7996_RXQ_BAND0) + + MT7996_RXQ_BAND0 * MT_RING_SIZE; wed->wlan.wpdma_rx_rro[0] = wed->wlan.phy_base + MT_RXQ_RING_BASE(MT7996_RXQ_RRO_BAND0) + diff --git a/include/linux/soc/mediatek/mtk_wed.h b/include/linux/soc/mediatek/mtk_wed.h index d8949a4ed0dc..c4ff6bab176d 100644 --- a/include/linux/soc/mediatek/mtk_wed.h +++ b/include/linux/soc/mediatek/mtk_wed.h @@ -147,7 +147,7 @@ struct mtk_wed_device { u32 wpdma_tx; u32 wpdma_txfree; u32 wpdma_rx_glo; - u32 wpdma_rx; + u32 wpdma_rx[MTK_WED_RX_QUEUES]; u32 wpdma_rx_rro[MTK_WED_RX_QUEUES]; u32 wpdma_rx_pg; From 5e88777a382480d0b1f7eafb6d0fb680ec7a40bb Mon Sep 17 00:00:00 2001 From: Ido Schimmel Date: Tue, 12 Aug 2025 10:18:10 +0300 Subject: [PATCH 0064/1536] selftests: forwarding: Add a test for FDB activity notification control Test various aspects of FDB activity notification control: * Transitioning of an FDB entry from inactive to active state. * Transitioning of an FDB entry from active to inactive state. * Avoiding the resetting of an FDB entry's last activity time (i.e., "updated" time) using the "norefresh" keyword. Reviewed-by: Petr Machata Signed-off-by: Ido Schimmel Acked-by: Nikolay Aleksandrov Link: https://patch.msgid.link/20250812071810.312346-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski --- .../testing/selftests/net/forwarding/Makefile | 4 +- .../net/forwarding/bridge_activity_notify.sh | 173 ++++++++++++++++++ 2 files changed, 176 insertions(+), 1 deletion(-) create mode 100755 tools/testing/selftests/net/forwarding/bridge_activity_notify.sh diff --git a/tools/testing/selftests/net/forwarding/Makefile b/tools/testing/selftests/net/forwarding/Makefile index d7bb2e80e88c..0a0d4c2a85f7 100644 --- a/tools/testing/selftests/net/forwarding/Makefile +++ b/tools/testing/selftests/net/forwarding/Makefile @@ -1,6 +1,8 @@ # SPDX-License-Identifier: GPL-2.0+ OR MIT -TEST_PROGS = bridge_fdb_learning_limit.sh \ +TEST_PROGS = \ + bridge_activity_notify.sh \ + bridge_fdb_learning_limit.sh \ bridge_igmp.sh \ bridge_locked_port.sh \ bridge_mdb.sh \ diff --git a/tools/testing/selftests/net/forwarding/bridge_activity_notify.sh b/tools/testing/selftests/net/forwarding/bridge_activity_notify.sh new file mode 100755 index 000000000000..a20ef4bd310b --- /dev/null +++ b/tools/testing/selftests/net/forwarding/bridge_activity_notify.sh @@ -0,0 +1,173 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +# +-----------------------+ +------------------------+ +# | H1 (vrf) | | H2 (vrf) | +# | 192.0.2.1/28 | | 192.0.2.2/28 | +# | + $h1 | | + $h2 | +# +----|------------------+ +----|-------------------+ +# | | +# +----|--------------------------------------------------|-------------------+ +# | SW | | | +# | +--|--------------------------------------------------|-----------------+ | +# | | + $swp1 BR1 (802.1d) + $swp2 | | +# | | | | +# | +-----------------------------------------------------------------------+ | +# +---------------------------------------------------------------------------+ + +ALL_TESTS=" + new_inactive_test + existing_active_test + norefresh_test +" + +NUM_NETIFS=4 +source lib.sh + +h1_create() +{ + simple_if_init "$h1" 192.0.2.1/28 + defer simple_if_fini "$h1" 192.0.2.1/28 +} + +h2_create() +{ + simple_if_init "$h2" 192.0.2.2/28 + defer simple_if_fini "$h2" 192.0.2.2/28 +} + +switch_create() +{ + ip_link_add br1 type bridge vlan_filtering 0 mcast_snooping 0 \ + ageing_time "$LOW_AGEING_TIME" + ip_link_set_up br1 + + ip_link_set_master "$swp1" br1 + ip_link_set_up "$swp1" + + ip_link_set_master "$swp2" br1 + ip_link_set_up "$swp2" +} + +setup_prepare() +{ + h1=${NETIFS[p1]} + swp1=${NETIFS[p2]} + + swp2=${NETIFS[p3]} + h2=${NETIFS[p4]} + + vrf_prepare + defer vrf_cleanup + + h1_create + h2_create + switch_create +} + +fdb_active_wait() +{ + local mac=$1; shift + + bridge -d fdb get "$mac" br br1 | grep -q -v "inactive" +} + +fdb_inactive_wait() +{ + local mac=$1; shift + + bridge -d fdb get "$mac" br br1 | grep -q "inactive" +} + +new_inactive_test() +{ + local mac="00:11:22:33:44:55" + + # Add a new FDB entry as static and inactive and check that it + # becomes active upon traffic. + RET=0 + + bridge fdb add "$mac" dev "$swp1" master static activity_notify inactive + bridge -d fdb get "$mac" br br1 | grep -q "inactive" + check_err $? "FDB entry not present as \"inactive\" when should" + + $MZ "$h1" -c 1 -p 64 -a "$mac" -b bcast -t ip -q + + busywait "$BUSYWAIT_TIMEOUT" fdb_active_wait "$mac" + check_err $? "FDB entry present as \"inactive\" when should not" + + log_test "Transition from inactive to active" + + bridge fdb del "$mac" dev "$swp1" master +} + +existing_active_test() +{ + local mac="00:11:22:33:44:55" + local ageing_time + + # Enable activity notifications on an existing dynamic FDB entry and + # check that it becomes inactive after the ageing time passed. + RET=0 + + bridge fdb add "$mac" dev "$swp1" master dynamic + bridge fdb replace "$mac" dev "$swp1" master static activity_notify norefresh + + bridge -d fdb get "$mac" br br1 | grep -q "activity_notify" + check_err $? "FDB entry not present as \"activity_notify\" when should" + + bridge -d fdb get "$mac" br br1 | grep -q "inactive" + check_fail $? "FDB entry present as \"inactive\" when should not" + + ageing_time=$(bridge_ageing_time_get br1) + slowwait $((ageing_time * 2)) fdb_inactive_wait "$mac" + check_err $? "FDB entry not present as \"inactive\" when should" + + log_test "Transition from active to inactive" + + bridge fdb del "$mac" dev "$swp1" master +} + +norefresh_test() +{ + local mac="00:11:22:33:44:55" + local updated_time + + # Check that the "updated" time is reset when replacing an FDB entry + # without the "norefresh" keyword and that it is not reset when + # replacing with the "norefresh" keyword. + RET=0 + + bridge fdb add "$mac" dev "$swp1" master static + sleep 1 + + bridge fdb replace "$mac" dev "$swp1" master static activity_notify + updated_time=$(bridge -d -s -j fdb get "$mac" br br1 | jq '.[]["updated"]') + if [[ $updated_time -ne 0 ]]; then + check_err 1 "\"updated\" time was not reset when should" + fi + + sleep 1 + bridge fdb replace "$mac" dev "$swp1" master static norefresh + updated_time=$(bridge -d -s -j fdb get "$mac" br br1 | jq '.[]["updated"]') + if [[ $updated_time -eq 0 ]]; then + check_err 1 "\"updated\" time was reset when should not" + fi + + log_test "Resetting of \"updated\" time" + + bridge fdb del "$mac" dev "$swp1" master +} + +if ! bridge fdb help 2>&1 | grep -q "activity_notify"; then + echo "SKIP: iproute2 too old, missing bridge FDB activity notification control" + exit "$ksft_skip" +fi + +trap cleanup EXIT + +setup_prepare +setup_wait +tests_run + +exit "$EXIT_STATUS" From a57384110dc633856216fc254680923905391d67 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Miguel=20Garc=C3=ADa?= Date: Tue, 12 Aug 2025 10:22:44 +0200 Subject: [PATCH 0065/1536] tun: replace strcpy with strscpy for ifr_name MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Replace the strcpy() calls that copy the device name into ifr->ifr_name with strscpy() to avoid potential overflows and guarantee NULL termination. Destination is ifr->ifr_name (size IFNAMSIZ). Tested in QEMU (BusyBox rootfs): - Created TUN devices via TUNSETIFF helper - Set addresses and brought links up - Verified long interface names are safely truncated (IFNAMSIZ-1) Signed-off-by: Miguel García Reviewed-by: Willem de Bruijn Link: https://patch.msgid.link/20250812082244.60240-1-miguelgarciaroman8@gmail.com Signed-off-by: Jakub Kicinski --- drivers/net/tun.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index cc6c50180663..86a9e927d0ff 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -2823,13 +2823,13 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr) if (netif_running(tun->dev)) netif_tx_wake_all_queues(tun->dev); - strcpy(ifr->ifr_name, tun->dev->name); + strscpy(ifr->ifr_name, tun->dev->name); return 0; } static void tun_get_iff(struct tun_struct *tun, struct ifreq *ifr) { - strcpy(ifr->ifr_name, tun->dev->name); + strscpy(ifr->ifr_name, tun->dev->name); ifr->ifr_flags = tun_flags(tun); From 30f7d4099fb6736712338bd3e41433db796ec869 Mon Sep 17 00:00:00 2001 From: Jiawen Wu Date: Tue, 12 Aug 2025 17:37:25 +0800 Subject: [PATCH 0066/1536] net: libwx: cleanup VF register macros Adjust the order of VF regitser macros, make it elegant. Signed-off-by: Jiawen Wu Link: https://patch.msgid.link/778899EE1D862EC2+20250812093725.58821-1-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/wangxun/libwx/wx_vf.h | 72 +++++++++++----------- 1 file changed, 35 insertions(+), 37 deletions(-) diff --git a/drivers/net/ethernet/wangxun/libwx/wx_vf.h b/drivers/net/ethernet/wangxun/libwx/wx_vf.h index fec1126703e3..3f16de0fa427 100644 --- a/drivers/net/ethernet/wangxun/libwx/wx_vf.h +++ b/drivers/net/ethernet/wangxun/libwx/wx_vf.h @@ -4,6 +4,7 @@ #ifndef _WX_VF_H_ #define _WX_VF_H_ +/* Control registers */ #define WX_VF_MAX_RING_NUMS 8 #define WX_VX_PF_BME 0x4B8 #define WX_VF_BME_ENABLE BIT(0) @@ -12,16 +13,32 @@ #define WX_VXCTRL_RST BIT(0) #define WX_VXMRQC 0x78 +#define WX_VXMRQC_PSR_L4HDR BIT(0) +#define WX_VXMRQC_PSR_L3HDR BIT(1) +#define WX_VXMRQC_PSR_L2HDR BIT(2) +#define WX_VXMRQC_PSR_TUNHDR BIT(3) +#define WX_VXMRQC_PSR_TUNMAC BIT(4) +#define WX_VXMRQC_PSR_MASK GENMASK(5, 1) +#define WX_VXMRQC_PSR(f) FIELD_PREP(GENMASK(5, 1), f) +#define WX_VXMRQC_RSS_HASH(f) FIELD_PREP(GENMASK(15, 13), f) +#define WX_VXMRQC_RSS_MASK GENMASK(31, 16) +#define WX_VXMRQC_RSS(f) FIELD_PREP(GENMASK(31, 16), f) +#define WX_VXMRQC_RSS_ALG_IPV4_TCP BIT(0) +#define WX_VXMRQC_RSS_ALG_IPV4 BIT(1) +#define WX_VXMRQC_RSS_ALG_IPV6 BIT(4) +#define WX_VXMRQC_RSS_ALG_IPV6_TCP BIT(5) +#define WX_VXMRQC_RSS_EN BIT(8) + +#define WX_VXRSSRK(i) (0x80 + ((i) * 4)) /* i=[0,9] */ +#define WX_VXRETA(i) (0xC0 + ((i) * 4)) /* i=[0,15] */ + +/* Interrupt registers */ #define WX_VXICR 0x100 #define WX_VXIMS 0x108 #define WX_VXIMC 0x10C #define WX_VF_IRQ_CLEAR_MASK 7 #define WX_VF_MAX_TX_QUEUES 4 #define WX_VF_MAX_RX_QUEUES 4 -#define WX_VXTXDCTL(r) (0x3010 + (0x40 * (r))) -#define WX_VXRXDCTL(r) (0x1010 + (0x40 * (r))) -#define WX_VXRXDCTL_ENABLE BIT(0) -#define WX_VXTXDCTL_FLUSH BIT(26) #define WX_VXITR(i) (0x200 + (4 * (i))) /* i=[0,1] */ #define WX_VXITR_MASK GENMASK(8, 0) @@ -29,16 +46,6 @@ #define WX_VXIVAR_MISC 0x260 #define WX_VXIVAR(i) (0x240 + (4 * (i))) /* i=[0,3] */ -#define WX_VXRXDCTL_RSCMAX(f) FIELD_PREP(GENMASK(24, 23), f) -#define WX_VXRXDCTL_BUFLEN(f) FIELD_PREP(GENMASK(6, 1), f) -#define WX_VXRXDCTL_BUFSZ(f) FIELD_PREP(GENMASK(11, 8), f) -#define WX_VXRXDCTL_HDRSZ(f) FIELD_PREP(GENMASK(15, 12), f) - -#define WX_VXRXDCTL_RSCMAX_MASK GENMASK(24, 23) -#define WX_VXRXDCTL_BUFLEN_MASK GENMASK(6, 1) -#define WX_VXRXDCTL_BUFSZ_MASK GENMASK(11, 8) -#define WX_VXRXDCTL_HDRSZ_MASK GENMASK(15, 12) - #define wx_conf_size(v, mwidth, uwidth) ({ \ typeof(v) _v = (v); \ (_v == 2 << (mwidth) ? 0 : _v >> (uwidth)); \ @@ -59,44 +66,35 @@ #define WX_VXRDBAH(r) (0x1004 + (0x40 * (r))) #define WX_VXRDT(r) (0x1008 + (0x40 * (r))) #define WX_VXRDH(r) (0x100C + (0x40 * (r))) - +#define WX_VXRXDCTL(r) (0x1010 + (0x40 * (r))) +#define WX_VXRXDCTL_ENABLE BIT(0) +#define WX_VXRXDCTL_BUFLEN_MASK GENMASK(6, 1) +#define WX_VXRXDCTL_BUFLEN(f) FIELD_PREP(GENMASK(6, 1), f) +#define WX_VXRXDCTL_BUFSZ_MASK GENMASK(11, 8) +#define WX_VXRXDCTL_BUFSZ(f) FIELD_PREP(GENMASK(11, 8), f) +#define WX_VXRXDCTL_HDRSZ_MASK GENMASK(15, 12) +#define WX_VXRXDCTL_HDRSZ(f) FIELD_PREP(GENMASK(15, 12), f) +#define WX_VXRXDCTL_RSCMAX_MASK GENMASK(24, 23) +#define WX_VXRXDCTL_RSCMAX(f) FIELD_PREP(GENMASK(24, 23), f) #define WX_VXRXDCTL_RSCEN BIT(29) #define WX_VXRXDCTL_DROP BIT(30) #define WX_VXRXDCTL_VLAN BIT(31) +/* Transimit Path */ #define WX_VXTDBAL(r) (0x3000 + (0x40 * (r))) #define WX_VXTDBAH(r) (0x3004 + (0x40 * (r))) #define WX_VXTDT(r) (0x3008 + (0x40 * (r))) #define WX_VXTDH(r) (0x300C + (0x40 * (r))) - +#define WX_VXTXDCTL(r) (0x3010 + (0x40 * (r))) #define WX_VXTXDCTL_ENABLE BIT(0) #define WX_VXTXDCTL_BUFLEN(f) FIELD_PREP(GENMASK(6, 1), f) #define WX_VXTXDCTL_PTHRESH(f) FIELD_PREP(GENMASK(11, 8), f) #define WX_VXTXDCTL_WTHRESH(f) FIELD_PREP(GENMASK(22, 16), f) - -#define WX_VXMRQC_PSR(f) FIELD_PREP(GENMASK(5, 1), f) -#define WX_VXMRQC_PSR_MASK GENMASK(5, 1) -#define WX_VXMRQC_PSR_L4HDR BIT(0) -#define WX_VXMRQC_PSR_L3HDR BIT(1) -#define WX_VXMRQC_PSR_L2HDR BIT(2) -#define WX_VXMRQC_PSR_TUNHDR BIT(3) -#define WX_VXMRQC_PSR_TUNMAC BIT(4) - -#define WX_VXRSSRK(i) (0x80 + ((i) * 4)) /* i=[0,9] */ -#define WX_VXRETA(i) (0xC0 + ((i) * 4)) /* i=[0,15] */ - -#define WX_VXMRQC_RSS(f) FIELD_PREP(GENMASK(31, 16), f) -#define WX_VXMRQC_RSS_MASK GENMASK(31, 16) -#define WX_VXMRQC_RSS_ALG_IPV4_TCP BIT(0) -#define WX_VXMRQC_RSS_ALG_IPV4 BIT(1) -#define WX_VXMRQC_RSS_ALG_IPV6 BIT(4) -#define WX_VXMRQC_RSS_ALG_IPV6_TCP BIT(5) -#define WX_VXMRQC_RSS_EN BIT(8) -#define WX_VXMRQC_RSS_HASH(f) FIELD_PREP(GENMASK(15, 13), f) +#define WX_VXTXDCTL_FLUSH BIT(26) #define WX_PFLINK_STATUS(g) FIELD_GET(BIT(0), g) #define WX_PFLINK_SPEED(g) FIELD_GET(GENMASK(31, 1), g) -#define WX_VXSTATUS_SPEED(g) FIELD_GET(GENMASK(4, 1), g) +#define WX_VXSTATUS_SPEED(g) FIELD_GET(GENMASK(4, 1), g) struct wx_link_reg_fields { u32 mac_type; From 3051f49b0e03b1a5a2f8a70dbc4ae5bf3759bcb7 Mon Sep 17 00:00:00 2001 From: Waqar Hameed Date: Tue, 12 Aug 2025 14:13:58 +0200 Subject: [PATCH 0067/1536] net: enetc: Remove error print for devm_add_action_or_reset() When `devm_add_action_or_reset()` fails, it is due to a failed memory allocation and will thus return `-ENOMEM`. `dev_err_probe()` doesn't do anything when error is `-ENOMEM`. Therefore, remove the useless call to `dev_err_probe()` when `devm_add_action_or_reset()` fails, and just return the value instead. Signed-off-by: Waqar Hameed Reviewed-by: Joe Damato Link: https://patch.msgid.link/pnd1ppghh4p.a.out@axis.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/freescale/enetc/enetc4_pf.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/net/ethernet/freescale/enetc/enetc4_pf.c b/drivers/net/ethernet/freescale/enetc/enetc4_pf.c index b3dc1afeefd1..38fb81db48c2 100644 --- a/drivers/net/ethernet/freescale/enetc/enetc4_pf.c +++ b/drivers/net/ethernet/freescale/enetc/enetc4_pf.c @@ -1016,8 +1016,7 @@ static int enetc4_pf_probe(struct pci_dev *pdev, err = devm_add_action_or_reset(dev, enetc4_pci_remove, pdev); if (err) - return dev_err_probe(dev, err, - "Add enetc4_pci_remove() action failed\n"); + return err; /* si is the private data. */ si = pci_get_drvdata(pdev); From acfea9361073134e828fddbd5f8201d428d7a7b1 Mon Sep 17 00:00:00 2001 From: Andre Carvalho Date: Tue, 12 Aug 2025 20:38:23 +0100 Subject: [PATCH 0068/1536] selftests: netconsole: Validate interface selection by MAC address Extend the existing netconsole cmdline selftest to also validate that interface selection can be performed via MAC address. The test now validates that netconsole works with both interface name and MAC address, improving test coverage. Suggested-by: Breno Leitao Reviewed-by: Breno Leitao Signed-off-by: Andre Carvalho Link: https://patch.msgid.link/20250812-netcons-cmdline-selftest-v2-1-8099fb7afa9e@gmail.com Signed-off-by: Jakub Kicinski --- .../drivers/net/lib/sh/lib_netcons.sh | 10 +++- .../selftests/drivers/net/netcons_cmdline.sh | 51 ++++++++++++------- 2 files changed, 41 insertions(+), 20 deletions(-) diff --git a/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh b/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh index b6071e80ebbb..8e1085e89647 100644 --- a/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh +++ b/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh @@ -148,12 +148,20 @@ function create_dynamic_target() { # Generate the command line argument for netconsole following: # netconsole=[+][src-port]@[src-ip]/[],[tgt-port]@/[tgt-macaddr] function create_cmdline_str() { + local BINDMODE=${1:-"ifname"} + if [ "${BINDMODE}" == "ifname" ] + then + SRCDEV=${SRCIF} + else + SRCDEV=$(mac_get "${SRCIF}") + fi + DSTMAC=$(ip netns exec "${NAMESPACE}" \ ip link show "${DSTIF}" | awk '/ether/ {print $2}') SRCPORT="1514" TGTPORT="6666" - echo "netconsole=\"+${SRCPORT}@${SRCIP}/${SRCIF},${TGTPORT}@${DSTIP}/${DSTMAC}\"" + echo "netconsole=\"+${SRCPORT}@${SRCIP}/${SRCDEV},${TGTPORT}@${DSTIP}/${DSTMAC}\"" } # Do not append the release to the header of the message diff --git a/tools/testing/selftests/drivers/net/netcons_cmdline.sh b/tools/testing/selftests/drivers/net/netcons_cmdline.sh index ad2fb8b1c463..d1d23dc67f99 100755 --- a/tools/testing/selftests/drivers/net/netcons_cmdline.sh +++ b/tools/testing/selftests/drivers/net/netcons_cmdline.sh @@ -19,9 +19,6 @@ check_netconsole_module modprobe netdevsim 2> /dev/null || true rmmod netconsole 2> /dev/null || true -# The content of kmsg will be save to the following file -OUTPUT_FILE="/tmp/${TARGET}" - # Check for basic system dependency and exit if not found # check_for_dependencies # Set current loglevel to KERN_INFO(6), and default to KERN_NOTICE(5) @@ -30,23 +27,39 @@ echo "6 5" > /proc/sys/kernel/printk trap do_cleanup EXIT # Create one namespace and two interfaces set_network -# Create the command line for netconsole, with the configuration from the -# function above -CMDLINE="$(create_cmdline_str)" -# Load the module, with the cmdline set -modprobe netconsole "${CMDLINE}" +# Run the test twice, with different cmdline parameters +for BINDMODE in "ifname" "mac" +do + echo "Running with bind mode: ${BINDMODE}" >&2 + # Create the command line for netconsole, with the configuration from + # the function above + CMDLINE=$(create_cmdline_str "${BINDMODE}") -# Listed for netconsole port inside the namespace and destination interface -listen_port_and_save_to "${OUTPUT_FILE}" & -# Wait for socat to start and listen to the port. -wait_local_port_listen "${NAMESPACE}" "${PORT}" udp -# Send the message -echo "${MSG}: ${TARGET}" > /dev/kmsg -# Wait until socat saves the file to disk -busywait "${BUSYWAIT_TIMEOUT}" test -s "${OUTPUT_FILE}" -# Make sure the message was received in the dst part -# and exit -validate_msg "${OUTPUT_FILE}" + # The content of kmsg will be save to the following file + OUTPUT_FILE="/tmp/${TARGET}-${BINDMODE}" + + # Load the module, with the cmdline set + modprobe netconsole "${CMDLINE}" + + # Listed for netconsole port inside the namespace and destination + # interface + listen_port_and_save_to "${OUTPUT_FILE}" & + # Wait for socat to start and listen to the port. + wait_local_port_listen "${NAMESPACE}" "${PORT}" udp + # Send the message + echo "${MSG}: ${TARGET}" > /dev/kmsg + # Wait until socat saves the file to disk + busywait "${BUSYWAIT_TIMEOUT}" test -s "${OUTPUT_FILE}" + # Make sure the message was received in the dst part + # and exit + validate_msg "${OUTPUT_FILE}" + + # kill socat in case it is still running + pkill_socat + # Unload the module + rmmod netconsole + echo "${BINDMODE} : Test passed" >&2 +done exit "${ksft_pass}" From 66ceb45b7d7e9673254116eefe5b6d3a44eba267 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Thomas=20Wei=C3=9Fschuh?= Date: Mon, 11 Aug 2025 11:43:18 +0200 Subject: [PATCH 0069/1536] ice: Don't use %pK through printk or tracepoints MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit In the past %pK was preferable to %p as it would not leak raw pointer values into the kernel log. Since commit ad67b74d2469 ("printk: hash addresses printed with %p") the regular %p has been improved to avoid this issue. Furthermore, restricted pointers ("%pK") were never meant to be used through printk(). They can still unintentionally leak raw pointers or acquire sleeping locks in atomic contexts. Switch to the regular pointer formatting which is safer and easier to reason about. There are still a few users of %pK left, but these use it through seq_file, for which its usage is safe. Signed-off-by: Thomas Weißschuh Acked-by: Przemek Kitszel Reviewed-by: Aleksandr Loktionov Reviewed-by: Simon Horman Reviewed-by: Paul Menzel Reviewed-by: Jacob Keller Link: https://patch.msgid.link/20250811-restricted-pointers-net-v5-1-2e2fdc7d3f2c@linutronix.de Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/intel/ice/ice_main.c | 2 +- drivers/net/ethernet/intel/ice/ice_trace.h | 10 +++++----- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index 8e0b06c1e02b..93c6e64ed940 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -9117,7 +9117,7 @@ static int ice_create_q_channels(struct ice_vsi *vsi) list_add_tail(&ch->list, &vsi->ch_list); vsi->tc_map_vsi[i] = ch->ch_vsi; dev_dbg(ice_pf_to_dev(pf), - "successfully created channel: VSI %pK\n", ch->ch_vsi); + "successfully created channel: VSI %p\n", ch->ch_vsi); } return 0; diff --git a/drivers/net/ethernet/intel/ice/ice_trace.h b/drivers/net/ethernet/intel/ice/ice_trace.h index 07aab6e130cd..4f35ef8d6b29 100644 --- a/drivers/net/ethernet/intel/ice/ice_trace.h +++ b/drivers/net/ethernet/intel/ice/ice_trace.h @@ -130,7 +130,7 @@ DECLARE_EVENT_CLASS(ice_tx_template, __entry->buf = buf; __assign_str(devname);), - TP_printk("netdev: %s ring: %pK desc: %pK buf %pK", __get_str(devname), + TP_printk("netdev: %s ring: %p desc: %p buf %p", __get_str(devname), __entry->ring, __entry->desc, __entry->buf) ); @@ -158,7 +158,7 @@ DECLARE_EVENT_CLASS(ice_rx_template, __entry->desc = desc; __assign_str(devname);), - TP_printk("netdev: %s ring: %pK desc: %pK", __get_str(devname), + TP_printk("netdev: %s ring: %p desc: %p", __get_str(devname), __entry->ring, __entry->desc) ); DEFINE_EVENT(ice_rx_template, ice_clean_rx_irq, @@ -182,7 +182,7 @@ DECLARE_EVENT_CLASS(ice_rx_indicate_template, __entry->skb = skb; __assign_str(devname);), - TP_printk("netdev: %s ring: %pK desc: %pK skb %pK", __get_str(devname), + TP_printk("netdev: %s ring: %p desc: %p skb %p", __get_str(devname), __entry->ring, __entry->desc, __entry->skb) ); @@ -205,7 +205,7 @@ DECLARE_EVENT_CLASS(ice_xmit_template, __entry->skb = skb; __assign_str(devname);), - TP_printk("netdev: %s skb: %pK ring: %pK", __get_str(devname), + TP_printk("netdev: %s skb: %p ring: %p", __get_str(devname), __entry->skb, __entry->ring) ); @@ -228,7 +228,7 @@ DECLARE_EVENT_CLASS(ice_tx_tstamp_template, TP_fast_assign(__entry->skb = skb; __entry->idx = idx;), - TP_printk("skb %pK idx %d", + TP_printk("skb %p idx %d", __entry->skb, __entry->idx) ); #define DEFINE_TX_TSTAMP_OP_EVENT(name) \ From e2068f74b97653356ad7d6ce456db1f5b7fb575e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Thomas=20Wei=C3=9Fschuh?= Date: Mon, 11 Aug 2025 11:43:19 +0200 Subject: [PATCH 0070/1536] net/mlx5: Don't use %pK through tracepoints MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit In the past %pK was preferable to %p as it would not leak raw pointer values into the kernel log. Since commit ad67b74d2469 ("printk: hash addresses printed with %p") the regular %p has been improved to avoid this issue. Furthermore, restricted pointers ("%pK") were never meant to be used through tracepoints. They can still unintentionally leak raw pointers or acquire sleeping locks in atomic contexts. Switch to the regular pointer formatting which is safer and easier to reason about. There are still a few users of %pK left, but these use it through seq_file, for which its usage is safe. Signed-off-by: Thomas Weißschuh Reviewed-by: Aleksandr Loktionov Reviewed-by: Tariq Toukan Reviewed-by: Simon Horman Reviewed-by: Paul Menzel Reviewed-by: Jacob Keller Link: https://patch.msgid.link/20250811-restricted-pointers-net-v5-2-2e2fdc7d3f2c@linutronix.de Signed-off-by: Jakub Kicinski --- .../ethernet/mellanox/mlx5/core/sf/dev/diag/dev_tracepoint.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/diag/dev_tracepoint.h b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/diag/dev_tracepoint.h index 0537de86f981..9b0f44253f33 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/diag/dev_tracepoint.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/diag/dev_tracepoint.h @@ -28,7 +28,7 @@ DECLARE_EVENT_CLASS(mlx5_sf_dev_template, __entry->hw_fn_id = sfdev->fn_id; __entry->sfnum = sfdev->sfnum; ), - TP_printk("(%s) sfdev=%pK aux_id=%d hw_id=0x%x sfnum=%u\n", + TP_printk("(%s) sfdev=%p aux_id=%d hw_id=0x%x sfnum=%u\n", __get_str(devname), __entry->sfdev, __entry->aux_id, __entry->hw_fn_id, __entry->sfnum) From 40e819747b45fd254ab99cfe39ba8787d6bbd33d Mon Sep 17 00:00:00 2001 From: Brian Masney Date: Sun, 10 Aug 2025 18:24:14 -0400 Subject: [PATCH 0071/1536] net: cadence: macb: convert from round_rate() to determine_rate() The round_rate() clk ops is deprecated, so migrate this driver from round_rate() to determine_rate(). Signed-off-by: Brian Masney Reviewed-by: Jacob Keller Link: https://patch.msgid.link/20250810-net-round-rate-v1-1-dbb237c9fe5c@redhat.com Signed-off-by: Paolo Abeni --- drivers/net/ethernet/cadence/macb_main.c | 57 ++++++++++++++---------- 1 file changed, 33 insertions(+), 24 deletions(-) diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c index ce95fad8cedd..ce55a1f59b50 100644 --- a/drivers/net/ethernet/cadence/macb_main.c +++ b/drivers/net/ethernet/cadence/macb_main.c @@ -4822,36 +4822,45 @@ static unsigned long fu540_macb_tx_recalc_rate(struct clk_hw *hw, return mgmt->rate; } -static long fu540_macb_tx_round_rate(struct clk_hw *hw, unsigned long rate, - unsigned long *parent_rate) +static int fu540_macb_tx_determine_rate(struct clk_hw *hw, + struct clk_rate_request *req) { - if (WARN_ON(rate < 2500000)) - return 2500000; - else if (rate == 2500000) - return 2500000; - else if (WARN_ON(rate < 13750000)) - return 2500000; - else if (WARN_ON(rate < 25000000)) - return 25000000; - else if (rate == 25000000) - return 25000000; - else if (WARN_ON(rate < 75000000)) - return 25000000; - else if (WARN_ON(rate < 125000000)) - return 125000000; - else if (rate == 125000000) - return 125000000; + if (WARN_ON(req->rate < 2500000)) + req->rate = 2500000; + else if (req->rate == 2500000) + req->rate = 2500000; + else if (WARN_ON(req->rate < 13750000)) + req->rate = 2500000; + else if (WARN_ON(req->rate < 25000000)) + req->rate = 25000000; + else if (req->rate == 25000000) + req->rate = 25000000; + else if (WARN_ON(req->rate < 75000000)) + req->rate = 25000000; + else if (WARN_ON(req->rate < 125000000)) + req->rate = 125000000; + else if (req->rate == 125000000) + req->rate = 125000000; + else if (WARN_ON(req->rate > 125000000)) + req->rate = 125000000; + else + req->rate = 125000000; - WARN_ON(rate > 125000000); - - return 125000000; + return 0; } static int fu540_macb_tx_set_rate(struct clk_hw *hw, unsigned long rate, unsigned long parent_rate) { - rate = fu540_macb_tx_round_rate(hw, rate, &parent_rate); - if (rate != 125000000) + struct clk_rate_request req; + int ret; + + clk_hw_init_rate_request(hw, &req, rate); + ret = fu540_macb_tx_determine_rate(hw, &req); + if (ret != 0) + return ret; + + if (req.rate != 125000000) iowrite32(1, mgmt->reg); else iowrite32(0, mgmt->reg); @@ -4862,7 +4871,7 @@ static int fu540_macb_tx_set_rate(struct clk_hw *hw, unsigned long rate, static const struct clk_ops fu540_c000_ops = { .recalc_rate = fu540_macb_tx_recalc_rate, - .round_rate = fu540_macb_tx_round_rate, + .determine_rate = fu540_macb_tx_determine_rate, .set_rate = fu540_macb_tx_set_rate, }; From f22cc6f766f84496b260347d4f0d92cf95f30699 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Mon, 11 Aug 2025 16:42:09 -0700 Subject: [PATCH 0072/1536] net: ethtool: support including Flow Label in the flow hash for RSS Some modern NICs support including the IPv6 Flow Label in the flow hash for RSS queue selection. This is outside the old "Microsoft spec", but was included in the OCP NIC spec: [ ] RSS include flow label in the hash (configurable) https://www.opencompute.org/w/index.php?title=Core_Offloads#Receive_Side_Scaling RSS Flow Label hashing allows TCP Protective Load Balancing (PLB) to recover from receiver congestion / overload. Rx CPU/queue hotspots are relatively common for data ingest workloads, and so far we had to try to detect the condition at the RPC layer and reopen the connection. PLB lets us change the Flow Label and therefore Rx CPU on RTO, with minimal packet reordering. PLB reaction times are much faster, and can happen at any point in the connection, not just at RPC boundaries. Due to the nature of host processing (relatively long queues, other kernel subsystems masking IRQs for 100s of msecs) the risk of reordering within the host is higher than in the network. But for applications which need it - it is far preferable to potentially persistent overload of subset of queues. It is expected that the hash communicated to the host may change if the Flow Label changes. This may be surprising to some host software, but I don't expect the devices can compute two Toeplitz hashes, one with the Flow Label for queue selection and one without for the rx hash communicated to the host. Besides, changing the hash may potentially help to change the path thru host queues. User can disable NETIF_F_RXHASH if they require a stable flow hash. The name RXH_IP6_FL was chosen based on what we call Flow Label variables in IPv6 processing (fl). I prefer fl_lbl but that appears to be an fbnic-only spelling. We could spell out RXH_IP6_FLOW_LABEL but existing RXH_ defines are a lot more terse. Willem notes [1] that Flow Label is defined as identifying the flow and therefore including both the flow label _and_ the L4 header fields is not generally necessary. But it should not hurt so it's not explicitly prevented if the driver supports hashing on both at the same time. Link: https://lore.kernel.org/68483433b45e2_3cd66f29440@willemb.c.googlers.com.notmuch [1] Signed-off-by: Jakub Kicinski Reviewed-by: Joe Damato Link: https://patch.msgid.link/20250811234212.580748-2-kuba@kernel.org Signed-off-by: Paolo Abeni --- Documentation/netlink/specs/ethtool.yaml | 3 +++ include/uapi/linux/ethtool.h | 1 + net/ethtool/ioctl.c | 25 ++++++++++++++++++++++ net/ethtool/rss.c | 27 ++++++++++++------------ 4 files changed, 43 insertions(+), 13 deletions(-) diff --git a/Documentation/netlink/specs/ethtool.yaml b/Documentation/netlink/specs/ethtool.yaml index 1bc1bd7d33c2..7a7594713f1f 100644 --- a/Documentation/netlink/specs/ethtool.yaml +++ b/Documentation/netlink/specs/ethtool.yaml @@ -204,6 +204,9 @@ definitions: doc: dst port in case of TCP/UDP/SCTP - name: gtp-teid + - + name: ip6-fl + doc: IPv6 Flow Label - name: discard value: 31 diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h index 9e9afdd1238a..8bd5ea5469d9 100644 --- a/include/uapi/linux/ethtool.h +++ b/include/uapi/linux/ethtool.h @@ -2380,6 +2380,7 @@ enum { #define RXH_L4_B_0_1 (1 << 6) /* src port in case of TCP/UDP/SCTP */ #define RXH_L4_B_2_3 (1 << 7) /* dst port in case of TCP/UDP/SCTP */ #define RXH_GTP_TEID (1 << 8) /* teid in case of GTP */ +#define RXH_IP6_FL (1 << 9) /* IPv6 flow label */ #define RXH_DISCARD (1 << 31) #define RX_CLS_FLOW_DISC 0xffffffffffffffffULL diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c index 43a7854e784e..0b2a4d0573b3 100644 --- a/net/ethtool/ioctl.c +++ b/net/ethtool/ioctl.c @@ -1014,6 +1014,28 @@ static bool flow_type_hashable(u32 flow_type) return false; } +static bool flow_type_v6(u32 flow_type) +{ + switch (flow_type) { + case TCP_V6_FLOW: + case UDP_V6_FLOW: + case SCTP_V6_FLOW: + case AH_ESP_V6_FLOW: + case AH_V6_FLOW: + case ESP_V6_FLOW: + case IPV6_FLOW: + case GTPU_V6_FLOW: + case GTPC_V6_FLOW: + case GTPC_TEID_V6_FLOW: + case GTPU_EH_V6_FLOW: + case GTPU_UL_V6_FLOW: + case GTPU_DL_V6_FLOW: + return true; + } + + return false; +} + /* When adding a new type, update the assert and, if it's hashable, add it to * the flow_type_hashable switch case. */ @@ -1077,6 +1099,9 @@ ethtool_set_rxfh_fields(struct net_device *dev, u32 cmd, void __user *useraddr) if (rc) return rc; + if (info.data & RXH_IP6_FL && !flow_type_v6(info.flow_type)) + return -EINVAL; + if (info.flow_type & FLOW_RSS && info.rss_context && !ops->rxfh_per_ctx_fields) return -EINVAL; diff --git a/net/ethtool/rss.c b/net/ethtool/rss.c index 992e98abe9dd..202d95e8bf3e 100644 --- a/net/ethtool/rss.c +++ b/net/ethtool/rss.c @@ -536,35 +536,36 @@ void ethtool_rss_notify(struct net_device *dev, u32 type, u32 rss_context) #define RFH_MASK (RXH_L2DA | RXH_VLAN | RXH_IP_SRC | RXH_IP_DST | \ RXH_L3_PROTO | RXH_L4_B_0_1 | RXH_L4_B_2_3 | \ RXH_GTP_TEID | RXH_DISCARD) +#define RFH_MASKv6 (RFH_MASK | RXH_IP6_FL) static const struct nla_policy ethnl_rss_flows_policy[] = { [ETHTOOL_A_FLOW_ETHER] = NLA_POLICY_MASK(NLA_UINT, RFH_MASK), [ETHTOOL_A_FLOW_IP4] = NLA_POLICY_MASK(NLA_UINT, RFH_MASK), - [ETHTOOL_A_FLOW_IP6] = NLA_POLICY_MASK(NLA_UINT, RFH_MASK), + [ETHTOOL_A_FLOW_IP6] = NLA_POLICY_MASK(NLA_UINT, RFH_MASKv6), [ETHTOOL_A_FLOW_TCP4] = NLA_POLICY_MASK(NLA_UINT, RFH_MASK), [ETHTOOL_A_FLOW_UDP4] = NLA_POLICY_MASK(NLA_UINT, RFH_MASK), [ETHTOOL_A_FLOW_SCTP4] = NLA_POLICY_MASK(NLA_UINT, RFH_MASK), [ETHTOOL_A_FLOW_AH_ESP4] = NLA_POLICY_MASK(NLA_UINT, RFH_MASK), - [ETHTOOL_A_FLOW_TCP6] = NLA_POLICY_MASK(NLA_UINT, RFH_MASK), - [ETHTOOL_A_FLOW_UDP6] = NLA_POLICY_MASK(NLA_UINT, RFH_MASK), - [ETHTOOL_A_FLOW_SCTP6] = NLA_POLICY_MASK(NLA_UINT, RFH_MASK), - [ETHTOOL_A_FLOW_AH_ESP6] = NLA_POLICY_MASK(NLA_UINT, RFH_MASK), + [ETHTOOL_A_FLOW_TCP6] = NLA_POLICY_MASK(NLA_UINT, RFH_MASKv6), + [ETHTOOL_A_FLOW_UDP6] = NLA_POLICY_MASK(NLA_UINT, RFH_MASKv6), + [ETHTOOL_A_FLOW_SCTP6] = NLA_POLICY_MASK(NLA_UINT, RFH_MASKv6), + [ETHTOOL_A_FLOW_AH_ESP6] = NLA_POLICY_MASK(NLA_UINT, RFH_MASKv6), [ETHTOOL_A_FLOW_AH4] = NLA_POLICY_MASK(NLA_UINT, RFH_MASK), [ETHTOOL_A_FLOW_ESP4] = NLA_POLICY_MASK(NLA_UINT, RFH_MASK), - [ETHTOOL_A_FLOW_AH6] = NLA_POLICY_MASK(NLA_UINT, RFH_MASK), - [ETHTOOL_A_FLOW_ESP6] = NLA_POLICY_MASK(NLA_UINT, RFH_MASK), + [ETHTOOL_A_FLOW_AH6] = NLA_POLICY_MASK(NLA_UINT, RFH_MASKv6), + [ETHTOOL_A_FLOW_ESP6] = NLA_POLICY_MASK(NLA_UINT, RFH_MASKv6), [ETHTOOL_A_FLOW_GTPU4] = NLA_POLICY_MASK(NLA_UINT, RFH_MASK), - [ETHTOOL_A_FLOW_GTPU6] = NLA_POLICY_MASK(NLA_UINT, RFH_MASK), + [ETHTOOL_A_FLOW_GTPU6] = NLA_POLICY_MASK(NLA_UINT, RFH_MASKv6), [ETHTOOL_A_FLOW_GTPC4] = NLA_POLICY_MASK(NLA_UINT, RFH_MASK), - [ETHTOOL_A_FLOW_GTPC6] = NLA_POLICY_MASK(NLA_UINT, RFH_MASK), + [ETHTOOL_A_FLOW_GTPC6] = NLA_POLICY_MASK(NLA_UINT, RFH_MASKv6), [ETHTOOL_A_FLOW_GTPC_TEID4] = NLA_POLICY_MASK(NLA_UINT, RFH_MASK), - [ETHTOOL_A_FLOW_GTPC_TEID6] = NLA_POLICY_MASK(NLA_UINT, RFH_MASK), + [ETHTOOL_A_FLOW_GTPC_TEID6] = NLA_POLICY_MASK(NLA_UINT, RFH_MASKv6), [ETHTOOL_A_FLOW_GTPU_EH4] = NLA_POLICY_MASK(NLA_UINT, RFH_MASK), - [ETHTOOL_A_FLOW_GTPU_EH6] = NLA_POLICY_MASK(NLA_UINT, RFH_MASK), + [ETHTOOL_A_FLOW_GTPU_EH6] = NLA_POLICY_MASK(NLA_UINT, RFH_MASKv6), [ETHTOOL_A_FLOW_GTPU_UL4] = NLA_POLICY_MASK(NLA_UINT, RFH_MASK), - [ETHTOOL_A_FLOW_GTPU_UL6] = NLA_POLICY_MASK(NLA_UINT, RFH_MASK), + [ETHTOOL_A_FLOW_GTPU_UL6] = NLA_POLICY_MASK(NLA_UINT, RFH_MASKv6), [ETHTOOL_A_FLOW_GTPU_DL4] = NLA_POLICY_MASK(NLA_UINT, RFH_MASK), - [ETHTOOL_A_FLOW_GTPU_DL6] = NLA_POLICY_MASK(NLA_UINT, RFH_MASK), + [ETHTOOL_A_FLOW_GTPU_DL6] = NLA_POLICY_MASK(NLA_UINT, RFH_MASKv6), }; const struct nla_policy ethnl_rss_set_policy[ETHTOOL_A_RSS_FLOW_HASH + 1] = { From 0afbfdc0f64a8525624b317a99050082be08ceb2 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Mon, 11 Aug 2025 16:42:10 -0700 Subject: [PATCH 0073/1536] eth: fbnic: support RSS on IPv6 Flow Label Support IPv6 Flow Label hashing. Use both inner and outer IPv6 header's Flow Label if both headers are detected. Flow Label is unlike normal header fields, by enabling it user accepts the unstable hash and possible reordering. Because of that I think it's reasonable to hash over all Flow Labels we can find, even tho we don't hash over all L3 addresses. Signed-off-by: Jakub Kicinski Link: https://patch.msgid.link/20250811234212.580748-3-kuba@kernel.org Signed-off-by: Paolo Abeni --- drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c | 2 +- drivers/net/ethernet/meta/fbnic/fbnic_rpc.c | 2 ++ 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c b/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c index dc7ba8d5fc43..461c8661eb93 100644 --- a/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c +++ b/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c @@ -1310,7 +1310,7 @@ fbnic_get_rss_hash_opts(struct net_device *netdev, #define FBNIC_L2_HASH_OPTIONS \ (RXH_L2DA | RXH_DISCARD) #define FBNIC_L3_HASH_OPTIONS \ - (FBNIC_L2_HASH_OPTIONS | RXH_IP_SRC | RXH_IP_DST) + (FBNIC_L2_HASH_OPTIONS | RXH_IP_SRC | RXH_IP_DST | RXH_IP6_FL) #define FBNIC_L4_HASH_OPTIONS \ (FBNIC_L3_HASH_OPTIONS | RXH_L4_B_0_1 | RXH_L4_B_2_3) diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_rpc.c b/drivers/net/ethernet/meta/fbnic/fbnic_rpc.c index 8ff07b5562e3..a4dc1024c0c2 100644 --- a/drivers/net/ethernet/meta/fbnic/fbnic_rpc.c +++ b/drivers/net/ethernet/meta/fbnic/fbnic_rpc.c @@ -71,6 +71,8 @@ u16 fbnic_flow_hash_2_rss_en_mask(struct fbnic_net *fbn, int flow_type) rss_en_mask |= FBNIC_FH_2_RSSEM_BIT(IP_DST, IP_DST, flow_hash); rss_en_mask |= FBNIC_FH_2_RSSEM_BIT(L4_B_0_1, L4_SRC, flow_hash); rss_en_mask |= FBNIC_FH_2_RSSEM_BIT(L4_B_2_3, L4_DST, flow_hash); + rss_en_mask |= FBNIC_FH_2_RSSEM_BIT(IP6_FL, OV6_FL_LBL, flow_hash); + rss_en_mask |= FBNIC_FH_2_RSSEM_BIT(IP6_FL, IV6_FL_LBL, flow_hash); return rss_en_mask; } From 46c0faa46378ee54404383629641857476dc77f7 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Mon, 11 Aug 2025 16:42:11 -0700 Subject: [PATCH 0074/1536] eth: bnxt: support RSS on IPv6 Flow Label It appears that the bnxt FW API has the relevant bit for Flow Label hashing. Plumb in the support. Obey the capability bit. Signed-off-by: Jakub Kicinski Reviewed-by: Michael Chan Link: https://patch.msgid.link/20250811234212.580748-4-kuba@kernel.org Signed-off-by: Paolo Abeni --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 ++ drivers/net/ethernet/broadcom/bnxt/bnxt.h | 1 + .../net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 20 +++++++++++++++---- 3 files changed, 19 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index 5578ddcb465d..ab74c71c4557 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -6957,6 +6957,8 @@ static int bnxt_hwrm_vnic_qcaps(struct bnxt *bp) bp->rss_cap |= BNXT_RSS_CAP_ESP_V4_RSS_CAP; if (flags & VNIC_QCAPS_RESP_FLAGS_RSS_IPSEC_ESP_SPI_IPV6_CAP) bp->rss_cap |= BNXT_RSS_CAP_ESP_V6_RSS_CAP; + if (flags & VNIC_QCAPS_RESP_FLAGS_RSS_IPV6_FLOW_LABEL_CAP) + bp->rss_cap |= BNXT_RSS_CAP_IPV6_FLOW_LABEL_RSS_CAP; if (flags & VNIC_QCAPS_RESP_FLAGS_RE_FLUSH_CAP) bp->fw_cap |= BNXT_FW_CAP_VNIC_RE_FLUSH; } diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h index fda0d3cc6227..40ae34923511 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h @@ -2407,6 +2407,7 @@ struct bnxt { #define BNXT_RSS_CAP_ESP_V4_RSS_CAP BIT(6) #define BNXT_RSS_CAP_ESP_V6_RSS_CAP BIT(7) #define BNXT_RSS_CAP_MULTI_RSS_CTX BIT(8) +#define BNXT_RSS_CAP_IPV6_FLOW_LABEL_RSS_CAP BIT(9) u8 rss_hash_key[HW_HASH_KEY_SIZE]; u8 rss_hash_key_valid:1; diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c index 1b37612b1c01..68a4ee9f69b1 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c @@ -1584,6 +1584,8 @@ static u64 get_ethtool_ipv6_rss(struct bnxt *bp) { if (bp->rss_hash_cfg & VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6) return RXH_IP_SRC | RXH_IP_DST; + if (bp->rss_hash_cfg & VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6_FLOW_LABEL) + return RXH_IP_SRC | RXH_IP_DST | RXH_IP6_FL; return 0; } @@ -1662,13 +1664,18 @@ static int bnxt_set_rxfh_fields(struct net_device *dev, if (cmd->data == RXH_4TUPLE) tuple = 4; - else if (cmd->data == RXH_2TUPLE) + else if (cmd->data == RXH_2TUPLE || + cmd->data == (RXH_2TUPLE | RXH_IP6_FL)) tuple = 2; else if (!cmd->data) tuple = 0; else return -EINVAL; + if (cmd->data & RXH_IP6_FL && + !(bp->rss_cap & BNXT_RSS_CAP_IPV6_FLOW_LABEL_RSS_CAP)) + return -EINVAL; + if (cmd->flow_type == TCP_V4_FLOW) { rss_hash_cfg &= ~VNIC_RSS_CFG_REQ_HASH_TYPE_TCP_IPV4; if (tuple == 4) @@ -1732,10 +1739,15 @@ static int bnxt_set_rxfh_fields(struct net_device *dev, case AH_V6_FLOW: case ESP_V6_FLOW: case IPV6_FLOW: - if (tuple == 2) + rss_hash_cfg &= ~(VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6 | + VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6_FLOW_LABEL); + if (!tuple) + break; + if (cmd->data & RXH_IP6_FL) + rss_hash_cfg |= + VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6_FLOW_LABEL; + else if (tuple == 2) rss_hash_cfg |= VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6; - else if (!tuple) - rss_hash_cfg &= ~VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6; break; } From 26dbe030ff08930bb243af3c66dd3b7e7bb8406d Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Mon, 11 Aug 2025 16:42:12 -0700 Subject: [PATCH 0075/1536] selftests: drv-net: add test for RSS on flow label Add a simple test for checking that RSS on flow label works, and that its rejected for IPv4 flows. # ./tools/testing/selftests/drivers/net/hw/rss_flow_label.py TAP version 13 1..2 ok 1 rss_flow_label.test_rss_flow_label ok 2 rss_flow_label.test_rss_flow_label_6only # Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0 Reviewed-by: Willem de Bruijn Signed-off-by: Jakub Kicinski Reviewed-by: Joe Damato Link: https://patch.msgid.link/20250811234212.580748-5-kuba@kernel.org Signed-off-by: Paolo Abeni --- .../testing/selftests/drivers/net/hw/Makefile | 1 + .../drivers/net/hw/rss_flow_label.py | 167 ++++++++++++++++++ 2 files changed, 168 insertions(+) create mode 100755 tools/testing/selftests/drivers/net/hw/rss_flow_label.py diff --git a/tools/testing/selftests/drivers/net/hw/Makefile b/tools/testing/selftests/drivers/net/hw/Makefile index fdc97355588c..5159fd34cb33 100644 --- a/tools/testing/selftests/drivers/net/hw/Makefile +++ b/tools/testing/selftests/drivers/net/hw/Makefile @@ -18,6 +18,7 @@ TEST_PROGS = \ pp_alloc_fail.py \ rss_api.py \ rss_ctx.py \ + rss_flow_label.py \ rss_input_xfrm.py \ tso.py \ xsk_reconfig.py \ diff --git a/tools/testing/selftests/drivers/net/hw/rss_flow_label.py b/tools/testing/selftests/drivers/net/hw/rss_flow_label.py new file mode 100755 index 000000000000..6fa95fe27c47 --- /dev/null +++ b/tools/testing/selftests/drivers/net/hw/rss_flow_label.py @@ -0,0 +1,167 @@ +#!/usr/bin/env python3 +# SPDX-License-Identifier: GPL-2.0 + +""" +Tests for RSS hashing on IPv6 Flow Label. +""" + +import glob +import os +import socket +from lib.py import CmdExitFailure +from lib.py import ksft_run, ksft_exit, ksft_eq, ksft_ge, ksft_in, \ + ksft_not_in, ksft_raises, KsftSkipEx +from lib.py import bkg, cmd, defer, fd_read_timeout, rand_port +from lib.py import NetDrvEpEnv + + +def _check_system(cfg): + if not hasattr(socket, "SO_INCOMING_CPU"): + raise KsftSkipEx("socket.SO_INCOMING_CPU was added in Python 3.11") + + qcnt = len(glob.glob(f"/sys/class/net/{cfg.ifname}/queues/rx-*")) + if qcnt < 2: + raise KsftSkipEx(f"Local has only {qcnt} queues") + + for f in [f"/sys/class/net/{cfg.ifname}/queues/rx-0/rps_flow_cnt", + f"/sys/class/net/{cfg.ifname}/queues/rx-0/rps_cpus"]: + try: + with open(f, 'r') as fp: + setting = fp.read().strip() + # CPU mask will be zeros and commas + if setting.replace("0", "").replace(",", ""): + raise KsftSkipEx(f"RPS/RFS is configured: {f}: {setting}") + except FileNotFoundError: + pass + + # 1 is the default, if someone changed it we probably shouldn"t mess with it + af = cmd("cat /proc/sys/net/ipv6/auto_flowlabels", host=cfg.remote).stdout + if af.strip() != "1": + raise KsftSkipEx("Remote does not have auto_flowlabels enabled") + + +def _ethtool_get_cfg(cfg, fl_type): + descr = cmd(f"ethtool -n {cfg.ifname} rx-flow-hash {fl_type}").stdout + + converter = { + "IP SA": "s", + "IP DA": "d", + "L3 proto": "t", + "L4 bytes 0 & 1 [TCP/UDP src port]": "f", + "L4 bytes 2 & 3 [TCP/UDP dst port]": "n", + "IPv6 Flow Label": "l", + } + + ret = "" + for line in descr.split("\n")[1:-2]: + # if this raises we probably need to add more keys to converter above + ret += converter[line] + return ret + + +def _traffic(cfg, one_sock, one_cpu): + local_port = rand_port(socket.SOCK_DGRAM) + remote_port = rand_port(socket.SOCK_DGRAM) + + sock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) + sock.bind(("", local_port)) + sock.connect((cfg.remote_addr_v["6"], 0)) + if one_sock: + send = f"exec 5<>/dev/udp/{cfg.addr_v['6']}/{local_port}; " \ + "for i in `seq 20`; do echo a >&5; sleep 0.02; done; exec 5>&-" + else: + send = "for i in `seq 20`; do echo a | socat -t0.02 - UDP6:" \ + f"[{cfg.addr_v['6']}]:{local_port},sourceport={remote_port}; done" + + cpus = set() + with bkg(send, shell=True, host=cfg.remote, exit_wait=True): + for _ in range(20): + fd_read_timeout(sock.fileno(), 1) + cpu = sock.getsockopt(socket.SOL_SOCKET, socket.SO_INCOMING_CPU) + cpus.add(cpu) + + if one_cpu: + ksft_eq(len(cpus), 1, + f"{one_sock=} - expected one CPU, got traffic on: {cpus=}") + else: + ksft_ge(len(cpus), 2, + f"{one_sock=} - expected many CPUs, got traffic on: {cpus=}") + + +def test_rss_flow_label(cfg): + """ + Test hashing on IPv6 flow label. Send traffic over a single socket + and over multiple sockets. Depend on the remote having auto-label + enabled so that it randomizes the label per socket. + """ + + cfg.require_ipver("6") + cfg.require_cmd("socat", remote=True) + _check_system(cfg) + + # Enable flow label hashing for UDP6 + initial = _ethtool_get_cfg(cfg, "udp6") + no_lbl = initial.replace("l", "") + if "l" not in initial: + try: + cmd(f"ethtool -N {cfg.ifname} rx-flow-hash udp6 l{no_lbl}") + except CmdExitFailure as exc: + raise KsftSkipEx("Device doesn't support Flow Label for UDP6") from exc + + defer(cmd, f"ethtool -N {cfg.ifname} rx-flow-hash udp6 {initial}") + + _traffic(cfg, one_sock=True, one_cpu=True) + _traffic(cfg, one_sock=False, one_cpu=False) + + # Disable it, we should see no hashing (reset was already defer()ed) + cmd(f"ethtool -N {cfg.ifname} rx-flow-hash udp6 {no_lbl}") + + _traffic(cfg, one_sock=False, one_cpu=True) + + +def _check_v4_flow_types(cfg): + for fl_type in ["tcp4", "udp4", "ah4", "esp4", "sctp4"]: + try: + cur = cmd(f"ethtool -n {cfg.ifname} rx-flow-hash {fl_type}").stdout + ksft_not_in("Flow Label", cur, + comment=f"{fl_type=} has Flow Label:" + cur) + except CmdExitFailure: + # Probably does not support this flow type + pass + + +def test_rss_flow_label_6only(cfg): + """ + Test interactions with IPv4 flow types. It should not be possible to set + IPv6 Flow Label hashing for an IPv4 flow type. The Flow Label should also + not appear in the IPv4 "current config". + """ + + with ksft_raises(CmdExitFailure) as cm: + cmd(f"ethtool -N {cfg.ifname} rx-flow-hash tcp4 sdfnl") + ksft_in("Invalid argument", cm.exception.cmd.stderr) + + _check_v4_flow_types(cfg) + + # Try to enable Flow Labels and check again, in case it leaks thru + initial = _ethtool_get_cfg(cfg, "udp6") + changed = initial.replace("l", "") if "l" in initial else initial + "l" + + cmd(f"ethtool -N {cfg.ifname} rx-flow-hash udp6 {changed}") + restore = defer(cmd, f"ethtool -N {cfg.ifname} rx-flow-hash udp6 {initial}") + + _check_v4_flow_types(cfg) + restore.exec() + _check_v4_flow_types(cfg) + + +def main() -> None: + with NetDrvEpEnv(__file__, nsim_test=False) as cfg: + ksft_run([test_rss_flow_label, + test_rss_flow_label_6only], + args=(cfg, )) + ksft_exit() + + +if __name__ == "__main__": + main() From ba7fad179699dbbd3dfdc9ca5594d7f3c425a2a2 Mon Sep 17 00:00:00 2001 From: Dave Ertman Date: Mon, 16 Jun 2025 13:03:16 +0200 Subject: [PATCH 0076/1536] ice: Remove casts on void pointers in LAG code This series will be touching on the LAG code in the ice driver, to prevent moving or propagating casting on void pointers, clean them up first. This also allows for moving the variable initialization into the variable declaration. Reviewed-by: Aleksandr Loktionov Reviewed-by: Przemek Kitszel Signed-off-by: Dave Ertman Tested-by: Sujai Buvaneswaran Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/ice/ice_lag.c | 24 ++++++++---------------- 1 file changed, 8 insertions(+), 16 deletions(-) diff --git a/drivers/net/ethernet/intel/ice/ice_lag.c b/drivers/net/ethernet/intel/ice/ice_lag.c index b1129da72139..96b11d8f1422 100644 --- a/drivers/net/ethernet/intel/ice/ice_lag.c +++ b/drivers/net/ethernet/intel/ice/ice_lag.c @@ -321,7 +321,7 @@ ice_lag_cfg_drop_fltr(struct ice_lag *lag, bool add) static void ice_lag_cfg_pf_fltrs(struct ice_lag *lag, void *ptr) { - struct netdev_notifier_bonding_info *info; + struct netdev_notifier_bonding_info *info = ptr; struct netdev_bonding_info *bonding_info; struct net_device *event_netdev; struct device *dev; @@ -331,7 +331,6 @@ ice_lag_cfg_pf_fltrs(struct ice_lag *lag, void *ptr) if (event_netdev != lag->netdev) return; - info = (struct netdev_notifier_bonding_info *)ptr; bonding_info = &info->bonding_info; dev = ice_pf_to_dev(lag->pf); @@ -873,13 +872,12 @@ void ice_lag_complete_vf_reset(struct ice_lag *lag, u8 act_prt) */ static void ice_lag_info_event(struct ice_lag *lag, void *ptr) { - struct netdev_notifier_bonding_info *info; + struct netdev_notifier_bonding_info *info = ptr; struct netdev_bonding_info *bonding_info; struct net_device *event_netdev; const char *lag_netdev_name; event_netdev = netdev_notifier_info_to_dev(ptr); - info = ptr; lag_netdev_name = netdev_name(lag->netdev); bonding_info = &info->bonding_info; @@ -1344,11 +1342,10 @@ static void ice_lag_init_feature_support_flag(struct ice_pf *pf) */ static void ice_lag_changeupper_event(struct ice_lag *lag, void *ptr) { - struct netdev_notifier_changeupper_info *info; + struct netdev_notifier_changeupper_info *info = ptr; struct ice_lag *primary_lag; struct net_device *netdev; - info = ptr; netdev = netdev_notifier_info_to_dev(ptr); /* not for this netdev */ @@ -1408,7 +1405,7 @@ static void ice_lag_changeupper_event(struct ice_lag *lag, void *ptr) */ static void ice_lag_monitor_link(struct ice_lag *lag, void *ptr) { - struct netdev_notifier_changeupper_info *info; + struct netdev_notifier_changeupper_info *info = ptr; struct ice_hw *prim_hw, *active_hw; struct net_device *event_netdev; struct ice_pf *pf; @@ -1425,7 +1422,6 @@ static void ice_lag_monitor_link(struct ice_lag *lag, void *ptr) prim_hw = &pf->hw; prim_port = prim_hw->port_info->lport; - info = (struct netdev_notifier_changeupper_info *)ptr; if (info->upper_dev != lag->upper_netdev) return; @@ -1454,8 +1450,8 @@ static void ice_lag_monitor_link(struct ice_lag *lag, void *ptr) */ static void ice_lag_monitor_active(struct ice_lag *lag, void *ptr) { + struct netdev_notifier_bonding_info *info = ptr; struct net_device *event_netdev, *event_upper; - struct netdev_notifier_bonding_info *info; struct netdev_bonding_info *bonding_info; struct ice_netdev_priv *event_np; struct ice_pf *pf, *event_pf; @@ -1480,7 +1476,6 @@ static void ice_lag_monitor_active(struct ice_lag *lag, void *ptr) event_port = event_pf->hw.port_info->lport; prim_port = pf->hw.port_info->lport; - info = (struct netdev_notifier_bonding_info *)ptr; bonding_info = &info->bonding_info; if (!bonding_info->slave.state) { @@ -1527,8 +1522,8 @@ static void ice_lag_monitor_active(struct ice_lag *lag, void *ptr) static bool ice_lag_chk_comp(struct ice_lag *lag, void *ptr) { + struct netdev_notifier_bonding_info *info = ptr; struct net_device *event_netdev, *event_upper; - struct netdev_notifier_bonding_info *info; struct netdev_bonding_info *bonding_info; struct list_head *tmp; struct device *dev; @@ -1554,7 +1549,6 @@ ice_lag_chk_comp(struct ice_lag *lag, void *ptr) return false; } - info = (struct netdev_notifier_bonding_info *)ptr; bonding_info = &info->bonding_info; lag->bond_mode = bonding_info->master.bond_mode; if (lag->bond_mode != BOND_MODE_ACTIVEBACKUP) { @@ -1664,10 +1658,9 @@ ice_lag_unregister(struct ice_lag *lag, struct net_device *event_netdev) static void ice_lag_monitor_rdma(struct ice_lag *lag, void *ptr) { - struct netdev_notifier_changeupper_info *info; + struct netdev_notifier_changeupper_info *info = ptr; struct net_device *netdev; - info = ptr; netdev = netdev_notifier_info_to_dev(ptr); if (netdev != lag->netdev) @@ -1837,9 +1830,8 @@ ice_lag_event_handler(struct notifier_block *notif_blk, unsigned long event, lag_work->lag = lag; lag_work->event = event; if (event == NETDEV_CHANGEUPPER) { - struct netdev_notifier_changeupper_info *info; + struct netdev_notifier_changeupper_info *info = ptr; - info = ptr; upper_netdev = info->upper_dev; } else { upper_netdev = netdev_master_upper_dev_get(netdev); From 5b35b83d0d759e8313e0dd921fb9b9ac947c2585 Mon Sep 17 00:00:00 2001 From: Dave Ertman Date: Mon, 16 Jun 2025 13:03:17 +0200 Subject: [PATCH 0077/1536] ice: replace u8 elements with bool where appropriate In preparation for the new LAG functionality implementation, there are a couple of existing LAG elements of the capabilities struct that should be bool instead of u8. Since we are adding a new element to this struct that should also be a bool, fix the existing LAG u8 in this patch and eliminate !! operators where possible. Reviewed-by: Przemek Kitszel Signed-off-by: Dave Ertman Tested-by: Sujai Buvaneswaran Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/ice/ice_common.c | 4 ++-- drivers/net/ethernet/intel/ice/ice_type.h | 5 +++-- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c index 003d60a4db21..209f42045fea 100644 --- a/drivers/net/ethernet/intel/ice/ice_common.c +++ b/drivers/net/ethernet/intel/ice/ice_common.c @@ -2418,10 +2418,10 @@ ice_parse_common_caps(struct ice_hw *hw, struct ice_hw_common_caps *caps, caps->reset_restrict_support); break; case LIBIE_AQC_CAPS_FW_LAG_SUPPORT: - caps->roce_lag = !!(number & LIBIE_AQC_BIT_ROCEV2_LAG); + caps->roce_lag = number & LIBIE_AQC_BIT_ROCEV2_LAG; ice_debug(hw, ICE_DBG_INIT, "%s: roce_lag = %u\n", prefix, caps->roce_lag); - caps->sriov_lag = !!(number & LIBIE_AQC_BIT_SRIOV_LAG); + caps->sriov_lag = number & LIBIE_AQC_BIT_SRIOV_LAG; ice_debug(hw, ICE_DBG_INIT, "%s: sriov_lag = %u\n", prefix, caps->sriov_lag); break; diff --git a/drivers/net/ethernet/intel/ice/ice_type.h b/drivers/net/ethernet/intel/ice/ice_type.h index 03c6c271865d..2b07d5e48847 100644 --- a/drivers/net/ethernet/intel/ice/ice_type.h +++ b/drivers/net/ethernet/intel/ice/ice_type.h @@ -293,8 +293,9 @@ struct ice_hw_common_caps { u8 dcb; u8 ieee_1588; u8 rdma; - u8 roce_lag; - u8 sriov_lag; + + bool roce_lag; + bool sriov_lag; bool nvm_update_pending_nvm; bool nvm_update_pending_orom; From a66b3b537d2133e8980e49d68b3573b5aec23699 Mon Sep 17 00:00:00 2001 From: Dave Ertman Date: Mon, 16 Jun 2025 13:03:18 +0200 Subject: [PATCH 0078/1536] ice: Add driver specific prefix to LAG defines A define in the LAG code is missing a driver specific prefix. Add a prefix to the define. Also shorten a defines name and move to a more logical place. Reviewed-by: Przemek Kitszel Signed-off-by: Dave Ertman Tested-by: Sujai Buvaneswaran Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/ice/ice_lag.c | 20 +++++++++----------- 1 file changed, 9 insertions(+), 11 deletions(-) diff --git a/drivers/net/ethernet/intel/ice/ice_lag.c b/drivers/net/ethernet/intel/ice/ice_lag.c index 96b11d8f1422..48efd8d27296 100644 --- a/drivers/net/ethernet/intel/ice/ice_lag.c +++ b/drivers/net/ethernet/intel/ice/ice_lag.c @@ -10,12 +10,14 @@ #define ICE_LAG_RES_SHARED BIT(14) #define ICE_LAG_RES_VALID BIT(15) -#define LACP_TRAIN_PKT_LEN 16 -static const u8 lacp_train_pkt[LACP_TRAIN_PKT_LEN] = { 0, 0, 0, 0, 0, 0, - 0, 0, 0, 0, 0, 0, - 0x88, 0x09, 0, 0 }; +#define ICE_TRAIN_PKT_LEN 16 +static const u8 lacp_train_pkt[ICE_TRAIN_PKT_LEN] = { 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, + 0x88, 0x09, 0, 0 }; #define ICE_RECIPE_LEN 64 +#define ICE_LAG_SRIOV_CP_RECIPE 10 + static const u8 ice_dflt_vsi_rcp[ICE_RECIPE_LEN] = { 0x05, 0, 0, 0, 0x20, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0x85, 0, 0x01, 0, 0, 0, 0xff, 0xff, 0x08, 0, 0, 0, 0, 0, 0, 0, @@ -766,9 +768,6 @@ void ice_lag_move_vf_nodes_cfg(struct ice_lag *lag, u8 src_prt, u8 dst_prt) ice_lag_destroy_netdev_list(lag, &ndlist); } -#define ICE_LAG_SRIOV_CP_RECIPE 10 -#define ICE_LAG_SRIOV_TRAIN_PKT_LEN 16 - /** * ice_lag_cfg_cp_fltr - configure filter for control packets * @lag: local interface's lag struct @@ -783,8 +782,7 @@ ice_lag_cfg_cp_fltr(struct ice_lag *lag, bool add) vsi = lag->pf->vsi[0]; - buf_len = ICE_SW_RULE_RX_TX_HDR_SIZE(s_rule, - ICE_LAG_SRIOV_TRAIN_PKT_LEN); + buf_len = ICE_SW_RULE_RX_TX_HDR_SIZE(s_rule, ICE_TRAIN_PKT_LEN); s_rule = kzalloc(buf_len, GFP_KERNEL); if (!s_rule) { netdev_warn(lag->netdev, "-ENOMEM error configuring CP filter\n"); @@ -799,8 +797,8 @@ ice_lag_cfg_cp_fltr(struct ice_lag *lag, bool add) ICE_SINGLE_ACT_LAN_ENABLE | ICE_SINGLE_ACT_VALID_BIT | FIELD_PREP(ICE_SINGLE_ACT_VSI_ID_M, vsi->vsi_num)); - s_rule->hdr_len = cpu_to_le16(ICE_LAG_SRIOV_TRAIN_PKT_LEN); - memcpy(s_rule->hdr_data, lacp_train_pkt, LACP_TRAIN_PKT_LEN); + s_rule->hdr_len = cpu_to_le16(ICE_TRAIN_PKT_LEN); + memcpy(s_rule->hdr_data, lacp_train_pkt, ICE_TRAIN_PKT_LEN); opc = ice_aqc_opc_add_sw_rules; } else { opc = ice_aqc_opc_remove_sw_rules; From b2e97152df79ab4670ab47de34f2df5bd6b86005 Mon Sep 17 00:00:00 2001 From: Dave Ertman Date: Mon, 16 Jun 2025 13:03:19 +0200 Subject: [PATCH 0079/1536] ice: move LAG function in code to prepare for Active-Active In the SRIOV LAG Active-Active, the functions ice_lag_cfg_pf_fltr's and ice_lag_config_eswitch's content are moved to earlier locations in the source file. Also, ice_lag_cfg_pf_fltr is renamed, and its flow is changed. To reduce the delta in the larger patch, move the original functions to their new location so that only functional changes are needed in the larger patch. Reviewed-by: Przemek Kitszel Signed-off-by: Dave Ertman Tested-by: Sujai Buvaneswaran Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/ice/ice_lag.c | 147 ++++++++++++----------- 1 file changed, 74 insertions(+), 73 deletions(-) diff --git a/drivers/net/ethernet/intel/ice/ice_lag.c b/drivers/net/ethernet/intel/ice/ice_lag.c index 48efd8d27296..1f76536e6176 100644 --- a/drivers/net/ethernet/intel/ice/ice_lag.c +++ b/drivers/net/ethernet/intel/ice/ice_lag.c @@ -100,6 +100,28 @@ static bool netif_is_same_ice(struct ice_pf *pf, struct net_device *netdev) return true; } +/** + * ice_lag_config_eswitch - configure eswitch to work with LAG + * @lag: lag info struct + * @netdev: active network interface device struct + * + * Updates all port representors in eswitch to use @netdev for Tx. + * + * Configures the netdev to keep dst metadata (also used in representor Tx). + * This is required for an uplink without switchdev mode configured. + */ +static void ice_lag_config_eswitch(struct ice_lag *lag, + struct net_device *netdev) +{ + struct ice_repr *repr; + unsigned long id; + + xa_for_each(&lag->pf->eswitch.reprs, id, repr) + repr->dst->u.port_info.lower_dev = netdev; + + netif_keep_dst(netdev); +} + /** * ice_netdev_to_lag - return pointer to associated lag struct from netdev * @netdev: pointer to net_device struct to query @@ -354,6 +376,58 @@ ice_lag_cfg_pf_fltrs(struct ice_lag *lag, void *ptr) } } +/** + * ice_lag_cfg_cp_fltr - configure filter for control packets + * @lag: local interface's lag struct + * @add: add or remove rule + */ +static void +ice_lag_cfg_cp_fltr(struct ice_lag *lag, bool add) +{ + struct ice_sw_rule_lkup_rx_tx *s_rule = NULL; + struct ice_vsi *vsi; + u16 buf_len, opc; + + vsi = lag->pf->vsi[0]; + + buf_len = ICE_SW_RULE_RX_TX_HDR_SIZE(s_rule, ICE_TRAIN_PKT_LEN); + s_rule = kzalloc(buf_len, GFP_KERNEL); + if (!s_rule) { + netdev_warn(lag->netdev, "-ENOMEM error configuring CP filter\n"); + return; + } + + if (add) { + s_rule->hdr.type = cpu_to_le16(ICE_AQC_SW_RULES_T_LKUP_RX); + s_rule->recipe_id = cpu_to_le16(ICE_LAG_SRIOV_CP_RECIPE); + s_rule->src = cpu_to_le16(vsi->port_info->lport); + s_rule->act = cpu_to_le32(ICE_FWD_TO_VSI | + ICE_SINGLE_ACT_LAN_ENABLE | + ICE_SINGLE_ACT_VALID_BIT | + FIELD_PREP(ICE_SINGLE_ACT_VSI_ID_M, + vsi->vsi_num)); + s_rule->hdr_len = cpu_to_le16(ICE_TRAIN_PKT_LEN); + memcpy(s_rule->hdr_data, lacp_train_pkt, ICE_TRAIN_PKT_LEN); + opc = ice_aqc_opc_add_sw_rules; + } else { + opc = ice_aqc_opc_remove_sw_rules; + s_rule->index = cpu_to_le16(lag->cp_rule_idx); + } + if (ice_aq_sw_rules(&lag->pf->hw, s_rule, buf_len, 1, opc, NULL)) { + netdev_warn(lag->netdev, "Error %s CP rule for fail-over\n", + add ? "ADDING" : "REMOVING"); + goto err_cp_free; + } + + if (add) + lag->cp_rule_idx = le16_to_cpu(s_rule->index); + else + lag->cp_rule_idx = 0; + +err_cp_free: + kfree(s_rule); +} + /** * ice_display_lag_info - print LAG info * @lag: LAG info struct @@ -768,57 +842,6 @@ void ice_lag_move_vf_nodes_cfg(struct ice_lag *lag, u8 src_prt, u8 dst_prt) ice_lag_destroy_netdev_list(lag, &ndlist); } -/** - * ice_lag_cfg_cp_fltr - configure filter for control packets - * @lag: local interface's lag struct - * @add: add or remove rule - */ -static void -ice_lag_cfg_cp_fltr(struct ice_lag *lag, bool add) -{ - struct ice_sw_rule_lkup_rx_tx *s_rule = NULL; - struct ice_vsi *vsi; - u16 buf_len, opc; - - vsi = lag->pf->vsi[0]; - - buf_len = ICE_SW_RULE_RX_TX_HDR_SIZE(s_rule, ICE_TRAIN_PKT_LEN); - s_rule = kzalloc(buf_len, GFP_KERNEL); - if (!s_rule) { - netdev_warn(lag->netdev, "-ENOMEM error configuring CP filter\n"); - return; - } - - if (add) { - s_rule->hdr.type = cpu_to_le16(ICE_AQC_SW_RULES_T_LKUP_RX); - s_rule->recipe_id = cpu_to_le16(ICE_LAG_SRIOV_CP_RECIPE); - s_rule->src = cpu_to_le16(vsi->port_info->lport); - s_rule->act = cpu_to_le32(ICE_FWD_TO_VSI | - ICE_SINGLE_ACT_LAN_ENABLE | - ICE_SINGLE_ACT_VALID_BIT | - FIELD_PREP(ICE_SINGLE_ACT_VSI_ID_M, vsi->vsi_num)); - s_rule->hdr_len = cpu_to_le16(ICE_TRAIN_PKT_LEN); - memcpy(s_rule->hdr_data, lacp_train_pkt, ICE_TRAIN_PKT_LEN); - opc = ice_aqc_opc_add_sw_rules; - } else { - opc = ice_aqc_opc_remove_sw_rules; - s_rule->index = cpu_to_le16(lag->cp_rule_idx); - } - if (ice_aq_sw_rules(&lag->pf->hw, s_rule, buf_len, 1, opc, NULL)) { - netdev_warn(lag->netdev, "Error %s CP rule for fail-over\n", - add ? "ADDING" : "REMOVING"); - goto cp_free; - } - - if (add) - lag->cp_rule_idx = le16_to_cpu(s_rule->index); - else - lag->cp_rule_idx = 0; - -cp_free: - kfree(s_rule); -} - /** * ice_lag_prepare_vf_reset - helper to adjust vf lag for reset * @lag: lag struct for interface that owns VF @@ -1038,28 +1061,6 @@ static void ice_lag_link(struct ice_lag *lag) netdev_info(lag->netdev, "Shared SR-IOV resources in bond are active\n"); } -/** - * ice_lag_config_eswitch - configure eswitch to work with LAG - * @lag: lag info struct - * @netdev: active network interface device struct - * - * Updates all port representors in eswitch to use @netdev for Tx. - * - * Configures the netdev to keep dst metadata (also used in representor Tx). - * This is required for an uplink without switchdev mode configured. - */ -static void ice_lag_config_eswitch(struct ice_lag *lag, - struct net_device *netdev) -{ - struct ice_repr *repr; - unsigned long id; - - xa_for_each(&lag->pf->eswitch.reprs, id, repr) - repr->dst->u.port_info.lower_dev = netdev; - - netif_keep_dst(netdev); -} - /** * ice_lag_unlink - handle unlink event * @lag: LAG info struct From 148c8cb32b2f4a32c4b47998da20f2fd62cc38ca Mon Sep 17 00:00:00 2001 From: Dave Ertman Date: Mon, 16 Jun 2025 13:03:20 +0200 Subject: [PATCH 0080/1536] ice: Cleanup variable initialization in LAG code In preparation for implementing SRIOV Active-Active LAG support, cleanup several unneeded variable initializations in declaration blocks. Also move a couple of variable initializations into declaration block that should be there. Reviewed-by: Przemek Kitszel Signed-off-by: Dave Ertman Tested-by: Sujai Buvaneswaran Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/ice/ice_lag.c | 54 ++++++++---------------- 1 file changed, 17 insertions(+), 37 deletions(-) diff --git a/drivers/net/ethernet/intel/ice/ice_lag.c b/drivers/net/ethernet/intel/ice/ice_lag.c index 1f76536e6176..72284555b98a 100644 --- a/drivers/net/ethernet/intel/ice/ice_lag.c +++ b/drivers/net/ethernet/intel/ice/ice_lag.c @@ -234,13 +234,12 @@ ice_lag_cfg_fltr(struct ice_lag *lag, u32 act, u16 recipe_id, u16 *rule_idx, u8 direction, bool add) { struct ice_sw_rule_lkup_rx_tx *s_rule; + struct ice_hw *hw = &lag->pf->hw; u16 s_rule_sz, vsi_num; - struct ice_hw *hw; u8 *eth_hdr; u32 opc; int err; - hw = &lag->pf->hw; vsi_num = ice_get_hw_vsi_num(hw, 0); s_rule_sz = ICE_SW_RULE_RX_TX_ETH_HDR_SIZE(s_rule); @@ -384,12 +383,10 @@ ice_lag_cfg_pf_fltrs(struct ice_lag *lag, void *ptr) static void ice_lag_cfg_cp_fltr(struct ice_lag *lag, bool add) { - struct ice_sw_rule_lkup_rx_tx *s_rule = NULL; - struct ice_vsi *vsi; + struct ice_sw_rule_lkup_rx_tx *s_rule; + struct ice_vsi *vsi = lag->pf->vsi[0]; u16 buf_len, opc; - vsi = lag->pf->vsi[0]; - buf_len = ICE_SW_RULE_RX_TX_HDR_SIZE(s_rule, ICE_TRAIN_PKT_LEN); s_rule = kzalloc(buf_len, GFP_KERNEL); if (!s_rule) { @@ -477,12 +474,11 @@ static u16 ice_lag_qbuf_recfg(struct ice_hw *hw, struct ice_aqc_cfg_txqs_buf *qbuf, u16 vsi_num, u16 numq, u8 tc) { + struct ice_pf *pf = hw->back; struct ice_q_ctx *q_ctx; u16 qid, count = 0; - struct ice_pf *pf; int i; - pf = hw->back; for (i = 0; i < numq; i++) { q_ctx = ice_get_lan_q_ctx(hw, vsi_num, tc, i); if (!q_ctx) { @@ -940,13 +936,12 @@ ice_lag_reclaim_vf_tc(struct ice_lag *lag, struct ice_hw *src_hw, u16 vsi_num, u16 numq, valq, num_moved, qbuf_size; u16 buf_size = __struct_size(buf); struct ice_aqc_cfg_txqs_buf *qbuf; + struct ice_hw *hw = &lag->pf->hw; struct ice_sched_node *n_prt; __le32 teid, parent_teid; struct ice_vsi_ctx *ctx; - struct ice_hw *hw; u32 tmp_teid; - hw = &lag->pf->hw; ctx = ice_get_vsi_ctx(hw, vsi_num); if (!ctx) { dev_warn(dev, "Unable to locate VSI context for LAG reclaim\n"); @@ -1221,11 +1216,8 @@ ice_lag_set_swid(u16 primary_swid, struct ice_lag *local_lag, */ static void ice_lag_primary_swid(struct ice_lag *lag, bool link) { - struct ice_hw *hw; - u16 swid; - - hw = &lag->pf->hw; - swid = hw->port_info->sw_id; + struct ice_hw *hw = &lag->pf->hw; + u16 swid = hw->port_info->sw_id; if (ice_share_res(hw, ICE_AQC_RES_TYPE_SWID, link, swid)) dev_warn(ice_pf_to_dev(lag->pf), "Failure to set primary interface shared status\n"); @@ -1238,12 +1230,10 @@ static void ice_lag_primary_swid(struct ice_lag *lag, bool link) */ static void ice_lag_add_prune_list(struct ice_lag *lag, struct ice_pf *event_pf) { - u16 num_vsi, rule_buf_sz, vsi_list_id, event_vsi_num, prim_vsi_idx; - struct ice_sw_rule_vsi_list *s_rule = NULL; + u16 rule_buf_sz, vsi_list_id, event_vsi_num, prim_vsi_idx, num_vsi = 1; + struct ice_sw_rule_vsi_list *s_rule; struct device *dev; - num_vsi = 1; - dev = ice_pf_to_dev(lag->pf); event_vsi_num = event_pf->vsi[0]->vsi_num; prim_vsi_idx = lag->pf->vsi[0]->idx; @@ -1279,12 +1269,10 @@ static void ice_lag_add_prune_list(struct ice_lag *lag, struct ice_pf *event_pf) */ static void ice_lag_del_prune_list(struct ice_lag *lag, struct ice_pf *event_pf) { - u16 num_vsi, vsi_num, vsi_idx, rule_buf_sz, vsi_list_id; - struct ice_sw_rule_vsi_list *s_rule = NULL; + u16 vsi_num, vsi_idx, rule_buf_sz, vsi_list_id, num_vsi = 1; + struct ice_sw_rule_vsi_list *s_rule; struct device *dev; - num_vsi = 1; - dev = ice_pf_to_dev(lag->pf); vsi_num = event_pf->vsi[0]->vsi_num; vsi_idx = lag->pf->vsi[0]->idx; @@ -1707,11 +1695,9 @@ static void ice_lag_chk_disabled_bond(struct ice_lag *lag, void *ptr) */ static void ice_lag_disable_sriov_bond(struct ice_lag *lag) { - struct ice_netdev_priv *np; - struct ice_pf *pf; + struct ice_netdev_priv *np = netdev_priv(lag->netdev); + struct ice_pf *pf = np->vsi->back; - np = netdev_priv(lag->netdev); - pf = np->vsi->back; ice_clear_feature_support(pf, ICE_F_SRIOV_LAG); } @@ -1880,10 +1866,8 @@ ice_lag_event_handler(struct notifier_block *notif_blk, unsigned long event, */ static int ice_register_lag_handler(struct ice_lag *lag) { + struct notifier_block *notif_blk = &lag->notif_block; struct device *dev = ice_pf_to_dev(lag->pf); - struct notifier_block *notif_blk; - - notif_blk = &lag->notif_block; if (!notif_blk->notifier_call) { notif_blk->notifier_call = ice_lag_event_handler; @@ -1903,10 +1887,9 @@ static int ice_register_lag_handler(struct ice_lag *lag) */ static void ice_unregister_lag_handler(struct ice_lag *lag) { + struct notifier_block *notif_blk = &lag->notif_block; struct device *dev = ice_pf_to_dev(lag->pf); - struct notifier_block *notif_blk; - notif_blk = &lag->notif_block; if (notif_blk->notifier_call) { unregister_netdevice_notifier(notif_blk); dev_dbg(dev, "LAG event handler unregistered\n"); @@ -1968,13 +1951,12 @@ ice_lag_move_vf_nodes_tc_sync(struct ice_lag *lag, struct ice_hw *dest_hw, u16 numq, valq, num_moved, qbuf_size; u16 buf_size = __struct_size(buf); struct ice_aqc_cfg_txqs_buf *qbuf; + struct ice_hw *hw = &lag->pf->hw; struct ice_sched_node *n_prt; __le32 teid, parent_teid; struct ice_vsi_ctx *ctx; - struct ice_hw *hw; u32 tmp_teid; - hw = &lag->pf->hw; ctx = ice_get_vsi_ctx(hw, vsi_num); if (!ctx) { dev_warn(dev, "LAG rebuild failed after reset due to VSI Context failure\n"); @@ -2165,9 +2147,7 @@ lag_error: */ void ice_deinit_lag(struct ice_pf *pf) { - struct ice_lag *lag; - - lag = pf->lag; + struct ice_lag *lag = pf->lag; if (!lag) return; From fb2f2a86f0cd9690357b9bb67af00d386a7e819f Mon Sep 17 00:00:00 2001 From: Dave Ertman Date: Mon, 16 Jun 2025 13:03:21 +0200 Subject: [PATCH 0081/1536] ice: cleanup capabilities evaluation When evaluating the capabilities field, the ICE_AQC_BIT_ROCEV2_LAG and ICE_AQC_BIT_SRIOV_LAG defines were both not using the BIT operator, instead simply setting a hex value that set the correct bits. While not inaccurate, this method is misleading, and when it is expanded in the following implementation it becomes even more confusing. Switch to using the BIT() operator to clarify what is being checked. Reviewed-by: Przemek Kitszel Reviewed-by: Aleksandr Loktionov Reviewed-by: Marcin Szycik Signed-off-by: Dave Ertman Tested-by: Sujai Buvaneswaran Signed-off-by: Tony Nguyen --- include/linux/net/intel/libie/adminq.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/linux/net/intel/libie/adminq.h b/include/linux/net/intel/libie/adminq.h index 012b5d499c1a..dbe93f940ef0 100644 --- a/include/linux/net/intel/libie/adminq.h +++ b/include/linux/net/intel/libie/adminq.h @@ -192,8 +192,8 @@ LIBIE_CHECK_STRUCT_LEN(16, libie_aqc_list_caps); #define LIBIE_AQC_CAPS_TX_SCHED_TOPO_COMP_MODE 0x0085 #define LIBIE_AQC_CAPS_NAC_TOPOLOGY 0x0087 #define LIBIE_AQC_CAPS_FW_LAG_SUPPORT 0x0092 -#define LIBIE_AQC_BIT_ROCEV2_LAG 0x01 -#define LIBIE_AQC_BIT_SRIOV_LAG 0x02 +#define LIBIE_AQC_BIT_ROCEV2_LAG BIT(0) +#define LIBIE_AQC_BIT_SRIOV_LAG BIT(1) #define LIBIE_AQC_CAPS_FLEX10 0x00F1 #define LIBIE_AQC_CAPS_CEM 0x00F2 From 28f073b38372b99d8d33ff5e63897d28419bda20 Mon Sep 17 00:00:00 2001 From: Dave Ertman Date: Mon, 16 Jun 2025 13:03:23 +0200 Subject: [PATCH 0082/1536] ice: Implement support for SRIOV VFs across Active/Active bonds This patch implements the software flows to handle SRIOV VF communication across an Active/Active link aggregate. The same restrictions apply as are in place for the support of Active/Backup bonds. - the two interfaces must be on the same NIC - the FW LLDP engine needs to be disabled - the DDP package that supports VF LAG must be loaded on device - the two interfaces must have the same QoS config - only the first interface added to the bond will have VF support - the interface with VFs must be in switchdev mode With the additional requirement of - the version of the FW on the NIC needs to have VF Active/Active support This requirement is indicated in the capabilities struct associated with the NVM loaded on the NIC. The balancing of traffic between the two interfaces is done on a queue basis. Taking the queues allocated to all of the VFs as a whole, one half of them will be distributed to each interface. When a link goes down, then the queues allocated to the down interface will migrate to the active port. When the down port comes back up, then the same queues as were originally assigned there will be moved back. Co-developed-by: Marcin Szycik Signed-off-by: Marcin Szycik Signed-off-by: Dave Ertman Tested-by: Sujai Buvaneswaran Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/ice/ice.h | 1 + .../net/ethernet/intel/ice/ice_adminq_cmd.h | 4 + drivers/net/ethernet/intel/ice/ice_common.c | 15 +- drivers/net/ethernet/intel/ice/ice_common.h | 2 +- drivers/net/ethernet/intel/ice/ice_lag.c | 770 +++++++++++++++--- drivers/net/ethernet/intel/ice/ice_lag.h | 21 +- drivers/net/ethernet/intel/ice/ice_type.h | 1 + include/linux/net/intel/libie/adminq.h | 1 + 8 files changed, 712 insertions(+), 103 deletions(-) diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h index 2098f00b3cd3..a83fdd22cbe4 100644 --- a/drivers/net/ethernet/intel/ice/ice.h +++ b/drivers/net/ethernet/intel/ice/ice.h @@ -203,6 +203,7 @@ enum ice_feature { ICE_F_GCS, ICE_F_ROCE_LAG, ICE_F_SRIOV_LAG, + ICE_F_SRIOV_AA_LAG, ICE_F_MBX_LIMIT, ICE_F_MAX }; diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h index 3bd3ea3af888..caae1780fd37 100644 --- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h +++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h @@ -2060,6 +2060,10 @@ struct ice_aqc_cfg_txqs { #define ICE_AQC_Q_CFG_SRC_PRT_M 0x7 #define ICE_AQC_Q_CFG_DST_PRT_S 3 #define ICE_AQC_Q_CFG_DST_PRT_M (0x7 << ICE_AQC_Q_CFG_DST_PRT_S) +#define ICE_AQC_Q_CFG_MODE_M GENMASK(7, 6) +#define ICE_AQC_Q_CFG_MODE_SAME_PF 0x0 +#define ICE_AQC_Q_CFG_MODE_GIVE_OWN 0x1 +#define ICE_AQC_Q_CFG_MODE_KEEP_OWN 0x2 u8 time_out; #define ICE_AQC_Q_CFG_TIMEOUT_S 2 #define ICE_AQC_Q_CFG_TIMEOUT_M (0x1F << ICE_AQC_Q_CFG_TIMEOUT_S) diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c index 209f42045fea..808870539667 100644 --- a/drivers/net/ethernet/intel/ice/ice_common.c +++ b/drivers/net/ethernet/intel/ice/ice_common.c @@ -2424,6 +2424,9 @@ ice_parse_common_caps(struct ice_hw *hw, struct ice_hw_common_caps *caps, caps->sriov_lag = number & LIBIE_AQC_BIT_SRIOV_LAG; ice_debug(hw, ICE_DBG_INIT, "%s: sriov_lag = %u\n", prefix, caps->sriov_lag); + caps->sriov_aa_lag = number & LIBIE_AQC_BIT_SRIOV_AA_LAG; + ice_debug(hw, ICE_DBG_INIT, "%s: sriov_aa_lag = %u\n", + prefix, caps->sriov_aa_lag); break; case LIBIE_AQC_CAPS_TX_SCHED_TOPO_COMP_MODE: caps->tx_sched_topo_comp_mode_en = (number == 1); @@ -4712,24 +4715,24 @@ do_aq: } /** - * ice_aq_cfg_lan_txq + * ice_aq_cfg_lan_txq - send AQ command 0x0C32 to FW * @hw: pointer to the hardware structure * @buf: buffer for command * @buf_size: size of buffer in bytes * @num_qs: number of queues being configured * @oldport: origination lport * @newport: destination lport + * @mode: cmd_type for move to use * @cd: pointer to command details structure or NULL * * Move/Configure LAN Tx queue (0x0C32) * - * There is a better AQ command to use for moving nodes, so only coding - * this one for configuring the node. + * Return: Zero on success, associated error code on failure. */ int ice_aq_cfg_lan_txq(struct ice_hw *hw, struct ice_aqc_cfg_txqs_buf *buf, u16 buf_size, u16 num_qs, u8 oldport, u8 newport, - struct ice_sq_cd *cd) + u8 mode, struct ice_sq_cd *cd) { struct ice_aqc_cfg_txqs *cmd; struct libie_aq_desc desc; @@ -4742,10 +4745,12 @@ ice_aq_cfg_lan_txq(struct ice_hw *hw, struct ice_aqc_cfg_txqs_buf *buf, if (!buf) return -EINVAL; - cmd->cmd_type = ICE_AQC_Q_CFG_TC_CHNG; + cmd->cmd_type = mode; cmd->num_qs = num_qs; cmd->port_num_chng = (oldport & ICE_AQC_Q_CFG_SRC_PRT_M); cmd->port_num_chng |= FIELD_PREP(ICE_AQC_Q_CFG_DST_PRT_M, newport); + cmd->port_num_chng |= FIELD_PREP(ICE_AQC_Q_CFG_MODE_M, + ICE_AQC_Q_CFG_MODE_KEEP_OWN); cmd->time_out = FIELD_PREP(ICE_AQC_Q_CFG_TIMEOUT_M, 5); cmd->blocked_cgds = 0; diff --git a/drivers/net/ethernet/intel/ice/ice_common.h b/drivers/net/ethernet/intel/ice/ice_common.h index 60320cdf7804..dba15ad315a6 100644 --- a/drivers/net/ethernet/intel/ice/ice_common.h +++ b/drivers/net/ethernet/intel/ice/ice_common.h @@ -270,7 +270,7 @@ ice_ena_vsi_txq(struct ice_port_info *pi, u16 vsi_handle, u8 tc, u16 q_handle, int ice_aq_cfg_lan_txq(struct ice_hw *hw, struct ice_aqc_cfg_txqs_buf *buf, u16 buf_size, u16 num_qs, u8 oldport, u8 newport, - struct ice_sq_cd *cd); + u8 mode, struct ice_sq_cd *cd); int ice_replay_vsi(struct ice_hw *hw, u16 vsi_handle); void ice_replay_post(struct ice_hw *hw); struct ice_q_ctx * diff --git a/drivers/net/ethernet/intel/ice/ice_lag.c b/drivers/net/ethernet/intel/ice/ice_lag.c index 72284555b98a..80312e1dcf7f 100644 --- a/drivers/net/ethernet/intel/ice/ice_lag.c +++ b/drivers/net/ethernet/intel/ice/ice_lag.c @@ -14,6 +14,9 @@ static const u8 lacp_train_pkt[ICE_TRAIN_PKT_LEN] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0x88, 0x09, 0, 0 }; +static const u8 act_act_train_pkt[ICE_TRAIN_PKT_LEN] = { 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0 }; #define ICE_RECIPE_LEN 64 #define ICE_LAG_SRIOV_CP_RECIPE 10 @@ -48,10 +51,10 @@ static void ice_lag_set_primary(struct ice_lag *lag) } /** - * ice_lag_set_backup - set PF LAG state to Backup + * ice_lag_set_bkup - set PF LAG state to Backup * @lag: LAG info struct */ -static void ice_lag_set_backup(struct ice_lag *lag) +static void ice_lag_set_bkup(struct ice_lag *lag) { struct ice_pf *pf = lag->pf; @@ -337,25 +340,15 @@ ice_lag_cfg_drop_fltr(struct ice_lag *lag, bool add) } /** - * ice_lag_cfg_pf_fltrs - set filters up for new active port + * ice_lag_cfg_pf_fltrs_act_bkup - set filters up for new active port * @lag: local interfaces lag struct - * @ptr: opaque data containing notifier event + * @bonding_info: netdev event bonding info */ static void -ice_lag_cfg_pf_fltrs(struct ice_lag *lag, void *ptr) +ice_lag_cfg_pf_fltrs_act_bkup(struct ice_lag *lag, + struct netdev_bonding_info *bonding_info) { - struct netdev_notifier_bonding_info *info = ptr; - struct netdev_bonding_info *bonding_info; - struct net_device *event_netdev; - struct device *dev; - - event_netdev = netdev_notifier_info_to_dev(ptr); - /* not for this netdev */ - if (event_netdev != lag->netdev) - return; - - bonding_info = &info->bonding_info; - dev = ice_pf_to_dev(lag->pf); + struct device *dev = ice_pf_to_dev(lag->pf); /* interface not active - remove old default VSI rule */ if (bonding_info->slave.state && lag->pf_rx_rule_id) { @@ -376,12 +369,13 @@ ice_lag_cfg_pf_fltrs(struct ice_lag *lag, void *ptr) } /** - * ice_lag_cfg_cp_fltr - configure filter for control packets + * ice_lag_cfg_lp_fltr - configure lport filters * @lag: local interface's lag struct * @add: add or remove rule + * @cp: control packet only or general PF lport rule */ static void -ice_lag_cfg_cp_fltr(struct ice_lag *lag, bool add) +ice_lag_cfg_lp_fltr(struct ice_lag *lag, bool add, bool cp) { struct ice_sw_rule_lkup_rx_tx *s_rule; struct ice_vsi *vsi = lag->pf->vsi[0]; @@ -395,8 +389,17 @@ ice_lag_cfg_cp_fltr(struct ice_lag *lag, bool add) } if (add) { - s_rule->hdr.type = cpu_to_le16(ICE_AQC_SW_RULES_T_LKUP_RX); - s_rule->recipe_id = cpu_to_le16(ICE_LAG_SRIOV_CP_RECIPE); + if (cp) { + s_rule->recipe_id = + cpu_to_le16(ICE_LAG_SRIOV_CP_RECIPE); + memcpy(s_rule->hdr_data, lacp_train_pkt, + ICE_TRAIN_PKT_LEN); + } else { + s_rule->recipe_id = cpu_to_le16(lag->act_act_recipe); + memcpy(s_rule->hdr_data, act_act_train_pkt, + ICE_TRAIN_PKT_LEN); + } + s_rule->src = cpu_to_le16(vsi->port_info->lport); s_rule->act = cpu_to_le32(ICE_FWD_TO_VSI | ICE_SINGLE_ACT_LAN_ENABLE | @@ -404,27 +407,66 @@ ice_lag_cfg_cp_fltr(struct ice_lag *lag, bool add) FIELD_PREP(ICE_SINGLE_ACT_VSI_ID_M, vsi->vsi_num)); s_rule->hdr_len = cpu_to_le16(ICE_TRAIN_PKT_LEN); - memcpy(s_rule->hdr_data, lacp_train_pkt, ICE_TRAIN_PKT_LEN); + s_rule->hdr.type = cpu_to_le16(ICE_AQC_SW_RULES_T_LKUP_RX); opc = ice_aqc_opc_add_sw_rules; } else { opc = ice_aqc_opc_remove_sw_rules; - s_rule->index = cpu_to_le16(lag->cp_rule_idx); + if (cp) + s_rule->index = cpu_to_le16(lag->cp_rule_idx); + else + s_rule->index = cpu_to_le16(lag->act_act_rule_idx); } if (ice_aq_sw_rules(&lag->pf->hw, s_rule, buf_len, 1, opc, NULL)) { - netdev_warn(lag->netdev, "Error %s CP rule for fail-over\n", - add ? "ADDING" : "REMOVING"); + netdev_warn(lag->netdev, "Error %s %s rule for aggregate\n", + add ? "ADDING" : "REMOVING", + cp ? "CONTROL PACKET" : "LPORT"); goto err_cp_free; } - if (add) - lag->cp_rule_idx = le16_to_cpu(s_rule->index); - else - lag->cp_rule_idx = 0; + if (add) { + if (cp) + lag->cp_rule_idx = le16_to_cpu(s_rule->index); + else + lag->act_act_rule_idx = le16_to_cpu(s_rule->index); + } else { + if (cp) + lag->cp_rule_idx = 0; + else + lag->act_act_rule_idx = 0; + } err_cp_free: kfree(s_rule); } +/** + * ice_lag_cfg_pf_fltrs - set filters up for PF traffic + * @lag: local interfaces lag struct + * @ptr: opaque data containing notifier event + */ +static void +ice_lag_cfg_pf_fltrs(struct ice_lag *lag, void *ptr) +{ + struct netdev_notifier_bonding_info *info = ptr; + struct netdev_bonding_info *bonding_info; + struct net_device *event_netdev; + + event_netdev = netdev_notifier_info_to_dev(ptr); + if (event_netdev != lag->netdev) + return; + + bonding_info = &info->bonding_info; + + if (lag->bond_aa) { + if (lag->need_fltr_cfg) { + ice_lag_cfg_lp_fltr(lag, true, false); + lag->need_fltr_cfg = false; + } + } else { + ice_lag_cfg_pf_fltrs_act_bkup(lag, bonding_info); + } +} + /** * ice_display_lag_info - print LAG info * @lag: LAG info struct @@ -648,7 +690,7 @@ ice_lag_move_vf_node_tc(struct ice_lag *lag, u8 oldport, u8 newport, } if (ice_aq_cfg_lan_txq(&lag->pf->hw, qbuf, qbuf_size, valq, oldport, - newport, NULL)) { + newport, ICE_AQC_Q_CFG_TC_CHNG, NULL)) { dev_warn(dev, "Failure to configure queues for LAG failover\n"); goto qbuf_err; } @@ -784,10 +826,17 @@ void ice_lag_move_new_vf_nodes(struct ice_vf *vf) if (lag->upper_netdev) ice_lag_build_netdev_list(lag, &ndlist); - if (ice_is_feature_supported(pf, ICE_F_SRIOV_LAG) && - lag->bonded && lag->primary && pri_port != act_port && - !list_empty(lag->netdev_head)) - ice_lag_move_single_vf_nodes(lag, pri_port, act_port, vsi->idx); + if (lag->bonded && lag->primary && !list_empty(lag->netdev_head)) { + if (lag->bond_aa && + ice_is_feature_supported(pf, ICE_F_SRIOV_AA_LAG)) + ice_lag_aa_failover(lag, ICE_LAGS_IDX, NULL); + + if (!lag->bond_aa && + ice_is_feature_supported(pf, ICE_F_SRIOV_LAG) && + pri_port != act_port) + ice_lag_move_single_vf_nodes(lag, pri_port, act_port, + vsi->idx); + } ice_lag_destroy_netdev_list(lag, &ndlist); @@ -851,11 +900,20 @@ u8 ice_lag_prepare_vf_reset(struct ice_lag *lag) u8 pri_prt, act_prt; if (lag && lag->bonded && lag->primary && lag->upper_netdev) { - pri_prt = lag->pf->hw.port_info->lport; - act_prt = lag->active_port; - if (act_prt != pri_prt && act_prt != ICE_LAG_INVALID_PORT) { - ice_lag_move_vf_nodes_cfg(lag, act_prt, pri_prt); - return act_prt; + if (!lag->bond_aa) { + pri_prt = lag->pf->hw.port_info->lport; + act_prt = lag->active_port; + if (act_prt != pri_prt && + act_prt != ICE_LAG_INVALID_PORT) { + ice_lag_move_vf_nodes_cfg(lag, act_prt, pri_prt); + return act_prt; + } + } else { + if (lag->port_bitmap & ICE_LAGS_M) { + lag->port_bitmap &= ~ICE_LAGS_M; + ice_lag_aa_failover(lag, ICE_LAGP_IDX, NULL); + lag->port_bitmap |= ICE_LAGS_M; + } } } @@ -873,10 +931,15 @@ void ice_lag_complete_vf_reset(struct ice_lag *lag, u8 act_prt) { u8 pri_prt; - if (lag && lag->bonded && lag->primary && - act_prt != ICE_LAG_INVALID_PORT) { - pri_prt = lag->pf->hw.port_info->lport; - ice_lag_move_vf_nodes_cfg(lag, pri_prt, act_prt); + if (lag && lag->bonded && lag->primary) { + if (!lag->bond_aa) { + pri_prt = lag->pf->hw.port_info->lport; + if (act_prt != ICE_LAG_INVALID_PORT) + ice_lag_move_vf_nodes_cfg(lag, pri_prt, + act_prt); + } else { + ice_lag_aa_failover(lag, ICE_LAGS_IDX, NULL); + } } } @@ -912,7 +975,7 @@ static void ice_lag_info_event(struct ice_lag *lag, void *ptr) } if (bonding_info->slave.state) - ice_lag_set_backup(lag); + ice_lag_set_bkup(lag); else ice_lag_set_primary(lag); @@ -920,6 +983,295 @@ lag_out: ice_display_lag_info(lag); } +/** + * ice_lag_aa_qbuf_recfg - fill a single queue buffer for recfg cmd + * @hw: HW struct that contains the queue context + * @qbuf: pointer to single queue buffer + * @vsi_num: index of the VF VSI in PF space + * @qnum: queue index + * + * Return: Zero on success, error code on failure. + */ +static int +ice_lag_aa_qbuf_recfg(struct ice_hw *hw, struct ice_aqc_cfg_txqs_buf *qbuf, + u16 vsi_num, int qnum) +{ + struct ice_pf *pf = hw->back; + struct ice_q_ctx *q_ctx; + u16 q_id; + + q_ctx = ice_get_lan_q_ctx(hw, vsi_num, 0, qnum); + if (!q_ctx) { + dev_dbg(ice_hw_to_dev(hw), "LAG queue %d no Q context\n", qnum); + return -ENOENT; + } + + if (q_ctx->q_teid == ICE_INVAL_TEID) { + dev_dbg(ice_hw_to_dev(hw), "LAG queue %d INVAL TEID\n", qnum); + return -EINVAL; + } + + if (q_ctx->q_handle == ICE_INVAL_Q_HANDLE) { + dev_dbg(ice_hw_to_dev(hw), "LAG queue %d INVAL Q HANDLE\n", qnum); + return -EINVAL; + } + + q_id = pf->vsi[vsi_num]->txq_map[q_ctx->q_handle]; + qbuf->queue_info[0].q_handle = cpu_to_le16(q_id); + qbuf->queue_info[0].tc = 0; + qbuf->queue_info[0].q_teid = cpu_to_le32(q_ctx->q_teid); + + return 0; +} + +/** + * ice_lag_aa_move_vf_qs - Move some/all VF queues to destination + * @lag: primary interface's lag struct + * @dest: index of destination port + * @vsi_num: index of VF VSI in PF space + * @all: if true move all queues to destination + * @odd: VF wide q indicator for odd/even + * @e_pf: PF struct for the event interface + * + * the parameter "all" is to control whether we are splitting the queues + * between two interfaces or moving them all to the destination interface + */ +static void ice_lag_aa_move_vf_qs(struct ice_lag *lag, u8 dest, u16 vsi_num, + bool all, bool *odd, struct ice_pf *e_pf) +{ + DEFINE_RAW_FLEX(struct ice_aqc_cfg_txqs_buf, qbuf, queue_info, 1); + struct ice_hw *old_hw, *new_hw, *pri_hw, *sec_hw; + struct device *dev = ice_pf_to_dev(lag->pf); + struct ice_vsi_ctx *pv_ctx, *sv_ctx; + struct ice_lag_netdev_list ndlist; + u16 num_q, qbuf_size, sec_vsi_num; + u8 pri_lport, sec_lport; + u32 pvf_teid, svf_teid; + u16 vf_id; + + vf_id = lag->pf->vsi[vsi_num]->vf->vf_id; + /* If sec_vf[] not defined, then no second interface to share with */ + if (lag->sec_vf[vf_id]) + sec_vsi_num = lag->sec_vf[vf_id]->idx; + else + return; + + pri_lport = lag->bond_lport_pri; + sec_lport = lag->bond_lport_sec; + + if (pri_lport == ICE_LAG_INVALID_PORT || + sec_lport == ICE_LAG_INVALID_PORT) + return; + + if (!e_pf) + ice_lag_build_netdev_list(lag, &ndlist); + + pri_hw = &lag->pf->hw; + if (e_pf && lag->pf != e_pf) + sec_hw = &e_pf->hw; + else + sec_hw = ice_lag_find_hw_by_lport(lag, sec_lport); + + if (!pri_hw || !sec_hw) + return; + + if (dest == ICE_LAGP_IDX) { + struct ice_vsi *vsi; + + vsi = ice_get_main_vsi(lag->pf); + if (!vsi) + return; + + old_hw = sec_hw; + new_hw = pri_hw; + ice_lag_config_eswitch(lag, vsi->netdev); + } else { + struct ice_pf *sec_pf = sec_hw->back; + struct ice_vsi *vsi; + + vsi = ice_get_main_vsi(sec_pf); + if (!vsi) + return; + + old_hw = pri_hw; + new_hw = sec_hw; + ice_lag_config_eswitch(lag, vsi->netdev); + } + + pv_ctx = ice_get_vsi_ctx(pri_hw, vsi_num); + if (!pv_ctx) { + dev_warn(dev, "Unable to locate primary VSI %d context for LAG failover\n", + vsi_num); + return; + } + + sv_ctx = ice_get_vsi_ctx(sec_hw, sec_vsi_num); + if (!sv_ctx) { + dev_warn(dev, "Unable to locate secondary VSI %d context for LAG failover\n", + vsi_num); + return; + } + + num_q = pv_ctx->num_lan_q_entries[0]; + qbuf_size = __struct_size(qbuf); + + /* Suspend traffic for primary VSI VF */ + pvf_teid = le32_to_cpu(pv_ctx->sched.vsi_node[0]->info.node_teid); + ice_sched_suspend_resume_elems(pri_hw, 1, &pvf_teid, true); + + /* Suspend traffic for secondary VSI VF */ + svf_teid = le32_to_cpu(sv_ctx->sched.vsi_node[0]->info.node_teid); + ice_sched_suspend_resume_elems(sec_hw, 1, &svf_teid, true); + + for (int i = 0; i < num_q; i++) { + struct ice_sched_node *n_prt, *q_node, *parent; + struct ice_port_info *pi, *new_pi; + struct ice_vsi_ctx *src_ctx; + struct ice_sched_node *p; + struct ice_q_ctx *q_ctx; + u16 dst_vsi_num; + + pi = old_hw->port_info; + new_pi = new_hw->port_info; + + *odd = !(*odd); + if ((dest == ICE_LAGP_IDX && *odd && !all) || + (dest == ICE_LAGS_IDX && !(*odd) && !all) || + lag->q_home[vf_id][i] == dest) + continue; + + if (dest == ICE_LAGP_IDX) + dst_vsi_num = vsi_num; + else + dst_vsi_num = sec_vsi_num; + + n_prt = ice_sched_get_free_qparent(new_hw->port_info, + dst_vsi_num, 0, + ICE_SCHED_NODE_OWNER_LAN); + if (!n_prt) + continue; + + q_ctx = ice_get_lan_q_ctx(pri_hw, vsi_num, 0, i); + if (!q_ctx) + continue; + + if (dest == ICE_LAGP_IDX) + src_ctx = sv_ctx; + else + src_ctx = pv_ctx; + + q_node = ice_sched_find_node_by_teid(src_ctx->sched.vsi_node[0], + q_ctx->q_teid); + if (!q_node) + continue; + + qbuf->src_parent_teid = q_node->info.parent_teid; + qbuf->dst_parent_teid = n_prt->info.node_teid; + + /* Move the node in the HW/FW */ + if (ice_lag_aa_qbuf_recfg(pri_hw, qbuf, vsi_num, i)) + continue; + + if (dest == ICE_LAGP_IDX) + ice_aq_cfg_lan_txq(pri_hw, qbuf, qbuf_size, 1, + sec_lport, pri_lport, + ICE_AQC_Q_CFG_MOVE_TC_CHNG, + NULL); + else + ice_aq_cfg_lan_txq(pri_hw, qbuf, qbuf_size, 1, + pri_lport, sec_lport, + ICE_AQC_Q_CFG_MOVE_TC_CHNG, + NULL); + + /* Move the node in the SW */ + parent = q_node->parent; + if (!parent) + continue; + + for (int n = 0; n < parent->num_children; n++) { + int j; + + if (parent->children[n] != q_node) + continue; + + for (j = n + 1; j < parent->num_children; + j++) { + parent->children[j - 1] = + parent->children[j]; + } + parent->children[j] = NULL; + parent->num_children--; + break; + } + + p = pi->sib_head[0][q_node->tx_sched_layer]; + while (p) { + if (p->sibling == q_node) { + p->sibling = q_node->sibling; + break; + } + p = p->sibling; + } + + if (pi->sib_head[0][q_node->tx_sched_layer] == q_node) + pi->sib_head[0][q_node->tx_sched_layer] = + q_node->sibling; + + q_node->parent = n_prt; + q_node->info.parent_teid = n_prt->info.node_teid; + q_node->sibling = NULL; + p = new_pi->sib_head[0][q_node->tx_sched_layer]; + if (p) { + while (p) { + if (!p->sibling) { + p->sibling = q_node; + break; + } + p = p->sibling; + } + } else { + new_pi->sib_head[0][q_node->tx_sched_layer] = + q_node; + } + + n_prt->children[n_prt->num_children++] = q_node; + lag->q_home[vf_id][i] = dest; + } + + ice_sched_suspend_resume_elems(pri_hw, 1, &pvf_teid, false); + ice_sched_suspend_resume_elems(sec_hw, 1, &svf_teid, false); + + if (!e_pf) + ice_lag_destroy_netdev_list(lag, &ndlist); +} + +/** + * ice_lag_aa_failover - move VF queues in A/A mode + * @lag: primary lag struct + * @dest: index of destination port + * @e_pf: PF struct for event port + */ +void ice_lag_aa_failover(struct ice_lag *lag, u8 dest, struct ice_pf *e_pf) +{ + bool odd = true, all = false; + int i; + + /* Primary can be a target if down (cleanup), but secondary can't */ + if (dest == ICE_LAGS_IDX && !(lag->port_bitmap & ICE_LAGS_M)) + return; + + /* Move all queues to a destination if only one port is active, + * or no ports are active and dest is primary. + */ + if ((lag->port_bitmap ^ (ICE_LAGP_M | ICE_LAGS_M)) || + (!lag->port_bitmap && dest == ICE_LAGP_IDX)) + all = true; + + ice_for_each_vsi(lag->pf, i) + if (lag->pf->vsi[i] && lag->pf->vsi[i]->type == ICE_VSI_VF) + ice_lag_aa_move_vf_qs(lag, dest, i, all, &odd, e_pf); +} + /** * ice_lag_reclaim_vf_tc - move scheduling nodes back to primary interface * @lag: primary interface lag struct @@ -982,7 +1334,7 @@ ice_lag_reclaim_vf_tc(struct ice_lag *lag, struct ice_hw *src_hw, u16 vsi_num, if (ice_aq_cfg_lan_txq(hw, qbuf, qbuf_size, numq, src_hw->port_info->lport, hw->port_info->lport, - NULL)) { + ICE_AQC_Q_CFG_TC_CHNG, NULL)) { dev_warn(dev, "Failure to configure queues for LAG failover\n"); goto reclaim_qerr; } @@ -1053,14 +1405,15 @@ static void ice_lag_link(struct ice_lag *lag) lag->bonded = true; lag->role = ICE_LAG_UNSET; + lag->need_fltr_cfg = true; netdev_info(lag->netdev, "Shared SR-IOV resources in bond are active\n"); } /** - * ice_lag_unlink - handle unlink event + * ice_lag_act_bkup_unlink - handle unlink event for A/B bond * @lag: LAG info struct */ -static void ice_lag_unlink(struct ice_lag *lag) +static void ice_lag_act_bkup_unlink(struct ice_lag *lag) { u8 pri_port, act_port, loc_port; struct ice_pf *pf = lag->pf; @@ -1096,10 +1449,32 @@ static void ice_lag_unlink(struct ice_lag *lag) } } } +} - lag->bonded = false; - lag->role = ICE_LAG_NONE; - lag->upper_netdev = NULL; +/** + * ice_lag_aa_unlink - handle unlink event for Active-Active bond + * @lag: LAG info struct + */ +static void ice_lag_aa_unlink(struct ice_lag *lag) +{ + struct ice_lag *pri_lag; + + if (lag->primary) { + pri_lag = lag; + lag->port_bitmap &= ~ICE_LAGP_M; + } else { + pri_lag = ice_lag_find_primary(lag); + if (pri_lag) + pri_lag->port_bitmap &= ICE_LAGS_M; + } + + if (pri_lag) { + ice_lag_aa_failover(pri_lag, ICE_LAGP_IDX, lag->pf); + if (lag->primary) + pri_lag->bond_lport_pri = ICE_LAG_INVALID_PORT; + else + pri_lag->bond_lport_sec = ICE_LAG_INVALID_PORT; + } } /** @@ -1115,10 +1490,20 @@ static void ice_lag_link_unlink(struct ice_lag *lag, void *ptr) if (netdev != lag->netdev) return; - if (info->linking) + if (info->linking) { ice_lag_link(lag); - else - ice_lag_unlink(lag); + } else { + if (lag->bond_aa) + ice_lag_aa_unlink(lag); + else + ice_lag_act_bkup_unlink(lag); + + lag->bonded = false; + lag->role = ICE_LAG_NONE; + lag->upper_netdev = NULL; + lag->bond_aa = false; + lag->need_fltr_cfg = false; + } } /** @@ -1320,6 +1705,11 @@ static void ice_lag_init_feature_support_flag(struct ice_pf *pf) ice_set_feature_support(pf, ICE_F_SRIOV_LAG); else ice_clear_feature_support(pf, ICE_F_SRIOV_LAG); + + if (caps->sriov_aa_lag && ice_pkg_has_lport_extract(&pf->hw)) + ice_set_feature_support(pf, ICE_F_SRIOV_AA_LAG); + else + ice_clear_feature_support(pf, ICE_F_SRIOV_AA_LAG); } /** @@ -1353,6 +1743,9 @@ static void ice_lag_changeupper_event(struct ice_lag *lag, void *ptr) /* Configure primary's SWID to be shared */ ice_lag_primary_swid(lag, true); primary_lag = lag; + lag->bond_lport_pri = lag->pf->hw.port_info->lport; + lag->bond_lport_sec = ICE_LAG_INVALID_PORT; + lag->port_bitmap = 0; } else { u16 swid; @@ -1362,16 +1755,29 @@ static void ice_lag_changeupper_event(struct ice_lag *lag, void *ptr) swid = primary_lag->pf->hw.port_info->sw_id; ice_lag_set_swid(swid, lag, true); ice_lag_add_prune_list(primary_lag, lag->pf); - ice_lag_cfg_drop_fltr(lag, true); + primary_lag->bond_lport_sec = + lag->pf->hw.port_info->lport; } /* add filter for primary control packets */ - ice_lag_cfg_cp_fltr(lag, true); + ice_lag_cfg_lp_fltr(lag, true, true); } else { if (!primary_lag && lag->primary) primary_lag = lag; + if (primary_lag) { + for (int i = 0; i < ICE_MAX_SRIOV_VFS; i++) { + if (primary_lag->sec_vf[i]) { + ice_vsi_release(primary_lag->sec_vf[i]); + primary_lag->sec_vf[i] = NULL; + } + } + } + if (!lag->primary) { ice_lag_set_swid(0, lag, false); + if (primary_lag) + primary_lag->bond_lport_sec = + ICE_LAG_INVALID_PORT; } else { if (primary_lag && lag->primary) { ice_lag_primary_swid(lag, false); @@ -1379,7 +1785,7 @@ static void ice_lag_changeupper_event(struct ice_lag *lag, void *ptr) } } /* remove filter for control packets */ - ice_lag_cfg_cp_fltr(lag, false); + ice_lag_cfg_lp_fltr(lag, false, !lag->bond_aa); } } @@ -1405,18 +1811,34 @@ static void ice_lag_monitor_link(struct ice_lag *lag, void *ptr) if (!netif_is_same_ice(lag->pf, event_netdev)) return; + if (info->upper_dev != lag->upper_netdev) + return; + + if (info->linking) + return; + pf = lag->pf; prim_hw = &pf->hw; prim_port = prim_hw->port_info->lport; - if (info->upper_dev != lag->upper_netdev) - return; + /* Since there are only two interfaces allowed in SRIOV+LAG, if + * one port is leaving, then nodes need to be on primary + * interface. + */ + if (lag->bond_aa) { + struct ice_netdev_priv *e_ndp; + struct ice_pf *e_pf; - if (!info->linking) { - /* Since there are only two interfaces allowed in SRIOV+LAG, if - * one port is leaving, then nodes need to be on primary - * interface. - */ + e_ndp = netdev_priv(event_netdev); + e_pf = e_ndp->vsi->back; + + if (lag->bond_lport_pri != ICE_LAG_INVALID_PORT && + lag->port_bitmap & ICE_LAGS_M) { + lag->port_bitmap &= ~ICE_LAGS_M; + ice_lag_aa_failover(lag, ICE_LAGP_IDX, e_pf); + lag->bond_lport_sec = ICE_LAG_INVALID_PORT; + } + } else { if (prim_port != lag->active_port && lag->active_port != ICE_LAG_INVALID_PORT) { active_hw = ice_lag_find_hw_by_lport(lag, @@ -1428,44 +1850,32 @@ static void ice_lag_monitor_link(struct ice_lag *lag, void *ptr) } /** - * ice_lag_monitor_active - main PF keep track of which port is active + * ice_lag_monitor_act_bkup - keep track of which port is active in A/B LAG * @lag: lag info struct - * @ptr: opaque data containing notifier event + * @b_info: bonding info + * @event_netdev: net_device got target netdev * * This function is for the primary PF to monitor changes in which port is * active and handle changes for SRIOV VF functionality */ -static void ice_lag_monitor_active(struct ice_lag *lag, void *ptr) +static void ice_lag_monitor_act_bkup(struct ice_lag *lag, + struct netdev_bonding_info *b_info, + struct net_device *event_netdev) { - struct netdev_notifier_bonding_info *info = ptr; - struct net_device *event_netdev, *event_upper; - struct netdev_bonding_info *bonding_info; struct ice_netdev_priv *event_np; struct ice_pf *pf, *event_pf; u8 prim_port, event_port; - if (!lag->primary) - return; - pf = lag->pf; if (!pf) return; - event_netdev = netdev_notifier_info_to_dev(ptr); - rcu_read_lock(); - event_upper = netdev_master_upper_dev_get_rcu(event_netdev); - rcu_read_unlock(); - if (!netif_is_ice(event_netdev) || event_upper != lag->upper_netdev) - return; - event_np = netdev_priv(event_netdev); event_pf = event_np->vsi->back; event_port = event_pf->hw.port_info->lport; prim_port = pf->hw.port_info->lport; - bonding_info = &info->bonding_info; - - if (!bonding_info->slave.state) { + if (!b_info->slave.state) { /* if no port is currently active, then nodes and filters exist * on primary port, check if we need to move them */ @@ -1501,6 +1911,128 @@ static void ice_lag_monitor_active(struct ice_lag *lag, void *ptr) } } +/** + * ice_lag_aa_clear_spoof - adjust the placeholder VSI spoofing for A/A LAG + * @vsi: placeholder VSI to adjust + */ +static void ice_lag_aa_clear_spoof(struct ice_vsi *vsi) +{ + ice_vsi_update_security(vsi, ice_vsi_ctx_clear_antispoof); +} + +/** + * ice_lag_monitor_act_act - Keep track of active ports in A/A LAG + * @lag: lag struct for primary interface + * @b_info: bonding_info for event + * @event_netdev: net_device for target netdev + */ +static void ice_lag_monitor_act_act(struct ice_lag *lag, + struct netdev_bonding_info *b_info, + struct net_device *event_netdev) +{ + struct ice_netdev_priv *event_np; + u8 prim_port, event_port; + struct ice_pf *event_pf; + + event_np = netdev_priv(event_netdev); + event_pf = event_np->vsi->back; + event_port = event_pf->hw.port_info->lport; + prim_port = lag->pf->hw.port_info->lport; + + if (b_info->slave.link == BOND_LINK_UP) { + /* Port is coming up */ + if (prim_port == event_port) { + /* Processing event for primary interface */ + if (lag->bond_lport_pri == ICE_LAG_INVALID_PORT) + return; + + if (!(lag->port_bitmap & ICE_LAGP_M)) { + /* Primary port was not marked up before, move + * some|all VF queues to it and mark as up + */ + lag->port_bitmap |= ICE_LAGP_M; + ice_lag_aa_failover(lag, ICE_LAGP_IDX, event_pf); + } + } else { + if (lag->bond_lport_sec == ICE_LAG_INVALID_PORT) + return; + + /* Create placeholder VSIs on secondary PF. + * The placeholder is necessary so that we have + * an element that represents the VF on the secondary + * interface's scheduling tree. This will be a tree + * root for scheduling nodes when they are moved to + * the secondary interface. + */ + if (!lag->sec_vf[0]) { + struct ice_vsi_cfg_params params = {}; + struct ice_vsi *nvsi; + struct ice_vf *vf; + unsigned int bkt; + + params.type = ICE_VSI_VF; + params.port_info = event_pf->hw.port_info; + params.flags = ICE_VSI_FLAG_INIT; + + ice_for_each_vf(lag->pf, bkt, vf) { + params.vf = vf; + nvsi = ice_vsi_setup(event_pf, + ¶ms); + ice_lag_aa_clear_spoof(nvsi); + lag->sec_vf[vf->vf_id] = nvsi; + } + } + + if (!(lag->port_bitmap & ICE_LAGS_M)) { + /* Secondary port was not marked up before, + * move some|all VF queues to it and mark as up + */ + lag->port_bitmap |= ICE_LAGS_M; + ice_lag_aa_failover(lag, ICE_LAGS_IDX, event_pf); + } + } + } else { + /* Port is going down */ + if (prim_port == event_port) { + lag->port_bitmap &= ~ICE_LAGP_M; + ice_lag_aa_failover(lag, ICE_LAGS_IDX, event_pf); + } else { + lag->port_bitmap &= ~ICE_LAGS_M; + ice_lag_aa_failover(lag, ICE_LAGP_IDX, event_pf); + } + } +} + +/** + * ice_lag_monitor_info - Calls relevant A/A or A/B monitoring function + * @lag: lag info struct + * @ptr: opaque data containing notifier event + * + * This function is for the primary PF to monitor changes in which port is + * active and handle changes for SRIOV VF functionality + */ +static void ice_lag_monitor_info(struct ice_lag *lag, void *ptr) +{ + struct netdev_notifier_bonding_info *info = ptr; + struct net_device *event_netdev, *event_upper; + struct netdev_bonding_info *bonding_info; + + if (!lag->primary) + return; + + event_netdev = netdev_notifier_info_to_dev(ptr); + bonding_info = &info->bonding_info; + rcu_read_lock(); + event_upper = netdev_master_upper_dev_get_rcu(event_netdev); + rcu_read_unlock(); + if (!netif_is_ice(event_netdev) || event_upper != lag->upper_netdev) + return; + + if (lag->bond_aa) + ice_lag_monitor_act_act(lag, bonding_info, event_netdev); + else + ice_lag_monitor_act_bkup(lag, bonding_info, event_netdev); +} /** * ice_lag_chk_comp - evaluate bonded interface for feature support * @lag: lag info struct @@ -1516,6 +2048,14 @@ ice_lag_chk_comp(struct ice_lag *lag, void *ptr) struct device *dev; int count = 0; + /* All members need to know if bond A/A or A/B */ + bonding_info = &info->bonding_info; + lag->bond_mode = bonding_info->master.bond_mode; + if (lag->bond_mode != BOND_MODE_ACTIVEBACKUP) + lag->bond_aa = true; + else + lag->bond_aa = false; + if (!lag->primary) return true; @@ -1536,12 +2076,9 @@ ice_lag_chk_comp(struct ice_lag *lag, void *ptr) return false; } - bonding_info = &info->bonding_info; - lag->bond_mode = bonding_info->master.bond_mode; - if (lag->bond_mode != BOND_MODE_ACTIVEBACKUP) { - dev_info(dev, "Bond Mode not ACTIVE-BACKUP - VF LAG disabled\n"); + if (lag->bond_aa && !ice_is_feature_supported(lag->pf, + ICE_F_SRIOV_AA_LAG)) return false; - } list_for_each(tmp, lag->netdev_head) { struct ice_dcbx_cfg *dcb_cfg, *peer_dcb_cfg; @@ -1699,6 +2236,26 @@ static void ice_lag_disable_sriov_bond(struct ice_lag *lag) struct ice_pf *pf = np->vsi->back; ice_clear_feature_support(pf, ICE_F_SRIOV_LAG); + ice_clear_feature_support(pf, ICE_F_SRIOV_AA_LAG); +} + +/** + * ice_lag_preset_drop_fltr - preset drop filter for A/B bonds + * @lag: local lag struct + * @ptr: opaque data containing event + * + * Sets the initial drop filter for secondary interface in an + * active-backup bond + */ +static void ice_lag_preset_drop_fltr(struct ice_lag *lag, void *ptr) +{ + struct net_device *netdev = netdev_notifier_info_to_dev(ptr); + + if (netdev != lag->netdev || lag->primary || !lag->need_fltr_cfg) + return; + + ice_lag_cfg_drop_fltr(lag, true); + lag->need_fltr_cfg = false; } /** @@ -1739,10 +2296,12 @@ static void ice_lag_process_event(struct work_struct *work) ice_lag_unregister(lag_work->lag, netdev); goto lag_cleanup; } - ice_lag_monitor_active(lag_work->lag, - &lag_work->info.bonding_info); ice_lag_cfg_pf_fltrs(lag_work->lag, &lag_work->info.bonding_info); + ice_lag_preset_drop_fltr(lag_work->lag, + &lag_work->info.bonding_info); + ice_lag_monitor_info(lag_work->lag, + &lag_work->info.bonding_info); } ice_lag_info_event(lag_work->lag, &lag_work->info.bonding_info); break; @@ -1993,7 +2552,8 @@ ice_lag_move_vf_nodes_tc_sync(struct ice_lag *lag, struct ice_hw *dest_hw, } if (ice_aq_cfg_lan_txq(hw, qbuf, qbuf_size, numq, hw->port_info->lport, - dest_hw->port_info->lport, NULL)) { + dest_hw->port_info->lport, + ICE_AQC_Q_CFG_TC_CHNG, NULL)) { dev_warn(dev, "Failure to configure queues for LAG reset rebuild\n"); goto sync_qerr; } @@ -2089,9 +2649,13 @@ int ice_init_lag(struct ice_pf *pf) lag->netdev = vsi->netdev; lag->role = ICE_LAG_NONE; lag->active_port = ICE_LAG_INVALID_PORT; + lag->port_bitmap = 0x0; lag->bonded = false; + lag->bond_aa = false; + lag->need_fltr_cfg = false; lag->upper_netdev = NULL; lag->notif_block.notifier_call = NULL; + memset(lag->sec_vf, 0, sizeof(lag->sec_vf)); err = ice_register_lag_handler(lag); if (err) { @@ -2109,6 +2673,11 @@ int ice_init_lag(struct ice_pf *pf) if (err) goto free_rcp_res; + err = ice_create_lag_recipe(&pf->hw, &lag->act_act_recipe, + ice_lport_rcp, 1); + if (err) + goto free_lport_res; + /* associate recipes to profiles */ for (n = 0; n < ICE_PROFID_IPV6_GTPU_IPV6_TCP_INNER; n++) { err = ice_aq_get_recipe_to_profile(&pf->hw, n, @@ -2118,7 +2687,8 @@ int ice_init_lag(struct ice_pf *pf) if (recipe_bits & BIT(ICE_SW_LKUP_DFLT)) { recipe_bits |= BIT(lag->pf_recipe) | - BIT(lag->lport_recipe); + BIT(lag->lport_recipe) | + BIT(lag->act_act_recipe); ice_aq_map_recipe_to_profile(&pf->hw, n, recipe_bits, NULL); } @@ -2129,9 +2699,13 @@ int ice_init_lag(struct ice_pf *pf) dev_dbg(dev, "INIT LAG complete\n"); return 0; +free_lport_res: + ice_free_hw_res(&pf->hw, ICE_AQC_RES_TYPE_RECIPE, 1, + &lag->lport_recipe); + free_rcp_res: ice_free_hw_res(&pf->hw, ICE_AQC_RES_TYPE_RECIPE, 1, - &pf->lag->pf_recipe); + &lag->pf_recipe); lag_error: kfree(lag); pf->lag = NULL; @@ -2216,11 +2790,15 @@ void ice_lag_rebuild(struct ice_pf *pf) ice_lag_move_vf_nodes_sync(prim_lag, &pf->hw); } - ice_lag_cfg_cp_fltr(lag, true); + if (!lag->bond_aa) { + ice_lag_cfg_lp_fltr(lag, true, true); + if (lag->pf_rx_rule_id) + if (ice_lag_cfg_dflt_fltr(lag, true)) + dev_err(ice_pf_to_dev(pf), "Error adding default VSI rule in rebuild\n"); + } else { + ice_lag_cfg_lp_fltr(lag, true, false); + } - if (lag->pf_rx_rule_id) - if (ice_lag_cfg_dflt_fltr(lag, true)) - dev_err(ice_pf_to_dev(pf), "Error adding default VSI rule in rebuild\n"); ice_clear_rdma_cap(pf); lag_rebuild_out: diff --git a/drivers/net/ethernet/intel/ice/ice_lag.h b/drivers/net/ethernet/intel/ice/ice_lag.h index 69347d9f986b..e2a0a782bdd7 100644 --- a/drivers/net/ethernet/intel/ice/ice_lag.h +++ b/drivers/net/ethernet/intel/ice/ice_lag.h @@ -14,7 +14,11 @@ enum ice_lag_role { ICE_LAG_UNSET }; -#define ICE_LAG_INVALID_PORT 0xFF +#define ICE_LAG_INVALID_PORT 0xFF +#define ICE_LAGP_IDX 0 +#define ICE_LAGS_IDX 1 +#define ICE_LAGP_M 0x1 +#define ICE_LAGS_M 0x2 #define ICE_LAG_RESET_RETRIES 5 #define ICE_SW_DEFAULT_PROFILE 0 @@ -41,12 +45,26 @@ struct ice_lag { u8 active_port; /* lport value for the current active port */ u8 bonded:1; /* currently bonded */ u8 primary:1; /* this is primary */ + u8 bond_aa:1; /* is this bond active-active */ + u8 need_fltr_cfg:1; /* fltrs for A/A bond still need to be make */ + u8 port_bitmap:2; /* bitmap of active ports */ + u8 bond_lport_pri; /* lport values for primary PF */ + u8 bond_lport_sec; /* lport values for secondary PF */ + + /* q_home keeps track of which interface the q is currently on */ + u8 q_home[ICE_MAX_SRIOV_VFS][ICE_MAX_RSS_QS_PER_VF]; + + /* placeholder VSI for hanging VF queues from on secondary interface */ + struct ice_vsi *sec_vf[ICE_MAX_SRIOV_VFS]; + u16 pf_recipe; u16 lport_recipe; + u16 act_act_recipe; u16 pf_rx_rule_id; u16 pf_tx_rule_id; u16 cp_rule_idx; u16 lport_rule_idx; + u16 act_act_rule_idx; u8 role; }; @@ -65,6 +83,7 @@ struct ice_lag_work { }; void ice_lag_move_new_vf_nodes(struct ice_vf *vf); +void ice_lag_aa_failover(struct ice_lag *lag, u8 dest, struct ice_pf *e_pf); int ice_init_lag(struct ice_pf *pf); void ice_deinit_lag(struct ice_pf *pf); void ice_lag_rebuild(struct ice_pf *pf); diff --git a/drivers/net/ethernet/intel/ice/ice_type.h b/drivers/net/ethernet/intel/ice/ice_type.h index 2b07d5e48847..8d19efc1df72 100644 --- a/drivers/net/ethernet/intel/ice/ice_type.h +++ b/drivers/net/ethernet/intel/ice/ice_type.h @@ -296,6 +296,7 @@ struct ice_hw_common_caps { bool roce_lag; bool sriov_lag; + bool sriov_aa_lag; bool nvm_update_pending_nvm; bool nvm_update_pending_orom; diff --git a/include/linux/net/intel/libie/adminq.h b/include/linux/net/intel/libie/adminq.h index dbe93f940ef0..ba62f703df43 100644 --- a/include/linux/net/intel/libie/adminq.h +++ b/include/linux/net/intel/libie/adminq.h @@ -194,6 +194,7 @@ LIBIE_CHECK_STRUCT_LEN(16, libie_aqc_list_caps); #define LIBIE_AQC_CAPS_FW_LAG_SUPPORT 0x0092 #define LIBIE_AQC_BIT_ROCEV2_LAG BIT(0) #define LIBIE_AQC_BIT_SRIOV_LAG BIT(1) +#define LIBIE_AQC_BIT_SRIOV_AA_LAG BIT(2) #define LIBIE_AQC_CAPS_FLEX10 0x00F1 #define LIBIE_AQC_CAPS_CEM 0x00F2 From 355b82c54c122e59487c52c084a146101bedc2c8 Mon Sep 17 00:00:00 2001 From: Jijie Shao Date: Wed, 13 Aug 2025 20:45:42 +0800 Subject: [PATCH 0083/1536] net: phy: motorcomm: Add support for PHY LEDs on YT8521 Add minimal LED controller driver supporting the most common uses with the 'netdev' trigger. Signed-off-by: Jijie Shao Tested-by: Heiko Stuebner Reviewed-by: Andrew Lunn Link: https://patch.msgid.link/20250813124542.3450447-1-shaojijie@huawei.com Signed-off-by: Jakub Kicinski --- drivers/net/phy/motorcomm.c | 117 ++++++++++++++++++++++++++++++++++++ 1 file changed, 117 insertions(+) diff --git a/drivers/net/phy/motorcomm.c b/drivers/net/phy/motorcomm.c index 0e91f5d1a4fd..a3593e663059 100644 --- a/drivers/net/phy/motorcomm.c +++ b/drivers/net/phy/motorcomm.c @@ -213,6 +213,20 @@ #define YT8521_RC1R_RGMII_2_100_NS 14 #define YT8521_RC1R_RGMII_2_250_NS 15 +/* LED CONFIG */ +#define YT8521_MAX_LEDS 3 +#define YT8521_LED0_CFG_REG 0xA00C +#define YT8521_LED1_CFG_REG 0xA00D +#define YT8521_LED2_CFG_REG 0xA00E +#define YT8521_LED_ACT_BLK_IND BIT(13) +#define YT8521_LED_FDX_ON_EN BIT(12) +#define YT8521_LED_HDX_ON_EN BIT(11) +#define YT8521_LED_TXACT_BLK_EN BIT(10) +#define YT8521_LED_RXACT_BLK_EN BIT(9) +#define YT8521_LED_1000_ON_EN BIT(6) +#define YT8521_LED_100_ON_EN BIT(5) +#define YT8521_LED_10_ON_EN BIT(4) + #define YTPHY_MISC_CONFIG_REG 0xA006 #define YTPHY_MCR_FIBER_SPEED_MASK BIT(0) #define YTPHY_MCR_FIBER_1000BX (0x1 << 0) @@ -1681,6 +1695,106 @@ err_restore_page: return phy_restore_page(phydev, old_page, ret); } +static const unsigned long supported_trgs = (BIT(TRIGGER_NETDEV_FULL_DUPLEX) | + BIT(TRIGGER_NETDEV_HALF_DUPLEX) | + BIT(TRIGGER_NETDEV_LINK) | + BIT(TRIGGER_NETDEV_LINK_10) | + BIT(TRIGGER_NETDEV_LINK_100) | + BIT(TRIGGER_NETDEV_LINK_1000) | + BIT(TRIGGER_NETDEV_RX) | + BIT(TRIGGER_NETDEV_TX)); + +static int yt8521_led_hw_is_supported(struct phy_device *phydev, u8 index, + unsigned long rules) +{ + if (index >= YT8521_MAX_LEDS) + return -EINVAL; + + /* All combinations of the supported triggers are allowed */ + if (rules & ~supported_trgs) + return -EOPNOTSUPP; + + return 0; +} + +static int yt8521_led_hw_control_set(struct phy_device *phydev, u8 index, + unsigned long rules) +{ + u16 val = 0; + + if (index >= YT8521_MAX_LEDS) + return -EINVAL; + + if (test_bit(TRIGGER_NETDEV_LINK, &rules)) { + val |= YT8521_LED_10_ON_EN; + val |= YT8521_LED_100_ON_EN; + val |= YT8521_LED_1000_ON_EN; + } + + if (test_bit(TRIGGER_NETDEV_LINK_10, &rules)) + val |= YT8521_LED_10_ON_EN; + + if (test_bit(TRIGGER_NETDEV_LINK_100, &rules)) + val |= YT8521_LED_100_ON_EN; + + if (test_bit(TRIGGER_NETDEV_LINK_1000, &rules)) + val |= YT8521_LED_1000_ON_EN; + + if (test_bit(TRIGGER_NETDEV_FULL_DUPLEX, &rules)) + val |= YT8521_LED_HDX_ON_EN; + + if (test_bit(TRIGGER_NETDEV_HALF_DUPLEX, &rules)) + val |= YT8521_LED_FDX_ON_EN; + + if (test_bit(TRIGGER_NETDEV_TX, &rules) || + test_bit(TRIGGER_NETDEV_RX, &rules)) + val |= YT8521_LED_ACT_BLK_IND; + + if (test_bit(TRIGGER_NETDEV_TX, &rules)) + val |= YT8521_LED_TXACT_BLK_EN; + + if (test_bit(TRIGGER_NETDEV_RX, &rules)) + val |= YT8521_LED_RXACT_BLK_EN; + + return ytphy_write_ext(phydev, YT8521_LED0_CFG_REG + index, val); +} + +static int yt8521_led_hw_control_get(struct phy_device *phydev, u8 index, + unsigned long *rules) +{ + int val; + + if (index >= YT8521_MAX_LEDS) + return -EINVAL; + + val = ytphy_read_ext(phydev, YT8521_LED0_CFG_REG + index); + if (val < 0) + return val; + + if (val & YT8521_LED_TXACT_BLK_EN || val & YT8521_LED_ACT_BLK_IND) + __set_bit(TRIGGER_NETDEV_TX, rules); + + if (val & YT8521_LED_RXACT_BLK_EN || val & YT8521_LED_ACT_BLK_IND) + __set_bit(TRIGGER_NETDEV_RX, rules); + + if (val & YT8521_LED_FDX_ON_EN) + __set_bit(TRIGGER_NETDEV_FULL_DUPLEX, rules); + + if (val & YT8521_LED_HDX_ON_EN) + __set_bit(TRIGGER_NETDEV_HALF_DUPLEX, rules); + + if (val & YT8521_LED_1000_ON_EN) + __set_bit(TRIGGER_NETDEV_LINK_1000, rules); + + if (val & YT8521_LED_100_ON_EN) + __set_bit(TRIGGER_NETDEV_LINK_100, rules); + + if (val & YT8521_LED_10_ON_EN) + __set_bit(TRIGGER_NETDEV_LINK_10, rules); + + return 0; +} + static int yt8531_config_init(struct phy_device *phydev) { struct device_node *node = phydev->mdio.dev.of_node; @@ -2920,6 +3034,9 @@ static struct phy_driver motorcomm_phy_drvs[] = { .soft_reset = yt8521_soft_reset, .suspend = yt8521_suspend, .resume = yt8521_resume, + .led_hw_is_supported = yt8521_led_hw_is_supported, + .led_hw_control_set = yt8521_led_hw_control_set, + .led_hw_control_get = yt8521_led_hw_control_get, }, { PHY_ID_MATCH_EXACT(PHY_ID_YT8531), From 34167f1a024d2c5abae0b0325a6d0b8257160f86 Mon Sep 17 00:00:00 2001 From: Markus Stockhausen Date: Wed, 13 Aug 2025 01:44:07 -0400 Subject: [PATCH 0084/1536] net: phy: realtek: convert RTL8226-CG to c45 only Short: Convert the RTL8226-CG to c45 so it can be used in its Realtek based ecosystems. Long: The RTL8226-CG can be mainly found on devices of the Realtek Otto switch platform. Devices like the Zyxel XGS1210-12 are based on it. These implement a hardware based phy polling in the background to update SoC status registers. The hardware provides 4 smi busses where phys are attached to. For each bus one can decide if it is polled in c45 or c22 mode. See https://svanheule.net/realtek/longan/register/smi_glb_ctrl With this setting the register access will be limited by the hardware. This is very complex (including caching and special c45-over-c22 handling). But basically it boils down to "enable protocol x and SoC will disable register access via protocol y". Mainline already gained support for the rtl9300 mdio driver in commit 24e31e474769 ("net: mdio: Add RTL9300 MDIO driver"). It covers the basic features, but a lot effort is still needed to understand hardware properly. So it runs a simple setup by selecting the proper bus mode during startup. /* Put the interfaces into C45 mode if required */ glb_ctrl_mask = GENMASK(19, 16); for (i = 0; i < MAX_SMI_BUSSES; i++) if (priv->smi_bus_is_c45[i]) glb_ctrl_val |= GLB_CTRL_INTF_SEL(i); ... err = regmap_update_bits(regmap, SMI_GLB_CTRL, glb_ctrl_mask, glb_ctrl_val); To avoid complex coding later on, it limits access by only providing either c22 or c45: bus->name = "Realtek Switch MDIO Bus"; if (priv->smi_bus_is_c45[mdio_bus]) { bus->read_c45 = rtl9300_mdio_read_c45; bus->write_c45 = rtl9300_mdio_write_c45; } else { bus->read = rtl9300_mdio_read_c22; bus->write = rtl9300_mdio_write_c22; } Because of these limitations the existing RTL8226 phy driver is not working at all on Realtek switches. Convert the driver to c45-only. Luckily the RTL8226 seems to support proper MDIO_PMA_EXTABLE flags. So standard function genphy_c45_pma_read_abilities() can call genphy_c45_pma_read_ext_abilities() and 10/100/1000 is populated right. Thus conversion is straight forward. Outputs before - REMARK: For this a "hacked" bus was used that toggles the mode for each c22/c45 access. But that is slow and produces unstable data in the SoC status registers). Settings for lan9: Supported ports: [ TP MII ] Supported link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Full 2500baseT/Full Supported pause frame use: Symmetric Receive-only Supports auto-negotiation: Yes Supported FEC modes: Not reported Advertised link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Full 2500baseT/Full Advertised pause frame use: Symmetric Receive-only Advertised auto-negotiation: Yes Advertised FEC modes: Not reported Speed: Unknown! Duplex: Unknown! (255) Port: Twisted Pair PHYAD: 24 Transceiver: external Auto-negotiation: on MDI-X: Unknown Supports Wake-on: d Wake-on: d Link detected: no Outputs with this commit: Settings for lan9: Supported ports: [ TP ] Supported link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Full 2500baseT/Full Supported pause frame use: Symmetric Receive-only Supports auto-negotiation: Yes Supported FEC modes: Not reported Advertised link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Full 2500baseT/Full Advertised pause frame use: Symmetric Receive-only Advertised auto-negotiation: Yes Advertised FEC modes: Not reported Speed: Unknown! Duplex: Unknown! (255) Port: Twisted Pair PHYAD: 24 Transceiver: external Auto-negotiation: on MDI-X: Unknown Supports Wake-on: d Wake-on: d Link detected: no Signed-off-by: Markus Stockhausen Link: https://patch.msgid.link/20250813054407.1108285-1-markus.stockhausen@gmx.de Signed-off-by: Jakub Kicinski --- drivers/net/phy/realtek/realtek_main.c | 28 +++++++++++++++++++------- 1 file changed, 21 insertions(+), 7 deletions(-) diff --git a/drivers/net/phy/realtek/realtek_main.c b/drivers/net/phy/realtek/realtek_main.c index dd0d675149ad..a7541899e327 100644 --- a/drivers/net/phy/realtek/realtek_main.c +++ b/drivers/net/phy/realtek/realtek_main.c @@ -1280,6 +1280,21 @@ static int rtl822x_c45_read_status(struct phy_device *phydev) return 0; } +static int rtl822x_c45_soft_reset(struct phy_device *phydev) +{ + int ret, val; + + ret = phy_modify_mmd(phydev, MDIO_MMD_PMAPMD, MDIO_CTRL1, + MDIO_CTRL1_RESET, MDIO_CTRL1_RESET); + if (ret < 0) + return ret; + + return phy_read_mmd_poll_timeout(phydev, MDIO_MMD_PMAPMD, + MDIO_CTRL1, val, + !(val & MDIO_CTRL1_RESET), + 5000, 100000, true); +} + static int rtl822xb_c45_read_status(struct phy_device *phydev) { int ret; @@ -1675,13 +1690,12 @@ static struct phy_driver realtek_drvs[] = { }, { PHY_ID_MATCH_EXACT(0x001cc838), .name = "RTL8226-CG 2.5Gbps PHY", - .get_features = rtl822x_get_features, - .config_aneg = rtl822x_config_aneg, - .read_status = rtl822x_read_status, - .suspend = genphy_suspend, - .resume = rtlgen_resume, - .read_page = rtl821x_read_page, - .write_page = rtl821x_write_page, + .soft_reset = rtl822x_c45_soft_reset, + .get_features = rtl822x_c45_get_features, + .config_aneg = rtl822x_c45_config_aneg, + .read_status = rtl822x_c45_read_status, + .suspend = genphy_c45_pma_suspend, + .resume = rtlgen_c45_resume, }, { PHY_ID_MATCH_EXACT(0x001cc848), .name = "RTL8226B-CG_RTL8221B-CG 2.5Gbps PHY", From c6f68f69416d0950965e5744489382ccebdc72b4 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Wed, 13 Aug 2025 08:51:22 +0300 Subject: [PATCH 0085/1536] nfc: pn533: Delete an unnecessary check The "rc" variable is set like this: if (IS_ERR(resp)) { rc = PTR_ERR(resp); We know that "rc" is negative so there is no need to check. Signed-off-by: Dan Carpenter Reviewed-by: Krzysztof Kozlowski Link: https://patch.msgid.link/aJwn2ox5g9WsD2Vx@stanley.mountain Signed-off-by: Jakub Kicinski --- drivers/nfc/pn533/pn533.c | 12 ++++-------- 1 file changed, 4 insertions(+), 8 deletions(-) diff --git a/drivers/nfc/pn533/pn533.c b/drivers/nfc/pn533/pn533.c index 14661249c690..2b043a9f9533 100644 --- a/drivers/nfc/pn533/pn533.c +++ b/drivers/nfc/pn533/pn533.c @@ -1412,11 +1412,9 @@ static int pn533_autopoll_complete(struct pn533 *dev, void *arg, if (dev->poll_mod_count != 0) return rc; goto stop_poll; - } else if (rc < 0) { - nfc_err(dev->dev, - "Error %d when running autopoll\n", rc); - goto stop_poll; } + nfc_err(dev->dev, "Error %d when running autopoll\n", rc); + goto stop_poll; } nbtg = resp->data[0]; @@ -1505,11 +1503,9 @@ static int pn533_poll_complete(struct pn533 *dev, void *arg, if (dev->poll_mod_count != 0) return rc; goto stop_poll; - } else if (rc < 0) { - nfc_err(dev->dev, - "Error %d when running poll\n", rc); - goto stop_poll; } + nfc_err(dev->dev, "Error %d when running poll\n", rc); + goto stop_poll; } cur_mod = dev->poll_mod_active[dev->poll_mod_curr]; From 0ebc0bcd0aa0037019aac996c50166c7baf44ff8 Mon Sep 17 00:00:00 2001 From: Parav Pandit Date: Wed, 13 Aug 2025 12:44:16 +0300 Subject: [PATCH 0086/1536] devlink/port: Simplify return checks Drop always returning 0 from the helper routine and simplify its callers. Reviewed-by: Jiri Pirko Signed-off-by: Parav Pandit Link: https://patch.msgid.link/20250813094417.7269-2-parav@nvidia.com Signed-off-by: Jakub Kicinski --- net/devlink/port.c | 29 ++++++----------------------- 1 file changed, 6 insertions(+), 23 deletions(-) diff --git a/net/devlink/port.c b/net/devlink/port.c index cb8d4df61619..aaca1b23aa5f 100644 --- a/net/devlink/port.c +++ b/net/devlink/port.c @@ -1333,8 +1333,8 @@ int devlink_port_netdevice_event(struct notifier_block *nb, return NOTIFY_OK; } -static int __devlink_port_attrs_set(struct devlink_port *devlink_port, - enum devlink_port_flavour flavour) +static void __devlink_port_attrs_set(struct devlink_port *devlink_port, + enum devlink_port_flavour flavour) { struct devlink_port_attrs *attrs = &devlink_port->attrs; @@ -1347,7 +1347,6 @@ static int __devlink_port_attrs_set(struct devlink_port *devlink_port, } else { devlink_port->switch_port = false; } - return 0; } /** @@ -1359,14 +1358,10 @@ static int __devlink_port_attrs_set(struct devlink_port *devlink_port, void devlink_port_attrs_set(struct devlink_port *devlink_port, struct devlink_port_attrs *attrs) { - int ret; - ASSERT_DEVLINK_PORT_NOT_REGISTERED(devlink_port); devlink_port->attrs = *attrs; - ret = __devlink_port_attrs_set(devlink_port, attrs->flavour); - if (ret) - return; + __devlink_port_attrs_set(devlink_port, attrs->flavour); WARN_ON(attrs->splittable && attrs->split); } EXPORT_SYMBOL_GPL(devlink_port_attrs_set); @@ -1383,14 +1378,10 @@ void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port, u32 contro u16 pf, bool external) { struct devlink_port_attrs *attrs = &devlink_port->attrs; - int ret; ASSERT_DEVLINK_PORT_NOT_REGISTERED(devlink_port); - ret = __devlink_port_attrs_set(devlink_port, - DEVLINK_PORT_FLAVOUR_PCI_PF); - if (ret) - return; + __devlink_port_attrs_set(devlink_port, DEVLINK_PORT_FLAVOUR_PCI_PF); attrs->pci_pf.controller = controller; attrs->pci_pf.pf = pf; attrs->pci_pf.external = external; @@ -1411,14 +1402,10 @@ void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port, u32 contro u16 pf, u16 vf, bool external) { struct devlink_port_attrs *attrs = &devlink_port->attrs; - int ret; ASSERT_DEVLINK_PORT_NOT_REGISTERED(devlink_port); - ret = __devlink_port_attrs_set(devlink_port, - DEVLINK_PORT_FLAVOUR_PCI_VF); - if (ret) - return; + __devlink_port_attrs_set(devlink_port, DEVLINK_PORT_FLAVOUR_PCI_VF); attrs->pci_vf.controller = controller; attrs->pci_vf.pf = pf; attrs->pci_vf.vf = vf; @@ -1439,14 +1426,10 @@ void devlink_port_attrs_pci_sf_set(struct devlink_port *devlink_port, u32 contro u16 pf, u32 sf, bool external) { struct devlink_port_attrs *attrs = &devlink_port->attrs; - int ret; ASSERT_DEVLINK_PORT_NOT_REGISTERED(devlink_port); - ret = __devlink_port_attrs_set(devlink_port, - DEVLINK_PORT_FLAVOUR_PCI_SF); - if (ret) - return; + __devlink_port_attrs_set(devlink_port, DEVLINK_PORT_FLAVOUR_PCI_SF); attrs->pci_sf.controller = controller; attrs->pci_sf.pf = pf; attrs->pci_sf.sf = sf; From 41a6e8ab18642741437da932c2f5762b185e928c Mon Sep 17 00:00:00 2001 From: Parav Pandit Date: Wed, 13 Aug 2025 12:44:17 +0300 Subject: [PATCH 0087/1536] devlink/port: Check attributes early and constify Constify the devlink port attributes to indicate they are read only and does not depend on anything else. Therefore, validate it early before setting in the devlink port. Reviewed-by: Jiri Pirko Signed-off-by: Parav Pandit Link: https://patch.msgid.link/20250813094417.7269-3-parav@nvidia.com Signed-off-by: Jakub Kicinski --- include/net/devlink.h | 2 +- net/devlink/port.c | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/include/net/devlink.h b/include/net/devlink.h index b32c9ceeb81d..3119d053bc4d 100644 --- a/include/net/devlink.h +++ b/include/net/devlink.h @@ -1743,7 +1743,7 @@ void devlink_port_type_ib_set(struct devlink_port *devlink_port, struct ib_device *ibdev); void devlink_port_type_clear(struct devlink_port *devlink_port); void devlink_port_attrs_set(struct devlink_port *devlink_port, - struct devlink_port_attrs *devlink_port_attrs); + const struct devlink_port_attrs *attrs); void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port, u32 controller, u16 pf, bool external); void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port, u32 controller, diff --git a/net/devlink/port.c b/net/devlink/port.c index aaca1b23aa5f..93d8a25bb920 100644 --- a/net/devlink/port.c +++ b/net/devlink/port.c @@ -1356,13 +1356,13 @@ static void __devlink_port_attrs_set(struct devlink_port *devlink_port, * @attrs: devlink port attrs */ void devlink_port_attrs_set(struct devlink_port *devlink_port, - struct devlink_port_attrs *attrs) + const struct devlink_port_attrs *attrs) { ASSERT_DEVLINK_PORT_NOT_REGISTERED(devlink_port); + WARN_ON(attrs->splittable && attrs->split); devlink_port->attrs = *attrs; __devlink_port_attrs_set(devlink_port, attrs->flavour); - WARN_ON(attrs->splittable && attrs->split); } EXPORT_SYMBOL_GPL(devlink_port_attrs_set); From 4b6dc4c891cc270699c3214557621df7df7c05cc Mon Sep 17 00:00:00 2001 From: Liao Yuanhong Date: Wed, 13 Aug 2025 17:50:24 +0800 Subject: [PATCH 0088/1536] ptp: ptp_clockmatrix: Remove redundant semicolons Remove unnecessary semicolons. Signed-off-by: Liao Yuanhong Link: https://patch.msgid.link/20250813095024.559085-1-liaoyuanhong@vivo.com Signed-off-by: Jakub Kicinski --- drivers/ptp/ptp_clockmatrix.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/ptp/ptp_clockmatrix.c b/drivers/ptp/ptp_clockmatrix.c index b8d4df8c6da2..59cd6bbb33f3 100644 --- a/drivers/ptp/ptp_clockmatrix.c +++ b/drivers/ptp/ptp_clockmatrix.c @@ -1161,7 +1161,7 @@ static int set_pll_output_mask(struct idtcm *idtcm, u16 addr, u8 val) SET_U16_MSB(idtcm->channel[3].output_mask, val); break; default: - err = -EFAULT; /* Bad address */; + err = -EFAULT; /* Bad address */ break; } From eeea7688632e0c697f66bdd71708c8faf36f6540 Mon Sep 17 00:00:00 2001 From: Yue Haibing Date: Wed, 13 Aug 2025 20:55:26 +0800 Subject: [PATCH 0089/1536] net/sched: Use TC_RTAB_SIZE instead of magic number Replace magic number with TC_RTAB_SIZE to make it more informative. Signed-off-by: Yue Haibing Link: https://patch.msgid.link/20250813125526.853895-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski --- net/sched/sch_api.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c index d7c767b861a4..1e058b46d3e1 100644 --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -431,7 +431,7 @@ struct qdisc_rate_table *qdisc_get_rtab(struct tc_ratespec *r, for (rtab = qdisc_rtab_list; rtab; rtab = rtab->next) { if (!memcmp(&rtab->rate, r, sizeof(struct tc_ratespec)) && - !memcmp(&rtab->data, nla_data(tab), 1024)) { + !memcmp(&rtab->data, nla_data(tab), TC_RTAB_SIZE)) { rtab->refcnt++; return rtab; } @@ -441,7 +441,7 @@ struct qdisc_rate_table *qdisc_get_rtab(struct tc_ratespec *r, if (rtab) { rtab->rate = *r; rtab->refcnt = 1; - memcpy(rtab->data, nla_data(tab), 1024); + memcpy(rtab->data, nla_data(tab), TC_RTAB_SIZE); if (r->linklayer == TC_LINKLAYER_UNAWARE) r->linklayer = __detect_linklayer(r, rtab->data); rtab->next = qdisc_rtab_list; From 20e1b75b38fd51ad7fb215943cd80fd78d1e767b Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Wed, 13 Aug 2025 21:10:23 +0300 Subject: [PATCH 0090/1536] net: dsa: realtek: remove unnecessary file, dentry, inode declarations These are present since commit d8652956cf37 ("net: dsa: realtek-smi: Add Realtek SMI driver") and never needed. Apparently the driver was not cleaned up sufficiently for submission. Signed-off-by: Vladimir Oltean Reviewed-by: Linus Walleij Link: https://patch.msgid.link/20250813181023.808528-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski --- drivers/net/dsa/realtek/realtek.h | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/net/dsa/realtek/realtek.h b/drivers/net/dsa/realtek/realtek.h index a1b2e0b529d5..c03485a80d93 100644 --- a/drivers/net/dsa/realtek/realtek.h +++ b/drivers/net/dsa/realtek/realtek.h @@ -19,9 +19,6 @@ struct phylink_mac_ops; struct realtek_ops; -struct dentry; -struct inode; -struct file; struct rtl8366_mib_counter { unsigned int base; From df979273bd716a93ca9ffa8f84aeb205c9bf2ab6 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Wed, 13 Aug 2025 10:44:54 +0300 Subject: [PATCH 0091/1536] net: phy: mscc: report and configure in-band auto-negotiation for SGMII/QSGMII The following Vitesse/Microsemi/Microchip PHYs, among those supported by this driver, have the host interface configurable as SGMII or QSGMII: - VSC8504 - VSC8514 - VSC8552 - VSC8562 - VSC8572 - VSC8574 - VSC8575 - VSC8582 - VSC8584 All these PHYs are documented to have bit 7 of "MAC SerDes PCS Control" as "MAC SerDes ANEG enable". Out of these, I could test the VSC8514 quad PHY in QSGMII. This works both with the in-band autoneg on and off, on the NXP LS1028A-RDB and T1040-RDB boards. Notably, the bit is sticky (survives soft resets), so giving Linux the tools to read and modify this settings makes it robust to changes made to it by previous boot layers (U-Boot). Signed-off-by: Vladimir Oltean Reviewed-by: Russell King (Oracle) Link: https://patch.msgid.link/20250813074454.63224-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski --- drivers/net/phy/mscc/mscc.h | 3 +++ drivers/net/phy/mscc/mscc_main.c | 40 ++++++++++++++++++++++++++++++++ 2 files changed, 43 insertions(+) diff --git a/drivers/net/phy/mscc/mscc.h b/drivers/net/phy/mscc/mscc.h index 6a3d8a754eb8..138355f1ab0b 100644 --- a/drivers/net/phy/mscc/mscc.h +++ b/drivers/net/phy/mscc/mscc.h @@ -196,6 +196,9 @@ enum rgmii_clock_delay { #define MSCC_PHY_EXTENDED_INT_MS_EGR BIT(9) /* Extended Page 3 Registers */ +#define MSCC_PHY_SERDES_PCS_CTRL 16 +#define MSCC_PHY_SERDES_ANEG BIT(7) + #define MSCC_PHY_SERDES_TX_VALID_CNT 21 #define MSCC_PHY_SERDES_TX_CRC_ERR_CNT 22 #define MSCC_PHY_SERDES_RX_VALID_CNT 28 diff --git a/drivers/net/phy/mscc/mscc_main.c b/drivers/net/phy/mscc/mscc_main.c index 7ed6522fb0ef..3bcf48febb1f 100644 --- a/drivers/net/phy/mscc/mscc_main.c +++ b/drivers/net/phy/mscc/mscc_main.c @@ -2202,6 +2202,28 @@ static int vsc85xx_read_status(struct phy_device *phydev) return genphy_read_status(phydev); } +static unsigned int vsc85xx_inband_caps(struct phy_device *phydev, + phy_interface_t interface) +{ + if (interface != PHY_INTERFACE_MODE_SGMII && + interface != PHY_INTERFACE_MODE_QSGMII) + return 0; + + return LINK_INBAND_DISABLE | LINK_INBAND_ENABLE; +} + +static int vsc85xx_config_inband(struct phy_device *phydev, unsigned int modes) +{ + u16 reg_val = 0; + + if (modes == LINK_INBAND_ENABLE) + reg_val = MSCC_PHY_SERDES_ANEG; + + return phy_modify_paged(phydev, MSCC_PHY_PAGE_EXTENDED_3, + MSCC_PHY_SERDES_PCS_CTRL, MSCC_PHY_SERDES_ANEG, + reg_val); +} + static int vsc8514_probe(struct phy_device *phydev) { struct vsc8531_private *vsc8531; @@ -2409,6 +2431,8 @@ static struct phy_driver vsc85xx_driver[] = { .get_sset_count = &vsc85xx_get_sset_count, .get_strings = &vsc85xx_get_strings, .get_stats = &vsc85xx_get_stats, + .inband_caps = vsc85xx_inband_caps, + .config_inband = vsc85xx_config_inband, }, { .phy_id = PHY_ID_VSC8514, @@ -2432,6 +2456,8 @@ static struct phy_driver vsc85xx_driver[] = { .get_sset_count = &vsc85xx_get_sset_count, .get_strings = &vsc85xx_get_strings, .get_stats = &vsc85xx_get_stats, + .inband_caps = vsc85xx_inband_caps, + .config_inband = vsc85xx_config_inband, }, { .phy_id = PHY_ID_VSC8530, @@ -2552,6 +2578,8 @@ static struct phy_driver vsc85xx_driver[] = { .get_sset_count = &vsc85xx_get_sset_count, .get_strings = &vsc85xx_get_strings, .get_stats = &vsc85xx_get_stats, + .inband_caps = vsc85xx_inband_caps, + .config_inband = vsc85xx_config_inband, }, { .phy_id = PHY_ID_VSC856X, @@ -2574,6 +2602,8 @@ static struct phy_driver vsc85xx_driver[] = { .get_sset_count = &vsc85xx_get_sset_count, .get_strings = &vsc85xx_get_strings, .get_stats = &vsc85xx_get_stats, + .inband_caps = vsc85xx_inband_caps, + .config_inband = vsc85xx_config_inband, }, { .phy_id = PHY_ID_VSC8572, @@ -2599,6 +2629,8 @@ static struct phy_driver vsc85xx_driver[] = { .get_sset_count = &vsc85xx_get_sset_count, .get_strings = &vsc85xx_get_strings, .get_stats = &vsc85xx_get_stats, + .inband_caps = vsc85xx_inband_caps, + .config_inband = vsc85xx_config_inband, }, { .phy_id = PHY_ID_VSC8574, @@ -2624,6 +2656,8 @@ static struct phy_driver vsc85xx_driver[] = { .get_sset_count = &vsc85xx_get_sset_count, .get_strings = &vsc85xx_get_strings, .get_stats = &vsc85xx_get_stats, + .inband_caps = vsc85xx_inband_caps, + .config_inband = vsc85xx_config_inband, }, { .phy_id = PHY_ID_VSC8575, @@ -2647,6 +2681,8 @@ static struct phy_driver vsc85xx_driver[] = { .get_sset_count = &vsc85xx_get_sset_count, .get_strings = &vsc85xx_get_strings, .get_stats = &vsc85xx_get_stats, + .inband_caps = vsc85xx_inband_caps, + .config_inband = vsc85xx_config_inband, }, { .phy_id = PHY_ID_VSC8582, @@ -2670,6 +2706,8 @@ static struct phy_driver vsc85xx_driver[] = { .get_sset_count = &vsc85xx_get_sset_count, .get_strings = &vsc85xx_get_strings, .get_stats = &vsc85xx_get_stats, + .inband_caps = vsc85xx_inband_caps, + .config_inband = vsc85xx_config_inband, }, { .phy_id = PHY_ID_VSC8584, @@ -2694,6 +2732,8 @@ static struct phy_driver vsc85xx_driver[] = { .get_strings = &vsc85xx_get_strings, .get_stats = &vsc85xx_get_stats, .link_change_notify = &vsc85xx_link_change_notify, + .inband_caps = vsc85xx_inband_caps, + .config_inband = vsc85xx_config_inband, } }; From f09fc24dd9a5ec989dfdde7090624924ede6ddc7 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Tue, 12 Aug 2025 07:20:54 -0700 Subject: [PATCH 0092/1536] selftests: drv-net: wait for carrier On fast machines the tests run in quick succession so even when tests clean up after themselves the carrier may need some time to come back. Specifically in NIPA when ping.py runs right after netpoll_basic.py the first ping command fails. Since the context manager callbacks are now common NetDrvEpEnv gets an ip link up call as well. Reviewed-by: Joe Damato Link: https://patch.msgid.link/20250812142054.750282-1-kuba@kernel.org Signed-off-by: Jakub Kicinski --- .../selftests/drivers/net/lib/py/__init__.py | 2 +- .../selftests/drivers/net/lib/py/env.py | 41 +++++++++---------- tools/testing/selftests/net/lib/py/utils.py | 18 ++++++++ 3 files changed, 39 insertions(+), 22 deletions(-) diff --git a/tools/testing/selftests/drivers/net/lib/py/__init__.py b/tools/testing/selftests/drivers/net/lib/py/__init__.py index 8711c67ad658..a07b56a75c8a 100644 --- a/tools/testing/selftests/drivers/net/lib/py/__init__.py +++ b/tools/testing/selftests/drivers/net/lib/py/__init__.py @@ -15,7 +15,7 @@ try: NlError, RtnlFamily, DevlinkFamily from net.lib.py import CmdExitFailure from net.lib.py import bkg, cmd, bpftool, bpftrace, defer, ethtool, \ - fd_read_timeout, ip, rand_port, tool, wait_port_listen + fd_read_timeout, ip, rand_port, tool, wait_port_listen, wait_file from net.lib.py import fd_read_timeout from net.lib.py import KsftSkipEx, KsftFailEx, KsftXfailEx from net.lib.py import ksft_disruptive, ksft_exit, ksft_pr, ksft_run, \ diff --git a/tools/testing/selftests/drivers/net/lib/py/env.py b/tools/testing/selftests/drivers/net/lib/py/env.py index 1b8bd648048f..c1f3b608c6d8 100644 --- a/tools/testing/selftests/drivers/net/lib/py/env.py +++ b/tools/testing/selftests/drivers/net/lib/py/env.py @@ -4,7 +4,7 @@ import os import time from pathlib import Path from lib.py import KsftSkipEx, KsftXfailEx -from lib.py import ksft_setup +from lib.py import ksft_setup, wait_file from lib.py import cmd, ethtool, ip, CmdExitFailure from lib.py import NetNS, NetdevSimDev from .remote import Remote @@ -25,6 +25,9 @@ class NetDrvEnvBase: self.env = self._load_env_file() + # Following attrs must be set be inheriting classes + self.dev = None + def _load_env_file(self): env = os.environ.copy() @@ -48,6 +51,22 @@ class NetDrvEnvBase: env[pair[0]] = pair[1] return ksft_setup(env) + def __del__(self): + pass + + def __enter__(self): + ip(f"link set dev {self.dev['ifname']} up") + wait_file(f"/sys/class/net/{self.dev['ifname']}/carrier", + lambda x: x.strip() == "1") + + return self + + def __exit__(self, ex_type, ex_value, ex_tb): + """ + __exit__ gets called at the end of a "with" block. + """ + self.__del__() + class NetDrvEnv(NetDrvEnvBase): """ @@ -72,17 +91,6 @@ class NetDrvEnv(NetDrvEnvBase): self.ifname = self.dev['ifname'] self.ifindex = self.dev['ifindex'] - def __enter__(self): - ip(f"link set dev {self.dev['ifname']} up") - - return self - - def __exit__(self, ex_type, ex_value, ex_tb): - """ - __exit__ gets called at the end of a "with" block. - """ - self.__del__() - def __del__(self): if self._ns: self._ns.remove() @@ -219,15 +227,6 @@ class NetDrvEpEnv(NetDrvEnvBase): raise Exception("Can't resolve remote interface name, multiple interfaces match") return v6[0]["ifname"] if v6 else v4[0]["ifname"] - def __enter__(self): - return self - - def __exit__(self, ex_type, ex_value, ex_tb): - """ - __exit__ gets called at the end of a "with" block. - """ - self.__del__() - def __del__(self): if self._ns: self._ns.remove() diff --git a/tools/testing/selftests/net/lib/py/utils.py b/tools/testing/selftests/net/lib/py/utils.py index 4ac9249c85ab..b188cac49738 100644 --- a/tools/testing/selftests/net/lib/py/utils.py +++ b/tools/testing/selftests/net/lib/py/utils.py @@ -252,3 +252,21 @@ def wait_port_listen(port, proto="tcp", ns=None, host=None, sleep=0.005, deadlin if time.monotonic() > end: raise Exception("Waiting for port listen timed out") time.sleep(sleep) + + +def wait_file(fname, test_fn, sleep=0.005, deadline=5, encoding='utf-8'): + """ + Wait for file contents on the local system to satisfy a condition. + test_fn() should take one argument (file contents) and return whether + condition is met. + """ + end = time.monotonic() + deadline + + with open(fname, "r", encoding=encoding) as fp: + while True: + if test_fn(fp.read()): + break + fp.seek(0) + if time.monotonic() > end: + raise TimeoutError("Wait for file contents failed", fname) + time.sleep(sleep) From 3d05b24429e1de7a17c8fdccb04a04dbc8ad297b Mon Sep 17 00:00:00 2001 From: Ido Schimmel Date: Tue, 12 Aug 2025 11:02:12 +0300 Subject: [PATCH 0093/1536] bridge: Redirect to backup port when port is administratively down If a backup port is configured for a bridge port, the bridge will redirect known unicast traffic towards the backup port when the primary port is administratively up but without a carrier. This is useful, for example, in MLAG configurations where a system is connected to two switches and there is a peer link between both switches. The peer link serves as the backup port in case one of the switches loses its connection to the multi-homed system. In order to avoid flooding when the primary port loses its carrier, the bridge does not flush dynamic FDB entries pointing to the port upon STP disablement, if the port has a backup port. The above means that known unicast traffic destined to the primary port will be blackholed when the port is put administratively down, until the FDB entries pointing to it are aged-out. Given that the current behavior is quite weird and unlikely to be depended on by anyone, amend the bridge to redirect to the backup port also when the primary port is administratively down and not only when it does not have a carrier. The change is motivated by a report from a user who expected traffic to be redirected to the backup port when the primary port was put administratively down while debugging a network issue. Reviewed-by: Petr Machata Signed-off-by: Ido Schimmel Acked-by: Nikolay Aleksandrov Link: https://patch.msgid.link/20250812080213.325298-2-idosch@nvidia.com Signed-off-by: Jakub Kicinski --- net/bridge/br_forward.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c index 29097e984b4f..870bdf2e082c 100644 --- a/net/bridge/br_forward.c +++ b/net/bridge/br_forward.c @@ -148,7 +148,8 @@ void br_forward(const struct net_bridge_port *to, goto out; /* redirect to backup link if the destination port is down */ - if (rcu_access_pointer(to->backup_port) && !netif_carrier_ok(to->dev)) { + if (rcu_access_pointer(to->backup_port) && + (!netif_carrier_ok(to->dev) || !netif_running(to->dev))) { struct net_bridge_port *backup_port; backup_port = rcu_dereference(to->backup_port); From 51ca1e67f416000bfd9a2a49c868538e744a317e Mon Sep 17 00:00:00 2001 From: Ido Schimmel Date: Tue, 12 Aug 2025 11:02:13 +0300 Subject: [PATCH 0094/1536] selftests: net: Test bridge backup port when port is administratively down Test that packets are redirected to the backup port when the primary port is administratively down. With the previous patch: # ./test_bridge_backup_port.sh [...] TEST: swp1 administratively down [ OK ] TEST: No forwarding out of swp1 [ OK ] TEST: Forwarding out of vx0 [ OK ] TEST: swp1 administratively up [ OK ] TEST: Forwarding out of swp1 [ OK ] TEST: No forwarding out of vx0 [ OK ] [...] Tests passed: 89 Tests failed: 0 Without the previous patch: # ./test_bridge_backup_port.sh [...] TEST: swp1 administratively down [ OK ] TEST: No forwarding out of swp1 [ OK ] TEST: Forwarding out of vx0 [FAIL] TEST: swp1 administratively up [ OK ] TEST: Forwarding out of swp1 [ OK ] [...] Tests passed: 85 Tests failed: 4 Signed-off-by: Ido Schimmel Acked-by: Nikolay Aleksandrov Link: https://patch.msgid.link/20250812080213.325298-3-idosch@nvidia.com Signed-off-by: Jakub Kicinski --- .../selftests/net/test_bridge_backup_port.sh | 31 ++++++++++++++++--- 1 file changed, 27 insertions(+), 4 deletions(-) diff --git a/tools/testing/selftests/net/test_bridge_backup_port.sh b/tools/testing/selftests/net/test_bridge_backup_port.sh index 1b3f89e2b86e..2a7224fe74f2 100755 --- a/tools/testing/selftests/net/test_bridge_backup_port.sh +++ b/tools/testing/selftests/net/test_bridge_backup_port.sh @@ -315,6 +315,29 @@ backup_port() tc_check_packets $sw1 "dev vx0 egress" 101 1 log_test $? 0 "No forwarding out of vx0" + # Check that packets are forwarded out of vx0 when swp1 is + # administratively down and out of swp1 when it is administratively up + # again. + run_cmd "ip -n $sw1 link set dev swp1 down" + busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 disabled + log_test $? 0 "swp1 administratively down" + + run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" + tc_check_packets $sw1 "dev swp1 egress" 101 3 + log_test $? 0 "No forwarding out of swp1" + tc_check_packets $sw1 "dev vx0 egress" 101 2 + log_test $? 0 "Forwarding out of vx0" + + run_cmd "ip -n $sw1 link set dev swp1 up" + busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 forwarding + log_test $? 0 "swp1 administratively up" + + run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" + tc_check_packets $sw1 "dev swp1 egress" 101 4 + log_test $? 0 "Forwarding out of swp1" + tc_check_packets $sw1 "dev vx0 egress" 101 2 + log_test $? 0 "No forwarding out of vx0" + # Remove vx0 as the backup port of swp1 and check that packets are no # longer forwarded out of vx0 when swp1 does not have a carrier. run_cmd "bridge -n $sw1 link set dev swp1 nobackup_port" @@ -322,9 +345,9 @@ backup_port() log_test $? 1 "vx0 not configured as backup port of swp1" run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" - tc_check_packets $sw1 "dev swp1 egress" 101 4 + tc_check_packets $sw1 "dev swp1 egress" 101 5 log_test $? 0 "Forwarding out of swp1" - tc_check_packets $sw1 "dev vx0 egress" 101 1 + tc_check_packets $sw1 "dev vx0 egress" 101 2 log_test $? 0 "No forwarding out of vx0" run_cmd "ip -n $sw1 link set dev swp1 carrier off" @@ -332,9 +355,9 @@ backup_port() log_test $? 0 "swp1 carrier off" run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" - tc_check_packets $sw1 "dev swp1 egress" 101 4 + tc_check_packets $sw1 "dev swp1 egress" 101 5 log_test $? 0 "No forwarding out of swp1" - tc_check_packets $sw1 "dev vx0 egress" 101 1 + tc_check_packets $sw1 "dev vx0 egress" 101 2 log_test $? 0 "No forwarding out of vx0" } From 6398d8a856fb8be01f53cf84b481b3cdac978607 Mon Sep 17 00:00:00 2001 From: Xichao Zhao Date: Tue, 12 Aug 2025 14:50:26 +0800 Subject: [PATCH 0095/1536] sfc: replace min/max nesting with clamp() The clamp() macro explicitly expresses the intent of constraining a value within bounds.Therefore, replacing min(max(a, b), c) with clamp(val, lo, hi) can improve code readability. Signed-off-by: Xichao Zhao Reviewed-by: Joe Damato Reviewed-by: Edward Cree Link: https://patch.msgid.link/20250812065026.620115-1-zhao.xichao@vivo.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/sfc/efx_channels.c | 4 ++-- drivers/net/ethernet/sfc/falcon/efx.c | 5 ++--- drivers/net/ethernet/sfc/siena/efx_channels.c | 4 ++-- 3 files changed, 6 insertions(+), 7 deletions(-) diff --git a/drivers/net/ethernet/sfc/efx_channels.c b/drivers/net/ethernet/sfc/efx_channels.c index 06b4f52713ef..0f66324ed351 100644 --- a/drivers/net/ethernet/sfc/efx_channels.c +++ b/drivers/net/ethernet/sfc/efx_channels.c @@ -216,8 +216,8 @@ static int efx_allocate_msix_channels(struct efx_nic *efx, if (efx_separate_tx_channels) { efx->n_tx_channels = - min(max(n_channels / 2, 1U), - efx->max_tx_channels); + clamp(n_channels / 2, 1U, + efx->max_tx_channels); efx->tx_channel_offset = n_channels - efx->n_tx_channels; efx->n_rx_channels = diff --git a/drivers/net/ethernet/sfc/falcon/efx.c b/drivers/net/ethernet/sfc/falcon/efx.c index b07f7e4e2877..d19fbf8732ff 100644 --- a/drivers/net/ethernet/sfc/falcon/efx.c +++ b/drivers/net/ethernet/sfc/falcon/efx.c @@ -1394,9 +1394,8 @@ static int ef4_probe_interrupts(struct ef4_nic *efx) if (n_channels > extra_channels) n_channels -= extra_channels; if (ef4_separate_tx_channels) { - efx->n_tx_channels = min(max(n_channels / 2, - 1U), - efx->max_tx_channels); + efx->n_tx_channels = clamp(n_channels / 2, 1U, + efx->max_tx_channels); efx->n_rx_channels = max(n_channels - efx->n_tx_channels, 1U); diff --git a/drivers/net/ethernet/sfc/siena/efx_channels.c b/drivers/net/ethernet/sfc/siena/efx_channels.c index d120b3c83ac0..703419866d18 100644 --- a/drivers/net/ethernet/sfc/siena/efx_channels.c +++ b/drivers/net/ethernet/sfc/siena/efx_channels.c @@ -217,8 +217,8 @@ static int efx_allocate_msix_channels(struct efx_nic *efx, if (efx_siena_separate_tx_channels) { efx->n_tx_channels = - min(max(n_channels / 2, 1U), - efx->max_tx_channels); + clamp(n_channels / 2, 1U, + efx->max_tx_channels); efx->tx_channel_offset = n_channels - efx->n_tx_channels; efx->n_rx_channels = From 7f95f04fe1903a31b61085e3ab1b4730f9d72941 Mon Sep 17 00:00:00 2001 From: Kyle Hendry Date: Wed, 13 Aug 2025 17:25:27 -0700 Subject: [PATCH 0096/1536] net: dsa: b53: mmap: Add gphy port to phy info for bcm63268 Add gphy mask to bcm63xx phy info struct and add data for bcm63268 Signed-off-by: Kyle Hendry Reviewed-by: Florian Fainelli Link: https://patch.msgid.link/20250814002530.5866-2-kylehendrydev@gmail.com Signed-off-by: Jakub Kicinski --- drivers/net/dsa/b53/b53_mmap.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/dsa/b53/b53_mmap.c b/drivers/net/dsa/b53/b53_mmap.c index f06c3e0cc42a..87e1338765c2 100644 --- a/drivers/net/dsa/b53/b53_mmap.c +++ b/drivers/net/dsa/b53/b53_mmap.c @@ -31,6 +31,7 @@ #define BCM63XX_EPHY_REG 0x3C struct b53_phy_info { + u32 gphy_port_mask; u32 ephy_enable_mask; u32 ephy_port_mask; u32 ephy_bias_bit; @@ -65,6 +66,7 @@ static const struct b53_phy_info bcm6368_ephy_info = { static const u32 bcm63268_ephy_offsets[] = {4, 9, 14}; static const struct b53_phy_info bcm63268_ephy_info = { + .gphy_port_mask = BIT(3), .ephy_enable_mask = GENMASK(4, 0), .ephy_port_mask = GENMASK((ARRAY_SIZE(bcm63268_ephy_offsets) - 1), 0), .ephy_bias_bit = 24, From 61730ac10ba90c52563861a0119504f6a9be9868 Mon Sep 17 00:00:00 2001 From: Kyle Hendry Date: Wed, 13 Aug 2025 17:25:28 -0700 Subject: [PATCH 0097/1536] net: dsa: b53: mmap: Implement bcm63268 gphy power control Add check for gphy in enable/disable phy calls and set power bits in gphy control register. Signed-off-by: Kyle Hendry Reviewed-by: Florian Fainelli Link: https://patch.msgid.link/20250814002530.5866-3-kylehendrydev@gmail.com Signed-off-by: Jakub Kicinski --- drivers/net/dsa/b53/b53_mmap.c | 33 +++++++++++++++++++++++++++++---- 1 file changed, 29 insertions(+), 4 deletions(-) diff --git a/drivers/net/dsa/b53/b53_mmap.c b/drivers/net/dsa/b53/b53_mmap.c index 87e1338765c2..f4a59d8fbdd6 100644 --- a/drivers/net/dsa/b53/b53_mmap.c +++ b/drivers/net/dsa/b53/b53_mmap.c @@ -29,6 +29,10 @@ #include "b53_priv.h" #define BCM63XX_EPHY_REG 0x3C +#define BCM63268_GPHY_REG 0x54 + +#define GPHY_CTRL_LOW_PWR BIT(3) +#define GPHY_CTRL_IDDQ_BIAS BIT(0) struct b53_phy_info { u32 gphy_port_mask; @@ -292,13 +296,30 @@ static int bcm63xx_ephy_set(struct b53_device *dev, int port, bool enable) return regmap_update_bits(gpio_ctrl, BCM63XX_EPHY_REG, mask, val); } +static int bcm63268_gphy_set(struct b53_device *dev, bool enable) +{ + struct b53_mmap_priv *priv = dev->priv; + struct regmap *gpio_ctrl = priv->gpio_ctrl; + u32 mask = GPHY_CTRL_IDDQ_BIAS | GPHY_CTRL_LOW_PWR; + u32 val = 0; + + if (!enable) + val = mask; + + return regmap_update_bits(gpio_ctrl, BCM63268_GPHY_REG, mask, val); +} + static void b53_mmap_phy_enable(struct b53_device *dev, int port) { struct b53_mmap_priv *priv = dev->priv; int ret = 0; - if (priv->phy_info && (BIT(port) & priv->phy_info->ephy_port_mask)) - ret = bcm63xx_ephy_set(dev, port, true); + if (priv->phy_info) { + if (BIT(port) & priv->phy_info->ephy_port_mask) + ret = bcm63xx_ephy_set(dev, port, true); + else if (BIT(port) & priv->phy_info->gphy_port_mask) + ret = bcm63268_gphy_set(dev, true); + } if (!ret) priv->phys_enabled |= BIT(port); @@ -309,8 +330,12 @@ static void b53_mmap_phy_disable(struct b53_device *dev, int port) struct b53_mmap_priv *priv = dev->priv; int ret = 0; - if (priv->phy_info && (BIT(port) & priv->phy_info->ephy_port_mask)) - ret = bcm63xx_ephy_set(dev, port, false); + if (priv->phy_info) { + if (BIT(port) & priv->phy_info->ephy_port_mask) + ret = bcm63xx_ephy_set(dev, port, false); + else if (BIT(port) & priv->phy_info->gphy_port_mask) + ret = bcm63268_gphy_set(dev, false); + } if (!ret) priv->phys_enabled &= ~BIT(port); From 2327a3d6f65ce2fe2634546dde4a25ef52296fec Mon Sep 17 00:00:00 2001 From: Charalampos Mitrodimas Date: Tue, 12 Aug 2025 15:51:25 +0000 Subject: [PATCH 0098/1536] net: ipv6: fix field-spanning memcpy warning in AH output Fix field-spanning memcpy warnings in ah6_output() and ah6_output_done() where extension headers are copied to/from IPv6 address fields, triggering fortify-string warnings about writes beyond the 16-byte address fields. memcpy: detected field-spanning write (size 40) of single field "&top_iph->saddr" at net/ipv6/ah6.c:439 (size 16) WARNING: CPU: 0 PID: 8838 at net/ipv6/ah6.c:439 ah6_output+0xe7e/0x14e0 net/ipv6/ah6.c:439 The warnings are false positives as the extension headers are intentionally placed after the IPv6 header in memory. Fix by properly copying addresses and extension headers separately, and introduce helper functions to avoid code duplication. Reported-by: syzbot+01b0667934cdceb4451c@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=01b0667934cdceb4451c Signed-off-by: Charalampos Mitrodimas Signed-off-by: Steffen Klassert --- net/ipv6/ah6.c | 50 +++++++++++++++++++++++++++++++------------------- 1 file changed, 31 insertions(+), 19 deletions(-) diff --git a/net/ipv6/ah6.c b/net/ipv6/ah6.c index eb474f0987ae..95372e0f1d21 100644 --- a/net/ipv6/ah6.c +++ b/net/ipv6/ah6.c @@ -46,6 +46,34 @@ struct ah_skb_cb { #define AH_SKB_CB(__skb) ((struct ah_skb_cb *)&((__skb)->cb[0])) +/* Helper to save IPv6 addresses and extension headers to temporary storage */ +static inline void ah6_save_hdrs(struct tmp_ext *iph_ext, + struct ipv6hdr *top_iph, int extlen) +{ + if (!extlen) + return; + +#if IS_ENABLED(CONFIG_IPV6_MIP6) + iph_ext->saddr = top_iph->saddr; +#endif + iph_ext->daddr = top_iph->daddr; + memcpy(&iph_ext->hdrs, top_iph + 1, extlen - sizeof(*iph_ext)); +} + +/* Helper to restore IPv6 addresses and extension headers from temporary storage */ +static inline void ah6_restore_hdrs(struct ipv6hdr *top_iph, + struct tmp_ext *iph_ext, int extlen) +{ + if (!extlen) + return; + +#if IS_ENABLED(CONFIG_IPV6_MIP6) + top_iph->saddr = iph_ext->saddr; +#endif + top_iph->daddr = iph_ext->daddr; + memcpy(top_iph + 1, &iph_ext->hdrs, extlen - sizeof(*iph_ext)); +} + static void *ah_alloc_tmp(struct crypto_ahash *ahash, int nfrags, unsigned int size) { @@ -301,13 +329,7 @@ static void ah6_output_done(void *data, int err) memcpy(ah->auth_data, icv, ahp->icv_trunc_len); memcpy(top_iph, iph_base, IPV6HDR_BASELEN); - if (extlen) { -#if IS_ENABLED(CONFIG_IPV6_MIP6) - memcpy(&top_iph->saddr, iph_ext, extlen); -#else - memcpy(&top_iph->daddr, iph_ext, extlen); -#endif - } + ah6_restore_hdrs(top_iph, iph_ext, extlen); kfree(AH_SKB_CB(skb)->tmp); xfrm_output_resume(skb->sk, skb, err); @@ -378,12 +400,8 @@ static int ah6_output(struct xfrm_state *x, struct sk_buff *skb) */ memcpy(iph_base, top_iph, IPV6HDR_BASELEN); + ah6_save_hdrs(iph_ext, top_iph, extlen); if (extlen) { -#if IS_ENABLED(CONFIG_IPV6_MIP6) - memcpy(iph_ext, &top_iph->saddr, extlen); -#else - memcpy(iph_ext, &top_iph->daddr, extlen); -#endif err = ipv6_clear_mutable_options(top_iph, extlen - sizeof(*iph_ext) + sizeof(*top_iph), @@ -434,13 +452,7 @@ static int ah6_output(struct xfrm_state *x, struct sk_buff *skb) memcpy(ah->auth_data, icv, ahp->icv_trunc_len); memcpy(top_iph, iph_base, IPV6HDR_BASELEN); - if (extlen) { -#if IS_ENABLED(CONFIG_IPV6_MIP6) - memcpy(&top_iph->saddr, iph_ext, extlen); -#else - memcpy(&top_iph->daddr, iph_ext, extlen); -#endif - } + ah6_restore_hdrs(top_iph, iph_ext, extlen); out_free: kfree(iph_base); From 9f4f591cd5a410f4203a9c104f92d467945b7d7e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Miguel=20Garc=C3=ADa?= Date: Thu, 14 Aug 2025 21:32:17 +0200 Subject: [PATCH 0099/1536] xfrm: xfrm_user: use strscpy() for alg_name MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Replace the strcpy() calls that copy the canonical algorithm name into alg_name with strscpy() to avoid potential overflows and guarantee NULL termination. Destination is alg_name in xfrm_algo/xfrm_algo_auth/xfrm_algo_aead (size CRYPTO_MAX_ALG_NAME). Tested in QEMU (BusyBox/Alpine rootfs): - Added ESP AEAD (rfc4106(gcm(aes))) and classic ESP (sha256 + cbc(aes)) - Verified canonical names via ip -d xfrm state - Checked IPComp negative (unknown algo) and deflate path Signed-off-by: Miguel García Signed-off-by: Steffen Klassert --- net/xfrm/xfrm_user.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c index 684239018bec..010c9e6638c0 100644 --- a/net/xfrm/xfrm_user.c +++ b/net/xfrm/xfrm_user.c @@ -593,7 +593,7 @@ static int attach_one_algo(struct xfrm_algo **algpp, u8 *props, if (!p) return -ENOMEM; - strcpy(p->alg_name, algo->name); + strscpy(p->alg_name, algo->name); *algpp = p; return 0; } @@ -620,7 +620,7 @@ static int attach_crypt(struct xfrm_state *x, struct nlattr *rta, if (!p) return -ENOMEM; - strcpy(p->alg_name, algo->name); + strscpy(p->alg_name, algo->name); x->ealg = p; x->geniv = algo->uinfo.encr.geniv; return 0; @@ -649,7 +649,7 @@ static int attach_auth(struct xfrm_algo_auth **algpp, u8 *props, if (!p) return -ENOMEM; - strcpy(p->alg_name, algo->name); + strscpy(p->alg_name, algo->name); p->alg_key_len = ualg->alg_key_len; p->alg_trunc_len = algo->uinfo.auth.icv_truncbits; memcpy(p->alg_key, ualg->alg_key, (ualg->alg_key_len + 7) / 8); @@ -684,7 +684,7 @@ static int attach_auth_trunc(struct xfrm_algo_auth **algpp, u8 *props, if (!p) return -ENOMEM; - strcpy(p->alg_name, algo->name); + strscpy(p->alg_name, algo->name); if (!p->alg_trunc_len) p->alg_trunc_len = algo->uinfo.auth.icv_truncbits; @@ -714,7 +714,7 @@ static int attach_aead(struct xfrm_state *x, struct nlattr *rta, if (!p) return -ENOMEM; - strcpy(p->alg_name, algo->name); + strscpy(p->alg_name, algo->name); x->aead = p; x->geniv = algo->uinfo.aead.geniv; return 0; From 7de0eebbb4c3bb44c296f66679ad37480139dc6e Mon Sep 17 00:00:00 2001 From: Wang Liang Date: Thu, 14 Aug 2025 12:23:55 +0800 Subject: [PATCH 0100/1536] net: bridge: remove unused argument of br_multicast_query_expired() Since commit 67b746f94ff3 ("net: bridge: mcast: make sure querier port/address updates are consistent"), the argument 'querier' is unused, just get rid of it. Signed-off-by: Wang Liang Reviewed-by: Ido Schimmel Acked-by: Nikolay Aleksandrov Link: https://patch.msgid.link/20250814042355.1720755-1-wangliang74@huawei.com Signed-off-by: Jakub Kicinski --- net/bridge/br_multicast.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c index 1377f31b719c..4dc62d01e2d3 100644 --- a/net/bridge/br_multicast.c +++ b/net/bridge/br_multicast.c @@ -4049,8 +4049,7 @@ int br_multicast_rcv(struct net_bridge_mcast **brmctx, } static void br_multicast_query_expired(struct net_bridge_mcast *brmctx, - struct bridge_mcast_own_query *query, - struct bridge_mcast_querier *querier) + struct bridge_mcast_own_query *query) { spin_lock(&brmctx->br->multicast_lock); if (br_multicast_ctx_vlan_disabled(brmctx)) @@ -4069,8 +4068,7 @@ static void br_ip4_multicast_query_expired(struct timer_list *t) struct net_bridge_mcast *brmctx = timer_container_of(brmctx, t, ip4_own_query.timer); - br_multicast_query_expired(brmctx, &brmctx->ip4_own_query, - &brmctx->ip4_querier); + br_multicast_query_expired(brmctx, &brmctx->ip4_own_query); } #if IS_ENABLED(CONFIG_IPV6) @@ -4079,8 +4077,7 @@ static void br_ip6_multicast_query_expired(struct timer_list *t) struct net_bridge_mcast *brmctx = timer_container_of(brmctx, t, ip6_own_query.timer); - br_multicast_query_expired(brmctx, &brmctx->ip6_own_query, - &brmctx->ip6_querier); + br_multicast_query_expired(brmctx, &brmctx->ip6_own_query); } #endif From 2335b3f56690f76ac34b972fcaef368bab1f76f2 Mon Sep 17 00:00:00 2001 From: Saeed Mahameed Date: Mon, 16 Jun 2025 23:41:53 -0700 Subject: [PATCH 0101/1536] net/mlx5: mlx5_ifc, Add hardware definitions needed for adjacent vports Next patches will implement the discovery and creation of adjacent functions vports, this patch introduces the hardware structures definitions needed for the driver implementation. Signed-off-by: Saeed Mahameed Reviewed-by: Mark Bloch Reviewed-by: Parav Pandit Reviewed-by: Jack Morgenstein Signed-off-by: Alexei Lazar --- include/linux/mlx5/mlx5_ifc.h | 133 +++++++++++++++++++++++++++++++++- 1 file changed, 129 insertions(+), 4 deletions(-) diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 8360d9011d4f..44d497272162 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -189,6 +189,9 @@ enum { MLX5_CMD_OP_QUERY_XRQ_ERROR_PARAMS = 0x727, MLX5_CMD_OP_RELEASE_XRQ_ERROR = 0x729, MLX5_CMD_OP_MODIFY_XRQ = 0x72a, + MLX5_CMD_OPCODE_QUERY_DELEGATED_VHCA = 0x732, + MLX5_CMD_OPCODE_CREATE_ESW_VPORT = 0x733, + MLX5_CMD_OPCODE_DESTROY_ESW_VPORT = 0x734, MLX5_CMD_OP_QUERY_ESW_FUNCTIONS = 0x740, MLX5_CMD_OP_QUERY_VPORT_STATE = 0x750, MLX5_CMD_OP_MODIFY_VPORT_STATE = 0x751, @@ -2207,7 +2210,19 @@ struct mlx5_ifc_cmd_hca_cap_2_bits { u8 reserved_at_440[0x8]; u8 max_num_eqs_24b[0x18]; - u8 reserved_at_460[0x3a0]; + + u8 reserved_at_460[0x160]; + + u8 query_adjacent_functions_id[0x1]; + u8 ingress_egress_esw_vport_connect[0x1]; + u8 function_id_type_vhca_id[0x1]; + u8 reserved_at_5c3[0xd]; + u8 delegate_vhca_management_profiles[0x10]; + + u8 delegated_vhca_max[0x10]; + u8 delegate_vhca_max[0x10]; + + u8 reserved_at_600[0x200]; }; enum mlx5_ifc_flow_destination_type { @@ -5159,7 +5174,9 @@ struct mlx5_ifc_set_hca_cap_in_bits { u8 other_function[0x1]; u8 ec_vf_function[0x1]; - u8 reserved_at_42[0xe]; + u8 reserved_at_42[0x1]; + u8 function_id_type[0x1]; + u8 reserved_at_44[0xc]; u8 function_id[0x10]; u8 reserved_at_60[0x20]; @@ -6357,7 +6374,9 @@ struct mlx5_ifc_query_hca_cap_in_bits { u8 other_function[0x1]; u8 ec_vf_function[0x1]; - u8 reserved_at_42[0xe]; + u8 reserved_at_42[0x1]; + u8 function_id_type[0x1]; + u8 reserved_at_44[0xc]; u8 function_id[0x10]; u8 reserved_at_60[0x20]; @@ -6983,6 +7002,28 @@ struct mlx5_ifc_query_esw_vport_context_in_bits { u8 reserved_at_60[0x20]; }; +struct mlx5_ifc_destroy_esw_vport_out_bits { + u8 status[0x8]; + u8 reserved_at_8[0x18]; + + u8 syndrome[0x20]; + + u8 reserved_at_40[0x20]; +}; + +struct mlx5_ifc_destroy_esw_vport_in_bits { + u8 opcode[0x10]; + u8 uid[0x10]; + + u8 reserved_at_20[0x10]; + u8 op_mod[0x10]; + + u8 reserved_at_40[0x10]; + u8 vport_num[0x10]; + + u8 reserved_at_60[0x20]; +}; + struct mlx5_ifc_modify_esw_vport_context_out_bits { u8 status[0x8]; u8 reserved_at_8[0x18]; @@ -7484,6 +7525,85 @@ struct mlx5_ifc_query_adapter_in_bits { u8 reserved_at_40[0x40]; }; +struct mlx5_ifc_function_vhca_rid_info_reg_bits { + u8 host_number[0x8]; + u8 host_pci_device_function[0x8]; + u8 host_pci_bus[0x8]; + u8 reserved_at_18[0x3]; + u8 pci_bus_assigned[0x1]; + u8 function_type[0x4]; + + u8 parent_pci_device_function[0x8]; + u8 parent_pci_bus[0x8]; + u8 vhca_id[0x10]; + + u8 reserved_at_40[0x10]; + u8 function_id[0x10]; + + u8 reserved_at_60[0x20]; +}; + +struct mlx5_ifc_delegated_function_vhca_rid_info_bits { + struct mlx5_ifc_function_vhca_rid_info_reg_bits function_vhca_rid_info; + + u8 reserved_at_80[0x18]; + u8 manage_profile[0x8]; + + u8 reserved_at_a0[0x60]; +}; + +struct mlx5_ifc_query_delegated_vhca_out_bits { + u8 status[0x8]; + u8 reserved_at_8[0x18]; + + u8 syndrome[0x20]; + + u8 reserved_at_40[0x20]; + + u8 reserved_at_60[0x10]; + u8 functions_count[0x10]; + + u8 reserved_at_80[0x80]; + + struct mlx5_ifc_delegated_function_vhca_rid_info_bits + delegated_function_vhca_rid_info[]; +}; + +struct mlx5_ifc_query_delegated_vhca_in_bits { + u8 opcode[0x10]; + u8 uid[0x10]; + + u8 reserved_at_20[0x10]; + u8 op_mod[0x10]; + + u8 reserved_at_40[0x40]; +}; + +struct mlx5_ifc_create_esw_vport_out_bits { + u8 status[0x8]; + u8 reserved_at_8[0x18]; + + u8 syndrome[0x20]; + + u8 reserved_at_40[0x20]; + + u8 reserved_at_60[0x10]; + u8 vport_num[0x10]; +}; + +struct mlx5_ifc_create_esw_vport_in_bits { + u8 opcode[0x10]; + u8 reserved_at_10[0x10]; + + u8 reserved_at_20[0x10]; + u8 op_mod[0x10]; + + u8 reserved_at_40[0x10]; + u8 managed_vhca_id[0x10]; + + u8 reserved_at_60[0x20]; +}; + struct mlx5_ifc_qp_2rst_out_bits { u8 status[0x8]; u8 reserved_at_8[0x18]; @@ -7611,7 +7731,12 @@ struct mlx5_ifc_modify_vport_state_in_bits { u8 reserved_at_41[0xf]; u8 vport_number[0x10]; - u8 reserved_at_60[0x18]; + u8 reserved_at_60[0x10]; + u8 ingress_connect[0x1]; + u8 egress_connect[0x1]; + u8 ingress_connect_valid[0x1]; + u8 egress_connect_valid[0x1]; + u8 reserved_at_74[0x4]; u8 admin_state[0x4]; u8 reserved_at_7c[0x4]; }; From 864c05b9bc4077c99730512babc991a9d92730e0 Mon Sep 17 00:00:00 2001 From: Saeed Mahameed Date: Sat, 14 Jun 2025 02:29:41 -0700 Subject: [PATCH 0102/1536] net/mlx5: E-Switch, Cache vport vhca id on first cap query We need vhca_id to set up the vhca_id to vport mapping for every vport, for that we query the firmware in mlx5_esw_vport_vhca_id_set, and it is redundant since in esw_vport_setup, we already query hca caps which has the vhca_id, cache it there and save 2 extra fw queries per vport. Signed-off-by: Saeed Mahameed Reviewed-by: Mark Bloch Reviewed-by: Parav Pandit Signed-off-by: Alexei Lazar Reviewed-by: Feng Liu --- .../net/ethernet/mellanox/mlx5/core/eswitch.c | 6 ++-- .../net/ethernet/mellanox/mlx5/core/eswitch.h | 12 +++++-- .../mellanox/mlx5/core/eswitch_offloads.c | 34 +++++++++---------- 3 files changed, 31 insertions(+), 21 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c index 4917d185d0c3..eeffe9c4aa56 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c @@ -820,6 +820,7 @@ static int mlx5_esw_vport_caps_get(struct mlx5_eswitch *esw, struct mlx5_vport * hca_caps = MLX5_ADDR_OF(query_hca_cap_out, query_ctx, capability); vport->info.roce_enabled = MLX5_GET(cmd_hca_cap, hca_caps, roce); + vport->vhca_id = MLX5_GET(cmd_hca_cap, hca_caps, vhca_id); if (!MLX5_CAP_GEN_MAX(esw->dev, hca_cap_2)) goto out_free; @@ -929,7 +930,7 @@ int mlx5_esw_vport_enable(struct mlx5_eswitch *esw, struct mlx5_vport *vport, if (!mlx5_esw_is_manager_vport(esw, vport_num) && MLX5_CAP_GEN(esw->dev, vhca_resource_manager)) { - ret = mlx5_esw_vport_vhca_id_set(esw, vport_num); + ret = mlx5_esw_vport_vhca_id_map(esw, vport); if (ret) goto err_vhca_mapping; } @@ -973,7 +974,7 @@ void mlx5_esw_vport_disable(struct mlx5_eswitch *esw, struct mlx5_vport *vport) if (!mlx5_esw_is_manager_vport(esw, vport_num) && MLX5_CAP_GEN(esw->dev, vhca_resource_manager)) - mlx5_esw_vport_vhca_id_clear(esw, vport_num); + mlx5_esw_vport_vhca_id_unmap(esw, vport); if (vport->vport != MLX5_VPORT_PF && (vport->info.ipsec_crypto_enabled || vport->info.ipsec_packet_enabled)) @@ -1710,6 +1711,7 @@ static int mlx5_esw_vport_alloc(struct mlx5_eswitch *esw, vport->vport = vport_num; vport->index = index; vport->info.link_state = MLX5_VPORT_ADMIN_STATE_AUTO; + vport->vhca_id = MLX5_VHCA_ID_INVALID; INIT_WORK(&vport->vport_change_handler, esw_vport_change_handler); err = xa_insert(&esw->vports, vport_num, vport, GFP_KERNEL); if (err) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h index b0b8ef3ec3c4..7f6bfaae5f5f 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h @@ -197,6 +197,11 @@ static inline struct mlx5_vport *mlx5_devlink_port_vport_get(struct devlink_port return mlx5_devlink_port_get(dl_port)->vport; } +#define MLX5_VHCA_ID_INVALID (-1) + +#define MLX5_VPORT_INVAL_VHCA_ID(vport) \ + ((vport)->vhca_id == MLX5_VHCA_ID_INVALID) + struct mlx5_vport { struct mlx5_core_dev *dev; struct hlist_head uc_list[MLX5_L2_ADDR_HASH_SIZE]; @@ -209,6 +214,7 @@ struct mlx5_vport { struct vport_egress egress; u32 default_metadata; u32 metadata; + int vhca_id; struct mlx5_vport_info info; @@ -822,8 +828,10 @@ struct devlink_port *mlx5_esw_offloads_devlink_port(struct mlx5_eswitch *esw, u1 int mlx5_esw_sf_max_hpf_functions(struct mlx5_core_dev *dev, u16 *max_sfs, u16 *sf_base_id); -int mlx5_esw_vport_vhca_id_set(struct mlx5_eswitch *esw, u16 vport_num); -void mlx5_esw_vport_vhca_id_clear(struct mlx5_eswitch *esw, u16 vport_num); +int mlx5_esw_vport_vhca_id_map(struct mlx5_eswitch *esw, + struct mlx5_vport *vport); +void mlx5_esw_vport_vhca_id_unmap(struct mlx5_eswitch *esw, + struct mlx5_vport *vport); int mlx5_eswitch_vhca_id_to_vport(struct mlx5_eswitch *esw, u16 vhca_id, u16 *vport_num); /** diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c index bee906661282..19decaa8a96e 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c @@ -4161,23 +4161,28 @@ u32 mlx5_eswitch_get_vport_metadata_for_match(struct mlx5_eswitch *esw, } EXPORT_SYMBOL(mlx5_eswitch_get_vport_metadata_for_match); -int mlx5_esw_vport_vhca_id_set(struct mlx5_eswitch *esw, u16 vport_num) +int mlx5_esw_vport_vhca_id_map(struct mlx5_eswitch *esw, + struct mlx5_vport *vport) { u16 *old_entry, *vhca_map_entry, vhca_id; - int err; - err = mlx5_vport_get_vhca_id(esw->dev, vport_num, &vhca_id); - if (err) { - esw_warn(esw->dev, "Getting vhca_id for vport failed (vport=%u,err=%d)\n", - vport_num, err); - return err; + if (WARN_ONCE(MLX5_VPORT_INVAL_VHCA_ID(vport), + "vport %d vhca_id is not set", vport->vport)) { + int err; + + err = mlx5_vport_get_vhca_id(vport->dev, vport->vport, + &vhca_id); + if (err) + return err; + vport->vhca_id = vhca_id; } + vhca_id = vport->vhca_id; vhca_map_entry = kmalloc(sizeof(*vhca_map_entry), GFP_KERNEL); if (!vhca_map_entry) return -ENOMEM; - *vhca_map_entry = vport_num; + *vhca_map_entry = vport->vport; old_entry = xa_store(&esw->offloads.vhca_map, vhca_id, vhca_map_entry, GFP_KERNEL); if (xa_is_err(old_entry)) { kfree(vhca_map_entry); @@ -4187,17 +4192,12 @@ int mlx5_esw_vport_vhca_id_set(struct mlx5_eswitch *esw, u16 vport_num) return 0; } -void mlx5_esw_vport_vhca_id_clear(struct mlx5_eswitch *esw, u16 vport_num) +void mlx5_esw_vport_vhca_id_unmap(struct mlx5_eswitch *esw, + struct mlx5_vport *vport) { - u16 *vhca_map_entry, vhca_id; - int err; + u16 *vhca_map_entry; - err = mlx5_vport_get_vhca_id(esw->dev, vport_num, &vhca_id); - if (err) - esw_warn(esw->dev, "Getting vhca_id for vport failed (vport=%hu,err=%d)\n", - vport_num, err); - - vhca_map_entry = xa_erase(&esw->offloads.vhca_map, vhca_id); + vhca_map_entry = xa_erase(&esw->offloads.vhca_map, vport->vhca_id); kfree(vhca_map_entry); } From 1baf30426553efb3ac518b0e9d5c1c3f8ed7762a Mon Sep 17 00:00:00 2001 From: Saeed Mahameed Date: Mon, 16 Jun 2025 15:07:59 -0700 Subject: [PATCH 0103/1536] net/mlx5: E-Switch, Set/Query hca cap via vhca id Dynamically created vports require vhca id as input to set/query other vport hca cap, when FW is capable and the vhca id of a vport is valid use it instead of the local function id. Signed-off-by: Saeed Mahameed Signed-off-by: Adithya Jayachandran Reviewed-by: Parav Pandit Reviewed-by: Feng Liu Reviewed-by: William Tu Reviewed-by: Mark Bloch --- .../net/ethernet/mellanox/mlx5/core/eswitch.c | 12 +++++ .../net/ethernet/mellanox/mlx5/core/eswitch.h | 8 +++ .../net/ethernet/mellanox/mlx5/core/vport.c | 53 +++++++++++++++++-- 3 files changed, 68 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c index eeffe9c4aa56..21c42138d93c 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c @@ -840,6 +840,18 @@ out_free: return err; } +bool mlx5_esw_vport_vhca_id(struct mlx5_eswitch *esw, u16 vportn, u16 *vhca_id) +{ + struct mlx5_vport *vport; + + vport = mlx5_eswitch_get_vport(esw, vportn); + if (IS_ERR(vport) || MLX5_VPORT_INVAL_VHCA_ID(vport)) + return false; + + *vhca_id = vport->vhca_id; + return true; +} + static int esw_vport_setup(struct mlx5_eswitch *esw, struct mlx5_vport *vport) { bool vst_mode_steering = esw_vst_mode_is_steering(esw); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h index 7f6bfaae5f5f..f47389629c62 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h @@ -833,6 +833,7 @@ int mlx5_esw_vport_vhca_id_map(struct mlx5_eswitch *esw, void mlx5_esw_vport_vhca_id_unmap(struct mlx5_eswitch *esw, struct mlx5_vport *vport); int mlx5_eswitch_vhca_id_to_vport(struct mlx5_eswitch *esw, u16 vhca_id, u16 *vport_num); +bool mlx5_esw_vport_vhca_id(struct mlx5_eswitch *esw, u16 vportn, u16 *vhca_id); /** * struct mlx5_esw_event_info - Indicates eswitch mode changed/changing. @@ -973,6 +974,13 @@ static inline bool mlx5_eswitch_block_ipsec(struct mlx5_core_dev *dev) } static inline void mlx5_eswitch_unblock_ipsec(struct mlx5_core_dev *dev) {} + +static inline bool +mlx5_esw_vport_vhca_id(struct mlx5_eswitch *esw, u16 vportn, u16 *vhca_id) +{ + return -EOPNOTSUPP; +} + #endif /* CONFIG_MLX5_ESWITCH */ #endif /* __MLX5_ESWITCH_H__ */ diff --git a/drivers/net/ethernet/mellanox/mlx5/core/vport.c b/drivers/net/ethernet/mellanox/mlx5/core/vport.c index da5c24fc7b30..231bedc6a252 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/vport.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/vport.c @@ -36,6 +36,7 @@ #include #include #include "mlx5_core.h" +#include "eswitch.h" #include "sf/sf.h" /* Mutex to hold while enabling or disabling RoCE */ @@ -1189,18 +1190,44 @@ u64 mlx5_query_nic_system_image_guid(struct mlx5_core_dev *mdev) } EXPORT_SYMBOL_GPL(mlx5_query_nic_system_image_guid); +static bool mlx5_vport_use_vhca_id_as_func_id(struct mlx5_core_dev *dev, + u16 vport_num, u16 *vhca_id) +{ + if (!MLX5_CAP_GEN_2(dev, function_id_type_vhca_id)) + return false; + + return mlx5_esw_vport_vhca_id(dev->priv.eswitch, vport_num, vhca_id); +} + int mlx5_vport_get_other_func_cap(struct mlx5_core_dev *dev, u16 vport, void *out, u16 opmod) { - bool ec_vf_func = mlx5_core_is_ec_vf_vport(dev, vport); u8 in[MLX5_ST_SZ_BYTES(query_hca_cap_in)] = {}; + u16 vhca_id = 0, function_id = 0; + bool ec_vf_func = false; + + /* if this vport is referring to a vport on the ec PF (embedded cpu ) + * let the FW know which domain we are querying since vport numbers or + * function_ids are not unique across the different PF domains, + * unless we use vhca_id as the function_id below. + */ + ec_vf_func = mlx5_core_is_ec_vf_vport(dev, vport); + function_id = mlx5_vport_to_func_id(dev, vport, ec_vf_func); + + if (mlx5_vport_use_vhca_id_as_func_id(dev, vport, &vhca_id)) { + MLX5_SET(query_hca_cap_in, in, function_id_type, 1); + function_id = vhca_id; + ec_vf_func = false; + mlx5_core_dbg(dev, "%s using vhca_id as function_id for vport %d vhca_id 0x%x\n", + __func__, vport, vhca_id); + } opmod = (opmod << 1) | (HCA_CAP_OPMOD_GET_MAX & 0x01); MLX5_SET(query_hca_cap_in, in, opcode, MLX5_CMD_OP_QUERY_HCA_CAP); MLX5_SET(query_hca_cap_in, in, op_mod, opmod); - MLX5_SET(query_hca_cap_in, in, function_id, mlx5_vport_to_func_id(dev, vport, ec_vf_func)); MLX5_SET(query_hca_cap_in, in, other_function, true); MLX5_SET(query_hca_cap_in, in, ec_vf_function, ec_vf_func); + MLX5_SET(query_hca_cap_in, in, function_id, function_id); return mlx5_cmd_exec_inout(dev, query_hca_cap, in, out); } EXPORT_SYMBOL_GPL(mlx5_vport_get_other_func_cap); @@ -1233,8 +1260,9 @@ out_free: int mlx5_vport_set_other_func_cap(struct mlx5_core_dev *dev, const void *hca_cap, u16 vport, u16 opmod) { - bool ec_vf_func = mlx5_core_is_ec_vf_vport(dev, vport); int set_sz = MLX5_ST_SZ_BYTES(set_hca_cap_in); + u16 vhca_id = 0, function_id = 0; + bool ec_vf_func = false; void *set_hca_cap; void *set_ctx; int ret; @@ -1243,14 +1271,29 @@ int mlx5_vport_set_other_func_cap(struct mlx5_core_dev *dev, const void *hca_cap if (!set_ctx) return -ENOMEM; + /* if this vport is referring to a vport on the ec PF (embedded cpu ) + * let the FW know which domain we are querying since vport numbers or + * function_ids are not unique across the different PF domains, + * unless we use vhca_id as the function_id below. + */ + ec_vf_func = mlx5_core_is_ec_vf_vport(dev, vport); + function_id = mlx5_vport_to_func_id(dev, vport, ec_vf_func); + + if (mlx5_vport_use_vhca_id_as_func_id(dev, vport, &vhca_id)) { + MLX5_SET(set_hca_cap_in, set_ctx, function_id_type, 1); + function_id = vhca_id; + ec_vf_func = false; + mlx5_core_dbg(dev, "%s using vhca_id as function_id for vport %d vhca_id 0x%x\n", + __func__, vport, vhca_id); + } + MLX5_SET(set_hca_cap_in, set_ctx, opcode, MLX5_CMD_OP_SET_HCA_CAP); MLX5_SET(set_hca_cap_in, set_ctx, op_mod, opmod << 1); set_hca_cap = MLX5_ADDR_OF(set_hca_cap_in, set_ctx, capability); memcpy(set_hca_cap, hca_cap, MLX5_ST_SZ_BYTES(cmd_hca_cap)); - MLX5_SET(set_hca_cap_in, set_ctx, function_id, - mlx5_vport_to_func_id(dev, vport, ec_vf_func)); MLX5_SET(set_hca_cap_in, set_ctx, other_function, true); MLX5_SET(set_hca_cap_in, set_ctx, ec_vf_function, ec_vf_func); + MLX5_SET(set_hca_cap_in, set_ctx, function_id, function_id); ret = mlx5_cmd_exec_in(dev, set_hca_cap, set_ctx); kfree(set_ctx); From 40653f280b2640e5caa94eeedee43e0f1df97704 Mon Sep 17 00:00:00 2001 From: Saeed Mahameed Date: Mon, 16 Jun 2025 17:28:20 -0700 Subject: [PATCH 0104/1536] {rdma,net}/mlx5: export mlx5_vport_get_vhca_id vhca id is already cached in the vport structure no need to query on every mlx5 layer, use the mlx5_vport_get_vhca_id, where possible. Signed-off-by: Saeed Mahameed Reviewed-by: Mark Bloch Reviewed-by: Parav Pandit Signed-off-by: Alexei Lazar Reviewed-by: Feng Liu Reviewed-by: Tariq Toukan --- drivers/infiniband/hw/mlx5/std_types.c | 27 +++---------------- .../mellanox/mlx5/core/diag/reporter_vnic.c | 2 ++ .../ethernet/mellanox/mlx5/core/mlx5_core.h | 2 -- .../mellanox/mlx5/core/steering/hws/cmd.c | 16 +++++++---- .../mellanox/mlx5/core/steering/sws/dr_cmd.c | 16 ++++++++--- .../net/ethernet/mellanox/mlx5/core/vport.c | 5 +++- include/linux/mlx5/vport.h | 2 ++ 7 files changed, 35 insertions(+), 35 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/std_types.c b/drivers/infiniband/hw/mlx5/std_types.c index bdb568411091..2fcf553044e1 100644 --- a/drivers/infiniband/hw/mlx5/std_types.c +++ b/drivers/infiniband/hw/mlx5/std_types.c @@ -83,33 +83,14 @@ static int fill_vport_icm_addr(struct mlx5_core_dev *mdev, u16 vport, static int fill_vport_vhca_id(struct mlx5_core_dev *mdev, u16 vport, struct mlx5_ib_uapi_query_port *info) { - size_t out_sz = MLX5_ST_SZ_BYTES(query_hca_cap_out); - u32 in[MLX5_ST_SZ_DW(query_hca_cap_in)] = {}; - void *out; - int err; + int err = mlx5_vport_get_vhca_id(mdev, vport, &info->vport_vhca_id); - out = kzalloc(out_sz, GFP_KERNEL); - if (!out) - return -ENOMEM; - - MLX5_SET(query_hca_cap_in, in, opcode, MLX5_CMD_OP_QUERY_HCA_CAP); - MLX5_SET(query_hca_cap_in, in, other_function, true); - MLX5_SET(query_hca_cap_in, in, function_id, vport); - MLX5_SET(query_hca_cap_in, in, op_mod, - MLX5_SET_HCA_CAP_OP_MOD_GENERAL_DEVICE | - HCA_CAP_OPMOD_GET_CUR); - - err = mlx5_cmd_exec(mdev, in, sizeof(in), out, out_sz); if (err) - goto out; - - info->vport_vhca_id = MLX5_GET(query_hca_cap_out, out, - capability.cmd_hca_cap.vhca_id); + return err; info->flags |= MLX5_IB_UAPI_QUERY_PORT_VPORT_VHCA_ID; -out: - kfree(out); - return err; + + return 0; } static int fill_multiport_info(struct mlx5_ib_dev *dev, u32 port_num, diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/reporter_vnic.c b/drivers/net/ethernet/mellanox/mlx5/core/diag/reporter_vnic.c index 86253a89c24c..32bb769f1829 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/diag/reporter_vnic.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/reporter_vnic.c @@ -1,6 +1,8 @@ // SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB /* Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. */ +#include + #include "reporter_vnic.h" #include "en_stats.h" #include "devlink.h" diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h index b6d53db27cd5..81857c6f6bf7 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h @@ -447,8 +447,6 @@ int mlx5_vport_set_other_func_cap(struct mlx5_core_dev *dev, const void *hca_cap #define mlx5_vport_get_other_func_general_cap(dev, vport, out) \ mlx5_vport_get_other_func_cap(dev, vport, out, MLX5_CAP_GENERAL) -int mlx5_vport_get_vhca_id(struct mlx5_core_dev *dev, u16 vport, u16 *vhca_id); - static inline u32 mlx5_sriov_get_vf_total_msix(struct pci_dev *pdev) { struct mlx5_core_dev *dev = pci_get_drvdata(pdev); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/cmd.c index 9c83753e4592..d447574d86fe 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/cmd.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/cmd.c @@ -1199,22 +1199,28 @@ out: int mlx5hws_cmd_query_gvmi(struct mlx5_core_dev *mdev, bool other_function, u16 vport_number, u16 *gvmi) { - bool ec_vf_func = other_function ? mlx5_core_is_ec_vf_vport(mdev, vport_number) : false; u32 in[MLX5_ST_SZ_DW(query_hca_cap_in)] = {}; int out_size; void *out; int err; + if (other_function) { + err = mlx5_vport_get_vhca_id(mdev, vport_number, gvmi); + if (!err) + return 0; + + mlx5_core_err(mdev, "Failed to get vport vhca id for vport %d\n", + vport_number); + return err; + } + + /* get vhca_id for `this` function */ out_size = MLX5_ST_SZ_BYTES(query_hca_cap_out); out = kzalloc(out_size, GFP_KERNEL); if (!out) return -ENOMEM; MLX5_SET(query_hca_cap_in, in, opcode, MLX5_CMD_OP_QUERY_HCA_CAP); - MLX5_SET(query_hca_cap_in, in, other_function, other_function); - MLX5_SET(query_hca_cap_in, in, function_id, - mlx5_vport_to_func_id(mdev, vport_number, ec_vf_func)); - MLX5_SET(query_hca_cap_in, in, ec_vf_function, ec_vf_func); MLX5_SET(query_hca_cap_in, in, op_mod, MLX5_SET_HCA_CAP_OP_MOD_GENERAL_DEVICE << 1 | HCA_CAP_OPMOD_GET_CUR); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_cmd.c index baefb9a3fa05..bf99b933fd14 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_cmd.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_cmd.c @@ -2,6 +2,7 @@ /* Copyright (c) 2019 Mellanox Technologies. */ #include "dr_types.h" +#include "eswitch.h" int mlx5dr_cmd_query_esw_vport_context(struct mlx5_core_dev *mdev, bool other_vport, @@ -34,21 +35,28 @@ int mlx5dr_cmd_query_esw_vport_context(struct mlx5_core_dev *mdev, int mlx5dr_cmd_query_gvmi(struct mlx5_core_dev *mdev, bool other_vport, u16 vport_number, u16 *gvmi) { - bool ec_vf_func = other_vport ? mlx5_core_is_ec_vf_vport(mdev, vport_number) : false; u32 in[MLX5_ST_SZ_DW(query_hca_cap_in)] = {}; int out_size; void *out; int err; + if (other_vport) { + err = mlx5_vport_get_vhca_id(mdev, vport_number, gvmi); + if (!err) + return 0; + + mlx5_core_err(mdev, "Failed to get vport vhca id for vport %d\n", + vport_number); + return err; + } + + /* get vhca_id for `this` function */ out_size = MLX5_ST_SZ_BYTES(query_hca_cap_out); out = kzalloc(out_size, GFP_KERNEL); if (!out) return -ENOMEM; MLX5_SET(query_hca_cap_in, in, opcode, MLX5_CMD_OP_QUERY_HCA_CAP); - MLX5_SET(query_hca_cap_in, in, other_function, other_vport); - MLX5_SET(query_hca_cap_in, in, function_id, mlx5_vport_to_func_id(mdev, vport_number, ec_vf_func)); - MLX5_SET(query_hca_cap_in, in, ec_vf_function, ec_vf_func); MLX5_SET(query_hca_cap_in, in, op_mod, MLX5_SET_HCA_CAP_OP_MOD_GENERAL_DEVICE << 1 | HCA_CAP_OPMOD_GET_CUR); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/vport.c b/drivers/net/ethernet/mellanox/mlx5/core/vport.c index 231bedc6a252..2ed2e530b07d 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/vport.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/vport.c @@ -1239,7 +1239,9 @@ int mlx5_vport_get_vhca_id(struct mlx5_core_dev *dev, u16 vport, u16 *vhca_id) void *hca_caps; int err; - *vhca_id = 0; + /* try get vhca_id via eswitch */ + if (mlx5_esw_vport_vhca_id(dev->priv.eswitch, vport, vhca_id)) + return 0; query_ctx = kzalloc(query_out_sz, GFP_KERNEL); if (!query_ctx) @@ -1256,6 +1258,7 @@ out_free: kfree(query_ctx); return err; } +EXPORT_SYMBOL_GPL(mlx5_vport_get_vhca_id); int mlx5_vport_set_other_func_cap(struct mlx5_core_dev *dev, const void *hca_cap, u16 vport, u16 opmod) diff --git a/include/linux/mlx5/vport.h b/include/linux/mlx5/vport.h index c36cc6d82926..c87b9507cfa1 100644 --- a/include/linux/mlx5/vport.h +++ b/include/linux/mlx5/vport.h @@ -135,4 +135,6 @@ int mlx5_nic_vport_unaffiliate_multiport(struct mlx5_core_dev *port_mdev); u64 mlx5_query_nic_system_image_guid(struct mlx5_core_dev *mdev); int mlx5_vport_get_other_func_cap(struct mlx5_core_dev *dev, u16 vport, void *out, u16 opmod); +int mlx5_vport_get_vhca_id(struct mlx5_core_dev *dev, u16 vport, u16 *vhca_id); + #endif /* __MLX5_VPORT_H__ */ From d0f110773d776f919f93ba2e7dc254f80a7297b2 Mon Sep 17 00:00:00 2001 From: Heiner Kallweit Date: Wed, 13 Aug 2025 21:18:03 +0200 Subject: [PATCH 0105/1536] net: phy: fixed: remove usage of a faux device A struct mii_bus doesn't need a parent, so we can simplify the code and remove using a faux device. Only difference is the following in sysfs under /sys/class/mdio_bus: old: fixed-0 -> '../../devices/faux/Fixed MDIO bus/mdio_bus/fixed-0' new: fixed-0 -> ../../devices/virtual/mdio_bus/fixed-0 Signed-off-by: Heiner Kallweit Link: https://patch.msgid.link/e9426bb9-f228-4b99-bc09-a80a958b5a93@gmail.com Signed-off-by: Jakub Kicinski --- drivers/net/phy/fixed_phy.c | 16 ++-------------- 1 file changed, 2 insertions(+), 14 deletions(-) diff --git a/drivers/net/phy/fixed_phy.c b/drivers/net/phy/fixed_phy.c index 033656d574b8..8ad49dc1105d 100644 --- a/drivers/net/phy/fixed_phy.c +++ b/drivers/net/phy/fixed_phy.c @@ -10,7 +10,6 @@ #include #include -#include #include #include #include @@ -40,7 +39,6 @@ struct fixed_phy { struct gpio_desc *link_gpiod; }; -static struct faux_device *fdev; static struct fixed_mdio_bus platform_fmb = { .phys = LIST_HEAD_INIT(platform_fmb.phys), }; @@ -317,20 +315,13 @@ static int __init fixed_mdio_bus_init(void) struct fixed_mdio_bus *fmb = &platform_fmb; int ret; - fdev = faux_device_create("Fixed MDIO bus", NULL, NULL); - if (!fdev) - return -ENODEV; - fmb->mii_bus = mdiobus_alloc(); - if (fmb->mii_bus == NULL) { - ret = -ENOMEM; - goto err_mdiobus_reg; - } + if (!fmb->mii_bus) + return -ENOMEM; snprintf(fmb->mii_bus->id, MII_BUS_ID_SIZE, "fixed-0"); fmb->mii_bus->name = "Fixed MDIO Bus"; fmb->mii_bus->priv = fmb; - fmb->mii_bus->parent = &fdev->dev; fmb->mii_bus->read = &fixed_mdio_read; fmb->mii_bus->write = &fixed_mdio_write; fmb->mii_bus->phy_mask = ~0; @@ -343,8 +334,6 @@ static int __init fixed_mdio_bus_init(void) err_mdiobus_alloc: mdiobus_free(fmb->mii_bus); -err_mdiobus_reg: - faux_device_destroy(fdev); return ret; } module_init(fixed_mdio_bus_init); @@ -356,7 +345,6 @@ static void __exit fixed_mdio_bus_exit(void) mdiobus_unregister(fmb->mii_bus); mdiobus_free(fmb->mii_bus); - faux_device_destroy(fdev); list_for_each_entry_safe(fp, tmp, &fmb->phys, node) { list_del(&fp->node); From 9e84de72aef9bcf0e751a0bff3ac91b0cf52366f Mon Sep 17 00:00:00 2001 From: Daniel Jurgens Date: Wed, 13 Aug 2025 22:19:55 +0300 Subject: [PATCH 0106/1536] net/mlx5: Query to see if host PF is disabled The host PF can be disabled, query firmware to check if the host PF of this function exists. Signed-off-by: Daniel Jurgens Reviewed-by: William Tu Signed-off-by: Tariq Toukan Link: https://patch.msgid.link/1755112796-467444-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski --- .../net/ethernet/mellanox/mlx5/core/eswitch.c | 23 +++++++++++++++++++ .../net/ethernet/mellanox/mlx5/core/eswitch.h | 1 + 2 files changed, 24 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c index 4917d185d0c3..31059fff30ec 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c @@ -1038,6 +1038,25 @@ const u32 *mlx5_esw_query_functions(struct mlx5_core_dev *dev) return ERR_PTR(err); } +static int mlx5_esw_host_functions_enabled_query(struct mlx5_eswitch *esw) +{ + const u32 *query_host_out; + + if (!mlx5_core_is_ecpf_esw_manager(esw->dev)) + return 0; + + query_host_out = mlx5_esw_query_functions(esw->dev); + if (IS_ERR(query_host_out)) + return PTR_ERR(query_host_out); + + esw->esw_funcs.host_funcs_disabled = + MLX5_GET(query_esw_functions_out, query_host_out, + host_params_context.host_pf_not_exist); + + kvfree(query_host_out); + return 0; +} + static void mlx5_eswitch_event_handler_register(struct mlx5_eswitch *esw) { if (esw->mode == MLX5_ESWITCH_OFFLOADS && mlx5_eswitch_is_funcs_handler(esw->dev)) { @@ -1874,6 +1893,10 @@ int mlx5_eswitch_init(struct mlx5_core_dev *dev) goto abort; } + err = mlx5_esw_host_functions_enabled_query(esw); + if (err) + goto abort; + err = mlx5_esw_vports_init(esw); if (err) goto abort; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h index b0b8ef3ec3c4..6d86db20f468 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h @@ -323,6 +323,7 @@ struct mlx5_host_work { struct mlx5_esw_functions { struct mlx5_nb nb; + bool host_funcs_disabled; u16 num_vfs; u16 num_ec_vfs; }; From 520369ef43a8504f9d54ee219bb6c692d2e40028 Mon Sep 17 00:00:00 2001 From: Daniel Jurgens Date: Wed, 13 Aug 2025 22:19:56 +0300 Subject: [PATCH 0107/1536] net/mlx5: Support disabling host PFs Some devices support disabling the physical function on the host. When this is configured the vports for the host functions do not exist. This patch checks if host functions are enabled before trying to access their vports. Signed-off-by: Daniel Jurgens Reviewed-by: William Tu Signed-off-by: Tariq Toukan Link: https://patch.msgid.link/1755112796-467444-3-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski --- .../net/ethernet/mellanox/mlx5/core/eswitch.c | 62 ++++++++++++------- .../net/ethernet/mellanox/mlx5/core/eswitch.h | 7 +++ .../mellanox/mlx5/core/eswitch_offloads.c | 34 +++++----- 3 files changed, 66 insertions(+), 37 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c index 31059fff30ec..3d533061311b 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c @@ -1297,17 +1297,19 @@ mlx5_eswitch_enable_pf_vf_vports(struct mlx5_eswitch *esw, esw->mode == MLX5_ESWITCH_LEGACY; /* Enable PF vport */ - if (pf_needed) { + if (pf_needed && mlx5_esw_host_functions_enabled(esw->dev)) { ret = mlx5_eswitch_load_pf_vf_vport(esw, MLX5_VPORT_PF, enabled_events); if (ret) return ret; } - /* Enable external host PF HCA */ - ret = host_pf_enable_hca(esw->dev); - if (ret) - goto pf_hca_err; + if (mlx5_esw_host_functions_enabled(esw->dev)) { + /* Enable external host PF HCA */ + ret = host_pf_enable_hca(esw->dev); + if (ret) + goto pf_hca_err; + } /* Enable ECPF vport */ if (mlx5_ecpf_vport_exists(esw->dev)) { @@ -1339,9 +1341,10 @@ ec_vf_err: if (mlx5_ecpf_vport_exists(esw->dev)) mlx5_eswitch_unload_pf_vf_vport(esw, MLX5_VPORT_ECPF); ecpf_err: - host_pf_disable_hca(esw->dev); + if (mlx5_esw_host_functions_enabled(esw->dev)) + host_pf_disable_hca(esw->dev); pf_hca_err: - if (pf_needed) + if (pf_needed && mlx5_esw_host_functions_enabled(esw->dev)) mlx5_eswitch_unload_pf_vf_vport(esw, MLX5_VPORT_PF); return ret; } @@ -1361,10 +1364,12 @@ void mlx5_eswitch_disable_pf_vf_vports(struct mlx5_eswitch *esw) mlx5_eswitch_unload_pf_vf_vport(esw, MLX5_VPORT_ECPF); } - host_pf_disable_hca(esw->dev); + if (mlx5_esw_host_functions_enabled(esw->dev)) + host_pf_disable_hca(esw->dev); - if (mlx5_core_is_ecpf_esw_manager(esw->dev) || - esw->mode == MLX5_ESWITCH_LEGACY) + if ((mlx5_core_is_ecpf_esw_manager(esw->dev) || + esw->mode == MLX5_ESWITCH_LEGACY) && + mlx5_esw_host_functions_enabled(esw->dev)) mlx5_eswitch_unload_pf_vf_vport(esw, MLX5_VPORT_PF); } @@ -1693,7 +1698,8 @@ int mlx5_esw_sf_max_hpf_functions(struct mlx5_core_dev *dev, u16 *max_sfs, u16 * void *hca_caps; int err; - if (!mlx5_core_is_ecpf(dev)) { + if (!mlx5_core_is_ecpf(dev) || + !mlx5_esw_host_functions_enabled(dev)) { *max_sfs = 0; return 0; } @@ -1769,21 +1775,23 @@ static int mlx5_esw_vports_init(struct mlx5_eswitch *esw) xa_init(&esw->vports); - err = mlx5_esw_vport_alloc(esw, idx, MLX5_VPORT_PF); - if (err) - goto err; - if (esw->first_host_vport == MLX5_VPORT_PF) - xa_set_mark(&esw->vports, idx, MLX5_ESW_VPT_HOST_FN); - idx++; - - for (i = 0; i < mlx5_core_max_vfs(dev); i++) { - err = mlx5_esw_vport_alloc(esw, idx, idx); + if (mlx5_esw_host_functions_enabled(dev)) { + err = mlx5_esw_vport_alloc(esw, idx, MLX5_VPORT_PF); if (err) goto err; - xa_set_mark(&esw->vports, idx, MLX5_ESW_VPT_VF); - xa_set_mark(&esw->vports, idx, MLX5_ESW_VPT_HOST_FN); + if (esw->first_host_vport == MLX5_VPORT_PF) + xa_set_mark(&esw->vports, idx, MLX5_ESW_VPT_HOST_FN); idx++; + for (i = 0; i < mlx5_core_max_vfs(dev); i++) { + err = mlx5_esw_vport_alloc(esw, idx, idx); + if (err) + goto err; + xa_set_mark(&esw->vports, idx, MLX5_ESW_VPT_VF); + xa_set_mark(&esw->vports, idx, MLX5_ESW_VPT_HOST_FN); + idx++; + } } + base_sf_num = mlx5_sf_start_function_id(dev); for (i = 0; i < mlx5_sf_max_functions(dev); i++) { err = mlx5_esw_vport_alloc(esw, idx, base_sf_num + i); @@ -1883,6 +1891,7 @@ int mlx5_eswitch_init(struct mlx5_core_dev *dev) goto free_esw; esw->dev = dev; + dev->priv.eswitch = esw; esw->manager_vport = mlx5_eswitch_manager_vport(dev); esw->first_host_vport = mlx5_eswitch_first_host_vport_num(dev); @@ -1901,7 +1910,6 @@ int mlx5_eswitch_init(struct mlx5_core_dev *dev) if (err) goto abort; - dev->priv.eswitch = esw; err = esw_offloads_init(esw); if (err) goto reps_err; @@ -2433,3 +2441,11 @@ void mlx5_eswitch_unblock_ipsec(struct mlx5_core_dev *dev) dev->num_ipsec_offloads--; mutex_unlock(&esw->state_lock); } + +bool mlx5_esw_host_functions_enabled(const struct mlx5_core_dev *dev) +{ + if (!dev->priv.eswitch) + return true; + + return !dev->priv.eswitch->esw_funcs.host_funcs_disabled; +} diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h index 6d86db20f468..6c72080ac2a1 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h @@ -899,6 +899,7 @@ int mlx5_esw_ipsec_vf_packet_offload_set(struct mlx5_eswitch *esw, struct mlx5_v bool enable); int mlx5_esw_ipsec_vf_packet_offload_supported(struct mlx5_core_dev *dev, u16 vport_num); +bool mlx5_esw_host_functions_enabled(const struct mlx5_core_dev *dev); #else /* CONFIG_MLX5_ESWITCH */ /* eswitch API stubs */ static inline int mlx5_eswitch_init(struct mlx5_core_dev *dev) { return 0; } @@ -966,6 +967,12 @@ static inline bool mlx5_eswitch_block_ipsec(struct mlx5_core_dev *dev) } static inline void mlx5_eswitch_unblock_ipsec(struct mlx5_core_dev *dev) {} + +static inline bool +mlx5_esw_host_functions_enabled(const struct mlx5_core_dev *dev) +{ + return true; +} #endif /* CONFIG_MLX5_ESWITCH */ #endif /* __MLX5_ESWITCH_H__ */ diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c index bee906661282..8ec9c0e0f4b9 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c @@ -1213,7 +1213,8 @@ static int esw_add_fdb_peer_miss_rules(struct mlx5_eswitch *esw, misc = MLX5_ADDR_OF(fte_match_param, spec->match_value, misc_parameters); - if (mlx5_core_is_ecpf_esw_manager(peer_dev)) { + if (mlx5_core_is_ecpf_esw_manager(peer_dev) && + mlx5_esw_host_functions_enabled(peer_dev)) { peer_vport = mlx5_eswitch_get_vport(peer_esw, MLX5_VPORT_PF); esw_set_peer_miss_rule_source_port(esw, peer_esw, spec, MLX5_VPORT_PF); @@ -1239,19 +1240,21 @@ static int esw_add_fdb_peer_miss_rules(struct mlx5_eswitch *esw, flows[peer_vport->index] = flow; } - mlx5_esw_for_each_vf_vport(peer_esw, i, peer_vport, - mlx5_core_max_vfs(peer_dev)) { - esw_set_peer_miss_rule_source_port(esw, - peer_esw, - spec, peer_vport->vport); + if (mlx5_esw_host_functions_enabled(esw->dev)) { + mlx5_esw_for_each_vf_vport(peer_esw, i, peer_vport, + mlx5_core_max_vfs(peer_dev)) { + esw_set_peer_miss_rule_source_port(esw, peer_esw, + spec, + peer_vport->vport); - flow = mlx5_add_flow_rules(mlx5_eswitch_get_slow_fdb(esw), - spec, &flow_act, &dest, 1); - if (IS_ERR(flow)) { - err = PTR_ERR(flow); - goto add_vf_flow_err; + flow = mlx5_add_flow_rules(mlx5_eswitch_get_slow_fdb(esw), + spec, &flow_act, &dest, 1); + if (IS_ERR(flow)) { + err = PTR_ERR(flow); + goto add_vf_flow_err; + } + flows[peer_vport->index] = flow; } - flows[peer_vport->index] = flow; } if (mlx5_core_ec_sriov_enabled(peer_dev)) { @@ -1301,7 +1304,9 @@ add_vf_flow_err: mlx5_del_flow_rules(flows[peer_vport->index]); } add_ecpf_flow_err: - if (mlx5_core_is_ecpf_esw_manager(peer_dev)) { + + if (mlx5_core_is_ecpf_esw_manager(peer_dev) && + mlx5_esw_host_functions_enabled(peer_dev)) { peer_vport = mlx5_eswitch_get_vport(peer_esw, MLX5_VPORT_PF); mlx5_del_flow_rules(flows[peer_vport->index]); } @@ -4059,7 +4064,8 @@ mlx5_eswitch_vport_has_rep(const struct mlx5_eswitch *esw, u16 vport_num) { /* Currently, only ECPF based device has representor for host PF. */ if (vport_num == MLX5_VPORT_PF && - !mlx5_core_is_ecpf_esw_manager(esw->dev)) + (!mlx5_core_is_ecpf_esw_manager(esw->dev) || + !mlx5_esw_host_functions_enabled(esw->dev))) return false; if (vport_num == MLX5_VPORT_ECPF && From 8159572936392b514e404bab2d60e47cebd5acfe Mon Sep 17 00:00:00 2001 From: Thorsten Blum Date: Thu, 14 Aug 2025 20:05:14 +0200 Subject: [PATCH 0108/1536] net: Space: Replace memset(0) + strscpy() with strscpy_pad() Replace memset(0) followed by strscpy() with strscpy_pad() to improve netdev_boot_setup_add(). This avoids zeroing the memory before copying the string and ensures the destination buffer is only written to once, simplifying the code and improving efficiency. Signed-off-by: Thorsten Blum Link: https://patch.msgid.link/20250814180514.251000-2-thorsten.blum@linux.dev Signed-off-by: Jakub Kicinski --- drivers/net/Space.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/net/Space.c b/drivers/net/Space.c index dc50797a2ed0..c01e2c2f7d6c 100644 --- a/drivers/net/Space.c +++ b/drivers/net/Space.c @@ -67,8 +67,7 @@ static int netdev_boot_setup_add(char *name, struct ifmap *map) s = dev_boot_setup; for (i = 0; i < NETDEV_BOOT_SETUP_MAX; i++) { if (s[i].name[0] == '\0' || s[i].name[0] == ' ') { - memset(s[i].name, 0, sizeof(s[i].name)); - strscpy(s[i].name, name, IFNAMSIZ); + strscpy_pad(s[i].name, name); memcpy(&s[i].map, map, sizeof(s[i].map)); break; } From b826bf795564ddef6402cf2cb522ae035bd117ae Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Wed, 13 Aug 2025 11:04:45 +0100 Subject: [PATCH 0109/1536] net: phy: realtek: fix RTL8211F wake-on-lan support Implement Wake-on-Lan for RTL8211F correctly. The existing implementation has multiple issues: 1. It assumes that Wake-on-Lan can always be used, whether or not the interrupt is wired, and whether or not the interrupt is capable of waking the system. This breaks the ability for MAC drivers to detect whether the PHY WoL is functional. 2. switching the interrupt pin in the .set_wol() method to PMEB mode immediately silences link-state interrupts, which breaks phylib when interrupts are being used rather than polling mode. 3. the code claiming to "reset WOL status" was doing nothing of the sort. Bit 15 in page 0xd8a register 17 controls WoL reset, and needs to be pulsed low to reset the WoL state. This bit was always written as '1', resulting in no reset. 4. not resetting WoL state results in the PMEB pin remaining asserted, which in turn leads to an interrupt storm. Only resetting the WoL state in .set_wol() is not sufficient. 5. PMEB mode does not allow software detection of the wake-up event as there is no status bit to indicate we received the WoL packet. 6. across reboots of at least the Jetson Xavier NX system, the WoL configuration is preserved. Fix all of these issues by essentially rewriting the support. We: 1. clear the WoL event enable register at probe time. 2. detect whether we can support wake-up by having a valid interrupt, and the "wakeup-source" property in DT. If we can, then we mark the MDIO device as wakeup capable, and associate the interrupt with the wakeup source. 3. arrange for the get_wol() and set_wol() implementations to handle the case where the MDIO device has not been marked as wakeup capable (thereby returning no WoL support, and refusing to enable WoL support.) 4. avoid switching to PMEB mode, instead using INTB mode with the interrupt enable, reconfiguring the interrupt enables at suspend time, and restoring their original state at resume time (we track the state of the interrupt enable register in .config_intr() register.) 5. move WoL reset from .set_wol() to the suspend function to ensure that WoL state is cleared prior to suspend. This is necessary after the PME interrupt has been enabled as a second WoL packet will not re-raise a previously cleared PME interrupt. 6. when a PME interrupt (for wakeup) is asserted, pass this to the PM wakeup so it knows which device woke the system. This fixes WoL support in the Realtek RTL8211F driver when used on the nVidia Jetson Xavier NX platform, and needs to be applied before stmmac patches which allow these platforms to forward the ethtool WoL commands to the Realtek PHY. Signed-off-by: Russell King (Oracle) Reviewed-by: Andrew Lunn Link: https://patch.msgid.link/E1um8Ld-008jxD-Mc@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski --- drivers/net/phy/realtek/realtek_main.c | 172 ++++++++++++++++++++----- 1 file changed, 140 insertions(+), 32 deletions(-) diff --git a/drivers/net/phy/realtek/realtek_main.c b/drivers/net/phy/realtek/realtek_main.c index a7541899e327..904fa8b1aa03 100644 --- a/drivers/net/phy/realtek/realtek_main.c +++ b/drivers/net/phy/realtek/realtek_main.c @@ -10,6 +10,7 @@ #include #include #include +#include #include #include #include @@ -31,6 +32,7 @@ #define RTL821x_INER 0x12 #define RTL8211B_INER_INIT 0x6400 #define RTL8211E_INER_LINK_STATUS BIT(10) +#define RTL8211F_INER_PME BIT(7) #define RTL8211F_INER_LINK_STATUS BIT(4) #define RTL821x_INSR 0x13 @@ -96,17 +98,13 @@ #define RTL8211F_RXCR 0x15 #define RTL8211F_RX_DELAY BIT(3) -/* RTL8211F WOL interrupt configuration */ -#define RTL8211F_INTBCR_PAGE 0xd40 -#define RTL8211F_INTBCR 0x16 -#define RTL8211F_INTBCR_INTB_PMEB BIT(5) - /* RTL8211F WOL settings */ -#define RTL8211F_WOL_SETTINGS_PAGE 0xd8a +#define RTL8211F_WOL_PAGE 0xd8a #define RTL8211F_WOL_SETTINGS_EVENTS 16 #define RTL8211F_WOL_EVENT_MAGIC BIT(12) -#define RTL8211F_WOL_SETTINGS_STATUS 17 -#define RTL8211F_WOL_STATUS_RESET (BIT(15) | 0x1fff) +#define RTL8211F_WOL_RST_RMSQ 17 +#define RTL8211F_WOL_RG_RSTB BIT(15) +#define RTL8211F_WOL_RMSQ 0x1fff /* RTL8211F Unique phyiscal and multicast address (WOL) */ #define RTL8211F_PHYSICAL_ADDR_PAGE 0xd8c @@ -172,7 +170,8 @@ struct rtl821x_priv { u16 phycr2; bool has_phycr2; struct clk *clk; - u32 saved_wolopts; + /* rtl8211f */ + u16 iner; }; static int rtl821x_read_page(struct phy_device *phydev) @@ -255,6 +254,34 @@ static int rtl821x_probe(struct phy_device *phydev) return 0; } +static int rtl8211f_probe(struct phy_device *phydev) +{ + struct device *dev = &phydev->mdio.dev; + int ret; + + ret = rtl821x_probe(phydev); + if (ret < 0) + return ret; + + /* Disable all PME events */ + ret = phy_write_paged(phydev, RTL8211F_WOL_PAGE, + RTL8211F_WOL_SETTINGS_EVENTS, 0); + if (ret < 0) + return ret; + + /* Mark this PHY as wakeup capable and register the interrupt as a + * wakeup IRQ if the PHY is marked as a wakeup source in firmware, + * and the interrupt is valid. + */ + if (device_property_read_bool(dev, "wakeup-source") && + phy_interrupt_is_valid(phydev)) { + device_set_wakeup_capable(dev, true); + devm_pm_set_wake_irq(dev, phydev->irq); + } + + return ret; +} + static int rtl8201_ack_interrupt(struct phy_device *phydev) { int err; @@ -352,6 +379,7 @@ static int rtl8211e_config_intr(struct phy_device *phydev) static int rtl8211f_config_intr(struct phy_device *phydev) { + struct rtl821x_priv *priv = phydev->priv; u16 val; int err; @@ -362,8 +390,10 @@ static int rtl8211f_config_intr(struct phy_device *phydev) val = RTL8211F_INER_LINK_STATUS; err = phy_write_paged(phydev, 0xa42, RTL821x_INER, val); + if (err == 0) + priv->iner = val; } else { - val = 0; + priv->iner = val = 0; err = phy_write_paged(phydev, 0xa42, RTL821x_INER, val); if (err) return err; @@ -426,21 +456,34 @@ static irqreturn_t rtl8211f_handle_interrupt(struct phy_device *phydev) return IRQ_NONE; } - if (!(irq_status & RTL8211F_INER_LINK_STATUS)) - return IRQ_NONE; + if (irq_status & RTL8211F_INER_LINK_STATUS) { + phy_trigger_machine(phydev); + return IRQ_HANDLED; + } - phy_trigger_machine(phydev); + if (irq_status & RTL8211F_INER_PME) { + pm_wakeup_event(&phydev->mdio.dev, 0); + return IRQ_HANDLED; + } - return IRQ_HANDLED; + return IRQ_NONE; } static void rtl8211f_get_wol(struct phy_device *dev, struct ethtool_wolinfo *wol) { int wol_events; + /* If the PHY is not capable of waking the system, then WoL can not + * be supported. + */ + if (!device_can_wakeup(&dev->mdio.dev)) { + wol->supported = 0; + return; + } + wol->supported = WAKE_MAGIC; - wol_events = phy_read_paged(dev, RTL8211F_WOL_SETTINGS_PAGE, RTL8211F_WOL_SETTINGS_EVENTS); + wol_events = phy_read_paged(dev, RTL8211F_WOL_PAGE, RTL8211F_WOL_SETTINGS_EVENTS); if (wol_events < 0) return; @@ -453,6 +496,9 @@ static int rtl8211f_set_wol(struct phy_device *dev, struct ethtool_wolinfo *wol) const u8 *mac_addr = dev->attached_dev->dev_addr; int oldpage; + if (!device_can_wakeup(&dev->mdio.dev)) + return -EOPNOTSUPP; + oldpage = phy_save_page(dev); if (oldpage < 0) goto err; @@ -464,25 +510,23 @@ static int rtl8211f_set_wol(struct phy_device *dev, struct ethtool_wolinfo *wol) __phy_write(dev, RTL8211F_PHYSICAL_ADDR_WORD1, mac_addr[3] << 8 | (mac_addr[2])); __phy_write(dev, RTL8211F_PHYSICAL_ADDR_WORD2, mac_addr[5] << 8 | (mac_addr[4])); - /* Enable magic packet matching and reset WOL status */ - rtl821x_write_page(dev, RTL8211F_WOL_SETTINGS_PAGE); + /* Enable magic packet matching */ + rtl821x_write_page(dev, RTL8211F_WOL_PAGE); __phy_write(dev, RTL8211F_WOL_SETTINGS_EVENTS, RTL8211F_WOL_EVENT_MAGIC); - __phy_write(dev, RTL8211F_WOL_SETTINGS_STATUS, RTL8211F_WOL_STATUS_RESET); - - /* Enable the WOL interrupt */ - rtl821x_write_page(dev, RTL8211F_INTBCR_PAGE); - __phy_set_bits(dev, RTL8211F_INTBCR, RTL8211F_INTBCR_INTB_PMEB); + /* Set the maximum packet size, and assert WoL reset */ + __phy_write(dev, RTL8211F_WOL_RST_RMSQ, RTL8211F_WOL_RMSQ); } else { - /* Disable the WOL interrupt */ - rtl821x_write_page(dev, RTL8211F_INTBCR_PAGE); - __phy_clear_bits(dev, RTL8211F_INTBCR, RTL8211F_INTBCR_INTB_PMEB); - - /* Disable magic packet matching and reset WOL status */ - rtl821x_write_page(dev, RTL8211F_WOL_SETTINGS_PAGE); + /* Disable magic packet matching */ + rtl821x_write_page(dev, RTL8211F_WOL_PAGE); __phy_write(dev, RTL8211F_WOL_SETTINGS_EVENTS, 0); - __phy_write(dev, RTL8211F_WOL_SETTINGS_STATUS, RTL8211F_WOL_STATUS_RESET); + + /* Place WoL in reset */ + __phy_clear_bits(dev, RTL8211F_WOL_RST_RMSQ, + RTL8211F_WOL_RG_RSTB); } + device_set_wakeup_enable(&dev->mdio.dev, !!(wol->wolopts & WAKE_MAGIC)); + err: return phy_restore_page(dev, oldpage, 0); } @@ -628,6 +672,52 @@ static int rtl821x_suspend(struct phy_device *phydev) return ret; } +static int rtl8211f_suspend(struct phy_device *phydev) +{ + u16 wol_rst; + int ret; + + ret = rtl821x_suspend(phydev); + if (ret < 0) + return ret; + + /* If a PME event is enabled, then configure the interrupt for + * PME events only, disabling link interrupt. We avoid switching + * to PMEB mode as we don't have a status bit for that. + */ + if (device_may_wakeup(&phydev->mdio.dev)) { + ret = phy_write_paged(phydev, 0xa42, RTL821x_INER, + RTL8211F_INER_PME); + if (ret < 0) + goto err; + + /* Read the INSR to clear any pending interrupt */ + phy_read_paged(phydev, RTL8211F_INSR_PAGE, RTL8211F_INSR); + + /* Reset the WoL to ensure that an event is picked up. + * Unless we do this, even if we receive another packet, + * we may not have a PME interrupt raised. + */ + ret = phy_read_paged(phydev, RTL8211F_WOL_PAGE, + RTL8211F_WOL_RST_RMSQ); + if (ret < 0) + goto err; + + wol_rst = ret & ~RTL8211F_WOL_RG_RSTB; + ret = phy_write_paged(phydev, RTL8211F_WOL_PAGE, + RTL8211F_WOL_RST_RMSQ, wol_rst); + if (ret < 0) + goto err; + + wol_rst |= RTL8211F_WOL_RG_RSTB; + ret = phy_write_paged(phydev, RTL8211F_WOL_PAGE, + RTL8211F_WOL_RST_RMSQ, wol_rst); + } + +err: + return ret; +} + static int rtl821x_resume(struct phy_device *phydev) { struct rtl821x_priv *priv = phydev->priv; @@ -645,6 +735,24 @@ static int rtl821x_resume(struct phy_device *phydev) return 0; } +static int rtl8211f_resume(struct phy_device *phydev) +{ + struct rtl821x_priv *priv = phydev->priv; + int ret; + + ret = rtl821x_resume(phydev); + if (ret < 0) + return ret; + + /* If the device was programmed for a PME event, restore the interrupt + * enable so phylib can receive link state interrupts. + */ + if (device_may_wakeup(&phydev->mdio.dev)) + ret = phy_write_paged(phydev, 0xa42, RTL821x_INER, priv->iner); + + return ret; +} + static int rtl8211x_led_hw_is_supported(struct phy_device *phydev, u8 index, unsigned long rules) { @@ -1627,15 +1735,15 @@ static struct phy_driver realtek_drvs[] = { }, { PHY_ID_MATCH_EXACT(0x001cc916), .name = "RTL8211F Gigabit Ethernet", - .probe = rtl821x_probe, + .probe = rtl8211f_probe, .config_init = &rtl8211f_config_init, .read_status = rtlgen_read_status, .config_intr = &rtl8211f_config_intr, .handle_interrupt = rtl8211f_handle_interrupt, .set_wol = rtl8211f_set_wol, .get_wol = rtl8211f_get_wol, - .suspend = rtl821x_suspend, - .resume = rtl821x_resume, + .suspend = rtl8211f_suspend, + .resume = rtl8211f_resume, .read_page = rtl821x_read_page, .write_page = rtl821x_write_page, .flags = PHY_ALWAYS_CALL_SUSPEND, From a510980e740cc58be5733d4c15b22c7822e84c71 Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Wed, 13 Aug 2025 12:21:19 +0100 Subject: [PATCH 0110/1536] dt-bindings: net: realtek,rtl82xx: document wakeup-source property The RTL8211F PHY has two modes for a single INTB/PMEB pin: 1. INTB mode, where it signals interrupts to the CPU, which can include wake-on-LAN events. 2. PMEB mode, where it only signals a wake-on-LAN event, which may either be a latched logic low until software manually clears the WoL state, or pulsed mode. In the case of (1), there is no way to know whether the interrupt to which the PHY is connected is capable of waking the system. In the case of (2), there would be no interrupt property in the PHY's DT description, and thus there is nothing to describe whether the pin is even wired to anything such as a power management controller. There is a "wakeup-source" property which can be applied to any device - see Documentation/devicetree/bindings/power/wakeup-source.txt Case 1 above matches example 2 in this document, and case 2 above matches example 3. Therefore, it seems reasonable to make use of this existing specification, albiet it hasn't been converted to YAML. Document the wakeup-source property in the device description, which will indicate that the PHY has been wired up in such a way that it can wake the system from a low power state. We will use this in a rewrite of the existing broken Wake-on-Lan code that was merged during the 6.16 merge window to support case 1. Case 2 can be added to the driver later without needing to further alter the DT description. To be clear, the existing Wake-on-Lan code that was recently merged has multiple functional issues. Signed-off-by: Russell King (Oracle) Acked-by: Conor Dooley Link: https://patch.msgid.link/E1um9Xj-008kBx-72@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski --- Documentation/devicetree/bindings/net/realtek,rtl82xx.yaml | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/net/realtek,rtl82xx.yaml b/Documentation/devicetree/bindings/net/realtek,rtl82xx.yaml index d248a08a2136..2b5697bd7c5d 100644 --- a/Documentation/devicetree/bindings/net/realtek,rtl82xx.yaml +++ b/Documentation/devicetree/bindings/net/realtek,rtl82xx.yaml @@ -45,12 +45,16 @@ properties: description: Disable CLKOUT clock, CLKOUT clock default is enabled after hardware reset. - realtek,aldps-enable: type: boolean description: Enable ALDPS mode, ALDPS mode default is disabled after hardware reset. + wakeup-source: + type: boolean + description: + Enable Wake-on-LAN support for the RTL8211F PHY. + unevaluatedProperties: false allOf: From 60cbe71fdba16d39de24bd71e1638810da5ae7ee Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Wed, 13 Aug 2025 23:43:03 +0200 Subject: [PATCH 0111/1536] net: dsa: Move KS8995 to the DSA subsystem By reading the datasheets for the KS8995 it is obvious that this is a 100 Mbit DSA switch. Let us start the refactoring by moving it to the DSA subsystem to preserve development history. Verified that the chip still probes the same after this patch provided CONFIG_HAVE_NET_DSA, CONFIG_NET_DSA and CONFIG_DSA_KS8995 are selected. Signed-off-by: Linus Walleij Reviewed-by: Andrew Lunn Link: https://patch.msgid.link/20250813-ks8995-to-dsa-v1-1-75c359ede3a5@linaro.org Signed-off-by: Jakub Kicinski --- drivers/net/dsa/Kconfig | 7 +++++++ drivers/net/dsa/Makefile | 1 + drivers/net/{phy/spi_ks8995.c => dsa/ks8995.c} | 0 drivers/net/phy/Kconfig | 4 ---- drivers/net/phy/Makefile | 1 - 5 files changed, 8 insertions(+), 5 deletions(-) rename drivers/net/{phy/spi_ks8995.c => dsa/ks8995.c} (100%) diff --git a/drivers/net/dsa/Kconfig b/drivers/net/dsa/Kconfig index ec759f8cb0e2..49326a9a0cff 100644 --- a/drivers/net/dsa/Kconfig +++ b/drivers/net/dsa/Kconfig @@ -99,6 +99,13 @@ config NET_DSA_RZN1_A5PSW This driver supports the A5PSW switch, which is embedded in Renesas RZ/N1 SoC. +config NET_DSA_KS8995 + tristate "Micrel KS8995 family 5-ports 10/100 Ethernet switches" + depends on SPI + help + This driver supports the Micrel KS8995 family of 10/100 Mbit ethernet + switches, managed over SPI. + config NET_DSA_SMSC_LAN9303 tristate select NET_DSA_TAG_LAN9303 diff --git a/drivers/net/dsa/Makefile b/drivers/net/dsa/Makefile index cb9a97340e58..23dbdf1a36a8 100644 --- a/drivers/net/dsa/Makefile +++ b/drivers/net/dsa/Makefile @@ -5,6 +5,7 @@ obj-$(CONFIG_NET_DSA_LOOP) += dsa_loop.o ifdef CONFIG_NET_DSA_LOOP obj-$(CONFIG_FIXED_PHY) += dsa_loop_bdinfo.o endif +obj-$(CONFIG_NET_DSA_KS8995) += ks8995.o obj-$(CONFIG_NET_DSA_LANTIQ_GSWIP) += lantiq_gswip.o obj-$(CONFIG_NET_DSA_MT7530) += mt7530.o obj-$(CONFIG_NET_DSA_MT7530_MDIO) += mt7530-mdio.o diff --git a/drivers/net/phy/spi_ks8995.c b/drivers/net/dsa/ks8995.c similarity index 100% rename from drivers/net/phy/spi_ks8995.c rename to drivers/net/dsa/ks8995.c diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig index 28acc6392cfc..a7fb1d7cae94 100644 --- a/drivers/net/phy/Kconfig +++ b/drivers/net/phy/Kconfig @@ -465,7 +465,3 @@ config XILINX_GMII2RGMII Ethernet physical media devices and the Gigabit Ethernet controller. endif # PHYLIB - -config MICREL_KS8995MA - tristate "Micrel KS8995MA 5-ports 10/100 managed Ethernet switch" - depends on SPI diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile index b4795aaf9c1c..402a33d559de 100644 --- a/drivers/net/phy/Makefile +++ b/drivers/net/phy/Makefile @@ -72,7 +72,6 @@ obj-$(CONFIG_MAXLINEAR_GPHY) += mxl-gpy.o obj-$(CONFIG_MAXLINEAR_86110_PHY) += mxl-86110.o obj-y += mediatek/ obj-$(CONFIG_MESON_GXL_PHY) += meson-gxl.o -obj-$(CONFIG_MICREL_KS8995MA) += spi_ks8995.o obj-$(CONFIG_MICREL_PHY) += micrel.o obj-$(CONFIG_MICROCHIP_PHY) += microchip.o obj-$(CONFIG_MICROCHIP_PHY_RDS_PTP) += microchip_rds_ptp.o From ccf29cb84972f97c98ac31f7d6fb1680dc5d3816 Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Wed, 13 Aug 2025 23:43:04 +0200 Subject: [PATCH 0112/1536] net: dsa: ks8995: Add proper RESET delay According to the datasheet we need to wait 100us before accessing any registers in the KS8995 after a reset de-assertion. Add this delay, if and only if we obtained a GPIO descriptor, otherwise it is just a pointless delay. Signed-off-by: Linus Walleij Reviewed-by: Andrew Lunn Link: https://patch.msgid.link/20250813-ks8995-to-dsa-v1-2-75c359ede3a5@linaro.org Signed-off-by: Jakub Kicinski --- drivers/net/dsa/ks8995.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/drivers/net/dsa/ks8995.c b/drivers/net/dsa/ks8995.c index d135b061d810..bdee8c62315f 100644 --- a/drivers/net/dsa/ks8995.c +++ b/drivers/net/dsa/ks8995.c @@ -438,9 +438,15 @@ static int ks8995_probe(struct spi_device *spi) if (err) return err; - /* de-assert switch reset */ - /* FIXME: this likely requires a delay */ - gpiod_set_value_cansleep(ks->reset_gpio, 0); + if (ks->reset_gpio) { + /* + * If a reset line was obtained, wait for 100us after + * de-asserting RESET before accessing any registers, see + * the KS8995MA datasheet, page 44. + */ + gpiod_set_value_cansleep(ks->reset_gpio, 0); + udelay(100); + } spi_set_drvdata(spi, ks); From d3f2b604a1f92d9ee63dfd5a2313e0024d184682 Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Wed, 13 Aug 2025 23:43:05 +0200 Subject: [PATCH 0113/1536] net: dsa: ks8995: Delete sysfs register access There is some sysfs file to read and write registers randomly in the ks8995 driver. The contemporary way to achieve the same thing is to implement regmap abstractions, if needed. Delete this and implement regmap later if we want to be able to inspect individual registers. Signed-off-by: Linus Walleij Reviewed-by: Andrew Lunn Link: https://patch.msgid.link/20250813-ks8995-to-dsa-v1-3-75c359ede3a5@linaro.org Signed-off-by: Jakub Kicinski --- drivers/net/dsa/ks8995.c | 47 ---------------------------------------- 1 file changed, 47 deletions(-) diff --git a/drivers/net/dsa/ks8995.c b/drivers/net/dsa/ks8995.c index bdee8c62315f..36f6b2d87712 100644 --- a/drivers/net/dsa/ks8995.c +++ b/drivers/net/dsa/ks8995.c @@ -288,30 +288,6 @@ static int ks8995_reset(struct ks8995_switch *ks) return ks8995_start(ks); } -static ssize_t ks8995_registers_read(struct file *filp, struct kobject *kobj, - const struct bin_attribute *bin_attr, char *buf, loff_t off, size_t count) -{ - struct device *dev; - struct ks8995_switch *ks8995; - - dev = kobj_to_dev(kobj); - ks8995 = dev_get_drvdata(dev); - - return ks8995_read(ks8995, buf, off, count); -} - -static ssize_t ks8995_registers_write(struct file *filp, struct kobject *kobj, - const struct bin_attribute *bin_attr, char *buf, loff_t off, size_t count) -{ - struct device *dev; - struct ks8995_switch *ks8995; - - dev = kobj_to_dev(kobj); - ks8995 = dev_get_drvdata(dev); - - return ks8995_write(ks8995, buf, off, count); -} - /* ks8995_get_revision - get chip revision * @ks: pointer to switch instance * @@ -395,16 +371,6 @@ err_out: return err; } -static const struct bin_attribute ks8995_registers_attr = { - .attr = { - .name = "registers", - .mode = 0600, - }, - .size = KS8995_REGS_SIZE, - .read = ks8995_registers_read, - .write = ks8995_registers_write, -}; - /* ------------------------------------------------------------------------ */ static int ks8995_probe(struct spi_device *spi) { @@ -462,21 +428,10 @@ static int ks8995_probe(struct spi_device *spi) if (err) return err; - memcpy(&ks->regs_attr, &ks8995_registers_attr, sizeof(ks->regs_attr)); - ks->regs_attr.size = ks->chip->regs_size; - err = ks8995_reset(ks); if (err) return err; - sysfs_attr_init(&ks->regs_attr.attr); - err = sysfs_create_bin_file(&spi->dev.kobj, &ks->regs_attr); - if (err) { - dev_err(&spi->dev, "unable to create sysfs file, err=%d\n", - err); - return err; - } - dev_info(&spi->dev, "%s device found, Chip ID:%x, Revision:%x\n", ks->chip->name, ks->chip->chip_id, ks->revision_id); @@ -487,8 +442,6 @@ static void ks8995_remove(struct spi_device *spi) { struct ks8995_switch *ks = spi_get_drvdata(spi); - sysfs_remove_bin_file(&spi->dev.kobj, &ks->regs_attr); - /* assert reset */ gpiod_set_value_cansleep(ks->reset_gpio, 1); } From a7fe8b266f659624309c69208be0410584bb9980 Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Wed, 13 Aug 2025 23:43:06 +0200 Subject: [PATCH 0114/1536] net: dsa: ks8995: Add basic switch set-up We start to extend the KS8995 driver by simply registering it as a DSA device and implementing a few switch callbacks for STP set-up and such to begin with. No special tags or other advanced stuff: we use DSA_TAG_NONE and rely on the default set-up in the switch with the special DSA tags turned off. This makes the switch wire up properly to all its PHY's and simple bridge traffic works properly. After this the bridge DT bindings are respected, ports and their PHYs get connected to the switch and react appropriately through the phylib when cables are plugged in etc. Tested like this in a hacky OpenWrt image: Bring up conduit interface manually: ixp4xx_eth c8009000.ethernet eth0: eth0: link up, speed 100 Mb/s, full duplex spi-ks8995 spi0.0: enable port 0 spi-ks8995 spi0.0: set KS8995_REG_PC2 for port 0 to 06 spi-ks8995 spi0.0 lan1: configuring for phy/mii link mode spi-ks8995 spi0.0 lan1: Link is Up - 100Mbps/Full - flow control rx/tx PING 169.254.1.1 (169.254.1.1): 56 data bytes 64 bytes from 169.254.1.1: seq=0 ttl=64 time=1.629 ms 64 bytes from 169.254.1.1: seq=1 ttl=64 time=0.951 ms I also tested SSH from the device to the host and it works fine. It also works fine to ping the device from the host and to SSH into the device from the host. This brings the ks8995 driver to a reasonable state where it can be used from the current device tree bindings and the existing device trees in the kernel. Signed-off-by: Linus Walleij Reviewed-by: Andrew Lunn Link: https://patch.msgid.link/20250813-ks8995-to-dsa-v1-4-75c359ede3a5@linaro.org Signed-off-by: Jakub Kicinski --- drivers/net/dsa/Kconfig | 1 + drivers/net/dsa/ks8995.c | 398 ++++++++++++++++++++++++++++++++++++++- 2 files changed, 396 insertions(+), 3 deletions(-) diff --git a/drivers/net/dsa/Kconfig b/drivers/net/dsa/Kconfig index 49326a9a0cff..202a35d8d061 100644 --- a/drivers/net/dsa/Kconfig +++ b/drivers/net/dsa/Kconfig @@ -102,6 +102,7 @@ config NET_DSA_RZN1_A5PSW config NET_DSA_KS8995 tristate "Micrel KS8995 family 5-ports 10/100 Ethernet switches" depends on SPI + select NET_DSA_TAG_NONE help This driver supports the Micrel KS8995 family of 10/100 Mbit ethernet switches, managed over SPI. diff --git a/drivers/net/dsa/ks8995.c b/drivers/net/dsa/ks8995.c index 36f6b2d87712..5c4c83e00477 100644 --- a/drivers/net/dsa/ks8995.c +++ b/drivers/net/dsa/ks8995.c @@ -3,6 +3,7 @@ * SPI driver for Micrel/Kendin KS8995M and KSZ8864RMN ethernet switches * * Copyright (C) 2008 Gabor Juhos + * Copyright (C) 2025 Linus Walleij * * This file was based on: drivers/spi/at25.c * Copyright (C) 2006 David Brownell @@ -10,6 +11,9 @@ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt +#include +#include +#include #include #include #include @@ -17,8 +21,8 @@ #include #include #include - #include +#include #define DRV_VERSION "0.1.1" #define DRV_DESC "Micrel KS8995 Ethernet switch SPI driver" @@ -29,18 +33,59 @@ #define KS8995_REG_ID1 0x01 /* Chip ID1 */ #define KS8995_REG_GC0 0x02 /* Global Control 0 */ + +#define KS8995_GC0_P5_PHY BIT(3) /* Port 5 PHY enabled */ + #define KS8995_REG_GC1 0x03 /* Global Control 1 */ #define KS8995_REG_GC2 0x04 /* Global Control 2 */ + +#define KS8995_GC2_HUGE BIT(2) /* Huge packet support */ +#define KS8995_GC2_LEGAL BIT(1) /* Legal size override */ + #define KS8995_REG_GC3 0x05 /* Global Control 3 */ #define KS8995_REG_GC4 0x06 /* Global Control 4 */ + +#define KS8995_GC4_10BT BIT(4) /* Force switch to 10Mbit */ +#define KS8995_GC4_MII_FLOW BIT(5) /* MII full-duplex flow control enable */ +#define KS8995_GC4_MII_HD BIT(6) /* MII half-duplex mode enable */ + #define KS8995_REG_GC5 0x07 /* Global Control 5 */ #define KS8995_REG_GC6 0x08 /* Global Control 6 */ #define KS8995_REG_GC7 0x09 /* Global Control 7 */ #define KS8995_REG_GC8 0x0a /* Global Control 8 */ #define KS8995_REG_GC9 0x0b /* Global Control 9 */ -#define KS8995_REG_PC(p, r) ((0x10 * p) + r) /* Port Control */ -#define KS8995_REG_PS(p, r) ((0x10 * p) + r + 0xe) /* Port Status */ +#define KS8995_GC9_SPECIAL BIT(0) /* Special tagging mode (DSA) */ + +/* In DSA the ports 1-4 are numbered 0-3 and the CPU port is port 4 */ +#define KS8995_REG_PC(p, r) (0x10 + (0x10 * (p)) + (r)) /* Port Control */ +#define KS8995_REG_PS(p, r) (0x1e + (0x10 * (p)) + (r)) /* Port Status */ + +#define KS8995_REG_PC0 0x00 /* Port Control 0 */ +#define KS8995_REG_PC1 0x01 /* Port Control 1 */ +#define KS8995_REG_PC2 0x02 /* Port Control 2 */ +#define KS8995_REG_PC3 0x03 /* Port Control 3 */ +#define KS8995_REG_PC4 0x04 /* Port Control 4 */ +#define KS8995_REG_PC5 0x05 /* Port Control 5 */ +#define KS8995_REG_PC6 0x06 /* Port Control 6 */ +#define KS8995_REG_PC7 0x07 /* Port Control 7 */ +#define KS8995_REG_PC8 0x08 /* Port Control 8 */ +#define KS8995_REG_PC9 0x09 /* Port Control 9 */ +#define KS8995_REG_PC10 0x0a /* Port Control 10 */ +#define KS8995_REG_PC11 0x0b /* Port Control 11 */ +#define KS8995_REG_PC12 0x0c /* Port Control 12 */ +#define KS8995_REG_PC13 0x0d /* Port Control 13 */ + +#define KS8995_PC0_TAG_INS BIT(2) /* Enable tag insertion on port */ +#define KS8995_PC0_TAG_REM BIT(1) /* Enable tag removal on port */ +#define KS8995_PC0_PRIO_EN BIT(0) /* Enable priority handling */ + +#define KS8995_PC2_TXEN BIT(2) /* Enable TX on port */ +#define KS8995_PC2_RXEN BIT(1) /* Enable RX on port */ +#define KS8995_PC2_LEARN_DIS BIT(0) /* Disable learning on port */ + +#define KS8995_PC13_TXDIS BIT(6) /* Disable transmitter */ +#define KS8995_PC13_PWDN BIT(3) /* Power down */ #define KS8995_REG_TPC0 0x60 /* TOS Priority Control 0 */ #define KS8995_REG_TPC1 0x61 /* TOS Priority Control 1 */ @@ -91,6 +136,8 @@ #define KS8995_CMD_WRITE 0x02U #define KS8995_CMD_READ 0x03U +#define KS8995_CPU_PORT 4 +#define KS8995_NUM_PORTS 5 /* 5 ports including the CPU port */ #define KS8995_RESET_DELAY 10 /* usec */ enum ks8995_chip_variant { @@ -138,11 +185,14 @@ static const struct ks8995_chip_params ks8995_chip[] = { struct ks8995_switch { struct spi_device *spi; + struct device *dev; + struct dsa_switch *ds; struct mutex lock; struct gpio_desc *reset_gpio; struct bin_attribute regs_attr; const struct ks8995_chip_params *chip; int revision_id; + unsigned int max_mtu[KS8995_NUM_PORTS]; }; static const struct spi_device_id ks8995_id[] = { @@ -371,6 +421,327 @@ err_out: return err; } +static int ks8995_check_config(struct ks8995_switch *ks) +{ + int ret; + u8 val; + + ret = ks8995_read_reg(ks, KS8995_REG_GC0, &val); + if (ret) { + dev_err(ks->dev, "failed to read KS8995_REG_GC0\n"); + return ret; + } + + dev_dbg(ks->dev, "port 5 PHY %senabled\n", + (val & KS8995_GC0_P5_PHY) ? "" : "not "); + + val |= KS8995_GC0_P5_PHY; + ret = ks8995_write_reg(ks, KS8995_REG_GC0, val); + if (ret) + dev_err(ks->dev, "failed to set KS8995_REG_GC0\n"); + + dev_dbg(ks->dev, "set KS8995_REG_GC0 to 0x%02x\n", val); + + return 0; +} + +static void +ks8995_mac_config(struct phylink_config *config, unsigned int mode, + const struct phylink_link_state *state) +{ +} + +static void +ks8995_mac_link_up(struct phylink_config *config, struct phy_device *phydev, + unsigned int mode, phy_interface_t interface, + int speed, int duplex, bool tx_pause, bool rx_pause) +{ + struct dsa_port *dp = dsa_phylink_to_port(config); + struct ks8995_switch *ks = dp->ds->priv; + int port = dp->index; + int ret; + u8 val; + + /* Allow forcing the mode on the fixed CPU port, no autonegotiation. + * We assume autonegotiation works on the PHY-facing ports. + */ + if (port != KS8995_CPU_PORT) + return; + + dev_dbg(ks->dev, "MAC link up on CPU port (%d)\n", port); + + ret = ks8995_read_reg(ks, KS8995_REG_GC4, &val); + if (ret) { + dev_err(ks->dev, "failed to read KS8995_REG_GC4\n"); + return; + } + + /* Conjure port config */ + switch (speed) { + case SPEED_10: + dev_dbg(ks->dev, "set switch MII to 100Mbit mode\n"); + val |= KS8995_GC4_10BT; + break; + case SPEED_100: + default: + dev_dbg(ks->dev, "set switch MII to 100Mbit mode\n"); + val &= ~KS8995_GC4_10BT; + break; + } + + if (duplex == DUPLEX_HALF) { + dev_dbg(ks->dev, "set switch MII to half duplex\n"); + val |= KS8995_GC4_MII_HD; + } else { + dev_dbg(ks->dev, "set switch MII to full duplex\n"); + val &= ~KS8995_GC4_MII_HD; + } + + dev_dbg(ks->dev, "set KS8995_REG_GC4 to %02x\n", val); + + /* Enable the CPU port */ + ret = ks8995_write_reg(ks, KS8995_REG_GC4, val); + if (ret) + dev_err(ks->dev, "failed to set KS8995_REG_GC4\n"); +} + +static void +ks8995_mac_link_down(struct phylink_config *config, unsigned int mode, + phy_interface_t interface) +{ + struct dsa_port *dp = dsa_phylink_to_port(config); + struct ks8995_switch *ks = dp->ds->priv; + int port = dp->index; + + if (port != KS8995_CPU_PORT) + return; + + dev_dbg(ks->dev, "MAC link down on CPU port (%d)\n", port); + + /* Disable the CPU port */ +} + +static const struct phylink_mac_ops ks8995_phylink_mac_ops = { + .mac_config = ks8995_mac_config, + .mac_link_up = ks8995_mac_link_up, + .mac_link_down = ks8995_mac_link_down, +}; + +static enum +dsa_tag_protocol ks8995_get_tag_protocol(struct dsa_switch *ds, + int port, + enum dsa_tag_protocol mp) +{ + /* This switch actually uses the 6 byte KS8995 protocol */ + return DSA_TAG_PROTO_NONE; +} + +static int ks8995_setup(struct dsa_switch *ds) +{ + return 0; +} + +static int ks8995_port_enable(struct dsa_switch *ds, int port, + struct phy_device *phy) +{ + struct ks8995_switch *ks = ds->priv; + + dev_dbg(ks->dev, "enable port %d\n", port); + + return 0; +} + +static void ks8995_port_disable(struct dsa_switch *ds, int port) +{ + struct ks8995_switch *ks = ds->priv; + + dev_dbg(ks->dev, "disable port %d\n", port); +} + +static int ks8995_port_pre_bridge_flags(struct dsa_switch *ds, int port, + struct switchdev_brport_flags flags, + struct netlink_ext_ack *extack) +{ + /* We support enabling/disabling learning */ + if (flags.mask & ~(BR_LEARNING)) + return -EINVAL; + + return 0; +} + +static int ks8995_port_bridge_flags(struct dsa_switch *ds, int port, + struct switchdev_brport_flags flags, + struct netlink_ext_ack *extack) +{ + struct ks8995_switch *ks = ds->priv; + int ret; + u8 val; + + if (flags.mask & BR_LEARNING) { + ret = ks8995_read_reg(ks, KS8995_REG_PC(port, KS8995_REG_PC2), &val); + if (ret) { + dev_err(ks->dev, "failed to read KS8995_REG_PC2 on port %d\n", port); + return ret; + } + + if (flags.val & BR_LEARNING) + val &= ~KS8995_PC2_LEARN_DIS; + else + val |= KS8995_PC2_LEARN_DIS; + + ret = ks8995_write_reg(ks, KS8995_REG_PC(port, KS8995_REG_PC2), val); + if (ret) { + dev_err(ks->dev, "failed to write KS8995_REG_PC2 on port %d\n", port); + return ret; + } + } + + return 0; +} + +static void ks8995_port_stp_state_set(struct dsa_switch *ds, int port, u8 state) +{ + struct ks8995_switch *ks = ds->priv; + int ret; + u8 val; + + ret = ks8995_read_reg(ks, KS8995_REG_PC(port, KS8995_REG_PC2), &val); + if (ret) { + dev_err(ks->dev, "failed to read KS8995_REG_PC2 on port %d\n", port); + return; + } + + /* Set the bits for the different STP states in accordance with + * the datasheet, pages 36-37 "Spanning tree support". + */ + switch (state) { + case BR_STATE_DISABLED: + case BR_STATE_BLOCKING: + case BR_STATE_LISTENING: + val &= ~KS8995_PC2_TXEN; + val &= ~KS8995_PC2_RXEN; + val |= KS8995_PC2_LEARN_DIS; + break; + case BR_STATE_LEARNING: + val &= ~KS8995_PC2_TXEN; + val &= ~KS8995_PC2_RXEN; + val &= ~KS8995_PC2_LEARN_DIS; + break; + case BR_STATE_FORWARDING: + val |= KS8995_PC2_TXEN; + val |= KS8995_PC2_RXEN; + val &= ~KS8995_PC2_LEARN_DIS; + break; + default: + dev_err(ks->dev, "unknown bridge state requested\n"); + return; + } + + ret = ks8995_write_reg(ks, KS8995_REG_PC(port, KS8995_REG_PC2), val); + if (ret) { + dev_err(ks->dev, "failed to write KS8995_REG_PC2 on port %d\n", port); + return; + } + + dev_dbg(ks->dev, "set KS8995_REG_PC2 for port %d to %02x\n", port, val); +} + +static void ks8995_phylink_get_caps(struct dsa_switch *dsa, int port, + struct phylink_config *config) +{ + unsigned long *interfaces = config->supported_interfaces; + + if (port == KS8995_CPU_PORT) + __set_bit(PHY_INTERFACE_MODE_MII, interfaces); + + if (port <= 3) { + /* Internal PHYs */ + __set_bit(PHY_INTERFACE_MODE_INTERNAL, interfaces); + /* phylib default */ + __set_bit(PHY_INTERFACE_MODE_MII, interfaces); + } + + config->mac_capabilities = MAC_SYM_PAUSE | MAC_10 | MAC_100; +} + +/* Huge packet support up to 1916 byte packages "inclusive" + * which means that tags are included. If the bit is not set + * it is 1536 bytes "inclusive". We present the length without + * tags or ethernet headers. The setting affects all ports. + */ +static int ks8995_change_mtu(struct dsa_switch *ds, int port, int new_mtu) +{ + struct ks8995_switch *ks = ds->priv; + unsigned int max_mtu; + int ret; + u8 val; + int i; + + ks->max_mtu[port] = new_mtu; + + /* Roof out the MTU for the entire switch to the greatest + * common denominator: the biggest set for any one port will + * be the biggest MTU for the switch. + */ + max_mtu = ETH_DATA_LEN; + for (i = 0; i < KS8995_NUM_PORTS; i++) { + if (ks->max_mtu[i] > max_mtu) + max_mtu = ks->max_mtu[i]; + } + + /* Translate to layer 2 size. + * Add ethernet and (possible) VLAN headers, and checksum to the size. + * For ETH_DATA_LEN (1500 bytes) this will add up to 1522 bytes. + */ + max_mtu += VLAN_ETH_HLEN; + max_mtu += ETH_FCS_LEN; + + ret = ks8995_read_reg(ks, KS8995_REG_GC2, &val); + if (ret) { + dev_err(ks->dev, "failed to read KS8995_REG_GC2\n"); + return ret; + } + + if (max_mtu <= 1522) { + val &= ~KS8995_GC2_HUGE; + val &= ~KS8995_GC2_LEGAL; + } else if (max_mtu > 1522 && max_mtu <= 1536) { + /* This accepts packets up to 1536 bytes */ + val &= ~KS8995_GC2_HUGE; + val |= KS8995_GC2_LEGAL; + } else { + /* This accepts packets up to 1916 bytes */ + val |= KS8995_GC2_HUGE; + val |= KS8995_GC2_LEGAL; + } + + dev_dbg(ks->dev, "new max MTU %d bytes (inclusive)\n", max_mtu); + + ret = ks8995_write_reg(ks, KS8995_REG_GC2, val); + if (ret) + dev_err(ks->dev, "failed to set KS8995_REG_GC2\n"); + + return ret; +} + +static int ks8995_get_max_mtu(struct dsa_switch *ds, int port) +{ + return 1916 - ETH_HLEN - ETH_FCS_LEN; +} + +static const struct dsa_switch_ops ks8995_ds_ops = { + .get_tag_protocol = ks8995_get_tag_protocol, + .setup = ks8995_setup, + .port_pre_bridge_flags = ks8995_port_pre_bridge_flags, + .port_bridge_flags = ks8995_port_bridge_flags, + .port_enable = ks8995_port_enable, + .port_disable = ks8995_port_disable, + .port_stp_state_set = ks8995_port_stp_state_set, + .port_change_mtu = ks8995_change_mtu, + .port_max_mtu = ks8995_get_max_mtu, + .phylink_get_caps = ks8995_phylink_get_caps, +}; + /* ------------------------------------------------------------------------ */ static int ks8995_probe(struct spi_device *spi) { @@ -389,6 +760,7 @@ static int ks8995_probe(struct spi_device *spi) mutex_init(&ks->lock); ks->spi = spi; + ks->dev = &spi->dev; ks->chip = &ks8995_chip[variant]; ks->reset_gpio = devm_gpiod_get_optional(&spi->dev, "reset", @@ -435,6 +807,25 @@ static int ks8995_probe(struct spi_device *spi) dev_info(&spi->dev, "%s device found, Chip ID:%x, Revision:%x\n", ks->chip->name, ks->chip->chip_id, ks->revision_id); + err = ks8995_check_config(ks); + if (err) + return err; + + ks->ds = devm_kzalloc(&spi->dev, sizeof(*ks->ds), GFP_KERNEL); + if (!ks->ds) + return -ENOMEM; + + ks->ds->dev = &spi->dev; + ks->ds->num_ports = KS8995_NUM_PORTS; + ks->ds->ops = &ks8995_ds_ops; + ks->ds->phylink_mac_ops = &ks8995_phylink_mac_ops; + ks->ds->priv = ks; + + err = dsa_register_switch(ks->ds); + if (err) + return dev_err_probe(&spi->dev, err, + "unable to register DSA switch\n"); + return 0; } @@ -442,6 +833,7 @@ static void ks8995_remove(struct spi_device *spi) { struct ks8995_switch *ks = spi_get_drvdata(spi); + dsa_unregister_switch(ks->ds); /* assert reset */ gpiod_set_value_cansleep(ks->reset_gpio, 1); } From 661bfb4699f8cb283014861d3fc35195df4eead0 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Thu, 14 Aug 2025 19:23:29 -0700 Subject: [PATCH 0115/1536] nfc: s3fwrn5: Use SHA-1 library instead of crypto_shash Now that a SHA-1 library API is available, use it instead of crypto_shash. This is simpler and faster. Signed-off-by: Eric Biggers Reviewed-by: Krzysztof Kozlowski Link: https://patch.msgid.link/20250815022329.28672-1-ebiggers@kernel.org Signed-off-by: Jakub Kicinski --- drivers/nfc/s3fwrn5/Kconfig | 3 +-- drivers/nfc/s3fwrn5/firmware.c | 17 +---------------- 2 files changed, 2 insertions(+), 18 deletions(-) diff --git a/drivers/nfc/s3fwrn5/Kconfig b/drivers/nfc/s3fwrn5/Kconfig index 8a6b1a79de25..96386b73fa2b 100644 --- a/drivers/nfc/s3fwrn5/Kconfig +++ b/drivers/nfc/s3fwrn5/Kconfig @@ -1,8 +1,7 @@ # SPDX-License-Identifier: GPL-2.0-only config NFC_S3FWRN5 tristate - select CRYPTO - select CRYPTO_HASH + select CRYPTO_LIB_SHA1 help Core driver for Samsung S3FWRN5 NFC chip. Contains core utilities of chip. It's intended to be used by PHYs to avoid duplicating lots diff --git a/drivers/nfc/s3fwrn5/firmware.c b/drivers/nfc/s3fwrn5/firmware.c index 781cdbcac104..64d61b2a715a 100644 --- a/drivers/nfc/s3fwrn5/firmware.c +++ b/drivers/nfc/s3fwrn5/firmware.c @@ -8,7 +8,6 @@ #include #include -#include #include #include "s3fwrn5.h" @@ -411,27 +410,13 @@ int s3fwrn5_fw_download(struct s3fwrn5_fw_info *fw_info) struct device *dev = &fw_info->ndev->nfc_dev->dev; struct s3fwrn5_fw_image *fw = &fw_info->fw; u8 hash_data[SHA1_DIGEST_SIZE]; - struct crypto_shash *tfm; u32 image_size, off; int ret; image_size = fw_info->sector_size * fw->image_sectors; /* Compute SHA of firmware data */ - - tfm = crypto_alloc_shash("sha1", 0, 0); - if (IS_ERR(tfm)) { - dev_err(dev, "Cannot allocate shash (code=%pe)\n", tfm); - return PTR_ERR(tfm); - } - - ret = crypto_shash_tfm_digest(tfm, fw->image, image_size, hash_data); - - crypto_free_shash(tfm); - if (ret) { - dev_err(dev, "Cannot compute hash (code=%d)\n", ret); - return ret; - } + sha1(fw->image, image_size, hash_data); /* Firmware update process */ From 1fb39d4c23b1e2b16bfd46261f4e83d42c1ff0a4 Mon Sep 17 00:00:00 2001 From: Li RongQing Date: Fri, 15 Aug 2025 09:56:19 +0800 Subject: [PATCH 0116/1536] eth: nfp: Remove u64_stats_update_begin()/end() for stats fetch This place is fetching the stats, u64_stats_update_begin()/end() should not be used, and the fetcher of stats is in the same context as the updater of the stats, so don't need any protection Signed-off-by: Li RongQing Link: https://patch.msgid.link/20250815015619.2713-1-lirongqing@baidu.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/netronome/nfp/nfd3/dp.c | 16 ++++------------ drivers/net/ethernet/netronome/nfp/nfdk/dp.c | 16 ++++------------ 2 files changed, 8 insertions(+), 24 deletions(-) diff --git a/drivers/net/ethernet/netronome/nfp/nfd3/dp.c b/drivers/net/ethernet/netronome/nfp/nfd3/dp.c index 08086eb76996..91a227929a5f 100644 --- a/drivers/net/ethernet/netronome/nfp/nfd3/dp.c +++ b/drivers/net/ethernet/netronome/nfp/nfd3/dp.c @@ -1169,14 +1169,10 @@ int nfp_nfd3_poll(struct napi_struct *napi, int budget) if (r_vec->nfp_net->rx_coalesce_adapt_on && r_vec->rx_ring) { struct dim_sample dim_sample = {}; - unsigned int start; u64 pkts, bytes; - do { - start = u64_stats_fetch_begin(&r_vec->rx_sync); - pkts = r_vec->rx_pkts; - bytes = r_vec->rx_bytes; - } while (u64_stats_fetch_retry(&r_vec->rx_sync, start)); + pkts = r_vec->rx_pkts; + bytes = r_vec->rx_bytes; dim_update_sample(r_vec->event_ctr, pkts, bytes, &dim_sample); net_dim(&r_vec->rx_dim, &dim_sample); @@ -1184,14 +1180,10 @@ int nfp_nfd3_poll(struct napi_struct *napi, int budget) if (r_vec->nfp_net->tx_coalesce_adapt_on && r_vec->tx_ring) { struct dim_sample dim_sample = {}; - unsigned int start; u64 pkts, bytes; - do { - start = u64_stats_fetch_begin(&r_vec->tx_sync); - pkts = r_vec->tx_pkts; - bytes = r_vec->tx_bytes; - } while (u64_stats_fetch_retry(&r_vec->tx_sync, start)); + pkts = r_vec->tx_pkts; + bytes = r_vec->tx_bytes; dim_update_sample(r_vec->event_ctr, pkts, bytes, &dim_sample); net_dim(&r_vec->tx_dim, &dim_sample); diff --git a/drivers/net/ethernet/netronome/nfp/nfdk/dp.c b/drivers/net/ethernet/netronome/nfp/nfdk/dp.c index ab3cd06ed63e..ee0db3d5fd66 100644 --- a/drivers/net/ethernet/netronome/nfp/nfdk/dp.c +++ b/drivers/net/ethernet/netronome/nfp/nfdk/dp.c @@ -1279,14 +1279,10 @@ int nfp_nfdk_poll(struct napi_struct *napi, int budget) if (r_vec->nfp_net->rx_coalesce_adapt_on && r_vec->rx_ring) { struct dim_sample dim_sample = {}; - unsigned int start; u64 pkts, bytes; - do { - start = u64_stats_fetch_begin(&r_vec->rx_sync); - pkts = r_vec->rx_pkts; - bytes = r_vec->rx_bytes; - } while (u64_stats_fetch_retry(&r_vec->rx_sync, start)); + pkts = r_vec->rx_pkts; + bytes = r_vec->rx_bytes; dim_update_sample(r_vec->event_ctr, pkts, bytes, &dim_sample); net_dim(&r_vec->rx_dim, &dim_sample); @@ -1294,14 +1290,10 @@ int nfp_nfdk_poll(struct napi_struct *napi, int budget) if (r_vec->nfp_net->tx_coalesce_adapt_on && r_vec->tx_ring) { struct dim_sample dim_sample = {}; - unsigned int start; u64 pkts, bytes; - do { - start = u64_stats_fetch_begin(&r_vec->tx_sync); - pkts = r_vec->tx_pkts; - bytes = r_vec->tx_bytes; - } while (u64_stats_fetch_retry(&r_vec->tx_sync, start)); + pkts = r_vec->tx_pkts; + bytes = r_vec->tx_bytes; dim_update_sample(r_vec->event_ctr, pkts, bytes, &dim_sample); net_dim(&r_vec->tx_dim, &dim_sample); From b55f7295d600b4cd487ef2e9332e30d6933be78c Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Thu, 14 Aug 2025 19:07:05 -0700 Subject: [PATCH 0117/1536] ppp: mppe: Use SHA-1 library instead of crypto_shash Now that a SHA-1 library API is available, use it instead of crypto_shash. This is simpler and faster. Signed-off-by: Eric Biggers Link: https://patch.msgid.link/20250815020705.23055-1-ebiggers@kernel.org Signed-off-by: Jakub Kicinski --- drivers/net/ppp/Kconfig | 3 +- drivers/net/ppp/ppp_mppe.c | 108 +++++++------------------------------ 2 files changed, 19 insertions(+), 92 deletions(-) diff --git a/drivers/net/ppp/Kconfig b/drivers/net/ppp/Kconfig index 8c9ed1889d1a..a1806b4b84be 100644 --- a/drivers/net/ppp/Kconfig +++ b/drivers/net/ppp/Kconfig @@ -85,9 +85,8 @@ config PPP_FILTER config PPP_MPPE tristate "PPP MPPE compression (encryption)" depends on PPP - select CRYPTO - select CRYPTO_SHA1 select CRYPTO_LIB_ARC4 + select CRYPTO_LIB_SHA1 help Support for the MPPE Encryption protocol, as employed by the Microsoft Point-to-Point Tunneling Protocol. diff --git a/drivers/net/ppp/ppp_mppe.c b/drivers/net/ppp/ppp_mppe.c index bcc1eaedf58f..630cbf71c147 100644 --- a/drivers/net/ppp/ppp_mppe.c +++ b/drivers/net/ppp/ppp_mppe.c @@ -43,7 +43,7 @@ */ #include -#include +#include #include #include #include @@ -55,7 +55,6 @@ #include #include #include -#include #include #include "ppp_mppe.h" @@ -67,31 +66,15 @@ MODULE_ALIAS("ppp-compress-" __stringify(CI_MPPE)); MODULE_VERSION("1.0.2"); #define SHA1_PAD_SIZE 40 - -/* - * kernel crypto API needs its arguments to be in kmalloc'd memory, not in the module - * static data area. That means sha_pad needs to be kmalloc'd. - */ - -struct sha_pad { - unsigned char sha_pad1[SHA1_PAD_SIZE]; - unsigned char sha_pad2[SHA1_PAD_SIZE]; -}; -static struct sha_pad *sha_pad; - -static inline void sha_pad_init(struct sha_pad *shapad) -{ - memset(shapad->sha_pad1, 0x00, sizeof(shapad->sha_pad1)); - memset(shapad->sha_pad2, 0xF2, sizeof(shapad->sha_pad2)); -} +static const u8 sha_pad1[SHA1_PAD_SIZE] = { 0 }; +static const u8 sha_pad2[SHA1_PAD_SIZE] = { [0 ... SHA1_PAD_SIZE - 1] = 0xF2 }; /* * State for an MPPE (de)compressor. */ struct ppp_mppe_state { struct arc4_ctx arc4; - struct shash_desc *sha1; - unsigned char *sha1_digest; + unsigned char sha1_digest[SHA1_DIGEST_SIZE]; unsigned char master_key[MPPE_MAX_KEY_LEN]; unsigned char session_key[MPPE_MAX_KEY_LEN]; unsigned keylen; /* key length in bytes */ @@ -130,16 +113,14 @@ struct ppp_mppe_state { */ static void get_new_key_from_sha(struct ppp_mppe_state * state) { - crypto_shash_init(state->sha1); - crypto_shash_update(state->sha1, state->master_key, - state->keylen); - crypto_shash_update(state->sha1, sha_pad->sha_pad1, - sizeof(sha_pad->sha_pad1)); - crypto_shash_update(state->sha1, state->session_key, - state->keylen); - crypto_shash_update(state->sha1, sha_pad->sha_pad2, - sizeof(sha_pad->sha_pad2)); - crypto_shash_final(state->sha1, state->sha1_digest); + struct sha1_ctx ctx; + + sha1_init(&ctx); + sha1_update(&ctx, state->master_key, state->keylen); + sha1_update(&ctx, sha_pad1, sizeof(sha_pad1)); + sha1_update(&ctx, state->session_key, state->keylen); + sha1_update(&ctx, sha_pad2, sizeof(sha_pad2)); + sha1_final(&ctx, state->sha1_digest); } /* @@ -171,39 +152,15 @@ static void mppe_rekey(struct ppp_mppe_state * state, int initial_key) static void *mppe_alloc(unsigned char *options, int optlen) { struct ppp_mppe_state *state; - struct crypto_shash *shash; - unsigned int digestsize; if (optlen != CILEN_MPPE + sizeof(state->master_key) || options[0] != CI_MPPE || options[1] != CILEN_MPPE || fips_enabled) - goto out; + return NULL; state = kzalloc(sizeof(*state), GFP_KERNEL); if (state == NULL) - goto out; - - - shash = crypto_alloc_shash("sha1", 0, 0); - if (IS_ERR(shash)) - goto out_free; - - state->sha1 = kmalloc(sizeof(*state->sha1) + - crypto_shash_descsize(shash), - GFP_KERNEL); - if (!state->sha1) { - crypto_free_shash(shash); - goto out_free; - } - state->sha1->tfm = shash; - - digestsize = crypto_shash_digestsize(shash); - if (digestsize < MPPE_MAX_KEY_LEN) - goto out_free; - - state->sha1_digest = kmalloc(digestsize, GFP_KERNEL); - if (!state->sha1_digest) - goto out_free; + return NULL; /* Save keys. */ memcpy(state->master_key, &options[CILEN_MPPE], @@ -217,16 +174,6 @@ static void *mppe_alloc(unsigned char *options, int optlen) */ return (void *)state; - -out_free: - kfree(state->sha1_digest); - if (state->sha1) { - crypto_free_shash(state->sha1->tfm); - kfree_sensitive(state->sha1); - } - kfree(state); -out: - return NULL; } /* @@ -235,12 +182,8 @@ out: static void mppe_free(void *arg) { struct ppp_mppe_state *state = (struct ppp_mppe_state *) arg; - if (state) { - kfree(state->sha1_digest); - crypto_free_shash(state->sha1->tfm); - kfree_sensitive(state->sha1); - kfree_sensitive(state); - } + + kfree_sensitive(state); } /* @@ -649,31 +592,17 @@ static struct compressor ppp_mppe = { .comp_extra = MPPE_PAD, }; -/* - * ppp_mppe_init() - * - * Prior to allowing load, try to load the arc4 and sha1 crypto - * libraries. The actual use will be allocated later, but - * this way the module will fail to insmod if they aren't available. - */ - static int __init ppp_mppe_init(void) { int answer; - if (fips_enabled || !crypto_has_ahash("sha1", 0, CRYPTO_ALG_ASYNC)) - return -ENODEV; - sha_pad = kmalloc(sizeof(struct sha_pad), GFP_KERNEL); - if (!sha_pad) - return -ENOMEM; - sha_pad_init(sha_pad); + if (fips_enabled) + return -ENODEV; answer = ppp_register_compressor(&ppp_mppe); if (answer == 0) printk(KERN_INFO "PPP MPPE Compression module registered\n"); - else - kfree(sha_pad); return answer; } @@ -681,7 +610,6 @@ static int __init ppp_mppe_init(void) static void __exit ppp_mppe_cleanup(void) { ppp_unregister_compressor(&ppp_mppe); - kfree(sha_pad); } module_init(ppp_mppe_init); From ab4ee77ed9bddfc1c3461b42ef4875068928d2e6 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Fri, 15 Aug 2025 09:52:42 -0700 Subject: [PATCH 0118/1536] docs: netdev: refine the clean-up patch examples We discourage sending trivial patches to clean up checkpatch warnings. There are other tools which lead to patches of similarly low value like some coccicheck warnings. The warnings are useful for new code but fixing them in the existing code base is a waste of review time. Broaden the example given in the doc a little bit. Link: https://patch.msgid.link/20250815165242.124240-1-kuba@kernel.org Signed-off-by: Jakub Kicinski --- Documentation/process/maintainer-netdev.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/process/maintainer-netdev.rst b/Documentation/process/maintainer-netdev.rst index e1755610b4bc..989192421cc9 100644 --- a/Documentation/process/maintainer-netdev.rst +++ b/Documentation/process/maintainer-netdev.rst @@ -407,7 +407,7 @@ Clean-up patches Netdev discourages patches which perform simple clean-ups, which are not in the context of other work. For example: -* Addressing ``checkpatch.pl`` warnings +* Addressing ``checkpatch.pl``, and other trivial coding style warnings * Addressing :ref:`Local variable ordering` issues * Conversions to device-managed APIs (``devm_`` helpers) From 4490d075c2d989f3403059cbd30775ae21dbd754 Mon Sep 17 00:00:00 2001 From: Qianfeng Rong Date: Sat, 16 Aug 2025 17:06:52 +0800 Subject: [PATCH 0119/1536] eth: intel: use vmalloc_array() to simplify code Remove array_size() calls and replace vmalloc() with vmalloc_array() to simplify the code and maintain consistency with existing kmalloc_array() usage. vmalloc_array() is also optimized better, resulting in less instructions being used. Signed-off-by: Qianfeng Rong Link: https://patch.msgid.link/20250816090659.117699-2-rongqianfeng@vivo.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c | 2 +- drivers/net/ethernet/intel/igb/igb_ethtool.c | 8 ++++---- drivers/net/ethernet/intel/igc/igc_ethtool.c | 8 ++++---- drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c | 2 +- drivers/net/ethernet/intel/ixgbevf/ethtool.c | 6 +++--- 5 files changed, 13 insertions(+), 13 deletions(-) diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c b/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c index 1954a04460d1..bf2029144c1d 100644 --- a/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c +++ b/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c @@ -560,7 +560,7 @@ static int fm10k_set_ringparam(struct net_device *netdev, /* allocate temporary buffer to store rings in */ i = max_t(int, interface->num_tx_queues, interface->num_rx_queues); - temp_ring = vmalloc(array_size(i, sizeof(struct fm10k_ring))); + temp_ring = vmalloc_array(i, sizeof(struct fm10k_ring)); if (!temp_ring) { err = -ENOMEM; diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c b/drivers/net/ethernet/intel/igb/igb_ethtool.c index 92ef33459aec..51d5cb6599ed 100644 --- a/drivers/net/ethernet/intel/igb/igb_ethtool.c +++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c @@ -920,11 +920,11 @@ static int igb_set_ringparam(struct net_device *netdev, } if (adapter->num_tx_queues > adapter->num_rx_queues) - temp_ring = vmalloc(array_size(sizeof(struct igb_ring), - adapter->num_tx_queues)); + temp_ring = vmalloc_array(adapter->num_tx_queues, + sizeof(struct igb_ring)); else - temp_ring = vmalloc(array_size(sizeof(struct igb_ring), - adapter->num_rx_queues)); + temp_ring = vmalloc_array(adapter->num_rx_queues, + sizeof(struct igb_ring)); if (!temp_ring) { err = -ENOMEM; diff --git a/drivers/net/ethernet/intel/igc/igc_ethtool.c b/drivers/net/ethernet/intel/igc/igc_ethtool.c index ecb35b693ce5..f3e7218ba6f3 100644 --- a/drivers/net/ethernet/intel/igc/igc_ethtool.c +++ b/drivers/net/ethernet/intel/igc/igc_ethtool.c @@ -627,11 +627,11 @@ igc_ethtool_set_ringparam(struct net_device *netdev, } if (adapter->num_tx_queues > adapter->num_rx_queues) - temp_ring = vmalloc(array_size(sizeof(struct igc_ring), - adapter->num_tx_queues)); + temp_ring = vmalloc_array(adapter->num_tx_queues, + sizeof(struct igc_ring)); else - temp_ring = vmalloc(array_size(sizeof(struct igc_ring), - adapter->num_rx_queues)); + temp_ring = vmalloc_array(adapter->num_rx_queues, + sizeof(struct igc_ring)); if (!temp_ring) { err = -ENOMEM; diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c index 25c3a09ad7f1..2c5d774f1ec1 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c @@ -1278,7 +1278,7 @@ static int ixgbe_set_ringparam(struct net_device *netdev, /* allocate temporary buffer to store rings in */ i = max_t(int, adapter->num_tx_queues + adapter->num_xdp_queues, adapter->num_rx_queues); - temp_ring = vmalloc(array_size(i, sizeof(struct ixgbe_ring))); + temp_ring = vmalloc_array(i, sizeof(struct ixgbe_ring)); if (!temp_ring) { err = -ENOMEM; diff --git a/drivers/net/ethernet/intel/ixgbevf/ethtool.c b/drivers/net/ethernet/intel/ixgbevf/ethtool.c index 7ac53171b041..bebad564188e 100644 --- a/drivers/net/ethernet/intel/ixgbevf/ethtool.c +++ b/drivers/net/ethernet/intel/ixgbevf/ethtool.c @@ -276,9 +276,9 @@ static int ixgbevf_set_ringparam(struct net_device *netdev, } if (new_tx_count != adapter->tx_ring_count) { - tx_ring = vmalloc(array_size(sizeof(*tx_ring), - adapter->num_tx_queues + - adapter->num_xdp_queues)); + tx_ring = vmalloc_array(adapter->num_tx_queues + + adapter->num_xdp_queues, + sizeof(*tx_ring)); if (!tx_ring) { err = -ENOMEM; goto clear_reset; From fce214586f99fe971a78ff1342cc331eecc74855 Mon Sep 17 00:00:00 2001 From: Qianfeng Rong Date: Sat, 16 Aug 2025 17:06:53 +0800 Subject: [PATCH 0120/1536] nfp: flower: use vmalloc_array() to simplify code Remove array_size() calls and replace vmalloc() with vmalloc_array() in nfp_flower_metadata_init(). vmalloc_array() is also optimized better, resulting in less instructions being used. Place 'NFP_FL_STATS_ELEM_RS' with the sizeof() parameter as the second argument to vmalloc_array() to avoid -Wcalloc-transposed-args compilation warnings. Signed-off-by: Qianfeng Rong Link: https://patch.msgid.link/20250816090659.117699-3-rongqianfeng@vivo.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/netronome/nfp/flower/metadata.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/netronome/nfp/flower/metadata.c b/drivers/net/ethernet/netronome/nfp/flower/metadata.c index 80e4675582bf..dde60c4572fa 100644 --- a/drivers/net/ethernet/netronome/nfp/flower/metadata.c +++ b/drivers/net/ethernet/netronome/nfp/flower/metadata.c @@ -564,8 +564,8 @@ int nfp_flower_metadata_init(struct nfp_app *app, u64 host_ctx_count, /* Init ring buffer and unallocated stats_ids. */ priv->stats_ids.free_list.buf = - vmalloc(array_size(NFP_FL_STATS_ELEM_RS, - priv->stats_ring_size)); + vmalloc_array(priv->stats_ring_size, + NFP_FL_STATS_ELEM_RS); if (!priv->stats_ids.free_list.buf) goto err_free_last_used; From dad3280591abde4c100e8185e3e47bfe72ae53c5 Mon Sep 17 00:00:00 2001 From: Qianfeng Rong Date: Sat, 16 Aug 2025 17:06:54 +0800 Subject: [PATCH 0121/1536] ppp: use vmalloc_array() to simplify code Remove array_size() calls and replace vmalloc() with vmalloc_array() in bsd_alloc(). vmalloc_array() is also optimized better, resulting in less instructions being used. Signed-off-by: Qianfeng Rong Link: https://patch.msgid.link/20250816090659.117699-4-rongqianfeng@vivo.com Signed-off-by: Jakub Kicinski --- drivers/net/ppp/bsd_comp.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ppp/bsd_comp.c b/drivers/net/ppp/bsd_comp.c index 55954594e157..f385b759d5cf 100644 --- a/drivers/net/ppp/bsd_comp.c +++ b/drivers/net/ppp/bsd_comp.c @@ -406,7 +406,7 @@ static void *bsd_alloc (unsigned char *options, int opt_len, int decomp) * Allocate space for the dictionary. This may be more than one page in * length. */ - db->dict = vmalloc(array_size(hsize, sizeof(struct bsd_dict))); + db->dict = vmalloc_array(hsize, sizeof(struct bsd_dict)); if (!db->dict) { bsd_free (db); @@ -425,7 +425,7 @@ static void *bsd_alloc (unsigned char *options, int opt_len, int decomp) */ else { - db->lens = vmalloc(array_size(sizeof(db->lens[0]), (maxmaxcode + 1))); + db->lens = vmalloc_array(maxmaxcode + 1, sizeof(db->lens[0])); if (!db->lens) { bsd_free (db); From 5883cb32fcea4471da242c9147427038625eee04 Mon Sep 17 00:00:00 2001 From: Vishal Badole Date: Sat, 16 Aug 2025 19:49:41 +0530 Subject: [PATCH 0122/1536] amd-xgbe: Configure and retrieve 'tx-usecs' for Tx coalescing Ethtool has advanced with additional configurable options, but the current driver does not support tx-usecs configuration using Ethtool. Add support to configure and retrieve 'tx-usecs' using ethtool, which specifies the wait time before servicing an interrupt for Tx coalescing. Signed-off-by: Vishal Badole Acked-by: Shyam Sundar S K Link: https://patch.msgid.link/20250816141941.126054-1-Vishal.Badole@amd.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c | 28 ++++++++++++++++++-- drivers/net/ethernet/amd/xgbe/xgbe.h | 1 + 2 files changed, 27 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c b/drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c index be0d2c7d08dc..35d73306a2d6 100644 --- a/drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c +++ b/drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c @@ -329,6 +329,7 @@ static int xgbe_get_coalesce(struct net_device *netdev, ec->rx_coalesce_usecs = pdata->rx_usecs; ec->rx_max_coalesced_frames = pdata->rx_frames; + ec->tx_coalesce_usecs = pdata->tx_usecs; ec->tx_max_coalesced_frames = pdata->tx_frames; return 0; @@ -342,7 +343,8 @@ static int xgbe_set_coalesce(struct net_device *netdev, struct xgbe_prv_data *pdata = netdev_priv(netdev); struct xgbe_hw_if *hw_if = &pdata->hw_if; unsigned int rx_frames, rx_riwt, rx_usecs; - unsigned int tx_frames; + unsigned int tx_frames, tx_usecs; + unsigned int jiffy_us = jiffies_to_usecs(1); rx_riwt = hw_if->usec_to_riwt(pdata, ec->rx_coalesce_usecs); rx_usecs = ec->rx_coalesce_usecs; @@ -364,20 +366,42 @@ static int xgbe_set_coalesce(struct net_device *netdev, return -EINVAL; } + tx_usecs = ec->tx_coalesce_usecs; tx_frames = ec->tx_max_coalesced_frames; /* Check the bounds of values for Tx */ + if (!tx_usecs) { + NL_SET_ERR_MSG_FMT_MOD(extack, + "tx-usecs must not be 0"); + return -EINVAL; + } + if (tx_usecs > XGMAC_MAX_COAL_TX_TICK) { + NL_SET_ERR_MSG_FMT_MOD(extack, "tx-usecs is limited to %d usec", + XGMAC_MAX_COAL_TX_TICK); + return -EINVAL; + } if (tx_frames > pdata->tx_desc_count) { netdev_err(netdev, "tx-frames is limited to %d frames\n", pdata->tx_desc_count); return -EINVAL; } + /* Round tx-usecs to nearest multiple of jiffy granularity */ + if (tx_usecs % jiffy_us) { + tx_usecs = rounddown(tx_usecs, jiffy_us); + if (!tx_usecs) + tx_usecs = jiffy_us; + NL_SET_ERR_MSG_FMT_MOD(extack, + "tx-usecs rounded to %u usec due to jiffy granularity (%u usec)", + tx_usecs, jiffy_us); + } + pdata->rx_riwt = rx_riwt; pdata->rx_usecs = rx_usecs; pdata->rx_frames = rx_frames; hw_if->config_rx_coalesce(pdata); + pdata->tx_usecs = tx_usecs; pdata->tx_frames = tx_frames; hw_if->config_tx_coalesce(pdata); @@ -709,7 +733,7 @@ out: } static const struct ethtool_ops xgbe_ethtool_ops = { - .supported_coalesce_params = ETHTOOL_COALESCE_RX_USECS | + .supported_coalesce_params = ETHTOOL_COALESCE_USECS | ETHTOOL_COALESCE_MAX_FRAMES, .get_drvinfo = xgbe_get_drvinfo, .get_msglevel = xgbe_get_msglevel, diff --git a/drivers/net/ethernet/amd/xgbe/xgbe.h b/drivers/net/ethernet/amd/xgbe/xgbe.h index d7e03e292ec4..0fa80a238ac5 100644 --- a/drivers/net/ethernet/amd/xgbe/xgbe.h +++ b/drivers/net/ethernet/amd/xgbe/xgbe.h @@ -168,6 +168,7 @@ /* Default coalescing parameters */ #define XGMAC_INIT_DMA_TX_USECS 1000 #define XGMAC_INIT_DMA_TX_FRAMES 25 +#define XGMAC_MAX_COAL_TX_TICK 100000 #define XGMAC_MAX_DMA_RIWT 0xff #define XGMAC_INIT_DMA_RX_USECS 30 From d360551f265e3c942ce06cd6f4d2f7f67741bcbd Mon Sep 17 00:00:00 2001 From: Kuan-Chung Chen Date: Mon, 11 Aug 2025 20:37:37 +0800 Subject: [PATCH 0123/1536] wifi: rtw89: introduce beacon tracking to improve connection stability In ideal scenario, AP's beacon should transmit at the Target Beacon Transmission Time (TBTT). However, in practice, beacon may be slightly off-schedule. This beacon "drift" prevents the firmware from receiving beacon at the expected TBTT, leading to connection disruptions. To address this, we introduce beacon tracking mechanism to enhance overall connection stability. This mechanism executes the following steps in each cycle (2 seconds): 1) Based on the last 32 received beacons, compute the minimum TBTT offset to use for the next cycle 2) Using the same 32 beacons, calculate the drift of each. A histogram is plotted, and outliers are identified using a boxplot. 3) According to the statistical results from the second step, a maximum receive window size (beacon timeout) is selected to cover approximately 80% of the beacons and applied to the next cycle. Signed-off-by: Kuan-Chung Chen Signed-off-by: Ping-Ke Shih Link: https://patch.msgid.link/20250811123744.15361-2-pkshih@realtek.com --- drivers/net/wireless/realtek/rtw89/core.c | 457 +++++++++++++++++++++- drivers/net/wireless/realtek/rtw89/core.h | 50 ++- drivers/net/wireless/realtek/rtw89/fw.c | 90 +++++ drivers/net/wireless/realtek/rtw89/fw.h | 27 ++ drivers/net/wireless/realtek/rtw89/mac.c | 22 -- drivers/net/wireless/realtek/rtw89/phy.c | 2 +- drivers/net/wireless/realtek/rtw89/ps.c | 3 + 7 files changed, 619 insertions(+), 32 deletions(-) diff --git a/drivers/net/wireless/realtek/rtw89/core.c b/drivers/net/wireless/realtek/rtw89/core.c index 57590f5577a3..7d522031dfa1 100644 --- a/drivers/net/wireless/realtek/rtw89/core.c +++ b/drivers/net/wireless/realtek/rtw89/core.c @@ -2,6 +2,7 @@ /* Copyright(c) 2019-2020 Realtek Corporation */ #include +#include #include #include "cam.h" @@ -272,17 +273,18 @@ rtw89_get_6ghz_span(struct rtw89_dev *rtwdev, u32 center_freq) return NULL; } -bool rtw89_ra_report_to_bitrate(struct rtw89_dev *rtwdev, u8 rpt_rate, u16 *bitrate) +bool rtw89_legacy_rate_to_bitrate(struct rtw89_dev *rtwdev, u8 legacy_rate, u16 *bitrate) { - struct ieee80211_rate rate; + const struct ieee80211_rate *rate; - if (unlikely(rpt_rate >= ARRAY_SIZE(rtw89_bitrates))) { - rtw89_debug(rtwdev, RTW89_DBG_UNEXP, "invalid rpt rate %d\n", rpt_rate); + if (unlikely(legacy_rate >= ARRAY_SIZE(rtw89_bitrates))) { + rtw89_debug(rtwdev, RTW89_DBG_UNEXP, + "invalid legacy rate %d\n", legacy_rate); return false; } - rate = rtw89_bitrates[rpt_rate]; - *bitrate = rate.bitrate; + rate = &rtw89_bitrates[legacy_rate]; + *bitrate = rate->bitrate; return true; } @@ -2221,6 +2223,435 @@ static void rtw89_vif_sync_bcn_tsf(struct rtw89_vif_link *rtwvif_link, WRITE_ONCE(rtwvif_link->sync_bcn_tsf, le64_to_cpu(mgmt->u.beacon.timestamp)); } +static u32 rtw89_bcn_calc_min_tbtt(struct rtw89_dev *rtwdev, u32 tbtt1, u32 tbtt2) +{ + struct rtw89_beacon_track_info *bcn_track = &rtwdev->bcn_track; + u32 close_bcn_intvl_th = bcn_track->close_bcn_intvl_th; + u32 tbtt_diff_th = bcn_track->tbtt_diff_th; + + if (tbtt2 > tbtt1) + swap(tbtt1, tbtt2); + + if (tbtt1 - tbtt2 > tbtt_diff_th) + return tbtt1; + else if (tbtt2 > close_bcn_intvl_th) + return tbtt2; + else if (tbtt1 > close_bcn_intvl_th) + return tbtt1; + else + return tbtt2; +} + +static void rtw89_bcn_cfg_tbtt_offset(struct rtw89_dev *rtwdev, + struct rtw89_vif_link *rtwvif_link) +{ + struct rtw89_beacon_track_info *bcn_track = &rtwdev->bcn_track; + enum rtw89_core_chip_id chip_id = rtwdev->chip->chip_id; + u32 offset = bcn_track->tbtt_offset; + + if (chip_id == RTL8852A || rtw89_is_rtl885xb(rtwdev)) { + const struct rtw89_mac_gen_def *mac = rtwdev->chip->mac_def; + const struct rtw89_port_reg *p = mac->port_base; + u32 bcnspc, val; + + bcnspc = rtw89_read32_port_mask(rtwdev, rtwvif_link, + p->bcn_space, B_AX_BCN_SPACE_MASK); + val = bcnspc - (offset / 1024); + val = u32_encode_bits(val, B_AX_TBTT_SHIFT_OFST_MAG) | + B_AX_TBTT_SHIFT_OFST_SIGN; + + rtw89_write16_port_mask(rtwdev, rtwvif_link, p->tbtt_shift, + B_AX_TBTT_SHIFT_OFST_MASK, val); + + return; + } + + rtw89_fw_h2c_tbtt_tuning(rtwdev, rtwvif_link, offset); +} + +static void rtw89_bcn_update_tbtt_offset(struct rtw89_dev *rtwdev, + struct rtw89_vif_link *rtwvif_link) +{ + struct rtw89_beacon_stat *bcn_stat = &rtwdev->phystat.bcn_stat; + struct rtw89_beacon_track_info *bcn_track = &rtwdev->bcn_track; + u32 *tbtt_us = bcn_stat->tbtt_us; + u32 offset = tbtt_us[0]; + u8 i; + + for (i = 1; i < RTW89_BCN_TRACK_STAT_NR; i++) + offset = rtw89_bcn_calc_min_tbtt(rtwdev, tbtt_us[i], offset); + + if (bcn_track->tbtt_offset == offset) + return; + + bcn_track->tbtt_offset = offset; + rtw89_bcn_cfg_tbtt_offset(rtwdev, rtwvif_link); +} + +static int cmp_u16(const void *a, const void *b) +{ + return *(const u16 *)a - *(const u16 *)b; +} + +static u16 _rtw89_bcn_calc_drift(u16 tbtt, u16 offset, u16 beacon_int) +{ + if (tbtt < offset) + return beacon_int - offset + tbtt; + + return tbtt - offset; +} + +static void rtw89_bcn_calc_drift(struct rtw89_dev *rtwdev) +{ + struct rtw89_beacon_stat *bcn_stat = &rtwdev->phystat.bcn_stat; + struct rtw89_beacon_track_info *bcn_track = &rtwdev->bcn_track; + u16 offset_tu = bcn_track->tbtt_offset / 1024; + u16 *tbtt_tu = bcn_stat->tbtt_tu; + u16 *drift = bcn_stat->drift; + u8 i; + + bcn_stat->tbtt_tu_min = U16_MAX; + bcn_stat->tbtt_tu_max = 0; + for (i = 0; i < RTW89_BCN_TRACK_STAT_NR; i++) { + drift[i] = _rtw89_bcn_calc_drift(tbtt_tu[i], offset_tu, + bcn_track->beacon_int); + + bcn_stat->tbtt_tu_min = min(bcn_stat->tbtt_tu_min, tbtt_tu[i]); + bcn_stat->tbtt_tu_max = max(bcn_stat->tbtt_tu_max, tbtt_tu[i]); + } + + sort(drift, RTW89_BCN_TRACK_STAT_NR, sizeof(*drift), cmp_u16, NULL); +} + +static void rtw89_bcn_calc_distribution(struct rtw89_dev *rtwdev) +{ + struct rtw89_beacon_stat *bcn_stat = &rtwdev->phystat.bcn_stat; + struct rtw89_beacon_dist *bcn_dist = &bcn_stat->bcn_dist; + u16 lower_bound, upper_bound, outlier_count = 0; + u16 *drift = bcn_stat->drift; + u16 *bins = bcn_dist->bins; + u16 q1, q3, iqr, tmp; + u8 i; + + BUILD_BUG_ON(RTW89_BCN_TRACK_STAT_NR % 4 != 0); + + memset(bcn_dist, 0, sizeof(*bcn_dist)); + + bcn_dist->min = drift[0]; + bcn_dist->max = drift[RTW89_BCN_TRACK_STAT_NR - 1]; + + tmp = RTW89_BCN_TRACK_STAT_NR / 4; + q1 = ((drift[tmp] + drift[tmp - 1]) * RTW89_BCN_TRACK_SCALE_FACTOR) / 2; + + tmp = (RTW89_BCN_TRACK_STAT_NR * 3) / 4; + q3 = ((drift[tmp] + drift[tmp - 1]) * RTW89_BCN_TRACK_SCALE_FACTOR) / 2; + + iqr = q3 - q1; + tmp = (3 * iqr) / 2; + + if (bcn_dist->min <= 5) + lower_bound = bcn_dist->min; + else if (q1 > tmp) + lower_bound = (q1 - tmp) / RTW89_BCN_TRACK_SCALE_FACTOR; + else + lower_bound = 0; + + upper_bound = (q3 + tmp) / RTW89_BCN_TRACK_SCALE_FACTOR; + + for (i = 0; i < RTW89_BCN_TRACK_STAT_NR; i++) { + u16 tbtt = bcn_stat->tbtt_tu[i]; + u16 min = bcn_stat->tbtt_tu_min; + u8 bin_idx; + + /* histogram */ + bin_idx = min((tbtt - min) / RTW89_BCN_TRACK_BIN_WIDTH, + RTW89_BCN_TRACK_MAX_BIN_NUM - 1); + bins[bin_idx]++; + + /* boxplot outlier */ + if (drift[i] < lower_bound || drift[i] > upper_bound) + outlier_count++; + } + + bcn_dist->outlier_count = outlier_count; + bcn_dist->lower_bound = lower_bound; + bcn_dist->upper_bound = upper_bound; +} + +static u8 rtw89_bcn_get_coverage(struct rtw89_dev *rtwdev, u16 threshold) +{ + struct rtw89_beacon_stat *bcn_stat = &rtwdev->phystat.bcn_stat; + int l = 0, r = RTW89_BCN_TRACK_STAT_NR - 1, m; + u16 *drift = bcn_stat->drift; + int index = -1; + u8 count = 0; + + while (l <= r) { + m = l + (r - l) / 2; + + if (drift[m] <= threshold) { + index = m; + l = m + 1; + } else { + r = m - 1; + } + } + + count = (index == -1) ? 0 : (index + 1); + + return (count * PERCENT) / RTW89_BCN_TRACK_STAT_NR; +} + +static u16 rtw89_bcn_get_histogram_bound(struct rtw89_dev *rtwdev, u8 target) +{ + struct rtw89_beacon_stat *bcn_stat = &rtwdev->phystat.bcn_stat; + struct rtw89_beacon_dist *bcn_dist = &bcn_stat->bcn_dist; + u16 tbtt_tu_max = bcn_stat->tbtt_tu_max; + u16 upper, lower = bcn_stat->tbtt_tu_min; + u8 i, count = 0; + + for (i = 0; i < RTW89_BCN_TRACK_MAX_BIN_NUM; i++) { + upper = lower + RTW89_BCN_TRACK_BIN_WIDTH - 1; + if (i == RTW89_BCN_TRACK_MAX_BIN_NUM - 1) + upper = max(upper, tbtt_tu_max); + + count += bcn_dist->bins[i]; + if (count > target) + break; + + lower = upper + 1; + } + + return upper; +} + +static u16 rtw89_bcn_get_rx_time(struct rtw89_dev *rtwdev, + const struct rtw89_chan *chan) +{ +#define RTW89_SYMBOL_TIME_2GHZ 192 +#define RTW89_SYMBOL_TIME_5GHZ 20 +#define RTW89_SYMBOL_TIME_6GHZ 20 + struct rtw89_pkt_stat *pkt_stat = &rtwdev->phystat.cur_pkt_stat; + u16 bitrate, val; + + if (!rtw89_legacy_rate_to_bitrate(rtwdev, pkt_stat->beacon_rate, &bitrate)) + return 0; + + val = (pkt_stat->beacon_len * 8 * RTW89_BCN_TRACK_SCALE_FACTOR) / bitrate; + + switch (chan->band_type) { + default: + case RTW89_BAND_2G: + val += RTW89_SYMBOL_TIME_2GHZ; + break; + case RTW89_BAND_5G: + val += RTW89_SYMBOL_TIME_5GHZ; + break; + case RTW89_BAND_6G: + val += RTW89_SYMBOL_TIME_6GHZ; + break; + } + + /* convert to millisecond */ + return DIV_ROUND_UP(val, 1000); +} + +static void rtw89_bcn_calc_timeout(struct rtw89_dev *rtwdev, + struct rtw89_vif_link *rtwvif_link) +{ +#define RTW89_BCN_TRACK_EXTEND_TIMEOUT 5 +#define RTW89_BCN_TRACK_COVERAGE_TH 0 /* unit: TU */ +#define RTW89_BCN_TRACK_STRONG_RSSI 80 + const struct rtw89_chan *chan = rtw89_chan_get(rtwdev, rtwvif_link->chanctx_idx); + struct rtw89_pkt_stat *pkt_stat = &rtwdev->phystat.cur_pkt_stat; + struct rtw89_beacon_stat *bcn_stat = &rtwdev->phystat.bcn_stat; + struct rtw89_beacon_track_info *bcn_track = &rtwdev->bcn_track; + struct rtw89_beacon_dist *bcn_dist = &bcn_stat->bcn_dist; + u16 outlier_high_bcn_th = bcn_track->outlier_high_bcn_th; + u16 outlier_low_bcn_th = bcn_track->outlier_low_bcn_th; + u8 rssi = ewma_rssi_read(&rtwdev->phystat.bcn_rssi); + u16 target_bcn_th = bcn_track->target_bcn_th; + u16 low_bcn_th = bcn_track->low_bcn_th; + u16 med_bcn_th = bcn_track->med_bcn_th; + u16 beacon_int = bcn_track->beacon_int; + u16 bcn_timeout; + + if (pkt_stat->beacon_nr < low_bcn_th) { + bcn_timeout = (RTW89_BCN_TRACK_TARGET_BCN * beacon_int) / PERCENT; + goto out; + } + + if (bcn_dist->outlier_count >= outlier_high_bcn_th) { + bcn_timeout = bcn_dist->max; + goto out; + } + + if (pkt_stat->beacon_nr < med_bcn_th) { + if (bcn_dist->outlier_count > outlier_low_bcn_th) + bcn_timeout = (bcn_dist->max + bcn_dist->upper_bound) / 2; + else + bcn_timeout = bcn_dist->upper_bound + + RTW89_BCN_TRACK_EXTEND_TIMEOUT; + + goto out; + } + + if (rssi >= RTW89_BCN_TRACK_STRONG_RSSI) { + if (rtw89_bcn_get_coverage(rtwdev, RTW89_BCN_TRACK_COVERAGE_TH) >= 90) { + /* ideal case */ + bcn_timeout = 0; + } else { + u16 offset_tu = bcn_track->tbtt_offset / 1024; + u16 upper_bound; + + upper_bound = + rtw89_bcn_get_histogram_bound(rtwdev, target_bcn_th); + bcn_timeout = + _rtw89_bcn_calc_drift(upper_bound, offset_tu, beacon_int); + } + + goto out; + } + + bcn_timeout = bcn_stat->drift[target_bcn_th]; + +out: + bcn_track->bcn_timeout = bcn_timeout + rtw89_bcn_get_rx_time(rtwdev, chan); +} + +static void rtw89_bcn_update_timeout(struct rtw89_dev *rtwdev, + struct rtw89_vif_link *rtwvif_link) +{ + rtw89_bcn_calc_drift(rtwdev); + rtw89_bcn_calc_distribution(rtwdev); + rtw89_bcn_calc_timeout(rtwdev, rtwvif_link); +} + +static void rtw89_core_bcn_track(struct rtw89_dev *rtwdev) +{ + struct rtw89_beacon_track_info *bcn_track = &rtwdev->bcn_track; + struct rtw89_vif_link *rtwvif_link; + struct rtw89_vif *rtwvif; + unsigned int link_id; + + if (!RTW89_CHK_FW_FEATURE(BEACON_TRACKING, &rtwdev->fw)) + return; + + if (!rtwdev->lps_enabled) + return; + + if (!bcn_track->is_data_ready) + return; + + rtw89_for_each_rtwvif(rtwdev, rtwvif) { + rtw89_vif_for_each_link(rtwvif, rtwvif_link, link_id) { + if (!(rtwvif_link->wifi_role == RTW89_WIFI_ROLE_STATION || + rtwvif_link->wifi_role == RTW89_WIFI_ROLE_P2P_CLIENT)) + continue; + + rtw89_bcn_update_tbtt_offset(rtwdev, rtwvif_link); + rtw89_bcn_update_timeout(rtwdev, rtwvif_link); + } + } +} + +static bool rtw89_core_bcn_track_can_lps(struct rtw89_dev *rtwdev) +{ + struct rtw89_beacon_track_info *bcn_track = &rtwdev->bcn_track; + + if (!RTW89_CHK_FW_FEATURE(BEACON_TRACKING, &rtwdev->fw)) + return true; + + return bcn_track->is_data_ready; +} + +static void rtw89_core_bcn_track_assoc(struct rtw89_dev *rtwdev, + struct rtw89_vif_link *rtwvif_link) +{ +#define RTW89_BCN_TRACK_MED_BCN 70 +#define RTW89_BCN_TRACK_LOW_BCN 30 +#define RTW89_BCN_TRACK_OUTLIER_HIGH_BCN 30 +#define RTW89_BCN_TRACK_OUTLIER_LOW_BCN 20 + struct rtw89_beacon_track_info *bcn_track = &rtwdev->bcn_track; + u32 period = jiffies_to_msecs(RTW89_TRACK_WORK_PERIOD); + struct ieee80211_bss_conf *bss_conf; + u32 beacons_in_period; + u32 bcn_intvl_us; + u16 beacon_int; + u8 dtim; + + rcu_read_lock(); + bss_conf = rtw89_vif_rcu_dereference_link(rtwvif_link, true); + beacon_int = bss_conf->beacon_int; + dtim = bss_conf->dtim_period; + rcu_read_unlock(); + + beacons_in_period = period / beacon_int / dtim; + bcn_intvl_us = ieee80211_tu_to_usec(beacon_int); + + bcn_track->low_bcn_th = + (beacons_in_period * RTW89_BCN_TRACK_LOW_BCN) / PERCENT; + bcn_track->med_bcn_th = + (beacons_in_period * RTW89_BCN_TRACK_MED_BCN) / PERCENT; + bcn_track->outlier_low_bcn_th = + (RTW89_BCN_TRACK_STAT_NR * RTW89_BCN_TRACK_OUTLIER_LOW_BCN) / PERCENT; + bcn_track->outlier_high_bcn_th = + (RTW89_BCN_TRACK_STAT_NR * RTW89_BCN_TRACK_OUTLIER_HIGH_BCN) / PERCENT; + bcn_track->target_bcn_th = + (RTW89_BCN_TRACK_STAT_NR * RTW89_BCN_TRACK_TARGET_BCN) / PERCENT; + + bcn_track->close_bcn_intvl_th = ieee80211_tu_to_usec(beacon_int - 3); + bcn_track->tbtt_diff_th = (bcn_intvl_us * 85) / PERCENT; + bcn_track->beacon_int = beacon_int; + bcn_track->dtim = dtim; +} + +static void rtw89_core_bcn_track_reset(struct rtw89_dev *rtwdev) +{ + memset(&rtwdev->phystat.bcn_stat, 0, sizeof(rtwdev->phystat.bcn_stat)); + memset(&rtwdev->bcn_track, 0, sizeof(rtwdev->bcn_track)); +} + +static void rtw89_vif_rx_bcn_stat(struct rtw89_dev *rtwdev, + struct ieee80211_bss_conf *bss_conf, + struct sk_buff *skb) +{ +#define RTW89_APPEND_TSF_2GHZ 384 +#define RTW89_APPEND_TSF_5GHZ 52 +#define RTW89_APPEND_TSF_6GHZ 52 + struct ieee80211_mgmt *mgmt = (struct ieee80211_mgmt *)skb->data; + struct ieee80211_rx_status *rx_status = IEEE80211_SKB_RXCB(skb); + struct rtw89_beacon_stat *bcn_stat = &rtwdev->phystat.bcn_stat; + struct rtw89_beacon_track_info *bcn_track = &rtwdev->bcn_track; + u32 bcn_intvl_us = ieee80211_tu_to_usec(bss_conf->beacon_int); + u64 tsf = le64_to_cpu(mgmt->u.beacon.timestamp); + u8 wp, num = bcn_stat->num; + u16 append; + + if (!RTW89_CHK_FW_FEATURE(BEACON_TRACKING, &rtwdev->fw)) + return; + + switch (rx_status->band) { + default: + case NL80211_BAND_2GHZ: + append = RTW89_APPEND_TSF_2GHZ; + break; + case NL80211_BAND_5GHZ: + append = RTW89_APPEND_TSF_5GHZ; + break; + case NL80211_BAND_6GHZ: + append = RTW89_APPEND_TSF_6GHZ; + break; + } + + wp = bcn_stat->wp; + div_u64_rem(tsf - append, bcn_intvl_us, &bcn_stat->tbtt_us[wp]); + bcn_stat->tbtt_tu[wp] = bcn_stat->tbtt_us[wp] / 1024; + bcn_stat->wp = (wp + 1) % RTW89_BCN_TRACK_STAT_NR; + bcn_stat->num = umin(num + 1, RTW89_BCN_TRACK_STAT_NR); + bcn_track->is_data_ready = bcn_stat->num == RTW89_BCN_TRACK_STAT_NR; +} + static void rtw89_vif_rx_stats_iter(void *data, u8 *mac, struct ieee80211_vif *vif) { @@ -2272,7 +2703,6 @@ static void rtw89_vif_rx_stats_iter(void *data, u8 *mac, rtw89_vif_sync_bcn_tsf(rtwvif_link, hdr, skb->len); rtw89_fw_h2c_rssi_offload(rtwdev, phy_ppdu); } - pkt_stat->beacon_nr++; if (phy_ppdu) { ewma_rssi_add(&rtwdev->phystat.bcn_rssi, phy_ppdu->rssi_avg); @@ -2280,7 +2710,11 @@ static void rtw89_vif_rx_stats_iter(void *data, u8 *mac, rtwvif_link->bcn_bw_idx = phy_ppdu->bw_idx; } + pkt_stat->beacon_nr++; pkt_stat->beacon_rate = desc_info->data_rate; + pkt_stat->beacon_len = skb->len; + + rtw89_vif_rx_bcn_stat(rtwdev, bss_conf, skb); } if (!ether_addr_equal(bss_conf->addr, hdr->addr1)) @@ -3697,6 +4131,9 @@ static void rtw89_enter_lps_track(struct rtw89_dev *rtwdev) vif->type == NL80211_IFTYPE_P2P_CLIENT)) continue; + if (!rtw89_core_bcn_track_can_lps(rtwdev)) + continue; + rtw89_enter_lps(rtwdev, rtwvif, true); } } @@ -3883,6 +4320,7 @@ static void rtw89_track_work(struct wiphy *wiphy, struct wiphy_work *work) rtw89_btc_ntfy_wl_sta(rtwdev); } rtw89_mac_bf_monitor_track(rtwdev); + rtw89_core_bcn_track(rtwdev); rtw89_phy_stat_track(rtwdev); rtw89_phy_env_monitor_track(rtwdev); rtw89_phy_dig(rtwdev); @@ -4129,8 +4567,10 @@ int rtw89_core_sta_link_disassoc(struct rtw89_dev *rtwdev, rtw89_assoc_link_clr(rtwsta_link); - if (vif->type == NL80211_IFTYPE_STATION) + if (vif->type == NL80211_IFTYPE_STATION) { rtw89_fw_h2c_set_bcn_fltr_cfg(rtwdev, rtwvif_link, false); + rtw89_core_bcn_track_reset(rtwdev); + } if (rtwvif_link->wifi_role == RTW89_WIFI_ROLE_P2P_CLIENT) rtw89_p2p_noa_once_deinit(rtwvif_link); @@ -4271,6 +4711,7 @@ int rtw89_core_sta_link_assoc(struct rtw89_dev *rtwdev, BTC_ROLE_MSTS_STA_CONN_END); rtw89_core_get_no_ul_ofdma_htc(rtwdev, &rtwsta_link->htc_template, chan); rtw89_phy_ul_tb_assoc(rtwdev, rtwvif_link); + rtw89_core_bcn_track_assoc(rtwdev, rtwvif_link); ret = rtw89_fw_h2c_general_pkt(rtwdev, rtwvif_link, rtwsta_link->mac_id); if (ret) { diff --git a/drivers/net/wireless/realtek/rtw89/core.h b/drivers/net/wireless/realtek/rtw89/core.h index 43e10278e14d..ca48426c577f 100644 --- a/drivers/net/wireless/realtek/rtw89/core.h +++ b/drivers/net/wireless/realtek/rtw89/core.h @@ -4622,6 +4622,7 @@ enum rtw89_fw_feature { RTW89_FW_FEATURE_SCAN_OFFLOAD_EXTRA_OP, RTW89_FW_FEATURE_RFK_NTFY_MCC_V0, RTW89_FW_FEATURE_LPS_DACK_BY_C2H_REG, + RTW89_FW_FEATURE_BEACON_TRACKING, }; struct rtw89_fw_suit { @@ -5074,9 +5075,36 @@ struct rtw89_pkt_drop_params { struct rtw89_pkt_stat { u16 beacon_nr; u8 beacon_rate; + u32 beacon_len; u32 rx_rate_cnt[RTW89_HW_RATE_NR]; }; +#define RTW89_BCN_TRACK_STAT_NR 32 +#define RTW89_BCN_TRACK_SCALE_FACTOR 10 +#define RTW89_BCN_TRACK_MAX_BIN_NUM 6 +#define RTW89_BCN_TRACK_BIN_WIDTH 5 +#define RTW89_BCN_TRACK_TARGET_BCN 80 + +struct rtw89_beacon_dist { + u16 min; + u16 max; + u16 outlier_count; + u16 lower_bound; + u16 upper_bound; + u16 bins[RTW89_BCN_TRACK_MAX_BIN_NUM]; +}; + +struct rtw89_beacon_stat { + u8 num; + u8 wp; + u16 tbtt_tu_min; + u16 tbtt_tu_max; + u16 drift[RTW89_BCN_TRACK_STAT_NR]; + u32 tbtt_us[RTW89_BCN_TRACK_STAT_NR]; + u16 tbtt_tu[RTW89_BCN_TRACK_STAT_NR]; + struct rtw89_beacon_dist bcn_dist; +}; + DECLARE_EWMA(thermal, 4, 4); struct rtw89_phy_stat { @@ -5085,6 +5113,7 @@ struct rtw89_phy_stat { struct ewma_rssi bcn_rssi; struct rtw89_pkt_stat cur_pkt_stat; struct rtw89_pkt_stat last_pkt_stat; + struct rtw89_beacon_stat bcn_stat; }; enum rtw89_rfk_report_state { @@ -5882,6 +5911,24 @@ struct rtw89_mlo_info { struct rtw89_wait_info wait; }; +struct rtw89_beacon_track_info { + bool is_data_ready; + u32 tbtt_offset; /* in unit of microsecond */ + u16 bcn_timeout; /* in unit of millisecond */ + + /* The following are constant and set at association. */ + u8 dtim; + u16 beacon_int; + u16 low_bcn_th; + u16 med_bcn_th; + u16 high_bcn_th; + u16 target_bcn_th; + u16 outlier_low_bcn_th; + u16 outlier_high_bcn_th; + u32 close_bcn_intvl_th; + u32 tbtt_diff_th; +}; + struct rtw89_dev { struct ieee80211_hw *hw; struct device *dev; @@ -5896,6 +5943,7 @@ struct rtw89_dev { const struct rtw89_pci_info *pci_info; const struct rtw89_rfe_parms *rfe_parms; struct rtw89_hal hal; + struct rtw89_beacon_track_info bcn_track; struct rtw89_mcc_info mcc; struct rtw89_mlo_info mlo; struct rtw89_mac_info mac; @@ -7454,7 +7502,7 @@ void rtw89_vif_type_mapping(struct rtw89_vif_link *rtwvif_link, bool assoc); int rtw89_chip_info_setup(struct rtw89_dev *rtwdev); void rtw89_chip_cfg_txpwr_ul_tb_offset(struct rtw89_dev *rtwdev, struct rtw89_vif_link *rtwvif_link); -bool rtw89_ra_report_to_bitrate(struct rtw89_dev *rtwdev, u8 rpt_rate, u16 *bitrate); +bool rtw89_legacy_rate_to_bitrate(struct rtw89_dev *rtwdev, u8 legacy_rate, u16 *bitrate); int rtw89_regd_setup(struct rtw89_dev *rtwdev); int rtw89_regd_init_hint(struct rtw89_dev *rtwdev); const char *rtw89_regd_get_string(enum rtw89_regulation_type regd); diff --git a/drivers/net/wireless/realtek/rtw89/fw.c b/drivers/net/wireless/realtek/rtw89/fw.c index 16e59a4a486e..cc1956245f9b 100644 --- a/drivers/net/wireless/realtek/rtw89/fw.c +++ b/drivers/net/wireless/realtek/rtw89/fw.c @@ -835,6 +835,7 @@ static const struct __fw_feat_cfg fw_feat_tbl[] = { __CFG_FW_FEAT(RTL8852BT, ge, 0, 29, 90, 0, CRASH_TRIGGER_TYPE_0), __CFG_FW_FEAT(RTL8852BT, ge, 0, 29, 91, 0, SCAN_OFFLOAD), __CFG_FW_FEAT(RTL8852BT, ge, 0, 29, 110, 0, BEACON_FILTER), + __CFG_FW_FEAT(RTL8852BT, ge, 0, 29, 122, 0, BEACON_TRACKING), __CFG_FW_FEAT(RTL8852BT, ge, 0, 29, 127, 0, SCAN_OFFLOAD_EXTRA_OP), __CFG_FW_FEAT(RTL8852BT, ge, 0, 29, 127, 0, LPS_DACK_BY_C2H_REG), __CFG_FW_FEAT(RTL8852BT, ge, 0, 29, 127, 0, CRASH_TRIGGER_TYPE_1), @@ -846,6 +847,7 @@ static const struct __fw_feat_cfg fw_feat_tbl[] = { __CFG_FW_FEAT(RTL8852C, ge, 0, 27, 56, 10, BEACON_FILTER), __CFG_FW_FEAT(RTL8852C, ge, 0, 27, 80, 0, WOW_REASON_V1), __CFG_FW_FEAT(RTL8852C, ge, 0, 27, 128, 0, BEACON_LOSS_COUNT_V1), + __CFG_FW_FEAT(RTL8852C, ge, 0, 27, 129, 1, BEACON_TRACKING), __CFG_FW_FEAT(RTL8922A, ge, 0, 34, 30, 0, CRASH_TRIGGER_TYPE_0), __CFG_FW_FEAT(RTL8922A, ge, 0, 34, 11, 0, MACID_PAUSE_SLEEP), __CFG_FW_FEAT(RTL8922A, ge, 0, 34, 35, 0, SCAN_OFFLOAD), @@ -864,6 +866,7 @@ static const struct __fw_feat_cfg fw_feat_tbl[] = { __CFG_FW_FEAT(RTL8922A, ge, 0, 35, 71, 0, BEACON_LOSS_COUNT_V1), __CFG_FW_FEAT(RTL8922A, ge, 0, 35, 76, 0, LPS_DACK_BY_C2H_REG), __CFG_FW_FEAT(RTL8922A, ge, 0, 35, 79, 0, CRASH_TRIGGER_TYPE_1), + __CFG_FW_FEAT(RTL8922A, ge, 0, 35, 80, 0, BEACON_TRACKING), }; static void rtw89_fw_iterate_feature_cfg(struct rtw89_fw_info *fw, @@ -3990,6 +3993,93 @@ fail: } EXPORT_SYMBOL(rtw89_fw_h2c_update_beacon_be); +int rtw89_fw_h2c_tbtt_tuning(struct rtw89_dev *rtwdev, + struct rtw89_vif_link *rtwvif_link, u32 offset) +{ + struct rtw89_h2c_tbtt_tuning *h2c; + u32 len = sizeof(*h2c); + struct sk_buff *skb; + int ret; + + skb = rtw89_fw_h2c_alloc_skb_with_hdr(rtwdev, len); + if (!skb) { + rtw89_err(rtwdev, "failed to alloc skb for h2c tbtt tuning\n"); + return -ENOMEM; + } + skb_put(skb, len); + h2c = (struct rtw89_h2c_tbtt_tuning *)skb->data; + + h2c->w0 = le32_encode_bits(rtwvif_link->phy_idx, RTW89_H2C_TBTT_TUNING_W0_BAND) | + le32_encode_bits(rtwvif_link->port, RTW89_H2C_TBTT_TUNING_W0_PORT); + h2c->w1 = le32_encode_bits(offset, RTW89_H2C_TBTT_TUNING_W1_SHIFT); + + rtw89_h2c_pkt_set_hdr(rtwdev, skb, FWCMD_TYPE_H2C, + H2C_CAT_MAC, H2C_CL_MAC_PS, + H2C_FUNC_TBTT_TUNING, 0, 0, + len); + + ret = rtw89_h2c_tx(rtwdev, skb, false); + if (ret) { + rtw89_err(rtwdev, "failed to send h2c\n"); + goto fail; + } + + return 0; +fail: + dev_kfree_skb_any(skb); + + return ret; +} + +int rtw89_fw_h2c_pwr_lvl(struct rtw89_dev *rtwdev, struct rtw89_vif_link *rtwvif_link) +{ +#define RTW89_BCN_TO_VAL_MIN 4 +#define RTW89_BCN_TO_VAL_MAX 64 +#define RTW89_DTIM_TO_VAL_MIN 7 +#define RTW89_DTIM_TO_VAL_MAX 15 + struct rtw89_beacon_track_info *bcn_track = &rtwdev->bcn_track; + struct rtw89_h2c_pwr_lvl *h2c; + u32 len = sizeof(*h2c); + struct sk_buff *skb; + u8 bcn_to_val; + int ret; + + skb = rtw89_fw_h2c_alloc_skb_with_hdr(rtwdev, len); + if (!skb) { + rtw89_err(rtwdev, "failed to alloc skb for h2c pwr lvl\n"); + return -ENOMEM; + } + skb_put(skb, len); + h2c = (struct rtw89_h2c_pwr_lvl *)skb->data; + + bcn_to_val = clamp_t(u8, bcn_track->bcn_timeout, + RTW89_BCN_TO_VAL_MIN, RTW89_BCN_TO_VAL_MAX); + + h2c->w0 = le32_encode_bits(rtwvif_link->mac_id, RTW89_H2C_PWR_LVL_W0_MACID) | + le32_encode_bits(bcn_to_val, RTW89_H2C_PWR_LVL_W0_BCN_TO_VAL) | + le32_encode_bits(0, RTW89_H2C_PWR_LVL_W0_PS_LVL) | + le32_encode_bits(0, RTW89_H2C_PWR_LVL_W0_TRX_LVL) | + le32_encode_bits(RTW89_DTIM_TO_VAL_MIN, + RTW89_H2C_PWR_LVL_W0_DTIM_TO_VAL); + + rtw89_h2c_pkt_set_hdr(rtwdev, skb, FWCMD_TYPE_H2C, + H2C_CAT_MAC, H2C_CL_MAC_PS, + H2C_FUNC_PS_POWER_LEVEL, 0, 0, + len); + + ret = rtw89_h2c_tx(rtwdev, skb, false); + if (ret) { + rtw89_err(rtwdev, "failed to send h2c\n"); + goto fail; + } + + return 0; +fail: + dev_kfree_skb_any(skb); + + return ret; +} + int rtw89_fw_h2c_role_maintain(struct rtw89_dev *rtwdev, struct rtw89_vif_link *rtwvif_link, struct rtw89_sta_link *rtwsta_link, diff --git a/drivers/net/wireless/realtek/rtw89/fw.h b/drivers/net/wireless/realtek/rtw89/fw.h index 3de29137b113..fad9d87d3e56 100644 --- a/drivers/net/wireless/realtek/rtw89/fw.h +++ b/drivers/net/wireless/realtek/rtw89/fw.h @@ -1602,6 +1602,28 @@ struct rtw89_h2c_bcn_upd_be { #define RTW89_H2C_BCN_UPD_BE_W7_ECSA_OFST GENMASK(30, 16) #define RTW89_H2C_BCN_UPD_BE_W7_PROTECTION_KEY_ID BIT(31) +struct rtw89_h2c_tbtt_tuning { + __le32 w0; + __le32 w1; +} __packed; + +#define RTW89_H2C_TBTT_TUNING_W0_BAND GENMASK(3, 0) +#define RTW89_H2C_TBTT_TUNING_W0_PORT GENMASK(7, 4) +#define RTW89_H2C_TBTT_TUNING_W1_SHIFT GENMASK(31, 0) + +struct rtw89_h2c_pwr_lvl { + __le32 w0; + __le32 w1; +} __packed; + +#define RTW89_H2C_PWR_LVL_W0_MACID GENMASK(7, 0) +#define RTW89_H2C_PWR_LVL_W0_BCN_TO_VAL GENMASK(15, 8) +#define RTW89_H2C_PWR_LVL_W0_PS_LVL GENMASK(19, 16) +#define RTW89_H2C_PWR_LVL_W0_TRX_LVL GENMASK(23, 20) +#define RTW89_H2C_PWR_LVL_W0_BCN_TO_LVL GENMASK(27, 24) +#define RTW89_H2C_PWR_LVL_W0_DTIM_TO_VAL GENMASK(31, 28) +#define RTW89_H2C_PWR_LVL_W1_MACID_EXT GENMASK(7, 0) + struct rtw89_h2c_role_maintain { __le32 w0; }; @@ -4201,6 +4223,8 @@ enum rtw89_ps_h2c_func { H2C_FUNC_MAC_LPS_PARM = 0x0, H2C_FUNC_P2P_ACT = 0x1, H2C_FUNC_IPS_CFG = 0x3, + H2C_FUNC_PS_POWER_LEVEL = 0x7, + H2C_FUNC_TBTT_TUNING = 0xA, NUM_OF_RTW89_PS_H2C_FUNC, }; @@ -4750,6 +4774,9 @@ int rtw89_fw_h2c_update_beacon(struct rtw89_dev *rtwdev, struct rtw89_vif_link *rtwvif_link); int rtw89_fw_h2c_update_beacon_be(struct rtw89_dev *rtwdev, struct rtw89_vif_link *rtwvif_link); +int rtw89_fw_h2c_tbtt_tuning(struct rtw89_dev *rtwdev, + struct rtw89_vif_link *rtwvif_link, u32 offset); +int rtw89_fw_h2c_pwr_lvl(struct rtw89_dev *rtwdev, struct rtw89_vif_link *rtwvif_link); int rtw89_fw_h2c_cam(struct rtw89_dev *rtwdev, struct rtw89_vif_link *vif, struct rtw89_sta_link *rtwsta_link, const u8 *scan_mac_addr); int rtw89_fw_h2c_dctl_sec_cam_v1(struct rtw89_dev *rtwdev, diff --git a/drivers/net/wireless/realtek/rtw89/mac.c b/drivers/net/wireless/realtek/rtw89/mac.c index 33a7dd9d6f0e..e4f9d251d5ef 100644 --- a/drivers/net/wireless/realtek/rtw89/mac.c +++ b/drivers/net/wireless/realtek/rtw89/mac.c @@ -4649,27 +4649,6 @@ static void rtw89_mac_port_cfg_bcn_early(struct rtw89_dev *rtwdev, BCN_ERLY_DEF); } -static void rtw89_mac_port_cfg_tbtt_shift(struct rtw89_dev *rtwdev, - struct rtw89_vif_link *rtwvif_link) -{ - const struct rtw89_mac_gen_def *mac = rtwdev->chip->mac_def; - const struct rtw89_port_reg *p = mac->port_base; - u16 val; - - if (rtwdev->chip->chip_id != RTL8852C) - return; - - if (rtwvif_link->wifi_role != RTW89_WIFI_ROLE_P2P_CLIENT && - rtwvif_link->wifi_role != RTW89_WIFI_ROLE_STATION) - return; - - val = FIELD_PREP(B_AX_TBTT_SHIFT_OFST_MAG, 1) | - B_AX_TBTT_SHIFT_OFST_SIGN; - - rtw89_write16_port_mask(rtwdev, rtwvif_link, p->tbtt_shift, - B_AX_TBTT_SHIFT_OFST_MASK, val); -} - void rtw89_mac_port_tsf_sync(struct rtw89_dev *rtwdev, struct rtw89_vif_link *rtwvif_link, struct rtw89_vif_link *rtwvif_src, @@ -4820,7 +4799,6 @@ int rtw89_mac_port_update(struct rtw89_dev *rtwdev, struct rtw89_vif_link *rtwvi rtw89_mac_port_cfg_bcn_hold_time(rtwdev, rtwvif_link); rtw89_mac_port_cfg_bcn_mask_area(rtwdev, rtwvif_link); rtw89_mac_port_cfg_tbtt_early(rtwdev, rtwvif_link); - rtw89_mac_port_cfg_tbtt_shift(rtwdev, rtwvif_link); rtw89_mac_port_cfg_bss_color(rtwdev, rtwvif_link); rtw89_mac_port_cfg_mbssid(rtwdev, rtwvif_link); rtw89_mac_port_cfg_func_en(rtwdev, rtwvif_link, true); diff --git a/drivers/net/wireless/realtek/rtw89/phy.c b/drivers/net/wireless/realtek/rtw89/phy.c index 01a03d2de3ff..06598723074e 100644 --- a/drivers/net/wireless/realtek/rtw89/phy.c +++ b/drivers/net/wireless/realtek/rtw89/phy.c @@ -2943,7 +2943,7 @@ static void __rtw89_phy_c2h_ra_rpt_iter(struct rtw89_sta_link *rtwsta_link, } if (mode == RTW89_RA_RPT_MODE_LEGACY) { - valid = rtw89_ra_report_to_bitrate(rtwdev, rate, &legacy_bitrate); + valid = rtw89_legacy_rate_to_bitrate(rtwdev, rate, &legacy_bitrate); if (!valid) return; } diff --git a/drivers/net/wireless/realtek/rtw89/ps.c b/drivers/net/wireless/realtek/rtw89/ps.c index 652f8fc81b79..cf58121eb541 100644 --- a/drivers/net/wireless/realtek/rtw89/ps.c +++ b/drivers/net/wireless/realtek/rtw89/ps.c @@ -119,6 +119,9 @@ static void __rtw89_enter_lps_link(struct rtw89_dev *rtwdev, rtw89_btc_ntfy_radio_state(rtwdev, BTC_RFCTRL_FW_CTRL); rtw89_fw_h2c_lps_parm(rtwdev, &lps_param); + + if (RTW89_CHK_FW_FEATURE(BEACON_TRACKING, &rtwdev->fw)) + rtw89_fw_h2c_pwr_lvl(rtwdev, rtwvif_link); } static void __rtw89_leave_lps(struct rtw89_dev *rtwdev, From 194b7ce982471468569299991b110049c4617163 Mon Sep 17 00:00:00 2001 From: Kuan-Chung Chen Date: Mon, 11 Aug 2025 20:37:38 +0800 Subject: [PATCH 0124/1536] wifi: rtw89: debug: add beacon_info debugfs Add debugfs beacon_info to display the information of beacon tracking. The screenshot as below: [Beacon info] count: 15 interval: 100 dtim: 1 raw rssi: 146 hw rate: 0 length: 407 [Distribution] tbtt 00 - 04: 26 05 - 09: 5 10 - 14: 1 15 - 19: 0 20 - 24: 0 25 - 29: 0 drift 0: 9 1: 7 2: 3 3: 3 4: 4 5: 1 6: 1 7: 2 8: 1 11: 1 lower bound: 0 upper bound: 10 outlier count: 1 [Tracking] tbtt offset: 20 bcn timeout: 19 Signed-off-by: Kuan-Chung Chen Signed-off-by: Ping-Ke Shih Link: https://patch.msgid.link/20250811123744.15361-3-pkshih@realtek.com --- drivers/net/wireless/realtek/rtw89/debug.c | 61 ++++++++++++++++++++++ 1 file changed, 61 insertions(+) diff --git a/drivers/net/wireless/realtek/rtw89/debug.c b/drivers/net/wireless/realtek/rtw89/debug.c index afdfb9647fc1..a37cdd73ddc3 100644 --- a/drivers/net/wireless/realtek/rtw89/debug.c +++ b/drivers/net/wireless/realtek/rtw89/debug.c @@ -86,6 +86,7 @@ struct rtw89_debugfs { struct rtw89_debugfs_priv stations; struct rtw89_debugfs_priv disable_dm; struct rtw89_debugfs_priv mlo_mode; + struct rtw89_debugfs_priv beacon_info; }; struct rtw89_debugfs_iter_data { @@ -4298,6 +4299,64 @@ rtw89_debug_priv_mlo_mode_set(struct rtw89_dev *rtwdev, return count; } +static ssize_t +rtw89_debug_priv_beacon_info_get(struct rtw89_dev *rtwdev, + struct rtw89_debugfs_priv *debugfs_priv, + char *buf, size_t bufsz) +{ + struct rtw89_pkt_stat *pkt_stat = &rtwdev->phystat.last_pkt_stat; + struct rtw89_beacon_track_info *bcn_track = &rtwdev->bcn_track; + struct rtw89_beacon_stat *bcn_stat = &rtwdev->phystat.bcn_stat; + struct rtw89_beacon_dist *bcn_dist = &bcn_stat->bcn_dist; + u16 upper, lower = bcn_stat->tbtt_tu_min; + char *p = buf, *end = buf + bufsz; + u16 *drift = bcn_stat->drift; + u8 bcn_num = bcn_stat->num; + u8 count; + u8 i; + + p += scnprintf(p, end - p, "[Beacon info]\n"); + p += scnprintf(p, end - p, "count: %u\n", pkt_stat->beacon_nr); + p += scnprintf(p, end - p, "interval: %u\n", bcn_track->beacon_int); + p += scnprintf(p, end - p, "dtim: %u\n", bcn_track->dtim); + p += scnprintf(p, end - p, "raw rssi: %lu\n", + ewma_rssi_read(&rtwdev->phystat.bcn_rssi)); + p += scnprintf(p, end - p, "hw rate: %u\n", pkt_stat->beacon_rate); + p += scnprintf(p, end - p, "length: %u\n", pkt_stat->beacon_len); + + p += scnprintf(p, end - p, "\n[Distribution]\n"); + p += scnprintf(p, end - p, "tbtt\n"); + for (i = 0; i < RTW89_BCN_TRACK_MAX_BIN_NUM; i++) { + upper = lower + RTW89_BCN_TRACK_BIN_WIDTH - 1; + if (i == RTW89_BCN_TRACK_MAX_BIN_NUM - 1) + upper = max(upper, bcn_stat->tbtt_tu_max); + + p += scnprintf(p, end - p, "%02u - %02u: %u\n", + lower, upper, bcn_dist->bins[i]); + + lower = upper + 1; + } + + p += scnprintf(p, end - p, "\ndrift\n"); + + for (i = 0; i < bcn_num; i += count) { + count = 1; + while (i + count < bcn_num && drift[i] == drift[i + count]) + count++; + + p += scnprintf(p, end - p, "%u: %u\n", drift[i], count); + } + p += scnprintf(p, end - p, "\nlower bound: %u\n", bcn_dist->lower_bound); + p += scnprintf(p, end - p, "upper bound: %u\n", bcn_dist->upper_bound); + p += scnprintf(p, end - p, "outlier count: %u\n", bcn_dist->outlier_count); + + p += scnprintf(p, end - p, "\n[Tracking]\n"); + p += scnprintf(p, end - p, "tbtt offset: %u\n", bcn_track->tbtt_offset); + p += scnprintf(p, end - p, "bcn timeout: %u\n", bcn_track->bcn_timeout); + + return p - buf; +} + #define rtw89_debug_priv_get(name, opts...) \ { \ .cb_read = rtw89_debug_priv_ ##name## _get, \ @@ -4356,6 +4415,7 @@ static const struct rtw89_debugfs rtw89_debugfs_templ = { .stations = rtw89_debug_priv_get(stations, RLOCK), .disable_dm = rtw89_debug_priv_set_and_get(disable_dm, RWLOCK), .mlo_mode = rtw89_debug_priv_set_and_get(mlo_mode, RWLOCK), + .beacon_info = rtw89_debug_priv_get(beacon_info), }; #define rtw89_debugfs_add(name, mode, fopname, parent) \ @@ -4401,6 +4461,7 @@ void rtw89_debugfs_add_sec1(struct rtw89_dev *rtwdev, struct dentry *debugfs_top rtw89_debugfs_add_r(stations); rtw89_debugfs_add_rw(disable_dm); rtw89_debugfs_add_rw(mlo_mode); + rtw89_debugfs_add_r(beacon_info); } void rtw89_debugfs_init(struct rtw89_dev *rtwdev) From 38846585f9df9af1f7261d85134a5510fc079458 Mon Sep 17 00:00:00 2001 From: Kuan-Chung Chen Date: Mon, 11 Aug 2025 20:37:39 +0800 Subject: [PATCH 0125/1536] wifi: rtw89: wow: remove notify during WoWLAN net-detect In WoWLAN net-detect mode, the firmware periodically performs scans and sends scan reports via C2H, which driver does not need. These unnecessary C2H events cause firmware watchdog timeout, leading to unexpected wakeups and SER 0x2599 on 8922AE. Signed-off-by: Kuan-Chung Chen Signed-off-by: Ping-Ke Shih Link: https://patch.msgid.link/20250811123744.15361-4-pkshih@realtek.com --- drivers/net/wireless/realtek/rtw89/fw.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/net/wireless/realtek/rtw89/fw.c b/drivers/net/wireless/realtek/rtw89/fw.c index cc1956245f9b..398d8ab98f63 100644 --- a/drivers/net/wireless/realtek/rtw89/fw.c +++ b/drivers/net/wireless/realtek/rtw89/fw.c @@ -7213,7 +7213,6 @@ static void rtw89_pno_scan_add_chan_ax(struct rtw89_dev *rtwdev, struct rtw89_pktofld_info *info; u8 probe_count = 0; - ch_info->notify_action = RTW89_SCANOFLD_DEBUG_MASK; ch_info->dfs_ch = chan_type == RTW89_CHAN_DFS; ch_info->bw = RTW89_SCAN_WIDTH; ch_info->tx_pkt = true; @@ -7354,7 +7353,6 @@ static void rtw89_pno_scan_add_chan_be(struct rtw89_dev *rtwdev, int chan_type, struct rtw89_pktofld_info *info; u8 probe_count = 0, i; - ch_info->notify_action = RTW89_SCANOFLD_DEBUG_MASK; ch_info->dfs_ch = chan_type == RTW89_CHAN_DFS; ch_info->bw = RTW89_SCAN_WIDTH; ch_info->tx_null = false; From b521685da35ebf091e51f9ea9ad2896a4ddb6e98 Mon Sep 17 00:00:00 2001 From: Kuan-Chung Chen Date: Mon, 11 Aug 2025 20:37:40 +0800 Subject: [PATCH 0126/1536] wifi: rtw89: 8851b: rfk: update IQK TIA setting With the new TIA setting of RX IQK, unstable RX throughput can be avoided, especially in medium-high attenuation environments. Signed-off-by: Kuan-Chung Chen Signed-off-by: Ping-Ke Shih Link: https://patch.msgid.link/20250811123744.15361-5-pkshih@realtek.com --- .../net/wireless/realtek/rtw89/rtw8851b_rfk.c | 85 ++++++++++++------- 1 file changed, 54 insertions(+), 31 deletions(-) diff --git a/drivers/net/wireless/realtek/rtw89/rtw8851b_rfk.c b/drivers/net/wireless/realtek/rtw89/rtw8851b_rfk.c index 7a319a6c838a..a7867b0e083a 100644 --- a/drivers/net/wireless/realtek/rtw89/rtw8851b_rfk.c +++ b/drivers/net/wireless/realtek/rtw89/rtw8851b_rfk.c @@ -17,7 +17,7 @@ #define DPK_RF_REG_NUM_8851B 4 #define DPK_KSET_NUM 4 #define RTW8851B_RXK_GROUP_NR 4 -#define RTW8851B_RXK_GROUP_IDX_NR 2 +#define RTW8851B_RXK_GROUP_IDX_NR 4 #define RTW8851B_TXK_GROUP_NR 1 #define RTW8851B_IQK_VER 0x14 #define RTW8851B_IQK_SS 1 @@ -114,9 +114,9 @@ static const u32 _tssi_de_mcs_10m[RF_PATH_NUM_8851B] = {0x5830}; static const u32 g_idxrxgain[RTW8851B_RXK_GROUP_NR] = {0x10e, 0x116, 0x28e, 0x296}; static const u32 g_idxattc2[RTW8851B_RXK_GROUP_NR] = {0x0, 0xf, 0x0, 0xf}; static const u32 g_idxrxagc[RTW8851B_RXK_GROUP_NR] = {0x0, 0x1, 0x2, 0x3}; -static const u32 a_idxrxgain[RTW8851B_RXK_GROUP_IDX_NR] = {0x10C, 0x28c}; -static const u32 a_idxattc2[RTW8851B_RXK_GROUP_IDX_NR] = {0xf, 0xf}; -static const u32 a_idxrxagc[RTW8851B_RXK_GROUP_IDX_NR] = {0x4, 0x6}; +static const u32 a_idxrxgain[RTW8851B_RXK_GROUP_IDX_NR] = {0x10C, 0x112, 0x28c, 0x292}; +static const u32 a_idxattc2[RTW8851B_RXK_GROUP_IDX_NR] = {0xf, 0xf, 0xf, 0xf}; +static const u32 a_idxrxagc[RTW8851B_RXK_GROUP_IDX_NR] = {0x4, 0x5, 0x6, 0x7}; static const u32 a_power_range[RTW8851B_TXK_GROUP_NR] = {0x0}; static const u32 a_track_range[RTW8851B_TXK_GROUP_NR] = {0x6}; static const u32 a_gain_bb[RTW8851B_TXK_GROUP_NR] = {0x0a}; @@ -139,17 +139,6 @@ static const u32 dpk_rf_reg[DPK_RF_REG_NUM_8851B] = {0xde, 0x8f, 0x5, 0x10005}; static void _set_ch(struct rtw89_dev *rtwdev, u32 val); -static u8 _rxk_5ghz_group_from_idx(u8 idx) -{ - /* There are four RXK groups (RTW8851B_RXK_GROUP_NR), but only group 0 - * and 2 are used in 5 GHz band, so reduce elements to 2. - */ - if (idx < RTW8851B_RXK_GROUP_IDX_NR) - return idx * 2; - - return 0; -} - static u8 _kpath(struct rtw89_dev *rtwdev, enum rtw89_phy_idx phy_idx) { return RF_A; @@ -196,7 +185,7 @@ static void _txck_force(struct rtw89_dev *rtwdev, enum rtw89_rf_path path, static void _rxck_force(struct rtw89_dev *rtwdev, enum rtw89_rf_path path, bool force, enum adc_ck ck) { - static const u32 ck960_8851b[] = {0x8, 0x2, 0x2, 0x4, 0xf, 0xa, 0x93}; + static const u32 ck960_8851b[] = {0x8, 0x2, 0x2, 0x4, 0xf, 0xa, 0x92}; static const u32 ck1920_8851b[] = {0x9, 0x0, 0x0, 0x3, 0xf, 0xa, 0x49}; const u32 *data; @@ -905,18 +894,27 @@ static bool _rxk_5g_group_sel(struct rtw89_dev *rtwdev, bool kfail = false; bool notready; u32 rf_0; - u8 idx; + u32 val; u8 gp; rtw89_debug(rtwdev, RTW89_DBG_RFK, "[IQK]===>%s\n", __func__); - for (idx = 0; idx < RTW8851B_RXK_GROUP_IDX_NR; idx++) { - gp = _rxk_5ghz_group_from_idx(idx); + rtw89_write_rf(rtwdev, RF_PATH_A, RR_LUTWE, RFREG_MASK, 0x1000); + rtw89_write_rf(rtwdev, RF_PATH_A, RR_LUTWA, RFREG_MASK, 0x4); + rtw89_write_rf(rtwdev, RF_PATH_A, RR_LUTWD0, RFREG_MASK, 0x17); + rtw89_write_rf(rtwdev, RF_PATH_A, RR_LUTWA, RFREG_MASK, 0x5); + rtw89_write_rf(rtwdev, RF_PATH_A, RR_LUTWD0, RFREG_MASK, 0x27); + rtw89_write_rf(rtwdev, RF_PATH_A, RR_LUTWE, RFREG_MASK, 0x0); + val = rtw89_read_rf(rtwdev, RF_PATH_A, RR_RXA2, 0x20); + rtw89_write_rf(rtwdev, RF_PATH_A, RR_MOD, RR_MOD_MASK, 0xc); + + for (gp = 0; gp < RTW8851B_RXK_GROUP_IDX_NR; gp++) { rtw89_debug(rtwdev, RTW89_DBG_RFK, "[IQK]S%x, gp = %x\n", path, gp); - rtw89_write_rf(rtwdev, RF_PATH_A, RR_MOD, RR_MOD_RGM, a_idxrxgain[idx]); - rtw89_write_rf(rtwdev, RF_PATH_A, RR_RXA2, RR_RXA2_ATT, a_idxattc2[idx]); + rtw89_write_rf(rtwdev, RF_PATH_A, RR_MOD, RR_MOD_RGM, a_idxrxgain[gp]); + rtw89_write_rf(rtwdev, RF_PATH_A, RR_RXA2, RR_RXA2_ATT, a_idxattc2[gp]); + rtw89_write_rf(rtwdev, RF_PATH_A, RR_RXA2, 0x20, 0x1); rtw89_phy_write32_mask(rtwdev, R_CFIR_LUT, B_CFIR_LUT_SEL, 0x1); rtw89_phy_write32_mask(rtwdev, R_CFIR_LUT, B_CFIR_LUT_G3, 0x0); @@ -926,7 +924,7 @@ static bool _rxk_5g_group_sel(struct rtw89_dev *rtwdev, fsleep(100); rf_0 = rtw89_read_rf(rtwdev, path, RR_MOD, RFREG_MASK); rtw89_phy_write32_mask(rtwdev, R_IQK_DIF2, B_IQK_DIF2_RXPI, rf_0); - rtw89_phy_write32_mask(rtwdev, R_IQK_RXA, B_IQK_RXAGC, a_idxrxagc[idx]); + rtw89_phy_write32_mask(rtwdev, R_IQK_RXA, B_IQK_RXAGC, a_idxrxagc[gp]); rtw89_phy_write32_mask(rtwdev, R_IQK_DIF4, B_IQK_DIF4_RXT, 0x11); notready = _iqk_one_shot(rtwdev, phy_idx, path, ID_RXAGC); @@ -959,6 +957,7 @@ static bool _rxk_5g_group_sel(struct rtw89_dev *rtwdev, _iqk_sram(rtwdev, path); if (kfail) { + rtw89_phy_write32_mask(rtwdev, R_IQK_RES, B_IQK_RES_RXCFIR, 0x0); rtw89_phy_write32_mask(rtwdev, R_RXIQC + (path << 8), MASKDWORD, iqk_info->nb_rxcfir[path] | 0x2); iqk_info->is_wb_txiqk[path] = false; @@ -968,6 +967,14 @@ static bool _rxk_5g_group_sel(struct rtw89_dev *rtwdev, iqk_info->is_wb_txiqk[path] = true; } + rtw89_write_rf(rtwdev, RF_PATH_A, RR_RXA2, 0x20, val); + rtw89_write_rf(rtwdev, RF_PATH_A, RR_LUTWE, RFREG_MASK, 0x1000); + rtw89_write_rf(rtwdev, RF_PATH_A, RR_LUTWA, RFREG_MASK, 0x4); + rtw89_write_rf(rtwdev, RF_PATH_A, RR_LUTWD0, RFREG_MASK, 0x37); + rtw89_write_rf(rtwdev, RF_PATH_A, RR_LUTWA, RFREG_MASK, 0x5); + rtw89_write_rf(rtwdev, RF_PATH_A, RR_LUTWD0, RFREG_MASK, 0x27); + rtw89_write_rf(rtwdev, RF_PATH_A, RR_LUTWE, RFREG_MASK, 0x0); + rtw89_debug(rtwdev, RTW89_DBG_RFK, "[IQK]S%x, kfail = 0x%x, 0x8%x3c = 0x%x\n", path, kfail, 1 << path, iqk_info->nb_rxcfir[path]); @@ -980,17 +987,26 @@ static bool _iqk_5g_nbrxk(struct rtw89_dev *rtwdev, enum rtw89_phy_idx phy_idx, struct rtw89_iqk_info *iqk_info = &rtwdev->iqk; bool kfail = false; bool notready; - u8 idx = 0x1; + u8 gp = 2; u32 rf_0; - u8 gp; - - gp = _rxk_5ghz_group_from_idx(idx); + u32 val; rtw89_debug(rtwdev, RTW89_DBG_RFK, "[IQK]===>%s\n", __func__); rtw89_debug(rtwdev, RTW89_DBG_RFK, "[IQK]S%x, gp = %x\n", path, gp); - rtw89_write_rf(rtwdev, RF_PATH_A, RR_MOD, RR_MOD_RGM, a_idxrxgain[idx]); - rtw89_write_rf(rtwdev, RF_PATH_A, RR_RXA2, RR_RXA2_ATT, a_idxattc2[idx]); + rtw89_write_rf(rtwdev, RF_PATH_A, RR_LUTWE, RFREG_MASK, 0x1000); + rtw89_write_rf(rtwdev, RF_PATH_A, RR_LUTWA, RFREG_MASK, 0x4); + rtw89_write_rf(rtwdev, RF_PATH_A, RR_LUTWD0, RFREG_MASK, 0x17); + rtw89_write_rf(rtwdev, RF_PATH_A, RR_LUTWA, RFREG_MASK, 0x5); + rtw89_write_rf(rtwdev, RF_PATH_A, RR_LUTWD0, RFREG_MASK, 0x27); + rtw89_write_rf(rtwdev, RF_PATH_A, RR_LUTWE, RFREG_MASK, 0x0); + + val = rtw89_read_rf(rtwdev, RF_PATH_A, RR_RXA2, 0x20); + rtw89_write_rf(rtwdev, RF_PATH_A, RR_MOD, RR_MOD_MASK, 0xc); + + rtw89_write_rf(rtwdev, RF_PATH_A, RR_MOD, RR_MOD_RGM, a_idxrxgain[gp]); + rtw89_write_rf(rtwdev, RF_PATH_A, RR_RXA2, RR_RXA2_ATT, a_idxattc2[gp]); + rtw89_write_rf(rtwdev, RF_PATH_A, RR_RXA2, 0x20, 0x1); rtw89_phy_write32_mask(rtwdev, R_CFIR_LUT, B_CFIR_LUT_SEL, 0x1); rtw89_phy_write32_mask(rtwdev, R_CFIR_LUT, B_CFIR_LUT_G3, 0x0); @@ -1000,7 +1016,7 @@ static bool _iqk_5g_nbrxk(struct rtw89_dev *rtwdev, enum rtw89_phy_idx phy_idx, fsleep(100); rf_0 = rtw89_read_rf(rtwdev, path, RR_MOD, RFREG_MASK); rtw89_phy_write32_mask(rtwdev, R_IQK_DIF2, B_IQK_DIF2_RXPI, rf_0); - rtw89_phy_write32_mask(rtwdev, R_IQK_RXA, B_IQK_RXAGC, a_idxrxagc[idx]); + rtw89_phy_write32_mask(rtwdev, R_IQK_RXA, B_IQK_RXAGC, a_idxrxagc[gp]); rtw89_phy_write32_mask(rtwdev, R_IQK_DIF4, B_IQK_DIF4_RXT, 0x11); notready = _iqk_one_shot(rtwdev, phy_idx, path, ID_RXAGC); @@ -1026,6 +1042,7 @@ static bool _iqk_5g_nbrxk(struct rtw89_dev *rtwdev, enum rtw89_phy_idx phy_idx, kfail = !!rtw89_phy_read32_mask(rtwdev, R_NCTL_RPT, B_NCTL_RPT_FLG); if (kfail) { + rtw89_phy_write32_mask(rtwdev, R_IQK_RES + (path << 8), 0xf, 0x0); rtw89_phy_write32_mask(rtwdev, R_RXIQC + (path << 8), MASKDWORD, 0x40000002); iqk_info->is_wb_rxiqk[path] = false; @@ -1033,6 +1050,14 @@ static bool _iqk_5g_nbrxk(struct rtw89_dev *rtwdev, enum rtw89_phy_idx phy_idx, iqk_info->is_wb_rxiqk[path] = false; } + rtw89_write_rf(rtwdev, RF_PATH_A, RR_RXA2, 0x20, val); + rtw89_write_rf(rtwdev, RF_PATH_A, RR_LUTWE, RFREG_MASK, 0x1000); + rtw89_write_rf(rtwdev, RF_PATH_A, RR_LUTWA, RFREG_MASK, 0x4); + rtw89_write_rf(rtwdev, RF_PATH_A, RR_LUTWD0, RFREG_MASK, 0x37); + rtw89_write_rf(rtwdev, RF_PATH_A, RR_LUTWA, RFREG_MASK, 0x5); + rtw89_write_rf(rtwdev, RF_PATH_A, RR_LUTWD0, RFREG_MASK, 0x27); + rtw89_write_rf(rtwdev, RF_PATH_A, RR_LUTWE, RFREG_MASK, 0x0); + rtw89_debug(rtwdev, RTW89_DBG_RFK, "[IQK]S%x, kfail = 0x%x, 0x8%x3c = 0x%x\n", path, kfail, 1 << path, iqk_info->nb_rxcfir[path]); @@ -1664,8 +1689,6 @@ static void _iqk_init(struct rtw89_dev *rtwdev) struct rtw89_iqk_info *iqk_info = &rtwdev->iqk; u8 idx, path; - rtw89_phy_write32_mask(rtwdev, R_IQKINF, MASKDWORD, 0x0); - if (iqk_info->is_iqk_init) return; From 5b2341efbb7af974925985f7798321fe92b4f8af Mon Sep 17 00:00:00 2001 From: Kuan-Chung Chen Date: Mon, 11 Aug 2025 20:39:13 +0800 Subject: [PATCH 0127/1536] wifi: rtw89: 8851b: rfk: update TX wideband IQK Adjust TX wideband calibration from 1 to 2 loop gain settings. This can reflect in better performance in 5 GHz medium-high attenuation environments. Signed-off-by: Kuan-Chung Chen Signed-off-by: Ping-Ke Shih Link: https://patch.msgid.link/20250811123913.15524-1-pkshih@realtek.com --- drivers/net/wireless/realtek/rtw89/reg.h | 1 + .../net/wireless/realtek/rtw89/rtw8851b_rfk.c | 72 +++++++++++-------- 2 files changed, 42 insertions(+), 31 deletions(-) diff --git a/drivers/net/wireless/realtek/rtw89/reg.h b/drivers/net/wireless/realtek/rtw89/reg.h index de81103a072f..0aad9dc91736 100644 --- a/drivers/net/wireless/realtek/rtw89/reg.h +++ b/drivers/net/wireless/realtek/rtw89/reg.h @@ -9126,6 +9126,7 @@ #define B_COEF_SEL_MDPD BIT(8) #define B_COEF_SEL_MDPD_V1 GENMASK(9, 8) #define B_COEF_SEL_EN BIT(31) +#define R_CFIR_COEF 0x810c #define R_CFIR_SYS 0x8120 #define R_IQK_RES 0x8124 #define B_IQK_RES_K BIT(28) diff --git a/drivers/net/wireless/realtek/rtw89/rtw8851b_rfk.c b/drivers/net/wireless/realtek/rtw89/rtw8851b_rfk.c index a7867b0e083a..84c46d2f4d85 100644 --- a/drivers/net/wireless/realtek/rtw89/rtw8851b_rfk.c +++ b/drivers/net/wireless/realtek/rtw89/rtw8851b_rfk.c @@ -18,7 +18,8 @@ #define DPK_KSET_NUM 4 #define RTW8851B_RXK_GROUP_NR 4 #define RTW8851B_RXK_GROUP_IDX_NR 4 -#define RTW8851B_TXK_GROUP_NR 1 +#define RTW8851B_A_TXK_GROUP_NR 2 +#define RTW8851B_G_TXK_GROUP_NR 1 #define RTW8851B_IQK_VER 0x14 #define RTW8851B_IQK_SS 1 #define RTW8851B_LOK_GRAM 10 @@ -117,16 +118,18 @@ static const u32 g_idxrxagc[RTW8851B_RXK_GROUP_NR] = {0x0, 0x1, 0x2, 0x3}; static const u32 a_idxrxgain[RTW8851B_RXK_GROUP_IDX_NR] = {0x10C, 0x112, 0x28c, 0x292}; static const u32 a_idxattc2[RTW8851B_RXK_GROUP_IDX_NR] = {0xf, 0xf, 0xf, 0xf}; static const u32 a_idxrxagc[RTW8851B_RXK_GROUP_IDX_NR] = {0x4, 0x5, 0x6, 0x7}; -static const u32 a_power_range[RTW8851B_TXK_GROUP_NR] = {0x0}; -static const u32 a_track_range[RTW8851B_TXK_GROUP_NR] = {0x6}; -static const u32 a_gain_bb[RTW8851B_TXK_GROUP_NR] = {0x0a}; -static const u32 a_itqt[RTW8851B_TXK_GROUP_NR] = {0x12}; -static const u32 g_power_range[RTW8851B_TXK_GROUP_NR] = {0x0}; -static const u32 g_track_range[RTW8851B_TXK_GROUP_NR] = {0x6}; -static const u32 g_gain_bb[RTW8851B_TXK_GROUP_NR] = {0x10}; -static const u32 g_itqt[RTW8851B_TXK_GROUP_NR] = {0x12}; +static const u32 a_power_range[RTW8851B_A_TXK_GROUP_NR] = {0x0, 0x0}; +static const u32 a_track_range[RTW8851B_A_TXK_GROUP_NR] = {0x7, 0x7}; +static const u32 a_gain_bb[RTW8851B_A_TXK_GROUP_NR] = {0x08, 0x0d}; +static const u32 a_itqt[RTW8851B_A_TXK_GROUP_NR] = {0x12, 0x12}; +static const u32 a_att_smxr[RTW8851B_A_TXK_GROUP_NR] = {0x0, 0x2}; +static const u32 g_power_range[RTW8851B_G_TXK_GROUP_NR] = {0x0}; +static const u32 g_track_range[RTW8851B_G_TXK_GROUP_NR] = {0x6}; +static const u32 g_gain_bb[RTW8851B_G_TXK_GROUP_NR] = {0x10}; +static const u32 g_itqt[RTW8851B_G_TXK_GROUP_NR] = {0x12}; -static const u32 rtw8851b_backup_bb_regs[] = {0xc0d4, 0xc0d8, 0xc0c4, 0xc0ec, 0xc0e8}; +static const u32 rtw8851b_backup_bb_regs[] = { + 0xc0d4, 0xc0d8, 0xc0c4, 0xc0ec, 0xc0e8, 0x12a0, 0xc0f0}; static const u32 rtw8851b_backup_rf_regs[] = { 0xef, 0xde, 0x0, 0x1e, 0x2, 0x85, 0x90, 0x5}; @@ -789,7 +792,7 @@ static bool _iqk_one_shot(struct rtw89_dev *rtwdev, enum rtw89_phy_idx phy_idx, "[IQK]============ S%d ID_NBTXK ============\n", path); rtw89_phy_write32_mask(rtwdev, R_UPD_CLK, B_IQK_RFC_ON, 0x0); rtw89_phy_write32_mask(rtwdev, R_IQK_DIF4, B_IQK_DIF4_TXT, - 0x00b); + 0x11); iqk_cmd = 0x408 | (1 << (4 + path)); break; case ID_NBRXK: @@ -807,7 +810,7 @@ static bool _iqk_one_shot(struct rtw89_dev *rtwdev, enum rtw89_phy_idx phy_idx, rtw89_phy_write32_mask(rtwdev, R_NCTL_CFG, MASKDWORD, iqk_cmd + 1); notready = _iqk_check_cal(rtwdev, path); if (iqk_info->iqk_sram_en && - (ktype == ID_NBRXK || ktype == ID_RXK)) + (ktype == ID_NBRXK || ktype == ID_RXK || ktype == ID_NBTXK)) _iqk_sram(rtwdev, path); rtw89_phy_write32_mask(rtwdev, R_UPD_CLK, B_IQK_RFC_ON, 0x0); @@ -1174,6 +1177,7 @@ static void _iqk_rxclk_setting(struct rtw89_dev *rtwdev, u8 path) static bool _txk_5g_group_sel(struct rtw89_dev *rtwdev, enum rtw89_phy_idx phy_idx, u8 path) { + static const u8 a_idx[RTW8851B_A_TXK_GROUP_NR] = {2, 3}; struct rtw89_iqk_info *iqk_info = &rtwdev->iqk; bool kfail = false; bool notready; @@ -1181,16 +1185,20 @@ static bool _txk_5g_group_sel(struct rtw89_dev *rtwdev, rtw89_debug(rtwdev, RTW89_DBG_RFK, "[IQK]===>%s\n", __func__); - for (gp = 0x0; gp < RTW8851B_TXK_GROUP_NR; gp++) { + rtw89_phy_write32_mask(rtwdev, R_CFIR_COEF, MASKDWORD, 0x33332222); + + for (gp = 0x0; gp < RTW8851B_A_TXK_GROUP_NR; gp++) { rtw89_write_rf(rtwdev, path, RR_TXIG, RR_TXIG_GR0, a_power_range[gp]); rtw89_write_rf(rtwdev, path, RR_TXIG, RR_TXIG_GR1, a_track_range[gp]); rtw89_write_rf(rtwdev, path, RR_TXIG, RR_TXIG_TG, a_gain_bb[gp]); + rtw89_write_rf(rtwdev, path, RR_BIASA, RR_BIASA_A, a_att_smxr[gp]); rtw89_phy_write32_mask(rtwdev, R_CFIR_LUT, B_CFIR_LUT_SEL, 0x1); rtw89_phy_write32_mask(rtwdev, R_CFIR_LUT, B_CFIR_LUT_G3, 0x1); rtw89_phy_write32_mask(rtwdev, R_CFIR_LUT, B_CFIR_LUT_G2, 0x0); - rtw89_phy_write32_mask(rtwdev, R_CFIR_LUT, B_CFIR_LUT_GP, gp); + rtw89_phy_write32_mask(rtwdev, R_CFIR_LUT, B_CFIR_LUT_GP, a_idx[gp]); rtw89_phy_write32_mask(rtwdev, R_NCTL_N1, B_NCTL_N1_CIP, 0x00); + rtw89_phy_write32_mask(rtwdev, R_IQK_DIF4, B_IQK_DIF4_TXT, 0x11); rtw89_phy_write32_mask(rtwdev, R_KIP_IQP, MASKDWORD, a_itqt[gp]); notready = _iqk_one_shot(rtwdev, phy_idx, path, ID_NBTXK); @@ -1231,7 +1239,9 @@ static bool _txk_2g_group_sel(struct rtw89_dev *rtwdev, rtw89_debug(rtwdev, RTW89_DBG_RFK, "[IQK]===>%s\n", __func__); - for (gp = 0x0; gp < RTW8851B_TXK_GROUP_NR; gp++) { + rtw89_phy_write32_mask(rtwdev, R_CFIR_COEF, MASKDWORD, 0x0); + + for (gp = 0x0; gp < RTW8851B_G_TXK_GROUP_NR; gp++) { rtw89_write_rf(rtwdev, path, RR_TXIG, RR_TXIG_GR0, g_power_range[gp]); rtw89_write_rf(rtwdev, path, RR_TXIG, RR_TXIG_GR1, g_track_range[gp]); rtw89_write_rf(rtwdev, path, RR_TXIG, RR_TXIG_TG, g_gain_bb[gp]); @@ -1274,29 +1284,29 @@ static bool _txk_2g_group_sel(struct rtw89_dev *rtwdev, static bool _iqk_5g_nbtxk(struct rtw89_dev *rtwdev, enum rtw89_phy_idx phy_idx, u8 path) { + static const u8 a_idx[RTW8851B_A_TXK_GROUP_NR] = {2, 3}; struct rtw89_iqk_info *iqk_info = &rtwdev->iqk; bool kfail = false; bool notready; - u8 gp; + u8 gp = 0; rtw89_debug(rtwdev, RTW89_DBG_RFK, "[IQK]===>%s\n", __func__); - for (gp = 0x0; gp < RTW8851B_TXK_GROUP_NR; gp++) { - rtw89_write_rf(rtwdev, path, RR_TXIG, RR_TXIG_GR0, a_power_range[gp]); - rtw89_write_rf(rtwdev, path, RR_TXIG, RR_TXIG_GR1, a_track_range[gp]); - rtw89_write_rf(rtwdev, path, RR_TXIG, RR_TXIG_TG, a_gain_bb[gp]); + rtw89_write_rf(rtwdev, path, RR_TXIG, RR_TXIG_GR0, a_power_range[gp]); + rtw89_write_rf(rtwdev, path, RR_TXIG, RR_TXIG_GR1, a_track_range[gp]); + rtw89_write_rf(rtwdev, path, RR_TXIG, RR_TXIG_TG, a_gain_bb[gp]); + rtw89_write_rf(rtwdev, path, RR_BIASA, RR_BIASA_A, a_att_smxr[gp]); - rtw89_phy_write32_mask(rtwdev, R_CFIR_LUT, B_CFIR_LUT_SEL, 0x1); - rtw89_phy_write32_mask(rtwdev, R_CFIR_LUT, B_CFIR_LUT_G3, 0x1); - rtw89_phy_write32_mask(rtwdev, R_CFIR_LUT, B_CFIR_LUT_G2, 0x0); - rtw89_phy_write32_mask(rtwdev, R_CFIR_LUT, B_CFIR_LUT_GP, gp); - rtw89_phy_write32_mask(rtwdev, R_NCTL_N1, B_NCTL_N1_CIP, 0x00); - rtw89_phy_write32_mask(rtwdev, R_KIP_IQP, MASKDWORD, a_itqt[gp]); + rtw89_phy_write32_mask(rtwdev, R_CFIR_LUT, B_CFIR_LUT_SEL, 0x1); + rtw89_phy_write32_mask(rtwdev, R_CFIR_LUT, B_CFIR_LUT_G3, 0x1); + rtw89_phy_write32_mask(rtwdev, R_CFIR_LUT, B_CFIR_LUT_G2, 0x0); + rtw89_phy_write32_mask(rtwdev, R_CFIR_LUT, B_CFIR_LUT_GP, a_idx[gp]); + rtw89_phy_write32_mask(rtwdev, R_NCTL_N1, B_NCTL_N1_CIP, 0x00); + rtw89_phy_write32_mask(rtwdev, R_KIP_IQP, MASKDWORD, a_itqt[gp]); - notready = _iqk_one_shot(rtwdev, phy_idx, path, ID_NBTXK); - iqk_info->nb_txcfir[path] = - rtw89_phy_read32_mask(rtwdev, R_TXIQC, MASKDWORD) | 0x2; - } + notready = _iqk_one_shot(rtwdev, phy_idx, path, ID_NBTXK); + iqk_info->nb_txcfir[path] = + rtw89_phy_read32_mask(rtwdev, R_TXIQC, MASKDWORD) | 0x2; if (!notready) kfail = !!rtw89_phy_read32_mask(rtwdev, R_NCTL_RPT, B_NCTL_RPT_FLG); @@ -1325,7 +1335,7 @@ static bool _iqk_2g_nbtxk(struct rtw89_dev *rtwdev, enum rtw89_phy_idx phy_idx, rtw89_debug(rtwdev, RTW89_DBG_RFK, "[IQK]===>%s\n", __func__); - for (gp = 0x0; gp < RTW8851B_TXK_GROUP_NR; gp++) { + for (gp = 0x0; gp < RTW8851B_G_TXK_GROUP_NR; gp++) { rtw89_write_rf(rtwdev, path, RR_TXIG, RR_TXIG_GR0, g_power_range[gp]); rtw89_write_rf(rtwdev, path, RR_TXIG, RR_TXIG_GR1, g_track_range[gp]); rtw89_write_rf(rtwdev, path, RR_TXIG, RR_TXIG_TG, g_gain_bb[gp]); From 46ac5412e406f63149f199e38279f21f87f3701a Mon Sep 17 00:00:00 2001 From: Chih-Kang Chang Date: Mon, 11 Aug 2025 20:39:34 +0800 Subject: [PATCH 0128/1536] wifi: rtw89: 8852c: check LPS H2C command complete by C2H reg instead of done ack 8852C after FW 0.27.128.0 driver check LPS H2C command received by FW using C2H reg instead of done ack. Signed-off-by: Chih-Kang Chang Signed-off-by: Ping-Ke Shih Link: https://patch.msgid.link/20250811123934.15614-1-pkshih@realtek.com --- drivers/net/wireless/realtek/rtw89/fw.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/wireless/realtek/rtw89/fw.c b/drivers/net/wireless/realtek/rtw89/fw.c index 398d8ab98f63..ede4c15314a8 100644 --- a/drivers/net/wireless/realtek/rtw89/fw.c +++ b/drivers/net/wireless/realtek/rtw89/fw.c @@ -847,6 +847,7 @@ static const struct __fw_feat_cfg fw_feat_tbl[] = { __CFG_FW_FEAT(RTL8852C, ge, 0, 27, 56, 10, BEACON_FILTER), __CFG_FW_FEAT(RTL8852C, ge, 0, 27, 80, 0, WOW_REASON_V1), __CFG_FW_FEAT(RTL8852C, ge, 0, 27, 128, 0, BEACON_LOSS_COUNT_V1), + __CFG_FW_FEAT(RTL8852C, ge, 0, 27, 128, 0, LPS_DACK_BY_C2H_REG), __CFG_FW_FEAT(RTL8852C, ge, 0, 27, 129, 1, BEACON_TRACKING), __CFG_FW_FEAT(RTL8922A, ge, 0, 34, 30, 0, CRASH_TRIGGER_TYPE_0), __CFG_FW_FEAT(RTL8922A, ge, 0, 34, 11, 0, MACID_PAUSE_SLEEP), From c4c16c88e78417424b4e3f33177e84baf0bc9a99 Mon Sep 17 00:00:00 2001 From: Kuan-Chung Chen Date: Mon, 11 Aug 2025 20:39:50 +0800 Subject: [PATCH 0129/1536] wifi: rtw89: fix BSSID comparison for non-transmitted BSSID For non-transmitted connections, beacons are received from the transmitted BSSID. Fix this to avoid missing beacon statistics. Signed-off-by: Kuan-Chung Chen Signed-off-by: Ping-Ke Shih Link: https://patch.msgid.link/20250811123950.15697-1-pkshih@realtek.com --- drivers/net/wireless/realtek/rtw89/core.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/net/wireless/realtek/rtw89/core.c b/drivers/net/wireless/realtek/rtw89/core.c index 7d522031dfa1..0ad7562632a5 100644 --- a/drivers/net/wireless/realtek/rtw89/core.c +++ b/drivers/net/wireless/realtek/rtw89/core.c @@ -2668,6 +2668,7 @@ static void rtw89_vif_rx_stats_iter(void *data, u8 *mac, struct ieee80211_bss_conf *bss_conf; struct rtw89_vif_link *rtwvif_link; const u8 *bssid = iter_data->bssid; + const u8 *target_bssid; if (rtwdev->scanning && (ieee80211_is_beacon(hdr->frame_control) || @@ -2689,7 +2690,10 @@ static void rtw89_vif_rx_stats_iter(void *data, u8 *mac, goto out; } - if (!ether_addr_equal(bss_conf->bssid, bssid)) + target_bssid = ieee80211_is_beacon(hdr->frame_control) && + bss_conf->nontransmitted ? + bss_conf->transmitter_bssid : bss_conf->bssid; + if (!ether_addr_equal(target_bssid, bssid)) goto out; if (is_mld) { From bf02a01d1dd5bfe76ad8db3b588a39ecac96ebee Mon Sep 17 00:00:00 2001 From: Kuan-Chung Chen Date: Mon, 11 Aug 2025 20:40:01 +0800 Subject: [PATCH 0130/1536] wifi: rtw89: fix group frames loss when connected to non-transmitted BSSID When STA connects to AP with dot11MultiBSSIDImplemented set to true, the layout of the TIM element's Partial Virtual Bitmap changes. Bits 1 to (2^n - 1) are used to indicate buffered group addressed frames (e.g., broadcast/multicast) for non-transmitted BSSIDs. Fix the interpretation of this field to ensure group addressed frames are correctly received. Signed-off-by: Kuan-Chung Chen Signed-off-by: Ping-Ke Shih Link: https://patch.msgid.link/20250811124001.15774-1-pkshih@realtek.com --- drivers/net/wireless/realtek/rtw89/core.h | 1 + drivers/net/wireless/realtek/rtw89/mac.c | 26 +++++++++++++++++++++ drivers/net/wireless/realtek/rtw89/mac_be.c | 1 + drivers/net/wireless/realtek/rtw89/reg.h | 8 +++++++ 4 files changed, 36 insertions(+) diff --git a/drivers/net/wireless/realtek/rtw89/core.h b/drivers/net/wireless/realtek/rtw89/core.h index ca48426c577f..d8c40ce3ec61 100644 --- a/drivers/net/wireless/realtek/rtw89/core.h +++ b/drivers/net/wireless/realtek/rtw89/core.h @@ -1011,6 +1011,7 @@ struct rtw89_port_reg { u32 ptcl_dbg; u32 ptcl_dbg_info; u32 bcn_drop_all; + u32 bcn_psr_rpt; u32 hiq_win[RTW89_PORT_NUM]; }; diff --git a/drivers/net/wireless/realtek/rtw89/mac.c b/drivers/net/wireless/realtek/rtw89/mac.c index e4f9d251d5ef..48712a2994b6 100644 --- a/drivers/net/wireless/realtek/rtw89/mac.c +++ b/drivers/net/wireless/realtek/rtw89/mac.c @@ -4197,6 +4197,7 @@ static const struct rtw89_port_reg rtw89_port_base_ax = { .ptcl_dbg = R_AX_PTCL_DBG, .ptcl_dbg_info = R_AX_PTCL_DBG_INFO, .bcn_drop_all = R_AX_BCN_DROP_ALL0, + .bcn_psr_rpt = R_AX_BCN_PSR_RPT_P0, .hiq_win = {R_AX_P0MB_HGQ_WINDOW_CFG_0, R_AX_PORT_HGQ_WINDOW_CFG, R_AX_PORT_HGQ_WINDOW_CFG + 1, R_AX_PORT_HGQ_WINDOW_CFG + 2, R_AX_PORT_HGQ_WINDOW_CFG + 3}, @@ -4649,6 +4650,30 @@ static void rtw89_mac_port_cfg_bcn_early(struct rtw89_dev *rtwdev, BCN_ERLY_DEF); } +static void rtw89_mac_port_cfg_bcn_psr_rpt(struct rtw89_dev *rtwdev, + struct rtw89_vif_link *rtwvif_link) +{ + const struct rtw89_mac_gen_def *mac = rtwdev->chip->mac_def; + const struct rtw89_port_reg *p = mac->port_base; + struct ieee80211_bss_conf *bss_conf; + u8 bssid_index; + u32 reg; + + rcu_read_lock(); + + bss_conf = rtw89_vif_rcu_dereference_link(rtwvif_link, true); + if (bss_conf->nontransmitted) + bssid_index = bss_conf->bssid_index; + else + bssid_index = 0; + + rcu_read_unlock(); + + reg = rtw89_mac_reg_by_idx(rtwdev, p->bcn_psr_rpt + rtwvif_link->port * 4, + rtwvif_link->mac_idx); + rtw89_write32_mask(rtwdev, reg, B_AX_BCAID_P0_MASK, bssid_index); +} + void rtw89_mac_port_tsf_sync(struct rtw89_dev *rtwdev, struct rtw89_vif_link *rtwvif_link, struct rtw89_vif_link *rtwvif_src, @@ -4805,6 +4830,7 @@ int rtw89_mac_port_update(struct rtw89_dev *rtwdev, struct rtw89_vif_link *rtwvi rtw89_mac_port_tsf_resync_all(rtwdev); fsleep(BCN_ERLY_SET_DLY); rtw89_mac_port_cfg_bcn_early(rtwdev, rtwvif_link); + rtw89_mac_port_cfg_bcn_psr_rpt(rtwdev, rtwvif_link); return 0; } diff --git a/drivers/net/wireless/realtek/rtw89/mac_be.c b/drivers/net/wireless/realtek/rtw89/mac_be.c index 0078080b3999..ef69672b6862 100644 --- a/drivers/net/wireless/realtek/rtw89/mac_be.c +++ b/drivers/net/wireless/realtek/rtw89/mac_be.c @@ -56,6 +56,7 @@ static const struct rtw89_port_reg rtw89_port_base_be = { .ptcl_dbg = R_BE_PTCL_DBG, .ptcl_dbg_info = R_BE_PTCL_DBG_INFO, .bcn_drop_all = R_BE_BCN_DROP_ALL0, + .bcn_psr_rpt = R_BE_BCN_PSR_RPT_P0, .hiq_win = {R_BE_P0MB_HGQ_WINDOW_CFG_0, R_BE_PORT_HGQ_WINDOW_CFG, R_BE_PORT_HGQ_WINDOW_CFG + 1, R_BE_PORT_HGQ_WINDOW_CFG + 2, R_BE_PORT_HGQ_WINDOW_CFG + 3}, diff --git a/drivers/net/wireless/realtek/rtw89/reg.h b/drivers/net/wireless/realtek/rtw89/reg.h index 0aad9dc91736..bfed0bbcfb7e 100644 --- a/drivers/net/wireless/realtek/rtw89/reg.h +++ b/drivers/net/wireless/realtek/rtw89/reg.h @@ -3370,6 +3370,10 @@ #define B_AX_CSIPRT_HESU_AID_EN BIT(25) #define B_AX_CSIPRT_VHTSU_AID_EN BIT(24) +#define R_AX_BCN_PSR_RPT_P0 0xCE84 +#define R_AX_BCN_PSR_RPT_P0_C1 0xEE84 +#define B_AX_BCAID_P0_MASK GENMASK(10, 0) + #define R_AX_RX_STATE_MONITOR 0xCEF0 #define R_AX_RX_STATE_MONITOR_C1 0xEEF0 #define B_AX_RX_STATE_MONITOR_MASK GENMASK(31, 0) @@ -7494,6 +7498,10 @@ #define R_BE_DRV_INFO_OPTION_C1 0x15470 #define B_BE_DRV_INFO_PHYRPT_EN BIT(0) +#define R_BE_BCN_PSR_RPT_P0 0x11484 +#define R_BE_BCN_PSR_RPT_P0_C1 0x15484 +#define B_BE_BCAID_P0_MASK GENMASK(10, 0) + #define R_BE_RX_ERR_ISR 0x114F4 #define R_BE_RX_ERR_ISR_C1 0x154F4 #define B_BE_RX_ERR_TRIG_ACT_TO BIT(9) From 6d598e856d10317412898cb841ee9df5779bc57d Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Fri, 15 Aug 2025 12:31:50 +0100 Subject: [PATCH 0131/1536] net: stmmac: remove unnecessary checks in ethtool eee ops Phylink will check whether the MAC supports the LPI methods in struct phylink_mac_ops, and return -EOPNOTSUPP if the LPI capabilities are not provided. stmmac doesn't provide LPI capabilities if priv->dma_cap.eee is not set. Therefore, checking the state of priv->dma_cap.eee in the ethtool ops and returning -EOPNOTSUPP is redundant - let phylink handle this. Reviewed-by: Andrew Lunn Signed-off-by: Russell King (Oracle) Link: https://patch.msgid.link/E1umsf0-008vKK-A3@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c | 6 ------ 1 file changed, 6 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c index 77758a7299b4..dda7ba1f524d 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c @@ -852,9 +852,6 @@ static int stmmac_ethtool_op_get_eee(struct net_device *dev, { struct stmmac_priv *priv = netdev_priv(dev); - if (!priv->dma_cap.eee) - return -EOPNOTSUPP; - return phylink_ethtool_get_eee(priv->phylink, edata); } @@ -863,9 +860,6 @@ static int stmmac_ethtool_op_set_eee(struct net_device *dev, { struct stmmac_priv *priv = netdev_priv(dev); - if (!priv->dma_cap.eee) - return -EOPNOTSUPP; - return phylink_ethtool_set_eee(priv->phylink, edata); } From 49b97bc52affb8c800a3b266c8d64482f7b5caf2 Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Fri, 15 Aug 2025 12:31:55 +0100 Subject: [PATCH 0132/1536] net: stmmac: remove write-only mac->pmt mac_device_info->pmt is only ever written, nothing reads it. Remove this struct member. Reviewed-by: Andrew Lunn Signed-off-by: Russell King (Oracle) Link: https://patch.msgid.link/E1umsf5-008vKQ-DT@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/stmicro/stmmac/common.h | 1 - drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 1 - 2 files changed, 2 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h index cbffccb3b9af..eaa1f2e1c5a5 100644 --- a/drivers/net/ethernet/stmicro/stmmac/common.h +++ b/drivers/net/ethernet/stmicro/stmmac/common.h @@ -602,7 +602,6 @@ struct mac_device_info { unsigned int mcast_bits_log2; unsigned int rx_csum; unsigned int pcs; - unsigned int pmt; unsigned int ps; unsigned int xlgmac; unsigned int num_vlan; diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index 9a77390b7f9d..cff75367311e 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -7240,7 +7240,6 @@ static int stmmac_hw_init(struct stmmac_priv *priv) priv->plat->enh_desc = priv->dma_cap.enh_desc; priv->plat->pmt = priv->dma_cap.pmt_remote_wake_up && !(priv->plat->flags & STMMAC_FLAG_USE_PHY_WOL); - priv->hw->pmt = priv->plat->pmt; if (priv->dma_cap.hash_tb_sz) { priv->hw->multicast_filter_bins = (BIT(priv->dma_cap.hash_tb_sz) << 5); From b181306e5e6877f4330081821437ae10ee4a10f6 Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Fri, 15 Aug 2025 12:32:00 +0100 Subject: [PATCH 0133/1536] net: stmmac: remove redundant WoL option validation The core ethtool API validates the WoL options passed from userspace against the support which the driver reports from its get_wol() method, returning EINVAL if an unsupported mode is requested. Therefore, there is no need for stmmac to implement its own validation. Remove this unnecessary code. See ethnl_set_wol() in net/ethtool/wol.c and ethtool_set_wol() in net/ethtool/ioctl.c. Reviewed-by: Andrew Lunn Signed-off-by: Russell King (Oracle) Link: https://patch.msgid.link/E1umsfA-008vKW-H1@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c | 10 ---------- 1 file changed, 10 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c index dda7ba1f524d..cd2fb92ac84c 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c @@ -803,7 +803,6 @@ static void stmmac_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol) static int stmmac_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol) { struct stmmac_priv *priv = netdev_priv(dev); - u32 support = WAKE_MAGIC | WAKE_UCAST; if (!device_can_wakeup(priv->device)) return -EOPNOTSUPP; @@ -816,15 +815,6 @@ static int stmmac_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol) return ret; } - /* By default almost all GMAC devices support the WoL via - * magic frame but we can disable it if the HW capability - * register shows no support for pmt_magic_frame. */ - if ((priv->hw_cap_support) && (!priv->dma_cap.pmt_magic_frame)) - wol->wolopts &= ~WAKE_MAGIC; - - if (wol->wolopts & ~support) - return -EINVAL; - if (wol->wolopts) { pr_info("stmmac: wakeup enable\n"); device_set_wakeup_enable(priv->device, 1); From f17bd297bb833e6f1cae9d3cadad8f334bc3cb8b Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Fri, 15 Aug 2025 12:32:05 +0100 Subject: [PATCH 0134/1536] net: stmmac: remove unnecessary "stmmac: wakeup enable" print Printing "stmmac: wakeup enable" to the kernel log isn't useful - it doesn't identify the adapter, and is effectively nothing more than a debugging print. This information can be discovered by looking at /sys/device.../power/wakeup as the device_set_wakeup_enable() call updates this sysfs file. Reviewed-by: Andrew Lunn Signed-off-by: Russell King (Oracle) Link: https://patch.msgid.link/E1umsfF-008vKc-Kt@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c index cd2fb92ac84c..58542b72cc01 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c @@ -816,7 +816,6 @@ static int stmmac_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol) } if (wol->wolopts) { - pr_info("stmmac: wakeup enable\n"); device_set_wakeup_enable(priv->device, 1); /* Avoid unbalanced enable_irq_wake calls */ if (priv->wol_irq_disabled) From d09413dd2577953f671136d572932b91a0293db4 Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Fri, 15 Aug 2025 12:32:10 +0100 Subject: [PATCH 0135/1536] net: stmmac: use core wake IRQ support The PM core provides management of wake IRQs along side setting the device wake enable state. In order to use this, we need to register the interrupt used to wakeup the system using devm_pm_set_wake_irq() or dev_pm_set_wake_irq(). The core will then enable or disable IRQ wake state on this interrupt as appropriate, depending on the device_set_wakeup_enable() state. device_set_wakeup_enable() does not care about having balanced enable/disable calls. Make use of this functionality, rather than explicitly managing the IRQ enable state in the set_wol() ethtool op. This removes the IRQ wake state management from stmmac. Signed-off-by: Russell King (Oracle) Reviewed-by: Andrew Lunn Link: https://patch.msgid.link/E1umsfK-008vKj-Pw@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/stmicro/stmmac/stmmac.h | 1 - .../net/ethernet/stmicro/stmmac/stmmac_ethtool.c | 14 +------------- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 4 ++-- 3 files changed, 3 insertions(+), 16 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h index bf95f03dd33f..b16b8aeeb583 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h @@ -289,7 +289,6 @@ struct stmmac_priv { u32 msg_enable; int wolopts; int wol_irq; - bool wol_irq_disabled; int clk_csr; struct timer_list eee_ctrl_timer; int lpi_irq; diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c index 58542b72cc01..39fa1ec92f82 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c @@ -815,19 +815,7 @@ static int stmmac_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol) return ret; } - if (wol->wolopts) { - device_set_wakeup_enable(priv->device, 1); - /* Avoid unbalanced enable_irq_wake calls */ - if (priv->wol_irq_disabled) - enable_irq_wake(priv->wol_irq); - priv->wol_irq_disabled = false; - } else { - device_set_wakeup_enable(priv->device, 0); - /* Avoid unbalanced disable_irq_wake calls */ - if (!priv->wol_irq_disabled) - disable_irq_wake(priv->wol_irq); - priv->wol_irq_disabled = true; - } + device_set_wakeup_enable(priv->device, !!wol->wolopts); mutex_lock(&priv->lock); priv->wolopts = wol->wolopts; diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index cff75367311e..db0433d0ce40 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -29,6 +29,7 @@ #include #include #include +#include #include #include #ifdef CONFIG_DEBUG_FS @@ -3724,7 +3725,6 @@ static int stmmac_request_irq_multi_msi(struct net_device *dev) /* Request the Wake IRQ in case of another line * is used for WoL */ - priv->wol_irq_disabled = true; if (priv->wol_irq > 0 && priv->wol_irq != dev->irq) { int_name = priv->int_name_wol; sprintf(int_name, "%s:%s", dev->name, "wol"); @@ -3885,7 +3885,6 @@ static int stmmac_request_irq_single(struct net_device *dev) /* Request the Wake IRQ in case of another line * is used for WoL */ - priv->wol_irq_disabled = true; if (priv->wol_irq > 0 && priv->wol_irq != dev->irq) { ret = request_irq(priv->wol_irq, stmmac_interrupt, IRQF_SHARED, dev->name, dev); @@ -7277,6 +7276,7 @@ static int stmmac_hw_init(struct stmmac_priv *priv) if (priv->plat->pmt) { dev_info(priv->device, "Wake-Up On Lan supported\n"); device_set_wakeup_capable(priv->device, 1); + devm_pm_set_wake_irq(priv->device, priv->wol_irq); } if (priv->dma_cap.tsoen) From 6a9a6ce962292b2cf31049f02a9360ca2ef25676 Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Fri, 15 Aug 2025 12:32:15 +0100 Subject: [PATCH 0136/1536] net: stmmac: add helpers to indicate WoL enable status Add two helpers to abstract the WoL enable status at the PHY and MAC to make the code easier to read. Signed-off-by: Russell King (Oracle) Reviewed-by: Andrew Lunn Link: https://patch.msgid.link/E1umsfP-008vKp-U1@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/stmicro/stmmac/stmmac.h | 10 ++++++++++ drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 11 +++++------ drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c | 4 ++-- 3 files changed, 17 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h index b16b8aeeb583..78d6b3737a26 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h @@ -375,6 +375,16 @@ enum stmmac_state { extern const struct dev_pm_ops stmmac_simple_pm_ops; +static inline bool stmmac_wol_enabled_mac(struct stmmac_priv *priv) +{ + return priv->plat->pmt && device_may_wakeup(priv->device); +} + +static inline bool stmmac_wol_enabled_phy(struct stmmac_priv *priv) +{ + return !priv->plat->pmt && device_may_wakeup(priv->device); +} + int stmmac_mdio_unregister(struct net_device *ndev); int stmmac_mdio_register(struct net_device *ndev); int stmmac_mdio_reset(struct mii_bus *mii); diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index db0433d0ce40..b9e8df90e7b5 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -7857,7 +7857,7 @@ int stmmac_suspend(struct device *dev) priv->plat->serdes_powerdown(ndev, priv->plat->bsp_priv); /* Enable Power down mode by programming the PMT regs */ - if (device_may_wakeup(priv->device) && priv->plat->pmt) { + if (stmmac_wol_enabled_mac(priv)) { stmmac_pmt(priv, priv->hw, priv->wolopts); priv->irq_wake = 1; } else { @@ -7868,11 +7868,10 @@ int stmmac_suspend(struct device *dev) mutex_unlock(&priv->lock); rtnl_lock(); - if (device_may_wakeup(priv->device) && !priv->plat->pmt) + if (stmmac_wol_enabled_phy(priv)) phylink_speed_down(priv->phylink, false); - phylink_suspend(priv->phylink, - device_may_wakeup(priv->device) && priv->plat->pmt); + phylink_suspend(priv->phylink, stmmac_wol_enabled_mac(priv)); rtnl_unlock(); if (stmmac_fpe_supported(priv)) @@ -7948,7 +7947,7 @@ int stmmac_resume(struct device *dev) * this bit because it can generate problems while resuming * from another devices (e.g. serial console). */ - if (device_may_wakeup(priv->device) && priv->plat->pmt) { + if (stmmac_wol_enabled_mac(priv)) { mutex_lock(&priv->lock); stmmac_pmt(priv, priv->hw, 0); mutex_unlock(&priv->lock); @@ -8008,7 +8007,7 @@ int stmmac_resume(struct device *dev) * workqueue thread, which will race with initialisation. */ phylink_resume(priv->phylink); - if (device_may_wakeup(priv->device) && !priv->plat->pmt) + if (stmmac_wol_enabled_phy(priv)) phylink_speed_up(priv->phylink); rtnl_unlock(); diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c index c849676d98e8..a3e077f225d1 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c @@ -934,7 +934,7 @@ static int __maybe_unused stmmac_pltfr_noirq_suspend(struct device *dev) if (!netif_running(ndev)) return 0; - if (!device_may_wakeup(priv->device) || !priv->plat->pmt) { + if (!stmmac_wol_enabled_mac(priv)) { /* Disable clock in case of PWM is off */ clk_disable_unprepare(priv->plat->clk_ptp_ref); @@ -955,7 +955,7 @@ static int __maybe_unused stmmac_pltfr_noirq_resume(struct device *dev) if (!netif_running(ndev)) return 0; - if (!device_may_wakeup(priv->device) || !priv->plat->pmt) { + if (!stmmac_wol_enabled_mac(priv)) { /* enable the clk previously disabled */ ret = pm_runtime_force_resume(dev); if (ret) From 5e5b39aa6f82e74a2b4caa79005b2e673f657809 Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Fri, 15 Aug 2025 12:32:21 +0100 Subject: [PATCH 0137/1536] net: stmmac: explain the phylink_speed_down() call in stmmac_release() The call to phylink_speed_down() looks odd on the face of it. Add a comment to explain why this call is there. phylink_speed_up() is always called in __stmmac_open(), and already has a comment. Signed-off-by: Russell King (Oracle) Reviewed-by: Andrew Lunn Link: https://patch.msgid.link/E1umsfV-008vKv-1O@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index b9e8df90e7b5..0bbdc1f1a691 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -4138,8 +4138,13 @@ static int stmmac_release(struct net_device *dev) struct stmmac_priv *priv = netdev_priv(dev); u32 chan; + /* If the PHY or MAC has WoL enabled, then the PHY will not be + * suspended when phylink_stop() is called below. Set the PHY + * to its slowest speed to save power. + */ if (device_may_wakeup(priv->device)) phylink_speed_down(priv->phylink, false); + /* Stop and disconnect the PHY */ phylink_stop(priv->phylink); phylink_disconnect_phy(priv->phylink); From e798f2ac6040f46a04795d7de977341fa9aeabae Mon Sep 17 00:00:00 2001 From: Bitterblue Smith Date: Mon, 11 Aug 2025 18:32:55 +0300 Subject: [PATCH 0138/1536] wifi: rtlwifi: rtl8192cu: Don't claim USB ID 07b8:8188 This ID appears to be RTL8188SU, not RTL8188CU. This is the wrong driver for RTL8188SU. The r8712u driver from staging used to handle this ID. Closes: https://lore.kernel.org/linux-wireless/ee0acfef-a753-4f90-87df-15f8eaa9c3a8@gmx.de/ Cc: stable@vger.kernel.org Signed-off-by: Bitterblue Smith Acked-by: Ping-Ke Shih Signed-off-by: Ping-Ke Shih Link: https://patch.msgid.link/2e5e2348-bdb3-44b2-92b2-0231dbf464b0@gmail.com --- drivers/net/wireless/realtek/rtlwifi/rtl8192cu/sw.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8192cu/sw.c b/drivers/net/wireless/realtek/rtlwifi/rtl8192cu/sw.c index 00a6778df704..9480823af838 100644 --- a/drivers/net/wireless/realtek/rtlwifi/rtl8192cu/sw.c +++ b/drivers/net/wireless/realtek/rtlwifi/rtl8192cu/sw.c @@ -291,7 +291,6 @@ static const struct usb_device_id rtl8192c_usb_ids[] = { {RTL_USB_DEVICE(0x050d, 0x1102, rtl92cu_hal_cfg)}, /*Belkin - Edimax*/ {RTL_USB_DEVICE(0x050d, 0x11f2, rtl92cu_hal_cfg)}, /*Belkin - ISY*/ {RTL_USB_DEVICE(0x06f8, 0xe033, rtl92cu_hal_cfg)}, /*Hercules - Edimax*/ - {RTL_USB_DEVICE(0x07b8, 0x8188, rtl92cu_hal_cfg)}, /*Abocom - Abocom*/ {RTL_USB_DEVICE(0x07b8, 0x8189, rtl92cu_hal_cfg)}, /*Funai - Abocom*/ {RTL_USB_DEVICE(0x0846, 0x9041, rtl92cu_hal_cfg)}, /*NetGear WNA1000M*/ {RTL_USB_DEVICE(0x0846, 0x9043, rtl92cu_hal_cfg)}, /*NG WNA1000Mv2*/ From ec0b44736b1d22b763ee94f1aee856f9e793f3fe Mon Sep 17 00:00:00 2001 From: Bitterblue Smith Date: Mon, 11 Aug 2025 18:33:28 +0300 Subject: [PATCH 0139/1536] wifi: rtl8xxxu: Don't claim USB ID 07b8:8188 This ID appears to be RTL8188SU, not RTL8188CU. This is the wrong driver for RTL8188SU. The r8712u driver from staging used to handle this ID. Closes: https://lore.kernel.org/linux-wireless/ee0acfef-a753-4f90-87df-15f8eaa9c3a8@gmx.de/ Cc: stable@vger.kernel.org Signed-off-by: Bitterblue Smith Reviewed-by: Ping-Ke Shih Signed-off-by: Ping-Ke Shih Link: https://patch.msgid.link/f147b2ab-4505-435a-aa32-62964e4f1f1e@gmail.com --- drivers/net/wireless/realtek/rtl8xxxu/core.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/net/wireless/realtek/rtl8xxxu/core.c b/drivers/net/wireless/realtek/rtl8xxxu/core.c index 831b5025c634..018f5afcd50d 100644 --- a/drivers/net/wireless/realtek/rtl8xxxu/core.c +++ b/drivers/net/wireless/realtek/rtl8xxxu/core.c @@ -8172,8 +8172,6 @@ static const struct usb_device_id dev_table[] = { .driver_info = (unsigned long)&rtl8192cu_fops}, {USB_DEVICE_AND_INTERFACE_INFO(0x06f8, 0xe033, 0xff, 0xff, 0xff), .driver_info = (unsigned long)&rtl8192cu_fops}, -{USB_DEVICE_AND_INTERFACE_INFO(0x07b8, 0x8188, 0xff, 0xff, 0xff), - .driver_info = (unsigned long)&rtl8192cu_fops}, {USB_DEVICE_AND_INTERFACE_INFO(0x07b8, 0x8189, 0xff, 0xff, 0xff), .driver_info = (unsigned long)&rtl8192cu_fops}, {USB_DEVICE_AND_INTERFACE_INFO(0x0846, 0x9041, 0xff, 0xff, 0xff), From 33319e8fd7ac2631eef89c770940fa5f96a93063 Mon Sep 17 00:00:00 2001 From: Liao Yuanhong Date: Mon, 18 Aug 2025 14:42:19 +0800 Subject: [PATCH 0140/1536] wifi: rtw89: 8852bt: Simplify unnecessary if-else conditions in _dpk_onoff() Some simple if-else logic can be simplified using the ! operator to improve code readability. Signed-off-by: Liao Yuanhong Signed-off-by: Ping-Ke Shih Link: https://patch.msgid.link/20250818064219.448066-1-liaoyuanhong@vivo.com --- drivers/net/wireless/realtek/rtw89/rtw8852bt_rfk.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/drivers/net/wireless/realtek/rtw89/rtw8852bt_rfk.c b/drivers/net/wireless/realtek/rtw89/rtw8852bt_rfk.c index b01f921b4224..99cb6a973de2 100644 --- a/drivers/net/wireless/realtek/rtw89/rtw8852bt_rfk.c +++ b/drivers/net/wireless/realtek/rtw89/rtw8852bt_rfk.c @@ -1803,10 +1803,7 @@ static void _dpk_onoff(struct rtw89_dev *rtwdev, enum rtw89_rf_path path, bool o val = dpk->is_dpk_enable && !off && dpk->bp[path][kidx].path_ok; - if (off) - off_reverse = false; - else - off_reverse = true; + off_reverse = !off; val = dpk->is_dpk_enable & off_reverse & dpk->bp[path][kidx].path_ok; From 2b30fc01a6c788ed4a799ed8a6f42ed9ac82417f Mon Sep 17 00:00:00 2001 From: Mohsin Bashir Date: Wed, 13 Aug 2025 15:13:11 -0700 Subject: [PATCH 0141/1536] eth: fbnic: Add support for HDS configuration Add support for configuring the header data split threshold. For fbnic, the tcp data split support is enabled all the time. Fbnic supports a maximum buffer size of 4KB. However, the reservation for the headroom, tailroom, and padding reduce the max header size accordingly. ethtool_hds -g eth0 Ring parameters for eth0: Pre-set maximums: ... HDS thresh: 3584 Current hardware settings: ... HDS thresh: 1536 Verify hds tests in ksft-net-drv are passing ksft-net-drv]# ./drivers/net/hds.py TAP version 13 1..13 ok 1 hds.get_hds ok 2 hds.get_hds_thresh ok 3 hds.set_hds_disable # SKIP disabling of HDS not supported by ... ... ... ok 12 hds.ioctl_set_xdp ok 13 hds.ioctl_enabled_set_xdp \# Totals: pass:12 fail:0 xfail:0 xpass:0 skip:1 error:0 Signed-off-by: Jakub Kicinski Signed-off-by: Mohsin Bashir Link: https://patch.msgid.link/20250813221319.3367670-2-mohsin.bashr@gmail.com Signed-off-by: Paolo Abeni --- .../net/ethernet/meta/fbnic/fbnic_ethtool.c | 21 ++++++++++++++++--- .../net/ethernet/meta/fbnic/fbnic_netdev.c | 4 ++++ .../net/ethernet/meta/fbnic/fbnic_netdev.h | 2 ++ drivers/net/ethernet/meta/fbnic/fbnic_txrx.c | 17 +++++++++++---- drivers/net/ethernet/meta/fbnic/fbnic_txrx.h | 7 ++++++- 5 files changed, 43 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c b/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c index 461c8661eb93..569ddd767f9d 100644 --- a/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c +++ b/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c @@ -2,6 +2,7 @@ /* Copyright (c) Meta Platforms, Inc. and affiliates. */ #include +#include #include #include #include @@ -160,6 +161,7 @@ static void fbnic_clone_swap_cfg(struct fbnic_net *orig, swap(clone->num_rx_queues, orig->num_rx_queues); swap(clone->num_tx_queues, orig->num_tx_queues); swap(clone->num_napi, orig->num_napi); + swap(clone->hds_thresh, orig->hds_thresh); } static void fbnic_aggregate_vector_counters(struct fbnic_net *fbn, @@ -277,15 +279,21 @@ fbnic_get_ringparam(struct net_device *netdev, struct ethtool_ringparam *ring, ring->rx_mini_pending = fbn->hpq_size; ring->rx_jumbo_pending = fbn->ppq_size; ring->tx_pending = fbn->txq_size; + + kernel_ring->tcp_data_split = ETHTOOL_TCP_DATA_SPLIT_ENABLED; + kernel_ring->hds_thresh_max = FBNIC_HDS_THRESH_MAX; + kernel_ring->hds_thresh = fbn->hds_thresh; } static void fbnic_set_rings(struct fbnic_net *fbn, - struct ethtool_ringparam *ring) + struct ethtool_ringparam *ring, + struct kernel_ethtool_ringparam *kernel_ring) { fbn->rcq_size = ring->rx_pending; fbn->hpq_size = ring->rx_mini_pending; fbn->ppq_size = ring->rx_jumbo_pending; fbn->txq_size = ring->tx_pending; + fbn->hds_thresh = kernel_ring->hds_thresh; } static int @@ -316,8 +324,13 @@ fbnic_set_ringparam(struct net_device *netdev, struct ethtool_ringparam *ring, return -EINVAL; } + if (kernel_ring->tcp_data_split == ETHTOOL_TCP_DATA_SPLIT_DISABLED) { + NL_SET_ERR_MSG_MOD(extack, "Cannot disable TCP data split"); + return -EINVAL; + } + if (!netif_running(netdev)) { - fbnic_set_rings(fbn, ring); + fbnic_set_rings(fbn, ring, kernel_ring); return 0; } @@ -325,7 +338,7 @@ fbnic_set_ringparam(struct net_device *netdev, struct ethtool_ringparam *ring, if (!clone) return -ENOMEM; - fbnic_set_rings(clone, ring); + fbnic_set_rings(clone, ring, kernel_ring); err = fbnic_alloc_napi_vectors(clone); if (err) @@ -1678,6 +1691,8 @@ fbnic_get_rmon_stats(struct net_device *netdev, static const struct ethtool_ops fbnic_ethtool_ops = { .supported_coalesce_params = ETHTOOL_COALESCE_USECS | ETHTOOL_COALESCE_RX_MAX_FRAMES, + .supported_ring_params = ETHTOOL_RING_USE_TCP_DATA_SPLIT | + ETHTOOL_RING_USE_HDS_THRS, .rxfh_max_num_contexts = FBNIC_RPC_RSS_TBL_COUNT, .get_drvinfo = fbnic_get_drvinfo, .get_regs_len = fbnic_get_regs_len, diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c index e67e99487a27..a7eb7a367b98 100644 --- a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c +++ b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c @@ -695,6 +695,10 @@ struct net_device *fbnic_netdev_alloc(struct fbnic_dev *fbd) fbn->rx_usecs = FBNIC_RX_USECS_DEFAULT; fbn->rx_max_frames = FBNIC_RX_FRAMES_DEFAULT; + /* Initialize the hds_thresh */ + netdev->cfg->hds_thresh = FBNIC_HDS_THRESH_DEFAULT; + fbn->hds_thresh = FBNIC_HDS_THRESH_DEFAULT; + default_queues = netif_get_num_default_rss_queues(); if (default_queues > fbd->max_num_queues) default_queues = fbd->max_num_queues; diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.h b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.h index 86576ae04262..04c5c7ed6c3a 100644 --- a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.h +++ b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.h @@ -31,6 +31,8 @@ struct fbnic_net { u32 ppq_size; u32 rcq_size; + u32 hds_thresh; + u16 rx_usecs; u16 tx_usecs; diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c index f9543d03485f..7c69f6381d9e 100644 --- a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c +++ b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c @@ -2232,13 +2232,22 @@ static void fbnic_enable_rcq(struct fbnic_napi_vector *nv, { struct fbnic_net *fbn = netdev_priv(nv->napi.dev); u32 log_size = fls(rcq->size_mask); - u32 rcq_ctl; + u32 hds_thresh = fbn->hds_thresh; + u32 rcq_ctl = 0; fbnic_config_drop_mode_rcq(nv, rcq); - rcq_ctl = FIELD_PREP(FBNIC_QUEUE_RDE_CTL1_PADLEN_MASK, FBNIC_RX_PAD) | - FIELD_PREP(FBNIC_QUEUE_RDE_CTL1_MAX_HDR_MASK, - FBNIC_RX_MAX_HDR) | + /* Force lower bound on MAX_HEADER_BYTES. Below this, all frames should + * be split at L4. It would also result in the frames being split at + * L2/L3 depending on the frame size. + */ + if (fbn->hds_thresh < FBNIC_HDR_BYTES_MIN) { + rcq_ctl = FBNIC_QUEUE_RDE_CTL0_EN_HDR_SPLIT; + hds_thresh = FBNIC_HDR_BYTES_MIN; + } + + rcq_ctl |= FIELD_PREP(FBNIC_QUEUE_RDE_CTL1_PADLEN_MASK, FBNIC_RX_PAD) | + FIELD_PREP(FBNIC_QUEUE_RDE_CTL1_MAX_HDR_MASK, hds_thresh) | FIELD_PREP(FBNIC_QUEUE_RDE_CTL1_PAYLD_OFF_MASK, FBNIC_RX_PAYLD_OFFSET) | FIELD_PREP(FBNIC_QUEUE_RDE_CTL1_PAYLD_PG_CL_MASK, diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h index 34693596e5eb..7d27712d5462 100644 --- a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h +++ b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h @@ -53,7 +53,6 @@ struct fbnic_net; #define FBNIC_RX_HROOM \ (ALIGN(FBNIC_RX_TROOM + NET_SKB_PAD, 128) - FBNIC_RX_TROOM) #define FBNIC_RX_PAD 0 -#define FBNIC_RX_MAX_HDR (1536 - FBNIC_RX_PAD) #define FBNIC_RX_PAYLD_OFFSET 0 #define FBNIC_RX_PAYLD_PG_CL 0 @@ -61,6 +60,12 @@ struct fbnic_net; #define FBNIC_RING_F_CTX BIT(1) #define FBNIC_RING_F_STATS BIT(2) /* Ring's stats may be used */ +#define FBNIC_HDS_THRESH_MAX \ + (4096 - FBNIC_RX_HROOM - FBNIC_RX_TROOM - FBNIC_RX_PAD) +#define FBNIC_HDS_THRESH_DEFAULT \ + (1536 - FBNIC_RX_PAD) +#define FBNIC_HDR_BYTES_MIN 128 + struct fbnic_pkt_buff { struct xdp_buff buff; ktime_t hwtstamp; From 0cf5a39720d09793f9cc5a8ec6bb026e6b20e883 Mon Sep 17 00:00:00 2001 From: Mohsin Bashir Date: Wed, 13 Aug 2025 15:13:12 -0700 Subject: [PATCH 0142/1536] eth: fbnic: Update Headroom Fbnic currently reserves a minimum of 64B headroom, but this is insufficient for inserting additional headers (e.g., IPV6) via XDP, as only 24 bytes are available for adjustment. To address this limitation, increase the headroom to a larger value while ensuring better page use. Although the resulting headroom (192B) is smaller than the recommended value (256B), forcing the headroom to 256B would require aligning to 256B (as opposed to the current 128B), which can push the max headroom to 511B. Signed-off-by: Jakub Kicinski Signed-off-by: Mohsin Bashir Link: https://patch.msgid.link/20250813221319.3367670-3-mohsin.bashr@gmail.com Signed-off-by: Paolo Abeni --- drivers/net/ethernet/meta/fbnic/fbnic_txrx.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h index 7d27712d5462..66c84375e299 100644 --- a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h +++ b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h @@ -50,8 +50,9 @@ struct fbnic_net; #define FBNIC_RX_TROOM \ SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) +#define FBNIC_RX_HROOM_PAD 128 #define FBNIC_RX_HROOM \ - (ALIGN(FBNIC_RX_TROOM + NET_SKB_PAD, 128) - FBNIC_RX_TROOM) + (ALIGN(FBNIC_RX_TROOM + FBNIC_RX_HROOM_PAD, 128) - FBNIC_RX_TROOM) #define FBNIC_RX_PAD 0 #define FBNIC_RX_PAYLD_OFFSET 0 #define FBNIC_RX_PAYLD_PG_CL 0 From 61f9a066c3099264f40737d134c7921567f85072 Mon Sep 17 00:00:00 2001 From: Mohsin Bashir Date: Wed, 13 Aug 2025 15:13:13 -0700 Subject: [PATCH 0143/1536] eth: fbnic: Use shinfo to track frags state on Rx Remove local fields that track frags state and instead store this information directly in the shinfo struct. This change is necessary because the current implementation can lead to inaccuracies in certain scenarios, such as when using XDP multi-buff support. Specifically, the XDP program may update nr_frags without updating the local variables, resulting in an inconsistent state. Signed-off-by: Jakub Kicinski Signed-off-by: Mohsin Bashir Link: https://patch.msgid.link/20250813221319.3367670-4-mohsin.bashr@gmail.com Signed-off-by: Paolo Abeni --- drivers/net/ethernet/meta/fbnic/fbnic_txrx.c | 80 ++++++-------------- drivers/net/ethernet/meta/fbnic/fbnic_txrx.h | 4 +- 2 files changed, 26 insertions(+), 58 deletions(-) diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c index 7c69f6381d9e..2adbe175ac09 100644 --- a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c +++ b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c @@ -892,9 +892,8 @@ static void fbnic_pkt_prepare(struct fbnic_napi_vector *nv, u64 rcd, xdp_prepare_buff(&pkt->buff, hdr_start, headroom, len - FBNIC_RX_PAD, true); - pkt->data_truesize = 0; - pkt->data_len = 0; - pkt->nr_frags = 0; + pkt->hwtstamp = 0; + pkt->add_frag_failed = false; } static void fbnic_add_rx_frag(struct fbnic_napi_vector *nv, u64 rcd, @@ -905,8 +904,8 @@ static void fbnic_add_rx_frag(struct fbnic_napi_vector *nv, u64 rcd, unsigned int pg_off = FIELD_GET(FBNIC_RCD_AL_BUFF_OFF_MASK, rcd); unsigned int len = FIELD_GET(FBNIC_RCD_AL_BUFF_LEN_MASK, rcd); struct page *page = fbnic_page_pool_get(&qt->sub1, pg_idx); - struct skb_shared_info *shinfo; unsigned int truesize; + bool added; truesize = FIELD_GET(FBNIC_RCD_AL_PAGE_FIN, rcd) ? FBNIC_BD_FRAG_SIZE - pg_off : ALIGN(len, 128); @@ -918,34 +917,34 @@ static void fbnic_add_rx_frag(struct fbnic_napi_vector *nv, u64 rcd, dma_sync_single_range_for_cpu(nv->dev, page_pool_get_dma_addr(page), pg_off, truesize, DMA_BIDIRECTIONAL); - /* Add page to xdp shared info */ - shinfo = xdp_get_shared_info_from_buff(&pkt->buff); - - /* We use gso_segs to store truesize */ - pkt->data_truesize += truesize; - - __skb_fill_page_desc_noacc(shinfo, pkt->nr_frags++, page, pg_off, len); - - /* Store data_len in gso_size */ - pkt->data_len += len; + added = xdp_buff_add_frag(&pkt->buff, page_to_netmem(page), pg_off, len, + truesize); + if (unlikely(!added)) { + pkt->add_frag_failed = true; + netdev_err_once(nv->napi.dev, + "Failed to add fragment to xdp_buff\n"); + } } static void fbnic_put_pkt_buff(struct fbnic_napi_vector *nv, struct fbnic_pkt_buff *pkt, int budget) { - struct skb_shared_info *shinfo; struct page *page; - int nr_frags; if (!pkt->buff.data_hard_start) return; - shinfo = xdp_get_shared_info_from_buff(&pkt->buff); - nr_frags = pkt->nr_frags; + if (xdp_buff_has_frags(&pkt->buff)) { + struct skb_shared_info *shinfo; + int nr_frags; - while (nr_frags--) { - page = skb_frag_page(&shinfo->frags[nr_frags]); - page_pool_put_full_page(nv->page_pool, page, !!budget); + shinfo = xdp_get_shared_info_from_buff(&pkt->buff); + nr_frags = shinfo->nr_frags; + + while (nr_frags--) { + page = skb_frag_page(&shinfo->frags[nr_frags]); + page_pool_put_full_page(nv->page_pool, page, !!budget); + } } page = virt_to_page(pkt->buff.data_hard_start); @@ -955,43 +954,12 @@ static void fbnic_put_pkt_buff(struct fbnic_napi_vector *nv, static struct sk_buff *fbnic_build_skb(struct fbnic_napi_vector *nv, struct fbnic_pkt_buff *pkt) { - unsigned int nr_frags = pkt->nr_frags; - struct skb_shared_info *shinfo; - unsigned int truesize; struct sk_buff *skb; - truesize = xdp_data_hard_end(&pkt->buff) + FBNIC_RX_TROOM - - pkt->buff.data_hard_start; - - /* Build frame around buffer */ - skb = napi_build_skb(pkt->buff.data_hard_start, truesize); - if (unlikely(!skb)) + skb = xdp_build_skb_from_buff(&pkt->buff); + if (!skb) return NULL; - /* Push data pointer to start of data, put tail to end of data */ - skb_reserve(skb, pkt->buff.data - pkt->buff.data_hard_start); - __skb_put(skb, pkt->buff.data_end - pkt->buff.data); - - /* Add tracking for metadata at the start of the frame */ - skb_metadata_set(skb, pkt->buff.data - pkt->buff.data_meta); - - /* Add Rx frags */ - if (nr_frags) { - /* Verify that shared info didn't move */ - shinfo = xdp_get_shared_info_from_buff(&pkt->buff); - WARN_ON(skb_shinfo(skb) != shinfo); - - skb->truesize += pkt->data_truesize; - skb->data_len += pkt->data_len; - shinfo->nr_frags = nr_frags; - skb->len += pkt->data_len; - } - - skb_mark_for_recycle(skb); - - /* Set MAC header specific fields */ - skb->protocol = eth_type_trans(skb, nv->napi.dev); - /* Add timestamp if present */ if (pkt->hwtstamp) skb_hwtstamps(skb)->hwtstamp = pkt->hwtstamp; @@ -1094,7 +1062,9 @@ static int fbnic_clean_rcq(struct fbnic_napi_vector *nv, /* We currently ignore the action table index */ break; case FBNIC_RCD_TYPE_META: - if (likely(!fbnic_rcd_metadata_err(rcd))) + if (unlikely(pkt->add_frag_failed)) + skb = NULL; + else if (likely(!fbnic_rcd_metadata_err(rcd))) skb = fbnic_build_skb(nv, pkt); /* Populate skb and invalidate XDP */ diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h index 66c84375e299..d236152bbaaa 100644 --- a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h +++ b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h @@ -70,9 +70,7 @@ struct fbnic_net; struct fbnic_pkt_buff { struct xdp_buff buff; ktime_t hwtstamp; - u32 data_truesize; - u16 data_len; - u16 nr_frags; + bool add_frag_failed; }; struct fbnic_queue_stats { From 9064ab485f04df40e7f0838245849e2e4c5159d9 Mon Sep 17 00:00:00 2001 From: Mohsin Bashir Date: Wed, 13 Aug 2025 15:13:14 -0700 Subject: [PATCH 0144/1536] eth: fbnic: Prefetch packet headers on Rx Issue a prefetch for the start of the buffer on Rx to try to avoid cache miss on packet headers. Signed-off-by: Jakub Kicinski Signed-off-by: Mohsin Bashir Link: https://patch.msgid.link/20250813221319.3367670-5-mohsin.bashr@gmail.com Signed-off-by: Paolo Abeni --- drivers/net/ethernet/meta/fbnic/fbnic_txrx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c index 2adbe175ac09..65d1e40addec 100644 --- a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c +++ b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c @@ -888,7 +888,7 @@ static void fbnic_pkt_prepare(struct fbnic_napi_vector *nv, u64 rcd, /* Build frame around buffer */ hdr_start = page_address(page) + hdr_pg_start; - + net_prefetch(pkt->buff.data); xdp_prepare_buff(&pkt->buff, hdr_start, headroom, len - FBNIC_RX_PAD, true); From 1b0a3950dbd4fa278dc33401a4faba2a23307a16 Mon Sep 17 00:00:00 2001 From: Mohsin Bashir Date: Wed, 13 Aug 2025 15:13:15 -0700 Subject: [PATCH 0145/1536] eth: fbnic: Add XDP pass, drop, abort support Add basic support for attaching an XDP program to the device and support for PASS/DROP/ABORT actions. In fbnic, buffers are always mapped as DMA_BIDIRECTIONAL. The BPF program pointer can be read either on a per-packet basis or on a per-NAPI poll basis. Both approaches are functionally equivalent, in the current code. Stick to per-packet as it limits number of arguments we need to pass around. On the XDP hot path, check that packets with fragments are only allowed when multi-buffer support is enabled for the XDP program. Ideally, this check should not be necessary because ndo_bpf verifies that for XDP programs without multi-buff support, MTU is less than the hds_thresh. However, the MTU currently does not enforce the receive size which would require cleaning up the data path and bouncing the link. For practical reasons, prioritize the ability to enter and exit BPF mode with different MTU sizes without requiring a full reconfig. Testing: Hook a simple XDP program that passes all the packets destined for a specific port iperf3 -c 192.168.1.10 -P 5 -p 12345 Connecting to host 192.168.1.10, port 12345 [ 5] local 192.168.1.9 port 46702 connected to 192.168.1.10 port 12345 [ ID] Interval Transfer Bitrate Retr Cwnd - - - - - - - - - - - - - - - - - - - - - - - - - [SUM] 1.00-2.00 sec 3.86 GBytes 33.2 Gbits/sec 0 XDP_DROP: Hook an XDP program that drops packets destined for a specific port iperf3 -c 192.168.1.10 -P 5 -p 12345 ^C- - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bitrate Retr [SUM] 0.00-0.00 sec 0.00 Bytes 0.00 bits/sec 0 sender [SUM] 0.00-0.00 sec 0.00 Bytes 0.00 bits/sec receiver iperf3: interrupt - the client has terminated XDP with HDS: - Validate XDP attachment failure when HDS is low ~] ethtool -G eth0 hds-thresh 512 ~] sudo ip link set eth0 xdpdrv obj xdp_pass_12345.o sec xdp ~] Error: fbnic: MTU too high, or HDS threshold is too low for single buffer XDP. - Validate successful XDP attachment when HDS threshold is appropriate ~] ethtool -G eth0 hds-thresh 1536 ~] sudo ip link set eth0 xdpdrv obj xdp_pass_12345.o sec xdp - Validate when the XDP program is attached, changing HDS thresh to a lower value fails ~] ethtool -G eth0 hds-thresh 512 ~] netlink error: fbnic: Use higher HDS threshold or multi-buf capable program - Validate HDS thresh does not matter when xdp frags support is available ~] ethtool -G eth0 hds-thresh 512 ~] sudo ip link set eth0 xdpdrv obj xdp_pass_mb_12345.o sec xdp.frags Signed-off-by: Jakub Kicinski Signed-off-by: Mohsin Bashir Link: https://patch.msgid.link/20250813221319.3367670-6-mohsin.bashr@gmail.com Signed-off-by: Paolo Abeni --- .../net/ethernet/meta/fbnic/fbnic_ethtool.c | 11 +++ .../net/ethernet/meta/fbnic/fbnic_netdev.c | 35 +++++++ .../net/ethernet/meta/fbnic/fbnic_netdev.h | 5 + drivers/net/ethernet/meta/fbnic/fbnic_txrx.c | 96 +++++++++++++++++-- drivers/net/ethernet/meta/fbnic/fbnic_txrx.h | 1 + 5 files changed, 141 insertions(+), 7 deletions(-) diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c b/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c index 569ddd767f9d..4c231433a63d 100644 --- a/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c +++ b/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c @@ -329,6 +329,17 @@ fbnic_set_ringparam(struct net_device *netdev, struct ethtool_ringparam *ring, return -EINVAL; } + /* If an XDP program is attached, we should check for potential frame + * splitting. If the new HDS threshold can cause splitting, we should + * only allow if the attached XDP program can handle frags. + */ + if (fbnic_check_split_frames(fbn->xdp_prog, netdev->mtu, + kernel_ring->hds_thresh)) { + NL_SET_ERR_MSG_MOD(extack, + "Use higher HDS threshold or multi-buf capable program"); + return -EINVAL; + } + if (!netif_running(netdev)) { fbnic_set_rings(fbn, ring, kernel_ring); return 0; diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c index a7eb7a367b98..fb81d1a7bc51 100644 --- a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c +++ b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c @@ -508,6 +508,40 @@ static void fbnic_get_stats64(struct net_device *dev, } } +bool fbnic_check_split_frames(struct bpf_prog *prog, unsigned int mtu, + u32 hds_thresh) +{ + if (!prog) + return false; + + if (prog->aux->xdp_has_frags) + return false; + + return mtu + ETH_HLEN > hds_thresh; +} + +static int fbnic_bpf(struct net_device *netdev, struct netdev_bpf *bpf) +{ + struct bpf_prog *prog = bpf->prog, *prev_prog; + struct fbnic_net *fbn = netdev_priv(netdev); + + if (bpf->command != XDP_SETUP_PROG) + return -EINVAL; + + if (fbnic_check_split_frames(prog, netdev->mtu, + fbn->hds_thresh)) { + NL_SET_ERR_MSG_MOD(bpf->extack, + "MTU too high, or HDS threshold is too low for single buffer XDP"); + return -EOPNOTSUPP; + } + + prev_prog = xchg(&fbn->xdp_prog, prog); + if (prev_prog) + bpf_prog_put(prev_prog); + + return 0; +} + static const struct net_device_ops fbnic_netdev_ops = { .ndo_open = fbnic_open, .ndo_stop = fbnic_stop, @@ -517,6 +551,7 @@ static const struct net_device_ops fbnic_netdev_ops = { .ndo_set_mac_address = fbnic_set_mac, .ndo_set_rx_mode = fbnic_set_rx_mode, .ndo_get_stats64 = fbnic_get_stats64, + .ndo_bpf = fbnic_bpf, .ndo_hwtstamp_get = fbnic_hwtstamp_get, .ndo_hwtstamp_set = fbnic_hwtstamp_set, }; diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.h b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.h index 04c5c7ed6c3a..bfa79ea910d8 100644 --- a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.h +++ b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.h @@ -18,6 +18,8 @@ #define FBNIC_TUN_GSO_FEATURES NETIF_F_GSO_IPXIP6 struct fbnic_net { + struct bpf_prog *xdp_prog; + struct fbnic_ring *tx[FBNIC_MAX_TXQS]; struct fbnic_ring *rx[FBNIC_MAX_RXQS]; @@ -104,4 +106,7 @@ int fbnic_phylink_ethtool_ksettings_get(struct net_device *netdev, int fbnic_phylink_get_fecparam(struct net_device *netdev, struct ethtool_fecparam *fecparam); int fbnic_phylink_init(struct net_device *netdev); + +bool fbnic_check_split_frames(struct bpf_prog *prog, + unsigned int mtu, u32 hds_threshold); #endif /* _FBNIC_NETDEV_H_ */ diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c index 65d1e40addec..a669e169e3ad 100644 --- a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c +++ b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c @@ -2,17 +2,26 @@ /* Copyright (c) Meta Platforms, Inc. and affiliates. */ #include +#include +#include #include #include #include #include #include +#include #include "fbnic.h" #include "fbnic_csr.h" #include "fbnic_netdev.h" #include "fbnic_txrx.h" +enum { + FBNIC_XDP_PASS = 0, + FBNIC_XDP_CONSUME, + FBNIC_XDP_LEN_ERR, +}; + enum { FBNIC_XMIT_CB_TS = 0x01, }; @@ -877,7 +886,7 @@ static void fbnic_pkt_prepare(struct fbnic_napi_vector *nv, u64 rcd, headroom = hdr_pg_off - hdr_pg_start + FBNIC_RX_PAD; frame_sz = hdr_pg_end - hdr_pg_start; - xdp_init_buff(&pkt->buff, frame_sz, NULL); + xdp_init_buff(&pkt->buff, frame_sz, &qt->xdp_rxq); hdr_pg_start += (FBNIC_RCD_AL_BUFF_FRAG_MASK & rcd) * FBNIC_BD_FRAG_SIZE; @@ -967,6 +976,39 @@ static struct sk_buff *fbnic_build_skb(struct fbnic_napi_vector *nv, return skb; } +static struct sk_buff *fbnic_run_xdp(struct fbnic_napi_vector *nv, + struct fbnic_pkt_buff *pkt) +{ + struct fbnic_net *fbn = netdev_priv(nv->napi.dev); + struct bpf_prog *xdp_prog; + int act; + + xdp_prog = READ_ONCE(fbn->xdp_prog); + if (!xdp_prog) + goto xdp_pass; + + /* Should never happen, config paths enforce HDS threshold > MTU */ + if (xdp_buff_has_frags(&pkt->buff) && !xdp_prog->aux->xdp_has_frags) + return ERR_PTR(-FBNIC_XDP_LEN_ERR); + + act = bpf_prog_run_xdp(xdp_prog, &pkt->buff); + switch (act) { + case XDP_PASS: +xdp_pass: + return fbnic_build_skb(nv, pkt); + default: + bpf_warn_invalid_xdp_action(nv->napi.dev, xdp_prog, act); + fallthrough; + case XDP_ABORTED: + trace_xdp_exception(nv->napi.dev, xdp_prog, act); + fallthrough; + case XDP_DROP: + break; + } + + return ERR_PTR(-FBNIC_XDP_CONSUME); +} + static enum pkt_hash_types fbnic_skb_hash_type(u64 rcd) { return (FBNIC_RCD_META_L4_TYPE_MASK & rcd) ? PKT_HASH_TYPE_L4 : @@ -1065,7 +1107,7 @@ static int fbnic_clean_rcq(struct fbnic_napi_vector *nv, if (unlikely(pkt->add_frag_failed)) skb = NULL; else if (likely(!fbnic_rcd_metadata_err(rcd))) - skb = fbnic_build_skb(nv, pkt); + skb = fbnic_run_xdp(nv, pkt); /* Populate skb and invalidate XDP */ if (!IS_ERR_OR_NULL(skb)) { @@ -1251,6 +1293,7 @@ static void fbnic_free_napi_vector(struct fbnic_net *fbn, } for (j = 0; j < nv->rxt_count; j++, i++) { + xdp_rxq_info_unreg(&nv->qt[i].xdp_rxq); fbnic_remove_rx_ring(fbn, &nv->qt[i].sub0); fbnic_remove_rx_ring(fbn, &nv->qt[i].sub1); fbnic_remove_rx_ring(fbn, &nv->qt[i].cmpl); @@ -1423,6 +1466,11 @@ static int fbnic_alloc_napi_vector(struct fbnic_dev *fbd, struct fbnic_net *fbn, fbnic_ring_init(&qt->cmpl, db, rxq_idx, FBNIC_RING_F_STATS); fbn->rx[rxq_idx] = &qt->cmpl; + err = xdp_rxq_info_reg(&qt->xdp_rxq, fbn->netdev, rxq_idx, + nv->napi.napi_id); + if (err) + goto free_ring_cur_qt; + /* Update Rx queue index */ rxt_count--; rxq_idx += v_count; @@ -1433,6 +1481,25 @@ static int fbnic_alloc_napi_vector(struct fbnic_dev *fbd, struct fbnic_net *fbn, return 0; + while (rxt_count < nv->rxt_count) { + qt--; + + xdp_rxq_info_unreg(&qt->xdp_rxq); +free_ring_cur_qt: + fbnic_remove_rx_ring(fbn, &qt->sub0); + fbnic_remove_rx_ring(fbn, &qt->sub1); + fbnic_remove_rx_ring(fbn, &qt->cmpl); + rxt_count++; + } + while (txt_count < nv->txt_count) { + qt--; + + fbnic_remove_tx_ring(fbn, &qt->sub0); + fbnic_remove_tx_ring(fbn, &qt->cmpl); + + txt_count++; + } + fbnic_napi_free_irq(fbd, nv); pp_destroy: page_pool_destroy(nv->page_pool); napi_del: @@ -1709,8 +1776,10 @@ static void fbnic_free_nv_resources(struct fbnic_net *fbn, for (i = 0; i < nv->txt_count; i++) fbnic_free_qt_resources(fbn, &nv->qt[i]); - for (j = 0; j < nv->rxt_count; j++, i++) + for (j = 0; j < nv->rxt_count; j++, i++) { fbnic_free_qt_resources(fbn, &nv->qt[i]); + xdp_rxq_info_unreg_mem_model(&nv->qt[i].xdp_rxq); + } } static int fbnic_alloc_nv_resources(struct fbnic_net *fbn, @@ -1722,19 +1791,32 @@ static int fbnic_alloc_nv_resources(struct fbnic_net *fbn, for (i = 0; i < nv->txt_count; i++) { err = fbnic_alloc_tx_qt_resources(fbn, &nv->qt[i]); if (err) - goto free_resources; + goto free_qt_resources; } /* Allocate Rx Resources */ for (j = 0; j < nv->rxt_count; j++, i++) { + /* Register XDP memory model for completion queue */ + err = xdp_reg_mem_model(&nv->qt[i].xdp_rxq.mem, + MEM_TYPE_PAGE_POOL, + nv->page_pool); + if (err) + goto xdp_unreg_mem_model; + err = fbnic_alloc_rx_qt_resources(fbn, &nv->qt[i]); if (err) - goto free_resources; + goto xdp_unreg_cur_model; } return 0; -free_resources: +xdp_unreg_mem_model: + while (j-- && i--) { + fbnic_free_qt_resources(fbn, &nv->qt[i]); +xdp_unreg_cur_model: + xdp_rxq_info_unreg_mem_model(&nv->qt[i].xdp_rxq); + } +free_qt_resources: while (i--) fbnic_free_qt_resources(fbn, &nv->qt[i]); return err; @@ -2026,7 +2108,7 @@ void fbnic_flush(struct fbnic_net *fbn) memset(qt->cmpl.desc, 0, qt->cmpl.size); fbnic_put_pkt_buff(nv, qt->cmpl.pkt, 0); - qt->cmpl.pkt->buff.data_hard_start = NULL; + memset(qt->cmpl.pkt, 0, sizeof(struct fbnic_pkt_buff)); } } } diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h index d236152bbaaa..5536f72a1c85 100644 --- a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h +++ b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h @@ -128,6 +128,7 @@ struct fbnic_ring { struct fbnic_q_triad { struct fbnic_ring sub0, sub1, cmpl; + struct xdp_rxq_info xdp_rxq; }; struct fbnic_napi_vector { From cf4facfb132af01c3c1bf0fab7685b3db17446ea Mon Sep 17 00:00:00 2001 From: Mohsin Bashir Date: Wed, 13 Aug 2025 15:13:16 -0700 Subject: [PATCH 0146/1536] eth: fbnic: Add support for XDP queues Add support for allocating XDP_TX queues and configuring ring support. FBNIC has been designed with XDP support in mind. Each Tx queue has 2 submission queues and one completion queue, with the expectation that one of the submission queues will be used by the stack, and the other by XDP. XDP queues are populated by XDP_TX and start from index 128 in the TX queue array. The support for XDP_TX is added in the next patch. Signed-off-by: Jakub Kicinski Signed-off-by: Mohsin Bashir Link: https://patch.msgid.link/20250813221319.3367670-7-mohsin.bashr@gmail.com Signed-off-by: Paolo Abeni --- .../net/ethernet/meta/fbnic/fbnic_netdev.h | 2 +- drivers/net/ethernet/meta/fbnic/fbnic_txrx.c | 139 +++++++++++++++++- drivers/net/ethernet/meta/fbnic/fbnic_txrx.h | 7 + 3 files changed, 142 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.h b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.h index bfa79ea910d8..0a6347f28210 100644 --- a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.h +++ b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.h @@ -20,7 +20,7 @@ struct fbnic_net { struct bpf_prog *xdp_prog; - struct fbnic_ring *tx[FBNIC_MAX_TXQS]; + struct fbnic_ring *tx[FBNIC_MAX_TXQS + FBNIC_MAX_XDPQS]; struct fbnic_ring *rx[FBNIC_MAX_RXQS]; struct fbnic_napi_vector *napi[FBNIC_MAX_NAPI_VECTORS]; diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c index a669e169e3ad..8c7709f707e6 100644 --- a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c +++ b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c @@ -615,6 +615,37 @@ static void fbnic_clean_twq0(struct fbnic_napi_vector *nv, int napi_budget, } } +static void fbnic_clean_twq1(struct fbnic_napi_vector *nv, bool pp_allow_direct, + struct fbnic_ring *ring, bool discard, + unsigned int hw_head) +{ + unsigned int head = ring->head; + u64 total_bytes = 0; + + while (hw_head != head) { + struct page *page; + u64 twd; + + if (unlikely(!(ring->desc[head] & FBNIC_TWD_TYPE(AL)))) + goto next_desc; + + twd = le64_to_cpu(ring->desc[head]); + page = ring->tx_buf[head]; + + total_bytes += FIELD_GET(FBNIC_TWD_LEN_MASK, twd); + + page_pool_put_page(nv->page_pool, page, -1, pp_allow_direct); +next_desc: + head++; + head &= ring->size_mask; + } + + if (!total_bytes) + return; + + ring->head = head; +} + static void fbnic_clean_tsq(struct fbnic_napi_vector *nv, struct fbnic_ring *ring, u64 tcd, int *ts_head, int *head0) @@ -698,12 +729,21 @@ static void fbnic_page_pool_drain(struct fbnic_ring *ring, unsigned int idx, } static void fbnic_clean_twq(struct fbnic_napi_vector *nv, int napi_budget, - struct fbnic_q_triad *qt, s32 ts_head, s32 head0) + struct fbnic_q_triad *qt, s32 ts_head, s32 head0, + s32 head1) { if (head0 >= 0) fbnic_clean_twq0(nv, napi_budget, &qt->sub0, false, head0); else if (ts_head >= 0) fbnic_clean_twq0(nv, napi_budget, &qt->sub0, false, ts_head); + + if (head1 >= 0) { + qt->cmpl.deferred_head = -1; + if (napi_budget) + fbnic_clean_twq1(nv, true, &qt->sub1, false, head1); + else + qt->cmpl.deferred_head = head1; + } } static void @@ -711,6 +751,7 @@ fbnic_clean_tcq(struct fbnic_napi_vector *nv, struct fbnic_q_triad *qt, int napi_budget) { struct fbnic_ring *cmpl = &qt->cmpl; + s32 head1 = cmpl->deferred_head; s32 head0 = -1, ts_head = -1; __le64 *raw_tcd, done; u32 head = cmpl->head; @@ -728,7 +769,10 @@ fbnic_clean_tcq(struct fbnic_napi_vector *nv, struct fbnic_q_triad *qt, switch (FIELD_GET(FBNIC_TCD_TYPE_MASK, tcd)) { case FBNIC_TCD_TYPE_0: - if (!(tcd & FBNIC_TCD_TWQ1)) + if (tcd & FBNIC_TCD_TWQ1) + head1 = FIELD_GET(FBNIC_TCD_TYPE0_HEAD1_MASK, + tcd); + else head0 = FIELD_GET(FBNIC_TCD_TYPE0_HEAD0_MASK, tcd); /* Currently all err status bits are related to @@ -761,7 +805,7 @@ fbnic_clean_tcq(struct fbnic_napi_vector *nv, struct fbnic_q_triad *qt, } /* Unmap and free processed buffers */ - fbnic_clean_twq(nv, napi_budget, qt, ts_head, head0); + fbnic_clean_twq(nv, napi_budget, qt, ts_head, head0, head1); } static void fbnic_clean_bdq(struct fbnic_napi_vector *nv, int napi_budget, @@ -1268,6 +1312,17 @@ static void fbnic_remove_tx_ring(struct fbnic_net *fbn, fbn->tx[txr->q_idx] = NULL; } +static void fbnic_remove_xdp_ring(struct fbnic_net *fbn, + struct fbnic_ring *xdpr) +{ + if (!(xdpr->flags & FBNIC_RING_F_STATS)) + return; + + /* Remove pointer to the Tx ring */ + WARN_ON(fbn->tx[xdpr->q_idx] && fbn->tx[xdpr->q_idx] != xdpr); + fbn->tx[xdpr->q_idx] = NULL; +} + static void fbnic_remove_rx_ring(struct fbnic_net *fbn, struct fbnic_ring *rxr) { @@ -1289,6 +1344,7 @@ static void fbnic_free_napi_vector(struct fbnic_net *fbn, for (i = 0; i < nv->txt_count; i++) { fbnic_remove_tx_ring(fbn, &nv->qt[i].sub0); + fbnic_remove_xdp_ring(fbn, &nv->qt[i].sub1); fbnic_remove_tx_ring(fbn, &nv->qt[i].cmpl); } @@ -1363,6 +1419,7 @@ static void fbnic_ring_init(struct fbnic_ring *ring, u32 __iomem *doorbell, ring->doorbell = doorbell; ring->q_idx = q_idx; ring->flags = flags; + ring->deferred_head = -1; } static int fbnic_alloc_napi_vector(struct fbnic_dev *fbd, struct fbnic_net *fbn, @@ -1372,11 +1429,18 @@ static int fbnic_alloc_napi_vector(struct fbnic_dev *fbd, struct fbnic_net *fbn, { int txt_count = txq_count, rxt_count = rxq_count; u32 __iomem *uc_addr = fbd->uc_addr0; + int xdp_count = 0, qt_count, err; struct fbnic_napi_vector *nv; struct fbnic_q_triad *qt; - int qt_count, err; u32 __iomem *db; + /* We need to reserve at least one Tx Queue Triad for an XDP ring */ + if (rxq_count) { + xdp_count = 1; + if (!txt_count) + txt_count = 1; + } + qt_count = txt_count + rxq_count; if (!qt_count) return -EINVAL; @@ -1425,12 +1489,13 @@ static int fbnic_alloc_napi_vector(struct fbnic_dev *fbd, struct fbnic_net *fbn, qt = nv->qt; while (txt_count) { + u8 flags = FBNIC_RING_F_CTX | FBNIC_RING_F_STATS; + /* Configure Tx queue */ db = &uc_addr[FBNIC_QUEUE(txq_idx) + FBNIC_QUEUE_TWQ0_TAIL]; /* Assign Tx queue to netdev if applicable */ if (txq_count > 0) { - u8 flags = FBNIC_RING_F_CTX | FBNIC_RING_F_STATS; fbnic_ring_init(&qt->sub0, db, txq_idx, flags); fbn->tx[txq_idx] = &qt->sub0; @@ -1440,6 +1505,28 @@ static int fbnic_alloc_napi_vector(struct fbnic_dev *fbd, struct fbnic_net *fbn, FBNIC_RING_F_DISABLED); } + /* Configure XDP queue */ + db = &uc_addr[FBNIC_QUEUE(txq_idx) + FBNIC_QUEUE_TWQ1_TAIL]; + + /* Assign XDP queue to netdev if applicable + * + * The setup for this is in itself a bit different. + * 1. We only need one XDP Tx queue per NAPI vector. + * 2. We associate it to the first Rx queue index. + * 3. The hardware side is associated based on the Tx Queue. + * 4. The netdev queue is offset by FBNIC_MAX_TXQs. + */ + if (xdp_count > 0) { + unsigned int xdp_idx = FBNIC_MAX_TXQS + rxq_idx; + + fbnic_ring_init(&qt->sub1, db, xdp_idx, flags); + fbn->tx[xdp_idx] = &qt->sub1; + xdp_count--; + } else { + fbnic_ring_init(&qt->sub1, db, 0, + FBNIC_RING_F_DISABLED); + } + /* Configure Tx completion queue */ db = &uc_addr[FBNIC_QUEUE(txq_idx) + FBNIC_QUEUE_TCQ_HEAD]; fbnic_ring_init(&qt->cmpl, db, 0, 0); @@ -1495,6 +1582,7 @@ free_ring_cur_qt: qt--; fbnic_remove_tx_ring(fbn, &qt->sub0); + fbnic_remove_xdp_ring(fbn, &qt->sub1); fbnic_remove_tx_ring(fbn, &qt->cmpl); txt_count++; @@ -1729,6 +1817,10 @@ static int fbnic_alloc_tx_qt_resources(struct fbnic_net *fbn, if (err) return err; + err = fbnic_alloc_tx_ring_resources(fbn, &qt->sub1); + if (err) + goto free_sub0; + err = fbnic_alloc_tx_ring_resources(fbn, &qt->cmpl); if (err) goto free_sub1; @@ -1736,6 +1828,8 @@ static int fbnic_alloc_tx_qt_resources(struct fbnic_net *fbn, return 0; free_sub1: + fbnic_free_ring_resources(dev, &qt->sub1); +free_sub0: fbnic_free_ring_resources(dev, &qt->sub0); return err; } @@ -1923,6 +2017,15 @@ static void fbnic_disable_twq0(struct fbnic_ring *txr) fbnic_ring_wr32(txr, FBNIC_QUEUE_TWQ0_CTL, twq_ctl); } +static void fbnic_disable_twq1(struct fbnic_ring *txr) +{ + u32 twq_ctl = fbnic_ring_rd32(txr, FBNIC_QUEUE_TWQ1_CTL); + + twq_ctl &= ~FBNIC_QUEUE_TWQ_CTL_ENABLE; + + fbnic_ring_wr32(txr, FBNIC_QUEUE_TWQ1_CTL, twq_ctl); +} + static void fbnic_disable_tcq(struct fbnic_ring *txr) { fbnic_ring_wr32(txr, FBNIC_QUEUE_TCQ_CTL, 0); @@ -1968,6 +2071,7 @@ void fbnic_disable(struct fbnic_net *fbn) struct fbnic_q_triad *qt = &nv->qt[t]; fbnic_disable_twq0(&qt->sub0); + fbnic_disable_twq1(&qt->sub1); fbnic_disable_tcq(&qt->cmpl); } @@ -2082,6 +2186,8 @@ void fbnic_flush(struct fbnic_net *fbn) /* Clean the work queues of unprocessed work */ fbnic_clean_twq0(nv, 0, &qt->sub0, true, qt->sub0.tail); + fbnic_clean_twq1(nv, false, &qt->sub1, true, + qt->sub1.tail); /* Reset completion queue descriptor ring */ memset(qt->cmpl.desc, 0, qt->cmpl.size); @@ -2156,6 +2262,28 @@ static void fbnic_enable_twq0(struct fbnic_ring *twq) fbnic_ring_wr32(twq, FBNIC_QUEUE_TWQ0_CTL, FBNIC_QUEUE_TWQ_CTL_ENABLE); } +static void fbnic_enable_twq1(struct fbnic_ring *twq) +{ + u32 log_size = fls(twq->size_mask); + + if (!twq->size_mask) + return; + + /* Reset head/tail */ + fbnic_ring_wr32(twq, FBNIC_QUEUE_TWQ1_CTL, FBNIC_QUEUE_TWQ_CTL_RESET); + twq->tail = 0; + twq->head = 0; + + /* Store descriptor ring address and size */ + fbnic_ring_wr32(twq, FBNIC_QUEUE_TWQ1_BAL, lower_32_bits(twq->dma)); + fbnic_ring_wr32(twq, FBNIC_QUEUE_TWQ1_BAH, upper_32_bits(twq->dma)); + + /* Write lower 4 bits of log size as 64K ring size is 0 */ + fbnic_ring_wr32(twq, FBNIC_QUEUE_TWQ1_SIZE, log_size & 0xf); + + fbnic_ring_wr32(twq, FBNIC_QUEUE_TWQ1_CTL, FBNIC_QUEUE_TWQ_CTL_ENABLE); +} + static void fbnic_enable_tcq(struct fbnic_napi_vector *nv, struct fbnic_ring *tcq) { @@ -2341,6 +2469,7 @@ void fbnic_enable(struct fbnic_net *fbn) struct fbnic_q_triad *qt = &nv->qt[t]; fbnic_enable_twq0(&qt->sub0); + fbnic_enable_twq1(&qt->sub1); fbnic_enable_tcq(nv, &qt->cmpl); } diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h index 5536f72a1c85..0e92d11115a6 100644 --- a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h +++ b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h @@ -35,6 +35,7 @@ struct fbnic_net; #define FBNIC_MAX_TXQS 128u #define FBNIC_MAX_RXQS 128u +#define FBNIC_MAX_XDPQS 128u /* These apply to TWQs, TCQ, RCQ */ #define FBNIC_QUEUE_SIZE_MIN 16u @@ -119,6 +120,12 @@ struct fbnic_ring { u32 head, tail; /* Head/Tail of ring */ + /* Deferred_head is used to cache the head for TWQ1 if an attempt + * is made to clean TWQ1 with zero napi_budget. We do not use it for + * any other ring. + */ + s32 deferred_head; + struct fbnic_queue_stats stats; /* Slow path fields follow */ From 168deb7b31b2e2f578726093c8a133f8e36c0b86 Mon Sep 17 00:00:00 2001 From: Mohsin Bashir Date: Wed, 13 Aug 2025 15:13:17 -0700 Subject: [PATCH 0147/1536] eth: fbnic: Add support for XDP_TX action Add support for XDP_TX action and cleaning the associated work. Signed-off-by: Jakub Kicinski Signed-off-by: Mohsin Bashir Link: https://patch.msgid.link/20250813221319.3367670-8-mohsin.bashr@gmail.com Signed-off-by: Paolo Abeni --- drivers/net/ethernet/meta/fbnic/fbnic_txrx.c | 85 +++++++++++++++++++- 1 file changed, 84 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c index 8c7709f707e6..de3610a7491a 100644 --- a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c +++ b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c @@ -19,6 +19,7 @@ enum { FBNIC_XDP_PASS = 0, FBNIC_XDP_CONSUME, + FBNIC_XDP_TX, FBNIC_XDP_LEN_ERR, }; @@ -1020,6 +1021,80 @@ static struct sk_buff *fbnic_build_skb(struct fbnic_napi_vector *nv, return skb; } +static long fbnic_pkt_tx(struct fbnic_napi_vector *nv, + struct fbnic_pkt_buff *pkt) +{ + struct fbnic_ring *ring = &nv->qt[0].sub1; + int size, offset, nsegs = 1, data_len = 0; + unsigned int tail = ring->tail; + struct skb_shared_info *shinfo; + skb_frag_t *frag = NULL; + struct page *page; + dma_addr_t dma; + __le64 *twd; + + if (unlikely(xdp_buff_has_frags(&pkt->buff))) { + shinfo = xdp_get_shared_info_from_buff(&pkt->buff); + nsegs += shinfo->nr_frags; + data_len = shinfo->xdp_frags_size; + frag = &shinfo->frags[0]; + } + + if (fbnic_desc_unused(ring) < nsegs) + return -FBNIC_XDP_CONSUME; + + page = virt_to_page(pkt->buff.data_hard_start); + offset = offset_in_page(pkt->buff.data); + dma = page_pool_get_dma_addr(page); + + size = pkt->buff.data_end - pkt->buff.data; + + while (nsegs--) { + dma_sync_single_range_for_device(nv->dev, dma, offset, size, + DMA_BIDIRECTIONAL); + dma += offset; + + ring->tx_buf[tail] = page; + + twd = &ring->desc[tail]; + *twd = cpu_to_le64(FIELD_PREP(FBNIC_TWD_ADDR_MASK, dma) | + FIELD_PREP(FBNIC_TWD_LEN_MASK, size) | + FIELD_PREP(FBNIC_TWD_TYPE_MASK, + FBNIC_TWD_TYPE_AL)); + + tail++; + tail &= ring->size_mask; + + if (!data_len) + break; + + offset = skb_frag_off(frag); + page = skb_frag_page(frag); + dma = page_pool_get_dma_addr(page); + + size = skb_frag_size(frag); + data_len -= size; + frag++; + } + + *twd |= FBNIC_TWD_TYPE(LAST_AL); + + ring->tail = tail; + + return -FBNIC_XDP_TX; +} + +static void fbnic_pkt_commit_tail(struct fbnic_napi_vector *nv, + unsigned int pkt_tail) +{ + struct fbnic_ring *ring = &nv->qt[0].sub1; + + /* Force DMA writes to flush before writing to tail */ + dma_wmb(); + + writel(pkt_tail, ring->doorbell); +} + static struct sk_buff *fbnic_run_xdp(struct fbnic_napi_vector *nv, struct fbnic_pkt_buff *pkt) { @@ -1040,6 +1115,8 @@ static struct sk_buff *fbnic_run_xdp(struct fbnic_napi_vector *nv, case XDP_PASS: xdp_pass: return fbnic_build_skb(nv, pkt); + case XDP_TX: + return ERR_PTR(fbnic_pkt_tx(nv, pkt)); default: bpf_warn_invalid_xdp_action(nv->napi.dev, xdp_prog, act); fallthrough; @@ -1104,10 +1181,10 @@ static int fbnic_clean_rcq(struct fbnic_napi_vector *nv, struct fbnic_q_triad *qt, int budget) { unsigned int packets = 0, bytes = 0, dropped = 0, alloc_failed = 0; + s32 head0 = -1, head1 = -1, pkt_tail = -1; u64 csum_complete = 0, csum_none = 0; struct fbnic_ring *rcq = &qt->cmpl; struct fbnic_pkt_buff *pkt; - s32 head0 = -1, head1 = -1; __le64 *raw_rcd, done; u32 head = rcq->head; @@ -1163,6 +1240,9 @@ static int fbnic_clean_rcq(struct fbnic_napi_vector *nv, bytes += skb->len; napi_gro_receive(&nv->napi, skb); + } else if (skb == ERR_PTR(-FBNIC_XDP_TX)) { + pkt_tail = nv->qt[0].sub1.tail; + bytes += xdp_get_buff_len(&pkt->buff); } else { if (!skb) { alloc_failed++; @@ -1198,6 +1278,9 @@ static int fbnic_clean_rcq(struct fbnic_napi_vector *nv, rcq->stats.rx.csum_none += csum_none; u64_stats_update_end(&rcq->stats.syncp); + if (pkt_tail >= 0) + fbnic_pkt_commit_tail(nv, pkt_tail); + /* Unmap and free processed buffers */ if (head0 >= 0) fbnic_clean_bdq(nv, budget, &qt->sub0, head0); From 5213ff086344394ddb586298ad1d408c93406621 Mon Sep 17 00:00:00 2001 From: Mohsin Bashir Date: Wed, 13 Aug 2025 15:13:18 -0700 Subject: [PATCH 0148/1536] eth: fbnic: Collect packet statistics for XDP Add support for XDP statistics collection and reporting via rtnl_link and netdev_queue API. For XDP programs without frags support, fbnic requires MTU to be less than the HDS threshold. If an over-sized frame is received, the frame is dropped and recorded as rx_length_errors reported via ip stats to highlight that this is an error. Signed-off-by: Jakub Kicinski Signed-off-by: Mohsin Bashir Link: https://patch.msgid.link/20250813221319.3367670-9-mohsin.bashr@gmail.com Signed-off-by: Paolo Abeni --- .../device_drivers/ethernet/meta/fbnic.rst | 11 ++++ .../net/ethernet/meta/fbnic/fbnic_netdev.c | 36 ++++++++++++- drivers/net/ethernet/meta/fbnic/fbnic_txrx.c | 51 +++++++++++++++++-- drivers/net/ethernet/meta/fbnic/fbnic_txrx.h | 1 + 4 files changed, 94 insertions(+), 5 deletions(-) diff --git a/Documentation/networking/device_drivers/ethernet/meta/fbnic.rst b/Documentation/networking/device_drivers/ethernet/meta/fbnic.rst index afb8353daefd..fb6559fa4be4 100644 --- a/Documentation/networking/device_drivers/ethernet/meta/fbnic.rst +++ b/Documentation/networking/device_drivers/ethernet/meta/fbnic.rst @@ -160,3 +160,14 @@ behavior and potential performance bottlenecks. credit exhaustion - ``pcie_ob_rd_no_np_cred``: Read requests dropped due to non-posted credit exhaustion + +XDP Length Error: +~~~~~~~~~~~~~~~~~ + +For XDP programs without frags support, fbnic tries to make sure that MTU fits +into a single buffer. If an oversized frame is received and gets fragmented, +it is dropped and the following netlink counters are updated + + - ``rx-length``: number of frames dropped due to lack of fragmentation + support in the attached XDP program + - ``rx-errors``: total number of packets with errors received on the interface diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c index fb81d1a7bc51..b8b684ad376b 100644 --- a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c +++ b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c @@ -407,11 +407,12 @@ static void fbnic_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats64) { u64 rx_bytes, rx_packets, rx_dropped = 0, rx_errors = 0; + u64 rx_over = 0, rx_missed = 0, rx_length = 0; u64 tx_bytes, tx_packets, tx_dropped = 0; struct fbnic_net *fbn = netdev_priv(dev); struct fbnic_dev *fbd = fbn->fbd; struct fbnic_queue_stats *stats; - u64 rx_over = 0, rx_missed = 0; + unsigned int start, i; fbnic_get_hw_stats(fbd); @@ -489,6 +490,7 @@ static void fbnic_get_stats64(struct net_device *dev, stats64->rx_missed_errors = rx_missed; for (i = 0; i < fbn->num_rx_queues; i++) { + struct fbnic_ring *xdpr = fbn->tx[FBNIC_MAX_TXQS + i]; struct fbnic_ring *rxr = fbn->rx[i]; if (!rxr) @@ -500,11 +502,29 @@ static void fbnic_get_stats64(struct net_device *dev, rx_bytes = stats->bytes; rx_packets = stats->packets; rx_dropped = stats->dropped; + rx_length = stats->rx.length_errors; } while (u64_stats_fetch_retry(&stats->syncp, start)); stats64->rx_bytes += rx_bytes; stats64->rx_packets += rx_packets; stats64->rx_dropped += rx_dropped; + stats64->rx_errors += rx_length; + stats64->rx_length_errors += rx_length; + + if (!xdpr) + continue; + + stats = &xdpr->stats; + do { + start = u64_stats_fetch_begin(&stats->syncp); + tx_bytes = stats->bytes; + tx_packets = stats->packets; + tx_dropped = stats->dropped; + } while (u64_stats_fetch_retry(&stats->syncp, start)); + + stats64->tx_bytes += tx_bytes; + stats64->tx_packets += tx_packets; + stats64->tx_dropped += tx_dropped; } } @@ -603,6 +623,7 @@ static void fbnic_get_queue_stats_tx(struct net_device *dev, int idx, struct fbnic_ring *txr = fbn->tx[idx]; struct fbnic_queue_stats *stats; u64 stop, wake, csum, lso; + struct fbnic_ring *xdpr; unsigned int start; u64 bytes, packets; @@ -626,6 +647,19 @@ static void fbnic_get_queue_stats_tx(struct net_device *dev, int idx, tx->hw_gso_wire_packets = lso; tx->stop = stop; tx->wake = wake; + + xdpr = fbn->tx[FBNIC_MAX_TXQS + idx]; + if (xdpr) { + stats = &xdpr->stats; + do { + start = u64_stats_fetch_begin(&stats->syncp); + bytes = stats->bytes; + packets = stats->packets; + } while (u64_stats_fetch_retry(&stats->syncp, start)); + + tx->bytes += bytes; + tx->packets += packets; + } } static void fbnic_get_base_stats(struct net_device *dev, diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c index de3610a7491a..fea4577e38d4 100644 --- a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c +++ b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c @@ -620,8 +620,8 @@ static void fbnic_clean_twq1(struct fbnic_napi_vector *nv, bool pp_allow_direct, struct fbnic_ring *ring, bool discard, unsigned int hw_head) { + u64 total_bytes = 0, total_packets = 0; unsigned int head = ring->head; - u64 total_bytes = 0; while (hw_head != head) { struct page *page; @@ -633,6 +633,11 @@ static void fbnic_clean_twq1(struct fbnic_napi_vector *nv, bool pp_allow_direct, twd = le64_to_cpu(ring->desc[head]); page = ring->tx_buf[head]; + /* TYPE_AL is 2, TYPE_LAST_AL is 3. So this trick gives + * us one increment per packet, with no branches. + */ + total_packets += FIELD_GET(FBNIC_TWD_TYPE_MASK, twd) - + FBNIC_TWD_TYPE_AL; total_bytes += FIELD_GET(FBNIC_TWD_LEN_MASK, twd); page_pool_put_page(nv->page_pool, page, -1, pp_allow_direct); @@ -645,6 +650,18 @@ next_desc: return; ring->head = head; + + if (discard) { + u64_stats_update_begin(&ring->stats.syncp); + ring->stats.dropped += total_packets; + u64_stats_update_end(&ring->stats.syncp); + return; + } + + u64_stats_update_begin(&ring->stats.syncp); + ring->stats.bytes += total_bytes; + ring->stats.packets += total_packets; + u64_stats_update_end(&ring->stats.syncp); } static void fbnic_clean_tsq(struct fbnic_napi_vector *nv, @@ -1040,8 +1057,12 @@ static long fbnic_pkt_tx(struct fbnic_napi_vector *nv, frag = &shinfo->frags[0]; } - if (fbnic_desc_unused(ring) < nsegs) + if (fbnic_desc_unused(ring) < nsegs) { + u64_stats_update_begin(&ring->stats.syncp); + ring->stats.dropped++; + u64_stats_update_end(&ring->stats.syncp); return -FBNIC_XDP_CONSUME; + } page = virt_to_page(pkt->buff.data_hard_start); offset = offset_in_page(pkt->buff.data); @@ -1181,8 +1202,8 @@ static int fbnic_clean_rcq(struct fbnic_napi_vector *nv, struct fbnic_q_triad *qt, int budget) { unsigned int packets = 0, bytes = 0, dropped = 0, alloc_failed = 0; + u64 csum_complete = 0, csum_none = 0, length_errors = 0; s32 head0 = -1, head1 = -1, pkt_tail = -1; - u64 csum_complete = 0, csum_none = 0; struct fbnic_ring *rcq = &qt->cmpl; struct fbnic_pkt_buff *pkt; __le64 *raw_rcd, done; @@ -1247,6 +1268,8 @@ static int fbnic_clean_rcq(struct fbnic_napi_vector *nv, if (!skb) { alloc_failed++; dropped++; + } else if (skb == ERR_PTR(-FBNIC_XDP_LEN_ERR)) { + length_errors++; } else { dropped++; } @@ -1276,6 +1299,7 @@ static int fbnic_clean_rcq(struct fbnic_napi_vector *nv, rcq->stats.rx.alloc_failed += alloc_failed; rcq->stats.rx.csum_complete += csum_complete; rcq->stats.rx.csum_none += csum_none; + rcq->stats.rx.length_errors += length_errors; u64_stats_update_end(&rcq->stats.syncp); if (pkt_tail >= 0) @@ -1359,8 +1383,9 @@ void fbnic_aggregate_ring_rx_counters(struct fbnic_net *fbn, fbn->rx_stats.rx.alloc_failed += stats->rx.alloc_failed; fbn->rx_stats.rx.csum_complete += stats->rx.csum_complete; fbn->rx_stats.rx.csum_none += stats->rx.csum_none; + fbn->rx_stats.rx.length_errors += stats->rx.length_errors; /* Remember to add new stats here */ - BUILD_BUG_ON(sizeof(fbn->rx_stats.rx) / 8 != 3); + BUILD_BUG_ON(sizeof(fbn->rx_stats.rx) / 8 != 4); } void fbnic_aggregate_ring_tx_counters(struct fbnic_net *fbn, @@ -1382,6 +1407,22 @@ void fbnic_aggregate_ring_tx_counters(struct fbnic_net *fbn, BUILD_BUG_ON(sizeof(fbn->tx_stats.twq) / 8 != 6); } +static void fbnic_aggregate_ring_xdp_counters(struct fbnic_net *fbn, + struct fbnic_ring *xdpr) +{ + struct fbnic_queue_stats *stats = &xdpr->stats; + + if (!(xdpr->flags & FBNIC_RING_F_STATS)) + return; + + /* Capture stats from queues before dissasociating them */ + fbn->rx_stats.bytes += stats->bytes; + fbn->rx_stats.packets += stats->packets; + fbn->rx_stats.dropped += stats->dropped; + fbn->tx_stats.bytes += stats->bytes; + fbn->tx_stats.packets += stats->packets; +} + static void fbnic_remove_tx_ring(struct fbnic_net *fbn, struct fbnic_ring *txr) { @@ -1401,6 +1442,8 @@ static void fbnic_remove_xdp_ring(struct fbnic_net *fbn, if (!(xdpr->flags & FBNIC_RING_F_STATS)) return; + fbnic_aggregate_ring_xdp_counters(fbn, xdpr); + /* Remove pointer to the Tx ring */ WARN_ON(fbn->tx[xdpr->q_idx] && fbn->tx[xdpr->q_idx] != xdpr); fbn->tx[xdpr->q_idx] = NULL; diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h index 0e92d11115a6..873440ca6a31 100644 --- a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h +++ b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h @@ -90,6 +90,7 @@ struct fbnic_queue_stats { u64 alloc_failed; u64 csum_complete; u64 csum_none; + u64 length_errors; } rx; }; u64 dropped; From 7fedb8f2677e89d6a39f238fe7428eaf7646ba58 Mon Sep 17 00:00:00 2001 From: Mohsin Bashir Date: Wed, 13 Aug 2025 15:13:19 -0700 Subject: [PATCH 0149/1536] eth: fbnic: Report XDP stats via ethtool Add support to collect XDP stats via ethtool API. We record packets and bytes sent, and packets dropped on the XDP_TX path. ethtool -S eth0 | grep xdp | grep -v "0" xdp_tx_queue_13_packets: 2 xdp_tx_queue_13_bytes: 16126 Signed-off-by: Jakub Kicinski Signed-off-by: Mohsin Bashir Link: https://patch.msgid.link/20250813221319.3367670-10-mohsin.bashr@gmail.com Signed-off-by: Paolo Abeni --- .../net/ethernet/meta/fbnic/fbnic_ethtool.c | 50 ++++++++++++++++++- 1 file changed, 49 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c b/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c index 4c231433a63d..a758f510f886 100644 --- a/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c +++ b/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c @@ -112,6 +112,20 @@ static const struct fbnic_stat fbnic_gstrings_hw_q_stats[] = { FBNIC_HW_RXB_DEQUEUE_STATS_LEN * FBNIC_RXB_DEQUEUE_INDICES + \ FBNIC_HW_Q_STATS_LEN * FBNIC_MAX_QUEUES) +#define FBNIC_QUEUE_STAT(name, stat) \ + FBNIC_STAT_FIELDS(fbnic_ring, name, stat) + +static const struct fbnic_stat fbnic_gstrings_xdp_stats[] = { + FBNIC_QUEUE_STAT("xdp_tx_queue_%u_packets", stats.packets), + FBNIC_QUEUE_STAT("xdp_tx_queue_%u_bytes", stats.bytes), + FBNIC_QUEUE_STAT("xdp_tx_queue_%u_dropped", stats.dropped), +}; + +#define FBNIC_XDP_STATS_LEN ARRAY_SIZE(fbnic_gstrings_xdp_stats) + +#define FBNIC_STATS_LEN \ + (FBNIC_HW_STATS_LEN + FBNIC_XDP_STATS_LEN * FBNIC_MAX_XDPQS) + static void fbnic_get_drvinfo(struct net_device *netdev, struct ethtool_drvinfo *drvinfo) { @@ -422,6 +436,16 @@ static void fbnic_get_rxb_dequeue_strings(u8 **data, unsigned int idx) ethtool_sprintf(data, stat->string, idx); } +static void fbnic_get_xdp_queue_strings(u8 **data, unsigned int idx) +{ + const struct fbnic_stat *stat; + int i; + + stat = fbnic_gstrings_xdp_stats; + for (i = 0; i < FBNIC_XDP_STATS_LEN; i++, stat++) + ethtool_sprintf(data, stat->string, idx); +} + static void fbnic_get_strings(struct net_device *dev, u32 sset, u8 *data) { const struct fbnic_stat *stat; @@ -447,6 +471,9 @@ static void fbnic_get_strings(struct net_device *dev, u32 sset, u8 *data) for (i = 0; i < FBNIC_HW_Q_STATS_LEN; i++, stat++) ethtool_sprintf(&data, stat->string, idx); } + + for (i = 0; i < FBNIC_MAX_XDPQS; i++) + fbnic_get_xdp_queue_strings(&data, i); break; } } @@ -464,6 +491,24 @@ static void fbnic_report_hw_stats(const struct fbnic_stat *stat, } } +static void fbnic_get_xdp_queue_stats(struct fbnic_ring *ring, u64 **data) +{ + const struct fbnic_stat *stat; + int i; + + if (!ring) { + *data += FBNIC_XDP_STATS_LEN; + return; + } + + stat = fbnic_gstrings_xdp_stats; + for (i = 0; i < FBNIC_XDP_STATS_LEN; i++, stat++, (*data)++) { + u8 *p = (u8 *)ring + stat->offset; + + **data = *(u64 *)p; + } +} + static void fbnic_get_ethtool_stats(struct net_device *dev, struct ethtool_stats *stats, u64 *data) { @@ -511,13 +556,16 @@ static void fbnic_get_ethtool_stats(struct net_device *dev, FBNIC_HW_Q_STATS_LEN, &data); } spin_unlock(&fbd->hw_stats_lock); + + for (i = 0; i < FBNIC_MAX_XDPQS; i++) + fbnic_get_xdp_queue_stats(fbn->tx[i + FBNIC_MAX_TXQS], &data); } static int fbnic_get_sset_count(struct net_device *dev, int sset) { switch (sset) { case ETH_SS_STATS: - return FBNIC_HW_STATS_LEN; + return FBNIC_STATS_LEN; default: return -EOPNOTSUPP; } From 89934dbf169e358b57c2b394bb51a57d3f259dc0 Mon Sep 17 00:00:00 2001 From: Vineeth Karumanchi Date: Thu, 14 Aug 2025 12:40:57 +0530 Subject: [PATCH 0150/1536] net: macb: Add TAPRIO traffic scheduling support Implement Time-Aware Traffic Scheduling (TAPRIO) offload support for Cadence MACB/GEM ethernet controllers to enable IEEE 802.1Qbv compliant time-sensitive networking (TSN) capabilities. Key Features: 1. Enhanced Scheduled Traffic (ENST) Register Management - Per-queue ENST registers: ENST_START_TIME, ENST_ON_TIME, ENST_OFF_TIME - Centralized control via ENST_CONTROL for gate enable/disable - Infrastructure enhancements: * Extended macb_queue structure with ENST timing control registers * Mapped ENST register offsets into queue management framework * Introduced macb_queue_enst_config for per-entry TC configuration - Timing conversion utility: * enst_ns_to_hw_units(): Converts nanoseconds to hardware units * Timing values are programmed as hardware units based on link speed * Conversion formula: time_bytes = time_ns / divisor * Speed-specific divisors: 1Gbps=8, 100Mbps=80, 10Mbps=800 - Hardware limit utility: * enst_max_hw_interval(): Returns max interval for given speed 2. TAPRIO Configuration via "tc qdisc replace" - macb_taprio_setup_replace(): Configures TAPRIO hardware offload - Parameter validation checks performed: * TC entry limit validation against available hardware queues * Base time non-negativity enforcement * Speed-adaptive timing constraint verification * Cycle time vs. total gate time consistency checks * Single-queue gate mask enforcement per scheduling entry - Programming sequence: * GEM doesn't support changing ENST registers if ENST is enabled, hence disable ENST before programming * Atomic timing register configuration (START_TIME, ON_TIME, OFF_TIME) * Enable queues via ENST_CONTROL 3. TAPRIO Cleanup via "tc qdisc destroy" - macb_taprio_destroy(): Safely removes TAPRIO configuration - Restores default queue behavior - Cleanup steps: * Reset TC state * Disable ENST * Clear timing registers * Ensure atomic updates with locking 4. Traffic Control Offload Infrastructure - macb_setup_taprio(): TAPRIO command dispatcher * Verifies hardware support * Handles runtime suspend state - macb_setup_tc(): TC_SETUP_QDISC_TAPRIO entry point - Supports REPLACE and DESTROY operations Tested on Xilinx Versal platforms with QBV-capable MACB controllers. Signed-off-by: Vineeth Karumanchi Link: https://patch.msgid.link/20250814071058.3062453-2-vineeth.karumanchi@amd.com Signed-off-by: Paolo Abeni --- drivers/net/ethernet/cadence/macb.h | 66 +++++++ drivers/net/ethernet/cadence/macb_main.c | 223 +++++++++++++++++++++++ 2 files changed, 289 insertions(+) diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h index c9a5c8beb2fa..d1a98b45c92c 100644 --- a/drivers/net/ethernet/cadence/macb.h +++ b/drivers/net/ethernet/cadence/macb.h @@ -184,6 +184,13 @@ #define GEM_DCFG8 0x029C /* Design Config 8 */ #define GEM_DCFG10 0x02A4 /* Design Config 10 */ #define GEM_DCFG12 0x02AC /* Design Config 12 */ +#define GEM_ENST_START_TIME_Q0 0x0800 /* ENST Q0 start time */ +#define GEM_ENST_START_TIME_Q1 0x0804 /* ENST Q1 start time */ +#define GEM_ENST_ON_TIME_Q0 0x0820 /* ENST Q0 on time */ +#define GEM_ENST_ON_TIME_Q1 0x0824 /* ENST Q1 on time */ +#define GEM_ENST_OFF_TIME_Q0 0x0840 /* ENST Q0 off time */ +#define GEM_ENST_OFF_TIME_Q1 0x0844 /* ENST Q1 off time */ +#define GEM_ENST_CONTROL 0x0880 /* ENST control register */ #define GEM_USX_CONTROL 0x0A80 /* High speed PCS control register */ #define GEM_USX_STATUS 0x0A88 /* High speed PCS status register */ @@ -221,6 +228,13 @@ #define GEM_IDR(hw_q) (0x0620 + ((hw_q) << 2)) #define GEM_IMR(hw_q) (0x0640 + ((hw_q) << 2)) +#define GEM_ENST_START_TIME(hw_q) (0x0800 + ((hw_q) << 2)) +#define GEM_ENST_ON_TIME(hw_q) (0x0820 + ((hw_q) << 2)) +#define GEM_ENST_OFF_TIME(hw_q) (0x0840 + ((hw_q) << 2)) + +/* Bitfields in ENST_CONTROL */ +#define GEM_ENST_DISABLE_QUEUE_OFFSET 16 + /* Bitfields in NCR */ #define MACB_LB_OFFSET 0 /* reserved */ #define MACB_LB_SIZE 1 @@ -554,6 +568,23 @@ #define GEM_HIGH_SPEED_OFFSET 26 #define GEM_HIGH_SPEED_SIZE 1 +/* Bitfields in ENST_START_TIME_Qx. */ +#define GEM_START_TIME_SEC_OFFSET 30 +#define GEM_START_TIME_SEC_SIZE 2 +#define GEM_START_TIME_NSEC_OFFSET 0 +#define GEM_START_TIME_NSEC_SIZE 30 + +/* Bitfields in ENST_ON_TIME_Qx. */ +#define GEM_ON_TIME_OFFSET 0 +#define GEM_ON_TIME_SIZE 17 + +/* Bitfields in ENST_OFF_TIME_Qx. */ +#define GEM_OFF_TIME_OFFSET 0 +#define GEM_OFF_TIME_SIZE 17 + +/* Hardware ENST timing registers granularity */ +#define ENST_TIME_GRANULARITY_NS 8 + /* Bitfields in USX_CONTROL. */ #define GEM_USX_CTRL_SPEED_OFFSET 14 #define GEM_USX_CTRL_SPEED_SIZE 3 @@ -1219,6 +1250,11 @@ struct macb_queue { unsigned int RBQP; unsigned int RBQPH; + /* ENST register offsets for this queue */ + unsigned int ENST_START_TIME; + unsigned int ENST_ON_TIME; + unsigned int ENST_OFF_TIME; + /* Lock to protect tx_head and tx_tail */ spinlock_t tx_ptr_lock; unsigned int tx_head, tx_tail; @@ -1397,6 +1433,19 @@ static inline bool gem_has_ptp(struct macb *bp) return IS_ENABLED(CONFIG_MACB_USE_HWSTAMP) && (bp->caps & MACB_CAPS_GEM_HAS_PTP); } +/* ENST Helper functions */ +static inline u64 enst_ns_to_hw_units(size_t ns, u32 speed_mbps) +{ + return DIV_ROUND_UP((ns) * (speed_mbps), + (ENST_TIME_GRANULARITY_NS * 1000)); +} + +static inline u64 enst_max_hw_interval(u32 speed_mbps) +{ + return DIV_ROUND_UP(GENMASK(GEM_ON_TIME_SIZE - 1, 0) * + ENST_TIME_GRANULARITY_NS * 1000, (speed_mbps)); +} + /** * struct macb_platform_data - platform data for MACB Ethernet used for PCI registration * @pclk: platform clock @@ -1407,4 +1456,21 @@ struct macb_platform_data { struct clk *hclk; }; +/** + * struct macb_queue_enst_config - Configuration for Enhanced Scheduled Traffic + * @start_time_mask: Bitmask representing the start time for the queue + * @on_time_bytes: "on" time nsec expressed in bytes + * @off_time_bytes: "off" time nsec expressed in bytes + * @queue_id: Identifier for the queue + * + * This structure holds the configuration parameters for an ENST queue, + * used to control time-based transmission scheduling in the MACB driver. + */ +struct macb_queue_enst_config { + u32 start_time_mask; + u32 on_time_bytes; + u32 off_time_bytes; + u8 queue_id; +}; + #endif /* _MACB_H */ diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c index ce55a1f59b50..a42304eb2bf8 100644 --- a/drivers/net/ethernet/cadence/macb_main.c +++ b/drivers/net/ethernet/cadence/macb_main.c @@ -36,6 +36,7 @@ #include #include #include +#include #include "macb.h" /* This structure is only used for MACB on SiFive FU540 devices */ @@ -4084,6 +4085,223 @@ static void macb_restore_features(struct macb *bp) macb_set_rxflow_feature(bp, features); } +static int macb_taprio_setup_replace(struct net_device *ndev, + struct tc_taprio_qopt_offload *conf) +{ + u64 total_on_time = 0, start_time_sec = 0, start_time = conf->base_time; + u32 configured_queues = 0, speed = 0, start_time_nsec; + struct macb_queue_enst_config *enst_queue; + struct tc_taprio_sched_entry *entry; + struct macb *bp = netdev_priv(ndev); + struct ethtool_link_ksettings kset; + struct macb_queue *queue; + size_t i; + int err; + + if (conf->num_entries > bp->num_queues) { + netdev_err(ndev, "Too many TAPRIO entries: %zu > %d queues\n", + conf->num_entries, bp->num_queues); + return -EINVAL; + } + + if (start_time < 0) { + netdev_err(ndev, "Invalid base_time: must be 0 or positive, got %lld\n", + conf->base_time); + return -ERANGE; + } + + /* Get the current link speed */ + err = phylink_ethtool_ksettings_get(bp->phylink, &kset); + if (unlikely(err)) { + netdev_err(ndev, "Failed to get link settings: %d\n", err); + return err; + } + + speed = kset.base.speed; + if (unlikely(speed <= 0)) { + netdev_err(ndev, "Invalid speed: %d\n", speed); + return -EINVAL; + } + + enst_queue = kcalloc(conf->num_entries, sizeof(*enst_queue), GFP_KERNEL); + if (unlikely(!enst_queue)) + return -ENOMEM; + + /* Pre-validate all entries before making any hardware changes */ + for (i = 0; i < conf->num_entries; i++) { + entry = &conf->entries[i]; + + if (entry->command != TC_TAPRIO_CMD_SET_GATES) { + netdev_err(ndev, "Entry %zu: unsupported command %d\n", + i, entry->command); + err = -EOPNOTSUPP; + goto cleanup; + } + + /* Validate gate_mask: must be nonzero, single queue, and within range */ + if (!is_power_of_2(entry->gate_mask)) { + netdev_err(ndev, "Entry %zu: gate_mask 0x%x is not a power of 2 (only one queue per entry allowed)\n", + i, entry->gate_mask); + err = -EINVAL; + goto cleanup; + } + + /* gate_mask must not select queues outside the valid queue_mask */ + if (entry->gate_mask & ~bp->queue_mask) { + netdev_err(ndev, "Entry %zu: gate_mask 0x%x exceeds queue range (max_queues=%d)\n", + i, entry->gate_mask, bp->num_queues); + err = -EINVAL; + goto cleanup; + } + + /* Check for start time limits */ + start_time_sec = start_time; + start_time_nsec = do_div(start_time_sec, NSEC_PER_SEC); + if (start_time_sec > GENMASK(GEM_START_TIME_SEC_SIZE - 1, 0)) { + netdev_err(ndev, "Entry %zu: Start time %llu s exceeds hardware limit\n", + i, start_time_sec); + err = -ERANGE; + goto cleanup; + } + + /* Check for on time limit */ + if (entry->interval > enst_max_hw_interval(speed)) { + netdev_err(ndev, "Entry %zu: interval %u ns exceeds hardware limit %llu ns\n", + i, entry->interval, enst_max_hw_interval(speed)); + err = -ERANGE; + goto cleanup; + } + + /* Check for off time limit*/ + if ((conf->cycle_time - entry->interval) > enst_max_hw_interval(speed)) { + netdev_err(ndev, "Entry %zu: off_time %llu ns exceeds hardware limit %llu ns\n", + i, conf->cycle_time - entry->interval, + enst_max_hw_interval(speed)); + err = -ERANGE; + goto cleanup; + } + + enst_queue[i].queue_id = order_base_2(entry->gate_mask); + enst_queue[i].start_time_mask = + (start_time_sec << GEM_START_TIME_SEC_OFFSET) | + start_time_nsec; + enst_queue[i].on_time_bytes = + enst_ns_to_hw_units(entry->interval, speed); + enst_queue[i].off_time_bytes = + enst_ns_to_hw_units(conf->cycle_time - entry->interval, speed); + + configured_queues |= entry->gate_mask; + total_on_time += entry->interval; + start_time += entry->interval; + } + + /* Check total interval doesn't exceed cycle time */ + if (total_on_time > conf->cycle_time) { + netdev_err(ndev, "Total ON %llu ns exceeds cycle time %llu ns\n", + total_on_time, conf->cycle_time); + err = -EINVAL; + goto cleanup; + } + + netdev_dbg(ndev, "TAPRIO setup: %zu entries, base_time=%lld ns, cycle_time=%llu ns\n", + conf->num_entries, conf->base_time, conf->cycle_time); + + /* All validations passed - proceed with hardware configuration */ + scoped_guard(spinlock_irqsave, &bp->lock) { + /* Disable ENST queues if running before configuring */ + gem_writel(bp, ENST_CONTROL, + bp->queue_mask << GEM_ENST_DISABLE_QUEUE_OFFSET); + + for (i = 0; i < conf->num_entries; i++) { + queue = &bp->queues[enst_queue[i].queue_id]; + /* Configure queue timing registers */ + queue_writel(queue, ENST_START_TIME, + enst_queue[i].start_time_mask); + queue_writel(queue, ENST_ON_TIME, + enst_queue[i].on_time_bytes); + queue_writel(queue, ENST_OFF_TIME, + enst_queue[i].off_time_bytes); + } + + /* Enable ENST for all configured queues in one write */ + gem_writel(bp, ENST_CONTROL, configured_queues); + } + + netdev_info(ndev, "TAPRIO configuration completed successfully: %zu entries, %d queues configured\n", + conf->num_entries, hweight32(configured_queues)); + +cleanup: + kfree(enst_queue); + return err; +} + +static void macb_taprio_destroy(struct net_device *ndev) +{ + struct macb *bp = netdev_priv(ndev); + struct macb_queue *queue; + u32 enst_disable_mask; + unsigned int q; + + netdev_reset_tc(ndev); + enst_disable_mask = bp->queue_mask << GEM_ENST_DISABLE_QUEUE_OFFSET; + + scoped_guard(spinlock_irqsave, &bp->lock) { + /* Single disable command for all queues */ + gem_writel(bp, ENST_CONTROL, enst_disable_mask); + + /* Clear all queue ENST registers in batch */ + for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) { + queue_writel(queue, ENST_START_TIME, 0); + queue_writel(queue, ENST_ON_TIME, 0); + queue_writel(queue, ENST_OFF_TIME, 0); + } + } + netdev_info(ndev, "TAPRIO destroy: All gates disabled\n"); +} + +static int macb_setup_taprio(struct net_device *ndev, + struct tc_taprio_qopt_offload *taprio) +{ + struct macb *bp = netdev_priv(ndev); + int err = 0; + + if (unlikely(!(ndev->hw_features & NETIF_F_HW_TC))) + return -EOPNOTSUPP; + + /* Check if Device is in runtime suspend */ + if (unlikely(pm_runtime_suspended(&bp->pdev->dev))) { + netdev_err(ndev, "Device is in runtime suspend\n"); + return -EOPNOTSUPP; + } + + switch (taprio->cmd) { + case TAPRIO_CMD_REPLACE: + err = macb_taprio_setup_replace(ndev, taprio); + break; + case TAPRIO_CMD_DESTROY: + macb_taprio_destroy(ndev); + break; + default: + err = -EOPNOTSUPP; + } + + return err; +} + +static int macb_setup_tc(struct net_device *dev, enum tc_setup_type type, + void *type_data) +{ + if (!dev || !type_data) + return -EINVAL; + + switch (type) { + case TC_SETUP_QDISC_TAPRIO: + return macb_setup_taprio(dev, type_data); + default: + return -EOPNOTSUPP; + } +} + static const struct net_device_ops macb_netdev_ops = { .ndo_open = macb_open, .ndo_stop = macb_close, @@ -4101,6 +4319,7 @@ static const struct net_device_ops macb_netdev_ops = { .ndo_features_check = macb_features_check, .ndo_hwtstamp_set = macb_hwtstamp_set, .ndo_hwtstamp_get = macb_hwtstamp_get, + .ndo_setup_tc = macb_setup_tc, }; /* Configure peripheral capabilities according to device tree @@ -4327,6 +4546,10 @@ static int macb_init(struct platform_device *pdev) #endif } + queue->ENST_START_TIME = GEM_ENST_START_TIME(hw_q); + queue->ENST_ON_TIME = GEM_ENST_ON_TIME(hw_q); + queue->ENST_OFF_TIME = GEM_ENST_OFF_TIME(hw_q); + /* get irq: here we use the linux queue index, not the hardware * queue index. the queue irq definitions in the device tree * must remove the optional gaps that could exist in the From d739ce4bebf4c708020a900548e36d005236388f Mon Sep 17 00:00:00 2001 From: Vineeth Karumanchi Date: Thu, 14 Aug 2025 12:40:58 +0530 Subject: [PATCH 0151/1536] net: macb: Add capability-based QBV detection and Versal support The 'exclude_qbv' bit in the designcfg_debug1 register varies across MACB/GEM IP revisions, making direct probing unreliable for detecting QBV support. This patch introduces a capability-based approach for consistent QBV feature identification across the IP family. Platform support updates: - Establish foundation for QBV detection in TAPRIO implementation - Enable MACB_CAPS_QBV for Xilinx Versal platform configuration - Fix capability line wrapping, ensuring code stays within 80 columns Signed-off-by: Vineeth Karumanchi Link: https://patch.msgid.link/20250814071058.3062453-3-vineeth.karumanchi@amd.com Signed-off-by: Paolo Abeni --- drivers/net/ethernet/cadence/macb.h | 1 + drivers/net/ethernet/cadence/macb_main.c | 9 +++++++-- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h index d1a98b45c92c..904954610611 100644 --- a/drivers/net/ethernet/cadence/macb.h +++ b/drivers/net/ethernet/cadence/macb.h @@ -770,6 +770,7 @@ #define MACB_CAPS_MIIONRGMII 0x00000200 #define MACB_CAPS_NEED_TSUCLK 0x00000400 #define MACB_CAPS_QUEUE_DISABLE 0x00000800 +#define MACB_CAPS_QBV 0x00001000 #define MACB_CAPS_PCS 0x01000000 #define MACB_CAPS_HIGH_SPEED 0x02000000 #define MACB_CAPS_CLK_HW_CHG 0x04000000 diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c index a42304eb2bf8..124ef0b0bcad 100644 --- a/drivers/net/ethernet/cadence/macb_main.c +++ b/drivers/net/ethernet/cadence/macb_main.c @@ -4602,6 +4602,10 @@ static int macb_init(struct platform_device *pdev) dev->hw_features |= NETIF_F_HW_CSUM | NETIF_F_RXCSUM; if (bp->caps & MACB_CAPS_SG_DISABLED) dev->hw_features &= ~NETIF_F_SG; + /* Enable HW_TC if hardware supports QBV */ + if (bp->caps & MACB_CAPS_QBV) + dev->hw_features |= NETIF_F_HW_TC; + dev->features = dev->hw_features; /* Check RX Flow Filters support. @@ -5354,8 +5358,9 @@ static const struct macb_config sama7g5_emac_config = { static const struct macb_config versal_config = { .caps = MACB_CAPS_GIGABIT_MODE_AVAILABLE | MACB_CAPS_JUMBO | - MACB_CAPS_GEM_HAS_PTP | MACB_CAPS_BD_RD_PREFETCH | MACB_CAPS_NEED_TSUCLK | - MACB_CAPS_QUEUE_DISABLE, + MACB_CAPS_GEM_HAS_PTP | MACB_CAPS_BD_RD_PREFETCH | + MACB_CAPS_NEED_TSUCLK | MACB_CAPS_QUEUE_DISABLE | + MACB_CAPS_QBV, .dma_burst_length = 16, .clk_init = macb_clk_init, .init = init_reset_optional, From a8bdd935d1ddb7186358fb60ffe84253e85340c8 Mon Sep 17 00:00:00 2001 From: Lorenzo Bianconi Date: Thu, 14 Aug 2025 09:51:16 +0200 Subject: [PATCH 0152/1536] net: airoha: Add wlan flowtable TX offload Introduce support to offload the traffic received on the ethernet NIC and forwarded to the wireless one using HW Packet Processor Engine (PPE) capabilities. Signed-off-by: Lorenzo Bianconi Link: https://patch.msgid.link/20250814-airoha-en7581-wlan-tx-offload-v1-1-72e0a312003e@kernel.org Signed-off-by: Paolo Abeni --- drivers/net/ethernet/airoha/airoha_eth.h | 11 +++ drivers/net/ethernet/airoha/airoha_ppe.c | 95 +++++++++++++++++------- 2 files changed, 81 insertions(+), 25 deletions(-) diff --git a/drivers/net/ethernet/airoha/airoha_eth.h b/drivers/net/ethernet/airoha/airoha_eth.h index a970b789cf23..9f721e2b972f 100644 --- a/drivers/net/ethernet/airoha/airoha_eth.h +++ b/drivers/net/ethernet/airoha/airoha_eth.h @@ -252,6 +252,10 @@ enum { #define AIROHA_FOE_MAC_SMAC_ID GENMASK(20, 16) #define AIROHA_FOE_MAC_PPPOE_ID GENMASK(15, 0) +#define AIROHA_FOE_MAC_WDMA_QOS GENMASK(15, 12) +#define AIROHA_FOE_MAC_WDMA_BAND BIT(11) +#define AIROHA_FOE_MAC_WDMA_WCID GENMASK(10, 0) + struct airoha_foe_mac_info_common { u16 vlan1; u16 etype; @@ -481,6 +485,13 @@ struct airoha_flow_table_entry { unsigned long cookie; }; +struct airoha_wdma_info { + u8 idx; + u8 queue; + u16 wcid; + u8 bss; +}; + /* RX queue to IRQ mapping: BIT(q) in IRQ(n) */ #define RX_IRQ0_BANK_PIN_MASK 0x839f #define RX_IRQ1_BANK_PIN_MASK 0x7fe00000 diff --git a/drivers/net/ethernet/airoha/airoha_ppe.c b/drivers/net/ethernet/airoha/airoha_ppe.c index 82163392332c..2bf1c584ba7b 100644 --- a/drivers/net/ethernet/airoha/airoha_ppe.c +++ b/drivers/net/ethernet/airoha/airoha_ppe.c @@ -190,6 +190,31 @@ static int airoha_ppe_flow_mangle_ipv4(const struct flow_action_entry *act, return 0; } +static int airoha_ppe_get_wdma_info(struct net_device *dev, const u8 *addr, + struct airoha_wdma_info *info) +{ + struct net_device_path_stack stack; + struct net_device_path *path; + int err; + + if (!dev) + return -ENODEV; + + err = dev_fill_forward_path(dev, addr, &stack); + if (err) + return err; + + path = &stack.path[stack.num_paths - 1]; + if (path->type != DEV_PATH_MTK_WDMA) + return -1; + + info->idx = path->mtk_wdma.wdma_idx; + info->bss = path->mtk_wdma.bss; + info->wcid = path->mtk_wdma.wcid; + + return 0; +} + static int airoha_get_dsa_port(struct net_device **dev) { #if IS_ENABLED(CONFIG_NET_DSA) @@ -220,9 +245,9 @@ static int airoha_ppe_foe_entry_prepare(struct airoha_eth *eth, struct airoha_flow_data *data, int l4proto) { - int dsa_port = airoha_get_dsa_port(&dev); + u32 qdata = FIELD_PREP(AIROHA_FOE_SHAPER_ID, 0x7f), ports_pad, val; + int wlan_etype = -EINVAL, dsa_port = airoha_get_dsa_port(&dev); struct airoha_foe_mac_info_common *l2; - u32 qdata, ports_pad, val; u8 smac_id = 0xf; memset(hwe, 0, sizeof(*hwe)); @@ -236,31 +261,47 @@ static int airoha_ppe_foe_entry_prepare(struct airoha_eth *eth, AIROHA_FOE_IB1_BIND_TTL; hwe->ib1 = val; - val = FIELD_PREP(AIROHA_FOE_IB2_PORT_AG, 0x1f) | - AIROHA_FOE_IB2_PSE_QOS; - if (dsa_port >= 0) - val |= FIELD_PREP(AIROHA_FOE_IB2_NBQ, dsa_port); - + val = FIELD_PREP(AIROHA_FOE_IB2_PORT_AG, 0x1f); if (dev) { - struct airoha_gdm_port *port = netdev_priv(dev); - u8 pse_port; + struct airoha_wdma_info info = {}; - if (!airoha_is_valid_gdm_port(eth, port)) - return -EINVAL; + if (!airoha_ppe_get_wdma_info(dev, data->eth.h_dest, &info)) { + val |= FIELD_PREP(AIROHA_FOE_IB2_NBQ, info.idx) | + FIELD_PREP(AIROHA_FOE_IB2_PSE_PORT, + FE_PSE_PORT_CDM4); + qdata |= FIELD_PREP(AIROHA_FOE_ACTDP, info.bss); + wlan_etype = FIELD_PREP(AIROHA_FOE_MAC_WDMA_BAND, + info.idx) | + FIELD_PREP(AIROHA_FOE_MAC_WDMA_WCID, + info.wcid); + } else { + struct airoha_gdm_port *port = netdev_priv(dev); + u8 pse_port; - if (dsa_port >= 0) - pse_port = port->id == 4 ? FE_PSE_PORT_GDM4 : port->id; - else - pse_port = 2; /* uplink relies on GDM2 loopback */ - val |= FIELD_PREP(AIROHA_FOE_IB2_PSE_PORT, pse_port); + if (!airoha_is_valid_gdm_port(eth, port)) + return -EINVAL; - /* For downlink traffic consume SRAM memory for hw forwarding - * descriptors queue. - */ - if (airhoa_is_lan_gdm_port(port)) - val |= AIROHA_FOE_IB2_FAST_PATH; + if (dsa_port >= 0) + pse_port = port->id == 4 ? FE_PSE_PORT_GDM4 + : port->id; + else + pse_port = 2; /* uplink relies on GDM2 + * loopback + */ - smac_id = port->id; + val |= FIELD_PREP(AIROHA_FOE_IB2_PSE_PORT, pse_port) | + AIROHA_FOE_IB2_PSE_QOS; + /* For downlink traffic consume SRAM memory for hw + * forwarding descriptors queue. + */ + if (airhoa_is_lan_gdm_port(port)) + val |= AIROHA_FOE_IB2_FAST_PATH; + if (dsa_port >= 0) + val |= FIELD_PREP(AIROHA_FOE_IB2_NBQ, + dsa_port); + + smac_id = port->id; + } } if (is_multicast_ether_addr(data->eth.h_dest)) @@ -272,7 +313,6 @@ static int airoha_ppe_foe_entry_prepare(struct airoha_eth *eth, if (type == PPE_PKT_TYPE_IPV6_ROUTE_3T) hwe->ipv6.ports = ports_pad; - qdata = FIELD_PREP(AIROHA_FOE_SHAPER_ID, 0x7f); if (type == PPE_PKT_TYPE_BRIDGE) { airoha_ppe_foe_set_bridge_addrs(&hwe->bridge, &data->eth); hwe->bridge.data = qdata; @@ -313,7 +353,9 @@ static int airoha_ppe_foe_entry_prepare(struct airoha_eth *eth, l2->vlan2 = data->vlan.hdr[1].id; } - if (dsa_port >= 0) { + if (wlan_etype >= 0) { + l2->etype = wlan_etype; + } else if (dsa_port >= 0) { l2->etype = BIT(dsa_port); l2->etype |= !data->vlan.num ? BIT(15) : 0; } else if (data->pppoe.num) { @@ -490,6 +532,10 @@ static void airoha_ppe_foe_flow_stats_update(struct airoha_ppe *ppe, meter = &hwe->ipv4.l2.meter; } + pse_port = FIELD_GET(AIROHA_FOE_IB2_PSE_PORT, *ib2); + if (pse_port == FE_PSE_PORT_CDM4) + return; + airoha_ppe_foe_flow_stat_entry_reset(ppe, npu, index); val = FIELD_GET(AIROHA_FOE_CHANNEL | AIROHA_FOE_QID, *data); @@ -500,7 +546,6 @@ static void airoha_ppe_foe_flow_stats_update(struct airoha_ppe *ppe, AIROHA_FOE_IB2_PSE_QOS | AIROHA_FOE_IB2_FAST_PATH); *meter |= FIELD_PREP(AIROHA_FOE_TUNNEL_MTU, val); - pse_port = FIELD_GET(AIROHA_FOE_IB2_PSE_PORT, *ib2); nbq = pse_port == 1 ? 6 : 5; *ib2 &= ~(AIROHA_FOE_IB2_NBQ | AIROHA_FOE_IB2_PSE_PORT | AIROHA_FOE_IB2_PSE_QOS); From 730ff06d3f5cc2ce0348414b78c10528b767d4a3 Mon Sep 17 00:00:00 2001 From: Dipayaan Roy Date: Thu, 14 Aug 2025 07:04:10 -0700 Subject: [PATCH 0153/1536] net: mana: Use page pool fragments for RX buffers instead of full pages to improve memory efficiency. This patch enhances RX buffer handling in the mana driver by allocating pages from a page pool and slicing them into MTU-sized fragments, rather than dedicating a full page per packet. This approach is especially beneficial on systems with large base page sizes like 64KB. Key improvements: - Proper integration of page pool for RX buffer allocations. - MTU-sized buffer slicing to improve memory utilization. - Reduce overall per Rx queue memory footprint. - Automatic fallback to full-page buffers when: * Jumbo frames are enabled (MTU > PAGE_SIZE / 2). * The XDP path is active, to avoid complexities with fragment reuse. Testing on VMs with 64KB pages shows around 200% throughput improvement. Memory efficiency is significantly improved due to reduced wastage in page allocations. Example: We are now able to fit 35 rx buffers in a single 64kb page for MTU size of 1500, instead of 1 rx buffer per page previously. Tested: - iperf3, iperf2, and nttcp benchmarks. - Jumbo frames with MTU 9000. - Native XDP programs (XDP_PASS, XDP_DROP, XDP_TX, XDP_REDIRECT) for testing the XDP path in driver. - Memory leak detection (kmemleak). - Driver load/unload, reboot, and stress scenarios. Reviewed-by: Jacob Keller Reviewed-by: Saurabh Sengar Reviewed-by: Haiyang Zhang Signed-off-by: Dipayaan Roy Link: https://patch.msgid.link/20250814140410.GA22089@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net Signed-off-by: Paolo Abeni --- .../net/ethernet/microsoft/mana/mana_bpf.c | 46 +++++- drivers/net/ethernet/microsoft/mana/mana_en.c | 153 ++++++++++++------ include/net/mana/mana.h | 4 + 3 files changed, 151 insertions(+), 52 deletions(-) diff --git a/drivers/net/ethernet/microsoft/mana/mana_bpf.c b/drivers/net/ethernet/microsoft/mana/mana_bpf.c index d30721d4516f..7697c9b52ed3 100644 --- a/drivers/net/ethernet/microsoft/mana/mana_bpf.c +++ b/drivers/net/ethernet/microsoft/mana/mana_bpf.c @@ -174,6 +174,7 @@ static int mana_xdp_set(struct net_device *ndev, struct bpf_prog *prog, struct mana_port_context *apc = netdev_priv(ndev); struct bpf_prog *old_prog; struct gdma_context *gc; + int err; gc = apc->ac->gdma_dev->gdma_context; @@ -195,18 +196,57 @@ static int mana_xdp_set(struct net_device *ndev, struct bpf_prog *prog, */ apc->bpf_prog = prog; + if (apc->port_is_up) { + /* Re-create rxq's after xdp prog was loaded or unloaded. + * Ex: re create rxq's to switch from full pages to smaller + * size page fragments when xdp prog is unloaded and + * vice-versa. + */ + + /* Pre-allocate buffers to prevent failure in mana_attach */ + err = mana_pre_alloc_rxbufs(apc, ndev->mtu, apc->num_queues); + if (err) { + NL_SET_ERR_MSG_MOD(extack, + "XDP: Insufficient memory for tx/rx re-config"); + return err; + } + + err = mana_detach(ndev, false); + if (err) { + netdev_err(ndev, + "mana_detach failed at xdp set: %d\n", err); + NL_SET_ERR_MSG_MOD(extack, + "XDP: Re-config failed at detach"); + goto err_dealloc_rxbuffs; + } + + err = mana_attach(ndev); + if (err) { + netdev_err(ndev, + "mana_attach failed at xdp set: %d\n", err); + NL_SET_ERR_MSG_MOD(extack, + "XDP: Re-config failed at attach"); + goto err_dealloc_rxbuffs; + } + + mana_chn_setxdp(apc, prog); + mana_pre_dealloc_rxbufs(apc); + } + if (old_prog) bpf_prog_put(old_prog); - if (apc->port_is_up) - mana_chn_setxdp(apc, prog); - if (prog) ndev->max_mtu = MANA_XDP_MTU_MAX; else ndev->max_mtu = gc->adapter_mtu - ETH_HLEN; return 0; + +err_dealloc_rxbuffs: + apc->bpf_prog = old_prog; + mana_pre_dealloc_rxbufs(apc); + return err; } int mana_bpf(struct net_device *ndev, struct netdev_bpf *bpf) diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c index 550843e2164b..f4fc86f20213 100644 --- a/drivers/net/ethernet/microsoft/mana/mana_en.c +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c @@ -57,6 +57,15 @@ static bool mana_en_need_log(struct mana_port_context *apc, int err) return true; } +static void mana_put_rx_page(struct mana_rxq *rxq, struct page *page, + bool from_pool) +{ + if (from_pool) + page_pool_put_full_page(rxq->page_pool, page, false); + else + put_page(page); +} + /* Microsoft Azure Network Adapter (MANA) functions */ static int mana_open(struct net_device *ndev) @@ -630,21 +639,40 @@ static void *mana_get_rxbuf_pre(struct mana_rxq *rxq, dma_addr_t *da) } /* Get RX buffer's data size, alloc size, XDP headroom based on MTU */ -static void mana_get_rxbuf_cfg(int mtu, u32 *datasize, u32 *alloc_size, - u32 *headroom) +static void mana_get_rxbuf_cfg(struct mana_port_context *apc, + int mtu, u32 *datasize, u32 *alloc_size, + u32 *headroom, u32 *frag_count) { - if (mtu > MANA_XDP_MTU_MAX) - *headroom = 0; /* no support for XDP */ - else - *headroom = XDP_PACKET_HEADROOM; - - *alloc_size = SKB_DATA_ALIGN(mtu + MANA_RXBUF_PAD + *headroom); - - /* Using page pool in this case, so alloc_size is PAGE_SIZE */ - if (*alloc_size < PAGE_SIZE) - *alloc_size = PAGE_SIZE; + u32 len, buf_size; + /* Calculate datasize first (consistent across all cases) */ *datasize = mtu + ETH_HLEN; + + /* For xdp and jumbo frames make sure only one packet fits per page */ + if (mtu + MANA_RXBUF_PAD > PAGE_SIZE / 2 || mana_xdp_get(apc)) { + if (mana_xdp_get(apc)) { + *headroom = XDP_PACKET_HEADROOM; + *alloc_size = PAGE_SIZE; + } else { + *headroom = 0; /* no support for XDP */ + *alloc_size = SKB_DATA_ALIGN(mtu + MANA_RXBUF_PAD + + *headroom); + } + + *frag_count = 1; + return; + } + + /* Standard MTU case - optimize for multiple packets per page */ + *headroom = 0; + + /* Calculate base buffer size needed */ + len = SKB_DATA_ALIGN(mtu + MANA_RXBUF_PAD + *headroom); + buf_size = ALIGN(len, MANA_RX_FRAG_ALIGNMENT); + + /* Calculate how many packets can fit in a page */ + *frag_count = PAGE_SIZE / buf_size; + *alloc_size = buf_size; } int mana_pre_alloc_rxbufs(struct mana_port_context *mpc, int new_mtu, int num_queues) @@ -656,8 +684,9 @@ int mana_pre_alloc_rxbufs(struct mana_port_context *mpc, int new_mtu, int num_qu void *va; int i; - mana_get_rxbuf_cfg(new_mtu, &mpc->rxbpre_datasize, - &mpc->rxbpre_alloc_size, &mpc->rxbpre_headroom); + mana_get_rxbuf_cfg(mpc, new_mtu, &mpc->rxbpre_datasize, + &mpc->rxbpre_alloc_size, &mpc->rxbpre_headroom, + &mpc->rxbpre_frag_count); dev = mpc->ac->gdma_dev->gdma_context->dev; @@ -1842,8 +1871,11 @@ drop_xdp: drop: if (from_pool) { - page_pool_recycle_direct(rxq->page_pool, - virt_to_head_page(buf_va)); + if (rxq->frag_count == 1) + page_pool_recycle_direct(rxq->page_pool, + virt_to_head_page(buf_va)); + else + page_pool_free_va(rxq->page_pool, buf_va, true); } else { WARN_ON_ONCE(rxq->xdp_save_va); /* Save for reuse */ @@ -1859,33 +1891,46 @@ static void *mana_get_rxfrag(struct mana_rxq *rxq, struct device *dev, dma_addr_t *da, bool *from_pool) { struct page *page; + u32 offset; void *va; - *from_pool = false; - /* Reuse XDP dropped page if available */ - if (rxq->xdp_save_va) { - va = rxq->xdp_save_va; - rxq->xdp_save_va = NULL; - } else { - page = page_pool_dev_alloc_pages(rxq->page_pool); - if (!page) + /* Don't use fragments for jumbo frames or XDP where it's 1 fragment + * per page. + */ + if (rxq->frag_count == 1) { + /* Reuse XDP dropped page if available */ + if (rxq->xdp_save_va) { + va = rxq->xdp_save_va; + page = virt_to_head_page(va); + rxq->xdp_save_va = NULL; + } else { + page = page_pool_dev_alloc_pages(rxq->page_pool); + if (!page) + return NULL; + + *from_pool = true; + va = page_to_virt(page); + } + + *da = dma_map_single(dev, va + rxq->headroom, rxq->datasize, + DMA_FROM_DEVICE); + if (dma_mapping_error(dev, *da)) { + mana_put_rx_page(rxq, page, *from_pool); return NULL; + } - *from_pool = true; - va = page_to_virt(page); + return va; } - *da = dma_map_single(dev, va + rxq->headroom, rxq->datasize, - DMA_FROM_DEVICE); - if (dma_mapping_error(dev, *da)) { - if (*from_pool) - page_pool_put_full_page(rxq->page_pool, page, false); - else - put_page(virt_to_head_page(va)); - + page = page_pool_dev_alloc_frag(rxq->page_pool, &offset, + rxq->alloc_size); + if (!page) return NULL; - } + + va = page_to_virt(page) + offset; + *da = page_pool_get_dma_addr(page) + offset + rxq->headroom; + *from_pool = true; return va; } @@ -1902,9 +1947,9 @@ static void mana_refill_rx_oob(struct device *dev, struct mana_rxq *rxq, va = mana_get_rxfrag(rxq, dev, &da, &from_pool); if (!va) return; - - dma_unmap_single(dev, rxoob->sgl[0].address, rxq->datasize, - DMA_FROM_DEVICE); + if (!rxoob->from_pool || rxq->frag_count == 1) + dma_unmap_single(dev, rxoob->sgl[0].address, rxq->datasize, + DMA_FROM_DEVICE); *old_buf = rxoob->buf_va; *old_fp = rxoob->from_pool; @@ -2315,15 +2360,15 @@ static void mana_destroy_rxq(struct mana_port_context *apc, if (!rx_oob->buf_va) continue; - dma_unmap_single(dev, rx_oob->sgl[0].address, - rx_oob->sgl[0].size, DMA_FROM_DEVICE); - page = virt_to_head_page(rx_oob->buf_va); - if (rx_oob->from_pool) - page_pool_put_full_page(rxq->page_pool, page, false); - else - put_page(page); + if (rxq->frag_count == 1 || !rx_oob->from_pool) { + dma_unmap_single(dev, rx_oob->sgl[0].address, + rx_oob->sgl[0].size, DMA_FROM_DEVICE); + mana_put_rx_page(rxq, page, rx_oob->from_pool); + } else { + page_pool_free_va(rxq->page_pool, rx_oob->buf_va, true); + } rx_oob->buf_va = NULL; } @@ -2429,11 +2474,22 @@ static int mana_create_page_pool(struct mana_rxq *rxq, struct gdma_context *gc) struct page_pool_params pprm = {}; int ret; - pprm.pool_size = mpc->rx_queue_size; + pprm.pool_size = mpc->rx_queue_size / rxq->frag_count + 1; pprm.nid = gc->numa_node; pprm.napi = &rxq->rx_cq.napi; pprm.netdev = rxq->ndev; pprm.order = get_order(rxq->alloc_size); + pprm.queue_idx = rxq->rxq_idx; + pprm.dev = gc->dev; + + /* Let the page pool do the dma map when page sharing with multiple + * fragments enabled for rx buffers. + */ + if (rxq->frag_count > 1) { + pprm.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV; + pprm.max_len = PAGE_SIZE; + pprm.dma_dir = DMA_FROM_DEVICE; + } rxq->page_pool = page_pool_create(&pprm); @@ -2472,9 +2528,8 @@ static struct mana_rxq *mana_create_rxq(struct mana_port_context *apc, rxq->rxq_idx = rxq_idx; rxq->rxobj = INVALID_MANA_HANDLE; - mana_get_rxbuf_cfg(ndev->mtu, &rxq->datasize, &rxq->alloc_size, - &rxq->headroom); - + mana_get_rxbuf_cfg(apc, ndev->mtu, &rxq->datasize, &rxq->alloc_size, + &rxq->headroom, &rxq->frag_count); /* Create page pool for RX queue */ err = mana_create_page_pool(rxq, gc); if (err) { diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h index e1030a7d2daa..0921485565c0 100644 --- a/include/net/mana/mana.h +++ b/include/net/mana/mana.h @@ -65,6 +65,8 @@ enum TRI_STATE { #define MANA_STATS_RX_COUNT 5 #define MANA_STATS_TX_COUNT 11 +#define MANA_RX_FRAG_ALIGNMENT 64 + struct mana_stats_rx { u64 packets; u64 bytes; @@ -328,6 +330,7 @@ struct mana_rxq { u32 datasize; u32 alloc_size; u32 headroom; + u32 frag_count; mana_handle_t rxobj; @@ -510,6 +513,7 @@ struct mana_port_context { u32 rxbpre_datasize; u32 rxbpre_alloc_size; u32 rxbpre_headroom; + u32 rxbpre_frag_count; struct bpf_prog *bpf_prog; From 0283b8f134e499a51bb972403eb47cda6b960f7c Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Thu, 14 Aug 2025 18:33:14 -0700 Subject: [PATCH 0154/1536] selftests: drv-net: test the napi init state Test that threaded state (in the persistent NAPI config) gets updated even when NAPI with given ID is not allocated at the time. This test is validating commit ccba9f6baa90 ("net: update NAPI threaded config even for disabled NAPIs"). Signed-off-by: Jakub Kicinski Reviewed-by: Joe Damato Link: https://patch.msgid.link/20250815013314.2237512-1-kuba@kernel.org Signed-off-by: Paolo Abeni --- .../selftests/drivers/net/napi_threaded.py | 31 ++++++++++++++++++- 1 file changed, 30 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/drivers/net/napi_threaded.py b/tools/testing/selftests/drivers/net/napi_threaded.py index 9699a100a87d..ed66efa481b0 100755 --- a/tools/testing/selftests/drivers/net/napi_threaded.py +++ b/tools/testing/selftests/drivers/net/napi_threaded.py @@ -38,6 +38,34 @@ def _setup_deferred_cleanup(cfg) -> None: return combined +def napi_init(cfg, nl) -> None: + """ + Test that threaded state (in the persistent NAPI config) gets updated + even when NAPI with given ID is not allocated at the time. + """ + + qcnt = _setup_deferred_cleanup(cfg) + + _set_threaded_state(cfg, 1) + cmd(f"ethtool -L {cfg.ifname} combined 1") + _set_threaded_state(cfg, 0) + cmd(f"ethtool -L {cfg.ifname} combined {qcnt}") + + napis = nl.napi_get({'ifindex': cfg.ifindex}, dump=True) + for napi in napis: + ksft_eq(napi['threaded'], 'disabled') + ksft_eq(napi.get('pid'), None) + + cmd(f"ethtool -L {cfg.ifname} combined 1") + _set_threaded_state(cfg, 1) + cmd(f"ethtool -L {cfg.ifname} combined {qcnt}") + + napis = nl.napi_get({'ifindex': cfg.ifindex}, dump=True) + for napi in napis: + ksft_eq(napi['threaded'], 'enabled') + ksft_ne(napi.get('pid'), None) + + def enable_dev_threaded_disable_napi_threaded(cfg, nl) -> None: """ Test that when napi threaded is enabled at device level and @@ -103,7 +131,8 @@ def main() -> None: """ Ksft boiler plate main """ with NetDrvEnv(__file__, queue_count=2) as cfg: - ksft_run([change_num_queues, + ksft_run([napi_init, + change_num_queues, enable_dev_threaded_disable_napi_threaded], args=(cfg, NetdevFamily())) ksft_exit() From da114122b83149d1f1db0586b1d67947b651aa20 Mon Sep 17 00:00:00 2001 From: Chaoyi Chen Date: Fri, 15 Aug 2025 10:35:15 +0800 Subject: [PATCH 0155/1536] net: ethernet: stmmac: dwmac-rk: Make the clk_phy could be used for external phy For external phy, clk_phy should be optional, and some external phy need the clock input from clk_phy. This patch adds support for setting clk_phy for external phy. Signed-off-by: David Wu Signed-off-by: Chaoyi Chen Link: https://patch.msgid.link/20250815023515.114-1-kernel@airkyi.com Signed-off-by: Paolo Abeni --- drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c index 7c898768d544..9fc41207cc45 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c @@ -1412,12 +1412,15 @@ static int rk_gmac_clk_init(struct plat_stmmacenet_data *plat) clk_set_rate(plat->stmmac_clk, 50000000); } - if (plat->phy_node && bsp_priv->integrated_phy) { + if (plat->phy_node) { bsp_priv->clk_phy = of_clk_get(plat->phy_node, 0); ret = PTR_ERR_OR_ZERO(bsp_priv->clk_phy); - if (ret) - return dev_err_probe(dev, ret, "Cannot get PHY clock\n"); - clk_set_rate(bsp_priv->clk_phy, 50000000); + /* If it is not integrated_phy, clk_phy is optional */ + if (bsp_priv->integrated_phy) { + if (ret) + return dev_err_probe(dev, ret, "Cannot get PHY clock\n"); + clk_set_rate(bsp_priv->clk_phy, 50000000); + } } return 0; From eddc821f98afaa13ecd09b02b73ac5946387ffcc Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Fri, 15 Aug 2025 15:41:00 -0700 Subject: [PATCH 0156/1536] selftests: drv-net: tso: increase the retransmit threshold We see quite a few flakes during the TSO test against virtualized devices in NIPA. There's often 10-30 retransmissions during the test. Sometimes as many as 100. Set the retransmission threshold at 1/4th of the wire frame target. Link: https://patch.msgid.link/20250815224100.363438-1-kuba@kernel.org Signed-off-by: Jakub Kicinski --- tools/testing/selftests/drivers/net/hw/tso.py | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/tools/testing/selftests/drivers/net/hw/tso.py b/tools/testing/selftests/drivers/net/hw/tso.py index c13dd5efa27a..0998e68ebaf0 100755 --- a/tools/testing/selftests/drivers/net/hw/tso.py +++ b/tools/testing/selftests/drivers/net/hw/tso.py @@ -60,16 +60,17 @@ def run_one_stream(cfg, ipver, remote_v4, remote_v6, should_lso): sock_wait_drain(sock) qstat_new = cfg.netnl.qstats_get({"ifindex": cfg.ifindex}, dump=True)[0] - # No math behind the 10 here, but try to catch cases where - # TCP falls back to non-LSO. - ksft_lt(tcp_sock_get_retrans(sock), 10) - sock.close() - # Check that at least 90% of the data was sent as LSO packets. # System noise may cause false negatives. Also header overheads # will add up to 5% of extra packes... The check is best effort. total_lso_wire = len(buf) * 0.90 // cfg.dev["mtu"] total_lso_super = len(buf) * 0.90 // cfg.dev["tso_max_size"] + + # Make sure we have order of magnitude more LSO packets than + # retransmits, in case TCP retransmitted all the LSO packets. + ksft_lt(tcp_sock_get_retrans(sock), total_lso_wire / 4) + sock.close() + if should_lso: if cfg.have_stat_super_count: ksft_ge(qstat_new['tx-hw-gso-packets'] - From 51992f99f068fba966a680a9ac118b815f2fe08e Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Fri, 15 Aug 2025 16:15:13 -0700 Subject: [PATCH 0157/1536] selftests: drv-net: ncdevmem: make configure_channels() support combined channels ncdevmem tests that the kernel correctly rejects attempts to deactivate queues with MPs bound. Make the configure_channels() test support combined channels. Currently it tries to set the queue counts to rx N tx N-1, which only makes sense for devices which have IRQs per ring type. Most modern devices used combined IRQs/channels with both Rx and Tx queues. Since the math is total Rx == combined+Rx setting Rx when combined is non-zero will be increasing the total queue count, not decreasing as the test intends. Note that the test would previously also try to set the Tx ring count to Rx - 1, for some reason. Which would be 0 if the device has only 2 queues configured. With this change (device with 2 queues): setting channel count rx:1 tx:1 YNL set channels: Kernel error: 'requested channel counts are too low for existing memory provider setting (2)' Reviewed-by: Mina Almasry Link: https://patch.msgid.link/20250815231513.381652-1-kuba@kernel.org Signed-off-by: Jakub Kicinski --- .../selftests/drivers/net/hw/ncdevmem.c | 78 ++++++++++++++++++- 1 file changed, 76 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/drivers/net/hw/ncdevmem.c b/tools/testing/selftests/drivers/net/hw/ncdevmem.c index be937542b4c0..71961a7688e6 100644 --- a/tools/testing/selftests/drivers/net/hw/ncdevmem.c +++ b/tools/testing/selftests/drivers/net/hw/ncdevmem.c @@ -356,7 +356,81 @@ static int configure_rss(void) static int configure_channels(unsigned int rx, unsigned int tx) { - return run_command("ethtool -L %s rx %u tx %u", ifname, rx, tx); + struct ethtool_channels_get_req *gchan; + struct ethtool_channels_set_req *schan; + struct ethtool_channels_get_rsp *chan; + struct ynl_error yerr; + struct ynl_sock *ys; + int ret; + + fprintf(stderr, "setting channel count rx:%u tx:%u\n", rx, tx); + + ys = ynl_sock_create(&ynl_ethtool_family, &yerr); + if (!ys) { + fprintf(stderr, "YNL: %s\n", yerr.msg); + return -1; + } + + gchan = ethtool_channels_get_req_alloc(); + if (!gchan) { + ret = -1; + goto exit_close_sock; + } + + ethtool_channels_get_req_set_header_dev_index(gchan, ifindex); + chan = ethtool_channels_get(ys, gchan); + ethtool_channels_get_req_free(gchan); + if (!chan) { + fprintf(stderr, "YNL get channels: %s\n", ys->err.msg); + ret = -1; + goto exit_close_sock; + } + + schan = ethtool_channels_set_req_alloc(); + if (!schan) { + ret = -1; + goto exit_free_chan; + } + + ethtool_channels_set_req_set_header_dev_index(schan, ifindex); + + if (chan->_present.combined_count) { + if (chan->_present.rx_count || chan->_present.tx_count) { + ethtool_channels_set_req_set_rx_count(schan, 0); + ethtool_channels_set_req_set_tx_count(schan, 0); + } + + if (rx == tx) { + ethtool_channels_set_req_set_combined_count(schan, rx); + } else if (rx > tx) { + ethtool_channels_set_req_set_combined_count(schan, tx); + ethtool_channels_set_req_set_rx_count(schan, rx - tx); + } else { + ethtool_channels_set_req_set_combined_count(schan, rx); + ethtool_channels_set_req_set_tx_count(schan, tx - rx); + } + + ret = ethtool_channels_set(ys, schan); + if (ret) + fprintf(stderr, "YNL set channels: %s\n", ys->err.msg); + } else if (chan->_present.rx_count) { + ethtool_channels_set_req_set_rx_count(schan, rx); + ethtool_channels_set_req_set_tx_count(schan, tx); + + ret = ethtool_channels_set(ys, schan); + if (ret) + fprintf(stderr, "YNL set channels: %s\n", ys->err.msg); + } else { + fprintf(stderr, "Error: device has neither combined nor rx channels\n"); + ret = -1; + } + ethtool_channels_set_req_free(schan); +exit_free_chan: + ethtool_channels_get_rsp_free(chan); +exit_close_sock: + ynl_sock_destroy(ys); + + return ret; } static int configure_flow_steering(struct sockaddr_in6 *server_sin) @@ -752,7 +826,7 @@ void run_devmem_tests(void) error(1, 0, "Failed to bind\n"); /* Deactivating a bound queue should not be legal */ - if (!configure_channels(num_queues, num_queues - 1)) + if (!configure_channels(num_queues, num_queues)) error(1, 0, "Deactivating a bound queue should be illegal.\n"); /* Closing the netlink socket does an implicit unbind */ From 5236f57e7c033d869fe8f2080a977ea47882b26f Mon Sep 17 00:00:00 2001 From: Christoph Paasch Date: Sat, 16 Aug 2025 16:12:48 -0700 Subject: [PATCH 0158/1536] net: Make nexthop-dumps scale linearly with the number of nexthops When we have a (very) large number of nexthops, they do not fit within a single message. rtm_dump_walk_nexthops() thus will be called repeatedly and ctx->idx is used to avoid dumping the same nexthops again. The approach in which we avoid dumping the same nexthops is by basically walking the entire nexthop rb-tree from the left-most node until we find a node whose id is >= s_idx. That does not scale well. Instead of this inefficient approach, rather go directly through the tree to the nexthop that should be dumped (the one whose nh_id >= s_idx). This allows us to find the relevant node in O(log(n)). We have quite a nice improvement with this: Before: ======= --> ~1M nexthops: $ time ~/libnl/src/nl-nh-list | wc -l 1050624 real 0m21.080s user 0m0.666s sys 0m20.384s --> ~2M nexthops: $ time ~/libnl/src/nl-nh-list | wc -l 2101248 real 1m51.649s user 0m1.540s sys 1m49.908s After: ====== --> ~1M nexthops: $ time ~/libnl/src/nl-nh-list | wc -l 1050624 real 0m1.157s user 0m0.926s sys 0m0.259s --> ~2M nexthops: $ time ~/libnl/src/nl-nh-list | wc -l 2101248 real 0m2.763s user 0m2.042s sys 0m0.776s Signed-off-by: Christoph Paasch Reviewed-by: Ido Schimmel Reviewed-by: Nikolay Aleksandrov Reviewed-by: Eric Dumazet Reviewed-by: David Ahern Link: https://patch.msgid.link/20250816-nexthop_dump-v2-1-491da3462118@openai.com Signed-off-by: Jakub Kicinski --- net/ipv4/nexthop.c | 36 +++++++++++++++++++++++++++++++++--- 1 file changed, 33 insertions(+), 3 deletions(-) diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c index 29118c43ebf5..509004bfd08e 100644 --- a/net/ipv4/nexthop.c +++ b/net/ipv4/nexthop.c @@ -3511,12 +3511,42 @@ static int rtm_dump_walk_nexthops(struct sk_buff *skb, int err; s_idx = ctx->idx; - for (node = rb_first(root); node; node = rb_next(node)) { + + /* If this is not the first invocation, ctx->idx will contain the id of + * the last nexthop we processed. Instead of starting from the very + * first element of the red/black tree again and linearly skipping the + * (potentially large) set of nodes with an id smaller than s_idx, walk + * the tree and find the left-most node whose id is >= s_idx. This + * provides an efficient O(log n) starting point for the dump + * continuation. + */ + if (s_idx != 0) { + struct rb_node *tmp = root->rb_node; + + node = NULL; + while (tmp) { + struct nexthop *nh; + + nh = rb_entry(tmp, struct nexthop, rb_node); + if (nh->id < s_idx) { + tmp = tmp->rb_right; + } else { + /* Track current candidate and keep looking on + * the left side to find the left-most + * (smallest id) that is still >= s_idx. + */ + node = tmp; + tmp = tmp->rb_left; + } + } + } else { + node = rb_first(root); + } + + for (; node; node = rb_next(node)) { struct nexthop *nh; nh = rb_entry(node, struct nexthop, rb_node); - if (nh->id < s_idx) - continue; ctx->idx = nh->id; err = nh_cb(skb, cb, nh, data); From b0ac6d3b56a2384db151696cfda2836a8a961b6d Mon Sep 17 00:00:00 2001 From: Christoph Paasch Date: Sat, 16 Aug 2025 16:12:49 -0700 Subject: [PATCH 0159/1536] net: When removing nexthops, don't call synchronize_net if it is not necessary When removing a nexthop, commit 90f33bffa382 ("nexthops: don't modify published nexthop groups") added a call to synchronize_rcu() (later changed to _net()) to make sure everyone sees the new nexthop-group before the rtnl-lock is released. When one wants to delete a large number of groups and nexthops, it is fastest to first flush the groups (ip nexthop flush groups) and then flush the nexthops themselves (ip -6 nexthop flush). As that way the groups don't need to be rebalanced. However, `ip -6 nexthop flush` will still take a long time if there is a very large number of nexthops because of the call to synchronize_net(). Now, if there are no more groups, there is no point in calling synchronize_net(). So, let's skip that entirely by checking if nh->grp_list is empty. This gives us a nice speedup: BEFORE: ======= $ time sudo ip -6 nexthop flush Dump was interrupted and may be inconsistent. Flushed 2097152 nexthops real 1m45.345s user 0m0.001s sys 0m0.005s $ time sudo ip -6 nexthop flush Dump was interrupted and may be inconsistent. Flushed 4194304 nexthops real 3m10.430s user 0m0.002s sys 0m0.004s AFTER: ====== $ time sudo ip -6 nexthop flush Dump was interrupted and may be inconsistent. Flushed 2097152 nexthops real 0m17.545s user 0m0.003s sys 0m0.003s $ time sudo ip -6 nexthop flush Dump was interrupted and may be inconsistent. Flushed 4194304 nexthops real 0m35.823s user 0m0.002s sys 0m0.004s Signed-off-by: Christoph Paasch Reviewed-by: Ido Schimmel Reviewed-by: Nikolay Aleksandrov Reviewed-by: Eric Dumazet Reviewed-by: David Ahern Link: https://patch.msgid.link/20250816-nexthop_dump-v2-2-491da3462118@openai.com Signed-off-by: Jakub Kicinski --- net/ipv4/nexthop.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c index 509004bfd08e..0a20625f5ffb 100644 --- a/net/ipv4/nexthop.c +++ b/net/ipv4/nexthop.c @@ -2087,6 +2087,12 @@ static void remove_nexthop_from_groups(struct net *net, struct nexthop *nh, { struct nh_grp_entry *nhge, *tmp; + /* If there is nothing to do, let's avoid the costly call to + * synchronize_net() + */ + if (list_empty(&nh->grp_list)) + return; + list_for_each_entry_safe(nhge, tmp, &nh->grp_list, nh_list) remove_nh_grp_entry(net, nhge, nlinfo); From c3f0c02997c7f8489fec259e28e0e04e9811edac Mon Sep 17 00:00:00 2001 From: Stanislav Fomichev Date: Mon, 18 Aug 2025 08:40:26 -0700 Subject: [PATCH 0160/1536] net: Add skb_dstref_steal and skb_dstref_restore Going forward skb_dst_set will assert that skb dst_entry is empty during skb_dst_set to prevent potential leaks. There are few places that still manually manage dst_entry not using the helpers. Convert them to the following new helpers: - skb_dstref_steal that resets dst_entry and returns previous dst_entry value - skb_dstref_restore that restores dst_entry previously reset via skb_dstref_steal Signed-off-by: Stanislav Fomichev Link: https://patch.msgid.link/20250818154032.3173645-2-sdf@fomichev.me Signed-off-by: Jakub Kicinski --- include/linux/skbuff.h | 32 ++++++++++++++++++++++++++++++++ 1 file changed, 32 insertions(+) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 14b923ddb6df..7538ca507ee9 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1159,6 +1159,38 @@ static inline struct dst_entry *skb_dst(const struct sk_buff *skb) return (struct dst_entry *)(skb->_skb_refdst & SKB_DST_PTRMASK); } +/** + * skb_dstref_steal() - return current dst_entry value and clear it + * @skb: buffer + * + * Resets skb dst_entry without adjusting its reference count. Useful in + * cases where dst_entry needs to be temporarily reset and restored. + * Note that the returned value cannot be used directly because it + * might contain SKB_DST_NOREF bit. + * + * When in doubt, prefer skb_dst_drop() over skb_dstref_steal() to correctly + * handle dst_entry reference counting. + * + * Returns: original skb dst_entry. + */ +static inline unsigned long skb_dstref_steal(struct sk_buff *skb) +{ + unsigned long refdst = skb->_skb_refdst; + + skb->_skb_refdst = 0; + return refdst; +} + +/** + * skb_dstref_restore() - restore skb dst_entry removed via skb_dstref_steal() + * @skb: buffer + * @refdst: dst entry from a call to skb_dstref_steal() + */ +static inline void skb_dstref_restore(struct sk_buff *skb, unsigned long refdst) +{ + skb->_skb_refdst = refdst; +} + /** * skb_dst_set - sets skb dst * @skb: buffer From c829aab21ed55d9e38346604259aa3ff88d17274 Mon Sep 17 00:00:00 2001 From: Stanislav Fomichev Date: Mon, 18 Aug 2025 08:40:27 -0700 Subject: [PATCH 0161/1536] xfrm: Switch to skb_dstref_steal to clear dst_entry Going forward skb_dst_set will assert that skb dst_entry is empty during skb_dst_set. skb_dstref_steal is added to reset existing entry without doing refcnt. Switch to skb_dstref_steal in __xfrm_route_forward and add a comment on why it's safe to skip skb_dstref_restore. Signed-off-by: Stanislav Fomichev Link: https://patch.msgid.link/20250818154032.3173645-3-sdf@fomichev.me Signed-off-by: Jakub Kicinski --- net/xfrm/xfrm_policy.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c index c5035a9bc3bb..7111184eef59 100644 --- a/net/xfrm/xfrm_policy.c +++ b/net/xfrm/xfrm_policy.c @@ -3881,12 +3881,18 @@ int __xfrm_route_forward(struct sk_buff *skb, unsigned short family) } skb_dst_force(skb); - if (!skb_dst(skb)) { + dst = skb_dst(skb); + if (!dst) { XFRM_INC_STATS(net, LINUX_MIB_XFRMFWDHDRERROR); return 0; } - dst = xfrm_lookup(net, skb_dst(skb), &fl, NULL, XFRM_LOOKUP_QUEUE); + /* ignore return value from skb_dstref_steal, xfrm_lookup takes + * care of dropping the refcnt if needed. + */ + skb_dstref_steal(skb); + + dst = xfrm_lookup(net, dst, &fl, NULL, XFRM_LOOKUP_QUEUE); if (IS_ERR(dst)) { res = 0; dst = NULL; From 15488d4d8dc10a575f32cc692b6c9aa99f66bc7b Mon Sep 17 00:00:00 2001 From: Stanislav Fomichev Date: Mon, 18 Aug 2025 08:40:28 -0700 Subject: [PATCH 0162/1536] netfilter: Switch to skb_dstref_steal to clear dst_entry Going forward skb_dst_set will assert that skb dst_entry is empty during skb_dst_set. skb_dstref_steal is added to reset existing entry without doing refcnt. Switch to skb_dstref_steal in ip[6]_route_me_harder and add a comment on why it's safe to skip skb_dstref_restore. Acked-by: Florian Westphal Signed-off-by: Stanislav Fomichev Link: https://patch.msgid.link/20250818154032.3173645-4-sdf@fomichev.me Signed-off-by: Jakub Kicinski --- net/ipv4/netfilter.c | 5 ++++- net/ipv6/netfilter.c | 5 ++++- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/net/ipv4/netfilter.c b/net/ipv4/netfilter.c index 0565f001120d..e60e54e7945d 100644 --- a/net/ipv4/netfilter.c +++ b/net/ipv4/netfilter.c @@ -65,7 +65,10 @@ int ip_route_me_harder(struct net *net, struct sock *sk, struct sk_buff *skb, un if (!(IPCB(skb)->flags & IPSKB_XFRM_TRANSFORMED) && xfrm_decode_session(net, skb, flowi4_to_flowi(&fl4), AF_INET) == 0) { struct dst_entry *dst = skb_dst(skb); - skb_dst_set(skb, NULL); + /* ignore return value from skb_dstref_steal, xfrm_lookup takes + * care of dropping the refcnt if needed. + */ + skb_dstref_steal(skb); dst = xfrm_lookup(net, dst, flowi4_to_flowi(&fl4), sk, 0); if (IS_ERR(dst)) return PTR_ERR(dst); diff --git a/net/ipv6/netfilter.c b/net/ipv6/netfilter.c index 45f9105f9ac1..46540a5a4331 100644 --- a/net/ipv6/netfilter.c +++ b/net/ipv6/netfilter.c @@ -63,7 +63,10 @@ int ip6_route_me_harder(struct net *net, struct sock *sk_partial, struct sk_buff #ifdef CONFIG_XFRM if (!(IP6CB(skb)->flags & IP6SKB_XFRM_TRANSFORMED) && xfrm_decode_session(net, skb, flowi6_to_flowi(&fl6), AF_INET6) == 0) { - skb_dst_set(skb, NULL); + /* ignore return value from skb_dstref_steal, xfrm_lookup takes + * care of dropping the refcnt if needed. + */ + skb_dstref_steal(skb); dst = xfrm_lookup(net, dst, flowi6_to_flowi(&fl6), sk, 0); if (IS_ERR(dst)) return PTR_ERR(dst); From e97e6a1830ddb5885ba312e56b6fa3aa39b5f47e Mon Sep 17 00:00:00 2001 From: Stanislav Fomichev Date: Mon, 18 Aug 2025 08:40:29 -0700 Subject: [PATCH 0163/1536] net: Switch to skb_dstref_steal/skb_dstref_restore for ip_route_input callers Going forward skb_dst_set will assert that skb dst_entry is empty during skb_dst_set. skb_dstref_steal is added to reset existing entry without doing refcnt. skb_dstref_restore should be used to restore the previous entry. Convert icmp_route_lookup and ip_options_rcv_srr to these helpers. Add extra call to skb_dstref_reset to icmp_route_lookup to clear the ip_route_input entry. Signed-off-by: Stanislav Fomichev Link: https://patch.msgid.link/20250818154032.3173645-5-sdf@fomichev.me Signed-off-by: Jakub Kicinski --- net/ipv4/icmp.c | 7 ++++--- net/ipv4/ip_options.c | 5 ++--- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c index 2ffe73ea644f..91765057aa1d 100644 --- a/net/ipv4/icmp.c +++ b/net/ipv4/icmp.c @@ -544,14 +544,15 @@ static struct rtable *icmp_route_lookup(struct net *net, struct flowi4 *fl4, goto relookup_failed; } /* Ugh! */ - orefdst = skb_in->_skb_refdst; /* save old refdst */ - skb_dst_set(skb_in, NULL); + orefdst = skb_dstref_steal(skb_in); err = ip_route_input(skb_in, fl4_dec.daddr, fl4_dec.saddr, dscp, rt2->dst.dev) ? -EINVAL : 0; dst_release(&rt2->dst); rt2 = skb_rtable(skb_in); - skb_in->_skb_refdst = orefdst; /* restore old refdst */ + /* steal dst entry from skb_in, don't drop refcnt */ + skb_dstref_steal(skb_in); + skb_dstref_restore(skb_in, orefdst); } if (err) diff --git a/net/ipv4/ip_options.c b/net/ipv4/ip_options.c index e3321932bec0..be8815ce3ac2 100644 --- a/net/ipv4/ip_options.c +++ b/net/ipv4/ip_options.c @@ -615,14 +615,13 @@ int ip_options_rcv_srr(struct sk_buff *skb, struct net_device *dev) } memcpy(&nexthop, &optptr[srrptr-1], 4); - orefdst = skb->_skb_refdst; - skb_dst_set(skb, NULL); + orefdst = skb_dstref_steal(skb); err = ip_route_input(skb, nexthop, iph->saddr, ip4h_dscp(iph), dev) ? -EINVAL : 0; rt2 = skb_rtable(skb); if (err || (rt2->rt_type != RTN_UNICAST && rt2->rt_type != RTN_LOCAL)) { skb_dst_drop(skb); - skb->_skb_refdst = orefdst; + skb_dstref_restore(skb, orefdst); return -EINVAL; } refdst_drop(orefdst); From da3b9d493ba27c710f4812f1d0a98846f3043f94 Mon Sep 17 00:00:00 2001 From: Stanislav Fomichev Date: Mon, 18 Aug 2025 08:40:30 -0700 Subject: [PATCH 0164/1536] staging: octeon: Convert to skb_dst_drop Instead of doing dst_release and skb_dst_set, do skb_dst_drop which should do the right thing. Acked-by: Greg Kroah-Hartman Signed-off-by: Stanislav Fomichev Link: https://patch.msgid.link/20250818154032.3173645-6-sdf@fomichev.me Signed-off-by: Jakub Kicinski --- drivers/staging/octeon/ethernet-tx.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/staging/octeon/ethernet-tx.c b/drivers/staging/octeon/ethernet-tx.c index 261f8dbdc382..0ba240e634a1 100644 --- a/drivers/staging/octeon/ethernet-tx.c +++ b/drivers/staging/octeon/ethernet-tx.c @@ -346,8 +346,7 @@ netdev_tx_t cvm_oct_xmit(struct sk_buff *skb, struct net_device *dev) * The skbuff will be reused without ever being freed. We must * cleanup a bunch of core things. */ - dst_release(skb_dst(skb)); - skb_dst_set(skb, NULL); + skb_dst_drop(skb); skb_ext_reset(skb); nf_reset_ct(skb); skb_reset_redirect(skb); From 3e31075a119409613627517f205edd86a7f89d7b Mon Sep 17 00:00:00 2001 From: Stanislav Fomichev Date: Mon, 18 Aug 2025 08:40:31 -0700 Subject: [PATCH 0165/1536] chtls: Convert to skb_dst_reset Going forward skb_dst_set will assert that skb dst_entry is empty during skb_dst_set. skb_dstref_steal is added to reset existing entry without doing refcnt. Chelsio driver is doing extra dst management via skb_dst_set(NULL). Replace these calls with skb_dstref_steal. Signed-off-by: Stanislav Fomichev Link: https://patch.msgid.link/20250818154032.3173645-7-sdf@fomichev.me Signed-off-by: Jakub Kicinski --- .../ethernet/chelsio/inline_crypto/chtls/chtls_cm.c | 10 +++++----- .../ethernet/chelsio/inline_crypto/chtls/chtls_cm.h | 4 ++-- .../ethernet/chelsio/inline_crypto/chtls/chtls_io.c | 2 +- 3 files changed, 8 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c b/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c index 6f6525983130..2e7c2691a193 100644 --- a/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c +++ b/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c @@ -171,7 +171,7 @@ static void chtls_purge_receive_queue(struct sock *sk) struct sk_buff *skb; while ((skb = __skb_dequeue(&sk->sk_receive_queue)) != NULL) { - skb_dst_set(skb, (void *)NULL); + skb_dstref_steal(skb); kfree_skb(skb); } } @@ -194,7 +194,7 @@ static void chtls_purge_recv_queue(struct sock *sk) struct sk_buff *skb; while ((skb = __skb_dequeue(&tlsk->sk_recv_queue)) != NULL) { - skb_dst_set(skb, NULL); + skb_dstref_steal(skb); kfree_skb(skb); } } @@ -1734,7 +1734,7 @@ static int chtls_rx_data(struct chtls_dev *cdev, struct sk_buff *skb) pr_err("can't find conn. for hwtid %u.\n", hwtid); return -EINVAL; } - skb_dst_set(skb, NULL); + skb_dstref_steal(skb); process_cpl_msg(chtls_recv_data, sk, skb); return 0; } @@ -1786,7 +1786,7 @@ static int chtls_rx_pdu(struct chtls_dev *cdev, struct sk_buff *skb) pr_err("can't find conn. for hwtid %u.\n", hwtid); return -EINVAL; } - skb_dst_set(skb, NULL); + skb_dstref_steal(skb); process_cpl_msg(chtls_recv_pdu, sk, skb); return 0; } @@ -1855,7 +1855,7 @@ static int chtls_rx_cmp(struct chtls_dev *cdev, struct sk_buff *skb) pr_err("can't find conn. for hwtid %u.\n", hwtid); return -EINVAL; } - skb_dst_set(skb, NULL); + skb_dstref_steal(skb); process_cpl_msg(chtls_rx_hdr, sk, skb); return 0; diff --git a/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.h b/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.h index f61ca657601c..2285cf2df251 100644 --- a/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.h +++ b/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.h @@ -171,14 +171,14 @@ static inline void chtls_set_req_addr(struct request_sock *oreq, static inline void chtls_free_skb(struct sock *sk, struct sk_buff *skb) { - skb_dst_set(skb, NULL); + skb_dstref_steal(skb); __skb_unlink(skb, &sk->sk_receive_queue); __kfree_skb(skb); } static inline void chtls_kfree_skb(struct sock *sk, struct sk_buff *skb) { - skb_dst_set(skb, NULL); + skb_dstref_steal(skb); __skb_unlink(skb, &sk->sk_receive_queue); kfree_skb(skb); } diff --git a/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_io.c b/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_io.c index 465fa8077964..4036db466e18 100644 --- a/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_io.c +++ b/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_io.c @@ -1434,7 +1434,7 @@ static int chtls_pt_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, continue; found_ok_skb: if (!skb->len) { - skb_dst_set(skb, NULL); + skb_dstref_steal(skb); __skb_unlink(skb, &sk->sk_receive_queue); kfree_skb(skb); From a890348adcc993f48d1ae38f1174dc8de4c3c5ac Mon Sep 17 00:00:00 2001 From: Stanislav Fomichev Date: Mon, 18 Aug 2025 08:40:32 -0700 Subject: [PATCH 0166/1536] net: Add skb_dst_check_unset To prevent dst_entry leaks, add warning when the non-NULL dst_entry is rewritten. Signed-off-by: Stanislav Fomichev Link: https://patch.msgid.link/20250818154032.3173645-8-sdf@fomichev.me Signed-off-by: Jakub Kicinski --- include/linux/skbuff.h | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 7538ca507ee9..ca8be45dd8be 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1159,6 +1159,12 @@ static inline struct dst_entry *skb_dst(const struct sk_buff *skb) return (struct dst_entry *)(skb->_skb_refdst & SKB_DST_PTRMASK); } +static inline void skb_dst_check_unset(struct sk_buff *skb) +{ + DEBUG_NET_WARN_ON_ONCE((skb->_skb_refdst & SKB_DST_PTRMASK) && + !(skb->_skb_refdst & SKB_DST_NOREF)); +} + /** * skb_dstref_steal() - return current dst_entry value and clear it * @skb: buffer @@ -1188,6 +1194,7 @@ static inline unsigned long skb_dstref_steal(struct sk_buff *skb) */ static inline void skb_dstref_restore(struct sk_buff *skb, unsigned long refdst) { + skb_dst_check_unset(skb); skb->_skb_refdst = refdst; } @@ -1201,6 +1208,7 @@ static inline void skb_dstref_restore(struct sk_buff *skb, unsigned long refdst) */ static inline void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst) { + skb_dst_check_unset(skb); skb->slow_gro |= !!dst; skb->_skb_refdst = (unsigned long)dst; } @@ -1217,6 +1225,7 @@ static inline void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst) */ static inline void skb_dst_set_noref(struct sk_buff *skb, struct dst_entry *dst) { + skb_dst_check_unset(skb); WARN_ON(!rcu_read_lock_held() && !rcu_read_lock_bh_held()); skb->slow_gro |= !!dst; skb->_skb_refdst = (unsigned long)dst | SKB_DST_NOREF; From 09bde6fdcd752b4512e7b554a3259e4f6b77c6d1 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Miguel=20Garc=C3=ADa?= Date: Tue, 19 Aug 2025 00:02:03 +0200 Subject: [PATCH 0167/1536] ipv6: ip6_gre: replace strcpy with strscpy for tunnel name MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Replace the strcpy() call that copies the device name into tunnel->parms.name with strscpy(), to avoid potential overflow and guarantee NULL termination. This uses the two-argument form of strscpy(), where the destination size is inferred from the array type. Destination is tunnel->parms.name (size IFNAMSIZ). Tested in QEMU (Alpine rootfs): - Created IPv6 GRE tunnels over loopback - Assigned overlay IPv6 addresses - Verified bidirectional ping through the tunnel - Changed tunnel parameters at runtime (`ip -6 tunnel change`) Signed-off-by: Miguel García Link: https://patch.msgid.link/20250818220203.899338-1-miguelgarciaroman8@gmail.com Signed-off-by: Jakub Kicinski --- net/ipv6/ip6_gre.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c index 74d49dd6124d..c82a75510c0e 100644 --- a/net/ipv6/ip6_gre.c +++ b/net/ipv6/ip6_gre.c @@ -329,9 +329,9 @@ static struct ip6_tnl *ip6gre_tunnel_locate(struct net *net, if (parms->name[0]) { if (!dev_valid_name(parms->name)) return NULL; - strscpy(name, parms->name, IFNAMSIZ); + strscpy(name, parms->name); } else { - strcpy(name, "ip6gre%d"); + strscpy(name, "ip6gre%d"); } dev = alloc_netdev(sizeof(*t), name, NET_NAME_UNKNOWN, ip6gre_tunnel_setup); @@ -1469,7 +1469,7 @@ static int ip6gre_tunnel_init_common(struct net_device *dev) tunnel = netdev_priv(dev); tunnel->dev = dev; - strcpy(tunnel->parms.name, dev->name); + strscpy(tunnel->parms.name, dev->name); ret = dst_cache_init(&tunnel->dst_cache, GFP_KERNEL); if (ret) @@ -1529,7 +1529,7 @@ static void ip6gre_fb_tunnel_init(struct net_device *dev) tunnel->dev = dev; tunnel->net = dev_net(dev); - strcpy(tunnel->parms.name, dev->name); + strscpy(tunnel->parms.name, dev->name); tunnel->hlen = sizeof(struct ipv6hdr) + 4; } @@ -1842,7 +1842,7 @@ static int ip6erspan_tap_init(struct net_device *dev) tunnel = netdev_priv(dev); tunnel->dev = dev; - strcpy(tunnel->parms.name, dev->name); + strscpy(tunnel->parms.name, dev->name); ret = dst_cache_init(&tunnel->dst_cache, GFP_KERNEL); if (ret) From 3a752e67800106a5c42d802d67e06c60aa71d07b Mon Sep 17 00:00:00 2001 From: Markus Stockhausen Date: Fri, 15 Aug 2025 04:20:09 -0400 Subject: [PATCH 0168/1536] net: phy: realtek: enable serdes option mode for RTL8226-CG The RTL8226-CG can make use of the serdes option mode feature to dynamically switch between SGMII and 2500base-X. From what is known the setup sequence is much simpler with no magic values. Convert the exiting config_init() into a helper that configures the PHY depending on generation 1 or 2. Call the helper from two separated new config_init() functions. Finally convert the phy_driver specs of the RTL8226-CG to make use of the new configuration and switch over to the extended read_status() function to dynamically change the interface according to the serdes mode. Remark! The logic could be simpler if the serdes mode could be set before all other generation 2 magic values. Due to missing RTL8221B test hardware the mmd command order was kept. Tested on Zyxel XGS1210-12. Signed-off-by: Markus Stockhausen Link: https://patch.msgid.link/20250815082009.3678865-1-markus.stockhausen@gmx.de Signed-off-by: Jakub Kicinski --- drivers/net/phy/realtek/realtek_main.c | 26 ++++++++++++++++++++------ 1 file changed, 20 insertions(+), 6 deletions(-) diff --git a/drivers/net/phy/realtek/realtek_main.c b/drivers/net/phy/realtek/realtek_main.c index 904fa8b1aa03..b2d0c0f88f75 100644 --- a/drivers/net/phy/realtek/realtek_main.c +++ b/drivers/net/phy/realtek/realtek_main.c @@ -1146,7 +1146,7 @@ static int rtl822x_probe(struct phy_device *phydev) return 0; } -static int rtl822xb_config_init(struct phy_device *phydev) +static int rtl822x_set_serdes_option_mode(struct phy_device *phydev, bool gen1) { bool has_2500, has_sgmii; u16 mode; @@ -1181,15 +1181,18 @@ static int rtl822xb_config_init(struct phy_device *phydev) /* the following sequence with magic numbers sets up the SerDes * option mode */ - ret = phy_write_mmd(phydev, MDIO_MMD_VEND1, 0x75f3, 0); - if (ret < 0) - return ret; + + if (!gen1) { + ret = phy_write_mmd(phydev, MDIO_MMD_VEND1, 0x75f3, 0); + if (ret < 0) + return ret; + } ret = phy_modify_mmd_changed(phydev, MDIO_MMD_VEND1, RTL822X_VND1_SERDES_OPTION, RTL822X_VND1_SERDES_OPTION_MODE_MASK, mode); - if (ret < 0) + if (gen1 || ret < 0) return ret; ret = phy_write_mmd(phydev, MDIO_MMD_VEND1, 0x6a04, 0x0503); @@ -1203,6 +1206,16 @@ static int rtl822xb_config_init(struct phy_device *phydev) return phy_write_mmd(phydev, MDIO_MMD_VEND1, 0x6f11, 0x8020); } +static int rtl822x_config_init(struct phy_device *phydev) +{ + return rtl822x_set_serdes_option_mode(phydev, true); +} + +static int rtl822xb_config_init(struct phy_device *phydev) +{ + return rtl822x_set_serdes_option_mode(phydev, false); +} + static int rtl822xb_get_rate_matching(struct phy_device *phydev, phy_interface_t iface) { @@ -1801,7 +1814,8 @@ static struct phy_driver realtek_drvs[] = { .soft_reset = rtl822x_c45_soft_reset, .get_features = rtl822x_c45_get_features, .config_aneg = rtl822x_c45_config_aneg, - .read_status = rtl822x_c45_read_status, + .config_init = rtl822x_config_init, + .read_status = rtl822xb_c45_read_status, .suspend = genphy_c45_pma_suspend, .resume = rtlgen_c45_resume, }, { From e16e973c576f89be26837778a5566ab9c4231852 Mon Sep 17 00:00:00 2001 From: Jijie Shao Date: Fri, 15 Aug 2025 18:04:13 +0800 Subject: [PATCH 0169/1536] net: hns3: add parameter check for tx_copybreak and tx_spare_buf_size Since the driver always enables tx bounce buffer, there are minimum values for `copybreak` and `tx_spare_buf_size`. This patch will check and reject configurations with values smaller than these minimums. Closes: https://lore.kernel.org/all/20250723072900.GV2459@horms.kernel.org/ Signed-off-by: Jijie Shao Link: https://patch.msgid.link/20250815100414.949752-2-shaojijie@huawei.com Signed-off-by: Jakub Kicinski --- .../ethernet/hisilicon/hns3/hns3_ethtool.c | 33 +++++++++++++++++++ 1 file changed, 33 insertions(+) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c index d5454e126c85..a752d0e3db3a 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c @@ -1927,6 +1927,31 @@ static int hns3_set_tx_spare_buf_size(struct net_device *netdev, return ret; } +static int hns3_check_tx_copybreak(struct net_device *netdev, u32 copybreak) +{ + struct hns3_nic_priv *priv = netdev_priv(netdev); + + if (copybreak < priv->min_tx_copybreak) { + netdev_err(netdev, "tx copybreak %u should be no less than %u!\n", + copybreak, priv->min_tx_copybreak); + return -EINVAL; + } + return 0; +} + +static int hns3_check_tx_spare_buf_size(struct net_device *netdev, u32 buf_size) +{ + struct hns3_nic_priv *priv = netdev_priv(netdev); + + if (buf_size < priv->min_tx_spare_buf_size) { + netdev_err(netdev, + "tx spare buf size %u should be no less than %u!\n", + buf_size, priv->min_tx_spare_buf_size); + return -EINVAL; + } + return 0; +} + static int hns3_set_tunable(struct net_device *netdev, const struct ethtool_tunable *tuna, const void *data) @@ -1943,6 +1968,10 @@ static int hns3_set_tunable(struct net_device *netdev, switch (tuna->id) { case ETHTOOL_TX_COPYBREAK: + ret = hns3_check_tx_copybreak(netdev, *(u32 *)data); + if (ret) + return ret; + priv->tx_copybreak = *(u32 *)data; for (i = 0; i < h->kinfo.num_tqps; i++) @@ -1957,6 +1986,10 @@ static int hns3_set_tunable(struct net_device *netdev, break; case ETHTOOL_TX_COPYBREAK_BUF_SIZE: + ret = hns3_check_tx_spare_buf_size(netdev, *(u32 *)data); + if (ret) + return ret; + old_tx_spare_buf_size = h->kinfo.tx_spare_buf_size; new_tx_spare_buf_size = *(u32 *)data; netdev_info(netdev, "request to set tx spare buf size from %u to %u\n", From 021f989c863b40613d1a7595e6f88524fd5a531a Mon Sep 17 00:00:00 2001 From: Jijie Shao Date: Fri, 15 Aug 2025 18:04:14 +0800 Subject: [PATCH 0170/1536] net: hns3: change the function return type from int to bool hclge_only_alloc_priv_buff() only return true or false, So, change the function return type from integer to boolean. Signed-off-by: Jijie Shao Link: https://patch.msgid.link/20250815100414.949752-3-shaojijie@huawei.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c index f209a05e2033..f5457ae0b64f 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c @@ -2182,8 +2182,8 @@ static bool hclge_drop_pfc_buf_till_fit(struct hclge_dev *hdev, return hclge_is_rx_buf_ok(hdev, buf_alloc, rx_all); } -static int hclge_only_alloc_priv_buff(struct hclge_dev *hdev, - struct hclge_pkt_buf_alloc *buf_alloc) +static bool hclge_only_alloc_priv_buff(struct hclge_dev *hdev, + struct hclge_pkt_buf_alloc *buf_alloc) { #define COMPENSATE_BUFFER 0x3C00 #define COMPENSATE_HALF_MPS_NUM 5 From ee0aace5f844ef59335148875d05bec8764e71e8 Mon Sep 17 00:00:00 2001 From: Oleksij Rempel Date: Mon, 18 Aug 2025 11:02:15 +0200 Subject: [PATCH 0171/1536] net: stmmac: Correctly handle Rx checksum offload errors The stmmac_rx function would previously set skb->ip_summed to CHECKSUM_UNNECESSARY if hardware checksum offload (CoE) was enabled and the packet was of a known IP ethertype. However, this logic failed to check if the hardware had actually reported a checksum error. The hardware status, indicating a header or payload checksum failure, was being ignored at this stage. This could cause corrupt packets to be passed up the network stack as valid. This patch corrects the logic by checking the `csum_none` status flag, which is set when the hardware reports a checksum error. If this flag is set, skb->ip_summed is now correctly set to CHECKSUM_NONE, ensuring the kernel's network stack will perform its own validation and properly handle the corrupt packet. Signed-off-by: Oleksij Rempel Link: https://patch.msgid.link/20250818090217.2789521-2-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index 0bbdc1f1a691..a55e43804670 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -5737,7 +5737,8 @@ drain_data: skb->protocol = eth_type_trans(skb, priv->dev); - if (unlikely(!coe) || !stmmac_has_ip_ethertype(skb)) + if (unlikely(!coe) || !stmmac_has_ip_ethertype(skb) || + (status & csum_none)) skb_checksum_none_assert(skb); else skb->ip_summed = CHECKSUM_UNNECESSARY; From 644b8437ccef9e7444164d8c4409933565928371 Mon Sep 17 00:00:00 2001 From: Oleksij Rempel Date: Mon, 18 Aug 2025 11:02:16 +0200 Subject: [PATCH 0172/1536] net: stmmac: dwmac4: report Rx checksum errors in status Propagate hardware checksum failures from the descriptor parser to the caller. Currently, dwmac4_wrback_get_rx_status() updates stats when the Rx descriptor signals an IP header or payload checksum error, but it does not reflect this in its return value. The higher-level stmmac_rx() code therefore cannot tell that hardware checksum validation failed. Set the csum_none flag in the returned status when either RDES1_IP_HDR_ERROR or RDES1_IP_PAYLOAD_ERROR is present. This aligns dwmac4 with enh_desc_coe_rdes0() and lets stmmac_rx() mark the skb as CHECKSUM_NONE for software verification. This is a preparatory step for disabling the hardware filter that drops frames which do not pass checksum validation. Signed-off-by: Oleksij Rempel Link: https://patch.msgid.link/20250818090217.2789521-3-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c index a5fb31eb0192..aac68dc28dc1 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c @@ -110,16 +110,20 @@ static int dwmac4_wrback_get_rx_status(struct stmmac_extra_stats *x, message_type = (rdes1 & ERDES4_MSG_TYPE_MASK) >> 8; - if (rdes1 & RDES1_IP_HDR_ERROR) + if (rdes1 & RDES1_IP_HDR_ERROR) { x->ip_hdr_err++; + ret |= csum_none; + } if (rdes1 & RDES1_IP_CSUM_BYPASSED) x->ip_csum_bypassed++; if (rdes1 & RDES1_IPV4_HEADER) x->ipv4_pkt_rcvd++; if (rdes1 & RDES1_IPV6_HEADER) x->ipv6_pkt_rcvd++; - if (rdes1 & RDES1_IP_PAYLOAD_ERROR) + if (rdes1 & RDES1_IP_PAYLOAD_ERROR) { x->ip_payload_err++; + ret |= csum_none; + } if (message_type == RDES_EXT_NO_PTP) x->no_ptp_rx_msg_type_ext++; From fe4042797651215ca9acd83035122b5e940acf2e Mon Sep 17 00:00:00 2001 From: Oleksij Rempel Date: Mon, 18 Aug 2025 11:02:17 +0200 Subject: [PATCH 0173/1536] net: stmmac: dwmac4: stop hardware from dropping checksum-error packets Tell the MAC not to discard frames that fail TCP/IP checksum validation. By default, when the hardware checksum engine (CoE) is enabled, dwmac4 silently drops any packet where the offload engine detects a checksum error. These frames are not reported to the driver and are not counted in any statistics as dropped packets. Set the MTL_OP_MODE_DIS_TCP_EF bit when initializing the Rx channel so that all packets are delivered, even if they failed hardware checksum validation. CoE remains enabled, but instead of dropping such frames, the driver propagates the error status and marks the skb with CHECKSUM_NONE. This allows the stack to verify and drop the packet while updating statistics. This change follows the decision made in the discussion: Link: https://lore.kernel.org/all/20250625132117.1b3264e8@kernel.org/ It depends on the previous patches that added proper error propagation in the Rx path. Signed-off-by: Oleksij Rempel Link: https://patch.msgid.link/20250818090217.2789521-4-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/stmicro/stmmac/dwmac4.h | 1 + drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c | 2 ++ 2 files changed, 3 insertions(+) diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h index f4694fd576f5..3dec1a264cf6 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h @@ -341,6 +341,7 @@ static inline u32 mtl_chanx_base_addr(const struct dwmac4_addrs *addrs, #define MTL_OP_MODE_RFA_SHIFT 8 #define MTL_OP_MODE_EHFC BIT(7) +#define MTL_OP_MODE_DIS_TCP_EF BIT(6) #define MTL_OP_MODE_RTC_MASK GENMASK(1, 0) #define MTL_OP_MODE_RTC_SHIFT 0 diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c index 0cb84a0041a4..d87a8b595e6a 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c @@ -268,6 +268,8 @@ static void dwmac4_dma_rx_chan_op_mode(struct stmmac_priv *priv, mtl_rx_op = readl(ioaddr + MTL_CHAN_RX_OP_MODE(dwmac4_addrs, channel)); + mtl_rx_op |= MTL_OP_MODE_DIS_TCP_EF; + if (mode == SF_DMA_MODE) { pr_debug("GMAC: enable RX store and forward mode\n"); mtl_rx_op |= MTL_OP_MODE_RSF; From 68889dfd547bd8eabc5a98b58475d7b901cf5129 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Fri, 15 Aug 2025 20:16:09 +0000 Subject: [PATCH 0174/1536] mptcp: Fix up subflow's memcg when CONFIG_SOCK_CGROUP_DATA=n. When sk_alloc() allocates a socket, mem_cgroup_sk_alloc() sets sk->sk_memcg based on the current task. MPTCP subflow socket creation is triggered from userspace or an in-kernel worker. In the latter case, sk->sk_memcg is not what we want. So, we fix it up from the parent socket's sk->sk_memcg in mptcp_attach_cgroup(). Although the code is placed under #ifdef CONFIG_MEMCG, it is buried under #ifdef CONFIG_SOCK_CGROUP_DATA. The two configs are orthogonal. If CONFIG_MEMCG is enabled without CONFIG_SOCK_CGROUP_DATA, the subflow's memory usage is not charged correctly. Let's move the code out of the wrong ifdef guard. Note that sk->sk_memcg is freed in sk_prot_free() and the parent sk holds the refcnt of memcg->css here, so we don't need to use css_tryget(). Fixes: 3764b0c5651e3 ("mptcp: attach subflow socket to parent cgroup") Signed-off-by: Kuniyuki Iwashima Reviewed-by: Eric Dumazet Acked-by: Matthieu Baerts (NGI0) Acked-by: Shakeel Butt Link: https://patch.msgid.link/20250815201712.1745332-2-kuniyu@google.com Signed-off-by: Jakub Kicinski --- include/linux/memcontrol.h | 6 ++++++ mm/memcontrol.c | 13 +++++++++++++ net/mptcp/subflow.c | 11 +++-------- 3 files changed, 22 insertions(+), 8 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 785173aa0739..25921fbec685 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -1604,6 +1604,7 @@ extern struct static_key_false memcg_sockets_enabled_key; #define mem_cgroup_sockets_enabled static_branch_unlikely(&memcg_sockets_enabled_key) void mem_cgroup_sk_alloc(struct sock *sk); void mem_cgroup_sk_free(struct sock *sk); +void mem_cgroup_sk_inherit(const struct sock *sk, struct sock *newsk); #if BITS_PER_LONG < 64 static inline void mem_cgroup_set_socket_pressure(struct mem_cgroup *memcg) @@ -1661,6 +1662,11 @@ void reparent_shrinker_deferred(struct mem_cgroup *memcg); #define mem_cgroup_sockets_enabled 0 static inline void mem_cgroup_sk_alloc(struct sock *sk) { }; static inline void mem_cgroup_sk_free(struct sock *sk) { }; + +static inline void mem_cgroup_sk_inherit(const struct sock *sk, struct sock *newsk) +{ +} + static inline bool mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg) { return false; diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 8dd7fbed5a94..46713b9ece06 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -5024,6 +5024,19 @@ void mem_cgroup_sk_free(struct sock *sk) css_put(&sk->sk_memcg->css); } +void mem_cgroup_sk_inherit(const struct sock *sk, struct sock *newsk) +{ + if (sk->sk_memcg == newsk->sk_memcg) + return; + + mem_cgroup_sk_free(newsk); + + if (sk->sk_memcg) + css_get(&sk->sk_memcg->css); + + newsk->sk_memcg = sk->sk_memcg; +} + /** * mem_cgroup_charge_skmem - charge socket memory * @memcg: memcg to charge diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index 3f1b62a9fe88..c8a7e4b59db1 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -1717,19 +1717,14 @@ static void mptcp_attach_cgroup(struct sock *parent, struct sock *child) /* only the additional subflows created by kworkers have to be modified */ if (cgroup_id(sock_cgroup_ptr(parent_skcd)) != cgroup_id(sock_cgroup_ptr(child_skcd))) { -#ifdef CONFIG_MEMCG - struct mem_cgroup *memcg = parent->sk_memcg; - - mem_cgroup_sk_free(child); - if (memcg && css_tryget(&memcg->css)) - child->sk_memcg = memcg; -#endif /* CONFIG_MEMCG */ - cgroup_sk_free(child_skcd); *child_skcd = *parent_skcd; cgroup_sk_clone(child_skcd); } #endif /* CONFIG_SOCK_CGROUP_DATA */ + + if (mem_cgroup_sockets_enabled) + mem_cgroup_sk_inherit(parent, child); } static void mptcp_subflow_ops_override(struct sock *ssk) From 1068b48ed10805b61be0668cd774af97163479a7 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Fri, 15 Aug 2025 20:16:10 +0000 Subject: [PATCH 0175/1536] mptcp: Use tcp_under_memory_pressure() in mptcp_epollin_ready(). Some conditions used in mptcp_epollin_ready() are the same as tcp_under_memory_pressure(). We will modify tcp_under_memory_pressure() in the later patch. Let's use tcp_under_memory_pressure() instead. Signed-off-by: Kuniyuki Iwashima Reviewed-by: Eric Dumazet Reviewed-by: Matthieu Baerts (NGI0) Reviewed-by: Shakeel Butt Link: https://patch.msgid.link/20250815201712.1745332-3-kuniyu@google.com Signed-off-by: Jakub Kicinski --- net/mptcp/protocol.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index b15d7fab5c4b..a1787a1344ac 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -788,9 +788,7 @@ static inline bool mptcp_epollin_ready(const struct sock *sk) * as it can always coalesce them */ return (data_avail >= sk->sk_rcvlowat) || - (mem_cgroup_sockets_enabled && sk->sk_memcg && - mem_cgroup_under_socket_pressure(sk->sk_memcg)) || - READ_ONCE(tcp_memory_pressure); + tcp_under_memory_pressure(sk); } int mptcp_set_rcvlowat(struct sock *sk, int val); From e2afa83296bbac40829624b508492b562a21e4d4 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Fri, 15 Aug 2025 20:16:11 +0000 Subject: [PATCH 0176/1536] tcp: Simplify error path in inet_csk_accept(). When an error occurs in inet_csk_accept(), what we should do is only call release_sock() and set the errno to arg->err. But the path jumps to another label, which introduces unnecessary initialisation and tests for newsk. Let's simplify the error path and remove the redundant NULL checks for newsk. Signed-off-by: Kuniyuki Iwashima Reviewed-by: Eric Dumazet Reviewed-by: Shakeel Butt Link: https://patch.msgid.link/20250815201712.1745332-4-kuniyu@google.com Signed-off-by: Jakub Kicinski --- net/ipv4/inet_connection_sock.c | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index 1e2df51427fe..724bd9ed6cd4 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -706,9 +706,9 @@ struct sock *inet_csk_accept(struct sock *sk, struct proto_accept_arg *arg) spin_unlock_bh(&queue->fastopenq.lock); } -out: release_sock(sk); - if (newsk && mem_cgroup_sockets_enabled) { + + if (mem_cgroup_sockets_enabled) { gfp_t gfp = GFP_KERNEL | __GFP_NOFAIL; int amt = 0; @@ -732,18 +732,17 @@ out: release_sock(newsk); } + if (req) reqsk_put(req); - if (newsk) - inet_init_csk_locks(newsk); - + inet_init_csk_locks(newsk); return newsk; + out_err: - newsk = NULL; - req = NULL; + release_sock(sk); arg->err = error; - goto out; + return NULL; } EXPORT_SYMBOL(inet_csk_accept); From 9d85c565a7b7c78b732393c02bcaa4d5c275fe58 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Fri, 15 Aug 2025 20:16:12 +0000 Subject: [PATCH 0177/1536] net: Call trace_sock_exceed_buf_limit() for memcg failure with SK_MEM_RECV. Initially, trace_sock_exceed_buf_limit() was invoked when __sk_mem_raise_allocated() failed due to the memcg limit or the global limit. However, commit d6f19938eb031 ("net: expose sk wmem in sock_exceed_buf_limit tracepoint") somehow suppressed the event only when memcg failed to charge for SK_MEM_RECV, although the memcg failure for SK_MEM_SEND still triggers the event. Let's restore the event for SK_MEM_RECV. Signed-off-by: Kuniyuki Iwashima Reviewed-by: Eric Dumazet Reviewed-by: Shakeel Butt Link: https://patch.msgid.link/20250815201712.1745332-5-kuniyu@google.com Signed-off-by: Jakub Kicinski --- net/core/sock.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/net/core/sock.c b/net/core/sock.c index 7c26ec8dce63..380bc1aa6982 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -3354,8 +3354,7 @@ suppress_allocation: } } - if (kind == SK_MEM_SEND || (kind == SK_MEM_RECV && charged)) - trace_sock_exceed_buf_limit(sk, prot, allocated, kind); + trace_sock_exceed_buf_limit(sk, prot, allocated, kind); sk_memory_allocated_sub(sk, amt); From bd4aa2337374dd04b8627efd26227ebd49f69285 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Fri, 15 Aug 2025 20:16:13 +0000 Subject: [PATCH 0178/1536] net: Clean up __sk_mem_raise_allocated(). In __sk_mem_raise_allocated(), charged is initialised as true due to the weird condition removed in the previous patch. It makes the variable unreliable by itself, so we have to check another variable, memcg, in advance. Also, we will factorise the common check below for memcg later. if (mem_cgroup_sockets_enabled && sk->sk_memcg) As a prep, let's initialise charged as false and memcg as NULL. Signed-off-by: Kuniyuki Iwashima Reviewed-by: Eric Dumazet Reviewed-by: Shakeel Butt Link: https://patch.msgid.link/20250815201712.1745332-6-kuniyu@google.com Signed-off-by: Jakub Kicinski --- net/core/sock.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/net/core/sock.c b/net/core/sock.c index 380bc1aa6982..000940ecf360 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -3263,15 +3263,16 @@ EXPORT_SYMBOL(sk_wait_data); */ int __sk_mem_raise_allocated(struct sock *sk, int size, int amt, int kind) { - struct mem_cgroup *memcg = mem_cgroup_sockets_enabled ? sk->sk_memcg : NULL; struct proto *prot = sk->sk_prot; - bool charged = true; + struct mem_cgroup *memcg = NULL; + bool charged = false; long allocated; sk_memory_allocated_add(sk, amt); allocated = sk_memory_allocated(sk); - if (memcg) { + if (mem_cgroup_sockets_enabled && sk->sk_memcg) { + memcg = sk->sk_memcg; charged = mem_cgroup_charge_skmem(memcg, amt, gfp_memcg_charge()); if (!charged) goto suppress_allocation; @@ -3358,7 +3359,7 @@ suppress_allocation: sk_memory_allocated_sub(sk, amt); - if (memcg && charged) + if (charged) mem_cgroup_uncharge_skmem(memcg, amt); return 0; From f7161b234f2ec7f18999009c4becc04eeb6b12a7 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Fri, 15 Aug 2025 20:16:14 +0000 Subject: [PATCH 0179/1536] net-memcg: Introduce mem_cgroup_from_sk(). We will store a flag in the lowest bit of sk->sk_memcg. Then, directly dereferencing sk->sk_memcg will be illegal, and we do not want to allow touching the raw sk->sk_memcg in many places. Let's introduce mem_cgroup_from_sk(). Other places accessing the raw sk->sk_memcg will be converted later. Note that we cannot define the helper as an inline function in memcontrol.h as we cannot access any fields of struct sock there due to circular dependency, so it is placed in sock.h. Signed-off-by: Kuniyuki Iwashima Reviewed-by: Eric Dumazet Acked-by: Roman Gushchin Acked-by: Shakeel Butt Link: https://patch.msgid.link/20250815201712.1745332-7-kuniyu@google.com Signed-off-by: Jakub Kicinski --- include/net/sock.h | 12 ++++++++++++ mm/memcontrol.c | 13 +++++++++---- net/ipv4/inet_connection_sock.c | 2 +- 3 files changed, 22 insertions(+), 5 deletions(-) diff --git a/include/net/sock.h b/include/net/sock.h index c8a4b283df6f..811f95ea8d00 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -2594,6 +2594,18 @@ static inline gfp_t gfp_memcg_charge(void) return in_softirq() ? GFP_ATOMIC : GFP_KERNEL; } +#ifdef CONFIG_MEMCG +static inline struct mem_cgroup *mem_cgroup_from_sk(const struct sock *sk) +{ + return sk->sk_memcg; +} +#else +static inline struct mem_cgroup *mem_cgroup_from_sk(const struct sock *sk) +{ + return NULL; +} +#endif + static inline long sock_rcvtimeo(const struct sock *sk, bool noblock) { return noblock ? 0 : READ_ONCE(sk->sk_rcvtimeo); diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 46713b9ece06..d8a52d1d08fa 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -5020,19 +5020,24 @@ out: void mem_cgroup_sk_free(struct sock *sk) { - if (sk->sk_memcg) - css_put(&sk->sk_memcg->css); + struct mem_cgroup *memcg = mem_cgroup_from_sk(sk); + + if (memcg) + css_put(&memcg->css); } void mem_cgroup_sk_inherit(const struct sock *sk, struct sock *newsk) { + struct mem_cgroup *memcg; + if (sk->sk_memcg == newsk->sk_memcg) return; mem_cgroup_sk_free(newsk); - if (sk->sk_memcg) - css_get(&sk->sk_memcg->css); + memcg = mem_cgroup_from_sk(sk); + if (memcg) + css_get(&memcg->css); newsk->sk_memcg = sk->sk_memcg; } diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index 724bd9ed6cd4..93569bbe00f4 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -718,7 +718,7 @@ struct sock *inet_csk_accept(struct sock *sk, struct proto_accept_arg *arg) lock_sock(newsk); mem_cgroup_sk_alloc(newsk); - if (newsk->sk_memcg) { + if (mem_cgroup_from_sk(newsk)) { /* The socket has not been accepted yet, no need * to look at newsk->sk_wmem_queued. */ From 43049b0db03823c2cd003ca7d3dddcd3924da8dc Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Fri, 15 Aug 2025 20:16:15 +0000 Subject: [PATCH 0180/1536] net-memcg: Introduce mem_cgroup_sk_enabled(). The socket memcg feature is enabled by a static key and only works for non-root cgroup. We check both conditions in many places. Let's factorise it as a helper function. Signed-off-by: Kuniyuki Iwashima Reviewed-by: Eric Dumazet Acked-by: Roman Gushchin Acked-by: Shakeel Butt Link: https://patch.msgid.link/20250815201712.1745332-8-kuniyu@google.com Signed-off-by: Jakub Kicinski --- include/net/proto_memory.h | 2 +- include/net/sock.h | 10 ++++++++++ include/net/tcp.h | 2 +- net/core/sock.c | 6 +++--- net/ipv4/tcp_output.c | 2 +- 5 files changed, 16 insertions(+), 6 deletions(-) diff --git a/include/net/proto_memory.h b/include/net/proto_memory.h index a6ab2f4f5e28..859e63de81c4 100644 --- a/include/net/proto_memory.h +++ b/include/net/proto_memory.h @@ -31,7 +31,7 @@ static inline bool sk_under_memory_pressure(const struct sock *sk) if (!sk->sk_prot->memory_pressure) return false; - if (mem_cgroup_sockets_enabled && sk->sk_memcg && + if (mem_cgroup_sk_enabled(sk) && mem_cgroup_under_socket_pressure(sk->sk_memcg)) return true; diff --git a/include/net/sock.h b/include/net/sock.h index 811f95ea8d00..3efdf680401d 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -2599,11 +2599,21 @@ static inline struct mem_cgroup *mem_cgroup_from_sk(const struct sock *sk) { return sk->sk_memcg; } + +static inline bool mem_cgroup_sk_enabled(const struct sock *sk) +{ + return mem_cgroup_sockets_enabled && mem_cgroup_from_sk(sk); +} #else static inline struct mem_cgroup *mem_cgroup_from_sk(const struct sock *sk) { return NULL; } + +static inline bool mem_cgroup_sk_enabled(const struct sock *sk) +{ + return false; +} #endif static inline long sock_rcvtimeo(const struct sock *sk, bool noblock) diff --git a/include/net/tcp.h b/include/net/tcp.h index 526a26e7a150..9f01b6be6444 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -275,7 +275,7 @@ extern unsigned long tcp_memory_pressure; /* optimized version of sk_under_memory_pressure() for TCP sockets */ static inline bool tcp_under_memory_pressure(const struct sock *sk) { - if (mem_cgroup_sockets_enabled && sk->sk_memcg && + if (mem_cgroup_sk_enabled(sk) && mem_cgroup_under_socket_pressure(sk->sk_memcg)) return true; diff --git a/net/core/sock.c b/net/core/sock.c index 000940ecf360..ab658fe23e1e 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1032,7 +1032,7 @@ static int sock_reserve_memory(struct sock *sk, int bytes) bool charged; int pages; - if (!mem_cgroup_sockets_enabled || !sk->sk_memcg || !sk_has_account(sk)) + if (!mem_cgroup_sk_enabled(sk) || !sk_has_account(sk)) return -EOPNOTSUPP; if (!bytes) @@ -3271,7 +3271,7 @@ int __sk_mem_raise_allocated(struct sock *sk, int size, int amt, int kind) sk_memory_allocated_add(sk, amt); allocated = sk_memory_allocated(sk); - if (mem_cgroup_sockets_enabled && sk->sk_memcg) { + if (mem_cgroup_sk_enabled(sk)) { memcg = sk->sk_memcg; charged = mem_cgroup_charge_skmem(memcg, amt, gfp_memcg_charge()); if (!charged) @@ -3398,7 +3398,7 @@ void __sk_mem_reduce_allocated(struct sock *sk, int amount) { sk_memory_allocated_sub(sk, amount); - if (mem_cgroup_sockets_enabled && sk->sk_memcg) + if (mem_cgroup_sk_enabled(sk)) mem_cgroup_uncharge_skmem(sk->sk_memcg, amount); if (sk_under_global_memory_pressure(sk) && diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index caf11920a878..37fb320e6f70 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -3578,7 +3578,7 @@ void sk_forced_mem_schedule(struct sock *sk, int size) sk_forward_alloc_add(sk, amt << PAGE_SHIFT); sk_memory_allocated_add(sk, amt); - if (mem_cgroup_sockets_enabled && sk->sk_memcg) + if (mem_cgroup_sk_enabled(sk)) mem_cgroup_charge_skmem(sk->sk_memcg, amt, gfp_memcg_charge() | __GFP_NOFAIL); } From bb178c6bc08525d758a57775458d644304011bf8 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Fri, 15 Aug 2025 20:16:16 +0000 Subject: [PATCH 0181/1536] net-memcg: Pass struct sock to mem_cgroup_sk_(un)?charge(). We will store a flag in the lowest bit of sk->sk_memcg. Then, we cannot pass the raw pointer to mem_cgroup_charge_skmem() and mem_cgroup_uncharge_skmem(). Let's pass struct sock to the functions. While at it, they are renamed to match other functions starting with mem_cgroup_sk_. Signed-off-by: Kuniyuki Iwashima Reviewed-by: Eric Dumazet Acked-by: Roman Gushchin Acked-by: Shakeel Butt Link: https://patch.msgid.link/20250815201712.1745332-9-kuniyu@google.com Signed-off-by: Jakub Kicinski --- include/linux/memcontrol.h | 29 ++++++++++++++++++++++++----- mm/memcontrol.c | 18 +++++++++++------- net/core/sock.c | 24 +++++++++++------------- net/ipv4/inet_connection_sock.c | 2 +- net/ipv4/tcp_output.c | 3 +-- 5 files changed, 48 insertions(+), 28 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 25921fbec685..0837d3de3a68 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -1596,15 +1596,16 @@ static inline void mem_cgroup_flush_foreign(struct bdi_writeback *wb) #endif /* CONFIG_CGROUP_WRITEBACK */ struct sock; -bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages, - gfp_t gfp_mask); -void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages); #ifdef CONFIG_MEMCG extern struct static_key_false memcg_sockets_enabled_key; #define mem_cgroup_sockets_enabled static_branch_unlikely(&memcg_sockets_enabled_key) + void mem_cgroup_sk_alloc(struct sock *sk); void mem_cgroup_sk_free(struct sock *sk); void mem_cgroup_sk_inherit(const struct sock *sk, struct sock *newsk); +bool mem_cgroup_sk_charge(const struct sock *sk, unsigned int nr_pages, + gfp_t gfp_mask); +void mem_cgroup_sk_uncharge(const struct sock *sk, unsigned int nr_pages); #if BITS_PER_LONG < 64 static inline void mem_cgroup_set_socket_pressure(struct mem_cgroup *memcg) @@ -1660,13 +1661,31 @@ void set_shrinker_bit(struct mem_cgroup *memcg, int nid, int shrinker_id); void reparent_shrinker_deferred(struct mem_cgroup *memcg); #else #define mem_cgroup_sockets_enabled 0 -static inline void mem_cgroup_sk_alloc(struct sock *sk) { }; -static inline void mem_cgroup_sk_free(struct sock *sk) { }; + +static inline void mem_cgroup_sk_alloc(struct sock *sk) +{ +} + +static inline void mem_cgroup_sk_free(struct sock *sk) +{ +} static inline void mem_cgroup_sk_inherit(const struct sock *sk, struct sock *newsk) { } +static inline bool mem_cgroup_sk_charge(const struct sock *sk, + unsigned int nr_pages, + gfp_t gfp_mask) +{ + return false; +} + +static inline void mem_cgroup_sk_uncharge(const struct sock *sk, + unsigned int nr_pages) +{ +} + static inline bool mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg) { return false; diff --git a/mm/memcontrol.c b/mm/memcontrol.c index d8a52d1d08fa..df3e9205c9e6 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -5043,17 +5043,19 @@ void mem_cgroup_sk_inherit(const struct sock *sk, struct sock *newsk) } /** - * mem_cgroup_charge_skmem - charge socket memory - * @memcg: memcg to charge + * mem_cgroup_sk_charge - charge socket memory + * @sk: socket in memcg to charge * @nr_pages: number of pages to charge * @gfp_mask: reclaim mode * * Charges @nr_pages to @memcg. Returns %true if the charge fit within * @memcg's configured limit, %false if it doesn't. */ -bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages, - gfp_t gfp_mask) +bool mem_cgroup_sk_charge(const struct sock *sk, unsigned int nr_pages, + gfp_t gfp_mask) { + struct mem_cgroup *memcg = mem_cgroup_from_sk(sk); + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) return memcg1_charge_skmem(memcg, nr_pages, gfp_mask); @@ -5066,12 +5068,14 @@ bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages, } /** - * mem_cgroup_uncharge_skmem - uncharge socket memory - * @memcg: memcg to uncharge + * mem_cgroup_sk_uncharge - uncharge socket memory + * @sk: socket in memcg to uncharge * @nr_pages: number of pages to uncharge */ -void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages) +void mem_cgroup_sk_uncharge(const struct sock *sk, unsigned int nr_pages) { + struct mem_cgroup *memcg = mem_cgroup_from_sk(sk); + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) { memcg1_uncharge_skmem(memcg, nr_pages); return; diff --git a/net/core/sock.c b/net/core/sock.c index ab658fe23e1e..5537ca263858 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1041,8 +1041,8 @@ static int sock_reserve_memory(struct sock *sk, int bytes) pages = sk_mem_pages(bytes); /* pre-charge to memcg */ - charged = mem_cgroup_charge_skmem(sk->sk_memcg, pages, - GFP_KERNEL | __GFP_RETRY_MAYFAIL); + charged = mem_cgroup_sk_charge(sk, pages, + GFP_KERNEL | __GFP_RETRY_MAYFAIL); if (!charged) return -ENOMEM; @@ -1054,7 +1054,7 @@ static int sock_reserve_memory(struct sock *sk, int bytes) */ if (allocated > sk_prot_mem_limits(sk, 1)) { sk_memory_allocated_sub(sk, pages); - mem_cgroup_uncharge_skmem(sk->sk_memcg, pages); + mem_cgroup_sk_uncharge(sk, pages); return -ENOMEM; } sk_forward_alloc_add(sk, pages << PAGE_SHIFT); @@ -3263,17 +3263,16 @@ EXPORT_SYMBOL(sk_wait_data); */ int __sk_mem_raise_allocated(struct sock *sk, int size, int amt, int kind) { + bool memcg_enabled = false, charged = false; struct proto *prot = sk->sk_prot; - struct mem_cgroup *memcg = NULL; - bool charged = false; long allocated; sk_memory_allocated_add(sk, amt); allocated = sk_memory_allocated(sk); if (mem_cgroup_sk_enabled(sk)) { - memcg = sk->sk_memcg; - charged = mem_cgroup_charge_skmem(memcg, amt, gfp_memcg_charge()); + memcg_enabled = true; + charged = mem_cgroup_sk_charge(sk, amt, gfp_memcg_charge()); if (!charged) goto suppress_allocation; } @@ -3347,10 +3346,9 @@ suppress_allocation: */ if (sk->sk_wmem_queued + size >= sk->sk_sndbuf) { /* Force charge with __GFP_NOFAIL */ - if (memcg && !charged) { - mem_cgroup_charge_skmem(memcg, amt, - gfp_memcg_charge() | __GFP_NOFAIL); - } + if (memcg_enabled && !charged) + mem_cgroup_sk_charge(sk, amt, + gfp_memcg_charge() | __GFP_NOFAIL); return 1; } } @@ -3360,7 +3358,7 @@ suppress_allocation: sk_memory_allocated_sub(sk, amt); if (charged) - mem_cgroup_uncharge_skmem(memcg, amt); + mem_cgroup_sk_uncharge(sk, amt); return 0; } @@ -3399,7 +3397,7 @@ void __sk_mem_reduce_allocated(struct sock *sk, int amount) sk_memory_allocated_sub(sk, amount); if (mem_cgroup_sk_enabled(sk)) - mem_cgroup_uncharge_skmem(sk->sk_memcg, amount); + mem_cgroup_sk_uncharge(sk, amount); if (sk_under_global_memory_pressure(sk) && (sk_memory_allocated(sk) < sk_prot_mem_limits(sk, 0))) diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index 93569bbe00f4..0ef1eacd539d 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -727,7 +727,7 @@ struct sock *inet_csk_accept(struct sock *sk, struct proto_accept_arg *arg) } if (amt) - mem_cgroup_charge_skmem(newsk->sk_memcg, amt, gfp); + mem_cgroup_sk_charge(newsk, amt, gfp); kmem_cache_charge(newsk, gfp); release_sock(newsk); diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 37fb320e6f70..dfbac0876d96 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -3579,8 +3579,7 @@ void sk_forced_mem_schedule(struct sock *sk, int size) sk_memory_allocated_add(sk, amt); if (mem_cgroup_sk_enabled(sk)) - mem_cgroup_charge_skmem(sk->sk_memcg, amt, - gfp_memcg_charge() | __GFP_NOFAIL); + mem_cgroup_sk_charge(sk, amt, gfp_memcg_charge() | __GFP_NOFAIL); } /* Send a FIN. The caller locks the socket for us. From b2ffd10cddde47cc6830e4981e91e3215def62b1 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Fri, 15 Aug 2025 20:16:17 +0000 Subject: [PATCH 0182/1536] net-memcg: Pass struct sock to mem_cgroup_sk_under_memory_pressure(). We will store a flag in the lowest bit of sk->sk_memcg. Then, we cannot pass the raw pointer to mem_cgroup_under_socket_pressure(). Let's pass struct sock to it and rename the function to match other functions starting with mem_cgroup_sk_. Note that the helper is moved to sock.h to use mem_cgroup_from_sk(). Signed-off-by: Kuniyuki Iwashima Reviewed-by: Eric Dumazet Acked-by: Roman Gushchin Acked-by: Shakeel Butt Link: https://patch.msgid.link/20250815201712.1745332-10-kuniyu@google.com Signed-off-by: Jakub Kicinski --- include/linux/memcontrol.h | 18 ------------------ include/net/proto_memory.h | 2 +- include/net/sock.h | 22 ++++++++++++++++++++++ include/net/tcp.h | 2 +- 4 files changed, 24 insertions(+), 20 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 0837d3de3a68..fb27e3d2fdac 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -1642,19 +1642,6 @@ static inline u64 mem_cgroup_get_socket_pressure(struct mem_cgroup *memcg) } #endif -static inline bool mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg) -{ -#ifdef CONFIG_MEMCG_V1 - if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) - return !!memcg->tcpmem_pressure; -#endif /* CONFIG_MEMCG_V1 */ - do { - if (time_before64(get_jiffies_64(), mem_cgroup_get_socket_pressure(memcg))) - return true; - } while ((memcg = parent_mem_cgroup(memcg))); - return false; -} - int alloc_shrinker_info(struct mem_cgroup *memcg); void free_shrinker_info(struct mem_cgroup *memcg); void set_shrinker_bit(struct mem_cgroup *memcg, int nid, int shrinker_id); @@ -1686,11 +1673,6 @@ static inline void mem_cgroup_sk_uncharge(const struct sock *sk, { } -static inline bool mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg) -{ - return false; -} - static inline void set_shrinker_bit(struct mem_cgroup *memcg, int nid, int shrinker_id) { diff --git a/include/net/proto_memory.h b/include/net/proto_memory.h index 859e63de81c4..8e91a8fa31b5 100644 --- a/include/net/proto_memory.h +++ b/include/net/proto_memory.h @@ -32,7 +32,7 @@ static inline bool sk_under_memory_pressure(const struct sock *sk) return false; if (mem_cgroup_sk_enabled(sk) && - mem_cgroup_under_socket_pressure(sk->sk_memcg)) + mem_cgroup_sk_under_memory_pressure(sk)) return true; return !!READ_ONCE(*sk->sk_prot->memory_pressure); diff --git a/include/net/sock.h b/include/net/sock.h index 3efdf680401d..3bc4d566f7d0 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -2604,6 +2604,23 @@ static inline bool mem_cgroup_sk_enabled(const struct sock *sk) { return mem_cgroup_sockets_enabled && mem_cgroup_from_sk(sk); } + +static inline bool mem_cgroup_sk_under_memory_pressure(const struct sock *sk) +{ + struct mem_cgroup *memcg = mem_cgroup_from_sk(sk); + +#ifdef CONFIG_MEMCG_V1 + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) + return !!memcg->tcpmem_pressure; +#endif /* CONFIG_MEMCG_V1 */ + + do { + if (time_before64(get_jiffies_64(), mem_cgroup_get_socket_pressure(memcg))) + return true; + } while ((memcg = parent_mem_cgroup(memcg))); + + return false; +} #else static inline struct mem_cgroup *mem_cgroup_from_sk(const struct sock *sk) { @@ -2614,6 +2631,11 @@ static inline bool mem_cgroup_sk_enabled(const struct sock *sk) { return false; } + +static inline bool mem_cgroup_sk_under_memory_pressure(const struct sock *sk) +{ + return false; +} #endif static inline long sock_rcvtimeo(const struct sock *sk, bool noblock) diff --git a/include/net/tcp.h b/include/net/tcp.h index 9f01b6be6444..2936b8175950 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -276,7 +276,7 @@ extern unsigned long tcp_memory_pressure; static inline bool tcp_under_memory_pressure(const struct sock *sk) { if (mem_cgroup_sk_enabled(sk) && - mem_cgroup_under_socket_pressure(sk->sk_memcg)) + mem_cgroup_sk_under_memory_pressure(sk)) return true; return READ_ONCE(tcp_memory_pressure); From bf64002c94fc330b996bc438f3d1b6bd3d781659 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Fri, 15 Aug 2025 20:16:18 +0000 Subject: [PATCH 0183/1536] net: Define sk_memcg under CONFIG_MEMCG. Except for sk_clone_lock(), all accesses to sk->sk_memcg is done under CONFIG_MEMCG. As a bonus, let's define sk->sk_memcg under CONFIG_MEMCG. Signed-off-by: Kuniyuki Iwashima Reviewed-by: Eric Dumazet Acked-by: Roman Gushchin Acked-by: Shakeel Butt Link: https://patch.msgid.link/20250815201712.1745332-11-kuniyu@google.com Signed-off-by: Jakub Kicinski --- include/net/sock.h | 2 ++ net/core/sock.c | 4 ++++ 2 files changed, 6 insertions(+) diff --git a/include/net/sock.h b/include/net/sock.h index 3bc4d566f7d0..1c49ea13af4a 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -443,7 +443,9 @@ struct sock { __cacheline_group_begin(sock_read_rxtx); int sk_err; struct socket *sk_socket; +#ifdef CONFIG_MEMCG struct mem_cgroup *sk_memcg; +#endif #ifdef CONFIG_XFRM struct xfrm_policy __rcu *sk_policy[2]; #endif diff --git a/net/core/sock.c b/net/core/sock.c index 5537ca263858..ab6953d295df 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -2512,8 +2512,10 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority) sock_reset_flag(newsk, SOCK_DONE); +#ifdef CONFIG_MEMCG /* sk->sk_memcg will be populated at accept() time */ newsk->sk_memcg = NULL; +#endif cgroup_sk_clone(&newsk->sk_cgrp_data); @@ -4452,7 +4454,9 @@ static int __init sock_struct_check(void) CACHELINE_ASSERT_GROUP_MEMBER(struct sock, sock_read_rxtx, sk_err); CACHELINE_ASSERT_GROUP_MEMBER(struct sock, sock_read_rxtx, sk_socket); +#ifdef CONFIG_MEMCG CACHELINE_ASSERT_GROUP_MEMBER(struct sock, sock_read_rxtx, sk_memcg); +#endif CACHELINE_ASSERT_GROUP_MEMBER(struct sock, sock_write_rxtx, sk_lock); CACHELINE_ASSERT_GROUP_MEMBER(struct sock, sock_write_rxtx, sk_reserved_mem); From 490a9591b5feb40b91052bbb0e6bc038ed8490ff Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Mon, 18 Aug 2025 13:54:22 -0700 Subject: [PATCH 0184/1536] selftests: net: Explicitly enable CONFIG_CRYPTO_SHA1 for IPsec xfrm_policy.sh, nft_flowtable.sh, and vrf-xfrm-tests.sh use 'ip xfrm' with SHA-1, either 'auth sha1' or 'auth-trunc hmac(sha1)'. That requires CONFIG_CRYPTO_SHA1, which CONFIG_INET_ESP intentionally doesn't select (as per its help text). Previously, the config for these tests relied on CONFIG_CRYPTO_SHA1 being selected by the unrelated option CONFIG_IP_SCTP. Since CONFIG_IP_SCTP is being changed to no longer do that, instead add CONFIG_CRYPTO_SHA1 to the configs explicitly. Reported-by: Paolo Abeni Closes: https://lore.kernel.org/r/766e4508-aaba-4cdc-92b4-e116e52ae13b@redhat.com Suggested-by: Florian Westphal Acked-by: Xin Long Signed-off-by: Eric Biggers Link: https://patch.msgid.link/20250818205426.30222-2-ebiggers@kernel.org Signed-off-by: Jakub Kicinski --- tools/testing/selftests/net/config | 1 + tools/testing/selftests/net/netfilter/config | 1 + 2 files changed, 2 insertions(+) diff --git a/tools/testing/selftests/net/config b/tools/testing/selftests/net/config index c24417d0047b..d548611e2698 100644 --- a/tools/testing/selftests/net/config +++ b/tools/testing/selftests/net/config @@ -26,6 +26,7 @@ CONFIG_IFB=y CONFIG_INET_DIAG=y CONFIG_INET_ESP=y CONFIG_INET_ESP_OFFLOAD=y +CONFIG_CRYPTO_SHA1=y CONFIG_NET_FOU=y CONFIG_NET_FOU_IP_TUNNELS=y CONFIG_NETFILTER=y diff --git a/tools/testing/selftests/net/netfilter/config b/tools/testing/selftests/net/netfilter/config index 79d5b33966ba..305e46b819cb 100644 --- a/tools/testing/selftests/net/netfilter/config +++ b/tools/testing/selftests/net/netfilter/config @@ -13,6 +13,7 @@ CONFIG_BRIDGE_VLAN_FILTERING=y CONFIG_CGROUP_BPF=y CONFIG_DUMMY=m CONFIG_INET_ESP=m +CONFIG_CRYPTO_SHA1=m CONFIG_IP_NF_MATCH_RPFILTER=m CONFIG_IP6_NF_MATCH_RPFILTER=m CONFIG_IP_NF_IPTABLES=m From dd91c79e4f58fbe2898dac84858033700e0e99fb Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Mon, 18 Aug 2025 13:54:23 -0700 Subject: [PATCH 0185/1536] sctp: Fix MAC comparison to be constant-time To prevent timing attacks, MACs need to be compared in constant time. Use the appropriate helper function for this. Fixes: bbd0d59809f9 ("[SCTP]: Implement the receive and verification of AUTH chunk") Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers Link: https://patch.msgid.link/20250818205426.30222-3-ebiggers@kernel.org Signed-off-by: Jakub Kicinski --- net/sctp/sm_make_chunk.c | 3 ++- net/sctp/sm_statefuns.c | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c index 3ead591c72fd..d099b605e44a 100644 --- a/net/sctp/sm_make_chunk.c +++ b/net/sctp/sm_make_chunk.c @@ -31,6 +31,7 @@ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include +#include #include #include #include @@ -1788,7 +1789,7 @@ struct sctp_association *sctp_unpack_cookie( } } - if (memcmp(digest, cookie->signature, SCTP_SIGNATURE_SIZE)) { + if (crypto_memneq(digest, cookie->signature, SCTP_SIGNATURE_SIZE)) { *error = -SCTP_IERROR_BAD_SIG; goto fail; } diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c index a0524ba8d787..d4d5b14b49b3 100644 --- a/net/sctp/sm_statefuns.c +++ b/net/sctp/sm_statefuns.c @@ -30,6 +30,7 @@ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt +#include #include #include #include @@ -4416,7 +4417,7 @@ static enum sctp_ierror sctp_sf_authenticate( sh_key, GFP_ATOMIC); /* Discard the packet if the digests do not match */ - if (memcmp(save_digest, digest, sig_len)) { + if (crypto_memneq(save_digest, digest, sig_len)) { kfree(save_digest); return SCTP_IERROR_BAD_SIG; } From bf40785fa437c1752117df2edb3220e9c37d98a6 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Mon, 18 Aug 2025 13:54:24 -0700 Subject: [PATCH 0186/1536] sctp: Use HMAC-SHA1 and HMAC-SHA256 library for chunk authentication For SCTP chunk authentication, use the HMAC-SHA1 and HMAC-SHA256 library functions instead of crypto_shash. This is simpler and faster. There's no longer any need to pre-allocate 'crypto_shash' objects; the SCTP code now simply calls into the HMAC code directly. As part of this, make SCTP always support both HMAC-SHA1 and HMAC-SHA256. Previously, it only guaranteed support for HMAC-SHA1. However, HMAC-SHA256 tended to be supported too anyway, as it was supported if CONFIG_CRYPTO_SHA256 was enabled elsewhere in the kconfig. Acked-by: Xin Long Signed-off-by: Eric Biggers Link: https://patch.msgid.link/20250818205426.30222-4-ebiggers@kernel.org Signed-off-by: Jakub Kicinski --- include/net/sctp/auth.h | 17 ++-- include/net/sctp/constants.h | 4 - include/net/sctp/structs.h | 5 -- net/sctp/Kconfig | 16 ++-- net/sctp/auth.c | 166 ++++++----------------------------- net/sctp/chunk.c | 3 +- net/sctp/sm_make_chunk.c | 2 +- net/sctp/sm_statefuns.c | 2 +- net/sctp/socket.c | 10 --- 9 files changed, 48 insertions(+), 177 deletions(-) diff --git a/include/net/sctp/auth.h b/include/net/sctp/auth.h index d4b3b2dcd15b..3d5879e08e78 100644 --- a/include/net/sctp/auth.h +++ b/include/net/sctp/auth.h @@ -22,16 +22,11 @@ struct sctp_endpoint; struct sctp_association; struct sctp_authkey; struct sctp_hmacalgo; -struct crypto_shash; -/* - * Define a generic struct that will hold all the info - * necessary for an HMAC transform - */ +/* Defines an HMAC algorithm supported by SCTP chunk authentication */ struct sctp_hmac { - __u16 hmac_id; /* one of the above ids */ - char *hmac_name; /* name for loading */ - __u16 hmac_len; /* length of the signature */ + __u16 hmac_id; /* one of SCTP_AUTH_HMAC_ID_* */ + __u16 hmac_len; /* length of the HMAC value in bytes */ }; /* This is generic structure that containst authentication bytes used @@ -78,9 +73,9 @@ int sctp_auth_asoc_copy_shkeys(const struct sctp_endpoint *ep, struct sctp_association *asoc, gfp_t gfp); int sctp_auth_init_hmacs(struct sctp_endpoint *ep, gfp_t gfp); -void sctp_auth_destroy_hmacs(struct crypto_shash *auth_hmacs[]); -struct sctp_hmac *sctp_auth_get_hmac(__u16 hmac_id); -struct sctp_hmac *sctp_auth_asoc_get_hmac(const struct sctp_association *asoc); +const struct sctp_hmac *sctp_auth_get_hmac(__u16 hmac_id); +const struct sctp_hmac * +sctp_auth_asoc_get_hmac(const struct sctp_association *asoc); void sctp_auth_asoc_set_default_hmac(struct sctp_association *asoc, struct sctp_hmac_algo_param *hmacs); int sctp_auth_asoc_verify_hmac_id(const struct sctp_association *asoc, diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h index 5859e0a16a58..8e0f4c4f7750 100644 --- a/include/net/sctp/constants.h +++ b/include/net/sctp/constants.h @@ -417,16 +417,12 @@ enum { SCTP_AUTH_HMAC_ID_RESERVED_0, SCTP_AUTH_HMAC_ID_SHA1, SCTP_AUTH_HMAC_ID_RESERVED_2, -#if defined (CONFIG_CRYPTO_SHA256) || defined (CONFIG_CRYPTO_SHA256_MODULE) SCTP_AUTH_HMAC_ID_SHA256, -#endif __SCTP_AUTH_HMAC_MAX }; #define SCTP_AUTH_HMAC_ID_MAX __SCTP_AUTH_HMAC_MAX - 1 #define SCTP_AUTH_NUM_HMACS __SCTP_AUTH_HMAC_MAX -#define SCTP_SHA1_SIG_SIZE 20 -#define SCTP_SHA256_SIG_SIZE 32 /* SCTP-AUTH, Section 3.2 * The chunk types for INIT, INIT-ACK, SHUTDOWN-COMPLETE and AUTH chunks diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h index 8a540ad9b509..6be6aec25731 100644 --- a/include/net/sctp/structs.h +++ b/include/net/sctp/structs.h @@ -1329,11 +1329,6 @@ struct sctp_endpoint { /* rcvbuf acct. policy. */ __u32 rcvbuf_policy; - /* SCTP AUTH: array of the HMACs that will be allocated - * we need this per association so that we don't serialize - */ - struct crypto_shash **auth_hmacs; - /* SCTP-AUTH: hmacs for the endpoint encoded into parameter */ struct sctp_hmac_algo_param *auth_hmacs_list; diff --git a/net/sctp/Kconfig b/net/sctp/Kconfig index 24d5a35ce894..09c77b4d161b 100644 --- a/net/sctp/Kconfig +++ b/net/sctp/Kconfig @@ -7,9 +7,9 @@ menuconfig IP_SCTP tristate "The SCTP Protocol" depends on INET depends on IPV6 || IPV6=n - select CRYPTO - select CRYPTO_HMAC - select CRYPTO_SHA1 + select CRYPTO_LIB_SHA1 + select CRYPTO_LIB_SHA256 + select CRYPTO_LIB_UTILS select NET_CRC32C select NET_UDP_TUNNEL help @@ -79,15 +79,17 @@ config SCTP_COOKIE_HMAC_MD5 bool "Enable optional MD5 hmac cookie generation" help Enable optional MD5 hmac based SCTP cookie generation - select CRYPTO_HMAC if SCTP_COOKIE_HMAC_MD5 - select CRYPTO_MD5 if SCTP_COOKIE_HMAC_MD5 + select CRYPTO + select CRYPTO_HMAC + select CRYPTO_MD5 config SCTP_COOKIE_HMAC_SHA1 bool "Enable optional SHA1 hmac cookie generation" help Enable optional SHA1 hmac based SCTP cookie generation - select CRYPTO_HMAC if SCTP_COOKIE_HMAC_SHA1 - select CRYPTO_SHA1 if SCTP_COOKIE_HMAC_SHA1 + select CRYPTO + select CRYPTO_HMAC + select CRYPTO_SHA1 config INET_SCTP_DIAG depends on INET_DIAG diff --git a/net/sctp/auth.c b/net/sctp/auth.c index c58fffc86a0c..82aad477590e 100644 --- a/net/sctp/auth.c +++ b/net/sctp/auth.c @@ -12,36 +12,37 @@ * Vlad Yasevich */ -#include +#include +#include #include #include -#include #include #include -static struct sctp_hmac sctp_hmac_list[SCTP_AUTH_NUM_HMACS] = { +static const struct sctp_hmac sctp_hmac_list[SCTP_AUTH_NUM_HMACS] = { { /* id 0 is reserved. as all 0 */ .hmac_id = SCTP_AUTH_HMAC_ID_RESERVED_0, }, { .hmac_id = SCTP_AUTH_HMAC_ID_SHA1, - .hmac_name = "hmac(sha1)", - .hmac_len = SCTP_SHA1_SIG_SIZE, + .hmac_len = SHA1_DIGEST_SIZE, }, { /* id 2 is reserved as well */ .hmac_id = SCTP_AUTH_HMAC_ID_RESERVED_2, }, -#if IS_ENABLED(CONFIG_CRYPTO_SHA256) { .hmac_id = SCTP_AUTH_HMAC_ID_SHA256, - .hmac_name = "hmac(sha256)", - .hmac_len = SCTP_SHA256_SIG_SIZE, + .hmac_len = SHA256_DIGEST_SIZE, } -#endif }; +static bool sctp_hmac_supported(__u16 hmac_id) +{ + return hmac_id < ARRAY_SIZE(sctp_hmac_list) && + sctp_hmac_list[hmac_id].hmac_len != 0; +} void sctp_auth_key_put(struct sctp_auth_bytes *key) { @@ -444,76 +445,7 @@ struct sctp_shared_key *sctp_auth_get_shkey( return NULL; } -/* - * Initialize all the possible digest transforms that we can use. Right - * now, the supported digests are SHA1 and SHA256. We do this here once - * because of the restrictiong that transforms may only be allocated in - * user context. This forces us to pre-allocated all possible transforms - * at the endpoint init time. - */ -int sctp_auth_init_hmacs(struct sctp_endpoint *ep, gfp_t gfp) -{ - struct crypto_shash *tfm = NULL; - __u16 id; - - /* If the transforms are already allocated, we are done */ - if (ep->auth_hmacs) - return 0; - - /* Allocated the array of pointers to transorms */ - ep->auth_hmacs = kcalloc(SCTP_AUTH_NUM_HMACS, - sizeof(struct crypto_shash *), - gfp); - if (!ep->auth_hmacs) - return -ENOMEM; - - for (id = 0; id < SCTP_AUTH_NUM_HMACS; id++) { - - /* See is we support the id. Supported IDs have name and - * length fields set, so that we can allocated and use - * them. We can safely just check for name, for without the - * name, we can't allocate the TFM. - */ - if (!sctp_hmac_list[id].hmac_name) - continue; - - /* If this TFM has been allocated, we are all set */ - if (ep->auth_hmacs[id]) - continue; - - /* Allocate the ID */ - tfm = crypto_alloc_shash(sctp_hmac_list[id].hmac_name, 0, 0); - if (IS_ERR(tfm)) - goto out_err; - - ep->auth_hmacs[id] = tfm; - } - - return 0; - -out_err: - /* Clean up any successful allocations */ - sctp_auth_destroy_hmacs(ep->auth_hmacs); - ep->auth_hmacs = NULL; - return -ENOMEM; -} - -/* Destroy the hmac tfm array */ -void sctp_auth_destroy_hmacs(struct crypto_shash *auth_hmacs[]) -{ - int i; - - if (!auth_hmacs) - return; - - for (i = 0; i < SCTP_AUTH_NUM_HMACS; i++) { - crypto_free_shash(auth_hmacs[i]); - } - kfree(auth_hmacs); -} - - -struct sctp_hmac *sctp_auth_get_hmac(__u16 hmac_id) +const struct sctp_hmac *sctp_auth_get_hmac(__u16 hmac_id) { return &sctp_hmac_list[hmac_id]; } @@ -521,7 +453,8 @@ struct sctp_hmac *sctp_auth_get_hmac(__u16 hmac_id) /* Get an hmac description information that we can use to build * the AUTH chunk */ -struct sctp_hmac *sctp_auth_asoc_get_hmac(const struct sctp_association *asoc) +const struct sctp_hmac * +sctp_auth_asoc_get_hmac(const struct sctp_association *asoc) { struct sctp_hmac_algo_param *hmacs; __u16 n_elt; @@ -543,26 +476,10 @@ struct sctp_hmac *sctp_auth_asoc_get_hmac(const struct sctp_association *asoc) sizeof(struct sctp_paramhdr)) >> 1; for (i = 0; i < n_elt; i++) { id = ntohs(hmacs->hmac_ids[i]); - - /* Check the id is in the supported range. And - * see if we support the id. Supported IDs have name and - * length fields set, so that we can allocate and use - * them. We can safely just check for name, for without the - * name, we can't allocate the TFM. - */ - if (id > SCTP_AUTH_HMAC_ID_MAX || - !sctp_hmac_list[id].hmac_name) { - id = 0; - continue; - } - - break; + if (sctp_hmac_supported(id)) + return &sctp_hmac_list[id]; } - - if (id == 0) - return NULL; - - return &sctp_hmac_list[id]; + return NULL; } static int __sctp_auth_find_hmacid(__be16 *hmacs, int n_elts, __be16 hmac_id) @@ -606,7 +523,6 @@ int sctp_auth_asoc_verify_hmac_id(const struct sctp_association *asoc, void sctp_auth_asoc_set_default_hmac(struct sctp_association *asoc, struct sctp_hmac_algo_param *hmacs) { - struct sctp_endpoint *ep; __u16 id; int i; int n_params; @@ -617,16 +533,9 @@ void sctp_auth_asoc_set_default_hmac(struct sctp_association *asoc, n_params = (ntohs(hmacs->param_hdr.length) - sizeof(struct sctp_paramhdr)) >> 1; - ep = asoc->ep; for (i = 0; i < n_params; i++) { id = ntohs(hmacs->hmac_ids[i]); - - /* Check the id is in the supported range */ - if (id > SCTP_AUTH_HMAC_ID_MAX) - continue; - - /* If this TFM has been allocated, use this id */ - if (ep->auth_hmacs[id]) { + if (sctp_hmac_supported(id)) { asoc->default_hmac_id = id; break; } @@ -709,10 +618,9 @@ void sctp_auth_calculate_hmac(const struct sctp_association *asoc, struct sctp_shared_key *ep_key, gfp_t gfp) { struct sctp_auth_bytes *asoc_key; - struct crypto_shash *tfm; __u16 key_id, hmac_id; - unsigned char *end; int free_key = 0; + size_t data_len; __u8 *digest; /* Extract the info we need: @@ -733,19 +641,17 @@ void sctp_auth_calculate_hmac(const struct sctp_association *asoc, free_key = 1; } - /* set up scatter list */ - end = skb_tail_pointer(skb); - - tfm = asoc->ep->auth_hmacs[hmac_id]; - + data_len = skb_tail_pointer(skb) - (unsigned char *)auth; digest = (u8 *)(&auth->auth_hdr + 1); - if (crypto_shash_setkey(tfm, &asoc_key->data[0], asoc_key->len)) - goto free; + if (hmac_id == SCTP_AUTH_HMAC_ID_SHA1) { + hmac_sha1_usingrawkey(asoc_key->data, asoc_key->len, + (const u8 *)auth, data_len, digest); + } else { + WARN_ON_ONCE(hmac_id != SCTP_AUTH_HMAC_ID_SHA256); + hmac_sha256_usingrawkey(asoc_key->data, asoc_key->len, + (const u8 *)auth, data_len, digest); + } - crypto_shash_tfm_digest(tfm, (u8 *)auth, end - (unsigned char *)auth, - digest); - -free: if (free_key) sctp_auth_key_put(asoc_key); } @@ -788,14 +694,11 @@ int sctp_auth_ep_set_hmacs(struct sctp_endpoint *ep, for (i = 0; i < hmacs->shmac_num_idents; i++) { id = hmacs->shmac_idents[i]; - if (id > SCTP_AUTH_HMAC_ID_MAX) + if (!sctp_hmac_supported(id)) return -EOPNOTSUPP; if (SCTP_AUTH_HMAC_ID_SHA1 == id) has_sha1 = 1; - - if (!sctp_hmac_list[id].hmac_name) - return -EOPNOTSUPP; } if (!has_sha1) @@ -1021,8 +924,6 @@ int sctp_auth_deact_key_id(struct sctp_endpoint *ep, int sctp_auth_init(struct sctp_endpoint *ep, gfp_t gfp) { - int err = -ENOMEM; - /* Allocate space for HMACS and CHUNKS authentication * variables. There are arrays that we encode directly * into parameters to make the rest of the operations easier. @@ -1060,13 +961,6 @@ int sctp_auth_init(struct sctp_endpoint *ep, gfp_t gfp) ep->auth_chunk_list = auth_chunks; } - /* Allocate and initialize transorms arrays for supported - * HMACs. - */ - err = sctp_auth_init_hmacs(ep, gfp); - if (err) - goto nomem; - return 0; nomem: @@ -1075,7 +969,7 @@ nomem: kfree(ep->auth_chunk_list); ep->auth_hmacs_list = NULL; ep->auth_chunk_list = NULL; - return err; + return -ENOMEM; } void sctp_auth_free(struct sctp_endpoint *ep) @@ -1084,6 +978,4 @@ void sctp_auth_free(struct sctp_endpoint *ep) kfree(ep->auth_chunk_list); ep->auth_hmacs_list = NULL; ep->auth_chunk_list = NULL; - sctp_auth_destroy_hmacs(ep->auth_hmacs); - ep->auth_hmacs = NULL; } diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c index fd4f8243cc35..c655b571ca01 100644 --- a/net/sctp/chunk.c +++ b/net/sctp/chunk.c @@ -184,7 +184,8 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc, * DATA. */ if (sctp_auth_send_cid(SCTP_CID_DATA, asoc)) { - struct sctp_hmac *hmac_desc = sctp_auth_asoc_get_hmac(asoc); + const struct sctp_hmac *hmac_desc = + sctp_auth_asoc_get_hmac(asoc); if (hmac_desc) max_data -= SCTP_PAD4(sizeof(struct sctp_auth_chunk) + diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c index d099b605e44a..a1a3c8494c5d 100644 --- a/net/sctp/sm_make_chunk.c +++ b/net/sctp/sm_make_chunk.c @@ -1320,7 +1320,7 @@ struct sctp_chunk *sctp_make_auth(const struct sctp_association *asoc, __u16 key_id) { struct sctp_authhdr auth_hdr; - struct sctp_hmac *hmac_desc; + const struct sctp_hmac *hmac_desc; struct sctp_chunk *retval; /* Get the first hmac that the peer told us to use */ diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c index d4d5b14b49b3..4cb8f393434d 100644 --- a/net/sctp/sm_statefuns.c +++ b/net/sctp/sm_statefuns.c @@ -4362,7 +4362,7 @@ static enum sctp_ierror sctp_sf_authenticate( struct sctp_shared_key *sh_key = NULL; struct sctp_authhdr *auth_hdr; __u8 *save_digest, *digest; - struct sctp_hmac *hmac; + const struct sctp_hmac *hmac; unsigned int sig_len; __u16 key_id; diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 4921416434f9..0292881a847c 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -9581,16 +9581,6 @@ static int sctp_sock_migrate(struct sock *oldsk, struct sock *newsk, if (err) return err; - /* New ep's auth_hmacs should be set if old ep's is set, in case - * that net->sctp.auth_enable has been changed to 0 by users and - * new ep's auth_hmacs couldn't be set in sctp_endpoint_init(). - */ - if (oldsp->ep->auth_hmacs) { - err = sctp_auth_init_hmacs(newsp->ep, GFP_KERNEL); - if (err) - return err; - } - sctp_auto_asconf_init(newsp); /* Move any messages in the old socket's receive queue that are for the From 2f3dd6ec901f29aef5fff3d7a63b1371d67c1760 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Mon, 18 Aug 2025 13:54:25 -0700 Subject: [PATCH 0187/1536] sctp: Convert cookie authentication to use HMAC-SHA256 Convert SCTP cookies to use HMAC-SHA256, instead of the previous choice of the legacy algorithms HMAC-MD5 and HMAC-SHA1. Simplify and optimize the code by using the HMAC-SHA256 library instead of crypto_shash, and by preparing the HMAC key when it is generated instead of per-operation. This doesn't break compatibility, since the cookie format is an implementation detail, not part of the SCTP protocol itself. Note that the cookie size doesn't change either. The HMAC field was already 32 bytes, even though previously at most 20 bytes were actually compared. 32 bytes exactly fits an untruncated HMAC-SHA256 value. So, although we could safely truncate the MAC to something slightly shorter, for now just keep the cookie size the same. I also considered SipHash, but that would generate only 8-byte MACs. An 8-byte MAC *might* suffice here. However, there's quite a lot of information in the SCTP cookies: more than in TCP SYN cookies. So absent an analysis that occasional forgeries of all that information is okay in SCTP, I errored on the side of caution. Remove HMAC-MD5 and HMAC-SHA1 as options, since the new HMAC-SHA256 option is just better. It's faster as well as more secure. For example, benchmarking on x86_64, cookie authentication is now nearly 3x as fast as the previous default choice and implementation of HMAC-MD5. Also just make the kernel always support cookie authentication if SCTP is supported at all, rather than making it optional in the build. (It was sort of optional before, but it didn't really work properly. E.g., a kernel with CONFIG_SCTP_COOKIE_HMAC_MD5=n still supported HMAC-MD5 cookie authentication if CONFIG_CRYPTO_HMAC and CONFIG_CRYPTO_MD5 happened to be enabled in the kconfig for other reasons.) Acked-by: Xin Long Signed-off-by: Eric Biggers Link: https://patch.msgid.link/20250818205426.30222-5-ebiggers@kernel.org Signed-off-by: Jakub Kicinski --- Documentation/networking/ip-sysctl.rst | 11 ++--- include/net/netns/sctp.h | 4 +- include/net/sctp/constants.h | 5 +-- include/net/sctp/structs.h | 28 +++---------- net/sctp/Kconfig | 43 +++++-------------- net/sctp/endpointola.c | 23 ++++++----- net/sctp/protocol.c | 11 ++--- net/sctp/sm_make_chunk.c | 57 ++++++++------------------ net/sctp/socket.c | 31 +------------- net/sctp/sysctl.c | 55 +++++++++++-------------- 10 files changed, 80 insertions(+), 188 deletions(-) diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst index 9756d16e3df1..3d6782683eee 100644 --- a/Documentation/networking/ip-sysctl.rst +++ b/Documentation/networking/ip-sysctl.rst @@ -3508,16 +3508,13 @@ cookie_hmac_alg - STRING a listening sctp socket to a connecting client in the INIT-ACK chunk. Valid values are: - * md5 - * sha1 + * sha256 * none - Ability to assign md5 or sha1 as the selected alg is predicated on the - configuration of those algorithms at build time (CONFIG_CRYPTO_MD5 and - CONFIG_CRYPTO_SHA1). + md5 and sha1 are also accepted for backwards compatibility, but cause + sha256 to be selected. - Default: Dependent on configuration. MD5 if available, else SHA1 if - available, else none. + Default: sha256 rcvbuf_policy - INTEGER Determines if the receive buffer is attributed to the socket or to diff --git a/include/net/netns/sctp.h b/include/net/netns/sctp.h index d25cd7a9c5ff..c0f97f36389e 100644 --- a/include/net/netns/sctp.h +++ b/include/net/netns/sctp.h @@ -75,8 +75,8 @@ struct netns_sctp { /* Whether Cookie Preservative is enabled(1) or not(0) */ int cookie_preserve_enable; - /* The namespace default hmac alg */ - char *sctp_hmac_alg; + /* Whether cookie authentication is enabled(1) or not(0) */ + int cookie_auth_enable; /* Valid.Cookie.Life - 60 seconds */ unsigned int valid_cookie_life; diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h index 8e0f4c4f7750..ae3376ba0b99 100644 --- a/include/net/sctp/constants.h +++ b/include/net/sctp/constants.h @@ -296,9 +296,8 @@ enum { SCTP_MAX_GABS = 16 }; */ #define SCTP_DEFAULT_MINSEGMENT 512 /* MTU size ... if no mtu disc */ -#define SCTP_SECRET_SIZE 32 /* Number of octets in a 256 bits. */ - -#define SCTP_SIGNATURE_SIZE 20 /* size of a SLA-1 signature */ +#define SCTP_COOKIE_KEY_SIZE 32 /* size of cookie HMAC key */ +#define SCTP_COOKIE_MAC_SIZE 32 /* size of HMAC field in cookies */ #define SCTP_COOKIE_MULTIPLE 32 /* Pad out our cookie to make our hash * functions simpler to write. diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h index 6be6aec25731..2ae390219efd 100644 --- a/include/net/sctp/structs.h +++ b/include/net/sctp/structs.h @@ -32,6 +32,7 @@ #ifndef __sctp_structs_h__ #define __sctp_structs_h__ +#include #include #include #include @@ -68,7 +69,6 @@ struct sctp_outq; struct sctp_bind_addr; struct sctp_ulpq; struct sctp_ep_common; -struct crypto_shash; struct sctp_stream; @@ -155,10 +155,6 @@ struct sctp_sock { /* PF_ family specific functions. */ struct sctp_pf *pf; - /* Access to HMAC transform. */ - struct crypto_shash *hmac; - char *sctp_hmac_alg; - /* What is our base endpointer? */ struct sctp_endpoint *ep; @@ -227,7 +223,8 @@ struct sctp_sock { frag_interleave:1, recvrcvinfo:1, recvnxtinfo:1, - data_ready_signalled:1; + data_ready_signalled:1, + cookie_auth_enable:1; atomic_t pd_mode; @@ -335,7 +332,7 @@ struct sctp_cookie { /* The format of our cookie that we send to our peer. */ struct sctp_signed_cookie { - __u8 signature[SCTP_SECRET_SIZE]; + __u8 mac[SCTP_COOKIE_MAC_SIZE]; __u32 __pad; /* force sctp_cookie alignment to 64 bits */ struct sctp_cookie c; } __packed; @@ -1307,22 +1304,9 @@ struct sctp_endpoint { /* This is really a list of struct sctp_association entries. */ struct list_head asocs; - /* Secret Key: A secret key used by this endpoint to compute - * the MAC. This SHOULD be a cryptographic quality - * random number with a sufficient length. - * Discussion in [RFC1750] can be helpful in - * selection of the key. - */ - __u8 secret_key[SCTP_SECRET_SIZE]; + /* Cookie authentication key used by this endpoint */ + struct hmac_sha256_key cookie_auth_key; - /* digest: This is a digest of the sctp cookie. This field is - * only used on the receive path when we try to validate - * that the cookie has not been tampered with. We put - * this here so we pre-allocate this once and can re-use - * on every receive. - */ - __u8 *digest; - /* sendbuf acct. policy. */ __u32 sndbuf_policy; diff --git a/net/sctp/Kconfig b/net/sctp/Kconfig index 09c77b4d161b..e947646a380c 100644 --- a/net/sctp/Kconfig +++ b/net/sctp/Kconfig @@ -49,48 +49,25 @@ config SCTP_DBG_OBJCNT 'cat /proc/net/sctp/sctp_dbg_objcnt' If unsure, say N + choice - prompt "Default SCTP cookie HMAC encoding" - default SCTP_DEFAULT_COOKIE_HMAC_MD5 + prompt "Default SCTP cookie authentication method" + default SCTP_DEFAULT_COOKIE_HMAC_SHA256 help - This option sets the default sctp cookie hmac algorithm - when in doubt select 'md5' + This option sets the default SCTP cookie authentication method, for + when a method hasn't been explicitly selected via the + net.sctp.cookie_hmac_alg sysctl. -config SCTP_DEFAULT_COOKIE_HMAC_MD5 - bool "Enable optional MD5 hmac cookie generation" - help - Enable optional MD5 hmac based SCTP cookie generation - select SCTP_COOKIE_HMAC_MD5 + If unsure, choose the default (HMAC-SHA256). -config SCTP_DEFAULT_COOKIE_HMAC_SHA1 - bool "Enable optional SHA1 hmac cookie generation" - help - Enable optional SHA1 hmac based SCTP cookie generation - select SCTP_COOKIE_HMAC_SHA1 +config SCTP_DEFAULT_COOKIE_HMAC_SHA256 + bool "HMAC-SHA256" config SCTP_DEFAULT_COOKIE_HMAC_NONE - bool "Use no hmac alg in SCTP cookie generation" - help - Use no hmac algorithm in SCTP cookie generation + bool "None" endchoice -config SCTP_COOKIE_HMAC_MD5 - bool "Enable optional MD5 hmac cookie generation" - help - Enable optional MD5 hmac based SCTP cookie generation - select CRYPTO - select CRYPTO_HMAC - select CRYPTO_MD5 - -config SCTP_COOKIE_HMAC_SHA1 - bool "Enable optional SHA1 hmac cookie generation" - help - Enable optional SHA1 hmac based SCTP cookie generation - select CRYPTO - select CRYPTO_HMAC - select CRYPTO_SHA1 - config INET_SCTP_DIAG depends on INET_DIAG def_tristate INET_DIAG diff --git a/net/sctp/endpointola.c b/net/sctp/endpointola.c index 7e77b450697c..31e989dfe846 100644 --- a/net/sctp/endpointola.c +++ b/net/sctp/endpointola.c @@ -35,6 +35,15 @@ /* Forward declarations for internal helpers. */ static void sctp_endpoint_bh_rcv(struct work_struct *work); +static void gen_cookie_auth_key(struct hmac_sha256_key *key) +{ + u8 raw_key[SCTP_COOKIE_KEY_SIZE]; + + get_random_bytes(raw_key, sizeof(raw_key)); + hmac_sha256_preparekey(key, raw_key, sizeof(raw_key)); + memzero_explicit(raw_key, sizeof(raw_key)); +} + /* * Initialize the base fields of the endpoint structure. */ @@ -45,10 +54,6 @@ static struct sctp_endpoint *sctp_endpoint_init(struct sctp_endpoint *ep, struct net *net = sock_net(sk); struct sctp_shared_key *null_key; - ep->digest = kzalloc(SCTP_SIGNATURE_SIZE, gfp); - if (!ep->digest) - return NULL; - ep->asconf_enable = net->sctp.addip_enable; ep->auth_enable = net->sctp.auth_enable; if (ep->auth_enable) { @@ -90,8 +95,8 @@ static struct sctp_endpoint *sctp_endpoint_init(struct sctp_endpoint *ep, /* Get the receive buffer policy for this endpoint */ ep->rcvbuf_policy = net->sctp.rcvbuf_policy; - /* Initialize the secret key used with cookie. */ - get_random_bytes(ep->secret_key, sizeof(ep->secret_key)); + /* Generate the cookie authentication key. */ + gen_cookie_auth_key(&ep->cookie_auth_key); /* SCTP-AUTH extensions*/ INIT_LIST_HEAD(&ep->endpoint_shared_keys); @@ -118,7 +123,6 @@ static struct sctp_endpoint *sctp_endpoint_init(struct sctp_endpoint *ep, nomem_shkey: sctp_auth_free(ep); nomem: - kfree(ep->digest); return NULL; } @@ -205,9 +209,6 @@ static void sctp_endpoint_destroy(struct sctp_endpoint *ep) return; } - /* Free the digest buffer */ - kfree(ep->digest); - /* SCTP-AUTH: Free up AUTH releated data such as shared keys * chunks and hmacs arrays that were allocated */ @@ -218,7 +219,7 @@ static void sctp_endpoint_destroy(struct sctp_endpoint *ep) sctp_inq_free(&ep->base.inqueue); sctp_bind_addr_free(&ep->base.bind_addr); - memset(ep->secret_key, 0, sizeof(ep->secret_key)); + memzero_explicit(&ep->cookie_auth_key, sizeof(ep->cookie_auth_key)); sk = ep->base.sk; /* Remove and free the port */ diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c index a5ccada55f2b..3b2373b3bd5d 100644 --- a/net/sctp/protocol.c +++ b/net/sctp/protocol.c @@ -1334,14 +1334,9 @@ static int __net_init sctp_defaults_init(struct net *net) /* Whether Cookie Preservative is enabled(1) or not(0) */ net->sctp.cookie_preserve_enable = 1; - /* Default sctp sockets to use md5 as their hmac alg */ -#if defined (CONFIG_SCTP_DEFAULT_COOKIE_HMAC_MD5) - net->sctp.sctp_hmac_alg = "md5"; -#elif defined (CONFIG_SCTP_DEFAULT_COOKIE_HMAC_SHA1) - net->sctp.sctp_hmac_alg = "sha1"; -#else - net->sctp.sctp_hmac_alg = NULL; -#endif + /* Whether cookie authentication is enabled(1) or not(0) */ + net->sctp.cookie_auth_enable = + !IS_ENABLED(CONFIG_SCTP_DEFAULT_COOKIE_HMAC_NONE); /* Max.Burst - 4 */ net->sctp.max_burst = SCTP_DEFAULT_MAX_BURST; diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c index a1a3c8494c5d..2c0017d058d4 100644 --- a/net/sctp/sm_make_chunk.c +++ b/net/sctp/sm_make_chunk.c @@ -30,7 +30,6 @@ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt -#include #include #include #include @@ -1675,8 +1674,10 @@ static struct sctp_cookie_param *sctp_pack_cookie( * out on the network. */ retval = kzalloc(*cookie_len, GFP_ATOMIC); - if (!retval) - goto nodata; + if (!retval) { + *cookie_len = 0; + return NULL; + } cookie = (struct sctp_signed_cookie *) retval->body; @@ -1707,26 +1708,14 @@ static struct sctp_cookie_param *sctp_pack_cookie( memcpy((__u8 *)(cookie + 1) + ntohs(init_chunk->chunk_hdr->length), raw_addrs, addrs_len); - if (sctp_sk(ep->base.sk)->hmac) { - struct crypto_shash *tfm = sctp_sk(ep->base.sk)->hmac; - int err; - - /* Sign the message. */ - err = crypto_shash_setkey(tfm, ep->secret_key, - sizeof(ep->secret_key)) ?: - crypto_shash_tfm_digest(tfm, (u8 *)&cookie->c, bodysize, - cookie->signature); - if (err) - goto free_cookie; + /* Sign the cookie, if cookie authentication is enabled. */ + if (sctp_sk(ep->base.sk)->cookie_auth_enable) { + static_assert(sizeof(cookie->mac) == SHA256_DIGEST_SIZE); + hmac_sha256(&ep->cookie_auth_key, (const u8 *)&cookie->c, + bodysize, cookie->mac); } return retval; - -free_cookie: - kfree(retval); -nodata: - *cookie_len = 0; - return NULL; } /* Unpack the cookie from COOKIE ECHO chunk, recreating the association. */ @@ -1741,7 +1730,6 @@ struct sctp_association *sctp_unpack_cookie( struct sctp_signed_cookie *cookie; struct sk_buff *skb = chunk->skb; struct sctp_cookie *bear_cookie; - __u8 *digest = ep->digest; enum sctp_scope scope; unsigned int len; ktime_t kt; @@ -1771,30 +1759,19 @@ struct sctp_association *sctp_unpack_cookie( cookie = chunk->subh.cookie_hdr; bear_cookie = &cookie->c; - if (!sctp_sk(ep->base.sk)->hmac) - goto no_hmac; + /* Verify the cookie's MAC, if cookie authentication is enabled. */ + if (sctp_sk(ep->base.sk)->cookie_auth_enable) { + u8 mac[SHA256_DIGEST_SIZE]; - /* Check the signature. */ - { - struct crypto_shash *tfm = sctp_sk(ep->base.sk)->hmac; - int err; - - err = crypto_shash_setkey(tfm, ep->secret_key, - sizeof(ep->secret_key)) ?: - crypto_shash_tfm_digest(tfm, (u8 *)bear_cookie, bodysize, - digest); - if (err) { - *error = -SCTP_IERROR_NOMEM; + hmac_sha256(&ep->cookie_auth_key, (const u8 *)bear_cookie, + bodysize, mac); + static_assert(sizeof(cookie->mac) == sizeof(mac)); + if (crypto_memneq(mac, cookie->mac, sizeof(mac))) { + *error = -SCTP_IERROR_BAD_SIG; goto fail; } } - if (crypto_memneq(digest, cookie->signature, SCTP_SIGNATURE_SIZE)) { - *error = -SCTP_IERROR_BAD_SIG; - goto fail; - } - -no_hmac: /* IG Section 2.35.2: * 3) Compare the port numbers and the verification tag contained * within the COOKIE ECHO chunk to the actual port numbers and the diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 0292881a847c..ed8293a34240 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -37,7 +37,6 @@ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt -#include #include #include #include @@ -4987,7 +4986,7 @@ static int sctp_init_sock(struct sock *sk) sp->default_rcv_context = 0; sp->max_burst = net->sctp.max_burst; - sp->sctp_hmac_alg = net->sctp.sctp_hmac_alg; + sp->cookie_auth_enable = net->sctp.cookie_auth_enable; /* Initialize default setup parameters. These parameters * can be modified with the SCTP_INITMSG socket option or @@ -5079,8 +5078,6 @@ static int sctp_init_sock(struct sock *sk) if (!sp->ep) return -ENOMEM; - sp->hmac = NULL; - sk->sk_destruct = sctp_destruct_sock; SCTP_DBG_OBJCNT_INC(sock); @@ -5117,18 +5114,8 @@ static void sctp_destroy_sock(struct sock *sk) sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1); } -/* Triggered when there are no references on the socket anymore */ -static void sctp_destruct_common(struct sock *sk) -{ - struct sctp_sock *sp = sctp_sk(sk); - - /* Free up the HMAC transform. */ - crypto_free_shash(sp->hmac); -} - static void sctp_destruct_sock(struct sock *sk) { - sctp_destruct_common(sk); inet_sock_destruct(sk); } @@ -8530,22 +8517,8 @@ static int sctp_listen_start(struct sock *sk, int backlog) { struct sctp_sock *sp = sctp_sk(sk); struct sctp_endpoint *ep = sp->ep; - struct crypto_shash *tfm = NULL; - char alg[32]; int err; - /* Allocate HMAC for generating cookie. */ - if (!sp->hmac && sp->sctp_hmac_alg) { - sprintf(alg, "hmac(%s)", sp->sctp_hmac_alg); - tfm = crypto_alloc_shash(alg, 0, 0); - if (IS_ERR(tfm)) { - net_info_ratelimited("failed to load transform for %s: %ld\n", - sp->sctp_hmac_alg, PTR_ERR(tfm)); - return -ENOSYS; - } - sctp_sk(sk)->hmac = tfm; - } - /* * If a bind() or sctp_bindx() is not called prior to a listen() * call that allows new associations to be accepted, the system @@ -9561,7 +9534,6 @@ static int sctp_sock_migrate(struct sock *oldsk, struct sock *newsk, * copy. */ newsp->ep = newep; - newsp->hmac = NULL; /* Hook this new socket in to the bind_hash list. */ head = &sctp_port_hashtable[sctp_phashfn(sock_net(oldsk), @@ -9713,7 +9685,6 @@ struct proto sctp_prot = { static void sctp_v6_destruct_sock(struct sock *sk) { - sctp_destruct_common(sk); inet6_sock_destruct(sk); } diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c index ee3eac338a9d..19acc57c3ed9 100644 --- a/net/sctp/sysctl.c +++ b/net/sctp/sysctl.c @@ -174,7 +174,7 @@ static struct ctl_table sctp_net_table[] = { }, { .procname = "cookie_hmac_alg", - .data = &init_net.sctp.sctp_hmac_alg, + .data = &init_net.sctp.cookie_auth_enable, .maxlen = 8, .mode = 0644, .proc_handler = proc_sctp_do_hmac_alg, @@ -388,10 +388,8 @@ static int proc_sctp_do_hmac_alg(const struct ctl_table *ctl, int write, void *buffer, size_t *lenp, loff_t *ppos) { struct net *net = container_of(ctl->data, struct net, - sctp.sctp_hmac_alg); + sctp.cookie_auth_enable); struct ctl_table tbl; - bool changed = false; - char *none = "none"; char tmp[8] = {0}; int ret; @@ -399,35 +397,28 @@ static int proc_sctp_do_hmac_alg(const struct ctl_table *ctl, int write, if (write) { tbl.data = tmp; - tbl.maxlen = sizeof(tmp); - } else { - tbl.data = net->sctp.sctp_hmac_alg ? : none; - tbl.maxlen = strlen(tbl.data); + tbl.maxlen = sizeof(tmp) - 1; + ret = proc_dostring(&tbl, 1, buffer, lenp, ppos); + if (ret) + return ret; + if (!strcmp(tmp, "sha256") || + /* for backwards compatibility */ + !strcmp(tmp, "md5") || !strcmp(tmp, "sha1")) { + net->sctp.cookie_auth_enable = 1; + return 0; + } + if (!strcmp(tmp, "none")) { + net->sctp.cookie_auth_enable = 0; + return 0; + } + return -EINVAL; } - - ret = proc_dostring(&tbl, write, buffer, lenp, ppos); - if (write && ret == 0) { -#ifdef CONFIG_CRYPTO_MD5 - if (!strncmp(tmp, "md5", 3)) { - net->sctp.sctp_hmac_alg = "md5"; - changed = true; - } -#endif -#ifdef CONFIG_CRYPTO_SHA1 - if (!strncmp(tmp, "sha1", 4)) { - net->sctp.sctp_hmac_alg = "sha1"; - changed = true; - } -#endif - if (!strncmp(tmp, "none", 4)) { - net->sctp.sctp_hmac_alg = NULL; - changed = true; - } - if (!changed) - ret = -EINVAL; - } - - return ret; + if (net->sctp.cookie_auth_enable) + tbl.data = (char *)"sha256"; + else + tbl.data = (char *)"none"; + tbl.maxlen = strlen(tbl.data); + return proc_dostring(&tbl, 0, buffer, lenp, ppos); } static int proc_sctp_do_rto_min(const struct ctl_table *ctl, int write, From d5a253702add0da3e1e19252ae2a251ee24b486d Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Mon, 18 Aug 2025 13:54:26 -0700 Subject: [PATCH 0188/1536] sctp: Stop accepting md5 and sha1 for net.sctp.cookie_hmac_alg The upgrade of the cookie authentication algorithm to HMAC-SHA256 kept some backwards compatibility for the net.sctp.cookie_hmac_alg sysctl by still accepting the values 'md5' and 'sha1'. Those algorithms are no longer actually used, but rather those values were just treated as requests to enable cookie authentication. As requested at https://lore.kernel.org/netdev/CADvbK_fmCRARc8VznH8cQa-QKaCOQZ6yFbF=1-VDK=zRqv_cXw@mail.gmail.com/ and https://lore.kernel.org/netdev/20250818084345.708ac796@kernel.org/ , go further and start rejecting 'md5' and 'sha1' completely. Signed-off-by: Eric Biggers Link: https://patch.msgid.link/20250818205426.30222-6-ebiggers@kernel.org Signed-off-by: Jakub Kicinski --- Documentation/networking/ip-sysctl.rst | 3 --- net/sctp/sysctl.c | 4 +--- 2 files changed, 1 insertion(+), 6 deletions(-) diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst index 3d6782683eee..43badb338d22 100644 --- a/Documentation/networking/ip-sysctl.rst +++ b/Documentation/networking/ip-sysctl.rst @@ -3511,9 +3511,6 @@ cookie_hmac_alg - STRING * sha256 * none - md5 and sha1 are also accepted for backwards compatibility, but cause - sha256 to be selected. - Default: sha256 rcvbuf_policy - INTEGER diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c index 19acc57c3ed9..15e7db9a3ab2 100644 --- a/net/sctp/sysctl.c +++ b/net/sctp/sysctl.c @@ -401,9 +401,7 @@ static int proc_sctp_do_hmac_alg(const struct ctl_table *ctl, int write, ret = proc_dostring(&tbl, 1, buffer, lenp, ppos); if (ret) return ret; - if (!strcmp(tmp, "sha256") || - /* for backwards compatibility */ - !strcmp(tmp, "md5") || !strcmp(tmp, "sha1")) { + if (!strcmp(tmp, "sha256")) { net->sctp.cookie_auth_enable = 1; return 0; } From 08d07f25fd5e2ca3d32444f8c41e9e6e59e8a54e Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Fri, 15 Aug 2025 14:55:49 +0200 Subject: [PATCH 0189/1536] netfilter: ctnetlink: remove refcounting in dying list dumping There is no need to keep the object alive via refcount, use a cookie and then use that as the skip hint for dump resumption. Unlike the two earlier, similar patches in this file, this is a cleanup without intended side effects. Signed-off-by: Florian Westphal --- net/netfilter/nf_conntrack_netlink.c | 39 +++++++--------------------- 1 file changed, 10 insertions(+), 29 deletions(-) diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c index 50fd6809380f..3a04665adf99 100644 --- a/net/netfilter/nf_conntrack_netlink.c +++ b/net/netfilter/nf_conntrack_netlink.c @@ -60,7 +60,7 @@ MODULE_LICENSE("GPL"); MODULE_DESCRIPTION("List and change connection tracking table"); struct ctnetlink_list_dump_ctx { - struct nf_conn *last; + unsigned long last_id; unsigned int cpu; bool done; }; @@ -1733,16 +1733,6 @@ static int ctnetlink_get_conntrack(struct sk_buff *skb, return nfnetlink_unicast(skb2, info->net, NETLINK_CB(skb).portid); } -static int ctnetlink_done_list(struct netlink_callback *cb) -{ - struct ctnetlink_list_dump_ctx *ctx = (void *)cb->ctx; - - if (ctx->last) - nf_ct_put(ctx->last); - - return 0; -} - #ifdef CONFIG_NF_CONNTRACK_EVENTS static int ctnetlink_dump_one_entry(struct sk_buff *skb, struct netlink_callback *cb, @@ -1757,11 +1747,11 @@ static int ctnetlink_dump_one_entry(struct sk_buff *skb, if (l3proto && nf_ct_l3num(ct) != l3proto) return 0; - if (ctx->last) { - if (ct != ctx->last) + if (ctx->last_id) { + if (ctnetlink_get_id(ct) != ctx->last_id) return 0; - ctx->last = NULL; + ctx->last_id = 0; } /* We can't dump extension info for the unconfirmed @@ -1775,12 +1765,8 @@ static int ctnetlink_dump_one_entry(struct sk_buff *skb, cb->nlh->nlmsg_seq, NFNL_MSG_TYPE(cb->nlh->nlmsg_type), ct, dying, 0); - if (res < 0) { - if (!refcount_inc_not_zero(&ct->ct_general.use)) - return 0; - - ctx->last = ct; - } + if (res < 0) + ctx->last_id = ctnetlink_get_id(ct); return res; } @@ -1796,10 +1782,10 @@ static int ctnetlink_dump_dying(struct sk_buff *skb, struct netlink_callback *cb) { struct ctnetlink_list_dump_ctx *ctx = (void *)cb->ctx; - struct nf_conn *last = ctx->last; #ifdef CONFIG_NF_CONNTRACK_EVENTS const struct net *net = sock_net(skb->sk); struct nf_conntrack_net_ecache *ecache_net; + unsigned long last_id = ctx->last_id; struct nf_conntrack_tuple_hash *h; struct hlist_nulls_node *n; #endif @@ -1807,7 +1793,7 @@ ctnetlink_dump_dying(struct sk_buff *skb, struct netlink_callback *cb) if (ctx->done) return 0; - ctx->last = NULL; + ctx->last_id = 0; #ifdef CONFIG_NF_CONNTRACK_EVENTS ecache_net = nf_conn_pernet_ecache(net); @@ -1818,24 +1804,21 @@ ctnetlink_dump_dying(struct sk_buff *skb, struct netlink_callback *cb) int res; ct = nf_ct_tuplehash_to_ctrack(h); - if (last && last != ct) + if (last_id && last_id != ctnetlink_get_id(ct)) continue; res = ctnetlink_dump_one_entry(skb, cb, ct, true); if (res < 0) { spin_unlock_bh(&ecache_net->dying_lock); - nf_ct_put(last); return skb->len; } - nf_ct_put(last); - last = NULL; + last_id = 0; } spin_unlock_bh(&ecache_net->dying_lock); #endif ctx->done = true; - nf_ct_put(last); return skb->len; } @@ -1847,7 +1830,6 @@ static int ctnetlink_get_ct_dying(struct sk_buff *skb, if (info->nlh->nlmsg_flags & NLM_F_DUMP) { struct netlink_dump_control c = { .dump = ctnetlink_dump_dying, - .done = ctnetlink_done_list, }; return netlink_dump_start(info->sk, skb, info->nlh, &c); } @@ -1862,7 +1844,6 @@ static int ctnetlink_get_ct_unconfirmed(struct sk_buff *skb, if (info->nlh->nlmsg_flags & NLM_F_DUMP) { struct netlink_dump_control c = { .dump = ctnetlink_dump_unconfirmed, - .done = ctnetlink_done_list, }; return netlink_dump_start(info->sk, skb, info->nlh, &c); } From d11b26402a33f5c45389e0a288430c457434c9cd Mon Sep 17 00:00:00 2001 From: Sebastian Andrzej Siewior Date: Fri, 15 Aug 2025 18:09:36 +0200 Subject: [PATCH 0190/1536] netfilter: nft_set_pipapo_avx2: Drop the comment regarding protection The comment claims that the kernel_fpu_begin_mask() below protects access to the scratch map. This is not true because the access is only protected by local_bh_disable() above. Remove the misleading comment. Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Florian Westphal --- net/netfilter/nft_set_pipapo_avx2.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/net/netfilter/nft_set_pipapo_avx2.c b/net/netfilter/nft_set_pipapo_avx2.c index 2f090e253caf..fc734a8545b4 100644 --- a/net/netfilter/nft_set_pipapo_avx2.c +++ b/net/netfilter/nft_set_pipapo_avx2.c @@ -1171,9 +1171,7 @@ nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set, m = rcu_dereference(priv->match); - /* This also protects access to all data related to scratch maps. - * - * Note that we don't need a valid MXCSR state for any of the + /* Note that we don't need a valid MXCSR state for any of the * operations we use here, so pass 0 as mask and spare a LDMXCSR * instruction. */ From 416e53e39516714057d7d06d561e49d1a89fa524 Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Fri, 15 Aug 2025 16:36:57 +0200 Subject: [PATCH 0191/1536] netfilter: nft_set_pipapo_avx2: split lookup function in two parts Split the main avx2 lookup function into a helper. This is a preparation patch: followup change will use the new helper from the insertion path if possible. This greatly improves insertion performance when avx2 is supported. Reviewed-by: Stefano Brivio Signed-off-by: Florian Westphal --- net/netfilter/nft_set_pipapo_avx2.c | 136 +++++++++++++++++----------- 1 file changed, 82 insertions(+), 54 deletions(-) diff --git a/net/netfilter/nft_set_pipapo_avx2.c b/net/netfilter/nft_set_pipapo_avx2.c index fc734a8545b4..994a2ad2d9b1 100644 --- a/net/netfilter/nft_set_pipapo_avx2.c +++ b/net/netfilter/nft_set_pipapo_avx2.c @@ -1133,56 +1133,35 @@ static inline void pipapo_resmap_init_avx2(const struct nft_pipapo_match *m, uns } /** - * nft_pipapo_avx2_lookup() - Lookup function for AVX2 implementation - * @net: Network namespace - * @set: nftables API set representation - * @key: nftables API element representation containing key data + * pipapo_get_avx2() - Lookup function for AVX2 implementation + * @m: Storage containing the set elements + * @data: Key data to be matched against existing elements + * @genmask: If set, check that element is active in given genmask + * @tstamp: Timestamp to check for expired elements * * For more details, see DOC: Theory of Operation in nft_set_pipapo.c. * * This implementation exploits the repetitive characteristic of the algorithm * to provide a fast, vectorised version using the AVX2 SIMD instruction set. * - * Return: true on match, false otherwise. + * The caller must check that the FPU is usable. + * This function must be called with BH disabled. + * + * Return: pointer to &struct nft_pipapo_elem on match, NULL otherwise. */ -const struct nft_set_ext * -nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set, - const u32 *key) +static struct nft_pipapo_elem *pipapo_get_avx2(const struct nft_pipapo_match *m, + const u8 *data, u8 genmask, + u64 tstamp) { - struct nft_pipapo *priv = nft_set_priv(set); - const struct nft_set_ext *ext = NULL; struct nft_pipapo_scratch *scratch; - u8 genmask = nft_genmask_cur(net); - const struct nft_pipapo_match *m; const struct nft_pipapo_field *f; - const u8 *rp = (const u8 *)key; unsigned long *res, *fill; bool map_index; int i; - local_bh_disable(); - - if (unlikely(!irq_fpu_usable())) { - ext = nft_pipapo_lookup(net, set, key); - - local_bh_enable(); - return ext; - } - - m = rcu_dereference(priv->match); - - /* Note that we don't need a valid MXCSR state for any of the - * operations we use here, so pass 0 as mask and spare a LDMXCSR - * instruction. - */ - kernel_fpu_begin_mask(0); - scratch = *raw_cpu_ptr(m->scratch); - if (unlikely(!scratch)) { - kernel_fpu_end(); - local_bh_enable(); + if (unlikely(!scratch)) return NULL; - } map_index = scratch->map_index; @@ -1191,6 +1170,12 @@ nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set, pipapo_resmap_init_avx2(m, res); + /* Note that we don't need a valid MXCSR state for any of the + * operations we use here, so pass 0 as mask and spare a LDMXCSR + * instruction. + */ + kernel_fpu_begin_mask(0); + nft_pipapo_avx2_prepare(); next_match: @@ -1200,7 +1185,7 @@ next_match: #define NFT_SET_PIPAPO_AVX2_LOOKUP(b, n) \ (ret = nft_pipapo_avx2_lookup_##b##b_##n(res, fill, f, \ - ret, rp, \ + ret, data, \ first, last)) if (likely(f->bb == 8)) { @@ -1216,7 +1201,7 @@ next_match: NFT_SET_PIPAPO_AVX2_LOOKUP(8, 16); } else { ret = nft_pipapo_avx2_lookup_slow(m, res, fill, f, - ret, rp, + ret, data, first, last); } } else { @@ -1232,7 +1217,7 @@ next_match: NFT_SET_PIPAPO_AVX2_LOOKUP(4, 32); } else { ret = nft_pipapo_avx2_lookup_slow(m, res, fill, f, - ret, rp, + ret, data, first, last); } } @@ -1240,29 +1225,72 @@ next_match: #undef NFT_SET_PIPAPO_AVX2_LOOKUP - if (ret < 0) - goto out; - - if (last) { - const struct nft_set_ext *e = &f->mt[ret].e->ext; - - if (unlikely(nft_set_elem_expired(e) || - !nft_set_elem_active(e, genmask))) - goto next_match; - - ext = e; - goto out; + if (ret < 0) { + scratch->map_index = map_index; + kernel_fpu_end(); + return NULL; } + if (last) { + struct nft_pipapo_elem *e; + + e = f->mt[ret].e; + if (unlikely(__nft_set_elem_expired(&e->ext, tstamp) || + !nft_set_elem_active(&e->ext, genmask))) + goto next_match; + + scratch->map_index = map_index; + kernel_fpu_end(); + return e; + } + + map_index = !map_index; swap(res, fill); - rp += NFT_PIPAPO_GROUPS_PADDED_SIZE(f); + data += NFT_PIPAPO_GROUPS_PADDED_SIZE(f); } -out: - if (i % 2) - scratch->map_index = !map_index; kernel_fpu_end(); + return NULL; +} + +/** + * nft_pipapo_avx2_lookup() - Dataplane frontend for AVX2 implementation + * @net: Network namespace + * @set: nftables API set representation + * @key: nftables API element representation containing key data + * + * This function is called from the data path. It will search for + * an element matching the given key in the current active copy using + * the AVX2 routines if the fpu is usable or fall back to the generic + * implementation of the algorithm otherwise. + * + * Return: nftables API extension pointer or NULL if no match. + */ +const struct nft_set_ext * +nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set, + const u32 *key) +{ + struct nft_pipapo *priv = nft_set_priv(set); + u8 genmask = nft_genmask_cur(net); + const struct nft_pipapo_match *m; + const u8 *rp = (const u8 *)key; + const struct nft_pipapo_elem *e; + + local_bh_disable(); + + if (unlikely(!irq_fpu_usable())) { + const struct nft_set_ext *ext; + + ext = nft_pipapo_lookup(net, set, key); + + local_bh_enable(); + return ext; + } + + m = rcu_dereference(priv->match); + + e = pipapo_get_avx2(m, rp, genmask, get_jiffies_64()); local_bh_enable(); - return ext; + return e ? &e->ext : NULL; } From 84c1da7b38d9ad8fadd5b0b76034a41f7761e404 Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Fri, 15 Aug 2025 16:36:58 +0200 Subject: [PATCH 0192/1536] netfilter: nft_set_pipapo: use avx2 algorithm for insertions too Always prefer the avx2 implementation if its available. This greatly improves insertion performance (each insertion checks if the new element would overlap with an existing one): time nft -f - < Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Florian Westphal --- net/netfilter/nft_set_pipapo.c | 45 +++++++++++++++++++++++++---- net/netfilter/nft_set_pipapo_avx2.c | 8 ++--- net/netfilter/nft_set_pipapo_avx2.h | 4 +++ 3 files changed, 48 insertions(+), 9 deletions(-) diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c index 9a10251228fd..7ed9c5f0e233 100644 --- a/net/netfilter/nft_set_pipapo.c +++ b/net/netfilter/nft_set_pipapo.c @@ -397,7 +397,7 @@ int pipapo_refill(unsigned long *map, unsigned int len, unsigned int rules, } /** - * pipapo_get() - Get matching element reference given key data + * pipapo_get_slow() - Get matching element reference given key data * @m: storage containing the set elements * @data: Key data to be matched against existing elements * @genmask: If set, check that element is active in given genmask @@ -414,9 +414,9 @@ int pipapo_refill(unsigned long *map, unsigned int len, unsigned int rules, * * Return: pointer to &struct nft_pipapo_elem on match, NULL otherwise. */ -static struct nft_pipapo_elem *pipapo_get(const struct nft_pipapo_match *m, - const u8 *data, u8 genmask, - u64 tstamp) +static struct nft_pipapo_elem *pipapo_get_slow(const struct nft_pipapo_match *m, + const u8 *data, u8 genmask, + u64 tstamp) { struct nft_pipapo_scratch *scratch; unsigned long *res_map, *fill_map; @@ -502,6 +502,41 @@ out: return NULL; } +/** + * pipapo_get() - Get matching element reference given key data + * @m: Storage containing the set elements + * @data: Key data to be matched against existing elements + * @genmask: If set, check that element is active in given genmask + * @tstamp: Timestamp to check for expired elements + * + * This is a dispatcher function, either calling out the generic C + * implementation or, if available, the AVX2 one. + * This helper is only called from the control plane, with either RCU + * read lock or transaction mutex held. + * + * Return: pointer to &struct nft_pipapo_elem on match, NULL otherwise. + */ +static struct nft_pipapo_elem *pipapo_get(const struct nft_pipapo_match *m, + const u8 *data, u8 genmask, + u64 tstamp) +{ + struct nft_pipapo_elem *e; + + local_bh_disable(); + +#if defined(CONFIG_X86_64) && !defined(CONFIG_UML) + if (boot_cpu_has(X86_FEATURE_AVX2) && boot_cpu_has(X86_FEATURE_AVX) && + irq_fpu_usable()) { + e = pipapo_get_avx2(m, data, genmask, tstamp); + local_bh_enable(); + return e; + } +#endif + e = pipapo_get_slow(m, data, genmask, tstamp); + local_bh_enable(); + return e; +} + /** * nft_pipapo_lookup() - Dataplane fronted for main lookup function * @net: Network namespace @@ -523,7 +558,7 @@ nft_pipapo_lookup(const struct net *net, const struct nft_set *set, const struct nft_pipapo_elem *e; m = rcu_dereference(priv->match); - e = pipapo_get(m, (const u8 *)key, genmask, get_jiffies_64()); + e = pipapo_get_slow(m, (const u8 *)key, genmask, get_jiffies_64()); return e ? &e->ext : NULL; } diff --git a/net/netfilter/nft_set_pipapo_avx2.c b/net/netfilter/nft_set_pipapo_avx2.c index 994a2ad2d9b1..028c11724b42 100644 --- a/net/netfilter/nft_set_pipapo_avx2.c +++ b/net/netfilter/nft_set_pipapo_avx2.c @@ -1149,9 +1149,9 @@ static inline void pipapo_resmap_init_avx2(const struct nft_pipapo_match *m, uns * * Return: pointer to &struct nft_pipapo_elem on match, NULL otherwise. */ -static struct nft_pipapo_elem *pipapo_get_avx2(const struct nft_pipapo_match *m, - const u8 *data, u8 genmask, - u64 tstamp) +struct nft_pipapo_elem *pipapo_get_avx2(const struct nft_pipapo_match *m, + const u8 *data, u8 genmask, + u64 tstamp) { struct nft_pipapo_scratch *scratch; const struct nft_pipapo_field *f; @@ -1261,7 +1261,7 @@ next_match: * * This function is called from the data path. It will search for * an element matching the given key in the current active copy using - * the AVX2 routines if the fpu is usable or fall back to the generic + * the AVX2 routines if the FPU is usable or fall back to the generic * implementation of the algorithm otherwise. * * Return: nftables API extension pointer or NULL if no match. diff --git a/net/netfilter/nft_set_pipapo_avx2.h b/net/netfilter/nft_set_pipapo_avx2.h index dbb6aaca8a7a..c2999b63da3f 100644 --- a/net/netfilter/nft_set_pipapo_avx2.h +++ b/net/netfilter/nft_set_pipapo_avx2.h @@ -5,8 +5,12 @@ #include #define NFT_PIPAPO_ALIGN (XSAVE_YMM_SIZE / BITS_PER_BYTE) +struct nft_pipapo_match; bool nft_pipapo_avx2_estimate(const struct nft_set_desc *desc, u32 features, struct nft_set_estimate *est); +struct nft_pipapo_elem *pipapo_get_avx2(const struct nft_pipapo_match *m, + const u8 *data, u8 genmask, + u64 tstamp); #endif /* defined(CONFIG_X86_64) && !defined(CONFIG_UML) */ #endif /* _NFT_SET_PIPAPO_AVX2_H */ From 6aa67d5706f031f24cd486d8df7dc7fddca62b22 Mon Sep 17 00:00:00 2001 From: Sebastian Andrzej Siewior Date: Fri, 15 Aug 2025 18:09:35 +0200 Subject: [PATCH 0193/1536] netfilter: nft_set_pipapo: Store real pointer, adjust later. The struct nft_pipapo_scratch is allocated, then aligned to the required alignment and difference (in bytes) is then saved in align_off. The aligned pointer is used later. While this works, it gets complicated with all the extra checks if all member before map are larger than the required alignment. Instead of saving the aligned pointer, just save the returned pointer and align the map pointer in nft_pipapo_lookup() before using it. The alignment later on shouldn't be that expensive. With this change, the align_off can be removed and the pointer can be passed to kfree() as is. Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Florian Westphal --- net/netfilter/nft_set_pipapo.c | 40 ++++++----------------------- net/netfilter/nft_set_pipapo.h | 6 ++--- net/netfilter/nft_set_pipapo_avx2.c | 8 +++--- 3 files changed, 14 insertions(+), 40 deletions(-) diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c index 7ed9c5f0e233..96b7539f5506 100644 --- a/net/netfilter/nft_set_pipapo.c +++ b/net/netfilter/nft_set_pipapo.c @@ -418,8 +418,8 @@ static struct nft_pipapo_elem *pipapo_get_slow(const struct nft_pipapo_match *m, const u8 *data, u8 genmask, u64 tstamp) { + unsigned long *res_map, *fill_map, *map; struct nft_pipapo_scratch *scratch; - unsigned long *res_map, *fill_map; const struct nft_pipapo_field *f; bool map_index; int i; @@ -432,8 +432,9 @@ static struct nft_pipapo_elem *pipapo_get_slow(const struct nft_pipapo_match *m, map_index = scratch->map_index; - res_map = scratch->map + (map_index ? m->bsize_max : 0); - fill_map = scratch->map + (map_index ? 0 : m->bsize_max); + map = NFT_PIPAPO_LT_ALIGN(&scratch->__map[0]); + res_map = map + (map_index ? m->bsize_max : 0); + fill_map = map + (map_index ? 0 : m->bsize_max); pipapo_resmap_init(m, res_map); @@ -1171,22 +1172,17 @@ static void pipapo_map(struct nft_pipapo_match *m, } /** - * pipapo_free_scratch() - Free per-CPU map at original (not aligned) address + * pipapo_free_scratch() - Free per-CPU map at original address * @m: Matching data * @cpu: CPU number */ static void pipapo_free_scratch(const struct nft_pipapo_match *m, unsigned int cpu) { struct nft_pipapo_scratch *s; - void *mem; s = *per_cpu_ptr(m->scratch, cpu); - if (!s) - return; - mem = s; - mem -= s->align_off; - kvfree(mem); + kvfree(s); } /** @@ -1203,11 +1199,8 @@ static int pipapo_realloc_scratch(struct nft_pipapo_match *clone, for_each_possible_cpu(i) { struct nft_pipapo_scratch *scratch; -#ifdef NFT_PIPAPO_ALIGN - void *scratch_aligned; - u32 align_off; -#endif - scratch = kvzalloc_node(struct_size(scratch, map, bsize_max * 2) + + + scratch = kvzalloc_node(struct_size(scratch, __map, bsize_max * 2) + NFT_PIPAPO_ALIGN_HEADROOM, GFP_KERNEL_ACCOUNT, cpu_to_node(i)); if (!scratch) { @@ -1222,23 +1215,6 @@ static int pipapo_realloc_scratch(struct nft_pipapo_match *clone, } pipapo_free_scratch(clone, i); - -#ifdef NFT_PIPAPO_ALIGN - /* Align &scratch->map (not the struct itself): the extra - * %NFT_PIPAPO_ALIGN_HEADROOM bytes passed to kzalloc_node() - * above guarantee we can waste up to those bytes in order - * to align the map field regardless of its offset within - * the struct. - */ - BUILD_BUG_ON(offsetof(struct nft_pipapo_scratch, map) > NFT_PIPAPO_ALIGN_HEADROOM); - - scratch_aligned = NFT_PIPAPO_LT_ALIGN(&scratch->map); - scratch_aligned -= offsetof(struct nft_pipapo_scratch, map); - align_off = scratch_aligned - (void *)scratch; - - scratch = scratch_aligned; - scratch->align_off = align_off; -#endif *per_cpu_ptr(clone->scratch, i) = scratch; } diff --git a/net/netfilter/nft_set_pipapo.h b/net/netfilter/nft_set_pipapo.h index 4a2ff85ce1c4..e10cdbaa65d8 100644 --- a/net/netfilter/nft_set_pipapo.h +++ b/net/netfilter/nft_set_pipapo.h @@ -125,13 +125,11 @@ struct nft_pipapo_field { /** * struct nft_pipapo_scratch - percpu data used for lookup and matching * @map_index: Current working bitmap index, toggled between field matches - * @align_off: Offset to get the originally allocated address - * @map: store partial matching results during lookup + * @__map: store partial matching results during lookup */ struct nft_pipapo_scratch { u8 map_index; - u32 align_off; - unsigned long map[]; + unsigned long __map[]; }; /** diff --git a/net/netfilter/nft_set_pipapo_avx2.c b/net/netfilter/nft_set_pipapo_avx2.c index 028c11724b42..f0d8c796d731 100644 --- a/net/netfilter/nft_set_pipapo_avx2.c +++ b/net/netfilter/nft_set_pipapo_avx2.c @@ -1155,7 +1155,7 @@ struct nft_pipapo_elem *pipapo_get_avx2(const struct nft_pipapo_match *m, { struct nft_pipapo_scratch *scratch; const struct nft_pipapo_field *f; - unsigned long *res, *fill; + unsigned long *res, *fill, *map; bool map_index; int i; @@ -1164,9 +1164,9 @@ struct nft_pipapo_elem *pipapo_get_avx2(const struct nft_pipapo_match *m, return NULL; map_index = scratch->map_index; - - res = scratch->map + (map_index ? m->bsize_max : 0); - fill = scratch->map + (map_index ? 0 : m->bsize_max); + map = NFT_PIPAPO_LT_ALIGN(&scratch->__map[0]); + res = map + (map_index ? m->bsize_max : 0); + fill = map + (map_index ? 0 : m->bsize_max); pipapo_resmap_init_avx2(m, res); From 456010c8b99e65231160d4c706122ac5502fbcff Mon Sep 17 00:00:00 2001 From: Sebastian Andrzej Siewior Date: Mon, 18 Aug 2025 13:02:13 +0200 Subject: [PATCH 0194/1536] netfilter: nft_set_pipapo: Use nested-BH locking for nft_pipapo_scratch nft_pipapo_scratch is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Add a local_lock_t to the data structure and use local_lock_nested_bh() for locking. This change adds only lockdep coverage and does not alter the functional behaviour for !PREEMPT_RT. Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Florian Westphal --- net/netfilter/nft_set_pipapo.c | 5 +++++ net/netfilter/nft_set_pipapo.h | 2 ++ net/netfilter/nft_set_pipapo_avx2.c | 4 ++++ 3 files changed, 11 insertions(+) diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c index 96b7539f5506..b385cfcf886f 100644 --- a/net/netfilter/nft_set_pipapo.c +++ b/net/netfilter/nft_set_pipapo.c @@ -429,6 +429,7 @@ static struct nft_pipapo_elem *pipapo_get_slow(const struct nft_pipapo_match *m, scratch = *raw_cpu_ptr(m->scratch); if (unlikely(!scratch)) goto out; + __local_lock_nested_bh(&scratch->bh_lock); map_index = scratch->map_index; @@ -465,6 +466,7 @@ next_match: last); if (b < 0) { scratch->map_index = map_index; + __local_unlock_nested_bh(&scratch->bh_lock); local_bh_enable(); return NULL; @@ -484,6 +486,7 @@ next_match: * *next* bitmap (not initial) for the next packet. */ scratch->map_index = map_index; + __local_unlock_nested_bh(&scratch->bh_lock); local_bh_enable(); return e; } @@ -498,6 +501,7 @@ next_match: data += NFT_PIPAPO_GROUPS_PADDING(f); } + __local_unlock_nested_bh(&scratch->bh_lock); out: local_bh_enable(); return NULL; @@ -1215,6 +1219,7 @@ static int pipapo_realloc_scratch(struct nft_pipapo_match *clone, } pipapo_free_scratch(clone, i); + local_lock_init(&scratch->bh_lock); *per_cpu_ptr(clone->scratch, i) = scratch; } diff --git a/net/netfilter/nft_set_pipapo.h b/net/netfilter/nft_set_pipapo.h index e10cdbaa65d8..eaab422aa56a 100644 --- a/net/netfilter/nft_set_pipapo.h +++ b/net/netfilter/nft_set_pipapo.h @@ -124,10 +124,12 @@ struct nft_pipapo_field { /** * struct nft_pipapo_scratch - percpu data used for lookup and matching + * @bh_lock: PREEMPT_RT local spinlock * @map_index: Current working bitmap index, toggled between field matches * @__map: store partial matching results during lookup */ struct nft_pipapo_scratch { + local_lock_t bh_lock; u8 map_index; unsigned long __map[]; }; diff --git a/net/netfilter/nft_set_pipapo_avx2.c b/net/netfilter/nft_set_pipapo_avx2.c index f0d8c796d731..29326f3fcaf3 100644 --- a/net/netfilter/nft_set_pipapo_avx2.c +++ b/net/netfilter/nft_set_pipapo_avx2.c @@ -1163,6 +1163,7 @@ struct nft_pipapo_elem *pipapo_get_avx2(const struct nft_pipapo_match *m, if (unlikely(!scratch)) return NULL; + __local_lock_nested_bh(&scratch->bh_lock); map_index = scratch->map_index; map = NFT_PIPAPO_LT_ALIGN(&scratch->__map[0]); res = map + (map_index ? m->bsize_max : 0); @@ -1228,6 +1229,7 @@ next_match: if (ret < 0) { scratch->map_index = map_index; kernel_fpu_end(); + __local_unlock_nested_bh(&scratch->bh_lock); return NULL; } @@ -1241,6 +1243,7 @@ next_match: scratch->map_index = map_index; kernel_fpu_end(); + __local_unlock_nested_bh(&scratch->bh_lock); return e; } @@ -1250,6 +1253,7 @@ next_match: } kernel_fpu_end(); + __local_unlock_nested_bh(&scratch->bh_lock); return NULL; } From 8f2c72f2252cf228879de0224d5055470fc20c06 Mon Sep 17 00:00:00 2001 From: Pengtao He Date: Tue, 19 Aug 2025 10:15:51 +0800 Subject: [PATCH 0195/1536] net: avoid one loop iteration in __skb_splice_bits If *len is equal to 0 at the beginning of __splice_segment it returns true directly. But when decreasing *len from a positive number to 0 in __splice_segment, it returns false. The __skb_splice_bits needs to call __splice_segment again. Recheck *len if it changes, return true in time. Reduce unnecessary calls to __splice_segment. Signed-off-by: Pengtao He Reviewed-by: Willem de Bruijn Link: https://patch.msgid.link/20250819021551.8361-1-hept.hept.hept@gmail.com Signed-off-by: Jakub Kicinski --- net/core/skbuff.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index ee0274417948..23b776cd9879 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -3112,7 +3112,9 @@ static bool __splice_segment(struct page *page, unsigned int poff, poff += flen; plen -= flen; *len -= flen; - } while (*len && plen); + if (!*len) + return true; + } while (plen); return false; } From 6b4b1d577e1fe8a4e436a250e098b5c247d978d9 Mon Sep 17 00:00:00 2001 From: Alex Tran Date: Mon, 18 Aug 2025 19:52:27 -0700 Subject: [PATCH 0196/1536] selftests/net/socket.c: removed warnings from unused returns MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit socket.c: In function ‘run_tests’: socket.c:59:25: warning: ignoring return value of ‘strerror_r’ \ declared with attribute ‘warn_unused_result’ [-Wunused-result] 59 | strerror_r(-s->expect, err_string1, ERR_STRING_SZ); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ socket.c:60:25: warning: ignoring return value of ‘strerror_r’ \ declared with attribute ‘warn_unused_result’ [-Wunused-result] 60 | strerror_r(errno, err_string2, ERR_STRING_SZ); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ socket.c:73:33: warning: ignoring return value of ‘strerror_r’ \ declared with attribute ‘warn_unused_result’ [-Wunused-result] 73 | strerror_r(errno, err_string1, ERR_STRING_SZ); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ changelog: v2 - const char* messages and fixed patch warnings of max 75 chars per line Signed-off-by: Alex Tran Link: https://patch.msgid.link/20250819025227.239885-1-alex.t.tran@gmail.com Signed-off-by: Jakub Kicinski --- tools/testing/selftests/net/socket.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/tools/testing/selftests/net/socket.c b/tools/testing/selftests/net/socket.c index db1aeb8c5d1e..be1080003c61 100644 --- a/tools/testing/selftests/net/socket.c +++ b/tools/testing/selftests/net/socket.c @@ -39,6 +39,7 @@ static int run_tests(void) { char err_string1[ERR_STRING_SZ]; char err_string2[ERR_STRING_SZ]; + const char *msg1, *msg2; int i, err; err = 0; @@ -56,13 +57,13 @@ static int run_tests(void) errno == -s->expect) continue; - strerror_r(-s->expect, err_string1, ERR_STRING_SZ); - strerror_r(errno, err_string2, ERR_STRING_SZ); + msg1 = strerror_r(-s->expect, err_string1, ERR_STRING_SZ); + msg2 = strerror_r(errno, err_string2, ERR_STRING_SZ); fprintf(stderr, "socket(%d, %d, %d) expected " "err (%s) got (%s)\n", s->domain, s->type, s->protocol, - err_string1, err_string2); + msg1, msg2); err = -1; break; @@ -70,12 +71,12 @@ static int run_tests(void) close(fd); if (s->expect < 0) { - strerror_r(errno, err_string1, ERR_STRING_SZ); + msg1 = strerror_r(errno, err_string1, ERR_STRING_SZ); fprintf(stderr, "socket(%d, %d, %d) expected " "success got err (%s)\n", s->domain, s->type, s->protocol, - err_string1); + msg1); err = -1; break; From eacb6e408dc861fb93baf10d6837204a3b39e915 Mon Sep 17 00:00:00 2001 From: Hangbin Liu Date: Tue, 19 Aug 2025 07:33:48 +0000 Subject: [PATCH 0197/1536] selftests: net: bpf_offload: print loaded programs on mismatch The test sometimes fails due to an unexpected number of loaded programs. e.g FAIL: 2 BPF programs loaded, expected 1 File "/usr/libexec/kselftests/net/./bpf_offload.py", line 940, in progs = bpftool_prog_list(expected=1) File "/usr/libexec/kselftests/net/./bpf_offload.py", line 187, in bpftool_prog_list fail(True, "%d BPF programs loaded, expected %d" % File "/usr/libexec/kselftests/net/./bpf_offload.py", line 89, in fail tb = "".join(traceback.extract_stack().format()) However, the logs do not show which programs were actually loaded, making it difficult to debug the failure. Add printing of the loaded programs when a mismatch is detected to help troubleshoot such errors. The list is printed on a new line to avoid breaking the current log format. Signed-off-by: Hangbin Liu Link: https://patch.msgid.link/20250819073348.387972-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski --- tools/testing/selftests/net/bpf_offload.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/net/bpf_offload.py b/tools/testing/selftests/net/bpf_offload.py index b2c271b79240..c856d266c8f3 100755 --- a/tools/testing/selftests/net/bpf_offload.py +++ b/tools/testing/selftests/net/bpf_offload.py @@ -184,8 +184,8 @@ def bpftool_prog_list(expected=None, ns="", exclude_orphaned=True): progs = [ p for p in progs if not p['orphaned'] ] if expected is not None: if len(progs) != expected: - fail(True, "%d BPF programs loaded, expected %d" % - (len(progs), expected)) + fail(True, "%d BPF programs loaded, expected %d\nLoaded Progs:\n%s" % + (len(progs), expected, pp.pformat(progs))) return progs def bpftool_map_list(expected=None, ns=""): From 781bf2cc0616d44cc4a63bab3664a54b221bdc14 Mon Sep 17 00:00:00 2001 From: Hangbin Liu Date: Tue, 19 Aug 2025 07:47:49 +0000 Subject: [PATCH 0198/1536] selftests: rtnetlink: print device info on preferred_lft test failure Even with slowwait used to avoid system sleep in the preferred_lft test, failures can still occur after long runtimes. Print the device address info when the test fails to provide better troubleshooting data. Signed-off-by: Hangbin Liu Link: https://patch.msgid.link/20250819074749.388064-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski --- tools/testing/selftests/net/rtnetlink.sh | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/testing/selftests/net/rtnetlink.sh b/tools/testing/selftests/net/rtnetlink.sh index d6c00efeb664..91b0f6cae04d 100755 --- a/tools/testing/selftests/net/rtnetlink.sh +++ b/tools/testing/selftests/net/rtnetlink.sh @@ -313,6 +313,8 @@ kci_test_addrlft() slowwait 5 check_addr_not_exist "$devdummy" "10.23.11." if [ $? -eq 1 ]; then + # troubleshoot the reason for our failure + run_cmd ip addr show dev "$devdummy" check_err 1 end_test "FAIL: preferred_lft addresses remaining" return From 5f8a4f34f6dcf4b64c3cbcfd4aa5a8d7dbcd268d Mon Sep 17 00:00:00 2001 From: Michael Chan Date: Tue, 19 Aug 2025 09:39:15 -0700 Subject: [PATCH 0199/1536] bnxt_en: hsi: Update FW interface to 1.10.3.133 The major change is struct pcie_ctx_hw_stats_v2 which has new latency histograms added. Signed-off-by: Michael Chan Link: https://patch.msgid.link/20250819163919.104075-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski --- include/linux/bnxt/hsi.h | 315 +++++++++++++++++++++++++++++++-------- 1 file changed, 253 insertions(+), 62 deletions(-) diff --git a/include/linux/bnxt/hsi.h b/include/linux/bnxt/hsi.h index 549231703bce..8c5dac3b3ef3 100644 --- a/include/linux/bnxt/hsi.h +++ b/include/linux/bnxt/hsi.h @@ -276,6 +276,10 @@ struct cmd_nums { #define HWRM_REG_POWER_QUERY 0xe1UL #define HWRM_CORE_FREQUENCY_QUERY 0xe2UL #define HWRM_REG_POWER_HISTOGRAM 0xe3UL + #define HWRM_MONITOR_PAX_HISTOGRAM_START 0xe4UL + #define HWRM_MONITOR_PAX_HISTOGRAM_COLLECT 0xe5UL + #define HWRM_STAT_QUERY_ROCE_STATS 0xe6UL + #define HWRM_STAT_QUERY_ROCE_STATS_EXT 0xe7UL #define HWRM_WOL_FILTER_ALLOC 0xf0UL #define HWRM_WOL_FILTER_FREE 0xf1UL #define HWRM_WOL_FILTER_QCFG 0xf2UL @@ -407,9 +411,8 @@ struct cmd_nums { #define HWRM_FUNC_LAG_UPDATE 0x1b1UL #define HWRM_FUNC_LAG_FREE 0x1b2UL #define HWRM_FUNC_LAG_QCFG 0x1b3UL - #define HWRM_FUNC_TIMEDTX_PACING_RATE_ADD 0x1c2UL - #define HWRM_FUNC_TIMEDTX_PACING_RATE_DELETE 0x1c3UL - #define HWRM_FUNC_TIMEDTX_PACING_RATE_QUERY 0x1c4UL + #define HWRM_FUNC_TTX_PACING_RATE_PROF_QUERY 0x1c3UL + #define HWRM_FUNC_TTX_PACING_RATE_QUERY 0x1c4UL #define HWRM_SELFTEST_QLIST 0x200UL #define HWRM_SELFTEST_EXEC 0x201UL #define HWRM_SELFTEST_IRQ 0x202UL @@ -441,6 +444,7 @@ struct cmd_nums { #define HWRM_MFG_WRITE_CERT_NVM 0x21cUL #define HWRM_PORT_POE_CFG 0x230UL #define HWRM_PORT_POE_QCFG 0x231UL + #define HWRM_PORT_PHY_FDRSTAT 0x232UL #define HWRM_UDCC_QCAPS 0x258UL #define HWRM_UDCC_CFG 0x259UL #define HWRM_UDCC_QCFG 0x25aUL @@ -453,6 +457,8 @@ struct cmd_nums { #define HWRM_QUEUE_PFCWD_TIMEOUT_QCAPS 0x261UL #define HWRM_QUEUE_PFCWD_TIMEOUT_CFG 0x262UL #define HWRM_QUEUE_PFCWD_TIMEOUT_QCFG 0x263UL + #define HWRM_QUEUE_ADPTV_QOS_RX_QCFG 0x264UL + #define HWRM_QUEUE_ADPTV_QOS_TX_QCFG 0x265UL #define HWRM_TF 0x2bcUL #define HWRM_TF_VERSION_GET 0x2bdUL #define HWRM_TF_SESSION_OPEN 0x2c6UL @@ -551,6 +557,8 @@ struct cmd_nums { #define HWRM_DBG_COREDUMP_CAPTURE 0xff2cUL #define HWRM_DBG_PTRACE 0xff2dUL #define HWRM_DBG_SIM_CABLE_STATE 0xff2eUL + #define HWRM_DBG_TOKEN_QUERY_AUTH_IDS 0xff2fUL + #define HWRM_DBG_TOKEN_CFG 0xff30UL #define HWRM_NVM_GET_VPD_FIELD_INFO 0xffeaUL #define HWRM_NVM_SET_VPD_FIELD_INFO 0xffebUL #define HWRM_NVM_DEFRAG 0xffecUL @@ -632,8 +640,8 @@ struct hwrm_err_output { #define HWRM_VERSION_MAJOR 1 #define HWRM_VERSION_MINOR 10 #define HWRM_VERSION_UPDATE 3 -#define HWRM_VERSION_RSVD 97 -#define HWRM_VERSION_STR "1.10.3.97" +#define HWRM_VERSION_RSVD 133 +#define HWRM_VERSION_STR "1.10.3.133" /* hwrm_ver_get_input (size:192b/24B) */ struct hwrm_ver_get_input { @@ -688,6 +696,7 @@ struct hwrm_ver_get_output { #define VER_GET_RESP_DEV_CAPS_CFG_CFA_TRUFLOW_SUPPORTED 0x4000UL #define VER_GET_RESP_DEV_CAPS_CFG_SECURE_BOOT_CAPABLE 0x8000UL #define VER_GET_RESP_DEV_CAPS_CFG_SECURE_SOC_CAPABLE 0x10000UL + #define VER_GET_RESP_DEV_CAPS_CFG_DEBUG_TOKEN_SUPPORTED 0x20000UL u8 roce_fw_maj_8b; u8 roce_fw_min_8b; u8 roce_fw_bld_8b; @@ -872,7 +881,8 @@ struct hwrm_async_event_cmpl { #define ASYNC_EVENT_CMPL_EVENT_ID_REPRESENTOR_PAIR_CHANGE 0x4eUL #define ASYNC_EVENT_CMPL_EVENT_ID_VF_STAT_CHANGE 0x4fUL #define ASYNC_EVENT_CMPL_EVENT_ID_HOST_COREDUMP 0x50UL - #define ASYNC_EVENT_CMPL_EVENT_ID_MAX_RGTR_EVENT_ID 0x51UL + #define ASYNC_EVENT_CMPL_EVENT_ID_ADPTV_QOS 0x51UL + #define ASYNC_EVENT_CMPL_EVENT_ID_MAX_RGTR_EVENT_ID 0x52UL #define ASYNC_EVENT_CMPL_EVENT_ID_FW_TRACE_MSG 0xfeUL #define ASYNC_EVENT_CMPL_EVENT_ID_HWRM_ERROR 0xffUL #define ASYNC_EVENT_CMPL_EVENT_ID_LAST ASYNC_EVENT_CMPL_EVENT_ID_HWRM_ERROR @@ -1344,7 +1354,8 @@ struct hwrm_async_event_cmpl_dbg_buf_producer { #define ASYNC_EVENT_CMPL_DBG_BUF_PRODUCER_EVENT_DATA1_TYPE_CA2_TRACE 0x9UL #define ASYNC_EVENT_CMPL_DBG_BUF_PRODUCER_EVENT_DATA1_TYPE_RIGP1_TRACE 0xaUL #define ASYNC_EVENT_CMPL_DBG_BUF_PRODUCER_EVENT_DATA1_TYPE_AFM_KONG_HWRM_TRACE 0xbUL - #define ASYNC_EVENT_CMPL_DBG_BUF_PRODUCER_EVENT_DATA1_TYPE_LAST ASYNC_EVENT_CMPL_DBG_BUF_PRODUCER_EVENT_DATA1_TYPE_AFM_KONG_HWRM_TRACE + #define ASYNC_EVENT_CMPL_DBG_BUF_PRODUCER_EVENT_DATA1_TYPE_ERR_QPC_TRACE 0xcUL + #define ASYNC_EVENT_CMPL_DBG_BUF_PRODUCER_EVENT_DATA1_TYPE_LAST ASYNC_EVENT_CMPL_DBG_BUF_PRODUCER_EVENT_DATA1_TYPE_ERR_QPC_TRACE }; /* hwrm_async_event_cmpl_hwrm_error (size:128b/16B) */ @@ -1401,7 +1412,11 @@ struct hwrm_async_event_cmpl_error_report_base { #define ASYNC_EVENT_CMPL_ERROR_REPORT_BASE_EVENT_DATA1_ERROR_TYPE_DOORBELL_DROP_THRESHOLD 0x4UL #define ASYNC_EVENT_CMPL_ERROR_REPORT_BASE_EVENT_DATA1_ERROR_TYPE_THERMAL_THRESHOLD 0x5UL #define ASYNC_EVENT_CMPL_ERROR_REPORT_BASE_EVENT_DATA1_ERROR_TYPE_DUAL_DATA_RATE_NOT_SUPPORTED 0x6UL - #define ASYNC_EVENT_CMPL_ERROR_REPORT_BASE_EVENT_DATA1_ERROR_TYPE_LAST ASYNC_EVENT_CMPL_ERROR_REPORT_BASE_EVENT_DATA1_ERROR_TYPE_DUAL_DATA_RATE_NOT_SUPPORTED + #define ASYNC_EVENT_CMPL_ERROR_REPORT_BASE_EVENT_DATA1_ERROR_TYPE_DUP_UDCC_SES 0x7UL + #define ASYNC_EVENT_CMPL_ERROR_REPORT_BASE_EVENT_DATA1_ERROR_TYPE_DB_DROP 0x8UL + #define ASYNC_EVENT_CMPL_ERROR_REPORT_BASE_EVENT_DATA1_ERROR_TYPE_MD_TEMP 0x9UL + #define ASYNC_EVENT_CMPL_ERROR_REPORT_BASE_EVENT_DATA1_ERROR_TYPE_VNIC_ERR 0xaUL + #define ASYNC_EVENT_CMPL_ERROR_REPORT_BASE_EVENT_DATA1_ERROR_TYPE_LAST ASYNC_EVENT_CMPL_ERROR_REPORT_BASE_EVENT_DATA1_ERROR_TYPE_VNIC_ERR }; /* hwrm_async_event_cmpl_error_report_pause_storm (size:128b/16B) */ @@ -1914,6 +1929,12 @@ struct hwrm_func_qcaps_output { #define FUNC_QCAPS_RESP_FLAGS_EXT3_RX_RATE_PROFILE_SEL_SUPPORTED 0x8UL #define FUNC_QCAPS_RESP_FLAGS_EXT3_BIDI_OPT_SUPPORTED 0x10UL #define FUNC_QCAPS_RESP_FLAGS_EXT3_MIRROR_ON_ROCE_SUPPORTED 0x20UL + #define FUNC_QCAPS_RESP_FLAGS_EXT3_ROCE_VF_DYN_ALLOC_SUPPORT 0x40UL + #define FUNC_QCAPS_RESP_FLAGS_EXT3_CHANGE_UDP_SRCPORT_SUPPORT 0x80UL + #define FUNC_QCAPS_RESP_FLAGS_EXT3_PCIE_COMPLIANCE_SUPPORTED 0x100UL + #define FUNC_QCAPS_RESP_FLAGS_EXT3_MULTI_L2_DB_SUPPORTED 0x200UL + #define FUNC_QCAPS_RESP_FLAGS_EXT3_PCIE_SECURE_ATS_SUPPORTED 0x400UL + #define FUNC_QCAPS_RESP_FLAGS_EXT3_MBUF_STATS_SUPPORTED 0x800UL __le16 max_roce_vfs; __le16 max_crypto_rx_flow_filters; u8 unused_3[3]; @@ -1931,7 +1952,7 @@ struct hwrm_func_qcfg_input { u8 unused_0[6]; }; -/* hwrm_func_qcfg_output (size:1344b/168B) */ +/* hwrm_func_qcfg_output (size:1408b/176B) */ struct hwrm_func_qcfg_output { __le16 error_code; __le16 req_type; @@ -2124,7 +2145,43 @@ struct hwrm_func_qcfg_output { #define FUNC_QCFG_RESP_XID_PARTITION_CFG_TX_CK 0x1UL #define FUNC_QCFG_RESP_XID_PARTITION_CFG_RX_CK 0x2UL __le16 mirror_vnic_id; - u8 unused_7[7]; + u8 max_link_width; + #define FUNC_QCFG_RESP_MAX_LINK_WIDTH_UNKNOWN 0x0UL + #define FUNC_QCFG_RESP_MAX_LINK_WIDTH_X1 0x1UL + #define FUNC_QCFG_RESP_MAX_LINK_WIDTH_X2 0x2UL + #define FUNC_QCFG_RESP_MAX_LINK_WIDTH_X4 0x4UL + #define FUNC_QCFG_RESP_MAX_LINK_WIDTH_X8 0x8UL + #define FUNC_QCFG_RESP_MAX_LINK_WIDTH_X16 0x10UL + #define FUNC_QCFG_RESP_MAX_LINK_WIDTH_LAST FUNC_QCFG_RESP_MAX_LINK_WIDTH_X16 + u8 max_link_speed; + #define FUNC_QCFG_RESP_MAX_LINK_SPEED_UNKNOWN 0x0UL + #define FUNC_QCFG_RESP_MAX_LINK_SPEED_G1 0x1UL + #define FUNC_QCFG_RESP_MAX_LINK_SPEED_G2 0x2UL + #define FUNC_QCFG_RESP_MAX_LINK_SPEED_G3 0x3UL + #define FUNC_QCFG_RESP_MAX_LINK_SPEED_G4 0x4UL + #define FUNC_QCFG_RESP_MAX_LINK_SPEED_G5 0x5UL + #define FUNC_QCFG_RESP_MAX_LINK_SPEED_LAST FUNC_QCFG_RESP_MAX_LINK_SPEED_G5 + u8 negotiated_link_width; + #define FUNC_QCFG_RESP_NEGOTIATED_LINK_WIDTH_UNKNOWN 0x0UL + #define FUNC_QCFG_RESP_NEGOTIATED_LINK_WIDTH_X1 0x1UL + #define FUNC_QCFG_RESP_NEGOTIATED_LINK_WIDTH_X2 0x2UL + #define FUNC_QCFG_RESP_NEGOTIATED_LINK_WIDTH_X4 0x4UL + #define FUNC_QCFG_RESP_NEGOTIATED_LINK_WIDTH_X8 0x8UL + #define FUNC_QCFG_RESP_NEGOTIATED_LINK_WIDTH_X16 0x10UL + #define FUNC_QCFG_RESP_NEGOTIATED_LINK_WIDTH_LAST FUNC_QCFG_RESP_NEGOTIATED_LINK_WIDTH_X16 + u8 negotiated_link_speed; + #define FUNC_QCFG_RESP_NEGOTIATED_LINK_SPEED_UNKNOWN 0x0UL + #define FUNC_QCFG_RESP_NEGOTIATED_LINK_SPEED_G1 0x1UL + #define FUNC_QCFG_RESP_NEGOTIATED_LINK_SPEED_G2 0x2UL + #define FUNC_QCFG_RESP_NEGOTIATED_LINK_SPEED_G3 0x3UL + #define FUNC_QCFG_RESP_NEGOTIATED_LINK_SPEED_G4 0x4UL + #define FUNC_QCFG_RESP_NEGOTIATED_LINK_SPEED_G5 0x5UL + #define FUNC_QCFG_RESP_NEGOTIATED_LINK_SPEED_LAST FUNC_QCFG_RESP_NEGOTIATED_LINK_SPEED_G5 + u8 unused_7[2]; + u8 pcie_compliance; + u8 unused_8; + __le16 l2_db_multi_page_size_kb; + u8 unused_9[5]; u8 valid; }; @@ -2322,6 +2379,7 @@ struct hwrm_func_cfg_input { #define FUNC_CFG_REQ_ENABLES2_ROCE_MAX_GID_PER_VF 0x200UL #define FUNC_CFG_REQ_ENABLES2_XID_PARTITION_CFG 0x400UL #define FUNC_CFG_REQ_ENABLES2_PHYSICAL_SLOT_NUMBER 0x800UL + #define FUNC_CFG_REQ_ENABLES2_PCIE_COMPLIANCE 0x1000UL u8 port_kdnet_mode; #define FUNC_CFG_REQ_PORT_KDNET_MODE_DISABLED 0x0UL #define FUNC_CFG_REQ_PORT_KDNET_MODE_ENABLED 0x1UL @@ -2353,7 +2411,8 @@ struct hwrm_func_cfg_input { __le16 xid_partition_cfg; #define FUNC_CFG_REQ_XID_PARTITION_CFG_TX_CK 0x1UL #define FUNC_CFG_REQ_XID_PARTITION_CFG_RX_CK 0x2UL - __le16 unused_2; + u8 pcie_compliance; + u8 unused_2; }; /* hwrm_func_cfg_output (size:128b/16B) */ @@ -2370,11 +2429,41 @@ struct hwrm_func_cfg_output { struct hwrm_func_cfg_cmd_err { u8 code; #define FUNC_CFG_CMD_ERR_CODE_UNKNOWN 0x0UL - #define FUNC_CFG_CMD_ERR_CODE_PARTITION_MIN_BW_RANGE 0x1UL - #define FUNC_CFG_CMD_ERR_CODE_PARTITION_MIN_MORE_THAN_MAX 0x2UL - #define FUNC_CFG_CMD_ERR_CODE_PARTITION_MIN_BW_UNSUPPORTED 0x3UL - #define FUNC_CFG_CMD_ERR_CODE_PARTITION_BW_PERCENT 0x4UL - #define FUNC_CFG_CMD_ERR_CODE_LAST FUNC_CFG_CMD_ERR_CODE_PARTITION_BW_PERCENT + #define FUNC_CFG_CMD_ERR_CODE_PARTITION_BW_OUT_OF_RANGE 0x1UL + #define FUNC_CFG_CMD_ERR_CODE_NPAR_PARTITION_DOWN_FAILED 0x2UL + #define FUNC_CFG_CMD_ERR_CODE_TPID_SET_DFLT_VLAN_NOT_SET 0x3UL + #define FUNC_CFG_CMD_ERR_CODE_RES_ARRAY_ALLOC_FAILED 0x4UL + #define FUNC_CFG_CMD_ERR_CODE_TX_RING_ASSET_TEST_FAILED 0x5UL + #define FUNC_CFG_CMD_ERR_CODE_TX_RING_RES_UPDATE_FAILED 0x6UL + #define FUNC_CFG_CMD_ERR_CODE_APPLY_MAX_BW_FAILED 0x7UL + #define FUNC_CFG_CMD_ERR_CODE_ENABLE_EVB_FAILED 0x8UL + #define FUNC_CFG_CMD_ERR_CODE_RSS_CTXT_ASSET_TEST_FAILED 0x9UL + #define FUNC_CFG_CMD_ERR_CODE_RSS_CTXT_RES_UPDATE_FAILED 0xaUL + #define FUNC_CFG_CMD_ERR_CODE_CMPL_RING_ASSET_TEST_FAILED 0xbUL + #define FUNC_CFG_CMD_ERR_CODE_CMPL_RING_RES_UPDATE_FAILED 0xcUL + #define FUNC_CFG_CMD_ERR_CODE_NQ_ASSET_TEST_FAILED 0xdUL + #define FUNC_CFG_CMD_ERR_CODE_NQ_RES_UPDATE_FAILED 0xeUL + #define FUNC_CFG_CMD_ERR_CODE_RX_RING_ASSET_TEST_FAILED 0xfUL + #define FUNC_CFG_CMD_ERR_CODE_RX_RING_RES_UPDATE_FAILED 0x10UL + #define FUNC_CFG_CMD_ERR_CODE_VNIC_ASSET_TEST_FAILED 0x11UL + #define FUNC_CFG_CMD_ERR_CODE_VNIC_RES_UPDATE_FAILED 0x12UL + #define FUNC_CFG_CMD_ERR_CODE_FAILED_TO_START_STATS_THREAD 0x13UL + #define FUNC_CFG_CMD_ERR_CODE_RDMA_SRIOV_DISABLED 0x14UL + #define FUNC_CFG_CMD_ERR_CODE_TX_KTLS_DISABLED 0x15UL + #define FUNC_CFG_CMD_ERR_CODE_TX_KTLS_ASSET_TEST_FAILED 0x16UL + #define FUNC_CFG_CMD_ERR_CODE_TX_KTLS_RES_UPDATE_FAILED 0x17UL + #define FUNC_CFG_CMD_ERR_CODE_RX_KTLS_DISABLED 0x18UL + #define FUNC_CFG_CMD_ERR_CODE_RX_KTLS_ASSET_TEST_FAILED 0x19UL + #define FUNC_CFG_CMD_ERR_CODE_RX_KTLS_RES_UPDATE_FAILED 0x1aUL + #define FUNC_CFG_CMD_ERR_CODE_TX_QUIC_DISABLED 0x1bUL + #define FUNC_CFG_CMD_ERR_CODE_TX_QUIC_ASSET_TEST_FAILED 0x1cUL + #define FUNC_CFG_CMD_ERR_CODE_TX_QUIC_RES_UPDATE_FAILED 0x1dUL + #define FUNC_CFG_CMD_ERR_CODE_RX_QUIC_DISABLED 0x1eUL + #define FUNC_CFG_CMD_ERR_CODE_RX_QUIC_ASSET_TEST_FAILED 0x1fUL + #define FUNC_CFG_CMD_ERR_CODE_RX_QUIC_RES_UPDATE_FAILED 0x20UL + #define FUNC_CFG_CMD_ERR_CODE_INVALID_KDNET_MODE 0x21UL + #define FUNC_CFG_CMD_ERR_CODE_SCHQ_CFG_FAIL 0x22UL + #define FUNC_CFG_CMD_ERR_CODE_LAST FUNC_CFG_CMD_ERR_CODE_SCHQ_CFG_FAIL u8 unused_0[7]; }; @@ -3780,6 +3869,7 @@ struct hwrm_func_backing_store_cfg_v2_input { #define FUNC_BACKING_STORE_CFG_V2_REQ_TYPE_CA2_TRACE 0x28UL #define FUNC_BACKING_STORE_CFG_V2_REQ_TYPE_RIGP1_TRACE 0x29UL #define FUNC_BACKING_STORE_CFG_V2_REQ_TYPE_AFM_KONG_HWRM_TRACE 0x2aUL + #define FUNC_BACKING_STORE_CFG_V2_REQ_TYPE_ERR_QPC_TRACE 0x2bUL #define FUNC_BACKING_STORE_CFG_V2_REQ_TYPE_INVALID 0xffffUL #define FUNC_BACKING_STORE_CFG_V2_REQ_TYPE_LAST FUNC_BACKING_STORE_CFG_V2_REQ_TYPE_INVALID __le16 instance; @@ -3865,6 +3955,7 @@ struct hwrm_func_backing_store_qcfg_v2_input { #define FUNC_BACKING_STORE_QCFG_V2_REQ_TYPE_CA2_TRACE 0x28UL #define FUNC_BACKING_STORE_QCFG_V2_REQ_TYPE_RIGP1_TRACE 0x29UL #define FUNC_BACKING_STORE_QCFG_V2_REQ_TYPE_AFM_KONG_HWRM_TRACE 0x2aUL + #define FUNC_BACKING_STORE_QCFG_V2_REQ_TYPE_ERR_QPC_TRACE 0x2bUL #define FUNC_BACKING_STORE_QCFG_V2_REQ_TYPE_INVALID 0xffffUL #define FUNC_BACKING_STORE_QCFG_V2_REQ_TYPE_LAST FUNC_BACKING_STORE_QCFG_V2_REQ_TYPE_INVALID __le16 instance; @@ -3904,6 +3995,7 @@ struct hwrm_func_backing_store_qcfg_v2_output { #define FUNC_BACKING_STORE_QCFG_V2_RESP_TYPE_CA1_TRACE 0x27UL #define FUNC_BACKING_STORE_QCFG_V2_RESP_TYPE_CA2_TRACE 0x28UL #define FUNC_BACKING_STORE_QCFG_V2_RESP_TYPE_RIGP1_TRACE 0x29UL + #define FUNC_BACKING_STORE_QCFG_V2_RESP_TYPE_ERR_QPC_TRACE 0x2aUL #define FUNC_BACKING_STORE_QCFG_V2_RESP_TYPE_INVALID 0xffffUL #define FUNC_BACKING_STORE_QCFG_V2_RESP_TYPE_LAST FUNC_BACKING_STORE_QCFG_V2_RESP_TYPE_INVALID __le16 instance; @@ -4027,6 +4119,7 @@ struct hwrm_func_backing_store_qcaps_v2_input { #define FUNC_BACKING_STORE_QCAPS_V2_REQ_TYPE_CA2_TRACE 0x28UL #define FUNC_BACKING_STORE_QCAPS_V2_REQ_TYPE_RIGP1_TRACE 0x29UL #define FUNC_BACKING_STORE_QCAPS_V2_REQ_TYPE_AFM_KONG_HWRM_TRACE 0x2aUL + #define FUNC_BACKING_STORE_QCAPS_V2_REQ_TYPE_ERR_QPC_TRACE 0x2bUL #define FUNC_BACKING_STORE_QCAPS_V2_REQ_TYPE_INVALID 0xffffUL #define FUNC_BACKING_STORE_QCAPS_V2_REQ_TYPE_LAST FUNC_BACKING_STORE_QCAPS_V2_REQ_TYPE_INVALID u8 rsvd[6]; @@ -4070,6 +4163,7 @@ struct hwrm_func_backing_store_qcaps_v2_output { #define FUNC_BACKING_STORE_QCAPS_V2_RESP_TYPE_CA2_TRACE 0x28UL #define FUNC_BACKING_STORE_QCAPS_V2_RESP_TYPE_RIGP1_TRACE 0x29UL #define FUNC_BACKING_STORE_QCAPS_V2_RESP_TYPE_AFM_KONG_HWRM_TRACE 0x2aUL + #define FUNC_BACKING_STORE_QCAPS_V2_RESP_TYPE_ERR_QPC_TRACE 0x2bUL #define FUNC_BACKING_STORE_QCAPS_V2_RESP_TYPE_INVALID 0xffffUL #define FUNC_BACKING_STORE_QCAPS_V2_RESP_TYPE_LAST FUNC_BACKING_STORE_QCAPS_V2_RESP_TYPE_INVALID __le16 entry_size; @@ -4216,6 +4310,10 @@ struct hwrm_port_phy_cfg_input { #define PORT_PHY_CFG_REQ_FLAGS_FEC_RS272_1XN_DISABLE 0x100000UL #define PORT_PHY_CFG_REQ_FLAGS_FEC_RS272_IEEE_ENABLE 0x200000UL #define PORT_PHY_CFG_REQ_FLAGS_FEC_RS272_IEEE_DISABLE 0x400000UL + #define PORT_PHY_CFG_REQ_FLAGS_LINK_TRAINING_ENABLE 0x800000UL + #define PORT_PHY_CFG_REQ_FLAGS_LINK_TRAINING_DISABLE 0x1000000UL + #define PORT_PHY_CFG_REQ_FLAGS_PRECODING_ENABLE 0x2000000UL + #define PORT_PHY_CFG_REQ_FLAGS_PRECODING_DISABLE 0x4000000UL __le32 enables; #define PORT_PHY_CFG_REQ_ENABLES_AUTO_MODE 0x1UL #define PORT_PHY_CFG_REQ_ENABLES_AUTO_DUPLEX 0x2UL @@ -4703,6 +4801,8 @@ struct hwrm_port_phy_qcfg_output { #define PORT_PHY_QCFG_RESP_OPTION_FLAGS_MEDIA_AUTO_DETECT 0x1UL #define PORT_PHY_QCFG_RESP_OPTION_FLAGS_SIGNAL_MODE_KNOWN 0x2UL #define PORT_PHY_QCFG_RESP_OPTION_FLAGS_SPEEDS2_SUPPORTED 0x4UL + #define PORT_PHY_QCFG_RESP_OPTION_FLAGS_LINK_TRAINING 0x8UL + #define PORT_PHY_QCFG_RESP_OPTION_FLAGS_PRECODING 0x10UL char phy_vendor_name[16]; char phy_vendor_partnumber[16]; __le16 support_pam4_speeds; @@ -4725,6 +4825,10 @@ struct hwrm_port_phy_qcfg_output { u8 link_down_reason; #define PORT_PHY_QCFG_RESP_LINK_DOWN_REASON_RF 0x1UL #define PORT_PHY_QCFG_RESP_LINK_DOWN_REASON_OTP_SPEED_VIOLATION 0x2UL + #define PORT_PHY_QCFG_RESP_LINK_DOWN_REASON_CABLE_REMOVED 0x4UL + #define PORT_PHY_QCFG_RESP_LINK_DOWN_REASON_MODULE_FAULT 0x8UL + #define PORT_PHY_QCFG_RESP_LINK_DOWN_REASON_BMC_REQUEST 0x10UL + #define PORT_PHY_QCFG_RESP_LINK_DOWN_REASON_TX_LASER_DISABLED 0x20UL __le16 support_speeds2; #define PORT_PHY_QCFG_RESP_SUPPORT_SPEEDS2_1GB 0x1UL #define PORT_PHY_QCFG_RESP_SUPPORT_SPEEDS2_10GB 0x2UL @@ -5882,9 +5986,10 @@ struct hwrm_port_led_qcaps_output { #define PORT_LED_QCAPS_RESP_LED0_STATE_CAPS_BLINK_SUPPORTED 0x8UL #define PORT_LED_QCAPS_RESP_LED0_STATE_CAPS_BLINK_ALT_SUPPORTED 0x10UL __le16 led0_color_caps; - #define PORT_LED_QCAPS_RESP_LED0_COLOR_CAPS_RSVD 0x1UL - #define PORT_LED_QCAPS_RESP_LED0_COLOR_CAPS_AMBER_SUPPORTED 0x2UL - #define PORT_LED_QCAPS_RESP_LED0_COLOR_CAPS_GREEN_SUPPORTED 0x4UL + #define PORT_LED_QCAPS_RESP_LED0_COLOR_CAPS_RSVD 0x1UL + #define PORT_LED_QCAPS_RESP_LED0_COLOR_CAPS_AMBER_SUPPORTED 0x2UL + #define PORT_LED_QCAPS_RESP_LED0_COLOR_CAPS_GREEN_SUPPORTED 0x4UL + #define PORT_LED_QCAPS_RESP_LED0_COLOR_CAPS_GRNAMB_SUPPORTED 0x8UL u8 led1_id; u8 led1_type; #define PORT_LED_QCAPS_RESP_LED1_TYPE_SPEED 0x0UL @@ -5900,9 +6005,10 @@ struct hwrm_port_led_qcaps_output { #define PORT_LED_QCAPS_RESP_LED1_STATE_CAPS_BLINK_SUPPORTED 0x8UL #define PORT_LED_QCAPS_RESP_LED1_STATE_CAPS_BLINK_ALT_SUPPORTED 0x10UL __le16 led1_color_caps; - #define PORT_LED_QCAPS_RESP_LED1_COLOR_CAPS_RSVD 0x1UL - #define PORT_LED_QCAPS_RESP_LED1_COLOR_CAPS_AMBER_SUPPORTED 0x2UL - #define PORT_LED_QCAPS_RESP_LED1_COLOR_CAPS_GREEN_SUPPORTED 0x4UL + #define PORT_LED_QCAPS_RESP_LED1_COLOR_CAPS_RSVD 0x1UL + #define PORT_LED_QCAPS_RESP_LED1_COLOR_CAPS_AMBER_SUPPORTED 0x2UL + #define PORT_LED_QCAPS_RESP_LED1_COLOR_CAPS_GREEN_SUPPORTED 0x4UL + #define PORT_LED_QCAPS_RESP_LED1_COLOR_CAPS_GRNAMB_SUPPORTED 0x8UL u8 led2_id; u8 led2_type; #define PORT_LED_QCAPS_RESP_LED2_TYPE_SPEED 0x0UL @@ -5918,9 +6024,10 @@ struct hwrm_port_led_qcaps_output { #define PORT_LED_QCAPS_RESP_LED2_STATE_CAPS_BLINK_SUPPORTED 0x8UL #define PORT_LED_QCAPS_RESP_LED2_STATE_CAPS_BLINK_ALT_SUPPORTED 0x10UL __le16 led2_color_caps; - #define PORT_LED_QCAPS_RESP_LED2_COLOR_CAPS_RSVD 0x1UL - #define PORT_LED_QCAPS_RESP_LED2_COLOR_CAPS_AMBER_SUPPORTED 0x2UL - #define PORT_LED_QCAPS_RESP_LED2_COLOR_CAPS_GREEN_SUPPORTED 0x4UL + #define PORT_LED_QCAPS_RESP_LED2_COLOR_CAPS_RSVD 0x1UL + #define PORT_LED_QCAPS_RESP_LED2_COLOR_CAPS_AMBER_SUPPORTED 0x2UL + #define PORT_LED_QCAPS_RESP_LED2_COLOR_CAPS_GREEN_SUPPORTED 0x4UL + #define PORT_LED_QCAPS_RESP_LED2_COLOR_CAPS_GRNAMB_SUPPORTED 0x8UL u8 led3_id; u8 led3_type; #define PORT_LED_QCAPS_RESP_LED3_TYPE_SPEED 0x0UL @@ -5936,9 +6043,10 @@ struct hwrm_port_led_qcaps_output { #define PORT_LED_QCAPS_RESP_LED3_STATE_CAPS_BLINK_SUPPORTED 0x8UL #define PORT_LED_QCAPS_RESP_LED3_STATE_CAPS_BLINK_ALT_SUPPORTED 0x10UL __le16 led3_color_caps; - #define PORT_LED_QCAPS_RESP_LED3_COLOR_CAPS_RSVD 0x1UL - #define PORT_LED_QCAPS_RESP_LED3_COLOR_CAPS_AMBER_SUPPORTED 0x2UL - #define PORT_LED_QCAPS_RESP_LED3_COLOR_CAPS_GREEN_SUPPORTED 0x4UL + #define PORT_LED_QCAPS_RESP_LED3_COLOR_CAPS_RSVD 0x1UL + #define PORT_LED_QCAPS_RESP_LED3_COLOR_CAPS_AMBER_SUPPORTED 0x2UL + #define PORT_LED_QCAPS_RESP_LED3_COLOR_CAPS_GREEN_SUPPORTED 0x4UL + #define PORT_LED_QCAPS_RESP_LED3_COLOR_CAPS_GRNAMB_SUPPORTED 0x8UL u8 unused_4[3]; u8 valid; }; @@ -7036,9 +7144,22 @@ struct hwrm_vnic_rss_cfg_output { /* hwrm_vnic_rss_cfg_cmd_err (size:64b/8B) */ struct hwrm_vnic_rss_cfg_cmd_err { u8 code; - #define VNIC_RSS_CFG_CMD_ERR_CODE_UNKNOWN 0x0UL - #define VNIC_RSS_CFG_CMD_ERR_CODE_INTERFACE_NOT_READY 0x1UL - #define VNIC_RSS_CFG_CMD_ERR_CODE_LAST VNIC_RSS_CFG_CMD_ERR_CODE_INTERFACE_NOT_READY + #define VNIC_RSS_CFG_CMD_ERR_CODE_UNKNOWN 0x0UL + #define VNIC_RSS_CFG_CMD_ERR_CODE_INTERFACE_NOT_READY 0x1UL + #define VNIC_RSS_CFG_CMD_ERR_CODE_UNABLE_TO_GET_RSS_CFG 0x2UL + #define VNIC_RSS_CFG_CMD_ERR_CODE_HASH_TYPE_UNSUPPORTED 0x3UL + #define VNIC_RSS_CFG_CMD_ERR_CODE_HASH_TYPE_ERR 0x4UL + #define VNIC_RSS_CFG_CMD_ERR_CODE_HASH_MODE_FAIL 0x5UL + #define VNIC_RSS_CFG_CMD_ERR_CODE_RING_GRP_TABLE_ALLOC_ERR 0x6UL + #define VNIC_RSS_CFG_CMD_ERR_CODE_HASH_KEY_ALLOC_ERR 0x7UL + #define VNIC_RSS_CFG_CMD_ERR_CODE_DMA_FAILED 0x8UL + #define VNIC_RSS_CFG_CMD_ERR_CODE_RX_RING_ALLOC_ERR 0x9UL + #define VNIC_RSS_CFG_CMD_ERR_CODE_CMPL_RING_ALLOC_ERR 0xaUL + #define VNIC_RSS_CFG_CMD_ERR_CODE_HW_SET_RSS_FAILED 0xbUL + #define VNIC_RSS_CFG_CMD_ERR_CODE_CTX_INVALID 0xcUL + #define VNIC_RSS_CFG_CMD_ERR_CODE_VNIC_INVALID 0xdUL + #define VNIC_RSS_CFG_CMD_ERR_CODE_VNIC_RING_TABLE_PAIR_INVALID 0xeUL + #define VNIC_RSS_CFG_CMD_ERR_CODE_LAST VNIC_RSS_CFG_CMD_ERR_CODE_VNIC_RING_TABLE_PAIR_INVALID u8 unused_0[7]; }; @@ -7177,7 +7298,7 @@ struct hwrm_vnic_rss_cos_lb_ctx_free_output { u8 valid; }; -/* hwrm_ring_alloc_input (size:704b/88B) */ +/* hwrm_ring_alloc_input (size:768b/96B) */ struct hwrm_ring_alloc_input { __le16 req_type; __le16 cmpl_ring; @@ -7195,6 +7316,7 @@ struct hwrm_ring_alloc_input { #define RING_ALLOC_REQ_ENABLES_MPC_CHNLS_TYPE 0x400UL #define RING_ALLOC_REQ_ENABLES_STEERING_TAG_VALID 0x800UL #define RING_ALLOC_REQ_ENABLES_RX_RATE_PROFILE_VALID 0x1000UL + #define RING_ALLOC_REQ_ENABLES_DPI_VALID 0x2000UL u8 ring_type; #define RING_ALLOC_REQ_RING_TYPE_L2_CMPL 0x0UL #define RING_ALLOC_REQ_RING_TYPE_TX 0x1UL @@ -7287,6 +7409,8 @@ struct hwrm_ring_alloc_input { #define RING_ALLOC_REQ_RX_RATE_PROFILE_SEL_LAST RING_ALLOC_REQ_RX_RATE_PROFILE_SEL_POLL_MODE u8 unused_4; __le64 cq_handle; + __le16 dpi; + __le16 unused_5[3]; }; /* hwrm_ring_alloc_output (size:128b/16B) */ @@ -7776,7 +7900,10 @@ struct hwrm_cfa_l2_set_rx_mask_cmd_err { u8 code; #define CFA_L2_SET_RX_MASK_CMD_ERR_CODE_UNKNOWN 0x0UL #define CFA_L2_SET_RX_MASK_CMD_ERR_CODE_NTUPLE_FILTER_CONFLICT_ERR 0x1UL - #define CFA_L2_SET_RX_MASK_CMD_ERR_CODE_LAST CFA_L2_SET_RX_MASK_CMD_ERR_CODE_NTUPLE_FILTER_CONFLICT_ERR + #define CFA_L2_SET_RX_MASK_CMD_ERR_CODE_MAX_VLAN_TAGS 0x2UL + #define CFA_L2_SET_RX_MASK_CMD_ERR_CODE_INVALID_VNIC_ID 0x3UL + #define CFA_L2_SET_RX_MASK_CMD_ERR_CODE_INVALID_ACTION 0x4UL + #define CFA_L2_SET_RX_MASK_CMD_ERR_CODE_LAST CFA_L2_SET_RX_MASK_CMD_ERR_CODE_INVALID_ACTION u8 unused_0[7]; }; @@ -8109,9 +8236,38 @@ struct hwrm_cfa_ntuple_filter_alloc_output { /* hwrm_cfa_ntuple_filter_alloc_cmd_err (size:64b/8B) */ struct hwrm_cfa_ntuple_filter_alloc_cmd_err { u8 code; - #define CFA_NTUPLE_FILTER_ALLOC_CMD_ERR_CODE_UNKNOWN 0x0UL - #define CFA_NTUPLE_FILTER_ALLOC_CMD_ERR_CODE_RX_MASK_VLAN_CONFLICT_ERR 0x1UL - #define CFA_NTUPLE_FILTER_ALLOC_CMD_ERR_CODE_LAST CFA_NTUPLE_FILTER_ALLOC_CMD_ERR_CODE_RX_MASK_VLAN_CONFLICT_ERR + #define CFA_NTUPLE_FILTER_ALLOC_CMD_ERR_CODE_UNKNOWN 0x0UL + #define CFA_NTUPLE_FILTER_ALLOC_CMD_ERR_CODE_ZERO_MAC 0x65UL + #define CFA_NTUPLE_FILTER_ALLOC_CMD_ERR_CODE_BC_MC_MAC 0x66UL + #define CFA_NTUPLE_FILTER_ALLOC_CMD_ERR_CODE_INVALID_VNIC 0x67UL + #define CFA_NTUPLE_FILTER_ALLOC_CMD_ERR_CODE_INVALID_PF_FID 0x68UL + #define CFA_NTUPLE_FILTER_ALLOC_CMD_ERR_CODE_INVALID_L2_CTXT_ID 0x69UL + #define CFA_NTUPLE_FILTER_ALLOC_CMD_ERR_CODE_NULL_L2_CTXT_CFG 0x6aUL + #define CFA_NTUPLE_FILTER_ALLOC_CMD_ERR_CODE_NULL_L2_DATA_FLD 0x6bUL + #define CFA_NTUPLE_FILTER_ALLOC_CMD_ERR_CODE_INVALID_CFA_LAYOUT 0x6cUL + #define CFA_NTUPLE_FILTER_ALLOC_CMD_ERR_CODE_L2_CTXT_ALLOC_FAIL 0x6dUL + #define CFA_NTUPLE_FILTER_ALLOC_CMD_ERR_CODE_ROCE_FLOW_ERR 0x6eUL + #define CFA_NTUPLE_FILTER_ALLOC_CMD_ERR_CODE_INVALID_OWNER_FID 0x6fUL + #define CFA_NTUPLE_FILTER_ALLOC_CMD_ERR_CODE_ZERO_REF_CNT 0x70UL + #define CFA_NTUPLE_FILTER_ALLOC_CMD_ERR_CODE_INVALID_FLOW_TYPE 0x71UL + #define CFA_NTUPLE_FILTER_ALLOC_CMD_ERR_CODE_INVALID_IVLAN 0x72UL + #define CFA_NTUPLE_FILTER_ALLOC_CMD_ERR_CODE_MAX_VLAN_ID 0x73UL + #define CFA_NTUPLE_FILTER_ALLOC_CMD_ERR_CODE_INVALID_TNL_REQ 0x74UL + #define CFA_NTUPLE_FILTER_ALLOC_CMD_ERR_CODE_L2_ADDR 0x75UL + #define CFA_NTUPLE_FILTER_ALLOC_CMD_ERR_CODE_L2_IVLAN 0x76UL + #define CFA_NTUPLE_FILTER_ALLOC_CMD_ERR_CODE_L3_ADDR 0x77UL + #define CFA_NTUPLE_FILTER_ALLOC_CMD_ERR_CODE_L3_ADDR_TYPE 0x78UL + #define CFA_NTUPLE_FILTER_ALLOC_CMD_ERR_CODE_T_L3_ADDR_TYPE 0x79UL + #define CFA_NTUPLE_FILTER_ALLOC_CMD_ERR_CODE_DST_VNIC_ID 0x7aUL + #define CFA_NTUPLE_FILTER_ALLOC_CMD_ERR_CODE_VNI 0x7bUL + #define CFA_NTUPLE_FILTER_ALLOC_CMD_ERR_CODE_INVALID_DST_ID 0x7cUL + #define CFA_NTUPLE_FILTER_ALLOC_CMD_ERR_CODE_FAIL_ROCE_L2_FLOW 0x7dUL + #define CFA_NTUPLE_FILTER_ALLOC_CMD_ERR_CODE_INVALID_NPAR_VLAN 0x7eUL + #define CFA_NTUPLE_FILTER_ALLOC_CMD_ERR_CODE_ATSP_ADD 0x7fUL + #define CFA_NTUPLE_FILTER_ALLOC_CMD_ERR_CODE_DFLT_VLAN_FAIL 0x80UL + #define CFA_NTUPLE_FILTER_ALLOC_CMD_ERR_CODE_INVALID_L3_TYPE 0x81UL + #define CFA_NTUPLE_FILTER_ALLOC_CMD_ERR_CODE_VAL_FAIL_TNL_FLOW 0x82UL + #define CFA_NTUPLE_FILTER_ALLOC_CMD_ERR_CODE_LAST CFA_NTUPLE_FILTER_ALLOC_CMD_ERR_CODE_VAL_FAIL_TNL_FLOW u8 unused_0[7]; }; @@ -9181,7 +9337,7 @@ struct pcie_ctx_hw_stats { __le64 pcie_recovery_histogram; }; -/* pcie_ctx_hw_stats_v2 (size:4096b/512B) */ +/* pcie_ctx_hw_stats_v2 (size:4544b/568B) */ struct pcie_ctx_hw_stats_v2 { __le64 pcie_pl_signal_integrity; __le64 pcie_dl_signal_integrity; @@ -9212,6 +9368,9 @@ struct pcie_ctx_hw_stats_v2 { __le64 pcie_other_packet_count; __le64 pcie_blocked_packet_count; __le64 pcie_cmpl_packet_count; + __le32 pcie_rd_latency_histogram[12]; + __le32 pcie_rd_latency_all_normal_count; + __le32 unused_2; }; /* hwrm_stat_generic_qstats_input (size:256b/32B) */ @@ -9406,7 +9565,8 @@ struct hwrm_struct_hdr { #define STRUCT_HDR_STRUCT_ID_MSIX_PER_VF 0xc8UL #define STRUCT_HDR_STRUCT_ID_UDCC_RTT_BUCKET_COUNT 0x12cUL #define STRUCT_HDR_STRUCT_ID_UDCC_RTT_BUCKET_BOUND 0x12dUL - #define STRUCT_HDR_STRUCT_ID_LAST STRUCT_HDR_STRUCT_ID_UDCC_RTT_BUCKET_BOUND + #define STRUCT_HDR_STRUCT_ID_DBG_TOKEN_CLAIMS 0x190UL + #define STRUCT_HDR_STRUCT_ID_LAST STRUCT_HDR_STRUCT_ID_DBG_TOKEN_CLAIMS __le16 len; u8 version; #define STRUCT_HDR_VERSION_0 0x0UL @@ -9459,11 +9619,13 @@ struct hwrm_fw_set_structured_data_output { /* hwrm_fw_set_structured_data_cmd_err (size:64b/8B) */ struct hwrm_fw_set_structured_data_cmd_err { u8 code; - #define FW_SET_STRUCTURED_DATA_CMD_ERR_CODE_UNKNOWN 0x0UL - #define FW_SET_STRUCTURED_DATA_CMD_ERR_CODE_BAD_HDR_CNT 0x1UL - #define FW_SET_STRUCTURED_DATA_CMD_ERR_CODE_BAD_FMT 0x2UL - #define FW_SET_STRUCTURED_DATA_CMD_ERR_CODE_BAD_ID 0x3UL - #define FW_SET_STRUCTURED_DATA_CMD_ERR_CODE_LAST FW_SET_STRUCTURED_DATA_CMD_ERR_CODE_BAD_ID + #define FW_SET_STRUCTURED_DATA_CMD_ERR_CODE_UNKNOWN 0x0UL + #define FW_SET_STRUCTURED_DATA_CMD_ERR_CODE_BAD_HDR_CNT 0x1UL + #define FW_SET_STRUCTURED_DATA_CMD_ERR_CODE_BAD_FMT 0x2UL + #define FW_SET_STRUCTURED_DATA_CMD_ERR_CODE_BAD_ID 0x3UL + #define FW_SET_STRUCTURED_DATA_CMD_ERR_CODE_ALREADY_ADDED 0x4UL + #define FW_SET_STRUCTURED_DATA_CMD_ERR_CODE_INST_IN_PROG 0x5UL + #define FW_SET_STRUCTURED_DATA_CMD_ERR_CODE_LAST FW_SET_STRUCTURED_DATA_CMD_ERR_CODE_INST_IN_PROG u8 unused_0[7]; }; @@ -9487,7 +9649,9 @@ struct hwrm_fw_get_structured_data_input { #define FW_GET_STRUCTURED_DATA_REQ_SUBTYPE_NON_TPMR_PEER 0x201UL #define FW_GET_STRUCTURED_DATA_REQ_SUBTYPE_NON_TPMR_OPERATIONAL 0x202UL #define FW_GET_STRUCTURED_DATA_REQ_SUBTYPE_HOST_OPERATIONAL 0x300UL - #define FW_GET_STRUCTURED_DATA_REQ_SUBTYPE_LAST FW_GET_STRUCTURED_DATA_REQ_SUBTYPE_HOST_OPERATIONAL + #define FW_GET_STRUCTURED_DATA_REQ_SUBTYPE_CLAIMS_SUPPORTED 0x320UL + #define FW_GET_STRUCTURED_DATA_REQ_SUBTYPE_CLAIMS_ACTIVE 0x321UL + #define FW_GET_STRUCTURED_DATA_REQ_SUBTYPE_LAST FW_GET_STRUCTURED_DATA_REQ_SUBTYPE_CLAIMS_ACTIVE u8 count; u8 unused_0; }; @@ -10172,7 +10336,8 @@ struct hwrm_dbg_log_buffer_flush_input { #define DBG_LOG_BUFFER_FLUSH_REQ_TYPE_CA2_TRACE 0x9UL #define DBG_LOG_BUFFER_FLUSH_REQ_TYPE_RIGP1_TRACE 0xaUL #define DBG_LOG_BUFFER_FLUSH_REQ_TYPE_AFM_KONG_HWRM_TRACE 0xbUL - #define DBG_LOG_BUFFER_FLUSH_REQ_TYPE_LAST DBG_LOG_BUFFER_FLUSH_REQ_TYPE_AFM_KONG_HWRM_TRACE + #define DBG_LOG_BUFFER_FLUSH_REQ_TYPE_ERR_QPC_TRACE 0xcUL + #define DBG_LOG_BUFFER_FLUSH_REQ_TYPE_LAST DBG_LOG_BUFFER_FLUSH_REQ_TYPE_ERR_QPC_TRACE u8 unused_1[2]; __le32 flags; #define DBG_LOG_BUFFER_FLUSH_REQ_FLAGS_FLUSH_ALL_BUFFERS 0x1UL @@ -10295,10 +10460,15 @@ struct hwrm_nvm_write_output { /* hwrm_nvm_write_cmd_err (size:64b/8B) */ struct hwrm_nvm_write_cmd_err { u8 code; - #define NVM_WRITE_CMD_ERR_CODE_UNKNOWN 0x0UL - #define NVM_WRITE_CMD_ERR_CODE_FRAG_ERR 0x1UL - #define NVM_WRITE_CMD_ERR_CODE_NO_SPACE 0x2UL - #define NVM_WRITE_CMD_ERR_CODE_LAST NVM_WRITE_CMD_ERR_CODE_NO_SPACE + #define NVM_WRITE_CMD_ERR_CODE_UNKNOWN 0x0UL + #define NVM_WRITE_CMD_ERR_CODE_FRAG_ERR 0x1UL + #define NVM_WRITE_CMD_ERR_CODE_NO_SPACE 0x2UL + #define NVM_WRITE_CMD_ERR_CODE_WRITE_FAILED 0x3UL + #define NVM_WRITE_CMD_ERR_CODE_REQD_ERASE_FAILED 0x4UL + #define NVM_WRITE_CMD_ERR_CODE_VERIFY_FAILED 0x5UL + #define NVM_WRITE_CMD_ERR_CODE_INVALID_HEADER 0x6UL + #define NVM_WRITE_CMD_ERR_CODE_UPDATE_DIGEST_FAILED 0x7UL + #define NVM_WRITE_CMD_ERR_CODE_LAST NVM_WRITE_CMD_ERR_CODE_UPDATE_DIGEST_FAILED u8 unused_0[7]; }; @@ -10438,7 +10608,11 @@ struct hwrm_nvm_get_dev_info_output { __le16 srt2_fw_minor; __le16 srt2_fw_build; __le16 srt2_fw_patch; - u8 unused_0[7]; + u8 security_soc_fw_major; + u8 security_soc_fw_minor; + u8 security_soc_fw_build; + u8 security_soc_fw_patch; + u8 unused_0[3]; u8 valid; }; @@ -10568,7 +10742,9 @@ struct hwrm_nvm_install_update_cmd_err { #define NVM_INSTALL_UPDATE_CMD_ERR_CODE_NO_SPACE 0x2UL #define NVM_INSTALL_UPDATE_CMD_ERR_CODE_ANTI_ROLLBACK 0x3UL #define NVM_INSTALL_UPDATE_CMD_ERR_CODE_NO_VOLTREG_SUPPORT 0x4UL - #define NVM_INSTALL_UPDATE_CMD_ERR_CODE_LAST NVM_INSTALL_UPDATE_CMD_ERR_CODE_NO_VOLTREG_SUPPORT + #define NVM_INSTALL_UPDATE_CMD_ERR_CODE_DEFRAG_FAILED 0x5UL + #define NVM_INSTALL_UPDATE_CMD_ERR_CODE_UNKNOWN_DIR_ERR 0x6UL + #define NVM_INSTALL_UPDATE_CMD_ERR_CODE_LAST NVM_INSTALL_UPDATE_CMD_ERR_CODE_UNKNOWN_DIR_ERR u8 unused_0[7]; }; @@ -10591,7 +10767,8 @@ struct hwrm_nvm_get_variable_input { __le16 index_2; __le16 index_3; u8 flags; - #define NVM_GET_VARIABLE_REQ_FLAGS_FACTORY_DFLT 0x1UL + #define NVM_GET_VARIABLE_REQ_FLAGS_FACTORY_DFLT 0x1UL + #define NVM_GET_VARIABLE_REQ_FLAGS_VALIDATE_OPT_VALUE 0x2UL u8 unused_0; }; @@ -10606,18 +10783,25 @@ struct hwrm_nvm_get_variable_output { #define NVM_GET_VARIABLE_RESP_OPTION_NUM_RSVD_0 0x0UL #define NVM_GET_VARIABLE_RESP_OPTION_NUM_RSVD_FFFF 0xffffUL #define NVM_GET_VARIABLE_RESP_OPTION_NUM_LAST NVM_GET_VARIABLE_RESP_OPTION_NUM_RSVD_FFFF - u8 unused_0[3]; + u8 flags; + #define NVM_GET_VARIABLE_RESP_FLAGS_VALIDATE_OPT_VALUE 0x1UL + u8 unused_0[2]; u8 valid; }; /* hwrm_nvm_get_variable_cmd_err (size:64b/8B) */ struct hwrm_nvm_get_variable_cmd_err { u8 code; - #define NVM_GET_VARIABLE_CMD_ERR_CODE_UNKNOWN 0x0UL - #define NVM_GET_VARIABLE_CMD_ERR_CODE_VAR_NOT_EXIST 0x1UL - #define NVM_GET_VARIABLE_CMD_ERR_CODE_CORRUPT_VAR 0x2UL - #define NVM_GET_VARIABLE_CMD_ERR_CODE_LEN_TOO_SHORT 0x3UL - #define NVM_GET_VARIABLE_CMD_ERR_CODE_LAST NVM_GET_VARIABLE_CMD_ERR_CODE_LEN_TOO_SHORT + #define NVM_GET_VARIABLE_CMD_ERR_CODE_UNKNOWN 0x0UL + #define NVM_GET_VARIABLE_CMD_ERR_CODE_VAR_NOT_EXIST 0x1UL + #define NVM_GET_VARIABLE_CMD_ERR_CODE_CORRUPT_VAR 0x2UL + #define NVM_GET_VARIABLE_CMD_ERR_CODE_LEN_TOO_SHORT 0x3UL + #define NVM_GET_VARIABLE_CMD_ERR_CODE_INDEX_INVALID 0x4UL + #define NVM_GET_VARIABLE_CMD_ERR_CODE_ACCESS_DENIED 0x5UL + #define NVM_GET_VARIABLE_CMD_ERR_CODE_CB_FAILED 0x6UL + #define NVM_GET_VARIABLE_CMD_ERR_CODE_INVALID_DATA_LEN 0x7UL + #define NVM_GET_VARIABLE_CMD_ERR_CODE_NO_MEM 0x8UL + #define NVM_GET_VARIABLE_CMD_ERR_CODE_LAST NVM_GET_VARIABLE_CMD_ERR_CODE_NO_MEM u8 unused_0[7]; }; @@ -10667,10 +10851,17 @@ struct hwrm_nvm_set_variable_output { /* hwrm_nvm_set_variable_cmd_err (size:64b/8B) */ struct hwrm_nvm_set_variable_cmd_err { u8 code; - #define NVM_SET_VARIABLE_CMD_ERR_CODE_UNKNOWN 0x0UL - #define NVM_SET_VARIABLE_CMD_ERR_CODE_VAR_NOT_EXIST 0x1UL - #define NVM_SET_VARIABLE_CMD_ERR_CODE_CORRUPT_VAR 0x2UL - #define NVM_SET_VARIABLE_CMD_ERR_CODE_LAST NVM_SET_VARIABLE_CMD_ERR_CODE_CORRUPT_VAR + #define NVM_SET_VARIABLE_CMD_ERR_CODE_UNKNOWN 0x0UL + #define NVM_SET_VARIABLE_CMD_ERR_CODE_VAR_NOT_EXIST 0x1UL + #define NVM_SET_VARIABLE_CMD_ERR_CODE_CORRUPT_VAR 0x2UL + #define NVM_SET_VARIABLE_CMD_ERR_CODE_LEN_TOO_SHORT 0x3UL + #define NVM_SET_VARIABLE_CMD_ERR_CODE_ACTION_NOT_SUPPORTED 0x4UL + #define NVM_SET_VARIABLE_CMD_ERR_CODE_INDEX_INVALID 0x5UL + #define NVM_SET_VARIABLE_CMD_ERR_CODE_ACCESS_DENIED 0x6UL + #define NVM_SET_VARIABLE_CMD_ERR_CODE_CB_FAILED 0x7UL + #define NVM_SET_VARIABLE_CMD_ERR_CODE_INVALID_DATA_LEN 0x8UL + #define NVM_SET_VARIABLE_CMD_ERR_CODE_NO_MEM 0x9UL + #define NVM_SET_VARIABLE_CMD_ERR_CODE_LAST NVM_SET_VARIABLE_CMD_ERR_CODE_NO_MEM u8 unused_0[7]; }; From 1cc174d33a1f9acfd840014b6ded874aa12edc6c Mon Sep 17 00:00:00 2001 From: Shruti Parab Date: Tue, 19 Aug 2025 09:39:16 -0700 Subject: [PATCH 0200/1536] bnxt_en: Refactor bnxt_get_regs() Separate the code that sends the FW message to retrieve pcie stats into a new helper function. The caller of the helper will call hwrm_req_hold() beforehand and so the caller will call hwrm_req_drop() in all cases afterwards. This helper will be useful when adding the support for the larger struct pcie_ctx_hw_stats_v2. Signed-off-by: Shruti Parab Signed-off-by: Michael Chan Reviewed-by: Przemek Kitszel Link: https://patch.msgid.link/20250819163919.104075-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski --- .../net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 43 +++++++++++-------- 1 file changed, 25 insertions(+), 18 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c index 68a4ee9f69b1..2eb7c09a116f 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c @@ -2074,6 +2074,25 @@ static int bnxt_get_regs_len(struct net_device *dev) return reg_len; } +static void * +__bnxt_hwrm_pcie_qstats(struct bnxt *bp, struct hwrm_pcie_qstats_input *req) +{ + struct pcie_ctx_hw_stats_v2 *hw_pcie_stats; + dma_addr_t hw_pcie_stats_addr; + int rc; + + hw_pcie_stats = hwrm_req_dma_slice(bp, req, sizeof(*hw_pcie_stats), + &hw_pcie_stats_addr); + if (!hw_pcie_stats) + return NULL; + + req->pcie_stat_size = cpu_to_le16(sizeof(*hw_pcie_stats)); + req->pcie_stat_host_addr = cpu_to_le64(hw_pcie_stats_addr); + rc = hwrm_req_send(bp, req); + + return rc ? NULL : hw_pcie_stats; +} + #define BNXT_PCIE_32B_ENTRY(start, end) \ { offsetof(struct pcie_ctx_hw_stats, start), \ offsetof(struct pcie_ctx_hw_stats, end) } @@ -2088,11 +2107,9 @@ static const struct { static void bnxt_get_regs(struct net_device *dev, struct ethtool_regs *regs, void *_p) { - struct pcie_ctx_hw_stats *hw_pcie_stats; struct hwrm_pcie_qstats_input *req; struct bnxt *bp = netdev_priv(dev); - dma_addr_t hw_pcie_stats_addr; - int rc; + u8 *src; regs->version = 0; if (!(bp->fw_dbg_cap & DBG_QCAPS_RESP_FLAGS_REG_ACCESS_RESTRICTED)) @@ -2104,24 +2121,14 @@ static void bnxt_get_regs(struct net_device *dev, struct ethtool_regs *regs, if (hwrm_req_init(bp, req, HWRM_PCIE_QSTATS)) return; - hw_pcie_stats = hwrm_req_dma_slice(bp, req, sizeof(*hw_pcie_stats), - &hw_pcie_stats_addr); - if (!hw_pcie_stats) { - hwrm_req_drop(bp, req); - return; - } - - regs->version = 1; - hwrm_req_hold(bp, req); /* hold on to slice */ - req->pcie_stat_size = cpu_to_le16(sizeof(*hw_pcie_stats)); - req->pcie_stat_host_addr = cpu_to_le64(hw_pcie_stats_addr); - rc = hwrm_req_send(bp, req); - if (!rc) { + hwrm_req_hold(bp, req); + src = __bnxt_hwrm_pcie_qstats(bp, req); + if (src) { u8 *dst = (u8 *)(_p + BNXT_PXP_REG_LEN); - u8 *src = (u8 *)hw_pcie_stats; int i, j; - for (i = 0, j = 0; i < sizeof(*hw_pcie_stats); ) { + regs->version = 1; + for (i = 0, j = 0; i < sizeof(struct pcie_ctx_hw_stats); ) { if (i >= bnxt_pcie_32b_entries[j].start && i <= bnxt_pcie_32b_entries[j].end) { u32 *dst32 = (u32 *)(dst + i); From b530173d3c8adbe2921240cd3bce06053ab47d27 Mon Sep 17 00:00:00 2001 From: Shruti Parab Date: Tue, 19 Aug 2025 09:39:17 -0700 Subject: [PATCH 0201/1536] bnxt_en: Add pcie_stat_len to struct bp Add this length field to capture the length of the pcie stats structure supported by the FW. This length will be determined in bnxt_ethtool_init(). The minimum of this FW length and the length known to the driver will determine the actual ethtool -d length. Suggested-by: Kalesh AP Reviewed-by: Andy Gospodarek Reviewed-by: Kalesh AP Signed-off-by: Shruti Parab Signed-off-by: Michael Chan Link: https://patch.msgid.link/20250819163919.104075-4-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/broadcom/bnxt/bnxt.h | 1 + .../net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 37 ++++++++++++++----- 2 files changed, 28 insertions(+), 10 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h index 40ae34923511..25ca002fc382 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h @@ -2543,6 +2543,7 @@ struct bnxt { u16 fw_rx_stats_ext_size; u16 fw_tx_stats_ext_size; u16 hw_ring_stats_size; + u16 pcie_stat_len; u8 pri2cos_idx[8]; u8 pri2cos_valid; diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c index 2eb7c09a116f..abb895fb1a9c 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c @@ -2061,17 +2061,11 @@ static void bnxt_get_drvinfo(struct net_device *dev, static int bnxt_get_regs_len(struct net_device *dev) { struct bnxt *bp = netdev_priv(dev); - int reg_len; if (!BNXT_PF(bp)) return -EOPNOTSUPP; - reg_len = BNXT_PXP_REG_LEN; - - if (bp->fw_cap & BNXT_FW_CAP_PCIE_STATS_SUPPORTED) - reg_len += sizeof(struct pcie_ctx_hw_stats); - - return reg_len; + return BNXT_PXP_REG_LEN + bp->pcie_stat_len; } static void * @@ -2107,6 +2101,7 @@ static const struct { static void bnxt_get_regs(struct net_device *dev, struct ethtool_regs *regs, void *_p) { + struct hwrm_pcie_qstats_output *resp; struct hwrm_pcie_qstats_input *req; struct bnxt *bp = netdev_priv(dev); u8 *src; @@ -2121,14 +2116,15 @@ static void bnxt_get_regs(struct net_device *dev, struct ethtool_regs *regs, if (hwrm_req_init(bp, req, HWRM_PCIE_QSTATS)) return; - hwrm_req_hold(bp, req); + resp = hwrm_req_hold(bp, req); src = __bnxt_hwrm_pcie_qstats(bp, req); if (src) { u8 *dst = (u8 *)(_p + BNXT_PXP_REG_LEN); - int i, j; + int i, j, len; + len = min(bp->pcie_stat_len, le16_to_cpu(resp->pcie_stat_size)); regs->version = 1; - for (i = 0, j = 0; i < sizeof(struct pcie_ctx_hw_stats); ) { + for (i = 0, j = 0; i < len; ) { if (i >= bnxt_pcie_32b_entries[j].start && i <= bnxt_pcie_32b_entries[j].end) { u32 *dst32 = (u32 *)(dst + i); @@ -5273,6 +5269,26 @@ static int bnxt_get_ts_info(struct net_device *dev, return 0; } +static void bnxt_hwrm_pcie_qstats(struct bnxt *bp) +{ + struct hwrm_pcie_qstats_output *resp; + struct hwrm_pcie_qstats_input *req; + + bp->pcie_stat_len = 0; + if (!(bp->fw_cap & BNXT_FW_CAP_PCIE_STATS_SUPPORTED)) + return; + + if (hwrm_req_init(bp, req, HWRM_PCIE_QSTATS)) + return; + + resp = hwrm_req_hold(bp, req); + if (__bnxt_hwrm_pcie_qstats(bp, req)) + bp->pcie_stat_len = min_t(u16, + le16_to_cpu(resp->pcie_stat_size), + sizeof(struct pcie_ctx_hw_stats_v2)); + hwrm_req_drop(bp, req); +} + void bnxt_ethtool_init(struct bnxt *bp) { struct hwrm_selftest_qlist_output *resp; @@ -5281,6 +5297,7 @@ void bnxt_ethtool_init(struct bnxt *bp) struct net_device *dev = bp->dev; int i, rc; + bnxt_hwrm_pcie_qstats(bp); if (!(bp->fw_cap & BNXT_FW_CAP_PKG_VER)) bnxt_get_pkgver(dev); From 5a4cf42322a0260c7391a3e64d288861e43de673 Mon Sep 17 00:00:00 2001 From: Shruti Parab Date: Tue, 19 Aug 2025 09:39:18 -0700 Subject: [PATCH 0202/1536] bnxt_en: Add pcie_ctx_v2 support for ethtool -d Add support to dump the expanded v2 struct that contains PCIE read/write latency and credit histogram data. Reviewed-by: Kalesh AP Reviewed-by: Andy Gospodarek Signed-off-by: Shruti Parab Signed-off-by: Michael Chan Link: https://patch.msgid.link/20250819163919.104075-5-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c index abb895fb1a9c..2830a2b17a27 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c @@ -2088,14 +2088,16 @@ __bnxt_hwrm_pcie_qstats(struct bnxt *bp, struct hwrm_pcie_qstats_input *req) } #define BNXT_PCIE_32B_ENTRY(start, end) \ - { offsetof(struct pcie_ctx_hw_stats, start), \ - offsetof(struct pcie_ctx_hw_stats, end) } + { offsetof(struct pcie_ctx_hw_stats_v2, start),\ + offsetof(struct pcie_ctx_hw_stats_v2, end) } static const struct { u16 start; u16 end; } bnxt_pcie_32b_entries[] = { BNXT_PCIE_32B_ENTRY(pcie_ltssm_histogram[0], pcie_ltssm_histogram[3]), + BNXT_PCIE_32B_ENTRY(pcie_tl_credit_nph_histogram[0], unused_1), + BNXT_PCIE_32B_ENTRY(pcie_rd_latency_histogram[0], unused_2), }; static void bnxt_get_regs(struct net_device *dev, struct ethtool_regs *regs, @@ -2123,7 +2125,13 @@ static void bnxt_get_regs(struct net_device *dev, struct ethtool_regs *regs, int i, j, len; len = min(bp->pcie_stat_len, le16_to_cpu(resp->pcie_stat_size)); - regs->version = 1; + if (len <= sizeof(struct pcie_ctx_hw_stats)) + regs->version = 1; + else if (len < sizeof(struct pcie_ctx_hw_stats_v2)) + regs->version = 2; + else + regs->version = 3; + for (i = 0, j = 0; i < len; ) { if (i >= bnxt_pcie_32b_entries[j].start && i <= bnxt_pcie_32b_entries[j].end) { From 5be7cb805bd9a6680b863a1477dbc6e7986cc223 Mon Sep 17 00:00:00 2001 From: Pavan Chebbi Date: Tue, 19 Aug 2025 09:39:19 -0700 Subject: [PATCH 0203/1536] bnxt_en: Add Hyper-V VF ID VFs of the P7 chip family created by Hyper-V will have the device ID of 0x181b. Reviewed-by: Somnath Kotur Reviewed-by: Kalesh AP Signed-off-by: Pavan Chebbi Signed-off-by: Michael Chan Link: https://patch.msgid.link/20250819163919.104075-6-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 5 ++++- drivers/net/ethernet/broadcom/bnxt/bnxt.h | 1 + 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index 2d4fdf5a0dc5..ba99de403138 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -142,6 +142,7 @@ static const struct { [NETXTREME_E_P5_VF] = { "Broadcom BCM5750X NetXtreme-E Ethernet Virtual Function" }, [NETXTREME_E_P5_VF_HV] = { "Broadcom BCM5750X NetXtreme-E Virtual Function for Hyper-V" }, [NETXTREME_E_P7_VF] = { "Broadcom BCM5760X Virtual Function" }, + [NETXTREME_E_P7_VF_HV] = { "Broadcom BCM5760X Virtual Function for Hyper-V" }, }; static const struct pci_device_id bnxt_pci_tbl[] = { @@ -217,6 +218,7 @@ static const struct pci_device_id bnxt_pci_tbl[] = { { PCI_VDEVICE(BROADCOM, 0x1808), .driver_data = NETXTREME_E_P5_VF_HV }, { PCI_VDEVICE(BROADCOM, 0x1809), .driver_data = NETXTREME_E_P5_VF_HV }, { PCI_VDEVICE(BROADCOM, 0x1819), .driver_data = NETXTREME_E_P7_VF }, + { PCI_VDEVICE(BROADCOM, 0x181b), .driver_data = NETXTREME_E_P7_VF_HV }, { PCI_VDEVICE(BROADCOM, 0xd800), .driver_data = NETXTREME_S_VF }, #endif { 0 } @@ -315,7 +317,8 @@ static bool bnxt_vf_pciid(enum board_idx idx) return (idx == NETXTREME_C_VF || idx == NETXTREME_E_VF || idx == NETXTREME_S_VF || idx == NETXTREME_C_VF_HV || idx == NETXTREME_E_VF_HV || idx == NETXTREME_E_P5_VF || - idx == NETXTREME_E_P5_VF_HV || idx == NETXTREME_E_P7_VF); + idx == NETXTREME_E_P5_VF_HV || idx == NETXTREME_E_P7_VF || + idx == NETXTREME_E_P7_VF_HV); } #define DB_CP_REARM_FLAGS (DB_KEY_CP | DB_IDX_VALID) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h index 25ca002fc382..1bb2a5de88cd 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h @@ -2130,6 +2130,7 @@ enum board_idx { NETXTREME_E_P5_VF, NETXTREME_E_P5_VF_HV, NETXTREME_E_P7_VF, + NETXTREME_E_P7_VF_HV, }; #define BNXT_TRACE_BUF_MAGIC_BYTE ((u8)0xbc) From a6d4f25888b83b8300aef28d9ee22765c1cc9b34 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Tue, 19 Aug 2025 17:40:30 +0000 Subject: [PATCH 0204/1536] net: set net.core.rmem_max and net.core.wmem_max to 4 MB SO_RCVBUF and SO_SNDBUF have limited range today, unless distros or system admins change rmem_max and wmem_max. Even iproute2 uses 1 MB SO_RCVBUF which is capped by the kernel. Decouple [rw]mem_max and [rw]mem_default and increase [rw]mem_max to 4 MB. Before: $ sysctl net.core.rmem_default net.core.rmem_max net.core.wmem_default net.core.wmem_max net.core.rmem_default = 212992 net.core.rmem_max = 212992 net.core.wmem_default = 212992 net.core.wmem_max = 212992 After: $ sysctl net.core.rmem_default net.core.rmem_max net.core.wmem_default net.core.wmem_max net.core.rmem_default = 212992 net.core.rmem_max = 4194304 net.core.wmem_default = 212992 net.core.wmem_max = 4194304 Signed-off-by: Eric Dumazet Reviewed-by: Neal Cardwell Link: https://patch.msgid.link/20250819174030.1986278-1-edumazet@google.com Signed-off-by: Jakub Kicinski --- Documentation/admin-guide/sysctl/net.rst | 4 ++++ Documentation/networking/ip-sysctl.rst | 6 +++--- include/net/sock.h | 4 ++-- net/core/sock.c | 8 ++++---- net/ipv4/arp.c | 2 +- net/ipv6/ndisc.c | 2 +- 6 files changed, 15 insertions(+), 11 deletions(-) diff --git a/Documentation/admin-guide/sysctl/net.rst b/Documentation/admin-guide/sysctl/net.rst index 7b0c4291c686..2ef50828aff1 100644 --- a/Documentation/admin-guide/sysctl/net.rst +++ b/Documentation/admin-guide/sysctl/net.rst @@ -222,6 +222,8 @@ rmem_max The maximum receive socket buffer size in bytes. +Default: 4194304 + rps_default_mask ---------------- @@ -247,6 +249,8 @@ wmem_max The maximum send socket buffer size in bytes. +Default: 4194304 + message_burst and message_cost ------------------------------ diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst index 43badb338d22..9f5891c9b07b 100644 --- a/Documentation/networking/ip-sysctl.rst +++ b/Documentation/networking/ip-sysctl.rst @@ -209,7 +209,7 @@ neigh/default/unres_qlen_bytes - INTEGER Setting negative value is meaningless and will return error. - Default: SK_WMEM_MAX, (same as net.core.wmem_default). + Default: SK_WMEM_DEFAULT, (same as net.core.wmem_default). Exact value depends on architecture and kernel options, but should be enough to allow queuing 256 packets @@ -805,8 +805,8 @@ tcp_rmem - vector of 3 INTEGERs: min, default, max This value results in initial window of 65535. max: maximal size of receive buffer allowed for automatically - selected receiver buffers for TCP socket. This value does not override - net.core.rmem_max. Calling setsockopt() with SO_RCVBUF disables + selected receiver buffers for TCP socket. + Calling setsockopt() with SO_RCVBUF disables automatic tuning of that socket's receive buffer size, in which case this value is ignored. Default: between 131072 and 32MB, depending on RAM size. diff --git a/include/net/sock.h b/include/net/sock.h index 1c49ea13af4a..63a6a48afb48 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -2970,8 +2970,8 @@ void sk_get_meminfo(const struct sock *sk, u32 *meminfo); */ #define _SK_MEM_PACKETS 256 #define _SK_MEM_OVERHEAD SKB_TRUESIZE(256) -#define SK_WMEM_MAX (_SK_MEM_OVERHEAD * _SK_MEM_PACKETS) -#define SK_RMEM_MAX (_SK_MEM_OVERHEAD * _SK_MEM_PACKETS) +#define SK_WMEM_DEFAULT (_SK_MEM_OVERHEAD * _SK_MEM_PACKETS) +#define SK_RMEM_DEFAULT (_SK_MEM_OVERHEAD * _SK_MEM_PACKETS) extern __u32 sysctl_wmem_max; extern __u32 sysctl_rmem_max; diff --git a/net/core/sock.c b/net/core/sock.c index ab6953d295df..8002ac6293dc 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -281,12 +281,12 @@ static struct lock_class_key af_elock_keys[AF_MAX]; static struct lock_class_key af_kern_callback_keys[AF_MAX]; /* Run time adjustable parameters. */ -__u32 sysctl_wmem_max __read_mostly = SK_WMEM_MAX; +__u32 sysctl_wmem_max __read_mostly = 4 << 20; EXPORT_SYMBOL(sysctl_wmem_max); -__u32 sysctl_rmem_max __read_mostly = SK_RMEM_MAX; +__u32 sysctl_rmem_max __read_mostly = 4 << 20; EXPORT_SYMBOL(sysctl_rmem_max); -__u32 sysctl_wmem_default __read_mostly = SK_WMEM_MAX; -__u32 sysctl_rmem_default __read_mostly = SK_RMEM_MAX; +__u32 sysctl_wmem_default __read_mostly = SK_WMEM_DEFAULT; +__u32 sysctl_rmem_default __read_mostly = SK_RMEM_DEFAULT; DEFINE_STATIC_KEY_FALSE(memalloc_socks_key); EXPORT_SYMBOL_GPL(memalloc_socks_key); diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c index 5cfc1c939673..833f2cf97178 100644 --- a/net/ipv4/arp.c +++ b/net/ipv4/arp.c @@ -170,7 +170,7 @@ struct neigh_table arp_tbl = { [NEIGH_VAR_DELAY_PROBE_TIME] = 5 * HZ, [NEIGH_VAR_INTERVAL_PROBE_TIME_MS] = 5 * HZ, [NEIGH_VAR_GC_STALETIME] = 60 * HZ, - [NEIGH_VAR_QUEUE_LEN_BYTES] = SK_WMEM_MAX, + [NEIGH_VAR_QUEUE_LEN_BYTES] = SK_WMEM_DEFAULT, [NEIGH_VAR_PROXY_QLEN] = 64, [NEIGH_VAR_ANYCAST_DELAY] = 1 * HZ, [NEIGH_VAR_PROXY_DELAY] = (8 * HZ) / 10, diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c index 7d5abb3158ec..57aaa7ae8ac3 100644 --- a/net/ipv6/ndisc.c +++ b/net/ipv6/ndisc.c @@ -130,7 +130,7 @@ struct neigh_table nd_tbl = { [NEIGH_VAR_DELAY_PROBE_TIME] = 5 * HZ, [NEIGH_VAR_INTERVAL_PROBE_TIME_MS] = 5 * HZ, [NEIGH_VAR_GC_STALETIME] = 60 * HZ, - [NEIGH_VAR_QUEUE_LEN_BYTES] = SK_WMEM_MAX, + [NEIGH_VAR_QUEUE_LEN_BYTES] = SK_WMEM_DEFAULT, [NEIGH_VAR_PROXY_QLEN] = 64, [NEIGH_VAR_ANYCAST_DELAY] = 1 * HZ, [NEIGH_VAR_PROXY_DELAY] = (8 * HZ) / 10, From a5c10aa3d1ba643385534b5d40c8a67497f904b6 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Tue, 19 Aug 2025 23:14:57 +0000 Subject: [PATCH 0205/1536] selftests/net: packetdrill: Support single protocol test. Currently, we cannot write IPv4 or IPv6 specific packetdrill tests as ksft_runner.sh runs each .pkt file for both protocols. Let's support single protocol test by checking --ip_version in the .pkt file. Signed-off-by: Kuniyuki Iwashima Reviewed-by: Willem de Bruijn Link: https://patch.msgid.link/20250819231527.1427361-1-kuniyu@google.com Signed-off-by: Jakub Kicinski --- .../selftests/net/packetdrill/ksft_runner.sh | 47 +++++++++++-------- 1 file changed, 28 insertions(+), 19 deletions(-) diff --git a/tools/testing/selftests/net/packetdrill/ksft_runner.sh b/tools/testing/selftests/net/packetdrill/ksft_runner.sh index a7e790af38ff..0ae6eeeb1a8e 100755 --- a/tools/testing/selftests/net/packetdrill/ksft_runner.sh +++ b/tools/testing/selftests/net/packetdrill/ksft_runner.sh @@ -3,21 +3,22 @@ source "$(dirname $(realpath $0))/../../kselftest/ktap_helpers.sh" -readonly ipv4_args=('--ip_version=ipv4 ' - '--local_ip=192.168.0.1 ' - '--gateway_ip=192.168.0.1 ' - '--netmask_ip=255.255.0.0 ' - '--remote_ip=192.0.2.1 ' - '-D CMSG_LEVEL_IP=SOL_IP ' - '-D CMSG_TYPE_RECVERR=IP_RECVERR ') - -readonly ipv6_args=('--ip_version=ipv6 ' - '--mtu=1520 ' - '--local_ip=fd3d:0a0b:17d6::1 ' - '--gateway_ip=fd3d:0a0b:17d6:8888::1 ' - '--remote_ip=fd3d:fa7b:d17d::1 ' - '-D CMSG_LEVEL_IP=SOL_IPV6 ' - '-D CMSG_TYPE_RECVERR=IPV6_RECVERR ') +declare -A ip_args=( + [ipv4]="--ip_version=ipv4 + --local_ip=192.168.0.1 + --gateway_ip=192.168.0.1 + --netmask_ip=255.255.0.0 + --remote_ip=192.0.2.1 + -D CMSG_LEVEL_IP=SOL_IP + -D CMSG_TYPE_RECVERR=IP_RECVERR" + [ipv6]="--ip_version=ipv6 + --mtu=1520 + --local_ip=fd3d:0a0b:17d6::1 + --gateway_ip=fd3d:0a0b:17d6:8888::1 + --remote_ip=fd3d:fa7b:d17d::1 + -D CMSG_LEVEL_IP=SOL_IPV6 + -D CMSG_TYPE_RECVERR=IPV6_RECVERR" +) if [ $# -ne 1 ]; then ktap_exit_fail_msg "usage: $0