Commit 1b50f420, authored by Dennis Dalessandro, committed by Leon Romanovsky

RDMA/hfi1: Remove opa_vnic

parent 2afa8b9f
+0 −15
@@ -92,21 +92,6 @@ iSCSI Extensions for RDMA (iSER)
.. kernel-doc:: drivers/infiniband/ulp/iser/iser_verbs.c
   :internal:

Omni-Path (OPA) Virtual NIC support
-----------------------------------

.. kernel-doc:: drivers/infiniband/ulp/opa_vnic/opa_vnic_internal.h
   :internal:

.. kernel-doc:: drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h
   :internal:

.. kernel-doc:: drivers/infiniband/ulp/opa_vnic/opa_vnic_vema_iface.c
   :internal:

.. kernel-doc:: drivers/infiniband/ulp/opa_vnic/opa_vnic_vema.c
   :internal:

InfiniBand SCSI RDMA protocol target support
--------------------------------------------

+0 −1
@@ -9,7 +9,6 @@ InfiniBand

   core_locking
   ipoib
   opa_vnic
   sysfs
   tag_matching
   ucaps
+0 −159
=================================================================
Intel Omni-Path (OPA) Virtual Network Interface Controller (VNIC)
=================================================================

Intel Omni-Path (OPA) Virtual Network Interface Controller (VNIC) feature
supports Ethernet functionality over Omni-Path fabric by encapsulating
the Ethernet packets between HFI nodes.

Architecture
=============
The exchange of Omni-Path encapsulated Ethernet packets
involves one or more virtual Ethernet switches overlaid on the Omni-Path
fabric topology. A subset of HFI nodes on the Omni-Path fabric are
permitted to exchange encapsulated Ethernet packets across a particular
virtual Ethernet switch. The virtual Ethernet switches are logical
abstractions achieved by configuring the HFI nodes on the fabric for
header generation and processing. In the simplest configuration all HFI
nodes across the fabric exchange encapsulated Ethernet packets over a
single virtual Ethernet switch. A virtual Ethernet switch is effectively
an independent Ethernet network. The configuration is performed by an
Ethernet Manager (EM) which is part of the trusted Fabric Manager (FM)
application. HFI nodes can have multiple VNICs each connected to a
different virtual Ethernet switch. The diagram below presents a case
of two virtual Ethernet switches with two HFI nodes::

                               +-------------------+
                               |      Subnet/      |
                               |     Ethernet      |
                               |      Manager      |
                               +-------------------+
                                  /          /
                                /           /
                              /            /
                            /             /
  +-----------------------------+  +------------------------------+
  |  Virtual Ethernet Switch    |  |  Virtual Ethernet Switch     |
  |  +---------+    +---------+ |  | +---------+    +---------+   |
  |  | VPORT   |    |  VPORT  | |  | |  VPORT  |    |  VPORT  |   |
  +--+---------+----+---------+-+  +-+---------+----+---------+---+
           |                 \        /                 |
           |                   \    /                   |
           |                     \/                     |
           |                    /  \                    |
           |                  /      \                  |
       +-----------+------------+  +-----------+------------+
       |   VNIC    |    VNIC    |  |    VNIC   |    VNIC    |
       +-----------+------------+  +-----------+------------+
       |          HFI           |  |          HFI           |
       +------------------------+  +------------------------+


The Omni-Path encapsulated Ethernet packet format is as described below.

==================== ================================
Bits                 Field
==================== ================================
Quad Word 0:
0-19                 SLID (lower 20 bits)
20-30                Length (in Quad Words)
31                   BECN bit
32-51                DLID (lower 20 bits)
52-56                SC (Service Class)
57-59                RC (Routing Control)
60                   FECN bit
61-62                L2 (=10, 16B format)
63                   LT (=1, Link Transfer Head Flit)

Quad Word 1:
0-7                  L4 type (=0x78 ETHERNET)
8-11                 SLID[23:20]
12-15                DLID[23:20]
16-31                PKEY
32-47                Entropy
48-63                Reserved

Quad Word 2:
0-15                 Reserved
16-31                L4 header
32-63                Ethernet Packet

Quad Words 3 to N-1:
0-63                 Ethernet packet (pad extended)

Quad Word N (last):
0-23                 Ethernet packet (pad extended)
24-55                ICRC
56-61                Tail
62-63                LT (=01, Link Transfer Tail Flit)
==================== ================================

The Ethernet packet is padded on the transmit side to ensure that the VNIC
OPA packet is quad word aligned. The 'Tail' field contains the number of
bytes padded. On the receive side, the 'Tail' field is read and the padding
is removed (along with the ICRC, Tail and OPA header) before the packet is
passed up the network stack.
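
The padding rule above can be sketched as a small calculation. The byte
counts are read off the format table (20 bytes of OPA header up to and
including the L4 header, then a 4-byte ICRC and one byte holding Tail and
LT). The names ``OPA_HDR_LEN``, ``ICRC_TAIL_LEN``, ``vnic_pad_len`` and
``strip_padding`` are hypothetical and are not taken from the driver source;
this is an illustration under those assumptions, not the driver's code.

```python
# Sizes derived from the packet format table; all names are illustrative only.
OPA_HDR_LEN = 20      # QW0 (8) + QW1 (8) + reserved and L4 header in QW2 (4)
ICRC_TAIL_LEN = 5     # ICRC (4) + one byte carrying Tail and LT
QUAD_WORD = 8


def vnic_pad_len(eth_len: int) -> int:
    """Return the pad bytes needed so the whole packet is quad word aligned."""
    unpadded = OPA_HDR_LEN + eth_len + ICRC_TAIL_LEN
    return -unpadded % QUAD_WORD


def strip_padding(eth_padded: bytes, tail: int) -> bytes:
    """Receive side: drop the number of pad bytes recorded in the Tail field."""
    return eth_padded[:len(eth_padded) - tail]
```

Under these assumptions a 60-byte Ethernet frame needs 3 pad bytes, which
brings the packet to 88 bytes, or 11 quad words.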

The L4 header field contains the virtual Ethernet switch id the VNIC port
belongs to. On the receive side, this field is used to de-multiplex the
received VNIC packets to different VNIC ports.
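
As a rough illustration of the format table, the first two quad words can be
assembled with shifts and masks, and the receive side can key its
de-multiplexing on the L4 header. This sketch assumes that bit 0 is the least
significant bit of each quad word, which the table does not state. The
function names and the ``ports`` mapping are hypothetical and are not the
driver's API.

```python
L4_TYPE_ETHERNET = 0x78  # L4 type value from the format table


def pack_qw0(slid, dlid, length_qw, sc, rc=0, becn=0, fecn=0):
    """Build Quad Word 0 (assumes bit 0 is the least significant bit)."""
    qw = slid & 0xFFFFF                  # bits 0-19: SLID (lower 20 bits)
    qw |= (length_qw & 0x7FF) << 20      # bits 20-30: length in quad words
    qw |= (becn & 0x1) << 31             # bit 31: BECN
    qw |= (dlid & 0xFFFFF) << 32         # bits 32-51: DLID (lower 20 bits)
    qw |= (sc & 0x1F) << 52              # bits 52-56: SC
    qw |= (rc & 0x7) << 57               # bits 57-59: RC
    qw |= (fecn & 0x1) << 60             # bit 60: FECN
    qw |= 0b10 << 61                     # bits 61-62: L2 = 10 (16B format)
    qw |= 0x1 << 63                      # bit 63: LT = 1
    return qw


def pack_qw1(slid, dlid, pkey, entropy):
    """Build Quad Word 1; bits 48-63 are reserved and left as zero."""
    qw = L4_TYPE_ETHERNET                # bits 0-7: L4 type
    qw |= ((slid >> 20) & 0xF) << 8      # bits 8-11: SLID[23:20]
    qw |= ((dlid >> 20) & 0xF) << 12     # bits 12-15: DLID[23:20]
    qw |= (pkey & 0xFFFF) << 16          # bits 16-31: PKEY
    qw |= (entropy & 0xFFFF) << 32       # bits 32-47: Entropy
    return qw


def vesw_id_from_qw2(qw2):
    """Extract the L4 header (virtual Ethernet switch id) from Quad Word 2."""
    return (qw2 >> 16) & 0xFFFF


def demux(qw2, ports):
    """Look up the receiving VNIC port by switch id; None if no port matches."""
    return ports.get(vesw_id_from_qw2(qw2))
```

A receiver holding ``ports = {7: "vnic0"}`` would hand a packet whose L4
header is 7 to ``"vnic0"`` and find no port for any other switch id.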

Driver Design
==============
The Intel OPA VNIC software design is presented in the diagram below.
OPA VNIC functionality has a HW dependent component and a HW
independent component.

Support has been added for IB devices to allocate and free RDMA netdev
devices. The RDMA netdev interfaces with the network stack, thus
creating standard network interfaces. OPA_VNIC is an RDMA netdev
device type.

The HW dependent VNIC functionality is part of the HFI1 driver. It
implements the verbs to allocate and free the OPA_VNIC RDMA netdev.
It involves HW resource allocation/management for VNIC functionality.
It interfaces with the network stack and implements the required
net_device_ops functions. It expects Omni-Path encapsulated Ethernet
packets in the transmit path and provides HW access to them. It strips
the Omni-Path header from the received packets before passing them up
the network stack. It also implements the RDMA netdev control operations.

The OPA VNIC module implements the HW independent VNIC functionality.
It consists of two parts. The VNIC Ethernet Management Agent (VEMA)
registers itself with IB core as an IB client and interfaces with the
IB MAD stack. It exchanges the management information with the Ethernet
Manager (EM) and the VNIC netdev. The VNIC netdev part allocates and frees
the OPA_VNIC RDMA netdev devices. It overrides the net_device_ops functions
set by HW dependent VNIC driver where required to accommodate any control
operation. It also handles the encapsulation of Ethernet packets with an
Omni-Path header in the transmit path. For each VNIC interface, the
information required for encapsulation is configured by the EM via VEMA MAD
interface. It also passes any control information to the HW dependent driver
by invoking the RDMA netdev control operations::

        +-------------------+ +----------------------+
        |                   | |       Linux          |
        |     IB MAD        | |      Network         |
        |                   | |       Stack          |
        +-------------------+ +----------------------+
                 |               |          |
                 |               |          |
        +----------------------------+      |
        |                            |      |
        |      OPA VNIC Module       |      |
        |  (OPA VNIC RDMA Netdev     |      |
        |     & EMA functions)       |      |
        |                            |      |
        +----------------------------+      |
                    |                       |
                    |                       |
           +------------------+             |
           |     IB core      |             |
           +------------------+             |
                    |                       |
                    |                       |
        +--------------------------------------------+
        |                                            |
        |      HFI1 Driver with VNIC support         |
        |                                            |
        +--------------------------------------------+
+0 −1
@@ -24,7 +24,6 @@ infiniband

   core_locking
   ipoib
   opa_vnic
   sysfs
   tag_matching
   user_mad
+0 −156
.. include:: ../disclaimer-zh_CN.rst

:Original: Documentation/infiniband/opa_vnic.rst

:翻译:

 司延腾 Yanteng Si <siyanteng@loongson.cn>

:校译:

 王普宇 Puyu Wang <realpuyuwang@gmail.com>
 时奎亮 Alex Shi <alexs@kernel.org>

.. _cn_infiniband_opa_vnic:

=============================================
英特尔全路径(OPA)虚拟网络接口控制器(VNIC)
=============================================

英特尔全路径(OPA)虚拟网络接口控制器(VNIC)功能通过封装HFI节点之间的以
太网数据包,支持Omni-Path结构上的以太网功能。

体系结构
========

Omni-Path封装的以太网数据包的交换模式涉及Omni-Path结构拓扑上覆盖的一个或
多个虚拟以太网交换机。Omni-Path结构上的HFI节点的一个子集被允许在特定的虚
拟以太网交换机上交换封装的以太网数据包。虚拟以太网交换机是通过配置结构上的
HFI节点实现的逻辑抽象,用于生成和处理报头。在最简单的配置中,整个结构的所有
HFI节点通过一个虚拟以太网交换机交换封装的以太网数据包。一个虚拟以太网交换机,
实际上是一个独立的以太网网络。该配置由以太网管理器(EM)执行,它是可信的结
构管理器(FM)应用程序的一部分。HFI节点可以有多个VNIC,每个连接到不同的虚
拟以太网交换机。下图介绍了两个虚拟以太网交换机与两个HFI节点的情况::

                               +-------------------+
                               |      子网/        |
                               |     以太网        |
                               |      管理         |
                               +-------------------+
                                  /          /
                                /           /
                              /            /
                            /             /
  +-----------------------------+  +------------------------------+
  |     虚拟以太网切换          |  |      虚拟以太网切换          |
  |  +---------+    +---------+ |  | +---------+    +---------+   |
  |  | VPORT   |    |  VPORT  | |  | |  VPORT  |    |  VPORT  |   |
  +--+---------+----+---------+-+  +-+---------+----+---------+---+
           |                 \        /                 |
           |                   \    /                   |
           |                     \/                     |
           |                    /  \                    |
           |                  /      \                  |
       +-----------+------------+  +-----------+------------+
       |   VNIC    |    VNIC    |  |    VNIC   |    VNIC    |
       +-----------+------------+  +-----------+------------+
       |          HFI           |  |          HFI           |
       +------------------------+  +------------------------+


Omni-Path封装的以太网数据包格式如下所述。

==================== ================================
位                   域
==================== ================================
Quad Word 0:
0-19                 SLID (低20位)
20-30                长度 (以四字为单位)
31                   BECN 位
32-51                DLID (低20位)
52-56                SC (服务级别)
57-59                RC (路由控制)
60                   FECN 位
61-62                L2 (=10, 16B 格式)
63                   LT (=1, 链路传输头 Flit)

Quad Word 1:
0-7                  L4 type (=0x78 ETHERNET)
8-11                 SLID[23:20]
12-15                DLID[23:20]
16-31                PKEY
32-47                熵
48-63                保留

Quad Word 2:
0-15                 保留
16-31                L4 头
32-63                以太网数据包

Quad Words 3 to N-1:
0-63                 以太网数据包 (pad拓展)

Quad Word N (last):
0-23                 以太网数据包 (pad拓展)
24-55                ICRC
56-61                尾
62-63                LT (=01, 链路传输尾 Flit)
==================== ================================

以太网数据包在传输端被填充,以确保VNIC OPA数据包是四字对齐的。“尾”字段
包含填充的字节数。在接收端,“尾”字段被读取,在将数据包向上传递到网络堆
栈之前,填充物被移除(与ICRC、尾和OPA头一起)。

L4头字段包含VNIC端口所属的虚拟以太网交换机ID。在接收端,该字段用于将收
到的VNIC数据包去多路复用到不同的VNIC端口。

驱动设计
========

英特尔OPA VNIC的软件设计如下图所示。OPA VNIC功能有一个依赖于硬件的部分
和一个独立于硬件的部分。

对IB设备分配和释放RDMA netdev设备的支持已经被加入。RDMA netdev支持与
网络堆栈的对接,从而创建标准的网络接口。OPA_VNIC是一个RDMA netdev设备
类型。

依赖于HW的VNIC功能是HFI1驱动的一部分。它实现了分配和释放OPA_VNIC RDMA
netdev的动作。它涉及VNIC功能的HW资源分配/管理。它与网络堆栈接口并实现所
需的net_device_ops功能。它在传输路径中期待Omni-Path封装的以太网数据包,
并提供对它们的HW访问。在将数据包向上传递到网络堆栈之前,它把Omni-Path头
从接收的数据包中剥离。它还实现了RDMA netdev控制操作。

OPA VNIC模块实现了独立于硬件的VNIC功能。它由两部分组成。VNIC以太网管理
代理(VEMA)作为一个IB客户端向IB核心注册,并与IB MAD栈接口。它与以太网
管理器(EM)和VNIC netdev交换管理信息。VNIC netdev部分分配和释放OPA_VNIC
RDMA netdev设备。它在需要时覆盖由依赖HW的VNIC驱动设置的net_device_ops函数,
以适应任何控制操作。它还处理以太网数据包的封装,在传输路径中使用Omni-Path头。
对于每个VNIC接口,封装所需的信息是由EM通过VEMA MAD接口配置的。它还通过调用
RDMA netdev控制操作将任何控制信息传递给依赖于HW的驱动程序::

        +-------------------+ +----------------------+
        |                   | |       Linux          |
        |     IB MAD        | |       网络           |
        |                   | |       栈             |
        +-------------------+ +----------------------+
                 |               |          |
                 |               |          |
        +----------------------------+      |
        |                            |      |
        |      OPA VNIC 模块         |      |
        |  (OPA VNIC RDMA Netdev     |      |
        |     & EMA 函数)            |      |
        |                            |      |
        +----------------------------+      |
                    |                       |
                    |                       |
           +------------------+             |
           |     IB 核心      |             |
           +------------------+             |
                    |                       |
                    |                       |
        +--------------------------------------------+
        |                                            |
        |      HFI1 驱动和 VNIC 支持                 |
        |                                            |
        +--------------------------------------------+