path: root/doc/guides/nics/mlx5.rst
Age | Commit message | Author
2018-11-14 | doc: add mlx5 IPv6 multicast limitation in VM | Dekel Peled
This patch adds a limitation notice for the MLX5 PMD. IPv6 multicast messages are not received on a VM when promiscuous and allmulticast modes are off, due to a Netlink restriction. Signed-off-by: Dekel Peled <> Acked-by: Shahaf Shuler <>
2018-11-14 | doc: add mlx5 Direct Verbs flow engine limitation | Shahaf Shuler
It would also be good to add code that disables dv_flow_en even when the user requested it. However, such support would need a new Netlink command to query the switchdev mode from the underlying kernel. Since the current 18.11 release is close to RC3, only documentation is added. Signed-off-by: Shahaf Shuler <>
2018-11-05 | net/mlx5: support default RSS key as null | Ophir Munk
Applications which add RSS rules must supply an RSS key and length. An application that is only interested in default RSS operation should not have to care about the exact RSS key. When the key is set to NULL, the PMD uses the default RSS key. In addition, if the application does not care about the RSS type, it can set it to 0 and the PMD will use the default type (ETH_RSS_IP). Signed-off-by: Ophir Munk <> Acked-by: Shahaf Shuler <>
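The fallback rule described in this commit message can be modelled in a few lines. This is only an illustrative sketch: `DEFAULT_RSS_KEY` and the numeric value of `ETH_RSS_IP` are placeholders chosen here, not the PMD's real key or the ethdev bit mask, and `resolve_rss` is a hypothetical helper name.

```python
# Placeholder values (assumptions): the real key and bit mask are defined in DPDK.
DEFAULT_RSS_KEY = bytes(range(40))  # stand-in key, not the PMD's actual default
ETH_RSS_IP = 0x1                    # stand-in for the default RSS type mask


def resolve_rss(key, types):
    """Return the (key, types) pair that would be used for an RSS rule."""
    if key is None:   # application passed NULL: use the default key
        key = DEFAULT_RSS_KEY
    if types == 0:    # application does not care about the RSS type
        types = ETH_RSS_IP
    return key, types


# An application that only wants default RSS behaviour passes NULL and 0.
key, types = resolve_rss(None, 0)
```

An explicit key or type passed by the application is returned unchanged; only the "don't care" values are replaced.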
2018-11-05 | net/mlx5: make vectorized Tx threshold configurable | Yongseok Koh
Add the txqs_max_vec parameter to configure the maximum number of Tx queues for which vectorized Tx is enabled. Its default value is set according to the architecture and device type. Signed-off-by: Yongseok Koh <> Acked-by: Shahaf Shuler <>
2018-11-05 | net/mlx5: add 128B padding of Rx completion entry | Yongseok Koh
A PMD parameter (rxq_cqe_pad_en) is added to enable 128B padding of the CQE on the Rx side. The size of the CQE is aligned with the cacheline size of the core. If the cacheline size is 128B, the CQE size is configured to be 128B even though the device writes only 64B of data to the cacheline. This avoids unnecessary cache invalidation caused by two consecutive device writes to one cacheline. However, on some architectures it is more beneficial to update the entire cacheline, padding the remaining 64B, rather than striding, because read-modify-write could reduce performance significantly. On the other hand, writing extra data consumes more PCIe bandwidth and could also reduce the maximum throughput. It is recommended to set this parameter empirically. Disabled by default. Signed-off-by: Yongseok Koh <> Acked-by: Shahaf Shuler <>
2018-10-11 | net/mlx5: add runtime parameter to enable Direct Verbs | Ori Kam
The DV flow API is based on a new kernel API. It lacks some functionality, such as counters, but adds other functionality, such as encap. So as not to affect current users, it should be enabled only manually, even if the kernel supports the new DV API. Signed-off-by: Ori Kam <> Acked-by: Yongseok Koh <>
2018-08-28 | net/mlx5: disable ConnectX-4 Lx Multi Packet Send by default | Shahaf Shuler
On ConnectX-4 Lx the Multi Packet Send (MPW) feature is considered insecure, because in some cases where the application provides incorrect mbufs on the Tx burst, the host or NIC can get stuck. Hence, the feature is disabled by default for this specific NIC. Users can still enable it and enjoy the performance gain (mostly for a low number of cores) by using the txq_mpw_en devarg. This patch will impact the out-of-the-box performance of some applications using ConnectX-4 Lx for the sake of security and robustness. Since different defaults are needed depending on the underlying device, the mpw field in the configuration struct was extended to also contain the MLX5_ARG_UNSET option. Cc: Signed-off-by: Shahaf Shuler <> Acked-by: Yongseok Koh <>
2018-08-09 | doc: update release notes for Mellanox drivers | Shahaf Shuler
Signed-off-by: Shahaf Shuler <>
2018-07-26 | net/mlx5: lay groundwork for switch offloads | Adrien Mazarguil
With mlx5, unlike normal flow rules implemented through Verbs for traffic emitted and received by the application, those targeting different logical ports of the device (VF representors for instance) are offloaded at the switch level and must be configured through Netlink (TC interface). This patch adds preliminary support to manage such flow rules through the flow API (rte_flow). Instead of rewriting tons of Netlink helpers and as previously suggested by Stephen [1], this patch introduces a new dependency to libmnl [2] (LGPL-2.1) when compiling mlx5. [1] [2] Signed-off-by: Adrien Mazarguil <> Acked-by: Nelio Laranjeiro <> Acked-by: Yongseok Koh <>
2018-07-12 | net/mlx5: support 32-bit systems | Moti Haimovsky
This patch adds support for building and running the mlx5 PMD on 32-bit systems such as i686. The main issue to tackle was handling 32-bit access to the UAR, as quoted from the mlx5 PRM: QP and CQ DoorBells require 64-bit writes. For best performance, it is recommended to execute the QP/CQ DoorBell as a single 64-bit write operation. For platforms that do not support 64 bit writes, it is possible to issue the 64 bits DoorBells through two consecutive writes, each write 32 bits, as described below: * The order of writing each of the Dwords is from lower to upper addresses. * No other DoorBell can be rung (or even start ringing) in the midst of an on-going write of a DoorBell over a given UAR page. The last rule implies that in a multi-threaded environment, the access to a UAR page (which can be accessible by all threads in the process) must be synchronized (for example, using a semaphore) unless an atomic write of 64 bits in a single bus operation is guaranteed. Such a synchronization is not required when ringing DoorBells on different UAR pages. Signed-off-by: Moti Haimovsky <> Acked-by: Yongseok Koh <>
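The two PRM rules quoted above (lower Dword first, and no interleaving of doorbells on the same UAR page) can be illustrated with a toy model. This is a sketch under stated assumptions, not driver code: `UarPage` and `ring_doorbell_32` are hypothetical names, the page is modelled as a plain byte buffer, a `threading.Lock` stands in for whatever synchronization the driver uses, and the big-endian byte image of the value is an assumption of this example.

```python
import threading


class UarPage:
    """Toy model of a UAR page: a byte buffer plus one lock shared by all threads."""

    def __init__(self, size=4096):
        self.mem = bytearray(size)
        self.lock = threading.Lock()

    def ring_doorbell_32(self, offset, value):
        """Write a 64-bit doorbell value as two consecutive 32-bit writes."""
        raw = value.to_bytes(8, "big")  # byte image of the doorbell (assumed layout)
        with self.lock:  # no other doorbell may start on this page meanwhile
            self.mem[offset:offset + 4] = raw[:4]      # lower address first
            self.mem[offset + 4:offset + 8] = raw[4:]  # then the upper address


page = UarPage()
page.ring_doorbell_32(8, 0x0102030405060708)
```

Doorbells rung on different `UarPage` objects use different locks, which mirrors the note that no synchronization is required across UAR pages.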
2018-07-11 | net/mlx5: add parameter for port representors | Adrien Mazarguil
Prior to this patch, all port representors detected on a given device were probed and Ethernet devices instantiated for each of them. This patch adds support for the standard "representor" parameter, which implies that port representors are not probed by default anymore, except for the list provided through device arguments. (Patch based on prior work from Yuanhan Liu) Signed-off-by: Adrien Mazarguil <> Reviewed-by: Xueming Li <>
2018-07-03 | net/mlx5: use stride index in Rx completion entry | Yongseok Koh
Multi-Packet Receive Queue receives multiple packets on a single large buffer. The number of consumed strides in the CQE is accumulated to keep track of the current stride index. However, it is safer to use the stride index in the CQE directly, to avoid out-of-order situations which could be caused by introducing LRO in the future. If Rx CQE compression is enabled, HW can be configured to store the stride index in a mini-CQE, but this requires a newer version of the library/driver. Therefore, since this change, MPRQ is only supported with the newer library/driver, and the Rx hash result is not supported if MPRQ is enabled along with Rx CQE compression. Signed-off-by: Yongseok Koh <> Acked-by: Shahaf Shuler <>
2018-05-17 | net/mlx5: support MPLS-in-GRE and MPLS-in-UDP | Matan Azrad
Add support for the MPLS over GRE and MPLS over UDP tunnel types as described in the following RFCs: 1. 2. 3. Signed-off-by: Matan Azrad <> Acked-by: Nelio Laranjeiro <>
2018-05-17 | net/mlx5: add Bluefield device id | Shahaf Shuler
Signed-off-by: Shahaf Shuler <> Acked-by: Nelio Laranjeiro <>
2018-05-14 | net/mlx5: add Multi-Packet Rx support | Yongseok Koh
Multi-Packet Rx Queue (MPRQ, a.k.a. Striding RQ) can further save PCIe bandwidth by posting a single large buffer for multiple packets. Instead of posting one buffer per packet, one large buffer is posted in order to receive multiple packets on it. An MPRQ buffer consists of multiple fixed-size strides and each stride receives one packet. An Rx packet is copied (memcpy) to a user-provided mbuf if it is comparatively small; otherwise the PMD attaches the Rx packet to the mbuf by external buffer attachment - rte_pktmbuf_attach_extbuf(). A mempool for external buffers is allocated and managed by the PMD. Signed-off-by: Yongseok Koh <> Acked-by: Shahaf Shuler <>
2018-05-14 | net/mlx5: add new memory region support | Yongseok Koh
This is the new design of Memory Region (MR) for mlx PMD, in order to: - Accommodate the new memory hotplug model. - Support non-contiguous Mempool. There are multiple layers for MR search. L0 looks up the last-hit entry, which is pointed to by mr_ctrl->mru (Most Recently Used). If L0 misses, L1 looks up the address in a fixed-size array by linear search. L0/L1 is in an inline function - mlx5_mr_lookup_cache(). If L1 misses, the bottom-half function is called to look up the address in the bigger local cache of the queue. This is L2 - mlx5_mr_addr2mr_bh() and it is not an inline function. The data structure for L2 is the Binary Tree. If L2 misses, the search falls into the slowest path, which takes locks in order to access the global device cache (priv->mr.cache), which is also a B-tree and caches the original MR list (priv->mr.mr_list) of the device. Unless the global cache has overflowed, it is all-inclusive of the MR list. This is L3 - mlx5_mr_lookup_dev(). The size of the L3 cache table is limited and can't be expanded on the fly due to deadlock. Refer to the comments in the code for the details - mr_lookup_dev(). If L3 has overflowed, the list has to be searched directly, bypassing the cache, although this is slower. If L3 misses, a new MR for the address should be created - mlx5_mr_create(). When it creates a new MR, it tries to register as many adjacent memsegs as possible that are virtually contiguous around the address. This must take two locks - memory_hotplug_lock and priv->mr.rwlock. Due to memory_hotplug_lock, there can't be any allocation/free of memory inside. In the free callback of the memory hotplug event, the freed space is searched for in the MR list and the corresponding bits are cleared from the bitmap of MRs. This can fragment an MR, and the MR will then have multiple search entries in the caches. Once there's a change caused by the event, the global cache must be rebuilt and all the per-queue caches will be flushed as well. 
If memory is frequently freed at run time, this may, in the worst case, cause jitter in dataplane processing by incurring MR cache flushes and rebuilds, although this is the least probable scenario. For the best performance, it is highly recommended to use the EAL option '--socket-mem'. The reserved memory is then pinned and won't be freed dynamically. It is also recommended to configure a per-lcore cache for the Mempool. Even when a device has many MRs or the MRs are highly fragmented, the Mempool cache helps considerably to reduce misses on the per-queue caches. '--legacy-mem' is also supported. Signed-off-by: Yongseok Koh <>
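The layered search (L0 most-recently-used entry, L1 small linear array, L2 per-queue cache, L3 device-wide lookup) can be sketched as follows. This is a simplified model and not the PMD's code: `MrCtrl`, its fields other than `mru`, and the `(start, end, lkey)` tuples are hypothetical, a sorted list searched with `bisect` stands in for the tree used at L2, and locking, overflow handling and MR creation are all hidden behind the caller-supplied `lookup_dev` callback.

```python
import bisect


class MrCtrl:
    """Per-queue lookup state; entries are (start, end, lkey) address ranges."""

    L1_SIZE = 4  # hypothetical size of the fixed L1 array

    def __init__(self, lookup_dev):
        self.mru = None               # L0: last-hit entry
        self.l1 = []                  # L1: small array, searched linearly
        self.l2_starts = []           # L2: sorted start addresses
        self.l2_entries = []          # L2: entries parallel to l2_starts
        self.lookup_dev = lookup_dev  # L3 stand-in (global cache or new MR)

    @staticmethod
    def _hit(entry, addr):
        return entry is not None and entry[0] <= addr < entry[1]

    def lookup(self, addr):
        """Return (lkey, level) where level names the layer that answered."""
        if self._hit(self.mru, addr):                 # L0
            return self.mru[2], "L0"
        for entry in self.l1:                         # L1
            if self._hit(entry, addr):
                self.mru = entry
                return entry[2], "L1"
        i = bisect.bisect_right(self.l2_starts, addr) - 1
        if i >= 0 and self._hit(self.l2_entries[i], addr):   # L2
            entry, level = self.l2_entries[i], "L2"
        else:                                         # L3: slow path
            entry, level = self.lookup_dev(addr), "L3"
            j = bisect.bisect_right(self.l2_starts, entry[0])
            self.l2_starts.insert(j, entry[0])
            self.l2_entries.insert(j, entry)
        self.l1 = ([entry] + self.l1)[: self.L1_SIZE]  # refill the upper layers
        self.mru = entry
        return entry[2], level
```

Flushing a per-queue cache after a memory free event would, in this model, amount to resetting `mru`, `l1` and the two L2 lists, after which the next lookups fall through to `lookup_dev` again.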
2018-05-14 | net/mlx5: remove memory region support | Yongseok Koh
This patch removes current support of Memory Region (MR) in order to accommodate the dynamic memory hotplug patch. This patch can be compiled but traffic can't flow and HW will raise faults. Subsequent patches will add new MR support. Signed-off-by: Yongseok Koh <>
2018-05-14 | net/mlx5: document update for Tx | Xueming Li
Add documentation for HW header parsing and SWP. Signed-off-by: Xueming Li <> Acked-by: Yongseok Koh <>
2018-04-27 | doc: update mlx5 guide on tunnel offloading | Xueming Li
Remove tunnel limitations, add new hardware tunnel offload features. Signed-off-by: Xueming Li <> Acked-by: Nelio Laranjeiro <>
2018-04-27 | net/mlx5: introduce VXLAN-GPE tunnel type | Xueming Li
Signed-off-by: Xueming Li <> Acked-by: Nelio Laranjeiro <>
2018-04-27 | net/mlx5: support L3 VXLAN flow | Xueming Li
This patch supports L3 VXLAN, which has no inner L2 header compared to the standard VXLAN protocol. L3 VXLAN uses a specific overlay UDP destination port to distinguish itself from standard VXLAN. The device parameter and FW have to be configured to support it: sudo mlxconfig -d <device> -y s IP_OVER_VXLAN_EN=1 sudo mlxconfig -d <device> -y s IP_OVER_VXLAN_PORT=<port> Signed-off-by: Xueming Li <> Acked-by: Nelio Laranjeiro <>
2018-04-14 | net/mlx5: add parameter for Netlink support in VF | Nélio Laranjeiro
All Netlink requests the PMD makes can also be made through the iproute2 command-line interface, enabling VF behavior configuration without having to modify the application or reaching PMD limits (e.g. the MAC address number limit). Signed-off-by: Nelio Laranjeiro <> Acked-by: Adrien Mazarguil <>
2018-04-14 | net/mlx5: use Netlink to add/remove MAC addresses | Nélio Laranjeiro
VF devices are not able to receive traffic unless they fully request it through Netlink. This causes the request to be processed by the PF, which adds/removes the MAC address to/from the VF table if the VF is trusted. Signed-off-by: Nelio Laranjeiro <> Acked-by: Adrien Mazarguil <>
2018-04-11 | align SPDX Mellanox copyrights | Shahaf Shuler
Align Mellanox SPDX copyrights to a single format. In addition, switch to SPDX license tags in files which were missed. Signed-off-by: Shahaf Shuler <> Acked-by: Adrien Mazarguil <>
2018-03-30 | net/mlx: fix rdma-core glue path with EAL plugins | Adrien Mazarguil
Glue object files are looked up in RTE_EAL_PMD_PATH by default when set and should be installed in this directory. During startup, EAL attempts to load them automatically like other plug-ins found there. While normally harmless, dlopen() fails when rdma-core is not installed, EAL interprets this as a fatal error and terminates the application. This patch requests glue objects to be installed in a different directory to prevent their automatic loading by EAL since they are PMD helpers, not actual DPDK plug-ins. Fixes: f6242d0655cd ("net/mlx: make rdma-core glue path configurable") Cc: Reported-by: Timothy Redaelli <> Signed-off-by: Adrien Mazarguil <> Tested-by: Timothy Redaelli <>
2018-02-06 | net/mlx: make rdma-core glue path configurable | Adrien Mazarguil
Since rdma-core glue libraries are intrinsically tied to their respective PMDs and used as internal plug-ins, their presence in the default search path among other system libraries for the dynamic linker is not necessarily desired. This commit enables their installation and subsequent look-up at run time in RTE_EAL_PMD_PATH if configured to a nonempty string. This path can also be overridden by environment variables MLX[45]_GLUE_PATH. Signed-off-by: Adrien Mazarguil <>
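The precedence described in this commit message (environment variable first, then the configured RTE_EAL_PMD_PATH, otherwise the dynamic linker's default search path) can be sketched as below. This models only the decision, not the actual dlopen() logic: `glue_search_path` is a hypothetical helper, and treating an empty environment variable as "not set" is an assumption of this example.

```python
import os


def glue_search_path(pmd, configured_pmd_path, environ=None):
    """Pick the directory in which to look for the glue library.

    pmd                 -- "mlx4" or "mlx5"
    configured_pmd_path -- build-time RTE_EAL_PMD_PATH value (may be empty)
    Returns None when the dynamic linker's default search path applies.
    """
    environ = os.environ if environ is None else environ
    override = environ.get(f"{pmd.upper()}_GLUE_PATH")  # e.g. MLX5_GLUE_PATH
    if override:                 # run-time override wins
        return override
    if configured_pmd_path:      # nonempty build-time path
        return configured_pmd_path
    return None                  # fall back to the default linker lookup


# Hypothetical directories, for illustration only.
path = glue_search_path("mlx5", "/opt/dpdk/pmds", {"MLX5_GLUE_PATH": "/opt/glue"})
```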
2018-02-05 | doc: update mlx required OFED version | Shahaf Shuler
Signed-off-by: Shahaf Shuler <> Acked-by: Nelio Laranjeiro <> Acked-by: Adrien Mazarguil <>
2018-01-31 | net/mlx5: spawn rdma-core dependency plug-in | Adrien Mazarguil
When mlx5 is not compiled directly as an independent shared object (e.g. CONFIG_RTE_BUILD_SHARED_LIB not enabled for performance reasons), DPDK applications inherit its dependencies on libibverbs and libmlx5 through This is an issue both when DPDK is delivered as a binary package (Linux distributions) and for end users because rdma-core then propagates as a mandatory dependency for everything. Application writers relying on binary DPDK packages are not necessarily aware of this fact and may end up delivering packages with broken dependencies. This patch therefore introduces an intermediate internal plug-in hard-linked with rdma-core (to preserve symbol versioning) loaded by the PMD through dlopen(), so that a missing rdma-core does not cause unresolved symbols, allowing applications to start normally. Signed-off-by: Adrien Mazarguil <>
2018-01-29 | net/mlx5: fix secondary process mempool registration | Shahaf Shuler
A secondary process is not allowed to register mempools on the fly. The code returns an invalid memory key in such a case. Fixes: 87ec44ce1651 ("net/mlx5: add operations for secondary process") Cc: Signed-off-by: Shahaf Shuler <> Signed-off-by: Xueming Li <> Acked-by: Nelio Laranjeiro <>
2018-01-16 | net/mlx5: convert to new Tx offloads API | Shahaf Shuler
The ethdev Tx offloads API has changed since: commit cba7f53b717d ("ethdev: introduce Tx queue offloads API") This commit supports the new Tx offloads API. Signed-off-by: Shahaf Shuler <> Acked-by: Nelio Laranjeiro <>
2018-01-16 | doc: update mlx5 statistics query | Shahaf Shuler
Update the guide with more details on the different statistics queries possible with the MLX5 PMD. Signed-off-by: Shahaf Shuler <> Acked-by: Nelio Laranjeiro <>
2017-11-07 | doc: update mlx5 guide | Shahaf Shuler
Signed-off-by: Shahaf Shuler <> Acked-by: Nelio Laranjeiro <> Acked-by: John McNamara <>
2017-11-01 | net/mlx5: fix flows when VXLAN tunnel is 0 | Nélio Laranjeiro
Fix a strange behavior of the NIC: when a flow starts with a VXLAN layer whose VNI equals zero, all traffic matches this rule. Fixes: 2e709b6aa0f5 ("net/mlx5: support VXLAN flow item") Cc: Signed-off-by: Nelio Laranjeiro <> Acked-by: Yongseok Koh <>
2017-10-26 | net/mlx5: fix Tx doorbell memory barrier | Yongseok Koh
Configuring the UAR as IO-mapped makes maximum throughput decline by a noticeable amount. If the UAR is configured as a write-combining register, a write memory barrier is needed when ringing a doorbell. rte_wmb() is mostly effective when the size of a burst is comparatively small. Revert the register back to write-combining and enforce a write memory barrier instead, except for vectorized Tx burst routines. The application can change this by setting MLX5_SHUT_UP_BF as needed. Fixes: 9f9bebae5530 ("net/mlx5: don't map doorbell register to write combining") Signed-off-by: Yongseok Koh <> Acked-by: Shahaf Shuler <> Acked-by: Nelio Laranjeiro <>
2017-10-26 | doc: update mlx5 flow count limitations | Ori Kam
Signed-off-by: Ori Kam <> Acked-by: Shahaf Shuler <> Acked-by: Nelio Laranjeiro <>
2017-10-12 | net/mlx5: support flow director | Nélio Laranjeiro
Support the same functionality as in commit cf521eaa3c76 ("net/mlx5: remove flow director support"). This implementation is built on top of the generic flow API. Signed-off-by: Nelio Laranjeiro <> Acked-by: Yongseok Koh <>
2017-10-12 | net/mlx5: remove flow director support | Nélio Laranjeiro
The generic flow API should be used for flow steering, as it provides a better and easier way to configure flows. Signed-off-by: Nelio Laranjeiro <> Acked-by: Yongseok Koh <>
2017-10-12 | net/mlx5: add operations for secondary process | Xueming Li
Add operations that are safe for secondary processes: * (x)stats * device info get * rx/tx descriptor status Signed-off-by: Xueming Li <> Acked-by: Nelio Laranjeiro <>
2017-10-06 | net/mlx5: support upstream rdma-core | Shachar Beiser
This removes the dependency on specific Mellanox OFED libraries by using the upstream rdma-core and Linux upstream community code. Both rdma-core upstream and Mellanox OFED are Linux user-space packages: 1. rdma-core is the Linux upstream user-space package (generic). 2. Mellanox OFED is Mellanox's Linux user-space package (proprietary). The difference between the two is the APIs towards the kernel. Support for x86-32 is removed due to issues in the rdma-core library. ICC compilation will be supported as soon as the following patch is integrated in rdma-core: Signed-off-by: Shachar Beiser <> Signed-off-by: Nelio Laranjeiro <>
2017-10-06 | net/mlx5: enforce Tx num of segments limitation | Shahaf Shuler
Mellanox NICs have a limitation on the number of mbuf segments a multi-segment mbuf can have. The maximum number depends on the Tx offloads requested. The current code does not enforce this limitation, which might cause malformed work requests to be written to the device. This commit adds verification of the number of mbuf segments posted to the device. In case of overflow, the packet will not be sent. In addition, update the NIC documentation with the limitation. Considering the device limitation is 63 data segments in a work request, the maximum number of segments in an mbuf was calculated taking TSO as the worst case: max_nb_segs = 63 - (control_segment + ethernet segment + TSO headers inline + inline segment + extra inline to align to cacheline) Cc: Signed-off-by: Shahaf Shuler <> Acked-by: Yongseok Koh <> Acked-by: Nelio Laranjeiro <>
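The arithmetic in the formula above can be written out as a small helper. Only the device limit of 63 data segments comes from the commit message; the per-item overhead counts passed in the example call are hypothetical placeholders, since the message does not state them, and `max_nb_segs` and `can_send` are illustrative names rather than driver functions.

```python
DEVICE_MAX_DATA_SEGS = 63  # limit stated in the commit message


def max_nb_segs(control_segment, ethernet_segment, tso_headers_inline,
                inline_segment, extra_inline_align,
                device_limit=DEVICE_MAX_DATA_SEGS):
    """Segments left for the mbuf chain after the worst-case (TSO) overhead."""
    overhead = (control_segment + ethernet_segment + tso_headers_inline
                + inline_segment + extra_inline_align)
    return device_limit - overhead


def can_send(nb_segs, limit):
    """A packet whose segment count exceeds the limit is not sent."""
    return nb_segs <= limit


# Hypothetical overhead counts, for illustration only: 63 - (1+1+2+1+1) = 57.
limit = max_nb_segs(1, 1, 2, 1, 1)
```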
2017-08-03 | net/mlx5: add parameters to enable/disable vector datapath | Nelio Laranjeiro
The vector code is very young and can present some issues for users. To spare them from modifying the selection function by commenting out code and recompiling the PMD, new device parameters are added to deactivate the Tx and/or Rx vector code. With these device parameters, the user can fall back to the regular burst functions. Signed-off-by: Nelio Laranjeiro <> Acked-by: Yongseok Koh <>
2017-07-31 | doc: update mlx guides | Shahaf Shuler
Update the guides with: * New supported features. * Supported OFED and FW versions. * Quick start guide. * Performance tuning guide. Signed-off-by: Shahaf Shuler <> Acked-by: Nelio Laranjeiro <> Acked-by: Adrien Mazarguil <> Acked-by: John McNamara <>
2017-07-06 | doc: add VLAN flow limitation on mlx5 PMD | Shahaf Shuler
On the mlx5 PMD, a flow pattern without any specific VLAN will match VLAN packets as well. Cc: Signed-off-by: Shahaf Shuler <> Acked-by: Nelio Laranjeiro <>
2017-05-07 | doc: update mlx supported OFED and FW | Shahaf Shuler
Update the supported Mellanox OFED and FW versions. Signed-off-by: Shahaf Shuler <> Acked-by: Adrien Mazarguil <>
2017-04-04 | net/mlx5: add enhanced multi-packet send for ConnectX-5 | Yongseok Koh
ConnectX-5 supports an enhanced version of multi-packet send (MPS). An MPS Tx descriptor can carry multiple packets, either by including pointers to packets or by inlining packets. Inlining packet data can help to better utilize PCIe bandwidth. In addition, Enhanced MPS supports a hybrid mode - mixing inlined packets and pointers in a descriptor. This feature is enabled by default if supported by HW. Signed-off-by: Yongseok Koh <>
2017-04-04 | net/mlx5: add hardware checksum offload for tunnel packets | Shahaf Shuler
Prior to this commit Tx checksum offload was supported only for the inner headers. This commit adds support for the hardware to compute the checksum for the outer headers as well. The support is for tunneling protocols GRE and VXLAN. Signed-off-by: Shahaf Shuler <> Acked-by: Nelio Laranjeiro <>
2017-04-04 | net/mlx5: support hardware TSO | Shahaf Shuler
Implement support for hardware TSO. Signed-off-by: Shahaf Shuler <> Acked-by: Nelio Laranjeiro <>
2017-03-01 | doc: use corelist instead of coremask | Keith Wiles
The coremask option in DPDK is difficult to use and we should be promoting the use of the corelist (-l) option. The patch adjusts the docs to use -l EAL option instead of the -c option. The patch only changes the docs and not the code as the -c option will continue to exist unless it is removed in the future. The -c option should be kept to maintain backward compatibility. Signed-off-by: Keith Wiles <> Acked-by: John McNamara <>
2017-02-14 | doc: update release notes for mlx5 | Nelio Laranjeiro
Signed-off-by: Nelio Laranjeiro <> Acked-by: John McNamara <>
2017-02-09 | doc: add flow API to features list | Nelio Laranjeiro
Signed-off-by: Nelio Laranjeiro <>