path: root/drivers/net/mlx5/mlx5_ethdev.c
Age  Commit message  Author
2019-03-29  net/mlx5: switch to the shared IB device context  Viacheslav Ovsiienko
The code is updated to use the shared IB device context and device handles. The IB device context is shared between representors created over a single multiport IB device. All Verbs and DevX objects will be created within this shared context. Signed-off-by: Viacheslav Ovsiienko <> Acked-by: Shahaf Shuler <>
2019-03-29  net/mlx5: switch to the shared context IB attributes  Viacheslav Ovsiienko
The code is updated to use the shared IB device attributes, located in the shared IB context. It saves some memory if there are representors created over the single Infiniband device with multiple ports. Signed-off-by: Viacheslav Ovsiienko <> Acked-by: Shahaf Shuler <>
2019-03-29  net/mlx5: switch to the names in the shared IB context  Viacheslav Ovsiienko
The IB device names are moved from the device private data to the shared context, and the code involving the names is updated. IB port index handling is added where relevant. Signed-off-by: Viacheslav Ovsiienko <> Acked-by: Shahaf Shuler <>
2019-03-29  net/mlx5: modify get ifindex routine for multiport IB  Viacheslav Ovsiienko
The routine mlx5_nl_ifindex() returns the network interface index associated with an Infiniband device. To support multiport IB devices, the function now takes the IB port as an argument and returns the ifindex associated with the tuple <IB device, IB port>. Signed-off-by: Viacheslav Ovsiienko <> Acked-by: Shahaf Shuler <>
2019-03-29  net/mlx5: add representor recognition on Linux 5.x  Viacheslav Ovsiienko
The master device and VF representors were distinguished by the presence of a port name: the master device did not have one. Linux kernels starting from 5.0 provide a port name for the master device as well, so the existing representor recognition method no longer works. The new recognition method is based on querying the number of VFs created on the base of the device. The kernel returns the IFLA_NUM_VF attribute if the IFLA_EXT_MASK attribute is specified in the Netlink request message. A check for the presence of the device symlink in the device sysfs folder is also added to distinguish representors with the sysfs-based method. Signed-off-by: Viacheslav Ovsiienko <> Acked-by: Shahaf Shuler <>
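The Netlink request described above can be sketched roughly as follows. This is a minimal illustration and not the driver's code: the struct vf_query layout and the build_vf_query() helper are hypothetical names, and the use of RTEXT_FILTER_VF as the IFLA_EXT_MASK value is an assumption based on the Linux rtnetlink headers. Only the request is built here; sending it and parsing IFLA_NUM_VF from the reply is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical request layout: header, link info, one u32 attribute. */
struct vf_query {
	struct nlmsghdr nh;
	struct ifinfomsg info;
	struct rtattr rta;
	uint32_t extmask;
};

/* Fill an RTM_GETLINK request carrying IFLA_EXT_MASK for one interface. */
static void build_vf_query(struct vf_query *req, int ifindex, uint32_t seq)
{
	assert(req != NULL);
	memset(req, 0, sizeof(*req));
	req->nh.nlmsg_len = NLMSG_LENGTH(sizeof(req->info) +
					 RTA_LENGTH(sizeof(uint32_t)));
	req->nh.nlmsg_type = RTM_GETLINK;
	req->nh.nlmsg_flags = NLM_F_REQUEST;
	req->nh.nlmsg_seq = seq;
	req->info.ifi_family = AF_UNSPEC;
	req->info.ifi_index = ifindex;
	/* Assumption: asking for VF information makes the kernel add IFLA_NUM_VF. */
	req->rta.rta_type = IFLA_EXT_MASK;
	req->rta.rta_len = RTA_LENGTH(sizeof(uint32_t));
	req->extmask = RTEXT_FILTER_VF;
}
```

A caller would send the filled structure over a NETLINK_ROUTE socket and walk the reply attributes looking for IFLA_NUM_VF.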
2019-03-29  net/mlx5: add missing return value check  Ali Alnubani
This patch fixes the build failure with message: drivers/net/mlx5/mlx5_ethdev.c: In function ‘mlx5_sysfs_switch_info’: drivers/net/mlx5/mlx5_ethdev.c:1381:3: error: ignoring return value of ‘fscanf’, declared with attribute warn_unused_result [-Werror=unused-result] fscanf(file, "%s", port_name); ^ Which reproduces on Ubuntu 16.04 LTS with gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609. Fixes: b2f3a3810125 ("net/mlx5: support new representor naming format") Signed-off-by: Ali Alnubani <> Acked-by: Viacheslav Ovsiienko <> Acked-by: Dekel Peled <>
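The kind of fix this message refers to can be sketched as below. The helper name read_port_name() and the 16-byte buffer are hypothetical and chosen for illustration only; the point is simply that the fscanf() return value is checked and the field width is bounded.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper: read one whitespace-delimited token of at most
 * 15 characters. Returns 0 on success, -EINVAL if nothing was read.
 */
static int read_port_name(FILE *file, char name[16])
{
	assert(file != NULL && name != NULL);
	/* fscanf() returns the number of converted items, or EOF. */
	if (fscanf(file, "%15s", name) != 1) {
		name[0] = '\0';
		return -EINVAL;
	}
	return 0;
}
```

Checking the result this way also silences the warn_unused_result diagnostic quoted in the message.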
2019-03-20  net/mlx5: support new representor naming format  Dekel Peled
Kernel update [1] introduces a new format for representor names. This patch implements RFC [2], updating the MLX5 PMD to support the new format while maintaining support for the existing format. [1] [2] Signed-off-by: Dekel Peled <> Acked-by: Viacheslav Ovsiienko <> Acked-by: Shahaf Shuler <>
2019-03-01  net/mlx: prefix private structure  Thomas Monjalon
The private structure stored in rte_eth_dev->data->dev_private was named "struct priv". In order to ease code browsing, the structure is renamed "struct mlx[45]_priv". Cc: Signed-off-by: Thomas Monjalon <> Acked-by: Yongseok Koh <>
2019-02-13  net/mlx: support firmware version query  Thomas Monjalon
The API function rte_eth_dev_fw_version_get() is querying drivers via the operation callback fw_version_get(). The implementation of this operation is added for mlx4 and mlx5. Both functions are copying the same ibverbs field fw_ver which is retrieved when calling ibv_query_device[_ex]() during the port probing. It is tested with command "drvinfo" of examples/ethtool/. Signed-off-by: Thomas Monjalon <> Acked-by: Shahaf Shuler <>
2018-10-22  mk: build with _GNU_SOURCE defined by default  Anatoly Burakov
We use _GNU_SOURCE all over the place, but often times we miss defining it, resulting in broken builds on musl. Rather than fixing every library's and driver's and application's makefile, fix it by simply defining _GNU_SOURCE by default for all builds. Remove all usages of _GNU_SOURCE in source files and makefiles, and also fixup a couple of instances of using __USE_GNU instead of _GNU_SOURCE. Signed-off-by: Anatoly Burakov <>
2018-10-17  net/mlx5: remove useless driver name comparison  Thomas Monjalon
The function mlx5_dev_to_port_id() is returning all the ports associated to a rte_device. It was comparing driver names while already comparing rte_device pointers. If two devices are the same, they will have the same driver. So the useless driver name comparison is removed. Signed-off-by: Thomas Monjalon <> Acked-by: Shahaf Shuler <>
2018-10-11  net/mlx5: always use representor ifindex for ioctl  Shahaf Shuler
In the current code, in some cases the representor ethdev uses the PF interface to query link status information or pause parameters. This was done because previous kernel versions did not provide representor info. Using the PF interface for such ioctls is error prone and does not always work because: * In some cases there is no PF at all, only representors (e.g. Bluefield with host representors) * Querying the up/down status from the representor and the link status from the PF is inconsistent * The PF link being down does not necessarily mean the representor is down * Setting different pause configurations for the PF and the representors results in undefined behaviour. Make the code cleaner and more robust by using only the representor interface for the ioctl. Whatever the kernel provides for this query will be used; there is no need to work around missing kernel functionality. Note: 1. Setting pause parameters will not work on representors 2. Old kernels will not report all the possible representor info Fixes: 2b7302638898 ("net/mlx5: probe all port representors") Cc: Signed-off-by: Shahaf Shuler <>
2018-10-11  net/mlx5: fix representor port link status  Xueming Li
The current code uses the PF link status for a representor port instead of the status of the representor interface itself. This caused a wrong representor port link status when toggling the interface up or down. Fixes: 2b7302638898 ("net/mlx5: probe all port representors") Cc: Signed-off-by: Xueming Li <> Acked-by: Yongseok Koh <>
2018-07-26  net/mlx5: fix invalid network interface index  Adrien Mazarguil
Network interface indices being unsigned, an invalid index or error is normally expressed through a zero value (see if_nametoindex()). mlx5_ifindex() has a signed return type and returns negative values in case of error. Since mlx5_nl.c does not check for errors, these may be fed back as invalid interface indices to subsequent system calls. This usage would have been correct if mlx5_ifindex() returned a zero value instead. This patch makes mlx5_ifindex() unsigned for convenience. Fixes: ccdcba53a3f4 ("net/mlx5: use Netlink to add/remove MAC addresses") Cc: Signed-off-by: Adrien Mazarguil <> Acked-by: Nelio Laranjeiro <> Acked-by: Yongseok Koh <>
2018-07-26  net/mlx5: fix representors detection  Nelio Laranjeiro
On systems where the required Netlink commands are not supported but Mellanox OFED is installed, representors information must be retrieved through sysfs. Fixes: 26c08b979d26 ("net/mlx5: add port representor awareness") Signed-off-by: Nelio Laranjeiro <> Acked-by: Shahaf Shuler <>
2018-07-26  net/mlx5: fix build with old kernels  Moti Haimovsky
This commit fixes compilation errors due to missing definitions found when compiling mlx5 PMD from DPDK 17.11-LTS on Ubuntu 12.4 with kernel 3.15. Fixes: 75ef62a94301 ("net/mlx5: fix link speed capability information") Fixes: 5bfc9fc112dd ("net/mlx5: use static assert for compile-time sanity checks") Cc: Signed-off-by: Moti Haimovsky <> Acked-by: Shahaf Shuler <>
2018-07-12  net/mlx5: use a macro for the RSS key size  Nelio Laranjeiro
ConnectX 4-5 support only 40 bytes of RSS key, using a compiled size hash key is not necessary. Signed-off-by: Nelio Laranjeiro <> Acked-by: Yongseok Koh <>
2018-07-11  net/mlx5: probe all port representors  Adrien Mazarguil
Probe existing port representors in addition to their master device and associate them automatically. To avoid collision between Ethernet devices, they are named as follows: - "{DBDF}" for master/switch devices. - "{DBDF}_representor_{rep}" with "rep" starting from 0 for port representors. (Patch based on prior work from Yuanhan Liu) Signed-off-by: Adrien Mazarguil <> Signed-off-by: Nelio Laranjeiro <> Reviewed-by: Xueming Li <>
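The naming scheme above could be produced with something like the following sketch. The function format_port_name() and its convention of passing a negative "rep" for the master device are hypothetical and used only for illustration.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: write "{DBDF}" for a master/switch device
 * (rep < 0) or "{DBDF}_representor_{rep}" for a port representor.
 * Returns the snprintf() result, i.e. the untruncated name length.
 */
static int format_port_name(char *buf, size_t len, const char *dbdf, int rep)
{
	assert(buf != NULL && dbdf != NULL);
	if (rep < 0)
		return snprintf(buf, len, "%s", dbdf);
	return snprintf(buf, len, "%s_representor_%d", dbdf, rep);
}
```

For a device at 0000:03:00.0 this would give "0000:03:00.0" for the master and "0000:03:00.0_representor_0" for the first representor.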
2018-07-11  net/mlx5: drop useless support for several Verbs ports  Adrien Mazarguil
Unlike mlx4 from which this capability was inherited, mlx5 devices expose exactly one Verbs port per PCI bus address. Each physical port gets assigned its own bus address with a single Verbs port. While harmless, this code requires an extra loop that would get in the way of subsequent refactoring. No functional impact. Signed-off-by: Adrien Mazarguil <>
2018-07-03  net/mlx5: fix invalid error check  Adrien Mazarguil
Since its return type is unsigned, if_nametoindex() returns 0 in case of error, never -1. Fixes: ccdcba53a3f4 ("net/mlx5: use Netlink to add/remove MAC addresses") Cc: Signed-off-by: Adrien Mazarguil <> Acked-by: Nelio Laranjeiro <>
2018-05-28  net/mlx5: fix memory region cache init  Xueming Li
MR cache init takes place during device configuration. When the device is re-configured multiple times, for example when changing the number of queues on the fly, a deadlock can happen. This patch moves MR cache init from the device configuration function to the probe function to make sure it runs only once. Fixes: 974f1e7ef146 ("net/mlx5: add new memory region support") Signed-off-by: Xueming Li <> Acked-by: Yongseok Koh <>
2018-05-28  net/mlx5: fix crash when configure is not called  Adrien Mazarguil
Although uncommon, applications may destroy a device immediately after probing it without going through dev_configure() first. This patch addresses a crash which occurs when mlx5_dev_close() calls mlx5_mr_release() due to an uninitialized entry in the private structure. Fixes: 974f1e7ef146 ("net/mlx5: add new memory region support") Signed-off-by: Adrien Mazarguil <> Acked-by: Yongseok Koh <>
2018-05-14  net/mlx5: add Multi-Packet Rx support  Yongseok Koh
Multi-Packet Rx Queue (MPRQ, a.k.a. Striding RQ) can further save PCIe bandwidth by posting a single large buffer for multiple packets. Instead of posting one buffer per packet, one large buffer is posted in order to receive multiple packets in that buffer. An MPRQ buffer consists of multiple fixed-size strides and each stride receives one packet. The Rx packet is mem-copied to a user-provided mbuf if the packet is comparatively small; otherwise the PMD attaches the Rx packet to the mbuf by external buffer attachment - rte_pktmbuf_attach_extbuf(). A mempool for external buffers will be allocated and managed by the PMD. Signed-off-by: Yongseok Koh <> Acked-by: Shahaf Shuler <>
2018-05-14  net/mlx5: add new memory region support  Yongseok Koh
This is the new design of Memory Region (MR) for mlx PMD, in order to: - Accommodate the new memory hotplug model. - Support non-contiguous Mempool. There are multiple layers for MR search. L0 is to look up the last-hit entry which is pointed by mr_ctrl->mru (Most Recently Used). If L0 misses, L1 is to look up the address in a fixed-sized array by linear search. L0/L1 is in an inline function - mlx5_mr_lookup_cache(). If L1 misses, the bottom-half function is called to look up the address from the bigger local cache of the queue. This is L2 - mlx5_mr_addr2mr_bh() and it is not an inline function. Data structure for L2 is the Binary Tree. If L2 misses, the search falls into the slowest path which takes locks in order to access global device cache (priv->mr.cache) which is also a B-tree and caches the original MR list (priv->mr.mr_list) of the device. Unless the global cache is overflowed, it is all-inclusive of the MR list. This is L3 - mlx5_mr_lookup_dev(). The size of the L3 cache table is limited and can't be expanded on the fly due to deadlock. Refer to the comments in the code for the details - mr_lookup_dev(). If L3 is overflowed, the list will have to be searched directly bypassing the cache although it is slower. If L3 misses, a new MR for the address should be created - mlx5_mr_create(). When it creates a new MR, it tries to register adjacent memsegs as much as possible which are virtually contiguous around the address. This must take two locks - memory_hotplug_lock and priv->mr.rwlock. Due to memory_hotplug_lock, there can't be any allocation/free of memory inside. In the free callback of the memory hotplug event, freed space is searched from the MR list and corresponding bits are cleared from the bitmap of MRs. This can fragment a MR and the MR will have multiple search entries in the caches. Once there's a change by the event, the global cache must be rebuilt and all the per-queue caches will be flushed as well. 
If memory is frequently freed at run-time, this may cause jitter on dataplane processing in the worst case by incurring an MR cache flush and rebuild, although this is the least probable scenario. To guarantee the best performance, it is highly recommended to use the EAL option '--socket-mem'. The reserved memory will then be pinned and won't be freed dynamically. It is also recommended to configure the per-lcore cache of the Mempool. Even if there are many MRs for a device or the MRs are highly fragmented, the Mempool cache will greatly help to reduce misses on the per-queue caches. '--legacy-mem' is also supported. Signed-off-by: Yongseok Koh <>
2018-05-14  ethdev: new Rx/Tx offloads API  Wei Dai
This patch checks whether a requested offload is valid. Any requested offload must be supported in the device capabilities. An offload is disabled by default if it is not set in the parameter dev_conf->[rt]xmode.offloads to rte_eth_dev_configure() and [rt]x_conf->offloads to rte_eth_[rt]x_queue_setup(). If an offload is enabled in rte_eth_dev_configure() by the application, it is enabled on all queues, no matter whether it is of per-queue or per-port type and no matter whether it is set or cleared in [rt]x_conf->offloads to rte_eth_[rt]x_queue_setup(). If a per-queue offload hasn't been enabled in rte_eth_dev_configure(), it can be enabled or disabled for an individual queue in rte_eth_[rt]x_queue_setup(). A newly added offload is one which hasn't been enabled in rte_eth_dev_configure() and is requested to be enabled in rte_eth_[rt]x_queue_setup(); it must be of per-queue type, otherwise an error log is triggered. The underlying PMD must be aware that the offloads passed to the PMD-specific queue_setup() function only carry those newly added offloads of per-queue type. This patch performs the above checks in a common way in the rte_ethdev layer to avoid duplicating them in the underlying PMDs. This patch assumes that all PMDs in 18.05-rc2 have already been converted to the offload API defined in 17.11. It also assumes that all PMDs can return correct offload capabilities in rte_eth_dev_infos_get(). At the beginning of [rt]x_queue_setup() of the underlying PMD, add offloads = [rt]xconf->offloads | dev->data->dev_conf.[rt]xmode.offloads; to stay consistent with the offload API defined in 17.11 and avoid breaking applications due to the offload API change. A PMD can use the fact that the input [rt]xconf->offloads only carries the newly added per-queue offloads to do some optimization or code changes on top of this patch. Signed-off-by: Wei Dai <> Signed-off-by: Ferruh Yigit <> Signed-off-by: Qi Zhang <>
2018-05-14  net/mlx5: fix SW parser enabling  Xueming Li
Fixes: 5f8ba81c4228 ("net/mlx5: support generic tunnel offloading") Signed-off-by: Xueming Li <> Acked-by: Yongseok Koh <>
2018-05-14  net/mlx5: add Rx and Tx tuning parameters  Shahaf Shuler
A new ethdev API was exposed by commit 3be82f5cc5e3 ("ethdev: support PMD-tuned Tx/Rx parameters"), enabling the PMD to provide default parameters when the application makes no strict request, in order to improve the out-of-the-box experience. While the current API lacks the means for the PMD to provide the best possible value, this patch provides the best default the PMD can guess. The values are based on the Mellanox performance report and depend on the underlying NIC capabilities. Signed-off-by: Shahaf Shuler <> Acked-by: Nelio Laranjeiro <>
2018-05-14  net/mlx5: fix ethtool link setting call order  Shahaf Shuler
According to the ethtool_link_setting API recommendation, ETHTOOL_GLINKSETTINGS should be called before ETHTOOL_GSET, as the latter is deprecated. Fixes: f47ba80080ab ("net/mlx5: remove kernel version check") Signed-off-by: Shahaf Shuler <> Acked-by: Nelio Laranjeiro <>
2018-04-27  net/mlx5: support generic tunnel offloading  Xueming Li
This commit adds support for generic tunnel TSO and checksum offload. PMD will compute the inner/outer headers offset according to the mbuf fields. Hardware will do calculation based on offsets and types. Signed-off-by: Xueming Li <> Acked-by: Yongseok Koh <>
2018-04-27  net/mlx5: split MAC address add/remove code  Nélio Laranjeiro
Move some code from the DPDK callbacks that add/remove MAC addresses into internal functions. This modification will be necessary to implement the set_mc_addr_list devop. Signed-off-by: Nelio Laranjeiro <>
2018-04-27  drivers/net: update link status  Ferruh Yigit
Update link status related feature document items and minor updates in some link status related functions. Signed-off-by: Ferruh Yigit <> Acked-by: Adrien Mazarguil <>
2018-04-14  ethdev: replace bus specific struct with generic dev  Ferruh Yigit
Public struct rte_eth_dev_info has a "struct rte_pci_device" field in it although it is common for all ethdev in all buses. Replacing pci specific struct with generic device struct and updating places that are using pci device in a way to get this information from generic device. Signed-off-by: Ferruh Yigit <> Reviewed-by: David Marchand <> Acked-by: Pablo de Lara <> Acked-by: Thomas Monjalon <>
2018-04-14  net/mlx5: use Netlink to add/remove MAC addresses  Nélio Laranjeiro
VF devices are not able to receive traffic unless they explicitly request it through Netlink. This causes the request to be processed by the PF, which adds/removes the MAC address to/from the VF table if the VF is trusted. Signed-off-by: Nelio Laranjeiro <> Acked-by: Adrien Mazarguil <>
2018-04-11  align SPDX Mellanox copyrights  Shahaf Shuler
Align Mellanox SPDX copyrights to a single format. In addition, convert to SPDX the licence headers of files which were missed. Signed-off-by: Shahaf Shuler <> Acked-by: Adrien Mazarguil <>
2018-04-04  convert snprintf to strlcpy  Bruce Richardson
Since we have support for the strlcpy function in DPDK, replace all instances where a string is copied using snprintf. Signed-off-by: Bruce Richardson <> Reviewed-by: Stephen Hemminger <>
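For readers unfamiliar with the strlcpy contract, the following sketch shows the behaviour on top of snprintf(). The name sketch_strlcpy() is a hypothetical stand-in, written this way because strlcpy is not available in every C library; it is not presented as the DPDK implementation.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for strlcpy(): copy at most size - 1 bytes,
 * always NUL-terminate when size > 0, and return the length of src so
 * the caller can detect truncation (result >= size).
 */
static size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
	assert(dst != NULL && src != NULL);
	return (size_t)snprintf(dst, size, "%s", src);
}
```

A call such as sketch_strlcpy(name, src, sizeof(name)) reads more directly than the equivalent snprintf(name, sizeof(name), "%s", src), which is the motivation given in the message.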
2018-03-30  net/mlx5: fix RSS key length query  Shahaf Shuler
The RSS key length returned by rte_eth_dev_info_get() was taken from the PMD private structure, which is initialized only after the port configuration. Since Mellanox devices support only a 40-byte RSS key, the fixed value is reported instead. Fixes: 29c1d8bb3e79 ("net/mlx5: handle a single RSS hash key for all protocols") Cc: Signed-off-by: Shahaf Shuler <> Acked-by: Nelio Laranjeiro <>
2018-03-30  net/mlx5: enforce RSS key length limitation  Shahaf Shuler
RSS hash key must be 40 Bytes long. Cc: Signed-off-by: Shahaf Shuler <> Acked-by: Nelio Laranjeiro <>
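A length check of this kind might look like the sketch below. The macro and function names are hypothetical; only the 40-byte requirement comes from the message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_RSS_HASH_KEY_LEN 40 /* key size stated in the message */

/*
 * Hypothetical setter: accept only keys of exactly 40 bytes.
 * Returns 0 on success, -EINVAL otherwise.
 */
static int sketch_set_rss_key(uint8_t dst[SKETCH_RSS_HASH_KEY_LEN],
			      const uint8_t *key, size_t key_len)
{
	assert(dst != NULL);
	if (key == NULL || key_len != SKETCH_RSS_HASH_KEY_LEN)
		return -EINVAL;
	memcpy(dst, key, SKETCH_RSS_HASH_KEY_LEN);
	return 0;
}
```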
2018-03-30  net/mlx5: fix link status to use wait to complete  Nélio Laranjeiro
The wait-to-complete argument exists to let the application get a correct status when it requires one; it should not be ignored. Fixes: e313ef4c2fe8 ("net/mlx5: fix link state on device start") Fixes: cb8faed7dde8 ("mlx5: support link status update") Cc: Signed-off-by: Nelio Laranjeiro <> Acked-by: Adrien Mazarguil <>
2018-03-30  net/mlx5: fix link status behavior  Nélio Laranjeiro
This behavior mixes what should be handled by the application with what is the PMD's responsibility. According to the DPDK API: - link_update() should only query the link status [1] - link_set_{up,down}() should only set the link to the corresponding status [1] - dev_{start,stop}() should enable/disable traffic reception/emission [2] On this PMD, the link status is retrieved from the associated net device owned by the Linux kernel. This does not mean that the PMD cannot send/receive traffic from the NIC when this interface is down: the two pieces of information are unrelated, and as long as the physical port is active and has a link, the PMD can receive/send traffic on the wire. According to the DPDK API, calling rte_eth_dev_start() even when the Linux interface link is down is therefore possible and allowed, as traffic will flow between the DPDK application and the physical port. This also means that synchronization between the Linux interface and the DPDK application remains the DPDK application's responsibility. To handle such synchronization, the application should follow this scheme. To start: rte_eth_link_get(port_id, &link); if (link.link_status == ETH_LINK_DOWN) rte_eth_dev_set_link_up(port_id); rte_eth_dev_start(port_id); taking into account the possible return values of each function. To stop: rte_eth_dev_stop(port_id); rte_eth_dev_set_link_down(port_id); The application should also set the LSC interrupt callbacks to catch and react accordingly when the administrator sets the Linux device down/up. The same callbacks are called when the link on the medium goes down or comes up. [1] [2] Fixes: c7bf62255edf ("net/mlx5: fix handling link status event") Fixes: e313ef4c2fe8 ("net/mlx5: fix link state on device start") Cc: Signed-off-by: Nelio Laranjeiro <> Acked-by: Adrien Mazarguil <> Acked-by: Yongseok Koh <>
2018-03-30  net/mlx5: remove kernel version check  Nélio Laranjeiro
The kernel version check was introduced in commit 3a49ffe38a95 ("net/mlx5: fix link status query") due to a bug fixed by commit ef09a7fc7620 ("net/mlx5: fix inconsistent link status query"). This patch restores the previous behavior as described in the Linux API. Signed-off-by: Nelio Laranjeiro <> Acked-by: Adrien Mazarguil <>
2018-03-30  net/mlx5: use dynamic logging  Nélio Laranjeiro
Signed-off-by: Nelio Laranjeiro <> Acked-by: Adrien Mazarguil <>
2018-03-30  net/mlx5: use port id in PMD log  Nélio Laranjeiro
Signed-off-by: Nelio Laranjeiro <> Acked-by: Adrien Mazarguil <>
2018-03-30  net/mlx5: standardize on negative errno values  Nélio Laranjeiro
Set rte_errno systematically as well. Signed-off-by: Nelio Laranjeiro <> Acked-by: Adrien Mazarguil <>
2018-03-30  net/mlx5: prefix all functions with mlx5  Nélio Laranjeiro
This change removes the need to distinguish unlocked priv_*() functions, which are therefore renamed using a mlx5_*() prefix for consistency. At the same time, all mlx5 functions now use a pointer to the ETH device instead of a pointer to the PMD private data. Signed-off-by: Nelio Laranjeiro <> Acked-by: Adrien Mazarguil <>
2018-03-30  net/mlx5: remove control path locks  Nélio Laranjeiro
In priv struct only the memory region needs to be protected against concurrent access between the control plane and the data plane. Signed-off-by: Nelio Laranjeiro <> Acked-by: Adrien Mazarguil <>
2018-03-30  net/mlx5: remove useless empty lines  Nélio Laranjeiro
Some empty lines have been added in the middle of the code without any reason. This commit removes them. Signed-off-by: Nelio Laranjeiro <> Acked-by: Adrien Mazarguil <>
2018-03-30  net/mlx5: add missing function documentation  Nélio Laranjeiro
Signed-off-by: Nelio Laranjeiro <> Acked-by: Adrien Mazarguil <>
2018-03-30  net/mlx5: mark parameters with unused attribute  Nélio Laranjeiro
Replaces every "(void)foo;" statement with the __rte_unused macro, except when the variables are under #if statements. Signed-off-by: Nelio Laranjeiro <> Acked-by: Adrien Mazarguil <>
2018-03-30  net/mlx5: fix sriov flag  Nélio Laranjeiro
priv_get_num_vfs() was used to help the PMD prefetch the mbuf in the datapath when the PMD was operating in VF mode. This knowledge is no longer used. Fixes: 528a9fbec6de ("net/mlx5: support ConnectX-5 devices") Cc: Signed-off-by: Nelio Laranjeiro <> Acked-by: Adrien Mazarguil <>
2018-03-30  net/mlx: control netdevices through ioctl only  Adrien Mazarguil
Several control operations implemented by these PMDs affect netdevices through sysfs, itself subject to file system permission checks enforced by the kernel, which limits their use for most purposes to applications running with root privileges. Since performing the same operations through ioctl() requires fewer capabilities (only CAP_NET_ADMIN) and given the remaining operations are already implemented this way, this patch standardizes on ioctl() and gets rid of redundant code. Signed-off-by: Adrien Mazarguil <> Reviewed-by: Marcelo Ricardo Leitner <>