summaryrefslogtreecommitdiff
path: root/drivers/net/mlx5/mlx5_txq.c
AgeCommit message (Collapse)Author
2019-11-26net/mlx5: add GENEVE in tunnel offloads capabilitiesSuanming Mou
GENEVE is available in tunnel offloads. Add it as the default support option. Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Ori Kam <orika@mellanox.com>
2019-11-26net/mlx5: fix legacy multi-packet Tx descriptorsViacheslav Ovsiienko
ConnectX-4LX supports multiple packets within the single Tx descriptor. This feature is named as "Legacy Multi-Packet Write" and imposes a lot of limitations: - no ACLs, it means no NIC Tx Flows are supported and Tx metadata become meaningless - the required minimal inline data must be zero - no SR-IOV, it means no support in E-Switch configurations, - no priority and dscp forcing - no VLAN insertion - no TSO - all packets within MPW session must have the same size This legacy MPW feature is mainly intended for test purposes. To explicitly engage the feature on ConnectX-4LX the devargs should be specified: - txq_mpw_en=1 This feature was dropped in 19.08, this patch reverts it back. Fixes: 18a1c20044c0 ("net/mlx5: implement Tx burst template") Cc: stable@dpdk.org Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>
2019-11-20net/mlx5: fix assert in Tx inline settingsViacheslav Ovsiienko
Assert condition is fixed to not alert for the case when multi-packet write is not supported/engaged at all. Fixes: b53cd86965a1 ("net/mlx5: adjust inline setting for large Tx queue sizes") Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2019-11-20net/mlx5: fix Tx doorbell write memory barrierViacheslav Ovsiienko
As the result of testing it was found that some hosts have the performance penalty imposed by required write memory barrier after doorbell writing. Before 19.08 release there was some heuristics to decide whether write memory barrier should be performed. For the bursts of recommended size (or multiple) it was supposed there were some extra ongoing packets in the next burst and write memory barrier may be skipped (supposed to be performed in the next burst, at least after descriptor writing). This patch restores that behaviour, the devargs tx_db_nc=2 must be specified to engage this performance tuning feature. Fixes: 8409a28573d3 ("net/mlx5: control transmit doorbell register mapping") Cc: stable@dpdk.org Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2019-11-11net/mlx5: control transmit doorbell register mappingViacheslav Ovsiienko
The rdma core library can map doorbell register in two ways, depending on the environment variable "MLX5_SHUT_UP_BF": - as regular cached memory, the variable is either missing or set to zero. This type of mapping may cause the significant doorbell register writing latency and requires explicit memory write barrier to mitigate this issue and prevent write combining. - as non-cached memory, the variable is present and set to not "0" value. This type of mapping may cause performance impact under heavy loading conditions but the explicit write memory barrier is not required and it may improve core performance. The new devarg is introduced "tx_db_nc", if this parameter is set to zero, the doorbell register is forced to be mapped to cached memory and requires explicit memory barrier after writing to. If "tx_db_nc" is set to non-zero value the doorbell will be mapped as non-cached memory, not requiring the memory barrier. If "tx_db_nc" is missing the behaviour will be defined by presence of "MLX5_SHUT_UP_BF" in environment. If variable is missed the default value zero will be set for ARM64 hosts and one for others. In run time the code checks the mapping type and provides the memory barrier after writing to tx doorbell register if it is needed. The mapping type is extracted directly from the uar_mmap_offset field in the queue properties. Fixes: 18a1c20044c0 ("net/mlx5: implement Tx burst template") Cc: stable@dpdk.org Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>
2019-11-08ethdev: move egress metadata to dynamic fieldViacheslav Ovsiienko
The dynamic mbuf fields were introduced by [1]. The egress metadata is good candidate to be moved from statically allocated field tx_metadata to dynamic one. Because mbufs are used in half-duplex fashion only, it is safe to share this dynamic field with ingress metadata. The shared dynamic field contains either egress (if application going to transmit mbuf with tx_burst) or ingress (if mbuf is received with rx_burst) metadata and can be accessed by RTE_FLOW_DYNF_METADATA() macro or with rte_flow_dynf_metadata_set() and rte_flow_dynf_metadata_get() helper routines. PKT_TX_DYNF_METADATA/PKT_RX_DYNF_METADATA flag will be set along with the data. The mbuf dynamic field must be registered by calling rte_flow_dynf_metadata_register() prior accessing the data. The availability of dynamic mbuf metadata field can be checked with rte_flow_dynf_metadata_avail() routine. DEV_TX_OFFLOAD_MATCH_METADATA offload and configuration flag is removed. The metadata support in PMDs is engaged on dynamic field registration. Metadata feature is getting complex. We might have some set of actions and items that might be supported by PMDs in multiple combinations, the supported values and masks are the subjects to query by perfroming trials (with rte_flow_validate). [1] http://patches.dpdk.org/patch/62040/ Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com> Acked-by: Olivier Matz <olivier.matz@6wind.com> Acked-by: Ori Kam <orika@mellanox.com>
2019-11-08net/mlx5: support Tx hairpin queuesOri Kam
This commit adds the support for creating Tx hairpin queues. Hairpin queue is a queue that is created using DevX and only used by the HW. Signed-off-by: Ori Kam <orika@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2019-11-08net/mlx5: prepare Tx queues to have different typesOri Kam
Currently all Tx queues are created using Verbs. This commit modify the naming so it will not include verbs, since in next commit a new type will be introduce (hairpin) Signed-off-by: Ori Kam <orika@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2019-10-08net/mlx5: adjust inline setting for large Tx queue sizesViacheslav Ovsiienko
The hardware may have limitations on maximal amount of supported Tx descriptors building blocks (WQEBB). Application requires the Tx queue must accept the specified amount of packets. If inline data feature is engaged the packet may require more WQEBBs and overall amount of blocks may exceed the hardware capabilities. Application has to make a trade-off between Tx queue size and maximal data inline size. In case if the inline settings are not requested explicitly with devarg keys the default values are used. This patch adjusts the applied default values if large Tx queue size is requested and default inline settings can not be satisfied due to hardware limitations. The explicitly requested inline setting may be aligned (enlarging only) by configurations routines to provide better WQEBB filling, this implicit alignment is the subject for adjustment either. The warning message is emitted to the log if adjustment happens. Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2019-10-07net/mlx5: move backing PCI device to private contextViacheslav Ovsiienko
Now all devices created over the same multiport IB device have shared context containing the backing PCI device field. For the VF LAG configurations it becomes possible the representors might be connected to VF created over different PFs. In this case representors have the different backing PCI devices and mentioned field should be moved to device private area. Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>
2019-10-07net/mlx5: fix UAR remap initialization for 32-bit systemsViacheslav Ovsiienko
The txq_uar_init() routine uses the uninitialized uar_mmap_offset field in 32-bit configurations due to this field is initialized after txq_uar_init() call. Fixes: 120dc4a7dcd3 ("net/mlx5: remove device register remap") Cc: stable@dpdk.org Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>
2019-08-06net/mlx5: fix inline data settingsViacheslav Ovsiienko
If the minimal inline data are required the data inline feature must be engaged. There were the incorrect settings enabling the entire small packet inline (in size up to 82B) which may result in sending rate declining if there is no enough cores. The same problem was raised if inline was enabled to support VLAN tag insertion by software. Fixes: 38b4b397a57d ("net/mlx5: add Tx configuration and setup") Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>
2019-07-23net/mlx5: update Tx queue create for LRODekel Peled
Update function mlx5_txq_ibv_new(), query and store the TIS transport domain value. It is required later on Rx side when creating matching TIR. Add field in mlx5 data structure to store Transport Domain ID. Signed-off-by: Dekel Peled <dekelp@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2019-07-23net/mlx5: remove redundant item from unionDekel Peled
A variable of type struct ibv_cq_ex is declared in 2 unions, but isn't used. This patch removes the 2 redundant declarations. Fixes: 6218063b39a6 ("net/mlx5: refactor Rx data path") Fixes: 1d88ba171942 ("net/mlx5: refactor Tx data path") Cc: stable@dpdk.org Signed-off-by: Dekel Peled <dekelp@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2019-07-23net/mlx5: add Tx configuration and setupViacheslav Ovsiienko
This patch updates the Tx datapath control and configuration structures and code for managing Tx datapath settings. Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Yongseok Koh <yskoh@mellanox.com>
2019-07-23net/mlx5: remove Tx implementationViacheslav Ovsiienko
This patch removes the existing Tx datapath code as preparation step before introducing the new implementation. The following entities are being removed: - deprecated devargs support - tx_burst() routines - related PRM definitions - SQ configuration code - Tx routine selection code - incompatible Tx completion code The following devargs are deprecated and ignored: - "txq_inline" is going to be converted to "txq_inline_max" for compatibility issue - "tx_vec_en" - "txqs_max_vec" - "txq_mpw_hdr_dseg_en" - "txq_max_inline_len" is going to be converted to "txq_inline_mpw" for compatibility issue The deprecated devarg keys are recognized by PMD and ignored/converted to the new ones in order not to block device probing. Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Yongseok Koh <yskoh@mellanox.com>
2019-07-23net/mlx5: fix typos in commentsDekel Peled
Some spelling mistakes were found in comments. This patch fixes them. Fixes: d10b09db0a45 ("net/mlx5: fix allocation when no memory on device NUMA node") Fixes: fc2c498ccb94 ("net/mlx5: add Direct Verbs translate items") Fixes: 7d6bf6b866b8 ("net/mlx5: add Multi-Packet Rx support") Fixes: f6d9ab4e769f ("net/mlx5: check Tx queue size overflow") Cc: stable@dpdk.org Signed-off-by: Dekel Peled <dekelp@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2019-06-28net/mlx5: fix 32-bit buildAli Alnubani
This is to fix the error: ``` drivers/net/mlx5/mlx5_defs.h:14:26: error: format '%lx' expects argument of type 'long unsigned int', but argument 5 has type 'off_t {aka long long int}' [-Werror=format=] drivers/net/mlx5/mlx5_txq.c:569:48: note: format string is defined here DRV_LOG(DEBUG, "port %u: uar_mmap_offset 0x%lx" ~~^ %llx ``` Which reproduces with gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0. Fixes: 6bf10ab69be0 ("net/mlx5: support 32-bit systems") Cc: stable@dpdk.org Signed-off-by: Ali Alnubani <alialnu@mellanox.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-06-14net/mlx5: handle Tx completion with errorMatan Azrad
When WQEs are posted to the HW to send packets, the PMD may get a completion report with error from the HW, aka error CQE which is associated to a bad WQE. The error reason may be bad address, wrong lkey, bad sizes, etc. that can wrongly be configured by the PMD or by the user. Checking all the optional mistakes to prevent error CQEs doesn't make sense due to performance impacts and huge complexity. The error CQEs change the SQ state to error state what causes all the next posted WQEs to be completed with CQE flush error forever. Currently, the PMD doesn't handle Tx error CQEs and even may crashed when one of them appears. Extend the Tx data-path to detect these error CQEs, to report them by the statistics error counters, to recover the SQ by moving the state to ready again and adjusting the management variables appropriately. Sometimes the error CQE root cause is very hard to debug and even may be related to some corner cases which are not reproducible easily, hence a dump file with debug information will be created for the first number of error CQEs, this number can be configured by the PMD probe parameters. Cc: stable@dpdk.org Signed-off-by: Matan Azrad <matan@mellanox.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-05-27net/mlx5: fix memory free on queue create errorDekel Peled
In function mlx5_rxq_ibv_new(), pointer *tmpl allocation is attempted at the start, but not validated or freed in case of error. In function mlx5_txq_ibv_new(), pointer *txq_ibv allocation is attempted at the start, but not freed in case of error. This patch adds pointers initialization, validation and freeing. Fixes: 09cb5b581762 ("net/mlx5: separate DPDK from verbs Rx queue objects") Fixes: faf2667fe8d5 ("net/mlx5: separate DPDK from verbs Tx queue objects") Cc: stable@dpdk.org Signed-off-by: Dekel Peled <dekelp@mellanox.com> Acked-by: Ori Kam <orika@mellanox.com>
2019-05-24net/mlx5: remove unused functionsDekel Peled
Functions implemented but never called: mlx5_rxq_ibv_releasable() mlx5_rxq_cleanup() mlx5_txq_ibv_releasable() Function declared but not implemented: rxq_alloc_mprq_buf() This patch removes these functions from code and header file. Signed-off-by: Dekel Peled <dekelp@mellanox.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com> Acked-by: Yongseok Koh <yskoh@mellanox.com>
2019-05-03net/mlx5: check Tx queue size overflowYongseok Koh
If Tx packet inlining is enabled, rdma-core library should allocate large Tx WQ enough to support it. It is better for PMD to calculate the size of WQ based on the parameters and return error with appropriate message if it exceeds the device capability. Cc: stable@dpdk.org Signed-off-by: Yongseok Koh <yskoh@mellanox.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-05-03net/mlx5: share Memory Regions for multiport deviceViacheslav Ovsiienko
The multiport Infiniband device support was introduced [1]. All active ports, belonging to the same Infiniband device use the single shared Infiniband context of that device and share the resources: - QPs are created within shared context - Verbs flows are also created with specifying port index - DV/DR resources - Protection Domain - Event Handlers This patchset adds support for Memory Regions sharing between ports, created on the base of multiport Infiniband device. The datapath of mlx5 uses the layered cache subsystem for allocating/releasing Memory Regions, only the lowest layer L3 is subject to share due to performance issues. [1] http://patches.dpdk.org/cover/51800/ Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Yongseok Koh <yskoh@mellanox.com>
2019-05-03net/mlx5: fix comments mixing Rx and TxDekel Peled
In mlx5_rxq.c, in some comments, text includes "Tx" instead of "Rx". In mlx5_txq.c, in some comments, text includes "Rx" instead of "Tx". This patch fixes these typos. Fixes: faf2667fe8d5 ("net/mlx5: separate DPDK from verbs Tx queue objects") Fixes: a1366b1a2be3 ("net/mlx5: add reference counter on DPDK Rx queues") Cc: stable@dpdk.org Signed-off-by: Dekel Peled <dekelp@mellanox.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-04-12net/mlx5: remove device register remapYongseok Koh
UAR (User Access Region) register does not need to be remapped for primary process but it should be remapped only for secondary process. UAR register table is in the process private structure in rte_eth_devices[], (struct mlx5_proc_priv *)rte_eth_devices[port_id].process_private The actual UAR table follows the data structure and the table is used for both Tx and Rx. For Tx, BlueFlame in UAR is used to ring the doorbell. MLX5_TX_BFREG(txq) is defined to get a register for the txq. Processes access its own private data to acquire the register from the UAR table. For Rx, the doorbell in UAR is required in arming CQ event. However, it is a known issue that the register isn't remapped for secondary process. Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
2019-04-12net/mlx5: remove redundant queue indexYongseok Koh
Queue index is redundantly stored for both Rx and Tx structures. E.g. txq_ctrl->idx and txq->stats.idx. Both are consolidated to single storage - rxq->idx and txq->idx. Also, rxq and txq are moved to the beginning of its control structure (rxq_ctrl and txq_ctrl) for cacheline alignment. Signed-off-by: Yongseok Koh <yskoh@mellanox.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-04-05net/mlx5: rework PMD global data initYongseok Koh
There's more need to have PMD global data structure. This should be initialized once per a process regardless of how many PMD instances are probed. mlx5_init_once() is called during probing and make sure all the init functions are called once per a process. Currently, such global data and its initialization functions are even scattered. Rather than 'extern'-ing such variables and calling such functions one by one making sure it is called only once by checking the validity of such variables, it will be better to have a global storage to hold such data and a consolidated function having all the initializations. The existing shared memory gets more extensively used for this purpose. As there could be multiple secondary processes, a static storage (local to process) is also added. As the reserved virtual address for UAR remap is a PMD global resource, this doesn't need to be stored in the device priv structure, but in the PMD global data. Signed-off-by: Yongseok Koh <yskoh@mellanox.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-29net/mlx5: provide IB port for the object being createdViacheslav Ovsiienko
The code is updated to provide IB port index for the Verbs objects being created - QPs and Verbs Flows. Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-29net/mlx5: switch to the shared IB device contextViacheslav Ovsiienko
The code is updated to use the shared IB device context and device handles. The IB device context is shared between reprentors created over the single multiport IB device. All Verbs and DevX objects will be created within this shared context. Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-29net/mlx5: switch to the shared context IB attributesViacheslav Ovsiienko
The code is updated to use the shared IB device attributes, located in the shared IB context. It saves some memory if there are representors created over the single Infiniband device with multiple ports. Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-29net/mlx5: switch to the shared protection domainViacheslav Ovsiienko
The PMD code is updated to use Protected Domain from the shared IB device context. The Domain is shared between all devices belonging to the same multiport Infiniband device. If IB device has only one port, the PD is not shared, because there is only ethernet device created over IB one. Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-01net/mlx: prefix private structureThomas Monjalon
The private structure stored in rte_eth_dev->data->dev_private was named "struct priv". In order to ease code browsing, the structure is renamed "struct mlx[45]_priv". Cc: stable@dpdk.org Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-10-26net/mlx5: support metadata as flow rule criteriaDekel Peled
As described in series starting at [1], it adds option to set metadata value as match pattern when creating a new flow rule. This patch adds metadata support in mlx5 driver, in two parts: - Add the validation and setting of metadata value in matcher, when creating a new flow rule. - Add the passing of metadata value from mbuf to wqe when indicated by ol_flag, in different burst functions. [1] "ethdev: support metadata as flow rule criteria" http://mails.dpdk.org/archives/dev/2018-September/113269.html Signed-off-by: Dekel Peled <dekelp@mellanox.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2018-07-12net/mlx5: support 32-bit systemsMoti Haimovsky
This patch adds support for building and running mlx5 PMD on 32bit systems such as i686. The main issue to tackle was handling the 32bit access to the UAR as quoted from the mlx5 PRM: QP and CQ DoorBells require 64-bit writes. For best performance, it is recommended to execute the QP/CQ DoorBell as a single 64-bit write operation. For platforms that do not support 64 bit writes, it is possible to issue the 64 bits DoorBells through two consecutive writes, each write 32 bits, as described below: * The order of writing each of the Dwords is from lower to upper addresses. * No other DoorBell can be rung (or even start ringing) in the midst of an on-going write of a DoorBell over a given UAR page. The last rule implies that in a multi-threaded environment, the access to a UAR page (which can be accessible by all threads in the process) must be synchronized (for example, using a semaphore) unless an atomic write of 64 bits in a single bus operation is guaranteed. Such a synchronization is not required for when ringing DoorBells on different UAR pages. Signed-off-by: Moti Haimovsky <motih@mellanox.com> Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-11net/mlx5: drop useless support for several Verbs portsAdrien Mazarguil
Unlike mlx4 from which this capability was inherited, mlx5 devices expose exactly one Verbs port per PCI bus address. Each physical port gets assigned its own bus address with a single Verbs port. While harmless, this code requires an extra loop that would get in the way of subsequent refactoring. No functional impact. Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
2018-07-03net/mlx5: separate generic tunnel TSO from the standard oneShahaf Shuler
The generic tunnel TSO was depended in the regular one capabilities to be enabled. Cc: stable@dpdk.org Signed-off-by: Shahaf Shuler <shahafs@mellanox.com> Acked-by: Yongseok Koh <yskoh@mellanox.com> Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-07-03net/mlx5: clean-up developer logsNelio Laranjeiro
Split maintainers logs from user logs. A lot of debug logs are present providing internal information on how the PMD works to users. Such logs should not be available for them and thus should remain available only when the PMD is compiled in debug mode. This commits removes some useless debug logs, move the Maintainers ones under DEBUG and also move dump into debug mode only. Cc: stable@dpdk.org Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-05-14net/mlx5: add new memory region supportYongseok Koh
This is the new design of Memory Region (MR) for mlx PMD, in order to: - Accommodate the new memory hotplug model. - Support non-contiguous Mempool. There are multiple layers for MR search. L0 is to look up the last-hit entry which is pointed by mr_ctrl->mru (Most Recently Used). If L0 misses, L1 is to look up the address in a fixed-sized array by linear search. L0/L1 is in an inline function - mlx5_mr_lookup_cache(). If L1 misses, the bottom-half function is called to look up the address from the bigger local cache of the queue. This is L2 - mlx5_mr_addr2mr_bh() and it is not an inline function. Data structure for L2 is the Binary Tree. If L2 misses, the search falls into the slowest path which takes locks in order to access global device cache (priv->mr.cache) which is also a B-tree and caches the original MR list (priv->mr.mr_list) of the device. Unless the global cache is overflowed, it is all-inclusive of the MR list. This is L3 - mlx5_mr_lookup_dev(). The size of the L3 cache table is limited and can't be expanded on the fly due to deadlock. Refer to the comments in the code for the details - mr_lookup_dev(). If L3 is overflowed, the list will have to be searched directly bypassing the cache although it is slower. If L3 misses, a new MR for the address should be created - mlx5_mr_create(). When it creates a new MR, it tries to register adjacent memsegs as much as possible which are virtually contiguous around the address. This must take two locks - memory_hotplug_lock and priv->mr.rwlock. Due to memory_hotplug_lock, there can't be any allocation/free of memory inside. In the free callback of the memory hotplug event, freed space is searched from the MR list and corresponding bits are cleared from the bitmap of MRs. This can fragment a MR and the MR will have multiple search entries in the caches. Once there's a change by the event, the global cache must be rebuilt and all the per-queue caches will be flushed as well. If memory is frequently freed in run-time, that may cause jitter on dataplane processing in the worst case by incurring MR cache flush and rebuild. But, it would be the least probable scenario. To guarantee the most optimal performance, it is highly recommended to use an EAL option - '--socket-mem'. Then, the reserved memory will be pinned and won't be freed dynamically. And it is also recommended to configure per-lcore cache of Mempool. Even though there're many MRs for a device or MRs are highly fragmented, the cache of Mempool will be much helpful to reduce misses on per-queue caches anyway. '--legacy-mem' is also supported. Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
2018-05-14net/mlx5: remove memory region supportYongseok Koh
This patch removes current support of Memory Region (MR) in order to accommodate the dynamic memory hotplug patch. This patch can be compiled but traffic can't flow and HW will raise faults. Subsequent patches will add new MR support. Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
2018-05-14ethdev: new Rx/Tx offloads APIWei Dai
This patch check if a input requested offloading is valid or not. Any reuqested offloading must be supported in the device capabilities. Any offloading is disabled by default if it is not set in the parameter dev_conf->[rt]xmode.offloads to rte_eth_dev_configure() and [rt]x_conf->offloads to rte_eth_[rt]x_queue_setup(). If any offloading is enabled in rte_eth_dev_configure() by application, it is enabled on all queues no matter whether it is per-queue or per-port type and no matter whether it is set or cleared in [rt]x_conf->offloads to rte_eth_[rt]x_queue_setup(). If a per-queue offloading hasn't be enabled in rte_eth_dev_configure(), it can be enabled or disabled for individual queue in ret_eth_[rt]x_queue_setup(). A new added offloading is the one which hasn't been enabled in rte_eth_dev_configure() and is reuqested to be enabled in rte_eth_[rt]x_queue_setup(), it must be per-queue type, otherwise trigger an error log. The underlying PMD must be aware that the requested offloadings to PMD specific queue_setup() function only carries those new added offloadings of per-queue type. This patch can make above such checking in a common way in rte_ethdev layer to avoid same checking in underlying PMD. This patch assumes that all PMDs in 18.05-rc2 have already converted to offload API defined in 17.11 . It also assumes that all PMDs can return correct offloading capabilities in rte_eth_dev_infos_get(). In the beginning of [rt]x_queue_setup() of underlying PMD, add offloads = [rt]xconf->offloads | dev->data->dev_conf.[rt]xmode.offloads; to keep same as offload API defined in 17.11 to avoid upper application broken due to offload API change. PMD can use the info that input [rt]xconf->offloads only carry the new added per-queue offloads to do some optimization or some code change on base of this patch. Signed-off-by: Wei Dai <wei.dai@intel.com> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
2018-05-14net/mlx5: change device reference for secondary processYongseok Koh
rte_eth_devices[] is not shared between primary and secondary process, but a static array to each process. The reverse pointer of device (priv->dev) is invalid. Instead, priv has the pointer to shared data of the device, struct rte_eth_dev_data *dev_data; Two macros are added, #define PORT_ID(priv) ((priv)->dev_data->port_id) #define ETH_DEV(priv) (&rte_eth_devices[PORT_ID(priv)]) Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
2018-05-14net/mlx5: fix calculation of Tx TSO inline room sizeYongseok Koh
rdma-core doesn't add up max_tso_header size to max_inline_data size. The library takes bigger value between the two. Fixes: 43e9d9794cde ("net/mlx5: support upstream rdma-core") Cc: stable@dpdk.org Signed-off-by: Yongseok Koh <yskoh@mellanox.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2018-05-14net/mlx5: fix SW parser enablingXueming Li
Fixes: 5f8ba81c4228 ("net/mlx5: support generic tunnel offloading") Signed-off-by: Xueming Li <xuemingl@mellanox.com> Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-04-27net/mlx5: support generic tunnel offloadingXueming Li
This commit adds support for generic tunnel TSO and checksum offload. PMD will compute the inner/outer headers offset according to the mbuf fields. Hardware will do calculation based on offsets and types. Signed-off-by: Xueming Li <xuemingl@mellanox.com> Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-04-11align SPDX Mellanox copyrightsShahaf Shuler
Aligning Mellanox SPDX copyrights to a single format. In addition replace to SPDX licence files which were missed. Signed-off-by: Shahaf Shuler <shahafs@mellanox.com> Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
2018-03-30net/mlx5: fix TSO enablementShahaf Shuler
TSO should be set if either of the TSO offload flags is requested. Fixes: dbccb4cddcd2 ("net/mlx5: convert to new Tx offloads API") Cc: stable@dpdk.org Signed-off-by: Shahaf Shuler <shahafs@mellanox.com> Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-03-30net/mlx5: use dynamic loggingNélio Laranjeiro
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com> Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
2018-03-30net/mlx5: use port id in PMD logNélio Laranjeiro
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com> Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
2018-03-30net/mlx5: standardize on negative errno valuesNélio Laranjeiro
Set rte_errno systematically as well. Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com> Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
2018-03-30net/mlx5: change non failing function return valuesNélio Laranjeiro
These functions return int although they are not supposed to fail, resulting in unnecessary checks in their callers. Some are returning error where is should be a boolean. Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com> Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>