author | Yongseok Koh <yskoh@mellanox.com> | 2017-10-24 17:27:25 -0700 |
---|---|---|
committer | Ferruh Yigit <ferruh.yigit@intel.com> | 2017-10-26 02:33:01 +0200 |
commit | fb870be5a879c6617fecabf47873ae2b576e6e69 (patch) | |
tree | abd79339ea355b7eeed46287da4090dff6baffa5 /doc/guides/nics/mlx5.rst | |
parent | 2262eed7523b305e59e8172a141b75e385b43cf0 (diff) | |
net/mlx5: fix Tx doorbell memory barrier
Configuring the UAR as IO-mapped reduces maximum throughput by a
noticeable amount. If the UAR is configured as a write-combining register,
a write memory barrier is needed when ringing a doorbell.
rte_wmb() is mostly effective when the burst size is comparatively
small. Revert the register to write-combining and enforce a write
memory barrier instead, except in the vectorized Tx burst routines.
An application can change this behavior by setting MLX5_SHUT_UP_BF if it
needs to.
Fixes: 9f9bebae5530 ("net/mlx5: don't map doorbell register to write combining")
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
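The barrier policy described in the commit message can be sketched in plain C as follows. This is only an illustration under stated assumptions, not the driver's code: every `sketch_`/`SKETCH_` name is hypothetical, the value 32 for the maximum vectorized burst is assumed, "not aligned to MLX5_VPMD_TX_MAX_BURST" is read as "not a multiple of it", and a C11 fence stands in for rte_wmb(). The fence only marks where the barrier sits; on real hardware it would not by itself flush a write-combining buffer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for MLX5_VPMD_TX_MAX_BURST; the value is assumed. */
#define SKETCH_VPMD_TX_MAX_BURST 32u

/* Hypothetical model of a Tx queue; not the PMD's real structure. */
struct sketch_txq {
	volatile uint64_t *doorbell; /* would point into the mapped UAR */
	unsigned int barriers;       /* barriers issued, for illustration only */
};

/* Placeholder for rte_wmb(): a C11 release fence plus a counter. */
static inline void sketch_wmb(struct sketch_txq *txq)
{
	atomic_thread_fence(memory_order_release);
	txq->barriers++;
}

/* Non-vectorized bursts always get a barrier; vectorized bursts get one
 * only when the burst size is not a multiple of the maximum burst
 * (an interpretation of "not aligned"). */
static inline bool sketch_needs_wmb(bool vectorized, unsigned int burst)
{
	if (!vectorized)
		return true;
	return (burst % SKETCH_VPMD_TX_MAX_BURST) != 0;
}

/* Write the doorbell value, then apply the barrier policy above. */
static inline void sketch_ring_doorbell(struct sketch_txq *txq,
					uint64_t value, bool vectorized,
					unsigned int burst)
{
	assert(txq != NULL && txq->doorbell != NULL);
	*txq->doorbell = value;
	if (sketch_needs_wmb(vectorized, burst))
		sketch_wmb(txq);
}
```

Under these assumptions, a non-vectorized burst of any size triggers the barrier, a vectorized burst of 32 packets skips it, and a vectorized burst of 7 packets triggers it.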
Diffstat (limited to 'doc/guides/nics/mlx5.rst')
-rw-r--r-- | doc/guides/nics/mlx5.rst | 17 |
1 files changed, 17 insertions, 0 deletions
diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index bcc39f6..cdb880a 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -173,6 +173,23 @@ Environment variables
   This is disabled by default since this can also decrease performance for
   unaligned packet sizes.
 
+- ``MLX5_SHUT_UP_BF``
+
+  Configures HW Tx doorbell register as IO-mapped.
+
+  By default, the HW Tx doorbell is configured as a write-combining register.
+  The register would be flushed to HW usually when the write-combining buffer
+  becomes full, but it depends on CPU design.
+
+  Except for vectorized Tx burst routines, a write memory barrier is enforced
+  after updating the register so that the update can be immediately visible to
+  HW.
+
+  When vectorized Tx burst is called, the barrier is set only if the burst size
+  is not aligned to MLX5_VPMD_TX_MAX_BURST. However, setting this environmental
+  variable will bring better latency even though the maximum throughput can
+  slightly decline.
+
 Run-time configuration
 ~~~~~~~~~~~~~~~~~~~~~~
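As a rough illustration of how an application might opt into the IO-mapped doorbell without overriding a value the user already exported, the sketch below uses only POSIX `setenv()` and `getenv()`. The helper names are hypothetical, and the rule that any value other than "0" enables the setting is an assumption, not something stated in this patch.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>
#include <string.h>

/* Set NAME to VALUE only if it is not already present in the environment.
 * Returns 0 on success, -1 on failure (as setenv() does). */
static int sketch_env_default(const char *name, const char *value)
{
	return setenv(name, value, 0); /* overwrite = 0 keeps an existing value */
}

/* Hypothetical check: treat any value other than "0" as enabled.
 * This interpretation is assumed for the sketch. */
static int sketch_shut_up_bf_enabled(void)
{
	const char *env = getenv("MLX5_SHUT_UP_BF");

	return env != NULL && strcmp(env, "0") != 0;
}
```

An application that prefers lower latency over peak throughput could call `sketch_env_default("MLX5_SHUT_UP_BF", "1")` early in `main()`, presumably before the device is probed, so that the variable is already in place when the driver reads it.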