author: Yongseok Koh <email@example.com>, 2017-10-24 17:27:25 -0700
committer: Ferruh Yigit <firstname.lastname@example.org>, 2017-10-26 02:33:01 +0200
net/mlx5: fix Tx doorbell memory barrier
Configuring UAR as IO-mapped makes maximum throughput decline by noticeable amount. If UAR is configured as write-combining register, a write memory barrier is needed on ringing a doorbell. rte_wmb() is mostly effective when the size of a burst is comparatively small. Revert the register back to write-combining and enforce a write memory barrier instead, except for vectorized Tx burst routines. Application can change it by setting MLX5_SHUT_UP_BF under its own necessity.

Fixes: 9f9bebae5530 ("net/mlx5: don't map doorbell register to write combining")

Signed-off-by: Yongseok Koh <email@example.com>
Acked-by: Shahaf Shuler <firstname.lastname@example.org>
Acked-by: Nelio Laranjeiro <email@example.com>
Diffstat (limited to 'doc/guides/nics/mlx5.rst')
1 files changed, 17 insertions, 0 deletions
diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index bcc39f6..cdb880a 100644
@@ -173,6 +173,23 @@ Environment variables
This is disabled by default since this can also decrease performance for
unaligned packet sizes.
+- ``MLX5_SHUT_UP_BF``
+ Configures HW Tx doorbell register as IO-mapped.
+ By default, the HW Tx doorbell is configured as a write-combining register.
+ The register would be flushed to HW usually when the write-combining buffer
+ becomes full, but it depends on CPU design.
+ Except for vectorized Tx burst routines, a write memory barrier is enforced
+ after updating the register so that the update can be immediately visible to HW.
+ When vectorized Tx burst is called, the barrier is set only if the burst size
+ is not aligned to MLX5_VPMD_TX_MAX_BURST. However, setting this environmental
+ variable will bring better latency even though the maximum throughput can
+ slightly decline.
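
The barrier rule described in the added documentation can be sketched as a small decision function. This is only an illustrative model of the default write-combining configuration, not the driver's code: the function name and its parameters are hypothetical, "not aligned" is assumed to mean "not a multiple of", and the value used for MLX5_VPMD_TX_MAX_BURST in any call is a placeholder that should be taken from the driver headers.

```c
#include <stdbool.h>

/*
 * Hypothetical model (not the mlx5 PMD implementation) of when a write
 * memory barrier follows the Tx doorbell update while the doorbell is
 * mapped as a write-combining register.
 *
 * vectorized     - true when a vectorized Tx burst routine is in use
 * burst_size     - number of packets in the burst
 * vpmd_max_burst - stand-in for MLX5_VPMD_TX_MAX_BURST (placeholder value)
 */
static bool
tx_doorbell_needs_wmb(bool vectorized, unsigned int burst_size,
		      unsigned int vpmd_max_burst)
{
	/* Non-vectorized routines always enforce the barrier. */
	if (!vectorized)
		return true;
	/* Defensive choice of this sketch: avoid dividing by zero. */
	if (vpmd_max_burst == 0)
		return true;
	/*
	 * Vectorized routines: barrier only when the burst size is not
	 * aligned to the maximum burst (assumed: not a multiple of it).
	 */
	return (burst_size % vpmd_max_burst) != 0;
}
```

Under these assumptions, `tx_doorbell_needs_wmb(true, 64, 32)` reports that no barrier is issued, so the doorbell write is left to the CPU's write-combining flush, while `tx_doorbell_needs_wmb(true, 31, 32)` and any non-vectorized call report that the barrier is enforced.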