summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorGavin Hu <gavin.hu@arm.com>2018-11-09 19:42:46 +0800
committerThomas Monjalon <thomas@monjalon.net>2018-11-13 16:57:58 +0100
commit86757c2c3ed5006940f93725d39131dfb0d09b60 (patch)
treee6cf7cc445632637e838719c161df34dd652fc79
parent5d08fecdd39f2373713a0475b75a4126800c9acf (diff)
downloaddpdk-86757c2c3ed5006940f93725d39131dfb0d09b60.zip
dpdk-86757c2c3ed5006940f93725d39131dfb0d09b60.tar.gz
dpdk-86757c2c3ed5006940f93725d39131dfb0d09b60.tar.xz
ring/c11: keep deterministic order allowing retry to work
Use case scenario: 1) Thread 1 is enqueuing. It reads prod.head and gets stalled for some reasons (running out of cpu time, preempted,...) 2) Thread 2 is enqueuing. It succeeds in enqueuing and moves prod.head forward. 3) Thread 3 is dequeuing. It succeeds in dequeuing and moves the cons.tail beyond the prod.head read by thread 1. 4) Thread 1 is re-scheduled. It reads cons.tail. cpu1(producer) cpu2(producer) cpu3(consumer) load r->prod.head ^ load r->prod.head | load r->cons.tail | store r->prod.head(+n) stalled <-- enqueue -----> | store r->prod.tail(+n) | load r->cons.head | load r->prod.tail | store r->cons.head(+n) | <...dequeue.....> v store r->cons.tail(+n) load r->cons.tail For thread 1, the __atomic_compare_exchange_n detects the outdated prod.head and retry the flow with the new one. This retry flow works ok on strong ordering platform(eg:x86). But for weak ordering platforms(arm, ppc), loading cons.tail and prod.head might be re-ordered, prod.head is new but cons.tail becomes too old, the retry flow, based on the detection of outdated head, does not trigger as expected, thus the outdate cons.tail causes wrong free_entries. Similarly, for dequeuing, outdated prod.tail leads to wrong avail_entries. The fix is to keep the deterministic order of two loads allowing the retry to work. Run the ring perf test on the following testbed: HW: ThunderX2 B0 CPU CN9975 v2.0, 2 sockets, 28core, 4 threads/core, 2.5GHz OS: Ubuntu 16.04.5 LTS, Kernel: 4.15.0-36-generic DPDK: 18.08, Configuration: arm64-armv8a-linuxapp-gcc gcc: 8.1.0 $sudo ./test/test/test -l 16-19,44-47,72-75,100-103 -n 4 \ --socket-mem=1024 -- -i Without the patch: *** Testing using two physical cores *** SP/SC bulk enq/dequeue (size: 8): 5.64 MP/MC bulk enq/dequeue (size: 8): 9.58 SP/SC bulk enq/dequeue (size: 32): 1.98 MP/MC bulk enq/dequeue (size: 32): 2.30 With the patch: *** Testing using two physical cores *** SP/SC bulk enq/dequeue (size: 8): 5.75 MP/MC bulk enq/dequeue (size: 8): 10.18 SP/SC bulk enq/dequeue (size: 32): 1.80 MP/MC bulk enq/dequeue (size: 32): 2.34 The results showed the thread fence degrade the performance slightly, but it is required for correctness. Fixes: 39368ebfc6 ("ring: introduce C11 memory model barrier option") Cc: stable@dpdk.org Signed-off-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
-rw-r--r--lib/librte_ring/rte_ring_c11_mem.h6
1 files changed, 6 insertions, 0 deletions
diff --git a/lib/librte_ring/rte_ring_c11_mem.h b/lib/librte_ring/rte_ring_c11_mem.h
index 7bc74a4..dc49a99 100644
--- a/lib/librte_ring/rte_ring_c11_mem.h
+++ b/lib/librte_ring/rte_ring_c11_mem.h
@@ -66,6 +66,9 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
/* Reset n to the initial burst count */
n = max;
+ /* Ensure the head is read before tail */
+ __atomic_thread_fence(__ATOMIC_ACQUIRE);
+
/* load-acquire synchronize with store-release of ht->tail
* in update_tail.
*/
@@ -139,6 +142,9 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
/* Restore n as it may change every loop */
n = max;
+ /* Ensure the head is read before tail */
+ __atomic_thread_fence(__ATOMIC_ACQUIRE);
+
/* this load-acquire synchronize with store-release of ht->tail
* in update_tail.
*/