summaryrefslogtreecommitdiff
path: root/doc/guides/prog_guide/rcu_lib.rst
blob: 8fe5b1f73850f816d8ff79d0da4f435f3ddfc338 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2019 Arm Limited.

.. _RCU_Library:

RCU Library
============

Lockless data structures provide scalability and determinism.
They enable use cases where locking may not be allowed
(for example real-time applications).

In the following sections, the term "memory" refers to memory allocated
by typical APIs like malloc() or anything that is representative of
memory, for example an index of a free element array.

Since these data structures are lockless, the writers and readers
are accessing the data structures concurrently. Hence, while removing
an element from a data structure, the writers cannot return the memory
to the allocator, without knowing that the readers are not
referencing that element/memory anymore. Hence, it is required to
separate the operation of removing an element into two steps:

#. Delete: in this step, the writer removes the reference to the element from
   the data structure but does not return the associated memory to the
   allocator. This will ensure that new readers will not get a reference to
   the removed element. Removing the reference is an atomic operation.

#. Free (Reclaim): in this step, the writer returns the memory to the
   memory allocator only after knowing that all the readers have stopped
   referencing the deleted element.

This library helps the writer determine when it is safe to free the
memory by making use of thread Quiescent State (QS).

What is Quiescent State
-----------------------

Quiescent State can be defined as "any point in the thread execution where the
thread does not hold a reference to shared memory". It is up to the application
to determine its quiescent state.

Let us consider the following diagram:

.. _figure_quiescent_state:

.. figure:: img/rcu_general_info.*

   Phases in the Quiescent State model.


As shown in :numref:`figure_quiescent_state`, reader thread 1 accesses data
structures D1 and D2. When it is accessing D1, if the writer has to remove an
element from D1, the writer cannot free the memory associated with that
element immediately. The writer can return the memory to the allocator only
after the reader stops referencing D1. In other words, reader thread RT1 has
to enter a quiescent state.

Similarly, since reader thread 2 is also accessing D1, the writer has to
wait till thread 2 enters quiescent state as well.

However, the writer does not need to wait for reader thread 3 to enter
quiescent state. Reader thread 3 was not accessing D1 when the delete
operation happened. So, reader thread 1 will not have a reference to the
deleted entry.

It can be noted that, the critical sections for D2 is a quiescent state
for D1. i.e. for a given data structure Dx, any point in the thread execution
that does not reference Dx is a quiescent state.

Since memory is not freed immediately, there might be a need for
provisioning of additional memory, depending on the application requirements.

Factors affecting the RCU mechanism
-----------------------------------

It is important to make sure that this library keeps the overhead of
identifying the end of grace period and subsequent freeing of memory,
to a minimum. The following explains how grace period and critical
section affect this overhead.

The writer has to poll the readers to identify the end of grace period.
Polling introduces memory accesses and wastes CPU cycles. The memory
is not available for reuse during the grace period. Longer grace periods
exasperate these conditions.

The length of the critical section and the number of reader threads
is proportional to the duration of the grace period. Keeping the critical
sections smaller will keep the grace period smaller. However, keeping the
critical sections smaller requires additional CPU cycles (due to additional
reporting) in the readers.

Hence, we need the characteristics of a small grace period and large critical
section. This library addresses this by allowing the writer to do
other work without having to block until the readers report their quiescent
state.

RCU in DPDK
-----------

For DPDK applications, the start and end of a ``while(1)`` loop (where no
references to shared data structures are kept) act as perfect quiescent
states. This will combine all the shared data structure accesses into a
single, large critical section which helps keep the overhead on the
reader side to a minimum.

DPDK supports a pipeline model of packet processing and service cores.
In these use cases, a given data structure may not be used by all the
workers in the application. The writer does not have to wait for all
the workers to report their quiescent state. To provide the required
flexibility, this library has a concept of a QS variable. The application
can create one QS variable per data structure to help it track the
end of grace period for each data structure. This helps keep the grace
period to a minimum.

How to use this library
-----------------------

The application must allocate memory and initialize a QS variable.

Applications can call ``rte_rcu_qsbr_get_memsize()`` to calculate the size
of memory to allocate. This API takes a maximum number of reader threads,
using this variable, as a parameter. Currently, a maximum of 1024 threads
are supported.

Further, the application can initialize a QS variable using the API
``rte_rcu_qsbr_init()``.

Each reader thread is assumed to have a unique thread ID. Currently, the
management of the thread ID (for example allocation/free) is left to the
application. The thread ID should be in the range of 0 to
maximum number of threads provided while creating the QS variable.
The application could also use ``lcore_id`` as the thread ID where applicable.

The ``rte_rcu_qsbr_thread_register()`` API will register a reader thread
to report its quiescent state. This can be called from a reader thread.
A control plane thread can also call this on behalf of a reader thread.
The reader thread must call ``rte_rcu_qsbr_thread_online()`` API to start
reporting its quiescent state.

Some of the use cases might require the reader threads to make blocking API
calls (for example while using eventdev APIs). The writer thread should not
wait for such reader threads to enter quiescent state.  The reader thread must
call ``rte_rcu_qsbr_thread_offline()`` API, before calling blocking APIs. It
can call ``rte_rcu_qsbr_thread_online()`` API once the blocking API call
returns.

The writer thread can trigger the reader threads to report their quiescent
state by calling the API ``rte_rcu_qsbr_start()``. It is possible for multiple
writer threads to query the quiescent state status simultaneously. Hence,
``rte_rcu_qsbr_start()`` returns a token to each caller.

The writer thread must call ``rte_rcu_qsbr_check()`` API with the token to
get the current quiescent state status. Option to block till all the reader
threads enter the quiescent state is provided. If this API indicates that
all the reader threads have entered the quiescent state, the application
can free the deleted entry.

The APIs ``rte_rcu_qsbr_start()`` and ``rte_rcu_qsbr_check()`` are lock free.
Hence, they can be called concurrently from multiple writers even while
running as worker threads.

The separation of triggering the reporting from querying the status provides
the writer threads flexibility to do useful work instead of blocking for the
reader threads to enter the quiescent state or go offline. This reduces the
memory accesses due to continuous polling for the status.

The ``rte_rcu_qsbr_synchronize()`` API combines the functionality of
``rte_rcu_qsbr_start()`` and blocking ``rte_rcu_qsbr_check()`` into a single
API. This API triggers the reader threads to report their quiescent state and
polls till all the readers enter the quiescent state or go offline. This API
does not allow the writer to do useful work while waiting and introduces
additional memory accesses due to continuous polling.

The reader thread must call ``rte_rcu_qsbr_thread_offline()`` and
``rte_rcu_qsbr_thread_unregister()`` APIs to remove itself from reporting its
quiescent state. The ``rte_rcu_qsbr_check()`` API will not wait for this reader
thread to report the quiescent state status anymore.

The reader threads should call ``rte_rcu_qsbr_quiescent()`` API to indicate that
they entered a quiescent state. This API checks if a writer has triggered a
quiescent state query and update the state accordingly.

The ``rte_rcu_qsbr_lock()`` and ``rte_rcu_qsbr_unlock()`` are empty functions.
However, when ``CONFIG_RTE_LIBRTE_RCU_DEBUG`` is enabled, these APIs aid
in debugging issues. One can mark the access to shared data structures on the
reader side using these APIs. The ``rte_rcu_qsbr_quiescent()`` will check if
all the locks are unlocked.