News reader
Paul E. McKenney: What Does It Mean To Be An RCU Implementation?
A correspondent closed out 2022 by sending me an off-list email asking whether or not a pair of Rust crates (rcu_clean and left_right) were really implementations of read-copy update (RCU), with an LWN commenter throwing in crossbeam's epoch crate for good measure. At first glance, this is a pair of simple yes/no questions that one should be able to answer off the cuff.
What Is An RCU?
Except that there is quite a variety of RCU implementations in the wild. Even if we remain within the cozy confines of the Linux kernel, we have: (1) The original "vanilla" RCU, (2) Sleepable RCU (SRCU), (3) Tasks RCU, (4) Tasks Rude RCU, and (5) Tasks Trace RCU. These differ not just in performance characteristics; in fact, it is not in general possible to mechanically convert (say) SRCU to RCU. The key attributes of RCU implementations are the marking of read-side code regions and data accesses on the one hand and some means of waiting on all pre-existing readers on the other. For more detail, see the 2019 LWN article and for more background, see the Linux Foundation RCU presentations here and here.
The next sections provide an overview of the Linux-kernel RCU implementations' functional properties, with performance and scalability characteristics left as an exercise for the interested reader.
Vanilla RCU
Vanilla RCU has quite a variety of bells and whistles:
- Explicit nesting read-side markers, rcu_read_lock(), rcu_read_unlock(), rcu_dereference(), and friends.
- Pointer-update function, rcu_assign_pointer().
- Synchronous grace-period-wait primitives, synchronize_rcu() and synchronize_rcu_expedited().
- An asynchronous grace-period wait primitive, call_rcu(). And additionally a synchronous callback wait primitive, rcu_barrier().
- Polled grace-period wait primitives.
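As a rough illustration of how the first three items fit together, here is a minimal Rust sketch. It is emphatically not the Linux-kernel algorithm: ToyRcu, read(), publish(), and wait_for_readers() are hypothetical names, readers briefly take a mutex, and Arc reference counts stand in for grace-period detection. Note also that wait_for_readers() waits only for readers of one retired version, whereas synchronize_rcu() waits for all pre-existing readers.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Toy RCU-protected pointer (hypothetical, for illustration only).
pub struct ToyRcu<T> {
    current: Mutex<Arc<T>>,
}

impl<T> ToyRcu<T> {
    pub fn new(value: T) -> Self {
        ToyRcu { current: Mutex::new(Arc::new(value)) }
    }

    /// Rough analogue of rcu_read_lock() plus rcu_dereference(): the returned
    /// Arc marks a read-side critical section that ends when it is dropped.
    pub fn read(&self) -> Arc<T> {
        let slot = self.current.lock().unwrap();
        Arc::clone(&*slot)
    }

    /// Rough analogue of rcu_assign_pointer(): publish a new version and hand
    /// back the old one, which pre-existing readers may still be using.
    pub fn publish(&self, value: T) -> Arc<T> {
        let mut slot = self.current.lock().unwrap();
        std::mem::replace(&mut *slot, Arc::new(value))
    }
}

/// Rough analogue of a synchronous grace-period wait, restricted to a single
/// retired version: spin until no reader holds it, then return the value.
pub fn wait_for_readers<T>(old: Arc<T>) -> T {
    let mut old = old;
    loop {
        match Arc::try_unwrap(old) {
            Ok(value) => return value,
            Err(still_shared) => {
                old = still_shared;
                thread::yield_now();
            }
        }
    }
}
```

A reader that called read() before publish() keeps seeing the old version until it drops its Arc, and only then does wait_for_readers() return.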
SRCU has a similar variety of bells and whistles, but some important differences. The most important difference is that SRCU supports multiple domains, each represented by an srcu_struct structure. A reader in one domain does not block a grace period in another domain. In contrast, RCU is global in nature, with exactly one domain. On the other hand, the price SRCU pays for this flexibility is reduced amortization of grace-period overhead.
- Explicit read-side markers, srcu_read_lock(), srcu_read_unlock(), srcu_dereference(), and friends. Except that, unlike rcu_read_lock() and rcu_read_unlock(), srcu_read_lock() and srcu_read_unlock() do not nest. Instead, the return value from srcu_read_lock() must be passed to the corresponding srcu_read_unlock(). This means that SRCU (but not RCU!) can represent non-nested partially overlapping read-side critical sections. Not that this was considered a good thing, instead being a way of avoiding the need for T*S storage, where T is the number of tasks and S the number of srcu_struct structures.
- Synchronous grace-period-wait primitives, synchronize_srcu() and synchronize_srcu_expedited().
- An asynchronous grace-period wait primitive, call_srcu(). And additionally a synchronous callback wait primitive, srcu_barrier().
- Polled grace-period wait primitives, although less variety than RCU enjoys. (Does this enjoyment extend to RCU's users? You decide.)
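The two SRCU differences called out above, separate domains and a value passed from lock to unlock, can be sketched as follows. This is a toy model under loud assumptions: ToySrcu and its methods are hypothetical names, and a mutex plus condition variable replace the per-CPU counters of the real thing.

```rust
use std::sync::{Condvar, Mutex};

struct State {
    idx: usize,          // which counter new readers increment
    readers: [usize; 2], // readers currently registered under each index
}

/// Toy SRCU domain (hypothetical, for illustration only).
pub struct ToySrcu {
    state: Mutex<State>,
    drained: Condvar,
    gp_lock: Mutex<()>, // serializes grace periods within this domain
}

impl ToySrcu {
    pub fn new() -> Self {
        ToySrcu {
            state: Mutex::new(State { idx: 0, readers: [0, 0] }),
            drained: Condvar::new(),
            gp_lock: Mutex::new(()),
        }
    }

    /// Like srcu_read_lock(): returns a value that the caller must pass to
    /// the matching read_unlock().
    pub fn read_lock(&self) -> usize {
        let mut s = self.state.lock().unwrap();
        let idx = s.idx;
        s.readers[idx] += 1;
        idx
    }

    /// Like srcu_read_unlock(): needs the value returned by read_lock().
    pub fn read_unlock(&self, idx: usize) {
        let mut s = self.state.lock().unwrap();
        s.readers[idx] -= 1;
        if s.readers[idx] == 0 {
            self.drained.notify_all();
        }
    }

    /// Like synchronize_srcu(): wait for this domain's pre-existing readers.
    pub fn synchronize(&self) {
        let _gp = self.gp_lock.lock().unwrap();
        let mut s = self.state.lock().unwrap();
        let old = s.idx;
        s.idx = 1 - old; // new readers now use the other counter
        let _s = self.drained.wait_while(s, |s| s.readers[old] > 0).unwrap();
    }
}
```

Because the unlock takes the value rather than relying on nesting, one reader can begin before and end before another without either enclosing the other, and a reader in one ToySrcu never delays synchronize() on a different ToySrcu.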
Tasks RCU was designed specially to handle the trampolines used in Linux-kernel tracing.
- It has no explicit read-side markers. Instead, voluntary context switches separate successive Tasks RCU read-side critical sections.
- A synchronous grace-period-wait primitive, synchronize_rcu_tasks().
- An asynchronous grace-period wait primitive, call_rcu_tasks(). And additionally a synchronous callback wait primitive, rcu_barrier_tasks().
- No polled grace-period wait primitives.
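To make the "no explicit read-side markers" idea concrete, here is a toy sketch in which a grace period is simply a wait for every task to pass through a voluntary context switch. ToyTasksRcu and its methods are hypothetical names, and the per-task counters are an assumption of this sketch rather than a description of the kernel's bookkeeping.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

/// Toy model (hypothetical): one context-switch counter per task.
pub struct ToyTasksRcu {
    switches: Vec<AtomicU64>,
}

impl ToyTasksRcu {
    pub fn new(ntasks: usize) -> Self {
        ToyTasksRcu {
            switches: (0..ntasks).map(|_| AtomicU64::new(0)).collect(),
        }
    }

    /// Called by a task at each voluntary context switch. This is the only
    /// thing separating one read-side critical section from the next.
    pub fn voluntary_context_switch(&self, task: usize) {
        self.switches[task].fetch_add(1, Ordering::SeqCst);
    }

    /// Toy analogue of synchronize_rcu_tasks(): snapshot every counter, then
    /// wait until each task has context-switched at least once since.
    pub fn synchronize(&self) {
        let snapshot: Vec<u64> = self
            .switches
            .iter()
            .map(|c| c.load(Ordering::SeqCst))
            .collect();
        for (counter, before) in self.switches.iter().zip(snapshot) {
            while counter.load(Ordering::SeqCst) == before {
                thread::yield_now();
            }
        }
    }
}
```

In this toy, a task that never calls voluntary_context_switch() would stall synchronize() forever.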
By design, Tasks RCU does not wait for idle tasks. Something about them never doing any voluntary context switches on CPUs that remain idle for long periods of time. So trampolines that might be involved in tracing of code within the idle loop need something else, and that something is Tasks Rude RCU.
- It has no explicit read-side markers. Instead, any preemption-disabled region of code is a Tasks Rude RCU reader.
- A synchronous grace-period-wait primitive, synchronize_rcu_tasks_rude().
- An asynchronous grace-period wait primitive, call_rcu_tasks_rude(). And additionally a synchronous callback wait primitive, rcu_barrier_tasks_rude().
- No polled grace-period wait primitives.
Both Tasks RCU and Tasks Rude RCU disallow sleeping while executing in a given trampoline. Some BPF programs need to sleep, hence Tasks Trace RCU.
- Explicit nesting read-side markers, rcu_read_lock_trace() and rcu_read_unlock_trace().
- A synchronous grace-period-wait primitive, synchronize_rcu_tasks_trace().
- An asynchronous grace-period wait primitive, call_rcu_tasks_trace(). And additionally a synchronous callback wait primitive, rcu_barrier_tasks_trace().
- No polled grace-period wait primitives.
The various Linux examples are taken from a code base in which RCU has been under active development for more than 20 years, which might yield an overly stringent set of criteria. In contrast, the 1990s DYNIX/ptx implementation of RCU (called "rclock" for "read-copy lock") was only under active development for about five years. The implementation was correspondingly minimal, as can be seen from this February 2001 patch (hat tip to Greg Lehey):
- Explicit nesting read-side markers, RC_RDPROTECT() and RC_RDUNPROTECT(). The lack of anything resembling rcu_dereference() shows just how small DYNIX/ptx's installed base was.
- Pointer-update barrier, RC_MEMSYNC(). This is the counterpart of smp_wmb() in early Linux-kernel RCU use cases.
- No synchronous grace-period-wait primitive.
- An asynchronous grace-period wait primitive, rc_callback(). However, there was no synchronous callback wait primitive, perhaps because DYNIX/ptx did not have modules, let alone unloadable ones.
- No polled grace-period wait primitives.
Perhaps this can form the basis of an RCU classification system, though some translation will no doubt be required to bridge from C to Rust. There is ownership, if nothing else!
RCU Classification and Rust RCU Crates
Except that the first RCU crate, rcu_clean, throws a monkey wrench into the works. It does not have any grace-period primitives, but instead a clean() function that takes a reference to an RCU-protected data item. The user invokes this at some point in the code where it is known that there are no readers, either within this thread or anywhere else. In true Rust fashion, in some cases, the compiler is able to prove the presence or absence of readers and issue a diagnostic when needed. The documentation notes that the addition of grace periods (also known as "epochs") would allow greater accuracy.
This sort of thing is not unprecedented. The userspace RCU library has long had an rcu_quiescent_state() function that can be invoked from a given thread when that particular thread is in a quiescent state, and thus cannot have references to any RCU-protected object. However, rcu_clean takes this a step further by having no RCU grace-period mechanism at all.
Nevertheless, rcu_clean could be used to implement the add-only list RCU use case, so it is difficult to argue that it is not an RCU implementation. But it is clearly a very primitive implementation. That said, primitive implementations do have their place, for example:
- Languages with garbage collectors have built-in RCU updaters.
- Programs with short runtimes can just leak memory, cleaning up when restarted.
- Other synchronization primitives can be used to protect and exclude readers.
In addition, an RCU implementation even more primitive than rcu_clean would omit the clean() function, instead leaking memory that had been removed from an RCU-protected structure.
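The idea of letting the compiler prove the absence of readers can be sketched in a few lines of single-threaded Rust. This is a hypothetical CleanCell type written in the spirit of the description above, not the rcu_clean crate's actual API. The assumption is that an exclusive borrow (`&mut self`) is what proves that no reader references remain.

```rust
use std::cell::{Cell, RefCell};

/// Single-threaded sketch (hypothetical type, not the rcu_clean API).
pub struct CleanCell<T> {
    current: Cell<*mut T>,
    retired: RefCell<Vec<*mut T>>,
}

impl<T> CleanCell<T> {
    pub fn new(value: T) -> Self {
        CleanCell {
            current: Cell::new(Box::into_raw(Box::new(value))),
            retired: RefCell::new(Vec::new()),
        }
    }

    /// A reader borrows the current version for as long as it borrows `self`.
    pub fn read(&self) -> &T {
        // SAFETY: versions are freed only in clean() and drop(), both of
        // which need `&mut self` and so cannot run while this borrow lives.
        unsafe { &*self.current.get() }
    }

    /// Publish a new version. The old one is retired rather than freed,
    /// because existing readers might still reference it.
    pub fn update(&self, value: T) {
        let old = self.current.replace(Box::into_raw(Box::new(value)));
        self.retired.borrow_mut().push(old);
    }

    /// Free retired versions and return how many were freed. Taking
    /// `&mut self` makes the borrow checker prove there are no readers.
    pub fn clean(&mut self) -> usize {
        let retired = std::mem::take(self.retired.get_mut());
        let freed = retired.len();
        for ptr in retired {
            // SAFETY: each pointer came from Box::into_raw and is freed once.
            unsafe { drop(Box::from_raw(ptr)) };
        }
        freed
    }
}

impl<T> Drop for CleanCell<T> {
    fn drop(&mut self) {
        self.clean();
        // SAFETY: `current` always holds a live pointer from Box::into_raw.
        unsafe { drop(Box::from_raw(self.current.get())) };
    }
}
```

Calling clean() while a reference obtained from read() is still in use is rejected at compile time. Omitting clean() and the Drop implementation would give the leak-only variant.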
The left_right crate definitely uses RCU in the guise of epochs, and it can be used for at least some of the things that RCU can be used for. It does have a single-writer restriction, though as the documentation says, you could use a Mutex to serialize at least some multi-writer use cases. In addition, it has long been known that RCU use cases involving only a single writer thread permit wait-free updaters as well as wait-free readers.
One might argue that the fact that the left_right crate uses RCU means that it cannot possibly be itself an implementation of RCU. Except that in the Linux kernel, RCU Tasks uses vanilla RCU, RCU Tasks Trace uses SRCU, and previous versions of SRCU used vanilla RCU. So let's give the left_right crate the benefit of the doubt, at least for the time being, but with the understanding that it might eventually instead be classified as an RCU use case rather than an RCU implementation.
The crossbeam epoch crate again uses the guise of epochs. It has explicit read-side markers in RAII guard form using the pin function and its Atomic pointers. Grace periods are computed automatically, and the defer method provides an asynchronous grace-period-wait function. As with DYNIX/ptx, the crossbeam epoch crate lacks any other means of waiting for grace periods, and it also lacks a callback-wait API. However, to its credit, and unlike DYNIX/ptx, this crate does provide safe means for handling pointers to RCU-protected data.
Here is a prototype classification system, again, leaving performance and scalability aside:
- Are there explicit RCU read-side markers? Of the Linux-kernel RCU implementations, RCU Tasks and RCU Tasks Rude lack such markers. Given the Rust borrow checker, it is hard to imagine an implementation without such markers, but feel free to prove me wrong.
- Are grace periods computed automatically? (If not, as in rcu_clean, none of the remaining questions apply.)
- Are there synchronous grace-period-wait APIs? All of the Linux-kernel implementations do, and left_right also looks to.
- Are there asynchronous grace-period-wait APIs? If so, are there callback-wait APIs? All of the Linux-kernel implementations do, but left_right does not appear to. Providing them seems doable, but might result in more than two copies of recently-updated data structures. The crossbeam epoch crate provides an asynchronous grace-period-wait function in the form of defer, but lacks a callback-wait API.
- Are there polled grace-period-wait APIs? The Linux-kernel RCU and SRCU implementations do.
- Are there multiple grace-period domains? The Linux-kernel SRCU implementation does.
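For those who prefer tables to prose, the questions above can be encoded as data. The sketch below covers only the five Linux-kernel implementations and DYNIX/ptx rclock, using the answers given earlier in this post. The Traits struct and names_where() helper are hypothetical names. The question about automatically computed grace periods is omitted because the answer is "yes" for all six.

```rust
/// One row of the prototype classification (hypothetical encoding).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Traits {
    pub name: &'static str,
    pub explicit_read_markers: bool,
    pub sync_wait: bool,
    pub async_wait: bool,
    pub callback_wait: bool,
    pub polled_wait: bool,
    pub multiple_domains: bool,
}

// Shorthand constructor so that each row fits on one line.
const fn row(name: &'static str, m: bool, s: bool, a: bool, c: bool, p: bool, d: bool) -> Traits {
    Traits {
        name,
        explicit_read_markers: m,
        sync_wait: s,
        async_wait: a,
        callback_wait: c,
        polled_wait: p,
        multiple_domains: d,
    }
}

/// Columns: markers, sync wait, async wait, callback wait, polled, domains.
pub const IMPLEMENTATIONS: [Traits; 6] = [
    row("Vanilla RCU", true, true, true, true, true, false),
    row("SRCU", true, true, true, true, true, true),
    row("Tasks RCU", false, true, true, true, false, false),
    row("Tasks Rude RCU", false, true, true, true, false, false),
    row("Tasks Trace RCU", true, true, true, true, false, false),
    row("DYNIX/ptx rclock", true, false, true, false, false, false),
];

/// Names of the implementations for which `pred` holds, in table order.
pub fn names_where(pred: impl Fn(&Traits) -> bool) -> Vec<&'static str> {
    IMPLEMENTATIONS
        .iter()
        .filter(|t| pred(*t))
        .map(|t| t.name)
        .collect()
}
```

For example, names_where(|t| !t.explicit_read_markers) picks out Tasks RCU and Tasks Rude RCU, and names_where(|t| t.multiple_domains) picks out SRCU alone.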
But does this classification scheme work for your favorite RCU implementation? What about your favorite RCU use case?
History
- January 25, 2023: Initial version.
- January 26, 2023: Add DYNIX/ptx RCU equivalent, note that left_right might be a use of RCU rather than an implementation, and call out the fact that some of the Linux-kernel RCU implementations are based on others.
- January 30, 2023: Respond to LWN request for the crossbeam crate. Expand the section summarizing RCU.
[$] X clients and byte swapping
A pair of Free Software Foundation governance changes
A history of the FFmpeg project
See, unlike many people I don’t regard FFmpeg as something unique (in the sense that it’s a project only Fabrice Bellard could create). It was nice to have around and it helped immeasurably but without it something else would fill the niche. There were other people working on similar tasks after all (does anybody remember transcode? or gmerlin?). Hopefully you got an idea on how many talented unsung heroes had been working on FFmpeg and libav over the years.
The full set can be found on this page. (Thanks to Paul Wise).
OpenSUSE Leap 15.3 has reached end of life
A cyberattack may have caused the Pakistani power outage
At his Tuesday morning press conference, Pakistan's energy minister indicated that Monday's nationwide power-grid outage may have been caused by a cyberattack.
The post A cyberattack may have caused the Pakistani power outage first appeared on Nemzeti Kibervédelmi Intézet.
Security updates for Wednesday
Data breach at PayPal
According to PayPal, nearly 35,000 customers were affected by a data-breach incident last December.
The post Data breach at PayPal first appeared on Nemzeti Kibervédelmi Intézet.
[$] Python packaging, visions, and unification
WINE 8.0 released
After 4 years of work, the PE conversion is finally complete: all modules can be built in PE format. This is an important milestone on the road to supporting various features such as copy protection, 32-bit applications on 64-bit hosts, Windows debuggers, x86 applications on ARM, etc.
Other changes include WoW64 support (allowing 32-bit modules to call into 64-bit libraries), Print Processor support, improved Direct3D support, and more.
A security audit of Git
For this portion of the research a total of 35 issues were discovered, including 2 critical severity findings and a high severity finding. Additionally, because of this research, a number of potentially catastrophic security bugs were discovered and resolved internally by the git security team.
See the full report for all the details.
Six stable kernel updates
Security updates for Tuesday
iOS 16.3 arrives with hardware key support
The iOS 16.3 release published by Apple today arrives with a long-awaited extra security feature: after the update, hardware key support becomes available. Hardware keys, also known as security keys, are small physical devices that look similar to a USB flash drive and that, via a physical (USB-C) connection or NFC (Near Field Communication) technology, help users' two-factor (2FA) […]
The post iOS 16.3 arrives with hardware key support first appeared on Nemzeti Kibervédelmi Intézet.
Messenger's new encryption feature is now being widely tested
Meta Platforms has announced that Facebook Messenger's end-to-end encryption (E2EE) feature has entered a full-scale testing phase worldwide, as a result of which more and more users' chats will be covered by the extra security feature. The social media giant announced back in August that it intends to roll the feature out gradually as the default on its platforms in order to avoid possible negative effects both on the […]
The post Messenger's new encryption feature is now being widely tested first appeared on Nemzeti Kibervédelmi Intézet.
Game of Trees 0.82 released.
[$] Hiding a process's executable from itself
Zawinski: mozilla.org's 25th anniversary
On January 20th, 1998, Netscape laid off a lot of people. One of them would have been me, as my "department", such as it was, had been eliminated, but I ended up momentarily moving from "clienteng" over to the "website" division. For about 48 hours I thought that I might end up writing a webmail product or something.
That, uh, didn't happen.
That announcement was the opening topic on the second-ever LWN.net Weekly Edition as well.
The return of the Linux Kernel Podcast
This time around, I’m not committing to any specific cadence – let’s call it “periodic” (every few weeks). In each episode, I will aim to broadly summarize the latest happenings in the “plumbing” of the Linux kernel, and occasionally related bits of userspace “plumbing” (glibc, systemd, etc.), as well as impactful toolchain changes that enable new features or rebaseline requirements.