News reader

Paul E. McKenney: What Does It Mean To Be An RCU Implementation?

2 years, 6 months ago
Under Construction

A correspondent closed out 2022 by sending me an off-list email asking whether or not a pair of Rust crates (rcu_clean and left_right) were really implementations of read-copy update (RCU), with an LWN commenter throwing in crossbeam's epoch crate for good measure.  At first glance, this is a pair of simple yes/no questions that one should be able to answer off the cuff.

What Is An RCU?

Except that there is quite a variety of RCU implementations in the wild.  Even if we remain within the cozy confines of the Linux kernel, we have: (1) The original "vanilla" RCU, (2) Sleepable RCU (SRCU), (3) Tasks RCU, (4) Tasks Rude RCU, and (5) Tasks Trace RCU.  These differ not just in performance characteristics; in fact, it is not in general possible to mechanically convert (say) SRCU to RCU.  The key attributes of RCU implementations are the marking of read-side code regions and data accesses on the one hand and some means of waiting on all pre-existing readers on the other.  For more detail, see the 2019 LWN article, and for more background, see the Linux Foundation RCU presentations.
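
To make those two key attributes concrete, here is a minimal toy sketch in Rust using only the standard library. It is not how any of the implementations discussed here work internally, and `ToyRcu`, `read()`, and `update()` are hypothetical names. `read()` marks a read-side critical section, and `update()` publishes a new version and then waits for readers before freeing the old one. The grace period is deliberately crude: it waits until no reader at all is running, which is safe but can starve the updater under a continuous read load. All atomics use `SeqCst`, stronger than rcu_dereference() needs, to keep the reasoning simple.

```rust
use std::sync::atomic::{AtomicPtr, AtomicUsize, Ordering::SeqCst};
use std::thread;

// Hypothetical toy RCU domain: one protected pointer, one global reader count.
struct ToyRcu<T: Send + Sync> {
    ptr: AtomicPtr<T>,
    readers: AtomicUsize,
}

impl<T: Send + Sync> ToyRcu<T> {
    fn new(v: T) -> Self {
        ToyRcu { ptr: AtomicPtr::new(Box::into_raw(Box::new(v))), readers: AtomicUsize::new(0) }
    }

    // Read-side critical section: the counter marks the region, and the
    // pointer load plays the role of rcu_dereference().
    fn read<R>(&self, f: impl FnOnce(&T) -> R) -> R {
        self.readers.fetch_add(1, SeqCst);
        let p = self.ptr.load(SeqCst);
        let r = f(unsafe { &*p });
        self.readers.fetch_sub(1, SeqCst);
        r
    }

    // Publish a new version (cf. rcu_assign_pointer()), wait for readers
    // (cf. synchronize_rcu()), then free the old version.
    fn update(&self, v: T) {
        let old = self.ptr.swap(Box::into_raw(Box::new(v)), SeqCst);
        // Conservative grace period: waits for *all* readers, including ones
        // that started after the swap. Safe, but can starve under heavy reads.
        while self.readers.load(SeqCst) != 0 {
            thread::yield_now();
        }
        unsafe { drop(Box::from_raw(old)) };
    }
}

impl<T: Send + Sync> Drop for ToyRcu<T> {
    fn drop(&mut self) {
        unsafe { drop(Box::from_raw(*self.ptr.get_mut())) };
    }
}

fn main() {
    let rcu = ToyRcu::new(0u64);
    thread::scope(|s| {
        for _ in 0..4 {
            s.spawn(|| {
                for _ in 0..1000 {
                    // Readers only ever see a published, not-yet-freed value.
                    let v = rcu.read(|v| *v);
                    assert!(v <= 100);
                }
            });
        }
        s.spawn(|| {
            for i in 1..=100 {
                rcu.update(i);
            }
        });
    });
    assert_eq!(rcu.read(|v| *v), 100);
}
```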

The next sections provide an overview of the Linux-kernel RCU implementations' functional properties, with performance and scalability characteristics left as an exercise for the interested reader.

Vanilla RCU

Vanilla RCU has quite a variety of bells and whistles:

  • Explicit nesting read-side markers, rcu_read_lock(), rcu_read_unlock(), rcu_dereference(), and friends.
  • Pointer-update function, rcu_assign_pointer().
  • Synchronous grace-period-wait primitives, synchronize_rcu() and synchronize_rcu_expedited().
  • An asynchronous grace-period wait primitive, call_rcu().  And additionally a synchronous callback wait primitive, rcu_barrier().
  • Polled grace-period wait primitives.
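
Asynchronous grace-period waits and callback waits can be sketched the same way. In the following toy (hypothetical names throughout, not the kernel API), `call()` plays the role of call_rcu() by queuing a callback for a worker thread that invokes it only once no reader is running, and `barrier()` plays the role of rcu_barrier(). Because this toy has a single FIFO callback queue, waiting for a marker callback is enough to know that every earlier callback has been invoked; the kernel's rcu_barrier() has to account for callbacks queued on each CPU.

```rust
use std::sync::atomic::{AtomicUsize, Ordering::SeqCst};
use std::sync::mpsc::{channel, Sender};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

type Callback = Box<dyn FnOnce() + Send>;

// Hypothetical toy domain with asynchronous grace-period waits.
struct ToyCallRcu {
    readers: Arc<AtomicUsize>,
    queue: Option<Sender<Callback>>,
    worker: Option<JoinHandle<()>>,
}

impl ToyCallRcu {
    fn new() -> Self {
        let readers = Arc::new(AtomicUsize::new(0));
        let (tx, rx) = channel::<Callback>();
        let r = Arc::clone(&readers);
        let worker = thread::spawn(move || {
            for cb in rx {
                // Conservative grace period: wait until no reader is running.
                while r.load(SeqCst) != 0 {
                    thread::yield_now();
                }
                cb();
            }
        });
        ToyCallRcu { readers, queue: Some(tx), worker: Some(worker) }
    }

    fn read_lock(&self) {
        self.readers.fetch_add(1, SeqCst);
    }

    fn read_unlock(&self) {
        self.readers.fetch_sub(1, SeqCst);
    }

    // Like call_rcu(): returns at once; `cb` runs after a grace period.
    fn call(&self, cb: impl FnOnce() + Send + 'static) {
        self.queue.as_ref().unwrap().send(Box::new(cb)).unwrap();
    }

    // Like rcu_barrier(): waits until every earlier callback has run.
    fn barrier(&self) {
        let (done_tx, done_rx) = channel();
        self.call(move || {
            let _ = done_tx.send(());
        });
        done_rx.recv().unwrap();
    }
}

impl Drop for ToyCallRcu {
    fn drop(&mut self) {
        drop(self.queue.take()); // disconnect, so the worker's loop ends
        if let Some(w) = self.worker.take() {
            let _ = w.join();
        }
    }
}

fn main() {
    let rcu = ToyCallRcu::new();
    let freed = Arc::new(AtomicUsize::new(0));
    rcu.read_lock();
    for _ in 0..3 {
        let f = Arc::clone(&freed);
        rcu.call(move || {
            f.fetch_add(1, SeqCst);
        });
    }
    // A reader is still running, so no callback may have been invoked yet.
    assert_eq!(freed.load(SeqCst), 0);
    rcu.read_unlock();
    rcu.barrier();
    assert_eq!(freed.load(SeqCst), 3);
}
```
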

Sleepable RCU (SRCU)

SRCU has a similar variety of bells and whistles, but some important differences.  The most important difference is that SRCU supports multiple domains, each represented by an srcu_struct structure.  A reader in one domain does not block a grace period in another domain.  In contrast, RCU is global in nature, with exactly one domain.  On the other hand, the price SRCU pays for this flexibility is reduced amortization of grace-period overhead.

  • Explicit read-side markers, srcu_read_lock(), srcu_read_unlock(), srcu_dereference(), and friends.  Except that, unlike rcu_read_lock() and rcu_read_unlock(), srcu_read_lock() and srcu_read_unlock() do not nest.  Instead, the return value from srcu_read_lock() must be passed to the corresponding srcu_read_unlock().  This means that SRCU (but not RCU!) can represent non-nested partially overlapping read-side critical sections.  Not that this was considered a good thing: the index-passing API was instead a way of avoiding the need for T*S storage, where T is the number of tasks and S the number of srcu_struct structures.
  • Synchronous grace-period-wait primitives, synchronize_srcu() and synchronize_srcu_expedited().
  • An asynchronous grace-period wait primitive, call_srcu(). And additionally a synchronous callback wait primitive, srcu_barrier().
  • Polled grace-period wait primitives, although less variety than RCU enjoys.  (Does this enjoyment extend to RCU's users?  You decide.)
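
The index-passing reader API and per-domain isolation can be sketched with a toy in Rust. `ToySrcu` and its methods are hypothetical names, and this two-counter design only loosely follows the classic counter-flip idea; it is not today's kernel implementation. `read_lock()` returns the index of the counter it incremented and `read_unlock()` takes it back, which is what lets two read-side critical sections overlap without nesting. Each `ToySrcu` value is its own domain, so a reader in one never delays a grace period in another.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering::SeqCst};
use std::sync::Mutex;
use std::thread;
use std::time::Duration;

// Hypothetical toy counterpart of struct srcu_struct.
struct ToySrcu {
    idx: AtomicUsize,        // counter (0 or 1) that new readers should use
    count: [AtomicUsize; 2], // in-flight readers per counter
    gp_lock: Mutex<()>,      // serializes grace periods
}

impl ToySrcu {
    fn new() -> Self {
        ToySrcu {
            idx: AtomicUsize::new(0),
            count: [AtomicUsize::new(0), AtomicUsize::new(0)],
            gp_lock: Mutex::new(()),
        }
    }

    // Like srcu_read_lock(): the returned index must be handed back.
    fn read_lock(&self) -> usize {
        let i = self.idx.load(SeqCst);
        self.count[i].fetch_add(1, SeqCst);
        i
    }

    // Like srcu_read_unlock(): matched by index, not by nesting.
    fn read_unlock(&self, i: usize) {
        self.count[i].fetch_sub(1, SeqCst);
    }

    fn drain(&self, i: usize) {
        while self.count[i].load(SeqCst) != 0 {
            thread::yield_now();
        }
    }

    // Like synchronize_srcu(). Draining the inactive counter first catches
    // readers that sampled `idx` just before an earlier flip.
    fn synchronize(&self) {
        let _gp = self.gp_lock.lock().unwrap();
        let cur = self.idx.load(SeqCst);
        self.drain(cur ^ 1);
        self.idx.store(cur ^ 1, SeqCst);
        self.drain(cur);
    }
}

fn main() {
    let a = ToySrcu::new();
    let b = ToySrcu::new();

    // Non-nested, partially overlapping read-side critical sections.
    let r1 = a.read_lock();
    let r2 = a.read_lock();
    a.read_unlock(r1); // r1 ends first even though r2 began later
    b.synchronize(); // a reader in domain `a` does not block domain `b`
    a.read_unlock(r2);

    // synchronize() waits for a reader that was already running.
    let done = AtomicBool::new(false);
    let i = a.read_lock();
    thread::scope(|s| {
        s.spawn(|| {
            thread::sleep(Duration::from_millis(50));
            done.store(true, SeqCst);
            a.read_unlock(i);
        });
        a.synchronize();
        assert!(done.load(SeqCst));
    });
}
```
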

Tasks RCU

Tasks RCU was designed specifically to handle the trampolines used in Linux-kernel tracing.

  • It has no explicit read-side markers.  Instead, voluntary context switches separate successive Tasks RCU read-side critical sections.
  • A synchronous grace-period-wait primitive, synchronize_rcu_tasks().
  • An asynchronous grace-period wait primitive, call_rcu_tasks(). And additionally a synchronous callback wait primitive, rcu_barrier_tasks().
  • No polled grace-period wait primitives.

Tasks Rude RCU

By design, Tasks RCU does not wait for idle tasks.  Something about them never doing any voluntary context switches on CPUs that remain idle for long periods of time.  So trampolines that might be involved in tracing code within the idle loop need something else, and that something is Tasks Rude RCU.

  • It has no explicit read-side markers. Instead, any preemption-disabled region of code is a Tasks Rude RCU reader.
  • A synchronous grace-period-wait primitive, synchronize_rcu_tasks_rude().
  • An asynchronous grace-period wait primitive, call_rcu_tasks_rude(). And additionally a synchronous callback wait primitive, rcu_barrier_tasks_rude().
  • No polled grace-period wait primitives.

Tasks Trace RCU

Both Tasks RCU and Tasks Rude RCU disallow sleeping while executing in a given trampoline.  Some BPF programs need to sleep, hence Tasks Trace RCU.

  • Explicit nesting read-side markers, rcu_read_lock_trace() and rcu_read_unlock_trace().
  • A synchronous grace-period-wait primitive, synchronize_rcu_tasks_trace().
  • An asynchronous grace-period wait primitive, call_rcu_tasks_trace(). And additionally a synchronous callback wait primitive, rcu_barrier_tasks_trace().
  • No polled grace-period wait primitives.

DYNIX/ptx rclock

The various Linux examples are taken from a code base in which RCU has been under active development for more than 20 years, which might yield an overly stringent set of criteria.  In contrast, the 1990s DYNIX/ptx implementation of RCU (called "rclock" for "read-copy lock") was only under active development for about five years.  The implementation was correspondingly minimal, as can be seen from this February 2001 patch (hat tip to Greg Lehey):

  • Explicit nesting read-side markers, RC_RDPROTECT() and RC_RDUNPROTECT(). The lack of anything resembling rcu_dereference() shows just how small DYNIX/ptx's installed base was.
  • Pointer-update barrier, RC_MEMSYNC().  This is the counterpart of smp_wmb() in early Linux-kernel RCU use cases.
  • No synchronous grace-period-wait primitive.
  • An asynchronous grace-period wait primitive, rc_callback(). However, there was no synchronous callback wait primitive, perhaps because DYNIX/ptx did not have modules, let alone unloadable ones.
  • No polled grace-period wait primitives.

Perhaps this can form the basis of an RCU classification system, though some translation will no doubt be required to bridge from C to Rust.  There is ownership, if nothing else!

RCU Classification and Rust RCU Crates

Except that the first RCU crate, rcu_clean, throws a monkey wrench into the works.  It does not have any grace-period primitives, but instead a clean() function that takes a reference to an RCU-protected data item.  The user invokes this at some point in the code where it is known that there are no readers, either within this thread or anywhere else.  In true Rust fashion, in some cases, the compiler is able to prove the presence or absence of readers and issue a diagnostic when needed.  The documentation notes that the addition of grace periods (also known as "epochs") would allow greater accuracy.
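
The idea of letting the compiler prove the absence of readers can be illustrated with a toy. This is not rcu_clean's API or implementation: `CleanCell`, `read()`, `update()`, and `clean()` are hypothetical. Updates retire the old version instead of freeing it, and `clean()` takes `&mut self`, so the borrow checker rejects any call to it while a reference returned by `read()` is still live.

```rust
use std::sync::atomic::{AtomicPtr, Ordering};
use std::sync::Mutex;

// Hypothetical cell (not rcu_clean's API): updates retire old versions,
// and clean() frees them.
struct CleanCell<T: Send + Sync> {
    cur: AtomicPtr<T>,
    // Retired versions as raw pointers; AtomicPtr serves only as a
    // Send + Sync pointer holder, so nothing claims unique ownership of a
    // version that readers may still reference.
    retired: Mutex<Vec<AtomicPtr<T>>>,
}

impl<T: Send + Sync> CleanCell<T> {
    fn new(v: T) -> Self {
        CleanCell {
            cur: AtomicPtr::new(Box::into_raw(Box::new(v))),
            retired: Mutex::new(Vec::new()),
        }
    }

    // The returned reference borrows the cell, so the compiler tracks it.
    fn read(&self) -> &T {
        unsafe { &*self.cur.load(Ordering::Acquire) }
    }

    // Publish a new version; the old one stays allocated for readers.
    fn update(&self, v: T) {
        let old = self.cur.swap(Box::into_raw(Box::new(v)), Ordering::AcqRel);
        self.retired.lock().unwrap().push(AtomicPtr::new(old));
    }

    // `&mut self` proves that no reference from read() is still alive.
    fn clean(&mut self) -> usize {
        let retired = self.retired.get_mut().unwrap();
        let n = retired.len();
        for p in retired.drain(..) {
            unsafe { drop(Box::from_raw(p.into_inner())) };
        }
        n
    }
}

impl<T: Send + Sync> Drop for CleanCell<T> {
    fn drop(&mut self) {
        self.clean();
        unsafe { drop(Box::from_raw(*self.cur.get_mut())) };
    }
}

fn main() {
    let mut cell = CleanCell::new(String::from("v1"));
    let r = cell.read(); // a reader holds a reference to "v1"
    cell.update(String::from("v2"));
    // cell.clean(); // rejected by the borrow checker: `r` is used below
    assert_eq!(r, "v1"); // the old version is still valid
    assert_eq!(cell.clean(), 1); // `r` is dead now, so freeing "v1" is safe
    assert_eq!(cell.read(), "v2");
}
```

Uncommenting the marked line produces a borrow-check error, which is the compile-time diagnostic in miniature.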

This sort of thing is not unprecedented.  The userspace RCU library has long had an rcu_quiescent_state() function that can be invoked from a given thread when that particular thread is in a quiescent state, and thus cannot have references to any RCU-protected object.  However, rcu_clean takes this a step further by having no RCU grace-period mechanism at all.
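
Quiescent-state-based waiting of this kind can be sketched as follows. This is a toy in the spirit of userspace RCU's QSBR flavor, not liburcu's code: `Qsbr`, `quiescent_state()`, `offline()`, and `synchronize()` are hypothetical names, and the set of reader threads is fixed up front. A grace period ends once every online thread has reported a quiescent state after the grace period began; a thread that stops reporting must mark itself offline, much as idle tasks never give Tasks RCU the voluntary context switch it waits for.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering::SeqCst};
use std::thread;

// Marks a thread that is not currently reporting quiescent states.
const OFFLINE: u64 = u64::MAX;

// Hypothetical QSBR-style domain for a fixed set of reader threads.
struct Qsbr {
    gp: AtomicU64,         // number of the most recently started grace period
    acked: Vec<AtomicU64>, // per thread: latest grace period it has seen
}

impl Qsbr {
    fn new(threads: usize) -> Self {
        Qsbr { gp: AtomicU64::new(0), acked: (0..threads).map(|_| AtomicU64::new(0)).collect() }
    }

    // Called by thread `t` only where it holds no RCU-protected references.
    fn quiescent_state(&self, t: usize) {
        self.acked[t].store(self.gp.load(SeqCst), SeqCst);
    }

    // A thread that stops reporting must say so, or grace periods would
    // wait for it forever.
    fn offline(&self, t: usize) {
        self.acked[t].store(OFFLINE, SeqCst);
    }

    // Waits until every online thread has reported a quiescent state after
    // this call began. Must not be called by an online reader thread.
    fn synchronize(&self) {
        let target = self.gp.fetch_add(1, SeqCst) + 1;
        for a in &self.acked {
            while a.load(SeqCst) < target {
                thread::yield_now();
            }
        }
    }
}

fn main() {
    let q = Qsbr::new(2);
    let stop = AtomicBool::new(false);
    thread::scope(|s| {
        for t in 0..2 {
            let (q, stop) = (&q, &stop);
            s.spawn(move || {
                while !stop.load(SeqCst) {
                    // ... a read-side critical section would go here ...
                    q.quiescent_state(t);
                }
                q.offline(t);
            });
        }
        q.synchronize(); // returns once both threads have passed a quiescent state
        stop.store(true, SeqCst);
    });
    assert_eq!(q.gp.load(SeqCst), 1);
}
```
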

Nevertheless, rcu_clean could be used to implement the add-only list RCU use case, so it is difficult to argue that it is not an RCU implementation.  But it is clearly a very primitive implementation.  That said, primitive implementations do have their place, for example:

  • Languages with garbage collectors have built-in RCU updaters.
  • Programs with short runtimes can just leak memory, cleaning up when restarted.
  • Other synchronization primitives can be used to protect and exclude readers.

In addition, an RCU implementation even more primitive than rcu_clean would omit the clean() function, instead leaking memory that had been removed from an RCU-protected structure.

The left_right crate definitely uses RCU in the guise of epochs, and it can be used for at least some of the things that RCU can be used for.  It does have a single-writer restriction, though as the documentation says, you could use a Mutex to serialize at least some multi-writer use cases.  In addition, it has long been known that RCU use cases involving only a single writer thread permit wait-free updaters as well as wait-free readers.
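
The two-copy idea behind a left-right design can be sketched as follows. This toy was written for illustration and is not the left_right crate's API or implementation; `LeftRight`, `read()`, and `write()` are hypothetical, and the real crate differs in detail. Readers use whichever copy is active; the writer modifies the inactive copy, flips, waits for readers of the old copy to drain (the grace period), and then applies the same change to the old copy. A `Mutex` serializes writers, along the lines of the workaround the documentation mentions.

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicUsize, Ordering::SeqCst};
use std::sync::Mutex;
use std::thread;

// Hypothetical two-copy structure (not the left_right crate's API).
struct LeftRight<T> {
    copies: [UnsafeCell<T>; 2],
    active: AtomicUsize,       // index of the copy readers should use
    readers: [AtomicUsize; 2], // readers registered on each copy
    writer: Mutex<()>,         // one writer at a time
}

// Safety: a reader only dereferences a copy that was still active after it
// registered; the writer only mutates a copy with no such readers.
unsafe impl<T: Send + Sync> Sync for LeftRight<T> {}

impl<T: Clone> LeftRight<T> {
    fn new(v: T) -> Self {
        LeftRight {
            copies: [UnsafeCell::new(v.clone()), UnsafeCell::new(v)],
            active: AtomicUsize::new(0),
            readers: [AtomicUsize::new(0), AtomicUsize::new(0)],
            writer: Mutex::new(()),
        }
    }

    // Readers never wait for the writer; they only retry if a flip races.
    fn read<R>(&self, f: impl FnOnce(&T) -> R) -> R {
        let i = loop {
            let i = self.active.load(SeqCst);
            self.readers[i].fetch_add(1, SeqCst);
            // Re-check: if the writer flipped before seeing our increment,
            // retry on the newly active copy.
            if self.active.load(SeqCst) == i {
                break i;
            }
            self.readers[i].fetch_sub(1, SeqCst);
        };
        let r = f(unsafe { &*self.copies[i].get() });
        self.readers[i].fetch_sub(1, SeqCst);
        r
    }

    // `op` is applied once to each copy, so it must be deterministic.
    fn write(&self, op: impl Fn(&mut T)) {
        let _guard = self.writer.lock().unwrap();
        let old = self.active.load(SeqCst);
        let new = old ^ 1;
        op(unsafe { &mut *self.copies[new].get() }); // no reader dereferences `new`
        self.active.store(new, SeqCst); // publish
        while self.readers[old].load(SeqCst) != 0 {
            thread::yield_now(); // grace period for readers of `old`
        }
        op(unsafe { &mut *self.copies[old].get() }); // bring `old` up to date
    }
}

fn main() {
    let lr = LeftRight::new(Vec::<u32>::new());
    thread::scope(|s| {
        for _ in 0..3 {
            s.spawn(|| {
                for _ in 0..1000 {
                    let n = lr.read(|v| v.len());
                    assert!(n <= 100);
                }
            });
        }
        s.spawn(|| {
            for i in 0..100 {
                lr.write(|v| v.push(i));
            }
        });
    });
    assert_eq!(lr.read(|v| v.len()), 100);
}
```

The cost of this design is visible in the sketch: two full copies of the data, and every update is applied twice.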

One might argue that the fact that the left_right crate uses RCU means that it cannot possibly be itself an implementation of RCU.  Except that in the Linux kernel, RCU Tasks uses vanilla RCU, RCU Tasks Trace uses SRCU, and previous versions of SRCU used vanilla RCU.  So let's give the left_right crate the benefit of the doubt, at least for the time being, but with the understanding that it might eventually instead be classified as an RCU use case rather than an RCU implementation.

The crossbeam epoch crate also uses RCU in the guise of epochs.  It has explicit read-side markers in RAII guard form using the pin function and its Atomic pointers.  Grace periods are computed automatically, and the defer method provides an asynchronous grace-period-wait function.  As with DYNIX/ptx, the crossbeam epoch crate lacks any other means of waiting for grace periods, and it also lacks a callback-wait API.  However, to its credit, and unlike DYNIX/ptx, this crate does provide safe means for handling pointers to RCU-protected data.

Here is a prototype classification system, again, leaving performance and scalability aside:

  1. Are there explicit RCU read-side markers?  Of the Linux-kernel RCU implementations, RCU Tasks and RCU Tasks Rude lack such markers.  Given the Rust borrow checker, it is hard to imagine an implementation without such markers, but feel free to prove me wrong.
  2. Are grace periods computed automatically?  (If not, as in rcu_clean, none of the remaining questions apply.)
  3. Are there synchronous grace-period-wait APIs?  All of the Linux-kernel implementations do, and left_right also appears to.
  4. Are there asynchronous grace-period-wait APIs?  If so, are there callback-wait APIs?  All of the Linux-kernel implementations do, but left_right does not appear to.  Providing them seems doable, but might result in more than two copies of recently-updated data structures.  The crossbeam epoch crate provides an asynchronous grace-period-wait function in the form of defer, but lacks a callback-wait API.
  5. Are there polled grace-period-wait APIs?  The Linux-kernel RCU and SRCU implementations do.
  6. Are there multiple grace-period domains?  The Linux-kernel SRCU implementation does.
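
One way to check the scheme against the implementations described above is to encode the answers as data. The following sketch (all names hypothetical) records one answer per question, with `?` where the text above does not say, and checks two structural rules implied by the questions: a "no" to question 2 makes questions 3 through 6 inapplicable, and callback waits only make sense alongside asynchronous waits. left_right is omitted because its answers above are tentative.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Answer {
    Yes,
    No,
    NotApplicable,
    NotStated,
}

const QUESTIONS: [&str; 7] = [
    "explicit read-side markers",     // question 1
    "automatic grace periods",        // question 2
    "synchronous grace-period wait",  // question 3
    "asynchronous grace-period wait", // question 4
    "callback wait",                  // question 4, second part
    "polled grace-period wait",       // question 5
    "multiple domains",               // question 6
];

// One character per question: Y = yes, N = no, - = does not apply,
// ? = not stated in the text above.
const PROFILES: [(&str, &str); 8] = [
    ("Vanilla RCU", "YYYYYYN"),
    ("SRCU", "YYYYYYY"),
    ("Tasks RCU", "NYYYYNN"),
    ("Tasks Rude RCU", "NYYYYNN"),
    ("Tasks Trace RCU", "YYYYYNN"),
    ("DYNIX/ptx rclock", "YYNYNN?"),
    ("crossbeam epoch", "YYNYNN?"),
    ("rcu_clean", "?N-----"),
];

fn parse(code: &str) -> Vec<Answer> {
    code.chars()
        .map(|c| match c {
            'Y' => Answer::Yes,
            'N' => Answer::No,
            '-' => Answer::NotApplicable,
            _ => Answer::NotStated,
        })
        .collect()
}

// Question 2 gates questions 3-6; callback waits require asynchronous waits.
fn consistent(a: &[Answer]) -> bool {
    if a.len() != QUESTIONS.len() {
        return false;
    }
    let gated = a[1] != Answer::No || a[2..].iter().all(|&x| x == Answer::NotApplicable);
    let callbacks = a[3] != Answer::No || a[4] == Answer::NotApplicable;
    gated && callbacks
}

// Questions on which two implementations give different answers.
fn differences(a: &[Answer], b: &[Answer]) -> Vec<&'static str> {
    QUESTIONS
        .iter()
        .zip(a.iter().zip(b))
        .filter(|(_, (x, y))| x != y)
        .map(|(q, _)| *q)
        .collect()
}

fn main() {
    for (name, code) in PROFILES {
        assert!(consistent(&parse(code)), "{name}");
    }
    let get = |n: &str| parse(PROFILES.iter().find(|p| p.0 == n).unwrap().1);
    // prints ["multiple domains"]
    println!("{:?}", differences(&get("Vanilla RCU"), &get("SRCU")));
}
```

In this encoding, Tasks RCU and Tasks Rude RCU come out identical, because what separates them (which code counts as a reader) is not one of the six questions.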

But does this classification scheme work for your favorite RCU implementation?  What about your favorite RCU use case?

History
  • January 25, 2023: Initial version.
  • January 26, 2023: Add DYNIX/ptx RCU equivalent, note that left_right might be a use of RCU rather than an implementation, and call out the fact that some of the Linux-kernel RCU implementations are based on others.
  • January 30, 2023: Respond to LWN request for the crossbeam crate.  Expand the section summarizing RCU.

[$] X clients and byte swapping

While there are still systems with both byte orders, little-endian has largely "won" the battle at this point since the vast majority of today's systems store data with the least-significant byte first (at the lowest address). But when the X11 protocol was developed in the 1980s, there were lots of systems of each byte order, so the X protocol allowed either order and the server (display side) would swap the bytes to its byte order as needed. Over time, the code for swapping data in the messages, which was written in a more-trusting era, has bit-rotted so that it is now a largely untested attack surface that is nearly always unused. Peter Hutterer has been doing some work to stop using that code by default, both in upstream X.org code and in downstream Fedora.
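
The server-side swapping can be illustrated with a short sketch. The only protocol facts used are that a client announces its byte order in the first byte of its connection setup, 'B' (0x42) for most-significant byte first and 'l' (0x6C) for least-significant byte first; the helper names are hypothetical and this is not X.org's code. Each read is bounds-checked, which is the property that matters once such code becomes an attack surface.

```rust
// Byte order announced by an X11 client in its connection setup.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ByteOrder {
    MsbFirst,
    LsbFirst,
}

fn client_order(first_byte: u8) -> Option<ByteOrder> {
    match first_byte {
        b'B' => Some(ByteOrder::MsbFirst), // 0x42
        b'l' => Some(ByteOrder::LsbFirst), // 0x6C
        _ => None,
    }
}

// Hypothetical helpers: read a field at `off` in the client's byte order and
// return it in native order. They return None instead of reading past the
// end of a short or malicious message.
fn read_u16(order: ByteOrder, msg: &[u8], off: usize) -> Option<u16> {
    let b = msg.get(off..off.checked_add(2)?)?;
    let bytes = [b[0], b[1]];
    Some(match order {
        ByteOrder::MsbFirst => u16::from_be_bytes(bytes),
        ByteOrder::LsbFirst => u16::from_le_bytes(bytes),
    })
}

fn read_u32(order: ByteOrder, msg: &[u8], off: usize) -> Option<u32> {
    let b = msg.get(off..off.checked_add(4)?)?;
    let bytes = [b[0], b[1], b[2], b[3]];
    Some(match order {
        ByteOrder::MsbFirst => u32::from_be_bytes(bytes),
        ByteOrder::LsbFirst => u32::from_le_bytes(bytes),
    })
}

fn main() {
    let order = client_order(b'B').unwrap();
    // 0x00000100 sent most-significant byte first.
    assert_eq!(read_u32(order, &[0, 0, 1, 0], 0), Some(256));
    // A truncated message is rejected rather than over-read.
    assert_eq!(read_u16(order, &[7], 0), None);
}
```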
jake

A history of the FFmpeg project

Kostya Shishkov has just posted the concluding installment of an extensive history of the FFmpeg project:

See, unlike many people I don’t regard FFmpeg as something unique (in the sense that it’s a project only Fabrice Bellard could create). It was nice to have around and it helped immeasurably but without it something else would fill the niche. There were other people working on similar tasks after all (does anybody remember transcode? or gmerlin?). Hopefully you got an idea on how many talented unsung heroes had been working on FFmpeg and libav over the years.

The full set can be found on this page. (Thanks to Paul Wise).

corbet

openSUSE Leap 15.3 has reached end of life

Users of the openSUSE Leap 15.3 distribution will want to be looking at moving on; support for that release has come to an end. "The currently maintained stable release is openSUSE Leap 15.4, which will be maintained until around end of 2023 (same lifetime as SLES 15 SP4 regular support)".
corbet

Security updates for Wednesday

Security updates have been issued by Debian (libde265, nodejs, and swift), Fedora (nautilus), Oracle (bash, bind, curl, dbus, expat, firefox, go-toolset, golang, java-1.8.0-openjdk, java-11-openjdk, java-17-openjdk, libreoffice, libtiff, libxml2, libXpm, nodejs, nodejs-nodemon, postgresql-jdbc, qemu, ruby:2.5, sqlite, sssd, sudo, and usbguard), Red Hat (bind, go-toolset-1.18, go-toolset:rhel8, kernel, kernel-rt, kpatch-patch, pcs, sssd, and virt:rhel, virt-devel:rhel), Scientific Linux (bind, java-1.8.0-openjdk, kernel, and sssd), SUSE (mozilla-nss, rubygem-websocket-extensions, rust1.65, rust1.66, and samba), and Ubuntu (mysql-5.7, mysql-5.7, mysql-8.0, pam, and samba).
corbet

[$] Python packaging, visions, and unification

The Python community is currently struggling with a longtime difficulty in its ecosystem: how to develop, package, distribute, and maintain libraries and applications. The current situation is sub-optimal in several dimensions due, at least in part, to the existence of multiple, non-interoperable mechanisms and tools to handle some of those needs. Last week, we had an overview of Python packaging as a prelude to starting to dig into the discussions. In this installment, we start to look at the kinds of problems that exist—and the barriers to solving them.
jake

WINE 8.0 released

Version 8.0 of the WINE Windows compatibility layer has been released. The headline feature appears to be the conversion to PE ("portable executable") modules:

After 4 years of work, the PE conversion is finally complete: all modules can be built in PE format. This is an important milestone on the road to supporting various features such as copy protection, 32-bit applications on 64-bit hosts, Windows debuggers, x86 applications on ARM, etc.

Other changes include WoW64 support (allowing 32-bit modules to call into 64-bit libraries), Print Processor support, improved Direct3D support, and more.

corbet

A security audit of Git

The Open Source Technology Improvement Fund has announced the completion of a security audit of the Git source.

For this portion of the research a total of 35 issues were discovered, including 2 critical severity findings and a high severity finding. Additionally, because of this research, a number of potentially catastrophic security bugs were discovered and resolved internally by the git security team.

See the full report for all the details.

corbet

Security updates for Tuesday

Security updates have been issued by Debian (kernel and spip), Fedora (kernel), Mageia (chromium-browser-stable, docker, firefox, jpegoptim, nautilus, net-snmp, phoronix-test-suite, php, php-smarty, samba, sdl2, sudo, tor, viewvc, vim, virtualbox, and x11-server), Red Hat (bash, curl, dbus, expat, firefox, go-toolset, golang, java-1.8.0-openjdk, java-17-openjdk, kernel, kernel-rt, kpatch-patch, libreoffice, libtasn1, libtiff, libxml2, libXpm, nodejs, nodejs-nodemon, pcs, postgresql-jdbc, sqlite, sssd, sudo, systemd, and usbguard), Scientific Linux (firefox, java-11-openjdk, and sudo), SUSE (freeradius-server, python-mechanize, and upx), and Ubuntu (exuberant-ctags, haproxy, ruby2.5, ruby3.0, and wheel).
corbet

iOS 16.3 arrives with hardware key support

iOS 16.3, released by Apple today, arrives with a long-awaited extra security feature: after the update, hardware key support becomes available. Hardware keys, also known as security keys, are small physical devices that look similar to a USB flash drive and, through a physical (USB-C) connection or NFC (Near Field Communication) technology, help users' two-factor (2FA) […]

The post "iOS 16.3 arrives with hardware key support" first appeared on Nemzeti Kibervédelmi Intézet.

NKI

Messenger's new encryption feature is now being tested widely

Meta Platforms has announced that the end-to-end encryption (E2EE) feature of Facebook Messenger has entered a full-scale testing phase worldwide, so that chats of more and more users will be protected by the extra security feature. The social-media giant announced back in August that it intends to roll out the feature gradually as the default on its platforms, in order to avoid possible negative effects on both the […]

The post "Messenger's new encryption feature is now being tested widely" first appeared on Nemzeti Kibervédelmi Intézet.

NKI

Zawinski: mozilla.org's 25th anniversary

Jamie Zawinski reminds us that the 25th anniversary of the Netscape open-source announcement — a crucial moment in free-software history — has just passed.

On January 20th, 1998, Netscape laid off a lot of people. One of them would have been me, as my "department", such as it was, had been eliminated, but I ended up momentarily moving from "clienteng" over to the "website" division. For about 48 hours I thought that I might end up writing a webmail product or something.

That, uh, didn't happen.

That announcement was the opening topic on the second-ever LWN.net Weekly Edition as well.

corbet

The return of the Linux Kernel Podcast

After a brief break of ... a dozen years or so ... Jon Masters has announced the return of his kernel podcast:

This time around, I’m not committing to any specific cadence – let’s call it “periodic” (every few weeks). In each episode, I will aim to broadly summarize the latest happenings in the “plumbing” of the Linux kernel, and occasionally related bits of userspace “plumbing” (glibc, systemd, etc.), as well as impactful toolchain changes that enable new features or rebaseline requirements.

corbet