Hírolvasó

Paul E. Mc Kenney: Can the Kernel Concurrency Sanitizer Own Rust Code?

3 év 10 hónap óta
Given the data-race-freedom guarantees of Rust's non-unsafe code, one might reasonably argue that there is no point in the Kernel Concurrency Sanitizer (KCSAN) analyzing such code. However, the Linux kernel is going to need unsafe Rust code, and although there has been significant progress in formal verification of such code, one of the leading researchers in this area says “we hope to eventually develop formal methods that can be put directly in the hands of programmers”. We would be wise to take careful note of the word “eventually”. Furthermore, even given unanticipated universal acclamation of Rust within the Linux kernel community combined with equally unanticipated advances in C-to-Rust translation capabilities, a significant fraction of the existing tens of millions of lines of Linux-kernel C code will persist for some time to come. Both the unsafe Rust code and the C code can interfere with Rust non-unsafe code, and furthermore there are special cases where safe code can violate unsafe code's assumptions. Therefore, run-time analysis of Rust code (safe code included) is likely to be able to find issues that compile-time analysis cannot.

As a result, Rust seems unlikely to render KCSAN obsolete any time soon. This suggests that Rust code should include KCSAN instrumentation, both via the compiler and via atomic and volatile primitives, as is currently the case for the C compilers and primitives currently in use. The public KCSAN interface is as follows:

  1. __kcsan_check_access(): This function makes a relevant memory access known to KCSAN. It takes a pointer to the object being accessed, the size of the object, and an access type (for example, zero for read, KCSAN_ACCESS_WRITE for write, and KCSAN_ACCESS_ATOMIC for atomic read-modify-write).
  2. The __tsan_{read,write}{1,2,4,8}() family of functions serve the same purpose as __kcsan_check_access(), but the size and access type are implicit in the function name instead of being passed as arguments, which can provide better performance. This family of functions is used by KCSAN-enabled compilers.
  3. ASSERT_EXCLUSIVE_ACCESS() is invoked from C code and causes KCSAN to complain if there is a concurrent access to the specified object. A companion ASSERT_EXCLUSIVE_ACCESS_SCOPED() expands the scope of the concurrent-access complaint to the full extent of the compound statement invoking this macro.
  4. ASSERT_EXCLUSIVE_WRITER() is invoked from C code and causes KCSAN to complain if there is a concurrent write to the specified object. A companion ASSERT_EXCLUSIVE_WRITER_SCOPED() expands the scope of the concurrent-writer complaint to the full extent of the compound statement invoking this macro.
  5. ASSERT_EXCLUSIVE_BITS() is similar to ASSERT_EXCLUSIVE_WRITER(), but focuses KCSAN's attention on changes to bits set in the mask passed as this macro's second argument.

These last three categories of interface members are designed to be directly invoked in order to check concurrency designs. See for example the direct call to ASSERT_EXCLUSIVE_WRITER() from the rcu_cpu_starting() function in Linux-kernel RCU. It would of course be quite useful for these interface members to be available from within Rust code.

KCSAN also provides a data_race() primitive that suppresses data-race diagnostics, but allows the compiler free rein on optimization. This is used in the Linux kernel for things like infrequent diagnostics that are not part of the overall concurrency design.

KCSAN has helped me fix a number of embarrassing bugs in Linux-kernel RCU that might otherwise have inconvenience end users, so they just might be helpful to people introducing Rust code into the Linux kernel. It is therefore quite encouraging that Marco Elver (KCSAN lead developer and maintainer) points out off-list that the rustc compiler already has sanitizer support, which should suffice not only for KCSAN, but for the equally helpful Kernel Address Sanitizer (KASAN) as well. He also notes that some care is required to pass in the relevant -mllvm options to rustc and to ensure that it is not attempting to link against compiler-rt because doing so is not appropriate when building the Linux kernel. Marco also noted that KCSAN relies on the front-end to properly handle volatile accesses, and Miguel Ojeda kindly confirmed that it does. So there is hope!

For more information on KCSAN and its reason for being:

  1. "Concurrency bugs should fear the big bad data-race detector" (Part 1 and Part 2).
  2. Who's afraid of a big bad optimizing compiler?.
  3. Calibrating your fear of big bad optimizing compilers.
  4. Sections 4.3.4 ("Accessing Shared Variables"), 15.2 ("Tricks and Traps"), and 15.3 ("Compile-Time Consternation") of perfbook.


HistoryOctober 7, 2021: Add current state of KCSAN/Rust integration. Clarify Rust module per Gary Guo email.
October 12, 2021: Self-review.
October 28, 2021: Add data_race() primitive.

[$] Moving Google toward the mainline

3 év 10 hónap óta
Two Google engineers came to Open Source Summit North America 2021 to talk about a project to change the way the company creates and maintains the kernel it runs in its data centers on its production systems. Andrew Delgadillo and Dylan Hatch described the current production kernel (Prodkernel) and the problems that occur because it is so far from the mainline. Project Icebreaker is an effort to change that and to provide a near-mainline kernel for development and testing within Google; the talk looked at the project, its risks, its current status, and its plans.
jake

Asahi Linux Progress Report September

3 év 10 hónap óta
The Asahi Linux project has a progress report on its goal of running Linux on Mac M1 hardware. Earlier this year we saw the absolute lowest level drivers being merged into the kernel. Those are important for bring-up, but to get a usable system we need many more. Over September we’ve seen a lot of action on this front, with many important drivers now in review or even already merged for Linux 5.16. The goal of the Asahi Linux project is to upstream everything into the Linux kernel, so all our drivers are eventually headed for upstream review.
ris

AlmaLinux Foundation opens membership

3 év 10 hónap óta
The AlmaLinux Foundation has opened membership to everyone. The AlmaLinux Foundation [...] was created as a 501(c)(6) non-profit (the same as the Linux Foundation) in order to put OWNERSHIP of the OS, the Intellectual Property and the direction of the project into the hands of the community. By joining as a member (100% free for community members) you have the right and the ability to vote on board members and the direction of the project and other decisions as they will come up in the future.
ris

Paul E. Mc Kenney: Will Your Rust Code Survive the Attack of the Zombie Pointers?

3 év 10 hónap óta
Some of the previous posts in this series have been said to be quite difficult, so I figured I owed you all an easy one. And the zombie-pointer problem really does have a trivial solution, at least in the context of the Linux kernel. In other environments, all bets are off.

But first, what on earth is a zombie pointer?

To answer that question, let's start with a last-in-first-out (LIFO) stack. We don't know when this algorithm was invented or who invented it, but a very similar algorithm was described in R. J. Treiber's classic 1986 technical report (IBM Almaden Research Center RJ 5118), and it was referred to as prior art in (expired) US Patent 3,886,525, which was filed in 1973 (hat trick to Maged Michael). Fun though it might be to analyze this code's original IBM assembly-language implementation, let's instead look at a C11 implementation:

1 struct node_t* _Atomic top; 2 3 void list_push(value_t v) 4 { 5 struct node_t *newnode = (struct node_t *) malloc(sizeof(*newnode)); 6 7 set_value(newnode, v); 8 newnode->next = atomic_load(&top); 9 do { 10 // newnode->next may have become invalid 11 } while (!atomic_compare_exchange_weak(&top, &newnode->next, newnode)); 12 } 13 14 void list_pop_all() 15 { 16 struct node_t *p = atomic_exchange(&top, NULL); 17 18 while (p) { 19 struct node_t *next = p->next; 20 21 foo(p); 22 free(p); 23 p = next; 24 } 25 }
Those used to the Linux-kernel cmpxchg() atomic operation should keep in mind that when the C11 atomic_compare_exchange_weak() fails, it writes the current value referenced by its first argument to the pointer passed as its second argument. In other words, instead of passing atomic_compare_exchange_weak() the new value, you instead pass it a pointer to the new value. Each time atomic_compare_exchange_weak() fails, it updates the pointed-to new value. This sometimes allows this function to be used in a tight loop, as in line 11 above.

With these atomic_compare_exchange_weak() semantics in mind, let's step through execution of list_push(C) on a stack initially containing objects A and B:

  1. Line 5 allocates memory for the new object C.
  2. Line 7 initializes this newly allocated memory.
  3. Line 8 sets the new object's ->next pointer to the current top-of-stack pointer.
  4. Line 11 invokes atomic_compare_exchange_weak(), which atomically compares the value of newnode->next to the value of top, and if they are equal, stores the pointer newnode into top and returns true. (As noted before, if the comparison is instead not-equal, atomic_compare_exchange_weak() instead stores the value of top into newnode->next and returns false, repeating until such time as the pointers compare equal.)
  5. One way or another, the stack ends up containing objects C, A, and B, ordered from the top of stack down.

Oddly enough, this code would still work even if line 8 were omitted, however, that would result in a high probability that the first call to atomic_compare_exchange_weak() would fail, which would not be so good for common-case performance. It would also result in compile-time complaints about uninitialized variables, so line 8 is doubly unlikely to be omitted in real life.

While we are at it, let's step through list_pop_all():

  1. Line 16 atomically stores NULL into top and returns its previous value. After this lines executes, the stack is empty and its previous contents are referenced by local variable p.
  2. Lines 18-24 execute for each object on list p:
    1. Line 19 prefetches a pointer to the next object.
    2. Line 21 passes the current object to function foo().
    3. Line 22 frees the current object.
    4. Line 23 advances to the next object.

Thus far, we have seen no zombie pointers. Let's therefore consider the following sequence of events, with the stack initially containing objects A and B:

  1. Thread 1 starts pushing object C, but is interrupted, preempted, otherwise impeded just after executing line 8.
  2. Thread 2 pops the entire list and frees all of the objects. Object C's ->next pointer is now invalid because it points to now-freed memory formerly known as object A.
  3. Thread 2 pushes an object, and happens to allocate memory for it at the same address as the old object A, so let's call it A'.
  4. Object C's ->next pointer is still invalid as far as an omniscient compiler is concerned, but from an assembly language viewpoint happens to reference the type-compatible object A'. This pointer is now a "zombie pointer" that has come back from the dead, or that has at least come to have a more entertaining form of invalidity.
  5. Thread 1 resumes and executes the atomic_compare_exchange_weak() on line 11. Now this primitive, like its Linux-kernel counterpart cmpxchg(), operates not on the compiler's idea of what the pointer is, but rather on the actual bits making up that pointer. Therefore, that atomic_compare_exchange_weak() succeeds despite the fact that the object C's ->next pointer is invalid.
  6. Thread 1 has thus completed its push of object C, and the resulting stack looks great from an assembly-language viewpoint. But the pointer from object C to A' is a zombie pointer, and compilers are therefore within their rights to do arbitrarily strange things with it.

As noted earlier, this has a trivial solution within the Linux kernel. Please do not peek too early. ;-)

But what can be done outside of the Linux kernel?

The traditional solution has been to hide the allocator from the compiler. After all, if the compiler cannot see the calls to malloc() and free(), it cannot figure out that a given pointer is invalid (or, in C-standard parlance, indeterminate). Creating your own allocator with its own function names suffices, and back in the day you often needed to create your own allocator if performance and scalability was important to you.

However, this would also deprive Rust of information that it needs to check pointer operations. So maybe Rust needs to know about allocation and deallocation, but needs to be very careful not to pass this information on to the optimizer. Assuming that this even makes sense in the context of the implementation of Rust.

Perhaps code working with invalid and zombie pointers needs to be carefully written in C. And perhaps there is some way to write it safely in unsafe Rust.

The final approach is to wait until backend support for safe handling of invalid/zombie pointers arrives, and then add Rust syntax as appropriate. Here is some documentation of current efforts towards adding such support to C++:

  1. CPPCON 2020 presentation
  2. Pointer lifetime-end zap (informational/historical)
  3. Pointer lifetime-end zap proposed solutions

So there is good progress, but it might still be some time before this is supported by either the standard or by compilers.

HistoryOctober 6, 2021: Expand description per feedback from Jonathan Corbet
October 12, 2021: Self-review.

Firefox 93.0

3 év 10 hónap óta
Firefox 93.0 has been released. With this version Firefox supports the new AVIF image format, which is based on the modern and royalty free AV1 video codec. The PDF viewer supports filling more forms, such as XFA-based forms used by multiple governments and banks. Downloads that rely on insecure connections are blocked, protecting against potentially malicious or unsafe downloads. Details on these features and more can be found in the release notes.
ris

LLVM 13.0.0 released

3 év 10 hónap óta
Version 13.0.0 of the LLVM compiler suite is out. There is a long list of changes, as always; see the numerous sets of release notes below for details.
corbet

Security updates for Tuesday

3 év 10 hónap óta
Security updates have been issued by Fedora (cryptopp), Mageia (kernel, kernel-linus, and sqlite), openSUSE (rabbitmq-server), Red Hat (kernel and samba), SUSE (glibc and webkit2gtk3), and Ubuntu (containerd, docker.io, imlib2, ledgersmb, mercurial, mongodb, and node-bl).
ris

Paul E. Mc Kenney: How Much of the Kernel Can Rust Own?

3 év 10 hónap óta
Rust concurrency makes heavy use of ownership and borrowing. The purpose of this post is not to give an exposition of Rust's capabilities and limitations in this area, but rather to give a series of examples of ownership in the Linux kernel.

The first example involves Linux-kernel per-CPU variables. In some cases, such variables are protected by per-CPU locks, for example, a number of fields in the per-CPU rcu_data structure are used by the kernel threads that manage grace periods for offloaded callbacks, and these fields are protected by the ->nocb_gp_lock field in the same instance of that same structure. In other cases, access to a given per-CPU variable is permitted only by the corresponding CPU, and even then only if that CPU has disabled preemption. For example, the per-CPU rcu_data structure's ->ticks_this_gp field may be updated only from the corresponding CPU, and only when preemption is disabled. In the particular case, preemption is disabled as a side-effect of having disabled interrupts.

The second example builds on the first. In kernels built with CONFIG_RCU_NOCB_CPU=n, the per-CPU rcu_data structure's ->cblist field may be updated from the corresponding CPU, and only when preemption is disabled. However, it is also allowed from some other CPU when the corresponding CPU has been taken offline, but only when that other CPU that is orchestrating the offlining of the corresponding CPU.

(What about kernels built with CONFIG_RCU_NOCB_CPU=y? They must also acquire a ->nocb_lock that is also contained within the per-CPU rcu_data structure.)

The third example moves to the non-per-CPU rcu_node structures, which are arranged into a combining tree that processes events from an arbitrarily large number of CPUs while maintaining bounded lock contention. Each rcu_node> structure has a ->qsmask bitmask that tracks which of its children need to report a quiescent state for the current grace period, and a ->gp_tasks pointer that, when non-NULL, references a list of tasks that blocked while in an RCU read-side critical section that blocks the current grace period. The children of each leaf rcu_node structure are the rcu_data structures feeding into that rcu_node structure. Finally, there is a singleton rcu_state structure that contains a ->gp_seq field that numbers the current grace period and also indicates whether or not it is in progress.

We now have enough information to state the ownership rule for the rcu_state structure's ->gp_seq field. This field may be updated only if all of the following hold:

  1. The current CPU holds the root rcu_node structure's ->lock.
  2. All rcu_node structures' ->qsmask fields are zero.
  3. All rcu_node structures' ->gp_tasks pointers are NULL.

This might seem excessively ornate, but it is the natural consequence of RCU's semantics and design:

  1. The next grace period is not permitted to start until after the previous one has ended. This is a design choice that allows RCU to function well in the face of update-side overload from large numbers of CPUs.
  2. A grace period is not allowed to end until all CPUs have been observed in a quiescent state, that is, until all rcu_node structures' ->qsmask fields have become zero.
  3. A grace period is not allowed to end until all tasks that were preempted while executing within a critical section blocking that grace period have resumed and exited their critical sections, that is, until all rcu_node structures' ->gp_tasks pointers have become NULL.
  4. If multiple CPUs would like to start a grace period at the same time, there has to be something that works out which CPU is actually going to do the starting, and the last part of that something is the root rcu_node structure's ->lock.

Trust me, there are far more complex ownership models in the Linux kernel!

The fourth example involves per-CPU data that is protected by a reader-writer lock. A CPU is permitted to both read and update its own data only if it read-holds the lock. A CPU is permitted to read other CPUs' data only if it write-holds the lock. That is right, CPUs are allowed to write when read-holding the lock and and are allowed to read while write-holding the lock. Perhaps reader-writer locks should instead have been called exclusive-shared locks, but it is some decades too late now!

The fifth and final example involves multiple locks, for example, when some readers must traverse the data structure from an interrupt handler and others must sleep while traversing that same structure. One way to implement this is to have one interrupt-disabled spinlock (for readers in interrupt handlers) and a mutex (for readers that must sleep during the traversal). Updaters must then hold both locks.

So how on earth does the current C-language Linux kernel code keep all this straight???

One indispensable tool is the assertion, which in the Linux kernel includes WARN(), BUG(), and friends. For example, RCU uses these to verify the values of the aforementioned ->qsmask and ->gp_tasks fields during between-grace-periods traversals of the rcu_node combining tree.

Another indispensable tool is lockdep, which, in addition to checking for deadlocks, allows code to verify that conditions are right. A few of the many available lockdep assertions are shown below:

  1. lockdep_assert_held(): Complains if the specified lock is not held by the current CPU/task.
  2. lockdep_assert_held_write(): Complains if the specified reader-writer lock is not write-held by the current CPU/task.
  3. lockdep_assert_held_read():: Complains if the specified reader-writer lock is not read-held by the current CPU/task.
  4. lockdep_assert_none_held_once(): Complains if the current CPU/task holds any locks.
  5. lockdep_assert_irqs_disabled(): Complains if the current CPU has interrupts enabled.
  6. rcu_read_lock_held(): Complains if the current CPU/task is not within an RCU read-side critical section.

Of course, none of these assertions are going to do anything for you unless you make use of them. On the other hand, some of the more complex ownership criteria are going to be quite difficult for compilers or other tooling to intuit without significant help from the developer.

A more entertaining ownership scenario is described in Section 5.4.6 ("Applying Exact Limit Counters") of perfbook. Chapter 8 ("Data Ownership") describes several other ownership scenarios.

HistoryOctober 12, 2021: Self-review.

Paul E. Mc Kenney: Can Rust Code Own RCU?

3 év 10 hónap óta
Read-copy update (RCU) vaguely resembles a reader-writer lock [1], but one in which readers do not exclude writers. This change in semantic permits RCU readers to be exceedingly fast and scalable. In fact, in the most aggressive case, rcu_read_lock() and rcu_read_unlock() generate no code, and rcu_dereference() emits but a single load instruction. This most aggressive case is achieved in production via Linux-kernel CONFIG_PREEMPT_NONE=y builds. Additional information on RCU is presented at the end of this post.

Can Rust Ownership be Adapted to RCU?The fact that RCU readers do not exclude writers opens the door to data races between RCU readers and updaters. This in turn suggests that least some RCU use cases are likely to pose challenges to Rust's ownership semantics, however, discussions with Wedson Almeida Filho at Linux Plumbers Conference suggested that these semantics might be flexed a little bit, perhaps in a manner similar to the way that Linux-kernel RCU flexes lockdep's semantics. In addition, the data races between the read-side rcu_dereference() primitive and the update-side rcu_assign_pointer() primitive might reasonably be resolved by having the Rust implementation of these two primitives use unsafe mode. Or, perhaps better yet, the Rust implementation of these two primitives might simply wrapper the Linux-kernel primitives in order to get the benefit of existing tools such as the aforementioned lockdep and the sparse static-analysis checker. One downside of this approach is potential performance degradation due to the extra function call implied by the wrappering. Longer term, LTO might eliminate this function call, but if the initial uses of Rust are confined to performance-insensitive device drivers, the extra overhead should not be a problem.

Linux-Kernel Tooling and RCUHere are a few examples of these Linux-kernel tools:

  1. A pointer can be marked __rcu, which will cause the sparse static-analysis checker to complain if that pointer is accessed directly, instead of via rcu_dereference() and rcu_assign_pointer(). This checking can help developers avoid accidental destruction of the address and data dependencies on which many RCU use cases depend.
  2. In most RCU use cases, the rcu_dereference() primitive that accesses RCU-protected pointers must itself be protected by rcu_read_lock(). The Linux-kernel lockdep facility has been therefore adapted to complain about unprotected uses of rcu_dereference().
  3. Passing the same pointer twice in quick succession to a pair of call_rcu() invocations is just as bad as doing the same with a pair of kfree() invocations: Both situations are a double free. The Linux kernel's debug-objects facility checks for this sort of abuse.

This is by no means a complete list. Although I am by no means claiming that these sorts of tools provide the same degree of safety as does Rust safe mode, it is important to understand that Rust unsafe mode is not to be compared to standard C, but rather to the dialect of C used in the Linux kernel, augmented by the associated tools, processes, and coding guidelines.

RCU Use CasesThere are in fact Rust implementations of variants of RCU, including:

  1. crossbeam-rs
  2. ArcSwapAny
I currently have no opinion on whether or not these could be used within the Linux kernel, and any such opinion is irrelevant. You see, Rust's use of RCU within the Linux kernel would need to access RCU-protected data structures provided by existing C code. In this case, it will not help for Rust code to define its own RCU. It must instead interoperate with the existing Linux-kernel RCU implementations.

But even interoperating with Linux-kernel RCU is not trivial. To help with this task, this section presents a few RCU use cases within the Linux kernel.

The first use case is the textbook example where once an RCU-protected object is exposed to readers, its data fields remain constant. This approach allows Rust's normal read-only mode of ownership to operate as intended. The pointers will of course change as objects are inserted or deleted, but these are handled by rcu_dereference() and rcu_assign_pointer() and friends. These special primitives might be implemented using unsafe Rust code or C code, as the case might be.

In-place updates of RCU-protected objects are sometimes handled using the list_replace_rcu() primitive. In this use case, a new version of the object is allocated, the old version is copied to the new version, any required updates are carried out on the new version, and then list_replace_rcu() is used to make the new version available to RCU readers. Readers then see either the old version or the new version, but either way they see an object with constant data fields, again allowing Rust ownership to work as intended.

However, one complication arises for objects removed from an RCU-protected data structure. Such objects are not owned by the updater until after a grace period elapses, and they can in fact be accessed by readers in the meantime. It is not clear to me how Rust would handle this. For purposes of comparison, within the Linux kernel, the Kernel Concurrency Sanitizer (KCSAN) handles this situation using dynamic runtime checking. With this checking enabled, when an updater takes full ownership of a structure before readers are done with it, KCSAN reports a data race. These issues should be most immediately apparent to anyone attempting to create a Rust wrapper for the call_rcu() function.

Because RCU readers do not exclude RCU updaters, it is possible for an RCU reader to upgrade itself to an updater while traversing an RCU-protected data structure. This is usually handled by a lock embedded in the object itself. From what I understand, the easiest way to apply Rust ownership to these situations is to make the RCU-protected object contain a structure that in turn contains the data protected by the lock. People wishing to apply Rust to these sorts of situations are invited to review the Linux-kernel source code to check for possible complications, for example, cases where readers locklessly read fields that are updated under the protection of the lock, and cases where multiple locks provide one type of protection or another to overlapping subsets of fields in the RCU-protected object.

Again because RCU readers do not exclude RCU updaters, there can also be lockless in-place updates (either from readers or updaters) to RCU-protected objects, for example, to set flags, update statistics, or to provide type-safe memory for any number of lockless algorithms (for example, using SLAB_TYPESAFE_BY_RCU). Rust could presumably use appropriate atomic or volatile operations in this case.

One important special case of RCU readers locklessly reading updated-in-place data is the case where readers cannot be allowed to operate on an object that has been removed from the RCU-protected data structure. These situations normally involve a per-object flag and lock. Updaters acquire the object's lock, remove the object from the structure, set the object's flag, and release the lock. Readers check the flag, and if the flag is set, pretend that they did not find the corresponding object. This class of use cases illustrate how segregating read-side and update-side fields of an RCU-protected object can be non-trivial.

RCU is sometimes used to protect combinations of data structures, sometimes involving nested RCU readers. In some cases, a given RCU read-side critical section might span a great many functions across several translation units.

It is not unusual for RCU use cases to include a search function that is invoked both by readers and updaters. This function will therefore sometimes be within an RCU read-side critical section and other times be protected by the update-side lock (or by some other update-side synchronization mechanism).

Sequence locking is sometimes used in conjunction with RCU, so that RCU protects traversal of the data structure and sequence locking detects updates profound enough to cause problems for the RCU traversals. The poster boy for this use case is in the Linux kernel's directory-entry cache. In this case, certain sequences of rename operations could fool readers into traversing pathnames that never actually existed. Using the sequence lock named rename_lock to protect such traversals allows readers to reject such bogus pathnames. More information on the directory-entry cache is available in Neil Brown's excellent Linux Weekly News series (part 1, part 2, part 3). One key take-away of this use case is that sequence locking is sometimes used to protect arbitrarily large data structures.

Sequence locking can be used to easily provide readers with a consistent view of a data structure, for a very wide range of definitions of "consistent". However, all of these techniques allow updaters to block readers. There are a number of other approaches to read-side consistency both within and across data structures, including the Issaquah Challenge.

One final use case is phased state change, where RCU readers check a state variable and take different actions depending on the current state. Updaters can change the state variable, but must wait for the completion of all readers that might have seen the old value before taking further action. The rcu-sync functionality (rcu_sync_init() and friends) implements a variant of RCU-protected phased state change. This use case is notable in that there are no linked data structures and nothing is ever freed.

This is by no means an exhaustive list of RCU use cases. RCU has been in the Linux kernel for almost 20 years, and during that time kernel developers have come up with any number of new ways of applying RCU to concurrency problems.

Rust RCU OptionsThe most straightforward approach is to provide Rust wrappers for the relevant C-language RCU API members. As noted earlier, wrappering the low-overhead read-side API members will introduce function-call overhead in non-LTO kernels, but this should not be a problem for performance-insensitive device drivers. However, fitting RCU into Rust's ownership model might not be as pretty as one might hope.

A more utopian approach would be to extend Rust's ownership model, one way or another, to understand RCU. One approach would be for Rust to gain "type safe" and "guaranteed existence" ownership modes, which would allow Rust to better diagnose abuses of the RCU API. For example, this might help with pointers to RCU-protected objects being erroneously leaked from RCU read-side critical sections.

So how might a pointer to an RCU-protected object be non-erroneously leaked from an RCU read-side critical section, you ask? One way is to acquire a reference via a reference count within that object before exiting that critical section. Another way is to acquire a lock within that object before exiting that critical section.

So what does the Linux kernel currently do to find this type of pointer leak? One approach is to enable KCSAN and build the kernel with CONFIG_RCU_STRICT_GRACE_PERIOD=y, and to run this on a small system. Nevertheless, this might be one area where Rust could improve Linux-kernel diagnostics, albeit not without effort.

For More InformationAdditional information on RCU may be found here:

  1. Linux-kernel documentation, with special emphasis on Linux-kernel RCU's requirements.
  2. The following sections of perfbook discuss other aspects of RCU:

    1. Section 9.5 ("Read-Copy Update").
    2. Section 10.3 ("Read-Mostly Data Structures").
    3. Section 13.5 ("RCU Rescues").
    4. Section 15.4.2 ("RCU [memory-ordering properties ]").
  3. The RCU API, 2019 Edition.
  4. Userspace RCU.
  5. Folly-library RCU.
  6. Section 9.6.3.3 of perfbook lists several other RCU implementations.
  7. Proposed Wording for Concurrent Data Structures: Read-Copy-Update (RCU) (C++ Working Paper).

RCU's semantics ordered by increasing formality:

  1. The "Fundamental Requirements" section of the aforementioned Linux-kernel RCU requirements. Extremely informal, but you have to start somewhere!
  2. RCU Semantics: A First Attempt. Also quite informal, but with a bit more math involved.
  3. Verifying Highly Concurrent Algorithms with Grace (extended version). Formal semantics expressed in separation logic.
  4. Frightening Small Children and Disconcerting Grown-ups: Concurrency in the Linux Kernel. Executable formal semantics expressed in Cat language. This paper also includes a proof that the traditional "wait for all pre-existing readers" semantic is equivalent to these executable formal semantics within the context of the Linux-kernel memory model.
  5. Linux-kernel memory model (LKMM), see especially the Sync-rcu and Sync-srcu relations in file linux-kernel.cat.

Furthermore, significant portions of Linux-kernel RCU have been subjected to mechanical proofs of correctness:

  1. Lihao Liang applied the C Bounded Model Checker (CBMC) to Tree RCU (paper).
  2. Michalis Kokologiannakis applied Nidhugg to Tree RCU (paper).
  3. Lance Roy applied CBMC to Classic SRCU, represented by Linux-kernel commit 418b2977b343 ("rcutorture: Add CBMC-based formal verification for SRCU"). Sadly, the scripting has not aged well.

For more information, see Verification Challenge 6.

For those interested in the userspace RCU library, there have been both manual and mechanical proofs of correctness. A manual proof of correctness of userspace RCU in the context of a linked list has also been carried out.

Endnotes[1]  At its core, RCU is nothing more nor less than a way of waiting for already-started things to finish. Within the Linux kernel, something to be waited on is delimited by rcu_read_lock() and rcu_read_unlock(). For their part, synchronize_rcu() and call_rcu() do the waiting, synchronously and asynchronously, respectively.

It turns out that you can do quite a lot with this simple capability, including approximating some reader-writer-lock use cases, emulating a number of reference-counting use cases, providing type-safe memory, and providing existence guarantees, this last sometimes being pressed into service as a poor-man's garbage collector. Section 9.3.3 ("RCU Usage") of perfbook provides more detail on these and other RCU use cases.


HistoryOctober 12, 2021: Self-review. Note that some of the comments are specific to earlier versions of this blog post.
October 13, 2021: Add RCU-protected phased state change use case.
October 18, 2021: Add read/update common code and Tasseroti citation.

[$] New features coming in Julia 1.7

3 év 10 hónap óta
Julia is an open-source programming language and ecosystem for high-performance scientific computing; its development team has made the first release candidate for version 1.7 available for testing on Linux, BSD, macOS, and Windows. Back in May, we looked at the increased performance that arrived with Julia 1.6, its last major release. In this article we describe some of the changes and new features in the language and its libraries that are coming in 1.7.
jake

[$] Rust and GCC, two different ways

3 év 10 hónap óta
Developers working in languages like C or C++ have access to two competing compilers — GCC and LLVM — either of which can usually get the job done. Rust developers, though, are currently limited to the LLVM-based rustc compiler. While rustc works well, there are legitimate reasons for developers to wish for an alternative. As it turns out, there are two different ways to compile Rust using GCC under development, though neither is ready at the moment. Developers of both approaches came to the 2021 Linux Plumbers Conference to present the status of their work.
corbet

Security updates for Monday

3 év 10 hónap óta
Security updates have been issued by Debian (apache2, fig2dev, mediawiki, plib, and qemu), Fedora (chromium, curl, kernel, kernel-headers, kernel-tools, openssh, rust-addr2line, rust-backtrace, rust-cranelift-bforest, rust-cranelift-codegen, rust-cranelift-codegen-meta, rust-cranelift-codegen-shared, rust-cranelift-entity, rust-cranelift-frontend, rust-cranelift-native, rust-cranelift-wasm, rust-gimli, rust-object, rust-wasmparser, rust-wasmtime-cache, rust-wasmtime-environ, rust-wasmtime-fiber, rust-wasmtime-types, rust-wast, rust-wat, and webkit2gtk3), Mageia (apache-mod_auth_openidc, c-ares, chromium-browser-stable, icu, libspf2, perl-DBI, python, and python-rsa), openSUSE (haproxy and opera), Oracle (kernel), SUSE (firefox and libvirt), and Ubuntu (python3.8).
ris

Kernel prepatch 5.15-rc4

3 év 10 hónap óta
The 5.15-rc4 kernel prepatch is out for testing.

One thing standing out in the diffs might be the m68k 'set_fs()' removal - not really a regression fix, but it has been pending for a while, and it turned out that the problems attributed to it were due to an entirely unrelated m68k signal handling issue. So with that fixed, we could get rid of set_fs from another architecture.

See this article for information on set_fs() and its removal.

corbet

McKenney: So You Want to Rust the Linux Kernel?

3 év 10 hónap óta
Paul McKenney has started a blog series on Rust for the Linux kernel. He has posted six of a planned 11 articles, though several are labeled as "under construction". This series focuses mostly on use cases and opportunities, rather than on any non-trivial solutions. Please note that I am not in any way attempting to dictate or limit Rust's level of ambition. I am instead noting the memory-model consequences of a few potential levels of ambition, ranging from "portions of a few drivers", "a few drivers", "some core code" and up to and including "the entire kernel". Greater levels of ambition will require greater willingness to accommodate a wider variety of LKMM [Linux-kernel memory model] requirements.

[...] These blog posts will therefore present approaches ranging upwards from trivial workarounds. But be warned that some of the high-quality approaches require profound reworking of compiler backends that have thus far failed to spark joy in the hearts of compiler writers. In addition, Rust enjoys considerable use outside of the Linux kernel, for example, as something into which to rewrite inefficient Python scripts. (A megawatt here, a megawatt there, and pretty soon you are talking about real power consumption!) Therefore, there are probably sharp limits beyond which the core Rust developers are unwilling to go.

jake

Paul E. Mc Kenney: Can Rust Code Own Sequence Locks?

3 év 10 hónap óta
What Are Sequence Locks?Sequence locks vaguely resemble reader-writer locks except that readers cannot block writers. If a writer runs concurrently with a reader, that reader is forced to retry its critical section. If a reader successfully completes its critical section (as in there were no concurrent writers), then its accesses will have been data-race free. Of course, an incessant stream of writers could starve all readers, and the Linux-kernel sequence-lock implementation provides extensions to deal with this in cases where incessant writing is a possibility. The usual approach is to instead arrange things so that writers are never incessant.

In the simple case where readers loop until they succeed, with no provisions for incessant writers, a reader looks like this:

do { seq = read_seqbegin(&test_seqlock); /* read-side access. */ } while (read_seqretry(&test_seqlock, seq));
A writer looks like this:

write_seqlock(&test_seqlock); /* Update */ write_sequnlock(&test_seqlock);
The read-side access can run concurrently with the writer's update, which is not consistent with the data-race-free nature of non-unsafe Rust. How can this be handled?

Linux-Kernel Sequence-Lock Use CasesFirst, please note that if Rust-language sequence-locking readers are to interoperate with C-language sequence-locking updaters (or vice versa), then the Rust-language and C-language implementations must be compatible. One easy way to achieve such compatibility is to provide Rust code wrappers around the C-language implementation, or perhaps better yet, around higher-level C-language functions making use of sequence locking. Of course, if Rust-language use of sequence locks only ever accesses data that is never accessed by C-language code, then Rust could provide its own implementation of sequence locking. As always, choose carefully!

Next, let's look at a few classes of sequence-locking use cases within the Linux kernel.

One common use case is the textbook case where the read-side loop picks up a few values, all within a single function. For example, sequence locking is sometimes used to fetch a 64-bit value on a 32-bit system (see the sock_read_timestamp() function in include/net/sock.h). The remainder of this section looks at some additional use cases that receive significant Linux-kernel use:

  1. Readers can gather data from multiple sources, including functions in other translation units that are invoked via function pointers. See for example the timer_cs_read() function in arch/sparc/kernel/time_32.c.
  2. Readers often perform significant computations. See for example the badblocks_check() function in block/badblocks.c.
  3. Readers can contain sequence-lock writers for that same sequence lock, as is done in the sdma_flush() function in drivers/infiniband/hw/hfi1/sdma.c. Although one could reasonably argue that this could be structured in other ways, this approach does avoid an added check, which could be valuable on fastpaths.
  4. The call to read_seqretry() is sometimes buried in a helper function, for example, in the sdma_progress() function in drivers/infiniband/hw/hfi1/sdma.h and the follow_dotdot_rcu() function in fs/namei.c.
  5. Readers sometimes use memcpy(), as might be expected by those suggesting that sequence-lock readers should clone/copy the data. See for example the neigh_hh_output() function in include/net/neighbour.h.
  6. Sequence-locking readers are often used in conjunction with RCU readers, as is described in the Can Rust Code Own RCU? posting.

Some of these use cases impose significant constraints on sequence-locking implementations. If the constraints rule out all reasonable Rust implementations, one approach would be to provide different Rust implementations for different use cases. However, it would be far better to avoid further API explosion!

Rust Sequence-Lock OptionsOne approach would be for all sequence-lock critical sections to be marked unsafe, but this removes at least some of the rationale for switching from C to Rust. In addition, some believe that this approach can pose severe Rust-syntax challenges in some use cases.

Another approach is to require all sequence-lock critical sections to use marked accesses. This avoids the data races and presumably also the need for unsafe, but it prevents the kernel concurrency sanitizer (KCSAN) from locating data races involving sequence-lock write-side critical sections. Or might, if it were not for the use of kcsan_flat_atomic_begin() and kcsan_flat_atomic_end() in sequence-locking read-side primitives and the use of kcsan_nestable_atomic_begin() and kcsan_nestable_atomic_end() in sequence-locking write-side primitives. So perhaps Rust could make use of similar hooks.

Yet another approach is to place the data referenced by readers into a separate structure and to switch back and forth between a pair of instances of such structures. This has been attempted in the Linux kernel with unsatisfactory results. In fact, in those cases where it is feasible to use multiple instances of a separate structure, Linux-kernel RCU is usually better choice.

One more approach would be creating something like a "tentatively readable" Rust ownership class that could directly handle sequence-locking read-side critical sections. For example, if a quantity computed from a read within a failed critical section were used subsequently, Rust could emit a warning. Hopefully without much in the way of false positives. (We can dream, can't we?)

One might instead search the Linux-kernel source tree for uses of sequence locking and to note that it is used only by a very few device drivers:

16 drivers/dma-buf/ 1 drivers/firmware/efi/ 5 drivers/gpu/drm/ 2 drivers/gpu/drm/amd/amdgpu/ 2 drivers/gpu/drm/i915/gem/ 15 drivers/gpu/drm/i915/gt/ 5 drivers/hwmon/ 72 drivers/infiniband/hw/hfi1/ 11 drivers/md/ 15 drivers/net/ethernet/mellanox/mlx4/ 19 drivers/net/ethernet/mellanox/mlx5/core/lib/
These device drivers (or at least those portions of them that make use of sequence locking) could then be relegated to C code.

However, the use of sequence locking has been increasing, and it is likely that it will continue to increase, including within device drivers. Longer term, Rust's ownership model should therefore be extended to cover sequence locking in a natural and safe manner.

For More InformationFor more on sequence locking, see the seqlock.rst documentation. Sequence locking is also covered in Sections 9.4 ("Sequence Locking") and 13.4 ("Sequence-Locking Specials") of perfbook.

HistoryOctober 6, 2021: Updated to add sectioning, Linux-kernel use cases, and updated KCSAN commentary.
October 12, 2021: Self-review. Note that some of the comments are specific to earlier versions of this blog post.
October 13, 2021: Add a note discussing possible interoperability between Rust-language and C-language sequence locking.