I had the honor of attending
this workshop, however, this is not a full report. Instead, I am reporting on what I learned and how it relates to my
So You Want to Rust the Linux Kernel? blog series that I started in 2021, and of which this post is a member. I will leave more complete coverage of a great many interesting sessions to others (for example,
here,
here, and
here), who are probably better versed in Rust than am I in any case.
Device Communication and Memory OrderingAndreas Hindborg talked about his work on a Rust-language
PCI NVMe driver for the Linux kernel x86 architecture. I focus on the driver's interaction with device firmware in main memory. As is often the case, there is a shared structure in normal memory in which the device firmware reports the status of an I/O request. This structure has a special 16-bit field that is used to report that all of the other fields are now filled in, so that the Linux device driver can now safely access them.
This of course requires some sort of memory ordering, both on the part of the device firmware and on the part of the device driver. One straightforward approach is for the driver to use smp_load_acquire() to access that 16-bit field, and only if the returned value indicates that the other fields have been filled in, access those other fields.
To the surprise of several Linux-kernel developers in attendance, Andreas's driver instead does a volatile load of the entire structure, 16-bit field and all. If the 16-bit field indicates that all the fields have been updated by the firmware, it then does a second volatile load of the entire structure and then proceeds to use the values from this second load. Apparently, the performance degradation from these duplicate accesses, though measurable, is quite small. However, given the possibility of larger structures for which the performance degradation might be more profound, Andreas was urged to come up with a more elaborate Rust construction that would permit separate access to the 16-bit field, thus avoiding the redundant memory traffic.
In a subsequent conversation, Andreas noted that the overall performance degradation of this Rust-language prototype compared to the C-language version is at most 2.61%, and that in one case there is a yet-as-unexplained performance
increase of 0.67%. Of course, given that both C and Rust are compiled, statically typed, non-garbage-collected, and handled by the same compiler back-end, it should not be a surprise that they achieve similar performance. An interesting question is whether the larger amount of information available to the Rust compiler will ever allow it to consistently provide better performance than does C.
Rust and Linux-Kernel FilesystemsWedson Almeida Filho described his work on a Rust implementations of portions of the Linux kernel's
9p filesystem. The part of this effort that caught my attention was use of the Rust language's asynchronous facilities to implement some of the required state machines and use of a Rust-language type-aware packet-decoding facility.
So what do these two features have to do with concurrency in general, let alone memory-ordering in particular?
Not much. Sure, call_rcu() is (most of) the async equivalent of synchronize_rcu(), but that should be well-known and thus not particularly newsworthy.
But these two features might have considerable influence over the success or lack thereof for the Rust-for-Linux effort.
In the past, much of the justification for use of Rust in Linux has been memory safety, based on the fact that 70% of the Linux exploits have involved memory unsafety. Suppose that Rust manages to address the entire 70%, as opposed to relegating (say) half of that to Rust-language unsafe code. Then use of Rust might at best provide a factor-of-three improvement.
Of course, a factor of three is quite good in absolute terms. However, in my experience, at least an order of magnitude is usually required to with high probability motivate the kind of large change represented by the C-to-Rust transition. Therefore, in order for me to rate the Rust-for-Linux projects success as highly probable, another factor of three is required.
One possible source of the additional factor of three might come from the Rust-for-Linux community giving neglected drivers some much-needed attention. Perhaps the 9p work would qualify, but it seems unlikely that the NVMe drivers are lacking attention.
Perhaps Rust async-based state machines and type-aware packet decoding will go some ways towards providing more of the additional factor of three. Or perhaps a single factor of three will suffice in this particular case. Who knows?
“Pinning” for the Linux Kernel in RustBenno Lossin and Gary Guo reported on their respective efforts to implement pinning in Rust for the Linux kernel (a more detailed report may be found
here).
But perhaps you, like me, are wondering just what pinning is and why the Linux kernel needs it.
It turns out that the Rust language allows the compiler great freedom to rearrange data structures by default, similar to what would happen in C-language code if each and every structure was marked with the Linux kernel's __randomize_layout directive. In addition, by default, Rust-language data can be moved at runtime.
This apparently allows additional optimizations, but it is not particularly friendly to Linux-kernel idioms that take addresses of fields in structures. Such idioms include pretty much all of Linux's linked-list code, as well as things like RCU callbacks. And who knows, perhaps this situation helps to explain recent hostility towards linked lists expressed on social media. ;-)
The Rust language accommodates these idioms by allowing data to be pinned, thus preventing the compiler from moving it around.
However, pinning data has consequences, including the fact that such pinning is visible in Rust's type system. This visibility allows Rust compilers to complain when (for example) list_for_each_entry() is passed an unpinned data structure. Such complaints are important, given that passing unpinned data to primitives expecting pinned data would result in memory unsafety. The code managing the pinning is expected to be optimized away, but there are concerns that it would make itself all too apparent during code review and debugging.
Benno's and Gary's approaches were compared and contrasted, but my guess is that additional work will be required to completely solve this problem. Some attendees would like to see pinning addressed at the Rust-language level rather than at the library level, but time will tell what approach is most appropriate.
RCU Use CasesAlthough RCU was not an official topic, for some reason it came up frequently during the hallway tracks that I participated in. Which is probably a good thing. ;-)
One question is exactly how the various RCU use cases should be represented in Rust. Rust's atomic-reference-count facility has been put forward, and it is quite possible that, in combination with liberal use of atomics, it will serve for some of the more popular use cases. Wedson suggested that
Revocable covers another group of use cases, and at first glance, it appears that Wedson might be quite right. Still others might be handled by wait-wakeup mechanisms. There are still some use cases to be addressed, but some progress is being made.
Rust and PythonSome people suggested that Rust might take over from Python, adding that Rust solves many problems that Python is said to encounter in large-scale programs, and that Rust should prove more ergonomic than Python. The first is hard to argue with, given that Python seems to be most successful in smaller-scale projects that make good use of Python's extensive libraries. The second seems quite unlikely to me, in fact, it is hard for me to imagine that anything about Rust would seem in any way ergonomic to a dyed-in-the-wool Python developer.
One particularly non-ergonomic attribute of current Rust is likely to be its libraries, which last I checked were considerably less extensive than those of Python. Although the software-engineering shortcomings of large-scale Python projects can and have motivated conversion to Rust, it appears to me that smaller Python projects making heavier use of a wider variety of libraries might need more motivation.
Automatic conversion of Python libraries, anyone? ;-)
ConclusionsI found Kangrejos 2022 to be extremely educational and thought-provoking, and am very glad that I was able to attend. I am still uncertain about Rust's prospects in the Linux kernel, but the session provided some good examples of actual work done and problems solved. Perhaps Rust's async and type-sensitive packet-decoding features will help increase its probability of success, and perhaps there are additional Rust features waiting to be applied, for example, use of Rust iterators to simplify lockdep cycle detection. And who knows? Perhaps future versions of Rust will be able to offer consistent performance advantages. Stranger things have happened!
History
September 19, 2022: Changes based on LPC and LWN reports.