News reader

Vetter: Locking Engineering Principles

3 years ago
Daniel Vetter offers some advice for developers of locking schemes in the kernel.

Validating locking by hand against all the other locking designs and nesting rules the kernel has overall is nigh impossible, extremely slow, something only few people can do with any chance of success and hence in almost all cases a complete waste of time. We need tools to automate this, and in the Linux kernel this is lockdep.

Therefore if lockdep doesn’t understand your locking design your design is at fault, not lockdep. Adjust accordingly.

corbet

Nethercote: Twenty years of Valgrind

3 years ago
Nicholas Nethercote marks the 20th anniversary of the Valgrind 1.0 release.

It’s both delightful and surreal to see that Valgrind is still in wide use today. Julian [Seward’s] original goal was to raise the bar when it came to correctness for C and C++ programs. This has clearly been a huge success. Memcheck has found countless bugs in countless programs, and is a standard part of the testing setup for many of them.

corbet

Security updates for Wednesday

3 years ago
Security updates have been issued by Debian (kernel and openjdk-17), Fedora (ceph, lua, and moodle), Oracle (java-1.8.0-openjdk), Red Hat (grafana), SUSE (git, kernel, libxml2, nodejs16, and squid), and Ubuntu (imagemagick, protobuf-c, and vim).
corbet

Daniel Vetter: Locking Engineering Principles

3 years ago

For various reasons I have spent far too much of the last two years looking at code with terrible locking design and trying to rectify it, instead of actually building cool things. It is symptomatic that the last post on my neglected blog is also a rant on lockdep abuse.

I tried to distill all the lessons learned into some training slides, and this two-part series is the writeup of the same. There are some GPU-specific rules, but I think the key points should apply at least to kernel drivers in general.

The first part here lays out some principles; the second part builds a locking engineering design pattern hierarchy, from the easiest to understand and maintain to the most nightmare-inducing approaches.

Also, by locking engineering I mean the general problem of protecting data structures against concurrent access by multiple threads, and trying to ensure that each thread gets a sufficiently consistent view of the data it reads and that the updates it commits won’t result in confusion. Of course what exactly sufficiently consistent means depends heavily upon the precise requirements, but figuring out these kinds of questions is out of scope for this little series.

Priorities in Locking Engineering

Designing a correct locking scheme is hard, validating that your code actually implements your design is harder, and then debugging when - not if! - you screwed up is even worse. Therefore the absolute most important rule in locking engineering, at least if you want to have any chance at winning this game, is to make the design as simple and dumb as possible.

1. Make it Dumb

Since this is the key principle, the entire second part of this series will go through a lot of different locking design patterns, from the simplest, dumbest and easiest to understand to the most hair-raising horrors of complexity and trickiness.

Meanwhile let’s continue to look at everything else that matters.

2. Make it Correct

Since simple doesn’t necessarily mean correct, especially when transferring a concept from design to code, we need guidelines. On the design front the most important one is to design for lockdep, and not fight it, for which I already wrote a full-length rant. Here I will only go through the main lessons:

Validating locking by hand against all the other locking designs and nesting rules the kernel has overall is nigh impossible, extremely slow, something only few people can do with any chance of success and hence in almost all cases a complete waste of time. We need tools to automate this, and in the Linux kernel this is lockdep.

Therefore if lockdep doesn’t understand your locking design your design is at fault, not lockdep. Adjust accordingly.

A corollary is that you actually need to teach lockdep your locking rules, because otherwise different drivers or subsystems will end up with de facto incompatible nesting and dependencies. As long as you never exercise them on the same kernel boot-up, much less the same machine, this won’t make lockdep grumpy. But it will make maintainers very much question why they are doing what they’re doing.

Hence at driver/subsystem/whatever load time, when CONFIG_LOCKDEP is enabled, take all key locks in the correct order. One example of this relevant to GPU drivers is in the dma-buf subsystem.
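To make the load-time priming idea concrete, here is a minimal userspace sketch in Python. It is not kernel code and not how lockdep is implemented: `OrderChecker`, `prime()` and the lock names are hypothetical, and the checker only remembers direct pairs of locks rather than full dependency chains. It only illustrates why taking the key locks once in the documented order lets a later inversion be caught on the first offending call, even if the two paths never actually race.

```python
import threading


class LockOrderError(RuntimeError):
    """Raised when a lock is taken in an order that contradicts a known one."""


class OrderChecker:
    """Toy order validator: remembers which lock was taken inside which."""

    def __init__(self):
        self._edges = set()                  # (outer, inner) pairs seen so far
        self._edges_guard = threading.Lock() # protects self._edges
        self._held = threading.local()       # per-thread stack of held names

    def _stack(self):
        if not hasattr(self._held, "stack"):
            self._held.stack = []
        return self._held.stack

    def acquire(self, name):
        stack = self._stack()
        with self._edges_guard:
            # Refuse if we already know that `name` nests outside a held lock.
            for outer in stack:
                if (name, outer) in self._edges:
                    raise LockOrderError(
                        f"{outer} -> {name} inverts known order {name} -> {outer}")
            for outer in stack:
                self._edges.add((outer, name))
        stack.append(name)

    def release(self, name):
        self._stack().remove(name)


def prime(checker, order):
    """Teach the checker the intended nesting once, at 'load time'."""
    for name in order:
        checker.acquire(name)
    for name in reversed(order):
        checker.release(name)


# Hypothetical lock names, outermost first.
checker = OrderChecker()
prime(checker, ["subsystem_lock", "device_lock", "buffer_lock"])
```

After priming, any path that takes `buffer_lock` and then `device_lock` fails immediately with `LockOrderError`, instead of waiting for an unlucky interleaving on a production machine.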

In the same spirit, at every entry point to your library or subsystem, or anything else big, validate that the callers hold up the locking contract with might_lock(), might_sleep(), might_alloc() and all the variants and more specific implementations of this. Note that there’s a huge overlap between locking contracts and calling context in general (like interrupt safety, or whether memory allocation is allowed to call into direct reclaim), and since all these functions compile away to nothing when debugging is disabled, there’s really no cost in sprinkling them around very liberally.
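The same habit can be sketched outside the kernel. The following Python example is only an analogy, under the assumption that plain `assert` statements stand in for the kernel's debug-only checks (Python drops them when run with `-O`, much as the kernel annotations compile away). `CheckedLock` and `Inventory` are hypothetical names invented for this illustration.

```python
import threading


class CheckedLock:
    """A mutex that remembers its owner so entry points can check contracts."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()
        self._owner = None

    def __enter__(self):
        self._lock.acquire()
        self._owner = threading.get_ident()
        return self

    def __exit__(self, *exc):
        self._owner = None
        self._lock.release()
        return False

    def held_by_me(self):
        return self._owner == threading.get_ident()


class Inventory:
    def __init__(self):
        self.lock = CheckedLock("inventory")
        self._items = {}  # protected by self.lock

    def add_locked(self, key, value):
        """Entry point for callers that already hold self.lock."""
        assert self.lock.held_by_me(), "add_locked() requires the inventory lock"
        self._items[key] = value

    def add(self, key, value):
        """Entry point for callers that must NOT hold self.lock."""
        assert not self.lock.held_by_me(), "add() would self-deadlock"
        with self.lock:
            self.add_locked(key, value)
```

Calling `add_locked()` without the lock fails loudly at the entry point, on every call, rather than corrupting `_items` only when two threads happen to collide.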

On the implementation and coding side there are a few rules of thumb to follow:

  • Never invent your own locking primitives: you’ll get them wrong, or at least build something that’s slow. The kernel’s locks are built and tuned by people who’ve done nothing else their entire career; you won’t beat them except in bug count, and that by a lot.

  • The same holds for synchronization primitives - don’t build your own with a struct wait_queue_head, or worse, hand-roll your own wait queue. Instead use the most specific existing function that provides the synchronization you need, e.g. flush_work() or flush_workqueue() and the enormous pile of variants available for synchronizing against scheduled work items.

    A key reason here is that very often these more specific functions already come with elaborate lockdep annotations, whereas anything hand-rolled tends to require much more manual design validation.

  • Finally, at the intersection of “make it dumb” and “make it correct”, pick the simplest lock that works, like a normal mutex instead of a read-write semaphore. This is because in general, stricter rules catch bugs and design issues quicker, hence picking a very fancy “anything goes” locking primitive is a bad choice.

    As another example, pick spinlocks over mutexes, because spinlocks are a lot more strict in what code they allow in their critical section. Hence there is much less risk that you put something silly in there by accident and close a dependency loop that could lead to a deadlock.
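The rule about reusing the most specific existing synchronization function carries over to userspace as well. As a loose analogy to waiting for scheduled work with flush_work(), the Python sketch below waits for a batch of queued jobs with the standard library's `concurrent.futures.wait()` instead of a hand-rolled counter and condition variable. `process_all()` is a hypothetical helper written for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def process_all(items, worker, max_workers=4):
    """Run worker(item) for every item; return only once all have finished."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pending = [pool.submit(worker, item) for item in items]
        # wait() is the library's own "flush": no hand-rolled condition
        # variable, no done-counter, no missed-wakeup bug to debug.
        wait(pending)
        # result() re-raises any exception a worker hit, in submission order.
        return [future.result() for future in pending]
```

The point is not the pool itself but that the waiting logic, the part that is easiest to get subtly wrong, is somebody else's well-tested code.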

3. Make it Fast

Speed doesn’t matter if you don’t understand the design anymore in the future. You need simplicity first.

Speed doesn’t matter if all you’re doing is crashing faster. You need correctness before speed.

Finally speed doesn’t matter where users don’t notice it. If you micro-optimize a path that doesn’t even show up in real world workloads users care about, all you’ve done is wasted time and committed to future maintenance pain for no gain at all.

Similarly, optimizing code paths that would never run at all if you improved your design instead is not worth it. This holds especially for GPU drivers, where the real application interfaces are OpenGL, Vulkan or similar, and there’s an entire driver on the userspace side: the right fix for performance issues is very often to radically update the contract and sharing of responsibilities between the userspace and kernel driver parts.

The big example here is GPU address patch list processing at command submission time, which was necessary for old hardware that completely lacked any useful concept of a per-process virtual address space. But that has changed, which means virtual addresses can stay constant, while the kernel can still freely manage the physical memory by manipulating pagetables, like on the CPU. Unfortunately one driver in the DRM subsystem instead easily spent an engineer-decade of effort to tune relocations, write lots of testcases for the resulting corner cases in the multi-level fastpath fallbacks, and even more time handling the impressive amounts of fallout in the form of bugs and future headaches due to the resulting unmaintainable code complexity …

In other subsystems where the kernel ABI is the actual application contract, these kinds of design simplifications might instead need to be handled between the subsystem’s code and driver implementations. This is what we’ve done when moving from the old kernel modesetting infrastructure to atomic modesetting. But sometimes no clever tricks at all help and you only get true speed with a radically revamped uAPI - io_uring is a great example here.

Protect Data, not Code

A common pitfall is to design locking by looking at the code, perhaps just sprinkling locking calls over it until it feels like it’s good enough. The right approach is to design locking for the data structures, which means specifying for each structure or member field how it is protected against concurrent changes, and how the necessary amount of consistency is maintained across the entire data structure with rules that stay invariant, irrespective of how code operates on the data. Then roll it out consistently to all the functions, because the code-first approach tends to have a lot of issues:

  • A code-centric approach to locking often leads to locking rules changing over the lifetime of an object, e.g. with different rules for a structure or member field depending upon whether an object is in active use, maybe just cached or undergoing reclaim. This is hard to teach to lockdep, especially when the nesting rules change for different states. Lockdep assumes that the locking rules are completely invariant over the lifetime of the entire kernel, not merely over the lifetime of an individual object or structure.

    Starting from the data structures on the other hand encourages that locking rules stay the same for a structure or member field.

  • A locking design that changes depending upon the code that can touch the data needs either complicated documentation entirely separate from the code, which runs a high risk of becoming stale, or explanations (if there are any) sprinkled over the various functions, which means reviewers need to reread all the relevant chunks of the code base to make sure they don’t miss an odd corner case.

    With data-structure-driven locking design there’s a perfect, because unique, place to document the rules: in the kerneldoc of each structure or member field.

  • A consequence for code review is that to recheck the locking design for a code-first approach, every function and flow has to be checked against all others, and changes need to be checked against all the existing code. If this is not done you might miss a corner case where the locking falls apart with a race condition or could deadlock.

    With a data-first approach to locking, changes can be reviewed incrementally against the invariant rules, which means review of especially big or complex subsystems actually scales.

  • When facing a locking bug it’s tempting to try and fix it just in the affected code. By repeating that often enough a locking scheme that protects data acquires code specific special cases. Therefore locking issues always need to be first mapped back to new or changed requirements on the data structures and how they are protected.
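A data-first design can be sketched in a few lines. The Python example below is a userspace illustration only; the `Buffer` class, its fields and its methods are hypothetical. What it tries to show is the shape of the approach: the locking rule for every field is stated once, next to the data, and every function that touches a field follows that one rule.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Buffer:
    """A hypothetical buffer object.

    Locking rules (invariant over the object's whole lifetime):
      size      -- immutable after construction, no lock needed
      refcount  -- protected by Buffer.lock
      mappings  -- protected by Buffer.lock
    """
    size: int
    refcount: int = 0
    mappings: list = field(default_factory=list)
    lock: threading.Lock = field(
        default_factory=threading.Lock, repr=False, compare=False)

    def add_mapping(self, address):
        # Both protected fields change together under the one documented lock.
        with self.lock:
            self.mappings.append(address)
            self.refcount += 1

    def remove_mapping(self, address):
        with self.lock:
            self.mappings.remove(address)
            self.refcount -= 1
```

A reviewer of a new method only has to check it against the docstring, not against every other function that might touch the same fields.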

The big antipattern of how you end up with code-centric locking is to protect an entire subsystem (or worse, a group of related subsystems) with a single huge lock. The canonical example was the big kernel lock (BKL). That one is gone, but in many cases it was just replaced by smaller, but still huge, locks like console_lock().

This results in a lot of long term problems when trying to adjust the locking design later on:

  • Since the big lock protects everything, it’s often very hard to tell what it does not protect. Locking at the fringes tends to be inconsistent, and due to that its coverage tends to creep ever further when people try to fix bugs where a given structure is not consistently protected by the same lock.

  • Also often subsystems have different entry points, e.g. consoles can be reached through the console subsystem directly, through vt, tty subsystems and also through an enormous pile of driver specific interfaces with the fbcon IOCTLs as an example. Attempting to split the big lock into smaller per-structure locks pretty much guarantees that different entry points have to take the per-object locks in opposite order, which often can only be resolved through a large-scale rewrite of all impacted subsystems.

    Worse, as long as the big subsystem lock continues to be in use no one is spotting these design issues in the code flow. Hence they will slowly get worse instead of the code moving towards a better structure.
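The opposite-order problem between entry points can be made visible with a small check over declared lock sequences. This Python sketch is purely illustrative: `order_conflicts()` is a hypothetical helper, and the entry point and lock names are made up for the example rather than taken from the real console code.

```python
from itertools import combinations


def order_conflicts(entry_points):
    """Return (first_path, second_path, outer, inner) for every pair of locks
    that two entry points take in opposite order."""
    seen = {}       # (outer, inner) -> first entry point that nests them so
    conflicts = []
    for name, sequence in entry_points.items():
        # combinations() keeps positional order, so each pair is (outer, inner).
        for outer, inner in combinations(sequence, 2):
            if (inner, outer) in seen:
                conflicts.append((seen[(inner, outer)], name, outer, inner))
            seen.setdefault((outer, inner), name)
    return conflicts


# Hypothetical per-object locks, listed in the order each path takes them.
paths = {
    "console_write": ["console", "vt", "fb"],
    "fbdev_ioctl": ["fb", "vt"],
}
conflicts = order_conflicts(paths)
# conflicts == [("console_write", "fbdev_ioctl", "fb", "vt")]
```

As long as a single big lock serializes both paths, nothing ever reports this conflict, which is exactly why such designs keep getting worse unnoticed.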

For these reasons big subsystem locks tend to live way past their justified usefulness, until code maintenance becomes nigh impossible: no individual bugfix is worth the task of really rectifying the design, but each bugfix tends to make the situation worse.

From Principles to Practice

Stay tuned for next week’s installment, which will cover what these principles mean when applied in practice: going through a large pile of locking design patterns, from the most desirable to the most hair-raisingly complex.

[$] Docker and the OCI container ecosystem

3 years ago
Docker has transformed the way many people develop and deploy software. It wasn't the first implementation of containers on Linux, but Docker's ideas about how containers should be structured and managed were different from its predecessors. Those ideas matured into industry standards, and an ecosystem of software has grown around them. Docker continues to be a major player in the ecosystem, but it is no longer the only whale in the sea — Red Hat has also done a lot of work on container tools, and alternative implementations are now available for many of Docker's offerings.
jake

Security updates for Tuesday

3 years ago
Security updates have been issued by Debian (spip), Mageia (libtiff and logrotate), Oracle (java-1.8.0-openjdk and java-11-openjdk), SUSE (gpg2, logrotate, and phpPgAdmin), and Ubuntu (python-bottle).
corbet

Fedora to disallow CC0-licensed code

3 years ago
The Creative Commons CC0 license is essentially a public-domain declaration (or as close as is possible in jurisdictions that lack a public domain). The Fedora project has allowed the distribution of code under this license, but, as announced by Richard Fontana, that policy is changing and CC0 will no longer be allowed for code:

The reason for the change: Over a long period of time a consensus has been building in FOSS that licenses that preclude any form of patent licensing or patent forbearance cannot be considered FOSS. CC0 has a clause that says: "No trademark or patent rights held by Affirmer are waived, abandoned, surrendered, licensed or otherwise affected by this document."

Existing CC0-licensed packages may be grandfathered in, but that evidently has not yet been decided.

corbet

[$] Support for Intel's Linear Address Masking

3 years ago
A 64-bit pointer can address a lot of memory — far more than just about any application could ever need. As a result, there are bits within that pointer that are not really needed to address memory, and which might be put to other needs. Storing a few bits of metadata within a pointer is a common enough use case that multiple architectures are adding support for it at the hardware level. Intel is no exception; support for its "Linear Address Masking" (LAM) feature has been slowly making its way toward the mainline kernel.
corbet

Security updates for Monday

3 years ago
Security updates have been issued by Debian (chromium, djangorestframework, gsasl, and openjdk-11), Fedora (giflib, openssl, python-ujson, and xen), Mageia (virtualbox), SUSE (git, gpg2, java-1_7_1-ibm, java-1_8_0-ibm, java-1_8_0-openjdk, mozilla-nspr, mozilla-nss, mozilla-nss, python-M2Crypto, and s390-tools), and Ubuntu (php8.1).
jake

Debian.community domain name seized

3 years ago
The Debian project, Debian.ch, and Software in the Public Interest recently filed a WIPO action to take control of the "debian.community" domain name, which has been used by Daniel Pocock to attack the Debian project and its members. Red Hat had made a similar attempt to take control of WeMakeFedora.org earlier this year, but that attempt failed. The Debian action succeeded, though; on July 19, WIPO decided in favor of the action and ordered the domain name transferred. That domain name can no longer be used, but the attacks seem certain to continue.
corbet

Kernel prepatch 5.19-rc8

3 years ago
The 5.19-rc8 kernel prepatch is out for testing. "There's nothing really surprising in here - a few smaller fixups for the retbleed mess as expected, and the usual random one-liners elsewhere."
corbet

[$] Stuffing the return stack buffer

3 years ago
"Retbleed" is the name given to a class of speculative-execution vulnerabilities involving return instructions. Mitigations for Retbleed have found their way into the mainline kernel but, as of this writing, some remaining problems have kept them from the stable update releases. Mitigating Retbleed can impede performance severely, especially on some Intel processors. Thomas Gleixner and Peter Zijlstra think they have found a better way that bypasses the existing mitigations and misleads the processor's speculative-execution mechanisms instead.
corbet

Security updates for Friday

3 years ago
Security updates have been issued by Fedora (gnupg2, oci-seccomp-bpf-hook, suricata, and vim), Oracle (java-11-openjdk), Slackware (net), and SUSE (kernel, nodejs16, rubygem-rack, and webkit2gtk3).
jake

Six new stable kernels

3 years, 1 month ago
The 5.15.56, 5.10.132, 5.4.207, 4.19.253, 4.14.289, and 4.9.324 stable kernels have been released. The 5.18.13 stable kernel has been delayed due to some problems found during review; 5.18.13-rc3 is out for review and is due on July 23. Note that none of these kernels has mitigations for the Retbleed vulnerabilities; those are still in the works for the stable kernels.

Update: Seemingly a day early, 5.18.13 was released on July 22.

jake

[$] Living with the Rust trademark

3 years, 1 month ago
The intersection of free software and trademark law has not always been smooth. Free-software licenses have little to say about trademarks but, sometimes, trademark licenses can appear to take away some of the freedoms that free-software licenses grant. The Firefox browser has often been the focal point for trademark-related controversy; happily, those problems appear to be in the past now. Instead, the increasing popularity of the Rust language is drawing attention to its trademark policies.
corbet

-current has moved to 7.2-beta

3 years, 1 month ago

With the following commit(s), Theo de Raadt (deraadt@) moved -current to version 7.2-beta:

CVSROOT: /cvs Module name: src Changes by: deraadt@cvs.openbsd.org 2022/07/20 09:12:39 Modified files: sys/conf : newvers.sh sys/sys : param.h etc/root : root.mail usr.bin/signify: signify.1 sys/arch/macppc/stand/tbxidata: bsd.tbxi Log message: move to 7.2-beta. this gets done very early, to avoid finding out version number issues close to release

Snapshots are (already) available for several platforms.

(Regular readers will know what comes next…)
This serves as an excellent reminder to upgrade snapshots frequently, test both base and ports, and report problems [plus, of course, donate!].

Game of Trees 0.74 released

3 years, 1 month ago
For those who have been paying attention to the Game of Trees development list, there has been a lot going on with got(1). Apologies here at undeadly for having missed some release announcements!

Having written as much, got 0.74 was released on July 14th, 2022!

Release notes may be found here: https://gameoftrees.org/releases/CHANGES

The -portable release also got some attention, and those release notes may be found here: http://gameoftrees.org/releases/portable/CHANGELOG
