News reader

Paul E. McKenney: Rusting the Linux Kernel: Compiler Writers Hate Dependencies (Address/Data)

3 years 10 months ago
An address dependency involves a load whose return value directly or indirectly determines the address of a later load or store, which results in the earlier load being ordered before the later load or store. A data dependency involves a load whose return value directly or indirectly determines the value stored by a later store, which results in the load being ordered before the store. These are used heavily by RCU. Although they are not quite as fragile as control dependencies, compilers still do not know about them. Therefore, care is still required, as can be seen in the rcu_dereference.rst Linux-kernel coding guidelines. As with control dependencies, address and data dependencies enjoy very low overheads, but unlike control dependencies, they are used heavily in the Linux kernel via rcu_dereference() and friends.
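
To make the two definitions concrete, here is a minimal Rust sketch of the shape of each dependency. The function names are hypothetical, and `Ordering::Relaxed` merely stands in for READ_ONCE() and WRITE_ONCE(). As in C/C++, nothing in the Rust memory model promises that the compiler will preserve these dependencies; any ordering comes from the hardware, and only if the compiler leaves the dependency intact.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Address dependency: the value returned by the first load determines
/// the address of the second load. (Hypothetical helper for illustration.)
pub fn address_dependent_load(index: &AtomicUsize, table: &[AtomicUsize]) -> usize {
    let i = index.load(Ordering::Relaxed); // earlier load
    table[i].load(Ordering::Relaxed) // later load, address depends on `i`
}

/// Data dependency: the value returned by the load determines the value
/// written by the later store. (Hypothetical helper for illustration.)
pub fn data_dependent_store(src: &AtomicUsize, dst: &AtomicUsize) {
    let v = src.load(Ordering::Relaxed); // earlier load
    dst.store(v.wrapping_add(1), Ordering::Relaxed); // stored value depends on `v`
}
```

In both functions the compiler is free to break the dependency if it can deduce the loaded value some other way, which is exactly the fragility that the coding guidelines are meant to avoid.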

  1. As with control dependencies, the trivial solution is to promote READ_ONCE() to smp_load_acquire(). Unlike with control dependencies, there are only a few such READ_ONCE() instances, and almost all of them are conveniently located in definitions of the rcu_dereference() family of functions. Because the rest of the kernel might not be happy with the increase in overhead due to such a promotion, it would likely be necessary to provide Rust-specific implementations of rcu_dereference() and friends.
  2. Again as with control dependencies, an even more trivial solution is to classify code containing address and data dependencies as core Linux-kernel code that is outside of Rust's scope. However, given that there are some thousands of instances of rcu_dereference() scattered across the Linux kernel, this solution might be a bit more constraining than many Rust advocates might hope for.
  3. Provide Rust wrappers for the rcu_dereference() family of primitives. This does incur additional function-call overhead, but on the other hand, if initial use of Rust is confined to performance-insensitive device drivers, this added overhead is unlikely to be a problem.
  4. But if wrapper overhead nevertheless proves problematic, provide higher-level C-language functions that encapsulate the required address and data dependencies, and that do enough work that the overhead of the wrappering for Rust-language use is insignificant.
  5. And yet again as with control dependencies, the best approach from the Linux-kernel-in-Rust developer's viewpoint is for Rust to enforce the address/data-dependency code-style restrictions documented in the aforementioned rcu_dereference.rst. There is some reason to hope that this enforcement would be significantly easier than for control dependencies.
  6. Wait for compiler backends to learn about address and data dependencies. This might take some time, but there is ongoing work along these lines that is described below.
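
The first option in the list above (a Rust-specific rcu_dereference() that is promoted to an acquire load) might look roughly like the following sketch. The `Published` type and its methods are hypothetical and are not part of any kernel or Rust-for-Linux API. To keep the example safe and short, it assumes that published data lives forever (`&'static T`), which sidesteps the grace-period handling that real RCU provides.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

/// Hypothetical stand-in for an RCU-protected pointer.
/// Assumption: published objects are never freed.
pub struct Published<T: 'static> {
    ptr: AtomicPtr<T>,
}

impl<T: 'static> Published<T> {
    /// Starts out with nothing published (NULL pointer).
    pub fn new() -> Self {
        Self { ptr: AtomicPtr::new(ptr::null_mut()) }
    }

    /// Roughly analogous to rcu_assign_pointer(): a release store.
    pub fn publish(&self, value: &'static T) {
        self.ptr.store(value as *const T as *mut T, Ordering::Release);
    }

    /// Roughly analogous to rcu_dereference(), but with the load promoted
    /// to acquire instead of relying on an address dependency.
    pub fn dereference(&self) -> Option<&'static T> {
        let p = self.ptr.load(Ordering::Acquire);
        // SAFETY: `p` is either NULL or came from a `&'static T` in publish().
        unsafe { p.as_ref() }
    }
}
```

The acquire load costs an extra instruction on some weakly ordered architectures, which is the overhead that the remaining options attempt to avoid.
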
One might hope that C/C++'s memory_order_consume would correctly handle address and data dependencies, and in fact it does. Unfortunately, in all known compilers, it does so by promoting memory_order_consume to memory_order_acquire, which adds overhead just as surely as does smp_load_acquire(). There has been considerable work done over a period of some years towards remedying this situation, including these working papers:

  1. P0371R1: Temporarily discourage memory_order_consume (for some definition of "temporarily").
  2. P0098R1: Towards Implementation and Use of memory_order_consume, which reviews a number of potential remedies.
  3. P0190R4: Proposal for New memory_order_consume Definition, which selects a solution involving marking pointers carrying dependencies. Note that many Linux-kernel developers would likely demand a compiler command-line argument that caused the compiler to act as if all pointers had been so marked. Akshat Garg prototyped marked dependency-carrying pointers in gcc as part of a Google Summer of Code project.
  4. P0750R1: Consume, which proposes carrying dependencies in an extra bit associated with the pointer. This approach is much more attractive to some compiler developers, but many committee members did not love the resulting doubling of pointer sizes.
  5. P0735R1: Interaction of memory_order_consume with release sequences, which addresses an obscure C/C++ memory-model corner case.
There appears to be a recent uptick in interest in a solution to this problem, so there is some hope for progress. However, much more work is needed.

More information on address and data dependencies may be found in Section 15.3.2 ("Address- and Data-Dependency Difficulties") of perfbook.

History
October 12, 2021: Self-review.

Paul E. McKenney: Rusting the Linux Kernel: Compiler Writers Hate Dependencies (Control)

3 years 10 months ago
At the assembly language level on many weakly ordered architectures, a conditional branch acts as a very weak, very cheap, but very useful memory-barrier instruction. It orders any load whose return value feeds into the condition codes before all stores that execute after the branch instruction completes, whether the branch is taken or not. ARMv8 also has a conditional-move instruction (CSEL) that provides similar ordering.

Because the ordering properties of the conditional branch involve dependencies from the load to the branch and from the branch to the store, and because the branch is a control-flow instruction, this ordering is said to be due to a control dependency.

Because compilers do not understand them, control dependencies are quite fragile, as documented by the many cautionary tales in the Linux kernel's memory-barriers.txt documentation (search for the "CONTROL DEPENDENCIES" heading). But they are very low cost, so they are used on a few critically important fastpaths in the Linux kernel.

Rust could deal with control dependencies in a number of ways:

  1. The trivial solution is to promote the loads heading the control dependencies to smp_load_acquire(). This works, but adds instruction overhead on some architectures and needlessly limits compiler optimizations on all architectures (but to be fair, ARMv8 does exactly this when built with link-time optimizations). Another difficulty is identifying (whether manually or automatically) exactly which READ_ONCE() calls need to be promoted.
  2. An even more trivial solution is to classify code containing control dependencies as core Linux-kernel code that is outside of Rust's scope. Because there are very few uses of control dependencies in the Linux kernel, Rust would not lose much by taking this approach. In addition, there is the possibility of creating higher-level C-language primitives containing the needed control dependencies which are then wrappered for Rust-language use.
  3. The best approach from the Linux-kernel-in-Rust developer's viewpoint is for Rust to enforce the code style restrictions documented in memory-barriers.txt. However, there is some chance that this approach might prove to be non-trivial.
  4. Wait for compiler backends to learn about control dependencies. This might be a bit of a wait, especially given the difficulty of even defining control dependencies within the current nomenclature of the C/C++ standards.
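
The first option in the list above can be sketched in Rust as follows. The function name is hypothetical. In kernel C code, the load would be a READ_ONCE() and the ordering would rest on the control dependency running from the load through the branch to the store; here the load is instead promoted to acquire, which corresponds to smp_load_acquire().

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};

/// Stores `value` into `slot` only if `ready` is set, returning whether
/// the store happened. (Hypothetical helper for illustration.)
pub fn store_if_ready(ready: &AtomicBool, slot: &AtomicU32, value: u32) -> bool {
    // Acquire orders this load before the store below without relying
    // on the compiler preserving the conditional branch.
    if ready.load(Ordering::Acquire) {
        slot.store(value, Ordering::Relaxed);
        true
    } else {
        false
    }
}
```

The acquire load orders more than the control dependency strictly requires, which is where the added overhead and the lost optimizations come from.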

More information on control dependencies may be found in Section 15.3.3 ("Control-Dependency Calamities") of perfbook.

History
October 12, 2021: Self-review.

Paul E. McKenney: Rusting the Linux Kernel: Atomics and Barriers and Locks, Oh My!

3 years 10 months ago
One way to reduce the number of occurrences of unsafe in Rust code in Linux is to push the unsafety down into atomic operations, memory barriers, and locking primitives, which are the topic of this post. But first, here are some materials describing LKMM:

  1. P0124R7: Linux-Kernel Memory Model: A C++ standards-committee working paper comparing the C/C++ memory model to LKMM.
  2. Linux Weekly News series on LKMM (Part 1 and Part 2).
  3. The infamous ASPLOS'18 paper entitled Frightening Small Children and Disconcerting Grown-ups: Concurrency in the Linux Kernel (non-paywalled), with a title-based tip of the hat to the irrepressible Mel Gorman.
  4. Chapter 15 of perfbook ("Advanced Synchronization: Memory Ordering").
  5. The Linux kernel's tools/memory-model directory, featuring an executable version of LKMM.

For all of these references, I give a big "Thank You!!!" to my co-authors.

LKMM is not the most complex memory model out there, but neither is it the simplest. In addition, it is in some ways more strict than the C/C++ memory models, which means that strict adherence to coding guidelines is required in order to prevent compiler optimizations from breaking Linux-kernel code. Many of these optimizations are not localized, but are instead scattered hither and yon throughout the compilers, including throughout the compiler backends. The optimizations in the backends are a special challenge to Rust, which seems to take the approach of layering safety on top of (or perhaps within) the compiler frontend. Later posts in this series will look at several pragmatic options available to Rust Linux-kernel code.

There is one piece of good news: Compilers are forbidden from introducing data races into code, at least into code that is free of undefined behavior.

With all of that out of the way, let's look at Rust's options for dealing with Linux-kernel atomics and barriers and locks.

The first approach is to carefully read the P0124R7: Linux-Kernel Memory Model working paper and even more carefully follow its advice in selecting C/C++ primitives that best match Linux-kernel atomics, barriers, and locks. This approach works well for data whose definition and use is confined to Rust code, and with sufficient care and ongoing attention can also work for atomic operations and memory barriers involving data shared with C code. However, expecting Rust locking primitives to interoperate with Linux-kernel locking primitives might not be a strategy to win. It seems wise to make direct use of the existing Linux-kernel locking primitives, keeping in mind that this means properly wrappering them in order to make Rust ownership work properly. Those who doubt the wisdom of wrappering the C-language Linux-kernel locking primitives should consider the following:

  1. Linux-kernel locks are complex and highly optimized. Keeping two implementations is an excellent way to inject profound bugs into the Linux kernel.
  2. Linux-kernel locks are deeply entwined with the lockdep lock dependency checker. The data structures implementing each lock class would need to be shared between C and Rust code, which is another excellent way to inject bugs.
  3. On some architectures, Linux-kernel locks must interact with memory-mapped I/O (MMIO) accesses. Any Rust-language implementation of Linux-kernel locks must therefore be architecture-dependent and must know quite a bit about Linux-kernel MMIO.
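
As a rough illustration of what "wrappering" a lock for Rust ownership might involve, consider the sketch below. Everything in it is hypothetical: `RawLock` is a toy spinlock that merely stands in for a C-language lock reached through bindings, and `Locked` and `Guard` are illustrative names rather than the Rust-for-Linux API. The point is only that the data becomes reachable solely through a guard whose lifetime brackets the lock and unlock calls.

```rust
use std::cell::UnsafeCell;
use std::ops::{Deref, DerefMut};
use std::sync::atomic::{AtomicBool, Ordering};

/// Toy stand-in for an existing C-language lock (assumption for this sketch).
pub struct RawLock {
    locked: AtomicBool,
}

impl RawLock {
    pub fn new() -> Self {
        Self { locked: AtomicBool::new(false) }
    }
    pub fn lock(&self) {
        while self
            .locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            std::hint::spin_loop();
        }
    }
    pub fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }
}

/// Data that can be touched only while the wrapped lock is held.
pub struct Locked<T> {
    raw: RawLock,
    data: UnsafeCell<T>,
}

// SAFETY: the data is reachable only through a Guard, which holds the lock.
unsafe impl<T: Send> Sync for Locked<T> {}

impl<T> Locked<T> {
    pub fn new(data: T) -> Self {
        Self { raw: RawLock::new(), data: UnsafeCell::new(data) }
    }
    /// Acquires the lock and returns a guard that releases it when dropped.
    pub fn lock(&self) -> Guard<'_, T> {
        self.raw.lock();
        // SAFETY: the lock is held, so this is the only live reference.
        Guard { raw: &self.raw, data: unsafe { &mut *self.data.get() } }
    }
}

pub struct Guard<'a, T> {
    raw: &'a RawLock,
    data: &'a mut T,
}

impl<T> Deref for Guard<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        &*self.data
    }
}

impl<T> DerefMut for Guard<'_, T> {
    fn deref_mut(&mut self) -> &mut T {
        &mut *self.data
    }
}

impl<T> Drop for Guard<'_, T> {
    fn drop(&mut self) {
        self.raw.unlock();
    }
}
```

A caller would write something like `*counter.lock() += 1;`, with the borrow checker rejecting any attempt to keep a reference to the data after the guard has been dropped.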

As described in later sections, it might be useful to promote READ_ONCE() to smp_load_acquire() instead of implementing it as a volatile load. It might also be useful to promote WRITE_ONCE() to smp_store_release() instead of implementing it as a volatile store, depending on what sort of data-race analysis Rust provides for unsafe code. There is some C/C++ work in flight towards providing better definitions for volatile operations, but it is still early days for this work.

If READ_ONCE() and WRITE_ONCE() are instead to be implemented as volatile operations in Rust, please take care to check the individual architectures that are affected. DEC Alpha requires a full memory-barrier instruction at the end of READ_ONCE(), Itanium requires promotion of volatile loads to acquire loads (but this is carried out by the compiler), and ARMv8 requires READ_ONCE() to be promoted to acquire (but only in CONFIG_LTO=y builds).
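
The two implementation choices discussed above might be sketched in Rust as follows. All four function names are hypothetical. The volatile variants use `ptr::read_volatile()` and `ptr::write_volatile()` and do not include the architecture-specific additions just described (for example, the DEC Alpha memory barrier), while the promoted variants use acquire and release atomics.

```rust
use std::ptr;
use std::sync::atomic::{AtomicU64, Ordering};

/// READ_ONCE() as a plain volatile load (hypothetical helper).
///
/// # Safety
/// `p` must be valid for reads and properly aligned.
pub unsafe fn read_once_volatile(p: *const u64) -> u64 {
    unsafe { ptr::read_volatile(p) }
}

/// WRITE_ONCE() as a plain volatile store (hypothetical helper).
///
/// # Safety
/// `p` must be valid for writes and properly aligned.
pub unsafe fn write_once_volatile(p: *mut u64, v: u64) {
    unsafe { ptr::write_volatile(p, v) }
}

/// READ_ONCE() promoted to the equivalent of smp_load_acquire().
pub fn read_once_acquire(a: &AtomicU64) -> u64 {
    a.load(Ordering::Acquire)
}

/// WRITE_ONCE() promoted to the equivalent of smp_store_release().
pub fn write_once_release(a: &AtomicU64, v: u64) {
    a.store(v, Ordering::Release)
}
```

The promoted variants are safe functions because the atomic type carries the concurrency contract, whereas the volatile variants push that responsibility onto the caller.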

Device drivers make heavy use of volatile accesses and memory barriers for MMIO accesses, and Linux-kernel device drivers are no exception. As noted earlier, some architectures require that these accesses interact with locking primitives. Furthermore, there are many device-specific special cases surrounding device control in general and MMIO in particular. Therefore, Rust-language device drivers should access the existing Linux-kernel C-language primitives rather than creating their own, especially to start with. There might well be exceptions to this rule, for example, Rust might be applied to a device driver that is only used by architectures that do not require interaction with locking primitives. But if you write a driver containing Rust-language MMIO primitives, please carefully and prominently document the resulting architecture restrictions.

This suggests another approach, namely not bothering to implement any of these primitives in Rust, but rather making direct use of the Linux-kernel implementations, as suggested earlier for locking and MMIO primitives. And again, this requires wrappering them for use by Rust code. However, such wrappering introduces another level of function call, potentially for tiny functions. Although it is expected that LTO will successfully inline tiny functions, not all of the world is yet ready for LTO. In the meantime, where feasible, developers should avoid invoking tiny C functions from Rust-language fastpaths.

This being the real world, we should expect that the Rust/C determination will need to be made on a case-by-case basis, with many devils in the details.

History
October 12, 2021: Self-review changes.
October 13, 2021: Add explicit justification for wrappering the Linux kernel's C-language locks and add a few observations about MMIO accesses.

Paul E. McKenney: Rust Concurrency Philosophy: A Historical Perspective

3 years 10 months ago
At first glance, Rust's concurrency philosophy resembles that of Sequent's DYNIX and DYNIX/ptx in the 1980s and early 1990s: "Lock data, not code" (see Jack Inman's classic USENIX'85 paper "Implementing Loosely Coupled Functions on Tightly Coupled Engines", sadly invisible to search engines). Of course, Sequent lacked Rust's automatic checking, and Sequent's software engineers made much less disciplined use of ownership than Rust fans recommend. Nevertheless, this resemblance has resulted in some comparisons of Rust with the DEC Alpha, which had a similar concurrency philosophy.

Interestingly enough, DYNIX and early versions of DYNIX/ptx used compile-time-allocated arrays for almost all of their data structures. You want your kernel to support up to N tasks? Very well, build your kernel to have its array of N task structures. This worked surprisingly well, perhaps because the important concurrent applications of that time had very predictable resource requirements, including numbers of tasks. Nevertheless, as you might expect, this did become quite the configuration nightmare. So why were arrays used in the first place?

To the best of my knowledge, the earliest published complete articulation of the reason appeared in Gamsa et al.'s landmark paper "Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System". The key point is that you cannot protect a dynamically allocated object with a lock located within that object. The DYNIX arrays avoided deallocation (or, alternatively, provided a straightforward implementation of type-safe memory), thus allowing these objects to be protected with internal locks. Avoiding the need for global locks or reference counters was an important key to the performance and scalability prized by Sequent's customers.
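
The array-based approach can be illustrated with a small Rust sketch. This is a loose analogy rather than a reconstruction of DYNIX code: the `Task` and `TaskTable` types, the `MAX_TASKS` limit, and the method names are all hypothetical. Because the slots are never deallocated, each slot's internal lock always exists and can safely protect that slot, even while the slot is being allocated or freed.

```rust
use std::sync::Mutex;

/// Build-time limit on the number of tasks (hypothetical value).
pub const MAX_TASKS: usize = 4;

#[derive(Debug, Default)]
pub struct Task {
    in_use: bool,
    pid: u32,
}

/// Fixed array of task slots, each protected by its own internal lock.
pub struct TaskTable {
    slots: [Mutex<Task>; MAX_TASKS],
}

impl TaskTable {
    pub fn new() -> Self {
        Self { slots: std::array::from_fn(|_| Mutex::new(Task::default())) }
    }

    /// Claims the first free slot, returning its index, or None if full.
    pub fn alloc(&self, pid: u32) -> Option<usize> {
        for (i, slot) in self.slots.iter().enumerate() {
            let mut task = slot.lock().unwrap();
            if !task.in_use {
                task.in_use = true;
                task.pid = pid;
                return Some(i);
            }
        }
        None
    }

    /// Marks a slot free; the slot's memory and lock remain in place.
    pub fn free(&self, idx: usize) {
        let mut task = self.slots[idx].lock().unwrap();
        task.in_use = false;
    }

    /// Returns the pid stored in an in-use slot.
    pub fn pid_of(&self, idx: usize) -> Option<u32> {
        let task = self.slots[idx].lock().unwrap();
        if task.in_use { Some(task.pid) } else { None }
    }
}
```

The fixed `MAX_TASKS` bound is also the source of the configuration nightmare mentioned earlier: a fifth allocation simply fails.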

This strategy worked less well when Sequent added a distributed lock manager because the required number of locks was not predictable, nor was there a useful upper bound. This problem was solved in part by the addition of RCU, which provided a high-performance and scalable means of resolving races between acquiring a given object's lock and deletion (and subsequent freeing) of that same object. Given that DEC Alpha famously had difficulty with RCU, it is only reasonable to ask how Rust will do with it. Or must the concurrency designs of those portions of the Linux kernel that are to be written in Rust be "fitted" to a Rust-language ProcRustean bed [1]? Those who prize Rust's fearless-concurrency goal above all else might reasonably argue that this ProcRustean bed is in fact a most excellent thing. However, some Linux-kernel maintainers (including this one) might in their turn reasonably argue that within the context of some portions of the Linux kernel, a proper level of fear is a very healthy thing. As is the ability to do one's job in a reasonably straightforward manner!

One way to avoid this ProcRustean bed is to use the Rust unsafe facility, and in fact "unsafe" has been the answer to a disturbingly large number of my questions about Rust [2]. However, use of this facility introduces the possibility of data races, which in turn raises the question of Rust's memory model. Within the Linux kernel, the answer to this question is of course LKMM, or perhaps some reasonable subset of LKMM.

However, in my personal experience, I have most frequently seen Rust being used to rewrite scripts that became performance problems upon being more widely deployed than expected. In some cases, these rewrites greatly improved user experience as well as performance. This means that Rust is heavily used outside of the Linux kernel, which in turn means that LKMM might not be the right answer for Rust in general, though some in the Rust community have come out strongly in favor of extending the ProcRustean bed. But this blog series is focused on Rust for the Linux kernel, so the question of the memory model for Rust in general is out of scope. Again, for Rust in the Linux kernel, some subset of LKMM is clearly the correct memory model.

Many of the following posts in this series cover ways that Rust might work with specific aspects of LKMM, including some wild speculation about how Rust's ownership model might be generalized in a manner similar to the way that the Linux kernel's lockdep checking has been generalized for cross-released locks and for RCU. Readers wishing to learn more about non-ProcRustean concurrency designs are invited to peruse "Is Parallel Programming Hard, And, If So, What Can You Do About It?", hereinafter called "perfbook". Specific chapters and sections of the Second Edition of this book will be cited as appropriate by later posts in this series.

Endnotes
[1]  Making this 1990s-style concurrency scale usually involves hashed arrays of locks. These are often deadlock-prone, but there are heavily used techniques that (mostly) avoid the deadlocks. See Section 7.1.1.6 ("Acquire Needed Locks First") in perfbook for one such technique. However, hashed arrays of locks are prone to scalability problems, especially on multi-socket systems, due to poor locality of reference. See for example Section 10.2.3 ("Hash-Table Performance") for performance results on hash tables, also in perfbook. As a result, most attempts to apply hashed arrays of locks to the Linux kernel resulted in RCU being used instead. The performance and scalability benefits of RCU (and hazard pointers) are shown in Section 10.3 ("Read-Mostly Data Structures"), again in perfbook.

[2]  But please note that Rust's unsafe code has only limited undefined-behavior unsafe superpowers:

  1. Dereferencing a raw pointer (and it is the programmer's responsibility to avoid destructive wild-pointer dereferences).
  2. Calling an unsafe function or method.
  3. Accessing or modifying a mutable static variable (and it is the programmer's responsibility to avoid destructive data races).
  4. Implementing an unsafe trait.
  5. Accessing fields of a union (and it is the programmer's responsibility to avoid accesses that invoke undefined behavior, or, alternatively, understand how the compiler at hand reacts to any undefined behavior that might be invoked).
I do find the Rust community's sharp focus on undefined-behavior-induced bugs over other types of bugs to be rather surprising. Perhaps this is because I have not had so much trouble with undefined-behavior-induced bugs. Conspiracy theorists might imagine an unholy alliance between UB-happy developers of compiler optimizations on the one hand and old-school parallel programmers wishing to turn the clock back to the 1990s on the other hand. I am happy to let such theorists imagine such things. ;-)

Besides, more recent discussions have focused on memory safety rather than the full gamut of undefined behavior.

History
October 12, 2021: Self-review. Note that some of the comments are specific to earlier versions of this blog post.
October 13, 2021: Add note on memory safety specifically rather than undefined behavior in general.

[$] How Red Hat uses GitLab for kernel development

3 years 10 months ago
Much of the free-software development world has adopted Git forges (such as GitHub, GitLab, or sourcehut) with enthusiasm. The kernel community has not. Reasons for that reticence vary, but one that is often heard is that these forges simply don't work well at the scale needed for the kernel project. At a Kernel-Summit session during the 2021 Linux Plumbers conference, Donald Zickus and Prarit Bhargava sought to show how Red Hat has put GitLab to good use to support its kernel team. Not only can these forges work for kernel development, they said, but moving to a forge can bring a number of advantages.
corbet

Security updates for Friday

3 years 10 months ago
Security updates have been issued by Debian (curl, krb5, openssl1.0, and taglib), Fedora (cifs-utils), SUSE (libqt5-qtbase and rubygem-activerecord-4_2), and Ubuntu (linux-raspi, linux-raspi-5.4 and linux-raspi2).
ris

September 30th, 2021 syspatches: some assembly might be required

3 years 10 months ago
Did you just run syspatch(8) and see it fail?

Here's the reason: one of the two root certificates behind the (excellent) Let's Encrypt CA service has expired. A bug in (the "legacy" verifier of) LibreSSL also contributed.

The syspatches (032 for OpenBSD 6.8 and 018 for OpenBSD 6.9) mitigate the unfortunate situation.

However, your syspatch may fail if your local mirror uses a Let's Encrypt certificate. Patch-22! In that case, the best advice may be to try a mirror that does not use a Let's Encrypt certificate just to get past this speed bump.

Paul E. McKenney: So You Want to Rust the Linux Kernel?

3 years 10 months ago
There has been much discussion of using the Rust language in the Linux kernel (for example, here, here, and here). The Kangrejos Rust for Linux Workshop (here, here, and here) and the 2021 Linux Plumbers Conference had a number of sessions on this topic, as did the Maintainers Summit. At least two of these sessions mentioned the question of how Rust is to handle the Linux-kernel memory model (LKMM), and I volunteered to write this blog series on this topic.

This series focuses mostly on use cases and opportunities, rather than on any non-trivial solutions. Please note that I am not in any way attempting to dictate or limit Rust's level of ambition. I am instead noting the memory-model consequences of a few potential levels of ambition, ranging from "portions of a few drivers" through "a few drivers" and "some core code" up to and including "the entire kernel". Greater levels of ambition will require greater willingness to accommodate a wider variety of LKMM requirements.

One could instead argue that portions or even all of the Linux kernel should be hammered into the Rust ownership model. On the other hand, might the rumored sudden merge of the ksmbd driver (https://lwn.net/Articles/871098/) have been due to the implicit threat of its being rewritten in Rust? [1] Nevertheless, in cases where Rust is shown to offer particularly desirable advantages, it is quite possible that Rust and some parts of the Linux kernel might meet somewhere in the middle.

These blog posts will therefore present approaches ranging upwards from trivial workarounds. But be warned that some of the high-quality approaches require profound reworking of compiler backends that have thus far failed to spark joy in the hearts of compiler writers. In addition, Rust enjoys considerable use outside of the Linux kernel, for but one example that I have personally observed, as something into which to rewrite inefficient Python scripts. (A megawatt here, a megawatt there, and pretty soon you are talking about real power consumption!) Therefore, there might well be sharp limits beyond which the core Rust developers are unwilling to go.

The remaining posts in this series (along with their modification dates) are as follows:

  1. Rust Concurrency Philosophy: A Historical Perspective (October 13, 2021)
  2. Atomics and Barriers and Locks, Oh My! (October 13, 2021)
  3. Compiler Writers Hate Dependencies (Control) (October 12, 2021)
  4. Compiler Writers Hate Dependencies (Address/Data) (October 12, 2021)
  5. Compiler Writers Hate Dependencies (OOTA) (November 12, 2021)
  6. Can Rust Code Own Sequence Locks? (October 13, 2021)
  7. Can Rust Code Own RCU? (October 18, 2021)
  8. How Much of the Kernel Can Rust Own? (October 12, 2021)
  9. Will Your Rust Code Survive the Attack of the Zombie Pointers? (October 12, 2021)
  10. Can the Kernel Concurrency Sanitizer Own Rust Code? (October 28, 2021)
  11. Summary and Conclusions (October 13, 2021)
  12. TL;DR: Memory-Model Recommendations for Rusting the Linux Kernel (October 21, 2021)
  13. Bonus Post: What Memory Model Should the Rust Language Use? (November 4, 2021)

Please note that this blog series is not a Rust tutorial. Those wanting to learn how to actually program in Rust might start here, here, here, or of course here.

Endnotes
[1]  Some in the Linux-kernel community might be happy with either outcome: (1) The threat of conversion to Rust caused people to push more code into mainline and (2) Out-of-tree code was converted to Rust by Rust advocates and then pushed into mainline. The latter case might need special care for longer-term maintenance of the resulting Rust code, but perhaps the original authors might be persuaded to declare victory, learn Rust, and maintain the code. Who knows? ;-)

History
October 8, 2021: Fix s/LInux/Linux/ typo noted by Miguel Ojeda
October 12, 2021: Self-review, including making it clear that Rust might have use cases other than rewriting inefficient scripts.
October 13, 2021: Add a link to the recommendations post.
October 22, 2021: This blog series is not a Rust tutorial.
November 3, 2021: Add post on memory model for Rust in general.

The October 12 update affected the whole series, for example, removing the "under construction" markings. Summary of significant updates to other posts:

  • The historical post grew a bit based on feedback elsewhere.
  • The sequence-locking post gained a list of Linux-kernel use cases and much else besides. Sequence locking seems to cause about as much trouble for Rust ownership as it does for the C/C++ memory model. I added my view of the properties of a best-case Rust implementation.
  • The RCU post saw a lot of change. It gained a list of Linux-kernel use cases and some additional explanations of RCU. Verbiage was added explaining the need to interface to the existing C-language Linux-kernel RCU implementation as opposed to inventing a Rust-only RCU implementation. I added my views on a best-case Rust implementation, including RCU-usage bugs that such an implementation might be able to locate that are currently difficult to find. Finally, I added a list of papers presenting RCU semantics (with varying degrees of formality) as well as papers describing mechanical proofs of correctness for significant portions of Linux-kernel RCU.
  • The zombie-pointer post gained a much more detailed description of how zombie pointers can rise from the dead.
  • The KCSAN post grew more-detailed descriptions of KCSAN integration and use. Apparently Rust is much farther down the KCSAN road than I would have expected!
  • The summary and conclusions gained more details on Linux-kernel undefined-behavior avoidance and on memory models.

Ratiu: A tale of two toolchains and glibc

3 years 10 months ago
Adrian Ratiu writes on the Collabora blog about the challenges that face developers trying to build the GNU C Library with the LLVM compiler.

Is it worth it to fix glibc (and other projects which support only GCC) to build with LLVM? Is it better to just replace them with alternatives already supporting LLVM? Is it best to use both GCC and LLVM, each for their respective supported projects?

This post is an exploration starting from these questions but does not attempt to give any definite answers. The intent here is to not be divisive and controversial, but to raise awareness by describing parts of the current status-quo and to encourage collaboration.

corbet

Bottomley: Linux Plumbers Conference Matrix and BBB integration

3 years 10 months ago
James Bottomley explains how the integration of Matrix and BigBlueButton was done for the just-concluded Linux Plumbers Conference.

One thing that emerged from our initial disaster with Matrix on the first day is that we failed to learn from the experiences of other open source conferences (i.e. FOSDEM, which used Matrix and ran into the same problems). So, an object of this post is to document for posterity what we did and how to repeat it.

corbet

[$] User-space interrupts

3 years 10 months ago
The term "interrupt" brings to mind a signal that originates in the hardware and which is handled in the kernel; even software interrupts are a kernel concept. But there is, it seems, a use case for enabling user-space processes to send interrupts directly to each other. An upcoming Intel processor generation includes support for this capability; at the 2021 Linux Plumbers Conference, Sohil Mehta ran a Kernel-Summit session on how Linux might support that feature.
corbet

James Bottomley: Linux Plumbers Conference Matrix and BBB integration

3 years 10 months ago

The recently completed Linux Plumbers Conference (LPC) 2021 used the Big Blue Button (BBB) project again as its audio/video online conferencing platform and Matrix for IM and chat. Why we chose BBB has been discussed previously. However, this year we replaced RocketChat with Matrix to achieve federation, allowing non-registered conference attendees to join the chat. Also, based on feedback from our attendees, we endeavored to replace the BBB chat window with a Matrix one so anyone could see and participate in one contemporaneous chat stream within BBB and beyond. This enabled chat to be available before, during and after each session.

One thing that emerged from our initial disaster with Matrix on the first day is that we failed to learn from the experiences of other open source conferences (i.e. FOSDEM, which used Matrix and ran into the same problems). So, an object of this post is to document for posterity what we did and how to repeat it.

Integrating Matrix Chat into BBB

Most of this integration was done by Guy Lunardi.

It turns out that Chat is fairly deeply embedded into BBB, so replacing the existing chat module is hard. Fortunately, BBB also contains an embedded etherpad which is simply produced via an iFrame redirection. So what we did is to disable the BBB chat panel and replace it with a new iFrame based component that opened an embedded Matrix chat client. The client we chose was riot-embedded, which is a relatively recent project but seemed to work reasonably well. The final problem was to pass through user credentials. Up until three days before the conference, we had been happy with the embedded Matrix client simply creating a one-time numbered guest account every time it was opened, but we worried about this being a security risk and so implemented pass through login credentials at the last minute (life’s no fun unless you live dangerously).

Our custom front end for BBB (lpcfe) was created last year by Jon Corbet. It uses a fairly simple email/registration confirmation code for username/password via LDAP. The lpcfe front end Jon created is here: git://git.lwn.net/lpcfe.git. It manages the whole of the conference log in process and presents the current and future sessions (with join buttons) according to the timezone of the browser viewing it.

The credentials are passed through directly using extra parameters to BBB (see commit fc3976e “Pass email and regcode through to BBB”). We eventually passed these through using a GET request. Obviously if we were using a secret password, this would be a problem, but since the password was a registration code handed out by a third party, it’s acceptable. I imagine that if anyone wishes to take this work forward, adding native Matrix device/session support to riot-embedded would be better.

The main change to get this working in riot-embedded is here, and the supporting patch to BBB is here.

Note that the Matrix room ID used by the client was added as an extra parameter to the flat text file that drives the conference track layout of lpcfe. All Matrix rooms were created as public (and published) so anyone going to our :lpc.events matrix domain could see and join them.

Setting up Matrix for the Conference

We used the matrix-synapse server and did a standard Python venv pip install of the latest tag on Ubuntu. We created more than 30 public rooms: one for each Microconference and track of the conference and some admin and hallway rooms. We used LDAP to feed the authentication portion of lpcfe/Matrix, but we had a problem using email addresses since the standard Matrix user name cannot have an ‘@’ symbol in it. Eventually we opted to transform everyone’s email to a Matrix-compatible form simply by replacing the ‘@’ with a ‘.’, which is why everyone in our conference appeared with ridiculously long Matrix user names like @jejb.ibm.com:lpc.events.
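The mapping described above can be sketched in a few lines of shell. This is an illustration only, not the code used for the conference: the function name to_matrix_id and the example address are made up, and only the ‘@’ to ‘.’ rule and the lpc.events domain come from the text.

```shell
# Hypothetical helper (not from lpcfe or riot-embedded): turn an email
# address into the Matrix user ID form described above.
SERVER_NAME=lpc.events

to_matrix_id() {
    # ${1%@*}  is everything before the last '@' (the local part)
    # ${1##*@} is everything after the last '@' (the mail domain)
    printf '@%s.%s:%s\n' "${1%@*}" "${1##*@}" "$SERVER_NAME"
}

to_matrix_id user@example.org   # prints @user.example.org:lpc.events
```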

This ‘@’ to ‘.’ transformation was a huge source of problems due to the unwillingness of engineers to read instructions, so if we do this over again, we’ll do the transformation silently in the login javascript of our Matrix web client. (We did this in riot-embedded but ran out of time to do it in Element web as well.)

Because we used LDAP, the actual Matrix account for each user was created the first time they logged into our server, so we chose at this point to use auto-join to add everyone to the 30+ LPC Matrix rooms we’d already created. This turned out to be a huge problem.

Testing our Matrix and BBB integration

We tried to organize a “Town Hall” event where we invited lots of people to test out the infrastructure we’d be using for the conference. Because we wanted this to be open, we couldn’t use the pre-registration/LDAP authentication infrastructure so Jon quickly implemented a guest mode (and we didn’t auto join anyone to any Matrix rooms other than the townhall chat).

In the end we got about 220 users to test during which time the Matrix and BBB infrastructure behaved quite well. Based on this test, we chose a 2 vCPU Linode VM for our Matrix server.

What happened on the Day

Come the Monday of the conference, the first problem we ran into was procrastination: the conference registered about 1,000 attendees, of whom about 500 tried to log on about 5 minutes prior to the first session. Since accounts were created and rooms joined upon the first login, this is clearly a huge thundering herd problem of our own making … oops. The Matrix server itself shot up to 100% CPU on the python synapse process and simply stayed there, adding new users at a rate of about one every 30 seconds. All the chat tabs froze because logins were taking ages as well. The first thing we did was to scale the server up to a 16 CPU bare metal system, but that didn’t help because synapse is single threaded … all we got was the matrix synapse python process running at 100% on one of the CPUs, still taking 30 seconds per first log in.
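To put those numbers in perspective, here is a rough back-of-the-envelope calculation. It assumes, as a simplification, that the 500 first logins were processed strictly one after another at 30 seconds each; the function name backlog_minutes is made up for the illustration.

```shell
# Rough estimate only: assumes first logins are handled strictly serially.
backlog_minutes() {
    # $1 = number of users waiting, $2 = seconds per first login
    echo $(( $1 * $2 / 60 ))
}

backlog_minutes 500 30   # prints 250
```

At that rate the queue would have taken a little over four hours to drain, which is why waiting it out was not an option.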

Fixing the First Day problems

The first thing we realized is we had to multi-thread the synapse server. This is well known but the issue is also quite well hidden deep in the Matrix documents. It also happens that the Matrix documents are slightly incomplete. Our first scaling attempt, simply adding 16 generic worker apps to scale across all our physical CPUs, failed because the Matrix server stopped federating and then the database crashed with “FATAL: remaining connection slots are reserved for non-replication superuser connections”.

Fixing the connection problem (alter system set max_connections = 1000;) triggered a shared memory too small issue which was eventually fixed by bumping the shared buffer segment to 8GB (alter system set shared_buffers=1024000;). I suspect these parameters were way too large, but the Linode we were on had 32GB of main memory, so fine tuning in this emergency didn’t seem a good use of time.
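The link between the bare number 1024000 and "8GB" is not obvious. PostgreSQL treats a shared_buffers value given without units as a count of blocks, typically 8kB each. The sketch below does that arithmetic; the 8192-byte block size is the usual default and is assumed here rather than taken from the post, and the function name is made up.

```shell
# Convert a unit-less shared_buffers value (a block count) into MiB.
shared_buffers_mib() {
    # $1 = shared_buffers in blocks, $2 = block size in bytes (assumed 8192)
    echo $(( $1 * ${2:-8192} / 1048576 ))
}

shared_buffers_mib 1024000   # prints 8000
```

So the setting asks for 8000 MiB, or roughly 8GB, of shared buffers.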

Fixing the worker problem was way more complex. The way Matrix works, you have to use a haproxy to redirect incoming connections to individual workers and you have to ensure that the same worker always services the same transaction (which you achieve by hashing on IP address). We got a lot of advice from FOSDEM on this aspect, but in the end, instead of using an external haproxy, we went for the built-in forward proxy load balancing in nginx. The federation problem seems to be that Matrix simply doesn’t work without a federation sender. In the end, we created 15 generic workers and one each of media server, frontend server and federation sender.
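The idea behind hashing on IP address is that the choice of worker is a pure function of the client address, so repeat requests from one client always reach the same worker. The sketch below illustrates the principle only. It uses cksum as a stand-in hash, which is not what nginx or haproxy use internally, and the base port 8081 is a hypothetical value rather than one from our configuration.

```shell
# Illustration only: cksum stands in for the load balancer's real hash.
WORKERS=15        # number of generic workers, as in the text
BASE_PORT=8081    # hypothetical port of the first worker

worker_port_for() {
    # cksum prints "<crc> <byte count>"; keep the CRC, reduce it modulo WORKERS
    crc=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
    echo $(( BASE_PORT + crc % WORKERS ))
}

worker_port_for 203.0.113.7
worker_port_for 203.0.113.7     # same address, so the same port again
worker_port_for 198.51.100.23   # another address may land on a different worker
```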

Our configuration files are

Once you have all the units enabled in systemd, you can then simply do systemctl start/stop matrix-synapse.target

Finally, to fix the thundering herd problem (for people who hadn’t already logged in), we ran through the entire spreadsheet of email/confirmation numbers doing an automatic login using the user management API on the server itself. At this point we had about half the accounts auto created, so this script created the rest.

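    emaillist=lpc2021-all-attendees.txt
    # The attendee file is tab separated, so split fields on tabs only
    IFS=$'\t'
    while read first last confirmation email; do
        # Strip any "+tag" from the email to get the BBB login
        bbblogin=${email/+*@/@}
        # Matrix user names cannot contain '@', so replace it with '.'
        matrixlogin=${bbblogin/@/.}
        # A first login through the client API creates the account
        curl -XPOST -d '{"type":"m.login.password", "user":"'${matrixlogin}'", "password":"'${confirmation}'"}' "http://localhost:8008/_matrix/client/r0/login"
        # Pace the logins so we don't create a new thundering herd
        sleep 1
    done < ${emaillist}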

The lpc2021-all-attendees.txt is a tab separated text file used to drive the mass mailings to plumbers attendees, but we adapted it to log everyone in to the matrix server.
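Before running a loop like this against a live server, it can help to do a dry run that only prints what would be sent. The following is a sketch along those lines, not the script we ran: the function name login_payload and the sample row are invented, and the column order (first name, last name, confirmation code, email) is taken from the read statement in the script.

```shell
# Dry-run sketch: build the login payload for one row without sending it.
tab=$(printf '\t')

login_payload() {
    # $1 is one tab-separated row: first, last, confirmation, email
    printf '%s\n' "$1" | {
        IFS="$tab" read -r first last confirmation email
        user=${email%@*}       # local part, e.g. "jane+lpc"
        domain=${email##*@}    # mail domain, e.g. "example.org"
        user=${user%%+*}       # drop any "+tag" suffix
        printf '{"type":"m.login.password", "user":"%s.%s", "password":"%s"}\n' \
            "$user" "$domain" "$confirmation"
    }
}

login_payload "Jane${tab}Doe${tab}ABC123${tab}jane+lpc@example.org"
```

For the sample row this prints a payload with user "jane.example.org" and password "ABC123", which can be checked by eye before any account is created.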

Conclusion

With the above modifications, the matrix server on a Dedicated 32GB (16 cores) Linode ran smoothly for the rest of the conference. The peak load got to 17 and the peak total CPU usage never got above 70%. Finally, the peak memory usage was around 16GB including cache (so the server was a bit over provisioned).

In the end, 878 of the 944 registered attendees logged into our BBB servers at one time or another and we got a further 100 external matrix users (who may or may not also have had a conference account).

Security updates for Thursday

3 years 10 months ago
Security updates have been issued by Debian (libxstream-java, uwsgi, and weechat), Fedora (libspf2, libvirt, mingw-python3, mono-tools, python-flask-restx, and sharpziplib), Mageia (gstreamer, libgcrypt, libgd, mosquitto, php, python-pillow, qtwebengine5, and webkit2), openSUSE (postgresql12 and postgresql13), SUSE (haproxy, postgresql12, postgresql13, and rabbitmq-server), and Ubuntu (commons-io and linux-oem-5.13).
ris

PostgreSQL 14 released

3 years 10 months ago
Version 14 of the PostgreSQL relational database manager is out.

PostgreSQL 14 brings a variety of features that help developers and administrators deploy their data-backed applications. PostgreSQL continues to add innovations on complex data types, including more convenient access for JSON and support for noncontiguous ranges of data. This latest release adds to PostgreSQL's trend on improving high performance and distributed data workloads, with advances in connection concurrency, high-write workloads, query parallelism and logical replication.

More information can be found in the release notes.

corbet

[$] Taming the BPF superpowers

3 years 10 months ago
Work toward the signing of BPF programs has been finding its way into recent mainline kernel releases; it is intended to improve security by limiting the BPF programs that can be successfully loaded into the kernel. As John Fastabend described in his "Watching the super powers" session at the 2021 Linux Plumbers Conference, this new feature has the potential to completely break his tools. But rather than just complain, he decided to investigate solutions; the result is an outline for an auditing mechanism that brings greater flexibility to the problem of controlling which programs can be run.
corbet

Security updates for Wednesday

3 years 10 months ago
Security updates have been issued by Fedora (iaito, libssh, radare2, and squashfs-tools), openSUSE (hivex, shibboleth-sp, and transfig), SUSE (python-urllib3 and shibboleth-sp), and Ubuntu (apache2, linux, linux-aws, linux-aws-hwe, linux-azure, linux-azure-4.15, linux-dell300x, linux-gcp, linux-gcp-4.15, linux-hwe, linux-kvm, linux-oracle, linux-snapdragon, and linux-hwe-5.11, linux-azure, linux-azure-5.11, linux-oracle-5.11).
ris

[$] A fork for the time-zone database?

3 years 10 months ago
A controversy about the handling of the Time Zone Database (tzdb) has been brewing since May, but has come to a head in recent weeks. Changes that were proposed to simplify the main database file have some consequences in terms of time-zone history and changes to the representation of some zones. Those changes have upset a number of users of the database—to the point where some have called for a fork. A September 25 release of tzdb with some, but not all, of the changes seems unlikely to resolve the conflict.
jake