News reader

[$] Multi-generational LRU: the next generation

4 years 2 months ago
The multi-generational LRU patch set is a significant reworking of the kernel's memory-management subsystem that promises better performance for a number of workloads; it was covered here in April. Since then, two new versions of that work have been released by developer Yu Zhao, with version 3 being posted on May 20. Some significant changes have been made since the original post, so another look is in order.
corbet

Security updates for Monday

4 years 2 months ago
Security updates have been issued by Debian (libx11, prosody, and ring), Fedora (ceph, glibc, kernel, libxml2, python-pip, slurm, and tpm2-tss), Mageia (bind, libx11, mediawiki, openjpeg2, postgresql, and thunderbird), openSUSE (Botan, cacti, cacti-spine, chromium, djvulibre, fribidi, graphviz, java-1_8_0-openj9, kernel, libass, libxml2, lz4, and python-httplib2), and Slackware (expat).
ris

Kernel prepatch 5.13-rc3

4 years 2 months ago
The third 5.13 kernel prepatch is out for testing. "It's been a very calm rc3 week, and at least in pure number of commits this is the smallest rc3 we've had in the 5.x series. Considering that the merge window was not in any way small, this is a bit surprising, but I suspect it's one of those 'not everybody sent in fixes this week' things that will rectify itself next week." This prepatch does include reverts and fixes for a long series of broken patches identified in the TAB report on the UMN mess.
corbet

David Sterba: Authenticated hashes for btrfs (part 1)

4 years 2 months ago

There was a request to provide authenticated hashes in btrfs, natively as one of the btrfs checksum algorithms. Sounds fun, but there’s always more to it, even if this sounds easy to implement.

Johannes T., at that time at SUSE, sent a patchset adding support for SHA256 [1] along with a Labs conference paper summarizing existing solutions and giving details about the proposed implementation and use cases.

The first posted version of the patchset got some feedback: issues were found and some ideas suggested. Things have stalled a bit, but the feature is still very interesting and really not hard to implement. The support for additional checksums has provided enough support code to just plug in the new algorithm and enhance the existing interfaces to provide the key bytes. So far I’ve assumed you know what an authenticated hash means, but for clarity and in simple terms: it’s a checksum that depends on a key. The main point is that it’s impossible to generate the same checksum for given data without knowing the key, where impossible is used in the cryptographic-strength sense: there’s an almost zero probability of doing that by chance, and a brute-force attack is not practical.
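To make "a checksum that depends on a key" concrete, here is a minimal sketch using Python's standard library. HMAC-SHA256 is used as one common keyed construction; that choice, the function names, and the sample key and data are assumptions for illustration, not what the btrfs patches necessarily do.

```python
import hashlib
import hmac


def keyed_checksum(key: bytes, data: bytes) -> bytes:
    """Return the 32-byte HMAC-SHA256 of data under key."""
    return hmac.new(key, data, hashlib.sha256).digest()


def verify(key: bytes, data: bytes, expected: bytes) -> bool:
    """Recompute the keyed checksum and compare in constant time."""
    return hmac.compare_digest(keyed_checksum(key, data), expected)


# Illustrative values only.
key = b"secret key bytes"
block = b"filesystem block contents"
stored = keyed_checksum(key, block)

# Someone without the key edits the block and recomputes the checksum
# with a guessed key: the result does not verify under the real key.
tampered = b"filesystem block CONTENTS"
forged = keyed_checksum(b"attacker guess", tampered)
```

With a plain (unkeyed) checksum the attacker could simply recompute a matching value after the edit; here both the old checksum and the forged one fail verification.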

Auth hash, fsverity

A notable existing solution for that is fsverity, which works in a read-only fashion: the key is securely hidden and used only to verify that data read from the media haven’t been tampered with. A typical use case is an OS image in your phone. But that’s not all. OS images appear in all sorts of boxed devices and IoT. Nowadays, with the explosion of edge computing, assuring the integrity of end devices is a fundamental requirement.

Where btrfs can add some value is read AND write support with an authenticated hash. This brings questions around key handling, and not everybody is OK with a device that could potentially store malicious/invalid data with a proper authenticated checksum. So yeah, use something else, this is not your use case, or maybe there’s another way to make sure the key won’t be compromised easily. This is beyond the scope of what a filesystem can do, though.

As an example use case of a writable filesystem with an authenticated hash: detecting outside tampering with on-disk data, e.g. while the filesystem was unmounted. Filesystem metadata formats are public and interesting data can be located by patterns on the device, so changing a few bytes and updating the checksum(s) is not hard.

There’s one issue that was brought up and I think it’s not hard to observe anyway: there’s a total dependency on the key to verify even the basic integrity of the data. I.e. without the key it’s not possible to say whether the data are valid, as it would be if a basic checksum was used. Basic verification might still be useful for read-only access to the filesystem, but the absence of the key makes it impossible.

Existing implementations

As was noted in the LWN discussion [2], ZFS uses two checksums: one is authenticated and one is not. I point you to the comment stating that, as I was not able to navigate far enough in the ZFS code to verify the claim, but the idea is clear. It’s said that the authenticated hash is e.g. SHA512 and the plain hash is SHA256, splitting the bytes available for the checksum half and half. The hashes are stored by simply truncating each checksum to its first 16 bytes (128 bits) and storing the two halves consecutively. As both hashes are cryptographically strong, the first 16 bytes should provide enough strength despite the truncation.

When I was thinking about that, I had a different idea of how to do it. Not that copying the scheme would not work for btrfs: anything that the Linux kernel crypto API provides is usable, so the same is achievable. I’m not judging the decisions about which hashes to use or how to do the split; it works and I don’t see a problem in the strength. Where I see potential for improvement is performance, without sacrificing strength too much. Trade-offs.

A software implementation of SHA256 on the CPU is slow compared to checksums with hardware aids (like CRC32C instructions) or hashes designed to perform well on CPUs. That was the topic of the previous round of new hashes, so we now compete against BLAKE2b and XXHASH. There are CPUs with native instructions to calculate SHA256 and the performance improvement is noticeable, orders of magnitude better. But the support is not as widespread as e.g. for CRC32C. Anyway, there’s always choice and hardware improves over time. The number of hashes may seem to explode, but as long as it’s manageable inside the filesystem, we take it. And a coffee please.
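The half/half layout described above can be sketched as follows. This is only an illustration of the scheme as reported in that comment, not ZFS code: HMAC-SHA512 stands in for "the authenticated hash", and all function names are made up for the example.

```python
import hashlib
import hmac

HALF = 16  # bytes kept from each digest (128 bits)


def auth_half(key: bytes, data: bytes) -> bytes:
    """First 16 bytes of the keyed digest (HMAC-SHA512 is an assumption)."""
    return hmac.new(key, data, hashlib.sha512).digest()[:HALF]


def plain_half(data: bytes) -> bytes:
    """First 16 bytes of the plain SHA256 digest."""
    return hashlib.sha256(data).digest()[:HALF]


def split_checksum(key: bytes, data: bytes) -> bytes:
    """Store the two truncated digests consecutively: 16 + 16 = 32 bytes."""
    return auth_half(key, data) + plain_half(data)


def check(data: bytes, stored: bytes, key=None):
    """Return (auth_ok, plain_ok); auth_ok is None when no key is given."""
    plain_ok = hmac.compare_digest(plain_half(data), stored[HALF:])
    auth_ok = None
    if key is not None:
        auth_ok = hmac.compare_digest(auth_half(key, data), stored[:HALF])
    return auth_ok, plain_ok
```

The point of the second half is visible in `check`: without the key the authenticated half cannot be evaluated, but the plain half still gives a basic integrity answer.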

Secondary hash

The checksum scheme proposed is to use a cryptographic hash and a non-cryptographic one. Given the current support for SHA256 and BLAKE2b, the cryptographic hash is given. There are two of them and that’s fine. I’m not drawing an exact parallel with ZFS; the common point for the cryptographic hash is that there are limited options and the calculation is expensive by design. This is where the non-cryptographic hash can be debated. Also, I want to call it the secondary hash, with the obvious meaning that it’s not too important by default and comes second when the authenticated hash is available.

We have CRC32C and XXHASH to choose from. Note that there are already two cryptographic hashes from the start, so supporting both secondary hashes would double the number of final combinations. We’ve added XXHASH to enhance the checksum collision space from 32 bits to 64 bits. What I propose is to use just XXHASH as the secondary hash, resulting in two new checksum types, each combining an authenticated hash with the secondary hash. I haven’t found a good reason to also include CRC32C.

Another design point was where to do the split and truncation. As XXHASH has a fixed length, this could be defined as 192 bits for the cryptographic hash and 64 bits for the full XXHASH.

Here we are: we could have authenticated SHA256 accompanied by XXHASH, or the same with BLAKE2b. The checksum split also splits the decision tree of what to do when the checksum partially matches. For a single checksum it’s a simple yes/no decision. The partial match is the interesting case:
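A minimal sketch of that 192 + 64 bit layout, assuming the two parts are simply concatenated into 32 bytes. HMAC-SHA256 truncated to 24 bytes plays the authenticated hash here. Python's standard library has no XXHASH, so an unkeyed 8-byte BLAKE2b digest is used purely as a stand-in for the 64-bit secondary hash; all names are hypothetical.

```python
import hashlib
import hmac

PRIMARY_LEN = 24    # 192 bits of the authenticated hash
SECONDARY_LEN = 8   # 64 bits, the full length of the secondary hash
CSUM_LEN = PRIMARY_LEN + SECONDARY_LEN  # 32 bytes in total


def primary_hash(key: bytes, data: bytes) -> bytes:
    """Keyed hash truncated to 192 bits (HMAC-SHA256 is an assumption)."""
    return hmac.new(key, data, hashlib.sha256).digest()[:PRIMARY_LEN]


def secondary_hash(data: bytes) -> bytes:
    """STAND-IN for 64-bit XXHASH: any fast, unkeyed 64-bit hash fits here."""
    return hashlib.blake2b(data, digest_size=SECONDARY_LEN).digest()


def make_checksum(key: bytes, data: bytes) -> bytes:
    """Concatenate the truncated primary and the full secondary hash."""
    return primary_hash(key, data) + secondary_hash(data)


def split_checksum(csum: bytes):
    """Return the (primary, secondary) parts of a stored checksum."""
    return csum[:PRIMARY_LEN], csum[PRIMARY_LEN:]
```

Only the cryptographic hash is truncated in this layout; the secondary hash keeps its full 64 bits, which is what a fixed-length XXHASH makes convenient.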

  • primary (key available) hash matches, secondary does not – as the authenticated hash is hard to forge, it’s trusted (even if it’s not full length of the digest)
  • primary (key available) does not match, secondary does not – checksum mismatch for the same reason as above
  • primary (key not available) does not match, secondary does – this is the prime time for the secondary hash, the floor is yours

This leads to 4 outcomes of the checksum verification, compared to 2. A boolean type can simply represent the yes/no outcome, but for two hashes it’s not that easy. It depends on the context, though I think it still should be straightforward to decide what to do in the code. Nevertheless, this has to be updated in all calls to checksum verification and has to reflect the key availability, e.g. in the case where the data are auto-repaired during scrub or when there’s a copy.
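One possible way to replace the boolean is sketched below. The verdict names and the mapping are an interpretation of the cases listed above (trust the primary hash whenever the key is available, fall back to the secondary hash otherwise), not code from the prototype.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    VALID_AUTHENTICATED = "valid, authenticated"
    VALID_UNAUTHENTICATED = "valid, key not available"
    MISMATCH = "mismatch"


def verdict(primary_ok: Optional[bool], secondary_ok: bool) -> Verdict:
    """Combine the two comparison results into one outcome.

    primary_ok is None when the key is not available, so the
    authenticated hash could not be evaluated at all.
    """
    if primary_ok is not None:
        # Key available: the authenticated hash decides on its own,
        # whatever the secondary hash says.
        if primary_ok:
            return Verdict.VALID_AUTHENTICATED
        return Verdict.MISMATCH
    # No key: the secondary hash is the only evidence left.
    if secondary_ok:
        return Verdict.VALID_UNAUTHENTICATED
    return Verdict.MISMATCH
```

A caller such as scrub could then treat `VALID_UNAUTHENTICATED` differently from `VALID_AUTHENTICATED`, for example by refusing to auto-repair without the key.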

Performance considerations

The performance comparison should now be clear: we have the potentially slow SHA256 but fast XXHASH, for each metadata and data block, vs slow SHA512 and slow SHA256. As far as I can tell, it’s also possible to select a SHA256/SHA256 split in ZFS, but that can’t beat SHA256/XXHASH.

The key availability seems to be the key point in all that, puns notwithstanding. The initial implementation assumed, for simplicity, that the raw key bytes are provided to the kernel and to the userspace utilities. This is maybe OK for a prototype, but under no circumstances can it survive until a final release. There’s key management wired deep into the Linux kernel, and there’s a library for the whole API and command line tools. We ought to use that. Pass the key by name, not the raw bytes.

Key management has its own pitfalls and surprises (key owned vs possessed), but let’s assume that there’s a standardized way to obtain the key bytes from the key name. In the kernel it’s “READ_USER_KEY_BYTES”, in userspace it’s either keyctl_read from libkeyutils or a raw syscall to keyctl. Problem solved, on the low level. But, well, don’t try that over ssh.

Accessing a btrfs image for various reasons (check, image, restore) now needs the key to verify data or even the key itself to perform modifications (check + repair). The command line interface has to be extended for all commands that interact with the filesystem offline, i.e. the image and not the mounted filesystem.

This results in a global option, like btrfs --auth-key 1234 inspect-internal dump-tree, compared to btrfs inspect-internal dump-tree --auth-key 1234. This is not finalized, but a global option is now the preferred choice.
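The difference between the two placements can be mocked up with Python's argparse. This is a hypothetical model of the command line shape only, not btrfs-progs code; the `device` positional is added for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Mock-up: --auth-key is accepted once, before any command group."""
    parser = argparse.ArgumentParser(prog="btrfs")
    # Global option: defined on the top-level parser, so every
    # offline command sees it without declaring it again.
    parser.add_argument("--auth-key", metavar="KEY",
                        help="name of the key used for authenticated checksums")
    groups = parser.add_subparsers(dest="group", required=True)
    inspect = groups.add_parser("inspect-internal")
    commands = inspect.add_subparsers(dest="command", required=True)
    dump_tree = commands.add_parser("dump-tree")
    dump_tree.add_argument("device")  # illustrative positional
    return parser


args = build_parser().parse_args(
    ["--auth-key", "1234", "inspect-internal", "dump-tree", "/dev/sdx"]
)
```

In this model the per-command variant would require repeating the option definition on every subcommand parser, which is one practical argument for the global form.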

Final words

I have a prototype that does not work in all cases but at least passes mkfs and mount. The number of checksum verification cases grew beyond what I was able to fix by the time of writing this. I think this has enough material on its own, so I’m pushing it out as part 1. There are open questions regarding the command line interface and also some kind of proof or discussion regarding attacks. Stay tuned.

References:

Brendan Gregg: What is Observability

4 years 2 months ago
It's a made-up computer word that my word processor decorates with a wiggly red you-can't-spell line. At least it did until I clicked "Add to Dictionary" (it got too annoying as I was writing a book on computer observability). Some people abbreviate it as o11y.

Observability: The ability to observe. Observe-ability. Observability. In computer engineering we use it to describe the tools, data sources, and methods for understanding (observing!) how a technology is operating. We don't use the _real_ word "observable" since that implies the wrong thing. Imagine "observable metrics": Are there metrics that _aren't_ observable?

Using observability in sentences:

  • What observability tools are installed? (Means: What tools exist that only read state?)
  • What observability does that database have? (Means: What metrics and logs does it have?)
  • How do you do observability? (Means: What products do you use for metrics, tracing, etc.?)
  • Let me try some observability first. (Means: Let me look at the system without changing it.)

Wait, aren't all performance tools observability tools? No. _Experimental_ tools change the state of the system to understand it. For example, benchmarks. As an analogy, a car's dashboard is a collection of observability tools that let you understand how the car is operating (speed, rpm, temperature). A car's 0-60 mph time is an _experiment_.

When I was a performance consultant I'd show up to random companies who wanted me to fix their computer performance issues. If they trusted me with a login to their production servers, I could help them a lot quicker. To get that trust I knew which tools looked but didn't touch: Which were observability tools and which were experimental tools. "I'll start with observability tools only" is something I'd say at the start of every engagement.

Note that observability tools aren't completely harmless: Their execution consumes resources, usually negligible, but in some cases it's enough to perturb the target of study. This is the "observer effect."

Another use of the term observability is as a reminder to switch between tool types, and not to get stuck on one. A colleague (Roch Bourbonnais from memory) once told me: "You have two hands. Observability and experimentation." It stuck with me as it also makes the point that when you're only using one type to solve a performance problem __you're working one-handed__.

CSIRO's seL4 project shut down

4 years 2 months ago
In 2018, LWN covered a talk by Gernot Heiser about the seL4 project, which has developed an open-source operating system for safety-critical applications and gone to the trouble of proving its correctness. Much of that work has been done at CSIRO in Australia. Heiser has announced via Twitter that CSIRO's support for this project is being shut down, with the staff being redirected to artificial-intelligence projects. Hopefully the seL4 Foundation, established in 2020, will be able to carry on this interesting work.
corbet

Perl 5.34.0 released

4 years 2 months ago
Version 5.34.0 of the Perl language has been released. "Perl 5.34.0 represents approximately 11 months of development since Perl 5.32.0 and contains approximately 280,000 lines of changes across 2,100 files from 78 authors." See this page for a list of changes; they include a new try/catch syntax, a new octal syntax, and many improvements to various modules.
corbet

Security updates for Friday

4 years 2 months ago
Security updates have been issued by Arch Linux (ceph, chromium, firefox, gitlab, hedgedoc, keycloak, libx11, mariadb, opendmarc, prosody, python-babel, python-flask-security-too, redmine, squid, and vivaldi), Debian (lz4), Fedora (ceph and python-pydantic), and openSUSE (cacti, cacti-spine).
jake

[$] Why RISC-V doesn't (yet) support KVM

4 years 2 months ago
The RISC-V CPU architecture has been gaining prominence for some years; its relatively open nature makes it an attractive platform on which a number of companies have built products. Linux supports RISC-V well, but there is one gaping hole: there is no support for virtualization with KVM, despite the fact that a high-quality implementation exists. A recent attempt to add that support is shining some light on a part of the ecosystem that, it seems, does not work quite as well as one would like.
corbet

Security updates for Thursday

4 years 2 months ago
Security updates have been issued by Fedora (cacti, cacti-spine, exif, firefox, kernel, mariadb, and thunderbird), Mageia (kernel, kernel-linus, and libxml2), openSUSE (exim and jhead), Oracle (slapi-nis and xorg-x11-server), Scientific Linux (slapi-nis and xorg-x11-server), Slackware (libX11), SUSE (djvulibre, fribidi, graphviz, grub2, libass, libxml2, lz4, python-httplib2, redis, rubygem-actionpack-4_2, and xen), and Ubuntu (pillow and python-babel).
jake

Linux Plumbers Conference: Scheduler Microconference Accepted into 2021 Linux Plumbers Conference

4 years 2 months ago

We are pleased to announce that the Scheduler Microconference has been accepted into the 2021 Linux Plumbers Conference! The scheduler is an important part of the Linux kernel, deciding which process gets to run when, where and for how long. With different topologies and workloads, it is no easy task to give the user the best experience possible. Schedulers are one of the most discussed topics on the Linux Kernel Mailing List, but many of these topics need further discussion in a conference format. Indeed, the scheduler microconference has been responsible for progress on many topics.

At last year’s meetup, the Scheduler microconference achieved the following results:

Not only were enhancements made, but the meetup also helped prove that some topics were not feasible and we do not need to spend more time on them.

This year’s topics to be discussed include:

Come and join us in the discussion of controlling what tasks get to run on your machine and when. We hope to see you there!

[$] A bunch of releases from the Pallets projects

4 years 2 months ago
May 11 marked a new major release for the Python-based Flask web microframework project, but Flask 2.0 was only part of the story. While the framework may be the most visible piece, it is one of a small handful of cooperating libraries that provide solutions for various web-development tasks; all are incorporated into the Pallets projects organization. For the first time, all six libraries that make up Pallets were released at the same time and each had a new major version number. In part, that new major version indicated that Python 2 support was being left behind, but there is plenty more that went into the coordinated release.
jake

Security updates for Wednesday

4 years 2 months ago
Security updates have been issued by Fedora (cacti, cacti-spine, exif, and hivex), Red Hat (bash, bind, bluez, brotli, container-tools:rhel8, cpio, curl, dotnet3.1, dotnet5.0, dovecot, evolution, exiv2, freerdp, ghostscript, glibc, GNOME, go-toolset:rhel8, grafana, gssdp and gupnp, httpd:2.4, idm:DL1, idm:DL1 and idm:client, ipa, kernel, kernel-rt, krb5, libdb, libvncserver, libxml2, linux-firmware, mailman:2.1, mingw packages, NetworkManager and libnma, opensc, p11-kit, pandoc, perl, pki-core:10.6 and pki-deps:10.6, poppler and evince, python-cryptography, python-lxml, python-urllib3, python27:2.7, python3, python38:3.8, qt5-qtbase, raptor2, redis:6, rh-mariadb103-mariadb and rh-mariadb103-galera, rust-toolset:rhel8, samba, sane-backends, shim, slapi-nis, spice, spice-vdagent, sqlite, squid:4, sudo, systemd, tigervnc, trousers, unbound, userspace graphics, xorg-x11, and mesa, virt:rhel and virt-devel:rhel, wpa_supplicant, and xorg-x11-server), SUSE (kernel), and Ubuntu (djvulibre, gst-plugins-base1.0, linux-raspi, linux-raspi-5.4, python-pip, and runc).
ris

Upheaval at freenode

4 years 2 months ago
Several readers have alerted us to some serious problems at freenode, which runs an IRC network that is popular in the free-software world. Evidently there has been a change of control within the volunteer-run organization that has led to the resignations of multiple different volunteers, at least in part due to a concern about the personal information of freenode users under the new management. "The freenode resignation FAQ" has collected a bunch of information (and links to even more resignation letters) that may help shed some light on this mess. From the FAQ: "Freenode staff have stepped down. The network that runs at freenode.org/net/com should now be assumed to be under control of a malicious party." In the meantime, many of the volunteers who resigned have formed Libera.Chat to continue the legacy of freenode. LWN will be keeping an eye on the situation, stay tuned ...
jake