Kernel Planet

Matthew Garrett: What the fuck is an SBAT and why does everyone suddenly care

11 months 2 weeks ago
Short version: Secure Boot Advanced Targeting and if that's enough for you you can skip the rest you're welcome.

Long version: When UEFI Secure Boot was specified, everyone involved was, well, a touch naive. The basic security model of Secure Boot is that all the code that ends up running in a kernel-level privileged environment should be validated before execution - the firmware verifies the bootloader, the bootloader verifies the kernel, the kernel verifies any additional runtime loaded kernel code, and now we have a trusted environment to impose any other security policy we want. Obviously people might screw up, but the spec included a way to revoke any signed components that turned out not to be trustworthy: simply add the hash of the untrustworthy code to a variable, and then refuse to load anything with that hash even if it's signed with a trusted key.

Unfortunately, as it turns out, scale. Every Linux distribution that works in the Secure Boot ecosystem generates their own bootloader binaries, and each of them has a different hash. If there's a vulnerability identified in the source code for said bootloader, there's a large number of different binaries that need to be revoked. And, well, the storage available to store the variable containing all these hashes is limited. There's simply not enough space to add a new set of hashes every time it turns out that grub (a bootloader initially written for a simpler time when there was no boot security and which has several separate image parsers and also a font parser and look you know where this is going) has another mechanism for a hostile actor to cause it to execute arbitrary code, so another solution was needed.

And that solution is SBAT. The general concept behind SBAT is pretty straightforward. Every important component in the boot chain declares a security generation that's incorporated into the signed binary. When a vulnerability is identified and fixed, that generation is incremented. An update can then be pushed that defines a minimum generation - boot components will look at the next item in the chain, compare its name and generation number to the ones stored in a firmware variable, and decide whether or not to execute it based on that. Instead of having to revoke a large number of individual hashes, it becomes possible to push one update that simply says "Any version of grub with a security generation below this number is considered untrustworthy".
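
For a concrete picture, the generation data lives as a small CSV blob in a .sbat section of each signed binary, one line per component: name, generation, vendor, package, version, URL. An illustrative (not verbatim) set of entries for a distribution's grub build might look like:

sbat,1,SBAT Version,sbat,1,https://github.com/rhboot/shim/blob/main/SBAT.md
grub,3,Free Software Foundation,grub,2.06,https://www.gnu.org/software/grub/
grub.distro,3,Some Distro,grub2,2.06-42,https://example.com/grub2

A single revocation update saying "grub below generation 3 is untrustworthy" then covers every distribution's grub binaries at once, regardless of their hashes.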

So why is this suddenly relevant? SBAT was developed collaboratively between the Linux community and Microsoft, and Microsoft chose to push a Windows update that told systems not to trust versions of grub with a security generation below a certain level. This was because those versions of grub had genuine security vulnerabilities that would allow an attacker to compromise the Windows secure boot chain, and we've seen real world examples of malware wanting to do that (Black Lotus did so using a vulnerability in the Windows bootloader, but a vulnerability in grub would be just as viable for this). Viewed purely from a security perspective, this was a legitimate thing to want to do.

(An aside: the "Something has gone seriously wrong" message that's associated with people having a bad time as a result of this update? That's a message from shim, not any Microsoft code. Shim pays attention to SBAT updates in order to avoid violating the security assumptions made by other bootloaders on the system, so even though it was Microsoft that pushed the SBAT update, it's the Linux bootloader that refuses to run old versions of grub as a result. This is absolutely working as intended)

The problem we've ended up in is that several Linux distributions had not shipped versions of grub with a newer security generation, and so those versions of grub are assumed to be insecure (it's worth noting that grub is signed by individual distributions, not Microsoft, so there's no externally introduced lag here). Microsoft's stated intention was that Windows Update would only apply the SBAT update to systems that were Windows-only, and any dual-boot setups would instead be left vulnerable to attack until the installed distro updated its grub and shipped an SBAT update itself. Unfortunately, as is now obvious, that didn't work as intended and at least some dual-boot setups applied the update and that distribution's Shim refused to boot that distribution's grub.

What's the summary? Microsoft (understandably) didn't want it to be possible to attack Windows by using a vulnerable version of grub that could be tricked into executing arbitrary code and then introduce a bootkit into the Windows kernel during boot. Microsoft did this by pushing a Windows Update that updated the SBAT variable to indicate that known-vulnerable versions of grub shouldn't be allowed to boot on those systems. The distribution-provided Shim first-stage bootloader read this variable, read the SBAT section from the installed copy of grub, realised these conflicted, and refused to boot grub with the "Something has gone seriously wrong" message. This update was not supposed to apply to dual-boot systems, but did anyway. Basically:

1) Microsoft applied an update to systems where that update shouldn't have been applied
2) Some Linux distros failed to update their grub code and SBAT security generation when exploitable security vulnerabilities were identified in grub

The outcome is that some people can't boot their systems. I think there's plenty of blame here. Microsoft should have done more testing to ensure that dual-boot setups could be identified accurately. But also distributions shipping signed bootloaders should make sure that they're updating those and updating the security generation to match, because otherwise they're shipping a vector that can be used to attack other operating systems and that's kind of a violation of the social contract around all of this.

It's unfortunate that the victims here are largely end users faced with a system that suddenly refuses to boot the OS they want to boot. That should never happen. I don't think asking arbitrary end users whether they want secure boot updates is likely to result in good outcomes, and while I vaguely tend towards UEFI Secure Boot not being something that benefits most end users, it's also a thing you really don't want to discover you want after the fact, so I have sympathy for it being on by default. So I do sympathise with Microsoft's choices here, other than the failed attempt to avoid the update on dual boot systems.

Anyway. I was extremely involved in the implementation of this for Linux back in 2012 and wrote the first prototype of Shim (which is now a massively better bootloader maintained by a wider set of people and that I haven't touched in years), so if you want to blame an individual please do feel free to blame me. This is something that shouldn't have happened, and unless you're either Microsoft or a Linux distribution it's not your fault. I'm sorry.


Matthew Garrett: Client-side filtering of private data is a bad idea

11 months 2 weeks ago
(The issues described in this post have been fixed; I have not exhaustively researched whether any other issues exist)

Feeld is a dating app aimed largely at alternative relationship communities (think "classier Fetlife" for the most part), so unsurprisingly it's fairly popular in San Francisco. Their website makes the claim:

Can people see what or who I'm looking for?
No. You're the only person who can see which genders or sexualities you're looking for. Your curiosity and privacy are always protected.

which is based on you being able to restrict searches to people of specific genders, sexualities, or relationship situations. This sort of claim is one of those things that just sits in the back of my head worrying me, so I checked it out.

First step was to grab a copy of the Android APK (there are multiple sites that scrape them from the Play Store) and run it through apk-mitm - Android apps by default don't trust any additional certificates in the device certificate store, and also frequently implement certificate pinning. apk-mitm pulls apart the apk, looks for known http libraries, disables pinning, and sets the appropriate manifest options for the app to trust additional certificates. Then I set up mitmproxy, installed the cert on a test phone, and installed the app. Now I was ready to start.
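
In outline, the setup was something like this (file names illustrative; apk-mitm and mitmproxy invocations per their documentation):

$ apk-mitm feeld.apk        # produces feeld-patched.apk with pinning disabled
$ mitmproxy                 # listens on port 8080 by default
$ adb install feeld-patched.apk
  (then point the phone's Wi-Fi proxy at the mitmproxy host and install
   mitmproxy's CA certificate on the phone)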

What became immediately clear was that the app was using graphql to query. What was a little more surprising is that it appears to have been implemented such that there's no server state - when browsing profiles, the client requests a batch of profiles along with a list of profiles that the client has already seen. This has the advantage that the server doesn't need to keep track of a session, but also means that queries just keep getting larger and larger the more you swipe. I'm not a web developer, I have absolutely no idea what the tradeoffs are here, so I point this out as a point of interest rather than anything else.

Anyway. For people unfamiliar with graphql, it's basically a way to query a database and define the set of fields you want returned. Let's take the example of requesting a user's profile. You'd provide the profile ID in question, and request their bio, age, rough distance, status, photos, and other bits of data that the client should show. So far so good. But what happens if we request other data?
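
(For reference, a query in this style looks something like the following - the field names are invented for illustration, not Feeld's actual schema.)

query Profile($id: ID!) {
  profile(id: $id) {
    bio
    age
    distance
    status
    photos { url }
  }
}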

graphql supports introspection to request a copy of the database schema, but this feature is optional and was disabled in this case. Could I find this data anywhere else? Pulling apart the apk revealed that it's a React Native app - effectively a framework that allows writing native apps in Javascript. Sometimes you'll be lucky and find the actual Javascript source there, but these days it's more common to find Hermes blobs. Fortunately hermes-dec exists and does a decent job of recovering something that approximates the original input, and from this I was able to find various lists of database fields.

So, remember that original FAQ statement, that your desires would never be shown to anyone else? One of the fields mentioned in the app was "lookingFor", a field that wasn't present in the default profile query. What happens if we perform the incredibly complicated hack of exporting a profile query as a curl statement, add "lookingFor" into the set of requested fields, and run it?

Oops.

So, point 1 is that you can't simply protect data by having your client not ask for it - private data must simply never be released to the client. But there was a whole separate class of issue that was even more obvious.

Looking more closely at the profile data returned, I noticed that there were fields there that weren't being displayed in the UI. Those included things like "ageRange", the range of ages that the profile owner was interested in, and also whether the profile owner had already "liked" or "disliked" your profile (which means a bunch of the profiles you see may already have turned you down, but the app simply didn't show that). This isn't ideal, but what was more concerning was that profiles that were flagged as hidden were still being sent to the app and then just not displayed to the user. Another example of this is that the app supports associating your profile with profiles belonging to partners - if one of those profiles was then hidden, the app would stop showing the partnership, but was still providing the profile ID in the query response and querying that ID would still show the hidden profile contents.

Reporting this was inconvenient. There was no security contact listed on the website or in the app. I ended up finding Feeld's head of trust and safety on Linkedin, paying for a month of Linkedin Pro, and messaging them that way. I was then directed towards a HackerOne program with a link to terms and conditions that 404ed, and it took a while to convince them I was uninterested in signing up to a program without explicit terms and conditions. Finally I was just asked to email security@, and successfully got in touch. I heard nothing back, but after prompting was told that the issues were fixed - I then looked some more, found another example of the same sort of issue, and eventually that was fixed as well. I've now been informed that work has been done to ensure that this entire class of issue has been dealt with, but I haven't done any significant amount of work to ensure that that's the case.

You can't trust clients. You can't give them information and assume they'll never show it to anyone. You can't put private data in a database with no additional acls and just rely on nobody ever asking for it. You also can't find a single instance of this sort of issue and fix it without verifying that there aren't other examples of the same class. I'm glad that Feeld engaged with me earnestly and fixed these issues, and I really do hope that this has altered their development model such that it's not something that comes up again in future.

(Edit to add: as far as I can tell, pictures tagged as "private" which are only supposed to be visible if there's a match were appropriately protected, and while there is a "location" field that contains latitude and longitude this appears to only return 0 rather than leaking precise location. I also saw no evidence that email addresses, real names, or any billing data was leaked in any way)


Lucas De Marchi: Linux module dependencies

11 months 3 weeks ago

With the imminent release of kmod 33, I thought it’d be good to have a post about the different types of module dependencies that we have in the Linux kernel and kmod. The new version adds another type, weak dependency, which, as the name implies, is the weakest of all. But let’s first revisit the other types.


Hard (symbol) dependency

This is the first dependency type that ever appeared in kmod (and module-init-tools). A hard (or, as some call it, “symbol”) dependency occurs when your module calls or uses an exported symbol of another module. The most common way is by calling a function that is exported by another module. Example: xe.ko calls the function ttm_bo_pin(), which is provided and exported by another module, ttm.ko. Looking at the source:

$ nm build/drivers/gpu/drm/ttm/ttm.ko | grep -e "\bttm_bo_pin\b"
0000000000000bc0 T ttm_bo_pin
$ nm build64/drivers/gpu/drm/xe/xe.ko | grep -e "\bttm_bo_pin\b"
                 U ttm_bo_pin

It is not possible to insert the xe.ko module before ttm.ko, and if you try via insmod (which doesn’t handle dependencies) it will fail, with the kernel complaining that the ttm_bo_pin symbol is undefined.

The manual invocations of nm illustrate what the depmod tool does: it opens the modules and reads the ELF headers, taking note of all the symbols required and provided by each module and creating a graph of symbol dependencies. Ultimately that leads to module dependencies: xe ➛ ttm. This is recorded in the modules.dep file and its sibling modules.dep.bin. The former is human-readable and the latter is used by libkmod, but they contain the same information: all dependencies for all the modules. Also note that each line reflects indirect dependencies: if module A calls a symbol from B and B calls a symbol from C, then A depends on both B and C. Real world example:

$ cat /lib/modules/$(uname -r)/modules.dep | grep kernel/drivers/gpu/drm/xe/xe.ko.zst:
kernel/drivers/gpu/drm/xe/xe.ko.zst: kernel/drivers/gpu/drm/drm_gpuvm.ko.zst kernel/drivers/gpu/drm/drm_exec.ko.zst kernel/drivers/gpu/drm/scheduler/gpu-sched.ko.zst kernel/drivers/gpu/drm/drm_buddy.ko.zst kernel/drivers/i2c/algos/i2c-algo-bit.ko.zst kernel/drivers/gpu/drm/drm_suballoc_helper.ko.zst kernel/drivers/gpu/drm/drm_ttm_helper.ko.zst kernel/drivers/gpu/drm/ttm/ttm.ko.zst kernel/drivers/gpu/drm/display/drm_display_helper.ko.zst kernel/drivers/media/cec/core/cec.ko.zst kernel/drivers/acpi/video.ko.zst kernel/drivers/platform/x86/wmi.ko.zst

So the xe.ko (with .zst extension since it’s compressed) directly or indirectly depends on drm_gpuvm.ko, drm_exec.ko, gpu-sched.ko, drm_buddy.ko, i2c-algo-bit.ko, drm_suballoc_helper.ko, drm_ttm_helper.ko, ttm.ko, drm_display_helper.ko, cec.ko, video.ko, wmi.ko. Same information, but using the kmod tools rather than looking at the raw index:

$ modinfo -F depends xe
drm_display_helper,ttm,drm_gpuvm,drm_suballoc_helper,video,drm_buddy,drm_exec,drm_ttm_helper,gpu-sched,cec,i2c-algo-bit


Soft dependency

There are situations in the kernel where it’s not possible or desirable to use a symbol directly from another module - the modules may interact by registering in a subsystem, scanning a bus, etc. In those cases depmod doesn’t have enough information in the ELF file to detect the relationship. Yet the user would have more complete support if both modules were available - a missing module may even cause failures visible to the end user, not just partial support for some features.

The softdep implementation contains 2 parts: pre and post dependencies. Post dependencies, which instruct kmod to load another module after loading the target one, are not used very much in practice.

They can come from a configuration file, e.g. /etc/modprobe.d/foo.conf, or from the kernel itself by embedding that info in the module. In the kernel source this is achieved with the MODULE_SOFTDEP() macro. Example:

lib/libcrc32c.c:MODULE_SOFTDEP("pre: crc32c");
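
The equivalent declaration in a configuration file uses the softdep directive; the second line below shows the general form with illustrative module names:

softdep libcrc32c pre: crc32c
softdep foo pre: bar baz post: qux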

When libkmod loads a module, it loads, in order:

  1. hard dependencies
  2. soft pre dependencies
  3. target module
  4. soft post dependencies

Historically softdeps were also a way to (mostly) get rid of install rules, in which the configuration instructs libkmod to execute an arbitrary command instead of loading the module - people would add an install rule to execute something and then call modprobe again with --ignore-install to fake a dependency, as in the example below. That could easily lead to a runtime loop, which is avoided with softdep since kmod can (and does) check for loops.
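
For reference, such an install rule in modprobe.d looked something like this (the module name and the extra command are illustrative):

install foo /usr/bin/foo-setup ; /sbin/modprobe --ignore-install foo

softdep expresses the same intent declaratively, letting kmod order the loads and detect cycles itself.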


Weak dependency

After explaining the other types of dependencies, back to the new addition in kmod 33. Weak dependencies are very similar to pre softdeps: they come either from a configuration file or embedded in the module, and they express a dependency whose absence wouldn’t cause the target module to fail to load, but might cause its initialization to expose fewer features or fail. There is one important difference: weak dependencies don’t cause libkmod to actually load the module. Rather, the dependency information may be used by tools responsible for assembling an initrd, such as dracut, to make the module available, since it may or may not be needed. Why are they called “weak”? The terminology is borrowed from “weak symbols”: a weak symbol is there, waiting to be used, but it may or may not be, with the final decision happening in the final link stage.
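
Assuming the declaration syntax mirrors softdep - check the modprobe.d(5) man page shipped with kmod 33 for the authoritative form - a weak dependency would look something like this (module names illustrative):

weakdep foo foo-helper

with the in-kernel counterpart being the MODULE_WEAKDEP() macro.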

With weak dependencies, some of the pre softdeps embedded in the kernel may hopefully be replaced: if the target module is already doing a request_module() or in some other way getting the other module loaded, it doesn’t need a softdep that would serialize the module load order and possibly load more modules than required.

Pete Zaitcev: Fedora Panic Canceled

1 year ago

The other day I was watching a video by Rich Jones about Fedora on RISC-V. In it, he mentions off-hand that CentOS Stream 10 inherits from Fedora 40.

I don't know how that happened anymore, but previously someone made me think that 1. there will be no more numbered releases of CentOS, which is why it is called "CentOS Stream" now, and 2. CentOS Stream is now the upstream of RHEL, replacing Fedora. I really was concerned for the future of Fedora, which was superfluous in that arrangement, you know!

But apparently, Fedora is still the trunk upstream, and CentOS Stream is only named like that. Nothing changes except CentOS is no longer a clone of RHEL, but instead RHEL is a clone of CentOS. What was all the panic for?

I made a VM at Kamatera a few months ago, and they didn't even have Fedora among their images. I ended up using Rocky.

Brendan Gregg: No More Blue Fridays

1 year ago

In the future, computers will not crash due to bad software updates, even those updates that involve kernel code. In the future, these updates will push eBPF code.

Friday July 19th provided an unprecedented example of the inherent dangers of kernel programming, and has been called the largest outage in the history of information technology. Windows computers around the world encountered blue-screens-of-death and boot loops, causing outages for hospitals, airlines, banks, grocery stores, media broadcasters, and more. This was caused by a config update from a security company for their widely used product, which includes a kernel driver on Windows systems. The update caused the kernel driver to try to read invalid memory, an error type that will crash the kernel.

For Linux systems, the company behind this outage was already in the process of adopting eBPF, which is immune to such crashes. Once Microsoft's eBPF support for Windows becomes production-ready, Windows security software can be ported to eBPF as well. These security agents will then be safe and unable to cause a Windows kernel crash.

eBPF (no longer an acronym) is a secure kernel execution environment, similar to the secure JavaScript runtime built into web browsers. If you're using Linux, you likely already have eBPF available on your systems whether you know it or not, as it was included in the kernel several years ago. eBPF programs cannot crash the entire system because they are safety-checked by a software verifier and are effectively run in a sandbox. If the verifier finds any unsafe code, the program is rejected and not executed. The verifier is rigorous -- the Linux implementation has over 20,000 lines of code -- with contributions from industry (e.g., Meta, Isovalent, Google) and academia (e.g., Rutgers University, University of Washington). The safety this provides is a key benefit of eBPF, along with heightened security and lower resource usage.
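
To make that concrete, here is a minimal sketch (in libbpf-style C; illustrative, not from any vendor's product) of the kind of proof the verifier demands before it will allow a memory access:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_example(struct xdp_md *ctx)
{
        void *data = (void *)(long)ctx->data;
        void *data_end = (void *)(long)ctx->data_end;

        /* Without this explicit bounds check, the verifier rejects the
         * program at load time -- it never gets a chance to crash. */
        if (data + 14 > data_end)
                return XDP_PASS;

        /* ...safe to inspect the 14-byte Ethernet header here... */
        return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";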

Some eBPF-based security startups (e.g., Oligo, Uptycs) have made their own statements about the recent outage, and the advantages of migrating to eBPF. Larger tech companies are also adopting eBPF for security. As an example, Cisco acquired the eBPF-startup Isovalent and has announced a new eBPF security product: Cisco Hypershield, a fabric for security enforcement and monitoring. Google and Meta already rely on eBPF to detect and stop bad actors in their fleet, thanks to eBPF's speed, deep visibility, and safety guarantees. Beyond security, eBPF is also used for networking and observability.

The worst thing an eBPF program can do is merely consume more resources than is desirable, such as CPU cycles and memory. eBPF cannot prevent developers from writing poor code -- wasteful code -- but it will prevent the serious issues that cause a system to crash. That said, as a new technology eBPF has had some bugs in its management code, including a Linux kernel panic discovered by the same security company in the news today. This doesn't mean eBPF has solved nothing, merely substituting its own bugs for a vendor's: fixing these bugs in eBPF means fixing them for all eBPF vendors, and more quickly improving security for everyone.

There are other ways to reduce risks during software deployment that can be employed as well: canary testing, staged rollouts, and "resilience engineering" in general. What's important about the eBPF method is that it is a software solution that will be available in both Linux and Windows kernels by default, and has already been adopted for this use case.

If your company is paying for commercial software that includes kernel drivers or kernel modules, you can make eBPF a requirement. It's possible for Linux today, and Windows soon. While some vendors have already proactively adopted eBPF (thank you), others might need a little encouragement from their paying customers. Please help raise awareness, and together we can make such global outages a lesson of the past.

Authors: Brendan Gregg, Intel; Daniel Borkmann, Isovalent; Joe Stringer, Isovalent; KP Singh, Google.

Linux Plumbers Conference: System Boot and Security Microconference CFP

1 year ago

The System Boot and Security Microconference has been a critical platform for enthusiasts and professionals working on firmware, bootloaders, system boot, and security. This year, the conference focuses on the challenges of upstreaming boot process improvements to the Linux kernel. Cryptography, an ever-evolving field, poses unique demands on secure elements and TPMs as newer algorithms are introduced and older ones are deprecated. Additionally, new hardware architectures with DRTM capabilities, such as ARM’s D-RTM specification and the increased use of fTPMs in innovative applications, add to the complexity of the task. This is the fifth time the conference has been held in the last six years.

Trusted Platform Modules (TPMs) for encrypting disks have become widespread across various distributions. This highlights the vital role that TPMs play in ensuring platform security. As the field of confidential computing continues to grow, virtual machine firmware must evolve to meet end-users’ demands, and Linux would have to leverage exposed capabilities to provide relevant security properties. Mechanisms like UEFI Secure Boot that were once limited to OEMs now empower end-users. The System Boot and Security Microconference aims to address these challenges collaboratively and transparently. We welcome talks on the following technologies that can help achieve this goal.

  • TPMs, HSMs, secure elements
  • Roots of Trust: SRTM and DRTM
  • Intel TXT, SGX, TDX
  • AMD SKINIT, SEV
  • ARM DRTM
  • Growing Attestation ecosystem
  • IMA
  • TrenchBoot, tboot
  • TianoCore EDK II (UEFI), SeaBIOS, coreboot, U-Boot, LinuxBoot, hostboot
  • Measured Boot, Verified Boot, UEFI Secure Boot, UEFI Secure Boot Advanced Targeting (SBAT)
  • shim
  • boot loaders: GRUB2, systemd-boot/sd-boot, network boot, PXE, iPXE
  • UKI
  • u-root
  • OpenBMC, u-bmc
  • legal, organizational, and other similar issues relevant to people interested in system boot and security.

If you want to participate in this microconference and have ideas to share, please use the Call for Proposals (CFP) process. Your submissions should focus on new advancements, innovations, and solutions related to firmware, bootloader, and operating system development. It’s essential to explain clearly what will be discussed, why, and what outcomes you expect from the discussion.

Edit: The submission deadline has been updated to July 14th!

Linux Plumbers Conference: Sched-Ext: The BPF extensible scheduler class Microconference CFP

1 year ago

sched_ext is a Linux kernel feature which enables implementing host-wide, safe kernel thread schedulers in BPF, and dynamically loading them at runtime. sched_ext enables safe and rapid iterations of scheduler implementations, thus radically widening the scope of scheduling strategies that can be experimented with and deployed, even in massive and complex production environments.

sched_ext was first sent to the upstream list as an RFC patch set back in November 2022. Since then, the project has evolved a great deal, both technically, as well as in the significant growth of the community of sched_ext users and contributors.

This MC is the space for the community to discuss the developments of sched_ext, its impact on the community, and to outline future strategies aimed at integrating this feature into the Linux kernel and mainstream Linux distributions.

Ideas of topics to be discussed include (but are not limited to):

  • Challenges and plans to facilitate the upstream merge of sched_ext
  • User-space scheduling (offload part / all of the scheduling from kernel to user-space)
  • Scheduling for gaming and latency-sensitive workloads
  • Scheduling & cpufreq integration
  • Distro support

While we anticipate having a schedule with existing talk proposals at the MC, we invite you to submit proposals for any topic(s) you’d like to discuss. Time permitting, we are happy to readjust the schedule for additional topics that are of relevance to the sched_ext community.

Submissions are made via the LPC submission system, selecting the track Sched-Ext: The BPF extensible scheduler class.

We will consider the submissions until July 12th.

Linux Plumbers Conference: In-person registration is sold out

1 year ago

This year it took us a bit more time, but we did run out of places and the conference is currently sold out for in-person registration.
We are setting up a waitlist for in-person registration (virtual attendee places are still available).
Please fill in this form and try to be clear about your reasons for wanting to attend.
We are giving waitlist priority to new attendees and people expected to contribute content.

Linux Plumbers Conference: Rust Microconference CFP

1 year 1 month ago

The Rust Microconference returns this year. It covers both Rust in the kernel and Rust in general.

The submission deadline is July 14th. Submissions are made via the LPC submission system, selecting Rust MC for Track. Please see The Ideal Microconference Topic Session as well.

Possible Rust for Linux topics:

  • Rust in the kernel (e.g. status update, next steps).
  • Use cases for Rust around the kernel (e.g. subsystems, drivers, other modules…).
  • Discussions on how to abstract existing subsystems safely, on API design, on coding guidelines.
  • Integration with kernel systems and other infrastructure (e.g. build system, documentation, testing and CIs, maintenance, unstable
    features, architecture support, stable/LTS releases, Rust versioning, third-party crates).
  • Updates on its subprojects (e.g. klint, pinned-init).
  • Rust versioning requirements and using Linux distributions’ toolchains.

Possible Rust topics:

  • Language and standard library (e.g. upcoming features, stabilization of the remaining features the kernel needs, memory model).
  • Compilers and codegen (e.g. rustc improvements, LLVM and Rust, rustc_codegen_gcc, gccrs).
  • Other tooling and new ideas (Coccinelle for Rust, bindgen, Compiler Explorer, Cargo, Clippy, Miri).
  • Educational material.
  • Any other Rust topic within the Linux ecosystem.

Hope to see you there!

Linux Plumbers Conference: In memory of Daniel Bristot de Oliveira

1 year 1 month ago

It is with great sadness that we share that on June 24th, 2024 we lost a great contributor to the Linux Plumbers Conference and the whole of Linux generally. Daniel Bristot de Oliveira passed away unexpectedly at the age of 37. Daniel has been an active participant of Linux Plumbers since 2017. Not only has he given numerous talks, which were extremely educational, he also took on leadership roles in running Microconferences. He was this year’s main Microconference runner for both the Scheduler Microconference as well as the Real-Time Microconference. This year’s conference will be greatly affected by his absence. Many have stated how Daniel made them feel welcomed at Linux Plumbers. He always had a smile, would make jokes and help developers come to a conclusion on controversial topics. He perfectly embodied the essence of what Linux Plumbers was all about. He will be missed.

Linux Plumbers Conference: Tracing / Perf Events Microconference CFP

1 year 1 month ago

The Linux kernel has grown in complexity over the years. Complete understanding of how it works via code inspection has become virtually impossible. Today, tracing is used to follow the kernel as it performs its complex tasks. Tracing is used today for much more than simply debugging. Its framework has become the way for other parts of the Linux kernel to enhance and even make possible new features. Live kernel patching is based on the infrastructure of function tracing, as well as BPF. It is now even possible to model the behavior and correctness of the system via runtime verification which attaches to trace points. There is still much more that is happening in this space, and this microconference will be the forum to explore current and new ideas.

This year, focus will also be on perf events:

Perf events are a mechanism for presenting performance counters and Linux software events to users. There are kernel and userland components to perf events. The kernel supplies the APIs and the perf tooling presents the data to users.

Possible ideas for topics for this year’s conference:

  • Feedback about the tracing/perf subsystems overall (e.g. how can people help the maintainers).
  • Reboot-persistent in-memory tracing buffers; this would make ftrace a very powerful debugging and performance analysis tool for kexec and could also be used for post-crash debugging.
  • Exposing enum names dynamically to user space to improve symbolic printing.
  • Userspace instrumentation (libside), including discussion of its impacts on the User events ABI.
  • Collect state dump events from kernel drivers (e.g. dump wifi interfaces configuration at a given point in time through trace buffers).
  • Current work implementing performance monitoring in the kernel
  • User land profiling and analysis tools using the perf event API
  • Improving the kernel perf event and PMU APIs
  • Interaction between perf events and subsystems like cgroups, kvm, drm, bpf, etc.
  • Improving the perf tool and its interfaces, in particular w.r.t. scalability of the tool
  • Implementation of new perf features and tools using eBPF, like the ones in tools/perf/util/bpf_skel/
  • Further use of type information to augment the perf tools
  • Novel uses of perf events for debugging and correctness
  • New challenges in performance monitoring for the Linux kernel
  • Regression testing/CI integration for the perf kernel infrastructure and tools
  • Improving documentation
  • Security aspects of using tracing/perf tools

The submission deadline has been updated to July 12th.

Come and join us in the discussion, we hope to see you there!

Please follow the suggestions from this BLOG post when submitting a CFP for this track.

Submissions are made via the LPC submission system, selecting Track “Tracing / Perf events MC”.

Gustavo A. R. Silva: How to use the new counted_by attribute in C (and Linux)

1 year 1 month ago

The counted_by attribute

The counted_by attribute was introduced in Clang-18 and will soon be available in GCC-15. Its purpose is to associate a flexible-array member with a struct member that will hold the number of elements in this array at some point at run-time. This association is critical for enabling runtime bounds checking via the array bounds sanitizer and the __builtin_dynamic_object_size() built-in function. In user-space, this extra level of security is enabled by -D_FORTIFY_SOURCE=3. Therefore, using this attribute correctly enhances C codebases with runtime bounds-checking coverage on flexible-array members.

Here is an example of a flexible array annotated with this attribute:

struct bounded_flex_struct {
	...
	size_t count;
	struct foo flex_array[] __attribute__((__counted_by__(count)));
};

In the above example, count is the struct member that will hold the number of elements of the flexible array at run-time. We will call this struct member the counter.

In the Linux kernel, this attribute facilitates bounds-checking coverage through fortified APIs such as the memcpy() family of functions, which internally use __builtin_dynamic_object_size() (CONFIG_FORTIFY_SOURCE), as well as through the array-bounds sanitizer (CONFIG_UBSAN_BOUNDS).
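
To see the attribute at work outside the kernel, here is a minimal user-space sketch (my own illustrative example, assuming Clang 18 or newer; build with -D_FORTIFY_SOURCE=3 and/or -fsanitize=array-bounds to enable the checks):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct flex {
	size_t count;
	int data[] __attribute__((__counted_by__(count)));
};

int main(void)
{
	struct flex *f = malloc(sizeof(*f) + 8 * sizeof(int));

	if (!f)
		return 1;

	f->count = 8; /* the counter must be set before data[] is referenced */
	memset(f->data, 0, 8 * sizeof(int));

	/* Thanks to counted_by, the compiler can answer this precisely:
	 * prints 32 (8 elements * 4 bytes) instead of "unknown". */
	printf("%zu\n", __builtin_dynamic_object_size(f->data, 1));

	free(f);
	return 0;
}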

The __counted_by() macro

In the kernel we wrap the counted_by attribute in the __counted_by() macro, as shown below.

#if __has_attribute(__counted_by__)
# define __counted_by(member)	__attribute__((__counted_by__(member)))
#else
# define __counted_by(member)
#endif
  • c8248faf3ca27 (“Compiler Attributes: counted_by: Adjust name…”)

And with this we have been annotating flexible-array members across the whole kernel tree over the last year.

diff --git a/drivers/net/ethernet/chelsio/cxgb4/sched.h b/drivers/net/ethernet/chelsio/cxgb4/sched.h
index 5f8b871d79afac..6b3c778815f09e 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sched.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/sched.h
@@ -82,7 +82,7 @@ struct sched_class {
 struct sched_table {		/* per port scheduling table */
 	u8 sched_size;
-	struct sched_class tab[];
+	struct sched_class tab[] __counted_by(sched_size);
 };
  • ceba9725fb45 (“cxgb4: Annotate struct sched_table with …”)

However, as we are about to see, not all __counted_by() annotations are as straightforward as the one above.

__counted_by() annotations in the kernel

There are a number of requirements to properly use the counted_by attribute. One crucial requirement is that the counter must be initialized before the first reference to the flexible-array member. Another requirement is that the array must always contain at least as many elements as indicated by the counter. Below you can see an example of a kernel patch addressing these requirements.

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fweh.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fweh.c
index dac7eb77799bd1..68960ae9898713 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fweh.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fweh.c
@@ -33,7 +33,7 @@ struct brcmf_fweh_queue_item {
 	u8 ifaddr[ETH_ALEN];
 	struct brcmf_event_msg_be emsg;
 	u32 datalen;
-	u8 data[];
+	u8 data[] __counted_by(datalen);
 };

 /*
@@ -418,17 +418,17 @@ void brcmf_fweh_process_event(struct brcmf_pub *drvr,
 	    datalen + sizeof(*event_packet) > packet_len)
 		return;

-	event = kzalloc(sizeof(*event) + datalen, gfp);
+	event = kzalloc(struct_size(event, data, datalen), gfp);
 	if (!event)
 		return;

+	event->datalen = datalen;
 	event->code = code;
 	event->ifidx = event_packet->msg.ifidx;

 	/* use memcpy to get aligned event message */
 	memcpy(&event->emsg, &event_packet->msg, sizeof(event->emsg));
 	memcpy(event->data, data, datalen);
-	event->datalen = datalen;
 	memcpy(event->ifaddr, event_packet->eth.h_dest, ETH_ALEN);

 	brcmf_fweh_queue_event(fweh, event);
  • 62d19b358088 (“wifi: brcmfmac: fweh: Add __counted_by…”)

In the patch above, datalen is the counter for the flexible-array member data. Notice how the assignment to the counter, event->datalen = datalen, had to be moved to before the call to memcpy(event->data, data, datalen); this ensures the counter is initialized before the first reference to the flexible array. Otherwise, the compiler would complain about trying to write into a flexible array of size zero, due to datalen being zeroed out by a previous call to kzalloc(). This assignment-after-memcpy pattern has been quite common in the Linux kernel. However, when dealing with counted_by annotations, it should be changed. Therefore, we have to be careful when making these annotations. We should audit all instances of code that reference both the counter and the flexible array and ensure they meet the proper requirements.

In the kernel, we’ve been learning from our mistakes and have fixed some buggy annotations we made in the beginning. Here are a couple of bugfixes to make you aware of these issues:

  • 6dc445c19050 (“clk: bcm: rpi: Assign ->num before accessing…”)
  • 9368cdf90f52 (“clk: bcm: dvp: Assign ->num before accessing…”)

Another common issue is when the counter is updated inside a loop. See the patch below.

diff --git a/drivers/net/wireless/ath/wil6210/cfg80211.c b/drivers/net/wireless/ath/wil6210/cfg80211.c
index 8993028709ecfb..e8f1d30a8d73c5 100644
--- a/drivers/net/wireless/ath/wil6210/cfg80211.c
+++ b/drivers/net/wireless/ath/wil6210/cfg80211.c
@@ -892,10 +892,8 @@ static int wil_cfg80211_scan(struct wiphy *wiphy,
 	struct wil6210_priv *wil = wiphy_to_wil(wiphy);
 	struct wireless_dev *wdev = request->wdev;
 	struct wil6210_vif *vif = wdev_to_vif(wil, wdev);
-	struct {
-		struct wmi_start_scan_cmd cmd;
-		u16 chnl[4];
-	} __packed cmd;
+	DEFINE_FLEX(struct wmi_start_scan_cmd, cmd,
+		    channel_list, num_channels, 4);
 	uint i, n;
 	int rc;
@@ -977,9 +975,8 @@ static int wil_cfg80211_scan(struct wiphy *wiphy,
 	vif->scan_request = request;
 	mod_timer(&vif->scan_timer, jiffies + WIL6210_SCAN_TO);

-	memset(&cmd, 0, sizeof(cmd));
-	cmd.cmd.scan_type = WMI_ACTIVE_SCAN;
-	cmd.cmd.num_channels = 0;
+	cmd->scan_type = WMI_ACTIVE_SCAN;
+	cmd->num_channels = 0;
 	n = min(request->n_channels, 4U);
 	for (i = 0; i < n; i++) {
 		int ch = request->channels[i]->hw_value;
@@ -991,7 +988,8 @@ static int wil_cfg80211_scan(struct wiphy *wiphy,
 			continue;
 		}
 		/* 0-based channel indexes */
-		cmd.cmd.channel_list[cmd.cmd.num_channels++].channel = ch - 1;
+		cmd->num_channels++;
+		cmd->channel_list[cmd->num_channels - 1].channel = ch - 1;
 		wil_dbg_misc(wil, "Scan for ch %d : %d MHz\n", ch,
 			     request->channels[i]->center_freq);
 	}
@@ -1007,16 +1005,15 @@ static int wil_cfg80211_scan(struct wiphy *wiphy,
 	if (rc)
 		goto out_restore;

-	if (wil->discovery_mode && cmd.cmd.scan_type == WMI_ACTIVE_SCAN) {
-		cmd.cmd.discovery_mode = 1;
+	if (wil->discovery_mode && cmd->scan_type == WMI_ACTIVE_SCAN) {
+		cmd->discovery_mode = 1;
 		wil_dbg_misc(wil, "active scan with discovery_mode=1\n");
 	}

 	if (vif->mid == 0)
 		wil->radio_wdev = wdev;

 	rc = wmi_send(wil, WMI_START_SCAN_CMDID, vif->mid,
-		      &cmd, sizeof(cmd.cmd) +
-		      cmd.cmd.num_channels * sizeof(cmd.cmd.channel_list[0]));
+		      cmd, struct_size(cmd, channel_list, cmd->num_channels));

 out_restore:
 	if (rc) {
diff --git a/drivers/net/wireless/ath/wil6210/wmi.h b/drivers/net/wireless/ath/wil6210/wmi.h
index 71bf2ae27a984f..b47606d9068c8b 100644
--- a/drivers/net/wireless/ath/wil6210/wmi.h
+++ b/drivers/net/wireless/ath/wil6210/wmi.h
@@ -474,7 +474,7 @@ struct wmi_start_scan_cmd {
 	struct {
 		u8 channel;
 		u8 reserved;
-	} channel_list[];
+	} channel_list[] __counted_by(num_channels);
 } __packed;

 #define WMI_MAX_PNO_SSID_NUM (16)
  • 34c34c242a1b (“wifi: wil6210: cfg80211: Use __counted_by…”)

The patch above does a bit more than merely annotating the flexible array with the __counted_by() macro, but that’s material for a future post. For now, let’s focus on the following excerpt.

-	cmd.cmd.scan_type = WMI_ACTIVE_SCAN;
-	cmd.cmd.num_channels = 0;
+	cmd->scan_type = WMI_ACTIVE_SCAN;
+	cmd->num_channels = 0;
 	n = min(request->n_channels, 4U);
 	for (i = 0; i < n; i++) {
 		int ch = request->channels[i]->hw_value;
@@ -991,7 +988,8 @@ static int wil_cfg80211_scan(struct wiphy *wiphy,
 			continue;
 		}
 		/* 0-based channel indexes */
-		cmd.cmd.channel_list[cmd.cmd.num_channels++].channel = ch - 1;
+		cmd->num_channels++;
+		cmd->channel_list[cmd->num_channels - 1].channel = ch - 1;
 		wil_dbg_misc(wil, "Scan for ch %d : %d MHz\n", ch,
 			     request->channels[i]->center_freq);
 	}
...
--- a/drivers/net/wireless/ath/wil6210/wmi.h
+++ b/drivers/net/wireless/ath/wil6210/wmi.h
@@ -474,7 +474,7 @@ struct wmi_start_scan_cmd {
 	struct {
 		u8 channel;
 		u8 reserved;
-	} channel_list[];
+	} channel_list[] __counted_by(num_channels);
 } __packed;

Notice that in this case, num_channels is our counter, and it’s set to zero before the for loop. Inside the for loop, the original code used this variable as an index to access the flexible array, then updated it via a post-increment, all in one line: cmd.cmd.channel_list[cmd.cmd.num_channels++]. The issue is that once channel_list is annotated with the __counted_by() macro, the compiler enforces that dynamic indexing of channel_list stays below num_channels. Since num_channels holds a value of zero at the moment of the array access, this leads to undefined behavior and may trigger a compiler warning.

As shown in the patch, the solution is to increment num_channels before accessing the array, and then access the array through an index adjustment below num_channels.

Another option is to avoid using the counter as an index for the flexible array altogether. This can be done by using an auxiliary variable instead. See an excerpt of a patch below.

diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h
index 38eb7ec86a1a65..21ebd70f3dcc97 100644
--- a/include/net/bluetooth/hci.h
+++ b/include/net/bluetooth/hci.h
@@ -2143,7 +2143,7 @@ struct hci_cp_le_set_cig_params {
 	__le16 c_latency;
 	__le16 p_latency;
 	__u8 num_cis;
-	struct hci_cis_params cis[];
+	struct hci_cis_params cis[] __counted_by(num_cis);
 } __packed;
@@ -1722,34 +1717,33 @@ static int hci_le_create_big(struct hci_conn *conn, struct bt_iso_qos *qos)
 static int set_cig_params_sync(struct hci_dev *hdev, void *data)
 {
 	...
+	u8 aux_num_cis = 0;
 	u8 cis_id;
 	...
 	for (cis_id = 0x00; cis_id < 0xf0 &&
-	     pdu.cp.num_cis < ARRAY_SIZE(pdu.cis); cis_id++) {
+	     aux_num_cis < pdu->num_cis; cis_id++) {
 		struct hci_cis_params *cis;

 		conn = hci_conn_hash_lookup_cis(hdev, NULL, 0, cig_id, cis_id);
@@ -1758,7 +1752,7 @@ static int set_cig_params_sync(struct hci_dev *hdev, void *data)
 		qos = &conn->iso_qos;
-		cis = &pdu.cis[pdu.cp.num_cis++];
+		cis = &pdu->cis[aux_num_cis++];
 		cis->cis_id = cis_id;
 		cis->c_sdu = cpu_to_le16(conn->iso_qos.ucast.out.sdu);
 		cis->p_sdu = cpu_to_le16(conn->iso_qos.ucast.in.sdu);
@@ -1769,14 +1763,14 @@ static int set_cig_params_sync(struct hci_dev *hdev, void *data)
 		cis->c_rtn = qos->ucast.out.rtn;
 		cis->p_rtn = qos->ucast.in.rtn;
 	}
+	pdu->num_cis = aux_num_cis;
 	...
  • ea9e148c803b (“Bluetooth: hci_conn: Use __counted_by() and…”)

Again, the entire patch does more than merely annotate the flexible-array member, but let’s just focus on how aux_num_cis is used to access the flexible array pdu->cis[].

In this case, the counter is num_cis. As in our previous example, originally, the counter is used to directly access the flexible array: &pdu.cis[pdu.cp.num_cis++]. However, the patch above introduces a new variable aux_num_cis to be used instead of the counter: &pdu->cis[aux_num_cis++]. The counter is then updated after the loop: pdu->num_cis = aux_num_cis.

Both solutions are acceptable, so use whichever is convenient for you.

Here, you can see a recent bugfix for some buggy annotations that missed the details discussed above:

  • [PATCH] wifi: iwlwifi: mvm: Fix _counted_by usage in cfg80211_wowlan_nd*

In a future post, I’ll address the issue of annotating flexible arrays of flexible structures. Spoiler alert: don’t do it!

Linux Plumbers Conference: Microconference topic submissions deadlines are coming soon!

1 year 1 month ago

We are excited about the submissions that are coming in to Linux Plumbers 2024. If you want to discuss a topic at one of the Microconferences, you should start putting together a problem statement and submit. Each Microconference has its own defined deadline. To submit, go to the Call for Proposals page and select Submit new abstract. After filling out your problem statement in the Content section, make sure to select the proper Microconference in the Track pull down list. It is recommended to read this blog before writing up your submission.

And a reminder that the other tracks submissions are ending soon as well.

Matthew Garrett: SSH agent extensions as an arbitrary RPC mechanism

1 year 1 month ago
A while back, I wrote about using the SSH agent protocol to satisfy WebAuthn requests. The main problem with this approach is that it required starting the SSH agent with a special argument and also involved being a little too friendly with the implementation - things worked because I could provide an arbitrary public key and the implementation never validated that, but it would be legitimate for it to start doing so and then break everything. And it also only worked for keys stored on tokens that ssh supports - there was no way to extend this to other keystores on the client (such as the Secure Enclave on Macs, or TPM-backed keys on PCs). I wanted a better solution.

It turns out that it was far easier than I expected. The ssh agent protocol is documented here, and the interesting part is the extension mechanism. Basically, you can declare an extension and then just tunnel whatever you want over it. As before, my goto was the go ssh agent package which conveniently implements both the client and server side of this. Implementing the local agent is trivial - look up SSH_AUTH_SOCK, connect to it, create a new agent client that can communicate with that by calling NewClient, and then implement the ExtendedAgent interface, create a new socket, and call ServeAgent against that. Most of the ExtendedAgent functions should simply call through to the original agent, with the exception of Extension(). Just add a case statement against extensionType, define some reasonably namespaced extension, and you're done.
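
A minimal sketch of that local proxy agent (error handling elided; the extension name and socket path are made-up placeholders):

package main

import (
	"net"
	"os"

	"golang.org/x/crypto/ssh/agent"
)

type proxyAgent struct {
	agent.ExtendedAgent // every other method falls through to the real agent
}

func (p proxyAgent) Extension(extensionType string, contents []byte) ([]byte, error) {
	if extensionType == "demo@example.com" { // hypothetical namespaced extension
		return append([]byte("pong: "), contents...), nil
	}
	return p.ExtendedAgent.Extension(extensionType, contents)
}

func main() {
	upstream, _ := net.Dial("unix", os.Getenv("SSH_AUTH_SOCK"))
	inner := agent.NewClient(upstream)

	ln, _ := net.Listen("unix", "/tmp/custom-agent.sock")
	for {
		conn, _ := ln.Accept()
		go agent.ServeAgent(proxyAgent{inner}, conn)
	}
}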

Now you need to use this agent. You probably don't want to use this for arbitrary hosts (agent forwarding should only be enabled for remote systems you trust, not arbitrary machines you connect to - if you enabled agent forwarding for github and github got compromised, github would be able to use any private keys loaded into your agent, and you probably don't want that). So the right approach is to add a Host entry to the ssh config with a ForwardAgent stanza pointing at the socket you created in your new agent. This way the configured subset of remote hosts will automatically talk to this new custom agent, while forwarding for anything else will still be at the user's discretion.
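
The relevant ssh_config stanza then looks something like this (hostname and socket path are placeholders; ForwardAgent has accepted an explicit socket path since OpenSSH 8.2):

Host trusted-host.example.com
    ForwardAgent /tmp/custom-agent.sock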

For the remote end things are even easier. Look up SSH_AUTH_SOCK and call NewClient as before, and then simply call client.Extension(). Whatever you stick in the contents argument will simply end up being received at the client end. You now have a communication channel between the remote system and the local client, and what you do with that is up to you. I'm using it to allow a remote system to obtain auth tokens from Okta and forward WebAuthn challenges that can either be satisfied via a local WebAuthn token or by passing the query off to Mac TouchID, but there are fundamentally no constraints whatsoever on what can be done here.

(If you want to do this on Windows and still have everything work with existing clients you'll need to take this into account - Windows didn't really do Unix sockets until recently so everything there is awful)


Linux Plumbers Conference: Registration for LPC 2024 is open

1 year 2 months ago

We’re happy to announce that registration for LPC 2024 is now open. To register please go to our attend page.

To try to prevent the instant sellout we had in previous years we are keeping our cancellation policy of no refunds, only transfers of registrations. You will find more details during the registration process. LPC 2024 follows the Linux Foundation’s health & safety policy.

As usual we expect to sell out rather quickly so don’t delay your registration for too long!

Linux Plumbers Conference: Update on the Microconference situation

1 year 2 months ago

Unfortunately we do not know the total cost of the 4th track yet. We are still in the process of looking at the costs of adding another room, but we do not want to delay the acceptance of topics to Microconferences any further. We have decided to accept all pending Microconferences, with one caveat: not all of them are being accepted as full Microconferences. The Microconferences being accepted now will become one of the following at Linux Plumbers 2024:

  • A full 3 hour Microconference
  • A 1 and a half hour Microconference (Nanoconference)
  • A full 3 hour Microconference but without normal Audio/Video

That last one is another option we are looking at. The main cost to having a 4th track is the manned AV operations. But we could add the 4th track without normal AV. Instead, these would get a BBB room where an Owl video camera (or the like) and a Jabra speaker will be in place. The quality of the AV will not be as good as having a fully manned room, but this would be better than being rejected from the conference, or having half the time of a full microconference.

Even with a 4th track, two still need to become Nanoconferences.

In the mean time, we will be accepting the rest of the Microconferences so that they can start putting together content. How they are presented at Linux Plumbers is still to be determined.

Note that this also means we will likely be dropping the 3 free passes that a Microconference usually gets down to only 2 passes.

The accepted Microconferences (as full, half or no A/V) are:

  • Sched_ext
  • Containers and Checkpoint/Restore
  • Confidential Computing
  • Real-Time
  • Build Systems
  • RISC-V
  • Compute Express Link
  • X86
  • VFIO/IOMMU/PCI
  • System Boot and Security
  • Zone Storage
  • Internet of Things
  • Embedded
  • Complex Cameras
  • Power Management and Thermal Control
  • Kernel <-> Userspace/Init/System Management Boundaries and APIs

Harald Welte: OsmoDevCon 2024: "Using bpftrace to analyze osmocom performance"

1 year 3 months ago

I've presented a talk Using bpftrace to analyze osmocom performance as part of the OsmoDevCon 2024 conference on Open Source Mobile Communications.

bpftrace is a utility that uses the Linux kernel tracing infrastructure (and eBPF) in order to provide tracing capabilities within the kernel, like uprobe, kprobe, tracepoints, etc.

bpftrace can help us to analyze the performance of [unmodified] Osmocom programs and quickly provide information like, for example:

  • Histogram of time spent in a specific system call

  • Histogram of any argument or return value of any system call
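
For a flavour of what this looks like in practice, an illustrative one-liner (not necessarily from the talk) that histograms the latency of read() system calls:

bpftrace -e 'tracepoint:syscalls:sys_enter_read { @start[tid] = nsecs; }
    tracepoint:syscalls:sys_exit_read /@start[tid]/ {
        @usecs = hist((nsecs - @start[tid]) / 1000);
        delete(@start[tid]); }'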

You can find the video recording at https://media.ccc.de/v/osmodevcon2024-203-using-bpftrace-to-analyze-osmocom-performance

Harald Welte: OsmoDevCon 2024: "Introduction to XDP, eBPF and AF_XDP"

1 year 3 months ago

I've presented a talk Introduction to XDP, eBPF and AF_XDP as part of the OsmoDevCon 2024 conference on Open Source Mobile Communications.

This talk provides a generic introduction to a set of modern Linux kernel technologies:

  • eBPF (extended Berkeley Packet Filter) is a kind of virtual machine that runs sandboxed programs inside the Linux kernel.

  • XDP (eXpress Data Path) is a framework for eBPF that enables high-performance programmable packet processing in the Linux kernel

  • AF_XDP is an address family that is optimized for high-performance packet processing. It allows in-kernel XDP eBPF programs to efficiently pass packets to userspace via memory-mapped ring buffers.

The talk provides a high-level overview. It should provide some basics before the other/later talks on bpftrace and eUPF.

You can find the video recording at https://media.ccc.de/v/osmodevcon2024-204-introduction-to-xdp-ebpf-and-afxdp
