News reader

malloc(3) leak detection gains backtraces

1 year 8 months ago

Otto Moerbeek (otto@), the author of OpenBSD's malloc(3) implementation, has committed another great feature, backtraces for leak detection:

CVSROOT:	/cvs
Module name:	src
Changes by:	otto@cvs.openbsd.org	2023/12/04 00:01:45

Modified files:
	lib/libc/stdlib: malloc.3 malloc.c

Log message:
	Save backtraces to show in leak dump. Depth of backtrace set by
	malloc option D (aka 1), 2, 3 or 4. No performance impact if not
	used. ok asou@

Otto's original message to tech@ includes an example use of the feature.
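To illustrate what such a report is about, here is a minimal sketch of a program with a deliberate leak. The function names are hypothetical, and the commands in the note after the code assume an OpenBSD release that includes this change; the exact option syntax should be checked against malloc(3) on the system in question.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of s, or NULL on failure. */
char *
dup_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *p = malloc(len);

	if (p != NULL)
		memcpy(p, s, len);
	return p;
}

/*
 * Deliberately leaks: the copy is never freed, so it should appear in
 * the leak dump, attributed to the call stack that allocated it.
 */
void
leaky_caller(void)
{
	char *copy = dup_string("this allocation is never freed");

	(void)copy;
}
```

Compiled into a program whose main() calls leaky_caller(), a run such as MALLOC_OPTIONS=D ktrace -tu ./leaky, followed by kdump -u malloc, should list the leaked allocation together with its saved backtrace; this assumes the utrace(2)-based leak reporting described in recent malloc(3) manual pages. According to the commit message, D is equivalent to a backtrace depth of 1, while 2, 3 or 4 request deeper backtraces.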

Gustavo A. R. Silva: November 2023 – Linux Kernel work

1 year 8 months ago

-Wstringop-overflow

Late in October I sent a patch to globally enable the -Wstringop-overflow compiler option, which finally landed in linux-next on November 28th. It’s expected to be merged into mainline during the next merge window, likely in the last couple of weeks of December, but “We’ll see”. I plan to send a pull request for this to Linus when the time is right. 🙂

I’ll write more about the challenges of enabling this compiler option once it’s included in 6.8-rc1, early next year. In the meantime, it’s worth mentioning that several people, including Kees Cook, Arnd Bergmann, and myself, have sent patches to fix -Wstringop-overflow warnings over the past few years.

Below are the patches that address the last warnings, together with the couple of patches that enable the option in the kernel. The first of these enables the option globally for all versions of GCC. However, -Wstringop-overflow is buggy in GCC-11, so I wrote a second patch that adds the option under a new configuration option, CC_STRINGOP_OVERFLOW, in init/Kconfig, which is enabled by default for all versions of GCC except GCC-11. To handle the GCC-11 case, I added another configuration option, GCC11_NO_STRINGOP_OVERFLOW, which disables -Wstringop-overflow by default for GCC-11 only.
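For readers unfamiliar with the option: GCC's -Wstringop-overflow warns about calls such as memcpy() or strcpy() that it can determine will overflow the destination buffer. The sketch below uses hypothetical names (it is not kernel code); the kind of call that gets flagged is shown only in a comment, and the function shows one bounded alternative.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical structure with a small fixed-size field. */
struct dev_info {
	char name[8];
};

/*
 * A call like
 *	memcpy(info->name, "a-much-longer-name", 19);
 * writes 19 bytes into an 8-byte array. GCC can detect this at compile
 * time, and -Wstringop-overflow reports it. Bounding the copy by the
 * size of the destination avoids both the overflow and the warning.
 * Returns the number of characters stored (excluding the NUL).
 */
size_t
set_name(struct dev_info *info, const char *src)
{
	size_t n = strlen(src);

	if (n >= sizeof(info->name))
		n = sizeof(info->name) - 1;	/* truncate, keep room for NUL */
	memcpy(info->name, src, n);
	info->name[n] = '\0';
	return n;
}
```

Truncation is only one possible policy; real code might instead reject over-long input, but either way the copy length never exceeds the destination.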

Boot crash on ARM64

Another relevant task I worked on recently was debugging and fixing a boot crash on ARM64, reported by Joey Gouly. This issue was interesting as it related to some long-term work in the Kernel Self-Protection Project (KSPP), particularly our efforts to transform “fake” flexible arrays into C99 flexible-array members. In short, there was a zero-length fake flexible array at the end of a structure annotated with the __randomize_layout attribute, which needed to be transformed into a C99 flexible-array member.

This became problematic because of how such arrays were treated before the introduction of -fstrict-flex-arrays=3. The randstruct GCC plugin treated these arrays as actual flexible arrays, leaving them in place as the last member when the kernel was built with CONFIG_RANDSTRUCT. However, after commit 1ee60356c2dc (‘gcc-plugins: randstruct: Only warn about true flexible arrays’), this behavior changed: fake flexible arrays were no longer treated the same as proper C99 flexible-array members, so they became subject to layout randomization in structures annotated with __randomize_layout. This was the root cause of the boot crash.
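The difference can be sketched in plain C. The structure below is a hypothetical, simplified stand-in for struct neighbour, not the kernel definition; the old zero-length form appears only in a comment, and the code uses a C99 flexible-array member whose storage is allocated past the fixed part of the structure.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct neighbour. */
struct entry {
	int		dev_id;
	unsigned char	key_len;
	/*
	 * Old "fake" form (GNU zero-length array):
	 *	unsigned char primary_key[0];
	 * C99 form: an incomplete array type as the last member.
	 */
	unsigned char	primary_key[];
};

/*
 * The key bytes live in memory allocated after sizeof(struct entry).
 * This relies on primary_key being the last member. If a layout
 * randomizer moved a zero-length array away from the end (as could
 * happen once it was no longer treated as a flexible array), a copy
 * like the memcpy() below would overwrite whichever fields followed it.
 */
struct entry *
entry_alloc(int dev_id, const unsigned char *key, unsigned char len)
{
	struct entry *e = malloc(sizeof(*e) + len);

	if (e == NULL)
		return NULL;
	e->dev_id = dev_id;
	e->key_len = len;
	memcpy(e->primary_key, key, len);
	return e;
}
```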

To address this, I sent two patches. The first patch is the actual bugfix, which includes the flexible-array transformation. The second patch is complementary to commit 1ee60356c2dc, updating a code comment to clarify that “we don’t randomize the layout of the last element of a struct if it’s a proper flexible array.”

diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index 07022bb0d44d..0d28172193fa 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -162,7 +162,7 @@ struct neighbour {
 	struct rcu_head		rcu;
 	struct net_device	*dev;
 	netdevice_tracker	dev_tracker;
-	u8			primary_key[0];
+	u8			primary_key[];
 } __randomize_layout;
 
 struct neigh_ops {

diff --git a/scripts/gcc-plugins/randomize_layout_plugin.c b/scripts/gcc-plugins/randomize_layout_plugin.c
index 910bd21d08f4..746ff2d272f2 100644
--- a/scripts/gcc-plugins/randomize_layout_plugin.c
+++ b/scripts/gcc-plugins/randomize_layout_plugin.c
@@ -339,8 +339,7 @@ static int relayout_struct(tree type)
 
 	/*
 	 * enforce that we don't randomize the layout of the last
-	 * element of a struct if it's a 0 or 1-length array
-	 * or a proper flexible array
+	 * element of a struct if it's a proper flexible array
 	 */
 	if (is_flexible_array(newtree[num_fields - 1])) {
 		has_flexarray = true;

These two patches will soon be backported to a couple of -stable trees.

-Wflex-array-member-not-at-end

During my most recent presentation, at Kernel Recipes in September this year, I talked a bit about -Wflex-array-member-not-at-end, a compiler option currently under development for GCC-14.
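The option warns when a structure whose last member is a flexible array is embedded in another structure somewhere other than at the end. A minimal sketch with hypothetical types: the flagged layout is shown in a comment, where the trailing array runs into the member that follows it, and the code places the embedded structure last instead. Note that embedding such a structure at all relies on a GCC extension (ISO C does not allow it).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical header ending in a flexible-array member. */
struct hdr {
	unsigned short	len;
	unsigned char	data[];
};

/*
 * Layout flagged by -Wflex-array-member-not-at-end:
 *	struct msg { struct hdr h; unsigned int crc; };
 * h.data shares storage with crc, so writing the payload into h.data
 * overwrites crc.
 *
 * Layout used here: the structure with the flexible array comes last,
 * so its payload extends into memory allocated past sizeof(struct msg).
 */
struct msg {
	unsigned int	crc;
	struct hdr	h;
};

struct msg *
msg_alloc(const unsigned char *payload, unsigned short len, unsigned int crc)
{
	struct msg *m = malloc(sizeof(*m) + len);

	if (m == NULL)
		return NULL;
	m->crc = crc;
	m->h.len = len;
	memcpy(m->h.data, payload, len);
	return m;
}
```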

One of the highlights of the talk was a 6-year-old bug that I initially uncovered through grepping; later, while reviewing some build logs from previous months, I realized that -Wflex-array-member-not-at-end had also detected this problem:

This bugfix was backported to 6.5.7, 6.1.57, 5.15.135, 5.10.198, 5.4.258 and 4.19.296 stable kernels.

Encouraged by this discovery, I started hunting for more similar bugs. My efforts led to fixing a couple more:

On November 28th, these two bugfixes were backported to multiple stable kernel trees. The first fix was applied to the 6.6.3, 6.5.13 and 6.1.64 stable kernels; the second was applied to these as well as to the 5.15.140 stable kernel.

I will have a lot of fun with -Wflex-array-member-not-at-end next year.

-Warray-bounds

In addition to these tasks, I continued addressing -Warray-bounds issues. Below are some of the patches I sent for this.

Patch review and ACKs

I’ve also been involved in patch review and providing ACKs. Kees Cook, for instance, has been actively annotating flexible-array members with the __counted_by attribute, and I’ve been reviewing those patches.
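For context, a minimal sketch of what such an annotation looks like. The macro below follows the same pattern as the kernel's definition but is written here as an assumption, not copied from the kernel; the attribute itself is only understood by recent compilers (Clang 18, GCC 15 and later), and elsewhere the macro expands to nothing.

```c
#include <stddef.h>
#include <stdlib.h>

/* Use the counted_by attribute when the compiler supports it. */
#ifndef __counted_by
# if defined(__has_attribute)
#  if __has_attribute(__counted_by__)
#   define __counted_by(member) __attribute__((__counted_by__(member)))
#  endif
# endif
#endif
#ifndef __counted_by
# define __counted_by(member)	/* no-op on older compilers */
#endif

/*
 * Hypothetical structure: 'count' tells the compiler how many elements
 * 'items' holds, which bounds checking (for example -fsanitize=bounds
 * or __builtin_dynamic_object_size()) can then use.
 */
struct int_list {
	size_t	count;
	int	items[] __counted_by(count);
};

struct int_list *
int_list_alloc(size_t n)
{
	/* Sketch only: no check for overflow in n * sizeof(int). */
	struct int_list *l = malloc(sizeof(*l) + n * sizeof(l->items[0]));

	if (l == NULL)
		return NULL;
	/* Set the counter before touching items[], so checks see it. */
	l->count = n;
	for (size_t i = 0; i < n; i++)
		l->items[i] = 0;
	return l;
}
```

The ordering matters in reviews of such patches: the counter must be assigned before the array is accessed, otherwise an instrumented build can treat valid accesses as out of bounds.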

Google Open Source Peer Bonus Award

In other news from November, I want to share that I’m thrilled to be the recipient of this award from Google for the first time. I feel really grateful and honored!

This comes as a result of my contributions to the Linux kernel over the years.

Honestly, I didn’t even know about the existence of this award until I received an email from someone at Google informing me about it. However, learning about it made me feel really great!

My appreciation goes out to my teammates in the Kernel Self-Protection Project, especially to Kees Cook, who has been an invaluable mentor to me over the years. Special thanks to Greg Kroah-Hartman as well, who was instrumental in setting me on my journey as a Linux kernel developer.

Acknowledgements

Special thanks to The Linux Foundation and Google for supporting my Linux kernel work.

[$] What remains to be done for proxy execution

1 year 8 months ago
The kernel's deadline scheduling class offers a solution to a number of realtime (or generally latency-sensitive) problems, but it is also resistant to the usual solutions for the priority-inversion problem. The development community has been pursuing proxy execution as a solution to a few scheduling challenges, including this one; the problem is difficult and progress has been slow. LWN last looked at proxy execution in June; at the 2023 Linux Plumbers Conference, John Stultz gave an overview of proxy execution, the current status of the work, and the remaining problems to solve.
corbet

GDB 14.1 released

1 year 8 months ago
Version 14.1 of the GDB debugger is out. Changes include initial support for the debugger adapter protocol, NO_COLOR support, the ability to work with integer types larger than 64 bits, a number of enhancements to the Python API, and more.
corbet

Security updates for Monday

1 year 8 months ago
Security updates have been issued by Debian (amanda, ncurses, nghttp2, opendkim, rabbitmq-server, and roundcube), Fedora (golang-github-openprinting-ipp-usb, kernel, kernel-headers, kernel-tools, and samba), Mageia (audiofile, galera, libvpx, and virtualbox), Oracle (kernel and postgresql:13), SUSE (openssl-3, optipng, and python-Pillow), and Ubuntu (firefox).
jake

Davidlohr Bueso: LPC 2023: CXL Microconference

1 year 8 months ago

The Compute Express Link (CXL) microconference was held, for the second year in a row, at this year's Linux Plumbers Conference. The goals for the track were to openly discuss current development efforts around the core driver, as well as experimental memory-management topics that call for adapting kernel infrastructure to new technology and use cases.

CXL session at LPC23
(i) CXL Emulation in QEMU: progress, status and, most importantly, what next? The CXL QEMU maintainers presented the current state of the emulation, which has made significant progress, extending support beyond basic enablement. During this year, features such as volatile devices, CDAT, and poison and injection infrastructure have been added to upstream QEMU, while several others, such as CCI/mailbox, Scan Media and dynamic capacity, are in progress. The latter was discussed further: DCD support was presented along with extent-management issues found in the 3.0 spec. Fabric Management (FM) was another important topic, continuing the debate about QEMU's role in FM development, which is still at an early stage. Concerns about production (beyond testing) use cases for CCI kernel support were discussed, as well as semantics and interfaces that constrain QEMU, such as host and switch coupling and differences from BMC behavior.
(ii) CXL Type-2 core support. The state and purpose of the existing experimental support for Type 2 (accelerator) devices was presented, for both the kernel and QEMU sides. The kernel work led to preliminary abstraction improvements being upstreamed, facilitating actual accelerator integration with the CXL core driver. However, the rest is merely guesswork, and the floor is open for a proposal backed by actual hardware. HDM-DB support would also be welcome as a step forward. The QEMU side is very basic and designed just to exercise core checks, so its emulation should be kept limited, especially in light of cxl_test.
(iii) Plumbing challenges in Dynamic capacity device. An in-depth, kernel-side discussion of the state of DCD support and of considerations around corner cases. The semantics of releasing DC for full versus partial extents (ranges) are two different beasts altogether. Releasing all of the already-given memory can simply require the memory to be offline first, avoiding unnecessary complexity in the kernel; the kernel can therefore legitimately reject the request, and FM design should take that into consideration. Partial extents, on the other hand, are unsupported for the sake of simplicity, at least until a solid industry use case comes along. The semantics of forced removal of online DC memory were also discussed, emphasizing that such DC memory is not guaranteed to ever be given back by the kernel, mapped or not: if the event is forced, the hardware does not care and the kernel has most likely crashed anyway. Extent tagging was another topic; the discussion established the need to support it, with coupling a device to a tag domain being a sensible use case. For now at least, the implementation can be limited to enumerating tags and the necessary attributes, leaving memory matching to userspace, instead of more complex surgery to create DAX devices on specific extents and deal with sparse regions.
(iv) Adding RAS Support for CXL Port Devices. Starting with a general overview of RAS, this session covered the current state of support in CXL 1.1 and 2.0. Special handling is required for RCH: because of the RCRB implementation, the RCH downstream port does not have a BDF, which is needed for AER error handling; this work was merged in v6.7. The CXL Virtual Hierarchy implementation is still open; things could potentially move away from the PCIe port service driver model, which is not universally liked. There are, however, clear requirements: the mechanism should not be CXL-specific (AER is a PCIe protocol, used by CXL.io); it should implement driver callback logic specific to the technology or device, giving flexibility to handle specific needs; and it should allow enabling and disabling at per-device granularity. There were discussions around the order in which a registration handler is added in the PCI port driver, noting that it makes sense to go top-down from the port and search its children, rather than registering from a lower level.
(v) Shared CXL 3 memory: what will be required? An overview of the state, semantics and requirements for supporting shared fabric-attached memory (FAM). A strong enablement use case is leveraging applications that already handle data sets in files. In addition, appropriate workload candidates fit the "master writer, multiple readers" read-only model, for which this sort of machinery makes sense. Early results show that the benefits can outweigh the cost of remote CXL memory access, for example by fitting larger data sets in FAM than would otherwise be possible in a single host. This model also avoids cache-coherency costs by simply never modifying the memory. A number of concrete data-science and AI use cases were presented. Shared FAM is meant to be mmap-able, file-backed, special-purpose memory, and a FAMFS prototype was described that overcomes limitations of just using a DAX device or FSDAX, for example by distributing metadata in a shareable way.
(vi) CXL Memory Tiering for heterogeneous computing. This session discussed the pros and cons of interleaving heterogeneous (i.e. DRAM and CXL) memory through hardware and/or software for bandwidth optimization. Hardware interleaving is simple to configure through the BIOS, but it is limited: it does not allow the OS to manage allocations, it hides the NUMA topology (a single node is exposed), and it is a static configuration. Software interleaving addresses these limitations and relies on weighted nodes to distribute allocations when the initial mapping (vma) is made. Several interfaces have been posted, and they are gradually converging on a NUMA-node-based interface. The open question is whether to have a single (configurable) system-wide set of weights or to allow more flexibility, such as hierarchical weights through cgroups, an idea that has not yet found much support. Combining the hardware and software models relies on splitting the channels within a socket among the respective DDR and CXL NUMA nodes, across which software can explicitly set the interleaving (with numactl); this is still constrained, however, by being static, as the BIOS is in charge of setting the number of NUMA nodes.
(vii) A move_pages() equivalent for physical memory. Using an experimental interface, this session focused on the semantics of tiering and device-driven page movement. There are currently various mechanisms for access detection, such as PMU-based detection, fault hinting for page promotion, and idle-bit page monitoring; each has its own set of limitations, and runtime overhead is a universal concern. Hardware mechanisms could ease the burden, but devices only know physical memory and must therefore do expensive reverse-mapping lookups; nor are there any interfaces for this, which are difficult to provide without hardware standardization. A good starting point would be to keep the proposed move_phys_pages as an interface, but without making it an actual syscall.
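The weighted software interleaving from item (vi) can be illustrated with a small sketch: given per-node weights (for example 3 for a DRAM node and 1 for a CXL node), consecutive pages are assigned in rounds, with each node receiving as many consecutive pages as its weight. The function and weights are hypothetical; this only shows the distribution idea and is not the kernel's implementation.

```c
#include <stddef.h>

/*
 * Return the node index for page number 'page' when pages are
 * interleaved across nr_nodes nodes in proportion to 'weights'.
 * With weights {3, 1}, pages 0-2 go to node 0, page 3 to node 1,
 * then the pattern repeats. Returns -1 if every weight is zero.
 */
int
weighted_node_for_page(size_t page, const unsigned int *weights,
    size_t nr_nodes)
{
	size_t total = 0;

	for (size_t i = 0; i < nr_nodes; i++)
		total += weights[i];
	if (total == 0)
		return -1;

	/* Position within one full round of the weighted pattern. */
	size_t pos = page % total;

	for (size_t i = 0; i < nr_nodes; i++) {
		if (pos < weights[i])
			return (int)i;
		pos -= weights[i];
	}
	return -1;	/* not reached: pos < total */
}
```

With weights {3, 1}, three quarters of the pages land on the first node; a single system-wide weight table versus per-cgroup tables, as debated in the session, would only change where the weights array comes from.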

[$] A Nouveau graphics driver update

1 year 8 months ago
Support for NVIDIA graphics processors has traditionally been a sore point for Linux users; NVIDIA has not felt the need to cooperate with the kernel community or make free drivers available, and the reverse-engineered Nouveau driver has often struggled to keep up with product releases. There have, however, been signs of improvement in recent years. At the 2023 Linux Plumbers Conference, graphics subsystem maintainer Dave Airlie provided an update on the state of support for NVIDIA GPUs and what remains to be done.
corbet

Security updates for Friday

1 year 8 months ago
Security updates have been issued by Debian (chromium, gimp-dds, horizon, libde265, thunderbird, vlc, and zbar), Fedora (java-17-openjdk and xen), Mageia (optipng, roundcubemail, and xrdp), Red Hat (postgresql), Slackware (samba), SUSE (chromium, containerd, docker, runc, libqt4, opera, python-django-grappelli, sqlite3, and traceroute), and Ubuntu (linux-azure, linux-azure-4.15, linux-gcp, linux-gcp-4.15, linux-azure, linux-azure-5.15, linux-azure-fde, linux-azure-fde-5.15, linux-gcp, linux-gcp-5.15, linux-gke, linux-gkeop, linux-gkeop-5.15, linux-azure, linux-azure-5.4, linux-gcp, linux-gcp-5.4, linux-gkeop, and linux-azure, linux-azure-6.2, linux-azure-fde-6.2, linux-gcp, linux-gcp-6.2).
jake

[$] A Rust implementation of Android's Binder

1 year 8 months ago
The Android system was once famous for extensive, out-of-tree kernel enhancements. Many of those have been eliminated or upstreamed over the years, bringing Android much closer to the mainline kernel. One significant component in the "upstreamed" category is Binder, an interprocess communication mechanism that is used only by Android. There are a number of factors that make Binder a good candidate for rewriting in the Rust language; at the 2023 Linux Plumbers Conference, Carlos Llamas and Alice Ryhl described the motivation behind and implementation of a rewrite of Binder in Rust.
corbet

Security updates for Thursday

1 year 8 months ago
Security updates have been issued by Fedora (chromium, gnutls, gst-devtools, gstreamer1, gstreamer1-doc, libcap, mingw-poppler, python-gstreamer1, qbittorrent, webkitgtk, and xen), Mageia (docker, kernel-linus, and python-django), Oracle (dotnet6.0, dotnet7.0, dotnet8.0, firefox, samba, squid, and thunderbird), Red Hat (firefox, postgresql:13, squid, and thunderbird), SUSE (cilium, freerdp, java-1_8_0-ibm, and java-1_8_0-openj9), and Ubuntu (ec2-hibinit-agent, freerdp2, gimp, gst-plugins-bad1.0, openjdk-17, openjdk-21, openjdk-lts, openjdk-8, pypy3, pysha3, and u-boot-nezha).
jake

LibreQoS 1.4 released

1 year 8 months ago
The LibreQoS project describes itself as:

LibreQoS is a Quality of Experience (QoE) Smart Queue Management (SQM) system designed for Internet Service Providers to optimize the flow of their network traffic and thus reduce bufferbloat, keep the network responsive, and improve the end-user experience.

Version 1.4 of LibreQoS was released on November 17. "Version 1.4 is a huge milestone. A whole new back-end, new GUI, 30%+ performance improvements, support for single-interface mode."

corbet

[$] An overview of kernel samepage merging (KSM)

1 year 8 months ago
In the Kernel Summit track at the 2023 Linux Plumbers Conference (LPC), Stefan Roesch led a session on kernel samepage merging (KSM). He gave an overview of the feature and described some recent changes to KSM. He showed how an application can enable KSM to deduplicate its memory and how the feature can be evaluated to determine whether it is a good fit for new workloads. In addition, he provided some real-world data of the benefits from his workplace at Meta.
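As a rough sketch of the opt-in side (not taken from the talk): an application can mark an anonymous memory region as a KSM candidate with madvise(MADV_MERGEABLE), and kernels since 6.4 also provide a process-wide prctl(PR_SET_MEMORY_MERGE) switch. Merging only happens while ksmd is running (/sys/kernel/mm/ksm/run set to 1), and madvise() fails with EINVAL on kernels built without KSM. The helper name below is hypothetical and the code is Linux-specific.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* for MAP_ANONYMOUS and MADV_MERGEABLE */
#endif
#include <errno.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Map 'pages' anonymous pages of 'page_size' bytes, fill them with
 * identical content (ideal KSM candidates) and ask the kernel to
 * consider them for merging. Returns 0 on success or an errno value;
 * EINVAL typically means the kernel was built without KSM.
 * On success the mapping is left in place and returned via *out.
 */
int
map_mergeable(size_t pages, size_t page_size, void **out)
{
	size_t len = pages * page_size;
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return errno;
	memset(p, 0x5a, len);	/* same bytes in every page */
	if (madvise(p, len, MADV_MERGEABLE) != 0) {
		int err = errno;

		munmap(p, len);
		return err;
	}
	*out = p;
	return 0;
}
```

Whether merging actually pays off for a workload can then be judged from the counters KSM exposes under /sys/kernel/mm/ksm/.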
jake

Roundcube becomes part of Nextcloud

1 year 8 months ago
Nextcloud has announced the "acquisition" of the Roundcube webmail system.

As a product, Roundcube has an established path to success on its own. With opportunities remaining to be explored, a direct merger between Roundcube and Nextcloud is not planned. Neither will Roundcube replace Nextcloud Mail or the other way around. The products both have strengths and weaknesses and as open source products they already do share some underlying libraries and tools, but remain independent offerings for overlapping but different use scenarios. Nextcloud Mail will evolve as it is, focused on being used naturally within Nextcloud. Roundcube will continue to serve its active and new users as a stand-alone secure mail client.

corbet

Security updates for Wednesday

1 year 8 months ago
Security updates have been issued by Debian (gst-plugins-bad1.0 and postgresql-multicorn), Fedora (golang-github-nats-io, golang-github-nats-io-jwt-2, golang-github-nats-io-nkeys, golang-github-nats-io-streaming-server, libcap, nats-server, openvpn, and python-geopandas), Mageia (kernel), Red Hat (c-ares, curl, fence-agents, firefox, kernel, kernel-rt, kpatch-patch, libxml2, pixman, postgresql, and tigervnc), SUSE (python-azure-storage-queue, python-Twisted, and python3-Twisted), and Ubuntu (afflib, ec2-hibinit-agent, linux-nvidia-6.2, linux-starfive-6.2, and poppler).
corbet

[$] Using drgn on production kernels

1 year 8 months ago
The drgn Python-based kernel debugger was developed by Omar Sandoval for use in his job on the kernel team at Meta. He now spends most of his time working on drgn, both in developing new features for the tool and in using it to debug production problems at Meta, which gives him a view of both ends of that feedback loop. At the 2023 Linux Plumbers Conference (LPC), he led a session on drgn in the kernel debugging microconference, where he wanted to brainstorm on how to add some new features to the debugger and, in particular, how to allow them to work on production kernels.
jake