News reader

The conclusion of the FSF board review

4 months 2 weeks ago
The Free Software Foundation has announced the completion of the review of its board of directors; the process resulted in the reconfirmation of all five sitting board members.

The review examined board members Ian Kelling, Geoffrey Knauth, Henry Poole, Richard Stallman, and Gerald Sussman. The process generated detailed philosophical and policy discussions between board members and the FSF's global associate members on topics ranging from the firmness of the Free Software Definition, developments in machine learning, to the board's president position.

corbet

How LWN is faring in 2025

4 months 2 weeks ago
Just over six months ago, The Economist described the US economy as "the envy of the world". That headline would be unlikely to appear now. The economic boom referenced in that article feels like a distant memory, markets are falling, and uncertainty is at an all-time high. Like everybody else, LWN is affected by the current turbulence in the political and economic spheres; we expect to get through this period, but there will be some challenges.
corbet

Brendan Gregg: Doom GPU Flame Graphs

4 months 2 weeks ago

AI Flame Graphs are now open source and include Intel Battlemage GPU support, which means they can also generate full-stack GPU flame graphs for providing new insights into gaming performance, especially when coupled with FlameScope (an older open source project of mine). Here's an example of GZDoom, and I'll start with flame scopes for both CPU and GPU utilization, with details annotated:

(Here are the raw CPU and GPU versions.) FlameScope shows a subsecond-offset heatmap of profile samples, where each column is one second (in this example, made up of 50 x 20ms blocks) and the color depth represents the number of samples, revealing variance and perturbation that you can select to generate a flame graph just for that time range. Update: the row size can be adjusted (it is limited by the sample rate captured in the profile), e.g., you could generate 60 rows to match 60fps games.
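As a rough illustration of that binning (a sketch under our own assumptions, not FlameScope's actual code), the function below places each sample timestamp into a column for its whole second and a row for its offset within that second. The name `subsecond_heatmap` and the choice to measure time from the first sample are assumptions.

```python
from collections import Counter


def subsecond_heatmap(timestamps, rows=50):
    """Bin sample timestamps (in seconds) into a FlameScope-style grid.

    Column = whole seconds elapsed since the first sample (an assumption);
    row = offset within that second, split into `rows` equal buckets
    (rows=50 gives 20 ms buckets). Returns Counter{(column, row): samples}.
    """
    if not timestamps:
        return Counter()
    start = min(timestamps)
    grid = Counter()
    for t in timestamps:
        elapsed = t - start
        column = int(elapsed)                  # which second
        row = int((elapsed - column) * rows)   # which sub-second bucket
        grid[(column, min(row, rows - 1))] += 1  # guard against float rounding
    return grid
```

Passing `rows=60` gives the 60-rows-per-second layout suggested for 60fps games; the count in each cell is what the heatmap renders as color depth.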

Putting these CPU and GPU flame scopes side by side has enabled your eyes to do pattern matching to solve what would otherwise be a time-consuming task of performance correlation. The gaps in the GPU flame scope on the right – where the GPU was not doing much work – match the heavier periods of CPU work on the left.

CPU Analysis

FlameScope lets us click on the interesting periods. By selecting one of the CPU shader compilation stripes we get the flame graph just for that range:

This is brilliant, and we can see exactly why the CPUs were busy for about 180 ms (the vertical length of the red stripe): it's doing compilation of GPU shaders and some NIR preprocessing (optimizations to the NIR intermediate representation that Mesa uses internally). If you are new to flame graphs, you look for the widest towers and optimize them first. Here is the interactive SVG.
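To make "select a time range, get a flame graph for it" concrete, here is a minimal sketch assuming samples are available as (timestamp, stack) pairs; the function names and frame names are hypothetical, not iaprof's or FlameScope's internals. It filters samples to a window, folds them into the semicolon-separated "folded" format that flame graph tools consume, and computes the inclusive width of every call path.

```python
from collections import Counter


def fold_range(samples, start, end):
    """Fold the samples taken in [start, end) into flame-graph input.

    `samples` is a list of (timestamp_seconds, [root, ..., leaf]) pairs.
    Returns {"root;...;leaf": count}, the folded-stack format.
    """
    folded = Counter()
    for timestamp, stack in samples:
        if start <= timestamp < end:
            folded[";".join(stack)] += 1
    return folded


def tower_widths(folded):
    """Inclusive sample count for every call-path prefix.

    In a flame graph each box is as wide as this count, so the largest
    values are the "widest towers" worth optimizing first.
    """
    widths = Counter()
    for stack, count in folded.items():
        frames = stack.split(";")
        for depth in range(1, len(frames) + 1):
            widths[tuple(frames[:depth])] += count
    return widths
```

Writing each folded entry as a `stack count` line gives the input a flame graph renderer expects; sorting `tower_widths` output by value lists the widest towers first.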

CPU flame graphs and CPU flame scope aren't new (from 2011 and 2018, both open source). What is new is full-stack GPU flame graphs and GPU flame scope.

GPU Analysis

Interesting details can also be selected in the GPU FlameScope for generating GPU flame graphs. This example selects the "room 3" range, which is a room in the Doom map that contains hundreds of enemies. The green frames are the actual instructions running on the GPU, aqua shows the source for these functions, and red (C) and yellow (C++) show the CPU code paths that initiated the GPU programs. The gray "-" frames just help highlight the boundary between CPU and GPU code. (This is similar to what I described in the AI flame graphs post, which included extra frames for kernel code.) The x-axis is proportional to cost, so you look for the widest things and find ways to reduce them.

I've included the interactive SVG version of this flame graph so you can mouse-over elements and click to zoom. (PNG version.)

The GPU flame graph is split between stalls coming from rendering walls (41.4%), postprocessing effects (35.7%), stenciling (17.2%), and sprites (4.95%). The CPU stacks are further differentiated by the individual shaders that are causing stalls, along with the reasons for those stalls.
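The combined CPU/GPU stack layout and a percentage breakdown like the one above can be modeled with a small sketch. This is only an illustration under assumptions: stacks are plain lists of frame names, the gray "-" frame sits between the CPU path and the GPU frames, and all frame names used with it are hypothetical; iaprof's real data format may differ.

```python
from collections import Counter

SEPARATOR = "-"  # gray boundary frame between CPU and GPU code


def combine(cpu_stack, gpu_frames):
    """One full-stack sample: CPU call path, separator, then GPU frames."""
    return list(cpu_stack) + [SEPARATOR] + list(gpu_frames)


def frame_share(samples):
    """Percentage of samples whose stack contains each frame name.

    Flame graph width is proportional to sample count, so this is roughly
    how much of the x-axis the boxes with that name cover. A name repeated
    within one stack is counted once for that sample.
    """
    if not samples:
        return {}
    counts = Counter()
    for stack in samples:
        counts.update(set(stack))  # +1 per distinct frame in this sample
    return {frame: 100.0 * n / len(samples) for frame, n in counts.items()}
```

Applying `frame_share` to stall samples and reading off the values for each top-level rendering path is one way to compute a breakdown of this kind.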

GZDoom

We picked GZDoom to try since it's an open source version of a well known game that runs on Linux (our profiler does not support Windows yet). Intel Battlemage makes light work of GZDoom, however, and since the GPU profile is stall-based we weren't getting many samples. We could have switched to a more modern and GPU-demanding game, but didn't have any great open source ideas, so I figured we'd just make GZDoom more demanding. We built GPU demanding maps for GZDoom (I can't believe I have found a work-related reason to be using Slade), and also set some Battlemage tunables to limit resources, magnifying the utilization of remaining resources.

Our GZDoom test map has three rooms: room 1 is empty, room 2 is filled with torches, and room 3 is open with a large skybox and filled with enemies, including spawnpoints for Sergeants. This gave us a few different workloads to examine by walking between the rooms.

Using iaprof: Intel's open source accelerator profiler

The AI Flame Graph project is pioneering work, and has needed various changes to graphics compilers, libraries, and kernel drivers, not just the code but also how they are built. Since Intel has its own public cloud (the Intel® Tiber™ AI Cloud) we can fix the software stack in advance so that for customers it "just works." Check the available releases. It currently supports the Intel Max Series GPU.

If you aren't on the Intel cloud, or you wish to try this with Intel Battlemage, then it can require a lot of work to get the system ready to be profiled. Requirements include:

  • A Linux system with superuser (root) access, so that eBPF and Intel eustalls can be used.
  • A newer Linux kernel with the latest Intel GPU drivers. For Intel Battlemage this means Linux 6.15+ with the Xe driver; for the Intel Max Series GPU it's Linux 5.15 with the i915 driver.
  • The Linux kernel built with Intel driver-specific eustall and eudebug interfaces (see the GitHub docs for details). Some of these modifications are upstreamed in the latest versions of Linux and others are currently in progress. (These interfaces are made available by default on the Intel® Tiber™ AI Cloud.)
  • All system libraries or programs that are being profiled need to include frame pointers so that the full stacks are visible, including Intel's oneAPI and graphics libraries. For this example, GZDoom itself needed to be compiled with frame pointers and also all libraries used by GZDoom (glibc, etc.). This is getting easier in the latest versions of Fedora and Ubuntu (e.g., Ubuntu 24.04 LTS), which are shipping system libraries with frame pointers by default. But I'd expect there will be applications and dependencies that don't have frame pointers yet and need recompilation. If your flame graph has areas that are very short, only one or two frames deep, this is why.
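A few of these requirements can be checked from a script before attempting a profile. This is a rough sketch, not part of iaprof: it assumes a loaded driver shows up under `/sys/module/<name>`, and it does not verify the eustall/eudebug interfaces or frame pointers, which still need the GitHub docs. The function names are hypothetical.

```python
import os
import platform
import re


def kernel_version(release=None):
    """Parse the leading (major, minor) from a release like '6.15.0-rc3'."""
    release = release or platform.release()
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def preflight(min_version=(6, 15), driver="xe"):
    """Rough pre-flight check for root, kernel version, and a loaded driver.

    Assumption: a loaded (or parameterised built-in) driver appears as a
    directory under /sys/module. Eustall/eudebug support is NOT checked.
    """
    return {
        "root": os.geteuid() == 0,  # eBPF and eustalls need superuser
        "kernel_ok": kernel_version() >= min_version,
        "driver_loaded": os.path.isdir(f"/sys/module/{driver}"),
    }
```

For the Intel Max Series GPU, call `preflight(min_version=(5, 15), driver="i915")`. Comparing version tuples rather than strings matters here: as strings, "6.9" sorts after "6.15".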

If you are new to custom kernel builds and library tinkering, then getting this all working may feel like Nightmare! difficulty. Over time things will improve and gradually get easier: check the GitHub docs. Intel can also develop a much easier version of this tool as part of a broader product offering and get it working on more than just Linux and Battlemage (either watch this space or, if you have an Intel rep, ask them to make it a priority).

Once you have it all working, you can run the iaprof command to profile the GPU. E.g.:

    git clone --recursive https://github.com/intel/iaprof
    cd iaprof
    make deps
    make
    sudo iaprof record > profile.txt
    cat profile.txt | iaprof flame > flame.svg

iaprof is modeled on the Linux perf command. (Maybe one day it'll become included in perf directly.) Thanks to Gabriel Muñoz for getting the work done to get this open sourced.

FAQ and Future Work

From the launch of AI flame graphs last year, I can guess what FAQ #1 will be: “What about NVIDIA?”. They do have flame graphs in Nsight Graphics for GPU workloads, although their flame graphs are currently shallow as they show GPU code only, and onerous to use as I believe it requires an interposer; on the plus side they have click-to-source. The new GPU profiling method we've been developing allows for easy, everything, anytime profiling, like you expect from CPU profilers.

Future work will include GitHub releases, more hardware support, and overhead reduction. We're the first to use eustalls in this way, and we need to add more optimization to reach our target of <5% overhead, especially with the i915 driver.

Conclusion

We've open sourced AI flame graphs and tested them on new hardware, Intel Battlemage, and a non-AI workload: GZDoom (gaming). It's great to see a view of both CPU and GPU resources down to millisecond resolution, where we can see visual patterns in the flame scope heat maps that can be selected to produce flame graphs to show the code. We applied these new tools to GZDoom and explained GPU pauses by selecting the corresponding CPU burst and reading the flame graph, as well as GPU code use for arbitrary time windows.

While we have open sourced this, getting it all running requires Intel hardware and Linux kernel and library tinkering – which can be a lot of work. (Actually playing Doom on Nightmare! difficulty may be easier.) This will get better over time. We look forward to seeing if anyone can fight their way through this work in the meantime and what new performance issues they can solve.

Authors: Brendan Gregg, Ben Olson, Brandon Kammerdiener, Gabriel Muñoz.

Security updates for Wednesday

4 months 2 weeks ago
Security updates have been issued by Debian (glibc and libraw), Fedora (digikam, icecat, mingw-LibRaw, perl, perl-Devel-Cover, and perl-PAR-Packer), Red Hat (ghostscript, kernel, and kernel-rt), Slackware (mozilla), SUSE (augeas, firefox, and java-11-openjdk), and Ubuntu (binutils, libxml2, and nodejs).
jzb

LibreSSL 4.1.0 released

4 months 2 weeks ago

LibreSSL version 4.1.0 has been released.

This is the version found in the recently released OpenBSD 7.7.

The release notes read,

We have released LibreSSL 4.1.0, which will be arriving in the LibreSSL directory of your local OpenBSD mirror soon.

This is the first stable release for the 4.1.x branch, also available with OpenBSD 7.7. It includes the following changes from LibreSSL 4.0.0:

  • Portable changes
      - Added initial experimental support for loongarch64.
      - Fixed compilation for mips32 and reenable CI.
      - Fixed CMake builds on FreeBSD.
      - Fixed the --prefix option for cmake --install.
      - Fixed tests for MinGW due to missing sh(1).

LWN's Mastodon migration

4 months 2 weeks ago
The LWN.net fediverse (Mastodon) feed has moved; we are now known as @LWN@lwn.net. The migration magic has shifted many of our followers over automatically but, if you follow that stream, you might want to make sure that you have shifted to the new source.
corbet

Meson 1.8.0 released

4 months 2 weeks ago

Version 1.8.0 of the Meson build system has been released. Notable changes in this release include the ability to run rustdoc for Rust projects, support for the c2y and gnu2y compiler options, and a new argument (android_exe_type) that makes it possible to use the same meson.build file for Android and non-Android systems.

jzb

Firefox 138.0 released

4 months 2 weeks ago
Version 138.0 of the Firefox web browser has been released. Changes include some profile-management improvements, the ability to get weather-related suggestions in the address bar (US only), and some security fixes.
corbet

Barnes: Parallel ./configure

4 months 2 weeks ago
Tavian Barnes takes on the tedious process of waiting for configure scripts to run.

I paid good money for my 24 CPU cores, but ./configure can only manage to use 69% of one of them. As a result, this random project takes about 13.5× longer to configure the build than it does to actually do the build.

The purpose of a ./configure script is basically to run the compiler a bunch of times and check which runs succeeded. In this way it can test whether particular headers, functions, struct fields, etc. exist, which lets people write portable software. This is an embarrassingly parallel problem, but Autoconf can't parallelize it, and neither can CMake, neither can Meson, etc., etc.
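Because each check is an independent compiler run, the idea can be sketched by launching the checks concurrently. This is a minimal illustration, not Barnes's implementation or anything Autoconf, CMake, or Meson provide; it assumes a POSIX system with a gcc- or clang-compatible `cc` that reads C source from stdin via `-x c -`, and the check names are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Assumed compiler invocation: C source on stdin, compile only, discard output.
CC = ["cc", "-x", "c", "-c", "-", "-o", "/dev/null"]


def probe(argv, source):
    """One configure-style check: does `argv` exit 0 when fed `source`?"""
    try:
        result = subprocess.run(argv, input=source, capture_output=True, text=True)
    except FileNotFoundError:
        return False  # a missing compiler counts as a failed check
    return result.returncode == 0


def run_checks(checks, argv=CC, jobs=None):
    """Run every check concurrently and return {name: passed}.

    Threads suffice because the real work happens in the child processes;
    jobs=None lets ThreadPoolExecutor pick a worker count from the CPU count.
    """
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        futures = {name: pool.submit(probe, argv, src) for name, src in checks.items()}
        return {name: future.result() for name, future in futures.items()}


# Hypothetical checks in the spirit of configure's header/struct-field tests.
CHECKS = {
    "HAVE_STDINT_H": "#include <stdint.h>\nint main(void) { return 0; }\n",
    "HAVE_STRUCT_STAT_ST_MTIM": (
        "#include <sys/stat.h>\n"
        "int main(void) { struct stat s; return sizeof s.st_mtim == 0; }\n"
    ),
}
```

Calling `run_checks(CHECKS)` returns a dict such as `{"HAVE_STDINT_H": True, ...}`, with values depending on the system. Real configure scripts also contain checks that depend on earlier results, which would need ordering between batches; this sketch assumes every check is independent.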

(Thanks to Paul Wise).

corbet

[$] Cache awareness for the CPU scheduler

4 months 2 weeks ago
The kernel's CPU scheduler has to balance a wide range of objectives. The tasks in the system must be scheduled fairly, with latency for any given task kept within bounds. All of the CPUs in the system should be kept busy if there is enough work to do, but unneeded CPUs should be shut down to reduce power consumption. A task should also run on the CPU that is most likely to have cached the memory that task is using. This patch series from Chen Yu aims to improve how the scheduler handles cache locality for multi-threaded processes.
corbet

Signing key change for Kali Linux

4 months 2 weeks ago
The Kali Linux distribution has announced that software updates will soon start failing for all users:

This is not only you, this is for everyone, and this is entirely our fault. We lost access to the signing key of the repository, so we had to create a new one. At the same time, we froze the repository (you might have noticed that there was no update since Friday 18th), so nobody was impacted yet. But we're going to unfreeze the repository this week, and it's now signed with the new key.

The announcement includes instructions for how to recover from the problem.

corbet

Security updates for Tuesday

4 months 2 weeks ago
Security updates have been issued by AlmaLinux (glibc, php:8.1, and thunderbird), Debian (libreoffice), Fedora (caddy), Mageia (chromium-browser-stable), Red Hat (php:8.1), SUSE (glow), and Ubuntu (kicad, linux-aws-5.15, linux-azure-nvidia, linux-gcp-5.15, mistral, python-mistral-lib, tomcat8, and trafficserver).
corbet