News reader

Security updates for Saturday

4 years 1 month ago
Security updates have been issued by Arch Linux (gitlab, nodejs, openexr, php, php7, rabbitmq, ruby-addressable, and spice), Fedora (suricata), Gentoo (binutils, docker, runc, and tor), Mageia (avahi, botan2, connman, gstreamer1.0-plugins, htmldoc, jhead, libcroco, libebml, libosinfo, openexr, php, php-smarty, pjproject, and python), openSUSE (apache2, bind, bouncycastle, ceph, containerd, docker, runc, cryptctl, curl, dovecot23, firefox, graphviz, gstreamer-plugins-bad, java-1_8_0-openj9, java-1_8_0-openjdk, libass, libjpeg-turbo, libopenmpt, libqt5-qtwebengine, libu2f-host, libwebp, libX11, lua53, lz4, nginx, ovmf, postgresql10, postgresql12, python-urllib3, qemu, roundcubemail, solo, thunderbird, ucode-intel, wireshark, and xterm), and SUSE (permissions).
corbet

Linux Plumbers Conference: Testing and Fuzzing Microconference Accepted into 2021 Linux Plumbers Conference

4 years 1 month ago

We are pleased to announce that the Testing and Fuzzing Microconference has been accepted into the 2021 Linux Plumbers Conference. In spite of the huge number of products shipping with the Linux kernel which are being thoroughly tested by OEMs and distribution providers, there is still no enforced quality standard upstream. How can we make best use of all the publicly available infrastructure and test frameworks in order to fill this gap? Testing and fuzzing upstream as well as gathering results from products is crucial to keeping a project that has over 5,000 commits every month stable for all to use.

Last year’s meetup achieved the following:

  • KernelCI enabled LLVM=1 Clang builds and produced initial results from kselftests and real-time tests
  • KCIDB achieved multiple integrations, acting as a central collecting point for KernelCI, CKI, syzbot, etc.
  • KFENCE was successfully merged.
  • Clang: CFI, weeding out issues upstream, etc.
  • KUnit started acting as the standard for some drivers.

This year’s topics to be discussed include:

  • KernelCI: Extending coverage and improving user experience.
  • Growing KCIDB, integrating more sources.
  • Better sanitizers: KFENCE, improving KCSAN.
  • Using Clang for better testing coverage: Now that the kernel fully supports building with clang, how can all that work be leveraged into using clang’s features?
  • How to spread KUnit throughout the kernel?
  • Testing in-kernel Rust code.

Come and join us in discussing how to keep Linux at the best quality it can be.

We hope to see you there.

Announcing Arti, a pure-Rust Tor implementation (Tor blog)

4 years 1 month ago
The Tor project, which provides tools for internet privacy and anonymity, has announced a rewrite of the Tor protocols in Rust, called Arti. It is not yet ready for prime time, but, based on a grant from Zcash Open Major Grants (ZOMG), significant work is ongoing; the plan is "to try bring Arti to a production-quality client implementation over the next year and a half". The C implementation is not going away anytime soon, but the idea is that Arti will eventually supplant it. The project sees a number of benefits from using Rust, including:

For years now, we've wanted to split Tor's relay cryptography across multiple CPU cores, but we've run into trouble. C's support for thread-safety is quite fragile, and it is very easy to write a program that looks safe to run across multiple threads, but which introduces subtle bugs or security holes. If one thread accesses a piece of state at the same time that another thread is changing it, then your whole program can exhibit some truly confusing and bizarre bugs.

But in Rust, this kind of bug is easy to avoid: the same type system that keeps us from writing memory unsafety prevents us from writing dangerous concurrent access patterns. Because of that, Arti's circuit cryptography has been multicore from day 1, at very little additional programming effort.

jake

[$] Syncing all the things

4 years 1 month ago
Computing devices are wonderful; they surely must be, since so many of us have so many of them. The proliferation of computers leads directly to a familiar problem, though: the files we want are always on the wrong machine. One solution is synchronization services that keep a set of files up to date across a multitude of machines; a number of companies have created successful commercial offerings based on such services. Some of us, though, are stubbornly resistant to the idea of placing our data in the hands of corporations and their proprietary systems. For those of us who would rather stay in control of our data, systems like Syncthing offer a possible solution.
corbet

Security updates for Friday

4 years 1 month ago
Security updates have been issued by Debian (apache2 and scilab), Fedora (chromium and perl-Mojolicious), Gentoo (inspircd, redis, and wireshark), and Mageia (fluidsynth, glib2.0, gnome-shell, grub2, gupnp, hivex, libupnp, redis, and zstd).
jake

[$] Another misstep for Audacity

4 years 1 month ago
While it has often been said that there is no such thing as bad publicity, the new owners of the Audacity audio-editor project may beg to differ. The project has only recently weathered the controversies around its acquisition by the Muse Group, proposed telemetry features, and imposition of a new license agreement on its contributors. Now, the posting of a new privacy policy has set off a new round of criticism, with some accusing the project of planning to ship spyware. The situation with Audacity is not remotely as bad as it has been portrayed, but it is a lesson on what can happen when a project loses the trust of its user community.
corbet

Security updates for Thursday

4 years 1 month ago
Security updates have been issued by CentOS (linuxptp), Fedora (kernel and php), Gentoo (bladeenc, blktrace, jinja, mechanize, privoxy, and rclone), Oracle (linuxptp, ruby:2.6, and ruby:2.7), Red Hat (kernel and kpatch-patch), SUSE (kubevirt), and Ubuntu (avahi).
jake

[$] Rust for Linux redux

4 years 1 month ago
On July 4, the Rust for Linux project posted another version of its patch set adding support for the language to the kernel. It would seem that the project feels that it is ready to be considered for merging into the mainline. Perhaps a bigger question lingers, though: is the kernel development community ready for Rust? That part still seems to be up in the air.
jake

Security updates for Wednesday

4 years 1 month ago
Security updates have been issued by Fedora (glibc), Gentoo (doas, firefox, glib, schismtracker, and tpm2-tss), Mageia (httpcomponents-client), openSUSE (virtualbox), Red Hat (linuxptp), Scientific Linux (linuxptp), and Ubuntu (libuv1 and php7.2, php7.4).
ris

[$] Python attributes, __slots__, and API design

4 years 1 month ago
A discussion on the python-ideas mailing list touched on a number of interesting topics, from the problems with misspelled attribute names through the design of security-sensitive interfaces and to the use of the __slots__ attribute of objects. The latter may not be all that well-known (or well-documented), but could potentially fix the problem at hand, though not in a backward-compatible way. The conversation revolves around the ssl module in the standard library, which has been targeted for upgrades, more than once, over the years—with luck, the maintainers may find time for some upgrades relatively soon.
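As a minimal illustration of the misspelled-attribute problem and how __slots__ can catch it, the sketch below compares an ordinary class with one that declares __slots__. The class and attribute names are hypothetical stand-ins, not the real ssl.SSLContext API; the point is only that an ordinary instance silently accepts a typo such as check_hostnme as a brand-new attribute, while a slotted instance raises AttributeError.

```python
class PlainSettings:
    """Ordinary class: instances get a __dict__, so any attribute name is accepted."""

    def __init__(self):
        self.verify_mode = "required"
        self.check_hostname = True


class SlottedSettings:
    """Same fields, but __slots__ fixes the set of attribute names (no __dict__)."""

    __slots__ = ("verify_mode", "check_hostname")

    def __init__(self):
        self.verify_mode = "required"
        self.check_hostname = True


plain = PlainSettings()
plain.check_hostnme = False        # typo: silently creates a new, unused attribute
print(plain.check_hostname)        # still True: the real setting never changed

slotted = SlottedSettings()
try:
    slotted.check_hostnme = False  # typo: rejected because the name is not a slot
except AttributeError as exc:
    print("rejected:", exc)
```

The same restriction is what makes such a change backward-incompatible: code that currently attaches its own extra attributes to these objects would start getting AttributeError as well. Note also that a subclass which does not declare __slots__ itself gets a __dict__ back, so the protection only covers the slotted class.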
jake

Virtuozzo VzLinux 8.4 Now Available

4 years 1 month ago
The Virtuozzo team has announced the release of VzLinux 8.4, its fork of RHEL. "Thanks for noticing that we are fixing bugs so quickly (24 hours) and that you think VzLinux is stable and enterprise ready. To those who have asked if we will be following a similar path as CentOS, shifting its focus to Stream, the answer is: there are no plans for us to go this route, VzLinux will remain free to download, use and distribute." See the release notes for details.
ris

Security updates for Tuesday

4 years 1 month ago
Security updates have been issued by Arch Linux (python-django), Debian (libuv1, libxstream-java, and php7.3), Fedora (rabbitmq-server), Gentoo (glibc, google-chrome, libxml2, and postsrsd), openSUSE (libqt5-qtwebengine and roundcubemail), SUSE (python-rsa), and Ubuntu (djvulibre).
ris

[$] Bye-bye bdflush()

4 years 1 month ago
The addition of system calls to the Linux kernel is a routine affair; it happens during almost every merge window. The removal of system calls, instead, is much more uncommon. That appears likely to happen soon, though, as discussions proceed on the removal of bdflush(). Read on for a look at the purpose and history of this obscure system call and to learn whether you will miss it (you won't).
corbet

Security updates for Monday

4 years 1 month ago
Security updates have been issued by Arch Linux (electron11, electron12, istio, jenkins, libtpms, mediawiki, mruby, opera, puppet, and python-fastapi), Debian (djvulibre and openexr), Fedora (dovecot, libtpms, nginx, and php-league-flysystem), Gentoo (corosync, freeimage, graphviz, and libqb), Mageia (busybox, file-roller, live, networkmanager, and php), openSUSE (clamav-database, lua53, and roundcubemail), Oracle (389-ds:1.4, kernel, libxml2, python38:3.8 and python38-devel:3.8, and ruby:2.5), and SUSE (crmsh, djvulibre, python-py, and python-rsa).
ris

Darktable 3.6 released

4 years 1 month ago
Version 3.6 of the Darktable raw photo editor has been released. "The darktable team is proud to announce our second summer feature release, darktable 3.6. Merry (summer) Christmas! This is the first of two releases this year and, from here on, we intend to issue two new feature releases each year, around the summer and winter solstices." The list of new features is long, including a new color-balance module, a "censorize" module for partial pixelization of images, a new demosaic algorithm, and more.
corbet

Brendan Gregg: USENIX LISA2021 Computing Performance: On the Horizon

4 years 1 month ago
It's an exciting time for developments in computer performance, not just for the BPF technology (which I often [write about]) but also for processors with 3D stacking and cloud vendor CPUs (e.g., AWS Graviton2); for memory with the arrival of DDR5 and High Bandwidth Memory (HBM) on-processor; for storage including new uses for 3D Xpoint as a 3D NAND accelerator; for networking with the rise of QUIC and eXpress Data Path (XDP); and so on. I summarized these topics and more as a plenary conference talk, including my own predictions (as a senior performance engineer) for the future of computing performance, with a focus on back-end servers.

The video is on [youtube], and the slides are on [slideshare] or available as a [PDF].

I work on many areas of performance, but recently I've had a lot of demand to talk about BPF. This was a chance to talk about other things I've been working on, such as the present and future of hardware performance. I also wrote about these topics in detail for my recent [Systems Performance 2nd Edition] book. Note that my predictions in this talk may be wrong, but they should be thought-provoking. I hope you enjoy it!

## References

I've reproduced the talk references below, so you can click on links:

- [Gregg 08] Brendan Gregg, “ZFS L2ARC,” http://www.brendangregg.com/blog/2008-07-22/zfs-l2arc.html, Jul 2008
- [Gregg 10] Brendan Gregg, “Visualizations for Performance Analysis (and More),” https://www.usenix.org/conference/lisa10/visualizations-performance-analysis-and-more, 2010
- [Greenberg 11] Marc Greenberg, “DDR4: Double the speed, double the latency? Make sure your system can handle next-generation DRAM,” https://www.chipestimate.com/DDR4-Double-the-speed-double-the-latencyMake-sure-your-system-can-handle-next-generation-DRAM/Cadence/Technical-Article/2011/11/22, Nov 2011
- [Hruska 12] Joel Hruska, “The future of CPU scaling: Exploring options on the cutting edge,” https://www.extremetech.com/computing/184946-14nm-7nm-5nm-how-low-can-cmos-go-it-depends-if-you-ask-the-engineers-or-the-economists, Feb 2012
- [Gregg 13] Brendan Gregg, “Blazing Performance with Flame Graphs,” https://www.usenix.org/conference/lisa13/technical-sessions/plenary/gregg, 2013
- [Shimpi 13] Anand Lal Shimpi, “Seagate to Ship 5TB HDD in 2014 using Shingled Magnetic Recording,” https://www.anandtech.com/show/7290/seagate-to-ship-5tb-hdd-in-2014-using-shingled-magnetic-recording, Sep 2013
- [Borkmann 14] Daniel Borkmann, “net: tcp: add DCTCP congestion control algorithm,” https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e3118e8359bb7c59555aca60c725106e6d78c5ce, 2014
- [Macri 15] Joe Macri, “Introducing HBM,” https://www.amd.com/en/technologies/hbm, Jul 2015
- [Cardwell 16] Neal Cardwell, et al., “BBR: Congestion-Based Congestion Control,” https://queue.acm.org/detail.cfm?id=3022184, 2016
- [Gregg 16] Brendan Gregg, “Unikernel Profiling: Flame Graphs from dom0,” http://www.brendangregg.com/blog/2016-01-27/unikernel-profiling-from-dom0.html, Jan 2016
- [Gregg 16b] Brendan Gregg, “Linux 4.X Tracing Tools: Using BPF Superpowers,” https://www.usenix.org/conference/lisa16/conference-program/presentation/linux-4x-tracing-tools-using-bpf-superpowers, 2016
- [Alcorn 17] Paul Alcorn, “Seagate To Double HDD Speed With Multi-Actuator Technology,” https://www.tomshardware.com/news/hdd-multi-actuator-heads-seagate,36132.html, 2017
- [Alcorn 17b] Paul Alcorn, “Hot Chips 2017: Intel Deep Dives Into EMIB,” https://www.tomshardware.com/news/intel-emib-interconnect-fpga-chiplet,35316.html#xenforo-comments-3112212, 2017
- [Corbet 17] Jonathan Corbet, “Two new block I/O schedulers for 4.12,” https://lwn.net/Articles/720675, Apr 2017
- [Gregg 17] Brendan Gregg, “AWS EC2 Virtualization 2017: Introducing Nitro,” http://www.brendangregg.com/blog/2017-11-29/aws-ec2-virtualization-2017.html, Nov 2017
- [Russinovich 17] Mark Russinovich, “Inside the Microsoft FPGA-based configurable cloud,” https://www.microsoft.com/en-us/research/video/inside-microsoft-fpga-based-configurable-cloud, 2017
- [Gregg 18] Brendan Gregg, “Linux Performance 2018,” http://www.brendangregg.com/Slides/Percona2018_Linux_Performance.pdf, 2018
- [Hady 18] Frank Hady, “Achieve Consistent Low Latency for Your Storage-Intensive Workloads,” https://www.intel.com/content/www/us/en/architecture-and-technology/optane-technology/low-latency-for-storage-intensive-workloads-article-brief.html, 2018
- [Joshi 18] Amit Joshi, et al., “Titus, the Netflix container management platform, is now open source,” https://netflixtechblog.com/titus-the-netflix-container-management-platform-is-now-open-source-f868c9fb5436, Apr 2018
- [Cutress 19] Dr. Ian Cutress, “Xilinx Announces World Largest FPGA: Virtex Ultrascale+ VU19P with 9m Cells,” https://www.anandtech.com/show/14798/xilinx-announces-world-largest-fpga-virtex-ultrascale-vu19p-with-9m-cells, Aug 2019
- [Gallatin 19] Drew Gallatin, “Kernel TLS and hardware TLS offload in FreeBSD 13,” https://people.freebsd.org/~gallatin/talks/euro2019-ktls.pdf, 2019
- [Redestad 19] Claes Redestad, Staffan Friberg, Aleksey Shipilev, “JEP 230: Microbenchmark Suite,” http://openjdk.java.net/jeps/230, updated 2019
- [Bearman 20] Ian Bearman, “Exploring Profile Guided Optimization of the Linux Kernel,” https://linuxplumbersconf.org/event/7/contributions/771, 2020
- [Burnes 20] Andrew Burnes, “GeForce RTX 30 Series Graphics Cards: The Ultimate Play,” https://www.nvidia.com/en-us/geforce/news/introducing-rtx-30-series-graphics-cards, Sep 2020
- [Charlene 20] Charlene, “800G Is Coming: Set Pace to More Higher Speed Applications,” https://community.fs.com/blog/800-gigabit-ethernet-and-optics.html, May 2020
- [Cutress 20] Dr. Ian Cutress, “Insights into DDR5 Sub-timings and Latencies,” https://www.anandtech.com/show/16143/insights-into-ddr5-subtimings-and-latencies, Oct 2020
- [Ford 20] A. Ford, et al., “TCP Extensions for Multipath Operation with Multiple Addresses,” https://datatracker.ietf.org/doc/html/rfc8684, Mar 2020
- [Gregg 20] Brendan Gregg, “Systems Performance: Enterprise and the Cloud, Second Edition,” Addison-Wesley, 2020
- [Hruska 20] Joel Hruska, “Intel Demos PCIe 5.0 on Upcoming Sapphire Rapids CPUs,” https://www.extremetech.com/computing/316257-intel-demos-pcie-5-0-on-upcoming-sapphire-rapids-cpus, Oct 2020
- [Liu 20] Linda Liu, “Samsung QVO vs EVO vs PRO: What’s the Difference? [Clone Disk],” https://www.partitionwizard.com/clone-disk/samsung-qvo-vs-evo.html, 2020
- [Moore 20] Samuel K. Moore, “A Better Way to Measure Progress in Semiconductors,” https://spectrum.ieee.org/semiconductors/devices/a-better-way-to-measure-progress-in-semiconductors, Jul 2020
- [Peterson 20] Zachariah Peterson, “DDR5 vs. DDR6: Here's What to Expect in RAM Modules,” https://resources.altium.com/p/ddr5-vs-ddr6-heres-what-expect-ram-modules, Nov 2020
- [Salter 20] Jim Salter, “Western Digital releases new 18TB, 20TB EAMR drives,” https://arstechnica.com/gadgets/2020/07/western-digital-releases-new-18tb-20tb-eamr-drives, Jul 2020
- [Spier 20] Martin Spier, Brendan Gregg, et al., “FlameScope,” https://github.com/Netflix/flamescope, 2020
- [Tolvanen 20] Sami Tolvanen, Bill Wendling, and Nick Desaulniers, “LTO, PGO, and AutoFDO in the Kernel,” Linux Plumber’s Conference, https://linuxplumbersconf.org/event/7/contributions/798, 2020
- [Vega 20] Juan Camilo Vega, Marco Antonio Merlini, Paul Chow, “FFShark: A 100G FPGA Implementation of BPF Filtering for Wireshark,” IEEE 28th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2020
- [Warren 20] Tom Warren, “Microsoft reportedly designing its own ARM-based chips for servers and Surface PCs,” https://www.theverge.com/2020/12/18/22189450/microsoft-arm-processors-chips-servers-surface-report, Dec 2020
- [Google 21] Google, “Cloud TPU,” https://cloud.google.com/tpu, 2021
- [Haken 21] Michael Haken, et al., “Delta Lake 1S Server Design Specification 1v05,” https://www.opencompute.org/documents/delta-lake-1s-server-design-specification-1v05-pdf, 2021
- [Intel 21] Intel Corporation, “Intel® Optane™ Technology,” https://www.intel.com/content/www/us/en/products/docs/storage/optane-technology-brief.html, 2021
- [Quach 21a] Katyanna Quach, “Global chip shortage probably won't let up until 2023, warns TSMC: CEO 'still expects capacity to tighten more',” https://www.theregister.com/2021/04/16/tsmc_chip_forecast, Apr 2021
- [Quach 21b] Katyanna Quach, “IBM says it's built the world's first 2nm semiconductor chips,” https://www.theregister.com/2021/05/06/ibm_2nm_semiconductor_chips, May 2021
- [Ridley 21] Jacob Ridley, “IBM agrees with Intel and TSMC: this chip shortage isn't going to end anytime soon,” https://www.pcgamer.com/ibm-agrees-with-intel-and-tsmc-this-chip-shortage-isnt-going-to-end-anytime-soon, May 2021
- [Shilov 21] Anton Shilov, “Samsung Develops 512GB DDR5 Module with HKMG DDR5 Chips,” https://www.tomshardware.com/news/samsung-512gb-ddr5-memory-module, Mar 2021
- [Shilov 21b] Anton Shilov, “Seagate Ships 20TB HAMR HDDs Commercially, Increases Shipments of Mach.2 Drives,” https://www.tomshardware.com/news/seagate-ships-hamr-hdds-increases-dual-actuator-shipments, 2021
- [Shilov 21c] Anton Shilov, “SK Hynix Envisions 600-Layer 3D NAND & EUV-Based DRAM,” https://www.tomshardware.com/news/sk-hynix-600-layer-3d-nand-euv-dram, Mar 2021
- [Shilov 21d] Anton Shilov, “Sapphire Rapids Uncovered: 56 Cores, 64GB HBM2E, Multi-Chip Design,” https://www.tomshardware.com/news/intel-sapphire-rapids-xeon-scalable-specifications-and-features, Apr 2021
- [SuperMicro 21] SuperMicro, “B12SPE-CPU-25G (For SuperServer Only),” https://www.supermicro.com/en/products/motherboard/B12SPE-CPU-25G, 2021
- [Thaler 21] Dave Thaler, Poorna Gaddehosur, “Making eBPF work on Windows,” https://cloudblogs.microsoft.com/opensource/2021/05/10/making-ebpf-work-on-windows, May 2021
- [TornadoVM 21] TornadoVM, “TornadoVM Run your software faster and simpler!” https://www.tornadovm.org, 2021
- [Trader 21] Tiffany Trader, “Cerebras Second-Gen 7nm Wafer Scale Engine Doubles AI Performance Over First-Gen Chip,” https://www.enterpriseai.news/2021/04/21/latest-cerebras-second-gen-7nm-wafer-scale-engine-doubles-ai-performance-over-first-gen-chip, Apr 2021
- [Vahdat 21] Amin Vahdat, “The past, present and future of custom compute at Google,” https://cloud.google.com/blog/topics/systems/the-past-present-and-future-of-custom-compute-at-google, Mar 2021
- [Wikipedia 21] “Semiconductor device fabrication,” https://en.wikipedia.org/wiki/Semiconductor_device_fabrication, 2021
- [Wikipedia 21b] “Silicon,” https://en.wikipedia.org/wiki/Silicon, 2021
- [ZonedStorage 21] Zoned Storage, “Zoned Namespaces (ZNS) SSDs,” https://zonedstorage.io/introduction/zns, 2021

I've taken care to cite the author names along with the talk title and date, including for Internet sources, instead of the common practice of just listing URLs. I followed that practice when writing some earlier books, and it has since struck me as unfair that some references had author names and some didn't. Nowadays I always include full names when known.

In case you are interested, at the same conference I also gave a talk on [BPF Internals].

[youtube]: https://www.youtube.com/watch?v=5nN1wjA_S30
[PDF]: /Slides/LISA2021_ComputingPerformance.pdf
[Systems Performance 2nd Edition]: /systems-performance-2nd-edition-book.html
[BPF Internals]: /blog/2021-06-15/bpf-internals.html
[slideshare]: https://www.slideshare.net/brendangregg/computing-performance-on-the-horizon-2021
[write about]: /blog/2021-07-03/how-to-add-bpf-observability.html

Brendan Gregg: How To Add eBPF Observability To Your Product

4 years 1 month ago
There's an arms race to add [eBPF] (BPF) to commercial observability products, and in this post I'll describe how to do that quickly. This is also applicable for people adding it to their own in-house monitoring systems.

People like to show me their BPF observability products after they have prototyped or built them, but I often wish I had given them advice before they started. As the leader of BPF observability, it's advice I've been including in recent talks, and now I'm including it in this post.

First, I know you're busy. You might not even like BPF. To be pragmatic, I'll describe how to spend the least effort to get the most value. Think of this as "version 1": a starting point that's pretty useful. Whether you follow this advice or not, at least please understand it to avoid later regrets and pain.

If you're using an open source monitoring platform, first check if it already has a BPF agent. This post assumes it doesn't, and you'll be adding something for the first time.

## 1. Run your first tool

Start by installing the [bcc] or [bpftrace] tools. E.g., bcc on Ubuntu:

    # apt-get install bpfcc-tools

Then try running a tool.
E.g., to see process execution with timestamps using execsnoop(8):

    # execsnoop-bpfcc -T
    TIME     PCOMM            PID     PPID    RET ARGS
    19:36:15 service          828567  6009      0 /usr/sbin/service --status-all
    19:36:15 basename         828568  828567    0
    19:36:15 basename         828569  828567    0 /usr/bin/basename /usr/sbin/service
    19:36:15 env              828570  828567    0 /usr/bin/env -i LANG=en_AU.UTF-8 LANGUAGE=en_AU:en LC_CTYPE= LC_NUMERIC= LC_TIME= LC_COLLATE= LC_MONETARY= LC_MESSAGES= LC_PAPER= LC_NAME= LC_ADDRESS= LC_TELEPHONE= LC_MEASUREMENT= LC_IDENTIFICATION= LC_ALL= PATH=/opt/local/bin:/opt/local/sbin:/usr/local/git/bin:/home/bgregg/.local/bin:/home/bgregg/bin:/opt/local/bin:/opt/local/sbin:/ TERM=xterm-256color /etc/init.d/acpid
    19:36:15 acpid            828570  828567    0 /etc/init.d/acpid status
    19:36:15 run-parts        828571  828570    0 /usr/bin/run-parts --lsbsysinit --list /lib/lsb/init-functions.d
    19:36:15 systemctl        828572  828570    0 /usr/bin/systemctl -p LoadState --value show acpid.service
    19:36:15 readlink         828573  828570    0 /usr/bin/readlink -f /etc/init.d/acpid
    [...]

While basic, I've solved many perf issues with this tool alone, including for misconfigured systems where a shell script is launching failing processes in a loop, and when some minor application is crashing and is restarting every few minutes but has not yet been noticed.

## 2. Add a tool to your product

Now imagine adding execsnoop(8) to your product. You likely already have agents running on all your customer systems. Do they have a way to run a command and return the text output? Or run a command and send the output elsewhere for aggregation (S3, Hive, Druid, etc.)? There are so many options it's really your own preference based on your existing system and customer environments.

When you add your first tool to your product, have it run the tool for a short duration such as 10 to 60 seconds. I just noticed execsnoop(8) doesn't have a duration option yet, so in the interim you could wrap it with timeout -s2 60 execsnoop-bpfcc.
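A minimal Python sketch of this step, assuming the agent can run local commands as root and that the tool is installed under the Ubuntu name execsnoop-bpfcc shown above. It wraps the tool in coreutils timeout(1), where -s2 sends SIGINT when the time is up, and splits the -T output into records that could be shipped elsewhere for aggregation. The function names are hypothetical, and the parser relies only on the column layout in the sample output (TIME PCOMM PID PPID RET ARGS).

```python
import subprocess


def build_command(tool, seconds, args=()):
    """Wrap a bcc tool in coreutils timeout(1); -s2 delivers SIGINT at the deadline."""
    return ["timeout", "-s2", str(seconds), tool, *args]


def run_tool(tool, seconds=60, args=()):
    """Run the tool for a bounded time (BPF tools need root) and return its stdout."""
    result = subprocess.run(build_command(tool, seconds, args),
                            capture_output=True, text=True)
    # timeout(1) exits with status 124 when it had to stop the command,
    # which is the normal case for a tool that traces until interrupted.
    if result.returncode not in (0, 124):
        raise RuntimeError(result.stderr.strip() or f"exit status {result.returncode}")
    return result.stdout


def parse_execsnoop(text):
    """Turn `execsnoop -T` text (TIME PCOMM PID PPID RET ARGS) into dicts."""
    records = []
    for line in text.splitlines()[1:]:           # first line is the column header
        fields = line.strip().split(None, 5)     # ARGS may itself contain spaces
        if len(fields) < 5 or not fields[2].isdigit():
            continue                             # skip blank lines and "[...]" markers
        time, comm, pid, ppid, ret = fields[:5]
        records.append({
            "time": time,
            "comm": comm,
            "pid": int(pid),
            "ppid": int(ppid),
            "ret": int(ret),
            "args": fields[5] if len(fields) == 6 else "",  # ARGS can be empty
        })
    return records
```

On a host with bcc installed, an agent would call parse_execsnoop(run_tool("execsnoop-bpfcc", 60, ["-T"])) and forward the records. Keeping the command construction separate from execution makes it easy to swap in other tools later without touching the parsing.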
If you want to run these tools 24x7, study overheads to understand the cost first. Low-frequency events such as process execution should be negligible to capture.

Instead of bcc, you can also use the [bpftrace] versions. These typically don't have canned options (-v, -l, etc.), but do have a json output mode. E.g.:

    # bpftrace -f json execsnoop.bt
    {"type": "attached_probes", "data": {"probes": 2}}
    {"type": "printf", "data": "TIME(ms) PID ARGS\n"}
    {"type": "printf", "data": "2737 849176 "}
    {"type": "join", "data": "ls -F"}
    {"type": "printf", "data": "5641 849178 "}
    {"type": "join", "data": "date"}

This mode was added so that BPF observability products can be built on top of bpftrace.

## 3. Don't worry about dependencies

I am indeed suggesting that you install bcc or bpftrace on your customer systems, and they currently have llvm dependencies. This can add up to tens of Mbytes, which can be a problem for some resource-constrained environments (embedded). We've been doing lots of work to fix this in the future. bcc has newer versions of the tools (libbpf-tools) that use [BTF and CO-RE] (and not Python) and will ultimately mean you can install 100-Kbyte binary versions of the tools with no dependencies. bpftrace has a similar plan to produce a small dependency-less binary using the newer kernel features. This does require at least Linux 5.8 to work well, and your customers may not run that for years. In the interim I'd suggest not worrying about the llvm dependencies, since that will be fixed later.

Note that not all Linux distributions have enabled CONFIG_DEBUG_INFO_BTF=y, which is necessary for the future of BTF and CO-RE. Major distros have set it, such as Ubuntu 20.10, Fedora 30, and RHEL 8.2. But if you know some of your customers are running something uncommon, please check and encourage them or the distro vendor to set CONFIG_DEBUG_INFO_BTF=y and CONFIG_DEBUG_INFO_BTF_MODULES=y to avoid pain in the future.

## 4. Version 1 dashboard

Now that you have one BPF observability tool in your product, it's time to add more. Here are the top ten tools you can run and present as a generic BPF observability dashboard, along with suggested visualizations:
| #  | Tool       | Shows                        | Visualization             |
|----|------------|------------------------------|---------------------------|
| 1  | execsnoop  | New processes (via exec(2))  | table                     |
| 2  | opensnoop  | Files opened                 | table                     |
| 3  | ext4slower | Slow filesystem I/O          | table                     |
| 4  | biolatency | Disk I/O latency histogram   | heat map                  |
| 5  | biosnoop   | Disk I/O per-event details   | table, offset heat map    |
| 6  | cachestat  | File system cache statistics | line charts               |
| 7  | tcplife    | TCP connections              | table, distributed graph  |
| 8  | tcpretrans | TCP retransmissions          | table                     |
| 9  | runqlat    | CPU scheduler latency        | heat map                  |
| 10 | profile    | CPU stack trace samples      | flame graph               |
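One way to carry this table into a product is as a small declarative spec that both the agent and the dashboard read. The sketch below is hypothetical: the run-mode flags follow the post's remarks that runqlat and profile have noticeable overhead while execsnoop, biolatency, tcplife, and tcpretrans are usually cheap enough for 24x7 use; marking the remaining tools as on-demand is a conservative assumption, not a measurement. The -bpfcc suffix assumes the Ubuntu package naming used for execsnoop-bpfcc earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Panel:
    tool: str           # bcc tool name without any distro-specific suffix
    shows: str
    visualization: str
    continuous: bool    # True: may run 24x7; False: run 10 to 60 s on demand


DASHBOARD = [
    Panel("execsnoop", "New processes (via exec(2))", "table", True),
    Panel("opensnoop", "Files opened", "table", False),
    Panel("ext4slower", "Slow filesystem I/O", "table", False),
    Panel("biolatency", "Disk I/O latency histogram", "heat map", True),
    Panel("biosnoop", "Disk I/O per-event details", "table, offset heat map", False),
    Panel("cachestat", "File system cache statistics", "line charts", False),
    Panel("tcplife", "TCP connections", "table, distributed graph", True),
    Panel("tcpretrans", "TCP retransmissions", "table", True),
    Panel("runqlat", "CPU scheduler latency", "heat map", False),
    Panel("profile", "CPU stack trace samples", "flame graph", False),
]


def on_demand_commands(seconds=30, suffix="-bpfcc"):
    """Command lines for the short-run panels, each bounded by timeout(1)."""
    if not 10 <= seconds <= 60:
        raise ValueError("on-demand tools are meant to run for 10 to 60 seconds")
    return [["timeout", "-s2", str(seconds), panel.tool + suffix]
            for panel in DASHBOARD if not panel.continuous]


def continuous_tools(suffix="-bpfcc"):
    """Names of the tools treated as cheap enough to leave running."""
    return [panel.tool + suffix for panel in DASHBOARD if panel.continuous]
```

Keeping the visualization hint next to the tool name lets the dashboard pick a table, heat map, or flame graph renderer without hard-coding tool names in the UI.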
This is based on my [bcc Tutorial], and many also exist in bpftrace. I chose these to find the most performance wins with the fewest tools. Note that runqlat and profile can have noticeable overheads, so I'd run these tools for only 10 to 60 seconds and generate a report. Some are low enough overhead to be run 24x7 if desired (e.g., execsnoop, biolatency, tcplife, tcpretrans).

There is already documentation as man pages and example files in the bcc and bpftrace repositories that you can link to, to help your customers understand the tool output. E.g., here are the execsnoop(8) example files in bcc and bpftrace.

Once you have this all working, you have version 1!

## bcc vs bpftrace

The bcc tools are the easiest to use, as they usually have many command-line options. The bpftrace tools are easier to edit and customize, and bpftrace has a json output mode. If you're completely new to tracing, go with bcc. If you want to do some hacking and customizing of the tools, go with bpftrace. In the end, they are both good options.

## Case study: Netflix

Netflix is building a new GUI that does this tool dashboard and more, based on the bpftrace versions of these tools. In this architecture, the bpftrace binary is installed on all the target systems, while the bpftrace tools (text files) live on a web server and are pushed out when needed. This means we can ensure we're always running the latest version of the tools by updating them in one place. This is currently part of our FlameCommander UI, which also runs flame graphs across the cloud. Our previous BPF GUI was part of [Vector], and used bcc, but we've since deprecated that. We'll likely open source the new one at some point and have a post about it on the Netflix tech blog.

## Case study: Facebook

Facebook are advanced users of BPF, but deep details of how they run the tools fleet-wide aren't fully public. Based on the activity in bcc, and their development of the BTF and CO-RE technologies, I'd strongly suspect their solution is based on the bcc libbpf-tools versions.

## Porting Pitfalls

BPF tracing tools are like application and kernel patches. They need constant updates to keep working across different software versions. Porting them to a different language and then not maintaining them may be like trying to apply a Linux 4.15 patch to Linux 5.12. If you're lucky, it blows up! If you're unlucky, the patch applies but corrupts some things in a subtle way that you don't notice until later. It depends on the tool.

As an extreme example, I wrote cachestat(8) while on vacation in 2014 for use on the Netflix cloud, which was a mix of Linux 3.2 and 3.13 at the time. BPF didn't exist on those versions, so I used basic Ftrace capabilities that were available on Linux 3.2. I described this approach as [brittle] and a [sandcastle] that would need maintenance as the kernel changed. It was later ported to BPF with kprobes, and has now been rewritten and included in commercial observability products. Unsurprisingly, I've heard it has problems on newer kernels, printing output that doesn't make sense. It really needs an overhaul. When I (or someone else) overhaul it, anyone pulling updates from bcc will automatically get the fixed version with no effort. Those who have rewritten it will need to rewrite theirs. I fear they won't, and customers will be running a broken version of cachestat(8) for years. Note that if BPF had been available on my target environment when I wrote cachestat(8), I would have coded it completely differently. People are porting something written for Linux 3.2 and running it on Linux 5.x.

In a previous blog post, [An Unbelievable Demo], I talked about how something similar happened many years ago where old tracing tool versions were used without updates. The problems I'm describing are specific to BPF software and kernel tracing.
As a different example, my flame graph software has been rewritten over a dozen times, and since it's a simple and finished algorithm I don't see a big problem with that. I prefer people help with the newer [d3 version], but if people do their own it's no big deal. You can code it and it'll work forever. That's not the case with uprobe- and kprobe-based BPF tools, because they do need maintenance.

## Think like a sysadmin, not like a programmer

In summary, start by checking if there's already a BPF agent for your monitoring systems, and if not, build one based on the existing [bcc] or [bpftrace] tools rather than rewriting everything from scratch. This is thinking like a sysadmin who installs and maintains software, and not like a programmer who codes everything. Install the bcc or bpftrace tools, add them to your observability product, and pull package updates as needed. That will be a quick and useful version 1. BPF up and running!

I see people think like a programmer instead and feel they must start by learning bcc and BPF programming in depth. Then, having discovered everything is C or Python, some rewrite it all in a different language.

First, learning bcc and BPF well takes weeks; learning the subtleties and pitfalls of system tracing can take months or years. To give you a taste of what you're in for, check out my [BPF Internals] talk. If you really want to do this and have the time, you certainly can (you'll probably wind up at tracing conferences and bumping into me: see you at Linux Plumbers or the Tracing Summit!). But if you're under some deadline to add BPF observability, try thinking like a sysadmin instead and just build upon the existing tools. That's the fast way. Think like a programmer later, if or when you have the time.

Second, the BPF software, especially certain kprobe-based tools, requires ongoing maintenance. A tool may work on Linux 5.3 but break on 5.4, as a traced function was renamed or a new code path added. The BPF libraries and frameworks are also changing and evolving, most recently with the BTF and CO-RE support. This is something I hope people consider before choosing to rewrite them: Do you have a plan to rewrite all the updates as well, or will you end up stuck on an old port of the library? It's easier to pull updates of everything than to maintain your own versions.

Finally, what if you have a great idea for a _better_ BPF library or framework than what we're using in bcc and bpftrace? Talk to us, try it out, innovate. We're at the start of the BPF era and there's lots more to explore. But please understand what exists first and the maintenance burden you are taking on. Your energies may be better spent creating something new, on top of what exists, than porting something old.

[bcc]: https://github.com/iovisor/bcc
[bpftrace]: https://github.com/iovisor/bpftrace
[book]: /bpf-performance-tools-book.html
[choosing]: /blog/2015-07-08/choosing-a-linux-tracer.html
[An Unbelievable Demo]: /blog/2021-06-04/an-unbelievable-demo.html
[d3 version]: https://github.com/spiermar/d3-flame-graph
[bcc Tutorial]: https://github.com/iovisor/bcc/blob/master/docs/tutorial.md
[brittle]: /blog/2014-12-31/linux-page-cache-hit-ratio.html
[sandcastle]: https://github.com/brendangregg/perf-tools/blob/master/fs/cachestat
[BTF and CO-RE]: /blog/2020-11-04/bpf-co-re-btf-libbpf.html
[Vector]: https://github.com/Netflix/vector
[eBPF]: https://ebpf.io/
[BPF Internals]: /blog/2021-06-15/bpf-internals.html
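For products built on bpftrace's json output mode, as in the Netflix case study, the agent has to turn the event stream back into rows. The Python sketch below is based only on the execsnoop.bt sample output shown in this post, under stated assumptions: printf events are concatenated as-is, each join event is treated as ending a line (in bpftrace's text mode join() prints its arguments followed by a newline), and other event types such as attached_probes are ignored. The function names are hypothetical.

```python
import json


def bpftrace_json_to_text(lines):
    """Rebuild the text a bpftrace script would print from `-f json` event lines."""
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event["type"] == "printf":
            chunks.append(event["data"])          # printf data carries its own newlines
        elif event["type"] == "join":
            chunks.append(event["data"] + "\n")   # join() output ends the current line
        # other event types (attached_probes, map, hist, ...) are skipped in this sketch
    return "".join(chunks)


def parse_execsnoop_bt(text):
    """Split execsnoop.bt text (TIME(ms) PID ARGS) into dicts."""
    rows = []
    for line in text.splitlines()[1:]:            # first line is the column header
        fields = line.split(None, 2)
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        rows.append({
            "ms": int(fields[0]),
            "pid": int(fields[1]),
            "args": fields[2] if len(fields) == 3 else "",
        })
    return rows
```

Reassembling text first and parsing second keeps the parser identical for bpftrace's text and json modes; a real product would more likely handle map and histogram events directly rather than round-tripping through text.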

[$] The first half of the 5.14 merge window

4 years 1 month ago
As of this writing, just under 5,000 non-merge changesets have been pulled into the mainline repository for the 5.14 development cycle. That is less than half of the patches that have been queued up in linux-next, so it is fair to say that this merge window is getting off to a bit of a slow start. Nonetheless, a fair number of significant changes have been merged.
corbet