Hírolvasó

Darktable 3.6 released

4 years 1 month ago
Version 3.6 of the Darktable raw photo editor has been released. "The darktable team is proud to announce our second summer feature release, darktable 3.6. Merry (summer) Christmas! This is the first of two releases this year and, from here on, we intend to issue two new feature releases each year, around the summer and winter solstices." The list of new features is long, including a new color-balance module, a "censorize" module for partial pixelization of images, a new demosaic algorithm, and more.
corbet

Brendan Gregg: USENIX LISA2021 Computing Performance: On the Horizon

4 years 1 month ago
It's an exciting time for developments in computer performance, not just for the BPF technology (which I often [write about]) but also for processors with 3D stacking and cloud vendor CPUs (e.g., AWS Graviton2); for memory with the arrival of DDR5 and High Bandwidth Memory (HBM) on-processor; for storage including new uses for 3D Xpoint as a 3D NAND accelerator; for networking with the rise of QUIC and eXpress Data Path (XDP); and so on. I summarized these topics and more as a plenary conference talk, including my own predictions (as a senior performance engineer) for the future of computing performance, with a focus on back-end servers.

The video is on [youtube], and the slides are on [slideshare] or as a [PDF].

I work on many areas of performance, but recently I've had a lot of demand to talk about BPF. This was a chance to talk about other things I've been working on, such as the present and future of hardware performance. I also wrote about these topics in detail for my recent [Systems Performance 2nd Edition] book. Note that my predictions in this talk may be wrong, but they should be thought-provoking. I hope you enjoy it!

## References

I've reproduced the talk references below, so you can click on links:

- [Gregg 08] Brendan Gregg, “ZFS L2ARC,” http://www.brendangregg.com/blog/2008-07-22/zfs-l2arc.html, Jul 2008
- [Gregg 10] Brendan Gregg, “Visualizations for Performance Analysis (and More),” https://www.usenix.org/conference/lisa10/visualizations-performance-analysis-and-more, 2010
- [Greenberg 11] Marc Greenberg, “DDR4: Double the speed, double the latency? Make sure your system can handle next-generation DRAM,” https://www.chipestimate.com/DDR4-Double-the-speed-double-the-latencyMake-sure-your-system-can-handle-next-generation-DRAM/Cadence/Technical-Article/2011/11/22, Nov 2011
- [Hruska 12] Joel Hruska, “The future of CPU scaling: Exploring options on the cutting edge,” https://www.extremetech.com/computing/184946-14nm-7nm-5nm-how-low-can-cmos-go-it-depends-if-you-ask-the-engineers-or-the-economists, Feb 2012
- [Gregg 13] Brendan Gregg, “Blazing Performance with Flame Graphs,” https://www.usenix.org/conference/lisa13/technical-sessions/plenary/gregg, 2013
- [Shimpi 13] Anand Lal Shimpi, “Seagate to Ship 5TB HDD in 2014 using Shingled Magnetic Recording,” https://www.anandtech.com/show/7290/seagate-to-ship-5tb-hdd-in-2014-using-shingled-magnetic-recording, Sep 2013
- [Borkmann 14] Daniel Borkmann, “net: tcp: add DCTCP congestion control algorithm,” https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e3118e8359bb7c59555aca60c725106e6d78c5ce, 2014
- [Macri 15] Joe Macri, “Introducing HBM,” https://www.amd.com/en/technologies/hbm, Jul 2015
- [Cardwell 16] Neal Cardwell, et al., “BBR: Congestion-Based Congestion Control,” https://queue.acm.org/detail.cfm?id=3022184, 2016
- [Gregg 16] Brendan Gregg, “Unikernel Profiling: Flame Graphs from dom0,” http://www.brendangregg.com/blog/2016-01-27/unikernel-profiling-from-dom0.html, Jan 2016
- [Gregg 16b] Brendan Gregg, “Linux 4.X Tracing Tools: Using BPF Superpowers,” https://www.usenix.org/conference/lisa16/conference-program/presentation/linux-4x-tracing-tools-using-bpf-superpowers, 2016
- [Alcorn 17] Paul Alcorn, “Seagate To Double HDD Speed With Multi-Actuator Technology,” https://www.tomshardware.com/news/hdd-multi-actuator-heads-seagate,36132.html, 2017
- [Alcorn 17b] Paul Alcorn, “Hot Chips 2017: Intel Deep Dives Into EMIB,” https://www.tomshardware.com/news/intel-emib-interconnect-fpga-chiplet,35316.html#xenforo-comments-3112212, 2017
- [Corbet 17] Jonathan Corbet, “Two new block I/O schedulers for 4.12,” https://lwn.net/Articles/720675, Apr 2017
- [Gregg 17] Brendan Gregg, “AWS EC2 Virtualization 2017: Introducing Nitro,” http://www.brendangregg.com/blog/2017-11-29/aws-ec2-virtualization-2017.html, Nov 2017
- [Russinovich 17] Mark Russinovich, “Inside the Microsoft FPGA-based configurable cloud,” https://www.microsoft.com/en-us/research/video/inside-microsoft-fpga-based-configurable-cloud, 2017
- [Gregg 18] Brendan Gregg, “Linux Performance 2018,” http://www.brendangregg.com/Slides/Percona2018_Linux_Performance.pdf, 2018
- [Hady 18] Frank Hady, “Achieve Consistent Low Latency for Your Storage-Intensive Workloads,” https://www.intel.com/content/www/us/en/architecture-and-technology/optane-technology/low-latency-for-storage-intensive-workloads-article-brief.html, 2018
- [Joshi 18] Amit Joshi, et al., “Titus, the Netflix container management platform, is now open source,” https://netflixtechblog.com/titus-the-netflix-container-management-platform-is-now-open-source-f868c9fb5436, Apr 2018
- [Cutress 19] Dr. Ian Cutress, “Xilinx Announces World Largest FPGA: Virtex Ultrascale+ VU19P with 9m Cells,” https://www.anandtech.com/show/14798/xilinx-announces-world-largest-fpga-virtex-ultrascale-vu19p-with-9m-cells, Aug 2019
- [Gallatin 19] Drew Gallatin, “Kernel TLS and hardware TLS offload in FreeBSD 13,” https://people.freebsd.org/~gallatin/talks/euro2019-ktls.pdf, 2019
- [Redestad 19] Claes Redestad, Staffan Friberg, Aleksey Shipilev, “JEP 230: Microbenchmark Suite,” http://openjdk.java.net/jeps/230, updated 2019
- [Bearman 20] Ian Bearman, “Exploring Profile Guided Optimization of the Linux Kernel,” https://linuxplumbersconf.org/event/7/contributions/771, 2020
- [Burnes 20] Andrew Burnes, “GeForce RTX 30 Series Graphics Cards: The Ultimate Play,” https://www.nvidia.com/en-us/geforce/news/introducing-rtx-30-series-graphics-cards, Sep 2020
- [Charlene 20] Charlene, “800G Is Coming: Set Pace to More Higher Speed Applications,” https://community.fs.com/blog/800-gigabit-ethernet-and-optics.html, May 2020
- [Cutress 20] Dr. Ian Cutress, “Insights into DDR5 Sub-timings and Latencies,” https://www.anandtech.com/show/16143/insights-into-ddr5-subtimings-and-latencies, Oct 2020
- [Ford 20] A. Ford, et al., “TCP Extensions for Multipath Operation with Multiple Addresses,” https://datatracker.ietf.org/doc/html/rfc8684, Mar 2020
- [Gregg 20] Brendan Gregg, “Systems Performance: Enterprise and the Cloud, Second Edition,” Addison-Wesley, 2020
- [Hruska 20] Joel Hruska, “Intel Demos PCIe 5.0 on Upcoming Sapphire Rapids CPUs,” https://www.extremetech.com/computing/316257-intel-demos-pcie-5-0-on-upcoming-sapphire-rapids-cpus, Oct 2020
- [Liu 20] Linda Liu, “Samsung QVO vs EVO vs PRO: What’s the Difference? [Clone Disk],” https://www.partitionwizard.com/clone-disk/samsung-qvo-vs-evo.html, 2020
- [Moore 20] Samuel K. Moore, “A Better Way to Measure Progress in Semiconductors,” https://spectrum.ieee.org/semiconductors/devices/a-better-way-to-measure-progress-in-semiconductors, Jul 2020
- [Peterson 20] Zachariah Peterson, “DDR5 vs. DDR6: Here's What to Expect in RAM Modules,” https://resources.altium.com/p/ddr5-vs-ddr6-heres-what-expect-ram-modules, Nov 2020
- [Salter 20] Jim Salter, “Western Digital releases new 18TB, 20TB EAMR drives,” https://arstechnica.com/gadgets/2020/07/western-digital-releases-new-18tb-20tb-eamr-drives, Jul 2020
- [Spier 20] Martin Spier, Brendan Gregg, et al., “FlameScope,” https://github.com/Netflix/flamescope, 2020
- [Tolvanen 20] Sami Tolvanen, Bill Wendling, and Nick Desaulniers, “LTO, PGO, and AutoFDO in the Kernel,” Linux Plumber’s Conference, https://linuxplumbersconf.org/event/7/contributions/798, 2020
- [Vega 20] Juan Camilo Vega, Marco Antonio Merlini, Paul Chow, “FFShark: A 100G FPGA Implementation of BPF Filtering for Wireshark,” IEEE 28th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2020
- [Warren 20] Tom Warren, “Microsoft reportedly designing its own ARM-based chips for servers and Surface PCs,” https://www.theverge.com/2020/12/18/22189450/microsoft-arm-processors-chips-servers-surface-report, Dec 2020
- [Google 21] Google, “Cloud TPU,” https://cloud.google.com/tpu, 2021
- [Haken 21] Michael Haken, et al., “Delta Lake 1S Server Design Specification 1v05,” https://www.opencompute.org/documents/delta-lake-1s-server-design-specification-1v05-pdf, 2021
- [Intel 21] Intel Corporation, “Intel® Optane™ Technology,” https://www.intel.com/content/www/us/en/products/docs/storage/optane-technology-brief.html, 2021
- [Quach 21a] Katyanna Quach, “Global chip shortage probably won't let up until 2023, warns TSMC: CEO 'still expects capacity to tighten more',” https://www.theregister.com/2021/04/16/tsmc_chip_forecast, Apr 2021
- [Quach 21b] Katyanna Quach, “IBM says it's built the world's first 2nm semiconductor chips,” https://www.theregister.com/2021/05/06/ibm_2nm_semiconductor_chips, May 2021
- [Ridley 21] Jacob Ridley, “IBM agrees with Intel and TSMC: this chip shortage isn't going to end anytime soon,” https://www.pcgamer.com/ibm-agrees-with-intel-and-tsmc-this-chip-shortage-isnt-going-to-end-anytime-soon, May 2021
- [Shilov 21] Anton Shilov, “Samsung Develops 512GB DDR5 Module with HKMG DDR5 Chips,” https://www.tomshardware.com/news/samsung-512gb-ddr5-memory-module, Mar 2021
- [Shilov 21b] Anton Shilov, “Seagate Ships 20TB HAMR HDDs Commercially, Increases Shipments of Mach.2 Drives,” https://www.tomshardware.com/news/seagate-ships-hamr-hdds-increases-dual-actuator-shipments, 2021
- [Shilov 21c] Anton Shilov, “SK Hynix Envisions 600-Layer 3D NAND & EUV-Based DRAM,” https://www.tomshardware.com/news/sk-hynix-600-layer-3d-nand-euv-dram, Mar 2021
- [Shilov 21d] Anton Shilov, “Sapphire Rapids Uncovered: 56 Cores, 64GB HBM2E, Multi-Chip Design,” https://www.tomshardware.com/news/intel-sapphire-rapids-xeon-scalable-specifications-and-features, Apr 2021
- [SuperMicro 21] SuperMicro, “B12SPE-CPU-25G (For SuperServer Only),” https://www.supermicro.com/en/products/motherboard/B12SPE-CPU-25G, 2021
- [Thaler 21] Dave Thaler, Poorna Gaddehosur, “Making eBPF work on Windows,” https://cloudblogs.microsoft.com/opensource/2021/05/10/making-ebpf-work-on-windows, May 2021
- [TornadoVM 21] TornadoVM, “TornadoVM Run your software faster and simpler!” https://www.tornadovm.org, 2021
- [Trader 21] Tiffany Trader, “Cerebras Second-Gen 7nm Wafer Scale Engine Doubles AI Performance Over First-Gen Chip,” https://www.enterpriseai.news/2021/04/21/latest-cerebras-second-gen-7nm-wafer-scale-engine-doubles-ai-performance-over-first-gen-chip, Apr 2021
- [Vahdat 21] Amin Vahdat, “The past, present and future of custom compute at Google,” https://cloud.google.com/blog/topics/systems/the-past-present-and-future-of-custom-compute-at-google, Mar 2021
- [Wikipedia 21] “Semiconductor device fabrication,” https://en.wikipedia.org/wiki/Semiconductor_device_fabrication, 2021
- [Wikipedia 21b] “Silicon,” https://en.wikipedia.org/wiki/Silicon, 2021
- [ZonedStorage 21] Zoned Storage, “Zoned Namespaces (ZNS) SSDs,” https://zonedstorage.io/introduction/zns, 2021

I've taken care to cite the author names along with the talk title and date, including for Internet sources, instead of the common practice of just listing URLs. I followed that practice when writing some earlier books, and it has since struck me as unfair that some references had author names and some didn't. Nowadays I always include full names when known.

In case you are interested, at the same conference I also gave a talk on [BPF Internals].

[youtube]: https://www.youtube.com/watch?v=5nN1wjA_S30
[PDF]: /Slides/LISA2021_ComputingPerformance.pdf
[Systems Performance 2nd Edition]: /systems-performance-2nd-edition-book.html
[BPF Internals]: /blog/2021-06-15/bpf-internals.html
[slideshare]: https://www.slideshare.net/brendangregg/computing-performance-on-the-horizon-2021
[write about]: /blog/2021-07-03/how-to-add-bpf-observability.html

Brendan Gregg: How To Add eBPF Observability To Your Product

4 years 1 month ago
There's an arms race to add [eBPF] (BPF) to commercial observability products, and in this post I'll describe how to quickly do that. This is also applicable for people adding it to their own in-house monitoring systems.

People like to show me their BPF observability products after they have prototyped or built them, but I often wish I had given them advice before they started. As the leader of BPF observability, I've been including this advice in recent talks, and now I'm including it in this post.

First, I know you're busy. You might not even like BPF. To be pragmatic, I'll describe how to spend the least effort to get the most value. Think of this as "version 1": a starting point that's pretty useful. Whether you follow this advice or not, at least please understand it to avoid later regrets and pain.

If you're using an open source monitoring platform, first check if it already has a BPF agent. This post assumes it doesn't, and that you'll be adding something for the first time.

## 1. Run your first tool

Start by installing the [bcc] or [bpftrace] tools. E.g., bcc on Ubuntu:

    # apt-get install bpfcc-tools

Then try running a tool. E.g., to see process execution with timestamps using execsnoop(8):

    # execsnoop-bpfcc -T
    TIME     PCOMM            PID     PPID   RET ARGS
    19:36:15 service          828567  6009     0 /usr/sbin/service --status-all
    19:36:15 basename         828568  828567   0
    19:36:15 basename         828569  828567   0 /usr/bin/basename /usr/sbin/service
    19:36:15 env              828570  828567   0 /usr/bin/env -i LANG=en_AU.UTF-8 LANGUAGE=en_AU:en LC_CTYPE= LC_NUMERIC= LC_TIME= LC_COLLATE= LC_MONETARY= LC_MESSAGES= LC_PAPER= LC_NAME= LC_ADDRESS= LC_TELEPHONE= LC_MEASUREMENT= LC_IDENTIFICATION= LC_ALL= PATH=/opt/local/bin:/opt/local/sbin:/usr/local/git/bin:/home/bgregg/.local/bin:/home/bgregg/bin:/opt/local/bin:/opt/local/sbin:/ TERM=xterm-256color /etc/init.d/acpid
    19:36:15 acpid            828570  828567   0 /etc/init.d/acpid status
    19:36:15 run-parts        828571  828570   0 /usr/bin/run-parts --lsbsysinit --list /lib/lsb/init-functions.d
    19:36:15 systemctl        828572  828570   0 /usr/bin/systemctl -p LoadState --value show acpid.service
    19:36:15 readlink         828573  828570   0 /usr/bin/readlink -f /etc/init.d/acpid
    [...]

While basic, I've solved many perf issues with this tool alone, including for misconfigured systems where a shell script is launching failing processes in a loop, and when some minor application is crashing and restarting every few minutes but has not yet been noticed.

## 2. Add a tool to your product

Now imagine adding execsnoop(8) to your product. You likely already have agents running on all your customer systems. Do they have a way to run a command and return the text output? Or run a command and send the output elsewhere for aggregation (S3, Hive, Druid, etc.)? There are so many options that it's really your own preference, based on your existing system and customer environments.

When you add your first tool to your product, have your agent run it for a short duration such as 10 to 60 seconds. I just noticed execsnoop(8) doesn't have a duration option yet, so in the interim you could wrap it with timeout(1), e.g., timeout -s2 60 execsnoop-bpfcc. If you want to run these tools 24x7, study the overheads to understand the cost first. Low-frequency events such as process execution should be negligible to capture.

Instead of bcc, you can also use the [bpftrace] versions. These typically don't have canned options (-v, -l, etc.), but do have a json output mode.
E.g.:

    # bpftrace -f json execsnoop.bt
    {"type": "attached_probes", "data": {"probes": 2}}
    {"type": "printf", "data": "TIME(ms) PID ARGS\n"}
    {"type": "printf", "data": "2737 849176 "}
    {"type": "join", "data": "ls -F"}
    {"type": "printf", "data": "5641 849178 "}
    {"type": "join", "data": "date"}

This mode was added so that BPF observability products can be built on top of bpftrace.

## 3. Don't worry about dependencies

I am indeed suggesting that you install bcc or bpftrace on your customer systems, and they currently have llvm dependencies. This can add up to tens of Mbytes, which can be a problem for some resource-constrained environments (embedded). We've been doing lots of work to fix this in the future: bcc has newer versions of the tools (libbpf-tools) that use [BTF and CO-RE] (and not Python), which will ultimately mean you can install 100-Kbyte binary versions of the tools with no dependencies. bpftrace has a similar plan to produce a small dependency-less binary using the newer kernel features. This does require at least Linux 5.8 to work well, and your customers may not run that for years. In the interim, I'd suggest not worrying about the llvm dependencies, since this will be fixed later.

Note that not all Linux distributions have enabled CONFIG_DEBUG_INFO_BTF=y, which is necessary for the future of BTF and CO-RE. Major distros have set it, such as Ubuntu 20.10, Fedora 30, and RHEL 8.2. But if you know some of your customers are running something uncommon, please check and encourage them or the distro vendor to set CONFIG_DEBUG_INFO_BTF=y and CONFIG_DEBUG_INFO_BTF_MODULES=y to avoid pain in the future.
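If you want to check a given system for this, here is a minimal sketch; the helper below is my own illustration, not part of bcc or bpftrace. Kernels built with CONFIG_DEBUG_INFO_BTF=y expose their BTF at /sys/kernel/btf/vmlinux, and most distros also install the kernel build config under /boot.

    #!/usr/bin/env python3
    # Illustrative helper (assumed name, not part of bcc or bpftrace): report
    # whether the running kernel was built with CONFIG_DEBUG_INFO_BTF=y, which
    # the libbpf-tools and the planned small bpftrace binary rely on.
    import os
    import platform

    def btf_enabled():
        # A kernel built with CONFIG_DEBUG_INFO_BTF=y exposes its BTF here.
        if os.path.exists("/sys/kernel/btf/vmlinux"):
            return True
        # Fall back to the distro's installed kernel config, if present.
        config = "/boot/config-" + platform.release()
        try:
            with open(config) as f:
                return "CONFIG_DEBUG_INFO_BTF=y" in f.read()
        except OSError:
            return False

    if __name__ == "__main__":
        print("BTF available" if btf_enabled()
              else "no BTF: ask for CONFIG_DEBUG_INFO_BTF=y")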
## 4. Version 1 dashboard

Now you have one BPF observability tool in your product, it's time to add more. Here are the top ten tools you can run and present as a generic BPF observability dashboard, along with suggested visualizations:

| #  | Tool       | Shows                        | Visualization             |
|----|------------|------------------------------|---------------------------|
| 1  | execsnoop  | New processes (via exec(2))  | table                     |
| 2  | opensnoop  | Files opened                 | table                     |
| 3  | ext4slower | Slow filesystem I/O          | table                     |
| 4  | biolatency | Disk I/O latency histogram   | heat map                  |
| 5  | biosnoop   | Disk I/O per-event details   | table, offset heat map    |
| 6  | cachestat  | File system cache statistics | line charts               |
| 7  | tcplife    | TCP connections              | table, distributed graph  |
| 8  | tcpretrans | TCP retransmissions          | table                     |
| 9  | runqlat    | CPU scheduler latency        | heat map                  |
| 10 | profile    | CPU stack trace samples      | flame graph               |
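Tying sections 2 and 4 together, below is a minimal sketch of how an agent might run one of these tools for a bounded duration and collect its output. The helper names and the use of timeout(1) with SIGINT are my own assumptions for illustration; the tool invocations (execsnoop-bpfcc -T and bpftrace -f json execsnoop.bt) are the ones shown earlier.

    #!/usr/bin/env python3
    # Illustrative agent-side wrapper (assumed names, not a real product API):
    # run a BPF tool for a bounded duration and collect its output.
    # Requires root and the bcc/bpftrace packages installed.
    import json
    import subprocess

    def run_for(cmd, duration_secs=60):
        """Run a bcc tool for duration_secs and return its text output.
        execsnoop(8) has no duration option yet, so bound it externally;
        SIGINT lets the tool exit cleanly and flush its output."""
        out = subprocess.run(["timeout", "-s", "SIGINT", str(duration_secs)] + cmd,
                             capture_output=True, text=True)
        return out.stdout

    def stream_bpftrace_json(script):
        """Run a bpftrace tool in json output mode and yield one parsed
        event per line, for the caller to aggregate or ship elsewhere."""
        proc = subprocess.Popen(["bpftrace", "-f", "json", script],
                                stdout=subprocess.PIPE, text=True)
        try:
            for line in proc.stdout:
                yield json.loads(line)
        finally:
            proc.terminate()

    if __name__ == "__main__":
        # Capture 10 seconds of process execution, then hand the text to
        # whatever transport the monitoring agent already uses.
        print(run_for(["execsnoop-bpfcc", "-T"], duration_secs=10))

In practice the captured text or the parsed json events would be forwarded to whatever aggregation path the product already has (S3, Hive, Druid, etc.), as discussed in section 2.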
This dashboard is based on my [bcc Tutorial], and many of the tools also exist in bpftrace. I chose these to find the most performance wins with the fewest tools. Note that runqlat and profile can have noticeable overheads, so I'd run those tools for between 10 and 60 seconds only and generate a report. Some are low enough overhead to be run 24x7 if desired (e.g., execsnoop, biolatency, tcplife, tcpretrans).

There is already documentation as man pages and example files in the bcc and bpftrace repositories that you can link to, to help your customers understand the tool output. E.g., here are the execsnoop(8) example files in bcc and bpftrace. Once you have this all working, you have version 1!

## bcc vs bpftrace

The bcc tools are the easiest to use, as they usually have many command-line options. The bpftrace tools are easier to edit and customize, and bpftrace has a json output mode. If you're completely new to tracing, go with bcc. If you want to do some hacking and customizing of the tools, go with bpftrace. In the end, they are both good options.

## Case study: Netflix

Netflix is building a new GUI that does this tool dashboard and more, based on the bpftrace versions of these tools. The architecture is as follows: while the bpftrace binary is installed on all the target systems, the bpftrace tools (text files) live on a web server and are pushed out when needed. This means we can ensure we're always running the latest version of the tools by updating them in one place. This is currently part of our FlameCommander UI, which also runs flame graphs across the cloud. Our previous BPF GUI was part of [Vector], and used bcc, but we've since deprecated that. We'll likely open source the new one at some point and have a post about it on the Netflix tech blog.

## Case study: Facebook

Facebook are advanced users of BPF, but deep details of how they run the tools fleet-wide aren't fully public. Based on the activity in bcc, and their development of the BTF and CO-RE technologies, I'd strongly suspect their solution is based on the bcc libbpf-tools versions.

## Porting Pitfalls

BPF tracing tools are like application and kernel patches: they need constant updates to keep working across different software versions. Porting them to a different language and then not maintaining them may be like trying to apply a Linux 4.15 patch to Linux 5.12. If you're lucky, it blows up! If you're unlucky, the patch applies but corrupts some things in a subtle way that you don't notice until later. It depends on the tool.

As an extreme example, I wrote cachestat(8) while on vacation in 2014 for use on the Netflix cloud, which was a mix of Linux 3.2 and 3.13 at the time. BPF didn't exist on those versions, so I used basic Ftrace capabilities that were available on Linux 3.2. I described this approach as [brittle] and a [sandcastle] that would need maintenance as the kernel changed. It was later ported to BPF with kprobes, and has now been rewritten and included in commercial observability products. Unsurprisingly, I've heard it has problems on newer kernels, printing output that doesn't make sense. It really needs an overhaul. When I (or someone else) does that, anyone pulling updates from bcc will automatically get the fixed version, with no effort. Those that have rewritten it will need to rewrite theirs. I fear they won't, and customers will be running a broken version of cachestat(8) for years. Note that if BPF had been available on my target environment when I wrote cachestat(8), I would have coded it completely differently.
People are porting something written for Linux 3.2 and running it on Linux 5.x. In a previous blog post, [An Unbelievable Demo], I talked about how something similar happened many years ago, where old tracing tool versions were used without updates.

The problems I'm describing are specific to BPF software and kernel tracing. As a different example, my flame graph software has been rewritten over a dozen times, and since it's a simple and finished algorithm I don't see a big problem with that. I prefer people help with the newer [d3 version], but if people do their own it's no big deal. You can code it and it'll work forever. That's not the case with uprobe- and kprobe-based BPF tools, because they do need maintenance.

## Think like a sysadmin, not like a programmer

In summary, start by checking if there's already a BPF agent for your monitoring systems, and if not, build one based on the existing [bcc] or [bpftrace] tools rather than rewriting everything from scratch. This is thinking like a sysadmin who installs and maintains software, not like a programmer who codes everything. Install the bcc or bpftrace tools, add them to your observability product, and pull package updates as needed. That will be a quick and useful version 1. BPF up and running!

I see people think like a programmer instead and feel they must start by learning bcc and BPF programming in depth. Then, having discovered everything is C or Python, some rewrite it all in a different language.

First, learning bcc and BPF well takes weeks; learning the subtleties and pitfalls of system tracing can take months or years. To give you a taste of what you're in for, check out my [BPF Internals] talk. If you really want to do this and have the time, you certainly can (you'll probably wind up at tracing conferences and bumping into me: see you at Linux Plumber's or the Tracing Summit!). But if you're under some deadline to add BPF observability, try thinking like a sysadmin instead and just build upon the existing tools. That's the fast way. Think like a programmer later, if or when you have the time.

Second, the BPF software, especially certain kprobe-based tools, requires ongoing maintenance. A tool may work on Linux 5.3 but break on 5.4, as a traced function was renamed or a new code path was added. The BPF libraries and frameworks are also changing and evolving, most recently with the BTF and CO-RE support. This is something I hope people consider before choosing to rewrite them: do you have a plan to rewrite all the updates as well, or will you end up stuck on an old port of the library? It's easier to pull updates of everything than to maintain your own versions.

Finally, what if you have a great idea for a _better_ BPF library or framework than what we're using in bcc and bpftrace? Talk to us, try it out, innovate. We're at the start of the BPF era and there's lots more to explore. But please understand what exists first, and the maintenance burden you are taking on. Your energies may be better spent creating something new, on top of what exists, than porting something old.
[bcc]: https://github.com/iovisor/bcc
[bpftrace]: https://github.com/iovisor/bpftrace
[book]: /bpf-performance-tools-book.html
[choosing]: /blog/2015-07-08/choosing-a-linux-tracer.html
[An Unbelievable Demo]: /blog/2021-06-04/an-unbelievable-demo.html
[d3 version]: https://github.com/spiermar/d3-flame-graph
[bcc Tutorial]: https://github.com/iovisor/bcc/blob/master/docs/tutorial.md
[brittle]: /blog/2014-12-31/linux-page-cache-hit-ratio.html
[sandcastle]: https://github.com/brendangregg/perf-tools/blob/master/fs/cachestat
[BTF and CO-RE]: /blog/2020-11-04/bpf-co-re-btf-libbpf.html
[Vector]: https://github.com/Netflix/vector
[eBPF]: https://ebpf.io/
[BPF Internals]: /blog/2021-06-15/bpf-internals.html

[$] The first half of the 5.14 merge window

4 years 1 month ago
As of this writing, just under 5,000 non-merge changesets have been pulled into the mainline repository for the 5.14 development cycle. That is less than half of the patches that have been queued up in linux-next, so it is fair to say that this merge window is getting off to a bit of a slow start. Nonetheless, a fair number of significant changes have been merged.
corbet

Security updates for Friday

4 years 1 month ago
Security updates have been issued by Fedora (ansible and seamonkey), openSUSE (go1.15 and opera), Oracle (kernel and microcode_ctl), and Red Hat (go-toolset-1.15 and go-toolset-1.15-golang).
jake

Kuhn: It Matters Who Owns Your Copylefted Copyrights

4 years 1 month ago
Bradley Kuhn has posted a lengthy missive on the Software Freedom Conservancy blog about the hazards of distributed copyright ownership.

As a result, in debates about copyright ownership, discussions of what policy contributors want regarding the fruits of their labor are sadly moot. Without a clear, organized mitigation strategy to assure that FOSS contributors keep their own copyrights, a project (such as GCC or glibc) that switches from a standing “(nearly) all copyrights assigned to a charity” model to a plain Developer Certificate of Origin (DCO) or naked inbound=outbound contributor arrangement will, after a period of years, most likely have copyrights that are primarily held by the employers of the most prolific contributors, rather than by the contributors themselves.

corbet

[$] Core scheduling lands in 5.14

4 years 1 month ago
The core scheduling feature has been under discussion for over three years. For those who need it, the wait is over at last; core scheduling was merged for the 5.14 kernel release. Now that this work has reached a (presumably) final form, a look at why this feature makes sense and how it works is warranted. Core scheduling is not for everybody, but it may prove to be quite useful for some user communities.
corbet

Security updates for Thursday

4 years 1 month ago
Security updates have been issued by Debian (htmldoc, ipmitool, and node-bl), Fedora (libgcrypt and libtpms), Mageia (dhcp, glibc, p7zip, sqlite3, systemd, and thunar), openSUSE (arpwatch, go1.15, and kernel), SUSE (curl, dbus-1, go1.15, and qemu), and Ubuntu (xorg-server).
jake

[$] Mozilla Rally: trading privacy for the "public good"

4 years 1 month ago
A new project from Mozilla, which is meant to help researchers collect browsing data, but only with the informed consent of the browser-user, is taking a lot of heat, perhaps in part because the company can never seem to do anything right, at least in the eyes of some. Mozilla Rally was announced on June 25 as a joint venture between the company and researchers at Princeton University "to enable crowdsourced science for public good". The idea is that users can volunteer to give academic studies access to the same kinds of browser data that is being tracked in some browsers today. Whether the privacy safeguards are strong enough—and if there is sufficient reason for users to sign up—remains to be seen.
jake

Linux Plumbers Conference: Real-time Microconference Accepted into 2021 Linux Plumbers Conference

4 years 1 month ago

We are pleased to announce that the Real-time Microconference has been accepted into the 2021 Linux Plumbers Conference. Since 2004, the project that has become known as PREEMPT_RT, formerly the real-time patch, has improved the real-time and low-latency features of the Linux kernel. Over the past decade, many parts of PREEMPT_RT have been included into the official Linux codebase. Examples include: mutexes, high-resolution timers, lockdep, ftrace, RT scheduling, SCHED_DEADLINE, RCU_PREEMPT, generic interrupts, priority inheritance futexes, threaded interrupt handlers, and more. The number of patches that need integration has been significantly reduced, and the rest are mature enough to make their way into mainline Linux.

The following accomplishments have been made as a result of last year’s microconference:

This year’s topics to be discussed include:

  • New tools for PREEMPT_RT analysis.
  • How do we teach the rest of the kernel developers how not to break PREEMPT_RT?
  • Stable maintainers tools discussion & improvements.
  • The usage of PREEMPT_RT on safety-critical systems: what do we need to do?
  • Making NAPI and the kernel-rt work better together.
  • Migrate disable and the problems that it causes for rt tasks.
  • It is time to discuss the “BKL”-like style of our preempt/bh/irq_disable() synchronization functions.
  • How do we close the documentation gap?
  • The status of the merge, and how we can resolve the last issues that block it.
  • Invite the developers of the areas where patches are still under discussion to help to find an agreement.
  • How can we improve the testing of -rt to keep up with problems as Linus's tree advances?
  • What’s next?

Come and join us in the discussion of controlling what tasks get to run on your machine and when.

We hope to see you there.

Security updates for Wednesday

4 years 1 month ago
Security updates have been issued by Debian (fluidsynth), Fedora (libgcrypt and tpm2-tools), Mageia (nettle, nginx, openvpn, and re2c), openSUSE (kernel, roundcubemail, and tor), Oracle (edk2, lz4, and rpm), Red Hat (389-ds:1.4, edk2, fwupd, kernel, kernel-rt, libxml2, lz4, python38:3.8 and python38-devel:3.8, rpm, ruby:2.5, ruby:2.6, and ruby:2.7), and SUSE (kernel and lua53).
ris

An EPYC escape: Case-study of a KVM breakout (Project Zero blog)

4 years 1 month ago
Over at the Project Zero blog, Felix Wilhelm posted a lengthy account of a vulnerability he found in the Linux kernel's KVM (Kernel-based virtual machine) subsystem: In this blog post I describe a vulnerability in KVM’s AMD-specific code and discuss how this bug can be turned into a full virtual machine escape. To the best of my knowledge, this is the first public writeup of a KVM guest-to-host breakout that does not rely on bugs in user space components such as QEMU. The discussed bug was assigned CVE-2021-29657, affects kernel versions v5.10-rc1 to v5.12-rc6 and was patched at the end of March 2021. As the bug only became exploitable in v5.10 and was discovered roughly 5 months later, most real world deployments of KVM should not be affected. I still think the issue is an interesting case study in the work required to build a stable guest-to-host escape against KVM and hope that this writeup can strengthen the case that hypervisor compromises are not only theoretical issues.
jake

[$] An unpleasant surprise for My Book Live owners

4 years 1 month ago
Embedded devices need regular software updates in order to even be minimally safe on today's internet. Products that have reached their "end of life", thus are no longer being updated, are essentially ticking time bombs—it is only a matter of time before they are vulnerable to attack. That situation played out in June for owners of Western Digital (WD) My Book Live network-attached storage (NAS) devices; what was meant to be a disk for home users accessible via the internet turned into a black hole when a remote command-execution flaw was used to delete all of the data stored there. Or so it seemed at first.
jake

Security updates for Tuesday

4 years 1 month ago
Security updates have been issued by Debian (klibc and libjdom2-java), Mageia (bash, glibc, gnutls, java-openjdk, kernel, kernel-linus, leptonica, libgcrypt, openjpeg2, tor, and trousers), openSUSE (bouncycastle, chromium, go1.16, and kernel), Oracle (docker-engine docker-cli and qemu), Red Hat (kpatch-patch), and SUSE (arpwatch, go1.16, kernel, libsolv, microcode_ctl, and python-urllib3, python-requests).
ris

The first ever KernelCI hackfest

4 years 1 month ago
The KernelCI continuous-integration project held its first hackfest recently. Developers from the KernelCI team, Google, and Collabora worked to improve many different aspects of KernelCI's testing capabilities. There are plans for more hackfests. The first-ever KernelCI hackfest was a success. It kicked off the work to enable kernel testing through Chromium OS, a product-specific userspace. Enabling full userspace images and real-world tests like video call simulations adds a lot of complexity to the testing process. However, the benefits are a clear win for the community. They allow more thorough kernel testing and validation through real application use cases, which can exercise several different kernel areas at the same time in an organized manner. Generally, it is not simple for lower-level kernel test suites like kselftests or LTP to orchestrate a similar use case.
ris

[$] Some 5.13 development statistics

4 years 1 month ago
As expected, the 5.13 development cycle turned out to be a busy one, with 16,030 non-merge changesets being pulled into the mainline over a period of nine weeks. The 5.13 release happened on June 27, meaning that it must be time for our traditional look at the provenance of the code that was merged for this kernel.
corbet

Security updates for Monday

4 years 1 month ago
Security updates have been issued by Debian (bluez, intel-microcode, tiff, and xmlbeans), Fedora (openssh and php-phpmailer6), openSUSE (freeradius-server, java-1_8_0-openjdk, live555, openexr, roundcubemail, tor, and tpm2.0-tools), SUSE (bouncycastle and zziplib), and Ubuntu (linux-kvm and thunderbird).
ris

The 5.13 kernel has been released

4 years 1 month ago
Linus has released the 5.13 kernel.

Of course, if the last week was small and calm, 5.13 overall is actually fairly large. In fact, it's one of the bigger 5.x releases, with over 16k commits (over 17k if you count merges), from over 2k developers. But it's a 'big all over' kind of thing, not something particular that stands out as particularly unusual.

Headline features in this release include the "misc" group controller, multiple sources for trusted keys, kernel stack randomization on every system call, support for Clang control-flow integrity enforcement, the ability to call kernel functions directly from BPF programs, minor-fault handling for userfaultfd(), the removal of /dev/kmem, the Landlock security module, and, of course, thousands of cleanups and fixes.

corbet