Hírolvasó

Security updates for Thursday

3 years, 11 months ago
Security updates have been issued by Debian (sssd), Fedora (libtpms and vim), openSUSE (kernel and php7-pear), Oracle (kernel), Slackware (curl), and Ubuntu (libgcrypt20 and squashfs-tools).
jake

[$] Revisiting NaNs in Python

3 years, 11 months ago
Back in January 2020, we looked at some oddities in Python's handling of Not a Number (NaN) values in its statistics module. The conversation went quiet after that, but it has been revived recently with an eye toward fixing the problems that were reported. As detailed in that earlier article, NaNs are rather strange beasts in the floating-point universe, so figuring out how best to deal with their presence is less straightforward than it might seem.
jake

Security updates for Wednesday

3 years, 11 months ago
Security updates have been issued by Arch Linux (chromium, element-desktop, element-web, firefox, ghostscript, and hedgedoc), Fedora (kernel and openssl), openSUSE (ghostscript, htmldoc, and openssl-1_0_0), Oracle (libtirpc), Red Hat (cyrus-imapd, kernel, and kernel-rt), SUSE (ghostscript), and Ubuntu (apport, curl, and squashfs-tools).
ris

[$] Roundup: managing issues for 20 years

3 years, 11 months ago
The Roundup Issue Tracker is a flexible tool for managing issues via the web or email. However, Roundup is useful for more than web-based bug tracking or help-desk ticketing; it can be used as a simple wiki or to manage tasks with the Getting Things Done (GTD) methodology. The 20th-anniversary edition of Roundup, version 2.1.0, was released in July; it is a maintenance release, but there have been a number of larger improvements in the last year or so. Here we introduce Roundup's features along with the recent developments that have helped make Roundup even more useful for tracking issues to their resolution.
jake

Pete Zaitcev: Scalability of a varying degree

3 years, 11 months ago

Seen at the official site of Qumulo:

Scale

Platforms must be able to serve petabytes of data, billions of files, millions of operations, and thousands of users.

Thousands of users...? Isn't that a little low? Typical Swift clusters at telcos have tens of millions of users, of whom tens or hundreds of thousands are active simultaneously.

Google's Chubby paper has a little section on the scalability problems of talking to a cluster over TCP/IP. Basically, at the low tens of thousands of clients you start to have serious issues with kernel sockets and TIME_WAIT. So maybe that explains it.

Security updates for Tuesday

3 years, 11 months ago
Security updates have been issued by openSUSE (libaom and nextcloud), Oracle (cyrus-imapd, firefox, and thunderbird), Red Hat (kernel and kpatch-patch), Scientific Linux (firefox and thunderbird), and Ubuntu (apport).
ris

Paul E. McKenney: Stupid RCU Tricks: Making Race Conditions More Probable

3 years, 11 months ago
Given that it is much more comfortable chasing down race conditions reported by rcutorture than those reported from the field, it would be good to make race conditions more probable during rcutorture runs than in production. A number of tricks are used to make this happen, including making rare events (such as CPU-hotplug operations) happen more frequently, testing the in-kernel RCU API directly from within the kernel, and so on.

Another approach is to change timing. Back at Sequent in the 1990s, one way that this was accomplished was by plugging different-speed CPUs into the same system and then testing on that system. It was observed that for certain types of race conditions, the probability of the race occurring increased by the ratio of the CPU speeds. One such race condition is when a timed event on the slow CPU races with a workload-driven event on the fast CPU. If the fast CPU is (say) two times faster than the slow CPU, then the timed event will provide two times greater “collision cross section” than if the same workload were running on CPUs of the same speed.

Given that modern CPUs can easily adjust their core clock rates at runtime, it is tempting to try this same trick on present-day systems. Unfortunately, everything and its dog is adjusting CPU clock rates for various purposes, plus a number of modern CPUs are quite happy to let you set their core clock rates to a value sufficient to result in physical damage. Throwing rcutorture into this fray might be entertaining, but it is unlikely to be all that productive.

Another approach is to make use of memory latency. The idea is for the rcutorture scripting to place one pair of a given scenario's vCPUs in the hyperthreads of a single core and to place another pair of that same scenario's vCPUs in the hyperthreads of a different single core, and preferably a core on some other socket. The theory is that the different communications latencies and bandwidths within a core on the one hand and between cores (or, better yet, between sockets) on the other should have roughly the same effect as does varying CPU core clock rates.

OK, theory is all well and good, but what happens in practice?

As it turns out, on dual-socket systems, quite a bit.

With this small change to the rcutorture scripting, RCU Tasks Trace suddenly started triggering assertions. These test failures led to no fewer than 12 fixes, perhaps most notably surrounding proper handling of the count of tasks from which quiescent states are needed. This caused me to undertake a full review of RCU Tasks Trace, greatly assisted by Boqun Feng, Frederic Weisbecker, and Neeraj Upadhyay, with Neeraj providing half of the fixes. There is likely to be another fix or three, but then again isn't that always the case?

More puzzling were the 2,199.0-second RCU CPU stall warnings (described in more detail here). These were puzzling for a number of reasons:

  1. The RCU CPU stall warning timeout is set to only 21 seconds.
  2. There was absolutely no console output during the full stall duration.
  3. The stall duration was never 2,199.1 seconds and never 2,198.9 seconds, but always exactly 2,199.0 seconds, give or take a (very) few tens of milliseconds. (Kudos to Willy Tarreau for pointing out offlist that 2,199.02 seconds is almost exactly 2 to the 41st power worth of nanoseconds. Coincidence? You decide!) The arithmetic is spelled out just after this list.
  4. The stalled CPU usually took only a handful of scheduling-clock interrupts during the stall, but would sometimes take them at a rate of 100,000 per second, which seemed just a bit excessive for a kernel built with HZ=1000.
  5. At the end of the stall, the kernel happily continued, usually with no other complaints.
These stall warnings appeared most frequently when running rcutorture's TREE04 scenario.
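
For the record, the arithmetic behind Willy Tarreau's observation is simple enough to spell out:

        2^41 ns = 2,199,023,255,552 ns ≈ 2,199.02 s

which matches the observed stall duration to within a few tens of milliseconds.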

But perhaps this is not a stall, but instead a case of time jumping forward. This might explain the precision of the stall duration, and would definitely explain the lack of intervening console output, the lack of other complaints, and the kernel's being happy to continue at the end of the stall. Not so much the occasional extreme rate of scheduling-clock interrupts, but perhaps that is a separate problem.

However, running large numbers (as in 200) of concurrent shorter one-hour TREE04 runs often resulted in the run terminating (forcibly) in the middle of the stall. Now this might be due to the host's and the guests' clocks all jumping forward at the same time, except that different guests stalled at different times, and even when running TREE04, most guests didn't stall at all. Therefore, the stalls really did stall, and for a very long time.

But then it should be possible to work out what the CPUs were doing in the meantime. One approach would be to use tracing, but previous experience with massive volumes of trace messages (and thus lost trace messages) suggested a more surgical approach. Furthermore, the last console message before the stall was always of the form “kvm-clock: cpu 3, msr d4a80c1, secondary cpu clock” and the first console message after the stall was always of the form “kvm-guest: stealtime: cpu 3, msr 1f597140”. These are widely separated and are often printed from different CPUs, which also suggests a more surgical approach. This situation also implicates CPU hotplug, but this is not at all unusual.

The first attempt at exploratory surgery used the jiffies counter to check for segments of code taking more than 100 seconds to complete. Unfortunately, these checks never triggered, even in runs having stall warnings. So maybe the jiffies counter is not being updated. It is easy enough to switch to ktime_get_mono_fast_ns(), right? Except that this did not trigger, either.
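
For the curious, here is a minimal sketch of the sort of check described above; the names are purely illustrative and are not the actual rcutorture diagnostics:

/* Hypothetical debug check: complain if this code segment appears to take more than 100 seconds. */
unsigned long debug_start = jiffies;

do_suspect_work();      /* illustrative stand-in for the code under suspicion */

WARN_ONCE(time_after(jiffies, debug_start + 100 * HZ),
          "suspiciously long delay: %lu jiffies\n", jiffies - debug_start);

/* The ktime_get_mono_fast_ns() variant simply records nanoseconds from that clock instead of jiffies. */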

Maybe there is a long-running interrupt handler? Mark Rutland recently posted a patchset to detect exactly that, so I applied it. But it did not trigger.

I switched to ktime_get() in order to do cross-CPU time comparisons, and out of sheer paranoia added checks for time going backwards. And these backwards-time checks really did trigger just before the stall warnings appeared, once again demonstrating the concurrent-programming value of a healthy level of paranoia, and also explaining why many of my earlier checks were not triggering. Time moved forward, and then jumped backwards, making it appear that no time had passed. (Time did jump forward again, but that happened after the last of my debug code had executed.)
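
A minimal sketch of such a backwards-time check, again with illustrative names only (the real checks also compared timestamps taken on different CPUs):

static u64 debug_prev_ns;       /* hypothetical: last timestamp seen by this check */

static void debug_check_time_monotonic(void)
{
        u64 now_ns = ktime_to_ns(ktime_get());

        WARN_ONCE(now_ns < debug_prev_ns,
                  "time went backwards by %llu ns\n", debug_prev_ns - now_ns);
        debug_prev_ns = now_ns;
}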

Adding yet more checks showed that the temporal issues were occurring within stop_machine_from_inactive_cpu(). This invocation takes the mtrr_rendezvous_handler() function as an argument, and it really does take 2,199.0 seconds (that is, about 36 minutes) from the time that stop_machine_from_inactive_cpu() is called until the time that mtrr_rendezvous_handler() is called. But only sometimes.

Further testing confirmed that increasing the frequency of CPU-hotplug operations increased the frequency of 2,199.0-second stall warnings.

An extended stint of code inspection suggested further diagnostics, which showed that one of the CPUs would be stuck in the multi_cpu_stop() state machine. The stuck CPU was never CPU 0 and was never the incoming CPU. Further tests showed that the scheduler always thought that all of the CPUs, including the stuck CPU, were in the TASK_RUNNING state. Even more instrumentation showed that the stuck CPU was failing to advance to state 2 (MULTI_STOP_DISABLE_IRQ), meaning that all of the other CPUs were spinning in a reasonably tight loop with interrupts disabled. This could of course explain the lack of console messages, at least from the non-stuck CPUs.
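
To make the state-machine discussion concrete, here is a much-simplified sketch of the multi_cpu_stop() logic from kernel/stop_machine.c; error handling and watchdog touches are elided, and ack_state() and is_active_cpu() stand in for the real bookkeeping:

/* Much-simplified sketch; see kernel/stop_machine.c for the real thing. */
enum multi_stop_state {
        MULTI_STOP_NONE,
        MULTI_STOP_PREPARE,
        MULTI_STOP_DISABLE_IRQ,         /* "state 2": everyone disables interrupts */
        MULTI_STOP_RUN,                 /* the active CPU invokes the callback */
        MULTI_STOP_EXIT,
};

struct multi_stop_data {
        cpu_stop_fn_t           fn;     /* e.g. mtrr_rendezvous_handler() */
        void                    *data;
        enum multi_stop_state   state;  /* shared; advanced as CPUs acknowledge it */
};

static int multi_cpu_stop(void *data)
{
        struct multi_stop_data *msdata = data;
        enum multi_stop_state newstate, curstate = MULTI_STOP_NONE;

        do {
                /* Every participating CPU spins here, polling for the next state. */
                cpu_relax();
                newstate = READ_ONCE(msdata->state);
                if (newstate != curstate) {
                        curstate = newstate;
                        switch (curstate) {
                        case MULTI_STOP_DISABLE_IRQ:
                                local_irq_disable();
                                break;
                        case MULTI_STOP_RUN:
                                if (is_active_cpu(msdata))
                                        msdata->fn(msdata->data);
                                break;
                        default:
                                break;
                        }
                        ack_state(msdata);      /* the last CPU to ack advances the shared state */
                }
        } while (curstate != MULTI_STOP_EXIT);

        local_irq_enable();
        return 0;
}

One CPU failing to observe and acknowledge a state transition therefore leaves every other CPU spinning in this loop with interrupts off, which is exactly the silent system described above.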

Might qemu and KVM be to blame? A quick check of the code revealed that vCPUs are preserved across CPU-hotplug events, that is, taking a CPU offline does not cause qemu to terminate the corresponding user-level thread. Furthermore, the distribution of stuck CPUs was uniform across the CPUs other than CPU 0. The next step was to find out where CPUs were getting stuck within the multi_cpu_stop() state machine. The answer was “at random places”. Further testing also showed that the identity of the CPU orchestrating the onlining of the incoming CPU had nothing to do with the problem.

Now TREE04 marks all but CPU 0 as nohz_full CPUs, meaning that they disable their scheduling-clock interrupts when running in userspace with only one runnable task. Maybe the CPUs need to manually enable their scheduling-clock interrupt when starting multi_cpu_stop()? This did not fix the problem, but it did manage to shorten some of the stalls, in a few cases to less than ten minutes.

The next trick was to send an IPI to the stalled CPU every 100 seconds during multi_cpu_stop() execution. To my surprise, this IPI was handled by the stuck CPU, although with surprisingly long delays ranging from just a bit less than one millisecond to more than eight milliseconds.
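
A minimal sketch of such a periodic poke, with illustrative names only (not the actual diagnostic code):

static void debug_poke_handler(void *info)
{
        *(u64 *)info = ktime_get_mono_fast_ns();        /* when the IPI was actually handled */
}

static void debug_poke_cpu(int cpu)
{
        u64 handled = 0;
        u64 sent = ktime_get_mono_fast_ns();

        /* wait=1: return only once the handler has run on the target CPU. */
        smp_call_function_single(cpu, debug_poke_handler, &handled, 1);
        pr_info("IPI to CPU %d handled after %llu ns\n", cpu, handled - sent);
}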

This suggested that the stuck CPUs might be suffering from an interrupt storm, so that the IPI had to wait for its turn among a great many other interrupts. Further testing therefore sent an NMI backtrace at 100 seconds into multi_cpu_stop() execution. The resulting stack traces showed that the stuck CPU was always executing within sysvec_apic_timer_interrupt() or some function that it calls. Further checking showed that the stuck CPU was in fact suffering from an interrupt storm, namely an interrupt storm of scheduling-clock interrupts. This spurred another code-inspection session.

Subsequent testing showed that the interrupt duration was about 3.5 microseconds, which corresponded to about one third of the stuck CPU's time. It appears that the other two-thirds is consumed repeatedly entering and exiting the interrupt.
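
Assuming the roughly 100,000 scheduling-clock interrupts per second observed earlier, the fractions work out: 100,000/s × 3.5 µs ≈ 0.35 s of handler execution per second of wall-clock time, or about one third of the CPU, leaving the remaining two thirds for interrupt entry and exit.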

The retriggering of the scheduling-clock interrupt does have some potential error conditions, including setting times in the past and various overflow possibilities. Unfortunately, further diagnostics showed that none of this was happening. However, they also showed that the code was trying to schedule the next interrupt at time KTIME_MAX, so that an immediate relative-time-zero interrupt is a rather surprising result.

So maybe this confusion occurs only when multi_cpu_stop() preempts some timekeeping activity. Now TREE04 builds its kernels with CONFIG_PREEMPT=n, but maybe there is an unfortunately placed call to schedule() or some such. Except that further code inspection found no such possibility. Furthermore, another test run that dumped the previous task running on each CPU showed nothing suspicious (aside from rcutorture, which some might argue is always suspicious).

And further debugging showed that tick_program_event() thought that it was asking for the scheduling-clock interrupt to be turned off completely. This seemed like a good time to check with the experts, and Frederic Weisbecker, noting that all of the action was happening within multi_cpu_stop() and its called functions, ran the following command to enlist ftrace, while also limiting its output to something that the console might reasonably keep up with:

./kvm.sh --configs "18*TREE04" --allcpus --bootargs "ftrace=function_graph ftrace_graph_filter=multi_cpu_stop" --kconfig "CONFIG_FUNCTION_TRACER=y CONFIG_FUNCTION_GRAPH_TRACER=y"
This showed that there was no hrtimer pending (consistent with KTIME_MAX), and that the timer was nevertheless being set to fire immediately. Frederic then proposed the following small patch:

--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -595,7 +595,8 @@ void irq_enter_rcu(void)
 {
         __irq_enter_raw();
 
-        if (is_idle_task(current) && (irq_count() == HARDIRQ_OFFSET))
+        if (tick_nohz_full_cpu(smp_processor_id()) ||
+            (is_idle_task(current) && (irq_count() == HARDIRQ_OFFSET)))
                 tick_irq_enter();
 
         account_hardirq_enter(current);
This forces the jiffies counter to be recomputed upon interrupt from nohz_full CPUs in addition to idle CPUs, which avoids the timekeeping confusion that caused KTIME_MAX to be interpreted as zero.

And a 20-hour run for each of 200 instances of TREE04 was free of RCU CPU stall warnings! (This represents 4,000 hours of testing consuming 32,000 CPU-hours.)
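
Those numbers are internally consistent, assuming TREE04's eight vCPUs per instance: 200 instances × 20 hours = 4,000 machine-hours, and 4,000 × 8 vCPUs = 32,000 CPU-hours.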

This was an example of that rare form of deadlock, a temporary deadlock. The stuck CPU was stuck because timekeeping wasn't happening. Timekeeping wasn't happening because all the timekeeping CPUs were spinning in multi_cpu_stop() with interrupts disabled. The other CPUs could not exit their spinloops (and thus could not update timekeeping information) because the stuck CPU did not advance through the multi_cpu_stop() state machine.

So what caused this situation to be temporary? I must confess that I have not dug into it (nor do I intend to), but my guess is that integer overflow resulted in KTIME_MAX once again taking on its proper role, thus ending the stuck CPU's interrupt storm and in turn allowing the multi_cpu_stop() state machine to advance.

Nevertheless, this completely explains the mystery. Assuming integer overflow, the extremely repeatable stall durations make perfect sense. The RCU CPU stall warning did not happen at the expected 21 seconds because all the CPUs were either spinning with interrupts disabled on the one hand or being interrupt stormed on the other. The interrupt-stormed CPU did not report the RCU CPU stall because the jiffies counter wasn't incrementing. A random CPU would report the stall, depending on which one took the first scheduling-clock tick after time jumped backwards (again, presumably due to integer overflow) and back forwards. In the relatively rare case where this CPU was the stuck CPU, it reported an amazing number of scheduling-clock ticks; otherwise it reported very few. Since everything was stuck, it is only a little surprising that the kernel continued blithely on after the stall ended. TREE04 reproduced the problem best because it had the largest proportion of nohz_full CPUs.

All in all, this experience was a powerful (if sometimes a bit painful) demonstration of the ability of controlled memory latencies to flush out rare race conditions!

A disagreement over the PostgreSQL trademark

3 years, 11 months ago
This statement on PostgreSQL.org describes an ongoing disagreement over the PostgreSQL trademark:

In 2020, the PostgreSQL Core Team was made aware that an organization had filed applications to register the 'PostgreSQL' and 'PostgreSQL Community' trademarks in the European Union and the United States, and had already registered trademarks in Spain. The organization, a 3rd party not-for-profit corporation in Spain called 'Fundación PostgreSQL,' did not give any indication to the PostgreSQL Core Team or PGCAC that they would file these applications.

corbet

[$] The rest of the 5.15 merge window

3 years, 11 months ago
Linus Torvalds released 5.15-rc1 and closed the merge window for this release on September 12; at that point, 10,471 non-merge changesets had found their way into the mainline repository. Those changesets contain a lot of significant changes and improvements. Read on for a summary of what came into the mainline in the roughly 7,000 changesets pulled since our first-half summary was written.
corbet

Security updates for Monday

3 years, 11 months ago
Security updates have been issued by Debian (qemu and thunderbird), Fedora (chromium, firefox, and mosquitto), openSUSE (apache2-mod_auth_openidc, gifsicle, openssl-1_1, php7-pear, and wireshark), Oracle (oswatcher), Red Hat (cyrus-imapd, firefox, and thunderbird), SUSE (apache2-mod_auth_openidc, compat-openssl098, php7-pear, and wireshark), and Ubuntu (git and linux, linux-aws, linux-aws-hwe, linux-azure, linux-azure-4.15, linux-dell300x, linux-hwe, linux-kvm, linux-oracle, linux-snapdragon).
ris

GDB 11.1 released

3 years, 11 months ago
Version 11.1 of the GDB debugger is out. There are a number of new features, and somebody will surely be disappointed to see that support for debugging Arm Symbian programs has been removed.
corbet

Kernel prepatch 5.15-rc1

3 years, 11 months ago
Linus has released 5.15-rc1 and closed the merge window for this development cycle.

So 5.15 isn't shaping up to be a particularly large release, at least in number of commits. At only just over 10k non-merge commits, this is in fact the smallest rc1 we have had in the 5.x series. We're usually hovering in the 12-14k commit range.

That said, counting commits isn't necessarily the best measure, and that might be particularly true this time around. We have a few new subsystems, with NTFSv3 and ksmbd standing out.

corbet

SPDX Becomes Internationally Recognized Standard for Software Bill of Materials

3 years, 11 months ago
The Linux Foundation has announced that Software Package Data Exchange (SPDX) has become an international standard (ISO/IEC 5962:2021). SPDX has been used in the kernel and other projects to identify the licenses and attach other metadata to software components. Between eighty and ninety percent (80%-90%) of a modern application is assembled from open source software components. An SBOM [software bill of materials] accounts for the software components contained in an application — open source, proprietary, or third-party — and details their provenance, license, and security attributes. SBOMs are used as a part of a foundational practice to track and trace components across software supply chains. SBOMs also help to proactively identify software issues and risks and establish a starting point for their remediation.

SPDX results from ten years of collaboration from representatives across industries, including the leading Software Composition Analysis (SCA) vendors – making it the most robust, mature, and adopted SBOM standard.

jake

[$] The folio pull-request pushback

3 years, 11 months ago
When we last caught up with the page folio patch set, it appeared to be on track to be pulled into the mainline during the 5.15 merge window. Matthew Wilcox duly sent a pull request in August to make that happen. While it is possible that folios could still end up in 5.15, that has not happened as of this writing and appears increasingly unlikely. What we got instead was a lengthy discussion on the merits of the folio approach.
corbet

Security updates for Friday

3 years, 11 months ago
Security updates have been issued by Debian (firefox-esr, ghostscript, ntfs-3g, and postorius), Fedora (java-1.8.0-openjdk-aarch32, libtpms, and salt), openSUSE (libaom, libtpms, and openssl-1_0_0), Red Hat (openstack-neutron), SUSE (grilo, java-1_7_0-openjdk, libaom, libtpms, mariadb, openssl-1_0_0, openssl-1_1, and php74-pear), and Ubuntu (firefox and ghostscript).
corbet

By default, scp(1) now uses SFTP protocol

3 years, 11 months ago

Thanks to a commit by Damien Miller (djm@), scp(1) (in -current) now defaults to using the SFTP protocol:

CVSROOT:        /cvs
Module name:    src
Changes by:     djm@cvs.openbsd.org     2021/09/08 17:31:39

Modified files:
        usr.bin/ssh    : scp.1 scp.c

Log message:
Use the SFTP protocol by default. The original scp/rcp protocol remains
available via the -O flag. Note that ~user/ prefixed paths in SFTP mode
require a protocol extension that was first shipped in OpenSSH 8.7.

ok deraadt, after baking in snaps for a while without incident

As explained in the OpenSSH Release Notes,

SFTP offers more predictable filename handling and does not require expansion of glob(3) patterns via the shell on the remote side.

Cro: Maintain it With Zig

3 years, 11 months ago
This blog post by Loris Cro makes the claim that the Zig language is the solution to a lot of low-level programming problems:

Freeing the art of systems programming from the grips of C/C++ cruft is the only way to push for real change in our industry, but rewriting everything is not the answer. In the Zig project we’re making the C/C++ ecosystem more fun and productive. Today we have a compiler, a linker and a build system, and soon we’ll also have a package manager, making Zig a complete toolchain that can fetch dependencies and build C/C++/Zig projects from any target, for any target.

(LWN looked at Zig last year).

corbet