News reader

[$] Pyodide: Python for the browser

4 years 3 months ago
Python in the browser has long been an item on the wish list of many in the Python community. At this point, though, JavaScript has well-cemented its role as the language embedded into the web and its browsers. The Pyodide project provides a way to run Python in the browser by compiling the existing CPython interpreter to WebAssembly and running that binary within the browser's JavaScript environment. Pyodide came about as part of Mozilla's Iodide project, which has fallen by the wayside, but Pyodide is now being spun out as a community-driven project.
jake

Paul E. Mc Kenney: Stupid RCU Tricks: Which tests do I run???

4 years 3 months ago
The rcutorture test suite has quite a few options, including locktorture, rcuscale, refscale, and scftorture in addition to rcutorture itself. These tests can be run with the assistance of either KASAN or KCSAN. Given that RCU contains kernel modules, there is the occasional need for an allmodconfig build. Testing of kvfree_rcu() is currently a special case of rcuscale. Some care is required to adapt some of the tests to the test system, for example, based on the number of available CPUs. Both rcuscale and refscale have varying numbers of primitives that they test, so how is one to keep up with the inevitable additions and deletions? How much time should be devoted to each of locktorture, scftorture, and rcutorture, which, in contrast with rcuscale and refscale, do not have natural accuracy-driven durations? And finally, if you do run all of these things, you end up with about 100 gigabytes of test artifacts scattered across more than 50 date-stamped directories in tools/testing/selftests/rcutorture/res.

Back in the old days, I kept mental track of the -rcu tree and ran the tests appropriate to whatever was queued there. This strategy broke down in late 2020 due to family health issues (everyone is now fine, thank you!), resulting in a couple of embarrassing escapes. Some additional automation was clearly required.

This automation took the form of a new torture.sh script. This is not intended to be the main testing mechanism, but instead an overnight touch-test of the full rcutorture suite that is run occasionally, for example, just after accepting a large patch series or just before sending a pull request.

By default, torture.sh runs everything both with and without KASAN, and with a 10-minute “duration base”. The translation from “duration base” into wall-clock time is a bit indirect. The fewer CPUs you have, the more tests you run, and the longer it takes your system to build a kernel, the more wall-clock time that “10 minutes” will turn into. On my 16-hardware-thread laptop, running everything (including the non-default KCSAN runs) turns that 10-minute duration base into about 11 hours. Increasing the duration base by five minutes increases the total wall-clock time by about 100 minutes.
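Accepting those defaults, a full overnight run is simply torture.sh invoked with no arguments (the usual in-tree path is assumed):

    # Full overnight run with all defaults: everything with and
    # without KASAN, using the 10-minute duration base.
    tools/testing/selftests/rcutorture/bin/torture.sh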

This is therefore not a test to be integrated into a per-commit CI system. However, manually selecting specific tests for the most recent RCU-related commit is far easier than keeping the entire -rcu stack in one's head, and torture.sh assists with this by providing sets of --configs- and --do- parameters.

The --configs- parameters are as follows:

  1. --configs-rcutorture.
  2. --configs-locktorture.
  3. --configs-scftorture.
These arguments are passed to the --configs argument of kvm.sh for the --torture rcu, --torture lock, and --torture scf cases, respectively. By default, --configs CFLIST is passed. You may accumulate a long list via multiple --configs- arguments, or you can just as easily pass a long quoted list of scenarios through a single --configs- argument.
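For example, a hypothetical invocation narrowing each torture type to hand-picked scenarios (the scenario names here are purely illustrative):

    # Illustrative scenario selection; the names are examples only.
    tools/testing/selftests/rcutorture/bin/torture.sh \
        --configs-rcutorture "TREE03 TREE07" \
        --configs-locktorture LOCK01 \
        --configs-scftorture NOPREEMPT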

The --do- parameters are as follows:

  1. --do-all, which enables everything, including non-default options such as KCSAN.
  2. --do-allmodconfig, which does a single allmodconfig kernel build without running anything, and without either KASAN or KCSAN.
  3. --do-clocksourcewd, which does a short test of the clocksource watchdog, verifying that it can tell the difference between delay-based skew and clock-based skew.
  4. --do-kasan, which enables KASAN on everything except --do-allmodconfig.
  5. --do-kcsan, which enables KCSAN on everything except --do-allmodconfig.
  6. --do-kvfree, which runs a special rcuscale test of the kvfree_rcu() primitive.
  7. --do-locktorture, which enables a set of locktorture runs.
  8. --do-none, which disables everything. Yes, you can give a long series of --do-all and --do-none arguments if you really want to, but the usual approach is to follow --do-none with the lists of tests you want to enable, for example, --do-none --do-clocksourcewd will test only the clocksource watchdog, and do so in but a few minutes.
  9. --do-rcuscale, which enables rcuscale update-side performance tests, adapted to the number of CPUs on your system.
  10. --do-rcutorture, which enables rcutorture stress tests.
  11. --do-refscale, which enables refscale read-side performance tests, adapted to the number of CPUs on your system.
  12. --do-scftorture, which enables scftorture stress tests for smp_call_function() and friends, adapted to the number of CPUs on your system.
Each of these --do- parameters has a corresponding --do-no- parameter, with the exception of --do-all and --do-none, each of which is the other's --do-no- parameter. This allows all-but runs; for example, --do-all --do-no-rcutorture would run everything (even KCSAN), but none of the rcutorture runs.
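Written out as full command lines, the two usage patterns just mentioned look as follows:

    # Test only the clocksource watchdog, finishing in a few minutes:
    torture.sh --do-none --do-clocksourcewd

    # Run everything (even KCSAN) except for the rcutorture runs:
    torture.sh --do-all --do-no-rcutorture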

As of early 2021, KCSAN is still a bit picky about compiler versions, so the --kcsan-kmake-arg parameter allows you to specify arguments for the --kmake-arg argument to kvm.sh. For example, right now, I use --kcsan-kmake-arg "CC=clang-11".
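Combining this with the --do- parameters gives, for example, a KCSAN-only run built with a specific compiler (a sketch assembled from the options described above):

    # KCSAN-only run of torture.sh, building with clang-11:
    torture.sh --do-none --do-kcsan --kcsan-kmake-arg "CC=clang-11"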

As noted earlier, both rcuscale and refscale can have tests added and removed over time. The torture.sh script deals with this by grepping through the rcuscale.c and refscale.c source files, respectively, and running all of the tests that it finds.

The --duration argument specifies the duration base, which, as noted earlier, defaults to 10 minutes. This duration base is apportioned across the kvm.sh script's --duration parameter, with 70% for rcutorture, 10% for locktorture, and 20% for scftorture. So if you specify --duration 20 to torture.sh, the rcutorture kvm.sh runs will specify --duration 14, the locktorture kvm.sh runs will specify --duration 2, and the scftorture kvm.sh runs will specify --duration 4.
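The apportionment is simple percentage arithmetic; here is a minimal shell sketch of the split, not the actual torture.sh code:

    # Minimal sketch of the duration split (not actual torture.sh code).
    duration=20                                    # --duration, in minutes
    echo "rcutorture:  $(( duration * 70 / 100 ))" # 14 minutes
    echo "locktorture: $(( duration * 10 / 100 ))" #  2 minutes
    echo "scftorture:  $(( duration * 20 / 100 ))" #  4 minutes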

The 100GB full run is addressed at least partially by compressing KASAN vmlinux files, which gains roughly a factor of two overall, courtesy of the 1GB size of each such file. Normally, torture.sh uses all available CPUs to do the compression, but you can restrict it using the --compress-kasan-vmlinux parameter. At the extreme, --compress-kasan-vmlinux 0 will disable compression entirely, which can be an attractive option given that compressing takes about an hour of wall-clock time on my 16-CPU laptop.
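The effect is roughly that of running a compressor over each KASAN vmlinux file with a bounded number of parallel jobs; a hypothetical approximation (the result path and the choice of xz are assumptions, not quotes of torture.sh internals):

    # Hypothetical approximation: compress KASAN vmlinux files using
    # up to $N parallel jobs (path and compressor are assumptions).
    find tools/testing/selftests/rcutorture/res -name vmlinux -print0 |
        xargs -0 -n 1 -P "$N" xz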

Finally, torture.sh places all of its output under a date-stamped directory suffixed with -torture, for example, tools/testing/selftests/rcutorture/res/2021.05.03-20.10.12-torture. This allows bulky torture.sh directories to be more aggressively cleaned up when disks start getting full.

Taking all of this together, torture.sh provides a very useful overnight “acceptance test” for RCU.

Why Sleep Apnea Patients Rely on a CPAP Machine Hacker (Vice)

4 years 3 months ago
Vice takes a look at the SleepyHead system for the management of CPAP machines.

The free, open-source, and definitely not FDA-approved piece of software is the product of thousands of hours of hacking and development by a lone Australian developer named Mark Watkins, who has helped thousands of sleep apnea patients take back control of their treatment from overburdened and underinvested doctors. The software gives patients access to the sleep data that is already being generated by their CPAP machines but generally remains inaccessible, hidden by proprietary data formats that can only be read by authorized users (doctors) on proprietary pieces of software that patients often can’t buy or download.

corbet

Making eBPF work on Windows (Microsoft Open Source Blog)

4 years 3 months ago
The Microsoft Open Source Blog takes a look at implementing eBPF support in Windows. "Although support for eBPF was first implemented in the Linux kernel, there has been increasing interest in allowing eBPF to be used on other operating systems and also to extend user-mode services and daemons in addition to just the kernel. Today we are excited to announce a new Microsoft open source project to make eBPF work on Windows 10 and Windows Server 2016 and later. The ebpf-for-windows project aims to allow developers to use familiar eBPF toolchains and application programming interfaces (APIs) on top of existing versions of Windows. Building on the work of others, this project takes several existing eBPF open source projects and adds the “glue” to make them run on Windows."
ris

Announcing coreboot 4.14

4 years 3 months ago
The coreboot firmware project has released version 4.14. "These changes have been all over the place, so that there's no particular area to focus on when describing this release: We had improvements to mainboards, to chipsets (including much welcomed work to open source implementations of what has been blobs before), to the overall architecture."
ris

Security updates for Tuesday

4 years 3 months ago
Security updates have been issued by Debian (hivex), Fedora (djvulibre and thunderbird), openSUSE (monitoring-plugins-smart and perl-Image-ExifTool), Oracle (kernel and kernel-container), Red Hat (kernel and kpatch-patch), SUSE (drbd-utils, java-11-openjdk, and python3), and Ubuntu (exiv2, firefox, libxstream-java, and pyyaml).
ris

[$] The second half of the 5.13 merge window

4 years 3 months ago
By the time the last pull request was acted on and 5.13-rc1 was released, a total of 14,231 non-merge commits had found their way into the mainline. That makes the 5.13 merge window larger than the entire 5.12 development cycle (13,015 commits) and just short of all of 5.11 (14,340). In other words, 5.13 looks like one of the busier development cycles we have seen for a little while. About 6,400 of these commits came in after the first-half summary was written, and they include a number of significant new features.
corbet

Security updates for Monday

4 years 3 months ago
Security updates have been issued by Debian (libxml2), Fedora (autotrace, babel, kernel, libopenmpt, libxml2, mingw-exiv2, mingw-OpenEXR, mingw-openexr, python-markdown2, and samba), openSUSE (alpine, avahi, libxml2, p7zip, redis, syncthing, and vlc), and Ubuntu (webkit2gtk).
ris

Kernel prepatch 5.13-rc1

4 years 3 months ago
The first 5.13 kernel prepatch is out for testing, and the merge window is closed for this development cycle. "This was - as expected - a fairly big merge window, but things seem to have proceeded fairly smoothly. Famous last words." In the end, 14,231 non-merge changesets were pulled into the mainline during the merge window — more than were seen during the entire 5.12 cycle.
corbet

Brendan Gregg: Poor Disk Performance

4 years 3 months ago
People often tell me they don't understand performance tool output because they can't tell what's "good" or "bad." It can be hard, as performance is subjective. What's good for one user may be bad for another. There are also cases where I can't tell either: The tools only provide clues for further analysis. I recently encountered terrible disk performance and thought it'd be useful to collect Linux tool screenshots and share them for reference. E.g., iostat(1):

```
$ iostat -xz 10
[...]
Device   r/s    w/s   rkB/s   wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
nvme0n1  4.40   6.00  42.00   43.20   0.00   4.30  0.00 41.75    6.45    0.80   0.03     9.55     7.20   0.15   0.16
dm-0     4.40  10.30  42.00   43.20   0.00   0.00  0.00  0.00    6.55    0.47   0.03     9.55     4.19   0.54   0.80
dm-1     4.40   9.80  42.00   43.20   0.00   0.00  0.00  0.00    6.55    0.49   0.03     9.55     4.41   0.56   0.80
sdb      4.50   0.00 576.00    0.00   0.00   0.00  0.00  0.00  434.31    0.00   1.98   128.00     0.00 222.22 100.00
```

It's the sdb disk, and I'm first looking at the r_await column to see the average time in milliseconds for reads. An average of 434 ms is awful, and a small queue size (aqu-sz) indicates it's a problem with the disk and not the workload applied. I want to see distributions and event logs. But first, about this disk... See the dust on this disk?

(photo: a dusty 80 Gbyte hard disk with its lid missing)

## Flying height

Were you ever taught in computer science that the size of a dust particle dwarfs the distance between the disk head and the platter? Something like:

(diagram: the disk head's flying height compared with the size of a dust particle)

It's called "[flying height]" or "fly height," and (from that reference) was about 5 nanometers for 2011 drives. Particles of dust can be 1000x bigger. The heads "float" on a film of air, and this is sometimes described as "air lubrication." To quote from an article about hard drive [air filters]: "some hard drives are not rated to exceed 7,000 feet while operating because the air pressure would be too low inside the drive to float the heads properly." Such hard drives have air ports, and air filters, to equalize pressure with the outside air. (Update: Some modern drives after 2015 are sealed with [helium].)

I was first told about the ratio between fly height and particles of dust in a computer studies class at school, with the teacher drawing this diagram on a chalkboard. I assumed that a speck of dust would destroy a drive head at 7200 rpm. Right? I just found a Quora article with a better diagram than mine, which also asks the question: "So, what do YOU think would happen if the disk read/write head were to run over a speck of dust?" (The article doesn't answer.)

## What happened

The disk photo is an 80 Gbyte Western Digital IDE disk I found when packing up to move house. Missing its lid. Dusty. I'd also recently bought a [SATA/IDE to USB hub] and couldn't resist seeing if the disk was readable despite the dust, and finding out what was on it (I'd forgotten). Surely it's unreadable, right?...

The drive failed immediately. The disk sped up, the head clicked, then sped down with an error. I found the lid but no drive screws, and rested it on top. Still errored. By pushing down on the lid, however (simulating screws), it sped up and down a few times before failing. The harder I pushed, the less it vibrated and the more it worked, until I finally had it returning I/O, albeit slowly. (This may be the opposite of my famous [shouting video]: This time I'm suppressing vibration to make a disk work.) I managed to read over 99.9999% of disk sectors successfully. It took several hours, so I left a bottle of apple juice pressing the lid down.
Performance was still poor, but the head wasn't obliterated. Only an 8-Kbyte sequential chunk failed and could not be read (big bit of dust?). The iostat output from earlier (and the screenshots below) shows the performance of this disk, dust-n-all. While dust may have been a factor, I think the biggest cause of the poor performance was vibration with the lid unscrewed, based on how much faster it worked when I used my body weight to hold the lid down. I could hear it spin faster. It seemed to have several set speeds, and when pushing hard it would try a faster speed for a couple of seconds, then a faster one, until it found the fastest speed at which it could operate (presumably it tries faster speeds until it begins to get sector-ECC errors). The way it tried faster speeds somehow reminded me of how 32x CDROM drives operated.

## Screenshots

Back to my opening line: The following screenshots may help you better understand these tool outputs. I'll start with the worst performance and then show moderately-poor performance. From these outputs I try to determine if the problem is:

- **The workload**: High-latency disk I/O is commonly caused by the workload applied. It may be due to queueing, especially from file systems that send a batch of writes. It can also be simply large I/O, or the presence of other disk commands that slow subsequent I/O.
- **The disk**: If it isn't the workload applied, then slow I/O may well be caused by a bad disk. Analysis is similar whether the disk is rotational magnetic or flash-memory based. Rotational disks have extra latency from head seeks for random I/O, and spin-ups from the idle state.

The workload is 128 Kbyte sequential reads using the dd(1) utility. I'd guess they'd normally take between 1 and 2 ms for this disk.

### Worst performance

iostat(1), printing 10-second summaries:

```
$ iostat -xz 10
Linux 4.15.0-66-generic (lgud-bgregg)   12/16/2020   _x86_64_   (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           7.70    0.01    2.03    0.09    0.00   90.17
[...]

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           7.90    0.00    2.07   10.87    0.00   79.15

Device   r/s    w/s   rkB/s   wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
nvme0n1  0.40  15.30   2.00  167.20   0.00   2.70  0.00 15.00    7.00    0.81   0.01     5.00    10.93   0.13   0.20
dm-0     0.40  18.00   2.00  167.20   0.00   0.00  0.00  0.00    7.00    7.69   0.14     5.00     9.29   0.33   0.60
dm-1     0.30  17.80   1.60  167.20   0.00   0.00  0.00  0.00    6.67    7.78   0.14     5.33     9.39   0.29   0.52
dm-2     0.10   0.00   0.40    0.00   0.00   0.00  0.00  0.00    8.00    0.00   0.00     4.00     0.00   8.00   0.08
sdb      7.30   0.00 934.40    0.00   0.00   0.00  0.00  0.00  269.70    0.00   1.97   128.00     0.00 136.88  99.92

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           7.70    0.00    1.66   10.97    0.00   79.68

Device   r/s    w/s   rkB/s   wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
nvme0n1  4.40   6.00  42.00   43.20   0.00   4.30  0.00 41.75    6.45    0.80   0.03     9.55     7.20   0.15   0.16
dm-0     4.40  10.30  42.00   43.20   0.00   0.00  0.00  0.00    6.55    0.47   0.03     9.55     4.19   0.54   0.80
dm-1     4.40   9.80  42.00   43.20   0.00   0.00  0.00  0.00    6.55    0.49   0.03     9.55     4.41   0.56   0.80
sdb      4.50   0.00 576.00    0.00   0.00   0.00  0.00  0.00  434.31    0.00   1.98   128.00     0.00 222.22 100.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           6.89    0.00    1.90   10.99    0.00   80.23

Device   r/s    w/s   rkB/s   wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
nvme0n1  0.30   7.60   1.20  119.20   0.00   4.40  0.00 36.67    2.67    1.63   0.01     4.00    15.68   0.20   0.16
dm-0     0.30  12.00   1.20  119.20   0.00   0.00  0.00  0.00    2.67    2.30   0.03     4.00     9.93   0.55   0.68
dm-1     0.30  11.40   1.20  119.20   0.00   0.00  0.00  0.00    2.67    2.42   0.03     4.00    10.46   0.58   0.68
sdb      3.50   0.00 448.00    0.00   0.00   0.00  0.00  0.00  579.66    0.00   1.99   128.00     0.00 285.71 100.00
```

This output shows 10-second statistical summaries. Massive r_await with little aqu-sz, as mentioned earlier. The read size is large (a 128-Kbyte average, as seen in iostat(1)'s rareq-sz column), but that's not excessive.

biolatency (this is my BPF tool from [bcc]), printing 60-second histograms, per disk (-D):

```
# biolatency -D 60 1
Tracing block device I/O... Hit Ctrl-C to end.

disk = 'nvme0n1'
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 12       |*                                       |
        16 -> 31         : 318      |****************************************|
        32 -> 63         : 210      |**************************              |
        64 -> 127        : 106      |*************                           |
       128 -> 255        : 65       |********                                |
       256 -> 511        : 29       |***                                     |
       512 -> 1023       : 31       |***                                     |
      1024 -> 2047       : 81       |**********                              |
      2048 -> 4095       : 93       |***********                             |
      4096 -> 8191       : 76       |*********                               |

disk = 'sdb'
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 0        |                                        |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 0        |                                        |
       512 -> 1023       : 0        |                                        |
      1024 -> 2047       : 0        |                                        |
      2048 -> 4095       : 0        |                                        |
      4096 -> 8191       : 0        |                                        |
      8192 -> 16383      : 0        |                                        |
     16384 -> 32767      : 1        |                                        |
     32768 -> 65535      : 15       |**                                      |
     65536 -> 131071     : 214      |****************************************|
    131072 -> 262143     : 84       |***************                         |
    262144 -> 524287     : 46       |********                                |
    524288 -> 1048575    : 7        |*                                       |
   1048576 -> 2097151    : 0        |                                        |
   2097152 -> 4194303    : 1        |                                        |
```

Note the sdb latencies range from 32 ms to over 2 seconds!
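As an aside, the 128 Kbyte sequential read workload described earlier can be approximated with a plain dd command; a minimal sketch, with the device name assumed:

```
# Rough reproduction of the workload described above: 128 Kbyte
# sequential reads from the disk under test (device name assumed).
dd if=/dev/sdb of=/dev/null bs=128k
```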
biosnoop (this is my BPF tool from [bcc]), printing every disk event:

```
# biosnoop
TIME(s)     COMM           PID    DISK     T SECTOR     BYTES   LAT(ms)
0.000000    dd             16014  sdb      R 37144544   131072    77.96
0.008933    biosnoop       21118  nvme0n1  R 652936664  4096       7.53
0.143268    dd             16014  sdb      R 37144800   131072   143.20
0.333243    dmcrypt_write  347    nvme0n1  W 244150736  4096       2.72
0.333256    dmcrypt_write  347    nvme0n1  W 244150744  4096       2.49
0.333259    dmcrypt_write  347    nvme0n1  W 244150752  4096       1.38
0.361565    dd             16014  sdb      R 37145056   131072   218.24
0.463294    dd             16014  sdb      R 37145312   131072   101.70
0.590237    dd             16014  sdb      R 37145568   131072   126.92
0.734682    dd             16014  sdb      R 37145824   131072   144.38
0.864665    Cache2 I/O     6515   nvme0n1  R 694714632  4096       0.10
0.961290    dd             16014  sdb      R 37146080   131072   226.55
1.063137    dd             16014  sdb      R 37146336   131072   101.79
1.198111    dd             16014  sdb      R 37146592   131072   134.91
1.425886    dd             16014  sdb      R 37146848   131072   227.74
1.619342    dd             16014  sdb      R 37147104   131072   193.38
1.754445    dd             16014  sdb      R 37147360   131072   135.04
1.856156    dd             16014  sdb      R 37147616   131072   101.65
2.000656    dd             16014  sdb      R 37147872   131072   144.42
2.102591    dd             16014  sdb      R 37148128   131072   101.83
2.204427    dd             16014  sdb      R 37148384   131072   101.77
2.397540    dd             16014  sdb      R 37148640   131072   193.05
2.567098    dd             16014  sdb      R 37148896   131072   169.52
2.576776    dmcrypt_write  347    nvme0n1  W 94567816   57344      7.46
2.577205    dmcrypt_write  347    nvme0n1  W 499469088  12288      0.02
2.577272    dmcrypt_write  347    nvme0n1  W 499469112  16384      0.04
2.580759    dmcrypt_write  347    nvme0n1  W 499469144  4096       2.03
2.752098    dd             16014  sdb      R 37149152   131072   184.94
2.945566    dd             16014  sdb      R 37149408   131072   193.41
3.039011    dd             16014  sdb      R 37149664   131072    93.38
3.165834    dd             16014  sdb      R 37149920   131072   126.76
3.401771    dd             16014  sdb      R 37150176   131072   235.87
3.536805    dd             16014  sdb      R 37150432   131072   134.95
3.705294    dd             16014  sdb      R 37150688   131072   168.43
3.772291    Cache2 I/O     6515   nvme0n1  R 694703744  4096       7.55
3.873563    dd             16014  sdb      R 37150944   131072   168.21
4.018151    dd             16014  sdb      R 37151200   131072   144.53
4.253137    dd             16014  sdb      R 37151456   131072   234.92
4.310591    dmcrypt_write  347    nvme0n1  W 220635024  16384      2.70
[...]
```

This shows individual I/O to disk sdb taking 100 ms and more (LAT(ms)). If I ran this for long enough, I should see outliers reaching over 2 seconds.

I don't see evidence of queueing in this biosnoop output: One tell-tale sign of queueing is when I/O latencies ramp up (e.g., 10 ms, 20 ms, 30 ms, 40 ms, etc.) with a steady completion time between them (seen in the TIME(s) column). This can happen when the disk is working through its queue, so later I/O have steadily increasing latency. But the completion times and latencies in this output show that the disk doesn't appear to have a deep queue. It's just plain slow.

### Poor performance

Pressing hard on the disk lid allowed it to operate faster, though performance was still somewhat poor.

```
# biosnoop
TIME(s)     COMM           PID    DISK     T SECTOR     BYTES   LAT(ms)
[...]
2.643276    dd             16014  sdb      R 46133728   131072     1.60
2.660996    dd             16014  sdb      R 46133984   131072    16.98
2.671327    dd             16014  sdb      R 46134240   131072    10.31
2.673299    dd             16014  sdb      R 46134496   131072     1.94
2.675298    dd             16014  sdb      R 46134752   131072     1.97
2.685624    dd             16014  sdb      R 46135008   131072    10.29
2.705410    dd             16014  sdb      R 46135264   131072    19.76
2.707425    dd             16014  sdb      R 46135520   131072     1.96
2.710357    dd             16014  sdb      R 46135776   131072     1.66
2.716280    dd             16014  sdb      R 46136032   131072     1.62
2.739534    dd             16014  sdb      R 46136288   131072    19.07
2.741464    dd             16014  sdb      R 46136544   131072     1.90
2.743432    dd             16014  sdb      R 46136800   131072     1.93
2.745563    dd             16014  sdb      R 46137056   131072     1.57
2.756934    dd             16014  sdb      R 46137312   131072    10.11
2.783863    dd             16014  sdb      R 46137568   131072    26.90
2.785830    dd             16014  sdb      R 46137824   131072     1.93
2.787835    dd             16014  sdb      R 46138080   131072     1.97
2.790935    dd             16014  sdb      R 46138336   131072     2.55
[...]
```

The latencies here look like a mix of normal speed (~1.9 ms) and slower ones (~10 ms and slower). Given it's a 7,200 rpm disk, a revolution takes ~8 ms, so if it needs to retry sectors I'd expect to see latencies of 2 ms, 10 ms, 18 ms, 26 ms, etc.

Here are the biolatency(1) histograms when the disk is running faster:

```
disk = 'sdb'
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 0        |                                        |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 0        |                                        |
       512 -> 1023       : 0        |                                        |
      1024 -> 2047       : 13       |******                                  |
      2048 -> 4095       : 82       |****************************************|
      4096 -> 8191       : 0        |                                        |
      8192 -> 16383      : 9        |****                                    |
     16384 -> 32767      : 7        |***                                     |
     32768 -> 65535      : 41       |********************                    |
     65536 -> 131071     : 77       |*************************************   |
    131072 -> 262143     : 2        |                                        |
    262144 -> 524287     : 1        |                                        |
```

The distribution is bimodal. The faster mode will be the sequential reads; the slower mode shows the retries.

And the iostat(1) output when the disk is in this faster state:

```
$ iostat -xz 10
[...]
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          11.78    0.00    2.68    2.82    0.00   82.72

Device    r/s    w/s     rkB/s   wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
nvme0n1   3.50  11.70    15.60  146.40   0.40   2.30 10.26 16.43    2.40    0.21   0.00     4.46    12.51   0.05   0.08
dm-0      3.90  14.00    15.60  146.40   0.00   0.00  0.00  0.00    2.87    0.17   0.01     4.00    10.46   0.54   0.96
dm-1      1.40  13.70     5.60  146.40   0.00   0.00  0.00  0.00    4.29    0.18   0.01     4.00    10.69   0.29   0.44
dm-2      2.50   0.00    10.00    0.00   0.00   0.00  0.00  0.00    2.08    0.00   0.01     4.00     0.00   2.08   0.52
sdb     321.40   0.00 41139.20    0.00   0.00   0.00  0.00  0.00    5.11    0.00   1.64   128.00     0.00   3.01  96.88
```

The average (r_await) of 5.11 ms really doesn't tell the full story the way the histogram or per-event output does.

## More questions

What's happening to all that dust? Is it stuck to the platter surface, or does it bounce around when the disk is spinning? The photo I included was taken after I read the entire disk, so the dust didn't end up in the internal air filters. It was still on the platter. Would a 1 TB disk be as tolerant of dust as this old 80 GB disk? (When I was a sysadmin, I heard a story of how old VAX drives would stall, so holes had been drilled in them with tape over the holes. When one stalled, the sysadmin would peel back the tape and use their finger to spin-start it. Those even older drives must have been more tolerant of dust!) And at what point is there too much dust? I don't recommend you try this, but if I had the time or interest, I'd create a perspex lid and see how much dust a drive can keep working with.

At least I answered one question. I found that these hard drive heads were not destroyed by dust, and could read almost everything from a dusty disk, albeit slowly. Perhaps that's not the case with more modern SMR disks with smaller tolerances, but I'd have to try, given the surprising result this time.

[flying height]: https://en.wikipedia.org/wiki/Flying_height
[SATA/IDE to USB hub]: https://www.amazon.com/gp/product/B01NAUIA6G/
[shouting video]: http://www.brendangregg.com/blog/2008-12-31/unusual-disk-latency.html
[air filters]: https://www.karlstechnology.com/blog/hard-drive-air-filters
[bcc]: https://github.com/iovisor/bcc
[helium]: https://techreport.com/news/27031/shingled-platters-breathe-helium-inside-hgsts-10tb-hard-drive/

An IEEE statement on the UMN paper

4 years 3 months ago
The IEEE, whose Symposium on Security and Privacy conference had accepted the "hypocrite commits" paper for publication, has posted a statement [PDF] on the episode.

The paper was reviewed by four reviewers in the Fall S&P 2021 review cycle and received a very positive overall rating (2 Accept and 2 Weak Accept scores, putting it in the top 5% of submitted papers). The reviewers noted that the fact that a malicious actor can attempt to intentionally add a vulnerability to an open source project is not new, but also acknowledged that the authors provide several new insights by describing why this might be easier than expected, and why it might be difficult for maintainers to detect the problem. One of the PC members briefly mentioned a possible ethical concern in their review, but that comment was not significantly discussed any further at the time; we acknowledge that we missed it.

The statement concludes with some actions to be taken by IEEE to ensure that ethically questionable papers are not accepted again.

corbet

[$] Noncoherent DMA mappings

4 years 3 months ago
While it is sometimes possible to perform I/O by moving data through the CPU, the only way to get the required level of performance is usually for devices to move data directly to and from memory. Direct memory access (DMA) I/O has been well supported in the Linux kernel since the early days, but there are always ways in which that support can be improved, especially when hardware adds some challenges of its own. The somewhat confusingly named "non-contiguous" DMA API that was added for 5.13 shows the kinds of things that have to be done to get the best performance on current systems.
corbet

Security updates for Friday

4 years 3 months ago
Security updates have been issued by Debian (mediawiki and unbound1.9), Fedora (djvulibre and samba), Mageia (ceph, messagelib, and pagure), openSUSE (alpine and exim), Oracle (kernel and postgresql), Scientific Linux (postgresql), and Ubuntu (thunderbird and unbound).
jake

An Interview With Linus Torvalds: Open Source And Beyond - Part 2 (Tag1)

4 years 3 months ago
The second half of the interview with Linus Torvalds on the Tag1 Consulting site has been posted.

I think one of the reasons Linux succeeded was exactly the fact that I actually did NOT have a big plan, and did not have high expectations of where things would go, and so when people started sending me patches, or sending me requests for features, to me that was all great, and I had no preconceived notion of what Linux should be. End result: all those individuals (and later big companies) that wanted to participate in Linux kernel development had a fairly easy time to do so, because I was quite open to Linux doing things that I personally had had no real interest in originally.

corbet

[$] A pair of memory-allocation improvements in 5.13

4 years 3 months ago
Among the many changes merged for 5.13 can be found performance improvements throughout the kernel. This work does not always stand out the way that new features do, but it is vitally important for the future of the kernel overall. In the memory-management area, a couple of long-running patch sets have finally made it into the mainline; these provide a bulk page-allocation interface and huge-page mappings in the vmalloc() area. Both of these changes should make things faster, at least for some workloads.
corbet