Chris Jepeway has started work on an interesting patch, which he posted to the NetBSD tech-perform mailing list. The code implements disk-level transaction clustering: it combines consecutive, contiguous disk reads or writes into a single read or write operation. The logic sits at the disk driver level (the sd and wd drivers). In Chris's tests it cut the number of disk transfers by roughly 4-5%, although CPU time and wall time did not change noticeably. As Chris puts it: "None of this is earth-shattering news, of course, but it does demonstrate that clusters can be missed at the FS/VM interface."
According to Jason R. Thorpe, this is pretty cool stuff, but a few changes may still be needed:
From: Chris Jepeway
To: tech-perform AT netbsd.org
Subject: Disk-level Transaction Clustering
Date: Sat, 07 Sep 2002 02:42:19 -0400
I've whacked disk-level transaction clustering into
the sd and wd drivers of -current from about 3 days ago.
This is before the gehenna-devsw merge, so I dunno whether
the patches I've put up will apply as of today.
I've only tested the sd driver, I haven't yet tried compiling the wd driver with clustering enabled. And only on an FFS partition w/o softdeps enabled.
For a simple benchmark, I used ssh/pax to copy a full-ish
/usr/src/sys tree (it had the kernels from a release build
in it) onto a test machine where sd clustering was enabled.
About 99K total xfers were done to disk. Of these, about
1300 were clusters built by the sd driver. These 1300
clusters held about 5100 buffers that would have been
individually scheduled if the driver weren't combining
them. So, clustering saved about 3800 xfers, roughly a
4% savings.
I then built the GENERIC kernel with clustering disabled.
About 12800 xfers were done during the build. Building
GENERIC again with clustering turned on did about 12200 xfers, where 1000 buffers or so were combined into 300 clusters. That's about a 5% savings. CPU time and wall time for both compiles were comparable.
None of this is earth-shattering news, of course, but it does
demonstrate that clusters can be missed at the FS/VM interface. And that's when only one process uses the disk in question.
If someone points me at some benchmarks enjoyed by the powers that be, I'll be glad to generate harder numbers. See a msg posted a few days back on what/how I'd test.
Further info and code/patches at
http://www.blasted-heath.com/nbsd/cluster/
I had to hand edit the patch to remove some lines in
sys/conf/files and the like that weren't relevant to
clustering, so there's a chance that part of the patch
might not apply cleanly. If you try it, let me know how
it goes.
Chris
-------------------------------------------------------------------
From: Jason R Thorpe
Subject: Re: Disk-level Transaction Clustering
Date: Sat, 7 Sep 2002 09:32:55 -0700
On Sat, Sep 07, 2002 at 02:42:19AM -0400, Chris Jepeway wrote:
> Further info and code/patches at
>
> http://www.blasted-heath.com/nbsd/cluster/
This is pretty cool stuff, but I have some suggestions on how to make it better :-)
You really don't want to use a VM map to make the clusters. This can have painful side-effects on some architectures, esp. since you are using kmappings ... this is basically not going to work on any platform which has a virtually-indexed cache.
Instead, I suggest using uios to describe the clusters. Make a flag called B_UIO for the buf structure, and when that is set, b_data points to a uio structure. When you build a cluster, allocate a uio and an iovec array (maybe always allocate an iovec array large enough to handle up to some max_cluster requests).
...then modify the SCSI HBA drivers to use bus_dmamap_load_uio instead of bus_dmamap_load when they see B_UIO. Note that there is already some
#if 0'd code for this in some HBA drivers (historical reasons).
It would also be nice if the building of clusters were hidden inside the BUFQ interface. I suggest adding a new flag when the bufq is allocated, BUFQ_CLUSTER, or something.
Now, for devices which aren't using bus_dma, we could just avoid setting BUFQ_CLUSTER in those cases. They won't get the benefit of clustering, but they will also continue to work.
--
-- Jason R. Thorpe
--------------------------------------------------------------------
From: Chris Jepeway
Subject: Re: Disk-level Transaction Clustering
Date: Thu, 12 Sep 2002 16:57:14 -0400
> This is pretty cool stuff, but I have some suggestions on how to make
> it better :-)
Cool.
> You really don't want to use a VM map to make the clusters. This can have
> painful side-effects on some architectures, esp. since you are using
> kmappings ... this is basically not going to work on any platform which has
> a virtually-indexed cache.
OK. That's b/c these machines can't handle VA aliases in the cache, so aliases aren't allowed on them? Is there some way to inval the cache, in that case? And are there other reasons why it won't work? I ask both to understand and to try to support clusters on those configs that don't use bus_dma.
> Instead, I suggest using uios to describe the clusters. Make a flag
> called B_UIO for the buf structure, and when that is set, b_data points
> to a uio structure. When you build a cluster, allocate a uio and an iovec
> array (maybe always allocate an iovec array large enough to handle up to
> some max_cluster requests).
>
> ...then modify the SCSI HBA drivers to use bus_dmamap_load_uio instead
> of bus_dmamap_load when they see B_UIO. Note that there is already some
> #if 0'd code for this in some HBA drivers (historical reasons).
A buddy of mine had pointed out that code and suggested this approach, too. Makes sense to me, so that's what I'll aim for.
> It would also be nice if the building of clusters were hidden inside the
> BUFQ interface. I suggest adding a new flag when the bufq is allocated,
> BUFQ_CLUSTER, or something.
I think I like this, too.
> Now, for devices which aren't using bus_dma, we could just avoid setting
> BUFQ_CLUSTER in those cases. They won't get the benefit of clustering, but
> they will also continue to work.
Hm. I could have 2 clustering methods, one that uses bus_dma, one that uses VM tricks. Prefer bus_dma over VM, prefer VM over nothing, and force nothing if the VM h/w for the system can't dtrt? If BUFQ_CLUSTER is set on machines that can't support them, it'd just be ignored.
Chris
--------------------------------------------------------------------
From: Chuck Silvers
Subject: Re: Disk-level Transaction Clustering
Date: Sat, 7 Sep 2002 12:54:04 -0700
hi,
hmm, that's interesting, could you find out what was in the blocks that you were able to cluster? I'd guess it's inode data, but it could be something else.
it's kind of disappointing that there was no measurable improvement in performance, though. could you try experimenting with ccd or raidframe and see if it helps noticeably in that context? it'll probably help if you use a machine with a slower CPU as well. my point with trying
to see a performance improvement is that if we think there should be a performance improvement but there isn't one, then maybe something isn't working correctly.
-Chuck