Post by Kyle Wheeler
But if you're here to rehash the "fast" argument, I think we can't get
anywhere without pointing to the CourierMTA's webpage of mbox/maildir
benchmarks: http://www.courier-mta.org/mbox-vs-maildir/
I'm very familiar with this page, and I consider it fairly useless.
First, it has a tendency to focus on operations where courier wins,
and somewhat downplays cases where it doesn't. For example, it does
no tests at all with extremely large numbers of messages. On typical
Unix file systems, maildir basically falls over, because accessing
files in large directories is inherently slow (to the point of being
painful) on such filesystems.
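A rough way to check the large-directory claim on a given file system is to time a maildir-style first open at several sizes. The sketch below is only illustrative: the function name and the `msgN` file names are made up for the example, and because the files have just been written they will mostly sit in the page cache, so it understates what a cold open costs.

```python
import os
import tempfile
import time


def time_maildir_style_scan(num_messages: int) -> float:
    """Create num_messages small files in one directory, then time
    listing the directory and opening every file, which is roughly
    what a maildir reader does on a first, uncached open.
    Returns the elapsed time of the scan in seconds."""
    with tempfile.TemporaryDirectory() as directory:
        for i in range(num_messages):
            # Hypothetical file names; real maildir names look different.
            with open(os.path.join(directory, f"msg{i}"), "wb") as f:
                f.write(b"Subject: test\n\nbody\n")

        start = time.perf_counter()
        for entry in os.scandir(directory):
            with open(entry.path, "rb") as f:
                f.read(64)  # read just enough to reach the headers
        return time.perf_counter() - start
```

Calling `time_maildir_style_scan(1_000)` and `time_maildir_style_scan(100_000)` and comparing the per-message cost gives a first impression of how the file system in question scales; the numbers will differ widely between file systems.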
Second, and much more importantly, it assumes that University of
Washington's mbox implementation is representative of how well mbox is
capable of performing, and that Courier's maildir implementation is
similarly representative of how well maildir performs. In other
words, you're actually comparing two specific implementations -- not
mbox vs. maildir per se. That pretty much invalidates every aspect
of the conclusions drawn on this page (though they may well be valid
for Courier vs. UW-IMAP). Despite this, it is interesting to note
that UW-IMAP mostly outperforms Courier on low-end hardware *by a
lot*, with the sole exception of the very special case of expunge
(which the study calls delete), whereas on high-end hardware, Courier
wins by only a small margin.
It's been a long while since I looked at UW's implementation, but I do
remember thinking that it had a number of opportunities for
optimization. I believe, for example, that UW-IMAP's caching was
basically nonexistent (which would explain why Courier does so much
better on all the .2 tests). When comparing Mutt's implementations of
mbox vs. maildir, mbox BLOWS AWAY maildir opening large mailboxes
(i.e. pre-header-caching). IIRC UW-IMAP also uses stdio... which, being
double-buffered, is the least efficient method of I/O. On reasonably
modern (i.e. not broken) implementations, using memory-mapped I/O is
substantially faster. For maildir, the difference probably wouldn't
matter much since the reads and writes tend to be small. For mbox,
that matters a lot (see W. Richard Stevens, Advanced Programming in
the Unix Environment, for an example of how drastically MMIO can
improve I/O performance).
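To make the stdio vs. memory-mapped distinction concrete, here is a minimal sketch of both ways to find message boundaries in an mbox file. It is not taken from UW-IMAP or Mutt; the function names are hypothetical, and in Python the timing gap will not match what a C implementation sees. It only shows the shape of the two approaches: one scans the mapped file in place, the other pulls every line through a user-space buffer.

```python
import mmap


def count_with_mmap(path: str) -> int:
    """Count mbox messages by searching a read-only memory map for
    lines that begin with "From " (the mbox separator)."""
    with open(path, "rb") as f:
        try:
            mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        except ValueError:
            return 0  # an empty file cannot be mapped
        with mapped:
            # The first separator has no newline in front of it.
            count = 1 if mapped[:5] == b"From " else 0
            position = mapped.find(b"\nFrom ")
            while position != -1:
                count += 1
                position = mapped.find(b"\nFrom ", position + 1)
            return count


def count_with_stdio(path: str) -> int:
    """Same count using ordinary buffered, line-by-line reads."""
    with open(path, "rb") as f:
        return sum(1 for line in f if line.startswith(b"From "))
```

Both functions should return the same number for a well-formed mbox; timing them on a large archive is one way to see whether the difference matters on a particular system.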
Post by Kyle Wheeler
You only need to open(2) every individual message if you're reading
the whole thing for the first time. You certainly don't need to do
that if you're delivering mail, or deleting mail, or marking a message
as read, or what have you.
Yes, exactly. I have dozens of mailboxes, most of which (in my work
environment) are high-volume folders... With my usage patterns
(especially pre-headercache), the speed of opening mailboxes matters A
LOT. FWIW, last I was paying attention, mbox was not receiving the
benefits of header caching in Mutt. For my particular usage patterns,
this matters much, much more than, say, the time it takes to expunge a
single message from a large mailbox. With my particular usage
patterns, the latter case happens pretty much never. Opening large
mailboxes happens pretty frequently.
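For reference, the quoted point about not needing open(2) comes from how maildir stores state: flags live in the file name after ":2,", so marking a message read is a rename and deleting it is an unlink. A minimal sketch, assuming a standard maildir layout with `cur/` and `new/` on a Unix file system (the function names are hypothetical, not Mutt's):

```python
import os


def mark_seen(maildir: str, filename: str) -> str:
    """Mark a maildir message as read by renaming it into cur/ with
    the S flag set. The message file is never opened.
    Returns the new path."""
    for subdir in ("cur", "new"):
        old_path = os.path.join(maildir, subdir, filename)
        if os.path.exists(old_path):
            break
    else:
        raise FileNotFoundError(filename)

    # Flags follow ":2," and are kept in ASCII order.
    base, _, flags = filename.partition(":2,")
    flags = "".join(sorted(set(flags) | {"S"}))
    new_path = os.path.join(maildir, "cur", f"{base}:2,{flags}")
    os.rename(old_path, new_path)
    return new_path


def delete_message(path: str) -> None:
    """Delete a maildir message: a single unlink, no rewrite of
    any other message."""
    os.unlink(path)
```

The cost this avoids is the rewrite an mbox needs on expunge; the cost it does not avoid is opening every file when the headers of a whole folder have to be read for the first time.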
As it happens, mbox (on Mutt at least) is actually about the same or
faster for almost all of the operations that actually make a
difference to my e-mail experience. I tend to keep my busy incoming
folders small, and either delete or archive messages from those
folders into mbox folders when I'm done processing them. I rarely
delete messages from those mbox folders, but I still do open them very
frequently to remind myself of whatever's in the messages I saved
there. So for me, maildir's huge win deleting messages in large
folders is a *complete* non-issue. A good mbox implementation with
caching will perform about as well as or even beat maildir handily in
almost every other case. For me, using maildir was as much about
Mutt's behavior when using it, as it was about performance and safety.
With recent improvements from Brendan and/or Rocco, the behavior is no
longer sufficiently different that there's really any benefit at all
for me to use maildir (I don't keep my mail on network shares of any
sort), but there is genuine benefit from using mbox. It may still be
true that mbox is not receiving the benefits of hcache, but if so I
don't really notice the difference. I still do use both, but it's
mostly a remnant of past issues that no longer exist.
--
Derek D. Martin http://www.pizzashack.org/ GPG Key ID: 0xDFBEAD02