Source from upstream; imap-2007f.tar.gz
MD5 2126fd125ea26b73b20f01fcd5940369
217
docs/formats.txt
Normal file
@@ -0,0 +1,217 @@
/* ========================================================================
 * Copyright 1988-2006 University of Washington
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 *
 * ========================================================================
 */

Mailbox Format Characteristics
Mark Crispin
11 December 2006

When a mailbox storage technology uses local files and
directories directly, the file(s) and directories are laid out in a
mailbox format.

I. Flat-File Formats

In these formats, a mailbox and all the messages inside are a
single file on the filesystem.  The mailbox name is the name of the
file in the filesystem, relative to the user's "mail home directory."

A flat-file format mailbox is always a file, never a directory.
This means that it is impossible to have a flat-file format mailbox
that has inferior mailbox names under it (so-called "dual-usage"
mailboxes).  For some inexplicable reason, some people want such
mailboxes.

The mail home directory is usually the same as the user login
home directory if that concept is meaningful; otherwise, it is some
other default directory (e.g. "C:\My Documents" on Windows 98).  This
can be redefined by modifying the c-client source code or in an
application via the SET_HOMEDIR mail_parameters() call.

For example, a mailbox named "project" is likely to be found in
the file "project" in the user's home directory.  Similarly, a mailbox
named "test/trial1" (assuming a UNIX system) is likely to be found in
the file "trial1" in the subdirectory "test" in the user's home
directory.

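The name-to-file mapping in the example above can be sketched as
follows.  This is an illustration only, not c-client code: the
function name mailbox_path and the decision to reject "INBOX" and
absolute names are assumptions made for the sketch.

```python
from pathlib import PurePosixPath


def mailbox_path(mail_home: str, mailbox: str) -> PurePosixPath:
    """Map a flat-file mailbox name to a file under the mail home directory.

    Hypothetical helper for illustration; assumes a UNIX-style hierarchy.
    """
    # "INBOX" has special semantics (see naming.txt), so this sketch
    # refuses it rather than guessing where it lives.
    if mailbox.upper() == "INBOX":
        raise ValueError("INBOX is resolved by separate rules")
    # Absolute names are outside the scope of this sketch.
    if mailbox.startswith("/"):
        raise ValueError("only names relative to the mail home are handled")
    return PurePosixPath(mail_home) / mailbox
```

With a mail home of "/home/fred", the name "test/trial1" maps to the
file "trial1" in the subdirectory "test" of that directory.
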
Note that the name "INBOX" has special semantics and rules, as
described in the file naming.txt.

The following flat-file formats are supported by c-client as of
the time of this writing:

. unix  This is the traditional UNIX mailbox format, in use for nearly
        30 years.  It uses a line starting with "From " to indicate
        start of message, and stores the message status inside the
        RFC822 message header.

        unix is not particularly efficient; the entire mailbox file
        must be read when the mailbox is opened, and when reading
        message texts it is necessary to convert the newline
        convention to Internet standard CR LF form.  unix preserves
        UIDs, and allows the creation of keywords.

        Only one process may have a unix-format mailbox open
        read/write at a time.

. mmdf  This is the format used by the MMDF mailer.  It uses a line
        consisting of 4 <CTRL/A> (0x01) characters to indicate start
        and end of message.  Optionally, there may also be a unix
        format "From " line.  It otherwise has the same
        characteristics as unix format.

. mbx   This is the current preferred mailbox format.  It can be
        handled quite efficiently by c-client, without the problems
        that exist with unix and mmdf formats.  Messages are stored
        in Internet standard CR LF format.

        mbx permits shared access, including shared expunge.  It
        preserves UIDs, and allows the creation of keywords.

. mtx   This is supported for compatibility with the past.  This is
        the old Tenex/TOPS-20 mail.txt format.  It can be handled
        quite efficiently by c-client, and has most of the
        characteristics of mbx format.

        mtx is deficient in that it does not support shared expunge;
        it has no means to store UIDs; and it has no way to define
        keywords except through an external configuration file.

. tenex This is supported for compatibility with the past.  This is
        the old Columbia MM format.  This is similar to mtx format,
        only it uses UNIX-style bare-LF newlines instead of CR LF
        newlines, thus incurring a performance penalty for newline
        conversion.

. phile This is not strictly a format.  Any file which is not in a
        recognized format is in phile format, which treats the entire
        contents of the file as a single message.

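To make the unix and mmdf message delimiters described above concrete,
here is a minimal sketch that splits raw mailbox data into messages.
It is not how c-client parses these formats: the function names are
hypothetical, the "From " line is not validated, and status headers,
UIDs, and locking are ignored.

```python
import io

# A line of four <CTRL/A> (0x01) characters, as described for mmdf.
MMDF_DELIM = b"\x01\x01\x01\x01\n"


def split_unix(data: bytes) -> list[bytes]:
    """Split unix-format data at every line starting with b'From '."""
    messages, current = [], None
    for line in io.BytesIO(data):          # yields LF-terminated lines
        if line.startswith(b"From "):
            if current is not None:
                messages.append(b"".join(current))
            current = [line]               # the "From " line starts a message
        elif current is not None:
            current.append(line)
    if current is not None:
        messages.append(b"".join(current))
    return messages


def split_mmdf(data: bytes) -> list[bytes]:
    """Split mmdf-format data; each message sits between two delimiter lines."""
    messages, current = [], None
    for line in io.BytesIO(data):
        if line == MMDF_DELIM:
            if current is None:
                current = []               # start-of-message delimiter
            else:
                messages.append(b"".join(current))
                current = None             # end-of-message delimiter
        elif current is not None:
            current.append(line)
    return messages
```

Note that both functions have to walk the whole file before the
number of messages is known, which is the inefficiency noted for
these two formats.
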
II. File/Message Formats

In these formats, a mailbox is a directory, and each of the
messages inside is a separate file inside the directory.  The file
names of these files are generally the text form of a number, which
also matches the UID of the message.

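The layout just described can be sketched as a directory scan that
treats every all-digit file name as a message UID.  This is an
illustration under stated assumptions, not c-client's code: the
function name list_message_uids is hypothetical, and any index or
status files in the directory are simply skipped.

```python
import os


def list_message_uids(mailbox_dir: str) -> list[int]:
    """Return the UIDs implied by numeric file names in a mailbox directory."""
    uids = []
    with os.scandir(mailbox_dir) as entries:
        for entry in entries:
            # Only plain files whose names are ASCII digits count as messages.
            if entry.name.isascii() and entry.name.isdigit() and entry.is_file():
                uids.append(int(entry.name))
    return sorted(uids)
```

Even this small sketch has to read the entire directory before it
can report how many messages the mailbox holds.
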
In the case of mx, the mailbox name is the name of the directory
in the filesystem, relative to the user's "mail home directory."  In
the case of news and mh, the mailbox name is in a separate namespace
as described in the file naming.txt.

A file/message format mailbox is always a directory.  This means
that it is possible to have a file/message format mailbox that has
inferior mailbox names under it (so-called "dual-usage" mailboxes).
For some inexplicable reason, some people want this.

Note that the name "INBOX" has special semantics and rules, as
described in the file naming.txt.

The following file/message formats are supported by c-client as of
the time of this writing:

. mx    This is an experimental format, and may be removed in a future
        release.  An mx format mailbox has a .mxindex file which holds
        the message status and unique identifiers.  Messages are
        stored in Internet standard CR LF form, so the file size of
        the message file equals the size of the message.

        mx is somewhat inefficient; the entire directory must be read
        and each file stat()'d.  We found it intolerable for a
        moderate sized mailbox (2000 messages) and have more or less
        abandoned it.

. mh    This is supported for compatibility with the past.  This is
        the format used by the old mh program.

        mh is very inefficient; the entire directory must be read
        and each file stat()'d, and in order to determine the size
        of a message, the entire file must be read and newline
        conversion performed.

        mh is deficient in that it does not support any permanent
        flags or keywords; and has no means to store UIDs (because
        the mh "compress" command renames all the files).

. news  This is an export of the local filesystem's news spool, e.g.
        /var/spool/news.  Access to mailboxes in news format is read
        only; however, message "deleted" status is preserved in a
        .newsrc file in the user's home directory.  There is no other
        status or keywords.

        news is very inefficient; the entire directory must be
        read and each file stat()'d, and in order to determine the
        size of a message, the entire file must be read and newline
        conversion performed.

        news is deficient in that it does not support permanent flags
        other than deleted; does not support keywords; and has no
        expunge.

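The difference in the cost of sizing a message, as described in the
list above, can be sketched as follows.  Both function names are
hypothetical and the code only illustrates the arithmetic; it is not
taken from c-client.

```python
import os


def stored_size(path: str) -> int:
    """Message already stored in CR LF form (as in mx): one stat() suffices."""
    return os.stat(path).st_size


def crlf_size(path: str) -> int:
    """Message stored with bare-LF newlines (as in mh or news).

    The whole file must be read to learn how many CRs the conversion adds.
    """
    with open(path, "rb") as f:
        data = f.read()
    # Every LF that is not already part of a CR LF pair gains one CR.
    bare_lf = data.count(b"\n") - data.count(b"\r\n")
    return len(data) + bare_lf
```

For a file containing "a", LF, "b", CR LF, "c", LF (7 bytes), the
converted size is 9 bytes because two bare LFs each gain a CR.
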
Soapbox on File/Message Formats

If it sounds from the above descriptions as though we're not
putting too much effort into file/message formats, you are correct.

There's a general reason why file/message formats are a bad idea.
Just about every filesystem in existence serializes file creations and
deletions because these manipulate the free space map.  This turns out
to be an enormous problem when you start creating/deleting more than a
few messages per second; you spend all your time thrashing in the
filesystem.

It is also extremely slow to do a text search through a
file/message format mailbox.  All of those open()s and close()s really
add up to major filesystem thrashing.

What about Cyrus and Maildir?

Both formats are vulnerable to the filesystem thrashing outlined
above.

The Cyrus format used by CMU's Cyrus server (and Esys' server)
has a special associated flat file in each directory that contains
extensive data (including pre-parsed ENVELOPEs and BODYSTRUCTUREs)
about the messages.  Put another way, it's a (considerably) more
featureful form of mx.  It also uses certain operating system
facilities (e.g. file/memory mapping) which are not available on older
systems, at a cost of much more limited portability than c-client.
These considerably ameliorate the fundamental problems with
file/message formats; in fact, Cyrus is halfway to being a database.
Rather than support Cyrus format in c-client, you should run Cyrus or
Esys if you want that format.

The Maildir format used by qmail has all of the performance
disadvantages of mh noted above, with the additional problem that the
files are renamed in order to change their status, so you end up having
to rescan the directory frequently to locate the current names
(particularly in a shared mailbox scenario).  It doesn't scale, and it
represents a support nightmare; it is therefore not supported in the
official distribution.  Maildir support code for c-client is available
from third parties; but, if you use it, it is entirely at your own
risk (read: don't complain about its bugs or how poorly it performs).

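The rename-to-change-status behavior of Maildir mentioned above can be
sketched as follows.  The ":2," suffix followed by flag letters in
ASCII order is the usual Maildir file naming convention rather than
anything defined in this document, and the function name is
hypothetical.

```python
def maildir_name_with_flag(filename: str, flag: str) -> str:
    """Return the file name a message would have after gaining a flag.

    Assumes the common Maildir convention of a ":2," suffix followed by
    flag letters kept in ASCII order.
    """
    base, _sep, flags = filename.partition(":2,")
    new_flags = "".join(sorted(set(flags) | {flag}))
    return f"{base}:2,{new_flags}"
```

Because the name itself changes whenever a flag changes, another
process holding the old name can no longer open the file and has to
rescan the directory to find the message again.
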
So what does this all mean?

A database (such as used by Exchange) is really a much better
approach if you want to move away from flat files.  mx and especially
Cyrus take a tentative step in that direction; mx failed mostly because
it didn't go anywhere near far enough.  Cyrus goes much further, and
scores remarkable benefits from doing so.

However, a well-designed pure database without the overhead of
separate files would do even better.