Source from upstream; imap-2007f.tar.gz
MD5 2126fd125ea26b73b20f01fcd5940369
417
docs/locking.txt
Normal file
@@ -0,0 +1,417 @@
/* ========================================================================
 * Copyright 1988-2006 University of Washington
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 *
 * ========================================================================
 */

UNIX Advisory File Locking Implications on c-client
Mark Crispin, 28 November 1995


THIS DOCUMENT HAS BEEN UPDATED TO REFLECT THE FACT THAT
LINUX SUPPORTS BOTH flock() AND fcntl() AND THAT OSF/1
HAS BEEN BROKEN SO THAT IT ONLY SUPPORTS fcntl().
-- JUNE 15, 2004

THIS DOCUMENT HAS BEEN UPDATED TO REFLECT THE CODE IN THE
IMAP-4 TOOLKIT AS OF NOVEMBER 28, 1995. SOME STATEMENTS
IN THIS DOCUMENT DO NOT APPLY TO EARLIER VERSIONS OF THE
IMAP TOOLKIT.

INTRODUCTION

Advisory locking is a mechanism by which cooperating processes
can signal to each other their usage of a resource and whether or not
that usage is critical. It is not a mechanism to protect against
processes which do not cooperate in the locking.

The most basic form of locking involves a counter. This counter
is -1 when the resource is available. If a process wants the lock, it
executes an atomic increment-and-test-if-zero. If the value is zero,
the process has the lock and can execute the critical code that needs
exclusive usage of a resource. When it is finished, it sets the lock
back to -1. In C terms:

    while (++lock)              /* try to get lock */
      invoke_other_threads ();  /* failed, try again */
     .
     .                          /* critical code here */
     .
    lock = -1;                  /* release lock */

This particular form of locking appears most commonly in
multi-threaded applications such as operating system kernels. It
makes several presumptions:
    (1) it is all right to keep testing the lock (no overflow)
    (2) the critical resource is single-access only
    (3) there is shared writeable memory between the two threads
    (4) the threads can be trusted to release the lock when finished

In applications programming on multi-user systems, most commonly
the other threads are in an entirely different process, which may even
be logged in as a different user. Few operating systems offer shared
writeable memory between such processes.

A means of communicating this is by use of a file with a mutually
agreed upon name. A binary semaphore can be passed by means of the
existence or non-existence of that file, provided that there is an
atomic means to create a file if and only if that file does not exist.
In C terms:

                                /* try to get lock */
    while ((fd = open ("lockfile",O_WRONLY|O_CREAT|O_EXCL,0666)) < 0)
      sleep (1);                /* failed, try again */
    close (fd);                 /* got the lock */
     .
     .                          /* critical code here */
     .
    unlink ("lockfile");        /* release lock */

This form of locking makes fewer presumptions, but it still is
guilty of presumptions (2) and (4) above. Presumption (2) limits the
ability to have processes sharing a resource in a non-conflicting
fashion (e.g. reading from a file). Presumption (4) leads to
deadlocks should the process crash while it has a resource locked.

Most modern operating systems provide a resource locking system
call that has none of these presumptions. In particular, a mechanism
is provided for identifying shared locks as opposed to exclusive
locks. A shared lock permits other processes to obtain a shared lock,
but denies exclusive locks. In other words:

    current state           want shared     want exclusive
    -------------           -----------     --------------
    unlocked                YES             YES
    locked shared           YES             NO
    locked exclusive        NO              NO

Furthermore, the operating system automatically relinquishes all
locks held by that process when it terminates.

A useful operation is the ability to upgrade a shared lock to
exclusive (provided there are no other shared users of the lock) and
to downgrade an exclusive lock to shared. It is important that at no
time is the lock ever removed; a process upgrading to exclusive must
not relinquish its shared lock.

Most commonly, the resources being locked are files. Shared
locks are particularly important with files; multiple simultaneous
processes can read from a file, but only one can safely write at a
time. Some writes may be safer than others; an append to the end of
the file is safer than changing existing file data. In turn, changing
a file record in place is safer than rewriting the file with an
entirely different structure.


FILE LOCKING ON UNIX

In the oldest versions of UNIX, the use of a semaphore lockfile
was the only available form of locking. Advisory locking system calls
were not added to UNIX until after the BSD vs. System V split. The
two resulting system calls, flock() on BSD and fcntl() on System V,
both deal with file resources only.

Most systems only have one or the other form of locking. AIX
and newer versions of OSF/1 emulate the BSD form of locking as a jacket
into the System V form. Ultrix and Linux implement both forms.

BSD

BSD added the flock() system call. It offers capabilities to
acquire a shared lock, acquire an exclusive lock, and unlock.
Optionally, the process can request an immediate error return instead
of blocking when the lock is unavailable.


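The flock() operations just described, checked against the
shared/exclusive table from the introduction, can be sketched as
follows. This is an illustration only: flock_table_holds is a
hypothetical helper, and it assumes a local (non-NFS) file on a system
with a native flock(), where two separate open() calls on one file
contend for the lock just as two processes would.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Open the same path twice and walk through the lock table.
 * Returns 1 if flock() behaved as the table says, 0 if it did not,
 * and -1 if the file could not be opened.
 */
static int flock_table_holds(const char *path)
{
    int a, b, ok = -1;

    assert(path != NULL);
    a = open(path, O_RDWR | O_CREAT, 0600);
    b = open(path, O_RDONLY);
    if (a >= 0 && b >= 0) {
        ok = flock(a, LOCK_SH | LOCK_NB) == 0    /* unlocked: shared OK  */
          && flock(b, LOCK_SH | LOCK_NB) == 0    /* shared: shared OK    */
          && flock(b, LOCK_UN) == 0
          && flock(b, LOCK_EX | LOCK_NB) == -1   /* shared: no exclusive */
          && errno == EWOULDBLOCK;               /* immediate error      */
    }
    if (a >= 0) close(a);   /* closing a descriptor releases its lock */
    if (b >= 0) close(b);
    return ok;
}
```

LOCK_NB is what selects the immediate error return; without it the
exclusive request on the second descriptor would block until the
shared lock on the first one is released.
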
FLOCK() BUGS

flock() advertises that it permits upgrading of shared locks to
exclusive and downgrading of exclusive locks to shared, but it does so
by releasing the former lock and then trying to acquire the new lock.
This creates a window of vulnerability in which another process can
grab the exclusive lock. Therefore, this capability is not useful,
although many programmers have been deluded by incautious reading of
the flock() man page to believe otherwise. This problem can be
programmed around, once the programmer is aware of it.

flock() always returns as if it succeeded on NFS files, when in
fact it is a no-op. There is no way around this.

Leaving aside these two problems, flock() works remarkably well,
and has shown itself to be robust and trustworthy.

SYSTEM V/POSIX

System V added new functions to the fcntl() system call, and a
simple interface through the lockf() subroutine. This was
subsequently included in POSIX. Both offer the facility to apply the
lock to a particular region of the file instead of to the entire file.
lockf() only supports exclusive locks, and calls fcntl() internally;
hence it won't be discussed further.

Functionally, fcntl() locking is a superset of flock(); it is
possible to implement a flock() emulator using fcntl(), with one minor
exception: it is not possible to acquire an exclusive lock if the file
is not open for write.

The fcntl() locking functions are: query the lock status of a
file region, lock/unlock a region, and lock/unlock a region and block
until the lock is obtained. The locks may be shared or exclusive. By
means of the statd and lockd daemons, fcntl() locking is available on
NFS files.

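A flock()-style wrapper over these fcntl() functions might look like
the sketch below. It is not c-client's emulator: emulated_flock is a
hypothetical name, the LOCK_* constants are assumed to be available
from <sys/file.h>, and the NFS handling discussed later is left out.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>   /* LOCK_SH, LOCK_EX, LOCK_UN, LOCK_NB */
#include <unistd.h>

/* Hypothetical flock()-style wrapper over fcntl() whole-file locks. */
static int emulated_flock(int fd, int op)
{
    struct flock fl = {0};

    assert(fd >= 0);
    switch (op & ~LOCK_NB) {
    case LOCK_SH: fl.l_type = F_RDLCK; break;   /* shared            */
    case LOCK_EX: fl.l_type = F_WRLCK; break;   /* needs write access */
    case LOCK_UN: fl.l_type = F_UNLCK; break;
    default:
        errno = EINVAL;
        return -1;
    }
    fl.l_whence = SEEK_SET;     /* region starts at offset 0 ...     */
    fl.l_start = 0;
    fl.l_len = 0;               /* ... and length 0 means "to EOF"   */
    /* F_SETLK returns at once if the lock is held; F_SETLKW blocks. */
    return fcntl(fd, (op & LOCK_NB) ? F_SETLK : F_SETLKW, &fl);
}
```

The one exception noted above shows up directly: requesting LOCK_EX
through a descriptor opened read-only fails with EBADF, where a native
flock() would accept it.
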
When statd is started at system boot, it reads its /etc/state
file (which contains the number of times it has been invoked) and
/etc/sm directory (which contains a list of all remote sites which are
client or server locking with this site), and notifies the statd on
each of these systems that it has been restarted. Each statd then
notifies the local lockd of the restart of that system.

lockd receives fcntl() requests for NFS files. It communicates
with the lockd at the server and requests it to apply the lock, and
with the statd to request it for notification when the server goes
down. It blocks until all these requests are completed.

There is quite a mythos about fcntl() locking.

One religion holds that fcntl() locking is the best thing since
sliced bread, and that programs which use flock() should be converted
to fcntl() so that NFS locking will work. However, as noted above,
very few systems support both calls, so such an exercise is pointless
except on Ultrix and Linux.

Another religion, which I adhere to, has the opposite viewpoint.


FCNTL() BUGS

For all of the hairy code to do individual section locking of a
file, it's clear that the designers of fcntl() locking never
considered some very basic locking operations. It's as if all they
knew about locking they got out of some CS textbook with no
investigation of real-world needs.

It is not possible to acquire an exclusive lock unless the file
is open for write. You could have append with shared read, and thus
you could have a case in which a read-only access may need to go
exclusive. This problem can be programmed around once the programmer
is aware of it.

If the file is opened on another file designator in the same
process, the file is unlocked even if no attempt is made to do any
form of locking on the second designator. This is a very bad bug. It
means that an application must keep track of all the files that it has
opened and locked.

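The second-designator problem can be observed with a short experiment.
The sketch below is an assumption-laden illustration, not c-client
code: all three function names are hypothetical, and it relies on the
POSIX rule that a process's fcntl() locks on a file are dropped when
any of its descriptors for that file is closed. A forked child is
used as the outside observer, since F_GETLK only reports locks held by
other processes.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Describe a whole-file lock of the given type. */
static struct flock whole_file(short type)
{
    struct flock fl = {0};

    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 = through end of file */
    return fl;
}

/* Fork a child and ask, via F_GETLK, whether it would be blocked:
   1 = locked by another process, 0 = not locked, -1 = error. */
static int seen_locked(const char *path)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct flock fl = whole_file(F_WRLCK);
        int fd = open(path, O_RDWR);

        if (fd < 0 || fcntl(fd, F_GETLK, &fl) < 0)
            _exit(2);
        _exit(fl.l_type == F_UNLCK ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 2 ? -1 : WEXITSTATUS(status);
}

/* Returns 1 if opening and closing a second descriptor, with no
   locking calls on it, dropped the lock held through the first. */
static int second_close_drops_lock(const char *path)
{
    struct flock fl = whole_file(F_WRLCK);
    int fd1, fd2, before, after;

    assert(path != NULL);
    if ((fd1 = open(path, O_RDWR | O_CREAT, 0600)) < 0)
        return -1;
    if (fcntl(fd1, F_SETLK, &fl) < 0) {
        close(fd1);
        return -1;
    }
    before = seen_locked(path);
    if ((fd2 = open(path, O_RDONLY)) >= 0)
        close(fd2);             /* no locking was done on fd2 */
    after = seen_locked(path);
    close(fd1);
    return before == 1 && after == 0;
}
```

This is why any library routine that opens and closes the mail file on
its own, for example to check its size, can silently destroy a lock
that the application believes it still holds.
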
If there is no statd/lockd on the NFS server, fcntl() will hang
forever waiting for them to appear. This is a bad bug. It means that
any attempt to lock on a server that doesn't run these daemons will
hang. There is no way for an application to request flock() style
``try to lock, but no-op if the mechanism ain't there''.

There is a rumor to the effect that fcntl() will hang forever on
local files too if there is no local statd/lockd. These daemons are
running on mailer.u, although they appear not to have much CPU time.
A useful experiment would be to kill them and see if imapd is affected
in any way, but I decline to do so without an OK from UCS! ;-) If
killing statd/lockd can be done without breaking fcntl() on local
files, this would become one of the primary means of dealing with this
problem.

The statd and lockd daemons have quite a reputation for extreme
fragility. There have been numerous reports about the locking
mechanism being wedged on a systemwide or even clusterwide basis,
requiring a reboot to clear. It is rumored that this wedge, once it
happens, also blocks local locking. Presumably killing and restarting
statd would suffice to clear the wedge, but I haven't verified this.

There appears to be a limit to how many locks may be in use at a
time on the system, although the documentation only mentions it in
passing. On some of their systems, UCS has increased lockd's ``size
of the socket buffer'', whatever that means.

C-CLIENT USAGE

c-client uses flock(). On System V systems, flock() is simulated
by an emulator that calls fcntl().


BEZERK AND MMDF

Locking in the traditional UNIX formats was largely dictated by
the status quo in other applications; however, additional protection
is added against inadvertently running multiple instances of a
c-client application on the same mail file.

(1) c-client attempts to create a .lock file (mail file name with
``.lock'' appended) whenever it reads from, or writes to, the mail
file. This is an exclusive lock, and is held only for short periods
of time while c-client is actually doing the I/O. There is a 5-minute
timeout for this lock, after which it is broken on the presumption
that it is a stale lock. If it can not create the .lock file due to
an EACCES (protection failure) error, it once silently proceeded
without this lock; this was for systems which protect /usr/spool/mail
from unprivileged processes creating files. Today, c-client reports
an error unless it is built otherwise. The purpose of this lock is to
protect against unfavorable interactions with mail delivery.

(2) c-client applies a shared flock() to the mail file whenever
it reads from the mail file, and an exclusive flock() whenever it
writes to the mail file. This lock is freed as soon as it finishes
reading. The purpose of this lock is to protect against unfavorable
interactions with mail delivery.

(3) c-client applies an exclusive flock() to a file on /tmp
(whose name represents the device and inode number of the file) when
it opens the mail file. This lock is maintained throughout the
session, although c-client has a feature (called ``kiss of death'')
which permits c-client to forcibly and irreversibly seize the lock
from a cooperating c-client application that surrenders the lock on
demand. The purpose of this lock is to protect against unfavorable
interactions with other instances of c-client (rewriting the mail
file).

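Lock (3) can be sketched as below. This is an illustration under
stated assumptions rather than c-client's code: session_lock is a
hypothetical name, the "/tmp/.<dev>.<inode>" naming pattern is assumed
for the example, and the ``kiss of death'' handshake is not shown.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Take the session lock for the mail file `path`.  The lock file name
 * is written to `name`.  Returns a descriptor that holds the lock
 * until it is closed, or -1 if the lock is unavailable.
 * NOTE: real code must also guard against symbolic links planted
 * under predictable /tmp names; this sketch does not.
 */
static int session_lock(const char *path, char *name, size_t namelen)
{
    struct stat sb;
    int fd;

    assert(path != NULL && name != NULL);
    if (stat(path, &sb) < 0)
        return -1;
    /* Name the lock after the device and inode, not the path, so
       that every link to the same mail file maps to one lock. */
    snprintf(name, namelen, "/tmp/.%lx.%lx",
             (unsigned long) sb.st_dev, (unsigned long) sb.st_ino);
    fd = open(name, O_RDWR | O_CREAT, 0666);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) < 0) {   /* another session has it */
        close(fd);
        return -1;
    }
    return fd;
}
```

Keeping the returned descriptor open for the whole session is what
maintains the lock; closing it, or exiting, releases it.
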
Mail delivery daemons use lock (1), (2), or both. Lock (1) works
over NFS; lock (2) is the only one that works on sites that protect
/usr/spool/mail against unprivileged file creation. Prudent mail
delivery daemons use both forms of locking, and of course so does
c-client.

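Lock (1), the .lock file with its 5-minute stale timeout, might be
sketched as follows. The names dotlock_try, dotlock_release, and
STALE_SECONDS are hypothetical, and the stale-lock test is simplified:
two processes that both decide a lock is stale can still race each
other, which production code has to handle.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

#define STALE_SECONDS (5 * 60)  /* the 5-minute timeout from the text */

/*
 * Try once to create "<mailbox>.lock".  Returns 0 if we now hold the
 * lock, 1 if someone else holds it, and -1 on any other error (for
 * example EACCES when the spool directory is protected).
 */
static int dotlock_try(const char *mailbox, char *lockname, size_t len)
{
    struct stat sb;
    int fd;

    assert(mailbox != NULL && lockname != NULL);
    if (snprintf(lockname, len, "%s.lock", mailbox) >= (int) len) {
        errno = ENAMETOOLONG;
        return -1;
    }
    /* Break a lock older than the timeout, presuming it is stale. */
    if (stat(lockname, &sb) == 0 &&
        time(NULL) - sb.st_mtime > STALE_SECONDS)
        unlink(lockname);
    /* O_EXCL makes creation fail if the file already exists. */
    fd = open(lockname, O_WRONLY | O_CREAT | O_EXCL, 0666);
    if (fd >= 0) {
        close(fd);
        return 0;
    }
    return errno == EEXIST ? 1 : -1;
}

static void dotlock_release(const char *lockname)
{
    unlink(lockname);
}
```

A caller would retry dotlock_try() with a short sleep while it returns
1, and treat -1 with errno set to EACCES as the protected-spool case
described under lock (1).
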
If only lock (2) is used, then multiple processes can read from
the mail file simultaneously, although in real life this doesn't
really change things. The normal state of locks (1) and (2) is
unlocked except for very brief periods.


TENEX AND MTX

The design of the locking mechanism of these formats was
motivated by a desire to enable multiple simultaneous read/write
access. It is almost the reverse of how locking works with
bezerk/mmdf.

(1) c-client applies a shared flock() to the mail file when it
opens the mail file. It upgrades this lock to exclusive whenever it
tries to expunge the mail file. Because of the flock() bug that
upgrading a lock actually releases it, it will not do so until it has
acquired an exclusive lock (2) first. The purpose of this lock is to
protect against expunge taking place while some other c-client has the
mail file open (and thus knows where all the messages are).

(2) c-client applies a shared flock() to a file on /tmp (whose
name represents the device and inode number of the file) when it
parses the mail file. It applies an exclusive flock() to this file
when it appends new mail to the mail file, as well as before it
attempts to upgrade lock (1) to exclusive. The purpose of this lock
is to protect against data being appended while some other c-client is
parsing mail in the file (to prevent reading of incomplete messages).
It also protects against the lock-releasing timing race on lock (1).

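The interplay of locks (1) and (2) during an expunge can be sketched
as below. This is a hedged illustration, not the c-client source:
guarded_upgrade is a hypothetical name, and both descriptors are
assumed to be already open, with mailfd holding the shared lock (1)
and guardfd referring to the /tmp file used for lock (2).

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Upgrade the shared lock on mailfd to exclusive while holding an
 * exclusive lock on guardfd, so that nobody who honours the guard can
 * slip in while flock() momentarily drops the old lock.
 * Returns 0 with both locks held exclusively.  Returns -1 if the
 * upgrade was refused; mailfd is then shared again and the guard
 * has been released.
 */
static int guarded_upgrade(int mailfd, int guardfd)
{
    assert(mailfd >= 0 && guardfd >= 0);
    if (flock(guardfd, LOCK_EX) < 0)            /* lock (2) first        */
        return -1;
    if (flock(mailfd, LOCK_EX | LOCK_NB) == 0)  /* then upgrade lock (1) */
        return 0;
    /* The failed attempt may have dropped our shared lock; re-take it. */
    flock(mailfd, LOCK_SH);
    flock(guardfd, LOCK_UN);
    return -1;
}
```

A refusal means some other process still shares the mail file, which
is exactly the case in which an expunge must not proceed.
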
OBSERVATIONS

In a perfect world, locking works. You are protected against
unfavorable interactions with the mailer and against your own mistake
by running more than one instance of your mail reader. In tenex/mtx
formats, you have the additional benefit that multiple simultaneous
read/write access works, with the sole restriction being that you
can't expunge if there are any sharers of the mail file.

If the mail file is NFS-mounted, then flock() locking is a silent
no-op. This is the way BSD implements flock(), and c-client's
emulation of flock() through fcntl() tests for NFS files and
duplicates this functionality. In that case there is no locking
protection for tenex/mtx mail files at all, and only protection
against the mailer for bezerk/mmdf mail files. This has been the
accepted state of affairs on UNIX for many sad years.

If you can not create .lock files, it should not affect locking,
since the flock() locks suffice for all protection. This is, however,
not true if the mailer does not check for flock() locking, or if the
mail file is NFS-mounted.

What this means is that there is *no* locking protection at all
in the case of a client using an NFS-mounted /usr/spool/mail that does
not permit file creation by unprivileged programs. It is impossible,
under these circumstances, for an unprivileged program to do anything
about it. Worse, if EACCES errors on .lock file creation are
no-op'ed, the user won't even know about it. This is arguably a site
configuration error.

The problem with not being able to create .lock files exists on
System V as well, but the failure modes for flock() -- which is
implemented via fcntl() -- are different.

On System V, if the mail file is NFS-mounted and either the
client or the server lacks a functioning statd/lockd pair, then the
lock attempt would have hung forever if it weren't for the fact that
c-client tests for NFS and no-ops the flock() emulator in this case.
Systemwide or clusterwide failures of statd/lockd have been known to
occur which cause all locks in all processes to hang (including
local?). Without the special NFS test made by c-client, there would
be no way to request BSD-style no-op behavior, nor is there any way to
determine that this is happening other than the system being hung.

The additional locking introduced by c-client was shown to cause
much more stress on the System V locking mechanism than has
traditionally been placed upon it. If it was stressed too far, all
hell broke loose. Fortunately, this is now past history.

TRADEOFFS

c-client based applications have a reasonable chance of winning
as long as you don't use NFS for remote access to mail files. That's
what IMAP is for, after all. It is, however, very important to
realize that you can *not* use the lock-upgrade feature by itself
because it releases the lock as an interim step -- you need to have
lock-upgrading guarded by another lock.

If you have the misfortune of using System V, you are likely to
run into problems sooner or later having to do with statd/lockd. You
basically end up with one of five unsatisfactory choices:
    1) Grit your teeth and live with it.
    2) Try to make it work:
       a) avoid NFS access so as not to stress statd/lockd.
       b) try to understand the code in statd/lockd and hack it
          to be more robust.
       c) hunt out the system limit of locks, if there is one,
          and increase it. Figure on at least two locks per
          simultaneous imapd process and four locks per Pine
          process. Better yet, make the limit be 10 times the
          maximum number of processes.
       d) increase the socket buffer (-S switch to lockd) if
          it is offered. I don't know what this actually does,
          but giving lockd more resources to do its work can't
          hurt. Maybe.
    3) Decide that it can't possibly work, and turn off the
       fcntl() calls in your program.
    4) If nuking statd/lockd can be done without breaking local
       locking, then do so. This would make SVR4 have the same
       limitations as BSD locking, with a couple of additional
       bugs.
    5) Check for NFS, and don't do the fcntl() in the NFS case.
       This is what c-client does.

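Choice (5) can be sketched as follows. This is a Linux-specific
illustration and an assumption throughout, not c-client's actual test:
on_nfs, flock_unless_nfs, and open_locked are hypothetical names, the
check uses fstatfs() and the NFS filesystem magic number, and flock()
stands in for whatever lock call (native or fcntl()-based) is in use.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/file.h>
#include <sys/vfs.h>    /* fstatfs(); Linux-specific */
#include <unistd.h>

#ifndef NFS_SUPER_MAGIC
#define NFS_SUPER_MAGIC 0x6969  /* value from <linux/magic.h> */
#endif

/* Returns 1 if fd is on NFS, 0 if not, -1 if the query failed. */
static int on_nfs(int fd)
{
    struct statfs sfb;

    assert(fd >= 0);
    if (fstatfs(fd, &sfb) < 0)
        return -1;
    return sfb.f_type == NFS_SUPER_MAGIC;
}

/* Lock unless the file is on NFS, in which case succeed as a no-op
   rather than risk hanging on a missing statd/lockd. */
static int flock_unless_nfs(int fd, int op)
{
    if (on_nfs(fd) == 1)
        return 0;
    return flock(fd, op);       /* or an fcntl()-based emulator */
}

/* Hypothetical convenience wrapper: open path and apply the lock. */
static int open_locked(const char *path, int op)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);

    if (fd >= 0 && flock_unless_nfs(fd, op) < 0) {
        close(fd);
        fd = -1;
    }
    return fd;
}
```

The price of this choice is the one already described under
OBSERVATIONS: on NFS the call reports success while providing no
protection at all.
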
Note that if you are going to use NFS to access files on a server
which does not have statd/lockd running, your only choices are (3),
(4), or (5). Here again, IMAP can bail you out.

These problems aren't unique to c-client applications; they have
also been reported with Elm, Mediamail, and other email tools.

Of the other two SVR4 locking bugs:

Programmer awareness is necessary to deal with the bug that you
can not get an exclusive lock unless the file is open for write. I
believe that c-client has fixed all of these cases.

The problem about opening a second designator smashing any
current locks on the file has not been addressed satisfactorily yet.
This is not an easy problem to deal with, especially in c-client which
really doesn't know what other files/streams may be open by Pine.

Aren't you so happy that you bought a System V system?