Portable file locking for R. Based on the “inter process” locks in the Python fasteners package.
The package provides a function, with_flock, which evaluates an expression only if a file lock can be obtained. By default it blocks, periodically retrying until the lock is obtained.
For example, suppose you want to place an advisory lock on a directory, path. We can create a lockfile in that directory and then do whatever we want there; other processes obeying this convention will wait:
path <- tempfile()
dir.create(path)
lockfile <- file.path(path, "lock")
realfile <- file.path(path, "db")
seagull::with_flock(lockfile, {
  prev <- if (file.exists(realfile)) readLines(realfile) else character(0)
  writeLines(c(prev, paste(Sys.getpid(), "wuz here")), realfile)
})
## NULL
readLines(realfile)
## [1] "8196 wuz here"
The code above waits until the lock is available, reads the db file, appends a line to it, then releases the lock. If multiple processes tried to do this at once they would access the file in an unspecified order, but the race condition between read and write is eliminated.
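To make the race concrete, here is a minimal single-process sketch in base R that interleaves two writers by hand. It does not use seagull; the writers "A" and "B" and the helper read_db are hypothetical names for illustration only. The second half simply runs the two read-then-write steps one after the other, which is the ordering an exclusive lock would enforce.

```r
# Illustrative only: two hypothetical writers, "A" and "B", sharing one file.
realfile <- tempfile()

read_db <- function(path) {
  if (file.exists(path)) readLines(path) else character(0)
}

# Unsafe interleaving: both writers read before either one writes.
seen_by_a <- read_db(realfile)
seen_by_b <- read_db(realfile)
writeLines(c(seen_by_a, "A wuz here"), realfile)
writeLines(c(seen_by_b, "B wuz here"), realfile)  # overwrites A's line
unlocked <- readLines(realfile)

# Serialised: each writer completes read-then-write before the next starts,
# as it would if both steps ran while holding an exclusive lock.
unlink(realfile)
for (who in c("A", "B")) {
  writeLines(c(read_db(realfile), paste(who, "wuz here")), realfile)
}
locked <- readLines(realfile)
```

In the interleaved case one writer's line is lost; in the serialised case both lines survive.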
To try this with several real processes, start a four-node cluster and load the package on each node:
cl <- parallel::makeCluster(4, "PSOCK")
ign <- parallel::clusterEvalQ(cl, library(seagull))
pids <- unlist(parallel::clusterCall(cl, Sys.getpid))
pids
## [1] 8204 8212 8220 8228
Run the code from above (slightly awkward due to controlling the cluster):
f <- function(lockfile, realfile) {
  seagull::with_flock(lockfile, {
    prev <- if (file.exists(realfile)) readLines(realfile) else character(0)
    writeLines(c(prev, paste(Sys.getpid(), "wuz here")), realfile)
  })
}
ign <- parallel::clusterCall(cl, f, lockfile, realfile)
The file now contains four entries, one from each node (plus the original line from the time we ran it locally):
writeLines(readLines(realfile))
## 8196 wuz here
## 8204 wuz here
## 8212 wuz here
## 8228 wuz here
## 8220 wuz here
Note also that the order of the lines written is not the same as the order of the PIDs. Because each process polls for the lock, the order in which they gain access is undefined.
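The polling behaviour can be sketched in base R. This is not seagull's implementation: as a stand-in lock primitive it uses directory creation (dir.create() returns FALSE when the directory already exists), and the function names and the delay, max_delay and timeout arguments are hypothetical. It only illustrates why a waiting process gets in whenever its next retry happens to land after a release, rather than in arrival order.

```r
# Hypothetical helper, not part of seagull. A directory stands in for the
# lock: whoever manages to create it holds the lock.
acquire_by_polling <- function(lockdir, delay = 0.01, max_delay = 0.1,
                               timeout = 1) {
  deadline <- Sys.time() + timeout
  repeat {
    if (dir.create(lockdir, showWarnings = FALSE)) {
      return(TRUE)                        # we created it, so we hold the lock
    }
    if (Sys.time() >= deadline) {
      return(FALSE)                       # gave up waiting
    }
    Sys.sleep(delay)
    delay <- min(delay * 2, max_delay)    # back off, up to a ceiling
  }
}

release <- function(lockdir) unlink(lockdir, recursive = TRUE)

lockdir <- tempfile("lock")
first  <- acquire_by_polling(lockdir)                 # lock is free
second <- acquire_by_polling(lockdir, timeout = 0.2)  # already held, times out
release(lockdir)
third  <- acquire_by_polling(lockdir)                 # free again
release(lockdir)
```

There is no queue here: each waiter sleeps and retries independently, so nothing ties the order of success to the order of arrival.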
Portability. On Unix systems this uses fcntl, which should work on Linux and BSD-based systems, modern NFS and SMB. On Windows it uses the Win32 API (_locking from sys/locking.h).
This package does not use R's connection objects. A sane native implementation of file locking would use the result of file(...), but because of the way R's connection objects are implemented this is not possible (the actual file descriptor object is stored in the private data of the connection object and the format of that is non-API).
Unfortunately, because of the way Unix file locking is implemented, the lock is broken if any connection to the locked file is closed. So it is in general unsafe to open a second connection to the locked file. A possibly safe pattern would be:
The lock will be lost at step 4 above but that should not matter as seagull never writes to the files it holds. Note that this situation applies to things like writeLines, write.csv, etc; those functions encompass steps 2-4 in the above.
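As an illustration of why convenience functions are affected, the sketch below assumes that steps 2-4 are opening a connection, writing to it, and closing it. When writeLines is given a file path rather than a connection, it opens and closes a connection of its own, so the two versions produce the same file. The variable names are illustrative, and no locking is involved.

```r
text <- c("first line", "second line")

# Explicit version: open a connection, write, then close it.
explicit <- tempfile()
con <- file(explicit, open = "w")
writeLines(text, con)
close(con)

# Convenience version: given a path, writeLines() opens a connection for the
# duration of the call and closes it again before returning.
implicit <- tempfile()
writeLines(text, implicit)

same <- identical(readLines(explicit), readLines(implicit))
```

Either way a connection to the file is closed at the end, which is the point at which a lock held on that same file would be dropped on Unix.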
Future versions may implement the full connection interface to allow passing the locked file around as an R connection.
devtools::install_github("richfitz/seagull")
MIT + file LICENSE © Rich FitzJohn.