Create a mdb_env "environment" object. This is the object that
interacts with an lmdb database; once created, it provides
methods for querying the environment, creating databases, starting
transactions and (through those) adding, getting and removing
data. This page contains reference documentation for the
object; readers are first directed to the vignette
(vignette("thor")).
Usage
mdb_env(
path,
mode = as.octmode("644"),
subdir = TRUE,
readonly = FALSE,
metasync = TRUE,
sync = TRUE,
writemap = FALSE,
lock = TRUE,
mapasync = FALSE,
rdahead = TRUE,
meminit = TRUE,
maxdbs = NULL,
maxreaders = NULL,
mapsize = NULL,
reversekey = FALSE,
create = TRUE
)
Arguments
- path
The directory in which the database files will reside. If create is TRUE this path will be created for you if it does not exist (in contrast with the lmdb C API). If subdir is FALSE this is the path to the database file, and an additional lock file will be created by appending "-lock" to path.
- mode
The file mode (UNIX file permissions) to set on created files. This must be an octmode object; the default (as.octmode("644")) is user-writeable and world-readable.
- subdir
By default, lmdb creates its files within a directory (at path). If subdir = FALSE then path is interpreted as the path to the main database file, and a lock file will be created with "-lock" appended to the filename. Passing subdir = FALSE is equivalent to lmdb's MDB_NOSUBDIR flag.
- readonly
Open the environment in read-only mode. No write operations are allowed, although LMDB will still modify the lock file. Passing readonly = TRUE is equivalent to lmdb's MDB_RDONLY flag. If you want to modify nothing on disk, also pass lock = FALSE (but beware that concurrent access may not go to plan).
- metasync
If FALSE, flush system buffers to disk only once per transaction, omitting the metadata flush. Defer that until the system flushes files to disk, the next commit, or the next call to the $sync() method. This optimization maintains database integrity, but a system crash may undo the last committed transaction. That is, it preserves the A, C and I (atomicity, consistency, isolation) properties but not the D (durability) database property. Passing metasync = FALSE is equivalent to lmdb's MDB_NOMETASYNC flag.
- sync
If FALSE, don't flush system buffers to disk when committing a transaction. This optimization means a system crash can corrupt the database or lose the last transactions if buffers are not yet flushed to disk. The risk is governed by how often the system flushes dirty buffers to disk and how often the $sync() method is called. However, if the filesystem preserves write order and writemap = FALSE, transactions exhibit ACI (atomicity, consistency, isolation) properties and only lose D (durability). That is, database integrity is maintained, but a system crash may undo the final transactions. Note that sync = FALSE, writemap = TRUE leaves the system with no hint for when to write transactions to disk, unless $sync() is called; mapasync = TRUE, writemap = TRUE may be preferable. Passing sync = FALSE is equivalent to lmdb's MDB_NOSYNC flag.
- writemap
If TRUE, use a writeable memory map unless readonly = TRUE is set. This uses fewer mallocs but loses protection from application bugs like wild pointer writes and other bad updates into the database. This may be slightly faster for databases that fit entirely in RAM, but is slower for databases larger than RAM. It is incompatible with nested transactions. Do not mix processes with writemap = TRUE and writemap = FALSE on the same environment; this can defeat durability ($sync() etc). Passing writemap = TRUE is equivalent to lmdb's MDB_WRITEMAP flag.
- lock
If FALSE, don't do any locking. If concurrent access is anticipated, the caller must manage all concurrency itself. For proper operation the caller must enforce single-writer semantics, and must ensure that no readers are using old transactions while a writer is active. The simplest approach is to use an exclusive lock so that no readers may be active at all when a writer begins. Passing lock = FALSE is equivalent to lmdb's MDB_NOLOCK flag.
- mapasync
If TRUE, when using writemap = TRUE, use asynchronous flushes to disk. As with sync = FALSE, a system crash can then corrupt the database or lose the last transactions. Calling $sync() ensures on-disk database integrity until the next commit. Passing mapasync = TRUE is equivalent to lmdb's MDB_MAPASYNC flag.
- rdahead
If FALSE, turn off readahead. Most operating systems perform readahead on read requests by default; this option turns it off if the OS supports it. Turning it off may help random read performance when the DB is larger than RAM and system RAM is full. rdahead = FALSE is not implemented on Windows. Passing rdahead = FALSE is equivalent to lmdb's MDB_NORDAHEAD flag.
- meminit
If FALSE, don't initialize malloc'd memory before writing to unused spaces in the data file. By default, memory for pages written to the data file is obtained using malloc. While these pages may be reused in subsequent transactions, freshly malloc'd pages will be initialized to zeroes before use. This avoids persisting leftover data from other code (that used the heap and subsequently freed the memory) into the data file. Note that many other system libraries may allocate and free memory from the heap for arbitrary uses; e.g., stdio may use the heap for file I/O buffers. This initialization step has a modest performance cost, so some applications may want to disable it using this flag. This option can be a problem for applications which handle sensitive data like passwords, and it makes memory checkers like Valgrind noisy. This flag is not needed with writemap = TRUE, which writes directly to the mmap instead of using malloc for pages. Passing meminit = FALSE is equivalent to lmdb's MDB_NOMEMINIT flag.
- maxdbs
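The maximum number of named databases available within the environment. If 0 (lmdb's default, which applies when maxdbs = NULL), the environment holds just one database (the main, unnamed db). To use named databases this must be set to at least the number of named databases that will be opened.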
- maxreaders
Maximum number of simultaneous read transactions. Can only be set in the first process to open an environment.
- mapsize
Maximum size the database may grow to; used to size the memory mapping. This is measured in bytes, and the default (as set in lmdb) is only 1MB (2^20 bytes). If the database grows larger than mapsize, an error will be thrown and the user must close and reopen the mdb_env with a larger mapsize. On 64-bit systems there is no penalty for making this huge (say 1TB); on 32-bit systems it must be less than 2GB. Increasing this may cause your operating system to report the disk as being used while your database is open, though this is just the amount reserved.
- reversekey
Passed through to open_database for the main database. If TRUE, keys are strings to be compared in reverse order, from the end of the strings to the beginning (e.g., DNS names). By default, keys are treated as strings and compared from beginning to end. Passing reversekey = TRUE is equivalent to lmdb's MDB_REVERSEKEY flag.
- create
If FALSE, do not create the directory path if it is missing.
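A minimal sketch of maxdbs in use, assuming the thor package is installed. The limit of 10 and the database name "people" are arbitrary, hypothetical choices; the point is that a named database is only available because maxdbs was set, and that its handle is then passed as db to the helper methods.

```r
library(thor)

# Allow up to 10 named databases (10 is an arbitrary illustrative value)
env <- mdb_env(tempfile(), maxdbs = 10)

# Open, creating if needed, a named database ("people" is hypothetical)
people <- env$open_database("people")

# Write into the named database by passing its handle as `db`
env$put("alice", "engineer", db = people)

# The key is visible in "people" but not in the default database
in_people <- env$get("alice", db = people)
in_default <- env$exists("alice")

# Empty the named database while keeping it (delete = FALSE)
env$drop_database(people, delete = FALSE)
remaining <- env$list(db = people)

env$close()
```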
Details
The thor package is a wrapper around lmdb, so below I have
provided pointers to the relevant options in lmdb. The wrapper
is fairly thin and so picks up limitations and restrictions
from the underlying library. Some portions of the documentation
here derive from the lmdb source documentation, in particular
the file lmdb.h.
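As one concrete example of an inherited restriction, lmdb caps the length of keys. The sketch below, assuming thor is installed, reads the cap with $maxkeysize() and shows that a key exactly at the limit is stored, while a key one byte longer is rejected and surfaces as an R error.

```r
library(thor)

env <- mdb_env(tempfile())

# lmdb's key size limit, as reported through thor
n <- env$maxkeysize()

# A key exactly at the limit is accepted...
env$put(strrep("k", n), "fits")

# ...but one byte longer is rejected by lmdb
too_long <- try(env$put(strrep("k", n + 1), "too long"), silent = TRUE)

env$close()
```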
Methods
path
Return the absolute path to the LMDB store (on disk).
Usage:
path()
Value: A string
Note: In lmdb.h this is mdb_env_get_path()
flags
Return the flags used in constructing the LMDB environment.
Usage:
flags()
Value: A named logical vector. Names correspond to arguments to the constructor.
Note: In lmdb.h this is mdb_env_get_flags()
info
Brief information about the LMDB environment.
Usage:
info()
Value: An integer vector with elements mapsize, last_pgno, last_txnid, maxreaders and numreaders.
Note: In lmdb.h this is mdb_env_info()
stat
Brief statistics about the LMDB environment.
Usage:
stat()
Value: An integer vector with elements psize (the size of a database page), depth (depth of the B-tree), branch_pages (number of internal, non-leaf pages), leaf_pages (number of leaf pages), overflow_pages (number of overflow pages) and entries (number of data items).
Note: In lmdb.h this is mdb_env_stat()
maxkeysize
The maximum size of a key (the value can be bigger than this).
Usage:
maxkeysize()
Value: A single integer
Note: In lmdb.h this is mdb_env_get_maxkeysize()
maxreaders
The maximum number of readers.
Usage:
maxreaders()
Value: A single integer
Note: In lmdb.h this is mdb_env_get_maxreaders()
begin
Begin a transaction.
Usage:
begin(db = NULL, write = FALSE, sync = NULL, metasync = NULL)
Arguments:
db: A database handle, as returned by open_database. If NULL (the default) then the default database will be used.
write: Scalar logical, indicating if this should be a write transaction. There can be only one write transaction per environment (see mdb_txn for more details); it is an error to try to open more than one.
sync: Scalar logical, indicating if the data should be synchronised (flushed to disk) after writes; see main parameter list.
metasync: Scalar logical, indicating if the metadata should be synchronised (flushed to disk) after writes; see main parameter list.
Details: Transactions are the key objects for interacting with an LMDB database (aside from the convenience interface below). They are described in more detail in mdb_txn.
Value: An mdb_txn object
Note: In lmdb.h this is mdb_txn_begin()
with_transaction
Evaluate some code within a transaction.
Usage:
with_transaction(fun, db = NULL, write = FALSE)
Arguments:
fun: A function of one argument that does the work of the transaction. with_transaction will pass the transaction to this function. This is most easily explained with an example, so see the bottom of the help page.
db: A database handle, as returned by open_database. If NULL (the default) then the default database will be used.
write: Scalar logical, indicating if this should be a write transaction. There can be only one write transaction per environment (see mdb_txn for more details); it is an error to try to open more than one.
Details: This exists to simplify a pattern where one wants to open a transaction, evaluate some code with that transaction and, if anything goes wrong, abort, but otherwise commit. It is most useful with read-write transactions, but can be used with both (the default is a read-only transaction, as with begin()).
open_database
Open a named database, or return one if already opened.
Usage:
open_database(key = NULL, reversekey = FALSE, create = TRUE)
Arguments:
key: Name of the database; if NULL this returns the default database (always open).
reversekey: Compare strings in reverse order? See the reversekey documentation above.
create: Create the database if it does not exist already?
Details: LMDB environments can hold multiple databases, provided the environment was opened with a large enough maxdbs. There is always a "default" database; this is unnamed and cannot be dropped. Other databases have a key (i.e., a name) and can be dropped. These database objects are passed through to other methods, notably drop_database and begin.
Note: In lmdb.h this is mdb_open()
drop_database
Drop a database.
Usage:
drop_database(db, delete = TRUE)
Arguments:
db: A database object, as returned by open_database
delete: Scalar logical, indicating if the database should be deleted too. If FALSE, the values are deleted from the database (i.e., it is emptied). If TRUE then the actual database is deleted too.
Value: No return value, called for side effects only
Note: In lmdb.h this is mdb_drop()
sync
Flush the data buffers to disk.
Usage:
sync(force = FALSE)
Arguments:
force: Scalar logical; force a synchronous flush. Otherwise, if the environment was constructed with sync = FALSE the flushes will be omitted, and with mapasync = TRUE they will be asynchronous.
Details: Data is always written to disk when a transaction is committed, but the operating system may keep it buffered. LMDB always flushes the OS buffers upon commit as well, unless the environment was opened with sync = FALSE or, in part, metasync = FALSE. This call is not valid if the environment was opened with readonly = TRUE.
Note: In lmdb.h this is mdb_env_sync()
copy
Copy the entire environment state to a new path. This can be used to make a backup of the database.
Usage:
copy(path, compact = FALSE)
Arguments:
path: Scalar character; the new path
compact: Scalar logical; perform compaction while copying? This omits free pages and sequentially renumbers all pages in the output. This can take longer than the default but produces a smaller database.
Value: Invisibly, the new path (allowing use of $copy(tempfile()))
Note: In lmdb.h this is mdb_env_copy() & mdb_env_copy2()
close
Close the environment. This closes all cursors and transactions (active write transactions are aborted).
Usage:
close()
Value: No return value, called for side effects only
Note: In lmdb.h this is mdb_env_close()
destroy
Totally destroy an LMDB environment. This closes the database and removes the files. Use with care!
Usage:
destroy()
Value: No return value, called for side effects only
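The copy and destroy methods combine into a backup-then-discard workflow. The sketch below assumes thor is installed and relies on the documented behaviour that $copy(tempfile()) returns the new path; the key "a" and the use of temporary paths are illustrative only. It checks for lmdb's standard data file name, data.mdb, to confirm the original files were removed.

```r
library(thor)

env <- mdb_env(tempfile())
env$put("a", "hello")

# Take a compacted backup into a fresh temporary path
backup_path <- env$copy(tempfile(), compact = TRUE)

# Remember where the original lived, then destroy it (closes + removes files)
original_path <- env$path()
env$destroy()

# The backup is an independent, fully usable environment
restored <- mdb_env(backup_path)
value <- restored$get("a")
restored$close()
```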
reader_list
List information about database readers.
Usage:
reader_list()
Value: A character matrix with columns pid (process ID), thread (a pointer address), and txnid (a small integer)
Note: In lmdb.h this is mdb_reader_list()
reader_check
Check for, and remove, stale entries in the reader lock table.
Usage:
reader_check()
Value: An integer, being the number of stale readers discarded. However, this function is primarily called for its side effect.
Note: In lmdb.h this is mdb_reader_check()
get
Retrieve a value from the database.
Usage:
get(key, missing_is_error = TRUE, as_raw = NULL, db = NULL)
Arguments:
key: A string (or raw vector); the key to get
missing_is_error: Logical, indicating if a missing value is an error (by default it is). Alternatively, with missing_is_error = FALSE, a missing value will return NULL. Because no value can be NULL (all values must have nonzero length), a NULL is unambiguously missing.
as_raw: Either NULL, or a logical, to indicate the result type required. With as_raw = NULL, the default, the value will be returned as a string if possible; if not possible it will be returned as a raw vector. With as_raw = TRUE, get() will always return a raw vector, even when it is possible to represent the value as a string. If as_raw = FALSE, get will return a string, but throw an error if this is not possible. This is discussed in more detail in the thor vignette (vignette("thor")).
db: A database handle that would be passed through to create the transaction (see the $begin method).
Details: This is a helper method that establishes a temporary read-only transaction, calls the corresponding method in mdb_txn and then aborts the transaction.
Note: In lmdb.h this is mdb_get()
put
Put values into the database. In other systems, this might be called "set".
Usage:
put(key, value, overwrite = TRUE, append = FALSE, db = NULL)
Arguments:
key: The name of the key (string or raw vector)
value: The value to save (string or raw vector)
overwrite: Logical; when TRUE it will overwrite existing data; when FALSE an error is thrown if the key already exists
append: Logical; when TRUE, append the given key/value to the end of the database. This option allows fast bulk loading when keys are already known to be in the correct order. But if you load unsorted keys with append = TRUE an error will be thrown.
db: A database handle that would be passed through to create the transaction (see the $begin method).
Details: This is a helper method that establishes a temporary read-write transaction, calls the corresponding method in mdb_txn and then commits the transaction. This will only be possible to use if there is not an existing write transaction in effect for this environment.
Note: In lmdb.h this is mdb_put()
del
Remove a key/value pair from the database.
Usage:
del(key, db = NULL)
Arguments:
key: The name of the key (string or raw vector)
db: A database handle that would be passed through to create the transaction (see the $begin method).
Details: This is a helper method that establishes a temporary read-write transaction, calls the corresponding method in mdb_txn and then commits the transaction. This will only be possible to use if there is not an existing write transaction in effect for this environment.
Value: A scalar logical, indicating if the value was deleted
Note: In lmdb.h this is mdb_del()
exists
Test if a key exists in the database.
Usage:
exists(key, db = NULL)
Arguments:
key: The name of the key to test (string or raw vector). Unlike get, put and del (but like mget, mput and mdel), exists is vectorised. So the input here can be: a character vector of any length (returning a logical vector of the same length), a raw vector (representing one key, returning a scalar logical) or a list with each element being either a scalar character or a raw vector (returning a logical vector the same length as the list).
db: A database handle that would be passed through to create the transaction (see the $begin method).
Details: This is an extension of the raw LMDB API and works by using mdb_get for each key (which for lmdb need not copy data) and then testing whether the return value is MDB_SUCCESS or MDB_NOTFOUND. This is a helper method that establishes a temporary read-only transaction, calls the corresponding method in mdb_txn and then aborts the transaction.
Value: A logical vector
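To illustrate the three input shapes that exists accepts, here is a short sketch assuming thor is installed; the fruit keys and the two-byte raw key are arbitrary examples.

```r
library(thor)

env <- mdb_env(tempfile())
env$put("apple", "1")
env$put("banana", "2")

# Character vector: one logical per key
chr <- env$exists(c("apple", "cherry", "banana"))

# Raw vector: treated as a single key, so the result is scalar
raw_key <- as.raw(c(0x01, 0x02))
env$put(raw_key, "binary key")
one <- env$exists(raw_key)

# List: each element is one key (a scalar string or a raw vector)
lst <- env$exists(list("apple", raw_key, "durian"))

env$close()
```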
list
List keys in the database.
Usage:
list(starts_with = NULL, as_raw = FALSE, size = NULL, db = NULL)
Arguments:
starts_with: Optionally, a prefix for all strings. Note that this is not a regular expression or a filename glob. Using foo will match foo, foo:bar and foobar but not fo or FOO. Because LMDB stores keys in a sorted tree, using a prefix can greatly reduce the number of keys that need to be tested.
as_raw: Same interpretation as as_raw in $get() but with a different default. It is expected that most of the time keys will be strings, so by default we'll try to return a character vector (as_raw = FALSE). Change the default if your database contains raw keys.
size: For use with starts_with, optionally a guess at the number of keys that would be returned. With starts_with = NULL the number of keys can be looked up directly, so this is ignored.
db: A database handle that would be passed through to create the transaction (see the $begin method).
Details: This is a helper method that establishes a temporary read-only transaction, calls the corresponding method in mdb_txn and then aborts the transaction.
mget
Get values for multiple keys at once (like $get but vectorised over key).
Usage:
mget(key, as_raw = NULL, db = NULL)
Arguments:
key: The keys to get values for. Zero, one or more keys are allowed.
as_raw: As for $get(), logical (or NULL) indicating if raw or string output is expected or desired.
db: A database handle that would be passed through to create the transaction (see the $begin method).
Details: This is a helper method that establishes a temporary read-only transaction, calls the corresponding method in mdb_txn and then aborts the transaction.
mput
Put multiple values into the database (like $put but vectorised over key/value).
Usage:
mput(key, value, overwrite = TRUE, append = FALSE, db = NULL)
Arguments:
key: The keys to set
value: The values to set against these keys. Must be the same length as key.
overwrite: As for $put
append: As for $put
db: A database handle that would be passed through to create the transaction (see the $begin method).
Details: The implementation simply calls mdb_put repeatedly (but with a single round of error checking), so duplicate key entries will result in the last key winning. This is a helper method that establishes a temporary read-write transaction, calls the corresponding method in mdb_txn and then commits the transaction. This will only be possible to use if there is not an existing write transaction in effect for this environment.
mdel
Delete multiple values from the database (like $del but vectorised over key).
Usage:
mdel(key, db = NULL)
Arguments:
key: The keys to delete
db: A database handle that would be passed through to create the transaction (see the $begin method).
Details: This is a helper method that establishes a temporary read-write transaction, calls the corresponding method in mdb_txn and then commits the transaction. This will only be possible to use if there is not an existing write transaction in effect for this environment.
Value: A logical vector, the same length as key, indicating if each key was deleted.
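The vectorised helpers and the prefix scan can be combined as in the sketch below, which assumes thor is installed. The "user:" / "item:" key scheme is a hypothetical naming convention chosen only to make the prefix filter visible. Because the exact container returned by mget is not spelled out above, the value is flattened with unlist() before use.

```r
library(thor)

env <- mdb_env(tempfile())

# Bulk insert in one temporary write transaction
env$mput(c("user:1", "user:2", "item:1"), c("ann", "bob", "hat"))

# Prefix scan: only keys beginning with "user:" (returned in sorted order)
users <- env$list(starts_with = "user:")

# Fetch several values in one temporary read-only transaction
vals <- unname(unlist(env$mget(users)))

# Delete several keys at once; "user:9" does not exist
deleted <- env$mdel(c("user:1", "user:9"))

env$close()
```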
Examples
# Create a new environment (just using defaults)
env <- thor::mdb_env(tempfile())
# At its most simple (using temporary transactions)
env$put("a", "hello world")
#> NULL
env$get("a")
#> [1] "hello world"
# Or create transactions
txn <- env$begin(write = TRUE)
txn$put("b", "another")
txn$put("c", "value")
# Transaction not committed so value not visible outside our transaction
env$get("b", missing_is_error = FALSE)
#> NULL
# After committing, the values are visible for new transactions
txn$commit()
env$get("b", missing_is_error = FALSE)
#> [1] "another"
# A convenience method, 'with_transaction' exists to allow
# transactional workflows with less code repetition.
# This will get the old value of a key 'a', set 'a' to a new value
# and return the old value:
env$with_transaction(function(txn) {
val <- txn$get("a")
txn$put("a", "new_value")
val
}, write = TRUE)
#> [1] "hello world"
# If an error occurred, the transaction would be aborted. So far,
# not very interesting!
# More interesting: implementing redis's RPOPLPUSH that takes the
# last value off of the end of one list and pushes it into the
# start of another.
rpoplpush <- function(env, src, dest) {
f <- function(txn) {
# Take the value out of the source list and update
val <- unserialize(txn$get(src, as_raw = TRUE))
take <- val[[length(val)]]
txn$put(src, serialize(val[-length(val)], NULL))
# Put the value onto the destination list
val <- unserialize(txn$get(dest, as_raw = TRUE))
txn$put(dest, serialize(c(val, take), NULL))
# And we'll return the value that was modified
take
}
env$with_transaction(f, write = TRUE)
}
# Set things up - a source list with numbers 1:5 and an empty
# destination list
env$put("src", serialize(1:5, NULL))
#> NULL
env$put("dest", serialize(integer(0), NULL))
#> NULL
# then try it out:
rpoplpush(env, "src", "dest") # 5
#> [1] 5
rpoplpush(env, "src", "dest") # 4
#> [1] 4
rpoplpush(env, "src", "dest") # 3
#> [1] 3
# Here is the state of the two lists
unserialize(env$get("src"))
#> [1] 1 2
unserialize(env$get("dest"))
#> [1] 5 4 3
# The above code will fail if one of the lists is unavailable
env$del("dest")
#> [1] TRUE
try(rpoplpush(env, "src", "dest"))
#> Error in mdb_get(self$.ptr, self$.db$.ptr, key, missing_is_error, as_proxy, :
#> Key 'dest' not found in database
# but because it's in a transaction, this failed attempt leaves src
# unchanged
unserialize(env$get("src"))
#> [1] 1 2
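The as_raw and missing_is_error arguments of $get() can be explored in the same style. This sketch assumes thor is installed; the keys "text", "bytes" and "nope" are arbitrary, and the value as.raw(c(0x00, 0xff)) is chosen because its embedded NUL byte means it cannot be represented as an R string.

```r
library(thor)

env <- mdb_env(tempfile())
env$put("text", "hello")
env$put("bytes", as.raw(c(0x00, 0xff)))  # embedded NUL: not valid as a string

# as_raw = NULL (default): a string where possible, otherwise raw
a <- env$get("text")
b <- env$get("bytes")

# as_raw = TRUE: always raw, even for string-compatible values
c_raw <- env$get("text", as_raw = TRUE)

# as_raw = FALSE: insist on a string, so the binary value errors
d <- try(env$get("bytes", as_raw = FALSE), silent = TRUE)

# missing_is_error = FALSE: NULL signals an absent key
e <- env$get("nope", missing_is_error = FALSE)

env$close()
```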