Create an object cache; a "storr". A storr is a simple key-value store where the actual content is stored in a content-addressable way (so that duplicate objects are only stored once) and with a caching layer so that repeated lookups are fast even if the underlying storage driver is slow.
Details
To create a storr you need to provide a "driver" object. There
are three in this package: driver_environment for ephemeral
in-memory storage, driver_rds for serialized storage to disk,
and driver_dbi for use with DBI-compliant database interfaces.
The redux package (on CRAN) provides a storr driver that uses Redis.
There are also wrapper functions (e.g., storr_environment and storr_rds) that may be more convenient to use than this function.
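As a minimal sketch, assuming the storr package is installed, the two calls below should produce equivalent in-memory stores: one passes a driver to storr explicitly, the other uses the storr_environment wrapper. The variable names and the key "a" are illustrative only.

```r
library(storr)

# Explicit form: construct a driver, then hand it to storr()
st_explicit <- storr(driver_environment())

# Wrapper form: builds the environment driver for you
st_short <- storr_environment()

# Both objects expose the same methods
st_explicit$set("a", 1)
st_short$set("a", 1)
```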
Once a storr has been made it provides a number of methods.
Because storr uses R6 (R6::R6Class) objects, each method is accessed by using $ on a storr object (see the examples). The methods are described below in the "Methods" section.
The default_namespace affects all methods of the storr object that refer to namespaces; if a namespace is not given, then the action (get, set, del, list, import, export) will affect the default_namespace. By default this is "objects".
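The following sketch illustrates the effect of default_namespace, assuming an in-memory storr; the namespace name "results" is made up for the example.

```r
library(storr)

# Create a storr whose default namespace is "results" rather than "objects"
st <- storr_environment(default_namespace = "results")

# No namespace given, so the key is written to "results"
st$set("a", 1)

keys_default <- st$list()           # keys in "results"
keys_objects <- st$list("objects")  # nothing was written to "objects"
```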
Methods
destroy
Totally destroys the storr by telling the driver to destroy all the data and then deleting the driver. This will remove all data and cannot be undone.
Usage:
destroy()
flush_cache
Flush the temporary cache of objects that accumulates as the storr is used. Should not need to be called often.
Usage:
flush_cache()
set
Set a key to a value.
Usage:
set(key, value, namespace = self$default_namespace, use_cache = TRUE)
Arguments:
key
: The key name. Can be any string.
value
: Any R object to store. The object will generally be serialized (this is not actually true for the environment storr) so only objects that would usually be expected to survive a saveRDS/readRDS roundtrip will work. This excludes Rcpp modules objects, external pointers, etc. But any "normal" R object will work fine.
namespace
: An optional namespace. By default the default namespace that the storr was created with will be used (by default that is "objects"). Different namespaces allow different types of objects to be stored without risk of names colliding. Use of namespaces is optional, but if used they must be a string.
use_cache
: Use the internal cache to avoid reading or writing to the underlying storage if the data has already been seen (i.e., we have seen the hash of the object before).
Value: Invisibly, the hash of the saved object.
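A short sketch of set, assuming an in-memory storr; the keys "k" and "k2" and the namespace "notes" are illustrative. It captures the invisibly returned hash and compares it with what get_hash reports for the same key.

```r
library(storr)

st <- storr_environment()

# set() returns the hash invisibly, so assign it to see it
h <- st$set("k", 1:10)
same_hash <- identical(h, st$get_hash("k"))

# Store under a non-default namespace
st$set("k2", "text", namespace = "notes")
```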
set_by_value
Like set but saves the object with a key that is the same as the hash of the object. Equivalent to $set(digest::digest(value), value).
Usage:
set_by_value(value, namespace = self$default_namespace, use_cache = TRUE)
Arguments:
value
: An R object to save, with the same limitations as set.
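As an illustration of set_by_value (a sketch on an in-memory storr, not a definitive recipe), the key created should be the same string as the object's hash, so listing keys and listing hashes should give the same result.

```r
library(storr)

st <- storr_environment()
st$set_by_value(mtcars)

key <- st$list()            # the only key in the default namespace
hashes <- st$list_hashes()  # the only stored object
value <- st$get(key)        # retrieve the object via its hash-named key
```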
get
Retrieve an object from the storr. If the requested value is not found then a KeyError will be raised (an R error, but can be caught with tryCatch; see the "storr" vignette).
Usage:
get(key, namespace = self$default_namespace, use_cache = TRUE)
Arguments:
key
: The name of the key to get.
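The sketch below shows one way to catch the KeyError mentioned above with tryCatch. The key name and the NA fallback are arbitrary choices for the example.

```r
library(storr)

st <- storr_environment()

# "no_such_key" was never set, so get() raises a KeyError;
# the handler supplies a fallback value instead of stopping
value <- tryCatch(
  st$get("no_such_key"),
  KeyError = function(e) NA
)
```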
get_hash
Retrieve the hash of an object stored in the storr (rather than the object itself).
Usage:
get_hash(key, namespace = self$default_namespace)
Arguments:
key
: The name of the key to get.
del
Delete an object from the storr.
Usage:
del(key, namespace = self$default_namespace)
Arguments:
key
: A vector of names of keys
Value: A logical vector the same length as the recycled length of key/namespace, with each element being TRUE if an object was deleted, FALSE otherwise.
duplicate
Duplicate the value of a set of keys into a second set of keys. Because the value stored against a key is just the hash of its content, this operation is very efficient - it does not make a copy of the data, just the pointer to the data (for more details see the storr vignette which explains the storage model in more detail). Multiple keys (and/or namespaces) can be provided, with keys and namespaces recycled as needed. However, the number of source and destination keys must be the same. The order of operation is not defined, so if the sets of keys are overlapping it is undefined behaviour.
Usage:
duplicate(key_src, key_dest, namespace = self$default_namespace, namespace_src = namespace, namespace_dest = namespace)
Arguments:
key_src
: The source key (or vector of keys).
key_dest
: The destination key.
namespace
: The namespace to copy keys within (used only if namespace_src and namespace_dest are not provided).
namespace_src
: The source namespace - use this where keys are duplicated across namespaces.
namespace_dest
: The destination namespace - use this where keys are duplicated across namespaces.
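A minimal sketch of duplicate on an in-memory storr, with made-up key names and a hypothetical "backup" namespace. Because only the key-to-hash pointer is copied, a single stored object should remain afterwards.

```r
library(storr)

st <- storr_environment()
st$set("a", mtcars)

# Copy within the default namespace
st$duplicate("a", "b")

# Copy from the default namespace into another namespace
st$duplicate("a", "c", namespace_dest = "backup")

n_objects <- length(st$list_hashes())
```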
fill
Set one or more keys (potentially across namespaces) to the same value, without duplicating the serialisation effort or the stored data.
Usage:
fill(key, value, namespace = self$default_namespace, use_cache = TRUE)
Arguments:
key
: A vector of keys to set; zero to many valid keys.
value
: A single value to set all keys to.
namespace
: A vector of namespaces (either a single namespace or a vector).
use_cache
: Use the internal cache to avoid reading or writing to the underlying storage if the data has already been seen (i.e., we have seen the hash of the object before).
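For illustration, a sketch of fill on an in-memory storr with arbitrary key names: three keys are set to one value, and only one object should be stored.

```r
library(storr)

st <- storr_environment()

# Point three keys at the same value in one call
st$fill(c("a", "b", "c"), 0)

keys <- st$list()
n_objects <- length(st$list_hashes())
```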
clear
Clear a storr. This function might be slow as it will iterate over each key. Future versions of storr might allow drivers to implement a bulk clear method that will allow faster clearing.
Usage:
clear(namespace = self$default_namespace)
Arguments:
namespace
: A namespace, to clear a single namespace, or NULL to clear all namespaces.
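The difference between clearing one namespace and clearing all of them can be sketched as follows; the namespace "other" is a name invented for the example.

```r
library(storr)

st <- storr_environment()
st$set("a", 1)
st$set("b", 2, namespace = "other")

# Clears only the default namespace ("objects")
st$clear()
n_default <- length(st$list())
n_other <- length(st$list("other"))

# NULL clears every namespace
st$clear(NULL)
n_other_after <- length(st$list("other"))
```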
exists
Test if a key exists within a namespace
Usage:
exists(key, namespace = self$default_namespace)
Arguments:
key
: A vector of names of keys
Value: A logical vector the same length as the recycled length of key/namespace, with each element being TRUE if the object exists and FALSE otherwise.
exists_object
Test if an object with a given hash exists within the storr
Usage:
exists_object(hash)
Arguments:
hash
: Hash to test
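The existence tests above, together with del, can be sketched on an in-memory storr as below. The keys "a" and "b" are illustrative, and "b" is deliberately never set.

```r
library(storr)

st <- storr_environment()
h <- st$set("a", 1)

# Vectorised over keys: one logical per key
found <- st$exists(c("a", "b"))

# Test for the object itself, by hash
object_found <- st$exists_object(h)

# del() reports, per key, whether something was deleted
deleted <- st$del(c("a", "b"))
```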
mset
Set multiple elements at once
Usage:
mset(key, value, namespace = self$default_namespace, use_cache = TRUE)
Arguments:
key
: A vector of keys to set; zero to many valid keys.
value
: A vector of values.
namespace
: A vector of namespaces (either a single namespace or a vector).
use_cache
: Use the internal cache to avoid reading or writing to the underlying storage if the data has already been seen (i.e., we have seen the hash of the object before).
Details: The arguments key and namespace are recycled such that either can be given as a scalar if the other is a vector. Other recycling is not allowed.
mget
Get multiple elements at once
Usage:
mget(key, namespace = self$default_namespace, use_cache = TRUE, missing = NULL)
Arguments:
key
: A vector of keys to get; zero to many valid keys.
namespace
: A vector of namespaces (either a single namespace or a vector).
use_cache
: Use the internal cache to avoid reading or writing to the underlying storage if the data has already been seen (i.e., we have seen the hash of the object before).
missing
: Value to use for missing elements; by default NULL will be used. If NULL is a value that you might have stored in the storr you might want to use a different value here to distinguish "missing" from "set to NULL". In addition, the missing attribute will indicate which values were missing.
Details: The arguments key and namespace are recycled such that either can be given as a scalar if the other is a vector. Other recycling is not allowed.
Value: A list with a length of the recycled length of key and namespace. If any elements are missing, then an attribute missing will indicate the elements that are missing (this will be an integer vector with the indices of values that were not found in the storr).
mset_by_value
Set multiple elements at once, by value. A cross between mset and set_by_value.
Usage:
mset_by_value(value, namespace = self$default_namespace, use_cache = TRUE)
Arguments:
value
: A list or vector of values to set into the storr.
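The multi-element methods above (mset, mget and mset_by_value) can be sketched together on an in-memory storr. Key names are arbitrary, "zz" is deliberately absent, and NA is used as the missing marker purely for illustration.

```r
library(storr)

st <- storr_environment()

# Set two keys at once; values are supplied as a list
st$mset(c("a", "b"), list(1, 2))

# Get three keys, one of which does not exist
res <- st$mget(c("a", "b", "zz"), missing = NA)
missing_idx <- attr(res, "missing")  # indices of keys not found

# Store two more values under keys equal to their hashes
st$mset_by_value(list("x", "y"))
n_keys <- length(st$list())
```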
gc
Garbage collect the storr. Because keys do not directly map to objects, but instead map to hashes which map to objects, it is possible that hash/object pairs can persist with nothing pointing at them. Running gc will remove these objects from the storr.
Usage:
gc()
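A sketch of when gc has something to do, assuming an in-memory storr: deleting a key leaves its object behind until garbage collection runs.

```r
library(storr)

st <- storr_environment()
st$set("a", mtcars)

# Removes the key, but the hash/object pair stays in the store
st$del("a")
before <- length(st$list_hashes())

# Nothing points at the object any more, so gc() removes it
st$gc()
after <- length(st$list_hashes())
```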
get_value
Get the content of an object given its hash.
Usage:
get_value(hash, use_cache = TRUE)
Arguments:
hash
: The hash of the object to retrieve.
Value: The object if it is present, otherwise throw a HashError.
set_value
Add an object value, but don't add a key. You will not need to use this very often, but it is used internally.
Usage:
set_value(value, use_cache = TRUE)
Arguments:
value
: An R object to set.
Value: Invisibly, the hash of the object.
mset_value
Add a vector of object values, but don't add keys. You will not need to use this very often, but it is used internally.
Usage:
mset_value(values, use_cache = TRUE)
Arguments:
values
: A list of R objects to set
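The hash-level methods above (get_value, set_value and mset_value) can be sketched as follows. The string "not-a-real-hash" and the "missing" fallback are placeholders chosen for the example.

```r
library(storr)

st <- storr_environment()

# Store an object without creating a key for it
h <- st$set_value(mtcars)
n_keys <- length(st$list())  # no keys were created

# Retrieve the object directly by hash
obj <- st$get_value(h)

# An unknown hash raises a HashError, which tryCatch can intercept
err <- tryCatch(
  st$get_value("not-a-real-hash"),
  HashError = function(e) "missing"
)

# Store several objects at once, again without keys
st$mset_value(list(1, 2))
n_objects <- length(st$list_hashes())
```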
list
List all keys stored in a namespace.
Usage:
list(namespace = self$default_namespace)
Arguments:
namespace
: The namespace to list keys within.
Value: A sorted character vector (possibly zero-length).
list_hashes
List all hashes stored in the storr
Usage:
list_hashes()
Value: A sorted character vector (possibly zero-length).
list_namespaces
List all namespaces known to the database
Usage:
list_namespaces()
Value: A sorted character vector (possibly zero-length).
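The three listing methods can be compared side by side in a small sketch; key and namespace names are illustrative. Two keys hold the same value, so only two distinct objects should be stored.

```r
library(storr)

st <- storr_environment()
st$set("b", 1)
st$set("a", 1)                       # same value as "b"
st$set("c", 2, namespace = "other")

keys <- st$list()                    # keys in the default namespace
namespaces <- st$list_namespaces()   # every namespace in use
hashes <- st$list_hashes()           # one entry per distinct object
```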
import
Import R objects from a list, environment, or another storr.
Usage:
import(src, list = NULL, namespace = self$default_namespace, skip_missing = FALSE)
Arguments:
src
: Object to import objects from; can be a list, environment or another storr.
list
: Names of objects to import (or NULL to import all objects in src). If given it must be a character vector. If named, the names of the character vector will be the names of the objects as created in the storr.
namespace
: Namespace to get objects from, and to put objects into. If NULL, all namespaces from src will be imported. If named, then the same rule is followed as list; namespace = c(a = b) will import the contents of namespace b as namespace a.
skip_missing
: Logical, indicating if missing keys (specified in list) should be skipped over, rather than being treated as an error (the default).
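A sketch of import from an environment, using hypothetical object names x, y and z. The second call uses a named list argument to store x under a different key.

```r
library(storr)

st <- storr_environment()

# A source environment holding two objects
e <- new.env()
e$x <- 1
e$y <- 2

# Import everything in e
st$import(e)

# Import only x, storing it under the key "z"
st$import(e, list = c(z = "x"))

keys <- st$list()
```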
export
Export objects from the storr into something else.
Usage:
export(dest, list = NULL, namespace = self$default_namespace, skip_missing = FALSE)
Arguments:
dest
: A target destination to export objects to; can be a list, environment, or another storr. Use list() to export to a brand new list, or use as.list(object) for a shorthand.
list
: Names of objects to export, with the same rules as list in $import.
namespace
: Namespace to get objects from, and to put objects into. If NULL, then this will export namespaces from this (source) storr into the destination; if there is more than one namespace, this is only possible if dest is a storr (otherwise there will be an error).
skip_missing
: Logical, indicating if missing keys (specified in list) should be skipped over, rather than being treated as an error (the default).
Value: Invisibly, dest, which allows use of e <- st$export(new.env()) and x <- st$export(list()).
archive_export
Export objects from the storr into a special "archive" storr, which is a storr_rds with name mangling turned on (which encodes keys with base64 so that they do not violate filesystem naming conventions).
Usage:
archive_export(path, names = NULL, namespace = NULL)
Arguments:
path
: Path to create the storr at; can exist already.
archive_import
Inverse of archive_export; import objects from a storr that was created by archive_export.
Usage:
archive_import(path, names = NULL, namespace = NULL)
Arguments:
path
: Path of the exported storr.
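The export and archive methods described above can be exercised with a sketch like the following. It assumes a writable temporary directory; the variable names are illustrative, and the archive is removed at the end.

```r
library(storr)

st <- storr_environment()
st$set("a", 1)
st$set("b", 2)

# Export everything to a new list, and one key to a new environment
x <- st$export(list())
e <- st$export(new.env(), list = "a")

# Round trip through an on-disk archive storr
path <- tempfile()
st$archive_export(path)

st2 <- storr_environment()
st2$archive_import(path)

# Clean up the temporary archive
unlink(path, recursive = TRUE)
```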
index_export
Generate a data.frame with an index of objects present in a storr. This can be saved (for an rds storr) in lieu of the keys/ directory and re-imported with index_import. It will provide a more version control friendly export of the data in a storr.
Usage:
index_export(namespace = NULL)
Arguments:
namespace
: Optional character vector of namespaces to export. The default is to export all namespaces.
index_import
Import an index.
Usage:
index_import(index)
Arguments:
index
: Must be a data.frame with columns 'namespace', 'key' and 'hash' (in any order). It is an error if not all hashes are present in the storr.
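A sketch of an index round trip on an in-memory storr, with made-up key and namespace names: the index is exported, a key is deleted, and the index is then used to restore the key. This works only because the underlying object is still present (no gc has been run).

```r
library(storr)

st <- storr_environment()
st$set("a", 1)
st$set("b", 2, namespace = "other")

# One row per key, across all namespaces
idx <- st$index_export()

# Remove a key; its object stays until gc() is called
st$del("a")

# Restore the key-to-hash mapping from the index
st$index_import(idx)
restored <- st$get("a")
```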
Examples
st <- storr(driver_environment())
## Set "mykey" to hold the mtcars dataset:
st$set("mykey", mtcars)
## and get the object:
st$get("mykey")
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
#> Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
#> Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
#> Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
#> Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
#> Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
#> Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
#> Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
#> Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
#> Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
#> Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
#> Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
#> Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
#> Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
#> Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
#> Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
#> Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
#> Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
#> AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
#> Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
#> Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
#> Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
#> Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
#> Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
#> Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
#> Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
#> Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
#> Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
## List known keys:
st$list()
#> [1] "mykey"
## List hashes
st$list_hashes()
#> [1] "a63c70e73b58d0823ab3bcbd3b543d6f"
## List keys in another namespace:
st$list("namespace2")
#> character(0)
## We can store things in other namespaces:
st$set("x", mtcars, "namespace2")
st$set("y", mtcars, "namespace2")
st$list("namespace2")
#> [1] "x" "y"
## Duplicate data do not cause duplicate storage: despite having three
## keys we only have one bit of data:
st$list_hashes()
#> [1] "a63c70e73b58d0823ab3bcbd3b543d6f"
st$del("mykey")
## Storr objects can be created that have a default namespace that is
## not "objects" by using the `default_namespace` argument (this
## one also points at the same memory as the first storr).
st2 <- storr(driver_environment(st$driver$envir),
             default_namespace = "namespace2")
## All functions now use "namespace2" as the default namespace:
st2$list()
#> [1] "x" "y"
st2$del("x")
st2$del("y")