刘旭的个人网站

Strings

Redis strings store sequences of bytes, including text, serialized objects, and binary arrays.

By default, a single Redis string can be a maximum of 512 MB.

Most string operations are O(1), which means they’re highly efficient. However, be careful with SUBSTR, GETRANGE, and SETRANGE commands, which can be O(n). These random-access string commands may cause performance issues when dealing with large strings.

Lists

Redis lists are linked lists of string values. Redis lists are implemented with linked Lists because for a database system it is crucial to be able to add elements to a very long list in a very fast way. When fast access to the middle of a large collection of elements is important, you should use sorted sets.

Lists have a special feature that make them suitable to implement queues, and in general as a building block for inter process communication systems: blocking operations.

Lists are useful for a number of tasks, two very representative use cases are the following:

List operations that access its head or tail are O(1), which means they’re highly efficient. However, commands that manipulate elements within a list are usually O(n).

Sets

A Redis set is an unordered collection of unique strings. You can use Redis sets to efficiently:

Most set operations, including adding, removing, and checking whether an item is a set member, are O(1). This means that they’re highly efficient. However, for large sets with hundreds of thousands of members or more, you should exercise caution when running the SMEMBERS command. This command is O(N) and returns the entire set in a single response. As an alternative, consider the SSCAN, which lets you retrieve all members of a set iteratively.

Sorted Sets

A Redis sorted set is a collection of unique strings ordered by an associated score. When more than one string has the same score, the strings are ordered lexicographically.

Sorted sets are implemented via a dual-ported data structure containing both a skip list and a hash table. Most sorted set operations are O(log(n)), where n is the number of members. Exercise some caution when running the ZRANGE command with large return values. This command’s time complexity is O(log(n)+m), where m is the number of results returned.

Some use cases for sorted sets include:

Hashes

Redis hashes are record types structured as collections of field-value pairs.

It’s worth noting that small hashes (i.e., a few elements with small values) are encoded in special way in memory that make them very memory efficient.

Most Redis hash commands are O(1). A few commands, such as HKEYS, HVALS, HGETALL, and most of the expiration-related commands, are O(n), where n is the number of field-value pairs.

Streams

A Redis stream is a data structure that acts like an append-only log but also implements several operations to overcome some of the limits of a typical append-only log. These include random access in O(1) time and complex consumption strategies, such as consumer groups.

Redis generates a unique ID for each stream entry. You can use these IDs to retrieve their associated entries later or to read and process all subsequent entries in the stream. The fact that each stream entry has an ID is another similarity with log files, where line numbers, or the byte offset inside the file, can be used in order to identify a given entry.

Adding an entry to a stream is O(1). Accessing any single entry is O(n), where n is the length of the ID. Since stream IDs are typically short and of a fixed length, this effectively reduces to a constant time lookup. For details on why, note that streams are implemented as radix trees.

Examples of Redis stream use cases include:

query by range

We could see a stream not only as a messaging system, but also a time series store. In this case, one natural query mode is to get messages by ranges of time, or alternatively to iterate the messages using a cursor to incrementally check all the history.

To query the stream by range we are only required to specify two IDs or two timestamps.

XRANGE race:france - +
XRANGE race:france 1692632086369 1692632086371
XRANGE race:france (1692632094485-0 + COUNT 2
XREVRANGE race:france + - COUNT 1

subscribe to new items

Usually what we want is to subscribe to new items arriving to the stream. Using the traditional terminology we want the streams to be able to fan out messages to multiple clients.

This concept may appear related to Redis Pub/Sub, where you subcribe to a channel, or to Redis blocking lists, where you wait for a key to get new elements to fetch, but there are fundamental differences in the way you consume a stream:

XREAD COUNT 2 STREAMS race:france 0
XREAD BLOCK 0 STREAMS race:france $

The blocking form of XREAD is able to listen to multiple streams, just by specifying multiple key names. If the request can be serverd synchronously because there is at least one stream with elements greater than the corresponding ID we specified, it returns with the results. Otherwise, the command will block and will return the items of the first stream which gets new data.

Similarly to blocking list operations, blocking streams reads are fair from the point of view of clients waiting for data, since the semantics is FIFO style. The first client that blocked for a given stream will be the first to be unblocked when new items are available.

consumer groups

In certain problems what we want to do is not to provide the same stream of messages to many clients, but to provide a different subset of messages from the same stream to many clients. An obvious case when this is useful is that of messages which are slow to process: the ability to have N different workers that will receive different parts of the stream allows us to scale message processing, by routing different messages to different workers that are ready to do more work.

In order to achieve this, Redis uses a concept called consumer groups. A consumer group is like a pseudo consumer that gets data from a stream, and actually serve multiple consumers, providing certain guarantees:

XGROUP CREATE race:france france_riders $
XGROUP CREATE race:france france_riders 0
XREADGROUP GROUP france_riders Alice COUNT 1 STREAMS race:france >
XREADGROUP GROUP france_riders Alice COUNT 1 STREAMS race:france 0

In the real world consumers may permanently fail and never recover. What happens to the pending messages of the consumer that never recovers after stopping for any reason? Redis consumer groups offer a feature that is used in these situations in order to claim the pending messages of a given consumer so that such messages will change ownership and will be re-assigned to a different consumer.

XPENDING race:france france_riders
XPENDING race:france france_riders - + 1
XCLAIM race:france france_riders Bob 60000 1721014050450-0
XAUTOCLAIM race:france france_riders Alice 60000 0-0

The XINFO command is an observability interface that can be used with sub-commands in order to get information about streams or consumer groups.

XINFO STREAM race:france
XINFO GROUPS race:france
XINFO CONSUMERS race:france france_riders

Consumer groups in Redis streams may resemble in some way Kafka partitioning-based consumer groups, however note that Redis streams are, in practical terms, very different. Basically Kafka partitions are more similar to using N different Redis keys, while Redis consumer groups are a server-side load balancing system of messages from a given stream to N different consumers.

capped streams

XADD race:italy MAXLEN 2 * rider Jones
XADD race:italy MAXLEN ~ 1000 * rider Wood

XTRIM race:italy MAXLEN 10
XTRIM mystream MAXLEN ~ 10

Bitmaps

Bitmaps are not an actual data type, but a set of bit-oriented operations defined on the String type. Since strings are binary safe blobs and their maximum length is 512 MB, they are suitable to set up to 2^32 different bits.

One of the biggest advantages of bitmaps is that they often provide extreme space savings when storing information. For example in a system where different users are represented by incremental user IDs, it is possible to remember a single bit information of 4 billion of users using just 512 MB of memory.

Bitmaps are trivial to split into multiple keys, for example for the sake of sharding the data set abd because in general it is better to avoid working with huge keys. To split a bitmap across different keys instead of setting all the bits into a key, a trivial strategy is just to store M bits per key and obtain the key name with bit-number/M and the Nth bit to address inside the key with bit-number MOD M.

HyperLogLogs

A HyperLogLog is a probabilistic data structure used in order to estimate the cardinality of a set. Usually counting unique items requires using an amount of memory proportional to the number of items you want to count, because you need to remember the elements you have already seen in the past in order to avoid counting them multiple times. However there is a set of algorithms that trade memory for precision: you end up with an estimated measure with a standard error, which in the case of the Redis implementation is less than 1%.