In Redis, there is a data type. When storing, two data structures are used to store them separately. So why does Redis do this? Will doing this cause the same data to take up twice the space?

Five basic types of collection objects

A collection object in Redis is an unordered collection containing elements of string type, and the elements in the collection are unique and not repeatable.

There are two underlying data structures for collection objects: intset and hashtable. Internally, the encoding is used to distinguish:

intset encoding

intset (set of integers) can hold integer values ​​of type int16_t, int32_t, int64_t, and it is guaranteed that there are no duplicate elements in the set. The intset data structure is defined as follows (in the source code inset.h):

The following figure is a simplified diagram of the collection object storage of an intset:

encoding

The encoding inside intset records the data storage type of the current integer set, there are three main types:

INTSET_ENC_INT16

At this time, each element in contents[] is an integer value of int16_t type, the range is: -32768 ~ 32767 (-2 to the 15th power ~ 2 to the 15th power – 1).

INTSET_ENC_INT32

At this time, each element in contents[] is an integer value of int32_t type, the range is: -2147483648 ~ 2147483647 (-2 to the 31st power ~ 2 to the 31st power – 1).

INTSET_ENC_INT64

At this time, each element in contents[] is an integer value of type int64_t, the range is: -9223372036854775808 ~ 9223372036854775807 (-2 to the 63rd power ~ 2 to the 63rd power – 1).

contents[]

Although the definition of contents[] is of int8_t type, the actual storage type is determined by the encoding above.

Integer collection upgrade

If the elements in the integer set are all 16-bit at the beginning and are stored in the int16_t type, then another 32-bit integer needs to be stored, then the original integer set needs to be upgraded, and the 32-bit integer can be stored after the upgrade. Stored in a collection of integers. This involves the type upgrade of the integer set. The upgrade process mainly has 4 steps:

Expand the size of the underlying array space according to the type of the newly added element, and allocate the new space according to the number of bits of the existing elements after the upgrade.

Convert the existing elements, and put the converted elements back into the array one by one from back to front.

Put the new element at the head or tail of the array (because the condition for triggering the upgrade is that the integer type of the current array cannot store the new element, so the new element is either larger or smaller than the existing element).

Modify the encoding property to the latest encoding, and modify the length property synchronously.

PS: Like the encoding of string objects, once the type of the integer collection is upgraded, the encoding will remain and cannot be downgraded.

Upgrade example

1. Suppose we have a collection storage whose encoding is int16_t, and internally stores 3 elements:

2. At this time, an integer 50000 needs to be inserted, and it is found that it cannot be stored, and 50000 is an integer of type int32_t, so it is necessary to apply for a new space, and the size of the application space is 4 * 32 – 48=80.

3. Now there are 4 elements to be placed in the new array, and the original array is ranked 3rd, so it is necessary to move the upgraded 3 to bits 64-95.

4. Go ahead and move the upgraded 2 to 32-63 bits.

5. Continue to move the upgraded 1 to bits 0-31.

6. The 50000 will then be placed in bits 96-127.

7. Finally, the encoding and length attributes will be modified, and the upgrade will be completed after modification.

hashtable encoding

The hashtable structure was analyzed in detail when the hash object was described earlier.

intset and hashtable encoding conversion

Redis will choose to use the intset encoding when a collection satisfies the following two conditions:

All elements held by a collection object are integer values.

The number of elements stored in the collection object is less than or equal to 512 (this threshold can be controlled by the configuration file set-max-intset-entries).

Once the elements in the collection do not meet the above two conditions, the hashtable encoding will be selected.

Common commands for collection objects

sadd key member1 member2: Add one or more elements member to the set key, and return the number of successful additions. If the element already exists, it will be ignored.

sismember key member: Determine whether the element member exists in the set key.

srem key member1 member2: Remove the elements in the set key, and the non-existing elements will be ignored.

smove source dest member: Move the element member from the collection source to dest, or do nothing if member does not exist.

smembers key: Returns all elements in the set key.

Knowing the common commands for manipulating collection objects, we can verify the type and encoding of the hash object mentioned above. Before testing, in order to prevent the interference of other key values, we first execute the flushall command to clear the Redis database. Execute the following commands in sequence:

Get the following effect:

It can be seen that when there are only integers in the set elements, the set uses the intset encoding, and when the set elements contain non-integers, the hashtable encoding is used.

Five basic types of ordered collection objects

The difference between an ordered set and a set in Redis is that each element in the ordered set is associated with a double-type score, and then sorted in ascending order of the score. In other words, the order of the sorted set is determined by the fraction when we set the value ourselves. There are two underlying data structures for sorted set objects: skiplist and ziplist. Internally, it is also distinguished by encoding:

skiplist encoding

skiplist is the skip list, sometimes referred to simply as the skip list. The ordered set object encoded with skiplist uses the zset structure as the underlying implementation, and the zset contains both a dictionary and a skip list.

jump table

The skip table is an ordered data structure, and its main feature is to achieve the purpose of accessing nodes quickly by maintaining multiple pointers to other nodes in each node. In most cases, the efficiency of the skip table can be equal to that of the balanced tree, but the implementation of the skip table is far simpler than the implementation of the balanced tree, so Redis chooses to use the skip table to implement the ordered set. The following figure is an ordinary ordered linked list. If we want to find the element 35, we can only traverse from the beginning to the end (the elements in the linked list do not support random access, so binary search cannot be used, and the array can be accessed randomly through subscripts) , so binary search is generally applicable to sorted arrays), and the time complexity is O(n).

Then if we can jump directly to the middle of the linked list, we can save a lot of resources. This is the principle of the skip list. As shown in the following figure, it is an example of the data structure of the skip list:

In the above figure, level1, level2, and level3 are the levels of the skip list. Each level has a pointer to the next element of the same level. For example, when we traverse to find element 35 in the above figure, there are three options:

The first is to execute the pointer of the level1 level, which needs to be traversed 7 times (1->8->9->12->15->20->35) to find element 35.

The second is to execute the level2 level pointer, and only need to traverse 5 times (1->9->12->15->35) to find element 35.

The third type is to execute the elements of the level3 level. At this time, it only needs to traverse 3 times (1->12->35) to find the element 35, which greatly improves the efficiency.

Storage structure of skiplist

Each node in the skip list is a zskiplistNode node (in the source code server.h):

level

level is the level in the jump table, which is an array, which means that an element of a node can have multiple levels, that is, multiple pointers to other nodes, and the program can choose the fastest path to improve access through pointers at different levels speed.

level is a number between 1 and 32 that is randomly generated according to the power law every time a new node is created.

forward (forward pointer)

Each layer will have a pointer to the element at the end of the linked list, and the forward pointer needs to be used when traversing the element.

span

The span records the distance between two nodes. It should be noted that if it points to NULL, the span is 0.

backward (backward pointer)

Unlike the forward pointer, there is only one back pointer, so you can only go back to the previous node each time (the back pointer is not drawn in the above figure).

ele (element)

The element in the jump table is an sds object (the earlier version used the redisObject object), and the element must be unique and cannot be repeated.

score

The node's score is a double-type floating-point number. In the jump table, the nodes are arranged in ascending order according to their scores. The scores of different nodes can be repeated.

The above description is only a node in the skip list, and multiple zskiplistNode nodes form a zskiplist object:

At this point, you may think that the ordered set is implemented with this zskiplist, but in fact, Redis does not directly use the zskiplist to implement, but uses the zset object to wrap it again.

So in the end, if an ordered set uses skiplist encoding, its data structure is shown in the following figure:

The key in the dictionary in the upper part of the above figure corresponds to the element in the ordered set (member), and the value corresponds to the score (score). In the lower part of the above figure, the jump table integers 1, 8, 9, and 12 also correspond to the element (member), and the double-type number in the last row is the score (score).

That is to say, the data in the dictionary and the jump table both point to the elements we store (the two data structures ultimately point to the same address, so the data will not be stored redundantly). Why does Redis do this?

Why choose to use both a dictionary and a skip table

Sorted sets can be implemented by using skip table directly or using dictionary alone, but let’s think about it, if we use skip table alone, then although we can use a pointer with a large span to traverse the elements to find the data we need, it is complicated. The degree still reaches O(logN), and the complexity of obtaining an element in a dictionary is O(1). If you use a dictionary alone to obtain elements, it is very fast, but the dictionary is unordered, so if you want to find a range, you need to Sorting is a time-consuming operation, so Redis combines two data structures to maximize performance, which is also the subtlety of Redis design.

ziplist encoding

Compressed lists are used in both list objects and hash objects. For details, click here.

https://blog.csdn.net/zwx900102/article/details/112651435

ziplist and skiplist encoding conversion

When an ordered set object meets both of the following conditions, it will use ziplist encoding for storage:

The number of elements stored in the sorted set object is less than 128 (can be modified by configuring zset-max-ziplist-entries).

The total length of all elements stored in the sorted set object is less than 64 bytes (can be modified by configuring zset-max-ziplist-value).

Ordered Collection Object Common Commands

zadd key score1 member1 score2 member2: Add one or more elements (members) and their scores to the sorted set key.

zscore key member: Returns the score of the member member in the sorted set key.

zincrby key num member: add num to the member in the sorted set key, and num can be a negative number.

zcount key min max: Returns the number of members whose score value is in the interval [min,max] in the sorted set key.

zrange key start stop: Returns all members in the [start, stop] interval after the score in the sorted set key is arranged from small to large.

zrevrange key start stop: Returns all members in the [start, stop] interval after the scores in the sorted set key are arranged from large to small.

zrangebyscore key min max: Returns all elements in the [min,max] interval in the sorted set sorted by score from small to large. Note that the default is a closed interval, but you can add (or [ to control the open and closed interval before the values ​​of max and min.

zrevrangebyscore key max min: Returns all elements in the [min,max] interval in the sorted set sorted by score from large to small. Note that the default is a closed interval, but you can add (or [ to control the open and closed interval before the values ​​of max and min.

zrank key member: Returns the rank (from small to large) of elements in member in the sorted set, and the returned result starts from 0.

zrevrank key member: Returns the ranking of the elements in the member in the sorted set (from large to small), and the returned result starts from 0.

zlexcount key min max: Returns the number of members between min and max in the sorted set. Note that min and max in this command must be preceded by (or [ to control the opening and closing interval, and the special values ​​- and + represent negative infinity and positive infinity, respectively.

Knowing the common commands for operating ordered set objects, we can verify the type and encoding of the hash object mentioned above. Before testing, in order to prevent the interference of other key values, we first execute the flushall command to clear the Redis database. Before executing the command, we first put the parameters in the configuration file

zset-max-ziplist-entries

Modify it to 2, and then restart the Redis service. After the restart is complete, execute the following commands in sequence:

Get the following effect:

Summarize

This paper mainly analyzes the implementation principles of the underlying storage structures intset and skiplist of set objects and sorted set objects, and focuses on how sorted sets implement sorting and why two data structures (dictionary and skip list) are used for simultaneous storage. reason for the data.



Reviewing Editor: Liu Qing

Leave a Reply

Your email address will not be published.