An Introduction to Using Bitmaps in Redis

Source: Internet. Editor: 程序博客网. Date: 2024/06/06 00:08

Redis has two data structures used mainly for statistics: bitmaps and HyperLogLogs.

Bitmaps:

Below is the official Redis introduction to bitmaps, with each passage followed by a restatement.

Bitmaps are not an actual data type, but a set of bit-oriented operations defined on the String type. Since strings are binary safe blobs and their maximum length is 512 MB, they are suitable to set up to 2^32 different bits.

Bit operations are divided into two groups: constant-time single bit operations, like setting a bit to 1 or 0, or getting its value, and operations on groups of bits, for example counting the number of set bits in a given range of bits (e.g., population counting).

One of the biggest advantages of bitmaps is that they often provide extreme space savings when storing information. For example in a system where different users are represented by incremental user IDs, it is possible to remember a single bit information (for example, knowing whether a user wants to receive a newsletter) of 4 billion of users using just 512 MB of memory.


Bitmaps are in fact not a new data type but a set of bit-level operations on strings. Because strings are binary safe and can be up to 512 MB long, they can hold 2^32 (512 * 1024 * 1024 * 8) distinct bits.

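Bit operations fall into two groups: (1) constant-time operations on a single bit, such as setting a bit of the string to 1 or 0 or reading its value; and (2) operations on a group of bits, such as counting the bits set to 1 within a given range.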

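The biggest advantage of bitmaps is the space they save. For example, in a project that identifies users by auto-incrementing IDs, 512 MB of memory is enough to record one bit of information (such as whether a user wants to receive notifications, stored as 1 or 0) for about 4 billion users.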
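The capacity figures above come down to simple arithmetic. Here is a minimal sketch in PHP, assuming a 64-bit PHP build so the integers do not overflow; the helper name `bytesForBits` is made up for illustration.

```php
<?php
// Maximum size of a Redis string, as stated above: 512 MB.
$maxBytes = 512 * 1024 * 1024;

// Each byte holds 8 bits, so the largest bitmap addresses 2^32 bits.
$maxBits = $maxBytes * 8;

// Hypothetical helper: bytes needed to hold one flag for each of $n IDs
// (IDs 0 .. $n - 1), rounded up to a whole byte.
function bytesForBits(int $n): int
{
    return intdiv($n + 7, 8);
}

echo $maxBits . PHP_EOL;                                // 4294967296
echo ($maxBits === 2 ** 32 ? 'yes' : 'no') . PHP_EOL;   // yes
echo bytesForBits(1000000) . PHP_EOL;                   // 125000
```

So one flag for a million users fits in roughly 122 KB, and the 512 MB limit is reached only at about 4.29 billion IDs.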


--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Bits are set and retrieved using the SETBIT and GETBIT commands:

    > setbit key 10 1
    (integer) 1
    > getbit key 10
    (integer) 1
    > getbit key 11
    (integer) 0

The SETBIT command takes as its first argument the bit number, and as its second argument the value to set the bit to, which is 1 or 0. The command automatically enlarges the string if the addressed bit is outside the current string length.

GETBIT just returns the value of the bit at the specified index. Out of range bits (addressing a bit that is outside the length of the string stored into the target key) are always considered to be zero.


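SETBIT sets the value of a bit and GETBIT reads it. The first argument after the key tells SETBIT which bit to set, and the second is the value to store, which can only be 1 or 0. If the addressed bit lies beyond the current length of the string, Redis enlarges the string automatically.

GETBIT returns the value of the bit at the given index; any bit beyond the end of the stored string reads as 0.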


PHP code example:

    <?php
    $redis = new Redis();
    $redis->connect('127.0.0.1', 6379);
    $redis->flushAll(); // caution: deletes every key on this server

    $redis->setBit('bm', 0, 1);
    $redis->setBit('bm', 1, 1);
    $redis->setBit('bm', 2, 1);
    $redis->setBit('bm', 5, 0);
    $redis->setBit('bm', 7, 1);
    $redis->setBit('bm', 8, 1);
    $redis->setBit('bm', 14, 1);

    echo $redis->getBit('bm', 0) . PHP_EOL;
    echo $redis->getBit('bm', 1) . PHP_EOL;
    echo $redis->getBit('bm', 7) . PHP_EOL;
    echo $redis->getBit('bm', 15) . PHP_EOL;

    // Fetch the raw string and print each byte in binary.
    $str = $redis->get('bm');
    echo "length: " . strlen($str) . PHP_EOL;
    for ($i = 0; $i < strlen($str); $i++) {
        echo decbin(ord($str[$i])) . ' ';
    }



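Output:

    1
    1
    1
    0
    length: 2
    11100001 10000010

By convention bits are numbered from 0, starting at the leftmost (most significant) bit of the first byte, so bit 7 is the eighth bit, bit 14 is the fifteenth, and so on.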
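The byte layout in the output above follows from how a bit offset maps to a byte index and a position inside that byte. The sketch below mimics SETBIT and GETBIT on a plain PHP string, with no Redis server involved. The function names `sketchSetBit` and `sketchGetBit` are hypothetical and exist only for this illustration.

```php
<?php
// Hypothetical helpers that mimic SETBIT/GETBIT on a plain PHP string.
// Bit 0 is the most significant bit of the first byte.

function sketchSetBit(string $s, int $offset, int $value): string
{
    $byte = intdiv($offset, 8);        // which byte holds the bit
    $mask = 0x80 >> ($offset % 8);     // position inside that byte
    // Grow the string with zero bytes if the bit lies beyond its end.
    $s = str_pad($s, max(strlen($s), $byte + 1), "\0");
    $old = ord($s[$byte]);
    $new = $value ? ($old | $mask) : ($old & ~$mask);
    $s[$byte] = chr($new & 0xFF);
    return $s;
}

function sketchGetBit(string $s, int $offset): int
{
    $byte = intdiv($offset, 8);
    if ($byte >= strlen($s)) {
        return 0;                      // out-of-range bits read as 0
    }
    return (ord($s[$byte]) >> (7 - $offset % 8)) & 1;
}

$bm = '';
foreach ([0, 1, 2, 7, 8, 14] as $offset) {
    $bm = sketchSetBit($bm, $offset, 1);
}
echo strlen($bm) . PHP_EOL;                                     // 2
printf("%08b %08b\n", ord($bm[0]), ord($bm[1]));                // 11100001 10000010
echo sketchGetBit($bm, 15) . sketchGetBit($bm, 100) . PHP_EOL;  // 00
```

Setting bit 14 on an empty string is what forces the second byte into existence, which is why the stored string has length 2.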

As noted earlier, bitmaps are mainly used for statistics, so Redis provides several commands for that purpose.


There are three commands operating on group of bits:

  1. BITOP performs bit-wise operations between different strings. The provided operations are AND, OR, XOR and NOT.
  2. BITCOUNT performs population counting, reporting the number of bits set to 1.
  3. BITPOS finds the first bit having the specified value of 0 or 1.

Both BITPOS and BITCOUNT are able to operate with byte ranges of the string, instead of running for the whole length of the string.

    > setbit key 0 1
    (integer) 0
    > setbit key 100 1
    (integer) 0
    > bitcount key
    (integer) 2


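There are three operations on groups of bits:

  1. BITOP performs bitwise operations between different strings: AND, OR, XOR, and NOT. These resemble PHP's bitwise operators.
  2. BITCOUNT counts the bits set to 1 in a string.
  3. BITPOS returns the position of the first bit set to 0 or 1.

Both BITPOS and BITCOUNT can operate on a byte range of the string instead of the whole string.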
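The byte-range behaviour of BITCOUNT and BITPOS can be illustrated without a Redis server. The sketch below uses hypothetical helpers (`sketchRange`, `sketchBitCount`, `sketchBitPos`) that approximate the two commands on a PHP string. It deliberately ignores the extra rules of the real BITPOS, such as searching for a 0 bit in a string made only of 1 bits.

```php
<?php
// Hypothetical helpers that mimic BITCOUNT and BITPOS on a PHP string.
// $start and $end are inclusive BYTE indexes; negative values count
// from the end of the string, as with Redis ranges.

function sketchRange(int $len, int $start, int $end): array
{
    if ($start < 0) { $start = max(0, $len + $start); }
    if ($end < 0)   { $end = max(0, $len + $end); }
    return [$start, min($end, $len - 1)];
}

function sketchBitCount(string $s, int $start = 0, int $end = -1): int
{
    [$start, $end] = sketchRange(strlen($s), $start, $end);
    $count = 0;
    for ($i = $start; $i <= $end; $i++) {
        // Count the '1' characters in the byte's binary representation.
        $count += substr_count(decbin(ord($s[$i])), '1');
    }
    return $count;
}

function sketchBitPos(string $s, int $bit, int $start = 0, int $end = -1): int
{
    [$start, $end] = sketchRange(strlen($s), $start, $end);
    for ($i = $start; $i <= $end; $i++) {
        for ($j = 0; $j < 8; $j++) {
            if (((ord($s[$i]) >> (7 - $j)) & 1) === $bit) {
                return $i * 8 + $j;    // offset from the start of the string
            }
        }
    }
    return -1;                         // not found in the range
}

$s = "\xA4\x04";                               // 10100100 00000100
echo sketchBitCount($s) . PHP_EOL;             // 4
echo sketchBitCount($s, 1, 1) . PHP_EOL;       // 1
echo sketchBitPos($s, 1, 1) . PHP_EOL;         // 13
```

Note that the range selects bytes, while the position returned by the BITPOS-style helper is still a bit offset measured from the start of the whole string.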

PHP demonstration:

    <?php
    /**
     * Return the binary form of a string, one 8-bit group per byte.
     */
    function str2bin($str)
    {
        $rt = '';
        $delimiter = '';
        for ($i = 0; $i < strlen($str); $i++) {
            $bits = decbin(ord($str[$i]));
            $len = strlen($bits);
            $zeroFillLen = 8 - $len;
            $bits = str_repeat('0', $zeroFillLen) . $bits; // left-pad with '0' to eight bits
            $rt .= $delimiter . $bits;
            $delimiter = ' ';
        }
        return $rt;
    }

    $redis = new Redis();
    $redis->connect('127.0.0.1', 6379);
    $redis->del('bm', 'bm1', 'bm2'); // start from a clean state

    $redis->setBit('bm1', 0, 1);
    $redis->setBit('bm1', 2, 1);
    $redis->setBit('bm1', 5, 1);
    $redis->setBit('bm1', 13, 1);

    $redis->setBit('bm2', 5, 1);
    $redis->setBit('bm2', 6, 1);
    $redis->setBit('bm2', 12, 1); // needed to reproduce the second byte of bm2 (00001110) in the output
    $redis->setBit('bm2', 13, 1);
    $redis->setBit('bm2', 14, 1);

    echo str2bin($redis->get('bm1')) . PHP_EOL;
    echo str2bin($redis->get('bm2')) . PHP_EOL;

    // public function bitOp($operation, $retKey, ...$keys)
    $redis->bitOp('AND', 'bm', 'bm1', 'bm2');
    echo 'AND: ' . str2bin($redis->get('bm')) . PHP_EOL;

    $redis->bitOp('OR', 'bm', 'bm1', 'bm2');
    echo 'OR: ' . str2bin($redis->get('bm')) . PHP_EOL;

    $redis->bitOp('XOR', 'bm', 'bm1', 'bm2');
    echo 'XOR: ' . str2bin($redis->get('bm')) . PHP_EOL;

    $redis->bitOp('NOT', 'bm', 'bm1');
    echo 'NOT: ' . str2bin($redis->get('bm')) . PHP_EOL;

    // 'bm' now holds the NOT result, so this counts its set bits.
    echo 'bitcount: ' . $redis->bitCount('bm') . PHP_EOL;
    echo 'bitpos: ' . $redis->bitpos('bm2', 1) . PHP_EOL;





Output:

    10100100 00000100
    00000110 00001110
    AND: 00000100 00000100
    OR: 10100110 00001110
    XOR: 10100010 00001010
    NOT: 01011011 11111011
    bitcount: 12
    bitpos: 5
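PHP's `&`, `|`, `^`, and `~` operators work byte by byte when applied to strings, so the BITOP results above can be cross-checked without Redis. The sketch below starts from the two byte pairs printed in the output. The helper `bits` is hypothetical, and the comparison only holds for operands of equal length, because Redis zero-pads shorter strings while PHP's string operators do not follow that rule for every operator.

```php
<?php
// Byte values taken from the first two lines of the printed output.
$bm1 = "\xA4\x04";   // 10100100 00000100
$bm2 = "\x06\x0E";   // 00000110 00001110

// Hypothetical helper: render a binary string as groups of 8 bits.
function bits(string $s): string
{
    $parts = [];
    foreach (str_split($s) as $ch) {
        $parts[] = sprintf('%08b', ord($ch));
    }
    return implode(' ', $parts);
}

echo 'AND: ' . bits($bm1 & $bm2) . PHP_EOL;   // AND: 00000100 00000100
echo 'OR: '  . bits($bm1 | $bm2) . PHP_EOL;   // OR: 10100110 00001110
echo 'XOR: ' . bits($bm1 ^ $bm2) . PHP_EOL;   // XOR: 10100010 00001010
echo 'NOT: ' . bits(~$bm1) . PHP_EOL;         // NOT: 01011011 11111011

// Population count of the NOT result, matching "bitcount: 12".
echo substr_count(bits(~$bm1), '1') . PHP_EOL; // 12
```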

Common user cases for bitmaps are:

  • Real time analytics of all kinds.
  • Storing space efficient but high performance boolean information associated with object IDs.

For example imagine you want to know the longest streak of daily visits of your web site users. You start counting days starting from zero, that is the day you made your web site public, and set a bit with SETBIT every time the user visits the web site. As a bit index you simply take the current unix time, subtract the initial offset, and divide by 3600*24.

This way for each user you have a small string containing the visit information for each day. With BITCOUNT it is possible to easily get the number of days a given user visited the web site, while with a few BITPOS calls, or simply fetching and analyzing the bitmap client-side, it is possible to easily compute the longest streak.

Bitmaps are trivial to split into multiple keys, for example for the sake of sharding the data set and because in general it is better to avoid working with huge keys. To split a bitmap across different keys instead of setting all the bits into a key, a trivial strategy is just to store M bits per key and obtain the key name with bit-number/M and the Nth bit to address inside the key with bit-number MOD M.


Typical uses of bitmaps:

  • Real-time analytics of all kinds.
  • Space-efficient, high-performance storage of boolean values associated with IDs.

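For example, suppose you want to find the longest streak of daily visits among your site's users. Count days from zero, the day the site went public, and set a bit each time a user visits (translator's note: the user's uid could serve as the key). The bit index is the current Unix timestamp minus an offset (translator's note: for example, the timestamp of the site's launch), divided by 3600 * 24 using integer division.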

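This way each user needs only a short string to hold the daily visit information. BITCOUNT readily gives the number of days a given user visited, and a few BITPOS calls, or simply fetching the bitmap and analyzing it on the client, readily give the longest streak.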
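The day-index formula and the client-side streak analysis described above can be sketched in a few lines of PHP. The names `dayIndex` and `longestStreak` and the launch timestamp are invented for this example. The bitmap is assumed to have been fetched already as a raw string, for instance with GET.

```php
<?php
// Hypothetical helper: day index of a visit, counting from launch day 0.
function dayIndex(int $visitTs, int $launchTs): int
{
    return intdiv($visitTs - $launchTs, 3600 * 24);
}

// Hypothetical helper: longest run of consecutive 1 bits in a bitmap
// that was fetched client-side as a raw string.
function longestStreak(string $bitmap): int
{
    $best = 0;
    $run = 0;
    for ($i = 0; $i < strlen($bitmap) * 8; $i++) {
        $bit = (ord($bitmap[intdiv($i, 8)]) >> (7 - $i % 8)) & 1;
        $run = $bit ? $run + 1 : 0;    // extend the run or reset it
        $best = max($best, $run);
    }
    return $best;
}

$launch = 1700000000;                                       // made-up launch timestamp
echo dayIndex($launch + 3 * 86400 + 5, $launch) . PHP_EOL;  // 3

// Visits on days 0-1, 3-9 and 12: 11011111 11001000
echo longestStreak("\xDF\xC8") . PHP_EOL;                   // 7
```

Scanning on the client costs one round trip and a linear pass over a string that grows by only one byte every eight days.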

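Bitmaps are easy to split across multiple keys, for example to shard the data set, and because it is generally better to avoid storing too much under one key. Instead of keeping all bits in a single key, a simple scheme stores M bits per key: the key name is derived from the bit number divided by M (integer division), and the index of the bit inside that key is the bit number modulo M.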
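The splitting scheme above amounts to one division and one remainder. Here is a minimal sketch; the helper name `shardFor` and the `prefix:index` key format are assumptions made for this example rather than anything the scheme prescribes.

```php
<?php
// Hypothetical helper: map a global bit number to a (key, offset) pair
// when each key stores $m bits. The "prefix:index" key format is an
// assumption made for this example.
function shardFor(string $prefix, int $bitNumber, int $m): array
{
    $key = $prefix . ':' . intdiv($bitNumber, $m);  // bit-number / M
    $offset = $bitNumber % $m;                      // bit-number MOD M
    return [$key, $offset];
}

[$key, $offset] = shardFor('newsletter', 2500000, 1000000);
echo $key . ' ' . $offset . PHP_EOL;   // newsletter:2 500000
```

The resulting pair would then be passed to SETBIT or GETBIT, for example `$redis->setBit($key, $offset, 1)` with the phpredis client used in the earlier examples.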






