Storing Storm Results in Redis

Source: Internet | Published by: linux编程入门书籍 | Editor: 程序博客网 | Date: 2024/05/24 06:08

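Transactional state was previously handled with MemcachedState. It now needs to be migrated to Redis, and the full list of keys also has to be recorded. Redis does offer the KEYS * command, but it is not recommended (it walks the entire keyspace and blocks the server while doing so), so all results are stored in a single Redis HASH instead.

For documentation on integrating Redis with Storm, see: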

http://storm.apache.org/releases/2.0.0-SNAPSHOT/storm-redis.html

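Redis offers a good number of data structure types, which gives us options: all keys can be collected in one set, or in another more suitable data structure.

Setting up and starting Redis

Of the 4 servers currently allocated, only 135 has much free memory. 1 GB of it is set aside for Redis, running as a single standalone instance that records all query logs.

Start the service: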

sudo bin/redis-server conf/redis.6388.conf

Integrating Storm with Redis

Add the Maven dependency:

    <dependency>
        <groupId>org.apache.storm</groupId>
        <artifactId>storm-redis</artifactId>
        <version>${storm.version}</version>
    </dependency>

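For ordinary bolts, storm-redis provides two basic implementations: RedisLookupBolt and RedisStoreBolt.

They use the strategy pattern: the key configuration and the lookup/store policy live in RedisLookupMapper and RedisStoreMapper, while RedisLookupBolt and RedisStoreBolt perform the actual lookup and store operations. Depending on the RedisDataType, the various Redis data types are supported: STRING, HASH, LIST, SET, SORTED_SET, HYPER_LOG_LOG.

The bolts read or store the value of the relevant field of the incoming Tuple. In RedisLookupBolt, the data type determines which command fetches the value, using the key taken from the tuple and, for HASH and SORTED_SET, the additionalKey as well.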

    @Override
    public void execute(Tuple input) {
        String key = lookupMapper.getKeyFromTuple(input);
        Object lookupValue;

        JedisCommands jedisCommand = null;
        try {
            jedisCommand = getInstance();

            // Pick the Redis read command that matches the configured data type.
            switch (dataType) {
                case STRING:
                    lookupValue = jedisCommand.get(key);
                    break;
                case LIST:
                    lookupValue = jedisCommand.lpop(key);
                    break;
                case HASH:
                    lookupValue = jedisCommand.hget(additionalKey, key);
                    break;
                case SET:
                    lookupValue = jedisCommand.scard(key);
                    break;
                case SORTED_SET:
                    lookupValue = jedisCommand.zscore(additionalKey, key);
                    break;
                case HYPER_LOG_LOG:
                    lookupValue = jedisCommand.pfcount(key);
                    break;
                default:
                    throw new IllegalArgumentException("Cannot process such data type: " + dataType);
            }

            // Emit the mapped values anchored to the input tuple, then ack it.
            List<Values> values = lookupMapper.toTuple(input, lookupValue);
            for (Values value : values) {
                collector.emit(input, value);
            }
            collector.ack(input);
        } catch (Exception e) {
            this.collector.reportError(e);
            this.collector.fail(input);
        } finally {
            returnInstance(jedisCommand);
        }
    }

Redis Trident State support

In addition, storm-redis supports Trident state:

RedisState and RedisMapState, which provide the Jedis interface, for a single Redis instance only.
RedisClusterState and RedisClusterMapState, which provide the JedisCluster interface, for a Redis cluster only.

Because we use single Redis mode (not a cluster), this is reflected in the UML diagram below:

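RedisDataTypeDescription defines the data type saved to Redis and the additional key. Two data types are supported here: STRING and HASH. The HASH type requires an additional key because a hash has two levels: the additionalKey we define is the outermost Redis key, and each state key becomes a field inside it.

For example, to save results into a Redis hash, specify RedisDataTypeDescription.RedisDataType.HASH and define the hash key "controller:5min". The stream is grouped by key, and the state is currently non-transactional (the data is not highly sensitive to correctness).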

    Options<Object> fiveMinutesOptions = new Options<>();
    // Store every aggregate as a field of the hash "controller:5min".
    fiveMinutesOptions.dataTypeDescription = new RedisDataTypeDescription(
            RedisDataTypeDescription.RedisDataType.HASH, "controller:5min");

    logStream.each(new Fields("logObject"), new Log5MinGroupFunction(), new Fields("key"))
            .groupBy(new Fields("key"))
            .persistentAggregate(
                    RedisMapState.nonTransactional(poolConfig, fiveMinutesOptions),
                    new Fields("logObject"),
                    new LogCombinerAggregator(),
                    new Fields("statistic"));
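Log5MinGroupFunction itself is not shown in the post. As a rough illustration of what a five-minute grouping key could look like, the sketch below rounds a timestamp down to the start of its window and appends it to a name. The class name, the method names, and the "name:windowStart" key format are all assumptions made for this example, not the author's implementation.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: only illustrates 5-minute bucketing, not the real Log5MinGroupFunction. */
final class FiveMinuteBucket {
    private static final long WINDOW_MS = TimeUnit.MINUTES.toMillis(5);

    private FiveMinuteBucket() {}

    /** Rounds an epoch-millisecond timestamp down to the start of its 5-minute window. */
    static long windowStart(long epochMillis) {
        return Math.floorDiv(epochMillis, WINDOW_MS) * WINDOW_MS;
    }

    /** Builds a grouping key of the assumed form "<controller>:<windowStart>". */
    static String groupKey(String controller, long epochMillis) {
        return controller + ":" + windowStart(epochMillis);
    }

    public static void main(String[] args) {
        // Two events 90 seconds apart fall into the same window and share a key.
        System.out.println(groupKey("login", 600_000L));
        System.out.println(groupKey("login", 690_000L));
    }
}
```

Because every event in the same window maps to the same string, groupBy sends them to the same aggregate, and the string ends up as a field of the hash.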

What finally ends up in Redis is:

controller:5min    field: the key generated by Log5MinGroupFunction; value: the result merged by LogCombinerAggregator
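The two-level layout can be pictured with a small in-memory model. This is only a stand-in built on java.util maps to show the shape of the data (outer key, then field, then value) and why the fields can be listed without scanning the whole keyspace. It is not a Redis client, and the class and method names are made up for the example.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** In-memory stand-in for a Redis HASH; illustrates the layout only. */
final class HashLayoutModel {
    // outer Redis key -> (hash field -> serialized value)
    private final Map<String, Map<String, String>> store = new LinkedHashMap<>();

    /** Models HSET outerKey field value. */
    void hset(String outerKey, String field, String value) {
        store.computeIfAbsent(outerKey, k -> new LinkedHashMap<>()).put(field, value);
    }

    /** Models HGET outerKey field; returns null when the field is absent. */
    String hget(String outerKey, String field) {
        Map<String, String> hash = store.get(outerKey);
        return hash == null ? null : hash.get(field);
    }

    /** Models HKEYS outerKey: every stored field, read from one hash only. */
    Set<String> hkeys(String outerKey) {
        Map<String, String> hash = store.get(outerKey);
        if (hash == null) {
            return Collections.emptySet();
        }
        return Collections.unmodifiableSet(new LinkedHashSet<>(hash.keySet()));
    }
}
```

With this layout, the list of all result keys is simply the field list of "controller:5min", which is the property the migration was after.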

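The key generated by Log5MinGroupFunction is converted by the KeyFactory.build(List<Object> key) method, so a custom KeyFactory is worth considering if the generated key needs to be customized. The final value passes through the Serializer's serialize and deserialize methods, which convert it to and from byte[] for storage in Redis; the default format is JSON.

In AbstractRedisMapState, all incoming keys are converted uniformly through KeyFactory.build, while the actual reading and persisting of values is done by two methods, retrieveValuesFromRedis and updateStatesToRedis: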
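As a sketch of what customizing the key could involve, the example below joins all group-by fields into one string with a prefix. To stay self-contained it declares a local KeyBuilder interface with the same build(List<Object>) shape as the method mentioned above. The interface name, the JoiningKeyBuilder class, and the ":" separator are assumptions for illustration, and a real implementation would implement the library's KeyFactory instead.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Local stand-in with the same shape as the build(List<Object>) method. */
interface KeyBuilder {
    String build(List<Object> key);
}

/** Hypothetical builder: joins every group-by field with ':' so compound keys work. */
final class JoiningKeyBuilder implements KeyBuilder {
    private final String prefix;

    JoiningKeyBuilder(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public String build(List<Object> key) {
        if (key == null || key.isEmpty()) {
            throw new IllegalArgumentException("key must contain at least one field");
        }
        // String.valueOf handles non-string fields (numbers, nulls) uniformly.
        String joined = key.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(":"));
        return prefix.isEmpty() ? joined : prefix + ":" + joined;
    }
}
```

A prefix keeps fields from different topologies apart when they share one hash, at the cost of slightly longer field names.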

    @Override
    public List<T> multiGet(List<List<Object>> keys) {
        if (keys.size() == 0) {
            return Collections.emptyList();
        }

        List<String> stringKeys = buildKeys(keys);
        List<String> values = retrieveValuesFromRedis(stringKeys);

        return deserializeValues(keys, values);
    }

    // Converts each Trident key (a list of fields) into a single Redis key string.
    private List<String> buildKeys(List<List<Object>> keys) {
        List<String> stringKeys = new ArrayList<String>();

        for (List<Object> key : keys) {
            stringKeys.add(getKeyFactory().build(key));
        }

        return stringKeys;
    }

    @Override
    public void multiPut(List<List<Object>> keys, List<T> vals) {
        if (keys.size() == 0) {
            return;
        }

        // Serialize every value and pair it with its Redis key before writing.
        Map<String, String> keyValues = new HashMap<String, String>();
        for (int i = 0; i < keys.size(); i++) {
            String val = new String(getSerializer().serialize(vals.get(i)));
            String redisKey = getKeyFactory().build(keys.get(i));
            keyValues.put(redisKey, val);
        }

        updateStatesToRedis(keyValues);
    }

In RedisMapState, values are fetched from Redis as follows:

    @Override
    protected List<String> retrieveValuesFromRedis(List<String> keys) {
        String[] stringKeys = keys.toArray(new String[keys.size()]);

        Jedis jedis = null;
        try {
            jedis = jedisPool.getResource();

            RedisDataTypeDescription description = this.options.dataTypeDescription;
            switch (description.getDataType()) {
            case STRING:
                return jedis.mget(stringKeys);

            case HASH:
                return jedis.hmget(description.getAdditionalKey(), stringKeys);
            // ...

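As the code shows, two types are supported, STRING and HASH, and in both cases the values for multiple keys are fetched with one batch call (mget or hmget). The update path is similar. For STRING, the values are written with mset, and when an expiry is configured the per-key expire calls are sent through a pipeline (not supported in cluster mode), which avoids one round trip per key. For HASH, a single hmset is enough, and the expiry is set once on the outer hash key.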

    protected void updateStatesToRedis(Map<String, String> keyValues) {
        Jedis jedis = null;

        try {
            jedis = jedisPool.getResource();

            RedisDataTypeDescription description = this.options.dataTypeDescription;
            switch (description.getDataType()) {
            case STRING:
                String[] keyValue = buildKeyValuesList(keyValues);
                jedis.mset(keyValue);
                if (this.options.expireIntervalSec > 0) {
                    // Keys sit at even indexes of keyValue, hence the step of 2.
                    Pipeline pipe = jedis.pipelined();
                    for (int i = 0; i < keyValue.length; i += 2) {
                        pipe.expire(keyValue[i], this.options.expireIntervalSec);
                    }
                    pipe.sync();
                }
                break;

            case HASH:
                jedis.hmset(description.getAdditionalKey(), keyValues);
                if (this.options.expireIntervalSec > 0) {
                    jedis.expire(description.getAdditionalKey(), this.options.expireIntervalSec);
                }
                break;
            // ...
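The STRING branch relies on buildKeyValuesList, which the excerpt does not show. MSET takes its arguments as alternating keys and values, so a helper of this kind presumably flattens the map into that shape. The sketch below is one way to write it; the class name MsetArgs and the method name flatten are invented for the example and are not the library's code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper showing the alternating key/value layout that MSET expects. */
final class MsetArgs {
    private MsetArgs() {}

    /** Flattens {k1=v1, k2=v2} into [k1, v1, k2, v2], in the map's iteration order. */
    static String[] flatten(Map<String, String> keyValues) {
        String[] flat = new String[keyValues.size() * 2];
        int i = 0;
        for (Map.Entry<String, String> entry : keyValues.entrySet()) {
            flat[i++] = entry.getKey();
            flat[i++] = entry.getValue();
        }
        return flat;
    }

    public static void main(String[] args) {
        Map<String, String> kv = new LinkedHashMap<>();
        kv.put("login:600000", "{\"count\":3}");
        kv.put("search:600000", "{\"count\":7}");

        String[] flat = flatten(kv);
        // Keys sit at even indexes, which is why the expire loop steps by 2.
        for (int i = 0; i < flat.length; i += 2) {
            System.out.println(flat[i]);
        }
    }
}
```

This also explains the loop in the STRING branch: stepping through the array two slots at a time visits exactly the keys and skips the values.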
