2 - Bulk Importing Data into an Elasticsearch Cluster

Source: Internet  Editor: 程序博客网  Date: 2024/05/22 06:43

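1. Data format

We use a Person object as the starting point: Person objects that have been serialized to JSON and stored in a file are imported into an Elasticsearch cluster.
The code for this article is available at: https://github.com/hawkingfoo/es-batch-import

1.1 Data type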

public class Person {
    private int pid;            // person id
    private int age;
    private boolean sex;
    private String name;
    private String addr;
}

1.2 File format after JSON serialization

Person.dat: each line holds an id and a JSON string, separated by \t.

0   {"pid":0,"age":41,"sex":true,"name":"Lucy","addr":"Shanghai"}
1   {"pid":1,"age":9,"sex":true,"name":"Jenny","addr":"Shenzhen"}
2   {"pid":2,"age":9,"sex":true,"name":"Lily","addr":"Tianjin"}
3   {"pid":3,"age":42,"sex":false,"name":"David","addr":"Guangzhou"}
4   {"pid":4,"age":40,"sex":true,"name":"Mary","addr":"Chongqing"}
5   {"pid":5,"age":3,"sex":true,"name":"Jenny","addr":"Guangzhou"}
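As an illustration of how one such line could be produced, here is a minimal sketch using only the JDK. The class PersonDatWriter and its methods are hypothetical and not taken from the repository, which may well use a JSON library instead of the hand-written serialization assumed here.

```java
// Hypothetical helper, not part of the original repository.
class PersonDatWriter {

    // Mirrors the Person fields from section 1.1.
    static final class Person {
        final int pid;
        final int age;
        final boolean sex;
        final String name;
        final String addr;

        Person(int pid, int age, boolean sex, String name, String addr) {
            this.pid = pid;
            this.age = age;
            this.sex = sex;
            this.name = name;
            this.addr = addr;
        }
    }

    // Escapes backslashes and double quotes only. This is enough for the
    // sample names; a JSON library would also handle control characters.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Serializes a Person with the same field order as the sample file.
    static String toJson(Person p) {
        return "{\"pid\":" + p.pid
                + ",\"age\":" + p.age
                + ",\"sex\":" + p.sex
                + ",\"name\":\"" + escape(p.name) + "\""
                + ",\"addr\":\"" + escape(p.addr) + "\"}";
    }

    // One Person.dat line: the id, a tab, then the JSON string.
    static String toLine(Person p) {
        return p.pid + "\t" + toJson(p);
    }

    public static void main(String[] args) {
        System.out.println(toLine(new Person(0, 41, true, "Lucy", "Shanghai")));
    }
}
```

Using pid as the line id matches the sample data, where both values are identical on every line.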

2. Creating the ES index and mapping

Create an index with 5 shards and 1 replica. The ES type is infos, and the corresponding mapping is as follows:

{
  "settings": {
    "index": {
      "creation_date": "1470300617555",
      "legacy": {
        "routing": {
          "hash": {
            "type": "org.elasticsearch.cluster.routing.DjbHashFunction"
          },
          "use_type": "false"
        }
      },
      "number_of_shards": "5",
      "number_of_replicas": "1",
      "uuid": "mJXGBmnYS12mXBo0aGrR3Q",
      "version": {
        "created": "1070099",
        "upgraded": "2030499"
      }
    }
  },
  "mappings": {
    "infos": {
      "_timestamp": {},
      "properties": {
        "sex": {
          "type": "boolean"
        },
        "name": {
          "index": "not_analyzed",
          "type": "string"
        },
        "pid": {
          "type": "integer"
        },
        "addr": {
          "index": "not_analyzed",
          "type": "string"
        },
        "age": {
          "type": "integer"
        }
      }
    }
  }
}
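The JSON above reads like a dump of an existing index: fields such as creation_date, uuid and version are normally filled in by the cluster rather than supplied by the caller. The sketch below shows what a trimmed-down creation request might look like when sent over HTTP with the JDK only. The class CreatePersonIndex is hypothetical, the URL http://127.0.0.1:9200/person is an assumption (default HTTP port, index name taken from section 3.4), and the _timestamp entry is deliberately left out.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original repository.
class CreatePersonIndex {

    // Only the settings and field mappings a caller would normally provide.
    static String requestBody() {
        return "{"
                + "\"settings\":{\"number_of_shards\":5,\"number_of_replicas\":1},"
                + "\"mappings\":{\"infos\":{\"properties\":{"
                + "\"pid\":{\"type\":\"integer\"},"
                + "\"age\":{\"type\":\"integer\"},"
                + "\"sex\":{\"type\":\"boolean\"},"
                + "\"name\":{\"type\":\"string\",\"index\":\"not_analyzed\"},"
                + "\"addr\":{\"type\":\"string\",\"index\":\"not_analyzed\"}"
                + "}}}}";
    }

    // Sends the body with HTTP PUT and returns the HTTP status code.
    static int put(String indexUrl) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(indexUrl).openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(requestBody().getBytes(StandardCharsets.UTF_8));
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            // Without a URL argument, only print the request body.
            System.out.println(requestBody());
        } else {
            // Example argument (assumed): http://127.0.0.1:9200/person
            System.out.println("HTTP status: " + put(args[0]));
        }
    }
}
```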

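3. Import program modules

3.1 Flow chart

Figure: import module

The flow of the whole import module is shown in the figure above. Main creates a BulkProcessor through ESClient, reads the JSON strings from Person.dat, wraps each one in an UpdateRequest, and adds it to the BulkProcessor. Once the BulkProcessor meets one of its send conditions, it sends the buffered requests to the ES cluster as a batch.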

3.2 Creating the ESClient

Add the Maven dependency:

<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>2.3.4</version>
</dependency>

// ESConfig
public class ESConfig {
    private String esClusterName;    // cluster name
    private String esClusterAddress; // cluster address
    private String esIndex;          // ES index (the "database")
    private String esType;           // ES type (the "table")
    private int batchSize;           // bulk import batch size
    private String filePath;         // path of the file to import
    private int esThreadNum;         // number of concurrent bulk requests to ES
    private String localClientIP;    // IP address of the local machine

    public String getEsClusterName() {
        return esClusterName;
    }

    public ESConfig setEsClusterName(String esClusterName) {
        this.esClusterName = esClusterName;
        return this;
    }

    public String getEsClusterAddress() {
        return esClusterAddress;
    }

    public ESConfig setEsClusterAddress(String esClusterAddress) {
        this.esClusterAddress = esClusterAddress;
        return this;
    }

    public String getEsIndex() {
        return esIndex;
    }

    public ESConfig setEsIndex(String esIndex) {
        this.esIndex = esIndex;
        return this;
    }

    public String getEsType() {
        return esType;
    }

    public ESConfig setEsType(String esType) {
        this.esType = esType;
        return this;
    }

    public int getBatchSize() {
        return batchSize;
    }

    public ESConfig setBatchSize(int batchSize) {
        this.batchSize = batchSize;
        return this;
    }

    public String getFilePath() {
        return filePath;
    }

    public ESConfig setFilePath(String filePath) {
        this.filePath = filePath;
        return this;
    }

    public int getEsThreadNum() {
        return esThreadNum;
    }

    public ESConfig setEsThreadNum(int esThreadNum) {
        this.esThreadNum = esThreadNum;
        return this;
    }

    public String getLocalClientIP() {
        return localClientIP;
    }

    public ESConfig setLocalClientIP(String localClientIP) {
        this.localClientIP = localClientIP;
        return this;
    }
}

ESClient:

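import java.net.InetAddress;
import java.util.ArrayList;
import java.util.List;

import org.apache.logging.log4j.LogManager;   // Log4j 2 is assumed here, based on the {} placeholders
import org.apache.logging.log4j.Logger;
import org.elasticsearch.action.bulk.BackoffPolicy;
import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.common.unit.ByteSizeUnit;
import org.elasticsearch.common.unit.ByteSizeValue;
import org.elasticsearch.common.unit.TimeValue;

public class ESClient {
    private static final Logger logger = LogManager.getLogger(ESClient.class);

    public BulkProcessor createBulkProcessor(ESConfig esConfig) {
        String clusterName = esConfig.getEsClusterName();
        String clusterAddr = esConfig.getEsClusterAddress();
        if (clusterName == null || clusterName.isEmpty()) {
            logger.error("invalid cluster name.");
            return null;
        }
        if (clusterAddr == null || clusterAddr.isEmpty()) {
            logger.error("invalid cluster address.");
            return null;
        }
        // The address is expected in host:port form
        String[] addr = clusterAddr.split(":");
        if (addr.length != 2) {
            logger.error("invalid cluster address.");
            return null;
        }

        Settings settings = Settings.settingsBuilder()
                .put("cluster.name", clusterName)
                .put("client.transport.sniff", true)   // the sniffing setting lives under client.*, not cluster.*
                .put("index.refresh_interval", "60s")
                .build();

        // Create the TransportClient
        TransportClient transportClient = new TransportClient.Builder()
                .settings(settings).build();
        List<InetSocketTransportAddress> addrList = new ArrayList<>();
        try {
            addrList.add(new InetSocketTransportAddress(InetAddress.getByName(addr[0]),
                    Integer.parseInt(addr[1])));
        } catch (Exception e) {
            logger.error("exception:", e);
            return null;
        }
        for (InetSocketTransportAddress address : addrList) {
            transportClient.addTransportAddress(address);
        }
        Client client = transportClient;

        // Initialize the bulk processor
        BulkProcessor bulkProcessor = BulkProcessor.builder(
                client,
                new BulkProcessor.Listener() {
                    long begin;
                    long cost;
                    int count = 0;

                    @Override
                    public void beforeBulk(long executionId, BulkRequest bulkRequest) {
                        begin = System.currentTimeMillis();
                    }

                    @Override
                    public void afterBulk(long executionId, BulkRequest bulkRequest, BulkResponse bulkResponse) {
                        cost = (System.currentTimeMillis() - begin) / 1000;
                        count += bulkRequest.numberOfActions();
                        // A response can arrive even when individual items failed
                        if (bulkResponse.hasFailures()) {
                            logger.error("bulk has failures: {}", bulkResponse.buildFailureMessage());
                        } else {
                            logger.info("bulk success. size:[{}] cost:[{}s]", count, cost);
                        }
                    }

                    @Override
                    public void afterBulk(long executionId, BulkRequest bulkRequest, Throwable throwable) {
                        logger.error("bulk update has failures, will retry:" + throwable);
                    }
                })
                .setBulkActions(esConfig.getBatchSize())                    // number of actions per batch
                .setBulkSize(new ByteSizeValue(1, ByteSizeUnit.MB))         // send once 1 MB is buffered
                .setConcurrentRequests(esConfig.getEsThreadNum())           // number of concurrent requests
                .setFlushInterval(TimeValue.timeValueSeconds(5))            // flush interval: 5s
                .setBackoffPolicy(BackoffPolicy.constantBackoff(TimeValue.timeValueSeconds(1), 3)) // retry 3 times, 1s apart
                .build();
        return bulkProcessor;
    }
}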
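createBulkProcessor collects addresses in a list but only ever parses a single host:port pair. If several nodes should be registered, the parsing step could be factored out as in the sketch below. The class ClusterAddressParser and the comma-separated input format are assumptions made for illustration; the sketch uses only JDK types, so each result would still need to be wrapped in an InetSocketTransportAddress before being passed to the client.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original repository.
class ClusterAddressParser {

    // Parses "host1:port1,host2:port2" into unresolved socket addresses.
    // Throws IllegalArgumentException when an entry is not in host:port form.
    static List<InetSocketAddress> parse(String addresses) {
        List<InetSocketAddress> result = new ArrayList<>();
        if (addresses == null) {
            return result;
        }
        for (String part : addresses.split(",")) {
            String[] hostPort = part.trim().split(":");
            if (hostPort.length != 2 || hostPort[0].isEmpty()) {
                throw new IllegalArgumentException("expected host:port but got: " + part);
            }
            // NumberFormatException is a subclass of IllegalArgumentException
            int port = Integer.parseInt(hostPort[1]);
            result.add(InetSocketAddress.createUnresolved(hostPort[0], port));
        }
        return result;
    }

    public static void main(String[] args) {
        for (InetSocketAddress address : parse("127.0.0.1:9300, 10.0.0.2:9300")) {
            System.out.println(address.getHostString() + " port " + address.getPort());
        }
    }
}
```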

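Section 3.1 mentioned that the BulkProcessor sends data once a send condition is met. These conditions correspond to three setter methods on the BulkProcessor builder above (setBulkActions, setBulkSize and setFlushInterval):
- when the number of buffered requests (UpdateRequest) reaches the configured batch size, they are sent;
- when the buffered data reaches 1 MB, it is sent;
- when the flush interval elapses (5 seconds in the code above), whatever is buffered is sent.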
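To make the conditions listed above concrete, here is a toy model of count-based and size-based flushing written with the JDK only. It is not the Elasticsearch BulkProcessor and makes no claim about its internals; the class ToyBulkBuffer is hypothetical, and the time-based condition is represented by calling flush() by hand where a timer would normally do so.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Toy model of threshold-based flushing; NOT the Elasticsearch BulkProcessor.
class ToyBulkBuffer {
    private final int maxActions;   // plays the role of setBulkActions
    private final long maxBytes;    // plays the role of setBulkSize
    private final List<String> pending = new ArrayList<>();
    private final List<Integer> flushedBatchSizes = new ArrayList<>();
    private long pendingBytes = 0;

    ToyBulkBuffer(int maxActions, long maxBytes) {
        this.maxActions = maxActions;
        this.maxBytes = maxBytes;
    }

    // Buffers one document and flushes as soon as either threshold is reached.
    void add(String doc) {
        pending.add(doc);
        pendingBytes += doc.getBytes(StandardCharsets.UTF_8).length;
        if (pending.size() >= maxActions || pendingBytes >= maxBytes) {
            flush();
        }
    }

    // Sends whatever is buffered; a flush-interval timer would call this too.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        flushedBatchSizes.add(pending.size());   // stand-in for the network call
        pending.clear();
        pendingBytes = 0;
    }

    List<Integer> flushedBatchSizes() {
        return flushedBatchSizes;
    }

    public static void main(String[] args) {
        ToyBulkBuffer buffer = new ToyBulkBuffer(100, 1024 * 1024);
        for (int i = 0; i < 250; i++) {
            buffer.add("{\"pid\":" + i + "}");
        }
        buffer.flush();   // stands in for the interval timer or a final close
        System.out.println(buffer.flushedBatchSizes());   // prints [100, 100, 50]
    }
}
```

With a batch size of 100 and small documents, the count threshold fires twice and the last 50 documents only leave the buffer on the final flush, which is why a time-based flush or an explicit close matters for the tail of an import.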

3.3 Reading the file and assembling UpdateRequests

ESImporter:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.util.concurrent.TimeUnit;

import org.apache.logging.log4j.LogManager;   // Log4j 2 is assumed here
import org.apache.logging.log4j.Logger;
import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.update.UpdateRequest;

public class ESImporter {
    private static final Logger logger = LogManager.getLogger(ESImporter.class);

    public void importer(ESConfig esConfig) {
        File file = new File(esConfig.getFilePath());

        // Create the BulkProcessor
        BulkProcessor bulkProcessor = new ESClient().createBulkProcessor(esConfig);
        if (bulkProcessor == null) {
            logger.error("create bulk processor failed.");
            return;
        }

        // try-with-resources closes the reader exactly once
        try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
            String line;
            // Read one line at a time; null marks the end of the file
            while ((line = reader.readLine()) != null) {
                String[] arrStr = line.split("\t");
                if (arrStr.length != 2) {
                    continue;   // skip lines that are not "id<TAB>json"
                }
                UpdateRequest updateRequest = new UpdateRequest(esConfig.getEsIndex(), esConfig.getEsType(), arrStr[0])
                        .doc(arrStr[1]).docAsUpsert(true);
                bulkProcessor.add(updateRequest);
            }
        } catch (Exception e) {
            logger.error("import failed:", e);
        } finally {
            try {
                // Flush the remaining requests and wait up to 1 minute for in-flight bulks
                bulkProcessor.awaitClose(1, TimeUnit.MINUTES);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}

This module reads the JSON lines from the file, assembles each one into an UpdateRequest, and adds it to the bulkProcessor.
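To show what an UpdateRequest with docAsUpsert(true) means, the sketch below renders one Person.dat line in the text form used by the REST bulk API: an action line followed by a source line. This is for illustration only, since the Java client talks to the cluster over the transport protocol rather than sending this text. The class BulkUpsertLines is hypothetical.

```java
// Hypothetical helper, not part of the original repository.
class BulkUpsertLines {

    // Returns the two-line bulk entry for one "id<TAB>json" line,
    // or null when the line is malformed (same skip rule as ESImporter).
    static String toBulkEntry(String index, String type, String datLine) {
        String[] parts = datLine.split("\t");
        if (parts.length != 2) {
            return null;
        }
        // Action line: which document to update
        String action = "{\"update\":{\"_index\":\"" + index
                + "\",\"_type\":\"" + type
                + "\",\"_id\":\"" + parts[0] + "\"}}";
        // Source line: partial document, inserted as-is when the id does not exist yet
        String source = "{\"doc\":" + parts[1] + ",\"doc_as_upsert\":true}";
        return action + "\n" + source + "\n";
    }

    public static void main(String[] args) {
        System.out.print(toBulkEntry("person", "infos",
                "0\t{\"pid\":0,\"age\":41,\"sex\":true,\"name\":\"Lucy\",\"addr\":\"Shanghai\"}"));
    }
}
```

Because every request is an upsert keyed by the line id, running the import twice on the same file should update the existing documents rather than create duplicates.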

3.4 Startup module

ImportMain:

import org.apache.logging.log4j.LogManager;   // Log4j 2 is assumed here
import org.apache.logging.log4j.Logger;

public class ImportMain {
    private static final Logger logger = LogManager.getLogger(ImportMain.class);

    public static void main(String[] args) {
        try {
            if (args.length < 1) {
                System.err.println("usage: <file_path>");
                System.exit(1);
            }

            ESConfig esConfig = new ESConfig()
                    .setEsClusterName("elasticsearch")
                    .setEsClusterAddress("127.0.0.1:9300")
                    .setEsIndex("person")
                    .setEsType("infos")
                    .setBatchSize(100)
                    .setFilePath(args[0])
                    .setEsThreadNum(1);

            // Time the whole import
            long begin = System.currentTimeMillis();
            ESImporter esImporter = new ESImporter();
            esImporter.importer(esConfig);
            long cost = System.currentTimeMillis() - begin;
            logger.info("import end. cost:[{}ms]", cost);
        } catch (Exception e) {
            logger.error("exception:", e);
        }
    }
}

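3.5 Code directory

Figure: code directory

3.6 Viewing the ES cluster

After the import finishes, the imported docs are visible on the ES cluster.
Figure: docs

Figure: data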
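Besides inspecting the cluster in a UI, the document count can be checked from code. The sketch below queries the _count endpoint over HTTP with the JDK only and pulls the count field out of the response with a regular expression. The class DocCountCheck is hypothetical, and the URL http://127.0.0.1:9200/person/_count is an assumption (default HTTP port, index name from section 3.4). Recently indexed documents may only show up after the index has refreshed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original repository.
class DocCountCheck {
    private static final Pattern COUNT = Pattern.compile("\"count\"\\s*:\\s*(\\d+)");

    // Extracts the numeric "count" field from a _count response body.
    static long extractCount(String json) {
        Matcher m = COUNT.matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("no count field in: " + json);
        }
        return Long.parseLong(m.group(1));
    }

    // Reads the whole response body of a GET request as a UTF-8 string.
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try (InputStream in = conn.getInputStream();
             Scanner scanner = new Scanner(in, StandardCharsets.UTF_8.name())) {
            scanner.useDelimiter("\\A");   // \A matches start of input, so next() returns everything
            return scanner.hasNext() ? scanner.next() : "";
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            // Example argument (assumed): http://127.0.0.1:9200/person/_count
            System.err.println("usage: <count_url>");
            return;
        }
        System.out.println("docs: " + extractCount(fetch(args[0])));
    }
}
```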
