Enterprise Search with Elasticsearch 03: Ingest Processors

Source: Internet | Published by: 多益网络线上二笔 | Editor: 程序博客网 | Date: 2024/05/22 03:16

I. Ingest Node

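An ingest node is a node that pre-processes raw JSON documents before they are indexed. To make a node an ingest node, add node.ingest: true to elasticsearch.yml (in Elasticsearch 5.x this is already the default). To pre-process documents, define a pipeline that lists a series of processors. Pipelines run on ingest nodes, and each pipeline contains one or more processors. When a source document is indexed, you can name the pipeline that should process it.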
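The relationship between a pipeline and its processors can be pictured with a small Python model. This is only a conceptual sketch, not how Elasticsearch implements ingest nodes; `run_pipeline`, `add_status` and `uppercase_name` are hypothetical names made up for the illustration.

```python
from typing import Any, Callable, Dict, List

Document = Dict[str, Any]
Processor = Callable[[Document], Document]


def run_pipeline(doc: Document, processors: List[Processor]) -> Document:
    """Apply each processor in order and return the transformed copy."""
    result = dict(doc)  # work on a copy so the caller's document is untouched
    for process in processors:
        result = process(result)
    return result


def add_status(doc: Document) -> Document:
    # Hypothetical processor: adds a constant field.
    return {**doc, "status": "1"}


def uppercase_name(doc: Document) -> Document:
    # Hypothetical processor: rewrites an existing field.
    return {**doc, "name": str(doc["name"]).upper()}


if __name__ == "__main__":
    print(run_pipeline({"name": "foo"}, [add_status, uppercase_name]))
```

The point of the model is the ordering: every processor receives the output of the previous one, and only the final result is what gets indexed.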

The syntax for specifying a pipeline when indexing a document is:

PUT index/type/id?pipeline=pipeline_id
{
  "foo": "bar"
}

Managing pipelines
1》Adding a pipeline (see https://www.elastic.co/guide/en/elasticsearch/reference/current/put-pipeline-api.html)
The syntax for adding a pipeline is:

PUT _ingest/pipeline/pipeline_id
{
  "description" : "test pipeline",
  "processors" : [
     one or more processors
  ]
}

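The following demonstrates a set processor that sets a field when a document is added (it reuses the data from the earlier post at http://blog.csdn.net/liaomin416100569/article/details/78727827).
For example, add a status field, status:1, to every document as it is added.

Create the pipeline with the processor:

curl -XPUT '192.168.58.147:9200/_ingest/pipeline/myp' -H 'Content-Type: application/json' -d '{
  "description" : "test pipeline",
  "processors" : [
    {
      "set" : {
        "field": "status",
        "value": "1"
      }
    }
  ]
}'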

Try adding a user record through the pipeline:

curl -XPUT '192.168.58.147:9200/user/info/12?pipeline=myp' -H 'Content-Type: application/json' -d '{
  "country": "中国",
  "provice": "广东省",
  "city": "广州市",
  "age": "89",
  "name": "王冠宇",
  "desc": "王冠宇是王五的儿子"
}'

Fetch the document and note that it now has an extra status field with the value "1":

[root@node1 ~]# curl -XGET '192.168.58.147:9200/user/info/12?pretty'
{
  "_index" : "user",
  "_type" : "info",
  "_id" : "12",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "country" : "中国",
    "city" : "广州市",
    "provice" : "广东省",
    "name" : "王冠宇",
    "age" : "89",
    "desc" : "王冠宇是王五的儿子",
    "status" : "1"
  }
}
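The same two calls can also be prepared from Python's standard library. The sketch below only builds the requests and does not send them, so it runs without a cluster; `put_json` is a hypothetical helper, and the host is simply the one used in the curl commands of this post.

```python
import json
import urllib.request

BASE_URL = "http://192.168.58.147:9200"  # host taken from the curl examples


def put_json(path: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT request carrying a JSON body."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{path}",
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Pipeline definition: one set processor that writes status = "1".
pipeline_request = put_json(
    "_ingest/pipeline/myp",
    {
        "description": "test pipeline",
        "processors": [{"set": {"field": "status", "value": "1"}}],
    },
)

# Index a document and route it through the pipeline via ?pipeline=myp.
index_request = put_json(
    "user/info/12?pipeline=myp",
    {"country": "中国", "city": "广州市", "name": "王冠宇", "age": "89"},
)

# To send them against a running cluster, something like this should work:
# for request in (pipeline_request, index_request):
#     with urllib.request.urlopen(request) as response:
#         print(response.read().decode("utf-8"))
```

Keeping the request construction separate from the sending makes it easy to inspect the URL and body before anything touches the cluster.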

2》Deleting a pipeline

curl -XDELETE '192.168.58.147:9200/_ingest/pipeline/myp'

For updating a pipeline, see the official documentation.

II. Common ingest processors (see https://www.elastic.co/guide/en/elasticsearch/reference/current/ingest-processors.html)

1》set processor (https://www.elastic.co/guide/en/elasticsearch/reference/current/set-processor.html)

Sets a field on the document before it is indexed. Syntax:

{
  "set": {
    "field": "field1",
    "value": 582.1
  }
}
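What the set processor does to a document can be modelled in a few lines of Python. This is a toy model rather than Elasticsearch code; `apply_set` is a hypothetical function, and its `override` parameter is meant to mirror the option of the same name described in the set processor documentation (assumed here to default to true and, when false, to leave existing non-null values alone).

```python
from typing import Any, Dict


def apply_set(
    doc: Dict[str, Any], field: str, value: Any, override: bool = True
) -> Dict[str, Any]:
    """Toy model of a set processor: write `value` into `field`."""
    result = dict(doc)  # copy so the input document is not modified
    # With override disabled, only fill fields that are missing or None.
    if override or result.get(field) is None:
        result[field] = value
    return result


if __name__ == "__main__":
    print(apply_set({"name": "foo"}, "field1", 582.1))
    print(apply_set({"field1": 1}, "field1", 582.1, override=False))
```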

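2》Grok processor (https://www.elastic.co/guide/en/elasticsearch/reference/current/grok-processor.html)
Uses a grok expression to extract the parts of a field that match specific patterns and writes them to corresponding JSON fields.
Grok expressions are a layer on top of regular expressions, and grok also accepts plain regular expression syntax.
The syntax is
%{SYNTAX:SEMANTIC}
SYNTAX is the name of the pattern to match, and SEMANTIC is the name of the field the matched value is written to. For example, the text

3.44 55.3.244.1

is matched by the grok expression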

%{NUMBER:duration} %{IP:client}
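To make the %{SYNTAX:SEMANTIC} idea concrete, the sketch below expands a grok expression into an ordinary Python regular expression with named groups. It is an illustration only: the two entries in `PATTERNS` are simplified stand-ins written for this example, not the real built-in definitions, and `grok_to_regex` and `grok_match` are hypothetical helper names.

```python
import re
from typing import Dict, Optional

# Simplified stand-ins for two built-in grok patterns; the definitions
# shipped with Elasticsearch are considerably more thorough.
PATTERNS = {
    "NUMBER": r"[+-]?\d+(?:\.\d+)?",
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
}

# Matches one %{SYNTAX:SEMANTIC} token.
GROK_TOKEN = re.compile(r"%\{(\w+):(\w+)\}")


def grok_to_regex(expression: str) -> str:
    """Replace each %{SYNTAX:SEMANTIC} token with a named capture group."""

    def expand(match: re.Match) -> str:
        syntax, semantic = match.group(1), match.group(2)
        return f"(?P<{semantic}>{PATTERNS[syntax]})"

    return GROK_TOKEN.sub(expand, expression)


def grok_match(expression: str, text: str) -> Optional[Dict[str, str]]:
    """Return the captured fields, or None if the text does not match."""
    match = re.search(grok_to_regex(expression), text)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(grok_to_regex("%{NUMBER:duration} %{IP:client}"))
    print(grok_match("%{NUMBER:duration} %{IP:client}", "3.44 55.3.244.1"))
```

Everything outside a %{...} token, such as the space between the two tokens, is kept as literal regular expression text, which is why grok expressions can mix named patterns with raw regex.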

NUMBER and IP are built-in pattern names that exist by default and can be used directly. As an example, define the following pipeline:

curl -XPUT '192.168.58.147:9200/_ingest/pipeline/myp1?pretty' -H 'Content-Type: application/json' -d '{
  "description" : "grok test",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}"]
      }
    }
  ]
}'

Test data:

curl -XPUT '192.168.58.147:9200/test/test/1?pipeline=myp1&pretty' -H 'Content-Type: application/json' -d '{
  "message": "55.3.244.1 GET /index.html 15824 0.043"
}'

Query result:

[root@node1 httpd]# curl -XGET '192.168.58.147:9200/test/test/1?pretty'
{
  "_index" : "test",
  "_type" : "test",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "duration" : "0.043",
    "request" : "/index.html",
    "method" : "GET",
    "bytes" : "15824",
    "client" : "55.3.244.1",
    "message" : "55.3.244.1 GET /index.html 15824 0.043"
  }
}

For the other processors, see the official documentation.
