A Rundown of Common Elasticsearch Operations

Source: Internet | Editor: 程序博客网 | Date: 2024/06/09 16:01

1. Creating a new table (index)

curl -XPUT 'host:port/table_name?pretty'

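table_name: the table (index) name. It must not contain uppercase letters and must not start with an underscore.

pretty: returns the response as indented, human-readable JSON.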
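The two naming rules above can be checked on the client before the request is sent. The following is a minimal sketch that covers only those two rules; the function name is hypothetical, and Elasticsearch itself may enforce further restrictions that are not modelled here.

```python
def is_valid_table_name(name: str) -> bool:
    """Check only the two rules stated above.

    Rule 1: the name contains no uppercase letters.
    Rule 2: the name does not start with an underscore.
    """
    if not name:
        return False
    if name.startswith("_"):
        return False
    # Any uppercase character disqualifies the name.
    return not any(ch.isupper() for ch in name)


for candidate in ("dishlist", "DishList", "_dishlist"):
    print(candidate, is_valid_table_name(candidate))
```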

There are two ways to define a custom schema, i.e. a mapping:

1. Define it together with the table when the table is created:

PUT host:port/table_name

{
    "settings": {
        "number_of_shards":4,
        "number_of_replicas":1
    },
    "mappings": {
        "dishlist": {
            "_all": {
                "enabled":false
            },
            "properties": {
                "id": {
                    "type":"string",
                    "index":"not_analyzed"
                },
                "dish_name": {
                    "type":"string",
                    "index":"not_analyzed"
                },
                "dish_name_analyzed": {
                    "type":"string",
                    "index":"analyzed",
                    "analyzer":"ik_max_word"
                },
                "dish_id": {
                    "type":"string",
                    "index":"not_analyzed"
                },
                "poi_id": {
                    "type":"long",
                    "index":"not_analyzed"
                },
                "poi_name": {
                    "type":"string",
                    "store":"true"
                }
            }
        }
    }
}
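For readers who prefer to assemble this request in code, here is a minimal sketch using only the Python standard library. It builds the same settings and mappings structure and wraps it in a PUT request without sending it. The helper name and the base URL are assumptions made for illustration, and only two of the fields above are included to keep the example short.

```python
import json
import urllib.request


def build_create_request(base_url, table_name, type_name, properties,
                         shards=4, replicas=1):
    """Build (but do not send) a PUT request that creates a table with a mapping."""
    body = {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
        "mappings": {
            type_name: {
                "_all": {"enabled": False},
                "properties": properties,
            }
        },
    }
    return urllib.request.Request(
        url=f"{base_url}/{table_name}?pretty",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Two of the fields from the mapping above; the base URL is a placeholder.
request = build_create_request(
    "http://localhost:9200", "table_name", "dishlist",
    {
        "id": {"type": "string", "index": "not_analyzed"},
        "poi_id": {"type": "long", "index": "not_analyzed"},
    },
)
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send it to a running cluster.
```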

2. Define it when a type is created:

PUT host:port/table_name/type_name/_mapping?pretty

{
    "type_name": {
        "_all": {
            "enabled":false
        },
        "properties": {
            "id": {
                "type":"string",
                "index":"not_analyzed"
            },
            "dish_name": {
                "type":"string",
                "index":"not_analyzed"
            },
            "dish_name_analyzed": {
                "type":"string",
                "index":"analyzed",
                "analyzer":"ik_max_word"
            },
            "dish_id": {
                "type":"string",
                "index":"not_analyzed"
            },
            "poi_id": {
                "type":"long",
                "index":"not_analyzed"
            },
            "poi_name": {
                "type":"string",
                "index":"not_analyzed"
            }
        }
    }
}

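2. Inserting a single document

1. Real-time import (with an ID)

curl -XPUT 'host:port/table_name/type/id?pretty' -d '{  "field": "value"}'

type: the type, which can be thought of as a second-level table name.

id: the unique ID of this document.

2. Real-time import (without an ID)

curl -XPOST 'host:port/table_name/type?pretty' -d '{  "field": "value"}'

When no ID is supplied, use POST; Elasticsearch then generates a random unique ID automatically.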
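The choice between PUT and POST described above can be captured in a small helper. This is a sketch only: the function name and the host in the usage lines are hypothetical, and the ID is URL-encoded as a precaution that the curl examples do not show.

```python
from urllib.parse import quote


def index_request_parts(base_url, table_name, type_name, doc_id=None):
    """Return the (HTTP method, URL) pair for indexing one document."""
    if doc_id is None:
        # No ID: POST to the type and let Elasticsearch generate one.
        return "POST", f"{base_url}/{table_name}/{type_name}?pretty"
    # Explicit ID: PUT to the document's own URL.
    safe_id = quote(str(doc_id), safe="")
    return "PUT", f"{base_url}/{table_name}/{type_name}/{safe_id}?pretty"


print(index_request_parts("http://localhost:9200", "table_name", "type", 42))
print(index_request_parts("http://localhost:9200", "table_name", "type"))
```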

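To update a document that already exists, send a partial update to that document's _update endpoint:

curl -XPOST 'host:port/table_name/type/id/_update?pretty' -d '
{ "doc": { "field": "value" }, "detect_noop": false }'

Setting "detect_noop" to false merges the doc into the existing document unconditionally, whether or not anything actually changed.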
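A partial-update body is easy to get wrong by hand (a missing comma or a stray comment makes it invalid JSON), so it can help to generate it. The following is a minimal sketch; the helper name and its force_merge parameter are hypothetical.

```python
import json


def build_update_body(changed_fields, force_merge=True):
    """Serialize a partial-update body.

    force_merge=True sets "detect_noop" to false, so the fields are merged
    into the existing document even if nothing changed.
    """
    body = {"doc": changed_fields, "detect_noop": not force_merge}
    # ensure_ascii=False keeps Chinese field values readable in the output.
    return json.dumps(body, ensure_ascii=False)


print(build_update_body({"field": "value"}))
```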

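Bulk import:

The import file must use the following format, with each action line followed by the document it applies to: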

{ "index": { "_index": "table_name", "_type": "type_name", "_id": "id1" }}

{ "field1": "value1", "field2": "value2", ……}

{ "index": { "_index": "table_name", "_type": "type_name", "_id": "id2" }}

{ "field1": "value1", "field2": "value2", ……}

_id can be set to the value of a specific field, as shown below (the value must be unique):

{ "index": { "_index": "table_name", "_type": "type_name", "_id": "value1" }}

{ "field1": "value1", "field2": "value2", ……}
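An import file in this format can be generated from in-memory records. The sketch below writes one action line and one document line per record and takes the _id from a chosen field, rejecting duplicates because the value has to be unique. The function name and the sample records are made up for illustration.

```python
import json


def to_bulk_lines(docs, table_name, type_name, id_field):
    """Render documents as bulk-import text: an action line, then the document."""
    seen_ids = set()
    lines = []
    for doc in docs:
        doc_id = doc[id_field]
        if doc_id in seen_ids:
            raise ValueError(f"duplicate _id value: {doc_id!r}")
        seen_ids.add(doc_id)
        action = {"index": {"_index": table_name, "_type": type_name, "_id": doc_id}}
        lines.append(json.dumps(action, ensure_ascii=False))
        lines.append(json.dumps(doc, ensure_ascii=False))
    # The payload ends with a newline after the last document line.
    return "\n".join(lines) + "\n"


sample_docs = [
    {"field1": "value1", "field2": "a"},
    {"field1": "value2", "field2": "b"},
]
print(to_bulk_lines(sample_docs, "table_name", "type_name", "field1"), end="")
```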

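Then import the data through the following endpoint:

curl -XPOST 'host:port/_bulk' --data-binary @import.json

import.json is the name of the import file.

The table name or type name can also be given in the URL, so that it does not have to be specified in the import file:

curl -XPOST 'host:port/table_name/type_name/_bulk' --data-binary @import.json

***Note: each import.json file should not be too large, ideally around 10 MB. Large files can be split into smaller files and imported in parallel.***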
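Splitting has to keep each action line together with its document line, so cutting at an arbitrary byte offset is not safe. The following is a minimal sketch of one way to do it, assuming the file contains only index actions (every action line is followed by exactly one document line) and no blank lines; the function name is hypothetical.

```python
def split_bulk_payload(payload: str, max_bytes: int):
    """Split bulk text into chunks of at most max_bytes, never separating a pair.

    A single pair larger than max_bytes still becomes its own chunk.
    """
    lines = [line for line in payload.split("\n") if line]
    if len(lines) % 2 != 0:
        raise ValueError("expected an even number of lines (action + document)")
    chunks, current, current_size = [], [], 0
    for i in range(0, len(lines), 2):
        pair = lines[i] + "\n" + lines[i + 1] + "\n"
        pair_size = len(pair.encode("utf-8"))
        # Start a new chunk when adding this pair would exceed the limit.
        if current and current_size + pair_size > max_bytes:
            chunks.append("".join(current))
            current, current_size = [], 0
        current.append(pair)
        current_size += pair_size
    if current:
        chunks.append("".join(current))
    return chunks


# Roughly 10 MB per chunk, as recommended above.
TEN_MB = 10 * 1024 * 1024
demo = '{"index": {"_id": "1"}}\n{"field1": "value1"}\n'
print(len(split_bulk_payload(demo, TEN_MB)))
```

Each returned chunk can be written to its own file and posted to the _bulk endpoint separately.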

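3. Configuring custom analyzers and synonyms

1: Create the synonym file synonym.txt under the elasticsearch-x.x.x/config directory, at the location that synonyms_path in the settings below points to (the path is resolved relative to config, so analysis/synonym.txt means config/analysis/synonym.txt).
synonym.txt must be UTF-8 encoded; it is recommended to leave it empty at first.

2: Create the index

PUT host:port/dishtag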

{
    "settings": {
        "number_of_shards":3,
        "number_of_replicas":1,
        "index": {
            "analysis": {
                "analyzer": {
                    "by_smart": {
                        "type":"custom",
                        "tokenizer":"ik_smart",
                        "filter": [
                            "by_sfr"
                        ]
                    },
                    "by_max_word": {
                        "type":"custom",
                        "tokenizer":"ik_max_word",
                        "filter": [
                            "by_sfr"
                        ]
                    }
                },
                "filter": {
                    "by_sfr": {
                        "type":"synonym",
                        "synonyms_path":"analysis/synonym.txt"
                    }
                }
            }
        }
    },
    "mappings": {
        "dishtag": {
            "_all": {
                "enabled":false
            },
            "properties": {
                "id": {
                    "type":"string",
                    "index":"not_analyzed"
                },
                "tag_id": {
                    "type":"string",
                    "index":"not_analyzed"
                },
                "tag_name": {
                    "type":"string",
                    "index":"analyzed",
                    "search_analyzer":"by_max_word",
                    "analyzer":"by_smart"
                }
            }
        }
    }
}

3: Add synonyms
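The synonym token filter reads its file in the Solr synonym format by default, where each line lists equivalent words separated by commas. As a rough sketch, the lines could be generated as follows; the function name and the example words are hypothetical and do not come from the original configuration.

```python
def format_synonym_lines(groups):
    """Render groups of equivalent words as comma-separated synonym lines."""
    lines = []
    for group in groups:
        words = [word.strip() for word in group if word.strip()]
        if len(words) < 2:
            raise ValueError("a synonym group needs at least two words")
        lines.append(",".join(words))
    return "\n".join(lines) + "\n"


content = format_synonym_lines([["番茄", "西红柿"], ["土豆", "马铃薯", "洋芋"]])
print(content, end="")
# To save it, write the text as UTF-8, for example:
# with open("synonym.txt", "w", encoding="utf-8") as handle:
#     handle.write(content)
```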

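4: Test whether the synonyms take effect after tokenization

GET host:port/dishtag/_analyze?analyzer=by_max_word&pretty&text=鱼

If the returned tokens include the synonyms configured for 鱼 in addition to 鱼 itself, the synonym configuration is in effect.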
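The check can also be done programmatically by reading the tokens array of the _analyze response. The sketch below assumes the response body is already available as text; the sample response is hand-written for illustration (including the synonym 鱼类 and the token types) and is not real output from a cluster.

```python
import json


def tokens_from_analyze(response_text):
    """Extract the token strings from an _analyze response body."""
    return [entry["token"] for entry in json.loads(response_text).get("tokens", [])]


def synonyms_active(response_text, expected_tokens):
    """Return True when every expected token appears in the response."""
    return set(expected_tokens).issubset(tokens_from_analyze(response_text))


# Hand-written sample, assuming 鱼 and 鱼类 were configured as synonyms.
sample_response = json.dumps({
    "tokens": [
        {"token": "鱼", "start_offset": 0, "end_offset": 1, "type": "CN_CHAR", "position": 0},
        {"token": "鱼类", "start_offset": 0, "end_offset": 1, "type": "SYNONYM", "position": 0},
    ]
}, ensure_ascii=False)

print(tokens_from_analyze(sample_response))
print(synonyms_active(sample_response, ["鱼", "鱼类"]))
```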

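Summary

Whenever the synonym dictionary or the IK user-defined dictionary is updated, Elasticsearch must be restarted for the change to take effect.
Each word in a synonym pair must be a term that the tokenizer can segment as a complete word.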
