Elasticsearch Full-Text Search Enterprise Development Notes (Part 3): Mapping Configuration

Source: Internet  Editor: 程序博客网  Time: 2024/09/21 09:02

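Understanding Mapping

What is a mapping
An ES mapping is very similar to a data type in a statically typed language: once a variable is declared as an int, it can only ever hold int values. Likewise, a field mapped as a numeric type can only store numeric data.
Compared with a language's data types, a mapping carries some additional meaning: it tells ES not only what type of value a field holds, but also how to index that data and whether it can be searched.
When a query does not return the data you expect, the mapping is a likely culprit. When in doubt, check your mapping first.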
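Anatomy of a mapping
A mapping refers to one or more analyzers, and each analyzer is in turn built from a tokenizer and a chain of filters. When ES indexes a document, it passes the content of each field to the corresponding analyzer, which passes it on to its filters.
A filter is easy to understand: it is a function that transforms data, taking a string as input and returning another string. A function that converts a string to lowercase is a good example of a filter.
An analyzer applies its filters in a fixed order; analysis means calling them one after another in that order, and ES stores and indexes the final result.
In short, a mapping's job is to run a series of instructions that turn the input data into searchable index terms.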
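The analyzer-as-pipeline idea described above can be sketched in a few lines of Python. This is only an illustrative model, not how ES is implemented: the names whitespace_tokenizer, lowercase_filter, stop_filter and analyze are hypothetical, and filters are modeled as functions over a list of tokens.

```python
from typing import Callable, List

# A filter is modeled as a function from a list of tokens to a list of tokens.
TokenFilter = Callable[[List[str]], List[str]]


def whitespace_tokenizer(text: str) -> List[str]:
    """Split the raw field value into tokens on whitespace."""
    return text.split()


def lowercase_filter(tokens: List[str]) -> List[str]:
    """Convert every token to lowercase."""
    return [t.lower() for t in tokens]


def stop_filter(tokens: List[str]) -> List[str]:
    """Drop a few common words (illustrative stop list)."""
    stop_words = {"the", "a", "of"}
    return [t for t in tokens if t not in stop_words]


def analyze(text: str,
            tokenizer: Callable[[str], List[str]],
            filters: List[TokenFilter]) -> List[str]:
    """Tokenize the text, then apply each filter in order."""
    tokens = tokenizer(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


terms = analyze("The Quick Brown Fox", whitespace_tokenizer,
                [lowercase_filter, stop_filter])
print(terms)  # ['quick', 'brown', 'fox']
```

Filter order matters: if stop_filter ran before lowercase_filter here, "The" would survive the stop list and end up indexed as "the".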

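Installing and Configuring the IK + pinyin Analyzers

ES is one of the most widely used full-text search tools, and Chinese and English tokenization is practically a required feature. The installation steps are outlined briefly below (detailed guides are plentiful online; this one follows the write-up by the author nextbang):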

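Download the Chinese and pinyin analyzers
IK Chinese analyzer: https://github.com/medcl/elasticsearch-analysis-ik
Pinyin analyzer: https://github.com/medcl/elasticsearch-analysis-pinyin
Installation
On the releases page, find the zip file that matches your ES version, or take the source and build it yourself with mvn package; you can also download the latest master code.
Go to the plugins directory under the Elasticsearch installation directory; mkdir pinyin; cd pinyin;
cp the zip file you just built into the pinyin directory; unzip it
After deploying, remember to restart the ES node
Configuration
settings configuration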

number_of_shards can only be set when an index is created, and analysis settings cannot be changed on an open index, so the settings are supplied in the request that creates my_index:

PUT my_index
{
  "settings": {
    "index": {
      "number_of_shards": "3",
      "number_of_replicas": "1",
      "analysis": {
        "analyzer": {
          "default": {
            "tokenizer": "ik_max_word"
          },
          "pinyin_analyzer": {
            "tokenizer": "my_pinyin"
          }
        },
        "tokenizer": {
          "my_pinyin": {
            "keep_separate_first_letter": "false",
            "lowercase": "true",
            "type": "pinyin",
            "limit_first_letter_length": "16",
            "keep_original": "true",
            "keep_full_pinyin": "true"
          }
        }
      }
    }
  }
}
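A common mistake in settings like these is an analyzer that points at a tokenizer which is neither defined in the same analysis block nor provided by an installed plugin. The following is a minimal sketch of a pre-flight check, assuming the settings are held as a Python dict; missing_tokenizers is a hypothetical helper, and the set of plugin tokenizers is passed in by the caller rather than discovered from the cluster.

```python
def missing_tokenizers(index_settings: dict, plugin_tokenizers: set) -> dict:
    """Return {analyzer_name: tokenizer_name} for every analyzer whose
    tokenizer is neither custom-defined nor in plugin_tokenizers."""
    analysis = index_settings.get("analysis", {})
    defined = set(analysis.get("tokenizer", {})) | set(plugin_tokenizers)
    missing = {}
    for analyzer_name, spec in analysis.get("analyzer", {}).items():
        tokenizer = spec.get("tokenizer")
        if tokenizer is not None and tokenizer not in defined:
            missing[analyzer_name] = tokenizer
    return missing


# The "index" part of the settings from this article.
index_settings = {
    "number_of_shards": "3",
    "number_of_replicas": "1",
    "analysis": {
        "analyzer": {
            "default": {"tokenizer": "ik_max_word"},
            "pinyin_analyzer": {"tokenizer": "my_pinyin"},
        },
        "tokenizer": {
            "my_pinyin": {"type": "pinyin", "keep_original": "true"},
        },
    },
}

# ik_max_word and ik_smart are the tokenizers shipped with the IK plugin.
print(missing_tokenizers(index_settings, {"ik_max_word", "ik_smart"}))
```

With the IK tokenizers listed, the check comes back empty; calling it with an empty set would flag the default analyzer, which is roughly what happens when the plugin is not installed.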

mapping configuration

The type name in the request body must match the type name in the URL (index_type here):

PUT my_index/index_type/_mapping
{
  "index_type": {
    "_all": {
      "analyzer": "ik_max_word"
    },
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "ik_max_word",
        "include_in_all": true,
        "fields": {
          "pinyin": {
            "type": "text",
            "term_vector": "with_positions_offsets",
            "analyzer": "pinyin_analyzer",
            "boost": 10.0
          }
        }
      }
    }
  }
}

Testing

Use _analyze to check that the analyzer works:

GET my_index/_analyze
{
  "text": ["刘德华"],
  "analyzer": "pinyin_analyzer"
}

Then index a document containing Chinese text:

POST my_index/index_type
{
  "name": "刘德华"
}
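The same _analyze check can be scripted. The sketch below uses only the Python standard library and assumes a node reachable at http://localhost:9200 (an assumption, adjust as needed); build_analyze_request, extract_tokens and fetch_tokens are hypothetical helper names. Nothing is sent until fetch_tokens is called.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node, no authentication


def build_analyze_request(index: str, analyzer: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to the index's _analyze endpoint."""
    body = json.dumps({"analyzer": analyzer, "text": [text]},
                      ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_tokens(response_body: bytes) -> list:
    """Pull the token strings out of an _analyze response body."""
    return [t["token"] for t in json.loads(response_body)["tokens"]]


def fetch_tokens(index: str, analyzer: str, text: str) -> list:
    """Send the request to the cluster and return the produced tokens."""
    request = build_analyze_request(index, analyzer, text)
    with urllib.request.urlopen(request) as response:
        return extract_tokens(response.read())
```

Calling fetch_tokens("my_index", "pinyin_analyzer", "刘德华") against a running node returns the tokens the pinyin analyzer produces; an empty list or an error usually means the plugin or the analyzer definition is missing.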

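Mapping Design

The mappings for the data tables are designed around the needs of the full-text search business. The design principles for this project are:

For each field that appears in the content of a business display page, decide its type, whether it needs aggregation, whether it should be indexed, and whether it needs to be analyzed (tokenized).
Treat each display unit as one mapping as a whole, and map the main table to its associated tables one-to-many, so that aggregation queries return aggregated results for each display unit.
Mapping design for the hotel table data: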
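One way to keep these per-field decisions consistent is to encode them as small helpers and generate the properties block from a short field list. This is a sketch, not the project's actual tooling: the helper names are hypothetical, and the field shapes (a keyword sub-field with ignore_above 256, a pinyin sub-field, ik_max_word as the main analyzer) mirror those used in this project's hotel mapping.

```python
import json


def keyword_subfield(ignore_above: int = 256) -> dict:
    """Exact-value sub-field used for sorting and aggregations."""
    return {"type": "keyword", "ignore_above": ignore_above}


def exact_text_field() -> dict:
    """Text field with only a keyword sub-field (codes, paths, phone numbers)."""
    return {"type": "text", "fields": {"keyword": keyword_subfield()}}


def searchable_text_field() -> dict:
    """Text field analyzed with IK, plus keyword and pinyin sub-fields."""
    return {
        "type": "text",
        "fields": {
            "keyword": keyword_subfield(),
            "pinyin": {"type": "text", "analyzer": "pinyin"},
        },
        "analyzer": "ik_max_word",
    }


def build_properties(searchable: list, exact: list, typed: dict) -> dict:
    """Assemble a properties block, sorted by field name."""
    properties = {}
    for name in searchable:
        properties[name] = searchable_text_field()
    for name in exact:
        properties[name] = exact_text_field()
    for name, es_type in typed.items():
        properties[name] = {"type": es_type}
    return dict(sorted(properties.items()))


properties = build_properties(
    searchable=["name", "address", "cityName"],
    exact=["cityCode", "phone"],
    typed={"coordinate": "geo_point", "star": "long", "deleted": "boolean"},
)
print(json.dumps(properties, indent=2))
```

Adding a field then means deciding which of the three categories it belongs to, which is the decision the first principle asks for.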

{
  "hotel": {
    "mappings": {
      "data": {
        "properties": {
          "address": {
            "type": "text",
            "fields": {
              "keyword": { "type": "keyword", "ignore_above": 256 },
              "pinyin": { "type": "text", "analyzer": "pinyin" }
            },
            "analyzer": "ik_max_word"
          },
          "areaCode": {
            "type": "text",
            "fields": {
              "keyword": { "type": "keyword", "ignore_above": 256 }
            }
          },
          "areaName": {
            "type": "text",
            "fields": {
              "keyword": { "type": "keyword", "ignore_above": 256 },
              "pinyin": { "type": "text", "analyzer": "pinyin" }
            },
            "analyzer": "ik_max_word"
          },
          "autotrophy": {
            "type": "text",
            "fields": {
              "keyword": { "type": "keyword", "ignore_above": 256 }
            }
          },
          "cityCode": {
            "type": "text",
            "fields": {
              "keyword": { "type": "keyword", "ignore_above": 256 }
            }
          },
          "cityName": {
            "type": "text",
            "fields": {
              "keyword": { "type": "keyword", "ignore_above": 256 },
              "pinyin": { "type": "text", "analyzer": "pinyin" }
            },
            "analyzer": "ik_max_word"
          },
          "coordinate": { "type": "geo_point" },
          "createTime": {
            "type": "date",
            "format": "yyyy-MM-dd HH:mm:ss||epoch_millis"
          },
          "deleted": { "type": "boolean" },
          "dictGuestQualificationName": {
            "type": "text",
            "fields": {
              "keyword": { "type": "keyword", "ignore_above": 256 },
              "pinyin": { "type": "text", "analyzer": "pinyin" }
            },
            "analyzer": "ik_max_word"
          },
          "dictTypeByPositionCode": {
            "type": "text",
            "fields": {
              "keyword": { "type": "keyword", "ignore_above": 256 }
            }
          },
          "dictTypeByPositionName": {
            "type": "text",
            "fields": {
              "keyword": { "type": "keyword", "ignore_above": 256 },
              "pinyin": { "type": "text", "analyzer": "pinyin" }
            },
            "analyzer": "ik_max_word"
          },
          "dictTypeByServiceCode": {
            "type": "text",
            "fields": {
              "keyword": { "type": "keyword", "ignore_above": 256 }
            }
          },
          "dictTypeByServiceName": {
            "type": "text",
            "fields": {
              "keyword": { "type": "keyword", "ignore_above": 256 },
              "pinyin": { "type": "text", "analyzer": "pinyin" }
            },
            "analyzer": "ik_max_word"
          },
          "featurePicPath": {
            "type": "text",
            "fields": {
              "keyword": { "type": "keyword", "ignore_above": 256 }
            }
          },
          "forumStatus": { "type": "long" },
          "geohash": {
            "type": "text",
            "fields": {
              "keyword": { "type": "keyword", "ignore_above": 256 }
            }
          },
          "grade": { "type": "double" },
          "hotelExtendPicPath1": {
            "type": "text",
            "fields": {
              "keyword": { "type": "keyword", "ignore_above": 256 }
            }
          },
          "hotelExtendPicPath2": {
            "type": "text",
            "fields": {
              "keyword": { "type": "keyword", "ignore_above": 256 }
            }
          },
          "hotelExtendPicPath3": {
            "type": "text",
            "fields": {
              "keyword": { "type": "keyword", "ignore_above": 256 }
            }
          },
          "id": { "type": "long" },
          "initGrade": { "type": "double" },
          "level": {
            "type": "text",
            "fields": {
              "keyword": { "type": "keyword", "ignore_above": 256 }
            }
          },
          "lobbyPicPath": {
            "type": "text",
            "fields": {
              "keyword": { "type": "keyword", "ignore_above": 256 }
            }
          },
          "name": {
            "type": "text",
            "fields": {
              "keyword": { "type": "keyword", "ignore_above": 256 },
              "pinyin": { "type": "text", "analyzer": "pinyin" }
            },
            "analyzer": "ik_max_word"
          },
          "phone": {
            "type": "text",
            "fields": {
              "keyword": { "type": "keyword", "ignore_above": 256 }
            }
          },
          "provinceCode": {
            "type": "text",
            "fields": {
              "keyword": { "type": "keyword", "ignore_above": 256 }
            }
          },
          "provinceName": {
            "type": "text",
            "fields": {
              "keyword": { "type": "keyword", "ignore_above": 256 },
              "pinyin": { "type": "text", "analyzer": "pinyin" }
            },
            "analyzer": "ik_max_word"
          },
          "reason": {
            "type": "text",
            "fields": {
              "keyword": { "type": "keyword", "ignore_above": 256 },
              "pinyin": { "type": "text", "analyzer": "pinyin" }
            },
            "analyzer": "ik_max_word"
          },
          "roomPicPath": {
            "type": "text",
            "fields": {
              "keyword": { "type": "keyword", "ignore_above": 256 }
            }
          },
          "scale": { "type": "long" },
          "score": { "type": "double" },
          "serviceScope": {
            "type": "text",
            "fields": {
              "keyword": { "type": "keyword", "ignore_above": 256 },
              "pinyin": { "type": "text", "analyzer": "pinyin" }
            },
            "analyzer": "ik_max_word"
          },
          "soundphone": {
            "type": "text",
            "fields": {
              "keyword": { "type": "keyword", "ignore_above": 256 }
            }
          },
          "star": { "type": "long" },
          "statusCode": {
            "type": "text",
            "fields": {
              "keyword": { "type": "keyword", "ignore_above": 256 }
            }
          },
          "statusName": {
            "type": "text",
            "fields": {
              "keyword": { "type": "keyword", "ignore_above": 256 }
            }
          },
          "streetName": {
            "type": "text",
            "fields": {
              "keyword": { "type": "keyword", "ignore_above": 256 },
              "pinyin": { "type": "text", "analyzer": "pinyin" }
            },
            "analyzer": "ik_max_word"
          },
          "updateTime": {
            "type": "date",
            "format": "yyyy-MM-dd HH:mm:ss||epoch_millis"
          },
          "vipEnable": { "type": "long" }
        }
      }
    }
  }
}

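Mapping Breakdown

The mapping above uses the following parameters:

type: the data type of the field
fields: lets one field be indexed in several ways; for example, a string field can be mapped as text for full-text search and also as keyword for sorting or aggregations
analyzer: specifies the analyzer to use
ignore_above: strings longer than this number of characters are ignored and not indexed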
…
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html#_multi_fields_2
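To make the role of the sub-fields concrete, here is a sketch of a search body that uses them, written as a Python dict. It assumes the hotel mapping from this article; build_hotel_search and is_indexed_as_keyword are hypothetical helpers, and the second one only models the ignore_above rule locally rather than querying ES.

```python
import json


def build_hotel_search(keyword: str, size: int = 10) -> dict:
    """Full-text match on name and its pinyin sub-field,
    with a terms aggregation on the exact city name."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": keyword,
                # analyzed fields: IK on name, pinyin on name.pinyin
                "fields": ["name", "name.pinyin"],
            }
        },
        "aggs": {
            # aggregations run on the keyword sub-field, not the analyzed text
            "by_city": {"terms": {"field": "cityName.keyword"}}
        },
    }


def is_indexed_as_keyword(value: str, ignore_above: int = 256) -> bool:
    """Model of ignore_above: strings longer than the limit are not indexed."""
    return len(value) <= ignore_above


body = build_hotel_search("liudehua")
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The body can be sent as the payload of a search request against the hotel index; the text field and its pinyin sub-field serve the query, while the keyword sub-field serves the aggregation.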
