Elasticsearch Query Statements
Source: Internet  Editor: 程序博客网  Time: 2024/05/29 03:09
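1. Basic Elasticsearch concepts
GitHub repository for the complete search client: https://github.com/cweeyii/elasticsearch-parent
For an introduction to the basic Elasticsearch concepts, see: https://es.xiaoleilu.com/010_Intro/05_What_is_it.html
Cluster-mode installation: http://blog.csdn.net/cweeyii/article/details/71055884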
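2. Key concepts
- Search type (searchType)
When you only need the number of documents that match a condition, you can set the search type to count, which returns only the number of hits (the equivalent of MySQL's select count(1) from table where valid=0).
PS: The count search type is now deprecated. Set from=0 and size=0 in the search request instead; the efficiency is the same.
// Build the search request:
SearchRequestBuilder builder = client.prepareSearch(indexName)
        .setQuery(searchCondition.getQueryBuilder())
        .setFrom(0)
        .setSize(0);
// Read the number of hits from the response:
SearchHits hits = searchResponse.getHits();
hits.getTotalHits();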
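- Default objects
Every index that ES builds contains several metadata fields, each starting with an underscore, for example _type, _id, _index, and _source.
These fields are very useful. For example, you can store the primary key of a user record in _id, which lets you update an ES record by primary key, fetch a record by id, or include or exclude specific ids in a query (see IdsQueryBuilder). In addition, if no custom routing is set, the shard that holds a record is determined from its _id.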
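The shard lookup mentioned above follows the documented rule shard = hash(routing) % number_of_primary_shards, where the routing value defaults to _id. The sketch below only illustrates that rule. Elasticsearch uses a Murmur3 hash internally, whereas this example substitutes String.hashCode() as a stand-in, so the shard numbers it produces will not match a real cluster. The names RoutingSketch and shardFor are made up for illustration.

```java
final class RoutingSketch {
    private RoutingSketch() {}

    /**
     * Maps a routing value (by default the document _id) to a shard number.
     * ASSUMPTION: String.hashCode() stands in for the Murmur3 hash that
     * Elasticsearch really uses, so only the modulo idea is faithful here.
     */
    static int shardFor(String routing, int numberOfPrimaryShards) {
        if (numberOfPrimaryShards <= 0) {
            throw new IllegalArgumentException("numberOfPrimaryShards must be positive");
        }
        // floorMod keeps the result in [0, numberOfPrimaryShards) even for negative hashes
        return Math.floorMod(routing.hashCode(), numberOfPrimaryShards);
    }

    public static void main(String[] args) {
        int shards = 5; // same value as number_of_shards in the mapping example
        for (String id : new String[] {"1", "42", "10086"}) {
            System.out.println("_id " + id + " -> shard " + shardFor(id, shards));
        }
    }
}
```

Because the result depends on the number of primary shards, that number cannot be changed after the index is created without re-indexing the documents.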
_index: index name; _type: index type; _id: document id; _source: the field Elasticsearch uses to store the JSON body of the document
- Dynamic mapping
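When Elasticsearch encounters a previously unknown field, it uses [dynamic mapping] to determine the field's data type and automatically adds the field to the type mapping. For example, you do not have to create the mapping yourself first: ES maps each field according to the type of the value you send, so a string field is analyzed (tokenized) and stored. This is very useful when you need to add a field to an indexed object. You do not have to delete or modify the mapping; re-index the data once and the new field is in the index.
Sometimes, however, this behavior is not what you want. The following mapping, for example, cannot be produced by dynamic mapping: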
{
  "settings": {
    "index": {
      "number_of_replicas": "1",
      "number_of_shards": "5"
    }
  },
  "mappings": {
    "enterprise_basic_info": {
      "_all": { "enabled": false },
      "properties": {
        "id": { "type": "long" },
        "enterpriseName": { "type": "string", "analyzer": "ik_max_word" },
        "address": { "type": "string", "analyzer": "ik_max_word" },
        "latitude": { "type": "double" },
        "longitude": { "type": "double" },
        "phone": { "type": "string", "index": "not_analyzed" },
        "businessCategory": { "type": "string", "index": "not_analyzed" },
        "cityName": { "type": "string", "index": "not_analyzed" },
        "districtName": { "type": "string", "index": "not_analyzed" },
        "valid": { "type": "long" },
        "location": {
          "type": "geo_point",
          "geohash_prefix": true,
          "geohash_precision": 12
        }
      }
    }
  }
}
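Here the city and district fields are strings, but they must not be analyzed. For this reason I recommend configuring the mapping yourself when you create an index.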
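- Index aliases
An index alias works a bit like a pointer: it does not store data or create a new index, it simply points to an existing index. Aliases are commonly used to switch indexes quickly. For example, if the alias my_index initially points to my_index1, you can repoint my_index to my_index2 without restarting anything or changing any code.
- Difference between QueryBuilder and FilterBuilder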
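At search time a FilterBuilder acts as a filter: it first screens all records against the filter condition, and the QueryBuilder query then runs on the filtered result. A FilterBuilder can therefore also be used to select the records that satisfy a given condition. The filter result is cached, so the next request with the same filter condition does not have to recompute it. In addition, unlike a QueryBuilder, a FilterBuilder does not compute document relevance, so it is faster. [Explanation from the official documentation]
PS: In my own experiment I did not see a noticeable speed-up. Either QueryBuilder has been optimized, or my index holds too few documents (10,000 records) for the difference to show:
See ElasticSearchConditionTest in the GitHub code linked later in this article.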
int times = 10000;

// Filter version: the conditions are passed through setFilterBuilder
SearchCondition filterCondition = new SearchCondition();
filterCondition.setFilterBuilder(OperationBuilderFactory.builder()
        .queryString("address", "云中", OperationType.MUST)
        .term("valid", 1, OperationType.MUST)
        .builder());
long beginTime1 = System.currentTimeMillis();
for (int i = 0; i < times; i++) {
    List<EnterpriseBasicInfoDTO> basicInfoDTOList = enterpriseSearchHandle.getListByCondition(filterCondition, null);
}
long beginTime2 = System.currentTimeMillis();
// Log message: "ran Filter {} times, took {} seconds"
LOGGER.info("运行Filter {} 花费时间{} 秒", times, (beginTime2 - beginTime1) / 1000);

// Query version: the same conditions are passed through setQueryBuilder
SearchCondition queryCondition = new SearchCondition();
queryCondition.setQueryBuilder(OperationBuilderFactory.builder()
        .queryString("address", "云中", OperationType.MUST)
        .term("valid", 1, OperationType.MUST)
        .builder());
for (int i = 0; i < times; i++) {
    List<EnterpriseBasicInfoDTO> basicInfoDTOList = enterpriseSearchHandle.getListByCondition(queryCondition, null);
}
long beginTime3 = System.currentTimeMillis();
// Log message: "ran query {} times, took {} seconds"
LOGGER.info("运行query {} 花费时间{} 秒", times, (beginTime3 - beginTime2) / 1000);
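Result: over 10,000 runs the filter version is only a few seconds faster (27 seconds against 34 seconds), so there does not seem to be much difference.
18:10:51.718 INFO (ElasticSearchConditionTest.java:41) - 运行Filter 10000 花费时间27 秒
18:11:26.416 INFO (ElasticSearchConditionTest.java:49) - 运行query 10000 花费时间34 秒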
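The timings above are truncated to whole seconds, and the first loop also pays for JVM warm-up, which can hide or exaggerate a small difference. As a rough sketch (not the author's test code), a harness like the following runs some untimed warm-up iterations first and then reports the average time per call in milliseconds. TimingSketch and averageMillis are hypothetical names; in the real test the Runnable would wrap a call such as enterpriseSearchHandle.getListByCondition(...).

```java
import java.util.concurrent.atomic.AtomicInteger;

final class TimingSketch {
    private TimingSketch() {}

    /**
     * Runs the task `warmup` times without timing it, then `iterations`
     * times with timing, and returns the average milliseconds per timed call.
     */
    static double averageMillis(Runnable task, int warmup, int iterations) {
        if (iterations <= 0) {
            throw new IllegalArgumentException("iterations must be positive");
        }
        for (int i = 0; i < warmup; i++) {
            task.run(); // let the JIT and any caches settle before measuring
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            task.run();
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1_000_000.0 / iterations;
    }

    public static void main(String[] args) {
        // Placeholder workload; replace with the filter and query searches to compare them.
        AtomicInteger calls = new AtomicInteger();
        double avg = averageMillis(calls::incrementAndGet, 1_000, 10_000);
        System.out.println("calls=" + calls.get() + ", average ms per call=" + avg);
    }
}
```

Measuring the filter and the query variant with the same harness, and in both orders, makes it easier to tell a real difference from warm-up or caching effects.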
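- Fast distance-range search with GeoHash
GeoHash is an algorithm for quickly finding other records in the neighborhood of a point (for example, merchants within 500 m). The main idea is to divide the surface of the earth into 32 cells, each represented by a different character, and then to subdivide every cell recursively in the same way. The longer the geohash you configure, the smaller the area each cell covers. To find the points within a range you therefore only need to fetch the cell that contains the point together with its neighboring cells, and then run the expensive exact distance calculation on that pre-filtered data only.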
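(Figure not reproduced here: the precision that corresponds to each geohash length.) For example, an 8-character geohash covers a cell of roughly 38 m × 19 m, and an 11-character geohash narrows this down to a cell of about 15 cm × 15 cm.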
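To make the recursive subdivision concrete, here is a minimal sketch of the standard public geohash encoding (it is not the Elasticsearch implementation). The longitude and latitude ranges are halved alternately, starting with longitude, and every 5 bits become one base-32 character, which is why each extra character splits a cell into 32 smaller ones. GeoHashSketch and encode are illustrative names.

```java
final class GeoHashSketch {
    // Geohash base-32 alphabet: digits plus lowercase letters without a, i, l, o
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    private GeoHashSketch() {}

    /** Encodes a latitude/longitude pair as a geohash with the given number of characters. */
    static String encode(double lat, double lon, int length) {
        double latMin = -90, latMax = 90;
        double lonMin = -180, lonMax = 180;
        StringBuilder hash = new StringBuilder(length);
        boolean lonTurn = true; // bits alternate: longitude first, then latitude
        int bitCount = 0;
        int value = 0;
        while (hash.length() < length) {
            if (lonTurn) {
                double mid = (lonMin + lonMax) / 2;
                if (lon >= mid) { value = (value << 1) | 1; lonMin = mid; }
                else            { value = value << 1;       lonMax = mid; }
            } else {
                double mid = (latMin + latMax) / 2;
                if (lat >= mid) { value = (value << 1) | 1; latMin = mid; }
                else            { value = value << 1;       latMax = mid; }
            }
            lonTurn = !lonTurn;
            if (++bitCount == 5) { // 5 bits make one base-32 character
                hash.append(BASE32.charAt(value));
                bitCount = 0;
                value = 0;
            }
        }
        return hash.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode(42.6, -5.6, 5)); // prints ezs42
    }
}
```

Points that share a long geohash prefix lie in the same cell, so a prefix lookup finds nearby candidates cheaply. Two close points can still fall on opposite sides of a cell border, which is why the neighboring cells have to be searched as well.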
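To enable geohash on coordinates you need to add a new field for it. For example, I have a set of POIs with latitude and longitude coordinates; to support geohash range search I add a location field to the mapping.
// In the mapping:
"latitude": { "type": "double" },
"longitude": { "type": "double" },
"location": {
  "type": "geo_point",
  "geohash_prefix": true,
  "geohash_precision": 12
}
// In the class that is indexed, the getter below is all you need. fastJson serialization
// decides whether an object has a location field from its get and set methods alone, so the
// field itself does not have to be declared. The GitHub link below has the full implementation.
public String getLocation() {
    return latitude + "," + longitude;
}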
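- Specific ElasticSearch operations
term: exact match without analysis. TermQueryBuilder
queryString: analyzed (tokenized) match. QueryStringQueryBuilder. Whether QueryStringQueryBuilder.Operator is AND or OR decides whether a document must contain all of the tokens or only one of them.
prefix: exact prefix match. If the indexed field is analyzed, a document matches when one of its terms starts with the given prefix; if the field is not analyzed, the check is whether the whole field content starts with the prefix. PrefixQueryBuilder
range (greater than, less than, between ... and): a document matches when the given field lies within the range. RangeQueryBuilder
notInId or idIn: select or exclude records by id. IdsQueryBuilder
fuzzy: fuzzy match based on the edit distance between strings. FuzzyQueryBuilder
wildcard: match strings against a wildcard pattern. WildcardQueryBuilder
geoDistance: distance-range match on geographic coordinates, with a choice of distance calculation methods. GeoDistanceQueryBuilder
geoHash: approximate range match based on geohash encoding. GeohashCellQuery.Builder
- The generic elasticsearch package I developed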
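GitHub repository: https://github.com/cweeyii/elasticsearch-parent
client package: wraps the construction of Query operations and the search operations themselves, and is very convenient to use
Key classes:
Condition builder class for query and filter: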
package com.cweeyii.operation;

import org.elasticsearch.common.unit.DistanceUnit;
import org.elasticsearch.index.query.*;
import org.springframework.util.CollectionUtils;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Created by wenyi on 17/5/9.
 * Email:caowenyi@meituan.com
 */
public class OperationBuilderFactory {
    public static Builder builder() {
        return new Builder();
    }

    public static class Builder {
        // Sub-queries grouped by how they are combined: MUST, MUST_NOT or SHOULD
        private Map<OperationType, List<QueryBuilder>> queryBuilderMap = new ConcurrentHashMap<>();

        private Builder() {
        }

        public Builder term(String field, Object value, OperationType operationType) {
            List<QueryBuilder> queryBuilders = getQueryBuilders(operationType);
            queryBuilders.add(new TermQueryBuilder(field, value));
            return this;
        }

        public Builder queryString(String field, String value, OperationType operationType, QueryStringQueryBuilder.Operator operator) {
            List<QueryBuilder> queryBuilders = getQueryBuilders(operationType);
            queryBuilders.add(new QueryStringQueryBuilder(value).field(field).defaultOperator(operator));
            return this;
        }

        // Defaults to OR: a document matches if it contains any of the tokens
        public Builder queryString(String field, String value, OperationType operationType) {
            return queryString(field, value, operationType, QueryStringQueryBuilder.Operator.OR);
        }

        public Builder prefix(String field, String prefix, OperationType operationType) {
            List<QueryBuilder> queryBuilders = getQueryBuilders(operationType);
            queryBuilders.add(new PrefixQueryBuilder(field, prefix));
            return this;
        }

        public Builder range(String field, Object from, Object to, OperationType operationType) {
            List<QueryBuilder> queryBuilders = getQueryBuilders(operationType);
            queryBuilders.add(new RangeQueryBuilder(field).from(from).to(to));
            return this;
        }

        // Adds an ids query; pass OperationType.MUST_NOT to exclude the ids, MUST to require them
        public Builder notInId(List<String> ids, OperationType operationType) {
            List<QueryBuilder> queryBuilders = getQueryBuilders(operationType);
            queryBuilders.add(new IdsQueryBuilder().ids(ids));
            return this;
        }

        public Builder fuzzy(String field, Object value, OperationType operationType) {
            List<QueryBuilder> queryBuilders = getQueryBuilders(operationType);
            queryBuilders.add(new FuzzyQueryBuilder(field, value));
            return this;
        }

        public Builder wildcard(String field, String value, OperationType operationType) {
            List<QueryBuilder> queryBuilders = getQueryBuilders(operationType);
            queryBuilders.add(new WildcardQueryBuilder(field, value));
            return this;
        }

        // distance is in meters
        public Builder geoDistance(String field, double lat, double lon, double distance, OperationType operationType) {
            List<QueryBuilder> queryBuilders = getQueryBuilders(operationType);
            queryBuilders.add(new GeoDistanceQueryBuilder(field).point(lat, lon).distance(distance, DistanceUnit.METERS));
            return this;
        }

        // Matches the geohash cell at the given precision plus its neighboring cells
        public Builder geoHash(String field, double lat, double lon, int precisionLevel, OperationType operationType) {
            List<QueryBuilder> queryBuilders = getQueryBuilders(operationType);
            queryBuilders.add(new GeohashCellQuery.Builder(field).point(lat, lon).precision(precisionLevel).neighbors(true));
            return this;
        }

        // Combines all collected sub-queries into a single bool query
        public QueryBuilder builder() {
            BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
            List<QueryBuilder> mustBuilders = getQueryBuilders(OperationType.MUST);
            if (!CollectionUtils.isEmpty(mustBuilders)) {
                for (QueryBuilder queryBuilder : mustBuilders) {
                    boolQueryBuilder.must(queryBuilder);
                }
            }
            List<QueryBuilder> mustNotBuilders = getQueryBuilders(OperationType.MUST_NOT);
            if (!CollectionUtils.isEmpty(mustNotBuilders)) {
                for (QueryBuilder queryBuilder : mustNotBuilders) {
                    boolQueryBuilder.mustNot(queryBuilder);
                }
            }
            List<QueryBuilder> shouldBuilders = getQueryBuilders(OperationType.SHOULD);
            if (!CollectionUtils.isEmpty(shouldBuilders)) {
                for (QueryBuilder queryBuilder : shouldBuilders) {
                    boolQueryBuilder.should(queryBuilder);
                }
            }
            return boolQueryBuilder;
        }

        // Returns the list for this operation type, creating it atomically on first use (requires Java 8)
        public List<QueryBuilder> getQueryBuilders(OperationType operationType) {
            return queryBuilderMap.computeIfAbsent(operationType, key -> new ArrayList<>());
        }
    }
}
Search-condition wrapper class: encapsulates the search condition, sorting, and aggregation
// Package declaration and imports are not shown in the original listing.
public class SearchCondition {
    private QueryBuilder queryBuilder = null;
    private QueryBuilder filterBuilder = null;
    private List<SortBuilder> orders = new ArrayList<>();
    private List<AbstractAggregationBuilder> aggregationBuilders = new ArrayList<>();
    private SearchType searchType;
    private int limit = 20;
    private int offset = 0;
    private int total = 0;

    public List<AbstractAggregationBuilder> getAggregationBuilders() {
        return aggregationBuilders;
    }

    public void setAggregationBuilders(List<AbstractAggregationBuilder> aggregationBuilders) {
        this.aggregationBuilders = aggregationBuilders;
    }

    public List<SortBuilder> getOrders() {
        return orders;
    }

    public void setOrders(List<SortBuilder> orders) {
        this.orders = orders;
    }

    public int getTotal() {
        return total;
    }

    public SearchCondition setTotal(int total) {
        this.total = total;
        return this;
    }

    // Sort by distance from the point (lat, lon)
    public SearchCondition orderBy(String field, double lat, double lon, SortOrder order, GeoDistance geoDistance) {
        if (!StringUtils.isEmpty(field)) {
            orders.add(new GeoDistanceSortBuilder(field).order(order).point(lat, lon).geoDistance(geoDistance));
        }
        return this;
    }

    public SearchCondition orderBy(String field, double lat, double lon) {
        return orderBy(field, lat, lon, SortOrder.ASC, GeoDistance.DEFAULT);
    }

    // Sort by the value of a field
    public SearchCondition orderBy(String field, SortOrder order) {
        if (!StringUtils.isEmpty(field)) {
            orders.add(new FieldSortBuilder(field).order(order));
        }
        return this;
    }

    public SearchCondition orderBy(String field) {
        return orderBy(field, SortOrder.ASC);
    }

    // Falls back to match_all when no query has been set
    public QueryBuilder getQueryBuilder() {
        if (queryBuilder == null) {
            return QueryBuilders.matchAllQuery();
        }
        return queryBuilder;
    }

    public SearchCondition setQueryBuilder(QueryBuilder queryBuilder) {
        this.queryBuilder = queryBuilder;
        return this;
    }

    public QueryBuilder getFilterBuilder() {
        return filterBuilder;
    }

    public SearchCondition setFilterBuilder(QueryBuilder filterBuilder) {
        this.filterBuilder = filterBuilder;
        return this;
    }

    // Adds a geo-distance aggregation with one bucket per (from, to) range, in meters
    public SearchCondition setAggregation(String field, double lat, double lon, Pair<Double, Double>... rangePoints) {
        if (!StringUtils.isEmpty(field)) {
            GeoDistanceBuilder geoDistanceBuilder = new GeoDistanceBuilder(field).point(new GeoPoint(lat, lon)).unit(DistanceUnit.METERS);
            for (Pair<Double, Double> rangePoint : rangePoints) {
                geoDistanceBuilder.addRange(rangePoint.getFirst(), rangePoint.getSecond());
            }
            aggregationBuilders.add(geoDistanceBuilder);
        }
        return this;
    }

    public SearchType getSearchType() {
        return searchType;
    }

    public SearchCondition setSearchType(SearchType searchType) {
        this.searchType = searchType;
        return this;
    }

    public int getLimit() {
        return limit;
    }

    public SearchCondition setLimit(int limit) {
        this.limit = limit;
        return this;
    }

    public int getOffset() {
        return offset;
    }

    public SearchCondition setOffset(int offset) {
        this.offset = offset;
        return this;
    }
}
Usage:
SearchCondition searchCondition = new SearchCondition();

// Example 1: filter on an analyzed address match plus an exact term match
searchCondition.setFilterBuilder(OperationBuilderFactory.builder()
        .queryString("address", "云中", OperationType.MUST)
        .term("valid", 1, OperationType.MUST)
        .builder());

// Example 2: geohash filter around (lat, lon), sorted by distance and then by id.
// Calling setFilterBuilder again replaces the filter set in example 1.
searchCondition.setFilterBuilder(OperationBuilderFactory.builder()
                .geoHash("location", lat, lon, 5, OperationType.MUST)
                .builder())
        .orderBy("location", lat, lon, SortOrder.ASC, GeoDistance.ARC)
        .orderBy("id", SortOrder.ASC)
        .setOffset(0)
        .setLimit(100);