Introduction to elasticsearch-dsl 2.0.0


elasticsearch-dsl 2.0.0, by Honza Král. Chinese translation: AbnerGong

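Elasticsearch DSL is a high-level library that helps with writing and running queries against Elasticsearch. It is built on top of the official low-level client (elasticsearch-py).
It provides a convenient and idiomatic way to write and manipulate queries while staying close to the Elasticsearch JSON DSL, mirroring its terminology and structure. It exposes the whole DSL from Python, either directly through defined classes or through queryset-like expressions.
It also provides an optional wrapper for working with documents: defining mappings, retrieving and saving documents, and wrapping the document data in user-defined classes.
To use the other Elasticsearch APIs (for example cluster health), just use the underlying client.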

Compatibility

Search Example

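Let's start with a typical search request written directly as a dict:
(Translator's note: the filtered query used below was deprecated in Elasticsearch 2.0 in favor of the bool query.)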

    from elasticsearch import Elasticsearch
    client = Elasticsearch()

    response = client.search(
        index="my-index",
        body={
          "query": {
            "filtered": {
              "query": {
                "bool": {
                  "must": [{"match": {"title": "python"}}],
                  "must_not": [{"match": {"description": "beta"}}]
                }
              },
              "filter": {"term": {"category": "search"}}
            }
          },
          "aggs" : {
            "per_tag": {
              "terms": {"field": "tags"},
              "aggs": {
                "max_lines": {"max": {"field": "lines"}}
              }
            }
          }
        }
    )

    for hit in response['hits']['hits']:
        print(hit['_score'], hit['_source']['title'])

    for tag in response['aggregations']['per_tag']['buckets']:
        print(tag['key'], tag['max_lines']['value'])

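The problem with this approach is that it is very verbose, prone to syntax errors such as incorrect nesting, hard to modify (for example, adding another filter), and definitely not fun to write.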
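To make the "hard to modify" point concrete, here is a minimal sketch of what adding a second filter to a raw dict body can look like. The helper name `add_term_filter` is hypothetical and not part of any library. It assumes the body uses a `filtered` query as in the example above, and that multiple filters are combined by wrapping them in a `bool` filter with a `must` list.

```python
import copy


def add_term_filter(body, field, value):
    """Return a copy of `body` with an extra term filter ANDed onto its filtered query.

    Hypothetical helper for illustration only; assumes body["query"]["filtered"] exists.
    """
    new_body = copy.deepcopy(body)  # leave the caller's dict untouched
    filtered = new_body["query"]["filtered"]
    extra = {"term": {field: value}}
    existing = filtered.get("filter")
    if existing is None:
        # No filter yet: the new one can be used directly.
        filtered["filter"] = extra
    else:
        # A filter is already present: wrap both in a bool filter.
        filtered["filter"] = {"bool": {"must": [existing, extra]}}
    return new_body


body = {
    "query": {
        "filtered": {
            "query": {"match": {"title": "python"}},
            "filter": {"term": {"category": "search"}},
        }
    }
}

updated = add_term_filter(body, "tags", "python")
print(updated["query"]["filtered"]["filter"])
```

Even this small change requires knowing the exact nesting of the body and restructuring it by hand, which is the kind of bookkeeping a query-building API is meant to take over.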

Let's rewrite the example using the Python DSL:

    from elasticsearch import Elasticsearch
    from elasticsearch_dsl import Search, Q

    client = Elasticsearch()

    s = Search(using=client, index="my-index") \
        .filter("term", category="search") \
        .query("match", title="python")   \
        .query(~Q("match", description="beta"))

    s.aggs.bucket('per_tag', 'terms', field='tags') \
        .metric('max_lines', 'max', field='lines')

    response = s.execute()

    for hit in response:
        print(hit.meta.score, hit.title)

    for tag in response.aggregations.per_tag.buckets:
        print(tag.key, tag.max_lines.value)

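As you can see, the library took care of:
- creating appropriate Query objects by name (e.g. "match")
- composing the queries into a compound bool query
- creating a filtered query because .filter() was used
- providing convenient access to the response data
- sparing you from curly or square brackets everywhere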
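The idea of composing negated and non-negated queries into one bool query can be sketched in plain Python. The `ToyQ` class and `combine` function below are hypothetical names invented for this illustration. They are not the library's `Q` object and do not reflect its actual implementation; they only show how the `~` operator could map onto a `must_not` clause.

```python
class ToyQ:
    """Toy query node for illustration; NOT the real elasticsearch_dsl.Q."""

    def __init__(self, name, **params):
        self.name = name
        self.params = params
        self.negated = False

    def __invert__(self):
        # ~query returns a new node with the negation flag flipped.
        inverted = ToyQ(self.name, **self.params)
        inverted.negated = not self.negated
        return inverted

    def to_dict(self):
        return {self.name: dict(self.params)}


def combine(*queries):
    """Fold several toy queries into a single bool query dict."""
    must = [q.to_dict() for q in queries if not q.negated]
    must_not = [q.to_dict() for q in queries if q.negated]
    bool_body = {}
    if must:
        bool_body["must"] = must
    if must_not:
        bool_body["must_not"] = must_not
    return {"bool": bool_body}


combined = combine(
    ToyQ("match", title="python"),
    ~ToyQ("match", description="beta"),
)
print(combined)
```

The resulting dict has the same shape as the hand-written bool query in the first example, which is the nesting the DSL builds on your behalf.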

Persistence Example

    from datetime import datetime
    from elasticsearch_dsl import DocType, String, Date, Integer
    from elasticsearch_dsl.connections import connections

    # Define a default Elasticsearch client
    connections.create_connection(hosts=['localhost'])

    class Article(DocType):
        title = String(analyzer='snowball', fields={'raw': String(index='not_analyzed')})
        body = String(analyzer='snowball')
        tags = String(index='not_analyzed')
        published_from = Date()
        lines = Integer()

        class Meta:
            index = 'blog'

        def save(self, **kwargs):
            self.lines = len(self.body.split())
            return super(Article, self).save(**kwargs)

        def is_published(self):
            return datetime.now() > self.published_from

    # create the mappings in elasticsearch
    Article.init()

    # create and save an article
    article = Article(meta={'id': 42}, title='Hello world!', tags=['test'])
    article.body = ''' looong text '''
    article.published_from = datetime.now()
    article.save()

    article = Article.get(id=42)
    print(article.is_published())

    # Display cluster health
    print(connections.get_connection().cluster.health())

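In this example you can see:

- providing a default connection
- defining fields with mapping configuration
- setting the index name
- defining custom methods
- overriding the built-in .save() method to hook into the persistence life cycle
- retrieving and saving the object in Elasticsearch
- accessing the underlying client for other APIs

You can find more in the persistence chapter of the documentation.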

Migrating from elasticsearch-py

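You don't have to port your entire application to get the benefits of the Python DSL. You can start gradually by creating a Search object from your existing dict, modifying it through the API, and serializing it back to a dict: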

    from elasticsearch_dsl import Search

    body = {...} # insert complicated query here

    # Convert to Search object
    s = Search.from_dict(body)

    # Add some filters, aggregations, queries, ...
    # (Search methods return a modified copy, so reassign the result)
    s = s.filter("term", tags="python")

    # Convert back to dict to plug back into existing code
    body = s.to_dict()

Official Documentation

https://elasticsearch-dsl.readthedocs.org/

0 0