Introduction to elasticsearch-dsl 2.0.0


elasticsearch-dsl 2.0.0  by Honza Král  (original article link)  Translated by: AbnerGong

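Elasticsearch DSL is a high-level library that helps with writing and running queries against Elasticsearch. It is built on top of the official low-level client (elasticsearch-py).
It provides a more convenient and idiomatic way to write and manipulate queries, while staying close to the terminology and structure of the Elasticsearch JSON DSL. It exposes the whole range of the DSL from Python, either directly through defined classes or through queryset-like expressions.
It also provides an optional wrapper for working with documents as Python objects: defining mappings, retrieving and saving documents, and wrapping the document data in user-defined classes.
To use the other Elasticsearch APIs (e.g. cluster health), just use the underlying client.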

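Compatibility

Search Example

Let's start with a typical search request written directly as a dict:
(Translator's note: the filtered query used below has been superseded by the bool query since Elasticsearch 2.0.)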

from elasticsearch import Elasticsearch
client = Elasticsearch()

response = client.search(
    index="my-index",
    body={
      "query": {
        "filtered": {
          "query": {
            "bool": {
              "must": [{"match": {"title": "python"}}],
              "must_not": [{"match": {"description": "beta"}}]
            }
          },
          "filter": {"term": {"category": "search"}}
        }
      },
      "aggs" : {
        "per_tag": {
          "terms": {"field": "tags"},
          "aggs": {
            "max_lines": {"max": {"field": "lines"}}
          }
        }
      }
    }
)

for hit in response['hits']['hits']:
    print(hit['_score'], hit['_source']['title'])

for tag in response['aggregations']['per_tag']['buckets']:
    print(tag['key'], tag['max_lines']['value'])

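The problem with this approach is that it is very verbose, prone to syntax mistakes such as incorrect nesting, hard to modify (e.g. adding another filter), and definitely not fun to write.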
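As a side note on the translator's remark about filtered: on Elasticsearch 2.0 and later, the same request can be expressed by moving the query part into the must clause and the filter part into the filter clause of a bool query. The following is a minimal sketch of that rewrite done by hand on the raw dict. The helper name `filtered_to_bool` is hypothetical (it is not part of any library), and it only handles a filtered query at the top level of the body.

```python
import copy


def filtered_to_bool(body):
    """Return a copy of a search body with a top-level `filtered` query
    rewritten as an equivalent `bool` query.

    Hypothetical helper for illustration only; nested `filtered`
    queries are left untouched.
    """
    new_body = copy.deepcopy(body)
    filtered = new_body.get("query", {}).get("filtered")
    if filtered is None:
        return new_body  # nothing to rewrite

    bool_query = {}
    if "query" in filtered:
        # The scoring part of `filtered` becomes a `must` clause.
        bool_query["must"] = [filtered["query"]]
    if "filter" in filtered:
        # The non-scoring part becomes the `filter` clause of `bool`.
        bool_query["filter"] = [filtered["filter"]]
    new_body["query"] = {"bool": bool_query}
    return new_body


old_body = {
    "query": {
        "filtered": {
            "query": {
                "bool": {
                    "must": [{"match": {"title": "python"}}],
                    "must_not": [{"match": {"description": "beta"}}],
                }
            },
            "filter": {"term": {"category": "search"}},
        }
    }
}

new_body = filtered_to_bool(old_body)
print(new_body)
```

Even this small change needs careful copying and key juggling, which is the kind of manual dict editing the rest of this article tries to avoid.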

Let's rewrite the example using the Python DSL:

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, Q

client = Elasticsearch()

s = Search(using=client, index="my-index") \
    .filter("term", category="search") \
    .query("match", title="python")   \
    .query(~Q("match", description="beta"))

s.aggs.bucket('per_tag', 'terms', field='tags') \
    .metric('max_lines', 'max', field='lines')

response = s.execute()

for hit in response:
    print(hit.meta.score, hit.title)

for tag in response.aggregations.per_tag.buckets:
    print(tag.key, tag.max_lines.value)

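As you can see, the library took care of:
- creating appropriate Query objects by name (e.g. "match")
- composing queries into a compound bool query
- creating a filtered query since .filter() was used
- providing convenient access to the response data
- no curly or square brackets everywhere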
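The "convenient access to the response data" point can be pictured with a small wrapper that turns nested dict lookups into attribute access. This is only a hypothetical sketch of the idea, not the library's actual implementation; the class name `AttrView` and the sample response data are made up for illustration.

```python
class AttrView:
    """Read-only attribute-style view over nested dicts (illustrative only)."""

    def __init__(self, data):
        self._data = data

    def __getattr__(self, name):
        # Only called when normal lookup fails; refuse private names so a
        # missing `_data` can never trigger infinite recursion.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return _wrap(self._data[name])
        except KeyError:
            raise AttributeError(name)

    def __getitem__(self, key):
        # Dict-style access keeps working alongside attribute access.
        return _wrap(self._data[key])


def _wrap(value):
    """Wrap dicts (and dicts inside lists) so nested access stays attribute-style."""
    if isinstance(value, dict):
        return AttrView(value)
    if isinstance(value, list):
        return [_wrap(item) for item in value]
    return value


# Made-up fragment shaped like the aggregation part of a search response.
raw = {
    "aggregations": {
        "per_tag": {
            "buckets": [{"key": "python", "max_lines": {"value": 42.0}}]
        }
    }
}

response = AttrView(raw)
for tag in response.aggregations.per_tag.buckets:
    print(tag.key, tag.max_lines.value)
```

Compare the loop body with `tag['key'], tag['max_lines']['value']` in the dict-based version: the data is the same, only the access syntax changes.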

Persistence Example

from datetime import datetime
from elasticsearch_dsl import DocType, String, Date, Integer
from elasticsearch_dsl.connections import connections

# Define a default Elasticsearch client
connections.create_connection(hosts=['localhost'])

class Article(DocType):
    title = String(analyzer='snowball', fields={'raw': String(index='not_analyzed')})
    body = String(analyzer='snowball')
    tags = String(index='not_analyzed')
    published_from = Date()
    lines = Integer()

    class Meta:
        index = 'blog'

    def save(self, **kwargs):
        self.lines = len(self.body.split())
        return super(Article, self).save(**kwargs)

    def is_published(self):
        return datetime.now() > self.published_from

# create the mappings in elasticsearch
Article.init()

# create and save an article
article = Article(meta={'id': 42}, title='Hello world!', tags=['test'])
article.body = ''' looong text '''
article.published_from = datetime.now()
article.save()

article = Article.get(id=42)
print(article.is_published())

# Display cluster health
print(connections.get_connection().cluster.health())

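In this example you can see:

  • providing a default connection
  • defining fields with mapping configuration
  • setting the index name
  • defining custom methods
  • overriding the built-in .save() method to hook into the persistence life cycle
  • retrieving and saving the object into Elasticsearch
  • accessing the underlying client for other APIs

You can find more in the persistence chapter of the documentation.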
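The idea of overriding .save() to hook into the persistence life cycle can be tried out without a running cluster. The sketch below uses an in-memory dict in place of Elasticsearch; `ToyDocument` and `ToyArticle` are hypothetical stand-ins and do not reproduce the library's DocType class.

```python
class ToyDocument:
    """In-memory stand-in for a persisted document (not the library's DocType)."""

    _store = {}  # plays the role of the Elasticsearch index

    def __init__(self, doc_id, **fields):
        self.doc_id = doc_id
        self.__dict__.update(fields)

    def save(self):
        # Persist every field except the id itself.
        data = {k: v for k, v in self.__dict__.items() if k != "doc_id"}
        ToyDocument._store[self.doc_id] = data
        return True

    @classmethod
    def get(cls, doc_id):
        # Rebuild an object of the calling class from the stored fields.
        return cls(doc_id, **cls._store[doc_id])


class ToyArticle(ToyDocument):
    def save(self):
        # Hook: derive `lines` from `body` just before the document is stored,
        # then delegate the actual persistence to the parent class.
        self.lines = len(self.body.split())
        return super().save()


article = ToyArticle(42, title="Hello world!", body="some longer text here")
article.save()

loaded = ToyArticle.get(42)
print(loaded.title, loaded.lines)
```

Because the derived field is computed inside save(), callers cannot forget to update it, which is the same design choice the Article class makes.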

Migration from elasticsearch-py

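You don't have to port your entire application to get the benefits of the Python DSL. You can start gradually by creating a Search object from your existing dict, modifying it through the API, and serializing it back to a dict: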

from elasticsearch_dsl import Search

body = {...} # insert complicated query here

# Convert to Search object
s = Search.from_dict(body)

# Add some filters, aggregations, queries, ...
# (Search methods return a modified copy, so keep the result)
s = s.filter("term", tags="python")

# Convert back to dict to plug back into existing code
body = s.to_dict()

Official Documentation

https://elasticsearch-dsl.readthedocs.org/
