Getting Started

Source: Internet  Editor: 程序博客网  Time: 2024/05/01 07:34
Clusters and nodes:

A node is a running instance of Elasticsearch. A cluster is a group of nodes that share the same cluster.name. It is best to replace the default value of cluster.name with a name of your own, so that a newly started node does not accidentally join another cluster with the same name on the same network.

    cluster.name: es_cluster
    node.name: node01
    path.data: /elk/elasticsearch/data
    path.logs: /elk/elasticsearch/logs
    network.host: 192.168.32.80
    http.port: 9200
    discovery.zen.ping.unicast.hosts: ["192.168.32.80", "192.168.32.81"]

Counting the documents in the cluster:

    GET http://192.168.32.81:9200/_count?pretty
    {"query": {"match_all": {}}}

Response:

    {
        "count": 500,
        "_shards": {
            "total": 21,
            "successful": 21,
            "failed": 0
        }
    }

Document oriented:

    Relational DB -> Databases -> Tables -> Rows -> Columns
    Elasticsearch -> Indices -> Types -> Documents -> Fields

An Elasticsearch cluster can contain multiple indices, each index can contain multiple types, each type holds multiple documents, and each document has multiple fields.

So, to build the employee directory, we will do the following:

1. Index one document per employee, containing all the information about that employee.
2. Give each document the type employee.
3. Place the employee type in the index megacorp.
4. Store the megacorp index in the Elasticsearch cluster.

    PUT http://192.168.32.81:9200/megacorp/employee/1
    {
        "first_name": "John",
        "last_name": "Smith",
        "age": 25,
        "about": "I love to go rock climbing",
        "interests": ["sports", "music"]
    }

The path /megacorp/employee/1 contains three pieces of information:

    Name      Description
    megacorp  index name
    employee  type name
    1         ID of this employee

Let's add some more employees to the directory:

    PUT /megacorp/employee/2
    {
        "first_name": "Jane",
        "last_name": "Smith",
        "age": 32,
        "about": "I like to collect rock albums",
        "interests": ["music"]
    }

    PUT /megacorp/employee/3
    {
        "first_name": "Douglas",
        "last_name": "Fir",
        "age": 35,
        "about": "I like to build cabinets",
        "interests": ["forestry"]
    }

Retrieving a document:

    GET http://192.168.32.80:9200/megacorp/employee/1

Response:

    {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "1",
        "_version": 1,
        "found": true,
        "_source": {
            "first_name": "John",
            "last_name": "Smith",
            "age": 25,
            "about": "I love to go rock climbing",
            "interests": ["sports", "music"]
        }
    }
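The index / type / ID structure of a document path can be sketched in a few lines of Python. This is only an illustration built on the standard library: the helper names `doc_url` and `build_index_request` are hypothetical, the host is the one used in this article, and the request object is built but never sent.

```python
import json
import urllib.request

BASE_URL = "http://192.168.32.81:9200"  # cluster address used in this article


def doc_url(index, doc_type, doc_id, base=BASE_URL):
    """Join index, type and ID into a document URL (hypothetical helper)."""
    return f"{base}/{index}/{doc_type}/{doc_id}"


def build_index_request(index, doc_type, doc_id, document):
    """Build (but do not send) a PUT request that would index `document`."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        doc_url(index, doc_type, doc_id),
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


employee = {
    "first_name": "John",
    "last_name": "Smith",
    "age": 25,
    "about": "I love to go rock climbing",
    "interests": ["sports", "music"],
}

request = build_index_request("megacorp", "employee", 1, employee)
print(request.get_method(), request.full_url)
# prints: PUT http://192.168.32.81:9200/megacorp/employee/1
# Actually sending it would need a reachable cluster:
#   urllib.request.urlopen(request)
```

Keeping the URL construction in one small function makes it easy to see that only the last path segment changes from one employee to the next.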
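We retrieve a document with the HTTP GET method. In the same way, DELETE removes a document and HEAD checks whether a document exists. To update an existing document, simply PUT it again.

Simple search:

A GET request is simple: you can easily fetch the document you want. Let's try something a little more advanced, such as a simple search. The simplest search asks for all employees:

    http://192.168.32.80:9200/megacorp/employee/_search

As you can see, we still use the megacorp index and the employee type, but the path now ends with the keyword _search instead of a document ID. The hits array in the response contains all three of our documents. By default, a search returns the first 10 results.

Next, let's search for employees whose last name contains "Smith". To do this we use a lightweight search method on the command line. It is called query string search, because the query is passed as a URL parameter:

    curl localhost:9200/films/md/_search?q=tag:good

On the test cluster used here, the following query looks up the last name lee:

    demo:/root# curl http://192.168.32.81:9200/megacorp/employee/_search?q=last_name:lee
    {"took":6,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.30685282,"hits":[{"_index":"megacorp","_type":"employee","_id":"3","_score":0.30685282,"_source":{"first_name":"Jane","last_name":"lee","age":32,"about":"I like to collect rock albums","interests":["music"]}}]}}

The same request sent from an HTTP client:

    GET http://192.168.32.81:9200/megacorp/employee/_search?q=last_name:lee

Response:

    {
        "took": 7,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "failed": 0
        },
        "hits": {
            "total": 1,
            "max_score": 0.30685282,
            "hits": [
                {
                    "_index": "megacorp",
                    "_type": "employee",
                    "_id": "3",
                    "_score": 0.30685282,
                    "_source": {
                        "first_name": "Jane",
                        "last_name": "lee",
                        "age": 32,
                        "about": "I like to collect rock albums",
                        "interests": ["music"]
                    }
                }
            ]
        }
    }

Querying with the DSL:

The query DSL is expressed as a JSON request body. The earlier "Smith" query can be written as shown here. Send it as a POST request (Elasticsearch also accepts a GET with a body, but many HTTP clients cannot send one):

    POST http://192.168.32.81:9200/megacorp/employee/_search
    {
        "query": {
            "match": {"last_name": "Smith"}
        }
    }

More complex search:

Let's make the search a little more complicated. We still want to find employees whose last name is "Smith", but only those who are older than 30.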
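As a rough mental model of what such a request should return, the condition can be checked locally against the three sample employees indexed earlier. This is a plain-Python sketch, not Elasticsearch code: the helper names are hypothetical, and the case-insensitive comparison only approximates how a match query treats a single-word field.

```python
# The three sample employees from the PUT requests earlier (fields trimmed).
employees = {
    "1": {"first_name": "John", "last_name": "Smith", "age": 25},
    "2": {"first_name": "Jane", "last_name": "Smith", "age": 32},
    "3": {"first_name": "Douglas", "last_name": "Fir", "age": 35},
}


def matches_last_name(doc, term):
    """Rough stand-in for a match query on a single-word field."""
    return doc["last_name"].lower() == term.lower()


def older_than(doc, years):
    """Stand-in for the range condition {"age": {"gt": years}}."""
    return doc["age"] > years


hits = [
    doc_id
    for doc_id, doc in employees.items()
    if matches_last_name(doc, "smith") and older_than(doc, 30)
]
print(hits)  # ['2']
```

Only Jane Smith (ID 2) satisfies both conditions: John Smith is too young and Douglas Fir has a different last name.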
Our query adds a filter, which lets us run a structured search efficiently. (The filtered query shown here belongs to older Elasticsearch versions; it was deprecated in 2.0 and removed in 5.0, where a bool query with a filter clause takes its place.)

    POST http://192.168.32.81:9200/megacorp/employee/_search
    {
        "query": {
            "filtered": {
                "filter": {
                    "range": {"age": {"gt": 30}}
                },
                "query": {
                    "match": {"last_name": "smith"}
                }
            }
        }
    }

Response:

    {
        "took": 29,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "failed": 0
        },
        "hits": {
            "total": 1,
            "max_score": 0.30685282,
            "hits": [
                {
                    "_index": "megacorp",
                    "_type": "employee",
                    "_id": "2",
                    "_score": 0.30685282,
                    "_source": {
                        "first_name": "Jane",
                        "last_name": "Smith",
                        "age": 32,
                        "about": "I like to collect rock albums",
                        "interests": ["music"]
                    }
                }
            ]
        }
    }

The filter part of the request is a range filter; it finds all documents with an age greater than 30. The query part is the same match query as before.

Full-text search:

So far the searches have been simple: look up a specific name, filter by age. Let's try something more advanced, full-text search, a task that traditional databases struggle with.

    POST http://192.168.32.80:9200/megacorp/employee/_search
    {
        "query": {
            "match": {"about": "rock climbing"}
        }
    }

Response:

    {
        "took": 6,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "failed": 0
        },
        "hits": {
            "total": 3,
            "max_score": 0.16273327,
            "hits": [
                {
                    "_index": "megacorp",
                    "_type": "employee",
                    "_id": "1",
                    "_score": 0.16273327,
                    "_source": {
                        "first_name": "John",
                        "last_name": "Smith",
                        "age": 25,
                        "about": "I love to go rock climbing",
                        "interests": ["sports", "music"]
                    }
                },
                {
                    "_index": "megacorp",
                    "_type": "employee",
                    "_id": "2",
                    "_score": 0.016878016,
                    "_source": {
                        "first_name": "Jane",
                        "last_name": "Smith",
                        "age": 32,
                        "about": "I like to collect rock albums",
                        "interests": ["music"]
                    }
                },
                {
                    "_index": "megacorp",
                    "_type": "employee",
                    "_id": "3",
                    "_score": 0.016878016,
                    "_source": {
                        "first_name": "Jane",
                        "last_name": "lee",
                        "age": 32,
                        "about": "I like to collect rock albums",
                        "interests": ["music"]
                    }
                }
            ]
        }
    }

By default, Elasticsearch sorts the result set by relevance score, that is, by how well each document matches the query. Clearly, the top-ranked John Smith's about field explicitly says "rock climbing". But why does Jane Smith also appear in the results? Because "rock" is mentioned in her about field (the same holds for the document with ID 3). Since only "rock" is mentioned and "climbing" is not, her _score is lower than John's.

This example shows how Elasticsearch performs full-text search across text fields and returns the most relevant results first. The concept of relevance is very important in Elasticsearch, and it has no real counterpart in traditional relational databases, where a record either matches a query or does not.

Phrase search:

So far we can search for individual words in a field, which is fine, but sometimes you want to match an exact sequence of words, a phrase. For example, we want the employee records that contain both "rock" and "climbing", next to each other. To do this, simply change the match query to a match_phrase query:

    POST http://192.168.32.80:9200/megacorp/employee/_search
    {
        "query": {
            "match_phrase": {"about": "rock climbing"}
        }
    }

Response:

    {
        "took": 15,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "failed": 0
        },
        "hits": {
            "total": 1,
            "max_score": 0.23013961,
            "hits": [
                {
                    "_index": "megacorp",
                    "_type": "employee",
                    "_id": "1",
                    "_score": 0.23013961,
                    "_source": {
                        "first_name": "John",
                        "last_name": "Smith",
                        "age": 25,
                        "about": "I love to go rock climbing",
                        "interests": ["sports", "music"]
                    }
                }
            ]
        }
    }

Analytics:

Finally, there is one more requirement: letting managers run analytics over the employees. Elasticsearch has a feature called aggregations, which lets you generate sophisticated analytics over your data. It is similar to GROUP BY in SQL, but more powerful.

    POST http://192.168.32.80:9200/megacorp/employee/_search
    {
        "aggs": {
            "all_interests": {
                "terms": {"field": "interests"}
            }
        }
    }

Response:

    {
        "took": 8,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "failed": 0
        },
        "hits": {
            "total": 3,
            "max_score": 1,
            "hits": [
                {
                    "_index": "megacorp",
                    "_type": "employee",
                    "_id": "2",
                    "_score": 1,
                    "_source": {
                        "first_name": "Douglas",
                        "last_name": "Fir",
                        "age": 35,
                        "about": "I like to build cabinets",
                        "interests": ["forestry"]
                    }
                },
                {
                    "_index": "megacorp",
                    "_type": "employee",
                    "_id": "1",
                    "_score": 1,
                    "_source": {
                        "first_name": "John",
                        "last_name": "Smith",
                        "age": 25,
                        "about": "I love to go rock climbing",
                        "interests": ["sports", "music"]
                    }
                },
                {
                    "_index": "megacorp",
                    "_type": "employee",
                    "_id": "3",
                    "_score": 1,
                    "_source": {
                        "first_name": "Jane",
                        "last_name": "lee",
                        "age": 32,
                        "about": "I like to collect rock albums",
                        "interests": ["music"]
                    }
                }
            ]
        },
        "aggregations": {
            "all_interests": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                    {"key": "music", "doc_count": 2},
                    {"key": "forestry", "doc_count": 1},
                    {"key": "sports", "doc_count": 1}
                ]
            }
        }
    }

We can see that two employees are interested in music, one in forestry and one in sports. These figures are not precomputed; they are calculated on the fly from the documents that match the query. If we want to know the most popular interests among everyone named "Smith", we only need to add the appropriate query.

The test document at /megacorp/employee/3 used for this query:

    /megacorp/employee/3
    {
        "first_name": "Douglas",
        "last_name": "smith",
        "age": 35,
        "about": "I like to build cabinets",
        "interests": ["music"]
    }

    POST http://192.168.32.80:9200/megacorp/employee/_search
    {
        "query": {
            "match": {"last_name": "smith"}
        },
        "aggs": {
            "all_interests": {
                "terms": {"field": "interests"}
            }
        }
    }

Aggregations also allow hierarchical rollups. For example, let's compute the average age of the employees for each interest:

    POST http://192.168.32.80:9200/megacorp/employee/_search
    {
        "aggs": {
            "all_interests": {
                "terms": {"field": "interests"},
                "aggs": {
                    "avg_age": {
                        "avg": {"field": "age"}
                    }
                }
            }
        }
    }

Distributed nature:

Elasticsearch strives to hide the complexity of distributed systems. The following operations all happen automatically under the hood:

1. Partitioning your documents into different containers, or shards, which can live on one or more nodes.
2. Spreading the shards evenly across the nodes, to load-balance indexing and search.
3. Keeping a redundant copy of every shard, to prevent data loss when hardware fails.
4. Routing a request received by any node in the cluster to the node that holds the relevant data.
5. Seamlessly expanding and relocating shards whenever nodes are added or removed.
0 0