心丨悦 2020-05-08
如果返回的结果集中很多符合条件的结果,那怎么能一眼就能看到我们想要的那个结果呢?比如下面网站所示的那样,我们搜索elasticsearch
,在结果集中,将所有elasticsearch
高亮显示?
如上图我们搜索百度一样。我们该怎么做呢?
PUT lqz/doc/4 { "name":"石头", "age":29, "from":"gu", "desc":"粗中有细,狐假虎威", "tags":["粗", "大","猛"] }
我们来查询:
GET lqz/doc/_search { "query": { "match": { "name": "石头" } }, "highlight": { "fields": { "name": {} } } } #我们使用highlight属性来实现结果高亮显示,需要的字段名称添加到fields内即可,elasticsearch会自动帮我们实现高亮。
结果如下: { "took" : 1, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 1, "max_score" : 1.5098256, "hits" : [ { "_index" : "lqz", "_type" : "doc", "_id" : "4", "_score" : 1.5098256, "_source" : { "name" : "石头", "age" : 29, "from" : "gu", "desc" : "粗中有细,狐假虎威", "tags" : [ "粗", "大", "猛" ] }, "highlight" : { "name" : [ "<em>石</em><em>头</em>" ] } } ] } }
查询结果
上例中,elasticsearch
会自动将检索结果用标签包裹起来,用于在页面中渲染。
GET lqz/chengyuan/_search { "query": { "match": { "from": "gu" } }, "highlight": { "pre_tags": "<b class=‘key‘ style=‘color:red‘>", "post_tags": "</b>", "fields": { "from": {} } } } 上例中,在highlight中,pre_tags用来实现我们的自定义标签的前半部分,在这里,我们也可以为自定义的标签添加属性和样式。post_tags实现标签的后半部分,组成一个完整的标签。至于标签中的内容,则还是交给fields来完成。
{ "took" : 1, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 1, "max_score" : 0.5753642, "hits" : [ { "_index" : "lqz", "_type" : "chengyuan", "_id" : "1", "_score" : 0.5753642, "_source" : { "name" : "老二", "age" : 30, "sex" : "male", "birth" : "1070-10-11", "from" : "gu", "desc" : "皮肤黑,武器长,性格直", "tags" : [ "黑", "长", "直" ] }, "highlight" : { "name" : [ "<b class=‘key‘ style=‘color:red‘>老</b><b class=‘key‘ style=‘color:red‘>二</b>" ] } } ] } }
查询结果
需要注意的是:自定义标签中属性或样式中的逗号一律用英文状态的单引号表示,应该与外部elasticsearch
语法的双引号区分开。
前后端分离,你怎么处理?把<b class=‘key‘ style=‘color:red‘>串直接以json格式返回,前端自行渲染
avg
max
min
sum
# 查询`from`是`gu`的人的平均年龄。 # select max(age) as my_avg from user; GET lqz/doc/_search { "query": { "match": { "from": "gu" } }, "aggs": { "my_avg": { "avg": { "field": "age" } } }, "_source": ["name", "age"] }
上例中,首先匹配查询from
是gu
的数据。在此基础上做查询平均值的操作,这里就用到了聚合函数,其语法被封装在aggs
中,而my_avg
则是为查询结果起个别名,封装了计算出的平均值。那么,要以什么属性作为条件呢?是age
年龄,查年龄的什么呢?是avg
,查平均年龄。
{ "took" : 1, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 3, "max_score" : 0.6931472, "hits" : [ { "_index" : "lqz", "_type" : "doc", "_id" : "4", "_score" : 0.6931472, "_source" : { "name" : "石头", "age" : 29 } }, { "_index" : "lqz", "_type" : "doc", "_id" : "1", "_score" : 0.2876821, "_source" : { "name" : "顾老二", "age" : 30 } }, { "_index" : "lqz", "_type" : "doc", "_id" : "3", "_score" : 0.2876821, "_source" : { "name" : "龙套偏房", "age" : 22 } } ] }, "aggregations" : { "my_avg" : { "value" : 27.0 } } }
查询结果
上例中,在查询结果的最后是平均值信息,可以看到是27岁。
虽然我们已经使用_source
对字段做了过滤,但是还不够。我不想看都有哪些数据,只想看平均值怎么办?别忘了size
!
GET lqz/doc/_search { "query": { "match": { "from": "gu" } }, "aggs": { "my_avg": { "avg": { "field": "age" } } }, "size": 0, "_source": ["name", "age"] }
上例中,只需要在原来的查询基础上,增加一个size
就可以了,输出几条结果,我们写上0,就是输出0条查询结果。
{ "took" : 8, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 3, "max_score" : 0.0, "hits" : [ ] }, "aggregations" : { "my_avg" : { "value" : 27.0 } } }
查询结果
GET lqz/doc/_search { "query": { "match": { "from": "gu" } }, "aggs": { "my_max": { "max": { "field": "age" } } }, "size": 0 }
上例中,只需要在查询条件中将avg
替换成max
即可。
GET lqz/doc/_search { "query": { "match": { "from": "gu" } }, "aggs": { "my_min": { "min": { "field": "age" } } }, "size": 0 }
# 求年龄总和GET lqz/doc/_search { "query": { "match": { "from": "gu" } }, "aggs": { "my_sum": { "sum": { "field": "age" } } }, "size": 0 }
现在我想要查询所有人的年龄段,并且按照15~20,20~25,25~30
分组,并且算出每组的平均年龄。
GET lqz/doc/_search { "size": 0, "query": { "match_all": {} }, "aggs": { "age_group": { "range": { "field": "age", "ranges": [ { "from": 15, "to": 20 }, { "from": 20, "to": 25 }, { "from": 25, "to": 30 } ] }, "aggs": { "my_avg": { "avg": { "field": "age" } } } } } }
{ "took" : 1, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 5, "max_score" : 0.0, "hits" : [ ] }, "aggregations" : { "age_group" : { "buckets" : [ { "key" : "15.0-20.0", "from" : 15.0, "to" : 20.0, "doc_count" : 1, "my_avg" : { "value" : 18.0 } }, { "key" : "20.0-25.0", "from" : 20.0, "to" : 25.0, "doc_count" : 1, "my_avg" : { "value" : 22.0 } }, { "key" : "25.0-30.0", "from" : 25.0, "to" : 30.0, "doc_count" : 2, "my_avg" : { "value" : 27.0 } } ] } } }
查询结果
上例中,在aggs
的自定义别名age_group
中,使用range
来做分组,field
是以age
为分组,分组使用ranges
来做,from
和to
是范围,我们根据需求做出三组。在分组下面,我们使用aggs
对age
做平均数处理,这样就可以了。返回的结果中可以看到,已经拿到了三个分组。doc_count
为该组内有几条数据,此次共分为三组,查询出4条内容。还有一条数据的age
属性值是30
,不在分组的范围内!
注意:聚合函数的使用,一定是先查出结果,然后对结果使用聚合函数做处理
另外一部分,则需要先做聚类、分类处理,将聚合出的分类结果存入ES集群的聚类索引中。数据处理层的聚合结果存入ES中的指定索引,同时将每个聚合主题相关的数据存入每个document下面的某个field下。