huzai 2019-08-30
## Index data; dynamic mapping keeps adding newly seen fields
PUT cookie_service/_doc/1
{
  "url": "www.google.com",
  "cookies": {
    "username": "tom",
    "age": 32
  }
}

PUT cookie_service/_doc/2
{
  "url": "www.amazon.com",
  "cookies": {
    "login": "2019-01-01",
    "email": "[email protected]"
  }
}
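The cookies field uses the default setting, dynamic=true. As more data is written, the number of fields keeps growing unless the sub-fields of cookies are restricted, and that hurts performance.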
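To make the growth concrete, here is a minimal sketch that counts the distinct leaf fields the two sample documents would introduce. The helper `field_paths` is a hypothetical name used only for illustration, and the email value is a placeholder. The count is a rough lower bound, since a real dynamic mapping may also add multi-fields such as `.keyword`.

```python
def field_paths(doc, prefix=""):
    """Yield the dotted path of every leaf field in a JSON-like document."""
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Descend into object fields such as "cookies"
            yield from field_paths(value, prefix=path + ".")
        else:
            yield path


docs = [
    {"url": "www.google.com", "cookies": {"username": "tom", "age": 32}},
    # "user@example.com" is a placeholder value, not taken from the article
    {"url": "www.amazon.com", "cookies": {"login": "2019-01-01", "email": "user@example.com"}},
]

seen = set()
for doc in docs:
    seen.update(field_paths(doc))
    print(len(seen), sorted(seen))
```

Every document with new cookie names adds new entries to `seen`, which is the same pattern by which a dynamic mapping keeps growing.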
# Use a nested object and store the cookies as key/value pairs
PUT cookie_service
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "cookies": {
        "type": "nested",
        "properties": {
          "name": {
            "type": "keyword"
          },
          "dateValue": {
            "type": "date"
          },
          "keywordValue": {
            "type": "keyword"
          },
          "IntValue": {
            "type": "integer"
          }
        }
      },
      "url": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}
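With this mapping, documents can no longer carry arbitrary cookie names as field names; each cookie has to become one nested entry. The following is a minimal sketch of that conversion on the client side. The helpers `to_key_value` and `cookie_query` are hypothetical names, and the rule for picking a value field (integer, ISO date, otherwise keyword) is an assumption rather than something the article specifies.

```python
from datetime import date


def _is_iso_date(value):
    """Return True if value is a string such as "2019-01-01"."""
    if not isinstance(value, str):
        return False
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


def to_key_value(cookies):
    """Turn a flat cookies dict into a list of nested key/value entries."""
    entries = []
    for name, value in cookies.items():
        entry = {"name": name}
        if isinstance(value, bool):
            # bool is a subclass of int, so handle it before the int check
            entry["keywordValue"] = str(value).lower()
        elif isinstance(value, int):
            entry["IntValue"] = value
        elif _is_iso_date(value):
            entry["dateValue"] = value
        else:
            entry["keywordValue"] = str(value)
        entries.append(entry)
    return entries


def cookie_query(name, value_field, value):
    """Build a nested query that matches one cookie name/value pair."""
    return {
        "query": {
            "nested": {
                "path": "cookies",
                "query": {
                    "bool": {
                        "filter": [
                            {"term": {"cookies.name": name}},
                            {"term": {f"cookies.{value_field}": value}},
                        ]
                    }
                },
            }
        }
    }


doc = {"url": "www.google.com", "cookies": to_key_value({"username": "tom", "age": 32})}
print(doc)
print(cookie_query("age", "IntValue", 32))
```

The number of mapped fields now stays fixed no matter how many different cookie names appear, at the cost of having to use nested queries when searching.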
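A few points to note:
"dynamic": "strict" prevents any other fields from being added.

Regexp queries do not perform well, while a prefix query is a term-level query.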
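Suppose a field in the document holds the Elasticsearch version, for example version: "7.1.0", and we need to find the documents whose major version is 7 and minor version is 2. Do not use a regexp query for this; model the version as separate sub-fields instead.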
PUT softwares/
{
  "mappings": {
    "_meta": {
      "software_version_mapping": "1.1"
    },
    "properties": {
      "version": {
        "properties": {
          "display_name": {
            "type": "keyword"
          },
          "hot_fix": {
            "type": "byte"
          },
          "major": {
            "type": "byte"
          },
          "minor": {
            "type": "byte"
          }
        }
      }
    }
  }
}

Then we can run the query:

POST softwares/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "match": {
            "version.major": 7
          }
        },
        {
          "match": {
            "version.minor": 2
          }
        }
      ]
    }
  }
}
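For this query to work, the version string has to be split into its parts before indexing. A minimal sketch of that step follows; `split_version` is a hypothetical helper, it assumes a plain three-part numeric version, and the key names should be adjusted to match the exact spelling of the sub-fields in your mapping.

```python
def split_version(version):
    """Split a version such as "7.1.0" into the sub-fields of the version object."""
    parts = version.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected major.minor.hot_fix, got {version!r}")
    major, minor, hot_fix = (int(part) for part in parts)
    return {
        "display_name": version,  # keep the original string for display
        "major": major,
        "minor": minor,
        "hot_fix": hot_fix,
    }


doc = {"version": split_version("7.1.0")}
print(doc)
```

Filtering on two small numeric fields avoids the pattern matching that a regexp query over the raw string would need.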
PUT ratings/_doc/1
{
  "rating": 5
}

PUT ratings/_doc/2
{
  "rating": null
}

POST ratings/_search
{
  "size": 0,
  "aggs": {
    "avg": {
      "avg": {
        "field": "rating"
      }
    }
  }
}

# Query result
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "avg" : {
      "value" : 5.0
    }
  }
}
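Clearly the search matched two documents, yet the average is 5, which is hard to make sense of at first. The reason is that a null value is not indexed, so the avg aggregation only sees the document whose rating is 5. Setting null_value in the mapping changes this: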
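# The ratings index already exists from the writes above, so delete it before recreating it
DELETE ratings

PUT ratings
{
  "mappings": {
    "properties": {
      "rating": {
        "type": "float",
        "null_value": 0
      }
    }
  }
}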
After inserting the same documents again and rerunning the aggregation, we get the following result:
{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "avg" : {
      "value" : 2.5
    }
  }
}
This looks more reasonable. The value of null_value can of course be chosen to suit your own business requirements.
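The arithmetic behind the two results can be reproduced with a small sketch. This only mimics the observed behaviour (nulls skipped unless a substitute is configured) and is not how Elasticsearch implements the aggregation; `average` is a hypothetical helper.

```python
def average(ratings, null_value=None):
    """Average the ratings, skipping nulls unless a null_value substitute is given."""
    values = []
    for rating in ratings:
        if rating is None:
            if null_value is None:
                continue  # no substitute configured: the null is simply left out
            rating = null_value  # substitute the configured value for the null
        values.append(rating)
    return sum(values) / len(values) if values else None


print(average([5, None]))                # 5.0, only one value is counted
print(average([5, None], null_value=0))  # 2.5, the null is counted as 0
```

Whether 0 is the right substitute depends on what a missing rating means in your data, since it pulls the average down.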
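The other part of the data first needs clustering and classification; the resulting categories are stored in a clustering index in the ES cluster. The aggregated results from the data-processing layer are written to a designated index in ES, and the data related to each aggregated topic is stored under a field of the corresponding document.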