Specifying the Elasticsearch Analyzer

kingdz 2017-07-07

After installing ELK, Elasticsearch uses the standard analyzer by default, so our exception "org.springframework.jdbc.BadSqlGrammarException" cannot be found by searching for BadSqlGrammarException.

Take "one.two.three.+four" as an example: the standard analyzer produces only two terms, while the simple analyzer produces four.

https://discuss.elastic.co/t/dot-analyzer/3635/2

With the default analyzer, i.e. the standard analyzer:

curl -XGET 'http://localhost:9200/twitter/_analyze?text=one.two.three.+four&pretty=1'
{
"tokens" : [ {
"token" : "one.two.three",
"start_offset" : 0,
"end_offset" : 14,
"type" : "",
"position" : 1
}, {
"token" : "four",
"start_offset" : 15,
"end_offset" : 19,
"type" : "",
"position" : 2
} ]
}

Switching to the simple analyzer yields four terms:

curl -XGET 'http://localhost:9200/twitter/_analyze?analyzer=simple&text=one.two.three.+four&pretty=1'

{
"tokens" : [ {
"token" : "one",
"start_offset" : 0,
"end_offset" : 3,
"type" : "word",
"position" : 1
}, {
"token" : "two",
"start_offset" : 4,
"end_offset" : 7,
"type" : "word",
"position" : 2
}, {
"token" : "three",
"start_offset" : 8,
"end_offset" : 13,
"type" : "word",
"position" : 3
}, {
"token" : "four",
"start_offset" : 15,
"end_offset" : 19,
"type" : "word",
"position" : 4
} ]
}
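
The difference is easy to reproduce offline. The following is a minimal Python sketch that only approximates the simple analyzer (split on anything that is not a letter, then lowercase); the function name simple_analyze is made up for illustration and this is not how Elasticsearch implements it.

```python
import re


def simple_analyze(text):
    """Approximate the simple analyzer: split on non-letters, then lowercase."""
    # [^\W\d_]+ matches runs of Unicode letters only (no digits, no underscore).
    return [token.lower() for token in re.findall(r"[^\W\d_]+", text)]


# In the URL above, "+" decodes to a space.
print(simple_analyze("one.two.three. four"))
# ['one', 'two', 'three', 'four']

print(simple_analyze("org.springframework.jdbc.BadSqlGrammarException"))
# ['org', 'springframework', 'jdbc', 'badsqlgrammarexception']
```

Because the class name becomes its own lowercased term, a search for BadSqlGrammarException can match once the query text goes through the same analyzer.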

An analyzer can be specified per query, per field, or per index. Refer to: https://www.elastic.co/guide/en/elasticsearch/reference/current/analyzer.html

Elasticsearch chooses an analyzer in the following order.

At index time:

- The analyzer defined in the field mapping.
- An analyzer named default in the index settings.
- The standard analyzer.

At search time:

- The analyzer defined in a full-text query.
- The search_analyzer defined in the field mapping.
- The analyzer defined in the field mapping.
- An analyzer named default_search in the index settings.
- An analyzer named default in the index settings.
- The standard analyzer.
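
The two lookup orders above amount to a first-match search. The sketch below is a Python illustration only, not Elasticsearch code: the function names and the flat dictionaries standing in for the query, the field mapping, and the index settings are hypothetical simplifications.

```python
def first_defined(candidates):
    """Return the first candidate that is not None."""
    for analyzer in candidates:
        if analyzer is not None:
            return analyzer


def index_analyzer(field_mapping, index_settings):
    # Index time: field mapping, then index "default", then standard.
    return first_defined([
        field_mapping.get("analyzer"),
        index_settings.get("default"),
        "standard",
    ])


def search_analyzer(query, field_mapping, index_settings):
    # Search time: the query itself wins, then the field, then the index.
    return first_defined([
        query.get("analyzer"),
        field_mapping.get("search_analyzer"),
        field_mapping.get("analyzer"),
        index_settings.get("default_search"),
        index_settings.get("default"),
        "standard",
    ])


# A field mapped with the simple analyzer and no other overrides.
print(index_analyzer({"analyzer": "simple"}, {}))       # simple
print(search_analyzer({}, {"analyzer": "simple"}, {}))  # simple
print(search_analyzer({}, {}, {}))                      # standard
```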

We configure the message field of the data coming from Logstash to use the simple analyzer:

PUT _template/logstash
{
    "template" : "logstash-*",
    "mappings": {
      "test": {
        "properties": {
          "message": { 
            "type": "text",
            "analyzer": "simple"
          }
        }
      }
    }
}

This creates a template named logstash that applies to every index whose name matches logstash-*. The template defines a mapping named test in which the message field is of type text and uses the simple analyzer.
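
To make the pattern matching concrete, here is a small Python sketch. It assumes that fnmatch-style globbing is a close enough stand-in for how the "template" pattern is matched against index names (Elasticsearch only treats * as a wildcard here); the helper names template_applies and message_analyzer are hypothetical.

```python
from fnmatch import fnmatchcase

# The same template body as above, written as a Python dict.
template = {
    "template": "logstash-*",
    "mappings": {
        "test": {
            "properties": {
                "message": {"type": "text", "analyzer": "simple"}
            }
        }
    },
}


def template_applies(index_name, template_body):
    """Check whether the template's pattern matches an index name."""
    return fnmatchcase(index_name, template_body["template"])


def message_analyzer(template_body, mapping_type="test"):
    """Read the analyzer configured for the message field."""
    properties = template_body["mappings"][mapping_type]["properties"]
    return properties["message"]["analyzer"]


print(template_applies("logstash-2017.07.06", template))  # True
print(template_applies("twitter", template))              # False
print(message_analyzer(template))                         # simple
```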

OK, the message field of newly created logstash-* indices now uses the simple analyzer. Let's test it:

1) Append a record to the log that Logstash is watching:

org.springframework.jdbc.BadSqlGrammarException: ### Error querying database.

2) The Elasticsearch console prints a log line showing that the index for this record was created using our template logstash and our mapping test:

[2017-07-06T23:33:22,262][INFO ][o.e.c.m.MetaDataCreateIndexService] [4fPRwZ3] [logstash-2017.07.06] creating index, cause [auto(bulk api)], templates [logstash], shards [5]/[1], mappings [test]

* About shards [5]/[1]:

By default, each index in Elasticsearch is allocated 5 primary shards and 1 replica which means that if you have at least two nodes in your cluster, your index will have 5 primary shards and another 5 replica shards (1 complete replica) for a total of 10 shards per index.

Refer to: https://www.elastic.co/guide/en/elasticsearch/reference/current/_basic_concepts.html#getting-started-shards-and-replicas
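
The arithmetic in the quoted passage can be checked with a short Python sketch; the function name total_shards is made up for illustration.

```python
def total_shards(primaries=5, replicas=1):
    """Each primary shard has `replicas` copies in addition to itself."""
    return primaries * (1 + replicas)


print(total_shards())  # 10
```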

3) Search the message field for "jdbc":

GET /_search
{
  "query": {
      "query_string": {
          "query": "jdbc",
          "fields": [
              "message"
          ]
      }
  }
}

The search succeeds:

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 6,
    "successful": 6,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1.1290016,
    "hits": [
      {
        "_index": "logstash-2017.07.06",
        "_type": "testlogs",
        "_id": "AV0YmlmqLzl7sqCPLrgd",
        "_score": 0.28582606,
        "_source": {
          "path": "/Users/jo/lp_logs/error.log",
          "@timestamp": "2017-07-06T15:52:34.975Z",
          "@version": "1",
          "host": "Zhuos-MacBook-Pro.local",
          "message": "\torg.springframework.jdbc.BadSqlGrammarException: ### Error querying database.",
          "type": "testlogs"
        }
      }
    ]
  }
}

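In Kibana, searching for message:jdbc returns the result.

Oddly, though, the result is only returned when the message field is specified explicitly, whereas other terms in the message, such as "Error", can be found without specifying message.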
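
One possible explanation, offered as an assumption rather than something verified in this setup: in Elasticsearch 5.x a query without a field goes to the _all field, which by default is analyzed with the standard analyzer rather than with the simple analyzer configured on message. The Python sketch below uses two rough regular-expression stand-ins for the analyzers (neither is the real tokenizer, and the function names are hypothetical) to show why "jdbc" would then exist as a term only in message, while "error" exists in both.

```python
import re

MESSAGE = "\torg.springframework.jdbc.BadSqlGrammarException: ### Error querying database."


def simple_terms(text):
    """Rough stand-in for the simple analyzer: letter runs, lowercased."""
    return {t.lower() for t in re.findall(r"[^\W\d_]+", text)}


def rough_standard_terms(text):
    """Crude stand-in for the standard analyzer.

    Assumption: letter/digit runs joined by single dots stay one token,
    which mirrors the "one.two.three" output shown earlier.
    """
    return {t.lower() for t in re.findall(r"[^\W_]+(?:\.[^\W_]+)*", text)}


message_field = simple_terms(MESSAGE)       # what the message field would hold
all_field = rough_standard_terms(MESSAGE)   # what _all would hold (assumed)

print("jdbc" in message_field, "jdbc" in all_field)    # True False
print("error" in message_field, "error" in all_field)  # True True
```

If this is indeed the cause, specifying the field in the query, as done above, is the straightforward workaround.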
