章鱼之家 2019-12-24
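An introduction to Elasticsearch and a simple way to use it in a Spring Boot project.
Elasticsearch (open source, distributed, RESTful search engine)
GitHub: https://github.com/elastic/elasticsearch
The PHP projects I worked on early in my career did not involve search; where search was needed, a simple LIKE clause was enough.
I had heard of Elasticsearch long before, but our business had no use for it at the time, and it was more widely used in the Java world. Only later, when I needed to analyze user behavior logs, did I use it in a basic way.
Typical use cases: log analysis (user behavior, monitoring, security, business metrics, and so on) and search.
The requirement at the time was user behavior analysis. We added tracking points to the frontend and analyzed the access paths collected from users.
Both the asynchronous collection and the API were built with GroupCo, an asynchronous PHP framework I wrote myself.
Some of the code follows.
Writing the log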
$record = $log['ip'].' ['.$log['time'].'] "'.$log['url'].'" "'.$log['referrer'].'" "'.$log['agent'].'" "'.$log['uuid'].'" "'.$log['device']."\"\n";
yield AsyncFile::write(__ROOT__."runtime/data/".date('Ymd').".log", $record, FILE_APPEND);
Logstash configuration for processing the log
input {
  file {
    path => "/var/www/log/runtime/data/*"
    start_position => "beginning"
  }
}
filter {
  mutate { replace => { "type" => "access" } }
  grok {
    match => { "message" => "%{IP:clientip} \[%{HTTPDATE:timestamp}\] \"%{NOTSPACE:request}\" \"(?:%{URI:referrer}|-)\" \"%{GREEDYDATA:agent}\" \"%{NOTSPACE:uuid}\" \"%{NOTSPACE:device}\"" }
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}
output {
  elasticsearch { hosts => ["localhost:9200"] }
  stdout { codec => rubydebug }
}
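To make the grok step more concrete, here is a minimal Java sketch of roughly what that pattern extracts from one log line. It is only an approximation: the class name AccessLogParser, the sample line, and the simplified regular expression are assumptions for illustration, and the real IP, HTTPDATE and URI grok patterns are stricter than the \S+ stand-ins used here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AccessLogParser {
    // Simplified stand-in for the grok pattern: one named group per grok field.
    // \S+ approximates NOTSPACE; the real IP, HTTPDATE and URI patterns are stricter.
    private static final Pattern LINE = Pattern.compile(
        "^(?<clientip>\\S+) \\[(?<timestamp>[^\\]]+)\\] \"(?<request>\\S+)\" "
        + "\"(?<referrer>\\S+)\" \"(?<agent>.*)\" \"(?<uuid>\\S+)\" \"(?<device>\\S+)\"$");

    private static final String[] FIELDS =
        {"clientip", "timestamp", "request", "referrer", "agent", "uuid", "device"};

    /** Returns the extracted fields, or an empty map if the line does not match. */
    public static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = LINE.matcher(line.trim());
        if (m.matches()) {
            for (String name : FIELDS) {
                fields.put(name, m.group(name));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical sample line in the layout produced by the PHP snippet.
        String sample = "203.0.113.7 [24/Dec/2019:10:15:30 +0800] \"/product/42\" "
            + "\"https://example.com/home\" \"Mozilla/5.0 (X11; Linux x86_64)\" "
            + "\"u-1001\" \"mobile\"";
        parse(sample).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Each named group corresponds to a field that ends up on the Elasticsearch document, which is why the later aggregation can refer to uuid and device directly.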
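Queries are then run through the Elasticsearch search API (see the official documentation for the search parameters).
For example, listing the users seen on a given day (distinguished by uuid):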
$http = new AsyncHttp('http://127.0.0.1:9200');
yield $http->parseDomain();
$client = $http->getClient("/logstash-{$date}/_search");
$client->setMethod("GET");
$client->setData('
{
    "size" : 0,
    "aggs" : {
        "device" : {
            "terms" : { "field" : "device.keyword" },
            "aggs" : {
                "uuids" : {
                    "terms" : { "field" : "uuid.keyword", "size": 1000 }
                }
            }
        }
    }
}');
$client->setHeaders(['Content-Type' => 'application/json']);
$res = (yield $client);
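For readers on the Java side, the same request could be sketched with the JDK's built-in HTTP client (Java 11 or later). This is an illustration under assumptions rather than the original implementation: the class name DailyUsersQuery and the date value are made up, the index is assumed to follow a daily logstash-YYYY.MM.dd naming scheme, and POST is used in place of GET because the _search endpoint accepts both.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DailyUsersQuery {
    // Same aggregation as the PHP snippet: bucket by device, then by uuid.
    static final String BODY =
        "{ \"size\": 0, \"aggs\": { \"device\": {"
        + " \"terms\": { \"field\": \"device.keyword\" },"
        + " \"aggs\": { \"uuids\": {"
        + " \"terms\": { \"field\": \"uuid.keyword\", \"size\": 1000 } } } } } }";

    /** Builds the search request for one daily index; nothing is sent here. */
    static HttpRequest buildRequest(String baseUrl, String date) {
        return HttpRequest.newBuilder()
            .uri(URI.create(baseUrl + "/logstash-" + date + "/_search"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(BODY))
            .build();
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical date; assumes daily indices named logstash-YYYY.MM.dd.
        HttpRequest request = buildRequest("http://127.0.0.1:9200", "2019.12.24");
        try {
            HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.body());
        } catch (IOException e) {
            // Expected when no Elasticsearch node is listening on that address.
            System.out.println("Elasticsearch not reachable: " + e);
        }
    }
}
```

With "size": 0 the response carries no hits, only the aggregation buckets, so the per-device uuid lists would be read from the aggregations section of the returned JSON.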
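That is a brief look, from a PHP project, at how Elasticsearch can be used for log analysis. PHP developers who are interested can try it for themselves.
Using spring-data-elasticsearch in Spring Boot is fairly simple and takes roughly four steps, shown in order below: annotate the entity, define a repository, call it from the service layer, and configure the cluster address.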
package com.clothesmake.user.dao.entity;

import lombok.Data;
import org.hibernate.annotations.DynamicInsert;
import org.hibernate.annotations.DynamicUpdate;
import org.springframework.data.elasticsearch.annotations.Document;

import javax.persistence.*;
import java.io.Serializable;

/**
 * user
 * @author coco
 */
@Entity
@Data
@DynamicInsert
@DynamicUpdate
@Table(name = "user")
@Document(indexName = "user")
public class UserEntity implements Serializable {
    private static final long serialVersionUID = 1L;

    /** id */
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Integer id;

    /** nickname */
    private String nickname;

    /** mobile phone */
    private String mobile;

    /** email */
    private String email;

    /** password */
    private String password;

    public UserEntity() {
    }
}
package com.clothesmake.user.dao.repository.search;

import com.clothesmake.user.dao.entity.UserEntity;
import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;

public interface UserSearchRepository extends ElasticsearchRepository<UserEntity, Integer> {
}
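There is one pitfall here: Elasticsearch repositories and JPA repositories should live in different packages, and each repository type should be configured to scan only its own package.
Otherwise startup fails with the error "No property index found for type user".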
@EnableJpaRepositories("com.clothesmake.user.dao.repository")
@EnableElasticsearchRepositories("com.clothesmake.user.dao.repository.search")
public Page<UserEntity> searchUsers(String query, Pageable pageable) {
    // queryStringQuery is statically imported from org.elasticsearch.index.query.QueryBuilders
    Page<UserEntity> users = userSearchRepository.search(queryStringQuery(query), pageable);
    return users;
}
spring.data.elasticsearch.cluster-nodes=127.0.0.1:9300
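Spring Data Elasticsearch can also derive queries from repository method names. The supported keywords are:
Keyword | Sample | Elasticsearch Query String |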
---|---|---|
And | findByNameAndPrice | {"bool" : {"must" : [ {"field" : {"name" : "?"}}, {"field" : {"price" : "?"}} ]}} |
Or | findByNameOrPrice | {"bool" : {"should" : [ {"field" : {"name" : "?"}}, {"field" : {"price" : "?"}} ]}} |
Is | findByName | {"bool" : {"must" : {"field" : {"name" : "?"}}}} |
Not | findByNameNot | {"bool" : {"must_not" : {"field" : {"name" : "?"}}}} |
Between | findByPriceBetween | {"bool" : {"must" : {"range" : {"price" : {"from" : ?,"to" : ?,"include_lower" : true,"include_upper" : true}}}}} |
LessThanEqual | findByPriceLessThanEqual | {"bool" : {"must" : {"range" : {"price" : {"from" : null,"to" : ?,"include_lower" : true,"include_upper" : true}}}}} |
GreaterThanEqual | findByPriceGreaterThanEqual | {"bool" : {"must" : {"range" : {"price" : {"from" : ?,"to" : null,"include_lower" : true,"include_upper" : true}}}}} |
Before | findByPriceBefore | {"bool" : {"must" : {"range" : {"price" : {"from" : null,"to" : ?,"include_lower" : true,"include_upper" : true}}}}} |
After | findByPriceAfter | {"bool" : {"must" : {"range" : {"price" : {"from" : ?,"to" : null,"include_lower" : true,"include_upper" : true}}}}} |
Like | findByNameLike | {"bool" : {"must" : {"field" : {"name" : {"query" : "?*","analyze_wildcard" : true}}}}} |
StartingWith | findByNameStartingWith | {"bool" : {"must" : {"field" : {"name" : {"query" : "?*","analyze_wildcard" : true}}}}} |
EndingWith | findByNameEndingWith | {"bool" : {"must" : {"field" : {"name" : {"query" : "*?","analyze_wildcard" : true}}}}} |
Contains/Containing | findByNameContaining | {"bool" : {"must" : {"field" : {"name" : {"query" : "*?*","analyze_wildcard" : true}}}}} |
In | findByNameIn(Collection<String> names) | {"bool" : {"must" : {"bool" : {"should" : [ {"field" : {"name" : "?"}}, {"field" : {"name" : "?"}} ]}}}} |
NotIn | findByNameNotIn(Collection<String> names) | {"bool" : {"must_not" : {"bool" : {"should" : {"field" : {"name" : "?"}}}}}} |
True | findByAvailableTrue | {"bool" : {"must" : {"field" : {"available" : true}}}} |
False | findByAvailableFalse | {"bool" : {"must" : {"field" : {"available" : false}}}} |
OrderBy | findByAvailableTrueOrderByNameDesc | {"sort" : [{ "name" : {"order" : "desc"} }],"bool" : {"must" : {"field" : {"available" : true}}}} |
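As a rough illustration of how the And and Or rows of the table map a method name to a bool query, the toy translator below handles only those two keywords. It is a sketch under assumptions, not Spring Data's actual method-name parser: the class DerivedQuerySketch and its toQuery helper are hypothetical, and a method name is assumed to use a single connector type.

```java
import java.util.ArrayList;
import java.util.List;

public class DerivedQuerySketch {
    /**
     * Translates findBy<A>And<B>... or findBy<A>Or<B>... into the query string
     * shape used in the table. Only And/Or are supported in this toy version.
     */
    static String toQuery(String methodName) {
        if (!methodName.startsWith("findBy")) {
            throw new IllegalArgumentException("Expected a findBy... method name");
        }
        String criteria = methodName.substring("findBy".length());

        // "Or" between two property names means should; otherwise treat as must.
        boolean isOr = criteria.matches(".*[a-z]Or[A-Z].*");
        String clause = isOr ? "should" : "must";
        String splitter = isOr ? "(?<=[a-z])Or(?=[A-Z])" : "(?<=[a-z])And(?=[A-Z])";

        List<String> terms = new ArrayList<>();
        for (String property : criteria.split(splitter)) {
            // Property names start lower-case in the document: Name -> name.
            String field = Character.toLowerCase(property.charAt(0)) + property.substring(1);
            terms.add("{\"field\" : {\"" + field + "\" : \"?\"}}");
        }
        return "{\"bool\" : {\"" + clause + "\" : [ " + String.join(", ", terms) + " ]}}";
    }

    public static void main(String[] args) {
        // prints {"bool" : {"must" : [ {"field" : {"name" : "?"}}, {"field" : {"price" : "?"}} ]}}
        System.out.println(toQuery("findByNameAndPrice"));
        System.out.println(toQuery("findByNameOrPrice"));
    }
}
```

In a real project none of this is written by hand: declaring a method such as findByNameAndPrice on the repository interface is enough, and the "?" placeholders are filled from the method arguments.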
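The other part of the data first needs clustering and classification, and the resulting categories are stored in a clustering index in the ES cluster. The data processing layer writes its aggregation results to a designated index in ES, and stores the data related to each aggregated topic in a field of the corresponding document.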