编码之路 2020-05-03
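1. Introduction
1. Lucene
Lucene is simply a JAR library. It packages the code for building inverted indexes and running searches, including the algorithms involved. When developing in Java, we add the Lucene JAR as a dependency and build against the Lucene API.
With it we can index existing data: Lucene organizes the index data structures on the local disk. We can also use the features and API that Lucene provides to search the index data stored on disk.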
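To make the idea of an inverted index concrete, here is a minimal sketch in shell and awk. It is not how Lucene implements its index; it only shows the core mapping from each term to the documents that contain it. The function name `build_inverted_index` and the tab-separated input format are assumptions made for this illustration.

```shell
# Build a toy inverted index (illustration only, not Lucene's on-disk format).
# stdin : one document per line, "<doc_id><TAB><text>"
# stdout: one term per line,     "<term><TAB><space-separated doc ids>"
build_inverted_index() {
  awk -F '\t' '
    {
      # Lowercase the text and split it on anything that is not a letter or digit.
      n = split(tolower($2), words, /[^a-z0-9]+/)
      for (i = 1; i <= n; i++) {
        w = words[i]
        # Skip empty tokens and terms already recorded for this document.
        if (w == "" || ((w, $1) in seen)) continue
        seen[w, $1] = 1
        if (w in postings) postings[w] = postings[w] " " $1
        else postings[w] = $1
      }
    }
    END { for (w in postings) print w "\t" postings[w] }
  ' | sort
}

# Usage:
#   printf '1\tI love to go rock climbing\n2\tI like to collect rock albums\n' \
#     | build_inverted_index
```

Searching for a term then becomes a lookup of one row instead of a scan over every document. With the two sample documents in the usage comment, the row for `rock` lists documents 1 and 2.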
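2. Elasticsearch
A distributed search and analytics engine.
Elasticsearch is also written in Java and uses Lucene at its core for all indexing and search functionality, but its goal is to hide Lucene's complexity behind a simple RESTful API and so make full-text search easy.
Summary:
ES is essentially a shell wrapped around Lucene that simplifies Lucene's complex workflow. Elasticsearch is not a brand-new technology. It combines full-text search (Lucene), data analytics, and distributed systems techniques into one product, and that combination is what makes ES unique.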
2. Installation
1. Install Java
    yum install -y java-1.8.0-openjdk.x86_64
2. Download and install the package
    wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.6.0.rpm
    rpm -ivh elasticsearch-6.6.0.rpm
    warning: elasticsearch-6.6.0.rpm: Header V4 RSA/SHA512 Signature, key ID d88e42b4: NOKEY
    Preparing...                          ################################# [100%]
    Creating elasticsearch group... OK
    Creating elasticsearch user... OK
    Updating / installing...
       1:elasticsearch-0:6.6.0-1          ################################# [100%]
    ### NOT starting on installation, please execute the following statements to configure elasticsearch service to start automatically using systemd
     sudo systemctl daemon-reload
     sudo systemctl enable elasticsearch.service
    ### You can start elasticsearch service by executing
     sudo systemctl start elasticsearch.service
    Created elasticsearch keystore in /etc/elasticsearch

    ### Enable and start the service
    systemctl daemon-reload
    systemctl enable elasticsearch.service
    systemctl start elasticsearch.service
    systemctl status elasticsearch.service

    ### Check that it started successfully
    [ soft]# netstat -lntup |grep 9200
    tcp6       0      0 127.0.0.1:9200          :::*                    LISTEN      10317/java
    tcp6       0      0 ::1:9200                :::*                    LISTEN      10317/java
    [ soft]# curl 127.0.0.1:9200
    {
      "name" : "ixFqenL",
      "cluster_name" : "elasticsearch",
      "cluster_uuid" : "IOiVybhlTe6X6j1YUNOrMA",
      "version" : {
        "number" : "6.6.0",
        "build_flavor" : "default",
        "build_type" : "rpm",
        "build_hash" : "a9861f4",
        "build_date" : "2019-01-24T11:27:09.439740Z",
        "build_snapshot" : false,
        "lucene_version" : "7.6.0",
        "minimum_wire_compatibility_version" : "5.6.0",
        "minimum_index_compatibility_version" : "5.0.0"
      },
      "tagline" : "You Know, for Search"
    }
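Elasticsearch can take a while after `systemctl start` before port 9200 answers, so checking immediately may look like a failure. The following is a small sketch of a polling helper. The function name `wait_for_es` and the default of 30 attempts are assumptions for illustration, not part of Elasticsearch.

```shell
# Poll the HTTP endpoint until it answers or the attempts run out.
# $1: base URL (default http://127.0.0.1:9200)
# $2: number of attempts, one second apart (default 30, an arbitrary choice)
wait_for_es() {
  url="${1:-http://127.0.0.1:9200}"
  tries="${2:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # curl exits non-zero while the port is still closed.
    if curl -s --max-time 2 "$url" >/dev/null; then
      echo "Elasticsearch is answering at $url"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "Elasticsearch did not answer at $url after $tries attempts" >&2
  return 1
}

# Usage:
#   systemctl start elasticsearch.service && wait_for_es
```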
3. Configuration files
    rpm -ql elasticsearch    # list the files and directories installed by the elasticsearch package
    rpm -qc elasticsearch    # list all elasticsearch configuration files

    /etc/elasticsearch/elasticsearch.yml              # main configuration file
    /etc/elasticsearch/jvm.options                    # JVM configuration file
    /etc/init.d/elasticsearch                         # init startup script
    /etc/sysconfig/elasticsearch                      # environment variable configuration
    /usr/lib/sysctl.d/elasticsearch.conf              # sysctl settings (vm.max_map_count)
    /usr/lib/systemd/system/elasticsearch.service     # systemd unit file
    /var/lib/elasticsearch                            # data directory
    /var/log/elasticsearch                            # log directory
    /var/run/elasticsearch                            # PID directory

1. Edit the configuration file
    [ soft]# grep '^[a-zA-Z]' /etc/elasticsearch/elasticsearch.yml
    ## Node name
    node.name: node-1
    ## Data directory
    path.data: /data/elasticsearch
    ## Log directory
    path.logs: /var/log/elasticsearch
    ## Lock memory: reserve the heap up front so it cannot be swapped out
    bootstrap.memory_lock: true
    ## Network address; if unset it defaults to 127.0.0.1, which only allows local access
    network.host: 192.168.100.29
    ## Port, 9200 by default
    http.port: 9200

2. Check the minimum and maximum heap size
    [ soft]# vi /etc/elasticsearch/jvm.options
    -Xms1g
    -Xmx1g

Memory guidelines from the official documentation:
1. Do not set the heap above 32 GB.
2. Set the minimum and maximum heap to the same value.
3. Enable memory locking in the configuration file.
4. Leave at least 50% of the server's memory to the operating system.

3. Create the data directory
    mkdir -p /data/elasticsearch
    ## The package creates an elasticsearch user automatically, so the directory must be owned by that user; otherwise ES cannot write data
    chown -R elasticsearch:elasticsearch /data/elasticsearch/

4. Start the service
Startup fails at this point:
    tail -f /var/log/elasticsearch/elasticsearch.log
    [1]: memory locking requested for elasticsearch process but memory is not locked

The official fix:
    ### Edit the unit file, or create an override file
    Method 1: systemctl edit elasticsearch
    Method 2: vim /usr/lib/systemd/system/elasticsearch.service
    ### Add the following setting
    [Service]
    LimitMEMLOCK=infinity
    ### Restart
    systemctl daemon-reload
    systemctl restart elasticsearch

5. Configuration complete
    [ ~]# netstat -lntup |grep 9200
    tcp6       0      0 192.168.100.29:9200     :::*                    LISTEN      10848/java
    [ ~]# curl 192.168.100.29:9200
    {
      "name" : "node-1",
      "cluster_name" : "elasticsearch",
      "cluster_uuid" : "q04N04iKQ-KaU8_KPCLJBw",
      "version" : {
        "number" : "6.6.0",
        "build_flavor" : "default",
        "build_type" : "rpm",
        "build_hash" : "a9861f4",
        "build_date" : "2019-01-24T11:27:09.439740Z",
        "build_snapshot" : false,
        "lucene_version" : "7.6.0",
        "minimum_wire_compatibility_version" : "5.6.0",
        "minimum_index_compatibility_version" : "5.0.0"
      },
      "tagline" : "You Know, for Search"
    }
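The memory guidelines above can be turned into a quick calculation, and the memory lock can be verified once the node is up. The sketch below assumes a Linux host with `/proc/meminfo`. The function names are hypothetical, and the 31 GB cap is an assumption chosen to stay under the 32 GB limit. The `_nodes?filter_path=**.mlockall` request is the nodes info API filtered down to the memory lock flag.

```shell
# Suggest a JVM heap size in MB from total RAM in MB:
# half of RAM, capped at 31 GB (assumed cap, kept below the 32 GB limit).
suggest_heap_mb() {
  total_mb="$1"
  cap_mb=$((31 * 1024))
  half_mb=$((total_mb / 2))
  if [ "$half_mb" -gt "$cap_mb" ]; then
    echo "$cap_mb"
  else
    echo "$half_mb"
  fi
}

# Read total RAM in MB from /proc/meminfo (Linux only).
total_ram_mb() {
  awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo
}

# Ask the node whether memory locking took effect.
# $1: host:port (defaults to the address configured in this article)
check_memory_lock() {
  curl -s "http://${1:-192.168.100.29:9200}/_nodes?filter_path=**.mlockall&pretty"
}

# Usage:
#   heap=$(suggest_heap_mb "$(total_ram_mb)")
#   echo "-Xms${heap}m -Xmx${heap}m"    # same value for both, per guideline 2
#   check_memory_lock                   # "mlockall" should be true
```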
3. Installing the head plugin
    vi /etc/elasticsearch/elasticsearch.yml
    http.cors.enabled: true
    http.cors.allow-origin: "*"
Then load the head package into Google Chrome as a browser extension.
Elasticsearch    Database
=========================
Document         Row
Type             Table
Index            Database
Field            Column
Creating an index
    curl -XPUT '192.168.100.29:9200/vipinfo?pretty'

    curl -XPUT '192.168.100.29:9200/vipinfo/user/1?pretty' -H 'Content-Type: application/json' -d '
    {
      "first_name": "John",
      "last_name": "Smith",
      "age": 25,
      "about": "I love to go rock climbing",
      "interests": [ "sports", "music" ]
    }'

    curl -XPUT '192.168.100.29:9200/vipinfo/user/2?pretty' -H 'Content-Type: application/json' -d '
    {
      "first_name": "Jane",
      "last_name": "Smith",
      "age": 32,
      "about": "I like to collect rock albums",
      "interests": [ "music" ]
    }'

    curl -XPUT '192.168.100.29:9200/vipinfo/user/3?pretty' -H 'Content-Type: application/json' -d '
    {
      "first_name": "Douglas",
      "last_name": "Fir",
      "age": 35,
      "about": "I like to build cabinets",
      "interests": [ "forestry" ]
    }'
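Once the documents are indexed, they can be read back by id or found with a full-text query. The helpers below are a sketch. The function names and the `ES_HOST` variable are assumptions for illustration, and the JSON body is built by plain string interpolation, so the search text must not contain double quotes.

```shell
# Address of the node; defaults to the host used in this article.
ES_HOST="${ES_HOST:-192.168.100.29:9200}"

# Fetch one document by index, type and id.
es_get_doc() {
  # $1 index, $2 type, $3 id
  curl -s -XGET "http://${ES_HOST}/$1/$2/$3?pretty"
}

# Full-text search on a single field using a match query.
es_match() {
  # $1 index, $2 field, $3 search text (no double quotes inside)
  curl -s -XGET "http://${ES_HOST}/$1/_search?pretty" \
    -H 'Content-Type: application/json' \
    -d "{\"query\": {\"match\": {\"$2\": \"$3\"}}}"
}

# Usage:
#   es_get_doc vipinfo user 1
#   es_match vipinfo about "rock climbing"
```

With the three sample documents, the second call should return documents 1 and 2, since both contain the term `rock` in the `about` field.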
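The primary key (id) must be unique within an index. If no id is specified, Elasticsearch generates a random one.
Letting Elasticsearch generate the id improves indexing efficiency, because when an explicit id is given the engine must first check whether a document with that id already exists.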
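The two ways of assigning an id can be sketched as follows. The function names and the `ES_HOST` variable are assumptions for illustration. The first helper posts to the type without an id so that Elasticsearch generates one. The second uses the `_create` endpoint, which in 6.x rejects the request with a conflict if the id is already taken instead of overwriting the document.

```shell
# Address of the node; defaults to the host used in this article.
ES_HOST="${ES_HOST:-192.168.100.29:9200}"

# Index a document without an id: POST to /<index>/<type> and let
# Elasticsearch generate the _id.
es_index_auto_id() {
  # $1 index, $2 type, $3 JSON document
  curl -s -XPOST "http://${ES_HOST}/$1/$2?pretty" \
    -H 'Content-Type: application/json' \
    -d "$3"
}

# Create a document only if the id is free; an existing id yields a
# version conflict rather than an overwrite.
es_create_only() {
  # $1 index, $2 type, $3 id, $4 JSON document
  curl -s -XPUT "http://${ES_HOST}/$1/$2/$3/_create?pretty" \
    -H 'Content-Type: application/json' \
    -d "$4"
}

# Usage:
#   es_index_auto_id vipinfo user '{"first_name": "Ann", "age": 28}'
#   es_create_only vipinfo user 1 '{"first_name": "John"}'
```

The generated id is returned in the `_id` field of the response to the first call.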
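The other part of the data first needs clustering and classification, and the resulting categories are stored in a clustering index in the ES cluster. The aggregation results from the data-processing layer are written to a designated ES index, and the data related to each aggregated topic is stored under a field of the corresponding document.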