laravel-china.org Elastcisearch 的配置与使用,为了全文搜索 | Laravel China 社区 PHPHub 39 - 49 分钟
file
最近公司项目要使用全文搜索引擎,之前使用过的 sphInx ,似乎没有那么好用了,而且中文分词也没有合适的 ,所以准备换个其它的来试试,老项目使用的是 thinkphp 3.1 框架,虽然框架老了点。但是新的想法还是可以用上的,这里只是简单演示下 elasticsearch 的上手体难,实际项目中还需要完善。
Elasticsearch 安装
因本文环境为 Laradock ,所以直接使用 elasticsearch的镜像即可,这里省略了 java 环境的安装及 elasticsearch 软件的安装,网上教程很多,请自行查找,后期会补上一个,这里默认已经安装好了。 浏览器打开 http://localhost:9200/ 或者终端执行
curl ‘http:
你会看到如下响应
{ name: “g2ODObY”, cluster_name: “laradock-cluster”, cluster_uuid: “w8Hhov2bQDi_Wo2DEx044Q”, version: { number: “6.2.3”, build_hash: “c59ff00”, build_date: “2018-03-13T10:06:29.741383Z”, build_snapshot: false, lucene_version: “7.2.1”, minimum_wire_compatibility_version: “5.6.0”, minimum_index_compatibility_version: “5.0.0” }, tagline: “You Know, for Search” }
如果响应正常显示,说明你安装成功了。 这里需注意一点,查看elasticsearch配置:vi config/elasticsearch.yml
network.host: 0.0.0.0
Elasticsearch 中文插件
这里使用的是 analysis-ik 中文插件,项目地址,需根据不同的 Elasticsearch 版本选择插本版本,本项目使用的最新的 6.2.3 版本。 进入 elasticsearch 目录
./bin/elasticsearch-plugin install https:
查看是否安装成功(注意你的 elasticsearch 版本,版本不同命令不同)
./bin/elasticsearch-plugin list
返回结果
analysis-ik ingest-geoip ingest-user-agent …
看到 analysis-ik 证明你插件安装成功了,你也可以到 plugins 目录下查看插件是否存在
$ cd plugins
drwxr-xr-x 2 root root 4096 Apr 17 03:08 analysis-ik drwxrwxr-x 2 elasticsearch root 4096 Mar 13 11:35 ingest-geoip drwxrwxr-x 2 elasticsearch root 4096 Mar 13 11:35 ingest-user-agent drwxrwxr-x 11 elasticsearch root 4096 Mar 13 11:35 x-pack
Elasticsearch 索引的使用
Thinkphp 没有 laravel 那么方便,但是用个 trait 还是可以的。 新建一个traits
<?php
use Elasticsearch\ClientBuilder;
trait Elastic
{
private $client;
public function __construct()
{
$hosts=[
env(‘ELASTICSEARCH_URL’,’localhost:9200’)
];
$this->client = ClientBuilder::create()
->setHosts($hosts)
->build();
}
}
laravel 的 env 实现其它是用的 vlucas/phpdotenv,所以我在 Thinkphp中也把他拿了过来。 这里首先实例化一个 client。 创建索引
$params = [
'index' => $index,
'body' => [
'settings' => [
'number_of_shards' => 1,
'number_of_replicas' => 0
],
'mappings' => [
$type => [
'_all'=>[
'enabled' => 'false'
],
'_source' => [
'enabled' => true
],
'properties' => [
'id' => [
'type' => 'integer',
],
'title' => [
'type' => 'text',
"analyzer"=> "ik_max_word",
"search_analyzer"=> "ik_max_word",
],
'body' => [
'type' => 'text',
"analyzer"=> "ik_max_word",
"search_analyzer"=> "ik_max_word",
]
]
]
]
]
];
return $this->client->indices()->create($params);
这里需要注意的是 analyzer, IK插件目前只支持两种: ik_max_word 和ik_smart,
ik_max_word: 会将文本做最细粒度的拆分,比如会将“中华人民共和国国歌”拆分为“中华人民共和国,中华人民,中华,华人,人民共和国,人民,人,民,共和国,共和,和,国国,国歌”,会穷尽各种可能的组合;
ik_smart : 会做最粗粒度的拆分,比如会将“中华人民共和国国歌”拆分为“中华人民共和国,国歌”。
返回如下信息说明创建成功
array:3 [▼ “acknowledged” => true “shards_acknowledged” => true “index” => “my_index” ]
这里使用的 dd打印的结果,之后的结果承现一样用 dd打印。 删除索引
$params = [ ‘index’ => ‘my_index’, ]; return $this->client->indices()->delete($params);
返回如下
array:1 [▼ “acknowledged” => true ]
查看索引设置
$params = [‘index’ => ‘my_index’]; $response = $client->indices()->getSettings($params);
$params = [ ‘index’ => [ ‘my_index’, ‘my_index2’ ] ]; $response = $client->indices()->getSettings($params);
返回信息如下
array:1 [▼ “my_index” => array:1 [▼ “settings” => array:1 [▼ “index” => array:6 [▼ “creation_date” => “1524037463950” “number_of_shards” => “1” “number_of_replicas” => “0” “uuid” => “okYiWK0WRiqebMAHUCsvzA” “version” => array:1 [▼ “created” => “6020399” ] “provided_name” => “my_index” ] ] ] ]
查看 mapping 信息
$response = $client->indices()->getMapping();
$params = [‘index’ => ‘my_index’]; $response = $client->indices()->getMapping($params);
$params = [‘type’ => ‘my_type’ ]; $response = $client->indices()->getMapping($params);
$params = [ ‘index’ => ‘my_index’ ‘type’ => ‘my_type’ ]; $response = $client->indices()->getMapping($params);
$params = [ ‘index’ => [ ‘my_index’, ‘my_index2’ ] ]; $response = $client->indices()->getMapping($params);
返回如下代码
array:1 [▼ “my_index” => array:1 [▼ “mappings” => array:1 [▼ “my_type” => array:2 [▼ “_all” => array:1 [▼ “enabled” => false ] “properties” => array:3 [▼ “body” => array:2 [▼ “type” => “text” “analyzer” => “ik_max_word” ] “id” => array:1 [▼ “type” => “integer” ] “title” => array:2 [▼ “type” => “text” “analyzer” => “ik_max_word” ] ] ] ] ] ]
Elasticsearch 的增删改查 增加数据
1.增加单条数据
$data=[ ‘title’ => ‘我爱北京天安门’, ‘body’ => ‘天安门上太阳升’ ]; $params = [ ‘index’ => ‘my_index’, ‘type’ => ‘my_type’,
'body' => $data
];
return $this->client->index($params);
返回如下
array:8 [▼ “index” => “my_index” “_type” => “my_type” “_id” => “WKXd12IBwuLBOSSKe5k” “_version” => 1 “result” => “created” “_shards” => array:3 [▼ “total” => 2 “successful” => 1 “failed” => 0 ] “_seq_no” => 0 “_primary_term” => 1 ]
2.批量增加多条数据
$dataList =[ [ ‘id’ => ‘10001’, ‘title’ => ‘北京’, ‘body’ => ‘我们是首都’,
],[
'id' => '10002',
'title' => '上海',
'body' => '啊啦是上海人',
],[
'id' => '10003',
'title' => '广州',
'body' => '我们有小蛮腰',
],[
'id' => '10004',
'title' => '深圳',
'body' => '我们啥也没有,来了就是深圳人。',
],
];
foreach($dataList as $value){ $params[‘body’][] = [ ‘index’ => [ ‘_index’ => ‘my_index’, ‘_type’ => ‘my_type’, ‘_id’ =>$value[‘id’] ] ]; $params[‘body’][] = [ ‘id’ => $value[‘id’], ‘title’ => $value[‘title’], ‘body’ => $value[‘body’], ]; } return $this->client->bulk($params);
这里需注意,批量增加多条数据时并不是直接将数组扔进去,而是要进行处理,生成对应的数组后使用 bulk 方法批量创建。 返回如下
array:3 [▼ “took” => 27 “errors” => false “items” => array:4 [▼ 0 => array:1 [▼ “index” => array:9 [▼ “_index” => “my_index” “_type” => “my_type” “_id” => “10001” “_version” => 1 “result” => “created” “_shards” => array:3 [▼ “total” => 2 “successful” => 1 “failed” => 0 ] “_seq_no” => 1 “_primary_term” => 1 “status” => 201 ] ] 1 => array:1 [▼ “index” => array:9 [▼ “_index” => “my_index” “_type” => “my_type” “_id” => “10002” “_version” => 1 “result” => “created” “_shards” => array:3 [▼ “total” => 2 “successful” => 1 “failed” => 0 ] “_seq_no” => 2 “_primary_term” => 1 “status” => 201 ] ] …
这里使用了 $dataList 自带的 id,在实际项目中建议使用数据的 id,用做数据的唯一 id,方便通过 id 查询数据。 删除数据
删除文档只能单条删除,需指定数据 ID
$param = [ ‘index’ => ‘my_index’, ‘type’ => ‘my_type’, ‘id’ => ‘my_id’ ]; return $this->client->delete($param);
返回如下
array:1 [▼ “acknowledged” => true ]
查询数据
查询数据需指定数据 ID
$params = [ ‘index’ => ‘my_index’, ‘type’ => ‘my_type’, ‘id’ => ‘WKXd12IBwuLBOSSKe5k_’ ]; return $this->client->get($params);
返回如下
array:6 [▼ “index” => “my_index” “_type” => “my_type” “_id” => “WKXd12IBwuLBOSSKe5k” “_version” => 1 “found” => true “_source” => array:2 [▼ “title” => “我爱北京天安门” “body” => “天安门上太阳升” ] ]
数据修改
$params = [
‘index’ => ‘my_index’,
‘type’ => ‘my_type’,
‘id’ => ‘WKXd12IBwuLBOSSKe5k_’,
‘body’ => [
‘doc’ => [
‘age’ => 150
]
]
];
return $this->client->update($params);
返回如下
array:8 [▼ “index” => “my_index” “_type” => “my_type” “_id” => “WKXd12IBwuLBOSSKe5k” “_version” => 2 “result” => “updated” “_shards” => array:3 [▼ “total” => 2 “successful” => 1 “failed” => 0 ] “_seq_no” => 4 “_primary_term” => 1 ]
再次查询数据
array:6 [▼ “index” => “my_index” “_type” => “my_type” “_id” => “WKXd12IBwuLBOSSKe5k” “_version” => 2 “found” => true “_source” => array:3 [▼ “title” => “我爱北京天安门” “body” => “天安门上太阳升” “age” => 150 ] ]
发现返回数据中多了 age 字段, 修改成功。 搜索数据
先来个简单的.
$params = [
‘index’ => ‘my_index’,
‘type’ => ‘my_type’,
‘body’ => [
‘query’=>[
‘match’=>[
“title” => ‘北京’,
],
],
]
];
return $this->client->search($params);
返回如下
array:4 [▼ “took” => 188 “timed_out” => false “shards” => array:4 [▼ “total” => 5 “successful” => 5 “skipped” => 0 “failed” => 0 ] “hits” => array:3 [▼ “total” => 2 “max_score” => 1.6451461 “hits” => array:2 [▼ 0 => array:5 [▼ “_index” => “my_index” “_type” => “my_type” “_id” => “10001” “_score” => 1.6451461 “_source” => array:3 [▼ “id” => “10001” “title” => “北京” “body” => “我们是首都” ] ] 1 => array:5 [▼ “_index” => “my_index” “_type” => “my_type” “_id” => “WKXd12IBwuLBOSSKe5k” “_score” => 0.94175816 “_source” => array:3 [▼ “title” => “我爱北京天安门” “body” => “天安门上太阳升” “age” => 150 ] ] ] ] ]
可以看到,title 中包含北京 的数据已经全部返回了。
Elasticsearch 最重要的也是最灵活的就是搜索了,你能想到的方法基本上Elasticsearch 都已经帮你做好,比如: term,match,multi_match,range.prefix,wildcard,regexp.fuzzy,match_phrase,match_phrase_prefix,exists等等,了解具体方法的使用请参考:
Elasticsearch: 权威指南 » 深入搜索
Elasticsearch Reference » Query DSL
原文地址:http://www.qiehe.net/posts/4/the-use-and-configuration-of-elastcisearch-for-full-text-search
Good Good Study , Day Day Up!!
本文由 MeiLe 创作,采用 知识共享署名4.0 国际许可协议进行许可
本站文章除注明转载/出处外,均为本站原创或翻译,转载前请务必署名
最后编辑时间为:2018-01-15 22:06:42