es 分词器(五)之elasticsearch-analysis-jieba 8.7.0

2024-05-16 01:36

本文主要是介绍es 分词器(五)之elasticsearch-analysis-jieba 8.7.0,希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!

es 分词器(五)之elasticsearch-analysis-jieba 8.7.0

今天咱们就来讲一下es jieba 8.7.0 分词器的实现,以及8.x其它版本的实现方式,如果想直接使用es 结巴8.x版本,请直接修改pom文件的elasticsearch.version版本号即可,然后打包安装就行,不需要做太多的操作。

一、elasticsearch-jieba-plugin

最近更新的版本为8.4.1,最近更新的时间停留在2022年,从这之后便无人维护此开源项目
GitHub地址:​​https://github.com/sing1ee/elasticsearch-jieba-plugin/tree/8.4.1​​

二、elasticsearch-analysis-jieba

最近更新的版本为6.8.17,比上面的插件更惨,已经有三年无人维护了。
Github地址:​​https://github.com/huaban/elasticsearch-analysis-jieba/tree/dependabot/maven/org.elasticsearch-elasticsearch-6.8.17​​

三、决定换壳elasticsearch-jieba-plugin

当前我开发的项目采用的版本为8.7.0,目前在网上无法找到与之匹配的版本。
ik分词器用户比jieba分词器用户多,因为会对应的es版本不断更新,目前ik分词器的版本已经更新至8.12.2,2024年5月14日位置es的最新版本为8.14.x
2024年5月14日es最新版本为8.14.x

四、编译elasticsearch-analysis-jieba分词器

由于原有的插件【elasticsearch-analysis-jieba】已经很久没有人使用,但我又感觉【elasticsearch-analysis-jieba】这个名称比【elasticsearch-jieba-plugin】【https://github.com/sing1ee/elasticsearch-jieba-plugin/tree/8.4.1】这个好听一点,所以我本地新开了一个【elasticsearch-analysis-jieba】项目,将这个【elasticsearch-jieba-plugin】这个项目的代码复制到新建的项目中,因为这个【elasticsearch-jieba-plugin】使用的是gradle管理,我想使用的是maven仓库,所以修改了一下。

image-20240515221232488

4.1 新增pom.xml文件

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"><name>elasticsearch-analysis-jieba</name><modelVersion>4.0.0</modelVersion><groupId>org.elasticsearch</groupId><artifactId>elasticsearch-analysis-jieba</artifactId><version>${elasticsearch.version}</version><packaging>jar</packaging><description>jieba Analyzer for Elasticsearch</description><inceptionYear>2011</inceptionYear><properties><elasticsearch.version>8.7.0</elasticsearch.version><maven.compiler.target>17</maven.compiler.target><elasticsearch.assembly.descriptor>${project.basedir}/src/main/assemblies/plugin.xml</elasticsearch.assembly.descriptor><elasticsearch.plugin.name>analysis-jieba</elasticsearch.plugin.name><elasticsearch.plugin.classname>org.elasticsearch.plugin.analysis.jieba.AnalysisJiebaPlugin</elasticsearch.plugin.classname><elasticsearch.plugin.jvm>true</elasticsearch.plugin.jvm><tests.rest.load_packaged>false</tests.rest.load_packaged><skip.unit.tests>true</skip.unit.tests></properties><licenses><license><name>The Apache Software License, Version 2.0</name><url>http://www.apache.org/licenses/LICENSE-2.0.txt</url><distribution>repo</distribution></license></licenses><developers><developer><name>INFINI Labs</name><email>hello@infini.ltd</email><organization>INFINI Labs</organization><organizationUrl>https://infinilabs.com</organizationUrl></developer></developers><parent><groupId>org.sonatype.oss</groupId><artifactId>oss-parent</artifactId><version>9</version></parent><distributionManagement><snapshotRepository><id>oss.sonatype.org</id><url>https://oss.sonatype.org/content/repositories/snapshots</url></snapshotRepository><repository><id>oss.sonatype.org</id><url>https://oss.sonatype.org/service/local/staging/deploy/maven2/</url></repository></distributionManagement><repositories><repository><id>oss.sonatype.org</id><name>OSS Sonatype</name><releases><enabled>true</enabled></releases><snapshots><enabled>true</enabled></snapshots><url>https://oss.sonatype.org/content/repositories/releases/</url></repository></repositories><dependencies><dependency><groupId>org.elasticsearch</groupId><artifactId>elasticsearch</artifactId><version>${elasticsearch.version}</version><scope>compile</scope></dependency><dependency><groupId>com.huaban</groupId><artifactId>jieba-analysis</artifactId><version>1.0.2</version></dependency><dependency><groupId>org.apache.logging.log4j</groupId><artifactId>log4j-api</artifactId><version>2.19.0</version></dependency><dependency><groupId>junit</groupId><artifactId>junit</artifactId><version>4.13.2</version><scope>test</scope></dependency></dependencies><build><plugins><plugin><groupId>org.apache.maven.plugins</groupId><artifactId>maven-compiler-plugin</artifactId><version>3.5.1</version><configuration><source>${maven.compiler.target}</source><target>${maven.compiler.target}</target></configuration></plugin><plugin><groupId>org.apache.maven.plugins</groupId><artifactId>maven-surefire-plugin</artifactId><version>2.11</version><configuration><includes><include>**/*Tests.java</include></includes></configuration></plugin><plugin><groupId>org.apache.maven.plugins</groupId><artifactId>maven-source-plugin</artifactId><version>2.1.2</version><executions><execution><id>attach-sources</id><goals><goal>jar</goal></goals></execution></executions></plugin><plugin><artifactId>maven-assembly-plugin</artifactId><configuration><appendAssemblyId>false</appendAssemblyId><outputDirectory>${project.build.directory}/releases/</outputDirectory><descriptors><descriptor>${basedir}/src/main/assemblies/plugin.xml</descriptor></descriptors><archive><manifest><mainClass>fully.qualified.MainClass</mainClass></manifest></archive></configuration><executions><execution><phase>package</phase><goals><goal>single</goal></goals></execution></executions></plugin></plugins></build><profiles><profile><id>disable-java8-doclint</id><activation><jdk>[1.8,)</jdk></activation><properties><additionalparam>-Xdoclint:none</additionalparam></properties></profile><profile><id>release</id><build><plugins><plugin><groupId>org.sonatype.plugins</groupId><artifactId>nexus-staging-maven-plugin</artifactId><version>1.6.3</version><extensions>true</extensions><configuration><serverId>oss</serverId><nexusUrl>https://oss.sonatype.org/</nexusUrl><autoReleaseAfterClose>true</autoReleaseAfterClose></configuration></plugin><plugin><groupId>org.apache.maven.plugins</groupId><artifactId>maven-release-plugin</artifactId><version>2.1</version><configuration><autoVersionSubmodules>true</autoVersionSubmodules><useReleaseProfile>false</useReleaseProfile><releaseProfiles>release</releaseProfiles><goals>deploy</goals></configuration></plugin><plugin><groupId>org.apache.maven.plugins</groupId><artifactId>maven-compiler-plugin</artifactId><version>3.5.1</version><configuration><source>${maven.compiler.target}</source><target>${maven.compiler.target}</target></configuration></plugin><plugin><groupId>org.apache.maven.plugins</groupId><artifactId>maven-gpg-plugin</artifactId><version>1.5</version><executions><execution><id>sign-artifacts</id><phase>verify</phase><goals><goal>sign</goal></goals></execution></executions></plugin><plugin><groupId>org.apache.maven.plugins</groupId><artifactId>maven-source-plugin</artifactId><version>2.2.1</version><executions><execution><id>attach-sources</id><goals><goal>jar-no-fork</goal></goals></execution></executions></plugin><plugin><groupId>org.apache.maven.plugins</groupId><artifactId>maven-javadoc-plugin</artifactId><version>2.9</version><executions><execution><id>attach-javadocs</id><goals><goal>jar</goal></goals></execution></executions></plugin></plugins></build></profile></profiles>
</project>

4.2 修改plugin-descriptor.properties文件

# Elasticsearch plugin descriptor file
# This file must exist as 'plugin-descriptor.properties' at
# the root directory of all plugins.
#
# A plugin can be 'site', 'jvm', or both.
#
### example site plugin for "foo":
#
# foo.zip <-- zip file for the plugin, with this structure:
#   _site/ <-- the contents that will be served
#   plugin-descriptor.properties <-- example contents below:
#
# site=true
# description=My cool plugin
# version=1.0
#
### example jvm plugin for "foo"
#
# foo.zip <-- zip file for the plugin, with this structure:
#   <arbitrary name1>.jar <-- classes, resources, dependencies
#   <arbitrary nameN>.jar <-- any number of jars
#   plugin-descriptor.properties <-- example contents below:
#
# jvm=true
# classname=foo.bar.BazPlugin
# description=My cool plugin
# version=2.0.0-rc1
# elasticsearch.version=2.0
# java.version=1.7
#
### mandatory elements for all plugins:
#
# 'description': simple summary of the plugin
description=${project.description}
#
# 'version': plugin's version
version=${project.version}
#
# 'name': the plugin name
name=${elasticsearch.plugin.name}
#
# 'classname': the name of the class to load, fully-qualified.
classname=${elasticsearch.plugin.classname}
#
# 'java.version' version of java the code is built against
# use the system property java.specification.version
# version string must be a sequence of nonnegative decimal integers
# separated by "."'s and may have leading zeros
java.version=${maven.compiler.target}
#
# 'elasticsearch.version' version of elasticsearch compiled against
# You will have to release a new version of the plugin for each new
# elasticsearch release. This version is checked when the plugin
# is loaded so Elasticsearch will refuse to start in the presence of
# plugins with the incorrect elasticsearch.version.
elasticsearch.version=${elasticsearch.version}

4.3 新增plugin-security.policy文件

grant {// needed because of the hot reload functionalitypermission java.net.SocketPermission "*", "connect,resolve";permission java.lang.RuntimePermission "setContextClassLoader";
};

4.4 构建插件

打包

image-20240515221635642

找到打包之后的zip包

image-20240515221711379

放到elasticsearch-8.7.0/plugin/analysis-jieba目录下。

image-20240515221917297

现在,再手动重启一下es就将elasticsearch-analysis-jieba分词器安装好啦。

五、测试jieba分词器

在kibana中创建索引

PUT jieba_index
{"settings": {"analysis": {"analyzer": {"my_ana": {"tokenizer": "jieba_index","filter": ["lowercase"]}}}}
}

文本分词器

PUT jieba_index/_analyze
{"analyzer" : "my_ana","text" : "黄河之水天上来"
}

返回结果

{"tokens": [{"token": "黄河","start_offset": 0,"end_offset": 2,"type": "word","position": 0},{"token": "黄河之水天上来","start_offset": 0,"end_offset": 7,"type": "word","position": 0},{"token": "之水","start_offset": 2,"end_offset": 4,"type": "word","position": 1},{"token": "天上","start_offset": 4,"end_offset": 6,"type": "word","position": 2},{"token": "上来","start_offset": 5,"end_offset": 7,"type": "word","position": 2}]
}

这篇关于es 分词器(五)之elasticsearch-analysis-jieba 8.7.0的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!



http://www.chinasem.cn/article/993502

相关文章

SpringBoot2.0.4整合elasticsearch为5.6.10

网上找了很一些,很多跑不起来,可能是我的环境和介绍的环境不一样,自己搞重新搞一下! 环境说明: spring boot 使用2.0.4elasticsearch为5.6.10本地安装ES集群为 6.x版本 第一步 使用IDEA创建Spring boot web项目,使用spring boot 使用2.0.4版本, elasticsearch为5.6.10 <parent><groupI

springboot 整合elasticsearch 跨版本处理方式

场景 springboot 在整合elasticsearch 通常会受到elasticsearch 版本的影响, 1.一套代码可能要兼容多个elasticsearch 的情况 2.低版本的无法调用高版本一些方法(例如:delete_by_query) 实现逻辑 用高版本的去做参数封装,用restTemplate 去发送请求(草台战法) 实现代码 依赖 <dependency><gr

Elasticsearch:向量相似度技术和评分

作者:来自 Elastic Valentin Crettaz 当需要搜索自由文本并且 Ctrl+F / Cmd+F 不再有效时,使用词法搜索引擎通常是你想到的下一个合理选择。 词汇搜索引擎擅长分析要搜索的文本并将其标记为可在搜索时匹配的术语,但在理解和理解被索引和搜索的文本的真正含义时通常会表现不佳。 这正是向量搜索引擎的闪光点。 他们可以对同一文本进行索引,以便可以根据它所代表的含义及其与具

elasticsearch资料记录

elasticsearch官网python elasticsearch APIelasticsearch 中文权威指南elasticsearch中文社区elasticsearch jieba pluginelasticsearch ik plugin

使用filebeat采集文件到es中

继上一篇logstash采集日志。 之前用logstash做日志采集,但是发现logstash很占用机器资源导致机器运行有点慢。查询资料表明logstash使用Java编写,插件是使用jruby编写,对机器的资源要求会比较高,网上有一篇关于其性能测试的报告。之前做过和filebeat的测试对比。在采集日志方面,对CPU,内存上都要比前者高很多。那么果断使用filebeat作为替代方案。走起!

jiebaNET中文分词器

最近我接手了一个有趣的需求,需要对用户评价进行分词,进行词频统计和情绪分析,并且根据词频权重制成词云图以供后台数据统计,于是我便引入了jieba分词器,但是我发现网上关于jiebaNET相关文档实在太少了,甚至连配置文件都很难搞清楚,所以我来大家介绍一下jiebaNET。 jiebaNET分词器介绍: ieba.NET分词器是一款基于.NET平台的中文分词工具,它借鉴了jieba分词器的算

ES 数据写入方式:直连 VS Flink 集成系统

ES 作为一个分布式搜索引擎,从扩展能力和搜索特性上而言无出其右,然而它有自身的弱势存在,其作为近实时存储系统,由于其分片和复制的设计原理,也使其在数据延迟和一致性方面都是无法和 OLTP(Online Transaction Processing)系统相媲美的。 也正因如此,通常它的数据都来源于其他存储系统同步而来,做二次过滤和分析的。这就引入了一个关键节点,即 ES 数据的同步写入方式,本文

Elasticsearch 8.1官网文档梳理 - 十四、Query DSL(ES 查询语法)

Query DSL Elasticsearch 提供了一种基于JSON 的查询 DSL (Domain Specific Language) 来定义查询。可以把查询 DSL 看作是查询的 AST(Abstract Syntax Tree),由两种类型的子句组成: 叶子节点查询: 叶子查询子句在特定字段中查找特定值,例如 match,term,range 查询。这类查询可以单独使用 复合查询子句

Elasticsearch:利用Redis缓存解决ES查询延迟问题

因为ES的近时效性,所以insert或update es的数据的时候短时间可能查询不到(1s左右) 在es中新增的document会被收集到indexing buffer区后被重写成一个segment然后直接写入filesystem cache中,这个操作是非常轻量级的,相对耗时较少,之后经过一定的间隔或外部触发后才会被flush到磁盘上,这个操作非常耗时。但只要sengment文件被写入

ElasticSearch java API - 聚合查询

以球员信息为例,player索引的player type包含5个字段,姓名,年龄,薪水,球队,场上位置。 index的mapping为: "mappings": {"player": {"properties": {"name": {"index": "not_analyzed","type": "string"},"age": {"type": "integer"},"salary":