The SPECIALIST Lexicon API

2023-10-28 03:18

Using the SPECIALIST Lexicon Java API

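An affix is classified by position as a prefix or a suffix,
and by function as inflectional or derivational.
Derivation can use either a prefix or a suffix: adding a suffix to happy gives happily, and adding a prefix gives unhappy.
Inflection adds affixes only at the end of a word to mark tense, number, case, and so on, as in ask, asks, asking, asked.

Derivation produces a derived word and can change the part of speech and the meaning.

Inflection is a grammatical change only.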

LvgCmdApi

Documentation for all flow components: lvg2021/docs/designDoc/UDF/flow/index.html

-f:a

Expand acronyms and abbreviations.

-f:b

Uninflect a term, reducing each word to its base form.

It can turn plural nouns into singular nouns, inflected verbs into their infinitive forms, and adjectives and adverbs into their positive forms. It does not convert words into nouns.

-f:An 

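Anti-Normalize (Approximate Match)

Takes a normalized term as input and returns the corresponding un-normalized terms from the Lexicon. Because it finds approximate matches in the Lexicon, it can serve as a basic approximate match and can be used to map non-standard terms.

Results are sorted alphabetically, then by EUI, category, and inflection.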

String outputFromLvg = null;
LvgCmdApi lvgApi = new LvgCmdApi("-f:An", "D:/lvg2021/data/config/lvg.properties");
// ---------------------------------
// process each term
// ---------------------------------
outputFromLvg = lvgApi.MutateToString("term");

-f:d

Generate derivational variants.

Derivation rules documentation: lvg2021/docs/designDoc/UDF/derivations/index.html

Derivational variants are generated from FACTs (a pre-computed derivational table) and morphology rules (RULEs). FACTs are stored in a database and retrieved by SQL query. RULEs are stored in and retrieved from a trie.
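To make the RULE lookup more concrete, here is a minimal sketch of suffix rules stored in a trie. It is not LVG's implementation: the class name `SuffixRuleTrie`, its methods, and the two sample rules are all hypothetical and only illustrate how a trie keyed on the reversed suffix lets a lookup walk a word from its last character.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: derivational suffix rules stored in a trie keyed on
// the reversed suffix, so a lookup walks the word from its last character.
class SuffixRuleTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        String fromSuffix; // set only on nodes that end a rule
        String toSuffix;
    }

    private final Node root = new Node();

    // Register a rule "replace fromSuffix with toSuffix".
    void addRule(String fromSuffix, String toSuffix) {
        Node node = root;
        for (int i = fromSuffix.length() - 1; i >= 0; i--) {
            node = node.children.computeIfAbsent(fromSuffix.charAt(i), c -> new Node());
        }
        node.fromSuffix = fromSuffix;
        node.toSuffix = toSuffix;
    }

    // Apply every rule whose suffix matches the end of the word.
    List<String> derive(String word) {
        List<String> results = new ArrayList<>();
        Node node = root;
        for (int i = word.length() - 1; i >= 0; i--) {
            node = node.children.get(word.charAt(i));
            if (node == null) {
                break; // no stored suffix continues with this character
            }
            if (node.fromSuffix != null) {
                String stem = word.substring(0, word.length() - node.fromSuffix.length());
                results.add(stem + node.toSuffix);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        SuffixRuleTrie trie = new SuffixRuleTrie();
        trie.addRule("ily", "y");  // illustrative rule: happily -> happy
        trie.addRule("ness", "");  // illustrative rule: kindness -> kind
        System.out.println(trie.derive("happily"));
        System.out.println(trie.derive("kindness"));
    }
}
```

The cost of a lookup depends on the length of the word rather than on the number of stored rules, which is the usual reason for choosing a trie for this kind of table.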

-f:dc~NUMBER

Specifies, as a number, the category (part of speech) of the derivations to generate.

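| Category | Value |
|----------|-------|
| adj      | 1     |
| adv      | 2     |
| aux      | 4     |
| compl    | 8     |
| conj     | 16    |
| det      | 32    |
| modal    | 64    |
| noun     | 128   |
| prep     | 256   |
| pron     | 512   |
| verb     | 1024  |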

// w is the input word and derivword is a collection of Strings; both are defined elsewhere
String outputFromLvg = null;
LvgCmdApi lvgApi = new LvgCmdApi("-f:dc~128", "D:/lvg2021/data/config/lvg.properties");
outputFromLvg = lvgApi.MutateToString(w);
if (outputFromLvg.length() > 0) {
    // each output line is pipe-delimited; field 2 is the output term
    for (String out : outputFromLvg.split("\n")) {
        derivword.add(out.split("\\|")[1]);
    }
}

-f:d -kdt:STR

Restricts the derivation type.

  • Z (zeroD): restricts the output to zero derivations of the input (the form does not change).
  • S (suffixD): restricts the output to suffix derivations of the input.
  • P (prefixD): restricts the output to prefix derivations of the input.
  • ZS (zeroD and suffixD): restricts the output to zero and suffix derivations of the input. This is one of the most used options with query expansion for CUI mapping.
  • ZP (zeroD and prefixD): restricts the output to zero and prefix derivations of the input.
  • SP (suffixD and prefixD): restricts the output to suffix and prefix derivations of the input.
  • ZSP (all): no restriction on derivation type. All zeroD (Z), suffixD (S), and prefixD (P) derivations are displayed. This is the default option.

-f:f

Filter output to contain only forms from the Lexicon.

Forms that are not in the Lexicon are filtered out, and only one record is returned.

Inflection output filter: -k:i:1

Derivation output filter: -k:d:1

-f:i

Generate inflectional variants.

-f:Ln

Retrieve a word's category (part of speech) and inflection information from the database.

-f:nom

Retrieve nominalization forms for an input term, that is, the noun forms related to it.

-f:N3

Equivalent to LuiNorm?

Normalize non-ASCII Unicode characters to ASCII, remove genitives, remove parenthetic plural forms, replace punctuation with spaces, remove stop words, lowercase, uninflect each word, map each uninflected word to its canonical form, strip or map any remaining non-ASCII Unicode characters to ASCII, and finally sort the words.

-f:r

Recursively generate synonyms.

Norm API

See lvg2021/docs/userDoc/examples/norm.html

Equivalent to the flow -f:q0:g:rs:o:t:l:B:Ct:q7:q8:w

  1. q0: map Unicode symbols and punctuation to ASCII
  2. g: remove genitives,
  3. rs: then remove parenthetic plural forms of (s), (es), (ies), (S), (ES), and (IES),
  4. o: then replace punctuation with spaces,
  5. t: then remove stop words,
  6. l: then lowercase,
  7. B: then uninflect each word,
  8. Ct: then get citation form for each base form,
  9. q7: then Unicode Core Norm
  10. q8: then strip or map non-ASCII Unicode characters,
  11. w: and finally sort the words in alphabetic order.
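The steps that are plain string manipulation can be approximated without the Lexicon. The sketch below is a toy, not NormApi: it covers only g, rs, o, t, l, and w, and skips q0, B, Ct, q7, and q8 because they need LVG's Unicode tables and the Lexicon. The class name `ToyNorm`, the regular expressions, and the stop-word list are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy approximation of the string-only Norm steps (g, rs, o, t, l, w).
// Steps that need the Lexicon or Unicode tables (q0, B, Ct, q7, q8) are skipped.
class ToyNorm {
    // Assumed stop-word list for illustration; check the LVG docs for the real one.
    private static final Set<String> STOP_WORDS =
        Set.of("of", "and", "with", "for", "nos", "to", "in", "by", "on", "the");

    static String normalize(String term) {
        String s = term;
        // g: remove genitives ('s, and the apostrophe in s')
        s = s.replaceAll("'[sS]\\b", "").replaceAll("(?<=[sS])'(?=\\s|$)", "");
        // rs: remove parenthetic plural forms (s), (es), (ies) in either case
        s = s.replaceAll("(?i)\\((s|es|ies)\\)", "");
        // o: replace punctuation with spaces
        s = s.replaceAll("\\p{Punct}", " ");
        List<String> words = new ArrayList<>();
        for (String word : s.trim().split("\\s+")) {
            // l: lowercase (applied per word so the stop-word check ignores case)
            String lower = word.toLowerCase(Locale.ROOT);
            // t: remove stop words
            if (!lower.isEmpty() && !STOP_WORDS.contains(lower)) {
                words.add(lower);
            }
        }
        // w: sort the words in alphabetic order
        Collections.sort(words);
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        System.out.println(normalize("Hodgkin's Disease(s), NOS"));
    }
}
```

Because uninflection and citation-form mapping are left out, this sketch never changes a word's spelling; any output that differs from the input words in the real Norm flow comes from the B and Ct steps.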

The resulting words may not exist in the Lexicon.

For example, "right" becomes "ride" after norm.

import java.util.*;
import gov.nih.nlm.nls.lvg.Api.*;

public class Normalization
{
    // test driver
    public static void main(String[] args)
    {
        // instantiate a NormApi object by config file
        String lvgConfigFile = "/export/home/lu/Projects/LVG/lvg2012/data/config/lvg.properties";
        NormApi normApi = new NormApi(lvgConfigFile);

        // the term to normalize
        String in = "left";
        try
        {
            // typed Vector so the for-each loop over Strings compiles
            Vector<String> outs = normApi.Mutate(in);
            // print out the result
            for (String out : outs)
            {
                System.out.println(in + "|" + out);
            }
            // clean up
            normApi.CleanUp();
        }
        catch (Exception e)
        {
            System.err.println("** ERR: " + e.toString());
        }
    }
}

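Output format

| Field 1 | Field 2     | Field 3    | Field 4     | Field 5      | Field 6     | Field 7+               |
|---------|-------------|------------|-------------|--------------|-------------|------------------------|
| Input   | Output Term | Categories | Inflections | Flow History | Flow Number | Additional Information |

output term: the transformed term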
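Since every output line follows this pipe-delimited layout, it can be split into named fields instead of being indexed by position. The following is a minimal sketch: the class `LvgRecord` is hypothetical (it is not part of the LVG API), it reads only the first six fields, and the sample line in `main` is hand-written for illustration rather than captured from a real run.

```java
// Hypothetical holder for the first six fields of one LVG output line.
// Fields 7+ (additional information) are ignored in this sketch.
class LvgRecord {
    final String input;
    final String outputTerm;
    final long categories;
    final long inflections;
    final String flowHistory;
    final int flowNumber;

    private LvgRecord(String[] f) {
        input = f[0];
        outputTerm = f[1];
        categories = Long.parseLong(f[2]);
        inflections = Long.parseLong(f[3]);
        flowHistory = f[4];
        flowNumber = Integer.parseInt(f[5]);
    }

    // Split on '|'; the -1 limit keeps trailing empty fields.
    static LvgRecord parse(String line) {
        String[] fields = line.split("\\|", -1);
        if (fields.length < 6) {
            throw new IllegalArgumentException("expected at least 6 fields: " + line);
        }
        return new LvgRecord(fields);
    }

    public static void main(String[] args) {
        // Hand-written sample line, not real LVG output.
        LvgRecord r = LvgRecord.parse("dogs|dog|128|1|b|1|");
        System.out.println(r.outputTerm + " (category " + r.categories + ")");
    }
}
```

A line that does not have the expected layout is rejected with an exception rather than silently producing a wrong field, which is easier to debug than an out-of-range array index.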

categories:

| Bit | Value | Variant | Other Symbols         | Example                                   |
|-----|-------|---------|-----------------------|-------------------------------------------|
| 0   | 1     | adj     | adjective, ADJ        | red                                       |
| 1   | 2     | adv     | adverb, ADV           | quickly                                   |
| 2   | 4     | aux     | auxiliary             | be, is, are, do, have, has                |
| 3   | 8     | compl   | complementizer        | that                                      |
| 4   | 16    | conj    | conjunction, CON, con | and, or, but                              |
| 5   | 32    | det     | determiner, DET       | a, the, some, each                        |
| 6   | 64    | modal   |                       | can, dare, may, must, ought, shall, will  |
| 7   | 128   | noun    | NOM, NPR              | dog                                       |
| 8   | 256   | prep    | preposition, PRE, pre | to, on, in, at, by                        |
| 9   | 512   | pron    | pronoun               | it, he, they                              |
| 10  | 1024  | verb    | VER, ver              | break                                     |
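Each category occupies one bit, so a categories value can be read as a bit mask. The sketch below assumes that a value covering several categories is the sum of the individual values, as the Bit column suggests; the class `LvgCategory` is a hypothetical helper, not an LVG class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that decodes a categories value into category names.
class LvgCategory {
    // The array index is the bit number from the categories table.
    private static final String[] NAMES = {
        "adj", "adv", "aux", "compl", "conj", "det",
        "modal", "noun", "prep", "pron", "verb"
    };

    static List<String> decode(long value) {
        List<String> names = new ArrayList<>();
        for (int bit = 0; bit < NAMES.length; bit++) {
            // test whether this category's bit is set in the value
            if ((value & (1L << bit)) != 0) {
                names.add(NAMES[bit]);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(decode(128));   // a single category
        System.out.println(decode(1152));  // 128 + 1024
    }
}
```

Under the same assumption, 2047 (the sum of all eleven values) stands for every category at once.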

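inflection:

| Bit | Value   | Variant                                  | Other Symbols | Example                                |
|-----|---------|------------------------------------------|---------------|----------------------------------------|
| 0   | 1       | base                                     |               | dog, break, red, quickly               |
| 1   | 2       | comparative                              |               | redder                                 |
| 2   | 4       | superlative                              |               | reddest                                |
| 3   | 8       | plural                                   | p             | dogs                                   |
| 4   | 16      | presPart (present participle)            | ing           | breaking                               |
| 5   | 32      | past (past tense)                        |               | broke                                  |
| 6   | 64      | pastPart (past participle)               |               | broken                                 |
| 7   | 128     | pres3s (third-person singular present)   |               | breaks                                 |
| 8   | 256     | positive                                 |               | red                                    |
| 9   | 512     | singular                                 | s             | dog                                    |
| 10  | 1024    | infinitive                               | inf           | break                                  |
| 11  | 2048    | pres123p                                 |               | break                                  |
| 12  | 4096    | pastNeg                                  |               | didn't, couldn't, wouldn't, shouldn't  |
| 13  | 8192    | pres123pNeg                              |               | don't, won't                           |
| 14  | 16384   | pres1s                                   |               | am                                     |
| 15  | 32768   | past1p23pNeg                             |               | weren't                                |
| 16  | 65536   | past1p23p                                |               | were                                   |
| 17  | 131072  | past1s3sNeg                              |               | wasn't                                 |
| 18  | 262144  | pres1p23p                                |               | are                                    |
| 19  | 524288  | pres1p23pNeg                             |               | aren't                                 |
| 20  | 1048576 | past1s3s                                 |               | was                                    |
| 21  | 2097152 | pres                                     |               | can                                    |
| 22  | 4194304 | pres3sNeg                                |               | isn't, hasn't                          |
| 23  | 8388608 | presNeg                                  |               | can't, cannot                          |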

where:

  • pres: present
  • past: past
  • Part: participle
  • 1: first person
  • 2: second person
  • 3: third person
  • s: singular
  • p: plural
  • Neg: negative
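The inflection values work the same way in the other direction: variant names can be combined into a single number. This sketch assumes that several inflections are combined by setting their bits; `LvgInflection` and its methods are hypothetical names, not part of LVG.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that maps inflection names to their bit values.
class LvgInflection {
    // The list index is the bit number from the inflection table (0 to 23).
    private static final List<String> NAMES = Arrays.asList(
        "base", "comparative", "superlative", "plural", "presPart", "past",
        "pastPart", "pres3s", "positive", "singular", "infinitive", "pres123p",
        "pastNeg", "pres123pNeg", "pres1s", "past1p23pNeg", "past1p23p",
        "past1s3sNeg", "pres1p23p", "pres1p23pNeg", "past1s3s", "pres",
        "pres3sNeg", "presNeg");

    // Value of a single inflection, e.g. "plural" -> 8.
    static long valueOf(String name) {
        int bit = NAMES.indexOf(name);
        if (bit < 0) {
            throw new IllegalArgumentException("unknown inflection: " + name);
        }
        return 1L << bit;
    }

    // Combine several inflections into one value by OR-ing their bits.
    static long combine(String... names) {
        long value = 0;
        for (String name : names) {
            value |= valueOf(name);
        }
        return value;
    }

    // Check whether a combined value includes the named inflection.
    static boolean has(long value, String name) {
        return (value & valueOf(name)) != 0;
    }

    public static void main(String[] args) {
        long v = combine("base", "infinitive", "pres123p");
        System.out.println(v);
        System.out.println(has(v, "plural"));
    }
}
```

Here `combine("base", "infinitive", "pres123p")` yields 1 + 1024 + 2048 = 3073, the three rows whose example in the table is "break".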

additional information: displayed with the -m option

Sub-Term Mapping Tools (STMT)

Sub-Term Mapping Tools (nih.gov)

LexItem Sub-Term Finder (LSF):

  • to find whether a term is in the Lexicon
  • to find all sub-terms that are in the Lexicon
  • to find all prefix sub-terms that are in the Lexicon
  • to find the longest prefix sub-term in the Lexicon
// check whether the word exists in the corpus
LsfApi lsfApi = new LsfApi("D:/stmt2015/data/config/lsf.properties");
String isincorpus = lsfApi.CheckInCorpus("alis");
// prefixes? It does not seem to identify prefixes for a single word
Vector<String> prefixes = lsfApi.FindPrefixes("cricoarytenoid");
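
The observation about single words would make sense if prefix sub-terms are built from whole words rather than characters. The sketch below illustrates that reading with a toy in-memory lexicon. It is an assumption about the behavior, not the STMT implementation, and `ToySubTermFinder`, its methods, and the sample entries are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy word-level prefix sub-term finder over an in-memory lexicon.
class ToySubTermFinder {
    private final Set<String> lexicon; // entries are expected in lowercase

    ToySubTermFinder(Set<String> lexicon) {
        this.lexicon = lexicon;
    }

    // Is the whole term in the lexicon?
    boolean contains(String term) {
        return lexicon.contains(term.trim().toLowerCase(Locale.ROOT));
    }

    // All prefix sub-terms (first word, first two words, ...) found in the lexicon.
    List<String> findPrefixes(String term) {
        List<String> found = new ArrayList<>();
        StringBuilder prefix = new StringBuilder();
        for (String word : term.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (prefix.length() > 0) {
                prefix.append(' ');
            }
            prefix.append(word);
            if (lexicon.contains(prefix.toString())) {
                found.add(prefix.toString());
            }
        }
        return found;
    }

    // Longest prefix sub-term in the lexicon, or null if there is none.
    String findLongestPrefix(String term) {
        List<String> found = findPrefixes(term);
        return found.isEmpty() ? null : found.get(found.size() - 1);
    }

    public static void main(String[] args) {
        ToySubTermFinder finder =
            new ToySubTermFinder(Set.of("heart", "heart attack", "attack"));
        System.out.println(finder.findPrefixes("heart attack risk"));
        System.out.println(finder.findLongestPrefix("heart attack risk"));
    }
}
```

With this word-level definition, a single-word input can only ever match itself, so no shorter prefix is reported for it.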

 




http://www.chinasem.cn/article/290321
