Lucene5学习之SpanQuery跨度查询

2023-11-02 01:48

本文主要是介绍Lucene5学习之SpanQuery跨度查询,希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!

    SpanQuery下的子类有好几个,我就放一篇里集中说说。SpanQuery即跨度查询,首先要理解跨度这个概念,Lucene里跨度是用Spans这个类定义的,源码如下:

 

/** Expert: an enumeration of span matches.  Used to implement span searching.* Each span represents a range of term positions within a document.  Matches* are enumerated in order, by increasing document number, within that by* increasing start position and finally by increasing end position. */
public abstract class Spans {/** Move to the next match, returning true iff any such exists. */public abstract boolean next() throws IOException;/** Skips to the first match beyond the current, whose document number is* greater than or equal to <i>target</i>.* <p>The behavior of this method is <b>undefined</b> when called with* <code> target &le; current</code>, or after the iterator has exhausted.* Both cases may result in unpredicted behavior.* <p>Returns true iff there is such* a match.  <p>Behaves as if written: <pre class="prettyprint">*   boolean skipTo(int target) {*     do {*       if (!next())*         return false;*     } while (target > doc());*     return true;*   }* </pre>* Most implementations are considerably more efficient than that.*/public abstract boolean skipTo(int target) throws IOException;/** Returns the document number of the current match.  Initially invalid. */public abstract int doc();/** Returns the start position of the current match.  Initially invalid. */public abstract int start();/** Returns the end position of the current match.  Initially invalid. */public abstract int end();/*** Returns the payload data for the current span.* This is invalid until {@link #next()} is called for* the first time.* This method must not be called more than once after each call* of {@link #next()}. However, most payloads are loaded lazily,* so if the payload data for the current position is not needed,* this method may not be called at all for performance reasons. An ordered* SpanQuery does not lazy load, so if you have payloads in your index and* you do not want ordered SpanNearQuerys to collect payloads, you can* disable collection with a constructor option.<br>* <br>* Note that the return type is a collection, thus the ordering should not be relied upon.* <br/>* @lucene.experimental** @return a List of byte arrays containing the data of this payload, otherwise null if isPayloadAvailable is false* @throws IOException if there is a low-level I/O error*/// TODO: Remove warning after API has been finalizedpublic abstract Collection<byte[]> getPayload() throws IOException;/*** Checks if a payload can be loaded at this position.* <p/>* Payloads can only be loaded once per call to* {@link #next()}.** @return true if there is a payload available at this position that can be loaded*/public abstract boolean isPayloadAvailable() throws IOException;/*** Returns the estimated cost of this spans.* <p>* This is generally an upper bound of the number of documents this iterator* might match, but may be a rough heuristic, hardcoded value, or otherwise* completely inaccurate.*/public abstract long cost();
}

    跨度里包含了匹配Term的起始位置和结束位置信息以及跨度价值估算值以及payload信息等等。

 

   首先要说的就是SpanTermQuery,他和TermQuery用法很相似,唯一区别就是SapnTermQuery可以得到Term的span跨度信息,用法如下:

 

package com.yida.framework.lucene5.query;import java.io.IOException;import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.search.AutomatonQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiTermQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.automaton.Automata;
import org.apache.lucene.util.automaton.Automaton;
/*** SpanTermQuery用法测试* @author Lanxiaowei**/
public class SpanTermQueryTest {public static void main(String[] args) throws IOException {Directory dir = new RAMDirectory();Analyzer analyzer = new StandardAnalyzer();IndexWriterConfig iwc = new IndexWriterConfig(analyzer);iwc.setOpenMode(OpenMode.CREATE);IndexWriter writer = new IndexWriter(dir, iwc);Document doc = new Document();doc.add(new TextField("text", "the quick brown fox jumps over the lazy dog", Field.Store.YES));writer.addDocument(doc);doc = new Document();doc.add(new TextField("text", "the quick red fox jumps over the sleepy cat", Field.Store.YES));writer.addDocument(doc);doc = new Document();doc.add(new TextField("text", "the quick brown fox jumps over the lazy dog", Field.Store.YES));writer.addDocument(doc);writer.close();IndexReader reader = DirectoryReader.open(dir);IndexSearcher searcher = new IndexSearcher(reader);String queryString = "red";SpanQuery query = new SpanTermQuery(new Term("text",queryString));TopDocs results = searcher.search(query, null, 100);ScoreDoc[] scoreDocs = results.scoreDocs;for (int i = 0; i < scoreDocs.length; ++i) {//System.out.println(searcher.explain(query, scoreDocs[i].doc));int docID = scoreDocs[i].doc;Document document = searcher.doc(docID);String path = document.get("text");System.out.println("text:" + path);}}
}

   SpanNearQuery:用来匹配两个Term之间的跨度的,即一个Term经过几个跨度可以到达另一个Term,slop为跨度因子,用来限制两个Term之间的最大跨度,不可能一个Term和另一个Term之间要经过十万八千个跨度才到达也算两者相近,这不符合常理。所以有个slop因子进行限制。还有一个inOrder参数要引起注意,它用来设置是否允许进行倒序跨度,什么意思?即TermA到TermB不一定是从左到右去匹配也可以从右到左,而从右到左就是倒序,inOrder为true即表示order(顺序)很重要不能倒序去匹配必须正向去匹配,false则反之。注意停用词不在slop统计范围内

    Slop的理解很重要:

    在默认情况下slop的值是0, 就相当于TermQuery的精确匹配, 通过设置slop参数(比如"one five"匹配"one two three four five"就需要slop=3,如果slop=2就无法得到结果。这里我们可以认为slope是单词移动得次数,可以左移或者右移。这里特别提 醒,PhraseQuery不保证前后单词的次序,在上面的例子中,"two one"就需要2个slop,也就是认为one 向左边移动2位, 就是能够匹配的”one two”如果是“five three one” 就需要slope=6才能匹配。

 

还有一个collectPayloads参数表示是否收集payload信息,关于payload后面再单独说。

 

    SpanNearQuery的构造函数如下:

 

public SpanNearQuery(SpanQuery[] clauses, int slop, boolean inOrder, boolean collectPayloads) {// copy clauses array into an ArrayListthis.clauses = new ArrayList<>(clauses.length);for (int i = 0; i < clauses.length; i++) {SpanQuery clause = clauses[i];if (field == null) {                               // check fieldfield = clause.getField();} else if (clause.getField() != null && !clause.getField().equals(field)) {throw new IllegalArgumentException("Clauses must have same field.");}this.clauses.add(clause);}this.collectPayloads = collectPayloads;this.slop = slop;this.inOrder = inOrder;}

   SpanNearQuery使用示例:

 

 

/*** SpanNearQuery测试* @author Lanxiaowei**/
public class SpanNearQueryTest {public static void main(String[] args) throws IOException {Directory dir = new RAMDirectory();Analyzer analyzer = new StandardAnalyzer();IndexWriterConfig iwc = new IndexWriterConfig(analyzer);iwc.setOpenMode(OpenMode.CREATE);IndexWriter writer = new IndexWriter(dir, iwc);Document doc = new Document();doc.add(new TextField("text", "the quick brown fox jumps over the lazy dog", Field.Store.YES));writer.addDocument(doc);doc = new Document();doc.add(new TextField("text", "the quick red fox jumps over the sleepy cat", Field.Store.YES));writer.addDocument(doc);doc = new Document();doc.add(new TextField("text", "the quick brown fox jumps over the lazy dog", Field.Store.YES));writer.addDocument(doc);writer.close();IndexReader reader = DirectoryReader.open(dir);IndexSearcher searcher = new IndexSearcher(reader);String queryStringStart = "dog";String queryStringEnd = "quick";SpanQuery queryStart = new SpanTermQuery(new Term("text",queryStringStart));SpanQuery queryEnd = new SpanTermQuery(new Term("text",queryStringEnd));SpanQuery spanNearQuery = new SpanNearQuery(new SpanQuery[] {queryStart,queryEnd}, 6, false, false);TopDocs results = searcher.search(spanNearQuery, null, 100);ScoreDoc[] scoreDocs = results.scoreDocs;for (int i = 0; i < scoreDocs.length; ++i) {//System.out.println(searcher.explain(query, scoreDocs[i].doc));int docID = scoreDocs[i].doc;Document document = searcher.doc(docID);String path = document.get("text");System.out.println("text:" + path);}}
}

   示例中dog要到达quick需要经过6个跨度,需要从右至左倒序匹配,所以inOrder设置为false,如果设置为true会导致查询不出来数据。

 

   SpanNotQuery:使用场景是当使用SpanNearQuery时,如果两个Term从TermA到TermB有多种情况,即可能出现TermA或者TermB在索引中重复出现,则可能有多种情况,SpanNotQuery就是用来限制TermA和TermB之间不存在TermC,从而排除一些情况,实现更精确的控制。默认SpanNotQuery的构造函数是这样的:

 

/** Construct a SpanNotQuery matching spans from <code>include</code> which* have no overlap with spans from <code>exclude</code>.*/public SpanNotQuery(SpanQuery include, SpanQuery exclude) {this(include, exclude, 0, 0);}

 显然这里的第一个参数include应该是SpanNearQuery,第二个参数就是用来做排除的。

 

   SpanNotQuery另一个重载构造函数如下:

 

/** Construct a SpanNotQuery matching spans from <code>include</code> which* have no overlap with spans from <code>exclude</code> within * <code>dist</code> tokens of <code>include</code>. */public SpanNotQuery(SpanQuery include, SpanQuery exclude, int dist) {this(include, exclude, dist, dist);}

 它多加了一个dist参数,官方的解释是:Construct a SpanNotQuery matching spans from include which have no overlap with spans from exclude within dist tokens of include. 说白了就是,使用exclude限制以后匹配到以后,TermA和TermB之间间隔的字符长度做个限制,这就是dist的作用。

 

SpanNotQuery还有一个更复杂的构造函数重载:

 

/** Construct a SpanNotQuery matching spans from <code>include</code> which* have no overlap with spans from <code>exclude</code> within * <code>pre</code> tokens before or <code>post</code> tokens of <code>include</code>. */public SpanNotQuery(SpanQuery include, SpanQuery exclude, int pre, int post) {this.include = include;this.exclude = exclude;this.pre = (pre >=0) ? pre : 0;this.post = (post >= 0) ? post : 0;if (include.getField() != null && exclude.getField() != null && !include.getField().equals(exclude.getField()))throw new IllegalArgumentException("Clauses must have same field.");}

 最后一个post参数其实就是dist,pre参数就是限制exclude Term前面有几个字符。这样解释太抽象,用示例代码来说明吧:

 

 

package com.yida.framework.lucene5.query;import java.io.IOException;import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanNotQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;/*** SpanNotQuery测试* @author Lanxiaowei**/
public class SpanNotQueryTest {public static void main(String[] args) throws IOException {Directory dir = new RAMDirectory();Analyzer analyzer = new StandardAnalyzer();IndexWriterConfig iwc = new IndexWriterConfig(analyzer);iwc.setOpenMode(OpenMode.CREATE);IndexWriter writer = new IndexWriter(dir, iwc);Document doc = new Document();doc.add(new TextField("text", "the quick brown fox jumps over the lazy dog", Field.Store.YES));writer.addDocument(doc);doc = new Document();doc.add(new TextField("text", "the quick red fox jumps over the sleepy cat", Field.Store.YES));writer.addDocument(doc);doc = new Document();doc.add(new TextField("text", "the quick brown fox quick gox jumps over the lazy dog", Field.Store.YES));writer.addDocument(doc);doc = new Document();doc.add(new TextField("text", "the quick brown adult slave nice fox winde felt testcase gox quick jumps over the lazy dog", Field.Store.YES));writer.addDocument(doc);doc = new Document();doc.add(new TextField("text", "the quick brown fox quick jumps over the lazy dog", Field.Store.YES));writer.addDocument(doc);writer.close();IndexReader reader = DirectoryReader.open(dir);IndexSearcher searcher = new IndexSearcher(reader);String queryStringStart = "dog";String queryStringEnd = "quick";String excludeString = "fox";SpanQuery queryStart = new SpanTermQuery(new Term("text",queryStringStart));SpanQuery queryEnd = new SpanTermQuery(new Term("text",queryStringEnd));SpanQuery excludeQuery = new SpanTermQuery(new Term("text",excludeString));SpanQuery spanNearQuery = new SpanNearQuery(new SpanQuery[] {queryStart,queryEnd}, 12, false, false);SpanNotQuery spanNotQuery = new SpanNotQuery(spanNearQuery, excludeQuery, 4,3);TopDocs results = searcher.search(spanNotQuery, null, 100);ScoreDoc[] scoreDocs = results.scoreDocs;for (int i = 0; i < scoreDocs.length; ++i) {//System.out.println(searcher.explain(query, scoreDocs[i].doc));int docID = scoreDocs[i].doc;Document document = searcher.doc(docID);String path = document.get("text");System.out.println("text:" + path);}}
}

   示例代码意思就是查询dog和quick之间没有fox的索引文档,自己运行示例代码参悟吧。

 

   SpanOrQuery顾名思义就是把多个Span'Query用or连接起来,其实你也可以用BooleanQuery来代替SpanOrQuery,但SpanOrQuery会返回额外的Span跨度信息,它的构造函数如下:

 

SpanOrQuery(SpanQuery... clauses) 

    接收多个SpanQuery对象并用or连接起来,下面是SpanOrQuery示例代码:

 

   

package com.yida.framework.lucene5.query;import java.io.IOException;import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanNotQuery;
import org.apache.lucene.search.spans.SpanOrQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;/*** SpanOrQuery测试* @author Lanxiaowei**/
public class SpanOrQueryTest {public static void main(String[] args) throws IOException {Directory dir = new RAMDirectory();Analyzer analyzer = new StandardAnalyzer();IndexWriterConfig iwc = new IndexWriterConfig(analyzer);iwc.setOpenMode(OpenMode.CREATE);IndexWriter writer = new IndexWriter(dir, iwc);Document doc = new Document();doc.add(new TextField("text", "the quick brown fox jumps over the lazy dog", Field.Store.YES));writer.addDocument(doc);doc = new Document();doc.add(new TextField("text", "the quick red fox jumps over the sleepy cat", Field.Store.YES));writer.addDocument(doc);doc = new Document();doc.add(new TextField("text", "the quick brown fox quick gox jumps over the lazy dog", Field.Store.YES));writer.addDocument(doc);doc = new Document();doc.add(new TextField("text", "the quick brown adult slave nice fox winde felt testcase gox quick jumps over the lazy dog", Field.Store.YES));writer.addDocument(doc);doc = new Document();doc.add(new TextField("text", "the quick brown adult sick slave nice fox winde felt testcase fox quick jumps over the lazy dog", Field.Store.YES));writer.addDocument(doc);doc = new Document();doc.add(new TextField("text", "the quick brown fox quick jumps over the lazy dog", Field.Store.YES));writer.addDocument(doc);writer.close();IndexReader reader = DirectoryReader.open(dir);IndexSearcher searcher = new IndexSearcher(reader);String queryStringStart = "dog";String queryStringEnd = "quick";String excludeString = "fox";String termString = "sick";SpanQuery queryStart = new SpanTermQuery(new Term("text",queryStringStart));SpanQuery queryEnd = new SpanTermQuery(new Term("text",queryStringEnd));SpanQuery excludeQuery = new SpanTermQuery(new Term("text",excludeString));SpanQuery spanNearQuery = new SpanNearQuery(new SpanQuery[] {queryStart,queryEnd}, 12, false, false);SpanNotQuery spanNotQuery = new SpanNotQuery(spanNearQuery, excludeQuery, 4,3);SpanQuery spanTermQuery = new SpanTermQuery(new Term("text",termString));SpanOrQuery spanOrQuery = new SpanOrQuery(spanNotQuery,spanTermQuery);TopDocs results = searcher.search(spanOrQuery, null, 100);ScoreDoc[] scoreDocs = results.scoreDocs;for (int i = 0; i < scoreDocs.length; ++i) {//System.out.println(searcher.explain(query, scoreDocs[i].doc));int docID = scoreDocs[i].doc;Document document = searcher.doc(docID);String path = document.get("text");System.out.println("text:" + path);}}
}

   SpanMultiTermQueryWrapper:就是一个Query转换器,用于把MultiTermQuery包装转换成SpanQuery的,具体使用示例,我贴下官方API里提供的示例代码吧:

 

 

WildcardQuery wildcard = new WildcardQuery(new Term("field", "bro?n"));SpanQuery spanWildcard = new SpanMultiTermQueryWrapper<WildcardQuery>(wildcard);

 SpanPositionRangeQuery:这个query是用来限制匹配的情况是否分布在(start,end)这个区间内,区间索引从零开始计算,拿示例代码说话,

 

 

package com.yida.framework.lucene5.query;import java.io.IOException;import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.spans.SpanMultiTermQueryWrapper;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanNotQuery;
import org.apache.lucene.search.spans.SpanPositionRangeQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;/*** SpanPositionRangeQuery测试* @author Lanxiaowei**/
public class SpanPositionRangeQueryTest {public static void main(String[] args) throws IOException {Directory dir = new RAMDirectory();Analyzer analyzer = new StandardAnalyzer();IndexWriterConfig iwc = new IndexWriterConfig(analyzer);iwc.setOpenMode(OpenMode.CREATE);IndexWriter writer = new IndexWriter(dir, iwc);Document doc = new Document();doc.add(new TextField("text", "quick brown fox", Field.Store.YES));writer.addDocument(doc);doc = new Document();doc.add(new TextField("text", "jumps over lazy broun dog", Field.Store.YES));writer.addDocument(doc);doc = new Document();doc.add(new TextField("text", "jumps over extremely very lazy broxn dog", Field.Store.YES));writer.addDocument(doc);writer.close();IndexReader reader = DirectoryReader.open(dir);IndexSearcher searcher = new IndexSearcher(reader);FuzzyQuery fq = new FuzzyQuery(new Term("text", "broan"));SpanQuery sfq = new SpanMultiTermQueryWrapper<FuzzyQuery>(fq);SpanPositionRangeQuery spanPositionRangeQuery = new SpanPositionRangeQuery(sfq, 3, 5);TopDocs results = searcher.search(spanPositionRangeQuery, null, 100);ScoreDoc[] scoreDocs = results.scoreDocs;for (int i = 0; i < scoreDocs.length; ++i) {//System.out.println(searcher.explain(query, scoreDocs[i].doc));int docID = scoreDocs[i].doc;Document document = searcher.doc(docID);String path = document.get("text");System.out.println("text:" + path);}}
}

 稍微解释下上面的代码,首先呢,FuzzyQuery fq = new FuzzyQuery(new Term("text", "broan"));用来查询包含跟单词broan相似字符的索引文档,显然第一个索引文档不符合排除了一个,然后呢,我们new了一个SpanQuery包装器Wrapper,把FuzzyQuery转换成了SpanQuery,然后使用SpanPositionRangeQuery对匹配到的2种情况的落放的位置进行限制即跟broan相似的单词必须分布在(3,5)这个区间内,显然第3个索引文档是分布在(3,6)这个区间内,所以第3个索引文档被排除了,最后只返回第2个索引文档。

 

    SpanPositionRangeQuery还有个子类SpanFirstQuery,其实SpanFirstQuery只不过是把SpanPositionRangeQuery构造函数里的start参数值设置为0,仅此而已,所以不用多说,你也懂的,它的构造函数如下:

SpanFirstQuery(SpanQuery match, int end) 
Construct a SpanFirstQuery matching spans in match whose end position is less than or equal to end.

 这也就是为什么只有一个end,没有start,因为start默认为零,看源码:


 SpanFirstQuery示例我就不提供了,略过。

   最后一个要说的就是FieldMaskingSpanQuery,它用于在多个域之间查询,即把另一个域看作某个域,从而看起来就像在同一个域里查询,因为Lucene默认某个条件只能作用在单个域上,不支持跨域查询只能在同一个域里查询,所以有了FieldMaskingSpanQuery,,下面是示例代码:

package com.yida.framework.lucene5.query;import java.io.IOException;import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.spans.FieldMaskingSpanQuery;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;/*** FieldMaskingSpanQuery测试* @author Lanxiaowei**/
public class FieldMaskingSpanQueryTest {public static void main(String[] args) throws IOException {Directory dir = new RAMDirectory();Analyzer analyzer = new StandardAnalyzer();IndexWriterConfig iwc = new IndexWriterConfig(analyzer);iwc.setOpenMode(OpenMode.CREATE);IndexWriter writer = new IndexWriter(dir, iwc);Document doc = new Document();doc.add(new Field("teacherid", "1", Field.Store.YES, Field.Index.NOT_ANALYZED));doc.add(new Field("studentfirstname", "james", Field.Store.YES, Field.Index.NOT_ANALYZED));doc.add(new Field("studentsurname", "jones", Field.Store.YES, Field.Index.NOT_ANALYZED));writer.addDocument(doc);//teacher2doc = new Document();doc.add(new Field("teacherid", "2", Field.Store.YES, Field.Index.NOT_ANALYZED));doc.add(new Field("studentfirstname", "james", Field.Store.YES, Field.Index.NOT_ANALYZED));doc.add(new Field("studentsurname", "smith", Field.Store.YES, Field.Index.NOT_ANALYZED));doc.add(new Field("studentfirstname", "sally", Field.Store.YES, Field.Index.NOT_ANALYZED));doc.add(new Field("studentsurname", "jones", Field.Store.YES, Field.Index.NOT_ANALYZED));writer.addDocument(doc);writer.close();IndexReader reader = DirectoryReader.open(dir);IndexSearcher searcher = new IndexSearcher(reader);SpanQuery q1  = new SpanTermQuery(new Term("studentfirstname", "james"));SpanQuery q2  = new SpanTermQuery(new Term("studentsurname", "jones"));SpanQuery q2m = new FieldMaskingSpanQuery(q2, "studentfirstname");Query query = new SpanNearQuery(new SpanQuery[]{q1, q2m}, -1, false);TopDocs results = searcher.search(query, null, 100);ScoreDoc[] scoreDocs = results.scoreDocs;for (int i = 0; i < scoreDocs.length; ++i) {//System.out.println(searcher.explain(query, scoreDocs[i].doc));int docID = scoreDocs[i].doc;Document document = searcher.doc(docID);String teacherid = document.get("teacherid");System.out.println("teacherid:" + teacherid);}}
}

   OK,SpanQuery就说这么多,接下来要说的就是PhraseQuery。

 

   如果你还有什么问题请加我Q-Q:7-3-6-0-3-1-3-0-5,

或者加裙
一起交流学习!

这篇关于Lucene5学习之SpanQuery跨度查询的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!


原文地址:
本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.chinasem.cn/article/327269

相关文章

在Java中基于Geotools对PostGIS数据库的空间查询实践教程

《在Java中基于Geotools对PostGIS数据库的空间查询实践教程》本文将深入探讨这一实践,从连接配置到复杂空间查询操作,包括点查询、区域范围查询以及空间关系判断等,全方位展示如何在Java环... 目录前言一、相关技术背景介绍1、评价对象AOI2、数据处理流程二、对AOI空间范围查询实践1、空间查

MySQL基本查询示例总结

《MySQL基本查询示例总结》:本文主要介绍MySQL基本查询示例总结,本文通过实例代码给大家介绍的非常详细,对大家的学习或工作具有一定的参考借鉴价值,需要的朋友参考下吧... 目录Create插入替换Retrieve(读取)select(确定列)where条件(确定行)null查询order by语句li

重新对Java的类加载器的学习方式

《重新对Java的类加载器的学习方式》:本文主要介绍重新对Java的类加载器的学习方式,具有很好的参考价值,希望对大家有所帮助,如有错误或未考虑完全的地方,望不吝赐教... 目录1、介绍1.1、简介1.2、符号引用和直接引用1、符号引用2、直接引用3、符号转直接的过程2、加载流程3、类加载的分类3.1、显示

MySQL中like模糊查询的优化方案

《MySQL中like模糊查询的优化方案》在MySQL中,like模糊查询是一种常用的查询方式,但在某些情况下可能会导致性能问题,本文将介绍八种优化MySQL中like模糊查询的方法,需要的朋友可以参... 目录1. 避免以通配符开头的查询2. 使用全文索引(Full-text Index)3. 使用前缀索

基于Redis实现附近商铺查询功能

《基于Redis实现附近商铺查询功能》:本文主要介绍基于Redis实现-附近商铺查询功能,这个功能将使用到Redis中的GEO这种数据结构来实现,需要的朋友可以参考下... 目录基于Redis实现-附近查询1.GEO相关命令2.使用GEO来实现以下功能3.使用Java实现简China编程单的附近商铺查询4.Red

mysql递归查询语法WITH RECURSIVE的使用

《mysql递归查询语法WITHRECURSIVE的使用》本文主要介绍了mysql递归查询语法WITHRECURSIVE的使用,WITHRECURSIVE用于执行递归查询,特别适合处理层级结构或递归... 目录基本语法结构:关键部分解析:递归查询的工作流程:示例:员工与经理的层级关系解释:示例:树形结构的数

SQL常用操作精华之复制表、跨库查询、删除重复数据

《SQL常用操作精华之复制表、跨库查询、删除重复数据》:本文主要介绍SQL常用操作精华之复制表、跨库查询、删除重复数据,这些SQL操作涵盖了数据库开发中最常用的技术点,包括表操作、数据查询、数据管... 目录SQL常用操作精华总结表结构与数据操作高级查询技巧SQL常用操作精华总结表结构与数据操作复制表结

MySQL 复合查询案例详解

《MySQL复合查询案例详解》:本文主要介绍MySQL复合查询案例详解,本文给大家介绍的非常详细,对大家的学习或工作具有一定的参考借鉴价值,需要的朋友参考下吧... 目录基本查询回顾多表笛卡尔积子查询与where子查询多行子查询多列子查询子查询与from总结合并查询(不太重要)union基本查询回顾查询

MySQL复合查询从基础到多表关联与高级技巧全解析

《MySQL复合查询从基础到多表关联与高级技巧全解析》本文主要讲解了在MySQL中的复合查询,下面是关于本文章所需要数据的建表语句,感兴趣的朋友跟随小编一起看看吧... 目录前言:1.基本查询回顾:1.1.查询工资高于500或岗位为MANAGER的雇员,同时还要满足他们的姓名首字母为大写的J1.2.按照部门

MyBatis模糊查询报错:ParserException: not supported.pos 问题解决

《MyBatis模糊查询报错:ParserException:notsupported.pos问题解决》本文主要介绍了MyBatis模糊查询报错:ParserException:notsuppo... 目录问题描述问题根源错误SQL解析逻辑深层原因分析三种解决方案方案一:使用CONCAT函数(推荐)方案二: