How to match exact text in Lucene search?
Problem description
I'm trying to match the text "Config migration from ASA5505 8.2 to ASA5516" in the TITLE column.
My program looks like this:
Directory directory = FSDirectory.open(indexDir);
MultiFieldQueryParser queryParser = new MultiFieldQueryParser(Version.LUCENE_35,new String[] {"TITLE"}, new StandardAnalyzer(Version.LUCENE_35));
IndexReader reader = IndexReader.open(directory);
IndexSearcher searcher = new IndexSearcher(reader);
queryParser.setPhraseSlop(0);
queryParser.setLowercaseExpandedTerms(true);
Query query = queryParser.parse("TITLE:Config migration from ASA5505 8.2 to ASA5516");
System.out.println(query);
TopDocs topDocs = searcher.search(query,100);
System.out.println(topDocs.totalHits);
ScoreDoc[] hits = topDocs.scoreDocs;
System.out.println(hits.length + " Record(s) Found");
for (int i = 0; i < hits.length; i++) {
    int docId = hits[i].doc;
    Document d = searcher.doc(docId);
    System.out.println("\"Title :\" " + d.get("TITLE"));
}
But it returns:
"Title :" Config migration from ASA5505 8.2 to ASA5516
"Title :" Firewall migration from ASA5585 to ASA5555
"Title :" Firewall migration from ASA5585 to ASA5555
The last two results are not expected. What modification is required to match the exact text "Config migration from ASA5505 8.2 to ASA5516"?
My indexing function looks like this:
public class Lucene {
public static final String INDEX_DIR = "./Lucene";
private static final String JDBC_DRIVER = "oracle.jdbc.OracleDriver";
private static final String CONNECTION_URL = "jdbc:oracle:thin:xxxxxxx";
private static final String USER_NAME = "localhost";
private static final String PASSWORD = "localhost";
private static final String QUERY = "select * from TITLE_TABLE";
public static void main(String[] args) throws Exception {
File indexDir = new File(INDEX_DIR);
Lucene indexer = new Lucene();
try {
Date start = new Date();
Class.forName(JDBC_DRIVER).newInstance();
Connection conn = DriverManager.getConnection(CONNECTION_URL, USER_NAME, PASSWORD);
SimpleAnalyzer analyzer = new SimpleAnalyzer(Version.LUCENE_35);
IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_35, analyzer);
IndexWriter indexWriter = new IndexWriter(FSDirectory.open(indexDir), indexWriterConfig);
System.out.println("Indexing to directory '" + indexDir + "'...");
int indexedDocumentCount = indexer.indexDocs(indexWriter, conn);
indexWriter.close();
System.out.println(indexedDocumentCount + " records have been indexed successfully");
System.out.println("Total Time:" + (new Date().getTime() - start.getTime()) / (1000));
} catch (Exception e) {
e.printStackTrace();
}
}
int indexDocs(IndexWriter writer, Connection conn) throws Exception {
String sql = QUERY;
Statement stmt = conn.createStatement();
stmt.setFetchSize(100000);
ResultSet rs = stmt.executeQuery(sql);
int i = 0;
while (rs.next()) {
System.out.println("Addind Doc No:" + i);
Document d = new Document();
System.out.println(rs.getString("TITLE"));
d.add(new Field("TITLE", rs.getString("TITLE"), Field.Store.YES, Field.Index.ANALYZED));
d.add(new Field("NAME", rs.getString("NAME"), Field.Store.YES, Field.Index.ANALYZED));
writer.addDocument(d);
i++;
}
return i;
}
}
Recommended answer
PVR is correct that a phrase query is probably the right solution here, but they missed how to use the PhraseQuery class. You are already using QueryParser, though, so just use the query parser syntax and enclose your search text in quotes:
Query query = queryParser.parse("TITLE:\"Config migration from ASA5505 8.2 to ASA5516\"");
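If the search text comes from a variable rather than a literal, the quotes have to be added in code. Here is a minimal sketch in plain Java. It is not part of the Lucene API: the class name PhraseQueryText and the method build are hypothetical, and it assumes that escaping backslashes and double quotes is enough for text placed inside a quoted phrase.

```java
public class PhraseQueryText {

    // Hypothetical helper: wraps the text in double quotes so the query
    // parser treats it as one phrase instead of separate optional terms.
    public static String build(String field, String text) {
        // Escape backslashes first, then embedded double quotes.
        String escaped = text.replace("\\", "\\\\").replace("\"", "\\\"");
        return field + ":\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        String queryText = build("TITLE", "Config migration from ASA5505 8.2 to ASA5516");
        // The resulting string is what would be handed to queryParser.parse(...).
        System.out.println(queryText);
    }
}
```

The string produced by build can be passed straight to queryParser.parse. Without the surrounding quotes, the parser splits the text into individual terms, and any document containing just one of them (such as "migration") can match.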
Based on your update, you are using a different analyzer at index time and at query time. SimpleAnalyzer and StandardAnalyzer don't do the same things. Unless you have a very good reason to do otherwise, you should analyze the same way when indexing and querying.
So, change the analyzer in your indexing code to StandardAnalyzer (or vice versa, use SimpleAnalyzer when querying), and you should see better results.
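To see why the mismatch matters, it can help to look at the tokens each side produces. The sketch below is plain Java and only approximates the two analyzers; it does not call Lucene. It assumes that SimpleAnalyzer splits on non-letter characters and lowercases, and that StandardAnalyzer roughly keeps alphanumeric tokens whole, lowercases them, and drops stop words. The class name, method names, and the tiny stop list are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AnalyzerMismatchSketch {

    // Tiny illustrative stop list; an assumption, not Lucene's real set.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("a", "the", "of", "to"));

    // Rough stand-in for SimpleAnalyzer: split on anything that is not a letter, then lowercase.
    public static List<String> simpleLike(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Rough stand-in for StandardAnalyzer: split on whitespace, lowercase, drop stop words.
    public static List<String> standardLike(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            String token = part.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String title = "Config migration from ASA5505 8.2 to ASA5516";
        List<String> indexed = simpleLike(title);    // what the index would hold
        List<String> queried = standardLike(title);  // what the query would look for
        System.out.println("index-time tokens: " + indexed);
        System.out.println("query-time tokens: " + queried);

        // Query tokens that were never written to the index cannot match.
        List<String> missing = new ArrayList<>(queried);
        missing.removeAll(indexed);
        System.out.println("query tokens missing from the index: " + missing);
    }
}
```

Under these assumptions, the index-time side reduces ASA5505 and ASA5516 to asa and drops 8.2 entirely, while the query-time side looks for asa5505, 8.2 and asa5516. Only common words such as "migration" and "from" overlap, which is consistent with the unrelated "Firewall migration" titles showing up in the results.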