How do I use ASCIIFoldingFilter in my Lucene app?
Question
I have a standard Lucene app that searches an index. The index contains a lot of French terms, and I'd like to use the ASCIIFoldingFilter.
I've done a lot of searching and still have no idea how to use it. The constructor takes a TokenStream object. Do I call the method on the analyzer that returns a TokenStream when you pass it a field? Then what do I do? Can someone point me to an example where a TokenFilter is being used? Thanks.
Recommended answer
Token filters, such as the ASCIIFoldingFilter, are themselves TokenStreams, so they are what an Analyzer returns, mainly through the following method:
public abstract TokenStream tokenStream(String fieldName, Reader reader);
As you have noticed, filters take a TokenStream as input. They act like wrappers or, more precisely, like decorators around their input: each filter adds its own operation on top of whatever the wrapped TokenStream already does.
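To make the decorator idea concrete, here is a minimal sketch in plain Java. It does not use Lucene at all: SimpleTokenStream, WhitespaceSource, LowerCaseStep, AccentFoldingStep and DecoratorDemo are hypothetical names invented for this illustration, and the accent stripping via java.text.Normalizer is only a rough approximation of what ASCIIFoldingFilter does (the real filter also maps characters that have no Unicode decomposition).

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a token stream (NOT Lucene's TokenStream):
// returns one token per call and null once the input is exhausted.
interface SimpleTokenStream {
    String next();
}

// Plays the role of a Tokenizer: the innermost stream, splitting raw text on whitespace.
final class WhitespaceSource implements SimpleTokenStream {
    private final Iterator<String> tokens;

    WhitespaceSource(String text) {
        this.tokens = Arrays.asList(text.trim().split("\\s+")).iterator();
    }

    @Override
    public String next() {
        return tokens.hasNext() ? tokens.next() : null;
    }
}

// Plays the role of a TokenFilter: wraps another stream and lowercases its tokens.
final class LowerCaseStep implements SimpleTokenStream {
    private final SimpleTokenStream input;

    LowerCaseStep(SimpleTokenStream input) {
        this.input = input;
    }

    @Override
    public String next() {
        String token = input.next();
        return token == null ? null : token.toLowerCase(Locale.ROOT);
    }
}

// Another filter: decomposes accented characters and drops the combining marks.
// Assumption: this only approximates ASCII folding for illustration purposes.
final class AccentFoldingStep implements SimpleTokenStream {
    private final SimpleTokenStream input;

    AccentFoldingStep(SimpleTokenStream input) {
        this.input = input;
    }

    @Override
    public String next() {
        String token = input.next();
        if (token == null) {
            return null;
        }
        String decomposed = Normalizer.normalize(token, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }
}

final class DecoratorDemo {
    // Builds the chain source -> lowercase -> folding and drains it into a list.
    static List<String> analyze(String text) {
        SimpleTokenStream stream = new WhitespaceSource(text);
        stream = new LowerCaseStep(stream);
        stream = new AccentFoldingStep(stream);
        List<String> out = new ArrayList<>();
        for (String token = stream.next(); token != null; token = stream.next()) {
            out.add(token);
        }
        return out;
    }

    public static void main(String[] args) {
        // Input text is "Crème Brûlée à Orléans", written with Unicode escapes.
        System.out.println(analyze("Cr\u00e8me Br\u00fbl\u00e9e \u00e0 Orl\u00e9ans"));
        // prints [creme, brulee, a, orleans]
    }
}
```

Each step only knows about the stream it wraps, which is why filters can be stacked in any order and why the last wrapper is the object that gets returned.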
You can find an explanation here. It does not refer directly to the ASCIIFoldingFilter, but the same principle applies. Basically, you create a custom Analyzer containing something like this (stripped-down example):
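public class CustomAnalyzer extends Analyzer {
    // other content omitted
    // ...

    // Older Analyzer API: the filter chain is built by overriding tokenStream(...).
    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        // The tokenizer is the innermost stream; each filter wraps the previous one.
        TokenStream result = new StandardTokenizer(reader);
        result = new StandardFilter(result);
        result = new LowerCaseFilter(result);
        // etc etc ...
        result = new StopFilter(result, yourSetOfStopWords);
        // Fold accented characters to their ASCII equivalents.
        result = new ASCIIFoldingFilter(result);
        return result;
    }
    // ...
}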
Both TokenFilter and Tokenizer are subclasses of TokenStream.
Remember also that you must use the same custom analyzer for both indexing and searching; otherwise your queries may return incorrect results.
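The reason is that the index stores the analyzed terms, so a query has to be analyzed the same way to produce matching terms. The following plain-Java sketch illustrates this with a toy inverted index. Nothing here is Lucene API: SameAnalyzerDemo, fold, lowerOnly, buildIndex and search are hypothetical names, and fold is only a rough stand-in for an analyzer chain that ends in ASCIIFoldingFilter.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

final class SameAnalyzerDemo {
    // Stand-in for the custom analyzer: lowercase, then strip accents (approximation).
    static String fold(String term) {
        String lower = term.toLowerCase(Locale.ROOT);
        return Normalizer.normalize(lower, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
    }

    // Stand-in for a different analyzer that lowercases but keeps accents.
    static String lowerOnly(String term) {
        return term.toLowerCase(Locale.ROOT);
    }

    // Toy inverted index: analyzed term -> ids of the documents containing it.
    static Map<String, Set<Integer>> buildIndex(List<String> docs,
                                                Function<String, String> analyzer) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String word : docs.get(id).trim().split("\\s+")) {
                index.computeIfAbsent(analyzer.apply(word), k -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    // Analyzes the single-term query and looks the result up in the index.
    static Set<Integer> search(Map<String, Set<Integer>> index, String query,
                               Function<String, String> analyzer) {
        return index.getOrDefault(analyzer.apply(query), Collections.emptySet());
    }

    public static void main(String[] args) {
        // Documents are "Crème brûlée" and "creme anglaise".
        List<String> docs = Arrays.asList("Cr\u00e8me br\u00fbl\u00e9e", "creme anglaise");
        Map<String, Set<Integer>> index = buildIndex(docs, SameAnalyzerDemo::fold);

        // Query "crème" analyzed with the same folding step: both documents match.
        System.out.println(search(index, "cr\u00e8me", SameAnalyzerDemo::fold));      // prints [0, 1]

        // Same query analyzed without folding: the accented term is not in the index.
        System.out.println(search(index, "cr\u00e8me", SameAnalyzerDemo::lowerOnly)); // prints []
    }
}
```

In the mismatched case the query term keeps its accent while the index only contains the folded form, so the lookup silently returns nothing rather than failing.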