add kstem token filter docs #8150 #8473
Changes from all commits
3b71e8a
18971b3
8a25852
9eccfbf
0c7e4c1
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,92 @@ | ||
--- | ||
layout: default | ||
title: KStem | ||
parent: Token filters | ||
nav_order: 220 | ||
--- | ||
|
||
# KStem token filter | ||
Check failure on line 8 in _analyzers/token-filters/kstem.md GitHub Actions / style-job
|
||
|
||
The `kstem` token filter is a stemming filter used to reduce words to their root forms. The filter is a lightweight algorithmic stemmer designed for the English language that performs the following stemming operations: | ||
|
||
- Reduces plurals to their singular form. | ||
- Converts different verb tenses to their base form. | ||
- Removes common derivational endings, such as "-ing" or "-ed". | ||
|
||
The `kstem` token filter is equivalent to the `stemmer` filter configured with the `light_english` language. It provides more conservative stemming than other stemming filters, such as `porter_stem`. | ||
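
To illustrate the equivalence described above, the following is a minimal sketch of an index that uses the `stemmer` filter with the `light_english` language in place of `kstem`. The index name `my_light_english_index`, the filter name `light_english_stemmer`, and the analyzer name `my_light_english_analyzer` are hypothetical and chosen only for this illustration:

```json
PUT /my_light_english_index
{
  "settings": {
    "analysis": {
      "filter": {
        "light_english_stemmer": {
          "type": "stemmer",
          "language": "light_english"
        }
      },
      "analyzer": {
        "my_light_english_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "light_english_stemmer"
          ]
        }
      }
    }
  }
}
```
{% include copy-curl.html %}

An analyzer configured this way is expected to stem English text in the same way as an analyzer that references a `kstem` filter directly.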
|
||
Line 16: Remove either "a" or "the" preceding |
||
The `kstem` token filter is based on the Lucene KStemFilter. For more information, see the [Lucene documentation](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/en/KStemFilter.html). | ||
|
||
## Example | ||
|
||
The following example request creates a new index named `my_kstem_index` and configures an analyzer with a `kstem` filter: | ||
|
||
```json | ||
PUT /my_kstem_index | ||
{ | ||
"settings": { | ||
"analysis": { | ||
"filter": { | ||
"kstem_filter": { | ||
"type": "kstem" | ||
} | ||
}, | ||
"analyzer": { | ||
"my_kstem_analyzer": { | ||
"type": "custom", | ||
"tokenizer": "standard", | ||
"filter": [ | ||
"lowercase", | ||
"kstem_filter" | ||
] | ||
} | ||
} | ||
} | ||
}, | ||
"mappings": { | ||
"properties": { | ||
"content": { | ||
"type": "text", | ||
"analyzer": "my_kstem_analyzer" | ||
} | ||
} | ||
} | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
## Generated tokens | ||
|
||
Use the following request to examine the tokens generated using the analyzer: | ||
|
||
```json | ||
POST /my_kstem_index/_analyze | ||
{ | ||
"analyzer": "my_kstem_analyzer", | ||
"text": "stops stopped" | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
The response contains the generated tokens: | ||
|
||
```json | ||
{ | ||
"tokens": [ | ||
{ | ||
"token": "stop", | ||
"start_offset": 0, | ||
"end_offset": 5, | ||
"type": "<ALPHANUM>", | ||
"position": 0 | ||
}, | ||
{ | ||
"token": "stop", | ||
"start_offset": 6, | ||
"end_offset": 13, | ||
"type": "<ALPHANUM>", | ||
"position": 1 | ||
} | ||
] | ||
} | ||
``` |
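
As a quick check that does not require creating an index, the same filter chain can also be passed inline to the `_analyze` API. This is a sketch that assumes the built-in `standard` tokenizer and the built-in `lowercase` and `kstem` filters referenced by name, mirroring the custom analyzer defined earlier:

```json
POST /_analyze
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    "kstem"
  ],
  "text": "stops stopped"
}
```
{% include copy-curl.html %}

Because the tokenizer and filters match those of `my_kstem_analyzer`, this request should return the same tokens as the response shown above.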