
pipeline parameter for Elasticsearch 5+ #29

Open
danizen opened this issue Apr 10, 2018 · 2 comments

Comments

@danizen

danizen commented Apr 10, 2018

Elasticsearch 5+ supports ingest pipelines. They are similar to taggers, so it would probably be useful to support them.

@essiembre
Contributor

Couldn't they be set up independently of crawlers? Do you envision them as Committer configuration options?

@danizen
Author

danizen commented Apr 12, 2018

Yup - just a pipeline configuration option that is passed to Elasticsearch to invoke that pipeline.
I don't need it at all, because taggers are more flexible than Elasticsearch pipelines. However, someone might.
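To make the idea concrete, here is a minimal sketch of what such an option could amount to. It assumes documents are sent through Elasticsearch's `_bulk` endpoint, which accepts a `pipeline` query parameter; the helper name `bulk_url`, the base URL, and the pipeline name are hypothetical and are not taken from the Committer's code.

```python
from urllib.parse import urlencode


def bulk_url(base_url, pipeline=None):
    """Build the _bulk endpoint URL (hypothetical helper).

    When a pipeline name is configured, it is appended as the
    `pipeline` query parameter so Elasticsearch runs that ingest
    pipeline on every document in the request.
    """
    url = base_url.rstrip("/") + "/_bulk"
    if pipeline:
        url += "?" + urlencode({"pipeline": pipeline})
    return url


if __name__ == "__main__":
    # Hypothetical host and pipeline name, for illustration only.
    print(bulk_url("http://localhost:9200", "my-pipeline"))
    print(bulk_url("http://localhost:9200"))
```

Leaving the option unset keeps the URL unchanged, so existing configurations would behave as before.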

I find pipelines useful when fixing up data, e.g. to query all documents that have both an "openi.error" and an "openi.summary" field and drop the "openi.error" field. I can then use an update_by_query that runs the pipeline.
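A rough sketch of that clean-up, expressed as the two REST requests involved: one that defines an ingest pipeline with a `remove` processor, and one `_update_by_query` call that selects the affected documents and runs the pipeline on them. The pipeline ID, the index name, and the helper `requests_for` are assumptions made for illustration; only the field names come from the comment above. The code only builds the requests and does not contact a cluster.

```python
import json
from urllib.parse import urlencode

PIPELINE_ID = "drop-openi-error"  # hypothetical pipeline name

# Ingest pipeline with a single `remove` processor.
pipeline_body = {
    "description": "Drop the openi.error field",
    "processors": [{"remove": {"field": "openi.error"}}],
}

# Match only documents that have both fields.
query_body = {
    "query": {
        "bool": {
            "filter": [
                {"exists": {"field": "openi.error"}},
                {"exists": {"field": "openi.summary"}},
            ]
        }
    }
}


def requests_for(index):
    """Return (method, path, body) tuples for the two requests."""
    update_path = f"/{index}/_update_by_query?" + urlencode({"pipeline": PIPELINE_ID})
    return [
        ("PUT", f"/_ingest/pipeline/{PIPELINE_ID}", pipeline_body),
        ("POST", update_path, query_body),
    ]


if __name__ == "__main__":
    # "my-index" is a placeholder index name.
    for method, path, body in requests_for("my-index"):
        print(method, path)
        print(json.dumps(body, indent=2))
```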

Pipelines are more useful when someone doesn't have a Norconex crawler. But since people can create JavaScript functions inside Elasticsearch, some may be committed to using pipelines for some reason.

So, it is just a completeness-of-support thing that I didn't want to miss.
