Support for scrolling #103

Closed
priamai opened this issue Feb 2, 2022 · 6 comments
Labels
bug Something isn't working

Comments

@priamai

priamai commented Feb 2, 2022

Hi there,
it would also be nice to support scrolling/iteration for simple queries like this:

q = s.query("match", cat__keyword='A').filter('term', act__keyword='B').size(10000)

response = q.execute()
data_df = response.hits.to_dataframe()

Currently the number of hits is capped by size.

@leonardbinet
Collaborator

@priamai you can use pandagg.search.Search.scan:

q = s.query("match", cat__keyword='A').filter('term', act__keyword='B')
for hit in q.scan():
    # do stuff
    print(hit)
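
Since scan() returns a plain iterator, large result sets can be consumed in fixed-size batches instead of one hit at a time. The following is only a minimal sketch: `in_batches` is a hypothetical helper (not part of pandagg), and the generator of dictionaries stands in for `q.scan()` so the snippet runs without a cluster.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def in_batches(hits: Iterable[T], batch_size: int = 1000) -> Iterator[List[T]]:
    """Yield lists of at most `batch_size` items from a possibly huge iterator."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(hits)
    while True:
        # islice pulls only the next `batch_size` items, so memory stays bounded
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Stand-in for q.scan(); with a live cluster you would pass q.scan() instead.
fake_hits = ({"_id": i} for i in range(5))
for batch in in_batches(fake_hits, batch_size=2):
    print(len(batch))
```

Each batch can then be processed or written out before the next one is fetched from the iterator.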

@priamai
Author

priamai commented Feb 15, 2022

What would be the equivalent of the match_all query?

q = s.query("match_all")
for hit in q.scan():
    # do stuff
    print(hit)

Gives me:

DSL class match_all does not exist in query.

@priamai
Author

priamai commented Feb 15, 2022

I was also reading:

scan() → Iterator[pandagg.response.Hit] [source](https://pandagg.readthedocs.io/en/latest/_modules/pandagg/search.html#Search.scan)
Turn the search into a scan search and return a generator that will iterate over all the documents matching the query.

Use the params method to specify any additional arguments you wish to pass to the underlying scan helper from elasticsearch-py - https://elasticsearch-py.readthedocs.io/en/master/helpers.html#elasticsearch.helpers.scan

But if I try:

for hit in s.scan(params={'scroll':'2m'}):

It says that the params argument is not expected.
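
Going by the quoted documentation, extra arguments seem meant to go through the params method on the search object rather than through a keyword argument of scan(). A hedged sketch of that reading follows: `scan_with_scroll` is a hypothetical wrapper, and it assumes that `params(**kwargs)` returns a search object whose `scan()` forwards those arguments, which this thread suggests only works once the scan fix is released.

```python
def scan_with_scroll(search, scroll="2m"):
    """Return a hit iterator, passing `scroll` via params() instead of scan().

    Assumption: `search` exposes params(**kwargs) and scan() as described in
    the quoted pandagg documentation; nothing here is verified against a
    specific pandagg release.
    """
    return search.params(scroll=scroll).scan()
```

With a real search object this would be used as `for hit in scan_with_scroll(s): ...`.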

@leonardbinet
Collaborator

The match_all query clause was missing; I've added it in the PR above (I will merge it as soon as I have the rights).

In the meantime, you can perform a search without any clause, simply by executing your search without a query :)

from elasticsearch import Elasticsearch
from pandagg import Search

client = Elasticsearch()
s = Search(using=client, index="ny*")
# No query clause: the search matches every document in the index pattern
response = s.execute()
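
For context, Elasticsearch itself treats a search request without a query as match_all, which is why both forms return every document. A minimal sketch of the two raw request bodies, using plain dictionaries with no pandagg involved (the variable names are only illustrative):

```python
import json

# Explicit match_all clause, as it appears in the raw Elasticsearch query DSL.
match_all_body = {"query": {"match_all": {}}}

# Body with no query at all; Elasticsearch falls back to match_all for it.
no_query_body = {}

print(json.dumps(match_all_body))
print(json.dumps(no_query_body))
```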

@leonardbinet
Collaborator

I've opened an additional PR to fix the scan method. Good catch 👍

leonardbinet added the bug label on Feb 17, 2022
@leonardbinet
Collaborator

@priamai the fixes/features are available in version 0.2.4 (#115)
