aggregations to dataframe always misses the first pagination #101

Closed
Andy7475 opened this issue Dec 17, 2021 · 4 comments
Labels
bug Something isn't working

Comments

@Andy7475

Hi,
This is a really useful package, thank you! I noticed, though, that scan_composite_agg has a bug: it misses the first page of aggregations. I think this is because you assign the variable 'buckets', enter the while loop, and reassign it before iterating over it. The first set of buckets therefore never gets iterated over, which I presume is why the first page is always missed. Hope that makes sense; I think you just need to move one line.

# YOU DECLARE BUCKETS HERE
buckets: List[BucketDict] = r.aggregations.data[a_name][  # type: ignore
    "buckets"
]
after_key: AfterKey = r.aggregations.data[a_name]["after_key"]  # type: ignore

init: bool = True
while init or len(buckets) == size:
    init = False
    s._aggs = s._aggs.as_composite(size=size, after=after_key)
    r = s.execute()
    agg_clause_response = r.aggregations.data[a_name]

    # THEN CHANGE IT HERE, BEFORE YOU HAVE HAD A CHANCE TO ITERATE OVER THE OLD ONE
    buckets = agg_clause_response["buckets"]  # type: ignore ****MOVE THIS LINE TO LATER
    for bucket in buckets:
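
To make the suggested reordering concrete, here is a minimal, self-contained sketch of the pattern: consume the current page first, and only then fetch the next one. It does not use pandagg or Elasticsearch; fetch_page and scan_buckets are hypothetical names, and fetch_page is an in-memory stand-in for a single composite aggregation request, so this is an illustration of the idea rather than the library's actual fix.

```python
from typing import Any, Dict, Iterator, List, Optional, Tuple

Bucket = Dict[str, Any]
AfterKey = Optional[Dict[str, Any]]


def fetch_page(after: AfterKey, size: int) -> Tuple[List[Bucket], AfterKey]:
    """Hypothetical stand-in for one composite aggregation request.

    Serves five fake buckets from memory, `size` at a time, starting
    just past the `after` key (or from the beginning when it is None).
    """
    data = [{"key": {"k": i}, "doc_count": 1} for i in range(5)]
    start = 0 if after is None else after["k"] + 1
    buckets = data[start:start + size]
    after_key = buckets[-1]["key"] if buckets else None
    return buckets, after_key


def scan_buckets(size: int) -> Iterator[Bucket]:
    """Yield every bucket, including those on the first page."""
    buckets, after_key = fetch_page(None, size)
    while True:
        # Iterate over the page we already hold BEFORE replacing it.
        yield from buckets
        # A short (or empty) page means there is nothing left to fetch.
        if len(buckets) < size:
            break
        buckets, after_key = fetch_page(after_key, size)


if __name__ == "__main__":
    for bucket in scan_buckets(size=2):
        print(bucket["key"], bucket["doc_count"])
```

The only difference from the snippet above is the ordering: the reassignment of buckets happens at the end of the loop body, after the current page has been yielded, so the first page is no longer dropped.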

@leonardbinet
Collaborator

Hi @Andy7475 , thanks for finding this bug 👍 I'll merge the fix as soon as I regain admin rights on this repo.

@leonardbinet leonardbinet added the bug Something isn't working label Feb 17, 2022
@leonardbinet
Collaborator

@Andy7475 the fixes and features are available in pandagg version 0.2.4 (#115).

@Andy7475
Author

Andy7475 commented Oct 31, 2022 via email

@Andy7475
Author

Subject: Request for a new release of Pandagg on PyPI

Hi Leonard,
I hope this email finds you well. I was trying to install the latest version of the package via pip, but I noticed that the version on PyPI (0.2.4) is not the same as the latest release on the GitHub master branch (0.2.1). Version 0.2.4 [dev branch] has a bug in the search.py file, on the line starting raw_data =..., but the master branch looks good.

I was wondering if it would be possible for you to publish a new version of the package from the master branch to PyPI, so that users can easily install the latest version using pip.

I understand that this may not be a priority for you, and I would be happy to assist in any way that I can. If there is anything I can do to help, please let me know.

Thank you for your time and for maintaining such a valuable package. I look forward to your response.

All the best,

Andy
