S3 list_objects_v2 paginator MaxItems only counts keys (Contents) not prefixes (CommonPrefixes) #2376
Comments
@bsmedberg-xometry - I am able to reproduce the issue. Thank you for pointing it out. Marking this as bug. |
@bsmedberg-xometry - After some digging into the code base, I found that this is the expected behavior. When it is not the first request (meaning a starting token is not included), we only consider the first result key of the response when truncating. So in this case we only truncate the Contents part of the response, not CommonPrefixes. Can you please confirm which count exceeds MaxItems: the number of keys you are getting, or the number of common prefixes? |
In this case, there will be nothing in Contents ever, there will only be CommonPrefixes. And the problem is not the truncation: the problem is that even after getting 2000 CommonPrefixes, it keeps making calls forever. I have worked around this locally by doing something like this:

    import boto3

    s3 = boto3.client('s3')
    paginator = s3.get_paginator('list_objects_v2')
    found_keys = 0
    found_prefixes = 0
    for result in paginator.paginate(Bucket='mybucket', Delimiter='/', PaginationConfig={'MaxItems': 2000}):
        # Count prefixes as well as keys, since MaxItems only counts keys.
        found_prefixes += len(result.get('CommonPrefixes', []))
        found_keys += len(result.get('Contents', []))
        # Use >= so that no extra page is fetched once the limit is reached.
        if found_prefixes + found_keys >= 2000:
            break  # stop iterating here to prevent eternal iteration

However, I don't believe that this is or should be the expected behavior of the boto3 paginator. If this is the expected behavior of the paginator, then the paginator docs need to be updated to warn of this behavior. |
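The workaround above can be generalized into a small helper. This is a minimal sketch, not part of boto3: `limit_listing` is a hypothetical name, and it assumes only that each page is a dict that may contain `Contents` and `CommonPrefixes` lists, as in the `list_objects_v2` responses discussed in this thread. It counts both lists against a single budget and trims the final page. Trimming prefixes before keys is an arbitrary choice made for this sketch.

```python
def limit_listing(pages, max_items):
    """Yield pages until max_items keys plus prefixes have been seen.

    Hypothetical helper (not a boto3 API). `pages` is any iterable of
    list_objects_v2-style dicts, for example paginator.paginate(...).
    """
    if max_items <= 0:
        return
    remaining = max_items
    for page in pages:
        # Spend the budget on prefixes first, then on keys (arbitrary order).
        prefixes = page.get('CommonPrefixes', [])[:remaining]
        remaining -= len(prefixes)
        contents = page.get('Contents', [])[:remaining]
        remaining -= len(contents)

        # Copy the page so the caller's original dict is left untouched.
        trimmed = dict(page)
        trimmed['CommonPrefixes'] = prefixes
        trimmed['Contents'] = contents
        yield trimmed

        # Stop before the loop asks the underlying iterator for another page.
        if remaining <= 0:
            return
```

Because the helper returns as soon as the budget is spent, it never pulls another page from the underlying iterator. Assuming that iterator is a boto3 paginator, that means no further requests are issued.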
While I agree with @bsmedberg-xometry (hey man, what's up? 😁) that this behavior seems really wrong (what good is the That said, I think it would be very helpful to be able to explain why it works this way. Can someone explain that to me, so I can write this up properly? |
Or even better: add a behavior to the S3 listobjects API call that actually does the right thing. I'm happy for this to turn into an S3 feature request instead of a boto problem if the problem is actually that S3 doesn't provide a logical pagination API. |
@kdaily LMK whether this should be documented as per the current behavior or if this is going to be fixed or otherwise addressed in the code, so I can slot in updating the documentation. |
Greetings! It looks like this issue hasn’t been active in longer than one year. We encourage you to check if this is still an issue in the latest release. In the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment or upvote with a reaction on the initial post to prevent automatic closure. If the issue is already closed, please feel free to open a new one. |
I believe that this issue is still valid. Neither of the docs at https://boto3.amazonaws.com/v1/documentation/api/latest/guide/paginators.html#customizing-page-iterators or https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.list_objects_v2 document this behavior, and there is still no way to limit the pagination to both prefixes and keys. |
@nateprewitt: please note whether this is something that needs to be addressed in boto3, or if it's a documentation issue and how it should be handled. |
Hi there, issue still persists. Is there any plan for resolution? |
I am working on collecting information to update content to reflect this information but it is taking time as I want to be sure I say the truth and not just a best guess at what everything means. Getting this resolved is on my current list of tasks, but it has been for a few weeks. It has not been forgotten! |
@bsmedberg-xometry I just read through this again -- you say that the paginator loops infinitely? I can't imagine why that would be expected behavior. Can someone please confirm this again for me? I can't reproduce that. Instead, I get 20,000 |
I have completed the documentation update for Botocore, and that PR is waiting to be merged. I'm now starting work on the same changes to the Boto3 guide. |
The PR for the boto3 guide changes is also now complete. |
Describe the bug
When using boto3 to iterate an S3 bucket with a Delimiter, MaxItems only counts the keys, not the prefixes. So if you have a bucket with only prefixes, MaxItems will never stop searching and may take unbounded time.
Steps to reproduce
Set up a bucket with 20000 keys of the form result1/results.txt ... result20000/results.txt
Run this code:
Expected behavior
The above program should return a maximum of 2000 results. It actually returns all 20,000 prefixes, because MaxItems only counts keys (Contents), not prefixes (CommonPrefixes).
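To make the report concrete without an AWS account, here is a simplified model of the counting behavior described in this thread. It is a sketch under stated assumptions, not botocore's actual code: `fake_pages` and `paginate_counting_primary_only` are hypothetical names, and the page size of 1000 is assumed for illustration. Only `Contents` is counted against the limit, so a listing made up entirely of `CommonPrefixes` is never cut off.

```python
def fake_pages(total_prefixes, page_size=1000):
    """Simulate delimiter-listing pages that contain only CommonPrefixes."""
    for start in range(1, total_prefixes + 1, page_size):
        stop = min(start + page_size, total_prefixes + 1)
        yield {'CommonPrefixes': [{'Prefix': f'result{i}/'} for i in range(start, stop)]}


def paginate_counting_primary_only(pages, max_items, primary_key='Contents'):
    """Simplified model: only entries under primary_key count toward max_items."""
    total = 0
    for page in pages:
        items = page.get(primary_key, [])
        room = max_items - total
        if len(items) >= room:
            # The limit is reached on this page: trim the primary key and stop.
            page = dict(page)
            page[primary_key] = items[:room]
            yield page
            return
        total += len(items)
        yield page


# 20,000 prefixes, no keys: the limit of 2000 is never reached.
pages = paginate_counting_primary_only(fake_pages(20000), max_items=2000)
found = sum(len(page.get('CommonPrefixes', [])) for page in pages)
print(found)  # 20000
```

In this model the loop ends only because the simulated listing is finite. Every page is still fetched, which matches the report that the limit has no effect when the bucket listing contains only prefixes.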