Process S3 event messages from SQS #215
base: master
Conversation
Added the ability to consume S3 event messages from an SQS queue. This enables multiple consumers of S3 event notifications for the same prefix: for example, you could subscribe an SNS topic to the S3 event for a given prefix and then subscribe multiple SQS queues to that topic. This expands the usage options for multiple, varying consumers of the same S3 prefix event.
Hi, we'd like to expose the option of having multiple consumers of S3 events for a given prefix. AWS only allows one consumer of S3 events per prefix; that consumer can be a Lambda function, an SQS queue, or an SNS topic. If the RedshiftLambdaLoader could consume S3 events indirectly via an SQS queue, this would open up the option of using an SNS topic as the subscriber of the S3 events. We could then subscribe multiple queues: for example, loading the data into Redshift via the RedshiftLambdaLoader while independently consuming from a separate queue with a process that ingests the data into an index. I've created the attached PR for review. Please note that in my example I've also allowed the processing of multiple SQS messages in the same request. In my testing I enabled raw message delivery on an SNS topic, to which I subscribed an SQS queue that the RedshiftLambdaLoader could consume from; this minimised the impact of the changes. Please note, I'm not fully convinced that processing multiple SQS messages in the same request is safe with the batch locking logic. I believe it may handle this, although perhaps it was not intended to do so? Could you review and let me know your thoughts on exposing such a feature? Would this approach cause you any concerns? Thanks.
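To illustrate the SNS-to-SQS fan-out described above, here is a minimal sketch (not the project's actual code; `extractS3Records` is a hypothetical helper) of unwrapping S3 event records from an SQS-triggered Lambda event, assuming raw message delivery is enabled on the SNS subscription so each SQS message body is the S3 notification JSON itself:

```javascript
// Sketch: unwrap S3 event records from an SQS-triggered Lambda event,
// assuming SNS raw message delivery, so the SQS body is the S3
// notification JSON rather than an SNS envelope.
function extractS3Records(sqsEvent) {
  const records = [];
  for (const msg of sqsEvent.Records || []) {
    let body;
    try {
      body = JSON.parse(msg.body);
    } catch (e) {
      // Non-JSON bodies are consumed and skipped rather than failing the batch
      console.log("Ignoring non-JSON message " + msg.messageId);
      continue;
    }
    for (const rec of body.Records || []) {
      if (rec.eventSource === "aws:s3") {
        records.push(rec);
      }
    }
  }
  return records;
}

// Example: one SQS message wrapping one S3 ObjectCreated notification
const sample = {
  Records: [{
    messageId: "1",
    body: JSON.stringify({
      Records: [{
        eventSource: "aws:s3",
        s3: { bucket: { name: "my-bucket" }, object: { key: "prefix/file.csv" } }
      }]
    })
  }]
};
console.log(extractS3Records(sample).length); // 1
```

With raw message delivery disabled, the SQS body would instead be an SNS envelope whose `Message` field contains the S3 notification, requiring one more level of JSON parsing.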
Hi @IanMeyers, would you mind reviewing my suggestion above? Much appreciated.
Sorry, I have been on holiday. Will review Monday the 10th.
Thank you, have a great weekend.
After testing, I confirmed my initial concerns that processing the SQS messages in a tight loop will cause problems: `Failing further processing to Batch 7a2afa1f-7dc5-45e8-b0ad-677ac59738d1 which may be stuck in 'locked' state. If so, unlock the batch using` I'm still digging into the cause; I think a Redshift maintenance window might be in play. Perhaps the batch was left in a locked state because the Lambda timed out and did not mark the batch as failed.
Lambda timeouts can indeed result in locked batches.
Yeah, I've seen logs that look like this. It's possible I simply got unlucky and the function did not manage to fail the batch, leaving it in a locked state. I've had my changes running via SQS, loading batches, for 12 hours now. The data I'm testing with gets placed onto S3 every 15 minutes.
I'll be testing this over a longer time period, but this is ready for review now. I've increased my timeouts a little and added failed-batch re-processing with SQS delayed delivery to ride out Redshift maintenance windows.
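The delayed-redelivery idea mentioned above could be sketched as follows (a hypothetical helper, not code from this PR; the base delay and doubling policy are assumptions). SQS's `DelaySeconds` parameter is capped at 900 seconds, so the computed delay must be clamped:

```javascript
// Sketch: compute a DelaySeconds value for re-queueing a failed batch,
// doubling per attempt and capped at SQS's 900-second maximum, so that
// retries can ride out a Redshift maintenance window.
const SQS_MAX_DELAY_SECONDS = 900;

function redeliveryDelaySeconds(attempt, baseSeconds = 60) {
  // attempt 1 -> 60s, attempt 2 -> 120s, attempt 3 -> 240s, ... capped at 900s
  const delay = baseSeconds * Math.pow(2, attempt - 1);
  return Math.min(delay, SQS_MAX_DELAY_SECONDS);
}

console.log(redeliveryDelaySeconds(1)); // 60
console.log(redeliveryDelaySeconds(5)); // 900 (capped)
```

The returned value would be passed as the `DelaySeconds` parameter of an SQS `SendMessage` call when re-queueing the failed batch's message.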
Interestingly, I noticed something in the logs (three entries at 2020-08-12T13:48:12.102, .115, and .458 +01:00) which I don't think is in any way related to this PR. I confirmed that the `Remaining duration of -89145 insufficient to load cluster` message did pertain to the same request 1a5aa337-9f84-5f13-9486-f49a78bcbb46 by checking the batches table. This means the context.getRemainingTimeInMillis() function returned -89145 just 13ms after request invocation, when my Lambda had been configured with a whopping 2 minute 30 second timeout. The number makes no sense, and I'm not convinced getRemainingTimeInMillis is behaving nicely. I've upgraded my Node.js runtime from 10.x to 12.x in case there was an AWS defect; I couldn't find a record of one, but I'll see if that eliminates the strange behaviour around context.getRemainingTimeInMillis().
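One defensive option for the anomalous reading above is a sanity check before trusting `getRemainingTimeInMillis()`. This is a hypothetical sketch, not code from this PR, and `MIN_REQUIRED_MS` is an assumed threshold: a sane value must be positive and no larger than the configured function timeout, since the remaining time can never exceed it.

```javascript
// Sketch: reject implausible readings from context.getRemainingTimeInMillis()
// (like the -89145 seen in the logs) before deciding whether to start a load.
const MIN_REQUIRED_MS = 10000; // assumption: a load needs at least 10 seconds

function hasTimeToLoad(remainingMs, configuredTimeoutMs) {
  // An implausible reading: negative, zero, or larger than the function's
  // own configured timeout. Treat it as "do not proceed" and let the SQS
  // message be redelivered rather than acting on garbage.
  if (remainingMs <= 0 || remainingMs > configuredTimeoutMs) {
    return false;
  }
  return remainingMs >= MIN_REQUIRED_MS;
}

console.log(hasTimeToLoad(-89145, 150000)); // false - the anomalous reading
console.log(hasTimeToLoad(140000, 150000)); // true
```

A stricter variant could re-read the value once and fail the batch explicitly (rather than leaving it locked) if the reading stays implausible.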
Hmm. I can't regress this. Also, if this consistently happened, then no one's loads would ever succeed. I think there's something else going on here, but I don't see anything obvious in the code that could cause it - this is a single instance of the function being used, and while it accesses
Yeah, I've just seen the same issue with 12.x. It's infrequent - I've seen it 3 times in 48 batches.
Can you please raise a support ticket?
Yes, I'll raise a ticket for it. I'll spend a little more time this afternoon to see if I can spot anything that might explain it.
@IanMeyers this has gone through rounds of testing and is looking good. What are your thoughts on rolling this in? Kind regards, Chris.
Thanks for the info. In general I'm supportive, but I need to do a code review. I will book out some time tomorrow (18/9) to do so.
Thank you @IanMeyers |
[REQ-2126] - consume the message for bad events, log and move on.
[REQ-2126] Safely consume and ignore AWS test events, which, despite being test events, do not honour raw message delivery settings.
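The test events mentioned in the commit above are the `s3:TestEvent` notifications that S3 delivers when a notification configuration is first created. A minimal sketch of recognising them (hypothetical helper name, not the PR's actual code) so they can be consumed and ignored rather than treated as a load:

```javascript
// Sketch: detect the S3 test notification ({"Event": "s3:TestEvent"}) so it
// can be acknowledged and dropped instead of being processed as a data event.
function isS3TestEvent(messageBody) {
  try {
    const body = JSON.parse(messageBody);
    return body.Event === "s3:TestEvent";
  } catch (e) {
    // Not JSON, so it cannot be an S3 test event
    return false;
  }
}

const testEvent = JSON.stringify({
  Service: "Amazon S3",
  Event: "s3:TestEvent",
  Bucket: "my-bucket"
});
console.log(isS3TestEvent(testEvent));                     // true
console.log(isS3TestEvent(JSON.stringify({ Records: [] }))); // false
```

Consuming (i.e. deleting) these messages instead of erroring prevents them from cycling endlessly through the queue's redrive policy.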
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.