Option to choose or filter data and create multiple files on s3 #119
Comments
Can you give an example of a filter you'd like to apply?
How would you "route" which record goes to which file?
Our json record contains a key called "type". Based on type, we want to create separate files.
@BenFradet I don't think this feature is there yet right?
No there is no such feature at the moment. However, this use case seems to be very specific to your setup and it would be hard to translate into a feature everyone can use since you'd want to inspect every json and act on their content. How would you do it?
This is quite meta, but I think there's a case for us taking the logic of Kinesis Tee over time and making that embeddable in loaders like this, so you can do partitioning and final transformation inside a loader.
@BenFradet Maybe, but I think a lot of people would need this. Anyone who uses a single Kinesis stream to store all their data would need it, and the majority are using one Kinesis stream. @alexanderdean Something like that: a way to add a filter on the input Kinesis stream and choose the type of data we want to push to S3. For now, we only need one type of data on S3 for later analysis, but the loader is pushing the other three types too, which causes ambiguity and puts large files on S3. Elastic's Logstash gives all the options: it pulls from Kinesis, applies filters, and then pushes to Elasticsearch and S3. The problem with its S3 output is that when it creates an S3 file, the JSON entries are not separated by a \n, which makes analysis difficult.
So, we are using Snowplow to read data from our Kinesis stream and put it on S3.
There is no option to apply a filter to choose what data to pull from Kinesis and put on S3.
Also, it would be good to have an option to pull different data from the same Kinesis stream and push it to different files in S3, like Logstash.
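To make the request concrete, here is a rough sketch of the behaviour being asked for, written as standalone Python rather than as anything the loader provides today. The function name `partition_by_type`, the `wanted_types` parameter, and the sample records are all hypothetical; the only thing taken from this thread is that each JSON record carries a "type" key and that entries in an output file should be separated by a \n.

```python
import json
from collections import defaultdict


def partition_by_type(raw_records, wanted_types=None, type_key="type"):
    """Group JSON records by their `type_key` value.

    Hypothetical helper, not part of any Snowplow loader.
    Returns a dict mapping each type to a newline-delimited JSON string,
    i.e. the content of one output file per type.
    """
    groups = defaultdict(list)
    for raw in raw_records:
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip records that are not valid JSON
        if not isinstance(record, dict):
            continue  # only JSON objects can carry a "type" key
        record_type = record.get(type_key)
        if not isinstance(record_type, str):
            continue  # no usable type, so the record cannot be routed
        if wanted_types is not None and record_type not in wanted_types:
            continue  # filtered out: this type was not requested
        groups[record_type].append(json.dumps(record, sort_keys=True))
    # One entry per line, with a trailing newline, so files can be concatenated
    return {t: "\n".join(lines) + "\n" for t, lines in groups.items()}


if __name__ == "__main__":
    sample = [
        '{"type": "click", "id": 1}',
        '{"type": "view", "id": 2}',
        '{"type": "click", "id": 3}',
    ]
    # Keep only "click" records, as in the "we just need 1 type" case
    for record_type, body in partition_by_type(sample, {"click"}).items():
        print(record_type)
        print(body, end="")
```

Each value in the returned dict could then be uploaded as its own S3 object, for example under a key prefix named after the type. Passing `wanted_types=None` covers the "different files per type" case, while passing a set covers the "only one type" case.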