Add a handler that appends a static date field to all outbound messages #136

Open
salsferrazza opened this issue Apr 11, 2023 · 2 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@salsferrazza
Collaborator

Some feeds provide timestamps only as a duration past midnight and assume the consumer already knows which day the messages are from. Since the messages themselves don't carry this context, it would be useful to be able to take a statically specified date, convert it to Unix milliseconds, and append it as a manufactured int field to outbound messages. This would reduce friction for downstream SQL analytics on messages ingested from these feeds.

e.g., --message_handlers AppendDateHandler:date=20291102

would turn that date into its UNIX seconds equivalent and append that as a column to all outbound messages.
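
A minimal sketch of the idea, assuming messages are plain dicts and that midnight is taken in UTC. The names `parse_handler_spec`, `date_to_epoch_seconds`, `append_static_date`, and the field name `date_epoch_seconds` are hypothetical and not part of the transcoder's API; the comma separator for multiple parameters is also an assumption.

```python
import calendar
from datetime import datetime


def parse_handler_spec(spec: str) -> tuple[str, dict[str, str]]:
    """Split 'Name:key=value,key2=value2' into a name and a params dict.

    The comma separator between params is an assumption of this sketch.
    """
    name, _, raw_params = spec.partition(":")
    params = {}
    for pair in filter(None, raw_params.split(",")):
        key, _, value = pair.partition("=")
        params[key.strip()] = value.strip()
    return name, params


def date_to_epoch_seconds(yyyymmdd: str) -> int:
    """Return UNIX seconds for midnight UTC of a YYYYMMDD date."""
    day = datetime.strptime(yyyymmdd, "%Y%m%d")
    # timegm treats the time tuple as UTC, unlike time.mktime (local time).
    return calendar.timegm(day.timetuple())


def append_static_date(message: dict, yyyymmdd: str,
                       field: str = "date_epoch_seconds") -> dict:
    """Return a copy of the message with the static date field appended."""
    out = dict(message)
    out[field] = date_to_epoch_seconds(yyyymmdd)
    return out


name, params = parse_handler_spec("AppendDateHandler:date=20291102")
message = {"symbol": "ABC", "nanos_past_midnight": 34_200_000_000_000}
print(name, append_static_date(message, params["date"]))
```

Multiplying the result by 1000 would give milliseconds instead of seconds, if that is the unit the downstream schema expects.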

@salsferrazza salsferrazza added enhancement New feature or request good first issue Good for newcomers labels Apr 11, 2023
@mservidio
Collaborator

This could be an additional option to the timestamp pull forward handler.

https://github.com/GoogleCloudPlatform/market-data-transcoder/blob/main/transcoder/message/handler/TimestampPullForwardHandler.py

@salsferrazza
Collaborator Author

Good point. Now that handlers are somewhat configurable, this could become a single TimestampHandler with several modes, e.g.: pull forward from a Seconds message, append a static date, manufacture a single timestamp column from a nanos timestamp + date, etc.
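
As a rough illustration of the pull-forward mode only, here is a sketch that assumes the feed emits a periodic Seconds message whose value applies to the messages that follow it. The function name `pull_forward_seconds` and the field names `type`, `seconds`, and `nanos` are hypothetical, and dropping the Seconds message from the output is a choice made for this sketch, not a description of the existing handler.

```python
from typing import Iterable, Iterator, Optional


def pull_forward_seconds(messages: Iterable[dict],
                         seconds_type: str = "Seconds") -> Iterator[dict]:
    """Copy the most recent Seconds value onto each following message.

    Field names 'type' and 'seconds' are illustrative assumptions.
    """
    last_seconds: Optional[int] = None
    for message in messages:
        if message.get("type") == seconds_type:
            # Remember the value; this sketch does not emit the message itself.
            last_seconds = message["seconds"]
            continue
        out = dict(message)
        if last_seconds is not None:
            out["seconds"] = last_seconds
        yield out


stream = [
    {"type": "Seconds", "seconds": 34200},
    {"type": "Trade", "nanos": 5},
    {"type": "Seconds", "seconds": 34201},
    {"type": "Trade", "nanos": 7},
]
pulled = list(pull_forward_seconds(stream))
print(pulled)
```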

Algorithmically, normalizing dates from low-context streams (e.g., messages providing only nanos past midnight) might look something like:

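from datetime import datetime, timezone

# YYYYMMDD from MessageHandler params; strptime parses the compact form on any Python 3
day = datetime.strptime('20191230', '%Y%m%d').replace(tzinfo=timezone.utc)  # assumes UTC midnight
day_epoch = int(day.timestamp())                    # UNIX seconds equivalent, kept as int
midnight_in_nanos = day_epoch * 1_000_000_000       # int math avoids float rounding
epoch_nanos = midnight_in_nanos + msg['nanos_past_midnight']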

Then the handler can manufacture any combination of those values as fields, or unify them into a single field (like ts_epoch_nanos).
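
A small worked example of both output shapes, assuming UTC midnight and dict-shaped messages. The function `manufacture_fields` and its `unify` flag are hypothetical; `day_epoch`, `midnight_in_nanos`, `nanos_past_midnight`, and `ts_epoch_nanos` follow the names used above.

```python
from datetime import datetime, timezone

NANOS_PER_SECOND = 1_000_000_000


def manufacture_fields(msg: dict, day_epoch: int, unify: bool = True) -> dict:
    """Return a copy of msg with one unified field or the separate parts."""
    out = dict(msg)
    midnight_in_nanos = day_epoch * NANOS_PER_SECOND
    if unify:
        out["ts_epoch_nanos"] = midnight_in_nanos + msg["nanos_past_midnight"]
    else:
        out["day_epoch"] = day_epoch
        out["midnight_in_nanos"] = midnight_in_nanos
    return out


day_epoch = 1_577_664_000                          # 2019-12-30T00:00:00Z
msg = {"nanos_past_midnight": 34_200_000_000_123}  # 09:30:00.000000123
unified = manufacture_fields(msg, day_epoch)
separate = manufacture_fields(msg, day_epoch, unify=False)

# Round-trip check: split back into whole seconds and leftover nanos.
seconds, nanos = divmod(unified["ts_epoch_nanos"], NANOS_PER_SECOND)
print(datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat(), nanos)
# → 2019-12-30T09:30:00+00:00 123
```

Keeping every value as a Python int matters here: a float at this magnitude can only represent multiples of 256 ns, so the trailing nanoseconds would be lost.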
