Integrate portfolio processing into runner #45
Conversation
This PR integrates the runner with the PACTA portfolio processing code, using Azure blob storage to read and write portfolios and reports. It creates a minimal set of `pactasrv` APIs and frontend pages to test out the process of 1) uploading portfolios and 2) running the PACTA processing on them. I've tested the `process_portfolio` piece against local Docker (i.e. `dockertask`), but I haven't tested the `create_report` flow yet in its current incarnation. This PR is already large enough that I'll add the relevant async task stuff in a follow-up PR.
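For a concrete picture of the blob half of this (reading a raw portfolio out of Azure blob storage and staging it on local disk for the processing scripts), here is a minimal sketch against the plain `azblob` SDK. The helper name, container handling, and paths are illustrative assumptions; the PR itself presumably routes this through the repo's own blob package.

```go
// Illustrative sketch only: downloadRawPortfolio and its container/path
// handling are assumptions, not the PR's actual code.
package runner

import (
	"context"
	"io"
	"os"
	"path/filepath"

	"github.com/Azure/azure-sdk-for-go/sdk/storage/azblob"
)

// downloadRawPortfolio copies a single portfolio blob into /mnt/raw_portfolios,
// where the PACTA processing scripts expect to find their inputs.
func downloadRawPortfolio(ctx context.Context, client *azblob.Client, container, blobName string) (string, error) {
	dest := filepath.Join("/mnt/raw_portfolios", filepath.Base(blobName))
	f, err := os.Create(dest)
	if err != nil {
		return "", err
	}
	defer f.Close()

	resp, err := client.DownloadStream(ctx, container, blobName, nil)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	if _, err := io.Copy(f, resp.Body); err != nil {
		return "", err
	}
	return dest, nil
}
```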
Truly excellent PR. Thank you for writing this; it feels like the PACTA work is really coming together nicely.
return nil
}

func (h *handler) processPortfolio(ctx context.Context, req *task.ProcessPortfolioRequest) error {
This is very clear - thank you!
}

func (h *handler) processPortfolio(ctx context.Context, req *task.ProcessPortfolioRequest) error {
	// Load the portfolio from blob storage, place it in /mnt/raw_portfolios, where
s/portfolio/portfolios/(many places)
That's actually a question I had for you about how we want to architect async task running. I was originally thinking that the async runner would get a "portfolio group ID" or whatever, but that means it would need database access to translate from "portfolio group X" to a list of assets in portfolios.
I'd like it to be as dumb as possible because it's a large Docker image with code from N different teams in N different languages, so ideally we could sculpt the interface to be "here are some files in blob store, do stuff to them and write them to this other blob store", which minimizes the permissions that we need to give the runner.
My question here is what the end-user experience/data model looks like and if that is a reasonable/achievable goal. Are there reasons we can't process portfolios within a portfolio group individually at this stage? Let's chat if it's not a simple answer.
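To make that "here are some files in blob store, do stuff to them and write them to this other blob store" interface concrete, a request along the following lines would keep the runner free of database credentials. The type and field names here are hypothetical sketches, not the PR's actual `task.ProcessPortfolioRequest`.

```go
// Hypothetical shape of a "files in, files out" task payload.
package task

// A request the runner can execute without any database access: the portal
// resolves "portfolio group X" into concrete blob locations before enqueueing.
type BlobOnlyProcessRequest struct {
	// Portfolio blobs to process, already resolved by the portal.
	SourceBlobURIs []string `json:"source_blob_uris"`
	// Container/path prefix where the runner writes processed outputs.
	DestinationPrefix string `json:"destination_prefix"`
}
```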
paths = append(paths, strings.TrimSpace(line[idx+17:]))
}

// NOTE: This code could benefit from some concurrency, but I'm opting not to prematurely optimize.
Same thought here as above, maybe less important, since the cardinality is O(report outputs) not O(portfolio inputs).
Same comment as above: my knowledge of what the actual data steps look like is fuzzy, and I don't know what the discrete units of compute are.
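If that NOTE ever gets revisited, one low-risk shape for the concurrency (whichever cardinality it ends up being, portfolio inputs or report outputs) is a bounded errgroup. This is a sketch, not part of the PR, and the helper signature is assumed.

```go
// Sketch of deferred concurrency: a bounded errgroup over whatever per-blob
// work the runner does. The helper signature is hypothetical.
package runner

import (
	"context"

	"golang.org/x/sync/errgroup"
)

// forEachBlob runs one unit of work per blob with a small, fixed parallelism
// bound, and stops early if any unit fails.
func forEachBlob(ctx context.Context, blobNames []string, work func(context.Context, string) error) error {
	g, ctx := errgroup.WithContext(ctx)
	g.SetLimit(4) // arbitrary bound; tune to the container's I/O and memory headroom

	for _, name := range blobNames {
		name := name // capture the loop variable (needed before Go 1.22)
		g.Go(func() error {
			return work(ctx, name)
		})
	}
	return g.Wait()
}
```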
// Load the processed portfolio from blob storage, place it in /mnt/
// processed_portfolios, where the `create_report.R` script expects it
// to be.
I'm probably just missing context that you have - is a portfolio report always generated off of exactly one processed portfolio? I thought it (like above) could be multiple?
Great question! I actually don't know, and I expect this code will require a bunch of tweaks once Alex's stub interface gets reconciled with the actual code interface.
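However that one-or-many question shakes out, the staging step can stay agnostic: copy however many processed-portfolio blobs the request names into `/mnt/processed_portfolios`, then invoke the R script. A rough sketch follows, with the `Rscript` invocation being an assumption rather than the script's confirmed interface.

```go
// Sketch only: hand-off to create_report.R once the processed portfolios
// (one or many) have been staged under /mnt/processed_portfolios.
package runner

import (
	"context"
	"fmt"
	"os/exec"
)

func createReport(ctx context.Context, scriptPath string) error {
	// Assumes every processed-portfolio blob named in the request has already
	// been copied into /mnt/processed_portfolios, which is where
	// create_report.R looks for its inputs.
	cmd := exec.CommandContext(ctx, "Rscript", scriptPath)
	out, err := cmd.CombinedOutput()
	if err != nil {
		return fmt.Errorf("running %s: %w\noutput: %s", scriptPath, err, out)
	}
	return nil
}
```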
As a follow-up, add the `task_id` to log lines in the runner so we can figure out how long tasks are taking from the log output.
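A minimal sketch of what that could look like, using the standard library's `log/slog` purely for illustration; the runner's actual logging setup may differ.

```go
// Derive a per-task logger that stamps task_id on every line, so task
// durations can be read back out of the logs.
package runner

import (
	"log/slog"
	"time"
)

// runWithTaskLogging wraps a unit of task work with start/finish log lines
// that all carry the task ID.
func runWithTaskLogging(base *slog.Logger, taskID string, do func() error) error {
	logger := base.With("task_id", taskID)
	start := time.Now()
	logger.Info("task started")

	err := do()

	logger.Info("task finished",
		slog.Duration("duration", time.Since(start)),
		slog.Bool("success", err == nil))
	return err
}
```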