Integrate portfolio processing into runner #45

bcspragu · 2023-10-30T15:32:17Z

This PR integrates the runner with the PACTA portfolio processing code, using Azure blob storage to read and write portfolios and reports.

It creates a minimal set of pactasrv APIs and frontend pages to test out the process of 1) uploading portfolios and 2) running the PACTA processing on them.

I've tested the `process_portfolio` piece against local Docker (i.e. `dockertask`), but I haven't tested the `create_report` flow yet in its current incarnation. This PR is already large enough that I'll add the relevant async task stuff in a follow-up PR.


gbdubs · 2023-11-06

Truly excellent PR. Thank you for writing this, it feels like the PACTA work is really coming together nicely.

cmd/runner/main.go

gbdubs · 2023-11-06T12:12:51Z

cmd/runner/main.go

```diff
+	return nil
+}
+
+func (h *handler) processPortfolio(ctx context.Context, req *task.ProcessPortfolioRequest) error {
```

This is very clear - thank you!

gbdubs · 2023-11-06T12:13:16Z

cmd/runner/main.go

```diff
+}
+
+func (h *handler) processPortfolio(ctx context.Context, req *task.ProcessPortfolioRequest) error {
+	// Load the portfolio from blob storage, place it in /mnt/raw_portfolios, where
```

s/portfolio/portfolios/ (many places)

bcspragu · 2023-11-07

That's actually a question I had for you about how we want to architect async task running. I was originally thinking that the async runner would get a "portfolio group ID" or whatever, but that means it needs to have database access to translate from "portfolio group X" to "list of assets in portfolios".

I'd like it to be as dumb as possible because it's a large Docker image with code from N different teams in N different languages, so ideally we could sculpt the interface to be "here are some files in blob store, do stuff to them and write them to this other blob store", which minimizes the permissions that we need to give the runner.

My question here is what the end-user experience/data model looks like and if that is a reasonable/achievable goal. Are there reasons we can't process portfolios within a portfolio group individually at this stage? Let's chat if it's not a simple answer.

gbdubs · 2023-11-06T12:14:43Z

cmd/runner/main.go

```diff
+		paths = append(paths, strings.TrimSpace(line[idx+17:]))
+	}
+
+	// NOTE: This code could benefit from some concurrency, but I'm opting not to prematurely optimize.
```

Same thought here as above, maybe less important, since the cardinality is O(report outputs) not O(portfolio inputs).

bcspragu · 2023-11-07

Same comment as above: my knowledge of what the actual data steps look like is fuzzy, and I don't know what the discrete units of compute are.

gbdubs · 2023-11-06T12:16:36Z

cmd/runner/main.go

```diff
+
+	// Load the processed portfolio from blob storage, place it in /mnt/
+	// processed_portfolios, where the `create_report.R` script expects it
+	// to be.
```

I'm probably just missing context that you have - is a portfolio report always generated off of exactly one processed portfolio? I thought it (like above) could be multiple?

bcspragu · 2023-11-07

Great question! I actually don't know, and I expect this code will require a bunch of tweaks once Alex's stub interface gets reconciled with the actual code interface.


bcspragu requested a review from gbdubs October 30, 2023 15:33

bcspragu merged commit 0aa5fcb into main Oct 30, 2023
2 checks passed

bcspragu deleted the brandon/runner-blobs branch October 30, 2023 15:50

gbdubs reviewed Nov 6, 2023


bcspragu added a commit that referenced this pull request Nov 7, 2023

Address some feedback from #45

5cbf241

Specifically, add the `task_id` to log lines in the runner so we can figure out how long tasks are taking from the log output.

bcspragu added a commit that referenced this pull request Nov 7, 2023

Address some feedback from #45 (#54)

90a1cd2
