Skip to content
This repository has been archived by the owner on May 31, 2024. It is now read-only.

feat: Download and report WES run log files #319

Merged
merged 4 commits into from
Mar 22, 2022

Conversation

adamnovak
Copy link
Contributor

Issue #, if available: N/A

Description of Changes

Currently, AGC does not retrieve the stdout and stderr URLs referenced by the WES server for a workflow run. The URLs themselves are available in the code, but their content is not fetched and displayed to the user. This can make it hard for the user to debug a workflow run with an engine like Toil (because error logs for individual workflows are not available).

This PR extends the WES API bindings with a method that can fetch the URLs returned with workflow logs, by evaluating them relative to the URL of the document containing them and by properly signing the requests. It also reports these logs on the command line when available.

Description of how you validated changes

As part of 822e51a I used these changes to test and debug the Toil engine for #317.

Checklist

  • If this change would make any existing documentation invalid, I have included those updates within this PR
  • I have added unit tests that prove my fix is effective or that my feature works
    I don't see any tests at the moment for GetRunLog on the Manager (and there's no suite to put one in), or any tests for the WES client, but I added some tests for the CLI output based on what the Manager returns that exercise the new fields at that level.
  • I have linted my code before raising the PR
  • Title of this Pull Request follows Conventional Commits standard: https://www.conventionalcommits.org/en/v1.0.0/

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@patmagee
Copy link

Hey @adamnovak does this add the ability to get run logs from the AGC or from the WES API directly? The WES specification requires that run and task level logs are accesible using the same credentials that the rest of the WES api uses. What I have done in the past is basically have a proxy endpoint which I simply redirect users to a signed_url of the actual log contents if that helps?

@adamnovak
Copy link
Contributor Author

@patmagee What this does is make the agc WES client actually query the log content URLs that the WES server sends, and then display those logs to the user. I'm not touching any of the WES server implementations here.

* @param dataUrl The URL string from the GetRunLog call
@return string
*/
func (a *WorkflowExecutionServiceApiService) GetRunLogData(ctx _context.Context, runId string, dataUrl string) (string, *_nethttp.Response, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this is inspired by the OpenAPI-generated api_workflow_execution_service.go? Could we be a little less verbose if we don't think of this as an extension/part of the WES client?

I find WES to be a little under-specified in some places, but I'm no expert and am pretty much going off the OpenAPI spec. If I'm wrong about any of my interpretations of what this URL should be and what should be contained there, let me know.

If we're assuming these are just text files available at either an absolute URL or one relative to the WES endpoint, can we just download that file? Do we need to bother with checking the WES-specified error codes and content types? I'm not sure we have any guarantees that the endpoint hosting the file will respond with the error codes/payloads that the WES client is expecting.

Not something we need to solve right now, but what's prompting me to consider a slightly less WES-endpoint-coupled approach is that most of our existing engines run in Batch, and we currently access their logs via Batch+Cloudwatch. Cloudwatch Logs don't have a standard URL scheme, but I suspect we'll (AGC) eventually want to extend this to allow for loading stdout/stderr from an S3 file or a Cloudwatch Log stream as well as via a URL.

For now, I'm just wondering if we can clean up some of this generated-ish code by just adding some plain-old file loading to packages/cli/internal/pkg/wes/Client.go

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not so much "inspired by" the OpenAPI-generated code as it is "a complete copy-paste of" the OpenAPI-generated code. I did that for consistency with the rest of the WES API implementation, since retrieving these logs is specified in WES (just in English that the code generator can't read). I also did it because I thought it would work better, faster, than me trying to write something from scratch.

If we don't care about consistency, we could streamline this to look a little more like a human wrote it.

It's not quite as simple as just downloading the file; we definitely need to visit prepareRequest, which does the signing to get the request through AGC's proxy, assuming the URL we got actually points to the server that gave it to us.

I left all the error code detection and error response parsing identical to the WES API request I copied from. If we get an error specified by WES, we'll understand it as specified by WES, and if we get some other error, we won't really understand it but we'll report it anyway. Probably nothing bad would happen if I cut the code, but I took leaving it in as the default.

This should eventually be extended to support S3 or CloudWatch URLs; it could even get AGC away from the | task names.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd prefer human-readable code to code that's consistent with the generated code, personally. Generated code is good in that it's generated, but not good for much else 😛

If these URLs being URLs on the WES server itself is common, then I'm fine with continuing to prepare requests and parse errors in the WES style.

Comment on lines 109 to 118
if localVarHTTPResponse.StatusCode == 403 {
var v ErrorResponse
err = a.client.decode(&v, localVarBody, localVarHTTPResponse.Header.Get("Content-Type"))
if err != nil {
newErr.error = err.Error()
return localVarReturnValue, localVarHTTPResponse, newErr
}
newErr.model = v
return localVarReturnValue, localVarHTTPResponse, newErr
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above. It might be reasonable to expect the endpoint to reply with standard HTTP 403/etc. codes, but can we expect the response payload to be something the WES client would be able to decode?

Copy link
Contributor Author

@adamnovak adamnovak Mar 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way Toil implements this is by having log-retrieval endpoints as Toil-specific extensions to the WES server, so when they return errors they will probably be WES-style errors (401, 403, 404, or 500, with a JSON body with msg and status_code). So if we do get such an error, we probably should parse it as laid out in WES, if we can.

If we get something else, we definitely need to still return an error, but I'm not sure what kind of error. Right now it looks like it would be undefined response type or some complaint about unmarshalling. Maybe when decoding fails, instead of returning the error decode raised (wrapped as a WES error), we should return something more like Error: the server said <a bunch of bytes the server sent>

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose that's reasonable if that's a common case for these URLs. But yeah, I would suggest that we be prepared to just output the raw response of the server in case the error isn't a WES error. Perhaps just printing that something failed in the normal case and printing the raw response from the server in a debug case?

return localVarReturnValue, localVarHTTPResponse, err
}

localVarBody, err := _ioutil.ReadAll(localVarHTTPResponse.Body)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we concerned about using ReadAll to load a potentially-large log file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not overly concerned about file size here; even a very large workflow log is probably under a gigabyte.

Streaming would of course be better, but I didn't want to invest the time to learn, design and implement a good way to stream it here. It's going to the console, so when it starts to get too large for memory it starts to get too large to really be useful. But it's also not a good idea to let a very large file crash the client.

(Really we would want to be able to make HEAD and byte range requests or something, and be able to get just a manageable piece from the end of the log, and to let the user then ask for other parts of the log somehow, but I also didn't want to build out that whole system right now.)

Maybe implementing streaming wouldn't actually be more work than it's worth. I could revise the download code to be able to handle errors internally while streaming successful responses, and change the return type to whatever type Body is, and then where it's actually used figure out what to do with the stream (it looks like io.Copy can copy streams to streams?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adjusting this to return a stream, and passing the stream up to the CLI command implementation in logs_workflow.go, and using io.Copy there, breaks the tests, because the data is copied to the standard output of the test process, instead of wherever the output capturing machinery that makes the expectedOutput values in logs_workflow_test.go expects the output to be going. It looks like the test code allows only printLn to be used for output.

@adamnovak
Copy link
Contributor Author

I've revised this to make the WES client piece a little more straightforward, to handle non-WES-style errors from the server better, and to stream the log files line by line when printing them.

What I haven't yet done is tested this revised version against a real WES endpoint, to see if the logs really make it through end to end now. To do that, I have to get it into the same branch with my MFA feature and with the Toil engine that actually uses these URLs.

Is there a way to exercise the WES HTTP client code that's part of the existing test suite? Or a practical way to spin up a fake server under the test harness we're using here?

@adamnovak
Copy link
Contributor Author

@nbraid Now that this has the required number of approvals and passes all tests, can it be merged?

@elliot-smith elliot-smith merged commit 2a8fab0 into aws:main Mar 22, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants