Not Understanding Upload Code #151

gbdubs · 2024-01-20T18:42:57Z

Lines 468 to 484 in b743bc9

    
           dirEntries, err := os.ReadDir(reportDir) 
        
           if err != nil { 
        
           	return fmt.Errorf("failed to read report directory: %w", err) 
        
           } 
        
           var artifacts []*task.AnalysisArtifact 
        
           for _, dirEntry := range dirEntries { 
        
           	if !dirEntry.IsDir() { 
        
           		continue 
        
           	} 
        
           	dirPath := filepath.Join(reportDir, dirEntry.Name()) 
        
           	tmp, err := h.uploadDirectory(ctx, dirPath, h.reportContainer) 
        
           	if err != nil { 
        
           		return fmt.Errorf("failed to upload report directory: %w", err) 
        
           	} 
        
           	artifacts = tmp 
        
           }

Three questions about this code - there are some fundamental things I'm not understanding:
1 - Won't it miss files that directly live in the reportDir, if any exist?
2 - Why do we list dirEntries when we already do a walk of the dirs in uploadDirectory? I.e. couldn't this section of code just be replaced with a single call to uploadDirectory (which would fix 1 if that's correct)
3 - Doesn't this code only return the artifacts from the last dirEntry, overwriting anything prior to the last?

@bcspragu Would you mind helping me understand this?

The text was updated successfully, but these errors were encountered:

bcspragu · 2024-01-20T19:12:59Z

1 - Won't it miss files that directly live in the reportDir, if any exist?

That would indeed be the case, this is exploiting the fact that we know how the RMI scripts work (see here), and that they put all the report artifacts in a portfolio_id directory:

https://github.com/AlexAxthelm/stub_PACTA_containers_for_app/blob/190763cba67bafdf666f56eb34693da728fea21b/report_runner/create_report.R#L52-L66

One could argue this is part of the API contract.

2 - Why do we list dirEntries when we already do a walk of the dirs in uploadDirectory? I.e. couldn't this section of code just be replaced with a single call to uploadDirectory (which would fix 1 if that's correct)

Similar answer to above - We expect a certain structure, so it's two steps:

List directories (e.g. look for created report directories)
For each directory that we found, upload the assets

We intentionally don't upload things like /mnt/reports/<random file>, because that's not something we expect to be there.

3 - Doesn't this code only return the artifacts from the last dirEntry, overwriting anything prior to the last?

Indeed, we currently assume there's only one report (e.g. artifacts = tmp), but I don't know if that's an invariant we plan on holding (probably?). I'm not sure if I thought about this all that much when I wrote it, should probably be made more explicit/opinionated (e.g. only one report dir, or N and we preserve them all correctly)

gbdubs · 2024-01-20T19:18:10Z

OK this makes sense - I think we need to talk with the RMI team about this - because we already have situations where we're running processes for multiple logical portfolios (PortfolioGroups, Initiatives), but I don't know whether the outputs there are 1:1 with inputs or Many:1 (which was my assumption).

hodie · 2024-01-22T14:55:18Z

@AlexAxthelm @cjyetman to chime in!

cjyetman · 2024-01-22T15:39:52Z

Not sure I totally understand what we're talking about here, but... if we're talking about the directory where all the output files are for the interactive report, I would think it would be ideal for any file that ends up in there be transferred to wherever it needs to go to be accessible to the user when viewing their report, because it's not necessarily a fixed set of files, e.g. some initiatives require certain files for their reports that are not required for "GENERAL" reports, and I can foresee wanting to add a new plot or something that might require new files and it would be ideal that doing so would not require changing the server code. Am I talking about the same thing?

AlexAxthelm · 2024-01-24T20:11:46Z

they put all the report artifacts in a portfolio_id directory

This is an artifact of how we've been doing things on our previous system. I don't think we're married to this, if it would make life easier on your side. On the previous system, the files were organized in a user-level directory that get's mounted to the docker container, so the portfolio_id directory in reports is necessary to distinguish reports for different portfolios.

Simplified current schematic

User123
├── Portfolio
│   ├── 1.csv
│   └── 2.csv
├── Results
|   ├── 1
|   |   └── results.rds
|   └── 2
|       └── results.rds
└── Report
    ├── 1
    |   ├── foo.json
    |   └── index.html
    └── 2
        ├── foo.json
        └── index.html

If it would be better to have something different, I'm totally open to that. In fact somethin like this would align well with what we're thinking on our side for how we want to change the directory structure (cc @cjyetman @jdhoffa @hodie ):

Possible schematic

portfolio_1
├── 1.csv
├── results.rds
└── Report
    ├── foo.json
    └── index.html
portfolio_2
├── 2.csv
├── results.rds
└── Report
    ├── foo.json
    └── index.html

The only issue I'm seeing is in how this relates to #106, and if we need ot have a unique ID of some kind on the report directory, but if that's not a problem from your side, then it probably isn't from ours either.

AlexAxthelm · 2024-01-24T20:14:27Z

but I don't know whether the outputs there are 1:1 with inputs or Many:1 (which was my assumption).

The outputs right now are 1:1 (1 portfolio to 1 report) but that's because the current system does the grouping before it hits the PACTA code, and gives it to us as a single file. I think we're comfortable with the idea of doing a many:one mapping, along the lines of:

portfolio_group_xyz
├── 1.csv
├── 2.csv
├── group_definitions.json
├── results.rds
└── Report
    ├── foo.json
    └── index.html

where group_definitions.json is as described in #82

gbdubs · 2024-02-16T14:55:38Z

@bcspragu given the above, I think we probably want to change this code inline with Alex's suggestion - confirming that matches your expectations, since you've been doing the coupling w/ the docker image?

gbdubs assigned bcspragu Jan 20, 2024

gbdubs added the documentation Discussion or improvements to documentation label Feb 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Not Understanding Upload Code #151

Not Understanding Upload Code #151

gbdubs commented Jan 20, 2024

bcspragu commented Jan 20, 2024

gbdubs commented Jan 20, 2024

hodie commented Jan 22, 2024

cjyetman commented Jan 22, 2024

AlexAxthelm commented Jan 24, 2024 •

edited

Loading

AlexAxthelm commented Jan 24, 2024 •

edited

Loading

gbdubs commented Feb 16, 2024

Not Understanding Upload Code #151

Not Understanding Upload Code #151

Comments

gbdubs commented Jan 20, 2024

bcspragu commented Jan 20, 2024

gbdubs commented Jan 20, 2024

hodie commented Jan 22, 2024

cjyetman commented Jan 22, 2024

AlexAxthelm commented Jan 24, 2024 • edited Loading

AlexAxthelm commented Jan 24, 2024 • edited Loading

gbdubs commented Feb 16, 2024

AlexAxthelm commented Jan 24, 2024 •

edited

Loading

AlexAxthelm commented Jan 24, 2024 •

edited

Loading