Feature/azure container #75

AlexAxthelm · 2024-01-18T20:34:29Z

Contains infrastructure to build a Docker image suitable to run on Azure Container Instances, as well as a deploy template.

Coincidentally:
Closes #3
Closes #17

config.yml

run_pacta_data_preparation.R

jdhoffa · 2024-01-19T06:56:53Z

I think I'm just out of sync from being away, or maybe you and CJ already discussed this in person, but could you ELI5 this in the PR description?

I'm having a hard time following what this does.

That said, I will try to test it on my machine later today. I wanted to try to get data prep running anyway to make sure I can still do it 😀

run_pacta_data_preparation.R

cjyetman · 2024-01-19T07:56:10Z

I feel like completely swapping out one dependency for another ({rlog} to {logger}) should be in a separate PR, not one titled "Feature/azure container".

AlexAxthelm · 2024-01-19T09:29:32Z

@jdhoffa still a fair bit of testing to do on my side, but while it's cooking, I'll be writing docs for it.

jdhoffa · 2024-01-19T09:43:54Z

Sweet, sounds good. Let me know when it's ready for a test run!

Also agree with CJ. Very happy to swap rlog with logger, but it probably deserves its own PR.

AlexAxthelm · 2024-01-19T10:06:11Z

The switch to {logger} is now available as #76. It contains basically the same changes as implemented here, but isolated.

AlexAxthelm · 2024-01-23T13:26:03Z

@cjyetman @jdhoffa Following the discussion in #73 I'm thinking that changing the strategy for this PR makes sense.

I had been trying not to change the logic in run_data_preparation.R (just add logging) or touch config.yml too much, which is why two scripts are run in this PR: copy_raw_data.R (in the ACI directory) and run_data_preparation.R. The first copies the raw data files from Azure Files to what is the "inputs" directory for run_data_preparation.R.

I'm now thinking that the "how do we track the source files" question might best be addressed by including the paths to the raw data files in the config (see example below), along with a copy_inputs option, so that we can isolate the inputs for a particular set of outputs if we want to.

Sketch:

Here's a (simplified) sketch of what I'm thinking:

#config.yaml
default:
  copy_inputs: false
  # these two replace the data_prep_inputs_path:
  factset_data_path: /inputs
  asset_impact_data_path: /inputs

2022Q4_CICD:
  copy_inputs: true
  factset_data_path: "/mnt/factset-extracted/factset-pacta_timestamp-20221231T000000Z_pulled-20231221T195325Z"
  asset_impact_data_path: "/mnt/rawdata/AssetImpact"

# run_data_preparation.R
# Loading config...
masterdata_ownership_path <- file.path(asset_impact_data_path, masterdata_ownership_filename) # and friends
factset_financial_data_path <- file.path(factset_data_path, "factset_financial_data.rds") # and friends

# Do all the normal data_prep stuff

if (copy_inputs) {
  # collect the raw input paths built above (placeholder list, not exhaustive)
  input_files <- c(masterdata_ownership_path, factset_financial_data_path) # and friends
  inputs_copy_path <- file.path(data_prep_outputs_path, "data_prep_inputs")
  dir.create(inputs_copy_path, recursive = TRUE, showWarnings = FALSE)
  file.copy(input_files, inputs_copy_path)
}

log_info("Done!")
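
To make the copy step from the sketch a bit more concrete, a minimal base-R version might look like the following. This is only a sketch under assumptions: `copy_inputs_to_outputs()` is a hypothetical helper name (nothing by that name exists in the repo), and the temporary directories stand in for the real Azure mounts and the output path.

```r
# Hypothetical helper: copy the raw input files next to the outputs, so a set
# of outputs can be traced back to exactly the inputs that produced it.
copy_inputs_to_outputs <- function(input_files, outputs_path) {
  inputs_copy_path <- file.path(outputs_path, "data_prep_inputs")
  dir.create(inputs_copy_path, recursive = TRUE, showWarnings = FALSE)

  # Fail early if a configured input is missing, rather than copying a partial set.
  missing_files <- input_files[!file.exists(input_files)]
  if (length(missing_files) > 0) {
    stop("Missing input files: ", paste(missing_files, collapse = ", "))
  }

  # file.copy() returns one logical per file; treat any FALSE as an error.
  copied <- file.copy(input_files, inputs_copy_path, overwrite = TRUE)
  if (!all(copied)) {
    stop("Failed to copy: ", paste(input_files[!copied], collapse = ", "))
  }

  invisible(file.path(inputs_copy_path, basename(input_files)))
}

# Usage with a throwaway file standing in for the real raw data.
raw_dir <- file.path(tempdir(), "raw")
out_dir <- file.path(tempdir(), "outputs")
dir.create(raw_dir, showWarnings = FALSE)
example_input <- file.path(raw_dir, "factset_financial_data.rds")
saveRDS(data.frame(x = 1), example_input)

copied_paths <- copy_inputs_to_outputs(example_input, out_dir)
```

Stopping on a missing or failed copy is a design choice here, on the assumption that an incomplete record of inputs would be worse than a failed run when `copy_inputs` is enabled.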

jdhoffa · 2024-01-23T13:32:12Z

Sounds reasonable to me?

cjyetman · 2024-03-01T13:56:41Z

@AlexAxthelm is this superseded by one or more of the other recent PRs? can/should we close it?

cjyetman · 2024-03-07T08:45:57Z

@AlexAxthelm can this be closed now?

AlexAxthelm · 2024-03-07T11:51:44Z

Closing this, but not deleting the branch yet, since there are some elements from it that I'd like to bring into main; I just need to find some time to do that.

AlexAxthelm and others added 16 commits September 25, 2023 16:56

Add Dockerfile for Azure Container Instances

b843511

Pin package and FROM versions

c99da36

Separate Chrome Installation

797d711

Set working directory before copy, limit copy

7683c6b

Use docker build secrets to pass github auth

815110b

Update documentation, and do not leak secrets

658ac32

Update buildkit information

e65debc

Add deploy ARM Template

7dea5e5

Add infrastructure to copy files from rawdata to inputs

239ef62

Rearrange files

6f90760

WIP: deploy works until factset pull

1e2b176

Use DESCRIPTION for dependency management

9257d1a

Resolve dependency installation

f8f6dec

Wrap up file copy step

1f862ec

copy FS files

4d2dcdb

Merge branch 'main' into feature/azure-container

a23fc83

cjyetman reviewed Jan 18, 2024

config.yml

cjyetman reviewed Jan 18, 2024

run_pacta_data_preparation.R (outdated)

AlexAxthelm added 7 commits January 18, 2024 21:48

convert from {rlog} to {logger}

b977903

Clean logging strings

845ca1a

add {glue} to dependencies

6d81834

Disable readr progress bar

264d9dc

Don't update factset data on CICD runs

879faac

Ensure output path exists

7d0c0d3

fix bad mount path

65b792c

cjyetman reviewed Jan 19, 2024

run_pacta_data_preparation.R (outdated)

AlexAxthelm added 2 commits January 19, 2024 10:52

disable object_name_linter

5b10fa7

improve creating output directory

d830af0

AlexAxthelm and others added 11 commits January 19, 2024 13:02

Merge branch 'main' into feature/azure-container

30c061d

prefer seq over x:y

9387905

Add DEBUG and TRACE logging

ea8dfb8

remove unused file

4222390

don't check for missing envvars, just read .env

4466032

Add pak options to not update sysreqs db

4d2dee1

Add tar to pack up files

676a778

Allow option to not create tar

1f0cd00

Set CRAN Repo in Rprofile.site

75d70b1

Increase memory available

468f1ff

Change to supporte GPU, update docs

acc9c0f

AlexAxthelm mentioned this pull request Jan 22, 2024

remove FactSet database code #73

Merged

cjyetman mentioned this pull request Feb 9, 2024

use explicit filenames for FactSet files #99

Merged

AlexAxthelm closed this Mar 7, 2024
