This repository has been archived by the owner on Dec 14, 2023. It is now read-only.

Extract data processing, cleaning and renaming #235

Closed
Clare2D opened this issue Oct 19, 2020 · 3 comments

@Clare2D
Contributor

Clare2D commented Oct 19, 2020

As we update the data used by the tool, it is becoming harder to ensure that we have the correct data sets, that they are in the right file format, and that they are pushed to Git. Part of this cleaning takes place here: https://github.com/2DegreesInvesting/PACTA_analysis/blob/f4ba7c8083466545d5772e145c6514f81830acb9/web_tool_script_1.R#L33
However, this could be extracted into a separate supporting script that is run only when the data is updated, rather than being included as an option in the code.

To do:

  • create a new "clean_data" script including these functions
  • include tests to ensure that files have the correct contents (names and col types)
  • rename and extract the peers and index files and save to the correct location
  • save files as fst and ensure each is < 100MB so they can be pushed through pacta-data to Git (including ald_raw and ald_scen in web_script_2)
  • check that all file encodings are correct
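
A minimal sketch of what the checks in this list could look like, assuming base R only. The helper names (`check_columns`, `is_under_size_limit`, `all_utf8`) and the example column names are hypothetical and do not exist in PACTA_analysis. The sketch writes the example file with `saveRDS()` purely so that it runs without extra packages; the real script would presumably write with the fst package instead. The 100MB limit is interpreted here as 100 * 1024^2 bytes, which is also an assumption.

```r
# Hypothetical helper: stop with an error if `data` lacks an expected column
# or if a column's first class differs from the expected one.
# `expected` is a named character vector mapping column name -> class.
check_columns <- function(data, expected) {
  missing_cols <- setdiff(names(expected), names(data))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  actual <- vapply(
    data[names(expected)],
    function(col) class(col)[1],
    character(1)
  )
  wrong <- names(expected)[actual != expected]
  if (length(wrong) > 0) {
    stop("Unexpected column types: ", paste(wrong, collapse = ", "))
  }
  invisible(TRUE)
}

# Hypothetical helper: TRUE when the file at `path` is below the size limit.
is_under_size_limit <- function(path, limit_mb = 100) {
  file.size(path) < limit_mb * 1024^2
}

# Hypothetical helper: TRUE when every string in `x` is valid UTF-8.
all_utf8 <- function(x) {
  all(validUTF8(x))
}

# Example with made-up data; the column names are illustrative only.
peers <- data.frame(
  portfolio_name = "peer_a",
  value = 1.5,
  stringsAsFactors = FALSE
)
check_columns(peers, c(portfolio_name = "character", value = "numeric"))

path <- tempfile(fileext = ".rds")
saveRDS(peers, path)  # stand-in for writing an fst file
is_under_size_limit(path)
all_utf8(peers$portfolio_name)
```

Failing loudly with `stop()` means a data update aborts before a malformed or oversized file is committed, rather than surfacing later in the web tool scripts.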

Other tasks @jacobvjk

@cjyetman
Member

just fyi... according to some experimentation I did here 2DegreesInvesting/pacta-data/issues/4#issuecomment-713395008, using FST may not give the file size improvements we need to achieve one of the goals here

@cjyetman
Member

#247 is intended to address this issue, but it's been put on pause for now

@Clare2D Clare2D added the large label Nov 18, 2020
@cjyetman
Member

cjyetman commented Dec 4, 2021

closing because this is mostly unnecessary now

@cjyetman cjyetman closed this as completed Dec 4, 2021