The TidyTuesday project comes from the rfordatascience community: each week a new dataset is published in a tidy format that is easy to work with and that the community can use. My original intention was to post a blog entry every Tuesday, taking the published dataset and addressing some question with a thoughtful analysis; this repo would host all the work behind those posts.

Since then, my economic interests have changed drastically, and my primary interest is no longer running the regressions I had planned. Still, I wanted to do something with this repo, and a friend and I have always joked about running random regressions to see what kind of weird interactions show up.
The code is currently set up so that pulling data from FRED requires the user to create a `.env` file to store their FRED API key. For instructions on how to request a FRED API key, see the FRED API documentation.

To set up a `.env` file, simply create a new file in the repository and name it `.env`. Separate each API key entry with a newline; to include the FRED API key, add the line `FRED_API_KEY=[your api key]`, replacing the item in brackets (including the brackets themselves) with your FRED API key.
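As a quick way to verify that the key is picked up, here is a minimal sketch assuming the environment is loaded with the python-dotenv package and FRED is accessed through the fredapi package (both assumptions; the repo's actual loading code may differ):

```python
import os

from dotenv import load_dotenv  # assumes the python-dotenv package is installed
from fredapi import Fred        # assumes fredapi is the FRED client in use

load_dotenv()  # reads the .env file from the current working directory

# FRED_API_KEY is the variable name described above
fred = Fred(api_key=os.environ["FRED_API_KEY"])
print(fred.get_series("GDP").tail())  # smoke test with a well-known series ID
```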
Planned additions:

- Add TOML files documenting which packages are needed; specific versions of pandas and numpy are required to make things work
- Store and report the series information as well
- Pull data from Bloomberg
- Automate the LaTeX report creation so the user only has to edit one file to finish the report
- Script using `crontab` to automatically run reports each day
- Automatically commit and push daily runs of the main script
- Faster method to choose a random series from FRED (iterating through all the categories from the root each time seems a bit excessive; I could keep an ongoing list of the series I've tried and of all available series, updating it periodically — see the sketch after this list)
- Add more features to the regression, such as detrending or other applied-micro techniques that I just don't really know yet. Could ask some applied-micro friends what they think.
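A rough sketch of the caching idea mentioned above; `walk_category_tree` is a hypothetical stand-in for the repo's existing (slow) traversal from the root category, and the cache filename is made up:

```python
import json
import random
from pathlib import Path

CACHE_PATH = Path("fred_series_cache.json")  # hypothetical local cache file

def random_series_id(walk_category_tree) -> str:
    """Pick a random FRED series ID from a local cache, falling back to the
    expensive category-tree walk only when the cache does not exist yet.

    walk_category_tree: stand-in for the existing function that returns a
    list of all series IDs by iterating through categories from the root.
    """
    if CACHE_PATH.exists():
        series_ids = json.loads(CACHE_PATH.read_text())
    else:
        series_ids = walk_category_tree()  # slow: full traversal from the root
        CACHE_PATH.write_text(json.dumps(series_ids))  # cache for next time
    return random.choice(series_ids)
```

A periodic refresh (for example, deleting the cache file on a schedule) would keep the list of available series reasonably current.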
Known issues:

- Pulling the same dataframe twice (or more); unlikely, but still possible
- Fix the virtual environment handling in the bash script
- Handle cases with too little usable merged data
- There still seem to be some edge-case errors when pulling random series (and iterating through a category tree); so far I've handled them by retrying until something works, but this warrants another look
- `reduce_format_dfs` currently only works for data pulled from the FRED API because of the way it handles formatting. On this note, I suspect a lot of the other functions are not generalized either; this is probably an issue to handle when adding other data sources
- There are instances in which the two dataframes' dates don't coincide with each other at all; right now I just re-pull data until I get two compatible series, but I still need to find a cleaner way to address this (a retry sketch follows this list)
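For what it's worth, the retry approach could at least be made explicit with a cap and an overlap check. In this sketch, `pull_random_df` stands in for the repo's existing single-series pull, and `min_overlap` is an assumed threshold for "enough usable merged data":

```python
import pandas as pd

MAX_TRIES = 10  # arbitrary cap so a bad run fails loudly instead of looping forever

def pull_compatible_pair(pull_random_df, min_overlap: int = 30) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Re-pull random series until two of them share enough dates to merge.

    pull_random_df: stand-in for the existing function returning one
    date-indexed DataFrame for a random FRED series.
    min_overlap: assumed minimum number of shared dates to call the pair usable.
    """
    for _ in range(MAX_TRIES):
        a, b = pull_random_df(), pull_random_df()
        shared = a.index.intersection(b.index)  # dates present in both series
        if len(shared) >= min_overlap:
            return a.loc[shared], b.loc[shared]
    raise RuntimeError(f"No compatible pair found in {MAX_TRIES} tries")
```

Raising after a fixed number of tries also covers the "too little usable merged data" case above, instead of silently looping or regressing on a handful of points.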