This repository holds code for retrieving and formatting datasets for use with the GUODA service. The current goal is to take source data, generate well-formed Spark DataFrames from it, and write them out as Parquet files.
This is also a place to discuss which data should be made available in GUODA and in what format.