This is a very early-stage library to simplify working with pandas for common EBMDataLab operations.
Install the package as you usually would, e.g.
pip install ebmdatalab
To upgrade an existing installation, use:
pip install ebmdatalab --upgrade
bq.cached_read saves the results of a SQL query as a CSV. When it is run again, as long as the SQL hasn't changed, it loads that CSV rather than querying BigQuery again. Use a .zip, .gz, .bz2 or .xz file extension to store the cache in a compressed format, or .csv for uncompressed:
from ebmdatalab import bq
sql = "SELECT * FROM ebmdatalab.hscic.bnf"
df = bq.cached_read(sql, csv_path='bnf_codes.zip') # add `use_cache=False` to override
df.head()
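To make the caching behaviour concrete, here is a minimal standard-library sketch of the same idea: re-run the query only when the SQL text changes. It is not how ebmdatalab implements cached_read. The helper name cached_rows, the run_query callable, and the .sha256 sidecar file are all assumptions made for illustration.

```python
import csv
import hashlib
import os
import tempfile


def cached_rows(sql, csv_path, run_query, use_cache=True):
    """Return query rows, reusing csv_path if it was written for the same SQL.

    Hypothetical helper for illustration only; not ebmdatalab's implementation.
    """
    fingerprint = hashlib.sha256(sql.encode("utf-8")).hexdigest()
    fingerprint_path = csv_path + ".sha256"  # assumed sidecar file holding the SQL hash

    if use_cache and os.path.exists(csv_path) and os.path.exists(fingerprint_path):
        with open(fingerprint_path) as f:
            unchanged = f.read().strip() == fingerprint
        if unchanged:
            # Same SQL as last time: load the CSV instead of querying again.
            with open(csv_path, newline="") as f:
                return [dict(row) for row in csv.DictReader(f)]

    # Cache miss (or use_cache=False): run the query and refresh the cache.
    rows = run_query(sql)
    fieldnames = list(rows[0].keys()) if rows else []
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    with open(fingerprint_path, "w") as f:
        f.write(fingerprint)
    return rows


calls = []  # records every time the stand-in query is actually executed


def fake_query(sql):
    """Stand-in for a BigQuery call, so the sketch runs offline."""
    calls.append(sql)
    return [{"code": "A1", "name": "example"}]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bnf_codes.csv")
    first = cached_rows("SELECT 1", path, fake_query)
    second = cached_rows("SELECT 1", path, fake_query)  # served from the CSV
    third = cached_rows("SELECT 2", path, fake_query)   # SQL changed, so it re-runs
```

In this sketch the stand-in query runs twice rather than three times, because the second call finds a CSV written for identical SQL.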
To access data in BigQuery, users will need to have the credentials of a service account with "BigQuery Guest" and "BigQuery Dataset Guest" roles. You can do that here.
- Click "+ CREATE SERVICE ACCOUNT", choose a name for the account (e.g. "Rich Croker service account"), and add the "BigQuery Guest" and "BigQuery Dataset Guest" roles.
- Select the newly created account, click "KEYS", "ADD KEY", "Create new key", "JSON", and "CREATE". This will download the credentials.
If using this library through a notebook derived from the datalab-jupyter image, the credentials file should be renamed to bq-service-account.json and moved to the repo root.
Otherwise, EBMDATALAB_BQ_CREDENTIALS_PATH should be set to the path of the credentials file.
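As a rough pre-flight check before running any queries, the sketch below loads the file that EBMDATALAB_BQ_CREDENTIALS_PATH points at and confirms it looks like a service account key. It assumes the name is read as an environment variable. The helper check_credentials_file is hypothetical and not part of ebmdatalab, and the demonstration uses a throwaway file rather than a real key.

```python
import json
import os
import tempfile

ENV_VAR = "EBMDATALAB_BQ_CREDENTIALS_PATH"


def check_credentials_file(path):
    """Load a service account key file and confirm it has the expected shape.

    Hypothetical helper for illustration; the library does not provide it.
    """
    with open(path) as f:
        info = json.load(f)
    if info.get("type") != "service_account":
        raise ValueError(f"{path} is not a service account key file")
    missing = [key for key in ("client_email", "private_key") if key not in info]
    if missing:
        raise ValueError(f"{path} is missing fields: {missing}")
    return info


# Demonstration with a fake key file; a real one comes from the JSON key download.
with tempfile.TemporaryDirectory() as tmp:
    fake_path = os.path.join(tmp, "bq-service-account.json")
    with open(fake_path, "w") as f:
        json.dump(
            {
                "type": "service_account",
                "client_email": "example@example.iam.gserviceaccount.com",
                "private_key": "not-a-real-key",
            },
            f,
        )
    # Assumption: the library picks this name up from the process environment.
    os.environ[ENV_VAR] = fake_path
    info = check_credentials_file(os.environ[ENV_VAR])
```

Setting the variable from Python only affects the current process; for a persistent setting, export it in the shell or notebook environment instead.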
See the examples/ directory for:
- Logistic regression
- CCG maps
- Deciles charts
This project uses flit for bundling and publishing. Publish thus:
flit publish
To install a package locally for development, install with a symlink so you can test changes without reinstalling the module:
flit install --symlink
1. Clone the repository to a local drive, e.g. through GitHub Desktop.
2. Open the Anaconda command prompt by right-clicking and selecting Run as administrator.
   - Type pip list to check whether ebmdatalab is installed
     - If it is, uninstall using pip uninstall ebmdatalab
   - Install flit: check if it is already installed using e.g. flit help; if not, type pip install flit
   - Change directory to the location of the repo, e.g. cd C:\Users\hcurtis\Documents\GitHub\datalab-pandas
   - Install with a symlink: flit install --symlink
3. Make changes to the package .py files as required
   - Edit code e.g. via Jupyter notebook
   - If you think this change should be incorporated into its own release, open __init__.py and increase the version number
4. Test the changes
   - To avoid having to restart the kernel every time you make a change, add the following commands in your notebook to tell the kernel to update its reference to the package (e.g. for charts):
import importlib
from ebmdatalab import charts
importlib.reload(charts)
5. Push changes
   - Open GitHub Desktop and you should see your changed files.
   - Create a branch rather than committing to master.
   - Describe and commit changes.
   - Make a pull request to merge the branch with master (select from the dropdown menu under Branch).
   - This will take you to GitHub, where you need to click Create pull request
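The version bump in step 3 is normally a manual edit. If you prefer to script it, the sketch below increments the patch number in the text of an __init__.py. It assumes the file holds a line of the form __version__ = "X.Y.Z", which may not match this repository exactly, and bump_patch is a hypothetical helper rather than anything provided by flit or ebmdatalab.

```python
import re

# Matches a line such as:  __version__ = "0.0.12"   (assumed format)
VERSION_LINE = re.compile(
    r'^(__version__\s*=\s*["\'])(\d+)\.(\d+)\.(\d+)(["\'])',
    re.MULTILINE,
)


def bump_patch(init_source):
    """Return init_source with the patch part of __version__ incremented."""

    def replace(match):
        prefix, major, minor, patch, quote = match.groups()
        return f"{prefix}{major}.{minor}.{int(patch) + 1}{quote}"

    new_source, count = VERSION_LINE.subn(replace, init_source, count=1)
    if count == 0:
        raise ValueError("no __version__ line of the form X.Y.Z was found")
    return new_source


# Example on an in-memory string; reading and writing the real file is left out.
example = '"""Example package docstring."""\n__version__ = "0.0.12"\n'
bumped = bump_patch(example)
```

Working on a string keeps the sketch side-effect free; applying it to the real file would mean reading __init__.py, calling bump_patch, and writing the result back before committing.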