A Pandas DataFrame SQL Query Utility
- Free software: MIT license
- Documentation: https://df-query.readthedocs.io
Install the package with pip:
$ pip install df_query
To use DFQuery in a project:
import pandas as pd
from df_query import QueryContext
# by default the database will be stored in memory
dfq = QueryContext()
# OR if you want to persist you can specify a path on local disk
dfq = QueryContext('path/to/db.sqlite')
customer_data = {
    'id': [1, 2, 3],
    'name': ['Boris', 'Bobby', 'Judit']
}
order_data = {
    'id': [1, 2, 3, 4, 5, 6],
    'customer_id': [1, 2, 1, 3, 3, 1],
    'amount': [10, 20, 32, 12, 33, 25]
}
customer_df = pd.DataFrame(customer_data)
order_df = pd.DataFrame(order_data)
# now we create the views in the db
dfq.create_view(customer_df,'customers')
dfq.create_view(order_df,'orders')
# now we can construct the query and return another df
sql_query = """
select
c.name,
sum(o.amount)
from customers c
join orders o on o.customer_id = c.id
group by c.name
"""
# create a dataframe from the sql query
df = dfq.sql(sql_query)
print(df)
# delete the database file if it was stored on disk
dfq.db_cleanup()
- use SparkSQL-like functionality with your pandas DataFrames
- create temporary views and join them using SQL
- use an in-memory SQLite database or persist it to disk
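To make the in-memory versus on-disk choice concrete, here is a minimal sketch that runs the same join from the usage example directly against SQLite, using only Python's standard-library `sqlite3` module. It is not df_query's implementation. The function name `total_per_customer` and its `db_path` parameter are hypothetical, and the `total_amount` alias and `order by` clause are added here only to make the output deterministic.

```python
import sqlite3


def total_per_customer(db_path=":memory:"):
    """Load the README's sample rows into SQLite and run the join.

    Hypothetical helper for illustration only; it does not use df_query.
    ":memory:" keeps the database in RAM; a file path would store it on disk.
    """
    conn = sqlite3.connect(db_path)
    try:
        # Start from a clean slate so the sketch can be re-run on a file path.
        conn.execute("drop table if exists customers")
        conn.execute("drop table if exists orders")
        conn.execute("create table customers (id integer, name text)")
        conn.execute(
            "create table orders (id integer, customer_id integer, amount integer)"
        )
        conn.executemany(
            "insert into customers values (?, ?)",
            [(1, "Boris"), (2, "Bobby"), (3, "Judit")],
        )
        conn.executemany(
            "insert into orders values (?, ?, ?)",
            [(1, 1, 10), (2, 2, 20), (3, 1, 32), (4, 3, 12), (5, 3, 33), (6, 1, 25)],
        )
        conn.commit()
        # Same join and aggregation as the usage example, sorted by name.
        rows = conn.execute(
            """
            select c.name, sum(o.amount) as total_amount
            from customers c
            join orders o on o.customer_id = c.id
            group by c.name
            order by c.name
            """
        ).fetchall()
    finally:
        conn.close()
    return rows


rows = total_per_customer()
print(rows)  # [('Bobby', 20), ('Boris', 67), ('Judit', 45)]
```

Passing a file path instead of the default `":memory:"` would keep the tables on disk after the connection closes, which is the trade-off the last feature bullet refers to.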
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.