Skip to content

Allo users perform SQL over csv dataset without putting it into SQL database.

Notifications You must be signed in to change notification settings

yayachenyi/SQL-on-the-fly

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

36 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SQL-on-the-fly

Allow users perform SQL over csv dataset without putting it into SQL database.

Run the program:

  1. Put all the .csv files in to the SQL-on-the-fly/ folder.

  2. Modify setup.sh to ensure you choose the correct python2.7 version according to your local environment

  3. Make sure you have the newest version of all the Python packages listed in the requirements.txt

  4. Change the 'flist' and 'nlist' in index.py to the .csv files you use.

  5. Change the 'flist' and 'nlist' in disk.py to the .csv files you use.

  6. Run setup.sh to create all supporting and indexing folders/files automatically.

./setup.sh
  1. Start the program and you don't need to wait for anything.
python myproject.py
  1. Run the queries. The program will take a query statement and output the result and query time to the console. Some sample queries are listed below.

  2. Exit the program.

exit

Query Instruction and Formatting

SELECT basics

Single table

Use attribute of the csv file directly inside SELECT and WHERE clasure. For example:

SELECT review_id, stars, useful FROM review-1m WHERE useful > 20 AND stars >= 4

Multi table

Use the abbreviation of the Table and the attribute of the csv file along with a '__' inside the query.

SELECT B__city, B__state, R__business_id, R__stars, R__useful FROM business B, review-1m R WHERE B__city LIKE "Champaign" AND B__state LIKE "IL" AND B__business_id = R__business_id

FROM basics

Single table

No abbreviation, just the name of the csv file (without '.csv'). For example:

FROM review-1m

Multi table

Must include abbreviation after the name of the csv file. For example:

FROM review-1m R1, review-1m R2

WHERE basics

Single table

No quotation mark on string. For example:

SELECT review_id, stars, useful FROM review-1m WHERE useful > 20 AND stars >= 4 AND city = Champaign

Multi table

Join conditions go here. Conditions order matter. Please use your domain knowledge to manipulate the order. Attribtue must along with there name. For example:

WHERE B__city = Urbana

LIKE operation must be warpped into a quotation mark. For example:

SELECT B__city, B__state, R__business_id, R__stars, R__useful FROM business B, review-1m R WHERE B__city LIKE "Champaign" AND B__state LIKE "IL" AND B__business_id = R__business_id

NOTICE: [MODE 0] WHERE conditions in multi table join after the join: Numeric values and string need to be inside the quotation mark. Use [MODE 1] if you don't want to pay attention to the quotation mark. [MODE 0] will be slightly faster than [MODE1] since [MODE 0] will not need to tranfer data type.

Demo queries:

  1. SELECT review_id, funny, useful FROM review-1m WHERE funny >= 20 AND useful > 30

Time: 0.261s

  1. SELECT name, city, state FROM business WHERE city = Champaign AND state = IL

Time: 0.082s

  1. SELECT B__name, B__postal_code, R__stars, R__useful FROM business B, review-1m R WHERE B__name = Sushi Ichiban AND B__postal_code = 61820 AND B__business_id = R__business_id

Time: 0.379s

  1. SELECT R1__user_id, R2__user_id, R1__stars, R2__stars FROM review-1m R1, review-1m R2 WHERE R1__useful > 50 AND R2__useful > 50 AND R1__business_id = R2__business_id AND R1__stars = '5' AND R2__stars = '1'

Time: 0.277s

  1. SELECT B__name, B__city, B__state, R__stars, P__label FROM business B, review-1m R, photos P WHERE B__city = Champaign AND P__label = inside AND B__state = IL AND B__business_id = P__business_id AND B__business_id = R__business_id AND R__stars = '5'

Time: 0.745s

  1. SELECT B__name, R1__user_id, R2__user_id, B__address FROM business B, review-1m R1, review-1m R2 WHERE R1__useful > 50 AND R2__useful > 50 AND B__business_id = R1__business_id AND R1__business_id = R2__business_id AND R1__stars = '5' AND R2__stars = '1'

Time: 0.444s

Sample queries:

  1. SELECT * FROM photos

Time: 0.359s

  1. SELECT DISTINCT stars FROM review-1m

Time: 0.004s

  1. SELECT DISTINCT stars, useful FROM review-1m

Time: 7.694s

  1. SELECT review_id, stars, useful FROM review-1m WHERE useful > 20 AND stars >= 4

Time: 0.391s

  1. SELECT review_id, stars, useful FROM review-1m WHERE useful > 20 AND stars >= 4 - 0

Time: 0.208s

  1. SELECT review_id, stars, useful FROM review-1m WHERE useful > 10 AND (useful < 20 OR stars >= 4)

Time: 0.306s

  1. SELECT B__city, B__state, R__business_id, R__stars, R__useful FROM business B, review-1m R WHERE B__city LIKE "Champaign" AND B__state LIKE "IL" AND B__business_id = R__business_id

Time: 0.427s

  1. SELECT DISTINCT B__name FROM business B, review-1m R, photos P WHERE B__city = Champaign AND B__state = IL AND P__label = inside AND R__stars = 5 AND B__business_id = P__business_id AND B__business_id = R__business_id

Time: 6.765s

SELECT DISTINCT B__name FROM business B, review-1m R, photos P WHERE B__city = Champaign AND B__state = IL AND P__label = inside AND B__business_id = P__business_id AND B__business_id = R__business_id AND R__stars = '5'

Time: 0.745s

About

Allo users perform SQL over csv dataset without putting it into SQL database.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published