Allow users perform SQL over csv dataset without putting it into SQL database.
-
Put all the .csv files in to the SQL-on-the-fly/ folder.
-
Modify setup.sh to ensure you choose the correct python2.7 version according to your local environment
-
Make sure you have the newest version of all the Python packages listed in the requirements.txt
-
Change the 'flist' and 'nlist' in index.py to the .csv files you use.
-
Change the 'flist' and 'nlist' in disk.py to the .csv files you use.
-
Run setup.sh to create all supporting and indexing folders/files automatically.
./setup.sh
- Start the program and you don't need to wait for anything.
python myproject.py
-
Run the queries. The program will take a query statement and output the result and query time to the console. Some sample queries are listed below.
-
Exit the program.
exit
Use attribute of the csv file directly inside SELECT and WHERE clasure. For example:
SELECT review_id, stars, useful FROM review-1m WHERE useful > 20 AND stars >= 4
Use the abbreviation of the Table and the attribute of the csv file along with a '__' inside the query.
SELECT B__city, B__state, R__business_id, R__stars, R__useful FROM business B, review-1m R WHERE B__city LIKE "Champaign" AND B__state LIKE "IL" AND B__business_id = R__business_id
No abbreviation, just the name of the csv file (without '.csv'). For example:
FROM review-1m
Must include abbreviation after the name of the csv file. For example:
FROM review-1m R1, review-1m R2
No quotation mark on string. For example:
SELECT review_id, stars, useful FROM review-1m WHERE useful > 20 AND stars >= 4 AND city = Champaign
Join conditions go here. Conditions order matter. Please use your domain knowledge to manipulate the order. Attribtue must along with there name. For example:
WHERE B__city = Urbana
LIKE operation must be warpped into a quotation mark. For example:
SELECT B__city, B__state, R__business_id, R__stars, R__useful FROM business B, review-1m R WHERE B__city LIKE "Champaign" AND B__state LIKE "IL" AND B__business_id = R__business_id
NOTICE: [MODE 0] WHERE conditions in multi table join after the join: Numeric values and string need to be inside the quotation mark. Use [MODE 1] if you don't want to pay attention to the quotation mark. [MODE 0] will be slightly faster than [MODE1] since [MODE 0] will not need to tranfer data type.
- SELECT review_id, funny, useful FROM review-1m WHERE funny >= 20 AND useful > 30
Time: 0.261s
- SELECT name, city, state FROM business WHERE city = Champaign AND state = IL
Time: 0.082s
- SELECT B__name, B__postal_code, R__stars, R__useful FROM business B, review-1m R WHERE B__name = Sushi Ichiban AND B__postal_code = 61820 AND B__business_id = R__business_id
Time: 0.379s
- SELECT R1__user_id, R2__user_id, R1__stars, R2__stars FROM review-1m R1, review-1m R2 WHERE R1__useful > 50 AND R2__useful > 50 AND R1__business_id = R2__business_id AND R1__stars = '5' AND R2__stars = '1'
Time: 0.277s
- SELECT B__name, B__city, B__state, R__stars, P__label FROM business B, review-1m R, photos P WHERE B__city = Champaign AND P__label = inside AND B__state = IL AND B__business_id = P__business_id AND B__business_id = R__business_id AND R__stars = '5'
Time: 0.745s
- SELECT B__name, R1__user_id, R2__user_id, B__address FROM business B, review-1m R1, review-1m R2 WHERE R1__useful > 50 AND R2__useful > 50 AND B__business_id = R1__business_id AND R1__business_id = R2__business_id AND R1__stars = '5' AND R2__stars = '1'
Time: 0.444s
- SELECT * FROM photos
Time: 0.359s
- SELECT DISTINCT stars FROM review-1m
Time: 0.004s
- SELECT DISTINCT stars, useful FROM review-1m
Time: 7.694s
- SELECT review_id, stars, useful FROM review-1m WHERE useful > 20 AND stars >= 4
Time: 0.391s
- SELECT review_id, stars, useful FROM review-1m WHERE useful > 20 AND stars >= 4 - 0
Time: 0.208s
- SELECT review_id, stars, useful FROM review-1m WHERE useful > 10 AND (useful < 20 OR stars >= 4)
Time: 0.306s
- SELECT B__city, B__state, R__business_id, R__stars, R__useful FROM business B, review-1m R WHERE B__city LIKE "Champaign" AND B__state LIKE "IL" AND B__business_id = R__business_id
Time: 0.427s
- SELECT DISTINCT B__name FROM business B, review-1m R, photos P WHERE B__city = Champaign AND B__state = IL AND P__label = inside AND R__stars = 5 AND B__business_id = P__business_id AND B__business_id = R__business_id
Time: 6.765s
SELECT DISTINCT B__name FROM business B, review-1m R, photos P WHERE B__city = Champaign AND B__state = IL AND P__label = inside AND B__business_id = P__business_id AND B__business_id = R__business_id AND R__stars = '5'
Time: 0.745s