Skip to content

Latest commit

 

History

History
49 lines (33 loc) · 4.06 KB

README.md

File metadata and controls

49 lines (33 loc) · 4.06 KB

columnstore-innodb-monetdb

Context

This project was part of a university assignment with the following goals:

  • Compare MariaDB ColumnStore, InnoDB and MonetDB in terms of performance
  • Check ease of installation
  • Check ease of use (how easy are queries to execute?)
  • Powerfulness of the language (which applications are particularly well supported?)
  • How can certain queries be executed particularly fast?
  • Extended summary of the above questions in word document (incl. source code)

Of the above, only the performance timings are included in this README - for the other topics please refer to the Word document (available in English and German).

The flight data from the U.S. Bureau of Transportation Statistics was used as the original data basis.

For this project it had to be manually downloaded for each month as well as cleaned and merged using custom python code (see ./Python code).

All necessary SQL commands to connect to the DBMS, create a table, import the data and run the queries can be found in the folder ./SQL code.

The cleaned data has been uploaded on Kaggle. Its size is 2.7GB uncompressed / 377MB compressed.

Also take a look at the sources used throughout this project in ./sources.md.

Performance

The following timings were taken on Windows 10 running a virtual machine via VMWare Workstation 16 running CentOS 8 with 4 CPU cores (Intel i7 9700K), 8GB DDR4 RAM as well as 50GB fixed storage (Samsung 970 EVO NVMe M.2).

CSV Import timings

Engine 36M records 5.6M records 500K records Method
Columnstore 00m 17s 4.2s 1.1s Columnstore cpimport
MonetDB 01m 07s 15s 1s MonetDB CSV Bulk Loads
InnoDB 16m 40s 2m 37s 13s LOAD DATA INFILE

Query timings (warm)

All queries were performed multiple times in a row (except for InnoDB) to make most use of the runtime optimization strategies that the DBMS employ.

Query statement Description MonetDB Columnstore InnoDB
select * from <table> where <column> = <value> limit 10; select where 0.06s 0.1s 0.02s
select count(*) from <table> where <column> = <value>; count where 0.04s 0.6s 0.02s
select count(*) from <table>; simple count 0.003s 0.3s 5m 8s
select avg(DISTANCE) as AVERAGE from <table> aggregation 0.03s 0.6s 5m 38s
select * from <table> where CRS_DEP_TIME > '10:00' and CRS_ARR_TIME < '15:00' and AIR_TIME > 300 order by AIR_TIME desc limit 10; complex where 0.1s 0.8s 6m 6s
select <columns>, FLOOR(DEP_DELAY_NEW/60) as DELAY_IN_HOURS from <table> order by DELAY_IN_HOURS desc limit 10; TOP10 delayed 0.3s 8s 6m 33s