# 1. Load the flight dataset

- On bash: Create an HDFS directory for the dataset

```bash
hadoop fs -mkdir /user/hive/warehouse/flight_data
```

- On bash: Untar the flight dataset and load it into HDFS

```bash
tar -xzvf bdtc-hive/2008.tar.gz
hadoop fs -put 2008.csv /user/hive/warehouse/flight_data/2008.csv
```

- On bash: Verify that the file was loaded

```bash
hadoop fs -ls /user/hive/warehouse/flight_data/
```
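A couple of optional checks can catch problems early; this is just a sketch, assuming the extracted 2008.csv is still in the local working directory:

```bash
# Optional sanity checks (assumes 2008.csv is still in the local working directory)
head -1 2008.csv                                      # does the file start with a header row?
wc -l 2008.csv                                        # local record count
hadoop fs -du -h /user/hive/warehouse/flight_data/    # size of the copy that landed in HDFS
```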

- On hive shell: Create the Hive table `flight_data`:

```sql
CREATE EXTERNAL TABLE flight_data(
   year INT,
   month INT,
   day INT,
   day_of_week INT,
   dep_time INT,
   crs_dep_time INT,
   arr_time INT,
   crs_arr_time INT,
   unique_carrier STRING,
   flight_num INT,
   tail_num STRING,
   actual_elapsed_time INT,
   crs_elapsed_time INT,
   air_time INT,
   arr_delay INT,
   dep_delay INT,
   origin STRING,
   dest STRING,
   distance INT,
   taxi_in INT,
   taxi_out INT,
   cancelled INT,
   cancellation_code STRING,
   diverted INT,
   carrier_delay STRING,
   weather_delay STRING,
   nas_delay STRING,
   security_delay STRING,
   late_aircraft_delay STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/user/hive/warehouse/flight_data';
```
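If the extracted 2008.csv starts with a header row (public copies of this flight dataset usually do), that row will surface as NULLs in the INT columns. One option, assuming Hive 0.13 or later, is to tell the table to skip it; `DESCRIBE FORMATTED` then confirms the schema and external location:

```sql
-- Only needed if 2008.csv begins with a header line (requires Hive 0.13+)
ALTER TABLE flight_data SET TBLPROPERTIES ("skip.header.line.count" = "1");

-- Confirm the columns and the external LOCATION were registered as intended
DESCRIBE FORMATTED flight_data;
```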

- On hive shell: Verify that the table was created and the data loaded by running the following commands:

```sql
SHOW TABLES;
SELECT
   *
FROM
   flight_data
LIMIT 10;
```
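Once rows come back, a quick aggregate makes a better smoke test than `SELECT *`; the query below is only an illustrative sketch using the columns defined above (average arrival delay per carrier):

```sql
-- Illustrative sketch: average arrival delay in minutes per carrier, worst first.
-- AVG() ignores rows where arr_delay could not be parsed and is NULL.
SELECT
   unique_carrier,
   COUNT(*)       AS num_flights,
   AVG(arr_delay) AS avg_arr_delay
FROM
   flight_data
GROUP BY
   unique_carrier
ORDER BY
   avg_arr_delay DESC
LIMIT 10;
```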