This Programming Assignment implements the Sort Application using three different approaches:
- Shared Memory Sort.
- Apache Hadoop.
- Apache Spark.
The Assignment Directory contains the following documents and folders:
- Source code of the TeraSort program for Hadoop, Spark and Shared Memory - Source Code
- Performance Evaluation Report - prog2_report.pdf
- Snapshots of outputs from runs on Amazon AWS - Snapshots
- Configuration files of Hadoop and Spark - Config files
STEPS FOR EXECUTION:
SHARED MEMORY:
- Navigate to the folder that contains the source code.
- Once in that folder, execute the following commands:
Compilation: javac SharedMemoryTera.java
Execution: java SharedMemoryTera
To execute the module on AWS, perform the following steps:
- Go to Amazon Web Services (AWS).
- Launch an AWS instance and pick the "Linux Ubuntu AMI".
- Run the compilation and execution commands as stated above.
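For orientation, the general shape of a shared-memory sort can be sketched as follows. This is a minimal illustration in Python, not the code in SharedMemoryTera.java: the function names, the thread count and the chunking scheme are all assumptions. It assumes gensort-style records whose first 10 bytes are the key, sorts chunks in a thread pool, and then merges the sorted chunks.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

KEY_LEN = 10  # gensort records start with a 10-byte key


def sort_chunk(records):
    """Sort one in-memory chunk of records by their key prefix."""
    return sorted(records, key=lambda r: r[:KEY_LEN])


def shared_memory_sort(records, num_threads=4):
    """Split records into chunks, sort the chunks concurrently, then merge.

    Hypothetical helper for illustration only.
    """
    if not records:
        return []
    # Ceiling division so that at most num_threads chunks are created.
    chunk_size = max(1, -(-len(records) // num_threads))
    chunks = [records[i:i + chunk_size]
              for i in range(0, len(records), chunk_size)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        sorted_chunks = list(pool.map(sort_chunk, chunks))
    # k-way merge of the already sorted chunks.
    return list(heapq.merge(*sorted_chunks, key=lambda r: r[:KEY_LEN]))
```

In CPython the threads mainly illustrate the structure, since CPU-bound sorting does not run in parallel under the global interpreter lock. A Java version would typically follow the same split, sort and merge outline.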
APACHE HADOOP:
- First, install Apache Hadoop by executing the script.
- Once Apache Hadoop is installed successfully, perform the following steps:
i) Execute "gensort".
ii) Execute "TeraByteSorting.java".
iii) Execute "valsort".
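The validation step at the end of this pipeline checks that the output records are in key order. The sketch below shows that ordering check only, assuming the standard gensort layout of 100-byte records with a 10-byte key. The function name `check_sorted` is hypothetical, and the real valsort tool reports more than this (for example a checksum).

```python
RECORD_LEN = 100  # gensort writes fixed-size 100-byte records
KEY_LEN = 10      # the first 10 bytes of each record are the sort key


def check_sorted(data):
    """Return (record_count, index_of_first_out_of_order_record or None).

    Hypothetical helper: it only checks key order, nothing else.
    """
    if len(data) % RECORD_LEN != 0:
        raise ValueError("input length is not a multiple of the record size")
    count = len(data) // RECORD_LEN
    prev_key = None
    for i in range(count):
        start = i * RECORD_LEN
        key = data[start:start + KEY_LEN]
        if prev_key is not None and key < prev_key:
            return count, i
        prev_key = key
    return count, None
```

A call such as `check_sorted(open(path, "rb").read())` would suit small test files; `path` here is a placeholder, and large outputs would need to be read in pieces.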
APACHE SPARK:
- First, install Apache Spark by executing the Bash script.
- The Bash script will install Apache Spark on the Amazon cluster.
- Once Apache Spark is installed successfully, perform the following steps:
i) Execute "gensort" to generate the "input" file.
ii) Transfer the file containing the data generated by gensort.
iii) Execute "pyTeraSort.py".
iv) Execute "valsort".
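The sort step of such a Spark job can be sketched as below. This is an assumption about the general approach, not the contents of pyTeraSort.py: the helper names, the application name and the paths are hypothetical. Each text line is split into a 10-character key and the remaining payload, the pairs are sorted with `sortByKey`, and the records are rejoined before being written out.

```python
KEY_LEN = 10  # gensort records start with a 10-character key


def split_record(line):
    """Split one record into a (key, payload) pair."""
    return line[:KEY_LEN], line[KEY_LEN:]


def join_record(pair):
    """Rebuild the original record from a (key, payload) pair."""
    key, value = pair
    return key + value


def run_spark_sort(input_path, output_path):
    """Sort the records under input_path by key and write them to output_path."""
    # Imported here so the helpers above can be used without Spark installed.
    from pyspark import SparkContext

    sc = SparkContext(appName="TeraSortSketch")  # hypothetical app name
    try:
        (sc.textFile(input_path)
           .map(split_record)
           .sortByKey()
           .map(join_record)
           .saveAsTextFile(output_path))
    finally:
        sc.stop()
```

Note that `textFile` drops line terminators, so a job whose output must pass valsort may need to restore the original record ending before saving; whether that applies depends on how the input was generated.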